PR-MaGIC: Prompt Refinement Via Mask Decoder Gradient Flow For In-Context Segmentation

Illustration of PR-MaGIC.

PR-MaGIC iteratively refines prompts for segmentation by updating the embedding vector distribution ρ_t with mask decoder gradient flow.

This process minimizes the KL divergence between ρ_t and the target embedding vector distribution μ derived from the ground truth mask.

At each iteration t, given an initial set of prompts and their corresponding segmentation masks, the embedding vector z_t is updated along the gradient flow derived from the mask decoder, which in turn updates ρ_t.

(a) Support Image

(b) PerSAM-F

(c) Matcher

(d) PR-MaGIC (Ours)

(e) Ground Truth

Example of prompts (green dots) and segmentation results (red masks). (b) & (c) show misaligned prompts and degraded segmentation from PerSAM-F and Matcher, while (d) PR-MaGIC successfully captures the full extent of the elephant.

Abstract

Visual Foundation Models (VFMs) such as the Segment Anything Model (SAM) have significantly broadened the use of image segmentation. However, SAM and its variants require substantial manual effort for prompt generation and additional training for specific applications. Recent approaches address these limitations by integrating SAM into in-context (one/few-shot) segmentation, enabling auto-prompting through semantic alignment between query and support images. Despite these efforts, they still generate sub-optimal prompts that degrade segmentation quality because of visual inconsistencies between support and query images.

To tackle this limitation, we introduce PR-MaGIC (Prompt Refinement via Mask Decoder Gradient Flow for In-Context Segmentation), a training-free test-time framework that refines prompts via gradient flow derived from SAM's mask decoder. PR-MaGIC integrates into existing in-context segmentation frameworks. It is theoretically grounded and is stabilized in practice by a simple top-1 selection strategy that keeps performance robust across samples. Extensive evaluations demonstrate that PR-MaGIC consistently improves segmentation quality across various benchmarks, effectively mitigating inadequate prompts without requiring additional training or architectural modifications.

Key Contributions

🧪
Training-free test-time refinement via gradient flow. A novel training-free refinement method that updates the query embedding with a mask-decoder-driven gradient flow and resamples prompts, improving segmentation quality without learnable parameters, architectural changes, or extra data.
🎯
Segmentation mask selection for robustness. We generate candidate masks across T iterations and propose a simple top-1 support–query similarity selection to choose the final mask, which safeguards against step-size sensitivity and sample-dependent instability.
📐
Convergence analysis under a proximity assumption. We provide a theoretical analysis showing that, when the initial embedding distribution is near a decoder-optimal neighborhood, the entropy-regularized KL gradient flow drives exponential convergence of the query embeddings.

Method

**Overview of PR-MaGIC.** **(1) Image Embedding:** the *Encoder* maps the support and query images to embeddings *z^s₀* and *z^q₀*. **(2) Prompt Refinement via Gradient Flow:** using the mask decoder logit *d_φ*, the query embeddings are updated and prompts *P_{t+1}* are resampled for *t = 0, …, T−1*, forming a candidate mask set. **(3) Segmentation Mask Selection (Top-1):** the mask whose mask-aware query embedding has the highest similarity to the support embedding is selected as the final output.

Image Embedding

The encoder extracts query embedding z^q₀ and support embedding z^s₀ from the query and support images.

Gradient Flow Refinement

Query embeddings are iteratively updated via mask-decoder-driven gradient flow. At each step, new prompts are sampled from the updated embeddings.
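The page does not spell out how prompts are drawn from the updated embeddings. As a rough illustration only, the sketch below assumes point prompts are taken at the patch locations whose embeddings are most similar (by cosine similarity) to a support foreground prototype. The function name `sample_point_prompts`, the prototype, and the top-k rule are all hypothetical and are not taken from PR-MaGIC itself.

```python
import numpy as np

def sample_point_prompts(query_emb, support_proto, k=3):
    """Pick the k patch locations most similar to a support prototype.

    ASSUMPTION: top-k cosine similarity is a stand-in sampling rule,
    not the rule used by PR-MaGIC.
    query_emb: (H, W, C) patch embeddings of the query image.
    support_proto: (C,) foreground prototype from the support image.
    Returns a (k, 2) integer array of (row, col) points, best first.
    """
    h, w, c = query_emb.shape
    flat = query_emb.reshape(-1, c)
    # Cosine similarity between every patch and the prototype.
    flat_n = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    proto_n = support_proto / (np.linalg.norm(support_proto) + 1e-8)
    sim = flat_n @ proto_n
    # Indices of the k largest similarities, sorted from best to worst.
    top = np.argsort(-sim, kind="stable")[:k]
    rows, cols = np.unravel_index(top, (h, w))
    return np.stack([rows, cols], axis=1)

# Toy usage: plant one patch that matches the prototype exactly.
rng = np.random.default_rng(0)
query_emb = rng.normal(size=(4, 4, 8)) * 0.1
proto = np.ones(8)
query_emb[2, 1] = proto
points = sample_point_prompts(query_emb, proto, k=2)
```

In this toy setup the planted patch at row 2, column 1 is returned first. Re-running the sampler after each embedding update is what would make the prompts track the refined embeddings.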

Top-1 Mask Selection

From T candidate masks, the one with the highest support–query similarity is selected as the final segmentation.
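The selection step can be sketched as follows. This is a minimal illustration, assuming that the mask-aware query embedding is a masked average of query features and that similarity is cosine similarity. Both choices, and the helper names `masked_average`, `cosine`, and `select_top1`, are assumptions made for the example rather than details confirmed by the page.

```python
import numpy as np

def masked_average(features, mask):
    """Average (H, W, C) features over the pixels where mask is True."""
    weights = mask.astype(features.dtype)[..., None]
    return (features * weights).sum(axis=(0, 1)) / (weights.sum() + 1e-8)

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def select_top1(query_feats, candidate_masks, support_emb):
    """Return (index, scores) for the candidate most similar to the support."""
    scores = [cosine(masked_average(query_feats, m), support_emb)
              for m in candidate_masks]
    return int(np.argmax(scores)), scores

# Toy query features: left half looks like the support, right half does not.
feats = np.zeros((4, 4, 2))
feats[:, :2] = [1.0, 0.0]
feats[:, 2:] = [0.0, 1.0]
support_emb = np.array([1.0, 0.0])

mask_right = np.zeros((4, 4), dtype=bool); mask_right[:, 2:] = True
mask_left = np.zeros((4, 4), dtype=bool); mask_left[:, :2] = True
mask_all = np.ones((4, 4), dtype=bool)

best, scores = select_top1(feats, [mask_right, mask_left, mask_all], support_emb)
```

Here `best` is the index of `mask_left`, the only candidate that covers exactly the support-like region. Because the choice depends only on the candidates and the support embedding, it needs no ground truth at test time.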

Entropy-Regularized KL Gradient Flow

PR-MaGIC minimizes the entropy-regularized KL-divergence between the target distribution μ and the candidate distribution ρ:

\(\displaystyle \min_{\rho}\, F_{\mu}(\rho) = \min_{\rho} \bigl\{ \mathrm{KL}(\mu \| \rho) - \gamma\,H(\rho) \bigr\}\)

At each iteration t, the query embedding v_t is updated as:

\(\displaystyle v_{t+1} = v_t + \eta\,\nabla_v d_{\phi}(v_t) + \sqrt{2\gamma\eta}\,\xi_t, \quad \xi_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\)

where η > 0 is the step size, φ denotes the mask decoder parameters, and γ controls entropy regularization. This theoretically supported flow drives query embeddings toward a decoder-optimal neighborhood, effectively mitigating sub-optimal initial prompts.
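To make the update rule concrete, the sketch below applies it to a toy problem. The real mask decoder logit is replaced by a hypothetical quadratic logit d(v) = -0.5 * ||v - center||^2, whose gradient is known in closed form. The names `toy_logit_grad`, `gradient_flow_step`, and `refine` are illustrative, and in practice the gradient would come from backpropagating through the decoder.

```python
import numpy as np

def toy_logit_grad(v, center):
    """Gradient of the toy logit d(v) = -0.5 * ||v - center||^2 (an assumption)."""
    return center - v

def gradient_flow_step(v, grad_fn, eta, gamma, rng):
    """One update: v + eta * grad d(v) + sqrt(2 * gamma * eta) * noise."""
    noise = rng.standard_normal(v.shape)
    return v + eta * grad_fn(v) + np.sqrt(2.0 * gamma * eta) * noise

def refine(v0, grad_fn, eta=0.1, gamma=0.0, steps=50, seed=0):
    """Run `steps` updates and return every state, starting with v0."""
    rng = np.random.default_rng(seed)
    trajectory = [v0]
    for _ in range(steps):
        trajectory.append(
            gradient_flow_step(trajectory[-1], grad_fn, eta, gamma, rng))
    return trajectory

center = np.array([1.0, -2.0])
grad_fn = lambda v: toy_logit_grad(v, center)

# gamma = 0 removes the noise term, leaving plain gradient ascent on d.
traj = refine(np.zeros(2), grad_fn, eta=0.1, gamma=0.0, steps=100)
# gamma > 0 adds Gaussian noise scaled by sqrt(2 * gamma * eta).
noisy_traj = refine(np.zeros(2), grad_fn, eta=0.1, gamma=0.01, steps=100)
```

With γ = 0 each step shrinks the distance to `center` by a factor of 1 − η, so the toy iterate converges geometrically. With γ > 0 the iterates instead fluctuate around the maximizer, which is the role of the entropy term in the objective.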

Quantitative Results

mIoU (%) on six benchmark datasets. B = Baseline, T = PR-MaGIC with Top-1 selection, O* = PR-MaGIC with Oracle selection. Bold indicates T values that improve over baseline.

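(a) Semantic Segmentation

| Method | FSS-1000 B | FSS-1000 T | FSS-1000 O* | COCO-20ⁱ B | COCO-20ⁱ T | COCO-20ⁱ O* | LVIS-92ⁱ B | LVIS-92ⁱ T | LVIS-92ⁱ O* |
|---|---|---|---|---|---|---|---|---|---|
| PerSAM-F | 58.41 | **67.19** | 72.45 | 44.64 | **46.83** | 51.74 | 42.37 | **44.48** | 47.29 |
| Matcher (1-shot) | 92.08 | 92.06 | 93.55 | 69.53 | **71.23** | 76.14 | 59.39 | **61.52** | 64.75 |
| Matcher (5-shot) | 93.26 | **93.41** | 94.32 | 67.69 | **70.74** | 74.88 | 57.14 | **60.79** | 63.88 |

(b) Part / Fine-grained Segmentation

| Method | PACO-Part B | PACO-Part T | PACO-Part O* | PASCAL-Part B | PASCAL-Part T | PASCAL-Part O* | DIS5K B | DIS5K T | DIS5K O* |
|---|---|---|---|---|---|---|---|---|---|
| PerSAM-F | 39.60 | **40.72** | 43.39 | 42.72 | **43.87** | 46.43 | 46.82 | **49.99** | 53.46 |
| Matcher (1-shot) | 50.27 | **54.08** | 56.71 | 54.76 | **58.28** | 61.13 | 46.65 | **55.08** | 58.10 |
| Matcher (5-shot) | 48.66 | **53.15** | 55.30 | 54.54 | **59.27** | 61.55 | — | — | — |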
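For readers who want the size of each improvement, the short script below computes the Top-1 gain (T minus B) per method and dataset. The numbers are copied from the results above. The helper names are illustrative, and the missing DIS5K entry for Matcher (5-shot) is left out rather than filled in.

```python
# mIoU (%) as (baseline B, Top-1 T), copied from the reported results.
RESULTS = {
    "PerSAM-F": {
        "FSS-1000": (58.41, 67.19), "COCO-20i": (44.64, 46.83),
        "LVIS-92i": (42.37, 44.48), "PACO-Part": (39.60, 40.72),
        "PASCAL-Part": (42.72, 43.87), "DIS5K": (46.82, 49.99),
    },
    "Matcher (1-shot)": {
        "FSS-1000": (92.08, 92.06), "COCO-20i": (69.53, 71.23),
        "LVIS-92i": (59.39, 61.52), "PACO-Part": (50.27, 54.08),
        "PASCAL-Part": (54.76, 58.28), "DIS5K": (46.65, 55.08),
    },
    "Matcher (5-shot)": {  # DIS5K is not reported for this setting.
        "FSS-1000": (93.26, 93.41), "COCO-20i": (67.69, 70.74),
        "LVIS-92i": (57.14, 60.79), "PACO-Part": (48.66, 53.15),
        "PASCAL-Part": (54.54, 59.27),
    },
}

def top1_gains(results):
    """Per-method, per-dataset gain of Top-1 over baseline, in mIoU points."""
    return {method: {ds: round(t - b, 2) for ds, (b, t) in per_ds.items()}
            for method, per_ds in results.items()}

def mean_gain(per_dataset_gains):
    """Unweighted mean of the gains over the datasets that are reported."""
    return sum(per_dataset_gains.values()) / len(per_dataset_gains)

GAINS = top1_gains(RESULTS)
for method, per_ds in GAINS.items():
    print(f"{method}: mean gain {mean_gain(per_ds):+.2f} | {per_ds}")
```

The gains are positive everywhere except Matcher (1-shot) on FSS-1000, where Top-1 is 0.02 points below a baseline that is already above 92%.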

Qualitative Results

PR-MaGIC (Refined) consistently produces more complete and accurate segmentations compared to the baselines (PerSAM-F and Matcher) across both semantic and part segmentation tasks.

Semantic Segmentation

PerSAM-F + PR-MaGIC (panels: Support Image, Baseline (PerSAM-F), Refined (Ours), Ground Truth)

Matcher 1-shot + PR-MaGIC (panels: Support Image, Baseline (Matcher), Refined (Ours), Ground Truth)

Part Segmentation

PerSAM-F + PR-MaGIC (panels: Support Image, Baseline (PerSAM-F), Refined (Ours), Ground Truth)

Matcher 1-shot + PR-MaGIC (panels: Support Image, Baseline (Matcher), Refined (Ours), Ground Truth)

BibTeX

@misc{lee2026prmagicpromptrefinementmask,
      title={PR-MaGIC: Prompt Refinement Via Mask Decoder Gradient Flow For In-Context Segmentation}, 
      author={Minjae Lee and Sungwoo Hur and Soojin Hwang and Won Hwa Kim},
      year={2026},
      eprint={2604.12113},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.12113}, 
}

Acknowledgment

This research was supported by RS-2025-02216257 (70%), RS-2022-II220290 (20%), and RS-2019-II191906 (AI Graduate Program at POSTECH, 10%).