PR-MaGIC: Prompt Refinement Via Mask Decoder Gradient Flow
For In-Context Segmentation

Minjae Lee*,  Sungwoo Hur*,  Soojin Hwang,  Won Hwa Kim
* Equal contribution
Pohang University of Science and Technology (POSTECH), Pohang, South Korea
{lalswo010, hursungwoo, soojin0622, wonhwa}@postech.ac.kr
CVPR 2026 Oral
PR-MaGIC Gradient Flow Overview

Illustration of PR-MaGIC.

PR-MaGIC iteratively refines prompts for segmentation by updating the embedding vector distribution ρt with mask decoder gradient flow.

This process minimizes the KL divergence between ρt and the target embedding vector distribution μ derived from the ground truth mask.

At each iteration t, given an initial set of prompts and their corresponding segmentation masks, the embedding vector zt is updated along the gradient flow derived from the mask decoder, which in turn updates ρt.

(a) Support Image | (b) PerSAM-F | (c) Matcher | (d) PR-MaGIC (Ours) | (e) Ground Truth
Example of prompts (green dots) and segmentation results (red masks). (b) & (c) show misaligned prompts and degraded segmentation from PerSAM-F and Matcher, while (d) PR-MaGIC successfully captures the full extent of the elephant.

Abstract

Visual Foundation Models (VFMs) such as the Segment Anything Model (SAM) have significantly broadened the use of image segmentation. However, SAM and its variants require substantial manual effort for prompt generation and additional training for specific applications. Recent approaches address these limitations by integrating SAM into in-context (one/few-shot) segmentation, enabling auto-prompting through semantic alignment between query and support images. Despite these efforts, they still generate sub-optimal prompts that degrade segmentation quality, owing to visual inconsistencies between support and query images.

To tackle this limitation, we introduce PR-MaGIC (Prompt Refinement via Mask Decoder Gradient Flow for In-Context Segmentation), a training-free test-time framework that refines prompts via gradient flow derived from SAM's mask decoder. PR-MaGIC integrates seamlessly into in-context segmentation frameworks: it is theoretically grounded, and a simple top-1 selection strategy stabilizes it in practice, ensuring robust performance across samples. Extensive evaluations demonstrate that PR-MaGIC consistently improves segmentation quality across various benchmarks, effectively mitigating inadequate prompts without requiring additional training or architectural modifications.

Key Contributions

Method

Overview of PR-MaGIC
Overview of PR-MaGIC. (1) Image Embedding: the encoder maps the support and query images to embeddings zs0 and zq0. (2) Prompt Refinement via Gradient Flow: using the mask decoder logit dφ, query embeddings are updated and prompts Pt+1 are resampled for t = 0…T−1, forming a candidate mask set. (3) Segmentation Mask Selection (Top-1): the mask whose mask-aware query embedding has the highest similarity to the support embedding is selected as the final output.
1. Image Embedding

The encoder extracts query embedding zq0 and support embedding zs0 from the query and support images.

2. Gradient Flow Refinement

Query embeddings are iteratively updated via mask-decoder-driven gradient flow. At each step, new prompts are sampled from the updated embeddings.

3. Top-1 Mask Selection

From T candidate masks, the one with the highest support–query similarity is selected as the final segmentation.
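The top-1 selection step can be sketched in a few lines. Below is an illustrative NumPy version (not the authors' released code) that assumes the support embedding is a single pooled vector and that each candidate's "mask-aware" query embedding is obtained by averaging the query feature map over the predicted foreground:

```python
import numpy as np

def top1_mask_selection(support_emb, query_emb_maps, candidate_masks):
    """Pick the candidate mask whose mask-pooled query embedding has the
    highest cosine similarity to the support embedding.

    support_emb:     (C,)        pooled support object embedding
    query_emb_maps:  (T, C, H, W) query embeddings, one per refinement step
    candidate_masks: (T, H, W)   binary candidate masks
    """
    scores = []
    for emb, mask in zip(query_emb_maps, candidate_masks):
        fg = mask.astype(bool)
        if not fg.any():
            # empty prediction: never selected
            scores.append(-np.inf)
            continue
        # mask-aware pooling: average features over predicted foreground
        pooled = emb[:, fg].mean(axis=1)                       # (C,)
        cos = pooled @ support_emb / (
            np.linalg.norm(pooled) * np.linalg.norm(support_emb) + 1e-8)
        scores.append(cos)
    best = int(np.argmax(scores))
    return candidate_masks[best], best
```

The selection is a pure post-hoc filter over the T candidates, so it adds no training and only one similarity computation per candidate.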

Entropy-Regularized KL Gradient Flow

PR-MaGIC minimizes the entropy-regularized KL-divergence between the target distribution μ and the candidate distribution ρ:

\(\displaystyle \min_{\rho}\, F_{\mu}(\rho) = \min_{\rho} \bigl\{ \mathrm{KL}(\mu \| \rho) - \gamma\,H(\rho) \bigr\}\)

At each iteration t, the query embedding vt is updated as:

\(\displaystyle v_{t+1} = v_t + \eta\,\nabla_v d_{\phi}(v_t) + \sqrt{2\gamma\eta}\,\xi_t, \quad \xi_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\)

where η > 0 is the step size, φ denotes the mask decoder parameters, and γ controls entropy regularization. This theoretically supported flow drives query embeddings toward a decoder-optimal neighborhood, effectively mitigating sub-optimal initial prompts.
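The update rule can be simulated directly. The sketch below is illustrative, not the released implementation: it replaces SAM's mask-decoder logit dφ with a toy quadratic whose gradient is analytic, whereas in the actual method ∇v dφ(vt) is obtained by backpropagating through the frozen mask decoder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the decoder logit: d_phi(v) = -||v - v_star||^2.
# v_star is a hypothetical optimum; in PR-MaGIC it is unknown and the
# gradient comes from backpropagation through SAM's mask decoder.
v_star = np.array([2.0, -1.0])

def grad_d_phi(v):
    # analytic gradient of -||v - v_star||^2
    return -2.0 * (v - v_star)

def refine(v0, eta=0.05, gamma=1e-4, T=200):
    """Entropy-regularized gradient-flow update:
    v_{t+1} = v_t + eta * grad d_phi(v_t) + sqrt(2*gamma*eta) * xi_t."""
    v = v0.copy()
    for _ in range(T):
        xi = rng.standard_normal(v.shape)  # Gaussian noise from entropy term
        v = v + eta * grad_d_phi(v) + np.sqrt(2.0 * gamma * eta) * xi
    return v

v_T = refine(np.zeros(2))
# v_T drifts toward v_star, up to small Langevin noise
```

With γ = 0 this reduces to plain gradient ascent on dφ; the noise term is what realizes the entropy regularization −γH(ρ) at the distribution level (a Langevin-type discretization).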

Quantitative Results

mIoU (%) on six benchmark datasets. B = Baseline, T = PR-MaGIC with Top-1 selection, O* = PR-MaGIC with Oracle selection. Bold indicates T values that improve over baseline.

Datasets: (a) Semantic Segmentation — FSS-1000, COCO-20i, LVIS-92i; (b) Part / Fine-grained Segmentation — PACO-Part, PASCAL-Part, DIS5K.

Method            | FSS-1000          | COCO-20i          | LVIS-92i          | PACO-Part         | PASCAL-Part       | DIS5K
                  | B / T / O*        | B / T / O*        | B / T / O*        | B / T / O*        | B / T / O*        | B / T / O*
PerSAM-F          | 58.41/67.19/72.45 | 44.64/46.83/51.74 | 42.37/44.48/47.29 | 39.60/40.72/43.39 | 42.72/43.87/46.43 | 46.82/49.99/53.46
Matcher (1-shot)  | 92.08/92.06/93.55 | 69.53/71.23/76.14 | 59.39/61.52/64.75 | 50.27/54.08/56.71 | 54.76/58.28/61.13 | 46.65/55.08/58.10
Matcher (5-shot)  | 93.26/93.41/94.32 | 67.69/70.74/74.88 | 57.14/60.79/63.88 | 48.66/53.15/55.30 | 54.54/59.27/61.55 | –

Qualitative Results

PR-MaGIC (Refined) consistently produces more complete and accurate segmentations compared to the baselines (PerSAM-F and Matcher) across both semantic and part segmentation tasks.

Semantic Segmentation
PerSAM-F + PR-MaGIC
Support Image | Baseline (PerSAM-F) | Refined (Ours) | Ground Truth
Matcher 1-shot + PR-MaGIC
Support Image | Baseline (Matcher) | Refined (Ours) | Ground Truth
Part Segmentation
PerSAM-F + PR-MaGIC
Support Image | Baseline (PerSAM-F) | Refined (Ours) | Ground Truth
Matcher 1-shot + PR-MaGIC
Support Image | Baseline (Matcher) | Refined (Ours) | Ground Truth

BibTeX

@misc{lee2026prmagicpromptrefinementmask,
      title={PR-MaGIC: Prompt Refinement Via Mask Decoder Gradient Flow For In-Context Segmentation}, 
      author={Minjae Lee and Sungwoo Hur and Soojin Hwang and Won Hwa Kim},
      year={2026},
      eprint={2604.12113},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.12113}, 
}

Acknowledgment

This research was supported by RS-2025-02216257 (70%), RS-2022-II220290 (20%), and RS-2019-II191906 (AI Graduate Program at POSTECH, 10%).