Paper deep dive
Toward Faithful Segmentation Attribution via Benchmarking and Dual-Evidence Fusion
Abu Noman Md Sakib, OFM Riaz Rahman Aranya, Kevin Desai, Zijie Zhang
Intelligence
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 95%
Last extracted: 3/26/2026, 1:36:58 AM
Summary
The paper introduces a reproducible benchmark for evaluating semantic segmentation attribution methods, focusing on intervention-based faithfulness, off-target leakage, and perturbation robustness. It proposes Dual-Evidence Attribution (DEA), a method that fuses gradient-based and region-level intervention signals to improve causal faithfulness while maintaining robustness, addressing the limitations of visual-only evaluation.
Entities (5)
Relation Signals (3)
Dual-Evidence Attribution → evaluated on → PASCAL VOC
confidence 100% · We evaluate on Pascal VOC 2012 and SBD
Dual-Evidence Attribution → improves → Target Deletion Faithfulness
confidence 90% · DEA consistently improves deletion-based faithfulness over gradient-only baselines
Dual-Evidence Attribution → uses → DeepLabV3-ResNet50
confidence 90% · We use pretrained DeepLabV3-ResNet50... for each image
Cypher Suggestions (2)
Find all methods evaluated on a specific dataset · confidence 90% · unvalidated
MATCH (m:Method)-[:EVALUATED_ON]->(d:Dataset {name: 'Pascal VOC'}) RETURN m.name
Identify metrics used to evaluate a specific method · confidence 85% · unvalidated
MATCH (m:Method {name: 'Dual-Evidence Attribution'})-[:EVALUATED_BY]->(metric:Metric) RETURN metric.name
Abstract
Attribution maps for semantic segmentation are almost always judged by visual plausibility. Yet looking convincing does not guarantee that the highlighted pixels actually drive the model's prediction, nor that attribution credit stays within the target region. These questions require a dedicated evaluation protocol. We introduce a reproducible benchmark that tests intervention-based faithfulness, off-target leakage, perturbation robustness, and runtime on Pascal VOC and SBD across three pretrained backbones. To further demonstrate the benchmark, we propose Dual-Evidence Attribution (DEA), a lightweight correction that fuses gradient evidence with region-level intervention signals through agreement-weighted fusion. DEA increases emphasis where both sources agree and retains causal support when gradient responses are unstable. Across all completed runs, DEA consistently improves deletion-based faithfulness over gradient-only baselines and preserves strong robustness, at the cost of additional compute from intervention passes. The benchmark exposes a faithfulness-stability tradeoff among attribution families that is entirely hidden under visual evaluation, providing a foundation for principled method selection in segmentation explainability. Code is available at https://github.com/anmspro/DEA
Tags
Links
- Source: https://arxiv.org/abs/2603.22624v1
- Canonical: https://arxiv.org/abs/2603.22624v1
Full Text
31,119 characters extracted from source content.
Toward Faithful Segmentation Attribution via Benchmarking and Dual-Evidence Fusion

Abu Noman Md Sakib, OFM Riaz Rahman Aranya, Kevin Desai, Zijie Zhang
The University of Texas at San Antonio
{abunomanmd.sakib, ofmriazrahman.aranya, kevin.desai, zijie.zhang}@utsa.edu

Abstract

Attribution maps for semantic segmentation are almost always judged by visual plausibility. Yet looking convincing does not guarantee that the highlighted pixels actually drive the model’s prediction, nor that attribution credit stays within the target region. These questions require a dedicated evaluation protocol. We introduce a reproducible benchmark that tests intervention-based faithfulness, off-target leakage, perturbation robustness, and runtime on Pascal VOC and SBD across three pretrained backbones. To further demonstrate the benchmark, we propose Dual-Evidence Attribution (DEA), a lightweight correction that fuses gradient evidence with region-level intervention signals through agreement-weighted fusion. DEA increases emphasis where both sources agree and retains causal support when gradient responses are unstable. Across all completed runs, DEA consistently improves deletion-based faithfulness over gradient-only baselines and preserves strong robustness, at the cost of additional compute from intervention passes. The benchmark exposes a faithfulness–stability tradeoff among attribution families that is entirely hidden under visual evaluation, providing a foundation for principled method selection in segmentation explainability. Code is available at https://github.com/anmspro/DEA

1 Introduction

Post-hoc attribution methods for deep neural networks identify which input regions are causally responsible for a given prediction. In image classification, gradient-based [49, 35, 5, 41, 30, 12, 37] and perturbation-based [45, 29, 16, 17] methods offer well-characterised trade-offs between spatial resolution, computational cost, and faithfulness. Semantic segmentation poses a distinct challenge: an attribution map must explain not only the presence of a target class but whether the model’s evidence is correctly localised within a predicted region, without assigning importance to spatially disjoint areas.

Despite this, segmentation attribution methods remain predominantly gradient-based CAM variants [49, 35, 39, 19], evaluated almost exclusively by visual plausibility. This criterion does not test causal faithfulness. Two questions remain unexamined: does occluding the highest-attributed pixels within the target region reduce the model’s confidence in that region, and do attribution maps assign substantial importance to pixels outside the target mask? A method that fails either test may still produce convincing heatmaps, as gradient activations can reflect feature co-occurrence rather than causal evidence [1].

We introduce a reproducible benchmark for segmentation attribution faithfulness, formalising two evaluation axes absent from prior work: target deletion faithfulness, measuring the causal dependence of region-level confidence on the highest-attributed pixels, and absolute off-target leakage, quantifying attribution credit outside the target mask. Together with perturbation robustness and runtime, these axes enable principled multi-criteria comparison of segmentation attribution methods. We demonstrate the benchmark with Dual-Evidence Attribution (DEA), showing that it surfaces faithfulness differences invisible to visual inspection.
Our contributions are as follows:
• A reproducible segmentation attribution benchmark comprising intervention-based faithfulness tests, off-target leakage, perturbation robustness, and runtime profiling across three backbones on Pascal VOC and SBD.
• Dual-Evidence Attribution (DEA), a lightweight dual-evidence correction fusing gradient and intervention signals, improving deletion faithfulness over gradient baselines.
• All per-sample outputs and aggregation scripts are released for independent verification.

2 Related Work

Gradient-based attribution for dense prediction. CAM [49] and Grad-CAM [35] produce class-discriminative maps by weighting activations with globally pooled gradients, but discard spatial information through average pooling. Grad-CAM++ [5], Score-CAM [41], Ablation-CAM [30], and HiResCAM [12] address different limitations of this pooling step. For segmentation, Vinogradova et al. [39] restricted the gradient signal to a masked target region (Seg-Grad-CAM), and Hasany et al. [19] further improved spatial specificity with elementwise weighting (Seg-XRes-CAM, our EGA baseline). Neither work evaluates causal faithfulness: both validate explanations by visual comparison rather than direct intervention tests.

Perturbation and intervention-based attribution. Occlusion-based methods [45] estimate pixel importance by masking regions and measuring the change in model output, providing a direct causal signal at higher computational cost. RISE [29], meaningful perturbations [16], and extremal perturbations [17] extend this with randomised and optimised masking schemes. These methods are model-agnostic, but their evaluation in segmentation remains limited to visual plausibility. Our RIA baseline and the intervention component of DEA adopt the same causal viewpoint within the segmentation evaluation loop.

Broader attribution context. Beyond CAM-style heatmaps, integrated gradients [37], concept-based testing [24], and Shapley-value approximations [28] have shaped widely used desiderata for explanation quality, including sensitivity, implementation invariance, and completeness. As vision models expanded beyond standard CNNs [11, 43, 42, 22, 26], dedicated attention-based and propagation-based explanation methods followed [7, 6, 14, 40, 38, 32, 4], often revealing that attribution behaviour varies substantially across architectures. However, the majority of these methods and their evaluation protocols target image classification, leaving dense prediction tasks without comparable evaluation standards. These lines of work motivate the need for faithfulness evaluation that goes beyond visual plausibility and accounts for the spatial structure specific to segmentation.

Faithfulness evaluation. Adebayo et al. [1] showed that many saliency methods produce outputs largely independent of learned weights. Hooker et al. [20] proposed ROAR, measuring faithfulness by accuracy degradation after retraining on data with top-attributed pixels removed. Yeh et al. [44] introduced infidelity and sensitivity criteria. These works establish that visual plausibility is insufficient [2, 31, 15, 3, 25, 23, 36, 8, 33, 34, 47], but operate in the classification setting. ROAR requires retraining, and none address off-target leakage specific to dense prediction. Our benchmark instantiates inference-time deletion and leakage tests directly within the segmentation loop, requiring no retraining and explicitly accounting for the spatial structure of dense prediction targets.
Figure 1: Overview of DEA. Elementwise gradient evidence (EGA) and region intervention evidence (RIA) are combined through multiplicative agreement and residual intervention support.

3 Method

We first define the region-level attribution objective for semantic segmentation, then introduce the dual-evidence correction used in DEA, and finally specify the evaluation metrics. Figure 1 illustrates the pipeline.

3.1 Problem Setup

Given an input image $x \in \mathbb{R}^{3 \times H \times W}$ and a pretrained segmentation model $f$, we study explanations for a target class $c$ and target mask $M \in \{0,1\}^{H \times W}$. Let $z = f(x)$ be per-pixel logits and let $p_c(x)$ denote the softmax probability map for class $c$. We evaluate evidence at the region level through

$$s_c(x, M) = \frac{\sum_{u,v} M_{uv}\, p_c(x)_{uv}}{\sum_{u,v} M_{uv} + \epsilon}, \qquad (1)$$

which is the masked mean class probability inside the target region. An attribution method outputs a heatmap $A \in [0,1]^{H \times W}$, and we test whether high-valued pixels in $A$ are causally important for $s_c(x, M)$.

3.2 Dual-Evidence Attribution

We compare three base attributions that expose complementary behavior. GPA uses gradient pooling at the selected feature layer, EGA uses elementwise gradient-activation products at the same layer, and RIA computes intervention deltas by masking fixed grid regions and measuring the corresponding drop in (1). All maps are min-max normalized to $[0,1]$. Let $A_g$ be the EGA map and $A_r$ the RIA map. Our corrected attribution is

$$A_f = \alpha A_g \odot (1 + \beta A_r) + (1 - \alpha) A_r, \qquad (2)$$

where $(\alpha, \beta)$ control the balance between fine gradient structure and intervention support, and $\odot$ denotes elementwise multiplication. The multiplicative term increases weight on pixels where both sources agree, while the residual $A_r$ term keeps coarse but causally supported evidence when gradient responses are unstable.

3.3 Metrics

For a heatmap $A$, let $S_t(A, M, k)$ be the top-$k$ fraction of pixels within $M$, and let $S_o(A, M, k)$ be the top-$k$ fraction outside $M$. Using a mean-value occlusion operator $O(x, S)$, we define the target deletion drop

$$\mathrm{TDD} = \frac{s_c(x, M) - s_c(O(x, S_t), M)}{|s_c(x, M)| + \epsilon}, \qquad (3)$$

and the off-target deletion drop

$$\mathrm{ODD} = \frac{s_c(x, M) - s_c(O(x, S_o), M)}{|s_c(x, M)| + \epsilon}. \qquad (4)$$

Large TDD indicates that attributed target pixels are causally important. Large $|\mathrm{ODD}|$ indicates undesired sensitivity to high-valued pixels outside the target mask. We summarize this tradeoff with

$$\mathrm{LeakAbs} = \frac{|\mathrm{ODD}|}{|\mathrm{TDD}| + \epsilon}. \qquad (5)$$

We also report target insertion gain by starting from a mean-value baseline image and reinserting $S_t(A, M, k)$, perturbation robustness as the average correlation between original and perturbed heatmaps (noise, brightness, contrast, blur, horizontal flip), and wall-clock runtime per explanation. We report absolute off-target metrics for primary ranking and keep the signed leakage ratio as a diagnostic statistic.
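To make Sec. 3 concrete, the following is a minimal PyTorch sketch of Eqs. (1)-(5) as we read them. It is illustrative rather than the released implementation: the helper names (`region_score`, `ria_map`, `dea_fuse`, `deletion_metrics`), the mean-value fill, and the tensor shapes (a 1×3×H×W image batch, an H×W float mask `M`) are our assumptions, and the EGA map `A_g` (gradient-activation product at a feature layer) is taken as given.

```python
import torch

def normalize(a, eps=1e-8):
    """Min-max normalize a heatmap to [0, 1], as the paper does for all base maps."""
    return (a - a.min()) / (a.max() - a.min() + eps)

def region_score(model, x, c, M, eps=1e-8):
    """Masked mean class probability s_c(x, M), Eq. (1)."""
    with torch.no_grad():
        logits = model(x)["out"]            # TorchVision segmentation models return {"out": 1 x C x H x W}
        p_c = logits.softmax(dim=1)[0, c]   # H x W probability map for class c
    return (M * p_c).sum() / (M.sum() + eps)

def ria_map(model, x, c, M, grid=14):
    """Region Intervention Attribution: per-cell drop in Eq. (1) under mean-value occlusion."""
    _, _, H, W = x.shape
    base = region_score(model, x, c, M)
    fill = x.mean(dim=(2, 3), keepdim=True)           # mean-value occlusion baseline
    A_r = torch.zeros(H, W, device=x.device)
    hs, ws = H // grid, W // grid
    for i in range(grid):
        for j in range(grid):
            x_occ = x.clone()
            x_occ[:, :, i*hs:(i+1)*hs, j*ws:(j+1)*ws] = fill
            A_r[i*hs:(i+1)*hs, j*ws:(j+1)*ws] = base - region_score(model, x_occ, c, M)
    return normalize(A_r)

def dea_fuse(A_g, A_r, alpha=0.65, beta=0.35):
    """Eq. (2): agreement-weighted fusion of gradient (A_g) and intervention (A_r) evidence."""
    return normalize(alpha * A_g * (1.0 + beta * A_r) + (1.0 - alpha) * A_r)

def deletion_metrics(model, x, c, M, A, k=0.2, eps=1e-8):
    """TDD, ODD, and LeakAbs, Eqs. (3)-(5), via top-k mean-value occlusion.

    Assumes M is neither empty nor full, which the paper's target-selection
    protocol guarantees.
    """
    s0 = region_score(model, x, c, M)
    fill = x.mean(dim=(2, 3), keepdim=True)
    out = {}
    for name, region in (("TDD", M.bool()), ("ODD", ~M.bool())):
        vals = A[region]
        n = max(1, int(k * vals.numel()))
        thresh = vals.topk(n).values.min()             # score cutoff for the top-k fraction
        sel = (region & (A >= thresh)).unsqueeze(0).unsqueeze(0)
        x_occ = torch.where(sel, fill.expand_as(x), x) # occlude top-k pixels in this region
        out[name] = ((s0 - region_score(model, x_occ, c, M)) / (s0.abs() + eps)).item()
    out["LeakAbs"] = abs(out["ODD"]) / (abs(out["TDD"]) + eps)
    return out
```

The nested occlusion loop in `ria_map` makes the paper's runtime observation visible: RIA and DEA need grid² extra forward passes per image, which is why intervention passes dominate compute.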
Figure 2: Representative success cases where DEA improves target-region faithfulness while preserving spatial focus.

4 Experimental Setup

4.1 Datasets and Models

We evaluate on Pascal VOC 2012 and SBD [13, 18]. Images and masks are resized to $224 \times 224$ and processed with the TorchVision segmentation pipeline. We use pretrained DeepLabV3-ResNet50 [9, 10], FCN-ResNet50 [27], and LRASPP-MobileNetV3 [21]. These choices cover canonical dense-prediction design families used in modern semantic segmentation pipelines [46, 48, 22, 26].

For each image, we choose a single evaluation target class as the most frequent foreground label in the ground-truth mask (excluding background and ignore label), then define M as that class mask. This protocol avoids degenerate empty targets and makes per-sample comparisons consistent across methods.

Figure 3: Mechanistic decomposition of DEA: elementwise gradient map, region intervention map, interaction, and corrected output (single case).

4.2 Settings

The compared methods are Gradient-Pooled Attribution (GPA, corresponding to Seg-Grad-CAM [39]), Elementwise Gradient Attribution (EGA, corresponding to Seg-XRes-CAM [19]), Region Intervention Attribution (RIA), and DEA. The top-$k$ fraction used in deletion and insertion tests is $k = 0.2$. Region-intervention methods (RIA and DEA) use grid size 14 by default. For DEA, unless noted otherwise, we use $\alpha = 0.65$ and $\beta = 0.35$ from the benchmark implementation. Ablation runs with varied $\alpha$ and $\beta$ confirm that deletion faithfulness remains above EGA for $\alpha \in [0.5, 0.8]$. Robustness is evaluated with perturbation strength 0.03 over additive noise, brightness shift, contrast change, Gaussian blur, and horizontal flip. Runtime is measured as wall-clock time per explanation call in the same evaluation loop.

For SBD, we aggregate all completed runs: six core runs (three backbones, two seeds each), three extra runs, and two DeepLab ablation runs with altered evidence-mixing and grid settings. For VOC, we aggregate six core runs (three backbones, two seeds each).
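As a small sketch of the Sec. 4.1 setup, the snippet below loads a pretrained backbone via TorchVision and picks the evaluation target class. The background index 0 and ignore index 255 follow the standard Pascal VOC labeling convention; the paper does not spell out the exact indices, so treat them as an assumption.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Pretrained backbone as in Sec. 4.1 (FCN-ResNet50 and LRASPP-MobileNetV3 load analogously).
model = deeplabv3_resnet50(weights="DEFAULT").eval()

def pick_target(gt_mask, background=0, ignore=255):
    """Most frequent foreground label in the ground-truth mask (Sec. 4.1 protocol).

    Returns (class index c, binary target mask M), or None for a degenerate
    image with no foreground, which the protocol skips.
    """
    labels, counts = gt_mask.unique(return_counts=True)
    keep = (labels != background) & (labels != ignore)
    labels, counts = labels[keep], counts[keep]
    if labels.numel() == 0:
        return None
    c = labels[counts.argmax()].item()
    M = (gt_mask == c).float()
    return c, M
```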
5 Results

5.1 Quantitative Results

Across completed runs, RIA reaches the strongest deletion faithfulness and the lowest absolute off-target drop, while EGA remains best in stability and latency. DEA lies between these two ends of the tradeoff curve and consistently improves deletion faithfulness over both gradient baselines in every completed run (over EGA: SBD 11/11, VOC 6/6; over GPA: SBD 9/9, VOC 6/6). Detailed mean and standard deviation values are provided in Appendix A.2 and Table 1.

The tradeoffs are easiest to read relative to EGA, which is the strongest gradient-only baseline in robustness. DEA increases target-region deletion faithfulness and reduces off-target absolute drop versus EGA on both datasets, while remaining slower because intervention passes dominate compute. Relative to RIA, DEA gives up some absolute faithfulness but recovers substantial robustness. This supports a scoped claim: DEA is a practical correction when intervention-aligned faithfulness is needed without fully adopting the least stable intervention baseline.

Target insertion gain follows a different ordering and tends to favor broader maps, with GPA highest on both datasets in our aggregates; full insertion values are reported in Appendix A.2 and Table 1. We therefore treat insertion as a complementary diagnostic and avoid using it as a single ranking criterion.

5.2 Qualitative Results

Fig. 2 shows the qualitative behavior behind the aggregate metrics and makes the deletion and leakage tradeoff visually explicit. Across success cases, DEA suppresses broad low-confidence context activation that appears in gradient-only maps, while preserving contiguous object structure inside the target region. Compared with RIA, the corrected map typically keeps sharper intra-object detail and avoids the block-like over-smoothing introduced by coarse intervention regions.

Fig. 3 explains this behavior at the mechanism level. The interaction term acts as a gate that promotes pixels supported by both evidence streams and attenuates pixels favored by only one stream. In the selected mechanistic cases generated by our pipeline, agreement mass is concentrated on target pixels with near-zero off-target overlap, and the final map is consistently tighter than the raw intervention map while remaining less noisy than the gradient map. Additional failure-mode analysis, including thin-boundary under-coverage and clutter-driven residual leakage, is provided in Appendix A.1 and Figure 4.

6 Discussion

The central finding is not that DEA dominates every axis; it does not. The robust conclusion is that DEA reliably shifts gradient-based attribution toward intervention-validated faithfulness, at the predictable cost of intervention-level compute. This is useful in evaluation-heavy settings where explanation quality is more important than millisecond latency, and less attractive for strict real-time constraints where EGA is still preferable.

Two limitations remain important. First, our uncertainty reporting currently uses run-level variability, not formal hypothesis tests, so claims should be interpreted as strong empirical trends rather than definitive significance statements. Second, intervention maps are built from fixed grids, which can miss thin structures and contribute to residual leakage in difficult scenes.

7 Conclusion

We introduced a reproducible benchmark for segmentation attribution and a dual-evidence correction that combines high-resolution gradient structure with intervention support. Across completed SBD and VOC runs, DEA consistently improves deletion-based faithfulness over gradient baselines while preserving high robustness, though it remains much slower than pure gradient methods because intervention passes dominate compute.

The empirical picture is intentionally scoped. DEA is best viewed as a correction that moves gradient attribution toward intervention-aligned behavior, not as a universal best method across all axes. In our aggregates, pure intervention attribution still leads on absolute faithfulness metrics, while EGA remains strongest on speed and stability. Future work should add formal significance testing, denser and adaptive intervention schemes for boundary-sensitive regions, and faster region-level evaluation so intervention-grounded attribution becomes practical in larger-scale and lower-latency settings.

References

[1] J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim (2018) Sanity checks for saliency maps. Advances in Neural Information Processing Systems 31.
[2] N. Bansal, C. Agarwal, and A. Nguyen (2020) SAM: the sensitivity of attribution methods to hyperparameters. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8670–8680.
[3] H. Behzadi-Khormouji and J. Oramas (2023) A protocol for evaluating model interpretation methods from visual explanations. In 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1421–1429.
[4] I. Benou and T. R. Raviv (2025) Show and tell: visually explainable deep neural nets via spatially-aware concept bottleneck models. In 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 30063–30072.
[5] A. Chattopadhay, A. Sarkar, P. Howlader, and V. N. Balasubramanian (2018) Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 839–847.
[6] H. Chefer, S. Gur, and L. Wolf (2021) Generic attention-model explainability for interpreting bi-modal and encoder-decoder transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 397–406.
[7] H. Chefer, S. Gur, and L. Wolf (2021) Transformer interpretability beyond attention visualization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 782–791.
[8] J. Chen, L. Song, M. Wainwright, and M. Jordan (2018) Learning to explain: an information-theoretic perspective on model interpretation. In Proceedings of the 35th International Conference on Machine Learning, PMLR 80, pp. 883–892.
[9] L. Chen, G. Papandreou, F. Schroff, and H. Adam (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587.
[10] L. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818.
[11] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby (2021) An image is worth 16x16 words: transformers for image recognition at scale. In International Conference on Learning Representations (ICLR).
[12] R. Draelos and L. Carin (2020) Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks. arXiv preprint arXiv:2011.08891.
[13] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman (2010) The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision 88(2), pp. 303–338.
[14] T. Fel, A. Picard, L. Bethune, T. Boissin, D. Vigouroux, J. Colin, R. Cadene, and T. Serre (2023) CRAFT: concept recursive activation factorization for explainability. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2711–2721.
[15] T. Fel, D. Vigouroux, R. Cadene, and T. Serre (2022) How good is your explanation? Algorithmic stability measures to assess the quality of explanations for deep neural networks. In 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1565–1575.
[16] R. C. Fong and A. Vedaldi (2017) Interpretable explanations of black boxes by meaningful perturbation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3429–3437.
[17] R. Fong, M. Patrick, and A. Vedaldi (2019) Understanding deep networks via extremal perturbations and smooth masks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2950–2958.
[18] B. Hariharan, P. Arbeláez, L. Bourdev, S. Maji, and J. Malik (2011) Semantic contours from inverse detectors. In 2011 International Conference on Computer Vision, pp. 991–998.
[19] S. N. Hasany, C. Petitjean, and F. Mériaudeau (2023) Seg-XRes-CAM: explaining spatially local regions in image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3733–3738.
[20] S. Hooker, D. Erhan, P. Kindermans, and B. Kim (2019) A benchmark for interpretability methods in deep neural networks. Advances in Neural Information Processing Systems 32.
[21] A. Howard, M. Sandler, G. Chu, L. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, et al. (2019) Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314–1324.
[22] J. Jain, J. Li, M. Chiu, A. Hassani, N. Orlov, and H. Shi (2023) OneFormer: one transformer to rule universal image segmentation. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2989–2998.
[23] A. Kapishnikov, T. Bolukbasi, F. Viegas, and M. Terry (2019) XRAI: better attributions through regions. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 4947–4956.
[24] B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas, et al. (2018) Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pp. 2668–2677.
[25] S. S. Kim, N. Meister, V. V. Ramaswamy, R. Fong, and O. Russakovsky (2022) HIVE: evaluating the human interpretability of visual explanations. In European Conference on Computer Vision, pp. 280–298.
[26] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W. Lo, P. Dollar, and R. Girshick (2023) Segment anything. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3992–4003.
[27] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.
[28] S. M. Lundberg and S. Lee (2017) A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30.
[29] V. Petsiuk, A. Das, and K. Saenko (2018) RISE: randomized input sampling for explanation of black-box models. In BMVC.
[30] H. G. Ramaswamy et al. (2020) Ablation-CAM: visual explanations for deep convolutional network via gradient-free localization. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 983–991.
[31] S. Rao, M. Bohle, and B. Schiele (2022) Towards better understanding attribution methods. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10213–10222.
[32] S. Rao, S. Mahajan, M. Bohle, and B. Schiele (2024) Discover-then-name: task-agnostic concept bottlenecks via automated concept discovery. In European Conference on Computer Vision (ECCV), pp. 444–461.
[33] M. T. Ribeiro, S. Singh, and C. Guestrin (2018) Anchors: high-precision model-agnostic explanations. Proceedings of the AAAI Conference on Artificial Intelligence 32(1).
[34] A. S. Ross, M. C. Hughes, and F. Doshi-Velez (2017) Right for the right reasons: training differentiable models by constraining their explanations. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pp. 2662–2670.
[35] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626.
[36] S. Srinivas and F. Fleuret (2019) Full-gradient representation for neural network visualization. Advances in Neural Information Processing Systems 32.
[37] M. Sundararajan, A. Taly, and Q. Yan (2017) Axiomatic attribution for deep networks. In International Conference on Machine Learning, pp. 3319–3328.
[38] A. Tan, F. Zhou, and H. Chen (2024) Explain via any concept: concept bottleneck model with open vocabulary concepts. In European Conference on Computer Vision (ECCV), pp. 123–138.
[39] K. Vinogradova, A. Dibrov, and G. Myers (2020) Towards interpretable semantic segmentation via gradient-weighted class activation mapping (student abstract). In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 13943–13944.
[40] B. Wang, L. Li, Y. Nakashima, and H. Nagahara (2023) Learning bottleneck concepts in image classification. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10962–10971.
[41] H. Wang, Z. Wang, M. Du, F. Yang, Z. Zhang, S. Ding, P. Mardziel, and X. Hu (2020) Score-CAM: score-weighted visual explanations for convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 24–25.
[42] W. Wang, J. Dai, Z. Chen, Z. Huang, Z. Li, X. Zhu, X. Hu, T. Lu, L. Lu, H. Li, X. Wang, and Y. Qiao (2023) InternImage: exploring large-scale vision foundation models with deformable convolutions. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14408–14419.
[43] S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, and S. Xie (2023) ConvNeXt V2: co-designing and scaling convnets with masked autoencoders. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16133–16142.
[44] C. Yeh, C. Hsieh, A. Suggala, D. I. Inouye, and P. K. Ravikumar (2019) On the (in)fidelity and sensitivity of explanations. Advances in Neural Information Processing Systems 32.
[45] M. D. Zeiler and R. Fergus (2014) Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pp. 818–833.
[46] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia (2017) Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2881–2890.
[47] W. Zhao, S. Oyama, and M. Kurihara (2020) Generating natural counterfactual visual explanations. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pp. 5204–5205.
[48] S. Zheng, J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, Y. Fu, J. Feng, T. Xiang, P. H. Torr, et al. (2021) Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6881–6890.
[49] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba (2016) Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2921–2929.

Appendix A

A.1 Additional Qualitative Cases
Failure cases are shown in Figure 4. Two recurring patterns appear. First, thin structures and weak boundaries produce under-coverage even when the target object is partially highlighted. Second, highly textured co-occurring regions can attract non-trivial activation when repeatedly reinforced by intervention responses. These failure modes explain the residual off-target emphasis in difficult scenes and motivate future work on adaptive region partitioning.

Figure 4: Representative SBD failure cases under the same comparison pipeline used for main-text figures. Residual off-target activation remains in cluttered contexts, and fine boundary detail can be missed on thin target structures.

A.2 Quantitative Details

Table 1 reports method performance as mean and standard deviation across run-level means, so each completed run contributes equally to the aggregate (11 SBD runs, 6 VOC runs). The insertion-gain column clarifies the complementary behaviour noted in the main text: broader maps score higher on insertion even when less selective under deletion and leakage diagnostics. The VOC EGA outlier in off-target absolute drop (0.982 ± 1.228) is driven by the DeepLabV3 backbone in both seeds, where per-run values reach approximately 2.565, while FCN and LRASPP runs remain much lower (~0.265 and ~0.116, respectively). This reflects a backbone-specific concentration effect rather than a single-seed anomaly.

Table 1: Completed aggregates reported as mean ± std across runs. Higher is better for TDD, insertion gain, and stability; lower is better for off-target absolute drop.

Data | Method | TDD ↑ | OT abs ↓ | Ins. ↑ | Stab. ↑
SBD | GPA | .259 ± .021 | .748 ± .736 | .281 ± .196 | .883 ± .006
SBD | EGA | .223 ± .027 | .268 ± .302 | .226 ± .199 | .978 ± .004
SBD | RIA | .453 ± .021 | .177 ± .125 | .097 ± .037 | .827 ± .010
SBD | DEA | .381 ± .030 | .235 ± .134 | .114 ± .047 | .959 ± .009
VOC | GPA | .270 ± .044 | .281 ± .133 | .123 ± .094 | .894 ± .004
VOC | EGA | .287 ± .048 | .982 ± 1.23 | .060 ± .009 | .978 ± .003
VOC | RIA | .522 ± .050 | .249 ± .184 | .031 ± .013 | .821 ± .001
VOC | DEA | .449 ± .043 | .398 ± .313 | .074 ± .017 | .961 ± .004
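For reference, a small pandas sketch of the Table 1 aggregation rule: average within each run first, then take mean ± std across the run-level means so each completed run contributes equally. The per-sample file name and column names here are hypothetical, not the released scripts.

```python
import pandas as pd

# Hypothetical per-sample results: one row per (run, sample) with columns
# ["run_id", "method", "tdd", "ot_abs", "ins", "stab"].
df = pd.read_csv("per_sample_results.csv")

# Run-level means: each completed run becomes one row per method.
run_means = df.groupby(["method", "run_id"]).mean(numeric_only=True)

# Table 1 convention: mean and std taken across run-level means.
table = run_means.groupby("method").agg(["mean", "std"])
print(table.round(3))
```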