
Paper deep dive

Interpreting CLIP with Hierarchical Sparse Autoencoders

Vladimir Zaigrajew, Hubert Baniecki, Przemyslaw Biecek

Year: 2025 · Venue: ICML 2025 · Area: Mechanistic Interp. · Type: Empirical · Embeddings: 223

Models: CLIP

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 96%

Last extracted: 3/12/2026, 6:49:27 PM

Summary

The paper introduces Matryoshka Sparse Autoencoder (MSAE), a novel architecture for interpreting vision-language models like CLIP. By applying multiple TopK operations with increasing sparsity thresholds during training, MSAE learns hierarchical representations at multiple granularities. This approach optimizes the Pareto frontier between reconstruction quality and sparsity, outperforming traditional L1-regularized and standard TopK SAEs, and enables effective concept-based interpretability and bias analysis.

Entities (5)

CLIP · model · 100%
MSAE · architecture · 100%
Sparse Autoencoders · methodology · 100%
CC3M · dataset · 95%
CelebA · dataset · 95%

Relation Signals (3)

MSAE interprets CLIP

confidence 100% · we demonstrate the utility of MSAE as a tool for interpreting and controlling CLIP

MSAE improves upon Sparse Autoencoders

confidence 95% · MSAE establishes a new state-of-the-art Pareto frontier between reconstruction quality and sparsity

MSAE used on CelebA

confidence 90% · perform concept-based similarity search and bias analysis in downstream tasks like CelebA

Cypher Suggestions (2)

Find all models interpreted by MSAE · confidence 90% · unvalidated

MATCH (a:Architecture {name: 'MSAE'})-[:INTERPRETS]->(m:Model) RETURN m.name

List datasets used for evaluation · confidence 90% · unvalidated

MATCH (a:Architecture {name: 'MSAE'})-[:EVALUATED_ON]->(d:Dataset) RETURN d.name

Abstract

Abstract: Sparse autoencoders (SAEs) are useful for detecting and steering interpretable features in neural networks, with particular potential for understanding complex multimodal representations. Given their ability to uncover interpretable features, SAEs are particularly valuable for analyzing large-scale vision-language models (e.g., CLIP and SigLIP), which are fundamental building blocks in modern systems yet remain challenging to interpret and control. However, current SAE methods are limited by optimizing both reconstruction quality and sparsity simultaneously, as they rely on either activation suppression or rigid sparsity constraints. To this end, we introduce Matryoshka SAE (MSAE), a new architecture that learns hierarchical representations at multiple granularities simultaneously, enabling a direct optimization of both metrics without compromise. MSAE establishes a new state-of-the-art Pareto frontier between reconstruction quality and sparsity for CLIP, achieving 0.99 cosine similarity and less than 0.1 fraction of variance unexplained while maintaining ~80% sparsity. Finally, we demonstrate the utility of MSAE as a tool for interpreting and controlling CLIP by extracting over 120 semantic concepts from its representation to perform concept-based similarity search and bias analysis in downstream tasks like CelebA. We make the codebase available at this https URL.

Tags

ai-safety (imported, 100%) · empirical (suggested, 88%) · mechanistic-interp (suggested, 92%)

Links


Full Text

223,200 characters extracted from source content.


Interpreting CLIP with Hierarchical Sparse Autoencoders

Vladimir Zaigrajew, Hubert Baniecki, Przemyslaw Biecek

Abstract. Sparse autoencoders (SAEs) are useful for detecting and steering interpretable features in neural networks, with particular potential for understanding complex multimodal representations. Given their ability to uncover interpretable features, SAEs are particularly valuable for analyzing vision-language models (e.g., CLIP and SigLIP), which are fundamental building blocks in modern large-scale systems yet remain challenging to interpret and control. However, current SAE methods are limited by optimizing both reconstruction quality and sparsity simultaneously, as they rely on either activation suppression or rigid sparsity constraints. To this end, we introduce Matryoshka SAE (MSAE), a new architecture that learns hierarchical representations at multiple granularities simultaneously, enabling a direct optimization of both metrics without compromise. MSAE establishes a state-of-the-art Pareto frontier between reconstruction quality and sparsity for CLIP, achieving 0.99 cosine similarity and less than 0.1 fraction of variance unexplained while maintaining 80% sparsity. Finally, we demonstrate the utility of MSAE as a tool for interpreting and controlling CLIP by extracting over 120 semantic concepts from its representation to perform concept-based similarity search and bias analysis in downstream tasks like CelebA. We make the codebase available at https://github.com/WolodjaZ/MSAE.

Keywords: sparse autoencoders, Matryoshka representation learning, interpretability, explainable AI, CLIP, SAE, MRL

Figure 1: Matryoshka Sparse Autoencoder (MSAE) enables learning hierarchical concept representations from coarse to fine-grained features while avoiding rigid sparsity constraints in TopK and the activation shrinkage problem in ReLU SAE.
(B) At training, MSAE uses multiple top-k values up to dimension d instead of a single k as in TopK SAE, combining losses across different granularities. (C) At inference, our method uses the whole d-dimensional representation. (D) MSAE allows for more precise editing and manipulation in the concept space.

1 Introduction

Vision-language models, particularly contrastive language-image pre-training (CLIP, Radford et al., 2021; Cherti et al., 2023), revolutionize multimodal understanding by learning robust representations that bridge visual and textual information. Through contrastive learning on massive datasets, CLIP and its less adopted successor SigLIP (Zhai et al., 2023) demonstrate remarkable capabilities that extend far beyond their primary objective of cross-modal similarity search. CLIP's representation is a foundational component in text-to-image generation models like Stable Diffusion (Podell et al., 2024) and serves as a powerful feature extractor for numerous downstream vision and language tasks (Shen et al., 2022), establishing CLIP as a crucial building block in modern VLMs (Liu et al., 2023; Wang et al., 2024). Despite CLIP's widespread adoption, understanding how it processes and represents information remains a challenge. The distributed nature of its learned representations and the complexity of the optimized loss function make it particularly difficult to interpret.
Traditional explainability approaches have had limited success in addressing this challenge: gradient-based feature attributions (Simonyan, 2013; Shrikumar et al., 2017; Selvaraju et al., 2017; Sundararajan et al., 2017; Abnar & Zuidema, 2020) struggle to provide human-interpretable explanations, perturbation-based approaches (Zeiler & Fergus, 2014; Ribeiro et al., 2016; Lundberg & Lee, 2017; Adebayo et al., 2018; Baniecki et al., 2025) yield inconsistent results, and concept-based methods (Ramaswamy et al., 2023; Oikarinen et al., 2023) are constrained by their reliance on manually curated concept datasets. This interpretability gap hinders our ability to identify and mitigate potential biases or failure modes of CLIP in downstream applications (Biecek & Samek, 2024). Recent advances in mechanistic interpretability (Conmy et al., 2023; Bereska & Gavves, 2024) use sparse autoencoders (SAEs) as a tool for disentangling interpretable features in neural networks (Cunningham et al., 2024). When applied to CLIP's representation space, SAEs offer the potential to decompose complex, distributed representations into human-interpretable components through self-supervised learning. This eliminates the need for manually curated concept datasets and predefined concept sets in favor of natural concept emergence. However, training effective SAEs poses unique challenges. The richness of the data distribution and the high dimensionality of CLIP's multimodal embedding space require tuning the sparsity-reconstruction trade-off (Bricken et al., 2023; Gao et al., 2025). Furthermore, evaluating SAE effectiveness extends beyond traditional metrics, requiring the discovery of interpretable features that maintain their semantic meaning across both visual and textual modalities. Current approaches for enforcing sparsity in autoencoders use either L1 (Bricken et al., 2023) or TopK (Gao et al., 2025) proxy functions, each with significant drawbacks.
L1 regularization results in activation shrinkage, systematically underestimating feature activations and potentially missing subtle but important concepts. TopK enforces a fixed number of active neurons, imposing rigid constraints that may not align with the natural concept density in different regions of CLIP's embedding space (Gao et al., 2025; Bussmann et al., 2024). To this end, we propose a hierarchical approach to sparse autoencoders, a new architecture inspired by Matryoshka representation learning (Kusupati et al., 2022), as illustrated in Figure 1. While Matryoshka SAE (MSAE) can be applied to interpret any neural network representation, we demonstrate its utility in CLIP's complex multimodal embedding space. At its core, MSAE applies TopK operations h times with progressively increasing numbers of k active neurons, learning representations at h granularities simultaneously, from coarse concepts to fine-grained features. By combining reconstruction losses across all granularity levels, MSAE achieves a more flexible and adaptive sparsity pattern. We remove the rigid constraints of plain TopK while avoiding the activation shrinkage problems associated with L1 regularization, resulting in a state-of-the-art Pareto frontier between reconstruction quality and sparsity.

Contributions. We introduce a hierarchical SAE architecture that establishes a new leading Pareto frontier between reconstruction quality (0.99 cosine similarity and <0.1 FVU) and sparsity (~80%), while maintaining computational efficiency comparable to standard SAEs at inference time. We develop a robust methodology for validating discovered concepts in CLIP's multimodal embedding space, successfully identifying and verifying over 120 interpretable concepts across both image and text domains.
Through extensive empirical evaluation on the CC3M and ImageNet datasets, we demonstrate progressive recovery capabilities and the effectiveness of hierarchical sparsity thresholds compared to existing approaches. We showcase the practical utility of MSAE in two key applications: concept-based similarity search with controllable concept strength, and systematic analysis of gender biases in downstream classification models through SAE activations and concept-level interventions on the CelebA dataset.

2 Related Work

Interpreting CLIP models. CLIP interpretability research follows two main directions: direct interpretation of CLIP's behavior and using CLIP to explain other models. Direct interpretation studies focus on understanding CLIP's components through feature attributions (Joukovsky et al., 2023; Sammani et al., 2024; Zhao et al., 2024), residual transformations (Balasubramanian et al., 2024), attention heads (Gandelsman et al., 2024), and individual neurons (Goh et al., 2021; Li et al., 2022). Li et al. (2022) discovered CLIP's tendency to focus on image backgrounds through saliency analysis, while Goh et al. (2021) identified CLIP's multimodal neurons responding consistently to concepts across modalities. For model explanation, CLIP is used to analyze challenging examples (Jain et al., 2023), robustness to distribution shifts (Crabbé et al., 2024), and to label individual neurons (Oikarinen & Weng, 2023). In this work, we explore both directions in Section 5 via the detection of semantic concepts learned by CLIP using MSAE (Section 5.1) and the analysis of biases in downstream models built on MSAE-explained CLIP embeddings (Section 5.3).

Mechanistic interpretability. Mechanistic interpretability seeks to reverse engineer neural networks analogously to decompiling computer programs (Conmy et al., 2023; Bereska & Gavves, 2024).
While early approaches focus on generating natural language descriptions of individual neurons (Hernandez et al., 2021; Bills et al., 2023), the polysemantic nature of neural representations makes this challenging. A breakthrough comes with sparse autoencoders (SAEs) (Bricken et al., 2023; Cunningham et al., 2024), which demonstrate the ability to recover monosemantic features. Recent architectural advancements like Gated (Rajamanoharan et al., 2024a) and TopK SAE variants (Gao et al., 2025) improve the sparsity–reconstruction trade-off, enabling successful application to LLMs (Templeton et al., 2024), diffusion models (Surkov et al., 2024), and medical imaging (Abdulaal et al., 2024). Recent work on SAE-based interpretation of CLIP embeddings (Rao et al., 2024) shows promise in extracting interpretable features.

Concept-based explainability. Concept-based explanations provide interpretability by identifying human-coherent concepts within neural networks' latent spaces. While early approaches relied on manually curated concept datasets (Kim et al., 2018; Zhou et al., 2018; Bykov et al., 2023), recent work has explored automated concept extraction (Ghorbani et al., 2019; Kopf et al., 2024) and explicit concept learning (Liu et al., 2020; Koh et al., 2020; Espinosa Zarlenga et al., 2022), with successful applications in out-of-distribution detection (Madeira et al., 2023), image generation (Misino et al., 2022), and medicine (Lucieri et al., 2020). However, existing methods often struggle to scale to modern transformer architectures with hundreds of millions of parameters. Our approach addresses this limitation by first training an SAE without supervision on concept learning, then efficiently mapping unit-norm decoder columns to defined vocabulary concepts using cosine similarity with CLIP embeddings.
3 Matryoshka Sparse Autoencoder

3.1 Preliminaries

Sparse autoencoders (SAEs) decompose model activations x ∈ R^n into sparse linear combinations of learned directions, aiming for interpretability and monosemanticity. The standard SAE architecture consists of:

z = ReLU(W_enc(x − b_pre) + b_enc),  x̂ = W_dec z + b_pre,  (1)

where the encoder matrix W_enc ∈ R^{n×d}, encoder bias b_enc ∈ R^d, decoder matrix W_dec ∈ R^{d×n}, and preprocessing bias b_pre ∈ R^n are the learnable parameters, with d being the dimension of the latent space. The basic reconstruction objective is L(x) := ‖x − x̂‖₂². Existing approaches established two primary sparsity mechanisms. ReLU SAE (Bricken et al., 2023) uses L1 regularization with the objective L(x) := ‖x − x̂‖₂² + λ‖z‖₁, while TopK SAE (Gao et al., 2025) enforces fixed sparsity through z = ReLU(TopK(W_enc(x − b_pre) + b_enc)).
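The two baseline forward passes in Eq. (1) can be sketched as follows. This is a toy illustration, not the paper's implementation: the dimensions and random parameters are assumptions, and `sae_forward`/`topk_mask` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 32  # input dim n, latent dim d (toy sizes; the paper uses e.g. 768 -> 6144)

# Random stand-ins for the learnable parameters, with the shapes stated above.
W_enc = rng.normal(size=(n, d)) / np.sqrt(n)  # W_enc in R^{n x d}
W_dec = rng.normal(size=(d, n)) / np.sqrt(d)  # W_dec in R^{d x n}
b_enc = np.zeros(d)
b_pre = np.zeros(n)

def topk_mask(v, k):
    """Zero out all but the k largest entries of v."""
    if k >= v.size:
        return v
    out = np.zeros_like(v)
    idx = np.argpartition(v, -k)[-k:]
    out[idx] = v[idx]
    return out

def sae_forward(x, k=None):
    """z = ReLU(TopK(W_enc(x - b_pre) + b_enc)); k=None gives the plain ReLU SAE."""
    pre = (x - b_pre) @ W_enc + b_enc
    z = np.maximum(topk_mask(pre, k) if k is not None else pre, 0.0)
    x_hat = z @ W_dec + b_pre
    return z, x_hat

x = rng.normal(size=n)
z_relu, _ = sae_forward(x)           # ReLU SAE: sparsity only via L1 during training
z_topk, x_hat = sae_forward(x, k=4)  # TopK SAE: at most 4 active latents
```

Note how TopK hard-codes the number of active latents per sample, which is exactly the rigidity MSAE relaxes.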
However, each approach faces distinct limitations: L1 regularization causes activation shrinkage (Rajamanoharan et al., 2024a), while TopK imposes rigid sparsity constraints. A recent effort to address the rigidity of TopK is BatchTopK (Bussmann et al., 2024), which replaces the standard TopK function with BatchTopK within the TopK SAE method. The BatchTopK function treats all batch activations as a single, flattened vector before applying TopK. This allows a flexible number of active features per sample, with the total number of active features across the batch averaging to k × batch size. Although BatchTopK relaxes the fixed sparsity of traditional TopK, it still relies on a predetermined k parameter that requires careful tuning, and it continues to suffer from the potential for certain features to become 'dead' or rarely activated if they consistently fall outside the TopK selection.

3.2 Matryoshka SAE Architecture

Following Matryoshka representation learning (Kusupati et al., 2022), we propose an SAE architecture that learns representations at multiple granularities simultaneously. Instead of enforcing a single sparsity threshold k or using L1 regularization, our approach applies multiple TopK operations with increasing k values, optimizing across all granularity levels. We set the k values as powers of 2, i.e. k_i = 2^i up to dimension d, which provides effective coverage of the representation space while maintaining reasonable computational costs.
For a given input x, MSAE computes h latent representations during training using a sequence of increasing k values k_1 < k_2 < … < k_h ≤ d:

z_i = ReLU(TopK_i(W_enc(x − b_pre) + b_enc)),
x̂_i = W_dec z_i + b_pre,
L(x) := Σ_{i=1}^{h} α_i ‖x − x̂_i‖₂²,  (2)

where α_i are weighting coefficients for each granularity level. At inference time, we can either apply TopK with any desired granularity or discard it entirely, leaving only ReLU, which allows the model to utilize all neurons it deems essential for reconstruction.

Hierarchical learning. The key insight of our approach is that different samples require different levels of sparsity (numbers of concepts) for an optimal representation. By simultaneously optimizing across multiple k values, MSAE learns a natural hierarchy of features. Our TopK operations maintain a nested structure where the features selected at each level form a subset of those selected at higher k values, i.e. TopK_1 ⊆ TopK_2 ⊆ … ⊆ TopK_h.
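A minimal sketch of the objective in Eq. (2), under toy random parameters (not the paper's code; `msae_loss` is a hypothetical name). A single shared ranking of the pre-activations makes the TopK supports nested by construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 8, 32
W_enc = rng.normal(size=(n, d)) / np.sqrt(n)
W_dec = rng.normal(size=(d, n)) / np.sqrt(d)
b_enc, b_pre = np.zeros(d), np.zeros(n)

def msae_loss(x, ks, weighting="UW"):
    """L(x) = sum_i alpha_i * ||x - x_hat_i||_2^2 over h granularity levels."""
    h = len(ks)
    # UW: alpha_i = 1; RW: alpha_i = h - i + 1 (1-indexed), favouring sparser levels.
    alphas = [1.0] * h if weighting == "UW" else [float(h - i) for i in range(h)]
    pre = (x - b_pre) @ W_enc + b_enc
    order = np.argsort(pre)[::-1]  # one shared ranking => nested TopK supports
    total = 0.0
    for alpha, k in zip(alphas, ks):
        z_i = np.zeros(d)
        z_i[order[:k]] = np.maximum(pre[order[:k]], 0.0)  # ReLU(TopK_i(pre))
        x_hat_i = z_i @ W_dec + b_pre                     # decode level i
        total += alpha * float(np.sum((x - x_hat_i) ** 2))
    return total

x = rng.normal(size=n)
ks = [2 ** i for i in range(1, 6)]  # k_i = 2^i: 2, 4, 8, 16, 32 = d
loss_uw = msae_loss(x, ks, "UW")
loss_rw = msae_loss(x, ks, "RW")
```

Because each level reuses the same ranking, the k_1 support is contained in the k_2 support and so on, matching TopK_1 ⊆ … ⊆ TopK_h.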
Such a hierarchical structure ensures coherence between granularity levels, where low k values capture coarse, high-level concepts while higher k values progressively enable fine-grained feature representation.

Sparsity coefficient weighting. We propose and evaluate two strategies for setting the weighting coefficients α_i. The uniform weighting (UW) approach sets α_i = 1 for all i, while the reverse weighting (RW) strategy uses α_i = h − i + 1, giving higher weights to lower k values. By weighting the loss more heavily for sparser reconstructions, RW encourages the model to learn features that maintain reconstruction quality at lower k values. As shown in Table 1, this results in improved sparsity without significant performance degradation, as the model learns that sparse representations achieve better loss even with slightly worse reconstruction quality, compared to UW, which focuses primarily on reconstruction quality.

3.3 Training and Inference

CLIP embeddings exhibit misalignment across modalities, which can impact SAE training convergence and cross-modal transferability. Following (Bhalla et al., 2024), we normalize embeddings to ensure consistent behavior across modalities. We first center embeddings by subtracting the per-modality mean estimated from the training dataset. Next, we scale the centered embeddings by a dataset-computed scaling factor following (Conerly et al., 2024) to obtain E_{x∈X}[‖x‖₂] = √n. This scaling ensures that λ has consistent effects across different CLIP architectures and modalities. For training, we compute the mean vector and scaling factor from the image modality. During inference on text embeddings, we apply the text-specific mean and scaling factor.
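The centering-and-scaling step above can be sketched as follows. The Gaussian "embeddings" are a toy stand-in for CLIP outputs, assumed for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
X_img = rng.normal(loc=3.0, size=(1000, n))  # toy stand-in for CLIP image embeddings

mu = X_img.mean(axis=0)        # per-modality mean, estimated on the training set
Xc = X_img - mu                # centering
# Scale so that the mean L2 norm equals sqrt(n), i.e. E[||x||_2] = sqrt(n).
scale = np.sqrt(n) / np.linalg.norm(Xc, axis=1).mean()
Xn = Xc * scale
```

For text embeddings, the same two statistics (mean and scale) would be recomputed on the text modality and applied at inference.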
Additionally, at inference, we remove TopK constraints from TopK-trained models, allowing the model to adaptively select the number of active features based only on ReLU activation.

4 Evaluating MSAE

In this section, we conduct extensive experiments to evaluate MSAE against ReLU and TopK SAEs. We compare the sparsity–fidelity trade-off (Section 4.2) at multiple granularity levels (Section 4.3). We then evaluate the semantic quality of learned representations beyond traditional distance metrics (Section 4.4), analyze decoder orthogonality (Section 4.5), and examine the statistical properties of SAE activation magnitudes (Section 4.6). To verify that MSAE successfully learns hierarchical features, we conduct experiments on the progressive recovery task (Section 4.7). We conclude with an ablation study comparing the influence of different training modalities in Section 4.8. In response to reviewer feedback, we have incorporated an analysis of BatchTopK models in Section 4.2; however, given their less competitive performance, we did not extend their evaluation to subsequent sections.

Setup. All SAE models are trained on the CC3M (Sharma et al., 2018) training set with (post-pooled) features from the CLIP ViT-L/14 or ViT-B/16 model. The image modality is evaluated on the ImageNet-1k training set (Russakovsky et al., 2015), while the text modality is evaluated on the CC3M validation set. Each SAE is trained with expansion rates of 8×, 16×, and 32×, scaling the latent layer from 768 to 6144, 12288, and 24576 neurons for ViT-L/14, and from 512 to 4096, 8192, and 16384 neurons for ViT-B/16. We provide further details on the implementation and hyperparameter settings in Appendix B.

Table 1: Quantitative comparison of SAE models on ImageNet-1k.
We compare the following SAEs with expansion rate 8: ReLU with varying sparsity regularization (λ), TopK and BatchTopK with 64 or 256 active neurons, and Matryoshka using uniform (UW) or reverse weighting (RW) α coefficients. Arrows (↑/↓) indicate the preferred direction of metrics. NDN values in parentheses show the dead neuron count on the training set. LP (KL) values are scaled by 10^6 for readability. Extended results for higher expansion rates and the text modality are reported in Appendix F.5.

| Model | L0 ↑ | FVU ↓ | CS ↑ | LP (KL) ↓ | LP (Acc) ↑ | CKNNA ↑ | DO ↓ | NDN ↓ |
|---|---|---|---|---|---|---|---|---|
| ReLU (λ=0.03) | .920±.008 | .185±.031 | .928±.009 | 50.5±77.1 | .977±.149 | .742±.005 | .002 | 0 (0) |
| ReLU (λ=0.003) | .649±.007 | .004±.000 | .998±.000 | 0.66±1.03 | .994±.083 | .781±.004 | .003 | 0 (0) |
| TopK (k=64) | .950±.009 | .172±.026 | .912±.013 | 60.1±90.8 | .930±.255 | .762±.004 | .002 | 0 (335) |
| TopK (k=256) | .900±.004 | .011±.003 | .994±.002 | 2.71±5.40 | .987±.114 | .874±.003 | .003 | 0 (296) |
| BatchTopK (k=64) | .877±.012 | .162±.022 | .917±.011 | 56.9±85.8 | .931±.253 | .769±.004 | .002 | 0 (1477) |
| BatchTopK (k=256) | .882±.005 | .010±.005 | .995±.002 | 2.42±5.12 | .988±.108 | .860±.003 | .002 | 3 (919) |
| Matryoshka (RW) | .829±.008 | .007±.003 | .997±.002 | 3.13±7.08 | .987±.115 | .809±.002 | .002 | 2 (4) |
| Matryoshka (UW) | .748±.006 | .002±.001 | .999±.000 | 0.35±0.82 | .995±.070 | .848±.003 | .002 | 0 (22) |

4.1 Evaluation Metrics

Here, we briefly define each metric used to evaluate SAEs. L0 denotes the mean proportion of zero elements in SAE activations. Fraction of variance unexplained (FVU), also known as Normalized MSE (Gao et al., 2025), measures reconstruction fidelity by normalizing the mean squared reconstruction error L(x) by the mean squared value of the (mean-centered) input. Explained variance ratio (EVR) is FVU's complement, defined as 1 − FVU. Linear probing (LP) assesses how well the SAE preserves semantic information in the reconstructed embeddings on a downstream task. To evaluate this, we train a linear probe on ImageNet-1k using CLIP embeddings as a backbone, with the AdamW optimizer (lr=1e-3), a ReduceLROnPlateau scheduler, and batch_size=256. We measure performance by comparing predictions from original versus reconstructed embeddings using two metrics: Kullback-Leibler divergence (KL) between predicted class distributions, and classification accuracy (Acc), where accuracy uses argmax predictions from original embeddings as targets. Centered kernel nearest neighbor alignment (CKNNA) (Huh et al., 2024) measures kernel alignment based on mutual nearest neighbors, providing a quantitative assessment of alignment between SAE activations and input embeddings. A detailed explanation is provided in Appendix E. Decoder orthogonality (DO) calculates the mean cosine similarity of the lower triangular portion of the SAE decoder similarity matrix, where 0 indicates perfect orthogonality. This metric assesses how orthogonal the monosemantic feature directions are in the decoder.
Number of dead neurons (NDN) measures how many neurons remain consistently inactive (zero in the SAE activation layer) across all inputs during training or evaluation, indicating the network's inability to fully utilize its capacity for learning semantic features.

Figure 2: Comparison of sparsity–fidelity trade-offs across SAE architectures on ImageNet-1k. Each model presents results from all 3 expansion rates, comparing ReLU SAE (λ = 0.03, 0.01, 0.003), TopK SAE (k = 64, 128, 256), BatchTopK SAE (k = 64, 128, 256), and MSAE (RW, UW). The optimal SAE would occupy the upper right corner, achieving both high sparsity and reconstruction fidelity. For extended results across both modalities, refer to Figure 11.

4.2 Sparsity–Fidelity Trade-off

We assess SAE performance using the sparsity–fidelity trade-off, measuring sparsity with L0 and reconstruction quality with EVR, following previous work. Figure 2 reveals that ReLU SAE has difficulty balancing performance, achieving either high fidelity with low L0 or the opposite, with the expansion rate primarily improving sparsity. TopK SAE with higher k values achieves better but not ReLU-level fidelity while offering improved sparsity, yet consistently suffers from at least 5% dead neurons (Table 1). While the BatchTopK variant performs similarly to TopK, it exhibits higher fidelity but lower sparsity when trained with the same k. Both variants of MSAE achieve better sparsity than ReLU and better fidelity than TopK or BatchTopK, establishing a superior Pareto frontier while maintaining less than 1% dead neurons. The RW variant further improves sparsity as expected, with only minor fidelity degradation.
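The simpler metrics from Section 4.1 (L0 sparsity, FVU with its EVR complement, and DO) can be sketched directly; the random data below is purely illustrative and the function names are assumptions:

```python
import numpy as np

def l0_sparsity(Z):
    """Mean proportion of zero elements in SAE activations (higher = sparser)."""
    return float((Z == 0).mean())

def fvu(X, X_hat):
    """Fraction of variance unexplained: MSE normalized by mean-centered input variance."""
    mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    var = np.mean(np.sum((X - X.mean(axis=0)) ** 2, axis=1))
    return float(mse / var)

def decoder_orthogonality(W_dec):
    """Mean cosine similarity over the lower triangle of decoder-direction similarities."""
    U = W_dec / np.linalg.norm(W_dec, axis=1, keepdims=True)
    G = U @ U.T
    return float(G[np.tril_indices_from(G, k=-1)].mean())

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 8))
X_hat = X + 0.1 * rng.normal(size=X.shape)           # a decent "reconstruction"
Z = rng.normal(size=(100, 32)) * (rng.random((100, 32)) < 0.1)
evr = 1.0 - fvu(X, X_hat)                            # EVR is FVU's complement
```

A perfectly orthogonal decoder (e.g. the identity matrix) gives DO = 0, matching the definition above.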
Notably, only Matryoshka consistently improves on both metrics with higher expansion rates, while TopK struggles with reconstruction, BatchTopK with sparsity, and ReLU shows improvements only at the highest λ as the expansion rate increases. As an ablation, we evaluate cosine similarity as an alternative reconstruction metric, motivated by observations that SAEs primarily struggle with reconstructing embedding magnitudes and that CLIP embeddings are commonly L2-normalized. Results in Appendix F.1 show consistent findings, with MSAE showing even clearer advantages through stable, low-variance performance across both modalities.

4.3 Ablation: Matryoshka at Lower Granularity Levels

We train both MSAE variants (RW and UW) on two granularities [128, 256] and compare them against TopK with k = 128 and k = 256 to analyze MSAE behavior at lower granularity levels. Figure 3 shows that Matryoshka achieves sparsity comparable to at least the lower TopK variant while maintaining CKNNA and EVR performance comparable to the best TopK variant, and even better with MSAE RW. This demonstrates that even at small granularity, MSAE maintains or improves the Pareto frontier over TopK across various metrics, with RW achieving better trade-offs. As also observed in Section 4.2, MSAE's performance advantages over TopK increase at higher expansion rates.

Figure 3: Low Granularity Level Matryoshka vs. TopK SAE on ImageNet-1k. We report FVU (left) and CKNNA (right) metrics for two TopK variants (k = 128, 256) and Matryoshka trained on these granularities in RW and UW variants at expansion rates 8 and 16. Even at this small granularity, MSAE improves the Pareto frontier relative to both TopK variants, pushing it further as the expansion rate grows from 8 to 16. For extended results across other metrics, refer to Figure 13.
4.4 Semantic Preservation Analysis In Section 4.2, we evaluated SAEs only with L0 for activation sparsity and EVR for reconstruction fidelity; however, these metrics have limitations. L0 only counts active neurons without assessing how well SAE representations align with the original embeddings, and EVR focuses solely on distance reconstruction rather than semantic preservation. To address these limitations, we introduce additional metrics. Following Yu et al. (2025), we adopt the CKNNA metric to assess how well SAE activations preserve the neighborhood structure of CLIP embeddings. We also evaluate semantic preservation through linear probing metrics (Gao et al., 2025; Lieberum et al., 2024): LP (KL) measures prediction distribution alignment, and LP (Acc) compares classification accuracy. All metrics are defined in Section 4.1 and presented in Table 1. Our analysis reveals that while cosine similarity and FVU correlate well with the linear probing metrics, the alignment metric demonstrates Matryoshka's strength in preserving semantic structure. 4.5 Orthogonality of SAE Features SAEs can disentangle polysemantic representations into monosemantic features, as shown and explained by Bricken et al. (2023). To evaluate feature monosemanticity, we measure decoder orthogonality using the DO metric, with results reported in Table 1. While all methods achieve high orthogonality, as indicated by low DO values, none reach perfect orthogonality. This might stem from multiple factors, including feature absorption, as noted by Chanin et al. (2025), or simply from learning similar concepts (such as different numbers). We argue that understanding these sources of non-orthogonality is crucial for advancing the development of more effective monosemantic feature learning in SAEs. Figure 4: Distribution of non-zero SAE activations on the ImageNet-1k validation set.
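The exact DO definition lives in the paper's Section 4.1, which is not reproduced here. As a stand-in, one common decoder-orthogonality proxy averages the absolute pairwise cosine similarities between decoder columns, so 0 means perfectly orthogonal features; this is an assumption, not necessarily the paper's formula:

```python
import numpy as np

def decoder_orthogonality(W_dec):
    """Orthogonality proxy (assumed variant, not the paper's exact DO):
    mean absolute cosine similarity between distinct decoder columns."""
    W = W_dec / np.linalg.norm(W_dec, axis=0, keepdims=True)
    G = np.abs(W.T @ W)           # pairwise |cosine similarity|, shape (m, m)
    np.fill_diagonal(G, 0.0)
    m = G.shape[0]
    return float(G.sum() / (m * (m - 1)))
```

Under this proxy, an overcomplete dictionary (m > d) can never reach 0, since more than d directions cannot be mutually orthogonal in d dimensions, which is one benign reason none of the methods reach perfect orthogonality.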
Frequency histograms for ReLU (λ = 0.003), TopK (k = 32), and Matryoshka (RW) models at expansion rate 8. Matryoshka models exhibit a double-curvature distribution similar to ReLU models but without activation shrinkage, while TopK shows this pattern only at higher k values, as can be seen in the extended Figure 14. Extended results for higher expansion rates are reported in Figure 15. 4.6 Activation Magnitudes Analysis To analyze the impact of sparsity proxies on SAEs, we examine non-zero activation distributions for ViT-L at expansion rate 8 in Figure 4. Matryoshka models display a distinctive double-curvature distribution similar to ReLU-based models, with values between 5 and 10 appearing almost linear in log10 space. Following Templeton et al. (2024), we attribute low activations to reconstruction purposes rather than semantic meaning. The second curvature reflects the complexity of natural images, which requires reconstructing multiple concepts rather than single dominant features, as evidenced by the small number of very high values corresponding to rare, nearly single-concept images (Figure 10). As the sparsity parameter k in TopK methods increases (Figure 14), the transition from single- to double-curvature behavior suggests that stronger sparsity constraints create composite features, supported by Appendix C, which shows that high-activation features (> 15) in TopK methods have a lower ratio of valid named features compared to Matryoshka. Figure 5: Progressive recovery performance on ImageNet-1k. We report FVU (left) and CKNNA (right) for different SAE architectures with expansion rate 8 as functions of the top-k value (by magnitude of SAE activations) used during inference.
SAEs trained with TopK variants (k = 32, 64) show performance plateaus beyond their training thresholds, while ReLU-based models (λ = 0.001, 0.003) and Matryoshka variants (UW and RW) demonstrate continuous improvement. Extended results for higher expansion rates and across other metrics are reported in Figures 16 & 17.

Table 2: Training modality influence on MSAE performance. We train MSAE on the text version of the CC3M train set and compare it to models trained on its original image version, evaluating across both domains using the CC3M validation text set and ImageNet-1k. While models perform best on their training modality, text-trained variants show better cross-domain generalization. NDN shows the dead neuron count from the final checkpoint.

| MSAE variant | L0 ↑ (CC3M) | FVU ↓ (CC3M) | CS ↑ (CC3M) | CKNNA ↑ (CC3M) | L0 ↑ (IN-1k) | FVU ↓ (IN-1k) | CS ↑ (IN-1k) | CKNNA ↑ (IN-1k) | NDN ↓ |
| Image (RW) | .824±.029 | .060±.052 | .971±.026 | .775±.001 | .829±.008 | .007±.003 | .997±.002 | .809±.002 | 4 |
| Image (UW) | .755±.024 | .026±.027 | .988±.012 | .790±.002 | .748±.006 | .002±.001 | .999±.000 | .848±.003 | 22 |
| Text (RW) | .841±.014 | .008±.003 | .996±.002 | .782±.008 | .841±.014 | .008±.003 | .996±.002 | .782±.008 | 0 |
| Text (UW) | .791±.010 | .001±.001 | .999±.000 | .784±.007 | .799±.012 | .015±.013 | .993±.006 | .877±.003 | 0 |

4.7 Progressive Recovery To verify that our method learns hierarchical structure, we perform a progressive reconstruction task: we use an increasing number of SAE activations, ordered by magnitude, to recover the original vector. Figure 5 shows that reconstruction quality improves with decreasing sparsity thresholds (increasing k) during inference. TopK variants exhibit performance plateaus shortly after their training thresholds (k = 32, 64), while ReLU-based models show continued improvement but with inferior performance at higher sparsity. MSAE demonstrates a better hierarchical structure that combines TopK's efficient high-sparsity performance with ReLU's scaling capabilities. While our method performs slightly below TopK (k = 32) at the highest sparsity, it quickly surpasses TopK's plateau at lower sparsity, achieving performance levels above ReLU models.
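The progressive reconstruction task can be sketched directly: at inference, keep only each input's top-k activations by magnitude, decode, and record FVU as k grows. Names and shapes below are ours:

```python
import numpy as np

def progressive_fvu(x, z, W_dec, b_pre, ks=(8, 16, 32, 64, 128)):
    """Progressive recovery (sketch): reconstruct from only the top-k
    activations per input and track FVU as a function of k."""
    total = ((x - x.mean(axis=0)) ** 2).sum()
    curve = {}
    for k in ks:
        a = np.abs(z)
        thresh = np.partition(a, -k, axis=1)[:, -k][:, None]
        z_k = np.where(a >= thresh, z, 0.0)   # keep each row's k largest |activations|
        x_hat = z_k @ W_dec + b_pre
        curve[k] = float(((x - x_hat) ** 2).sum() / total)
    return curve
```

A plateau in this curve past the training k, as observed for TopK SAEs, indicates that the remaining lower-magnitude activations carry little reconstruction signal; continued descent, as for MSAE and ReLU, indicates a usable hierarchy of features.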
We observe similar patterns in the CKNNA alignment metric, with MSAE outperforming both TopK (k = 32) and ReLU models beyond k = 10 while performing only slightly below TopK (k = 256) at the lowest sparsity. Evidence of improved hierarchical feature learning across metrics and modalities is presented in Appendix F.4. 4.8 Training Modality: Language and Vision We evaluate how training modality affects MSAE performance by comparing models trained on text with the original image-trained models, validating both across the text and image domains in Table 2. While both variants perform best in their training domains, text-trained models achieve superior cross-modal performance, demonstrating stronger generalization. Moreover, text-trained models achieve higher sparsity on both modalities with no dead neurons, showing better utilization of learned features. These findings position text training as the preferred approach for multi-modal applications where balanced performance is desired. Future research could explore training SAEs on varying ratios of text and image data to optimize cross-modal performance, or train crosscoders (Lindsey et al., 2024) on both modalities simultaneously. We defer extended MSAE evaluations to Appendix F. 5 Interpreting CLIP with MSAE In this section, we demonstrate how MSAE can enhance interpretability and control interpretable features in CLIP-based applications. We first establish neuron-concept mappings in the activation layer through an automated technique described in Section 5.1. Then, we show its effectiveness in concept-based similarity search across the ImageNet validation set, enabling retrieval of images with varying degrees of explicit concept presence. Moreover, we leverage MSAE to study potential conceptual biases in a gender classification model trained on the CelebA dataset (Liu et al., 2015).
5.1 Concept Naming While self-supervised training of an SAE enables learning up to d monosemantic concepts, mapping these concepts to specific neurons remains non-trivial. Previous work used LLMs for identifying neuron-encoded concepts (Bills et al., 2023), but we adopt the more efficient method for CLIP-trained SAEs proposed by Rao et al. (2024), which leverages CLIP's representation space. Our concept detection and validation methodology is detailed in Appendix A, with comprehensive results on valid concept counts across SAE models presented in Table 3. Figure 8 demonstrates the highest-activating text and image examples for the best-matched feature concept ‘face’ across ReLU, TopK, and MSAE. The consistent concept presence across diverse inputs, observed primarily in MSAE variants, suggests that only Matryoshka-based methods were capable of learning this monosemantic feature. Supplementary analysis of highly activated concept examples in Appendix G.1 showcases the SAE's ability to learn a wide range of concepts, from simple textures and colors to more complex ones like light (lights in darkness), countable concepts like trio (groups of three), and even nationality-related concepts like ireland or germany. Naming SAE features enables conducting diverse interpretability analyses related to CLIP. We present two use cases where we apply the MSAE RW variant with an expansion rate of 8. Figure 6: Impact of concept manipulation on gender classification. By increasing concept magnitudes (bearded, glasses, blonde) in SAE space and mapping back to CLIP space, we observe changes in gender classification probabilities. Results reveal the model's learned gender associations through plateauing effects: bearded and glasses bias the model toward male classification, while blonde biases it toward female. Figure 7: Nearest neighbor analysis with an enhanced germany concept.
By increasing the magnitude of the germany concept in SAE space (from 0.3 to 20, then 30) and mapping back to CLIP space, we observe shifts in the nearest neighbors. While the input image remains the top match (with increasing distance), the second-nearest neighbor changes from a British police vehicle (shown in Figure 21) to a German one. 5.2 Similarity Search CLIP embeddings are widely used for cross-modal similarity search between images and text through the cosine similarity metric, primarily in retrieval engines. We extend this capability using SAE in three ways. First, SAE provides interpretable insights into nearest-neighbor (NN) image retrievals. Figure 21 shows the top 8 concepts for the two closest retrieved images, revealing shared semantic concept patterns and explaining why both NNs match the query image of an Irish police vehicle, with the first NN (Irish police vehicle) being closer than the second (British police vehicle). Second, we compare similarity search in the CLIP embedding space against the SAE activation space using Manhattan distance (detailed and visualized in Appendix G.2). While the first NN remains consistent across both spaces, the second NN in the SAE space shows the same vehicle type from a different angle, demonstrating that similarity searches can be done in both spaces while SAE enables additional concept-based interpretability. Finally, we demonstrate a controlled similarity search by manipulating concept magnitudes. In Figure 7, increasing the germany concept strength preserves the original image as the top match but shifts the second NN from an Irish to a German police vehicle, while preserving the overall input image structure. The increasing distances from the original image embedding show how larger magnitude adjustments affect embedding coherence. 5.3 Bias Validation on a Downstream Task CLIP models are commonly used as feature extractors for downstream tasks, enabling efficient fine-tuning with limited data.
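The concept manipulations used in the controlled similarity search (and again for bias probing in Section 5.3) amount to overwriting one named neuron's activation in SAE space and decoding back to CLIP space. A minimal sketch with a toy cosine nearest-neighbor search; `W_dec` and `b_pre` stand in for the trained SAE decoder, and the bank of embeddings is hypothetical:

```python
import numpy as np

def cosine_nn(query, bank, topk=2):
    """Indices of the top-k rows of `bank` by cosine similarity to `query`."""
    q = query / np.linalg.norm(query)
    B = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return np.argsort(-(B @ q))[:topk]

def steer_concept(z, neuron, magnitude, W_dec, b_pre):
    """Set one SAE neuron's activation to `magnitude` and decode back to
    CLIP space (sketch; W_dec, b_pre are the trained SAE decoder weights)."""
    z = z.copy()
    z[neuron] = magnitude
    return z @ W_dec + b_pre
```

With a toy identity decoder, a mild activation leaves the original embedding as the top match, while a strong boost (e.g. the 0.3 → 20 push used for germany) pulls retrievals toward bank entries aligned with the boosted concept direction.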
With MSAE, we can investigate whether downstream models learn to associate specific concepts with classes. To demonstrate this, we train a single-layer classifier on CLIP embeddings from the CelebA dataset to perform binary gender classification (1 for female, 0 for male), achieving an F1 score of approximately 0.99. Through statistical analysis in Appendix G.3, we uncover several concept-gender associations: bearded biases toward male classification, blonde toward female, and glasses shows a modest male bias. To validate these findings, Figure 6 demonstrates how increasing these concepts' magnitudes affects classification scores for a female example (see Figure 23 for a male example). The results confirm our statistical analysis, and the plateaus in classification probabilities as concept magnitudes increase help quantify the strength of concept-gender associations in the model. Figure 8: Comparison of top-activating examples for the ‘face’ concept across SAE methods. Through an automated interpretability process, we identified the best-matching SAE neuron for the face concept, quantified by a similarity score and a validation status (Valid or Invalid). For this neuron, we then identified the highest-activated examples in both text and image modalities across TopK, ReLU, and MSAE, each presented in variants optimized for either sparsity or reconstruction. Both text and image examples with the highest activation values strongly confirm the concept's presence and demonstrate that only the MSAE variants learned the concept of the face. We show additional examples of valid concepts with their top-activating examples in Figure 19. 6 Conclusion We propose Matryoshka SAE to advance our understanding of CLIP embeddings through hierarchical sparse autoencoders. MSAE improves upon both TopK and ReLU approaches, achieving a superior sparsity–fidelity trade-off while providing flexible sparsity control via the α coefficient.
Our experiments demonstrate MSAE's effectiveness through near-optimal metrics, progressive feature recovery, and the extraction of over 120 validated concepts, enabling new applications in concept-based similarity search and bias detection in downstream tasks. Limitations and future work. MSAE faces three limitations with clear paths for future improvement. The current implementation's use of multiple decoder passes with different TopK activations introduces computational overhead, which could be addressed through optimized CUDA kernels enabling parallel processing of multiple granularities. While we demonstrated MSAE's effectiveness using CLIP embeddings, it has great potential to explain hierarchical representations in other embedding spaces, such as SigLIP (Zhai et al., 2023) or modality-specific representations. Finally, since not all neurons correspond to simple concepts in our vocabulary, investigating complex semantic features through LLM-based interpretability methods could provide deeper insights into the learned hierarchical representations. In concurrent independent work, Bussmann et al. (2025) propose an MSAE approach to interpreting language models. Impact Statement This paper presents work whose goal is to advance the field of Machine Learning Interpretability. There are some potential societal consequences of our work, none of which we feel must be specifically highlighted here. Acknowledgement Work on this project was financially supported by the SONATA BIS grant 2019/34/E/ST6/00052 funded by the Polish National Science Centre (NCN). We also thank the anonymous reviewers for their useful comments. References Abdulaal et al. (2024) Abdulaal, A., Fry, H., Montaña-Brown, N., Ijishakin, A., Gao, J., Hyland, S., Alexander, D. C., and Castro, D. C. An x-ray is worth 15 features: Sparse autoencoders for interpretable radiology report generation. arXiv preprint arXiv:2410.03334, 2024. Abnar & Zuidema (2020) Abnar, S. and Zuidema, W.
Quantifying attention flow in transformers. In ACL, 2020. Adebayo et al. (2018) Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., and Kim, B. Sanity checks for saliency maps. In NeurIPS, 2018. Balasubramanian et al. (2024) Balasubramanian, S., Basu, S., and Feizi, S. Decomposing and interpreting image representations via text in ViTs beyond CLIP. In NeurIPS, 2024. Baniecki et al. (2025) Baniecki, H., Casalicchio, G., Bischl, B., and Biecek, P. Efficient and accurate explanation estimation with distribution compression. In ICLR, 2025. Bereska & Gavves (2024) Bereska, L. and Gavves, E. Mechanistic interpretability for AI safety–A review. Transactions on Machine Learning Research, 2024. Bhalla et al. (2024) Bhalla, U., Oesterling, A., Srinivas, S., Calmon, F., and Lakkaraju, H. Interpreting CLIP with sparse linear concept embeddings (SpLiCE). In NeurIPS, 2024. Biecek & Samek (2024) Biecek, P. and Samek, W. Position: Explain to question not to justify. In ICML, 2024. Bills et al. (2023) Bills, S., Cammarata, N., Mossing, D., Tillman, H., Gao, L., Goh, G., Sutskever, I., Leike, J., Wu, J., and Saunders, W. Language models can explain neurons in language models. https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html, May 2023. Bricken et al. (2023) Bricken, T., Templeton, A., Batson, J., Chen, B., Jermyn, A., Conerly, T., Turner, N. L., Anil, C., Denison, C., Askell, A., Lasenby, R., Wu, Y., Kravec, S., Schiefer, N., Maxwell, T., Joseph, N., Tamkin, A., Nguyen, K., McLean, B., Burke, J. E., Hume, T., Carter, S., Henighan, T., and Olah, C. Towards monosemanticity: Decomposing language models with dictionary learning. https://transformer-circuits.pub/2023/monosemantic-features/index.html, October 2023. Transformer Circuits Thread. Bussmann et al. (2024) Bussmann, B., Leask, P., and Nanda, N. BatchTopK sparse autoencoders. arXiv preprint arXiv:2412.06410, 2024. Bussmann et al. 
(2025) Bussmann, B., Nabeshima, N., Karvonen, A., and Nanda, N. Learning multi-level features with matryoshka sparse autoencoders. arXiv preprint arXiv:2503.17547, 2025. Bykov et al. (2023) Bykov, K., Kopf, L., Nakajima, S., Kloft, M., and Höhne, M. M. Labeling neural representations with inverse recognition. In NeurIPS, 2023. Chanin et al. (2025) Chanin, D., Wilken-Smith, J., Dulka, T., Bhatnagar, H., and Bloom, J. A is for absorption: Studying feature splitting and absorption in sparse autoencoders. In ICLR, 2025. Cherti et al. (2023) Cherti, M., Beaumont, R., Wightman, R., Wortsman, M., Ilharco, G., Gordon, C., Schuhmann, C., Schmidt, L., and Jitsev, J. Reproducible scaling laws for contrastive language-image learning. In CVPR, 2023. Conerly et al. (2024) Conerly, T., Templeton, A., Bricken, T., Marcus, J., and Henighan, T. Update on how we train saes. https://transformer-circuits.pub/2024/april-update/index.html, April 2024. Transformer Circuits Thread. Conmy et al. (2023) Conmy, A., Mavor-Parker, A., Lynch, A., Heimersheim, S., and Garriga-Alonso, A. Towards automated circuit discovery for mechanistic interpretability. In NeurIPS, 2023. Crabbé et al. (2024) Crabbé, J., Rodriguez, P., Shankar, V., Zappella, L., and Blaas, A. Interpreting CLIP: Insights on the robustness to imagenet distribution shifts. Transactions on Machine Learning Research, 2024. ISSN 2835-8856. Cunningham et al. (2024) Cunningham, H., Ewart, A., Riggs, L., Huben, R., and Sharkey, L. Sparse autoencoders find highly interpretable features in language models. In ICLR, 2024. Espinosa Zarlenga et al. (2022) Espinosa Zarlenga, M., Barbiero, P., Ciravegna, G., Marra, G., Giannini, F., Diligenti, M., Shams, Z., Precioso, F., Melacci, S., Weller, A., et al. Concept embedding models: Beyond the accuracy-explainability trade-off. In NeurIPS, 2022. Gandelsman et al. (2024) Gandelsman, Y., Efros, A. A., and Steinhardt, J. Interpreting CLIP’s image representation via text-based decomposition. 
In ICLR, 2024. Gao et al. (2025) Gao, L., la Tour, T. D., Tillman, H., Goh, G., Troll, R., Radford, A., Sutskever, I., Leike, J., and Wu, J. Scaling and evaluating sparse autoencoders. In ICLR, 2025. Gemma Team (2024) Gemma Team, Google DeepMind. Gemma 2: Improving open language models at a practical size. arXiv preprint arXiv:2408.00118, 2024. Ghorbani et al. (2019) Ghorbani, A., Wexler, J., Zou, J. Y., and Kim, B. Towards automatic concept-based explanations. In NeurIPS, 2019. Goh et al. (2021) Goh, G., Cammarata, N., Voss, C., Carter, S., Petrov, M., Schubert, L., Radford, A., and Olah, C. Multimodal neurons in artificial neural networks. Distill, 6(3):e30, 2021. Hernandez et al. (2021) Hernandez, E., Schwettmann, S., Bau, D., Bagashvili, T., Torralba, A., and Andreas, J. Natural language descriptions of deep visual features. In ICLR, 2021. Huh et al. (2024) Huh, M., Cheung, B., Wang, T., and Isola, P. The platonic representation hypothesis. In ICML, 2024. Jain et al. (2023) Jain, S., Lawrence, H., Moitra, A., and Madry, A. Distilling model failures as directions in latent space. In ICLR, 2023. Joukovsky et al. (2023) Joukovsky, B., Sammani, F., and Deligiannis, N. Model-agnostic visual explanations via approximate bilinear models. In ICIP, 2023. Kim et al. (2018) Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F., et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In ICML, 2018. Koh et al. (2020) Koh, P. W., Nguyen, T., Tang, Y. S., Mussmann, S., Pierson, E., Kim, B., and Liang, P. Concept bottleneck models. In ICML, 2020. Kopf et al. (2024) Kopf, L., Bommer, P. L., Hedström, A., Lapuschkin, S., Höhne, M. M., and Bykov, K. Cosy: Evaluating textual explanations of neurons. In NeurIPS, 2024. Kornblith et al. (2019) Kornblith, S., Norouzi, M., Lee, H., and Hinton, G. Similarity of neural network representations revisited. In ICML, 2019. Kusupati et al.
(2022) Kusupati, A., Bhatt, G., Rege, A., Wallingford, M., Sinha, A., Ramanujan, V., Howard-Snyder, W., Chen, K., Kakade, S., Jain, P., et al. Matryoshka representation learning. In NeurIPS, 2022. Li et al. (2022) Li, Y., Wang, H., Duan, Y., Xu, H., and Li, X. Exploring visual interpretability for contrastive language-image pre-training. arXiv preprint arXiv:2209.07046, 2022. Lieberum et al. (2024) Lieberum, T., Rajamanoharan, S., Conmy, A., Smith, L., Sonnerat, N., Varma, V., Kramár, J., Dragan, A., Shah, R., and Nanda, N. Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2. arXiv preprint arXiv:2408.05147, 2024. Lindsey et al. (2024) Lindsey, J., Templeton, A., Marcus, J., Conerly, T., Batson, J., and Olah, C. Sparse crosscoders for cross-layer features and model diffing. https://transformer-circuits.pub/2024/crosscoders/index.html, October 2024. Transformer Circuits Thread. Liu et al. (2023) Liu, H., Li, C., Wu, Q., and Lee, Y. J. Visual instruction tuning. In NeurIPS, 2023. Liu et al. (2020) Liu, Y., Zhang, X., Zhang, S., and He, X. Part-aware prototype network for few-shot semantic segmentation. In ECCV, 2020. Liu et al. (2015) Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In ICCV, 2015. Lucieri et al. (2020) Lucieri, A., Bajwa, M. N., Braun, S. A., Malik, M. I., Dengel, A., and Ahmed, S. On interpretability of deep learning based skin lesion classifiers using concept activation vectors. In IJCNN, 2020. Lundberg & Lee (2017) Lundberg, S. and Lee, S.-I. A unified approach to interpreting model predictions. In NeurIPS, 2017. Madeira et al. (2023) Madeira, P., Carreiro, A., Gaudio, A., Rosado, L., Soares, F., and Smailagic, A. ZEBRA: Explaining rare cases through outlying interpretable concepts. In CVPR, 2023. Misino et al. (2022) Misino, E., Marra, G., and Sansone, E. VAEL: Bridging variational autoencoders and probabilistic logic programming. In NeurIPS, 2022. Oikarinen & Weng (2023) Oikarinen, T. 
and Weng, T.-W. CLIP-Dissect: Automatic description of neuron representations in deep vision networks. In ICLR, 2023. Oikarinen et al. (2023) Oikarinen, T., Das, S., Nguyen, L. M., and Weng, T.-W. Label-free concept bottleneck models. In ICLR, 2023. Paulo & Belrose (2025) Paulo, G. and Belrose, N. Sparse autoencoders trained on the same data learn different features. arXiv preprint arXiv:2501.16615, 2025. Podell et al. (2024) Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., and Rombach, R. SDXL: Improving latent diffusion models for high-resolution image synthesis. In ICLR, 2024. Radford et al. (2021) Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. Learning transferable visual models from natural language supervision. In ICML, 2021. Rajamanoharan et al. (2024a) Rajamanoharan, S., Conmy, A., Smith, L., Lieberum, T., Varma, V., Kramár, J., Shah, R., and Nanda, N. Improving dictionary learning with gated sparse autoencoders. arXiv preprint arXiv:2404.16014, 2024a. Rajamanoharan et al. (2024b) Rajamanoharan, S., Lieberum, T., Sonnerat, N., Conmy, A., Varma, V., Kramár, J., and Nanda, N. Jumping ahead: Improving reconstruction fidelity with JumpReLU sparse autoencoders. arXiv preprint arXiv:2407.14435, 2024b. Ramaswamy et al. (2023) Ramaswamy, V. V., Kim, S. S. Y., Fong, R., and Russakovsky, O. Overlooked factors in concept-based explanations: Dataset choice, concept learnability, and human capability. In CVPR, 2023. Rao et al. (2024) Rao, S., Mahajan, S., Böhle, M., and Schiele, B. Discover-then-name: Task-agnostic concept bottlenecks via automated concept discovery. In ECCV, 2024. Ribeiro et al. (2016) Ribeiro, M. T., Singh, S., and Guestrin, C. “Why should I trust you?”: Explaining the predictions of any classifier. In KDD, 2016. Russakovsky et al. 
(2015) Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115:211–252, 2015. Sammani et al. (2024) Sammani, F., Joukovsky, B., and Deligiannis, N. Visualizing and understanding contrastive learning. IEEE Transactions on Image Processing, 33:541–555, 2024. Schuhmann et al. (2021) Schuhmann, C., Vencu, R., Beaumont, R., Kaczmarczyk, R., Mullis, C., Katta, A., Coombes, T., Jitsev, J., and Komatsuzaki, A. LAION-400M: Open dataset of CLIP-filtered 400 million image-text pairs. arXiv preprint arXiv:2111.02114, 2021. Selvaraju et al. (2017) Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In ICCV, 2017. Sharma et al. (2018) Sharma, P., Ding, N., Goodman, S., and Soricut, R. Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning. In ACL, 2018. Shen et al. (2022) Shen, S., Li, L. H., Tan, H., Bansal, M., Rohrbach, A., Chang, K.-W., Yao, Z., and Keutzer, K. How much can CLIP benefit vision-and-language tasks? In ICLR, 2022. Shrikumar et al. (2017) Shrikumar, A., Greenside, P., Shcherbina, A., and Kundaje, A. Not just a black box: Learning important features through propagating activation differences. In ICML, 2017. Simonyan (2013) Simonyan, K. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Sundararajan et al. (2017) Sundararajan, M., Taly, A., and Yan, Q. Axiomatic attribution for deep networks. In ICML, 2017. Surkov et al. (2024) Surkov, V., Wendler, C., Terekhov, M., Deschenaux, J., West, R., and Gulcehre, C. Unpacking sdxl turbo: Interpreting text-to-image models with sparse autoencoders. arXiv preprint arXiv:2410.22366, 2024. Templeton et al. 
(2024) Templeton, A., Conerly, T., Marcus, J., Lindsey, J., Bricken, T., Chen, B., Pearce, A., Citro, C., Ameisen, E., Jones, A., Cunningham, H., Turner, N. L., McDougall, C., MacDiarmid, M., Tamkin, A., Durmus, E., Hume, T., Mosconi, F., Freeman, C. D., Sumers, T. R., Rees, E., Batson, J., Jermyn, A., Carter, S., Olah, C., and Henighan, T. Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet. https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html, May 2024. Transformer Circuits Thread. Wang et al. (2024) Wang, W., Lv, Q., Yu, W., Hong, W., Qi, J., Wang, Y., Ji, J., Yang, Z., Zhao, L., Song, X., Xu, J., Xu, B., Li, J., Dong, Y., Ding, M., and Tang, J. CogVLM: Visual expert for pretrained language models. In NeurIPS, 2024. Yu et al. (2025) Yu, S., Kwak, S., Jang, H., Jeong, J., Huang, J., Shin, J., and Xie, S. Representation alignment for generation: Training diffusion transformers is easier than you think. In ICLR, 2025. Zeiler & Fergus (2014) Zeiler, M. D. and Fergus, R. Visualizing and understanding convolutional networks. In ECCV, 2014. Zhai et al. (2023) Zhai, X., Mustafa, B., Kolesnikov, A., and Beyer, L. Sigmoid loss for language image pre-training. In ICCV, 2023. Zhao et al. (2024) Zhao, C., Wang, K., Zeng, X., Zhao, R., and Chan, A. B. Gradient-based visual explanation for transformer-based CLIP. In ICML, 2024. Zhou et al. (2018) Zhou, B., Sun, Y., Bau, D., and Torralba, A. Interpretable basis decomposition for visual explanation. In ECCV, 2018. Appendix for “Interpreting CLIP with Hierarchical Sparse Autoencoders” Appendix A Concept Discovery and Validation Here, we describe our approach for detecting which concepts the SAE learned, and how we validated the mappings of these concepts to specific neurons. While LLMs are commonly used to identify neuron-encoded concepts (Bereska & Gavves, 2024; Conmy et al., 2023), we follow Rao et al.
(2024) in implementing a more computationally efficient approach, tailored to CLIP-based SAEs. CLIP-based concept matching. The method uses a predefined vocabulary of concepts (e.g., ‘hair’, ‘pink’) to compute cosine similarity between CLIP embeddings and SAE decoder columns. After mapping concepts to CLIP's embedding space and applying the same preprocessing as during SAE training, we remove b_pre from the preprocessed CLIP embeddings for comparison with the decoder. For the feature columns of the SAE decoder, which are unit-magnitude by definition, the best-matching concept for a neuron is determined by maximizing cosine similarity, where a value of 1 indicates perfect alignment. Thus, the optimal concept s_c for neuron p_c is defined as:

s_c = \arg\max_{v \in V} \left[ \cos(p_c, \mathrm{CLIP}(v)) \right] = \arg\max_{v \in V} \left[ \frac{p_c \cdot \mathrm{CLIP}(v)}{\|p_c\| \, \|\mathrm{CLIP}(v)\|} \right]. (3)

Pre-activation bias in similarity calculations. While the method above suggests removing b_pre from CLIP embeddings, our empirical analysis revealed that this significantly masks neuron-concept relationships. In Figure 9, we show that without b_pre, similarities cluster around 0.1 (mean) with maxima around 0.15-0.2, whereas retaining b_pre yields higher similarity scores (> 0.42) that correspond to correct concepts.
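Equation (3) reduces to one matrix of cosine similarities between CLIP vocabulary embeddings and decoder columns, followed by a per-neuron argmax. A minimal sketch with illustrative shapes and names:

```python
import numpy as np

def name_neurons(W_dec, vocab_emb):
    """Eq. (3) as a matrix operation: assign each SAE neuron (a decoder
    column of W_dec, shape (d, m)) the vocabulary concept with maximal
    cosine similarity. vocab_emb rows (|V|, d) are CLIP text embeddings
    of the concept words, preprocessed as during SAE training."""
    P = W_dec / np.linalg.norm(W_dec, axis=0, keepdims=True)          # (d, m)
    V = vocab_emb / np.linalg.norm(vocab_emb, axis=1, keepdims=True)  # (|V|, d)
    sims = V @ P                              # cosine similarities, (|V|, m)
    return sims.argmax(axis=0), sims.max(axis=0)   # best concept id, its score
```

The returned score per neuron is what the threshold-based validation below filters on; whether b_pre is subtracted from `vocab_emb` beforehand is exactly the design choice analyzed in Figure 9.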
Importantly, both approaches preserve neuron rankings, with over 95% of concepts sharing identical highest-matching neurons, so retaining the bias does not distort the rankings. Manual evaluation confirmed that neurons with bias-removed similarities (∼0.2) are under-estimated relative to their bias-inclusive counterparts (∼0.5). Based on these findings, we retain $b_{pre}$ in our calculations.

Figure 9: Impact of pre-activation bias on concept similarities. We take the highest neuron similarities per concept across expansion rates for the (RW) and (UW) MSAE variants, (a) with and (b) without $b_{pre}$. Retaining $b_{pre}$ yields a better distribution with higher similarities that better reflect neuron interpretability.

Limitations of the current approach. We identify several limitations in the current concept mapping approach. First, the method assigns concepts to neurons based on the highest similarity regardless of the absolute matching quality, potentially leading to poor concept assignments when no good matches exist. Second, hierarchical concepts pose a challenge when matching with more specific neuronal features. For example, a high-level concept like ‘mammal’ may show strong similarity to both ‘cat’ and ‘dog’ neurons, resulting in imprecise assignments. This issue stems from either semantic feature vectors that are not perfectly orthogonal or incomplete vocabulary coverage.

Threshold-based validation. To address the challenges identified above, we propose three validations to remove weak assignments. Before applying them, we invert the mapping from concepts-to-neurons into a mapping from neurons-to-concepts to reduce spurious assignments. Based on this, we threshold the results by:

1. Cosine similarity > 0.42, which ensures that neurons exhibit strong alignment with their assigned concepts, preventing weak or ambiguous concept mappings

2. Concept similarity ratio $\frac{\text{Top similarity}}{\text{Second-highest similarity}} > 2.0$, which confirms concept uniqueness by requiring the best match to be at least twice as strong as the second-best concept, avoiding distributed representations

3. One concept per neuron (the one with the highest similarity), which enforces monosemanticity by assigning only the most strongly aligned concept to each neuron; this is needed because the vocabulary contains multiple variations of the same concept (e.g., ‘bird’ and ‘birdie’)

Vocab data. Following the principle of Ramaswamy et al. (2023) that vocabulary concepts should be simple, we adopted the vocabulary from Bhalla et al. (2024). This vocabulary comprises the most frequent unigrams from the LAION-400M captions dataset (Schuhmann et al., 2021). To account for semantic relationships between concepts, we perform manual validation of the top concepts for each discovered neuron.

Semantic consistency. Manual evaluation of the top concepts per neuron verifies concept consistency and identifies hierarchical relationships, where top vocabulary similarities (such as dog breeds) can indicate broader categorical concepts (such as ‘dog’).

Results across SAE architectures. We evaluate concept neurons across architectures in Table 3. From 37,445 neurons at expansion rate 8, only a small fraction passed similarity validation: ∼10% for TopK and 1–3% for the ReLU and Matryoshka architectures. While higher expansion rates typically reduce valid neurons, both TopK variants and ReLU (λ = 0.001) exhibit increased valid mappings under best-vector validation.
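As a minimal sketch (assuming a precomputed neuron-to-concept similarity matrix; the function name and the -1 sentinel for rejected neurons are our own), the three validation filters reduce to:

```python
import numpy as np

def validate_assignments(sims, sim_thresh=0.42, ratio_thresh=2.0):
    """Filter neuron-to-concept assignments with the three validations above.

    sims: (num_neurons, |V|) cosine-similarity matrix (neurons-to-concepts view).
    Returns one concept index per neuron (monosemanticity), or -1 when the
    neuron fails the similarity threshold (> 0.42) or the top/second-best
    similarity ratio threshold (> 2.0).
    """
    idx = np.arange(len(sims))
    order = np.argsort(sims, axis=1)           # ascending per neuron
    top, second = order[:, -1], order[:, -2]
    top_sim, second_sim = sims[idx, top], sims[idx, second]
    valid = (top_sim > sim_thresh) & (top_sim > ratio_thresh * second_sim)
    return np.where(valid, top, -1)
```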
Although these results suggest limited concept learning or a distribution of concepts across neurons, the vocabulary structure prevents definitive conclusions, due to the dominance of non-semantic unigrams and the presence of many semantically similar concepts across the vocabulary (e.g., ‘blue’, ‘blau’, and ‘bleu’). The validation results from the table demonstrate that sparser architectures (TopK) yield 3–8 times more interpretable concept neurons than denser ones (ReLU), with Matryoshka falling between the two, supporting the hypothesis that sparsity promotes concept specialization.

Table 3: Comparison of valid concept neurons detected across different SAEs and validation methods. The validation methods include a cosine similarity threshold above 0.42, selecting the best matching neuron, combining both criteria, applying the concept similarity ratio threshold between the first- and second-best vocabulary concept for the neuron, and enforcing all conditions simultaneously. Measurements were made for each model at three expansion rates (×8 ∣ ×16 ∣ ×32).
Model | Similarity above 0.42 | Best vector | Above and best | Ratio threshold | All conditions
ReLU (λ = 0.03) | 3308 ∣ 3765 ∣ 4181 | 2740 ∣ 3608 ∣ 5129 | 874 ∣ 1046 ∣ 1304 | 380 ∣ 175 ∣ 45 | 97 ∣ 31 ∣ 16
ReLU (λ = 0.003) | 896 ∣ 781 ∣ 799 | 2372 ∣ 3305 ∣ 5129 | 217 ∣ 196 ∣ 188 | 395 ∣ 251 ∣ 194 | 29 ∣ 19 ∣ 7
ReLU (λ = 0.001) | 351 ∣ 247 ∣ 128 | 4116 ∣ 6793 ∣ 11417 | 77 ∣ 63 ∣ 32 | 169 ∣ 47 ∣ 3 | 8 ∣ 2 ∣ 0
TopK (k = 32) | 4081 ∣ 4719 ∣ 5027 | 2755 ∣ 3415 ∣ 3827 | 1021 ∣ 1259 ∣ 1411 | 999 ∣ 857 ∣ 858 | 216 ∣ 197 ∣ 203
TopK (k = 64) | 3797 ∣ 4504 ∣ 4915 | 2557 ∣ 3272 ∣ 167 | 873 ∣ 1080 ∣ 1238 | 1322 ∣ 1151 ∣ 1167 | 238 ∣ 232 ∣ 238
TopK (k = 128) | 2141 ∣ 2590 ∣ 3059 | 2167 ∣ 2670 ∣ 3306 | 455 ∣ 565 ∣ 745 | 1508 ∣ 1383 ∣ 1379 | 211 ∣ 226 ∣ 231
TopK (k = 256) | 943 ∣ 888 ∣ 962 | 1883 ∣ 2191 ∣ 2631 | 168 ∣ 167 ∣ 171 | 1579 ∣ 1523 ∣ 1554 | 134 ∣ 126 ∣ 127
Matryoshka (RW) | 1136 ∣ 1109 ∣ 1038 | 1628 ∣ 2213 ∣ 2541 | 237 ∣ 257 ∣ 259 | 1429 ∣ 1135 ∣ 1059 | 140 ∣ 132 ∣ 121
Matryoshka (UW) | 907 ∣ 894 ∣ 748 | 1517 ∣ 1908 ∣ 2396 | 195 ∣ 191 ∣ 167 | 1254 ∣ 1169 ∣ 1069 | 125 ∣ 128 ∣ 98

Appendix B Implementation Details

We conducted experiments using CLIP ViT-L/14 (and ViT-B/16, reported later in the appendix), with embeddings from the image training subset of the CC3M dataset. Following (Bricken et al., 2023; Gao et al., 2025) and the blog post by Conerly et al. (2024), our SAE implementation uses a unit-norm constraint on the decoder columns with untied encoder and decoder. We initialized $b_{pre}$ and $b_{enc}$ with zeros, the decoder with uniform Kaiming initialization (columns scaled to $L_2$ norm 0.1), and the encoder as the decoder’s transpose. Gradient clipping was set to 1.
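The initialization above can be sketched as follows; this is a simplified NumPy version under our own assumptions (function name, exact Kaiming-uniform bound), and the actual training loop additionally re-normalizes decoder columns to unit norm.

```python
import numpy as np

def init_sae(d, expansion=8, dec_col_norm=0.1, seed=0):
    """Initialize an SAE as described: zero biases, Kaiming-uniform decoder
    with columns rescaled to L2 norm 0.1, and the encoder set to the
    decoder's transpose at init (the weights are untied during training)."""
    rng = np.random.default_rng(seed)
    m = d * expansion                          # e.g. 768 -> 6144 at rate 8
    bound = np.sqrt(6.0 / d)                   # Kaiming-uniform bound, fan_in = d
    W_dec = rng.uniform(-bound, bound, size=(d, m))
    W_dec *= dec_col_norm / np.linalg.norm(W_dec, axis=0, keepdims=True)
    W_enc = W_dec.T.copy()                     # untied copy, equal only at init
    return W_enc, W_dec, np.zeros(d), np.zeros(m)   # b_pre, b_enc start at zero
```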
For data preprocessing, we centered embeddings per modality (Bhalla et al., 2024) and scaled them by a constant to achieve $\mathbb{E}_{x \in X}[\|x\|_2] = \sqrt{n}$. All models were trained for 30 epochs on a single NVIDIA A100 GPU with batch size 4096, except for the model with an expansion rate of 32, which was trained for 20 epochs. While MSAE and TopK showed dead neurons, we omitted revival strategies since only in TopK did the number of dead neurons exceed 1%.

B.1 Hyperparameters

We first conducted experiments on CLIP RN50 using hyperparameters from (Rao et al., 2024), later validating them on ViT-L/14. For ViT-L/14, we explored parameters near the RN50-optimal values to ensure cross-architecture consistency. With expansion factor 8 (768 → 6144), we explore:

• Learning rates per method: $1\cdot10^{-5}$, $5\cdot10^{-5}$, $1\cdot10^{-4}$, $5\cdot10^{-4}$, $1\cdot10^{-3}$
• ReLU $L_1$ coefficients (λ): $1\cdot10^{-4}$, $3\cdot10^{-3}$, $1\cdot10^{-3}$, $3\cdot10^{-2}$
• TopK values: k ∈ {32, 64, 128, 256}, capped at 256 since (Gao et al., 2025) suggest higher values do not learn interpretable features
• Matryoshka K-lists: {32, …, 6144} and {64, …, 6144}; for higher expansion rates we adjust the upper limit
• α coefficients: uniform weighting (UW) (1, 1, 1, 1, 1, 1, 1) and reverse weighting (RW) (7, 6, 5, 4, 3, 2, 1)

The optimal parameters from these experiments were applied to the larger expansion factors of 16 (768 → 12288) and 32 (768 → 24576), and to all expansion rates of ViT-B/16. Following reviewer suggestions, we extended our evaluations to include TopK (k = 512) and BatchTopK with k ∈ {16, 32, 64, 128, 256}.
The BatchTopK models were trained using the same hyperparameters as the TopK models. These additional results are presented later in the appendix. We also attempted to integrate JumpReLU (Rajamanoharan et al., 2024b) into our evaluations, but did not achieve meaningful results, which may be attributed to an implementation error in our code.

B.2 Optimal Parameters

Based on the RN50 experiments and subsequent adjustment to ViT-L/14 with expansion factor 8, we selected the following optimal configurations: ReLU with learning rate $5\cdot10^{-5}$ and λ values of $1\cdot10^{-3}$, $3\cdot10^{-3}$, $3\cdot10^{-2}$; TopK with learning rate $5\cdot10^{-4}$ and k values of 32, 64, 128, and 256; MSAE with learning rate $1\cdot10^{-4}$ and K-list {64, …, 6144}, for both the uniform (UW) and reverse weighting (RW) α strategies.

Appendix C Highest Neuron Magnitudes

Based on the results from Figure 4, we analyze images from the ImageNet-1k validation set that produced the highest neuron magnitudes for the TopK and MSAE architectures. In Table 4, we show that more constrained SAEs (TopK with k ≤ 128) produce a higher number of samples with neuron magnitudes above 15; however, their percentage of valid neurons is lower than in MSAE and TopK (k = 256), which have significantly fewer high-magnitude samples. This indicates that high-magnitude neurons in highly constrained TopK models may learn complex, composite features. Figure 10 presents the top 6 valid highest-magnitude images per model, demonstrating that very high magnitudes often correspond to images with almost singular concepts.

Table 4: Analysis of high-magnitude neurons across architectures.
We analyze samples with magnitude > 15 in the ImageNet-1k validation set, showing the number of total occurrences, the proportion of valid concepts among high-magnitude concepts, and the rate of high-magnitude valid concepts relative to all valid concepts in the model from Table 3.

Model | High-Magnitude Samples | Valid Concept Rate | High-Magnitude Concept Rate
TopK (k = 32) | 113 | 6 (5%) | 216 (3%)
TopK (k = 64) | 18 | 0 (0%) | 238 (0%)
TopK (k = 128) | 3 | 0 (0%) | 211 (0%)
TopK (k = 256) | 12 | 8 (67%) | 134 (6%)
MSAE (RW) | 21 | 8 (38%) | 140 (6%)
MSAE (UW) | 22 | 7 (32%) | 125 (6%)

Figure 10: Images with the highest valid concept neuron magnitudes. We took 6 images from the ImageNet-1k validation set per model based on the results from Table 4, with (a) TopK k = 32, (b) TopK k = 256, (c) Matryoshka RW, and (d) Matryoshka UW.

Appendix D Activation Soft-capping

Analysis of MSAE in Figure 4 reveals that despite effective handling of multi-granular sparsity, the model learns to encode concepts using extremely large activation values (> 15). This can lead to more composite rather than atomic features, as was the case for TopK (k ≤ 128) revealed in Appendix C.

D.1 Definition of Soft-capping

To address this, we introduce activation soft-capping (SC), adapting the logit soft-capping concept from language models (Gemma & DeepMind, 2024).
This technique prevents excessively high activation magnitudes and the circumvention of sparsity constraints via activation magnitude manipulation:

$$\hat{z} = \mathrm{softcap} \cdot \tanh(z / \mathrm{softcap}), \qquad \hat{x} = W_{\mathrm{dec}}\,\hat{z} + b_{\mathrm{pre}}, \quad (4)$$

where the softcap hyperparameter controls the maximum activation magnitude. Combined with ReLU, this bounds SAE activations to (0, softcap).

D.2 Results

In Table 5, we show soft-capping’s impact on MSAE performance across key metrics using the ImageNet-1k training set. Our analysis reveals two key benefits of applying soft-capping to MSAE. First, it consistently improves $L_0$ sparsity, with MSAE RW (SC) achieving values of 0.830 and 0.889 for latent sizes 6144 and 12288, respectively. Second, while the base MSAE UW maintains better FVU and CS scores, soft-capped MSAE RW significantly reduces the number of dead neurons: at latent size 12288, it exhibits only 66 training dead neurons, compared to 124 for the base MSAE RW. These findings show that soft-capping is particularly beneficial for large-scale SAEs with wider sparse layers, where neuron utilization becomes more challenging. The technique provides a practical approach to reducing dead neurons while maintaining high $L_0$ sparsity, with only minimal impact on reconstruction fidelity.

Table 5: Impact of soft-capping (SC) on MSAE performance. We evaluate soft-capping across different expansion rates (8 and 16) on the ImageNet-1k validation set, comparing UW and RW variants. While base MSAE maintains better FVU and CS scores, soft-capped variants show improved $L_0$ sparsity and fewer dead neurons, particularly at larger sizes.
Bold values indicate the best performance per metric and size, with NDN in parentheses showing dead neuron counts from the final checkpoint.

Size | Model | $L_0$ ↑ | FVU ↓ | CS ↑ | CKNNA ↑ | NDN ↓
6144 | Matryoshka (RW) | 0.829±.008 | 0.007±.003 | 0.997±.002 | 0.809±.002 | 2 (4)
6144 | Matryoshka (UW) | 0.748±.006 | 0.001±.002 | 0.999±.000 | 0.848±.003 | 0 (22)
6144 | Matryoshka (RW, SC) | 0.830±.007 | 0.010±.003 | 0.995±.002 | 0.839±.004 | 1 (2)
6144 | Matryoshka (UW, SC) | 0.774±.006 | 0.004±.001 | 0.998±.001 | 0.856±.003 | 1 (3)
12288 | Matryoshka (RW) | 0.884±.006 | 0.005±.003 | 0.998±.001 | 0.801±.003 | 32 (124)
12288 | Matryoshka (UW) | 0.830±.003 | 0.000±.000 | 1.000±.000 | 0.853±.002 | 22 (491)
12288 | Matryoshka (RW, SC) | 0.889±.005 | 0.007±.003 | 0.997±.001 | 0.833±.002 | 11 (66)
12288 | Matryoshka (UW, SC) | 0.842±.005 | 0.001±.001 | 0.999±.000 | 0.849±.002 | 87 (172)

Appendix E CKNNA Alignment Metric

Introduced in Section 4.4, CKNNA (Centered Kernel Nearest-Neighbor Alignment) measures representation similarity between networks while focusing on local neighborhood structures. Unlike its predecessor CKA (Kornblith et al., 2019), CKNNA refines the alignment computation by considering only k-nearest neighbors, making it more sensitive to local geometric relationships. The alignment score between two networks’ representations, in our case CLIP embeddings and SAE activations, is computed as:

$$\mathrm{CKNNA}(K, L) = \frac{\mathrm{Align}(K, L)}{\sqrt{\mathrm{HSIC}(K, K)\,\mathrm{HSIC}(L, L)}},$$

$$\mathrm{HSIC}(K, L) = \frac{1}{(n-1)^2} \sum_i \sum_j \big(\langle \phi_i, \phi_j \rangle - \mathbb{E}_l[\langle \phi_i, \phi_l \rangle]\big)\big(\langle \psi_i, \psi_j \rangle - \mathbb{E}_l[\langle \psi_i, \psi_l \rangle]\big),$$

$$\mathrm{Align}(K, L) = \frac{1}{(n-1)^2} \sum_i \sum_j \alpha(i, j)\big(\langle \phi_i, \phi_j \rangle - \mathbb{E}_l[\langle \phi_i, \phi_l \rangle]\big)\big(\langle \psi_i, \psi_j \rangle - \mathbb{E}_l[\langle \psi_i, \psi_l \rangle]\big),$$

$$\alpha(i, j; k) = \mathbb{1}\big[\,i \neq j \text{ and } \phi_j \in \mathrm{knn}(\phi_i; k) \text{ and } \psi_j \in \mathrm{knn}(\psi_i; k)\,\big], \quad (5)$$

where HSIC measures the global similarity between kernel matrices, and Align introduces the neighborhood constraint through $\alpha(i, j; k)$. The indicator function $\alpha(i, j; k)$ ensures that only pairs of points that are k-nearest neighbors in both representation spaces contribute to the alignment score. Here, $\phi_i, \phi_j$ represent CLIP embeddings and $\psi_i, \psi_j$ represent SAE activations for corresponding input data points i and j. Following (Yu et al., 2025), we set k = 10 as it provides better alignment sensitivity, and we compute CKNNA over randomly sampled batches of 10,000 representations during evaluation. Higher CKNNA scores indicate stronger similarity between the CLIP and SAE learned representations.
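A direct O(n²) sketch of this computation with inner-product kernels (variable names and the mutual-kNN routine are our own assumptions, not the paper's implementation):

```python
import numpy as np

def cknna(phi, psi, k=10):
    """CKNNA per Eq. (5): phi (n, d1) and psi (n, d2) hold two sets of
    representations (e.g., CLIP embeddings and SAE activations) for the
    same n inputs."""
    n = len(phi)
    K, L = phi @ phi.T, psi @ psi.T               # inner-product kernels
    Kc = K - K.mean(axis=1, keepdims=True)        # subtract E_l[<phi_i, phi_l>]
    Lc = L - L.mean(axis=1, keepdims=True)        # subtract E_l[<psi_i, psi_l>]

    def knn_mask(S):
        # True at (i, j) when j is among the k nearest neighbors of i (j != i)
        M = np.zeros_like(S, dtype=bool)
        for i, row in enumerate(np.argsort(-S, axis=1)):
            M[i, [j for j in row if j != i][:k]] = True
        return M

    alpha = knn_mask(K) & knn_mask(L)             # mutual-kNN indicator
    align = (alpha * Kc * Lc).sum() / (n - 1) ** 2
    hsic_kk = (Kc * Kc).sum() / (n - 1) ** 2
    hsic_ll = (Lc * Lc).sum() / (n - 1) ** 2
    return align / np.sqrt(hsic_kk * hsic_ll)
```

With identical inputs the score stays below 1, since the indicator excludes self-pairs and non-mutual neighbors from the numerator.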
Appendix F Evaluating MSAE: Additional Results

We extend the results from Section 4 by analyzing multiple expansion rates, SAE variants, input modalities, and CLIP architectures. Unless otherwise specified, experiments use CLIP ViT-L/14 with an expansion rate of 8 on the image modality. For text modality evaluations, we use the CC3M validation subset, while image modality evaluations are performed on the ImageNet-1k training subset.

Figure 11: Extended sparsity-fidelity trade-off analysis across modalities. Expanding on Figure 2, we compare ReLU SAE (λ = 0.03, 0.01, 0.003, 0.001), TopK SAE (k = 32, 64, 128, 256), BatchTopK SAE (k = 32, 64, 128, 256), and MSAE (RW, UW) using two reconstruction metrics: mean EVR fidelity (top) and mean cosine similarity (bottom). Results are shown for both image (left) and text (right) modalities, with standard deviation also reported for each metric, demonstrating MSAE’s consistent performance across modalities and metrics.

F.1 Sparsity–Fidelity Trade-off

Figure 11 presents an extended analysis of sparsity-fidelity trade-offs, including standard deviations and an alternative reconstruction metric tailored for CLIP embeddings (cosine similarity). The results demonstrate MSAE’s superior stability across both modalities, particularly in text representations, where only the MSAE models show stable and elevated results. Furthermore, we observe that models trained with lower k or higher sparsity regularization show high variance on the trained (image) modality, and this instability becomes even more pronounced on the text modality. Because MSAE has inherently low variance on the image modality, this low variance carries over to the other modality, leading to consistently high and stable performance across both modalities.
Figure 12 further strengthens our findings by showing MSAE’s superiority on a different CLIP architecture (ViT-B/16).

Figure 12: ViT-B/16 sparsity-fidelity trade-off analysis across modalities. Parallel analysis to ViT-L/14 (Figure 11), demonstrating that MSAE’s superior performance and stability generalize across CLIP architectures.

F.2 Ablation: Matryoshka at Lower Granularity Levels

Figure 13 extends the analysis from Figure 3 by evaluating four key metrics: reconstruction fidelity (EVR), reconstruction error (CS), alignment (CKNNA), and neuron utilization (NDN). Our expanded comparison reinforces MSAE’s competitive performance against TopK SAE, demonstrating that MSAE (RW) achieves similar or better results across most metrics except NDN, where the (UW) version of MSAE performs better.

Figure 13: Comprehensive comparison of Matryoshka and TopK SAE on ImageNet-1k. Extension of Figure 3 comparing model performance on reconstruction fidelity (EVR), reconstruction error (CS), concept alignment (CKNNA), and neuron utilization (NDN) against sparsity (L0). MSAE (RW) demonstrates competitive performance across most metrics, while MSAE (UW) achieves better results in neuron utilization.

F.3 Activation Magnitudes Analysis

We extend the activation magnitude analysis from Figure 4 by including varying sparsity levels of each evaluated SAE architecture. Figure 14 shows non-zero and maximum SAE activations for expansion rate 8, revealing that less constrained TopK models exhibit double-curvature distributions similar to MSAE and ReLU. Maximum activation analysis highlights ReLU’s shrinkage effect, while TopK and MSAE maintain distributions closer to normal. These patterns persist at higher expansion rates (16 and 32), as shown in Figure 15.

Figure 14: Activation distributions at expansion rate 8.
Extended analysis of Figure 4 showing: (left) non-zero activation distributions, revealing TopK’s convergence to double-curvature patterns at lower constraints (higher k); (right) maximum activation distributions, demonstrating the ReLU shrinkage problem compared to the TopK and MSAE behavior, which resembles a normal distribution. (a) Expansion rate 16 (b) Expansion rate 32

Figure 15: Activation distributions at higher expansion rates. Extended analysis of Figure 4 for expansion rates 16 (a) and 32 (b), showing the consistency of distribution patterns across scales.

F.4 Progressive Recovery

We extend the analysis from Figure 2 by examining progressive reconstruction performance across additional metrics and modalities. Figure 16 demonstrates performance at expansion rate 8 for reconstruction quality (EVR, CS) and neuron utilization (NDN) across both modalities, while Figure 17 extends this analysis to expansion rates 16 and 32.

Figure 16: Progressive recovery analysis at expansion rate 8. Extension of Figure 2 showing reconstruction (EVR, CS), alignment (CKNNA), and neuron utilization (NDN) metrics against an increasing number of utilized top-magnitude SAE neurons for image and text modalities. MSAE demonstrates comparable performance to TopK (k = 256) on the image modality and superior performance on text. (a) Expansion rate 16 (b) Expansion rate 32

Figure 17: Progressive recovery analysis at higher expansion rates. Analysis parallel to Figure 16 for expansion rates 16 (a) and 32 (b), demonstrating the stability of the results at higher expansion rates.

F.5 Comprehensive Evaluation with CLIP ViT-L/14 and ViT-B/16 Architectures

We present an extensive quantitative comparison of SAE variants across the CLIP ViT-L/14 and ViT-B/16 architectures.
Our evaluation encompasses ReLU (λ = 0.03, 0.01, 0.003, 0.001), TopK (k = 32, 64, 128, 256, 512), BatchTopK (k = 16, 32, 64, 128, 256), and MSAE (RW, UW) models, tested across three expansion rates (8, 16, 32) for both image and text modalities. For the text modality and the ViT-B/16 architecture, we omit the LP (Acc) and LP (KL) metrics based on our finding in Section 4.4 that CS and FVU correlate strongly with the linear probing metrics.

F.5.1 Results for Image Modality

For the image modality, Tables 6–8 present detailed results for ViT-L/14 across expansion rates 8, 16, and 32, while Tables 9–11 show parallel performance metrics for ViT-B/16. These tables extend the analysis from Table 1, providing comprehensive measurements across different metrics and model configurations.

Table 6: CLIP ViT-L/14 SAE comparison at expansion rate 8. Extended evaluation from Table 1 with additional TopK, ReLU, and BatchTopK variants on ImageNet-1k. Arrows indicate the preferred metric direction; NDN values show training set dead neurons in parentheses.
Model | $L_0$ ↑ | FVU ↓ | CS ↑ | LP (KL) ↓ | LP (Acc) ↑ | CKNNA ↑ | DO ↓ | NDN ↓
ReLU (λ = 0.03) | .920±.008 | .185±.031 | .928±.009 | 50.5±77.1 | .936±.244 | .727±.004 | .003 | 0 (0)
ReLU (λ = 0.01) | .762±.010 | .033±.005 | .985±.002 | 7.16±11.3 | .977±.149 | .742±.005 | .002 | 0 (0)
ReLU (λ = 0.003) | .649±.007 | .004±.000 | .998±.000 | 0.66±1.03 | .994±.083 | .781±.004 | .003 | 0 (0)
ReLU (λ = 0.001) | .553±.006 | .002±.001 | .999±.000 | 0.36±0.65 | .995±.073 | .822±.004 | .002 | 0 (0)
TopK (k = 32) | .960±.010 | .245±.043 | .874±.021 | 109±161 | .903±.300 | .711±.005 | .002 | 0 (1235)
TopK (k = 64) | .950±.009 | .172±.026 | .912±.013 | 60.1±90.8 | .930±.255 | .762±.004 | .002 | 0 (335)
TopK (k = 128) | .928±.008 | .098±.015 | .951±.007 | 2.71±5.40 | .987±.114 | .811±.004 | .003 | 0 (117)
TopK (k = 256) | .900±.004 | .011±.003 | .994±.002 | 2.71±5.40 | .987±.114 | .874±.003 | .003 | 0 (296)
TopK (k = 512) | .922±.015 | .346±.442 | .923±.058 | 56.1±13.9 | .950±.218 | .006±.003 | .002 | 0 (1)
BatchTopK (k = 16) | .698±.021 | .371±.060 | .798±.037 | 281±326 | .836±.372 | .698±.037 | .002 | 0 (4278)
BatchTopK (k = 32) | .776±.019 | .242±.034 | .873±.020 | 113±157 | .901±.299 | .735±.004 | .002 | 0 (3080)
BatchTopK (k = 64) | .877±.012 | .162±.022 | .917±.011 | 56.9±85.8 | .931±.253 | .769±.004 | .002 | 0 (1477)
BatchTopK (k = 128) | .898±.010 | .082±.012 | .959±.006 | 23.3±36.5 | .959±.197 | .805±.004 | .003 | 0 (539)
BatchTopK (k = 256) | .882±.005 | .010±.005 | .995±.002 | 2.42±5.12 | .988±.108 | .860±.003 | .002 | 3 (919)
Matryoshka (RW) | .829±.008 | .007±.003 | .997±.002 | 3.13±7.08 | .987±.115 | .809±.002 | .002 | 2 (4)
Matryoshka (UW) | .748±.006 | .002±.001 | .999±.000 | 0.35±0.82 | .995±.070 | .848±.003 | .001 | 0 (22)

Table 7: CLIP ViT-L/14 SAE comparison at expansion rate 16. Results parallel to Table 6 showing performance scaling at a higher expansion rate on ImageNet-1k.
| Model | L0 ↑ | FVU ↓ | CS ↑ | LP (KL) ↓ | LP (Acc) ↑ | CKNNA ↑ | DO ↓ | NDN ↓ |
|---|---|---|---|---|---|---|---|---|
| ReLU (λ=0.03) | .945±.006 | .147±.033 | .939±.008 | 41.1±64.8 | .945±.229 | .714±.004 | .003 | 0 (0) |
| ReLU (λ=0.01) | .838±.008 | .036±.005 | .983±.002 | 8.41±13.7 | .975±.157 | .738±.005 | .002 | 0 (0) |
| ReLU (λ=0.003) | .716±.009 | .006±.001 | .997±.000 | 1.08±1.74 | .991±.093 | .695±.003 | .002 | 0 (0) |
| ReLU (λ=0.001) | .664±.007 | .001±.000 | .999±.000 | 0.14±0.22 | .997±.056 | .789±.004 | .002 | 0 (0) |
| TopK (k=32) | .972±.009 | .249±.047 | .873±.022 | 112±168 | .899±.301 | .692±.005 | .002 | 0 (4727) |
| TopK (k=64) | .973±.006 | .174±.028 | .911±.014 | 61.7±96.8 | .927±.260 | .745±.003 | .002 | 0 (2079) |
| TopK (k=128) | .960±.006 | .104±.017 | .948±.008 | 30.7±49.0 | .951±.215 | .801±.003 | .002 | 1 (897) |
| TopK (k=256) | .937±.006 | .019±.004 | .991±.002 | 3.76±6.85 | .984±.127 | .871±.002 | .004 | 15 (1383) |
| TopK (k=512) | .964±.008 | .336±.413 | .926±.064 | 72.9±17.8 | .944±.230 | .007±.003 | .002 | 0 (29) |
| BatchTopK (k=16) | .669±.021 | .404±.055 | .786±.038 | 310±353 | .829±.377 | .705±.005 | .001 | 0 (9859) |
| BatchTopK (k=32) | .742±.021 | .274±.031 | .866±.020 | 122±166 | .897±.304 | .736±.005 | .002 | 0 (8016) |
| BatchTopK (k=64) | .880±.013 | .167±.020 | .916±.012 | 55.9±85.0 | .932±.252 | .759±.004 | .002 | 0 (5113) |
| BatchTopK (k=128) | .889±.011 | .089±.012 | .957±.006 | 24.6±38.5 | .956±.204 | .806±.003 | .002 | 1 (2967) |
| BatchTopK (k=256) | .880±.009 | .023±.006 | .990±.003 | 4.12±7.11 | .983±.131 | .854±.003 | .002 | 12 (3558) |
| Matryoshka (RW) | .884±.006 | .005±.003 | .998±.001 | 2.08±4.68 | .989±.103 | .801±.003 | .002 | 32 (124) |
| Matryoshka (UW) | .830±.003 | .000±.000 | .999±.000 | 0.12±0.41 | .998±.050 | .853±.002 | .002 | 22 (491) |

Table 8: CLIP ViT-L/14 SAE comparison at expansion rate 32. Analysis at the maximum tested expansion rate on ImageNet-1k, completing the scaling study from Tables 6 and 7.
| Model | L0 ↑ | FVU ↓ | CS ↑ | LP (KL) ↓ | LP (Acc) ↑ | CKNNA ↑ | DO ↓ | NDN ↓ |
|---|---|---|---|---|---|---|---|---|
| ReLU (λ=0.03) | .964±.004 | .120±.029 | .948±.007 | 36.1±60.2 | .949±.221 | .707±.005 | .003 | 0 (0) |
| ReLU (λ=0.01) | .893±.006 | .032±.005 | .985±.002 | 7.65±12.77 | .977±.150 | .752±.008 | .002 | 0 (0) |
| ReLU (λ=0.003) | .781±.007 | .011±.002 | .995±.001 | 2.06±3.40 | .988±.111 | .619±.007 | .002 | 0 (0) |
| ReLU (λ=0.001) | .653±.005 | .004±.001 | .998±.000 | 0.77±1.25 | .993±.085 | .493±.007 | .002 | 0 (0) |
| TopK (k=32) | .938±.018 | .246±.047 | .872±.025 | 102.37±155.09 | .906±.292 | .697±.012 | .002 | 64 (14535) |
| TopK (k=64) | .973±.006 | .174±.028 | .911±.014 | 61.7±96.8 | .927±.260 | .745±.003 | .002 | 0 (9347) |
| TopK (k=128) | .964±.009 | .110±.021 | .947±.009 | 31.4±51.2 | .952±.213 | .794±.005 | .002 | 10 (5604) |
| TopK (k=256) | .942±.012 | .032±.008 | .986±.003 | 5.39±8.80 | .980±.139 | .864±.003 | .003 | 91 (6590) |
| TopK (k=512) | .966±.004 | .008±.005 | .996±.002 | 1.30±2.69 | .992±.092 | .422±.018 | .002 | 7822 (22446) |
| BatchTopK (k=16) | .638±.018 | .462±.057 | .775±.038 | 344.4±377.9 | .825±.380 | .735±.005 | .001 | 0 (21631) |
| BatchTopK (k=32) | .756±.015 | .329±.042 | .855±.024 | 139.4±186.8 | .894±.308 | .747±.004 | .001 | 0 (18965) |
| BatchTopK (k=64) | .880±.011 | .190±.024 | .911±.013 | 55.4±84.7 | .934±.249 | .764±.004 | .002 | 0 (15035) |
| BatchTopK (k=128) | .869±.012 | .129±.016 | .947±.007 | 26.7±42.2 | .956±.206 | .794±.003 | .002 | 1 (11019) |
| BatchTopK (k=256) | .837±.014 | .069±.011 | .979±.004 | 8.22±13.4 | .976±.154 | .851±.003 | .002 | 12 (11802) |
| Matryoshka (RW) | .927±.004 | .003±.001 | .999±.001 | 1.00±2.15 | .992±.090 | .810±.002 | .002 | 79 (142) |
| Matryoshka (UW) | .908±.002 | .000±.000 | .999±.000 | 0.09±0.35 | .998±.047 | .850±.003 | .002 | 297 (162) |

Table 9: CLIP ViT-B/16 SAE comparison at expansion rate 8. Parallel analysis to ViT-L/14 (Table 6) using the smaller CLIP architecture on ImageNet-1k.
| Model | L0 ↑ | FVU ↓ | CS ↑ | CKNNA ↑ | DO ↓ | NDN ↓ |
|---|---|---|---|---|---|---|
| ReLU (λ=0.03) | .908±.010 | .154±.036 | .936±.010 | .671±.004 | .004 | 0 (0) |
| ReLU (λ=0.01) | .747±.012 | .030±.005 | .986±.002 | .682±.005 | .003 | 0 (0) |
| ReLU (λ=0.003) | .629±.008 | .003±.000 | .999±.000 | .737±.003 | .003 | 0 (0) |
| ReLU (λ=0.001) | .366±.012 | .009±.003 | .996±.001 | .695±.003 | .002 | 0 (0) |
| TopK (k=32) | .946±.013 | .200±.037 | .897±.019 | .684±.005 | .004 | 0 (672) |
| TopK (k=64) | .935±.011 | .138±.024 | .930±.012 | .730±.003 | .003 | 0 (196) |
| TopK (k=128) | .843±.020 | .116±.026 | .948±.010 | .735±.003 | .003 | 0 (95) |
| TopK (k=256) | .859±.008 | .024±.007 | .988±.003 | .787±.002 | .004 | 0 (8) |
| TopK (k=512) | .882±.008 | .058±.052 | .972±.025 | .005±.003 | .003 | 0 (0) |
| BatchTopK (k=16) | .706±.023 | .299±.050 | .845±.030 | .662±.005 | .002 | 0 (2666) |
| BatchTopK (k=32) | .810±.019 | .196±.031 | .900±.017 | .695±.005 | .003 | 0 (1763) |
| BatchTopK (k=64) | .877±.013 | .127±.020 | .936±.010 | .750±.004 | .003 | 0 (830) |
| BatchTopK (k=128) | .882±.010 | .048±.008 | .976±.004 | .806±.003 | .003 | 0 (387) |
| BatchTopK (k=256) | .876±.003 | .003±.001 | .999±.001 | .843±.003 | .003 | 35 (1766) |
| Matryoshka (RW) | .783±.009 | .003±.001 | .999±.001 | .792±.003 | .003 | 0 (0) |
| Matryoshka (UW) | .711±.004 | .000±.000 | .999±.000 | .814±.002 | .003 | 0 (1) |

Table 10: CLIP ViT-B/16 SAE comparison at expansion rate 16. Extension of Table 9 to expansion rate 16 on ImageNet-1k.
| Model | L0 ↑ | FVU ↓ | CS ↑ | CKNNA ↑ | DO ↓ | NDN ↓ |
|---|---|---|---|---|---|---|
| ReLU (λ=0.03) | .940±.007 | .125±.030 | .945±.008 | .664±.006 | .004 | 0 (0) |
| ReLU (λ=0.01) | .824±.010 | .033±.005 | .984±.002 | .669±.006 | .003 | 0 (0) |
| ReLU (λ=0.003) | .695±.010 | .004±.008 | .998±.000 | .635±.005 | .003 | 0 (0) |
| ReLU (λ=0.001) | .646±.008 | .001±.000 | .999±.000 | .742±.003 | .003 | 0 (0) |
| TopK (k=32) | .958±.015 | .206±.042 | .896±.021 | .671±.005 | .003 | 0 (2819) |
| TopK (k=64) | .962±.008 | .141±.026 | .930±.012 | .710±.003 | .003 | 0 (1268) |
| TopK (k=128) | .950±.007 | .072±.016 | .965±.007 | .782±.003 | .004 | 0 (597) |
| TopK (k=256) | .935±.003 | .003±.002 | .998±.001 | .839±.002 | .003 | 2 (4686) |
| TopK (k=512) | .943±.015 | .203±.312 | .955±.047 | .006±.003 | .003 | 1 (1) |
| BatchTopK (k=16) | .658±.022 | .349±.054 | .837±.033 | .670±.006 | .002 | 0 (6478) |
| BatchTopK (k=32) | .757±.022 | .230±.035 | .896±.019 | .704±.005 | .002 | 0 (5114) |
| BatchTopK (k=64) | .853±.016 | .135±.019 | .938±.009 | .741±.005 | .003 | 0 (3145) |
| BatchTopK (k=128) | .887±.010 | .061±.011 | .971±.005 | .799±.004 | .003 | 0 (1817) |
| BatchTopK (k=256) | .876±.003 | .003±.001 | .999±.001 | .843±.003 | .002 | 40 (4947) |
| Matryoshka (RW) | .861±.005 | .002±.001 | .999±.001 | .778±.003 | .003 | 8 (63) |
| Matryoshka (UW) | .805±.004 | .000±.000 | .999±.000 | .813±.003 | .003 | 44 (275) |

Table 11: CLIP ViT-B/16 SAE comparison at expansion rate 32. Completion of the ViT-B/16 scaling analysis on ImageNet-1k at the maximum tested expansion rate.
| Model | L0 ↑ | FVU ↓ | CS ↑ | CKNNA ↑ | DO ↓ | NDN ↓ |
|---|---|---|---|---|---|---|
| ReLU (λ=0.03) | .956±.005 | .104±.025 | .953±.007 | .656±.004 | .003 | 0 (0) |
| ReLU (λ=0.01) | .879±.008 | .031±.005 | .986±.002 | .688±.005 | .003 | 0 (0) |
| ReLU (λ=0.003) | .757±.009 | .010±.002 | .996±.001 | .568±.005 | .003 | 0 (0) |
| ReLU (λ=0.001) | .625±.006 | .004±.001 | .998±.000 | .516±.005 | .003 | 0 (0) |
| TopK (k=32) | .933±.025 | .230±.058 | .891±.025 | .674±.005 | .003 | 0 (9223) |
| TopK (k=64) | .967±.012 | .152±.032 | .927±.014 | .698±.003 | .003 | 0 (5643) |
| TopK (k=128) | .960±.010 | .085±.023 | .961±.009 | .772±.002 | .003 | 2 (3321) |
| TopK (k=256) | .922±.014 | .015±.006 | .995±.002 | .822±.002 | .003 | 1 (10480) |
| TopK (k=512) | .967±.000 | .001±.001 | 1.000±.001 | .016±.007 | .002 | 5902 (14864) |
| BatchTopK (k=16) | .615±.018 | .450±.060 | .822±.037 | .695±.005 | .002 | 0 (14345) |
| BatchTopK (k=32) | .712±.021 | .312±.043 | .890±.019 | .719±.005 | .002 | 0 (12492) |
| BatchTopK (k=64) | .835±.017 | .184±.028 | .933±.010 | .731±.005 | .002 | 2 (9645) |
| BatchTopK (k=128) | .856±.013 | .096±.015 | .966±.005 | .798±.003 | .003 | 0 (6855) |
| BatchTopK (k=256) | .844±.011 | .040±.007 | .991±.002 | .827±.003 | .002 | 0 (11638) |
| Matryoshka (RW) | .915±.003 | .001±.001 | .999±.000 | .794±.003 | .003 | 6 (23) |
| Matryoshka (UW) | .880±.003 | .000±.000 | .999±.000 | .804±.002 | .002 | 15 (26) |

F.5.2 Results for Text Modality

For the text modality, Tables 12–14 present results for ViT-L/14, while Tables 15–17 show ViT-B/16 performance on CC3M validation text data, enabling cross-modal and cross-architecture comparisons.

Table 12: CLIP ViT-L/14 SAE text analysis at expansion rate 8. Evaluation on the CC3M text validation set, parallel to the image results in Table 6, highlighting cross-modal performance differences.
| Model | L0 ↑ | FVU ↓ | CS ↑ | CKNNA ↑ | DO ↓ | NDN ↓ |
|---|---|---|---|---|---|---|
| ReLU (λ=0.03) | .901±.010 | .427±.174 | .802±.049 | .622±.003 | .003 | 0 (0) |
| ReLU (λ=0.01) | .522±.041 | .036±.038 | .984±.014 | .716±.006 | .002 | 0 (0) |
| ReLU (λ=0.003) | .609±.027 | .041±.039 | .981±.018 | .744±.002 | .003 | 0 (0) |
| ReLU (λ=0.001) | .522±.045 | .035±.038 | .984±.014 | .706±.005 | .002 | 0 (0) |
| TopK (k=32) | .724±.318 | 1.095±.835 | .511±.206 | .037±.031 | .002 | 0 (1235) |
| TopK (k=64) | .715±.295 | .760±.249 | .585±.209 | .025±.020 | .002 | 0 (335) |
| TopK (k=128) | .781±.190 | .537±.252 | .708±.190 | .042±.011 | .003 | 0 (117) |
| TopK (k=256) | .783±.180 | .366±.366 | .742±.301 | .088±.007 | .003 | 0 (296) |
| TopK (k=512) | .900±.026 | .386±.376 | .865±.101 | .038±.006 | .002 | 0 (1) |
| BatchTopK (k=16) | .585±.161 | 1.222±.929 | .416±.213 | .031±.026 | .002 | 0 (4278) |
| BatchTopK (k=32) | .608±.221 | .848±.277 | .494±.232 | .022±.012 | .002 | 0 (3080) |
| BatchTopK (k=64) | .712±.197 | .662±.229 | .581±.222 | .019±.013 | .002 | 0 (1477) |
| BatchTopK (k=128) | .858±.038 | .415±.180 | .787±.102 | .312±.009 | .003 | 0 (539) |
| BatchTopK (k=256) | .869±.026 | .159±.175 | .918±.108 | .716±.004 | .002 | 17 (919) |
| Matryoshka (RW) | .824±.029 | .060±.052 | .971±.026 | .775±.001 | .002 | 0 (4) |
| Matryoshka (UW) | .755±.024 | .026±.027 | .988±.012 | .790±.002 | .001 | 0 (22) |

Table 13: CLIP ViT-L/14 SAE text analysis at expansion rate 16. Extended CC3M text evaluation showing scaling effects at expansion rate 16, complementing the image results from Table 7.
| Model | L0 ↑ | FVU ↓ | CS ↑ | CKNNA ↑ | DO ↓ | NDN ↓ |
|---|---|---|---|---|---|---|
| ReLU (λ=0.03) | .930±.027 | .510±.500 | .812±.052 | .581±.006 | .003 | 0 (0) |
| ReLU (λ=0.01) | .809±.052 | .221±.255 | .926±.043 | .643±.008 | .002 | 0 (0) |
| ReLU (λ=0.003) | .675±.067 | .070±.060 | .973±.023 | .654±.009 | .002 | 0 (0) |
| ReLU (λ=0.001) | .599±.028 | .021±.019 | .990±.010 | .781±.002 | .002 | 0 (0) |
| TopK (k=32) | .782±.283 | 1.237±1.122 | .494±.208 | .038±.028 | .002 | 0 (4727) |
| TopK (k=64) | .774±.280 | .790±.258 | .583±.203 | .023±.008 | .002 | 0 (2079) |
| TopK (k=128) | .813±.212 | .604±.255 | .690±.190 | .029±.010 | .002 | 0 (897) |
| TopK (k=256) | .848±.156 | .390±.357 | .740±.280 | .093±.004 | .004 | 0 (1383) |
| TopK (k=512) | .950±.023 | .435±.453 | .855±.107 | .073±.014 | .002 | 0 (29) |
| BatchTopK (k=16) | .600±.115 | .917±.326 | .450±.202 | .032±.018 | .001 | 0 (9859) |
| BatchTopK (k=32) | .619±.181 | .758±.209 | .518±.232 | .031±.021 | .002 | 0 (8016) |
| BatchTopK (k=64) | .706±.242 | .687±.236 | .559±.240 | .029±.020 | .002 | 0 (5113) |
| BatchTopK (k=128) | .866±.034 | .465±.230 | .796±.081 | .240±.016 | .002 | 14 (2967) |
| BatchTopK (k=256) | .789±.189 | .352±.371 | .741±.334 | .036±.004 | .002 | 7 (3558) |
| Matryoshka (RW) | .880±.021 | .043±.038 | .980±.019 | .783±.006 | .002 | 4 (124) |
| Matryoshka (UW) | .832±.028 | .017±.017 | .992±.008 | .788±.001 | .002 | 0 (491) |

Table 14: CLIP ViT-L/14 SAE text analysis at expansion rate 32. Maximum expansion rate analysis on CC3M text.
| Model | L0 ↑ | FVU ↓ | CS ↑ | CKNNA ↑ | DO ↓ | NDN ↓ |
|---|---|---|---|---|---|---|
| ReLU (λ=0.03) | .951±.023 | .540±.667 | .822±.053 | .557±.001 | .003 | 1 (0) |
| ReLU (λ=0.01) | .867±.059 | .339±.546 | .927±.047 | .549±.011 | .002 | 1 (0) |
| ReLU (λ=0.003) | .749±.090 | .172±.200 | .966±.027 | .336±.008 | .002 | 0 (0) |
| ReLU (λ=0.001) | .631±.054 | .052±.045 | .983±.014 | .376±.007 | .002 | 0 (0) |
| TopK (k=32) | .864±.144 | 1.149±1.282 | .551±.169 | .058±.018 | .002 | 0 (14535) |
| TopK (k=64) | .888±.156 | .795±.337 | .630±.157 | .049±.026 | .002 | 0 (2079) |
| TopK (k=128) | .871±.156 | .612±.252 | .717±.161 | .053±.024 | .002 | 0 (5604) |
| TopK (k=256) | .869±.134 | .435±.342 | .740±.240 | .130±.006 | .003 | 0 (6590) |
| TopK (k=512) | .937±.062 | .115±.131 | .944±.069 | .070±.034 | .002 | 240 (22446) |
| BatchTopK (k=16) | .601±.079 | .818±.209 | .496±.191 | .034±.024 | .002 | 0 (21631) |
| BatchTopK (k=32) | .681±.143 | .733±.235 | .592±.190 | .034±.020 | .002 | 0 (21631) |
| BatchTopK (k=64) | .784±.178 | .636±.225 | .657±.181 | .042±.026 | .002 | 0 (18965) |
| BatchTopK (k=128) | .861±.016 | .377±.159 | .830±.055 | .526±.012 | .002 | 94 (11019) |
| BatchTopK (k=256) | .784±.139 | .332±.333 | .778±.288 | .060±.007 | .002 | 0 (11802) |
| Matryoshka (RW) | .925±.014 | .030±.026 | .986±.013 | .774±.000 | .002 | 32 (142) |
| Matryoshka (UW) | .901±.026 | .013±.013 | .994±.006 | .784±.000 | .002 | 126 (162) |

Table 15: CLIP ViT-B/16 SAE text analysis at expansion rate 8. CC3M text evaluation using the smaller CLIP architecture, enabling cross-modal and cross-architecture comparisons.
| Model | L0 ↑ | FVU ↓ | CS ↑ | CKNNA ↑ | DO ↓ | NDN ↓ |
|---|---|---|---|---|---|---|
| ReLU (λ=0.03) | .870±.021 | .472±.166 | .761±.063 | .661±.008 | .004 | 0 (0) |
| ReLU (λ=0.01) | .697±.046 | .183±.141 | .920±.048 | .721±.003 | .003 | 0 (0) |
| ReLU (λ=0.003) | .580±.028 | .030±.028 | .986±.013 | .764±.000 | .003 | 0 (0) |
| ReLU (λ=0.001) | .393±.122 | .118±.177 | .975±.022 | .129±.021 | .002 | 0 (0) |
| TopK (k=32) | .757±.265 | 1.095±1.081 | .533±.174 | .035±.013 | .004 | 0 (672) |
| TopK (k=64) | .766±.223 | .733±.353 | .644±.155 | .033±.001 | .003 | 0 (196) |
| TopK (k=128) | .747±.164 | .515±.343 | .782±.106 | .275±.016 | .003 | 0 (95) |
| TopK (k=256) | .783±.095 | .229±.152 | .888±.081 | .759±.000 | .004 | 0 (8) |
| TopK (k=512) | .794±.174 | .283±.282 | .860±.154 | .079±.021 | .003 | 0 (0) |
| BatchTopK (k=16) | .602±.156 | 1.172±1.110 | .461±.178 | .018±.010 | .002 | 0 (2666) |
| BatchTopK (k=32) | .662±.213 | .898±.510 | .547±.176 | .016±.008 | .003 | 0 (1763) |
| BatchTopK (k=64) | .716±.204 | .656±.212 | .637±.166 | .029±.008 | .003 | 0 (830) |
| BatchTopK (k=128) | .715±.270 | .654±.322 | .662±.256 | .033±.009 | .003 | 0 (387) |
| BatchTopK (k=256) | .774±.177 | .317±.358 | .776±.301 | .103±.004 | .003 | 0 (1766) |
| Matryoshka (RW) | .762±.063 | .044±.047 | .979±.023 | .799±.001 | .003 | 0 (0) |
| Matryoshka (UW) | .709±.043 | .021±.025 | .990±.012 | .812±.003 | .003 | 0 (1) |

Table 16: CLIP ViT-B/16 SAE text analysis at expansion rate 16. Results for expansion rate 16 with ViT-B/16 on CC3M text data.
| Model | L0 ↑ | FVU ↓ | CS ↑ | CKNNA ↑ | DO ↓ | NDN ↓ |
|---|---|---|---|---|---|---|
| ReLU (λ=0.03) | .910±.023 | .457±.178 | .767±.067 | .614±.008 | .004 | 0 (0) |
| ReLU (λ=0.01) | .777±.065 | .248±.244 | .911±.049 | .600±.003 | .003 | 0 (0) |
| ReLU (λ=0.003) | .641±.055 | .052±.038 | .994±.005 | .767±.001 | .003 | 0 (0) |
| ReLU (λ=0.001) | .577±.037 | .013±.012 | .979±.016 | .678±.002 | .003 | 0 (0) |
| TopK (k=32) | .811±.232 | 1.163±1.376 | .518±.185 | .043±.020 | .003 | 0 (2819) |
| TopK (k=64) | .801±.231 | .803±.424 | .618±.170 | .028±.006 | .003 | 0 (1268) |
| TopK (k=128) | .781±.217 | .601±.252 | .696±.177 | .045±.002 | .004 | 0 (597) |
| TopK (k=256) | .787±.224 | .430±.401 | .688±.326 | .098±.007 | .003 | 0 (4686) |
| TopK (k=512) | .918±.051 | .447±.804 | .891±.089 | .070±.009 | .003 | 0 (1) |
| BatchTopK (k=16) | .592±.109 | .940±.510 | .480±.187 | .018±.007 | .002 | 0 (6478) |
| BatchTopK (k=32) | .650±.174 | .843±.428 | .560±.182 | .015±.007 | .002 | 0 (5114) |
| BatchTopK (k=64) | .734±.174 | .649±.212 | .654±.153 | .022±.006 | .003 | 0 (3145) |
| BatchTopK (k=128) | .764±.218 | .643±.310 | .685±.216 | .032±.004 | .003 | 0 (1817) |
| BatchTopK (k=256) | .799±.194 | .337±.375 | .755±.321 | .061±.006 | .002 | 0 (4947) |
| Matryoshka (RW) | .847±.040 | .033±.036 | .984±.018 | .800±.002 | .003 | 0 (63) |
| Matryoshka (UW) | .801±.043 | .017±.021 | .992±.010 | .803±.001 | .003 | 0 (275) |

Table 17: CLIP ViT-B/16 SAE text analysis at expansion rate 32. Final expansion rate evaluation for ViT-B/16 on CC3M text.
| Model | L0 ↑ | FVU ↓ | CS ↑ | CKNNA ↑ | DO ↓ | NDN ↓ |
|---|---|---|---|---|---|---|
| ReLU (λ=0.03) | .934±.019 | .472±.356 | .790±.058 | .610±.008 | .003 | 0 (0) |
| ReLU (λ=0.01) | .828±.123 | .839±1.624 | .894±.077 | .145±.015 | .003 | 0 (0) |
| ReLU (λ=0.003) | .713±.103 | .183±.201 | .963±.028 | .304±.004 | .003 | 0 (0) |
| ReLU (λ=0.001) | .600±.051 | .041±.033 | .987±.009 | .422±.002 | .003 | 0 (0) |
| TopK (k=32) | .850±.158 | 1.091±1.367 | .547±.163 | .076±.008 | .003 | 0 (9223) |
| TopK (k=64) | .879±.146 | .832±.622 | .645±.138 | .075±.028 | .003 | 0 (5643) |
| TopK (k=128) | .863±.145 | .590±.247 | .740±.126 | .150±.029 | .003 | 0 (3321) |
| TopK (k=256) | .793±.215 | .365±.359 | .776±.253 | .197±.024 | .003 | 0 (10480) |
| TopK (k=512) | .917±.109 | .097±.172 | .955±.087 | .027±.016 | .002 | 73 (14864) |
| BatchTopK (k=16) | .581±.066 | .769±.191 | .536±.157 | .027±.012 | .002 | 0 (14345) |
| BatchTopK (k=32) | .657±.113 | .726±.284 | .628±.136 | .032±.020 | .002 | 0 (12492) |
| BatchTopK (k=64) | .766±.126 | .631±.227 | .704±.113 | .052±.024 | .002 | 0 (9645) |
| BatchTopK (k=128) | .783±.163 | .628±.335 | .727±.166 | .046±.016 | .003 | 0 (6855) |
| BatchTopK (k=256) | .773±.172 | .311±.339 | .799±.270 | .117±.006 | .002 | 0 (11638) |
| Matryoshka (RW) | .897±.035 | .022±.026 | .990±.013 | .807±.002 | .003 | 1 (23) |
| Matryoshka (UW) | .873±.030 | .010±.014 | .995±.007 | .806±.003 | .002 | 0 (26) |

Appendix G Interpreting CLIP with MSAE: Additional Results

In this appendix, we provide additional analyses supporting Section 5. Section G.1 presents high-magnitude activation samples across modalities from MSAE (RW) at expansion rate 8. Section G.2 demonstrates how the SAE enhances similarity search with interpretable results. Section G.3 presents a statistical gender bias analysis on the CelebA dataset, supported by concept manipulation visualizations that reinforce the statistical findings. These analyses strengthen our findings from Section 5 while providing deeper insight into MSAE's interpretability capabilities.

G.1 Concept Visualization Analysis

Figures 18 and 19 showcase six valid concepts through their highest-activating images and texts, confirming their validity.
Conversely, Figure 20 demonstrates two invalid concepts, highlighting the importance of the validation methods from Section A.

Figure 18: High-magnitude image activations for valid concepts. We gather the top-activating ImageNet-1k images for six valid MSAE (RW) concept neurons.

Figure 19: Cross-modal highest valid concept activation samples. Extending Figure 8, we show the highest-activating ImageNet-1k images and CC3M texts for valid MSAE (RW) concepts: smile, alcoholic, trio, heart, running, and questions.

Figure 20: Analysis of invalid concept neurons in MSAE (RW). In (a), we showcase the invalid concept '6' with a low similarity score (<0.42), which shows inconsistent presence of the number six among the top active samples. In (b), we show how a low ratio threshold (0.45/0.44 < 2) can indicate a broader 'h' concept rather than the specific 'hl'/'hri' entries from the vocabulary.

G.2 SAE-Enhanced Similarity Search

Building upon Section 5, we demonstrate how the SAE enhances nearest neighbor (NN) search by revealing shared semantic concepts between query and retrieved images. Figure 21 illustrates how the SAE uncovers interpretable features that drive CLIP's similarity assessments. Furthermore, we show that conducting similarity search directly in the SAE activation space produces results comparable to CLIP-based search while providing more semantically meaningful matches.

Figure 21: SAE-enhanced similarity search. Examples demonstrating how the SAE reveals shared semantic concepts (bottom row) between query images and their CLIP nearest neighbors (top row), providing interpretable explanations for similarity matches. Additionally, the two rightmost examples show nearest neighbors retrieved by SAE activation similarity, demonstrating that searching in the SAE space yields results similar to CLIP-based search while making the retrieval process more semantically interpretable.
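To make the retrieval procedure concrete, below is a minimal sketch (not the paper's implementation) of nearest-neighbor search in SAE activation space, with a match explained by the concepts active in both the query and the retrieved item. The random `clip_emb` matrix and `W_enc` weights are hypothetical stand-ins for real CLIP image embeddings and a trained TopK SAE encoder; a real pipeline would load both from disk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: n CLIP image embeddings of dimension d, and the
# encoder weights of a trained TopK SAE at expansion rate 8 (d x 8d).
n, d = 1000, 768
m = 8 * d
clip_emb = rng.normal(size=(n, d)).astype(np.float32)
W_enc = (rng.normal(size=(d, m)) / np.sqrt(d)).astype(np.float32)

def topk_acts(x, k=32):
    """Sparse SAE activations: ReLU pre-activations, keep the top-k per sample."""
    pre = np.maximum(x @ W_enc, 0.0)
    idx = np.argpartition(pre, -k, axis=-1)[:, -k:]
    z = np.zeros_like(pre)
    np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return z

def rank_neighbors(query_vec, bank):
    """Indices of bank rows sorted by descending cosine similarity to query_vec."""
    q = query_vec / np.linalg.norm(query_vec)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return np.argsort(b @ q)[::-1]

acts = topk_acts(clip_emb)

# Retrieve in SAE space; explain the match by the jointly active concept neurons.
query = 0
neighbors = rank_neighbors(acts[query], acts)
best = neighbors[1]  # neighbors[0] is the query itself
shared = np.flatnonzero((acts[query] > 0) & (acts[best] > 0))
```

Each index in `shared` names an SAE concept neuron active for both images; mapping those indices to validated concept labels yields the interpretable explanations shown in Figure 21.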
G.3  Gender Bias Analysis in CelebA

We analyze gender biases in a CLIP-based classification model using the CelebA dataset, which forms the foundation for our analysis in Section 5.3. Through a statistical analysis of concept magnitude distributions against the model's gender predictions in Figure 24, we identify significant gender associations for the concepts bearded, blondes, and glasses in the classification model. To verify that these concepts align with the true features in the CelebA dataset, we visualize the highest-activating images for each concept in Figure 22. Further concept manipulation experiments on both female (Figure 7) and male (Figure 23) examples confirm and strengthen these statistical findings, offering deeper insight into the relationship between gender classification and the chosen concepts.

Figure 22: Highest-activating CelebA images for gender-associated concepts. We visualize images from the CelebA test set that produce the highest activations for the concepts bearded, blondes, and glasses, validating their alignment with each concept.

Figure 23: Impact of concept manipulation for the male example. Complementing Figure 7, we further strengthen our findings of a male association for bearded, a moderate association for glasses, and a female bias for the blondes concept.

Figure 24: Statistical analysis of concept-gender associations. We analyze six concepts: bearded, blondes, black, hair, glasses, and ginger. For each concept, we show the density distribution of concept magnitude against gender prediction, alongside corresponding boxplots. The results reveal that bearded, blondes, and glasses exhibit significant gender-specific associations.

Appendix H  Stability Evaluations

To gain a deeper understanding of the stability of learned feature directions in the encoder and decoder across training seeds, we calculate the stability metric proposed by Paulo & Belrose (2025). Table 18 shows the results for all tested SAEs at an expansion rate of 8.
We observe that the stability metric is highly correlated with sparsity. Furthermore, Matryoshka SAE demonstrates a stability trade-off comparable to the alternative architectures.

Table 18: Stability-Sparsity-Reconstruction trade-off (Pareto front) for CLIP (ViT-L/14) on ImageNet-1k. Rows are sorted by sparsity. We observe that (1) stability is highly correlated with sparsity, and (2) Matryoshka SAE exhibits a stability trade-off on par with the other architectures.

Model               Sparsity (L0 ↑)   Reconstruction (FVU ↓)   Stability (Decoder/Encoder ↑)
TopK (k=32)         .960              .245                     .649/.245
TopK (k=64)         .950              .172                     .688/.240
TopK (k=128)        .928              .098                     .625/.187
TopK (k=512)        .922              .336                     .248/.187
ReLU (λ=0.03)       .920              .185                     .522/.124
TopK (k=256)        .900              .011                     .624/.235
BatchTopK (k=128)   .898              .082                     .622/.238
BatchTopK (k=256)   .882              .010                     .573/.231
BatchTopK (k=64)    .877              .162                     .586/.238
Matryoshka (RW)     .829              .007                     .437/.102
BatchTopK (k=32)    .776              .242                     .467/.168
ReLU (λ=0.01)       .762              .033                     .401/.068
Matryoshka (UW)     .748              .002                     .366/.065
BatchTopK (k=16)    .698              .371                     .352/.108
ReLU (λ=0.003)      .649              .004                     .334/.042
ReLU (λ=0.001)      .553              .002                     .200/.041
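To make the stability column concrete, the sketch below computes a simplified seed-stability score: for each feature direction (a row of a decoder or encoder weight matrix) from one seed, find its best-match cosine similarity among the directions of a second seed, and average. This is only an illustration with toy random matrices; the metric of Paulo & Belrose (2025) may differ in detail, e.g. by enforcing a one-to-one matching between features rather than the greedy best match used here.

```python
import numpy as np

def stability(W_a, W_b):
    """Mean best-match cosine similarity between feature directions of
    two seeds (rows of W_a, W_b). Simplified sketch of a seed-stability
    metric; the published version may use an optimal one-to-one matching."""
    A = W_a / np.linalg.norm(W_a, axis=1, keepdims=True)
    B = W_b / np.linalg.norm(W_b, axis=1, keepdims=True)
    sim = A @ B.T                 # pairwise cosine similarities
    return sim.max(axis=1).mean() # best match per seed-A feature, averaged

rng = np.random.default_rng(0)
W1 = rng.standard_normal((512, 64))                # toy decoder, "seed 1"
# "Seed 2": same features, permuted and slightly perturbed.
W2 = W1[rng.permutation(512)] + 0.1 * rng.standard_normal((512, 64))
W3 = rng.standard_normal((512, 64))                # unrelated seed

print(f"permuted + noisy copy: {stability(W1, W2):.3f}")  # near 1: stable
print(f"independent seed:      {stability(W1, W3):.3f}")  # low: unstable
```

A score near 1 means each learned direction reappears (up to permutation) in the other seed, matching the intuition behind the decoder/encoder columns of Table 18.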