
Paper deep dive

LLM Interpretability with Identifiable Temporal-Instantaneous Representation

Xiangchen Song, Jiaqi Sun, Zijian Li, Yujia Zheng, Kun Zhang

Year: 2025 · Venue: arXiv preprint · Area: Mechanistic Interp. · Type: Theoretical · Embeddings: 156

Models: Pythia-160m-deduped

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 94%

Last extracted: 3/12/2026, 6:01:55 PM

Summary

The paper introduces an identifiable temporal causal representation learning framework for Large Language Models (LLMs). It addresses the limitations of sparse autoencoders (SAEs) by incorporating time-delayed and instantaneous causal relations, providing theoretical guarantees for latent variable recovery, and ensuring scalability for high-dimensional LLM activation spaces.

Entities (4)

Large Language Models · technology · 99%
Sparse Autoencoders · method · 98%
Causal Representation Learning · framework · 95%
Identifiable Temporal Causal Representation Learning Framework · method · 92%

Relation Signals (3)

Sparse Autoencoders lacks Temporal dependency modeling

confidence 95% · Mechanistic interpretability tools such as sparse autoencoders (SAEs) were developed to extract interpretable features from LLMs but lack temporal dependency modeling

Identifiable Temporal Causal Representation Learning Framework models Time-delayed causal relations

confidence 95% · capturing both time-delayed and instantaneous causal relations.

Identifiable Temporal Causal Representation Learning Framework extends Sparse Autoencoders

confidence 90% · By extending SAE techniques with our temporal causal framework, we successfully discover meaningful concept relationships in LLM activations.

Cypher Suggestions (2)

Identify limitations of a specific method · confidence 95% · unvalidated

MATCH (m:Method {name: 'Sparse Autoencoders'})-[:LACKS]->(c:Capability) RETURN c.name

Find all methods used for LLM interpretability · confidence 90% · unvalidated

MATCH (m:Method)-[:USED_FOR]->(t:Technology {name: 'Large Language Models'}) RETURN m.name

Abstract

Despite Large Language Models' remarkable capabilities, understanding their internal representations remains challenging. Mechanistic interpretability tools such as sparse autoencoders (SAEs) were developed to extract interpretable features from LLMs but lack temporal dependency modeling, instantaneous relation representation, and more importantly theoretical guarantees, undermining both the theoretical foundations and the practical confidence necessary for subsequent analyses. While causal representation learning (CRL) offers theoretically grounded approaches for uncovering latent concepts, existing methods cannot scale to LLMs' rich conceptual space due to inefficient computation. To bridge the gap, we introduce an identifiable temporal causal representation learning framework specifically designed for LLMs' high-dimensional concept space, capturing both time-delayed and instantaneous causal relations. Our approach provides theoretical guarantees and demonstrates efficacy on synthetic datasets scaled to match real-world complexity. By extending SAE techniques with our temporal causal framework, we successfully discover meaningful concept relationships in LLM activations. Our findings show that modeling both temporal and instantaneous conceptual relationships advances the interpretability of LLMs.

Tags

ai-safety (imported, 100%) · interpretability (suggested, 80%) · mechanistic-interp (suggested, 92%) · theoretical (suggested, 88%)

Links

PDF not stored locally. Use the link above to view on the source site.

Full Text

155,505 characters extracted from source content.


LLM Interpretability with Identifiable Temporal-Instantaneous Representation

Xiangchen Song †,1, Jiaqi Sun †,1, Zijian Li 2, Yujia Zheng 1, Kun Zhang 1,2
1 Carnegie Mellon University  2 Mohamed bin Zayed University of Artificial Intelligence
xiangchensong,jiaqisun,kunz1@cmu.edu

Abstract: Despite Large Language Models' remarkable capabilities, understanding their internal representations remains challenging. Mechanistic interpretability tools such as sparse autoencoders (SAEs) were developed to extract interpretable features from LLMs but lack temporal dependency modeling, instantaneous relation representation, and more importantly theoretical guarantees, undermining both the theoretical foundations and the practical confidence necessary for subsequent analyses. While causal representation learning (CRL) offers theoretically grounded approaches for uncovering latent concepts, existing methods cannot scale to LLMs' rich conceptual space due to inefficient computation. To bridge the gap, we introduce an identifiable temporal causal representation learning framework specifically designed for LLMs' high-dimensional concept space, capturing both time-delayed and instantaneous causal relations. Our approach provides theoretical guarantees and demonstrates efficacy on synthetic datasets scaled to match real-world complexity. By extending SAE techniques with our temporal causal framework, we successfully discover meaningful concept relationships in LLM activations. Our findings show that modeling both temporal and instantaneous conceptual relationships advances the interpretability of LLMs.

1 Introduction

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language tasks, from question answering to content generation. Despite these achievements, a fundamental understanding of their internal representations remains underexplored.
This gap between performance and interpretability poses significant challenges for ensuring the reliability, safety, and appropriate deployment of these increasingly powerful systems [47]. Mechanistic interpretability (MI) aims to bridge this gap by reverse-engineering neural networks to understand how they process and represent information [14]. Among all MI tools, sparse autoencoders (SAEs) have emerged as a promising approach for extracting interpretable features from LLMs [14,52]. By decomposing the high-dimensional activations of LLMs into sparse, monosemantic features, SAEs help identify the basic units of computation within these complex systems.

However, SAEs present several limitations that restrict their utility for comprehensive model understanding. First, SAEs treat each feature as an isolated representation, failing to capture how features influence one another. This omission disregards semantic connections and transitions within a sequence, which are known as temporal or time-delayed relationships between features.* Second, SAEs lack mechanisms to represent instantaneous or logical relationships between features, such as mutual exclusivity or co-occurrence constraints [32]. These relationships complement the temporal dynamics by encoding structural dependencies within the same time step. Third, and most critically, SAEs offer no theoretical guarantees of the uniqueness of the recovered features. This absence undermines confidence that the extracted features reflect meaningful and stable latent variables, rather than arbitrary or unstable transformations [54].

† Equal contribution. * Alternative approaches, such as [1,30], use attention scores from the LLM to infer time-delayed influence.

39th Conference on Neural Information Processing Systems (NeurIPS 2025). arXiv:2509.23323v2 [cs.LG] 2 Jan 2026
Fortunately, to address these limitations, the causal representation learning (CRL) community has proposed a range of promising frameworks with theoretical guarantees [47]. For instance, [27] and [31] use sparse causal influence and interventions to uncover temporal and instantaneous relationships among latent variables. However, these methods face significant scalability challenges due to the computational inefficiency of estimating Jacobians. As a result, they typically scale to only dozens or hundreds of concepts [56], while interpretability in LLMs demands efficient modeling of thousands or even tens of thousands of concept features [52]. In summary, although CRL offers strong theoretical guarantees for recovering meaningful features and their causal relationships, its limited scalability in high-dimensional settings remains a major obstacle to practical deployment in LLM analysis.

To bridge this gap, in this paper we introduce a computationally efficient temporal causal representation learning framework specifically designed for the high-dimensional activation space of LLMs. Our approach builds upon recent advances in both sparse autoencoders for LLMs and causal representation learning for sequential data. The key contributions of our work are: (1) We propose a simple yet effective framework that jointly models time-delayed causal relations between concepts and instantaneous constraints, providing a more comprehensive understanding of how information flows through LLMs. (2) Leveraging the sparsity principle, we establish theoretical guarantees for our approach, making the learned representations reliable and explainable. (3) Grounded in the theoretical results, we design scalable and efficient algorithms tailored to the high-dimensional concept space of LLMs, significantly extending prior work in CRL. (4) We validate our approach on synthetic datasets scaled to match real-world complexity and demonstrate its effectiveness when applied to activations from real LLMs.
2 Problem Setting

[Figure 1: Graphical illustration of the data generation process. Latent variables z_t generate observed variables x_t through a linear mixing function; edges among the latents are either time-delayed or instantaneous.]

We begin by characterizing the generation process of LLM activations to establish interpretability guarantees. These activations, signals produced during inference, are widely assumed to be linearly generated from hidden concepts, consistent with the sparse autoencoder (SAE) literature [3,18]. However, existing formulations typically treat these concepts as independent, overlooking dependencies between them. In reality, earlier-token semantics often influence later tokens, and token generation depends jointly on the activation of multiple concepts. To account for these interactions, we introduce a data generation process with both temporal and instantaneous relations, adopting CRL terminology.
Given a token sequence s = (v_1, ..., v_k), let x_t = (x_{t,1}, ..., x_{t,m}) be the m-dimensional activation vector at token v_t for a specific layer. Following the linear representation hypothesis [43] and the SAE formulation [3,18], we assume:

x_t = g(z_t),  (1)

where g : R^n → R^m is the linear mixing function, and x_t and z_t are the observed and latent variables, respectively. Each latent variable z_{t,i} is governed by a structural equation model (SEM) capturing both time-delayed and instantaneous dependencies:

z_{t,i} = Σ_τ Σ_{j ∈ J_{i,τ}} B_{i,j,τ} z_{t−τ,j} + Σ_{j ∈ K_i} M_{i,j} z_{t,j} + ε_{t,i},  (2)

where B_{i,j,τ} is the coefficient for the time-delayed effect from z_{t−τ,j} to z_{t,i}; J_{i,τ} is the set of indices of latent variables that have a time-delayed effect on z_{t,i} with lag τ; M_{i,j} is the coefficient for the instantaneous effect from z_{t,j} to z_{t,i}; K_i is the set of indices of latent variables that have an instantaneous effect on z_{t,i}; and ε_{t,i} denotes temporally and spatially independent noise drawn from a distribution p_{ε_i}. The graphical model for this process is illustrated in Figure 1.

To interpret this data generation process in the context of LLM activations, x_t represents the activations in a specific layer l for token v_t, and the latent variables z_t can be considered the underlying causal factors that generate these activations. In this case, the instantaneous effects (coefficients M_{i,j}) reflect semantic or syntactic relationships between different latent factors within the same token, while time-delayed effects (coefficients B_{i,j,τ}) represent dependencies on previous tokens. Putting them together, the underlying data generating process can be written as

x_t = A z_t  (linear mixing),
z_{t,i} = Σ_{τ>0} Σ_{j ∈ J_{i,τ}} B_{i,j,τ} z_{t−τ,j} + Σ_{j ∈ K_i} M_{i,j} z_{t,j} + ε_{t,i},  i = 1, ..., n  (linear latent temporal SEM).  (3)

The linear latent temporal SEM in Eq. (3) induces two types of causal relationships: a time-delayed causal graph G_d, in which an edge z_{t−τ,j} → z_{t,i} exists if and only if B_{i,j,τ} ≠ 0, and an instantaneous causal graph G_e, in which an edge z_{t,j} → z_{t,i} exists if and only if M_{i,j} ≠ 0. We assume that G_e is acyclic, i.e., a directed acyclic graph (DAG), which implies that M can be permuted to a strictly lower-triangular form. Under this assumption, the conditional distribution of z_t given its past satisfies the Markov property with respect to G_e [45], namely

p(z_t | z_{<t}) = Π_{i=1}^{n} p(z_{t,i} | Pa_d(z_{t,i}), Pa_e(z_{t,i})).

Remark on the Linearity of the Model. We acknowledge that the internal mechanisms of LLMs are inherently nonlinear due to activation functions and attention mechanisms. However, our linear approach is justified by several considerations. First, many successful mechanistic interpretability techniques [15,42,10,1,30] rely on linear representation hypotheses as approximations of localized network behavior. Second, linear models provide an interpretable bridge between the complexity of neural networks and human understanding; they serve as simplified yet informative projections of the underlying causal mechanisms. Third, empirical evidence suggests that linear approximations can capture significant portions of the variance in activations within specific contexts [37,19], particularly when examining feature-to-feature relationships within a layer. More importantly, existing causal representation learning (CRL) methods cannot efficiently handle hundreds of latent variables, often encountering out-of-memory issues and prohibitively long computation times. A detailed discussion of these limitations is presented in Section 5.1. While nonlinear interactions certainly exist, our linear framework offers a tractable foundation for identifying causal relationships that can later be extended to incorporate more complex dependencies.
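The generation process in Eq. (3) is straightforward to simulate, which is essentially how the synthetic experiments later in the paper are constructed. Below is a minimal NumPy sketch under illustrative assumptions (the dimensions, sparsity pattern, and parameter ranges are ours, not the paper's experimental settings): since G_e is a DAG, M is taken strictly lower triangular, so for a single lag the SEM solves to z_t = (I − M)^{-1}(B z_{t−1} + ε_t).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 4, 8, 500  # latent dim, observed dim, sequence length (illustrative)

# Strictly lower-triangular instantaneous matrix M (so G_e is a DAG),
# a diagonal (hence sparse and stable) lag-1 matrix B, and a random mixing A.
M = np.tril(rng.normal(0.0, 0.3, (n, n)), k=-1)
B = np.diag(rng.uniform(0.2, 0.5, n))
A = rng.normal(0.0, 1.0, (m, n))

# Solving z_t = M z_t + B z_{t-1} + eps_t for z_t gives the recursion below.
inv = np.linalg.inv(np.eye(n) - M)
Z = np.zeros((T, n))
for t in range(1, T):
    eps = rng.laplace(0.0, 1.0, n)  # non-Gaussian noise, matching assumption A4
    Z[t] = inv @ (B @ Z[t - 1] + eps)
X = Z @ A.T  # x_t = A z_t

print(X.shape)  # (500, 8)
```

Recovering A, B, and M (up to the indeterminacies characterized in Section 3) from X alone is exactly the estimation problem addressed in Section 4.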
This approach follows the scientific principle of starting with simpler models that capture essential phenomena before introducing additional complexity.

3 Theoretical Guarantees

Recent work in causal representation learning, particularly for time-series data, has made substantial progress in handling both time-delayed and instantaneous causal relations. Under general assumptions on the data-generating process, strong identifiability results can be established, including recovery of latent variables up to component-wise transformations and estimation of the Markov network up to isomorphic equivalence. However, in our linear setting, the identifiability result of [27] is not directly applicable, since one key assumption (sufficient changes) cannot be satisfied; see Appendix A.1 for a detailed discussion. Therefore, in this section we establish identifiability by exploiting the autocovariance structure induced by linearity, following ideas inspired by [58].

Theorem 1 (Latent Indeterminacy). Suppose the estimated model (Â, {B̂_τ}_{τ=1}^L, M̂, p_ε̂) and the true model (A, {B_τ}_{τ=1}^L, M, p_ε) both generate x_t according to Eq. (3) and are observationally equivalent on the autocovariance matrices R_x(k) for k = 0, 1, ..., L. If conditions A1–A4 hold, then the model parameters are identifiable up to the following indeterminacies:

ε̂_t = P ε_t,  Â = A S,  (I − M̂) = P (I − M) S,  B̂_τ = P B_τ S,

where S ∈ R^{n×n} is invertible and P is a signed permutation matrix.

- A1 (Temporally white noise). Both e_t and ε_t are temporally white, and ε_t has i.i.d. mutually independent components. To remove scaling indeterminacy, assume Σ_{ε_t} = I and zero-mean data.
- A2 (Rank sufficiency). A ∈ R^{m×n} (m ≥ n) has full column rank, and B_L is full rank.
- A3 (Process stability). Eq. 22 defines a stable vector autoregression, i.e., all roots of det(I − Σ_{τ=1}^L B_τ y^{−τ}) = 0 lie strictly inside the unit circle.
- A4 (Non-Gaussianity). At most one component of ε_t is Gaussian.
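Assumption A3 can be checked numerically: a VAR with lag matrices B_1, ..., B_L is stable exactly when its companion matrix has spectral radius strictly below one, which is equivalent to the root condition stated in A3. A small sketch (the helper name is ours, not from the paper):

```python
import numpy as np

def is_stable_var(Bs):
    """Check assumption A3 for lag matrices Bs = [B_1, ..., B_L]: the VAR is
    stable iff every eigenvalue of the companion matrix lies strictly inside
    the unit circle (these eigenvalues are exactly the roots y of
    det(I - sum_tau B_tau y^{-tau}) = 0)."""
    L = len(Bs)
    n = Bs[0].shape[0]
    top = np.hstack(Bs)  # first block row: [B_1 ... B_L]
    # Sub-diagonal identity blocks shift the stacked state one lag back.
    bottom = np.eye(n * (L - 1), n * L) if L > 1 else np.empty((0, n))
    companion = np.vstack([top, bottom])
    return bool(np.max(np.abs(np.linalg.eigvals(companion))) < 1.0)

# A lag-1 transition with spectral radius 0.5 is stable; scaled past 1 it is not.
B1 = 0.5 * np.eye(3)
print(is_stable_var([B1]))        # True
print(is_stable_var([2.5 * B1]))  # False
```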
Proof Sketch. The proof consists of four steps. First, using the VAR representation of z_t and the linear mixing x_t = A z_t, we derive Yule–Walker-type recursions for the autocovariances R_x(k), yielding a linear system whose unknowns are the transformed coefficients C_τ = A (I − M)^{−1} B_τ A^{−1}. Second, by stacking lagged observations, we show that the system's coefficient matrix equals the covariance of a finite stacked vector of x_t, which is positive definite, and thus nonsingular, under the assumption of no nontrivial deterministic finite linear relations. This ensures identifiability of C_τ and of H H^T = A (I − M)^{−1} (I − M)^{−T} A^T. Third, exploiting column-space arguments and the invertibility of I − M, we show that any two observationally equivalent models must satisfy Â = A S, (I − M̂) = U^T (I − M) S, and B̂_τ = U^T B_τ S, where S is invertible and U is orthogonal. Finally, non-Gaussianity of the innovations implies that U must be a signed permutation matrix, yielding a precise characterization of the remaining indeterminacies.

Discussion. This result extends [58] to temporal processes with instantaneous relations. When M ≠ 0, the indeterminacy of A increases due to the additional mixing introduced by M, changing the ambiguity from an orthogonal transformation to a general invertible one. Consequently, both I − M and B_τ inherit right-multiplicative indeterminacies. Without further structural assumptions on M or B_τ, component-wise identifiability is impossible. However, since the indeterminacies are precisely characterized, additional structural constraints can be imposed to further improve identifiability, as shown in the following corollaries.

Corollary 1 (Component-wise Identifiability). Suppose the estimated model (Â, {B̂_τ}_{τ=1}^L, M̂, p_ε̂) and the true model (A, {B_τ}_{τ=1}^L, M, p_ε) both generate x_t according to Eq. (3) and are observationally equivalent on the autocovariance matrices R_x(k) for k = 0, 1, ..., L.
If conditions A1–A4 of Theorem 1 hold and, in addition, the following assumption on the instantaneous relations M is satisfied, then the latent variables z_t are identifiable up to permutation and scaling, i.e., Â = A P̃ D, where P̃ is a permutation matrix and D is a diagonal matrix with nonzero entries.

- A5 (Empty or unique column supports of M). Each column of M either has empty support or contains at least one index that does not appear in the support of any other column. The model is estimated under a sparsity constraint on M.

Proof. From Theorem 1, we have Â = A S and (I − M̂) = P (I − M) S. Since P only permutes rows, it does not affect column-support arguments. Under Assumption A5, any column of M has a unique nonzero support element, which prevents cancellation across columns. If a column S_{:,l} contained more than one nonzero entry, then P (I − M) S_{:,l} would necessarily have a strictly larger support than the corresponding column of I − M, violating the assumed sparsity. Since M has zero diagonal (by the SEM assumption), the same argument applies to M̂. Therefore, sparsity-enforced estimation excludes such S, implying that S must be a product of a permutation and a diagonal scaling matrix. Hence, z_t is component-wise identifiable.

Corollary 2 (Subspace Identifiability). Suppose the estimated model (Â, {B̂_τ}_{τ=1}^L, M̂, p_ε̂) and the true model (A, {B_τ}_{τ=1}^L, M, p_ε) both generate x_t according to Eq. (3) and are observationally equivalent on the autocovariance matrices R_x(k) for k = 0, 1, ..., L. If conditions A1–A4 of Theorem 1 hold and A6 is satisfied, then the latent variables z_t are subspace identifiable, i.e., Â = A S̃, where each column of S̃ has nonzero entries only within a single subspace.

- A6 (Subspace structure of M). The instantaneous relation matrix M admits a partition of [n] into K disjoint subsets such that M_{ij} ≠ 0 only if i and j belong to the same subset. The model is estimated with a sparsity constraint on M.

Proof.
From Theorem 1, we have Â = A S and (I − M̂) = P (I − M) S. Assumption A6 implies that columns of M corresponding to different subspaces have disjoint supports. Since M has zero diagonal, the same property holds for I − M. If any column S_{:,l} mixed components from different subspaces, then the corresponding column of (I − M) S would necessarily contain nonzero entries from multiple subspaces, contradicting Assumption A6. Therefore, each column of S can only combine latent variables within the same subspace, implying subspace identifiability of z_t.

Discussion. Corollary 1 strengthens Theorem 1 by imposing a strong structural assumption (A5) on M, yielding component-wise identifiability. Corollary 2 relaxes this requirement by allowing instantaneous relations within latent subspaces, leading to subspace identifiability. Both results rely on sparsity constraints imposed during estimation. Moreover, each corollary admits a natural analogue for the lagged matrices B_τ. In particular, the counterpart of Corollary 1 requires nonzero diagonal entries in B_τ, while the analogue of Corollary 2 assumes a matching subspace structure. Furthermore, if M is forced to be strictly triangular, the right permutation indeterminacy must match the left one, i.e., P̃ = P, otherwise the strictly triangular structure would be destroyed. In practice, we therefore enforce sparsity on both the instantaneous and lagged relations and a strictly lower-triangular structure on the instantaneous adjacency. Empirically, the recovered structures from real data approximately satisfy the assumed conditions, while synthetic experiments explicitly enforce them to validate identifiability.

4 Implementation

Based on the data generation process in Eq. (3) together with the identifiability result presented in the previous section, we derive the following estimation process based on the standard sparse autoencoder.
Illustrated in Figure 2, the whole estimation process can be partitioned into three parts, namely (1) observation reconstruction, (2) independent noise estimation, and (3) sparsity regularization.

[Figure 2: Illustration of the estimation process. A linear encoder maps x_1, ..., x_T to ẑ_1, ..., ẑ_T and a linear decoder reconstructs x̂_1, ..., x̂_T (reconstruction loss L_r); B̂_τ represents the learned time-delayed causal relation and M̂ the instantaneous causal relation, which feed the independent noise estimation (loss L_n) and the sparsity regularization (loss L_s).]

4.1 Observation Reconstruction

First, we use a linear autoencoder to enforce the invertible linear transformation between observations x_t and latent variables z_t, and the reconstruction loss L_r is defined as

L_r = E_{x_{1:T}} [ Σ_{t=1}^T (x_t − x̂_t)^2 ],  (4)

where the reconstructed observation is calculated via a linear encoder and decoder:

x̂_t = Decoder(ẑ_t) and ẑ_t = Encoder(x_t).  (5)

4.2 Independent Noise Estimation

In prior works [56,49,27], this term refers to independent prior estimation, which essentially utilizes the independence of the noise to enforce the independence of the latent variables z_{t,i}, conditioned on the parents Pa(z_{t,i}). In our case, since the whole process is linear, we can directly estimate and enforce the independent-noise condition by learning a residual network that reverses the data generation process described in Eq. (3):

ε̂_t = ẑ_t − M̂ ẑ_t − Σ_{τ>0} B̂_τ ẑ_{t−τ},  (6)

where the estimated latent variables are given by Eq. (5). Following prior works, to enforce the independence of the noise terms, we model the noise distribution p(ε̂_{t,i}) with an isotropic Laplacian † distribution, and we minimize its KL-divergence with the estimated noise term.
$$\mathcal{L}_n = \mathbb{E}_{\hat{\varepsilon}_t}\left[\|\hat{\varepsilon}_t\|_1\right]. \tag{7}$$

† In prior works this distribution is Gaussian. In the linear case, however, as is well discussed in the linear ICA literature, the density of an isotropic Gaussian is rotation invariant; hence we use the Laplacian distribution in our estimation.

4.3 Sparsity Regularization

Without any further constraint, the noise estimation module may introduce redundant causal edges from $\hat{z}_{t-1}$ and $\hat{z}_{t,[m]}$ to $\hat{z}_{t,i}$, leading to incorrect estimation. As mentioned in Sec. 4.2, $B_\tau$ and $M$ intuitively denote the time-delayed and instantaneous causal structures, since they describe how $\hat{z}_{t-1}$ and $\hat{z}_{t,[m]}$ contribute to $\hat{z}_{t,i}$. This motivates us to remove redundant causal edges with a sparsity regularization term $\mathcal{L}_s$, an L1 penalty on $\hat{B}_\tau$ and $\hat{M}$. Formally,

$$\mathcal{L}_s = \left(\sum_\tau \|\hat{B}_\tau\|_1\right) + \|\hat{M}\|_1, \tag{8}$$

where $\|\cdot\|_1$ denotes the L1 norm of a matrix. We restrict $\hat{M}$ to be strictly lower triangular to match the permutation indeterminacy on both sides of $M$ and $B_\tau$. Finally, the total loss of the model can be formalized as

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_r + \alpha\mathcal{L}_n + \beta\mathcal{L}_s, \tag{9}$$

where $\alpha, \beta$ denote hyper-parameters.

5 Experiments

Our experimental evaluation addresses five key claims regarding our proposed method: (1) our estimation approach aligns with the identifiability theory, accurately recovering latent structures; (2) existing CRL methods fail to handle high-dimensional data at scale; (3) our method is able to recover target relations between concepts from semi-synthetic data; (4) compared with common SAEs, our proposal achieves satisfactory results on quantitative evaluation metrics (SAEBench [24]); and (5) our method effectively learns both time-delayed and instantaneous causal relations among concepts elicited from LLM activations.
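As a concrete illustration of the objective in Eqs. (4)–(9), the full loss can be sketched in a few lines of numpy. All parameter values below are hypothetical stand-ins for learned quantities, and only a single lag is used:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 200, 4                       # sequence length, latent dimension

# Hypothetical learned parameters (stand-ins for training).
A = rng.normal(size=(n, n))         # invertible linear decoder: x_t = A z_t
A_inv = np.linalg.inv(A)            # matching linear encoder
M_hat = np.tril(rng.normal(size=(n, n)), k=-1)              # strictly lower triangular
B_hat = rng.normal(size=(n, n)) * (rng.random((n, n)) < 0.3)  # sparse, one lag

x = rng.normal(size=(T, n))         # observed activations

z_hat = x @ A_inv.T                 # Eq. (5): encode
x_hat = z_hat @ A.T                 # Eq. (5): decode
L_r = np.mean(np.sum((x - x_hat) ** 2, axis=1))             # Eq. (4)

# Eq. (6): residual noise (skip t = 0, which has no lagged latent).
eps_hat = z_hat[1:] - z_hat[1:] @ M_hat.T - z_hat[:-1] @ B_hat.T
L_n = np.mean(np.abs(eps_hat))                              # Eq. (7): Laplacian fit

L_s = np.abs(B_hat).sum() + np.abs(M_hat).sum()             # Eq. (8), single lag
alpha, beta = 1.0, 1e-3                                     # hypothetical weights
L_total = L_r + alpha * L_n + beta * L_s                    # Eq. (9)
```

Because the encoder here is the exact inverse of the decoder, the reconstruction term vanishes up to floating-point error; in training, all of these matrices would instead be optimized jointly against $\mathcal{L}_{\mathrm{total}}$.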
5.1 Synthetic Data Experiments

First, using synthetic data, we demonstrate that our method can recover both the latent variables $z_t$ and the causal structure, including the time-delayed relations $B_\tau$ and the instantaneous relations $M$.

Identifiability Verification. To establish the effectiveness of our approach, we generate simulated time-series data with a latent causal process as introduced in Eq. (3). We apply our method to single-time-lag synthetic data generated with a randomly initialized matrix $A$ and fixed transition matrices $B$ and $M$, visualized in Figures 3a and 3c. Further details can be found in Appendix A.3. We visualize the estimated parameters by plotting the recovered matrices $\hat{B}$ and $\hat{M}$ alongside the correlation coefficient matrix used for calculating the mean correlation coefficient (MCC) score. As shown in Figure 3, comparing with the ground-truth transition matrices $B$ and $M$, we observe that both time-delayed and instantaneous causal relations are precisely recovered. Furthermore, Figure 3e demonstrates that the latent variables $z_t$ are also accurately recovered, confirming the identifiability properties of our method.
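The MCC evaluation used above can be sketched as follows: generate latents according to Eq. (3), then score a candidate recovery by the mean absolute Pearson correlation under the best permutation. The matrices and the "recovered" latents below are hypothetical; a perfect estimator is simulated by permuting and sign-flipping the ground truth:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
T, n = 2000, 3

# Hypothetical ground-truth structure: one lag B, strictly lower-triangular M.
B = np.array([[0.5, 0.0, 0.0],
              [0.0, 0.4, 0.0],
              [0.3, 0.0, 0.5]])
M = np.array([[0.0, 0.0, 0.0],
              [0.6, 0.0, 0.0],
              [0.0, 0.7, 0.0]])
inv_I_M = np.linalg.inv(np.eye(n) - M)

# Eq. (3): z_t = M z_t + B z_{t-1} + eps_t  =>  z_t = (I - M)^{-1}(B z_{t-1} + eps_t)
z = np.zeros((T, n))
for t in range(1, T):
    z[t] = inv_I_M @ (B @ z[t - 1] + rng.laplace(size=n))

# A perfectly recovered estimate up to permutation and sign, for illustration.
z_rec = z[:, [2, 0, 1]] * np.array([1.0, -1.0, 1.0])

corr = np.abs(np.corrcoef(z.T, z_rec.T)[:n, n:])   # |Pearson| true vs. recovered
mcc = max(np.mean([corr[i, p[i]] for i in range(n)])
          for p in permutations(range(n)))
```

A brute-force permutation search suffices at this toy scale; at realistic dimensions the matching is typically solved with a linear assignment algorithm instead.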
[Figure 3: Visualization of recovered causal graphs of latent variables. (a) and (b) show the ground-truth and estimated time-delayed matrices, respectively. (c) and (d) show the ground-truth and estimated instantaneous causal relations, respectively. (e) displays the correlation between the ground-truth and recovered latent variables.]

Second, we scale the synthetic experiments to dimensions matching LLM activations, illustrating why existing CRL methods fail in these high-dimensional settings.

Challenges in Scaling to Large Language Model Activation Dimensions. Before presenting results on the expanded synthetic data, we investigate the computational bottleneck, Jacobian calculation, and explain why existing CRL methods do not extend efficiently to high-dimensional settings, thereby further motivating our use of a linear dynamical model.
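The contrast motivating the linear model can be made concrete: for a nonlinear transition model, the Jacobian must be recomputed by automatic differentiation at every training step, whereas for the linear dynamics of Eq. (3) it is a fixed function of the parameters. A minimal sketch (parameter values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 512   # a dimension where per-step autograd Jacobians already get costly

M = np.tril(rng.normal(scale=0.05, size=(n, n)), k=-1)   # instantaneous relations
B = rng.normal(scale=0.05, size=(n, n))                  # one-lag relations

# From z_t = M z_t + B z_{t-1} + eps_t we get
# z_t = (I - M)^{-1} (B z_{t-1} + eps_t),
# so the one-step Jacobian d z_t / d z_{t-1} is a single linear solve,
# independent of t, with no autograd pass required.
J = np.linalg.solve(np.eye(n) - M, B)
```

The solve is performed once per parameter update rather than once per sample, which is what makes the linear model tractable at LLM activation dimensions.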
[Figure 5: Computation time and memory usage for a single-step Jacobian as a function of input dimensionality. Both metrics grow superlinearly and exceed the capacity of modern GPUs when the input dimension is greater than 1000.]

[Figure 6: MCC and total compute time in hours required to train the linear model as a function of input dimension.]

Computation cost of Jacobian evaluation. We take IDOL [27] as a representative method and measure both the wall-clock time and memory requirements for computing the Jacobian in its prior network. Figure 5 demonstrates that both time and memory complexity grow polynomially with dimensionality. At dimensions of several thousand, which are common for Large Language Model activations, a single Jacobian evaluation requires approximately ten seconds on a modern GPU, and such computation cannot fit into current-generation hardware infrastructure. Since CRL training invokes this operation millions of times during the training process, the cumulative computational cost becomes prohibitive. As other CRL algorithms involve comparable Jacobian computations or more complex operations, this fundamental limitation applies broadly across the field.

Advantages of linear models. When a linear model provides an adequate approximation of the transition dynamics of the hidden concepts, the Jacobian can be derived directly from model parameters such as $B$ and $M$, which significantly reduces the computational burden. Furthermore, such a linear model scales efficiently on current-generation compute resources. To support this claim, we conducted a scaling experiment using the linear model on synthetic data with dimensionalities ranging from 128 to 1024.
In each setting, the model was trained on 50 million samples, simulating the typical training load of real LLM SAEs trained on 50 million tokens. As shown in Figure 6, the proposed method scales to substantially higher dimensions while maintaining a high MCC of approximately 0.9. Additionally, it remains computationally efficient, with total computation time scaling linearly. In contrast, IDOL [27] exhausts memory when the dimensionality exceeds 200, and iCITRIS [31] fails to scale beyond 16 dimensions.

5.2 Semi-synthetic Experiments

Given the previous experiments on synthetic data, our proposal has been shown to recover ground-truth relationships even when the hidden dimensionality reaches one thousand, which would be challenging for existing non-parametric CRL approaches. We now proceed to evaluate real-world LLM activations, beginning with claim (3). The experimental settings are briefly introduced below; full details can be found in Appendix A.4.1.

Table 1: Relation recovery scores (↑) for concept–relation extraction on semi-synthetic data.

Method            Legal   XML    Email
SAE+regression     0.54   0.94   0.74
Ours              19.95   8.63   2.66

Data preparation: We first examine three types of text, each exhibiting an obvious syntactic pattern. For example, in legal text, sequences often begin with “APPEALS” and end with “AFFIRMED”. For illustration, we focus on the legal-text contrastive corpus group. We constructed two contrastive subsets from the Pile dataset [17]: one containing legal documents with highly structured syntax and
stable temporal patterns, and the other containing unstructured non-legal text. We hypothesized that only texts containing these structured relations would yield meaningful temporal concept patterns, and tested whether the model could recover them.

[Figure 7: A case study illustrating two relation types identified in a United States legal text. The blue elements show a time-delayed relation: the term “appeals” (feature 2579) is typically followed by “affirmed” (feature 1594) when a higher court confirms the lower court's decision. The red elements show an instantaneous relation: two geographical-location concepts (features 2592 and 2623) are activated together in the same passage.]

Baseline: Since no directly applicable baseline exists, we used the standard SAEs trained above. As SAEs cannot capture concept-to-concept relations, we fitted a regression model to estimate temporal relation matrices $\tilde{B}_\tau$ via $z_t = \sum_\tau \tilde{B}_\tau z_{t-\tau}$.

Evaluation: We compute the relation recovery score by first identifying the top concept pair $(i, j)$ in legal contexts (ensuring that the two corresponding concepts do not fire in the non-legal text), then taking the corresponding coefficient $B_{i,j}$ and normalizing it by the standard deviation $\sigma(B)$. The ratio $B_{i,j}/\sigma(B)$ serves as a relation recovery score, indicating how strongly the relation stands out from noise. As shown in Table 1, our method achieves a significantly higher score, demonstrating successful recovery of the relation. Finally, as concept recovery is already achievable by standard SAEs, we additionally conducted steering-vector semi-synthetic experiments to verify that our proposal can also recover concepts, following the approach of [23]. Further details are provided in Appendix A.4.2.
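The relation recovery score described above reduces to one line. The matrix below is a synthetic stand-in for a learned relation matrix, and the indices are the concept pair from the legal case study:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3072                          # feature dimensionality used in the main text

# Synthetic stand-in for a learned temporal relation matrix.
B = rng.normal(scale=0.01, size=(n, n))
i, j = 2579, 1594                 # "appeals" -> "affirmed" pair from Figure 7
B[i, j] = 0.5                     # a strong learned coefficient, for illustration

# Score: how strongly the coefficient stands out from the matrix-wide noise.
score = B[i, j] / B.std()
```

With near-Gaussian background entries, a score in the double digits (as for "Ours" in Table 1) indicates a relation far outside the noise floor, while scores near 1 (the SAE+regression baseline) are indistinguishable from noise.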
5.3 Real LLM Activation Analysis

Experiment Setup. We train our linear model on activations from the pretrained LLM pythia-160m-deduped [5], using SAELens [6] and dictionary-learning [35] for activation extraction. The model is trained on 50 million tokens from the Pile dataset [17]. To capture time-delayed influences, we set $\tau \le 20$ in Eq. 3 and aggregate the $B_\tau$ matrices using max-pooling, preserving any causal link that appears at any time step. We evaluate three feature dimensionalities: 768 (matching the LLM's hidden size and aligned with Section 3), 3072, and 6144, the latter two following common SAE training setups. Unless specified, main-text examples use 3072-dimensional features with $\tau = 20$. Full training details and additional results (including sensitivity and ablation studies) are in Appendix A.5.

Table 2: Comparison of our method against ReLU and TopK SAEs on SAEBench metrics.

Model       Recon. Loss ↓   Sparse Prob. ↑   Absorp. ↓   Autointerp ↑
ReLU SAE    0.0110          0.6555           0.0141      0.6791
TopK SAE    0.0097          0.7141           0.0280      0.6822
Ours        0.0108          0.6736           0.0139      0.6883

Quantitative Evaluation on SAEBench. Before diving into the details of concept-relation recovery, we first present a quantitative comparison between our method and existing SAE approaches. Since our main contribution lies in recovering temporal and instantaneous concept-to-concept relations, which are not reflected in current SAE benchmarks, we expect our model to perform on par with established SAEs on SAEBench tasks. This expectation is confirmed by the results in Table 2. Additional experimental results on larger latent sizes and model sizes can be found in Appendix A.5.5.

Case Studies. We start with an illustrative case in Figure 7, demonstrating how our model uncovers interpretable concept features with both time-delayed and instantaneous causal relationships from real-world LLM activations. This example provides an integrated view of how concepts are structured

Table 3: Representative time-delayed and instantaneous concept relations discovered.
ID    From                                                                       ID    To                                                                         Coeff.

Time-delayed relations
1657  Keywords for formal and official content (e.g., senate, state, military)  1664  Verbs for official/formal usage (e.g., deny, press, order, sign)           0.99
2641  Adjectives of nationality (e.g., Japanese, Italians)                       2674  Nouns that follow nationality (e.g., brands, name)                         0.81
1657  Keywords for formal and official content (e.g., senate, state, military)  1124  Objective adjectives in formal usage (e.g., fast, continuous, incomplete)  0.74

Instantaneous relations
2208  Partial appellate citation with volume number                              227   Partial appellate citation with volume number and series index             0.23
1714  Coding-format signals (e.g., localization tags, HTML tags)                 80    Coding-format content (e.g., key–value pairs, HTML elements)               0.16
1582  Month (e.g., March)                                                        363   Full date (e.g., March 23, 2000)                                           0.02

over time and interact within a single time step. Note that feature interpretations may vary beyond this case; additional examples and discussions appear in Appendix A.5. Figure 7 highlights two key observations. First, a time-delayed causal link between concepts related to “appeals” and “affirmed” in legal texts (features 2579 and 1594) captures how the model reflects the procedural flow of legal judgments. Second, an instantaneous relation between two geographical-location concepts (features 2592 and 2623) that are activated together in legal passages suggests that the model represents related spatial information simultaneously rather than sequentially. This example demonstrates that both time-delayed and instantaneous relations exist among concept features, and that these are interpretable alongside the semantic meanings of the features, both of which are essential for LLM interpretability. To further demonstrate our model's capacity to uncover both types of causal relationships, we present a broader set of examples in Table 3, which showcases representative cases of both time-delayed and instantaneous interactions among concept features.
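The aggregation and ranking behind examples like those in Table 3 can be sketched as follows: max-pool the per-lag matrices $\hat{B}_\tau$ so that a link present at any lag survives, then rank entries by magnitude. All sizes and values below are hypothetical toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(5)
n_lags, n = 20, 6                 # tau <= 20 window, toy feature count

# Hypothetical per-lag transition matrices B_hat_tau, tau = 1..20.
B_taus = rng.normal(scale=0.02, size=(n_lags, n, n))
B_taus[3, 1, 0] = 0.9             # a strong link that appears only at lag 4

# Max-pool over lags: preserve any causal link appearing at any time step.
B_agg = np.abs(B_taus).max(axis=0)

# Rank candidate relations by aggregated coefficient (excluding self-links).
np.fill_diagonal(B_agg, 0.0)
order = np.argsort(B_agg, axis=None)[::-1]
# Entry (i, j) encodes lagged feature j -> feature i.
top_to, top_from = np.unravel_index(order[0], B_agg.shape)
```

Max-pooling deliberately discards which lag carried the link; the per-lag profile (the different $\hat{B}_\tau$ values) can still be inspected afterwards for any pair surfaced by the ranking.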
Time-delayed causal relations. We first observe a strong causal relation from nationality adjectives (feature 2641, "Japanese," "Italians") to the nouns they commonly modify (feature 2674, "brands," "literature"), with a coefficient of 0.81. Moreover, the coefficients across the 20-token temporal window (i.e., different B_τ) contribute consistently to the aggregated score. This suggests that such temporal relations can occur across a broad and uncertain time span, aligning with the semantic dynamics of real-world text generation. In formal contexts, official content words (feature 1657, "senate," "judge") influence both formal verbs (feature 1664, "deny," "order") with a coefficient of 0.99, and objective adjectives (feature 1124, "fast," "continuous") with a coefficient of 0.74. These relationships reflect how formal language constrains both action and descriptive style over time.

Instantaneous causal relations. Table 3 presents three distinct categories of instantaneous relations. First, we observe a relationship between two partially overlapping appellate-citation features, feature 2208 (volume numbers only) and feature 227 (volume number and series index), with a normalized coefficient of 0.23. This illustrates how the model captures structured elements that commonly co-occur in legal documents, forming a cohesive representational unit. Second, we find that coding-format signals (feature 1714, e.g., localization tags, HTML tags) have an instantaneous causal relationship with coding-format content (feature 80, e.g., key-value pairs, HTML elements), with a coefficient of 0.16. This reveals how the model processes structured syntax and its associated content as co-occurring elements. Finally, our method identifies a clear relationship between two features that both represent dates: feature 1582 (month only) and feature 363 (full date), suggesting complementary representations within the model's internal structure.
These findings demonstrate our method's ability to uncover both temporal and instantaneous causal structures in the concept space of LLM activations, offering insights into how models organize and process information. The identified relationships align with expected patterns in natural language across domains such as legal texts, temporal expressions, and structured formats, validating the effectiveness of our approach for analyzing information flow in large language models.

6 Related Work

LLM Interpretability. Understanding the internal representations of LLMs remains challenging despite significant progress [26]. Interpretability research on LLMs has explored multiple directions, including probing for linguistic knowledge [19], evaluating interpretability methods through controlled experiments [22], benchmarking SAEs' capacity to disentangle factual knowledge [12], and developing ground-truth evaluation frameworks [53, 25]. Recent work suggests that LLM representations may follow a linear organization [43], though this hypothesis has been challenged [16]. Our approach extends these efforts by focusing specifically on causal interpretability of temporal relationships in LLMs, providing a principled framework for understanding how information flows through model representations during sequential text generation. Additionally, sparse autoencoders (SAEs) decompose neural activations into interpretable features [14, 7, 52]. Initial work demonstrated that SAEs can recover meaningful features from language model activations [41], leading to numerous architectural innovations including alternative activation functions [50, 46], training optimizations [8, 20], and efficient dictionary-allocation mechanisms [4, 39]. Recent work has successfully scaled SAEs to larger models [18, 29, 2], enabling automated interpretation of millions of features [44].
Despite these advances, most SAE approaches treat features as isolated units without modeling temporal relationships [11, 9], lack explicit causal structure [34], and offer no identifiability guarantees [54, 33]; our work directly addresses these limitations.

Feature-based Causal Circuits. Recent methods such as Sparse Feature Circuits [36] and attribution graphs [1, 30] identify causal subnetworks explaining model behavior. These build on earlier circuit-analysis methods exploring component functionalities in vision and language models [42, 10, 15]. Targeted interventional studies have revealed specific functional circuits, such as indirect object identification [55] and factual associations [37]. While these methods enable mechanistic understanding of model computations [19, 40], they primarily rely on correlational measures rather than structured causal inference [16, 43], and they focus on stationary relationships [21] instead of modeling the evolving token-to-token dependencies critical for understanding sequential reasoning.

Causal Representation Learning. Causal representation learning provides identifiability guarantees for latent variables [56, 27, 47]. Temporal extensions model dynamics in sequential data [57, 31, 49], with recent advances addressing non-stationarity [48, 13] and instantaneous effects [32]. Multiple-distribution methods [59, 38] can recover causal structure under specific interventions or group structures. These approaches provide theoretical foundations for disentangling latent variables and identifying causal graphs [54]. However, existing CRL algorithms cannot scale to LLM dimensions due to computational bottlenecks in calculating Jacobians. Our linearized formulation maintains identifiability guarantees while enabling application to high-dimensional LLM representations, bridging theoretical CRL advances with practical LLM interpretability challenges.
7 Conclusion

We introduced a causal representation learning framework for LLMs that jointly models time-delayed relationships and instantaneous constraints between latent concepts. Our approach provides theoretical identifiability guarantees while solving the scalability limitations of existing CRL methods through a computationally efficient linear formulation. Synthetic experiments validated our method's ability to recover latent causal structures from toy scales up to real LLM scales. When applied to real LLM activations, our approach uncovered interpretable semantic patterns, revealing information-flow pathways during text generation. Future work could leverage these causal structures for targeted alignment interventions, explore cross-layer concept transformations, and integrate with mechanistic interpretability techniques.

8 Acknowledgment

The authors would like to thank the anonymous reviewers for helpful comments and suggestions during the reviewing process. The authors would also like to acknowledge the support from NSF Award No. 2229881, AI Institute for Societal Decision Making (AI-SDM), the National Institutes of Health (NIH) under Contract R01HL159805, and grants from Quris AI, Florin Court Capital, the MBZUAI-WIS Joint Program, and the Al Deira Causal Education project.

References

[1] Emmanuel Ameisen, Jack Lindsey, Adam Pearce, Wes Gurnee, Nicholas L. Turner, Brian Chen, Craig Citro, David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton, Trenton Bricken, Callum McDougall, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson, Sam Zimmerman, Kelley Rivoire, Thomas Conerly, Chris Olah, and Joshua Batson. Circuit tracing: Revealing computational graphs in language models. Transformer Circuits Thread, 2025.
[2] Anthropic Interpretability Team. Circuits updates — August 2024. Transformer Circuits Thread, 2024.
[3] Anthropic Interpretability Team. Training sparse autoencoders. https://transformer-circuits.pub/2024/april-update/index.html#training-saes, 2024. [Accessed January 20, 2025].
[4] Kola Ayonrinde. Adaptive sparse allocation with mutual choice & feature choice sparse autoencoders, 2024.
[5] Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, et al. Pythia: A suite for analyzing large language models across training and scaling. In International Conference on Machine Learning, pages 2397–2430. PMLR, 2023.
[6] Joseph Bloom, Curt Tigges, Anthony Duong, and David Chanin. SAELens. https://github.com/jbloomAus/SAELens, 2024.
[7] Trenton Bricken, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Conerly, Nick Turner, Cem Anil, Carson Denison, Amanda Askell, Robert Lasenby, Yifan Wu, Shauna Kravec, Nicholas Schiefer, Tim Maxwell, Nicholas Joseph, Zac Hatfield-Dodds, Alex Tamkin, Karina Nguyen, Brayden McLean, Josiah E Burke, Tristan Hume, Shan Carter, Tom Henighan, and Christopher Olah. Towards monosemanticity: Decomposing language models with dictionary learning. Transformer Circuits Thread, 2023. https://transformer-circuits.pub/2023/monosemantic-features/index.html.
[8] Bart Bussmann, Patrick Leask, and Neel Nanda. BatchTopK: A simple improvement for TopK SAEs, 2024.
[9] Bart Bussmann, Michael Pearce, Patrick Leask, Joseph Isaac Bloom, Lee Sharkey, and Neel Nanda. Showing SAE latents are not atomic using meta-SAEs, 2024.
[10] Nick Cammarata, Gabriel Goh, Shan Carter, Chelsea Voss, Ludwig Schubert, and Chris Olah. Curve circuits. Distill, 6(1):e00024–006, 2021.
[11] David Chanin, James Wilken-Smith, Tomáš Dulka, Hardik Bhatnagar, and Joseph Bloom. A is for absorption: Studying feature splitting and absorption in sparse autoencoders, 2024.
[12] Maheep Chaudhary and Atticus Geiger.
Evaluating open-source sparse autoencoders on disentangling factual knowledge in GPT-2 small, 2024.
[13] Guangyi Chen, Yifan Shen, Zhenhao Chen, Xiangchen Song, Yuewen Sun, Weiran Yao, Xiao Liu, and Kun Zhang. CaRiNG: Learning temporal causal representation under non-invertible generation process. arXiv preprint arXiv:2401.14535, 2024.
[14] Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, and Lee Sharkey. Sparse autoencoders find highly interpretable features in language models, 2023.
[15] Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, et al. A mathematical framework for transformer circuits. Transformer Circuits Thread, 1(1):12, 2021.
[16] Joshua Engels, Eric J. Michaud, Isaac Liao, Wes Gurnee, and Max Tegmark. Not all language model features are linear, 2024.
[17] Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, et al. The Pile: An 800GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020.
[18] Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, and Jeffrey Wu. Scaling and evaluating sparse autoencoders. arXiv preprint arXiv:2406.04093, 2024.
[19] Mor Geva, Roei Schuster, Jonathan Berant, and Omer Levy. Transformer feed-forward layers are key-value memories. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, editors, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5484–5495, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics.
[20] Davide Ghilardi, Federico Belotti, and Marco Molinari. Efficient training of sparse autoencoders for large language models via layer groups. arXiv preprint arXiv:2410.21508, 2024.
[21] Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, and Dimitris Bertsimas. Finding neurons in a haystack: Case studies with sparse probing, 2023.
[22] Jing Huang, Zhengxuan Wu, Christopher Potts, Mor Geva, and Atticus Geiger. RAVEL: Evaluating interpretability methods on disentangling language model representations, 2024.
[23] Shruti Joshi, Andrea Dittadi, Sébastien Lachapelle, and Dhanya Sridhar. Identifiable steering via sparse autoencoding of multi-concept shifts. arXiv preprint arXiv:2502.12179, 2025.
[24] Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Isaac Bloom, David Chanin, Yeu-Tong Lau, Eoin Farrell, Callum Stuart McDougall, Kola Ayonrinde, et al. SAEBench: A comprehensive benchmark for sparse autoencoders in language model interpretability. In Forty-second International Conference on Machine Learning, 2025.
[25] Adam Karvonen, Benjamin Wright, Can Rager, Rico Angell, Jannik Brinkmann, Logan Smith, Claudio Mayrink Verdun, David Bau, and Samuel Marks. Measuring progress in dictionary learning for language model interpretability with board game models, 2024.
[26] Emre Kiciman, Robert Ness, Amit Sharma, and Chenhao Tan. Causal reasoning and large language models: Opening a new frontier for causality. Transactions on Machine Learning Research, 2023.
[27] Zijian Li, Yifan Shen, Kaitao Zheng, Ruichu Cai, Xiangchen Song, Mingming Gong, Guangyi Chen, and Kun Zhang. On the identification of temporal causal representation with instantaneous dependence. In The Thirteenth International Conference on Learning Representations, 2025.
[28] Tom Lieberum, Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Nicolas Sonnerat, Vikrant Varma, János Kramár, Anca Dragan, Rohin Shah, and Neel Nanda. Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2. arXiv preprint arXiv:2408.05147, 2024.
[29] Tom Lieberum, Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Nicolas Sonnerat, Vikrant Varma, János Kramár, Anca Dragan, Rohin Shah, and Neel Nanda. Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2, 2024.
[30] Jack Lindsey, Wes Gurnee, Emmanuel Ameisen, Brian Chen, Adam Pearce, Nicholas L. Turner, Craig Citro, David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton, Trenton Bricken, Callum McDougall, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson, Sam Zimmerman, Kelley Rivoire, Thomas Conerly, Chris Olah, and Joshua Batson. On the biology of a large language model. Transformer Circuits Thread, 2025.
[31] Phillip Lippe, Sara Magliacane, Sindy Löwe, Yuki M Asano, Taco Cohen, and Efstratios Gavves. Causal representation learning for instantaneous and temporal effects in interactive systems. In The Eleventh International Conference on Learning Representations, 2023.
[32] Phillip Lippe, Sara Magliacane, Sindy Löwe, Yuki M Asano, Taco Cohen, and Stratis Gavves. CITRIS: Causal identifiability from temporal intervened sequences. In International Conference on Machine Learning, pages 13557–13603. PMLR, 2022.
[33] Aleksandar Makelov, George Lange, and Neel Nanda. Towards principled evaluations of sparse autoencoders for interpretability and control, 2024.
[34] Luke Marks, Alasdair Paren, David Krueger, and Fazl Barez. Enhancing neural network interpretability with feature-aligned sparse autoencoders, 2024.
[35] Samuel Marks, Adam Karvonen, and Aaron Mueller. dictionary_learning. https://github.com/saprmarks/dictionary_learning, 2024.
[36] Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, and Aaron Mueller. Sparse feature circuits: Discovering and editing interpretable causal graphs in language models, 2024.
[37] Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT.
Advances in Neural Information Processing Systems, 35:17359–17372, 2022.
[38] Hiroshi Morioka and Aapo Hyvärinen. Causal representation learning made identifiable by grouping of observational variables. arXiv preprint arXiv:2310.15709, 2023.
[39] Anish Mudide, Joshua Engels, Eric J Michaud, Max Tegmark, and Christian Schroeder de Witt. Efficient dictionary learning with switch sparse autoencoders. arXiv preprint arXiv:2410.08201, 2024.
[40] Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, Nikhil Prakash, Can Rager, Aruna Sankaranarayanan, Arnab Sen Sharma, Jiuding Sun, et al. The quest for the right mediator: A history, survey, and theoretical grounding of causal interpretability. arXiv preprint arXiv:2408.01416, 2024.
[41] Neel Nanda. Open source replication & commentary on Anthropic's dictionary learning paper, October 2023.
[42] Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, and Shan Carter. Zoom in: An introduction to circuits. Distill, 5(3):e00024–001, 2020.
[43] Kiho Park, Yo Joong Choe, and Victor Veitch. The linear representation hypothesis and the geometry of large language models. arXiv preprint arXiv:2311.03658, 2023.
[44] Gonçalo Paulo, Alex Mallen, Caden Juang, and Nora Belrose. Automatically interpreting millions of features in large language models, 2024.
[45] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, 2000.
[46] Senthooran Rajamanoharan, Tom Lieberum, Nicolas Sonnerat, Arthur Conmy, Vikrant Varma, János Kramár, and Neel Nanda. Jumping ahead: Improving reconstruction fidelity with JumpReLU sparse autoencoders. arXiv preprint arXiv:2407.14435, 2024.
[47] Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. Toward causal representation learning. Proceedings of the IEEE, 109(5):612–634, 2021.
[48] Xiangchen Song, Zijian Li, Guangyi Chen, Yujia Zheng, Yewen Fan, Xinshuai Dong, and Kun Zhang.
Causal temporal representation learning with nonstationary sparse transition. Advances in Neural Information Processing Systems, 37:77098–77131, 2024.
[49] Xiangchen Song, Weiran Yao, Yewen Fan, Xinshuai Dong, Guangyi Chen, Juan Carlos Niebles, Eric Xing, and Kun Zhang. Temporally disentangled representation learning under unknown nonstationarity. Advances in Neural Information Processing Systems, 36:8092–8113, 2023.
[50] Glen M. Taggart. ProLU: A nonlinearity for sparse autoencoders, 2024. https://w.alignmentforum.org/posts/HEpufTdakGTTKgoYF/prolu-a-nonlinearity-for-sparse-autoencoders.
[51] Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, et al. Gemma 2: Improving open language models at a practical size. arXiv preprint arXiv:2408.00118, 2024.
[52] Andrew Templeton, Timothy Conerly, Jacob Marcus, John Lindsey, Tamera Bricken, Bowen Chen, et al. Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet. Technical report, Anthropic, 2024. Transformer Circuits Thread technical report.
[53] Constantin Venhoff, Anisoara Calinescu, Philip Torr, and Christian Schroeder de Witt. SAGE: Scalable ground truth evaluations for large sparse autoencoders, 2024.
[54] Julius von Kügelgen. Identifiable causal representation learning: Unsupervised, multi-view, and multi-environment. arXiv preprint arXiv:2406.13371, 2024.
[55] Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, and Jacob Steinhardt. Interpretability in the wild: A circuit for indirect object identification in GPT-2 small, 2022.
[56] Weiran Yao, Guangyi Chen, and Kun Zhang. Temporally disentangled representation learning. arXiv preprint arXiv:2210.13647, 2022.
[57] Weiran Yao, Yuewen Sun, Alex Ho, Changyin Sun, and Kun Zhang. Learning temporally causal latent processes from general temporal data. arXiv preprint arXiv:2110.05428, 2021.
[58] Kun Zhang and Aapo Hyvärinen.
A general linear non-Gaussian state-space model: Identifiability, identification, and applications. In Proceedings of the Asian Conference on Machine Learning, volume 20 of Proceedings of Machine Learning Research, pages 113–128. PMLR, 14–15 Nov 2011.
[59] Kun Zhang, Shaoan Xie, Ignavier Ng, and Yujia Zheng. Causal representation learning from multiple distributions: A general setting. In Proceedings of the 41st International Conference on Machine Learning, ICML'24. JMLR.org, 2024.

A Technical Appendices and Supplementary Material

Contents
A.1 Discussion of Non-parametric Proof in Linear Case
A.2 Proof for Theorem 1
A.3 Synthetic Experiments
A.3.1 Fixed Structure Experiment
A.3.2 Scalability Experiment
A.4 Semi-synthetic Experiments
A.4.1 Target Concept Relation Recovery
A.4.2 Steering Vector Recovery
A.5 LLM Activation Experiments
A.5.1 Details on the Real-world Experiments Settings
A.5.2 Visualizations of Training Loss and Sparsity Metrics
A.5.3 Sensitivity and Ablation Studies
A.5.4 More Showcases on the Recovered Concepts and Relations from LLM Activations
A.5.5 Additional SAEBench Results on Larger Latent Sizes and Models
A.5.6 Statistical Testing and Absorption Analysis
A.5.7 Preliminary Investigation with Time Lag up to 100
A.5.8 Additional Experiments with Pretrained SAE
A.6 Compute Resources and Code
A.7 Limitations
A.8 Societal Impacts

A.1 Discussion of Non-parametric Proof in Linear Case

In this discussion, we replay the proof of [27] under a linearity assumption to show that the proof does not go through in the linear case. The proof sketch of the original non-parametric proof, adapted to the linear case, is given below. We will then show that the coefficient matrix $C_{z_t}$ cannot be full rank, which causes this derivation to fail.

Non-parametric Proof Sketch in the Linear Case. If the learned model $(\hat{A}, \hat{B}, \hat{M}, \hat{p}_{\varepsilon})$ is observationally equivalent to the true model, then there exists an invertible linear map $H = A\hat{A}^{-1}$ such that $p(\hat{z}_t \mid \hat{z}_{t-1}) = p(z_t \mid z_{t-1})\,|H|$. For nonadjacent $(\hat{z}_{t,k}, \hat{z}_{t,l})$ in $\mathcal{M}_{\hat{z}_t}$, conditional independence implies
$$\frac{\partial^2 \log p(\hat{z}_t \mid \hat{z}_{t-1})}{\partial \hat{z}_{t,k}\,\partial \hat{z}_{t,l}} = 0.$$
Expanding this derivative via the chain rule yields a linear system over the terms $\frac{\partial^2 z_{t,i}}{\partial \hat{z}_{t,k}\partial \hat{z}_{t,l}}$, $\frac{\partial z_{t,i}}{\partial \hat{z}_{t,k}}\frac{\partial z_{t,i}}{\partial \hat{z}_{t,l}}$, and $\frac{\partial z_{t,i}}{\partial \hat{z}_{t,k}}\frac{\partial z_{t,j}}{\partial \hat{z}_{t,l}}$. If the corresponding coefficient matrix $C_{z_t}$ is full rank, then all these terms vanish. Hence each $z_{t,i}$ depends on at most one $\hat{z}_{t,k}$, and nonadjacent estimated variables cannot involve adjacent true ones.

Suppose the estimated model $(\hat{A}, \hat{B}, \hat{M}, \hat{p}_{\varepsilon})$ is observationally equivalent to the true model $(A, B, M, p_{\varepsilon})$; then $p_{\hat{A}\hat{z}_t}(x_t) = p_{A z_t}(x_t)$. Applying the change of variables,
$$p_{\hat{A}\hat{z}_t}(x_t) = p(\hat{z}_t)\,|\hat{A}| = p(z_t)\,|A| = p_{A z_t}(x_t)
\;\Leftrightarrow\; p(\hat{z}_t) = p(z_t)\,\frac{|A|}{|\hat{A}|}
\;\Leftrightarrow\; p(\hat{z}_t) = p(z_t)\,|A\hat{A}^{-1}|,$$
given the invertibility of the linear mixing function $\hat{A}$.
For simplicity, let $H = A\hat{A}^{-1}$, so that $p(\hat{z}_t) = p(z_t)\,|H|$ and, similarly, $p(\hat{z}_{t+1}) = p(z_{t+1})\,|H|$.

Let $\mathcal{M}_{\hat{z}_t}$ be the Markov network of $\hat{z}_t$. Then, for any pair of variables $\hat{z}_{t,k}$ and $\hat{z}_{t,l}$ that are not connected in $\mathcal{M}_{\hat{z}_t}$, i.e., $(\hat{z}_{t,k}, \hat{z}_{t,l}) \notin \mathcal{E}_{\mathcal{M}_{\hat{z}_t}}$, conditional independence holds on the graph, owing to the temporal relations between $z_{t-1}$ and $z_t$ and the properties of Markov networks:
$$\hat{z}_{t,k} \perp \hat{z}_{t,l} \mid \hat{z}_{t-1} \cup (\hat{z}_t \setminus \{\hat{z}_{t,k}, \hat{z}_{t,l}\}).$$
We then use the fact that if two variables are independent conditioning on all remaining variables, the cross second-order derivative of the logarithm of the joint density with respect to the two variables vanishes. In our case,
$$\frac{\partial^2 \log p(\hat{z}_t, \hat{z}_{t-1})}{\partial \hat{z}_{t,k}\,\partial \hat{z}_{t,l}} = 0.$$
Since $\log p(\hat{z}_t, \hat{z}_{t-1}) = \log p(\hat{z}_t \mid \hat{z}_{t-1}) + \log p(\hat{z}_{t-1})$, where $\log p(\hat{z}_{t-1})$ has no functional dependence on $\hat{z}_{t,k}$ or $\hat{z}_{t,l}$, it follows that
$$\frac{\partial^2 \log p(\hat{z}_t \mid \hat{z}_{t-1})}{\partial \hat{z}_{t,k}\,\partial \hat{z}_{t,l}} = 0. \tag{10}$$
Because
$$\begin{pmatrix} \hat{z}_t \\ \hat{z}_{t-1} \end{pmatrix} = \begin{pmatrix} H & * \\ 0 & H \end{pmatrix} \begin{pmatrix} z_t \\ z_{t-1} \end{pmatrix},$$
we have $p(\hat{z}_t, \hat{z}_{t-1}) = p(z_t, z_{t-1})\,|H|^2$. Dividing both sides by $p(\hat{z}_{t-1})$ yields
$$p(\hat{z}_t \mid \hat{z}_{t-1}) = \frac{p(z_t, z_{t-1})\,|H|^2}{p(z_{t-1})\,|H|} = p(z_t \mid z_{t-1})\,|H|. \tag{11}$$
Its partial derivative with respect to $\hat{z}_{t,k}$ is
$$\frac{\partial \log p(\hat{z}_t \mid \hat{z}_{t-1})}{\partial \hat{z}_{t,k}} = \frac{\partial \log p(z_t \mid z_{t-1})}{\partial \hat{z}_{t,k}} + \frac{\partial \log|H|}{\partial \hat{z}_{t,k}} \tag{12}$$
$$= \sum_{i\in[n]} \frac{\partial \log p(z_t \mid z_{t-1})}{\partial z_{t,i}}\,\frac{\partial z_{t,i}}{\partial \hat{z}_{t,k}} + \frac{\partial \log|H|}{\partial \hat{z}_{t,k}}. \tag{13}$$
Then, its cross second-order derivative with respect to
$\hat{z}_{t,k}$ and $\hat{z}_{t,l}$ is
$$\frac{\partial^2 \log p(\hat{z}_t \mid \hat{z}_{t-1})}{\partial \hat{z}_{t,k}\,\partial \hat{z}_{t,l}}
= \frac{\partial^2 \log p(z_t \mid z_{t-1})}{\partial \hat{z}_{t,k}\,\partial \hat{z}_{t,l}} + \frac{\partial^2 \log|H|}{\partial \hat{z}_{t,k}\,\partial \hat{z}_{t,l}} \tag{14}$$
$$= \sum_{i\in[n]} \frac{\partial \log p(z_t \mid z_{t-1})}{\partial z_{t,i}}\,\frac{\partial^2 z_{t,i}}{\partial \hat{z}_{t,k}\,\partial \hat{z}_{t,l}} \tag{15}$$
$$\quad + \sum_{i\in[n]}\sum_{j\in[n]} \frac{\partial^2 \log p(z_t \mid z_{t-1})}{\partial z_{t,i}\,\partial z_{t,j}}\,\frac{\partial z_{t,i}}{\partial \hat{z}_{t,k}}\,\frac{\partial z_{t,j}}{\partial \hat{z}_{t,l}} + \frac{\partial^2 \log|H|}{\partial \hat{z}_{t,k}\,\partial \hat{z}_{t,l}} \tag{16}$$
$$= \sum_{i\in[n]} \frac{\partial \log p(z_t \mid z_{t-1})}{\partial z_{t,i}}\,\frac{\partial^2 z_{t,i}}{\partial \hat{z}_{t,k}\,\partial \hat{z}_{t,l}}
+ \sum_{i\in[n]} \frac{\partial^2 \log p(z_t \mid z_{t-1})}{\partial z_{t,i}^2}\,\frac{\partial z_{t,i}}{\partial \hat{z}_{t,k}}\,\frac{\partial z_{t,i}}{\partial \hat{z}_{t,l}}
+ \sum_{(z_{t,i}, z_{t,j})\in \mathcal{M}_{z_t}} \frac{\partial^2 \log p(z_t \mid z_{t-1})}{\partial z_{t,i}\,\partial z_{t,j}}\,\frac{\partial z_{t,i}}{\partial \hat{z}_{t,k}}\,\frac{\partial z_{t,j}}{\partial \hat{z}_{t,l}}
+ \frac{\partial^2 \log|H|}{\partial \hat{z}_{t,k}\,\partial \hat{z}_{t,l}} \tag{17}$$
$$= \sum_{i\in[n]} \frac{\partial^2 \log p(z_t \mid z_{t-1})}{\partial z_{t,i}^2}\,\frac{\partial z_{t,i}}{\partial \hat{z}_{t,k}}\,\frac{\partial z_{t,i}}{\partial \hat{z}_{t,l}}
+ \sum_{(z_{t,i}, z_{t,j})\in \mathcal{M}_{z_t}} \frac{\partial^2 \log p(z_t \mid z_{t-1})}{\partial z_{t,i}\,\partial z_{t,j}}\,\frac{\partial z_{t,i}}{\partial \hat{z}_{t,k}}\,\frac{\partial z_{t,j}}{\partial \hat{z}_{t,l}}, \tag{18}$$
where $\frac{\partial^2 \log|H|}{\partial \hat{z}_{t,k}\,\partial \hat{z}_{t,l}} = 0$ is dropped from Eq. 17 onward because, in the linear case, the Jacobian $H = A\hat{A}^{-1}$ is constant. Moreover, since we assume a linear mixing function in Eq. 3, the cross second-order derivative $\frac{\partial^2 z_{t,i}}{\partial \hat{z}_{t,k}\,\partial \hat{z}_{t,l}} = 0$, so the terms containing it are dropped from Eq. 17 onward as well.

From Eq. 18, for each value of $z_t$, if we have $R = n + |\mathcal{M}_{z_t}|$ different values of $z_{t-1}$, i.e., $\{z^{(r)}_{t-1}\}_{r\in[R]}$, that make the following matrix $C_{z_t}$ full rank, then for each non-adjacent pair $\hat{z}_{t,k}$ and $\hat{z}_{t,l}$ in the estimated Markov network $\mathcal{M}_{\hat{z}_t}$ we can conclude the relations between $z_t$ and $\hat{z}_t$. Here
$$C_{z_t} = \begin{pmatrix}
\left(\frac{\partial^2 \log p(z_t \mid z^{(1)}_{t-1})}{\partial z_{t,i}^2}\right)^T_{i\in[n]} \oplus \left(\frac{\partial^2 \log p(z_t \mid z^{(1)}_{t-1})}{\partial z_{t,i}\,\partial z_{t,j}}\right)^T_{(z_{t,i}, z_{t,j})\in \mathcal{M}_{z_t}} \\
\vdots \\
\left(\frac{\partial^2 \log p(z_t \mid z^{(R)}_{t-1})}{\partial z_{t,i}^2}\right)^T_{i\in[n]} \oplus \left(\frac{\partial^2 \log p(z_t \mid z^{(R)}_{t-1})}{\partial z_{t,i}\,\partial z_{t,j}}\right)^T_{(z_{t,i}, z_{t,j})\in \mathcal{M}_{z_t}}
\end{pmatrix}. \tag{19}$$
With the notation above, each row of $C_{z_t}$ can be written as a vector of functions of $z_{t-1}$, evaluated at a different value per row. According to Eq.
3, we have $z_t = M z_t + B z_{t-1} + \varepsilon_t$, which induces the joint distribution $p(z_t, z_{t-1})$ via
$$\begin{pmatrix} z_t \\ z_{t-1} \end{pmatrix} = \begin{pmatrix} (I-M)^{-1}B & (I-M)^{-1} \\ I & 0 \end{pmatrix} \begin{pmatrix} z_{t-1} \\ \varepsilon_t \end{pmatrix},$$
where the determinant of the Jacobian of this transformation (in the linear case, the Jacobian is the transformation itself) is $-|(I-M)^{-1}|$. By change of variables, $p(z_t, z_{t-1}) = p(z_{t-1}, \varepsilon_t)\,|(I-M)^{-1}|$, and by the independence of $z_{t-1}$ and $\varepsilon_t$,
$$p(z_t, z_{t-1}) = p(z_{t-1})\,p(\varepsilon_t)\,|(I-M)^{-1}|.$$
Therefore, the conditional density is
$$p(z_t \mid z_{t-1}) = p(\varepsilon_t)\,|(I-M)^{-1}| = \prod_{a\in[n]} p(\varepsilon_{t,a})\,|(I-M)^{-1}| = \prod_{a\in[n]} p\big((I_{a,:} - M_{a,:})z_t - B_{a,:} z_{t-1}\big)\,|(I-M)^{-1}|,$$
where $M_{a,:}$ denotes the $a$-th row of the matrix. Since the conditional density $p(z_t \mid z_{t-1})$ is second-order smooth according to A1, the partial derivative of $\log p(z_t \mid z_{t-1})$ with respect to $z_{t,i}$ is
$$\frac{\partial \log p(z_t \mid z_{t-1})}{\partial z_{t,i}} = \sum_{a\in[n]} \frac{\partial \log p\big((I_{a,:} - M_{a,:})z_t - B_{a,:} z_{t-1}\big)}{\partial z_{t,i}} + \frac{\partial \log|(I-M)^{-1}|}{\partial z_{t,i}} = \sum_{a\in[n]} (I_{a,i} - M_{a,i})\,\frac{p'(\varepsilon_{t,a})}{p(\varepsilon_{t,a})} = \sum_{a\in[n]} K_{a,i}\,\varphi_a(z_{t-1}, z_t),$$
where we abbreviate $p(\varepsilon_{t,a}) = p\big((I_{a,:} - M_{a,:})z_t - B_{a,:} z_{t-1}\big)$, let $\varphi_a(z_{t-1}, z_t) = \frac{p'(\varepsilon_{t,a})}{p(\varepsilon_{t,a})}$, and set $K = I - M$ for brevity. Its partial derivative with respect to $z_{t,j}$ is
$$\frac{\partial^2 \log p(z_t \mid z_{t-1})}{\partial z_{t,i}\,\partial z_{t,j}} = \sum_{a\in[n]} K_{a,i}(I_{a,j} - M_{a,j})\left(\frac{p''(\varepsilon_{t,a})}{p(\varepsilon_{t,a})} - \Big(\frac{p'(\varepsilon_{t,a})}{p(\varepsilon_{t,a})}\Big)^2\right) = \sum_{a\in[n]} K_{a,i} K_{a,j}\,\psi_a(z_{t-1}, z_t),$$
where $\psi_a(z_{t-1}, z_t) = \frac{p''(\varepsilon_{t,a})}{p(\varepsilon_{t,a})} - \big(\frac{p'(\varepsilon_{t,a})}{p(\varepsilon_{t,a})}\big)^2$, given $p(\varepsilon_{t,a}) = p\big((I_{a,:} - M_{a,:})z_t - B_{a,:} z_{t-1}\big)$. When $j = i$,
$$\frac{\partial^2 \log p(z_t \mid z_{t-1})}{\partial z_{t,i}^2} = \sum_{a\in[n]} K_{a,i}^2\,\psi_a(z_{t-1}, z_t).$$
Therefore, each row of $C_{z_t}$ can be represented as a vector of functions evaluated at a different value of $z_{t-1}$:
$$C_{z_t}(z^{(r)}_{t-1}) = \Big(\sum_{a\in[n]} K_{a,i}^2\,\psi_a(z^{(r)}_{t-1}, z_t)\Big)^T_{i\in[n]} \tag{20}$$
$$\quad \oplus \Big(\sum_{a\in[n]} K_{a,i} K_{a,j}\,\psi_a(z^{(r)}_{t-1}, z_t)\Big)^T_{(z_{t,i}, z_{t,j})\in \mathcal{M}_{z_t}}. \tag{21}$$
If we have at least $R = n + |\mathcal{M}_{z_t}|$ values $\{z^{(r)}_{t-1}\}_{r\in[R]}$ that make the matrix $C_{z_t} = (C_{z_t}(z^{(1)}_{t-1}), \ldots, C_{z_t}(z^{(R)}_{t-1}))^T$ of shape $R \times R$ full rank, then the relationship between the estimated $\hat{z}_t$ and the true $z_t$ can be pinned down, since Eq. 18 then admits only the trivial zero solution. However, it is impossible for $C_{z_t}$ to be $R \times R$ full rank: the two concatenated parts, $(\sum_{a\in[n]} K_{a,i}^2\,\psi_a(z^{(r)}_{t-1}, z_t))^T_{i\in[n]}$ and $(\sum_{a\in[n]} K_{a,i} K_{a,j}\,\psi_a(z^{(r)}_{t-1}, z_t))^T_{(z_{t,i}, z_{t,j})\in \mathcal{M}_{z_t}}$, are both fixed linear combinations of $\{\psi_a(z^{(r)}_{t-1}, z_t)\}_{a\in[n]}$, because $M$ (and hence $K$) does not change across timestamps. This constrains the rank of $C_{z_t}$ to at most $n$, which implies that any non-empty instantaneous Markov network leads to the unidentifiability of $z_t$.

A.2 Proof for Theorem 1

Theorem 1 (Latent Indeterminacy). Suppose the estimated model $(\hat{A}, \{\hat{B}_\tau\}_{\tau=1}^L, \hat{M}, \hat{p}_{\varepsilon})$ and the true model $(A, \{B_\tau\}_{\tau=1}^L, M, p_{\varepsilon})$ both generate $x_t$ according to Eq. 3 and are observationally equivalent on the autocovariance matrices $R_x(k)$ for $k = 0, 1, \ldots, L$. If conditions A1–A4 hold, then the model parameters are identifiable up to the following indeterminacies:
$$\hat{\varepsilon}_t = P\varepsilon_t, \quad \hat{A} = AS, \quad (I - \hat{M}) = P(I - M)S, \quad \hat{B}_\tau = PB_\tau S,$$
where $S \in \mathbb{R}^{n\times n}$ is invertible and $P$ is a signed permutation matrix.

• A1 (Temporally white noise). Both $e_t$ and $\varepsilon_t$ are temporally white, and $\varepsilon_t$ has i.i.d. mutually independent components. To remove scaling indeterminacy, assume $\Sigma_{\varepsilon_t} = I$ and zero-mean data.
• A2 (Rank sufficiency). $A \in \mathbb{R}^{m\times n}$ ($m \geq n$) has full column rank, and $B_L$ is full rank.
• A3 (Process stability). Eq. 22 defines a stable vector autoregression, i.e., all roots of $\det(I - \sum_{\tau=1}^{L} B_\tau y^{-\tau}) = 0$ lie strictly inside the unit circle.
• A4 (Non-Gaussianity). At most one component of $\varepsilon_t$ is Gaussian.

Proof. For the matrix manipulations in this proof we use a more condensed formulation, equivalent to Eq. 3:
$$z_t = M z_t + \sum_{\tau=1}^{L} B_\tau z_{t-\tau} + \varepsilon_t, \tag{22}$$
$$x_t = A z_t. \tag{23}$$

Step 1: Construct autocovariance matrices. From the formulation of $z_t$, its autocovariance at lag $k$ is
$$R_z(k) = \mathbb{E}\left[ z_t \left( \sum_{\tau=1}^{L} (I-M)^{-1} B_\tau z_{t+k-\tau} + (I-M)^{-1}\varepsilon_{t+k} \right)^{\!T} \right] = \sum_{\tau=1}^{L} \mathbb{E}[z_t z_{t+k-\tau}^T]\, B_\tau^T (I-M)^{-T} + \mathbb{E}[z_t \varepsilon_{t+k}^T]\,(I-M)^{-T}.$$
Hence,
$$R_z(k) = \begin{cases} \displaystyle\sum_{\tau=1}^{L} R_z(k-\tau)\, B_\tau^T (I-M)^{-T}, & k \neq 0, \\[2ex] \displaystyle\sum_{\tau=1}^{L} R_z(k-\tau)\, B_\tau^T (I-M)^{-T} + (I-M)^{-1}(I-M)^{-T}, & k = 0. \end{cases}$$
Because $x_t = A z_t$, $R_x(k) = A R_z(k) A^T$. Inserting $A^T A^{-T}$ between factors and simplifying, with $C_\tau = A(I-M)^{-1} B_\tau A^{-1}$ and $H = A(I-M)^{-1}$,
$$R_x(k) = \begin{cases} \displaystyle\sum_{\tau=1}^{L} R_x(k-\tau)\, C_\tau^T, & k \neq 0, \\[2ex] \displaystyle\sum_{\tau=1}^{L} R_x(k-\tau)\, C_\tau^T + HH^T, & k = 0. \end{cases}$$

Step 2: Solve the linear system formed by the autocovariances. Using the derived $R_x(k)$, we can construct the linear system
$$\begin{pmatrix} R_x(0) - HH^T \\ R_x(1) \\ R_x(2) \\ \vdots \\ R_x(L) \end{pmatrix} = \begin{pmatrix} R_x(1)^T & R_x(2)^T & \cdots & R_x(L)^T \\ R_x(0) & R_x(1)^T & \cdots & R_x(L-1)^T \\ R_x(1) & R_x(0) & \cdots & R_x(L-2)^T \\ \vdots & \vdots & \ddots & \vdots \\ R_x(L-1) & R_x(L-2) & \cdots & R_x(0) \end{pmatrix} \begin{pmatrix} C_1^T \\ C_2^T \\ \vdots \\ C_L^T \end{pmatrix}, \tag{24}$$
where $H = A(I-M)^{-1}$. Consider the lower $mL$ rows of the coefficient matrix:
$$Q = \begin{pmatrix} R_x(0) & R_x(1)^T & \cdots & R_x(L-1)^T \\ R_x(1) & R_x(0) & \cdots & R_x(L-2)^T \\ \vdots & \vdots & \ddots & \vdots \\ R_x(L-1) & R_x(L-2) & \cdots & R_x(0) \end{pmatrix}.$$
It is clear that $Q = \mathbb{E}[\vec{x}_t \vec{x}_t^T]$ with $\vec{x}_t = (x_t^T, x_{t-1}^T, \ldots, x_{t-L+1}^T)^T$.
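Before completing Step 2, the autocovariance system can be made concrete with a minimal numerical sketch. We take the simplest setting $L = 1$, $A = I$, $M = 0$ (so $C_1 = B_1$) and check that solving $R_x(1) = R_x(0)\,C_1^T$ on sample autocovariances recovers the transition matrix. All variable names are illustrative and not from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
B1 = np.array([[0.5, 0.2],
               [0.0, 0.4]])             # stable: eigenvalues 0.5 and 0.4

# Simulate the VAR(1) z_t = B1 z_{t-1} + eps_t, observed directly (A = I).
T = 200_000
z = np.zeros((T, 2))
for t in range(1, T):
    z[t] = B1 @ z[t - 1] + rng.standard_normal(2)

# Sample autocovariances R(0) = E[z_t z_t^T] and R(1) = E[z_t z_{t+1}^T].
R0 = z.T @ z / T
R1 = z[:-1].T @ z[1:] / (T - 1)

# For L = 1 and k = 1, the system reads R(1) = R(0) C_1^T, so:
C1 = np.linalg.solve(R0, R1).T          # close to B1 up to sampling noise
```

The same block-Toeplitz structure generalizes this two-by-two solve to arbitrary lag order L, which is what Eq. 24 expresses.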
Because the process $x_t$ admits no nontrivial deterministic linear relation among $x_t, x_{t-1}, \ldots, x_{t-L+1}$ due to $\varepsilon$, $Q = E[\vec x_t \vec x_t^T]$ is always positive definite and therefore nonsingular. Thus, we can solve the lower $mL$ rows of the linear system to obtain $C_k$, $k = 1, \ldots, L$, and then use the solved $C_k$ to obtain $HH^T$. Therefore, suppose we have two models, $(A, M, \{B_\tau\}_{\tau=1}^L, \Sigma_\varepsilon)$ and $(\hat A, \hat M, \{\hat B_\tau\}_{\tau=1}^L, \Sigma_{\hat\varepsilon})$, that are observationally equivalent on $R_x(k)$, $k = 0, 1, \ldots, L$; then their parameters satisfy:
$$\hat A (I-\hat M)^{-1} (I-\hat M)^{-T} \hat A^T = A (I-M)^{-1} (I-M)^{-T} A^T, \quad (25)$$
$$\hat A (I-\hat M)^{-1} \hat B_\tau \hat A^{-1} = A (I-M)^{-1} B_\tau A^{-1}, \quad \forall \tau = 1, \ldots, L. \quad (26)$$

Step 3: Further Investigate $B_\tau$ and $M$. From the RHS of Eq. 25, we have:
$$HH^T = A(I-M)^{-1}(I-M)^{-T}A^T, \quad (27)$$
which gives $\mathrm{Col}(HH^T) \subseteq \mathrm{Col}(A)$. Thanks to the fact that $A$ has full column rank, and hence a left inverse satisfying $A^{-1}A = I$, together with the invertibility of $I - M$, we have
$$A = HH^T A^{-T} (I-M)^T (I-M), \quad (28)$$
which further gives $\mathrm{Col}(A) \subseteq \mathrm{Col}(HH^T)$. Therefore, $\mathrm{Col}(A) = \mathrm{Col}(HH^T)$. As we can derive $\mathrm{Col}(\hat A) = \mathrm{Col}(HH^T)$ from the LHS of Eq. 25, we have $\mathrm{Col}(A) = \mathrm{Col}(\hat A)$. Since both $A$ and $\hat A$ have full column rank, we have
$$\hat A = AS \quad (29)$$
for some invertible $S \in \mathbb{R}^{n\times n}$. From $HH^T = \hat H \hat H^T$, with $H = A(I-M)^{-1}$ showing that $H$ has a left inverse, we have
$$H^{-1}\hat H\, (H^{-1}\hat H)^T = I_n. \quad (30)$$
Therefore, $H^{-1}\hat H = U$, where $U$ is an orthogonal matrix. However, since $\hat H$ only has a left inverse (the same as $H$), we cannot directly conclude what role $U$ plays in the relationship between $H$ and $\hat H$ without the following derivation. Given the invertibility of $I - M$ and $\mathrm{Col}(A) = \mathrm{Col}(\hat A)$, we know $\mathrm{Col}(H) = \mathrm{Col}(\hat H)$, so the columns of $\hat H$ lie in $\mathrm{Col}(H)$. Since $HH^{-1}$ acts as the identity on $\mathrm{Col}(H)$, we have
$$H(H^{-1}\hat H) = HU \;\Rightarrow\; \hat H = HU. \quad (31)$$
Therefore,
$$\hat A (I - \hat M)^{-1} = A (I-M)^{-1} U \quad (32)$$
$$\Rightarrow\; AS\, (I-\hat M)^{-1} = A (I-M)^{-1} U \quad (33)$$
$$\Rightarrow\; (I - \hat M) = U^T (I-M)\, S. \quad (34)$$
Finally, substituting Eq. 26 with Eq. 34 and Eq.
29, for all $\tau = 1, \ldots, L$ we have
$$\hat A (I-\hat M)^{-1}\hat B_\tau \hat A^{-1} = A(I-M)^{-1}\, U \hat B_\tau S^{-1}\, A^{-1} = A (I-M)^{-1} B_\tau A^{-1} \quad (35)$$
$$\Rightarrow\; U\hat B_\tau S^{-1} = B_\tau \quad (36)$$
$$\Rightarrow\; \hat B_\tau = U^T B_\tau S. \quad (37)$$

So far, all the relations between the true model $(A, \{B_\tau\}_{\tau=1}^L, M, \Sigma_\varepsilon)$ and the estimated model $(\hat A, \{\hat B_\tau\}_{\tau=1}^L, \hat M, \Sigma_{\hat\varepsilon})$ have been derived up to $U$ and $S$.

Step 4: Utilizing Non-Gaussianity. Suppose the true model $(A, \{B_\tau\}_{\tau=1}^L, M, \Sigma_e)$ and the estimated model $(\hat A, \{\hat B_\tau\}_{\tau=1}^L, \hat M, \Sigma_{\hat e})$ are observationally equivalent (with $z^{-\tau}$ denoting the lag operator):
$$x_t = A\Big((I-M) - \sum_{\tau=1}^L B_\tau z^{-\tau}\Big)^{-1}\varepsilon_t \quad (38)$$
$$= \hat A\Big((I-\hat M) - \sum_{\tau=1}^L \hat B_\tau z^{-\tau}\Big)^{-1}\hat\varepsilon_t \quad (39)$$
$$= AS\Big(U^T(I-M)S - \sum_{\tau=1}^L U^T B_\tau S\, z^{-\tau}\Big)^{-1}\hat\varepsilon_t \quad (40)$$
$$= A\Big((I-M) - \sum_{\tau=1}^L B_\tau z^{-\tau}\Big)^{-1} U \hat\varepsilon_t \quad (\text{commute } B_\tau \text{ with } z^{-\tau}) \quad (41)$$
$$\Rightarrow\; A\Big((I-M) - \sum_{\tau=1}^L B_\tau z^{-\tau}\Big)^{-1}(\varepsilon_t - U\hat\varepsilon_t) = 0 \quad (42)$$
$$\Rightarrow\; \varepsilon_t - U\hat\varepsilon_t = 0. \quad (43)$$

Therefore, we can derive that $\hat\varepsilon_t = U^T \varepsilon_t$. Because there is at most one Gaussian component in $\varepsilon_t$ and $\Sigma_{\varepsilon_t} = I$, we know that $U^T$ must be a signed permutation, i.e., $U^T = P$, where $P$ is a permutation matrix whose nonzero entries take values in $\{+1, -1\}$. To conclude, we derive $\hat A = AS$, $(I-\hat M) = U^T(I-M)S$, and $\hat B_\tau = U^T B_\tau S$, where $S$ is an invertible matrix and $P = U^T$ is a signed permutation matrix.

A.3 Synthetic Experiments
We conduct two synthetic verification experiments to validate our linear temporal instantaneous ICA method. Instructions are provided in the synthetic/README.md file in our code repository.

A.3.1 Fixed Structure Experiment
For the first synthetic verification experiment, we generate data using fixed time-delayed influence functions and instantaneous relations with the following ground-truth matrices:
$$B = \begin{pmatrix} 0.4 & 0.6 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad M = \begin{pmatrix} 0 & 0 & 0 \\ 0.2 & 0 & 0 \\ 0 & 0.2 & 0 \end{pmatrix}. \quad (44)$$
The data generation process follows a structured temporal model. We initialize the first hidden state $z_0$ by sampling from a uniform distribution $U(0, 1)$.
For subsequent time steps, we compute the historical influence as $z_{\text{hist}} = B z_{t-1}$ and then construct $z_t$ iteratively: the first dimension receives only historical influence plus noise, while the remaining dimensions $i \ge 2$ incorporate both historical and instantaneous dependencies:
$$z_t^{(1)} = z_{\text{hist}}^{(1)} + \varepsilon_t^{(1)} \quad (45)$$
$$z_t^{(i)} = z_{\text{hist}}^{(i)} + w_{\text{inst}} \cdot z_t^{(i-1)} + \varepsilon_t^{(i)}, \quad i \ge 2, \quad (46)$$
where $\varepsilon_t$ is Laplace noise with scale 1.0, and $w_{\text{inst}} = 0.2$. The observations are generated as $x_t = A z_t$, where $A$ is a $3\times 3$ randomly initialized mixing matrix.

We train the model for 50,000 steps with batch size 1024 (approximately 51 million total samples) using the Adam optimizer with learning rate $8\times 10^{-3}$ and weight decay $6\times 10^{-4}$. The loss function includes a reconstruction error, a KL divergence term, and L1 regularization penalties: $1\times 10^{-3}$ for matrix $M$ and $1\times 10^{-8}$ for matrix $B$. We enforce a lower-triangular constraint on $M$ to ensure identifiability.

A.3.2 Scalability Experiment
For the second synthetic experiment, we evaluate scalability across dimensions ranging from 64 to 1024. We randomly sample a sparse time-delayed transition matrix $B$ in which only 10% of the entries are non-zero, generated by applying a random mask that retains 10% of the entries of a randomly initialized matrix. For the instantaneous mixing matrix $M$, we use a chain structure where $M_{i,i-1} = 0.5$ for $i \ge 2$ and all other entries are zero:
$$M = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0.5 & 0 & 0 & \cdots & 0 \\ 0 & 0.5 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 0.5 & 0 \end{pmatrix} \quad (47)$$
The training hyperparameters are adjusted from the first experiment: the learning rate is set to $1\times 10^{-3}$ and the sparsity coefficient for $B$ is increased to $1\times 10^{-5}$ to account for the higher-dimensional setting, while the coefficient for $M$ is maintained at $1\times 10^{-3}$.
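The generation process of the fixed-structure experiment (Eqs. 45–46) can be sketched in a few lines. This is a hedged sketch, not the authors' code: variable names (`w_inst`, `z_hist`) mirror the paper's notation, the mixing matrix `A` and the random seed are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth time-delayed matrix B from Eq. 44; the instantaneous chain
# weight w_inst = 0.2 corresponds to M[1,0] = M[2,1] = 0.2.
B = np.array([[0.4, 0.6, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
w_inst = 0.2
A = rng.normal(size=(3, 3))           # random 3x3 mixing matrix (assumed)

T = 1000
z = np.empty((T, 3))
z[0] = rng.uniform(0.0, 1.0, size=3)  # z_0 ~ U(0, 1)
for t in range(1, T):
    eps = rng.laplace(scale=1.0, size=3)      # Laplace noise, scale 1.0
    z_hist = B @ z[t - 1]                     # historical influence
    z[t, 0] = z_hist[0] + eps[0]              # Eq. 45: no instantaneous term
    for i in range(1, 3):                     # Eq. 46: instantaneous chain
        z[t, i] = z_hist[i] + w_inst * z[t, i - 1] + eps[i]

x = z @ A.T                                   # observations x_t = A z_t
```

Note that the instantaneous chain must be filled in order of increasing dimension, since $z_t^{(i)}$ depends on the already-computed $z_t^{(i-1)}$ of the same time step.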
Both experiments use identical noise characteristics (Laplace distribution with unit scale), a sequence length of 1 (two time steps total), and the Mean Correlation Coefficient (MCC) as the primary evaluation metric, measuring the quality of source recovery while accounting for the permutation ambiguity inherent in ICA methods.

A.4 Semi-synthetic Experiments
A.4.1 Target Concept Relation Recovery
Before attempting to recover concept relationships from real-world LLM activations, and building on the proven identifiability of our model, we first present a semi-synthetic setting to verify that our proposal can reveal obvious concept relations from contrastive corpus pairs.

Data Preparation. We constructed two contrastive collections of texts drawn from the Pile dataset [17]. We considered three types of text: legal documents, emails, and XML files. For each type, we constructed two contrastive corpora: one containing the relation of interest, and the other lacking it. Specifically, for legal text, the relation is defined by sequences beginning with "APPEALS" and ending with "AFFIRMED"; for emails, sequences start with forwarding or reply markers (e.g., dashes) and end with common words like "Subject" or "Thanks"; and for XML, sequences start with a UTF encoding label followed by tags such as "UTF-8" or "!DOCTYPE". We hypothesized that only texts containing these structured relations would yield meaningful temporal concept patterns, and we directly tested whether the model can successfully recover such patterns.

Baseline Construction. Although there is no directly applicable baseline, we leveraged the standard SAEs we trained above as our baseline method. Since SAEs themselves cannot capture concept-to-concept relations, we train a regression model to find temporal relation coefficient matrices $\tilde B_\tau$ by solving the regression task $z_t = \sum_\tau \tilde B_\tau z_{t-\tau}$.
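The MCC metric used as the primary evaluation in the synthetic experiments above matches estimated sources to true sources before averaging correlations. A hedged sketch (the function name is ours; the standard construction uses Hungarian assignment on absolute correlations):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mean_correlation_coefficient(z_true, z_est):
    """MCC: pair each estimated source with a distinct true source so that
    the total absolute Pearson correlation is maximized (resolving the
    permutation and sign ambiguity of ICA), then average the matched values."""
    n = z_true.shape[1]
    full = np.corrcoef(z_true.T, z_est.T)     # (2n, 2n) joint correlation
    corr = np.abs(full[:n, n:])               # cross-block: true vs. estimated
    row, col = linear_sum_assignment(-corr)   # maximize total correlation
    return corr[row, col].mean()
```

Perfect recovery up to permutation, sign, and scale yields an MCC of 1.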
Evaluation Metric. We calculate the concept recovery score by first obtaining the top fired feature index pair $(i, j)$ related to the legal context (restricted to positions where the concepts of interest ought to fire but do not fire in the contrastive non-legal text), and then taking the corresponding entry $B_{i,j}$ in the aggregated temporal relation coefficient matrices. We then calculate the relation recovery score, similar to a signal-to-noise ratio, as
$$\text{relation recovery score} = \frac{B_{i,j}}{\sigma(B)},$$
where $\sigma(B)$ denotes the standard deviation of the entries of $B$. This ratio indicates the extent to which the target concept relation entry stands out from random noise; the larger the score, the more significant the relation recovery.

Results. All results are shown in Table 1, which verifies that our proposal can identify the concept relation of interest from contrastive corpus pairs. For the demonstrated results, we used the same trained model as in the experiments on recovering relations from real-world LLM activations.

A.4.2 Steering Vector Recovery
Beyond the relationships between concepts, our model is also able to recover the concepts themselves, as current SAEs do. To verify this, semi-synthetic benchmarks like SSAE [23] offer valuable insights into concept identifiability. Following this setting, we tested whether our model can recover steering vectors from paired text. Specifically, we constructed five categories of word pairs in which only a single interpretable concept changes: gender, plurality, comparative, tense, and negation. While these changes are intuitive, ensuring that the word pairs capture a clear ground-truth concept is non-trivial. Despite this challenge, our model demonstrated strong performance in identifying the underlying concept differences.
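The relation recovery score defined in the evaluation metric above reduces to a single expression; a minimal sketch (function name ours):

```python
import numpy as np

def relation_recovery_score(B, i, j):
    """Signal-to-noise-style score: how strongly the target relation entry
    B[i, j] stands out against the spread of all entries of B."""
    return B[i, j] / B.std()
```

A matrix in which the target entry dominates a background of near-zero entries therefore yields a large score, while a diffuse matrix yields a score near 1.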
Specifically, (1) the average correlation of concept differences within each category exceeded 0.86; (2) assuming one ground-truth pair, the correlation rose above 0.93; and (3) the maximum correlation within a category reached over 0.94. These results support our claim that our model can indeed recover meaningful steering vectors. The word lists for the five categories are summarized in Table 4.

A.5 LLM Activation Experiments
In addition to the experimental results presented in Section 5.3 of the main text, we provide here: (1) detailed settings for training and inference; (2) visualizations of training losses and sparsity values; (3) comparisons across different hyperparameter settings; and (4) extended experiments on SAEBench with larger latent sizes and base language models.

A.5.1 Details on the Real-world Experiment Settings
Training. We train our linear model on activations from the pretrained LLM pythia-160m-deduped [5], using SAELens [6] and dictionary-learning [35] for activation extraction. Importantly, in the original implementation of dictionary-learning [35], activations are loaded using an object named ActivationBuffer, which is refreshed with new activations once a predefined consumption threshold is reached. During each refresh, a random shuffle is applied. However, this randomization disrupts the temporal structure of the LLM activations. To preserve temporal information, we modify the corresponding refresh function to disable the random shuffling. Details of this modification can be found in the examples/README.md file in our code repository.

The model is trained on a total of 50 million tokens from the Pile dataset [17]. To capture time-delayed influences, we consider two values of $\tau$, namely 5 and 20, as described in Eq. 3. While our main results focus on the setting with $\tau = 20$, which offers better guarantees for capturing rich temporal semantics, this choice is further justified in a later section of the supplementary materials.
To address the distributed and uncertain nature of time-delayed dependencies, where some relations manifest over longer time spans and others over shorter ones, we aggregate the $B_\tau$ matrices using max-pooling. This operation preserves any causal link that appears at any time step. We refer to the resulting aggregated matrix as aggB. Unless otherwise specified, the weight of the independence constraint on the noise term is set to $\alpha = 0.1$ in Eq. 9. To better enforce sparsity in the hidden feature activations, we apply TopK filtering [8] in addition to the $\ell_1$ sparsity term included in the final loss function. Given the importance of feature dimensionality in Sparse Autoencoders (SAEs), we evaluate three configurations: 768 (which directly matches the LLM's hidden size and aligns with the identifiability discussion in Section 3), 3072, and 6144; the latter two follow the conventions of the SAE literature. We optimize the loss function defined in Eq. 9 using the Adam optimizer with a learning rate of 0.01 and a weight decay of 0.0001. Unless otherwise specified, we use a random seed of 123; additional experiments were conducted with seeds 456 and 789 for robustness.

Inference. During inference, our primary goal is to interpret the hidden features, particularly those activated by significant entries in the time-delayed (aggB) or instantaneous (M) relation matrices. This selection process differs from conventional SAE interpretation, which typically examines feature importance across the entire feature space by measuring activation strength for a given prompt. In contrast, our method emphasizes the relational structure of features: how they connect to form semantic transitions. We aim to understand the meaning of each feature by analyzing how both types of relations (instantaneous and time-delayed) link features together.
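The two mechanical steps above, max-pooling the lag matrices into aggB and TopK filtering of activations, can be sketched as follows. This is a hedged sketch under our own conventions (array names are ours; we max-pool on absolute magnitude while keeping the sign of the strongest entry, one plausible reading of "preserves any causal link"):

```python
import numpy as np

def aggregate_time_delayed(B_taus):
    """B_taus: (L, n, n) stack of lag matrices -> (n, n) aggregated aggB.
    For each entry, keep the value at the lag with the largest magnitude,
    so a link present at any lag survives aggregation."""
    idx = np.abs(B_taus).argmax(axis=0)                     # strongest lag per entry
    return np.take_along_axis(B_taus, idx[None], axis=0)[0]

def topk_filter(z, k):
    """Keep the k largest activations in each row, zero out the rest
    (k = 0 disables the filter, matching the paper's convention)."""
    if k <= 0:
        return z
    out = np.zeros_like(z)
    idx = np.argsort(z, axis=-1)[..., -k:]                  # indices of top-k
    np.put_along_axis(out, idx, np.take_along_axis(z, idx, axis=-1), axis=-1)
    return out
```

In training, the TopK filter would be applied to the encoder output before the decoder, complementing the $\ell_1$ penalty rather than replacing it.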
Table 4: Summary of word pairs in the five categories

Gendered Pairs: (male, female), (actor, actress), (prince, princess), (king, queen), (god, goddess), (wizard, witch), (boy, girl), (man, woman), (father, mother), (son, daughter), (brother, sister), (husband, wife), (nephew, niece), (uncle, aunt), (gentleman, lady), (monk, nun), (grandfather, grandmother), (lord, lady), (spokesman, spokeswoman)

Plurality Pairs: (cat, cats), (dog, dogs), (apple, apples), (box, boxes), (child, children), (book, books), (car, cars), (tree, trees), (house, houses), (bird, birds), (chair, chairs), (table, tables), (shoe, shoes), (shirt, shirts), (sock, socks), (cup, cups), (plate, plates), (pen, pens), (bag, bags), (door, doors), (window, windows), (lamp, lamps), (phone, phones), (laptop, laptops), (flower, flowers), (cloud, clouds), (mountain, mountains), (river, rivers), (lake, lakes), (egg, eggs), (grape, grapes), (potato, potatoes), (tomato, tomatoes), (bus, buses), (kiss, kisses), (wish, wishes), (match, matches), (dish, dishes), (baby, babies), (lady, ladies), (city, cities), (party, parties), (family, families), (knife, knives), (leaf, leaves), (wolf, wolves)

Comparative Pairs: (fast, faster), (tall, taller), (small, smaller), (old, older), (young, younger), (short, shorter), (long, longer), (high, higher), (low, lower), (strong, stronger), (weak, weaker), (rich, richer), (poor, poorer), (hard, harder), (soft, softer), (loud, louder), (bright, brighter), (dark, darker), (clean, cleaner), (easy, easier), (happy, happier), (cool, cooler), (deep, deeper), (wide, wider), (narrow, narrower), (thick, thicker), (thin, thinner), (heavy, heavier), (light, lighter), (safe, safer), (cheap, cheaper)

Tense Change Pairs: (walk, walked), (run, ran), (eat, ate), (go, went), (write, wrote), (speak, spoke), (drink, drank), (drive, drove), (read, read), (sleep, slept), (sit, sat), (stand, stood), (fly, flew), (begin, began), (buy, bought), (bring, brought), (build, built), (catch,
caught), (choose, chose), (come, came), (cut, cut), (dig, dug), (do, did), (draw, drew), (fall, fell), (feel, felt), (find, found), (get, got), (give, gave), (have, had), (hear, heard), (hide, hid), (hold, held), (keep, kept), (know, knew), (leave, left), (lose, lost), (make, made), (meet, met), (pay, paid), (ride, rode), (say, said), (see, saw), (sell, sold), (send, sent), (sing, sang), (sit, sat), (teach, taught), (think, thought)

Negative Prefix Pairs: (possible, impossible), (legal, illegal), (visible, invisible), (complete, incomplete), (fair, unfair), (known, unknown), (fortunate, unfortunate), (able, unable), (happy, unhappy), (certain, uncertain), (clear, unclear), (real, unreal), (necessary, unnecessary), (likely, unlikely), (available, unavailable), (comfortable, uncomfortable), (pleasant, unpleasant), (reliable, unreliable), (acceptable, unacceptable), (usual, unusual), (wanted, unwanted), (expected, unexpected), (connected, disconnected), (understood, misunderstood), (placed, misplaced)

Our feature selection process involves the following steps. First, we select the top 100 coordinates (we also tried 200, though 100 proved sufficient) from either aggB or M, and extract the corresponding feature dimensions. Next, we generate 10,000 prompts from the EleutherAI/pile dataset, convert them into token streams, and feed them into the trained model to observe how each token responds to each selected concept feature. Finally, for each selected feature, we collect the tokens whose activations exceed a threshold (set to 3.0), along with their corresponding prompts. These tokens are viewed as consequences of the activation of the given feature, while the associated prompts serve as contexts that reveal the token's, and therefore the feature's, meaning.

A.5.2 Visualizations of Training Loss and Sparsity Metrics
Here, we compare the training dynamics across different settings by examining the reconstruction loss (Eq. 4), the independence of the estimated noise term (Eq.
7), and the sparsity of both time-delayed and instantaneous relations (Eq. 8). The comparisons are made with respect to variations in hidden feature dimensionality, the sparsity weight on learned relations (i.e., $\beta$ in Eq. 9), the temporal coverage of delayed relations, as determined by $\tau \in \{5, 20\}$, and the parameter of the TopK filtering of the hidden features.

We begin by examining the training dynamics with $\tau = 5$, comparing different settings of the sparsity constraint ($\beta \in \{0.1, 0.01\}$), TopK values (0, 25, 100, where 0 indicates that TopK is disabled), and hidden dimensions (z_dim $\in \{768, 3072, 6144\}$). The corresponding results are presented in Figure 8. It is worth noting that certain unstable training batches occasionally impact the overall stability during training. However, since most of the configurations eventually converge and our primary interest lies in the behavior at convergence, we cap the y-axis at 5.0 to improve the clarity of the visualizations. Our key findings are summarized below.

Figure 8: Dynamics of reconstruction loss, noise independence, and time-delayed and instantaneous relation sparsity with $\tau$ set to 5. The x-axis starts at 5M tokens, and the y-axis values are capped at 5 to enhance visualization clarity.

Insights on the Number of Training Tokens and the Impact of Hidden Feature Dimensionality. From Figure 8, we observe that 50M training tokens are sufficient for convergence across all settings when the hidden feature dimensionality is greater than 768, specifically at 3072 and 6144. From the subplots in the first column, it is evident that higher-dimensional hidden features provide greater stability during training. This increased robustness likely helps mitigate the effects of noisy or unstable batches within the token stream, leading to more consistent optimization of the objective. Consequently, in the subsequent case studies, including Section 5.3 of the main text, we focus on the settings with hidden dimensionalities of 3072 and 6144.
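The feature-selection procedure described in the inference settings of A.5.1 (top coordinates of aggB or M, then tokens whose activation exceeds the 3.0 threshold) can be sketched as follows. Function and variable names are ours, not the authors' code:

```python
import numpy as np

def top_relation_features(agg_B, top_n=100):
    """Return (row, col) index pairs of the top_n largest-magnitude
    entries of the aggregated relation matrix."""
    flat = np.argsort(np.abs(agg_B), axis=None)[::-1][:top_n]
    return np.column_stack(np.unravel_index(flat, agg_B.shape))

def strongly_activating_tokens(activations, tokens, feature, threshold=3.0):
    """activations: (seq_len, n_features) model outputs for one prompt.
    Collect the tokens whose activation on the selected feature exceeds
    the threshold; their prompts serve as interpretive context."""
    mask = activations[:, feature] > threshold
    return [tok for tok, m in zip(tokens, mask) if m]
```

Running `top_relation_features` on aggB surfaces time-delayed candidate pairs; running it on M surfaces instantaneous ones, after which `strongly_activating_tokens` is applied per selected feature across the prompt set.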
Impact of TopK Filtering. The training process is in general more stable after applying TopK filtering. More specifically, comparing the sub-diagrams in the first row of Figure 8 to those in the second and third rows, we can see that the decrease of the reconstruction error is significantly less affected by problematic token batches, especially when the feature dimension is set to 3072 or 6144.

Impact of Sparsity Strength. In general, when $\beta$ is set to 0.01 (note the round markers in Figure 8, as opposed to the star markers), both the time-delayed and the instantaneous relations show lower sparsity values compared with the stronger sparsity weight. This might be because the weaker constraint yields better optimization results, while the stronger one may increase the sharpness of the potential solution space. This also indicates that 0.01 is sufficient for achieving our goal of sparse causal relations in our model.

A.5.3 Sensitivity and Ablation Studies
Sensitivity Study on $\alpha$ and $\beta$. We conducted additional comparisons with $\beta \in \{0, 0.001, 0.005, 0.05, 1.0\}$ and $\alpha \in \{0, 0.001, 0.01\}$ to cover a broader hyperparameter range, using $\tau = 5$ and feature dimension 3072. The results are shown in the two tables below, with our selected setting in bold. The tables highlight that (1) concept relationships are inherently sparse, while a large $\beta$ disrupts optimization, and (2) $\alpha$ has a stronger effect, with 0.1 being a well-balanced choice.
Table 5: Performance comparison under different values of $\alpha$

α                     | 0.0    | 0.001  | 0.01   | **0.1** | 1.0
Reconstruction Loss ↓ | 0.0227 | 0.0191 | 0.0118 | 0.0128  | 0.1121
Independence Loss ↓   | 4.3849 | 2.4572 | 0.3910 | 0.1448  | 0.5252
B Sparsity (L1) ↓     | 0.0012 | 0.0007 | 0.0018 | 0.0007  | 0.0058
M Sparsity (L1) ↓     | 0.0002 | 0.0001 | 0.0002 | 0.0001  | 0.0009

Table 6: Performance comparison under different values of $\beta$

β                     | 0.0    | 0.001  | 0.005  | **0.01** | 0.05   | 0.1    | 1.0
Reconstruction Loss ↓ | 0.0126 | 0.0126 | 0.0126 | 0.0128   | 0.0126 | 0.0128 | 0.8950
Independence Loss ↓   | 0.1522 | 0.1496 | 0.1504 | 0.1448   | 0.1550 | 0.1582 | 3.7682
B Sparsity (L1) ↓     | 0.0003 | 0.0005 | 0.0005 | 0.0007   | 0.0007 | 0.0013 | 0.0053
M Sparsity (L1) ↓     | 0.0001 | 0.0001 | 0.0001 | 0.0001   | 0.0001 | 0.0002 | 0.0007

Ablation Studies on Bias Terms. Finally, we explore whether adding bias terms to our linear encoder and decoder functions in Eq. 5 yields any performance improvement, to give a more complete justification of our implementation. We compared the two settings on real LLM activations (feature dimension 3072, $\alpha = 0.1$, $\beta = 0.01$, $\tau = 5$). The results shown in the table below indicate that the flexibility gained from the bias terms is not significant in our model.

A.5.4 More Showcases on the Recovered Concepts and Relations from LLM Activations
In addition to the examples presented in Section 5.3 of the main text, we provide additional cases here to further illustrate the diversity and interpretability of the recovered concepts and relations, highlighting how they manifest across different domains and contexts.

Time-delayed Causal Relations. Table 8 showcases further examples of time-delayed causal relations extracted from LLM activations by our model, with the same setting shown in Table 3 of the main text. Many of these reflect the structured nature of legal, technical, and encyclopedic language. For instance, feature 2341 (e.g., "Orders/mandate in appellate judgment") is linked to feature 2592 (e.g., "decision" and "observance"), revealing how commands or mandates precede judicial conclusions in legal discourse.
Similarly, technical logs such as feature 1856 (error messages) anticipate subsequent failure indicators (feature 1833, "FAILURE"), reflecting typical diagnostic progressions in computing contexts.

Table 7: Ablation comparisons on the bias terms for the encoder and decoder

Metric (real-world)  | Without Bias | With Bias
Reconstruction Loss  | 0.0129       | 0.0129
Independence Loss    | 0.1452       | 0.1457
B Sparsity (L1)      | 0.0007       | 0.0007
M Sparsity (L1)      | 0.0001       | 0.0001

Table 8: More examples of the discovered time-delayed relations with contextual explanations.

From_ID | From_Explanation | To_ID | To_Explanation | Context
2341 | Orders/mandate in appellate judgment | 2592 | "decision" and "observance" | Legal judgment labels
1856 | Technical error message | 1833 | "FAILURE" | Describes the failure reason
2579 | "APPEALS" | 2592 | Court/party geographical location or case handler | Appeals in legal documents
1833 | Ajax request header: 'application', "function" (type, URL, status) | 2390 | Syntax and functions like "each" | Ajax request function labels
1856 | Volume number in case citation | 2341 | "mandamus" from "writ of mandamus" | Summary of case docket
2100 | Page number where case starts | 2579 | "APPEALS" | Case citation structure
790 | Wikipedia ship owner name | 2730 | "ship" | Wikipedia entity tagging
1825 | Email forward/reply dashes | 1641 | Common words like "subject", "thanks" | Email metadata and signals
1551 | Name + "Wynne" (e.g., "John Wynne") | 2311 | "sat" (in Parliament) | Wikipedia bios for people named Wynne
1124 | UTF encoding label | 1657 | Tags like "UTF-8", "!DOCTYPE" | XML document structure
1675 | HTML starting signal "<" | 2583 | Common HTML tags like "a", "pre" | HTML document recognition
1303 | "default" keyword | 2623 | Follows "default" (e.g., "context", "_") | Generic technical documentation
1895 | "Q", "Re", "forward" | 1203 | "thanks" | Email or Q&A style messages
2708 | Personal pronouns ("I", "you") | 2584 | Tense indicators like "will", "have" | Human language facts

Notably, semantic connections span heterogeneous domains.
Wikipedia entity labeling (e.g., ship names and their categories) and web document structures (e.g., UTF labels leading to encoding declarations) both reveal meaningful temporal dependencies that LLMs internalize. The relation between personal pronouns (feature 2708, "I", "you") and tense markers (feature 2584, "will", "have") further illustrates how human language patterns are temporally structured, even over several tokens. These cases reinforce the model's capacity to track and anticipate semantic developments over time in a content- and domain-aware manner.

Table 9: More examples of the discovered instantaneous relations with contextual explanations.

From_ID | From_Explanation | To_ID | To_Explanation | Context
2341 | Labels 'license' in comment of "license control prc server" | 1856 | Labels 'license' in both comment and command line | Bash script context
2592 | Labels 'research' | 227 | Labels 'research' with nearby nouns like "program" | Academic texts
2592 | Labels 'magazine' | 80 | Labels 'magazine' and common related nouns like "teenage", "blogs" | Academic texts
2592 | Labels 'module' | 2208 | Labels both 'module' and 'exports' as in "module.exports" | JavaScript code
2623 | Labels 'https' | 227 | Labels both 'https' and '://' | URLs

Instantaneous Causal Relations. Table 9 provides more instances of instantaneous relationships, highlighting features that are co-activated within the same context window. In the domain of Bash scripting, we observe co-occurrence between licensing-related comments (feature 2341) and execution commands (feature 1856), showing how LLMs jointly encode comment semantics and imperative script logic. In academic and technical domains, common conceptual pairs such as "research" and "program", or "magazine" and related digital terms like "blogs" or "websites", are represented together (e.g., features 2592 and 227 or 80). These examples suggest that the model forms composite concepts out of frequently co-occurring terms, such as in publication metadata or content descriptions.
In programming contexts, the instantaneous link between “module” (feature 2592) and the JavaScript construct “module.exports” (feature 2208) demonstrates that the model learns the tight coupling between programming keywords. Likewise, the relation between “https” (feature 2623) and its full syntactic pattern “https://” (feature 227) reflects how structured URL formats are stored as unified units in the model’s activation space. Together, these examples demonstrate the model’s ability to encode concise, domain-specific composite structures through simultaneous feature activation. Notes on the Results Following our presentation of the causal relations recovered from LLM activations, we clarify several key points regarding the interpretation of these results. First, due to variations in tokenization strategies across different corpora, many identified tokens in a given sentence may correspond only to partial words. This issue can be exacerbated by noise introduced during data collection processes such as OCR or web crawling. To address this, we rely on human judgment and linguistic intuition to infer and annotate the complete underlying word, ensuring that the labeling remains accurate and avoids overextending to unrelated tokens. Second, the recovered time-delayed relations we present may be somewhat semantically constrained, as the clearest relations tend to align with explicit syntactic structures. Many of our examples—such as those from code snippets or legal documents—convey semantic information through formal syntax. While these cases are illustrative, we view the discovery of more abstract, syntactically diffuse relations in general language text as an important direction for future work. It is also important to note that the examples we present were not cherry-picked; rather, they are representative cases that naturally appear throughout the dataset and were surfaced by our method. 
These relational patterns would not be easily discoverable using sparse autoencoders (SAEs), as SAEs do not consider interactions between features. Finally, we observe that feature pairs exhibiting strong causal relations tend to be activated under highly similar prompt conditions, indicating that these features are contextually aligned and often co-occur within the same linguistic environments.

Table 10: Gemma-2-2B instantaneous-relation-only model on SAEBench with different latent sizes.

Latent Size | Recon. Loss ↓ | Sparse Probing Acc. ↑ | Absorption ↓ | Autointerp ↑
6k  | 0.0108 | 0.6736 | 0.0139 | 0.6883
16k | 0.0059 | 0.6918 | 0.0167 | 0.7117

Table 11: Absorption statistics with extended training budgets.

Model | Full Absorption Fraction | Absorption Fraction | # Split Features
Pythia-160M-16k | $6.471\times 10^{-2}$ | $9.185\times 10^{-3}$ | 1.043
Gemma-2-2B-16k  | $1.289\times 10^{-2}$ | $3.794\times 10^{-4}$ | 1.269

A.5.5 Additional SAEBench Results on Larger Latent Sizes and Models
Following the same dataset and training protocol as in the main experiments, we trained the simplified instantaneous-relation-only variant with 16k latents on Gemma-2-2B and compared it to our 6k-latent configuration. As shown in Table 10, the 16k model reduces reconstruction loss (0.0059 vs. 0.0108) and slightly improves sparse probing top-1 accuracy (0.6918 vs. 0.6736). Its absorption score is modestly higher (0.0167 vs. 0.0139) but remains small, and the Autointerp score increases (0.7117 vs. 0.6883). Overall, performance on SAEBench metrics remains at a similar level across latent sizes.

We also extended the training scale to 500M tokens for Pythia-160M and to 300M tokens for Gemma-2-2B, both with 16k latents. In both cases we observed very small absorption fractions, and the mean number of split features remained close to one, indicating minimal feature splitting. Summary statistics are reported in Table 11.
A.5.6 Statistical Testing and Absorption Analysis
To strengthen the empirical findings, we additionally performed statistical testing to assess the equivalence of reconstruction losses and the robustness of absorption scores. Using 100 samples per method ($N = 300$ in total), any shift $\ge 0.00127$ across groups would be detected with power $\ge 0.8$. Pairwise Welch–TOST and Hodges–Lehmann tests with $\Delta = 0.001$ confirmed equivalence: all 90% confidence intervals lay within $[-\Delta, \Delta]$, demonstrating statistical equivalence at $\alpha = 0.05$ among the three methods. For absorption, although rigorous hypothesis testing is challenging due to the very small magnitudes observed, we collected a sufficiently large number of samples ($\ge 200$) to establish confidence intervals. The mean and 95% confidence intervals were $0.0135 \pm 0.0002$ for the 6k model and $0.0136 \pm 0.0002$ for the 16k model, which is more than sufficient to demonstrate negligible absorption in practice. Additionally, the signal-to-noise ratio (20.02 for our method vs. 2.39 for the SAE baseline) already indicates a strong margin; such a large difference is unlikely to arise from random noise.

A.5.7 Preliminary Investigation with Time Lag up to 100
To address the potential limitation that a fixed value of $\tau$ may be overly restrictive in capturing the rich and diverse semantics of real-world contexts, we explore a more flexible approach. Specifically, different types of concept relations may require varying numbers of steps to be successfully recovered. Furthermore, even for a single concept relation, stable recovery across different contexts may necessitate a range of steps rather than a single fixed value. In light of these considerations, in addition to the recovered relations shown in Table 3, Table 8, and Table 9, we present the relations captured within 100 steps, grouping them into bins of sizes 10, 20, and 50. This binning naturally categorizes the relations of interest, facilitating further analysis and discussion.
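The Welch–TOST equivalence check described in A.5.6 can be sketched via the standard shifted one-sided t-test construction. This is a hedged sketch, not the authors' analysis script; the margin $\Delta = 0.001$ follows the paper, and the data below are simulated:

```python
import numpy as np
from scipy import stats

def welch_tost(a, b, delta):
    """Two-one-sided-tests (TOST) for equivalence of means with Welch
    t-tests: test that mean(a) - mean(b) lies inside (-delta, +delta).
    Returns the TOST p-value (the max of the two one-sided p-values);
    p < 0.05 supports equivalence at the 5% level."""
    # H0: diff <= -delta, tested by shifting b down by delta
    p_lower = stats.ttest_ind(a, b - delta, equal_var=False,
                              alternative='greater').pvalue
    # H0: diff >= +delta, tested by shifting b up by delta
    p_upper = stats.ttest_ind(a, b + delta, equal_var=False,
                              alternative='less').pvalue
    return max(p_lower, p_upper)
```

With 100 samples per method whose means differ by far less than $\Delta$, both one-sided tests reject and equivalence is confirmed; shrinking $\Delta$ below the standard error makes the test (correctly) inconclusive.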
The increased flexibility provided by a larger time lag allows us to recover a greater number of concept-relations. For example, we can recover relations such as “monument”→“from” and “seek”→“opportunity”. Interestingly, increasing the time lag not only allows longer-range relations to be captured but also enables previously overlooked relations to be discovered, as this flexibility improves identification of concepts entangled in the relation. To better illustrate the relations recovered with a larger time lag, we are preparing a web demo, which will soon be included in the code repository once it is ready. However, our primary contribution is to demonstrate that our model can recover relation-concepts more effectively than existing SAEs, addressing a gap that current methods leave open and that is crucial for advancing LLM interpretability. A broader and more systematic study of this phenomenon is left to future work.

A.5.8 Additional Experiments with Pretrained SAE

As an ablation study, we additionally construct our linear model using the pretrained Sparse Autoencoder (SAE) from Gemma Scope [28] on the Gemma 2 2B model [51]. To enable feasible qualitative evaluation, we selected the top 2,034 most frequently activated features from the commonly used SAE gemma-2-2b/20-gemmascope-res-16k using the SAELens package [6]. We trained our linear model on 5 million tokens from the Pile [17] dataset. Since time-delayed influences may occur with variable time lags, we set a sufficiently large value for τ in Eq. 3. In practice, we use τ ≤ 20 and aggregate the time-delayed matrices B_τ using max-pooling: that is, if a causal link exists in any of the time-lagged matrices B_τ, we consider that link to be present in the aggregated causal structure.
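The max-pooling aggregation over the time-lagged matrices B_τ described above can be sketched as follows. The feature dimension, number of lags, detection threshold, and toy coefficient values here are illustrative assumptions, not values from our experiments.

```python
import numpy as np

def aggregate_lagged_structure(B_lags, threshold=0.1):
    """Max-pool a list of time-lagged causal matrices B_tau (tau = 1..T).

    A directed link i -> j is included in the aggregated causal
    structure if its coefficient magnitude exceeds `threshold` at ANY
    lag, i.e., a link present in any B_tau is present in the aggregate.
    """
    stacked = np.abs(np.stack(B_lags))  # shape (T, d, d)
    pooled = stacked.max(axis=0)        # elementwise max over all lags
    return pooled > threshold           # boolean adjacency matrix

# Toy example: d = 4 features, lags tau = 1..3 (we use tau <= 20 in practice)
rng = np.random.default_rng(0)
B_lags = [rng.normal(0.0, 0.01, size=(4, 4)) for _ in range(3)]
B_lags[1][2, 0] = 0.8  # hypothetical strong link 2 -> 0 at lag 2
adjacency = aggregate_lagged_structure(B_lags)
```

Thresholding after the elementwise maximum implements the rule verbatim: a link surviving at any single lag survives in the aggregate, regardless of its sign or which lag carried it.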
Case Studies

Our analysis reveals rich causal structures among programming-related concepts in LLM activations. We examine both time-delayed and instantaneous causal relationships, providing insights into how the model processes and generates code-related content.

Time-Delayed Causal Relations. We identified several meaningful time-delayed causal relationships in programming contexts. A prominent example is the causal link from a concept representing "function definitions and related code structure in programming languages" to a concept representing "variable definitions and data types in programming contexts." This relationship aligns with the natural structure of programming, where global function definitions often precede and influence local variable declarations or data structures. When the model processes or generates function definitions, it subsequently activates concepts related to the variables and data types that would appear within those functions.

Additional time-delayed relationships include causal links from "programming language syntax specifications" to "code implementation details" and from "algorithmic problem statements" to "solution implementation structures." These relationships demonstrate how the model captures the sequential dependencies inherent in programming tasks, where understanding of requirements or specifications precedes implementation details.

Instantaneous Causal Relations. Our method also reveals interesting instantaneous causal relationships that occur within the same time step. We observe a strong instantaneous causal link between a concept representing "specific formatting and notation elements commonly used in mathematical expressions or programming syntax" and a concept representing "mathematical symbols and expressions in technical content."
This relationship indicates that the model simultaneously processes formatting rules and the mathematical content they structure, reflecting how these aspects are intrinsically connected in code representation. We also identified instantaneous causal relationships between "programming language keywords" and "syntax highlighting patterns," as well as between "code indentation patterns" and "block structure delineation." These instantaneous relationships capture the syntactic constraints that operate simultaneously within programming languages, where certain elements must co-occur for the code to be well-formed.

These case studies demonstrate that our method can extract meaningful causal relationships from real LLM activations, providing insights into how these models process and generate structured content like code. The identified causal structures align with the logical and syntactic relationships one would expect in programming contexts, validating the effectiveness of our approach for interpretability research.

A.6 Compute Resources and Code

All experiments were conducted on a computing cluster equipped with NVIDIA L40 GPUs. The synthetic verification experiments were run using 16 CPU cores, 32 GB of memory, and a single GPU. The Jacobian complexity experiment was executed on CPU only, as the computation did not fit within GPU VRAM; to avoid out-of-memory (OOM) errors, 32 CPU cores and 400 GB of memory were allocated. The scaled-up synthetic experiment with the linear model used 32 CPU cores, 64 GB of memory, and one GPU. The large language model (LLM) activation experiment was performed using 16 CPU cores, 15 GB of memory, and a single GPU.

The code that replicates the main experiments presented in our paper can be accessed via https://github.com/xiangchensong/temp-inst-sae

A.7 Limitations

We acknowledge certain limitations of our work.
The linear approximation, while computationally efficient and theoretically grounded, may not capture all nonlinear interactions present in LLM activations. Future work should explore extending our framework to incorporate bounded nonlinearities while maintaining computational tractability. Additionally, developing methods to automatically interpret the discovered causal structures in terms of human-understandable concepts remains a challenge. Our method also assumes a specific form of temporal dependency that might not fully capture the long-range dependencies that LLMs can handle. The current formulation is limited to first-order temporal dependencies, and extending this to higher-order dependencies would increase computational complexity. Lastly, tokenization has been shown to critically affect LLM identifiability during our evaluations, even though it is not inherently part of LLM interpretation methods. We emphasize the importance of choosing a tokenization strategy that preserves semantic information and maximizes the effectiveness of LLM interpretation approaches.

A.8 Societal Impacts

Our interpretability approach can improve transparency, support alignment interventions, facilitate debugging and bias detection, advance scientific understanding of causal representations, and inform educational tools that raise AI literacy. At the same time, deeper insight into model internals may enable malicious manipulation, create misplaced confidence in safety tools, widen resource disparities, expose private information from training data, and distract attention from broader social and governance measures. Future work should include collaboration with ethicists, social scientists, and policy experts to guide responsible use.

NeurIPS Paper Checklist

1. Claims
Question: Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
Answer: [Yes]
Justification: Major claims are described in the abstract and emphasized in the introduction.
Guidelines:
• The answer NA means that the abstract and introduction do not include the claims made in the paper.
• The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers.
• The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings.
• It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper.

2. Limitations
Question: Does the paper discuss the limitations of the work performed by the authors?
Answer: [Yes]
Justification: The limitations are discussed in Appendix A.7.
Guidelines:
• The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper.
• The authors are encouraged to create a separate "Limitations" section in their paper.
• The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be.
• The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated.
• The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting.
Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon.
• The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size.
• If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness.
• While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren't acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations.

3. Theory assumptions and proofs
Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof?
Answer: [Yes]
Justification: All theorems are supported with complete and correct proofs in both the main text and the Appendix, with assumptions clearly presented in the main text.
Guidelines:
• The answer NA means that the paper does not include theoretical results.
• All the theorems, formulas, and proofs in the paper should be numbered and cross-referenced.
• All assumptions should be clearly stated or referenced in the statement of any theorems.
• The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition.
• Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material.
• Theorems and Lemmas that the proof relies upon should be properly referenced.

4.
Experimental result reproducibility
Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)?
Answer: [Yes]
Justification: A detailed description of the experimental setup is provided in Appendix A.3 for the synthetic experiments and in Appendix A.5 for the LLM activation experiments. The codebase required to reproduce the experiments is included in the supplementary material.
Guidelines:
• The answer NA means that the paper does not include experiments.
• If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not.
• If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable.
• Depending on the contribution, reproducibility can be accomplished in various ways. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. In general, releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed.
• While NeurIPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution.
For example:
(a) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm.
(b) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully.
(c) If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset).
(d) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results.

5. Open access to data and code
Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?
Answer: [Yes]
Justification: The model data used in our experiments are either from publicly available datasets or can be generated using the codebase provided in the supplementary materials. The main experimental results can be reproduced using this submitted codebase.
Guidelines:
• The answer NA means that the paper does not include experiments requiring code.
• Please see the NeurIPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details.
• While we encourage the release of code and data, we understand that this might not be possible, so "No" is an acceptable answer. Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark).
• The instructions should contain the exact command and environment needed to run to reproduce the results. See the NeurIPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details.
• The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc.
• The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why.
• At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable).
• Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted.

6. Experimental setting/details
Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results?
Answer: [Yes]
Justification: Detailed settings are provided in Appendix A.3 and A.5.
Guidelines:
• The answer NA means that the paper does not include experiments.
• The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them.
• The full details can be provided either with the code, in appendix, or as supplemental material.

7. Experiment statistical significance
Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments?
Answer: [Yes]
Justification: The mean over multiple runs is reported, with the standard deviation shown as error bars.
Guidelines:
• The answer NA means that the paper does not include experiments.
• The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper.
• The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions).
• The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.)
• The assumptions made should be given (e.g., Normally distributed errors).
• It should be clear whether the error bar is the standard deviation or the standard error of the mean.
• It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified.
• For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g. negative error rates).
• If error bars are reported in tables or plots, the authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text.

8. Experiments compute resources
Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?
Answer: [Yes]
Justification: Yes, the computation resources used in the experiments are provided in Appendix A.6.
Guidelines:
• The answer NA means that the paper does not include experiments.
• The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage.
• The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute.
• The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn't make it into the paper).

9. Code of ethics
Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics https://neurips.cc/public/EthicsGuidelines?
Answer: [Yes]
Justification: The authors have reviewed and conform to the NeurIPS Code of Ethics.
Guidelines:
• The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics.
• If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics.
• The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction).

10. Broader impacts
Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?
Answer: [Yes]
Justification: Broader impacts are discussed in a separate section in Appendix A.8.
Guidelines:
• The answer NA means that there is no societal impact of the work performed.
• If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact.
• Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations.
• The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments.
However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster.
• The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology.
• If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML).

11. Safeguards
Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?
Answer: [NA]
Justification: The paper poses no such risks.
Guidelines:
• The answer NA means that the paper poses no such risks.
• Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters.
• Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images.
• We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort.

12. Licenses for existing assets
Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?
Answer: [Yes]
Justification: The assets, including baseline code, datasets, and models, are explicitly mentioned and credited.
Guidelines:
• The answer NA means that the paper does not use existing assets.
• The authors should cite the original paper that produced the code package or dataset.
• The authors should state which version of the asset is used and, if possible, include a URL.
• The name of the license (e.g., CC-BY 4.0) should be included for each asset.
• For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided.
• If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset.
• For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided.
• If this information is not available online, the authors are encouraged to reach out to the asset's creators.

13. New assets
Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?
Answer: [Yes]
Justification: Detailed instructions have been provided along with the codebase.
Guidelines:
• The answer NA means that the paper does not release new assets.
• Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates.
This includes details about training, license, limitations, etc.
• The paper should discuss whether and how consent was obtained from people whose asset is used.
• At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file.

14. Crowdsourcing and research with human subjects
Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?
Answer: [NA]
Justification: The paper does not involve crowdsourcing nor research with human subjects.
Guidelines:
• The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.
• Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper.
• According to the NeurIPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector.

15. Institutional review board (IRB) approvals or equivalent for research with human subjects
Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?
Answer: [NA]
Justification: The paper does not involve crowdsourcing nor research with human subjects.
Guidelines:
• The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.
• Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research.
If you obtained IRB approval, you should clearly state this in the paper.
• We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the NeurIPS Code of Ethics and the guidelines for their institution.
• For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.

16. Declaration of LLM usage
Question: Does the paper describe the usage of LLMs if it is an important, original, or non-standard component of the core methods in this research? Note that if the LLM is used only for writing, editing, or formatting purposes and does not impact the core methodology, scientific rigorousness, or originality of the research, declaration is not required.
Answer: [NA]
Justification: The core method development in this research does not involve LLMs as any important, original, or non-standard components.
Guidelines:
• The answer NA means that the core method development in this research does not involve LLMs as any important, original, or non-standard components.
• Please refer to our LLM policy (https://neurips.cc/Conferences/2025/LLM) for what should or should not be described.