
Paper deep dive

PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding

Panagiotis Koromilas, Andreas D. Demou, James Oldfield, Yannis Panagakis, Mihalis Nicolaou

Year: 2026 · Venue: arXiv preprint · Area: Mechanistic Interp. · Type: Empirical · Embeddings: 79

Models: GPT-2 Small, Gemma-2-2B, Pythia-1.4B, Pythia-410M

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 96%

Last extracted: 3/11/2026, 12:31:17 AM

Summary

PolySAE is a novel sparse autoencoder architecture that extends standard linear SAEs with polynomial decoding (quadratic and cubic terms) to model non-linear feature interactions. By using low-rank tensor factorization on a shared projection subspace, it captures compositional structures like morphological binding and phrasal composition with minimal parameter overhead (3% on GPT-2), while maintaining interpretability and improving probing F1 scores by approximately 8%.

Entities (5)

PolySAE · model-architecture · 100%
Sparse Autoencoder · model-architecture · 100%
Mechanistic Interpretability · research-field · 98%
GPT-2 · language-model · 95%
Low-rank tensor factorization · technique · 95%

Relation Signals (3)

PolySAE extends Sparse Autoencoder

confidence 100% · PolySAE, which extends the SAE decoder with higher-order terms

PolySAE improves Probing F1

confidence 95% · PolySAE achieves an average improvement of approximately 8% in probing F1

PolySAE uses Low-rank tensor factorization

confidence 95% · Through low-rank tensor factorization on a shared projection subspace, PolySAE captures pairwise and triple feature interactions

Cypher Suggestions (2)

Find all architectures that extend the Sparse Autoencoder model. · confidence 90% · unvalidated

MATCH (a:Model)-[:EXTENDS]->(b:Model {name: 'Sparse Autoencoder'}) RETURN a.name

Identify techniques used by PolySAE to improve interpretability. · confidence 90% · unvalidated

MATCH (p:Model {name: 'PolySAE'})-[:USES]->(t:Technique) RETURN t.name

Abstract

Abstract: Sparse autoencoders (SAEs) have emerged as a promising method for interpreting neural network representations by decomposing activations into sparse combinations of dictionary atoms. However, SAEs assume that features combine additively through linear reconstruction, an assumption that cannot capture compositional structure: linear models cannot distinguish whether "Starbucks" arises from the composition of "star" and "coffee" features or merely their co-occurrence. This forces SAEs to allocate monolithic features for compound concepts rather than decomposing them into interpretable constituents. We introduce PolySAE, which extends the SAE decoder with higher-order terms to model feature interactions while preserving the linear encoder essential for interpretability. Through low-rank tensor factorization on a shared projection subspace, PolySAE captures pairwise and triple feature interactions with small parameter overhead (3% on GPT-2). Across four language models and three SAE variants, PolySAE achieves an average improvement of approximately 8% in probing F1 while maintaining comparable reconstruction error, and produces 2–10× larger Wasserstein distances between class-conditional feature distributions. Critically, learned interaction weights exhibit negligible correlation with co-occurrence frequency (r = 0.06 vs. r = 0.82 for SAE feature covariance), suggesting that polynomial terms capture compositional structure, such as morphological binding and phrasal composition, largely independent of surface statistics.

Tags

ai-safety (imported, 100%) · empirical (suggested, 88%) · mechanistic-interp (suggested, 92%)


Full Text

78,245 characters extracted from source content.


PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding

Panagiotis Koromilas (1,2), Andreas D. Demou (1), James Oldfield (3), Yannis Panagakis (2,4), Mihalis A. Nicolaou (5)

Abstract: Sparse autoencoders (SAEs) have emerged as a promising method for interpreting neural network representations by decomposing activations into sparse combinations of dictionary atoms. However, SAEs assume that features combine additively through linear reconstruction, an assumption that cannot capture compositional structure: linear models cannot distinguish whether "Starbucks" arises from the composition of "star" and "coffee" features or merely their co-occurrence. This forces SAEs to allocate monolithic features for compound concepts rather than decomposing them into interpretable constituents. We introduce PolySAE, which extends the SAE decoder with higher-order terms to model feature interactions while preserving the linear encoder essential for interpretability. Through low-rank tensor factorization on a shared projection subspace, PolySAE captures pairwise and triple feature interactions with small parameter overhead (3% on GPT-2). Across four language models and three SAE variants, PolySAE achieves an average improvement of approximately 8% in probing F1 while maintaining comparable reconstruction error, and produces 2–10× larger Wasserstein distances between class-conditional feature distributions. Critically, learned interaction weights exhibit negligible correlation with co-occurrence frequency (r = 0.06 vs. r = 0.82 for SAE feature covariance), suggesting that polynomial terms capture compositional structure, such as morphological binding and phrasal composition, largely independent of surface statistics.

Affiliations: 1 The Cyprus Institute, 2 University of Athens, 3 University of Oxford, 4 Archimedes AI/Athena Research Center, 5 University of Cyprus. Correspondence to: Panagiotis Koromilas <pakoromilas@di.uoa.gr>. Preprint. February 3, 2026.
Figure 1. Semantic Dimension Expansion via Feature Interaction. Consider two semantic directions, Famous and Beverage, and their associated learned features Star and Coffee. (a) Additive interactions yield co-occurrence semantics ("Star Coffee") that remain constrained to the linear span of the features. (b) Multiplicative interactions enable representations to escape this subspace via z_i · z_j, lifting into orthogonal dimensions (Brand) to capture emergent concepts like Starbucks. "Starbucks" example from Table 3.

1. Introduction

As AI systems are increasingly deployed in real-world domains, ensuring their safety and reliability has become a critical challenge (Amodei et al., 2016; Hendrycks et al., 2021; Bengio et al., 2025). Developing interpretable models offers a promising path towards aligning AI with human values: understanding why a model produces a given output enables us to (i) monitor its reasoning (Lindsey et al., 2025), (ii) debug failure modes (Wong et al., 2021), and (iii) steer away from unwanted behavior (Rimsky et al., 2024). Mechanistic interpretability pursues this agenda at the level of neural network internals (Bereska & Gavves, 2024), aiming to uncover interpretable features and circuits within a model and thereby provide principled insights into its behavior.

Sparse Autoencoders (SAEs), grounded in the principles of sparse dictionary learning, have emerged as a leading tool for mechanistic interpretability. SAEs decompose neural network activations to recover human-interpretable features that models typically represent in superposition, encoded in overlapping directions due to limited representational capacity (Elhage et al., 2022).
This framework has been shown to uncover safety-relevant concepts such as deception, bias, and harmful content, enabling targeted interventions that predictably steer model behavior (Templeton et al., 2024).

arXiv:2602.01322v1 [cs.LG] 1 Feb 2026

Figure 2. An overview of PolySAE: (1) sparse latent features are first extracted with a standard SAE encoder; top-activating tokens illustrate the learned features (e.g. "ing"/"training"/"running", "stock"/"market", "invest"/"investing"). (2) Activations in the residual stream are then reconstructed by modeling 2nd- and 3rd-order (pairwise and triplet-wise) interactions in addition to the standard linear component. The example "Investing.com — Philippines stocks were higher after" comes from Table 4.

However, recent work has highlighted fundamental limitations of the SAE paradigm due to their reliance on the "strong" linear representation hypothesis (Engels et al., 2025; Csordás et al., 2024). Standard SAEs reconstruct activations as weighted sums of independent features, expressing each activation as a linear combination where features contribute additively. This linearity assumption raises a fundamental question: what level of abstraction do learned features naturally capture? The answer has direct implications for mechanistic interpretability. If features truly combine linearly, we would expect individual dictionary atoms to represent atomic components, such as morphemes, simple concepts, or basic semantic primitives, that combine through superposition to form complex expressions.
Such atomic features would enable transparent circuit analysis and precise interventions on elemental building blocks of meaning. Yet linguistic theory demonstrates that composition operates non-linearly across multiple levels of language structure. Morphologically, "administrators" is not simply the sum of stem and suffix; the combination produces a distinct lexical item with specific syntactic and semantic properties (Haspelmath & Sims, 2013). Semantically, phrasal meanings such as "kick the bucket" or proper names like "Starbucks" (Figure 1) exhibit emergent properties irreducible to their parts (Partee, 1995). Vanilla SAEs demonstrably succeed at many interpretability tasks, yet their linear reconstruction mechanism cannot, in principle, represent non-linear composition. Without explicit interaction mechanisms, SAEs cannot simultaneously represent atomic features and their non-linear compositions. When "Starbucks" appears in context, a linear model must either (i) allocate a dedicated feature for this compositional entity, sacrificing atomicity, or (ii) represent it through separate "star" and "coffee" features that cannot distinguish this specific composition from mere co-occurrence.

Ideally, SAEs with sufficient capacity can learn features at multiple levels of abstraction simultaneously (such as morphemes, words, phrases, and compositional expressions), coexisting as independent atoms in an overcomplete dictionary. While this leads to good reconstruction and intervention, it fundamentally limits our understanding: we cannot decompose "Starbucks" into its constituents, cannot trace how "administrators" emerges from stem and suffix binding, and cannot distinguish compositional phrases from accidental co-occurrence. The conflation of atomic and compositional features obscures the mechanisms by which networks build complex representations from simpler parts.
This problem connects to a longstanding debate in cognitive science about systematic compositionality in neural representations (Fodor & Pylyshyn, 1988). Smolensky (1990) proposed tensor product variable binding as a solution: features bind through multilinear interactions rather than linear superposition, allowing networks to maintain atomic constituents while representing their combinations. In this framework, "administrators" would be represented not as a single indivisible feature, but as an explicit composition of stem and suffix, where the tensor product captures the binding operation. For interpretability of modern LLMs, this principle is critical: to understand how networks compose meaning, our tools must themselves model compositional structure faithfully. However, explicit tensor products are computationally prohibitive for overcomplete sparse codes with tens of thousands of features, requiring methods that capture multilinear interactions while remaining tractable.

In this work, we introduce the Polynomial Sparse Autoencoder (PolySAE) (Figure 2), a sparse autoencoder that extends vanilla SAEs with explicit feature interaction terms. PolySAE preserves a linear encoder for interpretability while extending the decoder with quadratic and cubic terms that model pairwise and triple feature interactions. Through low-rank tensor factorization on a shared projection subspace, PolySAE captures compositional structure in a tractable manner, adding a small parameter overhead (3% for GPT-2 Small). Critically, PolySAE is a strict generalization of standard SAEs that enables capturing multiplicative (non-additive) concept interactions.
Setting interaction coefficients to zero recovers vanilla SAE behavior, allowing PolySAE to be readily applied to existing SAE variants, including TopK (Gao et al., 2025), BatchTopK (Bussmann et al., 2024), and Matryoshka (Bussmann et al., 2025b). We summarize below our four main contributions:

C1. We introduce PolySAE, a sparse autoencoder with a polynomial decoder that explicitly models quadratic and cubic feature interactions while preserving a linear encoder for interpretability. Through low-rank tensor factorization, PolySAE adds small parameter overhead (3% for GPT-2 Small) and can be readily applied to existing SAE variants (TopK, BatchTopK, Matryoshka).

C2. Across four language models of different scales (GPT-2 Small, Pythia-410M/1.4B, Gemma-2-2B) and three sparsification strategies, PolySAE achieves an average 8% F1 improvement, while maintaining comparable reconstruction error.

C3. PolySAE produces 2–10× larger Wasserstein distances between class-conditional feature distributions, indicating more separated semantic structure in the learned representations.

C4. We show that learned interaction weights exhibit negligible correlation with co-occurrence frequency (r = 0.06 vs. r = 0.82 for SAE feature covariance), and provide qualitative examples demonstrating that polynomial terms capture compositional structure such as morphological binding, phrasal composition, and contextual disambiguation.

2. Related Work

Sparse dictionary learning. In sparse dictionary learning, signals are represented as sparse linear combinations of overcomplete basis elements (Mallat & Zhang, 1993), an approach also integrated into neural network architectures (Hinton & Salakhutdinov, 2006; Lee et al., 2007; Konda et al., 2014). Sparse Autoencoders (SAEs) recently emerged as a leading paradigm for feature discovery in large language models (Huben et al., 2024; Bricken et al., 2023), scaling to millions of features (Gao et al., 2025).
Subsequent work has produced architectural variants including BatchTopK (Bussmann et al., 2024), Matryoshka (Bussmann et al., 2025a), Gated (Rajamanoharan et al., 2024a), and JumpReLU (Rajamanoharan et al., 2024b) SAEs, with standardized benchmarks enabling systematic comparison (Karvonen et al., 2025). However, all these methods assume features combine additively through linear reconstruction.

Modeling feature interactions. Multiplicative interactions between features have a rich history in deep learning (Jayakumar et al., 2020), from early bilinear models for visual data (Tenenbaum & Freeman, 1996; Freeman & Tenenbaum, 1997) to modern gating mechanisms (Shazeer, 2020). Feature interactions through the Hadamard product (Chrysos et al., 2025) serve as a powerful conditioning mechanism (Perez et al., 2018; Dumoulin et al., 2017), while multiplicative structure also enables parameter-efficient mixture-of-experts (Oldfield et al., 2025a). Recent work has explored multiplicative interactions for interpretability: Bilinear MLPs (Pearce et al., 2025) model pairwise feature interactions enabling weight-based interpretability, while Gauderis & Dooms (2025) propose fully interpretable architectures based on tensor networks. We extend this line of work by modeling feature interactions in the SAE setting.

Polynomials. One natural way to model higher-order interactions is through polynomials (Shin & Ghosh, 1991). In deep learning, polynomials have been used for a variety of applications, such as image generation (Chrysos et al., 2020; 2021), classification (Babiloni et al., 2021; Chrysos et al., 2022a), privacy preservation (Zhang et al., 2019), interpretability (Dubey et al., 2022), and dynamic safety guardrails (Oldfield et al., 2025b). The work most closely related to ours is the Bilinear Autoencoder (BAE) (Dooms & Gauderis, 2025), which similarly introduces interaction terms for interpretability.
The key difference lies in the level at which interactions are modeled: BAE captures pairwise interactions between input neurons, whereas PolySAE models interactions directly between learned sparse features, including higher-order terms. As a result, PolySAE preserves the interpretability of linear SAE latents while explicitly allocating capacity to non-additive feature composition.

3. Sparse Polynomial Decoding

3.1. Preliminaries

Notation. Bold lowercase letters denote vectors and bold uppercase letters denote matrices. The i-th column of M is m_i, and M_{:,1:r} denotes its first r columns. We use ∗ for the Hadamard product, ⊗ for the Kronecker product, and ⊙ for the Khatri–Rao product. R^d and R^{d_sae} denote the activation and sparse-code spaces, with d_sae ≫ d. S(·) denotes a sparsification operator, such as Top-K or BatchTop-K.

Sparse Autoencoders. Sparse autoencoders (SAEs) build on overcomplete dictionary learning (Olshausen & Field, 1997) to decompose neural activations into a sparse set of latent features. Given activations x ∈ R^d from an intermediate layer of a pretrained network, an SAE learns a sparse code z ∈ R^{d_sae} with d_sae ≫ d and reconstructs via

x̂ = b_dec + D z,  z = S(ReLU(Eᵀ x + b_enc)),

where E is a linear encoder, D is the decoder (dictionary), and S enforces sparsity. The overcomplete latent space allows multiple features to align with similar activation directions, supporting disentangled and interpretable representations. Motivated by the superposition hypothesis (Elhage et al., 2022), SAEs assume that features combine additively in the decoder, so reconstruction is linear in z. This corresponds to a strong form of the linear representation hypothesis applied to decoding, which has recently been questioned (Engels et al., 2025).
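The standard SAE computation above can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation; all names and dimensions are made up for the example.

```python
# Sketch of the standard SAE forward pass: linear encoder -> ReLU ->
# Top-K sparsification S(.) -> linear decoder. Illustrative only.
import numpy as np

def topk_sparsify(z, k):
    """Keep the k largest activations per row, zero the rest (the S(.) operator)."""
    out = np.zeros_like(z)
    idx = np.argsort(z, axis=-1)[:, -k:]  # indices of the k largest entries per row
    np.put_along_axis(out, idx, np.take_along_axis(z, idx, axis=-1), axis=-1)
    return out

def sae_forward(x, E, b_enc, D, b_dec, k):
    """z = S(ReLU(E^T x + b_enc));  x_hat = b_dec + D z."""
    z = topk_sparsify(np.maximum(x @ E + b_enc, 0.0), k)
    x_hat = b_dec + z @ D.T
    return x_hat, z

rng = np.random.default_rng(0)
d, d_sae, k = 8, 32, 4                    # toy sizes; the paper uses d_sae >> d
x = rng.normal(size=(5, d))
E = rng.normal(size=(d, d_sae)); b_enc = np.zeros(d_sae)
D = rng.normal(size=(d, d_sae)); b_dec = np.zeros(d)
x_hat, z = sae_forward(x, E, b_enc, D, b_dec, k)
print(x_hat.shape)                        # (5, 8); each row of z has at most k nonzeros
```

For Top-K sparsification, S(·) simply zeroes everything except the k largest post-ReLU activations per example.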
When multiple features co-activate, their joint effect may not be well captured by a linear sum: for example, a "coffee" feature and a "star" feature may require a reconstruction direction distinct from either individual atom to capture the "Starbucks" concept. This motivates extending the decoder to explicitly model feature interactions, while preserving a linear and interpretable encoder.

3.2. Design Principles for Feature Interactions

We extend sparse autoencoders to capture higher-order feature interactions by establishing design principles grounded in prior work. Each architectural choice in PolySAE follows directly from these principles.

P1. Linear Encoding (Interpretability). Each sparse code coefficient z_i is derived by a linear projection of the input activation x. The linear representation hypothesis in mechanistic interpretability posits that learned features should correspond to directions in activation space (Elhage et al., 2022; Bricken et al., 2023), a view supported by the success of linear probes for extracting semantic content (Belinkov, 2022; Alain & Bengio, 2016).

P2. Polynomial Reconstruction (Expressivity). The decoder may capture compositional structure by using polynomial terms in z. Polynomials for modeling compositional structure, i.e. how features interact, have a strong precedent in the literature: Volterra series (Volterra, 1959) represent nonlinear systems as sums of multilinear kernels, second-order pooling (Carreira et al., 2012; Gao et al., 2016) captures feature co-occurrences via outer products, and polynomial networks (Chrysos et al., 2022b) parameterize functions as products of linear projections.

P3. Factorized Interaction Structure (Coherence & Efficiency). Higher-order terms should operate in a low-dimensional subspace aligned with the linear feature space. Using a shared projection U ensures that interactions are compositions of the same underlying features.
This alignment principle underlies factorized interaction models (Rendle, 2010; Blondel et al., 2016) and compact bilinear pooling (Gao et al., 2016; Kim et al., 2017). Constraining interactions to low-rank subspaces imposes a strong inductive bias, favoring coherent, reusable interaction modes over arbitrary pairwise composition.

P4. Structural Constraints (Parsimony & Identifiability). Lower-order terms should have higher representational capacity than higher-order terms, following polynomial approximation theory (Mason & Handscomb, 2002). The latent interaction subspace should have orthonormal columns to ensure geometrically distinct directions. Orthogonality constraints are standard in dictionary learning and independent component analysis to prevent degenerate solutions (Arora et al., 2015; Bao et al., 2016; Hyvärinen & Oja, 2000). Orthonormality removes rotational ambiguity and ensures the model does not allocate redundant capacity to correlated interaction directions.

3.3. PolySAE: Polynomial Sparse Autoencoder

To satisfy P1, PolySAE adopts the standard SAE encoder (Huben et al., 2024; Bricken et al., 2023), which first performs a linear map followed by sparsification:

z = S(ReLU(Eᵀ x + b_enc)),  z ∈ R^{d_sae},  (1)

where feature i activates when x aligns with direction e_i, enabling visualization, clustering, and causal intervention via activation patching (Meng et al., 2022). Following P2, we extend the decoder to include quadratic and cubic terms:

x̂ = b_dec + y_1 + λ_2 y_2 + λ_3 y_3,  (2)

where y_1 = A z, y_2 = B (z ⊗ z), y_3 = Γ (z ⊗ z ⊗ z), and λ_2, λ_3 ∈ R are learnable scalar coefficients that control the contribution of each polynomial order. Setting λ_2 = λ_3 = 0 recovers a standard linear sparse autoencoder, making PolySAE a strict generalization of existing SAE architectures. This can be viewed as a third-order Volterra expansion (Volterra, 1959) or a Π-net polynomial parameterization (Chrysos et al., 2022b), adapted to sparse codes.
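At toy dimensions, Equation (2) can be written down naively with fully materialized interaction tensors. The sketch below (illustrative names, not the paper's code) also checks the claim that λ_2 = λ_3 = 0 recovers the linear decoder exactly.

```python
# Naive (unfactorized) polynomial decoder of Eq. (2), feasible only at toy sizes:
# x_hat = b_dec + A z + lambda2 * B(z ⊗ z) + lambda3 * Gamma(z ⊗ z ⊗ z).
import numpy as np

rng = np.random.default_rng(1)
d, d_sae = 4, 6
z = rng.normal(size=d_sae)
b_dec = rng.normal(size=d)
A = rng.normal(size=(d, d_sae))
B = rng.normal(size=(d, d_sae, d_sae))          # O(d_sae^2) pairwise dictionary
G = rng.normal(size=(d, d_sae, d_sae, d_sae))   # O(d_sae^3) triple dictionary

def poly_decode(z, lam2, lam3):
    y1 = A @ z
    y2 = np.einsum('dij,i,j->d', B, z, z)       # B (z ⊗ z)
    y3 = np.einsum('dijk,i,j,k->d', G, z, z, z) # Gamma (z ⊗ z ⊗ z)
    return b_dec + y1 + lam2 * y2 + lam3 * y3

# lambda2 = lambda3 = 0 recovers the standard linear SAE decoder.
assert np.allclose(poly_decode(z, 0.0, 0.0), b_dec + A @ z)
```

The paper never forms B or Γ explicitly at full size; the low-rank parameterization that follows avoids the O(d_sae²) and O(d_sae³) cost.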
However, explicitly modeling all pairwise or higher-order feature combinations would require O(d_sae²) or O(d_sae³) parameters, leading to unstructured interaction effects and a high risk of overfitting. Following P3 and P4, we constrain interactions to a low-rank subspace:

y_1 = (z U) C^(1)ᵀ,
y_2 = ((z U_{:,1:R_2}) ∗ (z U_{:,1:R_2})) C^(2)ᵀ,
y_3 = ((z U_{:,1:R_3}) ∗ (z U_{:,1:R_3}) ∗ (z U_{:,1:R_3})) C^(3)ᵀ,  (3)

where ∗ denotes the element-wise product and C^(k) ∈ R^{d×R_k} are output projection matrices. This parameterization restricts the interaction dictionaries to rank at most R_k, enforcing a strong inductive bias on how features may combine. Notice that this parameterization satisfies P3 by applying a single projection U to the sparse code and forming interactions via polynomial operations: zU, (zU) ∗ (zU), and (zU) ∗ (zU) ∗ (zU). Using the same projected representation at every order ensures that interaction effects remain aligned with the linear feature basis and interpretable as compositions of the same underlying features.

Furthermore, PolySAE satisfies the parsimony aspect of P4 by following nested low-rank approximation (Grasedyck et al., 2013) and utilizing ranks (R_1, R_2, R_3) with R_1 ≥ R_2 ≥ R_3. This nested structure means span(U_{:,1:R_3}) ⊂ span(U_{:,1:R_2}) ⊂ span(U). In practice, R_2 = R_3 ≪ R_1 (e.g., R_2 = R_3 = 64) suffices to capture most interaction structure, confirming our hypothesis that higher-order contributions are low-dimensional (Section 4).

Finally, to satisfy the identifiability aspect of P4, we enforce orthonormality of the interaction subspace. Following Stiefel optimization (Absil et al., 2008; Bonnabel, 2013), we impose Uᵀ U = I via QR retraction after each gradient step. We use positive QR retraction (Edelman et al., 1998), which corrects column signs to ensure continuity and avoids discontinuous representation changes during training.

3.4. Discussion

Context-Dependent Dictionary Structure. In standard SAEs, each feature i is associated with a fixed dictionary atom d_i: regardless of context, activating feature i contributes z_i d_i to the reconstruction. PolySAE fundamentally alters this picture. Because reconstruction includes higher-order terms, the effective contribution of a feature becomes context-dependent, varying with which other features are simultaneously active.

This can be seen by expanding Equation (2). The linear term defines a dictionary over individual features, while the quadratic and cubic terms define dictionaries over feature pairs and triples, respectively. Under our low-rank factorization, these dictionaries are implicitly given by

A = C^(1) Uᵀ ∈ R^{d×d_sae},
B = C^(2) (U_{:,1:R_2} ⊙ U_{:,1:R_2})ᵀ ∈ R^{d×d_sae²},
Γ = C^(3) (U_{:,1:R_3} ⊙ U_{:,1:R_3} ⊙ U_{:,1:R_3})ᵀ ∈ R^{d×d_sae³},  (4)

where A is the linear dictionary, B the pairwise interaction dictionary, and Γ the triple interaction dictionary. Column (i,j) of B specifies how the co-activation z_i z_j modifies the reconstruction, while column (i,j,k) of Γ specifies the contribution arising from the joint activation z_i z_j z_k. The computational form in Equation (3) is algebraically equivalent to Equation (4) but avoids explicitly materializing the d_sae²- and d_sae³-dimensional dictionaries.

Compositional Capacity. Using the same d_sae base features as a standard SAE, PolySAE can support interaction-driven structure across (d_sae choose 2)·R_2 + (d_sae choose 3)·R_3 feature pairs and triples, enabling a substantially larger space of distinct semantic compositions without increasing the number of learned features. This capacity is mediated through a shared low-rank interaction space: rather than allocating independent parameters to each feature combination, interactions are expressed via R_2 and R_3 shared modes.
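The claimed equivalence between the factorized computation of Equation (3) and the implicit Khatri–Rao dictionary of Equation (4) can be checked numerically at toy sizes. A sketch of the quadratic case, with illustrative dimensions and no claim to match the released code:

```python
# Factorized quadratic term of Eq. (3) vs. the materialized pairwise
# dictionary B = C2 (U ⊙ U)^T of Eq. (4), where ⊙ is the column-wise
# Khatri-Rao product. Toy sizes only; names are illustrative.
import numpy as np

rng = np.random.default_rng(2)
d, d_sae, R2 = 4, 6, 3
z = rng.normal(size=d_sae)
U = rng.normal(size=(d_sae, R2))
C2 = rng.normal(size=(d, R2))

# Efficient form: never materializes the d_sae^2-column dictionary.
y2_factored = C2 @ ((z @ U) * (z @ U))

# Materialized form: column r of (U ⊙ U) is kron(U[:, r], U[:, r]).
khatri_rao = np.stack([np.kron(U[:, r], U[:, r]) for r in range(R2)], axis=1)
B = C2 @ khatri_rao.T                     # shape (d, d_sae**2)
y2_explicit = B @ np.kron(z, z)           # B (z ⊗ z)

assert np.allclose(y2_factored, y2_explicit)
```

The same identity extends to the cubic term with a triple Khatri–Rao product.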
As a result, a large number of potential feature combinations are realized through a small number of reusable interaction directions, reflecting the empirically observed low-dimensional structure of feature interactions.

Parameter Efficiency. PolySAE modifies only the decoder; the encoder is unchanged. A standard SAE has 2 d_sae d + d + d_sae parameters. When the linear term is full rank (R_1 = d), PolySAE adds ΔP = d² + d(R_2 + R_3) + 2 parameters. With the empirically optimal choice R_2 = R_3 and R_2 ∈ [0.06 R_1, 0.11 R_1], this yields ΔP = (1.12–1.22) d² (up to constants). For GPT-2 Small (d = 768, d_sae = 16,384), this corresponds to an increase of ∼2.5–3% over the full SAE.

4. Empirical Evaluation

4.1. Experimental Setup

Our training pipeline is built by extending SAELens (Bloom et al., 2024) to include PolySAE. We train and evaluate our methods against the standard SAE with three sparsification strategies: TopK (Gao et al., 2025), BatchTopK (Bussmann et al., 2024), and Matryoshka (Bussmann et al., 2025b). Throughout all experiments, we use a sparsity level of K = 64 with 16,384 latents trained on residual-stream activations from four pretrained language models of different scales: Gemma-2-2B (Gemma Team, 2024) (layer 19), Pythia-410M and Pythia-1.4B (Biderman et al., 2023) (layers 15 and 12, respectively), and GPT-2 Small (Radford et al., 2019) (layer 8). Training uses 500M tokens (300M for GPT-2 Small) with context length 128. For Gemma-2-2B and GPT-2 Small, we use OpenWebText (Gokaslan et al., 2019); for Pythia models, we use an uncopyrighted variant of the deduplicated Pile (Gao et al., 2021).
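The parameter-overhead figure quoted above can be reproduced by direct arithmetic, assuming the baseline count 2·d_sae·d + d + d_sae (encoder, decoder, and both biases) and ΔP = d² + d(R_2 + R_3) + 2 with R_2 = R_3 = 64:

```python
# Back-of-the-envelope check of PolySAE's parameter overhead for GPT-2 Small
# (d = 768, d_sae = 16,384, R2 = R3 = 64). Illustrative arithmetic only.
d, d_sae = 768, 16_384
R2 = R3 = 64

sae_params = 2 * d_sae * d + d + d_sae   # encoder + decoder weights and both biases
delta_p = d**2 + d * (R2 + R3) + 2       # extra decoder parameters added by PolySAE
overhead = delta_p / sae_params
print(f"{overhead:.1%}")                 # about 2.7%, within the quoted ~2.5-3%
```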
We evaluate learned features using SAEBench (Karvonen et al., 2025), which reports reconstruction metrics on held-out data from the training distribution and sparse probing performance on six classification tasks: Bias in Bios (De-Arteaga et al., 2019), AG News (Zhang et al., 2015), EuroParl (Koehn, 2005), GitHub programming languages (CodeParrot, 2022), Amazon Sentiment, and Amazon-15 (Hou et al., 2024). For more implementation details see Section B.

4.2. Reconstruction and Semantic Modeling

We evaluate models along two axes: (Q1) reconstruction fidelity and (Q2) semantic modeling of the learned representations. Reconstruction quality is measured using mean squared error between the decoder output and the unnormalized network activations. To assess semantic structure, we use two complementary metrics.

Table 1. F1 Scores (%) across datasets at K=1. Format: F1 / Wasserstein (×10⁻³). The Mean F1 column shows mean F1 across datasets.

| LLM | SAE variant | MSE | Mean F1 | Europarl | Bios | Amazon Sentiment | GitHub | AG News | Amazon 15 |
|---|---|---|---|---|---|---|---|---|---|
| GPT-2 Small | TopK | 0.52 | 67.1 | 67.7 / 19.0 | 61.0 / 7.7 | 76.0 / 4.3 | 63.4 / 8.7 | 71.4 / 8.5 | 63.3 / 2.8 |
| GPT-2 Small | TopK + PolySAE | 0.55 | 77.9 | 86.1 / 35.2 | 75.5 / 16.8 | 83.1 / 9.7 | 73.0 / 20.6 | 81.0 / 18.9 | 69.0 / 6.7 |
| GPT-2 Small | BTopK | 0.53 | 65.7 | 67.4 / 19.0 | 59.6 / 7.3 | 68.8 / 4.4 | 68.1 / 8.8 | 65.3 / 8.2 | 65.1 / 2.9 |
| GPT-2 Small | BTopK + PolySAE | 0.54 | 78.0 | 92.0 / 39.9 | 70.9 / 17.3 | 84.2 / 8.5 | 74.4 / 18.5 | 83.2 / 20.0 | 63.2 / 6.0 |
| GPT-2 Small | Matryoshka | 0.60 | 65.7 | 65.8 / 12.5 | 61.2 / 4.0 | 76.2 / 3.2 | 60.9 / 7.9 | 68.1 / 4.3 | 62.1 / 2.4 |
| GPT-2 Small | Matr. + PolySAE | 0.58 | 77.7 | 95.0 / 30.0 | 72.9 / 14.3 | 81.4 / 8.1 | 71.5 / 18.6 | 77.4 / 16.0 | 68.0 / 5.6 |
| Pythia-410m | TopK | 0.03 | 71.2 | 96.1 / 2.0 | 67.4 / 1.1 | 61.5 / 0.7 | 64.6 / 1.5 | 71.8 / 1.2 | 65.9 / 0.4 |
| Pythia-410m | TopK + PolySAE | 0.04 | 77.0 | 96.7 / 6.8 | 70.8 / 3.8 | 75.9 / 2.3 | 74.0 / 5.3 | 73.3 / 4.0 | 71.5 / 1.4 |
| Pythia-410m | BTopK | 0.03 | 65.0 | 90.9 / 0.8 | 60.5 / 0.3 | 63.9 / 0.4 | 59.7 / 1.1 | 58.7 / 0.3 | 56.6 / 0.3 |
| Pythia-410m | BTopK + PolySAE | 0.04 | 77.3 | 97.8 / 8.2 | 74.0 / 4.1 | 74.6 / 2.1 | 75.2 / 4.8 | 78.6 / 4.4 | 63.5 / 1.3 |
| Pythia-410m | Matryoshka | 0.04 | 64.2 | 79.1 / 0.6 | 63.6 / 0.3 | 64.4 / 0.4 | 62.3 / 1.1 | 58.7 / 0.3 | 57.3 / 0.3 |
| Pythia-410m | Matr. + PolySAE | 0.04 | 74.6 | 99.2 / 2.8 | 71.0 / 1.2 | 66.9 / 1.3 | 81.8 / 3.7 | 64.8 / 1.2 | 63.8 / 0.9 |
| Pythia-1.4b | TopK | 0.23 | 75.9 | 97.8 / 1.6 | 72.5 / 1.4 | 69.5 / 0.8 | 69.3 / 1.9 | 77.3 / 1.4 | 69.0 / 0.5 |
| Pythia-1.4b | TopK + PolySAE | 0.23 | 81.9 | 96.8 / 7.9 | 77.2 / 6.4 | 88.1 / 3.7 | 74.7 / 9.2 | 83.4 / 6.3 | 71.1 / 2.4 |
| Pythia-1.4b | BTopK | 0.22 | 64.6 | 74.0 / 0.6 | 65.0 / 0.5 | 57.2 / 0.6 | 63.3 / 2.3 | 65.2 / 0.4 | 63.2 / 0.4 |
| Pythia-1.4b | BTopK + PolySAE | 0.23 | 76.4 | 93.7 / 4.5 | 73.0 / 3.4 | 67.1 / 3.1 | 73.8 / 8.2 | 77.6 / 3.4 | 73.2 / 2.1 |
| Pythia-1.4b | Matryoshka | 0.24 | 64.4 | 70.2 / 0.5 | 62.8 / 0.5 | 63.1 / 0.6 | 65.6 / 1.9 | 64.1 / 0.4 | 60.8 / 0.4 |
| Pythia-1.4b | Matr. + PolySAE | 0.23 | 72.1 | 91.1 / 2.9 | 72.4 / 2.0 | 58.0 / 2.1 | 68.2 / 6.7 | 73.6 / 2.1 | 69.4 / 1.5 |
| Gemma2-2b | TopK | 1.59 | 67.7 | 78.6 / 5.3 | 69.6 / 7.1 | 71.8 / 4.4 | 60.7 / 6.1 | 60.7 / 7.2 | 64.8 / 2.7 |
| Gemma2-2b | TopK + PolySAE | 1.65 | 68.4 | 86.8 / 12.0 | 64.7 / 16.8 | 64.5 / 10.5 | 64.1 / 16.1 | 61.9 / 16.9 | 68.5 / 6.3 |
| Gemma2-2b | BTopK | 1.58 | 64.8 | 68.3 / 1.9 | 67.6 / 2.6 | 71.1 / 2.8 | 64.4 / 4.5 | 59.9 / 2.7 | 57.6 / 1.9 |
| Gemma2-2b | BTopK + PolySAE | 1.68 | 69.4 | 92.8 / 13.2 | 78.3 / 18.3 | 56.4 / 10.2 | 65.0 / 16.1 | 64.0 / 18.8 | 60.0 / 6.4 |
| Gemma2-2b | Matryoshka | 1.69 | 60.9 | 60.8 / 0.7 | 64.0 / 0.8 | 57.3 / 1.5 | 61.5 / 2.5 | 61.0 / 0.8 | 60.5 / 1.0 |
| Gemma2-2b | Matr. + PolySAE | 1.64 | 65.6 | 77.6 / 2.1 | 67.5 / 3.1 | 61.7 / 4.9 | 63.7 / 8.8 | 60.9 / 3.5 | 62.2 / 3.3 |

Probing. We evaluate the linear separability of semantic concepts in the learned sparse representations by training logistic regression classifiers on SAE activations to predict ground-truth labels across multiple datasets.
For each task, classification is performed using the feature with the largest mean activation difference between positive and negative classes, isolating semantic signal at the feature level.

Distributional separation. Probing relies on post-hoc decision boundaries and may not fully reflect the intrinsic geometry of the representation. We therefore additionally compute the 1-Wasserstein distance between class-conditional activation distributions. Unlike probing, which evaluates separability at a specific threshold, the Wasserstein distance captures global distributional separation, with larger values indicating more distinct semantic separation across space.

Table 1 demonstrates that across four language models and three sparsification strategies, PolySAE achieves comparable MSE to the standard SAE across all configurations, confirming that polynomial decoding does not sacrifice reconstruction fidelity. For probing, PolySAE consistently outperforms the SAE by large margins, with mean gains of more than 10% on GPT-2 and 8% on average across models (Pythia-410M, Pythia-1.4B, and Gemma2-2B) and sparsifiers. Crucially, PolySAE also achieves consistently and substantially higher Wasserstein distances, with improvements of approximately 2–10× across all models.

Figure 3. Probing Mean F1 vs. sparsity k. Shaded regions show the range across widths (2k–16k). PolySAE consistently outperforms SAE, with significant separation at higher k.

This indicates that the gains observed in probing accuracy reflect genuinely better-separated class-conditional representations, rather than improvements driven solely by favorable decision boundaries.

4.3. Ablations

PolySAE Enables Competitive Performance with Sparser Codes. We ask (Q3) whether PolySAE's capacity to model feature interactions enables the use of sparser representations.
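A minimal sketch of the two evaluation metrics described above (single-feature probing and the 1-Wasserstein distance), using synthetic activations rather than SAEBench; function names and data are illustrative:

```python
# (1) Probing with the single feature having the largest mean activation gap
#     between classes, and (2) the empirical 1-Wasserstein distance between
#     class-conditional activations of that feature. Synthetic data, plain NumPy.
import numpy as np

def best_feature(acts, labels):
    """Index of the feature with the largest mean activation difference."""
    gap = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
    return int(np.argmax(np.abs(gap)))

def wasserstein_1d(a, b):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

rng = np.random.default_rng(3)
n, d_sae = 400, 16
acts = rng.exponential(size=(n, d_sae))         # sparse-code-like nonnegative acts
labels = (rng.random(n) < 0.5).astype(int)
acts[labels == 1, 5] += 2.0                     # plant class signal in feature 5

j = best_feature(acts, labels)
w = wasserstein_1d(acts[labels == 1, j][:100], acts[labels == 0, j][:100])
print(j, round(w, 2))                           # selects feature 5; distance near 2
```

With equal-size samples, the empirical 1-Wasserstein distance reduces to the mean absolute difference of the sorted values (quantile coupling).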
Figure 3 shows probing F1 as a function of active features k, with shaded regions indicating variance across dictionary widths (2k–16k). PolySAE consistently outperforms standard SAEs across all sparsity levels, with the gap widening at higher k. PolySAE also exhibits lower variance across widths, enabling competitive performance with fewer active features in smaller dictionaries where standard SAEs would require substantially more.

[Figure 4. Reconstruction MSE for different R2 and R3 values, with R1 = 768, using activations from GPT-2 Small.]

Semantic Concentration Across Features. We next ask (Q4) whether PolySAE concentrates semantic signal into fewer features. Table 2 reports the F1 gain Δ1–5 when expanding from K=1 to K=5 active features, averaged across all probing datasets and sparsifiers. PolySAE exhibits smaller gains than standard SAEs in 3 out of 4 models, with the largest difference on GPT-2 Small (–7.6%). One interpretation is that PolySAE concentrates semantic information into fewer features, reducing the marginal benefit of additional features. This indicates that higher-order interactions absorb contextual variability while PolySAE's linear features remain more semantically focused.

Ablating different ranks. Figure 4 examines the effect of interaction ranks on reconstruction, for fixed R1 = 768 on GPT-2 Small. PolySAE achieves competitive reconstruction with modest interaction ranks (R2 = R3 = 64). Increasing the ranks beyond this does not improve reconstruction, suggesting that the additional capacity is unnecessary for capturing interaction structure in this setting.

5.
Qualitative Analysis: Making Sense of Learned Interactions

To better understand the learned interactions, we first ask (Q5) whether PolySAE's higher-order terms encode genuine compositional structure or merely reflect surface-level co-occurrence.

Table 2. Mean F1 gain from K=1 to K=5, after averaging the results from all 6 probing datasets and all 3 sparsifiers.

Model          SAE     PolySAE   PolySAE effect
GPT-2 Small    +14.3   +6.7      –7.6
Pythia-410m    +11.0   +10.5     –0.5
Pythia-1.4b    +10.3   +10.5     +0.2
Gemma2-2b      +13.6   +10.6     –3.0
Overall        +12.3   +9.6      –2.7

To study this, we analyze SAE and PolySAE activations trained with Top-K sparsification on GPT-2 Small, using 1M OpenWebText texts. For each feature pair (i, j), we first define the learned quadratic interaction strength

B_ij = λ2 ‖(u_i ⊙ u_j)⊤ C^(2)⊤‖_2,

which depends only on the trained decoder parameters and measures how much representational capacity the model allocates to the (i, j) interaction. Second, we compute the empirical co-occurrence frequency N_ij by counting token positions in which both features appear in the top-K active set across the same corpus. If polynomial interactions merely replicated bigram statistics, B_ij and N_ij would correlate strongly. As a baseline, we consider the empirical covariance of SAE activations, which captures the full pairwise structure accessible to a linear model. As expected, this covariance correlates strongly with co-occurrence frequency (r = 0.82). In contrast, PolySAE's learned interactions exhibit negligible correlation with co-occurrence (r = 0.06), indicating that interaction capacity is allocated based on criteria largely orthogonal to frequency.

Since higher-order dictionaries do not simply encode co-occurrence, we finally ask (Q6) whether the learned interactions are interpretable.
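The interaction-strength vs. co-occurrence analysis can be sketched as follows. This is a toy illustration, not the paper's code: `u` rows and `C2` stand in for trained decoder parameters, and the l2-norm scalarization of the quadratic strength is our reading of the definition above.

```python
# Toy sketch: quadratic interaction strength vs. co-occurrence.
# C2 is assumed to have shape (d, R2); u_i, u_j live in the shared
# R2-dimensional projection subspace.

def interaction_strength(u_i, u_j, C2, lam2=1.0):
    """B_ij = lam2 * l2-norm of the decoder output for the pair (i, j)."""
    had = [a * b for a, b in zip(u_i, u_j)]             # u_i ⊙ u_j
    proj = [sum(c * h for c, h in zip(row, had))        # (u_i ⊙ u_j)ᵀ C2ᵀ
            for row in C2]
    return lam2 * sum(p * p for p in proj) ** 0.5

def pearson_r(xs, ys):
    """Pearson correlation between interaction strengths and
    co-occurrence counts over feature pairs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

print(interaction_strength([1, 0], [1, 0], [[3, 5], [4, 6]]))  # 5.0
```

A low `pearson_r` between the B_ij values and the co-occurrence counts N_ij is what the paper reports (r = 0.06), versus r = 0.82 for the covariance baseline.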
To analyze this structure, we construct a dictionary mapping each feature to its most activating tokens, then examine feature pairs and triples with high interaction strength by extracting representative contexts in which the corresponding features co-activate.

Selected examples in Table 3 and Table 4 illustrate the qualitative structure captured by PolySAE's higher-order terms. Second-order interactions often correspond to coherent phrase-level compositions that are not recoverable from either feature in isolation, such as coffee × star yielding contexts referring to Starbucks, a highly non-linear semantic mapping. In contrast, SAEs typically activate broad or weakly related features in these contexts, failing to recover the composed meaning. Third-order interactions further refine such compositions by conditioning on additional context. For example, PolySAE distinguishes financial investing from unrelated -ing usages by integrating morphological cues with market-related features, and disambiguates generic entities such as nuclear or Americans based on surrounding semantic attributes. Across examples, higher-order terms absorb contextual variation that would otherwise fragment linear features, allowing PolySAE to express compositional meaning through structured interactions rather than proliferating context-specific atoms. Further examples in Section C confirm these patterns.

Table 3. Second-Order Interaction Examples Captured by PolySAE. Quadratic interactions bind two features to capture context-dependent semantic structure beyond co-occurrence. The SAE often recovers individual components but fails to represent the composed meaning.

• Poly F1: [star, stars]; Poly F2: [coffee, tea]; SAE: [Apple, Google]
  Context: "We've all certainly heard of beers brewed with espresso, but how about one with an espresso shot poured over the top? Starbucks"
  Observed pattern: The interaction binds features to represent a specific named entity, creating a new semantic category.
• Poly F1: [surgery, repair]; Poly F2: [Trans, LGBT]; SAE: [birth, baby]
  Context: "Some in the transgender community are worried a suspicious fire at a Montreal clinic will add delays to an already lengthy process to get gender reassignment surgery"
  Observed pattern: Specialization: a general concept (surgery) gets specialized by domain context (Trans, LGBT), narrowing the semantic scope.
• Poly F1: [DNA, genetic]; Poly F2: [mod, mods]; SAE: [modified, edit]
  Context: "Activists are opening up a new front in their campaign against genetic modification. The latest target is genetically-modified trees"
  Observed pattern: Multiple modifiers stack to create specific compound meanings. The interaction binds genetic with the action modification.
• Poly F1: [secret, hidden]; Poly F2: [Snowden, WikiLeaks]; SAE: [secret, secrets]
  Context: "On May 24th PBS aired a Frontline documentary about alleged Wikileaker Bradley Manning called 'WikiSecrets'"
  Observed pattern: The feature interaction binds topical concepts to create a coined term that could not be modeled via co-occurrence alone.

Table 4. Third-Order Interaction Examples Captured by PolySAE. Cubic interactions condition pairwise compositions on additional context, disambiguating meaning through three-way binding. The vanilla SAE typically activates broader or less specific features.

• Poly F1: [proved, proven]; Poly F2: [star, stars, superstar]; Poly F3: [reputation, fame]; SAE: [star, stars, superstar]
  Context: "David Bowie proved some stars are big enough not to make themselves available"
  Observed pattern: Three-way relational binding; all arguments must be present, and reputation disambiguates which aspect of stars is relevant to the proving action.
• Poly F1: [nuclear, reactor]; Poly F2: [test, testing]; Poly F3: [radiation, magnetic]; SAE: [nuclear, atomic]
  Context: "US tests nuclear-capable missile with the range to strike North Korea"
  Observed pattern: Specifying concept; event type (testing) × domain (nuclear) × capability (radiation) (Parsons, 1990).
• Poly F1: [black, racial]; Poly F2: [Americans, Canadians]; Poly F3: [people, women]; SAE: [Americans, Muslims, Jews]
  Context: "In a push to get more Black Americans involved in the world of tech"
  Observed pattern: Multi-attribute category intersection, binding demographic attributes.
• Poly F1: [ing, ting]; Poly F2: [stock, market]; Poly F3: [invest, investing]; SAE: [ing, training, running]
  Context: "Investing.com — Philippines stocks were higher after"
  Observed pattern: Three-way interaction between a morphological marker (-ing) and a domain (stock, market) (Asher, 2011).
• Poly F1: [historic, historical]; Poly F2: [UFC, MMA]; Poly F3: [strong, impressive]; SAE: [the, „ .]
  Context: "Jon Jones' historic UFC title reign came to an end"
  Observed pattern: The standard for historic is calibrated by the specific domain and assessed quality; degree (historic) × domain (UFC) × evaluation (strong, impressive).

6. Conclusion

We introduced PolySAE, a sparse autoencoder that extends the decoder with higher-order terms to model feature interactions while preserving a linear encoder for interpretability. Through low-rank tensor factorization on a shared projection subspace, PolySAE captures pairwise and triple feature interactions with small parameter overhead. Across four language models and three SAE variants, PolySAE consistently improves probing F1 by 8% on average while maintaining comparable reconstruction error. PolySAE also achieves 2–10× larger Wasserstein distances between class-conditional feature distributions, indicating that polynomial decoding produces representations with more separated semantic structure. Crucially, learned interaction weights exhibit negligible correlation with co-occurrence frequency (r = 0.06), suggesting that the model allocates interaction capacity based on compositional structure rather than surface statistics.
Limitations: We study models up to 2B parameters and, despite general applicability, restrict experiments to forced-sparsity SAE variants.

Impact Statement

This work contributes to the field of interpretable machine learning by introducing PolySAE, a method for modeling non-additive feature interactions in sparse autoencoders. By enabling explicit representation of compositional structure while preserving linear, human-interpretable features, this approach advances tools for mechanistic analysis of large language models. Improved interpretability has the potential to support downstream efforts in model auditing, debugging, and safety research by making it easier to identify, analyze, and intervene on meaningful internal representations.

The primary anticipated benefits of this work are methodological and scientific. PolySAE is intended as an analysis tool rather than a deployment-facing component, and it does not directly increase the capabilities of language models. As with other interpretability methods, there is a possibility that insights into internal representations could be misused to more effectively manipulate model behavior, but we do not identify novel or unique risks introduced by this work beyond those already present in the interpretability literature. Overall, we believe this work has a net positive societal impact by strengthening the technical foundations of interpretability and contributing to the long-term goal of building more transparent, controllable, and trustworthy machine learning systems.

References

Absil, P.-A., Mahony, R., and Sepulchre, R. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
Alain, G. and Bengio, Y. Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644, 2016.
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., and Mané, D. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
Arora, S., Ge, R., Ma, T., and Moitra, A. Simple, efficient, and neural algorithms for sparse coding. In Conference on Learning Theory (COLT), p. 113–149. PMLR, 2015.
Asher, N. Lexical Meaning in Context: A Web of Words. Cambridge University Press, 2011.
Babiloni, F., Marras, I., Kokkinos, F., Deng, J., Chrysos, G., and Zafeiriou, S. Poly-NL: Linear complexity non-local layers with 3rd order polynomials. In Proceedings of the IEEE/CVF International Conference on Computer Vision, p. 10518–10528, 2021.
Bao, C., Ji, H., Quan, Y., and Shen, Z. Dictionary learning for sparse coding: Algorithms and analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(7):1356–1369, 2016.
Belinkov, Y. Probing classifiers: Promises, shortcomings, and advances. Computational Linguistics, 48(1):207–219, 2022.
Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., Choi, Y., Fox, P., Garfinkel, B., Goldfarb, D., et al. International AI safety report. arXiv preprint arXiv:2501.17805, 2025.
Bereska, L. and Gavves, S. Mechanistic interpretability for AI safety - a review. Transactions on Machine Learning Research, 2024. ISSN 2835-8856. URL https://openreview.net/forum?id=ePUVetPKu6. Survey Certification, Expert Certification.
Biderman, S., Schoelkopf, H., Anthony, Q. G., Bradley, H., O'Brien, K., Hallahan, E., Khan, M. A., Purohit, S., Prashanth, U. S., Raff, E., Skowron, A., Sutawika, L., and Van Der Wal, O. Pythia: A suite for analyzing large language models across training and scaling. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, p. 2397–2430. PMLR, 2023. URL https://proceedings.mlr.press/v202/biderman23a.html.
Blondel, M., Fujino, A., Ueda, N., and Ishihata, M. Higher-order factorization machines. In Advances in Neural Information Processing Systems, volume 29, p. 3351–3359, 2016.
Bloom, J., Tigges, C., Duong, A., and Chanin, D.
SAELens. https://github.com/jbloomAus/SAELens, 2024. GitHub repository.
Bonnabel, S. Stochastic gradient descent on Riemannian manifolds. IEEE Transactions on Automatic Control, 58(9):2217–2229, 2013. doi: 10.1109/TAC.2013.2254619.
Bricken, T., Templeton, A., Batson, J., Chen, B., Jermyn, A., Conerly, T., Turner, N., Anil, C., Denison, C., Askell, A., et al. Towards monosemanticity: Decomposing language models with dictionary learning. Transformer Circuits Thread, 2023. https://transformer-circuits.pub/2023/monosemantic-features/index.html.
Bussmann, B., Leask, P., and Nanda, N. BatchTopK sparse autoencoders. In NeurIPS Workshop on Scientific Methods for Understanding Deep Learning, 2024. URL https://openreview.net/forum?id=d4dpOCqybL.
Bussmann, B., Nabeshima, N., Karvonen, A., and Nanda, N. Learning multi-level features with matryoshka sparse autoencoders. arXiv preprint arXiv:2503.17547, 2025a.
Bussmann, B., Nabeshima, N., Karvonen, A., and Nanda, N. Learning multi-level features with Matryoshka sparse autoencoders. In Proceedings of the 42nd International Conference on Machine Learning (ICML), volume 267 of Proceedings of Machine Learning Research. PMLR, 2025b.
Carreira, J., Caseiro, R., Batista, J., and Sminchisescu, C. Semantic segmentation with second-order pooling. In European Conference on Computer Vision (ECCV), p. 430–443. Springer, 2012.
Chrysos, G., Georgopoulos, M., and Panagakis, Y. Conditional generation using polynomial expansions. Advances in Neural Information Processing Systems, 34:28390–28404, 2021.
Chrysos, G. G., Moschoglou, S., Bouritsas, G., Panagakis, Y., Deng, J., and Zafeiriou, S. P-nets: Deep polynomial neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 7325–7335, 2020.
Chrysos, G. G., Georgopoulos, M., Deng, J., Kossaifi, J., Panagakis, Y., and Anandkumar, A.
Augmenting deep classifiers with polynomial neural networks. In European Conference on Computer Vision, p. 692–716. Springer, 2022a.
Chrysos, G. G., Moschoglou, S., Bouritsas, G., Deng, J., Panagakis, Y., and Zafeiriou, S. Deep polynomial neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(8):4021–4034, 2022b. doi: 10.1109/TPAMI.2021.3058891. Published online 2021.
Chrysos, G. G., Wu, Y., Pascanu, R., Torr, P., and Cevher, V. Hadamard product in deep learning: Introduction, advances and challenges. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
CodeParrot. GitHub code dataset. https://huggingface.co/datasets/codeparrot/github-code, 2022.
Csordás, R., Potts, C., Manning, C. D., and Geiger, A. Recurrent neural networks learn to store and generate sequences using non-linear representations. In Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, p. 248–262, 2024.
De-Arteaga, M., Romanov, A., Wallach, H., Chayes, J., Borgs, C., Chouldechova, A., Geyik, S. C., Kenthapadi, K., and Kalai, A. T. Bias in bios: A case study of semantic representation bias in a high-stakes setting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19), p. 120–128. ACM, 2019. doi: 10.1145/3287560.3287572.
Dooms, T. and Gauderis, W. Finding manifolds with bilinear autoencoders. In Mechanistic Interpretability Workshop at NeurIPS 2025, 2025. URL https://openreview.net/forum?id=ybJXIh4vcF.
Dubey, A., Radenovic, F., and Mahajan, D. Scalable interpretability via polynomials. Advances in Neural Information Processing Systems, 35:36748–36761, 2022.
Dumoulin, V., Shlens, J., and Kudlur, M. A learned representation for artistic style. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=BJO-BuT1g.
Dunefsky, J., Chlenski, P., and Nanda, N. Transcoders find interpretable LLM feature circuits.
In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=J6zHcScAo0.
Edelman, A., Arias, T. A., and Smith, S. T. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
Elhage, N., Hume, T., Olsson, C., Schiefer, N., Henighan, T., Kravec, S., Hatfield-Dodds, Z., Lasenby, R., Drain, D., Chen, C., et al. Toy models of superposition. Transformer Circuits Thread, 2022. https://transformer-circuits.pub/2022/toy_model/index.html.
Engels, J., Michaud, E. J., Liao, I., Gurnee, W., and Tegmark, M. Not all language model features are one-dimensionally linear. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=d63a4AM4hb.
Fodor, J. A. and Pylyshyn, Z. W. Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1-2):3–71, 1988.
Freeman, W. T. and Tenenbaum, J. B. Learning bilinear models for two-factor problems in vision. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 554–560. IEEE, 1997.
Gao, L., Biderman, S., Black, S., Golding, L., Hoppe, T., Foster, C., Phang, J., He, H., Thite, A., Nabeshima, N., Presser, S., and Leahy, C. The Pile: An 800GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2021.
Gao, L., Dupre la Tour, T., Tillman, H., Goh, G., Troll, R., Radford, A., Sutskever, I., Leike, J., and Wu, J. Scaling and evaluating sparse autoencoders. In The Thirteenth International Conference on Learning Representations (ICLR), 2025. URL https://openreview.net/forum?id=tcsZt9ZNKD.
Gao, Y., Beijbom, O., Zhang, N., and Darrell, T. Compact bilinear pooling. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p. 317–326, 2016.
Gauderis, W. and Dooms, T.
Compositionality unlocks deep interpretable models. In Connecting Low-Rank Representations in AI: At the 39th Annual AAAI Conference on Artificial Intelligence, 2025.
Gemma Team. Gemma 2: Improving open language models at a practical size. Technical report, Google DeepMind, 2024. URL https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf.
Gokaslan, A., Cohen, V., Pavlick, E., and Tellex, S. OpenWebText corpus. Zenodo, 2019. URL https://zenodo.org/records/3834942.
Grasedyck, L., Kressner, D., and Tobler, C. A literature survey of low-rank tensor approximation techniques. GAMM-Mitteilungen, 36(1):53–78, 2013.
Haspelmath, M. and Sims, A. Understanding Morphology. Routledge, 2013.
Hendrycks, D., Carlini, N., Schulman, J., and Steinhardt, J. Unsolved problems in ML safety. arXiv preprint arXiv:2109.13916, 2021.
Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
Hou, Y., Li, J., He, Z., Yan, A., Chen, X., and McAuley, J. Bridging language and items for retrieval and recommendation. arXiv preprint arXiv:2403.03952, 2024.
Huben, R., Cunningham, H., Smith, L. R., Ewart, A., and Sharkey, L. Sparse autoencoders find highly interpretable features in language models. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=F76bwRSLeK.
Hyvärinen, A. and Oja, E. Independent component analysis: Algorithms and applications. Neural Networks, 13(4-5):411–430, 2000.
Jayakumar, S. M., Czarnecki, W. M., Menick, J., Schwarz, J., Rae, J., Osindero, S., Teh, Y. W., Harley, T., and Pascanu, R. Multiplicative interactions and where to find them. In International Conference on Learning Representations, 2020.
Karvonen, A., Rager, C., Lin, J., Tigges, C., Bloom, J. I., Chanin, D., Lau, Y.-T., Farrell, E., McDougall, C. S., Ayonrinde, K., Till, D., Wearden, M., Conmy, A., Marks, S., and Nanda, N.
SAEBench: A comprehensive benchmark for sparse autoencoders in language model interpretability. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, p. 29223–29264. PMLR, 2025. URL https://proceedings.mlr.press/v267/karvonen25a.html.
Kim, J.-H., On, K.-W., Lim, W., Kim, J., Ha, J.-W., and Zhang, B.-T. Hadamard product for low-rank bilinear pooling. In International Conference on Learning Representations, 2017.
Koehn, P. Europarl: A parallel corpus for statistical machine translation. In Proceedings of Machine Translation Summit X: Papers, p. 79–86, Phuket, Thailand, September 13-15 2005.
Konda, K., Memisevic, R., and Krueger, D. Zero-bias autoencoders and the benefits of co-adapting features. arXiv preprint arXiv:1402.3337, 2014.
Lee, H., Ekanadham, C., and Ng, A. Sparse deep belief net model for visual area V2. Advances in Neural Information Processing Systems, 20, 2007.
Lindsey, J., Gurnee, W., Ameisen, E., Chen, B., Pearce, A., Turner, N. L., Citro, C., Abrahams, D., Carter, S., Hosmer, B., Marcus, J., Sklar, M., Templeton, A., Bricken, T., McDougall, C., Cunningham, H., Henighan, T., Jermyn, A., Jones, A., Persic, A., Qi, Z., Thompson, T. B., Zimmerman, S., Rivoire, K., Conerly, T., Olah, C., and Batson, J. On the biology of a large language model. Transformer Circuits Thread, 2025. URL https://transformer-circuits.pub/2025/attribution-graphs/biology.html.
Mallat, S. G. and Zhang, Z. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397–3415, 1993.
Mason, J. C. and Handscomb, D. C. Chebyshev Polynomials. CRC Press, 2002.
Meng, K., Bau, D., Andonian, A., and Belinkov, Y. Locating and editing factual associations in GPT. In Advances in Neural Information Processing Systems (NeurIPS), volume 35, p. 17359–17372, 2022.
Oldfield, J., Im, S., Li, S., Nicolaou, M., Patras, I., and Chrysos, G. Towards interpretability without sacrifice: Faithful dense layer decomposition with mixture of decoders. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025a. URL https://openreview.net/forum?id=jcvX8XFNqX.
Oldfield, J., Torr, P., Patras, I., Bibi, A., and Barez, F. Beyond linear probes: Dynamic safety monitoring for language models. arXiv preprint arXiv:2509.26238, 2025b.
Olshausen, B. A. and Field, D. J. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311–3325, 1997.
Parsons, T. Events in the Semantics of English, 1990.
Partee, B. H. Lexical semantics and compositionality. In Gleitman, L. R. and Liberman, M. (eds.), An Invitation to Cognitive Science: Language, volume 1, p. 311–360. MIT Press, Cambridge, MA, 1995.
Pearce, M. T., Dooms, T., Rigg, A., Oramas, J., and Sharkey, L. Bilinear MLPs enable weight-based mechanistic interpretability. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=gI0kPklUKS.
Perez, E., Strub, F., De Vries, H., Dumoulin, V., and Courville, A. FiLM: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. Language models are unsupervised multitask learners. Technical report, OpenAI, 2019. URL https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
Rajamanoharan, S., Conmy, A., Smith, L., Lieberum, T., Varma, V., Kramár, J., Shah, R., and Nanda, N. Improving dictionary learning with gated sparse autoencoders. arXiv preprint arXiv:2404.16014, 2024a.
Rajamanoharan, S., Lieberum, T., Sonnerat, N., Conmy, A., Varma, V., Kramár, J., and Nanda, N. Jumping ahead: Improving reconstruction fidelity with JumpReLU sparse autoencoders. arXiv preprint arXiv:2407.14435, 2024b.
Rendle, S. Factorization machines. In 2010 IEEE International Conference on Data Mining, p. 995–1000. IEEE, 2010.
Rimsky, N., Gabrieli, N., Schulz, J., Tong, M., Hubinger, E., and Turner, A. Steering Llama 2 via contrastive activation addition. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), p. 15504–15522, 2024.
Shazeer, N. GLU variants improve transformer, 2020. URL https://arxiv.org/abs/2002.05202.
Shin, Y. and Ghosh, J. The pi-sigma network: An efficient higher-order neural network for pattern classification and function approximation. In IJCNN-91-Seattle International Joint Conference on Neural Networks, volume 1, p. 13–18. IEEE, 1991.
Smolensky, P. Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1-2):159–216, 1990.
Templeton, A., Conerly, T., Marcus, J., Lindsey, J., Bricken, T., Chen, B., Pearce, A., Citro, C., Ameisen, E., Jones, A., Cunningham, H., Turner, N. L., McDougall, C., MacDiarmid, M., Freeman, C. D., Sumers, T. R., Rees, E., Batson, J., Jermyn, A., Carter, S., Olah, C., and Henighan, T. Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet. Transformer Circuits Thread, 2024. URL https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html.
Tenenbaum, J. and Freeman, W. Separating style and content. Advances in Neural Information Processing Systems, 9, 1996.
Volterra, V. Theory of Functionals and of Integral and Integro-Differential Equations. Dover Publications, 1959. Originally published 1930.
Wong, E., Santurkar, S., and Madry, A. Leveraging sparse linear layers for debuggable deep networks.
In International Conference on Machine Learning, p. 11205–11216. PMLR, 2021.
Zhang, S.-X., Gong, Y., and Yu, D. Encrypted speech recognition using deep polynomial networks. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), p. 5691–5695. IEEE, 2019.
Zhang, X., Zhao, J., and LeCun, Y. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, volume 28, 2015. URL https://proceedings.neurips.cc/paper/2015/file/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf.

Algorithm 1 PolySAE Training (Shared-U)
Input: activations x, ranks (R1, R2, R3), sparsity K, learning rate η
Initialize U ← qr+(U_rand); λ2 ← −0.5; λ3 ← 0.5
for each minibatch x do
    // Encode (P1: linear encoder with norm rescaling)
    h ← E⊤ x + b_enc
    Compute decoder norms d ∈ R^{d_sae}: d_i = ‖PolyDec(e_i)‖_2
    z ← TopK(ReLU(h ⊙ d), K)
    // Decode (P2–P4: polynomial, shared, hierarchical)
    y1 ← (z U) C^(1)⊤
    A2 ← z U[:, 1:R2]
    y2 ← (A2 ∗ A2) C^(2)⊤
    A3 ← z U[:, 1:R3]
    y3 ← (A3 ∗ A3 ∗ A3) C^(3)⊤
    y ← b_dec + y1 + λ2 y2 + λ3 y3
    // Update with manifold retraction (P4)
    L ← ‖y − x‖_2^2 + architecture-specific regularizations (e.g., Matryoshka)
    Update all parameters via ∇L
    (Q, R) ← qr(U)
    S ← diag(sgn(diag(R)))
    U ← QS
end for

A. PolySAE Algorithm

We provide the full training algorithm for PolySAE in Algorithm 1, detailing the encoding, polynomial decoding, and optimization steps used throughout all experiments.

B. Implementation Details

Architecture and Sparsification. We train standard sparse autoencoders (SAEs) and PolySAEs with a latent width of 16,384 and sparsity level K = 64. Encoders use one of three sparsification strategies: Top-K (Gao et al., 2025), BatchTopK (Bussmann et al., 2024), or Matryoshka (Bussmann et al., 2025b).
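The sparsification and polynomial-decode steps of Algorithm 1 can be sketched in NumPy. Shapes follow the pseudocode (U: d_sae × R1; C1, C2, C3: d × R1, d × R2, d × R3), but the code below is an illustrative single-vector sketch, not the released implementation, and omits the encoder norm rescaling and the QR retraction.

```python
import numpy as np

def topk(h, k):
    """Keep the k largest entries of h, zero the rest (Top-K sparsification)."""
    out = np.zeros_like(h)
    idx = np.argsort(h)[-k:]
    out[idx] = h[idx]
    return out

def polysae_decode(z, U, C1, C2, C3, b_dec, lam2=-0.5, lam3=0.5):
    """Polynomial decoder: linear + quadratic + cubic terms, all built
    from projections onto a shared subspace U (columns are nested)."""
    R2, R3 = C2.shape[1], C3.shape[1]
    a1 = z @ U                       # full R1 projection
    a2 = z @ U[:, :R2]               # first R2 columns of the shared U
    a3 = z @ U[:, :R3]               # first R3 columns of the shared U
    y1 = C1 @ a1                     # linear term
    y2 = C2 @ (a2 * a2)              # pairwise (quadratic) interactions
    y3 = C3 @ (a3 * a3 * a3)         # triple (cubic) interactions
    return b_dec + y1 + lam2 * y2 + lam3 * y3

# Tiny worked example: d = 2, d_sae = 3, ranks (2, 1, 1).
z = np.array([1.0, 0.0, 0.0])
U = np.array([[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]])
y = polysae_decode(z, U, np.eye(2), np.array([[1.0], [1.0]]),
                   np.array([[2.0], [0.0]]), np.zeros(2))
print(y)  # [1.5 1.5]
```

Note how the quadratic and cubic terms reuse leading columns of the same U, which is the hierarchical, shared-subspace structure that keeps the parameter overhead small.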
For Top-K and BatchTopK, the K largest activations per token (or batch) are retained and the remainder zeroed. All models are trained on residual-stream activations extracted from pretrained language models.

LLMs. We evaluate SAEs and PolySAEs on a standard set of pretrained language models spanning a range of scales: GPT-2 Small (Radford et al., 2019), Pythia-410M and Pythia-1.4B (Biderman et al., 2023), and Gemma-2-2B (Gemma Team, 2024). For each model, we extract residual-stream activations from a single transformer layer chosen near the center of the network, following the methodology of Dunefsky et al. (2024).

PolySAE Decoder Ranks. The rank configurations (R1, R2, R3) used in our experiments are:

• GPT-2 Small (Radford et al., 2019): (768, 64, 64)
• Pythia-410M (Biderman et al., 2023): (1024, 128, 128)
• Pythia-1.4B (Biderman et al., 2023): (2048, 128, 128)
• Gemma-2-2B (Gemma Team, 2024): (2304, 128, 128)

Training Setup. All models are trained using the Adam optimizer with β1 = 0.9 and β2 = 0.999 and a constant learning rate of 3 × 10^-4, with no warmup or decay schedules. We use a batch size of 4096 tokens and a context length of 128. We apply gradient clipping with a maximum norm of 1.0 to stabilize training. No weight decay or L1 regularization is applied to the encoder or decoder weights. Training runs for 5 × 10^8 tokens for Gemma-2-2B and the Pythia models, and 3 × 10^8 tokens for GPT-2 Small, following the protocol used in the main experiments.

Datasets. For Gemma-2-2B and GPT-2 Small, training data is drawn from OpenWebText (Gokaslan et al., 2019). For Pythia-410M and Pythia-1.4B, we use an uncopyrighted variant of the deduplicated Pile (Gao et al., 2021). Reconstruction is evaluated on held-out data from the same distribution as training.

Evaluation. We evaluate learned representations using SAEBench (Karvonen et al., 2025).
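As a rough cross-check of the "small parameter overhead" claim, one can count the extra decoder factors against a standard SAE. The accounting below is our assumption, not the paper's exact bookkeeping: it treats C1, C2, C3 as the only extra parameters and assumes the shared U matches a standard decoder's size when R1 = d.

```python
# Back-of-the-envelope parameter overhead for GPT-2 Small
# (d = 768, latent width d_sae = 16384, ranks (768, 64, 64)).
# Assumption: baseline = standard SAE encoder + decoder (2 * d_sae * d);
# extra = the C1, C2, C3 coefficient factors (d * (R1 + R2 + R3)).

def overhead_ratio(d, d_sae, R1, R2, R3):
    base = 2 * d_sae * d             # standard SAE: encoder + decoder
    extra = d * (R1 + R2 + R3)       # PolySAE's extra coefficient factors
    return extra / base

r = overhead_ratio(d=768, d_sae=16384, R1=768, R2=64, R3=64)
print(f"{100 * r:.1f}%")  # prints "2.7%"
```

This lands in the same ballpark as the roughly 3% overhead on GPT-2 quoted for PolySAE, which is why the polynomial terms are described as nearly free in parameter count.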
Reported metrics include reconstruction error on held-out data and sparse probing performance on six classification tasks: Bias in Bios (De-Arteaga et al., 2019), AG News (Zhang et al., 2015), EuroParl (Koehn, 2005), GitHub programming languages (CodeParrot, 2022), Amazon Sentiment, and Amazon-15 (Hou et al., 2024).

Implementation. Our training pipeline extends SAELens (Bloom et al., 2024) to support PolySAE while preserving the standard SAE training interface. PolySAE differs from standard SAEs only in the decoder; all other components, including the encoder, sparsification strategy, optimizer, and evaluation pipeline, are shared across models.

C. Extended Qualitative Analysis

We present an extended qualitative analysis of the interaction structure learned by PolySAE. The analysis proceeds hierarchically, first examining second-order (pairwise) interactions and then extending to third-order (triplet) compositions. Throughout, we compare PolySAE to a vanilla Top-K SAE trained under identical conditions.

C.1. Second-Order Analysis

We begin by analyzing pairwise interactions to assess whether PolySAE captures compositional structure beyond surface-level co-occurrence.

Setup. Both models are applied to 1M documents from OpenWebText. Features are ranked by total activation mass, and the top 10,000 are retained, yielding approximately 5 × 10^7 candidate feature pairs.

Interaction Strength. For PolySAE, we quantify the strength of a feature pair (i, j) using the learned quadratic decoder weights:

B_ij = λ2 ‖(u_i ⊙ u_j)⊤ C^(2)⊤‖_2,  (5)

which reflects how much decoder capacity is assigned to that interaction. For the vanilla SAE, which lacks explicit interaction parameters, we use the empirical feature covariance,

Cov_ij = E[z_i z_j] − E[z_i] E[z_j],  (6)

as a proxy for pairwise structure.

Relation to Co-occurrence. We independently estimate empirical co-occurrence by counting positions where both features appear in the Top-K active set.
For the vanilla SAE, covariance is strongly correlated with co-occurrence (r = 0.82), indicating that pairwise structure largely mirrors frequency. In contrast, PolySAE's interaction strengths show negligible correlation with co-occurrence (r = 0.06), suggesting that learned interactions reflect structure beyond surface statistics.

Qualitative Regimes. The weak coupling between interaction strength and frequency allows us to identify qualitatively distinct regimes. Of particular interest are latent interactions: feature pairs with strong learned interactions despite low empirical co-occurrence. These pairs often correspond to meaningful compositional patterns that are not recoverable from frequency alone.

Examples. For interactions above the 80th percentile in B_{ij}, we extract representative contexts in which both features co-activate. We mark the target token in each sentence and label features by their top-activating tokens. Comparing these contexts with vanilla SAE activations highlights cases where PolySAE captures relationships that the linear model does not.

Table 5. Compositional Interactions Captured by PolySAE. PolySAE features (A and B) bind in context to represent specific compositional concepts. Vanilla SAE features (with high-frequency features filtered) fail to capture these compositions.

Poly Feature A | Poly Feature B | Context | Vanilla SAE
[star, stars] | [coffee, tea] | We've all certainly heard of beers brewed with espresso, but how about one with an espresso shot poured over the top? | Starbucks [Apple, Google]
[officially, ically] | [traditional, conventional] | It's hard to say what's most impressive about Eduardo Garcia. The classically-trained chef spent years cooking aboard yachts | [newly, ly]
[surgery, repair] | [Trans, LGBT] | Some in the transgender community are worried a suspicious fire at a Montreal clinic will add delays to an already lengthy process to get gender reassignment surgery | [birth, baby]
[DNA, genetic] | [mod, mods] | Activists are opening up a new front in their campaign against genetic modification. The latest target is genetically-modified trees, which scientists believe could bring huge sustainability | [modified, edit]
[secret, hidden] | [Snowden, WikiLeaks] | On May 24th PBS aired a Frontline documentary about alleged Wikileaker Bradley Manning called "WikiSecrets" | [secret, secrets]
[business, businesses] | [man, woman] | By Joseph George. The businessman dad of the boy who drove a Ferrari and was arrested by police in Kerala, India | [man, President]

C.2. Third-Order Analysis

We next examine whether third-order interactions refine or disambiguate pairwise compositions.

Candidate Selection. We focus on latent second-order pairs (those with high interaction strength but low co-occurrence) and identify corpus positions where both features are simultaneously active.

Triplet Scoring. Within these contexts, we evaluate all co-active third features using the learned cubic decoder:

    \Gamma(f_1, f_2, k) = \lambda_3 \,(u_{f_1} \odot u_{f_2}) \cdot u_k^\top C^{(3)\top}.    (7)

For each pair, we retain the third feature with the highest score, after filtering stopword-like features.

Interpretation. The resulting triplets are consistently interpretable, with the third feature modulating the meaning of the pair rather than introducing unrelated content. Common patterns include entity–attribute–domain and subject–object–context structures. Representative examples are shown in Table 10, illustrating how higher-order interactions sharpen and contextualize pairwise compositions.

C.3. Additional Second-Order Interaction Examples

Tables 5–9 show additional second-order interaction examples.
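The triplet scoring of Eq. (7) can be sketched as follows. This is a hedged NumPy illustration: the (n_features, r) layout of the shared projections and the square third-order factor C3 are our assumptions, not the authors' code.

```python
import numpy as np

def triplet_score(U: np.ndarray, C3: np.ndarray, lam3: float,
                  f1: int, f2: int, k: int) -> float:
    """Cubic-decoder score Gamma(f1, f2, k) of Eq. (7).

    U:  (n_features, r) rows are the shared-subspace projections u_i
    C3: (r, r)          assumed third-order core factor
    Note (u_f1 ⊙ u_f2) · u_k^T C3^T == (u_f1 ⊙ u_f2)^T C3 u_k.
    """
    return lam3 * float((U[f1] * U[f2]) @ C3 @ U[k])

def best_third_feature(U, C3, lam3, f1, f2, candidates):
    """Retain the co-active third feature with the highest score;
    stopword-like features are assumed already filtered from
    `candidates`, mirroring the selection rule described above."""
    return max(candidates, key=lambda k: triplet_score(U, C3, lam3, f1, f2, k))
```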
Each row highlights a token where two PolySAE features are simultaneously active. Across these tables, the interacting features are typically more specific than the corresponding vanilla SAE features in the same context. The vanilla SAE often activates on a single high-level, morphological, or broadly related feature, while PolySAE activations reflect a more refined decomposition at the highlighted token.

C.4. Additional Third-Order Interaction Examples

Tables 10 and 11 present further third-order examples. Each row shows contexts in which three PolySAE features co-activate at the same token. In these cases, the activated features vary with context and appear more specific than the corresponding vanilla SAE activations, which often capture only one component or default to generic features.

Table 6. PolySAE Interactions – Brand & Proper Noun Decomposition. PolySAE decomposes compound names into their semantic constituents. Vanilla SAE (with high-frequency features filtered) often fires on unrelated entities or only captures the surface form.

Poly Feature A | Poly Feature B | Context | Vanilla SAE
[economic, economy] | [Times, magazine] | The RBI on Wednesday did not allow Stanley Pignal, the South Asian business and finance correspondent for the Economist magazine, to attend the central bank's | [economics, economist]
[field, fields] | [University, school] | John Doe is a Jesuit with ADHD. He was an outstanding student and a compassionate senior at Fairfield University who played sports and volunteered often at a literacy | [York, Washington]
[Star, Chronicle] | [staff, crew] | Man Arrested after Stolen Mower Runs Out of Gas. By West Kentucky Star Staff. PADUCAH, KY | [staff, faculty]
[Dragon, Iron] | [steel, Pittsburgh] | The Iron Horde is on the march, and the Warlords of Draenor are primed to invade Azeroth on November 13! Steel yourself for the onslaught by watching | [assault, steel]
[Italian, Italy] | [gang, mob] | Details obtained by the Guardian reveal the extent to which Sicilian mafia clans are migrating north after running into financial problems in Italy. | [State, ISIS]

Table 7. PolySAE Interactions – Morphological Composition. PolySAE binds suffix/prefix features with semantic content to form derived words. Vanilla SAE (with high-frequency features filtered) captures only generic morphological patterns without semantic binding.

Poly Feature A | Poly Feature B | Context | Vanilla SAE
[ers, Workers] | [administration, administrative] | Piedmont High School. A school reveals it has a "Fantasy Slut League" Administrators try to do the right thing, but fall woefully short of | [members, ers]
[ing, ings] | [arrested, arrest] | Earlier this year, The Heritage Foundation's Meese Center released Arresting Your Property, a comprehensive report on civil asset forfeiture-the much mal | [ing, training]
[protest, protests] | [making, ing] | Major League Baseball can no longer claim to be free of any anthem-protesting players. On Saturday night, A's catcher Bruce Maxwell took | [ing, training]
[ers, Workers] | [photos, pictures] | In-Sight Film. The film in-sight was produced in conjunction with the Format Photography Festival to mark 10 years of the Street Photographers group | [members, ers]
[ized, ization] | [treatment, drugs] | The flu shot is a quack science medical hoax. While some vaccines do confer immunization effectiveness, the flu shot isn't one of them | [development, ation]
[bound, ice] | [gun, guns] | A new Texas law gives gun owners a new right to store a weapon (any lawfully owned firearm, not just those owned under a Concealed Handgun License | [gun, weapons]

Table 8. PolySAE Interactions – Domain-Specific Collocations. PolySAE captures specialized terminology through the interaction of domain features.
Vanilla SAE (with high-frequency features filtered) often misses the domain-specific meaning.

Poly Feature A | Poly Feature B | Context | Vanilla SAE
[football, NFL] | [conference, conferences] | Statement from the Southeastern Conference Office Regarding the Florida-LSU football game: The LSU-Florida football game scheduled for Saturday in Gainesville | [League, Conference]
[earnings, financial] | [number, numbers] | T-Mobile US, Inc. TMUS is scheduled to report fourth-quarter 2015 financial numbers, before the opening bell on Feb 17. Last | [numbers, figures]
[technology, tech] | [development, developers] | You can't look at internet news lately without seeing the latest and greatest in nanotechnology developments. Everything these days is being manufactured smaller, faster | [it, said]
[Canada, Canadian] | [oil, pipeline] | Eddy Radillo holds a Texas flag and a sign opposing the Transcanada Keystone Pipeline in February 2012 outside the Lamar County Courthouse in Paris | [oil, gas]
[diet, fitness] | [train, rail] | Here's what you need to know... Your gains will stagnate if you only weight train within the same rep ranges and loading patterns. | [training, train]

Table 9. PolySAE Interactions – Compound Words & Phrases. PolySAE captures compound words and multi-word phrases through feature interactions. Vanilla SAE (with high-frequency features filtered) often misses the compositional meaning entirely.

Poly Feature A | Poly Feature B | Context | Vanilla SAE
[director, founder] | [lead, managing] | A FRIEND OF MINE recently made the following observation about Ezra Koenig, the founder and lead singer of Vampire Weekend. "Did you realize, | [led, lead]
[written, designed] | [research, researcher] | The following story was written and researched by Rone Tempest for The Utah Investigative Journalism Project in partnership with The Salt Lake Tribune. Dustin Porter said | [created, made]
[alleged, allegations] | [level, levels] | Back to previous page. Accusations against generals cast dark shadow over Army. By Ernesto Londoño. The accusations leveled against | [place, made]
[music, musical] | [official, officer] | ROCHESTER, N.Y. – Members of Rochester's music community continue to pull together to remember and help the family of a fellow musician who | [man, President]
[involved, involvement] | [support, help] | STEAL THIS SHOW's Patreon campaign helps keep us free and independent. If you enjoy the show, get involved. Our patrons get access to | [started, ready]
[shooting, shot] | [focus, focused] | Berenice Abbott was an American photographer best known for her black-and-white photography of New York City. She heavily focused her shooting | [ing, training]
[document, documents] | [content, contents] | Use these links to rapidly review the document. TABLE OF CONTENTS. INDEX TO CONSOLIDATED FINANCIAL STATEMENTS | [Introduction, History]

Table 10. Third-Order Compositional Interactions Captured by PolySAE. Three PolySAE features (F_i, F_j, F_k) bind in context to represent compositional concepts. Vanilla SAE often captures individual components but misses the compositional structure.

Poly F_i | Poly F_j | Poly F_k | Context | Vanilla SAE
[nuclear, Fukushima, reactor] | [test, testing, tested] | [radiation, laser, magnetic] | US tests nuclear-capable missile with the range to strike North Korea. The US has test-fired a nuclear-capable intercontinental ballistic missile | [nuclear, reactor, atomic]
[black, white, racial] | [Americans, Canadians, Australians] | [people, women, men] | In a push to get more Black Americans involved in the world of tech, a slew of organizations have teamed up with South by Southwest | [Americans, Muslims, Jews]
[ing, ings, ting] | [stock, trading, market] | [investment, invest, investing] | Philippines stocks higher at close of trade; PSEi Composite up 0.57%. Investing.com — Philippines stocks were higher after | [ing, training, running]
[line, lines, lining] | [supply, supplies, shortage] | [road, route, pipeline] | The same is true of supply lines into landlocked Afghanistan. Within months of the 2001 invasion, Mr. Musharraf signed a deal | [the, „ ., ’, of] [the, „ ’, of, a]
[proved, proven, prove] | [star, stars, superstar] | [reputation, popularity, fame] | Arguably the biggest surprise would have been if he had turned up, but David Bowie proved some stars are big enough not to have to make themselves available | [star, stars, superstar]
[historic, historical, historically] | [UFC, fight, MMA] | [strong, impressive, solid] | After 1,501 days as UFC light-heavyweight champion, Jon Jones' historic title reign came to an end late Tuesday when he was stripped | [the, „ ., ’, of]
[treated, treat, treating] | [consumers, consumer, consumption] | [customers, customer, clients] | Jeremy Corbyn today warned the banking industry it must not treat consumers and entrepreneurs as a "cash cow" and attacked the links between senior politicians | [the, „ ., ’, of]

Table 11. Additional Third-Order PolySAE Interactions. Further examples of three-way feature compositions. Vanilla SAE sometimes captures individual components but misses the compositional structure.
Poly F_i | Poly F_j | Poly F_k | Context | Vanilla SAE
[Army, Force, Navy] | [Israel, Israeli, Jewish] | [IDF] | Earlier this week, the Friends of the Israel Defense Forces, an organization dedicated to supporting the men and women serving in the IDF, held its annual dinner | [Israel, Israeli, Jewish] [the, „ ., ’, of]
[annual, monthly, annually] | [percent, %, points] | [regular, regularly, frequent] | According to the latest research from our Wireless Smartphone Strategies (WSS) service, global smartphone shipments grew 6 percent annually to reach 360 million units | [annual, monthly, annually] [the, „ ., ’, of]
[get, make, getting] | [film, movie, films] | [documented, depicted, depicts] | Three tips on how to film anywhere; slums, red light districts, museums, exhibitions, churches, and not get your video camera gear stolen | [film, movie, films] [the, „ ., ’, of]
[well, ill, poorly] | [widely, commonly, widespread] | [best, better, good] | April 6, 2014. CR Sunday Interview: Zack Soto. ***** is a widely well-liked cartoonist, publisher and | [well, poorly, badly] [the, „ ., ’, of]
[accept, accepted, accepting] | [final, ultimate, preliminary] | [great, considerable, significant] | NEW YORK — Dedicated Hillary Clinton supporters accepted final defeat Wednesday morning even as they struggled to accept that their candidate lost | [final, finals, ultimate] [the, „ ., ’, of]
[percent, %, percentage] | [currency, dollar, euro] | [cents] | The Canadian dollar dipped below 75 cents (U.S.) in Tuesday's trading as equity markets worldwide remained extremely volatile | [the, „ ., ’, of] [., $, „ on, to]
[identified, identify, diagnosed] | [virus, Ebola, HIV] | [label, labels, labeled] | the governor of New York State announced that the first case of Ebola had been diagnosed at Bellevue | [the, „ ., ’, of]
[base, bases, baseline] | [fans, fan, supporters] | [demand, turnout, attendance] | It's a shared problem among fan bases across the National Hockey League: They watch their own players so closely that, after a while | [the, „ ., ’, of] [the, „ ’, of, a]
[unique, distinct, distinctive] | [two, different, three] | [separate, separated, distinction] | For their collaborative project Jus Now, U.K. producer Sam Interface and Trinidad producer LAZAbeam find singularity in mashing up two distinct | [people, men, officers] [the, „ ., ’, of]
[line, lines, lining] | [supply, supplies, shortage] | [road, route, pipeline] | The same is true of supply lines into landlocked Afghanistan. Within months of the 2001 invasion, Mr. Musharraf signed a deal | [the, „ ., ’, of] [the, „ ’, of, a]
[largest, most, biggest] | [able, ible, ability] | [stable, stability, flexible] | Groundwater, the globe's most dependable water insurance system, is not as renewable as researchers once thought | [the, „ ., ’, of] [the, „ ’, of, a]