Paper deep dive

Dynamics Reveals Structure: Challenging the Linear Propagation Assumption

Hoyeon Chang, Bálint Mucsányi, Seong Joon Oh

Year: 2025Venue: arXiv preprintArea: Formal/TheoreticalType: TheoreticalEmbeddings: 114

Models: OLMo3-7B, Qwen3-30B, Qwen3-4B

Abstract

Abstract:Neural networks adapt through first-order parameter updates, yet it remains unclear whether such updates preserve logical coherence. We investigate the geometric limits of the Linear Propagation Assumption (LPA), the premise that local updates coherently propagate to logical consequences. To formalize this, we adopt relation algebra and study three core operations on relations: negation flips truth values, converse swaps argument order, and composition chains relations. For negation and converse, we prove that guaranteeing direction-agnostic first-order propagation necessitates a tensor factorization separating entity-pair context from relation content. However, for composition, we identify a fundamental obstruction. We show that composition reduces to conjunction, and prove that any conjunction well-defined on linear features must be bilinear. Since bilinearity is incompatible with negation, this forces the feature map to collapse. These results suggest that failures in knowledge editing, the reversal curse, and multi-hop reasoning may stem from common structural limitations inherent to the LPA.

PDF

Open source PDF →

PDF not stored locally. Use the link above to view on the source site.

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 92%

Last extracted: 3/11/2026, 12:37:58 AM

Summary

The paper investigates the geometric limitations of the Linear Propagation Assumption (LPA) in neural networks, specifically regarding logical coherence during first-order parameter updates. By formalizing relational knowledge through relation algebra, the authors prove that systematic logical propagation (negation and converse) requires a tensor factorization of entity-pair context and relation content. Furthermore, they identify a fundamental obstruction for composition, showing that it necessitates a bilinear structure incompatible with negation, which leads to feature map collapse.

Entities (5)

Neural Networks · technology · 99%Linear Propagation Assumption · concept · 98%Relation Algebra · formalism · 95%Reversal Curse · phenomenon · 95%Systematic Linear Propagation · concept · 92%

Relation Signals (3)

Negation → requires → Tensor Factorization

confidence 95% · For negation and converse, we prove that guaranteeing direction-agnostic first-order propagation necessitates a tensor factorization separating entity-pair context from relation content.

Composition → isobstructedby → Bilinearity

confidence 92% · We show that composition reduces to conjunction, and prove that any conjunction well-defined on linear features must be bilinear.

Linear Propagation Assumption → ischallengedby → Systematic Linear Propagation

confidence 90% · In this work, we rigorously investigate this hypothesis by asking: What structural constraints are imposed on a model’s representation if we demand that local linear updates respect the logical structure of relational knowledge?

Cypher Suggestions (2)

Find all logical operations and their associated structural requirements defined in the paper. · confidence 85% · unvalidated

MATCH (op:Operation)-[:REQUIRES]->(struct:Structure) RETURN op.name, struct.name

Map the relationship between concepts and the phenomena they explain. · confidence 80% · unvalidated

MATCH (c:Concept)-[:EXPLAINS]->(p:Phenomenon) RETURN c.name, p.name

Full Text

113,847 characters extracted from source content.

Expand or collapse full text

Dynamics Reveals Structure: Challenging the Linear Propagation Assumption Hoyeon Chang Bálint Mucsányi Seong Joon Oh Abstract Neural networks adapt through first-order parameter updates, yet it remains unclear whether such updates preserve logical coherence. We investigate the geometric limits of the Linear Propagation Assumption (LPA), the premise that local updates coherently propagate to logical consequences. To formalize this, we adopt relation algebra and study three core operations on relations: negation flips truth values, converse swaps argument order, and composition chains relations. For negation and converse, we prove that guaranteeing direction-agnostic first-order propagation necessitates a tensor factorization separating entity-pair context from relation content. However, for composition, we identify a fundamental obstruction. We show that composition reduces to conjunction, and prove that any conjunction well-defined on linear features must be bilinear. Since bilinearity is incompatible with negation, this forces the feature map to collapse. These results suggest that failures in knowledge editing, the reversal curse, and multi-hop reasoning may stem from common structural limitations inherent to the LPA. Machine Learning, ICML 1 Introduction Modern machine learning systems evolve through a long trajectory, spanning pretraining, continual learning (Wu et al., 2024), and unlearning (Jang et al., 2023). Throughout this lifecycle, the central operation for adapting to new information is the first-order parameter update. Ideally, this adaptation should be rational: when a model revises its belief about a fact, its logically related beliefs should update accordingly to maintain coherence. However, maximizing the likelihood of a target fact is fundamentally an optimization process, distinct from rational belief revision (Hase et al., 2024). Consequently, it remains unclear whether the local geometry of gradient-based updates can inherently guarantee such logical consistency without inducing contradictions. Encouraged by the impressive capabilities of current Large Language Models (LLMs) in logical reasoning during inference (Kojima et al., 2022; Achiam et al., 2023), it is tempting to assume that such coherence is preserved under local, first-order parameter updates. This premise, which we term the Linear Propagation Assumption (LPA), often serves as a foundational design principle in current techniques. For instance, prominent knowledge editing methods explicitly formulate the update as a constrained linear least-squares problem, treating network layers as linear associative memories (Bau et al., 2020; Meng et al., 2022a, b). Beyond editing, this implicit assumption also appears in continual learning strategies that aim to add tasks without forgetting (Lopez-Paz and Ranzato, 2017) and unlearning techniques designed to erase specific knowledge (Jang et al., 2023). However, the validity of the LPA is questionable, as empirical failures of LLMs on logical coherence show. For instance, LLMs exhibit the “reversal curse,” failing to generalize to reverse relationship (Berglund et al., 2023), and struggle with compositional reasoning tasks (Dziri et al., 2023). Since these representations are constructed through first-order updates, such persistent failures suggest that the update mechanism may not reliably imprint the necessary logical structure. This limitation becomes explicitly visible in knowledge editing, where even carefully targeted updates consistently fail to propagate to logical consequences such as negations or implications (Zhong et al., 2023; Cohen et al., 2024; Liu et al., 2025). These phenomena suggest a shared structural issue: the geometry of first-order updates imposes structural constraints that are inherently ill-suited for systematic logical operations. In this work, we rigorously investigate this hypothesis by asking: What structural constraints are imposed on a model’s representation if we demand that local linear updates respect the logical structure of relational knowledge? To answer this, we formalize relational knowledge using relation algebra (Givant, 2006), which builds relational knowledge via three fundamental operations: negation (flipping truth values), converse (swapping argument order), and composition (chaining relations). Following Tarski’s invariance criterion for logical notions (Tarski and Corcoran, 1986; Sher, 2008), we treat entity renamings as symmetries of the query space and ask what constraints such symmetries impose on the linearized geometry. This provides a principled way to test whether first-order parameter updates can support systematic propagation of logical operations. Parameter ManifoldAfter update: Δs(¬q)=−Δs(q) s( q)=- s(q)Tangent Spaceθϕq _q =∇sθ(q)=∇ s_θ(q)ϕ¬q≈−ϕq _ q≈- _qθ+Δθ+ θ Query (q) “T-Rex exists” Negated Query (¬q q) “T-Rex does not exist” Gradient (ϕq _q) [ → ] Inverted Gradient (−ϕq- _q) [ ← ] Log. Not¬ ϕ ϕ Flip−I-I Figure 1: Geometric interpretation of logical equivariance. (Left) A query q is associated with a score sθ(q)s_θ(q) (e.g., log probability), and its gradient ϕq=∇sθ(q) _q=∇ s_θ(q) as a feature. Under the LPA, a local parameter update Δθ θ induces a score change approximated by the inner product: Δs(q)=sθ+Δθ(q)−sθ(q)≈⟨ϕq,Δθ⟩ s(q)=s_θ+ θ(q)-s_θ(q)≈ _q, θ . Logical consistency under direction-agnostic first-order propagation requires that any local parameter change enhancing q suppresses ¬q q (i.e., Δs(¬q)=−Δs(q) s( q)=- s(q)), which necessitates that the gradient vectors be anti-aligned (ϕ¬q≈−ϕq _ q≈- _q). (Right) This geometric requirement induces a commutative diagram where symbolic negation ¬ in the query space corresponds to a linear inversion −I-I in the gradient feature space. Geometrically, satisfying these criteria requires navigating a fundamental trade-off between total superposition (uncontrollable interference) and perfect decoupling (lookup tables). Instead, we seek a structured coupling: for example, an update to p must automatically adjust ¬p p (Fig. 1), while remaining linearly independent from unrelated facts. We translate this requirement into the geometric structure of gradients to formalize Systematic Linear Propagation (SLP). SLP imposes strict logical equivariance on coupled facts, ensuring that linear updates systematically track unary logical transformations (negation and converse). Our theoretical analysis reveals several necessary conditions on the required geometry of first-order updates. For negation, we prove that guaranteeing direction-agnostic first-order propagation necessitates a tensor factorization separating entity-pair context from relation content (Theorem 1), reminiscent of Smolensky (1990). Moreover, converse requires its further decomposition into symmetric and antisymmetric components, constraining how argument order is represented (Theorem 2). In contrast, we identify a fundamental obstruction when extending this systematicity to composition (Theorem 3). We show that a minimal form of systematic composition reduces to conjunction, where a conjunction well-defined on linear features necessitates a bilinear structure. However, we demonstrate that this bilinearity is incompatible with the geometry enforced by negation equivariance. This conflict forces the feature map to collapse, suggesting that the failure to propagate updates to compositional consequences in the first-order regime may stem from a fundamental geometric mismatch. More broadly, our results support the view that dynamics reveals structure: analyzing how representations transform under updates exposes structural necessities that are invisible to static function approximation. This perspective contributes to the classical systematicity debate (Fodor and Pylyshyn, 1988) by deriving binding-compatible block structure not as an architectural choice, but as a geometric necessity for logical coherence under LPA. Ultimately, this opens a path toward logical geometric deep learning (Bronstein et al., 2021), treating logical operations not merely as static rules, but as dynamic symmetries to be preserved throughout learning. 2 Problem Formulation This section formalizes the problem. We first show empirically that current LLMs violate a basic requirement for negation consistency under LPA (Sec. 2.1), then introduce relation algebra as our formalism (Sec. 2.2) and linearized features as our geometric tool (Sec. 2.3). These lead to our central definition: Systematic Linear Propagation (SLP) (Sec. 2.4), which grounds the theorems in Sec. 3. 2.1 Motivation: Gradient Misalignment in LLMs Figure 2: Gradient alignment hinders negation consistency. Cosine similarities between gradients of facts and their negations. Contrary to the theoretical requirement for anti-alignment (=−1=-1), empirical gradients are strongly positively aligned (≈0.85≈ 0.85). The gradients are computed with respect to the parameters of the last Transformer block and LM head. See App. A for detailed setup. The Linear Propagation Assumption (LPA) posits that logical consistency can be maintained via local, first-order parameter updates. For such updates to be rational, the model’s geometry must support the logical relations between facts. Specifically, as illustrated in Fig. 1, increasing the score of a query q should automatically suppress that of the negated query ¬q q. Geometrically, this implies that the update vector Δθ θ enhancing q should naturally have a negative projection onto the gradient of ¬q q. In other words, the gradients ∇θ(q) _θ(q) and ∇θ(¬q) _θ( q) should be anti-aligned. We empirically test this condition on Qwen3-4B and 30B (Yang et al., 2025), and Olmo3-7B (Olmo et al., 2025) using the TREx subset of Negated LAMA (Kassner and Schütze, 2020). Contrary to the ideal, the results in Figs. 2 and 4 reveal a misalignment: the gradients of facts and their negations are strongly positively aligned. Consequently, a typical local update improving q will often induce a similar increase in ¬q q, creating an obstruction for LPA-based methods. We hypothesize this gradient misalignment not merely as a training artifact, but as a symptom of a deeper structural mismatch. Standard representations, shaped by first-order dynamics, seemingly lack the geometric capacity to distinguish a fact from its negation in the tangent space. This motivates our central question: What representation geometry is required to guarantee systematic logical propagation? 2.2 Preliminary: Relation Algebra To study the question, we first formalize the logical structure of relational knowledge using relation algebra. The NLP community has predominantly formalized factual knowledge as a collection of relational triples (h,r,t)(h,r,t), where a head entity h and a tail entity t are connected by a binary relation r (e.g., (T-Rex, HasA, FourLegs)) (Petroni et al., 2019). This has become a standard protocol for evaluating factual knowledge in LLMs (Cohen et al., 2024), interpreting LLM as an implicit knowledge graph. To analyze this setup mathematically, we adopt the formalism of relation algebra (Givant, 2006). Relation algebra provides a rigorous language for manipulating binary relations, which we use to organize the logical operations we study over relational knowledge modeled as a triplet. Let E be a finite set of entities, and write U:=E×EU:=E× E for the universe of ordered pairs. A relation on E is a subset r⊆Ur U. Intuitively, (h,t)∈r(h,t)∈ r means that h stands in relation r to t.111Throughout, we conflate a relation name with the set of entity pairs for which it holds in the underlying knowledge base. Now, we write the universe of relations :=2U Rel:=2^U for the set of all binary relations on E. Relation algebra consists of Boolean operations alongside relation-specific operations such as converse and composition. In this work, we study two regimes: we first focus on the unary operations negation (¬ ) and converse ((⋅)⌣(·) ), and later turn to the binary operation of composition ((⋅;⋅)(·;·)), which underlies multi-hop reasoning. For any relations r,s∈r,s∈ Rel, these operations are defined set-theoretically as: ¬r r :=U∖r, :=U r, (1) r⌣ r :=(t,h):(h,t)∈r, :=\(t,h):(h,t)∈ r\, (2) r;s r;s :=(h,t):∃b∈E,(h,b)∈r∧(b,t)∈s. :=\(h,t):∃ b∈ E,(h,b)∈ r (b,t)∈ s\. (3) We focus on these three operations since recent studies identify them as recurring failure modes of LLMs: negation robustness (Kassner and Schütze, 2020; Liu et al., 2025), reversed queries (the “reversal curse”) (Berglund et al., 2023), and multi-hop propagation (Cohen et al., 2024). For our later analysis, let ℛR be the closure of a base set of relations ℛ0⊆R_0 Rel under the two unary operations of negation and converse, i.e., the smallest set containing ℛ0R_0 and satisfying r∈ℛ⇒¬r∈ℛr r and r⌣∈ℛr . We call (ℛ,¬,(⋅)⌣)(R, ,(·) ) our unary relation algebra.222We later return to consider composition in Sec. 4. Now, given h,t∈Eh,t∈ E and r∈ℛr , we write r(h,t)r(h,t) for the atomic formula asserting that (h,t)∈r(h,t)∈ r. Negation and converse act on these atomic formulas as ¬r(h,t) r(h,t) and r⌣(t,h)r (t,h), respectively. In later sections, we will study how such operations should be represented and propagated in a linear feature space. 2.3 Queries, Scores, and Linearized Features Having established the symbolic structure of relational knowledge, we now bridge the gap to the continuous geometry of neural networks. Our goal is to analyze how logical operations on symbols map to geometric transformations on features. To this end, we operate in the linearized regime of parameter updates, treating the gradient of a query as its feature vector. Note that this approach aligns with the Neural Tangent Kernel (NTK) perspective (Jacot et al., 2018), effectively treating the model as a linear function of parameters defined by these gradient features. Queries. We model each atomic formula as a triplet of a head entity, a relation, and a tail entity. Let E be a finite set of entities and let ℛR be our unary relation algebra. A query is a triple q=(h,r,t)∈Q:=E×ℛ×Eq=(h,r,t)\ ∈\ Q:=E×R× E, where a query q=(h,r,t)q=(h,r,t) can be read as an atomic formula r(h,t)r(h,t) as discussed above. Scores and Linearization. Fix a finite-dimensional real inner-product space (Θ,⟨⋅,⋅⟩)( , ·,· ) representing the parameters of a differentiable model, and let θ0∈Θ _0∈ be a reference model state. We associate each query q∈Qq∈ Q with a differentiable scalar score sθ(q)∈ℝs_θ(q) (e.g., the logit of the correct tail entity). We assume that the model’s decision about q (e.g., predicted truth or preference) is determined by a rule strictly monotone in sθ(q)s_θ(q). As we are primarily interested in how these scores change under the first-order regime, we define the feature of a query as follows: Definition 1 (Linearized feature). For each query q∈Qq∈ Q, we define its linearized feature at θ0 _0 as the gradient of the score: ϕq:=∇θsθ(q)|θ=θ0∈Θ. _q\ :=\ _θs_θ(q) |_θ= _0\ ∈\ . Consequently, the score change under a local parameter update Δθ θ is given by the first-order Taylor approximation: sθ0+Δθ(q)≈sθ0(q)+⟨ϕq,Δθ⟩. s_ _0+ θ(q)\ ≈\ s_ _0(q)\ +\ _q, θ . The features ϕq _q typically span only a subspace of the full parameter space. Henceforth, we restrict our analysis to the subspace W:=spanϕq:q∈Q⊆ΘW:=span\ _q:q∈ Q\ , since components of an update Δθ θ orthogonal to W do not affect first-order score changes on the queries we study. 2.4 Systematic Linear Propagation We now unify the algebraic structure of knowledge (Sec. 2.2) with the geometric view of features (Sec. 2.3). Our goal is to formalize when a linearized model supports the systematic propagation with respect to the unary relation algebra we consider. Qualitatively, systematicity imposes structured coupling in the feature space: an update to a query q should propagate to its converse and inversely to its negation, while logically unrelated facts remain independently controllable. A key choice in our formulation is that we do not define systematicity as the mere existence of a carefully selected update direction that satisfies these constraints. Instead, we ask for automaticity: logical coupling must be an intrinsic geometric property of the representation. This ensures that once an update is applied to enforce q, the induced effects on logically related queries follow automatically, without requiring the optimizer to solve a separate constraint-satisfaction problem. Formally, we require the logical constraints to hold for all Δθ∈W θ∈ W in the first-order regime. This direction-agnostic requirement is motivated by the potential fragility of relying on specific “safe” update directions in high-capacity models. In practical regimes where the feature space is highly superposed (Elhage et al., 2022; Hu et al., 2025), reliably identifying an update direction that satisfies logical constraints while avoiding interference with unrelated facts is often prohibitively difficult. Furthermore, in lifelong learning settings, any such transient “safe subspace” itself might be prone to drift as the representation evolves. Thus, by enforcing systematicity for all update directions, we isolate a notion in which logic is an intrinsic property of the local geometry rather than an artifact of a specific optimization trajectory or a transient subspace. To formalize this, we first identify the closed sets of queries that must be logically coupled under any update. Recall that our unary relation algebra ℛR is closed under negation ¬ and converse (⋅)⌣(·) . These relational operations induce a corresponding action on the space of queries Q as follows: ¬(h,r,t) (h,r,t) :=(h,¬r,t), :=(h,\ r,\ t), (4) rev(h,r,t) (h,r,t) :=(t,r⌣,h). :=(t,\ r ,\ h). (5) Intuitively, ¬ flips the truth value by acting locally on the relation slot (e.g., ¬ChildOf→NotChildOf ChildOf→ NotChildOf where ChildOf⊆U ChildOf U and NotChildOf:=U∖ChildOf NotChildOf:=U ChildOf), while keeping entities fixed. In contrast, revrev swaps the head and tail entities while moving to the converse relation (e.g., ChildOf⌣→ParentOf ChildOf → ParentOf), thereby preserving the truth value (i.e., r(h,t)⇔r⌣(t,h)r(h,t) r (t,h)). Observe that both operations are involutions and commute: ¬(¬q)=q,rev(rev(q))=q,rev(¬q)=¬(rev(q)). ( q)=q, (rev(q))=q, ( q)= (rev(q)). Hence, these two logical operations generate a group G:=id,¬,rev,¬∘revG:=\\,id,\ ,\ rev,\ \,\ acting on Q. We define the orbit of a query q under this group as its logical family: G⋅q:=g(q):g∈G.G· q:=\\,g(q):g∈ G\,\. In plain words, two queries lie in the same family iff one can be transformed into the other via the group operations. We now translate the semantic requirements of these families into constraints on the feature space W. As discussed above, we require that the score changes are coordinated for all update directions Δθ θ. For negation, the condition Δs(¬q)=−Δs(q) s( q)=- s(q) implies that the feature vectors are strictly anti-aligned: ϕ¬q=−ϕq _ q=- _q. Applying the same logic to the converse operation yields ϕrev(q)=ϕq _rev(q)= _q. Therefore, we formalize this requirement as follows: Definition 2 (Logical equivariance). A feature map ϕ:Q→Wφ:Q→ W is logically equivariant w.r.t. ¬ and revrev if ϕ¬q=−ϕq,ϕrev(q)=ϕqfor all q∈Q. _ q=-\, _q, _rev(q)= _q all q∈ Q. While Def. 2 ensures coupling within families, we must also ensure that logically unrelated facts do not interfere with each other. If feature vectors from disjoint logical families are linearly dependent, it becomes impossible to modify one family without inadvertently affecting another. Therefore, to enable selective update, distinct logical families must be linearly independent in the feature space. We formulate these two requirements, intra-family coupling and inter-family decoupling, as Systematic Linear Propagation (SLP). Definition 3 (Systematic Linear Propagation (SLP)). A linearized model ϕqq∈Q\ _q\_q∈ Q is said to satisfy Systematic Linear Propagation with respect to the unary relation algebra if: 1. It is logically equivariant in the sense of Def. 2 2. There exists a choice of one representative query from each family s.t. the corresponding feature vectors are linearly independent in W. We emphasize that SLP is a deliberately strong formalization of the informal Linear Propagation Assumption: since only the projection of an update onto W affects first-order score changes, we require the logical constraints to hold for all Δθ∈W θ∈ W. Thus, failing SLP does not imply that propagation is impossible, but rather that it cannot be guaranteed by direction-agnostic first-order geometry alone and must rely on additional structure (e.g., restricted update directions, nonlinearity, or higher-order effects). In the following section, we derive the geometric structure theoretically required for a model to support SLP. 3 SLP Induces Tensor-Factorized Features Having formalized SLP, we derive the structural implications of these constraints. In Sec. 3.1, we prove that demanding systematic propagation with respect to negation requires the decomposition of feature space into a tensor product structure separating context (i.e., entity pair) from relation. Next, we prove in Sec. 3.2 that requiring systematic propagation with respect to converse further decomposes the entity pair component into symmetric and antisymmetric parts, encoding the directionality of relations. 3.1 Negation Equivariance Forces Tensor Factorization To rigorously derive the geometric structure, we adopt Tarski’s criterion (Tarski and Corcoran, 1986; Sher, 2008) as our guiding principle. It posits that logical notions are characterized by their invariance under all permutations of the domain’s objects. In our context, this implies that logical operations must be invariant to the specific identities of the entities involved. For instance, the logic of negation should apply equally to T-Rex and Chicken. We formalize this requirement by identifying the symmetry group acting on the query space Q. Let GE:=Sym(E)G_E:=Sym(E) be the permutation group acting on the set of entities E. Following Tarski’s criterion, the system must remain consistent under any entity renaming σ∈GEσ∈ G_E, which acts on queries by renaming entities uniformly: σ⋅(h,r,t):=(σ(h),r,σ(t))σ·(h,r,t):=(σ(h),r,σ(t)). Simultaneously, the negation operation (¬ ) is an involution (¬(¬q)=q ( q)=q), generating the cyclic group of order 2, denoted as ℤ2=1,−1Z_2=\1,-1\. Crucially, entity renaming commutes with logical operations: renaming entities does not alter the logical relationship, and logical transformations do not affect entity identities, i.e., σ⋅(¬q)=¬(σ⋅q)σ·( q)= (σ· q). Consequently, we can define a product group H:=GE×ℤ2H:=G_E×Z_2, which acts on the set of queries Q via the simultaneous action of renaming and optional negation. We translate these symbolic symmetries into the geometry of the feature space W. As formally verified in Lemma 7, under SLP, it is guaranteed that the action of H on the query space induces a well-defined linear group representation of H on the feature space W.333For readers unfamiliar with these concepts, we provide a brief primer on group representation theory in App. B.444In this section, we use ‘representation’ exclusively in the sense of group representation theory, whereas we refer to the ML concept of embeddings strictly as ‘features’ to prevent ambiguity. Since W is a finite-dimensional representation of the product group H=GE×ℤ2H=G_E×Z_2, we can utilize standard results in representation theory to prove the following factorization theorem. Theorem 1 (Context-Relation Factorization (Proof in App. C)). Let ϕ:Q→Wφ:Q→ W be the feature map defined by q↦ϕq _q. If ϕφ satisfies SLP, then there exist real vector spaces Cii\C_i\_i, Rii\R_i\_i and an isomorphism W≅⨁i(Ci⊗Ri)W _i(C_i R_i) such that ϕ(h,r,t)=⨁i(∑k=1miui,k(h,t)⊗vi,k(r)),φ(h,r,t)\;=\; _i ( _k=1^m_iu_i,k(h,t) v_i,k(r) ), where ui,k:E×E→Ciu_i,k:E× E→ C_i and vi,k:ℛ→Riv_i,k:R→ R_i. Moreover, negation acts locally as a sign flip on each relation component, i.e., vi,k(¬r)=−vi,k(r)v_i,k( r)=-v_i,k(r). Theorem 1 indicates that, to support systematic linear propagation with respect to negation, the feature geometry must separate entity-pair context information from relation information. Specifically, logical negation is realized via local negation on the relation factors vi,k(r)v_i,k(r). In Sec. 6, we discuss how this slot-local action relates to the problem of variable binding (Greff et al., 2020). 3.2 Converse Equivariance Forces Positional Alignment Having established that logical equivariance under negation forces a blockwise tensor factorization, we now investigate the structural implications of the converse operation. Recall that the symbolic converse operation swaps the head and tail entities and replaces r with its converse r⌣r , i.e., rev(h,r,t)=(t,r⌣,h)rev(h,r,t)=(t,r ,h). SLP requires the feature map to be invariant under this operation, i.e., ϕrev(q)=ϕq _rev(q)= _q. This constraint forces a parity alignment between the context and relation components, as stated in the following theorem. Theorem 2 (Symmetric-Antisymmetric Alignment (Proof in App. D)). Assume the context-relation factorization from Theorem 1 and suppose ϕφ is converse-invariant, i.e. ϕ(h,r,t)=ϕ(t,r⌣,h)for all (h,r,t)∈Q.φ(h,r,t)=φ(t,r ,h) all (h,r,t)∈ Q. Then for each block i, the corresponding feature component ϕi _i admits a decomposition with matched-parity: ϕi(h,r,t)=ϕi+(h,r,t)+ϕi−(h,r,t), _i(h,r,t)= _i^+(h,r,t)+ _i^-(h,r,t), where ϕi± _i^± can be chosen as a sum of context-relation tensor terms u(h,t)⊗v(r)u(h,t) v(r) satisfying u(t,h)=±u(h,t)andv(r⌣)=±v(r).u(t,h)=± u(h,t) v(r )=± v(r). This result reveals a geometric account of how directionality can be represented under converse invariance. The first term represents symmetric components, where neither the relation nor the entity pair cares about order. In contrast, the second term u−⊗v−u^- v^- encodes directionality through a mechanism of sign cancellation: swapping the entities introduces a sign flip (−1-1) in the context factor u(h,t)u(h,t), which is exactly compensated by the sign flip in the relation factor v(r)v(r) induced by the converse operation. Note that the constraints from negation and converse are compatible, as the corresponding symbolic operations commute. In conclusion, while SLP is theoretically realizable through this specific symmetry, such a structure is likely absent in the unconstrained feature spaces of practical LLMs, as demonstrated in Sec. 2.1. 4 The Collapse of Linear Conjunction In this section, we analyze systematic propagation for relational composition (i.e., multi-hop reasoning). We first show that systematic relational composition must handle conjunction in a minimal subclass. We then identify an obstruction to conjunction under our linearized systematicity requirement. In particular, if conjunction must be well-defined on the linearized feature space (Assumption 1), then conjunction-faithful propagation is incompatible with negation equivariance, unless the feature map is a zero map. We seek systematic propagation to compositional consequences: a targeted update to an atomic formula on a relation r∈ℛr must automatically adjust composed queries involving a composition r;sr;s for some s∈ℛs . That is, by systematicity, we require that the feature of r;sr;s be modeled constructively from its constituents to ensure propagation under arbitrary parameter updates that change r. Recall from Eq. 3 that relational composition, (r;s)(h,t)⇔∃b:r(h,b)∧s(b,t)(r;s)(h,t) ∃ b:\,r(h,b) s(b,t), links two queries via an intermediate entity. While the general case involves existential aggregation over possibly many witnesses, we isolate a minimal subclass that removes aggregation altogether. Specifically, in the unique-witness case where there exists a unique b∗b^* satisfying the link, composition reduces to the conjunction r(h,b∗)∧s(b∗,t)r(h,b^*) s(b^*,t). Thus, any mechanism that supports systematic composition must, at a minimum, support systematic conjunction. We therefore investigate whether conjunction can be realized in the linearized feature space while remaining compatible with negation equivariance. We first characterize the constraints LPA imposes on feature geometry to ensure that conjunction is supported systematically. Central to this relationship is compositionality: the representation of a compound statement should be systematically determined by its constituents. That is, given a compound query p∧qp q with p,q∈Qp,q∈ Q, its feature ϕp∧q _p q should be determined solely by ϕp _p and ϕq _q. Along with the logical properties of sentential conjunction (commutativity and idempotence), we formalize this intuition as follows: Definition 4 (Conjunction-faithful features under LPA). Let Q∧Q be the closure of Q with respect to negation and conjunction, and let W∧:=spanϕp:p∈Q∧W :=span\ _p:p∈ Q \. We say that ϕφ is conjunction-faithful (under LPA) if there exists a binary operator F:W∧×W∧→W∧F:W × W → W that governs the conjunction of features, satisfying the following properties for all p,q∈Q∧p,q∈ Q and u,v∈W∧u,v∈ W : (i) Consistency: ϕp∧q=F(ϕp,ϕq) _p q\;=\;F( _p, _q). (i) Symmetry: F(u,v)=F(v,u)F(u,v)\;=\;F(v,u). (i) Idempotence: F(u,u)=uF(u,u)\;=\;u. The consistency condition states that the feature of a conjunction depends solely on the features of its conjuncts. Consequently, two formulas with identical features must behave identically under conjunction with any fixed context: Lemma 1 (Substitution). If ϕφ is conjunction-faithful, then for all p,p′,q∈Q∧p,p ,q∈ Q , ϕp=ϕp′ _p= _p implies ϕp∧q=F(ϕp,ϕq)=F(ϕp′,ϕq)=ϕp′∧q _p q=F( _p, _q)=F( _p , _q)= _p q. While Lemma 1 guarantees that conjunction is a well-defined function of features, the linearity of LPA imposes a stronger constraint. Under the first-order regime, the editing dynamics are governed entirely by linear projections (Δs(p)=⟨ϕp,Δθ⟩ s(p)= _p, θ ). This implies that the editor cannot distinguish any linear dependence: if a weighted sum of features is zero (∑iaiϕpi=0 _ia_i _p_i=0), the collective score change is identically zero for any parameter update. Geometrically, such a zero-sum combination constitutes information that is operationally non-existent to the model’s update mechanism. If a conjunction operator were to map a null signal to a non-zero feature, it would generate distinctions based on information invisible to the editor, thereby decoupling the logic of propagation from the physics of editing. To ensure that propagation remains predictable from first-order geometry alone, we adopt a strong notion of systematicity: logical operations must be consistent with the linear geometry of the editor, meaning they must preserve linear dependencies. Formally, this requires that if a linear combination vanishes (∑iaiϕpi=0 _ia_i _p_i=0), its conjunction with any context q must also vanish (∑iaiϕpi∧q=0 _ia_i _p_i q=0). This condition is formalized as the following assumption: Assumption 1 (Kernel stability for conjunction). Let V∧:=spanep:p∈Q∧V :=span\e_p:p∈ Q \ be the free real vector space on Q∧Q , and let Φ:V∧→W∧ :V → W be the linear extension defined by Φ(ep):=ϕp (e_p):= _p. For each fixed q∈Q∧q∈ Q , define the linear map Tq:V∧→V∧T_q:V → V by Tq(ep):=ep∧qT_q(e_p):=e_p q. Assume that Tq(ker⁡Φ)⊆ker⁡Φfor all q∈Q∧.T_q( ) all q∈ Q . Mathematically, this condition ensures that the conjunction operation is well-defined on W∧W . Under this assumption, we can prove that a feature-level conjunction of two queries is fully characterized by a bilinear operator. Lemma 2 (Kernel Stability Yields Bilinearity (Proof in App. E)). Assume Assumption 1 and the symmetry of conjunction (Def. 4(i)). Then, there exists a unique symmetric bilinear operator F~:W∧×W∧→W∧ F:W × W → W such that F~(ϕp,ϕq)=ϕp∧q F( _p, _q)= _p q for all p,q∈Q∧p,q∈ Q . Lemma 2 implies that once conjunction is required to behave systematically under LPA, its behavior on features is completely governed by the induced bilinear operator F~ F. However, the theorem below demonstrates that no non-trivial bilinear feature map can satisfy idempotence. Symbolic Input ¬p∧¬p p p Symbolic Result ¬p p −ϕp- _p Feature Inputs (−ϕp,−ϕp)(- _p,- _p) +ϕp+ _p 1. Logic(Idempotence)2. Linearize(Neg. Equiv.)1. Linearize(Neg. Equiv.)2. Geometry(Bilinearity)×Collapse Figure 3: The incompatibility of logical conjunction and LPA. Top Path: Logical idempotence maps (¬p,¬p)→¬p( p, p)→ p, expecting feature −ϕp- _p. Bottom Path: Linearization gives (−ϕp,−ϕp)(- _p,- _p), and the bilinearity of F~ F yields +ϕp+ _p. The only possible way to commute the two paths is setting ϕp=0 _p=0, leading to a collapse. Theorem 3 (Structural Collapse of Linear Conjunction). Let F~ F be the bilinear map from Lemma 2. If ϕφ satisfies idempotence (Def. 4) and negation equivariance (ϕ¬p=−ϕp _ p=- _p), then ϕq=0 _q=0 for all q∈Q∧q∈ Q . Proof. Let p∈Qp∈ Q be an atomic query and let u=ϕpu= _p. From Idempotence, we have F~(u,u)=ϕp∧p=ϕp=u F(u,u)= _p p= _p=u. By negation equivariance, ϕ¬p=−u _ p=-u. Since ¬p∈Q∧ p∈ Q , we may apply idempotence to ¬p p as well: F~(−u,−u)=F~(ϕ¬p,ϕ¬p)=ϕ¬p∧¬p=ϕ¬p=−u. F(-u,-u)= F( _ p, _ p)= _ p p= _ p=-u. On the other hand, since F~ F is bilinear, F~(−u,−u)=(−1)(−1)F~(u,u)=F~(u,u)=u. F(-u,-u)=(-1)(-1) F(u,u)= F(u,u)=u. Comparing the two results, we have −u=u-u=u, which implies u=0u=0. Thus, ϕp=0 _p=0 for all atomic queries. Since any compound query q∈Q∧q∈ Q is formed by finite conjunctions of atomic queries (i.e., q=p1∧⋯∧pkq=p_1 … p_k where pi∈Qp_i∈ Q) and F~(0,⋅)=0 F(0,·)=0, by induction, ϕq=0 _q=0 for all q∈Q∧q∈ Q . ∎ Intuitively, the collapse stems from a fundamental conflict between the symbolic rules of logic and the algebraic rules of bilinearity, as illustrated in Fig. 3. Specifically, the diagram illustrating the correspondence between logic and geometry fails to commute. Following the logical path, the rules of negation and idempotence dictate that the result must flip sign (u→−u→-u), preserving the negation. In contrast, following the geometric path, the bilinearity of F~ F forces the signs to cancel out ((−u,−u)→u(-u,-u)→ u), as the product of two negatives is positive. The only vector satisfying both requirements is the zero vector, leading to the collapse. This result implies that, in the first-order regime, the geometries required for negation and conjunction are incompatible. Under our systematicity conditions that render conjunction as a linear operator on features, enforcing negation equivariance collapses the representation. Importantly, this obstruction is structural rather than an optimization failure. It therefore casts doubt on the feasibility of systematic multi-hop propagation under local linear updates, consistent with empirical breakdowns, as discussed in Sec. 6. 5 Related Work Linearized Dynamics and Model Adaptation. A common lens in modern deep learning is to approximate adaptation via local updates in a linearized feature space, a perspective theoretically grounded in Neural Tangent Kernel (NTK) analyses (Jacot et al., 2018; Lee et al., 2019). This viewpoint arises across settings that rely on gradient-based updates, including pretraining trajectories that accumulate factual associations (Chang et al., 2024), as well as domain adaptation and continual learning where updates interact with prior knowledge (Gururangan et al., 2020; Wu et al., 2024). Most explicitly, knowledge editing methods such as ROME and MEMIT treat Transformer MLPs as key-value memories (Geva et al., 2021), implementing edits as constrained least-squares updates (Meng et al., 2022a, b). Related first-order schemes also appear in unlearning approaches aimed at reducing the influence of specific data (Jang et al., 2023; Barez et al., 2025). A recurring empirical theme is that local updates do not reliably generalize to logically related or compositional variants of the target behavior (Cohen et al., 2024; Liu et al., 2025), echoing broader concerns about the gap between optimization and belief revision (Hase et al., 2024). Systematicity and Variable Binding. The systematicity debate argues that connectionist models may lack the structural machinery for compositional generalization and variable binding (Fodor and Pylyshyn, 1988; Lake and Baroni, 2018). While Smolensky (1990) proposed Tensor Product Representations as a mechanism for variable binding, the binding problem remains an active challenge in modern deep learning (Greff et al., 2020). Recent interpretability work studies linear representations in neural activations (Park et al., 2023), yet empirical studies report persistent failures of compositional generalization in LLMs (Dziri et al., 2023; Wang et al., 2024a; Chang et al., 2025). In particular, Wang and Sun (2025) hypothesized that failures of variable binding underlie the reversal curse. These threads motivate analyzing what structures are required for systematic behavior. Logical Structure as Geometric Invariance. To formalize logical structure in vector spaces, we adopt invariance-based views of logical notions (Tarski and Corcoran, 1986; Sher, 2008). This aligns with geometric deep learning, which characterizes representations by invariance and equivariance (Bronstein et al., 2021; Cohen and Welling, 2016). This perspective motivates treating logical operations as transformations on queries and studies the constraints they induce in a linearized feature geometry. 6 Discussion Our investigation establishes a bridge between the algebraic structure of logic and the linear geometry of knowledge representations. A central insight of our work is that the structure of knowledge is best observed through its dynamics, i.e., how representations coordinate under first-order updates. While expressive networks may statically realize a logically coherent representation, our results (Sec. 3, Sec. 4) show that preserving such coherence under linearized updates imposes strict geometric constraints that are invisible to static analyses. This distinction between expressivity and systematic propagation provides a geometric explanation for when logical structure can be realized and when it can be preserved under first-order updates. Implications for Systematicity. As discussed in Sec. 5, tensor products were proposed as a sufficient mechanism for variable binding. Our results sharpen this connection in the linearized update regime: requiring systematic linear propagation forces a blockwise tensor factorization of the representation (Sec. 3). In this sense, our analysis suggests that such a mechanism is not only an architectural choice, but can be a geometric necessity for preserving logical structure under first-order dynamics. Moreover, our converse analysis (Sec. 3.2) shows that systematic propagation constrains how argument order is encoded, requiring sufficient positional structure to remain consistent under reversal. Relatedly, even when a concept is linearly decodable at a fixed parameter state, static linearity alone does not guarantee systematic interaction under local updates. Toward Logical Geometric Deep Learning. Our results align with geometric deep learning by casting systematic propagation as an equivariance constraint of logical operations. This motivates a direction toward logical geometric deep learning, where update-time behavior is constrained by these symmetries beyond static function approximation. One way to view this is that learning dynamics should remain near a constrained manifold defined by logical equivariance, instead of drifting into regions where logically related queries become entangled. Architecturally, this points to parameterizations and objectives that enforce the structures required by systematic propagation, for example, by regularizing gradient features or by building equivariant modules that preserve these symmetries across updates. Implications for Model Adaptation and Editing. This geometric perspective has direct implications for practical model adaptation, most notably knowledge editing. Locate-and-edit methods assume that knowledge can be localized and manipulated through a single local update, but our results emphasize that the relevant structure is defined by how representations transform under that update. This shifts attention from where a fact is stored to whether the representation supports stable, structure-preserving update directions, i.e., editability as a geometric property (Sinitsin et al., 2020). At the same time, our conjunction result suggests that approaches relying on local linearity may face fundamental limits when asked to propagate edits to compositional consequences. This points to mechanisms beyond single-step locality, such as iterative nonlinear updates or memory-based updates that bypass single-update bottlenecks (Mitchell et al., 2022; Wang et al., 2024b). Open Questions. Finally, we view our results as a stress test of systematic propagation in a deliberately strict, first-order setting. We impose direction-agnostic equivariance as a strong requirement, yielding sharp geometric constraints and an impossibility result. Real training can depart from this regime with non-negligible step sizes, where higher-order and nonlocal effects may matter. An important direction is to understand which conclusions persist under weaker, approximate notions of equivariance, and how architectures or objectives can encourage the required structure over repeated updates. Taken together, our results motivate the design of “logical” inductive biases, and suggest a concrete program for testing whether enforcing such structure can support systematic propagation. 7 Conclusion In this work, we analyzed the geometric limits of the Linear Propagation Assumption (LPA). We showed that demanding systematic propagation imposes strict structural constraints: negation and converse require a blockwise tensor-product decomposition (Theorems 1 and 2), while conjunction is fundamentally incompatible with negation under linearity (Theorem 3). These constraints suggest that failures of local linear updates are not merely optimization artifacts, but may reflect a mismatch between the algebraic structure of logic and first-order update geometry. More broadly, failures in knowledge editing, the reversal curse, and multi-hop reasoning may share a common geometric origin in the limits of first-order propagation under the LPA. Ultimately, our findings reinforce the view that dynamics reveals structure: logical coherence is constrained not only by what a model can represent, but by how it can be updated. Impact Statement This paper presents a theoretical analysis of the geometric limits of first-order model adaptation in language models. Our goal is to clarify what local, approximately linear update mechanisms can and cannot guarantee about preserving logical coherence. By characterizing structural constraints required for systematic propagation under such updates, our findings caution against naive reliance on single-step local interventions in settings where compositional consistency matters. This perspective is relevant not only to knowledge editing, but also to continual learning and unlearning, where updates are expected to modify specific behaviors without introducing hard-to-detect logical inconsistencies. We hope this contributes to safer deployment by motivating update mechanisms and inductive biases that explicitly support structure-preserving adaptation. Acknowledgements The authors thank Jaewon Oh, Hanseul Cho, Jaden Park, and Jinwoo Heo for comments and discussions. References J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al. (2023) Gpt-4 technical report. arXiv preprint arXiv:2303.08774. Cited by: §1. F. Barez, T. Fu, A. Prabhu, S. Casper, A. Sanyal, A. Bibi, A. O’Gara, R. Kirk, B. Bucknall, T. Fist, L. Ong, P. H. S. Torr, K. Lam, R. Trager, D. Krueger, S. Mindermann, J. Hernández-Orallo, M. Geva, and Y. Gal (2025) Open problems in machine unlearning for ai safety. ArXiv abs/2501.04952. External Links: Link Cited by: §5. D. Bau, S. Liu, T. Wang, J. Zhu, and A. Torralba (2020) Rewriting a deep generative model. In Proceedings of the European Conference on Computer Vision (ECCV), Cited by: §1. L. Berglund, M. Tong, M. Kaufmann, M. Balesni, A. C. Stickland, T. Korbak, and O. Evans (2023) The reversal curse: llms trained on” a is b” fail to learn” b is a”. arXiv preprint arXiv:2309.12288. Cited by: §1, §2.2. M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković (2021) Geometric deep learning: grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478. Cited by: §1, §5. H. Chang, J. Park, H. Cho, S. Yang, M. Ko, H. Hwang, S. Won, D. Lee, Y. Ahn, and M. Seo (2025) Characterizing pattern matching and its limits on compositional task structures. External Links: 2505.20278, Link Cited by: §5. H. Chang, J. Park, S. Ye, S. Yang, Y. Seo, D. Chang, and M. Seo (2024) How do large language models acquire factual knowledge during pretraining?. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, External Links: Link Cited by: §5. R. Cohen, E. Biran, O. Yoran, A. Globerson, and M. Geva (2024) Evaluating the ripple effects of knowledge editing in language models. Transactions of the Association for Computational Linguistics 12, p. 283–298. Cited by: §1, §2.2, §2.2, §5. T. Cohen and M. Welling (2016) Group equivariant convolutional networks. In Proceedings of The 33rd International Conference on Machine Learning, M. F. Balcan and K. Q. Weinberger (Eds.), Proceedings of Machine Learning Research, Vol. 48, New York, New York, USA, p. 2990–2999. External Links: Link Cited by: §5. D. S. Dummit, R. M. Foote, et al. (2004) Abstract algebra. Vol. 3, Wiley Hoboken. Cited by: Appendix B. N. Dziri, X. Lu, M. Sclar, X. L. Li, L. Jiang, B. Y. Lin, S. Welleck, P. West, C. Bhagavatula, R. L. Bras, J. D. Hwang, S. Sanyal, X. Ren, A. Ettinger, Z. Harchaoui, and Y. Choi (2023) Faith and fate: limits of transformers on compositionality. In Thirty-seventh Conference on Neural Information Processing Systems, External Links: Link Cited by: §1, §5. N. Elhage, T. Hume, C. Olsson, N. Schiefer, T. Henighan, S. Kravec, Z. Hatfield-Dodds, R. Lasenby, D. Drain, C. Chen, et al. (2022) Toy models of superposition. arXiv preprint arXiv:2209.10652. Cited by: §2.4. J. A. Fodor and Z. W. Pylyshyn (1988) Connectionism and cognitive architecture: a critical analysis. Cognition 28 (1-2), p. 3–71. Cited by: §1, §5. M. Geva, R. Schuster, J. Berant, and O. Levy (2021) Transformer feed-forward layers are key-value memories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, M. Moens, X. Huang, L. Specia, and S. W. Yih (Eds.), Online and Punta Cana, Dominican Republic, p. 5484–5495. External Links: Link, Document Cited by: §5. S. Givant (2006) The calculus of relations as a foundation for mathematics. Journal of Automated Reasoning 37 (4), p. 277–322. Cited by: §1, §2.2. K. Greff, S. Van Steenkiste, and J. Schmidhuber (2020) On the binding problem in artificial neural networks. arXiv preprint arXiv:2012.05208. Cited by: §3.1, §5. S. Gururangan, A. Marasović, S. Swayamdipta, K. Lo, I. Beltagy, D. Downey, and N. A. Smith (2020) Don’t stop pretraining: adapt language models to domains and tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault (Eds.), Online, p. 8342–8360. External Links: Link, Document Cited by: §5. P. Hase, T. Hofweber, X. Zhou, E. Stengel-Eskin, and M. Bansal (2024) Fundamental problems with model editing: how should rational belief revision work in llms?. arXiv preprint arXiv:2406.19354. Cited by: §1, §5. C. Hu, P. Cao, Y. Chen, K. Liu, and J. Zhao (2025) Knowledge in superposition: unveiling the failures of lifelong knowledge editing for large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, p. 24086–24094. Cited by: §2.4. A. Jacot, F. Gabriel, and C. Hongler (2018) Neural tangent kernel: convergence and generalization in neural networks. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31, p. . External Links: Link Cited by: §2.3, §5. J. Jang, D. Yoon, S. Yang, S. Cha, M. Lee, L. Logeswaran, and M. Seo (2023) Knowledge unlearning for mitigating privacy risks in language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), A. Rogers, J. Boyd-Graber, and N. Okazaki (Eds.), Toronto, Canada, p. 14389–14408. External Links: Link, Document Cited by: §1, §1, §5. N. Kassner and H. Schütze (2020) Negated and misprimed probes for pretrained language models: birds can talk, but cannot fly. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online. External Links: Link Cited by: §A.1, §2.1, §2.2. T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa (2022) Large language models are zero-shot reasoners. In ICML 2022 Workshop on Knowledge Retrieval and Language Models, External Links: Link Cited by: §1. B. Lake and M. Baroni (2018) Generalization without systematicity: on the compositional skills of sequence-to-sequence recurrent networks. In International conference on machine learning, p. 2873–2882. Cited by: §5. J. Lee, L. Xiao, S. Schoenholz, Y. Bahri, R. Novak, J. Sohl-Dickstein, and J. Pennington (2019) Wide neural networks of any depth evolve as linear models under gradient descent. Advances in neural information processing systems 32. Cited by: §5. W. Liu, H. Xu, B. Liu, Z. Deng, H. Wang, J. Wang, R. Li, Y. W. Teh, and W. S. Lee (2025) Is model editing built on sand? revealing its illusory success and fragile foundation. External Links: 2510.00625, Link Cited by: §1, §2.2, §5. D. Lopez-Paz and M. Ranzato (2017) Gradient episodic memory for continual learning. Advances in neural information processing systems 30. Cited by: §1. K. Meng, D. Bau, A. Andonian, and Y. Belinkov (2022a) Locating and editing factual associations in gpt. Advances in neural information processing systems 35, p. 17359–17372. Cited by: §A.1, §1, §5. K. Meng, A. S. Sharma, A. Andonian, Y. Belinkov, and D. Bau (2022b) Mass-editing memory in a transformer. arXiv preprint arXiv:2210.07229. Cited by: §1, §5. E. Mitchell, C. Lin, A. Bosselut, C. D. Manning, and C. Finn (2022) Memory-based model editing at scale. In Proceedings of the 39th International Conference on Machine Learning, K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato (Eds.), Proceedings of Machine Learning Research, Vol. 162, p. 15817–15831. External Links: Link Cited by: §6. T. Olmo, :, A. Ettinger, A. Bertsch, B. Kuehl, D. Graham, D. Heineman, D. Groeneveld, F. Brahman, F. Timbers, H. Ivison, J. Morrison, J. Poznanski, K. Lo, L. Soldaini, M. Jordan, M. Chen, M. Noukhovitch, N. Lambert, P. Walsh, P. Dasigi, R. Berry, S. Malik, S. Shah, S. Geng, S. Arora, S. Gupta, T. Anderson, T. Xiao, T. Murray, T. Romero, V. Graf, A. Asai, A. Bhagia, A. Wettig, A. Liu, A. Rangapur, C. Anastasiades, C. Huang, D. Schwenk, H. Trivedi, I. Magnusson, J. Lochner, J. Liu, L. J. V. Miranda, M. Sap, M. Morgan, M. Schmitz, M. Guerquin, M. Wilson, R. Huff, R. L. Bras, R. Xin, R. Shao, S. Skjonsberg, S. Z. Shen, S. S. Li, T. Wilde, V. Pyatkin, W. Merrill, Y. Chang, Y. Gu, Z. Zeng, A. Sabharwal, L. Zettlemoyer, P. W. Koh, A. Farhadi, N. A. Smith, and H. Hajishirzi (2025) Olmo 3. External Links: 2512.13961, Link Cited by: §A.1, §2.1. K. Park, Y. J. Choe, and V. Veitch (2023) The linear representation hypothesis and the geometry of large language models. In Causal Representation Learning Workshop at NeurIPS 2023, External Links: Link Cited by: §5. F. Petroni, T. Rocktäschel, S. Riedel, P. Lewis, A. Bakhtin, Y. Wu, and A. Miller (2019) Language models as knowledge bases?. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), K. Inui, J. Jiang, V. Ng, and X. Wan (Eds.), Hong Kong, China, p. 2463–2473. External Links: Link, Document Cited by: §2.2. J. Serre et al. (1977) Linear representations of finite groups. Vol. 42, Springer. Cited by: Appendix B. G. Sher (2008) Tarski’s thesis. In New Essays on Tarski and Philosophy, External Links: ISBN 9780199296309, Document, Link, https://academic.oup.com/book/0/chapter/270324395/chapter-ag-pdf/44538630/book_32548_section_270324395.ag.pdf Cited by: §1, §3.1, §5. A. Sinitsin, V. Plokhotnyuk, D. Pyrkin, S. Popov, and A. Babenko (2020) Editable neural networks. In International Conference on Learning Representations, External Links: Link Cited by: §6. P. Smolensky (1990) Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial intelligence 46 (1-2), p. 159–216. Cited by: §1, §5. A. Tarski and J. Corcoran (1986) What are logical notions?. History and philosophy of logic 7 (2), p. 143–154. Cited by: §1, §3.1, §5. B. Wang and H. Sun (2025) Is the reversal curse a binding problem? uncovering limitations of transformers from a basic generalization failure. External Links: 2504.01928, Link Cited by: §5. B. Wang, X. Yue, Y. Su, and H. Sun (2024a) Grokking of implicit reasoning in transformers: a mechanistic journey to the edge of generalization. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, External Links: Link Cited by: §5. P. Wang, Z. Li, N. Zhang, Z. Xu, Y. Yao, Y. Jiang, P. Xie, F. Huang, and H. Chen (2024b) WISE: rethinking the knowledge memory for lifelong model editing of large language models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, External Links: Link Cited by: §6. T. Wu, L. Luo, Y. Li, S. Pan, T. Vu, and G. Haffari (2024) Continual learning for large language models: a survey. External Links: 2402.01364, Link Cited by: §1, §5. A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv, et al. (2025) Qwen3 technical report. arXiv preprint arXiv:2505.09388. Cited by: §A.1, §2.1. Z. Zhong, Z. Wu, C. Manning, C. Potts, and D. Chen (2023) MQuAKE: assessing knowledge editing in language models via multi-hop questions. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J. Pino, and K. Bali (Eds.), Singapore, p. 15686–15702. External Links: Link, Document Cited by: §1. Appendix A Detailed Experimental Setup A.1 Methodology To empirically investigate whether recent LLMs respect negation equivariance Def. 2, we measure the geometric alignment between the gradients of factual queries and their negated counterparts. Models. We evaluated our hypothesis across models of varying sizes and architectures to ensure the generality of our findings. Specifically, we used the instruction-tuned versions of the Qwen series (Yang et al., 2025): Qwen3-4B-Instruct-2507 and Qwen3-30B-A3B-Instruct-2507. Additionally, to verify cross-architecture consistency, we included OLMo-3-7B-Instruct (Olmo et al., 2025). Dataset. We utilized the TREx subset of the Negated LAMA benchmark (Kassner and Schütze, 2020). This dataset provides pairs of cloze-style queries: a factual query p (e.g., “The capital of France is [MASK]”) and its negated form ¬p p (e.g., “The capital of France is not [MASK]”). We sampled 10 instances for each of 41 templates, making 410 datapoints in total. Gradient Computation. We define the score function sθ(q)s_θ(q) as the log-probability of the correct target span given the prompt. We compute the gradient of the sequence-level log-probability for the entire target entity span. Formally, for a target span y=(y1,…,yL)y=(y_1,…,y_L) and prompt x, the score is: sθ(q)=log⁡Pθ(y|x)=∑t=1Llog⁡Pθ(yt|x,y<t)s_θ(q)= P_θ(y|x)= _t=1^L P_θ(y_t|x,y_<t) (6) We computed the gradients ϕp=∇θsθ(p) _p= _θs_θ(p) and ϕ¬p=∇θsθ(¬p) _ p= _θs_θ( p). To analyze the geometry of the learned representations rather than the entire parameter space, we restricted the gradient computation to the parameters of the last transformer block and the language modeling head. This restriction is motivated by two key factors: (1) computational feasibility, as calculating full-parameter gradients for every sample is prohibitively expensive, and (2) semantic relevance, as high-level semantic concepts are known to be predominantly encoded in the later layers of the network (Meng et al., 2022a). Metric. We measured the cosine similarity between the two gradient vectors: Sim(p,¬p)=⟨ϕp,ϕ¬p⟩‖ϕp‖‖ϕ¬p‖Sim(p, p)= _p, _ p \| _p\|\| _ p\| (7) Implementation Details. All experiments were conducted using the PyTorch framework on a node equipped with 4 NVIDIA A100 (80GB) GPUs. The total runtime for the full evaluation pipeline across all models and relations was approximately 10 hours. A.2 Additional Results on Scale and Architecture In the main text (Fig. 2), we report that Qwen3-4B exhibits a strong positive alignment (Mean: 0.850.85) between factual and negated gradients, contradicting the theoretical requirement for systematic propagation. Here, we demonstrate that this phenomenon is not specific to a single model size or architecture. Specifically, we extended our analysis to Qwen3-30B (Fig. 4(a)) and OLMo-3-7B (Fig. 4(b)). As shown in the histograms, both models display a similar distribution heavily skewed towards positive cosine similarity. Specifically, for Qwen3-30B, Fig. 4(a) shows a high mean (0.860.86) similar to the 4B model, suggesting that simply scaling up the model may not resolve this tendency. Moreover, Fig. 4(b)) shows a slightly lower but still positive mean (0.620.62). Overall, these results suggest that negation equivariance does not hold in current LLMs, regardless of model scale or architecture. (a) Qwen3-30B (b) OLMo-3-7B Figure 4: Gradient alignment across different scales and architectures. The positive alignment phenomenon persists in (a) Qwen3-30B and (b) OLMo-3-7B. Both distributions are heavily skewed towards positive cosine similarity, demonstrating that the geometric mismatch for linear negation propagation is consistent across model scale and architecture. Appendix B Primer on Finite Group Representations This section collects basic definitions from group theory (Dummit et al., 2004) and group representation theory (Serre and others, 1977) used in our proofs. We work over real vector spaces unless otherwise stated, and all groups considered in this paper are finite. B.1 Groups A group is a set G equipped with a binary operation (g,h)↦gh(g,h) gh that encodes the structure of symmetries. Formally, it must satisfy three axioms: • Associativity: (gh)k=g(hk)(gh)k=g(hk) for all g,h,k∈Gg,h,k∈ G. • Identity: There exists an element e∈Ge∈ G such that eg=ge=geg=ge=g for all g. • Inverses: For every g∈Gg∈ G, there exists an element g−1∈Gg^-1∈ G such that gg−1=g−1g=egg^-1=g^-1g=e. We introduce two standard families of groups: • Symmetric group (SnS_n): The set of all permutations (bijections) of a set of n elements forms a group under function composition. • Cyclic group (ℤnZ_n): A group generated by a single element g such that gn=eg^n=e (and gk≠eg^k≠ e for k<nk<n). We specifically use the cyclic group of order 2, ℤ2≅(+1,−1,×)Z_2 (\+1,-1\,×), to model logical negation, where the non-identity element corresponds to the negation operator ((−1)2=1(-1)^2=1). B.2 Group Actions A (left) action of a group G on a set X formalizes the idea of transforming elements of X using the symmetries in G. It is a map G×X→XG× X→ X, denoted (g,x)↦g⋅x(g,x) g· x, that obeys two consistency rules: • Identity preservation: e⋅x=xe· x=x for all x∈Xx∈ X (the identity transformation leaves everything unchanged). • Compatibility: g⋅(h⋅x)=(gh)⋅xg·(h· x)=(gh)· x for all g,h∈Gg,h∈ G and x∈Xx∈ X (transforming by h then g is equivalent to transforming by the combined element ghgh). Given an element x∈Xx∈ X, its orbit is the set of all points reachable from x under the group action: G⋅x:=g⋅x:g∈GG· x:=\g· x:g∈ G\. B.3 Product Groups Let G and H be finite groups. The direct product G×HG× H is the group of pairs (g,h)(g,h) defined by the component-wise operation: (g,h)(g′,h′)=(gg′,hh′).(g,h)(g ,h )\;=\;(g ,h ). This structure allows us to compose symmetries acting on different domains. Specifically, if G acts on a set X and H acts on a set Y, then the product group G×HG× H naturally acts on the product set X×YX× Y via (g,h)⋅(x,y):=(g⋅x,h⋅y).(g,h)·(x,y)\;:=\;(g· x,h· y). B.4 Representations and Equivariant Maps Let W be a real vector space. A (finite-dimensional) representation of G on W is a group homomorphism ρ:G→GL(W),ρ:G→ GL(W), where GL(W)GL(W) denotes the group of invertible linear maps W→W→ W. This definition allows us to view G as acting linearly on the vector space W. We denote this action by g⋅w:=ρ(g)wg· w:=ρ(g)w. Since each ρ(g)ρ(g) lies in GL(W)GL(W), this action inherently preserves the linear structure: g⋅(aw+bv)=a(g⋅w)+b(g⋅v)g·(aw+bv)=a(g· w)+b(g· v) for all scalars a,ba,b and vectors w,vw,v. A linear map T:W→W′T:W→ W between two G-representations (W,ρ)(W,ρ) and (W′,ρ′)(W ,ρ ) is called G-equivariant if it commutes with the group action: T(g⋅w)=g⋅T(w)for all g∈G,w∈W.T(g· w)\;=\;g· T(w) all g∈ G,\ w∈ W. In terms of the homomorphism ρ, this is equivalent to T(ρ(g)w)=ρ′(g)T(w)T(ρ(g)w)=ρ (g)T(w). The space of all such G-equivariant linear maps is denoted HomG(W,W′)Hom_G(W,W ). B.5 Invariant Subspaces and Irreducibility A subspace U⊆WU W is called G-invariant if it is closed under the group action: ρ(g)u∈Ufor all g∈G,u∈U.ρ(g)u∈ U all g∈ G,u∈ U. Intuitively, if a feature vector lies in a G-invariant subspace, applying any symmetry from G will keep the vector within that same subspace. In this case, the restriction ρ|U:G→GL(U)ρ|_U:G→ GL(U) defines a valid representation on U, called a subrepresentation. A representation is irreducible if it contains no proper nontrivial invariant subspaces. That is, the only G-invariant subspaces are 0\0\ and W itself. A representation is reducible if it has a nontrivial invariant subspace. A representation W is completely reducible if it can be decomposed entirely into these atomic blocks. Formally, this means W is isomorphic to a direct sum of irreducible subrepresentations: W≅⨁i=1ℓWi,W\; \; _i=1 W_i, where each WiW_i is irreducible. B.6 Maschke’s Theorem A key structural fact we use is that finite-dimensional representations of finite groups over ℝR are completely reducible (Maschke’s theorem). While the theorem holds for more general fields, we focus on the real field relevant to our work. Theorem 4 (Maschke). Let G be a finite group and let W be a finite-dimensional representation of G over ℝR. Then W is completely reducible: every G-invariant subspace U⊆WU W admits a G-invariant complement U′U such that W=U⊕U′W=U U . We note that by recursively applying this property, i.e., finding an invariant subspace, splitting it off, and repeating the process on the complement, one can guarantee that any finite-dimensional representation W can be fully decomposed into a direct sum of irreducible subrepresentations. Appendix C Proof of Theorem 1 In this section, we provide the complete proof of Theorem 1. We proceed in four logical steps: 1. Geometric Constraints: First, we analyze the linear dependencies in the feature space, showing that the kernel of the feature map decomposes according to logical families under SLP (Lemma 3). 2. Group Action Construction: Building on this kernel structure, we translate the symbolic symmetries of entity renaming and negation to a well-defined linear representation of the group H on the feature space (Lemmas 6 and 7). 3. Representation Theoretic Decomposition: We invoke standard results from representation theory to decompose the feature space into a direct sum of tensor products (Lemma 8). 4. Derivation of Factorization: Finally, we utilize the isomorphism of equivariant maps on tensor products (Lemma 9) to derive the explicit context-relation factorization of the feature map (Theorem 1). C.1 Logical Families and Controllability Let V be the free real vector space with basis eq:q∈Q\e_q:q∈ Q\, where eqe_q is a basis vector corresponding to the query q. Define a linear map Φ:V⟶W,Φ(eq):=ϕq. :V W, (e_q):= _q. By construction, ker⁡Φ is the space of all linear dependencies among the feature vectors ϕq\ _q\. Moreover, recall from Sec. 2.4 that the two logical operations in the unary relation algebra generate a group G:=id,¬,rev,¬∘revG:=\\,id,\ ,\ rev,\ \,\ acting on Q, and logical family is defined as: G⋅q:=g(q):g∈G.G· q:=\\,g(q):g∈ G\,\. Intuitively, the second condition of SLP (Def. 3) states that no unintended collapse happens across different logical families. In other words, linear dependencies arise if and only if they are semantically forced within a family. The following lemma states this formally. Lemma 3 (Family-wise kernel decomposition). Let F be a logical family and assume SLP (Def. 3). Let V(F):=spaneq:q∈F⊆V(F):=span\e_q:q∈ F\ V and let Φ|V(F) |_V(F) denote the restriction of Φ to V(F)V(F). Then any linear relation among features decomposes family-wise, i.e., ker⁡Φ=⨁Fker⁡Φ|V(F). \;=\; _F |_V(F). Proof. The logical families F\F\ form a partition of Q, so the basis eq:q∈Q\e_q:q∈ Q\ of V splits accordingly into disjoint subsets eq:q∈F\e_q:q∈ F\. Hence V=⨁FV(F),V\;=\; _FV(F), and every v∈Vv∈ V can be written uniquely as v=∑FvFv= _Fv_F with vF∈V(F)v_F∈ V(F). By logical equivariance (Def. 2), for each family F, we can choose a representative qF∗∈Fq_F ∈ F such that, for every q∈Fq∈ F, the feature vector ϕq _q is either ϕqF∗ _q_F or −ϕqF∗- _q_F . Equivalently, for each q∈Fq∈ F there exists a scalar λq,q∗∈+1,−1 _q,q ∈\+1,-1\ with ϕq=λq,q∗ϕqF∗. _q\;=\; _q,q \, _q_F . By the second condition of SLP (Def. 3), the chosen representatives can be taken so that the set ϕqF∗:F is a logical family\ _q_F :F is a logical family\ is linearly independent in W. Now take any v∈ker⁡Φv∈ and decompose it as v=∑FvFv= _Fv_F with vF∈V(F)v_F∈ V(F). Write each component as vF=∑q∈Fαqeq.v_F\;=\; _q∈ F _q\,e_q. Then Φ(vF)=∑q∈Fαqϕq=∑q∈Fαqλq,q∗ϕqF∗=βFϕqF∗, (v_F)\;=\; _q∈ F _q\, _q\;=\; _q∈ F _q\, _q,q \, _q_F \;=\; _F\, _q_F , where we define βF:=∑q∈Fαqλq,q∗. _F:= _q∈ F _q\, _q,q . Since v∈ker⁡Φv∈ , we have 0=Φ(v)=∑FΦ(vF)=∑FβFϕqF∗.0\;=\; (v)\;=\; _F (v_F)\;=\; _F _F\, _q_F . The vectors ϕqF∗F\ _q_F \_F are linearly independent, so each coefficient must vanish: βF=0for all F. _F=0 all F. Hence Φ(vF)=βFϕqF∗=0 (v_F)= _F _q_F =0 for every family F, i.e. vF∈ker⁡Φ|V(F)v_F∈ |_V(F). Since v=∑FvFv= _Fv_F and the decomposition V=⨁FV(F)V= _FV(F) is unique, this shows that every element of ker⁡Φ decomposes uniquely as a sum of elements from the subspaces ker⁡Φ|V(F) |_V(F), and therefore ker⁡Φ=⨁Fker⁡Φ|V(F). \;=\; _F |_V(F). ∎ C.2 Diagonal Renaming of Entities We now formalize the idea that logical notions should be insensitive to the names of entities. Definition 5 (Entity renaming). Let GE:=Sym(E)G_E:=Sym(E) be the permutation group of E. We let GEG_E act on Q by renaming entities uniformly in both argument slots: σ⋅(h,r,t):=(σ(h),r,σ(t))(σ∈GE,(h,r,t)∈Q).σ·(h,r,t)\;:=\;(σ(h),\,r,\,σ(t)) (σ∈ G_E,\ (h,r,t)∈ Q). An important observation is that, since entity renaming permutes only entities and logical operations act only on relations, the two operations commute. Lemma 4 (Symbolic renaming invariance). For any σ∈GEσ∈ G_E and any q∈Qq∈ Q we have ¬(σ⋅q)=σ⋅¬(q),rev(σ⋅q)=σ⋅rev(q). (σ· q)\;=\;σ· (q), (σ· q)\;=\;σ·rev(q). Proof. This is the direct consequence of the definition of logical operations in the tiny relation algebra and Def. 5. ∎ In other words, if two queries are in the same logical family, then their renamings are in the same family. Our goal is to show that this renaming symmetry translates to a linear symmetry of W. To this end, we first show that the kernel structure is preserved upon entity renaming. For each σ∈GEσ∈ G_E define a linear map Pσ:V→VP_σ:V→ V on basis vectors by Pσeq:=eσ⋅q.P_σe_q:=e_σ· q. Lemma 5 (Kernel invariance under renaming). For every σ∈GEσ∈ G_E, if v∈ker⁡Φv∈ , then Pσv∈ker⁡ΦP_σv∈ . Proof. Let v∈ker⁡Φv∈ . By Lemma 3, it suffices to show the invariance for a component vF∈ker⁡Φ|V(F)v_F∈ |_V(F) entirely contained in a single logical family F. By the logical equivariance assumption, the feature vectors within a family F are all collinear. Specifically, fix a representative q∗∈Fq ∈ F. Then for any q∈Fq∈ F, there exists λq,q∗∈+1,−1 _q,q ∈\+1,-1\ such that ϕq=λq,q∗ϕq∗. _q\;=\; _q,q \, _q . Consequently, a vector vF=∑q∈Fαqeqv_F= _q∈ F _qe_q lies in ker⁡Φ if and only if Φ(vF)=∑q∈Fαqϕq=(∑q∈Fαqλq,q∗)ϕq∗= 0. (v_F)\;=\; _q∈ F _q _q\;=\; ( _q∈ F _q _q,q ) _q \;=\;0. Since ϕq∗≠0 _q ≠ 0 (by the second condition of SLP Def. 3), the condition for membership in the kernel is purely scalar: ∑q∈Fαqλq,q∗= 0. _q∈ F _q\, _q,q \;=\;0. (8) Now consider the transformed vector PσvF=∑q∈Fαqeσ⋅qP_σv_F= _q∈ F _qe_σ· q. This vector is supported on the permuted family σ(F)σ(F). Let p∗:=σ⋅q∗p :=σ· q be the representative for σ(F)σ(F). For any q∈Fq∈ F, let p:=σ⋅qp:=σ· q. By definition of a logical family (orbit under the logical-operation group G generated by ¬ and revrev), for any q∈Fq∈ F there exists g∈Gg∈ G such that q=g⋅q∗.q=g· q . We claim that entity renaming commutes with every g∈Gg∈ G: σ⋅(g⋅x)=g⋅(σ⋅x)∀σ∈GE,∀g∈G,∀x∈Q.σ·(g· x)\;=\;g·(σ· x) ∀\,σ∈ G_E,\ ∀\,g∈ G,\ ∀\,x∈ Q. (9) Indeed, Lemma 4 gives this commutation for the generators ¬ and revrev, and Eq. 9 follows for arbitrary g by closure under composition in G. Applying Eq. 9 to x=q∗x=q yields p:=σ⋅q=σ⋅(g⋅q∗)=g⋅(σ⋅q∗)=:g⋅p∗,p:=σ· q=σ·(g· q )=g·(σ· q )=:g· p , so the same g relates p to p∗p . Next, by the logical equivariance condition, there is a sign homomorphism χ:G→+1,−1χ:G→\+1,-1\ such that for all x∈Qx∈ Q, ϕg⋅x=χ(g)ϕx. _g· x\;=\;χ(g)\, _x. (10) Using Eq. 10 with x=q∗x=q and x=p∗x=p , we relate the scalar coefficients to χ: ϕq=ϕg⋅q∗=χ(g)ϕq∗⟹λq,q∗=χ(g), _q= _g· q =χ(g) _q _q,q =χ(g), ϕp=ϕg⋅p∗=χ(g)ϕp∗⟹λp,p∗=χ(g). _p= _g· p =χ(g) _p _p,p =χ(g). Therefore, the relative signs are preserved: λp,p∗=χ(g)=λq,q∗. _p,p \;=\;χ(g)\;=\; _q,q . Using the equality above, we finally show that PσvF∈ker⁡ΦP_σv_F∈ : Φ(PσvF) (P_σv_F) =∑q∈Fαqϕσ⋅q \;=\; _q∈ F _q _σ· q =∑q∈Fαqλσ⋅q,σ⋅q∗ϕp∗ \;=\; _q∈ F _q\, _σ· q,σ· q \, _p =(∑q∈Fαqλq,q∗)ϕp∗. \;=\; ( _q∈ F _q\, _q,q ) _p . By equation Eq. 8, the coefficient sum is zero. Thus Φ(PσvF)=0 (P_σv_F)=0, proving that PσvF∈ker⁡ΦP_σv_F∈ . ∎ Using Lemma 5, we can translate the entity renaming to a linear symmetry on the feature space. Lemma 6. Suppose that for each σ∈GEσ∈ G_E we have Pσ(ker⁡Φ)⊆ker⁡ΦP_σ( ) . Then there exists a unique linear map ρE(σ):W→W _E(σ):W→ W such that ϕσ⋅q=ρE(σ)ϕq∀q∈Q. _σ· q\;=\; _E(σ)\, _q ∀ q∈ Q. Moreover, the assignment σ↦ρE(σ)σ _E(σ) defines a group representation ρE:GE→GL(W) _E:G_E→ GL(W). Proof. Recall that Φ:V→W :V→ W is surjective by definition of W=spanϕq:q∈QW=span\ _q:q∈ Q\. For a fixed σ∈GEσ∈ G_E, we define ρE(σ) _E(σ) as follows: given any w∈Ww∈ W, choose v∈Vv∈ V with Φ(v)=w (v)=w and set ρE(σ)w:=Φ(Pσv). _E(σ)\,w\;:=\; (P_σv ). We first check that this does not depend on the choice of v. Suppose v,v′∈Vv,v ∈ V satisfy Φ(v)=Φ(v′) (v)= (v ), i.e. Φ(v−v′)=0 (v-v )=0, so v−v′∈ker⁡Φv-v ∈ . By assumption, Pσ(ker⁡Φ)⊆ker⁡ΦP_σ( ) , hence Φ(Pσ(v−v′))=0, (P_σ(v-v ) )=0, which implies Φ(Pσv)=Φ(Pσv′). (P_σv )= (P_σv ). Thus, ρE(σ)w _E(σ)\,w is well-defined. Linearity of ρE(σ) _E(σ) follows immediately from the linearity of Φ and PσP_σ. Now, we apply this definition to the basis elements. For q∈Qq∈ Q, we have eq∈Ve_q∈ V and Φ(eq)=ϕq (e_q)= _q, so ρE(σ)ϕq=ρE(σ)Φ(eq)=Φ(Pσeq)=Φ(eσ⋅q)=ϕσ⋅q. _E(σ)\, _q\;=\; _E(σ)\, (e_q)\;=\; (P_σe_q )\;=\; (e_σ· q)\;=\; _σ· q. To check its uniqueness, observe that the vectors ϕq:q∈Q\ _q:q∈ Q\ span W, so any linear map T:W→WT:W→ W satisfying Tϕq=ϕσ⋅qT\, _q= _σ· q for all q must agree with ρE(σ) _E(σ) on a spanning set, hence must be equal to ρE(σ) _E(σ). Finally, we verify the group property. For any σ1,σ2∈GE _1, _2∈ G_E and any, w=Φ(v)w= (v) we have ρE(σ1)ρE(σ2)w=ρE(σ1)Φ(Pσ2v)=Φ(Pσ1Pσ2v)=Φ(Pσ1σ2v)=ρE(σ1σ2)w. _E( _1)\, _E( _2)\,w\;=\; _E( _1)\, (P_ _2v )\;=\; (P_ _1P_ _2v )\;=\; (P_ _1 _2v )\;=\; _E( _1 _2)\,w. Thus, ρE(σ1)ρE(σ2)=ρE(σ1σ2) _E( _1) _E( _2)= _E( _1 _2) on all of W, and ρE:GE→GL(W) _E:G_E→ GL(W) is a representation. ∎ Hence, SLP implies that the linearized features carry a compatible entity-renaming symmetry. C.3 Combining Renaming and Negation We now combine entity renaming with relation negation into a single symmetry structure. Recall from Def. 2 that logical equivariance of negation says ϕ¬q=−ϕq∀q∈Q, _ q=-\, _q ∀ q∈ Q, i.e., relation-level negation always flips the feature vector by a global sign −1-1, regardless of which entities appear. Entity renaming acts only on the head and tail slots, while negation acts only on the relation slot. We want to treat these two operations together as a single family of joint transformations. Definition 6 (Combined symmetry group). Let ℤ2:=+1,−1Z_2:=\+1,-1\ be the two-element group with multiplication. We define H:=GE×ℤ2.H:=G_E×Z_2. An element (σ,ϵ)∈H(σ,ε)∈ H acts on a query by (σ,ϵ)⋅(h,r,t):=(σ(h),ϵ⋅r,σ(t)),(σ,ε)·(h,r,t):= (σ(h),\,ε· r,\,σ(t) ), where ϵ=+1ε=+1 means “keep the relation as it is” and ϵ=−1ε=-1 means “replace r by its negation ¬r r”. On the feature space W we let (σ,ϵ)(σ,ε) act linearly by ρ(σ,+1):=ρE(σ),ρ(σ,−1):=−ρE(σ),ρ(σ,+1):= _E(σ), ρ(σ,-1):=-\, _E(σ), so that ρ(σ,ϵ)ρ(σ,ε) first applies the renaming operator ρE(σ) _E(σ) and then, if ϵ=−1ε=-1, flips the sign of the feature vector. In other words, for any w∈Ww∈ W, ρ(σ,ϵ)w:=ρE(σ)wif ϵ=+1,−ρE(σ)wif ϵ=−1.ρ(σ,ε)\,w:= cases _E(σ)\,w&if ε=+1,\\[3.00003pt] -\, _E(σ)\,w&if ε=-1. cases Because ρE _E respects composition of renamings and (−1)2=+1(-1)^2=+1, the maps ρ(σ,ϵ)ρ(σ,ε) also respect composition: applying (σ1,ϵ1)( _1, _1) and then (σ2,ϵ2)( _2, _2) has the same effect on features as applying (σ1σ2,ϵ1ϵ2)( _1 _2, _1 _2) once. This means H acts consistently and linearly on W. Hence, logical equivariance and renaming equivariance can now be combined into a single statement as below. Lemma 7 (Equivariant feature map). The map ϕ:Q→Wφ:Q→ W is compatible with the combined symmetry action of H in the sense that, for all (σ,ϵ)∈H(σ,ε)∈ H and all q∈Qq∈ Q, ϕ((σ,ϵ)⋅q)=ρ(σ,ϵ)ϕq.φ ((σ,ε)· q )=ρ(σ,ε)\, _q. Proof. When ϵ=+1ε=+1, we have (σ,+1)⋅(h,r,t)=(σ(h),r,σ(t)),(σ,+1)·(h,r,t)=(σ(h),r,σ(t)), and this is represented on features by ρE(σ) _E(σ), i.e. ϕ(σ⋅q)=ρE(σ)ϕq=ρ(σ,+1)ϕqφ(σ· q)= _E(σ)\, _q=ρ(σ,+1)\, _q. When ϵ=−1ε=-1, we have (σ,−1)⋅(h,r,t)=(σ(h),¬r,σ(t)).(σ,-1)·(h,r,t)=(σ(h), r,σ(t)). By logical equivariance of negation, ϕ¬q=−ϕq _ q=- _q for every q, and by Lemma 4, negation commutes with renaming on the symbolic side. Combining these facts, we obtain ϕ((σ,−1)⋅q)=ϕ(σ⋅(¬q))=ρE(σ)ϕ¬q=ρE(σ)(−ϕq)=−ρE(σ)ϕq=ρ(σ,−1)ϕq,φ ((σ,-1)· q )=φ (σ·( q) )= _E(σ)\, _ q= _E(σ)\,(- _q)=-\, _E(σ)\, _q=ρ(σ,-1)\, _q, as claimed. ∎ In words, performing a renaming and optional negation on the symbolic query side has the same effect as applying the corresponding linear operator ρ(σ,ϵ)ρ(σ,ε) on the feature side. C.4 Decomposition of H-representations Using the Lemmas above, we prove Lemma 8, which will play a crucial role in proving Theorem 1. Lemma 8 (Decomposition of H-representations). Let W be a finite-dimensional real representation of H=GE×ℤ2H=G_E×Z_2. Then W decomposes into a direct sum of tensor-product blocks: W≅⨁i=1m(Ci⊗Ri),W\; \; _i=1^m (C_i R_i ), where each CiC_i is an irreducible representation of the entity group GEG_E, and each RiR_i is an irreducible representation of the logical group ℤ2Z_2. Proof. We explicitly construct the decomposition using the structure of ℤ2=1,−1Z_2=\1,-1\. Let z:=(id,−1)∈Hz:=(id,-1)∈ H. Since z commutes with every element in H (i.e., zh=hzzh=hz for all h), the linear map ρ(z)ρ(z) must commute with the group action: ρ(z)ρ(h)=ρ(zh)=ρ(hz)=ρ(h)ρ(z)for all h∈H.ρ(z)ρ(h)\;=\;ρ(zh)\;=\;ρ(hz)\;=\;ρ(h)ρ(z) all h∈ H. Also, since z2=(id,1)=ez^2=(id,1)=e, we have ρ(z)2=Iρ(z)^2=I. Thus, ρ(z)ρ(z) is an involution and W can be decomposed into eigenspaces corresponding to eigenvalues +1+1 and −1-1: W=W+⊕W−,where W±:=w∈W:ρ(z)w=±w.W\;=\;W^+ W^-, W^±:=\w∈ W:ρ(z)w=± w\. Crucially, these eigenspaces are H-invariant. To see this, let w∈W±w∈ W^± and h∈Hh∈ H. Using the commutativity shown above: ρ(z)(ρ(h)w)=ρ(h)(ρ(z)w)=ρ(h)(±w)=±(ρ(h)w).ρ(z) (ρ(h)w )\;=\;ρ(h) (ρ(z)w )\;=\;ρ(h)(± w)\;=\;± (ρ(h)w ). This shows that if w∈W±w∈ W^±, then the transformed vector ρ(h)wρ(h)w also satisfies the condition to be in W±W^±. Thus, W+W^+ and W−W^- are subrepresentations of H. We now determine the action of a general element (σ,ϵ)∈H(σ,ε)∈ H on these subspaces. Note that, any element factors as (σ,ϵ)=(σ,1)(id,ϵ)(σ,ε)=(σ,1)(id,ε). Then, the logical factor (id,ϵ)(id,ε) acts on the subspaces as: ρ(id,ϵ)|W+=I,ρ(id,ϵ)|W−=ϵI.ρ(id,ε) |_W^+=I, ρ(id,ε) |_W^-=ε I. Consequently, the full action is: ρ(σ,ϵ)w=ρ(σ,1)ρ(id,ϵ)w=ρ(σ,1)wif w∈W+,ϵ⋅ρ(σ,1)wif w∈W−.ρ(σ,ε)w\;=\;ρ(σ,1)ρ(id,ε)w\;=\; casesρ(σ,1)w&if w∈ W^+,\\ ε·ρ(σ,1)w&if w∈ W^-. cases Since W is finite-dimensional, we can apply Maschke’s Theorem (Theorem 4) to decompose W+W^+ and W−W^- into irreducible representations of GEG_E (via the map σ↦ρ(σ,1)σ ρ(σ,1)): W+≅⨁j∈J+Cj,W−≅⨁j∈J−Cj.W^+ _j∈ J_+C_j, W^- _j∈ J_-C_j. Finally, we map this structure to the tensor product form claimed in the theorem. Let 1 and sgnsgn denote the two irreducible real representations of ℤ2Z_2, respectively. Each irreducible GEG_E-summand C⊆W+C W^+ is an H-subrepresentation on which (σ,ϵ)(σ,ε) acts by ρ(σ,1)ρ(σ,1) alone, hence it is H-isomorphic to C⊗C 1 (where (σ,ϵ)⋅(c⊗1)=(σc)⊗1(σ,ε)·(c 1)=(σ c) 1). Likewise, each irreducible GEG_E-summand C⊆W−C W^- is an H-subrepresentation on which (σ,ϵ)(σ,ε) acts by ϵρ(σ,1)ε\,ρ(σ,1), hence it is H-isomorphic to C⊗sgnC (where (σ,ϵ)⋅(c⊗1)=(σc)⊗ϵ(σ,ε)·(c 1)=(σ c) ε). Therefore, W≅(⨁j∈J+Cj⊗)⊕(⨁j∈J−Cj⊗sgn).W\; \; ( _j∈ J_+C_j 1 )\; \; ( _j∈ J_-C_j ). Letting RiR_i represent either 1 or sgnsgn as appropriate for each block, we obtain the claimed form W≅⨁i(Ci⊗Ri)W _i(C_i R_i). ∎ C.5 Equivariant Maps on Tensor Products To prove the main theorem, we need to characterize equivariant maps between tensor product representations. Lemma 9 (Equivariant maps on tensor products). Let G and K be groups. Let U,AU,A be finite-dimensional real representations of G, and let V,BV,B be finite-dimensional real representations of K. We regard the tensor products U⊗VU V and A⊗BA B as representations of the product group G×KG× K via the component-wise action: (g,k)⋅(u⊗v)=(g⋅u)⊗(k⋅v),(g,k)⋅(a⊗b)=(g⋅a)⊗(k⋅b).(g,k)·(u v)=(g· u) (k· v), (g,k)·(a b)=(g· a) (k· b). Then there is a natural linear isomorphism HomG(U,A)⊗HomK(V,B)≅HomG×K(U⊗V,A⊗B).Hom_G(U,A) _K(V,B)\; \;Hom_G× K(U V,\ A B). Proof. Consider first the full vector spaces of linear maps without equivariance constraints. For any linear maps f∈Hom(U,A)f (U,A) and h∈Hom(V,B)h (V,B), we define a map f⊠h:U⊗V→A⊗Bf h:U V→ A B by its action on simple tensors: (f⊠h)(u⊗v):=f(u)⊗h(v)for all u∈U,v∈V.(f h)(u v):=f(u) h(v) all u∈ U,v∈ V. This construction induces a natural linear map Ψ:Hom(U,A)⊗Hom(V,B)→Hom(U⊗V,A⊗B),Ψ(f⊗h)=f⊠h. :Hom(U,A) (V,B) (U V,A B), (f h)=f h. We first claim that Ψ is an isomorphism. To see this, choose bases for the source and target vector spaces: (ui)i=1m (u_i)_i=1^m of U, of U, (aj)j=1p (a_j)_j=1^p of A, of A, (vr)r=1n (v_r)_r=1^n of V, of V, (bs)s=1q (b_s)_s=1^q of B. of B. We define the elementary linear maps that form the bases for the hom-spaces. For each pair (j,i)(j,i), let Ej,i∈Hom(U,A)E_j,i (U,A) be the unique linear map defined by: Ej,i(uk)=ajif k=i,0if k≠i.E_j,i(u_k)= casesa_j&if k=i,\\ 0&if k≠ i. cases The set Ej,ij,i\E_j,i\_j,i forms a basis of Hom(U,A)Hom(U,A). Similarly, for each (s,r)(s,r), let Fs,r∈Hom(V,B)F_s,r (V,B) be defined by: Fs,r(vk)=bsif k=r,0if k≠r.F_s,r(v_k)= casesb_s&if k=r,\\ 0&if k≠ r. cases The set Fs,rs,r\F_s,r\_s,r forms a basis of Hom(V,B)Hom(V,B). Consequently, the set of tensor products forms a basis for the domain of Ψ : Ej,i⊗Fs,rj,i,s,r⊂Hom(U,A)⊗Hom(V,B).\E_j,i F_s,r\_j,i,s,r ⊂ (U,A) (V,B). Now, consider the image of these basis vectors under Ψ . Let Mj,i,s,r:=Ψ(Ej,i⊗Fs,r)M_j,i,s,r:= (E_j,i F_s,r). By the definition of Ψ (action on simple tensors), Mj,i,s,rM_j,i,s,r is the unique linear map U⊗V→A⊗BU V→ A B satisfying: Mj,i,s,r(ui′⊗vr′)=Ej,i(ui′)⊗Fs,r(vr′)=aj⊗bsif (i′,r′)=(i,r),0otherwise.M_j,i,s,r(u_i v_r )=E_j,i(u_i ) F_s,r(v_r )= casesa_j b_s&if (i ,r )=(i,r),\\ 0&otherwise. cases The collection of maps Mj,i,s,rj,i,s,r\M_j,i,s,r\_j,i,s,r is precisely the standard basis of the codomain space Hom(U⊗V,A⊗B)Hom(U V,A B). Since Ψ maps a basis of the domain bijectively onto a basis of the codomain, it is a linear isomorphism. Our strategy is to identify equivariant maps as the fixed points of a group action on the function space. Specifically, we equip Hom(U,A)Hom(U,A) with a natural G-action via conjugation: for g∈Gg∈ G and f∈Hom(U,A)f (U,A), the transformed map g⋅fg· f is defined by (g⋅f)(u):=g⋅f(g−1⋅u).(g· f)(u):=g· f(g^-1· u). A map f is G-equivariant if and only if it is invariant under this action (i.e., g⋅f=fg· f=f for all g), hence HomG(U,A)=Hom(U,A)G,Hom_G(U,A)=Hom(U,A)^G, where the superscript G denotes the subspace of fixed points. Analogously, we define the K-action on Hom(V,B)Hom(V,B), and the (G×K)(G× K)-action on Hom(U⊗V,A⊗B)Hom(U V,A B): for any T∈Hom(U⊗V,A⊗B)T (U V,A B), the action is defined by ((g,k)⋅T)(x):=(g,k)⋅T((g,k)−1⋅x). ((g,k)· T )(x):=(g,k)· T ((g,k)^-1· x ). We now verify that the isomorphism Ψ respects these group structures. We equip the domain, Hom(U,A)⊗Hom(V,B)Hom(U,A) (V,B), with the component-wise action of G×KG× K: (g,k)⋅(f⊗h):=(g⋅f)⊗(k⋅h).(g,k)·(f h):=(g· f) (k· h). For any (g,k)∈G×K(g,k)∈ G× K, a direct computation on simple tensors u⊗vu v shows: ((g,k)⋅(f⊠h))(u⊗v) ((g,k)·(f h) )(u v) =(g,k)⋅(f⊠h)((g,k)−1⋅(u⊗v)) =(g,k)·(f h) ((g,k)^-1·(u v) ) =(g,k)⋅(f⊠h)(g−1u⊗k−1v) =(g,k)·(f h)(g^-1u k^-1v) =(g,k)⋅(f(g−1u)⊗h(k−1v)) =(g,k)· (f(g^-1u) h(k^-1v) ) =(g⋅f(g−1u))⊗(k⋅h(k−1v)) =(g· f(g^-1u)) (k· h(k^-1v)) =((g⋅f)⊠(k⋅h))(u⊗v). = ((g· f) (k· h) )(u v). Since this equality holds for all simple tensors u⊗vu v, which span U⊗VU V, the linear maps are identical. Rewriting this using Ψ , we see that: (g,k)⋅Ψ(f⊗h)=Ψ((g,k)⋅(f⊗h)).(g,k)· (f h)= ((g,k)·(f h) ). Therefore, Ψ restricts to an isomorphism between the respective fixed-point subspaces: Ψ:(Hom(U,A)⊗Hom(V,B))G×K→∼Hom(U⊗V,A⊗B)G×K. : (Hom(U,A) (V,B) )^G× K Hom(U V,A B)^G× K. Finally, we identify the invariants of the tensor product. Let X be any G-representation and Y any K-representation, and consider X⊗YX Y as a (G×K)(G× K)-representation via (g,k)⋅(x⊗y)=(g⋅x)⊗(k⋅y)(g,k)·(x y)=(g· x) (k· y). Then (X⊗Y)G×K=XG⊗YK(X Y)^G× K=X^G Y^K. Indeed, if w∈(X⊗Y)G×Kw∈(X Y)^G× K, then in particular w∈(X⊗Y)e×Kw∈(X Y)^\e\× K, so w∈X⊗YKw∈ X Y^K. Write w=∑ixi⊗yiw= _ix_i y_i with yi∈YKy_i∈ Y^K. Now, invariance under G×eG×\e\ implies ∑i(g⋅xi)⊗yi=∑ixi⊗yifor all g∈G, _i(g· x_i) y_i= _ix_i y_i all g∈ G, and since the yiy_i lie in YKY^K, we may choose them to be linearly independent by regrouping terms. It follows that g⋅xi=xig· x_i=x_i for each i, hence xi∈XGx_i∈ X^G and therefore w∈XG⊗YKw∈ X^G Y^K. The reverse inclusion XG⊗YK⊆(X⊗Y)G×KX^G Y^K (X Y)^G× K is immediate, proving the claim. Applying this with X=Hom(U,A)X=Hom(U,A) and Y=Hom(V,B)Y=Hom(V,B) gives (Hom(U,A)⊗Hom(V,B))G×K=Hom(U,A)G⊗Hom(V,B)K=HomG(U,A)⊗HomK(V,B). (Hom(U,A) (V,B) )^G× K\;=\;Hom(U,A)^G (V,B)^K\;=\;Hom_G(U,A) _K(V,B). Combining this with the fact that Ψ restricts to an isomorphism on the invariant subspaces, we conclude: HomG(U,A)⊗HomK(V,B)≅HomG×K(U⊗V,A⊗B).Hom_G(U,A) _K(V,B)\; \;Hom_G× K(U V,A B). ∎ C.6 Proof of Theorem 1 Lemma 8 implies that, with a proper choice of basis, any feature vector ϕq _q in a model satisfying SLP can be expressed as a direct sum of components, where each component is a tensor product of an entity-dependent factor (from CiC_i) and a relation-dependent factor (from RiR_i). We now translate this general decomposition into specific constraints on the query feature map ϕ(h,r,t)φ(h,r,t), to prove Theorem 1. Theorem 1 (Context-Relation Factorization). Let ϕ:Q→Wφ:Q→ W be the feature map defined by q↦ϕq _q. If ϕφ satisfies SLP, then there exist real vector spaces Cii\C_i\_i, Rii\R_i\_i and an isomorphism W≅⨁i(Ci⊗Ri)W _i(C_i R_i) such that ϕ(h,r,t)=⨁i(∑k=1miui,k(h,t)⊗vi,k(r)),φ(h,r,t)\;=\; _i ( _k=1^m_iu_i,k(h,t) v_i,k(r) ), where ui,k:E×E→Ciu_i,k:E× E→ C_i and vi,k:ℛ→Riv_i,k:R→ R_i. Moreover, negation acts locally as a sign flip on each relation component, i.e., vi,k(¬r)=−vi,k(r)v_i,k( r)=-v_i,k(r). Proof. Let V with the free vector space with basis eq:q∈Q\e_q:q∈ Q\. The group action of H on the set Q naturally induces a linear representation on V, defined by permuting the basis vectors: (σ,ϵ)⋅eq:=e(σ,ϵ)⋅q.(σ,ε)· e_q\;:=\;e_(σ,ε)· q. Let Φ:V→W :V→ W be the unique linear extension of ϕφ, defined by Φ(eq):=ϕq (e_q):= _q. We identify V with the tensor product V≅Vctx⊗VrelV\; \;V_ctx V_rel via the basis mapping e(h,r,t)↦e(h,t)⊗ere_(h,r,t) e_(h,t) e_r. Under this identification, the H-action on V acts component-wise: (σ,ϵ)⋅(e(h,t)⊗er):=e(σ(h),σ(t))⊗eϵ⋅r.(σ,ε)·(e_(h,t) e_r)\;:=\;e_(σ(h),σ(t)) e_ε· r. By Lemma 7, ϕ:Q→Wφ:Q→ W is H-equivariant with respect to the H-action on queries. Since Φ is defined linearly on the basis Q, this equivariance extends to the entire space V, making Φ an H-equivariant map. Moreover, by Lemma 8, one can fix an H-equivariant isomorphism W≅⨁i(Ci⊗Ri)W _i(C_i R_i), and let πi:W→Ci⊗Ri _i:W→ C_i R_i be the corresponding projection. Define the component map Φi:=πi∘Φ _i:= _i . Since both Φ and πi _i are equivariant, Φi _i belongs to the space HomH(Vctx⊗Vrel,Ci⊗Ri)≅HomGE(Vctx,Ci)⊗Homℤ2(Vrel,Ri),Hom_H(V_ctx V_rel,C_i R_i)\; \;Hom_G_E(V_ctx,C_i) _Z_2(V_rel,R_i), where the isomorphism is established by Lemma 9. Thus, Φi _i is an element of a tensor product space, implying it can be written as a finite sum of pure tensors: Φi=∑k=1miUi,k⊗Si,k, _i\;=\; _k=1^m_iU_i,k S_i,k, where Ui,k:Vctx→CiU_i,k:V_ctx→ C_i is GEG_E-equivariant and Si,k:Vrel→RiS_i,k:V_rel→ R_i is ℤ2Z_2-equivariant. Defining the embeddings ui,k(h,t):=Ui,k(e(h,t))u_i,k(h,t):=U_i,k(e_(h,t)) and vi,k(r):=Si,k(er)v_i,k(r):=S_i,k(e_r), the overall feature map decomposes as: ϕ(h,r,t)=⨁i(∑k=1miui,k(h,t)⊗vi,k(r)).φ(h,r,t)\;=\; _i ( _k=1^m_iu_i,k(h,t) v_i,k(r) ). It remains to show that negation acts by sign on the relation embeddings. Let ϵ∈ℤ2ε _2 be the negation element. The H-action on the feature space decomposes block-wise. On the i-th block Ci⊗RiC_i R_i, the action is component-wise: (id,ϵ)⋅(u⊗v)=(id⋅u)⊗(ϵ⋅v).(id,ε)·(u v)\;=\;(id· u) (ε· v). Since RiR_i is a real irreducible representation of ℤ2Z_2, it is one-dimensional, so ϵε acts on RiR_i as a scalar ηi∈+1,−1 _i∈\+1,-1\. Crucially, because the first component of the group element is the identity, the embedding ui,ku_i,k remains unchanged. Thus, the action on the sum is: ϕi(h,¬r,t) _i(h, r,t) =(id,ϵ)⋅∑k=1mi(ui,k(h,t)⊗vi,k(r)) \;=\;(id,ε)· _k=1^m_i (u_i,k(h,t) v_i,k(r) ) =ηi∑k=1mi(ui,k(h,t)⊗vi,k(r)). \;=\; _i _k=1^m_i (u_i,k(h,t) v_i,k(r) ). The SLP condition requires ϕ(h,¬r,t)=−ϕ(h,r,t)φ(h, r,t)=-φ(h,r,t) for all queries. Comparing this with the equation above, we see that for any block i where the feature map is not identically zero, we must have ηi=−1 _i=-1. (If ϕi≡0 _i≡ 0, the SLP constraint is satisfied on this block for any ηi _i, hence the sign choice is immaterial.) Consequently, in all cases, the relation embeddings must satisfy the sign-flip property: vi,k(¬r)=−vi,k(r).v_i,k( r)\;=\;-v_i,k(r). Thus, every contributing relation component transforms under negation by the sign representation. ∎ Appendix D Proof of Theorem 2 In this section, we provide the proof of Theorem 2. We first prove the lemma below, which will play a crucial role in proving Theorem 2. Lemma 10 (Converse alignment in Hom-spaces). Fix i and an H-equivariant projection πi:W→Wi≅Ci⊗Ri _i:W→ W_i C_i R_i. Let Φi:=πi∘Φ∈HomH(Vctx⊗Vrel,Ci⊗Ri) _i:= _i _H(V_ctx V_rel,\,C_i R_i), and identify this space with i⊗ℬiA_i _i via Lemma 9, where i=HomGE(Vctx,Ci)A_i=Hom_G_E(V_ctx,C_i) and ℬi=Homℤ2(Vrel,Ri)B_i=Hom_Z_2(V_rel,R_i). Let Ppair:Vctx→VctxP_pair:V_ctx→ V_ctx and Prel:Vrel→VrelP_rel:V_rel→ V_rel be the linear maps defined by their action on the basis vectors: Ppair(e(h,t)):=e(t,h)andPrel(er):=er⌣.P_pair(e_(h,t)):=e_(t,h) P_rel(e_r):=e_r . Note that the converse operation commutes with negation (i.e., (¬r)⌣=¬(r⌣)( r) = (r )), so PrelP_rel is ℤ2Z_2-equivariant. Define involutions pair(U)=U∘PpairP_pair(U)=U P_pair on iA_i and rel(S)=S∘PrelP_rel(S)=S P_rel on ℬiB_i. If ϕ(t,r⌣,h)=ϕ(h,r,t)φ(t,r ,h)=φ(h,r,t) for all (h,r,t)∈Q(h,r,t)∈ Q, then (pair⊗rel)(Φi)=Φi.(P_pair _rel)( _i)= _i. Proof. First, we verify that pairP_pair and relP_rel are well-defined. For any U∈iU _i and g∈GEg∈ G_E, since PpairP_pair commutes with the GEG_E-action, (U∘Ppair)(g⋅x)=U(Ppair(g⋅x))=U(g⋅Ppair(x))=g⋅(U∘Ppair)(x),(U P_pair)(g· x)=U(P_pair(g· x))=U(g· P_pair(x))=g·(U P_pair)(x), so pair(U)∈iP_pair(U) _i. Similarly, ℤ2Z_2-equivariance of PrelP_rel ensures relP_rel is well-defined on ℬiB_i. Next, since ϕ(t,r⌣,h)=ϕ(h,r,t)φ(t,r ,h)=φ(h,r,t) for all (h,r,t)∈Q(h,r,t)∈ Q, Φ∘(Ppair⊗Prel)=Φ (P_pair P_rel)= . Applying πi _i gives Φi∘(Ppair⊗Prel)=(πi∘Φ)∘(Ppair⊗Prel)=πi∘Φ=Φi. _i (P_pair P_rel)=( _i ) (P_pair P_rel)= _i = _i. Under the identification i⊗ℬi≅HomH(Vctx⊗Vrel,Ci⊗Ri)A_i _i _H(V_ctx V_rel,C_i R_i), a pure tensor U⊗SU S corresponds to the map x⊗y↦U(x)⊗S(y)x y U(x) S(y). Precomposing with Ppair⊗PrelP_pair P_rel yields (x⊗y)↦U(Ppair(x))⊗S(Prel(y))=(U∘Ppair)(x)⊗(S∘Prel)(y),(x y) U(P_pair(x)) S(P_rel(y))=(U P_pair)(x) (S P_rel)(y), which is exactly the map corresponding to (pairU)⊗(relS)(P_pairU) (P_relS). Extending linearly shows that precomposition by Ppair⊗PrelP_pair P_rel corresponds to pair⊗relP_pair _rel, hence (pair⊗rel)(Φi)=Φi(P_pair _rel)( _i)= _i. ∎ To prove Theorem 2, we restate its general form. Theorem 2 (Symmetric-Antisymmetric Alignment (Formal statement)). Assume the context-relation factorization from Theorem 1 and ϕ(t,r⌣,h)=ϕ(h,r,t)φ(t,r ,h)=φ(h,r,t) for all (h,r,t)∈Q.(h,r,t)∈ Q. Then, for each block i (as in Theorem 1), the projected feature component ϕi:=πi∘ϕ:Q→Wi≅Ci⊗Ri _i:= _i φ:Q→ W_i C_i R_i admits a decomposition ϕi(h,r,t)=ϕi+(h,r,t)+ϕi−(h,r,t), _i(h,r,t)= _i^+(h,r,t)+ _i^-(h,r,t), where each part can be written as a finite sum of context–relation pure tensors ϕi±(h,r,t)=∑k∈Ii±ui,k±(h,t)⊗vi,k±(r), _i^±(h,r,t)= _k∈ I_i^±u_i,k^±(h,t) v_i,k^±(r), such that every summand satisfies the aligned symmetry relations ui,k±(t,h)=±ui,k±(h,t),vi,k±(r⌣)=±vi,k±(r).u_i,k^±(t,h)=±\,u_i,k^±(h,t), v_i,k^±(r )=±\,v_i,k^±(r). Consequently, ϕ(h,r,t)=⨁i(ϕi+(h,r,t)+ϕi−(h,r,t))φ(h,r,t)= _i ( _i^+(h,r,t)+ _i^-(h,r,t) ) with the same componentwise symmetry properties on each block. Proof. Fix a block index i and an H-equivariant projection πi:W→Wi≅Ci⊗Ri _i:W→ W_i C_i R_i, and write Φi:=πi∘Φ∈HomH(Vctx⊗Vrel,Ci⊗Ri) _i:= _i _H(V_ctx V_rel,\,C_i R_i). By Lemma 9, we identify HomH(Vctx⊗Vrel,Ci⊗Ri)≅i⊗ℬi,Hom_H(V_ctx V_rel,\,C_i R_i)\; \;A_i _i, where i:=HomGE(Vctx,Ci)A_i:=Hom_G_E(V_ctx,C_i) and ℬi:=Homℤ2(Vrel,Ri)B_i:=Hom_Z_2(V_rel,R_i). Let pairP_pair and relP_rel be the involutions on iA_i and ℬiB_i defined in Lemma 10. Since they are involutions, their eigenvalues are either +1+1 or −1-1. We write their eigenspace decompositions as i=i+⊕i−andℬi=ℬi+⊕ℬi−,A_i=A_i^+ _i^- _i=B_i^+ _i^-, where the superscripts ± denote the eigenspaces corresponding to eigenvalues ±1± 1, respectively. Recall from Lemma 10 that (pair⊗rel)(Φi)=Φi(P_pair _rel)( _i)= _i. Note that the operator pair⊗relP_pair _rel acts on a pure tensor U⊗S∈iσ⊗ℬiτU S _i^σ _i^τ (where σ,τ∈+,−σ,τ∈\+,-\) by scalar multiplication with σ⋅τσ·τ. Since Φi _i is invariant (eigenvalue +1+1), it must lie in the subspace where this product is positive: Φi∈(i+⊗ℬi+)⊕(i−⊗ℬi−). _i∈(A_i^+ _i^+) (A_i^- _i^-). Consequently, Φi _i uniquely decomposes as Φi=Φi++Φi− _i= _i^++ _i^- with terms in these respective subspaces. We now expand each Φi± _i^± as a finite sum of pure tensors with factors in the corresponding eigenspaces. Since VctxV_ctx and VrelV_rel are finite-dimensional (in particular, |E|,|R|<∞|E|,|R|<∞), the spaces i±A_i^± and ℬi±B_i^± are finite-dimensional. Let d+:=dimi+d_+:= _i^+ and choose a basis Ui,1+,…,Ui,d++\U_i,1^+,…,U_i,d_+^+\ of i+A_i^+. Consider the linear map T+:(ℬi+)d+→i+⊗ℬi+,T+(S1,…,Sd+):=∑k=1d+Ui,k+⊗Sk.T^+:(B_i^+)^d_+ _i^+ _i^+, T^+(S_1,…,S_d_+):= _k=1^d_+U_i,k^+ S_k. This map is surjective. Indeed, any element in the tensor product is a sum of pure tensors U⊗SU S. Since Ui,k+\U_i,k^+\ is a basis, any U∈i+U _i^+ can be written as ∑kckUi,k+ _kc_kU_i,k^+. Substituting this expansion into the pure tensors and regrouping terms by Ui,k+U_i,k^+ shows that any element takes the form ∑kUi,k+⊗Sk′ _kU_i,k^+ S_k for some Sk′∈ℬi+S_k _i^+. Since the dimensions of the domain and codomain coincide: dim(ℬi+)d+=d+⋅dimℬi+=(dimi+)⋅(dimℬi+)=dim(i+⊗ℬi+), (B_i^+)^d_+=d_+· _i^+=( _i^+)·( _i^+)= (A_i^+ _i^+), the surjective linear map T+T^+ is an isomorphism. Therefore, there exist unique maps Si,1+,…,Si,d++∈ℬi+S_i,1^+,…,S_i,d_+^+ _i^+ such that Φi+=∑k=1d+Ui,k+⊗Si,k+. _i^+= _k=1^d_+U_i,k^+ S_i,k^+. Set Ii+:=1,…,d+I_i^+:=\1,…,d_+\. Analogously, we obtain unique maps Si,1−,…,Si,d−∈ℬi−S_i,1^-,…,S_i,d_-^- _i^- such that Φi−=∑k=1d−Ui,k−⊗Si,k−. _i^-= _k=1^d_-U_i,k^- S_i,k^-. Set Ii−:=1,…,d−I_i^-:=\1,…,d_-\. Now, define the projected feature maps by evaluation on basis tensors: ϕi±(h,r,t):=Φi±(e(h,t)⊗er),ui,k±(h,t):=Ui,k±(e(h,t)),vi,k±(r):=Si,k±(er). _i^±(h,r,t):= _i^± (e_(h,t) e_r ), u_i,k^±(h,t):=U_i,k^±(e_(h,t)), v_i,k^±(r):=S_i,k^±(e_r). Under the identification of Lemma 9, each pure tensor Ui,k±⊗Si,k±U_i,k^± S_i,k^± acts by (x⊗y)↦Ui,k±(x)⊗Si,k±(y)(x y) U_i,k^±(x) S_i,k^±(y), hence ϕi±(h,r,t)=∑k∈Ii±ui,k±(h,t)⊗vi,k±(r), _i^±(h,r,t)= _k∈ I_i^±u_i,k^±(h,t) v_i,k^±(r), and ϕi=ϕi++ϕi− _i= _i^++ _i^- since Φi=Φi++Φi− _i= _i^++ _i^-. Finally, since Ui,k±∈i±U_i,k^± _i^± means Ui,k±∘Ppair=±Ui,k±U_i,k^± P_pair=± U_i,k^±, evaluating at e(h,t)e_(h,t) gives ui,k±(t,h)=±ui,k±(h,t)u_i,k^±(t,h)=± u_i,k^±(h,t). Likewise Si,k±∈ℬi±S_i,k^± _i^± means Si,k±∘Prel=±Si,k±S_i,k^± P_rel=± S_i,k^±, and evaluating at ere_r gives vi,k±(r⌣)=±vi,k±(r)v_i,k^±(r )=± v_i,k^±(r). Summing over i yields the global decomposition. ∎ Appendix E Proof of Lemma 2 In this section, we provide the proof of Lemma 2. Lemma 2 (Kernel Stability Yields Bilinearity). Assume Assumption 1 and the symmetry of conjunction (Def. 4(i)). Then, there exists a unique symmetric bilinear operator F~:W∧×W∧→W∧ F:W × W → W such that F~(ϕp,ϕq)=ϕp∧q F( _p, _q)= _p q for all p,q∈Q∧p,q∈ Q . Proof. Fix q∈Q∧q∈ Q . Let V∧V be the free real vector space with basis ep:p∈Q∧\e_p:p∈ Q \, and let Φ:V∧→W∧ :V → W be the linear map defined on basis vectors by Φ(ep)=ϕp (e_p)= _p. Define a linear map P∧,q:V∧→V∧P_ ,q:V → V by its action on basis: P∧,q(ep):=ep∧q(p∈Q∧).P_ ,q(e_p):=e_p q (p∈ Q ). First, we claim that P∧,qP_ ,q induces a well-defined linear operator L∧(q)∈End(W∧)L_ (q) (W ) satisfying L∧(q)ϕp=ϕp∧q∀p∈Q∧.L_ (q)\, _p= _p q ∀\,p∈ Q . For any w∈W∧w∈ W , since W∧=spanϕp:p∈Q∧W =span\ _p:p∈ Q \, we may choose x∈V∧x∈ V with Φ(x)=w (x)=w, and define L∧(q)w:=Φ(P∧,qx).L_ (q)\,w:= (P_ ,qx ). It remains to check that this does not depend on the choice of x. If x′x is another preimage of w, then x−x′∈ker⁡Φx-x ∈ . By Assumption 1, we have P∧,q(ker⁡Φ)⊆ker⁡ΦP_ ,q( ) , hence Φ(P∧,q(x−x′))=0 (P_ ,q(x-x ))=0, i.e., Φ(P∧,qx)=Φ(P∧,qx′) (P_ ,qx)= (P_ ,qx ). Thus L∧(q)L_ (q) is well-defined. Linearity follows from linearity of Φ and P∧,qP_ ,q. Applying the definition to w=ϕp=Φ(ep)w= _p= (e_p) gives L∧(q)ϕp=Φ(P∧,qep)=Φ(ep∧q)=ϕp∧q,L_ (q)\, _p= (P_ ,qe_p )= (e_p q)= _p q, as claimed. We now construct the map F~:W∧×W∧→W∧ F:W × W → W . Since ϕp:p∈Q∧\ _p:p∈ Q \ spans W∧W , any vector v∈W∧v∈ W can be decomposed as a finite linear combination v=∑iciϕqiv= _ic_i _q_i. For any u∈W∧u∈ W , we define F~ F linear in the second argument by using the operators L∧(qi)L_ (q_i): F~(u,v):=∑iciL∧(qi)u. F(u,v)\;:=\; _ic_iL_ (q_i)\,u. To ensure the well-definedness of F~(u,v) F(u,v), we must check that this definition is independent of the decomposition of v. Suppose ∑iciϕqi=0 _ic_i _q_i=0, i.e., the vector y=∑icieqiy= _ic_ie_q_i lies in ker⁡Φ . We check the value of the map for any basis vector u=ϕpu= _p (p∈Q∧p∈ Q ): ∑iciL∧(qi)ϕp=∑iciϕp∧qi. _ic_iL_ (q_i)\, _p\;=\; _ic_i _p q_i. By commutativity of conjunction on realized inputs (equivalently, by the assumed symmetry of F), we have ϕp∧qi=ϕqi∧p _p q_i= _q_i p for all i. Thus, ∑iciϕp∧qi=∑iciϕqi∧p=Φ(∑icieqi∧p)=Φ(P∧,p(y)). _ic_i _p q_i\;=\; _ic_i _q_i p\;=\; ( _ic_ie_q_i p )\;=\; (P_ ,p(y) ). Since y∈ker⁡Φy∈ , by Assumption 1, we have P∧,p(y)∈ker⁡ΦP_ ,p(y)∈ , so Φ(P∧,p(y))=0 (P_ ,p(y))=0. Thus, ∑iciL∧(qi)ϕp=0 _ic_iL_ (q_i) _p=0 for all p∈Q∧p∈ Q . Since ϕp:p∈Q∧\ _p:p∈ Q \ spans W∧W , this implies ∑iciL∧(qi)u=0 _ic_iL_ (q_i)u=0 for all u∈W∧u∈ W . Therefore, if v=∑iciϕqi=∑jdjϕrjv= _ic_i _q_i= _jd_j _r_j are two decompositions, then ∑iciϕqi−∑jdjϕrj=0 _ic_i _q_i- _jd_j _r_j=0 implies ∑iciL∧(qi)u=∑jdjL∧(rj)u∀u∈W∧, _ic_iL_ (q_i)u\;=\; _jd_jL_ (r_j)u ∀\,u∈ W , so F~(u,v) F(u,v) is independent of the chosen decomposition of v. By construction, F~ F is linear in the second argument. Also, since each L∧(qi)L_ (q_i) is linear, F~ F is linear in the first argument. Finally, for realized inputs, F~(ϕp,ϕq)=L∧(q)ϕp=ϕp∧q F( _p, _q)=L_ (q) _p= _p q. Thus, F~ F is the unique bilinear extension. ∎