
Paper deep dive

The Reasoning Error About Reasoning: Why Different Types of Reasoning Require Different Representational Structures

Yiling Wu

Year: 2026 | Venue: arXiv preprint | Area: cs.AI | Type: Preprint | Embeddings: 80

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 96%

Last extracted: 3/26/2026, 2:33:47 AM

Summary

The paper proposes a framework identifying four structural properties of representational systems—operability, consistency, structural preservation, and compositionality—to explain the structural demands of different reasoning types. It argues that a principal structural boundary exists: reasoning types below this boundary can function on associative, probabilistic representations, while those above it (like deductive reasoning) require all four properties to be fully satisfied, a requirement that cannot be met by scaling statistical learning alone.

Entities (6)

Compositionality · structural-property · 100%
Consistency · structural-property · 100%
Deductive Reasoning · reasoning-type · 100%
Operability · structural-property · 100%
Structural Preservation · structural-property · 100%
Yiling Wu · researcher · 100%

Relation Signals (3)

Statistical Learning → failstosatisfy → Deductive Reasoning

confidence 95% · Scaling statistical learning without structural reorganization of the representational system is insufficient to cross this boundary

Representational System → requires → Operability

confidence 90% · I identify four structural properties of representational systems, namely operability, consistency, structural preservation, and compositionality

Deductive Reasoning → requires → Compositionality

confidence 90% · Above this boundary, all four properties must be fully satisfied

Cypher Suggestions (2)

Find all structural properties required by a specific reasoning type. · confidence 90% · unvalidated

MATCH (r:ReasoningType {name: 'Deductive Reasoning'})-[:REQUIRES]->(p:StructuralProperty) RETURN p.name

Map the relationship between reasoning types and their structural demands. · confidence 90% · unvalidated

MATCH (r:ReasoningType)-[rel:REQUIRES]->(p:StructuralProperty) RETURN r.name, p.name
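The suggested queries can be sanity-checked against a small in-memory mirror of the graph. The sketch below is illustrative only: node names follow the Cypher suggestions above, and the edge list encodes the paper's claim that deductive reasoning requires all four structural properties (it is not an extracted schema).

```python
# Minimal in-memory mirror of the (ReasoningType)-[:REQUIRES]->(StructuralProperty)
# pattern from the Cypher suggestions above. The edge list reflects the paper's
# claim that deduction needs all four properties; it is illustrative, not a schema.
REQUIRES = [
    ("Deductive Reasoning", "Operability"),
    ("Deductive Reasoning", "Consistency"),
    ("Deductive Reasoning", "Structural Preservation"),
    ("Deductive Reasoning", "Compositionality"),
]

def properties_required_by(reasoning_type: str) -> list[str]:
    """Analogue of the first query: properties required by one reasoning type."""
    return sorted(p for r, p in REQUIRES if r == reasoning_type)

def requirement_map() -> dict[str, list[str]]:
    """Analogue of the second query: every (reasoning type, property) pairing."""
    out: dict[str, list[str]] = {}
    for r, p in REQUIRES:
        out.setdefault(r, []).append(p)
    return out

print(properties_required_by("Deductive Reasoning"))
```

Against a live Neo4j instance the Cypher queries themselves would serve this purpose; the pure-Python version simply makes the expected result shape explicit.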

Abstract

Different types of reasoning impose different structural demands on representational systems, yet no systematic account of these demands exists across psychology, AI, and philosophy of mind. I propose a framework identifying four structural properties of representational systems: operability, consistency, structural preservation, and compositionality. These properties are demanded to different degrees by different forms of reasoning, from induction through analogy and causal inference to deduction and formal logic. Each property excludes a distinct class of reasoning failure. The analysis reveals a principal structural boundary: reasoning types below it can operate on associative, probabilistic representations, while those above it require all four properties to be fully satisfied. Scaling statistical learning without structural reorganization is insufficient to cross this boundary, because the structural guarantees required by deductive reasoning cannot be approximated through probabilistic means. Converging evidence from AI evaluation, developmental psychology, and cognitive neuroscience supports the framework at different levels of directness. Three testable predictions are derived, including compounding degradation, selective vulnerability to targeted structural disruption, and irreducibility under scaling. The framework is a necessary-condition account, agnostic about representational format, that aims to reorganize existing debates rather than close them.

Tags

ai-safety (imported, 100%) · csai (suggested, 92%) · preprint (suggested, 88%)

Links


Full Text

79,512 characters extracted from source content.


The Reasoning Error About Reasoning: Why Different Types of Reasoning Require Different Representational Structures

Yiling Wu
Department of Philosophy
University of Massachusetts Amherst

Abstract

Despite important advances in distinguishing types of reasoning and types of cognitive processing, current research in psychology, artificial intelligence, and philosophy of mind still lacks a systematic account of the structural demands that different reasoning types impose on the representational systems that support them. This gap has produced recurrent confusions: unresolved debates about human rationality, unpredicted cycles of success and failure in AI reasoning, and a stalled dispute about representational format. I propose a framework that identifies four structural properties of representational systems, namely operability, consistency, structural preservation, and compositionality, and show that these properties are demanded to different degrees by different forms of reasoning, from induction through analogy and causal inference to deduction and formal logic. Each property excludes a distinct class of reasoning failure. The analysis reveals a principal structural boundary: reasoning types below this boundary can operate on associative, probabilistic representations, while those above it require all four properties to be fully satisfied. Scaling statistical learning without structural reorganization of the representational system is insufficient to cross this boundary, because the structural guarantees required by deductive reasoning cannot be approximated through probabilistic means. Converging evidence from AI evaluation, developmental psychology, and cognitive neuroscience supports the framework at different levels of directness, which I explicitly distinguish and layer by evidential strength.
Three core testable predictions are derived, including predictions about compounding degradation, selective vulnerability to targeted structural disruption, and irreducibility under scaling. The framework is a necessary-condition account, agnostic about representational format, that aims to reorganize existing debates rather than close them.

1 Introduction

Research on reasoning has made substantial progress over the past several decades. The dual-process framework has clarified the distinction between fast, heuristic processing and slow, deliberate processing (Evans & Stanovich, 2013; Sloman, 1996; Kahneman, 2011). Developmental psychologists have mapped the emergence of different cognitive capacities across childhood (Piaget & Inhelder, 1958; Gopnik et al., 2004). AI researchers have built systems that solve complex mathematical problems and generate coherent chains of inference (DeepSeek-AI, 2025). And philosophers have refined our understanding of the structure of mental representations (Fodor & Pylyshyn, 1988; Piccinini, 2020; Quilty-Dunn, Porot, & Mandelbaum, 2023).

Yet despite these advances, a gap remains. Although different types of reasoning, including induction, analogy, causal inference, deduction, and formal logic, are routinely distinguished in the literature, surprisingly little work has been devoted to asking what structural properties a representational system must possess in order to support each of these types. The dual-process framework tells us that some reasoning requires deliberate processing, but not why certain tasks demand it while others do not.
AI evaluations document patterns of success and failure across reasoning benchmarks (Mondorf & Plank, 2024; Wan et al., 2024; Parmar et al., 2024), but lack a theoretical framework that predicts these patterns from properties of the representational system. The debate between classical and connectionist architectures (Fodor & Pylyshyn, 1988; Smolensky, 1988) asks whether representations must be symbolic or can be distributed, but does not systematically connect representational properties to the demands of different reasoning types. In each case, progress has been made on reasoning and on representation separately, but the structural relationship between specific reasoning types and specific representational properties has remained largely implicit. This gap matters. Without a systematic account of how reasoning types map onto representational demands, we cannot explain why the same system, human or artificial, succeeds at certain reasoning tasks while struggling with others. We cannot predict what kind of representational architecture is needed to support what kind of reasoning. And we cannot resolve debates that have persisted in part because they operate at the wrong level of analysis. This paper proposes a framework to fill this gap. I identify four structural properties of representational systems and argue that different types of reasoning depend on these properties to different degrees. The central claim is that there exists a principal structural boundary within the space of reasoning types. Below this boundary, reasoning can operate on associative, probabilistic representations that satisfy the structural properties only partially. Above this boundary, all four properties must be fully satisfied, and scaling statistical learning without structural reorganization of the representational system is insufficient to cross it. Several features of the approach define the scope of what this paper does and does not claim. 
First, the framework is functionalist: it specifies structural constraints without committing to any particular representational format. A connectionist system that satisfies all four properties can support deduction; a classical system that fails to maintain consistency cannot. Second, the four structural properties are proposed as a minimal functional analysis set for distinguishing reasoning types, not as a complete theory of representation. Each property excludes a distinct failure mode that the others cannot exclude, but I do not claim that no other properties are relevant to cognition more broadly. Third, the framework is a necessary-condition account. It specifies what representational systems must provide for different reasoning types, not what is sufficient. Working memory, attention, metacognition, and domain knowledge also matter. Fourth, the aim is to reorganize existing debates, not to close them.

The paper proceeds as follows. Section 2 introduces the four structural properties and provides a justification for their selection. Section 3 analyzes the structural demands of five reasoning types and identifies the principal boundary. Section 4 argues that this boundary cannot be crossed by scaling statistical learning alone. Section 5 presents converging evidence, explicitly distinguishing the level of support each evidence line provides. Section 6 derives three core testable predictions. Section 7 discusses the framework’s novelty, its relation to existing positions, and its limitations.

2 Four Structural Properties of Representational Systems

2.1 From Format to Properties

The classical debate about mental representation focuses on format. Are mental representations sentence-like structures with combinatorial syntax and semantics (Fodor, 1975; Fodor & Pylyshyn, 1988)? Model-like structures that are iconic and spatial (Johnson-Laird, 1983)? Distributed patterns of activation (Smolensky, 1988)?
This paper takes a different approach, asking not what format representations take but what structural properties a representational system must possess to support reasoning. This shift is motivated by two observations. First, different formats may satisfy the same structural constraints. A system of mental rules and a system of mental models may both allow their components to be individually accessed and manipulated, even though the underlying format differs. If structural properties determine reasoning capacity, the format question becomes secondary for present purposes. Second, a property-level analysis is directly useful for evaluating artificial systems whose format is known, such as vectors, embeddings, and attention patterns, but whose reasoning capacities are in question.

I identify four structural properties relevant to reasoning. These are properties of representational systems, not of individual representations. A system may satisfy some properties to varying degrees. The properties are proposed as a minimal functional analysis set: each excludes a distinct class of reasoning failure that the others do not exclude. I defend this minimality claim in Section 2.7, after introducing the properties individually.

2.2 Operability

Operability refers to the capacity of a representational system to allow its component parts to be individually accessed, distinguished, and operated upon. A system with operability can identify a constituent part of a complex representation and perform targeted transformations on it without disrupting other parts. Given a representation of "all dogs are mammals," a system with operability can access "dogs" and "mammals" as separate elements, substitute one for the other, and generate "all cats are mammals" while preserving the quantificational structure. This notion has deep roots. Fodor (1975) argued that mental representations have constituent structure: parts that are independently tokened.
Pylyshyn (1984) developed this in his account of computation and cognition. Piccinini (2020) offers an account in which even neural representations can possess operability if their internal structure supports targeted transformations. Operability does not require that parts be discrete symbols. What it requires is that processing mechanisms can selectively engage with distinguishable aspects of representational content.

The distinct failure mode that operability excludes is decomposition failure. A system lacking operability must treat representations as undifferentiated wholes and can only respond on the basis of holistic similarity, not internal structure. This failure mode is not addressed by consistency, structural preservation, or compositionality, each of which presupposes that components can already be distinguished.

2.3 Consistency

Consistency refers to the stability of semantic content associated with a representational element across different contexts of use within a reasoning chain. When a reasoning chain involves multiple steps, the concepts at each step must maintain their identity. If semantic content shifts between steps, the chain becomes unreliable even if each step appears locally correct.

Consider a syllogism. The major premise states that all mammals are warm-blooded. The minor premise states that whales are mammals. The conclusion follows only if "mammals" denotes the same category in both premises. If the system allows this boundary to drift, the inference fails. Both mental model theory (Johnson-Laird, 2001) and mental logic theory (Rips, 1994) implicitly require consistency. Fodor (1998) made a related point about conceptual identity as a precondition for compositional thought.

Consistency should be distinguished from rigidity. A system can be context-sensitive in interpreting inputs while maintaining consistency within a given reasoning chain.
What matters is not that a concept always means the same thing everywhere, but that it retains its meaning throughout a particular inference.

The distinct failure mode that consistency excludes is semantic drift within a reasoning chain. A system that has operability, meaning it can access components, but lacks consistency, meaning those components shift in meaning across steps, will produce inferences that appear well-formed but are unreliable because the same term no longer picks out the same content at different stages of the chain. This failure cannot be prevented by operability alone, nor by structural preservation or compositionality, which presuppose stable content.

2.4 Structural Preservation

Structural preservation refers to the capacity of reasoning transformations to maintain the structural relations encoded in the representations they operate on. When we infer from "all A are B" and "all B are C" that "all A are C," the inclusion relation chaining through the middle term is preserved. A system that sometimes preserves and sometimes loses this relation cannot perform such inferences reliably. This property connects to Johnson-Laird’s (2010) iconicity: the structure of a mental model corresponds to the structure of what it represents. Piccinini (2020) argues that genuine representations maintain a homomorphic relationship with their targets. Gładziejewski and Miłkowski (2017) argue that structural representations are causally relevant in virtue of their structure-preserving properties.

The distinct failure mode that structural preservation excludes is relational distortion during inference. A system that has operability and consistency, meaning it can access stable components, but lacks structural preservation, meaning the relations among those components are not maintained through transformations, will produce conclusions that involve the right concepts in the wrong relationships.
For instance, it might correctly identify A, B, and C as distinct stable concepts, yet lose the inclusion chain that links them, generating conclusions that do not follow from the premises. This failure is not prevented by the other three properties.

2.5 Compositionality

Compositionality refers to the systematic determination of the meaning of a complex representation by the meanings of its parts and the way they are combined. Understanding "John loves Mary" and "Mary loves John" as expressing different contents requires treating the arrangement of components as semantically significant. Fodor and Pylyshyn (1988) placed compositionality at the center of their critique of connectionism. The ensuing debate produced important refinements. Smolensky (1988) argued for connectionist compositionality without classical structure. Van Gelder (1990) distinguished classical from functional compositionality. Chalmers (1990) proposed that connectionist systems could realize compositional representations through activation patterns. The present framework adopts a functional reading: what matters is that complex meanings are systematically determined by parts and combination, regardless of realization.

The distinct failure mode that compositionality excludes is structural indifference, the inability to distinguish representations that share the same parts but differ in arrangement. A system with operability, consistency, and structural preservation can access stable components and maintain relations through individual transformations, but without compositionality it cannot distinguish "all A are B" from "all B are A." This failure is catastrophic for deduction and is not prevented by the other three properties.

2.6 Relations Among the Properties

The four properties are partially ordered. Compositionality presupposes operability: one cannot compose what one cannot access. Structural preservation presupposes consistency: if content drifts, structure cannot be maintained.
But the converse does not hold. Operability does not entail compositionality, and consistency does not entail structural preservation. The properties form increasingly demanding constraints, with operability as the most basic and compositionality as the most demanding.

2.7 Why These Four? A Minimality Argument

The selection of these four properties requires justification. Why these four, rather than others? The answer is that each property excludes a distinct class of reasoning failure, and no property in the set can be replaced by any combination of the others.

To see that the failure modes are genuinely independent, consider the following contrasts. A system can decompose representations into stable tokens and maintain relations through individual transformations, yet still fail to distinguish role reversals: it treats "the cat chased the dog" and "the dog chased the cat" as equivalent, because it lacks sensitivity to the arrangement of parts. This is compositionality failure in a system that has operability, consistency, and structural preservation. Conversely, a system can be fully sensitive to arrangement, thus possessing compositionality, and able to decompose its representations, thus possessing operability, yet allow the concept denoted by a key term to shift subtly across a multi-step inference, so that what "mammal" picks out in step one is not quite what it picks out in step four. This is consistency failure in a system that has operability, compositionality, and structural preservation of individual steps. A third system can decompose representations, maintain stable content, and distinguish structural arrangements, yet fail to carry the inclusion relation through a chain of inferences, losing the transitivity that links the first premise to the conclusion. This is structural preservation failure in a system that has the other three properties. These contrasts are not merely logical possibilities.
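The role-reversal contrast can be made concrete with a deliberately crude sketch. The toy comparison below is my illustration, not the paper's: an unordered bag-of-parts test exhibits structural indifference to role reversal, while an arrangement-sensitive test does not.

```python
# Toy illustration of the compositionality contrast: a system that compares
# representations as unordered bags of parts cannot distinguish role
# reversals, while an arrangement-sensitive comparison can.
def bag_equivalent(a: str, b: str) -> bool:
    # Structural indifference: only which parts occur matters.
    return sorted(a.split()) == sorted(b.split())

def arrangement_equivalent(a: str, b: str) -> bool:
    # Compositional sensitivity: the ordering of parts matters too.
    return a.split() == b.split()

s1 = "the cat chased the dog"
s2 = "the dog chased the cat"

print(bag_equivalent(s1, s2))          # the bag comparison conflates them
print(arrangement_equivalent(s1, s2))  # the ordered comparison does not
```

Word order is of course only a stand-in for arrangement in general; the point is that the two comparison functions differ precisely in whether arrangement contributes to identity.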
They correspond to distinguishable patterns of reasoning failure that have been documented in both human error data and AI evaluation, as Sections 4 and 5 will show. A clarification may help prevent the most likely confusion between properties. Compositionality and structural preservation are the two most easily conflated, because both involve structure. The distinction is this: compositionality concerns the meaning-contribution of arrangement as such, that is, whether a system treats the ordering of parts as semantically significant when constructing or interpreting a representation. More precisely: compositionality concerns whether arrangement contributes to meaning at the point of representation. Structural preservation concerns the maintenance of already encoded relations across transformation, that is, whether relations that hold in the premises are carried forward intact through the inferential process. More precisely: structural preservation concerns whether already encoded relations survive inferential transformation. A system can be fully compositional, in that it distinguishes ”A chased B” from ”B chased A,” yet fail to preserve the inclusion chain through a multi-step syllogism. Conversely, a system can preserve individual relational links through each transformation yet be blind to the difference between two representations that arrange the same elements in different roles. A natural question is whether the set is complete, whether there are additional structural properties, such as variable binding, recursion, or hierarchy, that should be included. I do not claim completeness. The four properties are proposed as a minimal set sufficient for the specific purpose of distinguishing the structural demands of different reasoning types. These additional notions can often be analyzed as special cases or implementation-level refinements of the present four-property framework, rather than being exhaustively reduced to any single property. 
Variable binding, for instance, plausibly involves operability, in the sense of accessing a component, and compositionality, in the sense of assigning it a role within structured content, working together. Recursion may be understood as the iterated application of compositional construction. Hierarchy involves structural relations that fall under both structural preservation, in maintaining hierarchical ordering through inference, and compositionality, in building hierarchically organized representations from parts. I do not claim that these analyses are the only correct ones, but they suggest that the four properties provide a level of abstraction appropriate for the present purpose. The framework could be extended with additional properties if future work reveals failure modes that the current four do not capture.

3 Reasoning Types and Their Structural Demands

This section examines five types of reasoning in order of increasing structural demand. For each, I specify which structural properties are necessary, which are beneficial but not required, and which are largely irrelevant. An important qualification applies throughout: the claims in this section concern the minimal structural requirements for core forms of each reasoning type as standardly studied in the cognitive science literature, not the most sophisticated or hybrid cases of each type. Complex versions of any reasoning type may approach or exceed the demands I assign to the core case. The gradient I identify is therefore a characterization of typical structural demands, not an exhaustive taxonomy of every possible instance.

3.1 Inductive Reasoning

Inductive reasoning involves drawing general conclusions from specific observations. Its defining feature is that the conclusion goes beyond the evidence: the premises support but do not guarantee the conclusion (Heit, 2000). Because inductive conclusions are probabilistic, the structural demands are modest.
The core operation is detecting regularities across instances and generalizing to new cases. This requires minimal operability (that is, to identify instances and similarities), but does not require consistency, structural preservation, or compositionality in the strict, fully stabilized form demanded by demonstrative reasoning. Inductive generalizations tolerate fuzzy concept boundaries, the conclusion does not stand in a strict structural relationship to its premises, and generalization can proceed on feature overlap without treating arrangement as semantically significant. Tenenbaum, Kemp, Griffiths, and Goodman (2011) have shown that Bayesian models operating on probabilistic representations capture many aspects of human induction, suggesting that the core representational requirements are within the capacity of statistical learning. I do not claim that induction requires none of these properties. I claim that the core inductive operation, as typically studied, does not require any of them to be fully satisfied.

3.2 Analogical Reasoning

Analogical reasoning involves drawing inferences about a target domain based on structural similarity to a source domain (Gentner, 1983). Gentner’s structure-mapping theory holds that core analogy depends on aligning relational structures between domains, requiring more than simple feature matching. This requires operability (that is, to access relational structure) and partial structural preservation (that is, to map relations from source to target). However, analogies as typically studied are inherently approximate. Holyoak and Thagard (1995) argue that analogical reasoning is governed by multiple soft constraints satisfied in a graded manner. Full consistency and full compositionality are not required in the strict form demanded by demonstrative reasoning, because analogy claims relevance, not validity.
I note that more sophisticated forms of analogical reasoning, such as those involving higher-order relational abstraction, may place greater demands. The present characterization applies to the core phenomenon as standardly investigated in cognitive psychology.

3.3 Causal Reasoning

Causal reasoning involves identifying and reasoning about cause-effect relationships. Causal relations are directional, so the cause must be distinguishable from the effect and the ordering preserved. This requires operability (to distinguish causal elements), consistency (because the identity of causes and effects must be stable), and partial structural preservation (because directionality must be maintained). Gopnik, Glymour, Sobel, Schulz, Kushnir, and Danks (2004) have shown that children’s causal learning can be modeled by Bayesian networks encoding directed relationships. Full compositionality is not strictly required for the core causal operations studied in the literature: causal inferences can be drawn on the basis of temporal contiguity, covariation, and intervention without fully compositional representations (Sloman, 2005). I note that complex forms of causal reasoning, particularly those involving counterfactuals, nested interventions, and causal models with compositional structure, may approach the structural demands of deduction. The boundary I identify below is therefore best understood as a principal structural boundary for the purpose of the present analysis, not as the only possible boundary. The key point is that the core operations of causal reasoning, as typically studied, do not require the full set of structural properties that deduction demands. The present placement of causal reasoning in the gradient concerns the core representational demands of causal inference as ordinarily studied in cognitive and developmental psychology, not fully articulated causal model construction as practiced in scientific reasoning or formal causal epistemology.
The claim is comparative rather than eliminative: causal reasoning can involve substantial structure, but its core forms do not require the full four-property package in the same strict way that demonstrative reasoning does.

3.4 Deductive Reasoning

Deductive reasoning involves drawing conclusions that are necessarily true given the truth of the premises. This is what distinguishes deduction from everything considered so far. Inductive conclusions are probable. Analogical conclusions are suggestive. Causal conclusions are supported. Deductive conclusions are necessitated. This distinction has profound consequences for representational demands. All four structural properties must be fully satisfied. I argue for each in turn and then show that this requirement holds regardless of whether deduction is implemented through mental rules or mental models.

Operability is required because the system must access and manipulate individual components of premises. In a syllogism, subject, predicate, and quantifier must be individually accessible so that the system can identify the middle term connecting the major and minor premises and derive the conclusion. Without operability, the system must treat each premise as an undifferentiated whole and can only respond on the basis of holistic similarity to patterns encountered in prior experience.

Consistency is required because the concepts employed in the premises must maintain the degree of content stability necessary to preserve the identity conditions relevant to validity within the inference. The term "mammals" in the major premise must pick out the same category as in the minor premise. Unlike induction, where approximate stability suffices because the conclusion is probabilistic, deduction requires that content remain stable enough to preserve the truth-conditional relationships on which validity depends.
If semantic content drifts, even slightly, across the steps of an inference, the link between premises and conclusion is severed, because validity is a binary property: an inference either preserves truth from premises to conclusion, or it does not. Structural preservation is required because the inference must maintain the structural relations encoded in the premises. From ”all A are B” and ”all B are C,” the inclusion chain must be preserved intact to yield ”all A are C.” If the system sometimes preserves and sometimes loses this structure, it cannot perform even simple syllogisms reliably. Compositionality is required because the system must distinguish propositions that share parts but differ in structure. ”All dogs are mammals” and ”all mammals are dogs” have the same constituents but support entirely different inferences. Without compositionality, the system cannot distinguish them, and any inference involving directional relations will be unreliable. A crucial feature of this analysis is that it holds regardless of implementational format. I develop this point in detail because it is central to the framework’s claim of implementation neutrality and constitutes one of its most important theoretical contributions. On the mental logic account (Rips, 1994; Braine & O’Brien, 1998), deductive reasoning involves applying inference rules to propositional representations. Consider how each structural property figures in this process. Operability is required at the most basic level: applying a rule such as modus ponens requires identifying the conditional structure of one premise and matching its antecedent against another premise. The system must parse the representation into its component parts before any rule can be applied. Without this capacity, the system cannot even set up the conditions for rule application. Consistency is required because the rules presuppose that terms retain their content across applications. 
When modus ponens takes "if P then Q" and "P" as inputs, the P in the conditional must denote the same proposition as the standalone P. If P shifts in meaning between its two occurrences, the rule is applied to different contents and the inference is unsound, even though its form appears correct. Structural preservation is required because the rules are, by definition, validity-preserving transformations. What makes modus ponens a rule of inference, rather than an arbitrary manipulation, is precisely that it preserves truth from premises to conclusion. This preservation depends on maintaining the structural relationship, namely conditional dependency, that links the premises. Compositionality is required because the rules are sensitive to the structural arrangement of parts. The rule of universal instantiation, for example, must distinguish "all A are B" from "all B are A" in order to substitute correctly. If the system treats these as equivalent because they contain the same terms, the rule produces invalid conclusions. On the mental model account (Johnson-Laird, 1983, 2006, 2010), deductive reasoning involves constructing models of the possibilities compatible with the premises and searching for counterexamples to the putative conclusion. Consider how each structural property figures in this process. Operability is required because the tokens within a model must be individually accessible. To construct a model of "all mammals are warm-blooded," the system must create tokens representing mammals and assign them the property of being warm-blooded. To search for counterexamples, the system must inspect individual tokens and check whether they satisfy the putative conclusion. Without the ability to access individual tokens, the model is opaque and cannot support systematic search. Consistency is required because tokens must maintain their identity across the model and across successive models.
A token representing a whale in the model of the major premise must still represent the same whale when the system integrates the minor premise. If token identity is unstable, the system cannot determine whether the information from different premises applies to the same or different entities, and the model becomes unreliable. Structural preservation is required because the iconicity of the model, the correspondence between the structure of the model and the structure of what it represents, must be maintained throughout the reasoning process. If this correspondence breaks down during model manipulation, the model no longer faithfully represents the possibilities that the premises describe, and conclusions drawn from it are not guaranteed. Compositionality is required because complex models are built from components in a structure-sensitive way. The model of "all dogs are mammals" differs from the model of "all mammals are dogs" in the arrangement of its components, even though the same entity types are involved. Model construction must be sensitive to this arrangement, or the system will construct models that represent the wrong set of possibilities. The convergence of these two historically opposed traditions on the same set of structural requirements is, I submit, strong evidence that these requirements are genuine prerequisites for deduction, not artifacts of any particular theory. Mental logic and mental model theories disagree about the mechanism of deduction, about whether it proceeds through rule application or model construction. But they converge on the functional constraints that the representational system must satisfy for either mechanism to work. This convergence matters because it shows that the four-property framework is not an artifact of rule-based or model-based implementation assumptions. It is a characterization of what deductive reliability demands from any representational system, regardless of how that system is organized.
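This convergence can be illustrated in miniature. The following sketch is my own toy construction, not drawn from either theory; the representations and function names are invented, and real mental-logic and mental-model accounts are far richer. The point is only that a rule-based mechanism and a model-based mechanism both succeed solely by exploiting the same parsed structure:

```python
# Toy versions of the two traditions, for illustration only. Both succeed
# only because premises are parsed into ordered parts ('if', antecedent,
# consequent), i.e., because the structural properties are respected.

from itertools import product

def modus_ponens(premises):
    """Mental-logic style: from ('if', P, Q) and P, derive Q by rule.
    Requires parsing each premise into parts (operability) and matching the
    antecedent exactly across its occurrences (consistency)."""
    derived = set()
    for prem in premises:
        if isinstance(prem, tuple) and prem[0] == "if" and prem[1] in premises:
            derived.add(prem[2])
    return derived

def entails(premises, conclusion, atoms=("P", "Q")):
    """Mental-model style: enumerate the possibilities and search for a
    counterexample, i.e., a model of the premises falsifying the conclusion."""
    def holds(stmt, world):
        if isinstance(stmt, tuple):  # ('if', antecedent, consequent)
            return (not holds(stmt[1], world)) or holds(stmt[2], world)
        return world[stmt]
    for values in product([True, False], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(holds(p, world) for p in premises) and not holds(conclusion, world):
            return False  # counterexample found
    return True

premises = [("if", "P", "Q"), "P"]
assert "Q" in modus_ponens(premises)   # rule application derives Q
assert entails(premises, "Q")          # model search finds no counterexample
# Both reject affirming the consequent: from 'if P then Q' and Q, P does not follow.
assert not entails([("if", "P", "Q"), "Q"], "P")
assert "P" not in modus_ponens([("if", "P", "Q"), "Q"])
```

Collapse the ordered tuple into an unordered bag of tokens and both toy mechanisms fail in the same way, which is the sense in which the two traditions converge on the same structural prerequisites.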
The convergence at the level of structural properties, despite disagreement at the level of mechanism, is what one would expect if the properties capture something real about what deduction demands. It is also what makes the framework implementation-neutral in a substantive sense, not merely in the trivial sense of being vague about implementation, but in the strong sense of identifying constraints that are invariant across the two most developed and empirically tested theories of deductive cognition. What converges is not a preferred mechanism of deduction, but the structural conditions any reliable deductive mechanism must satisfy. Neuroimaging evidence supports this from a different angle. Goel (2007) conducted a meta-analysis of 28 neuroimaging studies and found clear dissociations among the neural substrates of different deductive subtypes. Relational reasoning preferentially engages the right parietal cortex associated with spatial and structural processing. Categorical reasoning engages left frontal regions associated with linguistic and rule-based processing. Goel concluded that deduction does not rely on a unitary brain system. What unites these diverse neural implementations, the present framework suggests, is not a shared mechanism but a shared set of structural requirements that the underlying representations must satisfy.

3.5 Formal Logical Reasoning

Formal logical reasoning requires all four structural properties at their strictest level, plus rule transparency: the rules governing transformations must be explicit, deterministic, and inspectable. In everyday deduction, the inferential process may be partly implicit. In formal logic, every step must be a fully explicit application of a recognized rule. Rips (1994) developed the PSYCOP model implementing natural deduction as a model of human proof construction. Stenning and van Lambalgen (2008) argue that formal reasoning demands both precise interpretation and strict derivation.
Formal reasoning is the limiting case that makes explicit what deduction requires implicitly.

3.6 The Principal Structural Boundary

The analysis reveals a gradient of structural demands: inductive requires the least, then analogical, then causal, then deductive, then formal-logical. Within this gradient, there is a principal structural boundary between causal and deductive reasoning. I call it "principal" rather than "absolute" because I do not claim it is the only possible boundary in the space of reasoning. Complex counterfactual reasoning, for instance, may straddle it. But for the purpose of understanding why reasoning research has produced the confusions identified in Section 1, this is the most important and most neglected boundary. Below this boundary, reasoning types can operate on representations that satisfy the structural properties partially and approximately. The conclusions are probabilistic, suggestive, or supported, and this tolerance for approximation in the conclusion is matched by tolerance for approximation in the underlying representations. Above the boundary, deductive reasoning claims that the conclusion necessarily follows from the premises. This claim of necessity places a correspondingly strict demand: the structural properties must be satisfied to the degree needed to sustain the identity conditions and structural relations on which validity depends. This boundary corresponds to the distinction between ampliative and demonstrative reasoning, one of the most fundamental distinctions in logic and epistemology. It also corresponds, as Section 5 will show, to independently observable discontinuities in developmental sequence, neural recruitment, and AI performance profiles.
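The strictness of the demands above the boundary can be made concrete with a toy encoding of categorical premises. This is an illustrative sketch only; the class and function names are invented here, not drawn from the paper:

```python
# Hypothetical structured encoding of "All S are P" statements, invented for
# illustration. Deduction works only if the parts are individually accessible
# (operability), stable across premises (consistency), and order-sensitive
# (compositionality), and only if the inclusion chain is carried intact
# (structural preservation).

from typing import NamedTuple, Optional

class AllAreB(NamedTuple):
    """'All <subject> are <predicate>' as an ordered pair of terms."""
    subject: str
    predicate: str

def barbara(major: AllAreB, minor: AllAreB) -> Optional[AllAreB]:
    """Barbara syllogism: all A are B, all B are C, therefore all A are C.
    Succeeds only when the middle term links the two premises."""
    # Operability: the subject and predicate slots are individually accessible.
    if minor.predicate == major.subject:
        # Structural preservation: the inclusion chain A -> B -> C stays intact.
        return AllAreB(minor.subject, major.predicate)
    return None

minor = AllAreB("dogs", "mammals")       # all dogs are mammals
major = AllAreB("mammals", "animals")    # all mammals are animals

# Compositionality: same constituents, different arrangement, different premise.
assert minor != AllAreB("mammals", "dogs")
# The middle term "mammals" links the premises and licenses the conclusion.
assert barbara(major, minor) == AllAreB("dogs", "animals")
# Swapping the roles breaks the inclusion chain: no conclusion is licensed.
assert barbara(minor, major) is None
```

A purely associative system that encoded each premise as an unstructured whole could not perform the middle-term check on which the derivation turns, which is the boundary in miniature.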
4 Scaling Without Structural Reorganization Is Insufficient

The principal boundary faces an obvious challenge: perhaps a system that satisfies the four structural properties only approximately, through massive statistical learning, can approximate deductive reasoning well enough that the distinction becomes practically irrelevant. I argue that this is not the case. My claim is specifically an insufficiency claim: scaling statistical learning without structural reorganization of the representational system is insufficient to provide the structural guarantees that deductive reasoning requires. I am not claiming a full logical impossibility. I am not claiming that no system trained through statistical methods could ever perform deduction. I am claiming that statistical approximation alone, without changes to the structural properties of the representational system, cannot secure deductive reliability. Two principal arguments and one defensive supplement support this claim. A critical clarification is needed here, because the term "structural reorganization" invites the question of whether the four properties must be hand-coded or can be acquired through learning. The answer is that structural reorganization is defined functionally, not architecturally. If a system, through any process whatsoever, including reinforcement learning, self-supervised training, or any future training method, comes to possess representations that genuinely satisfy operability, consistency, structural preservation, and compositionality, then it has undergone structural reorganization in the sense intended here, regardless of whether the properties were explicitly engineered or spontaneously emerged. The insufficiency claim is not that learning cannot produce structural properties.
It is that merely scaling up a system whose representations lack these properties, adding more parameters, more data, or more compute without changing what the representations can do, will not cause the properties to appear. The question is always empirical: does the system's representational architecture, after training, actually satisfy the properties? If it does, it has crossed the boundary. If it does not, no amount of further scaling within that architecture will cross it. The evidence reviewed in Section 5 suggests that current large language models, even after extensive reinforcement learning, do not yet satisfy these properties, as indicated by the persistence of content effects, length degradation, and fallacy blindness. But the framework does not rule out the possibility that future training methods might produce genuine structural reorganization. What it rules out is the expectation that scaling alone, without such reorganization, will suffice.

4.1 The Argument from Validity

Validity is a binary property of inferences. An inference is either valid or not. For ampliative reasoning, approximation is tolerable: an inductive inference correct 95% of the time is excellent. For deduction, a system that preserves validity 95% of the time is not performing deduction with a 5% error rate. It is performing a probabilistic simulation that provides no structural guarantee of validity in any particular case. This compounds over multi-step reasoning. If each step has independent probability p of being correct, a chain of n steps has probability p to the nth power. For five steps with p = 0.95, the chain probability is approximately 0.77. For ten steps, 0.60. For twenty, 0.36. Genuine deduction does not degrade with chain length, because each step preserves validity by structural means. Saparov and He (2023) confirmed this formally, showing that chain-of-thought reasoning in language models follows a greedy strategy vulnerable to compounding errors.
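The compounding arithmetic can be checked directly; this sketch simply assumes, as the argument does, that per-step errors are independent:

```python
# Compounding-error arithmetic from the validity argument: with independent
# per-step accuracy p, an n-step chain is fully correct with probability p ** n.

def chain_accuracy(p: float, n: int) -> float:
    """Probability that every step of an n-step reasoning chain is correct."""
    return p ** n

for n in (5, 10, 20):
    print(f"p = 0.95, n = {n:2d}: chain accuracy = {chain_accuracy(0.95, n):.2f}")

# A mechanism that preserves validity structurally has p = 1 at every step,
# so chain_accuracy(1.0, n) stays 1.0 for all n: no decay with chain length.
```

The decay is exponential in n, not linear, which is what separates a statistical approximation of deduction from deduction itself.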
The ICLR 2025 DeduCE evaluation confirmed that the primary source of error is the number of reasoning steps, not input complexity. This is precisely what the framework predicts for systems that approximate the structural properties without fully satisfying them.

4.2 The Argument from Error Patterns

If apparent deductive ability is grounded in statistical patterns rather than structural properties, three predictable failure modes should appear. First, content effects: performance on logically equivalent problems should vary with semantic content, because statistical associations are content-dependent while validity is content-independent. Dasgupta et al. (2024) confirmed this. Second, length degradation: performance should decline with chain length, because each step introduces compounding risk. Wan et al. (2024) and the DeduCE evaluation confirmed this. Third, fallacy blindness: the system should accept plausible but invalid inferences. Wan et al. (2024) found that current models have the lowest accuracy in identifying logical fallacies. These are not random failures. They are the specific failure modes predicted by the framework for systems that partially satisfy the structural properties through statistical learning. Content effects arise from the absence of full structural preservation: the system's response is driven by distributional features of the content rather than structural relations among premises. Length degradation arises from the absence of full consistency and structural preservation: small approximation errors compound across steps. Fallacy blindness arises from the absence of full compositionality: the system cannot reliably distinguish valid from invalid structural arrangements. The three modes map onto the three failure classes introduced in Section 2, providing a direct link between the theoretical framework and observable patterns. These mappings are intended as primary diagnostic correspondences, not as one-to-one exclusive reductions.
Length degradation, for example, may also involve operability limitations under resource constraints. The point is that each error pattern has a primary structural source that the framework identifies, not that each pattern is caused by exactly one missing property. The framework predicts a structured family of dominant vulnerabilities, not a set of isolated single-cause failures.

4.3 Addressing the "But It Sometimes Gets the Right Answer" Objection

The two arguments above constitute the principal grounds for the insufficiency claim. A further objection, however, must be addressed. One might argue that these systems sometimes produce correct deductive conclusions, and ask whether this does not show that the structural properties are approximately satisfied. The competence-performance distinction, foundational in cognitive science, provides the answer. Correct answers do not demonstrate competence to produce them through valid inference. A system may arrive at correct conclusions by pattern retrieval rather than deduction. Saparov et al. (2023) tested language models on out-of-distribution deductive problems and found significant degradation, suggesting dependence on distributional familiarity rather than structural competence. McCoy et al. (2024) showed that reasoning in language models is shaped by probabilistic patterns rather than logical processes. The emergent reasoning in DeepSeek-R1 (DeepSeek-AI, 2025), including chain-of-thought and self-verification, is notable but does not settle the question: sophisticated heuristics for checking outputs are not the same as representations satisfying the four structural properties. This objection is important to address, but the competence-performance distinction is a well-established tool for doing so. The real weight of the insufficiency argument rests on the validity and error-pattern arguments above.

4.4 The Dual-Process Objection

One might object that human deduction is also imperfect.
This objection conflates two claims. The present argument does not claim any system performs deduction perfectly. It claims that reliable deduction requires representations with certain structural properties, and their absence produces predictable failures. The dual-process framework (Evans & Stanovich, 2013) supports this: deduction engages Type 2 (slow, deliberate) processes. Errors occur when Type 1 (fast, associative) processes interfere (Sloman, 1996; Kahneman, 2011). When subjects reason carefully and are trained in logic, performance improves markedly (Nisbett, 1993; Lehman et al., 1988). Human cognition can access representational systems satisfying the properties, but engaging them is effortful. The errors are evidence that the systems are demanding, not that they are unnecessary. The dual-process framework explains when humans succeed and fail at deduction. The present framework explains why: the properties required for deduction can only be provided by resource-intensive Type 2 processing. To be clear, the framework does not identify Type 2 processing with the four properties. It explains why tasks that recruit Type 2 processing are precisely those for which these representational properties must be stably available. The dual-process distinction is at the processing level. The present framework operates at the representational level. The bridge between them is that the representational demands of deduction are too exacting to be met by fast, associative processing. Indeed, the framework gains predictive specificity precisely from the patterns of human deductive failure. The four structural properties are not a description of an ideal logical machine that humans merely approximate. They are a diagnostic tool for identifying which property breaks down when human deduction goes wrong. 
When working memory is overloaded, as in the cognitive load experiments of De Neys (2006), the primary casualty is the ability to maintain term identity across the steps of a reasoning chain. This is consistency failure: the concept picked out by a key term in step one is no longer reliably the same concept in step three, because the resources needed to maintain stable content across steps have been diverted. When content effects occur, as in the belief-bias literature (Evans, Barston, & Pollard, 1983), the primary casualty is the ability to respond to the structural arrangement of premises rather than to their distributional familiarity. This is compositionality failure: the system responds to the plausibility of the content rather than to the logical structure that arranges it. When multi-step deductive chains break down, as documented in studies of conditional and syllogistic reasoning (Markovits & Barrouillet, 2002), the primary casualty is the ability to carry relational structure intact through successive transformations. This is structural preservation failure. These correspondences between documented human error patterns and specific property breakdowns are what one would expect if the four properties are genuine functional prerequisites for deduction, rather than post hoc labels applied to a unitary capacity. The framework does not merely say that humans sometimes fail at deduction. It says which property fails in which condition, and the existing evidence is consistent with these specific predictions.

5 Converging Evidence

The framework generates predictions testable against multiple independent sources. It is important to be explicit about the different levels of support these evidence lines provide. They do not all verify the framework in the same way. AI evidence offers the most direct diagnostic support, because the characteristic error patterns map directly onto the structural diagnosis.
Developmental evidence provides gradient-compatible support: the ordering of reasoning acquisition corresponds to the structural-demand gradient, but does not directly verify the four properties themselves. Neuroscience evidence provides convergent support: different reasoning types recruit different processing resources, consistent with different structural demands, but the evidence does not directly demonstrate the presence or absence of specific structural properties in neural representations. I present these evidence lines in this explicit hierarchy.

5.1 Direct Diagnostic Support: Artificial Intelligence

The framework predicts that systems whose representations partially satisfy the structural properties through statistical learning should perform well below the principal boundary and struggle above it, with failures exhibiting the error patterns identified in Section 4.2. An important clarification is needed here: the present use of AI evidence is diagnostic rather than constitutive. The framework was not inferred from benchmark outcomes. It was derived from an analysis of representational structure and reasoning demands. Benchmark outcomes provide an unusually sharp test bed for the framework's predictions, but they are not its theoretical source. This prediction is confirmed across multiple model generations and benchmark families, suggesting a recurring pattern rather than a feature of any single system. The IJCAI 2025 survey reviewed logical reasoning in large language models across benchmarks spanning 2021 to 2025. The recurring pattern is clear: adequate performance on simple, short-chain problems; degradation with complexity, chain length, and novelty; better performance on propositional than predicate logic, consistent with the greater structural demands of quantificational reasoning.
This pattern has persisted across multiple model generations, from GPT-3 through GPT-4 to reasoning-optimized models, suggesting it reflects a structural feature of these systems rather than a temporary engineering limitation. LogicBench (Parmar et al., 2024) provided systematic assessment of propositional, first-order, and non-monotonic logic. Two findings are particularly diagnostic. First, models struggle with fallacy detection, tending to accept plausible but invalid inferences. This maps directly onto the compositionality-related failure mode, that is, structural indifference. Second, performance is sensitive to the logical form of problems in ways suggesting reliance on surface features, mapping onto the content-effect prediction. The DeduCE evaluation (ICLR 2025) introduced a metric of deductive consistency. The key finding is that the primary error source is maintenance of valid inference across steps, not premise comprehension. This directly confirms the structural-preservation prediction: the systems can process individual premises since they have some degree of operability, but cannot maintain the structural relations through multi-step chains. Saparov and He (2023) demonstrated formally that chain-of-thought reasoning follows a greedy strategy producing compounding errors, confirming the validity argument's prediction about degradation as a function of chain length. Dasgupta et al. (2024) found that language models show content effects more pervasive and less responsive to logical structure than those in humans, consistent with systems whose responses are driven by distributional features rather than structural relations. DeepSeek-R1 (DeepSeek-AI, 2025) is an important test case. Trained through reinforcement learning, it developed emergent self-verification and error correction, with substantially improved benchmark performance.
Yet even this model exhibits the predicted patterns: content sensitivity persists, novel logical structures remain challenging, and deductive consistency degrades with chain length. The framework is not tied to any single model generation. These cases are illustrative because they display recurring patterns already visible across multiple waves of systems. If future systems eliminate these patterns while remaining purely statistical learning systems without structural reorganization, that would constitute evidence against the framework. In summary, AI evidence provides the most direct diagnostic support for the framework because the recurrent error patterns (content effects, length degradation, and fallacy blindness) correspond to the specific failure modes predicted by the structural diagnosis. This evidence does not merely show that current systems struggle with deduction. It shows that they struggle in exactly the ways the framework predicts for systems that partially satisfy the structural properties through statistical learning.

5.2 Gradient-Compatible Support: Human Development

The framework predicts that reasoning types should be acquired in an order corresponding to the structural-demand gradient. This prediction is broadly confirmed, though the developmental evidence supports the gradient ordering rather than directly verifying the four properties. Inductive reasoning is present in infancy. Infants generalize from observed regularities (Mandler, 2004). Analogical reasoning emerges in early childhood, initially limited to simple relational matches (Goswami, 1992). Gentner and Rattermann (1991) documented a relational shift from surface similarity to deeper relational structure. Causal reasoning emerges by age three to four (Gopnik et al., 2004; Schulz, 2012). Deductive reasoning emerges significantly later. Piaget and Inhelder (1958) placed formal operational thinking at approximately age twelve.
Subsequent research shows earlier emergence under favorable conditions (Dias & Harris, 1988; Markovits & Barrouillet, 2002), but the trajectory is clear: deductive competence lags inductive and causal competence by several years, and its development is tied to working memory and executive function (Handley et al., 2004). The ordering matches the structural-demand gradient. If reasoning were a single capacity improving with age, this specific type-differentiated ordering would not be expected. The framework provides a natural explanation: more demanding types require more mature representational systems. I emphasize that this evidence is compatible with the framework rather than directly diagnostic. The developmental ordering does not by itself demonstrate that the four specific structural properties drive the sequence. But the correspondence is what the framework predicts and what alternative accounts do not straightforwardly explain. Developmental evidence thus supports compatibility with the structural-demand gradient, but does not by itself validate the four-property decomposition.

5.3 Convergent Support: Cognitive Neuroscience

Different reasoning types should recruit different neural resources if they impose different structural demands. This is broadly confirmed, though the neuroscience evidence supports the claim that different types recruit different systems rather than directly verifying the four properties. Goel (2007) found clear dissociations among neural substrates of deductive subtypes. Osherson et al. (1998) found distinct brain loci for deductive versus probabilistic reasoning. Goel and Dolan (2004) found differential prefrontal involvement in deductive versus inductive reasoning. De Neys (2006) showed that cognitive load selectively impairs deductive performance.
The broad neuroanatomical pattern is suggestive: prefrontal cortical regions, particularly the left inferior frontal gyrus, are associated with rule-governed, structure-sensitive processing and are preferentially recruited by deductive tasks, while medial temporal lobe structures associated with associative memory and pattern completion are more prominent in inductive and probabilistic reasoning. This dissociation is at least consistent with the framework's claim that deduction draws on representational resources with different structural properties than those supporting associative reasoning, though the mapping between neural systems and the specific structural properties identified here remains indirect. Neuroscientific evidence thus supports heterogeneity in the processing resources recruited by different reasoning forms, but only indirectly bears on the specific representational properties identified here. The data confirm that different reasoning types are not simply more or less demanding versions of the same process, but they do not directly demonstrate the presence or absence of operability, consistency, structural preservation, or compositionality in neural representations.

5.4 Summary

The three evidence lines support the framework at different levels. AI evidence directly confirms the predicted error patterns and their mapping onto structural failure modes. Developmental evidence confirms the predicted ordering of acquisition. Neuroscience evidence confirms that reasoning above the boundary recruits distinctively structured processing. Their convergence across independent domains strengthens the overall case, but the framework should be evaluated in light of the explicit differences in evidential directness.

6 Three Core Testable Predictions

The framework generates many possible predictions. I focus on three that are most directly testable, most distinctive to the framework, and most capable of falsifying it. Prediction 1: Compounding degradation.
Any system that does not fully satisfy the four structural properties will show performance on deductive tasks that degrades as a compounding function of chain length, not merely linearly. If per-step accuracy is p, multi-step accuracy should approximate p to the nth power. This can be tested by systematically varying chain length in deductive tasks administered to both human subjects under conditions impairing Type 2 processing and artificial systems. Existing evidence from Saparov and He (2023) and the DeduCE evaluation is consistent with this prediction, but a dedicated test with precise quantification of the functional form of degradation has not yet been conducted. Prediction 2: Selective vulnerability. If the four properties are genuinely distinct prerequisites for deduction, selectively disrupting each should selectively impair the reasoning types that depend on it. This prediction can be tested through carefully designed experimental manipulations. To disrupt consistency, one could introduce subtle semantic shifts in key terms across premises, for example by using a term like "bank" that shifts slightly in connotation between the major and minor premise of a syllogism, or by embedding premises in contexts that subtly bias different senses of the same word. This should impair deductive performance while leaving inductive generalization relatively intact, because induction does not depend on strict term stability across steps. To disrupt operability, one could present premises in holistic, non-decomposable formats, for example by embedding logical content in complex narrative prose that resists parsing into discrete propositional components. This should impair deduction and formal reasoning but leave causal reasoning partially functional, because causal inference can proceed on the basis of event-level associations without decomposing premises into quantificational structure.
To disrupt structural preservation while leaving other properties intact, one could present problems in which the relational structure must be tracked through multiple intermediate steps that each introduce distracting but irrelevant structural information. To disrupt compositionality, one could present pairs of propositions that share identical content words but differ only in structural arrangement, testing whether the system can reliably distinguish them. This prediction is distinctive to the present framework: no competing account predicts this specific pattern of selective vulnerability tied to these specific structural properties. Prediction 3: Irreducibility under scaling. Increasing the scale of a statistical learning system by adding more parameters, more data, or more compute, should improve performance below the principal boundary but should not eliminate the boundary itself, unless the scaling produces structural reorganization of the representational system. For any system whose representational architecture does not satisfy the structural properties, there should exist deductive tasks of sufficient novelty and chain length at which performance falls to unreliable levels, regardless of scale. By "representational architecture" I mean the set of computational operations and data structures that determine how information is encoded, stored, and transformed. By "structural reorganization" I mean changes that alter these operations and structures in ways that provide the identified properties, not merely changes in the number of parameters or volume of training data. By "unreliable levels" I mean performance that does not significantly exceed what would be expected from a system using distributional heuristics rather than valid inference. This is the framework's strongest and most directly falsifiable prediction.
It is a conditional architectural prediction, not a metaphysical impossibility claim: it states that for architectures that do not acquire the relevant structural properties, increasing scale alone predicts recurrent failure under sufficient novelty and chain length. If a sufficiently scaled system, without structural reorganization in the sense defined, achieves robust generalizable deductive reasoning on entirely novel logical structures of arbitrary chain length, the framework is wrong.

7 Discussion

7.1 What Is Novel About This Framework

Several existing research programs have addressed aspects of the territory covered here. It is important to be explicit about what the present framework adds. Fodor and Pylyshyn (1988) argued that mental representations must have combinatorial structure to account for the systematicity of thought. The present framework shares their emphasis on structural properties but differs in two respects. First, it does not commit to classical symbolic architecture. Second, and more importantly, it differentiates structural demands by reasoning type, explaining why systems with non-classical architectures succeed at some tasks while failing at others, something their original argument cannot explain. Johnson-Laird (1983, 2006, 2010) developed a detailed theory of deductive reasoning through mental models. The framework shows that the structural properties required for deduction are satisfied by mental models, but Johnson-Laird’s theory does not provide a systematic account of why different reasoning types impose different demands, or a specification of the boundary between reasoning that can be approximated and reasoning that cannot. The dual-process tradition (Evans & Stanovich, 2013; Sloman, 1996; Kahneman, 2011) distinguishes fast and slow processing. The present framework provides a representational explanation for this distinction.
AI evaluation research (Mondorf & Plank, 2024; Wan et al., 2024; Parmar et al., 2024) documents patterns of success and failure. The present framework provides a theoretical explanation for these patterns, predicting which tasks will be difficult and why. To my knowledge, no existing account brings together in a single implementation-neutral framework the elements combined here. The framework makes two especially distinctive moves. First, it identifies a minimal set of structural properties characterized not by their theoretical pedigree but by the distinct failure modes each excludes, and maps these properties across multiple reasoning types to reveal a principal structural boundary. Second, it uses that mapping to explain why the observed performance profiles of current AI systems take the form they do, connecting a philosophical analysis of representational structure to a recurring empirical pattern across multiple generations of artificial systems. Other elements of the framework, including the insufficiency argument and the converging evidence from development and neuroscience, support these two central contributions but do not by themselves constitute them.

7.2 Relation to Harman, Stenning, and the Rationality Debate

Harman (1986) argued that ordinary reasoning is not logical deduction but belief revision governed by pragmatic considerations. This challenge is important to address directly, because it raises the question of whether the framework conflates logic as a normative theory with reasoning as a psychological capacity. The present framework does not make this conflation. It does not claim that human reasoning is, or should be, logical deduction. It claims that deductive competence is a real cognitive capacity, one that is engaged under specific conditions and that requires specific representational support.
Most everyday reasoning is, as Harman correctly argues, ampliative: it involves updating beliefs in light of new evidence, weighing considerations, and making pragmatic judgments. None of this requires the structural properties identified here to be fully satisfied. But some reasoning does involve deriving conclusions that are intended to follow necessarily from premises, and when it does, the structural demands change qualitatively. The framework thus occupies a middle ground between Harman’s skepticism about the role of logic in cognition and Rips’s (1994) defense of deduction as central to thought. Both are right about different parts of the reasoning spectrum. Harman is right that most everyday reasoning does not need deductive machinery. Rips is right that deduction is a genuine capacity with specific prerequisites. The structural-demand gradient explains how both claims can be true simultaneously, because they concern different reasoning types with different structural demands. The present framework agrees that ordinary reasoning is not exhausted by logic, but denies that this licenses collapsing the structural difference between demonstrative and associative forms of reasoning.

Stenning and van Lambalgen (2008) argued that reasoning involves two stages: interpretation, in which the reasoner constructs a representation of the problem, and derivation, in which inferential procedures are applied to that representation. This distinction is complementary to the present framework and helps clarify its scope. The structural properties I identify are primarily constraints on what the representational system must provide for the derivation stage to succeed. They specify what the output of interpretation must look like for subsequent inference to be reliable. Stenning and van Lambalgen’s emphasis on interpretation highlights that getting the representation right in the first place is a separate and important challenge.
Failures in deductive reasoning may arise not only because the representational system lacks the structural properties needed for derivation, but also because the interpretation stage produces a representation that does not accurately capture the logical structure of the problem. This distinction is important because it means that satisfying the four structural properties is necessary for reliable deduction, but may not be sufficient if the interpretation stage introduces distortions. The present framework addresses the derivation-stage requirements; a complete account of deductive reasoning would need to address interpretation as well. The fact that reasoning includes an interpretation stage does not diminish the need for structurally adequate representations once derivational reliability is at stake.

7.3 Implications

For artificial intelligence, the framework implies that improving reasoning requires structural innovation, not just scaling. Neurosymbolic approaches (Garcez & Lamb, 2023) can be understood as attempts to provide the identified properties. The framework predicts that approaches directly targeting operability, consistency, structural preservation, and compositionality will yield the most reliable improvements. Lake, Ullman, Tenenbaum, and Gershman (2017) argued that building machines that learn and think like people requires integrating statistical learning with structured representations. The present framework provides theoretical support by specifying which structural properties are needed and for which cognitive functions.

For psychology, the framework provides a representational explanation for the dual-process divide: certain reasoning tasks require deliberate processing because the representational properties they demand cannot be provided by associative processing.
For philosophy of mind, the framework shifts the center of explanatory attention within the format debate, at least for the purposes of reasoning research: what matters most is not whether representations are symbolic or distributed but whether they satisfy the structural properties required for the cognitive functions in question.

7.4 Limitations and Scope

Several limitations should be acknowledged. First, boundaries between reasoning types are not always sharp. Real-world reasoning blends types, and the framework provides a useful idealization. Second, the four properties are necessary conditions, not sufficient ones. Third, the framework is deliberately agnostic about implementation, providing generality but limiting mechanistic specificity. Fourth, the insufficiency argument is strongest for formal logical reasoning and somewhat less decisive for everyday deduction, where ecological scaffolding may partially compensate. Fifth, the framework does not systematically address counterfactual reasoning, moral reasoning, or probabilistic reasoning.

To be explicit about what this paper does and does not claim: it proposes a necessary-condition account of the structural demands that different reasoning types impose on representational systems. It does not propose a complete theory of representation, a cognitive architecture, or a proof of impossibility for statistical systems. It aims to reorganize debates about reasoning, not to close them.

8 Conclusion

Research on reasoning has made important progress, yet a systematic account of how representational structure constrains reasoning capacity has remained missing. This paper proposes that different types of reasoning impose different structural demands, and that there is a principal structural boundary between reasoning types that can operate on associative representations and those that require a fuller set of structural constraints.
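The boundary claim just stated reduces to a simple check: deduction is supported only when all four properties are satisfied. The sketch below is illustrative only; the property names come from the paper, while the set-based encoding is an assumption of this sketch, not the paper’s formalism:

```python
# Illustrative encoding of the principal boundary. The property names are
# the paper's; representing a system as a set of satisfied properties is
# an assumption made for this sketch.
REQUIRED_FOR_DEDUCTION = {
    "operability",
    "consistency",
    "structural preservation",
    "compositionality",
}

def supports_deduction(satisfied_properties: set) -> bool:
    """Deduction requires all four structural properties to be satisfied."""
    return REQUIRED_FOR_DEDUCTION <= satisfied_properties

# A system satisfying only some of the properties falls below the
# boundary, however far it is scaled:
print(supports_deduction({"operability", "consistency"}))
print(supports_deduction(set(REQUIRED_FOR_DEDUCTION)))
```

The point of the encoding is that the question becomes property-by-property and type-by-type, rather than a single yes/no verdict on whether a system “reasons.”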
The four properties identified here, operability, consistency, structural preservation, and compositionality, constitute a minimal functional analysis set, each excluding a distinct class of reasoning failure. The principal boundary between causal and deductive reasoning tracks the distinction between ampliative and demonstrative reasoning, and it is supported by converging evidence, at different levels of directness, from developmental psychology, cognitive neuroscience, and artificial intelligence. Three core predictions distinguish the framework from competing accounts. The framework does not claim that no statistical system can ever reason deductively. It claims that scaling without structural reorganization is insufficient. It does not claim completeness for the four properties. It claims they are the minimal set for distinguishing the structural demands of different reasoning types. And it does not claim to close the debates about reasoning and representation. It claims to reorganize them, by replacing the underspecified question “can this system reason?” with the more productive question “which structural properties does its representational system satisfy, and which types of reasoning do those properties support?” Until this question is asked, and asked separately for each type of reasoning, we will continue to confuse what systems happen to do with what reasoning requires.

References

Braine, M. D. S., & O’Brien, D. P. (1998). Mental logic. Lawrence Erlbaum Associates.
Calvo, P., & Symons, J. (Eds.). (2014). The architecture of cognition: Rethinking Fodor and Pylyshyn’s systematicity challenge. MIT Press.
Chalmers, D. J. (1990). Syntactic transformations on distributed representations. Connection Science, 2(1–2), 53–62.
Dasgupta, I., Lampinen, A. K., Chan, S. C. Y., Sheahan, H. R., Creswell, A., Kumaran, D., McClelland, J. L., & Hill, F. (2024). Language models, like humans, show content effects on reasoning tasks. PNAS Nexus, 3(7), pgae233.
De Neys, W.
(2006). Dual processing in reasoning: Two systems but one reasoner. Psychological Science, 17(5), 428–433.
DeepSeek-AI. (2025). DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948.
Dias, M., & Harris, P. (1988). The effect of make-believe play on deductive reasoning. British Journal of Developmental Psychology, 6(3), 207–221.
Evans, J. St. B. T., Barston, J. L., & Pollard, P. (1983). On the conflict between logic and belief in syllogistic reasoning. Memory & Cognition, 11(3), 295–306.
Evans, J. St. B. T. (2008). Dual-processing accounts of reasoning, judgment, and social cognition. Annual Review of Psychology, 59, 255–278.
Evans, J. St. B. T., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241.
Fodor, J. A. (1975). The language of thought. Harvard University Press.
Fodor, J. A. (1998). Concepts: Where cognitive science went wrong. Oxford University Press.
Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1–2), 3–71.
Garcez, A. d’Avila, & Lamb, L. C. (2023). Neurosymbolic AI: The 3rd wave. Artificial Intelligence Review, 56, 12387–12406.
Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2), 155–170.
Gentner, D., & Rattermann, M. J. (1991). Language and the career of similarity. In S. A. Gelman & J. P. Byrnes (Eds.), Perspectives on language and thought (pp. 225–277). Cambridge University Press.
Gładziejewski, P., & Miłkowski, M. (2017). Structural representations: Causally relevant and different from detectors. Biology and Philosophy, 32(3), 337–355.
Goel, V. (2007). Anatomy of deductive reasoning. Trends in Cognitive Sciences, 11(10), 435–441.
Goel, V., & Dolan, R. J. (2004). Differential involvement of left prefrontal cortex in inductive and deductive reasoning. Cognition, 93(3), B109–B121.
Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111(1), 3–32.
Goswami, U. (1992). Analogical reasoning in children. Lawrence Erlbaum Associates.
Handley, S. J., Capon, A., Beveridge, M., Dennis, I., & Evans, J. St. B. T. (2004). Working memory, inhibitory control, and the development of children’s reasoning. Thinking & Reasoning, 10(2), 175–195.
Harman, G. (1986). Change in view: Principles of reasoning. MIT Press.
Heit, E. (2000). Properties of inductive reasoning. Psychonomic Bulletin & Review, 7(4), 569–592.
Holyoak, K. J., & Thagard, P. (1995). Mental leaps: Analogy in creative thought. MIT Press.
Johnson-Laird, P. N. (1983). Mental models: Towards a cognitive science of language, inference, and consciousness. Harvard University Press.
Johnson-Laird, P. N. (2001). Mental models and deduction. Trends in Cognitive Sciences, 5(10), 434–442.
Johnson-Laird, P. N. (2006). How we reason. Oxford University Press.
Johnson-Laird, P. N. (2010). Mental models and human reasoning. Proceedings of the National Academy of Sciences, 107(43), 18243–18250.
Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.
Knauff, M. (2009). Neuro-cognitive foundations of reasoning. In P. Calvo & J. Symons (Eds.), The Routledge companion to philosophy of psychology (pp. 477–490). Routledge.
Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40, e253.
Lehman, D. R., Lempert, R. O., & Nisbett, R. E. (1988). The effects of graduate training on reasoning. American Psychologist, 43(6), 431–442.
Mandler, J. M. (2004). The foundations of mind: Origins of conceptual thought. Oxford University Press.
Markovits, H., & Barrouillet, P. (2002). The development of conditional reasoning: A mental model account. Developmental Review, 22(1), 5–36.
McCoy, R.
T., Yun, T., Smolensky, P., & Linzen, T. (2024). On the planning abilities of large language models: A critical investigation. arXiv preprint arXiv:2404.00209.
Mondorf, P., & Plank, B. (2024). Beyond accuracy: Evaluating the reasoning behavior of large language models. A survey. arXiv preprint arXiv:2404.01869.
Nisbett, R. E. (Ed.). (1993). Rules for reasoning. Lawrence Erlbaum Associates.
Osherson, D. N., Perani, D., Cappa, S., Schnur, T., Grassi, F., & Fazio, F. (1998). Distinct brain loci in deductive versus probabilistic reasoning. Neuropsychologia, 36(4), 369–376.
Parmar, M., Patel, N., Varshney, N., Nakamura, M., Luo, M., Mashetty, S., Mitra, A., & Baral, C. (2024). Towards systematic evaluation of logical reasoning ability of large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics.
Piaget, J., & Inhelder, B. (1958). The growth of logical thinking from childhood to adolescence. Basic Books.
Piccinini, G. (2020). Neurocognitive mechanisms: Explaining biological cognition. Oxford University Press.
Pylyshyn, Z. W. (1984). Computation and cognition: Toward a foundation for cognitive science. MIT Press.
Quilty-Dunn, J., Porot, N., & Mandelbaum, E. (2023). The best game in town: The re-emergence of the language of thought hypothesis across the cognitive sciences. Behavioral and Brain Sciences, 46, e261.
Rips, L. J. (1994). The psychology of proof: Deductive reasoning in human thinking. MIT Press.
Saparov, A., & He, H. (2023). Language models are greedy reasoners: A systematic formal analysis of chain-of-thought. In The Eleventh International Conference on Learning Representations.
Saparov, A., Pang, R. Y., Padmakumar, V., Joshi, N., Kazemi, M., Kim, N., & He, H. (2023). Testing the general deductive reasoning capacity of large language models using OOD examples. Advances in Neural Information Processing Systems, 36, 3083–3105.
Schulz, L. E. (2012).
The origins of inquiry: Inductive inference and exploration in early childhood. Trends in Cognitive Sciences, 16(7), 382–389.
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119(1), 3–22.
Sloman, S. A. (2005). Causal models: How people think about the world and its alternatives. Oxford University Press.
Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11(1), 1–23.
Stenning, K., & van Lambalgen, M. (2008). Human reasoning and cognitive science. MIT Press.
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022), 1279–1285.
Van Gelder, T. (1990). Compositionality: A connectionist variation on a classical theme. Cognitive Science, 14(3), 355–384.
Wan, Y., Wang, W., Yang, Y., Yuan, Y., Huang, J. T., He, P., Jiao, W., & Lyu, M. R. (2024). A & b == b & a: Triggering logical reasoning failures in large language models. arXiv preprint arXiv:2401.00757.