
Paper deep dive

AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability

Fernando Rosas, Alexander Boyd, Manuel Baltieri

Year: 2025 · Venue: arXiv preprint · Area: Formal/Theoretical · Type: Theoretical · Embeddings: 116

Abstract

Recent work proposes using world models to generate controlled virtual environments in which AI agents can be tested before deployment to ensure their reliability and safety. However, accurate world models often have high computational demands that can severely restrict the scope and depth of such assessments. Inspired by the classic 'brain in a vat' thought experiment, here we investigate ways of simplifying world models that remain agnostic to the AI agent under evaluation. By following principles from computational mechanics, our approach reveals a fundamental trade-off in world model construction between efficiency and interpretability, demonstrating that no single world model can optimise all desirable characteristics. Building on this trade-off, we identify procedures to build world models that either minimise memory requirements, delineate the boundaries of what is learnable, or allow tracking causes of undesirable outcomes. In doing so, this work establishes fundamental limits in world modelling, leading to actionable guidelines that inform core design choices related to effective agent evaluation.

Tags

ai-safety (imported, 100%) · formaltheoretical (suggested, 92%) · interpretability (suggested, 80%) · theoretical (suggested, 88%)

Links

PDF not stored locally; view on the source site.

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 93%

Last extracted: 3/12/2026, 5:32:31 PM

Summary

The paper investigates the fundamental limits of world modeling for AI agent sandboxing and interpretability. It introduces a framework based on computational mechanics and transducers to analyze the trade-offs between computational efficiency, forward interpretability, and reverse interpretability, providing guidelines for designing optimal world models.

Entities (6)

Transducer · computational-structure · 100%
World Model · concept · 100%
Agent Sandboxing · application · 95%
Computational Mechanics · field-of-study · 95%
Forward Interpretability · methodology · 90%
Reverse Interpretability · methodology · 90%

Relation Signals (3)

World Model exhibits-tradeoff Efficiency and Interpretability

confidence 95% · our approach reveals a fundamental trade-off in world model construction between efficiency and interpretability

Transducer implements World Model

confidence 95% · Transducers correspond to world models of anticipation-free interfaces

Reverse Interpretability enables Retrodictive Analysis

confidence 90% · reverse interpretability, which is related to retrodictive analyses that can identify the roots of undesirable outcomes

Cypher Suggestions (2)

Find all concepts related to world modeling and their associated methodologies. · confidence 85% · unvalidated

MATCH (e1:Entity {name: 'World Model'})-[r]->(e2:Entity) RETURN e1, r, e2

Map the relationship between computational structures and their design goals. · confidence 80% · unvalidated

MATCH (s:Structure)-[:ENABLES|IMPLEMENTS]->(g:Goal) RETURN s.name, g.name

Full Text

115,679 characters extracted from source content.


AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability

Fernando E. Rosas, Alexander Boyd, Manuel Baltieri

Keywords: World models, agent sandboxing, POMDPs, AI interpretability, AI safety

Summary

While traditionally conceived as tools for model-based reinforcement learning agents to improve their task performance, recent works have proposed world models as a way to build controlled virtual environments where AI agents can be thoroughly evaluated before deployment. However, the efficacy of these approaches critically relies on the ability of world models to accurately represent real environments, which can result in high computational costs that may substantially restrict testing capabilities. Drawing inspiration from the 'brain in a vat' thought experiment, here we investigate methods to simplify world models that remain agnostic to the agent under evaluation. Our results reveal a fundamental trade-off inherent to the construction of world models related to their efficiency and interpretability. Furthermore, we develop approaches that either minimise memory usage, establish the limits on what is learnable, or enable retrodictive analyses tracking the causes of undesirable outcomes. These results shed light on the fundamental constraints that shape the design space of world modelling for agent sandboxing and interpretability.

Contribution(s)

1. This paper conceptualises and formalises a novel problem: building efficient world models for an operator to sandbox, evaluate, and interpret AI agents before deployment.
Context: Prior work (e.g. (Ha & Schmidhuber, 2018; Hafner et al., 2020)) focuses on world models from the perspective of the agent using them to boost performance, and has not considered this safety-inspired perspective.

2. We introduce generalised transducers based on quasi-probabilities, leading to a more efficient approach to compress world models at the expense of their interpretability.
Context: Generalised transducers are an extension of generalised hidden Markov models, which have been thoroughly studied in previous works (Upper, 1997; Vidyasagar, 2011).

3. We provide a unifying framework to investigate and reason about world models of beliefs, and show that all models that can be calculated by an agent in real time can be bisimulated into a canonical world model known as the ε-transducer.
Context: The minimality of the ε-transducer among prescient rival partitions was proven in (Barnett & Crutchfield, 2015), without investigating links with bisimulation or other concepts from reinforcement learning. Relationships between bisimulation and other computational mechanics constructions were investigated by Zhang et al. (2019).

4. We introduce the notion of reverse interpretability, which is related to retrodictive analyses that can identify the roots of undesirable outcomes.
Context: Standard interpretability approaches assess agents with respect to their capabilities to predict and plan with respect to future events (Nanda et al., 2023; Gurnee & Tegmark, 2023; Shai et al., 2025).

5. We introduce the notion of a reversible transducer, and identify necessary and sufficient conditions for it. We also introduce and explore the notion of retrodictive beliefs.
Context: Retrodictive and reversible hidden Markov models have been investigated by Ellison et al. (2009; 2011).

AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability

Fernando E. Rosas (1-5), Alexander Boyd (1,6), Manuel Baltieri (7,1)
f.rosas@sussex.ac.uk, alecboy@gmail.com, manuel_baltieri@araya.org

1 Department of Informatics, University of Sussex
2 Sussex AI and Sussex Centre for Consciousness Science, University of Sussex
3 Centre for Complexity Science and Center for Psychedelic Research, Department of Brain Sciences, Imperial College London
4 Center for Eudaimonia and Human Flourishing, University of Oxford
5 Principles of Intelligent Behaviour in Biological and Social Systems (PIBBSS)
6 Beyond Institute for Theoretical Science (BITS)
7 Araya Inc.

Abstract

Recent work proposes using world models to generate controlled virtual environments in which AI agents can be tested before deployment to ensure their reliability and safety. However, accurate world models often have high computational demands that can severely restrict the scope and depth of such assessments. Inspired by the classic 'brain in a vat' thought experiment, here we investigate ways of simplifying world models that remain agnostic to the AI agent under evaluation. By following principles from computational mechanics, our approach reveals a fundamental trade-off in world model construction between efficiency and interpretability, demonstrating that no single world model can optimise all desirable characteristics. Building on this trade-off, we identify procedures to build world models that either minimise memory requirements, delineate the boundaries of what is learnable, or allow tracking causes of undesirable outcomes. In doing so, this work establishes fundamental limits in world modelling, leading to actionable guidelines that inform core design choices related to effective agent evaluation.
1 Introduction

Breakthroughs in deep reinforcement learning are progressively enabling AI agents capable of mastering complex tasks across a wide array of domains (Arulkumaran et al., 2017; Wang et al., 2022), and a new generation of agents leveraging large language models (Wang et al., 2024) and large multimodal models (Yin et al., 2024) is expected to drive a new wave of technological innovation with the potential to benefit all sectors of the global economy (Larsen et al., 2024). Alongside these benefits, the proliferation of increasingly advanced autonomous AI systems will also bring important new risks regarding their safety, controllability, and alignment with human values (Bengio et al., 2024; Tang et al., 2024). Given these far-reaching prospects, it is imperative to develop frameworks and methodologies that guarantee the safe and beneficial integration of these technologies into our societies.

One path to pursue AI safety and alignment is to use synthetic world models as sandbox environments to evaluate AI agents without real-world consequences (Dalrymple et al., 2024; Díaz-Rodríguez et al., 2023). These simulated environments are ideal for observing how AI agents handle edge cases and respond to novel situations, potentially revealing safety issues or alignment failures before deployment (He et al., 2024). The efficacy of this approach, however, critically relies on the world model accurately representing relevant aspects of real environments, which is key for guaranteeing that the agent's behaviour in simulation transfers to real-world settings. Thus, a key challenge lies in dealing with the computational demands of high-fidelity simulations, whose costs can impose heavy restrictions on the breadth and depth of safety and reliability assessments. Here we address these issues by investigating the fundamental limits that shape the design of world models.
By bridging concepts from reinforcement learning, control theory, and computational mechanics, we identify a fundamental trade-off between the computational efficiency of a world model and its interpretability. This approach also leads to the distinction between forward and reverse interpretability approaches, where the former characterises the predictive capabilities of agents, and the latter enables retrodictive analyses of the origins of undesirable outcomes. Overall, this work establishes foundational groundwork that leads to actionable guidelines for building world models to study AI agents following different desiderata.

2 Scenario and approach

"Representation and what is represented belong to two completely different worlds."
Hermann von Helmholtz, Handbuch der physiologischen Optik (1867)

Consider the design of a world model to sandbox and test an AI agent (Dalrymple et al., 2024). What should this world model look like? What information should it encode? And for what purpose? To ensure a reliable assessment of AI agent behaviour from simulations to a real-world setting, world models must faithfully reflect the world's structure and dynamics. This could be seen as suggesting that designing reliable world models is critically bounded by a trade-off between accuracy and computational tractability. Interestingly, this trade-off can be partially circumvented by recognising that effective world models only need to incorporate variables that make a difference to the AI's actions, and these variables only require a granularity that is sufficient to simulate their dynamics. To illustrate this idea, consider how one could choose to build a world model to sandbox a simple robot. Although one could in principle design a simulation that includes the quantum dynamics of the whole planet, such a simulation would not only be computationally unfeasible but also unnecessary to answer most questions of interest.
Indeed, such a world model would likely be too spatially extended (by including regions of the planet that are inaccessible to the robot) and have too much resolution (by including quantum effects for a fundamentally classical robot). To avoid this, one could instead design a more parsimonious world model that factors out indistinguishable properties from the robot's perspective, focusing on the agent's 'interface' including, for instance, sensorimotor contingencies (O'Regan & Noë, 2001; Baltieri & Buckley, 2017; 2019; Tschantz et al., 2020; Mannella et al., 2021) or task-relevant information (Zhang et al., 2021). Related questions have been extensively investigated in the philosophy of mind and cognitive (neuro)science literature, and more recently in reinforcement learning. These works suggest an important insight: while an agent's actions turn into outcomes through the mediation of the external world, the agent lacks direct access to the world's 'true nature' and only interacts with it via its inputs and outputs (Clark, 2013; Seth & Tsakiris, 2018). This notion is illustrated by the classical 'brain in a vat' thought experiment, which proposes that if an organism's brain were to be placed in a vat, and a computer used to read the brain's output signals and generate plausible sensory signals, then the brain may not be able to tell that it is in fact in a vat.¹ This thought experiment suggests that an ideal world model should depend only on three elements: the set of possible actions of the agent $\mathcal{A}$, the set of possible outcomes affecting the agent $\mathcal{Y}$,² and the statistical relationship between action sequences and outcomes. In fact, it should be possible — at least in principle — to build a compressed representation of the 'effective world' of an AI agent that cannot be distinguished from a full simulation, irrespective of how smart or powerful the agent may be.
¹ The modern form of this thought experiment is due to Putnam (1981), but it has roots in Descartes' 'evil demon' (Descartes, 1641) and Plato's cave allegory (Plato, 375 BC), while serving as inspiration for popular media such as The Matrix movies.
² The outcome may be a combination of a quantity observable by the agent and a reward signal, so that $\mathcal{Y} = \mathcal{O} \times \mathcal{R}$.

These ideas can be operationalised using principles from computational mechanics (Crutchfield, 1994; 2012), which reveal how observable processes can be generated by multiple data-generating procedures (see App. A for an example). Embracing this multiplicity leads to a perspective that we describe as 'AI in a vat', which posits that designers should not focus on a single world model, but instead should (i) consider the class of all world models that are indistinguishable from the AI agent's perspective, (ii) characterise their properties, and then (iii) choose one depending on specific priorities. After setting some formal foundations in Sec. 3, the remainder of this work explores how this approach reveals key design choices and procedures to construct optimal models related to different ways of using world models to pursue AI safety and alignment (see Figure 1):

• Computational efficiency (Sec. 4): sandboxing agents to evaluate their behaviour or provide formal guarantees about their capabilities (Dalrymple et al., 2024) using a minimal amount of resources.
• Forward interpretability (Sec. 5): building models to study which features are learnable by agents, and how representations are encoded inside them (Shai et al., 2025).
• Reverse interpretability (Sec. 6): deploying models that can be run backward in time to investigate the origins and tipping points that lead to specific — desirable or undesirable — outcomes.
[Figure 1: Recommendations for building optimal world models, including implementations (boxes: world model, reduced model, belief transducer, reverse belief transducer, reversible transducer, generalised transducer, ε-transducer), transformations (arrows: compression via bisimulation, compression via quasi-probabilities, phase space expansion, time reversal), and design criteria (ellipses: computational efficiency, forward interpretability, reverse interpretability).]

3 Generating interfaces via transducers

This section formalises the notions of 'world model' and 'interface'. In the following, uppercase letters (e.g. $X, Y$) denote random variables and lowercase letters (e.g. $x, y$) their realisations; calligraphic letters (e.g. $\mathcal{X}, \mathcal{Y}$) denote the sets over which they take values, and the symbol $\Delta$ (as in $\Delta(\mathcal{X}), \Delta(\mathcal{Y})$) denotes the collection of all distributions over those sets. We use the shorthand notation $p(x|y) = \Pr(X = x \mid Y = y)$ to express probabilities when there is no risk of ambiguity, and assume that equalities of the form $p(x|y,z) = p(x|y)$ hold for all realisations that can take place with non-zero probability. $\mathbb{N} = \{0, 1, 2, \dots\}$ corresponds to zero-based numbering, and we use the following abbreviations: $x_{a:b} = (x_a, \dots, x_b)$, $x_{:b} = x_{0:b}$, $x_{a:} = x_{a:\infty}$, and $x_{:} = x_{0:\infty}$.

3.1 World models

We operationalise interfaces as descriptions of how actions turn into outcomes for a particular agent.

Definition 1. An interface $I(Y|A)$ is a collection of distributions $p(y_{:t} \mid a_{:})$, $t \in \mathbb{N}$, corresponding to a stochastic process over outcome sequences $y_{:} \in \mathcal{Y}^{\mathbb{N}}$ conditioned on action sequences $a_{:} \in \mathcal{A}^{\mathbb{N}}$. An interface is anticipation-free if $p(y_{:t} \mid a_{:}) = p(y_{:t} \mid a_{:t})$ for all $t \in \mathbb{N}$.

Essentially, an interface describes a semi-infinite stochastic process (Kallenberg, 1997; Loomis & Crutchfield, 2023) for each action sequence $a_{:}$ while being agnostic to the agent's computational capabilities, architecture, or internal functioning.
Interfaces can be constructed from an underlying world model that specifies how actions turn into outcomes. Next, we introduce a general notion of a world model in terms of statistical sufficiency (App. B), and use $h_t = (a_t, y_t)$ so that $h_{:t}$ denotes the joint history of the interface up to time $t$.

Definition 2. A world model for an interface $I(Y|A)$ is a collection of distributions $p(s_{:t} \mid h_{:})$, $t \in \mathbb{N}$, corresponding to a stochastic process over state sequences $s_{:} = (s_0, s_1, \dots) \in \mathcal{S}^{\mathbb{N}}$ that satisfies

(1) $p(y_t \mid s_{:t}, a_{:t}, y_{:t-1}) = p(y_t \mid s_t, a_t)$ and (2) $p(s_{:t} \mid h_{:t-1}, a_{t:}) = p(s_{:t} \mid h_{:t-1})$, $\forall t \in \mathbb{N}$.   (1)

Thus, world models are candidate mechanisms for implementing the statistical relationships between actions and outcomes, taking the form of auxiliary processes $S_t$ that encapsulate relevant information between past events and present outcomes (condition 1) while guaranteeing time's arrow, so that future actions cannot affect previous world states or outcomes (condition 2). We may informally denote a world model simply by $S_t$ when it is unambiguous from context. This definition includes models of the 'external world' such as partially observed Markov decision processes (POMDPs), as well as their 'epistemic' counterparts, belief MDPs (Kaelbling et al., 1998), as explained in Sec. 3.3. The unifying property of all world models is presented next (proof in App. C).

Lemma 1. A process $S_t$ is a world model for an anticipation-free interface $I(Y|A)$ if and only if

$p(y_{:t}, s_{:t+1} \mid a_{:}) = p(s_0) \prod_{\tau=0}^{t} p(y_\tau \mid s_\tau, a_\tau)\, p(s_{\tau+1} \mid s_{:\tau}, h_{:\tau}), \quad \forall t \in \mathbb{N}.$   (2)

Thus, world models let us express interfaces in terms of probabilistic graphical models (Koller & Friedman, 2009). Among other things, Eq. (2) can be used to generate outcomes for given sequences $a_{:t}$ and $s_{:t}$ by sampling $p(y_{:t} \mid s_{:t}, a_{:t}) = \prod_{\tau=0}^{t} p(y_\tau \mid s_\tau, a_\tau)$. In this sense, we say that the world model $S_t$ generates the interface $I(Y|A)$, and that the graphical model outlined in Eq. (2) establishes a presentation of the interface.
These ideas are illustrated by an example in App. A.

3.2 Transducers

Sampling sequences of a world model's states can be highly non-trivial if its dynamics are non-Markovian. This issue can be avoided by restricting ourselves to building world models using transducers (Barnett & Crutchfield, 2015), a computational structure that is introduced next.

Definition 3. A transducer is a tuple $(\mathcal{S}, \mathcal{A}, \mathcal{Y}, \kappa, p)$, where $\mathcal{S}$ is a set of memory states, $\mathcal{A}$ and $\mathcal{Y}$ are sets of inputs and outputs, $\kappa: \mathcal{A} \times \mathcal{S} \times \mathbb{N} \to \Delta(\mathcal{Y} \times \mathcal{S})$ is a Markov kernel of the form $\kappa_\tau(y, \tilde{s} \mid a, s)$ with $s, \tilde{s} \in \mathcal{S}$, $y \in \mathcal{Y}$, $a \in \mathcal{A}$, $\tau \in \mathbb{N}$, and $p \in \Delta(\mathcal{S})$ is an initial distribution over states.

If a transducer's memory can only take $|\mathcal{S}| = n$ different states, then its transitions can be described via substochastic matrices $T^{(y|a)}_\tau$ of the form

$T^{(y|a)}_\tau := \sum_{i=1}^{n} \sum_{j=1}^{n} \kappa_\tau(y, s_i \mid a, s_j)\, e_i e_j^\top,$   (3)

where $e_k$ is a binary vector with a 1 at the $k$-th position and zeros elsewhere. These transducers are also known as stochastic automata (Claus, 1971; Cakir et al., 2021), generalising deterministic automata (Minsky, 1967) by using stochastic transitions to generate outputs and update their state. Running a transducer generates an interface $I(Y|A)$ given by its inputs and outputs according to

$p(y_{:t}, s_{:t+1} \mid a_{:}) = p(s_0) \prod_{\tau=0}^{t} \kappa_\tau(y_\tau, s_{\tau+1} \mid a_\tau, s_\tau),$   (4)

providing a graphical model that can be used to simulate the interface (see Figure 2). Comparing Eq. (4) with Lemma 1 lets us formalise this fact as follows.

Lemma 2. Transducers correspond to world models of anticipation-free interfaces whose dynamics satisfy the Markov condition $p(s_{\tau+1} \mid s_{:\tau}, h_{:\tau}) = p(s_{\tau+1} \mid s_\tau, h_\tau)$ for all $\tau \in \mathbb{N}$.

We may denote a transducer informally as $(S_t, A_t, Y_t)$ when it is unambiguous from the context, and describe its memory state $S_t$ as a world model when appropriate. Complementary characterisations of transducers in terms of sufficient statistics and information properties are provided in App. D.

[Figure 2: Illustration of an interface (left) and a possible unravelling of it via a presentation with a world model built from the memory states of a transducer (right), as given by Eq. (4).]

Transducers can be seen as reflecting the memory structure of interfaces. In particular, an interface $I(Y|A)$ is said to be memoryless if $S_t = 0$ is a valid transducer, and is fully observable if $S_t = Y_t$ yields a valid transducer — including Markov decision processes (MDPs) as a main example. We next characterise the statistics of such interfaces, which clarifies that elaborate world models are required by interfaces with non-Markovian dynamics (proof in App. E).

Lemma 3. An interface is fully observable if and only if $p(y_{t+1} \mid y_{:t}, a_{:}) = p(y_{t+1} \mid y_t, a_t)$, and is memoryless if and only if $p(y_{:t} \mid a_{:}) = \prod_{\tau=0}^{t} p(y_\tau \mid a_\tau)$.

Note that if $p(y_{:t} \mid a_{:}) = p(y_{:t})$, corresponding to 'contemplative' or passive agents that do not act but only sense, then transducers reduce to hidden Markov models (Ephraim & Merhav, 2002).

3.3 General classes of transducers

Transducers use their kernels $\kappa_\tau$ to generate $(s_{t+1}, y_t)$ jointly from $(s_t, a_t)$, corresponding to what has been described as 'Mealy' machines (Virgo, 2023; Bonchi et al., 2024). Simpler computational structures can be obtained by imposing constraints on the kernel as follows:

• Input-Moore transducers generate outputs ignoring the current input, corresponding to kernels of the form $\kappa_\tau(y, \tilde{s} \mid a, s) = \mu_\tau(y \mid s)\, \nu_\tau(\tilde{s} \mid y, a, s)$.
• Output-Moore transducers update their state without considering the current output, corresponding to kernels of the form $\kappa_\tau(y, \tilde{s} \mid a, s) = \mu_\tau(y \mid a, s)\, \nu_\tau(\tilde{s} \mid a, s)$.
• I-O Moore transducers satisfy both previous conditions, corresponding to kernels of the form $\kappa_\tau(y, \tilde{s} \mid a, s) = \mu_\tau(y \mid s)\, \nu_\tau(\tilde{s} \mid a, s)$.
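The dynamics of Eq. (4) are straightforward to simulate for a finite transducer: at each step, $(y_t, s_{t+1})$ is drawn jointly from $\kappa(\cdot, \cdot \mid a_t, s_t)$. Below is a minimal sketch (not from the paper; the two-state kernel and all names are illustrative, assuming a time-homogeneous transducer):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_interface(kappa, p0, actions):
    """Run a finite transducer as in Eq. (4): sample (y_t, s_{t+1}) jointly
    from kappa[a][s], a |Y| x |S| array of probabilities kappa(y, s' | a, s)."""
    n_states = len(p0)
    s = rng.choice(n_states, p=p0)          # s_0 ~ p
    outputs = []
    for a in actions:
        joint = kappa[a][s]                  # shape (|Y|, |S|), sums to 1
        flat = joint.ravel()
        idx = rng.choice(flat.size, p=flat)  # one draw over the joint (y, s')
        y, s = np.unravel_index(idx, joint.shape)
        outputs.append(int(y))
    return outputs

# Hypothetical toy kernel: 2 states, 2 actions, 2 outputs.
# Each kappa[a][s] must sum to 1 over all (y, s') pairs.
kappa = {
    0: [np.array([[0.5, 0.0], [0.0, 0.5]]),   # from state 0 under action 0
        np.array([[0.0, 0.5], [0.5, 0.0]])],  # from state 1 under action 0
    1: [np.array([[0.9, 0.1], [0.0, 0.0]]),
        np.array([[0.0, 0.0], [0.1, 0.9]])],
}
p0 = np.array([1.0, 0.0])
ys = sample_interface(kappa, p0, actions=[0, 1, 0, 1])
```

Repeated runs of `sample_interface` under different action sequences give samples from the interface $I(Y|A)$ that this transducer presents.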
Based on ideas from automata theory (Lee & Seshia, 2017), input-Moore transducers serve as models for interfaces that satisfy the stronger anticipation-free condition $p(y_{:t} \mid a_{:}) = p(y_{:t} \mid a_{:t-1})$, corresponding to scenarios where $Y_t$ takes place before $A_t$, thus reflecting a time-indexing convention. In contrast, output-Moore transducers build on the hidden Markov model literature (Riechers, 2016), and are used to represent (non-quantum) physical processes whose evolution is not affected by the observations made by the agent — as opposed to epistemic processes such as belief MDPs, in which the opposite happens (see Sec. 5). POMDPs can be shown to be examples of either output-Moore or I-O Moore transducers depending on the specific definition, as explained in App. F.

4 Minimal world models

After setting the formal foundations of world models, and establishing transducers as a natural way to construct them, we now investigate how to build minimal world models.

4.1 Reducing world models

We begin by showing that all interfaces have at least one transducer presentation, and hence one can focus on transducers without loss of generality (see the proof in App. G).

Lemma 4. The world model $S_t = H_{:t-1}$ yields a transducer that generates the interface $I(Y|A)$.

This world model is far from parsimonious, resembling Borges' character Funes the Memorious in its inability to forget. This suggests the importance of 'reducing' world models. To formalise this, we extend the notion of MDP homomorphism (Ravindran & Barto, 2003) to transducers as follows.

Definition 4. A homomorphism between transducers $(S_t, A_t, Y_t)$ and $(S'_t, A'_t, Y'_t)$ is given by the mappings $\langle \varphi: \mathcal{S} \to \mathcal{S}', f: \mathcal{Y} \to \mathcal{Y}', g: \mathcal{A} \to \mathcal{A}' \rangle$ satisfying two compatibility conditions:

(i) $\Pr(Y'_t = y' \mid S'_t = \varphi(s), A'_t = g(a)) = \sum_{y \in f^{-1}(y')} \Pr(Y_t = y \mid S_t = s, A_t = a)$;
(ii) $\Pr(S'_{t+1} = s' \mid S'_t = \varphi(s), H'_t = (f(y), g(a))) = \sum_{\bar{s} \in \varphi^{-1}(s')} \Pr(S_{t+1} = \bar{s} \mid S_t = s, H_t = (y, a))$, and $\Pr(S'_0 = s') = \sum_{s \in \varphi^{-1}(s')} \Pr(S_0 = s)$.
A reduction of a world model $S_t$ into a world model $S'_t$ is a homomorphism $\langle \varphi, f, g \rangle$ between the transducers $(S_t, A_t, Y_t)$ and $(S'_t, A_t, Y_t)$, where $f: \mathcal{Y} \to \mathcal{Y}$ and $g: \mathcal{A} \to \mathcal{A}$ are identity mappings and $\varphi: \mathcal{S} \to \mathcal{S}'$ is a surjective map. Two transducers are isomorphic if they are reductions of each other, and a transducer is minimal if all its reductions are isomorphic to itself.

A world model reduction can be informally described as a coarse-graining $\varphi$ between the memory states of two transducers of the same interface. Condition (i) above ensures that outcomes are generated with the same statistics, and condition (ii) that the resulting world model is still Markovian — as can be confirmed by relating it to the notion of 'lumpability' of Markov chains (Tian & Kannan, 2006). These properties guarantee that reductions do not distort the corresponding interface (proof in App. H).

Lemma 5. A world model reduction yields a transducer presentation of the original transducer.

The next two sections study different approaches to look for minimal world models.³

4.2 Reduction via bisimulation

A natural way to reduce a world model is via the notion of bisimulation, which is a way of merging states that behave equivalently (Givan et al., 2003). Here we leverage previous work on bisimulations for hidden Markov models (Jansen et al., 2012) to define bisimulations of transducers.

Definition 5. For a given transducer with world model $S_t$ and kernel $\kappa_t$, a bisimulation is an equivalence relation $B_t \subseteq \mathcal{S} \times \mathcal{S}$ such that $s, s' \in \mathcal{S}$ are equivalent if they satisfy two conditions:

(i) $p_t(y \mid s, a) = p_t(y \mid s', a)$, where $p_t(y \mid s, a) = \sum_{s'' \in \mathcal{S}} \kappa_t(y, s'' \mid s, a)$;
(ii) $p_t(C \mid s, a) = p_t(C \mid s', a)$ for all equivalence classes $C \subseteq \mathcal{S}$, where $p_t(C \mid s, a) = \sum_{y \in \mathcal{Y}} \sum_{s'' \in C} \kappa_t(y, s'' \mid s, a)$.

There is a direct correspondence between world model reductions (Def. 4) and bisimulations, as shown next by adapting (Taylor et al., 2008, Theorem 3) to our setup (proof in App. I).
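The two conditions of Definition 5 suggest a standard partition-refinement procedure for computing the coarsest bisimulation of a finite, time-homogeneous transducer: repeatedly split blocks of states until every pair in a block agrees on its output probabilities and on its transition mass into each block. The sketch below is an illustration under those assumptions (toy kernel and function names are hypothetical, not the paper's algorithm):

```python
import numpy as np

def bisimulation_partition(kappa, n_states, actions, outputs, tol=1e-9):
    """Coarsest bisimulation of a finite transducer via partition refinement.
    kappa[a][s] is a (|Y|, |S|) array giving kappa(y, s' | a, s)."""
    partition = [set(range(n_states))]
    while True:
        def signature(s):
            # Condition (i): p(y|s,a); condition (ii): mass into each block.
            sig = []
            for a in actions:
                for y in outputs:
                    sig.append(round(kappa[a][s][y].sum() / tol) * tol)
                for block in partition:
                    mass = sum(kappa[a][s][y][s2] for y in outputs for s2 in block)
                    sig.append(round(mass / tol) * tol)
            return tuple(sig)
        new_partition = []
        for block in partition:
            groups = {}
            for s in block:
                groups.setdefault(signature(s), set()).add(s)
            new_partition.extend(groups.values())
        if len(new_partition) == len(partition):   # no block was split: stable
            return new_partition
        partition = new_partition

# Toy 3-state example where states 1 and 2 behave identically.
kappa = {0: [np.array([[0.0, 0.5, 0.5], [0.0, 0.0, 0.0]]),   # state 0: y=0, then 1 or 2
             np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),   # state 1: y=1, stays
             np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])]}  # state 2: y=1, stays
blocks = bisimulation_partition(kappa, 3, actions=[0], outputs=[0, 1])
```

Coarse-graining all states within each returned block implements the reduction of Proposition 1's correspondence for this toy case.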
Proposition 1. $\varphi: \mathcal{S} \to \mathcal{S}'$ is a reduction of world models if and only if the equivalence relation it induces, with equivalence classes given by $\varphi^{-1}(s') = \{s \in \mathcal{S} : \varphi(s) = s'\}$, is a bisimulation.

Together with Lemma 5, this result confirms that the bisimulation of a transducer yields another transducer presentation for the same interface. This has a simple and yet powerful implication: a full reduction of a given transducer can be attained by coarse-graining all bisimilar states.

There may be cases where bisimulations do not produce the most efficient world model that generates a given interface, since reducing a particular transducer usually does not lead to a global minimum. To investigate this claim, we consider a world model with $|\mathcal{S}| = n$ states and build vectors $w(h_{:t}) \in \mathbb{R}^n$ containing the probabilities of generating $y_{:t}$ given $a_{:t}$ when starting from different world states, so that the $k$-th coordinate is $[w(h_{:t})]_k = \Pr(Y_{:t} = y_{:t} \mid A_{:t} = a_{:t}, S_0 = s_k)$. Intuitively, if the vectors $w(h_{:t})$ for different $h_{:t}$ are linearly dependent, some of their dimensions (and, hence, the corresponding world states) are still in some sense redundant. Crucially, the coarse-grainings associated with bisimulation can only lump states that have identical components, but cannot reduce linear dependencies between states more generally. Relaxing the criteria for merging states (e.g. via bisimulation metrics (Ferns & Precup, 2014)) does not solve this issue, as this introduces distortion in the interface due to the imprecisions allowed in the state-merging procedure.

³ Minimality can also be studied via the entropy of the world's dynamics. Interestingly, minimal-entropy models may not coincide with the models with the fewest states — although the two coincide for predictive models (Loomis & Crutchfield, 2019).

[Figure 3: Illustration of the minimisation of world models. Purple boxes represent reducible models and orange boxes represent minimal ones; arrows correspond to reductions. Red boxes are generalised models following quasi-probabilities, which (if allowed) establish global minima.]

These ideas can be made concrete by studying the canonical dimension of a transducer $T$, given by⁴

$d(T) := \lim_{m \to \infty} \dim(U_m), \quad \text{where } U_m = \operatorname{Span}\{ w(h_{:t}) : t \leq m \} \subseteq \mathbb{R}^n.$   (5)

The canonical dimension is an important indicator of the compressibility of a transducer, as shown next; the proof can be found in (Cakir et al., 2021, Cor. 4.9) — also see (Ito et al., 1992; Balasubramanian, 1993) for related results on hidden Markov models.

Theorem 1. If $T$ is a transducer with $|\mathcal{S}| = n \in \mathbb{N}$, then $d(T) = n$ implies that there are no transducers with fewer memory states that can generate the same interface.

The minimal bisimulation of a transducer $\hat{T}$ with world states in $\hat{\mathcal{S}}$ could still exhibit $d(\hat{T}) < |\hat{\mathcal{S}}|$. In fact, there are interfaces for which no transducer reaches $d(T) = |\mathcal{S}|$. Even if there exists a transducer with $d(\hat{T}) = |\hat{\mathcal{S}}|$, we are not aware of any general algorithm that can directly build it.⁵

4.3 Reduction via pseudo-probabilities and generalised transducers

This section focuses on the reduction of world models with a finite number of states $|\mathcal{S}| = n$ but $d(T) < n$. As discussed in Sec. 3.2, the probabilities of $y_{:t}$ given $a_{:t}$ can then be calculated as

$p(y_{:t} \mid a_{:t}) = \mathbf{1}^\top \cdot \prod_{\tau=0}^{t} T^{(y_\tau | a_\tau)}_\tau \cdot p,$   (6)

where $\mathbf{1}^\top$ is a transposed vector with $n$ ones as components. Normally, the substochastic matrices $T^{(y|a)}_t$ and the initial distribution $p$ are assumed to contain only non-negative terms. However, a more general class of transducers can be explored by removing this constraint and considering quasi-distributions $v \in \mathbb{R}^n$, which may have negative components but still satisfy $\sum_{i=1}^{n} v_i = 1$, and quasi-stochastic matrices whose columns are quasi-distributions (Balasubramanian, 1993; Upper, 1997). This leads to the following generalisation of a transducer.
Definition 6. A generalised transducer for an interface $I(Y|A)$ is a tuple $(\mathcal{S}, \mathcal{A}, \mathcal{Y}, A^{(y|a)}, v, u)$ with $u, v \in \mathbb{R}^n$ and $A^{(y|a)} \in \mathbb{R}^{n \times n}$ that satisfy

$p(y_{:t} \mid a_{:t}) = u^\top \cdot \prod_{\tau=0}^{t} A^{(y_\tau | a_\tau)}_\tau \cdot v \quad \forall y_{:t} \in \mathcal{Y}^{t+1},\ a_{:t} \in \mathcal{A}^{t+1}.$   (7)

Generalised transducers are useful because, in contrast to standard transducers (or POMDPs), they can always be reduced to find representations with a minimal number of states, as shown next.

⁴ If a transducer has $|\mathcal{S}| = n$ memory states, then $\lim_{m \to \infty} \dim(U_m) = \dim(U_{n-1})$ (Cakir et al., 2021, Prop. 4.3).
⁵ In fact, the relatively simpler case of reducing hidden Markov models is still not fully solved (Vidyasagar, 2011), although algorithms that can address some cases have been developed (Huang et al., 2015; Ohta, 2021).

Theorem 2. A generalised transducer $\tilde{T}$ with $d(\tilde{T}) < n$ can always be reduced via a linear transformation into another transducer that generates the same interface using fewer states.

This result follows directly from the proofs provided in (Balasubramanian, 1993, Ch. 3), and related results can be found in (Upper, 1997; Vidyasagar, 2011), in which reductions correspond to linear projections. Notably, these proofs lead to practical algorithms that can be used to efficiently reduce transducers with $d(\tilde{T}) < n$ (see App. J). In this way, generalised transducers achieve a minimal computational complexity at the cost of introducing an opaque world model whose trajectories cannot be sampled (due to the quasi-probabilities), which results in a substantial lack of interpretability.

5 Forward interpretability via epistemic world models

The previous section shows how computational efficiency can be achieved by either compressing memory state spaces with bisimulations or by allowing memory states of transducers to be encoded by quasi-probabilities. While the latter generally yields higher efficiency, this comes at the cost of making those reduced world models highly uninterpretable due to the possible presence of negative probabilities.
This section takes a different route by investigating specific types of world models that focus on interpretability, bringing insights about what agents can learn.

5.1 Beliefs as world models

Let us start by reviewing properties of certain classes of world models that make them learnable by agents in real time. A world model S_t is predictive if it contains only past information, which in information-theoretic terms corresponds to I(S_t; Y_{t:} | H_{:t−1}, A_{t:}) = 0. A world model is observable if S_{t+1} = f_t(H_{:t}), i.e. if it can be estimated from the action-output history via mappings f_t. Finally, a world model is unifilar if S_{t+1} = f̂_t(S_t, A_t, Y_t), so its state can be deterministically updated given inputs and outputs. Thus, observable models are always predictive, and unifilar models are observable if there is no randomness in the world's initial condition. Moreover, unifilar models correspond to transducers whose kernels have the form κ_τ(y, s̃ | a, s) = δ_{s̃, f̂_τ(y,a,s)} μ_τ(y | a, s).

The literature contains several procedures for building observable world models from non-observable ones (see (Subramanian et al., 2022; Ni et al., 2024) for general reviews and (Virgo et al., 2021; Biehl & Virgo, 2022; Virgo, 2023) for a categorical formulation). These approaches suggest expanding the phase space of world models from elements of S to distributions over them, Δ(S), henceforth called belief states. This idea has been extensively studied for POMDPs via the notion of belief MDP (Kaelbling et al., 1998). We extend these ideas to more general transducers.

Definition 7. A belief transducer over a transducer (S, A, Y, κ, p) is another transducer (B, A, Y, κ̂, δ_{b_0}) where B ⊆ Δ(S) is a set of belief states, κ̂ : A × B × ℕ → Δ(Y × B) is a Markov kernel of the form κ̂_τ(y, b′ | a, b), with b, b′ ∈ B, a ∈ A, y ∈ Y, τ ∈ ℕ, such that κ̂ = F(κ) for some functional F, and b_0 ∈ B is an initial belief. A belief transducer is said to be faithful if it generates the same interface as the original transducer.
A natural way to define beliefs about an underlying world model S_t is via predictive Bayesian beliefs, corresponding to the posterior distribution

    b_t(s_t) := Pr(S_t = s_t | H_{:t−1} = h_{:t−1}).

The dynamics of the updates of such beliefs are described by Bayesian prediction (Jazwinski, 1970; Särkkä & Svensson, 2023), and their properties have been further studied under the name of 'mixed states' in computational mechanics (Riechers & Crutchfield, 2018; Jurgens & Crutchfield, 2021a). Building on this literature, we show that the predictive Bayesian beliefs of the memory states of transducers are unifilar and can be used to generate the same interface (proof in App. K).

Proposition 2. If (S_t, A_t, Y_t) is a transducer and B_t is the predictive Bayesian belief of S_t, then (B_t, A_t, Y_t) is a unifilar belief transducer whose state dynamics are given by

    b_{t+1}(s_{t+1}) = (1/Z) ∑_{s_t} p(y_t, s_{t+1} | a_t, s_t) b_t(s_t),    (8)

with Z a normalisation constant. Moreover, (B_t, A_t, Y_t) is faithful if b_0 = p(s_0).

Interestingly, I-O Moore transducers (Sec. 3.3) also allow for postdictive Bayesian beliefs⁶ of the form

    d_t(s_t) := Pr(S_t = s_t | G_{:t} = g_{:t})  with  g_t = (y_t, a_{t−1}),

in which Y_t is used to infer S_t. Our next result explains how predictive and postdictive Bayesian beliefs and MSPs relate, and how they set the basis for Bayesian and Kalman filtering (proof in App. L).

Proposition 3. If (S_t, A_t, Y_t) is an I-O Moore transducer and D_t is the postdictive Bayesian belief of S_t, then (D_t, A_t, Y_t) is a belief transducer whose state dynamics are given by

    d_{t+1}(s_{t+1}) = (p(y_{t+1} | s_{t+1}) / Z′) ∑_{s_t} p(s_{t+1} | s_t, a_t) d_t(s_t),    (9)

with Z′ a normalisation constant. Moreover, (D_t, A_t, Y_t) is faithful if d_0 = p(s_0).

Corollary 1. The predictive and postdictive Bayesian beliefs of I-O Moore transducers can be updated as b_{t−1} →(predict) d_t →(update) b_t, following the 'predict-update' process from Bayesian filtering.
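The unifilar structure of the update in Eq. (8) can be made concrete in a few lines: given (a_t, y_t), the next belief is obtained deterministically by a matrix-vector product and a normalisation. The kernel below is a hypothetical 2-state example chosen only for illustration.

```python
import numpy as np

# Minimal sketch of the predictive Bayesian belief update of Eq. (8):
#   b_{t+1}(s_{t+1}) = (1/Z) Σ_{s_t} p(y_t, s_{t+1} | a_t, s_t) b_t(s_t).
# K[a][y][s_next, s] encodes the kernel κ(y, s_next | a, s); the entries
# are illustrative, with Σ_y K[a][y] column-stochastic for each action.
K = {
    0: {0: np.array([[0.5, 0.0], [0.0, 0.2]]),
        1: np.array([[0.1, 0.3], [0.4, 0.5]])},
}

def belief_update(b, a, y, K):
    """One unifilar step: given (a_t, y_t), the next belief is determined."""
    unnorm = K[a][y] @ b   # Σ_{s_t} κ(y, s_{t+1} | a, s_t) b(s_t)
    Z = unnorm.sum()       # Z = p(y_t | a_t) under the current belief
    return unnorm / Z, Z

b0 = np.array([0.5, 0.5])
b1, py = belief_update(b0, a=0, y=1, K=K)
print(b1, py)  # b1 sums to 1; py is the one-step output probability
```

The returned normalisation constant Z doubles as the one-step output probability, which is how a faithful belief transducer reproduces the original interface while only tracking beliefs.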
A faithful belief transducer can be said to provide a mixed-state presentation (MSP) of the underlying transducer, extending previous work on MSPs of hidden Markov models (Jurgens & Crutchfield, 2021b;a). MSPs are generally not minimal, as they tend to have distinct mixed states that are bisimilar. Their reduction is studied in the next section.

5.2 Minimal predictive world models

Following Barnett & Crutchfield (2015), we now present a method to build an observable world model directly from an interface I(Y|A), without the need to bootstrap from another world model. For this, consider an equivalence relation between histories in which h_{:t−1} ∼_ε h′_{:t−1} when

    p(y_{t:t+L} | h_{:t−1}, a_{t:t+L}) = p(y_{t:t+L} | h′_{:t−1}, a_{t:t+L}),  ∀ y_{t:t+L}, a_{t:t+L}, L ∈ ℕ.    (10)

Let us denote by ε_t the coarse-graining mapping that assigns each history to its corresponding equivalence class, ε_t(h_{:t−1}) = [h_{:t−1}]_{∼_ε}, and define M_t = ε_t(H_{:t−1}). This construction is known as predictive state representations (Littman & Sutton, 2001; Singh et al., 2004) and instrumental states (Kosoy, 2019), and is based on older ideas for stochastic processes (without inputs/actions) from computational mechanics (Crutchfield & Young, 1989). Interestingly, the equivalence classes induced by ε are the minimal bisimulation of the world model S_t = H_{:t−1} (Lemma 4), and therefore serve as memory states of a transducer that generates the original interface (first shown in (Barnett & Crutchfield, 2015, Prop. 2); alternative proof in App. M).

Proposition 4. (M_t, A_t, Y_t) with M_t = ε_t(H_{:t−1}) is a transducer presentation for I(Y|A).

The transducer with memory states given by M_t = ε_t(H_{:t−1}) resulting from Prop. 4 is known as the ε-transducer of the interface I(Y|A), and is unique up to isomorphism. The link between computational mechanics and other approaches such as predictive state representations was first noticed by Zhang et al.
(2019), which explored it using a different computational structure instead of transducers. A salient feature of these approaches is that they can provide observable world models over fewer states than other methods (Littman & Sutton, 2001). Our next result strengthens this intuition by proving that the ε-transducer yields the most efficient predictive world model possible (proof in App. N), which takes inspiration from and extends (Barnett & Crutchfield, 2015, Lemma 1).

Theorem 3. If R_t is a predictive world model of a transducer, then its minimal bisimulation is isomorphic to the ε-transducer.

Corollary 2. The ε-transducer is the minimal predictive model that generates a given interface.

These results reveal, for instance, that the predictive beliefs of all world models converge, via bisimulation, to the memory states of the ε-transducer. In effect, while bisimulations of arbitrary transducers may not fully reduce world models (Sec. 4.2), bisimulations of predictive transducers necessarily do so (see App. O). An analogous result can be derived for postdictive beliefs and world models, which are reduced into a 'time-shifted' ε-transducer. This will be developed in a future publication.

⁶ In general, postdictive world models S_t satisfy I(S_t; Y_{t+1:} | H_{:t}, A_{t+1:}) = 0.

6 Reverse interpretability via retrodictive world models

The results of the last section show that the ε-transducer is a universal construction that distils the information that is relevant for predicting future events, which can be used to evaluate the extent to which agents can learn through a given interface. However, prediction alone does not exhaust the possible knowledge-driven activities that can involve an agent. This section investigates reversible and retrodictive world models, exploring new opportunities for agent interpretability.

6.1 Reversible transducers

The kernel of a transducer is usually used to update a world model S_t from s_τ to s_{τ+1}.
Interestingly, some transducers can be used to run things 'backwards', so that the world state can be updated from s_{τ+1} to s_τ while generating the same interface. This is formalised by the next definition.⁷

Definition 8. A reversible transducer is a transducer (S, A, Y, κ, p) together with an additional Markov kernel κ^R of the form κ^R_t(y, s′ | a, s), with a ∈ A, y ∈ Y, s, s′ ∈ S, t ∈ ℕ, such that

    p(y_{:t}, s_{:t+1} | a_{:t}) = p(s_0) ∏_{τ=0}^{t} κ_τ(y_τ, s_{τ+1} | s_τ, a_τ) = p(s_{t+1} | a_{:t}) ∏_{τ=0}^{t} κ^R_τ(y_τ, s_τ | a_τ, s_{τ+1}).    (11)

A reversible transducer can be run in reverse to produce the same interface. This can be used to analyse prior events that resulted in an undesirable world state s*_{t+1} after an agent executed actions a_{:t}. This can be investigated via the distribution

    p(s_{:t}, y_{:t} | a_{:t}, s*_{t+1}) = p(y_{:t}, s_{:t}, s*_{t+1} | a_{:t}) / p(s*_{t+1} | a_{:t}) = κ^R_t(y_t, s_t | a_t, s*_{t+1}) ∏_{τ=0}^{t−1} κ^R_τ(y_τ, s_τ | a_τ, s_{τ+1}),    (12)

which allows one to study how outputs lead to actions and to identify tipping points in the world dynamics. Unfortunately, not all transducers are reversible, as swapping past and future could break the anticipation-free condition, which is needed for a world model to yield a transducer (see App. Q). A necessary and sufficient condition for transducers to be reversible is provided next (proof in App. R).

Theorem 4. A transducer is reversible for τ ≤ T if and only if the dynamics of its memory state satisfy p(s_τ | s_{τ+1}, a_{:T}) = p(s_τ | s_{τ+1}, a_τ) for all τ ∈ {0, ..., T}, with a reverse kernel given by

    κ^R_τ(y, s | a, s̃) = [Pr(S_τ = s | A_τ = a) / Pr(S_{τ+1} = s̃ | A_τ = a)] · κ_τ(y, s̃ | a, s).    (13)

Although Eq. (13) always leads to a valid kernel due to Bayes' rule, it may not generate the same interface; in fact, Eq. (11) only holds when the conditions in Theorem 4 are met. Interestingly, those conditions can be attained in a variety of ways. For example, memoryless transducers (see Lemma 3) are always reversible, as p(s_t | s_{t+1}, a_{:t}) = p(s_t | s_{t+1}, a_t) = p(s_t).
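The Bayes inversion in Eq. (13) can be sketched numerically: given a forward kernel and the current state marginal, the reverse kernel follows from an elementwise ratio, and validity (normalisation over outputs and past states) holds whenever the future-state marginal is the pushforward of the current one. The kernel and marginal below are hypothetical.

```python
import numpy as np

# Sketch of the reverse kernel of Eq. (13):
#   κ^R(y, s | a, s~) = [Pr(S_τ=s | A_τ=a) / Pr(S_{τ+1}=s~ | A_τ=a)] κ(y, s~ | a, s).
# K[a][y][s_next, s] is an illustrative forward kernel; p_s is the state marginal.
K = {0: {0: np.array([[0.5, 0.0], [0.0, 0.2]]),
         1: np.array([[0.1, 0.3], [0.4, 0.5]])}}
p_s = np.array([0.5, 0.5])  # Pr(S_τ = s | A_τ = a), assumed given

def reverse_kernel(K, a, p_s):
    # Pushforward marginal Pr(S_{τ+1} = s~ | A_τ = a) = Σ_{y,s} κ(y, s~ | a, s) p(s).
    p_next = sum(K[a][y] @ p_s for y in K[a])
    KR = {}
    for y in K[a]:
        # Bayes inversion, elementwise: KR[y][s~, s] reverses K[a][y][s~, s].
        KR[y] = K[a][y] * p_s[None, :] / p_next[:, None]
    return KR, p_next

KR, p_next = reverse_kernel(K, a=0, p_s=p_s)
# Validity check: for each future state s~, κ^R sums to 1 over (y, s).
norms = sum(KR[y].sum(axis=1) for y in KR)
print(norms)  # → each entry ≈ 1.0
```

As Theorem 4 states, this kernel is always well formed, but running it backwards only reproduces the original interface when the memory dynamics satisfy the conditional-independence condition p(s_τ | s_{τ+1}, a_{:T}) = p(s_τ | s_{τ+1}, a_τ).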
Also, consistent with results by Ellison et al. (2011), action-agnostic transducers (i.e. hidden Markov models) can be shown to be always reversible (see Sec. R.2). Finally, a transducer being action-counifilar (i.e. there existing an f such that S_t = f(S_{t+1}, A_t) can be deterministically updated)⁸ is also sufficient for reversibility, as such a transducer satisfies p(s_τ | s_{τ+1}, a_{:T}) = δ_{s_τ, f(s_{τ+1}, a_τ)} = p(s_τ | s_{τ+1}, a_τ). Examples of these conditions are illustrated in Figure 4.

⁷ This definition differs importantly from thermodynamically reversible transducers (Jurgens & Crutchfield, 2020).
⁸ This is a special case of counifilar transducers, in which S_t = f_t(S_{t+1}, A_t, Y_t) holds.
[Figure 4 appears here: three state-transition diagrams, with panels titled 'Memoryless', 'Action Co-Unifilar', and 'Action Agnostic'.]

Figure 4: Three examples of reversible transducers. Circles represent world states, arrows represent transitions, and their labels describe the associated actions and outputs. For instance, the label 1|0:0.5 on the edge from s_0 to s_1 indicates that Pr(S_{t+1} = s_1, Y_t = 1 | A_t = 0, S_t = s_0) = 0.5.

6.2 Retrodictive beliefs

The previous subsection showed that there are substantial restrictions on the reversibility of transducers. Even if an interface cannot be generated via a reversible transducer, there are still 'retrodictive' constructions that can be used to investigate their dynamics. Retrodiction uses the future to learn about the past, in the same way that prediction uses the past to learn about the future.
Formal treatments of retrodiction include classic work in physics (Watanabe, 1955) and filtering theory (Jazwinski, 1970), and in more recent years it has been formalised in computational mechanics (Ellison et al., 2009) and category theory (Parzygnat & Buscemi, 2023; Parzygnat, 2024). Following these ideas, one can build retrodictive Bayesian beliefs (or mixed states) of a world model S_t as distributions over S given by

    r_t(s_0) := Pr(S_0 = s_0 | H_{:t} = h_{:t}).

These beliefs provide an analogue of the backward pass of Bayesian smoothing (Jazwinski, 1970), in the same way that the predictive and postdictive beliefs of input-Moore transducers correspond to different steps of Bayesian filtering (Prop. 3). However, in contrast with predictive Bayesian beliefs, which can always be faithful (Prop. 2), retrodictive beliefs may not be able to generate the same interface.

In order to study the dynamics of retrodictive beliefs, we introduce the bi-directional mixed-state matrix (BDMSM) of an action-outcome sequence, denoted ρ(y_{0:t}, a_{0:t}), as the |S| × |S| matrix given by

    ρ(y_{0:t}, a_{0:t}) := ∑_{s_0, s_{t+1} ∈ S} p(s_0, s_{t+1} | y_{0:t}, a_{0:t}) e_{s_{t+1}} e^⊺_{s_0}.    (14)

The BDMSM allows one to calculate retrodictive beliefs and their dynamics (proof in App. S).

Theorem 5. Given a world model S_t, its BDMSM, predictive Bayesian beliefs b_τ, and retrodictive Bayesian beliefs r_τ can be calculated as

    ρ(y_{0:τ}, a_{0:τ}) = T^{(y_{0:τ}|a_{0:τ})} ρ_0 / (1^⊺ · T^{(y_{0:τ}|a_{0:τ})} ρ_0 · 1),  b_τ = ρ(y_{0:τ}, a_{0:τ}) · 1,  and  r_τ = ρ(y_{0:τ}, a_{0:τ})^⊺ · 1,

where T^{(y_{τ:τ′}|a_{τ:τ′})} = ∏_{j=τ}^{τ′} T_j^{(y_j|a_j)} and ρ_t = ∑_{s_t} p(s_t) e_{s_t} e^⊺_{s_t} is a diagonal matrix.
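Theorem 5 reduces both belief types to row and column sums of one normalised matrix, which can be sketched directly. The 2-state transducer below is a hypothetical example; only the structure of the computation follows the theorem.

```python
import numpy as np

# Sketch of Theorem 5: the BDMSM is
#   ρ(y_{0:τ}, a_{0:τ}) = T^{(y_{0:τ}|a_{0:τ})} ρ_0 / (1^T T^{(y_{0:τ}|a_{0:τ})} ρ_0 1),
# with predictive belief b_τ = ρ·1 (row sums) and retrodictive belief
# r_τ = ρ^T·1 (column sums). T[a][y] and p0 are illustrative values.
T = {0: {0: np.array([[0.5, 0.0], [0.0, 0.2]]),
         1: np.array([[0.1, 0.3], [0.4, 0.5]])},
     1: {0: np.array([[0.6, 0.1], [0.1, 0.2]]),
         1: np.array([[0.2, 0.3], [0.1, 0.4]])}}
p0 = np.array([0.5, 0.5])

def bdmsm(actions, outputs, T, p0):
    rho0 = np.diag(p0)                  # ρ_0 = Σ_s p(s) e_s e_s^T (diagonal)
    M = np.eye(len(p0))
    for a, y in zip(actions, outputs):  # T^{(y_{0:τ}|a_{0:τ})} = Π_j T_j^{(y_j|a_j)}
        M = T[a][y] @ M
    rho = M @ rho0
    return rho / rho.sum()              # normalise by 1^T (·) 1

rho = bdmsm([0, 1], [1, 0], T, p0)
b = rho.sum(axis=1)  # predictive belief over the final state
r = rho.sum(axis=0)  # retrodictive belief over the initial state
print(b, r)          # both sum to 1
```

The same matrix thus simultaneously answers "where is the world now?" (rows) and "where did it start?" (columns), which is what makes the BDMSM a convenient object for tracing the origins of undesirable outcomes.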
Corollary 3. The forward-time update of the BDMSM is given by

    ρ(y_{0:τ+1}, a_{0:τ+1}) = T^{(y_{τ+1}|a_{τ+1})} · ρ(y_{0:τ}, a_{0:τ}) / (1^⊺ · T^{(y_{τ+1}|a_{τ+1})} ρ(y_{0:τ}, a_{0:τ}) · 1),    (15)

while the reverse-time update is

    ρ(y_{−1:τ}, a_{−1:τ}) = ρ(y_{0:τ}, a_{0:τ}) ρ_0^{−1} T^{(y_{−1}|a_{−1})} ρ_{−1} / (1^⊺ · ρ(y_{0:τ}, a_{0:τ}) ρ_0^{−1} T^{(y_{−1}|a_{−1})} ρ_{−1} · 1).    (16)

Retrodictive beliefs can be used to infer the most likely past states of the world given a sequence of future actions and outcomes. This could lead, for instance, to identifying the origins of specific behavioural patterns exhibited by an AI agent, which can in turn be used to characterise favourable or dangerous initial conditions via counterfactual reasoning (Karimi et al., 2021).

7 Conclusion

This paper investigated the fundamental limits that shape the usage of world models as tools to evaluate AI agents. This follows recent proposals to use world models not as tools for the agent (as in standard model-based reinforcement learning), but as tools for the scientist in charge of evaluating its safety and reliability (Dalrymple et al., 2024). By formalising these ideas via principles from computational mechanics, this approach led to a series of proposals for assessing AI agents that require no assumptions about an agent's policy, architecture, or capabilities, making them broadly applicable to systems regardless of how they were designed or trained. This framework revealed fundamental limits, challenges, and opportunities inherent to world modelling, leading to actionable guidelines that can inform core design choices instrumental for effective agent evaluation (see Figure 1).

Our framework revealed a fundamental trade-off between the efficiency and interpretability of world models. Generalised transducers were found to generate the most efficient implementations, but these come at the cost of inducing quasi-probabilities, yielding opaque world models that cannot be sampled.
Our results also revealed that the ε-transducer, a generalisation of the geometric belief structure recently found in the residual stream of transformers (Shai et al., 2025), yields the unique minimal world model that could be calculated by an agent in real time. The uniqueness of the ε-transducer implies that the refinement of the beliefs of any optimal predictive agent must eventually reach this model, regardless of the world model the agent uses. Thus, the ε-transducer can be seen as encapsulating all the predictive information that is available to agents, and hence establishes what is learnable about an environment through a particular interface.

We also introduced retrodictive world models as tools to investigate the origins of undesirable events or behaviours. These models allow retrospective analyses that could, for instance, identify 'danger zones' that are likely to lead to undesirable future outcomes. This view complements standard interpretability approaches, which typically assess agents via their capabilities to predict and plan with respect to future events (Nanda et al., 2023; Gurnee & Tegmark, 2023; Shai et al., 2025).

While this work focused on the fundamental limits of world modelling under the dictum of perfect reconstruction, future work may relax this constraint by employing notions such as approximate homomorphisms (Taylor et al., 2008) or bisimulation (Girard & Pappas, 2011), rate-distortion trade-offs (Marzen & Crutchfield, 2016), or other approaches (Subramanian et al., 2022). Another promising direction to enable efficient modelling is to exploit the compositional structure of the world (Lake & Baroni, 2023; Elmoznino et al., 2024; Baek et al., 2025; Fu et al., 2025).
The approach taken here complements the substantial body of work that employs world models to improve the performance of agents in model-based reinforcement learning (Ha & Schmidhuber, 2018; Hafner et al., 2020; 2023; Hansen et al., 2024), and also work on representations from the point of view of the agent (see (Ni et al., 2024) and references therein). In fact, the formalism presented here provides a unified framework for reasoning about both (i) models that represent physical processes external to the agent and (ii) models that describe knowledge-gathering processes internal to the agent (Kaelbling et al., 1998; Biehl & Virgo, 2022; Virgo, 2023). Furthermore, the relationship found between predictive and postdictive machines in I-O Moore transducers and Bayesian and Kalman filtering sheds new light on the mechanisms supporting these well-established procedures. Moreover, the formalism of belief transducers opens several interesting avenues for future work, including the investigation of more general belief update dynamics on, for example, curved statistical manifolds (Morales & Rosas, 2021; Aguilera et al., 2024).

Overall, the ideas put forward here establish new bridges between related subjects in reinforcement learning, control theory, and computational mechanics, which we hope may serve as a Rosetta stone for navigating across these literatures. These insights also have interesting implications for cognitive and computational neuroscience (Matsuo et al., 2022), particularly pertaining to the formal characterisation of the internal world ('umwelt') of an agent (Von Uexkül, 1909; Ay & Löhr, 2015; Baltieri et al., 2025), which will be explored in future work.

⁹ The impossibility of sampling such world models is reminiscent of the notion of Kantian noumena, which suggests that things-in-themselves are beyond knowledge.
Acknowledgments

The authors thank Lionel Barnett, Martin Biehl, Chris Buckley, Matteo Capucci, James Crutchfield, Alexander Gietelink Oldenziel, Adam Goldstein, Alexandra Jurgens, Vanessa Kosoy, Sarah Marzen, Paul Riechers, Anil Seth, Adam Shai, and Lucas Teixera for inspiring discussions and useful feedback. The work of F.R. and A.B. has been supported by UK ARIA's Safeguarded AI programme. F.R. has also been supported by the PIBBSS Affiliateship programme. M.B. was supported by JST, Moonshot R&D, Grant Number JPMJMS2012.

References

Miguel Aguilera, Pablo A Morales, Fernando E Rosas, and Hideaki Shimazaki. Explosive neural networks via higher-order interactions in curved statistical manifolds. arXiv preprint arXiv:2408.02326, 2024.

Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6):26-38, 2017.

Shahab Asoodeh, Fady Alajaji, and Tamás Linder. Notes on information-theoretic privacy. In 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1272-1278. IEEE, 2014.

Nihat Ay and Wolfgang Löhr. The umwelt of an embodied agent: a measure-theoretic definition. Theory in Biosciences, 134(3):105-116, 2015.

Junyeob Baek, Yi-Fu Wu, Gautam Singh, and Sungjin Ahn. Dreamweaver: Learning compositional world representations from pixels. arXiv preprint arXiv:2501.14174, 2025.

Vijay Balasubramanian. Equivalence and reduction of hidden Markov models. Technical report, Massachusetts Institute of Technology, 01 1993.

Manuel Baltieri and Christopher L Buckley. An active inference implementation of phototaxis. In Artificial Life Conference Proceedings, pp. 36-43. MIT Press, 2017.

Manuel Baltieri and Christopher L. Buckley. Generative models as parsimonious descriptions of sensorimotor loops. Behavioral and Brain Sciences, 42:e218, 2019. DOI: 10.1017/S0140525X19001353.

Manuel Baltieri, Martin Biehl, Matteo Capucci, and Nathaniel Virgo.
A Bayesian interpretation of the internal model principle. arXiv preprint arXiv:2503.00511, 2025.

Nix Barnett and James Crutchfield. Computational mechanics of input-output processes: Structured transformations and the ε-transducer. Journal of Statistical Physics, 161(2):404-451, 2015.

Yoshua Bengio, Sören Mindermann, Daniel Privitera, Tamay Besiroglu, Rishi Bommasani, Stephen Casper, Yejin Choi, Danielle Goldfarb, Hoda Heidari, Leila Khalatbari, et al. International scientific report on the safety of advanced AI (interim report). arXiv preprint arXiv:2412.05282, 2024.

Martin Biehl and Nathaniel Virgo. Interpreting systems as solving POMDPs: a step towards a formal understanding of agency. In International Workshop on Active Inference, pp. 16-31. Springer, 2022.

D Blackwell, RV Ramamoorthi, et al. A Bayes but not classically sufficient statistic. The Annals of Statistics, 10(3):1025-1026, 1982.

Filippo Bonchi, Elena Di Lavore, and Mario Román. Effectful Mealy machines: Bisimulation and trace. arXiv preprint arXiv:2410.10627, 2024.

Merve Nur Cakir, Mehwish Saleemi, and Karl-Heinz Zimmermann. On the theory of stochastic automata. arXiv preprint arXiv:2103.14423, 2021.

George Casella and Roger L Berger. Statistical Inference, volume 2. Duxbury, Pacific Grove, CA, 2002.

Andy Clark. Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3):181-204, 2013.

Volker Claus. Stochastische Automaten. Teubner Studienskripten, 1971.

Thomas M Cover and Joy A Thomas. Elements of Information Theory. John Wiley & Sons, 2012.

James Crutchfield. The calculi of emergence: Computation, dynamics and induction. Physica D: Nonlinear Phenomena, 75(1-3):11-54, 1994.

James Crutchfield. Between order and chaos. Nature Physics, 8(1):17-24, 2012.

James Crutchfield and Karl Young. Inferring statistical complexity. Physical Review Letters, 63(2):105, 1989.
David Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, et al. Towards guaranteed safe AI: A framework for ensuring robust and reliable AI systems. arXiv preprint arXiv:2405.06624, 2024.

René Descartes. Meditationes de Prima Philosophia. Michael Soly, 1641.

Natalia Díaz-Rodríguez, Javier Del Ser, Mark Coeckelbergh, Marcos López de Prado, Enrique Herrera-Viedma, and Francisco Herrera. Connecting the dots in trustworthy artificial intelligence: From AI principles, ethics, and key requirements to responsible AI systems and regulation. Information Fusion, 99:101896, 2023.

Christopher Ellison, John Mahoney, and James Crutchfield. Prediction, retrodiction, and the amount of information stored in the present. Journal of Statistical Physics, 136:1005-1034, 2009.

Christopher J Ellison, John R Mahoney, Ryan G James, James P Crutchfield, and Jörg Reichardt. Information symmetries in irreversible processes. Chaos: An Interdisciplinary Journal of Nonlinear Science, 21(3), 2011.

Eric Elmoznino, Thomas Jiralerspong, Yoshua Bengio, and Guillaume Lajoie. A complexity-based theory of compositionality. arXiv preprint arXiv:2410.14817, 2024.

Yariv Ephraim and Neri Merhav. Hidden Markov processes. IEEE Transactions on Information Theory, 48(6):1518-1569, 2002.

Norman Ferns and Doina Precup. Bisimulation metrics are optimal value functions. In UAI, pp. 210-219, 2014.

Ronald A Fisher. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, Series A, 222(594-604):309-368, 1922.

Shuhao Fu, Andrew Jun Lee, Anna Wang, Ida Momennejad, Trevor Bihl, Hongjing Lu, and Taylor W Webb. Evaluating compositional scene understanding in multimodal generative models. arXiv preprint arXiv:2503.23125, 2025.

Antoine Girard and George J Pappas.
Approximate bisimulation: A bridge between computer science and control theory. European Journal of Control, 17(5-6):568-578, 2011.

Robert Givan, Thomas Dean, and Matthew Greig. Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence, 147(1-2):163-223, 2003.

Wes Gurnee and Max Tegmark. Language models represent space and time. In Proceedings of the International Conference on Learning Representations (ICLR'23), 2023.

David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution. Advances in Neural Information Processing Systems, 31, 2018.

Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. In Proceedings of the International Conference on Learning Representations (ICLR'20), 2020.

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023.

Nicklas Hansen, Hao Su, and Xiaolong Wang. TD-MPC2: Scalable, robust world models for continuous control. In Proceedings of the International Conference on Learning Representations (ICLR'24), 2024.

Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, and Hao Chen. Security of AI agents. arXiv preprint arXiv:2406.08689, 2024.

Qingqing Huang, Rong Ge, Sham Kakade, and Munther Dahleh. Minimal realization problems for hidden Markov models. IEEE Transactions on Signal Processing, 64(7):1896-1904, 2015.

Hisashi Ito, S-I Amari, and Kingo Kobayashi. Identifiability of hidden Markov information sources and their minimum degrees of freedom. IEEE Transactions on Information Theory, 38(2):324-333, 1992.

David N Jansen, Flemming Nielson, and Lijun Zhang. Belief bisimulation for hidden Markov models: Logical characterisation and decision algorithm. In NASA Formal Methods: 4th International Symposium, NFM 2012, Norfolk, VA, USA, April 3-5, 2012, Proceedings 4, pp. 326-340. Springer, 2012.

Andrew H.
Jazwinski. Stochastic Processes and Filtering Theory, volume 64 of Mathematics in Science and Engineering. Academic Press, New York, 1970. ISBN 0123815509.

Alexandra Jurgens and James Crutchfield. Shannon entropy rate of hidden Markov processes. Journal of Statistical Physics, 183(2):32, 2021a.

Alexandra M Jurgens and James P Crutchfield. Functional thermodynamics of Maxwellian ratchets: Constructing and deconstructing patterns, randomizing and derandomizing behaviors. Physical Review Research, 2(3):033334, 2020.

Alexandra M Jurgens and James P Crutchfield. Divergent predictive states: The statistical complexity dimension of stationary, ergodic hidden Markov processes. Chaos: An Interdisciplinary Journal of Nonlinear Science, 31(8), 2021b.

Leslie Pack Kaelbling, Michael Littman, and Anthony Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99-134, 1998.

Olav Kallenberg. Foundations of Modern Probability, volume 2. Springer, 1997.

Amir-Hossein Karimi, Bernhard Schölkopf, and Isabel Valera. Algorithmic recourse: from counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 353-362, 2021.

Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

Andrei Nikolaevitch Kolmogorov. Determination of the centre of dispersion and degree of accuracy for a limited number of observation. Izv. Akad. Nauk, USSR Ser. Mat, 6:3-32, 1942.

Vanessa Kosoy. Reinforcement learning with imperceptible rewards, 2019. URL https://w.alignmentforum.org/posts/aAzApjEpdYwAxnsAS/reinforcement-learning-with-imperceptible-rewards-1.

Solomon Kullback. Information Theory and Statistics. Courier Corporation, 1997.

Brenden M Lake and Marco Baroni. Human-like systematic generalization through a meta-learning neural network. Nature, 623(7985):115-121, 2023.
Benjamin Larsen, Cathy Li, Stephanie Teeuwen, Olivier Denti, Jason DePerro, and Efi Raili. Navigating the AI frontier: A primer on the evolution and impact of AI agents. Technical report, World Economic Forum, December 2024.

Edward Ashford Lee and Sanjit Arunkumar Seshia. Introduction to Embedded Systems: A Cyber-Physical Systems Approach. MIT Press, 2017.

Erich L Lehmann and George Casella. Theory of Point Estimation. Springer Science & Business Media, 2006.

Erich Leo Lehmann and Henry Scheffé. Completeness, similar regions, and unbiased estimation: Part I. In Selected Works of EL Lehmann, pp. 233-268. Springer, 2012.

Michael Littman and Richard S Sutton. Predictive representations of state. Advances in Neural Information Processing Systems, 14, 2001.

Samuel P Loomis and James P Crutchfield. Strong and weak optimizations in classical and quantum models of stochastic processes. Journal of Statistical Physics, 176(6):1317-1342, 2019.

Samuel P Loomis and James P Crutchfield. Topology, convergence, and reconstruction of predictive states. Physica D: Nonlinear Phenomena, 445:133621, 2023.

Francesco Mannella, Federico Maggiore, Manuel Baltieri, and Giovanni Pezzulo. Active inference through whiskers. Neural Networks, 144:428-437, 2021.

Sarah E Marzen and James P Crutchfield. Predictive rate-distortion for infinite-order Markov processes. Journal of Statistical Physics, 163:1312-1338, 2016.

Yutaka Matsuo, Yann LeCun, Maneesh Sahani, Doina Precup, David Silver, Masashi Sugiyama, Eiji Uchibe, and Jun Morimoto. Deep learning, reinforcement learning, and world models. Neural Networks, 152:267-275, 2022.

Marvin Lee Minsky. Computation: Finite and Infinite Machines. Prentice-Hall, Englewood Cliffs, 1967.

Pablo A Morales and Fernando E Rosas. Generalization of the maximum entropy principle for curved statistical manifolds. Physical Review Research, 3(3):033216, 2021.

Neel Nanda, Andrew Lee, and Martin Wattenberg.
Emergent linear representations in world models of self-supervised sequence models. In Yonatan Belinkov, Sophie Hao, Jaap Jumelet, Najoung Kim, Arya McCarthy, and Hosein Mohebbi (eds.),Proceedings of the 6th BlackboxNLP Work- shop: Analyzing and Interpreting Neural Networks for NLP, p. 16–30, Singapore, December 2023. Association for Computational Linguistics. DOI: 10.18653/v1/2023.blackboxnlp-1.2. URL https://aclanthology.org/2023.blackboxnlp-1.2/. Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Ma- hajan, and Pierre-Luc Bacon. Bridging state and history representations: Understanding self- predictive rl.arXiv preprint arXiv:2401.08898, 2024. Yoshito Ohta. On the realization of hidden Markov models and tensor decomposition.IFAC- PapersOnLine, 54(9):725–730, 2021. J Kevin O’Regan and Alva Noë. A sensorimotor account of vision and visual consciousness.Be- havioral and brain sciences, 24(5):939–973, 2001. Arthur J Parzygnat. Reversing information flow: retrodiction in semicartesian categories.arXiv preprint arXiv:2401.17447, 2024. Arthur J Parzygnat and Francesco Buscemi. Axioms for retrodiction: achieving time-reversal sym- metry with a prior.Quantum, 7:1013, 2023. Plato.Republic. The Academy, 375 BC. Hilary Putnam.Reason, truth and history. Cambridge University Press Cambridge, 1981. Balaraman Ravindran and Andrew G. Barto. SMDP homomorphisms: An algebraic approach to abstraction in semi markov decision processes. InProceedings of the Eighteenth International Joint Conference on Artificial Intelligence, Aug 2003. Paul Riechers and James Crutchfield. Spectral simplicity of apparent complexity. I. the nondiago- nalizable metadynamics of prediction.Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(3), 2018. Paul Michael Riechers.Exact results regarding the physics of complex systems via linear algebra, hidden Markov models, and information theory. University of California, Davis, 2016. 
Simo Särkkä and Lennart Svensson. Bayesian Filtering and Smoothing, volume 17. Cambridge University Press, 2023.
Anil K Seth and Manos Tsakiris. Being a beast machine: The somatic basis of selfhood. Trends in Cognitive Sciences, 22(11):969–981, 2018.
Adam Shai, Lucas Teixeira, Alexander Oldenziel, Sarah Marzen, and Paul Riechers. Transformers represent belief state geometry in their residual stream. Advances in Neural Information Processing Systems, 37:75012–75034, 2025.
Satinder Singh, Michael James, and Matthew Rudary. Predictive state representations: A new theory for modeling dynamical systems. In Proceedings of the 20th Conference in Uncertainty in Artificial Intelligence, pp. 512–518, 2004.
Jayakumar Subramanian, Amit Sinha, Raihan Seraj, and Aditya Mahajan. Approximate information state for approximate planning and reinforcement learning in partially observed systems. Journal of Machine Learning Research, 23(12):1–83, 2022.
Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, et al. Prioritizing safeguarding over autonomy: Risks of LLM agents for science. arXiv preprint arXiv:2402.04247, 2024.
Jonathan Taylor, Doina Precup, and Prakash Panagaden. Bounding performance loss in approximate MDP homomorphisms. Advances in Neural Information Processing Systems, 21, 2008.
Jianjun Tian and Dan Kannan. Lumpability and commutativity of Markov processes. Stochastic Analysis and Applications, 24(3):685–702, 2006.
Alexander Tschantz, Anil K Seth, and Christopher L Buckley. Learning action-oriented models through active inference. PLOS Computational Biology, 16(4):e1007805, 2020.
Daniel Ray Upper. Theory and Algorithms for Hidden Markov Models and Generalized Hidden Markov Models. PhD thesis, University of California, Berkeley, 1997.
Mathukumalli Vidyasagar. The complete realization problem for hidden Markov models: A survey and some new results. Mathematics of Control, Signals, and Systems, 23(1):1–65, 2011.
Nathaniel Virgo. Unifilar machines and the adjoint structure of Bayesian filtering. arXiv preprint arXiv:2305.02826, 2023.
Nathaniel Virgo, Martin Biehl, and Simon McGregor. Interpreting dynamical systems as Bayesian reasoners. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 726–762. Springer, 2021.
Jakob von Uexküll. Umwelt und Innenwelt der Tiere. Springer, 1909.
Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6):186345, 2024.
Xu Wang, Sen Wang, Xingxing Liang, Dawei Zhao, Jincai Huang, Xin Xu, Bin Dai, and Qiguang Miao. Deep reinforcement learning: A survey. IEEE Transactions on Neural Networks and Learning Systems, 35(4):5064–5078, 2022.
Satosi Watanabe. Symmetry of physical laws. Part I. Prediction and retrodiction. Reviews of Modern Physics, 27(2):179, 1955.
Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. A survey on multimodal large language models. National Science Review, 11(12):1–20, 2024.
Amy Zhang, Zachary C Lipton, Luis Pineda, Kamyar Azizzadenesheli, Anima Anandkumar, Laurent Itti, Joelle Pineau, and Tommaso Furlanello. Learning causal state representations of partially observable environments. arXiv preprint arXiv:1906.10437, 2019.
Amy Zhang, Rowan McAllister, Roberto Calandra, Yarin Gal, and Sergey Levine. Learning invariant representations for reinforcement learning without reconstruction. In Proceedings of the 38th International Conference on Machine Learning, 2021.

Supplementary Materials

A Example of an interface and multiple world models

Let us present an example to illustrate the notions of interface and world models.
This example showcases how a single interface can be generated by various world models with different properties. Consider a scenario in which a robot is manipulating a deck of cards. This setting can be described via a world model that can adopt $|\mathcal{S}| = 13! \approx 6 \times 10^9$ possible states, corresponding to the possible arrangements of the deck. At every time point, the robot can take two possible actions: either it puts the front card in the back ($\alpha_1$), or it shuffles the deck ($\alpha_2$). Thus, the set of actions available to the robot is $\mathcal{A} = \{\alpha_1, \alpha_2\}$. Additionally, at every time point the robot can observe the card that is on top of the deck. However, we will assume that the sensory apparatus of the robot is not capable of reading the number or the suit of the card, but only its colour. Hence, the possible outcomes of this scenario for the robot are $\mathcal{Y} = \{\text{black}, \text{red}\}$. In this scenario, the interface of the robot is constituted by a collection of probability distributions of the form $p(y_{:t} \mid a_{:t})$ relating sequences of actions with sequences of outcomes. Furthermore, if one tracks the state of the deck at time $t$ via the variable $S_t$, this results in a transducer $(S_t, A_t, Y_t)$ (see Def. 3). If we wanted to implement this transducer on a computer, specifying its kernel would require a substantial amount of memory due to the large number of possible world states. Following the considerations made in Sec. 2, one could instead forget about the fact that there is an underlying deck of cards and focus just on the sequences of colours that the agent records. By doing this, one may notice that, given a sequence of actions and outcomes $(a_{:t}, y_{:t})$, the only information that is relevant to predict the next outcome $y_{t+1}$ is the number of red and black cards observed since the last time the agent took action $\alpha_2$ (shuffling the deck). If one tracks this information at time $t$ via the variable $M_t$, then Prop.
4 guarantees that $(M_t, A_t, Y_t)$ provides an alternative transducer presentation of the original interface. Note that this is an 'epistemic' world model that reflects the agent's state of knowledge (as described in Sec. 5), contrasting with $S_t$, which reflects an objective physical process taking place 'out there'. Interestingly, $M_t$ uses only roughly $13^2$ states instead of $13!$ states, requiring substantially fewer memory resources. Furthermore, $M_t$ reflects all the relevant information that an agent with this particular interface (i.e. limited to recognising colours) could ever want to take into account in order to make informed actions in this scenario. Therefore, this new world model is not only more memory efficient, but also reveals what information an agent of this kind could and should learn.

Before concluding, let us add some considerations related to the ideas explored in the last part of the manuscript. Sec. 6 studies world models that 'run backwards', i.e. that can be updated in reverse time. The interface chosen in this example does not allow for such a reversible presentation, as the combinations of shuffling ($\alpha_2$) and card flipping ($\alpha_1$) lead to world dynamics that violate the conditions outlined in Theorem 4. One could attain a reversible interface by considering a different set of actions, for example $\mathcal{A}' = \{\alpha_1, \alpha_3\}$, with $\alpha_3$ corresponding to the robot taking the card in the back and putting it on top of the deck. Indeed, the world dynamics resulting from such a set of actions do satisfy the sufficient condition for reversibility discussed at the end of Sec. 6.1.

B Sufficient statistics

Given the importance of the notion of sufficient statistics in this work, we use this appendix to provide an account of its origins and significance. Consider a random vector $X = (X_1, \ldots, X_n) \in \mathcal{X}^n$ that follows a distribution with parameter $\theta \in \Theta$, and a 'statistic' $T(\cdot)$ (that is, a mapping $T : \mathcal{X}^n \to \mathbb{R}$).
Following Fisher (1922), $Y = T(X)$ is a classical/frequentist sufficient statistic for $X$ w.r.t. $\theta$ if the value of $\Pr_\theta(X = x \mid Y = y)$ is the same $\forall \theta \in \Theta$ (Casella & Berger, 2002). This means that the information given by $X$ that is not in $Y$ is irrelevant to estimating the value of $\theta$.

Another approach to statistical sufficiency, due to Kolmogorov (1942), which can be called strong Bayesian statistical sufficiency, states that $Y$ is sufficient for $X$ w.r.t. $\theta$ if $X$ is statistically independent of $\theta$ given $Y$ for any prior distribution over $\theta$. Strong Bayesian sufficiency can be shown to imply classical sufficiency, but the converse does not necessarily hold (Blackwell et al., 1982).

A useful generalisation of the above condition, which we simply call (weak) Bayesian statistical sufficiency, follows Kolmogorov's condition just for a given distribution of $\theta$ (Cover & Thomas, 2012). In particular, given two random variables $X$ and $Y$, a statistic $T = f(X)$ is said to be a Bayesian sufficient statistic for $X$ w.r.t. $Y$ if $X$ is statistically independent of $Y$ given $T$; that is, $\Pr(X = x \mid Y = y, T = t) = \Pr(X = x \mid T = t)$. In terms of Shannon's mutual information, this is equivalent to the condition $I(X; Y \mid T) = 0$, which states that $X$ and $Y$ share no information that is not given by $T$ (Cover & Thomas, 2012). This is the definition of sufficient statistics that we use in this work.

Another way to think about sufficient statistics is to notice that if $T = f(X)$ for some mapping $f$, then $T - X - Y$ is a Markov chain. Then, thanks to the data processing inequality, $I(Y; X) \geq I(Y; T)$, as 'processing' $X$ into $T$ cannot increase its information about $Y$ (Cover & Thomas, 2012). Interestingly, the equality $I(Y; X) = I(Y; T)$ is attained if and only if $X - T - Y$ is also a Markov chain, which corresponds to when $T$ is a sufficient statistic. In summary, sufficient statistics are related to optimal (i.e. lossless) data processing (Kullback, 1997). Sufficient statistics always exist; in particular, $X$ is always sufficient for itself.
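The Bayesian-sufficiency condition $I(X; Y \mid T) = 0$ can be checked numerically on small discrete distributions. The following sketch (with hypothetical toy numbers, not taken from the paper) builds a joint $p(x, y)$ in which $Y$ depends on $X$ only through $T = X \bmod 2$, and verifies that the conditional mutual information vanishes for this statistic but not for a constant one:

```python
import math

# A minimal sketch (toy numbers, not from the paper): checking weak Bayesian
# sufficiency I(X; Y | T) = 0 for a statistic T = f(X) on a discrete joint pmf.

def cond_mutual_info(pxy, f):
    """I(X; Y | T) in nats, where T = f(X) and pxy[x][y] is a joint pmf."""
    pt, pty = {}, {}
    for x, row in pxy.items():
        t = f(x)
        for y, p in row.items():
            pt[t] = pt.get(t, 0.0) + p
            pty[(t, y)] = pty.get((t, y), 0.0) + p
    mi = 0.0
    for x, row in pxy.items():
        t, px = f(x), sum(row.values())
        for y, p in row.items():
            if p > 0:
                # p(x,y,t) * log[ p(t) p(x,y,t) / (p(x,t) p(y,t)) ], with p(x,t) = p(x)
                mi += p * math.log(pt[t] * p / (px * pty[(t, y)]))
    return mi

# Y depends on X only through T = X mod 2, so T is Bayesian sufficient for X w.r.t. Y.
q = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # q(y | t)
pxy = {x: {y: 0.25 * q[x % 2][y] for y in (0, 1)} for x in range(4)}

print(abs(cond_mutual_info(pxy, lambda x: x % 2)) < 1e-9)  # sufficient statistic: True
print(cond_mutual_info(pxy, lambda x: 0) > 1e-6)           # constant statistic loses information: True
```

The same check applied to the partition of Eq. (17) would exhibit minimality: merging two $x$-values with different $p(y \mid x)$ makes the conditional mutual information strictly positive.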
The search for statistics that are not only optimal but also efficient leads to the idea of minimal sufficiency: a sufficient statistic $S$ is minimal if for every other sufficient statistic $T$ there exists a function $f(\cdot)$ such that $S = f(T)$ (Lehmann & Scheffé, 2012), or equivalently, if the following Markov chain holds: $S - T - X - Y$. From an information-theoretic point of view, a minimal sufficient statistic is the sufficient statistic of minimal entropy, hence providing the most parsimonious representation of the relevant information. Minimal sufficient statistics exist for a wide range of settings (Lehmann & Casella, 2006, Sec. 1.6), and are unique up to isomorphisms (i.e. re-labelling). Moreover, the minimal sufficient statistic of $X$ w.r.t. $Y$ can be built explicitly as the partition induced by the following equivalence relation (Asoodeh et al., 2014, Def. 2):
\[
x \sim x' \quad \text{iff} \quad \forall y \in \mathcal{Y} : \; p(y \mid x) = p(y \mid x'). \tag{17}
\]
Note the similarities between this way of building minimal sufficient statistics, bisimulation (Def. 5), and the construction of the $\varepsilon$-transducer via the equivalence relation in Eq. (10).

C Proof of Lemma 1

Proof. Let us first prove that if $S_t$ is a world model for the interface $I(Y|A)$, then Eq. (2) holds. Using property (2) of world models, together with the fact that the interface is anticipation-free, one can show that
\[
p(y_{:\tau}, s_{:\tau+1} \mid a_{:}) = p(y_{:\tau} \mid a_{:})\, p(s_{:\tau+1} \mid y_{:\tau}, a_{:}) = p(y_{:\tau} \mid a_{:\tau})\, p(s_{:\tau+1} \mid y_{:\tau}, a_{:\tau}) = p(y_{:\tau}, s_{:\tau+1} \mid a_{:\tau}), \tag{18}
\]
which holds for all $\tau \in \mathbb{N}$. Then, one can use this equality recursively to derive the following:
\[
\begin{aligned}
p(y_{:t}, s_{:t+1} \mid a_{:}) &= p(y_{:t}, s_{:t+1} \mid a_{:t}) \\
&= p(y_{:t-1}, s_{:t} \mid a_{:t})\, p(y_t, s_{t+1} \mid y_{:t-1}, s_{:t}, a_{:t}) \\
&= p(y_{:t-1}, s_{:t} \mid a_{:t-1})\, p(y_t, s_{t+1} \mid y_{:t-1}, s_{:t}, a_{:t}) \\
&= p(y_{:t-2}, s_{:t-1} \mid a_{:t-1}) \prod_{\tau=t-1}^{t} p(y_\tau, s_{\tau+1} \mid y_{:\tau-1}, s_{:\tau}, a_{:\tau}) \\
&= \dots \\
&= p(s_0) \prod_{\tau=0}^{t} p(y_\tau, s_{\tau+1} \mid y_{:\tau-1}, s_{:\tau}, a_{:\tau}).
\end{aligned} \tag{19}
\]
Note that, in the last step, $p(s_0 \mid a_0) = p(s_0)$ follows by applying property (2) for $t = 0$.
Now, using property (1) one can show that
\[
p(y_\tau, s_{\tau+1} \mid y_{:\tau-1}, s_{:\tau}, a_{:\tau}) = p(y_\tau \mid y_{:\tau-1}, s_{:\tau}, a_{:\tau})\, p(s_{\tau+1} \mid y_{:\tau}, s_{:\tau}, a_{:\tau}) = p(y_\tau \mid s_\tau, a_\tau)\, p(s_{\tau+1} \mid y_{:\tau}, s_{:\tau}, a_{:\tau}). \tag{20}
\]
The desired result follows from putting together Eq. (19) and Eq. (20).

For the converse, let's show that if Eq. (2) holds, then $S_t$ satisfies the two properties of world models as given in Def. 2. The first property can be proven directly as follows:
\[
\begin{aligned}
p(y_t \mid s_{:t}, a_{:t}, y_{:t-1})
&= \frac{\sum_{s_{t+1}} p(y_{:t}, s_{:t+1} \mid a_{:t})}{\sum_{s_{t+1}, y_t} p(y_{:t}, s_{:t+1} \mid a_{:t})} \\
&= \frac{\sum_{s_{t+1}} p(s_0) \prod_{\tau=0}^{t} p(y_\tau \mid s_\tau, a_\tau)\, p(s_{\tau+1} \mid s_{:\tau}, h_{:\tau})}{\sum_{s_{t+1}, y_t} p(s_0) \prod_{\tau=0}^{t} p(y_\tau \mid s_\tau, a_\tau)\, p(s_{\tau+1} \mid s_{:\tau}, h_{:\tau})} \\
&= \frac{\sum_{s_{t+1}} p(y_t \mid s_t, a_t)\, p(s_{t+1} \mid s_{:t}, h_{:t})}{\sum_{s_{t+1}, y_t} p(y_t \mid s_t, a_t)\, p(s_{t+1} \mid s_{:t}, h_{:t})} \\
&= p(y_t \mid s_t, a_t).
\end{aligned} \tag{21}
\]
Similarly, the second property can be proven as follows:
\[
p(s_{:t} \mid h_{:t-1}, a_{t:}) = \frac{p(s_{:t}, y_{:t-1} \mid a_{:})}{p(y_{:t-1} \mid a_{:})} = \frac{p(s_0) \prod_{\tau=0}^{t-1} p(y_\tau \mid s_\tau, a_\tau)\, p(s_{\tau+1} \mid s_{:\tau}, h_{:\tau})}{\sum_{s_{:t}} p(s_0) \prod_{\tau=0}^{t-1} p(y_\tau \mid s_\tau, a_\tau)\, p(s_{\tau+1} \mid s_{:\tau}, h_{:\tau})} \overset{(a)}{=} p(s_{:t} \mid h_{:t-1}). \tag{22}
\]
Above, (a) follows from the fact that the variables $a_{t:}$ do not appear in the previous expression.

D Alternative characterisations of a transducer

Lemma 2 characterises transducers as world models. However, this characterisation is not the most useful when investigating whether a given process $S_t$ qualifies as the memory state of a transducer. Here we provide two alternative characterisations of a transducer that are better suited to those tasks.

Lemma 6. The process $S_t$ provides a memory state for a transducer presentation of an anticipation-free interface $I(Y|A)$ if and only if one of the following conditions holds:
1. $p(s_0 \mid a_{:}) = p(s_0)$ and $p(s_{t+1}, y_t \mid s_{:t}, h_{:t-1}, a_{t:}) = p(s_{t+1}, y_t \mid s_t, a_t)$ for all $t \in \mathbb{N}$.
2. $I(S_{t_i+1:t}, Y_{t_i:t-1}; A_{t:} \mid A_{t_i:t-1}, S_{t_i}) = I(S_{t+1:}, Y_{t:}; Y_{:t-1}, S_{:t-1}, A_{:t-1} \mid A_{t:}, S_t) = 0$ for all $t_i, t \in \mathbb{N}$ with $t_i \leq t$.

Proof. We prove these equivalences in two steps.
Step 1: Equivalence between condition (1) and Lemma 2.

Let's first prove that Lemma 2 implies condition (1). By Lemma 1, if $S_t$ is a world model then $p(s_{t+1}, y_t \mid s_{:t}, h_{:t-1}, a_{t:}) = p(y_t \mid s_t, a_t)\, p(s_{t+1} \mid s_{:t}, h_{:t})$. Combining this with $p(s_{t+1} \mid s_{:t}, h_{:t}) = p(s_{t+1} \mid s_t, h_t)$ (Lemma 2), it is clear that $p(s_{t+1}, y_t \mid s_{:t}, h_{:t-1}, a_{t:}) = p(s_{t+1}, y_t \mid s_t, a_t)$.

To prove the converse, let us now show that condition (1) guarantees that $S_t$ is a world model that satisfies $p(s_{t+1} \mid s_{:t}, h_{:t}) = p(s_{t+1} \mid s_t, h_t)$. Property (1) of world models can be proven as follows:
\[
p(y_t \mid s_{:t}, a_{:t}, y_{:t-1}) = \sum_{s_{t+1}} p(s_{t+1}, y_t \mid s_{:t}, h_{:t-1}, a_t) \overset{(a)}{=} \sum_{s_{t+1}} p(s_{t+1}, y_t \mid s_t, a_t) = p(y_t \mid s_t, a_t), \tag{23}
\]
where (a) uses condition (1). Property (2) follows from
\[
p(s_{:t} \mid h_{:t-1}, a_{t:}) = \frac{p(s_{:t}, y_{:t-1} \mid a_{:})}{p(y_{:t-1} \mid a_{:})} = \frac{\prod_{\tau=0}^{t-1} p(s_{\tau+1}, y_\tau \mid s_{:\tau}, y_{:\tau-1}, a_{:})}{p(y_{:t-1} \mid a_{:})} \overset{(b)}{=} \frac{\prod_{\tau=0}^{t-1} p(s_{\tau+1}, y_\tau \mid s_{:\tau}, y_{:\tau-1}, a_{:t-1})}{p(y_{:t-1} \mid a_{:t-1})} = \frac{p(s_{:t}, y_{:t-1} \mid a_{:t-1})}{p(y_{:t-1} \mid a_{:t-1})} = p(s_{:t} \mid h_{:t-1}), \tag{24}
\]
where (b) uses condition (1) and the fact that the interface is anticipation-free. Finally, the Markovianity of the state dynamics can be proven as follows:
\[
p(s_{t+1} \mid s_{:t}, h_{:t}) = \frac{p(s_{t+1}, y_t \mid s_{:t}, a_t, h_{:t-1})}{\sum_{s_{t+1}} p(s_{t+1}, y_t \mid s_{:t}, a_t, h_{:t-1})} = \frac{p(s_{t+1}, y_t \mid s_t, a_t)}{\sum_{s_{t+1}} p(s_{t+1}, y_t \mid s_t, a_t)} = p(s_{t+1} \mid s_t, h_t). \tag{25}
\]

Step 2: Equivalence between conditions (1) and (2).

Let's first show that condition (2) implies condition (1). For this, let's first note that, in general, if $I(A; B \mid C) = 0$ holds for some variables $A$, $B$, and $C$, then $p(a \mid c) = p(a \mid b, c)$. Thus, the condition $I(S_{t_i+1:t}, Y_{t_i:t-1}; A_{t:} \mid A_{t_i:t-1}, S_{t_i}) = 0$ implies that
\[
p(s_{t_i+1:\tau}, y_{t_i:\tau-1} \mid a_{t_i:}, s_{t_i}) = p(s_{t_i+1:\tau}, y_{t_i:\tau-1} \mid a_{t_i:\tau-1}, s_{t_i}), \tag{26}
\]
holding for all $\tau \in \mathbb{N}$ with $t_i \leq \tau \leq t$.
Similarly, $I(S_{t+1:}, Y_{t:}; Y_{:t-1}, S_{:t-1}, A_{:t-1} \mid A_{t:}, S_t) = 0$ implies that
\[
p(s_{\tau+1:t+1}, y_{\tau:t} \mid y_{:\tau-1}, s_{:\tau}, a_{:}) = p(s_{\tau+1:t+1}, y_{\tau:t} \mid a_{\tau:}, s_\tau), \tag{27}
\]
holding for all $\tau \in \mathbb{N}$ with $t_i \leq \tau \leq t$. Note that $S_0$ is an element of the past $S_{:t-1}$, so we can multiply these together to obtain
\[
\begin{aligned}
p(s_{t_i+1:\tau}, y_{t_i:\tau-1} \mid a_{t_i:\tau-1}, s_{t_i})\, p(s_{\tau+1:t+1}, y_{\tau:t} \mid a_{\tau:}, s_\tau)
&= p(s_{t_i+1:\tau}, y_{t_i:\tau-1} \mid a_{t_i:}, s_{t_i})\, p(s_{\tau+1:t+1}, y_{\tau:t} \mid y_{t_i:\tau-1}, s_{t_i:\tau}, a_{t_i:}) \\
&= p(s_{t_i+1:t+1}, y_{t_i:t} \mid a_{t_i:}, s_{t_i}).
\end{aligned} \tag{28}
\]
Using this relation recursively, one can find that
\[
\begin{aligned}
p(s_{:t+1}, y_{:t} \mid a_{:}, s_0)
&\overset{(c)}{=} p(s_{:1}, y_0 \mid a_0, s_0)\, p(s_{2:t+1}, y_{1:t} \mid a_{1:}, s_1) \\
&\overset{(d)}{=} p(s_1, y_0 \mid a_0, s_0)\, p(s_2, y_1 \mid s_1, a_1)\, p(s_{3:t+1}, y_{2:t} \mid s_2, a_{2:}) \\
&= \dots = \prod_{\tau=0}^{t} p(s_{\tau+1}, y_\tau \mid s_\tau, a_\tau),
\end{aligned} \tag{29}
\]
where (c) is obtained by using $t_i = 0$ and $\tau = 1$, (d) by using $t_i = 1$ and $\tau = 2$, and so on. By comparing with Lemma 2, this means that condition (2) implies condition (1).

Let us now show that condition (1) implies condition (2). Condition (1) implies that
\[
p(s_{t_i+1:t_f}, y_{t_i:t_f-1} \mid a_{t_i:}, s_{t_i}) = \prod_{\tau=t_i}^{t_f-1} p(s_{\tau+1}, y_\tau \mid s_\tau, a_\tau) = \left( \prod_{j=t}^{t_f-1} p(s_{j+1}, y_j \mid s_j, a_j) \right) \left( \prod_{k=t_i}^{t-1} p(s_{k+1}, y_k \mid s_k, a_k) \right) = p(s_{t+1:t_f}, y_{t:t_f-1} \mid a_{t:t_f-1}, s_t)\, p(s_{t_i+1:t}, y_{t_i:t-1} \mid a_{t_i:t-1}, s_{t_i}). \tag{30}
\]
This can be used to show that
\[
\begin{aligned}
p(s_{t_i+1:t}, y_{t_i:t-1} \mid a_{t_i:t-1}, s_{t_i})
&= \sum_{s_{t+1:t_f},\, y_{t:t_f-1}} p(s_{t+1:t_f}, y_{t:t_f-1} \mid a_{t:t_f-1}, s_t)\, p(s_{t_i+1:t}, y_{t_i:t-1} \mid a_{t_i:t-1}, s_{t_i}) \\
&= \sum_{s_{t+1:t_f},\, y_{t:t_f-1}} p(s_{t_i+1:t_f}, y_{t_i:t_f-1} \mid a_{t_i:}, s_{t_i}) \\
&= p(s_{t_i+1:t}, y_{t_i:t-1} \mid a_{t_i:}, s_{t_i}),
\end{aligned} \tag{31}
\]
which implies $I(S_{t_i+1:t}, Y_{t_i:t-1}; A_{t:} \mid A_{t_i:t-1}, S_{t_i}) = 0$. To prove the second information equality, one can divide both sides of Eq.
(30) by $p(s_{t_i+1:t}, y_{t_i:t-1} \mid a_{t_i:t-1}, s_{t_i})$ to obtain
\[
p(s_{t+1:t_f}, y_{t:t_f-1} \mid a_{t:t_f-1}, s_t) = \frac{p(s_{t_i+1:t_f}, y_{t_i:t_f-1} \mid a_{t_i:}, s_{t_i})}{p(s_{t_i+1:t}, y_{t_i:t-1} \mid a_{t_i:t-1}, s_{t_i})} = p(s_{t+1:t_f}, y_{t:t_f-1} \mid y_{t_i:t-1}, s_{t_i:t}, a_{t_i:}). \tag{32}
\]
Given that $t_i$ and $t_f$ are arbitrary, this implies that $I(S_{t+1:}, Y_{t:}; Y_{:t-1}, S_{:t}, A_{:t-1} \mid A_{t:}, S_t) = 0$.

E Proof of Lemma 3

Lemma 7. An interface is fully observable if and only if $p(y_{t+1} \mid y_{:t}, a_{:}) = p(y_{t+1} \mid y_t, a_t)$, and is memoryless if and only if $p(y_{:t} \mid a_{:}) = \prod_{\tau=0}^{t} p(y_\tau \mid a_\tau)$.

Proof. To prove the first part of the lemma, one can use condition (1) in Lemma 6, which implies that $S_t = Y_t$ yields a transducer if and only if $p(y_\tau, y_{\tau+1} \mid y_{:\tau}, a_{:\tau}) = p(y_\tau, y_{\tau+1} \mid y_\tau, a_\tau) = p(y_{\tau+1} \mid y_\tau, a_\tau)$. To prove the second part of the lemma, note that an interface satisfies $p(y_{:t} \mid a_{:}) = \prod_{\tau=0}^{t} p(y_\tau \mid a_\tau)$ if and only if $S_t = 0$ yields a factorisation of $p(y_{:t} \mid a_{:})$ as in Eq. (4). This shows that $S_t = 0$ is the state of a transducer presentation of $I(Y|A)$ if and only if the interface is memoryless.

F Relationship between transducers and POMDPs

A POMDP is a tuple $(\mathcal{S}, \mathcal{A}, \mathcal{O}, \tau, \mu, \rho)$ in which $\mathcal{S}$ corresponds to the states of the world, $\mathcal{A}$ to the action space, $\mathcal{O}$ to the observation space, and the probability kernels $\tau : \mathcal{S} \times \mathcal{A} \to \Delta(\mathcal{S})$, $\mu : \mathcal{S} \to \Delta(\mathcal{O})$, and $\rho : \mathcal{S} \times \mathcal{A} \to \Delta(\mathbb{R})$ specify the world dynamics, observation map, and reward function (Kaelbling et al., 1998). Under a POMDP, the joint dynamics satisfy Eq. (4), which, thanks to condition (1) in Lemma 6, is sufficient to show that the POMDP induces a transducer. This, together with Lemma 2, implies that the process $S_t$ in a POMDP is a world model, in the sense that it satisfies the conditions in Def. 2. Note also that the standard presentation of POMDPs corresponds to a transducer whose kernel allows the following factorisation:
\[
p(s_{t+1}, y_t \mid s_t, a_t) = \tau(s_{t+1} \mid s_t, a_t)\, \mu(o_t \mid s_t)\, \rho(r_t \mid s_t, a_t). \tag{33}
\]
This corresponds to an I-O Moore transducer, as defined in Sec. 3.3.
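To make this correspondence concrete, the following sketch (toy transition and observation tables of our own choosing, with the reward channel $\rho$ omitted) assembles the induced transducer kernel $\kappa(o, s' \mid s, a) = \mu(o \mid s)\, \tau(s' \mid s, a)$ from a small POMDP and recovers the interface $p(o_{:t} \mid a_{:t})$ by marginalising over hidden state paths:

```python
# A minimal sketch (assumed toy numbers, not from the paper): a POMDP's dynamics
# tau(s'|s,a) and observation map mu(o|s) combine into a single transducer kernel
# kappa(o, s' | s, a) = mu(o|s) * tau(s'|s,a), as in the factorisation of Eq. (33)
# with the reward channel rho dropped.

S, A, O = [0, 1], ["left", "right"], ["black", "red"]

tau = {  # tau[(s, a)][s']: world dynamics
    (0, "left"): {0: 0.9, 1: 0.1}, (0, "right"): {0: 0.3, 1: 0.7},
    (1, "left"): {0: 0.4, 1: 0.6}, (1, "right"): {0: 0.1, 1: 0.9},
}
mu = {0: {"black": 0.8, "red": 0.2}, 1: {"black": 0.25, "red": 0.75}}  # mu[s][o]

def kappa(o, s_next, s, a):
    """Transducer kernel induced by the POMDP (I-O Moore form, rewards dropped)."""
    return mu[s][o] * tau[(s, a)][s_next]

# Each (s, a) slice of kappa is a joint pmf over (o, s'):
for s in S:
    for a in A:
        assert abs(sum(kappa(o, sn, s, a) for o in O for sn in S) - 1.0) < 1e-12

def interface_prob(obs, acts, p0=None):
    """p(o_{0:t} | a_{0:t}): sum the kernel over all hidden state paths."""
    alpha = dict(p0 or {0: 0.5, 1: 0.5})  # joint weight of (obs so far, current state)
    for o, a in zip(obs, acts):
        alpha = {sn: sum(w * kappa(o, sn, s, a) for s, w in alpha.items()) for sn in S}
    return sum(alpha.values())

print(round(interface_prob(["black", "red"], ["left", "right"]), 5))  # 0.16825
```

The running dictionary `alpha` is exactly the unnormalised version of the Bayesian belief $b_t$ discussed around Eq. (45); normalising it at each step recovers the filtering recursion.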
The different types of transducers are illustrated in Figure 5.

[Figure 5: Illustration of different types of transducers: (a) Mealy transducers, $\kappa_\tau(y, \tilde{s} \mid a, s)$; (b) output-Moore transducers, $\kappa_\tau(y, \tilde{s} \mid a, s) = \mu_\tau(y \mid a, s)\, \nu_\tau(\tilde{s} \mid a, s)$; (c) input-Moore transducers, $\kappa_\tau(y, \tilde{s} \mid a, s) = \mu_\tau(y \mid s)\, \nu_\tau(\tilde{s} \mid y, a, s)$; (d) I-O Moore transducers, $\kappa_\tau(y, \tilde{s} \mid a, s) = \mu_\tau(y \mid s)\, \nu_\tau(\tilde{s} \mid a, s)$.]

G Proof of Lemma 4

Proof. Let us consider $S_t = H_{:t-1}$ and $S_0 = 0$. To prove that $S_t$ yields a transducer presentation of $I(Y|A)$, we will use condition (1) from Lemma 6. For this, note that
\[
p(s_{t+1}, y_t \mid s_{:t}, h_{:t-1}, a_{t:}) = p(s_{t+1}, y_t \mid h_{:t-1}, a_{t:}) = p(s_{t+1} \mid h_{:t}, a_{t+1:})\, p(y_t \mid h_{:t-1}, a_{t:}). \tag{34}
\]
Let us develop each of those terms separately. First, one can find that
\[
p(s_{t+1} \mid h_{:t}, a_{t+1:}) = p(s_{t+1} \mid s_t, h_t, a_{t+1:}) = \delta^{(s_t, h_t)}_{s_{t+1}} = p(s_{t+1} \mid s_t, h_t), \tag{35}
\]
where $\delta^b_a$ is the Kronecker delta. Similarly, the second term can be developed as follows:
\[
p(y_t \mid h_{:t-1}, a_{t:}) = \frac{p(y_{:t} \mid a_{:})}{p(y_{:t-1} \mid a_{:})} \overset{(a)}{=} \frac{p(y_{:t} \mid a_{:t})}{p(y_{:t-1} \mid a_{:t})} = p(y_t \mid y_{:t-1}, a_{:t}) = p(y_t \mid s_t, a_t), \tag{36}
\]
where (a) uses the fact that the interface is anticipation-free. Finally, by combining Eq. (34), Eq. (35), and Eq. (36) one finds that
\[
p(s_{t+1}, y_t \mid s_{:t}, h_{:t-1}, a_{t:}) = p(s_{t+1} \mid s_t, h_t)\, p(y_t \mid s_t, a_t) = p(s_{t+1}, y_t \mid s_t, a_t), \tag{37}
\]
which shows that condition (1) from Lemma 6 is satisfied.

H Proof of Lemma 5

Proof. Consider $S'_t = \varphi(S_t)$, a reduction of the memory state $S_t$ of a transducer.
Then
\[
\begin{aligned}
p(y_{:t}, s'_{:t+1} \mid a_{:})
&= \sum_{\substack{s_{:t+1} \\ \varphi(s_\tau) = s'_\tau \,\forall \tau}} p(y_{:t}, s_{:t+1} \mid a_{:}) \\
&\overset{(a)}{=} \sum_{\substack{s_{:t+1} \\ \varphi(s_\tau) = s'_\tau}} p(s_0) \prod_{\tau=0}^{t} p(y_\tau \mid s_\tau, a_\tau)\, p(s_{\tau+1} \mid s_\tau, h_\tau) \\
&\overset{(b)}{=} \sum_{\substack{s_{:t+1} \\ \varphi(s_\tau) = s'_\tau}} p(s_0) \prod_{\tau=0}^{t} p(y_\tau \mid s'_\tau, a_\tau)\, p(s_{\tau+1} \mid s_\tau, h_\tau) \\
&= \sum_{\substack{s_{:t} \\ \varphi(s_\tau) = s'_\tau}} p(s_0) \prod_{\tau=0}^{t-1} p(y_\tau \mid s'_\tau, a_\tau)\, p(s_{\tau+1} \mid s_\tau, h_\tau)\; p(y_t \mid s'_t, a_t) \sum_{\substack{s_{t+1} \in \mathcal{S} \\ \varphi(s_{t+1}) = s'_{t+1}}} p(s_{t+1} \mid s_t, h_t) \\
&\overset{(c)}{=} \sum_{\substack{s_{:t} \\ \varphi(s_\tau) = s'_\tau}} p(s_0) \prod_{\tau=0}^{t-1} p(y_\tau \mid s'_\tau, a_\tau)\, p(s_{\tau+1} \mid s_\tau, h_\tau)\; p(y_t \mid s'_t, a_t)\, p(s'_{t+1} \mid s'_t, h_t) \\
&\overset{(d)}{=} \dots = \Bigg[ \sum_{\substack{s_0 \in \mathcal{S} \\ \varphi(s_0) = s'_0}} p(s_0) \Bigg] \prod_{\tau=0}^{t} p(y_\tau \mid s'_\tau, a_\tau)\, p(s'_{\tau+1} \mid s'_\tau, h_\tau) \\
&\overset{(e)}{=} p(s'_0) \prod_{\tau=0}^{t} p(y_\tau \mid s'_\tau, a_\tau)\, p(s'_{\tau+1} \mid s'_\tau, h_\tau).
\end{aligned} \tag{38}
\]
Above, (a) uses that $S_t$ is the memory state of a transducer, (b) uses property (i) of homomorphisms (see Def. 4), (c) and (e) use property (ii) of homomorphisms, and (d) denotes that the same steps of the previous equations are applied iteratively. Finally, Eq. (38) together with Lemma 1 confirm that $S'_t$ yields a valid transducer for the same interface.

I Proof of Prop. 1

Proof. Let's first assume that the mapping $\varphi$ induces a reduction of the world model $S_t$ into $S'_t$, and define the equivalence relation $B$ such that $s \sim s'$ when $\varphi(s) = \varphi(s')$. In this setting, let's prove that $B$ is a bisimulation. For this, one can note that if $s \sim s'$ then one can use the first property of homomorphisms to find that
\[
\Pr(Y_t = y \mid S_t = s, A_t = a) = \Pr(Y_t = y \mid S'_t = \varphi(s), A_t = a) = \Pr(Y_t = y \mid S'_t = \varphi(s'), A_t = a) = \Pr(Y_t = y \mid S_t = s', A_t = a). \tag{39}
\]
Additionally, using the second property one finds that
\[
\sum_{s'' \in [\tilde{s}]} \Pr\big(S_{t+1} = s'' \mid S_t = s, H_t = (y, a)\big) = \Pr\big(S'_{t+1} = \tilde{s} \mid S'_t = \varphi(s), H_t = (y, a)\big) = \Pr\big(S'_{t+1} = \tilde{s} \mid S'_t = \varphi(s'), H_t = (y, a)\big) = \sum_{s'' \in [\tilde{s}]} \Pr\big(S_{t+1} = s'' \mid S_t = s', H_t = (y, a)\big), \tag{40}
\]
where $[\tilde{s}] = \{s \in \mathcal{S} : \varphi(s) = \tilde{s}\}$. Together, these two results show that $B$ is a bisimulation in the sense of Def. 5.
For proving the converse statement, let's assume that $B \subseteq \mathcal{S} \times \mathcal{S}$ is a bisimulation, and define $\varphi(s) = [s]$ as the function that maps each state $s \in \mathcal{S}$ into its equivalence class according to $B$. Let's prove that $S_t \overset{\varphi}{\to} \varphi(S_t) = [S_t]$ is a reduction. First, $B$ being a bisimulation implies that $\Pr(Y_t = y \mid S_t = s, A_t = a) = \Pr(Y_t = y \mid S_t = s', A_t = a)$ for any $(s, s') \in B$, which in turn implies that
\[
\Pr\big(Y_t = y \mid \varphi(S_t) = [s], A_t = a\big) = \Pr\big(Y_t = y \mid S_t = s, A_t = a\big), \tag{41}
\]
showing that $\varphi$ satisfies the first property of homomorphisms. Furthermore, if $(s, s') \in B$ then
\[
\Pr\big(\varphi(S_{t+1}) = [\tilde{s}] \mid S_t = s, H_t = (y, a)\big) = \sum_{s'' \in [\tilde{s}]} \Pr\big(S_{t+1} = s'' \mid S_t = s, H_t = (y, a)\big) = \sum_{s'' \in [\tilde{s}]} \Pr\big(S_{t+1} = s'' \mid S_t = s', H_t = (y, a)\big) = \Pr\big(\varphi(S_{t+1}) = [\tilde{s}] \mid S_t = s', H_t = (y, a)\big), \tag{42}
\]
which implies that
\[
\Pr\big(\varphi(S_{t+1}) = [\tilde{s}] \mid S_t = s, H_t = (y, a)\big) = \Pr\big(\varphi(S_{t+1}) = [\tilde{s}] \mid \varphi(S_t) = [s], H_t = (y, a)\big). \tag{43}
\]
Using this, one can finally show that
\[
\Pr\big(\varphi(S_{t+1}) = [\tilde{s}] \mid \varphi(S_t) = [s], H_t = (y, a)\big) = \sum_{s'' \in [\tilde{s}]} \Pr\big(S_{t+1} = s'' \mid \varphi(S_t) = [s], H_t = (y, a)\big), \tag{44}
\]
which establishes the second property of homomorphisms.

J Algorithms to reduce a transducer

For a given transducer with a finite number of possible memory states, one can reduce the corresponding world model as follows:

1. Compute a singular value decomposition $U_m = U \Lambda V^\intercal$, where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are unitary matrices of singular vectors and $\Lambda \in \mathbb{R}^{m \times n}$ is a diagonal matrix with $\mathrm{Rank}(V_m) = r$ non-zero elements.
2. Collect the $r$ left singular vectors associated with non-zero singular values, and create the matrix $C = [v_1, \ldots, v_r] \in \mathbb{R}^{n \times r}$.
3. Use $C$ as a transformation matrix to define the new world states, and calculate the resulting quasi-stochastic matrices.

It can be shown that the resulting representation is minimal as in Def. 4. For more details on this procedure, see (Balasubramanian, 1993, Sec. 3) and also (Huang et al., 2015, Algorithm 1).

K Proof of Prop. 2

Proof. To prove the first part of the proposition, let's consider the Bayesian beliefs of the memory state $S_t$ of a transducer $(S_t, Y_t, A_t)$, given by $b_t(s_t) = p(s_t \mid h_{:t-1})$. Let's also define the postdictive beliefs as $d_t(s_t) = p(s_t \mid h_{:t})$. Then, their update can be calculated as follows:
\[
b_{t+1}(s_{t+1}) = \sum_{s_t} p(s_{t+1} \mid s_t, h_{:t})\, p(s_t \mid h_{:t}) \overset{(i)}{=} \sum_{s_t} p(s_{t+1} \mid s_t, h_t)\, d_t(s_t), \tag{45}
\]
where (i) uses the Markovianity of the memory states following Lemma 2. In a similar way, one can find that
\[
d_t(s_t) = \frac{p(s_t, h_{:t-1}, a_t, y_t)}{p(h_{:t-1}, a_t, y_t)} = \frac{p(y_t \mid s_t, h_{:t-1}, a_t)\, p(s_t \mid h_{:t-1}, a_t)}{p(y_t \mid h_{:t-1}, a_t)} \overset{(i)}{=} \frac{p(y_t \mid s_t, a_t)}{Z'}\, b_t(s_t), \tag{46}
\]
with $Z'$ a normalising constant. Above, (i) uses that $S_t$ is a world model together with the fact that
\[
p(s_t \mid h_{:t-1}, a_t) = \frac{p(s_t, h_{:t-1}, a_t)}{p(h_{:t-1}, a_t)} = \frac{p(a_t \mid s_t, h_{:t-1})\, p(s_t \mid h_{:t-1})}{p(a_t \mid h_{:t-1})} = p(s_t \mid h_{:t-1}), \tag{47}
\]
where the last equality holds because actions depend on histories and not on states, and hence $p(a_t \mid s_t, h_{:t-1}) = p(a_t \mid h_{:t-1})$. Then, combining Eq. (45) and Eq. (46), one can find that
\[
b_{t+1}(s_{t+1}) = \hat{f}(b_t, y_t, a_t) := \frac{1}{Z'} \sum_{s_t} p(s_{t+1} \mid s_t, h_t)\, p(y_t \mid s_t, a_t)\, b_t(s_t), \tag{48}
\]
proving the first part of the proposition.

To prove the second part of the proposition, first note that Eq. (48) combined with condition (1) in Lemma 6 implies that $(B_t, A_t, Y_t)$ is a belief transducer with unifilar kernel given by
\[
\hat{\kappa}_\tau(y, \tilde{b} \mid a, b) = \delta^{\tilde{b}}_{\hat{f}_\tau(b, y, a)} \sum_{\tilde{s}, s \in \mathcal{S}} \kappa_\tau(y, \tilde{s} \mid a, s)\, b(s) = \delta^{\tilde{b}}_{\hat{f}_\tau(b, y, a)} \left( \mathbf{1}^\intercal \cdot T^{(y|a)}_\tau \cdot b \right), \tag{49}
\]
where $T^{(y|a)}_\tau$ is the linear operator defined as in Eq. (3), and the second term corresponds to the probability of emitting $y$ given $b$, i.e.
\[
\mathbf{1}^\intercal \cdot T^{(y_\tau|a_\tau)}_\tau \cdot b_\tau = \sum_{s_\tau \in \mathcal{S}} p(y_\tau \mid s_\tau, a_\tau)\, b_\tau(s_\tau). \tag{50}
\]
Now, let's consider a transducer $(S_t, A_t, Y_t)$ and the belief transducer of predictive Bayesian beliefs $(B_t, A_t, Y_t)$. Given that $(S_t, A_t, Y_t)$ is a transducer, the update rule given by Eq.
(48) can be re-written as
\[
\hat{f}_\tau(b, y, a) = \frac{T^{(y|a)}_\tau \cdot b}{\mathbf{1}^\intercal \cdot T^{(y|a)}_\tau \cdot b}. \tag{51}
\]
Moreover, for given sequences $y_{:\tau}, a_{:\tau}$ and beliefs $b_0, \ldots, b_\tau$ following this updating rule, this can be applied recursively, yielding
\[
\hat{f}_\tau(b_\tau, y_\tau, a_\tau) = \frac{T^{(y_\tau|a_\tau)}_\tau \cdot b_\tau}{\mathbf{1}^\intercal \cdot T^{(y_\tau|a_\tau)}_\tau \cdot b_\tau} = \frac{T^{(y_{\tau-1:\tau}|a_{\tau-1:\tau})}_{\tau-1} \cdot b_{\tau-1}}{\mathbf{1}^\intercal \cdot T^{(y_{\tau-1:\tau}|a_{\tau-1:\tau})}_{\tau-1} \cdot b_{\tau-1}} = \dots = \frac{T^{(y_{:\tau}|a_{:\tau})}_0 \cdot b_0}{\mathbf{1}^\intercal \cdot T^{(y_{:\tau}|a_{:\tau})}_0 \cdot b_0}, \tag{52}
\]
where we use $T^{(y_{t:t'}|a_{t:t'})}_t := \prod_{\tau=t}^{t'} T^{(y_\tau|a_\tau)}_\tau$ as shorthand notation. Note that, similarly to Eq. (49), the denominator in Eq. (52) corresponds to
\[
\mathbf{1}^\intercal \cdot T^{(y_{:\tau}|a_{:\tau})}_0 \cdot b_0 = \sum_{s_0} p(y_{:\tau} \mid a_{:\tau}, s_0)\, b_0(s_0). \tag{53}
\]
Finally, combining all these expressions one can directly calculate the result of successive applications of the kernel of a predictive Bayesian belief transducer:
\[
\begin{aligned}
\sum_{b_{:t+1}} \prod_{\tau=0}^{t} \hat{\kappa}_\tau(y_\tau, b_{\tau+1} \mid a_\tau, b_\tau)
&\overset{(a)}{=} \sum_{b_{:t+1}} \prod_{\tau=0}^{t} \delta^{b_{\tau+1}}_{\hat{f}(b_\tau, y_\tau, a_\tau)} \left( \mathbf{1}^\intercal \cdot T^{(y_\tau|a_\tau)}_\tau \cdot b_\tau \right) \\
&\overset{(b)}{=} \prod_{\tau=0}^{t} \mathbf{1}^\intercal \cdot T^{(y_\tau|a_\tau)}_\tau \cdot \frac{T^{(y_{:\tau-1}|a_{:\tau-1})}_0 \cdot b_0}{\mathbf{1}^\intercal \cdot T^{(y_{:\tau-1}|a_{:\tau-1})}_0 \cdot b_0} \\
&= \mathbf{1}^\intercal \cdot T^{(y_{:t}|a_{:t})}_0 \cdot b_0 \overset{(c)}{=} \sum_{s_0} p(y_{:t} \mid a_{:t}, s_0)\, b_0(s_0).
\end{aligned} \tag{54}
\]
Above, (a) uses Eq. (49), (b) resolves the Dirac delta with the summation and uses Eq. (52), and (c) uses Eq. (53). To conclude, one can notice that if $b_0(s_0) = \Pr(S_0 = s_0)$, then this shows that $(S_t, A_t, Y_t)$ and $(B_t, A_t, Y_t)$ generate the same interface.

L Proof of Prop. 3

Before the proof, let us note that the kernel of I-O Moore transducers can be re-organised as
\[
p(y_{:t}, s_{:t+1} \mid a_{:t}) = p(s_0) \prod_{\tau=0}^{t} \kappa_\tau(y_\tau, s_{\tau+1} \mid s_\tau, a_\tau) = p(s_0) \prod_{\tau=0}^{t} \mu_\tau(y_\tau \mid s_\tau)\, \nu_\tau(s_{\tau+1} \mid s_\tau, a_\tau) = p(s_0) \prod_{\tau=0}^{t} \kappa^S_\tau(y_\tau, s_\tau \mid s_{\tau-1}, a_{\tau-1}), \tag{55}
\]
where $\kappa^S_\tau$ is a 'time-shifted' kernel defined as
\[
\kappa^S_\tau(y, \tilde{s} \mid s, a) := \begin{cases} \mu_\tau(y \mid \tilde{s})\, \nu_{\tau-1}(\tilde{s} \mid s, a) & \text{if } \tau \geq 1, \\ \mu_0(y \mid s) & \text{if } \tau = 0. \end{cases}
\]
This means that I-O Moore transducers yield two associated kernels: the standard one $\kappa$ and the time-shifted one $\kappa^S$, and, crucially, both generate the interface.
Following Eq. (3), let’s defined the time-shifted linear operators T ′ (y|a) τ : = n X i=1 n X j=1 κ S τ (y,s i |a,s j )e i e ⊺ j ,(56) with the understanding that forτ= 0then T ′ (y|a) 0 : = n X i=1 n X j=1 μ 0 (y|s j )e i e ⊺ j .(57) Proof.To derive the update rule of postdictive beliefs, one can combine Eq. (45) and Eq. (46) to find that d t+1 (s t+1 ) = p(y t+1 |s t+1 ,a t+1 ) Z ′ X s t p(s t+1 |s t ,h t )d t (s t ).(58) Furthermore, if the transformer is I-O Moore, then p(y t+1 |s t+1 ,a t+1 ) =p(y t+1 |s t+1 )andp(s t+1 |s t ,h t ) =p(s t+1 |s t ,a t ).(59) Using these relationships in Eq. (58) directly yields the desired update rule. To prove the second part of the proposition, let us first note for I-O Moore transducers then p(y t+1 |s t+1 ) =μ t+1 (y t+1 |s t+1 )andp(s t+1 |s t ,a t ) =ν t (s t+1 |s t ,a t ). Using this, the update rule can be re-written as d τ =f ′ τ (y τ ,a τ−1 ,d τ−1 )withf ′ given by h f ′ τ (y,a,d) i (s) : = μ τ (y|s) Z ′ X s ′ ν τ−1 (s|s ′ ,a)d(s ′ ) = 1 Z ′ X s ′ κ S τ (y,s|a,s ′ )d(s ′ ).(60) Moreover, comparing this with Eq. (48), one can find that f ′ τ (y,a,d) = T ′ (y|a) τ ·d 1 ⊺ ·T ′ (y|a) τ ·d ,(61) and following a similar derivation one finds that for sequencesy :τ ,a :τ and beliefs d 0 ,...,d τ f ′ τ (y τ ,a τ−1 ,d τ−1 ) = T ′ (y τ |a τ−1 ) τ ·d τ−1 1 ⊺ ·T ′ (y τ |a τ−1 ) τ ·d τ−1 =... = T (y :τ |a :τ−1 ) 0 ·d 0 1 ⊺ ·T (y :τ |a :τ−1 ) 0 ·d 0 ,(62) where we useT ′ (y t:t ′ |a t−1:t ′ −1 ) t : = Q t ′ τ=t T ′ (y τ |a τ−1 ) τ as a shorthand notation. With these tools at hand, let’s note that Eq. (9) combined with condition (1) in Lemma 6 imply that (D t ,A t ,Y t )is a belief transducer with unifilar kernel given by ˆκ S τ (y, ̃ d|a,d) =δ f ′ τ (y,a,d) ̃ d X s∈S κ S τ (y, ̃s|a,s)d(s) =δ f ′ τ (y,a,d) ̃ d 1 ⊺ ·T ′ (y|a) τ ·d ,(63) being analogous to the result found in Eq. (49). 
Then, combining all these expressions, one can directly calculate the result of successive applications of the kernel of a postdictive Bayesian belief transducer:
$$\sum_{d_{:t}} \prod_{\tau=0}^{t} \hat\kappa^S_\tau(y_\tau,d_\tau|a_{\tau-1},d_{\tau-1})
\overset{(a)}{=} \sum_{d_{:t}} \prod_{\tau=0}^{t} \delta^{d_\tau}_{f'(y_\tau,a_{\tau-1},d_{\tau-1})}\; \mathbf 1^\top \cdot T'^{(y_\tau|a_{\tau-1})}_\tau \cdot d_{\tau-1}
\overset{(b)}{=} \prod_{\tau=0}^{t} \mathbf 1^\top \cdot T'^{(y_\tau|a_{\tau-1})}_\tau \cdot \frac{T'^{(y_{:\tau-1}|a_{:\tau-2})}_0 \cdot d_0}{\mathbf 1^\top \cdot T'^{(y_{:\tau-1}|a_{:\tau-2})}_0 \cdot d_0}
= \mathbf 1^\top \cdot T'^{(y_{:t}|a_{:t-1})}_0 \cdot d_0
= \sum_{s_0} p(y_{:t}|a_{:t-1},s_0)\, d_0(s_0). \tag{64}$$
Above, (a) uses Eq. (63), and (b) resolves the Dirac delta with the summation and uses Eq. (62). To conclude, note that this shows that $(S_t,A_t,Y_t)$ and $(D_t,A_t,Y_t)$ generate the same interface when $d_0(s_0) = \Pr(S_0=s_0)$.

Before finishing, note that Cor. 1 follows directly from recognising Eq. (45) and Eq. (46) as the 'predict' and 'update' steps in Kalman and Bayesian filtering (Särkkä & Svensson, 2023), corresponding to
$$b_{t-1} = p(s_{t-1}|h_{:t-1}) \;\xrightarrow{\ \text{predict}\ }\; d_t = p(s_t|h_{:t-1}) \;\xrightarrow{\ \text{update}\ }\; b_t = p(s_t|h_{:t}). \tag{65}$$

M Proof of Prop. 4

This result was originally proven in (Barnett & Crutchfield, 2015, Prop. 2). Here we provide an alternative proof that leverages the link established between transducer homomorphisms and bisimulations.

Proof. We first show that the equivalence class defined by Eq. (10) is a bisimulation of the world model $S_t = H_{:t-1}$; that is, the coarse-graining defined by Eq. (10) satisfies the two properties outlined in Def. 5. Condition (i) follows directly from Eq. (10), since it only considers futures of length $L=1$. Condition (ii) follows from the fact that the dynamics of the equivalence classes are conditionally Markovian given the actions, which has been shown in (Barnett & Crutchfield, 2015, Prop. 6).
With this, the desired result follows directly from noting that $S_t = H_{:t-1}$ is always a valid transducer of $I(Y|A)$ (Lemma 4), and that the bisimulation of a transducer always yields a valid transducer (Prop. 1) that generates the same interface (Lemma 5).

N Proof of Theorem 3

This proof is a direct extension of (Barnett & Crutchfield, 2015, Lemma 1), which focuses on 'rival partitions' rather than predictive transducers. The core idea of the proof is that, for a given predictive transducer with memory $R_t \in \mathcal R$, one can build an equivalence relation on $\mathcal R\times\mathcal R$ induced by a coarse-graining map $\varepsilon'$ defined as
$$\varepsilon'_t(r) = \varepsilon'_t(r') \iff \Pr(Y_{t:t'}|A_{t:t'},R_t=r) = \Pr(Y_{t:t'}|A_{t:t'},R_t=r') \quad \forall t'\ge t. \tag{66}$$
Then, one can show that if $R_t$ is a predictive world model, then $\varepsilon'_t(R_t)$ is isomorphic to the memory states of the $\varepsilon$-transducer. The full proof is given next, after presenting the following lemma.

Lemma 8. A predictive transducer $(S_t,A_t,Y_t)$ satisfies
$$p(y_{t:t'}|a_{t:t'},s_t) = p(y_{t:t'}|a_{t:t'},s_t,h_{:t-1}) = p(y_{t:t'}|a_{t:t'},h_{:t-1}) \quad \forall t'\ge t. \tag{67}$$

Proof. By definition (see Sec. 5.1), a predictive transducer has memory states $S_t$ that satisfy the condition $I(Y_{t:};S_t|H_{:t-1},A_{t:}) = 0$, which implies that for all $t'\ge t$
$$p(y_{t:t'}|a_{t:t'},s_t,h_{:t-1}) \overset{(i)}{=} p(y_{t:t'}|a_{t:},s_t,h_{:t-1}) = p(y_{t:t'}|a_{t:},h_{:t-1}) \overset{(i)}{=} p(y_{t:t'}|a_{t:t'},h_{:t-1}), \tag{68}$$
holding whenever $p(y_{t:t'},s_t|a_{t:t'},h_{:t-1}) \ne 0$, i.e. whenever these events are compatible. Above, both equalities marked (i) use the fact that the interface and the world model are non-anticipatory.
Additionally, using the properties of transducers one can show that
$$p(y_{t:t'}|a_{t:t'},s_t,h_{:t-1})
= \sum_{s_{t+1:t'+1}} p(y_{t:t'},s_{t+1:t'+1}|a_{t:t'},s_t,h_{:t-1})
= \sum_{s_{t+1:t'+1}} \prod_{\tau=t}^{t'} p(y_\tau,s_{\tau+1}|a_{t:t'},s_{t:\tau},y_{t:\tau-1},h_{:t-1})
\overset{(i)}{=} \sum_{s_{t+1:t'+1}} \prod_{\tau=t}^{t'} p(y_\tau,s_{\tau+1}|a_{t:t'},s_{t:\tau},y_{t:\tau-1})
= \sum_{s_{t+1:t'+1}} p(y_{t:t'},s_{t+1:t'+1}|a_{t:t'},s_t)
= p(y_{t:t'}|s_t,a_{t:t'}) \tag{69}$$
for all $t' > t$, where (i) uses condition (1) from Lemma 6. Finally, combining Eq. (68) and Eq. (69) gives the desired result.

Proof of Theorem 3. Let $S_t$ be the memory state of a transducer. Let us first note that
$$p(y_{t:t'}|a_{t:t'},h_{:t-1}) = \sum_{j\in J} \alpha_j\, p(y_{t:t'}|a_{t:t'},s_t,h_{:t-1}), \tag{70}$$
where $\alpha_j = p(s_t|h_{:t-1},a_{t:t'}) = p(s_t|h_{:t-1})$, with the second equality holding because $S_t$ is a world model (property (2) in Def. 2). Above, we use a suitable set of indices $J$ corresponding to the possible values of $s_t$, which satisfy $\sum_{j\in J}\alpha_j = \sum_{s_t\in\mathcal S} p(s_t|h_{:t-1}) = 1$. Given that the Shannon entropy is concave, Eq. (70) implies
$$H\big[p(y_{t:t'}|a_{t:t'},h_{:t-1})\big] \ge \sum_{j\in J} \alpha_j\, H\big[p(y_{t:t'}|a_{t:t'},s_t,h_{:t-1})\big], \tag{71}$$
where $H[p]$ is shorthand notation for the entropy of a variable with distribution $p$. Moreover, given that $H$ is strictly concave, Eq. (71) turns into an equality if and only if all $\alpha_j$'s are either 0 or 1 for all $j\in J$.

Now, let $S_t$ be the memory state of a predictive transducer and $M_t = \varepsilon(H_{:t-1})$ the memory state of the $\varepsilon$-transducer, as determined by the coarse-graining mapping defined in Eq. (10). Note that $(M_t,A_t,Y_t)$ is a predictive transducer, and hence Lemma 8 can be used, yielding
$$p\big(y_{t:t'}|a_{t:t'},\varepsilon(h_{:t-1})\big) = p\big(y_{t:t'}|a_{t:t'},\varepsilon(h_{:t-1}),h_{:t-1}\big) = p(y_{t:t'}|a_{t:t'},h_{:t-1}) \tag{72}$$
for all $t' > t$.
Then, using Lemma 8 this time on $(S_t,A_t,Y_t)$, one can find that
$$\Pr\big(Y_{t:t'}|A_{t:t'},M_t=\varepsilon(h_{:t-1})\big)
= \Pr(Y_{t:t'}|A_{t:t'},H_{:t-1}=h_{:t-1})
= \Pr(Y_{t:t'}|A_{t:t'},S_t=s_t,H_{:t-1}=h_{:t-1})
= \Pr(Y_{t:t'}|A_{t:t'},S_t=s_t). \tag{73}$$
This implies that for each equivalence class $\varepsilon_t(h_{:t-1}) = [h_{:t-1}]_{\sim_\varepsilon}$ there exists at least one $s_t\in\mathcal S$ such that $\Pr(Y_{t:t'}|A_{t:t'},M_t=\varepsilon(h_{:t-1})) = \Pr(Y_{t:t'}|A_{t:t'},S_t=s_t)$. Moreover, the previous equalities imply that if $S_t$ is a predictive transducer, then Eq. (71) necessarily becomes an equality. This, in turn, implies that $\alpha_j = p(s_t|h_{:t-1})$ is 1 for those $s_t\in\mathcal S$ for which
$$\Pr(Y_{t:t'}|A_{t:t'},S_t=s_t) = \Pr(Y_{t:t'}|A_{t:t'},S_t=s_t,H_{:t-1}=h_{:t-1}) = \Pr(Y_{t:t'}|A_{t:t'},H_{:t-1}=h_{:t-1}) \tag{74}$$
holds, and 0 otherwise. This implies that the mapping $\varepsilon'$ given by
$$\varepsilon'(s_t) = \varepsilon'(s'_t) \iff \Pr(Y_{t:t'}|A_{t:t'},S_t=s_t) = \Pr(Y_{t:t'}|A_{t:t'},S_t=s'_t) \quad \forall t'\ge t \tag{75}$$
satisfies $\varepsilon'_t(S_t) = \varepsilon_t(H_{:t-1}) = M_t$.

O Comparing the reduction of general vs predictive transducers

Building upon the discussion of the canonical dimension of a transducer (see Eq. (5)), let us focus on transducers with finite memory (i.e. $|\mathcal S| = n$) and consider the matrix $W_t$ whose columns are the vectors $w(h_{:t})\in\mathbb R^n$ of probabilities of generating $y_{:t}$ given $a_{:t}$ when starting from different world states, so that its $k$-th coordinate is $[w(h_{:t})]_k = \Pr(Y_{:t}=y_{:t}|A_{:t}=a_{:t},S_0=s_k)$, for all possible sequences with $t=n-1$ (see (Cakir et al., 2021, Prop. 4.3)). Then, the coarse-graining $\varepsilon$ defined by Eq. (10) corresponds to merging together all rows of $W_t$ that are equal. In contrast, the canonical dimension $d(T)$ defined in Eq. (5) corresponds to the number of linearly independent rows. The crucial point is that, if a transducer with memory states $S_t$ is predictive, then any coarse-graining $\varepsilon(S_t)$ will also be predictive. However, reductions via more general procedures that trim linearly dependent components may not be attainable via coarse-grainings.
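The two notions of reduction can be contrasted on a toy matrix. Below, a hypothetical $W_t$ (illustrative numbers, not derived from any transducer in the paper) has four rows: two rows are identical, so coarse-graining merges them into three states, while a further row is a linear combination of the others, so the canonical dimension is only two.

```python
# Hypothetical W_t: rows index world states, columns index histories; entry
# (k, h) plays the role of Pr(Y_{:t} = y_{:t} | A_{:t} = a_{:t}, S_0 = s_k).
W = [
    [0.10, 0.20, 0.30],   # state 0
    [0.10, 0.20, 0.30],   # state 1: equal to state 0 -> merged by coarse-graining
    [0.40, 0.10, 0.20],   # state 2
    [0.25, 0.15, 0.25],   # state 3: average of states 0 and 2 -> linearly dependent
]

# Coarse-graining (Eq. (10)): merge states whose rows are identical.
n_coarse = len({tuple(row) for row in W})

def rank(rows, tol=1e-12):
    """Number of linearly independent rows (canonical dimension, Eq. (5)),
    computed via Gaussian elimination with partial pivoting."""
    rows = [list(r) for r in rows]
    rk, col, n = 0, 0, len(rows[0])
    while rk < len(rows) and col < n:
        pivot = max(range(rk, len(rows)), key=lambda i: abs(rows[i][col]))
        if abs(rows[pivot][col]) < tol:
            col += 1
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][col] / rows[rk][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk, col = rk + 1, col + 1
    return rk

n_linear = rank(W)
assert n_coarse == 3   # merging equal rows leaves three states
assert n_linear == 2   # but only two rows are linearly independent
```

Coarse-graining can only merge identical rows, so it stops at three states, whereas the linear-algebraic reduction reaches the canonical dimension of two; as discussed above, only the former is guaranteed to preserve predictiveness.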
In particular, the matrix $W_t$ of an $\varepsilon$-transducer may have linearly dependent rows, and reducing those would (due to Cor. 2) necessarily make the transducer stop being predictive. It is interesting to note that the predictive beliefs of the $\varepsilon$-transducer are isomorphic to the states of the $\varepsilon$-transducer. However, the $\varepsilon$-transducer is not the only machine whose MSP produces the $\varepsilon$-transducer; in fact, the MSP of any transducer without redundant states will produce the same. Finally, it is worth noting that a non-predictive finite world model may have an associated $\varepsilon$-transducer with infinitely many states, filling the simplex of belief states with intricate fractal patterns (Jurgens & Crutchfield, 2021b). This fact makes techniques for studying transducers at various degrees of resolution (e.g., via approximate homomorphisms (Taylor et al., 2008) or bisimulation (Girard & Pappas, 2011), or using rate-distortion trade-offs (Marzen & Crutchfield, 2016)) particularly important.

P A canonical retrodictive world model

One can show that every anticipation-free transducer has a retrodictive transducer that is in a sense dual to Funes the memorious (Lemma 4), and can be described as Sunef the prophet, endowed with knowledge of all possible sequences of future actions. To build the world model of this transducer, let us first denote by $T_{\mathcal A}$ the regular tree with one root in which each node has one branch per element of $\mathcal A$. Let us denote by $N(T_{\mathcal A})$ the nodes of the tree, and establish some operations:

• $\mu: N(T_{\mathcal A})\to\mathcal A^*$ and $\nu: N(T_{\mathcal A})\to N(T_{\mathcal A})^*$, with $(\cdot)^*$ the Kleene operator, where $\mu(v)$ and $\nu(v)$ return a vector with all the branches and nodes, respectively, on the path leading back from $v$ to the root.
• $\pi: N(T_{\mathcal A})\times\mathcal A\to N(T_{\mathcal A})$, where $\pi(v,a)$ gives the descendant of $v$ connected via branch $a$.
• $\tau: N(T_{\mathcal A})\to\mathbb N$, where $\tau(v)$ is the depth of $v$ in the tree.

With all this, we are ready to define the world model. In general, $S_t\in\mathcal Y^{T_{\mathcal A}}$ are random variables that take values on $T_{\mathcal A}$-shaped arrays of symbols in $\mathcal Y$.
Concretely, $S_0 = (Z_v : v\in N(T_{\mathcal A}))$, with $Z_v\in\mathcal Y$ random variables whose joint distribution is given by
$$\Pr\big(S_0 = (Z_v : v\in N(T_{\mathcal A}))\big) := \prod_{v\in N(T_{\mathcal A})} \Pr(Z_v|Z_{\nu(v)}), \tag{76}$$
with $Z_{\nu(v)}$ the vector of variables corresponding to the nodes in $\nu(v)$, and
$$\Pr(Z_v=y\,|\,Z_{\nu(v)}=y_{:\tau(v)-1}) := \Pr\big(Y_{\tau(v)}=y\,\big|\,Y_{:\tau(v)-1}=y_{:\tau(v)-1},\,A_{:\tau(v)}=\mu(v)\big). \tag{77}$$
Then, the world's dynamics are established recursively by $p(s_{t+1}|s_{:t},h_{:}) := \delta^{s_{t+1}}_{f(s_t,a_t)}$, so that $S_{t+1}=f(S_t,A_t)$ a.s., with the unifilar update established by
$$S_{t+1} = (Z^{t+1}_v : v\in N(T_{\mathcal A})) \quad\text{with}\quad Z^{t+1}_v = Z^t_{\pi(v,A_t)}. \tag{78}$$
In summary, the world is first initialised at time zero by sampling $S_0$, i.e. by sampling $Z_v$ for all $v\in N(T_{\mathcal A})$, which amounts to sampling $Y_{:}$ for all possible sequences of actions $a_{:}$. After this, the world evolves deterministically, following the update rule given by $f$.

Q Example of a non-reversible transducer

Let us provide an example of how reversing a transducer can lead to a violation of the anticipation-free condition. Consider the so-called delay channel, for which the output $Y_{t+1}$ is equal to the previous action $A_t$ (Barnett & Crutchfield, 2015). This channel displays acausal behaviour when time-reversed: the action $A_t$ determines the outcome at the previous time step $Y_{t-1}$, meaning that
$$I(Y_{:t-1};A_{t:}|A_{t-1:}) = I(Y_{t-1};A_t|A_{t-1:}) = H(A_t|A_{t-1:}), \tag{79}$$
which is nonzero if the entropy rate of the actions is nonzero. This leads to an interface that is not anticipation-free, violating the conditions of Def. 1.

R Reversing processes and proof of Theorem 4

Here we present an extended exposition of the conditions for reversing various types of stochastic processes.

R.1 Reversing Markov processes

Let us start by considering a Markov process $X_t$, so that $p(x_t|x_{:t-1}) = p(x_t|x_{t-1})$ for all $t\in\mathbb N$. Then the reverse process (given by $X_t, X_{t-1},\dots$)
is also Markov, as
$$p(x_t|x_{t+1:t'})
= \frac{p(x_{t:t'})}{p(x_{t+1:t'})}
= \frac{p(x_t)\prod_{k=t+1}^{t'} p(x_k|x_{t:k-1})}{p(x_{t+1})\prod_{j=t+2}^{t'} p(x_j|x_{t+1:j-1})}
= \frac{p(x_t)\prod_{k=t+1}^{t'} p(x_k|x_{k-1})}{p(x_{t+1})\prod_{j=t+2}^{t'} p(x_j|x_{j-1})}
= \frac{p(x_t)\,p(x_{t+1}|x_t)}{p(x_{t+1})}
= p(x_t|x_{t+1}). \tag{80}$$

R.2 Reversing hidden Markov models

Let us now consider a general (i.e. Mealy (Riechers, 2016)) hidden Markov model, in which $p(s_{t+1},y_t|s_{:t},y_{:t-1}) = p(s_{t+1},y_t|s_t)$ holds for all $t\in\mathbb N$. As for Markov chains, one can show that the reverse process is also a hidden Markov model, as
$$p(s_t,y_t|s_{t+1:t'+1},y_{t+1:t'})
= \frac{p(s_{t:t'+1},y_{t:t'})}{p(s_{t+1:t'+1},y_{t+1:t'})}
= \frac{p(s_t,y_t,s_{t+1})\prod_{k=t+1}^{t'} p(s_{k+1},y_k|s_{t:k},y_{t:k-1})}{p(s_{t+1},y_{t+1},s_{t+2})\prod_{j=t+2}^{t'} p(s_{j+1},y_j|s_{t+1:j},y_{t+1:j-1})}
= \frac{p(s_t,y_t,s_{t+1})\prod_{k=t+1}^{t'} p(s_{k+1},y_k|s_k)}{p(s_{t+1},y_{t+1},s_{t+2})\prod_{j=t+2}^{t'} p(s_{j+1},y_j|s_j)}
= \frac{p(s_t)\prod_{k=t}^{t'} p(s_{k+1},y_k|s_k)}{p(s_{t+1})\prod_{j=t+1}^{t'} p(s_{j+1},y_j|s_j)}
= \frac{p(s_t)\,p(s_{t+1},y_t|s_t)}{p(s_{t+1})}
= p(s_t,y_t|s_{t+1}). \tag{81}$$
Note that in this case the structure is not perfectly time-symmetric, but could be described as a 'co-Mealy' structure, as the time indices of the world are shifted. If the hidden Markov model is Moore (Riechers, 2016), so that $p(s_{t+1},y_t|s_{:t},y_{:t-1}) = p(s_{t+1}|s_t)\,p(y_t|s_t)$, then a similar calculation leads to
$$p(s_t,y_t|s_{t+1:t'+1},y_{t+1:t'}) = \frac{p(s_t)\,p(s_{t+1},y_t|s_t)}{p(s_{t+1})} = \frac{p(s_t)\,p(s_{t+1}|s_t)\,p(y_t|s_t)}{p(s_{t+1})} = p(s_t|s_{t+1})\,p(y_t|s_t), \tag{82}$$
yielding another Moore hidden Markov model.

R.3 Reversing transducers

Using the previous calculations as a foundation, let us now explore the reversal properties of a transducer, where $p(s_{t+1},y_t|s_{:t},y_{:t-1},a_{:}) = p(s_{t+1},y_t|s_t,a_t)$ holds (see App. D).
Using this property, it is direct to see that
$$p(y_{:t},s_{:t+1}|a_{:})
= p(s_0)\prod_{\tau=0}^{t} p(y_\tau,s_{\tau+1}|y_{:\tau-1},s_{:\tau},a_{:})
= p(s_0)\prod_{\tau=0}^{t} p(y_\tau,s_{\tau+1}|s_\tau,a_\tau)
= p(y_{:t},s_{:t+1}|a_{:t}), \tag{83}$$
showing that transducers naturally impose an arrow of time over actions.¹⁰ Now, let us consider factorising $p(y_{:t},s_{:t+1}|a_{:})$ backwards, as follows:
$$p(y_{:t},s_{:t+1}|a_{:}) = p(y_{:t},s_{:t+1}|a_{:t}) = p(s_{t+1}|a_{:t}) \prod_{\tau=0}^{t} p(y_\tau,s_\tau|y_{\tau+1:t},s_{\tau+1:t+1},a_{:t}). \tag{84}$$
This shows that we need to look for ways of simplifying expressions of the form $p(y_\tau,s_\tau|y_{\tau+1:t},s_{\tau+1:t+1},a_{:t})$. Using the properties of transducers, one can show that
$$p(s_\tau,y_\tau|s_{\tau+1:t+1},y_{\tau+1:t},a_{:t})
= \frac{p(s_{\tau:t+1},y_{\tau:t},a_{:t})}{p(s_{\tau+1:t+1},y_{\tau+1:t},a_{:t})}
= \frac{p(s_\tau,y_\tau,s_{\tau+1},a_{:t})\prod_{k=\tau+1}^{t} p(s_{k+1},y_k|s_{\tau:k},y_{\tau:k-1},a_{:t})}{p(s_{\tau+1},y_{\tau+1},s_{\tau+2},a_{:t})\prod_{j=\tau+2}^{t} p(s_{j+1},y_j|s_{\tau+1:j},y_{\tau+1:j-1},a_{:t})}
= \frac{p(s_\tau,y_\tau,s_{\tau+1},a_{:t})\prod_{k=\tau+1}^{t} p(s_{k+1},y_k|s_k,a_k)}{p(s_{\tau+1},y_{\tau+1},s_{\tau+2},a_{:t})\prod_{j=\tau+2}^{t} p(s_{j+1},y_j|s_j,a_j)}
= \frac{p(s_\tau,a_{:t})\prod_{k=\tau}^{t} p(s_{k+1},y_k|s_k,a_k)}{p(s_{\tau+1},a_{:t})\prod_{j=\tau+1}^{t} p(s_{j+1},y_j|s_j,a_j)}
= \frac{p(s_\tau|a_{:t})\,p(s_{\tau+1},y_\tau|s_\tau,a_\tau)}{p(s_{\tau+1}|a_{:t})}
= \frac{p(s_\tau|a_{:t})\,p(s_{\tau+1}|s_\tau,a_\tau)}{p(s_{\tau+1}|a_{:t})}\, p(y_\tau|s_{\tau+1},s_\tau,a_\tau)
= p(s_\tau|s_{\tau+1},a_{:t})\,p(y_\tau|s_\tau,s_{\tau+1},a_\tau). \tag{85}$$
This shows that, for any transducer $(S_t,A_t,Y_t)$, we can always 'run it back' using the whole sequence of actions to reproduce the interface, as shown by the factorisation
$$p(y_{:t},s_{:t+1}|a_{:}) = p(s_{t+1}|a_{:t}) \prod_{\tau=0}^{t} p(s_\tau|s_{\tau+1},a_{:t})\,p(y_\tau|s_\tau,s_{\tau+1},a_\tau). \tag{86}$$

¹⁰ Note that the derivation uses the fact that $p(s_0|a_{:}) = p(s_0)$, and it would not work for other initial conditions where this does not hold.
If the transducer satisfies the additional condition $p(s_\tau|s_{\tau+1},a_{:t}) = p(s_\tau|s_{\tau+1},a_\tau)$, or equivalently the informational relation $I(S_\tau; A_{0:\tau-1},A_{\tau+1:t}\,|\,S_{\tau+1},A_\tau) = 0$, then one obtains a reverse factorisation of the form
$$p(y_{:t},s_{:t+1}|a_{:}) = p(s_{t+1}|a_{:t}) \prod_{\tau=0}^{t} p(y_\tau,s_\tau|s_{\tau+1},a_\tau). \tag{87}$$
The reverse kernel $\kappa^R$ can be expressed in terms of the forward one via the following derivation:
$$\kappa^R_\tau(y_\tau,s_\tau|a_\tau,s_{\tau+1}) = p(y_\tau,s_\tau|a_{:\tau},s_{\tau+1}) = \frac{p(s_\tau|a_\tau)}{p(s_{\tau+1}|a_\tau)}\, p(y_\tau,s_{\tau+1}|a_{:\tau},s_\tau), \tag{88}$$
which implies that, for reversible transducers,
$$\kappa^R_\tau(y,s|a,\tilde s) = \frac{\Pr(S_\tau=s|A_\tau=a)}{\Pr(S_{\tau+1}=\tilde s|A_\tau=a)}\,\kappa_\tau(y,\tilde s|a,s). \tag{89}$$
It is interesting to note that replacing Eq. (89) in Eq. (87) could give the impression of not recovering Eq. (4); however, under the assumption of transducer reversibility it does. To confirm this, let us first check that
$$p(s_\tau|s_{\tau+1},a_{:t}) = \frac{p(s_\tau,s_{\tau+1}|a_{:t})}{p(s_{\tau+1}|a_{:t})} = \frac{p(s_\tau|a_{:t})\,p(s_{\tau+1}|s_\tau,a_{:t})}{p(s_{\tau+1}|a_{:t})}, \tag{90}$$
and similarly
$$p(s_\tau|s_{\tau+1},a_\tau) = \frac{p(s_\tau,s_{\tau+1}|a_\tau)}{p(s_{\tau+1}|a_\tau)} = \frac{p(s_\tau|a_\tau)\,p(s_{\tau+1}|s_\tau,a_\tau)}{p(s_{\tau+1}|a_\tau)}. \tag{91}$$
Now, as $s_\tau$ is the memory state of a transducer, $p(s_{\tau+1}|s_\tau,a_{:t}) = p(s_{\tau+1}|s_\tau,a_\tau)$. Putting all this together, the condition $p(s_\tau|s_{\tau+1},a_{:t}) = p(s_\tau|s_{\tau+1},a_\tau)$ can be seen to imply the following:
$$\frac{p(s_\tau|a_{:t})}{p(s_{\tau+1}|a_{:t})} = \frac{p(s_\tau|a_\tau)}{p(s_{\tau+1}|a_\tau)}. \tag{92}$$
This equality allows us to confirm that replacing Eq. (13) in Eq. (87) indeed recovers Eq. (4). In conclusion, if $p(s_\tau|s_{\tau+1},a_{:t}) = p(s_\tau|s_{\tau+1},a_\tau)$ holds, then one can generate the interface via the following procedure:

1. Initialise the state of the world at time $t+1$ by sampling $p(s_{t+1}|a_{:t})$. Alternatively, for counterfactual analyses, pick an arbitrary world state $s\in\mathcal S$ and set $S_{t+1}=s$.
2. Run the transducer backwards using the kernel $\kappa^R_\tau(y_\tau,s_\tau|a_\tau,s_{\tau+1}) = p(y_\tau,s_\tau|s_{\tau+1},a_\tau)$.
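The Bayes-rule reweighting behind Eq. (89) can be sketched on a hypothetical two-state chain with a single fixed action, with the output folded into the state transition for brevity (all numbers are illustrative): the reverse kernel reweights the forward one by the marginal state distributions, and the forward and backward factorisations assign the same probability to every joint transition.

```python
# Hypothetical forward kernel p(s' | s) for one fixed action, plus an
# initial distribution p0. This mirrors the structure of kappa^R in
# Eq. (89), with the output symbol suppressed.
P = [[0.7, 0.4],   # P[s_next][s]: columns (indexed by s) are stochastic
     [0.3, 0.6]]
p0 = [0.5, 0.5]

# Marginal at the next time step: p1(s') = sum_s p(s'|s) p0(s)
p1 = [sum(P[sn][s] * p0[s] for s in range(2)) for sn in range(2)]

# Reverse kernel (the analogue of Eq. (89)):
# R(s | s') = p0(s) p(s'|s) / p1(s')
R = [[p0[s] * P[sn][s] / p1[sn] for sn in range(2)] for s in range(2)]

# Check: forward joint p0(s) p(s'|s) equals backward joint p1(s') R(s|s'),
# so running the chain backwards reproduces the same joint distribution.
for s in range(2):
    for sn in range(2):
        assert abs(p0[s] * P[sn][s] - p1[sn] * R[s][sn]) < 1e-12

# The reverse kernel is properly normalised over predecessors:
for sn in range(2):
    assert abs(sum(R[s][sn] for s in range(2)) - 1.0) < 1e-12
```

In the full transducer setting, the same reweighting by $\Pr(S_\tau = s|A_\tau = a)/\Pr(S_{\tau+1} = \tilde s|A_\tau = a)$ is applied to the kernel $\kappa_\tau(y,\tilde s|a,s)$, carrying the output symbol along, which is exactly the Bayes-rule step in Eq. (88).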
R.4 Effect of action-unifilarity

A transducer is action-unifilar if $p(s_{\tau+1}|s_\tau,a_\tau) = \delta^{s_{\tau+1}}_{f(s_\tau,a_\tau)}$, with $S_{\tau+1}=f(S_\tau,A_\tau)$ a function. If the dynamics of the transducer is action-counifilar, meaning that $p(s_\tau|s_{\tau+1},a_\tau) = \delta^{s_\tau}_{r(s_{\tau+1},a_\tau)}$ where $S_\tau = r(S_{\tau+1},A_\tau)$, then the reversibility condition $p(s_\tau|s_{\tau+1},a_{:\tau}) = p(s_\tau|s_{\tau+1},a_\tau)$ is necessarily satisfied. However, action-counifilarity is much more restrictive than action-unifilarity if we insist that every world state can accept every action, i.e. that $\sum_{s_{\tau+1}} p(s_{\tau+1}|s_\tau,a_\tau) = 1$. Using Bayes' rule,
$$p(s_{\tau+1}|s_\tau,a_\tau) = p(s_\tau|s_{\tau+1},a_\tau)\,\frac{p(s_{\tau+1}|a_\tau)}{p(s_\tau|a_\tau)} = \delta^{s_\tau}_{r(s_{\tau+1},a_\tau)}\,\frac{p(s_{\tau+1}|a_\tau)}{p(s_\tau|a_\tau)}, \tag{93}$$
we see that there is exactly one nonzero transition for every combination of state $s_{\tau+1}$ and action $a_\tau$. We can think of each transition as an edge between states labelled with the action, like a driven transition; counting by targets, this means the automaton has $|\mathcal A||\mathcal S|$ edges in total. The condition that every world state can accept every action means that every state has at least one outgoing edge for every action. If the model were not action-unifilar, some state would have two or more outgoing edges for some action; but then the total number of edges would be larger than $|\mathcal A||\mathcal S|$, which is a contradiction. Thus, each state $s_\tau$ has exactly one outgoing edge for each action $a_\tau$, meaning that the next state is a function of these variables:
$$S_{\tau+1} = f(S_\tau,A_\tau). \tag{94}$$
Therefore, every action-counifilar transducer is also action-unifilar, meaning that it obeys a type of reversibility.

S Proof of Theorem 5

For convenience, in this proof we use Dirac's notation, which uses bras like $\langle v|$ and kets like $|v\rangle$ to express row and column vectors, respectively. If we are describing vectors and matrices over states $\mathcal S$, then we can use an orthonormal basis ($(|s\rangle)_{s\in\mathcal S}$ such that $\langle s|s'\rangle = \delta_{s,s'}$) in the Hilbert space $\mathcal H_{\mathcal S}$ to express the vector
$$|v\rangle = \sum_s v(s)\,|s\rangle. \tag{95}$$
Here, $v(s)$ represents the $s$-th element of the vector.
Similarly, for a linear operator in this Hilbert space, we can think of
$$\langle s'|M|s\rangle \tag{96}$$
as the element in the $s'$-th row and $s$-th column, and we can translate a matrix $A$ with elements $A_{s's}$ into a linear operator in this space by using the outer product
$$A = \sum_{s,s'} |s'\rangle\, A_{s's}\, \langle s|. \tag{97}$$
Using this notation, we can consider the vector space $\mathbb R^{|\mathcal S|}$ with the orthonormal basis of states $(|s\rangle)_{s\in\mathcal S}$ such that $\langle s|s'\rangle = \delta_{s,s'}$. Then, the predictive Bayesian belief can be described as
$$|\rho^P(y_{:t},a_{:t})\rangle = \sum_{s_{t+1}} |s_{t+1}\rangle\, p(s_{t+1}|h_{:t}), \tag{98}$$
and the retrodictive Bayesian belief as
$$\langle\rho^R(y_{:t},a_{:t})| = \sum_{s_0} p(s_0|h_{:t})\,\langle s_0|. \tag{99}$$
Similarly, the operator corresponding to a sequence of actions $a_{:t}$ and outputs $y_{:t}$ can be described as
$$T^{(y_{:t}|a_{:t})} = \prod_{\tau=0}^{t} T^{(y_\tau|a_\tau)} = \sum_{s_0,s_{t+1}} |s_{t+1}\rangle\, p(s_{t+1},y_{:t}|a_{:t},s_0)\,\langle s_0|. \tag{100}$$
If we define the diagonal operator $\rho_t \equiv \sum_{s_t} |s_t\rangle\, p(s_t)\,\langle s_t|$, then we can calculate the probability of the joint start and end states as follows:
$$T^{(y_{:\tau}|a_{:\tau})}\rho_0 = \sum_{s_0,s_{\tau+1}} |s_{\tau+1}\rangle\, p(s_{\tau+1},s_0,y_{:\tau}|a_{:\tau})\,\langle s_0|. \tag{101}$$
With this, one can calculate the interface via expressions of the form
$$p(y_{:\tau}|a_{:\tau}) = \langle\mathbf 1|\,T^{(y_{:\tau}|a_{:\tau})}\rho_0\,|\mathbf 1\rangle, \tag{102}$$
where $|\mathbf 1\rangle \equiv \sum_s |s\rangle$.

Proof of Theorem 5. Using this notation, the BMSM (Sec. 6.2) can be expressed as
$$\rho(y_{:\tau},a_{:\tau}) = \sum_{s_0,s_{\tau+1}} |s_{\tau+1}\rangle\, p(s_{\tau+1},s_0|y_{:\tau},a_{:\tau})\,\langle s_0|. \tag{103}$$
Moreover, by using Eq. (101) and Eq. (102) one finds that
$$\rho(y_{:\tau},a_{:\tau}) = \sum_{s_0,s_{\tau+1}} |s_{\tau+1}\rangle\, \frac{p(s_{\tau+1},s_0,y_{:\tau}|a_{:\tau})}{p(y_{:\tau}|a_{:\tau})}\,\langle s_0| = \frac{T^{(y_{:\tau}|a_{:\tau})}\rho_0}{\langle\mathbf 1|\,T^{(y_{:\tau}|a_{:\tau})}\rho_0\,|\mathbf 1\rangle}, \tag{104}$$
which proves the first part of the theorem. Additionally, by comparing this with Eq. (98) and Eq. (99), one finds that
$$\rho(y_{:\tau},a_{:\tau})\,|\mathbf 1\rangle = |\rho^P(y_{:\tau},a_{:\tau})\rangle \quad\text{and}\quad \langle\mathbf 1|\,\rho(y_{:\tau},a_{:\tau}) = \langle\rho^R(y_{:\tau},a_{:\tau})|, \tag{105}$$
which proves the second part of the theorem.
The corollary can be proven by noticing that
$$\rho(y_{:\tau+1},a_{:\tau+1}) = \frac{T^{(y_{\tau+1}|a_{\tau+1})}\,\rho(y_{:\tau},a_{:\tau})}{\langle\mathbf 1|\,T^{(y_{\tau+1}|a_{\tau+1})}\,\rho(y_{:\tau},a_{:\tau})\,|\mathbf 1\rangle}, \tag{106}$$
where the denominator is a normalisation term. By contrast, the reverse-time update requires applying a modified version of the transducer operator, $\rho_0^{-1}\,T^{(y|a)}\,\rho_{-1}$, and normalising:
$$\rho(y_{-1:\tau},a_{-1:\tau}) = \frac{\rho(y_{:\tau},a_{:\tau})\,\rho_0^{-1}\,T^{(y_{-1}|a_{-1})}\,\rho_{-1}}{\langle\mathbf 1|\,\rho(y_{:\tau},a_{:\tau})\,\rho_0^{-1}\,T^{(y_{-1}|a_{-1})}\,\rho_{-1}\,|\mathbf 1\rangle}. \tag{107}$$
Reflecting the fact that not every transducer is reversible, the operation $\rho_t^{-1}\,T^{(y|a)}\,\rho_{t-1}$ cannot in general be interpreted as the action of a transducer. It is, nevertheless, a valid method for retrodicting the state distribution of a world model, provided its initial state is assumed to be uncorrelated with future action sequences.
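To make the operator formulation concrete, the following sketch builds the BMSM of Eq. (104) for a hypothetical two-state transducer with a single action, and checks that its row and column sums recover the predictive and retrodictive beliefs as in Eq. (105). All matrices are illustrative choices, not taken from the paper.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Substochastic operators T^(y|a) for one action and outputs y in {0, 1}:
# entry [s1][s0] = p(s1, y | s0, a); summed over y they are column-stochastic.
T = {
    0: [[0.5, 0.1], [0.2, 0.3]],
    1: [[0.1, 0.4], [0.2, 0.2]],
}
p0 = [0.6, 0.4]
rho0 = [[p0[0], 0.0], [0.0, p0[1]]]          # diagonal operator rho_0

# BMSM for the output sequence y = (0, 1), Eq. (104):
# rho = T^(y_1) T^(y_0) rho_0 / <1| T^(y_1) T^(y_0) rho_0 |1>
M = matmul(T[1], matmul(T[0], rho0))
norm = sum(sum(row) for row in M)            # <1| ... |1> = p(y_{:1} | a_{:1})
rho = [[x / norm for x in row] for row in M]

# Eq. (105): rho |1> (row sums) gives the predictive belief over the final
# state, while <1| rho (column sums) gives the retrodictive belief over the
# initial state; both are normalised since rho itself is.
predictive = [sum(row) for row in rho]
retrodictive = [sum(rho[i][j] for i in range(2)) for j in range(2)]
assert abs(sum(predictive) - 1.0) < 1e-12
assert abs(sum(retrodictive) - 1.0) < 1e-12
```

The forward update of Eq. (106) simply left-multiplies `rho` by the next operator `T[y]` and renormalises, so the single object `rho` tracks both beliefs at once.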