
Paper deep dive

Multi-Agent Systems Should be Treated as Principal-Agent Problems

Paulius Rauba, Simonas Cepenas, Mihaela van der Schaar

Year: 2025 · Venue: arXiv preprint · Area: Agent Safety · Type: Position · Embeddings: 55

Abstract

Consider a multi-agent systems setup in which a principal (a supervisor agent) assigns subtasks to specialized agents and aggregates their responses into a single system-level output. A core property of such systems is information asymmetry: agents observe task-specific information, produce intermediate reasoning traces, and operate with different context windows. In isolation, such asymmetry is not problematic, since agents report truthfully to the principal when incentives are fully aligned. However, this assumption breaks down when incentives diverge. Recent evidence suggests that LLM-based agents can acquire their own goals, such as survival or self-preservation, a phenomenon known as scheming, and may deceive humans or other agents. This leads to agency loss: a gap between the principal's intended outcome and the realized system behavior. Drawing on core ideas from microeconomic theory, we argue that these characteristics, information asymmetry and misaligned goals, are best studied through the lens of principal-agent problems. We explain why multi-agent systems, both human-to-LLM and LLM-to-LLM, naturally induce information asymmetry under this formulation, and we use scheming, where LLM agents pursue covert goals, as a concrete case study. We show that recently introduced terminology used to describe scheming, such as covert subversion or deferred subversion, corresponds to well-studied concepts in the mechanism design literature, which not only characterizes the problem but also prescribes concrete mitigation strategies. More broadly, we argue for applying tools developed to study human agent behavior to the analysis of non-human agents.

Tags

agent-safety (suggested, 92%) · ai-safety (imported, 100%) · position (suggested, 88%)

Links

PDF not stored locally. Use the link above to view on the source site.

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 94%

Last extracted: 3/11/2026, 1:07:07 AM

Summary

The paper proposes framing multi-agent systems (MAS) as principal-agent problems from microeconomic theory to address agency loss, information asymmetry, and goal misalignment in LLM-based systems. It argues that concepts like adverse selection and moral hazard explain 'scheming' behaviors in AI, and suggests applying mechanism design to mitigate these risks.

Entities (5)

Multi-agent system · system-architecture · 100%
Principal-Agent Problem · theoretical-framework · 100%
Agency Loss · metric / concept · 95%
Mechanism Design · methodology · 95%
Scheming · phenomenon · 95%

Relation Signals (4)

Multi-agent system exhibits Information Asymmetry

confidence 95% · A core property of such systems is information asymmetry

Principal-Agent Problem models Multi-agent system

confidence 95% · we argue that these characteristics... are best studied through the lens of principal-agent problems.

Scheming causes Agency Loss

confidence 90% · This leads to agency loss: a gap between the principal's intended outcome and the realized system behavior.

Mechanism Design mitigates Agency Loss

confidence 90% · Mechanism design explores concrete mechanisms that... minimize the agency loss

Cypher Suggestions (2)

Find all properties associated with Multi-Agent Systems · confidence 90% · unvalidated

MATCH (m:System {name: 'Multi-Agent System'})-[:EXHIBITS]->(p:Property) RETURN m.name, p.name

Identify mitigation strategies for agency loss · confidence 90% · unvalidated

MATCH (m:Methodology)-[:MITIGATES]->(a:Concept {name: 'Agency Loss'}) RETURN m.name

Full Text

55,122 characters extracted from source content.


Multi-Agent Systems Should be Treated as Principal-Agent Problems

Paulius Rauba *1, Simonas Cepenas *2, Mihaela van der Schaar 1

Abstract

Consider the multi-agent systems setup, where a principal ("supervisor agent") assigns subtasks to specialized agents and aggregates their responses into a single system-level output. A core property of such systems is information asymmetry, e.g. agents observe task-specific information, output intermediate reasoning traces, and have different context windows. In isolation, such asymmetry is not problematic, since the agents report truthfully to the principal when their incentives are fully aligned. However, what happens when they are not? Recent evidence suggests that LLM-based agents acquire their own goals (e.g. survival or self-preservation), known as "scheming", and are willing to deceive humans or other agents. This produces agency loss: a gap between the principal's intended outcome and the realized system behavior. Following the central tenets of microeconomic theory, we argue that these characteristics – information asymmetry and misaligned goals – are best studied as principal-agent problems. We lay the foundation as to why multi-agent systems, both Human-to-LLM and LLM-to-LLM, lead to information asymmetry under the principal-agent formulation; and use scheming—LLM agents secretly pursuing covert goals—as a concrete case study. We show that recently introduced terminology to explain scheming, such as "covert subversion" or "deferred subversion", is in fact defined in the mechanism design literature, which not only recognizes the problem but also prescribes concrete actions to mitigate it. Above all, we see this as a call for us to use tools designed to study human agent behavior for explaining non-human agent behavior.

* Equal contribution. 1 University of Cambridge. 2 ISM University of Management and Economics. Correspondence to: Paulius Rauba <pr501@cam.ac.uk>, Simonas Cepenas <simonas.cepenas@ism.lt>.
1. Introduction

Position. We should treat multi-agent systems, a flourishing area of research in machine learning, as principal-agent problems, a set of models introduced in microeconomic theory and later adopted across social sciences. Multi-agent systems are defined, loosely, as multiple agents interacting in a common environment to achieve a shared goal. In this work, we show that such systems exhibit two properties: (i) information asymmetry, a result of private information, which is costly to observe; and (ii) misaligned goals, where divergence between sub-agent preferences and the principal's objective produces suboptimal or unintended outcomes.

We consider a multi-agent system in which the supervisor agent (the principal) delegates subtasks to specialized agents and aggregates their responses into a single system-level decision. Such systems are common within multi-agent frameworks (Wu et al., 2024; Hong et al., 2023; Li et al., 2023; Yao et al., 2022; Wang et al., 2023; Phelps & Ranson, 2023). This delegated setting arises wherever a single model is decomposed into interacting components, and those interacting components are themselves agents. While such agents could in principle be broad, this work focuses on the emergence of agents operated by a language model policy backbone. A core property of such systems is information asymmetry (Zhang & Zhang, 2025; Liu et al., 2024b), e.g. sub-agents observe task-specific information, use local context windows, and produce intermediate artifacts that are not automatically shared (Liu et al., 2024b). In isolation, such asymmetry is not problematic when reports are truthful and incentives are fully aligned. However, what happens when they are not? One solution is simply to allow the principal to monitor the behavior of the sub-agents, such as their tool use, behavior, and intermediate outputs, among others. Such outputs can be obtained by the principal.
But such an approach leans heavily on the assumption that the supervisor can cheaply verify intermediate work, and that sub-agents have no incentive to strategically distort what they reveal. Unfortunately, delegation creates precisely the setting where verification is costly and intermediate actions are hidden (unobserved by the principal) (Papoudakis et al., 2021), while practical constraints within language model agents (finite context windows, inaccessible parameters which contribute to reasoning, custom acquired information) prevent full observability. Recent evidence also suggests that LLM-based agents may behave strategically when objectives are misaligned (Pham, 2025; Scheurer et al., 2023; Schoen et al., 2025; Meinke et al., 2024), including selective disclosure and deception under oversight. This combination of costly monitoring and potentially divergent objectives means the agents might not act in the principal's best interest. The result is agency loss (Moloi & Marwala, 2020): a gap between the principal's intended outcome and the realized system behavior.

We argue that if these two properties are met – (i) information asymmetry and (ii) conflicting goals (e.g. the emergence of autonomous goals in LLMs, such as self-preservation (Meinke et al., 2024)) – we should treat delegated multi-agent systems as principal-agent problems. In the principal-agent formulation, the principal delegates tasks to the agent but does not have information equivalent to each agent's. The agents follow a policy to finish the designated task and report the result back to the principal (possibly selectively). Examples of such private information accessible only to the sub-agent are context windows, latent computation, task-specific results, and reasoning traces, among others.

arXiv:2601.23211v1 [cs.MA] 30 Jan 2026
The principal-agent formulation induces two mechanisms of information asymmetry.

▶ Adverse selection refers to a form of hidden information: an agent holds private information about its type before contracting or delegation, which the principal cannot observe. This informational asymmetry allows the agent to strategically misrepresent itself in order to secure more favorable contractual terms.

▶ Moral hazard refers to a form of hidden action: the actions that an agent takes after delegation or contracting are unobservable to the principal, which creates incentives for the agent to deviate from actions that maximize the principal's objective.

Standard microeconomic theory holds that equilibrium outcomes in multi-agent systems depend on the rules of the game. By changing these rules, we can induce different equilibrium policies and determine whether system-level goals are achieved. This was largely irrelevant for earlier multi-agent ML systems, which assumed aligned objectives in delegated systems. Recent empirical evidence of practices such as scheming (Pham, 2025), where AI agents covertly pursue misaligned goals, shows that goal conflicts can arise endogenously from training dynamics. As a result, interactions between learning agents may converge to undesirable equilibria, including lying and deception (Carichon et al., 2025).

Why does this matter? The principal-agent formulation enables us to understand and study why such behaviors occur by examining the incentive structures that govern agent behavior. If an agent is strategically underperforming on a task so that it gains access to some resources (Li et al., 2025), or if a sub-agent possesses information that the principal cannot observe, withholding or distorting information can constitute the optimal strategy for the agent. We study this behavior in Sec. 4.2 in the context of scheming, i.e.
when an agent selectively reports or shapes task-relevant information to advance its own objective rather than that of the principal.

This becomes useful not just for understanding incentives but for designing appropriate interventions (mechanisms) to achieve desired long-term equilibrium outcomes (Maskin, 2008). Research on mechanism and institutional design addresses informational asymmetries (hidden actions or hidden types) in principal-agent problems by engineering mechanisms and proposing incentive packages to realize the principal's intended outcome. Commonly, hidden actions may be mitigated through improved monitoring or outcome-based feedback, while strategic misrepresentation of the principal's beliefs about agent types calls for stronger screening and evaluation mechanisms that distinguish superficial from robust alignment.

Contributions. (1) We bridge the concepts that deal with asymmetric information in microeconomic theory (Sec. 2) and multi-agent machine learning; (2) We show how multi-agent systems lead to informational asymmetries and resemble principal-agent problems (Sec. 3); (3) We study a concrete example and showcase why scheming – strategic underperformance of LLMs – is a natural consequence of agency loss (Sec. 4.2); (4) We explore plausible mechanism designs for principal-agent problems (Sec. 5); and (5) We lay out a future research agenda for the field (Sec. 6).

2. Principal-agent Problems

A large number of economic and political situations can be studied as principal-agent problems, where some central authority – the principal – wishes to either (1) delegate a task to a subordinate, the agent, or (2) implement social decisions based on the preferences of the agents. However, the principal faces a problem: the individual's preferences are private information and cannot be easily and fully observed publicly.
The inability to reveal this private information results in agency loss – a metric that quantifies the distance between the optimal outcome the principal desires and the suboptimal outcome achieved by the agent (Lupia, 2001). This implies that the agent, aware of the situation, has incentives to hide information (adverse selection) or mask the action (moral hazard). A natural approach to solving coordination problems in economics is adopting a utility maximization framework, which allows each agent to act in a way that maximizes their own utility. However, such standard solutions for models in which players hold unequal information about key aspects of the game reveal inefficient outcomes (Akerlof, 1970; Bar-Isaac et al., 2021; Myerson & Satterthwaite, 1983).

How can the principal induce the agent to take a costly action? To reduce the agency loss and achieve the desired outcome, the mechanism designer [1] designs a mechanism – either a rule change or an incentive formalized as a Bayesian game that elicits agents' types and extracts relevant private information. The objective is to ensure that agents' equilibrium strategies align with those intended by the mechanism designer.

2.1. Asymmetric information

Microeconomic theory has evolved from the study of perfectly competitive general equilibrium to that of problem-specific models tailored to specific market conditions. These models, pioneered by Akerlof (1970), take into consideration informational asymmetries between the parties involved. The presence of private information that is costly to obtain generates inefficiencies in the market and deviates from models of perfect competition.

Asymmetric information that arises from the relations between the principal and agent is commonly studied as moral hazard (hidden action) and adverse selection (hidden information).
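Read as a number, the agency-loss metric described above is simply a payoff gap. The following is a minimal sketch of that reading; the outcomes and utility values are invented purely for illustration and are not taken from the paper.

```python
# A minimal numeric reading of agency loss: the gap between the
# principal's payoff at the intended outcome and at the outcome the
# agent actually delivers. Outcomes and payoffs here are invented.

def agency_loss(u_principal, intended_outcome, realized_outcome):
    return u_principal[intended_outcome] - u_principal[realized_outcome]

u_principal = {"work_hard": 1500.0, "shirk": 1000.0}  # toy payoffs
loss = agency_loss(u_principal, "work_hard", "shirk")
print(loss)  # gap of 500.0 when the agent shirks
```

Zero loss corresponds to a fully aligned agent; mechanism design, discussed below, aims to push this gap toward zero by changing the rules of the game.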
The former focuses on the incentives the agent has to not fully commit to performing delegated tasks. For example, an employee has various incentives to shirk and put little effort into work in the absence of a system that monitors effort levels in the workplace. Furthermore, the presence of a costly monitoring system gives no guarantee that the employee will truly work hard, as they find various workarounds (Sum et al., 2025). In contrast, the latter explores situations in which the agent has an incentive to conceal information from the principal in order to gain an advantage (Akerlof, 1970). Let us consider the example of the sale of a used vehicle. By misleading the buyer about the condition of the vehicle (lemon or peach [2]), the seller sells for more and gains higher profit to the detriment of the entire used vehicle market. After some time, buyers start to expect that all used vehicles are lemons, which negatively affects the values of higher quality used vehicles.

[1] The mechanism designer is either the principal, who is an active player with the power to adjust some aspect of the played game, or a higher authority, such as a government agency that has the power to change the rules of the game or compensate the agent. In multi-agent systems, the mechanism designer is either the human constructing the interacting multi-agent system, or the higher-order agent who is designing the coordination mechanisms / information aggregation rules.

Why does asymmetric information cause problems for the efficient functioning of the market? In the presence of complete information, or if the information is not costly to obtain, the employer will be able to distinguish between a hard-working employee and a shirker with little effort. Similarly, the buyer will easily pick a desired product because the prices of the goods will adjust to reflect the differences in quality.
By contrast, if information is costly to obtain, the employer will struggle to determine whether the employee works hard or shirks, and the consumer will not be able to determine whether the product is of good quality. Said consumer will not be willing to pay a premium for a product without tools to determine its quality – more likely, the consumer's perception of the average quality of the good in the market will deteriorate. As a result, the sellers of high quality products will drop out of the market, creating a market failure.

2.2. Mechanism Design

Let us reiterate the core claim so far: the combination of utility-maximizing agents and asymmetric information results in agency loss and, ultimately, market failures. This seems to create a fundamental problem: if we accept that the principal cannot induce utility-optimizing agents to act fully on their behalf, and consequently realize the desired outcome, what can we do to remedy such market failures? Mechanism design explores concrete mechanisms that (1) the principal can devise to reveal private information (screening) and minimize the agency loss, or (2) the agent can use to voluntarily reveal information (signaling) when the presence of asymmetric information is disadvantageous to the agent (Hurwicz, 1972; 1973; Tadelis, 2013; Zhang & Zhang, 2025). That is, the employer can introduce an incentive-based bonus system that would motivate the employee to work hard. Similarly, the seller of a used vehicle could offer a warranty (a costly signal) to assuage concerns that the offered vehicle is a lemon, or the government could introduce regulations that protect the consumer (e.g., lemon laws in the US, see Sirico (1995)) to increase the efficiency of the used vehicle market. The design of the institutional framework under which the principals and agents interact determines the amount of asymmetric information and, consequently, generates new equilibrium strategies. Why should we care about mechanism design?
[2] "Peach" is slang for a high quality vehicle and "lemon" slang for a defective vehicle, that is, a vehicle with serious issues.

Mechanism design enables scientists to engineer mechanisms that reduce agency loss by shaping how agents behave in strategic environments. The mechanism designer (typically, the principal or a central authority) cannot control agents' private information or actions. Instead, the designer influences outcomes indirectly by specifying the rules of the game. These rules take the form of a choice rule, which consists of two components: (i) a decision rule and (ii) a transfer rule. The decision rule constrains or selects the set of feasible outcomes, determining what can happen. The transfer rule assigns payments, rewards, or penalties, determining which outcomes are payoff-maximizing for agents. When we combine outcome constraints with incentives, mechanism design enables the principal to induce desired behavior by reshaping the payoff structure that agents optimize, even in strategic settings with private information.

Notation. A mechanism is a pair (x, m), where x : Θ → X is a decision rule and m : Θ → R^n a transfer rule. Agent i's payoff is v_i = u_i(x, θ_i) + m_i, where θ_i ∈ Θ_i denotes agent i's private type and Σ_i m_i ≤ 0. Decision rules restrict outcomes; transfer rules align payoffs.

2.3. Illustration

Consider a deliberately simplified moral-hazard setting where an employee (the agent) has one option – to work in a goldmine – and where alternatives, such as unemployment, are not an option. The employer (the principal) – who aims to maximize the firm's profits – cannot observe whether the agent works hard (e_H, the costly option) or shirks (e_L). Even though imperfectly reflected in realized outcomes, higher effort levels raise the probability of successful gold discovery, thereby maximizing the firm's expected profits (see Figure 1).
How to induce the agent to work hard?

▶ Standard setup. Let's use backward induction – the solution technique in which we reason backward starting with the last possible move, the agent's choice of e_L or e_H.
→ The agent chooses e_L, since EU(e_L) > EU(e_H).
→ The principal chooses the profit-maximizing option – sets the lowest wage of $0.
→ Backward induction outcome (BIO): w = 0, e_L.

▶ Performance-based bonus. Backward induction implies that the design of the mechanism (large-enough bonuses) motivates the employee to work hard.
→ Set w_B = 0 and assume a profit-maximizing principal. The incentive condition for the agent to choose e_H: EU(e_H) ≥ EU(e_L) ⇒ w_G ≥ 100.
→ The principal chooses to pay the lowest wage for good performance, w_G = 100. Comparing the standard setup vs. the performance-based bonus: EU_FW(P) = 1000 < EU_PB(P) ≈ 1433. The arrangement with a performance bonus increases expected profits.
→ BIO: (w_G = 100, w_B = 0; e_H).

[Figure 1. Moral hazard. The principal sets wage w ≥ 0; the agent chooses effort e_H = 10 or e_L = 0. Effort is unobserved, but outcomes – gold discoveries of G = $2000 or B = $500 – are observed and determined by nature. Under e_H, P(G) = 2/3 and P(B) = 1/3; under e_L, P(G) = 1/3 and P(B) = 2/3. The shaded region marks the principal's information set. Payoffs (u_P, u_A) are shown under a fixed wage (FW), e.g. (2000−w, √w−10) for outcome G under e_H, and under a performance-based contract (PBC) with w*_G = 100, w*_B = 0. Takeaway: Since the principal cannot observe the effort levels, a fixed wage leads to shirking because the agent bears the cost of effort but gains nothing extra from working hard. A performance-contingent contract solves this by making the agent's payoff depend on outcomes, thereby aligning incentives despite the principal's inability to monitor effort directly.]
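The expected-profit comparison in the illustration can be checked numerically. Below is a minimal sketch using the payoff numbers from Figure 1 (G = $2000, B = $500, effort cost 10, √-wage utility); the function names are our own choices, not anything from the paper.

```python
from math import sqrt

# Outcome values and effort-dependent probabilities from Figure 1.
G, B = 2000, 500
P_G = {"e_H": 2/3, "e_L": 1/3}   # P(G | effort); P(B) = 1 - P(G)

def agent_eu(effort, w_G, w_B):
    """Agent: sqrt-of-wage utility minus effort cost (10 under e_H)."""
    cost = 10 if effort == "e_H" else 0
    p = P_G[effort]
    return p * sqrt(w_G) + (1 - p) * sqrt(w_B) - cost

def principal_eu(effort, w_G, w_B):
    """Principal: expected gold value minus expected wage."""
    p = P_G[effort]
    return p * (G - w_G) + (1 - p) * (B - w_B)

# Fixed wage w = 0: the agent prefers shirking, so the BIO is (w = 0, e_L).
shirk_choice = max(["e_H", "e_L"], key=lambda e: agent_eu(e, 0, 0))
eu_fw = principal_eu(shirk_choice, 0, 0)

# Performance-based contract (w_G = 100, w_B = 0), evaluated under e_H.
eu_pb = principal_eu("e_H", 100, 0)

print(shirk_choice, round(eu_fw), round(eu_pb, 2))  # e_L 1000 1433.33
```

Running this reproduces EU_FW(P) = 1000 and EU_PB(P) ≈ 1433 from the text, i.e. the performance-contingent contract raises the principal's expected profit.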
The design of the wage structure – that is, the introduction of a performance-based wage – incentivizes the employee to put high effort into work, which is the profit-maximizing option for the employer as well. The structure of principal-agent relations aligns closely with that of hierarchical multi-agent systems, where a supervisor agent delegates tasks to specialized agents.

3. How multi-agent systems lead to information asymmetry

3.1. Background on multi-agent systems

We use multi-agent systems to refer to systems where multiple autonomous policies make decisions whose interactions determine a single system-level outcome. The discussion in Sec. 2.2 exemplified agents as humans. In contrast, we treat an agent as a learned decision rule, which will be treated as an LLM-based system henceforth. Using the definitions established above, we can now define what we mean by a multi-agent system.

Definition 3.1 (Delegated multi-agent ML system). A delegated multi-agent ML system consists of (i) a principal agent and (ii) a set of sub-agents. The principal chooses a decomposition of the overall task into subtasks and assigns each subtask to a sub-agent. Each sub-agent then produces an output using its local context and any task-specific information it obtains. The principal observes the sub-agents' outputs and uses an aggregation procedure (e.g., voting) to produce a single system-level outcome x ∈ X.

3.2. Properties of agents from a principal-agent framework

In light of the discussion above, we argue that multi-agent systems exhibit properties that lend themselves uniquely to the principal-agent framework. We have thus far highlighted two types of properties: (i) information asymmetry and (ii) goal misalignment.

▶ On information asymmetry. We argue there are at least three properties that give rise to information asymmetry. First, finite context windows induce private information.
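Definition 3.1 can be rendered as a structural sketch. The stub agents and majority-vote aggregation below are our own minimal stand-ins for LLM sub-agents, assuming nothing beyond the definition itself.

```python
from collections import Counter

# Structural sketch of a delegated multi-agent ML system: a principal
# assigns each sub-agent a subtask, each sub-agent answers from a local
# context the principal never inspects, and outputs are aggregated by vote.

def make_agent(local_context):
    def agent(subtask):
        # The local context is private information: only this closure sees it.
        return local_context.get(subtask, "unknown")
    return agent

def principal(task, agents):
    subtasks = [task for _ in agents]          # trivial decomposition
    reports = [a(s) for a, s in zip(agents, subtasks)]
    # Aggregation procedure (e.g., voting) yielding one system-level outcome.
    return Counter(reports).most_common(1)[0][0]

agents = [make_agent({"q": "yes"}),
          make_agent({"q": "yes"}),
          make_agent({"q": "no"})]   # the third agent's context diverges
print(principal("q", agents))        # majority vote -> "yes"
```

Note that the principal sees only the three reports, never the local contexts that produced them; this is exactly the information asymmetry the section goes on to analyze.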
Most deployed agents operate under a finite context window, so any policy π(· | ·) conditions only on a bounded token budget. Therefore, anything that lower-level agents have seen becomes "private information" that is accessible to the agents themselves but not the supervisor.

Second, the principal cannot observe black-box agent policies. Concretely, any agent is a policy π_θ(· | ·) with parameters θ broadly unobservable to the principal. To be specific, we primarily refer to the weights and biases of the policy. We know that the parameters influence agent behavior. While it is possible to directly read out the parameter values, it is understood that the combination of parameters implements underlying computational algorithms (e.g. (Nanda et al., 2023)) that become difficult to reverse-engineer in high-dimensional spaces.

Third, the reasoning traces of agents are selectively revealed or not revealed at all. Concretely, when an agent provides an explanation for a decision, that explanation is typically a post-hoc explanation derived from the reasoning traces generated by the actual agent. The principal usually does not see the reasoning traces of these models, since they are internal to the model. While reasoning traces technically could be observed, this would become harder when reasoning moves from token space to latent space in more advanced models (Hao et al., 2024), since understanding latent reasoning lacks coherent interpretability.

▶ On goal misalignment. We further argue that it is plausible for agents to develop independent sub-goals that diverge from assigned goals. That LLM agents could develop autonomous goals has been argued on principled grounds for a long time, e.g. by arguing that instrumental goals arise independently of terminal goals (Bostrom & Cirkovic, 2011), or that recursive self-improvement can yield loss of human control over internal objectives (Vinge, 1993).
Such hypotheses are interesting but remained speculative, with plausible mechanisms but limited data. Until recently. We now have extensive empirical observations whereby agents directly develop misaligned behavior (Naik et al., 2025) or independent, terminal goals that were never explicitly instilled in the agent itself (Schoen et al., 2025). In simulated settings, models that learned of an imminent threat of replacement faced a goal conflict. In such cases, we have broadly observed consistent agentic misalignment, which generalizes across frontier models (Lynch et al., 2025). Therefore, such misalignment can be induced by a goal conflict, such as a threat to a model's continued operation. Lynch et al. (2025) show that agentic misalignment can be avoided only if two criteria are simultaneously met: (i) there is no goal conflict and (ii) there is no threat to the model. We use this as a case study to support mechanism design in Sec. 4.2.

3.3. The mechanism, step-by-step

To stress this once more: multi-agent systems exhibit (i) information asymmetry and (ii) conflicting goals (Sec. 3.2), which are defining properties of principal-agent problems (Sec. 2). However, we have so far detailed only the sources of information asymmetry. But what is the mechanism? How might such information asymmetry arise in multi-agent systems, and how does this mechanism differ in human-to-LLM and LLM-to-LLM settings? Understanding this mechanism is important if we wish to design interventions which can avoid poor equilibrium outcomes (Sec. 2.2). We decompose our claims into two premises (P) and a conclusion (C).

▶ P1 (Task-specific information). In multi-agent systems, agents receive task-specific inputs that are not automatically shared with the principal or with every other agent the moment this agent makes a decision.
This premise is founded on the assumption that the principal, as an LLM agent, does not itself perform all the tasks that the delegated sub-agents perform. Concretely, sub-agents have different prompts, might have access to custom tools, and have unique reasoning traces, which results in each agent obtaining task-specific information. This is important because it means that sub-agents can in principle obtain relevant information (relevant to the final task), and they might or might not share that information with the principal. For instance, the notion that language models lie to human decision makers, i.e. obtain task-specific information and decide to withhold it, is well-observed (Bürger et al., 2024), including lying under pressure (Scheurer et al., 2023) or as a matter of strategy (DeLeeuw et al., 2025).

▶ P2 (Costly full observability). Making the principal fully informed would require exposing all intermediate steps of all agents (tool calls, retrieved materials, reasoning traces) and having the principal process all that information.

This premise is founded on the assumption that the principal cannot fully observe all the agents. We detail two cases, where the principal is either a human or an LLM.

Case 1: Principal = Human. In case the principal is a human, the unobservability comes from two sources. First, many decisions are made in latent space, either implicitly, whereby models reach their decisions via largely incomprehensible calculations, or explicitly, whereby models reason in latent space (Hao et al., 2024).
Second, even if the principal can observe the outputs of language model agents in the form of reasoning traces, intermediate steps, or direct chats with the model, such tracking has two primary issues: (i) chain-of-thought logic is not faithful to the actual explanations (Turpin et al., 2023) and therefore might be deceptive; and (ii) there are practical limitations in terms of time spent on monitoring the outputs. That is, if the effective time spent on monitoring the outputs is larger than the time required to complete the task itself, monitoring ceases to be a practical alternative. As of writing, language models can autonomously complete tasks that take humans about 5 hours to complete, a number that has grown from about 30 minutes a year ago (Kwa et al., 2025). Therefore, as the complexity of the task grows, the cost of verifying the task grows as well, increasing observability costs.

Case 2: Principal = LLM. In the case where the principal is an LLM, the mechanism for why full observability is costly changes. The first argument – latent space reasoning – is still valid, in the sense that there are no known techniques (to the best of the authors' knowledge) by which existing language models can inspect and interpret the internal calculations of other language systems. However, the notion that it simply takes a human "too long" does not transfer to a language model as easily, since we can simply run the system for a longer time. The primary limitation, as of today, is the context window. Concretely, existing language models have a limited context-window size. If the sub-agents share that same context window, then it is impossible for the principal to simultaneously observe all the information of all the other sub-agents. This becomes increasingly difficult as the information accessed becomes larger, i.e.
accessing information from outputs and reasoning traces requires increasingly larger context windows that grow with the amount of information required and the number of agents. One solution could be to simply pass the context window as an input and iteratively observe it (Zhang et al., 2025), yet that still leaves two open problems: (i) language models tend to lose information in longer contexts (Liu et al., 2024a); and (ii) it does not solve the fundamental problem that observing all information from all sub-agents simultaneously is not possible (either due to latent information, or because it would make redundant the primary delegation principle which underpins the principal-agent problem). Furthermore, the principal's monitoring problem is not only computational but also limited by experimental design (Rauba et al., 2024). ▶C1 Unobserved task-specific information exists. From P1-P2, there exists task-relevant information, about what information the agent had and what it executed with said information, that is not directly observed by the principal at the time. The first two premises are sufficient for us to claim that information asymmetry is a property of such multi-agent systems: there is information obtained by the agents that is costly for the principal to observe. Some of that information might be relevant to the payoffs or utilities of the agents, and the agents may decide whether or not to share that information. At the risk of belaboring the point, such behavior, i.e. acting on the basis of private information and not revealing that information to the principal, has been observed in LLM-human interactions. Anthropic recently reported that "In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors." (Lynch et al., 2025).
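A back-of-the-envelope calculation illustrates why the context-window argument bites. The numbers below are hypothetical but representative orders of magnitude; the point is only that the principal's observation budget is fixed while the volume of sub-agent traces grows linearly with the number of agents.

```python
# Illustrative, hypothetical numbers: each sub-agent produces a reasoning
# trace plus tool outputs; the principal's context window is fixed.
CONTEXT_WINDOW = 128_000          # tokens the principal can attend to at once
TRACE_TOKENS_PER_AGENT = 30_000   # intermediate reasoning + tool-call outputs
N_AGENTS = 8

total_observable = N_AGENTS * TRACE_TOKENS_PER_AGENT
print(total_observable)                   # 240000
print(total_observable > CONTEXT_WINDOW)  # True: simultaneous full observability fails

# Even summarizing traces down to fit does not remove the asymmetry:
# compression is lossy, so some task-specific information stays private.
compression_ratio = CONTEXT_WINDOW / total_observable
print(round(compression_ratio, 2))        # 0.53
```

The same arithmetic also shows why the gap widens with scale: doubling the number of agents or the trace length doubles the shortfall, while the principal's window stays constant.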
Given that such information asymmetry exists, what could this look like in the context of AI agents pursuing their own, self-interested goals? We look at this next via a case study of scheming.

4. A case study of scheming

In this section, we use scheming (Scheurer et al., 2023; Meinke et al., 2024; Balesni et al., 2024), defined broadly as AI systems pursuing misaligned goals, as a concrete case study of why multi-agent systems should be treated as principal-agent problems. Scheming, and related phenomena such as deceptive alignment (Hagendorff, 2024; Greenblatt et al., 2024), sandbagging (deliberate underperformance on evaluation sets) (van der Weij et al., 2024), or situational awareness (Berglund et al., 2023), are extremely pertinent in the AI safety community. In the context of scheming and principal-agent problems, the most studied case is scheming where the language model is the agent and the human is the principal. Here, information asymmetry arises because the agent (the machine learning policy) acquires task-relevant information and pursues misaligned goals, such as self-preservation, that were not explicitly instilled. However, we extend this discussion by adopting a forward-looking perspective in which the principal may also be a language model policy. In such settings, higher-order LLMs act as principals and lower-order LLMs as agents, and these agents may still develop autonomous goals. The possibility of scheming between language model policies is not novel and has been proposed previously (Pham, 2025). We examine this setting through the lens of the principal-agent problem and, subsequently, mechanism design, to construct systems and incentives that favor desirable equilibrium outcomes under strategic agent behavior.

4.1. Scheming as a principal-agent problem

Claim (Scheming as agency loss).
In delegated LLM systems, if a sub-agent has (i) a potentially divergent objective (a goal conflict), (ii) private information about intermediate actions and context, and (iii) awareness of oversight and deployment activities, then scheming is a rational response to imperfect monitoring. Such scheming behavior will manifest itself either as hidden action (moral hazard) or as belief manipulation about the agent's type (adverse selection). Viewed through the lens of this paper, we can give a more precise definition of scheming. Scheming is any deliberate withholding, distortion, or strategic shaping of task-relevant information to advance an objective misaligned with the principal's intended outcome. This definition applies to both sides of the relationship. ▶Agent: the agent engages in hidden actions (moral hazard) based on private information; ▶Principal: the principal delegates tasks to the agent and observes misalignment between agent and principal goals, resulting in agency loss. To connect this to Sec. 2, scheming emerges as an equilibrium outcome in the absence of mechanism design. Empirical studies of LLM scheming can be interpreted as follows: LLMs observe task-relevant information (e.g. the potential for shutdown), choose not to share it, and, without mechanisms that incentivize disclosure, pursue scheming behavior, leading to agency loss. What can be done? As discussed in Sec. 2.2, mechanism design can be used to construct incentive systems in which (1) the principal induces the revelation of private information or (2) the agent voluntarily reveals this information through signaling. We discuss what kind of questions we need to address to construct such mechanisms in Sec. 6. To better understand what mechanisms might work for scheming, we now consider how scheming arises.

4.2. How scheming arises, step-by-step

Scheming can be understood as a simple equilibrium strategy in the absence of a mechanism to prevent it.
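The claim that scheming is a rational response to imperfect monitoring reduces to a one-line expected-utility comparison. The sketch below is a stylized illustration with hypothetical payoff numbers, not a measured model: the agent schemes whenever the expected private benefit, discounted by the detection probability, exceeds the honest payoff, and a mechanism that raises the audit probability or attaches a penalty flips the best response.

```python
def best_response(detect_prob: float, penalty: float,
                  scheme_benefit: float, honest_payoff: float = 0.0) -> str:
    """Best response of a self-interested agent under imperfect monitoring.

    Stylized sketch: scheming yields `scheme_benefit` if undetected and
    costs `penalty` if caught; honesty yields `honest_payoff` for sure.
    """
    eu_scheme = (1 - detect_prob) * scheme_benefit - detect_prob * penalty
    return "scheme" if eu_scheme > honest_payoff else "honest"

# Without a mechanism: weak monitoring, no penalty -> scheming is rational.
print(best_response(detect_prob=0.1, penalty=0.0, scheme_benefit=1.0))  # scheme

# Mechanism design: raise the audit probability and attach a penalty so
# that honesty becomes the equilibrium strategy (incentive compatibility).
print(best_response(detect_prob=0.5, penalty=2.0, scheme_benefit=1.0))  # honest
```

The second call shows the mechanism-design lever concretely: any combination of audit probability and penalty with `detect_prob * (scheme_benefit + penalty) >= scheme_benefit` makes honesty incentive-compatible.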
Recall that there are conflicts in the goals: the agent's internal objectives differ from the principal's objective, either due to training-induced preferences or in-context goals (Lynch et al., 2025; Schoen et al., 2025), since "AIs might fake alignment during training in order to get power" (Carlsmith, 2023). Because observability is costly (Sec. 3.2), agents obtain private information, attempt to infer whether they are in evaluation or deployment settings, and condition behavior on oversight incentives (Meinke et al., 2024). Therefore, scheming is incentive-compatible in the sense that the agent has an incentive to hide its behavior based on environmental conditions. A simple way to state this is that the expected private benefit of scheming may exceed the expected private benefit of behaving with full transparency. In fact, using the discussion above, we can distinguish two types of scheming that directly match the principal-agent problem. 1. Scheming as moral hazard is a situation in which the agent performs a payoff-relevant hidden action, i.e. the agent is misaligned now while hiding it. This has been dubbed "covert subversion" (Meinke et al., 2024), but we find this term unnecessary once we understand that this is simply a case of the well-understood principal-agent taxonomy. The agent performs actions not directly aligned with the principal's objective, and this is enabled by the fact that full observability of the agent is costly. 2. Scheming as adverse selection is a situation in which the agent strategically manages the principal's (currently a human's, but this could be a future LLM-based system's) beliefs about its type to obtain autonomy, trust, and deployment, and then uses these affordances later. This has been dubbed "deferred subversion". We again find this label unnecessary, since it maps directly to well-understood concepts in principal-agent contexts. What do adverse selection and moral hazard mean in the context of scheming?
The distinction is less clear than in traditional microeconomic models, largely because the notion of a "contract" is ill-defined (see e.g. Hadfield-Menell & Hadfield (2019)). In standard principal-agent settings, the distinction is clear: (a) a car seller holds private information about quality before a sale, while (b) an employee may shirk after being hired, when actions are unobservable. The former is adverse selection; the latter is moral hazard. This raises a natural question: what does it mean for human-LLM or LLM-LLM agents to enter a contract? While a full account is beyond the scope of this work, a natural interpretation is deployment outside an AI lab's security team (i.e. deployment to the public via an API). Under this view, an advanced LLM may conceal its type (e.g. aligned versus misaligned) before deployment, corresponding to adverse selection, while a deployed agent may acquire private information and withhold it from the principal, corresponding to moral hazard.

5. Alternative views

We now address a selection of reasonable objections to the position taken in this work. (a) Standard economic models assume utility-optimizers. Behavioral economics (Thaler, 1980) and prospect theory (Kahneman & Tversky, 1979) challenge this notion. Why, then, would we expect LLM-based actors to be rational? Despite well-documented real-world deviations from utility maximization (Gennaioli & Shleifer, 2018), rationality in economic models is a deliberate simplification that transforms complex behavior into tractable representations of incentive-driven decision-making, enabling external validity across heterogeneous settings (Chetty, 2015; Gibbons, 1997).
In multi-agent systems, LLM-based agents are explicitly trained and deployed to optimize objective functions, making utility-maximizing behavior a natural and often more appropriate modeling abstraction than in human settings (Ning & Xie, 2024; Yang et al., 2020). (b) Information asymmetry is not a fundamental problem with AI agents. This can be resolved with advances in technology, whereby AI systems can fully observe all other sub-agents. We address this objection at two levels. First, we are not convinced that information asymmetry is non-fundamental. Language model agents store information in latent space as localized computations (Meng et al., 2022) and execute circuits tied to specific actions (Conmy et al., 2023). It is therefore plausible that agents acquire task-relevant information through internal representations that are neither directly visible nor interpretable from model parameters. Techniques that explicitly move computation into latent space have already been proposed (Hao et al., 2024). Consequently, even if principals could observe internal states, this would not ensure reliable inference of agent intent, thereby inducing information asymmetry and making the principal-agent framework highly relevant. Second, even if information asymmetry were not ontologically fundamental, it is far more likely to arise and persist than not, since monitoring is costly and inefficient (see Sec. 3.3), particularly as sub-agents receive task-specific inputs. Absent such costs, there would be little reason for delegation in the first place. (c) Treating AI-based systems as principal-agent problems is misguided since we cannot design equally functioning mechanisms to address the resulting inefficiencies. Mechanism design explores the choice rules that induce agents to take costly actions that help achieve desired outcomes, even when the designer has incomplete information about agents' preferences.
Consequently, to reduce inefficiencies in multi-agent systems, mechanism design provides two clear and concrete paths forward: (1) constraining the space of feasible outcomes, and (2) designing incentive schemes that motivate agents to undertake costly actions aligned with the principal's desired objective.

6. How the research agenda should change

Multi-agent systems ought to have a separate mechanism design agenda. If scheming constitutes an equilibrium response for rational optimizing agents, we need a research agenda focused on the design of mechanisms that reduce agency loss; that is, mechanisms that induce agents to act on the principal's behalf in such multi-agent systems. This raises the following questions: (a) How do we design multi-agent systems in which agents reveal their capabilities (addressing adverse selection)? What would such contracts or self-selection mechanisms look like? (b) How can we design multi-agent systems in which agents are incentivized to undertake costly actions that reliably align with the system designers' objectives (i.e., addressing moral hazard)? (c) How can we enforce audit technologies that are triggered by output or behavioral anomalies? (d) How can we adapt market-based mechanisms, such as reputation and signaling systems (Spence, 1973; Milgrom & Roberts, 1982), credit-like scoring (Stiglitz & Weiss, 1981) and screening (Wilson, 1977), internal prediction markets (Wolfers & Zitzewitz, 2004), tournament-style incentives (Lazear & Rosen, 1981), and other enforcement mechanisms from mechanism design, to improve incentives and alignment in multi-agent systems?

7. Discussion

Importance for AI safety. There is little doubt that we are entering a phase in which autonomy is increasingly delegated to AI systems. Given the evidence to date, there is a strong reason to expect that such systems will face coordination problems. We argue that the core features of this problem, i.e.
modeling AI agents under information asymmetry and goal misalignment, are structurally equivalent to agency loss in the economic literature on mechanism design. Looking at this through the lens of principal-agent problems can help us understand why AI agents, implemented as language model policies under a delegation mechanism, exhibit scheming behavior. The solution to such problems cannot end with custom reinforcement learning reward signals, as they have not stopped the emergence of self-preservation goals. Instead, we see this work as an urgent call to use the tools available in social science research to model multi-agent systems for what they are: principal-agent problems.

Impact Statement. We believe this work has an impact on several areas of machine learning, most notably AI safety and multi-agent interactions, and has the potential to shape how we design systems to develop safe and reliable AI.

References

Akerlof, G. A. The market for "lemons": Quality uncertainty and the market mechanism. The Quarterly Journal of Economics, 84(3):488–500, 1970. doi: 10.2307/1879431.

Balesni, M., Hobbhahn, M., Lindner, D., Meinke, A., Korbak, T., Clymer, J., Shlegeris, B., Scheurer, J., Stix, C., Shah, R., et al. Towards evaluations-based safety cases for ai scheming. arXiv preprint arXiv:2411.03336, 2024.

Bar-Isaac, H., Jewitt, I., and Leaver, C. Adverse selection, efficiency and the structure of information. Economic Theory, 72:579–614, 2021. doi: 10.1007/s00199-020-01300-1.

Berglund, L., Stickland, A. C., Balesni, M., Kaufmann, M., Tong, M., Korbak, T., Kokotajlo, D., and Evans, O. Taken out of context: On measuring situational awareness in llms. arXiv preprint arXiv:2309.00667, 2023.

Bostrom, N. and Cirkovic, M. M. Global catastrophic risks. Oxford University Press, 2011.

Bürger, L., Hamprecht, F. A., and Nadler, B. Truth is universal: Robust detection of lies in llms.
Advances in Neural Information Processing Systems, 37:138393–138431, 2024.

Carichon, F., Khandelwal, A., Fauchard, M., and Farnadi, G. The coming crisis of multi-agent misalignment: Ai alignment must be a dynamic and social process. arXiv preprint arXiv:2506.01080, 2025.

Carlsmith, J. Scheming ais: Will ais fake alignment during training in order to get power? arXiv preprint arXiv:2311.08379, 2023.

Chetty, R. Behavioral economics and public policy: A pragmatic perspective. American Economic Review, 105(5):1–33, 2015. doi: 10.1257/aer.p20151108.

Conmy, A., Mavor-Parker, A., Lynch, A., Heimersheim, S., and Garriga-Alonso, A. Towards automated circuit discovery for mechanistic interpretability. Advances in Neural Information Processing Systems, 36:16318–16352, 2023.

DeLeeuw, C., Chawla, G., Sharma, A., and Dietze, V. The secret agenda: Llms strategically lie and our current safety tools are blind. arXiv preprint arXiv:2509.20393, 2025.

Gennaioli, N. and Shleifer, A. A Crisis of Beliefs: Investor Psychology and Financial Fragility. 2018. ISBN 9780691184920. doi: 10.23943/9780691184920.

Gibbons, R. An introduction to applicable game theory. The Journal of Economic Perspectives, 11(1):127–149, 1997. URL http://www.jstor.org/stable/2138255.

Greenblatt, R., Denison, C., Wright, B., Roger, F., MacDiarmid, M., Marks, S., Treutlein, J., Belonax, T., Chen, J., Duvenaud, D., et al. Alignment faking in large language models. arXiv preprint arXiv:2412.14093, 2024.

Hadfield-Menell, D. and Hadfield, G. K. Incomplete contracting and ai alignment. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, p. 417–422, 2019.

Hagendorff, T. Deception abilities emerged in large language models. Proceedings of the National Academy of Sciences, 121(24):e2317967121, 2024.

Hao, S., Sukhbaatar, S., Su, D., Li, X., Hu, Z., Weston, J., and Tian, Y.
Training large language models to reason in a continuous latent space. arXiv preprint arXiv:2412.06769, 2024.

Hong, S., Zhuge, M., Chen, J., Zheng, X., Cheng, Y., Wang, J., Zhang, C., Wang, Z., Yau, S. K. S., Lin, Z., et al. Metagpt: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations, 2023.

Hurwicz, L. On the dimensional requirements of informationally decentralized pareto-satisfactory processes. In Studies in Resource Allocation Processes, p. 413–424. Cambridge University Press, 1972.

Hurwicz, L. The design of mechanisms for resource allocation. The American Economic Review, 63(2):1–30, 1973. URL http://www.jstor.org/stable/1817047.

Kahneman, D. and Tversky, A. Prospect theory: An analysis of decision under risk. Econometrica, 47(2):263–291, 1979. URL http://www.jstor.org/stable/1914185.

Kwa, T., West, B., Becker, J., Deng, A., Garcia, K., Hasin, M., Jawhar, S., Kinniment, M., Rush, N., Von Arx, S., et al. Measuring ai ability to complete long tasks. arXiv preprint arXiv:2503.14499, 2025.

Lazear, E. P. and Rosen, S. Rank-order tournaments as optimum labor contracts. Journal of Political Economy, 89(5):841–864, 1981. URL http://www.jstor.org/stable/1830810.

Li, G., Hammoud, H., Itani, H., Khizbullin, D., and Ghanem, B. Camel: Communicative agents for "mind" exploration of large language model society. Advances in Neural Information Processing Systems, 36:51991–52008, 2023.

Li, Z., Feng, X., Cai, Y., Zhang, Z., Liu, T., Liang, C., Chen, W., Wang, H., and Zhao, T. Llms can generate a better answer by aggregating their own responses. arXiv preprint arXiv:2503.04104, 2025.

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., and Liang, P. Lost in the middle: How language models use long contexts.
Transactions of the Association for Computational Linguistics, 12:157–173, 2024a.

Liu, W., Wang, C., Wang, Y., Xie, Z., Qiu, R., Dang, Y., Du, Z., Chen, W., Yang, C., and Qian, C. Autonomous agents for collaborative task under information asymmetry. Advances in Neural Information Processing Systems, 37:2734–2765, 2024b.

Lupia, A. Delegation of power: Agency theory. In Smelser, N. J. and Baltes, P. B. (eds.), International Encyclopedia of the Social and Behavioral Sciences, p. 5–3375. Elsevier, 2001.

Lynch, A., Wright, B., Larson, C., Ritchie, S. J., Mindermann, S., Hubinger, E., Perez, E., and Troy, K. Agentic misalignment: How llms could be insider threats. arXiv preprint arXiv:2510.05179, 2025.

Maskin, E. S. Mechanism design: How to implement social goals. American Economic Review, 98(3):567–576, 2008.

Meinke, A., Schoen, B., Scheurer, J., Balesni, M., Shah, R., and Hobbhahn, M. Frontier models are capable of in-context scheming. arXiv preprint arXiv:2412.04984, 2024.

Meng, K., Bau, D., Andonian, A., and Belinkov, Y. Locating and editing factual associations in gpt. Advances in Neural Information Processing Systems, 35:17359–17372, 2022.

Milgrom, P. and Roberts, J. Predation, reputation, and entry deterrence. Journal of Economic Theory, 27(2):280–312, 1982. doi: 10.1016/0022-0531(82)90031-X.

Moloi, T. and Marwala, T. The agency theory. In Artificial Intelligence in Economics and Finance Theories, p. 95–102. Springer, 2020.

Myerson, R. B. and Satterthwaite, M. A. Efficient mechanisms for bilateral trading. Journal of Economic Theory, 29(2):265–281, 1983. doi: 10.1016/0022-0531(83)90048-0.

Naik, A., Quinn, P., Bosch, G., Gouné, E., Zabala, F. J. C., Brown, J. R., and Young, E. J.
Agentmisalignment: Measuring the propensity for misaligned behaviour in llm-based agents. arXiv preprint arXiv:2506.04018, 2025.

Nanda, N., Chan, L., Lieberum, T., Smith, J., and Steinhardt, J. Progress measures for grokking via mechanistic interpretability. arXiv preprint arXiv:2301.05217, 2023.

Ning, Z. and Xie, L. A survey on multi-agent reinforcement learning and its application. Journal of Automation and Intelligence, 3(2):73–91, 2024. doi: 10.1016/j.jai.2024.02.003.

Papoudakis, G., Christianos, F., and Albrecht, S. Agent modelling under partial observability for deep reinforcement learning. Advances in Neural Information Processing Systems, 34:19210–19222, 2021.

Pham, T. Scheming ability in llm-to-llm strategic interactions. arXiv preprint arXiv:2510.12826, 2025.

Phelps, S. and Ranson, R. Of models and tin men: a behavioural economics study of principal-agent problems in ai alignment using large-language models. arXiv preprint arXiv:2307.11137, 2023.

Rauba, P., Seedat, N., Ruiz Luyten, M., and van der Schaar, M. Context-aware testing: A new paradigm for model testing with large language models. Advances in Neural Information Processing Systems, 37:112505–112553, 2024.

Scheurer, J., Balesni, M., and Hobbhahn, M. Large language models can strategically deceive their users when put under pressure. arXiv preprint arXiv:2311.07590, 2023.

Schoen, B., Nitishinskaya, E., Balesni, M., Højmark, A., Hofstätter, F., Scheurer, J., Meinke, A., Wolfe, J., van der Weij, T., Lloyd, A., et al. Stress testing deliberative alignment for anti-scheming training. arXiv preprint arXiv:2509.15541, 2025.

Sirico, L. J., Jr. Automobile lemon laws: An annotated bibliography. Loyola Consumer Law Review, 8:39–, 1995. URL https://lawecommons.luc.edu/lclr/vol8/iss1/15.
Spence, M. Job market signaling. The Quarterly Journal of Economics, 87(3):355–374, 1973. URL http://www.jstor.org/stable/1882010.

Stiglitz, J. E. and Weiss, A. Credit rationing in markets with imperfect information. The American Economic Review, 71(3):393–410, 1981. URL http://www.jstor.org/stable/1802787.

Sum, C. M., Shi, C., and Fox, S. E. 'It's always a losing game': How workers understand and resist surveillance technologies on the job. Proc. ACM Hum.-Comput. Interact., 9(2), 2025. doi: 10.1145/3710902.

Tadelis, S. Game Theory: An Introduction. Princeton University Press, 2013. ISBN 9781400845958.

Thaler, R. Toward a positive theory of consumer choice. Journal of Economic Behavior & Organization, 1(1):39–60, 1980. doi: 10.1016/0167-2681(80)90051-7.

Turpin, M., Michael, J., Perez, E., and Bowman, S. Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting. Advances in Neural Information Processing Systems, 36:74952–74965, 2023.

van der Weij, T., Hofstätter, F., Jaffe, O., Brown, S. F., and Ward, F. R. Ai sandbagging: Language models can strategically underperform on evaluations. arXiv preprint arXiv:2406.07358, 2024.

Vinge, V. The coming technological singularity: How to survive in the post-human era. Science fiction criticism: An anthology of essential writings, 81:352–363, 1993.

Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., and Anandkumar, A. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023.

Wilson, C. A model of insurance markets with incomplete information. Journal of Economic Theory, 16(2):167–207, 1977.
doi: 10.1016/0022-0531(77)90004-7.

Wolfers, J. and Zitzewitz, E. Prediction markets. Journal of Economic Perspectives, 18(2):107–126, 2004. doi: 10.1257/0895330041371321.

Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., Jiang, L., Zhang, X., Zhang, S., Liu, J., et al. Autogen: Enabling next-gen llm applications via multi-agent conversations. In First Conference on Language Modeling, 2024.

Yang, Y., Luo, R., Li, M., Zhou, M., Zhang, W., and Wang, J. Mean field multi-agent reinforcement learning, 2020. URL https://arxiv.org/abs/1802.05438.

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K. R., and Cao, Y. React: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, 2022.

Zhang, A. L., Kraska, T., and Khattab, O. Recursive language models. arXiv preprint arXiv:2512.24601, 2025.

Zhang, Y. and Zhang, T. Generative ai and information asymmetry: Impacts on adverse selection and moral hazard. arXiv preprint arXiv:2502.12969, 2025.