Paper deep dive
From Control to Foresight: Simulation as a New Paradigm for Human-Agent Collaboration
Gaole He, Brian Y. Lim
Abstract
Large Language Models (LLMs) are increasingly used to power autonomous agents for complex, multi-step tasks. However, human-agent interaction remains pointwise and reactive: users approve or correct individual actions to mitigate immediate risks, without visibility into subsequent consequences. This forces users to mentally simulate long-term effects, a cognitively demanding and often inaccurate process. Users have control over individual steps but lack the foresight to make informed decisions. We argue that effective collaboration requires foresight, not just control. We propose simulation-in-the-loop, an interaction paradigm that enables users and agents to explore simulated future trajectories before committing to decisions. Simulation transforms intervention from reactive guesswork into informed exploration, while helping users discover latent constraints and preferences along the way. This perspective paper characterizes the limitations of current paradigms, introduces a conceptual framework for simulation-based collaboration, and illustrates its potential through concrete human-agent collaboration scenarios.
Links
- Source: https://arxiv.org/abs/2603.11677v1
- Canonical: https://arxiv.org/abs/2603.11677v1
PDF not stored locally. Use the link above to view on the source site.
Full Text
From Control to Foresight: Simulation as a New Paradigm for Human–Agent Collaboration

GAOLE HE, National University of Singapore, Singapore
BRIAN Y. LIM, National University of Singapore, Singapore

Large Language Models (LLMs) are increasingly used to power autonomous agents for complex, multi-step tasks. However, human-agent interaction remains pointwise and reactive: users approve or correct individual actions to mitigate immediate risks, without visibility into subsequent consequences. This forces users to mentally simulate long-term effects, a cognitively demanding and often inaccurate process. Users have control over individual steps but lack the foresight to make informed decisions. We argue that effective collaboration requires foresight, not just control. We propose simulation-in-the-loop, an interaction paradigm that enables users and agents to explore simulated future trajectories before committing to decisions. Simulation transforms intervention from reactive guesswork into informed exploration, while helping users discover latent constraints and preferences along the way. This perspective paper characterizes the limitations of current paradigms, introduces a conceptual framework for simulation-based collaboration, and illustrates its potential through concrete human-agent collaboration scenarios.

Additional Key Words and Phrases: Human-Agent Collaboration; LLM Agents; Simulation; Human-Computer Interaction

ACM Reference Format: Gaole He and Brian Y. Lim. 2026. From Control to Foresight: Simulation as a New Paradigm for Human–Agent Collaboration. In CHI 2026 Workshop on Human-Agent Collaboration, April 13–17, 2026, Barcelona, Spain. ACM, New York, NY, USA, 5 pages.
https://doi.org/10.1145/n.n

Authors' Contact Information: Gaole He, hegaole@nus.edu.sg, National University of Singapore, Singapore, Singapore; Brian Y. Lim, brianlim@nus.edu.sg, National University of Singapore, Singapore, Singapore.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
© 2026 Copyright held by the owner/author(s). Manuscript submitted to ACM

arXiv:2603.11677v1 [cs.HC] 12 Mar 2026

1 Introduction

Equipped with toolkits such as browser search and machine access, large language models (LLMs) have shown promising potential to interact with the world and assist with more complex, multi-step tasks—ranging from travel planning to code generation [15,18]. To ensure reliable and accountable outcomes, LLM agents are often supervised by humans [5,8]: the agent proposes a sequence of actions, and the human is asked to approve or correct decisions at key junctures. However, this form of collaboration rests on an implicit assumption: that humans can act as oracles [4,12], making sound decisions with minimal context. In practice, when humans are inserted into multi-step workflows, they are asked to make critical decisions with limited information—without visibility into how their approval or action might shape subsequent outcomes [3]. Without visibility into downstream consequences, human intervention becomes short-sighted rather than informed collaboration [11]. In these cases, humans are forced to rely on intuition or mental simulation, both of which are prone to error—especially as task complexity grows. This limitation is especially pronounced in long-horizon tasks, where early decisions cascade into future outcomes in ways that are difficult to anticipate [9]. For example, a seemingly plausible choice (e.g., booking a tight connection flight) can propagate through the planning horizon, amplifying or constraining downstream possibilities. Yet current human-agent interactions do not proactively support reasoning about these ripple effects. Without such support, users are left to mentally simulate alternative futures, a process that is cognitively demanding and notoriously unreliable, especially when tasks involve long-range dependencies and stochastic outcomes. For instance, a delayed flight can cascade into missed connections. Worse still, this narrow view forecloses serendipity [13]: when humans see only the immediate next step and react to it, they miss the opportunity to discover unexpected but valuable alternatives that lie off the agent's proposed path. This results in a substantial asymmetry: while LLM agents can explore possible actions (e.g., tree-based search over the action space [1,10]) and their subsequent impacts, the human collaborator is given access to only a single path through that tree—the trajectory proposed by the agent. We argue that addressing this asymmetry requires more than just giving humans control over actions—it requires giving them foresight into the consequences of those actions. Control without foresight is like driving at night with no headlights: you can turn the wheel, but you cannot see what lies ahead. To this end, we propose simulation-in-the-loop collaboration, an interaction paradigm that allows humans and agents to preview counterfactual future trajectories before committing to decisions.
By generating and visualizing possible outcomes across multiple paths, simulation transforms human intervention from reactive guesswork into proactive exploration. It also creates space for serendipity: as users explore simulated futures, they may discover valuable alternatives or latent constraints that were not apparent from the agent's initial proposal. In this paper, we (1) articulate the limitations of existing pointwise interaction paradigms, (2) introduce a conceptual framework and design space for simulation-based collaboration, and (3) illustrate the approach through concrete scenarios with LLM agents and other planning tasks.

2 Simulation-in-the-Loop Collaboration

We introduce simulation-in-the-loop collaboration, an interaction paradigm in which humans and agents jointly explore simulated future trajectories before committing to real-world actions.

2.1 Concept and Definition

First, we introduce four core concepts that ground our framework:

- Agentic Workflow: The unit of analysis is a multi-step task performed by an LLM agent under human oversight (e.g., travel planning). The workflow proceeds through a sequence of actions, during which the agent may request human input. For simplicity, we can view it as a step-by-step planning-execution process [5,8]: at each step, the agent predicts an action, and the human may approve, modify, or override it before execution.
- Action Space: At each step, the agent considers multiple possible actions (e.g., search flights, search relevant news, and search social media), each leading to different downstream trajectories. LLM agents inherently explore such trees during planning—through beam search [14], Monte Carlo tree search [1,2], or implicit generation of alternatives—but this exploration typically remains internal and invisible to the human.
- Simulation: We define simulation as the agent's ability to externalize this internal exploration: before committing to a decision, the agent generates and presents multiple future trajectories for human preview. Simulation is not planning (which seeks an optimal path) but rather exploration for sensemaking—making the tree of possibilities visible and navigable.
- Simulated Impact: Each simulated trajectory is annotated with key outcomes—risks, opportunities, trade-offs, uncertainties—that help humans compare alternatives. Simulated impact translates abstract futures into concrete, decision-relevant outcomes, enabling foresight rather than guesswork.

Fig. 1. Comparison of interaction paradigms at a decision point. (Figure not reproduced. It contrasts current human-agent collaboration, where the human sees only the agent's proposed action A and its immediate context, with simulation-in-the-loop, where the agent invites exploration of Paths A–D and their simulated outcomes: Path A, 30% delay risk with expected on-time arrival; Path B, no delay risk but $50 more in budget; Path C, skip one meeting and save 2 hours; Path D, high uncertainty but potential opportunity.)

2.2 Illustrative Scenario

We ground our framework in one concrete scenario: multi-city trip planning. As visualized in Figure 1, the simulation acts as an intermediate layer between agent proposals and human commitment. At a decision point, the agent must choose whether to book a tight connection between flights. In the current human-agent collaboration mode, the agent would simply propose its preferred option, i.e., Path A: a direct flight with a 1-hour layover—and ask for feedback.
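The decision point of Figure 1 can be sketched in a few lines: each candidate path carries its simulated-impact annotations, and the agent externalizes all of them for preview rather than surfacing only its preferred action. A minimal Python sketch, assuming a simple data structure of our own devising (the type names and the exact wording of each path are illustrative, not from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class SimulatedPath:
    """One candidate future trajectory, annotated with its simulated impact."""
    label: str                 # e.g. "Path A"
    action: str                # the concrete step being previewed
    impacts: list[str] = field(default_factory=list)  # decision-relevant outcomes

def externalize(paths: list[SimulatedPath]) -> str:
    """Render all candidate trajectories side by side for human preview,
    instead of showing only the agent's single preferred action."""
    lines = []
    for p in paths:
        lines.append(f"{p.label}: {p.action}")
        lines.extend(f"  - {impact}" for impact in p.impacts)
    return "\n".join(lines)

# The four alternatives of Figure 1, expressed as data.
alternatives = [
    SimulatedPath("Path A", "direct flight, 1-hour layover",
                  ["30% delay risk", "expected on-time arrival"]),
    SimulatedPath("Path B", "later flight, longer layover",
                  ["no delay risk", "$50 more in budget"]),
    SimulatedPath("Path C", "reshuffled itinerary",
                  ["skip one meeting", "save 2 hours"]),
    SimulatedPath("Path D", "fly into a different airport",
                  ["high uncertainty", "potential opportunity"]),
]

print(externalize(alternatives))
```

In this framing the human decision step consumes the whole rendered comparison, not a single proposal; approval, modification, or override then happens against the full set of previewed futures.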
The user, seeing only that this option is available and cheaper than alternatives, might approve without realizing the downstream risk or that lower-risk alternatives even exist. With simulation-in-the-loop, the agent externalizes multiple alternatives (Paths A–D in Figure 1). Path A (the agent's proposal) is annotated with a simulated impact: 30% delay risk due to short connection time. Path B, a later flight with a longer layover, shows no delay risk but adds $50 in cost. By comparing these simulated outcomes, the human discovers a risk they had not considered and can make an informed trade-off between time and reliability. The simulation also reveals Path D: a flight into a different airport, which opens up new options the user had not considered, illustrating how simulation enables serendipitous discovery. By externalizing possible futures, the human's role shifts from reactive supervision to proactive planning and negotiation with the agent.

2.3 Design Space for Simulation-Based Interaction

Designing effective simulations involves navigating trade-offs along three key dimensions. These dimensions are not merely technical parameters; they shape how humans reason, trust, and collaborate with agents—making simulation a design choice.

Lookahead Depth. How far into the future should simulations project? Deeper lookahead provides greater foresight but risks information overload and compounding uncertainty. Shallower previews are more reliable but may miss critical downstream effects.

Exploration Breadth. How many alternative futures should be shown? A single trajectory minimizes cognitive load but risks tunnel vision. Multiple branches enable comparison and serendipity, but may overwhelm. Beyond quantity, systems must also convey outcome diversity—ensuring displayed futures represent meaningfully different possibilities.

Granularity. How detailed should simulations be?
Fine-grained simulations (e.g., code execution) provide rich information but incur latency. Coarse-grained approximations (e.g., LLM sketches) are faster but risk omitting critical details or hallucinating. The balance depends on task criticality.

3 Challenges and Opportunities

Implementing simulation-in-the-loop collaboration introduces several challenges, yet also opens new directions for human-agent collaboration.

3.1 Challenges

Simulation Reliability. Simulation requires a model or environment that can generate plausible future trajectories. While this is feasible for tasks with well-defined environments (e.g., games and code execution), it becomes technically challenging in open-ended domains where world dynamics are less structured. Relying on LLMs to simulate their own futures—asking the agent to predict "what if"—offers a tempting but uncertain alternative. LLM-generated simulations may hallucinate, omit critical dependencies, or produce overly optimistic trajectories [6,7,16]. This highlights the need for more reliable world models [19] that can support simulation across diverse, open-ended tasks.

What to Simulate. Not all possible futures are worth users' attention. Simulations can generate an abundance of trajectories, but presenting trivial or near-identical options adds little value. It is therefore important to identify which outcomes are non-trivial and decision-relevant: surfacing paths that reveal genuine trade-offs, hidden risks, or unexpected opportunities, while filtering out those that offer no new insight.

Cognitive Load. Even with careful filtering, comparing multiple futures imposes cognitive demands. The interface must help users navigate across trajectories, understand trade-offs, and track their exploration—without becoming a source of confusion itself. If users cannot integrate simulated outcomes into their decisions, simulation adds noise rather than insight.

3.2 Opportunities

From Reactive to Proactive Collaboration.
Current proactive agents act autonomously unless interrupted. Simulation-in-the-loop offers a middle ground: agents proactively show possible futures, inviting human input before acting. This shifts collaboration from "human as supervisor" to "human as explorer".

Enabling Backtracking by Anticipation. Recent work on backtracking agents [17] focuses on error detection and recovery in LLM agents. Simulation flips this backward-looking repair into forward-looking prevention—helping humans and agents avoid dead ends before committing.

Discovering Latent Constraints and Needs. As users explore simulated futures, they encounter constraints embedded in the task—dependencies, resource limits, timing conflicts—that were not visible from the initial proposal. At the same time, they may discover gaps between their expectations and what is achievable, revealing unstated preferences or new goals. This turns collaboration into joint discovery: requirements emerge dynamically through exploration, and the agent's final outcome improves precisely because these latent factors are surfaced before commitment, not after.

References

[1] Shenghui Chen, Daniel Fried, and Ufuk Topcu. 2024. Human-Agent Cooperation in Games under Incomplete Information through Natural Language Communication. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, South Korea, August 3-9, 2024. ijcai.org, 7833–7841. https://www.ijcai.org/proceedings/2024/867
[2] Shenghui Chen, Ruihan Zhao, Sandeep Chinchali, and Ufuk Topcu. 2025. Human-Agent Coordination in Games under Incomplete Information via Multi-Step Intent. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
[3] Maurice Chiodo, Dennis Müller, Paul Siewert, Jean-Luc Wetherall, Zoya Yasmine, and John Burden. 2025. Formalising Human-in-the-Loop: Computational Reductions, Failure Modes, and Legal-Moral Responsibility. CoRR abs/2505.10426 (2025). doi:10.48550/ARXIV.2505.10426
[4] Katherine Maeve Collins, Matthew Barker, Mateo Espinosa Zarlenga, Naveen Raman, Umang Bhatt, Mateja Jamnik, Ilia Sucholutsky, Adrian Weller, and Krishnamurthy Dvijotham. 2023. Human Uncertainty in Concept-Based AI Systems. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, AIES 2023, Montréal, QC, Canada, August 8-10, 2023, Francesca Rossi, Sanmay Das, Jenny Davis, Kay Firth-Butterfield, and Alex John (Eds.). ACM, 869–889.
[5] K. J. Kevin Feng, Kevin Pu, Matt Latzke, Tal August, Pao Siangliulue, Jonathan Bragg, Daniel S. Weld, Amy X. Zhang, and Joseph Chee Chang. 2024. Cocoa: Co-Planning and Co-Execution with AI Agents. CoRR abs/2412.10999 (2024). arXiv:2412.10999 doi:10.48550/ARXIV.2412.10999
[6] Chen Gao, Xiaochong Lan, Nian Li, Yuan Yuan, Jingtao Ding, Zhilun Zhou, Fengli Xu, and Yong Li. 2024. Large language models empowered agent-based modeling and simulation: A survey and perspectives. Humanities and Social Sciences Communications 11, 1 (2024), 1–24.
[7] Giacomo Grattafiori, Jun Hu, and Baolin Peng. 2025. From word to world: Can large language models be implicit text-based world models? arXiv preprint arXiv:2512.18832 (2025).
[8] Gaole He, Gianluca Demartini, and Ujwal Gadiraju. 2025. Plan-then-Execute: An Empirical Study of User Trust and Team Performance when Using LLM Agents as a Daily Assistant. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–22.
[9] Gaole He, Patrick Hemmer, Michael Vössing, Max Schemmer, and Ujwal Gadiraju. 2025. Fine-Grained Appropriate Reliance: Human-AI Collaboration with a Multi-Step Transparent Decision Workflow for Complex Task Decomposition. arXiv preprint arXiv:2501.10909 (2025).
[10] Jing Yu Koh, Stephen Marcus McAleer, Daniel Fried, and Ruslan Salakhutdinov. 2025. Tree Search for Language Model Agents. Transactions on Machine Learning Research (2025). https://openreview.net/forum?id=QF0N3x2XVm
[11] Noam Kolt. 2025. Governing AI Agents. arXiv:2501.07913 [cs.AI] https://arxiv.org/abs/2501.07913
[12] Stephan J. Lemmer and Jason J. Corso. 2023. Evaluating and Improving Interactions with Hazy Oracles. In Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, February 7-14, 2023, Brian Williams, Yiling Chen, and Jennifer Neville (Eds.). AAAI Press, 6039–6047.
[13] Zainab Salma, Raquel Hijón-Neira, and Celeste Pizarro. 2025. Designing Co-Creative Systems: Five Paradoxes in Human-AI Collaboration. Inf. 16, 10 (2025), 909. doi:10.3390/INFO16100909
[14] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, Zoubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger (Eds.). 3104–3112.
[15] Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. 2024. A survey on large language model based autonomous agents. Frontiers of Computer Science 18, 6 (2024), 186345.
[16] Ruoyao Wang, Graham Todd, Ziang Xiao, Xingdi Yuan, Marc-Alexandre Côté, Peter Clark, and Peter A. Jansen. 2024. Can Language Models Serve as Text-Based World Simulators? In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, 1–17.
[17] Qinzhuo Wu, Pengzhi Gao, Wei Liu, and Jian Luan. 2025. BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng (Eds.). Association for Computational Linguistics, Suzhou, China, 4250–4272. doi:10.18653/v1/2025.emnlp-main.212
[18] Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, et al. 2025. The rise and potential of large language model based agents: A survey. Science China Information Sciences 68, 2 (2025), 121101.
[19] Jiannan Xiang, Yi Gu, Zihan Liu, Zeyu Feng, Qiyue Gao, Yiyan Hu, Benhao Huang, Guangyi Liu, Yichi Yang, Kun Zhou, Davit Abrahamyan, Arif Ahmad, Ganesh Bannur, Junrong Chen, Kimi Chen, Mingkai Deng, Ruobing Han, Xinqi Huang, Haoqiang Kang, Zheqi Li, Enze Ma, Hector Ren, Yashowardhan Shinde, Rohan Shingre, Ramsundar Tanikella, Kaiming Tao, Dequan Yang, Xinle Yu, Cong Zeng, Binglin Zhou, Hector Liu, Zhiting Hu, and Eric P. Xing. 2025. PAN: A World Model for General, Interactable, and Long-Horizon World Simulation. CoRR abs/2511.09057 (2025). doi:10.48550/ARXIV.2511.09057