
Paper deep dive

Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

Zhongming Yu, Naicheng Yu, Hejia Zhang, Wentao Ni, Mingrui Yin, Jiaying Yang, Yujie Zhao, Jishen Zhao

Year: 2026 · Venue: arXiv preprint · Area: cs.AR · Type: Preprint · Embeddings: 18

Abstract

As LLM agents evolve into collaborative multi-agent systems, their memory requirements grow rapidly in complexity. This position paper frames multi-agent memory as a computer architecture problem. We distinguish shared and distributed memory paradigms, propose a three-layer memory hierarchy (I/O, cache, and memory), and identify two critical protocol gaps: cache sharing across agents and structured memory access control. We argue that the most pressing open challenge is multi-agent memory consistency. Our architectural framing provides a foundation for building reliable, scalable multi-agent systems.

Tags

ai-safety (imported, 100%) · csar (suggested, 92%) · preprint (suggested, 88%)

Links

PDF not stored locally. Use the link above to view on the source site.

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 94%

Last extracted: 3/13/2026, 12:56:51 AM

Summary

This paper proposes framing multi-agent LLM memory as a computer architecture problem, introducing a three-layer memory hierarchy (I/O, cache, memory) and identifying critical gaps in cache sharing protocols, memory access control, and multi-agent memory consistency models.

Entities (5)

Multi-Agent Memory · concept · 98%
Multi-Agent Memory Consistency · concept · 96%
Agent Memory Hierarchy · architecture · 95%
Agent Cache Sharing Protocol · protocol · 92%
Agent Memory Access Protocol · protocol · 92%

Relation Signals (4)

Agent Memory Hierarchy comprises Agent I/O Layer

confidence 95% · Agent I/O layer: Interfaces that ingest and emit information

Agent Memory Hierarchy comprises Agent Cache Layer

confidence 95% · Agent cache layer: fast, limited-capacity memory for immediate reasoning

Agent Memory Hierarchy comprises Agent Memory Layer

confidence 95% · Agent memory layer: large-capacity, slower memory optimized for retrieval and persistence

Multi-Agent Memory requires Multi-Agent Memory Consistency

confidence 94% · We argue that the most pressing open challenge is multi-agent memory consistency.

Cypher Suggestions (2)

Find all components of the proposed agent memory hierarchy. · confidence 90% · unvalidated

MATCH (h:Architecture {name: 'Agent Memory Hierarchy'})-[:COMPRISES]->(c:Component) RETURN c.name

Identify critical challenges associated with multi-agent memory. · confidence 90% · unvalidated

MATCH (m:Concept {name: 'Multi-Agent Memory'})-[:REQUIRES]->(c:Concept) RETURN c.name

Full Text

17,313 characters extracted from source content.


Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

Zhongming Yu (zhy025@ucsd.edu, UC San Diego), Naicheng Yu (n7yu@ucsd.edu, UC San Diego), Hejia Zhang (hez024@ucsd.edu, UC San Diego), Wentao Ni (w2ni@ucsd.edu, UC San Diego), Mingrui Yin (m3yin@ucsd.edu, UC San Diego), Jiaying Yang (jesyjy7@gatech.edu, Georgia Institute of Technology), Yujie Zhao (yuz285@ucsd.edu, UC San Diego), Jishen Zhao (jzhao@ucsd.edu, UC San Diego)

Abstract

As LLM agents evolve into collaborative multi-agent systems, their memory requirements grow rapidly in complexity. This position paper frames multi-agent memory as a computer architecture problem. We distinguish shared and distributed memory paradigms, propose a three-layer memory hierarchy (I/O, cache, and memory), and identify two critical protocol gaps: cache sharing across agents and structured memory access control. We argue that the most pressing open challenge is multi-agent memory consistency. Our architectural framing provides a foundation for building reliable, scalable multi-agent systems.

1 Introduction

Large language model (LLM) agents [1, 2] are quickly moving from “single agent” tools [3] to multi-agent systems [4, 5]: tool-using agents [6], planner–orchestrator stacks [7], debate teams [8, 9], and specialized sub-agents that collaborate to solve tasks [10, 11]. At the same time, the context these agents operate within is becoming more complex: longer histories, multiple modalities, structured traces, and customized environments. This combination creates a bottleneck that looks surprisingly familiar to computer architects: memory.
In computer systems, performance and scalability are often limited not by compute but by memory hierarchy, bandwidth, and consistency. Multi-agent systems are heading toward the same wall, except that their “memory” is not raw bytes but semantic context used for reasoning. This position paper frames multi-agent memory as a computer architecture problem and highlights key protocol and consistency gaps.

2 Why Memory Matters: Context Is Changing

LLM evaluations show that “real” context ability involves more than simple retrieval; it requires multi-hop tracing, aggregation, and sustained reasoning as context length scales. Multimodal benchmarks add images, diagrams, and videos. Structured tasks introduce executable traces and schemas. Interactive environments make environment state and execution part of the memory problem. The result is not a static prompt but a dynamic, multi-format, partially persistent memory system.

[Figure 1: Two fundamental multi-agent memory architectures for managing growing context complexity: shared memory and distributed memory.]

• Longer context windows: Suites like RULER [12] emphasize reasoning over long histories, not just retrieval.
• Multimodal inputs: Benchmarks such as MMMU [13] and Video-MME [14] require joint reasoning over images and videos.
• Structured data & traces: Text-to-SQL datasets like Spider [15] and BIRD [16] show that agents increasingly operate over structured, executable traces.
• Customized environments: Evaluations such as SWE-bench [17] and OSWorld [18] stress long-horizon state tracking and grounded actions.

As such, context is no longer a static prompt; it is a dynamic memory system with bandwidth, caching, and coherence constraints.

3 Shared vs. Distributed Agent Memory

Here we name two basic prototypes that mirror classical memory systems.
In shared memory, all agents access a shared pool (e.g., a shared vector store or document database). In distributed memory, each agent owns local memory and synchronizes selectively.

[arXiv:2603.10062v1 [cs.AR], 9 Mar 2026. Architecture 2.0 ’26, March 23, 2026, Pittsburgh, PA, USA.]

[Figure 2: Agent memory hierarchy and protocol framing. (a) Agent memory inspired by computer architecture: an agent I/O layer (user inputs such as audio, text, and images; network), an agent cache layer (compressed context, recent trajectories and tool calls, short-term latent storage), and an agent memory layer (full dialogue history, external knowledge database, long-term latent storage), connected by direct retrieve/store, populate, return, persist, and lookup paths. (b) Protocol extensions for multi-agent scenarios: agent context I/O (e.g., MCP over JSON-RPC), an agent cache sharing protocol that lets one agent’s cached artifacts be transformed and reused by other agents, and an agent memory access protocol that defines how agents read/write other agents’ memory, including permissions, scope, and access granularity.]

Shared memory makes knowledge reuse easy but requires coherence support; without coordination, agents overwrite each other, read stale information, or rely on inconsistent versions of shared facts. Distributed memory improves isolation and scalability but requires explicit synchronization; state divergence becomes common unless carefully managed. Most real systems sit between these extremes: local working memory with selectively shared artifacts.
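The trade-off between the two paradigms can be made concrete with a minimal sketch. All class and method names below are illustrative assumptions, not APIs from the paper or any framework:

```python
"""Minimal sketch of shared vs. distributed agent memory (Section 3).

Assumptions: a dict stands in for a shared vector store or document
database; `sync` stands in for selective synchronization.
"""
from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    """One pool visible to every agent (e.g., a shared vector store)."""
    pool: dict = field(default_factory=dict)

    def write(self, agent: str, key: str, value: str) -> None:
        # Without coordination, concurrent writers silently overwrite
        # each other; this is exactly the coherence gap noted above.
        self.pool[key] = (agent, value)

    def read(self, key: str):
        return self.pool.get(key)


@dataclass
class DistributedMemory:
    """Each agent owns local memory and synchronizes selectively."""
    local: dict = field(default_factory=dict)  # agent -> {key: value}

    def write(self, agent: str, key: str, value: str) -> None:
        self.local.setdefault(agent, {})[key] = value

    def sync(self, src: str, dst: str, keys: list[str]) -> None:
        # Explicit, selective synchronization; any key not synced
        # diverges between agents, matching the paper's warning.
        for k in keys:
            if k in self.local.get(src, {}):
                self.local.setdefault(dst, {})[k] = self.local[src][k]
```

The last-writer-wins overwrite in `SharedMemory.write` is the simplest possible policy; the paper's point is precisely that real systems need something more principled.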
4 An Architecture-Inspired Memory Hierarchy

Computer architecture teaches a practical lesson: systems are not designed around “one memory.” Instead, they are built as memory hierarchies with layers optimized for latency, bandwidth, capacity, and persistence. A useful mapping for agents is as follows:

• Agent I/O layer: interfaces that ingest and emit information (audio, text documents, images, network calls).
• Agent cache layer: fast, limited-capacity memory for immediate reasoning (compressed context, recent tool calls, short-term latent storage such as KV caches and embeddings).
• Agent memory layer: large-capacity, slower memory optimized for retrieval and persistence (full dialogue history, vector DBs, graph DBs, and document stores).

This framing emphasizes a key principle: agent performance is an end-to-end data movement problem. If relevant information is stuck in the wrong layer (or never loaded), reasoning accuracy and efficiency degrade. As in hardware, caching is not optional.

5 Protocol Extensions for Multi-Agent Scenarios

Architecture layers need protocols. Many systems rely on connectivity protocols, but inter-agent bandwidth remains limited by message passing. This layer is best viewed as agent context I/O, e.g., MCP [19]. That is necessary but not sufficient. Two missing pieces stand out.

Missing piece 1: Agent cache sharing protocol. Recent work explores KV cache sharing [20–22], but we lack a principled protocol for sharing cached artifacts across agents. The goal is to enable one agent’s cached results to be transformed and reused by another, analogous to cache transfers in multiprocessors.
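The three-layer hierarchy of Section 4 and the cache-sharing idea of missing piece 1 can be sketched together. The class names, the LRU policy, and the `transform` hook are illustrative assumptions, not the paper's protocol:

```python
"""Sketch: a per-agent cache/memory hierarchy plus a crude
cache-to-cache transfer between agents.

Assumptions: an OrderedDict-based LRU stands in for the agent cache
layer; a plain dict stands in for the slower agent memory layer.
"""
from collections import OrderedDict


class AgentMemory:
    """One agent: small LRU cache layer over a large memory layer."""

    def __init__(self, cache_capacity: int = 4):
        self.capacity = cache_capacity
        self.cache = OrderedDict()  # agent cache layer: fast, bounded
        self.memory = {}            # agent memory layer: large, slow

    def ingest(self, key, value):
        # Agent I/O layer: new information persists to the memory
        # layer and populates the cache on the way in.
        self.memory[key] = value
        self._fill(key, value)

    def lookup(self, key):
        if key in self.cache:                  # cache hit: cheap
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.memory.get(key)           # miss: slow retrieval
        if value is not None:
            self._fill(key, value)             # populate on return
        return value

    def _fill(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:    # evict least-recent
            self.cache.popitem(last=False)


def share_cache(src: AgentMemory, dst: AgentMemory, transform=lambda v: v):
    """Missing piece 1, crudely: one agent's cached artifacts are
    transformed and reused by another, like a multiprocessor
    cache-to-cache transfer."""
    for key, value in src.cache.items():
        dst._fill(key, transform(value))
```

The `transform` hook is where a real protocol would re-encode artifacts (e.g., re-project KV caches between models); here it is just a function on values.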
[Figure 3: Consistency model comparison from traditional memory architecture to multi-agent memory. Architecture memory consistency (over memory addresses): hardware-level guarantees, well-defined models (sequential, TSO, etc.), atomicity concerns, synchronization primitives, reordering constraints; goal: read/write order. Agent memory consistency (over semantic context): temporal coherence, context-related retrieval, updating new and deleting old information, conflict resolution, personalization; goal: maintain coherent context. Multi-agent memory consistency (over shared information): shared/distributed memory, consistency model support, inter-agent communication, conflict resolution; goal: coordinate shared information.]

Missing piece 2: Agent memory access protocol. Agentic memory frameworks [23–26] propose many strategies for maintaining and optimizing LLM agents’ memory. Yet even when some frameworks support shared state, the standard access protocol (permissions, scope, granularity) remains under-specified. Key questions include: Can one agent read another’s long-term memory? Is access read-only or read-write? What is the unit of access: a document, a chunk, a key-value record, or a trace segment?

6 The Next Frontier: Multi-Agent Consistency

The largest conceptual gap is consistency. In computer architecture, consistency models [27] specify which updates are visible to a read and in what order concurrent updates may be observed. We argue that agent memory systems require an analogous notion. For a single LLM agent, consistency demands that its memory remains temporally coherent [28]: new information must be integrated without contradicting established facts, and retrievals must reflect the most current state. Here, consistency is a stateful property of persistent, evolving knowledge.
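The open questions of missing piece 2 can be phrased as a permission check. The `Grant` structure and enforcement logic below are assumptions for illustration; the paper only names the dimensions (permissions, scope, unit of access):

```python
"""Sketch of an agent memory access check (missing piece 2).

Assumptions: scopes are path-like strings; a flat grant list stands
in for whatever policy store a real protocol would define.
"""
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    READ = "read"
    READ_WRITE = "read_write"


class Unit(Enum):
    """Granularity of access, per the paper's question."""
    DOCUMENT = "document"
    CHUNK = "chunk"
    KV_RECORD = "kv_record"
    TRACE_SEGMENT = "trace_segment"


@dataclass(frozen=True)
class Grant:
    grantee: str   # which agent may access
    scope: str     # e.g., "long_term" or "long_term/plans"
    mode: Mode     # read-only vs. read-write
    unit: Unit     # unit of access


def check(grants: list, agent: str, scope: str,
          write: bool, unit: Unit) -> bool:
    """Answer the paper's three questions for one request: may
    `agent` access `scope`, in this mode, at this granularity?"""
    for g in grants:
        if (g.grantee == agent
                and scope.startswith(g.scope)   # grant covers scope
                and g.unit == unit
                and (not write or g.mode is Mode.READ_WRITE)):
            return True
    return False
```

A read-only chunk grant on `long_term` then permits a planner agent to read `long_term/plans` but denies writes and denies other agents, which is exactly the kind of explicit answer the paper argues current frameworks leave unspecified.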
When we move to multi-agent settings, the problem compounds: multiple agents now read from and write to shared memory concurrently, raising classical challenges of visibility, ordering, and conflict resolution. For agent memory systems, multi-agent memory consistency decomposes into two requirements: read-time conflict handling under iterative revisions, where records evolve across versions and stale artifacts may remain visible, and update-time visibility and ordering, which determines when an agent’s writes become observable to others and how concurrent writes may be observed in a permissible order. This is harder than classical settings because memory artifacts are heterogeneous (evidence, tool traces, plans), and conflicts are often semantic and coupled to environment state. A practical direction is to make versioning, visibility, and conflict-resolution rules explicit, so agents agree on what to read and when updates take effect.

7 Conclusion

Many agent memory systems today resemble human memory: informal, redundant, and hard to control. To move from ad-hoc prompting to reliable multi-agent systems, we need better hierarchies, explicit protocols for cache sharing and memory access, and principled consistency models that keep shared context coherent. We believe this architectural framing is a foundational research direction for next-generation agent systems.

References

[1] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. 2022. ReAct: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations.
[2] Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language agents with verbal reinforcement learning.
Advances in Neural Information Processing Systems 36 (2023), 8634–8652.
[3] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems 36 (2023), 68539–68551.
[4] Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V Chawla, Olaf Wiest, and Xiangliang Zhang. 2024. Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680 (2024).
[5] Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al. 2023. MetaGPT: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations.
[6] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. 2024. AutoGen: Enabling next-gen LLM applications via multi-agent conversations. In First Conference on Language Modeling.
[7] LangChain Inc. 2026. LangGraph overview. https://docs.langchain.com/oss/python/langgraph/overview. Accessed: 2026-02-11.
[8] Microsoft. 2026. Multi-Agent Debate. https://microsoft.github.io/autogen/stable/user-guide/core-user-guide/design-patterns/multi-agent-debate.html. Accessed: 2026-02-11.
[9] Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, and Zhiyuan Liu. 2023. ChatEval: Towards better LLM-based evaluators through multi-agent debate. arXiv preprint arXiv:2308.07201 (2023).
[10] LangChain. 2026. Subagents. https://docs.langchain.com/oss/python/langchain/multi-agent/subagents. Accessed: 2026-02-11.
[11] Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, and Emad Barsoum. 2025.
Agent Laboratory: Using LLM agents as research assistants. Findings of the Association for Computational Linguistics: EMNLP 2025 (2025), 5977–6043.
[12] Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, and Boris Ginsburg. 2024. RULER: What’s the Real Context Size of Your Long-Context Language Models? arXiv preprint arXiv:2404.06654 (2024).
[13] Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, et al. 2024. MMMU: A massive multi-discipline multimodal understanding and reasoning benchmark for expert AGI. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9556–9567.
[14] Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, et al. 2025. Video-MME: The first-ever comprehensive evaluation benchmark of multi-modal LLMs in video analysis. In Proceedings of the Computer Vision and Pattern Recognition Conference. 24108–24118.
[15] Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, et al. 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. arXiv preprint arXiv:1809.08887 (2018).
[16] Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, et al. 2023. Can LLM already serve as a database interface? A big bench for large-scale database grounded text-to-SQLs. Advances in Neural Information Processing Systems 36 (2023), 42330–42357.
[17] Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2023. SWE-bench: Can language models resolve real-world GitHub issues? arXiv preprint arXiv:2310.06770 (2023).
[18] Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh J Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, et al. 2024. OSWorld: Benchmarking multimodal agents for open-ended tasks in real computer environments. Advances in Neural Information Processing Systems 37 (2024), 52040–52094.
[19] Anthropic. 2025. Model Context Protocol: Introduction. https://modelcontextprotocol.io/docs/getting-started/intro
[20] Yuhan Liu, Yuyang Huang, Jiayi Yao, Shaoting Feng, Zhuohan Gu, Kuntai Du, Hanchen Li, Yihua Cheng, Junchen Jiang, Shan Lu, et al. 2024. DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving. arXiv preprint arXiv:2411.02820 (2024).
[21] Tianyu Fu, Zihan Min, Hanling Zhang, Jichao Yan, Guohao Dai, Wanli Ouyang, and Yu Wang. 2025. Cache-to-cache: Direct semantic communication between large language models. arXiv preprint arXiv:2510.03215 (2025).
[22] Hancheng Ye, Zhengqi Gao, Mingyuan Ma, Qinsi Wang, Yuzhe Fu, Ming-Yu Chung, Yueqian Lin, Zhijian Liu, Jianyi Zhang, Danyang Zhuo, et al. 2025. KVComm: Online cross-context KV-cache communication for efficient LLM-based multi-agent systems. arXiv preprint arXiv:2510.12872 (2025).
[23] Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. 2023. MemGPT: Towards LLMs as Operating Systems. arXiv preprint arXiv:2310.08560 (2023).
[24] Wujiang Xu, Kai Mei, Hang Gao, Juntao Tan, Zujie Liang, and Yongfeng Zhang. 2025. A-Mem: Agentic memory for LLM agents. arXiv preprint arXiv:2502.12110 (2025).
[25] Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. 2025. Mem0: Building production-ready AI agents with scalable long-term memory. arXiv preprint arXiv:2504.19413 (2025).
[26] Hongjin Qian, Peitian Zhang, Zheng Liu, Kelong Mao, and Zhicheng Dou. 2024. MemoRAG: Moving towards next-gen RAG via memory-inspired knowledge discovery. arXiv preprint arXiv:2409.05591 (2024).
[27] Daniel Sorin, Mark Hill, and David Wood. 2011. A Primer on Memory Consistency and Cache Coherence. Morgan & Claypool Publishers.
[28] Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–22.