Paper deep dive
AgentOS: From Application Silos to a Natural Language-Driven Data Ecosystem
Rui Liu, Tao Zhe, Dongjie Wang, Zijun Yao, Kunpeng Liu, Yanjie Fu, Huan Liu, Jian Pei
Intelligence
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 95%
Last extracted: 3/13/2026, 12:56:48 AM
Summary
The paper proposes 'AgentOS', a paradigm shift from legacy GUI-based operating systems to a natural language-driven architecture. AgentOS replaces traditional desktops with a 'Single Port' interface and an 'Agent Kernel' that manages multi-agent orchestration, intent mining, and skill-based modular software. The authors frame the realization of AgentOS as a Knowledge Discovery and Data Mining (KDD) challenge, requiring real-time intent parsing, personal knowledge graph construction, and sequential pattern mining for workflow optimization, while addressing security through a 'Semantic Firewall'.
Entities (6)
Relation Signals (4)
AgentOS → utilizes → Agent Kernel
confidence 100% · At the core of AgentOS lies the Agent Kernel
AgentOS → replaces → GUI
confidence 98% · In AgentOS, traditional GUI desktops are replaced by a Natural User Interface (NUI)
Agent Kernel → implements → Semantic Firewall
confidence 95% · Integrated within the Agent Kernel, it [Semantic Firewall] functions as a real-time text and data mining security layer
Agent Kernel → manages → Personal Knowledge Graph
confidence 95% · AgentOS processes multimodal interaction streams... to update the PKG in real time.
Cypher Suggestions (2)
Find all components of the AgentOS architecture · confidence 90% · unvalidated
MATCH (a:SystemComponent)-[:PART_OF]->(os:OperatingSystem {name: 'AgentOS'}) RETURN aIdentify security mechanisms protecting the Agent Kernel · confidence 85% · unvalidated
MATCH (k:SystemComponent {name: 'Agent Kernel'})<-[:PROTECTS]-(s:SecurityMechanism) RETURN sAbstract
Abstract:The rapid emergence of open-source, locally hosted intelligent agents marks a critical inflection point in human-computer interaction. Systems such as OpenClaw demonstrate that Large Language Model (LLM)-based agents can autonomously operate local computing environments, orchestrate workflows, and integrate external tools. However, within the current paradigm, these agents remain conventional applications running on legacy operating systems originally designed for Graphical User Interfaces (GUIs) or Command Line Interfaces (CLIs). This architectural mismatch leads to fragmented interaction models, poorly structured permission management (often described as "Shadow AI"), and severe context fragmentation. This paper proposes a new paradigm: a Personal Agent Operating System (AgentOS). In AgentOS, traditional GUI desktops are replaced by a Natural User Interface (NUI) centered on a unified natural language or voice portal. The system core becomes an Agent Kernel that interprets user intent, decomposes tasks, and coordinates multiple agents, while traditional applications evolve into modular Skills-as-Modules enabling users to compose software through natural language rules. We argue that realizing AgentOS fundamentally becomes a Knowledge Discovery and Data Mining (KDD) problem. The Agent Kernel must operate as a real-time engine for intent mining and knowledge discovery. Viewed through this lens, the operating system becomes a continuous data mining pipeline involving sequential pattern mining for workflow automation, recommender systems for skill retrieval, and dynamically evolving personal knowledge graphs. These challenges define a new research agenda for the KDD community in building the next generation of intelligent computing systems.
Tags
Links
- Source: https://arxiv.org/abs/2603.08938v2
- Canonical: https://arxiv.org/abs/2603.08938v2
PDF not stored locally. Use the link above to view on the source site.
Full Text
36,011 characters extracted from source content.
Expand or collapse full text
AgentOS: From Application Silos to a Natural Language-Driven Data Ecosystem Rui Liu University of Kansas Lawrence, USA rayliu@ku.edu Tao Zhe University of Kansas Lawrence, USA taozhe@ku.edu Dongjie Wang ∗ University of Kansas Lawrence, USA wangdongjie@ku.edu Zijun Yao University of Kansas Lawrence, USA zyao@ku.edu Kunpeng Liu Clemson University Clemson, USA kunpenl@clemson.edu Yanjie Fu Arizona State University Tempe, USA yanjie.fu@asu.edu Huan Liu Arizona State University Tempe, USA huanliu@asu.edu Jian Pei Duke University Durham, USA j.pei@duke.edu Abstract The rapid emergence of open-source, locally hosted intelligent agents marks a critical inflection point in human-computer interac- tion. Systems such as OpenClaw demonstrate that Large Language Model (LLM)-based agents can autonomously operate local comput- ing environments, orchestrate workflows, and integrate external tools. However, within the current paradigm, these agents remain conventional applications running on legacy operating systems originally designed for Graphical User Interfaces (GUIs) or Com- mand Line Interfaces (CLIs). This architectural mismatch leads to fragmented interaction models, poorly structured permission management (often described as "Shadow AI"), and severe context fragmentation. This paper proposes a new paradigm: a Personal Agent Operating System (AgentOS). In AgentOS, traditional GUI desktops are replaced by a Natural User Interface (NUI) centered on a unified natural language or voice portal. The system core becomes an Agent Kernel that interprets user intent, decomposes tasks, and coordinates multiple agents, while traditional applications evolve into modular Skills-as-Modules enabling users to compose software through natural language rules. We argue that realizing AgentOS fundamentally becomes a Knowledge Discovery and Data Mining (KDD) problem. The Agent Kernel must operate as a real-time en- gine for intent mining and knowledge discovery. Viewed through this lens, the operating system becomes a continuous data mining pipeline involving sequential pattern mining for workflow automa- tion, recommender systems for skill retrieval, and dynamically evolving personal knowledge graphs. These challenges define a new research agenda for the KDD community in building the next generation of intelligent computing systems. 1Introduction: The OpenClaw Moment and the End of the GUI 1.1 The Eruption of Local Autonomous Agents In early 2026, the computing landscape experienced a rapid shift driven by locally deployed autonomous agents, exemplified by ∗ Corresponding author OpenClaw (formerly Moltbot and Clawdbot). As an open-source, locally self-hosted AI assistant, OpenClaw quickly gained atten- tion within the developer community, accumulating over 100,000 GitHub stars within weeks. It demonstrated a new paradigm of hu- man–computer interaction: an AI agent capable of directly operat- ing personal computing environments on behalf of users [20,26,29]. By integrating messaging gateways (e.g., WhatsApp, Telegram, iMessage, and Slack) with the Model Context Protocol (MCP), Open- Claw functions as a persistent background agent capable of reading and writing files, executing terminal commands, managing calen- dars, and browsing the web [1,18]. Developers often describe it as “Claude with hands,” highlighting that it is not merely a con- versational interface but a continuously running agent interacting directly with the operating system. Through an autonomous agentic loop, it maintains long-term memory and executes long-horizon tasks with minimal supervision [15]. Meanwhile, proprietary sys- tems—such as Anthropic’s Claude Code Remote Control and Per- plexity Computer—illustrate growing demand for scheduled task execution, deep contextual integration, and remote AI orchestration in enterprise and developer workflows. However, the rapid adop- tion of such agents exposes a fundamental limitation of modern computing architectures: they still operate as user-space processes on legacy operating systems (e.g., Windows, macOS, and Linux) designed for human-driven graphical interfaces and isolated appli- cation silos rather than continuously operating autonomous agents. 1.2 The "Shadow AI" Crisis and the "Screen-as-Interface" Bottleneck Forcing autonomous agents to operate within legacy operating sys- tems creates a phenomenon analogous to “Shadow IT,” which we term Shadow AI. Because these systems lack semantic understand- ing of agent actions, agents rely on brittle interaction workarounds. Consequently, many mobile and desktop agents operate under a “Screen-as-Interface” paradigm, using visual scraping (pixels-to- text) or simulated GUI interactions (e.g., mouse clicks and keystrokes) to perform tasks [17]. This GUI-centric interaction model is funda- mentally ill-suited for autonomous agents for several reasons: arXiv:2603.08938v2 [cs.AI] 11 Mar 2026 System actions via MCP on legacy OS kernal Google File Not Fund File Not Fund File Not Found Email Error x ? Agent Core Agent 1 (Scheduler) Agent 2 (File Manager) Agent 3 (Network) Multi-agent Orchestration ... Natural Language Input Voice Command Traditional OS (GUI-based) Agent OS (Proposed) ... Figure 1: Paradigm Shift from GUI-Based Operating Systems to AgentOS with Multi-Agent Orchestration and Natural Lan- guage Interface. • Loss of Semantic Information: GUIs are designed for human visual interpretation, presenting visual outputs while obscuring underlying structured data and metadata. Agents that “read” the screen therefore lose semantic context, often leading to reasoning errors [5, 36]. • Fragile Execution Paths: Interface layouts frequently change. When applications update their interfaces, agents relying on visual markers or fixed screen coordinates may fail cata- strophically [37]. •Runaway Security and Permissions: Legacy operating systems manage permissions at the application level (e.g., file system access). Once an autonomous agent receives such permissions, the system cannot reliably distinguish legiti- mate actions from malicious behaviors (e.g., indirect prompt injection leading to data exfiltration) [3, 23]. 1.3 Paradigm Shift: From GUI to Natural User Interfaces Historical trends in computing indicate a transition from graphical user interfaces (GUIs) toward natural user interfaces (NUIs). Tradi- tional operating systems are largely passive, responding to explicit user commands through deterministic program logic. Future com- puting environments, however, require proactive and probabilistic systems capable of inferring user intent from ambiguous and multi- modal interactions [24]. Addressing the growing mismatch between intelligent software and legacy system architectures therefore re- quires a clean-slate architectural redesign. In this paradigm, the traditional operating system kernel is encapsulated beneath a new intelligent layer—the Agent Kernel—which becomes the primary interface between users and the system, transforming the computer from isolated applications into a unified natural language–driven computational platform. 2 Vision: The Architectural Reconstruction of AgentOS AgentOS represents a structural rethinking of the computing en- vironment [19]. Rather than operating as an application on top of an existing system, AgentOS redefines the operating system itself. Under this paradigm, traditional GUI-driven desktop environments and isolated application layers are replaced by an agent-centric architecture. The Agent Kernel acts as the primary system inter- face, translating natural language intent into deterministic actions executed by sandboxed, user-defined skill modules. File System / Network Stack / Hardware Drivers ... User Interface & Input Agent Kernel Legacy OS Kernel MCP Intent Parser Multi-agent Coordinator LLM Resource Scheduler Skill Modules Text InputVoice Input Figure 2: Layered Architecture of AgentOS: Single Port Inter- face, Agent Kernel, and Legacy Infrastructure Abstraction. 2.1 The Single Port: The Death of the Desktop A defining user-facing characteristic of AgentOS is the replacement of the traditional desktop metaphor with a unified interaction gate- way referred to as the Single Port [9,41]. Conventional interface elements such as icons, menus, taskbars, and window management become secondary interaction mechanisms rather than the primary interface. In the default state, the system remains in a minimalist standby mode, exposing a persistent multimodal interface that ac- cepts voice, text, and contextual signals. Users interact with the system primarily through natural language instructions. Visual interfaces are generated only when necessary—for example, when resolving ambiguous instructions or presenting inherently visual outputs such as charts, maps, or videos. By centralizing interac- tion into a single semantic interface, AgentOS significantly reduces the cognitive overhead associated with navigating across multiple applications and interface contexts. 2.2 Agent Kernel Architecture: From Process Scheduling to Intent Orchestration At the core of AgentOS lies the Agent Kernel, which abstracts the complexities of physical hardware and legacy operating systems. Building upon early AIOS (LLM-based Agent Operating System) frameworks proposed in 2024, the Agent Kernel integrates agent applications with system resources to support coordinated execu- tion, resource management, and security enforcement. The Agent Kernel exposes two primary interfaces: •Northbound Interface (Intent Translation): Facing the user, this interface performs continuous semantic parsing, transforming ambiguous and multimodal human inputs (e.g., voice commands, gestures, or contextual queries) into struc- tured, machine-executable intents. It maintains persistent user context, manages conversational state, and handles in- terruptions during long-horizon tasks. Architecture LayerLegacy OSAgentOS Human–Computer Interface (HCI) GUI / Desktop Window Manager Single Natural Language Port (Voice / Text) Application LayerIsolated Third-Party Applications (Apps)User-Defined Skill Modules (Skills) Core Management Engine Process Scheduling, Memory Management Intent Parser, Multi-Agent Coordinator, LLM Resource Scheduler Underlying InteractionSystem Calls (Syscalls), POSIX Model Context Protocol (MCP), Semantic API Hardware AbstractionPhysical Hardware Drivers Legacy OS Kernel (Invisible Infrastructure) Table 1: Comparison between Legacy OS and AgentOS architectures. Voice Logs Geolocation data Event Logs Text Corpus Task Context ... Multimodal Data Stream Personalized Knowledge Graph Book my usual flight for that conference Would you like me to book flight B at time b to conference A? Conference refer to the recent A event Execution The user often takes the flight B at time b. Confirm Intent Resolution Figure 3: Multimodal Intent Mining and Personal Knowledge Graph Construction within the Agent Kernel. •Southbound Interface (Multi-Agent Orchestration): Fac- ing the underlying computing infrastructure, this interface activates once intent structuring is complete. An internal Multi-Agent System (MAS) decomposes user requests into executable sub-tasks and dispatches them through the Model Context Protocol (MCP) to interact with the “invisible” legacy OS kernel, including the file system, network stack, and hard- ware drivers [27]. Crucially, the Agent Kernel must also perform large-scale LLM resource scheduling. Analogous to how traditional kernels multi- plex CPU time across processes, the Agent Kernel allocates limited LLM resources—including context windows, token budgets, and API rate limits—across multiple concurrent agent threads. This scheduling layer prevents out-of-memory failures and maintains system throughput under high concurrency. 2.3 Skills as User-Defined Software Within the AgentOS ecosystem, the traditional notion of pre-packaged applications becomes significantly less central. Instead of installing monolithic software packages from centralized app stores, func- tionality is expressed through modular Skills-as-Modules. These skills represent reusable automation logic that can be composed dynamically to satisfy user goals. Users can define such skills di- rectly through natural language specifications. For example, a user may specify: “Whenever I receive an email from the finance di- rector containing a PDF invoice, extract the total amount, verify the corresponding item in my budget spreadsheet, and if correct, draft a payment authorization.” The Agent Kernel converts this natural language rule into a structured skill module and persis- tently stores it within the user environment. Each skill operates as a composable microservice that can be invoked independently or combined with other skills to form more complex workflows. This paradigm resembles emerging agentic AI architectures in which specialized agents divide responsibilities and collaboratively exe- cute complex tasks [13,14,34]. Over time, the operating system evolves into a highly personalized ecosystem of skills, yielding a computing experience tailored to the individual user’s workflows and preferences. 3 Mining User Context: Why the Future OS is a Data Mining Problem While the architectural vision of AgentOS is disruptive, its techni- cal realization depends fundamentally on advances in Knowledge Discovery and Data Mining (KDD). Traditional operating systems are deterministic engineering systems built on static logic, rigid file hierarchies, and explicit machine instructions. AgentOS, in contrast, operates in a probabilistic environment: it must infer user intent, resolve linguistic ambiguity, and adapt to evolving behavioral pat- terns. Consequently, the core challenge of AgentOS shifts from conventional systems engineering to real-time data mining, where the operating system continuously extracts structured intent from large-scale unstructured interaction data. 3.1 Intent Mining & Context Awareness Traditional operating systems are reactive, responding only to ex- plicit user commands. AgentOS, however, must proactively infer user intent from ambiguous and incomplete inputs. Without con- textual grounding, requests such as “Book my usual flight for that conference” remain ambiguous for machines [24]. • KDD Challenge: Dynamic Personal Knowledge Graphs To resolve such ambiguity, AgentOS requires continuous construction and querying of a Personal Knowledge Graph (PKG) [22]. The PKG serves as a semantically rich, user- controlled knowledge base capturing personal data, prefer- ences, social relationships, and behavioral history to support personalized reasoning. Evaluation DimensionLegacy OS MetricsAgentOS Metrics Primary ObjectiveSystem stability and resource utilization User intent fulfillment and complex workflow automation Key Performance Indicators (KPIs) CPU load, memory faults, disk I/O Intent Alignment (IA), task completion rate, tool invocation accuracy Fault Tolerance & Measurement Crash logs, kernel panics, exception traces Hallucination rate, context drift, disambiguation failure rate Benchmarking Methodology Unit testing, integration testing, stress testing Tri-Agent evaluation, AndroidArena simulations, Agentic Benchmark Checklist (ABC) System Evolution Mechanism Static; manual developer patches and upgrades Dynamic; self-evolving via Sequential Pattern Mining (SPM) and MIRA Table 2: Evaluation framework comparison between Legacy OS and AgentOS. • As a continuous intent mining engine, AgentOS processes multimodal interaction streams—including voice logs, screen context, and geolocation signals—and applies Natural Lan- guage Processing (NLP) and relational extraction techniques to update the PKG in real time. When handling ambiguous instructions, the Agent Kernel performs graph-augmented reasoning over the PKG to infer implicit user preferences from historical behavior [10]. • The Zero-shot-to-head-shot hyperpersonalization (Z2H2) frame- work offers one implementation pathway [4]. It categorizes user data into explicit, implicit, passive, and derivative modal- ities, enabling AgentOS to evolve from cold-start prompts to- ward personalized interaction through Retrieval-Augmented Generation and behavioral profiling. Without robust intent mining and PKG integration, agents risk repeatedly inter- rupting users for clarification or executing incorrect actions. 3.2 Skill Retrieval & Recommendation As traditional App Stores evolve into large-scale Skill Ecosystems, users will accumulate extensive collections of highly granular and overlapping skill modules. The operating system must therefore identify which specific skill—or sequence of skills—best matches a given task. This process resembles tool orchestration in agentic AI systems, where complex workflows are constructed by dynamically selecting and composing specialized tools or modules [35]. • KDD Challenge: Hyperscale Recommender Systems for Software and Logic This problem can be framed as a large-scale recommender system task [31]. Unlike tradi- tional recommendation settings (e.g., movies or e-commerce), AgentOS must recommend executable logic—such as skill modules or code sequences—based on complex and multidi- mensional user context. •A natural solution is a Two-Tower Recommendation Archi- tecture. The “User Tower” encodes natural language queries together with real-time PKG context into a high-dimensional context embedding [28]. The “Skill Tower” represents skill modules by embedding functional metadata, code seman- tics, and execution constraints into a shared representation space. Similarity search (e.g., cosine similarity) is then used to retrieve the most relevant skill modules [25]. •Recommendation quality can be further improved through Reinforcement Learning (RL) and collaborative filtering. User feedback—such as task abortion or manual correction—provides negative signals that allow the system to update skill embed- dings and improve future retrieval accuracy [12, 32, 39]. 3.3 Action Sequence Mining and Optimization An AgentOS operating through a multi-agent system will gener- ate large volumes of interaction data. As agents interact with file systems, network APIs, and web environments, they produce con- tinuous action traces—sequences of system calls, API requests, and navigation steps executed to complete tasks [11]. •KDD Challenge: Sequential Pattern Mining (SPM) To improve efficiency over time, the operating system must apply Sequential Pattern Mining (SPM) techniques to analyze large-scale UI and system logs. SPM algorithms are well suited for discovering frequently occurring subsequences in temporal interaction data. •By mining action traces, the system can identify repetitive and high-latency workflows. When frequent patterns are detected, the Agent Kernel can automatically synthesize optimized macros or background services that replace multi- step interaction sequences with direct execution. •A key challenge lies in handling the noise inherent in real- world computing environments, including fluctuating ac- tion spaces and unpredictable external responses. Robust se- quence mining therefore requires integrating action models and filtering heuristics to distinguish meaningful behavioral patterns from spurious interaction traces. 3.4 Evaluation & Benchmarks Transitioning to a probabilistic and highly personalized operating system challenges traditional software evaluation metrics. Perfor- mance can no longer be assessed solely through CPU utilization, memory efficiency, or binary unit tests. Because system behav- ior varies across users and contexts, the notion of “correctness” becomes inherently subjective and context-dependent. • KDD Challenge: Quantifying Intent Alignment Evalu- ating AgentOS requires new benchmarks centered on user satisfaction and Intent Alignment (IA) [6,21]. IA measures the semantic gap between a user’s latent goal and the actions executed by the agent. • Recent research has begun addressing this challenge through new evaluation frameworks. For example, the Tri-Agent framework introduces a clarification agent, a response agent, and an evaluator agent (LLM-as-a-judge) to assess conversa- tional efficiency, disambiguation, and intent alignment [33]. Similarly, initiatives such as the Agentic Benchmark Check- list (ABC) enforce structured multi-turn task settings to pre- vent overestimation of agent capabilities [38]. •Future evaluation frameworks should integrate simulation environments (e.g., AndroidArena) together with learning- based reward modeling methods such as Markovian Intrinsic Reward Adjustment (MIRA) to measure cognitive agency, tool proficiency, and long-horizon task performance [16]. 4 Challenges and Risks: Governing a Probabilistic OS Delegating core system control to autonomous probabilistic agents introduces significant systemic risks. Because AgentOS operates under inherent uncertainty, robust security architectures and fault- tolerance mechanisms become essential. 4.1Privacy, Security, and the Semantic Firewall Legacy operating systems rely on static permissions and determin- istic Access Control Lists (ACLs), where applications either possess access to a resource or they do not. In AgentOS, however, agents re- quire broad system-level access to coordinate tasks across multiple resources. Security boundaries must therefore shift from verifying who requests data to evaluating the semantic intent of the request. Recent studies reveal structural vulnerabilities in system-integrated AI agents. For example, an attacker may embed an Indirect Prompt Injection in an email, instructing the agent to retrieve sensitive data such as SSH keys [30] . An agent with unrestricted file-system access may execute such instructions without verification, high- lighting the susceptibility of multimodal agents to jailbreak attacks, prompt injections, and adversarial inputs. To mitigate these risks, AgentOS requires a Semantic Firewall [2]. Integrated within the Agent Kernel, it functions as a real-time text and data mining secu- rity layer that monitors information flows into and out of the LLM core. External inputs must undergo sanitization and intent analysis before execution; if malicious intent is detected, the firewall blocks the action and alerts the user. Core capabilities include: •Input Sanitization and Intent Vetting: Text mining tech- niques detect adversarial prompts and jailbreak attempts in incoming data streams or RAG-retrieved documents. •Taint-Aware Memory and Cognitive Integrity: Inspired by the experimental “Aura” OS architecture [40], the firewall labels data from untrusted sources as tainted and prevents it from triggering high-privilege operations (e.g., password changes or financial transactions). • Real-Time Data Loss Prevention (DLP): Outbound agent actions are analyzed to detect and prevent the leakage of sensitive entities (e.g., SSNs, API keys, or financial records). 4.2 Hallucination Control and System Fault Tolerance Beyond malicious attacks, the probabilistic nature of LLMs means the OS is inevitably prone to hallucinations and reasoning er- rors [7,8]. A traditional OS rarely deletes a critical directory without explicit, hard-coded instructions. Conversely, an AgentOS misun- derstanding a vague command like "clean up my workspace" could irreversibly delete project files or corrupt system configurations. To ensure fault tolerance, the Agent Kernel must operate within strict sandboxes, utilizing advanced virtualization and containerization to isolate high-risk operations. Most critically, the OS requires a robust, system-level "State Rollback" mechanism. By leveraging underlying file systems (like ZFS or Btrfs variants), the OS must maintain fine-grained snapshots. If an action trajectory is deemed erroneous (either automatically detected via meta-reflection algo- rithms or reported by the user), the Agent Kernel must reversely traverse the sequence, undoing state changes to restore absolute system integrity in milliseconds. 5 Conclusion The rapid emergence of local autonomous agents, exemplified by systems such as OpenClaw, signals a fundamental shift in per- sonal computing. However, deploying these probabilistic systems on legacy operating systems designed for graphical interfaces is in- creasingly inadequate, leading to fragmented contextual reasoning, new security risks, and limited autonomy. AgentOS rethinks the op- erating system paradigm. By introducing a unified natural language interface (the Single Port), an Agent Kernel for intent orchestra- tion, and dynamically composable Skill Modules, it transforms the computer from isolated applications into a coherent intent-driven system. Realizing this vision is not merely a systems engineering challenge but fundamentally a problem of Knowledge Discovery and Data Mining (KDD). The effectiveness of the Agent Kernel depends on algorithmic advances that enable machines to infer and model user intent from large-scale interaction data. Key enabling technologies include Personal Knowledge Graphs (PKGs) for con- textual reasoning, Sequential Pattern Mining (SPM) for workflow discovery, recommender systems for skill retrieval, and semantic security mechanisms such as the Semantic Firewall. Together, these components redefine the operating system as a continuous data mining pipeline that converts unstructured interactions into exe- cutable intent. Ultimately, the operating system of the future will be defined by its ability to interpret and operationalize human intent, evolving into a continuously learning data mining system. References [1]Piero A. Bonatti, John Domingue, Anna Lisa Gentile, Andreas Harth, Olaf Hartig, Aidan Hogan, Katja Hose, Ernesto Jimenez-Ruiz, Deborah L. McGuinness, Chang Sun, Ruben Verborgh, and Jesse Wright. 2025. Towards Computer-Using Personal Agents. arXiv preprint arXiv:2503.15515 (2025). https://arxiv.org/abs/2503.15515 [2]Victor Castro-Maldonado, Marco A. Aceves-Fernandez, Luis R. Garcia-Noguez, and Jesus C. Pedraza-Ortega. 2026. Semantic Firewalls with Online Ensemble Learning for Secure Agentic RAG Systems in Financial Chatbots. AI 7, 3 (2026), 80. doi:10.3390/ai7030080 [3]Saket Sanjeev Chaturvedi, Gaurav Bagwe, Lan Emily Zhang, and Xiaoyong Yuan. 2025. AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt. In Proceedings of EMNLP 2025. 15861–15878. doi:10.18653/ v1/2025.emnlp-main.801 [4]Kanishka Dandeniya, Sam Saltis, Shalinka Jayatilleke, Nishan Mills, Harsha Moraliyage, Daswin De Silva, and Milos Manic. 2025. Zero-Shot to Head-Shot: Hyperpersonalization in the Age of Generative AI. Applied System Innovation 8, 6 (2025), 186. doi:10.3390/asi8060186 [5]Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang, Huan Sun, and Yu Su. 2023. Mind2Web: Towards a Generalist Agent for the Web. arXiv:2306.06070 [cs.CL] https://arxiv.org/abs/2306.06070 [6]Shengyue Guan, Jindong Wang, Jiang Bian, Bin Zhu, Jian guang Lou, and Haoyi Xiong. 2026. Evaluating LLM-Based Agents for Multi-Turn Conversations: A Survey. arXiv preprint arXiv:2503.22458 (2026). https://arxiv.org/abs/2503.22458 [7]Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. 2025. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. ACM Transactions on Information Systems 43, 2 (Jan. 2025), 1–55. doi:10.1145/3703155 [8]Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of Hallucination in Natural Language Generation. ACM Comput. Surv. 55, 12, Article 248 (March 2023), 38 pages. doi:10.1145/3571730 [9]Dr. Manju Kaushik and Rashmi Jain. 2014. Natural User Interfaces: Trend in Virtual Interaction. arXiv:1405.0101 [cs.HC] https://arxiv.org/abs/1405.0101 [10]Serin Kim, Sangam Lee, and Dongha Lee. 2026. Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History. arXiv preprint arXiv:2602.17003 (2026). https://arxiv.org/abs/2602.17003 [11]Volodymyr Leno, Adriano Augusto, Marlon Dumas, Marcello La Rosa, Fab- rizio Maria Maggi, and Artem Polyvyanyy. 2022. Discovering Data Transfer Routines from User Interaction Logs. Information Systems 107 (2022), 101916. doi:10.1016/j.is.2021.101916 [12]Yuanguo Lin, Yong Liu, Fan Lin, Lixin Zou, Pengcheng Wu, Wenhua Zeng, Huan- huan Chen, and Chunyan Miao. 2024. A Survey on Reinforcement Learning for Recommender Systems. IEEE Transactions on Neural Networks and Learning Systems 35, 10 (Oct. 2024), 13164–13184. doi:10.1109/tnnls.2023.3280161 [13] Rui Liu, Steven Jige Quan, Zhong-Ren Peng, Zijun Yao, Han Wang, Zhengzhang Chen, Kunpeng Liu, Yanjie Fu, and Dongjie Wang. 2026. City Editing: Hierarchi- cal Agentic Execution for Dependency-Aware Urban Geospatial Modification. arXiv:2602.19326 [cs.MA] https://arxiv.org/abs/2602.19326 [14] Rui Liu, Tao Zhe, Zhong-Ren Peng, Necati Catbas, Xinyue Ye, Dongjie Wang, and Yanjie Fu. 2026. Urban Planning in the Age of Agentic AI: Emerging Paradigms and Prospects. SIGKDD Explorations Newsletter 27, 2 (Dec. 2026), 35–42. doi:10. 1145/3787470.3787474 [15] Kai Mei, Xi Zhu, Wujiang Xu, Wenyue Hua, Mingyu Jin, Zelong Li, Shuyuan Xu, Ruosong Ye, Yingqiang Ge, and Yongfeng Zhang. 2025. AIOS: LLM Agent Operating System. arXiv preprint arXiv:2403.16971 (2025). https://arxiv.org/abs/ 2403.16971 [16]Mahmoud Mohammadi, Yipeng Li, Jane Lo, and Wendy Yip. 2025. Evaluation and Benchmarking of LLM Agents: A Survey. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (Toronto ON, Canada) (KDD ’25). Association for Computing Machinery, New York, NY, USA, 6129–6139. doi:10.1145/3711896.3736570 [17] Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, and John Schulman. 2022. WebGPT: Browser-assisted question-answering with human feedback. arXiv:2112.09332 [cs.CL] https: //arxiv.org/abs/2112.09332 [18]OpenClaw. 2026. OpenClaw: Personal AI Assistant. https://openclaw.ai. https: //openclaw.ai Accessed March 3, 2026. [19]Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. 2024. MemGPT: Towards LLMs as Operating Systems. arXiv:2310.08560 [cs.AI] https://arxiv.org/abs/2310.08560 [20]Timo Schick, Jane Dwivedi-Yu, Roberto Dessí, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language Models Can Teach Themselves to Use Tools. In Advances in Neural Information Processing Systems (NeurIPS). [21]Manish Shukla. 2025. Evaluation and Benchmarking of Generative and Agen- tic AI Systems: A Comprehensive Survey. Preprints (2025). doi:10.20944/ preprints202512.1421.v1 [22] Martin G. Skjæveland, Krisztian Balog, Nolwenn Bernard, Weronika Łajewska, and Trond Linjordet. 2024. An Ecosystem for Personal Knowledge Graphs: A Survey and Research Roadmap. AI Open 5 (2024), 55–69. doi:10.1016/j.aiopen. 2024.01.003 [23]Hang Su, Jun Luo, Chang Liu, Xiao Yang, Yichi Zhang, Yinpeng Dong, and Jun Zhu. 2025. A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents. arXiv preprint arXiv:2506.23844 (2025). https://arxiv.org/abs/2506.23844 [24]Yu Sun, Nicholas Jing Yuan, Yingzi Wang, Xing Xie, Kieran McDonald, and Rui Zhang. 2016. Contextual Intent Tracking for Personal Assistants. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 273–282. doi:10.1145/2939672.2939676 [25]Fang Tang, Renqi Zhu, Feng Yao, Junzhi Wang, Lailong Luo, and Bo Li. 2025. Explainable Person–Job Recommendations: Challenges, Approaches, and Com- parative Analysis. Frontiers in Artificial Intelligence (2025). doi:10.3389/frai.2025. 1660548 [26]Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv preprint arXiv: Arxiv-2305.16291 (2023). [27]Yuntao Wang, Shaolong Guo, Yanghe Pan, Zhou Su, Fahao Chen, Tom H. Luan, Peng Li, Jiawen Kang, and Dusit Niyato. 2026. Internet of Agents: Fundamentals, Applications, and Challenges. IEEE Transactions on Cognitive Communications and Networking 12 (2026), 4476–4501. doi:10.1109/TCCN.2025.3623369 [28] Yu Wang, Lei Sang, Yi Zhang, and Yiwen Zhang. 2025. Intent Representation Learning with Large Language Model for Recommendation. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1870–1879. doi:10.1145/3726302.3730011 [29] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. In International Conference on Learning Representations (ICLR). [30]Yi Zhang and Jantan Aman. 2025. Targeted Injection Attack Toward the Semantic Layer of Large Language Models. Frontiers in Computer Science (2025). doi:10. 3389/fcomp.2025.1683495 [31]Jing Zhao, Jingya Wang, Madhav Sigdel, Bopeng Zhang, Phuong Hoang, Mengshu Liu, and Mohammed Korayem. 2021. Embedding-Based Recommender System for Job to Candidate Matching on Scale. arXiv preprint arXiv:2107.00221 (2021). https://arxiv.org/abs/2107.00221 [32]Xiangyu Zhao, Long Xia, Liang Zhang, Zhuoye Ding, Dawei Yin, and Jiliang Tang. 2018. Deep Reinforcement Learning for Page-Wise Recommendations. In Proceedings of the ACM Conference on Recommender Systems (RecSys). 95–103. [33]Yikai Zhao. 2025.A Tri-Agent Framework for Evaluating and Aligning Question Clarification Capabilities of Large Language Models. (2025). https: //w.amazon.science/publications/a-tri-agent-framework-for-evaluating- and-aligning-question-clarification-capabilities-of-large-language-models [34]Tao Zhe, Rui Liu, Fateme Memar, Xiao Luo, Wei Fan, Xinyue Ye, Zhongren Peng, and Dongjie Wang. 2025. Constraint-Aware Route Recommendation from Natural Language via Hierarchical LLM Agents. arXiv:2510.06078 [cs.AI] https: //arxiv.org/abs/2510.06078 [35] Tao Zhe, Haoyu Wang, Bo Luo, Min Wu, Wei Fan, Xiao Luo, Zijun Yao, Haifeng Chen, and Dongjie Wang. 2026. Robust and Efficient Tool Orchestration via Layered Execution Structures with Reflective Correction. arXiv:2602.18968 [cs.AI] https://arxiv.org/abs/2602.18968 [36]Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, and Yu Su. 2024. GPT-4V(ision) is a Generalist Web Agent, if Grounded. In Forty-first International Conference on Machine Learning. https://openreview.net/forum?id=piecKJ2DlB [37]Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig. 2024. WebArena: A Realistic Web Environment for Building Autonomous Agents. arXiv:2307.13854 [cs.AI] https://arxiv.org/abs/2307.13854 [38]Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun, and et al. 2025. Establish- ing Best Practices for Building Rigorous Agentic Benchmarks. arXiv preprint arXiv:2507.02825 (2025). https://arxiv.org/abs/2507.02825 [39]Lixin Zou, Long Xia, Zhuoye Ding, Jiaxing Song, Weidong Liu, and Dawei Yin. 2019. Reinforcement Learning to Optimize Long-term User Engagement in Rec- ommender Systems. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). 2810–2818. [40] Zhenhua Zou, Sheng Guo, Qiuyang Zhan, Lepeng Zhao, Shuo Li, Qi Li, Ke Xu, Mingwei Xu, and Zhuotao Liu. 2026. Blind Gods and Broken Screens: Archi- tecting a Secure, Intent-Centric Mobile Agent Operating System. arXiv preprint arXiv:2602.10915 (2026). https://arxiv.org/abs/2602.10915 [41] V.W. Zue and J.R. Glass. 2000. Conversational interfaces: advances and challenges. Proc. IEEE 88, 8 (2000), 1166–1180. doi:10.1109/5.880078