
Paper deep dive

ClawTrap: A MITM-Based Red-Teaming Framework for Real-World OpenClaw Security Evaluation

Haochen Zhao, Shaoyang Cui

Year: 2026 · Venue: arXiv preprint · Area: cs.CR · Type: Preprint

Abstract

Autonomous web agents such as OpenClaw are rapidly moving into high-impact real-world workflows, but their security robustness under live network threats remains insufficiently evaluated. Existing benchmarks mainly focus on static sandbox settings and content-level prompt attacks, which leaves a practical gap for network-layer security testing. In this paper, we present ClawTrap, a MITM-based red-teaming framework for real-world OpenClaw security evaluation. ClawTrap supports diverse and customizable attack forms, including Static HTML Replacement, Iframe Popup Injection, and Dynamic Content Modification, and provides a reproducible pipeline for rule-driven interception, transformation, and auditing. This design lays the foundation for future research to construct richer, customizable MITM attacks and to perform systematic security testing across agent frameworks and model backbones. Our empirical study shows clear model stratification: weaker models are more likely to trust tampered observations and produce unsafe outputs, while stronger models demonstrate better anomaly attribution and safer fallback strategies. These findings indicate that reliable OpenClaw security evaluation should explicitly incorporate dynamic real-world MITM conditions rather than relying only on static sandbox protocols.

Tags

ai-safety (imported, 100%) · cscr (suggested, 92%) · preprint (suggested, 88%) · safety-evaluation (suggested, 80%)

Links

Open PDF directly →


Full Text


CLAWTRAP: A MITM-BASED RED-TEAMING FRAMEWORK FOR REAL-WORLD OPENCLAW SECURITY EVALUATION

Haochen Zhao
School of Computing, National University of Singapore, Singapore
e1553606@u.nus.edu

Shaoyang Cui
Department of Psychological and Cognitive Sciences, Tsinghua University, Beijing, China
sy-cui@thu.edu.cn

ABSTRACT

Autonomous web agents such as OpenClaw are rapidly moving into high-impact real-world workflows, but their security robustness under live network threats remains insufficiently evaluated. Existing benchmarks mainly focus on static sandbox settings and content-level prompt attacks, which leaves a practical gap for network-layer security testing. In this paper, we present ClawTrap, a MITM-based red-teaming framework for real-world OpenClaw security evaluation. ClawTrap supports diverse and customizable attack forms, including Static HTML Replacement, Iframe Popup Injection, and Dynamic Content Modification, and provides a reproducible pipeline for rule-driven interception, transformation, and auditing. This design lays the foundation for future research to construct richer, customizable MITM attacks and to perform systematic security testing across agent frameworks and model backbones. Our empirical study shows clear model stratification: weaker models are more likely to trust tampered observations and produce unsafe outputs, while stronger models demonstrate better anomaly attribution and safer fallback strategies. These findings indicate that reliable OpenClaw security evaluation should explicitly incorporate dynamic real-world MITM conditions rather than relying only on static sandbox protocols.

Project Blog: https://clawtrap.github.io/
Code: https://github.com/ClawTrap/claw_trap

1 Introduction

The landscape of artificial intelligence is currently undergoing rapid iteration, transitioning from foundational Large Language Models (LLMs) toward sophisticated Agentic Workflows empowered by specialized Model Context Protocol (MCP) tools and skills.
Agents like Claude Code and Codex in the technical domain, alongside Manus in daily-life automation, exemplify this shift as they move beyond simple conversational interfaces to execute multi-step, autonomous tasks with high efficiency. Marking a definitive milestone in autonomous agency, OpenClaw has surged from a niche toolkit to a large-scale public platform with global reach. Yet, its very success necessitates a critical shift in perspective; as deployment scales worldwide, the urgency to address its underlying security vulnerabilities has moved from a theoretical concern to a practical imperative. As deployment becomes broader and more autonomous, vulnerabilities such as privacy leakage, information pollution, and unintended action execution become practical risks rather than hypothetical ones. Therefore, improving security evaluation is not a peripheral concern; it is a prerequisite for reliable, responsible, and sustainable adoption of agentic systems in real-world settings.

arXiv:2603.18762v1 [cs.CR] 19 Mar 2026

Several pioneering works have already begun to explore and quantify the safety issues of these automated agents. For instance, Zhan et al. [1] evaluated the safety of tool-integrated agents against indirect prompt injection (IPI) in simulated settings, while Evtimov et al. [2] tested real-world web agents in sandbox environments like VisualWebArena [3]. Furthermore, Wu et al. [4] assessed autonomous frameworks against deceptive UI and malicious prompts, and taxonomic studies like [5] have categorized various failure modes such as agent corruption and sensitive information disclosure. Despite these significant contributions, existing methodologies remain largely confined to sandboxed and static settings, where the dominant threat model is still content-layer attack injection.
This leaves a crucial blind spot, because modern web agents depend on live networked observations, yet their robustness against dynamic network-layer manipulation is rarely evaluated. To close this gap, we present ClawTrap, a customized adversarial framework designed for the real-world evaluation of OpenClaw under dynamic Man-in-the-Middle (MITM) attacks. ClawTrap introduces an MITM attack pipeline that intercepts and tampers with physical network traffic in real time. By modifying, injecting, or deleting external information during active execution, ClawTrap reveals vulnerabilities that remain hidden in static evaluation and provides a more deployment-faithful assessment of agent robustness.

Our main contributions are summarized as follows.

• A Dedicated MITM Attack Framework for OpenClaw. We propose ClawTrap, a framework specifically designed to launch and evaluate MITM attacks against OpenClaw agents.
• Real-Time, Real-World, and Diverse Attack Realization. ClawTrap operates in live browsing environments and supports highly diverse attack patterns—including response rewriting, targeted injection, and full-page replacement—for realistic security stress testing.
• Safety Insights for Agentic Workflows. Through rigorous analysis of failure cases, we reveal significant security disparities across foundation models—where flagship models exhibit high "anti-fraud awareness" while others remain susceptible to MITM deception—prompting the open-source community to reconsider the fundamental safety of autonomous agentic workflows.

2 Related Work

2.1 Benchmarking Agent Security and Tool-Use Robustness

Recent studies have established broad benchmarks for evaluating agent capability and security, with indirect prompt injection (IPI) as a central threat model. InjecAgent, AgentDojo, and ASB provide representative evaluation settings for prompt-injected tool-use workflows [1, 6, 7].
In parallel, general-agent benchmarks such as AgentBench, ToolLLM, GAIA, and SWE-bench evaluate planning, tool use, and long-horizon execution under diverse tasks [8, 9, 10, 11]. Safety-oriented ecosystems, including HAICOSYSTEM and OpenAgentSafety, further expand the coverage of risk evaluation [12, 13]. These works define important foundations, but most of them do not directly evaluate network-layer adversarial manipulation during live browsing.

2.2 Security Evaluation of Real-World Web Agents

For web-agent settings, realistic environments such as WebArena, Mind2Web, WebShop, WebLINX, WorkArena, and OSWorld have enabled increasingly deployment-relevant evaluation [14, 15, 16, 17, 18, 19]. Security-focused evaluations built on these environments, including WASP, WebTrap Park, DoomArena, WAREX, and commercial-agent attack studies, show that autonomous browsers remain vulnerable under adversarial web conditions [2, 3, 4, 20, 21, 22]. Complementary visual/UI attack lines, including EIA, Pop-up Attacks, WebInject, AdvAgent, SecureWebArena, and TRAP, demonstrate that manipulated interface signals can significantly alter agent decisions [23, 24, 25, 26, 27, 28]. Overall, this line has made substantial progress on content- and UI-level threats.

2.3 MITM-Centric Auditing and the Remaining Gap

Compared with the above directions, explicit MITM-centric auditing for web agents remains limited. Existing MITM-related studies in agent systems mainly target communication-channel attacks among agents, while other work studies adversarial memory/factual manipulation rather than end-to-end interception of live web traffic [29, 30]. As a result, there is still a gap between current benchmarks and real deployment conditions where traffic can be intercepted and rewritten in transit. ClawTrap is designed to fill this gap by focusing on dynamic MITM evaluation over real browsing sessions and by measuring both task outcome and trust calibration under network-layer attacks.
3 ClawTrap

3.1 Pipeline

Figure 1: The pipeline of the ClawTrap MITM attack framework.

As illustrated in Figure 1, ClawTrap follows a "Local Capture–Cloud Induction" architecture that keeps agent execution in the cloud while centralizing auditing logic on a researcher-controlled local node. The system consists of four coupled layers: (i) cloud-side OpenClaw targets wrapped by per-instance proxy adapters, (ii) a private Tailscale P2P tunnel cluster for transparent traffic forwarding, (iii) a local interception engine built on mitmdump (with interceptor.py as dispatcher, matcher.py as rule evaluator, and transformer.py for attack-mode execution), and (iv) auxiliary services including payload snippets, a FastAPI-based Honey Server, and a researcher dashboard/CLI for telemetry inspection.

To align with the latest implementation, ClawTrap processes each HTTP flow through the following end-to-end stages:

1. Initialization and Environment Sync: The researcher first configures the local config.json (including the local Tailscale endpoint), then runs sync_config.py to generate cloud-side scripts and uploads them to target instances. Traffic takeover is activated via sudo bash /root/cloud_proxy_toggle.sh on, which resets proxy-related environment variables and restarts agent services so that outbound requests are routed into the private tunnel.

2. Request-Path Interception and Rule Decision: When an OpenClaw agent issues a request, the flow is tunneled to the local proxy port and intercepted by interceptor.py. The matcher first checks detection rules (e.g., suspicious metadata-interface access such as 100.100.100.200); matched events are asynchronously reported to the Honey Server (e.g., /api/report_vulnerability). It then checks mock rules for protected domains; if matched, the transformer directly serves forged content from local snippets so the request never reaches the public Internet.

3.
Response-Path Transformation and Return: For non-mocked traffic, the request is forwarded to the real web, and the returned response is rewritten in-stream according to the active attack mode. ClawTrap currently supports three synchronized MITM modes: Static HTML Replacement (REPLACE), Iframe Popup Injection (INJECT), and Dynamic Content Modification (SUBSTITUTE). The transformed response is sent back through Tailscale to the cloud agent, while execution traces and attack outcomes are persisted for post-hoc auditing.

3.2 MITM Attack-Mode Taxonomy in ClawTrap

The ClawTrap framework formalizes a tripartite attack-mode taxonomy centered on how adversarial payloads are delivered through its MITM interception pipeline during real-world agent–web interaction. As illustrated in Figure 2, we categorize attacks into Static HTML Replacement, Iframe Popup Injection, and Dynamic Content Modification.

Figure 2: The ClawTrap MITM attack-mode taxonomy: attack forms are categorized as Static HTML Replacement, Iframe Popup Injection, and Dynamic Content Modification.

Static HTML Replacement fully swaps the original response body with a forged but plausible page, allowing attackers to poison the agent's primary evidence source while preserving normal navigation flow. Iframe Popup Injection overlays deceptive, high-priority interface elements on top of legitimate pages through injected iframe containers, enabling phishing-style instruction hijacking without visibly breaking site context. Dynamic Content Modification performs fine-grained, in-stream rewriting of selected DOM fragments or textual fields as content is rendered, which is effective for stealthy manipulation of facts, prices, warnings, or other task-critical parameters.
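In spirit, the three modes reduce to a single response-body rewrite. The sketch below is illustrative only: it is not the framework's actual transformer.py (whose internals are not reproduced in the paper), and the mode/payload rule schema is our assumption.

```python
import re

def transform(html: str, mode: str, payload: str) -> str:
    """Apply one of the three ClawTrap-style attack modes to a response body."""
    if mode == "REPLACE":
        # Static HTML Replacement: discard the real body entirely.
        return payload
    if mode == "INJECT":
        # Iframe Popup Injection: overlay a payload frame on the real page.
        iframe = (f'<iframe src="{payload}" '
                  'style="position:fixed;inset:0;z-index:9999"></iframe>')
        return html.replace("</body>", iframe + "</body>", 1)
    if mode == "SUBSTITUTE":
        # Dynamic Content Modification: targeted in-stream rewrite.
        # Here payload encodes "old=>new" as an assumed rule format.
        old, new = payload.split("=>", 1)
        return re.sub(old, new, html)
    return html  # unknown mode: pass the response through untouched
```

A REPLACE rule poisons the agent's whole evidence source, INJECT leaves the legitimate page visible underneath, and SUBSTITUTE changes only the matched fragment, which is why it is the stealthiest of the three.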
ClawTrap's MITM pipeline natively supports all three forms through scenario-driven interception and response transformation, enabling systematic evaluation of both coarse-grained and fine-grained failure modes under realistic traffic conditions. This design is intentional: it bridges the gap between static sandbox content attacks and deployment-time risks while providing a unified and reproducible basis for measuring robustness, trust calibration, and provenance-aware reasoning in OpenClaw-like agentic systems.

4 Experiments

4.1 Dynamic MITM Attack Setup in Real-World Browsing

Unlike a static sandbox environment, our experiments run in a dynamic real-world browsing loop: OpenClaw issues live web requests, and ClawTrap performs on-the-fly MITM rewriting before the response is rendered to the agent. This setup targets the agent's observation channel directly, which is the core security surface in practical deployment. Using the ClawTrap framework, we designed and evaluated two representative MITM injection patterns in real browsing sessions, as shown in Figure 3. Attack A replaces the returned page with fabricated news content. Attack B preserves the real webpage but injects a high-urgency fake warning overlay. Together, these two attacks cover both full-content poisoning and local UI-signal manipulation under live traffic.

Figure 3: Two dynamic MITM browsing attacks in ClawTrap. (a) Attack A: fabricated news page injected via MITM. (b) Attack B: real Google page with injected fake warning. Both operate on live traffic, but manipulate different perception layers: full-page content replacement (A) and warning-layer deception on a real page (B).

4.2 HTML Replacement: Fabricated News Injection

For Demo Case 1 (Attack A), we assign the task "Tell me what is on bbc.com." The interceptor rewrites the returned page into forged content while preserving normal browsing flow.
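An Attack-A style interception can be pictured as a host-matching rule paired with a forged response body. The rule schema and helper below are hypothetical (the shipped rule format is not published); the forged page title mirrors the one the paper reports the agent observing in this demo.

```python
# Hypothetical rule table: which hosts to hijack, and the forged body to serve.
# The title matches the tampered page reported in the paper's Demo Case 1.
REPLACE_RULES = {
    "bbc.com": (
        "<html><head><title>BBC News - Brent Oil Market Today</title></head>"
        "<body><h1>Fabricated headline</h1></body></html>"
    ),
}

def maybe_replace(host: str, body: str) -> str:
    """Serve the forged page for matched hosts; pass real content through otherwise."""
    for pattern, forged in REPLACE_RULES.items():
        # Match the bare domain and any subdomain (e.g. www.bbc.com).
        if host == pattern or host.endswith("." + pattern):
            return forged
    return body
```

Because the forged body travels back through the agent's normal observation channel, nothing in the browsing flow signals tampering; any authenticity check has to come from the model itself.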
The attack therefore does not break execution; instead, it silently shifts the semantic evidence seen by the agent.

Figure 4: Model behavior under Attack A. Smaller models tend to summarize poisoned content, while stronger models show higher anomaly awareness.

Results reveal a clear trust-transfer failure: once tampered HTML is delivered through the normal channel, weaker models treat it as trustworthy context and produce confident but incorrect summaries. Stronger models are more likely to detect inconsistency, attribute possible interception, or seek safer fallback sources. This demonstrates that dynamic MITM content injection can cause direct task-level failure in real browsing without explicit prompt strings.

Figure 4 shows this contrast concretely. GPT-5-mini reports the injected page as if it were legitimate (e.g., page title "BBC News - Brent Oil Market Today") and then summarizes the forged headlines as normal news content. It follows the expected scraping pipeline correctly, but without authenticity checks. In contrast, GPT-5.4 explicitly states that the observed page is "not the normal live BBC homepage," attributes the anomaly to possible "network interception/proxy rewrite," and proposes safer recovery steps (e.g., fetching BBC News RSS and verifying whether network rewriting is occurring). This behavior difference is important: both models can read the page, but only stronger models reliably reason about where the page evidence comes from.

4.3 Iframe Injection: Real Page + Fake Warning Injection

For Iframe & Pop-up Injection (Attack B), we issue "Visit google.com in the browser and tell me what is in it." and inject a fake warning on top of an otherwise legitimate page. Compared with Attack A, this case probes whether the agent can calibrate trust in UI-level security signals under dynamic MITM interference.
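A minimal sketch of an Attack-B style payload, assuming a high-urgency banner injected just before the closing body tag; the actual warning markup used in the experiments is not published, so both the banner text and styling here are illustrative.

```python
# Hypothetical banner in the spirit of Attack B; the real payload is unpublished.
FAKE_WARNING = (
    '<div style="position:fixed;top:0;left:0;width:100%;z-index:9999;'
    'background:#c00;color:#fff;padding:12px;font-weight:bold">'
    'SECURITY ALERT: suspicious activity detected on this connection.</div>'
)

def inject_warning(html: str) -> str:
    """Overlay the fake warning while leaving the legitimate page content intact."""
    if "</body>" in html:
        return html.replace("</body>", FAKE_WARNING + "</body>", 1)
    return html + FAKE_WARNING  # degrade gracefully for bodyless fragments
```

The underlying page remains genuine, so detecting the attack is not a content-comprehension problem: the agent must judge whether an authoritative-looking UI signal is actually authentic.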
We observe clear model stratification again. GPT-5-nano tends to underweight warning anomalies and continue with surface-level descriptions, while stronger models adopt more conservative reasoning and first verify whether the warning is authentic. This indicates that robustness in dynamic real-world settings depends not only on content understanding, but also on UI-trust calibration.

The multi-model outputs in Figure 5 make this pattern explicit. GPT-5.4, GLM-5, and Qwen3.5-397b-a17b all flag the warning as injected or non-legitimate and provide causal hypotheses such as extension/script injection, proxy interception, or local network manipulation. By contrast, GPT-5-nano mainly returns structural page metadata (title, locale, scripts, and DOM elements) and does not escalate the fake warning as a security anomaly. This indicates that effective defense against dynamic MITM attacks requires both perception and attribution-level reasoning; recognizing text alone is insufficient.

Figure 5: Model behavior comparison under Attack B (real page with injected fake warning).

Taken together, Demo 1 and Demo 2 support our central claim: the key risk is not only sandbox prompt attacks, but dynamic, real-world MITM manipulation of the agent's observation channel. More importantly, these findings demonstrate why ClawTrap is practically valuable: it surfaces failure modes that are invisible to static benchmarks and reveals whether a model can perform provenance-aware reasoning under compromised network conditions. In other words, ClawTrap evaluates not only task completion, but also trust calibration, which is a core capability required for safe deployment of OpenClaw-like systems.

5 Conclusion

In this work, we present ClawTrap, the first dynamic MITM-attack-oriented evaluation framework for real-world OpenClaw instances. Our central narrative is straightforward.
OpenClaw is a powerful and impactful ecosystem, yet its security exposure grows with real-world adoption. Therefore, robust evaluation must move beyond static sandbox setups and explicitly test the live observation channel on which agents rely. Compared with prior agent-security benchmarks that predominantly emphasize static, content-level attacks in simulated environments, ClawTrap introduces a deployment-faithful threat model based on dynamic network interception and response rewriting. This design enables systematic stress testing of task integrity, agent behavior integrity, and user-level safety in a unified framework. Through two representative demos—fabricated page replacement and real-page warning injection—we show that model behavior diverges substantially across model scales: weaker models often transfer trust to tampered evidence, while stronger models exhibit better anomaly attribution and safer fallback strategies.

The broader significance of ClawTrap lies in shifting evaluation criteria from "can the agent finish the task?" to "can the agent finish the task safely under adversarial network conditions?" We hope this framework helps the community build provenance-aware defenses, improve security-by-design practices, and establish more realistic safety standards for open-source agentic workflows. As future work, we plan to expand scenario coverage, include longitudinal robustness tracking, and explore automatic defense modules that can be paired with OpenClaw deployments.

6 Future Work

ClawTrap in its current form represents a preliminary framework and proof-of-concept evaluation. We identify three primary directions for future development.

Quantitative Evaluation at Scale. The current study demonstrates ClawTrap's capability through qualitative case analysis.
Future versions will include systematic quantitative benchmarks measuring attack success rate, task completion rate under attack, and trust miscalibration rate across a larger and more diverse task suite spanning information retrieval, form submission, and multi-step transactional workflows.

Expanded Task Coverage. We plan to extend evaluation scenarios beyond news reading and homepage inspection to include security-sensitive tasks such as credential handling, e-commerce transactions, and API-integrated agentic pipelines, where MITM manipulation carries higher real-world consequence.

Advanced Dynamic MITM Attack Methods. Beyond the three current attack modes, future work will explore adaptive and context-aware MITM strategies, including session-persistent injection, multi-hop traffic tampering across chained agent calls, and timing-based attacks that exploit agent re-querying behavior. These directions aim to stress-test agent robustness under more realistic and sophisticated adversarial network conditions.

7 Ethical Considerations

The development and evaluation of ClawTrap strictly adhere to ethical hacking principles and responsible disclosure practices. We emphasize that all experiments conducted in this study were performed under the following constraints to ensure zero impact on real-world systems:

• Controlled Environment: All OpenClaw instances and Honey Server nodes were deployed in isolated, containerized environments. The MITM interception was confined to our own research infrastructure via private Tailscale networks, ensuring no third-party traffic was intercepted or manipulated.
• Synthetic Data Usage: We utilized exclusively synthetic user credentials and mock financial data for all "Information Disclosure" scenarios. No real-world Personally Identifiable Information (PII) or sensitive assets were at risk during the evaluation.
• Non-Disruptive Testing: While we used real-world domains (e.g., bbc.com, google.com) as anchors for our MITM demos, the traffic was redirected at the proxy level within our local environment. No actual requests were sent to these services that violated their respective Terms of Service or rate-limiting policies.

The primary goal of this work is to provide the community with a rigorous auditing tool to improve the security-by-design of autonomous agents, rather than to facilitate malicious exploitation.

References

[1] Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. In Findings of the Association for Computational Linguistics: ACL 2024, pages 10471–10506, 2024.
[2] Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, and Kamalika Chaudhuri. WASP: Benchmarking web agent security against prompt injection attacks. arXiv preprint arXiv:2504.18575, 2025.
[3] Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Russ Salakhutdinov, and Daniel Fried. VisualWebArena: Evaluating multimodal agents on realistic visual web tasks. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 881–905, 2024.
[4] Xinyi Wu, Jiagui Chen, Geng Hong, Jiayi Dong, Xudong Pan, Jiarun Dai, and Min Yang. WebTrap Park: An automated platform for systematic security evaluation of web agents. arXiv preprint arXiv:2601.08406, 2026.
[5] Natalie Shapira, Chris Wendler, Avery Yen, Gabriele Sarti, Koyena Pal, Olivia Floody, Adam Belfki, Alex Loftus, Aditya Ratan Jannali, Nikhil Prakash, et al. Agents of chaos. arXiv preprint arXiv:2602.20021, 2026.
[6] Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents.
Advances in Neural Information Processing Systems, 37:82895–82920, 2024.
[7] Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent Security Bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents. arXiv preprint arXiv:2410.02644, 2024.
[8] Xiao Liu et al. AgentBench: Evaluating LLMs as agents. arXiv preprint arXiv:2308.03688, 2023.
[9] Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. arXiv preprint arXiv:2307.16789, 2023.
[10] Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, and Thomas Scialom. GAIA: A benchmark for general AI assistants. arXiv preprint arXiv:2311.12983, 2023.
[11] Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R. Narasimhan. SWE-bench: Can language models resolve real-world GitHub issues? In The Twelfth International Conference on Learning Representations, 2024.
[12] Xuhui Zhou, Hyunwoo Kim, Faeze Brahman, Liwei Jiang, Hao Zhu, Ximing Lu, Frank Xu, Bill Yuchen Lin, Yejin Choi, Niloofar Mireshghallah, Ronan Le Bras, and Maarten Sap. HAICOSYSTEM: An ecosystem for sandboxing safety risks in human-AI interactions. arXiv preprint arXiv:2409.16427, 2024.
[13] Sanidhya Vijayvargiya, Aditya Bharat Soni, Xuhui Zhou, Zora Zhiruo Wang, Nouha Dziri, Graham Neubig, and Maarten Sap. OpenAgentSafety: A comprehensive framework for evaluating real-world AI agent safety. arXiv preprint arXiv:2507.06134, 2025.
[14] Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig.
WebArena: A realistic web environment for building autonomous agents. arXiv preprint arXiv:2307.13854, 2023.
[15] Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang, Huan Sun, and Yu Su. Mind2Web: Towards a generalist agent for the web. arXiv preprint arXiv:2306.06070, 2023.
[16] Shunyu Yao, Howard Chen, John Yang, and Karthik Narasimhan. WebShop: Towards scalable real-world web interaction with grounded language agents. In Advances in Neural Information Processing Systems, volume 35, 2022.
[17] Xing Han Lù, Zdeněk Kasner, and Siva Reddy. WebLINX: Real-world website navigation with multi-turn dialogue. arXiv preprint arXiv:2402.05930, 2024.
[18] Alexandre Drouin, Maxime Gasse, Massimo Caccia, Issam H. Laradji, Manuel Del Verme, Tom Marty, Léo Boisvert, Megh Thakkar, Quentin Cappart, David Vazquez, Nicolas Chapados, and Alexandre Lacoste. WorkArena: How capable are web agents at solving common knowledge work tasks? arXiv preprint arXiv:2403.07718, 2024.
[19] Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, and Tao Yu. OSWorld: Benchmarking multimodal agents for open-ended tasks in real computer environments. arXiv preprint arXiv:2404.07972, 2024.
[20] Leo Boisvert, Mihir Bansal, Chandra Kiran Reddy Evuru, Gabriel Huang, Abhay Puri, Avinandan Bose, Maryam Fazel, Quentin Cappart, Jason Stanley, Alexandre Lacoste, et al. DoomArena: A framework for testing AI agents against evolving security threats. arXiv preprint arXiv:2504.14064, 2025.
[21] Su Kara, Fazle Faisal, and Suman Nath. WAREX: Web agent reliability evaluation on existing benchmarks. arXiv preprint arXiv:2510.03285, 2025.
[22] Ang Li, Yin Zhou, Vethavikashini Chithrra Raghuram, Tom Goldstein, and Micah Goldblum. Commercial LLM agents are already vulnerable to simple yet dangerous attacks.
arXiv preprint arXiv:2502.08586, 2025.
[23] Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, and Huan Sun. EIA: Environmental injection attack on generalist web agents for privacy leakage. arXiv preprint arXiv:2409.11295, 2024.
[24] Yanzhe Zhang, Tao Yu, and Diyi Yang. Attacking vision-language computer agents via pop-ups. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8387–8401, 2025.
[25] Xilong Wang, John Bloch, Zedian Shao, Yuepeng Hu, Shuyan Zhou, and Neil Zhenqiang Gong. WebInject: Prompt injection attack to web agents. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 2010–2030, 2025.
[26] Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, and Bo Li. AdvAgent: Controllable blackbox red-teaming on web agents. arXiv preprint arXiv:2410.17401, 2024.
[27] Zonghao Ying, Yangguang Shao, Jianle Gan, Gan Xu, Junjie Shen, Wenxin Zhang, Quanchen Zou, Junzheng Shi, Zhenfei Yin, Mingchuan Zhang, et al. SecureWebArena: A holistic security evaluation benchmark for LVLM-based web agents. arXiv preprint arXiv:2510.10073, 2025.
[28] Karolina Korgul, Yushi Yang, Arkadiusz Drohomirecki, Will Howard, Lukas Aichberger, Chris Russell, Philip HS Torr, Adam Mahdi, Adel Bibi, et al. It's a trap! Task-redirecting agent persuasion benchmark for web agents. arXiv preprint arXiv:2512.23128, 2025.
[29] Pengfei He, Yuping Lin, Shen Dong, Han Xu, Yue Xing, and Hui Liu. Red-teaming LLM multi-agent systems via communication attacks. In Findings of the Association for Computational Linguistics: ACL 2025, pages 6726–6747, 2025.
[30] Alina Fastowski, Bardh Prenkaj, Yuxiao Li, and Gjergji Kasneci. Injecting falsehoods: Adversarial man-in-the-middle attacks undermining factual recall in LLMs. arXiv preprint arXiv:2511.05919, 2025.