Instant research discovery

Search and browse ingested papers with intelligence signals and fast filtering.

PaperIntel
AIR: Improving Agent Safety through Incident Response

Jun Sun, Junjie Chen, Zibo Xiao

Year: 2026 | Area: Agent Safety | Citations: -

Tags: empirical, agent-safety, ai-safety

E5 / R3 (95%)
Agentic Uncertainty Reveals Agentic Overconfidence

Jean Kaddour, Leo Richter, Srijan Patel, Pasquale Minervini

Year: 2026 | Area: Agent Safety | Citations: -

Tags: empirical, agent-safety, ai-safety, adversarial-robustness

E5 / R3 (96%)
Authenticated Workflows: A Systems Approach to Protecting Agentic AI

Mohan Rajagopalan, Vinay Rao

Year: 2026 | Area: Agent Safety | Citations: -

Tags: theoretical, agent-safety, ai-safety

E5 / R4 (96%)
Institutional AI: Governing LLM Collusion in Multi-Agent Cournot Markets via Public Governance Graphs

Daniele Nardi, Piercosma Bisconti, Vincenzo Suriani, Marcantonio Bracale Syrnikov

Year: 2026 | Area: Agent Safety | Citations: 1

Tags: empirical, agent-safety, ai-safety

E5 / R4 (92%)
LPS-Bench: Benchmarking Safety Awareness of Computer-Use Agents in Long-Horizon Planning under Benign and Adversarial Scenarios

Xia Hu, Ge Gao, Chujia Hu, Dongrui Liu

Year: 2026 | Area: Agent Safety | Citations: 1

Tags: agent-safety, ai-safety, adversarial-robustness, benchmark

E4 / R3 (95%)
Modular Safety Guardrails Are Necessary for Foundation-Model-Enabled Robots in the Real World

Davood Soleymanzadeh, Yi Ding, Minghui Zheng, Joonkyung Kim

Year: 2026 | Area: Agent Safety | Citations: -

Tags: agent-safety, ai-safety, position

E5 / R4 (93%)
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments

Zhiqiang Lin, Zeyi Liao, Eric Fosler-Lussier, Yu Su

Year: 2026 | Area: Agent Safety | Citations: 12

Tags: agent-safety, ai-safety, adversarial-robustness, benchmark

E6 / R3 (95%)
The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies

Chaozhuo Li, Xi Zhang, Chenxu Wang, Songyang Liu

Year: 2026 | Area: Agent Safety | Citations: -

Tags: empirical, alignment-training, agent-safety, ai-safety

E4 / R3 (95%)
The Shadow Self: Intrinsic Value Misalignment in Large Language Model Agents

Qian Wang, Yuan Yang, Ziyao Liu, Kwok-Yan Lam

Year: 2026 | Area: Agent Safety | Citations: -

Tags: alignment-training, agent-safety, ai-safety, benchmark

E4 / R3 (96%)
Toward Constitutional Autonomy in AI Systems: A Theoretical Framework for Aligned Agentic Intelligence

William Torgbi Agbemabiese

Year: 2026 | Area: Agent Safety | Citations: 1

Tags: theoretical, alignment-training, agent-safety, ai-safety

E4 / R3 (93%)
TrajAD: Trajectory Anomaly Detection for Trustworthy LLM Agents

Yang Yu, Chong Zhang, Yong Wang, Yibing Liu

Year: 2026 | Area: Agent Safety | Citations: -

Tags: empirical, agent-safety, ai-safety

E5 / R3 (96%)
Trustworthy Agentic AI Requires Deterministic Architectural Boundaries

Manish Bhattarai, Minh Vu

Year: 2026 | Area: Agent Safety | Citations: -

Tags: theoretical, alignment-training, agent-safety, ai-safety

E5 / R4 (97%)
A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents

Moksh Nirvaan, Raghu Arghal, Fade Chen, Angelos Nalmpantis

Year: 2025 | Area: Agent Safety | Citations: -

Tags: empirical, agent-safety, ai-safety, safety-evaluation

E4 / R3 (94%)
A Sketch of an AI Control Safety Case

Geoffrey Irving, Buck Shlegeris, Tomek Korbak, Joshua Clymer

Year: 2025 | Area: Agent Safety | Citations: 22

Tags: agent-safety, ai-safety, position, safety-evaluation

E6 / R4 (93%)
A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures

Dezhang Kong, Xuan Liu, Yuyuan Li, Zhenhua Xu

Year: 2025 | Area: Agent Safety | Citations: 35

Tags: agent-safety, ai-safety, survey

E5 / R3 (97%)
A Survey on Agentic Security: Applications, Threats and Defenses

Asif Shahriar, Sadif Ahmed, Farig Sadeque, Md Nafiu Rahman

Year: 2025 | Area: Agent Safety | Citations: 6

Tags: agent-safety, ai-safety, adversarial-robustness, survey

E6 / R4 (96%)
A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents

Chang Liu, Jun Zhu, Jun Luo, Hang Su

Year: 2025 | Area: Agent Safety | Citations: 12

Tags: agent-safety, ai-safety, survey

E5 / R3 (95%)
A Survey on Trustworthy LLM Agents: Threats and Countermeasures

Qingsong Wen, Shilong Wang, Bo An, Linsey Pang

Year: 2025 | Area: Agent Safety | Citations: 55

Tags: agent-safety, ai-safety, survey

E5 / R4 (95%)
A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?

Yongjiang Wu, Jen-tse Huang, Wenxuan Wang, Ada Chen

Year: 2025 | Area: Agent Safety | Citations: 15

Tags: agent-safety, ai-safety, survey

E5 / R3 (95%)
ACE: A Security Architecture for LLM-Integrated App Systems

William Robertson, Cristina Nita-Rotaru, Alina Oprea, Evan Li

Year: 2025 | Area: Agent Safety | Citations: 16

Tags: empirical, agent-safety, ai-safety

E5 / R3 (95%)
AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions

Xianglong Liu, Dacheng Tao, Jiakai Wang, Siyuan Liang

Year: 2025 | Area: Agent Safety | Citations: 13

Tags: agent-safety, ai-safety, adversarial-robustness, benchmark

E5 / R4 (99%)
AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection

Shenghong Dai, Muhao Chen, Suman Banerjee, Chaowei Xiao

Year: 2025 | Area: Agent Safety | Citations: 29

Tags: empirical, agent-safety, ai-safety

E7 / R4 (95%)
AI Kill Switch for Malicious Web-based LLM Agents

Sechan Lee, Sangdon Park

Year: 2025 | Area: Agent Safety | Citations: -

Tags: empirical, agent-safety, ai-safety

E5 / R4 (96%)
AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery

Rupeng Zhang, Junjie Wang, Qing Wang, Xiaojun Jia

Year: 2025 | Area: Agent Safety | Citations: 4

Tags: empirical, agent-safety, ai-safety

E5 / R3 (96%)
Adapting Insider Risk mitigations for Agentic Misalignment: an empirical study

Francesca Gomez

Year: 2025 | Area: Agent Safety | Citations: -

Tags: empirical, alignment-training, agent-safety, ai-safety

E6 / R3 (96%)
AgentAlign: Navigating Safety Alignment in the Shift from Informative to Agentic Large Language Models

Lu Yin, Jinchuan Zhang, Songlin Hu, Yan Zhou

Year: 2025 | Area: Agent Safety | Citations: 4

Tags: empirical, alignment-training, agent-safety, ai-safety

E5 / R3 (96%)
AgentBreeder: Mitigating the AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement

J Rosser, Jakob Foerster

Year: 2025 | Area: Agent Safety | Citations: 6

Tags: empirical, agent-safety, ai-safety, adversarial-robustness

E5 / R4 (94%)
AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents

Arman Zharmagambetov, Maya Pavlova, Kamalika Chaudhuri, Chuan Guo

Year: 2025 | Area: Agent Safety | Citations: 31

Tags: agent-safety, ai-safety, safety-evaluation, benchmark

E6 / R3 (97%)
AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior

David Mimran, Nadya Abaev, Denis Klimov, Gerard Levinov

Year: 2025 | Area: Agent Safety | Citations: 2

Tags: empirical, agent-safety, ai-safety

E4 / R3 (96%)
AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems

Faouzi El Yagoubi, Ranwa Al Mallah, Godwin Badu-Marfo

Year: 2025 | Area: Agent Safety | Citations: -

Tags: agent-safety, ai-safety, benchmark

E5 / R3 (95%)

Showing 30 of 195 papers on page 1.