PaperIntel

Instant research discovery

Search and browse ingested papers with intelligence signals and fast filtering.
Constructing Safety Cases for AI Systems: A Reusable Template Framework

Jieshan Chen, Md Shamsujjoha, Sung Une Lee, Liming Dong

Year: 2026 · Area: Safety Evaluation · Citations: -

Tags: ai-safety, survey, safety-evaluation

E5 / R4 (97%)
Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems

Dmitry Namiot, Narek Maloyan

Year: 2026 · Area: Adversarial Robustness · Citations: -

Tags: ai-safety, adversarial-robustness, survey

E6 / R3 (97%)
A CIA Triad-Based Taxonomy of Prompt Attacks on Large Language Models

Amr Adel, Tony Jan, Nicholas Jones, Afnan Alkreisat

Year: 2025 · Area: Adversarial Robustness · Citations: 6

Tags: ai-safety, adversarial-robustness, survey

E7 / R4 (97%)
A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models

Herbert Woisetschläger, Jiahui Geng, Qing Li, Zongxiong Chen

Year: 2025 · Area: Model Editing · Citations: 23

Tags: ai-safety, survey, safety-evaluation, model-editing

E6 / R4 (97%)
A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models

Virilo Tejedor, Andrés Herrera-Poyatos, David Herrera-Poyatos, Cristina Zuheros

Year: 2025 · Area: Adversarial Robustness · Citations: 1

Tags: alignment-training, ai-safety, adversarial-robustness, survey

E6 / R4 (96%)
A Review of Developmental Interpretability in Large Language Models

Ihor Kendiukhov

Year: 2025 · Area: Surveys & Reviews · Citations: -

Tags: surveys-reviews, ai-safety, survey, interpretability

E6 / R4 (94%)
A Survey of Attacks on Large Language Models

Wenrui Xu, Keshab K. Parhi

Year: 2025 · Area: Adversarial Robustness · Citations: 10

Tags: ai-safety, adversarial-robustness, survey

E6 / R3 (95%)
A Survey of LLM Alignment: Instruction Understanding, Intention Reasoning, and Reliable Generation

Qian Li, Ziqin Zhu, Shangguang Wang, Jianxin Li

Year: 2025 · Area: Surveys & Reviews · Citations: 2

Tags: alignment-training, surveys-reviews, ai-safety, survey

E6 / R4 (97%)
A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures

Dezhang Kong, Xuan Liu, Yuyuan Li, Zhenhua Xu

Year: 2025 · Area: Agent Safety · Citations: 35

Tags: agent-safety, ai-safety, survey

E5 / R3 (97%)
A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations

Nenghai Yu, Wenke Huang, Dacheng Tao, Xuankun Rong

Year: 2025 · Area: Multimodal Safety · Citations: 47

Tags: ai-safety, survey, multimodal-safety, safety-evaluation

E5 / R3 (95%)
A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks

Hieu Minh Nguyen

Year: 2025 · Area: Surveys & Reviews · Citations: 5

Tags: alignment-training, surveys-reviews, ai-safety, survey, safety-evaluation

E5 / R3 (92%)
A Survey on Agentic Security: Applications, Threats and Defenses

Asif Shahriar, Sadif Ahmed, Farig Sadeque, Md Nafiu Rahman

Year: 2025 · Area: Agent Safety · Citations: 6

Tags: agent-safety, ai-safety, adversarial-robustness, survey

E6 / R4 (96%)
A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents

Chang Liu, Jun Zhu, Jun Luo, Hang Su

Year: 2025 · Area: Agent Safety · Citations: 12

Tags: agent-safety, ai-safety, survey

E5 / R3 (95%)
A Survey on Backdoor Threats in Large Language Models (LLMs): Attacks, Defenses, and Evaluation Methods

Yihe Zhou, Tao Ni, Qingchuan Zhao, Wei-Bin Lee

Year: 2025 · Area: Adversarial Robustness · Citations: 24

Tags: ai-safety, adversarial-robustness, survey, safety-evaluation

E5 / R3 (95%)
A Survey on Data Security in Large Language Models

Kang Chen, Jinhe Su, Yuanhui Yu, Li Shen

Year: 2025 · Area: Surveys & Reviews · Citations: 1

Tags: surveys-reviews, ai-safety, adversarial-robustness, survey

E5 / R3 (95%)
A Survey on Generative Model Unlearning: Fundamentals, Taxonomy, Evaluation, and Future Direction

Chengye Wang, Kaixiang Li, Yuyuan Li, Jianwei Yin

Year: 2025 · Area: Surveys & Reviews · Citations: 2

Tags: surveys-reviews, ai-safety, survey, safety-evaluation

E5 / R3 (93%)
A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models

Ryan A. Rossi, Keivan Rezaei, Zhiyang Xu, Mohammad Beigi

Year: 2025 · Area: Surveys & Reviews · Citations: 20

Tags: surveys-reviews, ai-safety, survey, interpretability

E7 / R4 (96%)
A Survey on Model Extraction Attacks and Defenses for Large Language Models

Lincan Li, Yue Zhao, Yushun Dong, Kaixiang Zhao

Year: 2025 · Area: Adversarial Robustness · Citations: 11

Tags: ai-safety, adversarial-robustness, survey, safety-evaluation

E5 / R3 (95%)
A Survey on Progress in LLM Alignment from the Perspective of Reward Design

Shoujin Wang, Zhibin Wu, Usman Naseem, Yanqiu Wu

Year: 2025 · Area: Surveys & Reviews · Citations: 10

Tags: alignment-training, surveys-reviews, ai-safety, survey

E6 / R4 (95%)
A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of LLMs

Xuansheng Wu, Mengnan Du, Ziyu Yao, Ninghao Liu

Year: 2025 · Area: Surveys & Reviews · Citations: 34

Tags: surveys-reviews, ai-safety, survey, safety-evaluation

E5 / R3 (94%)
A Survey on Trustworthy LLM Agents: Threats and Countermeasures

Qingsong Wen, Shilong Wang, Bo An, Linsey Pang

Year: 2025 · Area: Agent Safety · Citations: 55

Tags: agent-safety, ai-safety, survey

E5 / R4 (95%)
A Survey on Unlearning in Large Language Models

Ruichen Qiu, Xiao-Shan Gao, Honglin Wang, Fei Sun

Year: 2025 · Area: Surveys & Reviews · Citations: 1

Tags: surveys-reviews, ai-safety, survey

E5 / R3 (95%)
A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?

Yongjiang Wu, Jen-tse Huang, Wenxuan Wang, Ada Chen

Year: 2025 · Area: Agent Safety · Citations: 15

Tags: agent-safety, ai-safety, survey

E5 / R3 (95%)
A Systematic Review of Poisoning Attacks Against Large Language Models

Edward W. Staley, Marie Chau, Nathan Drenkow, Neil Fendley

Year: 2025 · Area: Adversarial Robustness · Citations: 6

Tags: ai-safety, adversarial-robustness, survey

E4 / R3 (95%)
A comprehensive survey of adversarial defense techniques in the visual domain

Jun Zhang, Jun Lei, Yibing Dong, Sheng Long

Year: 2025 · Area: Adversarial Robustness · Citations: -

Tags: ai-safety, adversarial-robustness, survey

-
AI Awareness

Haoyuan Shi, Rongwu Xu, Xiaojian Li, Wei Xu

Year: 2025 · Area: Surveys & Reviews · Citations: 4

Tags: surveys-reviews, ai-safety, survey, safety-evaluation

E6 / R5 (97%)
AI Safety for Everyone

Bálint Gyevnar, Atoosa Kasirzadeh

Year: 2025 · Area: Surveys & Reviews · Citations: 17

Tags: surveys-reviews, ai-safety, survey

E6 / R3 (96%)
AI Safety vs. AI Security: Demystifying the Distinction and Boundaries

Zhiqiang Lin, Ness Shroff, Huan Sun

Year: 2025 · Area: Surveys & Reviews · Citations: 2

Tags: surveys-reviews, ai-safety, adversarial-robustness, survey

E5 / R4 (95%)
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey

Shashank Kapoor, Ankit Shetgaonkar, Lakshit Arora, Dipen Pradhan

Year: 2025 · Area: Multimodal Safety · Citations: 2

Tags: ai-safety, adversarial-robustness, survey, multimodal-safety

E6 / R3 (94%)
Attack and defense techniques in large language models: A survey and new perspectives

Kangkang Li, Yunxuan Liu, Kang Chen, Hefeng Chen

Year: 2025 · Area: Adversarial Robustness · Citations: 5

Tags: ai-safety, adversarial-robustness, survey

E7 / R4 (95%)

Showing 30 of 193 papers (page 1).