Instant research discovery
Search and browse ingested papers with intelligence signals and fast filtering.
| Paper | Authors | Year | Area | Tags | Intel | Citations |
|---|---|---|---|---|---|---|
| Constructing Safety Cases for AI Systems: A Reusable Template Framework | Jieshan Chen, Md Shamsujjoha, Sung Une Lee, Liming Dong | 2026 | Safety Evaluation | ai-safety, survey, safety-evaluation | E5 / R4 (97%) | - |
| Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems | Dmitry Namiot, Narek Maloyan | 2026 | Adversarial Robustness | ai-safety, adversarial-robustness, survey | E6 / R3 (97%) | - |
| A CIA Triad-Based Taxonomy of Prompt Attacks on Large Language Models | Amr Adel, Tony Jan, Nicholas Jones, Afnan Alkreisat | 2025 | Adversarial Robustness | ai-safety, adversarial-robustness, survey | E7 / R4 (97%) | 6 |
| A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models | Herbert Woisetschläger, Jiahui Geng, Qing Li, Zongxiong Chen | 2025 | Model Editing | ai-safety, survey, safety-evaluation, model-editing | E6 / R4 (97%) | 23 |
| A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models | Virilo Tejedor, Andrés Herrera-Poyatos, David Herrera-Poyatos, Cristina Zuheros | 2025 | Adversarial Robustness | alignment-training, ai-safety, adversarial-robustness, survey | E6 / R4 (96%) | 1 |
| A Review of Developmental Interpretability in Large Language Models | Ihor Kendiukhov | 2025 | Surveys & Reviews | surveys-reviews, ai-safety, survey, interpretability | E6 / R4 (94%) | - |
| A Survey of Attacks on Large Language Models | Wenrui Xu, Keshab K. Parhi | 2025 | Adversarial Robustness | ai-safety, adversarial-robustness, survey | E6 / R3 (95%) | 10 |
| A Survey of LLM Alignment: Instruction Understanding, Intention Reasoning, and Reliable Generation | Qian Li, Ziqin Zhu, Shangguang Wang, Jianxin Li | 2025 | Surveys & Reviews | alignment-training, surveys-reviews, ai-safety, survey | E6 / R4 (97%) | 2 |
| A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures | Dezhang Kong, Xuan Liu, Yuyuan Li, Zhenhua Xu | 2025 | Agent Safety | agent-safety, ai-safety, survey | E5 / R3 (97%) | 35 |
| A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations | Nenghai Yu, Wenke Huang, Dacheng Tao, Xuankun Rong | 2025 | Multimodal Safety | ai-safety, survey, multimodal-safety, safety-evaluation | E5 / R3 (95%) | 47 |
| A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks | Hieu Minh Nguyen | 2025 | Surveys & Reviews | alignment-training, surveys-reviews, ai-safety, survey, safety-evaluation | E5 / R3 (92%) | 5 |
| A Survey on Agentic Security: Applications, Threats and Defenses | Asif Shahriar, Sadif Ahmed, Farig Sadeque, Md Nafiu Rahman | 2025 | Agent Safety | agent-safety, ai-safety, adversarial-robustness, survey | E6 / R4 (96%) | 6 |
| A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents | Chang Liu, Jun Zhu, Jun Luo, Hang Su | 2025 | Agent Safety | agent-safety, ai-safety, survey | E5 / R3 (95%) | 12 |
| A Survey on Backdoor Threats in Large Language Models (LLMs): Attacks, Defenses, and Evaluation Methods | Yihe Zhou, Tao Ni, Qingchuan Zhao, Wei-Bin Lee | 2025 | Adversarial Robustness | ai-safety, adversarial-robustness, survey, safety-evaluation | E5 / R3 (95%) | 24 |
| A Survey on Data Security in Large Language Models | Kang Chen, Jinhe Su, Yuanhui Yu, Li Shen | 2025 | Surveys & Reviews | surveys-reviews, ai-safety, adversarial-robustness, survey | E5 / R3 (95%) | 1 |
| A Survey on Generative Model Unlearning: Fundamentals, Taxonomy, Evaluation, and Future Direction | Chengye Wang, Kaixiang Li, Yuyuan Li, Jianwei Yin | 2025 | Surveys & Reviews | surveys-reviews, ai-safety, survey, safety-evaluation | E5 / R3 (93%) | 2 |
| A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models | Ryan A. Rossi, Keivan Rezaei, Zhiyang Xu, Mohammad Beigi | 2025 | Surveys & Reviews | surveys-reviews, ai-safety, survey, interpretability | E7 / R4 (96%) | 20 |
| A Survey on Model Extraction Attacks and Defenses for Large Language Models | Lincan Li, Yue Zhao, Yushun Dong, Kaixiang Zhao | 2025 | Adversarial Robustness | ai-safety, adversarial-robustness, survey, safety-evaluation | E5 / R3 (95%) | 11 |
| A Survey on Progress in LLM Alignment from the Perspective of Reward Design | Shoujin Wang, Zhibin Wu, Usman Naseem, Yanqiu Wu | 2025 | Surveys & Reviews | alignment-training, surveys-reviews, ai-safety, survey | E6 / R4 (95%) | 10 |
| A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of LLMs | Xuansheng Wu, Mengnan Du, Ziyu Yao, Ninghao Liu | 2025 | Surveys & Reviews | surveys-reviews, ai-safety, survey, safety-evaluation | E5 / R3 (94%) | 34 |
| A Survey on Trustworthy LLM Agents: Threats and Countermeasures | Qingsong Wen, Shilong Wang, Bo An, Linsey Pang | 2025 | Agent Safety | agent-safety, ai-safety, survey | E5 / R4 (95%) | 55 |
| A Survey on Unlearning in Large Language Models | Ruichen Qiu, Xiao-Shan Gao, Honglin Wang, Fei Sun | 2025 | Surveys & Reviews | surveys-reviews, ai-safety, survey | E5 / R3 (95%) | 1 |
| A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron? | Yongjiang Wu, Jen-tse Huang, Wenxuan Wang, Ada Chen | 2025 | Agent Safety | agent-safety, ai-safety, survey | E5 / R3 (95%) | 15 |
| A Systematic Review of Poisoning Attacks Against Large Language Models | Edward W. Staley, Marie Chau, Nathan Drenkow, Neil Fendley | 2025 | Adversarial Robustness | ai-safety, adversarial-robustness, survey | E4 / R3 (95%) | 6 |
| A comprehensive survey of adversarial defense techniques in the visual domain | Jun Zhang, Jun Lei, Yibing Dong, Sheng Long | 2025 | Adversarial Robustness | ai-safety, adversarial-robustness, survey | - | - |
| AI Awareness | Haoyuan Shi, Rongwu Xu, Xiaojian Li, Wei Xu | 2025 | Surveys & Reviews | surveys-reviews, ai-safety, survey, safety-evaluation | E6 / R5 (97%) | 4 |
| AI Safety for Everyone | Bálint Gyevnar, Atoosa Kasirzadeh | 2025 | Surveys & Reviews | surveys-reviews, ai-safety, survey | E6 / R3 (96%) | 17 |
| AI Safety vs. AI Security: Demystifying the Distinction and Boundaries | Zhiqiang Lin, Ness Shroff, Huan Sun | 2025 | Surveys & Reviews | surveys-reviews, ai-safety, adversarial-robustness, survey | E5 / R4 (95%) | 2 |
| Adversarial Attacks in Multimodal Systems: A Practitioner's Survey | Shashank Kapoor, Ankit Shetgaonkar, Lakshit Arora, Dipen Pradhan | 2025 | Multimodal Safety | ai-safety, adversarial-robustness, survey, multimodal-safety | E6 / R3 (94%) | 2 |
| Attack and defense techniques in large language models: A survey and new perspectives | Kangkang Li, Yunxuan Liu, Kang Chen, Hefeng Chen | 2025 | Adversarial Robustness | ai-safety, adversarial-robustness, survey | E7 / R4 (95%) | 5 |
Showing 30 of 193 papers on page 1.