Instant research discovery
Search and browse ingested papers with per-paper intelligence signals (the Intel column) and fast filtering by year, area, and tag.
| Paper | Year | Area | Tags | Intel | Citations |
|---|---|---|---|---|---|
| A Causal Perspective for Enhancing Jailbreak Attack and Defense (Kui Ren, Haozhe Feng, Licheng Pan, Hui Xue) | 2026 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness | E6 / R3 (94%) | - |
| A Fragile Guardrail: Diffusion LLM's Safety Blessing and Its Failure Mode (Yupeng Chen, Philip Torr, Eric Sommerlade, Jialin Yu) | 2026 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness | E6 / R3 (95%) | - |
| A White-Box Prompt Injection Attack on Embodied AI Agents Driven by Large Language Models (Yubin Qu, W. E. Wong, Tongcheng Geng) | 2026 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness | E4 / R3 (95%) | 1 |
| ADV-0: Closed-Loop Min-Max Adversarial Training for Long-Tail Robustness in Autonomous Driving (Wei Ma, Jie Sun, Junlin He, Yihong Tang) | 2026 | cs.LG | ai-safety, cslg, adversarial-robustness, preprint | - | - |
| AM3Safety: Towards Data Efficient Alignment of Multi-modal Multi-turn Safety for MLLMs (Yujin Zhou, Yike Guo, Chengkun Cai, Pengcheng Wen) | 2026 | Multimodal Safety | empirical, alignment-training, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (97%) | - |
| Adversarial Latent-State Training for Robust Policies in Partially Observable Domains (Angad Singh Ahuja) | 2026 | cs.LG | ai-safety, cslg, adversarial-robustness, preprint | E4 / R3 (94%) | - |
| Adversarial Reinforcement Learning for Detecting False Data Injection Attacks in Vehicular Routing (Aron Laszka, Yevgeniy Vorobeychik, Taha Eghtesad) | 2026 | cs.AI | ai-safety, adversarial-robustness, csai, preprint | - | - |
| Adversarial attacks against Modern Vision-Language Models (Alejandro Paredes La Torre) | 2026 | cs.CR | ai-safety, adversarial-robustness, cscr, preprint | - | - |
| Agentic Uncertainty Reveals Agentic Overconfidence (Jean Kaddour, Leo Richter, Srijan Patel, Pasquale Minervini) | 2026 | Agent Safety | empirical, agent-safety, ai-safety, adversarial-robustness | E5 / R3 (96%) | - |
| Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models (Nikolay Matyunin, Gurang Gupta, Jibesh Patra, Ali Raza) | 2026 | cs.CR | ai-safety, adversarial-robustness, cscr, preprint | E5 / R3 (96%) | - |
| Among Us: Measuring and Mitigating Malicious Contributions in Model Collaboration Systems (Shangbin Feng, Wenxuan Ding, Yulia Tsvetkov, Ziyuan Yang) | 2026 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness | E4 / R3 (93%) | - |
| Attributing and Exploiting Safety Vectors through Global Optimization in Large Language Models (Yuhong Wang, Zhihui Fu, Songze Li, Jiahao Chen) | 2026 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness | E6 / R4 (96%) | - |
| Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation (Nesreen Ahmed, Mahantesh Halappanavar, Haoyu Han, Yue Zhao) | 2026 | Adversarial Robustness | ai-safety, adversarial-robustness, benchmark | E5 / R3 (96%) | - |
| Beyond Static Alignment: Hierarchical Policy Control for LLM Safety via Risk-Aware Chain-of-Thought (Weihong Lin, Lin Sun, Jianfeng Si, Xiangzheng Zhang) | 2026 | Adversarial Robustness | empirical, alignment-training, ai-safety, adversarial-robustness | E5 / R4 (94%) | - |
| Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language Models (Fadi Hassan, Hicham Eddoubi, Umar Faruk Abdullahi) | 2026 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness, safety-evaluation | E7 / R3 (98%) | - |
| Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs (Wei Wang, Mingyu Yu, Sujuan Qin, Lana Liu) | 2026 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (97%) | - |
| Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks (Murat Kantarcioglu, Jafar Isbarov) | 2026 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness | E5 / R3 (96%) | - |
| CSSBench: Evaluating the Safety of Lightweight LLMs against Chinese-Specific Adversarial Patterns (Qiankun Li, Kun Wang, Shilinlu Yan, Zhenhong Zhou) | 2026 | Safety Evaluation | ai-safety, adversarial-robustness, safety-evaluation, benchmark | E6 / R5 (96%) | - |
| Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI Systems (Jose Sanchez Vicarte, Anjo Vahldiek-Oberwagner, Sarbartha Banerjee, Prateek Sahu) | 2026 | cs.CR | ai-safety, adversarial-robustness, cscr, preprint | - | - |
| ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents (Hwan Chang, Yonghyun Jun, Hwanhee Lee) | 2026 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness | E5 / R4 (97%) | 9 |
| CoT Defender: Preemptive Chain-of-Thought Occupation for Jailbreak Attack Mitigation (Yihe Wang, Xiao Yu, Jin Liu, Xiaokang Li) | 2026 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness | - | - |
| Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks (Rob Gilson, Bobby Chen, Christopher Liu, Hoagy Cunningham) | 2026 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness | E5 / R3 (94%) | 6 |
| Contract And Conquer: How to Provably Compute Adversarial Examples for a Black-Box Model? (Anna Chistyakova, Mikhail Pautov) | 2026 | cs.LG | ai-safety, cslg, adversarial-robustness, preprint | - | - |
| David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via Reinforcement Learning (Tal Kachman, Samuel Nellessen) | 2026 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness | E5 / R3 (95%) | - |
| Detecting the Machine: A Comprehensive Benchmark of AI-Generated Text Detectors Across Architectures, Domains, and Adversarial Conditions (Madhav S. Baidya, Chirag Chawla, S. S. Baidya) | 2026 | cs.CL | cscl, ai-safety, adversarial-robustness, preprint | - | - |
| Efficient and Adaptable Detection of Malicious LLM Prompts via Bootstrap Aggregation (Zafar Ayyub Qazi, Tao Ni, Shayan Ali Hassan, Marco Canini) | 2026 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness | E5 / R3 (98%) | - |
| Enhancing Network Intrusion Detection Systems: A Multi-Layer Ensemble Approach to Mitigate Adversarial Attacks (Raphael Khoury, Kelton A. P. Costa, Nasim Soltani, Shayan Nejadshamsi) | 2026 | cs.CR | ai-safety, adversarial-robustness, cscr, preprint | - | - |
| Ethical Risks in Deploying Large Language Models: An Evaluation of Medical Ethics Jailbreaking (Chengze Yan, Yunlou Fan, Jiacheng Ji, Hanhui Xu) | 2026 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness, safety-evaluation | E5 / R3 (96%) | - |
| Expected Harm: Rethinking Safety Evaluation of (Mis)Aligned LLMs (Zhi Rui Tam, Yen-Shan Chen, Yun-Nung Chen, Cheng-Kuang Wu) | 2026 | Safety Evaluation | empirical, ai-safety, adversarial-robustness, safety-evaluation | E5 / R3 (95%) | - |
| From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning (Haibo Hu, Zhibiao Guo, Qingqing Ye, Zi Liang) | 2026 | Model Editing | empirical, ai-safety, adversarial-robustness, model-editing | E7 / R4 (95%) | - |
Showing 30 of 1490 papers on page 1.