Instant research discovery
Search and browse ingested papers with intelligence signals and fast filtering.
| Paper | Year | Area | Tags | Intel | Citations |
|---|---|---|---|---|---|
| Risk-Adjusted Harm Scoring for Automated Red Teaming for LLMs in Financial Services (Bhaskarjit Sarmah, Fabrizio Dimino, Stefano Pasquali) | 2026 | q-fin.CP | q-fincp, ai-safety, red-teaming, preprint | - | - |
| A Red Teaming Roadmap Towards System-Level Safety (Jeremy Kritz, Zifan Wang, Julian Michael, Willow E. Primack) | 2025 | Safety Evaluation | ai-safety, position, safety-evaluation, red-teaming | E5 / R3 (92%) | 2 |
| AutoRed: A Free-form Adversarial Prompt Generation Framework for Automated Red Teaming (Zhanyu Ma, Keqing He, Yutao Mou, Shikun Zhang) | 2025 | Safety Evaluation | empirical, ai-safety, adversarial-robustness, safety-evaluation, red-teaming | E5 / R4 (97%) | - |
| AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration (Francesco Pinto, Shuang Yang, Zhaorun Chen, Bo Li) | 2025 | Safety Evaluation | ai-safety, tool, safety-evaluation, red-teaming | E5 / R3 (96%) | 17 |
| Automatic LLM Red Teaming (Roman Belaire, Pradeep Varakantham, Arunesh Sinha) | 2025 | Safety Evaluation | empirical, ai-safety, safety-evaluation, red-teaming | E5 / R3 (93%) | 1 |
| BlackIce: A Containerized Red Teaming Toolkit for AI Security Testing (Caelin Kaplan, Neil Archibald, Alexander Warnecke) | 2025 | Safety Evaluation | ai-safety, tool, safety-evaluation, red-teaming | E7 / R4 (98%) | - |
| Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models (Alberto Purpura, Jesse Zymet, Sahil Wadhwa, Akshay Gupta) | 2025 | Safety Evaluation | ai-safety, survey, safety-evaluation, red-teaming | E6 / R4 (96%) | 7 |
| Comparison requires valid measurement: Rethinking attack success rate comparisons in AI red teaming (Hanna Wallach, Alexandra Chouldechova, Abhinav Palia, Solon Barocas) | 2025 | Safety Evaluation | theoretical, ai-safety, safety-evaluation, red-teaming | E5 / R3 (95%) | 1 |
| Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming (Rob Gilson, Peter Lofgren, Euan Ong, Logan Graham) | 2025 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness, red-teaming | E5 / R3 (97%) | 105 |
| DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling (Junjie Wang, Boheng Li, Run Wang, Yiming Li) | 2025 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness, red-teaming | E4 / R3 (96%) | 2 |
| Foundation Models as Guardrails: LLM- and VLM-Based Approaches to Safety and Alignment (Koki Wataoka, Huy H. Nguyen, Tomoya Kurosawa, Pride Kavumba) | 2025 | Adversarial Robustness | alignment-training, ai-safety, adversarial-robustness, survey, safety-evaluation, red-teaming | E5 / R3 (92%) | - |
| GenBreak: Red Teaming Text-to-Image Generators Using Large Language Models (Yu-Gang Jiang, Xiang Zheng, Bo Wang, Xiaosen Wang) | 2025 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety, red-teaming | E5 / R3 (95%) | 2 |
| Jailbreak-Zero: A Path to Pareto Optimal Red Teaming for Large Language Models (Abhinav Aggarwal, David Zhang, Ankit Jain, Kai Hu) | 2025 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness, red-teaming | E5 / R3 (95%) | - |
| Lessons From Red Teaming 100 Generative AI Products (Pete Bryan, Yonatan Zunger, Roman Lutz, Daniel Jones) | 2025 | Safety Evaluation | empirical, ai-safety, safety-evaluation, red-teaming | E5 / R3 (97%) | 20 |
| OpenAI's Approach to External Red Teaming for AI Models and Systems (Michael Lampe, Lama Ahmad, Pamela Mishkin, Sandhini Agarwal) | 2025 | Safety Evaluation | ai-safety, position, safety-evaluation, red-teaming | E6 / R4 (97%) | 33 |
| PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training (Pengfei Du) | 2025 | Adversarial Robustness | empirical, alignment-training, ai-safety, adversarial-robustness, red-teaming | E5 / R3 (95%) | 2 |
| Query-efficient and dataset-independent red teaming for LLMs content safety evaluation (Sen Su, Shuo Liu, Xiang Cheng) | 2025 | Safety Evaluation | empirical, ai-safety, safety-evaluation, red-teaming | E5 / R3 (97%) | 1 |
| Red Teaming Contemporary AI Models: Insights from Spanish and Basque Perspectives (Miriam Ugarte, Miguel Romero-Arjona, Vicente Cambrón, José A. Parejo) | 2025 | Safety Evaluation | empirical, ai-safety, safety-evaluation, red-teaming | E5 / R4 (97%) | 7 |
| Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs (Chetan Pathade) | 2025 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness, safety-evaluation, red-teaming | E7 / R3 (95%) | 30 |
| Red teaming large language models: A comprehensive review and critical analysis (Sadam Al-Azani, Muhammad Shahid Jabbar, Abrar Alotaibi, Moataz Ahmed) | 2025 | Safety Evaluation | ai-safety, survey, safety-evaluation, red-teaming | E6 / R3 (99%) | 2 |
| RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models (Chris Ngo, Truong-Son Hy, Quy-Anh Dang) | 2025 | Safety Evaluation | ai-safety, dataset, safety-evaluation, red-teaming | E5 / R4 (95%) | - |
| RedDebate: Safer Responses through Multi-Agent Red Teaming Debates (Stephen Obadinma, Radin Shayanfar, Ali Asad, Xiaodan Zhu) | 2025 | Safety Evaluation | empirical, ai-safety, adversarial-robustness, safety-evaluation, red-teaming | E7 / R4 (97%) | 3 |
| RedDiffuser: Red Teaming Vision-Language Models for Toxic Continuation via Reinforced Stable Diffusion (Ruofan Wang, Cong Wang, Xiang Zheng, Xiaosen Wang) | 2025 | Multimodal Safety | empirical, ai-safety, multimodal-safety, red-teaming | E6 / R3 (96%) | - |
| RedRFT: A Light-Weight Benchmark for Reinforcement Fine-Tuning-Based Red Teaming (Cong Wang, Xiang Zheng, Wei-Bin Lee, Xingjun Ma) | 2025 | Safety Evaluation | ai-safety, safety-evaluation, red-teaming, benchmark | E5 / R3 (97%) | 1 |
| Reliable Weak-to-Strong Monitoring of LLM Agents (Chen Bo Calvin Zhang, Paula Rodriguez, Ankit Aich, Kevin Zhu) | 2025 | Scalable Oversight | scalable-oversight, empirical, ai-safety, red-teaming | E5 / R3 (95%) | 4 |
| Scaling Responsible Generative AI: Automating Red Teaming of LLM Applications (Adison Goh, Benjamin Chee, Matteo Vagnoli, Luca Baldassarre) | 2025 | Safety Evaluation | ai-safety, adversarial-robustness, tool, safety-evaluation, red-teaming | E4 / R2 (94%) | - |
| Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning (Ruoxi Jia, Ninareh Mehrabi, Xiao Yu, Si Chen) | 2025 | Safety Evaluation | empirical, ai-safety, adversarial-robustness, safety-evaluation, red-teaming | E5 / R3 (96%) | 9 |
| The Anatomy of Conversational Scams: A Topic-Based Red Teaming Analysis of Multi-Turn Interactions in LLMs (Haoming Tang, Siying Hu, Xiangzhe Yuan, Zhenhao Zhang) | 2025 | Safety Evaluation | empirical, ai-safety, safety-evaluation, red-teaming | E5 / R3 (93%) | - |
| The Automation Advantage in AI Red Teaming (Ads Dawson, Brian Greunke, Rob Mulla, Brad Palm) | 2025 | Safety Evaluation | empirical, ai-safety, safety-evaluation, red-teaming | E5 / R3 (96%) | 2 |
| UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning (Shuang Yang, Jiawei Zhang, Bo Li) | 2025 | Adversarial Robustness | empirical, ai-safety, adversarial-robustness, red-teaming | E6 / R4 (96%) | 9 |
Showing 30 of 60 papers on page 1.
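The "fast filtering" above can be approximated over exported records with ordinary set operations. The sketch below is illustrative only: the record layout mirrors the table columns, but the field names (`title`, `year`, `area`, `tags`, `citations`) and the `filter_papers` helper are assumptions, not the tool's actual schema or API. The three sample records are taken verbatim from the table.

```python
# Illustrative tag/area filtering over paper records shaped like the table rows.
# Field names and filter_papers are hypothetical; sample data is from the table.

papers = [
    {"title": "AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration",
     "year": 2025, "area": "Safety Evaluation",
     "tags": {"ai-safety", "tool", "safety-evaluation", "red-teaming"},
     "citations": 17},
    {"title": "BlackIce: A Containerized Red Teaming Toolkit for AI Security Testing",
     "year": 2025, "area": "Safety Evaluation",
     "tags": {"ai-safety", "tool", "safety-evaluation", "red-teaming"},
     "citations": None},  # "-" in the table, i.e. no citation count yet
    {"title": "UDora: A Unified Red Teaming Framework against LLM Agents by "
              "Dynamically Hijacking Their Own Reasoning",
     "year": 2025, "area": "Adversarial Robustness",
     "tags": {"empirical", "ai-safety", "adversarial-robustness", "red-teaming"},
     "citations": 9},
]

def filter_papers(records, required_tags=(), area=None):
    """Keep records carrying every required tag, optionally within one area."""
    required = set(required_tags)
    return [r for r in records
            if required <= r["tags"] and (area is None or r["area"] == area)]

# Papers tagged as tools:
tools = filter_papers(papers, required_tags=["tool"])
print([r["title"].split(":")[0] for r in tools])  # → ['AutoRedTeamer', 'BlackIce']
```

Representing `tags` as a set makes the "has all of these tags" check a subset test (`required <= r["tags"]`), which stays fast even with many tags per paper.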