PaperIntel: Instant research discovery

Search and browse ingested papers with intelligence signals and fast filtering.
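
The listing hints at a simple per-entry data model: title, authors, year, area, citation count, and tags. PaperIntel's actual schema and query API are not documented here, so the sketch below is only an illustration of how such metadata could support the fast filtering described above; the Paper record and filter_papers helper are hypothetical names, not the product's real interface.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Paper:
    title: str
    authors: list[str]
    year: int
    area: str
    citations: Optional[int]  # None where the listing shows "-"
    tags: list[str] = field(default_factory=list)

def filter_papers(papers, *, tag=None, area=None, year=None, min_citations=0):
    """Return the papers matching every filter that was supplied."""
    hits = []
    for p in papers:
        if tag is not None and tag not in p.tags:
            continue
        if area is not None and p.area != area:
            continue
        if year is not None and p.year != year:
            continue
        if (p.citations or 0) < min_citations:
            continue
        hits.append(p)
    return hits

# One entry transcribed from the listing below.
papers = [
    Paper(
        title="AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration",
        authors=["Francesco Pinto", "Shuang Yang", "Zhaorun Chen", "Bo Li"],
        year=2025,
        area="Safety Evaluation",
        citations=17,
        tags=["ai-safety", "tool", "safety-evaluation", "red-teaming"],
    ),
]

for p in filter_papers(papers, tag="tool", year=2025):
    print(p.title)
```

Intersecting independent predicates this way is the usual shape behind faceted search, which matches the tag/area/year facets shown on each entry.
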
Risk-Adjusted Harm Scoring for Automated Red Teaming for LLMs in Financial Services

Bhaskarjit Sarmah, Fabrizio Dimino, Stefano Pasquali

Year: 2026 · Area: q-fin.CP · Citations: -

Tags: q-fincp, ai-safety, red-teaming, preprint

-

A Red Teaming Roadmap Towards System-Level Safety

Jeremy Kritz, Zifan Wang, Julian Michael, Willow E. Primack

Year: 2025 · Area: Safety Evaluation · Citations: 2

Tags: ai-safety, position, safety-evaluation, red-teaming

E5 / R3 (92%)

AutoRed: A Free-form Adversarial Prompt Generation Framework for Automated Red Teaming

Zhanyu Ma, Keqing He, Yutao Mou, Shikun Zhang

Year: 2025 · Area: Safety Evaluation · Citations: -

Tags: empirical, ai-safety, adversarial-robustness, safety-evaluation, red-teaming

E5 / R4 (97%)

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration

Francesco Pinto, Shuang Yang, Zhaorun Chen, Bo Li

Year: 2025 · Area: Safety Evaluation · Citations: 17

Tags: ai-safety, tool, safety-evaluation, red-teaming

E5 / R3 (96%)

Automatic LLM Red Teaming

Roman Belaire, Pradeep Varakantham, Arunesh Sinha

Year: 2025 · Area: Safety Evaluation · Citations: 1

Tags: empirical, ai-safety, safety-evaluation, red-teaming

E5 / R3 (93%)

BlackIce: A Containerized Red Teaming Toolkit for AI Security Testing

Caelin Kaplan, Neil Archibald, Alexander Warnecke

Year: 2025 · Area: Safety Evaluation · Citations: -

Tags: ai-safety, tool, safety-evaluation, red-teaming

E7 / R4 (98%)

Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models

Alberto Purpura, Jesse Zymet, Sahil Wadhwa, Akshay Gupta

Year: 2025 · Area: Safety Evaluation · Citations: 7

Tags: ai-safety, survey, safety-evaluation, red-teaming

E6 / R4 (96%)

Comparison requires valid measurement: Rethinking attack success rate comparisons in AI red teaming

Hanna Wallach, Alexandra Chouldechova, Abhinav Palia, Solon Barocas

Year: 2025 · Area: Safety Evaluation · Citations: 1

Tags: theoretical, ai-safety, safety-evaluation, red-teaming

E5 / R3 (95%)

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

Rob Gilson, Peter Lofgren, Euan Ong, Logan Graham

Year: 2025 · Area: Adversarial Robustness · Citations: 105

Tags: empirical, ai-safety, adversarial-robustness, red-teaming

E5 / R3 (97%)

DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling

Junjie Wang, Boheng Li, Run Wang, Yiming Li

Year: 2025 · Area: Adversarial Robustness · Citations: 2

Tags: empirical, ai-safety, adversarial-robustness, red-teaming

E4 / R3 (96%)

Foundation Models as Guardrails: LLM- and VLM-Based Approaches to Safety and Alignment

Koki Wataoka, Huy H. Nguyen, Tomoya Kurosawa, Pride Kavumba

Year: 2025 · Area: Adversarial Robustness · Citations: -

Tags: alignment-training, ai-safety, adversarial-robustness, survey, safety-evaluation, red-teaming

E5 / R3 (92%)

GenBreak: Red Teaming Text-to-Image Generators Using Large Language Models

Yu-Gang Jiang, Xiang Zheng, Bo Wang, Xiaosen Wang

Year: 2025 · Area: Multimodal Safety · Citations: 2

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety, red-teaming

E5 / R3 (95%)

Jailbreak-Zero: A Path to Pareto Optimal Red Teaming for Large Language Models

Abhinav Aggarwal, David Zhang, Ankit Jain, Kai Hu

Year: 2025 · Area: Adversarial Robustness · Citations: -

Tags: empirical, ai-safety, adversarial-robustness, red-teaming

E5 / R3 (95%)

Lessons From Red Teaming 100 Generative AI Products

Pete Bryan, Yonatan Zunger, Roman Lutz, Daniel Jones

Year: 2025 · Area: Safety Evaluation · Citations: 20

Tags: empirical, ai-safety, safety-evaluation, red-teaming

E5 / R3 (97%)

OpenAI's Approach to External Red Teaming for AI Models and Systems

Michael Lampe, Lama Ahmad, Pamela Mishkin, Sandhini Agarwal

Year: 2025 · Area: Safety Evaluation · Citations: 33

Tags: ai-safety, position, safety-evaluation, red-teaming

E6 / R4 (97%)

PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training

Pengfei Du

Year: 2025 · Area: Adversarial Robustness · Citations: 2

Tags: empirical, alignment-training, ai-safety, adversarial-robustness, red-teaming

E5 / R3 (95%)

Query-efficient and dataset-independent red teaming for LLMs content safety evaluation

Sen Su, Shuo Liu, Xiang Cheng

Year: 2025 · Area: Safety Evaluation · Citations: 1

Tags: empirical, ai-safety, safety-evaluation, red-teaming

E5 / R3 (97%)

Red Teaming Contemporary AI Models: Insights from Spanish and Basque Perspectives

Miriam Ugarte, Miguel Romero-Arjona, Vicente Cambrón, José A. Parejo

Year: 2025 · Area: Safety Evaluation · Citations: 7

Tags: empirical, ai-safety, safety-evaluation, red-teaming

E5 / R4 (97%)

Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs

Chetan Pathade

Year: 2025 · Area: Adversarial Robustness · Citations: 30

Tags: empirical, ai-safety, adversarial-robustness, safety-evaluation, red-teaming

E7 / R3 (95%)

Red teaming large language models: A comprehensive review and critical analysis

Sadam Al-Azani, Muhammad Shahid Jabbar, Abrar Alotaibi, Moataz Ahmed

Year: 2025 · Area: Safety Evaluation · Citations: 2

Tags: ai-safety, survey, safety-evaluation, red-teaming

E6 / R3 (99%)

RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models

Chris Ngo, Truong-Son Hy, Quy-Anh Dang

Year: 2025 · Area: Safety Evaluation · Citations: -

Tags: ai-safety, dataset, safety-evaluation, red-teaming

E5 / R4 (95%)

RedDebate: Safer Responses through Multi-Agent Red Teaming Debates

Stephen Obadinma, Radin Shayanfar, Ali Asad, Xiaodan Zhu

Year: 2025 · Area: Safety Evaluation · Citations: 3

Tags: empirical, ai-safety, adversarial-robustness, safety-evaluation, red-teaming

E7 / R4 (97%)

RedDiffuser: Red Teaming Vision-Language Models for Toxic Continuation via Reinforced Stable Diffusion

Ruofan Wang, Cong Wang, Xiang Zheng, Xiaosen Wang

Year: 2025 · Area: Multimodal Safety · Citations: -

Tags: empirical, ai-safety, multimodal-safety, red-teaming

E6 / R3 (96%)

RedRFT: A Light-Weight Benchmark for Reinforcement Fine-Tuning-Based Red Teaming

Cong Wang, Xiang Zheng, Wei-Bin Lee, Xingjun Ma

Year: 2025 · Area: Safety Evaluation · Citations: 1

Tags: ai-safety, safety-evaluation, red-teaming, benchmark

E5 / R3 (97%)

Reliable Weak-to-Strong Monitoring of LLM Agents

Chen Bo Calvin Zhang, Paula Rodriguez, Ankit Aich, Kevin Zhu

Year: 2025 · Area: Scalable Oversight · Citations: 4

Tags: scalable-oversight, empirical, ai-safety, red-teaming

E5 / R3 (95%)

Scaling Responsible Generative AI: Automating Red Teaming of LLM Applications

Adison Goh, Benjamin Chee, Matteo Vagnoli, Luca Baldassarre

Year: 2025 · Area: Safety Evaluation · Citations: -

Tags: ai-safety, adversarial-robustness, tool, safety-evaluation, red-teaming

E4 / R2 (94%)

Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning

Ruoxi Jia, Ninareh Mehrabi, Xiao Yu, Si Chen

Year: 2025 · Area: Safety Evaluation · Citations: 9

Tags: empirical, ai-safety, adversarial-robustness, safety-evaluation, red-teaming

E5 / R3 (96%)

The Anatomy of Conversational Scams: A Topic-Based Red Teaming Analysis of Multi-Turn Interactions in LLMs

Haoming Tang, Siying Hu, Xiangzhe Yuan, Zhenhao Zhang

Year: 2025 · Area: Safety Evaluation · Citations: -

Tags: empirical, ai-safety, safety-evaluation, red-teaming

E5 / R3 (93%)

The Automation Advantage in AI Red Teaming

Ads Dawson, Brian Greunke, Rob Mulla, Brad Palm

Year: 2025 · Area: Safety Evaluation · Citations: 2

Tags: empirical, ai-safety, safety-evaluation, red-teaming

E5 / R3 (96%)

UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning

Shuang Yang, Jiawei Zhang, Bo Li

Year: 2025 · Area: Adversarial Robustness · Citations: 9

Tags: empirical, ai-safety, adversarial-robustness, red-teaming

E6 / R4 (96%)

Showing 30 of 60 papers on page 1.