PaperIntel

Instant research discovery

Search and browse ingested papers with intelligence signals and fast filtering.
AM3Safety: Towards Data Efficient Alignment of Multi-modal Multi-turn Safety for MLLMs

Yujin Zhou, Yike Guo, Chengkun Cai, Pengcheng Wen

Year: 2026 · Area: Multimodal Safety · Citations: -

Tags: empirical, alignment-training, ai-safety, adversarial-robustness, multimodal-safety

E5 / R3 (97%)
Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs

Wei Wang, Mingyu Yu, Sujuan Qin, Lana Liu

Year: 2026 · Area: Multimodal Safety · Citations: -

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E5 / R3 (97%)
Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment

Qiankun Li, Zhongxiang Sun, Kun Wang, Zhenhong Zhou

Year: 2026 · Area: Multimodal Safety · Citations: -

Tags: empirical, alignment-training, ai-safety, multimodal-safety

E5 / R3 (96%)
Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility

Mengxuan Wang, Hongjie Jiang, Ming Li, Gang Xu

Year: 2026 · Area: Multimodal Safety · Citations: -

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E5 / R3 (95%)
Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models

Jiaxi Yang, Shicheng Liu, Yuchen Yang, Dongwon Lee

Year: 2026 · Area: Multimodal Safety · Citations: -

Tags: empirical, ai-safety, multimodal-safety

E5 / R3 (96%)
Text is All You Need for Vision-Language Model Jailbreaking

Youyuan Jiang, Tianle Zheng, Yihang Chen, Cho-Jui Hsieh

Year: 2026 · Area: Multimodal Safety · Citations: -

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E6 / R3 (96%)
Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models

Hanxun Huang, Yutao Wu, Yige Li, Kaiyuan Cui

Year: 2026 · Area: Multimodal Safety · Citations: -

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E5 / R3 (95%)
Towards Interpretable Hallucination Analysis and Mitigation in LVLMs via Contrastive Neuron Steering

Chenghao Xu, Qi Liu, Jiexi Yan, Fen Fang

Year: 2026 · Area: Multimodal Safety · Citations: 1

Tags: empirical, ai-safety, multimodal-safety

E5 / R3 (95%)
When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life

Xiangyu Shi, Fengran Mo, Su Yao, Youwei Liao

Year: 2026 · Area: Multimodal Safety · Citations: -

Tags: alignment-training, ai-safety, multimodal-safety, benchmark

E4 / R2 (94%)
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models

Wai Kin Victor Chan, Fangming Liu, Haochen Han, Yining Sun

Year: 2026 · Area: Multimodal Safety · Citations: 1

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E5 / R3 (95%)
A Single Neuron Works: Precise Concept Erasure in Text-to-Image Diffusion Models

Qinqin He, Jiaqi Weng, Hui Xue, Jialing Tao

Year: 2025 · Area: Multimodal Safety · Citations: 3

Tags: empirical, ai-safety, multimodal-safety

E5 / R3 (96%)
A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations

Nenghai Yu, Wenke Huang, Dacheng Tao, Xuankun Rong

Year: 2025 · Area: Multimodal Safety · Citations: 47

Tags: ai-safety, survey, multimodal-safety, safety-evaluation

E5 / R3 (95%)
Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models

Gaojie Jin, Xiaowei Huang, Wei Huang, Sihao Wu

Year: 2025 · Area: Multimodal Safety · Citations: 1

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E5 / R3 (96%)
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization

Chaohu Liu, Linli Xu, Tianyi Gui, Yu Liu

Year: 2025 · Area: Multimodal Safety · Citations: 3

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E6 / R4 (94%)
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Yihao Huang, Simeng Qin, Tianyu Pang, Xiaojun Jia

Year: 2025 · Area: Multimodal Safety · Citations: 20

Tags: empirical, alignment-training, ai-safety, adversarial-robustness, multimodal-safety

E5 / R3 (94%)
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey

Shashank Kapoor, Ankit Shetgaonkar, Lakshit Arora, Dipen Pradhan

Year: 2025 · Area: Multimodal Safety · Citations: 2

Tags: ai-safety, adversarial-robustness, survey, multimodal-safety

E6 / R3 (94%)
Adversary-Aware DPO: Enhancing Safety Alignment in Vision Language Models via Adversarial Training

Minlie Huang, Fenghua Weng, Jun Feng, Jian Lou

Year: 2025 · Area: Multimodal Safety · Citations: 6

Tags: empirical, alignment-training, ai-safety, adversarial-robustness, multimodal-safety

E5 / R3 (96%)
Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models

Usman Naseem, Mark Dras, Juan Ren

Year: 2025 · Area: Multimodal Safety · Citations: 1

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E6 / R3 (97%)
Attack as Defense: Safeguarding Large Vision-Language Models from Jailbreaking by Adversarial Attacks

Chongxin Li, Yuchun Fang, Hanzhang Wang

Year: 2025 · Area: Multimodal Safety · Citations: 2

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E5 / R3 (97%)
Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World

Rajiv Mathews, Soheil Feizi, Lun Wang, Vinu Sankar Sadasivan

Year: 2025 · Area: Multimodal Safety · Citations: 5

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E5 / R3 (97%)
Attention! Your Vision Language Model Could Be Maliciously Manipulated

Shaokang Wang, Shudong Zhang, Zhijin Ge, Xiaosen Wang

Year: 2025 · Area: Multimodal Safety · Citations: 3

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E7 / R3 (97%)
AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts

Lin Wang, Weiping Wang, Xiaojun Jia, Wanqian Zhang

Year: 2025 · Area: Multimodal Safety · Citations: 1

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E5 / R3 (96%)
Backdoor Cleaning without External Guidance in MLLM Fine-tuning

Wenke Huang, Jinhe Bi, Xuankun Rong, Jian Liang

Year: 2025 · Area: Multimodal Safety · Citations: 14

Tags: empirical, ai-safety, multimodal-safety

E5 / R3 (96%)
Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images

Aditya Kumar, Franziska Boenisch, Tom Blanchard, Adam Dziedzic

Year: 2025 · Area: Multimodal Safety · Citations: -

Tags: ai-safety, multimodal-safety, benchmark

E5 / R3 (94%)
Benign-to-Toxic Jailbreaking: Inducing Harmful Responses from Harmless Prompts

Hee-Seon Kim, Changick Kim, Wonjun Lee, Kihyun Kim

Year: 2025 · Area: Multimodal Safety · Citations: -

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E5 / R3 (94%)
Beyond the Safety Tax: Mitigating Unsafe Text-to-Image Generation via External Safety Rectification

Xiangtao Meng, Zheng Li, Yingkai Dong, Ning Yu

Year: 2025 · Area: Multimodal Safety · Citations: -

Tags: empirical, ai-safety, multimodal-safety

E5 / R3 (93%)
Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap

Spencer Stice, Wenhan Yang, Baharan Mirzasoleiman, Ali Payani

Year: 2025 · Area: Multimodal Safety · Citations: 1

Tags: empirical, ai-safety, multimodal-safety

E6 / R3 (96%)
Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities

Michael Backes, Yiting Qu, Yang Zhang

Year: 2025 · Area: Multimodal Safety · Citations: 1

Tags: empirical, alignment-training, ai-safety, multimodal-safety

E5 / R3 (98%)
CeTAD: Towards Certified Toxicity-Aware Distance in Vision Language Models

Jinwei Hu, Wenjie Ruan, Yi Dong, Jiaxu Liu

Year: 2025 · Area: Multimodal Safety · Citations: -

Tags: theoretical, ai-safety, adversarial-robustness, multimodal-safety

E5 / R4 (96%)
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities

Jiahui Geng, Preslav Nakov, Thy Thy Tran, Iryna Gurevych

Year: 2025 · Area: Multimodal Safety · Citations: 2

Tags: empirical, ai-safety, adversarial-robustness, multimodal-safety

E5 / R3 (97%)

Showing 30 of 276 papers (page 1).