Instant research discovery
Search and browse ingested papers with intelligence signals (the Intel column) and fast filtering by year, area, and tag.
| Paper | Year | Area | Tags | Intel | Citations |
|---|---|---|---|---|---|
| AM3Safety: Towards Data Efficient Alignment of Multi-modal Multi-turn Safety for MLLMs (Yujin Zhou, Yike Guo, Chengkun Cai, Pengcheng Wen) | 2026 | Multimodal Safety | empirical, alignment-training, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (97%) | - |
| Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs (Wei Wang, Mingyu Yu, Sujuan Qin, Lana Liu) | 2026 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (97%) | - |
| Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment (Qiankun Li, Zhongxiang Sun, Kun Wang, Zhenhong Zhou) | 2026 | Multimodal Safety | empirical, alignment-training, ai-safety, multimodal-safety | E5 / R3 (96%) | - |
| Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility (Mengxuan Wang, Hongjie Jiang, Ming Li, Gang Xu) | 2026 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (95%) | - |
| Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models (Jiaxi Yang, Shicheng Liu, Yuchen Yang, Dongwon Lee) | 2026 | Multimodal Safety | empirical, ai-safety, multimodal-safety | E5 / R3 (96%) | - |
| Text is All You Need for Vision-Language Model Jailbreaking (Youyuan Jiang, Tianle Zheng, Yihang Chen, Cho-Jui Hsieh) | 2026 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E6 / R3 (96%) | - |
| Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models (Hanxun Huang, Yutao Wu, Yige Li, Kaiyuan Cui) | 2026 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (95%) | - |
| Towards Interpretable Hallucination Analysis and Mitigation in LVLMs via Contrastive Neuron Steering (Chenghao Xu, Qi Liu, Jiexi Yan, Fen Fang) | 2026 | Multimodal Safety | empirical, ai-safety, multimodal-safety | E5 / R3 (95%) | 1 |
| When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life (Xiangyu Shi, Fengran Mo, Su Yao, Youwei Liao) | 2026 | Multimodal Safety | alignment-training, ai-safety, multimodal-safety, benchmark | E4 / R2 (94%) | - |
| When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models (Wai Kin Victor Chan, Fangming Liu, Haochen Han, Yining Sun) | 2026 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (95%) | 1 |
| A Single Neuron Works: Precise Concept Erasure in Text-to-Image Diffusion Models (Qinqin He, Jiaqi Weng, Hui Xue, Jialing Tao) | 2025 | Multimodal Safety | empirical, ai-safety, multimodal-safety | E5 / R3 (96%) | 3 |
| A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations (Nenghai Yu, Wenke Huang, Dacheng Tao, Xuankun Rong) | 2025 | Multimodal Safety | ai-safety, survey, multimodal-safety, safety-evaluation | E5 / R3 (95%) | 47 |
| Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models (Gaojie Jin, Xiaowei Huang, Wei Huang, Sihao Wu) | 2025 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (96%) | 1 |
| AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization (Chaohu Liu, Linli Xu, Tianyi Gui, Yu Liu) | 2025 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E6 / R4 (94%) | 3 |
| Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment (Yihao Huang, Simeng Qin, Tianyu Pang, Xiaojun Jia) | 2025 | Multimodal Safety | empirical, alignment-training, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (94%) | 20 |
| Adversarial Attacks in Multimodal Systems: A Practitioner's Survey (Shashank Kapoor, Ankit Shetgaonkar, Lakshit Arora, Dipen Pradhan) | 2025 | Multimodal Safety | ai-safety, adversarial-robustness, survey, multimodal-safety | E6 / R3 (94%) | 2 |
| Adversary-Aware DPO: Enhancing Safety Alignment in Vision Language Models via Adversarial Training (Minlie Huang, Fenghua Weng, Jun Feng, Jian Lou) | 2025 | Multimodal Safety | empirical, alignment-training, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (96%) | 6 |
| Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models (Usman Naseem, Mark Dras, Juan Ren) | 2025 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E6 / R3 (97%) | 1 |
| Attack as Defense: Safeguarding Large Vision-Language Models from Jailbreaking by Adversarial Attacks (Chongxin Li, Yuchun Fang, Hanzhang Wang) | 2025 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (97%) | 2 |
| Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World (Rajiv Mathews, Soheil Feizi, Lun Wang, Vinu Sankar Sadasivan) | 2025 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (97%) | 5 |
| Attention! Your Vision Language Model Could Be Maliciously Manipulated (Shaokang Wang, Shudong Zhang, Zhijin Ge, Xiaosen Wang) | 2025 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E7 / R3 (97%) | 3 |
| AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts (Lin Wang, Weiping Wang, Xiaojun Jia, Wanqian Zhang) | 2025 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (96%) | 1 |
| Backdoor Cleaning without External Guidance in MLLM Fine-tuning (Wenke Huang, Jinhe Bi, Xuankun Rong, Jian Liang) | 2025 | Multimodal Safety | empirical, ai-safety, multimodal-safety | E5 / R3 (96%) | 14 |
| Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images (Aditya Kumar, Franziska Boenisch, Tom Blanchard, Adam Dziedzic) | 2025 | Multimodal Safety | ai-safety, multimodal-safety, benchmark | E5 / R3 (94%) | - |
| Benign-to-Toxic Jailbreaking: Inducing Harmful Responses from Harmless Prompts (Hee-Seon Kim, Changick Kim, Wonjun Lee, Kihyun Kim) | 2025 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (94%) | - |
| Beyond the Safety Tax: Mitigating Unsafe Text-to-Image Generation via External Safety Rectification (Xiangtao Meng, Zheng Li, Yingkai Dong, Ning Yu) | 2025 | Multimodal Safety | empirical, ai-safety, multimodal-safety | E5 / R3 (93%) | - |
| Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap (Spencer Stice, Wenhan Yang, Baharan Mirzasoleiman, Ali Payani) | 2025 | Multimodal Safety | empirical, ai-safety, multimodal-safety | E6 / R3 (96%) | 1 |
| Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities (Michael Backes, Yiting Qu, Yang Zhang) | 2025 | Multimodal Safety | empirical, alignment-training, ai-safety, multimodal-safety | E5 / R3 (98%) | 1 |
| CeTAD: Towards Certified Toxicity-Aware Distance in Vision Language Models (Jinwei Hu, Wenjie Ruan, Yi Dong, Jiaxu Liu) | 2025 | Multimodal Safety | theoretical, ai-safety, adversarial-robustness, multimodal-safety | E5 / R4 (96%) | - |
| Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities (Jiahui Geng, Preslav Nakov, Thy Thy Tran, Iryna Gurevych) | 2025 | Multimodal Safety | empirical, ai-safety, adversarial-robustness, multimodal-safety | E5 / R3 (97%) | 2 |
Showing 30 of 276 papers on page 1.