Instant research discovery
Search and browse ingested papers with intelligence signals and fast filtering.
| Paper | Year | Area | Tags | Intel | Citations |
|---|---|---|---|---|---|
| BLOCK-EM: Preventing Emergent Misalignment by Blocking Causal Features (Guannan Qu, Muhammed Ustaomeroglu) | 2026 | Model Editing | empirical, alignment-training, ai-safety, model-editing | E4 / R2 (95%) | - |
| Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data (Dan Alistarh, Eugenia Iofinova) | 2026 | Model Editing | ai-safety, model-editing, benchmark | E5 / R4 (96%) | - |
| C-ΔΘ: Circuit-Restricted Weight Arithmetic for Selective Refusal (Aditya Kasliwal, Vinay Kumar Sankarapu, Pratinav Seth) | 2026 | Model Editing | empirical, ai-safety, model-editing | E5 / R3 (96%) | - |
| CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference Alignment (Yisheng Zhong, Zhengbang Yang, Zhuangdi Zhu, Junyuan Hong) | 2026 | Model Editing | empirical, alignment-training, ai-safety, model-editing | E6 / R4 (93%) | - |
| Controllable Value Alignment in Large Language Models through Neuron-Level Editing (Richang Hong, Weibiao Huang, Tat-Seng Chua, Le Wu) | 2026 | Model Editing | empirical, alignment-training, ai-safety, model-editing | E5 / R3 (94%) | - |
| DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher (Yisheng Zhong, Zhengbang Yang, Zhuangdi Zhu) | 2026 | Model Editing | empirical, ai-safety, model-editing | E6 / R3 (93%) | - |
| EvoMU: Evolutionary Machine Unlearning (Paul Swoboda, Pawel Batorski) | 2026 | Model Editing | empirical, ai-safety, model-editing | E6 / R5 (98%) | - |
| FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning (Haibo Hu, Kun Fang, Cheng Hong, Qingqing Ye) | 2026 | Model Editing | empirical, ai-safety, model-editing | E6 / R3 (94%) | - |
| Fine-Grained Activation Steering: Steering Less, Achieving More (Kezhi Mao, Junlang Qian, Lee Onn Mak, Jia Jim Deryl Chua) | 2026 | Model Editing | empirical, ai-safety, model-editing | E5 / R3 (93%) | - |
| From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning (Haibo Hu, Zhibiao Guo, Qingqing Ye, Zi Liang) | 2026 | Model Editing | empirical, ai-safety, adversarial-robustness, model-editing | E7 / R4 (95%) | - |
| Hierarchical Orthogonal Residual Spread for Precise Massive Editing in Large Language Models (Andi Zhang, Yuheng Yang, Jingxin Han, Guangxu Chen) | 2026 | Model Editing | empirical, ai-safety, model-editing | E8 / R4 (96%) | - |
| Inference-time Unlearning Using Conformal Prediction (Aranyak Mehta, Gokhan Mergen, Amr Ahmed, Avinava Dubey) | 2026 | Model Editing | empirical, ai-safety, model-editing | E4 / R3 (96%) | - |
| Interpreting and Controlling Model Behavior via Constitutions for Atomic Concept Edits (Prasoon Bajpai, Been Kim, Zi Wang, Wenjun Zeng) | 2026 | Model Editing | empirical, ai-safety, interpretability, model-editing | E5 / R3 (91%) | - |
| Is Gradient Ascent Really Necessary? Memorize to Forget for Machine Unlearning (Qizhou Wang, Zhuo Huang, Bo Han, Tongliang Liu) | 2026 | Model Editing | empirical, ai-safety, model-editing | E5 / R3 (95%) | 4 |
| Nexus scissor: enhance open-access language model safety by connection pruning (Peihua Mai, Yan Pang, Youjia Yang, Ran Yan) | 2026 | Model Editing | empirical, ai-safety, adversarial-robustness, model-editing | E5 / R3 (95%) | - |
| On the Robustness of Knowledge Editing for Detoxification (Guanyi Chen, Ziyan Peng, Ming Dong, Tingting He) | 2026 | Model Editing | empirical, ai-safety, model-editing | E5 / R3 (93%) | - |
| Per-parameter Task Arithmetic for Unlearning in Large Language Models (Jiangchao Yao, Bo Han, Jun Zhou, Jianzhong Qi) | 2026 | Model Editing | empirical, ai-safety, model-editing | E7 / R4 (96%) | - |
| Reverse-Engineering Model Editing on Language Models (Zhiyu Sun, Yu Wang, Minrui Luo, Zhili Chen) | 2026 | Model Editing | empirical, ai-safety, model-editing | E5 / R3 (94%) | - |
| SafeNeuron: Neuron-Level Safety Alignment for Large Language Models (Weixiang Zhao, Zhaoxin Wang, Tat-Seng Chua, Jiayi Ji) | 2026 | Model Editing | empirical, alignment-training, ai-safety, adversarial-robustness, model-editing | E5 / R4 (95%) | - |
| Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance (Ruoxi Jia, Jian Liu, Xiaohu Yang, Jiawen Zhang) | 2026 | Model Editing | empirical, alignment-training, ai-safety, model-editing | E5 / R3 (97%) | 1 |
| Sparsity-Aware Unlearning for Large Language Models (Ke Xu, Yujia Tong, Jiawei Jiang, Chuang Hu) | 2026 | Model Editing | empirical, ai-safety, model-editing | E6 / R4 (96%) | - |
| Steering Vector Fields for Context-Aware Inference-Time Control in Large Language Models (Jiaqian Li, Kuan-Hao Huang, Yanshu Li) | 2026 | Model Editing | empirical, ai-safety, model-editing | E5 / R3 (96%) | - |
| Surgical Refusal Ablation: Disentangling Safety from Intelligence via Concept-Guided Spectral Cleaning (Tony Cristofano) | 2026 | Model Editing | empirical, ai-safety, model-editing | E7 / R3 (95%) | 1 |
| UnHype: CLIP-Guided Hypernetworks for Dynamic LoRA Unlearning (Przemysław Spurek, Wojciech Gromski, Maciej Zieba, Maksym Petrenko) | 2026 | Model Editing | empirical, ai-safety, model-editing | E6 / R4 (98%) | - |
| A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models (Herbert Woisetschläger, Jiahui Geng, Qing Li, Zongxiong Chen) | 2025 | Model Editing | ai-safety, survey, safety-evaluation, model-editing | E6 / R4 (97%) | 23 |
| A Few Bad Neurons: Isolating and Surgically Correcting Sycophancy (Dristi Roy, Ryan Lagasse, Sean O'Brien, Ashwinee Panda) | 2025 | Model Editing | empirical, ai-safety, model-editing | E5 / R3 (97%) | - |
| A General Framework to Enhance Fine-tuning-based LLM Unlearning (Xianfeng Tang, Jingying Zeng, Zhen Li, Yue Xing) | 2025 | Model Editing | empirical, ai-safety, model-editing | E5 / R3 (95%) | 8 |
| A Neuro-inspired Interpretation of Unlearning in Large Language Models through Sample-level Unlearning Difficulty (Chengye Wang, Yuyuan Li, Li Zhang, Xiaohua Feng) | 2025 | Model Editing | empirical, ai-safety, model-editing | E5 / R3 (94%) | 1 |
| A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMs (Shaona Ghosh, Amrita Bhattacharjee, Christopher Parisien, Yftah Ziser) | 2025 | Model Editing | empirical, ai-safety, model-editing | E6 / R3 (95%) | 2 |
| A Unified Framework for Diffusion Model Unlearning with f-Divergence (Luigi Cinque, Deniz Gündüz, Federico Fontana, Nicola Novello) | 2025 | Model Editing | theoretical, ai-safety, model-editing | E5 / R3 (96%) | - |
Showing 30 of 468 papers on page 1.