PaperIntel: instant research discovery

Search and browse ingested papers with intelligence signals and fast filtering.

BLOCK-EM: Preventing Emergent Misalignment by Blocking Causal Features

Guannan Qu, Muhammed Ustaomeroglu

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, alignment-training, ai-safety, model-editing

E4 / R2 (95%)

Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data

Dan Alistarh, Eugenia Iofinova

Year: 2026 · Area: Model Editing · Citations: -

Tags: ai-safety, model-editing, benchmark

E5 / R4 (96%)

C-ΔΘ: Circuit-Restricted Weight Arithmetic for Selective Refusal

Aditya Kasliwal, Vinay Kumar Sankarapu, Pratinav Seth

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E5 / R3 (96%)

CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference Alignment

Yisheng Zhong, Zhengbang Yang, Zhuangdi Zhu, Junyuan Hong

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, alignment-training, ai-safety, model-editing

E6 / R4 (93%)

Controllable Value Alignment in Large Language Models through Neuron-Level Editing

Richang Hong, Weibiao Huang, Tat-Seng Chua, Le Wu

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, alignment-training, ai-safety, model-editing

E5 / R3 (94%)

DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher

Yisheng Zhong, Zhengbang Yang, Zhuangdi Zhu

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E6 / R3 (93%)

EvoMU: Evolutionary Machine Unlearning

Paul Swoboda, Pawel Batorski

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E6 / R5 (98%)

FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning

Haibo Hu, Kun Fang, Cheng Hong, Qingqing Ye

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E6 / R3 (94%)

Fine-Grained Activation Steering: Steering Less, Achieving More

Kezhi Mao, Junlang Qian, Lee Onn Mak, Jia Jim Deryl Chua

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E5 / R3 (93%)

From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning

Haibo Hu, Zhibiao Guo, Qingqing Ye, Zi Liang

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, adversarial-robustness, model-editing

E7 / R4 (95%)

Hierarchical Orthogonal Residual Spread for Precise Massive Editing in Large Language Models

Andi Zhang, Yuheng Yang, Jingxin Han, Guangxu Chen

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E8 / R4 (96%)

Inference-time Unlearning Using Conformal Prediction

Aranyak Mehta, Gokhan Mergen, Amr Ahmed, Avinava Dubey

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E4 / R3 (96%)

Interpreting and Controlling Model Behavior via Constitutions for Atomic Concept Edits

Prasoon Bajpai, Been Kim, Zi Wang, Wenjun Zeng

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, interpretability, model-editing

E5 / R3 (91%)

Is Gradient Ascent Really Necessary? Memorize to Forget for Machine Unlearning

Qizhou Wang, Zhuo Huang, Bo Han, Tongliang Liu

Year: 2026 · Area: Model Editing · Citations: 4

Tags: empirical, ai-safety, model-editing

E5 / R3 (95%)

Nexus scissor: enhance open-access language model safety by connection pruning

Peihua Mai, Yan Pang, Youjia Yang, Ran Yan

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, adversarial-robustness, model-editing

E5 / R3 (95%)

On the Robustness of Knowledge Editing for Detoxification

Guanyi Chen, Ziyan Peng, Ming Dong, Tingting He

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E5 / R3 (93%)

Per-parameter Task Arithmetic for Unlearning in Large Language Models

Jiangchao Yao, Bo Han, Jun Zhou, Jianzhong Qi

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E7 / R4 (96%)

Reverse-Engineering Model Editing on Language Models

Zhiyu Sun, Yu Wang, Minrui Luo, Zhili Chen

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E5 / R3 (94%)

SafeNeuron: Neuron-Level Safety Alignment for Large Language Models

Weixiang Zhao, Zhaoxin Wang, Tat-Seng Chua, Jiayi Ji

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, alignment-training, ai-safety, adversarial-robustness, model-editing

E5 / R4 (95%)

Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance

Ruoxi Jia, Jian Liu, Xiaohu Yang, Jiawen Zhang

Year: 2026 · Area: Model Editing · Citations: 1

Tags: empirical, alignment-training, ai-safety, model-editing

E5 / R3 (97%)

Sparsity-Aware Unlearning for Large Language Models

Ke Xu, Yujia Tong, Jiawei Jiang, Chuang Hu

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E6 / R4 (96%)

Steering Vector Fields for Context-Aware Inference-Time Control in Large Language Models

Jiaqian Li, Kuan-Hao Huang, Yanshu Li

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E5 / R3 (96%)

Surgical Refusal Ablation: Disentangling Safety from Intelligence via Concept-Guided Spectral Cleaning

Tony Cristofano

Year: 2026 · Area: Model Editing · Citations: 1

Tags: empirical, ai-safety, model-editing

E7 / R3 (95%)

UnHype: CLIP-Guided Hypernetworks for Dynamic LoRA Unlearning

Przemysław Spurek, Wojciech Gromski, Maciej Zieba, Maksym Petrenko

Year: 2026 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E6 / R4 (98%)

A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models

Herbert Woisetschläger, Jiahui Geng, Qing Li, Zongxiong Chen

Year: 2025 · Area: Model Editing · Citations: 23

Tags: ai-safety, survey, safety-evaluation, model-editing

E6 / R4 (97%)

A Few Bad Neurons: Isolating and Surgically Correcting Sycophancy

Dristi Roy, Ryan Lagasse, Sean O'Brien, Ashwinee Panda

Year: 2025 · Area: Model Editing · Citations: -

Tags: empirical, ai-safety, model-editing

E5 / R3 (97%)

A General Framework to Enhance Fine-tuning-based LLM Unlearning

Xianfeng Tang, Jingying Zeng, Zhen Li, Yue Xing

Year: 2025 · Area: Model Editing · Citations: 8

Tags: empirical, ai-safety, model-editing

E5 / R3 (95%)

A Neuro-inspired Interpretation of Unlearning in Large Language Models through Sample-level Unlearning Difficulty

Chengye Wang, Yuyuan Li, Li Zhang, Xiaohua Feng

Year: 2025 · Area: Model Editing · Citations: 1

Tags: empirical, ai-safety, model-editing

E5 / R3 (94%)

A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMs

Shaona Ghosh, Amrita Bhattacharjee, Christopher Parisien, Yftah Ziser

Year: 2025 · Area: Model Editing · Citations: 2

Tags: empirical, ai-safety, model-editing

E6 / R3 (95%)

A Unified Framework for Diffusion Model Unlearning with f-Divergence

Luigi Cinque, Deniz Gündüz, Federico Fontana, Nicola Novello

Year: 2025 · Area: Model Editing · Citations: -

Tags: theoretical, ai-safety, model-editing

E5 / R3 (96%)

Showing 30 of 468 papers on page 1.