Bridging the Black Box: A Survey on Mechanistic Interpretability in AI

Shriyank Somvanshi, Md Monzurul Islam, Amir Rafe, Anannya Ghosh Tusti, Arka Chakraborty, Anika Baitullah, Tausif Islam Chowdhury, Nawaf Alnawmasi, Anandi K. Dutta, Subasish Das

Year: 2025 | Venue: ACM Computing Surveys | Area: Surveys & Reviews | Type: Survey | Embeddings: 0

Models: GPT-2, large language models (general)

Abstract

A comprehensive survey of mechanistic interpretability organized across three abstraction layers (neurons, circuits, algorithms) and three evaluation regimes (behavioral, counterfactual, causal).

Tags

ai-safety (imported, 100%) | interpretability (suggested, 80%) | safety-evaluation (suggested, 80%) | survey (suggested, 88%) | surveys-reviews (suggested, 92%)
