Paper deep dive
Bridging the Black Box: A Survey on Mechanistic Interpretability in AI
Shriyank Somvanshi, Md Monzurul Islam, Amir Rafe, Anannya Ghosh Tusti, Arka Chakraborty, Anika Baitullah, Tausif Islam Chowdhury, Nawaf Alnawmasi, Anandi K. Dutta, Subasish Das
Year: 2025 | Venue: ACM Computing Surveys | Area: Surveys & Reviews | Type: Survey | Embeddings: 0
Models: GPT-2, large language models (general)
Abstract
A comprehensive survey of mechanistic interpretability organized across three abstraction layers (neurons, circuits, algorithms) and three evaluation regimes (behavioral, counterfactual, causal).
Tags
- ai-safety (imported, 100%)
- interpretability (suggested, 80%)
- safety-evaluation (suggested, 80%)
- survey (suggested, 88%)
- surveys-reviews (suggested, 92%)
Links
- Source: https://dl.acm.org/doi/10.1145/3787104
- Canonical: https://dl.acm.org/doi/10.1145/3787104
Intelligence
Status: not_run | Model: - | Prompt: - | Confidence: 0%
Entities (0)
No extracted entities yet.
Relation Signals (0)
No relation signals yet.
Cypher Suggestions (0)
No Cypher suggestions yet.
Full Text
No full-text extraction is stored for this paper yet.