Paper deep dive

A Pragmatic Vision for Interpretability

Neel Nanda, Josh Engels, Arthur Conmy, Senthooran Rajamanoharan, Bilal Chughtai, Callum McDougall, Janos Kramar, Lewis Smith

Year: 2025 | Venue: AI Alignment Forum | Area: Mechanistic Interp. | Type: Position | Embeddings: 0

Abstract

Google DeepMind's mechanistic interpretability team describes its pivot from ambitious reverse-engineering to pragmatic interpretability: working on problems on the critical path to AGI safety, using proxy tasks, and trying simple methods first.
