Paper deep dive
Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level
Neel Nanda, Senthooran Rajamanoharan, János Kramár, Rohin Shah
Year: 2023 | Venue: Alignment Forum | Area: Mechanistic Interp. | Type: Empirical | Embeddings: 0
Models: Pythia-2.8B
Abstract
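The authors investigate how the early MLP layers of Pythia 2.8B implement factual recall, using the sports played by 1,500 athletes as the test case. They find that these MLPs build "multi-token embeddings" of the athletes' names, but they do not manage to fully reverse-engineer the underlying computation, which is carried out in superposition.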
Tags
ai-safety (imported, 100%), empirical (suggested, 88%), mechanistic-interp (suggested, 92%)