Instant research discovery

Search and browse ingested papers with intelligence signals and fast filtering.

PaperIntel
Compressibility Measures Complexity: Minimum Description Length Meets Singular Learning Theory

Jesse Hoogland, Einar Urdshals, Edmund Lau, Daniel Murfet

Year: 2025 | Area: Training Dynamics | Citations: 3

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (95%)
Data Shapley in One Training Run

Ruoxi Jia, Jiachen T. Wang, Prateek Mittal, Dawn Song

Year: 2025 | Area: Training Dynamics | Citations: 48

Tags: theoretical, ai-safety, training-dynamics

E5 / R3 (95%)
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

Jesse Hoogland, Zach Furman, George Wang, Daniel Murfet

Year: 2025 | Area: Training Dynamics | Citations: 24

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (93%)
Evolution of Concepts in Language Model Pre-Training

Xuyang Ge, Zhengfu He, Yunhua Zhou, Jiaxing Wu

Year: 2025 | Area: Training Dynamics | Citations: 2

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (94%)
Generalization to Political Beliefs from Fine-Tuning on Sports Team Preferences

Owen Terry

Year: 2025 | Area: Training Dynamics | Citations: -

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (93%)
Learning Dynamics of LLM Finetuning

Yi Ren, Danica J. Sutherland

Year: 2025 | Area: Training Dynamics | Citations: 67

Tags: theoretical, alignment-training, ai-safety, training-dynamics

E5 / R3 (94%)
Learning from Negative Examples: Why Warning-Framed Training Data Teaches What It Warns Against

Tsogt-Ochir Enkhbayar

Year: 2025 | Area: Training Dynamics | Citations: -

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (96%)
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model

Wei Hu, Zhiwei Xu, Zhiyu Ni, Yixin Wang

Year: 2025 | Area: Training Dynamics | Citations: 6

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (95%)
The Local Learning Coefficient: A Singularity-Aware Complexity Measure

Zach Furman, Susan Wei, Edmund Lau, George Wang

Year: 2025 | Area: Training Dynamics | Citations: 34

Tags: theoretical, ai-safety, training-dynamics

E5 / R3 (96%)
Using physics-inspired Singular Learning Theory to understand grokking & other phase transitions in modern neural networks

Anish Lakkapragada

Year: 2025 | Area: Training Dynamics | Citations: -

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (95%)
AI Models Collapse When Trained on Recursively Generated Data

Ilia Shumailov, Ross Anderson, Zakhar Shumaylov, Yarin Gal

Year: 2024 | Area: Training Dynamics | Citations: 427

Tags: empirical, ai-safety, training-dynamics

E6 / R3 (97%)
Fine-tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

David Bau, Nikhil Prakash, Tamar Rott Shaham, Tal Haklay

Year: 2024 | Area: Training Dynamics | Citations: 101

Tags: empirical, ai-safety, training-dynamics

E7 / R4 (95%)
Learning and Unlearning of Fabricated Knowledge in Language Models

Mark Sandler, Chen Sun, Nolan Andrew Miller, Max Vladymyrov

Year: 2024 | Area: Training Dynamics | Citations: 5

Tags: empirical, ai-safety, training-dynamics

E4 / R3 (94%)
Loss of Plasticity in Deep Continual Learning

Parash Rahman, J. Fernando Hernandez-Garcia, A. Rupam Mahmood, Shibhansh Dohare

Year: 2024 | Area: Training Dynamics | Citations: 40

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (97%)
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs

Ravid Shwartz-Ziv, Angelica Chen, Matthew L. Leavitt, Kyunghyun Cho

Year: 2024 | Area: Training Dynamics | Citations: 109

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (97%)
The Developmental Landscape of In-Context Learning

Jesse Hoogland, Matthew Farrugia-Roberts, Susan Wei, George Wang

Year: 2024 | Area: Training Dynamics | Citations: 19

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (96%)
Are Emergent Abilities of Large Language Models a Mirage?

Rylan Schaeffer, Brando Miranda, Sanmi Koyejo

Year: 2023 | Area: Training Dynamics | Citations: 591

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (96%)
Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition

Susan Wei, Edmund Lau, Jake Mendel, Zhongtian Chen

Year: 2023 | Area: Training Dynamics | Citations: 22

Tags: theoretical, ai-safety, training-dynamics

E5 / R3 (95%)
Emergent and Predictable Memorization in Large Language Models

Hailey Schoelkopf, Quentin Anthony, Edward Raff, Shivanshu Purohit

Year: 2023 | Area: Training Dynamics | Citations: 173

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (96%)
Explaining Grokking Through Circuit Efficiency

János Kramár, Vikrant Varma, Rohin Shah, Ramana Kumar

Year: 2023 | Area: Training Dynamics | Citations: 80

Tags: empirical, ai-safety, training-dynamics

E6 / R3 (93%)
Progress Measures for Grokking via Mechanistic Interpretability

Lawrence Chan, Tom Lieberum, Jess Smith, Neel Nanda

Year: 2023 | Area: Training Dynamics | Citations: 680

Tags: empirical, ai-safety, training-dynamics, interpretability

E5 / R3 (95%)
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Hailey Schoelkopf, Quentin Anthony, Edward Raff, Shivanshu Purohit

Year: 2023 | Area: Training Dynamics | Citations: 1708

Tags: ai-safety, training-dynamics, tool, interpretability

E5 / R3 (99%)
Studying Large Language Model Generalization with Influence Functions

Alex Tamkin, Nicholas Joseph, Benoit Steiner, Karina Nguyen

Year: 2023 | Area: Training Dynamics | Citations: 286

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (94%)
The Reversal Curse: LLMs trained on 'A is B' fail to learn 'B is A'

Max Kaufmann, Meg Tong, Mikita Balesni, Owain Evans

Year: 2023 | Area: Training Dynamics | Citations: 425

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (96%)
Transformers Learn In-Context by Gradient Descent

Johannes von Oswald, Alexander Mordvintsev, Eyvind Niklasson, Max Vladymyrov

Year: 2023 | Area: Training Dynamics | Citations: 677

Tags: theoretical, ai-safety, training-dynamics

E5 / R3 (95%)
Emergent Abilities of Large Language Models

William Fedus, Tatsunori Hashimoto, Yi Tay, Jeff Dean

Year: 2022 | Area: Training Dynamics | Citations: 3244

Tags: empirical, ai-safety, training-dynamics

E7 / R3 (96%)
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

Harri Edwards, Alethea Power, Igor Babuschkin, Yuri Burda

Year: 2022 | Area: Training Dynamics | Citations: 526

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (95%)
Inverse Scaling Can Become U-Shaped

Yi Tay, Jason Wei, Quoc V. Le, Najoung Kim

Year: 2022 | Area: Training Dynamics | Citations: 79

Tags: empirical, ai-safety, training-dynamics

E5 / R3 (95%)
Quantifying Memorization Across Neural Language Models

Nicholas Carlini, Katherine Lee, Matthew Jagielski, Daphne Ippolito

Year: 2022 | Area: Training Dynamics | Citations: 805

Tags: empirical, ai-safety, training-dynamics

E5 / R4 (96%)
Scaling Laws and Interpretability of Learning from Repeated Data

Tom Conerly, Nicholas Joseph, Dawn Drain, Tom Henighan

Year: 2022 | Area: Training Dynamics | Citations: 148

Tags: empirical, ai-safety, training-dynamics, interpretability

E5 / R3 (94%)

Showing 30 of 33 papers on page 1.