Paper deep dive
SAELens: A Library for Training and Analyzing Sparse Autoencoders
Joseph Bloom, Curt Tigges, Anthony Duong, David Chanin
Models: Any Hugging Face Transformers model, Any TransformerLens-supported model
Intelligence
Summary
SAELens is an open-source library designed for training and analyzing Sparse Autoencoders (SAEs) to advance mechanistic interpretability in neural networks, supporting various frameworks like PyTorch, Hugging Face, and TransformerLens.
Entities (5)
Relation Signals (3)
Joseph Bloom → maintains → SAELens
confidence 100% · This library is maintained by Joseph Bloom, Curt Tigges, Anthony Duong and David Chanin.
SAELens → supports → Sparse Autoencoder
confidence 100% · SAELens exists to help researchers: Train sparse autoencoders.
SAE-Vis → integrates with → SAELens
confidence 90% · SAE-Vis: A library for visualizing SAE features, works with SAELens.
Cypher Suggestions (2)
Identify maintainers of SAELens · confidence 100% · unvalidated
MATCH (p:Person)-[:MAINTAINS]->(s:SoftwareLibrary {name: 'SAELens'}) RETURN p.name
Find all software libraries related to Sparse Autoencoders · confidence 90% · unvalidated
MATCH (s:SoftwareLibrary)-[:SUPPORTS|INTEGRATES_WITH]->(t:Technique {name: 'Sparse Autoencoder'}) RETURN s
Abstract
Training sparse autoencoders on language models; developed at decoderesearch/SAELens on GitHub.
Full Text
SAE Lens

SAELens exists to help researchers:
- Train sparse autoencoders.
- Analyse sparse autoencoders / research mechanistic interpretability.
- Generate insights which make it easier to create safe and aligned AI systems.

SAELens inference works with any PyTorch-based model, not just TransformerLens. While we provide deep integration with TransformerLens via HookedSAETransformer, SAEs can be used with Hugging Face Transformers, NNsight, or any other framework by extracting activations and passing them to the SAE's encode() and decode() methods.

Please refer to the documentation for information on how to:
- Download and analyse pre-trained sparse autoencoders.
- Train your own sparse autoencoders.
- Generate feature dashboards with the SAE-Vis library.

SAE Lens is the result of many contributors working collectively to improve humanity's understanding of neural networks, many of whom are motivated by a desire to safeguard humanity from risks posed by artificial intelligence. This library is maintained by Joseph Bloom, Curt Tigges, Anthony Duong and David Chanin.

Loading Pre-trained SAEs

Pre-trained SAEs for various models can be imported via SAE Lens. See this page for a list of all SAEs.

Migrating to SAELens v6

The v6 update is a major refactor of SAELens and changes the way training code is structured. Check out the migration guide for more details.

Tutorials
- SAE Lens + Neuronpedia
- Loading and Analysing Pre-Trained Sparse Autoencoders
- Understanding SAE Features with the Logit Lens
- Training a Sparse Autoencoder
- Training SAEs on Synthetic Data
- SynthSAEBench: Evaluating SAE Architectures on Synthetic Data

Join the Slack!

Feel free to join the Open Source Mechanistic Interpretability Slack for support!

Other SAE Projects
- dictionary-learning: An SAE training library that focuses on having hackable code.
- Sparsify: A lean SAE training library focused on TopK SAEs.
- Overcomplete: An SAE training library focused on vision models.
- SAE-Vis: A library for visualizing SAE features; works with SAELens.
- SAEBench: A suite of LLM SAE benchmarks; works with SAELens.

Citation

Please cite the package as follows:

@misc{bloom2024saetrainingcodebase,
  title = {SAELens},
  author = {Bloom, Joseph and Tigges, Curt and Duong, Anthony and Chanin, David},
  year = {2024},
  howpublished = {\url{https://github.com/decoderesearch/SAELens}},
}
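The extract describes the framework-agnostic workflow: pull activations out of any model, pass them through the SAE's encode() method to get sparse feature activations, and through decode() to reconstruct the original activations. The sketch below illustrates that encode/decode pattern with a minimal toy autoencoder in NumPy; the TinySAE class, its dimensions, and its ReLU encoder are illustrative assumptions, not the actual SAELens implementation.

```python
import numpy as np

class TinySAE:
    """Minimal sparse-autoencoder sketch (illustrative, NOT the SAELens class):
    encode() lifts model activations into an overcomplete feature space;
    decode() maps features back to a reconstruction of the activations."""

    def __init__(self, d_model, d_sae, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.standard_normal((d_model, d_sae)) * 0.01
        self.b_enc = np.zeros(d_sae)
        self.W_dec = rng.standard_normal((d_sae, d_model)) * 0.01
        self.b_dec = np.zeros(d_model)

    def encode(self, acts):
        # ReLU keeps only positively-firing features (non-negative codes)
        return np.maximum((acts - self.b_dec) @ self.W_enc + self.b_enc, 0.0)

    def decode(self, feats):
        return feats @ self.W_dec + self.b_dec

# Activations could come from any framework (HF Transformers, NNsight, ...)
acts = np.random.default_rng(1).standard_normal((4, 64))
sae = TinySAE(d_model=64, d_sae=64 * 16)
feats = sae.encode(acts)   # feature activations, shape (4, 1024)
recon = sae.decode(feats)  # reconstructed activations, shape (4, 64)
```

In SAELens itself the same two calls apply to activations extracted from a real model, which is why inference is not tied to TransformerLens.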