Instant research discovery
Search and browse ingested papers with intelligence signals and fast filtering.
Showing 91-97 of 97 papers (page 4 of 4)
| Paper | Authors | Published | Area | Tags | Intel | Citations |
|---|---|---|---|---|---|---|
| Taxonomy of Risks posed by Language Models | Jonathan Uesato, Courtney Biles, Laura Weidinger, Maribeth Rauh | 2021-12-08 | Surveys & Reviews | surveys-reviews, ai-safety, survey | E7 / R6 (98%) | 1366 |
| Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks | Dylan Hadfield-Menell, Tilman Räuker, Anson Ho, Stephen Casper | 2022-07-27 | Surveys & Reviews | surveys-reviews, ai-safety, survey, interpretability | E8 / R4 (94%) | 174 |
| X-Risk Analysis for AI Research | Dan Hendrycks, Mantas Mazeika | 2022-06-13 | Surveys & Reviews | surveys-reviews, ai-safety, position | E6 / R4 (96%) | 81 |
| Unsolved Problems in ML Safety | Nicholas Carlini, John Schulman, Dan Hendrycks, Jacob Steinhardt | 2021-09-28 | Surveys & Reviews | alignment-training, surveys-reviews, ai-safety, position | E6 / R3 (95%) | 359 |
| A Primer in BERTology: What We Know About How BERT Works | Anna Rogers, Olga Kovaleva, Anna Rumshisky | 2020-02-27 | Surveys & Reviews | surveys-reviews, ai-safety, survey, interpretability | E5 / R4 (94%) | 1772 |
| AI Research Considerations for Human Existential Safety (ARCHES) | David Krueger, Andrew Critch | 2020-05-30 | Surveys & Reviews | surveys-reviews, ai-safety, position | E5 / R3 (97%) | 65 |
| An Overview of 11 Proposals for Building Safe Advanced AI | Evan Hubinger | 2020-12-04 | Surveys & Reviews | alignment-training, surveys-reviews, ai-safety, survey | E7 / R3 (97%) | 27 |