Instant research discovery

Search and browse ingested papers with intelligence signals and fast filtering.

Showing 91-97 of 97 papers (page 4 of 4)

Taxonomy of Risks posed by Language Models

Jonathan Uesato, Courtney Biles, Laura Weidinger, Maribeth Rauh

Published: 2021-12-08 | Area: Surveys & Reviews | Citations: 1366

Tags: surveys-reviews, ai-safety, survey

E7 / R6 (98%)
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

Dylan Hadfield-Menell, Tilman Räuker, Anson Ho, Stephen Casper

Published: 2022-07-27 | Area: Surveys & Reviews | Citations: 174

Tags: surveys-reviews, ai-safety, survey, interpretability

E8 / R4 (94%)
X-Risk Analysis for AI Research

Dan Hendrycks, Mantas Mazeika

Published: 2022-06-13 | Area: Surveys & Reviews | Citations: 81

Tags: surveys-reviews, ai-safety, position

E6 / R4 (96%)
Unsolved Problems in ML Safety

Nicholas Carlini, John Schulman, Dan Hendrycks, Jacob Steinhardt

Published: 2021-09-28 | Area: Surveys & Reviews | Citations: 359

Tags: alignment-training, surveys-reviews, ai-safety, position

E6 / R3 (95%)
A Primer in BERTology: What We Know About How BERT Works

Anna Rogers, Olga Kovaleva, Anna Rumshisky

Published: 2020-02-27 | Area: Surveys & Reviews | Citations: 1772

Tags: surveys-reviews, ai-safety, survey, interpretability

E5 / R4 (94%)
AI Research Considerations for Human Existential Safety (ARCHES)

David Krueger, Andrew Critch

Published: 2020-05-30 | Area: Surveys & Reviews | Citations: 65

Tags: surveys-reviews, ai-safety, position

E5 / R3 (97%)
An Overview of 11 Proposals for Building Safe Advanced AI

Evan Hubinger

Published: 2020-12-04 | Area: Surveys & Reviews | Citations: 27

Tags: alignment-training, surveys-reviews, ai-safety, survey

E7 / R3 (97%)