Instant research discovery

Search and browse ingested papers with intelligence signals and fast filtering.

Showing 61-90 of 97 papers (page 3 of 4)

Paper | Intel
Open-Ethical AI: Advancements in Open-Source Human-Centric Neural Language Models

Sabrina Sicari, Alberto Coen-Porisini, Alessandra Rizzardi, Jesus F. Cevallos M.

Published: - | Area: Surveys & Reviews | Citations: 7

Tags: alignment-training, surveys-reviews, ai-safety, survey

-
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems

Ke Xu, Xinyu Kong, Zhixing Tan, Ziyi Qiu

Published: 2024-01-11 | Area: Surveys & Reviews | Citations: 104

Tags: surveys-reviews, ai-safety, survey

E7 / R5 (97%)
Safeguarding Large Language Models: A Survey

Gaojie Jin, Jinwei Hu, Yi Dong, Saddek Bensalem

Published: 2024-06-03 | Area: Surveys & Reviews | Citations: 82

Tags: surveys-reviews, ai-safety, adversarial-robustness, survey

E6 / R4 (97%)
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

Paul Röttger, Dirk Hovy, Fabio Pernisi, Bertie Vidgen

Published: 2024-04-08 | Area: Surveys & Reviews | Citations: 69

Tags: surveys-reviews, ai-safety, survey

E5 / R3 (98%)
Securing Large Language Models: Addressing Bias, Misinformation, and Prompt Attacks

Ziqian Bi, Keyu Chen, Benji Peng, Pohsun Feng

Published: 2024-09-12 | Area: Surveys & Reviews | Citations: 31

Tags: surveys-reviews, ai-safety, adversarial-robustness, survey

E6 / R4 (94%)
Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices

Sara Abdali, Richard Anarfi, Jia He, CJ Barberan

Published: 2024-03-19 | Area: Surveys & Reviews | Citations: 49

Tags: surveys-reviews, ai-safety, adversarial-robustness, survey

E6 / R4 (98%)
Security and Privacy Challenges of Large Language Models: A Survey

Badhan Chandra Das, Yanzhao Wu, M. Hadi Amini

Published: 2024-01-30 | Area: Surveys & Reviews | Citations: 351

Tags: surveys-reviews, ai-safety, adversarial-robustness, survey

E6 / R4 (97%)
The Art of Refusal: A Survey of Abstention in Large Language Models

Shangbin Feng, Lucy Lu Wang, Chenjun Xu, Bill Howe

Published: 2024-07-25 | Area: Surveys & Reviews | Citations: 55

Tags: surveys-reviews, ai-safety, adversarial-robustness, survey, safety-evaluation

E5 / R3 (94%)
The Cognitive Revolution in Interpretability: From Explaining Behavior to Interpreting Representations and Algorithms

Adam Davies, Ashkan Khakzar

Published: 2024-08-11 | Area: Surveys & Reviews | Citations: 14

Tags: surveys-reviews, ai-safety, survey, interpretability

E6 / R4 (95%)
The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies

Bo Liu, Feng He, Philip S. Yu, Tianqing Zhu

Published: 2024-07-28 | Area: Surveys & Reviews | Citations: 85

Tags: surveys-reviews, ai-safety, survey

E7 / R4 (94%)
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability

Jiuding Sun, David Bau, Can Rager, Samuel Marks

Published: 2024-08-02 | Area: Surveys & Reviews | Citations: 3

Tags: surveys-reviews, ai-safety, survey, interpretability

E5 / R3 (95%)
The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

JinYeong Bak, Jing Yao, Xiaoyuan Yi, Muhua Huang

Published: 2024-12-21 | Area: Surveys & Reviews | Citations: 9

Tags: alignment-training, surveys-reviews, ai-safety, survey

E6 / R3 (94%)
Towards Uncovering How Large Language Model Works: An Explainability Perspective

Mengnan Du, Himabindu Lakkaraju, Haiyan Zhao, Fan Yang

Published: 2024-02-16 | Area: Surveys & Reviews | Citations: 26

Tags: alignment-training, surveys-reviews, ai-safety, survey, interpretability

E5 / R3 (93%)
Unique Security and Privacy Threats of Large Language Models: A Comprehensive Survey

Bo Liu, Shang Wang, Ming Ding, Xu Guo

Published: 2024-06-12 | Area: Surveys & Reviews | Citations: 20

Tags: surveys-reviews, ai-safety, survey

E7 / R3 (97%)
A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking

Rose Hadshar

Published: 2023-10-27 | Area: Surveys & Reviews | Citations: 11

Tags: surveys-reviews, ai-safety, survey

E6 / R3 (92%)
AI Alignment: A Comprehensive Survey

Wen Gao, Hantao Lou, Yizhou Wang, Juntao Dai

Published: 2023-10-30 | Area: Surveys & Reviews | Citations: 320

Tags: alignment-training, surveys-reviews, ai-safety, survey, interpretability

E7 / R4 (97%)
AI Safety Subproblems for Software Engineering Researchers

Prem Devanbu, Zhou Yu, David Gros

Published: 2023-04-28 | Area: Surveys & Reviews | Citations: 4

Tags: surveys-reviews, ai-safety, survey

E5 / R3 (94%)
An Overview of Catastrophic AI Risks

Thomas Woodside, Dan Hendrycks, Mantas Mazeika

Published: 2023-06-21 | Area: Surveys & Reviews | Citations: 258

Tags: surveys-reviews, ai-safety, survey

E7 / R3 (100%)
Core Views on AI Safety: When, Why, What, and How

Anthropic

Published: - | Area: Surveys & Reviews | Citations: -

Tags: surveys-reviews, ai-safety, position, interpretability

E6 / R4 (95%)
From Instructions to Intrinsic Human Values - A Survey of Alignment Goals for Big Models

Jing Yao, Xiting Wang, Xiaoyuan Yi, Xing Xie

Published: 2023-08-23 | Area: Surveys & Reviews | Citations: 62

Tags: alignment-training, surveys-reviews, ai-safety, survey

E5 / R3 (94%)
Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey

Carolyn Ashurst, Adrian Weller, Victoria Smith, Ali Shahin Shamsabadi

Published: 2023-09-27 | Area: Surveys & Reviews | Citations: 41

Tags: surveys-reviews, ai-safety, survey

E5 / R3 (97%)
Identifying and Mitigating the Security Risks of Generative AI

Nicholas Carlini, Mihai Christodorescu, Jihye Choi, Elie Bursztein

Published: 2023-08-28 | Area: Surveys & Reviews | Citations: 126

Tags: alignment-training, surveys-reviews, ai-safety, survey

E6 / R4 (95%)
Large Language Model Alignment: A Survey

Zishan Guo, Tianhao Shen, Deyi Xiong, Yufei Huang

Published: 2023-09-26 | Area: Surveys & Reviews | Citations: 292

Tags: alignment-training, surveys-reviews, ai-safety, adversarial-robustness, survey, interpretability, safety-evaluation

E6 / R4 (97%)
Managing AI Risks in an Era of Rapid Progress

Ashwin Acharya, Jan Brauner, Philip Torr, Andrew Yao

Published: 2023-10-26 | Area: Surveys & Reviews | Citations: 80

Tags: surveys-reviews, ai-safety, position

E5 / R3 (93%)
Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries

Jonas Schuett, Leonie Koessler

Published: 2023-07-17 | Area: Surveys & Reviews | Citations: 35

Tags: surveys-reviews, ai-safety, survey

E8 / R4 (98%)
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks

Yu Fu, Md Abdullah Al Mamun, Nael Abu-Ghazaleh, Pedram Zaree

Published: 2023-10-16 | Area: Surveys & Reviews | Citations: 238

Tags: surveys-reviews, ai-safety, adversarial-robustness, survey

E6 / R3 (96%)
Taming Simulators: Challenges, Pathways and Vision for the Alignment of Large Language Models

Leonard Bereska, Efstratios Gavves

Published: - | Area: Surveys & Reviews | Citations: -

Tags: alignment-training, surveys-reviews, ai-safety, position

E6 / R4 (92%)
200 Concrete Open Problems in Mechanistic Interpretability

Neel Nanda

Published: - | Area: Surveys & Reviews | Citations: -

Tags: surveys-reviews, ai-safety, survey, interpretability

-
A Survey of Machine Unlearning

Thanh Tam Nguyen, Thanh Trung Huynh, Hongzhi Yin, Alan Wee-Chung Liew

Published: 2022-09-06 | Area: Surveys & Reviews | Citations: 345

Tags: surveys-reviews, ai-safety, survey

E5 / R3 (94%)
Clarifying AI X-risk

Vikrant Varma, Rohin Shah, Ramana Kumar, Elliot Catt

Published: - | Area: Surveys & Reviews | Citations: -

Tags: surveys-reviews, ai-safety, position

-