Instant research discovery
Search and browse ingested papers with intelligence signals and fast filtering.
| Paper | Authors | Published | Area | Tags | Intel | Citations |
|---|---|---|---|---|---|---|
| Open-Ethical AI: Advancements in Open-Source Human-Centric Neural Language Models | Sabrina Sicari, Alberto Coen-Porisini, Alessandra Rizzardi, Jesus F. Cevallos M. | - | Surveys & Reviews | alignment-training, surveys-reviews, ai-safety, survey | - | 7 |
| Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems | Ke Xu, Xinyu Kong, Zhixing Tan, Ziyi Qiu | 2024-01-11 | Surveys & Reviews | surveys-reviews, ai-safety, survey | E7 / R5 (97%) | 104 |
| Safeguarding Large Language Models: A Survey | Gaojie Jin, Jinwei Hu, Yi Dong, Saddek Bensalem | 2024-06-03 | Surveys & Reviews | surveys-reviews, ai-safety, adversarial-robustness, survey | E6 / R4 (97%) | 82 |
| SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety | Paul Röttger, Dirk Hovy, Fabio Pernisi, Bertie Vidgen | 2024-04-08 | Surveys & Reviews | surveys-reviews, ai-safety, survey | E5 / R3 (98%) | 69 |
| Securing Large Language Models: Addressing Bias, Misinformation, and Prompt Attacks | Ziqian Bi, Keyu Chen, Benji Peng, Pohsun Feng | 2024-09-12 | Surveys & Reviews | surveys-reviews, ai-safety, adversarial-robustness, survey | E6 / R4 (94%) | 31 |
| Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices | Sara Abdali, Richard Anarfi, Jia He, CJ Barberan | 2024-03-19 | Surveys & Reviews | surveys-reviews, ai-safety, adversarial-robustness, survey | E6 / R4 (98%) | 49 |
| Security and Privacy Challenges of Large Language Models: A Survey | Badhan Chandra Das, Yanzhao Wu, M. Hadi Amini | 2024-01-30 | Surveys & Reviews | surveys-reviews, ai-safety, adversarial-robustness, survey | E6 / R4 (97%) | 351 |
| The Art of Refusal: A Survey of Abstention in Large Language Models | Shangbin Feng, Lucy Lu Wang, Chenjun Xu, Bill Howe | 2024-07-25 | Surveys & Reviews | surveys-reviews, ai-safety, adversarial-robustness, survey, safety-evaluation | E5 / R3 (94%) | 55 |
| The Cognitive Revolution in Interpretability: From Explaining Behavior to Interpreting Representations and Algorithms | Adam Davies, Ashkan Khakzar | 2024-08-11 | Surveys & Reviews | surveys-reviews, ai-safety, survey, interpretability | E6 / R4 (95%) | 14 |
| The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies | Bo Liu, Feng He, Philip S. Yu, Tianqing Zhu | 2024-07-28 | Surveys & Reviews | surveys-reviews, ai-safety, survey | E7 / R4 (94%) | 85 |
| The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability | Jiuding Sun, David Bau, Can Rager, Samuel Marks | 2024-08-02 | Surveys & Reviews | surveys-reviews, ai-safety, survey, interpretability | E5 / R3 (95%) | 3 |
| The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment | JinYeong Bak, Jing Yao, Xiaoyuan Yi, Muhua Huang | 2024-12-21 | Surveys & Reviews | alignment-training, surveys-reviews, ai-safety, survey | E6 / R3 (94%) | 9 |
| Towards Uncovering How Large Language Model Works: An Explainability Perspective | Mengnan Du, Himabindu Lakkaraju, Haiyan Zhao, Fan Yang | 2024-02-16 | Surveys & Reviews | alignment-training, surveys-reviews, ai-safety, survey, interpretability | E5 / R3 (93%) | 26 |
| Unique Security and Privacy Threats of Large Language Models: A Comprehensive Survey | Bo Liu, Shang Wang, Ming Ding, Xu Guo | 2024-06-12 | Surveys & Reviews | surveys-reviews, ai-safety, survey | E7 / R3 (97%) | 20 |
| A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking | Rose Hadshar | 2023-10-27 | Surveys & Reviews | surveys-reviews, ai-safety, survey | E6 / R3 (92%) | 11 |
| AI Alignment: A Comprehensive Survey | Wen Gao, Hantao Lou, Yizhou Wang, Juntao Dai | 2023-10-30 | Surveys & Reviews | alignment-training, surveys-reviews, ai-safety, survey, interpretability | E7 / R4 (97%) | 320 |
| AI Safety Subproblems for Software Engineering Researchers | Prem Devanbu, Zhou Yu, David Gros | 2023-04-28 | Surveys & Reviews | surveys-reviews, ai-safety, survey | E5 / R3 (94%) | 4 |
| An Overview of Catastrophic AI Risks | Thomas Woodside, Dan Hendrycks, Mantas Mazeika | 2023-06-21 | Surveys & Reviews | surveys-reviews, ai-safety, survey | E7 / R3 (100%) | 258 |
| Core Views on AI Safety: When, Why, What, and How | Anthropic | - | Surveys & Reviews | surveys-reviews, ai-safety, position, interpretability | E6 / R4 (95%) | - |
| From Instructions to Intrinsic Human Values - A Survey of Alignment Goals for Big Models | Jing Yao, Xiting Wang, Xiaoyuan Yi, Xing Xie | 2023-08-23 | Surveys & Reviews | alignment-training, surveys-reviews, ai-safety, survey | E5 / R3 (94%) | 62 |
| Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey | Carolyn Ashurst, Adrian Weller, Victoria Smith, Ali Shahin Shamsabadi | 2023-09-27 | Surveys & Reviews | surveys-reviews, ai-safety, survey | E5 / R3 (97%) | 41 |
| Identifying and Mitigating the Security Risks of Generative AI | Nicholas Carlini, Mihai Christodorescu, Jihye Choi, Elie Bursztein | 2023-08-28 | Surveys & Reviews | alignment-training, surveys-reviews, ai-safety, survey | E6 / R4 (95%) | 126 |
| Large Language Model Alignment: A Survey | Zishan Guo, Tianhao Shen, Deyi Xiong, Yufei Huang | 2023-09-26 | Surveys & Reviews | alignment-training, surveys-reviews, ai-safety, adversarial-robustness, survey, interpretability, safety-evaluation | E6 / R4 (97%) | 292 |
| Managing AI Risks in an Era of Rapid Progress | Ashwin Acharya, Jan Brauner, Philip Torr, Andrew Yao | 2023-10-26 | Surveys & Reviews | surveys-reviews, ai-safety, position | E5 / R3 (93%) | 80 |
| Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries | Jonas Schuett, Leonie Koessler | 2023-07-17 | Surveys & Reviews | surveys-reviews, ai-safety, survey | E8 / R4 (98%) | 35 |
| Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks | Yu Fu, Md Abdullah Al Mamun, Nael Abu-Ghazaleh, Pedram Zaree | 2023-10-16 | Surveys & Reviews | surveys-reviews, ai-safety, adversarial-robustness, survey | E6 / R3 (96%) | 238 |
| Taming Simulators: Challenges, Pathways and Vision for the Alignment of Large Language Models | Leonard Bereska, Efstratios Gavves | - | Surveys & Reviews | alignment-training, surveys-reviews, ai-safety, position | E6 / R4 (92%) | - |
| 200 Concrete Open Problems in Mechanistic Interpretability | Neel Nanda | - | Surveys & Reviews | surveys-reviews, ai-safety, survey, interpretability | - | - |
| A Survey of Machine Unlearning | Thanh Tam Nguyen, Thanh Trung Huynh, Hongzhi Yin, Alan Wee-Chung Liew | 2022-09-06 | Surveys & Reviews | surveys-reviews, ai-safety, survey | E5 / R3 (94%) | 345 |
| Clarifying AI X-risk | Vikrant Varma, Rohin Shah, Ramana Kumar, Elliot Catt | - | Surveys & Reviews | surveys-reviews, ai-safety, position | - | - |