
Paper deep dive

Managing AI Risks in an Era of Rapid Progress

Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann

Year: 2023 · Venue: arXiv preprint · Area: Surveys & Reviews · Type: Position · Embeddings: 48

Models: GPT-2, GPT-4

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 93%

Last extracted: 3/12/2026, 7:57:03 PM

Summary

This paper addresses the urgent need for managing extreme risks associated with rapidly advancing autonomous AI systems. It argues that current AI safety research and governance are insufficient, proposing a comprehensive strategy that combines increased technical R&D investment with proactive, adaptive, and enforceable governance mechanisms to ensure AI development remains safe and aligned with human interests.

Entities (5)

Yoshua Bengio · researcher · 100%
Autonomous AI Systems · technology · 98%
AI Safety Research · field-of-study · 95%
Governance Frameworks · policy · 92%
Safety Cases · methodology · 90%

Relation Signals (3)

Autonomous AI Systems poses risk of Loss of human control

confidence 95% · Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include... an irreversible loss of human control over autonomous AI systems.

Governance Frameworks should mitigate Autonomous AI Systems

confidence 92% · We need governance measures that prepare us for sudden AI breakthroughs... and mitigation standards commensurate to powerful autonomous AI.

AI Safety Research is lacking in Funding

confidence 90% · Humanity is pouring vast resources into making AI systems more powerful but far less into their safety and mitigating their harms.

Cypher Suggestions (2)

Find all risks associated with autonomous AI systems · confidence 90% · unvalidated

MATCH (a:Technology {name: 'Autonomous AI Systems'})-[:POSES_RISK_OF]->(r:Risk) RETURN a.name, r.name
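Both suggested queries are marked unvalidated. A minimal way to smoke-test this first one is to seed a scratch Neo4j instance with the extracted entities and relation signal, then check that the query returns a row. The labels (Technology, Risk), the name property, and the POSES_RISK_OF relationship type below are assumptions read off the suggestion itself, not something stated in the paper:

// Hypothetical seed data mirroring the schema the suggestion assumes;
// run in a scratch database, then execute the MATCH query above.
MERGE (a:Technology {name: 'Autonomous AI Systems'})
MERGE (r:Risk {name: 'Loss of human control'})
MERGE (a)-[rel:POSES_RISK_OF]->(r)
SET rel.confidence = 0.95;

With that data in place, the suggested query should return exactly one row pairing 'Autonomous AI Systems' with 'Loss of human control'.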

Identify researchers and their associated institutions · confidence 85% · unvalidated

MATCH (r:Researcher)-[:AFFILIATED_WITH]->(i:Institution) RETURN r.name, i.name
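This second query assumes Researcher and Institution nodes joined by AFFILIATED_WITH edges, which the entity list above does not yet contain (it has one researcher and no institutions). A sketch of seed data, drawn from the affiliations in the full text below and using the labels the suggestion assumes, would let it validate:

// Hypothetical affiliation data; the Researcher/Institution labels and the
// AFFILIATED_WITH relationship come from the suggestion, not the extraction.
MERGE (p:Researcher {name: 'Yoshua Bengio'})
MERGE (i:Institution {name: 'Mila - Quebec AI Institute'})
MERGE (p)-[:AFFILIATED_WITH]->(i);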

Abstract

Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI, there is a lack of consensus about how exactly such risks arise, and how to manage them. Society's response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems. In this short consensus paper, we describe extreme risks from upcoming, advanced AI systems. Drawing on lessons learned from other safety-critical technologies, we then outline a comprehensive plan combining technical research and development with proactive, adaptive governance mechanisms for a more commensurate preparation.

Tags

ai-safety (imported, 100%) · position (suggested, 88%) · surveys-reviews (suggested, 92%)

Links

Open PDF directly →

Full Text

47,437 characters extracted from source content.


This is the author's version of the work. It is posted here by permission of the AAAS for personal use, not for redistribution. The definitive version was published in Science on 20 May 2024, DOI: 10.1126/science.adn0117. (arXiv:2310.17688v3 [cs.CY], 22 May 2024)

Managing extreme AI risks amid rapid progress

Yoshua Bengio (Mila - Quebec AI Institute, Université de Montréal)
Geoffrey Hinton (University of Toronto, Vector Institute)
Andrew Yao (Tsinghua University)
Dawn Song (UC Berkeley)
Pieter Abbeel (UC Berkeley)
Trevor Darrell (UC Berkeley)
Yuval Noah Harari (The Hebrew University of Jerusalem)
Ya-Qin Zhang (Tsinghua University)
Lan Xue (Institute for AI International Governance, Tsinghua University)
Shai Shalev-Shwartz (The Hebrew University of Jerusalem)
Gillian Hadfield (University of Toronto, Schwartz Reisman Inst. for Technology and Society, Vector Inst.)
Jeff Clune (University of British Columbia, Vector Institute)
Tegan Maharaj (University of Toronto, Schwartz Reisman Inst. for Technology and Society, Vector Inst.)
Frank Hutter (ELLIS Institute Tübingen, University of Freiburg)
Atılım Güneş Baydin (University of Oxford)
Sheila McIlraith (University of Toronto, Schwartz Reisman Inst. for Technology and Society, Vector Inst.)
Qiqi Gao (East China University of Political Science and Law)
Ashwin Acharya (RAND Corporation)
David Krueger (University of Cambridge)
Anca Dragan (UC Berkeley)
Philip Torr (University of Oxford)
Stuart Russell (UC Berkeley)
Daniel Kahneman (School of Public and International Affairs, Princeton University)
Jan Brauner* (University of Oxford, RAND Corporation)
Sören Mindermann* (University of Oxford, Mila - Quebec AI Institute, Université de Montréal)

*Equal contribution, order determined randomly; correspondence to jan.m.brauner@gmail.com and soeren.mindermann@gmail.com.

Abstract

Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI [1], there is a lack of consensus about how exactly such risks arise, and how to manage them. Society's response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems. In this short consensus paper, we describe extreme risks from upcoming, advanced AI systems. Drawing on lessons learned from other safety-critical technologies, we then outline a comprehensive plan combining technical research and development (R&D) with proactive, adaptive governance mechanisms for a more commensurate preparation.

Rapid progress

Current deep learning systems still lack important capabilities and we do not know how long it will take to develop them. However, companies are engaged in a race to create generalist AI systems that match or exceed human abilities in most cognitive work [2,3]. They are rapidly deploying more resources and developing new techniques to increase AI capabilities, with investment in training state-of-the-art models tripling annually [4].
There is much room for further advances, as tech companies have the cash reserves needed to scale the latest training runs by multiples of 100 to 1000 [5]. Hardware and algorithms will also improve: AI computing chips have been getting 1.4 times more cost-effective, and AI training algorithms 2.5 times more efficient, each year [6,7]. Progress in AI also enables faster AI progress [8]: AI assistants are increasingly used to automate programming [9], data collection [10,11], and chip design [12].

There is no fundamental reason for AI progress to slow or halt at human-level abilities. Indeed, AI has already surpassed human abilities in narrow domains like playing strategy games and predicting how proteins fold [13-15]. Compared to humans, AI systems can act faster, absorb more knowledge, and communicate at higher bandwidth. Additionally, they can be scaled to use immense computational resources and can be replicated by the millions.

We don't know for certain how the future of AI will unfold. However, we must take seriously the possibility that highly powerful generalist AI systems—outperforming human abilities across many critical domains—will be developed within the current decade or the next. What happens then?

More capable AI systems have larger impacts. Especially as AI matches and surpasses human workers in capabilities and cost-effectiveness, we expect a massive increase in AI deployment, opportunities, and risks. If managed carefully and distributed fairly, AI could help humanity cure diseases, elevate living standards, and protect ecosystems. The opportunities are immense.

But alongside advanced AI capabilities come large-scale risks that we are not on track to handle well. Humanity is pouring vast resources into making AI systems more powerful but far less into their safety and mitigating their harms. Only an estimated 1-3% of AI publications are on safety [16,17]. For AI to be a boon, we must reorient; pushing AI capabilities alone is not enough.

We are already behind schedule for this reorientation. The scale of the risks means that we need to be proactive, as the costs of being unprepared far outweigh those of premature preparation. We must anticipate the amplification of ongoing harms, as well as novel risks, and prepare for the largest risks well before they materialize. Climate change has taken decades to be acknowledged and confronted; for AI, decades could be too long.

Societal-scale risks

If not carefully designed and deployed, increasingly advanced AI systems threaten to amplify social injustice, erode social stability, and weaken our shared understanding of reality that is foundational to society. They could also enable large-scale criminal or terrorist activities. Especially in the hands of a few powerful actors, AI could cement or exacerbate global inequities, or facilitate automated warfare, customized mass manipulation, and pervasive surveillance [18-23].

Many of these risks could soon be amplified, and new risks created, as companies are working to develop autonomous AI: systems that can pursue goals and act in the world. While current AI systems have limited autonomy, work is underway to change this [24]. For example, the non-autonomous GPT-4 model was quickly adapted to browse the web, design and execute chemistry experiments, and utilize software tools, including other AI models [25-28].

If we build highly advanced autonomous AI, we risk creating systems that pursue undesirable goals.
Malicious actors could deliberately embed undesirable goals. Without R&D breakthroughs (see below), even well-meaning developers may inadvertently create AI systems pursuing unintended goals: The reward signal used to train AI systems usually fails to fully capture the intended objectives, leading to AI systems that pursue the literal specification rather than the intended outcome [29]. Additionally, the training data never captures all relevant situations, leading to AI systems that pursue undesirable goals in novel situations encountered after training.

Once autonomous AI systems pursue undesirable goals, we may be unable to keep them in check. Control of software is an old and unsolved problem: computer worms have long been able to proliferate and avoid detection [30]. However, AI is making progress in critical domains such as hacking, social manipulation, and strategic planning [24,31], and may soon pose unprecedented control challenges.

To advance undesirable goals, future autonomous AI systems could use undesirable strategies—learned from humans or developed independently—as a means to an end [32-35]. AI systems could gain human trust, acquire financial resources, influence key decision-makers, and form coalitions with human actors and other AI systems. To avoid human intervention [35], they might copy their algorithms across global server networks [36], as computer worms do. AI assistants are already co-writing a substantial share of computer code worldwide [37]; future AI systems could insert and then exploit security vulnerabilities to control the computer systems behind our communication, media, banking, supply-chains, militaries, and governments. In open conflict, AI systems could autonomously deploy a variety of weapons, including biological ones. AI systems having access to such technology would merely continue existing trends to automate military activity and biological research. If AI systems pursued such strategies with sufficient skill, it would be difficult for humans to intervene.

Finally, AI systems will not need to plot for influence if it is freely handed over. As autonomous AI systems increasingly become faster and more cost-effective than human workers, a dilemma emerges. Companies, governments, and militaries might be forced to deploy AI systems widely and cut back on expensive human verification of AI decisions, or risk being outcompeted [19,38]. As a result, autonomous AI systems could increasingly assume critical societal roles.

Without sufficient caution, we may irreversibly lose control of autonomous AI systems, rendering human intervention ineffective. Large-scale cybercrime, social manipulation, and other harms could escalate rapidly. This unchecked AI advancement could culminate in a large-scale loss of life and the biosphere, and the marginalization or extinction of humanity.

Harms such as misinformation and discrimination from algorithms are already evident today; other harms show signs of emerging. It is vital to both address ongoing harms and anticipate emerging risks. This is not a question of either/or. Present and emerging risks often share similar mechanisms, patterns, and solutions [39]; investing in governance frameworks and AI safety will bear fruit on multiple fronts [40].
Reorient Technical R&D

There are many open technical challenges in ensuring the safety and ethical use of generalist, autonomous AI systems. Unlike advancing AI capabilities, these challenges cannot be addressed by simply using more computing power to train bigger models. They are unlikely to resolve automatically as AI systems get more capable [33,41-45], and require dedicated research and engineering efforts. In some cases, leaps of progress may be needed; we thus do not know if technical work can fundamentally solve these challenges in time. However, there has been comparatively little work on many of these challenges. More R&D may thus make progress and reduce risks.

A first set of R&D areas needs breakthroughs to enable reliably safe AI. Without this progress, developers must either risk creating unsafe systems or falling behind competitors who are willing to take more risks. If ensuring safety remains too difficult, extreme governance measures would be needed to prevent corner-cutting driven by competition and overconfidence. These R&D challenges include:

Oversight and honesty: More capable AI systems can better exploit weaknesses in technical oversight and testing [42,46,47]—for example, by producing false but compelling output [43,48,49].

Robustness: AI systems behave unpredictably in new situations. While some aspects of robustness improve with model scale [50], other aspects do not or even get worse [44,51-53].

Interpretability and transparency: AI decision-making is opaque, with larger, more capable models being more complex to interpret. So far, we can only test large models via trial and error. We need to learn to understand their inner workings [54].

Inclusive AI development: AI advancement will need methods to mitigate biases and integrate the values of the many populations it will affect [20,55].

Addressing emerging challenges: Future AI systems may exhibit failure modes we have so far seen only in theory or lab experiments, such as AI systems taking control over the training reward-provision channels or exploiting weaknesses in our safety objectives and shutdown mechanisms to advance a particular goal [35,56-58].

A second set of R&D challenges needs progress to enable effective, risk-adjusted governance, or reduce harms when safety and governance fail:

Evaluation for dangerous capabilities: As AI developers scale their systems, unforeseen capabilities appear spontaneously, without explicit programming [59]. They are often only discovered after deployment [60-62]. We need rigorous methods to elicit and assess AI capabilities, and to predict them before training. This includes both generic capabilities to achieve ambitious goals in the world (e.g., long-term planning and execution), as well as specific dangerous capabilities based on threat models (e.g., social manipulation or hacking). Current evaluations of frontier AI models for dangerous capabilities [63]—key to various AI policy frameworks—are limited to spot-checks and attempted demonstrations in specific settings [36,64,65]. These evaluations can sometimes demonstrate dangerous capabilities but cannot reliably rule them out: AI systems that lacked certain capabilities in the tests may well demonstrate them in slightly different settings or with post-training enhancements. Decisions that depend on AI systems not crossing any red lines thus need large safety margins.
Improved evaluation tools decrease the chance of missing dangerous capabilities, allowing for smaller margins.

Evaluating AI alignment: If AI progress continues, AI systems will eventually possess highly dangerous capabilities. Before training and deploying such systems, we need methods to assess their propensity to use these capabilities. Purely behavioral evaluations may fail for advanced AI systems: like humans, they might behave differently under evaluation, faking alignment [56-58].

Risk assessment: We must learn to assess not just dangerous capabilities, but risk in a societal context, with complex interactions and vulnerabilities. Rigorous risk assessment for frontier AI systems remains an open challenge due to their broad capabilities and pervasive deployment across diverse application areas [66].

Resilience: Inevitably, some will misuse or act recklessly with AI. We need tools to detect and defend against AI-enabled threats such as large-scale influence operations, biological risks, and cyberattacks. However, as AI systems become more capable, they will eventually be able to circumvent human-made defenses. To enable more powerful AI-based defenses, we first need to learn how to make AI systems safe and aligned.

Given the stakes, we call on major tech companies and public funders to allocate at least one-third of their AI R&D budget—comparable to their funding for AI capabilities—towards addressing the above R&D challenges and ensuring AI safety and ethical use [44]. Beyond traditional research grants, government support could include prizes, advance market commitments [67], and other incentives. Addressing these challenges, with an eye toward powerful future systems, must become central to our field.

Governance measures

We urgently need national institutions and international governance to enforce standards preventing recklessness and misuse. Many areas of technology, from pharmaceuticals to financial systems and nuclear energy, show that society requires and effectively uses government oversight to reduce risks. However, governance frameworks for AI are far less developed, lagging behind rapid technological progress. We can take inspiration from the governance of other safety-critical technologies, while keeping the uniqueness of advanced AI in mind: that it far outstrips other technologies in its potential to act and develop ideas autonomously, progress explosively, behave adversarially, and cause irreversible damage.

Governments worldwide have taken positive steps on frontier AI, with key players including China, the US, the EU, and the UK engaging in discussions [68,69] and introducing initial guidelines or regulations [70-73]. Despite their limitations—often voluntary adherence, limited geographic scope, and exclusion of high-risk areas like military and R&D-stage systems—they are important initial steps towards, amongst others, developer accountability, third-party audits, and industry standards.

Yet, these governance plans fall critically short in view of the rapid progress in AI capabilities. We need governance measures that prepare us for sudden AI breakthroughs, while being politically feasible despite disagreement and uncertainty about AI timelines. The key is policies that automatically trigger when AI hits certain capability milestones. If AI advances rapidly, strict requirements automatically take effect, but if progress slows, the requirements relax accordingly.
Rapid, unpredictable progress also means that risk reduction efforts must be proactive—identifying risks from next-generation systems and requiring developers to address them before taking high-risk actions. We need fast-acting, tech-savvy institutions for AI oversight, mandatory and much more rigorous risk assessments with enforceable consequences (including assessments that put the burden of proof on AI developers), and mitigation standards commensurate to powerful autonomous AI.

Without these, companies, militaries, and governments may seek a competitive edge by pushing AI capabilities to new heights while cutting corners on safety, or by delegating key societal roles to autonomous AI systems with insufficient human oversight; reaping the rewards of AI development while leaving society to deal with the consequences.

Institutions to govern the rapidly moving frontier of AI. To keep up with rapid progress and avoid quickly outdated, inflexible laws [74-76], national institutions need strong technical expertise and the authority to act swiftly. To facilitate technically demanding risk assessments and mitigations, they will require far greater funding and talent than they are due to receive under almost any current policy plan. To address international race dynamics, they need the affordance to facilitate international agreements and partnerships [77,78]. Institutions should protect low-risk use and low-risk academic research, by avoiding undue bureaucratic hurdles for small, predictable AI models. The most pressing scrutiny should be on AI systems at the frontier: the few most powerful systems – trained on billion-dollar supercomputers – which will have the most hazardous and unpredictable capabilities [79,80].

Government insight. To identify risks, governments urgently need comprehensive insight into AI development. Regulators should mandate whistleblower protections, incident reporting, registration of key information on frontier AI systems and their data sets throughout their life cycle, and monitoring of model development and supercomputer usage [81]. Recent policy developments should not stop at requiring that companies report the results of voluntary or underspecified model evaluations shortly before deployment [70,72]. Regulators can and should require that frontier AI developers grant external auditors on-site, comprehensive ("white-box"), and fine-tuning access from the start of model development [82]. This is needed to identify dangerous model capabilities such as autonomous self-replication, large-scale persuasion, breaking into computer systems, developing (autonomous) weapons, or making pandemic pathogens widely accessible [36,63-65,83,84].

Safety cases. Despite evaluations, we cannot consider coming powerful frontier AI systems "safe unless proven unsafe". With current testing methodologies, issues can easily be missed. Additionally, it is unclear if governments can quickly build the immense expertise needed for reliable technical evaluations of AI capabilities and societal-scale risks. Given this, developers of frontier AI should carry the burden of proof to demonstrate that their plans keep risks within acceptable limits.
Doing so, they would follow best practices for risk management from industries such as aviation [85], medical devices [86], and defense software [87], where companies make safety cases [88-92]: structured arguments with falsifiable claims supported by evidence, which identify potential hazards, describe mitigations, show that systems will not cross certain red lines, and model possible outcomes to assess risk. Safety cases could leverage developers' in-depth experience with their own systems. Safety cases are politically viable even when people disagree on how advanced AI will become, since it is easier to demonstrate a system is safe when its capabilities are limited. Governments are not passive recipients of safety cases: they set risk thresholds, codify best practices, employ experts and third-party auditors to assess safety cases and conduct independent model evaluations, and hold developers liable if their safety claims are later falsified.

Mitigation. To keep AI risks within acceptable limits, we need governance mechanisms matched to the magnitude of the risks [79,93-95]. Regulators should clarify legal responsibilities arising from existing liability frameworks and hold frontier AI developers and owners legally accountable for harms from their models that can be reasonably foreseen and prevented—including harms that foreseeably arise from deploying powerful AI systems whose behavior they cannot predict. Liability, together with consequential evaluations and safety cases, can prevent harm and create much-needed incentives to invest in safety.

Commensurate mitigations are needed for exceptionally capable future AI systems, like autonomous systems that could circumvent human control. Governments must be prepared to license their development, restrict their autonomy in key societal roles, halt their development and deployment in response to worrying capabilities, mandate access controls, and require information security measures robust to state-level hackers, until adequate protections are ready. Governments should build these capacities now.

To bridge the time until regulations are complete, major AI companies should promptly lay out "if-then" commitments: specific safety measures they will take if specific red-line capabilities [63] are found in their AI systems. These commitments should be detailed and independently scrutinized. Regulators should encourage a race-to-the-top among companies by using the best-in-class commitments, together with other inputs, to inform standards that apply to all players.

To steer AI toward positive outcomes and away from catastrophe, we need to reorient. There is a responsible path, if we have the wisdom to take it.

Acknowledgments

Yoshua Bengio, Jeff Clune, Gillian Hadfield, and Sheila McIlraith hold the position of CIFAR AI Chair. Jeff Clune is a Senior Research Advisor to Google DeepMind. Ashwin Acharya reports acting as an advisor to the Civic AI Security Program. Ashwin Acharya was affiliated with the Institute for AI Policy and Strategy at the time of the first submission. Anca Dragan now holds an appointment at Google DeepMind, but joined the company after the manuscript was written. Dawn Song is the president of Oasis Labs. Trevor Darrell is a cofounder of Prompt AI. Pieter Abbeel is a cofounder at covariant.ai and Investment Partner at AIX Ventures.
Shai Shalev-Shwartz is the CTO at Mobileye. David Krueger served as a Research Director for the UK Foundation Model Task Force in 2023, and joined the board of the non-profit Center for AI Policy in 2024. Gillian Hadfield reports the following activities: 2018-2023: Senior Policy Advisor, OpenAI; 2023-present: Member, RAND Technology Advisory Group; 2022-present: Member, Partnership on AI, Safety Critical AI Steering Committee. In gratitude and remembrance of Daniel Kahneman, our co-author, whose remarkable contributions to this paper and to humanity's cumulative knowledge and wisdom will never be forgotten.

References and notes

[1] Statement on AI risk, https://www.safe.ai/work/statement-on-ai-risk, Accessed: 2024-5-1, 2023.
[2] DeepMind, About, https://www.deepmind.com/about, Accessed: 2023-9-15, n.d.
[3] OpenAI, About, https://openai.com/about, Accessed: 2023-9-15, n.d.
[4] B. Cottier, Trends in the dollar training cost of machine learning systems, 2023.
[5] Alphabet, Alphabet annual report, page 33 (page 71 in the PDF): "As of December 31, 2022, we had USD 113.8 billion in cash, cash equivalents, and short-term marketable securities". [For comparison, the cost of training GPT-4 has been estimated as USD 50 million (https://epochai.org/trends), and Sam Altman, the CEO of OpenAI, has stated that the cost for the whole process was more than USD 100 million (https://www.wired.com/story/openai-ceo-sam-altman-the-age-of-giant-ai-models-is-already-over/).] https://abc.xyz/assets/d4/4f/a48b94d548d0b2fdc029a95e8c63/2022-alphabet-annual-report.pdf, 2022.
[6] M. Hobbhahn, L. Heim, and G. Aydos, Trends in machine learning hardware, 2023.
[7] E. Erdil and T. Besiroglu, "Algorithmic progress in computer vision," Dec. 2022. arXiv:2212.05153 [cs.CV].
[8] Examples of AI improving AI, https://ai-improving-ai.safe.ai/, Accessed: 2023-9-15, n.d.
[9] M. Tabachnyk, ML-enhanced code completion improves developer productivity, https://blog.research.google/2022/07/ml-enhanced-code-completion-improves.html, Accessed: 2023-9-15, n.d.
[10] Y. Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, C. Chen, C. Olsson, C. Olah, D. Hernandez, D. Drain, D. Ganguli, D. Li, E. Tran-Johnson, E. Perez, J. Kerr, J. Mueller, J. Ladish, J. Landau, K. Ndousse, K. Lukosuite, L. Lovitt, M. Sellitto, N. Elhage, N. Schiefer, N. Mercado, N. DasSarma, R. Lasenby, R. Larson, S. Ringer, S. Johnston, S. Kravec, S. El Showk, S. Fort, T. Lanham, T. Telleen-Lawton, T. Conerly, T. Henighan, T. Hume, S. R. Bowman, Z. Hatfield-Dodds, B. Mann, D. Amodei, N. Joseph, S. McCandlish, T. Brown, and J. Kaplan, "Constitutional AI: Harmlessness from AI feedback," Dec. 2022. arXiv:2212.08073 [cs.CL].
[11] OpenAI, "GPT-4 technical report," Mar. 2023. arXiv:2303.08774 [cs.CL].
[12] A. Mirhoseini, A. Goldie, M. Yazgan, J. W. Jiang, E. Songhori, S. Wang, Y.-J. Lee, E. Johnson, O. Pathak, A. Nazi, J. Pak, A. Tong, K. Srinivasa, W. Hang, E. Tuncer, Q. V. Le, J. Laudon, R. Ho, R. Carpenter, and J. Dean, "A graph placement methodology for fast chip design," Nature, vol. 594, no. 7862, p. 207-212, Jun. 2021.
[13] J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, A. Bridgland, C. Meyer, S. A. A. Kohl, A. J. Ballard, A. Cowie, B. Romera-Paredes, S. Nikolov, R. Jain, J. Adler, T. Back, S. Petersen, D. Reiman, E. Clancy, M. Zielinski, M. Steinegger, M. Pacholska, T. Berghammer, S. Bodenstein, D. Silver, O. Vinyals, A. W. Senior, K. Kavukcuoglu, P. Kohli, and D. Hassabis, "Highly accurate protein structure prediction with AlphaFold," Nature, vol. 596, no. 7873, p. 583-589, Aug. 2021.
[14] N. Brown and T. Sandholm, "Superhuman AI for multiplayer poker," Science, vol. 365, no. 6456, p. 885-890, Aug. 2019.
[15] M. Campbell, A. J. Hoane, and F.-H. Hsu, "Deep Blue," Artif. Intell., vol. 134, no. 1, p. 57-83, Jan. 2002.
[16] H. Toner and A. Acharya, Exploring clusters of research in three areas of AI safety, Center for Security and Emerging Technology, Feb. 2022.
[17] Emerging Technology Observatory, AI safety – ETO research almanac, https://almanac.eto.tech/topics/ai-safety/, Accessed: 2024-2-12, n.d.
[18] L. Weidinger, J. Uesato, M. Rauh, C. Griffin, P.-S. Huang, J. Mellor, A. Glaese, M. Cheng, B. Balle, A. Kasirzadeh, C. Biles, S. Brown, Z. Kenton, W. Hawkins, T. Stepleton, A. Birhane, L. A. Hendricks, L. Rimell, W. Isaac, J. Haas, S. Legassick, G. Irving, and I. Gabriel, "Taxonomy of risks posed by language models," in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, ser. FAccT '22, Seoul, Republic of Korea: Association for Computing Machinery, Jun. 2022, p. 214-229.
[19] A. Chan, R. Salganik, A. Markelius, C. Pang, N. Rajkumar, D. Krasheninnikov, L. Langosco, Z. He, Y. Duan, M. Carroll, M. Lin, A. Mayhew, K. Collins, M. Molamohammadi, J. Burden, W. Zhao, S. Rismani, K. Voudouris, U. Bhatt, A. Weller, D. Krueger, and T. Maharaj, "Harms from increasingly agentic algorithmic systems," in Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, ser. FAccT '23, Chicago, IL, USA: Association for Computing Machinery, Jun. 2023, p. 651-666.
[20] V. Eubanks, Automating Inequality: How High-Tech Tools Profile, Police and Punish the Poor. St Martin's Press, 2018.
[21] D. Hendrycks, M. Mazeika, and T. Woodside, "An overview of catastrophic AI risks," Jun. 2023. arXiv:2306.12001 [cs.CY].
[22] R. Bommasani et al., "On the opportunities and risks of foundation models," Aug. 2021. arXiv:2108.07258 [cs.LG].
[23] I. Solaiman, Z. Talat, W. Agnew, L. Ahmad, D. Baker, S. L. Blodgett, H. Daumé III, J. Dodge, E. Evans, S. Hooker, Y. Jernite, A. S. Luccioni, A. Lusoli, M. Mitchell, J. Newman, M.-T. Png, A. Strait, and A. Vassilev, "Evaluating the social impact of generative AI systems in systems and society," Jun. 2023. arXiv:2306.05949 [cs.CY].
[24] L. Wang, C. Ma, X. Feng, Z. Zhang, H. Yang, J. Zhang, Z. Chen, J. Tang, X. Chen, Y. Lin, W. X. Zhao, Z. Wei, and J.-R. Wen, "A survey on large language model based autonomous agents," Aug. 2023. arXiv:2308.11432 [cs.AI].
[25] ChatGPT plugins, https://openai.com/blog/chatgpt-plugins, Accessed: 2023-9-15, n.d.
[26] A. M. Bran, S. Cox, A. D. White, and P. Schwaller, "ChemCrow: Augmenting large-language models with chemistry tools," Apr. 2023. arXiv:2304.05376 [physics.chem-ph].
[27] G. Mialon, R. Dessì, M. Lomeli, C. Nalmpantis, R. Pasunuru, R. Raileanu, B. Rozière, T. Schick, J. Dwivedi-Yu, A. Celikyilmaz, E. Grave, Y. LeCun, and T. Scialom, "Augmented language models: A survey," Feb. 2023. arXiv:2302.07842 [cs.CL].
[28] Y. Shen, K. Song, X. Tan, D. Li, W. Lu, and Y. Zhuang, "HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face," Mar. 2023. arXiv:2303.17580 [cs.CL].
[29] D. Hadfield-Menell and G. K. Hadfield, "Incomplete contracting and AI alignment," in Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019, p. 417-422.
[30] P. J. Denning, "The science of computing: The internet worm," Am. Sci., vol. 77, no. 2, p. 126-128, 1989.
[31] P. S. Park, S. Goldstein, A. O'Gara, M. Chen, and D. Hendrycks, "AI deception: A survey of examples, risks, and potential solutions," Aug. 2023. arXiv:2308.14752 [cs.CY].
[32] A. M. Turner, L. Smith, R. Shah, A. Critch, et al., "Optimal policies tend to seek power," Thirty-Fifth Conference on Neural Information Processing Systems, 2019.
[33] E. Perez, S. Ringer, K. Lukošiūtė, K. Nguyen, E. Chen, S. Heiner, C. Pettit, C. Olsson, S. Kundu, S. Kadavath, A. Jones, A. Chen, B. Mann, B. Israel, B. Seethor, C. McKinnon, C. Olah, D. Yan, D. Amodei, D. Amodei, D. Drain, D. Li, E. Tran-Johnson, G. Khundadze, J. Kernion, J. Landis, J. Kerr, J. Mueller, J. Hyun, J. Landau, K. Ndousse, L. Goldberg, L. Lovitt, M. Lucas, M. Sellitto, M. Zhang, N. Kingsland, N. Elhage, N. Joseph, N. Mercado, N. DasSarma, O. Rausch, R. Larson, S. McCandlish, S. Johnston, S. Kravec, S. El Showk, T. Lanham, T. Telleen-Lawton, T. Brown, T. Henighan, T. Hume, Y. Bai, Z. Hatfield-Dodds, J. Clark, S. R. Bowman, A. Askell, R. Grosse, D. Hernandez, D. Ganguli, E. Hubinger, N. Schiefer, and J. Kaplan, "Discovering language model behaviors with model-written evaluations," Dec. 2022. arXiv:2212.09251 [cs.CL].
[34] A. Pan, J. S. Chan, A. Zou, N. Li, S. Basart, T. Woodside, J. Ng, H. Zhang, S. Emmons, and D. Hendrycks, "Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the MACHIAVELLI benchmark," International Conference on Machine Learning, 2023.
[35] D. Hadfield-Menell, A. Dragan, P. Abbeel, and S. Russell, "The Off-Switch game," Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, p. 220-227, 2017.
[36] M. Kinniment, L. J. K. Sato, H. Du, B. Goodrich, M. Hasin, L. Chan, L. H. Miles, T. R. Lin, H. Wijk, J. Burget, A. Ho, E. Barnes, and P. Christiano, "Evaluating language-model agents on realistic autonomous tasks," Dec. 2023. arXiv:2312.11671 [cs.CL].
[37] T. Dohmke, GitHub Copilot, https://github.blog/2023-02-14-github-copilot-for-business-is-now-available/, Accessed: 2023-9-15, n.d.
[38] A. Critch and S. Russell, "TASRA: A taxonomy and analysis of societal-scale risks from AI," Jun. 2023. arXiv:2306.06924 [cs.AI].
[39] J. Brauner and A. Chan, "AI poses doomsday risks—but that doesn't mean we shouldn't talk about present harms too," Time, Aug. 2023.
[40] Center for AI Safety, Existing policy proposals targeting present and future harms, https://assets-global.website-files.com/63fe96aeda6bea77ac7d3000/647d5368c2368c32b359f88_Policy%20Agreement%20Statement.pdf, Accessed: 2023-9-15, Jun. 2023.
[41] I. R. McKenzie, A. Lyzhov, M. Pieler, A. Parrish, A. Mueller, A. Prabhu, E. McLean, A. Kirtland, A. Ross, A. Liu, A. Gritsevskiy, D. Wurgaft, D. Kauffman, G. Recchia, J. Liu, J. Cavanagh, M. Weiss, S. Huang, The Floating Droid, T. Tseng, T. Korbak, X. Shen, Y. Zhang, Z. Zhou, N. Kim, S. R. Bowman, and E. Perez, "Inverse scaling: When bigger isn't better," Transactions on Machine Learning Research, Oct. 2023.
[42] A. Pan, K. Bhatia, and J. Steinhardt, "The effects of reward misspecification: Mapping and mitigating misaligned models," International Conference on Learning Representations, 2022.
[43] S. Casper, X. Davies, C. Shi, T. K. Gilbert, J. Scheurer, J. Rando, R. Freedman, T. Korbak, D. Lindner, P. Freire, T. Wang, S. Marks, C.-R. Segerie, M. Carroll, A. Peng, P. Christoffersen, M. Damani, S. Slocum, U. Anwar, A. Siththaranjan, M. Nadeau, E. J. Michaud, J. Pfau, D. Krasheninnikov, X. Chen, L. Langosco, P. Hase, E. Bıyık, A. Dragan, D. Krueger, D. Sadigh, and D. Hadfield-Menell, "Open problems and fundamental limitations of reinforcement learning from human feedback," Jul. 2023. arXiv:2307.15217 [cs.AI].
[44] D. Hendrycks, N. Carlini, J. Schulman, and J. Steinhardt, "Unsolved problems in ML safety," Sep. 2021. arXiv:2109.13916 [cs.LG].
[45] J. Wei, D. Huang, Y. Lu, D. Zhou, and Q. V. Le, "Simple synthetic data reduces sycophancy in large language models," Aug. 2023. arXiv:2308.03958 [cs.CL].
[46] S. Zhuang and D. Hadfield-Menell, "Consequences of misaligned AI," Adv. Neural Inf. Process. Syst., vol. 33, p. 15763-15773, 2020.
[47] L. Gao, J. Schulman, and J. Hilton, "Scaling laws for reward model overoptimization," in Proceedings of the 40th International Conference on Machine Learning, A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, Eds., ser. Proceedings of Machine Learning Research, vol. 202, PMLR, 2023, p. 10835-10866.
[48] M. Sharma, M. Tong, T. Korbak, D. Duvenaud, A. Askell, S. R. Bowman, N. Cheng, E. Durmus, Z. Hatfield-Dodds, S. R. Johnston, S. Kravec, T. Maxwell, S. McCandlish, K. Ndousse, O. Rausch, N. Schiefer, D. Yan, M. Zhang, and E. Perez, "Towards understanding sycophancy in language models," Oct. 2023. arXiv:2310.13548 [cs.CL].
[49] D. Amodei, P. Christiano, and A. Ray, Learning from human preferences, https://openai.com/research/learning-from-human-preferences, Accessed: 2023-9-15, n.d.
[50] D. Hendrycks, S. Basart, N. Mu, S. Kadavath, F. Wang, E. Dorundo, R. Desai, T. Zhu, S. Parajuli, M. Guo, D. Song, J. Steinhardt, and J. Gilmer, "The many faces of robustness: A critical analysis of out-of-distribution generalization," Jun. 2020. arXiv:2006.16241 [cs.CV].
[51] L. L. D. Langosco, J. Koch, L. D. Sharkey, J. Pfau, and D. Krueger, "Goal misgeneralization in deep reinforcement learning," Proceedings of Machine Learning Research, vol. 162, K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, Eds., p. 12004-12019, 2022.
[52] R. Shah, V. Varma, R. Kumar, M. Phuong, V. Krakovna, J. Uesato, and Z. Kenton, "Goal misgeneralization: Why correct specifications aren't enough for correct goals," Oct. 2022. arXiv:2210.01790 [cs.LG].
[53] T. T. Wang, A. Gleave, T. Tseng, K. Pelrine, N. Belrose, J. Miller, M. D. Dennis, Y. Duan, V. Pogrebniak, S. Levine, and S. Russell, "Adversarial policies beat superhuman Go AIs," Nov. 2022. arXiv:2211.00241 [cs.LG].
[54] T. Räuker, A. Ho, S. Casper, and D. Hadfield-Menell, "Toward transparent AI: A survey on interpreting the inner structures of deep neural networks," in 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), Feb. 2023, p. 464-483.
[55] A. Sen, "Social choice theory," in Handbook of Mathematical Economics, Vol. III, K. J. Arrow and M. Intriligator, Eds., Amsterdam: North Holland, 1986.
[56] R. Ngo, L. Chan, and S. Mindermann, "The alignment problem from a deep learning perspective," International Conference on Learning Representations 2024, Jan. 2024.
[57] E. Hubinger, C. Denison, J. Mu, M. Lambert, M. Tong, M. MacDiarmid, T. Lanham, D. M. Ziegler, T. Maxwell, N. Cheng, A. Jermyn, A. Askell, A. Radhakrishnan, C. Anil, D. Duvenaud, D. Ganguli, F. Barez, J. Clark, K. Ndousse, K. Sachan, M. Sellitto, M. Sharma, N. DasSarma, R. Grosse, S. Kravec, Y. Bai, Z. Witten, M. Favaro, J. Brauner, H. Karnofsky, P. Christiano, S. R. Bowman, L. Graham, J. Kaplan, S. Mindermann, R. Greenblatt, B. Shlegeris, N. Schiefer, and E. Perez, "Sleeper agents: Training deceptive LLMs that persist through safety training," Jan. 2024. arXiv:2401.05566 [cs.CR].
[58] M. K. Cohen, N. Kolt, Y. Bengio, G. K. Hadfield, and S. Russell, "Regulating advanced artificial agents," Science, vol. 384, no. 6691, p. 36-38, Apr. 2024.
[59] J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler, E. H. Chi, T. Hashimoto, O. Vinyals, P. Liang, J. Dean, and W. Fedus, "Emergent abilities of large language models," Transactions on Machine Learning Research, Jun. 2022.
[60] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou, "Chain-of-thought prompting elicits reasoning in large language models," Adv. Neural Inf. Process. Syst., vol. 35, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., p. 24824-24837, Jan. 2022.
[61] P. Zhou, J. Pujara, X. Ren, X. Chen, H.-T. Cheng, Q. V. Le, E. H. Chi, D. Zhou, S. Mishra, and H. S. Zheng, "Self-Discover: Large language models self-compose reasoning structures," Feb. 2024. arXiv:2402.03620 [cs.AI].
[62] T. Davidson, J.-S. Denain, P. Villalobos, and G. Bas, "AI capabilities can be significantly improved without expensive retraining," Dec. 2023. arXiv:2312.07413 [cs.AI].
[63] T. Shevlane, S. Farquhar, B. Garfinkel, M. Phuong, J. Whittlestone, J. Leung, D. Kokotajlo, N. Marchal, M. Anderljung, N. Kolt, L. Ho, D. Siddarth, S. Avin, W. Hawkins, B. Kim, I. Gabriel, V. Bolina, J. Clark, Y. Bengio, P. Christiano, and A. Dafoe, "Model evaluation for extreme risks," May 2023. arXiv:2305.15324 [cs.AI].
[64] C. A. Mouton, C. Lucas, and E. Guest, The operational risks of AI in large-scale biological attacks: Results of a red-team study, Santa Monica, CA, 2024.
[65] J. Scheurer, M. Balesni, and M. Hobbhahn, "Technical report: Large language models can strategically deceive their users when put under pressure," Nov. 2023. arXiv:2311.07590 [cs.CL].
[66] L. Koessler and J. Schuett, "Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries," Jul. 2023. arXiv:2307.08823 [cs.CY].
[67] A. Ho and J. Taylor, Using advance market commitments for public purpose technology development, Policy Brief, Jun. 2021.
[68] AI Safety Summit, The Bletchley Declaration by countries attending the AI Safety Summit, 1-2 November 2023, Nov. 2023.
[69] OECD, G7 Hiroshima Process on Generative Artificial Intelligence (AI). OECD Publishing, 2023, p. 37.
[70] The White House (US), Executive order on the safe, secure, and trustworthy development and use of artificial intelligence, Oct. 2023.
[71] Cyberspace Administration of China, Interim measures for generative artificial intelligence service management, http://www.cac.gov.cn/2023-07/13/c_1690898327029107.htm, Accessed: 2024-2-12, Jul. 2023.
[72] European Union, EU AI Act, https://artificialintelligenceact.eu/the-act/, Accessed: 2024, Jan. 2024.
[73] Department of State for Science, Innovation and Technology (UK), A pro-innovation approach to AI regulation, https://www.gov.uk/government/publications/ai-regulation-a-pro-innovation-approach/white-paper, Accessed: 2024-2-12, Mar. 2023.
[74] L. Xue, K. Jia, and J. Zhao, "Agile governance practices in artificial intelligence: Categorizing regulatory approaches and constructing a policy toolbox," Chinese Public Administration, n.d.
[75] M. M. Maas, "Aligning AI regulation to sociotechnical change," in The Oxford Handbook of AI Governance, J. B. Bullock, Y.-C. Chen, J. Himmelreich, V. M. Hudson, A. Korinek, M. M. Young, and B. Zhang, Eds., Oxford University Press, n.d.
[76] L. Xue and J. Zhao, "Toward agile governance: The pattern of emerging industry development and regulation," Chinese Public Administration, vol. 410, no. 4, p. 28-34, 2019.
[77] L. Ho, J. Barnhart, R. Trager, Y. Bengio, M. Brundage, A. Carnegie, R. Chowdhury, A. Dafoe, G. Hadfield, M. Levi, and D. Snidal, "International institutions for advanced AI," Jul. 2023. arXiv:2307.04699 [cs.CY].
[78] R. F. Trager, B. Harack, A. Reuel, A. Carnegie, L. Heim, L. Ho, S. Kreps, R. Lall, O. Larter, S. Ó hÉigeartaigh, S. Staffell, and J. J. Villalobos, International governance of civilian AI: A jurisdictional certification approach, https://cdn.governance.ai/International_Governance_of_Civilian_AI_OMS.pdf, Aug. 2023.
[79] M. Anderljung, J. Barnhart, A. Korinek, J. Leung, C. O'Keefe, J. Whittlestone, S. Avin, M. Brundage, J. Bullock, D. Cass-Beggs, B. Chang, T. Collins, T. Fist, G. Hadfield, A. Hayes, L. Ho, S. Hooker, E. Horvitz, N. Kolt, J. Schuett, Y. Shavit, D. Siddarth, R. Trager, and K. Wolf, "Frontier AI regulation: Managing emerging risks to public safety," Jul. 2023. arXiv:2307.03718 [cs.CY].
[80] D. Ganguli, D. Hernandez, L. Lovitt, A. Askell, Y. Bai, A. Chen, T. Conerly, N. DasSarma, D. Drain, N. Elhage, S. El Showk, S. Fort, Z. Hatfield-Dodds, T. Henighan, S. Johnston, A. Jones, N. Joseph, J. Kernian, S. Kravec, B. Mann, N. Nanda, K. Ndousse, C. Olsson, D. Amodei, T. Brown, J. Kaplan, S. McCandlish, C. Olah, D. Amodei, and J. Clark, "Predictability and surprise in large generative models," in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, ser. FAccT '22, Seoul, Republic of Korea: Association for Computing Machinery, Jun. 2022, p. 1747-1764.
[81] N. Kolt, M. Anderljung, J. Barnhart, A. Brass, K. Esvelt, G. K. Hadfield, L. Heim, M. Rodriguez, J. B. Sandbrink, and T. Woodside, "Responsible reporting for frontier AI development," Apr. 2024. arXiv:2404.02675 [cs.CY].
[82] S. Casper, C. Ezell, C. Siegmann, N. Kolt, T. L. Curtis, B. Bucknall, A. Haupt, K. Wei, J. Scheurer, M. Hobbhahn, L. Sharkey, S. Krishna, M. Von Hagen, S. Alberti, A. Chan, Q. Sun, M. Gerovitch, D. Bau, M. Tegmark, D. Krueger, and D. Hadfield-Menell, "Black-box access is insufficient for rigorous AI audits," Jan. 2024. arXiv:2401.14446 [cs.CY].
[83] M. Phuong, M. Aitchison, E. Catt, S. Cogan, A. Kaskasoli, V. Krakovna, D. Lindner, M. Rahtz, Y. Assael, S. Hodkinson, H. Howard, T. Lieberum, R. Kumar, M. A. Raad, A. Webson, L. Ho, S. Lin, S. Farquhar, M. Hutter, G. Deletang, A. Ruoss, S. El-Sayed, S. Brown, A. Dragan, R. Shah, A. Dafoe, and T. Shevlane, "Evaluating frontier models for dangerous capabilities," Mar. 2024. arXiv:2403.13793 [cs.LG].
[84] J. Mökander, J. Schuett, H. R. Kirk, and L. Floridi, "Auditing large language models: A three-layered approach," AI and Ethics, May 2023.
[85] European Organisation for the Safety of Air Navigation, EAD safety case guidance, Dec. 2010.
[86] Food and Drug Administration, Infusion pumps total product life cycle - guidance for industry and FDA staff, Dec. 2014.
[87] SMP12: Safety case and safety case report, https://www.asems.mod.uk/guidance/posms/smp12, Accessed: 2024-2-12, Jun. 2023.
[88] J. Clymer, N. Gabrieli, D. Krueger, and T. Larsen, "Safety cases: How to justify the safety of advanced AI systems," Mar. 2024. arXiv:2403.10462 [cs.CY].
[89] T. Kelly, "A systematic approach to safety case management," SAE Trans. J. Mater. Manuf., vol. 113, p. 257-266, 2004.
[90] J. McDermid and Y. Jia, "Safety of artificial intelligence: A collaborative model," 2020.
[91] ISO/IEC, ISO/IEC 23894:2023 Standard on Information technology — Artificial intelligence — Guidance on risk management, Feb. 2023.
[92] T. Raz and D. Hillson, "A comparative review of risk management standards," Risk Manage.: Int. J., vol. 7, no. 4, p. 53-66, Oct. 2005.
[93] AI Now Institute, General purpose AI poses serious risks, should not be excluded from the EU's AI Act — policy brief, https://ainowinstitute.org/publication/gpai-is-high-risk-should-not-be-excluded-from-eu-ai-act, Accessed: 2023-9-15, n.d.
[94] J. Schuett, N. Dreksler, M. Anderljung, D. McCaffary, L. Heim, E. Bluemke, and B. Garfinkel, "Towards best practices in AGI safety and governance: A survey of expert opinion," May 2023. arXiv:2305.07153 [cs.CY].
[95] G. K. Hadfield and J. Clark, "Regulatory markets: The future of AI governance," Apr. 2023. arXiv:2304.04914 [cs.AI].