Paper deep dive

Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries

Leonie Koessler, Jonas Schuett

Year: 2023 · Venue: arXiv preprint · Area: Surveys & Reviews · Type: Survey · Embeddings: 170

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 98%

Last extracted: 3/12/2026, 7:33:45 PM

Summary

This paper reviews established risk assessment techniques from finance and from safety-critical industries (aviation, nuclear, biolabs) and evaluates how they could be used to manage catastrophic risks from Artificial General Intelligence (AGI). It groups the techniques into risk identification (scenario analysis, fishbone method, risk typologies and taxonomies), risk analysis (causal mapping, Delphi technique, cross-impact analysis, bow tie analysis, STPA), and risk evaluation (checklists, risk matrices), and recommends how to integrate them into the AGI development lifecycle.

Entities (8)

Anthropic · organization · 100%
Artificial General Intelligence · technology · 100%
Bow Tie Analysis · risk-assessment-technique · 100%
Delphi Technique · risk-assessment-technique · 100%
Google DeepMind · organization · 100%
OpenAI · organization · 100%
Scenario Analysis · risk-assessment-technique · 100%
System-Theoretic Process Analysis · risk-assessment-technique · 100%

Relation Signals (4)

Scenario Analysis → usedfor → Risk Identification

confidence 100% · Risk identification aims to identify risks and their sources. To that end, scenario analysis uses forward reasoning

Delphi Technique → usedfor → Risk Analysis

confidence 100% · Risk analysis aims to facilitate a deep understanding... The Delphi technique collates expert forecasts

Risk Matrices → usedfor → Risk Evaluation

confidence 100% · Risk evaluation aims to establish whether a risk is acceptable... risk matrices are overall frameworks

OpenAI → developing → Artificial General Intelligence

confidence 90% · Companies like OpenAI... have the stated goal of building artificial general intelligence

Cypher Suggestions (2)

Identify organizations developing AGI. · confidence 95% · unvalidated

MATCH (o:Organization)-[:DEVELOPING]->(a:Technology {name: 'Artificial General Intelligence'}) RETURN o.name

Find all risk assessment techniques categorized by their function (Identification, Analysis, Evaluation). · confidence 90% · unvalidated

MATCH (t:Technique)-[:USED_FOR]->(f:Function) RETURN t.name, f.name

Abstract

Abstract: Companies like OpenAI, Google DeepMind, and Anthropic have the stated goal of building artificial general intelligence (AGI) – AI systems that perform as well as or better than humans on a wide variety of cognitive tasks. However, there are increasing concerns that AGI would pose catastrophic risks. In light of this, AGI companies need to drastically improve their risk management practices. To support such efforts, this paper reviews popular risk assessment techniques from other safety-critical industries and suggests ways in which AGI companies could use them to assess catastrophic risks from AI. The paper discusses three risk identification techniques (scenario analysis, fishbone method, and risk typologies and taxonomies), five risk analysis techniques (causal mapping, Delphi technique, cross-impact analysis, bow tie analysis, and system-theoretic process analysis), and two risk evaluation techniques (checklists and risk matrices). For each of them, the paper explains how they work, suggests ways in which AGI companies could use them, discusses their benefits and limitations, and makes recommendations. Finally, the paper discusses when to conduct risk assessments, when to use which technique, and how to use any of them. The reviewed techniques will be obvious to risk management professionals in other industries. And they will not be sufficient to assess catastrophic risks from AI. However, AGI companies should not skip the straightforward step of reviewing best practices from other industries.

Full Text

169,271 characters extracted from source content.

Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries

Leonie Koessler∗ (Centre for the Governance of AI), Jonas Schuett (Centre for the Governance of AI)

Abstract

Companies like OpenAI, Google DeepMind, and Anthropic have the stated goal of building artificial general intelligence (AGI) – AI systems that perform as well as or better than humans on a wide variety of cognitive tasks. However, there are increasing concerns that AGI would pose catastrophic risks. In light of this, AGI companies need to drastically improve their risk management practices. To support such efforts, this paper reviews popular risk assessment techniques from other safety-critical industries and suggests ways in which AGI companies could use them to assess catastrophic risks from AI. The paper discusses three risk identification techniques (scenario analysis, fishbone method, and risk typologies and taxonomies), five risk analysis techniques (causal mapping, Delphi technique, cross-impact analysis, bow tie analysis, and system-theoretic process analysis), and two risk evaluation techniques (checklists and risk matrices). For each of them, the paper explains how they work, suggests ways in which AGI companies could use them, discusses their benefits and limitations, and makes recommendations. Finally, the paper discusses when to conduct risk assessments, when to use which technique, and how to use any of them. The reviewed techniques will be obvious to risk management professionals in other industries. And they will not be sufficient to assess catastrophic risks from AI. However, AGI companies should not skip the straightforward step of reviewing best practices from other industries.
[Figure 1: Exemplary use of risk assessment techniques across the AI system lifecycle (labels include initial development, new project launch, pre-training risk assessment, training, pre-deployment risk assessment, deployment, monitoring, and regular updates to all techniques)]

∗ Corresponding author: leonie.koessler@kcl.ac.uk. Leonie Koessler worked on the project as part of the 2023 GovAI Winter Research Fellowship.

arXiv:2307.08823v1 [cs.CY] 17 Jul 2023

Executive summary

Future AI systems may pose catastrophic risks. Companies that develop and deploy such systems must take adequate measures to manage these and other risks. This paper reviews popular risk assessment techniques from other industries – namely, finance, aviation, nuclear, and biolabs – and discusses how they can be used to assess catastrophic risks from AI. Although some of these techniques may be helpful, they are by no means sufficient. In particular, AI companies should also use techniques like model evaluations for dangerous capabilities and propensities.

Selection criteria. To identify techniques, we used IEC 31010:2019, a leading risk assessment standard, as a starting point and added popular techniques from other industries based on a literature review. We then defined criteria for excluding and prioritizing techniques based on the particularities of catastrophic risks from AI. First, we only included techniques that are applicable to societal risks, not just business risks. Second, we focused on techniques that are able to account for low-probability, high-impact events.
Third, we excluded techniques that aim to ensure that humans perform reliably on routine tasks (which, however, might become more relevant in the future). Fourth, we also excluded techniques that were originally developed to assess hardware reliability (which might also become more relevant in the future). Finally, we prioritized techniques that can deal with complex interactions between events, help combine the viewpoints of a variety of people, and provide clarity on future developments.

Selected techniques. Based on these criteria, we selected three risk identification techniques, five risk analysis techniques, and two risk evaluation techniques.

• Risk identification aims to identify risks and their sources. To that end, scenario analysis uses forward reasoning to develop future scenarios, which are then examined for the risks they entail. By contrast, the fishbone method uses backward reasoning from a risk to its sources. Risk typologies and taxonomies structure the risk universe and can identify additional risks.
• Risk analysis aims to facilitate a deep understanding of the causes, consequences, and likelihood of risks. Causal mapping helps to establish causal relationships between events associated with risks. The Delphi technique collates expert forecasts to assess the likelihood of events or scenarios. Cross-impact analysis combines the functions of the previous two techniques by analyzing expert forecasts on the correlations between events. Both bow tie analysis and system-theoretic process analysis (STPA) focus on controls, i.e. mechanisms supposed to impede risks from materializing.
• Risk evaluation aims to establish whether a risk is acceptable or whether its treatment is warranted. Checklists are useful to decentralize risk evaluation for routine decisions that may add up and increase risks, while risk matrices are overall frameworks for deciding on the necessity of treating risks based on consequence and likelihood or vulnerability.
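Risk matrices, as summarized above, turn a consequence rating and a likelihood rating into a decision on whether treatment is necessary. The paper does not prescribe scales or thresholds, so the following Python sketch assumes a hypothetical 5×5 matrix with ordinal levels 1 to 5 and illustrative cut-offs. The rule that the highest consequence level always requires treatment is also our assumption, chosen to reflect the emphasis on low-probability, high-impact events; `Priority` and `evaluate` are hypothetical names.

```python
from enum import Enum


class Priority(Enum):
    ACCEPTABLE = "acceptable"
    MONITOR = "monitor"
    TREAT = "treat"


# Hypothetical ordinal scales: 1 = lowest, 5 = highest.
LIKELIHOOD_LEVELS = range(1, 6)
CONSEQUENCE_LEVELS = range(1, 6)


def evaluate(likelihood: int, consequence: int) -> Priority:
    """Map one (likelihood, consequence) cell to a treatment decision.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    if likelihood not in LIKELIHOOD_LEVELS or consequence not in CONSEQUENCE_LEVELS:
        raise ValueError("levels must be integers from 1 to 5")
    # Assumption: a catastrophic consequence requires treatment regardless
    # of likelihood, so rare but extreme risks are never marked acceptable.
    if consequence == 5:
        return Priority.TREAT
    score = likelihood * consequence
    if score >= 12:
        return Priority.TREAT
    if score >= 5:
        return Priority.MONITOR
    return Priority.ACCEPTABLE


# Print the matrix as a heat map: highest likelihood in the top row,
# consequence increasing from left to right (A/M/T = first letter of priority).
for likelihood in reversed(LIKELIHOOD_LEVELS):
    row = [evaluate(likelihood, c).value[0].upper() for c in CONSEQUENCE_LEVELS]
    print(likelihood, " ".join(row))
```

Using a multiplicative score is only one convention; a lookup table filled in cell by cell by the responsible team would serve the same purpose and avoids implying that the ordinal levels can be multiplied meaningfully.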
When and how to conduct risk assessments. AGI companies should continuously and iteratively assess risks. Pre-deployment risk assessments are particularly important, but AGI companies should also conduct pre-training risk assessments. There is no specific trigger for each of the techniques discussed in this paper. Instead, they can all be applied in various situations, each technique with a different focus or function. In a given situation, AGI companies should avoid relying on a single technique and instead use several techniques to gain a more complete understanding of risks. Finally, AGI companies need to set up structures and processes to ensure that the results of risk assessments actually have a bearing on decisions.

Table 1: Overview of the selected risk assessment techniques (technique and recommendation level; explanation; reasons for including)

Scenario analysis (recommended)
  Explanation: Involves developing future scenarios and analyzing them for risks
  Reasons for including: Focuses on future developments; uses forward reasoning
Fishbone method (recommended)
  Explanation: Involves drawing a diagram from a risk to its sources by repeatedly asking "why?" or "how might that occur?"
  Reasons for including: Provides structure and visualization for brainstorming; uses backward reasoning; simple
Risk typologies and taxonomies (strongly recommended)
  Explanation: Categorizations of risks; conceptually vs. empirically derived
  Reasons for including: Organizes the risk universe; can provide helpful input to many other techniques
Causal mapping (recommended)
  Explanation: Involves drawing a map of causes and consequences regarding a specific issue, including their interactions
  Reasons for including: Helps to combine different viewpoints; focuses on future developments; takes into account interactions between events; allows remote and anonymous participation
Delphi technique (strongly recommended)
  Explanation: Procedure to collect and collate expert judgments, mostly forecasts
  Reasons for including: Helps to combine different viewpoints; focuses on future developments; can be quantitative; allows remote and anonymous participation
Cross-impact analysis (encouraged)
  Explanation: Involves breaking an issue down into contributing events, gathering and analyzing expert opinions on their likelihood; can yield possible and likely future scenarios
  Reasons for including: Helps to combine different viewpoints; focuses on future developments; takes into account interactions between events; can be quantitative; allows remote and anonymous participation
Bow tie analysis (recommended)
  Explanation: Involves drawing a diagram of causes and consequences of a risk, as well as preventive and reactive controls
  Reasons for including: Focuses on controls; simple
System-theoretic process analysis (STPA) (encouraged)
  Explanation: Examines a complex system for safe and unsafe states; involves backward reasoning from undesired events to how controls may not have the desired effect
  Reasons for including: Focuses on controls; comprehensive system-theoretic perspective
Checklists (encouraged)
  Explanation: Standardized questionnaires containing a list of open or closed questions
  Reasons for including: Allow decentralization of risk evaluation
Risk matrices (strongly recommended)
  Explanation: Matrices that combine consequence and likelihood of risks, or consequence of risks and vulnerability of the system at stake
  Reasons for including: Enable comparisons and prioritization

1 Introduction

Companies like OpenAI, Google DeepMind, and Anthropic have the stated goal of building artificial general intelligence (AGI) – AI systems that perform as well as or better than humans on a wide variety of cognitive tasks. While it remains unclear when, if at all, AGI will be built, and forecasting AI progress is inherently difficult,[1] the prospect of AGI is increasingly taken seriously. AGI has reached the mainstream academic discourse (e.g. Pei et al., 2019; Fei et al., 2022; Mahler, 2022; Roli et al., 2022; Salmon et al., 2023), is widely covered in the news (e.g. Yudkowsky, 2023; Hogarth, 2023; Klein, 2023; Metz, 2023), and is on the agenda of governments around the world (e.g. The White House, 2023; HM Government, 2021; NITI Aayog, 2018).

AI systems already cause significant harm. For example, some facial recognition systems discriminate against women and people of color (Buolamwini & Gebru, 2018; Raji & Buolamwini, 2019). Language models can produce racist, sexist, and homophobic outputs (Bolukbasi et al., 2016; Bender et al., 2021; Weidinger et al., 2022), or can be used for disinformation campaigns (Buchanan et al., 2021) and cyberattacks (Brundage et al., 2018; Hazell, 2023), while image generation systems can be used to create harmful content, such as non-consensual deepfake pornography (Westerlund, 2019). These and other risks must be taken seriously and warrant significant attention.

In addition to existing risks, there are increasing concerns about future catastrophic risks, including human extinction. In a recent statement, hundreds of leading AI scientists (e.g. Turing Award winners Geoffrey Hinton and Yoshua Bengio), the CEOs of leading AI companies (e.g. Sam Altman and Demis Hassabis), and other prominent figures (e.g. Bill Gates) claim that “mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war” (Center for AI Safety, 2023). AGI companies need to take adequate measures to manage these risks. Relevant risks need to be assessed, and adequate mitigations and controls need to be put in place.
This paper focuses on risk assessment, which according to ISO 31000:2018 includes the identification, analysis, and evaluation of risks. The results of risk assessments inform decisions about risk treatment (ISO, 2018). In a recent survey of leading experts from AGI companies, academia, and civil society (N = 51), 98% of respondents somewhat or strongly agreed that AGI companies should conduct pre-deployment risk assessments, while 94% thought that they should also conduct pre-training risk assessments (Schuett et al., 2023). Risk assessments could also become an essential part of regulatory frameworks (Anderljung et al., 2023; Shevlane et al., 2023). Our paper aims to provide guidance on how AGI companies could conduct risk assessments with regard to catastrophic risks from AI. We review popular risk assessment techniques from other industries and suggest ways in which AGI companies could use them. It is worth noting that many of the reviewed techniques will be obvious to risk management professionals in other industries. This is a “feature, not a bug”. The goal of this paper is to make sure that AGI companies are aware of best practices in other industries. We assume that some AGI companies already use some of the techniques in some situations, but we also expect there to be gaps. We also emphasize that these techniques will not be sufficient to assess catastrophic risks from AI.

The paper has four areas of focus. First, it focuses on catastrophic risks from AI. However, the selected techniques can also be used to assess other risks. Importantly, catastrophic risks may be intertwined with other risks, such that assessing them in isolation may not provide the full picture. Second, the paper focuses on how AGI companies can use these techniques. Needless to say, other actors like academics or governments may also use them to assess catastrophic risks from AI. Third, the paper only reviews existing techniques.
We do not develop new techniques and only slightly adapt existing ones; developing new techniques would require significant effort and is beyond the scope of this paper. Finally, the paper focuses on general techniques. We do not review techniques that have been developed specifically for AI (e.g. evaluations of dangerous model capabilities and propensities, or “evals”).[2] We also excluded established techniques from the field of information security (e.g. red teaming and bug bounties).[3]

The paper reviews techniques that AGI companies could use to assess catastrophic risks from AI. Let us define each of these terms. By “AGI”, we mean AI systems that perform as well as or better than humans on a wide variety of cognitive tasks.[4] By “AGI company”, we mean companies that have the stated goal of building AGI (Schuett et al., 2023). “Risk” can be defined as the possibility that an undesired event will occur (COSO, 2017).[5] “Risk sources” are the causes that alone or in combination give rise to risks (ISO, 2018).[6] “Risk assessment” involves the identification, analysis, and evaluation of risks (ISO, 2018). Note that our use of risk-related terminology follows leading risk management standards, especially ISO 31000:2018 (ISO, 2018), IEC 31010:2019 (IEC, 2019), and COSO ERM 2017 (COSO, 2017). By the term “catastrophic risk” we loosely mean the risk of widespread and significant harm, such as several million fatalities or severe disruption to the social and political global order (Bostrom & Ćirković, 2008; Posner, 2004; Rees, 2004; Shevlane et al., 2023). This includes “existential risks”, i.e. the risk of human extinction or permanent civilizational collapse (Bostrom, 2002, 2013; Ord, 2020).[7]

The paper proceeds as follows. Section 2 reviews related work. Section 3 describes our method for selecting risk assessment techniques. Section 4 discusses three risk identification techniques: scenario analysis, fishbone method, and risk typologies and taxonomies. Section 5 discusses five risk analysis techniques: causal mapping, Delphi technique, cross-impact analysis, bow tie analysis, and system-theoretic process analysis (STPA). Section 6 discusses two risk evaluation techniques: checklists and risk matrices. Section 7 discusses when to conduct risk assessments, when to use which technique, and how to use any risk assessment technique. Section 8 concludes with a summary of our main contributions, limitations, and suggestions for further research. The Appendix lists popular risk assessment techniques we did not select.

[1] Some attempts include expert surveys (e.g. V. C. Müller & Bostrom, 2016; Grace et al., 2018; Stein-Perlman et al., 2022; Zhang et al., 2022), comparisons with the development of intelligence in humans (Cotra, 2020), and comparisons with developments of other technologies (Davidson, 2021).
[2] For example, before the release of GPT-4 and Claude, OpenAI and Anthropic gave the Alignment Research Center early access to the models to evaluate their ability to autonomously replicate and acquire resources (OpenAI, 2023; Anthropic, 2023; Alignment Research Center, 2023). For more information on model evaluations, see Shevlane et al. (2023).
[3] AGI companies increasingly engage red teams, which adopt an attacker’s mindset to test the organization’s security (Applebaum et al., 2016; Brundage et al., 2020; OpenAI, 2023). Google DeepMind is part of Google’s and Alphabet’s “Vulnerability Reward Program” (Google, n.d.), and OpenAI recently announced its “Bug Bounty Program” (OpenAI, 2023). Under the term “red teaming”, AGI companies also try eliciting harmful outputs from their AI systems (Ganguli et al., 2022; Mishkin et al., 2022; Perez et al., 2022), efforts that fit better under terms like “boundary or stress testing” (Khlaaf, 2023).
[4] The term “AGI” lacks a universally accepted definition. According to Goertzel (2011), Gubrud (1997) was the first to use the term, defining it as “AI systems that rival or surpass the human brain in complexity and speed, that can acquire, manipulate and reason with general knowledge, and that are usable in essentially any phase of industrial or military operations where a human intelligence would otherwise be needed”. The term “AGI” is related to concepts of “strong AI” (Searle, 1980), “superintelligence” (Bostrom, 1998, 2014), and “transformative AI” (Dafoe, 2018; Gruetzemacher et al., 2019; Karnofsky, 2016).
[5] Other common definitions of the term “risk” in risk management comprise “effect of uncertainty on objectives” (ISO, 2018) as well as “combination of the probability of occurrence of harm and the severity of that harm” (ISO & IEC, 2014).
[6] A risk source may have its own sources, and so on. There can also be complex interactions between different risk sources. As a result, it often makes sense to assess risk sources as intermediate risks themselves. In the following, we generally use the term “risks” to mean both ultimate risks and risk sources as intermediate risks. If we refer to either ultimate risks or risk sources only, we make this explicit.
[7] There are various ways in which AI might cause or contribute to catastrophes. A basic distinction can be drawn between cases where humans intend to cause harm (misuse risks), cases where humans do not intend to cause harm (accident risks), and risks involving complex interactions of economic, political, societal, and other forces (structural risks) (Zwetsloot & Dafoe, 2019). First, malicious actors like terrorists could use AI systems to gain access to or create toxins, pathogens, or weapons (e.g. Brundage et al., 2018; Urbina et al., 2022; Anderljung & Hazell, 2023). Second, AI systems do not necessarily do what their developers or operators want them to do. When this issue is related to the goals they pursue, it is referred to as “misalignment”. If a highly capable AI system were power-seeking, i.e. aiming to accumulate resources, it might eventually take control and sideline or even completely neglect human concerns (e.g. Yudkowsky, 2008; Bostrom, 2014; Russell, 2019). Third, many structural catastrophic risk scenarios from AI have been outlined by, for example, Dafoe (2020), Bucknall and Dori-Hacohen (2022), and Clarke and Whittlestone (2022). Particularly important risk sources may be competitive dynamics that lead AGI companies to cut corners on safety and other desirable features (Hendrycks et al., 2023) and various dangerous model capabilities and propensities (Shevlane et al., 2023), such as “agenticness” (Chan et al., 2023). For existential risks from AI in particular, Kokotajlo and Wei (2019) have brainstormed an extensive but unstructured list of risk sources.

2 Related work

In the following, we give an overview of previous work that reviews or applies established risk assessment techniques to catastrophic risks from AI.

Risk identification techniques. Many expert surveys have been conducted about the trajectory of AI progress and the associated risks, including catastrophic ones (e.g. Baum et al., 2011; Clarke et al., 2021; Grace et al., 2018; Michael et al., 2022; V. C. Müller & Bostrom, 2016; Stein-Perlman et al., 2022; Zhang et al., 2022). Clarke (2022) and researchers from Google DeepMind (Kenton et al., 2022a) have developed risk typologies for existential risks from AI. Several other typologies involve, but are not limited to, catastrophic risks from AI (Avin et al., 2018; Cotton-Barratt et al., 2020; Liu et al., 2018).
Yampolskiy (2015) and Critch and Russell (2023) have developed risk taxonomies for the development of a dangerous AI system, while Critch and Krueger (2020) have done the same for the deployment of a misaligned power-seeking AI. Rao et al. (2023) have proposed a risk taxonomy for user prompts that elicit model outputs unwanted by the model's developers. Beyond catastrophic risks, many taxonomies covering societal risks from AI more broadly have been developed by independent researchers (e.g. Center for Security and Emerging Technology, n.d.; Khlaaf, 2023; Newman, 2023; Raji et al., 2022; Suresh & Guttag, 2021) as well as by researchers from different AGI companies (e.g. Microsoft, 2020; Shelby et al., 2023; Weidinger et al., 2022). Failure modes and effects analysis (FMEA) has been adapted and applied to risks from AI by Li and Chignell (2022) and Raji et al. (2020), hazard and operability study (HAZOP) by Zendel et al. (2015), and preliminary hazard analysis (PHA) by Khlaaf et al. (2022).

Risk analysis techniques. Established risk analysis techniques that have been applied to the superintelligence takeover scenario include influence diagrams (Barrett & Baum, 2017a), fault tree analysis (FTA) (Barrett & Baum, 2017a, 2017b), and event tree analysis (ETA) (Barrett & Baum, 2017b). Furthermore, causal mapping has been used by Cremer and Whittlestone (2021) to determine warning signs of transformative AI, as well as by Losi (2023) to identify feedback loops when regulating AI. Kilian et al. (2023) have applied cross-impact analysis to establish correlations between relevant factors in the global socio-technical AI landscape and to generate future scenarios of it. Chin (2022a) has used bow tie analysis to map controls with regard to the deployment of a harmful AI system.

Risk evaluation techniques. Hendrycks and Mazeika (2022, Annex C) have suggested a checklist for assessing the impact of a potential research project on existential risks from AI. Moreover, Barrett et al. (2023a, Section 3.2.2.1.1) have provided a starting point for a pre-development or pre-deployment checklist for catastrophic risks from AI. Apart from these, only impact assessment checklists for societal risks from AI in general have been developed, for example on trustworthy AI by the EU High-Level Expert Group on AI (2020), on AI fairness by Madaio et al. (2020), and on responsible AI by Microsoft (2022b). Similarly, several risk matrices have been developed for societal risks from AI in general rather than for catastrophic risks specifically (e.g. Khlaaf, 2023; Microsoft, 2022a).

Review of several risk assessment techniques. The work most similar to this paper is a blog post that provides a useful but much shallower review of several risk assessment techniques (Chin, 2022b). In the academic literature, only Barrett and Baum (2017b) have reviewed several risk assessment techniques. They suggest FTA and ETA for assessing catastrophic risks from AI. We believe that FTA and ETA on their own may be too simplistic to provide deep insights into catastrophic risks from AI, although variations of them may be very useful. Catastrophic risks from AI involve not only failures of individual system components, but complex technical, economic, political, and societal factors, events, and their interactions (Khlaaf, 2023; Leveson & Thomas, 2018; see also Section 3 and Section 5.5). In contrast, FTA and ETA involve drawing logical diagrams of causes and consequences of risks, which simplifies risk sources and their interactions into binary events and linear causal chains. Nevertheless, we mention event trees as a tool for developing scenarios (Section 4.1), and include the fishbone method (Section 4.2), which can be regarded as a special type of FTA, as well as bow tie analysis (Section 5.4), which can be understood as involving both fault and event trees.
However, these techniques go beyond mere FTA and ETA (scenario analysis can employ other methods to develop scenarios; the fishbone method provides structure by establishing categories of causes; bow tie analysis focuses on controls and includes ongoing practices), and we also highlight their simplicity as a major limitation. Moreover, Barrett and Baum (2017b) do not state the criteria they used to select techniques or which other techniques they considered. Their paper is also limited to the (at the time predominant) superintelligence takeover scenario, in which a single extremely capable AI gets out of hand and destroys humanity. Finally, it is not specifically targeted at AGI companies, but at decision-makers in general.

Gap in the literature. Overall, we observe extensive interest in the topic of risk assessment techniques for catastrophic risks from AI. International organizations have started to issue frameworks for risk assessment at AI companies, but they are not tailored to catastrophic risks and lack concreteness (Xia et al., 2023). More actionable efforts so far have mostly focused on developing novel techniques specifically for catastrophic risks from AI, such as evals, or on applying a single established risk assessment technique to that context. Only Barrett and Baum (2017b) have attempted a review of several established risk assessment techniques, but their paper has several limitations (see previous paragraph). In conclusion, there is no comprehensive and up-to-date review of which established techniques could be useful for AGI companies to assess catastrophic risks from AI.

3 Methodology

This section describes our method for identifying and selecting risk assessment techniques. We used a leading risk assessment standard as a starting point and added popular techniques from other industries. We then defined criteria for excluding and prioritizing techniques, which we used to narrow down the list.
The techniques we selected can be found in Table 1. The techniques we excluded can be found in the Appendix 8.

Popular risk assessment techniques. The standard IEC 31010:2019 contains a list of some of the most popular risk assessment techniques across industries (IEC, 2019). To find additional popular techniques, we reviewed risk assessment techniques in finance, aviation, nuclear, and biolabs (e.g. by investigating guidelines from international agencies and textbooks, and consulting with risk management experts). We chose these four industries because they have a long tradition of risk assessment, the latter three being safety-critical industries. While this approach found more than 100 techniques, most of the popular ones were already contained in IEC 31010:2019.

Criteria for excluding techniques. Next, we used four criteria to narrow down the list. First, catastrophic risks affect society as a whole, not only the organization itself. Techniques therefore need to be applicable to societal risks, not just business risks. Second, catastrophic risks from AI are generally considered low-probability, high-impact events (see V. C. Müller & Bostrom, 2016; Grace et al., 2018; Stein-Perlman et al., 2022; Zhang et al., 2022). We therefore excluded techniques that neglect tail risks. Third, some techniques from industries like aviation, nuclear, or biolabs focus on human performance reliability in routine tasks. Currently, there seem to be no routine employee tasks at AGI companies that, if performed incorrectly, could lead to catastrophe. Therefore, we excluded techniques that aim to ensure that humans perform reliably on routine tasks. However, if this changes in the future (e.g. if employees are tasked with overseeing a dangerous AI system or its usage), these techniques should be reconsidered. Fourth, many techniques from industries like aviation or nuclear have been developed to assess hardware reliability.
We assume hardware failures to be much less critical for catastrophic risks from AI. Most of these techniques can also be applied to issues other than hardware reliability. Yet, because AI systems and the risks they pose are highly complex, the techniques may not be well-suited for this context (Khlaaf, 2023; Leveson & Thomas, 2018). We therefore excluded many of these traditional techniques. However, if hardware becomes more critical in the future (e.g. if hardware-enabled mechanisms are implemented, if one AI system is used to keep another dangerous AI system in check, or if AI systems become part of critical infrastructure), these techniques should be reconsidered.
[Figure 2: Event tree for the political situation of a country (Kosow & Gaßner, 2008). Starting from the current negotiation, yes/no branches ask “Is a settlement negotiated?”, “Is the transition rapid and decisive?”, and “Are the policies sustainable?”, leading to four scenarios: OSTRICH (non-representative government), LAME DUCK (long transition), ICARUS (macro-economic populism), and FLIGHT OF THE FLAMINGO (inclusive growth and democracy).]
Criteria for prioritizing techniques. Then, we used the following criteria to identify techniques which are particularly promising. Catastrophic risks from AI are highly complex. They involve various technical, economic, political, and societal factors, events, and their interactions (Dobbe, 2022; Khlaaf, 2023). Therefore, we prioritized techniques that can be used to examine complex interactions between events. For this reason, we included system-theoretic process analysis (STPA) (Section 5.5), even though it is not listed by IEC (2019). Because of the complexity of catastrophic risks from AI, we also prioritized techniques that help combine the viewpoints of a variety of people with different knowledge and perspectives. This may include experts with different backgrounds (e.g. technical, economic, or political), or employees with different positions in the organization (e.g. researchers, engineers, or managers).
Furthermore, catastrophic risks from AI are marked by high uncertainty. They have never happened before and most likely will not happen twice (although some risk sources might). We therefore prioritized techniques that provide clarity on future developments. Quantifying the likelihood of catastrophic risks from AI may be especially challenging (Baum, 2020; Beard et al., 2020a, 2020b). We therefore prioritized qualitative over quantitative techniques. However, quantification may be useful to concretize concerns and enable better comparisons and communication about risks, so it should at least be attempted (Baum, 2020; Beard et al., 2020a, 2020b; see also Section 2 and Section 5.5). We therefore included some quantitative techniques, too. Finally, we aimed for a variety of techniques, and selected a representative technique when several very similar techniques existed.⁸
4 Risk identification
Risk identification is the first step in the risk assessment process (IEC, 2019; ISO, 2018). AGI companies should try to identify the risks of specific models, not only before deploying them but also before training them (Schuett et al., 2023). In addition, they should try to identify all relevant risks in the abstract (the so-called “risk universe”). Based on our selection criteria, the following three risk identification techniques seem particularly promising: scenario analysis (Section 4.1), the fishbone method (Section 4.2), and risk typologies and taxonomies (Section 4.3).
4.1 Scenario analysis
In a scenario analysis, organizations develop and analyze future scenarios of the environment they operate in and plan accordingly (e.g. how competitive the “AGI market” will be in two years). They develop scenarios, for example, by combining driving forces (e.g. the number of AGI companies, their business models, moats, etc.) and then analyze these scenarios by thinking through their implications for risks.
Scenario analysis is often used by organizations for long-term and emergency planning (IEC, 2019).
⁸ We followed IEC 31010:2019 in assigning the selected techniques to one of the three risk assessment steps (IEC, 2019). However, some of them comprise elements of several risk assessment steps, and some of them can also be used for other purposes entirely (like general forecasting).
Table 2: Scenario matrix for the competitive structure of an industry (Kosow & Gaßner, 2008)
Natural gas development (rows) × government approach to energy industry (columns) | Dirigiste | Laissez-faire
Favorable | Scenario 1 | Scenario 2
Unfavorable | Scenario 3 | Scenario 4
How it works. Scenario analysis has many variations, but the basic procedure is usually the same (IEC, 2019; Chermack, 2011; Kosow & Gaßner, 2008; van der Heijden, 2005). First, organizations develop one or several plausible future scenarios with regard to a specific issue. This step can involve gathering expert forecasts or using statistical methods, but it can also consist of mere brainstorming and group discussions. A particularly common approach is to identify a small number of driving forces, i.e. key events or trends, and combine them in all possible consistent ways. In the end, organizations should have a list of scenarios and a story for each of them about how it could unfold. For example, with regard to the competitive structure of an industry, the driving forces could be developments in the supply of relevant materials and government approaches (Kosow & Gaßner, 2008). To combine these driving forces, organizations may draw an event tree (a diagram that depicts a chronological sequence of binary events; see Figure 2) or construct a scenario matrix (a matrix that combines the two most important driving forces into all four possible scenarios; see Table 2) (Kosow & Gaßner, 2008). Second, in the same or in a separate workshop, organizations discuss each scenario for the risks it creates or exacerbates.
Again, this can be done through more or less structured brainstorming and group discussions. Third, moving beyond risk assessment into risk treatment, organizations discuss strategies and emergency response plans.
How AGI companies could use it. AGI companies could develop and analyze scenarios of, for example, the trajectory of AI progress, the “AGI market”, and the geopolitical AI landscape over the next few months or several years. With regard to AI progress, driving forces could be major technological breakthroughs, like eliminating model hallucinations. For the trajectory of the “AGI market”, important trends beyond the ones mentioned above may include the open-sourcing of models, the predominance of foundation models, and regulatory approaches. The geopolitical AI landscape could be largely influenced by events in China, the US-China relationship, and trends in the militarization of AI. The AI Index (https://aiindex.stanford.edu) as well as websites such as Epoch (https://epochai.org) and Our World in Data (https://ourworldindata.org/artificial-intelligence) contain data on some of these driving forces, for example, trends in the number of research publications, the amount of investment in AGI companies, and proposed regulation. AGI companies could pick the most important events and trends, combine them to generate scenarios, and discuss their implications for risks. In many cases, AGI companies can also directly analyze previously developed scenarios for the risks they entail (e.g. Davidson, 2023; Leung, 2019). AGI companies could also use scenario analysis to develop and analyze emergency situations. This may involve several group discussions in which participants would first need to come up with a list of worst-case scenarios (e.g. a model being leaked, evals revealing high situational awareness of a model, or a model “escaping” into the internet). Next, participants could analyze these scenarios for the downstream risks they entail (e.g.
the model being used for various undesired purposes, the model being more likely to be deceptive, or the model causing various types of harm). Finally, participants could develop emergency response plans (which is part of risk treatment and will thus not be elaborated on further in this paper).
Benefits. The use of driving forces ensures that scenarios build on existing information, even if it is scarce. This may lead to more realistic scenarios than mere brainstorming. Another benefit of this technique is that it can be used to systematically investigate different futures instead of focusing on a single scenario. Given the high stakes of catastrophic risks, AGI companies should aim to be “better safe than sorry” and plan ahead for a variety of possibilities. The scenarios developed can also be used to monitor whether things are moving in a dangerous direction. If events keep occurring as implied by a scenario, this can be understood as a warning sign (IEC, 2019; Etzioni, 2020; Cremer & Whittlestone, 2021).
[Figure 3: Simple fishbone diagram for a plane tire failure (Stolzer, Sumwalt, & Goglia, 2023). The head is “tire failure”; the bones are people, machine, environment, methods, and material; causes shown include mismatched tires, obstruction in wheel well, tires out of balance, faulty wheel bearing, wheel rotation problem, overly aggressive brake application, poor tire design, incorrect tire material, poor tire selection for conditions, poor runway surface, improper tire inflation, tire changing errors, and inadequate inspections.]
Limitations. On the downside, there is little evidence that scenarios developed through this technique actually occur (IEC, 2019). Therefore, they should not be relied on as predictions of how the future will unfold. Another limitation of scenario analysis is that it does not provide guidance on how to choose and combine driving forces. Instead, it largely hinges on the knowledge and expertise of the participants. It is thus crucial to select people with relevant backgrounds and skills.
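The technique leaves the choice of driving forces to the participants, but the mechanical step of combining them, described under How it works, is easy to make systematic. The Python sketch below assumes that each driving force has a short list of discrete states. It enumerates every combination, as in the scenario matrix of Table 2, and lets participants supply a consistency rule that drops combinations they judge implausible. The function name `build_scenarios` and its `is_consistent` parameter are hypothetical illustrations, not part of any published method. The two driving forces are taken from the Table 2 example.

```python
from itertools import product

# Driving forces and their possible states, taken from the Table 2 example
# (Kosow & Gaßner, 2008). The listing order determines the scenario numbering.
DRIVING_FORCES = {
    "natural gas development": ["favorable", "unfavorable"],
    "government approach to energy industry": ["dirigiste", "laissez-faire"],
}


def build_scenarios(forces, is_consistent=lambda scenario: True):
    """Combine every state of every driving force, keeping consistent combinations.

    `is_consistent` is a hypothetical hook where participants encode which
    combinations they judge implausible; by default all combinations are kept.
    """
    names = list(forces)
    scenarios = []
    for states in product(*(forces[name] for name in names)):
        scenario = dict(zip(names, states))
        if is_consistent(scenario):
            scenarios.append(scenario)
    return scenarios


if __name__ == "__main__":
    for number, scenario in enumerate(build_scenarios(DRIVING_FORCES), start=1):
        summary = "; ".join(f"{force}: {state}" for force, state in scenario.items())
        print(f"Scenario {number}: {summary}")
```

With two driving forces of two states each, the sketch reproduces the four cells of Table 2 in the same order. Adding a third two-state force, for example a hypothetical “number of AGI companies” force with the states “few” and “many”, doubles the count to eight before any consistency rule is applied. This illustrates why the procedure works with a small number of driving forces.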
Recommendations. AGI companies probably already develop and analyze future scenarios to inform business decisions. We recommend that they do so with a focus on catastrophic risks from AI. To that end, AGI companies should take a comprehensive approach, developing and analyzing a variety of scenarios. In particular, we recommend developing scenarios for different timeframes, from the next few months to several years, as well as for emergency situations. As a starting point, AGI companies could develop scenarios through brainstorming and group discussions. For issues that turn out to be very complex or important, AGI companies could consider more elaborate versions that involve expert forecasts or statistical methods. These could also be outsourced to external consultancies or research organizations.
4.2 Fishbone method
The fishbone method, also known as Ishikawa analysis or cause-and-effect diagram, helps organizations to identify sources of risks. In contrast to scenario analysis (Section 4.1), which typically uses forward reasoning, the fishbone method uses backward reasoning from an undesired event to its causes and sub-causes. The causal relationships are visualized in a diagram that resembles a fishbone (Figure 3). The fishbone method is a very popular and comparatively simple technique (IEC, 2019; Ishikawa, 1976).
How it works. The technique consists of the following steps (IEC, 2019; Ishikawa, 1976; Rausand & Haugen, 2020). To begin with, organizations choose a risk to be analyzed and place it as the “head” of the fish-like structure. For example, in aviation, this could be the failure of a plane tire (Stolzer et al., 2023). Next, they select the primary categories of causes for the risk and place them as the “bones” branching out from the “spine” of the fish. Commonly used categories include different combinations of the “6Ms”, such as machinery (equipment), manpower (people), milieu (environment), methods and processes, management, and money (Stolzer et al., 2023).
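The skeleton set up in these first steps (a head plus one bone per category) forms a tree, together with the causes and sub-causes that the questioning step attaches to it. The Python sketch below shows one minimal way to record such a diagram, assuming plain text labels. The class and method names (`Fishbone`, `Cause`, `category`, `empty_categories`) are hypothetical. Because each cause hangs from exactly one parent, the structure cannot express feedback loops between causes, which mirrors a limitation of the technique itself. The example uses the plane-tire case from Stolzer et al. (2023).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cause:
    """A node on a fishbone diagram; its sub-causes answer "why?" for it."""

    label: str
    sub_causes: list[Cause] = field(default_factory=list)

    def add(self, label: str) -> Cause:
        """Attach a sub-cause and return it, so the "why?" chain can continue."""
        child = Cause(label)
        self.sub_causes.append(child)
        return child


@dataclass
class Fishbone:
    """An undesired event (the head) plus one bone per category of causes."""

    head: str
    bones: dict[str, Cause] = field(default_factory=dict)

    def category(self, name: str) -> Cause:
        """Return the bone for a category, creating it on first use."""
        return self.bones.setdefault(name, Cause(name))

    def empty_categories(self) -> list[str]:
        """Categories with no causes yet: a cue to keep asking "why?"."""
        return [name for name, bone in self.bones.items() if not bone.sub_causes]

    def render(self) -> str:
        """Indented text view: the head first, then each bone and its cause chains."""
        lines = [self.head]

        def walk(node: Cause, depth: int) -> None:
            lines.append("  " * depth + "- " + node.label)
            for child in node.sub_causes:
                walk(child, depth + 1)

        for bone in self.bones.values():
            walk(bone, 1)
        return "\n".join(lines)


if __name__ == "__main__":
    diagram = Fishbone("Plane tire failure")
    for name in ["machinery", "manpower", "milieu", "methods", "management", "money"]:
        diagram.category(name)
    # Cause and sub-cause from the plane-tire example (Stolzer et al., 2023).
    diagram.category("machinery").add("Mismatched tires").add(
        "Old tire replaced by a newer version"
    )
    print(diagram.render())
    print("Still to examine:", diagram.empty_categories())
```

Listing the categories that are still empty is a simple aid for the completeness review described in the procedure. It does not replace the group discussion of which causes matter most.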
Then, organizations identify the causes and sub-causes within each category by repeatedly asking questions like “why?” and “how might that occur?”. Organizations repeat these questions until they no longer yield useful information. For example, within the category of machinery, a cause may be mismatched tires, a sub-cause of which may be the replacement of an old tire by a newer version (Stolzer et al., 2023). Once the diagram is complete, organizations review it to ensure consistency and comprehensiveness. Categories and causes should cover technical, human, and organizational aspects. Finally, organizations discuss the identified causes to determine which of them are most important.
How AGI companies could use it. AGI companies could use the fishbone method at the level of catastrophic risks or at the level of their sources. At the level of the ultimate risks, AGI companies could (literally) flesh out the many risk scenarios that have already been developed at a high level of abstraction (e.g. the “head” could be “takeover by a misaligned power-seeking AI”). At the level of risk sources, for example, they could examine the emergence of dangerous model capabilities and propensities one by one (e.g. the “head” could be “AI system has situational awareness” or “AI system seeks power”). The driving forces determined in scenario analysis (Section 4.1) as well as the classifications developed for risk typologies and taxonomies (Section 4.3) may provide or help to identify categories. When planning training runs or evals, AGI companies could use the fishbone method to investigate why various undesired events might happen during them. For example, they could examine how an “agentic” model could take advantage of the situation to avoid being shut down. Some categories may be the different dangerous model capabilities and propensities. How could each of them lead to the undesired event?
Other categories may be the researchers conducting the evals and the model evaluation process. How could the model take advantage of the researchers’ cognitive biases and other human characteristics? For instance, could the model convince a researcher not to shut it down? How could the model evaluation process contribute to the undesired event? For instance, are there any points in time when no one is checking what the model is doing, or other critical moments such as shifts between researchers? The fishbone method was originally developed to ensure product quality. In this case, the categories represent the main steps of the product development process, proceeding chronologically from left to right. Each step is examined for how it contributes to the risk under consideration (Ishikawa, 1976). AGI companies could use this chronological version of the technique to examine how the different steps of their model or product development pipeline affect risks (e.g. the different steps in training, or in the whole AI system lifecycle from planning to monitoring).
Benefits. The use of categories and the iterative questioning down to the roots of risks make the fishbone method less likely than mere brainstorming to miss risk sources. Its visualization further helps to understand and communicate risks and their sources to relevant stakeholders, such as researchers or leadership (IEC, 2019). In contrast to most other risk assessment techniques, the fishbone method uses backward reasoning. This makes it a valuable addition when it comes to assessing highly uncertain and complex catastrophic risks from AI (Section 7). Finally, the fishbone method is simple and less time-consuming than other techniques (Rausand & Haugen, 2020).
Limitations. The fishbone method does not account for interactions between risk sources that do not follow a linear causal chain, such as feedback loops or other synergies.
It is thus not suitable for analyzing risks that involve complex interactions between events (e.g. competitive dynamics). Moreover, the technique hinges on the selection of categories. If important categories are missed, so are the respective causes (IEC, 2019). In some cases, the fishbone method may aggregate and visualize information rather than generate it. Simply asking “why?” does not always spark new insights into complex issues that people have already spent considerable time reflecting upon.
Recommendations. We recommend that AGI companies use the fishbone method to take a systematic and thorough approach to identifying risk sources. Since the technique is simple and takes little time, AGI companies could simply try it out and see whether it is helpful. For example, they could investigate the sources of dangerous model capabilities and propensities in the abstract, or before pre-training, fine-tuning, or evaluating a new model.
4.3 Risk typologies and taxonomies
Risk typologies and taxonomies are categorizations of risks, derived either conceptually or empirically. They structure the entirety of previously identified risks and can help to identify additional risks by uncovering gaps (IEC, 2019). Risk management practitioners consider categorizations of risks a must-have (Pritchard, 2015; Stolzer et al., 2023).
How they work. While typologies are based on abstract concepts, taxonomies are based on empirical data (IEC, 2019). As an example of a risk typology, risk managers in aviation use the Human Factors Analysis and Classification System (HFACS), which splits the causes of human mistakes into latent conditions within the system and unsafe actions by individuals (HFACS, n.d.; Reason, 2000; Stolzer et al., 2023).
As an example of a risk taxonomy, risk management in biolabs makes use of a taxonomy by the International Committee on Taxonomy of Viruses, which categorizes viruses based on their genome, structure, and strategy of replication (Rozo et al., 2017). In practice, hybrid forms often blend risk typologies and taxonomies and may simply be referred to as “risk taxonomies” (IEC, 2019).
[Figure 4: Typology of existential risks from AI (Clarke, 2022). Top-level branches: takeover by misaligned power-seeking AI; AI exacerbates other sources of existential risk; AI exacerbates other existential risk factors; conflicts between powerful AI systems. Lower-level nodes include AI leading to the deployment of technology that causes extinction or unrecoverable collapse (e.g. engineered pandemics, full-scale nuclear war), AI making conflict more likely or severe (e.g. more destructive weapons, rapid unintentional escalation), AI degrading epistemic processes (e.g. misuse and structural effects of persuasion tools, reduced overall trust in information), and AI-enabled dystopia (e.g. stable totalitarianism, value erosion, a “lame” future).]
To develop a risk typology or taxonomy, organizations usually collect all risks and risk sources that have already been identified. Next, they attempt to structure this list in a useful way and fill potential gaps by ideating the missing items. How best to do this is very context-specific. Categorization schemes for risk typologies and taxonomies are typically intended to be mutually exclusive (i.e. each risk belongs to only one category) and collectively exhaustive (i.e. they cover all relevant risks). Risk typologies and taxonomies can have subcategories or zoom in on a particular category from another risk typology or taxonomy (IEC, 2019).
For example, unsafe actions by individuals as a cause of human mistakes can be split into errors and violations, which in turn are composed of decision errors, skill-based errors, and perceptual errors, as well as routine and exceptional violations (HFACS, n.d.; Reason, 2000; Stolzer et al., 2023). Risk typologies and taxonomies are often presented as tree diagrams or tables.
How AGI companies could use them. AGI companies already use typologies and taxonomies for some types of risks from AI. For example, researchers from Google DeepMind have created a taxonomy for risks from language models (Weidinger et al., 2022). We strongly suspect that OpenAI and Anthropic also have similar risk typologies and taxonomies, even if they do not make them public. (Note that the list of risks in the GPT-4 system card is explicitly not intended as a taxonomy; see OpenAI, 2023.) However, we could not find any public information about whether AGI companies also have typologies and taxonomies for catastrophic risks from AI. Since there is no empirical data on catastrophic risks from AI themselves, typologies are more suitable for the ultimate risks in this context, while taxonomies are more suitable for risk sources, for which some empirical data exists.
A number of researchers have already proposed typologies for catastrophic risks from AI. The most high-level typology splits catastrophic risks from AI into accident, misuse, and structural risks. For existential risks, Clarke (2022) distinguishes between takeover by misaligned power-seeking AI, AI exacerbating other existential risks, AI exacerbating other existential risk factors, and conflict between powerful AI systems (Figure 4). With regard to accident risks, researchers from Google DeepMind have developed a typology that distinguishes between technical sources of misalignment and paths to existential catastrophe (Table 3). They find that some combinations are missing (Kenton et al., 2022b, 2022a; Krakovna & Shah, 2023). AGI companies could develop additional typologies for misuse risks as well as for risks at the intersection of accident and misuse.
There are also typologies of catastrophic risks in general. For example, Cotton-Barratt et al. (2020) distinguish between origin (e.g. accident), scaling mechanism (e.g. cascading), and endgame (e.g. ubiquity). Similarly, Avin et al. (2018) distinguish between critical system affected (e.g. food chains), global spread mechanism (e.g. digital), and prevention and mitigation failure (e.g. cognitive biases). AGI companies could narrow these two typologies down to catastrophic risks from AI. While they already contain some catastrophic risks from AI, they may also be used to identify additional ones. For instance, Avin et al. (2018) only mention weaponization as a catastrophic risk from AI. However, AGI companies can use their typology to identify many other plausible combinations. One example would be food chains breaking down once they rely completely on AI systems because of automation bias, i.e. the belief that machines are more accurate or reliable than humans.
Table 3: Typology of accident risks (Kenton et al., 2022a)
Path to existential catastrophe | Source of misalignment: specification gaming (SG) | SG + goal misgeneralization (GMG) | GMG
Misaligned power-seeking (MAPS) | Cohen et al. (2022) | Carlsmith (2022), Christiano (2019, part 2), Cotra (2022), Ngo et al. (2023), Shah (2022) | Soares (2022), Hubinger (2022)
Interaction of multiple systems | Critch (2021), Christiano (2019, part 1) | ??
Table 4: Taxonomy of AI system failures (Raji et al., 2022)
AI system failures | Examples
Impossible tasks | Conceptually impossible; practically impossible
Engineering failures | Design failures; implementation failures; missing safety features
Post-deployment failures | Robustness issues; failure under adversarial attacks; unanticipated interactions
Communication failures | Falsified or overstated capabilities; misrepresented capabilities
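Once a typology is stored as a tree, the aims of mutual exclusivity and collective exhaustiveness described in How they work can be checked mechanically against already-identified risks. The Python sketch below encodes the HFACS breakdown of unsafe actions by individuals as nested dictionaries; this representation is an assumption. When risks are assigned to leaf categories, the sketch flags two problems. The first is a risk placed in more than one category. The second is a risk placed in a category the typology does not contain, which may point to a gap. The function names and the example incident reports are hypothetical.

```python
from collections import Counter

# HFACS breakdown of unsafe actions by individuals, as described in the text
# (HFACS, n.d.; Reason, 2000). An empty dict marks a leaf category.
UNSAFE_ACTIONS = {
    "errors": {
        "decision errors": {},
        "skill-based errors": {},
        "perceptual errors": {},
    },
    "violations": {
        "routine violations": {},
        "exceptional violations": {},
    },
}


def leaf_paths(tree, prefix=()):
    """Yield the path from the root to every leaf category, depth first."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


def check_assignments(tree, assignments):
    """Check (risk, category) pairs against the typology's leaf categories.

    Returns two sorted lists: risks assigned to a category that is not a leaf
    of the tree (a possible gap), and risks assigned to more than one distinct
    category (a breach of mutual exclusivity).
    """
    leaves = {path[-1] for path in leaf_paths(tree)}
    outside = sorted({risk for risk, category in assignments if category not in leaves})
    counts = Counter(risk for risk, _ in set(assignments))
    overlapping = sorted(risk for risk, n in counts.items() if n > 1)
    return outside, overlapping


if __name__ == "__main__":
    # Hypothetical incident reports, used only to exercise the checks.
    reports = [
        ("misread altimeter", "perceptual errors"),
        ("skipped checklist item", "routine violations"),
        ("skipped checklist item", "decision errors"),
        ("unclear radio call", "communication errors"),
    ]
    for path in leaf_paths(UNSAFE_ACTIONS):
        print(" > ".join(path))
    print(check_assignments(UNSAFE_ACTIONS, reports))
```

A risk flagged as outside the tree prompts either a new category or a reclassification, which corresponds to filling gaps by ideating missing items. A check of this kind cannot show that a typology is exhaustive, because it only sees risks that have already been identified.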
There are also a number of taxonomies for sources of catastrophic risks from AI. Yampolskiy (2015) and Critch and Russell (2023) have developed risk taxonomies for the development of a dangerous AI system. Yampolskiy (2015) distinguishes between timing (pre- and post-deployment) and the different motivations of the actors involved. Critch and Russell (2023) build on this and include diffuse responsibility among several actors. Critch and Krueger (2020) have developed a risk taxonomy for the deployment of misaligned power-seeking AI. They distinguish between uncoordinated deployment (relevant for risks caused by the interactions of several different systems by several different companies), unrecognized prepotence (understood as extremely high capability), unrecognized misalignment, involuntary deployment, and voluntary (e.g. malicious or indifferent) deployment. Rao et al. (2023) have come up with a risk taxonomy for ways in which users can prompt models to generate output that was not intended by the developers. They distinguish between the type of attack, namely instruction-based or non-instruction-based; the intent of the attack, which can be goal hijacking, prompt leaking, or denial of service; and the manner of the attack, that is, whether the attack is conducted by a user or by a separate “man-in-the-middle” who alters the input of a user. AGI companies could develop further taxonomies of sources of catastrophic risks from AI. In order to cover all risk sources, AGI companies should also consider the whole AI system lifecycle (OECD, 2023; Newman, 2023; Suresh & Guttag, 2021). Useful categories may be risks stemming from data, compute, model architecture, pre-training, fine-tuning, evaluating, testing, deployment, and monitoring. More fine-grained categories may be risks related to different AI training techniques (e.g. supervised learning, self-supervised learning, reinforcement learning), AI model modalities (e.g.
language, vision, or multimodality), and AI system applications (e.g. chat, search, code, research, or image generation). AGI companies should also include risks associated with all the main actors involved, such as engineers, researchers, managers, downstream developers, users, other AGI companies, governments, or the general public. Alternatively, AGI companies may start thinking backwards from different types of harm AI systems might cause (e.g. financial cost, damage to critical infrastructure, or fatalities), as has been done for risks from AI in general by researchers from Google DeepMind (Weidinger et al., 2022), Google (Shelby et al., 2023), and Microsoft (Microsoft, 2020). More generally, the various taxonomies developed for societal risks from AI may serve as blueprints or starting points for taxonomies focused on catastrophic risks from AI (e.g. Center for Security and Emerging Technology, n.d.; Khlaaf, 2023; Raji et al., 2022).
Benefits. Risk typologies and taxonomies have three main benefits. First, they can help to avoid blind spots (IEC, 2019). Without a structured approach to risk identification, organizations will likely miss risks. Second, risk typologies and taxonomies can help to build a common understanding of the risk landscape among different people within the AGI company (IEC, 2019). This is important because, for instance, both managers and researchers may need to be involved to effectively address risks. Therefore, they need to be on the same page about what constitutes a risk. Third, risk typologies and taxonomies can support other risk assessment techniques. For example, they are necessary to generate checklists (Section 6.1), and they can provide or help to identify the driving forces of an issue as part of scenario analyses (Section 4.1) and the causes of a risk as part of the fishbone method (Section 4.2).
Limitations. The main limitation of risk typologies and taxonomies is that creating them is very time-consuming.
As an illustration, Google DeepMind’s taxonomy for risks from language models has 23 authors (Weidinger et al., 2022). When it comes to catastrophic risks from AI, complexity and a lack of empirical data may further complicate the development of risk typologies and taxonomies. Another limitation is that, in practice, it is rarely possible to create risk typologies and taxonomies that are fully comprehensive. They will most likely miss some risks or risk sources. AGI companies should therefore avoid relying on a single risk typology or taxonomy.
Recommendations. We strongly recommend that AGI companies use risk typologies and taxonomies if they do not already do so. They should create and maintain at least two complementary typologies of catastrophic risks, because a single risk typology will likely miss important risks. For the same reason, they should also maintain several taxonomies for each individual risk source. AGI companies may want to zoom in on a number of risk sources that seem particularly important (e.g. dangerous model capabilities and propensities). All risk typologies and taxonomies should be reviewed and updated on a regular basis (e.g. every three or six months).
5 Risk analysis
Risk analysis is the second step in the risk assessment process. It aims to facilitate a deep understanding of the causes, consequences, and likelihood of risks (IEC, 2019). In the following, we discuss five risk analysis techniques: causal mapping (Section 5.1), the Delphi technique (Section 5.2), cross-impact analysis (Section 5.3), bow tie analysis (Section 5.4), and system-theoretic process analysis (STPA) (Section 5.5).
5.1 Causal mapping
Causal mapping is an exploratory technique that helps organizations to better understand complex interactions between different causes and consequences of risks. It involves a group of people collectively drawing a map of events and their relationships (IEC, 2019).
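A map of events and their relationships is, in effect, a directed graph. Part of the pattern-finding in a causal mapping exercise (looking for central events and feedback loops) can therefore be supported with a few lines of code. The Python sketch below assumes that the map is stored as an adjacency list. It uses the number of arrows touching an event as a crude stand-in for centrality, and it reports each simple cycle once as a candidate feedback loop. All events and arrows in the example are hypothetical illustrations, not findings. Dedicated causal mapping software may offer different or richer analyses.

```python
from collections import Counter

# Hypothetical causal map for illustration only: an arrow A -> B reads
# "A contributes to B". None of these links is an empirical claim.
CAUSAL_MAP = {
    "competitive pressure": ["faster releases"],
    "faster releases": ["competitors accelerate", "less time for evals"],
    "competitors accelerate": ["competitive pressure"],
    "less time for evals": ["undetected dangerous capability"],
    "situational awareness": ["undetected dangerous capability", "power-seeking behavior"],
    "undetected dangerous capability": ["deployment incident"],
}


def arrow_counts(graph):
    """Count arrows into and out of each event, highest first (a crude centrality)."""
    degree = Counter()
    for source, targets in graph.items():
        degree[source] += len(targets)
        for target in targets:
            degree[target] += 1
    return degree.most_common()


def feedback_loops(graph):
    """Return every simple cycle once, listed from its alphabetically smallest event."""
    loops = []

    def extend(start, node, path):
        for nxt in graph.get(node, []):
            if nxt == start:
                loops.append(list(path))  # an arrow back to the start closes a loop
            elif nxt > start and nxt not in path:
                # Only visiting events that sort after `start` reports each loop once.
                extend(start, nxt, path + [nxt])

    for start in sorted(graph):
        extend(start, start, [start])
    return loops


if __name__ == "__main__":
    for event, arrows in arrow_counts(CAUSAL_MAP)[:2]:
        print(f"{event}: {arrows} arrows")
    for loop in feedback_loops(CAUSAL_MAP):
        print(" -> ".join(loop + loop[:1]))
```

In this toy map, the only loop flagged is the race-like cycle through competitive pressure, faster releases, and competitors accelerating. Output like this is only a prompt for group discussion. Unsigned arrows do not say whether a loop is reinforcing or balancing, so a real map would need signed influences to make that distinction.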
How it works. To develop a causal map, organizations follow a particular sequence of steps (IEC, 2019; Ackermann et al., 2014; Bryson et al., 2004). First, they brainstorm events related to an important issue. For instance, relevant events with regard to risks when connecting renewables to a local electricity grid could be missed deadlines, an increase in energy prices, and tenants not signing up to the scheme (Ackermann et al., 2014). Second, organizations cluster the identified events based on similarities and relationships. For example, events may be related to suppliers, tenants, or project management (Ackermann et al., 2014). Third, organizations draw arrows between the events to show how they influence each other. For instance, the aforementioned risks may all contribute to higher project costs (Ackermann et al., 2014). Finally, organizations analyze the entire causal map for central events, clusters, feedback loops, and other patterns. For example, when connecting renewables to a local electricity grid, one major concern may be project cost overruns (Ackermann et al., 2014). To draw causal maps, organizations can use whiteboards and sticky notes, or software that has been developed specifically for that purpose (Wikipedia, n.d.).
[Figure 5: Segment of a causal map for risks when connecting renewables to a local electricity grid (Ackermann et al., 2014). Events in the segment include increased project and beyond costs, not meeting timescales and deadlines, losing match funding due to tenants not signing up to the scheme, lack of tenant understanding (e.g. switching heaters off), ongoing maintenance costs associated with nuisance call-outs, losing council funding, incurring penalties from regulators, missing the legislation deadline on emissions, and an increase in the cost of renewable energy technology.]
How AGI companies could use it. AGI companies could use causal mapping to build on other techniques which identify risks but neglect their interactions, such as scenario analysis (Section 4.1) or the fishbone method (Section 4.2). For example, AGI companies could use causal mapping to explore interdependencies between previously identified dangerous model capabilities and propensities. This may reveal how some of them enable, contribute to, or hinder the development of others, and could guide the focus of evals and of safety research efforts in general. Similarly, Cremer and Whittlestone (2021) have used causal mapping to investigate milestones for transformative AI. They found that internal representations, memory, and the ability to account for unobservable phenomena may be particularly crucial enablers, and thus warning signs (Figure 6). Causal mapping may be particularly helpful for exploring competitive dynamics, which involve many complex interactions. For example, AGI companies could draw a map of the effects of an external audit framework.
Among other things, this may provide information on the importance of standards or regulations, whether AGI companies should announce external audits (before or after they take place), and whether auditors should publish their findings. On a similar topic but at a higher level of abstraction, Losi (2023) has used causal mapping to identify feedback loops with regard to regulating risks from AI. For instance, the diagram shows how ineffective regulation may decrease support for new regulation (Figure 7).

Figure 6: Causal map of warning signs of transformative AI (Cremer & Whittlestone, 2021)

Figure 7: Causal map of feedback loops with regard to regulating risks from AI (Losi, 2023)

Benefits. Causal mapping reveals interactions between risks. Without a deliberate attempt to identify these, AGI companies will likely miss some (e.g. how situational awareness affects power-seeking behavior). This matters because interactions may be important for efficient and effective risk mitigation (e.g. for deciding which dangerous model capabilities or propensities to focus on with evals).
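The last step of causal mapping, scanning the finished map for central events and feedback loops, is the part that mapping software automates. The following is a minimal Python sketch of that step, assuming the map is stored as a list of cause-effect edges. The three edges into "higher project costs" follow the renewables example above; the edge from "higher project costs" back to "missed deadlines" is a hypothetical addition so that the loop search has something to find. Counting links per event is only a simple proxy for centrality, not the measure used by any particular tool.

```python
# Edges point from cause to effect. The first three follow the renewables
# example; the last one is hypothetical and exists only to create a loop.
EDGES = [
    ("missed deadlines", "higher project costs"),
    ("energy price increase", "higher project costs"),
    ("tenants not signing up", "higher project costs"),
    ("higher project costs", "missed deadlines"),  # hypothetical
]


def build_graph(edges):
    """Adjacency sets: event -> events it influences."""
    graph = {}
    for cause, effect in edges:
        graph.setdefault(cause, set()).add(effect)
        graph.setdefault(effect, set())  # make sure pure effects appear as nodes
    return graph


def link_counts(graph):
    """Number of links (incoming + outgoing) per event, a simple centrality proxy."""
    counts = {node: len(effects) for node, effects in graph.items()}
    for effects in graph.values():
        for effect in effects:
            counts[effect] += 1
    return counts


def feedback_loops(graph):
    """Find each simple cycle once, starting from its alphabetically first event."""
    loops = []

    def walk(start, node, path):
        for nxt in sorted(graph[node]):
            if nxt == start:
                loops.append(path + [start])
            elif nxt > start and nxt not in path:
                walk(start, nxt, path + [nxt])

    for start in sorted(graph):
        walk(start, start, [start])
    return loops


if __name__ == "__main__":
    graph = build_graph(EDGES)
    counts = link_counts(graph)
    print(max(counts, key=counts.get), counts)
    for loop in feedback_loops(graph):
        print(" -> ".join(loop))
```

The exhaustive loop search is fine for workshop-sized maps; for large maps, a graph library such as networkx (`simple_cycles`) would be the more practical choice.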
The technique is relatively simple and not very time-consuming. Using software for causal mapping has a number of additional benefits. First, it enables anonymous and remote participation (IEC, 2019). This lowers the barriers for leading experts to work together, and might even reduce concerns around business secrets in cases where these experts are employed by different AGI companies. Second, specialized software streamlines the whole process (Ackermann et al., 2014). Third, causal maps developed with the help of software can be more easily updated as new knowledge is gained (Cremer & Whittlestone, 2021).

Limitations. Directing a causal mapping exercise with a group can be somewhat challenging for the facilitator. The use of software can help with this to some extent (IEC, 2019). It should be noted, however, that most such software costs money (usually a few hundred to a few thousand dollars), although a free version that AGI companies could try out is often available. Finally, causal mapping is a qualitative technique, meaning it does not provide the likelihood of events. Some variations of it can be quantified, such as fuzzy causal mapping (Cremer & Whittlestone, 2021), but this is not how the technique is typically used.

Recommendations. We assume that AGI companies already use whiteboards for some types of causal mapping exercises. We recommend that they try out causal mapping as described in this paper, and specifically that they try out software tools. To reap as many benefits as possible from a causal mapping exercise, AGI companies may want to update the maps they have developed as new knowledge is gained.

5.2 Delphi technique

The Delphi technique is a particular process of collecting and collating expert judgment. The technique, which was originally developed for military purposes (RAND, n.d.), is typically employed in forecasting (IEC, 2019; Chapelle, 2019; Pritchard, 2015), including superforecasting (Good Judgment, n.d.).
In the risk analysis context, the Delphi technique can be used to assess the likelihood of risks.

How it works. The detailed procedure is as follows (IEC, 2019; Chapelle, 2019; Pritchard, 2015). First, organizations develop a set of questions that warrant expert input and foresight. For instance, questions could be about the probability or extent of specific events occurring within a given time horizon. As a concrete example, the Delphi technique has been used to estimate the percentage of electric cars within ten years (Johnson, 1976). Second, organizations send the questions to a number of experts (from a handful to a hundred or even more). This can be done through questionnaires or software developed specifically for that purpose (e.g. Good Judgment, n.d.). Third, the experts provide their answers to the questions independently, along with their reasoning. Fourth, organizations collect and share the responses among the experts, without revealing which response belongs to whom. This may take the form of sharing all responses or synthesizing them into a summary, but should always include the reasoning behind the responses. Fifth, the experts can reassess their initial answers based on the information provided. Organizations continue this cycle of sharing and revising responses until a consensus is reached or until opinions stop changing. Typically, about two to four rounds are conducted.

How AGI companies could use it. AGI companies could use the Delphi technique to inform particularly important decisions. For example, before deploying a model, AGI companies could use the Delphi technique to estimate the likelihood that the deployment provokes specific actions by competitors. OpenAI (2023) has already relied on professional forecasters in this situation. The forecasters predicted, among other things, that delaying the launch of GPT-4 would reduce competitive dynamics.
In this situation, the Delphi technique could be used to obtain probabilities for specific risks, like competitors releasing similar models, such as Google’s Bard (Hsiao & Collins, 2023), competitors receiving more investment (e.g. Wiggers, 2023), or even new AGI companies being founded, such as the European start-up “Mistral AI” (Bradshaw & Abboud, 2023). Moving beyond risk assessment into risk treatment, a natural addition may be to ask the experts to what extent measures like delaying the launch by different amounts of time would reduce the probabilities of these risks. Before pre-training or fine-tuning a model, AGI companies could use the Delphi technique to forecast the likelihood that various dangerous model capabilities and propensities emerge. To that end, the experts could be provided with the model architecture, the intended process (e.g. dataset, compute, loss function, optimizer, and hyperparameters), dangerous capabilities and propensities of similar models, and other information deemed relevant. They could then be asked to predict which dangerous model capabilities and propensities may appear or be reinforced by pre-training or fine-tuning as currently envisioned. The results could inform changes to the process or to the safeguards put in place to keep the model in check.

Benefits. The Delphi technique can be a fairly accurate way of estimating the likelihood of risks. For known quantitative values and near-term forecasting, it is usually more accurate than both the average of individual judgments and the results of unstructured group discussions (Rowe & Wright, 1999). Yet, its accuracy may depend to a large extent on the level of detail of the information exchanged between rounds, as well as on the subject-matter knowledge and forecasting skills of the experts (Rowe & Wright, 1999). There seems to be only one study on the long-term accuracy of the Delphi technique.
This involved a forecasting exercise on the mental health profession in the US spanning 30 years. Out of 18 scenarios suggested by the facilitators, participants correctly predicted whether they would occur in 14 instances, and for those, predicted the timing to within about 1 to 5 years (Parente & Anderson-Parente, 2011). Another benefit is the possibility of anonymous and remote participation (IEC, 2019).

Limitations. A major limitation of the Delphi technique is that expert forecasting in general has a number of inherent limitations, and is especially hard with regard to unprecedented events (Armstrong et al., 2014; de Neufville, 2023; Morgan, 2014). Therefore, the results of the Delphi technique should not be taken at face value. Another limitation is that external experts, if engaged, often need proprietary information. Without it, the technique amounts to something similar to a prediction market in which forecasters exchange arguments. While external experts could sign non-disclosure agreements (NDAs), some of them may not want to do so, and AGI companies may still not entrust them with the most sensitive information. Hiring external experts can also be costly. Furthermore, the Delphi technique can be very time-consuming. Depending on the number of experts and rounds, as well as the difficulty of the questions, it can take from a couple of days to weeks or even months. Overall, different versions of the technique are possible, but there is a trade-off between effort and thoroughness.

Recommendations. We strongly recommend that AGI companies use the Delphi technique to estimate the likelihood of key risks. The technique seems most warranted in especially important situations, such as before deploying, and potentially before pre-training, fine-tuning, or evaluating a new model (Schuett et al., 2023). We recommend that AGI companies involve external forecasters with a strong track record, such as Samotsvety (https://samotsvety.org).
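The round structure of the Delphi technique can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of real expert behavior: the anonymized summary is taken to be the median and interquartile range (IQR), a common but not universal choice; the hypothetical parameter `pull` stands in for how far experts revise toward the group median (in practice, revisions come from weighing each other's reasoning, which no formula captures); and the consensus and stability thresholds are illustrative values, not taken from the cited sources.

```python
import statistics


def summarize(estimates):
    """Anonymized summary shared with all experts: median and interquartile range."""
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    return statistics.median(estimates), q3 - q1


def run_delphi(initial, pull=0.5, consensus_iqr=0.05, min_change=0.01, max_rounds=4):
    """Simulate Delphi rounds until consensus, stable opinions, or the round limit.

    `pull` is a hypothetical stand-in for how far each expert moves toward
    the group median after reading the anonymized summary and reasoning.
    """
    estimates = list(initial)
    history = [summarize(estimates)]  # round 1: independent answers
    while len(history) < max_rounds:
        median, iqr = history[-1]
        if iqr <= consensus_iqr:
            break  # consensus reached
        revised = [e + pull * (median - e) for e in estimates]
        largest_change = max(abs(r - e) for r, e in zip(revised, estimates))
        estimates = revised
        history.append(summarize(estimates))
        if largest_change < min_change:
            break  # opinions have stopped changing
    return estimates, history


if __name__ == "__main__":
    initial = [0.05, 0.10, 0.20, 0.30, 0.60]  # hypothetical round-1 probabilities
    _, history = run_delphi(initial)
    for round_no, (median, iqr) in enumerate(history, start=1):
        print(f"round {round_no}: median={median:.2f}, IQR={iqr:.3f}")
```

With these hypothetical inputs the spread shrinks each round while the median stays put, and the run stops at the four-round limit, in line with the two to four rounds typically conducted.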
Figure 8: Cross-impact matrix (a) and interdependency map (b) of the sub-technologies of humanoids (Kim et al., 2016)

5.3 Cross-impact analysis

Cross-impact analysis is another technique that helps organizations better understand interactions between different events that contribute to a risk. In contrast to causal mapping (Section 5.1), which determines causal influences, cross-impact analysis establishes correlations. Cross-impact analysis gathers expert forecasts on the likelihood of events, similar to the Delphi technique (Section 5.2), but also takes into account the effects of other events that might occur. It can also be considered an elaborate way to develop scenarios for scenario analysis (Section 4.1) (IEC, 2019; European Foresight Platform, n.d.; Gordon, 1994).

How it works. Cross-impact analysis is a very complex technique. In essence, it starts by breaking down an issue (e.g. advances in AI) into potentially contributing events (e.g. advances in hardware or algorithms). It then involves gathering expert forecasts on the likelihood of each of these events occurring, both unconditionally and conditional on each of the other events occurring. Running a computer analysis on these estimates generates a map that depicts the strength of the relationships between the different events, as well as future scenarios of how the whole issue may develop.
Different versions of the technique that require different levels of effort are possible (IEC, 2019). The simplest option is to rely on software developed specifically for that purpose. While some software is available for free (e.g. Centre for Interdisciplinary Risk and Innovation Research, n.d.), consulting firms may also provide software for purchase. The detailed steps of cross-impact analysis are as follows. To begin with, organizations create a matrix of the events they assume to contribute to an important issue. The matrix should contain all events on both axes. In an example from the literature, the relevant events for the creation of humanoids, i.e. robots that resemble a human body, were taken to be advances in the necessary sub-technologies (Kim et al., 2016). Then, to fill the matrix, organizations ask experts about the likelihood of each event occurring within a given time horizon, both independently and conditional on each of the other events occurring. These estimates can be quantitative (i.e. probabilities) or qualitative (i.e. indications on a scale, e.g. from -3 to 3) (IEC, 2019; European Foresight Platform, n.d.; Gordon, 1994). Afterwards, organizations use software or statistical methods, usually Monte Carlo simulations, to check the consistency of the estimates provided by the experts. The resulting “cross-impact matrix” displays stronger and weaker correlations between events and can be visualized in an “interdependency map”. For the creation of humanoids, some sub-technologies were found to be much more strongly correlated than others (Kim et al., 2016). Finally, again with the help of software or statistical methods, organizations can generate likely future scenarios. They can also perform so-called “sensitivity analyses”. To do so, they change specific inputs, i.e. aggregated expert opinions on independent or conditional likelihoods, and observe how this affects the cross-impact matrix, the interdependency map, and the scenarios.
This can help identify the most influential events, i.e. those whose changes have the largest overall effects (IEC, 2019; European Foresight Platform, n.d.; Gordon, 1994).

How AGI companies could use it. AGI companies could use cross-impact analysis to inform particularly important decisions. For example, when planning their business strategy for the next one or several years, AGI companies could use cross-impact analysis to gain a better understanding of potential competitive dynamics over this period. First, AGI companies would need to assemble potentially contributing events. Such events may include, among other things, new model deployments by other AGI companies, the opening of funding rounds by existing AGI companies, and the founding of new AGI companies. Second, AGI companies could ask experts about the likelihood of each of these events occurring, and about the effect of one event happening on the likelihood of each of the other events occurring. Based on the estimates, they could create a cross-impact matrix and draw an interdependency map. Finally, generating future scenarios and performing sensitivity analyses may be valuable. For instance, AGI companies could test how the decision to release a new model would influence all other events and the likely scenarios. Similarly, Kilian et al. (2023) have used cross-impact analysis to generate future scenarios of the global socio-technical AI landscape. They inferred four clusters of possible futures: slow progress and decentralized diffusion of AI technology in an environment of national protectionism and isolation; moderate progress and multipolar diffusion leading to a transition of power from nation states to leading AI companies; moderate progress and decentralized diffusion of various low-capability AI systems in different parts of the economy; and fast, centralized progress originating in a non-Western country. The authors found that each cluster of scenarios implies different risks.
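The Monte Carlo step of cross-impact analysis can be sketched in Python for the competitive-dynamics example above. Every event name and number below is a hypothetical input, not an expert estimate. In each run, events are decided in a random order (the company's own decision can be pinned to go first); when an event occurs, it shifts the probabilities of the events it impacts. The adjustment rule is an assumption: published cross-impact methods use various rules, and this sketch uses an odds-ratio update, chosen because a single occurrence then moves a target exactly to the expert's conditional estimate. Comparing the simulated probabilities with the experts' original inputs is one simple way to spot inconsistent estimates, and the `sensitivity` helper repeats the simulation with one input changed.

```python
import random
from collections import Counter

# Hypothetical inputs: names and numbers are illustrative, not expert estimates.
INITIAL = {  # unconditional probabilities over the planning horizon
    "we release model": 0.5,
    "rival deploys model": 0.4,
    "funding round opens": 0.5,
    "new AGI company founded": 0.2,
}
CONDITIONAL = {  # CONDITIONAL[a][b]: estimated P(b | a occurred)
    "we release model": {"rival deploys model": 0.7, "funding round opens": 0.6},
    "rival deploys model": {"funding round opens": 0.65, "new AGI company founded": 0.3},
    "funding round opens": {"new AGI company founded": 0.35},
}


def odds(p):
    return p / (1.0 - p)


def to_prob(o):
    return o / (1.0 + o)


def simulate(initial, conditional, runs=20_000, seed=0, decide_first=()):
    """Monte Carlo cross-impact run; returns how often each scenario occurs."""
    rng = random.Random(seed)
    scenarios = Counter()
    for _ in range(runs):
        probs = dict(initial)
        rest = [e for e in initial if e not in decide_first]
        rng.shuffle(rest)  # remaining events are decided in a random order
        occurred = set()
        for event in list(decide_first) + rest:
            if rng.random() < probs[event]:
                occurred.add(event)
                # Odds-ratio update of impacted events (assumption; impacted
                # events need initial probabilities strictly between 0 and 1).
                for target, p_cond in conditional.get(event, {}).items():
                    factor = odds(p_cond) / odds(initial[target])
                    probs[target] = to_prob(odds(probs[target]) * factor)
        scenarios[frozenset(occurred)] += 1
    return scenarios


def marginals(scenarios, events):
    """Share of runs in which each event occurred."""
    total = sum(scenarios.values())
    return {e: sum(n for s, n in scenarios.items() if e in s) / total for e in events}


def sensitivity(event, low, high, runs=20_000):
    """Change in every event's simulated probability when one input moves from low to high."""
    results = []
    for value in (low, high):
        inputs = {**INITIAL, event: value}
        scenarios = simulate(inputs, CONDITIONAL, runs, decide_first=(event,))
        results.append(marginals(scenarios, list(INITIAL)))
    return {e: results[1][e] - results[0][e] for e in INITIAL}


if __name__ == "__main__":
    base = simulate(INITIAL, CONDITIONAL)
    for scenario, count in base.most_common(3):  # most frequent scenarios
        print(sorted(scenario), count)
    print(marginals(base, list(INITIAL)))
    print(sensitivity("we release model", 0.0, 1.0))
```

Listing only the most common scenarios also shows the limitation discussed below: rare combinations of events receive few or no runs.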
Benefits. Cross-impact analysis is a sophisticated technique that can provide advanced insights into the mutual influences of events, possible and likely scenarios, and the impacts of decisions and changes in circumstances. In other words, it provides organizations with three helpful things: expert forecasts on the likelihood of events occurring conditional on other events occurring, scenarios of how the future may unfold, and a method to observe the impact of specific decisions or external changes on all other events and the overall scenarios (IEC, 2019). Cross-impact analysis might improve the accuracy of forecasts by ensuring their internal consistency (see Schweizer, 2020) and by requiring forecasters to consider the impact of different events on each other (European Foresight Platform, n.d.). Cross-impact analysis also allows for anonymous and remote participation (IEC, 2019).

Limitations. While there are theoretical arguments in favor of the accuracy of cross-impact analysis, we could not find empirical studies that test it. As with any expert forecasting technique, its results should be taken with a grain of salt (Armstrong et al., 2014; de Neufville, 2023; Morgan, 2014). The sophistication of cross-impact analysis also comes with the downside of it being very complicated and time-consuming. Gathering expert opinions and analyzing them can take up to several months, depending on the number of experts and events and on the choice of software or statistical method. Using software generally simplifies the technique to some extent. Another limitation is that Monte Carlo simulations, which are also embedded in most of the software mentioned above, focus on the most likely scenarios, potentially neglecting extreme outcomes like catastrophic risks (IEC, 2019).

Recommendations. We encourage AGI companies to use cross-impact analysis before particularly important decisions, such as the deployment of a new model (Schuett et al., 2023).
AGI companies could start with a group of internal experts and rely on software (Centre for Interdisciplinary Risk and Innovation Research, n.d.). If this proves useful, they may consider engaging external experts, or commissioning research or consultancy organizations to conduct a more thorough analysis. We recommend that AGI companies attempt quantification (i.e. ask for probabilities instead of indications on a scale).

5.4 Bow tie analysis

Bow tie analysis helps organizations examine the effectiveness of their controls with respect to different risks. In risk management, controls are mechanisms that are supposed to reduce the likelihood or impact of undesired events. Bow tie analysis involves mapping the causes, consequences, and controls of an undesired event in a diagram that resembles a bow tie. It is a very popular and comparatively simple technique (IEC, 2019).

How it works. Bow tie analysis consists of the following steps (IEC, 2019; Book, 2012; McConnell & Davies, 2006). First, organizations choose an undesired event and place it at the center of the diagram. For example, this could be a product that does not fulfill its requirements (Aqlan & Mustafa Ali, 2014). Next, they collect the causes that could lead to this event and position them on the diagram’s left side.
Figure 9: Bow tie diagram for a non-conforming product (Aqlan & Mustafa Ali, 2014)

A cause of a non-conforming product may be the accidental use of wrong materials (Aqlan & Mustafa Ali, 2014). Organizations then identify preventive controls, which are measures that aim to avert the undesired event, and place them between the causes and the central knot. For instance, this could include labeling materials with stickers (Aqlan & Mustafa Ali, 2014). Subsequently, they determine the possible consequences of the undesired event and place them on the diagram’s right side. A consequence of a non-conforming product may be customer dissatisfaction (Aqlan & Mustafa Ali, 2014). Organizations then identify reactive controls, which are measures supposed to minimize the event’s impact after it has occurred, and place them between the central knot and the consequences. For instance, this could be reviewing products before they leave the warehouse (Aqlan & Mustafa Ali, 2014). In more sophisticated versions, organizations also determine escalation factors, i.e. conditions that could cause the controls to fail or become less effective, as well as controls that address these escalation factors. Finally, they incorporate ongoing activities that contribute to maintaining and enhancing the effectiveness of all controls.
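The structure of a bow tie diagram translates directly into a small data model, which makes one benefit of the technique, spotting gaps in controls, mechanical. The following Python sketch is an illustration under assumptions: the `Control` and `BowTie` classes and the `gaps` check are hypothetical helpers, not part of any cited method or tool. The entries reuse the non-conforming product example; the escalation factor is hypothetical, and two entries are deliberately left without controls to show what the check reports (this is not a claim about Figure 9).

```python
from dataclasses import dataclass, field


@dataclass
class Control:
    name: str
    # Conditions that could make this control fail or become less effective.
    escalation_factors: list[str] = field(default_factory=list)
    # Controls that address those escalation factors.
    escalation_controls: list[str] = field(default_factory=list)


@dataclass
class BowTie:
    event: str  # the undesired event at the central knot
    # Left side: cause -> preventive controls between that cause and the event.
    causes: dict[str, list[Control]] = field(default_factory=dict)
    # Right side: consequence -> reactive controls between the event and it.
    consequences: dict[str, list[Control]] = field(default_factory=dict)

    def gaps(self):
        """Flag causes or consequences without controls, and unaddressed escalation factors."""
        found = [f"cause without preventive control: {c}"
                 for c, controls in self.causes.items() if not controls]
        found += [f"consequence without reactive control: {c}"
                  for c, controls in self.consequences.items() if not controls]
        for controls in [*self.causes.values(), *self.consequences.values()]:
            for control in controls:
                if control.escalation_factors and not control.escalation_controls:
                    found.append(f"unaddressed escalation factor for: {control.name}")
        return found


product = BowTie(
    event="non-conforming product",
    causes={
        "wrong materials used": [Control(
            "label materials with identification stickers",
            escalation_factors=["sticker missing or illegible"],  # hypothetical
        )],
        "insufficient cleaning of tanks and pipes": [],  # left empty on purpose
    },
    consequences={
        "customer dissatisfaction": [Control("review products before they leave the warehouse")],
        "loss of reputation": [],  # left empty on purpose
    },
)

for gap in product.gaps():
    print(gap)
```

The check only confirms that every path has some barrier; judging whether a barrier is actually effective remains a matter for the analysts.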
Such ongoing activities may include design improvements, maintenance, verifications, procedures, checklists, guidelines, training, oversight, audits, and inspections (IEC, 2019).

How AGI companies could use it. AGI companies could draw bow tie diagrams to examine their controls with regard to both undesired actions by an “agentic” model and malicious actors. To that end, they could build on the causes identified with the fishbone method (Section 4.2), as well as the causes and consequences developed through causal mapping (Section 5.1). For example, AGI companies could use bow tie analysis to depict the event of the model copying itself during or right after training. Relevant causes may include the model having access to the training software or additional hardware, or being able to convince humans to provide such access. To impede these causes, preventive controls may comprise access barriers for both the model and the humans interacting with it (e.g. air gapping or multi-party authentication). Potential consequences of the model copying itself may include the model or copies of it evading off-switch mechanisms, uncertainty about whether all copies of the model have been found, and the whole event going unnoticed. To mitigate these, reactive controls may encompass, again, physical access controls for both the model and the humans interacting with it, automated mechanisms to detect model copies, and incident response plans. While monitoring a deployed model, AGI companies could use bow tie analysis to depict the undesired event of users bypassing API safeguards, like rate limits. Potential causes for this event may include users creating multiple accounts and thus increasing the number of their API keys, hacking into other users’ accounts and stealing API keys, or using the same API keys on several devices at the same time.
Preventive controls that might stop these causes from resulting in bypassed rate limits may include various authentication mechanisms, IP-based rate limiting, and limits on the number of devices that can use a single API key at the same time. Should users still manage to bypass rate limits, one possible consequence is users training other models on a large number of outputs. Researchers have already done this, presumably legally (Peng et al., 2023; Taori et al., 2023; The Vicuna Team, 2023). Reactive controls that help mitigate these consequences include automated alerts for abnormal usage patterns, human review of use cases, and suspension of offending accounts.

Figure 10: Bow tie diagram template (IEC, 2019)

Benefits. In contrast to most other techniques, bow tie analysis focuses on controls, which gives a more complete picture of the actual level of risk. Its visualization further helps with understanding and communicating risks inside or outside the AGI company, such as to leadership or auditors. Bow tie analysis has the additional benefits of being simple and less time-consuming than other techniques. Beyond risk assessment, it may reveal gaps and necessary improvements, thus providing guidance for risk treatment (IEC, 2019).

Limitations. The flip side of its simplicity is that bow tie analysis is prone to oversimplification. It assumes linear causal chains and ignores interactions between different causes, consequences, and controls. It is therefore not suitable for analyzing risks that involve complex interactions between events (e.g. competitive dynamics) (IEC, 2019).
Moreover, bow tie analysis may often make existing knowledge explicit rather than generate new knowledge (McConnell & Davies, 2006). Finally, some bow tie diagrams need to be very well protected, because they could reveal sensitive information that malicious actors could exploit.

Recommendations. We recommend that AGI companies use bow tie analysis to take a systematic approach to their controls. Since the technique is simple and the diagrams are easy to create, AGI companies could try it out and see whether it is helpful. AGI companies should also update the diagrams as changes are made and observed, and as new knowledge is gained.

5.5 System-theoretic process analysis (STPA)

System-theoretic process analysis (STPA) helps organizations assess the effectiveness of their overall control structure. Compared to bow tie analysis (Section 5.4), STPA is much more sophisticated and complicated. It was developed to deal with the increasing complexity of systems, especially systems that include software. The technique assumes that undesired events may be caused not only by failures of individual components of a system, but also by complex interactions between them. STPA involves backward reasoning from undesired events to controls and why they might not have the intended effect (Khlaaf, 2023; Leveson & Thomas, 2018; Rausand & Haugen, 2020).

How it works. STPA is a very complex technique. It is based on a different theory of risk assessment and, as such, uses a different terminology. First, organizations generate background information, namely, they define a system (e.g. a particular model that interacts with the real world in various ways) and determine undesired events that could stem from this system (e.g. the model telling a user how to build a weapon). Then, organizations model the various ways in which the behaviors or states of the system are constrained to prevent all of these undesired events (e.g. no internet access, hard-coded responses, review of use cases).
STPA provides guidance on questions that can help to identify the many and often unexpected reasons why these constraints may not work as intended (e.g. over the course of many interactions with the system, a researcher might become falsely and secretly convinced that the model is conscious and provide it with internet access to “liberate” it) (Leveson & Thomas, 2018; Rausand & Haugen, 2020).

Figure 11: Simple examples of system control structures (Leveson & Thomas, 2018)

Figure 12: System control structure of a plane’s wheel braking subsystem (Leveson & Thomas, 2018)

The detailed steps of STPA are the following (Leveson & Thomas, 2018; Rausand & Haugen, 2020). First, organizations define the system to be analyzed, with a clear outline of its boundaries. The system, as opposed to its environment, may only include parts over which the organization has some control. Next, organizations compile “losses” (undesired events), “system-level hazards” (states of the system that may lead to losses), and “system-level constraints” (states of the system that may prevent losses) in a list or table. For example, in aviation, losses include human fatalities or injuries, as well as damage to the plane or other objects.
System-level hazards encompass the plane leaving its designated runway or the plane coming too close to other objects. System-level constraints, on the other hand, are their inverses, namely, the plane staying on its designated runway and the plane maintaining a safe distance from other objects (Leveson & Thomas, 2018). After generating this background information, organizations draw a model of the “system control structure” (the various ways in which the behavior and states of the overall system are controlled). This includes “controllers” (technical or human entities that decide about control actions), “control actions” (actions by controllers to enforce constraints), “controlled processes” (processes controlled by controllers), “actuators” (mechanisms through which controllers act upon their controlled processes), and “sensors” (mechanisms through which controllers receive feedback from their controlled processes). The process of creating this model progresses from the abstract to the increasingly concrete. In particular, actuators and sensors can be added during later stages of the analysis. For example, the control structure of a plane’s wheel braking subsystem can be modeled within the aircraft, interacting with the flight crew. The flight crew is a human controller, whereas the autobrake and hydraulic controllers are technical ones. These controllers work together in a hierarchical control structure to control the hydraulics and, ultimately, the wheels. There are various control actions, such as power buttons and brakes (Leveson & Thomas, 2018). Once the model of the control structure is complete, organizations identify “unsafe control actions” (UCAs). A control action is considered unsafe if it could lead to a hazard. UCAs can involve required actions that are not provided, actions that are provided in a context where they are unsafe, actions that are provided too early, too late, or in the wrong order, and continuous actions that last too long or stop too soon.
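Identifying UCAs is mostly a matter of analyst judgment, but the systematic part, pairing every control action with each of the four UCA types in each relevant context, can be sketched in Python. In this sketch, the `ControlAction` class, the `candidate_ucas` helper, and the context wording are hypothetical; the analyst's verdict is recorded by hand in the `hazardous` dictionary, and the one UCA kept mirrors the flight-crew braking example from Leveson and Thomas (2018), linked here, as an illustrative assumption, to the hazard of the plane leaving its runway.

```python
from dataclasses import dataclass
from itertools import product

# The four ways a control action can be unsafe in STPA.
UCA_TYPES = (
    "not provided",
    "provided",
    "provided too early, too late, or out of order",
    "stopped too soon or applied too long",
)


@dataclass(frozen=True)
class ControlAction:
    controller: str
    action: str


def candidate_ucas(actions, contexts):
    """Every (control action, UCA type, context) combination an analyst should review."""
    return list(product(actions, UCA_TYPES, contexts))


actions = [
    ControlAction("flight crew", "manual braking"),
    ControlAction("autobrake controller", "brake command"),
]
contexts = ["landing, autobrake not braking or braking insufficiently"]  # hypothetical wording

# Analyst judgment, recorded by hand: which candidates could lead to a hazard.
hazardous = {
    (actions[0], "not provided", contexts[0]): "plane leaves its designated runway",
}

rows = candidate_ucas(actions, contexts)
ucas = [(row, hazardous[row]) for row in rows if row in hazardous]
for (ca, uca_type, ctx), hazard in ucas:
    print(f"UCA: {ca.controller} / {ca.action} / {uca_type} / {ctx} -> {hazard}")
```

The number of candidates grows as actions × 4 × contexts, which is one reason STPA is time-consuming: each row needs a human verdict.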
In the aviation example, one UCA is “flight crew does not provide manual braking during landing when autobrake is not providing braking or is providing insufficient braking”. Then, organizations determine the causes of these UCAs. These can range from physical failures of controllers, incorrect input from sensors or other controllers, inadequate decision algorithms, and incorrect process models to issues with control action transmission, actuator response, or controlled process response. A potential cause of two of the controllers in the aviation example failing may be that “wheel speed signals may momentarily reach zero during anti-skid operation, causing both controllers to build flawed process models”. UCAs and their different causes are compiled in a long list or table. Finally, based on the whole analysis, organizations can determine additional necessary system-level constraints (note that this goes beyond risk assessment and is already part of risk treatment) (Leveson & Thomas, 2018).

How AGI companies could use it. STPA, designed for inherently dangerous systems, is probably most helpful for controlling AI systems, not aligning them. For example, before pre-training or fine-tuning a new model, STPA may help AGI companies assess whether they are prepared to keep an unprecedentedly capable and potentially dangerous model in check. In this example, the system includes the model, the hardware and software used to train it, and the humans involved in the process. A loss would be the model escaping human control. System-level hazards, such as the model avoiding shutdown, copying or improving itself, gaining internet access, or collaborating with other models, can be mitigated through various constraints, such as physical access barriers to hardware and software for both the model and the humans interacting with it. The control structure model consists of human controllers (engineers, researchers, managers) and technical controllers (automated mechanisms, other AI systems).
Control actions may include shutdown mechanisms, actuators could be changes to the model’s architecture, and sensors could involve evals conducted during training. As an overarching example, UCAs in which controllers, actuators, or sensors fail to respond to triggers might be caused by the model deceiving or manipulating them. Benefits. The inventors of the technique claim that it has been shown to identify more causes of undesired events than other risk assessment techniques (Leveson & Thomas, 2018). It has recently become increasingly popular (Rausand & Haugen, 2020). Because of the technical and societal complexity of risks from AI, STPA has already been argued to be better suited to this context (Dobbe, 2022; Khlaaf, 2023). At the very least, the technique provides a perspective that differs from other risk assessment techniques, which is valuable in itself (Section 7). Limitations. The technique’s main limitation is its complexity. There is also no standardized approach to using it (Rausand & Haugen, 2020). However, we have outlined the common steps that AGI companies could follow, and the handbooks and textbooks referenced provide additional details and guidance. Another, comparatively minor limitation is that the technique’s usefulness compared with more established risk assessment techniques is still contested. STPA may help to find different rather than additional causes of undesired events, and thus should be used to obtain a complementary rather than a supplementary perspective on risks (Rausand & Haugen, 2020). Recommendations. We encourage AGI companies to use STPA to think through how they can remain in control of increasingly capable and potentially dangerous models. Nevertheless, as STPA is a time-consuming technique, both to learn and to apply, we recommend starting with bow tie analysis and using STPA to expand on complex issues. AGI companies could also commission a research or consultancy organization to do this.
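The AGI example above can also be recorded in the traceable form that STPA relies on. In that form, each hazard points to the losses it could cause, and each UCA points to the hazards it could lead to. The sketch below is a hypothetical Python illustration under our own assumptions. The IDs, the data classes, and the wording of the UCAs and causes are invented for illustration. Phrasing a constraint as "must never occur" is just one simple way to express a hazard's inversion. The sketch checks the chain for gaps, such as a UCA whose causes have not yet been analyzed.

```python
# Hypothetical sketch: IDs, UCAs, and causes are illustrative assumptions,
# not taken from the paper or from Leveson & Thomas (2018).
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    hid: str
    description: str
    losses: tuple[str, ...]    # IDs of losses this hazard could lead to


@dataclass(frozen=True)
class UnsafeControlAction:
    uid: str
    description: str
    hazards: tuple[str, ...]   # IDs of hazards this UCA could lead to
    causes: tuple[str, ...]    # causal scenarios identified so far


def constraint_for(hazard: Hazard) -> str:
    """State a system-level constraint as the inversion of the hazard."""
    return f"{hazard.hid}-C: must never occur: {hazard.description}"


def traceability_gaps(losses: dict[str, str], hazards: list[Hazard],
                      ucas: list[UnsafeControlAction]) -> list[str]:
    """List breaks in the loss -> hazard -> UCA -> cause chain."""
    gaps = []
    hazard_ids = {h.hid for h in hazards}
    for h in hazards:
        if not set(h.losses) & set(losses):
            gaps.append(f"hazard {h.hid} traces to no known loss")
    for u in ucas:
        if not set(u.hazards) & hazard_ids:
            gaps.append(f"UCA {u.uid} traces to no known hazard")
        if not u.causes:
            gaps.append(f"UCA {u.uid} has no causes analyzed yet")
    return gaps


losses = {"L1": "model escapes human control"}
hazards = [
    Hazard("H1", "model avoids shutdown", ("L1",)),
    Hazard("H2", "model gains internet access", ("L1",)),
]
ucas = [
    UnsafeControlAction(
        "UCA1",
        "engineers do not trigger shutdown when training-time evals flag a dangerous capability",
        ("H1",),
        ("model deceives the evals, so engineers receive flawed feedback",),
    ),
    UnsafeControlAction(
        "UCA2", "automated network isolation is lifted too soon after a training run",
        ("H2",), (),
    ),
]

if __name__ == "__main__":
    for h in hazards:
        print(constraint_for(h))
    print(traceability_gaps(losses, hazards, ucas))
```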
6 Risk evaluation

Risk evaluation is the third step in the risk assessment process. It aims to determine whether a specific risk is acceptable or whether its treatment is warranted. To this end, organizations compare the results of the risk analysis (Section 5) with the individual and overall risk they are willing to take and able to tolerate (so-called “risk appetite” and “risk tolerance”) (IEC, 2019).⁹ According to our selection criteria, we chose the following two risk evaluation techniques: checklists (Section 6.1) and risk matrices (Section 6.2).

6.1 Checklists

Checklists are questionnaires to be used in predefined situations (IEC, 2019). As a risk evaluation technique, checklists contain questions that help to identify risks associated with a certain decision or project, and to judge whether countermeasures are necessary. How they work. To develop a checklist, organizations determine a relevant situation and set up a list of questions to be answered in that situation. Checklists may consist of tick boxes, yes/no questions, or open-ended questions, and can be mandatory or voluntary. As a risk evaluation technique, they contain questions that help assess whether and to what extent a particular decision or project contributes to risks. They may also entail decision rules, such as “if X, then do not do Y” or “if Z, then consult with management”. Alternatively, checklists that have been completed by employees (e.g. researchers) may be reviewed by default by a particular person or team (e.g. a governance or risk management team). For example, a common type of checklist used for risk evaluation is the so-called “impact assessment” (e.g. privacy or environmental impact assessments). Impact assessment checklists contain questions about the effects of a specific decision or project on individuals, society, or the environment (e.g. UK Information Commissioner’s Office, n.d.; US Environmental Protection Agency, 1998).
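The mechanics described above can be sketched as follows: a fixed set of questions plus decision rules of the form “if X, then do not do Y”. This is a hypothetical Python illustration rather than any published checklist. The question IDs, their wording, and the rules are our own assumptions. Treating an unanswered question as an error is one simple way to model a mandatory checklist.

```python
# Hypothetical checklist: questions and rules are illustrative assumptions.
from typing import Callable

Answers = dict[str, bool]

# Yes/no questions for a research project.
QUESTIONS = {
    "Q1": "Could the project substantially advance dangerous capabilities?",
    "Q2": "Will the results be published externally?",
    "Q3": "Does the project require a training run beyond the current frontier?",
}

# Decision rules: (condition over the answers, resulting instruction).
RULES: list[tuple[Callable[[Answers], bool], str]] = [
    # "if X, then do not do Y"
    (lambda a: a["Q1"] and a["Q2"], "do not publish before a risk review"),
    # "if Z, then consult with management"
    (lambda a: a["Q3"], "consult with management before starting"),
]


def evaluate(answers: Answers) -> list[str]:
    """Check that the (mandatory) checklist is complete, then apply the rules."""
    missing = sorted(set(QUESTIONS) - set(answers))
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    return [instruction for condition, instruction in RULES if condition(answers)]


if __name__ == "__main__":
    print(evaluate({"Q1": True, "Q2": True, "Q3": False}))
    print(evaluate({"Q1": False, "Q2": True, "Q3": True}))
```

In the default-review variant mentioned above, `evaluate` would instead forward every completed set of answers to the reviewing team rather than returning instructions directly.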
How AGI companies could use them. AGI companies could develop checklists for relatively frequent situations that, in sum, may substantially affect catastrophic risks from AI. Risk typologies and taxonomies (Section 4.3) can help with developing those checklists. They can help determine the situations in which checklists may be useful (e.g. based on the AI system lifecycle) and come up with their content (e.g. the different risks to be evaluated). For blueprints of how checklists for risks from AI could be structured, AGI companies may want to look at impact assessment checklists on trustworthy AI (EU High-Level Expert Group on AI, 2020), AI fairness (Madaio et al., 2020), and responsible AI (Microsoft, 2022b). For instance, suitable situations may include:

- choosing a new research project (e.g. to what extent will it contribute to progress in capabilities and in safety, respectively);
- making choices about a model’s architecture or training (e.g. what will be the gain in performance versus the increase in risks of different options);
- publishing a piece of research (e.g. what will be the effects on competitive dynamics and misuse risks);
- communicating with external parties (e.g. what will be the effects on competitive dynamics);
- monitoring a deployed model (e.g. what instances of misuse have occurred).

To our knowledge, only a few checklists suitable for evaluating catastrophic risks from AI have been developed so far. Hendrycks and Mazeika (2022, Annex C) suggest a checklist for assessing the impact of a potential research project on existential risks from AI. It assumes that research projects will often advance both capabilities and safety, and will also have relevant indirect effects (e.g. on competitive dynamics). In light of this, the checklist aims to support individual researchers in deciding whether they should pursue a certain project.

⁹ How to establish an organization’s risk appetite and risk tolerance is a question beyond risk assessment and thus beyond the scope of this paper (Section 8). For the purposes of this section, we simply assume that the organization’s risk appetite and risk tolerance have been defined.

Barrett et al. (2023a, Section 3.2.2.1.1) provide a starting point for a pre-development and pre-deployment checklist for catastrophic risks from AI. It asks about the type (e.g. health, fundamental rights, national security) and magnitude (e.g. multiple fatalities or negligible harm) of adverse effects of AI systems on individuals, groups, organizations, and society. The checklist makes the NIST AI Risk Management Framework (NIST, 2023) more concrete, but may need to be made more concrete still (e.g. adapted to specific situations). Benefits. Checklists may save time because they can be reused. They can also decentralize risk evaluation: instead of a specific team alone being in charge, checklists allow organizations to require all employees to take part in risk evaluation. At the same time, through the design of checklists, organizations remain in control of the focus and substance of this risk evaluation. Checklists may also increase overall risk awareness within the AGI company. They may thus contribute to a so-called “safety culture”, in which employees generally aim to minimize risks in their everyday work. Limitations. On the other hand, checklists are time-consuming to develop. Introducing checklists may also increase employees’ workload. Therefore, the costs and benefits of requiring a checklist in a particular situation need to be weighed against each other, and checklists should be as short and simple as possible. Moreover, employees may perceive checklists as annoying and burdensome. As a result, employees may ignore checklists or apply them only superficially. To prevent this, good communication about the purpose of checklists is necessary.
Checklists are also prone to missing “unknown unknowns”. This can be alleviated through more open-ended prompts that encourage creative thinking. However, this in turn requires more expertise from the user, leading to a tradeoff between simplicity and usefulness (IEC, 2019). Recommendations. We encourage AGI companies to use checklists, especially ones that have already been developed (e.g. Hendrycks & Mazeika, 2022, Annex C). The larger an organization, the more routine decisions and projects it has that could affect risks. This makes checklists increasingly worth developing as organizations scale. They may thus be particularly relevant for larger AGI companies like OpenAI and Google DeepMind. AGI companies should monitor whether checklists are used correctly. They should also update them in line with new developments and insights, particularly when new risks are identified.

6.2 Risk matrices

A risk matrix, also known as a heat map or consequence/likelihood matrix, is a table that contains consequence and likelihood ratings of different risks, often on a scale from 1 to 5. Each cell represents a specific combination of consequence and likelihood. Different risks can be plotted on the matrix to determine whether and how urgently they need to be addressed (IEC, 2019). Risk matrices are one of the most common risk evaluation techniques. How it works. To set up a risk matrix, organizations need to develop and combine two scales: one that ranks the consequence of risks, and another that ranks their likelihood. Typically, both scales have three to five points. The consequence scale can be quantitative (e.g. by cost or fatalities) or qualitative (e.g. catastrophic, major, moderate, minor). For different types of risks, separate definitions of the categories may be necessary. For instance, financial risks may be expressed in monetary terms (quantitative), health risks in the number and severity of people affected (semi-quantitative), and reputation risks in purely qualitative terms.
The likelihood scale can be quantitative (i.e. probabilities) or qualitative (e.g. frequent, probable, occasional, remote, improbable). For the likelihood scale, a time horizon within which the risk may occur should be specified (e.g. within the next year). Then, organizations need to combine both scales into a consequence/likelihood matrix. They also need to develop an ordinal scale for ranking the priority of risks, typically with three to five points (e.g. I, II, III, IV, V). Decision rules can be linked to priority categories, for example that risks of both the highest consequence and the highest likelihood must always be treated. The final matrix allows organizations to locate individual risks by their consequence and likelihood descriptors (IEC, 2019). How AGI companies could use them. AGI companies could use a risk matrix to prioritize among different ultimate catastrophic risks from AI. Yet, we note that ultimate catastrophic risks from AI may be extremely hard to evaluate. Consequence ratings will likely depend on fundamental ethical questions, such as the value of happiness, suffering, and other moral goods like equality.

[Figure 13: Segments of consequence and likelihood scales and an asymmetric risk matrix (which gives more weight to likelihood than consequence) (IEC, 2019). Consequence ratings run from a (e.g. maximum credible loss in $; multiple fatalities; irreversible significant harm and community outrage) to e (e.g. minimum of interest in $; first aid only required; minor temporary damage). They have columns for financial, health and safety, environment and community, and other consequences. Likelihood ratings run from 5 (“likely”: expected to occur within weeks) to 1 (“remotely possible”). The matrix cells hold priority ratings from I to V.]

While 1 trillion fatalities may clearly be worse than 100 million fatalities, it is very hard to compare these outcomes to, for example, lasting environmental damage or epistemic degradation through misinformation or polarization.
Likelihood estimates for these scenarios may also be very difficult to obtain. For this purpose, AGI companies could use the Delphi technique (Section 5.2) and cross-impact analysis (Section 5.3). However, as highlighted in the limitations of these techniques, their results may be highly speculative. AGI companies could instead focus on the sources of catastrophic risks from AI. When doing this, it is important to keep the level of granularity consistent (see limitations below). For example, the misuse risk “terrorist attack with a novel pathogen” could be split into the risks of users bypassing various safeguards that are ingrained in the model or implemented through access design. The accident risk “takeover by misaligned AI” could be broken down into the risks of the emergence of various dangerous model capabilities and propensities. AGI companies could also use risk matrices to compare catastrophic risks from AI with other types of risks. This would allow them to make overall decisions on which risks to prioritize. Khlaaf (2023) has developed a risk matrix for societal risks from AI (where “catastrophic” is defined as “death, permanent total disability, direct harm, significant system or asset loss, or irreversible significant environmental impact”). AGI companies could build on that and add “global” as the highest category of consequence (e.g. global, catastrophic, major, moderate, minor). They could also include types of risks that do not stem directly from AI, such as financial, reputation, or IP risks. In light of the high uncertainty around catastrophic risks from AI, AGI companies could also design the priority scale to give more weight to consequence than likelihood.
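The construction described in this section can be sketched as follows. It combines two ordinal scales into a priority scale with linked decision rules, and it shows how consequence can be weighted more heavily than likelihood. This is a hypothetical Python illustration. The category labels come from the text, but the additive scoring, the weights, and the band thresholds are our own assumptions. They are not taken from IEC (2019) or Khlaaf (2023). With equal weights, “global” at the third-highest likelihood and “major” at the highest likelihood land in the same priority category. Doubling the consequence weight moves the first into the highest priority category.

```python
# Hypothetical risk matrix: labels from the text; scoring, weights, and
# thresholds are illustrative assumptions, not taken from IEC (2019).
# Categories ordered from lowest (rating 1) to highest (rating 5).
CONSEQUENCE = ["minor", "moderate", "major", "catastrophic", "global"]
LIKELIHOOD = ["improbable", "remote", "occasional", "probable", "frequent"]
PRIORITIES = ["I", "II", "III", "IV", "V"]  # I = highest priority
THRESHOLDS = [0.8, 0.6, 0.4, 0.2]           # assumed cut-offs for I..IV


def priority(consequence: str, likelihood: str,
             w_consequence: float = 1.0, w_likelihood: float = 1.0) -> str:
    """Map one consequence/likelihood cell to a priority category."""
    c = CONSEQUENCE.index(consequence) + 1
    lk = LIKELIHOOD.index(likelihood) + 1
    score = w_consequence * c + w_likelihood * lk
    lowest = w_consequence + w_likelihood           # score of the (1, 1) cell
    highest = 5 * (w_consequence + w_likelihood)    # score of the (5, 5) cell
    normalized = (score - lowest) / (highest - lowest)  # 0.0 .. 1.0
    for threshold, label in zip(THRESHOLDS, PRIORITIES):
        if normalized >= threshold:
            return label
    return PRIORITIES[-1]


def must_treat(label: str) -> bool:
    """Example decision rule linked to a priority category."""
    return label == "I"


def matrix(**weights: float) -> dict[str, list[str]]:
    """Full matrix: one row per consequence (highest first), columns by likelihood."""
    return {c: [priority(c, lk, **weights) for lk in LIKELIHOOD]
            for c in reversed(CONSEQUENCE)}


if __name__ == "__main__":
    for weights in ({}, {"w_consequence": 2.0}):
        print(weights or "symmetric")
        for row, cells in matrix(**weights).items():
            print(f"  {row:>12}: {' '.join(f'{p:>3}' for p in cells)}")
```

Because the categories are ordinal, the numeric score here is only a device for ordering cells. It should not be read as saying that two lower-priority risks add up to one higher-priority risk.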
For example, the cell at the intersection of “global” (highest consequence) and third-highest likelihood may still be of highest priority. Meanwhile, the cell at the intersection of “major” (third-highest consequence) and highest likelihood may only be of second-highest priority (the opposite is the case in Figure 13). Another option for designing a risk matrix for both catastrophic risks and their sources is to replace likelihood with vulnerability. Risk can be conceptualized as the concurrence of threats and vulnerabilities (McChrystal & Butrico, 2021). While a threat is the source of a risk, a vulnerability is a necessary condition of the asset at stake (Aven, 2015). In other words, a vulnerability is a weakness in a system that makes it susceptible to harm (Ostrom & Wilhelmsen, 2019). Assessing vulnerability rather than likelihood is an emerging paradigm for infrequent and highly uncertain risks (e.g. Baylon & Hilton, 2020; Ritchey et al., 2004; UK Royal Academy of Engineering, 2023). The process for developing the risk matrix remains the same, except that the second dimension is vulnerability instead of likelihood. The vulnerability scale is qualitative, and its categories may simply be “high”, “medium”, and “low”. The most relevant criteria for assessing vulnerability are the existence and effectiveness of preventive and reactive controls. AGI companies can determine these using bow tie analysis (Section 5.4) and STPA (Section 5.5). Other relevant criteria may be the origin (e.g. external or internal), driver (e.g. intentionality or negligence), and velocity of the risk (i.e. the speed at which it would materialize). The different criteria can be weighted to obtain a single vulnerability score and assign a vulnerability category. Benefits. Risk matrices are easy to use. They allow comparing and prioritizing between different risks.
They provide a clear visualization (IEC, 2019), which makes it easier to communicate risks internally and externally, for example to leadership or auditors. Risk matrices can also be used to compare and prioritize between different types of risks (IEC, 2019), which allows AGI companies to integrate catastrophic risks from AI into their overall risk evaluation. Limitations. Developing a risk matrix is somewhat difficult. First, the consequence scale may sometimes involve normative judgments, and the priority scale always does. There is no single correct answer to which types of consequence are of the same severity (e.g. a specific number of fatalities and a specific amount of financial costs). Nor is there one to how consequence and likelihood should be weighed against each other (e.g. symmetrically or not). This may complicate the process, as different people may have diverging opinions. Second, the rating of a risk on consequence and likelihood depends on the level of detail in which it is described. A higher level of detail leads to a higher number of risks, each with lower consequence and likelihood ratings. Therefore, the level of detail should be as consistent as possible among different risks; otherwise, valid comparisons cannot be made. Finally, priority rankings can mislead users of risk matrices into making invalid comparisons of risks, because the scales are ordinal (e.g. steps from one category of consequence to the next may not be of equal size). As a result, two risks of the lowest priority category together do not necessarily have the same priority as a single risk of the second-lowest priority category (IEC, 2019). Recommendations. We strongly recommend that AGI companies use risk matrices if they do not already do so. In addition to evaluating the ultimate catastrophic risks from AI, they could break risks down into their sources and evaluate those.
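The vulnerability-based variant described earlier in this section combines weighted criteria into a single category. One way to do this is shown in the following hypothetical Python sketch. The criteria are those named in the text. Everything else is our own assumption: the 1-to-3 ratings, the weights, and the cut-offs that split the score range into thirds. The weights make preventive and reactive controls count three times as much as the other criteria, reflecting the text's statement that controls are the most relevant. Integer arithmetic avoids rounding issues at the band edges.

```python
# Hypothetical vulnerability scoring: criteria from the text; weights,
# rating scale, and cut-offs are illustrative assumptions.
WEIGHTS = {
    "preventive_controls": 3,  # rating 3 = controls absent or ineffective
    "reactive_controls": 3,
    "origin": 1,
    "driver": 1,
    "velocity": 1,             # rating 3 = would materialize very quickly
}


def vulnerability_category(ratings: dict[str, int],
                           weights: dict[str, int] = WEIGHTS) -> str:
    """Combine per-criterion ratings (1 = low, 3 = high vulnerability) into a category."""
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the weighted criteria")
    if any(r not in (1, 2, 3) for r in ratings.values()):
        raise ValueError("each rating must be 1, 2 or 3")
    total = sum(weights[k] * ratings[k] for k in weights)
    lowest = sum(weights.values())   # every criterion rated 1
    span = 2 * lowest                # distance to every criterion rated 3
    # Upper third of the range -> high, middle third -> medium, else low.
    if 3 * (total - lowest) >= 2 * span:
        return "high"
    if 3 * (total - lowest) >= span:
        return "medium"
    return "low"


if __name__ == "__main__":
    weak_controls = {"preventive_controls": 3, "reactive_controls": 3,
                     "origin": 2, "driver": 3, "velocity": 3}
    strong_controls = {**weak_controls, "preventive_controls": 1,
                       "reactive_controls": 1}
    print(vulnerability_category(weak_controls))    # high
    print(vulnerability_category(strong_controls))  # low
```

The resulting category would then replace the likelihood rating as the second axis of the matrix, while the consequence ratings stay unchanged.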
In cases where uncertainty about likelihood proves too high, AGI companies could try using vulnerability as the second dimension instead.

7 Discussion

When to conduct risk assessments. If risk assessments are not up to date, important risks can be missed. Therefore, risk assessment should happen continuously and iteratively (ISO, 2018; ISO & IEC, 2023; IEC, 2019; NIST, 2023). Ideally, risk assessments are conducted in the abstract: initially to establish the risk universe, at regular intervals, and whenever circumstances change in relevant ways. They should also be conducted before making concrete decisions. The focus at AGI companies so far has been on pre-deployment risk assessment (Brundage et al., 2022; Kavukcuoglu et al., 2022; OpenAI, 2023). However, experts widely agree that AGI companies need to begin even earlier (Schuett et al., 2023). In particular, AGI companies should also conduct risk assessments before training a new model. These may take place before any step of the training process, such as before conducting smaller training runs, before the final large training run, before fine-tuning, and before conducting evals. It is also important to continually evaluate and improve risk assessment practices (ISO, 2018; ISO & IEC, 2023; IEC, 2019; NIST, 2023). AGI companies should ensure that this happens, for example, through an internal audit team (Schuett, 2023). When to use which risk assessment technique. There is no specific trigger for each of the techniques discussed in this paper. Instead, they can all be applied in various situations. However, each technique has a particular focus or function (for an overview, see Table 1). AGI companies can use techniques from the three risk assessment steps (risk identification, risk analysis, and risk evaluation) to build on one another.
Within risk analysis, there are techniques for analyzing dependencies and interactions between different risk sources (causal mapping, cross-impact analysis), techniques for estimating the likelihood of risks (Delphi technique, cross-impact analysis), and techniques for analyzing controls (bow tie analysis, STPA). AGI companies should conduct risk assessments both in the abstract and to inform concrete decisions. Some risk identification techniques are best applied to provide background information (scenario analysis, risk typologies and taxonomies). By contrast, some risk evaluation techniques only make sense to help with concrete decisions (checklists). All other techniques can be applied both in the abstract and with regard to concrete decisions. Moreover, some techniques are easy to use (fishbone method, bow tie analysis), while others are very complex (cross-impact analysis, STPA). In choosing among more or less costly techniques, AGI companies should consider their value of information (Barrett, 2017). Finally, for “very novel, complex or challenging issues, where there is high uncertainty and little experience”, it is recommended to use multiple techniques and include a variety of stakeholder views (IEC, 2019; Potts et al., 2014). Therefore, even if the focus and function of different techniques overlap, AGI companies should use several techniques in a given situation (Figure 1). How to use any risk assessment technique. Some overarching factors can greatly affect the usefulness of any risk assessment technique. First, the people involved must know how to use the technique properly and must have the relevant subject-matter knowledge. Otherwise, the results may be less useful, or even create a false sense of security (Khlaaf, 2023). It may thus be helpful for AGI companies to hire or contract experienced risk analysts. Second, many techniques require input from different people within the organization.
It is therefore common and highly advisable to organize cross-functional workshops (Stolzer et al., 2023). Third, risk assessment should not be limited to technical factors, but should also take into account human and organizational factors. This has been an important realization in aviation over the last decades (R. Müller & Drax, 2014). Finally, using risk assessment techniques is only valuable if their results have a bearing on decisions. Implementing risk assessment techniques may be easier and less costly than actually following through on their results. It is essential that AGI companies set up structures and processes that ensure their results are acted upon.

8 Conclusion

Summary. In this paper, we have discussed ten popular risk assessment techniques from established industries like finance, aviation, nuclear, and biolabs. For each technique, we have described how it works and how AGI companies could use it, in many cases by providing concrete examples. We have also discussed their benefits and limitations in the context of AGI companies assessing catastrophic risks from AI, and made recommendations. Finally, we have discussed when to conduct risk assessments, when to use which technique, and how to use any risk assessment technique. Contributions. We have made three main contributions to the existing literature. First, we have developed criteria for selecting risk assessment techniques from other industries. These criteria may also be useful for other efforts that aim to identify best practices from other industries suitable for the context of AGI companies and catastrophic risks from AI. Second, building on existing research that has singled out specific techniques, we have conducted a comprehensive review of established risk assessment techniques. Third, in contrast to previous research, we have focused on AGI companies and made actionable recommendations for how they could implement the techniques we selected. Limitations. This paper has four main limitations.
First, we have singled out catastrophic risks from AI. Yet, these risks are intertwined with many other risks that AGI companies face and expose their environment to. For instance, AGI companies that expect financial difficulties may be tempted to reduce their safety efforts. Therefore, a full understanding of catastrophic risks from AI may only be possible by considering all types of risks that are relevant for AGI companies. This includes taking into account phenomena like compound or cascading risks. Second, this paper has looked at popular risk assessment techniques from other industries. We believe that the techniques included in this paper may be useful. Nevertheless, just as in every other industry, techniques specifically tailored to the AI context will need to be developed too. Third, we have left out many techniques that we expect to be less relevant for AGI companies at present. However, this may change in the future (Section 3). Finally, although we have tried to apply the techniques as concretely as possible to the context of AGI companies, our suggestions remain fairly general. The techniques need to be customized to the particularities of individual AGI companies. Questions for further research. Many questions around risk assessment at AGI companies require further research. For example, the question of how AGI companies should define what level of risk they are willing to take (risk appetite) and able to tolerate (risk tolerance) remains open. A company’s risk appetite and tolerance should be informed by the results of risk assessments. At the same time, they are needed to judge, in the abstract, what level of risk is acceptable. Another question that warrants further research is how AGI companies can improve their safety culture (Hendrycks et al., 2023; Manheim, 2023; Shneiderman, 2020). The question of how AGI companies should document and report the results of risk assessments also needs further investigation.
Traditionally, risk registers are an important tool in this regard, but there are also AI-specific tools, such as datasheets (Bender & Friedman, 2018; Gebru et al., 2021), model cards (Mitchell et al., 2019), system cards (Procope et al., 2022), and risk cards (Derczynski et al., 2023). Whether and how the results of risk assessments could be shared with external parties like researchers, auditors, or governments warrants further investigation (Anderljung et al., 2023; Shevlane et al., 2023). Finally, concrete ways in which relatively abstract frameworks like ISO/IEC 23894:2023 (ISO & IEC, 2023) and the NIST AI Risk Management Framework (NIST, 2023) can be adapted to the context of AGI companies need further elaboration (see e.g. Barrett et al., 2023a, 2023b). AGI companies need to act on their statements that catastrophic risks from AI should be a global priority (Center for AI Safety, 2023). It is possible that the next generation of state-of-the-art AI systems will already pose catastrophic risks. AGI companies therefore need to urgently improve their risk management practices. We hope this paper can support such efforts. The reviewed techniques will certainly not be sufficient to assess catastrophic risks from AI. Even so, AGI companies should not skip the straightforward step of reviewing best practices from other industries.

Acknowledgements

We are grateful for invaluable input from Cullen O’Keefe, James Ginns, and Malcolm Murray, as well as research management support from Emma Bluemke and discussions with Jan Brauner. We thank Michael Aird, Markus Anderljung, Anthony Barrett, Seth Baum, Siméon Campos, Zeshen Chin, Shaun Ee, Lennart Heim, Samuel Hilton, Laura Hiscott, Sebastian Lodemann, David Manheim, Joseph O’Brien, Merlin Stein, and Akash Wasil for helpful feedback on an earlier version of this paper.
We are also grateful for input from the participants of two Project-in-Progress sessions of the 2023 Winter Research Fellowship of the Centre for the Governance of AI, where we presented earlier versions of this work, namely Ben Garfinkel, Jide Alaga, Kayla Blomquist, Benjamin Bucknall, Conor Downey, Saffron Huang, Kristy Loke, Nikhil Mulani, Jai Vipra, and Owen Yeung. We acknowledge the use of ChatGPT (https://chat.openai.com) to help with editing. All remaining errors are our own.

References

Ackermann, F., Howick, S., Quigley, J., Walls, L., & Houghton, T. (2014). Systemic risk elicitation: Using causal maps to engage stakeholders and build a comprehensive view of risks. European Journal of Operational Research, 238(1), 290–299. doi: 10.1016/j.ejor.2014.03.035
Alignment Research Center. (2023). Update on ARC’s recent eval efforts. Alignment Research Center Blog. Retrieved from https://evals.alignment.org/blog/2023-03-18-update-on-recent-evals/
Anderljung, M., Barnhart, J., Leung, J., Korinek, A., O’Keefe, C., Whittlestone, J., . . . Wolf, K. (2023). Frontier AI regulation: Managing emerging risks to public safety. arXiv preprint arXiv:2307.03718.
Anderljung, M., & Hazell, J. (2023). Protecting society from AI misuse: When are restrictions on capabilities warranted? arXiv preprint arXiv:2303.09377.
Anthropic. (2023). An AI policy tool for today: Ambitiously invest in NIST. Retrieved from https://www.anthropic.com/index/an-ai-policy-tool-for-today-ambitiously-invest-in-nist
Applebaum, A., Miller, D., Strom, B., Korban, C., & Wolf, R. (2016). Intelligent, automated red team emulation. In Proceedings of the 32nd Annual Conference on Computer Security Applications (p. 363–373). doi: 10.1145/2991079.2991111
Aqlan, F., & Mustafa Ali, E. (2014). Integrating lean principles and fuzzy bow-tie analysis for risk assessment in chemical industry. Journal of Loss Prevention in the Process Industries, 29, 39–48. doi: 10.1016/j.jlp.2014.01.006
Armstrong, S., Sotala, K., & Ó hÉigeartaigh, S. S. (2014, July). The errors, insights and lessons of famous AI predictions – and what they mean for the future. Journal of Experimental & Theoretical Artificial Intelligence, 26(3), 317–342. doi: 10.1080/0952813X.2014.895105
Aven, T. (2015). Risk analysis. Wiley. doi: 10.1002/9781119057819
Avin, S., Wintle, B. C., Weitzdörfer, J., Ó hÉigeartaigh, S. S., Sutherland, W. J., & Rees, M. J. (2018). Classifying global catastrophic risks. Futures, 102, 20–26. doi: 10.1016/j.futures.2018.02.001
Barrett, A. M. (2017). Value of global catastrophic risk (GCR) information: Cost-effectiveness-based approach for GCR reduction. Decision Analysis, 14(3), 187–203. doi: 10.1287/deca.2017.0350
Barrett, A. M., & Baum, S. D. (2017a). A model of pathways to artificial superintelligence catastrophe for risk and decision analysis. Journal of Experimental & Theoretical Artificial Intelligence, 29(2), 397–414. doi: 10.1080/0952813X.2016.1186228
Barrett, A. M., & Baum, S. D. (2017b). Risk analysis and risk management for the artificial superintelligence research and development process. In V. Callaghan, J. Miller, R. Yampolskiy, & S. Armstrong (Eds.), The technological singularity (p. 127–140). Springer. doi: 10.1007/978-3-662-54033-6_6
Barrett, A. M., Hendrycks, D., Newman, J., & Nonnecke, B. (2023a). Actionable guidance for high-consequence AI risk management: Towards standards addressing AI catastrophic risks. arXiv preprint arXiv:2206.08966.
Barrett, A. M., Hendrycks, D., Newman, J., & Nonnecke, B. (2023b). Seeking input and feedback: AI risk management-standards profile for increasingly multi- or general-purpose AI. Center for Long-Term Cybersecurity. Retrieved from https://cltc.berkeley.edu/seeking-input-and-feedback-ai-risk-management-standards-profile-for-increasingly-multi-purpose-or-general-purpose-ai/
Baum, S. D. (2020). Quantifying the probability of existential catastrophe: A reply to Beard et al. Futures, 123, 102608. doi: 10.1016/j.futures.2020.102608
Baum, S. D., Goertzel, B., & Goertzel, T. G. (2011). How long until human-level AI? Results from an expert assessment. Technological Forecasting and Social Change, 78(1), 185–195. doi: 10.1016/j.techfore.2010.09.006
Baylon, C., & Hilton, S. (2020). Risk management in the UK: What can we learn from COVID-19 and are we prepared for the next disaster? Retrieved from https://www.cser.ac.uk/resources/risk-management-uk/
Beard, S., Rowe, T., & Fox, J. (2020a). An analysis and evaluation of methods currently used to quantify the likelihood of existential hazards. Futures, 115, 102469. doi: 10.1016/j.futures.2019.102469
Beard, S., Rowe, T., & Fox, J. (2020b). Existential risk assessment: A reply to Baum. Futures, 122, 102606. doi: 10.1016/j.futures.2020.102606
Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587–604. doi: 10.1162/tacl_a_00041
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In FAccT ’21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (p. 610–623). doi: 10.1145/3442188.3445922
Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems (NeurIPS) (Vol. 29).
Book, G. (2012). Lessons learned from real world application of the bow-tie method. In SPE Middle East Health, Safety, Security, and Environment Conference and Exhibition. SPE. doi: 10.2118/154549-MS
Bostrom, N. (1998). How long before superintelligence? International Journal of Future Studies, 2.
Bostrom, N. (2002). Existential risks: Analyzing human extinction scenarios. Journal of Evolution and Technology, 9(1), 1–31.
Bostrom, N. (2013). Existential risk prevention as global priority. Global Policy, 4(1), 15–31. doi: 10.1111/1758-5899.12002
Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.
Bostrom, N., & Ćirković, M. M. (Eds.). (2008). Global catastrophic risks. Oxford University Press.
Bradshaw, T., & Abboud, L. (2023). Four-week-old AI start-up raises record €105mn in European push. Financial Times. Retrieved from https://www.ft.com/content/cf939ea4-d96c-4908-896a-48a74381f251
Brundage, M., Avin, S., Clark, J., Toner, H., Eckersley, P., Garfinkel, B., . . . Amodei, D. (2018). The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv preprint arXiv:1802.07228.
Brundage, M., Avin, S., Wang, J., Belfield, H., Krueger, G., Hadfield, G., . . . Anderljung, M. (2020). Toward trustworthy AI development: Mechanisms for supporting verifiable claims. arXiv preprint arXiv:2004.07213.
Brundage, M., Mayer, K., Eloundou, T., Agarwal, S., Adler, S., Krueger, G., . . . Mishkin, P. (2022). Lessons learned on language model safety and misuse. Retrieved from https://openai.com/research/language-model-safety-and-misuse
Bryson, J. M., Ackermann, F., Eden, C., & Finn, C. B. (2004). Visible thinking: Unlocking causal mapping for practical business results. Wiley.
Buchanan, B., Lohn, A., Musser, M., & Sedova. (2021). Truth, lies, and automation - how language models could change disinformation. Center for Security and Emerging Technology. Retrieved from https://cset.georgetown.edu/publication/truth-lies-and-automation/
Bucknall, B. S., & Dori-Hacohen, S. (2022). Current and near-term AI as a potential existential risk factor. arXiv preprint arXiv:2209.10604.
Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (p. 77–91).
Carlsmith, J. (2022). Is power-seeking AI an existential risk? arXiv preprint arXiv:2206.13353.
Center for AI Safety. (2023). Statement on AI risk. Retrieved from https://www.safe.ai/statement-on-ai-risk
Center for Security and Emerging Technology. (n.d.). Taxonomy. AI Incident Database. Retrieved from https://incidentdatabase.ai/taxonomy/cset/
Centre for Interdisciplinary Risk and Innovation Research. (n.d.). Cross-impact balances home. Retrieved from https://www.cross-impact.org
Chan, A., Salganik, R., Markelius, A., Pang, C., Rajkumar, N., Krasheninnikov, D., . . . Maharaj, T. (2023). Harms from increasingly agentic algorithmic systems. In FAccT ’23: the 2023 ACM Conference on Fairness, Accountability, and Transparency (p. 651–666). doi: 10.1145/3593013.3594033
Chapelle, A. (2019). Operational risk management: Best practices in the financial services industry. Wiley.
Chermack, T. J. (2011). Scenario planning in organizations: How to create, use, and assess scenarios. Berrett-Koehler.
Chin, Z. (2022a). Embedding safety in ML development. AI Alignment Forum. Retrieved from https://www.alignmentforum.org/posts/dYHiMeSdLrrX3cy4a/embedding-safety-in-ml-development
Chin, Z. (2022b). What if we approach AI safety like a technical engineering safety problem. AI Alignment Forum. Retrieved from https://www.alignmentforum.org/posts/zNYmbFwgrxiNtayMm/what-if-we-approach-ai-safety-like-a-technical-engineering
Christiano, P. (2019). What failure looks like. AI Alignment Forum. Retrieved from https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like
Clarke, S. (2022). Classifying sources of AI x-risk. Effective Altruism Forum. Retrieved from https://forum.effectivealtruism.org/posts/e55QpEExmtkRjw9CD/classifying-sources-of-ai-x-risk
Clarke, S., Carlier, A., & Schuett, J. (2021). Survey on AI existential risk scenarios. Effective Altruism Forum.
Retrieved fromhttps://forum.effectivealtruism.org/posts/ 2tumunFmjBuXdfF2F/survey-on-ai-existential-risk-scenarios-1 Clarke, S., & Whittlestone, J. (2022). A survey of the potential long-term impacts of AI: How ai could lead to long-term changes in science, cooperation, power, epistemics and values. In AIES 2022: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society(p. 192–202). doi: 10.1145/3514094.3534131 Cohen, M. K., Hutter, M., & Osborne, M. A. (2022). Advanced artificial agents intervene in the provision of reward.AI Magazine,43(3), 282–293. doi: 10.1002/aaai.12064 COSO. (2017).Guidance on Enterprise Risk Management.https://w.coso.org/sitepages/ guidance-on-enterprise-risk-management.aspx?web=1. 31 Cotra, A.(2020).Draft report on AI timelines.AI Alignment Forum.Retrieved from https://w.alignmentforum.org/posts/KrJfoZzpSDpnrv9va/draft-report-on -ai-timelines Cotra, A. (2022).Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover.AI Alignment Forum. Retrieved fromhttps://w.alignmentforum.org/ posts/pRkFkzwKZ2zfa3R6H/without -specific -countermeasures -the -easiest -path-to Cotton-Barratt, O., Daniel, M., & Sandberg, A. (2020). Defence in depth against human extinction: Prevention, response, resilience, and why they all matter.Global Policy,11(3), 271–282. doi: 10.1111/1758-5899.12786 Cremer, C. Z., & Whittlestone, J. (2021). Artificial canaries: Early warning signs for anticipatory and democratic governance of AI.International Journal of Interactive Multimedia and Artificial Intelligence,6(5), 100–109. Critch, A. (2021).What multipolar failure looks like, and robust agent-agnostic processes (RAAPs).AI Alignment Forum. Retrieved fromhttps://w.alignmentforum.org/ posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust -agent-agnostic Critch, A., & Krueger, D. (2020). AI research considerations for human existential safety (ARCHES). arXiv preprint arXiv:2006.04948. 
Critch, A., & Russell, S. (2023). TASRA: a taxonomy and analysis of societal-scale risks from AI. arXiv preprint arXiv:2306.06924. Dafoe, A. (2018).AI governance: A research agenda.Centre for the Governance of AI. Retrieved fromhttps://uploads-ssl.webflow.com/614b70a71b9f71c9c240c7a7/ 61d48553bf2faf58c3900bd2_GovAI-Research-Agenda.pdf Dafoe, A. (2020).AI governance: Opportunity and theory of impact.Effective Altruism Forum. Re- trieved fromhttps://forum.effectivealtruism.org/posts/42reWndoTEhFqu6T8/ ai-governance-opportunity-and-theory-of-impact Davidson, T. (2021).Report on Semi-informative Priors.Open Philanthropy Blog. Retrieved from https://w .openphilanthropy .org/research/report -on -semi -informative -priors/ Davidson, T. (2023).What a compute-centric framework says about AI takeoff speeds - draft report.AI Alignment Forum. Retrieved fromhttps://w.alignmentforum.org/ posts/Gc9FGtdXhK9sCSEYu/what-a-compute-centric-framework-says-about-ai -takeoff# de Neufville, R. (2023).Forecasting extraordinary events.Telling the Future. Retrieved fromhttps://tellingthefuture.substack.com/p/forecasting-extraordinary -events Derczynski, L., Kirk, H. R., Balachandran, V., Kumar, S., Tsvetkov, Y., Leiser, M. R., & Moham- mad, S. (2023). Assessing language model deployment with risk cards.arXiv preprint arXiv:2303.18190. Dobbe, R. I. J. (2022). System safety and artificial intelligence. In J. B. Bullock et al. (Eds.), The oxford handbook of ai governance.Oxford University Press. doi: 10.1093/oxfordhb/ 9780197579329.013.67 Etzioni, O. (2020).How to know if artificial intelligence is about to destroy civilization.MIT Technology Review. Retrieved fromhttps://w.technologyreview.com/2020/02/ 25/906083/artificial-intelligence-destroy-civilization-canaries-robot -overlords-take-over-world-ai EU High-Level Expert Group on AI. 
(2020).Assessment list for trustworthy artificial intelligence (AL- TAI) for self-assessment.Retrieved fromhttps://digital-strategy.ec.europa.eu/en/ library/assessment-list-trustworthy-artificial-intelligence-altai-self -assessment European Foresight Platform. (n.d.).Cross-impact analysis.Retrieved fromhttp://foresight -platform.eu/community/forlearn/how-to-do-foresight/methods/analysis/ cross-impact-analysis Fei, N., Lu, Z., Gao, Y., Yang, G., Huo, Y., Wen, J., . . . Wen, J.-R. (2022). Towards artificial general intelligence via a multimodal foundation model.Nature Communications,13(1), 3094. doi: 10.1038/s41467-022-30761-2 Ganguli, D., Lovitt, L., Kernion, J., Askell, A., Bai, Y., Kadavath, S., . . . Clark, J. (2022). Red teaming language models to reduce harms: Methods, scaling, behaviors, and lessons learned. 32 arXiv preprint arXiv:2209.07858. Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Iii, H. D., & Crawford, K. (2021). Datasheets for datasets.Communications of the ACM,64(12), 86–92. doi: 10.1145/3458723 Goertzel, B. (2011).Who coined the term “AGI”?Retrieved fromhttps://goertzel.org/ who-coined-the-term-agi/ Good Judgment. (n.d.).Delphineo.Retrieved fromhttps://goodjudgment.com/delphineo/ Google. (n.d.).Google and Alphabet vulnerability reward program (VRP) rules.Retrieved fromhttps://bughunters.google.com/about/rules/6625378258649088/google -and-alphabet-vulnerability-reward-program-vrp-rules Gordon, T. J. (1994).Cross-impact method.Retrieved fromhttps://web.archive.org/web/ 20110713182749/http://w.lampsacus.com/documents/CROSSIMPACT.pdf Grace, K., Salvatier, J., Dafoe, A., Zhang, B., & Evans, O. (2018). When will AI exceed human performance? evidence from AI experts.arXiv preprint arXiv:1705.08807. Gruetzemacher, R., Paradice, D., & Lee, K. B. (2019). Forecasting transformative AI: An expert survey.arXiv preprint arXiv:1901.08579. Gubrud, M. A. 
(1997).Nanotechnology and international security.Retrieved fromhttps://web .archive.org/web/20110427135521/http://w.foresight.org/Conferences/ MNT05/Papers/Gubrud/index.html Hazell, J. (2023). Large language models can be used to effectively scale spear phishing campaigns. arXiv preprint arXiv:2305.06972. Hendrycks, D., & Mazeika, M.(2022).X-risk analysis for AI research.arXiv preprint arXiv:2206.05862. Hendrycks, D., Mazeika, M., & Woodside, T. (2023). An overview of catastrophic AI risks.arXiv preprint arXiv:2306.12001. HFACS. (n.d.).The HFACS framework.Retrieved fromhttps://w.hfacs.com/hfacs -framework.html HM Government.(2021).National AI strategy.Retrieved fromhttps://w .gov .uk/ government/publications/national-ai-strategy Hogarth, I. (2023).We must slow down the race to God-like AI.Financial Times. Retrieved from https://w.ft.com/content/03895dc4-a3b7-481e-95c-336a524f2ac2 Hsiao, S., & Collins, E. (2023).Try Bard and share your feedback.Google Blog. Retrieved from https://blog.google/technology/ai/try-bard/ Hubinger, E. (2022).How likely is deceptive alignment?AI Alignment Forum. Retrieved 2023-06-12, fromhttps://w.alignmentforum.org/posts/A9NxPTwbw6r6Awuwt/how-likely -is-deceptive-alignment IEC. (2019).31010:2019 Risk management — Risk assessment techniques.https://w.iso.org/ standard/72140.html. Ishikawa, K. (1976).Guide to quality control. Asian Productivity Organization. ISO. (2018).31000:2018 Risk management — Guidelines.https://w.iso.org/standard/ 65694.html. ISO & IEC. (2014).Guide 51:2014 Safety aspects — Guidelines for their inclusion in standards. https://w.iso.org/standard/53940.html. ISO & IEC. (2023).23894:2023 Information technology — Artificial intelligence — Guidance on risk management.https://w.iso.org/standard/77304.html. Johnson, J. L. (1976). A ten-year Delphi forecast in the electronics industry.Industrial Marketing Management,5(1), 45–55. doi: 10.1016/0019-8501(76)90009-2 Karnofsky, H. 
(2016).Some background on our views regarding advanced artificial intelli- gence.Open Philanthropy Blog. Retrieved fromhttps://w.openphilanthropy.org/ research/some -background -on -our -views -regarding -advanced -artificial -intelligence/ Kavukcuoglu, K., Kohli, P., Ibrahim, L., Bloxwich, D., & Brown, S. (2022).How our principles helped define AlphaFold’s release.Google DeepMind Blog. Retrieved fromhttps://w .deepmind.com/blog/how-our-principles-helped-define-alphafolds-release Kenton, Z., Shah, R., Lindner, D., Varma, V., Krakovna, V., Phuong, M., . . . Catt, E. (2022a). Clarifying AI x-risk.AI Alignment Forum. Retrieved fromhttps://w.alignmentforum .org/posts/GctJD5oCDRxCspEaZ/clarifying-ai-x-risk Kenton, Z., Shah, R., Lindner, D., Varma, V., Krakovna, V., Phuong, M., . . . Catt, E. (2022b).Threat model literature review.AI Alignment Forum. Retrieved fromhttps:// 33 w.alignmentforum.org/posts/wnnkD6P2k2TfHnNmt/threat-model-literature -review Khlaaf, H. (2023).Toward comprehensive risk assessments and assurance of AI-based systems. Trail of Bits. Retrieved fromhttps://github.com/trailofbits/publications/blob/ master/papers/toward_comprehensive_risk_assessments.pdf Khlaaf, H., Mishkin, P., Achiam, J., Krueger, G., & Brundage, M. (2022). A hazard analysis framework for code synthesis large language models.arXiv preprint arXiv:2207.14157. Kilian, K. A., Ventura, C. J., & Bailey, M. M. (2023). Examining the differential risk from high-level artificial intelligence and the question of control.Futures,151, 103182. doi: 10.1016/j.futures.2023.103182 Kim, J., Lee, J., Kim, G., Park, S., & Jang, D. (2016). A hybrid method of analyzing patents for sustainable technology management in humanoid robot industry.Sustainability,8(5), 474. doi: 10.3390/su8050474 Klein, E. (2023).The surprising thing A.I. engineers will tell you if you let them.The New York Times. 
Retrieved fromhttps://w.nytimes.com/2023/04/16/opinion/this -is-too-important-to-leave-to-microsoft-google-and-facebook.html Kokotajlo, D., & Wei, D. (2019).The main sources of ai risk?AI Alignment Forum. Re- trieved fromhttps://w .alignmentforum .org/posts/WXvt8bxYnwBYpy9oT/the -main-sources-of-ai-risk Kosow, H., & Gaßner, R. (2008).Methods of future and scenario analysis: Overview, assessment, and selection criteria.Deutsches Institut für Entwicklungspolitik. Retrieved fromhttps:// w.ssoar.info/ssoar/handle/document/19366 Krakovna, V., & Shah, R. (2023).[linkpost] some high-level thoughts on the DeepMind alignment team’s strategy.AI Alignment Forum. Retrieved fromhttps://w.alignmentforum .org/posts/a9SPcZ6GXAg9cNKdi/linkpost-some-high-level-thoughts-on-the -deepmind-alignment Leung, J. (2019).Who will govern artificial intelligence? learning from the history of strategic politics in emerging technologies.University of Oxford. Retrieved fromhttps://ora.ox.ac.uk/ objects/uuid:ea3c7cb8-2464-45f1-a47c-c7b568f27665 Leveson, N. G., & Thomas, J. P. (2018).STPA handbook.Retrieved fromhttps://psas.scripts .mit.edu/home/get_file.php?name=STPA_handbook.pdf Li, J., & Chignell, M. (2022). FMEA-AI: AI fairness impact assessment using failure mode and effects analysis.AI and Ethics,2, 837–850. doi: 10.1007/s43681-022-00145-9 Liu, H.-Y., Lauta, K. C., & Maas, M. M. (2018). Governing boring apocalypses: A new typology of existential vulnerabilities and exposures for existential risk research.Futures,102, 6–19. doi: 10.1016/j.futures.2018.04.009 Losi, S.(2023).System dynamics and AI regulation.Risk Musings.Retrieved from https://riskmusings .substack .com/p/system -dynamics -and -ai -regulation ?utm_campaign=post Madaio, M. A., Stark, L., Wortman Vaughan, J., & Wallach, H. (2020).AI fairness checklist.Retrieved fromhttps://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4t6dA Mahler, T. (2022). Regulating artificial general intelligence (AGI). In B. Custers & E. 
Fosch- Villaronga (Eds.),Law and artificial intelligence: Regulating AI and applying AI in legal practice(p. 521–540). Springer. doi: 10.1007/978-94-6265-523-2_26 Manheim, D. (2023).Building a culture of safety for AI: Perspectives and challenges.SSRN. Re- trieved fromhttps://papers.ssrn.com/sol3/papers.cfm?abstract_id=4491421 McChrystal, S. A., & Butrico, A. (2021).Risk: a user’s guide. Portfolio. McConnell, P., & Davies, M. (2006).Safety first – scenario analysis under Basel I.Retrieved from https://w.continuitycentral.com/SafetyFirstscenarioanalysis.pdf Metz, C. (2023).‘The godfather of A.I.’ leaves Google and warns of danger ahead.The New York Times. Retrieved fromhttps://w.nytimes.com/2023/05/01/technology/ ai-google-chatbot-engineer-quits-hinton.html Michael, J., Holtzman, A., Parrish, A., Mueller, A., Wang, A., Chen, A., . . . Bowman, S. R. (2022). What do NLP researchers believe? results of the NLP community metasurvey.arXiv preprint arXiv:2208.12852. Microsoft. (2020).Assessing harm: A guide for tech builders.Retrieved fromhttps://perma.c/ PV3E-HL23 Microsoft. (2022a).Harms modeling.Retrieved fromhttps://learn.microsoft.com/en-us/ azure/architecture/guide/responsible-innovation/harms-modeling/ 34 Microsoft. (2022b).Responsible AI impact assessment template.Retrieved fromhttps://query .prod.cms.rt.microsoft.com/cms/api/am/binary/RE5cmFk Mishkin, P., Ahmad, L., Brundage, M., Krueger, G., & Sastry, G. (2022).DALL·E 2 preview - risks and limitations.GitHub. Retrieved fromhttps://perma.c/X467-47PX Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., . . . Gebru, T. (2019). Model cards for model reporting. InFAT* ’19: Proceedings of the Conference on Fairness, Accountability, and Transparency(p. 220–229). doi: 10.1145/3287560.3287596 Morgan, M. G. (2014). Use (and abuse) of expert elicitation in support of decision making for public policy.Proceedings of the National Academy of Sciences,111(20), 7176–7184. 
doi: 10.1073/pnas.1319946111 Müller, R., & Drax, C. (2014). Fundamentals and structure of safety management systems in aviation. In R. Müller, A. Wittmer, & C. Drax (Eds.),Aviation risk and safety management: Methods and applications in aviation organizations(p. 45–55). Springer. doi: 10.1007/ 978-3-319-02780-7_5 Müller, V. C., & Bostrom, N. (2016). Future progress in artificial intelligence: A survey of expert opinion. In V. C. Müller (Ed.),Fundamental issues of artificial intelligence(p. 553–571). Springer. doi: 10.1007/978-3-319-26485-1_33 Newman, J. (2023).A taxonomy of trustworthiness for artificial intelligence.Center for Long-Term Cybersecurity. Retrieved fromhttps://cltc.berkeley.edu/publication/a-taxonomy -of-trustworthiness-for-artificial-intelligence/ Ngo, R., Chan, L., & Mindermann, S. (2023). The alignment problem from a deep learning perspective.arXiv preprint arXiv:2209.00626. NIST. (2023).Artificial Intelligence Risk Management Framework (AI RMF 1.0).https:// doi.org/10.6028/NIST.AI.100-1. NITI Aayog. (2018).National artificial intelligence strategy.Retrieved fromhttps://niti .gov.in/sites/default/files/2019-01/NationalStrategy-for-AI-Discussion -Paper.pdf OECD. (2023).Advancing accountability in AI: Governing and managing risks throughout the lifecycle for trustworthy AI.OECD Digital Economy Papers. Retrieved fromhttps:// w.oecd-ilibrary.org/science-and-technology/advancing-accountability -in-ai_2448f04b-endoi: 10.1787/2448f04b-en OpenAI. (2023).Announcing OpenAI’s bug bounty program.OpenAI Blog. Retrieved from https://openai.com/blog/bug-bounty-program#OpenAI OpenAI. (2023). GPT-4 technical report.arXiv preprint arXiv:2303.08774. Ord, T. (2020).The precipice. Hachette Books. Ostrom, L. T., & Wilhelmsen, C. A. (2019).Risk assessment: tools, techniques, and their applications. John Wiley & Sons, Inc. Parente, R., & Anderson-Parente, J. (2011). A case study of long-term Delphi accuracy.Technological Forecasting and Social Change,78(9), 1705–1711. 
doi: 10.1016/j.techfore.2011.07.005 Pei, J., Deng, L., Song, S., Zhao, M., Zhang, Y., Wu, S., . . . Shi, L. (2019). Towards artificial general intelligence with hybrid Tianjic chip architecture.Nature,572(7767), 106–111. doi: 10.1038/s41586-019-1424-8 Peng, B., Li, C., He, P., Galley, M., & Gao, J. (2023). Instruction tuning with GPT-4.arXiv preprint arXiv:/2304.03277. Perez, E., Huang, S., Song, F., Cai, T., Ring, R., Aslanides, J., . . . Irving, G. (2022). Red teaming language models with language models.arXiv preprint arXiv:2202.03286. Posner, R. A. (2004).Catastrophe: risk and response. Oxford University Press. Potts, H. W., Anderson, J. E., Colligan, L., Leach, P., Davis, S., & Berman, J. (2014). Assessing the validity of prospective hazard analysis methods: a comparison of two techniques.BMC Health Services Research,14, 41. doi: 10.1186/1472-6963-14-41 Pritchard, C. L. (2015).Risk management: Concepts and guidance. Auerbach Publications. Procope, C., Cheema, A., Adkins, D., Alsallakh, B., Green, N., McReynolds, E., . . . Zvyag- ina, P.(2022).System-level transparency of machine learning.Meta AI.Re- trieved fromhttps://ai.facebook.com/research/publications/system-level -transparency-of-machine-learning/ Raji, I. D., & Buolamwini, J. (2019). Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial AI products. InAIES 2019: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society(p. 429–435). doi: 10.1145/ 3306618.3314244 35 Raji, I. D., Kumar, I. E., Horowitz, A., & Selbst, A. (2022). The fallacy of AI functionality. InFAccT ’22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (p. 959–972). doi: 10.1145/3531146.3533158 Raji, I. D., Smart, A., White, R. N., Mitchell, M., Gebru, T., Hutchinson, B., . . . Barnes, P. (2020). Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. 
InFAT* ’20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency(p. 33–44). doi: 10.1145/3351095.3372873 RAND. (n.d.).Delphi Method.Retrieved fromhttps://w.rand.org/topics/delphi -method.html Rao, A., Vashistha, S., Naik, A., Aditya, S., & Choudhury, M. (2023). Tricking LLMs into disobedi- ence: Understanding, analyzing, and preventing jailbreaks.arXiv preprint arXiv:2305.14965. Rausand, M., & Haugen, S. (2020).Risk assessment: Theory, methods, and applications. Wiley. Reason, J. (2000). Human error: models and management.British Medical Journal,320(7237), 768–770. Rees, M. J. (2004).Our final hour: A scientist’s warning: how terror, error, and environmental disaster threaten humankind’s future in this century on earth and beyond. Basic Books. Ritchey, T., Lövkvist-Andersen, A.-L., Olsson, R., & Stenström, M. (2004). Modelling society’s capacity to manage extraordinary events developing a generic design basis (GDB) model for extraordinary societal events using computer-aided morphological analysis. InSociety for Risk Analysis Conference. Roli, A., Jaeger, J., & Kauffman, S. A. (2022). How organisms come to know the world: Fundamental limits on artificial general intelligence.Frontiers in Ecology and Evolution,9. Rowe, G., & Wright, G. (1999). The Delphi technique as a forecasting tool: issues and analysis. International Journal of Forecasting,15(4), 353–375. doi: 10.1016/S0169-2070(99)00018-7 Rozo, M., Lawler, J., & Paragas, J. (2017). Viral agents of human disease: Biosafety concerns. In D. P. Wooley & K. B. Byers (Eds.),Biological safety: principles and practices(p. 187–220). ASM Press. Russell, S. J. (2019).Human compatible: Artificial intelligence and the problem of control. Viking. Salmon, P. M., Baber, C., Burns, C., Carden, T., Cooke, N., Cummings, M., . . . Stanton, N. A. (2023). 
Managing the risks of artificial general intelligence: A human factors and ergonomics perspective.Human Factors and Ergonomics in Manufacturing & Service Industries, 1–13. doi: 10.1002/hfm.20996 Schuett, J. (2023). AGI labs need an internal audit function.arXiv preprint arXiv:2305.17038. Schuett, J., Dreksler, N., Anderljung, M., McCaffary, D., Heim, L., Bluemke, E., & Garfinkel, B. (2023). Towards best practices in AGI safety and governance: A survey of expert opinion. arXiv preprint arXiv:2305.07153. Schweizer, V. J. (2020). Reflections on cross-impact balances, a systematic method constructing global socio-technical scenarios for climate change research.Climatic Change,162(4), 1705– 1722. doi: 10.1007/s10584-019-02615-2 Searle, J. R. (1980). Minds, brains, and programs.Behavioral and Brain Sciences,3(3), 417–424. doi: 10.1017/S0140525X00005756 Shah, R. (2022).AI risk from program search.AI Alignment Forum. Retrieved fromhttps:// w.alignmentforum.org/posts/wnnkD6P2k2TfHnNmt/threat-model-literature -review Shelby, R., Rismani, S., Henne, K., Moon, A., Rostamzadeh, N., Nicholas, P., . . . Virk, G. (2023). Identifying sociotechnical harms of algorithmic systems: Scoping a taxonomy for harm reduction.arXiv preprint arXiv:2210.05791. Shevlane, T., Farquhar, S., Garfinkel, B., Phuong, M., Whittlestone, J., Leung, J., . . . Dafoe, A. (2023). Model evaluation for extreme risks.arXiv preprint arXiv:2305.15324. Shneiderman, B. (2020). Bridging the gap between ethics and practice: Guidelines for reliable, safe, and trustworthy human-centered AI systems.ACM Transactions on Interactive Intelligent Systems,10(4), 1–31. doi: 10.1145/3419764 Soares, N. (2022).A central AI alignment problem: capabilities generalization, and the sharp left turn.AI Alignment Forum. Retrieved fromhttps://w.alignmentforum.org/ posts/GNhMPAWcfBCASy8e6/a -central -ai -alignment -problem -capabilities -generalization Stein-Perlman, Z., Weinstein-Raun, B., & Grace, K. 
(2022).2022 Expert survey on progress in AI.AI Impacts. Retrieved fromhttps://aiimpacts.org/2022-expert-survey-on-progress -in-ai/ 36 Stolzer, A. J., Sumwalt, R. L., & Goglia, J. J. (2023).Safety management systems in aviation. CRC Press. Suresh, H., & Guttag, J. V. (2021). A framework for understanding sources of harm throughout the machine learning life cycle.arXiv preprint arXiv:1901.10002. Taori, R., Gulrajani, I., Zhang, T., Dubois, Y., Li, X., Guestrin, C., . . . Hashimoto, T. B. (2023). Alpaca: A Strong, Replicable Instruction-Following Model.Stanford University. Retrieved fromhttps://crfm.stanford.edu/2023/03/13/alpaca.html The Vicuna Team. (2023).Vicuna: An open-source chatbot impressing GPT-4 with 90% ChatGPT quality.LMSYS Org Blog. Retrieved fromhttps://lmsys.org/blog/2023-03-30 -vicuna The White House.(2023).National artificial intelligence research and development strategic plan 2023 update.Retrieved fromhttps://w .whitehouse .gov/wp -content/uploads/2023/05/National-Artificial-Intelligence-Research-and -Development-Strategic-Plan-2023-Update.pdf UK Information Commissioner’s Office. (n.d.).Data protection impact assessments.Retrieved from https://ico .org .uk/for -organisations/uk -gdpr -guidance -and -resources/ accountability-and-governance/guide-to-accountability-and-governance/ accountability-and-governance/data-protection-impact-assessments/ UK Royal Academy of Engineering. (2023).Building resilience: lessons from the academy’s review of the national security risk assessment methodology.Retrieved fromhttps://raeng.org.uk/ policy-and-resources/engineering-policy/security-and-resilience/nsra Urbina, F., Lentzos, F., Invernizzi, C., & Ekins, S. (2022). Dual use of artificial-intelligence-powered drug discovery.Nature Machine Intelligence,4(3), 189–191. doi: 10.1038/s42256-022-00465 -9 US Environmental Protection Agency. 
(1998).Guidelines for ecological risk assessment.Retrieved fromhttps://w.epa.gov/risk/guidelines-ecological-risk-assessment van der Heijden, K. (2005).Scenarios: The art of strategic conversation. Wiley. Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P.-S., Mellor, J., . . . Gabriel, I. (2022). Taxonomy of risks posed by language models. InFAccT ’22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency(p. 214–229). doi: 10.1145/ 3531146.3533088 Westerlund, M. (2019). The emergence of deepfake technology: A review.Technology Innovation Management Review,9(11), 39–52. doi: 10.22215/timreview/1282 Wiggers, K. (2023).Anthropic raises $450M to build next-gen AI assistants.TechCrunch. Retrieved fromhttps://techcrunch.com/2023/05/23/anthropic-raises-350m-to-build -next-gen-ai-assistants/ Wikipedia. (n.d.).List of causal mapping software.Retrieved fromhttps://en.wikipedia.org/ wiki/List_of_causal_mapping_software Xia, B., Lu, Q., Perera, H., Zhu, L., Xing, Z., Liu, Y., & Whittle, J. (2023). Towards concrete and connected AI risk assessment (C2AIRA): A systematic mapping study.arXiv preprint arXiv:2301.11616. Yampolskiy, R. V. (2015). Taxonomy of pathways to dangerous AI.arXiv preprint arXiv:1511.03246. Yudkowsky, E. (2008). Artificial intelligence as a positive and negative factor in global risk. In N. Bostrom & M. M. ́ Cirkovi ́ c (Eds.),Global catastrophic risks(p. 308–345). Oxford University Press. Yudkowsky, E. (2023).Pausing AI developments isn’t enough. we need to shut it all down.Time. Retrieved fromhttps://time.com/6266923/ai-eliezer-yudkowsky-open-letter -not-enough/ Zendel, O., Murschitz, M., Humenberger, M., & Herzner, W. (2015). CV-HAZOP: Introducing test data validation for computer vision. InProceedings of the 2015 IEEE International Conference on Computer Vision (ICCV)(p. 2066–2074). doi: 10.1109/ICCV.2015.239 Zhang, B., Dreksler, N., Anderljung, M., Kahn, L., Giattino, C., Dafoe, A., & Horowitz, M. C. (2022). 
Forecasting AI progress: Evidence from a survey of machine learning researchers. arXiv preprint arXiv:2206.04132. Zwetsloot, R., & Dafoe, A. (2019).Thinking about risks from AI: Accidents, misuse and structure. Lawfare. Retrieved fromhttps://perma.c/H3CQ-SEQ9 37 Appendix: Other popular risk assessment techniques TechniqueExplanationReason(s) for excluding BrainstormingBrainstorming is a group exercise to generate and explore ideas. It works best when facilitated to ensure stimulation, avoid common fallacies (digressions, group-think, some participants not speaking up), and make sure that ideas are captured. Brainstorming can be used on its own or within other techniques. Included in many other techniques InterviewsIn structured interviews, interviewees are asked a set of prepared questions. In unstructured interviews, there is freedom to explore issues which arise. Semi-structured interviews are a mix of both pure forms. Interviews can be used on their own or within other techniques. Similar to surveys, but often more time-consuming SurveysSurveys are questionnaires that provide a way to get responses from a large number of people. Surveys can be used on their own or within other techniques. Note that surveys and interviews are overlapping and not necessarily distinct. Included in many other techniques which gather expert opinions Nominal group technique The Nominal group technique is a procedure to collect, explore, and decide on ideas. Participants answer questions independently and then each present their ideas. Next, participants discuss and aggregate these ideas, and finally vote on the ideas anonymously. Somewhat similar to Delphi technique, but not fully anonymous and most useful for making decisions about risks (which already is part of risk treatment) Cindynic approachThe Cindynic approach involves semi-structured interviews that are conducted with different stakeholders. Their answers are put together in a matrix. 
This helps to identify dissonances, inconsistencies, ambiguities, omissions, and ignorances between stakeholders. Somewhat similar to Delphi technique, but much less established and does not comprise exchange of reasoning between stakeholders Table 5: Techniques for eliciting views from experts and stakeholders 38 TechniqueExplanationReason(s) for excluding Hazard identification (HAZID) HAZID is a high-level technique. It involves conducting a workshop to identify risks, for example, through brainstorming (see above). Needs to be concretized through other techniques (e.g. brainstorming) Preliminary hazard analysis (PHA) PHA involves a preliminary collection of hazards through analyzing historical data, gathering expert forecasts, and brainstorming. Relatively superficial; does not provide a very specific or structured approach Failure modes and effects analysis (FMEA) and failure modes, effects and criticality analysis (FMECA) A system or process is divided into elements, for each of which failure modes, their causes and effects (FMEA), and optionally their criticality (FMECA) are identified through brainstorming. If quantified, a risk priority number can be calculated. All information is captured in a worksheet. Traditional hardware reliability technique; may be too simplistic for catastrophic risks from AI which involve complex interactions between various events (techniques like causal mapping or STPA may be more adequate); overlaps with fishbone method and bow tie analysis, which additionally provide visualizations Hazard and operability (HAZOP) Within a facilitated workshop, a system or process is divided into elements, for each of which possible deviations from design intent, their causes and effects are identified. The facilitator prompts the participants with a set of guide words like “no”, “less”, etc. All information is captured in a worksheet. 
Reason(s) for excluding: Traditional hardware reliability technique; may be too simplistic for catastrophic risks from AI, which involve complex interactions between various events (techniques like causal mapping or STPA may be more adequate); overlaps with the fishbone method and bow tie analysis, which provide more freedom (no predetermined set of guide words) and visualizations.

What-if analysis
Explanation: Within a facilitated workshop, a system or process is examined for potential risks and risk sources. The facilitator prompts the participants with “what if?” questions.
Reason(s) for excluding: Implicitly performed in the fishbone method, which provides more structure (through categories of causes) as well as a visualization.

Structured what-if technique (SWIFT)
Explanation: Within a facilitated workshop, a system or process is examined for potential risks. The facilitator prompts the participants with combinations of “what if?” and a set of guide words on timing, amount, etc. SWIFT is relatively quick to carry out. It can be used on its own or within other techniques.
Reason(s) for excluding: Similar to the fishbone method, which provides more freedom (no predetermined set of guide words) as well as a visualization.

Business impact analysis (BIA)
Explanation: BIA is a method to identify critical business processes and functions. It aims to enable appropriate planning for disruptive events. BIA is undertaken using surveys, interviews, workshops, or a combination of these.
Reason(s) for excluding: Limited to financial risks (does not include risks to the organization’s environment for their own sake).

Process mapping
Explanation: Process mapping involves drawing a graphical representation of the steps involved in industrial or other processes and identifying the associated risks.
Reason(s) for excluding: Relatively superficial; does not provide a very specific approach for identifying risks once the process has been mapped.

Job hazard analysis (JHA) / job safety analysis (JSA)
Explanation: JHA/JSA are conducted before performing a specific task. The task is broken down into its smallest sub-tasks, for each of which potential risks are brainstormed.
Reason(s) for excluding: Best suited to routine safety-critical tasks or to individual safety-critical tasks that are not very complex (such as evals; even for these, the technique is usually not used because it would be very extensive and might add little, as safety is already paramount when designing them).

Change analysis
Explanation: Change analysis aims to identify risks by comparing a known system with an unknown system. It can be done when considering or observing changes.
Reason(s) for excluding: Implicit within risk assessment (generally, risks should be assessed again whenever changes are considered or observed).

Gap analysis
Explanation: Similar to change analysis, gap analysis compares two states of a system. However, it starts from an ideal system and aims to identify shortcomings in the current system.
Reason(s) for excluding: Not very helpful if it is unclear what the ideal system would look like.

Root cause analysis (RCA)
Explanation: RCA is a high-level technique. It involves investigating a risk for its ultimate sources, especially after an incident has occurred. This is done, for example, through what-if analysis (see above).
Reason(s) for excluding: Needs to be concretized through other techniques (e.g. what-if analysis).

Table 6: Techniques for identifying risks

Bayesian networks
Explanation: Bayesian networks are graphical representations of events that include each event's probability conditional on other events, together with the probabilities of those events. Readily available software can be used to build Bayesian networks.
Reason(s) for excluding: Similar to cross-impact analysis (cross-impact matrices and interdependency maps), which takes into account more complex interactions between events.

Influence diagrams
Explanation: Bayesian networks are called influence diagrams when they include actions and uncertainties. Readily available software can be used to build influence diagrams.
Reason(s) for excluding: Similar to causal mapping and parts of cross-impact analysis (cross-impact matrices and interdependency maps), which take into account more complex interactions between events.

Event tree analysis (ETA)
Explanation: ETA involves drawing a graphical representation of the sequence of consequences resulting from a given initiating event, with each subsequent event expressed in binary terms (failure/success). It is developed using forward reasoning to identify these consequences. ETA can be quantified to provide the probabilities of the identified possible outcomes.
Reason(s) for excluding: Traditional hardware reliability technique; may be too simplistic for catastrophic risks from AI, which involve complex interactions between non-binary events (techniques like causal mapping or STPA may be more adequate); overlaps with several other techniques (event trees can help to develop scenarios and constitute the right side of bow tie diagrams); see also Section 3.

Fault tree analysis (FTA)
Explanation: FTA involves drawing a graphical representation of the sequence of causes leading to a given undesired event, making use of Boolean logic (such as AND/OR). It is developed using backward reasoning to identify these causes. FTA can be quantified to provide the probabilities of the identified possible failures.
Reason(s) for excluding: Traditional hardware reliability technique; may be too simplistic for catastrophic risks from AI, which involve complex interactions between non-binary events (techniques like causal mapping or STPA may be more adequate); overlaps with several other techniques (fault trees are implicit in fishbone diagrams and constitute the left side of bow tie diagrams); see also Section 3.

Cause-consequence analysis (CCA)
Explanation: CCA is a combination of ETA and FTA.
Reason(s) for excluding: Same reasoning as for ETA and FTA.

Master logic diagram (MLD)
Explanation: MLD is similar to FTA (see above), but not limited to binary failure/success events. It can also be understood as a summary FTA.
Reason(s) for excluding: Same reasoning as for FTA (see above).

Human reliability analysis (HRA)
Explanation: HRA is a family of techniques that aim to establish the likelihood of human error within a system. It involves identifying the steps and sub-steps of an activity, estimating the probability of human error, and identifying the factors influencing this probability.
Reason(s) for excluding: Best for routine safety-critical tasks, for which data on human reliability exists.

Markov analysis
Explanation: Markov analysis can be applied to any system that has discrete, independent states. It uses the probabilities of the transitions between these states to estimate the long-run probability of the system being in a specified state, the expected time before its first failure, and the expected time before it returns to a specified state.
Reason(s) for excluding: May be too simplistic for catastrophic risks from AI, which involve complex interactions between various events (techniques like causal mapping or STPA may be more adequate).

Monte Carlo simulations
Explanation: Monte Carlo simulation is a stochastic method that uses random sample values to provide a probability distribution of outcomes that are, in principle, deterministic. It can be used on its own or within other techniques. Monte Carlo simulations tend to de-emphasize high-consequence/low-probability risks.
Reason(s) for excluding: Neglects low-probability risks; included in cross-impact analysis.

Value at risk (VaR)
Explanation: VaR is a measure that indicates the possible loss of financial assets, at a given confidence level, under normal market conditions over a specific time period. It can be developed using Monte Carlo simulations, historical simulations, analytical methods, or a combination of these. VaR calculations for the tails are often unstable.
Reason(s) for excluding: Limited to financial risks (does not include risks to the organization’s environment for their own sake); neglects low-probability risks.

Conditional value at risk (CVaR) or expected shortfall (ES)
Explanation: In response to the instability of VaR calculations for the tails, CVaR/ES measures the expected loss in the outcomes beyond the VaR threshold, i.e. it focuses on the outcomes generating the greatest losses.
Reason(s) for excluding: Limited to financial risks (does not include risks to the organization’s environment for their own sake).

Table 7: Techniques for analyzing causes, consequences, and likelihood of risks

Hazard analysis and critical control points (HACCP)
Explanation: HACCP is a structure for identifying sources of risks and putting controls in place at all relevant parts of a process to protect against them. After identifying hazards, influencing factors, and possible preventive measures, the points in the process where monitoring and control are possible are determined. Then, a system of critical limits and corrective actions is established.
Reason(s) for excluding: May be too simplistic for catastrophic risks from AI, which involve complex interactions between various events (techniques like causal mapping or STPA may be more adequate); already constitutes a transition into risk treatment.

Layers of protection analysis (LOPA)
Explanation: LOPA is a simplified form of ETA (see above) that considers a single cause-consequence pair. It involves analyzing the reduction in risk achieved by controls, by identifying independent protection layers and estimating both their individual and their overall probability of failure.
Reason(s) for excluding: Same reasoning as for ETA (see above).

Table 8: Techniques for analyzing controls

Risk indices
Explanation: Risk indices allow comparisons of different risks. Factors believed to influence the magnitude of risk are identified, scored, and combined. In the simplest formulations, factors that increase the level of risk are multiplied together and divided by those that decrease it.
Reason(s) for excluding: Similar to risk matrices, which are more common.

As low as reasonably practicable (ALARP) and so far as is reasonably practicable (SFAIRP)
Explanation: ALARP/SFAIRP involve establishing criteria and categories that account for both the acceptability of risks and the practicability of reducing them. These help to decide whether it is reasonably practicable to reduce a risk further.
Reason(s) for excluding: Similar to risk matrices, which are more common.

Frequency-number (F-N) diagrams
Explanation: F-N diagrams are a special case of a risk matrix, namely one that focuses on fatalities. They plot the number of fatalities N on the x-axis against the cumulative frequency F of events causing N or more fatalities on the y-axis.
Reason(s) for excluding: Similar to risk matrices, which are more comprehensive.

S-curves
Explanation: S-curves are a graphical display of the severity of consequences of a risk as a probability density function (PDF) or cumulative distribution function (CDF). They allow representing the significance of a risk where there is a distribution of consequences.
Reason(s) for excluding: Neglects low-probability risks.

Reliability centered maintenance (RCM)
Explanation: RCM is a version of FMECA (see above) that focuses on situations where potential failures can be eliminated or reduced in frequency and/or consequence through maintenance of equipment. It enables decisions to be made based on the significance of risk. RCM spans all three risk assessment steps.
Reason(s) for excluding: Only for hardware reliability.

Table 9: Techniques for evaluating risks
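The FMEA/FMECA entry in Table 6 notes that a risk priority number (RPN) can be calculated once the worksheet is quantified. No formula is given here; this sketch assumes the widely used convention of multiplying severity, occurrence, and detection ratings, each on a 1 to 10 scale. The class name, field names, and worksheet rows are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a hypothetical FMEA worksheet (ratings on assumed 1-10 scales)."""
    element: str
    failure_mode: str
    severity: int    # 1 = negligible effect, 10 = catastrophic effect
    occurrence: int  # 1 = very unlikely cause, 10 = almost certain cause
    detection: int   # 1 = almost certainly detected, 10 = practically undetectable

    def rpn(self) -> int:
        """Risk priority number under the common S x O x D convention."""
        ratings = (self.severity, self.occurrence, self.detection)
        if any(not 1 <= r <= 10 for r in ratings):
            raise ValueError("each rating must lie between 1 and 10")
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(worksheet):
    """Sort worksheet rows from highest to lowest RPN."""
    return sorted(worksheet, key=lambda row: row.rpn(), reverse=True)


worksheet = [
    FailureMode("cooling pump", "fails to start", severity=7, occurrence=3, detection=4),
    FailureMode("pressure sensor", "drifts high", severity=8, occurrence=4, detection=6),
    FailureMode("alarm panel", "display blank", severity=4, occurrence=2, detection=2),
]

for row in rank_by_rpn(worksheet):
    print(f"{row.element:16} {row.failure_mode:16} RPN={row.rpn()}")
```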
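The ETA entry in Table 7 describes quantifying an event tree to obtain the probabilities of its outcomes. The following is a minimal sketch. It assumes each barrier either works or fails, independently of the others, and it enumerates the full tree without pruning branches, which practical event trees often do. The function name, initiating frequency, barrier names, and failure probabilities are all hypothetical.

```python
from itertools import product


def event_tree(initiating_frequency, barriers):
    """Frequency of every branch of a binary event tree.

    initiating_frequency: expected initiating events per year.
    barriers: ordered (name, probability_of_failure) pairs, assumed independent.
    Returns a dict mapping a tuple of branch labels to its frequency per year.
    """
    branches = {}
    for failures in product((False, True), repeat=len(barriers)):
        frequency = initiating_frequency
        labels = []
        for (name, p_fail), failed in zip(barriers, failures):
            # Multiply by the probability of this barrier's outcome on this path.
            frequency *= p_fail if failed else 1.0 - p_fail
            labels.append(f"{name} {'fails' if failed else 'works'}")
        branches[tuple(labels)] = frequency
    return branches


# Hypothetical initiating event (0.5 per year) followed by two safety barriers.
tree = event_tree(0.5, [("detection", 0.1), ("shutdown", 0.05)])
for path, freq in tree.items():
    print(" -> ".join(path), f"{freq:.4f} per year")
```

Summing the frequencies of all branches recovers the initiating frequency, which serves as a quick consistency check on the tree.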
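The FTA entry in Table 7 describes fault trees that combine causes with Boolean AND/OR gates and can be quantified. A minimal sketch, assuming all basic events are statistically independent (an assumption that complex AI failure modes may well violate). Under that assumption, an AND gate multiplies its input probabilities, and an OR gate yields 1 - prod(1 - p_i). The class names and the example tree are hypothetical.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Union


@dataclass
class BasicEvent:
    """A leaf of the fault tree with an assumed probability of occurrence."""
    name: str
    probability: float


@dataclass
class Gate:
    """A Boolean gate combining child events: kind is "AND" or "OR"."""
    kind: str
    children: List["Node"] = field(default_factory=list)


Node = Union[BasicEvent, Gate]


def probability(node: Node) -> float:
    """Probability of the event at `node`, assuming independent basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    child_probs = [probability(child) for child in node.children]
    if node.kind == "AND":
        # Every input event must occur.
        return prod(child_probs)
    if node.kind == "OR":
        # At least one input event occurs: 1 minus the chance that none occurs.
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical tree: protection is lost if both sensors fail OR the operator errs.
top_event = Gate("OR", [
    Gate("AND", [BasicEvent("primary sensor fails", 0.1),
                 BasicEvent("backup sensor fails", 0.2)]),
    BasicEvent("operator error", 0.05),
])
print(f"P(loss of protection) = {probability(top_event):.4f}")
```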
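The Markov analysis entry in Table 7 lists three outputs: the long-run probability of being in a state, the expected time before the first failure, and the expected time before returning to a state. This minimal sketch assumes a discrete-time chain with the Markov property, so the next state depends only on the current one. It models a hypothetical three-state system whose transition probabilities are invented for illustration. The long-run probabilities solve pi P = pi with the entries of pi summing to 1. The expected time to first failure treats the failed state as absorbing. For an irreducible chain, the expected return time to a state is 1 / pi of that state.

```python
import numpy as np

# Hypothetical three-state system observed at fixed time steps.
STATES = ["nominal", "degraded", "failed"]
P = np.array([
    [0.90, 0.08, 0.02],  # from nominal
    [0.30, 0.60, 0.10],  # from degraded
    [0.50, 0.00, 0.50],  # from failed (repair returns the system to nominal)
])


def stationary_distribution(P):
    """Long-run probability of each state: solve pi P = pi with sum(pi) = 1."""
    n = P.shape[0]
    A = P.T - np.eye(n)
    A[-1, :] = 1.0  # replace one redundant balance equation by normalization
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


def expected_steps_to(P, target):
    """Expected number of steps to first reach `target` from every other state."""
    others = [i for i in range(P.shape[0]) if i != target]
    Q = P[np.ix_(others, others)]  # transitions that avoid the target state
    steps = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    return dict(zip(others, steps))


pi = stationary_distribution(P)
failed = STATES.index("failed")
to_failure = expected_steps_to(P, failed)
mean_return_time = 1.0 / pi[failed]  # valid because this chain is irreducible

print(dict(zip(STATES, pi.round(4))))
print("expected steps to first failure from nominal:", to_failure[0])
print("mean time between returns to 'failed':", mean_return_time)
```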
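The VaR and CVaR/ES entries in Table 7 note that VaR can be estimated with Monte Carlo simulation and that CVaR/ES looks at the losses beyond the VaR threshold. This minimal sketch assumes a hypothetical portfolio worth 1,000,000 whose one-day return is normally distributed with mean 0 and standard deviation 1%. Real applications use richer return models. With 100,000 draws, the 99% tail rests on only about 1,000 simulated outcomes, which suggests why tail estimates can be unstable.

```python
import numpy as np


def var_and_cvar(losses, confidence=0.99):
    """Empirical VaR and CVaR/ES of simulated losses (positive values are losses)."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, confidence)   # loss exceeded with prob. 1 - confidence
    cvar = losses[losses >= var].mean()     # average loss in the tail beyond VaR
    return var, cvar


# Hypothetical portfolio and return distribution, chosen for illustration only.
rng = np.random.default_rng(seed=42)
portfolio_value = 1_000_000
simulated_returns = rng.normal(loc=0.0, scale=0.01, size=100_000)
simulated_losses = -portfolio_value * simulated_returns

var_99, cvar_99 = var_and_cvar(simulated_losses, confidence=0.99)
print(f"1-day 99% VaR: {var_99:,.0f}")
print(f"1-day 99% CVaR/ES: {cvar_99:,.0f}")
```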
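The LOPA entry in Table 8 describes estimating the risk reduction achieved by independent protection layers. The usual LOPA arithmetic is standard background rather than something spelled out here. It multiplies the initiating-event frequency by the probability of failure on demand (PFD) of each independent protection layer to obtain the mitigated frequency of the consequence. The layers, numbers, and tolerable-frequency criterion below are hypothetical.

```python
from math import prod


def mitigated_frequency(initiating_frequency, layer_pfds):
    """Consequence frequency after independent protection layers (IPLs).

    initiating_frequency: expected initiating events per year.
    layer_pfds: probability of failure on demand of each IPL, assumed independent.
    """
    for pfd in layer_pfds:
        if not 0.0 <= pfd <= 1.0:
            raise ValueError("each PFD must be a probability between 0 and 1")
    return initiating_frequency * prod(layer_pfds)


# Hypothetical scenario: one initiating event per ten years, three IPLs.
layers = {"alarm + operator": 0.1, "relief valve": 0.01, "trip system": 0.1}
overall_pfd = prod(layers.values())
frequency = mitigated_frequency(0.1, layers.values())

tolerable_frequency = 1e-4  # hypothetical risk criterion, events per year
print(f"overall PFD: {overall_pfd:.0e}")
print(f"mitigated frequency: {frequency:.0e} per year")
print("meets criterion:", frequency <= tolerable_frequency)
```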
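The F-N diagram entry in Table 9 describes a plot of fatalities against frequency. The usual construction is background knowledge rather than a detail given here. In it, each point pairs a number of fatalities N with the cumulative frequency F of events causing N or more fatalities, typically on log-log axes. This minimal sketch computes those points from a hypothetical set of scenarios and leaves plotting out so that it needs no dependencies. The function name is illustrative.

```python
def fn_curve(scenarios):
    """Points (N, F) of an F-N curve.

    scenarios: list of (annual_frequency, fatalities) pairs, one per scenario.
    F(N) is the total annual frequency of scenarios causing N or more fatalities.
    """
    levels = sorted({fatalities for _, fatalities in scenarios if fatalities >= 1})
    return [
        (n, sum(freq for freq, fatalities in scenarios if fatalities >= n))
        for n in levels
    ]


# Hypothetical scenario set (frequencies per year, fatalities per event).
scenarios = [(1e-2, 1), (5e-3, 1), (1e-3, 10), (1e-4, 100)]
for n, f in fn_curve(scenarios):
    print(f"N >= {n:>3}: F = {f:.2e} per year")
```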