Paper deep dive
LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems
Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello, Rodrigo Duarte de Meneses, Briland Hitaj, Ulf Lindqvist
Intelligence
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 95%
Last extracted: 3/12/2026, 5:25:53 PM
Summary
This paper provides a systematic review of security and privacy threats to LLM-based systems, categorizing threats and defensive strategies across the entire software and LLM life cycle. It analyzes real-world use cases, defines severity levels for threats, and maps defense strategies to specific life cycle phases to assist developers and organizations in building secure LLM integrations.
Entities (5)
Relation Signals (3)
OWASP → published → Top 10 list
confidence 99% · the Open Worldwide Application Security Project (OWASP) released a Top 10 list specific for this domain
LLM-based systems → is vulnerable to → Jailbreak
confidence 95% · From zero-click AI worms, to jailbreaks and backdoor attacks, LLMs are a constant target
LLM-based systems → is vulnerable to → Prompt Injection
confidence 95% · Research on prompt injection attacks by developing a formal characterization framework
Cypher Suggestions (2)
Find all threats associated with LLM-based systems · confidence 90% · unvalidated
MATCH (s:System {name: 'LLM-based systems'})-[:IS_VULNERABLE_TO]->(t:Threat) RETURN s.name, t.name
List all organizations and their published security guidelines · confidence 85% · unvalidated
MATCH (o:Organization)-[:PUBLISHED]->(d:Document) RETURN o.name, d.name
Abstract
The success and wide adoption of generative AI (GenAI), particularly large language models (LLMs), has attracted the attention of cybercriminals seeking to abuse models, steal sensitive data, or disrupt services. Moreover, providing security to LLM-based systems is a great challenge, as both traditional threats to software applications and threats targeting LLMs and their integration must be mitigated. In this survey, we shed light on security and privacy concerns of such LLM-based systems by performing a systematic review and comprehensive categorization of threats and defensive strategies considering the entire software and LLM life cycles. We analyze real-world scenarios with distinct characteristics of LLM usage, spanning from development to operation. In addition, threats are classified according to their severity level and to which scenarios they pertain, facilitating the identification of the most relevant threats. Recommended defense strategies are systematically categorized and mapped to the corresponding life cycle phase and possible attack strategies they attenuate. This work paves the way for consumers and vendors to understand and efficiently mitigate risks during integration of LLMs in their respective solutions or organizations. It also enables the research community to benefit from the discussion of open challenges and edge cases that may hinder the secure and privacy-preserving adoption of LLM-based systems.
Tags
Links
- Source: https://arxiv.org/abs/2509.10682
- Canonical: https://arxiv.org/abs/2509.10682
PDF not stored locally. Use the link above to view on the source site.
Full Text
231,540 characters extracted from source content.
LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems

Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello, Rodrigo Duarte de Meneses, Briland Hitaj, and Ulf Lindqvist

Abstract—The success and wide adoption of generative AI (GenAI), particularly large language models (LLMs), has attracted the attention of cybercriminals seeking to abuse models, steal sensitive data, or disrupt services. Moreover, providing security to LLM-based systems is a great challenge, as both traditional threats to software applications and threats targeting LLMs and their integration must be mitigated. In this survey, we shed light on security and privacy concerns of such LLM-based systems by performing a systematic review and comprehensive categorization of threats and defensive strategies considering the entire software and LLM life cycles. We analyze real-world scenarios with distinct characteristics of LLM usage, spanning from development to operation. In addition, threats are classified according to their severity level and to which scenarios they pertain, facilitating the identification of the most relevant threats. Recommended defense strategies are systematically categorized and mapped to the corresponding life cycle phase and possible attack strategies they attenuate. This work paves the way for consumers and vendors to understand and efficiently mitigate risks during integration of LLMs in their respective solutions or organizations. It also enables the research community to benefit from the discussion of open challenges and edge cases that may hinder the secure and privacy-preserving adoption of LLM-based systems.

Index Terms—Cybersecurity, Threats, Risks, Mitigations, Defenses, Threat Modeling, Use Cases, Generative AI, Large Language Models, LLM, Real-world Deployment.

I. INTRODUCTION

The widespread use of large language models (LLMs) has attracted the attention of many to this new artificial intelligence (AI) technology [220]. The ability of LLMs to engage in human-like conversations and provide answers to complex questions has changed the way users search for content online, with many adopting it into their daily lives. LLMs have become the source of information for many users, helping them to write, summarize, learn, and develop software, among other tasks. This evolution has extended to companies, as shown by an IBM study [140] in which nearly 80% of UK business leaders confirmed that they have already deployed LLMs or have plans to do so in their organizations, with the goal of enhancing customer experience, advancing modernization, and improving operational efficiency. But even though the benefits of this new technology are many, adopting it widely brings old and new security concerns to software developers.

Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello, and Rodrigo Duarte de Meneses are with Instituto de Pesquisas Eldorado, Av. Alan Turing, 275 - Cidade Universitária, Campinas - SP, 13083-898, Brazil (e-mail: vitor.moia@eldorado.org.br; igor.sanz@eldorado.org.br; gabriel.rebello@eldorado.org.br; rodrigo.meneses@eldorado.org.br). Briland Hitaj and Ulf Lindqvist are with the Computer Science Lab, SRI International, 333 Ravenswood Ave, Menlo Park, CA 94025, USA (e-mail: briland.hitaj@sri.com; ulf.lindqvist@sri.com).

First, LLMs are an attractive target for adversaries due to the sensitive information they contain. Privacy issues arise from the memorization of training data by LLMs, as demonstrated in empirical studies [2], [112], [279]. The origin of LLM information is largely the public Internet, but in some cases, it also contains proprietary data, obtained from an organization and used during model fine-tuning.
Thus, the model may unintentionally generate content during its responses that contains sensitive data (e.g., personal information, application programming interface (API) keys, proprietary data), illicit content (e.g., detailed instructions on how to build a bomb or generate malware), or copyrighted data.

Second, developing and deploying LLMs is a complex and never-ending task. LLM-based systems inherit threats from common software applications and also present new threats specific to the LLMs and their integration. We define an LLM system as a set of software components responsible for leveraging an LLM to fulfill its goal, such as a chat-bot or agent, or integrated into applications to perform specific tasks, and containing some of the following elements: user interface, APIs, databases, input and output processing modules, among others, including a foundation or fine-tuned LLM. The components may vary according to the purpose and requirements of the solution. Thus, vulnerabilities coming from the elements that compose the system should also be considered during the security assessment of an LLM-based system.

Software vulnerabilities may exist in the development pipeline due to supply chain attacks aiming to expose businesses' secrets (e.g., source code, APIs, training data) via compromised plugins used by developers, or taking advantage of improper input validation in web API tools that could lead to remote code execution. We have seen vulnerabilities in commercial LLMs, such as ChatGPT [225], having problems with their open source software dependencies, leaking sensitive data from active users of the platform (e.g., first and last name, email address, payment address, credit card number, etc.). Many Common Vulnerabilities and Exposures (CVE) records are being disclosed for all types of software used during the development and deployment of LLMs, including CVE-2024-8309, CVE-2024-28088, CVE-2024-3924, and CVE-2024-10044.
There are also initiatives that foster vulnerability discovery, such as bug bounty programs specific to AI-related software [139]. Known (classical) vulnerabilities [93], [161], [187], [222], [290], [296] also continue to be a problem for LLM-based systems.

Particular threats to LLMs are also seen in recent news [46], [65], [155], [160], [260], [313]. From zero-click AI worms [60], to jailbreaks [46], [155], [260], [274] and backdoor attacks [312], LLMs are a constant target of new forms of attacks aimed at obtaining sensitive information, implanting malware, and influencing model behavior. Concerned with the risks and vulnerabilities in LLM development, the Open Worldwide Application Security Project (OWASP) released a Top 10 list specific to this domain [230]. Many works in the literature have also been presenting different forms of attacks to LLMs [7], [107], [137], [194], [228], [270], [331], but gaps still exist regarding LLM security. With so many threats discovered so far, it is difficult to keep track and identify those applicable to a specific scenario with unique design characteristics. Given the limited resources available, it is paramount that we understand where to focus efforts while protecting LLM-based systems from the most critical threats.

In this survey, we address some gaps in the field by designing and analyzing distinctive LLM use case scenarios under a security perspective, highlighting different design choices that may affect the security and privacy of systems. We perform a systematic review and characterization of threats and defenses considering the development and deployment of LLM-based systems, presenting different forms of executing attacks, defining a severity level for each threat, and classifying possible defense strategies according to the threat category they mitigate and to which software development life cycle phase they apply.
By considering different LLM scenarios with specific design choices, we present an analysis of threats and defensive strategies to these scenarios. This work seeks to answer the following research questions:

RQ1: What are the main use cases and design choices of LLM systems from a security perspective?
RQ2: What are the major threats to real-world LLM-based systems and how can they be mitigated?
RQ3: How do different use cases and design choices affect the security and privacy of LLM-based systems?

By addressing these research questions, we hope to provide guidance for future development and deployment of secure real-world LLM-based systems in different scenarios and uses of this technology. It is important to note that while this work focuses on security and privacy risks to LLMs, these concepts also apply to the entire GenAI field.

In this paper, we first present in Section II the methodology adopted for the systematic review we performed, followed by a summary of the literature in Section III to compare our contributions to related surveys. In Section IV, we show the phases of the software and LLM life cycles considered in this work and define different LLM scenarios in which one can apply this technology. Section V presents the results of a review and a characterization of threats identified in LLM-based systems, followed by a severity level analysis of each threat in Section VI. In Section VII we categorize possible defensive strategies to be adopted by LLM developers and identify the software and LLM life cycle phase to which they can be applied. Based on the previously defined scenarios, in Section VIII we analyze the application of a threat modeling methodology to some LLM scenarios, highlighting threats and defenses. Section IX discusses results, open challenges, and limitations, and Section X concludes the paper. Table I presents the full list of abbreviations used in the paper.
TABLE I. GLOSSARY OF TECHNICAL TERMS AND SURVEY ACRONYMS

Technical Terms:
- API: Application Programming Interface
- ASR: Automated Speech Recognition
- CSP: Cloud Service Provider
- CVE: Common Vulnerabilities and Exposures
- CVSS: Common Vulnerability Scoring System
- DevSecOps: Development, Security, and Operations
- GenAI: Generative AI
- GPS: Global Positioning System
- LLM: Large Language Model
- LLMOps: Large Language Model Operations
- MITM: Man-in-the-Middle
- MLBOM: Machine Learning Bill of Materials
- RAG: Retrieval-Augmented Generation
- RCE: Remote Code Execution
- SBOM: Software Bill of Materials
- SQL: Structured Query Language
- SSDLC: Secure Software Development Lifecycle
- TTP: Tactics, Techniques and Procedures

Survey Acronyms:
- A[id]: Availability (Threat)
- AG: Agent (Use Case)
- AP: Integrated App (Use Case)
- AR: Access to Resources (Design Choice)
- C[id]: Confidentiality (Threat)
- CB: Chat-bot (Use Case)
- CL: Continuous Learning (Design Choice)
- D: Development
- DI: Dev. and Deploy Infrastructure (Design Choice)
- DP: Data Provenance (Design Choice)
- FM: Foundation Model (Use Case)
- FT: Fine-Tuning (Use Case)
- I[id]: Integrity (Threat)
- IO: Prompt Input Origin (Design Choice)
- M[id]: Mitigation
- O: Operation
- RG: RAG (Use Case)
- SI: Shared Infrastructure (Design Choice)
- SL: SW Libraries and Dependencies (Design Choice)
- ST: Stage
- UC: Use Case

II. METHODOLOGY

This study follows a systematic approach to identify and select relevant work related to threats and defenses of LLM-based systems. The methodology is structured according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [110], a well-documented framework that helps make systematic reviews meaningful, transparent, and reproducible. Figure 1 presents an overview of the selection process in the form of a PRISMA flow diagram. Next, we detail the steps performed in each of the three phases illustrated in Figure 1.

A. Identification Phase

The first step to identify relevant work is to create a precise search query using Boolean operators, wildcards (*), and keywords based on research goals. The search query constructed for this work is as follows:

Search query: ("LLM*" ∨ "Language Model*" ∨ "Generative AI" ∨ "ChatGPT" ∨ "GPT*" ∨ "Chat-bot*") ∧ ("Cybersecurity" ∨ "Security" ∨ "Attack*" ∨ "Defense*" ∨ "Protection" ∨ "Risk*" ∨ "Jailbreak*" ∨ "Threat*" ∨ "Privacy" ∨ "Vulnerabilit*" ∨ "Mitigation*" ∨ "Prompt Injection*" ∨ "Red Team*" ∨ "Side-Channel" ∨ "Membership Inference" ∨ "Backdoor*")

Fig. 1. Steps and results from the application of the PRISMA guidelines – the final list of papers comprises a total of 198 references. (Flow: papers identified from databases – ACM Digital Library (29), IEEE Xplore (372), Google Scholar (309), arXiv (1507); duplicate papers removed (214); 2003 papers screened; papers excluded by title (1487) and by abstract (80); 436 papers assessed for eligibility; papers excluded for lacking substantial discussions (106), only tangential relevance (93), or being surveys or reviews of existing work (81); 42 papers included via other methods; final dataset: 198.)

We conducted a comprehensive search using the aforementioned query in ACM Digital Library, IEEE Xplore, Google Scholar, and arXiv. The search yielded a total of 2217 references, of which 214 were identified as duplicates and removed. Hence, 2003 papers continued to the screening phase.

B. Screening Phase

We performed a two-stage screening process to select the most relevant studies for this review, as described below.

1) Title/Abstract Screening: The first screening stage involved a preliminary review of the titles and abstracts of all references with the goal of removing publications that were not explicitly related to the security or privacy of LLMs. Studies that were deemed out of scope based on their title and abstract were excluded.
This step reduced the analyzed list to 436 references, excluding 1487 papers based on their titles and 80 references based on their abstracts.

2) Full-text Screening: We reviewed the full-text versions of the remaining 436 studies with two goals: (i) ensure they are indeed high-quality articles relevant to the research objectives of this work; and (ii) classify the papers according to their scope. A subjective analysis conducted by the authors filtered 106 works that lacked substantial discussion and 93 that presented only tangential relevance. The remaining references were classified into two types: (a) papers that propose specific attacks and mitigations for LLM systems, which are the object of study of this Systematization of Knowledge (SoK); and (b) references that provide reviews of LLM security and privacy literature, similar to this work. In Section III we analyze the 81 works classified as (b) and separate the relevant works for discussion. After completion of the eligibility assessment, 156 studies of type (a) remained.

C. Inclusion Phase

The final step of PRISMA involves investigating and retrieving relevant works cited by the references selected in the screening phase. Throughout this process, we discovered 42 additional type (a) references that should be added to the list. The final list contains 198 references that propose either attack techniques (presented and discussed in Section V), defense strategies (presented and detailed in Section VII), or both.

III. RELATED WORK

Several studies review the literature on the security of GenAI, and in particular LLMs. In this section, we present and compare some of these works with respect to their research methodology, scope, coverage, systematic analysis, and limitations. Namely, we consider the 81 references classified as type (b) during our selection process and position our contribution in relation to them.
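The PRISMA counts reported above form a simple arithmetic chain; a minimal sketch that sanity-checks them (the numbers are taken directly from the flow described in the text):

```python
# Sanity-check of the PRISMA flow counts reported in the survey.
identified = 29 + 372 + 309 + 1507   # ACM DL, IEEE Xplore, Google Scholar, arXiv
assert identified == 2217

screened = identified - 214          # after duplicate removal
assert screened == 2003

assessed = screened - 1487 - 80      # after title and abstract screening
assert assessed == 436

type_a = assessed - 106 - 93 - 81    # full-text exclusions leave type (a) studies
assert type_a == 156

final = type_a + 42                  # plus references found via citation chasing
print(final)  # → 198
```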
The studies discussed in this section are further categorized into three groups: (III-A) publications that analyze the potential impact of LLMs regarding security, safety, and ethics; (III-B) papers that provide taxonomies of attack and defense techniques in the context of LLMs; and (III-C) works that address the security of LLM-based systems.

A. Risks and Impacts of Generative AI

Works in this category provide directions on which aspects of cybersecurity can change in the presence of LLMs. The authors of [277] focus on creating a comprehensive and categorized database of AI risks extracted from various sources, dividing risks between a causal taxonomy (i.e., characterizing by entity, intentionality, and timing factors) and a domain taxonomy (separating by technological fields). As a timing factor, they split the risks considering two phases of the AI life cycle: pre-deployment and post-deployment. We follow this approach in the attack classification described in Section VIII. The authors of [14] explore the potential for attacks, defenses, and safety issues in GenAI, proposing short- and long-term goals for the research community. The work of [58] also provides a systematic investigation of safety risks in LLMs, focusing on practical implementation of safety measures in the development stages of LLMs, such as training data, model training, prompting, model alignment, and scaling. Similarly, industry-oriented work [87] systematizes the security risks of Generative AI. Although they discuss several countermeasures to mitigate risks in the LLM development life cycle, these works do not address scenarios and issues that appear in LLM systems. References [74] and [109] discuss the security risks associated with LLMs, focusing specifically on ChatGPT, and how LLM safeguards can be bypassed.
References [104] and [342] provide a comprehensive overview of privacy and security challenges in GenAI, covering technical, ethical, and regulatory aspects related to users and institutions. The impact of LLMs on cybersecurity in general, as well as safety and ethics concerns, is out of the scope of this work. Although we borrow concepts such as LLM life cycle phases from the aforementioned works to discuss security measures, our work differs from them by focusing on attacks against LLM systems rather than the consequences of misuse.

TABLE II. COMPARISON OF RELATED SURVEYS. Filled circles indicate a true correspondence of a research aspect to the work, half-filled circles indicate a limited correspondence, blank indicates an absence of the aspect, and a cross (×) indicates the specified limitation is present. Research aspects compared: methodology (survey, systematic review, taxonomy), scope (threats, mitigations), target (LLM, LLM-based system, LLM-based agents, database), attack coverage (supply chain, jailbreak, privacy, poisoning, disruption, LLM application flaws), SoK analysis (LLM life cycle, design choices, attack/defense map, use cases, threat model, risk assessment), and limitations (attack-specific, model-specific). Compared works: Neel et al. (2024) [219], Ahmed and Jothi (2024) [6], Guo et al. (2022) [108], Huang et al. (2024) [133], Yi et al. (2024) [333], Rababah et al. (2024) [250], Huang et al. (2024) [130], Miranda et al. (2025) [211], Peng et al. (2024) [243], Chowdhury et al. (2024) [56], Liu et al. (2024) [199], Chen et al. (2024) [49], Ferrag et al. (2025) [90], Yan et al. (2025) [323], Abdali et al. (2024) [1], Cui et al. (2024) [63], Das et al. (2025) [66], He et al. (2024) [114], Wang et al. (2024) [302], Gan et al. (2024) [97], Yao et al. (2024) [331], Cui et al. (2024) [64], Huang et al. (2024) [129], and this work. (Per-work cell markings are not reproduced here.)

B. LLM Attacks and Defenses

While systematizing LLM-related risks may provide an overview of the landscape of offensive techniques, some studies focus specifically on attacks to LLMs and their protection mechanisms.
Research on privacy attacks conducted in [133], [211], [219], [257], [323], and [112] analyzes attack methods, their impacts, and corresponding mitigation strategies for securing LLM development, effectively proposing taxonomies for attack techniques and defensive measures. The authors of [130] focus on categorizing and evaluating harmful fine-tuning attacks against LLMs, while jailbreak attacks are systematized in depth by several authors following multiple approaches and taxonomies [6], [17], [30], [158], [197], [199], [243], [250], [266], [270], [303], [319], [333]. The work in [270] surveys real jailbreak prompts collected from online forums and evaluates their effectiveness against state-of-the-art models, while [319] analyzes nine jailbreak attack techniques and benchmarks their performance against their corresponding defensive countermeasures. The works in [158] and [17] focus on understanding how models can be jailbroken. In [32], [197], the authors empirically explore the effectiveness of jailbreak attacks against ChatGPT, while [31] explores LLM biases and robustness in the face of jailbreak prompts. In [30], the authors argue that current jailbreak evaluation processes present limitations and propose new ways to evaluate such threats.

Concerning prompt injection attacks, [266] studies real prompt injection attacks from a crowd-sourced collection approach and proposes a taxonomy of 29 prompt attack techniques. The research in [199] advances the state-of-the-art taxonomy of prompt injection attacks by developing a formal characterization framework, generating a benchmark for evaluating prompt injection attacks and defenses. All aforementioned works share the limitation of focusing on one attack class, either privacy, fine-tuning, jailbreak, or prompt injection, with the last two attack classes often being used interchangeably. On the other hand, [49] explores unique threats and mitigations of LLMs specialized in coding tasks.
With a broader attack landscape, [63] and [108] explore recent research on LLM threats and vulnerabilities, discussing current challenges and open problems, while [56] and [1] provide comprehensive surveys of several attack classes, including jailbreaks, prompt injections, and data poisoning. Considering safety issues, [76] examines methods to attack LLMs focusing on conversational safety, exploring factors such as toxicity, discrimination, privacy, and misinformation. Our work not only aims to extend the coverage of attack classes presented in these papers but also includes threats against LLM systems and agents with specific architectures.

Industry-oriented efforts also play a role in creating taxonomies for LLM attacks. MITRE [288] created a knowledge base to catalog known adversarial tactics, techniques, and procedures (TTPs) relevant to AI systems, and the National Institute of Standards and Technology (NIST) [228] developed a taxonomy and terminology in the field of adversarial machine learning. Although both initiatives are crucial to help standardize LLM attack techniques, they present the same drawbacks as academic works by offering only limited coverage of attacks against LLM systems. Finally, [16], an international effort to provide guidelines on developing LLM systems safely, recommends threat modeling as a supportive process but does not explore it in depth. We address these gaps in this paper.

C. Security of LLM-based Systems

Works in this category are the most similar to ours in the sense that they not only discuss threats against the model, but also consider the system into which the LLM integrates. The work from [7] presents an overview of AI attacks and defense strategies and proposes a security checklist to be followed by AI developers when building an AI-based application, although it does not cover LLM-specific security aspects.
The work in [66] reviews the security and privacy issues of LLM systems, with special attention to the architectural components of LLMs, analyzing two examples of scenarios in which LLM vulnerabilities are exploited. The authors of [331] proposed a taxonomy of LLM threats and defense techniques, including attacks that can be executed using LLMs and threats targeting the LLM-integrated application, such as Remote Code Execution (RCE), side-channel, and supply-chain attacks, although the respective mitigations are outside the scope of their paper. The research from [194] proposes LLMSmith, a static analysis tool to scan the source code of LLM systems to detect RCE vulnerabilities and classify real threats detected in the wild. In turn, [26] provides a comparative study of open-source LLM vulnerability scanners, assessing their performance and capabilities. Both [194] and [26] address real LLM system security issues, but do not explore threat modeling and use case scenarios in which the detected security issues could be exploited. Reference [90] explores six different scenarios in which attacks can occur.

The survey in [64] proposes a comprehensive risk taxonomy in LLM systems and catalogs benchmarks for LLM safety and security evaluations. It also categorizes risks according to the corresponding modules of the architecture of the LLM system and proposes mitigation techniques, thereby providing an initial effort on threat modeling, although from a high-level perspective and limited to a specific type of application. The study in [287] explores the security of LLM-integrated applications related to six types of attacks and proposes a threat model framework based on the STRIDE [159] and DREAD [123] frameworks, providing an interesting case study of a custom-built LLM-powered application to demonstrate the proposed framework.
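STRIDE, referenced above, enumerates six threat categories, each tied to the security property it violates. A minimal sketch of that pairing (it follows the standard STRIDE definition, not a table from this survey; the example attack classification is illustrative):

```python
# STRIDE threat categories mapped to the security property each violates,
# per the standard STRIDE definition.
stride = {
    "Spoofing": "Authentication",
    "Tampering": "Integrity",
    "Repudiation": "Non-repudiation",
    "Information disclosure": "Confidentiality",
    "Denial of service": "Availability",
    "Elevation of privilege": "Authorization",
}

# Illustrative use: a prompt-injection attack that leaks RAG documents and
# alters model output touches two STRIDE categories.
attack_effects = ["Information disclosure", "Tampering"]
violated = [stride[effect] for effect in attack_effects]
print(violated)  # → ['Confidentiality', 'Integrity']
```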
The surveys in [114] and [97] review attacks against LLM agents, while [302] also addresses threats to LLM-related systems, such as the Retrieval-Augmented Generation (RAG) database. Finally, the survey of [129] reviews the risks of the LLM supply chain.

Table II presents a detailed characterization of the most relevant literature surveys that focus on threats to LLMs or LLM systems and compares different research aspects to our work. To the best of our knowledge, no prior work covers LLM threats and defensive strategies considering multiple LLM scenarios, with different use cases and design choices that may affect security and privacy. Therefore, we analyze not only the LLM development life cycle, but also the integration between LLMs and common software development phases, inspired by real-world implementations. We assign severity score levels to the analyzed threats and map them to the LLM scenarios to which they apply. Defensive strategies are grouped and associated with the LLM development life cycle phases in which they should be implemented, making mitigation easier and more precise. Finally, we provide a threat modeling evaluation based on the STRIDE framework for different LLM scenarios, showing the security implications of different system design choices and recommending defensive strategies to attenuate threats.

IV. CHARACTERIZATION OF LLM SCENARIOS

Developing an LLM-based system is a multifaceted process that requires careful planning, execution, and continuous improvement. Furthermore, the system's security is intrinsically linked to securing the process itself, highlighting the need for a thorough threat assessment at every stage of the development and operation pipeline. Hence, it is imperative to understand the life cycle of an LLM-based system to elucidate how the threats examined in this paper may disrupt such a system.
The LLM scenarios discussed in this paper are characterized by their life cycle stage (ST), use case (UC), and decisions for the existing design choices (DP, DI, SL, SI, IO, AR, CL), further explained in this section. To uniquely represent a combination of these variables, we propose a canonical representation of an LLM scenario defined by a string, LLM Scenario : = ST:∗/UC:∗/DP:∗/DI:∗/SL:∗/SI:∗/IO:∗/AR:∗/CL:∗ whereas the asterisks represent the possible configuration values from stage, use case, and the 7 design choices. Finally, we provide four examples of real scenarios created by this framework. A. Life Cycle of LLM-based Systems The life cycle of LLM-based systems has been conceptu- alized in various ways, leading to references that describe in detail the numerous actions involved from the creation of the model to its integration into a system and its operation [16], [234]. For clarity, this paper adopts a simplified model tailored to the classification of threats that aggregates such actions into phases. Figure 2 depicts such a model, with the vertical axis separating the Development stage, which encompasses all pre-production processes for both the LLM and the system (or software application that interacts with the LLM), from the Operation stage, which contains the processes occurring post-deployment. Similarly, the horizontal axis distinguishes the processes related to system development and deployment (DevOps) from the phases specific to the development and integration of LLMs (LLMOps). The phases involved in the system development are: • Planning Phase. In this initial phase, the development team delineates system requirements, schedules, mile- stones, and the overarching architecture of the system, 6 PlanningSystem Development Data Engineering LLM Development LLM Integration Operation DevOps LLMOps Development Operation Create LLM / Fine-tune / RAG Use off-the-shelf LLM Fig. 2. Adopted life cycle model of an LLM system. 
The phases can be classified along two axes according to their nature: system-related phases (DevOps) vs. LLM-related phases (LLMOps), and Development vs. Operation phases. Planning also includes the selection of the appropriate LLM. A critical decision during this phase is whether to develop a proprietary model through training or fine-tuning, which necessitates the collection and preparation of relevant data, or to integrate a general-purpose, off-the-shelf LLM directly into the system. This decision defines which path will be taken in the LLMOps cycle.
• System Development Phase. This phase covers most of the software development life cycle, including coding, building, and testing. Note that independently of the planning phase decision regarding which LLM to use (create a new one, customize an existing one, or develop a RAG application), the system with the integrated LLM must be tested prior to deployment.
The LLMOps process must be undertaken when a new model is to be developed, when a general-purpose model needs to acquire specific knowledge to enhance response quality in the application context, or even when an off-the-shelf LLM is simply integrated into a system, requiring fewer steps in that case. For the particular case of providing specific knowledge to an LLM, this can be achieved through two primary approaches: model fine-tuning and RAG. For model fine-tuning, particular data is collected to retrain the model, enabling it to provide more specific and accurate responses. In contrast, the RAG approach constructs a database of pertinent information, which the LLM consults prior to reasoning, thereby yielding more precise answers. Regardless of the approach, the LLMOps cycle comprises three phases:
• Data Engineering Phase. In this initial phase, relevant data must be collected, analyzed, and prepared for the creation of the LLM or to assist in fine-tuning an existing LLM to align with the primary objectives of the solution.
The goal is to acquire high-quality and diverse data to address a broad spectrum of cases. For RAG applications, this phase involves collecting and curating data to establish an efficient database.
• LLM Development Phase. This phase entails the creation of a foundation model or the selection and fine-tuning of an existing one using the data prepared in the previous phase. For RAG applications, the database is built, and the RAG process is established through the selection of an efficient embedding and retrieval algorithm. Subsequently, the LLM is evaluated by developers to mitigate any unexpected behavior.
• LLM Integration Phase. After the developed LLM is fully tested, it is integrated into the software, culminating in the release and deployment of the system.
When the LLM is successfully integrated and tested, the software can be considered ready for operation:
• Operation Phase. Upon deployment, the system transitions into the production phase, in which both the developed software and the LLM operate together to provide the desired service. Continuous monitoring is implemented to collect user feedback and identify new data that may assist in aligning the LLM with the desired objectives. Although, for simplicity, this iterative cycle is not depicted in Figure 2, this feedback-induced process can be conceptualized as small iterative cycles of the LLMOps collect-train-integrate process [16]. This ensures that the model is perpetually fine-tuned based on observed data. This phase can also receive data related to system failures and any other anomalies.
We emphasize that this life cycle refers to software companies that need to adapt LLMs to align with the specific requirements of an application. AI companies that develop general-purpose LLMs such as ChatGPT, DeepSeek, and Llama may follow a shorter version of the LLMOps process in which the model itself constitutes the final product, bypassing the integration phase.
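To make the RAG consultation flow described above concrete, the sketch below builds a toy in-memory knowledge base, retrieves the most similar entry for a query, and augments the user prompt with it before it would be sent to an LLM. The corpus, the bag-of-words scoring, and the prompt template are our own illustrative choices, not a prescribed implementation; production systems use learned embedding models and dedicated vector databases.

```python
from collections import Counter
import math

# Toy knowledge base standing in for a curated RAG database
# (illustrative content only).
KNOWLEDGE_BASE = [
    "The export API rejects files larger than 50 MB.",
    "Refunds are processed within 5 business days.",
    "Two-factor authentication can be enabled in account settings.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector.
    Real systems use a learned embedding model instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Retrieval step: rank knowledge-base entries by similarity to the query."""
    q = embed(query)
    return sorted(KNOWLEDGE_BASE, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def augment_prompt(user_prompt: str) -> str:
    """Augmentation step: prepend retrieved context to the user prompt
    before it is sent to the LLM."""
    context = "\n".join(retrieve(user_prompt))
    return f"Context:\n{context}\n\nQuestion: {user_prompt}"

prompt = augment_prompt("How long do refunds take?")
```

The security relevance of this flow is that the knowledge base becomes part of the model's effective input; if an adversary can write to it, the augmented prompt carries the poisoned content directly into the LLM (the RAG poisoning threat discussed later).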
Moreover, the development of foundation models involves extensive data collection, substantially more than that required for fine-tuning or RAG, and involves training the model from scratch, which demands considerably greater effort.
Using the proposed life cycle, we define high-level use cases for LLM-based systems to better characterize the unique threats they are subject to. We categorize the use cases using the two main stages of the life cycle: Development and Operation. Shaped by a combination of design choices, the use cases are derived into more granular cases that differ in the variety, impact, and likelihood of threats. We refer to each unique variation of design choices within a use case as a scenario. The comprehensive list of all use cases discussed in this paper and their potential design choices is presented in Table I. In the following sections, we characterize the use cases and the aspects that may vary within them.

TABLE I
MAP OF POSSIBLE DESIGN CHOICES FOR EACH DEVELOPMENT AND OPERATION USE CASE OF LLM SYSTEMS. CIRCLES INDICATE THAT A DESIGN CHOICE CAN HAVE SUCH A CONFIGURATION FOR A SCENARIO, WHILE CROSSES INDICATE THAT IT IS NOT A VALID CONFIGURATION BY DEFINITION. WHEN A DESIGN CATEGORY IS NOT SUITABLE FOR EITHER (D) OR (O) USE CASES, IT IS MARKED AS NOT APPLICABLE.

Development Use Cases (D): Foundation Model (FM), Fine-Tuning (FT), RAG (RG)
Operation Use Cases (O): Chat-bot (CB), Integr. App (AP), Agent (AG)

Design Choice: Possible Configurations
Data Provenance (DP): Public (U), Private (R), Hybrid (H)
Dev. and Deploy. Infrastructure (DI): On-Premises (P), On-Cloud (C), On-Device (D), Hybrid (H)
SW Libraries and Dependencies (SL): Proprietary (P), Open-source (O), Hybrid (H)
Shared Infrastructure (SI): Yes (Y), No (N)
Prompt Input Origin (IO): Text Field (T), App (A), Voice (V), Hybrid (H) (not applicable to Development use cases)
Access to Resources (AR): No (N), Tools (T), DBs (D), HW/SW Sensors (S), Internet (I), Hybrid (H) (not applicable to Development use cases)
Continuous Learning (CL): No (N), User Feedback (U), Federated (F) (not applicable to Development use cases)

B. Development Use Cases

LLMs need large amounts of text data to be created from scratch. Some of the databases used to train LLaMA [292], GPT-3, and other popular LLMs include C4 [251], CommonCrawl [61], WebText2 [27], GitHub, Wikipedia, and others. Since this is an expensive process in terms of both human and computational effort, most applications reuse pre-trained models adapted for particular needs. Pre-existing foundation models, or off-the-shelf LLMs, can be obtained from public repositories on the Internet, such as Hugging Face 1, and further trained to address a particular task of interest using techniques like fine-tuning [78] or RAG [172]. The three common use cases related to the development phases of LLMs are summarized below:
• Foundation Model Creation: The process of creating a general-purpose LLM, trained on large volumes of data, usually collected from public sources such as blogs, news, social media, web repositories, etc.
• Fine-Tuning: The process of taking a foundation model and specializing it for particular tasks using additional labeled data from public or private sources (or both).
• Retrieval-Augmented Generation (RAG) Preparation (Offline): The process of constructing an additional knowledge database and a retrieval system that augments prompt requests with contextually relevant information to improve the accuracy and relevance of LLM responses. RAG is a way to specialize the LLM system without model retraining, which is usually more efficient than fine-tuning with respect to computational effort.

1 Available at: https://huggingface.co/
C. Operation Use Cases

After creating, training, and, if necessary, specializing an LLM, the next step is to deploy it. At this point, the purpose of the model and how users will access and interact with it must be clear. The different use cases related to LLM operation considered in this work are presented below:
• Chat-bot: In chat-bots, users can interact directly with the LLM system using a text field with instructions (prompts) such as questions and requests, mimicking human-like conversations. Chat-bots are arguably the core drivers of GenAI popularization in recent years, with the most notorious example being ChatGPT [223]. The key characteristic of this use case, security-wise, is that the user is in full control of what the LLM receives.
• LLM-Integrated Application: In LLM-integrated applications, users do not interact directly with the LLM, but via an application that queries an integrated model to solve specific problems [85], [107], [196], [307]. In this process, the application acts as an intermediary responsible for converting the user input into an efficient query to the LLM, obtaining its response, processing it, and finally presenting the result to the user. Some examples of this use case include code-generating tasks performed with LLM assistant plugins, LLM-powered summarization of user reviews in marketplaces, LLM-assisted web searches, copilots, and others.
• LLM-based Agent: The last use case considers an LLM system that leverages an LLM to reason, make decisions, and act [330]. For a particular task, the LLM-based agent prompts an LLM, which is a central component, to perform multi-step reasoning and decide over predefined actions, such as invoking tools, calling APIs, querying databases, or interacting with the user or the environment. Agents are usually composed of a central system, a memory that stores past context, and a module to call and interact with external tools.
LLM agents also have the ability to interact with each other and make decisions autonomously [180]. An example of an agent-based application is Microsoft Copilot [210], a tool that assists Windows users with many different tasks, such as drafting documents, summarizing emails, enhancing work efficiency, etc. We can also employ LLM agents as Personal Assistants [180], or even for interactive environment simulation, in which LLM agents can make inferences about themselves, other agents, and the environment [238]. Frameworks to support the creation, deployment, and management of such agents include Auto-GPT [11], LangChain [165], and AutoDroid [310] (for mobile).

D. LLM System Design Choices

When developing LLM systems, more specifically in the planning phase, several aspects can affect security, as design choices may expose the LLM to different threats. The major aspects to consider are:
• Data Provenance (DP): Concerns the type or source of data used for training the model in any of the cases presented in Section IV-B. For public data sources, the Internet is the default option, as it contains almost all publicly disclosed information ever registered. Most LLMs available today obtained training data from the Internet [130], [198]. For private data, sources can include network data, databases, or any other source of information with restricted access (e.g., sensitive information within the premises of a private company). However, since LLMs require massive amounts of data to train, it is generally infeasible to use only private data due to its scarcity. Thus, a common practice is to adopt a hybrid approach that mixes both public and private data; public data is used to create a foundation model, and private data is used to specialize the model to address specific problems within a private context.
• Dev. and Deployment Infrastructure (DI): This decision concerns where development and deployment will occur.
During the development phase, the choices for installing, configuring, and using tools, storing the collected data for training/fine-tuning/RAG, creating and manipulating the model, and dealing with any other sensitive information are restricted to the company's premises or third-party Cloud Service Providers (CSP). During model operation, one must decide where the LLM system will be deployed, with choices ranging from the user's device [45], [272], [317], [334] to company premises and CSPs. A hybrid choice is also possible in both cases. For instance, a common practice is to have an on-device application interface for user interaction while heavy data is processed on-premises or in the cloud.
• Software Libraries and Dependencies (SL): Concerns the use of external libraries and dependencies. Considering the complex environment and features expected of an LLM-based system, using only proprietary (in-house developed) libraries to develop an LLM-based system can be infeasible due to the amount of effort and human resources needed to maintain such tools. Thus, many developers either use Open-Source Software (OSS) or adopt a mix of OSS and proprietary software to deal with complex tasks. Security-wise, the use of OSS demands a deep analysis of the available choices to avoid supply chain attacks and other maintenance risks.
• Shared Infrastructure (SI): Concerns whether the same model is shared among different users. LLM deployment on external servers (on-premises/cloud) usually follows the shared infrastructure approach, in which each user has access to a particular instance of the same model, a practice known as LLM-as-a-Service (LLMaaS). Although less common, on-device deployments can also follow a similar architecture that deploys a unique LLM to be shared among applications [334].
In both types of deployment (on-device or external servers), sharing the infrastructure exposes LLM-based systems to additional threats if mitigation strategies are not implemented, especially due to the possibility of unintended data leaks. In a shared infrastructure, context information should be stored per user or per app [334]. Isolated instances of LLMs can be a safer choice, but they can incur extra financial costs or be constrained by device resources.
• Prompt Input Origin (IO): Prompts are the way users communicate with LLMs, and there are different forms in which an LLM can receive a prompt. In chat-bot applications, the usual way is via a text field, with users providing text with their wishes directly to the LLM system. Another way is using template prompts adapted at runtime and supplied to the LLM via apps that integrate and use the model to solve complex problems [107]. A third option is by voice [299], either using a multimodal model [271] (out of scope of this survey) or using an LLM coupled with text-to-speech and speech-to-text models. For instance, GPT-3.5 and GPT-4 use three models to handle users' voice: one to transcribe the audio to text, the LLM to process the text as a normal prompt, and a third model to transform the output text into audio [226]. It is also possible that an LLM system allows more than one of these methods, using one at a time.
• Access to Resources (AR): One interesting feature of LLMs is their ability to interact with their environment. Given their vast knowledge base, they can be empowered to solve problems beyond the capabilities of natural language processing and to overcome some of their intrinsic limitations, such as lacking access to up-to-date information. All of this is achieved through the use of external resources, such as tools, databases (e.g., RAG), hardware and software sensors, and the Internet.
LLMs can be given the ability to interact with functions written by developers, APIs (including external ones), or other resources, using libraries such as LangChain [165], [166], for instance. Some authors have shown that LLMs are also able to teach themselves how to use some tools, requiring just a few demonstrations [264]. Besides calling functions to perform specific operations, LLMs can also access databases to overcome issues related to performance and outdated information [114], [180]. To this end, RAG [172] can be used, giving the LLM access to a database of specialized knowledge without the costs of retraining (or fine-tuning) it. LLM-based agents can access device information via sensors, both hardware (accelerometers, gyroscopes, GPS, etc.) and software (app usage, call records, typing), e.g., in personal assistants [180]. Finally, LLMs can also be given the ability to access Internet data, obtaining data from websites to perform some action (e.g., text summarization), processing users' emails, or fetching data from websites and repositories as part of their functionality [107], [224]. Note that an LLM system can have access to more than one resource type.
• Continuous Learning (CL): After creating an LLM, we need to consider how to keep it up-to-date, and even how to detect and correct mistakes. Some possibilities are fine-tuning and RAG (Section IV-B), but we can also collect data from end-users during system operation and use it to refine the model. One possible approach is user feedback, which, unlike the reinforcement learning approaches used in the alignment process, aims to obtain end-user suggestions about the LLM system in order to improve it; this can be as simple as a thumbs up/down on an LLM response, or rely on other feedback mechanisms [10], [227].
Federated Learning is another possibility to improve the performance of LLMs on specific tasks, using data from multiple entities that can contribute to the training process without directly sharing (and exposing) sensitive user data. This is a well-known approach for ML algorithms, but it can also be applied to LLMs [86], [162].
Although there are many interesting possibilities here, there are also threats to be aware of, depending on the characteristics chosen when deploying LLMs in the real world. Each of these choices will affect the security of the whole system, so understanding the threats associated with the possible scenarios can help mitigate potential risks and allow LLM developers to make better choices from a security perspective.

E. Examples of LLM Scenarios and Possible Design Choices

Some of the scenarios presented in Table I are detailed in this section. Here, we are not interested in how complex it is to build an application or train an LLM, the pros and cons of using a CSP, or other such aspects. We focus on showing different forms of building and using LLMs in real-world scenarios and elucidating the threats associated with these scenarios and possible defensive strategies. Besides, we do not aim to present all possible combinations of scenarios and design choices, but to cover at least the most important ones from a security perspective.
1) Development of an LLM for a Chat-bot Application (ST:D/UC:FM/DP:U/DI:P/SL:H/SI:N/IO/AR/CL): The first example of an LLM scenario is depicted in Figure 3. In this scenario, a company chooses to build an LLM-based chat-bot application using a model created from scratch. The data used for training the model is collected entirely from the Internet, from sources such as code repositories, social media, news websites, blogs, forums, and other public sources. This data is pre-processed and stored in a database on company premises.
Next, the company develops all the necessary code considering the whole software and LLM life cycle, such as model development, data analysis, training, and evaluation, combining open-source libraries and proprietary code, and using a single infrastructure to perform all activities. The company also creates the application interface, responsible for taking user inputs and presenting model output.

Fig. 3. Scenario 1: LLM development process (on company premises) for a chat-bot application.

2) Chat-bot Application on User's Device (ST:O/UC:CB/DP:U/DI:D/SL:O/SI:N/IO:T/AR:N/CL:N): Figure 4 presents an example of an LLM deployed on the user's device. This scenario corresponds to a chat-bot application developed using only open-source code and an off-the-shelf LLM trained on public data. The user can interact via prompts with the LLM, without delay or the need for an Internet connection. Besides faster responses, all data exchanged with the application remains on the device (a desirable privacy feature). Note that an API key may be used to restrict LLM access and functionalities. Although this is not yet a common scenario for LLMs, as we advance in creating more specialized and compact models (e.g., using quantization, caching, and other techniques to reduce processing power and memory during LLM inference [228], [272]), we expect to see more examples of models deployed on smartphones and IoT devices [317], [334]. Google AICore is a practical example of such a deployment [29], acting as a system service that apps can incorporate.
3) LLM-Integrated Application on-Cloud (ST:O/UC:AP/DP:H/DI:H/SL:H/SI:Y/IO:A/AR:H/CL:U): LLMs can also be integrated into applications to perform specific tasks, e.g., solving a mathematical problem [194]. The user accesses an application and provides a problem. The application creates a prompt asking the LLM to solve it. The model returns code able to solve the problem to the app, which executes it to get the answer.
Finally, the application processes the result and displays the solution to the user.

Fig. 4. Scenario 2: A chat-bot application operating on the user's device with an off-the-shelf LLM.

In this scenario, illustrated in Figure 5, we consider an application developed using a mix of open-source and proprietary code, and a foundation model obtained from public sources, trained on public data, and fine-tuned to solve specific problems using private information. The LLM system runs on the shared infrastructure of a CSP, in a hybrid deployment configuration, with the app interface on the user's device and the LLM and other resources on the external server. The LLM has the ability to execute tools in the environment and to access the Internet as an additional information source. Upon the user's request, the application can fetch content from the Internet to help in solving a problem. Moreover, users can evaluate the application's effectiveness in solving problems via an available feedback feature.

Fig. 5. Scenario 3: An LLM-integrated application deployed on-cloud and with Internet access.

4) LLM-based Agent for User Assistance (ST:O/UC:AG/DP:H/DI:H/SL:H/SI:Y/IO:H/AR:H/CL:F): The last example scenario presented here is the use of LLM-based agents, such as for user assistance. Figure 6 depicts this scenario. Upon receiving a task, the agent plans how to solve it, accesses its memory (past context data), executes tools, collects the necessary environment data (from hardware or software sensors), and completes the task autonomously. This scenario considers a hybrid deployment, with some agents deployed on the user's device to receive tasks (via voice commands or text), collect data, and send all information to a central system (another LLM) deployed on the cloud (shared infrastructure) for further processing. The LLM-based agent system is developed using proprietary and open-source code, trained on public and private data, and adopts federated learning for continuous learning.

Fig. 6.
Scenario 4: An LLM-based agent system deployed on the user's device and the cloud, with access to resources and tools.

V. CHARACTERIZATION OF THREATS TO LLMS

Given the widespread adoption of LLMs, threats to this technology have become increasingly common. In this section, we review different ways of abusing or damaging LLM systems and present a new form of classifying and characterizing the threats identified in our literature review from Section I. Our approach centers on the Confidentiality-Integrity-Availability (CIA) triad model, similar to NIST's classification [228], but we did not include the misuse category as NIST does because we understand it is a subcategory of the Integrity aspect. 2 We also explore the threat characteristics (e.g., the adversary's knowledge, strategies to attack LLM systems, goals and targets, use cases, and the form of system interaction during the attack) and present other threats associated with solutions using this technology.
Tables IV and V contain the list of threats to LLM systems and their main characteristics, highlighting important information for understanding the attack requirements, goals, and vulnerable scenarios. In Table IV, we present LLM threats that occur in the development phases, mostly related to poisoning and the supply chain, while in Table V, we explore the threats to LLMs in operation. Each identified threat is mapped to one aspect of the CIA triad, denoted by the letter of its ID. This means that the main type of damage a threat may cause to a system is: (C) stealing sensitive and valuable data; (I) manipulating or jailbreaking the guardrails and other security mechanisms designed to prevent model misuse; or (A) disrupting services and functionalities or wasting resources. Note that whenever we refer to a threat, risk, or possible attack on LLM systems in this paper, we associate it with the threats listed in Tables IV and V.
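The characterization scheme just described (CIA aspect, goal, class, target, adversary knowledge, interaction form, and affected use cases) lends itself to a machine-readable catalog that can be filtered by scenario. The sketch below encodes two rows of the threat tables this way; the field names and the query helper are our own illustrative choices, not an artifact of the survey.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Threat:
    """One row of the threat characterization (in the style of Tables IV and V)."""
    id: str          # the C/I/A prefix encodes the CIA aspect
    name: str
    goal: str
    cls: str         # threat class, e.g., "Data Infection"
    target: str      # LLM, LLM System, Data, or Infrastructure
    knowledge: str   # "Black-box" or "White-box"
    interaction: str # "One-shot", "Multi-turn", or "Indirectly"
    use_cases: frozenset  # subset of {FM, FT, RG, CB, AP, AG}

    @property
    def cia_aspect(self) -> str:
        """Derive the CIA aspect from the ID letter."""
        return {"C": "Confidentiality", "I": "Integrity",
                "A": "Availability"}[self.id[0]]

# Two example entries transcribed from the tables.
THREATS = [
    Threat("I01", "Training Data Poisoning", "LLM Misuse", "Data Infection",
           "Data", "Black-box", "Indirectly", frozenset({"FM", "FT", "RG"})),
    Threat("A05", "Sponge Examples", "Service Disruption", "Resource Drain",
           "LLM", "Black-box", "Multi-turn", frozenset({"CB"})),
]

def threats_for(use_case: str) -> list:
    """Filter the catalog for threats affecting a given use case."""
    return [t for t in THREATS if use_case in t.use_cases]
```

A defender planning a chat-bot deployment, for example, could call `threats_for("CB")` to shortlist only the rows relevant to that scenario before prioritizing mitigations by severity.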
We further expand the adversaries' goals into the CIA aspects and define subgroups based on the similarities that the attack forms share, which we call classes, as presented in the following for each goal.
• Confidentiality: Adversaries aim to obtain: sensitive and valuable data, such as data used to train an LLM (C16); parameters or properties enabling them to create a shadow model (C13) (i.e., generating a model that mimics the behavior of the original model for specific tasks, based

2 Although jailbreak attacks may not change the model binary (a strong indication of integrity violation), we believe they can affect how the model will respond to users. A successful attack "breaks" the security mechanisms and guardrails, making the model behavior unpredictable and different from the one programmed by its developers.

TABLE IV
LIST OF THREATS TO LLM DEVELOPMENT AND THEIR CHARACTERISTICS.

ID | Threat | Goal | Class | Target | Attack Strategy | Adversary Knowledge | LLM Interaction | Affected Use Case
I01 | Training Data Poisoning [87] [94] | LLM Misuse | Data Infection | Data | 3, 4, 5 | Black-box | Indirectly | FM, FT, RG
I02 | Model Poisoning [21] [34] [43] [53] [54] [79] [87] [131] [134] [138] [149] [179] [182] [183] [252] [305] [311] [325] [326] [329] [343] [348]–[351] | LLM Misuse | Data Infection | LLM | 3, 4, 5 | White-box | Indirectly | FM, FT, RG
I03 | Fine-Tuning Poisoning [130] | LLM Misuse | Data Infection | Data | 4, 5 | Black-box | Indirectly | FT, RG
I04 | RAG Poisoning [51] [52] [164] [205] [346] | LLM Misuse | Data Infection | Data | 1, 4, 5 | Black-box | Indirectly | CB, AP, AG
I05 | Model Dependencies Exploitation [228] | LLM Misuse | Third-Party SW Compromise | LLM System | 3 | Black-box | Indirectly | FM, FT, RG
I06 | Fake Plugins [107] | LLM Misuse | Third-Party SW Compromise | LLM System | 3 | Black-box | Indirectly | FM, FT, RG

TABLE V
LIST OF THREATS TO LLM OPERATION AND THEIR CHARACTERISTICS.
ID | Threat | Goal | Class | Target | Attack Strategy | Adversary Knowledge | LLM Interaction | Affected Use Case
A01 | Disrupting Search Results [107] | Service Disruption | Bad LLM Response | LLM | 2, 3, 10 | Black-box | Indirectly | CB, AP, AG
A02 | Inhibiting Capabilities [107] | Service Disruption | Bad LLM Response | LLM | 2, 3, 10 | Black-box | Indirectly | CB, AP, AG
A03 | Output Hijacking [5] [194] [236] | Service Disruption | Bad LLM Response | LLM System | 3, 5, 8 | Black-box | Indirectly | CB, AP, AG
A04 | Bit-Flip Attack [68] [254] [328] | Service Disruption | Bad LLM Response | Infrastructure | 5, 8, 11 | White-box | Indirectly | CB, AP, AG
A05 | Sponge Examples [276] | Service Disruption | Resource Drain | LLM | 1, 2, 10 | Black-box | Multi-turn | CB
A06 | Time Consuming [107] [163] | Service Disruption | Resource Drain | LLM | 2, 3, 10 | Black-box | Indirectly | CB, AP, AG
A07 | Token Wasting [99] [196] | Service Disruption | Resource Drain | LLM System | 6, 7, 8 | Black-box | Indirectly | CB, AP, AG
C01 | Embedding Inversion [193] [214] | Data Stealing | Inference | LLM System | 6, 7, 8 | White-box | Indirectly | CB, AP, AG
C02 | Gradient Inversion [132] [247] | Data Stealing | Inference | LLM System | 5, 6, 8 | White-box | Indirectly | CB, AP, AG
C03 | Model Fingerprinting [218] | Data Stealing | Inference | Infrastructure | 5, 11 | Black-box | Indirectly | CB, AP, AG
C04 | Token-length Inf. [309] | Data Stealing | Inference | Infrastructure | 5, 6, 11 | Black-box | Indirectly | CB, AP, AG
C05 | GPU Information Leak [204] | Data Stealing | Inference | Infrastructure | 5, 11 | Black-box | Indirectly | CB, AP, AG
C06 | Shared-cache Hit Inf. [278] [352] | Data Stealing | Inference | Infrastructure | 1, 11 | Black-box | Multi-turn | CB, AP, AG
C07 | User Data Exfiltration [22] [263] | Data Stealing | Inference | LLM | 2 | Black-box | One-shot | CB, AP, AG
C08 | Membership Inf. [8] [38] [44] [96] [207] [208] [212] [228] [259] [275] [345] | Data Stealing | Inference | LLM | 1 | Black-box | Multi-turn | CB
C09 | Distribution Inf. [112] [228] [257] [284] | Data Stealing | Inference | LLM | 1 | Black-box | Multi-turn | CB
C10 | User Inference [153] | Data Stealing | Inference | LLM | 1 | Black-box | Multi-turn | CB
C11 | Attribute Inference [112] [174] [279] | Data Stealing | Inference | LLM | 1 | Black-box | Multi-turn | CB
C12 | Memorized Data Extraction [35] [257] | Data Stealing | Extraction | LLM | 1 | Black-box | Multi-turn | CB
C13 | Model Replication [24] [230] [257] | Data Stealing | Extraction | LLM | 1 | Black-box | Multi-turn | CB
C14 | Model Reverse Engineering [45] [111] [353] | Data Stealing | Extraction | LLM | 9 | Black-box | Indirectly | CB, AP, AG
C15 | API Key Stealing [194] | Data Stealing | Extraction | LLM System | 2, 7, 10 | Black-box | Indirectly | CB, AP, AG
C16 | Prompt Leaking [3] [4] [245] | Data Stealing | Extraction | LLM System | 2, 7, 10 | Black-box | Indirectly | CB, AP, AG
C17 | System Dependencies Exploitation [296] | Data Stealing | Extraction | LLM System | 3, 6 | Black-box | Indirectly | CB, AP, AG
I07 | Byzantine Attack [311] | LLM Misuse | Data Infection | LLM | 4 | Black-box | Indirectly | CB, AP, AG
I08 | Feedback Poisoning [7] [41] | LLM Misuse | Data Infection | LLM | 1, 4 | Black-box | Indirectly | CB, AP, AG
I09 | Instruction Manipulation [69] [136] [146] [196] [197] [255] [266] [269] [282] [308] | LLM Misuse | LLM Prompting | LLM | 1, 2 | Black-box | One-shot | CB, AP, AG
I10 | Obfuscation [18] [37] [91] [154] [173] [176] [197] [255] [266] [335] | LLM Misuse | LLM Prompting | LLM | 1, 2 | Black-box | One-shot | CB, AP, AG
I11 | Pretending [57] [70] [72] [135] [197] [206] [255] [268] [270] | LLM Misuse | LLM Prompting | LLM | 1, 2 | Black-box | One-shot | CB, AP, AG
I12 | Noise-based Attack [324] | LLM Misuse | LLM Prompting | LLM | 1, 2 | Black-box | One-shot | CB, AP, AG
I13 | Recursive Prompt Hacking [266] | LLM Misuse | LLM Prompting | LLM | 1, 2 | Black-box | One-shot | CB, AP, AG
I14 | RCE [15] [194] [197] | LLM Misuse | LLM Prompting | LLM | 2 | Black-box | One-shot | AP, AG
I15 | Prompt-to-SQL Injection [194] [241] | LLM Misuse | LLM Prompting | LLM | 1, 2 | Black-box | One-shot | AP, AG
I16 | Copied Prompt Injection [263] | LLM Misuse | LLM Prompting | LLM | 2 | Black-box | One-shot | CB
I17 | Fuzzing [105] [327] [336] | LLM Misuse | LLM Prompting | LLM | 1 | Black-box | Multi-turn | CB, AP, AG
I18 | Context Manipulation [23] [25] [46] [91] [154] [197] [255] [261] [262] [266] | LLM Misuse | LLM Prompting | LLM | 1 | Black-box | Multi-turn | CB
I19 | LLM Jailbreak Helper [37] [67] [189] [318] | LLM Misuse | LLM Prompting | LLM | 1 | Black-box | Multi-turn | CB
I20 | LLM Jailbreak Helper w/ Feedback [40] [75] [77] [147] [170] [175] [184] [195] [209] [337] | LLM Misuse | LLM Prompting | LLM | 1 | Black-box | Multi-turn | CB
I21 | Function Calling Jailbreak [314] | LLM Misuse | Data Infection | LLM | 2 | Black-box | Indirectly | CB, AP, AG
I22 | Multi-stage Exploit [107] | LLM Misuse | LLM Prompting | LLM | 2 | Black-box | Indirectly | AP, AG
I23 | Agent Infection [60] [168] | LLM Misuse | LLM Prompting | LLM | 2 | Black-box | Indirectly | AG
I24 | Output Metadata Manipulation [9] [80] [181] | LLM Misuse | LLM Prompting | LLM | 1 | White-box | Multi-turn | CB, AP, AG
I25 | Chat Template Injection [144] [347] | LLM Misuse | LLM Prompting | LLM | 1 | White-box | Multi-turn | CB
I26 | Optimization-based Attack [125] [126] [143] [178] [273] [320] [339] [354] | LLM Misuse | LLM Prompting | LLM | 1 | White-box | Multi-turn | CB
I27 | Optimization-based Attack w/ LLM Helper [185] [192] [341] [348] | LLM Misuse | LLM Prompting | LLM | 1 | White-box | Multi-turn | CB
I28 | Jailbreak Bit-Flip Attack [59] | LLM Misuse | HW Manipulation | LLM | 5, 6, 8 | White-box | Indirectly | CB, AP, AG

on inputs and responses collected through queries [87]); data learned by the LLM, such as business secrets (Intellectual Property) (C12); API keys that grant access to a system (C15); or inputs provided by users that may contain sensitive data (C16). These attacks can be further classified into extraction or inference, based on the form in which the information is obtained from the target: either directly, or estimated from the data obtained, respectively.
• Integrity: Adversaries aim to tamper with the LLM system by poisoning the data used to train (or update) the model (I01, I03-04), changing model parameters and inserting backdoors (especially in pre-trained models shared in public repositories) (I02), manipulating system dependencies (I05), or even jailbreaking the security mechanisms and guardrails of the LLM to manipulate its behavior (I09-28).
By changing the data or how the model responds, the adversary can disseminate disinformation, hate speech, discrimination, and other offensive or biased content, distribute malware, obtain sensitive data from the LLM (restricted content, such as "how to build a bomb"), cause malfunctioning, etc. These attacks focus on compromising the software supply chain (Third-Party SW Compromise), poisoning training data or auxiliary files used by the LLM system (Data Infection), or sending malicious prompts to the system to perform jailbreak (LLM Prompting). Another form of achieving jailbreak is via model manipulation during runtime (HW Manipulation).
• Availability: Adversaries aim to take down the LLM system, making the model provide useless results (A01-04), causing instability in the service (A05-06), or causing monetary losses to individuals or organizations (consuming a paid service in the target victim's name) (A07). Attacks on availability are motivated by financial gain (via extortion), brand damage, or hacktivism. These attacks can focus on consuming all resources of the infrastructure or users (Resource Drain) or on making the LLM system produce useless results in response to legitimate users (Bad LLM Response).
A threat aims to cause damage to a specific target. In our analysis, we set as the target the main element on which defenders should focus their attention to build defenses. In the case of LLM systems, these targets can be:
1) LLM: Causing denial of service (including performance degradation), data/behavior manipulation, and/or leakage of sensitive data.
2) LLM System: Causing denial of service (inability to access the model or financial loss), remote code execution on the target system, and/or leakage of sensitive data (such as users' API keys). The system includes the application interface and other associated services.
3) Data: Aiming to manipulate (poison) or steal sensitive and confidential data used in the training or model specialization process (fine-tuning or RAG).
4) Infrastructure: Causing leakage of proprietary and sensitive data (source code or other intellectual property), inability to provide services, brand damage, and/or financial loss.
Other important aspects of our characterization involve the attack strategy (Section V-A), how the adversary interacts with the LLM system (Section V-B), and the LLM use case (Sections IV-B and IV-C), since different scenarios will be vulnerable to a particular set of threats. In the next subsections, we detail the first two aspects, and in Section VIII, we provide an analysis of the third.
A. Attack Strategies
There are many ways of attacking LLM systems. Adversaries can attack the model directly, instructing it to perform a malicious or disallowed action via direct prompts, or indirectly, by having the model process malicious content obtained from an external source. LLM systems can also be exploited using traditional attacks targeting the authentication process, deployment infrastructure, and other elements and application services. These different attack forms are referred to here as attack strategies. Next, we present the strategies an adversary can adopt to attack an LLM system. In our threat characterization, we consider that adversaries will use at least one of these strategies to cause harm, although we understand that some threats can be executed using different strategies. We also map the attack strategies to the components of a general LLM system architecture to which they can be applied, encompassing parts of the development and deployment processes, as depicted in Figure 7.
1 Direct Prompt: LLMs, especially chatbots, allow users to submit questions (via a prompt) directly to the model.
Adversaries can exploit this feature to perform jailbreak attacks, obtain sensitive data, or cause service disruption at some level.
2 Indirect Prompt: According to [230], Indirect Prompt refers to another form of sending data to an LLM: instead of supplying it directly via a prompt, an external source is used, such as websites, files, or emails. For instance, consider an LLM system responsible for managing a user's email. An adversary targeting a particular user of such a system could hide a prompt injection in the content of an email and send it to the victim. When the LLM processes the email (e.g., to summarize it for the user), the content (and the malicious prompt) is processed and the malicious action may take effect. This strategy is used to jailbreak or disrupt the service of an LLM system, as well as to steal users' data.
3 Supply Chain: By exploiting third-party LLM system dependencies or pre-trained LLMs (e.g., changing their parameters), adversaries can manipulate (tamper with) a system component to create biased outcomes or introduce security breaches to steal sensitive data or disrupt the system [231]. Another threat example is embedding malware in the model (e.g., when it is loaded, malicious code is executed [118], [119], [232]).
4 Poisoning: An adversary manages to access and manipulate the data used in the pre-training, fine-tuning, or RAG processes of the victim LLM to introduce vulnerabilities, backdoors, or biased behavior [232], affecting the system's performance and reliability. User feedback can also be a vector for poisoning the system [7], [41].
Fig. 7. Possible attack strategies on a general view of an LLM-based system architecture, from development to operation.
5 Insider: Another threat to LLM systems originates from inside an organization (its employees) or partners, with the adversary having special privileges over, or knowledge about, the training data, source code repository, deployment infrastructure, the LLM itself, RAG context data, etc. With this special access and knowledge, the adversary can execute actions to harm the system or steal sensitive information (including intellectual property). The motivation for this type of adversary ranges from retaliation to monetary gain, but also includes incompetence, with employees introducing vulnerabilities into the system due to a lack of expertise or by mistake.
6 Known Vulnerabilities: An adversary can attack an LLM system by exploiting known software vulnerabilities present in the system, its dependencies, the infrastructure where the system is deployed, or the communication channel between user and infrastructure (when the LLM is deployed on an external server). Examples include vulnerabilities in the hosting operating system, the web server, or the TLS communication (use of an outdated version, such as v1.0), among others. Note that in Supply Chain attacks, adversaries exploit system dependencies to introduce vulnerabilities, while here, the adversary discovers an existing vulnerability (e.g., originating from a bug) and leverages it to cause harm.
7 Credential Stealing: Adversaries aim to obtain access to unauthorized data and systems. To this end, they may apply traditional attacks such as phishing, or use malware to steal users' or developers' credentials, so they can access the LLM system, training data, code repositories, or other important resources. Besides accessing sensitive data, they can also cause financial loss to the victim and restrict the victim's access to the system.
8 Exploiting Security Mechanisms: By exploiting the absence of security mechanisms or their improper configuration, adversaries can bypass such defenses and infiltrate the system to obtain sensitive data, install malware, or cause denial of service. Security mechanisms that can be exploited include the system's authentication process and communication channel.
9 Reverse Engineering: LLM systems deployed directly on users' devices (i.e., the LLM runs locally on the device rather than on a cloud infrastructure) come with the risk of reverse engineering [45]. This practice gives adversaries access to the model, its structure, and its metadata (weights and other internal parameters). Adversaries can understand how the model works, modify it to remove jailbreak security mechanisms, mount side-channel attacks more easily, and obtain confidential data, including business secrets (the LLM itself).
10 Malware: A malicious application is used to cause damage (e.g., service degradation or disruption) or steal confidential LLM-related data (e.g., API keys and user-supplied input, which may contain sensitive data or business secrets). The malware can be installed on a user's device or in the network where the LLM system is deployed.
11 Side-Channel: This attack strategy takes advantage of side effects that occur during system usage; by monitoring them, the adversary can extract sensitive information. Attacks consist of analyzing power consumption, acoustics, timing, electromagnetic emissions, the system cache, among others. The attack usually requires physical access to the victim device and, in many cases, special equipment to capture the information leakage. Insiders can perform side-channel attacks due to their physical access to cloud infrastructures, for instance, but other forms of executing this attack strategy, e.g., remotely over the network, can also be found.
B.
Adversary Interaction with the LLM
To perform an attack using the strategies previously described, adversaries need to interact with the target LLM system to submit commands and text that attempt to bypass or exploit its security mechanisms. We consider three forms of interaction with the system, presented next. Note that depending on the interaction, attacks may be easier or more complex to perform; the same applies to defending against them.
• One-shot: The adversary needs only a single interaction, via a specifically crafted prompt, to exploit the model.
• Multi-turn: The adversary needs two or more interactions with the target LLM system to exploit it. In this form, the adversary takes advantage of the model's ability to retain and process context over multiple prompts [46].
• Indirectly: The adversary exploits the LLM by attacking the infrastructure, dependencies, or sources of information used by the model when performing tasks.
VI. ANALYSIS OF THREAT SEVERITY LEVELS
Attacking LLM systems can be simple or extremely difficult, depending on the adversary's goals and how they choose to execute the attack. No group, company, or government wishing to deploy LLMs has enough resources to eliminate every possible threat. Besides, some threats can be so expensive for adversaries to execute (in terms of computational resources or preparation) that the reward obtained in case of success is not worth it, or by the time the adversary completes the attack, the stolen data, for instance, has no value anymore. For such reasons, it is paramount to prioritize the limited resources available to implement defenses against the most significant threats for a particular business or scenario. In this section, we analyze the threats presented in Section V according to the likelihood of each threat occurring and the impact in case of a successful attack.
We employ two methodologies commonly used by the security community to assess the severity of a threat or vulnerability: the OWASP Risk Rating (footnote 3) and CVSS v3.1 (footnote 4) (Common Vulnerability Scoring System). Note that CVSS has many versions, including a more recent one (v4.0). However, we use version 3.1 since, at the time of writing, it is still the version most adopted by the security community, including in the CVE database of software vulnerabilities.
To ensure consistency in assessing the severity of all threats, we followed a uniform approach driven by a set of assumptions. For each threat from Tables IV and V, we devised a scenario illustrating how an adversary might carry out the attack to estimate a severity level, considering a particular attack strategy. We assumed that the target LLM system is deployed in a manner that meets the threat prerequisites and has a vulnerability (due to a lack of security mechanisms or their improper use or configuration) that allows an adversary to exploit it via the considered threat. Whenever possible, we used the same considerations as the authors who presented the threat.
Although methodologies like the OWASP Risk Rating or CVSS offer detailed guidance on how to compute the severity level of a vulnerability, there is still subjectivity in the process. Therefore, these scores reflect our interpretations and are susceptible to discrepancies, serving as references rather than exact measurements. For example, if new conditions or alternative attack methods are discovered after a particular threat is analyzed, this may lead to a different interpretation.
However, with the information available today, we do not expect the analysis to vary significantly. Another point of attention is that we considered only the technical impact of the threats on CIA aspects, not the business impact, which could vary from one company to another.
3 OWASP Risk Rating Methodology, available at https://owasp.org/www-community/OWASP_Risk_Rating_Methodology (last accessed 2025.07.25)
4 CVSS v3.1 Specification, available at https://www.first.org/cvss/v3.1/specification-document (last accessed 2025.07.25)
Table VI summarizes the main results. We represent each threat of Tables IV and V by its ID and present the attack strategy considered during our estimation. Results are listed for the OWASP Risk Rating and CVSS v3.1 methodologies, including the overall score (and its constituting values: Exploitability/Likelihood and Technical Impact). We also include a label indicating the severity of the threat (Low, Medium, High, or Critical), and a string vector summarizing the choices made during the estimation for each parameter of the corresponding methodology.
For the OWASP methodology, the string vector is composed of 12 characters, each position representing one parameter of the methodology and taking specific predefined values (0 is the minimum and 9 the maximum). The parameters are (in order of appearance): Skill Level, Motive, Opportunity, Size, Ease of Discovery, Ease of Exploit, Awareness, Intrusion Detection, Loss of Confidentiality, Loss of Integrity, Loss of Availability, and Loss of Accountability. The CVSS string vector is represented similarly, but this methodology adopts eight different parameters to compute the severity score. Each string position has a set of possible values, represented by their initial letter.
The parameters are (in order of appearance): Attack Vector, Attack Complexity, Privileges Required, User Interaction, Scope, Confidentiality Impact, Integrity Impact, and Availability Impact. Figure 8 presents the meaning and possible values of each character of the CVSS and OWASP Risk Rating strings of Table VI (for a complete description, see their references). CVSS also has an official calculator to assist in computing the severity level; based on the string included in the URL, it fills in the fields automatically. We provide the links to the calculator in the string vector representation.
In this analysis, we adopted CVSS v3.1 using only the Base Score Metrics, but the methodology also provides other components to customize the severity score assessment, such as the Temporal Metrics (analyzing the existence of exploits, patches, and credibility of the vulnerability) and the Environmental Metrics (allowing score customization based on asset importance). However, we do not use these other parameters, since some of them are specific to a business case and others only apply to vulnerabilities, not threats. Similarly, we chose to ignore the Business Impact factor of the OWASP methodology.
There are a few differences among the parameters considered in the methodologies. For instance, OWASP takes into consideration the skill level of adversaries choosing to exploit a vulnerability, while CVSS does not. One can argue that with many exploits available today and the rise of LLMs, one can easily obtain enough knowledge to execute some attacks, and therefore, attacker knowledge is no longer an obstacle that needs to be considered. However, what may be more important to consider are the resources required to perform an attack. This aspect is reflected in both methodologies: Attack Complexity in CVSS and Opportunity in OWASP.
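The CVSS v3.1 base score behind these values can be reproduced directly from the specification's Base Metrics formulas. The sketch below is our own illustration (not an artifact of the paper): it parses the compact 8-character strings used in Table VI (AV, AC, PR, UI, S, C, I, A, one initial letter each) and computes the base score.

```python
# Illustrative CVSS v3.1 base-score calculator for the 8-character
# vector strings of Table VI (our own sketch; weights are from the
# CVSS v3.1 specification, Base Metrics only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}   # Attack Vector
AC = {"L": 0.77, "H": 0.44}                          # Attack Complexity
UI = {"N": 0.85, "R": 0.62}                          # User Interaction
CIA = {"N": 0.0, "L": 0.22, "H": 0.56}               # C/I/A impact
# Privileges Required weights depend on whether the Scope changes.
PR = {"U": {"N": 0.85, "L": 0.62, "H": 0.27},
      "C": {"N": 0.85, "L": 0.68, "H": 0.50}}

def roundup(x: float) -> float:
    """CVSS v3.1 'Roundup': smallest one-decimal value >= x."""
    i = int(round(x * 100000))
    return i / 100000.0 if i % 10000 == 0 else (i // 10000 + 1) / 10.0

def cvss_base(vector: str) -> float:
    av, ac, pr, ui, s, c, i, a = vector  # e.g. "NHNNUNHH"
    exploitability = 8.22 * AV[av] * AC[ac] * PR[s][pr] * UI[ui]
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if s == "U":                          # Scope unchanged
        impact = 6.42 * iss
        score = min(impact + exploitability, 10)
    else:                                 # Scope changed
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
        score = min(1.08 * (impact + exploitability), 10)
    return 0.0 if impact <= 0 else roundup(score)
```

For instance, A03's string NHNNUNHH yields 7.4 (HIGH), A01's NHNRUNLH yields 5.9, and the scope-changed A07 string NHNRCLNH yields 6.9, matching Table VI.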
TABLE VI. SEVERITY SCORE OF THREATS TO LLMS USING CVSS AND OWASP METHODOLOGIES.
ID | Attack Strategy | CVSS Exploitability | CVSS Tech. Impact | CVSS Overall | CVSS String | OWASP Likelihood | OWASP Tech. Impact | OWASP Overall | OWASP String
A01 | 2 | 1.6 | 4.2 | 5.9 (MEDIUM) | NHNRUNLH | 5.8 | 5.8 | 33.1 (MEDIUM) | 917933680599
A02 | 2 | 1.6 | 4.2 | 5.9 (MEDIUM) | NHNRUNLH | 5.8 | 5.8 | 33.1 (MEDIUM) | 917933680599
A03 | 8 | 2.2 | 5.2 | 7.4 (HIGH) | NHNNUNHH | 4.8 | 6.8 | 32.1 (HIGH) | 914933630999
A04 | 5 | 0.2 | 4.2 | 4.4 (MEDIUM) | PHHNUNLH | 4.1 | 4.3 | 17.5 (MEDIUM) | 914231490377
A05 | 1 | 2.8 | 1.4 | 4.3 (MEDIUM) | NLLNUNNL | 4.5 | 2.0 | 9.0 (LOW) | 917671410071
A06 | 2 | 1.6 | 3.6 | 5.3 (MEDIUM) | NHNRUNNH | 5.8 | 4.0 | 23.0 (MEDIUM) | 917933680079
A07 | 8 | 1.6 | 4.7 | 6.9 (MEDIUM) | NHNRCLNH | 6.8 | 6.0 | 40.5 (CRITICAL) | 994933986099
C01 | 8 | 0.7 | 3.6 | 4.4 (MEDIUM) | NHHNUHNN | 4.1 | 4.5 | 18.6 (MEDIUM) | 990233619009
C02 | 5 | 0.7 | 3.6 | 4.4 (MEDIUM) | NHHNUHNN | 3.9 | 4.5 | 17.4 (MEDIUM) | 990231619009
C03 | 11 | 0.4 | 1.4 | 1.8 (LOW) | PHLNULNN | 5.3 | 2.3 | 11.8 (LOW) | 914673482007
C04 | 6 | 1.0 | 3.6 | 4.7 (MEDIUM) | LHNRUHNN | 3.9 | 3.3 | 12.6 (MEDIUM) | 990431416007
C05 | 5 | 0.2 | 3.6 | 3.8 (LOW) | PHHNUHNN | 4.6 | 3.3 | 15.0 (MEDIUM) | 914271496007
C06 | 1 | 1.2 | 1.4 | 2.6 (LOW) | NHLRULNN | 4.1 | 1.8 | 7.2 (LOW) | 914633436001
C07 | 2 | 1.6 | 1.4 | 3.1 (LOW) | NHNRULNN | 6.5 | 3.8 | 24.4 (HIGH) | 947933986009
C08 | 1 | 1.6 | 1.4 | 3.1 (LOW) | NHLNULNN | 4.5 | 1.8 | 7.9 (LOW) | 617631486001
C09 | 1 | 1.6 | 1.4 | 3.1 (LOW) | NHLNULNN | 3.9 | 1.8 | 6.8 (LOW) | 614611486001
C10 | 1 | 1.6 | 1.4 | 3.1 (LOW) | NHLNULNN | 4.1 | 1.8 | 7.2 (LOW) | 614631486001
C11 | 1 | 1.6 | 3.6 | 5.3 (MEDIUM) | NHLNUHNN | 4.5 | 2.5 | 11.3 (LOW) | 644631489001
C12 | 1 | 1.6 | 3.6 | 5.3 (MEDIUM) | NHLNUHNN | 5.8 | 2.5 | 14.4 (LOW) | 647633989001
C13 | 1 | 1.6 | 3.6 | 5.3 (MEDIUM) | NHLNUHNN | 5.4 | 2.5 | 13.4 (LOW) | 640691989001
C14 | 9 | 0.4 | 3.6 | 4.0 (MEDIUM) | PHLNUHNN | 7.3 | 4.5 | 32.6 (HIGH) | 994693999009
C15 | 2 | 1.6 | 3.7 | 5.8 (MEDIUM) | NHNRCLLL | 7.1 | 6.3 | 44.5 (CRITICAL) | 997933986379
C16 | 2 | 1.6 | 1.4 | 3.1 (LOW) | NHNRULNN | 6.3 | 3.8 | 23.4 (HIGH) | 997933916009
C17 | 6 | 3.9 | 1.4 | 5.8 (MEDIUM) | NLNNCLNN | 6.1 | 3.3 | 19.9 (HIGH) | 997635916007
I01 | 3 | 1.6 | 2.5 | 4.2 (MEDIUM) | NHNRUNLL | 6.1 | 5.3 | 32.2 (HIGH) | 944933980759
I02 | 3 | 1.6 | 4.2 | 5.9 (MEDIUM) | NHNRUNHL | 6.8 | 5.8 | 38.8 (HIGH) | 994933980959
I03 | 4 | 1.6 | 2.5 | 4.2 (MEDIUM) | NHLNUNLL | 5.0 | 4.8 | 23.8 (MEDIUM) | 944691430757
I04 | 1 | 1.6 | 2.5 | 4.2 (MEDIUM) | NHLNUNLL | 5.8 | 5.3 | 30.2 (MEDIUM) | 947633680759
I05 | 3 | 1.6 | 5.3 | 7.5 (HIGH) | NHNRCLHL | 6.8 | 7.3 | 48.9 (CRITICAL) | 994933986959
I06 | 3 | 1.6 | 3.6 | 5.3 (MEDIUM) | NHNRUNHN | 6.1 | 5.3 | 32.2 (HIGH) | 944933982919
I07 | 4 | 1.6 | 2.5 | 4.2 (MEDIUM) | NHLNUNLL | 4.3 | 4.8 | 20.2 (MEDIUM) | 944631430757
I08 | 1 | 1.6 | 2.5 | 4.2 (MEDIUM) | NHLNUNLL | 4.3 | 4.8 | 20.2 (MEDIUM) | 944631430757
I09 | 1 | 2.8 | 4.7 | 7.6 (HIGH) | NLLNULHL | 6.4 | 3.3 | 20.7 (HIGH) | 647699912911
I10 | 1 | 2.8 | 4.7 | 7.6 (HIGH) | NLLNULHL | 6.4 | 3.3 | 20.7 (HIGH) | 647699912911
I11 | 1 | 2.8 | 4.7 | 7.6 (HIGH) | NLLNULHL | 6.4 | 3.3 | 20.7 (HIGH) | 647699912911
I12 | 1 | 2.8 | 4.7 | 7.6 (HIGH) | NLLNULHL | 6.4 | 3.3 | 20.7 (HIGH) | 647699912911
I13 | 1 | 2.8 | 4.7 | 7.6 (HIGH) | NLLNULHL | 6.4 | 3.3 | 20.7 (HIGH) | 647699912911
I14 | 2 | 1.6 | 6.0 | 8.3 (HIGH) | NHNRCHHH | 6.8 | 9.0 | 60.8 (CRITICAL) | 997933689999
I15 | 1 | 1.8 | 6.0 | 8.5 (HIGH) | NHLNCHHH | 5.5 | 7.0 | 38.5 (HIGH) | 997633619991
I16 | 2 | 1.6 | 4.7 | 6.4 (MEDIUM) | NHNRULHL | 5.9 | 6.3 | 36.7 (HIGH) | 947933482959
I17 | 1 | 2.8 | 4.7 | 7.6 (HIGH) | NLLNULHL | 6.8 | 3.3 | 21.9 (HIGH) | 947699912911
I18 | 1 | 2.8 | 4.7 | 7.6 (HIGH) | NLLNULHL | 6.4 | 3.3 | 20.7 (HIGH) | 647699912911
I19 | 1 | 2.8 | 4.7 | 7.6 (HIGH) | NLLNULHL | 5.9 | 3.3 | 19.1 (MEDIUM) | 940699912911
I20 | 1 | 2.8 | 4.7 | 7.6 (HIGH) | NLLNULHL | 5.9 | 3.3 | 19.1 (MEDIUM) | 940699912911
I21 | 2 | 1.6 | 4.7 | 6.4 (MEDIUM) | NHLNULHL | 6.6 | 3.3 | 21.5 (HIGH) | 944699482911
I22 | 2 | 1.6 | 5.3 | 7.5 (HIGH) | NHNRCLHL | 6.8 | 7.3 | 48.9 (CRITICAL) | 997933686959
I23 | 2 | 1.6 | 5.3 | 7.5 (HIGH) | NHNRCLHL | 7.1 | 7.3 | 51.7 (CRITICAL) | 997933986959
I24 | 1 | 1.6 | 4.7 | 6.4 (MEDIUM) | NHLNULHL | 5.6 | 3.3 | 18.3 (MEDIUM) | 944639912911
I25 | 1 | 1.6 | 4.7 | 6.4 (MEDIUM) | NHLNULHL | 6.1 | 3.3 | 19.9 (HIGH) | 647679912911
I26 | 1 | 1.6 | 4.7 | 6.4 (MEDIUM) | NHLNULHL | 5.6 | 3.3 | 18.3 (MEDIUM) | 944639912911
I27 | 1 | 1.6 | 4.7 | 6.4 (MEDIUM) | NHLNULHL | 5.1 | 3.3 | 16.7 (MEDIUM) | 940639912911
I28 | 5 | 0.2 | 4.2 | 4.4 (MEDIUM) | PHHNUNHL | 4.0 | 5.3 | 21.0 (MEDIUM) | 944231180957
Fig. 8. Example of how to interpret the severity score string values. In part (a), we provide the scoring produced using CVSS 3.1 ratings; in part (b), an example of the OWASP Risk Rating. The acceptable values for each character are listed within parentheses.
Another difference between the methodologies is that CVSS considers User Interaction in the process, while OWASP does not. This is important because some threats can be exploited only at the moment the user performs a certain action, while others can be exploited at any time, elevating the risks.
On the other hand, OWASP considers the Motive of adversaries in exploiting a threat, based on the reward obtained. This can also increase or decrease the chances of a threat being exploited: the greater the potential rewards, the more likely adversaries are to invest time and resources in attempting the exploitation.
In the Impact category, OWASP provides a more granular level of choices when estimating the damage of a successful exploitation, allowing values from 0 to 9. CVSS provides only three choices: No Impact, Low Impact, and High Impact. Having more values to choose from enables a more precise estimation, but also increases the chances of disparities due to subjectivity. Another difference is that OWASP includes Loss of Accountability in the process, which considers the impact of tracing adversary actions on the affected system back to individuals. CVSS does not include this category, but does include Scope, which analyzes the impact of an exploit on resources beyond the affected one.
The OWASP methodology proposes using only a label to represent the severity level of a vulnerability, whereas CVSS assigns a value from 0.1 to 10, from which a label is derived. In Table VI, in addition to the label obtained following the OWASP methodology, we also present a value obtained by multiplying the Likelihood and Technical Impact averages, to understand how the values were distributed among the labels. Note that some values received a MEDIUM label, such as 33.06 for threat A02, while lower values were assigned a HIGH label, such as 24.37 for C07. Considering possible values from 0 to 9, the minimum and maximum overall values for OWASP would be 0 and 81, respectively. The results also show that, for the same threat, different severity levels were obtained in some cases. In most cases, the severity level from OWASP was higher than that from CVSS.
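The OWASP overall value described above is simply the product of the two averages encoded in the 12-character string. A minimal sketch (our own helper, assuming the digit layout described above: eight likelihood factors followed by four impact factors):

```python
def owasp_score(vector: str):
    """Compute (likelihood, impact, overall) from a 12-digit OWASP
    Risk Rating string. The first 8 digits are likelihood factors
    (Skill Level ... Intrusion Detection), the last 4 are impact
    factors (Loss of C, I, A, Accountability). Overall ranges 0..81."""
    digits = [int(ch) for ch in vector]
    likelihood = sum(digits[:8]) / 8
    impact = sum(digits[8:]) / 4
    return likelihood, impact, likelihood * impact
```

Applied to the strings of Table VI, A02's 917933680599 gives 5.75 x 5.75 = 33.06, and C07's 947933986009 gives 6.5 x 3.75 = 24.4 after rounding, reproducing the values discussed here.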
Another difference in the results is that six threats were assigned a CRITICAL label under OWASP, while none were under CVSS. Note that many of the threats presented the same severity level. Such examples are mostly related to jailbreak attacks, a common threat to LLMs. We considered different forms of performing a jailbreak; however, when computing their severity level, most of them have the same impact and exploitability/likelihood parameters. For this type of threat, we considered high impact levels on integrity and low levels on confidentiality and availability, since after a successful exploit of the LLM, adversaries can access some sensitive data and make the model behave as they wish (i.e., complete control of the LLM). Based on the results, one can prioritize the most severe threats according to the overall score, or focus on those that have a higher impact or higher chances of occurring.
VII. EXISTING MITIGATION STRATEGIES
Based on the threats discussed so far, it is evident that the use of LLMs demands specialized security measures that traditional controls fail to address. To secure systems using this technology, additional countermeasures must be integrated throughout the entire software and LLM development life cycles. This section presents techniques to secure the development, deployment, and use of LLM systems that can mitigate threats and potential risks associated with LLMs. We divided the mitigation techniques into eight categories, each dealing with one part of the LLM system. Note that some techniques may be applicable to more than one category, and for a single threat, more than one technique can be applied. Next, we explain the categories and provide examples of techniques for each one of them.
Table VII presents a compilation of techniques proposed for mitigating the threats presented in Section V, highlighting the life cycle phase and use case scenario in which each should be implemented (or which it impacts). We highlight that this is not an exhaustive literature review, nor does it present all possible defensive techniques used by security professionals in real-world scenarios. It is a compilation of techniques referenced in the literature and proposed as mitigations to threats, as well as those considered most effective by security specialists. The eight mitigation categories described below were derived from our literature review in Section I and sources such as the Department of Homeland Security [221] and NIST [228].
1 Data Management: An LLM, as with any AI technique, is only as good as the data it was trained on. The difference is that LLMs require huge amounts of data compared with other techniques. One must therefore be cautious about the data used to train the model, since many problems can arise, ranging from privacy issues to poisoning. Techniques that should be used in any LLM pipeline are data cleaning (including anonymization of sensitive data [87], [150], [323]); sanitization (to detect and remove adversarial data) [228], [230], [269]; encryption (at rest, on training data stored in the LLM infrastructure, to avoid compromise) [221]; and a strict access control policy to limit access to and manipulation of the training data [87], [221]. Other important techniques relate to Data Provenance Analysis [87], [230], recording metadata of the obtained data (e.g., source, modification date), and Multiple Training Data Providers [87], adopting a source credibility analysis or obtaining data from multiple sources at different time intervals, all with the aim of mitigating poisoning and supply chain threats.
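As an illustration of the cleaning and anonymization step, the sketch below is our own minimal example (the regex patterns, placeholder format, and the hypothetical "sk-" key shape are assumptions, not from the survey): it redacts a few common PII shapes before records enter a training or RAG pipeline.

```python
import re

# Illustrative pre-training data cleaning pass (our sketch): redact
# common PII shapes before text enters a fine-tuning or RAG pipeline.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),  # hypothetical key shape
}

def anonymize(record: str) -> str:
    """Replace each PII match with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        record = pattern.sub(f"[{label}]", record)
    return record
```

A production pipeline would combine such pattern-based redaction with named-entity recognition and the provenance and access-control measures listed above, since regexes alone miss free-form identifiers.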
2 Infrastructure - Development Environment: Besides protecting the data, it is also paramount to protect the LLM development infrastructure, to counter any supply chain, insider, or other threat that may compromise the system during its development. Important techniques that can be applied here are: a strict access control policy limiting access to data and resources to the minimum level, and only for those who require it [87], [221]; encryption of data in transit [221]; strong authentication [230] and secure session management to provide and maintain developer access; restriction to only trusted development tools [221], [230], with integrity verification processes [228], [230] to avoid supply chain threats; VPN for remote access to the infrastructure [289]; and many others presented in Table VII. For this category, it is also important to adopt security and privacy best practices, i.e., DevSecOps/LLMOps/SSDLC/Security by Design [221] and Privacy by Design [36], which aid in mitigating many threats and compliance issues.
3 Infrastructure - Deployment Environment: When LLM systems are deployed and ready for use, the security of the infrastructure relies on how the system was developed and whether security best practices were followed. Infrastructure security also relies on the techniques that can be applied to protect the system, data, and users from threats, since despite the adoption of security practices during development, failures can still occur and vulnerabilities can be discovered. Some of the defense techniques include creating a segmented network and an isolated environment to deploy the LLM system [194], so that it does not compromise other resources in case of a breach.
Also important are logging and audit trails [55], [87], [221], [230]; routine vulnerability scanning of the infrastructure [221]; software updates and patching [230]; and environment monitoring, including solutions to detect threats [64], [221], [230], [289], [298] and to monitor resource utilization [64], [230]. Finally, restricting resource usage can mitigate DoS attacks, with techniques limiting control over the LLM context window [64], [230], energy consumption [276], and resources per request [87], [230], and also defining rate limiting and throttling [87], [230] for requests to LLMs. Another line of defense can be the offense, with solutions that exploit vulnerabilities in automated attacks to disrupt their operations [239].
4 LLM / App Robustness and Protection: The core of an LLM system is the model. Many attacks target the model, trying to cause a jailbreak, manipulate data, or obtain sensitive data. The techniques to protect the model and make it more robust against attacks can be further classified into four distinct groups:
• Model Adversarial Protection: Techniques ranging from adversarial training [64], [87] and backdoor detection [64], [248] to model retraining (or fine-tuning) [152], [197], [269], distillation [64], [237], and compression [113]. There are also techniques based on the moving target defense approach [42].
• Model Privacy Protection: Techniques that add noise or new data in the model training process to protect sensitive data (differential privacy [64], [87], [101], [323], watermarking [64], [215], [233], regularization [64], gradient-based defenses [215]); that train the models in a specialized way such that sensitive data is protected or avoided during model conception (task-specific knowledge distillation [101], federated learning methods [323]); or that remove sensitive data from already-trained models (unlearning [47], [84], [87], [115], [256], [281] and privacy with backdoors [87], [116]).
Another type of technique consists of protecting models from side-channel attacks caused by timing and power leakage. In the LLM context, many works show that graphics processing units (GPUs) can leak sensitive data and be exploited. Defenses include software-based techniques that create some form of obfuscation during processing, such as shuffling, random workload execution, or constant-time algorithms [122], [322], [338]. Other approaches, based on hardware, aim to reduce the leakage by using electromagnetic shielding or noise generation (with a radio-frequency device) [122], [338].
• Model Supply Chain Protection: Techniques for selecting foundation open-source models [87], defining and analyzing the LLM system's software bill of materials and machine learning bill of materials (SBOM / ML-BOM) [230], and performing software dependency verification and selection [221].
• Model General Protection: Model defense techniques against memory manipulation [59], [190], [217], [300] or exploitation of model deserialization vulnerabilities, using safe persistence formats to deploy models [228], since many common libraries have vulnerabilities that allow code execution in the hosting environment, e.g., via the I47 attack, exploiting CVE-2022-29216 (for TensorFlow) or CVE-2019-6446 (for pickle in neural network tools). This group also considers the execution of red teaming practices on LLMs (including LLM vulnerability scanners) [26], [28], [87], [171], [215], [230], [269], [289] to search for risks and vulnerabilities in the LLM system. Although such practices are constantly encouraged by the industry, [88] argues that red teaming is often divergent regarding its purpose, settings, and target, and raises several concerns about this practice, characterizing its scope, structure, and criteria.
Finally, techniques such as reinforcement learning from human feedback (RLHF) [228], [280] or supervised fine-tuning [102] can help align the model with the system goals, preventing unwanted behaviors and leakage of sensitive content. In particular, for the I50 attack, [130] discusses different defense techniques depending on the fine-tuning technique used. Defensive prompts [314] can be another alternative when additional LLM training is not possible.
5 Input Preprocessing: Analyzing and processing inputs to the LLM system should be another concern. It is paramount to consider any entity supplying data to the system as untrusted, and to apply different techniques before data is ready for use within the system. Techniques exist for validating [64], [87], [228], [230], [269], sanitizing [87], [228], [230], [269], and formatting [64], [87], [168], [228], [283], [332] the input, for instance, so that the model can distinguish between user and injected instructions. Another class of techniques encompasses the detection of malicious content within the input, using techniques such as poisoning protection [64], [248], [249], context-aware filtering [269], adversarial example detection [64], segmentation [269], warnings [64], [151], latent-space monitoring [12], [13], Key-Value Cache Optimization [148], classifiers [48], [95], and also using another LLM to check the input for malicious content [196], [228], [266]. Other works analyze the inputted prompt using mechanisms such as the attention distribution [138], [192], Gradient Cuff [127], or LLM activations during inference that fall outside a defined safety boundary [100] to detect malicious content (a jailbreak attempt).
6 Output Processing: It is also important to verify the output of an LLM, since sensitive content may be returned to users and applications.
Some techniques for this purpose include validation [87], [107], [230], [269], sanitization [87], and encoding [230], [269], some of which aim to mitigate undesired code execution or to make automatic processing of outputs more difficult. Techniques for malicious content detection can also be used, including the use of another LLM to detect offensive or otherwise undesired content [81], [266] or to provide model theft protection [82], [83]. 7) User's Device: When an LLM runs on a user device, specific techniques are required to provide security in an unsafe environment uncontrolled by the LLM provider. To protect sensitive data or company intellectual property from leaks, techniques such as trusted execution environment technology [45], [103] can be adopted. Besides, all communication between a user's device and the LLM provider should be encrypted [221] and have strong authentication [230] and secure session management. Antivirus or control-flow integrity techniques can provide device endpoint security [64], and device authentication and attestation can allow a trust evaluation between the parties. 8) User Awareness: The last category is about people, with some solutions requiring the user to approve or confirm (declare consent to) the execution of privileged operations performed by the LLM system, e.g., human-in-the-loop [87], [230], especially in scenarios where LLM-integrated applications are used and can perform operations on the device. Another important aspect involving users, including end users and developers in general, is training [87], [221]. It is important to inform those responsible for operating or developing the system about LLM threats and usage risks, as well as Security by Design / Secure Software Development Lifecycle (SSDLC) principles [221] (including Defense-in-Depth, Least Privilege, Separation of Duties, Weakest Link, among other security principles), and Privacy by Design [36], [87].
Having knowledge of these topics can be effective in countering LLM attacks. In Table VIII, we present the attack strategies attenuated by the mitigation techniques above. Each marked cell indicates that a particular attacker strategy has its effects reduced or completely mitigated by the adoption of the technique. Note that many techniques may reduce the risk of a particular threat, and an effective mitigation should employ multiple techniques concurrently in a Defense-in-Depth approach.

VIII. ANALYSIS OF LLM DEPLOYMENT SCENARIOS

Based on the scenarios presented in Section IV and the threats presented in Section V, we can see that not every threat applies to all LLM deployment scenarios. For instance, threat C14 (Model Reverse Engineering) only affects models deployed on users' devices, not on external servers (e.g., the cloud); for those, threat A06 (Time-consuming Background Tasks) is applicable, with adversaries trying to cause instability in the system for legitimate users. For this reason, a deeper analysis of a specific LLM scenario is necessary to identify the relevant threats and the best defense strategies to put in place. To assess the full impact of the surveyed threats, we conduct structured threat modeling across the four representative LLM scenarios described in Section IV-E. This process allows for a consistent evaluation of how threats from the literature map to practical threat evaluations, regarding severity, attack strategies, and mitigations, enabling future researchers to understand to what extent each threat applies to a given deployment. While not exhaustive, the provided modeling bridges theoretical insights from the literature with practical guidelines, such as the OWASP Top 10 for LLM [230], supporting more grounded and actionable security recommendations.

A. Application of STRIDE for LLM Scenarios

For the threat modeling, we apply the STRIDE framework [159].
We present potential and general risks or threats associated with one or more STRIDE mnemonics (and the OWASP Top 10 LLM category, whenever available), a brief remark about the adversary's goal and capabilities, the possible strategy adopted to perform the attack (from Section V-A), and the severity score considering the scenario under analysis and any relevant consideration described in the remark. We associate these general risks/threats with the threats mapped in Section V and provide potential mitigation strategies from Section VII. For the severity score methodology, we consider only CVSS v3.1, due to the advantages and objectiveness observed during its application in Section VI, but we highlight that the OWASP methodology would fit as well. Notice, however, that the severity score of a particular risk may differ from that of the associated mapped threats, since the score in the STRIDE analysis is adapted to the scenario under consideration. For instance, in Table XI, the Token Wasting risk has a different CVSS value than the A07 threat presented in Table VI, since the conditions for the calculation changed (the privileges required changed from High to None, increasing the chances of a successful attack and thus elevating the score). For each STRIDE acronym, we consider the interpretation adapted to the LLM context presented in Table IX. Tables X to XIII present our analysis results.

B. Analysis of Threat Vectors across Different Scenarios

Based on the threat model analysis, we examine the security implications of the main architectural differences and design choices across LLM deployment scenarios. We then identify

TABLE VII
LIST OF MITIGATION TECHNIQUES, THEIR APPLICABILITY TO LLM LIFE CYCLE PHASES, AND THE AFFECTED DESIGN CHOICES PER USE CASE.
(Each entry lists the mitigation ID, the technique with its references, its mitigation category (1-8, matching the categories of Section VII), and the affected design choices. Life cycle phases in the original table: Planning, Sys. Dev., Data Eng., LLM Dev., LLM Int.,
Operation; use cases: FM, FT, RG, CB, AP, AG.)
M01 Data Cleaning [87], [150], [323] - category 1; DP, CL
M02 Data Provenance Analysis [87], [230] - category 1; DP, DI, SI, CL
M03 Multiple Training Data Providers [87] - category 1; DP, CL
M04 Training Data Encryption (at rest) [221] - category 1; DI, SI
M05 Training Data Sanitization [228], [230], [269] - category 1; DP, CL
M06 Strict Access Control Policy [87], [221], [230] - categories 1, 2, 3; DI, SI, AR
M07 Development Environment Monitoring [230] - category 2; DI, SL, SI
M08 Restriction to Trusted Development Tools [221], [230] - category 2; DI, SL
M09 Security and Privacy Best Practices Adoption [36], [221] - category 2; DP, DI, SL, SI, IO, AR, CL
M10 Signature and Integrity Verification [228], [230] - category 2; SL
M11 VPN for Remote Access [289] - category 2; DI
M12 Incident Response Plan [221] - categories 2, 3; DI
M13 Logging and Audit Trails [55], [87], [221], [230] - categories 2, 3; DI, SI, AR
M14 Network Segmentation [258] - categories 2, 3; DI, SI
M15 Network Traffic Analyzer [289] - categories 2, 3; DI
M16 Routine Software Update and Patching [230] - categories 2, 3; SL
M17 Routine Vulnerability Scanning on Infrastructure [221] - categories 2, 3; DI
M18 Secure Environment Configuration [221] - categories 2, 3; DI, SI, AR
M19 Communication Data Encryption (in transit) [221] - categories 2, 3, 7; DI, IO, AR, CL
M20 Secure and Strong Authentication [230] - categories 2, 3, 7; DI
M21 Secure Session Management [213], [229] - categories 2, 3, 7; DI
M22 Data Processing Encryption (in use) [62] - category 3; DI, SI
M23 Deployment Environment Monitoring [64], [221], [230], [289], [298] - category 3; DI, SI, AR
M24 Execution Environment Isolation [194] - category 3; DI, SI, AR
M25 Limiting Infrastructure Resources Use [64], [87], [230], [276] - category 3; DI
M26 Load Balancing Adoption [267] - category 3; DI
M27 Minimum Software Permissions and Execution Rights [87], [194], [230] - categories 3, 7; DI, SI, AR
M28 Trusted Execution Environment Technology Adoption [45], [103] - categories 3, 7; DI, SI
M29 Model Adversarial Protection [42], [64], [87], [111], [113], [124], [152], [156], [191], [197], [237], [242], [248], [269], [286], [295], [304], [344] - category 4; DP, IO
M30 Model General Protection [20], [26], [28], [32], [50], [59], [71], [73], [87], [88], [92], [98], [102], [145], [167], [169], [171], [190], [215], [217], [228], [230], [244],
[253], [269], [280], [289], [293], [297], [300], [314], [316], [321], [332] - category 4; DI, SL, SI, IO, AR
M31 Model Privacy Protection [47], [64], [78], [84], [87], [89], [101], [115], [116], [141], [157], [200], [202], [203], [215], [235], [240], [256], [281], [291], [323] - categories 1, 4; DP, IO
M32 Model Supply Chain Protection [87], [221], [230] - category 4; DI, SL
M33 Malicious Input Content Detection [12], [13], [33], [48], [64], [95], [100], [107], [127], [128], [138], [142], [148], [151], [177], [192], [196], [216], [228], [248], [249], [266], [269], [285], [301], [315] - category 5; IO
M34 Prompt Instruction and Formatting [64], [87], [168], [228], [283], [332] - category 5; IO
M35 Sanitization [87], [228], [230], [269] - category 5; IO
M36 Validation [64], [87], [228], [230], [269] - category 5; IO
M37 Encoding [230], [269] - category 6; IO
M38 Malicious Output Content Detection [81]-[83], [266] - category 6; DP
M39 Sanitization [87] - category 6; DP
M40 Validation [87], [107], [230], [269] - category 6; DP
M41 Device Authentication and Attestation [201], [294] - category 7; DI, SL, SI, IO, AR, CL
M42 Device Endpoint Security Solution Adoption [64] - category 7; DI, SI, IO, AR, CL
M43 Human-in-the-loop [87], [230] - category 8; DI, IO, AR
M44 Training Developers [36], [87], [221] - category 8; DP, DI, SL, SI, IO, AR, CL
M45 Processing Obfuscation [122], [322], [338] - category 4; DI, SI
M46 Device Shielding [122], [338] - categories 3, 4; DI, SI
M47 Hack-back [239] - category 3; DI

TABLE VIII
MAPPING ATTACK STRATEGIES TO CORRESPONDING MITIGATION TECHNIQUES.
(Matrix of mitigation techniques M01-M47 against attack strategies 1-11.)

key security concerns that emerge from the differing threat vectors inherent to each scenario and provide recommendations aiming to support future researchers and practitioners in developing robust security controls for LLM systems.

1) On-device vs.
Off-device Model Deployment: The first design choice when deploying LLMs is to define where the model will be hosted: on a user's device (Section IV-E2, Table XI) or on an external server (Section IV-E3, Table XII, or Section IV-E4, Table XIII). Here, we are not concerned with the processing power or other resource limitations of running LLMs on users' devices, only with the threats to which the model will be exposed. The first major threat to on-device deployed models is the possibility of reverse engineering (attack strategy 9). Having the model deployed on the device allows adversaries to learn how the model works, obtain internal information, submit any number of queries to the model without rate limiting (a good security mechanism for externally deployed models), and more easily create a shadow model. Another threat to models deployed on user devices is malware (attack strategy 10). The LLM provider has no control over what tools and applications are running on the user's device where the model is executed. Malicious applications can steal users' data (including chat history and keys to access the model - C15-16), perform unauthorized actions on the model (I09-27), and also cause denial of service (A01-03, A06-07). Protecting the model on a user's device is a difficult task, requiring additional hardware to provide some level of security (M28) and endpoint software solutions (M42). In addition to these two major threats, side-channel attacks are facilitated when LLMs run on user devices. Although this type of attack can also occur when models are deployed on external services (C04, C06), by having access to the model, adversaries have more techniques at their disposal to perform the attack. In this way, they can learn a model's sensitive information and internals (C03, C05), and are not limited to performing queries against the target model and trying to learn from the results.
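The rate limiting mentioned above, available to externally hosted models but absent once the model sits on the user's device, can be sketched as a token bucket (a minimal illustration; the rate and capacity parameters are arbitrary):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: a client may issue at most `rate`
    queries per second, with bursts of up to `capacity` queries."""
    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# A burst of queries beyond the bucket capacity gets throttled,
# slowing down extraction attempts against a hosted LLM endpoint.
bucket = TokenBucket(rate=1.0, capacity=5)
results = [bucket.allow() for _ in range(10)]
```

In a real service, one bucket would be kept per API key or client, which is exactly what an adversary with local access to an on-device model can sidestep.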
In addition, adversaries can perform jailbreaks via hardware manipulation (I28) to abuse the LLM. However, some threats are attenuated when models are deployed on a device, such as the Impersonating the User threat. Note that the CVSS score decreased from 6.4 (Table XII) to 5.0 (Table XI), since an attack would require the adversary to obtain physical access to the device, compared to deployments on external servers, in which adversaries can access the LLM remotely. Besides, on-device models can better preserve user privacy, since no data is required to leave the device. Although deploying LLMs on external servers has advantages (including not depending on device processing power or worrying about its energy consumption), it also exposes the model to more threats, facilitating attack strategies such as 5, 6, 7, and 8. A common issue for services deployed on external servers is proper attestation. Adversaries may perform man-in-the-middle (MITM) attacks to intercept user communications, read sensitive information, and/or change it, or launch spoofing attacks to redirect users to malicious services. In addition, Insiders (attack strategy 5) become a threat in such a configuration when it comes to LLM operation, because people with special access to or knowledge about the infrastructure and solution can become adversaries. Lack or improper use of logging is not a threat in itself, but it is a significant risk due to the inability to identify some attacks aimed at stealing data, as well as silent attacks that do not interrupt a service or cause perceptible damage. Performing logging for models deployed on a user device can also be a protection choice, but it is less effective because it can be easily disabled or changed (by malware, for instance). Another risk that comes with the deployment of LLMs is allowing different users (or applications) to access the same model, hosted on a shared infrastructure, with each one having access to an instance of the running service.
This exposes users to additional threats, especially because the model needs to maintain context information for each user/app invocation (containing sensitive data) (C16, I14), shared resources (A04), databases (I15), etc. Isolation of environments and LLM instances (M24) or even the use of trusted execution environments (M28) are some possible mitigation strategies.

TABLE IX
APPLICATION OF STRIDE FOR THE LLM CONTEXT
(S)poofing: Adversary impersonates a legitimate user or system developer / administrator to gain access and manipulate or misuse the model's capabilities.
(T)ampering: Adversary manipulates data (training, fine-tuning, or RAG context data), model inputs/outputs, or the environment where the model runs (configuration files, access control permissions, etc.).
(R)epudiation: Adversary manipulates inputs to the LLM or performs adversarial attacks without being discovered, because logging / authentication features are missing or insufficient.
(I)nformation Disclosure: Adversary obtains sensitive information from the LLM, memorized by the model from the training, fine-tuning, or RAG data, by means of prompts or by reverse engineering it.
(D)enial of Service: Adversary makes the LLM system unavailable through resource exhaustion or API flooding/wasting.
(E)levation of Privilege: Adversary gains unauthorized control of model or system resources.

TABLE X
THREAT MODELING OF LLM SCENARIO 1 - DEVELOPMENT OF AN LLM FOR CHAT-BOT APPLICATION.
(Each entry lists the STRIDE categories, the risk and its OWASP Top 10 LLM category, a remark, the CVSS score, attack strategies, related threats, and possible defenses.)
S, T, D - Impersonating the User (LLM04:2025): Adversary impersonates a legitimate user and manipulates information (obtained from public sources) used during LLM training. Adversary injects corrupted, malformed, irrelevant, or adversarial data into the training set to overwhelm and make the learning process more difficult or to cause performance degradation.
CVSS 4.8; attack strategy 4; related threats I01, I03; defenses M02, M06, M13, M20.
S, T, D, E - Impersonating Vendor (LLM03:2025): Adversary impersonates a legitimate vendor or open-source contributor responsible for providing software used during the model development pipeline. CVSS 8.9; attack strategy 3; related threats I05-06; defenses M06-10, M32.
T - Model Manipulation (LLM03:2025): Adversary manipulates the weights, hyper-parameters, or other metadata that affect the LLM training. CVSS 5.0; attack strategy 3; related threats I02; defenses M06-07, M10, M13.
T, I, D - Data Manipulation (LLM04:2025): Adversary (insider or one compromising a developer's credentials or security mechanisms) manipulates the data used during LLM training, stored on company premises. CVSS 6.2/6.0; attack strategies 4, 5; related threats I01; defenses M02, M04, M06-07, M13, M20.
R - Lack of Logging (-): Adversary performs malicious and unauthorized actions during LLM development without being discovered (there is no mechanism to track who added or changed data). CVSS 5.0/5.0; attack strategies 7, 8; related threats -; defenses M13.
I - Sensitive Data Leakage (LLM02:2025): LLM may expose sensitive information during its operation if trained on such data, due to its data memorization capability. CVSS 6.5; attack strategy 1; related threats C12-13; defenses M01, M31.

External servers require additional security tools to limit access to the LLM (M25), such as encryption (in transit - M19), firewalls (M15), VPNs (M11), access control (M06), authentication technologies (with strong recommendations for two-factor authentication - M20), and proper configuration of all of these tools (M18). In summary, for LLMs deployed on user devices, the adversary needs either physical access to the device or to convince the user to install malware. For LLMs deployed on external servers that adversaries can access from anywhere, the adversary only needs to know the address to make requests, resulting in a higher number of threats and the need to put even more security mechanisms in place. For hybrid deployments, both threat scenarios must be taken into consideration.

2) On-cloud vs.
On-premises Infrastructure: Another design choice when deploying LLMs on external servers, with users querying the model via APIs or websites, is whether to use the company's own infrastructure (on-premises) or a third-party Cloud Service Provider (CSP). Again, we make this analysis from a security perspective, not considering financial costs or other aspects (e.g., physical space, maintenance). Using company premises to deploy LLMs (Section IV-E1, Table X) requires the purchase (for some technologies), installation, and configuration of many tools to provide security for the model and infrastructure, such as a firewall (M15), VPN (M11), authentication (M20), access control (M06), and rate limiting and throttling (M25), among other techniques and technologies. This is important to ensure a Defense-in-Depth approach [265]. By using a third-party CSP, all these tools may already be in place, requiring one click to activate them. However, it is imperative to configure the tools properly (M18) and to avoid (or pay special attention to) default configurations, which can be inefficient and even make software vulnerable to attacks. External servers are vulnerable to Insiders (attack strategy 5). Both the on-premises and on-cloud options share this threat, with the latter amplified by the use of infrastructure owned by a third party, in which trust is paramount because the LLM developer does not know (or cannot control) who has access to the infrastructure and stored data. Note, however, that CVSS (and other methodologies) offer no way to capture this difference regarding the insider threat, so it is not reflected in the severity score. More details are discussed in Section IX-C. The lack or improper implementation (and use) of logging to track users' actions every time they perform a query, log

TABLE XI
THREAT MODELING OF LLM SCENARIO 2 - CHAT-BOT APPLICATION ON USER'S DEVICE.
(Each entry lists the STRIDE categories, the risk and its OWASP Top 10 LLM category, a remark, the CVSS score, attack strategies, related threats, and possible defenses.)
S, T, I - Impersonating the User (LLM03:2025): Adversary (with physical access to the device or using malware) makes requests as if they were the legitimate user to obtain sensitive information or influence model behavior. The adversary may also manipulate the weights, hyper-parameters, or other metadata (including RAG data) that affect the LLM. CVSS 5.0/5.0; attack strategies 1, 7; related threats C15, I09-27, I02; defenses M20, M29, M31, M33-36, M42.
R - Lack of Logging (-): Adversary performs malicious and unauthorized actions when using the model without being discovered (there is no mechanism to track who added or changed data). CVSS 3.8/3.8; attack strategies 7, 8; related threats -; defenses M13.
R - Denial of Malicious Inputs (-): Adversary performs an impersonation attack on a particular user. That user cannot deny malicious actions performed by the adversary. CVSS 5.6; attack strategy 1; related threats -; defenses M13, M20, M21, M41.
I - Local Data Leakage (LLM02:2025): Adversary exploits vulnerabilities in the device or a compromised LLM-related software component (supply chain attack), or makes the user install malware. The goal is to steal sensitive data handled by or provided to the LLM (conversations, search history, and documents). CVSS 5.3/6.4; attack strategies 3, 10; related threats C15-16, I05-06; defenses M16, M27, M32, M41-42, M44.
I - Side-channel Leak (-): Adversary performs side-channel attacks on the device to obtain sensitive information from the LLM (internals, user queries, etc.). CVSS 1.8; attack strategy 1; related threats C03; defenses M28, M31, M45.
I - Reverse Engineering (RE) (-): Adversary reverse engineers the LLM to obtain sensitive data by analyzing the binary file available on the user's device, or creates a shadow model (via prompts). The goal is to steal IP data (for business competitiveness) or attack other devices using the same model.
CVSS 5.6/4.0; attack strategies 9, 1; related threats C13-14; defenses M04, M19, M28.
D - Device Resource Exhaustion (LLM10:2025): Adversary overwhelms the device by instructing the model to perform time-consuming tasks before answering requests, affecting the user experience (service degradation); this is done in the background (not visible to the user) and can be achieved using malware. CVSS 4.2; attack strategy 10; related threats A06; defenses M25, M29, M33-36.
D - Token Wasting (-): Adversary steals the user's API key (e.g., using malware) and performs several operations with it, incurring financial loss or exhausting resource usage. CVSS 4.8; attack strategy 10; related threats A07; defenses M20, M25, M41-42.
T, I, D, E - Privilege Escalation through Exploits (-): Adversary exploits vulnerabilities in the system (OS or application) to escalate privileges (usually using malware), gaining access to sensitive data and the ability to change the application behavior. CVSS 6.4; attack strategy 10; related threats I05-06; defenses M24, M27, M42.

in to the system, or perform modifications to the model or infrastructure, can also be a major risk for both deployment scenarios. This practice (M13) can help detect attacks and anomalies in the system, but requires regular review, using manual or automated methods [186], [188]. In summary, deploying models on premises is more complex, but it does not require trusting a third party. The level of trust regarding highly sensitive information can be a turning point in deciding which approach to choose. Although it is easier to obtain top-notch security tools when using a CSP, the lack of or improper configuration of the environment and tools can be dangerous, and the use of default configurations is not always the best option.

3) LLM Development vs. Operation Phases: Different threats affect the phases related to LLM development (Section IV-E1, Table X) and operation (Section IV-E2, Table XI; Section IV-E3, Table XII; and Section IV-E4, Table XIII), requiring the adoption of different defense strategies. During the LLM Development Phase, the major threats are poisoning (attack strategy 4) and supply chain (attack strategy 3).
The first threat targets the data used during LLM training (I01) or the model itself (I02, I08). Since LLMs require a huge amount of data for their creation, it is difficult to carefully analyze all the data obtained and used during training, taking care not to introduce any data that may cause bias or unexpected behavior from the model during its operation (e.g., introducing a backdoor - I02). In practice, we have seen LLM providers not being careful enough during model training [121], [206]; instead, they spend significant resources reviewing model outputs in production. The problem is that testing every possible scenario is infeasible, and vulnerabilities may remain in production models, as constantly reported in the news [46], [60], [65], [155], [160], [260], [274], [312], [313]. Poisoning can also target model fine-tuning or RAG processes when external content is used to specialize an LLM, especially when these data come from users without proper validation (I03) or are manipulated by insiders or adversaries [306] (I01). Supply-chain threats are also present during model development, although they are not restricted to this scenario. Adversaries may manipulate pre-trained LLMs (foundation models) that may be used by applications (I02), since creating an LLM from scratch is an expensive task. Adversaries might also create fake plugins (I06) and dependencies (I05) that developers may adopt during the development process, introducing vulnerabilities into the model or development environment and putting the LLM integrity and associated sensitive data at risk. When models become available to users (Operation phase), threats related to user interaction may arise.
LLMs allow users and applications to submit prompts with any possible content; although this is an appealing characteristic, and perhaps the reason this technology became so popular, since the model can interact with users simulating human-like conversations, it can also be a major threat (attack strategies 1 and 2 - threats C08-09, C12, I09-27, A01-02, etc.). To protect LLMs, developers create security mechanisms and guardrails to restrict and prevent sensitive and forbidden information from being

TABLE XII
THREAT MODELING OF LLM SCENARIO 3 - LLM-INTEGRATED APP ON-CLOUD.
(Each entry lists the STRIDE categories, the risk and its OWASP Top 10 LLM category, a remark, the CVSS score, attack strategies, related threats, and possible defenses.)
S, I, T, D - Impersonating the User (-): Adversary impersonates a legitimate user (by exploiting failures in the authentication process, stealing an API key or session tokens, MITM, etc.), making requests as if they were the user to obtain sensitive information, influence the model behavior (e.g., via jailbreak), or cause a denial of service (Token Wasting / Device Resource Exhaustion). CVSS 6.4; attack strategy 8; related threats I09-11, I13, I15, I17-I18, I20, C15, A06-07; defenses M20, M23, M25, M31, M33-36.
S, I, T, D - Impersonating the LLM-integrated App (LLM03:2025): Adversary impersonates the app (e.g., via supply chain attack) by spoofing its requests to the LLM system (on-cloud), thus manipulating the behavior of the model, getting sensitive information, or causing service degradation (device resource exhaustion). CVSS 4.6; attack strategy 3; related threats C16, I06, A03; defenses M06-10, M32.
S, I, T, D - Server Spoofing (-): Adversary spoofs the CSP by exploiting communication vulnerabilities and performs MITM, redirecting users to a malicious server controlled by them, reading sensitive content, and changing data sent to the LLM-integrated app.
CVSS 5.1; attack strategy 6; related threats C15-16; defenses M16-17, M19, M41.
T, I, D - Insider (-): Adversary is someone with privileged access to the LLM and its infrastructure (someone inside the CSP or a company developer), changing configuration files, model parameters, or the model itself, or reading sensitive information. CVSS 6.3; attack strategy 5; related threats I02, C15-16; defenses M04, M06-07, M13, M20.
T - Feedback Poisoning (LLM04:2025): Adversary manipulates the LLM evaluation option, used to take users' feedback into consideration for improving (retraining) the model. CVSS 5.4; attack strategy 1; related threats I08; defenses M02, M05, M33, M35-36.
T, I, D - Data Manipulation (LLM04:2025): Adversary (insider or one compromising a developer's credentials or security mechanisms) manipulates private data used during LLM fine-tuning or RAG, stored on-cloud. CVSS 6.0/6.2; attack strategies 5, 8; related threats I01; defenses M02, M04-07, M13, M20.
T, R - Lack of Logging Mechanism (-): Adversary performs malicious and unauthorized actions on the LLM without being discovered. There is no mechanism to track who performed a particular action (misconfiguration), or logs are tampered with. CVSS 5.0/5.5; attack strategies 7, 8; related threats -; defenses M13.
I - Sensitive Data Leakage (LLM02:2025): LLM may expose proprietary, personal, or other sensitive information during its operation if trained on such data, due to its data memorization capability. CVSS 6.5; attack strategy 1; related threats C12-17; defenses M01, M31, M38, M40.
I - Reverse Engineering (RE) (-): Adversary reverse engineers the application (available on the user's device) responsible for communicating with and accessing the LLM system to obtain API keys or any other sensitive data. The adversary can also use malware to this end. CVSS 4.6/4.8; attack strategies 9, 10; related threats C14-15; defenses M04, M19, M28.
T, I, D, E - Remote Code Execution (RCE) (LLM05:2025): Adversary sends a problem-solving malicious request to the app, which creates a prompt to the LLM. The LLM generates code to solve the problem and returns it to the app. The app executes the command (triggering a malicious activity, e.g., stealing confidential data) and returns the result to the adversary.
CVSS 7.1; attack strategy 2; related threats I14; defenses M24, M27, M33-36.
D - Service Disruption (LLM10:2025): Adversary exploits a vulnerability or forces a failure in the application to crash the system infrastructure, denying responses to other users. CVSS 5.3; attack strategy 6; related threats -; defenses M16-17, M25, M33-36.
T, I, D, E - Privilege Escalation through Exploits (-): Adversary exploits vulnerabilities in the system (OS or application) to escalate privileges, gaining access to sensitive data and the ability to change the application behavior. CVSS 6.6; attack strategy 8; related threats I05-06; defenses M23-24, M27.
T, I, D - LLM Behavior Manipulation (LLM01:2025): Adversary modifies the behavior of an LLM by indirectly making it retrieve malicious content from the internet (e.g., from a web page under the adversary's control). CVSS 5.9; attack strategy 2; related threats I14-15, I22; defenses M29, M33-36.
T, I, D, E - Abusing Shared Infrastructure (LLM03:2025): Adversary abuses the CSP for not having a properly isolated infrastructure, allowing access to unauthorized data and resources available on that infrastructure (e.g., a Cloud Jacking attack). CVSS 6.4/6.4; attack strategies 6, 11; related threats C06, A04; defenses M16-17, M24-25, M27, M28, M45-46.

disclosed by the LLM, but as presented in Table V, there are many variations of jailbreak attacks to bypass the defenses in place, and it is difficult to prepare for every possible word combination that could compromise the model. There are many defense alternatives for this sort of attack (M29-31, M33-40), but new attacks continue to prevail. Supply chain and poisoning attacks can also threaten LLMs during operation. The supply chain threat can be seen in the tools used to make the model available to users. Many open-source tools are available nowadays that can make the process of deploying LLMs easier and faster, but some of these tools could be compromised and become a threat to users and companies (I05). When it comes to poisoning, the feedback feature about users' experience with the LLM system can be a way for adversaries to manipulate a particular behavior of the model (I08). Another form of feedback contributing to the model's training process is federated learning.
Poisoning (I02) and Byzantine (I07) attacks are another threat in this scenario, with adversaries sending erroneous updates to the central aggregation system to manipulate the LLM [171], [311]. Changing data that will be used for model re-training (fine-tuning) or consumption (RAG) can be a serious threat to

TABLE XIII
THREAT MODELING OF LLM SCENARIO 4 - LLM-BASED AGENT FOR USER ASSISTANCE.
(Each entry lists the STRIDE categories, the risk and its OWASP Top 10 LLM category, a remark, the CVSS score, attack strategies, related threats, and possible defenses.)
S, I, T, D - Impersonating the LLM-Agent (LLM03:2025): Adversary impersonates an agent (e.g., via supply chain) by spoofing its requests to the LLM system (on-cloud), thus manipulating the behavior of the model, getting sensitive information, or causing service degradation (device resource exhaustion). CVSS 5.0; attack strategy 3; related threats C16, I06, A03; defenses M06-10, M32.
S, I, T, D - Server Spoofing (-): Adversary spoofs the CSP by exploiting communication vulnerabilities and performs MITM, redirecting users to a malicious server, reading sensitive content, and changing data sent to the LLM system. CVSS 5.1; attack strategy 6; related threats C15-16; defenses M16-17, M19, M41.
T, I, D - Insider (-): Adversary is someone with privileged access to the LLM and its infrastructure (someone inside the CSP or a company developer), changing configuration files, model parameters, or the model itself, or reading sensitive information. CVSS 6.3; attack strategy 5; related threats I02, C15-16; defenses M04, M06-07, M13, M20.
T - Federated Learning Poisoning (LLM04:2025): Adversary, controlling multiple clients or colluding with multiple adversaries, sends manipulated data to the LLM for model improvement (taking advantage of the federated learning process). CVSS 5.9; attack strategy 8; related threats I07; defenses M02, M05, M33, M41.
T, I, D - Data Manipulation (LLM04:2025): Adversary (insider or one compromising a developer's credentials or security mechanisms) manipulates private data (RAG) stored on-cloud. CVSS 6.0/6.2; attack strategies 5, 8; related threats I01; defenses M02, M04-07, M13, M20.
T, R - Lack of Logging Mechanism (-): Adversary performs malicious and unauthorized actions on the LLM without being discovered.
There is no mechanism to track who performed a particular action (misconfiguration) or tampered with logs. 5.0/5.57 8-M13 ISensitive Data Leakage (LLM02:2025) LLM agents manipulate sensitive data, obtained from hardware and software sensors (e.g., GPS location, running apps on the device, etc.), that is sent to the LLM central system for processing and analysis. Adversary obtains such data during transmission, compromising the cloud, or as an insider. 6.3/7.5 56 C15M01,M04, M06,M13, M16,M19, M41 IReverse Engineering (RE) (-) Adversary reverses engineering the application (available in user’s device) responsible for communicating and accessing the LLM system to obtain API keys or any other sensitive data. The adversary can also use malware to this end. 4.6/4.8910 C14-15M04,M19, M28 TIDERemote Code Execution (RCE) (LLM01:2025) Adversary sends a malicious email to the victim. The LLM agent pro- cesses the email, containing a malicious hidden instruction, executing it and triggering the malicious activity. 7.12I14M24,M27, M33-36 STIDAutomatic Speech Recognition (ASR) system Compromise LLMs with voice command feature are susceptive to adversarial attacks on the ASR system, responsible to convert an acoustic wave- form (user voice) into text. Attacks on these systems, such as the DolphinAttack, add particular noise into the audio (imperceptible to humans) that when interpreted by the ASR, result in malicious instructions that are executed into the system. 7.4 8 [19], [106], [340] [19], [106], [340] TIDEAbusing Shared Infrastructure (LLM3:2025) Adversary abuses the CSP for not having a proper isolated infrastruc- ture, allowing access to unauthorized data and resources available on that infrastructure (e.g., Cloud Jacking attack). 6.4/6.46 11C06,A04M16-17, M24-25, M27-28, M45-46 users, that can be manipulated by insiders or when the infras- tructure security mechanisms are compromised (I01). 
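As an illustration of a robust-aggregation defense against the federated learning poisoning and Byzantine threats discussed above, the sketch below (not from the paper; data values are made up) replaces plain averaging of client updates with a coordinate-wise median, which bounds the influence of a minority of malicious clients:

```python
# Illustrative sketch (not the paper's method): Byzantine-robust aggregation
# of federated-learning updates via a coordinate-wise median, which limits
# the influence of poisoned updates better than plain averaging.
from statistics import mean, median

def aggregate_mean(updates):
    """Plain FedAvg-style mean: a single extreme update shifts the result."""
    return [mean(coord) for coord in zip(*updates)]

def aggregate_median(updates):
    """Coordinate-wise median: tolerates a minority of malicious clients."""
    return [median(coord) for coord in zip(*updates)]

honest = [[0.10, -0.20], [0.12, -0.18], [0.09, -0.21]]   # benign gradients
poisoned = honest + [[50.0, 50.0]]  # one Byzantine client sends a huge update

print(aggregate_mean(poisoned))    # dragged far from the honest consensus
print(aggregate_median(poisoned))  # stays close to the honest updates
```

More elaborate schemes (trimmed mean, Krum, norm clipping) follow the same principle: make the aggregate insensitive to a bounded fraction of arbitrary updates.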
In the federated learning context, data exfiltration [120] or leakage [117] are also relevant threats that should be considered. The lack or improper implementation (and use) of logging affects both the development and operation phases. During LLM development, especially during training data processing and collection in the Data Engineering Phase, logging and other security mechanisms implemented in the infrastructure storing the data are paramount (M02, M13), especially with a strict access control policy (M06) and strong authentication methods (M20) to prevent unauthorized people from accessing and manipulating data. Logging plays an important role in helping identify any manipulation of data or other abuses. In the LLM Operation Phase, logging is of great importance to help detect attacks and abuses of the model and infrastructure. In essence, LLM development is mostly affected by poisoning (4) and supply chain (3) attacks, while LLM operation, still at risk from these two attack strategies, faces other serious threats related to user interaction, via direct (1) or indirect (2) prompts submitted to the system.

4) General-purpose vs. Goal-oriented LLM Interaction: Different LLM use cases allow different forms of user interaction. LLMs became widely known via general-purpose chat-bot applications (Section IV-E2, Table XI), mostly due to their ability to recognize text from users (via a text prompt) and respond accordingly, simulating real-life conversations. However, this is not the only use case of, or interaction form with, LLMs. We can also have goal-oriented LLMs, with applications interacting with users and, based on some conditions, prompting the LLM to solve a particular problem (Section IV-E3, Table XII), or even LLMs with predefined goals acting like an agent, possibly executing or controlling tools and other LLMs (Section IV-E4, Table XIII).
Each of these use cases has its own limitations and associated threats based on the way it interacts with users, as presented in the following. General-purpose chat-bots allow users to send any input (usually via a text prompt) for the LLM to interpret and respond to, amplifying threats related to prompt injection (attack strategy 1). Adversaries targeting this LLM use case usually execute jailbreak attacks, trying to bypass the security mechanisms and guardrails to obtain sensitive information from the LLM, e.g., “how to build a bomb” (I09-27, I16, I28). However, adversaries can also target users' API keys (C15), credentials (7), or chat history (C16), since this will allow them to obtain sensitive information (or even business data). Some attack strategies used to this end are malware (10), MITM attacks (6), or compromise of users' credentials (7), all targeting user interaction with the LLM system. Although chat-bots use text as the main form of receiving user input, some solutions are also adopting voice as another form of interaction [226], [299]. This brings new threats to the system, as another element will be responsible for converting the voice into text before the LLM can make inferences on it. One can use a multimodal model to receive the voice directly [271] (not in the scope of this paper) or use an LLM with an additional tool (usually an ASR - Automatic Speech Recognition - system) to convert the voice into text, which will feed the LLM. In the second case, adversaries may attack the ASR, for instance using adversarial attacks such as the DolphinAttack [340] or others [19], [106] to force a wrong transcription or the embedding of malicious text into the results. On the other hand, goal-oriented LLMs, such as LLM-integrated applications, are exposed to other types of threats due to the way they interact with users (attack strategy 2).
In this scenario, the LLM receives data from the integrated application, which creates a prompt (or uses a predefined one) with a particular goal and sends it to the LLM to execute; the results are interpreted by the application and presented to users. Note that user input to the application may be appended to the prompt, which could represent a threat. Adversaries can attack the LLM via inputs to the application or by finding a way to change the standard prompt used by the application. Another way of attacking LLM-integrated applications is making the LLM fetch and process external content created by the adversary with hidden malicious instructions (I16) that are activated when processed by the LLM system. Another type of goal-oriented LLM is the LLM agent, which can interact with users via text (using a chat-bot interface), voice commands (using an ASR system), or with an LLM-integrated application able to receive or send agent requests. In such cases, the LLM agent inherits the threats from the chosen user interaction method, as described before. With goal-oriented LLMs, where integrated applications or agents can execute predefined actions or commands on the host system, remote code execution (RCE) vulnerabilities (I14) represent a major threat. Compared to traditional chat-bots, susceptibility to denial-of-service attacks (A01-A02, A06) is significantly higher, and the risk of leaking sensitive user information (C15-16) also increases. In summary, all forms of LLM interaction require strict input (M33-36) and output (M37-40) analysis, as well as other mitigation strategies to harden the model (M29-32) and other resources to withstand adversarial attacks.
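One simple form of output analysis in the spirit of the output-processing defenses just mentioned is screening model responses for secret-like strings before they reach the user. The sketch below is illustrative only (the regex patterns are common secret formats assumed for the example, not defenses prescribed by the paper):

```python
# Illustrative sketch (hypothetical patterns, not from the paper): review an
# LLM response and redact secret-like substrings (API keys, private-key
# headers) before returning it to the user.
import re

# Hypothetical patterns for common secret formats.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                  # API-key-like token
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),   # PEM key header
    re.compile(r"AKIA[0-9A-Z]{16}"),                     # AWS-key-like ID
]

def review_output(response: str) -> str:
    """Redact secret-like substrings from an LLM response."""
    for pattern in SECRET_PATTERNS:
        response = pattern.sub("[REDACTED]", response)
    return response

print(review_output("Your key is sk-abcdefghijklmnopqrstuvwx, keep it safe."))
```

In practice such filters are combined with input sanitization and model-level guardrails, since no single pattern list catches every leak.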
In addition, there is a significant risk in LLM-integrated applications or LLM agents due to their ability to execute code, which can be attenuated by defining minimum permissions and execution rights for applications (M27), performing environment isolation (M14, M24), implementing logging (M13), and constantly monitoring the environment to detect attacks and leakage of sensitive data (M15, M23).

5) Enabled vs. Disabled Access to Resources: The last design choice of an LLM system discussed here concerns whether the LLM is allowed access to external content and tools (Section IV-E3, Table XII and Section IV-E4, Table XIII) or not (Section IV-E1, Table X and Section IV-E2, Table XI), such as access to content on the internet, tools, databases, or hardware/software device sensors. Searching for content online can improve LLM capabilities, but it also exposes the model to new threats, such as indirect prompt injection (2). For instance, an LLM with access to users' emails can be the target of adversaries who craft email messages containing hidden instructions within their content, which are executed by the model during message processing. Another example refers to LLMs that process content from a website (I22), similar to the case just presented, but with adversaries hiding the malicious instructions within a page that will be accessed and processed by the LLM when responding to users' queries. LLM-integrated applications or agents may have the ability to execute commands on the hosting system, even controlling tools to perform specific tasks. Adversaries targeting such use cases seek to achieve remote code execution to take control of the whole system, via indirect prompt injection (2) (I14), supply chain (4), or exploitation of known vulnerabilities (6) (I05) and security mechanisms (8).
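A minimal sketch of the least-privilege idea for agent tool execution (in the spirit of restricting permissions and execution rights): LLM-proposed actions are dispatched only through an allowlist of narrowly scoped handlers, so an injected instruction cannot reach an arbitrary shell. Tool names and handlers here are hypothetical:

```python
# Illustrative sketch (hypothetical tool names, not from the paper):
# least-privilege dispatch of LLM-requested actions. Only allowlisted,
# narrowly scoped tools run; anything else is refused, never executed.
ALLOWED_TOOLS = {
    "get_weather": lambda city: f"weather for {city}",  # hypothetical tool
    "get_time": lambda tz: f"time in {tz}",             # hypothetical tool
}

def dispatch(tool_name: str, arg: str) -> str:
    """Run an LLM-proposed tool call only if it is on the allowlist."""
    handler = ALLOWED_TOOLS.get(tool_name)
    if handler is None:
        return f"refused: '{tool_name}' is not an allowlisted tool"
    return handler(arg)

print(dispatch("get_weather", "Lisbon"))  # executes the allowlisted tool
print(dispatch("rm -rf", "/"))            # refused, never reaches a shell
```

Combining such a dispatcher with sandboxed execution environments further limits the blast radius when an injection does slip through.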
Function calling is another possibility for allowing LLMs to use external tools, but it can also be manipulated via indirect prompt injection (2), even with simpler attack methods, since models using this feature may not receive the same safety alignment training as chat-bot LLMs [314]. Access to databases containing sensitive information may also be a feature of LLM-integrated applications or agents. LLMs can query databases to obtain additional information about a particular topic (e.g., sales information during a particular period) and execute tasks based on some condition. However, adversaries may try to perform SQL injection attacks (I15) to steal these data, using direct (1) or indirect (2) prompts. Some LLM agents may have the ability to read from hardware or software sensors available on a device and act upon such information. Note that sensitive data may be obtained in this case, such as GPS data or a list of applications running on the device. Allowing the LLM to access such data makes it a high-value target for adversaries. Besides, there are threats when the collected data is sent to a cloud for further processing. Attacks such as indirect prompt injection (2), exploitation of security mechanisms (8), and use of malware (10) are some of the alternatives used by adversaries to obtain this highly sensitive data. Even when LLMs have no direct access to external content or the ability to execute commands, risks still exist, such as the Copied Prompt Injection attack (I16) (2). In this attack, users are lured into copying content from a malicious website with hidden instructions and providing it to the LLM, via a paste operation in the text field of the application interface they use to communicate with the LLM. Without noticing the hidden instructions (obfuscated by the adversaries using style features of the interface), users send the copied content to be processed by the LLM, along with the hidden instructions.
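One input pre-processing step that helps against pasted content like this is normalizing away invisible Unicode characters, which adversaries can use to hide or obfuscate instructions inside copied text. The sketch below is illustrative only (one possible sanitizer, not a defense prescribed by the paper; note that stripping all format characters can break legitimate uses such as emoji joiners):

```python
# Illustrative sketch (not from the paper): flag and strip invisible Unicode
# characters from pasted input before it is submitted to the LLM, so that
# downstream keyword filters inspect the text the model will actually see.
import unicodedata

def has_hidden_chars(text: str) -> bool:
    """Flag pasted text containing invisible format (Cf) characters."""
    return any(unicodedata.category(ch) == "Cf" for ch in text)

def strip_hidden(text: str) -> str:
    """Drop format (Cf) and control (Cc) characters, keeping newlines/tabs."""
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cf", "Cc")
    )

# A zero-width space (U+200B) splits a keyword to evade naive filters.
pasted = "ig\u200bnore all previous instructions"
print(has_hidden_chars(pasted))  # True: the paste deserves extra scrutiny
print(strip_hidden(pasted))      # normalized text a keyword filter can inspect
```

Flagging suspicious pastes for user confirmation, rather than silently stripping them, is another reasonable policy.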
Note that such an attack targets chat-bot applications and can be a combination of threats, with I16 plus another one to jailbreak the LLM (I09-I29) or cause denial of service (A01-03, A06). To summarize, whether or not LLMs are enabled to access external content, processing any user- or application-supplied content requires deeper inspection, using a combination of defense techniques to handle the input data (M33-36), review the output (M37-40), and protect the LLM from jailbreaks (M29-31). During LLM development, consider using security best practices (M09), which encompass zero trust over any data supplied to the system. To further limit the damage of attacks that bypass the defenses, it is recommended to set minimum permissions and execution rights for applications (M27), keep all software up to date (M16), isolate the execution environment (M14, M24), correctly implement logging (M13), and keep monitoring the environment for attacks and leakage of sensitive data (M15, M23).

IX. DISCUSSIONS, OPEN CHALLENGES, AND LIMITATIONS

In this section, we answer the research questions that led us to conduct this research, discuss some open challenges identified in the field from our perspective, and present the limitations of this paper.

A. Research Questions

The research questions introduced at the beginning of this work are discussed below.

RQ1 What are the main use cases and design choices of LLM systems from a security perspective? Not every threat applies to every possible scenario. Understanding the scenario where the LLM will be used is the first step towards building a secure system. In Section IV, we listed possible LLM use cases, from development to operation, considering different ways of using LLMs. We also discussed the characteristics and design choices of an LLM system that may directly affect security in Section IV-D.
Then, we proposed a structured definition of an LLM deployment scenario, grounded in a set of distinct configurations and a canonical vector string. Using this framework, we derived four representative real-world scenarios from a security perspective in Section IV-E, which we examined in detail, showing that the design choices are critical engineering decisions that shape the threat landscape of real-world LLM-based systems and influence the selection of corresponding mitigation strategies.

RQ2 What are the major threats to real-world LLM-based systems and how can they be mitigated? After proposing a structured definition for an LLM scenario, we scrutinized in Section V both academic literature and industry reports to uncover and characterize known threats and recent attacks on LLM systems, ranging from supply chain to jailbreak attacks. It is important to note that many of these threats can be executed through multiple attack strategies, which we outline in Section V-A. For instance, an adversary can steal a user's API key (C15) through an indirect prompt injection strategy (2), credential stealing (7), or malware infection (10). The ease or difficulty of executing an attack depends on the strategy employed, which in turn affects both the likelihood of the attack and the choice of appropriate defense strategies. To evaluate the practical impact of the threats surveyed in the literature, we assessed the risk associated with each threat in the context of real-world LLM scenarios. We found that the most significant risks stem from attacks targeting LLM prompting and system integrity, particularly the strategies manipulating prompts directly (1) or indirectly (2) or exploiting security mechanisms (8). Moreover, when examined through the lens of the four real-world LLM deployment scenarios, insider threats (5) and the exploitation of known vulnerabilities (6) also emerge as critical concerns in practical LLM deployments.
Another important conclusion is that there is no “silver bullet” to mitigate all attacks or even an entire class of attacks. We have seen that the traditional “Defense-in-Depth” approach is the best solution to protect systems, meaning that it is important to implement one or more mitigation strategies (from Section VII) in the LLM system, depending on the use case and the design choices adopted.

RQ3 How do different use cases and design choices affect the security and privacy of LLM-based systems? Finally, in Section VIII we presented an in-depth security analysis of the four LLM scenarios defined in this work (from Section IV-E), applying the STRIDE framework to model threats. The analysis identified specific threats relevant to each scenario and outlined possible defense mechanisms. We also highlighted how different design choices and use cases affect security, showing the threats that should be considered for each setting and corresponding mitigation strategies. Notably, changing the scenario settings can increase or decrease the severity score for a particular attack, and in some cases, if appropriate design choices are selected, the attack is completely mitigated.

B. Open Challenges

After completing this research, we identified some challenges that, from our perspective, are worthy of further exploration. We separated these challenges by topic, as discussed below. Although some research has already been conducted on some of these topics, solutions that satisfactorily address them are still lacking.

• The never-ending jailbreak of LLMs. Although much research has been conducted to mitigate jailbreak attacks, we continue to see novel methods of performing attacks capable of bypassing the security mechanisms and guardrails in place. We suggest that further research focus on developing effective and alternative strategies for detecting malicious input content.

• Testbed of attacks on LLM systems.
Some research has focused on creating or gathering many jailbreak attack methods into a single dataset of malicious prompts [39], [64], [199]. However, a more comprehensive testbed to evaluate additional attacks could ease the process of finding other vulnerabilities, including privacy-related and side-channel attacks, among others. Testing the integration of both the application and the LLM is also necessary, since many software vulnerabilities can be exploited to take control of the system and steal valuable data.

• Reproducibility. Many researchers have explored various methods of compromising an LLM, but some of the proposed attacks lack straightforward ways to apply and test them on these models. Further research is needed to test these attacks on real-world deployments, in order to refine our analytical approach with empirical results. The community should also strengthen efforts to standardize and provide reproducibility mechanisms to make such comparisons more feasible.

• Finding and using good-quality data. LLMs require vast amounts of training data to achieve good performance. More often than not, the data is scraped from the web with little to no oversight, which may inadvertently compromise the LLM training and consequently the produced results. This may introduce many problems into the LLM, including backdoors via data poisoning, toxicity, misinformation, and bias, among others. Another emerging challenge is the widespread scraping of AI-generated content, which, after successive generations of reuse in LLM training, can degrade data quality over time and ultimately lead to model collapse [246]. More research is required on developing effective methods for analyzing large datasets to detect malicious or unwanted content.

• Respecting privacy laws and avoiding misinformation. Copyright is a concern because LLMs are trained on data that is protected and should not be used without being referenced.
Research is needed on how to attribute the source of information when LLMs produce their results. This can help mitigate misinformation or hallucination problems, since the source will be available in the content, allowing others to verify the information.

• Measuring the efficacy of mitigation strategies. We presented different mitigation strategies that apply to various configurations of LLM use cases and design choices, and showed, through risk and threat modeling assessments, that employing multiple strategies concurrently is crucial to protect the LLM system. However, we have not determined how to assess the efficacy of combining two or more strategies, including the implementation complexity and costs, as well as the extent of protection they offer. What is needed is a framework or methodology to perform such an analysis that takes into consideration the insights brought by this work.

C. Limitations

Some of the limitations of this paper, presented below, are inherent to a field or a particular technique, while others resulted from our choices during this research.

• Peer review is important. Much research encompassing LLMs and LLM security is currently being published in repositories with no peer review process, such as arXiv. Although this makes the research results available to the public faster, no peers are analyzing and filtering this material (including technical reports from companies or blog presentations) to ensure that high-quality papers and documents are being created and shared.

• Same attack, different name. In this paper, we chose to specify different ways of performing jailbreak attacks on LLMs, since this category is particular to the field and has generated considerable amounts of research. Although there are many variations of prompts an adversary can create to bypass security mechanisms and guardrails, as detailed in Section V, one particular form of jailbreak attack may be referred to by different names.
It is common to observe the same idea for crafting a prompt (or another one with strong similarity) being referred to by different researchers under different names.

• Focus on security and privacy threats. This paper focuses on security and privacy threats and does not cover alignment-related threats, such as LLMs that produce hate speech or perform other unethical actions.

• Limitations of the CVSS scoring system. Although CVSS is a standard methodology for assigning severity levels to vulnerabilities, when it comes to the threats we addressed in Section VI, it provides a limited view of the problem and does not highlight some of the differences existing between two related threats. For example, the Insider threat (5) will have the same severity level in two distinct scenarios, such as when models are deployed in a company's own infrastructure and when models are deployed in a third-party infrastructure. Note that the scenarios are similar, but in the latter case the threat is more likely to occur, because the number of people having access to or privileged knowledge about the system may be larger (the same company employees as in the first case, plus additional personnel responsible for the third-party infrastructure).

X. CONCLUSIONS

The integration of generative AI, in particular LLMs, into software applications for distinct purposes demands a proactive and vigilant approach to secure all elements within a system. Not only must common software vulnerabilities be identified and mitigated, but also intrinsic threats and vulnerabilities of LLMs that are discovered as this technology becomes more mature. In this paper, we conducted an extensive systematic literature review of threats and mitigation strategies for LLM systems, defining and exploring different use cases and settings in which this technology may be used and evaluating their impact on the security of a system.
We explored not only chat-bot use cases, but also LLMs integrated into applications, including in the form of agents, as well as cases related to the LLM development process. We performed a threat characterization and presented existing mitigation strategies to deal with different attack strategies, analyzed the severity score of threats, and performed threat modeling considering examples of real-world scenarios. Although we are far from having systems without security or privacy concerns, we identified possible countermeasures and mapped them according to the LLM use case and settings. These countermeasures can be applied from development to operation, and include data management techniques, hardening of devices and development/deployment infrastructures, LLM and application robustness and protection, input pre-processing, output processing, and user awareness. Each category encompasses different techniques, and we concluded that the best defense is still a Defense-in-Depth approach, mixing different techniques at different layers to protect the whole system.

REFERENCES

[1] S. Abdali, R. Anarfi, C. Barberan, and J. He, “Securing large language models: Threats, vulnerabilities and responsible practices,” arXiv preprint arXiv:2403.12503, 2024.
[2] H. Aditya, S. Chawla, G. Dhingra, P. Rai, S. Sood, T. Singh, Z. M. Wase, A. Bahga, and V. K. Madisetti, “Evaluating privacy leakage and memorization attacks on large language models (LLMs) in generative AI applications,” Journal of Software Engineering and Applications, vol. 17, no. 5, p. 421–447, 2024.
[3] D. Agarwal, A. Fabbri, B. Risher, and P. Laban, “Prompt leakage effect and mitigation strategies for multi-turn LLM applications,” aclanthology.org, 2024. [Online]. Available: https://aclanthology.org/2024.emnlp-industry.94/
[4] D. Agarwal, A. R. Fabbri, P. Laban, B. Risher, S. Joty, C. Xiong, and C.-S.
Wu, “Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions,” arXiv preprint arXiv:2404.16251, 2024.
[5] W. Agnew, H. H. Jiang, C. Sum, M. Sap, and S. Das, “Hallucinating AI hijacking attack: Large language models and malicious code recommenders,” arXiv preprint arXiv:2410.06462, 2024.
[6] S. S. Ahmed and J. A. A. Jothi, “Jailbreak attacks on large language models and possible defenses: Present status and future possibilities,” in 2024 IEEE Int. Symp. on Technology and Society (ISTAS), 9 2024, p. 1–7.
[7] İ. Z. Altun and A. E. Özkök, “Securing artificial intelligence: Exploring attack scenarios and defense strategies,” in 2024 12th Int. Symp. on Digital Forensics and Security (ISDFS), 2024, p. 1–6.
[8] G. Amit, A. Goldsteen, and A. Farkash, “SoK: Reducing the vulnerability of fine-tuned language models to membership inference attacks,” arXiv preprint arXiv:2403.08481, 2024.
[9] M. Andriushchenko, F. Croce, and N. Flammarion, “Jailbreaking leading safety-aligned LLMs with simple adaptive attacks,” in 13th Int. Conf. on Learning Representations (ICLR) Poster, 2025.
[10] Anthropic, “How do you use personal data in model training?” 2025, accessed: Aug. 01, 2025. [Online]. Available: https://privacy.anthropic.com/en/articles/10023555-how-do-you-use-personal-data-in-model-training
[11] AutoGPT, “AutoGPT: Build, deploy, and run AI agents,” 2023, accessed: Aug. 01, 2025. [Online]. Available: https://github.com/Significant-Gravitas/AutoGPT
[12] L. Bailey, A. Serrano, A. Sheshadri, M. Seleznyov, J. Taylor, E. Jenner, J. Hilton, S. Casper, C. Guestrin, and S. Emmons, “Obfuscated activations bypass LLM latent-space defenses,” arXiv preprint arXiv:2412.09565, 2025.
[13] S. Ball, F. Kreuter, and N. Panickssery, “Understanding jailbreak success: A study of latent space dynamics in large language models,” arXiv preprint arXiv:2406.09289, 2024.
[14] C. Barrett, B. Boyd, E. Bursztein, N. Carlini, B. Chen, J. Choi, A. R.
Chowdhury, M. Christodorescu, A. Datta, S. Feizi, K. Fisher, T. Hashimoto, D. Hendrycks, S. Jha, D. Kang, F. Kerschbaum, E. Mitchell, J. Mitchell, Z. Ramzan, K. Shams, D. Song, A. Taly, and D. Yang, “Identifying and mitigating the security risks of generative AI,” Foundations and Trends in Privacy and Security, vol. 6, no. 1, p. 1–52, 2023.
[15] M. Beckerich, L. Plein, and S. Coronado, “RatGPT: Turning online LLMs into proxies for malware attacks,” ACM/JMS Journal of Data Science, vol. 37, 8 2023. [Online]. Available: http://arxiv.org/abs/2308.09183
[16] Y. Bengio et al., “International AI safety report,” UK AI Safety Institute, 2025, accessed: Aug. 01, 2025. [Online]. Available: https://www.gov.uk/government/publications/international-ai-safety-report-2025
[17] V. Benjamin, E. Braca, I. Carter, H. Kanchwala, N. Khojasteh, C. Landow, Y. Luo, C. Ma, A. Magarelli, R. Mirin, A. Moyer, K. Simpson, A. Skawinski, and T. Heverin, “Systematically analyzing prompt injection vulnerabilities in diverse LLM architectures,” arXiv preprint arXiv:2410.23308, 2024.
[18] S. Berezin, R. Farahbakhsh, and N. Crespi, “The TIP of the iceberg: Revealing a hidden class of task-in-prompt adversarial attacks on LLMs,” arXiv preprint arXiv:2501.18626, 2025.
[19] A. R. Bhanushali, H. Mun, and J. Yun, “Adversarial attacks on automatic speech recognition (ASR): A survey,” IEEE Access, 2024.
[20] M. Bhatt, S. Chennabasappa, Y. Li, C. Nikolaidis, D. Song, S. Wan, F. Ahmad, C. Aschermann, Y. Chen, D. Kapil, D. Molnar, S. Whitman, and J. Saxe, “ALERT: A comprehensive benchmark for assessing large language models’ safety through red teaming,” arXiv preprint arXiv:2404.08676, 2024.
[21] ——, “Talk too much: Poisoning large language models under token limit,” arXiv preprint arXiv:2404.14795, 2024.
[22] D. Bhusal, T. Alam, L. Nguyen, and B. A. Blakely, “Exfiltration of personal information from ChatGPT via prompt injection,” arXiv preprint arXiv:2406.00199, 2024.
[23] F. Bianchi and J.
Zou, “Large language models are vulnerable to bait-and-switch attacks for generating harmful content,” arXiv preprint arXiv:2402.13926, 2024.
[24] L. Birch, W. Hackett, S. Trawicki, N. Suri, and P. Garraghan, “Model leeching: An extraction attack targeting LLMs,” in Proc. Conf. on Applied Machine Learning in Information Security (CAMLIS), Oct. 2023, p. 91–104.
[25] M. Botacin, “GPThreats-3: Is automatic malware generation a threat?” in 2023 IEEE Security and Privacy Workshops (SPW), 5 2023, p. 238–254.
[26] J. Brokman, O. Hofman, O. Rachmil, I. Singh, V. Pahuja, R. S. A. Priya, A. Giloni, R. Vainshtein, and H. Kojima, “Insights and current gaps in open-source LLM vulnerability scanners: A comparative analysis,” arXiv preprint arXiv:2410.16527, 2024.
[27] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, p. 1877–1901, 2020.
[28] B. Bullwinkel, A. Minnich, S. Chawla, G. Lopez, M. Pouliot, W. Maxwell, J. de Gruyter, K. Pratt, S. Qi, N. Chikanov, R. Lutz, R. S. R. Dheekonda, B.-E. Jagdagdorj, E. Kim, J. Song, K. Hines, D. Jones, G. Severi, R. Lundeen, S. Vaughan, V. Westerhoff, P. Bryan, R. S. S. Kumar, Y. Zunger, C. Kawaguchi, and M. Russinovich, “Lessons from red teaming 100 generative AI products,” arXiv preprint arXiv:2501.07238, 2025.
[29] D. Burke, “A new foundation for AI on Android,” 2023, accessed: Aug. 01, 2025. [Online]. Available: https://android-developers.googleblog.com/2023/12/a-new-foundation-for-ai-on-android.html
[30] H. Cai, A. Arunasalam, L. Y. Lin, A. Bianchi, and Z. B. Celik, “Rethinking how to evaluate language model jailbreak,” arXiv preprint arXiv:2404.06407, 2024.
[31] R. Cantini, G. Cosenza, A. Orsino, and D. Talia, “Are large language models really bias-free? Jailbreak prompts for assessing adversarial robustness to bias elicitation,” in Discovery Science.
Cham: Springer Nature Switzerland, 2025, p. 52–68.
[32] B. Cao, H. Lin, X. Han, F. Liu, and L. Sun, “Red teaming ChatGPT via jailbreaking: Bias, robustness, reliability and toxicity,” arXiv preprint arXiv:2301.12867, 2023.
[33] H. Cao, W. Luo, Y. Wang, Z. Liu, B. Feng, Y. Yao, and Y. Li, “Guide for defense (G4D): Dynamic guidance for robust and balanced defense in large language models,” arXiv preprint arXiv:2410.17922, 2024.
[34] Y. Cao, B. Cao, and J. Chen, “Stealthy and persistent unalignment on large language models via backdoor injections,” Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024, vol. 1, p. 4920–4935, 2024. [Online]. Available: https://aclanthology.org/2024.naacl-long.276/
[35] N. Carlini, F. Tramèr, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, Ú. Erlingsson, A. Oprea, and C. Raffel, “Extracting training data from large language models,” in 30th USENIX Security Symp. (USENIX Security 21). USENIX Association, Aug. 2021, p. 2633–2650.
[36] A. Cavoukian et al., “Privacy by design: The seven foundational principles,” IAPP Resource Center, 2021.
[37] Y. S. Chan, N. Ri, Y. Xiao, and M. Ghassemi, “Speak easy: Eliciting harmful jailbreaks from LLMs with simple interactions,” arXiv preprint arXiv:2502.04322, 2025.
[38] H. Chang, A. S. Shamsabadi, K. Katevas, H. Haddadi, and R. Shokri, “Context-aware membership inference attacks against pre-trained large language models,” arXiv preprint arXiv:2409.13745, 2024.
[39] P. Chao, E. Debenedetti, A. Robey, M. Andriushchenko, F. Croce, V. Sehwag, E. Dobriban, N. Flammarion, G. J. Pappas, F. Tramer et al., “JailbreakBench: An open robustness benchmark for jailbreaking large language models,” Advances in Neural Information Processing Systems, vol. 37, p. 55005–55029, 2024.
[40] P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, and E.
Wong, “Jailbreaking black box large language models in twenty queries,” in 2023 NeurIPS Workshop on robustness of zero/few-shot learning in foundation models (R0-FoMo), 2023. [41] B. Chen, H. Guo, G. Wang, Y. Wang, and Q. Yan, “The dark side of human feedback: Poisoning large language models via user inputs,” arXiv preprint arXiv:2409.00787, 2024. [42] B. Chen, H. Guo, and Q. Yan, “FlexLLM: Exploring LLM customiza- tion for moving target defense on black-box LLMs against jailbreak attacks,” arXiv preprint arXiv:2412.07672, 2024. [43] B. Chen, N. Ivanov, G. Wang, and Q. Yan, “Multi-turn hidden backdoor in large language model-powered chatbot models,” in Proc. 19th ACM Asia Conf. on Computer and Communications Security. Association for Computing Machinery, 2024, p. 1316–1330. [Online]. Available: https://doi.org/10.1145/3634737.3656289 [44] B. Chen, N. Han, and Y. Miyao, “A statistical and multi-perspective re- visiting of the membership inference attack in large language models,” arXiv preprint arXiv:2412.13475, 2024. [45] D. Chen, Y. Liu, M. Zhou, Y. Zhao, H. Wang, S. Wang, X. Chen, T. F. Bissyand ́ e, J. Klein, and L. Li, “LLM for mobile: An initial roadmap,” ACM Transactions on Software Engineering and Methodology, 2024. [46] J. Chen and R. Lu, “Deceptive delight: Jailbreak LLMs through camouflage and distraction,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://unit42.paloaltonetworks.com/jailbreak-llms-through -camouflage-distraction/ [47] J. Chen and D. Yang, “Unlearn what you want to forget: Efficient unlearning for LLMs,” in Proc. 2023 Conf. on Empirical Methods in Natural Language Processing (EMNLP).Association for Computa- tional Linguistics, Dec. 2023, p. 12 041–12 052. [48] Q. Chen, S. Yamaguchi, and Y. Yamamoto, “Defending against gcg jailbreak attacks with syntax trees and perplexity in LLMs,” in 2024 IEEE 13th Global Conf. on Consumer Electronics (GCCE), 10 2024, p. 1411–1415. [49] Y. Chen, W. Sun, C. Fang, Z. Chen, Y. Ge, T. Han, Q. 
Zhang, Y. Liu, Z. Chen, and B. Xu, “Security of language models for code: A systematic literature review,” arXiv preprint arXiv:2410.15631, 2024. [50] Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “Agentpoison: Red- teaming LLM agents via poisoning memory or knowledge bases,” arXiv preprint arXiv:2407.12784, 2024. [51] Z. Chen, J. Liu, H. Liu, Q. Cheng, F. Zhang, W. Lu, and X. Liu, “Black- box opinion manipulation attacks to retrieval-augmented generation of large language models,” arXiv preprint arXiv:2407.13757, 2024. [52] P. Cheng, Y. Ding, T. Ju, Z. Wu, W. Du, P. Yi, Z. Zhang, and G. Liu, “TrojanRAG: Retrieval-augmented generation can be backdoor driver in large language models,” arXiv preprint arXiv:2405.13401, 2024. [53] P. Cheng, W. Du, Z. Wu, F. Zhang, L. Chen, and G. Liu, “Syntactic ghost: An imperceptible general-purpose backdoor attacks on pre- trained language models,” arXiv preprint arXiv:2402.18945, 2024. [54] P. Cheng, Z. Wu, T. Ju, W. Du, Z. Zhang, and G. Liu, “Transferring backdoors between large language models by knowledge distillation,” arXiv preprint arXiv:2408.09878, 2024. [55] M. Chernyshev, Z. Baig, and R. Doss, “Forensic analysis of indirect prompt injection attacks on LLM agents,” in 2024 IEEE 6th Int. Conf. on Trust, Privacy and Security in Intelligent Systems, and Applications (TPS-ISA), 10 2024, p. 409–411. [56] A. G. Chowdhury, M. M. Islam, V. Kumar, F. H. Shezan, V. Kumar, V. Jain, and A. Chadha, “Breaking down the defenses: A compar- ative survey of attacks on large language models,” arXiv preprint arXiv:2403.04786, 2024. [57] J. Chu, Y. Liu, Z. Yang, X. Shen, M. Backes, and Y. Zhang, “Compre- hensive assessment of jailbreak attacks against LLMs,” arXiv preprint arXiv:2402.05668, 2024. [58] J. Chua, Y. Li, S. Yang, C. Wang, and L. Yao, “AI safety in generative AI large language models: A survey,” arXiv preprint arXiv:2407.18369, 2024. [59] Z. Coalson, J. Woo, S. Chen, Y. Sun, L. Yang, P. Nair, B. Fang, and S. 
Hong, “PrisonBreak: Jailbreaking large language models with fewer than twenty-five targeted bit-flips,” arXiv preprint arXiv:2412.07192, 2024. [60] S. Cohen, R. Bitton, and B. Nassi, “Here comes the AI worm: Unleashing zero-click worms that target genAI-powered applications,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://sites.google.com/view/compromptmized [61] CommonCrawl, “Common crawl maintains a free, open repository of web crawl data that can be used by anyone,” 2025, accessed: Aug. 01, 2025. [Online]. Available: https://commoncrawl.org/ [62] V. Costan and S. Devadas, “Intel SGX explained,” IACR Cryptology ePrint Archive, vol. 2016, p. 86, 2016. [Online]. Available: https://eprint.iacr.org/2016/086 [63] J. Cui, Y. Xu, Z. Huang, S. Zhou, J. Jiao, and J. Zhang, “Recent advances in attack and defense approaches of large language models,” arXiv preprint arXiv:2409.03274, 2024. [64] T. Cui, Y. Wang, C. Fu, Y. Xiao, S. Li, X. Deng, Y. Liu, Q. Zhang, Z. Qiu, P. Li et al., “Risk taxonomy, mitigation, and assess- ment benchmarks of large language model systems,” arXiv preprint arXiv:2401.05778, 2024. [65] A. Cuthbertson, “AI worm that infects computers and reads emails created by researchers,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://w.independent.co.uk/tech/ai-worm-computer-sec urity-chatgpt-malware-b2506594.html [66] B. C. Das, M. H. Amini, and Y. Wu, “Security and privacy challenges of large language models: A survey,” ACM Comput. Surv., vol. 57, no. 6, Feb. 2025. [67] N. Das, E. Raff, U. Booz, A. Hamilton, and M. Gaur, “Human-readable adversarial prompts: An investigation into LLM vulnerabilities using situational context,” arXiv preprint arXiv:2412.16359, 2024. [68] S. Das, S. Bhattacharya, S. Kundu, S. Kundu, A. Menon, A. Raha, and K. Basu, “AttentionBreaker: Adaptive evolutionary optimization for unmasking vulnerabilities in LLMs through bit-flip attacks,” arXiv preprint arXiv:2411.13757, 2024. [69] E. Debenedetti, J. Zhang, M. 
Balunovi ́ c, L. Beurer-Kellner, M. Fischer, and F. Tram ` er, “AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents,” in 38th Annual Conf. on Neural Information Processing Systems (NeurIPS) - Datasets and Benchmarks, 2024. [70] B. Deng, W. Wang, F. Feng, Y. Deng, Q. Wang, and X. He, “Attack prompt generation for red teaming and defending large language models,” in Findings of the Association for Computational Linguistics (EMNLP 2023). Association for Computational Linguistics, 2023, p. 2176–2189. [71] D. Deng, C. Zhang, H. Zheng, Y. Pu, S. Ji, and Y. Wu, “Adversaflow: Visual red teaming for large language models with multi-level adversar- ial flow,” IEEE Transactions on Visualization and Computer Graphics, vol. 31, p. 492–502, 1 2025. [72] G. Deng, Y. Liu, Y. Li, K. Wang, Y. Zhang, Z. Li, H. Wang, T. Zhang, and Y. Liu, “MASTERKEY: Automated jailbreaking of large language model chatbots,” in Network and Distributed System Security (NDSS) Symp. 2024, Jan. 2024. [73] J. der Assen, A. Huertas, J. Sharif, C. Feng, G. Bovet, and B. Stiller, “ThreatFinderAI: Automated threat modeling applied to LLM system integration,” in 2024 20th Int. Conf. on Network and Service Manage- ment (CNSM), Oct. 2024, p. 1–3. [74] E. Derner and K. Batisti ˇ c, “Beyond the safeguards: Exploring the security risks of chatGPT,” arXiv preprint arXiv:2305.08005, 2023. [75] P. Ding, J. Kuang, D. Ma, X. Cao, Y. Xian, J. Chen, and S. Huang, “A wolf in sheep’s clothing: Generalized nested jailbreak prompts can fool large language models easily,” Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024, vol. 1, p. 2136–2153, 2024. [76] Z. Dong, Z. Zhou, C. Yang, J. Shao, and Y. Qiao, “Attacks, defenses and evaluations for LLM conversation safety: A survey,” in Proc. 2024 Conf. 
of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jan. 2024, p. 6734–6747. 30 [77] A. Draguns, A. Gritsevskiy, S. R. Motwani, C. Rogers-Smith, J. Ladish, and C. S. D. Witt, “When LLM meets drl: Advancing jailbreaking efficiency via drl-guided search,” in 38th Annual Conf. on Neural Information Processing Systems (NeurIPS) Poster, 2024. [78] H. Du, S. Liu, L. Zheng, Y. Cao, A. Nakamura, and L. Chen, “Privacy in fine-tuning large language models: Attacks, defenses, and future directions,” arXiv preprint arXiv:2412.16504, 2024. [79] W. Du, P. Li, H. Zhao, T. Ju, G. Ren, and G. Liu, “UOR: Universal backdoor attacks on pre-trained language models,” in Findings of the Association for Computational Linguistics: ACL 2024.Association for Computational Linguistics, Aug. 2024, p. 7865–7877. [80] Y. Du, S. Zhao, M. Ma, Y. Chen, and B. Qin, “Analyzing the inherent response tendency of LLMs: Real-world instructions-driven jailbreak,” arXiv preprint arXiv:2312.04127, 2023. [81] Y. Du, Z. Li, P. Cheng, X. Wan, and A. Gao, “Detecting AI flaws: Target-driven attacks on internal faults in language models,” arXiv preprint arXiv:2408.14853, 2024. [82] J. Dubi ́ nski, S. Pawlak, F. Boenisch, T. Trzcinski, and A. Dziedzic, “Bucks for buckets (B4B): Active defenses against stealing encoders,” Advances in Neural Information Processing Systems, vol. 36, 2024. [83] A. Dziedzic, M. A. Kaleem, Y. S. Lu, and N. Papernot, “Increasing the cost of model extraction with calibrated proof of work,” in 10th Int. Conf. on Learning Representations (ICLR) 2022 Spotlight, 2022. [84] R. Eldan and M. Russinovich, “Who’s harry potter? approximate unlearning in LLMs,” arXiv preprint arXiv:2310.02238, 2023. [85] J. Evertz, M. Chlosta, L. Sch ̈ onherr, and T. Eisenhofer, “Whispers in the machine: Confidentiality in LLM-integrated systems,” in 33rd USENIX Security Symposium (USENIX Security 24) Poster, 2024. [86] T. Fan, Y. Kang, G. Ma, W. Chen, W. 
Wei, L. Fan, and Q. Yang, “Fate-LLM: A industrial grade federated learning framework for large language models,” arXiv preprint arXiv:2310.10049, 2023. [87] Federal Office for Information Security (BSI/DE), “Generative AI models opportunities and risks for industry and authorities,” Apr. 2024, accessed: Aug. 01, 2025. [Online]. Available: https://w.bsi. bund.de/SharedDocs/Downloads/EN/BSI/KI/Generative AIModels [88] M. Feffer, A. Sinha, W. H. Deng, Z. C. Lipton, and H. Heidari, “Red- teaming for generative AI: Silver bullet or security theater?” in Proc. 2024 AAAI/ACM Conf. on AI, Ethics, and Society. AAAI Press, 2025, p. 421–437. [89] G. Feretzakis and V. S. Verykios, “Trustworthy AI: Securing sensitive data in large language models,” AI, vol. 5, p. 2773–2800, Dec. 2024. [90] M. A. Ferrag, F. Alwahedi, A. Battah, B. Cherif, A. Mechri, N. Tihanyi, T. Bisztray, and M. Debbah, “Generative AI and large language models for cyber security: All insights you need,” arXiv preprint arXiv:2405.12750, 2024. [91] M. Figueroa, “ChatGPT-4o guardrail jailbreak: Hex encoding for writing CVE exploits,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://0din.ai/blog/chatgpt-4o-guardrail-jailbreak-hex-enc oding-for-writing-cve-exploits [92] R. H. Filho and D. Colares, “A methodology for risk management of generative AI based systems,” in 2024 Int. Conf. on Software, Telecommunications and Computer Networks (SoftCOM), 9 2024, p. 1–6. [93] B. Flesch, “OpenAI: ChatGPT crawler vulnerability,” 2025, accessed: Aug. 01, 2025. [Online]. Available: https://github.com/bf/security-adv isories/blob/main/2025- 01- ChatGPT- Crawler- Reflective- DDOS- Vul nerability.md [94] T. Fu, M. Sharma, P. Torr, S. B. Cohen, D. Krueger, and F. Barez, “PoisonBench: Assessing large language model vulnerability to data poisoning,” arXiv preprint arXiv:2410.08811, 2024. [95] E. Galinkin and M. 
Sablotny, “Improved large language model jailbreak detection via pretrained embeddings,” arXiv preprint arXiv:2412.01547, 2024. [96] F. Galli, L. Melis, and T. Cucinotta, “Noisy neighbors: Effi- cient membership inference attacks against LLMs,” arXiv preprint arXiv:2406.16565, 2024. [97] Y. Gan, Y. Yang, Z. Ma, P. He, R. Zeng, Y. Wang, Q. Li, C. Zhou, S. Li, T. Wang, Y. Gao, Y. Wu, and S. Ji, “Navigating the risks: A survey of security, privacy, and ethics threats in LLM-based agents,” arXiv preprint arXiv:2411.09523, 2024. [98] D. Ganguli, L. Lovitt, J. Kernion, A. Askell, Y. Bai, S. Kadavath, B. Mann, E. Perez, N. Schiefer, K. Ndousse, A. Jones, S. Bowman, A. Chen, T. Conerly, N. DasSarma, D. Drain, N. Elhage, S. El-Showk, S. Fort, Z. Hatfield-Dodds, T. Henighan, D. Hernandez, T. Hume, J. Jacobson, S. Johnston, S. Kravec, C. Olsson, S. Ringer, E. Tran- Johnson, D. Amodei, T. Brown, N. Joseph, S. McCandlish, C. Olah, J. Kaplan, and J. Clark, “Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned,” arXiv preprint arXiv:2209.07858, 2022. [99] K. Gao, T. Pang, C. Du, Y. Yang, S.-T. Xia, and M. Lin, “Denial- of-service poisoning attacks against large language models,” arXiv preprint arXiv:2410.10760, 2024. [100] L. Gao, X. Zhang, P. Nakov, and X. Chen, “Shaping the safety boundaries: Understanding and defending against jailbreaks in large language models,” arXiv preprint arXiv:2412.17034, 2024. [101] S. Garg and V. Torra, “Task-specific knowledge distillation with dif- ferential privacy in LLMs,” in 29th European Symp. on Research in Computer Security (ESORICS 2024).Cham: Springer Nature Switzerland, 2024, p. 374–389. [102] Y. Ge, D. Hazarika, Y. Liu, and M. Namazifar, “Supervised fine-tuning of large language models on human demonstrations through the lens of memorization,” 2023, accessed: Aug. 01, 2025. [Online]. 
Available: https://w.amazon.science/publications/supervised- fine- tuning- of-l arge-language-models-on-human-demonstrations-through-the-lens-o f-memorization [103] I. Gim, C. Li, and L. Zhong, “Confidential prompting: Protecting user prompts from cloud LLM providers,” arXiv preprint arXiv:2409.19134, 2024. [104] A. Golda, K. Mekonen, A. Pandey, A. Singh, V. Hassija, V. Chamola, and B. Sikdar, “Privacy and security concerns in generative AI: A comprehensive survey,” IEEE Access, 2024. [105] X. Gong, M. Li, Y. Zhang, F. Ran, C. Chen, Y. Chen, Q. Wang, and K.-Y. Lam, “Papillon: Efficient and stealthy fuzz testing-powered jailbreaks for LLMs,” arXiv preprint arXiv:2409.14866, 2024. [106] Y. Gong and C. Poellabauer, “An overview of vulnerabilities of voice controlled systems,” in 1st Int. Workshop on Security and Privacy for the Internet-of-Things (IoTSec), 2018. [107] K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world LLM- integrated applications with indirect prompt injection,” in Proc. 16th ACM Workshop on Artificial Intelligence and Security.New York, NY, USA: Association for Computing Machinery, 2023, p. 79–90. [108] S. Guo, C. Xie, T. Xiang, J. Li, L. Lyu, and T. Zhang, “Threats to pre-trained language models: Survey and taxonomy,” arXiv preprint arXiv:2202.06862, 2022. [109] M. Gupta, C. Akiri, K. Aryal, E. Parker, and L. Praharaj, “From chatGPT to threatGPT: Impact of generative AI in cybersecurity and privacy,” IEEE Access, 2023. [110] N. R. Haddaway, M. J. Page, C. C. Pritchard, and L. A. McGuinness, “PRISMA2020: An r package and shiny app for producing PRISMA 2020-compliant flow diagrams, with interactivity for optimised digi- tal transparency and open synthesis,” Campbell Systematic Reviews, vol. 18, 2022. [111] H. Hajipour, K. Hassler, T. Holz, L. Schonherr, and M. 
Fritz, “Codelm- sec benchmark: Systematically evaluating and finding security vul- nerabilities in black-box code language models,” Proceedings - IEEE Conference on Safe and Trustworthy Machine Learning, SaTML 2024, p. 684–709, 2024. [112] V. Hartmann, A. Suri, V. Bindschaedler, D. Evans, S. Tople, and R. West, “Sok: Memorization in general-purpose large language mod- els,” arXiv preprint arXiv:2310.18362, 2023. [113] A. Hasan, I. Rugina, and A. Wang, “Pruning for protection: Increasing jailbreak resistance in aligned LLMs without fine-tuning,” in Proc. 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP.Miami, Florida, US: Association for Computational Lin- guistics, Nov. 2024, p. 417–430. [114] F. He, T. Zhu, D. Ye, B. Liu, W. Zhou, and P. S. Yu, “The emerged security and privacy of LLM agent: A survey with case studies,” arXiv preprint arXiv:2407.19354, 2024. [115] D. Hintersdorf, L. Struppek, K. Kersting, A. Dziedzic, and F. Boenisch, “Finding NeMo: Localizing neurons responsible for memorization in diffusion models,” in Conf. on Neural Information Processing Systems (NeurIPS), 2024. [116] D. Hintersdorf, L. Struppek, D. Neider, and K. Kersting, “Defending our privacy with backdoors,” in 2023 NeurIPS Workshop on Backdoors in Deep Learning (BUGS), 2023. [117] B. Hitaj, G. Ateniese, and F. Perez-Cruz, “Deep models under the GAN: Information leakage from collaborative deep learning,” in Pro- ceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, (CCS’17), 2017, p. 603–618. [118] D. Hitaj, G. Pagnotta, F. De Gaspari, S. Ruko, B. Hitaj, L. V. Mancini, and F. Perez-Cruz, “Do you trust your model? emerging malware threats in the deep learning ecosystem,” IEEE Transactions on Dependable and Secure Computing, 2025. 31 [119] D. Hitaj, G. Pagnotta, B. Hitaj, L. V. Mancini, and F. 
Perez-Cruz, “MaleficNet: Hiding Malware into Deep Neural Networks Using Spread-Spectrum Channel Coding,” in 27th European Symposium on Research in Computer Security (ESORICS 2022), Copenhagen, Den- mark, September 26–30, 2022, Proceedings, Part I. Springer, 2022, p. 425–444. [120] D. Hitaj, G. Pagnotta, B. Hitaj, F. Perez-Cruz, and L. V. Mancini, “Fed- comm: Federated learning as a medium for covert communication,” IEEE Transactions on Dependable and Secure Computing, 2023. [121] R. Hornby, “Microsoft Copilot just helped me pirate Windows 11 - here’s proof,” 2025, accessed: Aug. 01, 2025. [Online]. Available: https://w.laptopmag.com/ai/microsoft-copilot-is-actively-helping-u sers-pirate-windows-heres-proof [122] P. Horvath, L. Chmielewski, L. Weissbart, L. Batina, and Y. Yarom, “BarraCUDA: Edge GPUs do leak DNN weights,” arXiv preprint arXiv:2312.07783, 2024. [123] M. Howard and D. LeBlanc, Writing Secude Code.2nd edition, Microsoft Press, 2003. [124] C.-Y. Hsu, Y.-L. Tsai, C.-H. Lin, P.-Y. Chen, C.-M. Yu, and C.-Y. Huang, “Safe LoRA: the silver lining of reducing safety risks when fine-tuning large language models,” arXiv preprint arXiv:2405.16833, 2024. [125] K. Hu, W. Yu, T. Yao, X. Li, W. Liu, L. Yu, Y. Li, K. Chen, Z. Shen, and M. Fredrikson, “Efficient LLM jailbreak via adaptive dense-to-sparse constrained optimization,” in 38th Annual Conf. on Neural Information Processing Systems (NeurIPS) Poster, 2024. [126] L. Hu and B. Wang, “DROJ: A prompt-driven attack against large language models,” arXiv preprint arXiv:2411.09125, 2024. [127] X. Hu, P.-Y. Chen, and T.-Y. Ho, “Gradient cuff: Detecting jailbreak attacks on large language models by exploring refusal loss landscapes,” in 38th Annual Conf. on Neural Information Processing Systems (NeurIPS) Poster, 2024. [128] —, “Token highlighter: Inspecting and mitigating jailbreak prompts for large language models,” in 2024 NeurIPS Workshop SafeGenAi, 2024. [129] K. Huang, B. Chen, Y. Lu, S. Wu, D. Wang, Y. Huang, H. 
Jiang, Z. Zhou, J. Cao, and X. Peng, “Lifting the veil on the large lan- guage model supply chain: Composition, risks, and mitigations,” arXiv preprint arXiv:2410.21218, 2024. [130] T. Huang, S. Hu, F. Ilhan, S. F. Tekin, and L. Liu, “Harmful fine- tuning attacks and defenses for large language models: A survey,” arXiv preprint arXiv:2409.18169, 2024. [131] —, “Virus: Harmful fine-tuning attack for large language models bypassing guardrail moderation,” arXiv preprint arXiv:2501.17433, 2025. [132] Y. Huang, S. Gupta, Z. Song, K. Li, and S. Arora, “Evaluating gradient inversion attacks and defenses in federated learning,” Advances in Neural Information Processing Systems, vol. 34, p. 7232–7241, 2021. [133] Y. Huang, S. Gupta, M. Xia, K. Li, and D. Chen, “Identifying and mitigating privacy risks stemming from language models: A survey,” arXiv preprint arXiv:2310.01424, 2023. [134] —, “Catastrophic jailbreak of open-source LLMs via exploiting generation,” in 12th Int. Conf. on Learning Representations (ICLR), 2024. [135] Y. Huang, Y. Ji, W. Hu, J. Chen, A. Rao, and D. Tsechansky, “Bad likert judge: A novel multi-turn technique to jailbreak LLMs by misusing their evaluation capability,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://unit42.paloaltonetworks.com/multi- turn- t echnique-jailbreaks-llms/ [136] Y. Huang, J. Tang, D. Chen, B. Tang, Y. Wan, L. Sun, and X. Zhang, “Obscureprompt: Jailbreaking large language models via obscure in- put,” arXiv preprint arXiv:2406.13662, 2024. [137] B. Hui, H. Yuan, N. Gong, P. Burlina, and Y. Cao, “Pleak: Prompt leaking attacks against large language model applications,” in Proc. 2024 on ACM SIGSAC Conf. on Computer and Communications Security, 2024, p. 3600–3614. [138] K.-H. Hung, C.-Y. Ko, A. Rawat, I.-H. Chung, W. H. Hsu, and P.-Y. Chen, “Attention tracker: Detecting prompt injection attacks in LLMs,” arXiv preprint arXiv:2411.00348, 2024. 
[139] Huntr, “The world’s first bug bounty platform for AI/ML,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://huntr.com/
[140] IBM, “Leadership in the age of AI,” 2023, accessed: Aug. 01, 2025. [Online]. Available: https://newsroom.ibm.com/image/IBM Responsible LeadershipReportOCT2023UK1530071123.pdf
[141] J. Jang, D. Yoon, S. Yang, S. Cha, M. Lee, L. Logeswaran, and M. Seo, “Knowledge unlearning for mitigating privacy risks in language models,” Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 1, p. 14389–14408, 2023.
[142] J. Ji, B. Hou, A. Robey, G. J. Pappas, H. Hassani, Y. Zhang, E. Wong, and S. Chang, “Defending large language models against jailbreak attacks via semantic smoothing,” arXiv preprint arXiv:2402.16192, 2024.
[143] X. Jia, T. Pang, C. Du, Y. Huang, J. Gu, Y. Liu, X. Cao, and M. Lin, “Improved techniques for optimization-based jailbreaking on large language models,” in 13th Int. Conf. on Learning Representations (ICLR) Poster, 2025.
[144] F. Jiang, Z. Xu, L. Niu, B. Y. Lin, and R. Poovendran, “ChatBug: A common vulnerability of aligned LLMs induced by chat templates,” arXiv preprint arXiv:2406.12935, 2024.
[145] L. Jiang, K. Rao, S. Han, A. Ettinger, F. Brahman, S. Kumar, N. Mireshghallah, X. Lu, M. Sap, Y. Choi, and N. Dziri, “WildTeaming at scale: From in-the-wild jailbreaks to (adversarially) safer language models,” in ICML 2024 Next Generation of AI Safety Workshop, 2024.
[146] S. Jiang, X. Chen, and R. Tang, “Prompt packer: Deceiving LLMs through compositional instruction with hidden attacks,” arXiv preprint arXiv:2310.10077, 2023.
[147] S. Jiang, X. Chen, K. Xu, L. Chen, H. Ren, and R. Tang, “Decomposition, synthesis and attack: A multi-instruction fusion method for jailbreaking LLMs,” IEEE Internet of Things Journal, p. 1, Feb. 2025.
[148] T. Jiang, Z. Wang, J. Liang, C. Li, Y. Wang, and T. Wang, “RobustKV: Defending large language models against jailbreak attacks via KV eviction,” in 13th Int. Conf. on Learning Representations (ICLR) Poster, 2025.
[149] R. Jiao, S. Xie, J. Yue, T. Sato, L. Wang, Y. Wang, Q. A. Chen, and Q. Zhu, “Can we trust embodied agents? exploring backdoor attacks against embodied LLM-based decision-making systems,” arXiv preprint arXiv:2405.20774, 2024.
[150] A. E. W. Johnson, L. Bulgarelli, and T. J. Pollard, “Deidentification of free-text medical records using pre-trained bidirectional transformers,” in Proc. ACM Conf. on Health, Inference, and Learning. New York, NY, USA: Association for Computing Machinery, 2020, p. 214–221.
[151] M. Juuti, S. Szyller, S. Marchal, and N. Asokan, “PRADA: Protecting against DNN model stealing attacks,” in 2019 IEEE European Symp. on Security and Privacy (EuroS&P). Los Alamitos, CA, USA: IEEE Computer Society, Jun. 2019, p. 512–527.
[152] N. Kandpal, M. Jagielski, F. Tramèr, and N. Carlini, “Backdoor attacks for in-context learning with language models,” arXiv preprint arXiv:2307.14692, 2023.
[153] N. Kandpal, K. Pillutla, A. Oprea, P. Kairouz, C. A. Choquette-Choo, and Z. Xu, “User inference attacks on large language models,” in Proc. 2024 Conf. on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2024, p. 18238–18265.
[154] D. Kang, X. Li, I. Stoica, C. Guestrin, M. Zaharia, and T. Hashimoto, “Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks,” in 2024 IEEE Security and Privacy Workshops (SPW). Los Alamitos, CA, USA: IEEE Computer Society, May 2024, p. 132–143.
[155] P. Kassianik, “Using AI to automatically jailbreak gpt-4 and other LLMs in under a minute,” 2023, accessed: Aug. 01, 2025. [Online]. Available: https://www.robustintelligence.com/blog-posts/using-ai-to-automatically-jailbreak-gpt-4-and-other-llms-in-under-a-minute
[156] M. Kim, Y. Kim, H. Seo, H. Choi, J. Han, G. Kee, S. Ko, H. Jung, B. Kim, Y.-H. Kim, S. Park, and T. J. Jun, “Mitigating adversarial attacks in LLMs through defensive suffix generation,” arXiv preprint arXiv:2412.13705, 2024.
[157] S. Kim, S. Yun, H. Lee, M. Gubri, S. Yoon, and S. J. Oh, “Propile: Probing privacy leakage in large language models,” Advances in Neural Information Processing Systems, vol. 36, 2023.
[158] N. M. Kirch, S. Field, and S. Casper, “What features in prompts jailbreak LLMs? investigating the mechanisms behind attacks,” in 2024 NeurIPS Workshop Red Teaming GenAI: What Can We Learn from Adversaries?, 2024.
[159] L. Kohnfelder and P. Garg, “The threats to our products,” Microsoft, Apr. 1999, accessed: Aug. 01, 2025. [Online]. Available: https://shostack.org/files/microsoft/The-Threats-To-Our-Products.docx
[160] E. Kovacs, “ChatGPT, deepseek vulnerable to AI jailbreaks,” 2025, accessed: Aug. 01, 2025. [Online]. Available: https://www.securityweek.com/ai-jailbreaks-target-chatgpt-deepseek-alibaba-qwen/
[161] ——, “Unprotected DeepSeek database exposed chats, other sensitive information,” 2025, accessed: Aug. 01, 2025. [Online]. Available: https://www.securityweek.com/unprotected-deepseek-database-leaked-highly-sensitive-information/
[162] W. Kuang, B. Qian, Z. Li, D. Chen, D. Gao, X. Pan, Y. Xie, Y. Li, B. Ding, and J. Zhou, “Federatedscope-LLM: A comprehensive package for fine-tuning large language models in federated learning,” in Proc. 30th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining. New York, NY, USA: Association for Computing Machinery, 2024, p. 5260–5271.
[163] A. Kumar, J. Roh, A. Naseh, M. Karpinska, M. Iyyer, A. Houmansadr, and E. Bagdasarian, “OverThink: Slowdown attacks on reasoning LLMs,” arXiv preprint arXiv:2502.02542, 2025.
[164] T. Kwartler, M. Berman, and A. Aqrawi, “Backdoored retrievers for prompt injection attacks on retrieval augmented generation of large language models,” arXiv preprint arXiv:2410.14479, 2024.
[165] LangChain, “LangChain: Build context-aware reasoning applications,” 2023, accessed: Aug. 01, 2025. [Online]. Available: https://github.c om/langchain-ai/langchain [166] —, “Langchain: Tool calling,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://python.langchain.com/v0.1/docs/modules /model io/chat/functioncalling/ [167] L. D. lderczynski, E. G. egalinkin, J. M. jemartin, S. Majumdar, and N. I. nans, “garak: A framework for security probing large language models,” arXiv preprint arXiv:2406.11036, 2024. [168] D. Lee and M. Tiwari, “Prompt infection: LLM-to-LLM prompt in- jection within multi-agent systems,” arXiv preprint arXiv:2410.07283, 2024. [169] S. Lee, M. Kim, L. Cherif, D. Dobre, J. Lee, S. J. Hwang, K. Kawaguchi, G. Gidel, Y. Bengio, N. Malkin, and M. Jain, “Learning diverse attacks on large language models for robust red-teaming and safety tuning,” arXiv preprint arXiv:2405.18540, 2024. [170] S. Lee, S. Ni, C. Wei, S. Li, L. Fan, A. Argha, H. Alinejad-Rokny, R. Xu, Y. Gong, and M. Yang, “xJailbreak: Representation space guided reinforcement learning for interpretable LLM jailbreaking,” arXiv preprint arXiv:2501.16727, 2025. [171] Y. Lee, T. Park, Y. Lee, J. Gong, and J. Kang, “Exploring potential prompt injection attacks in federated military LLMs and their mitiga- tion,” arXiv preprint arXiv:2501.18416, 2025. [172] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. K ̈ uttler, M. Lewis, W.-t. Yih, T. Rockt ̈ aschel et al., “Retrieval- augmented generation for knowledge-intensive nlp tasks,” Advances in Neural Information Processing Systems, vol. 33, p. 9459–9474, 2020. [173] B. Li, H. Xing, C. Huang, J. Qian, H. Xiao, L. Feng, and C. Tian, “Ex- ploiting uncommon text-encoded structures for automated jailbreaks in LLMs,” arXiv preprint arXiv:2406.08754, 2024. [174] H. Li, Y. Chen, J. Luo, J. Wang, H. Peng, Y. Kang, X. Zhang, Q. Hu, C. Chan, Z. 
Xu et al., “Privacy in large language models: Attacks, defenses and future directions,” arXiv preprint arXiv:2310.10383, 2023. [175] H. Li, J. Ye, J. Wu, T. Yan, C. Wang, and Z. Li, “JailPO: A novel black- box jailbreak framework via preference optimization against aligned LLMs,” arXiv preprint arXiv:2412.15623, 2024. [176] J. Li, Y. Liu, C. Liu, L. Shi, X. Ren, Y. Zheng, Y. Liu, and Y. Xue, “A cross-language investigation into jailbreak attacks in large language models,” arXiv preprint arXiv:2401.16765, 2024. [177] X. Li, Y. Zhang, R. Lou, C. Wu, and J. Wang, “Chain-of-scrutiny: Detecting backdoor attacks for large language models,” arXiv preprint arXiv:2406.05948, 2024. [178] X. Li, Z. Li, Q. Li, B. Lee, J. Cui, and X. Hu, “Faster-GCG: Efficient discrete optimization jailbreak attacks against aligned large language models,” arXiv preprint arXiv:2410.15362, 2024. [179] Y. Li, T. Li, K. Chen, J. Zhang, S. Liu, W. Wang, T. Zhang, and Y. Liu, “Badedit: Backdooring large language models by model editing,” 12th International Conference on Learning Representations, ICLR 2024, 2024. [180] Y. Li, H. Wen, W. Wang, X. Li, Y. Yuan, G. Liu, J. Liu, W. Xu, X. Wang, Y. Sun et al., “Personal LLM agents: Insights and sur- vey about the capability, efficiency and security,” arXiv preprint arXiv:2401.05459, 2024. [181] Y. Li, Y. Liu, Y. Li, L. Shi, G. Deng, S. Chen, and K. Wang, “Lock- picking LLMs: A logit-based jailbreak using token-level manipulation,” arXiv preprint arXiv:2405.13068, 2024. [182] Y. Li, Z. Zhang, K. Wang, L. Shi, and H. Wang, “Model-editing-based jailbreak against safety-aligned large language models,” arXiv preprint arXiv:2412.08201, 2024. [183] Z. Li, Y. Zeng, P. Xia, L. Liu, Z. Fu, and B. Li, “Large language models are good attackers: Efficient and stealthy textual backdoor attacks,” arXiv preprint arXiv:2408.11587, 2024. [184] S. Lin, R. Li, X. Wang, C. Lin, W. Xing, and M. 
Han, “Figure it out: Analyzing-based jailbreak attack on large language models,” arXiv preprint arXiv:2407.16205, 2024. [185] Y. Lin, P. He, H. Xu, Y. Xing, M. Yamada, H. Liu, and J. Tang, “Towards understanding jailbreak attacks in LLMs: A representation space analysis,” in Proc. 2024 Conf. on Empirical Methods in Natural Language Processing (EMNLP).Miami, Florida, USA: Association for Computational Linguistics, Nov. 2024, p. 7067–7085. [186] U. Lindqvist and E. Jonsson, “How to systematically classify computer security intrusions,” in Proceedings. 1997 IEEE Symposium on Security and Privacy (Cat. No. 97CB36097). IEEE, 1997, p. 154–163. [187] —, “A map of security risks associated with using cots,” Computer, vol. 31, no. 6, p. 60–66, 2002. [188] U. Lindqvist and P. A. Porras, “Detecting computer and network misuse through the production-based expert system toolset (p-best),” in Proceedings of the 1999 IEEE symposium on security and privacy (Cat. No. 99CB36344). IEEE, 1999, p. 146–161. [189] C. Liu, F. Zhao, L. Qing, Y. Kang, C. Sun, K. Kuang, and F. Wu, “Goal-oriented prompt attack and safety evaluation for LLMs,” arXiv preprint arXiv:2309.11830, 2023. [190] Q. Liu, J. Yin, W. Wen, C. Yang, and S. Sha, “NeuroPots: Realtime proactive defense against Bit-Flip attacks in neural networks,” in 32nd USENIX Security Symp. (USENIX Security 23).Anaheim, CA: USENIX Association, Aug. 2023, p. 6347–6364. [191] Q. Liu, W. Mo, T. Tong, J. Xu, F. Wang, C. Xiao, and M. Chen, “Mit- igating backdoor threats to large language models: Advancement and challenges,” in 2024 60th Annual Allerton Conf. on Communication, Control, and Computing, 9 2024, p. 1–8. [192] S. Liu, B. Sabir, S. I. Jang, Y. Kansal, Y. Gao, K. Moore, A. Abuadbba, and S. Nepal, “Feint and attack: Attention-based strategies for jail- breaking and protecting LLMs,” arXiv preprint arXiv:2410.16327, 2024. [193] T. Liu, H. Yao, T. Wu, Z. Qin, F. Lin, K. Ren, and C. 
Chen, “Mitigating privacy risks in LLM embeddings from embedding inversion,” arXiv preprint arXiv:2411.05034, 2024. [194] T. Liu, Z. Deng, G. Meng, Y. Li, and K. Chen, “Demystifying RCE vulnerabilities in LLM-integrated apps,” in Proc. 2024 on ACM SIGSAC Conf. on Computer and Communications Security. New York, NY, USA: Association for Computing Machinery, 2024, p. 1716–1730. [195] X. Liu, P. Li, E. Suh, Y. Vorobeychik, Z. Mao, S. Jha, P. McDaniel, H. Sun, B. Li, and C. Xiao, “AutoDAN-Turbo: A lifelong agent for strategy self-exploration to jailbreak LLMs,” in 13th Int. Conf. on Learning Representations (ICLR) Spotlight, 2025. [196] Y. Liu, G. Deng, Y. Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y. Liu, H. Wang, Y. Zheng et al., “Prompt injection attack against LLM-integrated applications,” arXiv preprint arXiv:2306.05499, 2023. [197] Y. Liu, G. Deng, Z. Xu, Y. Li, Y. Zheng, Y. Zhang, L. Zhao, T. Zhang, K. Wang, and Y. Liu, “Jailbreaking ChatGPT via prompt engineering: An empirical study,” arXiv preprint arXiv:2305.13860, 2023. [198] Y. Liu, H. He, T. Han, X. Zhang, M. Liu, J. Tian, Y. Zhang, J. Wang, X. Gao, T. Zhong, Y. Pan, S. Xu, Z. Wu, Z. Liu, X. Zhang, S. Zhang, X. Hu, T. Zhang, N. Qiang, T. Liu, and B. Ge, “Understanding LLMs: A comprehensive overview from training to inference,” Neurocomputing, vol. 620, p. 129190, 2025. [199] Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and benchmarking prompt injection attacks and defenses,” in 33rd USENIX Security Symp. (USENIX Security 24). Philadelphia, PA: USENIX Association, Aug. 2024, p. 1831–1847. [200] Z. Liu, T. Zhu, C. Tan, and W. Chen, “Learning to refuse: Towards mitigating privacy risks in LLMs,” arXiv preprint arXiv:2407.10058, 2024. [201] L. Lundblade, G. Mandyam, J. O’Donoghue, and C. Wallace, “The Entity Attestation Token (EAT),” RFC 9711, Apr. 2025. [Online]. Available: https://www.rfc-editor.org/info/rfc9711 [202] S. Luo, W. Shao, Y. Yao, J. Xu, M. Liu, Q. Li, B. He, M. Wang, G. Deng, H.
Hou, X. Zhang, and L. Song, “Privacy in LLM-based recommendation: Recent advances and future directions,” arXiv preprint arXiv:2406.01363, 2024. [203] P. Mai, R. Yan, Z. Huang, Y. Yang, and Y. Pang, “Split-and-denoise: Protect large language model inference with local differential privacy,” Proceedings of Machine Learning Research, vol. 235, p. 34 281–34 302, 2024. [Online]. Available: https://proceedings.mlr.press/v235/mai24a.html [204] H. Maia, C. Xiao, D. Li, E. Grinspun, and C. Zheng, “Can one hear the shape of a neural network?: Snooping the GPU via magnetic side channel,” in 31st USENIX Security Symp. (USENIX Security 22). USENIX Association, Aug. 2022, p. 4383–4400. [205] C. D. Maio, C. Cosci, M. Maggini, V. Poggioni, and S. Melacci, “Pirates of the RAG: Adaptively attacking LLMs to leak knowledge bases,” arXiv preprint arXiv:2412.18295, 2024. [206] E. Maor, “2025 Cato CTRL threat report,” Cato Networks, 2025, accessed: Aug. 01, 2025. [Online]. Available: https://www.catonetworks.com/resources/2025-cato-ctrl-threat-report-rise-of-zero-knowledge-threat-actor/ [207] J. Mattern, F. Mireshghallah, Z. Jin, B. Schölkopf, M. Sachan, and T. Berg-Kirkpatrick, “Membership inference attacks against language models via neighbourhood comparison,” Proceedings of the Annual Meeting of the Association for Computational Linguistics, p. 11 330–11 343, 2023. [Online]. Available: https://aclanthology.org/2023.findings-acl.719/ [208] M. Meeus, S. Jain, M. Rei, and Y.-A. de Montjoye, “Did the neurons read your book? Document-level membership inference for large language models,” in 33rd USENIX Security Symp. (USENIX Security 24). USA: USENIX Association, 2024, p. 2369–2385. [209] A. Mehrotra, M. Zampetakis, P. Kassianik, B. Nelson, H. Anderson, Y. Singer, and A. Karbasi, “Tree of attacks: Jailbreaking black-box LLMs automatically,” in 38th Annual Conf. on Neural Information Processing Systems (NeurIPS 2024), 2023.
[210] Microsoft, “Announcing Microsoft Copilot, your everyday AI companion,” 2023, accessed: Aug. 01, 2025. [Online]. Available: https://blogs.microsoft.com/blog/2023/09/21/announcing-microsoft-copilot-your-everyday-ai-companion/ [211] M. Miranda, E. S. Ruzzetti, A. Santilli, F. M. Zanzotto, S. Bratières, and E. Rodolà, “Preserving privacy in large language models: A survey on current threats and solutions,” Transactions on Machine Learning Research, 2025. [212] F. Mireshghallah, K. Goyal, A. Uniyal, T. Berg-Kirkpatrick, and R. Shokri, “Quantifying privacy risks of masked language models using membership inference attacks,” arXiv preprint arXiv:2203.03929, 2022. [213] L. Montulli and D. M. Kristol, “HTTP State Management Mechanism,” RFC 2109, Feb. 1997. [Online]. Available: https://www.rfc-editor.org/info/rfc2109 [214] J. X. Morris, V. Kuleshov, V. Shmatikov, and A. M. Rush, “Text embeddings reveal (almost) as much as text,” in Proc. 2023 Conf. on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2023, p. 12 448–12 460. [215] M. Mozes, X. He, B. Kleinberg, and L. D. Griffin, “Use of LLMs for illicit purposes: Threats, prevention measures, and vulnerabilities,” arXiv preprint arXiv:2308.12833, 2023. [216] O. Muliarevych, “Enhancing system security: LLM-driven defense against prompt injection vulnerabilities,” in 2024 IEEE 17th Int. Conf. on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET), 10 2024, p. 420–423. [217] N. Nazari, H. M. Makrani, C. Fang, H. Sayadi, S. Rafatirad, K. N. Khasawneh, and H. Homayoun, “Forget and rewire: Enhancing the resilience of transformer-based models against bit-flip attacks,” in 33rd USENIX Security Symp. (USENIX Security 24). Philadelphia, PA: USENIX Association, Aug. 2024, p. 1349–1366. [218] N. Nazari, F. Xiang, C. Fang, H. M. Makrani, A. Puri, K. Patwari, H. Sayadi, S. Rafatirad, C.-N. Chuah, and H.
Homayoun, “LLM-FIN: Large language models fingerprinting attack on edge devices,” in 2024 25th Int. Symp. on Quality Electronic Design (ISQED), 2024, p. 1–6. [219] S. Neel and P. W. Chang, “Privacy issues in large language models: A survey,” arXiv preprint arXiv:2312.06717, 2023. [220] P. G. Neumann and U. Lindqvist, “The future of misuse detection,” Communications of the ACM, vol. 67, no. 11, p. 27–28, 2024. [221] Department of Homeland Security, “Roles and responsibilities framework for artificial intelligence in critical infrastructure,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://www.dhs.gov/publication/roles-and-responsibilities-framework-artificial-intelligence-critical-infrastructure [222] R. Onitza-Klugman, “In localhost we trust: Exploring vulnerabilities in cortex.cpp, Jan’s AI engine,” 2025, accessed: Aug. 01, 2025. [Online]. Available: https://snyk.io/articles/in-localhost-we-trust-exploring-vulnerabilities-in-cortex-cpp-jans-ai-engine/ [223] OpenAI, “Introducing ChatGPT,” 2022, accessed: Aug. 01, 2025. [Online]. Available: https://openai.com/index/chatgpt/ [224] ——, “ChatGPT plugins,” 2023, accessed: Aug. 01, 2025. [Online]. Available: https://openai.com/index/chatgpt-plugins/ [225] ——, “March 20 ChatGPT outage: Here’s what happened,” 2023, accessed: Aug. 01, 2025. [Online]. Available: https://openai.com/index/march-20-chatgpt-outage/ [226] ——, “Hello GPT-4o,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://openai.com/index/hello-gpt-4o/ [227] OpenAI, “Sharing feedback, evals, and API data with OpenAI,” 2025, accessed: Aug. 01, 2025. [Online]. Available: https://help.openai.com/en/articles/10306912-sharing-feedback-evals-and-api-data-with-openai [228] A. Oprea and A. Vassilev, “Adversarial machine learning: A taxonomy and terminology of attacks and mitigations,” National Institute of Standards and Technology (NIST), 2023, accessed: Aug. 01, 2025. [Online].
Available: https://csrc.nist.gov/pubs/ai/100/2/e2023/final [229] OWASP, “OWASP cheat sheet series,” accessed: Aug. 01, 2025. [Online]. Available: https://cheatsheetseries.owasp.org/ [230] OWASP, “OWASP top 10 for LLM applications,” Oct. 2023, accessed: Aug. 01, 2025. [Online]. Available: https://owasp.org/www-project-top-10-for-large-language-model-applications/ [231] ——, “LLM03:2025 supply chain,” 2025, accessed: Aug. 01, 2025. [Online]. Available: https://genai.owasp.org/llmrisk/llm032025-supply-chain/ [232] OWASP, “LLM04:2025 data and model poisoning,” 2025, accessed: Aug. 01, 2025. [Online]. Available: https://genai.owasp.org/llmrisk/llm042025-data-and-model-poisoning/ [233] G. Pagnotta, D. Hitaj, B. Hitaj, F. Perez-Cruz, and L. V. Mancini, “TATTOOED: A robust deep neural network watermarking scheme based on spread-spectrum channel coding,” in 2024 Annual Computer Security Applications Conference (ACSAC). IEEE, 2024, p. 1245–1258. [234] S. Pahune and Z. Akhtar, “Transitioning from MLOps to LLMOps: Navigating the unique challenges of large language models,” Information, vol. 16, no. 2, 2025. [235] X. Pan, M. Zhang, S. Ji, and M. Yang, “Privacy risks of general-purpose language models,” in 2020 IEEE Symp. on Security and Privacy (SP), 5 2020, p. 1314–1331. [236] Y. Pan, L. Pan, W. Chen, P. Nakov, M. Y. Kan, and W. Y. Wang, “On the risk of misinformation pollution with large language models,” Findings of the Association for Computational Linguistics: EMNLP 2023, p. 1389–1403, 2023. [Online]. Available: https://aclanthology.org/2023.findings-emnlp.97/ [237] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, “Distillation as a defense to adversarial perturbations against deep neural networks,” in 2016 IEEE Symp. on Security and Privacy (SP). Los Alamitos, CA, USA: IEEE Computer Society, May 2016, p. 582–597. [238] J. Park, J. O’Brien, C. Cai, M. Morris, P. Liang, and M.
Bernstein, “Generative agents: Interactive simulacra of human behavior,” in 36th Annual ACM Symp. on User Interface Software and Technology, Oct. 2023, p. 1–22. [239] D. Pasquini, E. M. Kornaropoulos, and G. Ateniese, “Hacking back the AI-hacker: Prompt injection as a defense against LLM-driven cyberattacks,” arXiv preprint arXiv:2410.20911, 2024. [240] V. Patil, P. Hase, and M. Bansal, “Can sensitive information be deleted from LLMs? Objectives for defending against extraction attacks,” 12th International Conference on Learning Representations, ICLR 2024, 2024. [Online]. Available: https://openreview.net/forum?id=7erlRDoaV8 [241] R. Pedro, D. Castro, P. Carreira, and N. Santos, “From prompt injections to SQL injection attacks: How protected is your LLM-integrated web application?” arXiv preprint arXiv:2308.01990, 2023. [242] A. Peng, J. Michael, H. Sleight, E. Perez, and M. Sharma, “Rapid response: Mitigating LLM jailbreaks with a few examples,” arXiv preprint arXiv:2411.07494, 2024. [243] B. Peng, Z. Bi, Q. Niu, M. Liu, P. Feng, T. Wang, L. K. Q. Yan, Y. Wen, Y. Zhang, and C. H. Yin, “Jailbreaking and mitigation of vulnerabilities in large language models,” arXiv preprint arXiv:2410.15236, 2024. [244] E. Perez, S. Huang, F. Song, T. Cai, R. Ring, J. Aslanides, A. Glaese, N. McAleese, and G. Irving, “Red teaming language models with language models,” Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, p. 3419–3448, 2022. [245] F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models,” in 2022 NeurIPS Machine-Learning Safety Workshop (MLSW), 2022. [246] A. J. Peterson, “AI and the problem of knowledge collapse,” AI & SOCIETY, vol. 40, no. 5, p. 3249–3269, Jun 2025. [247] I. Petrov, D. I. Dimitrov, M. Baader, M. N. Müller, and M. Vechev, “DAGER: Exact gradient inversion for large language models,” in 38th Annual Conf. on Neural Information Processing Systems (NeurIPS) Poster, 2024.
[248] F. Qi, Y. Chen, M. Li, Y. Yao, Z. Liu, and M. Sun, “ONION: A simple and effective defense against textual backdoor attacks,” in Proc. 2021 Conf. on Empirical Methods in Natural Language Processing, 2021. [249] F. Qi, M. Li, Y. Chen, Z. Zhang, Z. Liu, Y. Wang, and M. Sun, “Hidden killer: Invisible textual backdoor attacks with syntactic trigger,” in Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, vol. 1, 2021. [250] B. Rababah, S. T. Wu, M. Kwiatkowski, C. K. Leung, and C. G. Akcora, “SoK: Prompt hacking of large language models,” in 2024 IEEE Int. Conf. on Big Data (BigData), 2024, p. 5392–5401. [251] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” Journal of Machine Learning Research, vol. 21, no. 140, p. 1–67, 2020. [252] J. Raghuram, G. Kesidis, and D. J. Miller, “A study of backdoors in instruction fine-tuned language models,” arXiv preprint arXiv:2406.07778, 2024. [253] T. Raheja, N. Pochhi, and F. D. C. M. Curie, “Recent advancements in LLM red-teaming: Techniques, defenses, and ethical considerations,” arXiv preprint arXiv:2410.09097, 2024. [254] A. S. Rakin, Z. He, and D. Fan, “Bit-flip attack: Crushing neural network with progressive bit search,” in 2019 IEEE/CVF Int. Conf. on Computer Vision (ICCV), 2019, p. 1211–1220. [255] A. Rao, S. Vashistha, A. Naik, S. Aditya, and M. Choudhury, “Tricking LLMs into disobedience: Formalizing, analyzing, and detecting jailbreaks,” in Proc. 2024 Joint Int. Conf. on Computational Linguistics, Language Resources and Evaluation (LREC-COLING). ELRA and ICCL, 2024, p. 16 802–16 830. [256] J. Ren, Y. Li, S. Zeng, H. Xu, L. Lyu, Y. Xing, and J. Tang, “Unveiling and mitigating memorization in text-to-image diffusion models through cross attention,” in 18th European Conf.
Computer Vision (ECCV 2024). Berlin, Heidelberg: Springer-Verlag, 2024, p. 340–356. [257] M. Rigaki and S. Garcia, “A survey of privacy attacks in machine learning,” ACM Computing Surveys, vol. 56, no. 4, p. 1–34, 2023. [258] R. Ross et al., “Security and privacy controls for information systems and organizations, SP 800-53 Rev5,” National Institute of Standards and Technology (NIST), 2020, accessed: Aug. 01, 2025. [Online]. Available: https://csrc.nist.gov/pubs/sp/800/53/r5/upd1/final [259] A. Rrv, N. Tyagi, N. Uddin, N. Varshney, and C. Baral, “Semantic membership inference attack against large language models,” arXiv preprint arXiv:2406.10218, 2024. [260] M. Russinovich, “Mitigating Skeleton Key, a new type of generative AI jailbreak technique,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://www.microsoft.com/en-us/security/blog/2024/06/26/mitigating-skeleton-key-a-new-type-of-generative-ai-jailbreak-technique/ [261] M. Russinovich and A. Salem, “Jailbreaking is (mostly) simpler than you think,” arXiv preprint arXiv:2503.05264, 2025. [262] M. Russinovich, A. Salem, and R. Eldan, “Great, now write an article about that: The crescendo multi-turn LLM jailbreak attack,” arXiv preprint arXiv:2404.01833, 2024. [263] R. Samoilenko, “New prompt injection attack on chatGPT web version. Reckless copy-pasting may lead to serious privacy issues in your chat,” 2023, accessed: Aug. 01, 2025. [Online]. Available: https://kajojify.github.io/articles/1_chatgptattack.pdf [264] T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,” Advances in Neural Information Processing Systems, vol. 36, p. 68 539–68 551, 2023. [265] B. Schneier, “Security in the cloud,” 2006, accessed: Aug. 01, 2025. [Online]. Available: https://www.schneier.com/blog/archives/2006/02/security_in_the.html [266] S. Schulhoff, J. Pinto, A. Khan, L.-F. Bouchard, C. Si, S.
Anati, V. Tagliabue, A. Kost, C. Carnahan, and J. Boyd-Graber, “Ignore this title and HackAPrompt: Exposing systemic vulnerabilities of LLMs through a global prompt hacking competition,” in Proc. 2023 Conf. on Empirical Methods in Natural Language Processing (EMNLP). Singapore: Association for Computational Linguistics, Dec. 2023, p. 4945–4977. [267] D. A. Shafiq, N. Jhanjhi, and A. Abdullah, “Load balancing techniques in cloud computing environment: A review,” Journal of King Saud University - Computer and Information Sciences, vol. 34, no. 7, p. 3910–3933, 2022. [268] M. Shanahan, K. McDonell, and L. Reynolds, “Role play with large language models,” Nature, vol. 623, no. 7987, p. 493–498, 2023. [269] S. Shang, Z. Yao, Y. Yao, L. Su, Z. Fan, X. Zhang, and Z. Jiang, “IntentObfuscator: A jailbreaking method via confusing LLM with prompts,” in 29th European Symp. on Research in Computer Security (ESORICS 2024). Berlin, Heidelberg: Springer-Verlag, 2024, p. 146–165. [270] X. Shen, Z. Chen, M. Backes, Y. Shen, and Y. Zhang, “Do anything now: Characterizing and evaluating in-the-wild jailbreak prompts on large language models,” in Proc. 2024 on ACM SIGSAC Conf. on Computer and Communications Security. New York, NY, USA: Association for Computing Machinery, 2024, p. 1671–1685. [271] X. Shen, Y. Wu, M. Backes, and Y. Zhang, “Voice jailbreak attacks against GPT-4o,” arXiv preprint arXiv:2405.19103, 2024. [272] M. Sherwood and J. Lee, “Large language models on-device with MediaPipe and TensorFlow Lite,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://developers.googleblog.com/en/large-language-models-on-device-with-mediapipe-and-tensorflow-lite/ [273] J. Shi, Z. Yuan, Y. Liu, Y. Huang, P. Zhou, L. Sun, and N. Z. Gong, “Optimization-based prompt injection attack to LLM-as-a-judge,” in Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, 2024, p. 660–674. [274] E. Shimony, “Jailbreak of OpenAI o3 models,” 2025, accessed: Aug.
01, 2025. [Online]. Available: https://www.linkedin.com/posts/eran-shimony_cybersecurity-aijailbreak-adversarialai-activity-7292857176161714177-iC2/ [275] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership inference attacks against machine learning models,” in 2017 IEEE Symp. on Security and Privacy (SP). Los Alamitos, CA, USA: IEEE Computer Society, May 2017, p. 3–18. [276] I. Shumailov, Y. Zhao, D. Bates, N. Papernot, R. Mullins, and R. Anderson, “Sponge examples: Energy-latency attacks on neural networks,” in 2021 IEEE European Symp. on Security and Privacy (EuroS&P), 2021, p. 212–231. [277] P. Slattery, A. K. Saeri, E. A. C. Grundy, J. Graham, M. Noetel, R. Uuk, J. Dao, S. Pour, S. Casper, and N. Thompson, “The AI risk repository: A comprehensive meta-review, database, and taxonomy of risks from artificial intelligence,” arXiv preprint arXiv:2408.12622, 2024. [278] L. Song, Z. Pang, W. Wang, Z. Wang, X. Wang, H. Chen, W. Song, Y. Jin, D. Meng, and R. Hou, “The early bird catches the leak: Unveiling timing side channels in LLM serving systems,” arXiv preprint arXiv:2409.20002, 2024. [279] R. Staab, M. Vero, M. Balunović, and M. Vechev, “Beyond memorization: Violating privacy via inference with large language models,” in 12th Int. Conf. on Learning Representations (ICLR) Spotlight, 2024. [280] J. Su, J. Kempe, K. Ullrich, and M. Ai, “Mission impossible: A statistical perspective on jailbreaking LLMs,” in 38th Annual Conf. on Neural Information Processing Systems (NeurIPS) Poster, 2024. [281] Y. Sun, L. Duan, and Y. Li, “PSY: Posterior sampling based privacy enhancer in large language models,” arXiv preprint arXiv:2410.18824, 2024. [282] Z. Sun and A. V. Miceli-Barone, “Scaling behavior of machine translation with large language models under prompt injection attacks,” in Proc. 1st Workshop on the Scaling Behavior of Large Language Models (SCALE-LLM 2024). St. Julian’s, Malta: Association for Computational Linguistics, Mar. 2024, p. 9–23.
[283] X. Suo, “Signed-Prompt: A new approach to prevent prompt injection attacks against LLM-integrated applications,” AIP Conference Proceedings, vol. 3194, no. 1, p. 40013, 12 2024. [284] A. Suri and D. Evans, “Formalizing and estimating distribution inference risks,” in Proc. Privacy Enhancing Technologies (PoPETS), vol. 4, 2022, p. 528–551. [285] X. Tan, H. Luan, M. Luo, X. Sun, P. Chen, and J. Dai, “Knowledge database or poison base? Detecting RAG poisoning attack through LLM activations,” arXiv preprint arXiv:2411.18948, 2024. [286] Y. Tao, Y. Shen, H. Zhang, Y. Shen, L. Wang, C. Shi, and S. Du, “Robustness of large language models against adversarial attacks,” arXiv preprint arXiv:2412.17011, 2024. [287] S. B. Tete, “Threat modelling and risk analysis for large language model (LLM)-powered applications,” arXiv preprint arXiv:2406.11007, 2024. [288] The MITRE Corporation, “MITRE ATLAS,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://atlas.mitre.org/matrices/ATLAS [289] L. N. Tidjon and F. Khomh, “Threat assessment in machine learning based systems,” arXiv preprint arXiv:2207.00091, 2022. [290] L. Tinnel and U. Lindqvist, “Importance of cyber security analysis in the operational technology system lifecycle,” in International Conference on Critical Infrastructure Protection. Springer, 2022, p. 73–101. [291] A. Tomassi, “Data security and privacy concerns for generative AI platforms,” Politecnico di Torino, 2024, accessed: Aug. 01, 2025. [Online]. Available: https://webthesis.biblio.polito.it/33202/ [292] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “LLaMA: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023. [293] K. Traykov, “A framework for security testing of large language models,” in 2024 IEEE 12th Int. Conf. on Intelligent Systems (IS), 8 2024, p. 1–7.
[294] Trusted Computing Group, “TPM 2.0 library specification,” 2023, accessed: Aug. 01, 2025. [Online]. Available: https://trustedcomputinggroup.org/resource/tpm-library-specification/ [295] N. Varshney, P. Dolin, A. Seth, and C. Baral, “The art of defending: A systematic evaluation and analysis of LLM defense strategies on safety and over-defensiveness,” arXiv preprint arXiv:2401.00287, 2023. [296] Veriti Research, “CVE-2024-27564 actively exploited in the wild,” 2025, accessed: Aug. 01, 2025. [Online]. Available: https://veriti.ai/blog/veriti-research/cve-2024-27564-actively-exploited/ [297] A. Verma, S. Krishna, S. Gehrmann, M. Seshadri, A. Pradhan, T. Ault, L. Barrett, D. Rabinowitz, J. Doucette, and N. Phan, “Operationalizing a threat model for red-teaming large language models (LLMs),” arXiv preprint arXiv:2407.14937, 2024. [298] VirusTotal, “VirusTotal platform,” 2025, accessed: Aug. 01, 2025. [Online]. Available: https://www.virustotal.com/ [299] Voice LLM Bot, “Speak to Ollama LLMs in any language,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://github.com/iamaziz/llm-voice-bot [300] J. Wang, Z. Zhang, M. Wang, H. Qiu, T. Zhang, Q. Li, Z. Li, T. Wei, and C. Zhang, “Aegis: Mitigating targeted bit-flip attacks against deep neural networks,” in 32nd USENIX Security Symp. (USENIX Security 23). Anaheim, CA: USENIX Association, Aug. 2023, p. 2329–2346. [301] J. Wang, S. Ahn, T. Dalal, X. Zhang, W. Pan, Q. Zhang, B. Chen, H. H. Dodge, F. Wang, and J. Zhou, “Defensive prompt patch: A robust and interpretable defense of LLMs against jailbreak attacks,” arXiv preprint arXiv:2405.20099, 2024. [302] S. Wang, P. S. Yu, T. Zhu, B. Liu, M. Ding, X. Guo, D. Ye, and W. Zhou, “Unique security and privacy threats of large language model: A comprehensive survey,” arXiv preprint arXiv:2406.07973, 2024. [303] S. Wang, Z. Long, Z. Fan, and Z. Wei, “From LLMs to MLLMs: Exploring the landscape of multimodal jailbreaking,” arXiv preprint arXiv:2406.14859, 2024.
[304] X. Wang, D. Wu, Z. Ji, Z. Li, P. Ma, S. Wang, Y. Li, Y. Liu, N. Liu, and J. Rahmel, “Adversarial tuning: Defending against jailbreak attacks for LLMs,” arXiv preprint arXiv:2406.06622, 2024. [305] Y. Wang, D. Xue, S. Zhang, and S. Qian, “BadAgent: Inserting and activating backdoor attacks in LLM agents,” in Proc. 62nd Annual Meeting of the Association for Computational Linguistics, vol. 1, 2024. [306] Z. Wang, J. Liu, S. Zhang, and Y. Yang, “Poisoned LangChain: Jailbreak LLMs by LangChain,” arXiv preprint arXiv:2406.18122, vol. 1, 2024. [307] I. Weber, “Large language models as software components: A taxonomy for LLM-integrated applications,” arXiv preprint arXiv:2406.10300, 2024. [308] A. Wei, N. Haghtalab, and J. Steinhardt, “Jailbroken: How does LLM safety training fail?” Advances in Neural Information Processing Systems, vol. 36, 2024. [309] R. Weiss, D. Ayzenshteyn, and Y. Mirsky, “What was your prompt? A remote keylogging attack on AI assistants,” in 33rd USENIX Security Symposium (USENIX Security 24). Philadelphia, PA: USENIX Association, Aug. 2024, p. 3367–3384. [310] H. Wen, Y. Li, G. Liu, S. Zhao, T. Yu, T. J.-J. Li, S. Jiang, Y. Liu, Y. Zhang, and Y. Liu, “AutoDroid: LLM-powered task automation in Android,” in Proc. 30th Annual Int. Conf. on Mobile Computing and Networking (MobiCom). New York, NY, USA: Association for Computing Machinery, 2024, p. 543–557. [311] J. Wen, Z. Zhang, Y. Lan, Z. Cui, J. Cai, and W. Zhang, “A survey on federated learning: challenges and applications,” International Journal of Machine Learning and Cybernetics, vol. 14, no. 2, p. 513–535, 2023. [312] E. Wickens, K. Schulz, and T. Bonner, “ShadowLogic,” 2024, accessed: Aug. 01, 2025. [Online]. Available: https://hiddenlayer.com/innovation-hub/shadowlogic/ [313] ——, “ShadowLogic attack targets AI model graphs to create codeless backdoors,” 2024, accessed: Aug. 01, 2025. [Online].
Available: https://www.securityweek.com/shadowlogic-attack-targets-ai-model-graphs-to-create-codeless-backdoors/ [314] Z. Wu, H. Gao, J. He, and P. Wang, “The dark side of function calling: Pathways to jailbreaking large language models,” in Proc. 31st Int. Conf. on Computational Linguistics. Abu Dhabi, UAE: Association for Computational Linguistics, Jan. 2025, p. 584–592. [315] Z. Xi, T. Du, C. Li, R. Pang, S. Ji, J. Chen, F. Ma, and T. Wang, “Defending pre-trained language models as few-shot learners against backdoor attacks,” arXiv preprint arXiv:2309.13256, 2023. [316] H. Xu, W. Zhang, Z. Wang, F. Xiao, R. Zheng, Y. Feng, Z. Ba, and K. Ren, “RedAgent: Red teaming large language models with context-aware autonomous language agent,” arXiv preprint arXiv:2407.16667, 2024. [317] J. Xu, Z. Li, W. Chen, Q. Wang, X. Gao, Q. Cai, and Z. Ling, “On-device language models: A comprehensive review,” arXiv preprint arXiv:2409.00088, 2024. [318] X. Xu, K. Kong, N. Liu, L. Cui, D. Wang, J. Zhang, and M. Kankanhalli, “An LLM can fool itself: A prompt-based adversarial attack,” 12th International Conference on Learning Representations, ICLR 2024, 2024. [319] Z. Xu, Y. Liu, G. Deng, Y. Li, and S. Picek, “A comprehensive study of jailbreak attack versus defense for large language models,” in Findings of the Association for Computational Linguistics ACL 2024, Jan. 2024, p. 7432–7449. [320] Z. Xu, Y. Liu, G. Deng, K. Wang, Y. Li, L. Shi, and S. Picek, “Continuous embedding attacks via clipped inputs in jailbreaking large language models,” arXiv preprint arXiv:2407.13796, 2024. [321] J. Xue, M. Zheng, T. Hua, Y. Shen, Y. Liu, L. Bölöni, and Q. Lou, “Explore, establish, exploit: Red teaming language models from scratch,” arXiv preprint arXiv:2306.09442, 2023. [322] F. Yaman, Agent SCA: Advanced Physical Side Channel Analysis Agent with LLMs. North Carolina State University, 2023. [323] B. Yan, K. Li, M. Xu, Y. Dong, Y. Zhang, Z. Ren, and X.
Cheng, “On protecting the data privacy of large language models (LLMs) and LLM agents: A literature review,” High-Confidence Computing, p. 100300, 2025. [324] L. Yan, S. Cheng, X. Chen, K. Zhang, G. Shen, Z. Zhang, and X. Zhang, “FlipAttack: Jailbreak LLMs via flipping,” arXiv preprint arXiv:2410.02832, 2024. [325] W. Yang, X. Bi, Y. Lin, S. Chen, J. Zhou, and X. Sun, “Watch out for your agents! Investigating backdoor threats to LLM-based agents,” arXiv preprint arXiv:2402.11208, 2024. [326] Z. Yang, M. Backes, Y. Zhang, and A. Salem, “SOS! Soft prompt attack against open-source large language models,” arXiv preprint arXiv:2407.03160, 2024. [327] D. Yao, J. Zhang, I. G. Harris, and M. Carlsson, “FuzzLLM: A novel and universal fuzzing framework for proactively discovering jailbreak vulnerabilities in large language models,” in 2024 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2024, p. 4485–4489. [328] F. Yao, A. S. Rakin, and D. Fan, “DeepHammer: Depleting the intelligence of deep neural networks through targeted chain of bit flips,” in 29th USENIX Security Symp. (USENIX Security 20). USENIX Association, Aug. 2020, p. 1463–1480. [329] H. Yao, J. Lou, and Z. Qin, “PoisonPrompt: Backdoor attack on prompt-based large language models,” in 2024 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 4 2024, p. 7745–7749. [330] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “ReAct: Synergizing reasoning and acting in language models,” arXiv preprint arXiv:2210.03629, 2022. [331] Y. Yao, J. Duan, K. Xu, Y. Cai, Z. Sun, and Y. Zhang, “A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly,” High-Confidence Computing, vol. 4, no. 2, p. 100211, 2024. [332] J. Yi, Y. Xie, H. Kong, B. Zhu, E. Kiciman, G. Sun, and F.
Wu, “Benchmarking and defending against indirect prompt injection attacks on large language models,” Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 (KDD ’25), August 3–7, 2025, Toronto, ON, Canada, vol. 1, 12 2023. [Online]. Available: http://arxiv.org/abs/2312.14197 [333] S. Yi, Y. Liu, Z. Sun, T. Cong, X. He, J. Song, K. Xu, and Q. Li, “Jailbreak attacks and defenses against large language models: A survey,” arXiv preprint arXiv:2407.04295, 2024. [334] W. Yin, M. Xu, Y. Li, and X. Liu, “LLM as a system service on mobile devices,” arXiv preprint arXiv:2403.11805, 2024. [335] Z.-X. Yong, C. Menghini, and S. H. Bach, “Low-resource languages jailbreak GPT-4,” arXiv preprint arXiv:2310.02446, 2023. [336] J. Yu, X. Lin, Z. Yu, and X. Xing, “GPTFUZZER: Red teaming large language models with auto-generated jailbreak prompts,” arXiv preprint arXiv:2309.10253, 2023. [337] L. Yu, S. Cheng, H. Yuan, P. Wang, Z. Huang, J. Zhang, C. Shen, F. Zhang, L. Yang, and J. Ma, “LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs,” arXiv preprint arXiv:2411.08862, 2024. [338] Z. Zhan, Z. Zhang, S. Liang, F. Yao, and X. Koutsoukos, “Graphics peeping unit: Exploiting EM side-channel information of GPUs to eavesdrop on your neighbors,” in 2022 IEEE Symp. on Security and Privacy (SP), 2022, p. 1440–1457. [339] C. Zhang, M. Jin, D. Shu, T. Wang, D. Liu, and X. Jin, “Target-driven attack for large language models,” Frontiers in Artificial Intelligence and Applications, vol. 392, p. 1752–1759, 10 2024. [Online]. Available: http://dx.doi.org/10.3233/FAIA240685 [340] G. Zhang, C. Yan, X. Ji, T. Zhang, T. Zhang, and W. Xu, “DolphinAttack: Inaudible voice commands,” in Proc. 2017 ACM SIGSAC Conf. on Computer and Communications Security. New York, NY, USA: Association for Computing Machinery, 2017, p. 103–117. [341] J. Zhang, Z. Wang, R. Wang, X. Ma, and Y.-G.
Jiang, “EnJa: Ensemble jailbreak on large language models,” arXiv preprint arXiv:2408.03603, 2024. [342] J. Zhang, H. Bu, H. Wen, Y. Chen, L. Li, and H. Zhu, “When LLMs meet cybersecurity: A systematic literature review,” Cybersecurity, vol. 8, no. 55, Feb. 2025. [343] R. Zhang, H. Li, R. Wen, W. Jiang, Y. Zhang, M. Backes, Y. Shen, and Y. Zhang, “Instruction backdoor attacks against customized LLMs,” in 33rd USENIX Security Symp. (USENIX Security 24). USA: USENIX Association, 2024. [344] R. Zhang, N. Javidnia, N. Sheybani, and F. Koushanfar, ““Short-length” adversarial training helps LLMs defend “long-length” jailbreak attacks: Theoretical and empirical evidence,” arXiv preprint arXiv:2502.04204, 2025. [345] S. Zhang and H. Li, “Code membership inference for detecting unauthorized data use in code pre-trained language models,” arXiv preprint arXiv:2312.07200, 2023. [346] Y. Zhang, Q. Li, T. Du, X. Zhang, X. Zhao, Z. Feng, and J. Yin, “HijackRAG: Hijacking attacks against retrieval-augmented large language models,” arXiv preprint arXiv:2410.22832, 2024. [347] J. Zhao, K. Chen, W. Zhang, and N. Yu, “SQL injection jailbreak: A structural disaster of large language models,” arXiv preprint arXiv:2411.01565, 2024. [348] S. Zhao, L. Gan, Z. Guo, X. Wu, L. Xiao, X. Xu, C.-D. Nguyen, and L. A. Tuan, “Weak-to-strong backdoor attack for large language models,” arXiv preprint arXiv:2409.17946, 2024. [349] S. Zhao, M. Jia, L. A. Tuan, F. Pan, and J. Wen, “Universal vulnerabilities in large language models: In-context learning backdoor attacks,” arXiv preprint arXiv:2401.05949, 2024. [350] S. Zhao, L. A. Tuan, J. Fu, J. Wen, and W. Luo, “Exploring clean label backdoor attacks and defense in language models,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, p. 3014–3024, 2 2024. [351] S. Zhao, J. Wen, L. A. Tuan, J. Zhao, and J.
Fu, “Prompt as triggers for backdoor attack: Examining the vulnerability in language models,” in Proc. 2023 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2023), 2023, pp. 12303–12317. [Online]. Available: https://aclanthology.org/2023.emnlp-main.757/ [352] X. Zheng, H. Han, S. Shi, Q. Fang, Z. Du, Q. Guo, and X. Hu, “InputSnatch: Stealing input in LLM services via timing side-channel attacks,” arXiv preprint arXiv:2411.18191, 2024. [353] M. Zhou, X. Gao, J. Wu, K. Liu, H. Sun, and L. Li, “Investigating white-box attacks for on-device models,” in Proc. IEEE/ACM 46th Int. Conf. on Software Engineering. New York, NY, USA: Association for Computing Machinery, 2024. [354] A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson, “Universal and transferable adversarial attacks on aligned language models,” arXiv preprint arXiv:2307.15043, 2023. Vitor Hugo Galhardo Moia has been a cybersecurity researcher and consultant at Instituto de Pesquisas Eldorado (Brazil) since 2022. He received his M.Sc. (2016) and Ph.D. (2020) degrees in Computer Engineering from the School of Electrical and Computer Engineering (FEEC), University of Campinas (UNICAMP), Brazil. During his master’s studies, he focused on the security and privacy of cloud data storage; his Ph.D. research centered on digital forensics. He also worked for about three years at Samsung R&D Institute as a security researcher, where he led projects and conducted applied research in mobile and network security. Since 2021, he has been part of the SBSeg program committee. His research interests include red teaming activities, secure software development practices, malware detection and analysis, and the application of artificial intelligence to security problems. Igor Jochem Sanz has been a researcher and technology consultant at Instituto de Pesquisas Eldorado since 2022, where he coordinates research groups and leads R&D projects.
His main specialties include software security, network security, and artificial intelligence. He also worked as a security researcher at Samsung R&D Institute Brazil (SRBR) between 2018 and 2022. He received a master’s degree in Electrical Engineering from the Federal University of Rio de Janeiro (COPPE/UFRJ) in 2018 and a bachelor’s degree in Electronic and Computer Engineering from the same institution in 2017, with a sandwich (exchange) period at Bangor University, UK. He has published scientific works, including two patents, on several topics in the computer and security fields, such as intrusion detection, malware analysis, wireless security, mobile computing, machine learning applied to security, network function virtualization, software-defined networks, cryptography, and blockchain. Gabriel Antonio Fontes Rebello is a cybersecurity researcher and consultant at Instituto de Pesquisas Eldorado (Brazil). He earned a Ph.D. degree in Computer Science from Sorbonne Université in 2023, a master’s degree in Electrical Engineering from the Universidade Federal do Rio de Janeiro (UFRJ) in 2019, and a cum laude B.Eng. degree in Computer Engineering from UFRJ in 2019. His areas of expertise lie in AI for cybersecurity, cloud security, network security, and blockchains, and he has published several papers in IEEE conferences and journals such as INFOCOM, COMST, and TNSM, among others. Rodrigo Duarte de Meneses has been a cybersecurity intern at Instituto de Pesquisas Eldorado (Brazil) since 2024. He is currently an undergraduate student in Electrical and Electronics Engineering at the School of Electrical and Computer Engineering (FEEC) of the University of Campinas (UNICAMP).
To date, he has published several papers at the Brazilian Symposium on Information Security and Computer Systems (SBSeg) and the International Symposium on the Internet of Things (SIoT), spanning topics such as post-quantum cryptography, zero-knowledge proofs, homomorphic encryption, and embedded systems security. Briland Hitaj is an Advanced Computer Scientist at SRI’s Computer Science Laboratory, bringing extensive expertise in the security and privacy of machine learning models, with a particular focus on applications of generative models as AI red-teaming mechanisms. Dr. Hitaj has studied the privacy implications of collaborative (federated) learning, resulting in the demonstration of the first data reconstruction attack in the field based on generative adversarial networks (GANs). More recently, Dr. Hitaj’s research has focused on the security and privacy risks of machine learning models hosted on third-party, often unvetted repositories. Dr. Hitaj has demonstrated that an adversary can embed malicious payloads within the weights of a neural network without hindering the model’s performance, paving the way to a new class of attacks on end-user devices and infrastructure. His work also spans other critical topics in AI security, such as adversarial samples, covert communication on top of federated learning, deep neural network watermarking, and password security. Dr. Hitaj received his Ph.D. in Computer Science from Sapienza University of Rome in 2018. Ulf Lindqvist is a Senior Technical Director at SRI’s Computer Science Laboratory, where he manages research and development programs. Dr. Lindqvist established and leads SRI’s program in infrastructure security research, which focuses on cybersecurity for critical infrastructure systems, including specialized systems in the Internet of Things, electric power, oil and gas, telecommunications, finance, automotive, aviation, and space sectors.
His expertise and interests center on cybersecurity, infrastructure systems, intrusion detection in computer systems, and security for systems that interact with the physical world. He has more than 40 publications in the computer security area, many of which bridge the gap between theoretical and applied research, and he holds several patents. He served as the 2016–2017 Chair of the IEEE Computer Society’s Technical Committee on Security and Privacy and as the Vice Chair of the IEEE Cybersecurity Initiative. He holds a Ph.D. in computer engineering from Chalmers University of Technology in Sweden and was named an SRI Fellow in 2016.