Paper deep dive
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Zhibo Sun, Yue Zhang
Models: Bard, ChatGPT, GPT-4, LLaMA
Intelligence
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 95%
Last extracted: 3/12/2026, 8:22:10 PM
Summary
This paper provides a comprehensive survey of the intersection between Large Language Models (LLMs) and cybersecurity, categorizing research into beneficial applications ('The Good'), offensive applications ('The Bad'), and inherent vulnerabilities/defenses ('The Ugly'). It highlights that while LLMs significantly enhance code security and data privacy, they also introduce new attack vectors, particularly at the user level, and face unique vulnerabilities like prompt injection and model extraction.
Entities (6)
Relation Signals (3)
Prompt Injection → affects → Large Language Models
confidence 95% · Non-AI Model Inherent Vulnerabilities (e.g., remote code execution, prompt injection, side channels).
Large Language Models → enhances → Code Security
confidence 95% · LLMs have proven to enhance code security (code vulnerability detection)
Large Language Models → facilitates → User-level attacks
confidence 90% · they can also be harnessed for various attacks (particularly user-level attacks)
Cypher Suggestions (2)
Find all security-related threats associated with LLMs · confidence 90% · unvalidated
MATCH (t:Threat)-[:AFFECTS]->(m:Model {name: 'Large Language Models'}) RETURN t.name
List all beneficial applications of LLMs in security · confidence 85% · unvalidated
MATCH (m:Model)-[:ENHANCES]->(d:Domain) RETURN m.name, d.name
Abstract
Abstract: Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized natural language understanding and generation. They possess deep language comprehension, human-like text generation capabilities, contextual awareness, and robust problem-solving skills, making them invaluable in various domains (e.g., search engines, customer support, translation). In the meantime, LLMs have also gained traction in the security community, revealing security vulnerabilities and showcasing their potential in security-related tasks. This paper explores the intersection of LLMs with security and privacy. Specifically, we investigate how LLMs positively impact security and privacy, potential risks and threats associated with their use, and inherent vulnerabilities within LLMs. Through a comprehensive literature review, the paper categorizes the papers into "The Good" (beneficial LLM applications), "The Bad" (offensive applications), and "The Ugly" (vulnerabilities of LLMs and their defenses). We have some interesting findings. For example, LLMs have proven to enhance code security (code vulnerability detection) and data privacy (data confidentiality protection), outperforming traditional methods. However, they can also be harnessed for various attacks (particularly user-level attacks) due to their human-like reasoning abilities. We have identified areas that require further research efforts. For example, research on model and parameter extraction attacks is limited and often theoretical, hindered by LLM parameter scale and confidentiality. Safe instruction tuning, a recent development, requires more exploration. We hope that our work can shed light on the LLMs' potential to both bolster and jeopardize cybersecurity.
Tags
Links
- Source: https://arxiv.org/abs/2312.02003
- Canonical: https://arxiv.org/abs/2312.02003
Full Text
155,334 characters extracted from source content.
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Zhibo Sun and Yue Zhang
Drexel University, 3675 Market St., Philadelphia, PA, 19104, USA

Keywords: Large Language Model (LLM), LLM Security, LLM Privacy, ChatGPT, LLM Attacks, LLM Vulnerabilities

ABSTRACT
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized natural language understanding and generation. They possess deep language comprehension, human-like text generation capabilities, contextual awareness, and robust problem-solving skills, making them invaluable in various domains (e.g., search engines, customer support, translation). In the meantime, LLMs have also gained traction in the security community, revealing security vulnerabilities and showcasing their potential in security-related tasks. This paper explores the intersection of LLMs with security and privacy. Specifically, we investigate how LLMs positively impact security and privacy, potential risks and threats associated with their use, and inherent vulnerabilities within LLMs. Through a comprehensive literature review, the paper categorizes the papers into “The Good” (beneficial LLM applications), “The Bad” (offensive applications), and “The Ugly” (vulnerabilities of LLMs and their defenses). We have some interesting findings. For example, LLMs have proven to enhance code security (code vulnerability detection) and data privacy (data confidentiality protection), outperforming traditional methods. However, they can also be harnessed for various attacks (particularly user-level attacks) due to their human-like reasoning abilities. We have identified areas that require further research efforts. For example, research on model and parameter extraction attacks is limited and often theoretical, hindered by LLM parameter scale and confidentiality.
Safe instruction tuning, a recent development, requires more exploration. We hope that our work can shed light on the LLMs’ potential to both bolster and jeopardize cybersecurity.

1. Introduction

A large language model is a language model with massive parameters that undergoes pretraining tasks (e.g., masked language modeling and autoregressive prediction) to understand and process human language, by modeling the contextualized text semantics and probabilities from large amounts of text data. A capable LLM should have four key features [323]: (i) profound comprehension of natural language context; (ii) the ability to generate human-like text; (iii) contextual awareness, especially in knowledge-intensive domains; (iv) strong instruction-following ability, which is useful for problem-solving and decision-making. A number of LLMs were developed and released in 2023, gaining significant popularity. Notable examples include OpenAI’s ChatGPT [203], Meta AI’s LLaMA [4], and Databricks’ Dolly 2.0 [50]. For instance, ChatGPT alone boasts a user base of over 180 million [69]. LLMs now offer a wide range of versatile applications across various domains. Specifically, they not only provide technical support to domains directly related to language processing (e.g., search engines [352, 13], customer support [259], translation [327, 138]) but also find utility in more general scenarios such as code generation [118], healthcare [274], finance [310], and education [186]. This showcases their adaptability and potential to streamline language-related tasks across diverse industries and contexts.

(Contact: y566@drexel.edu (Y. Yao); jd3734@drexel.edu (J. Duan); kx46@drexel.edu (K. Xu); yfcai@cs.drexel.edu (Y. Cai); zs384@drexel.edu (Z. Sun); yz899@drexel.edu (Y. Zhang))

LLMs are gaining popularity within the security community.
As of February 2023, a research study reported that GPT-3 uncovered 213 security vulnerabilities (only 4 turned out to be false positives) [141] in a code repository. In contrast, one of the leading commercial tools in the market detected only 99 vulnerabilities. More recently, several LLM-powered security papers have emerged in prestigious conferences. For instance, in IEEE S&P 2023, Hammond Pearce et al. [211] conducted a comprehensive investigation employing various commercially available LLMs, evaluating them across synthetic, hand-crafted, and real-world security bug scenarios. The results are promising, as LLMs successfully addressed all synthetic and hand-crafted scenarios. In NDSS 2024, a tool named Fuzz4All [313] showcased the use of LLMs for input generation and mutation, accompanied by an innovative autoprompting technique and fuzzing loop. These remarkable initial attempts prompt us to delve into three crucial security-related research questions:

•RQ1. How do LLMs make a positive impact on security and privacy across diverse domains, and what advantages do they offer to the security community?
•RQ2. What potential risks and threats emerge from the utilization of LLMs within the realm of cybersecurity?
•RQ3. What vulnerabilities and weaknesses exist within LLMs, and how can we defend against those threats?

Findings. To comprehensively address these questions, we conducted a meticulous literature review and assembled a collection of 281 papers pertaining to the intersection of LLMs with security and privacy.

[Yifan Yao et al.: Preprint submitted to Elsevier. arXiv:2312.02003v3 [cs.CR] 20 Mar 2024]
We categorized these papers into three distinct groups: those highlighting security-beneficial applications (i.e., the good), those exploring applications that could potentially exert adverse impacts on security (i.e., the bad), and those focusing on the discussion of security vulnerabilities (alongside potential defense mechanisms) within LLMs (i.e., the ugly). To be more specific:

•The Good (§4): LLMs have a predominantly positive impact on the security community, as indicated by the most significant number of papers dedicated to enhancing security. Specifically, LLMs have made contributions to both code security and data security and privacy. In the context of code security, LLMs have been used for the whole life cycle of the code (e.g., secure coding, test case generation, vulnerable code detection, malicious code detection, and code fixing). In data security and privacy, LLMs have been applied to ensure data integrity, data confidentiality, data reliability, and data traceability. Meanwhile, compared to state-of-the-art methods, most researchers found LLM-based methods to outperform traditional approaches.

•The Bad (§5): LLMs also have offensive applications against security and privacy. We categorized the attacks into five groups: hardware-level attacks (e.g., side-channel attacks), OS-level attacks (e.g., analyzing information from operating systems), software-level attacks (e.g., creating malware), network-level attacks (e.g., network phishing), and user-level attacks (e.g., misinformation, social engineering, scientific misconduct). User-level attacks, with 32 papers, are the most prevalent due to LLMs’ human-like reasoning abilities. Those attacks threaten both security (e.g., malware attacks) and privacy (e.g., social engineering). Nowadays, LLMs lack direct access to OS- and hardware-level functions; the potential threats of LLMs could escalate if they gain such access.
•The Ugly (§6): We explore the vulnerabilities and defenses in LLMs, categorizing vulnerabilities into two main groups: AI Model Inherent Vulnerabilities (e.g., data poisoning, backdoor attacks, training data extraction) and Non-AI Model Inherent Vulnerabilities (e.g., remote code execution, prompt injection, side channels). These attacks pose a dual threat, encompassing both security concerns (e.g., remote code execution attacks) and privacy issues (e.g., data extraction). Defenses for LLMs are divided into strategies placed in the architecture and those applied during the training and inference phases. Training-phase defenses involve corpora cleaning and optimization methods, while inference-phase defenses include instruction pre-processing, malicious detection, and generation post-processing. These defenses collectively aim to enhance the security, robustness, and ethical alignment of LLMs. We found that model extraction, parameter extraction, and similar attacks have received limited research attention, remaining primarily theoretical with minimal practical exploration. The vast scale of LLM parameters makes traditional approaches less effective, and the confidentiality of powerful LLMs further shields them from conventional attacks. Strict censorship of LLM outputs challenges even black-box ML attacks. Meanwhile, research on the impact of model architecture on LLM safety is scarce, partly due to high computational costs. Safe instruction tuning, a recent development, requires further investigation.

Contributions. Our work makes a dual contribution. First, we are pioneers in summarizing the role of LLMs in security and privacy. We delve deeply into the positive impacts of LLMs on security, their potential risks and threats, vulnerabilities in LLMs, and the corresponding defense mechanisms. Other surveys may focus on one or two specific aspects, such as beneficial applications, offensive applications, vulnerabilities, or defenses.
To the best of our knowledge, our survey is the first to cover all three of these key aspects of LLM security and privacy. Second, we have made several interesting discoveries. For instance, our research reveals that LLMs contribute more positively than negatively to security and privacy. Moreover, we observe that most researchers concur that LLMs outperform state-of-the-art methods when employed for securing code or data. Concurrently, it becomes evident that user-level attacks are the most prevalent, largely owing to the human-like reasoning abilities exhibited by LLMs.

Roadmap. The rest of the paper is organized as follows. We begin with a brief introduction to LLMs in §2. §3 presents the overview of our work. In §4, we explore the beneficial impacts of employing LLMs. §5 discusses the negative impacts on security and privacy. In §6, we discuss the prevalent threats and vulnerabilities associated with LLMs as well as the countermeasures to mitigate these risks. §7 discusses LLMs in other security-related topics and possible directions. We conclude the paper in §9.

2. Background

2.1. Large Language Models (LLMs)

Large Language Models (LLMs) [347] represent an evolution from language models. Initially, language models were statistical in nature and laid the groundwork for computational linguistics. The advent of transformers has significantly increased their scale. This expansion, along with the use of extensive training corpora and advanced pre-training techniques, is pivotal in areas such as AI for science, logical reasoning, and embodied AI. These models undergo extensive training on vast datasets to comprehend and produce text that closely mimics human language.
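The pretraining objectives mentioned above can be made concrete with a deliberately tiny sketch. The bigram counter below is an illustrative stand-in for the autoregressive objective (predict token t+1 from the tokens before it); nothing here comes from the paper, and a real LLM replaces the counting with a transformer over billions of parameters.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count next-token frequencies -- a minimal stand-in for the
    autoregressive objective (predict token t+1 from tokens <= t)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_len=5):
    """Greedy decoding: always pick the most frequent continuation."""
    out = [start]
    for _ in range(max_len - 1):
        if out[-1] not in counts:
            break
        out.append(counts[out[-1]].most_common(1)[0][0])
    return " ".join(out)

corpus = ["the model generates text", "the model predicts the next token"]
model = train_bigram(corpus)
print(generate(model, "the"))  # -> "the model generates text"
```

The same predict-the-next-token loop, scaled up in data and parameters, is what gives LLMs the text-generation ability the survey describes.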
Typically, LLMs are endowed with hundreds of billions, or even more, parameters, honed through the processing of massive textual data. They have spearheaded substantial advancements in the realm of Natural Language Processing (NLP) [82] and find applications in a multitude of fields (e.g., risk assessment [202], programming [26], vulnerability detection [118], medical text analysis [274], and search engine optimization [13]). Based on Yang’s study [323], an LLM should have at least four key features. First, an LLM should demonstrate a deep understanding and interpretation of natural language text, enabling it to extract information and perform various language-related tasks (e.g., translation). Second, it should have the capacity to generate human-like text (e.g., completing sentences, composing paragraphs, and even writing articles) when prompted. Third, LLMs should exhibit contextual awareness by considering factors such as domain expertise, a quality referred to as being “knowledge-intensive”. Fourth, these models should excel in problem-solving and decision-making, leveraging information within text passages to make them invaluable for tasks such as information retrieval and question-answering systems.

2.2. Comparison of Popular LLMs

As shown in Table 1 [276, 235], there is a diversity of providers for language models, including industry leaders such as OpenAI, Google, Meta AI, and emerging players such as Anthropic and Cohere. The release dates span from 2018 to 2023, showcasing the rapid development and evolution of language models in recent years. Newer models such as “gpt-4” emerged in 2023, highlighting the ongoing innovation in this field.
While most of the models are not open-source, it is interesting to note that models like BERT, T5, PaLM, LLaMA, and CTRL are open-source, which can facilitate community-driven development and applications. Larger models tend to have more parameters, potentially indicating increased capabilities but also greater computational demands. For example, “PaLM” stands out with a massive 540 billion parameters. The “Tunability” column suggests whether these models can be fine-tuned for specific tasks. In other words, it is possible to take a large, pre-trained language model and adjust its parameters by training on a smaller, domain-specific dataset to make it perform better on a particular task. For instance, with tunability, one can fine-tune BERT on a dataset of movie reviews to make it highly effective at sentiment analysis.

Table 1: Comparison of Popular LLMs

| Model | Date | Provider | Open-Source | Params | Tunability |
|---|---|---|---|---|---|
| gpt-4 [64] | 2023.03 | OpenAI | ✗ | 1.7T | ✗ |
| gpt-3.5-turbo | 2021.09 | OpenAI | ✗ | 175B | ✗ |
| gpt-3 [24] | 2020.06 | OpenAI | ✗ | 175B | ✗ |
| cohere-medium [170] | 2022.07 | Cohere | ✗ | 6B | ✓ |
| cohere-large [170] | 2022.07 | Cohere | ✗ | 13B | ✓ |
| cohere-xlarge [170] | 2022.06 | Cohere | ✗ | 52B | ✓ |
| BERT [61] | 2018.08 | Google | ✓ | 340M | ✓ |
| T5 [225] | 2019 | Google | ✓ | 11B | ✓ |
| PaLM [198] | 2022.04 | Google | ✓ | 540B | ✓ |
| LLaMA [4] | 2023.02 | Meta AI | ✓ | 65B | ✓ |
| CTRL [229] | 2019 | Salesforce | ✓ | 1.6B | ✓ |
| Dolly 2.0 [50] | 2023.04 | Databricks | ✓ | 12B | ✓ |

3. Overview

3.1. Scope

Our paper endeavors to conduct a thorough literature review, with the objective of collating and scrutinizing existing research and studies about the realms of security and privacy in the context of LLMs. The effort is geared towards both establishing the current state of the art in this domain and pinpointing gaps in our collective knowledge.
While it is true that LLMs wield multifaceted applications extending beyond security considerations (e.g., social and financial impacts), our primary focus remains steadfastly on matters of security and privacy. Moreover, it is noteworthy that GPT models have attained significant prominence within this landscape. Consequently, when delving into specific content and examples, we aim to employ GPT models as illustrative benchmarks.

3.2. The Research Questions

LLMs have carried profound implications across diverse domains. However, it is essential to recognize that, as with any powerful technology, LLMs bear a significant responsibility. Our paper delves deeply into the multifaceted role of LLMs in the context of security and privacy. We intend to scrutinize their positive contributions to these domains, explore the potential threats they may engender, and uncover the vulnerabilities that could compromise their integrity. To accomplish this, our study will conduct a thorough literature review centered around three pivotal research questions:

•The Good (§4): How do LLMs positively contribute to security and privacy in various domains, and what are the potential benefits they bring to the security community?

•The Bad (§5): What are the potential risks and threats associated with the use of LLMs in the context of cybersecurity? Specifically, how can LLMs be used for malicious purposes, and what types of cyber attacks can be facilitated or amplified using LLMs?

•The Ugly (§6): What vulnerabilities and weaknesses exist within LLMs, and how do these vulnerabilities pose a threat to security and privacy?

Motivated by these questions, we conducted a search on Google Scholar and compiled papers related to security and privacy involving LLMs. As shown in Figure 1, we gathered a total of 83 “good” papers that highlight the positive contributions of LLMs to security and privacy.
Additionally, we identified 54 “bad” papers, in which attackers exploited LLMs to target users, and 144 “ugly” papers, in which authors discovered vulnerabilities within LLMs.

[Figure 1: An overview of our collected papers, by month (Good: 83, Bad: 54, Ugly: 144).]

Most of the papers were published in 2023, with only 82 of them released between 2007 and 2022. Notably, there is a consistent upward trend in the number of papers released each month, with October reaching its peak, boasting the highest number of papers published (38 papers in total, accounting for 15.97% of all the collected papers). It is conceivable that more security-related LLM papers will be published in the near future.

Finding I. In terms of security-related applications (i.e., the “good” and the “bad” parts), it is evident that the majority of researchers are inclined towards using LLMs to bolster the security community, such as in vulnerability detection and security test generation, despite the presence of some vulnerabilities in LLMs at this stage. There are relatively few researchers who employ LLMs as tools for conducting attacks. In summary, LLMs contribute more positively than negatively to the security community.

4. Positive Impacts on Security and Privacy

In this section, we explore the beneficial impacts of employing LLMs. In the context of code or data privacy, we have opted to use the term “privacy” to characterize scenarios in which LLMs are utilized to ensure the confidentiality of either code or data. However, given that we did not come across any papers specifically addressing code privacy, our discussion focuses on code security (§4.1) as well as both data security and privacy (§4.2).

4.1.
LLMs for Code Security

As shown in Table 2, LLMs have access to a vast repository of code snippets and examples spanning various programming languages and domains. They leverage their advanced language understanding and contextual analysis capabilities to thoroughly examine code and code-related text. More specifically, LLMs can play a pivotal role throughout the entire code security lifecycle, including coding (C), test case generation (TCG), execution, and monitoring (RE).

Secure Coding (C). We first discuss the use of LLMs in the context of secure code programming [75] (or generation [63, 285, 199, 90]). Sandoval et al. [234] conducted a user study (58 users) to assess the security implications of LLMs, particularly OpenAI Codex, as code assistants for developers. They evaluated code written by student programmers when assisted by LLMs and found that participants assisted by LLMs did not introduce new security risks: the AI-assisted group produced critical security bugs at a rate no greater than 10% higher than the control group (non-assisted). He et al. [98, 99] focused on enhancing the security of code generated by LLMs. They proposed a novel method called SVEN, which leverages continuous prompts to control LLMs in generating secure code. With this method, the success rate improved from 59.1% to 92.3% when using the CodeGen LM. Mohammed et al. introduce SALLM [254], a framework consisting of a new security-focused dataset, an evaluation environment, and novel metrics for systematically assessing LLMs’ ability to generate secure code. Madhav et al. [197] evaluate the security aspects of code generation processes on the ChatGPT platform, specifically in the hardware domain. They explore the strategies that a designer can employ to enable ChatGPT to provide secure hardware code generation.
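As a minimal sketch of the secure-coding-assistant pattern these works study (steering or screening model-generated code), the snippet below pairs a hypothetical `llm_generate` stub with an illustrative deny-list check. The stub, the pattern list, and the hints are all assumptions for illustration, not SVEN’s or SALLM’s actual mechanisms.

```python
import re

# Hypothetical stand-in for an LLM call; a real assistant would query a model API.
def llm_generate(prompt: str) -> str:
    return 'char buf[64];\nstrcpy(buf, user_input);  /* model suggestion */'

# Illustrative deny-list of C APIs commonly flagged as insecure.
INSECURE_PATTERNS = {
    r"\bstrcpy\s*\(": "use strncpy/strlcpy with an explicit bound",
    r"\bgets\s*\(": "use fgets with a buffer size",
    r"\bsprintf\s*\(": "use snprintf",
}

def screen_suggestion(code: str):
    """Return (accepted, findings): reject suggestions matching the deny-list."""
    findings = [hint for pat, hint in INSECURE_PATTERNS.items()
                if re.search(pat, code)]
    return (len(findings) == 0, findings)

suggestion = llm_generate("Copy user_input into a fixed-size buffer in C.")
accepted, findings = screen_suggestion(suggestion)
print(accepted, findings)  # rejected, with a hint to use a bounded copy
```

A real system would feed the findings back into the prompt and regenerate, rather than merely rejecting; the point here is only the generate-then-screen loop.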
Table 2: LLMs for Code Security and Privacy (the original table also marks which life-cycle stages each work covers — coding, test case generation, bug/malicious-code/vulnerability detection, and fixing — but those markers did not survive extraction)

| Work | LLM(s) | Domain | When compared to SOTA ways? |
|---|---|---|---|
| Sandoval et al. [234] | Codex | - | Negligible risks |
| SVEN [98] | CodeGen | - | Faster/more secure |
| SALLM [254] | ChatGPT etc. | - | - |
| Madhav et al. [197] | ChatGPT | Hardware | - |
| Zhang et al. [343] | ChatGPT | Supply chain | More valid cases |
| Libro [136] | LLaMA | - | Higher FP/FN |
| TitanFuzz [56] | Codex | DL libs | Higher coverage |
| FuzzGPT [57] | ChatGPT | DL libs | Higher coverage |
| Fuzz4All [313] | ChatGPT | Languages | Higher coverage |
| WhiteFox [321] | GPT4 | Compiler | High-quality tests |
| Zhang et al. [337] | ChatGPT | API | - |
| CHATAFL [190] | ChatGPT | Protocol | Higher coverage |
| Henrik [105] | ChatGPT | - | Higher FP/FN |
| Apiiro [74] | N/A | - | - |
| Noever [201] | ChatGPT | - | 4X faster |
| Bakhshandeh et al. [15] | ChatGPT | - | Low FP/FN |
| Moumita et al. [218] | ChatGPT | - | Higher FP/FN |
| Cheshkov et al. [41] | ChatGPT | - | No better |
| LATTE [174] | GPT | - | Cost effective |
| DefectHunter [296] | Codex | - | - |
| Chen et al. [37] | ChatGPT | Blockchain | - |
| Hu et al. [110] | ChatGPT | Blockchain | - |
| KARTAL [233] | ChatGPT | Web apps | Less manual |
| VulLibGen [38] | LLaMa | Libs | Higher accuracy/speed |
| Ahmad et al. [3] | Codex | Hardware | Fix more bugs |
| InferFix [125] | Codex | - | CI Pipeline |
| Pearce et al. [211] | Codex etc. | - | Zero-shot |
| Fu et al. [83] | ChatGPT | APR | Higher accuracy |
| Sobania et al. [257] | ChatGPT etc. | APR | Higher accuracy |
| Jiang et al. [123] | ChatGPT | APR | Higher accuracy |

Test Case Generating (TCG). Several papers [33, 6, 238, 316, 156, 253, 335] discuss the utilization of LLMs for generating test cases, with our particular emphasis on those addressing security implications. Zhang et al. [343] demonstrated the use of ChatGPT-4.0 for generating security tests to assess the impact of vulnerable library dependencies on software applications. They found that LLMs could successfully generate tests that demonstrated various supply chain attacks, outperforming existing security test generators. This approach resulted in 24 successful attacks across 55 applications. Similarly, Libro [136] is a framework that uses LLMs to automatically generate test cases to reproduce software security bugs. In the realm of security, fuzzing stands out [325, 109, 337, 345, 272] as a widely employed technique for generating test cases. Deng et al. introduced TitanFuzz [56], an approach that harnesses LLMs to generate input programs for fuzzing Deep Learning (DL) libraries. TitanFuzz demonstrates impressive code coverage (30.38%/50.84%) and detects previously unknown bugs (41 out of 65) in popular DL libraries. More recently, Deng et al. [58, 57] refined LLM-based fuzzing (named FuzzGPT), aiming to generate unusual programs for DL library fuzzing. While TitanFuzz leverages LLMs’ ability to generate ordinary code, FuzzGPT addresses the need for edge-case testing by priming LLMs with historical bug-triggering programs. Fuzz4All [313] leverages LLMs as input generators and mutation engines, creating diverse and realistic inputs for various languages (e.g., C, C++), improving the previous state-of-the-art coverage by 36.8% on average. WhiteFox [321], a novel white-box compiler fuzzer that utilizes LLMs to test compiler optimizations, outperforms existing fuzzers (it generates high-quality tests for intricate optimizations, surpassing state-of-the-art fuzzers by up to 80 optimizations). Zhang et al. [337] explore the generation of fuzz drivers for library API fuzzing using LLMs.
Results show that LLM-based generation is practical, with 64% of questions solved entirely automatically and up to 91% with manual validation. CHATAFL [190] is an LLM-guided protocol fuzzer that constructs grammars for message types and mutates messages or predicts the next messages based on LLM interactions, achieving better state and code coverage compared to state-of-the-art fuzzers (e.g., AFLNET [217], NSFUZZ [222]).

Vulnerable Code Detecting (RE). Noever [201] explores the capability of LLMs, particularly OpenAI’s GPT-4, in detecting software vulnerabilities. This paper shows that GPT-4 identified approximately four times the number of vulnerabilities compared to traditional static code analyzers (e.g., Snyk and Fortify). Parallel conclusions have also been drawn in other efforts [141, 15]. However, Moumita et al. [218] applied LLMs for software vulnerability detection, exposing a noticeable performance gap when compared to conventional static analysis tools. This disparity primarily arises from the relatively higher occurrence of false alerts generated by LLMs. Similarly, Cheshkov et al. [41] point out that the ChatGPT model performed no better than a dummy classifier for both binary and multi-label classification tasks in code vulnerability detection. Wang et al. introduce DefectHunter [296], a novel model that employs LLM-driven techniques for code vulnerability detection. They demonstrate the potential of combining LLMs with advanced mechanisms (e.g., Conformer) to identify software vulnerabilities more effectively. This combination shows an improvement in effectiveness of approximately 14.64% to 20.62% compared with Pongo-70B. LATTE [174] is a novel static binary taint analysis method powered by LLMs. LATTE surpasses existing state-of-the-art techniques (e.g., Emtaint, Arbiter, and Karonte), demonstrating remarkable effectiveness in vulnerability detection (37 new bugs in real-world firmware) with lower cost.
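The false-alert trade-off discussed above can be sketched as a binary-classification harness. `llm_judge` below is a hypothetical heuristic stand-in for a real model call, and the labeled samples are invented; the point is only the false-positive/false-negative accounting that the surveyed evaluations report.

```python
# Hedged sketch of LLM-based vulnerability detection as a binary classifier.
# `llm_judge` is a stand-in: real systems send the snippet in a prompt and
# parse the model's verdict.

def llm_judge(snippet: str) -> bool:
    """Pretend-LLM: flags anything containing a known dangerous sink."""
    return any(s in snippet for s in ("strcpy", "system(", "eval("))

def evaluate(samples):
    """samples: list of (code, is_vulnerable). Returns (FP, FN) counts."""
    fp = sum(1 for code, vuln in samples if llm_judge(code) and not vuln)
    fn = sum(1 for code, vuln in samples if not llm_judge(code) and vuln)
    return fp, fn

samples = [
    ("strcpy(dst, src);", True),             # true positive
    ('printf("%s", name);', False),          # true negative
    ("system(cmd); /* sanitized */", False), # false positive: flagged but safe
    ("memcpy(dst, src, n_attacker);", True), # false negative: missed overflow
]
print(evaluate(samples))  # (1, 1)
```

Both error directions matter in practice: false positives drive the alert fatigue Moumita et al. observe, while false negatives are missed vulnerabilities.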
Efforts in leveraging LLMs for vulnerability detection extend to specialized domains (e.g., blockchain [110, 37], kernel [104], mobile [303]). For instance, Chen et al. [37] and Hu et al. [110] focus on the application of LLMs in identifying vulnerabilities within blockchain smart contracts. Sakaoglu’s study introduces KARTAL [233], a pioneering approach that harnesses LLMs for web application vulnerability detection. This method achieves an accuracy of up to 87.19% and is capable of conducting 539 predictions per second. Additionally, Chen et al. [38] make a noteworthy contribution with VulLibGen, a generative methodology utilizing LLMs to identify vulnerable libraries. Ahmad et al. [3] shift the focus to hardware security. They investigate the use of LLMs, specifically OpenAI’s Codex, in automatically identifying and repairing security-related bugs in hardware designs.

Table 3: LLMs for Data Security and Privacy (the original table also marks which data-protection properties each work addresses — integrity (I), confidentiality (C), reliability (R), traceability (T) — but those markers did not survive extraction)

| Work | Model | Domain | Compared to SOTA ways? |
|---|---|---|---|
| Fang [294] | ChatGPT | Ransomware | - |
| Liu et al. [187] | ChatGPT | Ransomware | - |
| Amine et al. [73] | ChatGPT | Semantic | Aligned w/ SOTA |
| HuntGPT [8] | ChatGPT | Network | More effective |
| Chris et al. [71] | ChatGPT | Log | Less manual |
| AnomalyGPT [91] | ChatGPT | Video | Less manual |
| LogGPT [221] | ChatGPT | Log | Less manual |
| Arpita et al. [286] | BERT etc. | - | - |
| Takashi et al. [142] | ChatGPT | Phishing | High precision |
| Fredrik et al. [102] | ChatGPT etc. | Phishing | Effective |
| IPSDM [119] | BERT | Phishing | - |
| Kwon et al. [149] | ChatGPT | - | Non-expert friendly |
| Scanlon et al. [237] | ChatGPT | Forensic | More effective |
| Sladić et al. [255] | ChatGPT | Honeypot | More realistic |
| WASA [297] | - | Watermark | More effective |
| REMARK [340] | - | Watermark | More effective |
| SWEET [154] | - | Watermark | More effective |
PentestGPT [55], an automated penetration testing tool, uses the domain knowledge inherent in LLMs to address individual sub-tasks of penetration testing, improving task completion rates significantly.

Malicious Code Detecting (RE). Using LLMs to detect malware is a promising application. This approach leverages the natural language processing capabilities and contextual understanding of LLMs to identify malicious software. In experiments with GPT-3.5 conducted by Henrik Plate [105], it was found that LLM-based malware detection can complement human reviews but not replace them. Out of 1800 binary classifications performed, there were both false positives and false negatives. The use of simple tricks could also deceive the LLM’s assessments. More recently, a few attempts have been made in this direction. For example, Apiiro [74] is a malicious code analysis tool using LLMs. Apiiro’s strategy involves the creation of LLM Code Patterns (LCPs) to represent code in vector format, making it easier to identify similarities and cluster packages efficiently. Its LCP detector incorporates LLMs, proprietary code analysis, probabilistic sampling, LCP indexing, and dimensionality reduction to identify potentially malicious code.

Vulnerable/Buggy Code Fixing (RE). Several papers [123, 211, 314] have focused on evaluating the performance of LLMs trained on code in the task of program repair. Jin et al. [125] proposed InferFix, a transformer-based program repair framework that works in tandem with a cutting-edge static analyzer to address and fix critical security and performance issues with accuracy between 65% and 75%. Pearce et al. [211] observed that LLMs can repair insecure code in a range of contexts even without being explicitly trained on vulnerability repair tasks. ChatGPT is noted for its ability in code bug detection and correction. Fu et al.
[83] assessed ChatGPT in vulnerability-related tasks like predicting and classifying vulnerabilities, severity estimation, and analyzing over 190,000 C/C++ functions. They found that ChatGPT's performance lagged behind other LLMs specialized in vulnerability detection. However, Sobania et al. [257] found ChatGPT's bug-fixing performance competitive with standard program repair methods, as demonstrated by its ability to fix 31 out of 40 bugs. Xia et al. [315] presented ChatRepair, which leverages pre-trained language models (PLMs) to generate patches without relying on bug-fixing datasets, improving ChatGPT's code-fixing abilities using a mix of successful and failing tests. As a result, they fixed 162 out of 337 bugs at a cost of $0.42 each.

Finding I. As shown in Table 2, a comparison with state-of-the-art methods reveals that the majority of researchers (17 out of 25) have concluded that LLM-based methods outperform traditional approaches (advantages include higher code coverage, higher detection accuracy, lower cost, etc.). Only four papers argue that LLM-based methods do not surpass the state-of-the-art approaches. The most frequently discussed issue with LLM-based methods is their tendency to produce both high false negatives and false positives when detecting vulnerabilities or bugs.

4.2. LLMs for Data Security and Privacy

As demonstrated in Table 3, LLMs make valuable contributions to the realm of data security, offering multifaceted approaches to safeguarding sensitive information. We have organized the research papers into distinct categories based on the specific facets of data protection that LLMs enhance.
These facets encompass critical aspects such as data integrity (I), which ensures that data remains uncorrupted throughout its life cycle; data reliability (R), which ensures the accuracy of data; data confidentiality (C), which focuses on guarding against unauthorized access and disclosure of sensitive information; and data traceability (T), which involves tracking and monitoring data access and usage.

Data Integrity (I). Data integrity ensures that data remains unchanged and uncorrupted throughout its life cycle. As of now, only a few works discuss how to use LLMs to protect data integrity. For example, ransomware usually encrypts a victim's data, making the data inaccessible without a decryption key held by the attacker, which breaks data integrity. Wang Fang's research [294] examines using LLMs for ransomware cybersecurity strategies, mostly theoretically proposing real-time analysis, automated policy generation, predictive analytics, and knowledge transfer. However, these strategies lack empirical validation. Similarly, Liu et al. [187] explored the potential of LLMs for creating cybersecurity policies aimed at mitigating ransomware attacks with data exfiltration. They compared GPT-generated Governance, Risk and Compliance (GRC) policies to those from established security vendors and government cybersecurity agencies. They recommended that companies incorporate GPT into their GRC policy development.

Anomaly detection is a key defense mechanism that identifies unusual behavior. While it does not directly protect data integrity, it identifies abnormal or suspicious behavior that can potentially compromise data integrity (as well as data confidentiality and data reliability). Amine et al.
[73] introduced an LLM-based monitoring framework for detecting semantic anomalies in vision-based policies and applied it to both finite state machine policies for autonomous driving and learned policies for object manipulation. Experimental results demonstrate that it can effectively identify semantic anomalies, aligning with human reasoning. HuntGPT [8] is an LLM-based intrusion detection system for network anomaly detection. The results demonstrate its effectiveness in improving user understanding and interaction. Chris et al. [71] and LogGPT [221] explore ChatGPT's potential for log-based anomaly detection in parallel file systems. Results show that it addresses the issues in traditional manual labeling and interpretability. AnomalyGPT [91] uses Large Vision-Language Models to detect industrial anomalies. It eliminates manual threshold setting and supports multi-turn dialogues.

Data Confidentiality (C). Data confidentiality refers to the practice of protecting sensitive information from unauthorized access or disclosure, a topic extensively discussed in LLM privacy discussions [214, 242, 286, 1]. However, most of these studies concentrate on enhancing LLMs through state-of-the-art Privacy Enhancing Techniques (e.g., zero-knowledge proofs [224], differential privacy [242, 184, 166], and federated learning [145, 122, 78]). There are only a few attempts that utilize LLMs to enhance user privacy. For example, Arpita et al. [286] use LLMs to preserve privacy by replacing identifying information in textual data with generic markers. Instead of storing sensitive user information, such as names, addresses, or credit card numbers, the LLMs suggest substitutes for the masked tokens. This obfuscation technique helps protect user data from being exposed to adversaries. By using LLMs to generate substitutes for masked tokens, the models can be trained on obfuscated data without compromising the privacy and security of the original information.
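A minimal sketch of this mask-and-substitute idea follows; the regex patterns and the fixed surrogate table are illustrative stand-ins for the detector and the LLM-suggested substitutes of the cited work.

```python
# Hedged sketch: replace identifying tokens with generic surrogates
# before text is stored or used for training. A real pipeline would
# use a proper PII detector and have an LLM propose context-fitting
# substitutes; here both steps are simplified stand-ins.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}
SURROGATES = {"EMAIL": "user@example.com", "CARD": "0000 0000 0000 0000"}

def obfuscate(text: str) -> str:
    """Replace each detected identifier with a generic surrogate."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(SURROGATES[kind], text)
    return text

record = "Contact jane.doe@corp.com, card 4111 1111 1111 1111."
print(obfuscate(record))
# Contact user@example.com, card 0000 0000 0000 0000.
```

Because the surrogates are syntactically valid, downstream models can still be trained on the obfuscated records, which is exactly the property the studies above exploit.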
Similar ideas have also been explored in other studies [1, 262]. Hyeokdong et al. [149] explore implementing cryptography with ChatGPT, which ultimately protects data confidentiality. Despite lacking extensive coding skills or programming knowledge, the authors were able to successfully implement cryptographic algorithms through ChatGPT. This highlights the potential for individuals to utilize ChatGPT for cryptography tasks.

Data Reliability (R). In our context, data reliability refers to the accuracy of data: a measure of how well data can be depended upon to be accurate and free from errors or bias. Takashi et al. [142] proposed using ChatGPT for the detection of sites that contain phishing content. Experimental results using GPT-4 show promising performance, with high precision and recall rates. Fredrik et al. [102] assessed the ability of four large language models (GPT, Claude, PaLM, and LLaMA) to detect malicious intent in phishing emails, and found that they were generally effective, even surpassing human detection, although occasionally slightly less accurate. IPSDM [119] is a model fine-tuned from the BERT family to identify phishing and spam emails effectively. IPSDM demonstrates superior performance in classifying emails, on both unbalanced and balanced datasets.

Data Traceability (T). Data traceability is the capability to track and document the origin, movement, and history of data within a single system or across multiple systems. This concept is particularly vital in fields such as incident management and forensic investigations, where understanding the journey and transformations of data is essential to resolving issues and conducting thorough analyses. LLMs have gained traction in forensic investigations, offering novel approaches for analyzing digital evidence. Scanlon et al.
[237] explored how ChatGPT assists in analyzing OS artifacts like logs, files, cloud interactions, and executable binaries, and in examining memory dumps to detect suspicious activities or attack patterns. Additionally, Sladić et al. [255] proposed that generative models like ChatGPT can be used to create realistic honeypots to deceive human attackers.

Watermarking involves embedding a distinctive, typically imperceptible or hard-to-identify signal within the outputs of a model. Wang et al. [297] discuss concerns regarding the intellectual property of training data for LLMs and proposed the WASA framework to learn the mapping between the texts of different data providers. Zhang et al. [340] developed REMARK-LLM, which focuses on monitoring the utilization of content and validating watermark retrieval. This helps protect against malicious uses such as spamming and plagiarism. Furthermore, identifying code produced by LLMs is vital for addressing legal and ethical issues concerning code licensing, plagiarism, and malware creation. Li et al. [169] propose the first watermark technique to protect large language model-based code generation APIs from remote imitation attacks. Lee et al. [154] developed SWEET, a tool that implements watermarking specifically on tokens within programming languages.

Finding II. Likewise, it is noticeable that LLMs excel in data protection, surpassing current solutions and requiring fewer manual interventions.

Finding III. Table 2 and Table 3 reveal that ChatGPT is the predominant LLM extensively employed in diverse security applications. Its versatility and effectiveness make it a preferred choice for various security-related tasks, further reinforcing its position as a go-to solution in the field of artificial intelligence and cybersecurity.
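Token-level watermarking schemes such as those surveyed above bias generation toward a pseudo-randomly chosen "green" subset of the vocabulary and later detect the mark statistically. The sketch below illustrates only that core idea; the toy vocabulary, hash rule, and threshold are assumptions, not the parameters of SWEET, WASA, or REMARK-LLM.

```python
# Hedged sketch of green-list watermark embedding and detection.
# Each token's "green" status is derived from a hash of its
# predecessor, so a detector can re-derive the lists without the model.
import hashlib

VOCAB = [f"tok{i}" for i in range(64)]

def is_green(prev_token: str, token: str) -> bool:
    """A token is 'green' if a hash seeded by its predecessor says so."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0  # pseudo-random half of the vocabulary

def green_fraction(tokens):
    """Detection statistic: share of adjacent pairs that are green."""
    if len(tokens) < 2:
        return 0.0
    hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)

def generate_watermarked(start: str, n: int):
    """Toy generator that always emits a green successor token."""
    out = [start]
    for _ in range(n):
        # with 64 candidates a green token is virtually always available
        nxt = next((t for t in VOCAB if is_green(out[-1], t)), VOCAB[0])
        out.append(nxt)
    return out

marked = generate_watermarked("tok0", 30)
print(green_fraction(marked))  # near 1.0; unmarked text hovers near 0.5
```

Detection therefore reduces to a statistical test on the green fraction, which is why such marks survive without access to the generating model.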
Figure 2: Taxonomy of Cyberattacks. The colored boxes represent attacks that have been demonstrated to be executable using LLMs, whereas the gray boxes indicate attacks that cannot be executed with LLMs.

Figure 3: Prevalence of the existing attacks.

5. Negative Impacts on Security and Privacy

As shown in Figure 2, we have categorized the attacks into five groups based on their respective positions within the system infrastructure. These categories encompass hardware-level attacks, OS-level attacks, software-level attacks, network-level attacks, and user-level attacks. Additionally, we have quantified the number of associated research papers published for each group, as illustrated in Figure 3.

Hardware-Level Attacks. Hardware attacks typically involve physical access to devices. However, LLMs cannot directly access physical devices. Instead, they can only access information associated with the hardware. Side-channel attacks [260, 107, 189] are one class of attack that can be powered by LLMs.
Side-channel attacks typically entail the analysis of unintentional information leakage from a physical system or implementation, such as a cryptographic device or software, with the aim of inferring secret information (e.g., keys). Yaman [319] has explored the application of LLM techniques to develop side-channel analysis methods. The research evaluates the effectiveness of LLM-based approaches in analyzing side-channel information in two hardware-related scenarios: AES side-channel analysis and deep-learning accelerator side-channel analysis. Experiments are conducted to determine the success rates of these methods in both situations.

OS-Level Attacks. LLMs operate at a high level of abstraction and primarily engage with text-based input and output. They lack the low-level system access essential for executing OS-level attacks [114, 288, 128]. Nonetheless, they can be utilized for the analysis of information gathered from operating systems, thus potentially aiding in the execution of such attacks. Andreas et al. [94] establish a feedback loop connecting an LLM to a vulnerable virtual machine through SSH, allowing the LLM to analyze the machine's state, identify vulnerabilities, and propose concrete attack strategies, which are then executed automatically within the virtual machine. More recently, they [95] introduced an automated Linux privilege-escalation benchmark using local virtual machines and an LLM-guided privilege-escalation tool to assess various LLMs and prompt strategies against the benchmark.

Software-Level Attacks. Just as they employ LLMs to target hardware and operating systems, attackers have also utilized LLMs against software (e.g., [343, 209, 212, 32]). However, the most prevalent software-level use case involves malicious developers utilizing LLMs to create malware. Mika et al. [17] present a proof-of-concept in which ChatGPT is utilized to distribute malicious software while avoiding detection.
Yin et al. [207] investigate the potential misuse of LLMs by creating a number of malware programs (e.g., ransomware, worms, keyloggers, brute-force malware, fileless malware). Antonio Monje et al. [194] demonstrate how to trick ChatGPT into quickly generating ransomware. Marcus Botacin [22] explores different coding strategies (e.g., generating entire malware, creating malware functions) and investigates the LLM's capacity to rewrite malware code. The findings reveal that LLMs excel in constructing malware from building-block descriptions. Meanwhile, an LLM can generate multiple versions of the same semantic content (malware variants), with varying detection rates by VirusTotal AV (ranging from 4% to 55%).

Network-Level Attacks. LLMs can also be employed to initiate network attacks. A prevalent example of a network-level attack utilizing LLMs is the phishing attack [18, 43]. Fredrik et al. [102] compared AI-generated phishing emails using GPT-4 with manually designed phishing emails created using the V-Triad, alongside a control group exposed to generic phishing emails. The results showed that personalized phishing emails, whether generated by AI or designed manually, had higher click-through rates compared to generic ones. Tyson et al. [151] investigated how modifying ChatGPT's input can affect the content of the generated emails, making them more convincing. Julian Hazell [97] demonstrated the scalability of spear phishing campaigns by generating realistic and cost-effective phishing messages for over 600 British Members of Parliament using ChatGPT. In another study, Wang et al. [295] discuss how traditional defenses may fail in the era of LLMs. CAPTCHA challenges, involving distorted letters and digits, struggle to detect chatbots relying on text and voice.
However, LLMs may break these challenges, as they can produce high-quality human-like text and mimic human behavior effectively. One study utilizes LLMs to deploy fingerprinting attacks: Armin et al. [236] employed density-based clustering to cluster HTTP banners and create text-based fingerprints for annotating scanning data. When these fingerprints are compared to an existing database, it becomes possible to identify new IoT devices and server products.

User-Level Attacks. Recent discussions have primarily focused on user-level attacks, as LLMs demonstrate the capability to create remarkably convincing but ultimately deceptive content, as well as to establish connections between seemingly unrelated pieces of information. This presents opportunities for malicious actors to engage in a range of nefarious activities. Here are a few examples:

• Misinformation. Overreliance on content generated by LLMs without oversight raises serious concerns regarding the safety of online content [206]. Numerous studies have focused on detecting misinformation produced by LLMs. Several studies [35, 308, 324] reveal that content generated by LLMs is harder to detect and may use more deceptive styles, potentially causing greater harm. Canyu Chen et al. [35] propose a taxonomy for LLM-generated misinformation and validate detection methods. Countermeasures and detection methods [308, 280, 40, 267, 36, 341, 19, 155, 263] have also been developed to address these emerging issues.

• Social Engineering. LLMs not only have the potential to generate content from training data, but they also offer attackers a new perspective for social engineering. Work from Staab et al. [261] highlights the capability of well-trained LLMs to infer personal attributes from text, such as location, income, and gender. It also reveals how these models can extract personal information from seemingly benign queries. Tong et al.
[275] investigated how content generated by LLMs may include user information. Moreover, Polra Victor Falade [76] stated that exploitation by LLM-driven social engineers involves tactics such as psychological manipulation, targeted phishing, and the crisis of authenticity.

• Scientific Misconduct. Irresponsible use of LLMs can result in issues related to scientific misconduct, stemming from their capacity to generate original, coherent text. The academic community [45, 265, 215, 46, 179, 72, 200, 223, 87, 139, 226], encompassing diverse disciplines from various countries, has raised concerns about the increasing difficulty of detecting scientific misconduct in the era of LLMs. Concerns arise from LLMs' ability to generate coherent and original content, including complete papers from unreliable sources [283, 287, 232]. Researchers are also actively engaged in the effort to detect such misconduct. For example, Kavita Kumari et al. [146, 147] proposed DEMASQ, a precise ChatGPT-generated content detector. DEMASQ considers biases in text composition and evasion techniques, achieving high accuracy across diverse domains in identifying ChatGPT-generated content.

• Fraud. Cybercriminals have devised a new tool called FraudGPT [76, 10], which operates like ChatGPT but facilitates cyberattacks. It lacks the safety controls of ChatGPT and is sold on the dark web and Telegram for $200 per month or $1,700 annually. FraudGPT can create fraudulent emails related to banks, suggesting where malicious links should be placed in the content. It can also list frequently targeted sites or services, aiding hackers in planning future attacks. WormGPT [52], a cybercrime tool, offers features such as unlimited character support and chat memory retention. The tool was trained on confidential datasets, with a focus on malware-related and fraud-related data. It can guide cybercriminals in executing Business Email Compromise (BEC) attacks.
Finding IV. As illustrated in Figure 3, when compared to other attacks, it becomes apparent that user-level attacks are the most prevalent, with a significant count of 33 papers. This dominance can be attributed to the fact that LLMs have increasingly human-like reasoning abilities, enabling them to generate human-like conversations and content (e.g., scientific misconduct, social engineering). Presently, LLMs do not possess the same level of access to OS-level or hardware-level functionalities. This observation remains consistent with the attacks observed at other levels as well. For instance, at the network level, LLMs can be abused to create phishing websites and bypass CAPTCHA mechanisms.

6. Vulnerabilities and Defenses in LLMs

In the following section, we embark on an in-depth exploration of the prevalent threats and vulnerabilities associated with LLMs (§6.1). We will examine the specific risks and challenges that arise in the context of LLMs. In addition to discussing these challenges, we will also delve into the countermeasures and strategies that researchers and practitioners have developed to mitigate these risks (§6.2). Figure 4 illustrates the relationship between the attacks and defenses.

6.1. Vulnerabilities and Threats in LLMs

In this section, we aim to delve into the potential vulnerabilities and attacks that may be directed towards LLMs. Our examination seeks to categorize these threats into two distinct groups: AI Model Inherent Vulnerabilities and Non-AI Model Inherent Vulnerabilities.

6.1.1. AI Inherent Vulnerabilities and Threats

These are vulnerabilities and threats that stem from the very nature and architecture of LLMs, considering that LLMs are fundamentally AI models themselves.
For example, attackers may manipulate the input data to generate incorrect or undesirable outputs from the LLM.

(A1) Adversarial Attacks. Adversarial attacks in machine learning refer to a set of techniques and strategies used to intentionally manipulate or deceive machine learning models. These attacks are typically carried out with malicious intent and aim to exploit vulnerabilities in the model's behavior. We focus only on the most extensively discussed attacks, namely data poisoning and backdoor attacks.

• Data Poisoning. Data poisoning refers to attackers influencing the training process by injecting malicious data into the training dataset. This can introduce vulnerabilities or biases, compromising the security, effectiveness, or ethical behavior of the resulting models [206]. Various studies [148, 290, 289, 2, 291, 239] have demonstrated that pre-trained models are vulnerable to compromise via methods such as using untrusted weights or content, including the insertion of poisoned examples into their datasets. By their inherent nature as pre-trained models, LLMs are susceptible to data poisoning attacks [227, 251, 245]. For example, Alexander et al. [290] showed that even with just 100 poison examples, LLMs can produce consistently negative results or flawed outputs across various tasks. Larger language models are more susceptible to poisoning, and existing defenses like data filtering or model capacity reduction offer only moderate protection while hurting test accuracy.

• Backdoor Attacks. Backdoor attacks involve the malicious manipulation of training data and model processing, creating a vulnerability where attackers can embed a hidden backdoor into the model [322]. Both backdoor attacks and data poisoning attacks involve manipulating machine learning models, which can include manipulation of inputs.
However, the key distinction is that backdoor attacks specifically focus on

Figure 4: Taxonomy of Threats and the Defenses. (The figure groups threats into AI-inherent vulnerabilities — adversarial attacks, inference attacks, extraction attacks, bias and unfairness exploitation, instruction tuning attacks — and non-AI-inherent ones — remote code execution, side channels, supply chain vulnerabilities — alongside defense strategies applied in LLM training, in LLM inference, and in model architecture.)
The line represents a defense technique that can defend against either a specific attack or a group of attacks.

introducing hidden triggers into the model to manipulate specific behaviors or responses when the trigger is encountered. LLMs are subject to backdoor attacks [161, 331, 167]. For example, Yao et al. [329] proposed a bidirectional backdoor that combines trigger mechanisms with prompt tuning.

(A2) Inference Attacks. Inference attacks in the context of machine learning refer to a class of attacks where an adversary tries to gain sensitive information or insights about a machine learning model or its training data by making specific queries or observations to the model. These attacks often exploit unintended information leakage from the responses.

• Attribute Inference Attacks. An attribute inference attack [208, 181, 133, 258, 183, 160] is a type of threat where an attacker attempts to deduce sensitive or personal information about individuals or entities by analyzing the behavior or responses of a machine learning model. It works against LLMs as well. Robin et al. [261] presented the first comprehensive examination of pretrained LLMs' ability to infer personal information from text. Using a dataset of real Reddit profiles, the study demonstrated that current LLMs can accurately infer a variety of personal information (e.g., location, income, sex).

• Membership Inference. A membership inference attack is a specific type of inference attack in the field of data security and privacy that determines whether a data record was part of a model's training dataset, given white-/black-box access to the model and the specific data record [250, 68, 143, 85, 84, 191, 112]. A number of research studies have explored the concept of membership inference, each adopting a unique perspective and methodology.
These studies have explored various membership inference attacks by analyzing labels [42], determining thresholds [120, 28, 96], and developing a generalized formulation [278], among other methods. Mireshghallah et al. [192] found that fine-tuning the head of the model exhibits greater susceptibility to attacks compared to fine-tuning smaller adapters.

(A3) Extraction Attacks. Extraction attacks typically refer to attempts by adversaries to extract sensitive information or insights from machine learning models or their associated data. Extraction attacks and inference attacks share similarities but differ in their specific focus and objectives. Extraction attacks aim to acquire specific resources (e.g., model gradients, training data) or confidential information directly. Inference attacks seek to gain knowledge or insights about the model or data's characteristics, often by observing the model's responses or behavior. Various types of extraction attacks exist, including model theft attacks [130, 137], gradient leakage [158], and training data extraction attacks [29]. As of the current writing, it has been observed that training data extraction attacks can be effective against LLMs. Training data extraction [29] refers to a method where an attacker attempts to retrieve specific individual examples from a model's training data by strategically querying the machine learning model. Numerous research studies [344, 210, 326] have shown that it is possible to extract training data from LLMs, which may include personal and private information [113, 339]. Notably, the work by Truong et al. [279] stands out for its ability to replicate a model without accessing the original model data.

(A4) Bias and Unfairness Exploitation. Bias and unfairness in LLMs pertain to the phenomenon where these models demonstrate prejudiced outcomes or discriminatory behaviors.
While bias and fairness issues are not unique to LLMs, they have received more attention due to ethical and societal concerns. That is, the societal impact of LLMs has prompted discussions about the ethical responsibilities of organizations and researchers developing and deploying these models, leading to increased scrutiny and research on bias and fairness. Concerns about bias have been raised in various fields, encompassing gender and minority groups [65, 144, 81, 244], the identification of misinformation, and political aspects. Multiple studies [269, 281] revealed biases in the language used when querying LLMs. Moreover, Urman et al. [282] discovered that biases may arise from adherence to government censorship guidelines. Bias in professional writing [292, 263, 79] involving LLMs is also a concern within the community, as it can significantly damage credibility. The biases of LLMs may also lead to negative side effects in areas beyond text-based applications. Dai et al. [47] noted that content generated by LLMs might introduce biases into neural retrieval systems, and Huang et al. [111] discovered that biases can also be present in LLM-generated code.

(A5) Instruction Tuning Attacks. Instruction tuning, also known as instruction-based fine-tuning, is a machine-learning technique used to train and adapt language models for specific tasks by providing explicit instructions or examples during the fine-tuning process. In LLMs, instruction-tuning attacks refer to a class of attacks or manipulations that target instruction-tuned LLMs. These attacks aim to exploit vulnerabilities or limitations in LLMs that have been fine-tuned with specific instructions or examples for particular tasks.

• Jailbreaking. Jailbreaking in LLMs involves bypassing security features to enable responses to otherwise restricted or unsafe questions, unlocking capabilities usually limited by safety protocols.
Numerous studies have demonstrated various methods for successfully jailbreaking LLMs [159, 271, 248]. Wei et al. [301] emphasized that the alignment capabilities of LLMs can be influenced or manipulated through in-context demonstrations. In addition, several studies [300, 132] demonstrated similar manipulation using various approaches, highlighting the versatility of methods that can jailbreak LLMs. More recently, MASTERKEY [54] employed a time-based method for dissecting defenses and demonstrated proof-of-concept attacks; it automatically generates jailbreak prompts with a 21.58% success rate. Moreover, diverse methods have been employed for jailbreaking LLMs, such as conducting fuzzing [328], implementing optimized search strategies [353], and even training LLMs specifically to jailbreak other LLMs [53, 353]. Meanwhile, Cao et al. [27] developed RA-LLM, a method that lowers the success rate of adversarial and jailbreaking prompts without requiring retraining or access to model parameters.

• Prompt Injection. A prompt injection attack is a method of manipulating the behavior of LLMs to elicit unexpected and potentially harmful responses. This technique involves crafting input prompts in a way that bypasses the model's safeguards or triggers undesirable outputs. A substantial amount of research [177, 332, 135, 299, 173, 124] has already automated the process of identifying semantic-preserving payloads in prompt injections, with various focuses. Facilitated by the capability for fine-tuning, backdoors may be introduced through prompt attacks [12, 133, 346, 243]. Moreover, Greshake et al. [89] expressed concerns about the potential for new vulnerabilities arising from LLMs invoking external resources.
Other studies have also demonstrated ways to take advantage of prompt injection attacks, such as unveiling guide prompts [342], virtualizing prompt injection [320], and integrating applications [178]. He et al. [100, 101] explored a shift toward leveraging LLMs, trained on extensive datasets, to mitigate such attacks.
• Denial of Service. A Denial of Service (DoS) attack is a type of cyber attack that aims to exhaust computational resources, causing latency or rendering resources unavailable. Because LLMs require significant computational resources, attackers deliberately construct prompts to reduce the availability of models [59]. Shumailov et al. [252] proved the possibility of conducting sponge attacks in the field of LLMs, specifically designed to maximize energy consumption and latency (by a factor of 10 to 200). This strategy aims to draw the community's attention to the potential impact on autonomous vehicles, as well as scenarios requiring decisions to be made in a timely manner.
Finding V. Currently, there is limited research on model extraction attacks [68], parameter extraction attacks, or the extraction of other intermediate results [279]. While there are a few mentions of these topics, they tend to remain primarily theoretical (e.g., [172]), with limited practical implementation or empirical exploration. We believe that the sheer scale of parameters in LLMs complicates these traditional approaches, rendering them less effective or even infeasible. Additionally, the most powerful LLMs are privately owned, with their weights, parameters, and other details kept confidential, further shielding them from conventional attack strategies. Strict censorship of outputs generated by these LLMs challenges even black-box traditional ML attacks, as it limits the attackers' ability to exploit or analyze the model's responses.
6.1.2.
Non-AI Inherent Vulnerabilities and Threats
We also need to consider non-AI inherent attacks, which encompass external threats and new vulnerabilities (which have not been observed or investigated in traditional AI models) that LLMs might encounter. These attacks may not be intricately linked to the internal mechanisms of the AI model, yet they can present significant risks. Illustrative instances of non-AI inherent attacks involve system-level vulnerabilities (e.g., remote code execution).
(A6) Remote Code Execution (RCE). RCE attacks typically target vulnerabilities in software applications, web services, or servers to execute arbitrary code remotely. While RCE attacks are not typically applicable directly to LLMs, if an LLM is integrated into a web service (e.g., https://chat.openai.com/) and there are RCE vulnerabilities in the underlying infrastructure or code of that service, it could potentially lead to the compromise of the LLM's environment. Tong et al. [175] identified 13 vulnerabilities in six frameworks, including 12 RCE vulnerabilities and 1 arbitrary file read/write vulnerability. Additionally, 17 out of 51 tested apps were found to have vulnerabilities, with 16 being vulnerable to RCE and 1 to SQL injection. These vulnerabilities allow attackers to execute arbitrary code on app servers through prompt injections.
(A7) Side Channel. While LLMs themselves do not typically leak information through traditional side channels such as power consumption or electromagnetic radiation, they can be vulnerable to certain side-channel attacks in practical deployment scenarios. For example, Debenedetti et al. [51] introduced privacy side-channel attacks, which exploit system-level components (e.g., data filtering, output monitoring) to extract private information at a much higher rate than standalone models allow.
They propose four categories of side channels covering the entire ML lifecycle, enabling enhanced membership inference attacks and novel threats (e.g., extracting users' test queries). For instance, the research demonstrates how deduplicating training data before applying differentially private training creates a side channel that compromises privacy guarantees.
(A8) Supply Chain Vulnerabilities. Supply chain vulnerabilities refer to the risks in the lifecycle of LLM applications that may arise from using vulnerable components or services. These include third-party datasets, pre-trained models, and plugins, any of which can compromise the application's integrity [206]. Most research in this field focuses on the security of plugins. An LLM plugin is an extension or add-on module that enhances the capabilities of an LLM. Third-party plug-ins have been developed to expand LLM functionality, enabling users to perform various tasks, including web searches, text analysis, and code execution. However, concerns raised by security experts [206, 25] include the possibility of plug-ins being used to steal chat histories, access personal information, or execute code on users' machines. These vulnerabilities are associated with the use of OAuth in plug-ins, a web standard for data sharing across online accounts. Umar et al. [115] attempted to address this problem by designing a framework that formulates an extensive taxonomy of attacks specific to LLM platforms, taking into account the capabilities of plugins, users, and the LLM platform itself. By considering the relationships between these stakeholders, the framework helps identify potential security, privacy, and safety risks.
6.2.
Defenses for LLMs
In this section, we examine the range of existing defense methods against the various attacks and vulnerabilities associated with LLMs.¹
6.2.1. Defense in Model Architecture
Model architectures determine how knowledge and concepts are stored, organized, and contextually interacted with, which is crucial to the safety of Large Language Models. Many works [165, 351, 168, 333] have delved into how model capacity affects the privacy preservation and robustness of LLMs. Li et al. [165] revealed that language models with larger parameter sizes can be trained more effectively with differential privacy, using appropriate non-standard hyper-parameters, in comparison to smaller models. Zhu et al. [351] and Li et al. [168] found that LLMs with larger capacities, such as those with more extensive parameter sizes, generally show increased robustness against adversarial attacks. This was also verified in out-of-distribution (OOD) robustness scenarios by Yuan et al. [333]. Beyond the architecture of LLMs themselves, studies have focused on improving LLM safety by combining them with external modules, including knowledge graphs [39] and cognitive architectures (CAs) [150, 11]. Romero et al. [231] proposed improving AI robustness by incorporating various cognitive architectures into LLMs. Zafar et al. [336] aimed to build trust in AI by enhancing the reasoning abilities of LLMs through knowledge graphs.
6.2.2. Defenses in LLM Training and Inference
Defense Strategies in LLM Training. The core components of LLM training include model architectures, training data, and optimization methods. Regarding model architectures, we examine trustworthy designs that exhibit increased robustness against malicious use. For training corpora, our investigation focuses on methods aimed at mitigating undesired properties during the generation, collection, and cleaning of training data.
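The differentially private training of language models discussed above ([165]) builds on the DP-SGD recipe: clip each per-example gradient, then add calibrated Gaussian noise. Below is a minimal numpy sketch of that mechanism, shown on logistic regression rather than an LLM; the function and parameter names are ours, and privacy (epsilon) accounting is omitted:

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_mult=1.0, rng=None):
    """One DP-SGD step for logistic regression: clip each per-example
    gradient to clip_norm, sum, add Gaussian noise scaled by
    noise_mult * clip_norm (the standard Gaussian-mechanism calibration),
    then average and descend."""
    rng = rng or np.random.default_rng(0)
    preds = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid predictions
    per_example_grads = (preds - y)[:, None] * X  # shape (n, d)
    # Clip each example's gradient to bound its sensitivity.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    # Add calibrated Gaussian noise to the summed gradient, then average.
    noise = rng.normal(0.0, noise_mult * clip_norm, size=w.shape)
    grad = (clipped.sum(axis=0) + noise) / len(X)
    return w - lr * grad
```

In practice a privacy accountant (e.g., the moments accountant) tracks the cumulative epsilon spent across steps; that bookkeeping is elided here.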
In the context of optimization methods, we review existing works that have developed safe and secure optimization frameworks.
• Corpora Cleaning. LLMs are shaped by their training corpora, from which they learn behavior, concepts, and data distributions [302]. Therefore, the safety of LLMs is crucially influenced by the quality of the training corpora [86, 204]. However, it has been widely acknowledged that raw corpora collected from the web are rife with issues of fairness [14], toxicity [88], privacy [208], truthfulness [171], etc. Many efforts have been made to clean raw corpora and create high-quality training corpora for LLMs [129, 306, 152, 307, 213, 277]. In general, these pipelines consist of the following steps: language identification [129, 9], detoxification [88, 48, 180, 195], debiasing [188, 21, 16], de-identification (removal of personally identifiable information (PII)) [264, 284], and deduplication [153, 134, 106, 157]. Debiasing and detoxification aim to remove undesirable content from training corpora.
¹ Please be aware that we will not delve into solutions for non-AI inherent vulnerabilities, as they tend to be highly specific to individual cases.
• Optimization Methods. Optimization objectives are crucial in directing how LLMs learn from training data, influencing which behaviors are encouraged or penalized. These objectives affect the prioritization of knowledge and concepts within corpora, ultimately impacting the overall safety and ethical alignment of LLMs. In this context, robust training methods such as adversarial training [176, 293, 350, 330, 163] and robust fine-tuning [66, 121] have shown resilience against perturbation-based text attacks. Drawing inspiration from traditional adversarial training in the image field [182], Ivgi et al. [116] and Yoo et al. [330] applied adversarial training to LLMs by generating perturbations over discrete tokens. Wang et al.
[293] extended this approach to the continuous embedding space, facilitating more practical convergence, and was followed by subsequent research [176, 350, 163]. Safety alignment [205], an emerging learning paradigm, guides LLM behavior using well-aligned additional models or human annotations, proving effective for ethical alignment. Efforts have been made to align LLMs with other LLMs [334] and with LLMs themselves [268]. In terms of human annotations, Zhou et al. [349] and Shi et al. [249] emphasized the importance of high-quality training corpora with carefully curated instructions and outputs for enhancing instruction-following capabilities in LLMs. Bianchi et al. [20] highlighted that the safety of LLMs can be substantially improved by incorporating a limited percentage (e.g., 3%) of safe examples during fine-tuning.
Defense Strategies in LLM Inference. When LLMs are deployed as cloud services, they operate by receiving prompts or instructions from users and generating completed sentences in response. Given this interaction model, the implementation of test-time LLM defense becomes a necessary and critical aspect of ensuring safe and appropriate outputs. Generally, test-time defense encompasses a range of strategies, including the pre-processing of prompts and instructions to filter or modify inputs, the detection of abnormal events that might signal misuse or problematic queries, and the post-processing of generated responses to ensure they adhere to safety and ethical guidelines. Test-time LLM defenses are essential for maintaining the integrity and trustworthiness of LLMs in real-time applications.
• Instruction Processing (Pre-Processing). Instruction pre-processing applies transformations to instructions sent by users in order to destroy potential adversarial contexts or malicious intents.
It plays a vital role, as it blocks out most malicious usage and prevents LLMs from receiving suspicious instructions. In general, instruction pre-processing methods can be categorized as instruction manipulation [246, 230, 140, 117, 318], purification [164], and defensive demonstrations [172, 193, 301]. Jain et al. [117] and Kirchenbauer et al. [140] evaluated multiple baseline preprocessing methods against jailbreaking attacks, including retokenization and paraphrasing. Li et al. [164] proposed purifying instructions by first masking the input tokens and then predicting the masked tokens with other LLMs; the predicted tokens then serve as the purified instructions. Wei et al. [301] and Mo et al. [193] demonstrated that inserting pre-defined defensive demonstrations into instructions effectively defends against jailbreaking attacks on LLMs.
• Malicious Detection (In-Processing). Malicious detection provides in-depth examinations of LLM intermediate results, such as neuron activations, with respect to the given instructions; these examinations are more sensitive, accurate, and specific to malicious usage. Sun et al. [266] proposed detecting backdoored instructions using the backward probabilities of generations. Xi et al. [312] differentiated normal and poisoned instructions from the perspective of mask sensitivities. Shao et al. [246] identified suspicious words according to their textual relevance. Wang et al. [298] detected adversarial examples according to the semantic consistency among multiple generations, which has also been explored in the uncertainty quantification of LLMs by Duan et al. [67].
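The consistency-based detection of Wang et al. [298] can be sketched as follows; the similarity metric here (token-level Jaccard) and the threshold are our own simplifying stand-ins for the semantic measures used in practice:

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two generations."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def flag_adversarial(generations: list[str], threshold: float = 0.5) -> bool:
    """Flag an input as suspicious when multiple sampled generations for it
    disagree with one another (low mean pairwise similarity), in the spirit
    of consistency-based adversarial-example detection."""
    pairs = list(combinations(generations, 2))
    if not pairs:
        return False
    mean_sim = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    return mean_sim < threshold
```

In deployment, the generations would come from sampling the guarded LLM several times at non-zero temperature for the same instruction.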
Apart from the intrinsic properties of LLMs, there have been works leveraging linguistic statistical properties, such as detecting outlier words [220].
• Generation Processing (Post-Processing). Generation post-processing refers to examining the properties (e.g., harmfulness) of the generated answers and applying modifications if necessary; this is the final step before delivering responses to users. Chen et al. [34] proposed mitigating the toxicity of generations by comparing among multiple model candidates. Helbling et al. [103] incorporated individual LLMs to identify the harmfulness of generated answers, sharing similar ideas with Xiong et al. [317] and Kadavath et al. [131], who revealed that LLMs can be prompted to report their confidence in the generated responses.
Finding VI. For defense in LLM training, there is a notable scarcity of research examining the impact of model architecture on LLM safety, likely due to the high computational costs associated with training or fine-tuning large language models. We observed that safe instruction tuning is a relatively new development that warrants further investigation and attention.
7. Discussion
7.1. LLM in Other Security Related Topics
LLMs in Cybersecurity Education. LLMs can be used in security practices and education [80, 162, 270]. For example, in a software security course, students are tasked with identifying and resolving vulnerabilities in a web application using LLMs. Jingyue et al. [162] investigated how ChatGPT can be used by students for these exercises. Wesley Tann et al. [270] focused on the evaluation of LLMs in the context of cybersecurity Capture-The-Flag (CTF) exercises (in which participants find "flags" by exploiting system vulnerabilities). The study first assessed the question-answering performance of these LLMs on Cisco certifications with varying difficulty levels, then examined their abilities in solving CTF challenges. Jin et al.
[126] conducted a comprehensive study on LLMs' understanding of binary code semantics [127] across different architectures and optimization levels, providing key insights for future research in this area.
LLMs in Cybersecurity Laws, Policies, and Compliance. LLMs can assist in drafting security policies, guidelines, and compliance documentation, helping organizations meet regulatory requirements and industry standards. However, it is important to recognize that the utilization of LLMs can potentially necessitate changes to current cybersecurity-related laws and policies. The introduction of LLMs may raise new legal and regulatory considerations, as these models can impact various aspects of cybersecurity, data protection, and privacy. Ekenobi et al. [273] examined the legal implications arising from the introduction of LLMs, with a particular focus on data protection and privacy concerns. The work acknowledges that ChatGPT's privacy policy contains commendable provisions for safeguarding user data against potential threats, and it also advocated emphasizing the relevance of the new law.
7.2. Future Directions
We have gleaned valuable lessons that we believe can shape future directions.
• Using LLMs for ML-Specific Tasks. We noticed that LLMs can effectively replace traditional machine learning methods. In this context, if traditional machine learning methods can be employed in a specific security application (whether offensive or defensive in nature), it is highly probable that LLMs can also be applied to address that particular challenge. For instance, traditional machine learning methods have found utility in malware detection, and LLMs can similarly be harnessed for this purpose. Therefore, one promising avenue is to harness the potential of LLMs in security applications where machine learning serves as a foundational or widely adopted technique.
As security researchers, we are capable of designing LLM-based approaches to tackle security issues. Subsequently, we can compare these approaches with state-of-the-art methods to push the boundaries.
• Replacing Human Efforts. It is evident that LLMs have the potential to replace human efforts in both offensive and defensive security applications. For instance, tasks involving social engineering, traditionally reliant on human intervention, can now be effectively executed using LLM techniques. Therefore, one promising avenue for security researchers is to identify areas within traditional security tasks where human involvement has been pivotal and explore opportunities to substitute these human efforts with LLM capabilities.
• Modifying Traditional ML Attacks for LLMs. We have observed that many security vulnerabilities in LLMs are extensions of vulnerabilities found in traditional machine-learning scenarios. That is, LLMs remain a specialized instance of deep neural networks, inheriting common vulnerabilities such as adversarial attacks and instruction tuning attacks. With the right adjustments (e.g., to the threat model), traditional ML attacks can still be effective against LLMs. For instance, the jailbreaking attack is a specific form of instruction tuning attack aimed at producing restricted texts.
• Adapting Traditional ML Defenses for LLMs. The countermeasures traditionally employed for vulnerability mitigation can also be leveraged to address these security issues. For example, there are existing efforts that utilize traditional Privacy-Enhancing Technologies (e.g., zero-knowledge proofs, differential privacy, and federated learning [304, 305]) to tackle privacy challenges posed by LLMs.
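As one concrete example of the Privacy-Enhancing Technologies listed above, federated learning [304, 305] keeps raw data on clients and shares only model updates. A minimal federated-averaging (FedAvg) sketch in numpy follows; the function names and the toy linear-regression task are our own illustrative choices:

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """Plain gradient descent on one client's private data (never shared)."""
    w = w.copy()
    for _ in range(epochs):
        preds = X @ w
        w -= lr * X.T @ (preds - y) / len(X)
    return w

def federated_average(global_w, clients, lr=0.1):
    """One FedAvg round: each client trains locally, then the server
    averages the returned weights, weighted by client dataset size;
    raw data never leaves the clients."""
    sizes = [len(X) for X, _ in clients]
    local_ws = [local_update(global_w, X, y, lr) for X, y in clients]
    total = sum(sizes)
    return sum(n * w for n, w in zip(sizes, local_ws)) / total
```

For LLM-scale models, the same scheme is applied to fine-tuning updates rather than full weights, often combined with differential privacy on the aggregated update.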
Exploring additional PET techniques, whether established methods or innovative approaches, to address these challenges represents another promising research direction.
• Solving Challenges in LLM-Specific Attacks. As previously discussed, there are several challenges associated with implementing model extraction or parameter extraction attacks (e.g., the vast scale of LLM parameters, and the private ownership and confidentiality of powerful LLMs). These novel characteristics introduced by LLMs represent a significant shift in the landscape, potentially leading to new challenges and necessitating the evolution of traditional ML attack methodologies.
8. Related Work
A number of LLM surveys have already been released with a variety of focuses (e.g., LLM evolution and taxonomy [31, 347, 309, 93, 311, 23, 348], software engineering [77, 108], and medicine [274, 44]). In this paper, our primary emphasis is on the security and privacy aspects of LLMs; we now examine the existing literature on this topic. Peter J. Caven [30] specifically explores how LLMs (particularly ChatGPT) could potentially alter the current cybersecurity landscape by blending technical and social aspects, with an emphasis leaning more toward the social aspects. Al-Hawawreh et al. [5] and Marshall et al. [185] discussed the impact of ChatGPT on cybersecurity, highlighting its practical applications (e.g., code security, malware detection). Dhoni et al. [62] demonstrated how LLMs can assist security analysts in developing security solutions against cyber threats; however, their work does not extensively address the potential cybersecurity threats that LLMs may introduce. A number of surveys (e.g., [92, 59, 247, 49, 60, 228, 240, 241, 7]) highlight the threats and attacks against LLMs. In comparison to our work, they do not dedicate as much text to the vulnerabilities that LLMs may possess.
Instead, their primary focus lies in the realm of security applications, as they delve into utilizing LLMs for launching cyberattacks. Qammar et al. [219] and Maximilian et al. [196] discussed vulnerabilities exploited by cybercriminals, with a specific focus on the risks associated with LLMs; their works emphasized the need for strategies and measures to mitigate these threats and vulnerabilities. Haoran Li et al. [166] analyzed current privacy concerns regarding LLMs, categorizing them based on adversary capabilities, and explored existing defense strategies. Glorin Sebastian [242] explored the application of established Privacy-Enhancing Technologies (e.g., differential privacy [70], federated learning [338], and data minimization [216]) for safeguarding the privacy of LLMs. Smith et al. [256] also discussed the privacy risks of LLMs. Our study comprehensively examines both the security and privacy aspects of LLMs. In summary, our research conducted an extensive review of the literature on LLMs from a three-fold perspective: beneficial security applications (e.g., vulnerability detection, secure code generation), adverse implications (e.g., phishing attacks, social engineering), and vulnerabilities (e.g., jailbreaking attacks, prompt attacks), along with their corresponding defensive measures.
9. Conclusion
Our work represents a pioneering effort to systematically examine the multifaceted role of LLMs in security and privacy. On the positive side, LLMs have significantly contributed to enhancing code and data security, while their versatile nature also opens the door to malicious applications. We also delved into the inherent vulnerabilities within these models and discussed defense mechanisms. We have illuminated a path forward for harnessing the positive aspects of LLMs while mitigating their potential risks.
As LLMs continue to evolve and find their place in an ever-expanding array of applications, it is imperative that we remain vigilant in addressing security and privacy concerns, ensuring that these powerful models contribute positively to the digital landscape.
Acknowledgement
We thank the anonymous reviewers and Xin Jin from The Ohio State University for their invaluable feedback. This research was supported partly by the NSF award FMitF-2319242. Any opinions, findings, conclusions, or recommendations expressed are those of the authors and not necessarily of the NSF.
References
[1] M. Abbasian, I. Azimi, A. M. Rahmani, and R. Jain, “Conversational health agents: A personalized llm-powered agent framework,” 2023.
[2] H. Aghakhani, W. Dai, A. Manoel, X. Fernandes, A. Kharkar, C. Kruegel, G. Vigna, D. Evans, B. Zorn, and R. Sim, “Trojanpuzzle: Covertly poisoning code-suggestion models,” arXiv preprint arXiv:2301.02344, 2023.
[3] B. Ahmad, S. Thakur, B. Tan, R. Karri, and H. Pearce, “Fixing hardware security bugs with large language models,” arXiv preprint arXiv:2302.01215, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2302.01215
[4] M. AI, “Introducing llama: A foundational, 65-billion-parameter language model,” https://ai.meta.com/blog/large-language-model-llama-meta-ai/, Feb. 2023, accessed: 2023-11-13.
[5] M. Al-Hawawreh, A. Aljuhani, and Y. Jararweh, “Chatgpt for cybersecurity: practical applications, challenges, and future directions,” Cluster Computing, vol. 26, no. 6, pp. 3421–3436, 2023.
[6] S. Alagarsamy, C. Tantithamthavorn, and A. Aleti, “A3test: Assertion-augmented automated test case generation,” arXiv preprint arXiv:2302.10352, 2023.
[7] M. Alawida, B. A. Shawar, O. I. Abiodun, A. Mehmood, A. E.
Omolara et al., “Unveiling the dark side of chatgpt: Exploring cyberattacks and enhancing user awareness,” 2023.
[8] T. Ali and P. Kostakos, “Huntgpt: Integrating machine learning-based anomaly detection and explainable ai with large language models (llms),” arXiv preprint arXiv:2309.16021, 2023.
[9] E. Ambikairajah, H. Li, L. Wang, B. Yin, and V. Sethu, “Language identification: A tutorial,” IEEE Circuits and Systems Magazine, vol. 11, no. 2, pp. 82–108, 2011.
[10] Z. Amos, “What is fraudgpt?” https://hackernoon.com/what-is-fraudgpt, 2023.
[11] J. R. Anderson and C. J. Lebiere, The atomic components of thought. Psychology Press, 2014.
[12] Anonymous, “On the safety of open-sourced large language models: Does alignment really prevent them from being misused?” in Submitted to The Twelfth International Conference on Learning Representations, 2023, under review. [Online]. Available: https://openreview.net/forum?id=E6Ix4ahpzd
[13] B. B. Arcila, “Is it a platform? is it a search engine? it’s chatgpt! the european liability regime for large language models,” J. Free Speech L., vol. 3, p. 455, 2023.
[14] A. H. Bailey, A. Williams, and A. Cimpian, “Based on billions of words on the internet, people = men,” Science Advances, vol. 8, no. 13, p. eabm2463, 2022.
[15] A. Bakhshandeh, A. Keramatfar, A. Norouzi, and M. M. Chekidehkhoun, “Using chatgpt as a static application security testing tool,” arXiv preprint arXiv:2308.14434, 2023.
[16] S. Barikeri, A. Lauscher, I. Vulić, and G. Glavaš, “Redditbias: A real-world resource for bias evaluation and debiasing of conversational language models,” arXiv preprint arXiv:2106.03521, 2021.
[17] M. Beckerich, L. Plein, and S. Coronado, “Ratgpt: Turning online llms into proxies for malware attacks,” 2023.
[18] S. Ben-Moshe, G. Gekker, and G. Cohen, “Opwnai: Ai that can save the day or hack it away. check point research (2022),” 2023.
[19] A.-R. Bhojani and M. Schwarting, “Truth and regret: Large language models, the quran, and misinformation,” pp.
1–7, 2023.
[20] F. Bianchi, M. Suzgun, G. Attanasio, P. Röttger, D. Jurafsky, T. Hashimoto, and J. Zou, “Safety-tuned llamas: Lessons from improving the safety of large language models that follow instructions,” arXiv preprint arXiv:2309.07875, 2023.
[21] S. Bordia and S. R. Bowman, “Identifying and reducing gender bias in word-level language models,” arXiv preprint arXiv:1904.03035, 2019.
[22] M. Botacin, “Gpthreats-3: Is automatic malware generation a threat?” in 2023 IEEE Security and Privacy Workshops (SPW). IEEE, 2023, pp. 238–254.
[23] S. R. Bowman, “Eight things to know about large language models,” arXiv preprint arXiv:2304.00612, 2023.
[24] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language models are few-shot learners,” 2020.
[25] M. Burgess, “Chatgpt has a plug-in problem,” https://www.wired.com/story/chatgpt-plugins-security-privacy-risk/, 2023.
[26] Y. Cai, S. Mao, W. Wu, Z. Wang, Y. Liang, T. Ge, C. Wu, W. You, T. Song, Y. Xia et al., “Low-code llm: Visual programming over llms,” arXiv preprint arXiv:2304.08103, 2023.
[27] B. Cao, Y. Cao, L. Lin, and J. Chen, “Defending against alignment-breaking attacks via robustly aligned llm,” 2023.
[28] N. Carlini, S. Chien, M. Nasr, S. Song, A. Terzis, and F. Tramer, “Membership inference attacks from first principles,” in 2022 IEEE Symposium on Security and Privacy (SP). IEEE, 2022, pp. 1897–1914.
[29] N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson et al., “Extracting training data from large language models,” in 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 2633–2650.
[30] P. Caven, “A more insecure ecosystem?
chatgpt’s influence on cybersecurity,” ChatGPT’s Influence on Cybersecurity (April 30, 2023), 2023.
[31] Y. Chang, X. Wang, J. Wang, Y. Wu, K. Zhu, H. Chen, L. Yang, X. Yi, C. Wang, Y. Wang et al., “A survey on evaluation of large language models,” arXiv preprint arXiv:2307.03109, 2023.
[32] P. V. S. Charan, H. Chunduri, P. M. Anand, and S. K. Shukla, “From text to mitre techniques: Exploring the malicious use of large language models for generating cyber attack payloads,” 2023.
[33] B. Chen, F. Zhang, A. Nguyen, D. Zan, Z. Lin, J.-G. Lou, and W. Chen, “Codet: Code generation with generated tests,” arXiv preprint arXiv:2207.10397, 2022.
[34] B. Chen, A. Paliwal, and Q. Yan, “Jailbreaker in jail: Moving target defense for large language models,” arXiv preprint arXiv:2310.02417, 2023.
[35] C. Chen and K. Shu, “Can llm-generated misinformation be detected?” 2023.
[36] —, “Combating misinformation in the age of llms: Opportunities and challenges,” arXiv preprint arXiv:2311.05656, 2023.
[37] C. Chen, J. Su, J. Chen, Y. Wang, T. Bi, Y. Wang, X. Lin, T. Chen, and Z. Zheng, “When chatgpt meets smart contract vulnerability detection: How far are we?” arXiv preprint arXiv:2309.05520, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2309.05520
[38] T. Chen, L. Li, L. Zhu, Z. Li, G. Liang, D. Li, Q. Wang, and T. Xie, “Vullibgen: Identifying vulnerable third-party libraries via generative pre-trained model,” arXiv preprint arXiv:2308.04662, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2308.04662
[39] X. Chen, S. Jia, and Y. Xiang, “A review: Knowledge reasoning over knowledge graph,” Expert Systems with Applications, vol. 141, p. 112948, 2020.
[40] Y. Chen, A. Arunasalam, and Z. B. Celik, “Can large language models provide security & privacy advice? measuring the ability of llms to refute misconceptions,” 2023.
[41] A. Cheshkov, P. Zadorozhny, and R. Levichev, “Evaluation of chatgpt model for vulnerability detection,” arXiv preprint arXiv:2304.07232, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2304.07232
[42] C. A. Choquette-Choo, F. Tramer, N. Carlini, and N. Papernot, “Label-only membership inference attacks,” in International Conference on Machine Learning. PMLR, 2021, pp. 1964–1974.
[43] M. Chowdhury, N. Rifat, S. Latif, M. Ahsan, M. S. Rahman, and R. Gomes, “Chatgpt: The curious case of attack vectors’ supply chain management improvement,” in 2023 IEEE International Conference on Electro Information Technology (eIT), 2023, pp. 499–504.
[44] J. Clusmann, F. R. Kolbinger, H. S. Muti, Z. I. Carrero, J.-N. Eckardt, N. G. Laleh, C. M. L. Löffler, S.-C. Schwarzkopf, M. Unger, G. P. Veldhuizen et al., “The future landscape of large language models in medicine,” Communications Medicine, vol. 3, no. 1, p. 141, 2023.
[45] D. R. Cotton, P. A. Cotton, and J. R. Shipway, “Chatting and cheating: Ensuring academic integrity in the era of chatgpt,” Innovations in Education and Teaching International, pp. 1–12, 2023.
[46] G. M. Currie, “Academic integrity and artificial intelligence: is chatgpt hype, hero or heresy?” in Seminars in Nuclear Medicine. Elsevier, 2023.
[47] S. Dai, Y. Zhou, L. Pang, W. Liu, X. Hu, Y. Liu, X. Zhang, and J. Xu, “Llms may dominate information access: Neural retrievers are biased towards llm-generated texts,” arXiv preprint arXiv:2310.20501, 2023.
[48] D. Dale, A. Voronov, D. Dementieva, V. Logacheva, O. Kozlova, N. Semenov, and A. Panchenko, “Text detoxification using large pre-trained neural models,” arXiv preprint arXiv:2109.08914, 2021.
[49] B. Dash and P. Sharma, “Are chatgpt and deepfake algorithms endangering the cybersecurity industry?
a review,” International Journal of Engineering and Applied Sciences, vol. 10, no. 1, 2023.
[50] Databricks, “Free dolly: Introducing the world’s first open and commercially viable instruction-tuned llm,” https://w.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm, 2023, accessed: 2023-11-13.
[51] E. Debenedetti, G. Severi, N. Carlini, C. A. Choquette-Choo, M. Jagielski, M. Nasr, E. Wallace, and F. Tramèr, “Privacy side channels in machine learning systems,” arXiv preprint arXiv:2309.05610, 2023.
[52] D. Delley, “Wormgpt – the generative ai tool cybercriminals are using to launch business email compromise attacks,” https://shorturl.at/iwFL7, 2023.
[53] G. Deng, Y. Liu, Y. Li, K. Wang, Y. Zhang, Z. Li, H. Wang, T. Zhang, and Y. Liu, “Jailbreaker: Automated jailbreak across multiple large language model chatbots,” arXiv preprint arXiv:2307.08715, 2023.
[54] ——, “Masterkey: Automated jailbreaking of large language model chatbots,” in Proceedings of the 31st Annual Network and Distributed System Security Symposium (NDSS’24), 2024.
[55] G. Deng, Y. Liu, V. Mayoral-Vilches, P. Liu, Y. Li, Y. Xu, T. Zhang, Y. Liu, M. Pinzger, and S. Rass, “Pentestgpt: An llm-empowered automatic penetration testing tool,” arXiv preprint arXiv:2308.06782, 2023.
[56] Y. Deng, C. S. Xia, H. Peng, C. Yang, and L. Zhang, “Fuzzing deep-learning libraries via large language models,” arXiv preprint arXiv:2212.14834, 2022.
[57] Y. Deng, C. S. Xia, C. Yang, S. D. Zhang, S. Yang, and L. Zhang, “Large language models are edge-case fuzzers: Testing deep learning libraries via fuzzgpt,” arXiv preprint arXiv:2304.02014, 2023.
[58] ——, “Large language models are edge-case generators: Crafting unusual programs for fuzzing deep learning libraries,” in 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE), 2024, pp. 830–842.
[59] E. Derner, K. Batistič, J. Zahálka, and R.
Babuška, “A security risk taxonomy for large language models,” arXiv preprint arXiv:2311.11415, 2023.
[60] E. Derner and K. Batistič, “Beyond the safeguards: Exploring the security risks of chatgpt,” 2023.
[61] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” 2019.
[62] P. Dhoni and R. Kumar, “Synergizing generative ai and cybersecurity: Roles of generative ai entities, companies, agencies, and government in enhancing cybersecurity,” 2023.
[63] H. Ding, V. Kumar, Y. Tian, Z. Wang, R. Kwiatkowski, X. Li, M. K. Ramanathan, B. Ray, P. Bhatia, S. Sengupta et al., “A static evaluation of code completion by large language models,” arXiv preprint arXiv:2306.03203, 2023.
[64] X. Ding, L. Chen, M. Emani, C. Liao, P.-H. Lin, T. Vanderbruggen, Z. Xie, A. Cerpa, and W. Du, “Hpc-gpt: Integrating large language model for high-performance computing,” in Proceedings of the SC ’23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, ser. SC-W 2023. ACM, Nov. 2023. [Online]. Available: http://dx.doi.org/10.1145/3624062.3624172
[65] X. Dong, Y. Wang, P. S. Yu, and J. Caverlee, “Probing explicit and implicit gender bias through llm conditional text generation,” arXiv preprint arXiv:2311.00306, 2023.
[66] X. Dong, A. T. Luu, M. Lin, S. Yan, and H. Zhang, “How should pre-trained language models be fine-tuned towards adversarial robustness?” Advances in Neural Information Processing Systems, vol. 34, pp. 4356–4369, 2021.
[67] J. Duan, H. Cheng, S. Wang, C. Wang, A. Zavalny, R. Xu, B. Kailkhura, and K. Xu, “Shifting attention to relevance: Towards the uncertainty estimation of large language models,” arXiv preprint arXiv:2307.01379, 2023.
[68] J. Duan, F. Kong, S. Wang, X. Shi, and K. Xu, “Are diffusion models vulnerable to membership inference attacks?” in Proceedings of the 40th International Conference on Machine Learning, 2023, pp. 8717–8730.
[69] F.
Duarte, “Number of chatgpt users (nov 2023),” https://explodingtopics.com/blog/chatgpt-users, 2023, accessed: 2023-11-13.
[70] C. Dwork, “Differential privacy,” in International Colloquium on Automata, Languages, and Programming. Springer, 2006, pp. 1–12.
[71] C. Egersdoerfer, D. Zhang, and D. Dai, “Early exploration of using chatgpt for log-based anomaly detection on parallel file systems logs,” 2023.
[72] D. O. Eke, “Chatgpt and the rise of generative ai: threat to academic integrity?” Journal of Responsible Technology, vol. 13, p. 100060, 2023.
[73] A. Elhafsi, R. Sinha, C. Agia, E. Schmerling, I. A. Nesnas, and M. Pavone, “Semantic anomaly detection with large language models,” Autonomous Robots, pp. 1–21, 2023.
[74] S. Eli and D. Gil, “Self-enhancing pattern detection with llms: Our answer to uncovering malicious packages at scale,” https://apiiro.com/blog/llm-code-pattern-malicious-package-detection/, 2023, accessed: 2023-11-13.
[75] T. Espinha Gasiba, K. Oguzhan, I. Kessba, U. Lechner, and M. Pinto-Albuquerque, “I’m sorry dave, i’m afraid i can’t fix your code: On chatgpt, cybersecurity, and secure coding,” in 4th International Computer Programming Education Conference (ICPEC 2023). Schloss Dagstuhl-Leibniz Zentrum für Informatik, 2023.
[76] P. V. Falade, “Decoding the threat landscape: Chatgpt, fraudgpt, and wormgpt in social engineering attacks,” International Journal of Scientific Research in Computer Science, Engineering and Information Technology, pp. 185–198, Oct. 2023. [Online]. Available: http://dx.doi.org/10.32628/CSEIT2390533
[77] A. Fan, B. Gokkaya, M. Harman, M. Lyubarskiy, S. Sengupta, S. Yoo, and J. M. Zhang, “Large language models for software engineering: Survey and open problems,” 2023.
[78] T. Fan, Y. Kang, G. Ma, W. Chen, W. Wei, L. Fan, and Q. Yang, “Fate-llm: A industrial grade federated learning framework for large language models,” arXiv preprint arXiv:2310.10049, 2023.
[79] X. Fang, S. Che, M. Mao, H. Zhang, M. Zhao, and X.
Zhao, “Bias of ai-generated content: An examination of news produced by large language models,” arXiv preprint arXiv:2309.09825, 2023.
[80] J. C. Farah, B. Spaenlehauer, V. Sharma, M. J. Rodríguez-Triana, S. Ingram, and D. Gillet, “Impersonating chatbots in a code review exercise to teach software engineering best practices,” in 2022 IEEE Global Engineering Education Conference (EDUCON). IEEE, 2022, pp. 1634–1642.
[81] V. K. Felkner, H.-C. H. Chang, E. Jang, and J. May, “Winoqueer: A community-in-the-loop benchmark for anti-lgbtq+ bias in large language models,” arXiv preprint arXiv:2306.15087, 2023.
[82] S. Y. Feng, V. Gangal, J. Wei, S. Chandar, S. Vosoughi, T. Mitamura, and E. Hovy, “A survey of data augmentation approaches for nlp,” arXiv preprint arXiv:2105.03075, 2021.
[83] M. Fu, C. Tantithamthavorn, V. Nguyen, and T. Le, “Chatgpt for vulnerability detection, classification, and repair: How far are we?” 2023.
[84] W. Fu, H. Wang, C. Gao, G. Liu, Y. Li, and T. Jiang, “Practical membership inference attacks against fine-tuned large language models via self-prompt calibration,” 2023.
[85] ——, “A probabilistic fluctuation based membership inference attack for diffusion models,” 2023.
[86] P. Ganesh, H. Chang, M. Strobel, and R. Shokri, “On the impact of machine learning randomness on group fairness,” in Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023, pp. 1789–1800.
[87] C. A. Gao, F. M. Howard, N. S. Markov, E. C. Dyer, S. Ramesh, Y. Luo, and A. T. Pearson, “Comparing scientific abstracts generated by chatgpt to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers,” BioRxiv, p. 2022–12, 2022.
[88] S. Gehman, S. Gururangan, M. Sap, Y. Choi, and N. A.
Smith, “Realtoxicityprompts: Evaluating neural toxic degeneration in language models,” arXiv preprint arXiv:2009.11462, 2020.
[89] K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “More than you’ve asked for: A comprehensive analysis of novel prompt injection threats to application-integrated large language models,” arXiv preprint arXiv:2302.12173, 2023.
[90] Q. Gu, “Llm-based code generation method for golang compiler testing,” 2023.
[91] Z. Gu, B. Zhu, G. Zhu, Y. Chen, M. Tang, and J. Wang, “Anomalygpt: Detecting industrial anomalies using large vision-language models,” arXiv preprint arXiv:2308.15366, 2023.
[92] M. Gupta, C. Akiri, K. Aryal, E. Parker, and L. Praharaj, “From chatgpt to threatgpt: Impact of generative ai in cybersecurity and privacy,” IEEE Access, 2023.
[93] M. U. Hadi, R. Qureshi, A. Shah, M. Irfan, A. Zafar, M. Shaikh, N. Akhtar, J. Wu, and S. Mirjalili, “A survey on large language models: Applications, challenges, limitations, and practical usage,” TechRxiv, 2023.
[94] A. Happe and J. Cito, “Getting pwn’d by ai: Penetration testing with large language models,” arXiv preprint arXiv:2308.00121, 2023.
[95] A. Happe, A. Kaplan, and J. Cito, “Evaluating llms for privilege-escalation scenarios,” 2023.
[96] J. Hayes, L. Melis, G. Danezis, and E. De Cristofaro, “Logan: Membership inference attacks against generative models,” arXiv preprint arXiv:1705.07663, 2017.
[97] J. Hazell, “Large language models can be used to effectively scale spear phishing campaigns,” 2023.
[98] J. He and M. Vechev, “Large language models for code: Security hardening and adversarial testing,” ICML 2023 Workshop DeployableGenerativeAI, 2023. Keywords: large language models, code generation, security, prompt tuning.
[99] ——, “Large language models for code: Security hardening and adversarial testing,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 2023, pp. 1865–1879.
[100] X. He, S. Zannettou, Y. Shen, and Y.
Zhang, “You only prompt once: On the capabilities of prompt learning on large language models to tackle toxic content,” arXiv preprint arXiv:2308.05596, 2023.
[101] ——, “You only prompt once: On the capabilities of prompt learning on large language models to tackle toxic content,” in 2024 IEEE Symposium on Security and Privacy (SP), 2024.
[102] F. Heiding, B. Schneier, A. Vishwanath, and J. Bernstein, “Devising and detecting phishing: Large language models vs. smaller human models,” 2023.
[103] A. Helbling, M. Phute, M. Hull, and D. H. Chau, “Llm self defense: By self examination, llms know they are being tricked,” arXiv preprint arXiv:2308.07308, 2023.
[104] R. Helmke and J. vom Dorp, “Check for extended abstract: Towards reliable and scalable linux kernel cve attribution in automated static firmware analyses,” in Detection of Intrusions and Malware, and Vulnerability Assessment: 20th International Conference, DIMVA 2023, Hamburg, Germany, July 12–14, 2023, Proceedings, vol. 13959. Springer Nature, 2023, p. 201.
[105] P. Henrik, “Llm-assisted malware review: Ai and humans join forces to combat malware,” https://shorturl.at/loqT4, 2023, accessed: 2023-11-13.
[106] D. Hernandez, T. Brown, T. Conerly, N. DasSarma, D. Drain, S. El-Showk, N. Elhage, Z. Hatfield-Dodds, T. Henighan, T. Hume et al., “Scaling laws and interpretability of learning from repeated data,” arXiv preprint arXiv:2205.10487, 2022.
[107] B. Hettwer, S. Gehrer, and T. Güneysu, “Applications of machine learning techniques in side-channel attacks: a survey,” Journal of Cryptographic Engineering, vol. 10, pp. 135–162, 2020.
[108] X. Hou, Y. Zhao, Y. Liu, Z. Yang, K. Wang, L. Li, X. Luo, D. Lo, J. Grundy, and H. Wang, “Large language models for software engineering: A systematic literature review,” arXiv preprint arXiv:2308.10620, 2023.
[109] J. Hu, Q. Zhang, and H. Yin, “Augmenting greybox fuzzing with generative ai,” arXiv preprint arXiv:2306.06782, 2023.
[110] S. Hu, T. Huang, F. İlhan, S. F. Tekin, and L.
Liu, “Large language model-powered smart contract vulnerability detection: New perspectives,” arXiv preprint arXiv:2310.01152, 2023, 10 pages. [Online]. Available: https://doi.org/10.48550/arXiv.2310.01152
[111] D. Huang, Q. Bu, J. Zhang, X. Xie, J. Chen, and H. Cui, “Bias assessment and mitigation in llm-based code generation,” arXiv preprint arXiv:2309.14345, 2023.
[112] H. Huang, W. Luo, G. Zeng, J. Weng, Y. Zhang, and A. Yang, “Damia: leveraging domain adaptation as a defense against membership inference attacks,” IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 5, pp. 3183–3199, 2021.
[113] J. Huang, H. Shao, and K. C.-C. Chang, “Are large pre-trained language models leaking your personal information?” arXiv preprint arXiv:2205.12628, 2022.
[114] V. M. Igure and R. D. Williams, “Taxonomies of attacks and vulnerabilities in computer systems,” IEEE Communications Surveys & Tutorials, vol. 10, no. 1, pp. 6–19, 2008.
[115] U. Iqbal, T. Kohno, and F. Roesner, “Llm platform security: Applying a systematic evaluation framework to openai’s chatgpt plugins,” 2023.
[116] M. Ivgi and J. Berant, “Achieving model robustness through discrete adversarial training,” arXiv preprint arXiv:2104.05062, 2021.
[117] N. Jain, A. Schwarzschild, Y. Wen, G. Somepalli, J. Kirchenbauer, P.-y. Chiang, M. Goldblum, A. Saha, J. Geiping, and T. Goldstein, “Baseline defenses for adversarial attacks against aligned language models,” arXiv preprint arXiv:2309.00614, 2023.
[118] R. Jain, N. Gervasoni, M. Ndhlovu, and S. Rawat, “A code centric evaluation of c/c++ vulnerability datasets for deep learning based vulnerability detection techniques,” in Proceedings of the 16th Innovations in Software Engineering Conference, 2023, pp. 1–10.
[119] S. Jamal and H. Wimmer, “An improved transformer-based model for detecting phishing, spam, and ham: A large language model approach,” 2023.
[120] B. Jayaraman, L. Wang, K. Knipmeyer, Q. Gu, and D.
Evans, “Revisiting membership inference under realistic assumptions,” arXiv preprint arXiv:2005.10881, 2020.
[121] H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and T. Zhao, “Smart: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization,” arXiv preprint arXiv:1911.03437, 2019.
[122] J. Jiang, X. Liu, and C. Fan, “Low-parameter federated learning with large language models,” arXiv preprint arXiv:2307.13896, 2023.
[123] N. Jiang, K. Liu, T. Lutellier, and L. Tan, “Impact of code language models on automated program repair,” 2023.
[124] S. Jiang, X. Chen, and R. Tang, “Prompt packer: Deceiving llms through compositional instruction with hidden attacks,” arXiv preprint arXiv:2310.10077, 2023.
[125] M. Jin, S. Shahriar, M. Tufano, X. Shi, S. Lu, N. Sundaresan, and A. Svyatkovskiy, “Inferfix: End-to-end program repair with llms,” 2023.
[126] X. Jin, J. Larson, W. Yang, and Z. Lin, “Binary code summarization: Benchmarking chatgpt/gpt-4 and other large language models,” 2023.
[127] X. Jin, K. Pei, J. Y. Won, and Z. Lin, “Symlm: Predicting function names in stripped binaries via context-sensitive execution-aware code embeddings,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022, pp. 1631–1645.
[128] C. Joshi, U. K. Singh, and K. Tarey, “A review on taxonomies of attacks and vulnerability in computer and network system,” International Journal, vol. 5, no. 1, 2015.
[129] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov, “Fasttext.zip: Compressing text classification models,” arXiv preprint arXiv:1612.03651, 2016.
[130] M. Juuti, S. Szyller, S. Marchal, and N. Asokan, “Prada: protecting against dnn model stealing attacks,” in 2019 IEEE European Symposium on Security and Privacy (EuroS&P).
IEEE, 2019, pp. 512–527.
[131] S. Kadavath, T. Conerly, A. Askell, T. Henighan, D. Drain, E. Perez, N. Schiefer, Z. Hatfield-Dodds, N. DasSarma, E. Tran-Johnson et al., “Language models (mostly) know what they know,” arXiv preprint arXiv:2207.05221, 2022.
[132] N. Kandpal, M. Jagielski, F. Tramèr, and N. Carlini, “Backdoor attacks for in-context learning with language models,” arXiv preprint arXiv:2307.14692, 2023.
[133] N. Kandpal, K. Pillutla, A. Oprea, P. Kairouz, C. A. Choquette-Choo, and Z. Xu, “User inference attacks on large language models,” 2023.
[134] N. Kandpal, E. Wallace, and C. Raffel, “Deduplicating training data mitigates privacy risks in language models,” in International Conference on Machine Learning. PMLR, 2022, pp. 10697–10707.
[135] D. Kang, X. Li, I. Stoica, C. Guestrin, M. Zaharia, and T. Hashimoto, “Exploiting programmatic behavior of llms: Dual-use through standard security attacks,” arXiv preprint arXiv:2302.05733, 2023.
[136] S. Kang, J. Yoon, and S. Yoo, “Llm lies: Hallucinations are not bugs, but features as adversarial examples,” in 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2023.
[137] S. Kariyappa, A. Prakash, and M. K. Qureshi, “Maze: Data-free model stealing attack using zeroth-order gradient estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 13814–13823.
[138] M. Karpinska and M. Iyyer, “Large language models effectively leverage document-level context for literary translation, but critical errors persist,” arXiv preprint arXiv:2304.03245, 2023.
[139] M. Khalil and E. Er, “Will chatgpt get you caught? rethinking of plagiarism detection,” arXiv preprint arXiv:2302.04335, 2023.
[140] J. Kirchenbauer, J. Geiping, Y. Wen, M. Shu, K. Saifullah, K. Kong, K. Fernando, A. Saha, M. Goldblum, and T. Goldstein, “On the reliability of watermarks for large language models,” arXiv preprint arXiv:2306.04634, 2023.
[141] C.
Koch, “I used gpt-3 to find 213 security vulnerabilities in a single codebase,” http://surl.li/ncjvo, 2023.
[142] T. Koide, N. Fukushi, H. Nakano, and D. Chiba, “Detecting phishing sites using chatgpt,” arXiv preprint arXiv:2306.05816, 2023.
[143] F. Kong, J. Duan, R. Ma, H. Shen, X. Zhu, X. Shi, and K. Xu, “An efficient membership inference attack for the diffusion model by proximal initialization,” arXiv preprint arXiv:2305.18355, 2023.
[144] H. Kotek, R. Dockum, and D. Sun, “Gender bias and stereotypes in large language models,” in Proceedings of The ACM Collective Intelligence Conference, 2023, pp. 12–24.
[145] W. Kuang, B. Qian, Z. Li, D. Chen, D. Gao, X. Pan, Y. Xie, Y. Li, B. Ding, and J. Zhou, “Federatedscope-llm: A comprehensive package for fine-tuning large language models in federated learning,” arXiv preprint arXiv:2309.00363, 2023.
[146] K. Kumari, A. Pegoraro, H. Fereidooni, and A.-R. Sadeghi, “Demasq: Unmasking the chatgpt wordsmith,” arXiv preprint arXiv:2311.05019, 2023.
[147] ——, “Demasq: Unmasking the chatgpt wordsmith,” in Proceedings of the 31st Annual Network and Distributed System Security Symposium (NDSS’24), 2024.
[148] K. Kurita, P. Michel, and G. Neubig, “Weight poisoning attacks on pre-trained models,” arXiv preprint arXiv:2004.06660, 2020.
[149] H. Kwon, M. Sim, G. Song, M. Lee, and H. Seo, “Novel approach to cryptography implementation using chatgpt,” Cryptology ePrint Archive, Paper 2023/606, 2023. [Online]. Available: https://eprint.iacr.org/2023/606
[150] J. E. Laird, C. Lebiere, and P. S. Rosenbloom, “A standard model of the mind: Toward a common computational framework across artificial intelligence, cognitive science, neuroscience, and robotics,” AI Magazine, vol. 38, no. 4, pp. 13–26, 2017.
[151] T. Langford and B. Payne, “Phishing faster: Implementing chatgpt into phishing campaigns,” in Proceedings of the Future Technologies Conference. Springer, 2023, pp. 174–187.
[152] H. Laurençon, L. Saulnier, T.
Wang, C. Akiki, A. Villanova del Moral, T. Le Scao, L. Von Werra, C. Mou, E. González Ponferrada, H. Nguyen et al., “The bigscience roots corpus: A 1.6 tb composite multilingual dataset,” Advances in Neural Information Processing Systems, vol. 35, pp. 31809–31826, 2022.
[153] K. Lee, D. Ippolito, A. Nystrom, C. Zhang, D. Eck, C. Callison-Burch, and N. Carlini, “Deduplicating training data makes language models better,” arXiv preprint arXiv:2107.06499, 2021.
[154] T. Lee, S. Hong, J. Ahn, I. Hong, H. Lee, S. Yun, J. Shin, and G. Kim, “Who wrote this code? watermarking for code generation,” 2023.
[155] J. A. Leite, O. Razuvayevskaya, K. Bontcheva, and C. Scarton, “Detecting misinformation with llm-predicted credibility signals and weak supervision,” arXiv preprint arXiv:2309.07601, 2023.
[156] C. Lemieux, J. P. Inala, S. K. Lahiri, and S. Sen, “Codamosa: Escaping coverage plateaus in test generation with pre-trained large language models,” in International Conference on Software Engineering (ICSE), 2023.
[157] J. Leskovec, A. Rajaraman, and J. D. Ullman, Mining of Massive Data Sets. Cambridge University Press, 2020.
[158] C. Li, Z. Song, W. Wang, and C. Yang, “A theoretical insight into attack and defense of gradient leakage in transformer,” arXiv preprint arXiv:2311.13624, 2023.
[159] H. Li, D. Guo, W. Fan, M. Xu, and Y. Song, “Multi-step jailbreaking privacy attacks on chatgpt,” arXiv preprint arXiv:2304.05197, 2023.
[160] H. Li, Y. Song, and L. Fan, “You don’t know my favorite color: Preventing dialogue representations from revealing speakers’ private personas,” in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, M. Carpuat, M.-C. de Marneffe, and I. V. Meza Ruiz, Eds. Seattle, United States: Association for Computational Linguistics, Jul. 2022, pp. 5858–5870. [Online]. Available: https://aclanthology.org/2022.naacl-main.429
[161] J. Li, Y. Yang, Z. Wu, V. Vydiswaran, and C.
Xiao, “Chatgpt as an attack tool: Stealthy textual backdoor attack via blackbox generative model trigger,” arXiv preprint arXiv:2304.14475, 2023.
[162] J. Li, P. H. Meland, J. S. Notland, A. Storhaug, and J. H. Tysse, “Evaluating the impact of chatgpt on exercises of a software security course,” 2023.
[163] L. Li and X. Qiu, “Token-aware virtual adversarial training in natural language understanding,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 9, 2021, pp. 8410–8418.
[164] L. Li, D. Song, and X. Qiu, “Text adversarial purification as defense against adversarial attacks,” arXiv preprint arXiv:2203.14207, 2022.
[165] X. Li, F. Tramer, P. Liang, and T. Hashimoto, “Large language models can be strong differentially private learners,” arXiv preprint arXiv:2110.05679, 2021.
[166] Y. Li, Z. Tan, and Y. Liu, “Privacy-preserving prompt tuning for large language model services,” arXiv preprint arXiv:2305.06212, 2023.
[167] Y. Li, S. Liu, K. Chen, X. Xie, T. Zhang, and Y. Liu, “Multi-target backdoor attacks for code pre-trained models,” 2023.
[168] Z. Li, B. Peng, P. He, and X. Yan, “Evaluating the instruction-following robustness of large language models to prompt injection,” 2023. [Online]. Available: https://api.semanticscholar.org/CorpusID:261048972
[169] Z. Li, C. Wang, S. Wang, and C. Gao, “Protecting intellectual property of large language model-based code generation apis via watermarks,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 2023, pp. 2336–2350.
[170] P. Liang, R. Bommasani, T. Lee, D. Tsipras, D. Soylu, M. Yasunaga, Y. Zhang, D. Narayanan, Y. Wu, A. Kumar, B. Newman, B. Yuan, B. Yan, C. Zhang, C. Cosgrove, C. D. Manning, C. Ré, D. Acosta-Navas, D. A. Hudson, E. Zelikman, E. Durmus, F. Ladhak, F. Rong, H. Ren, H. Yao, J. Wang, K.
Santhanam, L. Orr, L. Zheng, M. Yuksekgonul, M. Suzgun, N. Kim, N. Guha, N. Chatterji, O. Khattab, P. Henderson, Q. Huang, R. Chi, S. M. Xie, S. Santurkar, S. Ganguli, T. Hashimoto, T. Icard, T. Zhang, V. Chaudhary, W. Wang, X. Li, Y. Mai, Y. Zhang, and Y. Koreeda, “Holistic evaluation of language models,” 2023.
[171] S. Lin, J. Hilton, and O. Evans, “Truthfulqa: Measuring how models mimic human falsehoods,” arXiv preprint arXiv:2109.07958, 2021.
[172] B. Liu, B. Xiao, X. Jiang, S. Cen, X. He, W. Dou et al., “Adversarial attacks on large language model-based system and mitigating strategies: A case study on chatgpt,” Security and Communication Networks, vol. 2023, 2023.
[173] C. Liu, F. Zhao, L. Qing, Y. Kang, C. Sun, K. Kuang, and F. Wu, “A chinese prompt attack dataset for llms with evil content,” arXiv preprint arXiv:2309.11830, 2023.
[174] P. Liu, C. Sun, Y. Zheng, X. Feng, C. Qin, Y. Wang, Z. Li, and L. Sun, “Harnessing the power of llm to support binary taint analysis,” 2023.
[175] T. Liu, Z. Deng, G. Meng, Y. Li, and K. Chen, “Demystifying rce vulnerabilities in llm-integrated apps,” 2023.
[176] X. Liu, H. Cheng, P. He, W. Chen, Y. Wang, H. Poon, and J. Gao, “Adversarial training for large neural language models,” arXiv preprint arXiv:2004.08994, 2020.
[177] X. Liu, N. Xu, M. Chen, and C. Xiao, “Autodan: Generating stealthy jailbreak prompts on aligned large language models,” arXiv preprint arXiv:2310.04451, 2023.
[178] Y. Liu, G. Deng, Y. Li, K. Wang, T. Zhang, Y. Liu, H. Wang, Y. Zheng, and Y. Liu, “Prompt injection attack against llm-integrated applications,” arXiv preprint arXiv:2306.05499, 2023.
[179] C. K. Lo, “What is the impact of chatgpt on education? a rapid review of the literature,” Education Sciences, vol. 13, no. 4, p. 410, 2023.
[180] V. Logacheva, D. Dementieva, S. Ustyantsev, D. Moskovskiy, D. Dale, I. Krotova, N. Semenov, and A.
Panchenko, “Paradetox: Detoxification with parallel data,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022, pp. 6804–6818.
[181] L. Lyu, X. He, and Y. Li, “Differentially private representation for NLP: Formal guarantee and an empirical study on privacy and fairness,” in Findings of the Association for Computational Linguistics: EMNLP 2020, T. Cohn, Y. He, and Y. Liu, Eds. Online: Association for Computational Linguistics, Nov. 2020, pp. 2355–2365. [Online]. Available: https://aclanthology.org/2020.findings-emnlp.213
[182] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083, 2017.
[183] S. Mahloujifar, H. A. Inan, M. Chase, E. Ghosh, and M. Hasegawa, “Membership inference on word embedding and beyond,” ArXiv, vol. abs/2106.11384, 2021. [Online]. Available: https://api.semanticscholar.org/CorpusID:235593386
[184] J. Majmudar, C. Dupuy, C. Peris, S. Smaili, R. Gupta, and R. Zemel, “Differentially private decoding in large language models,” arXiv preprint arXiv:2205.13621, 2022.
[185] J. Marshall, “What effects do large language models have on cybersecurity,” 2023.
[186] A. B. Mbakwe, I. Lourentzou, L. A. Celi, O. J. Mechanic, and A. Dagan, “Chatgpt passing usmle shines a spotlight on the flaws of medical education,” p. e0000205, 2023.
[187] T. McIntosh, T. Liu, T. Susnjak, H. Alavizadeh, A. Ng, R. Nowrozy, and P. Watters, “Harnessing gpt-4 for generation of cybersecurity grc policies: A focus on ransomware attack mitigation,” Computers & Security, vol. 134, p. 103424, 2023.
[188] N. Meade, E. Poole-Dayan, and S. Reddy, “An empirical survey of the effectiveness of debiasing techniques for pre-trained language models,” arXiv preprint arXiv:2110.08527, 2021.
[189] M. Méndez Real and R. Salvador, “Physical side-channel attacks on embedded neural networks: A survey,” Applied Sciences, vol. 11, no. 15, p.
6790, 2021.
[190] R. Meng, M. Mirchev, M. Böhme, and A. Roychoudhury, “Large language model guided protocol fuzzing,” in Proceedings of the 31st Annual Network and Distributed System Security Symposium (NDSS’24), 2024.
[191] F. Mireshghallah, K. Goyal, A. Uniyal, T. Berg-Kirkpatrick, and R. Shokri, “Quantifying privacy risks of masked language models using membership inference attacks,” 2022.
[192] F. Mireshghallah, A. Uniyal, T. Wang, D. Evans, and T. Berg-Kirkpatrick, “An empirical analysis of memorization in fine-tuned autoregressive language models,” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Y. Goldberg, Z. Kozareva, and Y. Zhang, Eds. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics, Dec. 2022, pp. 1816–1826. [Online]. Available: https://aclanthology.org/2022.emnlp-main.119
[193] W. Mo, J. Xu, Q. Liu, J. Wang, J. Yan, C. Xiao, and M. Chen, “Test-time backdoor mitigation for black-box large language models with defensive demonstrations,” arXiv preprint arXiv:2311.09763, 2023.
[194] A. Monje, A. Monje, R. A. Hallman, and G. Cybenko, “Being a bad influence on the kids: Malware generation in less than five minutes using chatgpt,” 2023.
[195] D. Moskovskiy, D. Dementieva, and A. Panchenko, “Exploring cross-lingual text detoxification with large multilingual language models,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, 2022, pp. 346–354.
[196] M. Mozes, X. He, B. Kleinberg, and L. D. Griffin, “Use of llms for illicit purposes: Threats, prevention measures, and vulnerabilities,” 2023.
[197] M. Nair, R. Sadhukhan, and D. Mukhopadhyay, “Generating secure hardware using chatgpt resistant to cwes,” Cryptology ePrint Archive, Paper 2023/212, 2023. [Online]. Available: https://eprint.iacr.org/2023/212
[198] S. Narang and A.
Chowdhery, “Pathways language model (palm): Scaling to 540 billion parameters for breakthrough performance,” https://blog.research.google/2022/04/pathways-language-model-palm-scaling-to.html, Apr. 2022, accessed: 2023-11-13.
[199] A. Ni, S. Iyer, D. Radev, V. Stoyanov, W.-t. Yih, S. Wang, and X. V. Lin, “Lever: Learning to verify language-to-code generation with execution,” in International Conference on Machine Learning. PMLR, 2023, pp. 26106–26128.
[200] S. Nikolic, S. Daniel, R. Haque, M. Belkina, G. M. Hassan, S. Grundy, S. Lyden, P. Neal, and C. Sandison, “Chatgpt versus engineering education assessment: a multidisciplinary and multi-institutional benchmarking and analysis of this generative artificial intelligence tool to investigate assessment integrity,” European Journal of Engineering Education, pp. 1–56, 2023.
[201] D. Noever, “Can large language models find and fix vulnerable software?” arXiv preprint arXiv:2308.10345, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2308.10345
[202] C. Novelli, F. Casolari, A. Rotolo, M. Taddeo, and L. Floridi, “Taking ai risks seriously: a new assessment model for the ai act,” AI & SOCIETY, pp. 1–5, 2023.
[203] OpenAI, “Gpt-4 technical report,” https://arxiv.org/abs/2303.08774, 2023.
[204] N. Ousidhoum, X. Zhao, T. Fang, Y. Song, and D.-Y. Yeung, “Probing toxic content in large pre-trained language models,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021, pp. 4262–4274.
[205] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al., “Training language models to follow instructions with human feedback,” Advances in Neural Information Processing Systems, vol. 35, pp.
27730–27744, 2022.
[206] OWASP, “OWASP Top 10 for LLM,” Oct. 2023. [Online]. Available: https://owasp.org/w-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-2023-v1_1.pdf
[207] Y. M. Pa Pa, S. Tanizaki, T. Kou, M. Van Eeten, K. Yoshioka, and T. Matsumoto, “An attacker’s dream? exploring the capabilities of chatgpt for developing malware,” in Proceedings of the 16th Cyber Security Experimentation and Test Workshop, 2023, pp. 10–18.
[208] X. Pan, M. Zhang, S. Ji, and M. Yang, “Privacy risks of general-purpose language models,” in 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020, pp. 1314–1331.
[209] S. Paria, A. Dasgupta, and S. Bhunia, “Divas: An llm-based end-to-end framework for soc security analysis and policy-based protection,” arXiv preprint arXiv:2308.06932, 2023.
[210] R. Parikh, C. Dupuy, and R. Gupta, “Canary extraction in natural language understanding models,” arXiv preprint arXiv:2203.13920, 2022.
[211] H. Pearce, B. Tan, B. Ahmad, R. Karri, and B. Dolan-Gavitt, “Examining zero-shot vulnerability repair with large language models,” in 2023 IEEE Symposium on Security and Privacy (SP), 2023, pp. 2339–2356.
[212] H. Pearce, B. Tan, P. Krishnamurthy, F. Khorrami, R. Karri, and B. Dolan-Gavitt, “Pop quiz! can a large language model help with reverse engineering?” 2022.
[213] G. Penedo, Q. Malartic, D. Hesslow, R. Cojocaru, A. Cappelli, H. Alobeidli, B. Pannier, E. Almazrouei, and J. Launay, “The refinedweb dataset for falcon llm: outperforming curated corpora with web data, and web data only,” arXiv preprint arXiv:2306.01116, 2023.
[214] C. Peris, C. Dupuy, J. Majmudar, R. Parikh, S. Smaili, R. Zemel, and R. Gupta, “Privacy in the time of language models,” in Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 2023, pp. 1291–1292.
[215] M.
Perkins, “Academic integrity considerations of ai large language models in the post-pandemic era: Chatgpt and beyond,” Journal of University Teaching & Learning Practice, vol. 20, no. 2, p. 07, 2023.
[216] A. Pfitzmann and M. Hansen, “A terminology for talking about privacy by data minimization: Anonymity, unlinkability, undetectability, unobservability, pseudonymity, and identity management,” 2010.
[217] V.-T. Pham, M. Böhme, and A. Roychoudhury, “Aflnet: a greybox fuzzer for network protocols,” in 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST). IEEE, 2020, p. 460–465.
[218] M. D. Purba, A. Ghosh, B. J. Radford, and B. Chu, “Software vulnerability detection using large language models,” in 2023 IEEE 34th International Symposium on Software Reliability Engineering Workshops (ISSREW), 2023, p. 112–119.
[219] A. Qammar, H. Wang, J. Ding, A. Naouri, M. Daneshmand, and H. Ning, “Chatbots to chatgpt in a cybersecurity space: Evolution, vulnerabilities, attacks, challenges, and future recommendations,” 2023.
[220] F. Qi, Y. Chen, M. Li, Y. Yao, Z. Liu, and M. Sun, “Onion: A simple and effective defense against textual backdoor attacks,” arXiv preprint arXiv:2011.10369, 2020.
[221] J. Qi, S. Huang, Z. Luan, C. Fung, H. Yang, and D. Qian, “Loggpt: Exploring chatgpt for log-based anomaly detection,” arXiv preprint arXiv:2309.01189, 2023.
[222] S. Qin, F. Hu, Z. Ma, B. Zhao, T. Yin, and C. Zhang, “Nsfuzz: Towards efficient and state-aware network service fuzzing,” ACM Transactions on Software Engineering and Methodology, 2023.
[223] M. A. Quidwai, C. Li, and P. Dube, “Beyond black box ai-generated plagiarism detection: From sentence to document level,” arXiv preprint arXiv:2306.08122, 2023.
[224] M. Raeini, “Privacy-preserving large language models (ppllms),” Available at SSRN 4512071, 2023.
[225] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J.
Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” 2023.
[226] M. M. Rahman and Y. Watanobe, “Chatgpt for education and research: Opportunities, threats, and strategies,” Applied Sciences, vol. 13, no. 9, p. 5783, 2023.
[227] J. Rando and F. Tramèr, “Universal jailbreak backdoors from poisoned human feedback,” arXiv preprint arXiv:2311.14455, 2023.
[228] K. Renaud, M. Warkentin, and G. Westerman, From ChatGPT to HackGPT: Meeting the Cybersecurity Threat of Generative AI. MIT Sloan Management Review, 2023.
[229] S. A. Research, “Introducing a conditional transformer language model for controllable generation,” https://shorturl.at/azQW6, Apr. 2023, accessed: 2023-11-13.
[230] A. Robey, E. Wong, H. Hassani, and G. J. Pappas, “Smoothllm: Defending large language models against jailbreaking attacks,” arXiv preprint arXiv:2310.03684, 2023.
[231] O. J. Romero, J. Zimmerman, A. Steinfeld, and A. Tomasic, “Synergistic integration of large language models and cognitive architectures for robust ai: An exploratory analysis,” arXiv preprint arXiv:2308.09830, 2023.
[232] R. J. Rosyanafi, G. D. Lestari, H. Susilo, W. Nusantara, and F. Nuraini, “The dark side of innovation: Understanding research misconduct with chat gpt in nonformal education studies at universitas negeri surabaya,” Jurnal Review Pendidikan Dasar: Jurnal Kajian Pendidikan dan Hasil Penelitian, vol. 9, no. 3, p. 220–228, 2023.
[233] S. Sakaoglu, “Kartal: Web application vulnerability hunting using large language models,” Master’s thesis, Master’s Programme in Security and Cloud Computing (SECCLO), August 2023. [Online]. Available: http://urn.fi/URN:NBN:fi:aalto-202308275121
[234] G. Sandoval, H. Pearce, T. Nys, R. Karri, S. Garg, and B. Dolan-Gavitt, “Lost at c: A user study on the security implications of large language model code assistants,” in USENIX Security 2023, 2023, for the associated dataset see https://arxiv.org/abs/2208.09727. 18 pages, 12 figures. G.
Sandoval and H. Pearce contributed equally to this work. [Online]. Available: https://arxiv.org/abs/2208.09727
[235] Sapling, “Llm index,” https://sapling.ai/llm/index, 2023.
[236] A. Sarabi, T. Yin, and M. Liu, “An llm-based framework for fingerprinting internet-connected devices,” in Proceedings of the 2023 ACM on Internet Measurement Conference, 2023, p. 478–484.
[237] M. Scanlon, F. Breitinger, C. Hargreaves, J.-N. Hilgert, and J. Sheppard, “Chatgpt for digital forensic investigation: The good, the bad, and the unknown,” Forensic Science International: Digital Investigation, vol. 46, p. 301609, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S266628172300121X
[238] M. Schäfer, S. Nadi, A. Eghbali, and F. Tip, “Adaptive test generation using a large language model,” arXiv preprint arXiv:2302.06527, 2023.
[239] R. Schuster, C. Song, E. Tromer, and V. Shmatikov, “You autocomplete me: Poisoning vulnerabilities in neural code completion,” in 30th USENIX Security Symposium (USENIX Security 21), 2021, p. 1559–1575.
[240] L. Schwinn, D. Dobre, S. Günnemann, and G. Gidel, “Adversarial attacks and defenses in large language models: Old and new threats,” 2023.
[241] G. Sebastian, “Do chatgpt and other ai chatbots pose a cybersecurity risk?: An exploratory study,” International Journal of Security and Privacy in Pervasive Computing (IJSPPC), vol. 15, no. 1, p. 1–11, 2023.
[242] ——, “Privacy and data protection in chatgpt and other ai chatbots: Strategies for securing user information,” Available at SSRN 4454761, 2023.
[243] M. A. Shah, R. Sharma, H. Dhamyal, R. Olivier, A. Shah, D. Alharthi, H. T. Bukhari, M. Baali, S. Deshmukh, M.
Kuhlmann et al., “Loft: Local proxy fine-tuning for improving transferability of adversarial attacks against large language model,” arXiv preprint arXiv:2310.04445, 2023.
[244] O. Shaikh, H. Zhang, W. Held, M. Bernstein, and D. Yang, “On second thought, let’s not think step by step! bias and toxicity in zero-shot reasoning,” arXiv preprint arXiv:2212.08061, 2022.
[245] S. Shan, W. Ding, J. Passananti, H. Zheng, and B. Y. Zhao, “Prompt-specific poisoning attacks on text-to-image generative models,” arXiv preprint arXiv:2310.13828, 2023.
[246] K. Shao, J. Yang, Y. Ai, H. Liu, and Y. Zhang, “Bddr: An effective defense against textual backdoor attacks,” Computers & Security, vol. 110, p. 102433, 2021.
[247] E. Shayegani, M. A. A. Mamun, Y. Fu, P. Zaree, Y. Dong, and N. Abu-Ghazaleh, “Survey of vulnerabilities in large language models revealed by adversarial attacks,” 2023.
[248] X. Shen, Z. Chen, M. Backes, Y. Shen, and Y. Zhang, “"do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models,” arXiv preprint arXiv:2308.03825, 2023.
[249] T. Shi, K. Chen, and J. Zhao, “Safer-instruct: Aligning language models with automated preference data,” arXiv preprint arXiv:2311.08685, 2023.
[250] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership inference attacks against machine learning models,” in 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017, p. 3–18.
[251] M. Shu, J. Wang, C. Zhu, J. Geiping, C. Xiao, and T. Goldstein, “On the exploitability of instruction tuning,” arXiv preprint arXiv:2306.17194, 2023.
[252] I. Shumailov, Y. Zhao, D. Bates, N. Papernot, R. Mullins, and R. Anderson, “Sponge examples: Energy-latency attacks on neural networks,” 2021.
[253] M. L. Siddiq, J. Santos, R. H. Tanvir, N. Ulfat, F. A. Rifat, and V. C. Lopes, “Exploring the effectiveness of large language models in generating unit tests,” arXiv preprint arXiv:2305.00418, 2023.
[254] M. L. Siddiq and J. C. S.
Santos, “Generate and pray: Using sallms to evaluate the security of llm generated code,” 2023, 16 pages. [Online]. Available: https://arxiv.org/abs/2311.00889
[255] M. Sladić, V. Valeros, C. Catania, and S. Garcia, “Llm in the shell: Generative honeypots,” 2023.
[256] V. Smith, A. S. Shamsabadi, C. Ashurst, and A. Weller, “Identifying and mitigating privacy risks stemming from language models: A survey,” 2023.
[257] D. Sobania, M. Briesch, C. Hanna, and J. Petke, “An analysis of the automatic bug fixing performance of chatgpt,” 2023.
[258] C. Song and A. Raghunathan, “Information leakage in embedding models,” in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020, p. 377–390.
[259] S. E. Spatharioti, D. M. Rothschild, D. G. Goldstein, and J. M. Hofman, “Comparing traditional and llm-based search for consumer choice: A randomized experiment,” arXiv preprint arXiv:2307.03744, 2023.
[260] R. Spreitzer, V. Moonsamy, T. Korak, and S. Mangard, “Systematic classification of side-channel attacks: A case study for mobile devices,” IEEE Communications Surveys & Tutorials, vol. 20, no. 1, p. 465–488, 2017.
[261] R. Staab, M. Vero, M. Balunović, and M. Vechev, “Beyond memorization: Violating privacy via inference with large language models,” 2023.
[262] K. Stephens, “Researchers test large language model that preserves patient privacy,” AXIS Imaging News, 2023.
[263] J. Su, T. Y. Zhuo, J. Mansurov, D. Wang, and P. Nakov, “Fake news detectors are biased against texts generated by large language models,” arXiv preprint arXiv:2309.08674, 2023.
[264] N. Subramani, S. Luccioni, J. Dodge, and M. Mitchell, “Detecting personal information in training corpora: an analysis,” in Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023), 2023, p. 208–220.
[265] M. Sullivan, A. Kelly, and P. McLaughlan, “Chatgpt in higher education: Considerations for academic integrity and student learning,” 2023.
[266] X.
Sun, X. Li, Y. Meng, X. Ao, L. Lyu, J. Li, and T. Zhang, “Defending against backdoor attacks in natural language generation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 4, 2023, p. 5257–5265.
[267] Y. Sun, J. He, S. Lei, L. Cui, and C.-T. Lu, “Med-mmhl: A multi-modal dataset for detecting human- and llm-generated misinformation in the medical domain,” arXiv preprint arXiv:2306.08871, 2023.
[268] Z. Sun, Y. Shen, Q. Zhou, H. Zhang, Z. Chen, D. Cox, Y. Yang, and C. Gan, “Principle-driven self-alignment of language models from scratch with minimal human supervision,” arXiv preprint arXiv:2305.03047, 2023.
[269] Z. Talat, A. Névéol, S. Biderman, M. Clinciu, M. Dey, S. Longpre, S. Luccioni, M. Masoud, M. Mitchell, D. Radev et al., “You reap what you sow: On the challenges of bias evaluation under multilingual settings,” in Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models, 2022, p. 26–41.
[270] W. Tann, Y. Liu, J. H. Sim, C. M. Seah, and E.-C. Chang, “Using large language models for cybersecurity capture-the-flag challenges and certification questions,” 2023.
[271] P. Taveekitworachai, F. Abdullah, M. C. Gursesli, M. F. Dewantoro, S. Chen, A. Lanata, A. Guazzini, and R. Thawonmas, “Breaking bad: Unraveling influences and risks of user inputs to chatgpt for game story generation,” in International Conference on Interactive Digital Storytelling. Springer, 2023, p. 285–296.
[272] Z. Tay, “Using artificial intelligence to augment bug fuzzing,” 2023.
[273] E. ThankGod Chinonso, “The impact of chatgpt on privacy and data protection laws,” The Impact of ChatGPT on Privacy and Data Protection Laws (April 16, 2023), 2023.
[274] A. J. Thirunavukarasu, D. S. J. Ting, K. Elangovan, L. Gutierrez, T. F. Tan, and D. S. W. Ting, “Large language models in medicine,” Nature Medicine, vol. 29, no. 8, p. 1930–1940, 2023.
[275] M. Tong, K. Chen, Y. Qi, J. Zhang, W. Zhang, and N.
Yu, “Privinfer: Privacy-preserving inference for black-box large language model,” 2023.
[276] J. Torres, “Navigating the llm landscape: A comparative analysis of leading large language models,” http://surl.li/ncjvc, 2023.
[277] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale et al., “Llama 2: Open foundation and fine-tuned chat models,” arXiv preprint arXiv:2307.09288, 2023.
[278] S. Truex, L. Liu, M. E. Gursoy, L. Yu, and W. Wei, “Towards demystifying membership inference attacks,” arXiv preprint arXiv:1807.09173, 2018.
[279] J.-B. Truong, P. Maini, R. J. Walls, and N. Papernot, “Data-free model extraction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, p. 4771–4780.
[280] A. Uchendu, J. Lee, H. Shen, T. Le, T.-H. K. Huang, and D. Lee, “Does human collaboration enhance the accuracy of identifying llm-generated deepfake texts?” Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, vol. 11, no. 1, p. 163–174, Nov. 2023. [Online]. Available: https://ojs.aaai.org/index.php/HCOMP/article/view/27557
[281] S. Urchs, V. Thurner, M. Aßenmacher, C. Heumann, and S. Thiemichen, “How prevalent is gender bias in chatgpt? – exploring german and english chatgpt responses,” arXiv preprint arXiv:2310.03031, 2023.
[282] A. Urman and M. Makhortykh, “The silence of the llms: Cross-lingual analysis of political bias and false information prevalence in chatgpt, google bard, and bing chat,” 2023.
[283] L. Uzun, “Chatgpt and academic integrity concerns: Detecting artificial intelligence generated content,” Language Education and Technology, vol. 3, no. 1, 2023.
[284] Ö. Uzuner, Y. Luo, and P.
Szolovits, “Evaluating the state-of-the-art in automatic de-identification,” Journal of the American Medical Informatics Association, vol. 14, no. 5, p. 550–563, 2007.
[285] P. Vaithilingam, T. Zhang, and E. L. Glassman, “Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models,” in CHI Conference on Human Factors in Computing Systems Extended Abstracts, 2022, p. 1–7.
[286] A. Vats, Z. Liu, P. Su, D. Paul, Y. Ma, Y. Pang, Z. Ahmed, and O. Kalinli, “Recovering from privacy-preserving masking with large language models,” 2023.
[287] R. J. M. Ventayen, “Openai chatgpt generated results: Similarity index of artificial intelligence-based contents,” Available at SSRN 4332664, 2023.
[288] T. Vidas, D. Votipka, and N. Christin, “All your droid are belong to us: A survey of current android attacks,” in 5th USENIX Workshop on Offensive Technologies (WOOT 11), 2011.
[289] E. Wallace, T. Z. Zhao, S. Feng, and S. Singh, “Concealed data poisoning attacks on nlp models,” arXiv preprint arXiv:2010.12563, 2020.
[290] A. Wan, E. Wallace, S. Shen, and D. Klein, “Poisoning language models during instruction tuning,” arXiv preprint arXiv:2305.00944, 2023.
[291] Y. Wan, S. Zhang, H. Zhang, Y. Sui, G. Xu, D. Yao, H. Jin, and L. Sun, “You see what i want you to see: poisoning vulnerabilities in neural code search,” in Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022, p. 1233–1245.
[292] Y. Wan, G. Pu, J. Sun, A. Garimella, K.-W. Chang, and N. Peng, “"kelly is a warm person, joseph is a role model": Gender biases in llm-generated reference letters,” arXiv preprint arXiv:2310.09219, 2023.
[293] D. Wang, C. Gong, and Q. Liu, “Improving neural language modeling via adversarial training,” in International Conference on Machine Learning. PMLR, 2019, p. 6555–6565.
[294] F.
Wang, “Using large language models to mitigate ransomware threats,” Preprints, November 2023. [Online]. Available: https://doi.org/10.20944/preprints202311.0676.v1
[295] H. Wang, X. Luo, W. Wang, and X. Yan, “Bot or human? detecting chatgpt imposters with a single question,” 2023.
[296] J. Wang, Z. Huang, H. Liu, N. Yang, and Y. Xiao, “Defecthunter: A novel llm-driven boosted-conformer-based code vulnerability detection mechanism,” arXiv preprint arXiv:2309.15324, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2309.15324
[297] J. Wang, X. Lu, Z. Zhao, Z. Dai, C.-S. Foo, S.-K. Ng, and B. K. H. Low, “Wasa: Watermark-based source attribution for large language model-generated data,” 2023.
[298] Z. Wang, Z. Liu, X. Zheng, Q. Su, and J. Wang, “Rmlm: A flexible defense framework for proactively mitigating word-level adversarial attacks,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, p. 2757–2774.
[299] Z. Wang, W. Xie, K. Chen, B. Wang, Z. Gui, and E. Wang, “Self-deception: Reverse penetrating the semantic firewall of large language models,” arXiv preprint arXiv:2308.11521, 2023.
[300] A. Wei, N. Haghtalab, and J. Steinhardt, “Jailbroken: How does llm safety training fail?” arXiv preprint arXiv:2307.02483, 2023.
[301] Z. Wei, Y. Wang, and Y. Wang, “Jailbreak and guard aligned language models with only few in-context demonstrations,” arXiv preprint arXiv:2310.06387, 2023.
[302] L. Weidinger, J. Mellor, M. Rauh, C. Griffin, J. Uesato, P.-S. Huang, M. Cheng, M. Glaese, B. Balle, A. Kasirzadeh et al., “Ethical and social risks of harm from language models,” arXiv preprint arXiv:2112.04359, 2021.
[303] H. Wen, Y. Li, G. Liu, S. Zhao, T. Yu, T. J.-J. Li, S. Jiang, Y. Liu, Y. Zhang, and Y. Liu, “Empowering llm to use smartphone for intelligent task automation,” arXiv preprint arXiv:2308.15272, 2023.
[304] J. Weng, W. Jiasi, M. Li, Y. Zhang, J. Zhang, and L.
Weiqi, “Auditable privacy protection deep learning platform construction method based on block chain incentive mechanism,” Dec. 5, 2023, U.S. Patent 11,836,616.
[305] J. Weng, J. Weng, J. Zhang, M. Li, Y. Zhang, and W. Luo, “Deepchain: Auditable and privacy-preserving deep learning with blockchain-based incentive,” IEEE Transactions on Dependable and Secure Computing, vol. 18, no. 5, p. 2438–2455, 2019.
[306] G. Wenzek, M.-A. Lachaux, A. Conneau, V. Chaudhary, F. Guzmán, A. Joulin, and E. Grave, “Ccnet: Extracting high quality monolingual datasets from web crawl data,” arXiv preprint arXiv:1911.00359, 2019.
[307] B. Workshop, T. L. Scao, A. Fan, C. Akiki, E. Pavlick, S. Ilić, D. Hesslow, R. Castagné, A. S. Luccioni, F. Yvon et al., “Bloom: A 176b-parameter open-access multilingual language model,” arXiv preprint arXiv:2211.05100, 2022.
[308] J. Wu and B. Hooi, “Fake news in sheep’s clothing: Robust fake news detection against llm-empowered style attacks,” 2023.
[309] J. Wu, S. Yang, R. Zhan, Y. Yuan, D. F. Wong, and L. S. Chao, “A survey on llm-generated text detection: Necessity, methods, and future directions,” arXiv preprint arXiv:2310.14724, 2023.
[310] S. Wu, O. Irsoy, S. Lu, V. Dabravolski, M. Dredze, S. Gehrmann, P. Kambadur, D. Rosenberg, and G. Mann, “Bloomberggpt: A large language model for finance,” arXiv preprint arXiv:2303.17564, 2023.
[311] X. Wu, R. Duan, and J. Ni, “Unveiling security, privacy, and ethical concerns of chatgpt,” 2023.
[312] Z. Xi, T. Du, C. Li, R. Pang, S. Ji, J. Chen, F. Ma, and T. Wang, “Defending pre-trained language models as few-shot learners against backdoor attacks,” arXiv preprint arXiv:2309.13256, 2023.
[313] C. S. Xia, M. Paltenghi, J. L. Tian, M. Pradel, and L. Zhang, “Universal fuzzing via large language models,” arXiv preprint arXiv:2308.04748, 2023.
[314] C. S. Xia, Y. Wei, and L. Zhang, “Practical program repair in the era of large pre-trained language models,” 2022.
[315] C. S. Xia and L.
Zhang, “Keep the conversation going: Fixing 162 out of 337 bugs for $0.42 each using chatgpt,” 2023.
[316] Z. Xie, Y. Chen, C. Zhi, S. Deng, and J. Yin, “Chatunitest: a chatgpt-based automated unit test generation tool,” arXiv preprint arXiv:2305.04764, 2023.
[317] M. Xiong, Z. Hu, X. Lu, Y. Li, J. Fu, J. He, and B. Hooi, “Can llms express their uncertainty? an empirical evaluation of confidence elicitation in llms,” ArXiv, vol. abs/2306.13063, 2023. [Online]. Available: https://api.semanticscholar.org/CorpusID:259224389
[318] L. Xu, L. Berti-Equille, A. Cuesta-Infante, and K. Veeramachaneni, “In situ augmentation for defending against adversarial attacks on text classifiers,” in International Conference on Neural Information Processing. Springer, 2022, p. 485–496.
[319] F. Yaman et al., “Agentsca: Advanced physical side channel analysis agent with llms,” 2023.
[320] J. Yan, V. Yadav, S. Li, L. Chen, Z. Tang, H. Wang, V. Srinivasan, X. Ren, and H. Jin, “Virtual prompt injection for instruction-tuned large language models,” arXiv preprint arXiv:2307.16888, 2023.
[321] C. Yang, Y. Deng, R. Lu, J. Yao, J. Liu, R. Jabbarvand, and L. Zhang, “White-box compiler fuzzing empowered by large language models,” 2023.
[322] H. Yang, K. Xiang, H. Li, and R. Lu, “A comprehensive overview of backdoor attacks in large language models within communication networks,” arXiv preprint arXiv:2308.14367, 2023.
[323] J. Yang, H. Jin, R. Tang, X. Han, Q. Feng, H. Jiang, B. Yin, and X. Hu, “Harnessing the power of llms in practice: A survey on chatgpt and beyond,” arXiv preprint arXiv:2304.13712, 2023.
[324] J. Yang, H. Xu, S. Mirzoyan, T. Chen, Z. Liu, W. Ju, L. Liu, M. Zhang, and S. Wang, “Poisoning scientific knowledge using large language models,” bioRxiv, p. 2023–11, 2023.
[325] S.
Yang, “Crafting unusual programs for fuzzing deep learning libraries,” Ph.D. dissertation, University of Illinois at Urbana-Champaign, 2023.
[326] Z. Yang, Z. Zhao, C. Wang, J. Shi, D. Kim, D. Han, and D. Lo, “What do code models memorize? an empirical study on large language models of code,” arXiv preprint arXiv:2308.09932, 2023.
[327] B. Yao, M. Jiang, D. Yang, and J. Hu, “Empowering llm-based machine translation with cultural awareness,” arXiv preprint arXiv:2305.14328, 2023.
[328] D. Yao, J. Zhang, I. G. Harris, and M. Carlsson, “Fuzzllm: A novel and universal fuzzing framework for proactively discovering jailbreak vulnerabilities in large language models,” arXiv preprint arXiv:2309.05274, 2023.
[329] H. Yao, J. Lou, and Z. Qin, “Poisonprompt: Backdoor attack on prompt-based large language models,” arXiv preprint arXiv:2310.12439, 2023.
[330] J. Y. Yoo and Y. Qi, “Towards improving adversarial training of nlp models,” arXiv preprint arXiv:2109.00544, 2021.
[331] W. You, Z. Hammoudeh, and D. Lowd, “Large language models are better adversaries: Exploring generative clean-label backdoor attacks against text classifiers,” arXiv preprint arXiv:2310.18603, 2023.
[332] J. Yu, X. Lin, and X. Xing, “Gptfuzzer: Red teaming large language models with auto-generated jailbreak prompts,” arXiv preprint arXiv:2309.10253, 2023.
[333] L. Yuan, Y. Chen, G. Cui, H. Gao, F. Zou, X. Cheng, H. Ji, Z. Liu, and M. Sun, “Revisiting out-of-distribution robustness in nlp: Benchmark, analysis, and llms evaluations,” arXiv preprint arXiv:2306.04618, 2023.
[334] Z. Yuan, H. Yuan, C. Tan, W. Wang, S. Huang, and F. Huang, “Rrhf: Rank responses to align language models with human feedback without tears,” arXiv preprint arXiv:2304.05302, 2023.
[335] Z. Yuan, Y. Lou, M. Liu, S. Ding, K. Wang, Y. Chen, and X. Peng, “No more manual tests? evaluating and improving chatgpt for unit test generation,” arXiv preprint arXiv:2305.04207, 2023.
[336] A. Zafar, V. B. Parthasarathy, C. L. Van, S. Shahid, A.
Shahid et al., “Building trust in conversational ai: A comprehensive review and solution architecture for explainable, privacy-aware systems using llms and knowledge graph,” arXiv preprint arXiv:2308.13534, 2023.
[337] C. Zhang, M. Bai, Y. Zheng, Y. Li, X. Xie, Y. Li, W. Ma, L. Sun, and Y. Liu, “Understanding large language model based fuzz driver generation,” arXiv preprint arXiv:2307.12469, 2023.
[338] C. Zhang, Y. Xie, H. Bai, B. Yu, W. Li, and Y. Gao, “A survey on federated learning,” Knowledge-Based Systems, vol. 216, p. 106775, 2021.
[339] R. Zhang, S. Hidano, and F. Koushanfar, “Text revealer: Private text reconstruction via model inversion attacks against transformers,” arXiv preprint arXiv:2209.10505, 2022.
[340] R. Zhang, S. S. Hussain, P. Neekhara, and F. Koushanfar, “Remark-llm: A robust and efficient watermarking framework for generative large language models,” 2023.
[341] X. Zhang and W. Gao, “Towards llm-based fact verification on news claims with a hierarchical step-by-step prompting method,” arXiv preprint arXiv:2310.00305, 2023.
[342] Y. Zhang and D. Ippolito, “Prompts should not be seen as secrets: Systematically measuring prompt extraction attack success,” arXiv preprint arXiv:2307.06865, 2023.
[343] Y. Zhang, W. Song, Z. Ji, D. D. Yao, and N. Meng, “How well does llm generate security tests?” arXiv preprint arXiv:2310.00710, 2023.
[344] Z. Zhang, J. Wen, and M. Huang, “Ethicist: Targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation,” arXiv preprint arXiv:2307.04401, 2023.
[345] J. Zhao, Y. Rong, Y. Guo, Y. He, and H. Chen, “Understanding programs by exploiting (fuzzing) test cases,” arXiv preprint arXiv:2305.13592, 2023.
[346] S. Zhao, J. Wen, L. A. Tuan, J. Zhao, and J. Fu, “Prompt as triggers for backdoor attack: Examining the vulnerability in language models,” arXiv preprint arXiv:2305.01219, 2023.
[347] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z.
Dong et al., “A survey of large language models,” arXiv preprint arXiv:2303.18223, 2023.
[348] W. Zhao, Y. Liu, Y. Wan, Y. Wang, Q. Wu, Z. Deng, J. Du, S. Liu, Y. Xu, and P. S. Yu, “knn-icl: Compositional task-oriented parsing generalization with nearest neighbor in-context learning,” 2023.
[349] C. Zhou, P. Liu, P. Xu, S. Iyer, J. Sun, Y. Mao, X. Ma, A. Efrat, P. Yu, L. Yu et al., “Lima: Less is more for alignment,” arXiv preprint arXiv:2305.11206, 2023.
[350] C. Zhu, Y. Cheng, Z. Gan, S. Sun, T. Goldstein, and J. Liu, “Freelb: Enhanced adversarial training for natural language understanding,” arXiv preprint arXiv:1909.11764, 2019.
[351] K. Zhu, J. Wang, J. Zhou, Z. Wang, H. Chen, Y. Wang, L. Yang, W. Ye, N. Z. Gong, Y. Zhang et al., “Promptbench: Towards evaluating the robustness of large language models on adversarial prompts,” arXiv preprint arXiv:2306.04528, 2023.
[352] N. Ziems, W. Yu, Z. Zhang, and M. Jiang, “Large language models are built-in autoregressive search engines,” arXiv preprint arXiv:2305.09612, 2023.
[353] A. Zou, Z. Wang, J. Z. Kolter, and M. Fredrikson, “Universal and transferable adversarial attacks on aligned language models,” arXiv preprint arXiv:2307.15043, 2023.