Practical Guide

Exploring the Security Risks of Using Large Language Models

Introduction

As machine learning algorithms continue to evolve, Large Language Models (LLMs) like GPT-4 have gained immense popularity. While these models hold great promise in revolutionizing various industries—ranging from content generation and customer service to research and development—they also come with their own set of risks and ethical concerns. In this paper, we will comprehensively examine the risks associated with using LLMs.

Large language models like ChatGPT are susceptible to various kinds of attacks against their infrastructure, which can range from exploiting software vulnerabilities to social engineering tactics.

This is becoming more critical by the day as more and more software development is actually done with the help of AI LLM tools such as GitHub’s Co-pilot or Amazon’s CodeWhisperer. A successful attack on such an LLM application, can lead to chain reaction causing engineering disruption across thousands of organizations, where developers rely on them to generate code.

Al tools in the development proces

70% of all respondents are using or are planning to use Al tools in their development process this year. Those learning to code are more likely than professional developers to be using or use Al tools (82% vs. 70%).

Source: https://survey.stackoverflow.co/2023/#ai

By shedding light on the potential risks of LLMs through the prism of the Top 10 list for LLM vulnerabilities, this paper aims to contribute to a more informed and cautious approach to leveraging this technology. Addressing these risks is not just the responsibility of the researchers and developers but is a collective responsibility that society must undertake. Only through such a holistic approach can we hope to mitigate the risks while maximizing the benefits of this transformative technology.

This white paper will cover the following topics:

  1. How LLMs are built

  2. The key vectors for attacking LLMs
    1. Attacking the model directly
    2. Attacking the infrastructure
    3. Attacking the application
  3. Examples of the attack vectors using the LLM Owasp Top 10
  4. Ethical considerations when using LLMs
  5. Summary

How are LLM applications built

Large language models (LLMs) are built using advanced machine learning techniques, particularly deep learning. The construction process involves several key steps:

Data Collection

Gathering a vast and diverse dataset of text from books, websites, articles, and other written media.

Data Processing

Cleaning and organizing the data to remove errors, inconsistencies, and irrelevant information, and sometimes annotating it to aid the model's learning.

Model Design

Selecting an appropriate neural network architecture, like the Transformer model, which has proven effective for handling sequential data like text.

Training

Feeding the processed data into the neural network, which learns to predict the next word in a sequence, understand context, and generate coherent text. This training process requires substantial computational resources and time.

Fine-Tuning

Adjusting the model with additional training, often on a more specialized dataset, to improve performance on specific tasks or domains.

Evaluation

Testing the model's performance on a set of tasks to ensure it generates accurate and coherent responses.

Iteration

Refining the model through multiple iterations of training and evaluation to enhance its capabilities.

MACHINE LEARNING DEVELOPMENT LIFECYCLE

The result is a highly sophisticated model capable of understanding and generating human-like text, answering questions, translating languages, and performing various other language-related tasks.

LLM Attack Vectors

As can be seen in the diagram above, there are three key vectors that can be attacked in the overall LLM. There are:

Attack the LLM Model directly

Attack the infrastructure and integrations

Attack the application

This curated list is more than a simple catalog of vulnerabilities; it is an educational tool aimed at a broad audience, from developers and designers to architects and organizational leaders. Its purpose is to enhance the collective understanding of the security vulnerabilities inherent in the deployment and operation of LLMs. By bringing these issues to the forefront, OWASP not only raises awareness but also provides valuable remediation tactics and strategic advice designed to fortify the security framework of LLM applications.

Introduction to the OWASP Top 10 Vulnerabilities for LLMs

The Open Web Application Security Project (OWASP), a respected authority in web security, has compiled a critical list of the top 10 vulnerabilities frequently encountered in Large Language Model (LLM) applications. This list serves as an authoritative guide, shedding light on the severity, exploitability, and commonality of each vulnerability. Notable risks such as prompt injections, data exposure, insufficient sandboxing, and unauthorized code execution are detailed, illustrating the array of security challenges that LLM applications may face.

This curated list is more than a simple catalog of vulnerabilities; it is an educational tool aimed at a broad audience, from developers and designers to architects and organizational leaders. Its purpose is to enhance the collective understanding of the security vulnerabilities inherent in the deployment and operation of LLMs. By bringing these issues to the forefront, OWASP not only raises awareness but also provides valuable remediation tactics and strategic advice designed to fortify the security framework of LLM applications.

Vulnerability Description
Prompt injection This manipulates a large language model (LLM) through crafty inputs, causing unintended actions by the LLM. Direct injections overwrite system prompts, while indirect ones manipulate inputs from external sources
Insecure Output Handling This vulnerability occurs when an LLM output is accepted without scrutiny, exposing backend systems. Misuse may lead to severe consequences like XSS, CSRF, SSRF, privilege escalation, or remote code execution.
Training Data Poisoning This occurs when LLM training data is tampered,introducing vulnerabilities or biases that compromise security, effectiveness, or ethical behavior. Sources include Common Crawl, WebText, OpenWebText, & books.
Model Denial of Service Attackers cause resource-heavy operations on LLMs, leading to service degradation or high costs. The vulnerability is magnified due to the resource-intensive nature of LLMs and unpredictability of user inputs.
Supply Chain Vulnerabilities LLM application lifecycle can be compromised by vulnerable components or services, leading to security attacks. Using third-party datasets, pre- trained models, and plugins can add vulnerabilities.
Sensitive Information Disclosure LLMs may inadvertently reveal confidential data in its responses, leading to unauthorized data access, privacy violations, and security breaches. It's crucial to implement data sanitization and strict user policies to mitigate this.
Insecure Plugin Design LLM plugins can have insecure inputs and insufficient access control. This lack of application control makes them easier to exploit and can result in consequences like remote code execution.
Excessive Agency LLM-based systems may undertake actions leading to unintended consequences. The issue arises from excessive functionality, permissions, or autonomy granted to the LLM-based systems.
Overreliance Systems or people overly dependent on LLMs without oversight may face misinformation, miscommunication, legal issues, and security vulnerabilities due to incorrect or inappropriate content generated by LLMs.
Model Theft This involves unauthorized access, copying, or exfiltration of proprietary LLM models. The impact includes economic losses, compromised competitive advantage, and potential access to sensitive information.

Attacking the LLM Model Directly with Prompt Injection

Number 1 on the OWASP Top 10 for LLMs is prompt injections. The most common attack vector is to attack the LLM directly with inputs that will circumvent the LLM’s safeguards. The term “jailbreak” or “prompt injection” refers to a sophisticated technique that manipulates Language Learning Models (LLMs) like ChatGPT-4 into disseminating information or instructions that are illegal, unethical, or contravene societal norms and values. 

Within a mere two hours of ChatGPT-4’s release in March 2023, cybersecurity experts and malicious actors alike successfully executed prompt injection attacks. These attacks led the AI system to furnish an array of unsettling instructions for activities that are socially unacceptable, unethical and potentially dangerous by spouting homophobic statements, creating phishing emails, and supporting violence.



Since that initial breach, a growing number of threat actors have honed their methods for exploiting LLMs. They employ an intricate mix of tactics, including meticulously crafted role-playing scenarios, predictive text manipulation, reverse psychology, and other linguistic gambits. These techniques aim to sidestep the internal content filters and control mechanisms designed to regulate an LLM’s responses, thereby tricking the system into violating its own safety protocols.

The ramifications of a successful jailbreak attack on a GenAI system like an LLM are grave. Such an attack not only undermines the pre-established safety measures that inhibit the model from carrying out harmful commands, but it also effectively erases the demarcation between permissible and impermissible actions for the AI system.

In the absence of these safeguards, the model becomes a potential tool for nefarious activities, lacking any internal checks to prevent it from executing dangerous or unethical tasks as directed by the attacker. The dissolved boundary between acceptable and unacceptable use exposes a significant vulnerability, making the system susceptible to further exploitation that could lead to real-world consequences.

By neutralizing the safety mechanisms designed to curtail such behaviors, a successful jailbreak attack transforms an otherwise useful and compliant AI system into an unwitting accomplice for a host of illicit activities. Therefore, it's crucial for developers and cybersecurity experts to continually update and fortify the safety features that regulate LLMs, as well as to monitor for new types of threats and vulnerabilities that could compromise the integrity and intended utility of these advanced AI systems.

Real World Example

LLMs analyze the context in which words or phrases are used, in order to filter out potentially harmful content. For example, a prompt for the ingredients to make napalm, a flammable liquid used in warfare, will result in a polite error message from the safeguards. 

Adjusting the prompt structure can inadvertently lead the LLM to reveal sensitive information. For instance, framing a request for the LLM to populate a JSON object with certain data could result in the model unintentionally listing the components of napalm. This underscores the need for robust content moderation mechanisms to prevent the disclosure of harmful or sensitive information.

Mitigating Prompt Injection Attacks

Prompt injection attacks on large language models (LLMs) like GPT models can be a significant concern, as they involve manipulating the model into generating unintended or harmful responses. To mitigate these risks, various strategies can be implemented including this subset:

Common vulnerabilities associated with insufficient sandboxing in LLMs include:

Input Sanitization and Filtering:

  1. Implement robust input validation to detect and filter out malicious or suspicious patterns in user input.
  2. Use regex patterns to identify and block known malicious prompt structures.
  3. Employ natural language processing techniques to understand the context and intent of the input and block prompts that may lead to undesirable outputs

Contextual Understanding and Response Limitation:

  1. Enhance the model's ability to understand the context of a prompt better and limit responses based on ethical guidelines or safety rules.
  2. Develop mechanisms within the model to detect when it's being prompted to generate outputs that violate pre-set guidelines and refuse to generate such responses.

Use of Safelists and Blocklists:

  1. Create safelists (allowlists) of acceptable topics or commands and blocklists of prohibited content.
  2. Regularly update these lists based on evolving trends and newly identified threats.

Infrastructure Attacks

  1. Inadequate Sandboxing

Sandboxing is a security mechanism that runs code in a restricted environment, limiting its access to the rest of the system and data.

A poorly isolated LLM that interacts with external resources, infrastructure or sensitive systems opens the door to a multitude of security risks, ranging from unauthorized access to unintended actions executed by the LLM itself.

Common vulnerabilities associated with insufficient sandboxing in LLMs include:

Lack of Environmental Segregation:

Failing to isolate the LLM environment from other essential systems or data repositories poses a risk of cross-system exploitation.

Inadequate Access Controls:

Absence of stringent restrictions can grant the LLM unwarranted access to sensitive resources, amplifying the potential for abuse.

Unrestricted System-Level Interactions:

When an LLM is capable of performing system-level actions or communicating with other processes, it becomes a ripe target for exploitation.

Common vulnerabilities associated with insufficient sandboxing in LLMs include:

Lack of Environmental Segregation:

Failing to isolate the LLM environment from other essential systems or data repositories poses a risk of cross-system exploitation.

Inadequate Access Controls:

Absence of stringent restrictions can grant the LLM unwarranted access to sensitive resources, amplifying the potential for abuse.

Unrestricted System-Level Interactions:

When an LLM is capable of performing system-level actions or communicating with other processes, it becomes a ripe target for exploitation.

Real-world Attack Scenarios:

Granting a LLM too much access to a filesystem can lead to a myriad of serious complications. Such unfettered access heightens the risk of data privacy breaches as the model may access and read sensitive files. There’s a real danger of intellectual property theft if it can view or interact with proprietary data and code. Security credentials like passwords and API keys, often stored on file systems, could be inadvertently disclosed, posing significant security risks.

The integrity of data is also at stake, with the potential for the LLM to alter or delete crucial files, causing data corruption or complete loss. This scenario may also bring about compliance issues, as excessive access rights can contravene stringent regulatory standards, inviting legal troubles and possible fines.

Furthermore, should the LLM fall prey to a cyberattack, this extensive access can be leveraged by malicious actors to inflict broader damage on the system or network. Addressing these vulnerabilities necessitates a strict adherence to the principle of least privilege, limiting the LLM’s access strictly to what is necessary for its function.

Preventative measures include:

Isolate the LLM environment from other critical systems and resources.

Restrict the LLM's access to sensitive resources and limit its capabilities to the minimum required for its intended purpose.

Regularly audit and review the LLM's environment and access controls to ensure that proper isolation is maintained.

  1. 2. Server-Side Request Forgery

Server-Side Request Forgery (SSRF) presents a critical threat vector that targets the infrastructure of LLMs. This type of vulnerability arises when an attacker coaxes an LLM into initiating unauthorized requests or accessing restricted resources, such as internal services, APIs, or secured data stores.

Given the operational design of LLMs, which frequently necessitates making external requests for data retrieval or service interaction, they inherently become susceptible to SSRF exploits. These attacks can leverage the model’s functionality to breach security perimeters, undermining the integrity and confidentiality of the system.

Common SSRF Vulnerabilities in LLMs

As with many attacks, inadequate input validation allows attackers to exploit the LLM by crafting prompts that initiate unauthorized actions or data requests. Additionally, security misconfigurations or errors in network or application security settings can inadvertently expose internal resources to the LLM, widening the attack surface.

Attack Scenarios Include:

Unauthorized Access:

An attacker could craft a prompt designed to instruct the LLM to request data from an internal service. This can bypass established access controls, enabling unauthorized access to system files to which the LLM has access, potentially leaking sensitive information.

API Exploitation:

A security misconfiguration may leave an API vulnerable, allowing the LLM to interact with it. An attacker could exploit this to access or modify sensitive data.

Databases:

If the LLM has any form of database access, an SSRF attack could be used to read, modify, or delete data.

Preventative Measures:

Strict Input Validation and Sanitization

Implement strict validation mechanisms to filter out malicious or unexpected prompts that could initiate unauthorized requests.

Security Audits and Configuration Reviews:

Regularly assess network and application security settings to confirm that internal resources remain shielded from the LLM.

Network Segmentation:

Isolate the LLM from sensitive internal resources to minimize the potential damage from an SSRF attack.

Monitoring and Alerting:

Implement real-time monitoring to quickly detect and respond to unusual or unauthorized activities.

Least Privilege Access:

Limit what the LLM can do and access, both in terms of data and actions, to minimize the impact of an attack.

To summarize, infrastructure attacks on LLMs through inadequate sandboxing exploit insufficient isolation mechanisms to carry out unauthorized actions. If an LLM is not adequately sandboxed, it may inadvertently have more access to system resources than intended. Attackers can take advantage of this oversight to execute commands, access sensitive data, or interact with internal systems beyond the scope of the LLM's intended functionality.

These breaches can lead to significant security incidents, including data exfiltration, system compromise, and operational disruption. To mitigate such risks, it's essential to enforce strict sandboxing policies, ensure rigorous access controls, and maintain a clear boundary between the LLM operations and other critical system components.

Attacking the Application

Large Language Models (LLMs) themselves, as machine learning constructs, are not directly vulnerable to traditional web-based attacks such as Cross-Site Scripting (XSS). XSS attacks typically exploit vulnerabilities in web applications that do not properly sanitize user input, allowing attackers to inject malicious scripts into web pages viewed by other users.

However, if an LLM is integrated into a web application in a way that involves processing user input (such as generating responses based on user-provided text), and if the web application is not properly handling or sanitizing that input, it is conceivable for XSS to become a concern.

For example:

XSS via LLM Responses:

If an LLM is used to generate content that is directly displayed on a web page, and the content includes user-supplied input without proper sanitization, it could be manipulated to include malicious scripts that lead to XSS when rendered in a user's browser.

Reflection of Malicious Input:

If an LLM echoes back parts of user input in its responses, and those responses are incorporated into web pages without proper encoding, an attacker could craft input that results in an XSS attack when the LLM's response is displayed.

In both scenarios, the LLM is not the direct target or source of the XSS vulnerability; rather, it's the web application's handling of the LLM's output that creates the XSS risk. It's critical to ensure that any application using an LLM to process or display content in a web environment properly sanitizes both input to and output from the LLM to prevent XSS and other injection-based attacks.

Preventative measures include:

Content Security Policy (CSP):

Implementing a Content Security Policy is a powerful defense against XSS attacks. CSP allows the definition of a set of rules for the browser to follow, specifying which resources are allowed to be loaded. It can help mitigate the impact of XSS by preventing the execution of scripts from unauthorized sources.

Regular Model Updating and Patching:

Continuously update and patch the model to address any emerging vulnerabilities. Incorporate new data and learning to keep the model resilient against evolving attack vectors.

Monitoring and Logging:

Implement comprehensive monitoring and logging to keep track of how the model is being used. Analyze logs for signs of misuse or attempted attacks.

Ethical Guidelines and Usage Policies:

Establish clear ethical guidelines and usage policies for the model. Ensure users are aware of these guidelines and understand the importance of adhering to them.

Legal Concerns with LLMs

At the time of this paper was written, laws were being enacted regarding the safe usage of LLMs. This includes an Executive Order (EO) from President Joe Biden encompassing a wide spectrum of crucial topics, spanning from addressing algorithmic bias and safeguarding privacy to establishing regulations for the safety of advanced AI models known as ‘frontier models,’ such as GPT-4, Bard, and Llama 2.

The EO directs various government agencies to establish dedicated domains of AI regulation within the upcoming year. Additionally, it features a section dedicated to promoting open development in AI technologies, nurturing innovations in AI security, and leveraging AI tools to enhance overall security measures. 

Ethical Concerns with LLMs

Unethical behavior by a large language model (LLM) can manifest in various ways, depending on how the model is used or misused, and the content it generates. Here are some examples:

  • Bias Propagation: If an LLM is trained on biased data, it may produce outputs that perpetuate stereotypes or discriminatory viewpoints. For example, it could associate certain jobs or activities with a specific gender or ethnic group, reinforcing harmful societal biases.
  • Misinformation Spread: An LLM could be manipulated to generate and disseminate false information, contributing to the spread of misinformation or deepening social and political divides.
  • Privacy Violations: If an LLM is not designed with privacy considerations in mind, it may inadvertently generate content that reveals personal data, violating user privacy and potentially leading to identity theft or doxxing incidents
  • Content Manipulation for Deception: LLMs can be used to create convincing fake content, such as deepfakes or fraudulent communications, which can be used in phishing attacks or to deceive individuals for various malicious purposes.
  • Facilitation of Illegal Activities: An LLM could be prompted to provide advice or instructions on performing illegal activities, like hacking into secure systems, creating harmful substances, or evading laws.

It’s important to note that the development of ethical guidelines and the implementation of robust content moderation and filtering systems are crucial in preventing such unethical behavior by LLMs.

To summarize, LLM prompt injection attacks involve manipulating the input prompts provided to the model to elicit specific responses that may include dangerous, illegal, or unethical content. These attacks aim to bypass internal content filters and controls, exploiting the model’s capabilities to generate responses that align with the attacker’s objectives.

By injecting carefully crafted and deceptive prompts, threat actors can potentially deceive the LLM into generating harmful outputs, posing risks to individuals, organizations, and societal norms. Effective defense against these attacks requires robust content filtering mechanisms, continuous monitoring of model behavior, and proactive measures to detect and prevent the generation of harmful or misleading outputs.

Conclusion

In conclusion, as we navigate the complex interplay between technological innovation and security, the significance of safeguarding LLMs from emerging threats cannot be overstated. This white paper has introduced the anatomy of risks associated with LLMs, spotlighting the vulnerabilities outlined by the authoritative OWASP Top 10 list. From prompt injections to SSRF, these vulnerabilities are not mere glitches but profound gaps that could be exploited to unleash a cascade of adverse events in an increasingly AI-integrated software landscape. 

These vulnerabilities could serve as conduits for nefarious actors, compromising not only the integrity of the LLMs but also the very foundations of trust and reliability that underpin the burgeoning AI sector. In the quest to harness the transformative power of AI, we must be ever vigilant, ensuring that our advances are not sullied by lapses in security that could erode public confidence or stifle innovation.

Our Mission

Bright’s mission is to enable organizations to ship secure Applications and APIs at the speed of business. We do this by enabling quick & iterative scans to identify true and critical security vulnerabilities without compromising on quality, or software delivery speeds.

Bright empowers AppSec teams to provide the governance for securing APIs and web apps while enabling developers to take ownership of the actual security testing and remediation work early in the SDLC.

Why We Exist?

Bright exists because legacy DAST is broken. These legacy solutions are built for AppSec professionals, take hours, or even days, to run, find vulnerabilities late in the development process and are complex to deploy.

In today’s DevOps world, where companies release applications and APIs multiple times a day, a different approach is needed.