XML External Entity (XXE) Vulnerabilities and How to Fix Them

Nedim Maric

What Is an XXE Attack?

XXE (XML External Entity Injection) is a common web-based security vulnerability that enables an attacker to interfere with the processing of XML data within a web application.  

XML is an extremely popular format that developers use to transfer data between the web browser and the server, which makes XXE a common security flaw.

XML requires a parser, which is typically where vulnerabilities occur. XXE abuses the ability to define an entity based on the content of a file path or URL. When the server reads the XML attack payload, the parser resolves the external entity and merges it into the final document, and the application returns that document to the user with the sensitive data inside.

XXE attacks can result in port scanning within the internal network, server-side request forgery (SSRF), data exfiltration, use of an organization’s servers to perform denial of service (DoS), and more. It is therefore important to implement XXE prevention strategies.

This is part of an extensive series of guides about application security

XXE Attack Types (With Code Examples)

Billion Laughs Attack

Consider a web application that accepts XML input and outputs the result. A request would look like this:

Request

POST http://example.com/xml HTTP/1.1

<mytype>
Hello and welcome to my website!
</mytype>

Response

HTTP/1.0 200 OK

Hello and welcome to my website!

XML documents can have a specific type, which can be defined using two standards – XSD and DTD. Applications that let the parser process a DTD supplied with the document can be vulnerable to XXE attacks.

See the following example, which uses a DTD called mytype. This DTD defines an XML entity called name. When this entity is referenced in the document body, the XML parser reads the DTD and replaces the reference with the entity's value.

Request

POST http://example.com/xml HTTP/1.1

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE mytype [
<!ELEMENT mytype ANY>
<!ENTITY name "John">
]>
<mytype>
Hello &name; and welcome to my website!
</mytype>

Response

HTTP/1.0 200 OK

Hello John and welcome to my website!

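Now let’s see how an attacker can carry out the so-called “billion laughs attack”. The attack nests entity definitions so that each entity expands into several copies of the previous one. If the XML parser does not limit entity expansion or the amount of memory it can use, the expanded document overloads its memory.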

Request

POST http://example.com/xml HTTP/1.1

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE mytype [
<!ELEMENT mytype ANY>
<!ENTITY name "John ">
<!ENTITY name2 "&name;&name;">
<!ENTITY name3 "&name2;&name2;&name2;&name2;">
<!ENTITY name4 "&name3;&name3;&name3;&name3;">
]>
<mytype>
Hello &name4;
</mytype>

Response

HTTP/1.0 200 OK

Hello John John John John John John John John John John John John John John John John John John John John John John John John John John John John John John John John

In essence, this is a type of denial of service (DoS) attack that can deny access to an application relying on an XML parser.
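To see why a few short declarations can exhaust memory, it helps to work out the arithmetic. The following is a minimal Python sketch, not part of any parser: the function name expanded_copies is hypothetical, and the nine levels of ten references are an assumption based on the commonly cited form of the attack that gives it its name.

```python
def expanded_copies(references_per_level):
    """Count how many copies of the base entity a nested chain expands to.

    Each item is the number of references one entity makes to the entity
    defined just before it.
    """
    copies = 1
    for references in references_per_level:
        copies *= references
    return copies


# The example above: name2 references name twice, and name3 and name4
# each reference the previous entity four times.
article_example = expanded_copies([2, 4, 4])  # 32

# Assumed classic payload: nine levels of ten references each.
classic_payload = expanded_copies([10] * 9)  # 1_000_000_000
```

Because the expansion multiplies at every level, adding one more entity declaration of a few dozen bytes multiplies the parser's output again, which is why limits on entity expansion matter more than limits on request size.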

XXE SSRF Attack

Now let’s see how a similar attack can be used to perform server-side request forgery (SSRF).

Here the attacker declares XML entities that point to external sources. Because the server, not the attacker, fetches those sources, XXE becomes a server-side request forgery (SSRF) attack.

An attacker can point an entity at a local file or a URL using an XML system identifier. Many XML parsers process external entities by default, and as a result, the server retrieves the resource named in the malicious entity and substitutes its contents into the document.

The following code shows how XXE can be used to return the contents of a sensitive file – the /etc/hosts file.

Request

POST http://example.com/xml HTTP/1.1

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE mytype [
<!ELEMENT mytype ANY>
<!ENTITY malicious SYSTEM "file:///etc/hosts">
]>
<mytype>
&malicious;
</mytype>

Response

HTTP/1.0 200 OK

IPAddress     Hostname    Alias
127.0.0.1   localhost web.mydomain.com
208.164.186.1 web.mydomain.com    web
208.164.186.2 mail.openna.com mail
(...)

This attack can be extended to gain access to other files on the server, beyond system files. Some XML parsers make it possible to retrieve directory listings and use them to find other sensitive data on the machine.

Limitations of XXE SSRF attacks

XXE attacks are limited to obtaining files that contain plain text or valid XML. They cannot be used to obtain binary files, or files whose content resembles XML but is not valid XML. Trying to read such a file produces a parser error, and the attacker will not be able to view its content.

Related content: Read our guide to Server Side Request Forgery

Blind XXE Vulnerability

A blind XXE vulnerability means that the application processes external XML entities in an insecure way, but does not return the values of those entities in its responses. This means attackers need more advanced techniques to detect the vulnerability and exploit it.

Attackers can still exfiltrate data using blind XXE, for example by causing the server to connect to a URL controlled by the attacker. 

How to Prevent XXE Vulnerabilities

Although XXE is a common vulnerability, it can be prevented with good coding practices and some language-specific measures.

XXE Vulnerability in Java

Defending against XXE in Java is less clear-cut, because protection depends on how each XML parser is configured. Java XML parsers are often vulnerable to XXE attacks in their default configuration, which leaves you with less control over securing your applications.

Thankfully, the maintainers of these parsers are aware of the issue and update them to be more secure, but you are still reliant on these third parties. Some of the most common XML parsers for Java include:

  • DOM Parser
  • SAX Parser
  • JDOM Parser
  • DOM4J Parser
  • StAX Parser

When relying on third-party parsers, you should disable DOCTYPE declarations (DTDs) in the parser configuration. Without a DTD, no entities can be declared, which protects you from nearly all XXE attacks.

XXE Vulnerability in PHP

PHP holds the title of perhaps the most popular back-end web application language, and as such, is a primary target for attackers, including those using XXE. With attackers routinely finding new vulnerabilities, it is imperative to keep your PHP version up to date to secure your applications.

In relation to XXE prevention, there are things that you can do to ensure you’re not a victim. In PHP versions before 8.0.0, it is highly recommended that you call libxml_disable_entity_loader(true) before parsing untrusted XML. As of PHP 8.0.0 the function is deprecated, because PHP now requires libxml 2.9.0 or later, which disables external entity loading by default. Further information on fully understanding and implementing this functionality in your code can be found in the PHP manual.

XXE Vulnerability in Python

Python’s popularity is growing each day with both new programmers and seasoned veterans. However, with rapid growth and expansion comes risk.

The first step in securing your Python applications is ensuring that the XML parsers you are using are safe. Some, such as etree, minidom, xmlrpc, and Genshi, are built with security in mind and resist XXE by default. However, other popular modules such as pulldom and lxml aren’t inherently safe, and precaution is advised.
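Whichever Python parser you use, one conservative option is to refuse any document that carries a DOCTYPE declaration before it reaches your application logic. The following is a minimal sketch using only the standard library. The names DTDForbidden and parse_untrusted are hypothetical, and parsing the input twice is a simplification chosen for clarity, not for performance.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this as soon as it sees "<!DOCTYPE ...", before any
    # entity declared inside the DTD can be expanded.
    raise DTDForbidden(f"DOCTYPE declaration not allowed: {name!r}")


def parse_untrusted(xml_bytes):
    """Parse XML from an untrusted source, refusing any DTD."""
    # First pass: a bare expat parser whose only job is to spot a DOCTYPE.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_bytes, True)

    # Second pass: the document has no DTD, so it cannot declare entities.
    return ET.fromstring(xml_bytes)
```

A call such as parse_untrusted(b"<mytype>Hello</mytype>") returns the root element as usual, while either of the attack payloads shown earlier raises DTDForbidden. The third-party defusedxml package provides maintained wrappers built on a similar idea and is usually a better choice than hand-rolled code in production.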

Additional Prevention Tips

Here are a few general guidelines that can help you prevent XXE:

  • Manually disable DTDs – configure XML parsers in your applications to disable custom document type definitions (DTDs). Most applications don’t use DTDs, so this should not hurt any functionality, but can prevent XXE attacks.
  • Instrument your application server – insert checkpoints in specific parts of your code to monitor runtime execution, and detect and block classes related to XML processing. This can deal with XML parsers you missed somewhere in your application code, and can prevent the most severe XXE exploits which lead to remote code execution.
  • Use security tools – Web Application Firewalls (WAF) have built-in rules that can block obvious XXE inputs. Dynamic Application Security Testing (DAST) tools can scan for XXE vulnerabilities early in the development process and suggest how to remediate them.
  • Harden configuration against XXE – the regular application hardening best practices will also be effective against XXE. Limit permissions, validate all inputs before they reach XML parsing logic, handle errors, use authentication and encryption, limit outbound traffic, and limit DNS communications.
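As an illustration of the kind of check that can flag obvious XXE inputs before they reach the real parser, the sketch below lists the external entities a payload declares. It uses the standard library expat bindings. The function name external_entities is hypothetical, and this is an assumption-laden sketch rather than a complete filter: for example, it may miss declarations that follow a parameter entity reference.

```python
from xml.parsers import expat


class _StopParsing(Exception):
    """Internal signal: the DTD has been read, stop before the document body."""


def external_entities(xml_bytes):
    """Return {entity name: system identifier} for declared external entities."""
    found = {}

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities carry a value; external ones carry a system
        # identifier such as a file path or URL.
        if system_id is not None:
            found[name] = system_id

    def on_start_element(name, attrs):
        # Entity declarations only appear before the root element, so the
        # body is never processed and no entity reference is expanded.
        raise _StopParsing

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.StartElementHandler = on_start_element
    try:
        parser.Parse(xml_bytes, True)
    except _StopParsing:
        pass
    return found
```

For the file-disclosure payload shown earlier, this reports the entity malicious with the system identifier file:///etc/hosts, which an application could log or use as grounds to reject the request. A check like this complements, and does not replace, disabling DTDs in the parser itself.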

Learn more in our detailed guide to XXE prevention

Real-Life Examples of XXE Vulnerability

Here are some real-life examples of XXE vulnerabilities:

  • Android development tools – some of the most popular Android development tools include Android Studio, Eclipse and APKTool. They all parsed XML in a way that allowed attackers to gain access through external entities, creating a serious vulnerability in these apps. This was patched in later versions, but it serves as a good reminder that one can never be complacent when it comes to security in development, even when relying on industry-leading third-party tools.
  • WordPress – XXE vulnerabilities have occurred in WordPress as well. This is especially alarming given that approximately 40% of all websites use this CMS.

XXE Protection with Bright

Bright Dynamic Application Security Testing (DAST) helps automate the detection and remediation of many vulnerabilities including XXE, early in the development process, across web applications and APIs. 

By shifting DAST scans left, and integrating them into the SDLC, developers and application security professionals can detect vulnerabilities early, and remediate them before they appear in production. Bright completes scans in minutes and achieves zero false positives, by automatically validating every vulnerability. This allows developers to adopt the solution and use it throughout the development lifecycle. 

Scan any web app, or REST, SOAP and GraphQL APIs to prevent XXE vulnerabilities – try Bright free

See Our Additional Guides on Key Application Security Topics

Together with our content partners, we have authored in-depth guides on several other topics that can also be useful as you explore the world of application security.

Security Testing

Learn about security testing techniques and best practices for modern applications and microservices.

API Security

Learn how to secure application programming interfaces (API) and their sensitive data from cyber threats.

CSRF

Learn about cross site request forgery (CSRF) attacks which hijack authenticated connections to perform unauthorized actions.

XSS

Learn about cross site scripting (XSS) attacks which allow hackers to inject malicious code into visitor browsers.

LFI

Learn about local file inclusion (LFI) attacks which allow hackers to run malicious code on remote servers.

Website Security

Learn about how to defend critical websites and web applications against cyber threats.
