# XML External Entity Attacks

**XML External Entity** (XXE) attacks allow a malicious user to read arbitrary files on your server by taking advantage of an unsecured XML parser. If your web-server parses XML you should be sure to disable parsing of inline *Document Type Definitions* (DTDs), since these can be maliciously crafted by an attacker to probe for files on your server.

## Anatomy of an XXE Attack

XML is a useful data format because data files can be checked for correctness before being processed. The structure of an XML document can be validated against a **Document Type Definition** (DTD). DTDs can be inlined in XML documents, and can refer to external entities.

This is where problems can occur. While resolving external entities, an XML parser may make requests over various network protocols, depending on the scheme specified in each URL. By making clever use of external entity references, an attacker can probe your server for files, hang the parser altogether by referencing URLs that never respond, or trigger fraudulent server-side requests.

Below is an example of an XML document with an inline DTD that references the local file `/etc/passwd`, which commonly stores user account information:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xrds [
<!ENTITY passwords SYSTEM "file:///etc/passwd">
]>
<xrds>
&passwords;
</xrds>
```

An unsecured XML parser will expand the file inline during parsing:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xrds>
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
</xrds>
```

If this expanded XML document is returned to the attacker, or leaked in error messages, the attacker can read arbitrary files on your server in this manner.

## Mitigation

Inline DTDs are a rarely used feature. However, XXE attacks remain a risk because many XML parsing libraries leave this feature enabled by default. **Make sure your XML parser configuration disables this feature.** How you do this depends on which XML parsing library you are using. You generally have three options, any of which will keep you safe:

* Disable DOCTYPE declarations altogether.
* Disable external entity declarations.
* Disallow all protocols (URL schemes) in external entity references.

The most secure way to process XML in Python is to use the `defusedxml` package, which offers drop-in replacements for each of the XML parsers in the standard library, specifically hardened against vulnerabilities. It is *strongly* recommended that you switch to this package if you want to avoid XML exploits in your Python code.

The standard library parsers also offer some protection against XXE attacks, in case switching to `defusedxml` is not feasible.
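As a rough illustration of the first option above (disabling DOCTYPE declarations altogether), the following sketch uses the standard library's `xml.parsers.expat` module to reject any document that contains a DOCTYPE before it reaches your real parser. The names `DoctypeForbidden` and `reject_doctype` are hypothetical and introduced only for this example; treat it as a minimal pre-check under those assumptions, not as a replacement for `defusedxml`.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Hypothetical error type: the document contains a DOCTYPE declaration."""


def reject_doctype(xml_bytes):
    """Raise DoctypeForbidden if xml_bytes declares a DOCTYPE.

    Malformed XML still raises expat.ExpatError as usual.
    """

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat as soon as a <!DOCTYPE ...> declaration starts.
        raise DoctypeForbidden(f"DOCTYPE declaration not allowed: {name!r}")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)  # True marks the final chunk of input
```

A caller would run `reject_doctype(data)` on the raw request body first and only hand `data` to the application's parser if no exception was raised.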

### The `sax` Package

Since Python 3.7.1, the `xml.sax` parser no longer processes external general entities by default. Make sure you are running that version or later, or explicitly disable the feature as shown:

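```python
from xml.dom.pulldom import parse
from xml.sax import make_parser
from xml.sax.handler import feature_external_ges

# Create a SAX parser and switch off external general entities.
xml_parser = make_parser()
xml_parser.setFeature(feature_external_ges, False)

# xml_file is the path or file object of the untrusted document.
parse(xml_file, parser=xml_parser)
```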
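To check that this setting behaves as intended, a small self-contained test can help. The sketch below writes a throwaway "secret" file (a hypothetical stand-in for `/etc/passwd`), references it from an inline DTD, and parses the payload with external general entities disabled. `TextCollector` and `demo_blocked_entity` are illustrative names, not part of any library.

```python
import io
import tempfile
from pathlib import Path
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Accumulates all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(xml_bytes):
    """Parse xml_bytes with external general entities off; return the text seen."""
    collector = TextCollector()
    xml_parser = make_parser()
    xml_parser.setFeature(feature_external_ges, False)
    xml_parser.setContentHandler(collector)
    xml_parser.parse(io.BytesIO(xml_bytes))
    return "".join(collector.chunks)


def demo_blocked_entity():
    """Build an XXE payload pointing at a temporary file and parse it safely."""
    with tempfile.TemporaryDirectory() as tmp:
        secret_file = Path(tmp) / "secret.txt"
        secret_file.write_text("TOP-SECRET", encoding="utf-8")
        payload = (
            '<?xml version="1.0" encoding="utf-8"?>\n'
            f'<!DOCTYPE xrds [<!ENTITY leak SYSTEM "{secret_file.as_uri()}">]>\n'
            "<xrds>&leak;</xrds>"
        ).encode("utf-8")
        return parse_without_external_entities(payload)
```

If the feature is disabled correctly, the string returned by `demo_blocked_entity()` should not contain the file's contents.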

### The `lxml` Package

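When parsing XML, disable `resolve_entities` and network access as follows:

```python
from lxml.etree import XMLParser, parse

# Do not substitute entities, and never fetch resources over the network.
xml_parser = XMLParser(resolve_entities=False, no_network=True)

# xml_file is the path or file object of the untrusted document.
parsed_xml = parse(xml_file, xml_parser)
root_node = parsed_xml.getroot()
```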

### The `etree` Package

`xml.etree.ElementTree` does not expand external entities and raises a `ParseError` when one occurs.
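This behavior can be observed with a short sketch like the one below, which feeds the example payload from earlier in this article to `xml.etree.ElementTree` and treats a `ParseError` as a rejected document. The helper name `parse_or_none` is hypothetical, and returning `None` is just one possible way to handle the failure.

```python
import xml.etree.ElementTree as ET

# The XXE payload from the "Anatomy of an XXE Attack" section.
PAYLOAD = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<!DOCTYPE xrds [<!ENTITY passwords SYSTEM "file:///etc/passwd">]>\n'
    b"<xrds>&passwords;</xrds>"
)


def parse_or_none(xml_bytes):
    """Return the root element, or None if ElementTree refuses the document."""
    try:
        return ET.fromstring(xml_bytes)
    except ET.ParseError:
        # Raised for the external entity reference, and for malformed XML.
        return None
```

In a real application you would likely log the rejection and return an error response rather than silently returning `None`.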

### The `minidom` Package

The `minidom` package does not expand entities and simply returns the unexpanded entity verbatim.
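A quick way to confirm that `minidom` leaves an external entity alone is to point the entity at a throwaway file and check that its contents never show up in the serialized result. This is only a sketch: `SECRET` and `minidom_roundtrip` are hypothetical names, and the temporary file stands in for a sensitive file such as `/etc/passwd`.

```python
import tempfile
from pathlib import Path
from xml.dom.minidom import parseString

SECRET = "TOP-SECRET"  # hypothetical file contents used only for this check


def minidom_roundtrip(secret=SECRET):
    """Parse an XXE payload with minidom and return the serialized root element."""
    with tempfile.TemporaryDirectory() as tmp:
        secret_file = Path(tmp) / "secret.txt"
        secret_file.write_text(secret, encoding="utf-8")
        payload = (
            '<?xml version="1.0" encoding="utf-8"?>\n'
            f'<!DOCTYPE xrds [<!ENTITY leak SYSTEM "{secret_file.as_uri()}">]>\n'
            "<xrds>&leak;</xrds>"
        ).encode("utf-8")
        document = parseString(payload)
        # Serialize only the root element, leaving out the DOCTYPE itself.
        return document.documentElement.toxml()
```

The returned markup should never include the contents of the referenced file.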

## Further Considerations

You should run your server processes with only the permissions they require to function – follow the *principle of least privilege*. This means restricting which directories in the file-system can be accessed. Consider running in a **chroot jail** if you are running on Linux.

This “defense in depth” approach means that even if an attacker manages to compromise your web-server with an XML attack, the damage they can do is limited.

## CWEs

* [CWE-611](https://cwe.mitre.org/data/definitions/611.html)
