# XML External Entity Attacks
**XML External Entity** (XXE) attacks allow a malicious user to read arbitrary files on your server by taking advantage of an unsecured XML parser. If your web-server parses XML you should be sure to disable parsing of inline *Document Type Definitions* (DTDs), since these can be maliciously crafted by an attacker to probe for files on your server.
## Anatomy of an XXE Attack
XML is a useful data format because data files can be checked for correctness before being processed. The structure of an XML document can be validated against a **Document Type Definition** (DTD). DTDs can be inlined in XML documents, and can refer to external entities.
This is where problems can occur. In the process of resolving external entities, an XML parser may consult various networking protocols depending on the scheme specified in URLs. By making clever use of external entity references, an attacker can probe your server for files, hang the parser altogether by referencing URLs that never respond, or trigger fraudulent requests on the server-side.
Below is an example of an XML document with an inline DTD that references the local file `/etc/passwd`, which stores user account information on Unix-like systems:
```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xrds [
  <!ENTITY passwords SYSTEM "file:///etc/passwd">
]>
<xrds>
  &passwords;
</xrds>
```
An unsecured XML parser will expand the file inline during parsing:
```xml
<?xml version="1.0" encoding="utf-8"?>
<xrds>
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
</xrds>
```
If this expanded XML document is returned to the attacker, or leaked in error messages, the attacker can read arbitrary files on your server in this manner.
## Mitigation
Inline DTDs are a rarely used feature. However, XML external entity attacks remain a risk because many XML parsing libraries do not disable this feature by default. **Make sure your XML parser configuration disables this feature.** How you do this depends on which XML parsing library you are using. You generally have three options, any of which will keep you safe:
* Disable DOCTYPE declarations altogether.
* Disable external entity declarations.
* Disallow external entities from using any protocols.
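In Python, for example, the first option can be sketched with the standard library's `expat` bindings. The names `DoctypeForbidden` and `parse_rejecting_doctype` are hypothetical and exist only for this illustration. The sketch parses the input twice, which keeps it simple but is not the most efficient approach:

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration (hypothetical name)."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this handler as soon as it encounters "<!DOCTYPE ...".
    raise DoctypeForbidden(f"DOCTYPE declaration not allowed (root name {name!r})")


def parse_rejecting_doctype(data: bytes) -> ET.Element:
    # First pass: a bare expat parser whose only job is to abort on a DOCTYPE.
    # Malformed input raises expat.ExpatError here instead.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)

    # Second pass: no DTD is present, so build the element tree as usual.
    return ET.fromstring(data)
```

A document without a DOCTYPE parses normally, while any document that declares one is refused before its entities are ever looked at.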
The most secure way to process XML in Python is to use the `defusedxml` package, which offers drop-in replacements for each of the XML parsers in the standard library, specifically hardened against vulnerabilities. It is *strongly* recommended you switch to this package if you want to avoid XML exploits in your Python code.
The standard library parsers also offer some protection against XXE attacks, in case switching to `defusedxml` is not feasible.
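As a rough sketch of what "drop-in" means in practice, the snippet below imports `fromstring` from `defusedxml.ElementTree` and otherwise uses it like the standard library function. The fallback import and the helper name `parse_untrusted` are assumptions of this example, not something `defusedxml` prescribes. In production you may prefer to fail loudly if the hardened package is missing:

```python
try:
    # Hardened replacement with the same basic call signature.
    from defusedxml.ElementTree import fromstring
except ImportError:
    # Fallback for environments without defusedxml (assumption of this sketch).
    from xml.etree.ElementTree import fromstring

XXE_PAYLOAD = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xrds [
  <!ENTITY passwords SYSTEM "file:///etc/passwd">
]>
<xrds>&passwords;</xrds>
"""


def parse_untrusted(data: bytes):
    """Return the root element, or None if the parser refuses the document."""
    try:
        return fromstring(data)
    except (ValueError, SyntaxError) as exc:
        # defusedxml's exceptions derive from ValueError;
        # the standard library's ParseError derives from SyntaxError.
        print(f"Rejected XML document: {exc}")
        return None


parse_untrusted(XXE_PAYLOAD)        # refused by either parser
parse_untrusted(b"<xrds>ok</xrds>")  # parsed normally
```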
### The `sax` Package
Since Python 3.7.1, the `sax` package (`xml.sax`) no longer processes external general entities by default. Make sure you are running this version or later, or explicitly disable the feature as shown:
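```python
from xml.dom.pulldom import parse
from xml.sax import make_parser
from xml.sax.handler import feature_external_ges

# Create a SAX parser and switch off external general entities.
xml_parser = make_parser()
xml_parser.setFeature(feature_external_ges, False)

# xml_file is the path or file object of the document to parse.
parse(xml_file, parser=xml_parser)
```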
### The `lxml` Package
When parsing XML, disable entity resolution (`resolve_entities`) and network access (`no_network`) as follows:
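```python
from lxml.etree import XMLParser, parse

# Do not expand entities, and never fetch anything over the network.
xml_parser = XMLParser(resolve_entities=False, no_network=True)

# xml_file is the path or file object of the document to parse.
parsed_xml = parse(xml_file, xml_parser)
root_node = parsed_xml.getroot()
```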
### The `etree` Package
`xml.etree` does not expand external entities and raises a `ParseError` when an entity occurs.
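A short sketch of how this looks in practice, reusing the malicious document from earlier in this article. The helper name `try_parse` is made up for this example:

```python
import xml.etree.ElementTree as ET

XXE_DOCUMENT = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xrds [
  <!ENTITY passwords SYSTEM "file:///etc/passwd">
]>
<xrds>&passwords;</xrds>
"""


def try_parse(data: bytes):
    """Return the root element, or None if ElementTree rejects the input."""
    try:
        return ET.fromstring(data)
    except ET.ParseError as exc:
        # The external entity is never fetched; the reference is treated
        # as an undefined entity and parsing stops.
        print(f"Parser refused the document: {exc}")
        return None


try_parse(XXE_DOCUMENT)
```

Your application should treat this error like any other malformed input and avoid echoing parser error details back to the client.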
### The `minidom` Package
The `minidom` package does not expand external entities and simply returns the unexpanded entity verbatim.
## Further Considerations
You should run your server processes with only the permissions they require to function – follow the *principle of least privilege*. This means restricting which directories in the file-system can be accessed. Consider running in a **chroot jail** if you are running on Linux.
This “defense in depth” approach means that even if an attacker manages to compromise your web-server with an XML attack, the damage they can do is limited.
## CWEs
* [CWE-611](https://cwe.mitre.org/data/definitions/611.html)