The XML standard allows the use of entities, declared in the DOCTYPE of the document, which can be internal or external.
When the XML file is parsed, the content of external entities is retrieved from external storage such as the file system or the network, which may
lead, if no restrictions are put in place, to arbitrary file disclosure or server-side request forgery (SSRF) vulnerabilities.
It’s recommended to limit resolution of external entities by using one of these solutions:
- If DOCTYPE is not necessary, completely disable all DOCTYPE declarations.
- If external entities are not necessary, completely disable their declarations.
- If external entities are necessary, then:
  - Use XML processor features, if available, to authorize only the required protocols (e.g. https).
  - Use an entity resolver (and optionally an XML Catalog) to resolve only trusted entities.
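The entity resolver recommendation can be sketched with the standard library's xml.sax module. This is a minimal illustration under stated assumptions, not a definitive implementation: the names AllowlistEntityResolver, TextCollector and parse_with_allowlist, as well as the trusted URL, are hypothetical and introduced only for this example. The resolver aborts parsing for any external entity whose system identifier is not explicitly trusted.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges


class AllowlistEntityResolver(EntityResolver):
    """Hypothetical resolver: accept only explicitly trusted system IDs."""

    def __init__(self, trusted_system_ids):
        self._trusted = frozenset(trusted_system_ids)

    def resolveEntity(self, publicId, systemId):
        if systemId not in self._trusted:
            # Abort parsing instead of fetching an untrusted resource.
            raise xml.sax.SAXException(f"untrusted external entity: {systemId}")
        # Trusted: let the parser load the resource from its system ID.
        return systemId


class TextCollector(ContentHandler):
    """Collects character data so the parsed text can be inspected."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_allowlist(xml_bytes, trusted_system_ids):
    parser = xml.sax.make_parser()
    handler = TextCollector()
    parser.setContentHandler(handler)
    # External general entities are needed here, so they are enabled,
    # but every resolution goes through the allowlist resolver.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(AllowlistEntityResolver(trusted_system_ids))
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


MALICIOUS = (
    b'<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    b"<r>&xxe;</r>"
)

try:
    parse_with_allowlist(MALICIOUS, {"https://example.com/trusted.ent"})
    blocked = False
except xml.sax.SAXException:
    blocked = True  # the untrusted file:// entity was rejected
```

Raising from the resolver stops the parse outright; an application that prefers to skip untrusted entities silently would need a different strategy.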
Noncompliant Code Example
lxml module:
from lxml import etree

parser = etree.XMLParser() # Noncompliant: by default resolve_entities is set to True
tree1 = etree.parse('ressources/xxe.xml', parser)
root1 = tree1.getroot()
parser = etree.XMLParser(resolve_entities=True) # Noncompliant
tree1 = etree.parse('ressources/xxe.xml', parser)
root1 = tree1.getroot()
parser = etree.XMLParser(resolve_entities=True) # Noncompliant
treexsd = etree.parse('ressources/xxe.xsd', parser)
rootxsd = treexsd.getroot()
schema = etree.XMLSchema(rootxsd)
treexsl = etree.parse('ressources/xxe.xsl', parser)
rootxsl = treexsl.getroot()
ac = etree.XSLTAccessControl(read_network=True, write_network=False) # Noncompliant: read_network is set to True, so network access is authorized
transform = etree.XSLT(rootxsl, access_control=ac)
xml.sax module:
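import xml.sax
from xml.sax.handler import feature_external_ges

parser = xml.sax.make_parser()
myHandler = MyHandler() # MyHandler is a user-defined xml.sax.ContentHandler subclass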
parser.setContentHandler(myHandler)
parser.setFeature(feature_external_ges, True) # Noncompliant
parser.parse("ressources/xxe.xml")
Compliant Solution
lxml module:
- When parsing XML, disable resolve_entities and network access:
parser = etree.XMLParser(resolve_entities=False, no_network=True) # Compliant
tree1 = etree.parse('ressources/xxe.xml', parser)
root1 = tree1.getroot()
parser = etree.XMLParser(resolve_entities=False) # Compliant: by default no_network is set to True
treexsd = etree.parse('ressources/xxe.xsd', parser)
rootxsd = treexsd.getroot()
schema = etree.XMLSchema(rootxsd) # Compliant
- When transforming XML, disable access to network and file system:
parser = etree.XMLParser(resolve_entities=False) # Compliant
treexsl = etree.parse('ressources/xxe.xsl', parser)
rootxsl = treexsl.getroot()
ac = etree.XSLTAccessControl.DENY_ALL # Compliant
transform = etree.XSLT(rootxsl, access_control=ac) # Compliant
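The lxml settings above can be exercised end to end with a small self-contained sketch. It assumes lxml is installed; the helper hardened_parser, the temporary secret file, and the inline stylesheet are hypothetical and exist only for this illustration. The document declares an external entity pointing at a local file, and the hardened parser leaves the reference unexpanded instead of reading the file.

```python
import pathlib
import tempfile

from lxml import etree

# Hypothetical stylesheet used only for this illustration.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out><xsl:value-of select="/r/@id"/></out>
  </xsl:template>
</xsl:stylesheet>"""


def hardened_parser():
    # Entities are not expanded and network lookups are forbidden.
    return etree.XMLParser(resolve_entities=False, no_network=True)


with tempfile.TemporaryDirectory() as tmp:
    secret = pathlib.Path(tmp) / "secret.txt"
    secret.write_text("TOP-SECRET")
    doc = (
        f'<!DOCTYPE r [<!ENTITY xxe SYSTEM "{secret.as_uri()}">]>'
        '<r id="1">&xxe;</r>'
    ).encode()
    root = etree.fromstring(doc, hardened_parser())
    # The entity reference stays unexpanded, so the file content is absent.
    serialized = etree.tostring(root)

# Deny file system and network access to the stylesheet at transform time.
transform = etree.XSLT(
    etree.fromstring(STYLESHEET, hardened_parser()),
    access_control=etree.XSLTAccessControl.DENY_ALL,
)
result = str(transform(root))
```

Using one factory function for every parser instance makes it harder to forget the hardened options when a new parsing call is added.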
To prevent XXE attacks with the xml.sax module (note that xml.sax is not recommended for security reasons other than XXE):
parser = xml.sax.make_parser()
myHandler = MyHandler()
parser.setContentHandler(myHandler)
parser.parse("ressources/xxe.xml") # Compliant: since Python 3.7.1, the SAX parser no longer processes general external entities by default
# Alternatively, disable the feature explicitly (needed before Python 3.7.1):
parser.setFeature(feature_external_ges, False) # Compliant
parser.parse("ressources/xxe.xml")
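The xml.sax behavior described above can be checked with a short standard-library sketch. The names TextCollector and extract_text and the sample document are hypothetical, introduced only for illustration. With feature_external_ges disabled, the external entity reference is skipped and only the literal text of the document reaches the handler.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collects character data so the parsed text can be inspected."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def extract_text(xml_bytes):
    parser = xml.sax.make_parser()
    handler = TextCollector()
    parser.setContentHandler(handler)
    # Set explicitly so the behavior does not depend on the Python version.
    parser.setFeature(feature_external_ges, False)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


DOC = (
    b'<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>'
    b"<r>before &xxe; after</r>"
)

# The referenced file is never opened; the entity contributes no text.
text = extract_text(DOC)
```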
See