Why is this an issue?
XML injections occur when an application builds an XML-formatted string from user input without prior validation or sanitation. In such a case, a
tainted user-controlled value can tamper with the XML string content. Especially, unexpected arbitrary elements and attributes can be inserted in the
corresponding XML description.
A malicious injection payload could, for example:
- Insert tags into the main XML document.
- Add attributes to an existing XML tag.
- Change the data value inside a tag.
A malicious user-supplied value can perform other modifications depending on where and how the constructed data is later used.
What is the potential impact?
The consequences of an XML injection attack on an application vary greatly depending on the application’s logic. It can affect the application
itself or another element if the XML document is used for cross-component data exchange. For this reason, the actual impact can range from benign
information disclosure to critical remote code execution.
Information disclosure
An attacker can forge an attack payload that will modify the XML document so that it will become syntactically incorrect. In that case, when the
data is later used, the parsing component will raise a technical error. If displayed back to the attacker or made available through log files, this
technical error may disclose sensitive business or technical information.
This scenario, while in general the less severe one, is the most frequently encountered. It can combine with any other logic-dependant threat.
Internal requests tampering
Some applications communicate with backend micro-services APIs using XML-based protocols such as SOAP. When those applications are vulnerable to
XML injections, attackers can tamper with the internal requests' content. This will allow them to change internal requests' parameters or locations
which, in turn, can lead to various consequences like performing unauthorized actions or accessing sensitive data.
For example, altering a user creation request in such a way can lead to a privilege escalation if attackers manage to modify the default account
privilege level.
Code execution
An application might build objects based on an XML serialization string. In that case, an attacker that would exploit an XML injection could be
able to alter the serialization string to modify the corresponding object’s properties.
Depending on the deserialization process, this might allow instantiating arbitrary objects or objects with sensitive properties altered. This can
lead to arbitrary code being executed in the same way as a deserialization injection vulnerability.
How to fix it in Java SE
Code examples
The following code is an example of an overly simple authentication function: The role of a user is set in an XML file and the default user role is
user
.
This example code is vulnerable to an XML injection vulnerability because it builds an XML string from user input without prior
sanitation or validation.
In this particular case, the query can be exploited with the following string:
attacker</username><role>admin</role></user>
<user><username>foo
By adapting and inserting this string into the username
field, an attacker would be able to log in as an admin.
Noncompliant code example
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
String xml =
"""<user>
<username>"""+req.getParameter("username")+"""</username>
<role>user</role>
</user>""";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = factory.newDocumentBuilder();
builder.parse(new InputSource(new StringReader(xml))); // Noncompliant
} catch (ParserConfigurationException | SAXException e) {
resp.sendError(400);
}
}
Compliant solution
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.newDocument();
Element user = doc.createElement("user");
doc.appendChild(user);
Element usernameElement = doc.createElement("username");
user.appendChild(usernameElement);
username_element.setTextContent(req.getParameter("username"));
Element role = doc.createElement("role");
user.appendChild(role);
role.setTextContent("user");
} catch (ParserConfigurationException e) {
resp.sendError(400);
}
}
How does this work?
In most cases, building XML strings with a direct concatenation of user input is discouraged. While not always possible, a strong pattern-based
validation can help sanitize tainted inputs. Likewise, converting to a harmless type can sometimes be a solution.
However, directly constructing Java objects should be preferred over handling the properties of objects as strings.
Programmatic object building
In most cases, an application can directly create documents from user input without having to build and parse an XML string. Doing so prevents
injection vulnerabilities as XML document construction libraries and functions will properly escape and check the type of input values.
Sometimes, the application might need to include the user input in a document built from a trusted XML string. In that case, the recommended
solution is to parse the trusted string first and then programmatically modify the resulting document.
The example compliant code takes advantage of the javax.xml
and org.w3c.dom
libraries capabilities to programmatically
build XML documents.
Converting to a harmless type
When the application allows it, casting user-submitted data to a harmless type can help prevent XML injection vulnerabilities. In particular,
converting user inputs to numeric types is an efficient sanitation mechanism.
This mechanism can be extended to other types, including more complex ones. However, care should be taken when dealing with them, as manually
validating or sanitizing complex types can represent a challenge.
Note that choosing this solution can be error-prone: every user input has to be validated or sanitized without oversight.
Resources
Standards