Why is this an issue?
Deserialization injections occur when applications deserialize wholly or partially untrusted data without verification.
What is the potential impact?
In the context of a web application performing unsafe deserialization, attackers who have detected the injection vector inject a carefully crafted
payload into the application.
The following real-world scenarios illustrate some of the impacts of an attacker exploiting the vulnerability.
Application-specific attacks
In this scenario, the attackers succeed in injecting an object of the expected class, but with malicious properties that affect the object’s
behavior.
If the application relies on the properties of the deserialized object, attackers can modify the data structure or content to escalate privileges
or perform unwanted actions.
In the context of an e-commerce application, this could be changing the number of products or prices.
Full application compromise
In the worst-case scenario, the attackers succeed in injecting an object of a completely different class than expected, triggering code
execution.
Depending on the attacker, code execution can be used with different intentions:
- Download the internal server’s data, most likely to sell it.
- Modify data or install malware, for instance, malware that mines cryptocurrencies.
- Stop services or exhaust resources, for instance, with fork bombs.
This threat is particularly insidious if the attacked organization does not maintain a Disaster Recovery Plan (DRP).
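As a harmless illustration of how an object of an unexpected class can trigger code execution, the sketch below uses Python's pickle module. The Gadget class is hypothetical: its __reduce__ method tells the unpickler which callable to invoke, and here that callable is eval applied to a fixed arithmetic string. A real attacker would pick a far more damaging call.

```python
import pickle


class Gadget:
    """Hypothetical attacker-side class, used only to build the payload."""

    def __reduce__(self):
        # The unpickler will call eval("6 * 7") instead of rebuilding a Gadget.
        return (eval, ("6 * 7",))


# The attacker serializes the gadget and sends the bytes to the application.
payload = pickle.dumps(Gadget())

# The application deserializes the bytes: the chosen callable runs immediately.
result = pickle.loads(payload)
print(type(result).__name__, result)  # int 42
```

The application never receives a Gadget instance. The call happens during deserialization itself, before any validation of the resulting object could take place.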
Root privilege escalation and pivot
In this scenario, the attacker can do everything described in the previous section. The difference is that the attacker additionally manages to
elevate their privileges to those of an administrator and attack other servers.
Here, the impact depends on how much the target company focuses on its Defense In Depth. For example, the entire infrastructure can be compromised
through a combination of unsafe deserialization and misconfiguration:
- Docker or Kubernetes clusters
- cloud services
- network firewalls and routing
- OS access control
How to fix it in Python Standard Library
Code examples
The following code is vulnerable to deserialization attacks because it deserializes HTTP data without validating it first.
Noncompliant code example
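import pickle
from base64 import b64decode
from flask import request  # assumption: "request" is Flask's request object

def unsafe():
    objstr = b64decode(request.args.get("object"))
    obj = pickle.loads(objstr)  # Noncompliant: unpickles attacker-controlled bytes
    return str(obj.status == "OK")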
Compliant solution
import json
from flask import request  # assumption: "request" is Flask's request object

def safe():
    obj = json.loads(request.args.get("object"))
    return str(obj["status"] == "OK")
How does this work?
Allowing users to provide data for deserialization generally creates more problems than it solves.
Anything that can be done through deserialization can generally be done with more secure data structures.
Therefore, our first suggestion is to avoid deserialization in the first place.
However, if deserialization mechanisms are valid in your context, here are some security suggestions.
More secure serialization methods
Some serialization methods are more secure than others and reduce the risk of security breaches, although they do not eliminate it.
A complete object serializer is probably unnecessary if you only need to receive primitive data (for example integers, strings, bools, etc.).
In this case, formats such as JSON and XML protect the application from deserialization attacks by default.
For more complex objects, the next step is to control which class fields are exposed by creating class-specific serialization methods.
The most common methods are the Data Transfer Object (DTO) pattern and Google Protocol Buffers (protobufs). After you define the Protobuf data
structure, the Protobuf compiler creates class files that handle operations such as serializing and deserializing data.
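A minimal sketch of the DTO approach, assuming an e-commerce order with two integer fields. The OrderDTO class, its field names, and the validation rules are hypothetical and only illustrate the idea of exposing an explicit set of fields parsed from JSON.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderDTO:
    # Hypothetical fields: only these two values can cross the trust boundary.
    product_id: int
    quantity: int

    @classmethod
    def from_json(cls, raw: str) -> "OrderDTO":
        data = json.loads(raw)  # invalid JSON raises a ValueError subclass
        if not isinstance(data, dict) or set(data) != {"product_id", "quantity"}:
            raise ValueError("unexpected fields")
        product_id, quantity = data["product_id"], data["quantity"]
        for value in (product_id, quantity):
            # bool is a subclass of int, so it is rejected explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError("fields must be integers")
        if quantity < 1:
            raise ValueError("quantity must be positive")
        return cls(product_id=product_id, quantity=quantity)


order = OrderDTO.from_json('{"product_id": 7, "quantity": 2}')
```

Because the DTO is built field by field from primitive values, a client cannot smuggle in extra attributes such as a price, and no arbitrary class is ever instantiated.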
Integrity check
Message authentication codes (MAC) can be used to prevent tampering with serialized data that is meant to be stored outside the application
server:
- On the server-side, when serializing an object, compute a MAC of the result and append it to the serialized object string.
- When the serialized value is submitted back, verify the serialization string MAC on the server side before deserialization.
Depending on the situation, two MAC computation modes can be used.
If the same application is responsible for both computing and validating the MAC, a symmetric signature algorithm can be used. In that case, HMAC
should be preferred, with a strong underlying hash algorithm such as SHA-256.
If multiple parties have to validate the serialized data, an asymmetric signature algorithm should be used. This reduces the chance that a
signing secret is leaked. In that case, the RSASSA-PSS algorithm can be used.
Note: Be sure to store the signing secret securely.
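The symmetric case can be sketched with the standard hmac and hashlib modules. This is an illustration under stated assumptions, not a complete protocol: the SECRET_KEY value, the sign and verify helper names, and the "payload.tag" layout are all hypothetical choices.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration only; load a real key from a secret store.
SECRET_KEY = b"replace-with-a-key-from-a-secret-store"


def sign(payload: bytes) -> bytes:
    """Append a hex-encoded HMAC-SHA256 tag to the payload, separated by a dot."""
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest().encode()
    return payload + b"." + tag


def verify(token: bytes) -> bytes:
    """Return the payload if its tag is valid, otherwise raise ValueError."""
    payload, sep, tag = token.rpartition(b".")
    if not sep:
        raise ValueError("missing MAC")
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest().encode()
    # compare_digest avoids leaking the position of the first mismatch.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("invalid MAC")
    return payload


token = sign(json.dumps({"status": "OK"}).encode())
data = json.loads(verify(token))  # deserialize only after the MAC check passes
```

The important design point is the order of operations: the MAC is verified on the raw bytes first, and deserialization happens only if verification succeeds.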
How to fix it in PyYAML
Code examples
The following code is vulnerable to deserialization attacks because it deserializes untrusted data without validating it first.
Noncompliant code example
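import yaml
from flask import request  # assumption: "request" is Flask's request object

def unsafe():
    obj = yaml.load(request.args.get("object"), Loader=yaml.Loader)  # Noncompliant
    return str(obj["status"] == "OK")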
Compliant solution
import yaml
from flask import request  # assumption: "request" is Flask's request object

def safe():
    obj = yaml.safe_load(request.args.get("object"))
    return str(obj["status"] == "OK")
How does this work?
Allowing users to provide data for deserialization generally creates more problems than it solves.
Anything that can be done through deserialization can generally be done with more secure data structures.
Therefore, our first suggestion is to avoid deserialization in the first place.
However, if deserialization mechanisms are valid in your context, here are some security suggestions.
More secure serialization methods
Some serialization methods are more secure than others and reduce the risk of security breaches, although they do not eliminate it.
A complete object serializer is probably unnecessary if you only need to receive primitive data (for example integers, strings, bools, etc.).
In this case, formats such as JSON and XML protect the application from deserialization attacks by default.
For more complex objects, the next step is to control which class fields are exposed by creating class-specific serialization methods.
The most common methods are the Data Transfer Object (DTO) pattern and Google Protocol Buffers (protobufs). After you define the Protobuf data
structure, the Protobuf compiler creates class files that handle operations such as serializing and deserializing data.
The example compliant solution uses the safe_load method in place of the less secure load method. It disables the loading of arbitrary Python
objects and thus prevents the execution of arbitrary or dangerous functions.
Note that the same level of security can be reached with the load method as long as a safe Loader component is passed to it:
yaml.load(untrusted, Loader=yaml.SafeLoader)
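The difference can be observed with a short sketch, assuming PyYAML is installed. The document below carries a Python-specific tag that asks the loader to call a function; os.getcwd is used as a harmless stand-in for a dangerous call, and the variable names are illustrative.

```python
import yaml

# A document that asks the loader to call a Python function while loading.
malicious = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(malicious)
    rejected = False
except yaml.YAMLError:
    # The safe loader has no constructor for Python-specific tags.
    rejected = True

# Plain data made of mappings, sequences, and scalars still loads normally.
plain = yaml.safe_load("status: OK")
print(rejected, plain)
```

With the safe loader, the tagged document is refused with an error instead of being executed, while ordinary mappings and scalars are returned as built-in Python types.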
Integrity check
Message authentication codes (MAC) can be used to prevent tampering with serialized data that is meant to be stored outside the application
server:
- On the server-side, when serializing an object, compute a MAC of the result and append it to the serialized object string.
- When the serialized value is submitted back, verify the serialization string MAC on the server side before deserialization.
Depending on the situation, two MAC computation modes can be used.
If the same application is responsible for both computing and validating the MAC, a symmetric signature algorithm can be used. In that case, HMAC
should be preferred, with a strong underlying hash algorithm such as SHA-256.
If multiple parties have to validate the serialized data, an asymmetric signature algorithm should be used. This reduces the chance that a
signing secret is leaked. In that case, the RSASSA-PSS algorithm can be used.
Note: Be sure to store the signing secret securely.
Resources
Standards