Why is this an issue?
Regular expression injections occur when the application retrieves untrusted data and uses it as a regex to pattern match a string with it.
Most regular expression search engines use backtracking to try all possible regex execution paths when evaluating an input. Sometimes this
can lead to performance problems also referred to as catastrophic backtracking situations.
What is the potential impact?
In the context of a web application vulnerable to regex injection:
After discovering the injection point, attackers insert data into the
vulnerable field to make the affected component inaccessible.
Depending on the application’s software architecture and the injection point’s location, the impact may or may not be visible.
Below are some real-world scenarios that illustrate some impacts of an attacker exploiting the vulnerability.
Self Denial of Service
In cases where the complexity of the regular expression is exponential to the input size, a small, carefully-crafted input (for example, 20 chars)
can trigger catastrophic backtracking and cause a denial of service of the application.
Super-linear regex complexity can produce the same effects for a large, carefully crafted input (thousands of chars).
If the component jeopardized by this vulnerability is not a bottleneck that acts as a single point of failure (SPOF) within the application, the
denial of service might only affect the attacker who initiated it.
Such benign denial of service can also occur in architectures that rely heavily on containers and container orchestrators. Replication systems
would detect the failure of a container and automatically replace it.
Infrastructure SPOFs
However, a denial of service attack can be critical to the enterprise if it targets a SPOF component. Sometimes the SPOF is a software architecture
vulnerability (such as a single component on which multiple critical components depend) or an operational vulnerability (for example, insufficient
container creation capabilities or failures from containers to terminate).
In either case, attackers aim to exploit the infrastructure weakness by sending as many malicious payloads as possible, using potentially huge
offensive infrastructures.
These threats are particularly insidious if the attacked organization does not maintain a disaster recovery plan (DRP).
How to fix it in Java SE
Code examples
The following noncompliant code is vulnerable to Regex Denial of Service because untrusted data is used as a regex to scan a string without prior
sanitization or validation.
Noncompliant code example
public boolean validate(HttpServletRequest request) {
String regex = request.getParameter("regex");
String input = request.getParameter("input");
return input.matches(regex);
}
Compliant solution
public boolean validate(HttpServletRequest request) {
String regex = request.getParameter("regex");
String input = request.getParameter("input");
return input.matches(Pattern.quote(regex));
}
How does this work?
Sanitization and Validation
Metacharacters escape using native functions is a solution against regex injection.
The escape function sanitizes the input so that the regular
expression engine interprets these characters literally.
An allowlist approach can also be used by creating a list containing authorized and secure strings you want the application to use in a query.
If a user input does not match an entry in this list, it should be considered unsafe and rejected.
Important Note: The application must sanitize and validate on the server side. Not on client-side front end.
Where possible, use non-backtracking regex engines, for example, Google’s re2.
In the example, Pattern.quote
escapes metacharacters and escape sequences that could have broken the initially intended logic.
Resources
Articles & blog posts
Standards