In regular expressions the escape sequence \cX
, where the X stands for any character that’s either @
, any capital ASCII
letter, [
, \
, ]
, ^
or _
, represents the control character that "corresponds" to the
character following \c
, meaning the control character that comes 64 bytes before the given character in the ASCII encoding.
In some other regex engines (for example in that of Perl) this escape sequence is case insensitive and \cd
produces the same control
character as \cD
. Further using \c
with a character that’s neither @
, any ASCII letter, [
,
\
, ]
, ^
nor _
, will produce a warning or error in those engines. Neither of these things is true
in Java, where the value of the character is always XORed with 64 without checking that this operation makes sense. Since this won’t lead to a
sensible result for characters that are outside of the @
to _
range, using \c
with such characters is almost
certainly a mistake.
Noncompliant code example
Pattern.compile("\\ca"); // Noncompliant, 'a' is not an upper case letter
Pattern.compile("\\c!"); // Noncompliant, '!' is outside of the '@'-'_' range
Compliant solution
Pattern.compile("\\cA"); // Compliant, this will match the "start of heading" control character
Pattern.compile("\\c^"); // Compliant, this will match the "record separator" control character