While technically out of the scope of this specification, Section 10 of RFC 3629 [STD63] applies to implementations. Particular note needs to be taken of the last paragraph of Section 3 of RFC 3629 [STD63]; an I-Regexp implementation may need to mitigate limitations of the platform implementation in this regard.
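
One possible mitigation is to refuse ill-formed UTF-8 before any matching takes place. The following is a minimal sketch, not a normative procedure. It assumes that the subject arrives as raw bytes and that the cited paragraph of RFC 3629 concerns protection against decoding invalid sequences (such as overlong forms and encoded surrogates). The function names are hypothetical.

```python
def decode_subject(data: bytes) -> str:
    """Decode UTF-8 input, refusing ill-formed sequences instead of repairing them."""
    # errors="strict" is Python's default; it is spelled out to make the policy visible.
    return data.decode("utf-8", errors="strict")


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True only if the whole byte string is well-formed UTF-8."""
    try:
        decode_subject(data)
    except UnicodeDecodeError:
        return False
    return True
```

On platforms whose decoder silently accepts or repairs such sequences, an equivalent check would have to be supplied by the I-Regexp implementation itself.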
As discussed in Section 6, more complex regexp libraries may contain exploitable bugs, which can lead to crashes and remote code execution. There is also the problem that such libraries often have performance characteristics that are hard to predict, leading to attacks that overload an implementation by matching against an expensive attacker-controlled regexp.
I-Regexps have been designed to allow implementation in a way that is resilient to both threats; this objective needs to be addressed throughout the implementation effort. Non-checking implementations (see Section 3.1) are likely to expose security limitations of any regexp engine they use, which may be less problematic if that engine has been built with security considerations in mind (e.g., [RE2]). In any case, a checking implementation is still RECOMMENDED.
Implementations that specifically implement the I-Regexp subset can, with care, be designed to generally run in linear time and space in the input and to detect when that would not be the case (see below).
Existing regexp engines should be able to easily handle most I-Regexps (after the adjustments discussed in Section 5), but may consume excessive resources for some types of I-Regexps or outright reject them because they cannot guarantee efficient execution. (Note that different versions of the same regexp library may be more or less vulnerable to excessive resource consumption for these cases.)
Specifically, range quantifiers (as in a{2,4}) provide particular challenges for both existing and I-Regexp focused implementations. Implementations may therefore limit range quantifiers in composability (disallowing nested range quantifiers such as (a{2,4}){2,4}) or range (disallowing very large ranges such as a{20,200000}), or detect and reject any excessive resource consumption caused by range quantifiers.
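
The two restrictions just described (composability and range) can be checked with a single scan over the pattern text before it reaches a matching engine. The following is a minimal sketch under stated assumptions: the limit MAX_RANGE_BOUND, the exception type, and the function names are hypothetical; exact counts such as a{3} are treated like any other range quantifier; and the scan is not a full I-Regexp validator (for example, it does not report unclosed groups).

```python
import re

# Hypothetical limit; pick a value suited to the deployment.
MAX_RANGE_BOUND = 1000

_RANGE = re.compile(r"\{(\d+)(?:,(\d*))?\}")  # {n}, {n,} or {n,m}
_CATEGORY = re.compile(r"\\[pP]\{[^}]*\}")    # \p{..} / \P{..}: braces are not quantifiers


class RangeQuantifierError(ValueError):
    """The pattern uses a range quantifier this sketch refuses to accept."""


def check_range_quantifiers(pattern: str, max_bound: int = MAX_RANGE_BOUND) -> None:
    """Raise RangeQuantifierError for nested or very large range quantifiers."""
    # One flag per open group (index 0 is the top level):
    # True once a range quantifier has been seen inside that group.
    seen_range = [False]
    # True only when the atom just scanned is a group containing a range quantifier.
    prev_group_has_range = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        atom_is_ranged_group = False
        if ch == "\\":
            # Skip a category escape as a whole, or a two-character escape.
            m = _CATEGORY.match(pattern, i)
            i = m.end() if m else i + 2
        elif ch == "[":
            # Skip a character class; braces inside it are literal characters.
            i += 1
            while i < len(pattern) and pattern[i] != "]":
                i += 2 if pattern[i] == "\\" else 1
            i += 1
        elif ch == "(":
            seen_range.append(False)
            i += 1
        elif ch == ")":
            if len(seen_range) == 1:
                raise RangeQuantifierError("unbalanced ')'")
            atom_is_ranged_group = seen_range.pop()
            seen_range[-1] = seen_range[-1] or atom_is_ranged_group
            i += 1
        elif ch == "{":
            m = _RANGE.match(pattern, i)
            if m is None:
                raise RangeQuantifierError(f"malformed quantifier at offset {i}")
            if prev_group_has_range:
                raise RangeQuantifierError(f"nested range quantifier at offset {i}")
            low = int(m.group(1))
            high = int(m.group(2)) if m.group(2) else low
            if max(low, high) > max_bound:
                raise RangeQuantifierError(f"range bound too large at offset {i}")
            seen_range[-1] = True
            i = m.end()
        else:
            i += 1
        prev_group_has_range = atom_is_ranged_group


def is_acceptable(pattern: str) -> bool:
    """Convenience wrapper: True if the pattern passes the checks above."""
    try:
        check_range_quantifiers(pattern)
    except RangeQuantifierError:
        return False
    return True
```

With these assumed limits, is_acceptable("a{2,4}") holds, while "(a{2,4}){2,4}" and "a{20,200000}" are both refused. An implementation that instead detects excessive resource consumption at match time would not need such a static check.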
I-Regexp implementations that are used to evaluate regexps from untrusted sources need to be robust in these cases. Implementers using existing regexp libraries are encouraged:
- to check their documentation to see if mitigations are configurable, such as limits in resource consumption, and
- to document their own degree of robustness resulting from employing such mitigations.
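
Where a library offers no configurable limits, a caller can still impose and document its own. The following is a minimal sketch of such a wrapper, assuming that size caps on the pattern and the subject are an acceptable first line of defense. The names Limits, LimitExceeded, and guarded_match, as well as the default values, are hypothetical, and Python's `re` module appears only as a stand-in engine; it is a backtracking engine, not an I-Regexp implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Limits:
    # Hypothetical defaults; tune per deployment and document the chosen values.
    max_pattern_chars: int = 1_000
    max_subject_chars: int = 100_000


class LimitExceeded(ValueError):
    """Raised before the engine is invoked when a configured limit is exceeded."""


def _stand_in_engine(pattern: str, subject: str) -> bool:
    # Placeholder only: swap in the regexp library actually in use.
    return re.fullmatch(pattern, subject) is not None


def guarded_match(
    pattern: str,
    subject: str,
    limits: Limits = Limits(),
    engine: Callable[[str, str], bool] = _stand_in_engine,
) -> bool:
    """Apply the configured size limits, then delegate to the engine."""
    if len(pattern) > limits.max_pattern_chars:
        raise LimitExceeded("pattern exceeds max_pattern_chars")
    if len(subject) > limits.max_subject_chars:
        raise LimitExceeded("subject exceeds max_subject_chars")
    return engine(pattern, subject)
```

Size caps bound the work only for engines with predictable (for example, linear) behavior; with a backtracking engine, they reduce exposure but do not remove it, which is worth stating in the documented degree of robustness.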