
Improving regex rules for browser versions: Chromium GOST, CoolBrowser, Amigo, Opera Mobile etc. #7318

Merged (1 commit into matomo-org:master, Jan 3, 2023)

Conversation

@bcaller (Contributor) commented on Jan 2, 2023:

Some regular expressions are vulnerable to Regular Expression Denial of Service (ReDoS).

While the regexes are quite inefficient in the worst case, this does not manifest as a DoS vulnerability in PHP because of the PHP regex backtracking defaults https://www.php.net/manual/en/pcre.configuration.php. If a regular expression takes too long to process, PHP just bails and says no match found even if the string would eventually match.

However, in ports such as the Ruby version, which rely on these regular expressions, this manifests as a ReDoS vulnerability: a malicious visitor can send a crafted, long User-Agent header and trigger denial of service.

Let's take:

Chrome/(\d+[\.\d]+).*MRCHROME

This regular expression is vulnerable to ReDoS due to the section:

(\d+[\.\d]+).*M

\d+ matches digits
[\.\d]+ matches digits (and '.')
.* matches digits (and everything else)

A malicious User-Agent string can be crafted containing 'Chrome/' followed by a long string of only digits:

Chrome/000000000000000000000000000000000000000000000000000 but with 3000+ zeroes

As this malicious string does NOT match the entire regular expression (there is no MRCHROME), the matcher will backtrack and try all N(N-1)/2 different ways of splitting the string of digits between \d+ and [\.\d]+; for each split, .* then scans the remainder of the string looking for MRCHROME, which costs O(N) per attempt. Due to this backtracking, the runtime on the malicious request is approximately cubic with respect to N, so doubling the length of the string of digits makes processing take approximately 8 times longer.
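The blow-up described above can be reproduced with Python's backtracking re engine, which behaves like the Ruby and PCRE engines here (the user agent string below is illustrative, not a real agent):

```python
import re
import time

# The vulnerable pattern quoted in the PR description.
vulnerable = re.compile(r"Chrome/(\d+[\.\d]+).*MRCHROME")

# A well-formed Amigo-style fragment matches instantly and captures the version.
legit = "Mozilla/5.0 Chrome/39.0.2171.65 MRCHROME"
assert vulnerable.search(legit).group(1) == "39.0.2171.65"

# Attack input: 'Chrome/' followed by N digits and no 'MRCHROME'.
# The engine tries ~N^2/2 splits between \d+ and [\.\d]+, each followed
# by an O(N) scan by .*, so failing takes roughly cubic time.
for n in (100, 200, 400):
    s = "Chrome/" + "0" * n
    t0 = time.perf_counter()
    assert vulnerable.search(s) is None  # never matches, but works hard first
    print(f"N={n}: {time.perf_counter() - t0:.4f}s")  # time grows ~8x per doubling
```

With 3000+ digits, as in the attack described above, the same search takes long enough to be a denial-of-service vector in engines without PHP's backtracking cap.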

Some regexes with quadratic ReDoS remain, but those can be fixed another time if anybody cares; they require much longer user-agent strings to trigger a noticeable effect.

The fixed regexes are not 100% identical to the originals, but I'm hoping the test coverage is sufficient to make sure I've not messed anything up here.

Description:

Certain regular expressions had poor worst-case performance when processing maliciously crafted user agent strings. In downstream ports such as the Ruby and Node.js versions, this resulted in a ReDoS vulnerability when processing User-Agent headers. The 15 affected patterns were altered to remove this issue.
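As a hypothetical illustration of this style of fix (the patterns actually merged in the PR may differ), the cubic case above can be defused by making the two digit quantifiers mutually exclusive, so the engine cannot shuffle digits between them:

```python
import re

# Original pattern from the PR description, and an illustrative rewrite:
# (?:\.\d+)* can only start with '.', so digits are never ambiguous
# between the two quantifiers inside the capture group.
original = re.compile(r"Chrome/(\d+[\.\d]+).*MRCHROME")
fixed = re.compile(r"Chrome/(\d+(?:\.\d+)*).*MRCHROME")

# Both extract the same version from a legitimate user agent...
ua = "Mozilla/5.0 Chrome/39.0.2171.65 MRCHROME"
assert original.search(ua).group(1) == fixed.search(ua).group(1)

# ...but on the attack input the fixed pattern fails after roughly
# quadratic work (only the leftover overlap with .* remains, which is
# the deferred quadratic case mentioned above) instead of cubic work.
assert fixed.search("Chrome/" + "0" * 3000) is None
```

The remaining `.*` overlap is why the description notes that some quadratic patterns are left for another time.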

Review

@sgiehl (Member) previously approved these changes on Jan 3, 2023 and left a comment:


LGTM
@sanchezzzhak could you have a look here as well and check if you have any objections?
It might also be good to try to avoid possible ReDoS vulnerabilities in the future, even though they don't have much effect in PHP.

@sanchezzzhak (Collaborator) commented:

To be honest, some of the changes don't help much. For example:

qutebrowser/(\d+\.[\.\d]+)[^\.\d].*Chrome

can still be fed an input such as:

qutebrowser/10.01.0 a0000000000000000000000000000000000000000000Chrome

The most effective solution is to truncate the received user agent to 500 characters. In 7 years of collecting user agents I have never come close to that value; the longest string I've ever received was 387 characters.
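The truncation mitigation suggested here can be sketched as follows (the 500-character cap comes from the comment above; the helper name and UA strings are made up for illustration):

```python
import re

MAX_UA_LENGTH = 500  # suggested cap; real agents observed up to 387 chars

# A vulnerable pattern from the PR description, kept as-is to show that
# truncation bounds the damage even without rewriting the regex.
vulnerable = re.compile(r"Chrome/(\d+[\.\d]+).*MRCHROME")

def match_user_agent(ua: str):
    """Truncate the header before matching, so even a backtracking-heavy
    pattern only ever sees a small, bounded input."""
    return vulnerable.search(ua[:MAX_UA_LENGTH])

# Legitimate agents are unaffected...
assert match_user_agent("Mozilla/5.0 Chrome/39.0.2171.65 MRCHROME")

# ...while a 100,000-character attack string is cut to 500 characters,
# which the engine disposes of quickly regardless of the regex's shape.
assert match_user_agent("Chrome/" + "0" * 100_000) is None
```

This bounds the worst case for every pattern at once, at the cost of ignoring anything past the cap in genuinely long (but so far unobserved) user agents.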

@sanchezzzhak sanchezzzhak changed the title Remove ReDoS vulnerability Improving regex rules for browser versions: Chromium GOST, CoolBrowser, Amigo, Opera Mobile etc. Jan 3, 2023
@sanchezzzhak sanchezzzhak merged commit 8d6c179 into matomo-org:master Jan 3, 2023