
Improving regex rules for browser versions: Chromium GOST, CoolBrowser, Amigo, Opera Mobile etc. #7318

Merged (1 commit into matomo-org:master, Jan 3, 2023)

Conversation

@bcaller (Contributor) commented on Jan 2, 2023:

Some regular expressions are vulnerable to Regular Expression Denial of Service (ReDoS).

While the regexes are quite inefficient in the worst case, this does not manifest as a DoS vulnerability in PHP because of the PHP regex backtracking defaults https://www.php.net/manual/en/pcre.configuration.php. If a regular expression takes too long to process, PHP just bails and says no match found even if the string would eventually match.

However, in ports such as the Ruby version, which rely on these regular expressions, this manifests as a ReDoS vulnerability: a malicious visitor can send a crafted, long User-Agent header and trigger denial of service.

Let's take:

Chrome/(\d+[\.\d]+).*MRCHROME

This regular expression is vulnerable to ReDoS due to the section:

(\d+[\.\d]+).*M

\d+ matches digits
[\.\d]+ matches digits (and '.')
.* matches digits (and everything else)

A malicious User-Agent string can be crafted containing 'Chrome/' followed by a long string of only digits:

Chrome/000000000000000000000000000000000000000000000000000 but with 3000+ zeroes

As this malicious string does NOT match the entire regular expression (there is no MRCHROME), the matcher will backtrack and try all N(N-1)/2 different ways of splitting the string of digits between \d+ and [\.\d]+; for each split, .* then scans the remainder of the string looking for MRCHROME, which costs O(N) per attempt. Due to this backtracking, the runtime on the malicious request is approximately cubic with respect to N, so doubling the length of the string of digits makes processing take approximately 8 times longer.
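The blow-up described above can be reproduced with Python's backtracking re engine, which behaves like the Ruby and PCRE engines here (the user agent string below is illustrative, not a real agent):

```python
import re
import time

# The vulnerable pattern quoted in the PR description.
vulnerable = re.compile(r"Chrome/(\d+[\.\d]+).*MRCHROME")

# A well-formed Amigo-style fragment matches instantly and captures the version.
legit = "Mozilla/5.0 Chrome/39.0.2171.65 MRCHROME"
assert vulnerable.search(legit).group(1) == "39.0.2171.65"

# Attack input: 'Chrome/' followed by N digits and no 'MRCHROME'.
# The engine tries ~N^2/2 splits between \d+ and [\.\d]+, each followed
# by an O(N) scan by .*, so failing takes roughly cubic time.
for n in (100, 200, 400):
    s = "Chrome/" + "0" * n
    t0 = time.perf_counter()
    assert vulnerable.search(s) is None  # never matches, but works hard first
    print(f"N={n}: {time.perf_counter() - t0:.4f}s")  # time grows ~8x per doubling
```

With 3000+ digits, as in the attack described above, the same search takes long enough to be a denial-of-service vector in engines without PHP's backtracking cap.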

Some regexes with quadratic ReDoS remain, but those can be fixed another time if anybody cares; they require much longer user-agent strings to trigger a noticeable effect.

The fixed regexes are not 100% identical to the originals, but I'm hoping the test coverage is sufficient to make sure I've not messed anything up here.

Description:

Certain regular expressions had poor worst-case performance when processing maliciously crafted user agent strings. In downstream ports such as the Ruby and Node.js versions, this resulted in a ReDoS vulnerability when processing User-Agent headers. The 15 affected patterns were altered to remove this issue.
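As a hypothetical illustration of this style of fix (the patterns actually merged in the PR may differ), the cubic case above can be defused by making the two digit quantifiers mutually exclusive, so the engine cannot shuffle digits between them:

```python
import re

# Original pattern from the PR description, and an illustrative rewrite:
# (?:\.\d+)* can only start with '.', so digits are never ambiguous
# between the two quantifiers inside the capture group.
original = re.compile(r"Chrome/(\d+[\.\d]+).*MRCHROME")
fixed = re.compile(r"Chrome/(\d+(?:\.\d+)*).*MRCHROME")

# Both extract the same version from a legitimate user agent...
ua = "Mozilla/5.0 Chrome/39.0.2171.65 MRCHROME"
assert original.search(ua).group(1) == fixed.search(ua).group(1)

# ...but on the attack input the fixed pattern fails after roughly
# quadratic work (only the leftover overlap with .* remains, which is
# the deferred quadratic case mentioned above) instead of cubic work.
assert fixed.search("Chrome/" + "0" * 3000) is None
```

The remaining `.*` overlap is why the description notes that some quadratic patterns are left for another time.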

Review

@sgiehl (Member) previously approved these changes on Jan 3, 2023 and left a comment:


LGTM
@sanchezzzhak could you have a look here as well and check if you have any objections?
It might also be good to try to avoid possible ReDoS vulnerabilities in the future, even though they don't have much effect in PHP.

@sanchezzzhak (Collaborator) commented:

To be honest, some of the changes don't help much. For example:

qutebrowser/(\d+\.[\.\d]+)[^\.\d].*Chrome

can still be fed an input such as:

qutebrowser/10.01.0 a0000000000000000000000000000000000000000000Chrome

The most effective solution is to truncate the received user agent to 500 characters. In 7 years of collecting user agents I have never come close to that value; the longest string I've ever received was 387 characters.
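The truncation mitigation suggested here can be sketched as follows (the 500-character cap comes from the comment above; the helper name and UA strings are made up for illustration):

```python
import re

MAX_UA_LENGTH = 500  # suggested cap; real agents observed up to 387 chars

# A vulnerable pattern from the PR description, kept as-is to show that
# truncation bounds the damage even without rewriting the regex.
vulnerable = re.compile(r"Chrome/(\d+[\.\d]+).*MRCHROME")

def match_user_agent(ua: str):
    """Truncate the header before matching, so even a backtracking-heavy
    pattern only ever sees a small, bounded input."""
    return vulnerable.search(ua[:MAX_UA_LENGTH])

# Legitimate agents are unaffected...
assert match_user_agent("Mozilla/5.0 Chrome/39.0.2171.65 MRCHROME")

# ...while a 100,000-character attack string is cut to 500 characters,
# which the engine disposes of quickly regardless of the regex's shape.
assert match_user_agent("Chrome/" + "0" * 100_000) is None
```

This bounds the worst case for every pattern at once, at the cost of ignoring anything past the cap in genuinely long (but so far unobserved) user agents.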

@sanchezzzhak sanchezzzhak changed the title Remove ReDoS vulnerability Improving regex rules for browser versions: Chromium GOST, CoolBrowser, Amigo, Opera Mobile etc. Jan 3, 2023
@sanchezzzhak sanchezzzhak merged commit 8d6c179 into matomo-org:master Jan 3, 2023