-
Notifications
You must be signed in to change notification settings - Fork 30.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
src: handle multiline dotnev edge case #52268
Conversation
cc @nodejs/platform-windows. this bug is only reproducible on windows. would you mind taking a look? |
The test is not failing because of multi-line, is failling because the value can't have the length higher than 291, try:
If you remove 1 character, it will run successfully. |
Why 292 @H4ad? Am I missing something or is this Windows magic? |
I have no idea 😆 From what I'm trying to debug, it throws during the So, maybe is an issue/limitation of |
There is a limit to the size of an env. variable under Windows although it does not look like that is the issue here: https://devblogs.microsoft.com/oldnewthing/20100203-00/?p=15083 |
@lemire the code breaks during the parsing, so I don't think the size of the env impacts something. |
Minimal reproduction of the stack overflow caused by this issue: #include <regex>
#include <iostream>
int main() {
std::string content = "JWT_PUBLIC_KEY=\"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\"";
std::regex LINE(
"\\s*(?:export\\s+)?([\\w.-]+)(?:\\s*=\\s*?|:\\s+?)(\\s*'(?:\\\\'|[^']"
")*'|\\s*\"(?:\\\\\"|[^\"])*\"|\\s*`(?:\\\\`|[^`])*`|[^#\r\n]+)?\\s*(?"
":#.*)?");
std::smatch match;
if (std::regex_search(content, match, LINE)) {
std::cout << match[0].str() << std::endl;
std::cout << match[1].str() << std::endl;
}
} Stack trace
Is it an issue with C++? |
My comment was to the effect that the limit is too short here for the variable size limit to be a factor. So we are in agreement. (Just so we are clear.) |
The issue is always surely the complexity of the regular expression: docopt/docopt.cpp#49 |
In general, a regex match is an NP-hard problem so a given implementation can nearly run forever on a given problem. This recently caused a major outage of cloudflare:
https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/ Using very complicated regular expressions in production code is risky. |
Maybe this could be a good reason to rewrite this code back to JS instead of C++ |
This has to be in C++ because we support |
You can have a look at the regular expression here: https://www.debuggex.com/r/KjGfODUlA1pHPDIC Note that this happens with Visual Studio, but it is just generally true that complicated regular expressions with arbitrarily large inputs are dangerous, irrespective of the underlying engine. See
My recommendations:
|
The complicated regular expression appears to be a new addition: I should stress that this regular expression is likely slow. It would be better to write regular code to replace it.
I submit to you that one way to view a regular expression in this instance is not as C++ programming. Regular expressions are their own language... but can appear in C++, Java, JavaScript... Just like SQL. I expect that you can crash JavaScript, Lua, Java, C#... with a complicated regex on an arbitrarily large input. I recommend reading the Cloudflare bug report... https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/ Again... either you limit the size of the input processed by the regular expression, or you get rid of the complicated regular expression. I think that these are the practical options. |
Input limit might be an unsolvable breaking issue for someone relying on large inputs. |
Here is a classical example in Java: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6337993 |
At a glance, the code was up until recently conventional C++ code written by @anonrig It was not very complicated. It was seemingly deliberately replaced with this regular expression. The original code was assuredly faster. @anonrig could put back his code, maybe adapted for the recent changes. This PR should then be merged, so that the bug does not reoccur. |
@H4ad would you be interested in rewriting it without regexp? i'm happy to help. |
I'm but currently I don't have time for this, my focus now is on the research for logging lib on Node. |
@lemire I was the one who introduced a complex regex in production, not @anonrig 😛
@anonrig, I'll work on rewriting it without the regex and may seek your assistance. |
This PR is not for landing. I wanted to run this test on Windows on CI.
Ref: #52248
cc @kibertoad