-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new(userspace/libsinsp): support regular expression operator in sinsp filters #1904
Conversation
Very nice, on that note: Did we ever compare our simple string comparison approach with re2 simple string comparison (no regex)? Just to check if our simple string comparisons have the best performance since re2 is a pretty strong package. WDYT? |
@incertum from my experiments, our simple comparisons are generally faster by at least 5x. This is an indicator that the best practice is to use regex comparisons only where regex are the only way of implementing the check |
Great, thanks for sharing! |
auto evt = generate_getcwd_failed_entry_event(); | ||
|
||
// legit use case with a string | ||
EXPECT_TRUE(evaluate_filter_str(&m_inspector, "evt.source regex '^[s]{1}ysca[l]{2}$'", evt)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that in #1911 I'm refactoring these kind of tests to be like:
EXPECT_TRUE(eval_filter(evt, "evt.source regex '^[s]{1}ysca[l]{2}$'"));
The change to these will be very quick to do, but the file is going to conflict I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the changes since #1911 was merged?
This LGTM then :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize for having lost these comments entirely! Let me rebase and push the changes as suggested.
b4d9614
to
91c68fb
Compare
Perf diff from master - unit tests
Perf diff from master - scap file
Heap diff from master - unit tests
Heap diff from master - scap file
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
LGTM label has been added. Git tree hash: d9313d6a46ed1d69bafd283800f6aceac33daceb
|
Perf diff from master - unit tests
Perf diff from master - scap file
Heap diff from master - unit tests
Heap diff from master - scap file
|
Signed-off-by: Jason Dellaluce <[email protected]>
Signed-off-by: Jason Dellaluce <[email protected]>
…ings and euristics Signed-off-by: Jason Dellaluce <[email protected]>
Signed-off-by: Jason Dellaluce <[email protected]>
91c68fb
to
b28be6c
Compare
Perf diff from master - unit tests
Perf diff from master - scap file
Heap diff from master - unit tests
Heap diff from master - scap file
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
LGTM label has been added. Git tree hash: c007313025128e3c404af20998594ea33da29c69
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: FedeDP, incertum, jasondellaluce The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
Any specific area of the project related to this PR?
/area libsinsp
/area tests
Does this PR require a change in the driver versions?
What this PR does / why we need it:
This has been requested by the community since a long time (~6 years) both publicly and in private channels. The main concern of supporting regular expressions is the potential performance impact on high events throughput such as the one of syscalls. However, I consider the project mature enough and in need of serving a multitude of different use cases, so I'm proposing to support this feature and empowering the user with the choice and wisdom on how best using it.
The contribution implements a new
regex
operator that can be used with string fields such asfd.name regex [a-z]*/proc/[0-9]+/cmdline
or similars. The supported standard is the one of https://github.com/google/re2, on which we base our regex implementation.This has also the benefit of making performance predictable, as RE2 does not support backtracking and full regex support (from my understanding). On top of that, I added few filter compilation heuristic warnings/checks (with tests) to inform users in cases where regular expression operators are used but not truly needed.
It is crucial that this operator is used with parsimony and only where strictly needed. From my personal benchmarks, the
regex
operator is 5x to 10x than a simple equality comparison for simple examples. As such, existing more trivial operators are always preferable in case they can implement the same check logic.Which issue(s) this PR fixes:
Fixes:
Special notes for your reviewer:
/milestone 0.18.0
Does this PR introduce a user-facing change?: