-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] btf/meta filter: introduce boolean expressions #496
base: main
Are you sure you want to change the base?
Conversation
The order was changed, but this adds nothing but having fields a little scattered. Signed-off-by: Paolo Valerio <[email protected]>
Filtering is no longer inlined in the main program, but it is now part of a subprog. This has multiple benefits: - stack separation - reduced risk of unintentional register clobbering - potentially reduced load time Given the removal of the nop trick, pseudo calls are now fixed up while scanning the program, also func_info and line_info offsets are adjusted accordingly. All the instructions need to be checked as relative jumps may point to negative offset and so preceding subprogs, this is a bit tedious but doesn't add much overhead compared to the previous way as the only additional instructions scanned would be the hooks stubs. This whole change also paves the way for dynamically generate meta-filtering. Signed-off-by: Paolo Valerio <[email protected]>
The change doesn't only includes boolean expressions with parentheses support, but also convert meta filtering as an instruction generator unifying the way filters are set up. The change is an RFC/RFT, it certainly require a different split and probably some more clean up (although too many at once might be too much). Meta filtering is still coupled with the inspector, but that could be easily lifted when it becomes generic and no longer bound to sk_buff only (requires per probe filtering as well). Also, it ideally should only depend on generic BTF collections to make it reusable in any context. Signed-off-by: Paolo Valerio <[email protected]>
c55979f
to
66cffa0
Compare
- replaces the EBNF with Pest syntax - include some examples of boolean expressions documenting syntax and precedence - while at it, improve the remaining doc Signed-off-by: Paolo Valerio <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly looked at the first 2 commits (started looking at the third one but will need to come back at it a few times). The general direction is amazing and I'm not seeing any major blocker preventing this from making it out of RFC. I'd still like to test it a bit before you invest more time; just in case.
I'm submitting the comments on the first commits, in case they can be merged separately (I saw but too late that they are in another PR, but I don't think I can move comments between PRs; sorry).
for (pos, insn) in insns.iter().enumerate() { | ||
let imm = insn.imm as u32; | ||
|
||
if insn.code != (bpf_sys::BPF_JMP | bpf_sys::BPF_CALL) || imm == 0xBAD2310 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment on 0xBAD2310
being a special value to mark instructions that failed CO-RE by libbpf.
filter.to_vec() | ||
} | ||
|
||
fn copy_btf_info<T>(ptr: *mut T, count: usize) -> Vec<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is currently used for BTF data; but is not BTF-specific. copy_ptr_to_vec
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm.
let (insns, insns_cnt) = unsafe { | ||
( | ||
libbpf_sys::bpf_program__insns(prog), | ||
libbpf_sys::bpf_program__insn_cnt(prog), | ||
) | ||
}; | ||
|
||
let insns = unsafe { std::slice::from_raw_parts(insns, insns_cnt as usize) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a nit but given the function is complex this will help reading it; the above can be squashed (no need for intermediate values).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree
// placeholder_calls is sorted, so proceed backward to avoid | ||
// further adjustment. | ||
for placeholder in placeholder_calls.iter().rev() { | ||
let filter = retrieve_filter(placeholder.1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is called twice for each placeholder (in two different loops). Could they be merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loops are kept intentionally separate because we keep the pseudo calls index, and so forth, and injecting while fixing them up requires further adjustments of the index at every injection (based on whether they appear before, or after the injected code).
Splitting made things clearer to read without a real impact (placeholder_calls is k and can be max 3 with metafiltering).
I guess if we don't want to change the logic, we can store this in a HashMap by code and reuse in the injection loop, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the time being, I repushed this and the remaining non-pending suggestions to #494
@@ -49,6 +49,7 @@ impl FilterPacket { | |||
bail!("Filter exceeds the maximum allowed size."); | |||
} | |||
|
|||
// ebpf_filter.disasm(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely :)
#define l2 0xdeadbeef | ||
#define l3 0xdeadc0de |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the following comment in Retis,
/* We need an actual define here because __FILTER_MAX_INSNS is used by the
* pre-processor who doesn't know about enums yet.
*/
IMO it would apply here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, I'm not sure as we'd essentially end up saying we need a define because with preprocessor enums cannot be expanded to literals, right?
With BINDING_DEF()
probably could make sense to get the point without looking at the macro (it hides the fact it is an enum), but here things are pretty explicit. But if you still think it adds value, I can add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's explicit if you know about it :) Up to you, I personally like to describe tricks so older me understands quickly what's going on, but not a blocker.
@@ -1,6 +1,6 @@ | |||
# This file is automatically @generated by Cargo. | |||
# It is not intended for manual editing. | |||
version = 3 | |||
version = 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep version 3 for now.
@@ -6,7 +6,7 @@ | |||
|
|||
def extract_annotated_types(file_path, annotation, clang_args): | |||
index = clang.cindex.Index.create() | |||
translation_unit = index.parse(file_path, args=clang_args) | |||
translation_unit = index.parse(file_path, args=clang_args, options=clang.cindex.TranslationUnit.PARSE_INCOMPLETE | clang.cindex.TranslationUnit.PARSE_SKIP_FUNCTION_BODIES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a leftover. It's part of an investigation I was doing with bindings (but turns out bindgen is more sensitive) and macros that don't always seem to play well in all cases.
Meta filtering is still coupled with the inspector, but that could be easily lifted when it becomes generic and no longer bound to sk_buff only (requires per-probe filtering as well). Also, it ideally should only depend on generic BTF collections to make it reusable in any context.
Based on #494