Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] btf/meta filter: introduce boolean expressions #496

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

vlrpl
Copy link
Contributor

@vlrpl vlrpl commented Feb 23, 2025

  • Introduced the new grammar
  • parsing is now done using pest (doesn't affect rpm as fedora ships the same version) with better error handling
  • the filter no longer relies on maps and pre-defined ebpf programs, but the instructions are dynamically generated and then inlined in the same way packet filter does unifying both types of filters (this was ready since a while waiting for the remaining part to land)
  • boolean expressions allow matches on multiple fields. It also supports short circuits meaning A || B doesn't process B if A is true (short circuits are applied at any level)
  • doc got updated (no plan to go back to EBNF as pest grammar is just as readable)
  • tests no longer check the output but just the correctness of the filter (everything but the instructions generated)

Meta filtering is still coupled with the inspector, but that could be easily lifted when it becomes generic and no longer bound to sk_buff only (requires per-probe filtering as well). Also, it ideally should only depend on generic BTF collections to make it reusable in any context.

Based on #494

The order was changed, but this adds nothing but having fields a
little scattered.

Signed-off-by: Paolo Valerio <[email protected]>
Filtering is no longer inlined in the main program, but it is now part
of a subprog. This has multiple benefits:

- stack separation
- reduced risk of unintentional register clobbering
- potentially reduced load time

Given the removal of the nop trick, pseudo calls are now fixed up
while scanning the program, also func_info and line_info offsets are
adjusted accordingly.
All the instructions need to be checked as relative jumps may point to
negative offset and so preceding subprogs, this is a bit tedious but
doesn't add much overhead compared to the previous way as the only
additional instructions scanned would be the hooks stubs.

This whole change also paves the way for dynamically generate
meta-filtering.

Signed-off-by: Paolo Valerio <[email protected]>
@vlrpl vlrpl changed the title btf/meta filter: introduce boolean expressions [RFC/draft] btf/meta filter: introduce boolean expressions Feb 23, 2025
The change doesn't only includes boolean expressions with parentheses
support, but also convert meta filtering as an instruction generator
unifying the way filters are set up.

The change is an RFC/RFT, it certainly require a different split and
probably some more clean up (although too many at once might be too
much).

Meta filtering is still coupled with the inspector, but that could
be easily lifted when it becomes generic and no longer bound to
sk_buff only (requires per probe filtering as well). Also, it ideally
should only depend on generic BTF collections to make it reusable in
any context.

Signed-off-by: Paolo Valerio <[email protected]>
@vlrpl vlrpl force-pushed the btf-filter branch 2 times, most recently from c55979f to 66cffa0 Compare February 24, 2025 08:04
- replaces the EBNF with Pest syntax
- include some examples of boolean expressions documenting syntax and
  precedence
- while at it, improve the remaining doc

Signed-off-by: Paolo Valerio <[email protected]>
@vlrpl vlrpl changed the title [RFC/draft] btf/meta filter: introduce boolean expressions [RFC] btf/meta filter: introduce boolean expressions Feb 27, 2025
Copy link
Contributor

@atenart atenart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly looked at the first 2 commits (started looking at the third one but will need to come back at it a few times). The general direction is amazing and I'm not seeing any major blocker preventing this from making it out of RFC. I'd still like to test it a bit before you invest more time; just in case.

I'm submitting the comments on the first commits, in case they can be merged separately (I saw but too late that they are in another PR, but I don't think I can move comments between PRs; sorry).

for (pos, insn) in insns.iter().enumerate() {
let imm = insn.imm as u32;

if insn.code != (bpf_sys::BPF_JMP | bpf_sys::BPF_CALL) || imm == 0xBAD2310 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment on 0xBAD2310 being a special value to mark instructions that failed CO-RE by libbpf.

filter.to_vec()
}

fn copy_btf_info<T>(ptr: *mut T, count: usize) -> Vec<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is currently used for BTF data; but is not BTF-specific. copy_ptr_to_vec?

Copy link
Contributor Author

@vlrpl vlrpl Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm.

Comment on lines +115 to +122
let (insns, insns_cnt) = unsafe {
(
libbpf_sys::bpf_program__insns(prog),
libbpf_sys::bpf_program__insn_cnt(prog),
)
};

let insns = unsafe { std::slice::from_raw_parts(insns, insns_cnt as usize) };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nit but given the function is complex this will help reading it; the above can be squashed (no need for intermediate values).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree

// placeholder_calls is sorted, so proceed backward to avoid
// further adjustment.
for placeholder in placeholder_calls.iter().rev() {
let filter = retrieve_filter(placeholder.1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called twice for each placeholder (in two different loops). Could they be merged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loops are kept intentionally separate because we keep the pseudo calls index, and so forth, and injecting while fixing them up requires further adjustments of the index at every injection (based on whether they appear before, or after the injected code).

Splitting made things clearer to read without a real impact (placeholder_calls is k and can be max 3 with metafiltering).
I guess if we don't want to change the logic, we can store this in a HashMap by code and reuse in the injection loop, WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the time being, I repushed this and the remaining non-pending suggestions to #494

@@ -49,6 +49,7 @@ impl FilterPacket {
bail!("Filter exceeds the maximum allowed size.");
}

// ebpf_filter.disasm();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover? :)

Copy link
Contributor Author

@vlrpl vlrpl Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely :)

Comment on lines +31 to +32
#define l2 0xdeadbeef
#define l3 0xdeadc0de
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the following comment in Retis,

/* We need an actual define here because __FILTER_MAX_INSNS is used by the
 * pre-processor who doesn't know about enums yet.
 */

IMO it would apply here too.

Copy link
Contributor Author

@vlrpl vlrpl Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I'm not sure as we'd essentially end up saying we need a define because with preprocessor enums cannot be expanded to literals, right?
With BINDING_DEF() probably could make sense to get the point without looking at the macro (it hides the fact it is an enum), but here things are pretty explicit. But if you still think it adds value, I can add it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's explicit if you know about it :) Up to you, I personally like to describe tricks so older me understands quickly what's going on, but not a blocker.

@@ -1,6 +1,6 @@
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 3
version = 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep version 3 for now.

@@ -6,7 +6,7 @@

def extract_annotated_types(file_path, annotation, clang_args):
index = clang.cindex.Index.create()
translation_unit = index.parse(file_path, args=clang_args)
translation_unit = index.parse(file_path, args=clang_args, options=clang.cindex.TranslationUnit.PARSE_INCOMPLETE | clang.cindex.TranslationUnit.PARSE_SKIP_FUNCTION_BODIES)
Copy link
Contributor Author

@vlrpl vlrpl Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a leftover. It's part of an investigation I was doing with bindings (but turns out bindgen is more sensitive) and macros that don't always seem to play well in all cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants