Support dynptr key for hash map #8405
base: bpf-next_base
Upstream branch: b420b57 | 3ec13f6 to e65d1a6
Upstream branch: 0fc5ddd | 8f7ac2f to d94cb8d
e65d1a6 to dbbbca0
Upstream branch: c03320a | d94cb8d to 5866c92
dbbbca0 to 71beb23
Upstream branch: c03320a | 5866c92 to d50b0d3
71beb23 to b8121cf
Upstream branch: 57e71f8 | d50b0d3 to ec4f719
b8121cf to 421ec9c
Add BPF_DYNPTR to btf_field_type to support bpf_dynptr in the map key. The parsing of bpf_dynptr in btf is done in the following patch; this patch only adds two helpers: btf_new_bpf_dynptr_record(), which creates a btf record that includes only a bpf_dynptr, and btf_type_is_dynptr(), which checks whether a btf_type is a bpf_dynptr.

With the introduction of BPF_DYNPTR, BTF_FIELDS_MAX changes from 11 to 13, so update the hard-coded number in the cpumask test as well.

Signed-off-by: Hou Tao <[email protected]>
To support variable-length keys or strings in the map key, use bpf_dynptr to represent these variable-length objects and save the bpf_dynptr fields in the map key. For example, a map key with an integer and a string can be defined as:

    struct pid_name {
        int pid;
        struct bpf_dynptr name;
    };

The bpf_dynptr in the map key can also be contained indirectly in a struct:

    struct pid_name_time {
        struct pid_name process;
        unsigned long long time;
    };

If the whole map key is a bpf_dynptr, the key can be defined either as a struct or directly as a bpf_dynptr:

    struct map_key {
        struct bpf_dynptr name;
    };

The bpf program can use bpf_dynptr_init() to initialize the dynptr part of the map key, and the userspace application uses bpf_dynptr_user_init() or a similar API to initialize the dynptr. Just like kptrs in the map value, the bpf_dynptr field in the map key can also be defined in a nested struct contained in the map key struct.

The patch updates map_create() accordingly to parse these bpf_dynptr fields in the map key, just as it does for other special fields in the map value. To enable bpf_dynptr support in the map key, the map_type must be BPF_MAP_TYPE_HASH. For now, the maximum number of bpf_dynptr fields in a map key is limited to 1; this limitation can be relaxed later.

Signed-off-by: Hou Tao <[email protected]>
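A minimal userspace sketch of the three key layouts above. It assumes a hypothetical local stand-in, struct bpf_dynptr_mirror, for the opaque 16-byte, 8-byte-aligned UAPI struct bpf_dynptr (real code would include <linux/bpf.h>), and computes the offset at which the dynptr lands in each key, which is the kind of information a key btf_record has to describe:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the opaque UAPI struct bpf_dynptr:
 * 16 bytes, 8-byte aligned. Real code would include <linux/bpf.h>. */
struct bpf_dynptr_mirror {
    uint64_t opaque[2];
} __attribute__((aligned(8)));

/* An integer plus a variable-length string. */
struct pid_name {
    int pid;
    struct bpf_dynptr_mirror name;
};

/* The dynptr is contained indirectly through a nested struct. */
struct pid_name_time {
    struct pid_name process;
    unsigned long long time;
};

/* The whole key is a struct wrapping a single dynptr. */
struct map_key {
    struct bpf_dynptr_mirror name;
};

/* Offset of the dynptr within each key type. */
static const size_t pid_name_dynptr_off = offsetof(struct pid_name, name);
static const size_t pid_name_time_dynptr_off =
    offsetof(struct pid_name_time, process) + offsetof(struct pid_name, name);
static const size_t map_key_dynptr_off = offsetof(struct map_key, name);
```

Note that the 8-byte alignment of the dynptr leaves 4 bytes of padding after pid in struct pid_name, so the dynptr starts at offset 8 in both pid_name and pid_name_time.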
get_map_btf() gets the btf from the btf fd and ensures the btf is not a kernel btf.

Signed-off-by: Hou Tao <[email protected]>
Currently, map_create() calls ->map_alloc_check() and ->map_alloc() first and only then initializes the map btf. To support dynptr in the map key, map_create() needs to check whether there is a bpf_dynptr in the map key btf type and pass that information to ->map_alloc_check() and ->map_alloc().

However, the case where btf_vmlinux_value_type_id > 0 needs special handling. The probe of struct_ops maps in libbpf doesn't pass a valid btf_fd to the map_create syscall, and it expects ->map_alloc() to be invoked before the initialization of the map btf. If the map btf were initialized before ->map_alloc(), the struct_ops probe would fail. To avoid breaking old libbpf on a new kernel, the patch moves the initialization of the btf before ->map_alloc_check() only for non-struct_ops maps.

Signed-off-by: Hou Tao <[email protected]>
Introduce an internal map flag BPF_INT_F_DYNPTR_IN_KEY to support dynptr in the map key, and add the corresponding helper bpf_map_has_dynptr_key() to check whether dynptr-key support is enabled. The reasons for an internal map flag are twofold:

1) The user doesn't need to set the map flag explicitly
map_create() uses the presence of bpf_dynptr in the map key as the indicator for enabling dynptr key.

2) Avoid adding new arguments to ->map_alloc_check() and ->map_alloc()
map_create() needs to pass whether dynptr key is supported to ->map_alloc_check() (e.g., to check the maximum length of the dynptr data) and to ->map_alloc() (e.g., to check whether dynptr key fits the current map type). Adding new arguments to these callbacks would introduce too much churn.

Therefore, the patch uses the topmost bit of map_flags as the internal map flag. map_create() checks whether the internal flag is set at the beginning, and bpf_map_get_info_by_fd() clears the internal flag before returning the map flags to userspace.

Signed-off-by: Hou Tao <[email protected]>
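A sketch of the internal-flag handling described above. The bit position (1U << 31, the topmost bit of the 32-bit map_flags) follows the text; the macro and helper names are hypothetical, and returning -EINVAL when userspace sets the internal bit is an assumption, since the text only says that map_create() checks the flag at the beginning:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Topmost bit of the 32-bit map_flags, reserved for kernel-internal use.
 * The macro name is hypothetical. */
#define INT_F_DYNPTR_IN_KEY_SKETCH (1U << 31)

/* At map creation: refuse the internal bit from userspace (assumed
 * -EINVAL), then set it when the key btf contains a bpf_dynptr. */
static int sketch_map_create_flags(uint32_t user_flags, int key_has_dynptr,
                                   uint32_t *out_flags)
{
    if (user_flags & INT_F_DYNPTR_IN_KEY_SKETCH)
        return -EINVAL;
    *out_flags = user_flags;
    if (key_has_dynptr)
        *out_flags |= INT_F_DYNPTR_IN_KEY_SKETCH;
    return 0;
}

/* Counterpart of a bpf_map_has_dynptr_key()-style check. */
static int sketch_map_has_dynptr_key(uint32_t map_flags)
{
    return (map_flags & INT_F_DYNPTR_IN_KEY_SKETCH) != 0;
}

/* When reporting flags back to userspace, hide the internal bit. */
static uint32_t sketch_map_info_flags(uint32_t map_flags)
{
    return map_flags & ~INT_F_DYNPTR_IN_KEY_SKETCH;
}
```

Because the bit is both rejected on input and cleared on output, userspace never observes it, which is what lets the callbacks read it from map_flags without new arguments.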
When there is a bpf_dynptr field in the map key btf type, or the map key btf type is itself bpf_dynptr, set BPF_INT_F_DYNPTR_IN_KEY in map_flags.

Signed-off-by: Hou Tao <[email protected]>
A map with dynptr key support needs to use map_extra to specify the maximum data length of its dynptrs. During map creation, the map implementation checks whether map_extra is within the limit imposed by memory allocation. It may also use map_extra to optimize the memory allocation for dynptrs.

Signed-off-by: Hou Tao <[email protected]>
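A sketch of the map_extra check described above. The 4088-byte limit is taken from the hash map patch in this series; treating map_extra == 0 as invalid, accepting a value equal to the limit, and the helper name are all assumptions:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Dynptr data length limit stated in the hash map patch of this series. */
#define SKETCH_DYNPTR_MAX_DATA_LEN 4088

/* Hypothetical creation-time check for a dynptr-keyed map: map_extra
 * carries the maximum dynptr data length. Rejecting 0 is an assumption. */
static int sketch_check_dynptr_map_extra(uint64_t map_extra)
{
    if (map_extra == 0 || map_extra > SKETCH_DYNPTR_MAX_DATA_LEN)
        return -EINVAL;
    return 0;
}
```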
This is a preparatory patch for supporting map keys with bpf_dynptr in the verifier. The patch splits check_stack_range_initialized() into multiple small functions, and the following patch will reuse these functions to check whether an access to a stack range containing a bpf_dynptr is valid.

Besides splitting check_stack_range_initialized(), the patch also renames it to check_stack_range_access() to better reflect its purpose, because the function also allows uninitialized stack ranges.

Signed-off-by: Hou Tao <[email protected]>
The patch does the following three things to enable dynptr key for bpf maps:

1) Only allow a PTR_TO_STACK typed register for dynptr key
The main reason is that a bpf_dynptr can only be defined on the stack, so only a PTR_TO_STACK typed register is allowed for a dynptr key. A bpf_dynptr could also be represented by a CONST_PTR_TO_DYNPTR typed register (e.g., in a callback func or subprog), but that is not supported now.

2) Only allow a fixed offset for the PTR_TO_STACK register
A variable offset for the PTR_TO_STACK typed register is disallowed, because it would then be impossible to check whether the stack access is aligned to BPF_REG_SIZE and matches the location of the dynptr and non-dynptr parts of the map key.

3) Check that the layout of the stack content matches the btf_record
First check that the start offset of the stack access is aligned to BPF_REG_SIZE, then check that the offsets and sizes of the dynptr/non-dynptr parts in the stack range are consistent with the btf_record of the map key.

Signed-off-by: Hou Tao <[email protected]>
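A simplified model of check 3) above. It assumes one 8-byte stack slot per array entry, a hypothetical three-state slot type in place of the verifier's real stack-slot bookkeeping, a hypothetical single-entry key record, and a fixed start offset (which is what rule 2) guarantees). The usage below follows struct pid_name, whose dynptr starts at offset 8:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_REG_SIZE 8 /* BPF_REG_SIZE is 8 bytes */

/* Hypothetical per-slot stack state: a 16-byte dynptr spans two slots. */
enum sketch_slot { SLOT_MISC, SLOT_DYNPTR_FIRST, SLOT_DYNPTR_SECOND };

/* Hypothetical, simplified key record: offsets of the dynptr fields. */
struct sketch_key_record {
    size_t cnt;
    size_t dynptr_off[1]; /* at most one dynptr per key for now */
};

/* slots[] describes the stack starting at the key's first byte and must
 * have one entry per (possibly partial) 8-byte slot of the key. */
static bool sketch_check_dynptr_key_stack(const enum sketch_slot *slots,
                                          long start_off, size_t key_size,
                                          const struct sketch_key_record *rec)
{
    if (start_off % SKETCH_REG_SIZE)
        return false; /* access must start on a slot boundary */

    for (size_t off = 0; off < key_size; off += SKETCH_REG_SIZE) {
        enum sketch_slot want = SLOT_MISC;

        /* What the record says this slot of the key should hold. */
        for (size_t i = 0; i < rec->cnt; i++) {
            if (off == rec->dynptr_off[i])
                want = SLOT_DYNPTR_FIRST;
            else if (off == rec->dynptr_off[i] + SKETCH_REG_SIZE)
                want = SLOT_DYNPTR_SECOND;
        }
        if (slots[off / SKETCH_REG_SIZE] != want)
            return false;
    }
    return true;
}
```

A dynptr placed at the wrong offset, or a non-dynptr slot where the record expects a dynptr, is rejected, as is a start offset that is not a multiple of 8.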
For a bpf map with dynptr key support, the userspace application uses bpf_dynptr_user to represent the bpf_dynptr in the map key and passes it to the bpf syscall. The bpf syscall copies from bpf_dynptr_user to construct a corresponding bpf_dynptr_kern object when the map key is an input argument, and copies from a bpf_dynptr_kern object to bpf_dynptr_user when the map key is an output argument.

For now, the size of bpf_dynptr_user must be the same as that of bpf_dynptr. The last u32 field is not used, so make it a reserved field.

Signed-off-by: Hou Tao <[email protected]>
Introduce the bpf_copy_from_dynptr_ukey() helper to handle a map key with bpf_dynptr when the map key is used in map lookup, update, delete and get_next_key operations. The helper places all variable-length data of the bpf_dynptr_user objects at the end of the map key to simplify allocating and freeing a map key with dynptr.

Signed-off-by: Hou Tao <[email protected]>
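A userspace sketch of the single-allocation layout described above: the fixed-size key is copied first and the dynptr data is appended after it, so one free() releases everything. It assumes exactly one dynptr per key (the current limit) and uses a hypothetical struct sketch_dynptr holding a 64-bit address and a size for both the user and kernel views; the real bpf_dynptr_kern layout differs, and the helper name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified {address, size} dynptr used for both user and kernel views. */
struct sketch_dynptr {
    uint64_t data;     /* address of the variable-length data */
    uint32_t size;     /* length of the data in bytes */
    uint32_t reserved;
};

/* Copy a user key (key_size bytes, one dynptr at dynptr_off) into one
 * allocation laid out as [fixed-size key][dynptr data]. Returns NULL if
 * the data is longer than max_len or the allocation fails. */
static void *sketch_copy_from_dynptr_ukey(const void *ukey, size_t key_size,
                                          size_t dynptr_off, uint32_t max_len)
{
    struct sketch_dynptr uptr, kptr;
    unsigned char *key;

    memcpy(&uptr, (const unsigned char *)ukey + dynptr_off, sizeof(uptr));
    if (uptr.size > max_len)
        return NULL;

    key = malloc(key_size + uptr.size);
    if (!key)
        return NULL;

    /* Fixed-size part first, then the variable-length data at the end. */
    memcpy(key, ukey, key_size);
    if (uptr.size)
        memcpy(key + key_size, (const void *)(uintptr_t)uptr.data, uptr.size);

    /* Point the copied dynptr at the data stored after the fixed part. */
    kptr.data = (uint64_t)(uintptr_t)(key + key_size);
    kptr.size = uptr.size;
    kptr.reserved = 0;
    memcpy(key + dynptr_off, &kptr, sizeof(kptr));
    return key;
}
```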
For the get_next_key operation, unext_key is used as an output argument. When there is a dynptr in the map key, unext_key is also used as an input argument, because the userspace application needs to pre-allocate a buffer for each variable-length part of the map key and save the length and address of these buffers in bpf_dynptr_user objects.

To support the get_next_key op for maps with dynptr keys, map_get_next_key() first calls bpf_copy_from_dynptr_ukey() to construct a map key in which each bpf_dynptr_kern object has the same size as the corresponding bpf_dynptr_user object. It then calls ->map_get_next_key() to get the next key, and finally calls bpf_copy_to_dynptr_ukey() to copy both the non-dynptr part and the dynptr part of the map key to unext_key.

Signed-off-by: Hou Tao <[email protected]>
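A userspace sketch of the copy-out step (bpf_copy_to_dynptr_ukey() in the text), under the same assumptions as before: a single dynptr per key and a simplified {address, size} dynptr. The user's dynptr supplies a pre-allocated buffer and its capacity; on success the buffer address is kept and the size shrinks to the actual data length. The helper name and the -ENOSPC return for a too-small buffer are assumptions, not taken from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_dynptr {
    uint64_t data;     /* address of the data buffer */
    uint32_t size;     /* capacity (input) or data length (output) */
    uint32_t reserved;
};

static int sketch_copy_to_dynptr_ukey(void *ukey, const void *kkey,
                                      size_t key_size, size_t dynptr_off)
{
    struct sketch_dynptr uptr, kptr;
    size_t tail = dynptr_off + sizeof(struct sketch_dynptr);

    memcpy(&uptr, (unsigned char *)ukey + dynptr_off, sizeof(uptr));
    memcpy(&kptr, (const unsigned char *)kkey + dynptr_off, sizeof(kptr));
    if (kptr.size > uptr.size)
        return -ENOSPC; /* assumed error for a too-small user buffer */

    /* Non-dynptr bytes: everything before and after the dynptr slot. */
    memcpy(ukey, kkey, dynptr_off);
    memcpy((unsigned char *)ukey + tail, (const unsigned char *)kkey + tail,
           key_size - tail);

    /* Dynptr data goes into the user's buffer; size shrinks to fit. */
    if (kptr.size)
        memcpy((void *)(uintptr_t)uptr.data,
               (const void *)(uintptr_t)kptr.data, kptr.size);
    uptr.size = kptr.size;
    memcpy((unsigned char *)ukey + dynptr_off, &uptr, sizeof(uptr));
    return 0;
}
```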
The patch supports lookup, update, delete and lookup_delete operations for hash maps with dynptr keys. There are two major differences between the implementation of a normal hash map and a dynptr-keyed hash map:

1) A dynptr-keyed hash map doesn't support pre-allocation
The reason is that the dynptr in the map key is allocated dynamically through the bpf mem allocator. The length limit for these dynptrs is 4088 bytes now. Because these dynptrs are allocated dynamically, memory consumption will be smaller than that of a normal hash map when the lengths of the dynptrs vary widely.

2) A freed element in a dynptr-keyed map will not be reused immediately
For a normal hash map, a freed element may be reused immediately by a newly-added element, so a lookup may return an incorrect result due to element deletion and reuse. A dynptr-keyed map cannot do that: there are pointers (dynptrs) in the map key, and the updates of these dynptrs are not atomic, because both the address and the length of the dynptr are updated. If the element were reused immediately, accessing the dynptr in the freed element could cause an invalid memory access due to a mismatch between the address and the size of the dynptr, so the freed element is reused only after one RCU grace period.

Besides the differences above, the dynptr-keyed hash map also needs to handle a possibly-nullified dynptr in the map key.

After adding dynptr key support to the hash map, the performance of the lookup and update/delete operations in map_perf_test degrades considerably. Marking lookup_nulls_elem_raw() and lookup_elem_raw() as always_inline narrows the lookup gap from 21% to 2% and the update/delete gap from 7% to 4%. Therefore, the patch also adds always_inline to these two hot functions.
The following lines show the detailed performance numbers:

before patch:
0:hash_map_perf kmalloc 693450 events per sec
0:hash_lookup 89366531 lookups per sec

after patch (without always_inline):
0:hash_map_perf kmalloc 650396 events per sec
0:hash_lookup 73961003 lookups per sec

after patch:
0:hash_map_perf kmalloc 665317 events per sec
0:hash_lookup 87842644 lookups per sec

Signed-off-by: Hou Tao <[email protected]>
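A worked check of the gap percentages, computed from the numbers above with a hypothetical helper. Measured relative to the after-patch rate, the lookup (hash_lookup) and update/delete (hash_map_perf kmalloc) gaps come out as about 21% and 7% without always_inline, and about 2% and 4% with it:

```c
#include <assert.h>

/* Slowdown of `after` versus `before`, in percent, measured relative to
 * the (slower) after-patch rate. */
static double sketch_gap_pct(double before, double after)
{
    return (before - after) * 100.0 / after;
}
```

Measured relative to the before-patch rate instead, the gaps without always_inline would be roughly 17% and 6%.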
It will be used by the following patch to shrink the size of the dynptr when the actual data length is smaller than the size of the dynptr during the map_get_next_key operation.

Signed-off-by: Hou Tao <[email protected]>
It first passes the key_record to htab_map_hash() and lookup_nulls_elem_raw() to find the target key, then uses the htab_copy_dynptr_key() helper to copy from the target key to the next key used for output.

Signed-off-by: Hou Tao <[email protected]>
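Finding the target key needs a hash and a comparison that understand where the dynptr sits, which is what passing key_record provides. A simplified userspace sketch of that idea, assuming one dynptr at a known offset and using FNV-1a purely as a stand-in hash (the kernel hash table uses jhash): both operations cover the non-dynptr bytes and the bytes the dynptr points to, never the dynptr's address. Names are hypothetical, and treating a NULL address as zero-length data is an assumption about how a possibly-nullified dynptr might be handled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_dynptr {
    uint64_t data;
    uint32_t size;
    uint32_t reserved;
};

/* FNV-1a, used only for illustration. */
static uint32_t fnv1a(uint32_t h, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

/* Assumption: a dynptr with a NULL address is treated as empty. */
static uint32_t sketch_dynptr_len(const struct sketch_dynptr *d)
{
    return d->data ? d->size : 0;
}

/* Padding is expected to be zeroed, as with ordinary hash map keys. */
static uint32_t sketch_dynptr_key_hash(const void *key, size_t key_size,
                                       size_t dynptr_off)
{
    const unsigned char *k = key;
    size_t tail = dynptr_off + sizeof(struct sketch_dynptr);
    struct sketch_dynptr d;
    uint32_t h = 2166136261u, len;

    memcpy(&d, k + dynptr_off, sizeof(d));
    len = sketch_dynptr_len(&d);
    h = fnv1a(h, k, dynptr_off);               /* bytes before the dynptr */
    h = fnv1a(h, k + tail, key_size - tail);   /* bytes after the dynptr */
    h = fnv1a(h, &len, sizeof(len));
    if (len)
        h = fnv1a(h, (const void *)(uintptr_t)d.data, len);
    return h;
}

/* Keys are equal when the non-dynptr bytes and the dynptr contents match. */
static bool sketch_dynptr_key_equal(const void *a, const void *b,
                                    size_t key_size, size_t dynptr_off)
{
    const unsigned char *ka = a, *kb = b;
    size_t tail = dynptr_off + sizeof(struct sketch_dynptr);
    struct sketch_dynptr da, db;
    uint32_t len;

    if (memcmp(ka, kb, dynptr_off) ||
        memcmp(ka + tail, kb + tail, key_size - tail))
        return false;
    memcpy(&da, ka + dynptr_off, sizeof(da));
    memcpy(&db, kb + dynptr_off, sizeof(db));
    len = sketch_dynptr_len(&da);
    if (len != sketch_dynptr_len(&db))
        return false;
    return len == 0 ||
           memcmp((const void *)(uintptr_t)da.data,
                  (const void *)(uintptr_t)db.data, len) == 0;
}
```

Two keys whose strings live at different addresses but hold the same bytes therefore land in the same bucket and compare equal.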
Neither batched map operations nor dumping the map content through bpffs is supported for maps with dynptr keys, so disable these operations for now.

Signed-off-by: Hou Tao <[email protected]>
Enable BPF_INT_F_DYNPTR_IN_KEY in HTAB_CREATE_FLAG_MASK to support the creation of hash maps with dynptr keys.

Signed-off-by: Hou Tao <[email protected]>
Add bpf_dynptr_user_init() to initialize a bpf_dynptr_user object. It will be used by test_progs and bench. Users can read the {data|size} fields directly to get the address and length of the dynptr object.

Signed-off-by: Hou Tao <[email protected]>
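A hypothetical userspace counterpart of bpf_dynptr_user_init(), shown only to make the data/size fields concrete. The mirror struct follows the description earlier in this series (same 16-byte size as bpf_dynptr, last u32 reserved), but its field order, its name and the helper's signature are assumptions; the real helper may differ:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct bpf_dynptr_user; field order assumed. */
struct dynptr_user_sketch {
    uint64_t data;     /* userspace address of the data */
    uint32_t size;     /* length of the data in bytes */
    uint32_t reserved; /* unused for now */
};

_Static_assert(sizeof(struct dynptr_user_sketch) == 16,
               "must be as large as struct bpf_dynptr");

/* Record the address and length of a userspace buffer and clear the
 * reserved field. */
static void dynptr_user_init_sketch(void *data, uint32_t size,
                                    struct dynptr_user_sketch *dynptr)
{
    dynptr->data = (uint64_t)(uintptr_t)data;
    dynptr->size = size;
    dynptr->reserved = 0;
}

/* A pid_name-style key as userspace would build it before a syscall. */
struct pid_name_user {
    int pid;
    struct dynptr_user_sketch name;
};
```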
Add three positive test cases to test the basic operations on the dynptr-keyed hash map. The basic operations include lookup, update, delete and get_next_key, and they are exercised both through the bpf syscall and through the bpf program. The three test cases use different map keys: the first uses both a bare bpf_dynptr and a struct containing only a bpf_dynptr as the map key, the second uses a struct with an integer and a bpf_dynptr, and the last uses a struct in which the bpf_dynptr is nested in another struct.

Also add multiple negative test cases for the dynptr-keyed hash map. These test cases mainly check whether the layout of the dynptr and non-dynptr parts on the stack matches the definition of map->key_record.

Signed-off-by: Hou Tao <[email protected]>
The patch adds a benchmark to compare the lookup and update/delete performance of a normal hash map and a dynptr-keyed hash map. It also compares the memory usage of the two maps after filling them up.

The benchmark simulates the case where the map key is composed of an 8-byte integer and a variable-size string. Currently the integer just holds the length of the string. The strings are randomly generated by default, and they can also be specified by an external file (e.g., the output from awk '{print $3}' /proc/kallsyms).

The key definitions for the dynptr-keyed and normal hash maps are shown below:

    struct dynptr_key {
        __u64 cookie;
        struct bpf_dynptr desc;
    };

    struct norm_key {
        __u64 cookie;
        char desc[MAX_STR_SIZE];
    };

The lookup or update procedure first looks up an array to get the key of the hash map. The value returned from the array has the same layout as norm_key. For the normal hash map, the returned value is used to manipulate the hash map directly. For the dynptr-keyed hash map, a bpf_dynptr object is constructed from the returned value (the value of cookie is the same as the string length), and then the key is passed to the dynptr-keyed hash map.

Because the lookup procedure is lockless, each producer in the lookup test looks up the whole hash map. However, the update and deletion procedures take a lock, so each producer in the update test only updates a different part of the hash map.
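A userspace sketch of how a dynptr-style key can be derived from a norm_key-shaped array value. MAX_STR_SIZE is not given in the text; 256 is assumed to match max_size in the first benchmark run. struct sketch_dynptr_view and the helper are hypothetical simplified stand-ins; in the BPF program a real bpf_dynptr would be constructed instead:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed value; the text does not state MAX_STR_SIZE. */
#define MAX_STR_SIZE 256

/* Same shape as norm_key, and as the value type of the lookup array. */
struct norm_key {
    uint64_t cookie;          /* holds strlen(desc) */
    char desc[MAX_STR_SIZE];
};

/* Simplified stand-in for a bpf_dynptr over the string. */
struct sketch_dynptr_view {
    const char *data;
    uint32_t size;
};

struct dynptr_key_sketch {
    uint64_t cookie;
    struct sketch_dynptr_view desc;
};

/* The dynptr covers only `cookie` bytes of desc, not the whole buffer. */
static void sketch_norm_to_dynptr_key(const struct norm_key *in,
                                      struct dynptr_key_sketch *out)
{
    out->cookie = in->cookie;
    out->desc.data = in->desc;
    out->desc.size = (uint32_t)in->cookie;
}
```

Since a norm_key always carries MAX_STR_SIZE bytes of string storage while the dynptr key only references cookie bytes, this layout difference is consistent with the lower memory usage of the dynptr-keyed map in the results.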
The following are the benchmark results when running the benchmark on an 8-CPU VM:

(1) Randomly generate 128K strings (max_size=256, entries=128K)

ENTRIES=131072 ./benchs/run_bench_dynptr_key.sh

normal hash map
===============
htab-lookup-p1-131072  2.977 ± 0.017M/s (drops 0.006 ± 0.000M/s, mem 64.984 MiB)
htab-lookup-p2-131072  6.033 ± 0.048M/s (drops 0.015 ± 0.000M/s, mem 64.966 MiB)
htab-lookup-p4-131072 11.612 ± 0.063M/s (drops 0.026 ± 0.000M/s, mem 64.984 MiB)
htab-lookup-p8-131072 22.918 ± 0.315M/s (drops 0.055 ± 0.001M/s, mem 64.966 MiB)
htab-update-p1-131072  2.121 ± 0.014M/s (drops 0.000 ± 0.000M/s, mem 64.986 MiB)
htab-update-p2-131072  4.138 ± 0.047M/s (drops 0.000 ± 0.000M/s, mem 64.986 MiB)
htab-update-p4-131072  7.378 ± 0.078M/s (drops 0.000 ± 0.000M/s, mem 64.986 MiB)
htab-update-p8-131072 13.774 ± 0.129M/s (drops 0.000 ± 0.000M/s, mem 64.986 MiB)

dynptr-keyed hash map
=====================
htab-lookup-p1-131072  3.891 ± 0.008M/s (drops 0.009 ± 0.000M/s, mem 34.908 MiB)
htab-lookup-p2-131072  7.467 ± 0.054M/s (drops 0.016 ± 0.000M/s, mem 34.925 MiB)
htab-lookup-p4-131072 15.151 ± 0.054M/s (drops 0.030 ± 0.000M/s, mem 34.992 MiB)
htab-lookup-p8-131072 29.461 ± 0.448M/s (drops 0.076 ± 0.001M/s, mem 34.910 MiB)
htab-update-p1-131072  2.085 ± 0.124M/s (drops 0.000 ± 0.000M/s, mem 34.888 MiB)
htab-update-p2-131072  3.278 ± 0.068M/s (drops 0.000 ± 0.000M/s, mem 34.888 MiB)
htab-update-p4-131072  6.840 ± 0.100M/s (drops 0.000 ± 0.000M/s, mem 35.023 MiB)
htab-update-p8-131072 11.837 ± 0.190M/s (drops 0.000 ± 0.000M/s, mem 34.941 MiB)

(2) Use strings in /proc/kallsyms (max_size=82, entries=150K)

STR_FILE=kallsyms.txt ./benchs/run_bench_dynptr_key.sh

normal hash map
===============
htab-lookup-p1-kallsyms.txt  7.201 ± 0.080M/s (drops 0.482 ± 0.005M/s, mem 26.384 MiB)
htab-lookup-p2-kallsyms.txt 14.217 ± 0.114M/s (drops 0.951 ± 0.008M/s, mem 26.384 MiB)
htab-lookup-p4-kallsyms.txt 29.293 ± 0.141M/s (drops 1.959 ± 0.010M/s, mem 26.384 MiB)
htab-lookup-p8-kallsyms.txt 58.406 ± 0.384M/s (drops 3.906 ± 0.026M/s, mem 26.384 MiB)
htab-update-p1-kallsyms.txt  3.864 ± 0.036M/s (drops 0.000 ± 0.000M/s, mem 26.387 MiB)
htab-update-p2-kallsyms.txt  5.757 ± 0.078M/s (drops 0.000 ± 0.000M/s, mem 26.387 MiB)
htab-update-p4-kallsyms.txt 10.195 ± 0.655M/s (drops 0.000 ± 0.000M/s, mem 26.387 MiB)
htab-update-p8-kallsyms.txt 18.203 ± 0.165M/s (drops 0.000 ± 0.000M/s, mem 26.387 MiB)

dynptr-keyed hash map
=====================
htab-lookup-p1-kallsyms.txt  7.223 ± 0.007M/s (drops 0.483 ± 0.003M/s, mem 20.993 MiB)
htab-lookup-p2-kallsyms.txt 14.350 ± 0.035M/s (drops 0.960 ± 0.004M/s, mem 20.968 MiB)
htab-lookup-p4-kallsyms.txt 29.317 ± 0.153M/s (drops 1.960 ± 0.013M/s, mem 20.963 MiB)
htab-lookup-p8-kallsyms.txt 58.787 ± 0.662M/s (drops 3.931 ± 0.047M/s, mem 21.018 MiB)
htab-update-p1-kallsyms.txt  2.503 ± 0.124M/s (drops 0.000 ± 0.000M/s, mem 20.972 MiB)
htab-update-p2-kallsyms.txt  4.622 ± 0.422M/s (drops 0.000 ± 0.000M/s, mem 21.104 MiB)
htab-update-p4-kallsyms.txt  8.374 ± 0.149M/s (drops 0.000 ± 0.000M/s, mem 21.027 MiB)
htab-update-p8-kallsyms.txt 14.608 ± 0.319M/s (drops 0.000 ± 0.000M/s, mem 21.027 MiB)

Signed-off-by: Hou Tao <[email protected]>
Upstream branch: 9af5c78 | ec4f719 to b9819dd
Pull request for series with
subject: Support dynptr key for hash map
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=928315