Support dynptr key for hash map #8405

Open
wants to merge 20 commits into
base: bpf-next_base

Conversation

kernel-patches-daemon-bpf[bot]

Pull request for series with
subject: Support dynptr key for hash map
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=928315

@kernel-patches-daemon-bpf

Upstream branch: b420b57
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=928315
version: 2

@kernel-patches-daemon-bpf

Upstream branch: 0fc5ddd
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=928315
version: 2

@kernel-patches-daemon-bpf

Upstream branch: c03320a
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=928315
version: 2

@kernel-patches-daemon-bpf

Upstream branch: 57e71f8
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=928315
version: 2

Hou Tao added 12 commits January 29, 2025 18:49
Add BPF_DYNPTR in btf_field_type to support bpf_dynptr in map key. The
parsing of bpf_dynptr in btf will be done in the following patch, and
this patch only adds two helpers: btf_new_bpf_dynptr_record() creates a
btf record which only includes a bpf_dynptr and btf_type_is_dynptr()
checks whether the btf_type is a bpf_dynptr or not.

With the introduction of BPF_DYNPTR, BTF_FIELDS_MAX is changed from 11
to 13, therefore, update the hard-coded number in cpumask test as well.

Signed-off-by: Hou Tao <[email protected]>
To support variable-length keys or strings in the map key, use bpf_dynptr
to represent these variable-length objects and save these bpf_dynptr
fields in the map key. As shown in the example below, a map key with an
integer and a string is defined:

	struct pid_name {
		int pid;
		struct bpf_dynptr name;
	};

The bpf_dynptr in the map key could also be contained indirectly in a
struct as shown below:

	struct pid_name_time {
		struct pid_name process;
		unsigned long long time;
	};

If the whole map key is a bpf_dynptr, the map key could be defined as a
struct, or bpf_dynptr could be used directly as the map key:

	struct map_key {
		struct bpf_dynptr name;
	};

The bpf program could use bpf_dynptr_init() to initialize the dynptr
part in the map key, and the userspace application will use
bpf_dynptr_user_init() or similar API to initialize the dynptr. Just
like kptrs in map value, the bpf_dynptr field in the map key could also
be defined in a nested struct which is contained in the map key struct.

The patch updates map_create() accordingly to parse these bpf_dynptr
fields in map key, just like it does for other special fields in map
value. To enable bpf_dynptr support in map key, the map_type should be
BPF_MAP_TYPE_HASH. For now, the max number of bpf_dynptr in a map key
is limited to 1 and the limitation can be relaxed later.
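
The key layouts described above can be sketched in plain userspace C to
see where the dynptr lands inside the key. This is only an illustration:
the struct bpf_dynptr below is a local stand-in that mirrors the opaque
16-byte, 8-byte-aligned UAPI type, and the two *_dynptr_off() helpers are
hypothetical names introduced here, not part of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the opaque UAPI type (assumption: 16 bytes,
 * 8-byte aligned). A real program would use the kernel headers.
 */
struct bpf_dynptr {
	uint64_t opaque[2];
} __attribute__((aligned(8)));

/* Map key with an integer and a variable-length name. */
struct pid_name {
	int pid;
	struct bpf_dynptr name;
};

/* Map key that contains the dynptr indirectly through a nested struct. */
struct pid_name_time {
	struct pid_name process;
	unsigned long long time;
};

/* Hypothetical helpers: byte offset of the single dynptr in each key. */
static inline size_t pid_name_dynptr_off(void)
{
	return offsetof(struct pid_name, name);
}

static inline size_t pid_name_time_dynptr_off(void)
{
	return offsetof(struct pid_name_time, process) +
	       offsetof(struct pid_name, name);
}
```

On a typical 64-bit target the dynptr sits at offset 8 in both keys,
because the 4-byte pid is padded up to the 8-byte alignment of the dynptr.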

Signed-off-by: Hou Tao <[email protected]>
get_map_btf() gets the btf from the btf fd and ensures the btf is not a
kernel btf.

Signed-off-by: Hou Tao <[email protected]>
For now, map_create() calls ->map_alloc_check() and ->map_alloc()
first, then it initializes map btf. In order to support dynptr in map
key, map_create() needs to check whether there is bpf_dynptr in map key
btf type and pass the information to ->map_alloc_check() and
->map_alloc().

However, the case where btf_vmlinux_value_type_id > 0 needs special
handling. The reason is that the probe of struct_ops map in libbpf
doesn't pass a valid btf_fd to map_create syscall, and it expects
->map_alloc() to be invoked before the initialization of the map btf. If
the initialization of the map btf happens before ->map_alloc(), the
probe of struct_ops will fail. To prevent breaking the old libbpf in the
new kernel, the patch only moves the initialization of btf before
->map_alloc_check() for non-struct-ops map case.

Signed-off-by: Hou Tao <[email protected]>
Introduce an internal map flag BPF_INT_F_DYNPTR_IN_KEY to support dynptr
in map key. Add the corresponding helper bpf_map_has_dynptr_key() to
check whether the support of dynptr-key is enabled.

The reason for an internal map flag is twofold:
1) the user doesn't need to set the map flag explicitly
map_create() will use the presence of bpf_dynptr in map key as an
indicator of enabling dynptr key.
2) avoid adding new arguments for ->map_alloc_check() and ->map_alloc()
map_create() needs to pass the supported status of dynptr key to
->map_alloc_check (e.g., check the maximum length of dynptr data size)
and ->map_alloc (e.g., check whether dynptr key fits current map type).
Adding new arguments for these callbacks to achieve that will introduce
too much churn.

Therefore, the patch uses the topmost bit of map_flags as the internal
map flag. map_create() checks whether the internal flag is set in the
beginning and bpf_map_get_info_by_fd() clears the internal flag before
returning the map flags to userspace.
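
The life cycle of such an internal flag can be sketched as below. This is
a userspace model under stated assumptions, not the kernel code: the flag
value (bit 31 of a 32-bit flags word), the -EINVAL result for a
user-supplied internal flag, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the internal flag occupies the topmost bit of map_flags. */
#define SKETCH_INT_F_DYNPTR_IN_KEY (1U << 31)

struct sketch_map {
	uint32_t map_flags;
};

/* Creation: refuse the internal bit if userspace passed it, then set
 * it only when the key's btf type was found to contain a dynptr.
 */
static int sketch_map_create(struct sketch_map *map, uint32_t user_flags,
			     bool key_has_dynptr)
{
	if (user_flags & SKETCH_INT_F_DYNPTR_IN_KEY)
		return -EINVAL;

	map->map_flags = user_flags;
	if (key_has_dynptr)
		map->map_flags |= SKETCH_INT_F_DYNPTR_IN_KEY;
	return 0;
}

/* Counterpart of a "has dynptr key" query on the map. */
static bool sketch_map_has_dynptr_key(const struct sketch_map *map)
{
	return map->map_flags & SKETCH_INT_F_DYNPTR_IN_KEY;
}

/* Info reporting: hide the internal bit from userspace. */
static uint32_t sketch_map_info_flags(const struct sketch_map *map)
{
	return map->map_flags & ~SKETCH_INT_F_DYNPTR_IN_KEY;
}
```

With this shape the callbacks only need to look at map_flags, so no new
callback argument is required, and userspace never observes the bit.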

Signed-off-by: Hou Tao <[email protected]>
When there is a bpf_dynptr field in the map key btf type or the map key
btf type is bpf_dynptr, set BPF_INT_F_DYNPTR_IN_KEY in map_flags.

Signed-off-by: Hou Tao <[email protected]>
A map with dynptr key support needs to use map_extra to specify the
maximum data length of these dynptrs. The implementation of the map
will check whether map_extra is smaller than the limitation imposed by
memory allocation during map creation. It may also use map_extra to
optimize the memory allocation for dynptr.

Signed-off-by: Hou Tao <[email protected]>
This is a preparatory patch for supporting map key with bpf_dynptr in
the verifier. The patch splits check_stack_range_initialized() into
multiple small functions and the following patch will reuse these
functions to check whether the access of a stack range which contains
bpf_dynptr is valid or not.

Besides the splitting of check_stack_range_initialized(), the patch also
changes its name to check_stack_range_access() to better reflect its
purpose, because the function also allows uninitialized stack range.

Signed-off-by: Hou Tao <[email protected]>
The patch basically does the following three things to enable dynptr key
for bpf map:

1) Only allow PTR_TO_STACK typed register for dynptr key
The main reason is that bpf_dynptr can only be defined in the stack, so
for dynptr key only PTR_TO_STACK typed register is allowed. bpf_dynptr
could also be represented by CONST_PTR_TO_DYNPTR typed register (e.g.,
in callback func or subprog), but it is not supported now.

2) Only allow fixed-offset for PTR_TO_STACK register
Variable-offset for PTR_TO_STACK typed register is disallowed, because
it is impossible to check whether or not the stack access is aligned
with BPF_REG_SIZE and matches the location of the dynptr or non-dynptr
part in the map key.

3) Check that the layout of the stack content matches the btf_record
First check that the start offset of the stack access is aligned with
BPF_REG_SIZE, then check that the offset and the size of the
dynptr/non-dynptr parts in the stack range are consistent with the
btf_record of the map key.
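
The third check can be pictured with a small userspace model. It is a
deliberately simplified sketch, not the verifier logic: the stack is
described as one tag per 8-byte slot, the key holds exactly one 16-byte
dynptr, and every name below (sketch_slot, sketch_key_layout_ok, and so
on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_REG_SIZE    8	/* models BPF_REG_SIZE */
#define SKETCH_DYNPTR_SIZE 16	/* assumed size of a dynptr on the stack */

/* What the model knows about each 8-byte stack slot. */
enum sketch_slot {
	SKETCH_SLOT_MISC,	/* plain (non-dynptr) data */
	SKETCH_SLOT_DYNPTR,	/* part of an initialized dynptr */
};

/* stack_off:  fixed offset of the key from the frame pointer (negative)
 * slots:      slot tags for the key, slots[0] covering key bytes 0..7
 * key_size:   size of the map key in bytes
 * dynptr_off: offset of the dynptr inside the key, per the key record
 */
static bool sketch_key_layout_ok(int stack_off, const enum sketch_slot *slots,
				 size_t key_size, size_t dynptr_off)
{
	size_t nr_slots = (key_size + SKETCH_REG_SIZE - 1) / SKETCH_REG_SIZE;
	size_t i;

	/* The start of the access must be slot-aligned. */
	if (stack_off % SKETCH_REG_SIZE)
		return false;

	/* Each slot must be a dynptr slot exactly where the record says so. */
	for (i = 0; i < nr_slots; i++) {
		size_t pos = i * SKETCH_REG_SIZE;
		bool in_dynptr = pos >= dynptr_off &&
				 pos < dynptr_off + SKETCH_DYNPTR_SIZE;

		if ((slots[i] == SKETCH_SLOT_DYNPTR) != in_dynptr)
			return false;
	}
	return true;
}
```

For a 24-byte key with the dynptr at offset 8, the model accepts a stack
range tagged misc, dynptr, dynptr and rejects both a misaligned start
offset and a range where the dynptr sits at the wrong position.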

Signed-off-by: Hou Tao <[email protected]>
For bpf map with dynptr key support, the userspace application will use
bpf_dynptr_user to represent the bpf_dynptr in the map key and pass it
to bpf syscall. The bpf syscall will copy from bpf_dynptr_user to
construct a corresponding bpf_dynptr_kern object when the map key is an
input argument, and copy to bpf_dynptr_user from a bpf_dynptr_kern
object when the map key is an output argument.

For now the size of bpf_dynptr_user must be the same as bpf_dynptr, but
the last u32 field is not used, so make it a reserved field.

Signed-off-by: Hou Tao <[email protected]>
Introduce bpf_copy_from_dynptr_ukey() helper to handle map key with
bpf_dynptr when the map key is used in map lookup, update, delete and
get_next_key operations.

The helper places all variable-length data of these bpf_dynptr_user
objects at the end of the map key to simplify the allocation and the
freeing of map key with dynptr.
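
The "data at the end of the key" layout can be sketched in userspace as
follows. It is a model only: the two dynptr structs are hypothetical
stand-ins with assumed field layouts, malloc() replaces whatever allocator
the kernel uses, and sketch_copy_from_ukey() handles a single dynptr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-side dynptr: buffer address, length, reserved word. */
struct sketch_dynptr_user {
	const void *data;
	uint32_t size;
	uint32_t reserved;
};

/* Hypothetical kernel-side dynptr occupying the same bytes of the key. */
struct sketch_dynptr_kern {
	void *data;
	uint32_t size;
	uint32_t offset;
};

/* Build one allocation holding the fixed-size key followed by the
 * variable-length data, and point the in-key dynptr at that tail.
 * Returns NULL on allocation failure; the caller frees with free().
 */
static void *sketch_copy_from_ukey(const void *ukey, size_t key_size,
				   size_t dynptr_off)
{
	struct sketch_dynptr_user uptr;
	struct sketch_dynptr_kern kptr;
	char *key;

	assert(sizeof(uptr) == sizeof(kptr));
	memcpy(&uptr, (const char *)ukey + dynptr_off, sizeof(uptr));

	key = malloc(key_size + uptr.size);
	if (!key)
		return NULL;

	memcpy(key, ukey, key_size);		/* fixed part */
	if (uptr.size)
		memcpy(key + key_size, uptr.data, uptr.size); /* tail */

	kptr.data = key + key_size;
	kptr.size = uptr.size;
	kptr.offset = 0;
	memcpy(key + dynptr_off, &kptr, sizeof(kptr));
	return key;
}
```

Because the key and its data share one allocation, a single free()
releases both, which is the simplification the text refers to.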

Signed-off-by: Hou Tao <[email protected]>
For get_next_key operation, unext_key is used as an output argument.
When there is dynptr in map key, unext_key will also be used as an input
argument, because the userspace application needs to pre-allocate a
buffer for each variable-length part in the map key and save the
length and the address of these buffers in bpf_dynptr_user objects.

To support get_next_key op for map with dynptr key, map_get_next_key()
first calls bpf_copy_from_dynptr_ukey() to construct a map key in which
each bpf_dynptr_kern object has the same size as the corresponding
bpf_dynptr_user object. It then calls ->map_get_next_key() to get the
next_key, and finally calls bpf_copy_to_dynptr_ukey() to copy both the
non-dynptr part and dynptr part in the map key to unext_key.
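
The input/output role of the dynptr in unext_key can be sketched as
below. This is a userspace model with hypothetical names; in particular,
returning -ENOSPC when the pre-allocated buffer is too small is an
assumption made for the sketch, not something the text states.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-side dynptr: on input it describes the buffer the
 * application pre-allocated, on output it describes the returned data.
 */
struct sketch_dynptr_user {
	void *data;
	uint32_t size;
	uint32_t reserved;
};

/* Copy the variable-length part of the next key into the user's buffer
 * and shrink the reported size to the actual data length.
 */
static int sketch_copy_to_ukey_dynptr(struct sketch_dynptr_user *uptr,
				      const void *data, uint32_t len)
{
	if (len > uptr->size)
		return -ENOSPC;	/* assumed error for a too-small buffer */

	if (len)
		memcpy(uptr->data, data, len);
	uptr->size = len;
	return 0;
}
```

A caller would size each buffer for the longest key it expects and read
back the size field after the call to learn how much was written.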

Signed-off-by: Hou Tao <[email protected]>
Hou Tao added 8 commits January 29, 2025 18:49
The patch supports lookup, update, delete and lookup_delete operations
for hash map with dynptr key. There are two major differences between
the implementation of normal hash map and dynptr-keyed hash map:

1) dynptr-keyed hash map doesn't support pre-allocation.
The reason is that the dynptr in map key is allocated dynamically
through bpf mem allocator. The length limitation for these dynptrs is
4088 bytes now. Because these dynptrs are allocated dynamically, the
consumption of memory will be smaller compared with normal hash map when
there are big differences between the lengths of these dynptrs.

2) the freed element in dynptr-key map will not be reused immediately
For normal hash map, the freed element may be reused immediately by the
newly-added element, so the lookup may return an incorrect result due to
element deletion and element reuse. However, dynptr-key map could not do
that, because there are pointers (dynptrs) in the map key and the updates
of these dynptrs are not atomic: both the address and the length of the
dynptr will be updated. If the element is reused immediately, the access
of the dynptr in the freed element may incur invalid memory access due
to the mismatch between the address and the size of dynptr, so the freed
element is reused only after one RCU grace period.

Besides the differences above, dynptr-keyed hash map also needs to handle
the maybe-nullified dynptr in the map key.

After the support of dynptr key in hash map, the performance of lookup
and update/delete operations in map_perf_test degrades a lot. Marking
lookup_nulls_elem_raw() and lookup_elem_raw() as always_inline will
narrow the gap from 21%/7% to 2%/4% (lookup/update). Therefore, the patch
also adds always_inline for these two hot functions. The following lines
show the detailed performance numbers:

before patch:
0:hash_map_perf kmalloc 693450 events per sec
0:hash_lookup 89366531 lookups per sec

after patch (without always_inline):
0:hash_map_perf kmalloc 650396 events per sec
0:hash_lookup 73961003 lookups per sec

after patch:
0:hash_map_perf kmalloc 665317 events per sec
0:hash_lookup 87842644 lookups per sec
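
The gap percentages can be re-derived from the raw numbers above with a
few lines of C. The sketch assumes the gap is expressed relative to the
patched throughput, that is (before / after - 1) * 100; the helper name is
hypothetical.

```c
#include <assert.h>

/* Relative slowdown in percent, measured against the patched run. */
static double sketch_gap_percent(double before, double after)
{
	assert(after > 0.0);
	return (before / after - 1.0) * 100.0;
}
```

Under that assumption, hash_lookup drops by about 21% without
always_inline (89366531 vs 73961003) and about 2% with it (87842644),
while hash_map_perf kmalloc drops by about 7% (693450 vs 650396) and
about 4% (665317) respectively.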

Signed-off-by: Hou Tao <[email protected]>
It will be used by the following patch to shrink the size of dynptr when
the actual data length is smaller than the size of dynptr during
map_get_next_key operation.

Signed-off-by: Hou Tao <[email protected]>
It first passes the key_record to htab_map_hash() and
lookup_nulls_elem_raw() to find the target key, then it uses the
htab_copy_dynptr_key() helper to copy from the target key to the next
key used for output.

Signed-off-by: Hou Tao <[email protected]>
Both batched map operation and dumping the map content through bpffs for
maps with dynptr keys are not supported, so disable these operations for
now.

Signed-off-by: Hou Tao <[email protected]>
Enable BPF_INT_F_DYNPTR_IN_KEY in HTAB_CREATE_FLAG_MASK to support the
creation of hash map with dynptr key.

Signed-off-by: Hou Tao <[email protected]>
Add bpf_dynptr_user_init() to initialize a bpf_dynptr_user object. It
will be used by test_progs and bench. User can dereference the
{data|size} fields directly to get the address and length of the dynptr
object.
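
A minimal sketch of how userspace might fill such a key is shown below.
Everything prefixed with sketch_ is hypothetical: the struct layout
(pointer, size, reserved word) and the argument order of the init helper
are assumptions for illustration, not the API added by the patch. Zeroing
the whole key first is also an assumption, made so that padding bytes do
not differ between otherwise equal keys.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the user-side dynptr. */
struct sketch_dynptr_user {
	void *data;
	uint32_t size;
	uint32_t reserved;
};

/* Hypothetical map key: an integer plus a variable-length name. */
struct sketch_pid_name {
	int pid;
	struct sketch_dynptr_user name;
};

/* Point the dynptr at a caller-owned buffer of the given length. */
static inline void sketch_dynptr_user_init(void *data, uint32_t size,
					   struct sketch_dynptr_user *dynptr)
{
	dynptr->data = data;
	dynptr->size = size;
	dynptr->reserved = 0;
}

/* Build a complete key; the name buffer must outlive the key. */
static inline void sketch_pid_name_init(struct sketch_pid_name *key, int pid,
					char *name, uint32_t len)
{
	memset(key, 0, sizeof(*key));
	key->pid = pid;
	sketch_dynptr_user_init(name, len, &key->name);
}
```

After initialization the address and length can be read back directly
from key.name.data and key.name.size.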

Signed-off-by: Hou Tao <[email protected]>
Add three positive test cases to test the basic operations on the
dynptr-keyed hash map. The basic operations include lookup, update,
delete and get_next_key. These operations are exercised both through
bpf syscall and bpf program. These three test cases use different map
keys. The first test case uses both bpf_dynptr and a struct with only
bpf_dynptr as map key, the second one uses a struct with an integer and
a bpf_dynptr as map key, and the last one uses a struct with bpf_dynptr
being nested in another struct as map key.

Also add multiple negative test cases for dynptr-keyed hash map. These
test cases mainly check whether the layout of dynptr and non-dynptr in
the stack is matched with the definition of map->key_record.

Signed-off-by: Hou Tao <[email protected]>
The patch adds a benchmark test to compare the lookup and update/delete
performance between normal hash map and dynptr-keyed hash map. It also
compares the memory usage of these two maps after filling them up.

The benchmark simulates the case when the map key is composed of an
8-byte integer and a variable-size string. Now the integer just saves
the length of the string. These strings will be randomly generated by
default, and they can also be specified by an external file (e.g., the
output from awk '{print $3}' /proc/kallsyms).

The key definitions for dynptr-keyed and normal hash map are defined as
shown below:

struct dynptr_key {
	__u64 cookie;
	struct bpf_dynptr desc;
};

struct norm_key {
	__u64 cookie;
	char desc[MAX_STR_SIZE];
};

The lookup or update procedure will first look up an array to get the key
of the hash map. The returned value from the array has the same layout as
the norm_key definition. For normal hash map, it will use the returned
value to manipulate the hash map directly. For dynptr-keyed hash map, it
will construct a bpf_dynptr object from the returned value (the value of
cookie is the same as the string length), then pass the key to the
dynptr-keyed hash map. Because the lookup procedure is lockless, each
producer during the lookup test will look up the whole hash map. However,
the update and deletion procedures take a lock, so each producer during
the update test only updates a different part of the hash map.

The following are the benchmark results when running the benchmark under
an 8-CPU VM:

(1) Randomly generate 128K strings (max_size=256, entries=128K)

ENTRIES=131072 ./benchs/run_bench_dynptr_key.sh

normal hash map
===============
htab-lookup-p1-131072 2.977 ± 0.017M/s (drops 0.006 ± 0.000M/s, mem 64.984 MiB)
htab-lookup-p2-131072 6.033 ± 0.048M/s (drops 0.015 ± 0.000M/s, mem 64.966 MiB)
htab-lookup-p4-131072 11.612 ± 0.063M/s (drops 0.026 ± 0.000M/s, mem 64.984 MiB)
htab-lookup-p8-131072 22.918 ± 0.315M/s (drops 0.055 ± 0.001M/s, mem 64.966 MiB)
htab-update-p1-131072 2.121 ± 0.014M/s (drops 0.000 ± 0.000M/s, mem 64.986 MiB)
htab-update-p2-131072 4.138 ± 0.047M/s (drops 0.000 ± 0.000M/s, mem 64.986 MiB)
htab-update-p4-131072 7.378 ± 0.078M/s (drops 0.000 ± 0.000M/s, mem 64.986 MiB)
htab-update-p8-131072 13.774 ± 0.129M/s (drops 0.000 ± 0.000M/s, mem 64.986 MiB)

dynptr-keyed hash map
=====================
htab-lookup-p1-131072 3.891 ± 0.008M/s (drops 0.009 ± 0.000M/s, mem 34.908 MiB)
htab-lookup-p2-131072 7.467 ± 0.054M/s (drops 0.016 ± 0.000M/s, mem 34.925 MiB)
htab-lookup-p4-131072 15.151 ± 0.054M/s (drops 0.030 ± 0.000M/s, mem 34.992 MiB)
htab-lookup-p8-131072 29.461 ± 0.448M/s (drops 0.076 ± 0.001M/s, mem 34.910 MiB)
htab-update-p1-131072 2.085 ± 0.124M/s (drops 0.000 ± 0.000M/s, mem 34.888 MiB)
htab-update-p2-131072 3.278 ± 0.068M/s (drops 0.000 ± 0.000M/s, mem 34.888 MiB)
htab-update-p4-131072 6.840 ± 0.100M/s (drops 0.000 ± 0.000M/s, mem 35.023 MiB)
htab-update-p8-131072 11.837 ± 0.190M/s (drops 0.000 ± 0.000M/s, mem 34.941 MiB)

(2) Use strings in /proc/kallsyms (max_size=82, entries=150K)

STR_FILE=kallsyms.txt ./benchs/run_bench_dynptr_key.sh

normal hash map
===============
htab-lookup-p1-kallsyms.txt 7.201 ± 0.080M/s (drops 0.482 ± 0.005M/s, mem 26.384 MiB)
htab-lookup-p2-kallsyms.txt 14.217 ± 0.114M/s (drops 0.951 ± 0.008M/s, mem 26.384 MiB)
htab-lookup-p4-kallsyms.txt 29.293 ± 0.141M/s (drops 1.959 ± 0.010M/s, mem 26.384 MiB)
htab-lookup-p8-kallsyms.txt 58.406 ± 0.384M/s (drops 3.906 ± 0.026M/s, mem 26.384 MiB)
htab-update-p1-kallsyms.txt 3.864 ± 0.036M/s (drops 0.000 ± 0.000M/s, mem 26.387 MiB)
htab-update-p2-kallsyms.txt 5.757 ± 0.078M/s (drops 0.000 ± 0.000M/s, mem 26.387 MiB)
htab-update-p4-kallsyms.txt 10.195 ± 0.655M/s (drops 0.000 ± 0.000M/s, mem 26.387 MiB)
htab-update-p8-kallsyms.txt 18.203 ± 0.165M/s (drops 0.000 ± 0.000M/s, mem 26.387 MiB)

dynptr-keyed hash map
=====================
htab-lookup-p1-kallsyms.txt 7.223 ± 0.007M/s (drops 0.483 ± 0.003M/s, mem 20.993 MiB)
htab-lookup-p2-kallsyms.txt 14.350 ± 0.035M/s (drops 0.960 ± 0.004M/s, mem 20.968 MiB)
htab-lookup-p4-kallsyms.txt 29.317 ± 0.153M/s (drops 1.960 ± 0.013M/s, mem 20.963 MiB)
htab-lookup-p8-kallsyms.txt 58.787 ± 0.662M/s (drops 3.931 ± 0.047M/s, mem 21.018 MiB)
htab-update-p1-kallsyms.txt 2.503 ± 0.124M/s (drops 0.000 ± 0.000M/s, mem 20.972 MiB)
htab-update-p2-kallsyms.txt 4.622 ± 0.422M/s (drops 0.000 ± 0.000M/s, mem 21.104 MiB)
htab-update-p4-kallsyms.txt 8.374 ± 0.149M/s (drops 0.000 ± 0.000M/s, mem 21.027 MiB)
htab-update-p8-kallsyms.txt 14.608 ± 0.319M/s (drops 0.000 ± 0.000M/s, mem 21.027 MiB)

Signed-off-by: Hou Tao <[email protected]>
@kernel-patches-daemon-bpf

Upstream branch: 9af5c78
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=928315
version: 2
