
kernel-5.10,5.15: Backport of upstream commit bpf: Adjust insufficient default bpf_jit_limit #3002

Merged 1 commit on Apr 11, 2023
kernel-5.10,5.15: Backport of upstream commit `bpf: Adjust insufficient
default bpf_jit_limit`

Backport the upstream commit that adjusts the memory size available to the
bpf jit. While we will pick this patch up in due course once we update the
kernels to 5.10.177 and 5.15.105, pick it up now to ensure our next
release has this one fixed.

Signed-off-by: Leonard Foerster <[email protected]>
foersleo committed Apr 11, 2023
commit 819f78704040df2e12428ec56f21de15b085368b
76 changes: 76 additions & 0 deletions packages/kernel-5.10/1003-bpf-Adjust-insufficient-default-bpf_jit_limit.patch
@@ -0,0 +1,76 @@
From a4bbab27c4bf69486f5846d44134eb31c37e9b22 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <[email protected]>
Date: Mon, 20 Mar 2023 15:37:25 +0100
Subject: [PATCH] bpf: Adjust insufficient default bpf_jit_limit

[ Upstream commit 10ec8ca8ec1a2f04c4ed90897225231c58c124a7 ]

We've seen recent AWS EKS (Kubernetes) user reports like the following:

After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS
clusters after a few days a number of the nodes have containers stuck
in ContainerCreating state or liveness/readiness probes reporting the
following error:

Readiness probe errored: rpc error: code = Unknown desc = failed to
exec in container: failed to start exec "4a11039f730203ffc003b7[...]":
OCI runtime exec failed: exec failed: unable to start container process:
unable to init seccomp: error loading seccomp filter into kernel:
error loading seccomp filter: errno 524: unknown

However, we had not been seeing this issue on previous AMIs and it only
started to occur on v20230217 (following the upgrade from kernel 5.4 to
5.10) with no other changes to the underlying cluster or workloads.

We tried the suggestions from that issue (sysctl net.core.bpf_jit_limit=452534528)
which helped to immediately allow containers to be created and probes to
execute but after approximately a day the issue returned and the value
returned by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'
was steadily increasing.

I tested bpf tree to observe bpf_jit_charge_modmem, bpf_jit_uncharge_modmem
their sizes passed in as well as bpf_jit_current under tcpdump BPF filter,
seccomp BPF and native (e)BPF programs, and the behavior all looks sane
and expected, that is nothing "leaking" from an upstream perspective.

The bpf_jit_limit knob was originally added in order to avoid a situation
where unprivileged applications loading BPF programs (e.g. seccomp BPF
policies) consuming all the module memory space via BPF JIT such that loading
of kernel modules would be prevented. The default limit was defined back in
2018 and while good enough back then, we are generally seeing far more BPF
consumers today.

Adjust the limit for the BPF JIT pool from originally 1/4 to now 1/2 of the
module memory space to better reflect today's needs and avoid more users
running into potentially hard to debug issues.

Fixes: fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
Reported-by: Stephen Haynes <[email protected]>
Reported-by: Lefteris Alexakis <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://github.com/awslabs/amazon-eks-ami/issues/1179
Link: https://github.com/awslabs/amazon-eks-ami/issues/1219
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/bpf/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 73d4b1e32fbd..d3f6a070875c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -826,7 +826,7 @@ static int __init bpf_jit_charge_init(void)
{
/* Only used as heuristic here to derive limit. */
bpf_jit_limit_max = bpf_jit_alloc_exec_limit();
- bpf_jit_limit = min_t(u64, round_up(bpf_jit_limit_max >> 2,
+ bpf_jit_limit = min_t(u64, round_up(bpf_jit_limit_max >> 1,
PAGE_SIZE), LONG_MAX);
return 0;
}
--
2.39.2

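The functional change in this backport is the single line above: the default
BPF JIT pool grows from one quarter (>> 2) to one half (>> 1) of the
architecture's module memory region. Below is a minimal userspace sketch of
that arithmetic, not part of the patch; the 1 GiB region size is a made-up
example, since the real value comes from bpf_jit_alloc_exec_limit() and
varies by architecture and configuration.

#include <stdio.h>

#define PAGE_SIZE 4096ULL                              /* assumed 4 KiB pages */
#define ROUND_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

int main(void)
{
	/* Hypothetical module-region size; stands in for bpf_jit_limit_max. */
	unsigned long long limit_max = 1ULL << 30;

	/* Old default: 1/4 of the region (the ">> 2" this patch removes). */
	unsigned long long old_limit = ROUND_UP(limit_max >> 2, PAGE_SIZE);
	/* New default: 1/2 of the region (the ">> 1" this patch adds). */
	unsigned long long new_limit = ROUND_UP(limit_max >> 1, PAGE_SIZE);

	printf("old default: %llu bytes (%llu MiB)\n", old_limit, old_limit >> 20);
	printf("new default: %llu bytes (%llu MiB)\n", new_limit, new_limit >> 20);
	return 0;
}

On such a system the default would move from 256 MiB to 512 MiB. Either way,
the effective value can still be inspected or overridden at runtime through
the net.core.bpf_jit_limit sysctl, which is the workaround the EKS report
quoted above applied with the value 452534528.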
2 changes: 2 additions & 0 deletions packages/kernel-5.10/kernel-5.10.spec
@@ -17,6 +17,8 @@ Source103: config-bottlerocket-vmware
 Patch1001: 1001-Makefile-add-prepare-target-for-external-modules.patch
 # Enable INITRAMFS_FORCE config option for our use case.
 Patch1002: 1002-initramfs-unlink-INITRAMFS_FORCE-from-CMDLINE_-EXTEN.patch
+# Backport of bpf jit limit adjustments, see https://github.com/awslabs/amazon-eks-ami/issues/1179
+Patch1003: 1003-bpf-Adjust-insufficient-default-bpf_jit_limit.patch

 # Add zstd support for compressed kernel modules
 Patch2000: 2000-kbuild-move-module-strip-compression-code-into-scrip.patch
76 changes: 76 additions & 0 deletions packages/kernel-5.15/1004-bpf-Adjust-insufficient-default-bpf_jit_limit.patch
@@ -0,0 +1,76 @@
From 54869daa6a437887614274f65298ba44a3fac63a Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <[email protected]>
Date: Mon, 20 Mar 2023 15:37:25 +0100
Subject: [PATCH] bpf: Adjust insufficient default bpf_jit_limit

[ Upstream commit 10ec8ca8ec1a2f04c4ed90897225231c58c124a7 ]

We've seen recent AWS EKS (Kubernetes) user reports like the following:

After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS
clusters after a few days a number of the nodes have containers stuck
in ContainerCreating state or liveness/readiness probes reporting the
following error:

Readiness probe errored: rpc error: code = Unknown desc = failed to
exec in container: failed to start exec "4a11039f730203ffc003b7[...]":
OCI runtime exec failed: exec failed: unable to start container process:
unable to init seccomp: error loading seccomp filter into kernel:
error loading seccomp filter: errno 524: unknown

However, we had not been seeing this issue on previous AMIs and it only
started to occur on v20230217 (following the upgrade from kernel 5.4 to
5.10) with no other changes to the underlying cluster or workloads.

We tried the suggestions from that issue (sysctl net.core.bpf_jit_limit=452534528)
which helped to immediately allow containers to be created and probes to
execute but after approximately a day the issue returned and the value
returned by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'
was steadily increasing.

I tested bpf tree to observe bpf_jit_charge_modmem, bpf_jit_uncharge_modmem
their sizes passed in as well as bpf_jit_current under tcpdump BPF filter,
seccomp BPF and native (e)BPF programs, and the behavior all looks sane
and expected, that is nothing "leaking" from an upstream perspective.

The bpf_jit_limit knob was originally added in order to avoid a situation
where unprivileged applications loading BPF programs (e.g. seccomp BPF
policies) consuming all the module memory space via BPF JIT such that loading
of kernel modules would be prevented. The default limit was defined back in
2018 and while good enough back then, we are generally seeing far more BPF
consumers today.

Adjust the limit for the BPF JIT pool from originally 1/4 to now 1/2 of the
module memory space to better reflect today's needs and avoid more users
running into potentially hard to debug issues.

Fixes: fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
Reported-by: Stephen Haynes <[email protected]>
Reported-by: Lefteris Alexakis <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://github.com/awslabs/amazon-eks-ami/issues/1179
Link: https://github.com/awslabs/amazon-eks-ami/issues/1219
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/bpf/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index cea0d1296599..f7c27c1cc593 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -829,7 +829,7 @@ static int __init bpf_jit_charge_init(void)
{
/* Only used as heuristic here to derive limit. */
bpf_jit_limit_max = bpf_jit_alloc_exec_limit();
- bpf_jit_limit = min_t(u64, round_up(bpf_jit_limit_max >> 2,
+ bpf_jit_limit = min_t(u64, round_up(bpf_jit_limit_max >> 1,
PAGE_SIZE), LONG_MAX);
return 0;
}
--
2.39.2

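The commit message above measures live JIT consumption with a shell one-liner
over /proc/vmallocinfo. For reference, here is a minimal C sketch of the same
accounting; it is not part of this PR, it assumes each vmallocinfo line has
the form "<start>-<end> <size-in-bytes> <caller> ...", and it must run as
root, since /proc/vmallocinfo is readable only by privileged users.

#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/vmallocinfo", "r");
	char line[512];
	unsigned long long total = 0;

	if (!f) {
		perror("fopen /proc/vmallocinfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		unsigned long long size;

		/* Count only allocations charged to the BPF JIT. */
		if (!strstr(line, "bpf_jit"))
			continue;
		/* Skip the "<start>-<end>" token, then read the size field. */
		if (sscanf(line, "%*s %llu", &size) == 1)
			total += size;
	}
	fclose(f);
	printf("bpf_jit bytes in use: %llu\n", total);
	return 0;
}

Watching this total climb toward net.core.bpf_jit_limit reproduces the
symptom described in the report; the larger default in this patch pushes
that ceiling further out.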
2 changes: 2 additions & 0 deletions packages/kernel-5.15/kernel-5.15.spec
@@ -19,6 +19,8 @@ Patch1001: 1001-Makefile-add-prepare-target-for-external-modules.patch
 Patch1002: 1002-Revert-kbuild-hide-tools-build-targets-from-external.patch
 # Enable INITRAMFS_FORCE config option for our use case.
 Patch1003: 1003-initramfs-unlink-INITRAMFS_FORCE-from-CMDLINE_-EXTEN.patch
+# Backport of bpf jit limit adjustments, see https://github.com/awslabs/amazon-eks-ami/issues/1179
+Patch1004: 1004-bpf-Adjust-insufficient-default-bpf_jit_limit.patch

 BuildRequires: bc
 BuildRequires: elfutils-devel