Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-2413: Graduate SeccompDefault to stable #3718

Merged
merged 1 commit into from
Feb 8, 2023

Conversation

saschagrunert
Copy link
Member

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 10, 2023
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Jan 10, 2023
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 10, 2023
@saschagrunert saschagrunert force-pushed the seccomp-default-ga branch 2 times, most recently from 9ffc575 to 1474c2a Compare January 10, 2023 10:29
@bart0sh
Copy link
Contributor

bart0sh commented Jan 11, 2023

/lgtm
/assign @derekwaynecarr @mrunalp @dchen1107

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 11, 2023
@mrunalp
Copy link
Contributor

mrunalp commented Jan 26, 2023

/lgtm
/approve

@mrunalp
Copy link
Contributor

mrunalp commented Jan 26, 2023

@derekwaynecarr ptal

@derekwaynecarr
Copy link
Member

From what I understand, no actual code changes are required as part of this move.

/approve
/lgtm

@saschagrunert
Copy link
Member Author

@kubernetes/production-readiness PTAL

@tallclair
Copy link
Member

Github doesn't let me comment on arbitrary lines, so I'm leaving some general feedback here:

  1. What is the behavior if you specify the profile RuntimeDefault with a privileged: true container? I think I'd expect the default to be unconfined with a privileged pod, but to enforce the RuntimeDefault profile if I explicitly specify it. It looks like Kubelet will still pass RuntimeDefault to the CRI in this case. What do containerd & crio do in this case?
  2. It's unfortunate that this approach doesn't work well with Pod Security Admission. Without global cluster configuration and/or visibility, the restricted PSS profile will still need to require an explicit Seccomp profile. Any ideas for addressing this?

From a PRR perspective, this LGTM.

@tallclair tallclair self-assigned this Feb 2, 2023
@saschagrunert
Copy link
Member Author

Hey @tallclair, thank you for raising those questions!

  1. What is the behavior if you specify the profile RuntimeDefault with a privileged: true container? I think I'd expect the default to be unconfined with a privileged pod, but to enforce the RuntimeDefault profile if I explicitly specify it. It looks like Kubelet will still pass RuntimeDefault to the CRI in this case. What do containerd & crio do in this case?

CRI-O and containerd will still apply unconfined (aka nothing), because the seccomp paths are guarded by the privileged check:

https://github.com/cri-o/cri-o/blob/cda07c8ac0b2b53397a286e7b540394c9f1358db/server/container_create_linux.go#L622-L631

https://github.com/containerd/containerd/blob/6ed24c88ed0530aa8d4483b13a217172365f61cc/pkg/cri/server/container_create_linux.go#L464-L467

  1. It's unfortunate that this approach doesn't work well with Pod Security Admission. Without global cluster configuration and/or visibility, the restricted PSS profile will still need to require an explicit Seccomp profile. Any ideas for addressing this?

From a PRR perspective, this LGTM.

I was thinking back and forth about this feature about making it API aware. The fact that we designed it as kubelet feature makes it easily configurable for cluster admins on a per-node basis, but also less visible to end users. If this feature is stable and we have a good user base adopting it (let's say 2-3 released), then we could think about introducing a similar feature in the API server.

@tallclair
Copy link
Member

Thanks. /lgtm

@bobbypage
Copy link
Member

bobbypage commented Feb 2, 2023

Just to confirm, with this graduating to stable, users will still need to opt-in by configuring their kubelets with --seccomp-default? Is it correct?

I ask since I am aware of some performance regressions when running with seccomp enabled with some workloads, specifically with spectre mitigations. There were some discussions around disabling those mitigations with seccomp on the container runtime side, but I don't think they were completed. Namely:

It also looks like Linux 5.16 has changed some of the defaults regarding seccomp with this patch (xref: https://lore.kernel.org/lkml/[email protected]/)

I've been recommending to everyone to use
"spec_store_bypass_disable=prctl spectre_v2_user=prctl" for a while
now. I already recommend to Yifei too a few months ago when he first
found out of the huge seccomp regression when he upgraded his codebase
to the upstream kernel with both STIBP/SSBD enabled in seccomp jails.

I haven't tested the perf changes with that patch, but it may be something we should document, as there is no stable kernel release with that patch yet, so most users will not have it and possibly have performance degradations as a result.

@deads2k
Copy link
Contributor

deads2k commented Feb 2, 2023

/approve
/hold

holding for and answer to @bobbypage's question. I don't think an already fixed performance degradation changes the PRR answers, but it's worth addressing.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 2, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, derekwaynecarr, mrunalp, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 2, 2023
@saschagrunert
Copy link
Member Author

Just to confirm, with this graduating to stable, users will still need to opt-in by configuring their kubelets with --seccomp-default? Is it correct?

No, the main goal of the graduation is to switch --seccomp-default=true per default:
https://github.com/kubernetes/enhancements/pull/3718/files#diff-0509bda37cb3b099fee004ae9dfcaf2029ce1370d81f3f3931a67dff8ec26997R189

Users should now configure their kubelets to opt-out either by using the feature gate or the configuration option if they don't want to use it.

I haven't tested the perf changes with that patch, but it may be something we should document, as there is no stable kernel release with that patch yet, so most users will not have it and possibly have performance degradations as a result.

I agree we should document the performance impact of seccomp on k/website, I can take care of that during the graduation of the feature.

@mrunalp
Copy link
Contributor

mrunalp commented Feb 3, 2023

I am 👍 on enabling by default. This is the performance vs. security tradeoff and we can choose security while giving admins opt out if they prefer performance. (A bunch of CVEs would have been avoided if seccomp were enabled by default.)
cc: @derekwaynecarr

@dchen1107
Copy link
Member

I am ok with the default on from the security perspective. But as pointed by @bobbypage at #3718 (comment), we should

  1. Documented the performance impact and how to disable the feature
  2. Mention the potential performance impact on the release note

@mrunalp
Copy link
Contributor

mrunalp commented Feb 3, 2023

Some work that went into runc for perf - opencontainers/runc#3588

@mrunalp
Copy link
Contributor

mrunalp commented Feb 3, 2023

From the kernel link:

- runc/crun already set SECCOMP_FILTER_FLAG_SPEC_ALLOW by default, k8s
  and podman have a default json seccomp allowlist that cannot be
  slowed down, so for the #1 seccomp user this change is already a
  noop.

@bobbypage
Copy link
Member

bobbypage commented Feb 4, 2023

Thanks all for discussion. I performed a bit of adhoc testing:

$ gcloud compute instances create cos-dev --machine-type=n2-standard-4     --image-family="cos-101-lts"     --image-project="cos-cloud"     --zone=us-central1-c

cos-dev ~ # uname -a
Linux cos-dev2 5.15.65+ #1 SMP Sat Jan 21 10:12:05 UTC 2023 x86_64 Intel(R) Xeon(R) CPU @ 2.80GHz GenuineIntel GNU/Linux

cos-dev ~ # cat /etc/os-release
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
GOOGLE_METRICS_PRODUCT_ID=26
KERNEL_COMMIT_ID=44456f0e9d2cd7a9616fb0d05bc4020237839a5a
GOOGLE_CRASH_ID=Lakitu
VERSION=101
VERSION_ID=101
BUILD_ID=17162.40.56

This is latest COS OS, with latest 5.15 stable kernel. I ran pybench benchmark as recommend in this article. I'm not sure how representative this benchmark is, but it seems like a common python bechmark:

## With seccomp
cos-dev1 ~ # docker run --rm     -v `pwd`:/root/results     ljishen/pybench     bench /root/results/output
Totals:                           2773ms   2798ms
 
## Disable Seccomp
cos-dev1 ~ # docker run --rm  --security-opt seccomp=unconfined   -v `pwd`:/root/results     ljishen/pybench     bench /root/results/output
Totals:                           1964ms   1981ms

So with seccomp enabled out of the box, the perf impact is pretty significant, ~ 1.5x worse.

Applying the kernel command line changes as was done in the above kernel patch of setting: (spec_store_bypass_disable=prctl spectre_v2_user=prctl):

## With seccomp, but spectre mitigations disabled
cos-dev1 ~ # docker run --rm     -v `pwd`:/root/results     ljishen/pybench     bench /root/results/output
Totals:                           1955ms   1975ms

The results are much better and equivalent to running without seccomp before.

The kernel patch will change defaults to make it the default in newer kernels but it was not back ported to the latest 5.15 kernel yet. The runc patches will also accomplish the same effect of disabling those mitigations, but will require folks to use the newer runc.

I'm also supportive to enable this for security improvements, and it's great to see we have mitigation in place for perf impact, but we need to communicate it clearly to users that they should either:

(1) update to newer kernel with that patch
(2) change their kernel command line
(3) update to latest runc

... to ensure they will not be impacted by any possible perf regressions.

We now graduate `SeccompDefault` to stable by basically making the feature
enabled by default.

Signed-off-by: Sascha Grunert <[email protected]>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 6, 2023
@saschagrunert
Copy link
Member Author

Thank you for all the inputs, I added a new section to the "Beta to GA Graduation" to contain all the documentation bits.

@saschagrunert
Copy link
Member Author

@mrunalp @bobbypage @derekwaynecarr @dchen1107 can we merge this one now? I don't see any outstanding points left.

@mrunalp
Copy link
Contributor

mrunalp commented Feb 7, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 7, 2023
@mrunalp
Copy link
Contributor

mrunalp commented Feb 7, 2023

@deads2k Are you okay removing the hold?

@deads2k
Copy link
Contributor

deads2k commented Feb 8, 2023

@deads2k Are you okay removing the hold?

Yes, thank you for the updates.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 8, 2023
@k8s-ci-robot k8s-ci-robot merged commit b6ab8a0 into kubernetes:master Feb 8, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Feb 8, 2023
@saschagrunert saschagrunert deleted the seccomp-default-ga branch February 9, 2023 07:45
@pacoxu pacoxu mentioned this pull request Jan 22, 2025
19 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

9 participants