
Jail/chroot nginx process inside controller container #8337

Merged · 13 commits merged into kubernetes:main from the chroot-nginx-proc branch on Apr 9, 2022

Conversation

rikatz (Contributor) commented Mar 15, 2022

What this PR does / why we need it:

Isolate the nginx process into its own user namespace (userns) inside the main controller container.

Using this technique, we can provide a sandbox inside a sandbox (beautiful, huh?) so the nginx process won't have access to sensitive files from the main container.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Which issue/s this PR fixes

So many...

How Has This Been Tested?

  • Generated the chroot container image and ran it inside a Kubernetes cluster (simple approach)
  • The next step is to change the e2e tests and point them all at the chrooted container instead of the non-sandboxed container

TODO (Don't merge it before we pass through the TODO!!!)

  • Plan the beta for this, as it may be a breaking change
  • We need to make sure it's clear that:
    • We are adding a new security capability to the container, allowing it to unshare/create a new network namespace (netns) inside of it
    • If the admin adds/enables the seccomp feature with default profiles, ingress may stop working. This new approach NEEDS a seccomp profile that allows the "clone" syscalls (see the illustrative sketch below)
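For illustration only, the extra securityContext on the controller container could look roughly like this; the capability name and seccomp profile path below are placeholders, not the final values shipped by this PR:

  # Illustrative sketch; exact capability and seccomp settings depend on the final implementation.
  securityContext:
    capabilities:
      add:
        - NET_BIND_SERVICE   # already required by the controller today
        - SYS_CHROOT         # placeholder: capability assumed for the chroot/unshare step
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/ingress-nginx-chroot.json   # hypothetical custom profile; must allow the clone/unshare syscalls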

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 15, 2022
@k8s-ci-robot k8s-ci-robot added area/helm Issues or PRs related to helm charts area/lua Issues or PRs related to lua code approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 15, 2022
rikatz (Contributor, Author) commented Mar 15, 2022

/kind bug
/priority important-soon
/triage accepted

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 15, 2022
@rikatz rikatz force-pushed the chroot-nginx-proc branch 2 times, most recently from fe42bc2 to b61ae2c Compare March 15, 2022 03:10
tao12345666333 (Member) commented:

wow! Great!

longwuyuan (Contributor) commented Mar 15, 2022 via email

# See the License for the specific language governing permissions and
# limitations under the License.

cat /etc/resolv.conf > /chroot/etc/resolv.conf

This comment was marked as resolved.

rikatz (Contributor, Author) replied:

Nope, because resolv.conf is generated at runtime (when the Pod is bootstrapped) and may change from cluster to cluster :D

tallclair (Member) commented:

Is there a reason the controller can't be separated out into a different container? Naively that seems like it would be a lot simpler than jailing it within the same container.

rikatz (Contributor, Author) commented Mar 15, 2022

Is there a reason the controller can't be separated out into a different container? Naively that seems like it would be a lot simpler than jailing it within the same container.

Hey @tallclair, let me try to clarify why we didn't follow this split, even though it was actually our first attempt:

  • One of the main problems is serviceAccount token extraction. Mounting the serviceAccount token is a per-pod, not a per-container, configuration (I guess; at least I couldn't find a way to say "don't mount this SA on this one container"). So even if we split the containers, the SA will still be there, usable by anyone who can add arbitrary configs to read the container FS. BTW, I can write a KEP on this to have automountServiceAccount per POD, although I promised myself to stay away from KEPs for the next 3 years ;P

  • Based on that, another approach was: "let's disable the SA automount and mount/project a specific SA token into a specific container" (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#serviceaccounttokenprojection-v1-core). This works "partially". There are some weird workarounds that need to be made so client-go can use the token and behave as "inCluster" (see the sketch after this list). I'm not saying it's impossible, I could make it work, but there are so many changes required just to use this token that it was a huge effort.

  • Now, even assuming we could mount the SA on only one container and not the other (based on either of the approaches above), there are restrictions in the way ingress-nginx was built that would turn into a lot more code changes. Starting with process monitoring: the ingress-controller process monitors, via a channel opened when the nginx process starts, whether the process is respawning. More broadly, the ingress-controller process is responsible not only for that but for all control of the nginx process. It uses nginx -t to test whether a new template is valid. It reloads. It stops nginx when requested to stop. So it needs not only to share the same PID namespace, but also to have access to that same binary. One thing we started to discuss here was to move all this process logic into another Go program and receive the commands via a REST API or gRPC (i.e., the ingress controller calls an endpoint to start, stop, or restart nginx). This would take at least 15 days of implementation, as I'm not fully focused on ingress-nginx :)

  • We can also share the PID namespace between both containers without any problem. But there are some caveats in that restart/reload/stop/etc. flow that weren't that easy.
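For reference, the second bullet's idea boils down to something roughly like this at the pod level (container names, the image, and the volume name are illustrative, not code from this PR; client-go's in-cluster config expects the token under /var/run/secrets/kubernetes.io/serviceaccount, which is part of why the workarounds were needed):

  # Sketch of "disable automount for the pod, project the token into one container only".
  spec:
    automountServiceAccountToken: false        # pod-level switch, applies to every container
    volumes:
      - name: controller-token                 # hypothetical volume name
        projected:
          sources:
            - serviceAccountToken:
                path: token
            # ca.crt and the namespace would also need to be projected
            # (see the example later in this thread) for client-go to work unmodified
    containers:
      - name: controller                       # only this container gets the token
        image: ingress-nginx-controller:example   # placeholder image
        volumeMounts:
          - name: controller-token
            mountPath: /var/run/secrets/kubernetes.io/serviceaccount
            readOnly: true
      - name: nginx                            # hypothetical split-out nginx container, no token mounted
        image: ingress-nginx-controller:example   # placeholder image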

Thinking along the same lines as kube-proxy v2 (aka kpng), we started to discuss this as an opportunity to split the control plane and data plane, using a single gRPC controller and multiple proxies that talk only to the controller and get the right information. This is, for us, the new ingress-nginx and the desired state of the art. But, again, I at least got stuck on the amount of time I have available to develop vs the amount of time required to make it work :)

So we didn't want to wait either for the promise of a new implementation or for a hard workaround; after all, this needed to be fixed fast. And this is why the chroot/jail solution came up in a weekend discussion and was implemented in about 4 hours :) I also spent some time researching the previous solutions, but all of them needed deep changes in the code and were not as easy as they seem (I thought it was going to be easy the first time as well!)

Makes sense? :)

@rikatz rikatz force-pushed the chroot-nginx-proc branch 4 times, most recently from 56ffbfa to d2557a3 Compare March 21, 2022 02:52
tallclair (Member) commented:

@rikatz thanks for the detailed response.

I can write a KEP on this to have automountServiceAccount per POD

I might be misunderstanding you, but there's already a pod-level AutomountServiceAccountToken field: https://github.com/kubernetes/api/blob/master/core/v1/types.go#L3134-L3136

There are some weird workarounds that need to be made so client-go can use the token and behave as "inCluster".

Do you have any more details on what went wrong? I'm guessing you're missing some of the other projected data. The easiest thing is to just copy the projected volume from a pod with the SA auto-mounted:

  - name: kube-api-access-7g4tx
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace

So it needs not only to share the same PID namespace, but also to have access to that same binary.

I'm sure I don't understand all the nuances here, but if the issue is just that the controller needs access to the nginx binary, then you could consider just putting everything you need in a single container image, and run both containers off the same image (just with a different entrypoint / command).
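Something roughly like the following is what I mean, combined with the shared PID namespace that comes up next (the image name and commands are assumptions, not the controller's actual entrypoints):

  # Illustrative only: both containers run the same image with different commands.
  spec:
    shareProcessNamespace: true                  # lets the controller see and signal the nginx process
    containers:
      - name: controller
        image: ingress-nginx-controller:example  # placeholder: one image containing both binaries
        command: ["/nginx-ingress-controller"]   # assumed controller entrypoint
      - name: nginx
        image: ingress-nginx-controller:example  # same image, different entrypoint
        command: ["nginx", "-g", "daemon off;"]  # assumed nginx invocation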

We can also share the PID namespace between both containers without any problem. But there are some caveats in that restart/reload/stop/etc. flow that weren't that easy

Yeah, that makes sense. I think stop should be straightforward? In place of restart, could you just have something that monitors the process (e.g. systemd, or just rely on the kubelet) and restarts it as soon as it's killed?
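For the "rely on the kubelet" option, the usual mechanism would be a liveness probe on the nginx container, e.g. something like this (the /healthz path and port 10254 are the controller's usual defaults, but treat them as assumptions here):

  livenessProbe:
    httpGet:
      path: /healthz        # assumed health endpoint
      port: 10254           # assumed controller status port
    periodSeconds: 10
    failureThreshold: 3     # kubelet restarts the container after repeated failures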

I'm just guessing here... I can see why you might need the gRPC interface instead.


On a separate topic, you might want to have a look at AppArmor hats, which are a way to change the AppArmor context, designed to change privileges on the fly. It might be a good fit for this use case (but only works on AppArmor systems).

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 1, 2022
@@ -74,7 +74,7 @@ spec:
containers:
- name: {{ .Values.controller.containerName }}
{{- with .Values.controller.image }}
- image: "{{- if .repository -}}{{ .repository }}{{ else }}{{ .registry }}/{{ .image }}{{- end -}}:{{ .tag }}{{- if (.digest) -}} @{{.digest}} {{- end -}}"
+ image: "{{- if .repository -}}{{ .repository }}{{ else }}{{ .registry }}/{{ include "ingress-nginx.image" . }}{{- end -}}:{{ .tag }}{{- if (.digest) -}} @{{.digest}} {{- end -}}"
Reviewer (Member) commented on this diff:

I understand that currently we can handle the image name conversion through helper functions, but once chroot is enabled, should we drop the digest?

The expected behavior is that a configuration item lets users switch images transparently, but digests don't work the same way. If the user configures a digest, the image download will fail.

rikatz (Contributor, Author) replied:

Nope, I think we may have a digest for chroot as well. This can be another argument, that's fine. Because we don't have a digest yet (no image published) I forgot about this, but we can fix it in a follow-up.

Reviewer (Member) replied:

OK, I agree.

@@ -61,6 +59,7 @@ import (
  "k8s.io/ingress-nginx/internal/nginx"
  "k8s.io/ingress-nginx/internal/task"
  "k8s.io/ingress-nginx/internal/watch"
+ "k8s.io/klog/v2"
Reviewer (Member) commented:

nit: keep the previous import order

rikatz (Contributor, Author) replied:

🤦 fixing

@rikatz rikatz force-pushed the chroot-nginx-proc branch from b0aec45 to 875a86f Compare April 9, 2022 03:38
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 9, 2022
@rikatz rikatz force-pushed the chroot-nginx-proc branch from 875a86f to 2718ca6 Compare April 9, 2022 03:39
rikatz (Contributor, Author) commented Apr 9, 2022

@tao12345666333 fixed the helm thing, please take a look at it

tao12345666333 (Member) left a review:

I think this is great! If you want to merge this PR at any time please remove the hold

/lgtm
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 9, 2022
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 9, 2022
k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rikatz, tao12345666333

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rikatz (Contributor, Author) commented Apr 9, 2022

/hold cancel
So I can sleep fine and start working on another part tomorrow :)

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 9, 2022
@k8s-ci-robot k8s-ci-robot merged commit 3def835 into kubernetes:main Apr 9, 2022
rchshld pushed a commit to joomcode/ingress-nginx that referenced this pull request May 19, 2023
* Initial work on chrooting nginx process

* More improvements in chroot

* Fix charts and some file locations

* Fix symlink on non chrooted container

* fix psp test

* Add e2e tests to chroot image

* Fix logger

* Add internal logger in controller

* Fix overlay for chrooted tests

* Fix tests

* fix boilerplates

* Fix unittest to point to the right pid

* Fix PR review
Labels
  • approved — Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/helm — Issues or PRs related to helm charts
  • area/lua — Issues or PRs related to lua code
  • cncf-cla: yes — Indicates the PR's author has signed the CNCF CLA.
  • kind/bug — Categorizes issue or PR as related to a bug.
  • lgtm — "Looks good to me", indicates that a PR is ready to be merged.
  • priority/important-soon — Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • size/XL — Denotes a PR that changes 500-999 lines, ignoring generated files.
  • tide/merge-method-squash — Denotes a PR that should be squashed by tide when it merges.
  • triage/accepted — Indicates an issue or PR is ready to be actively worked on.
6 participants