Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update docker image to run as non-root #2265

Merged
merged 11 commits into from
Apr 4, 2024

Conversation

zalegrala
Copy link
Contributor

@zalegrala zalegrala commented Mar 27, 2023

What this PR does:

BREAKING CHANGE

Here we make the image adjustment necessary to run Tempo as non-root in the
Docker container, as well as include some util jsonnet to allow statefulsets to
chown their data directories to match the new permissions.

With this new chown init contianer, both the ingester and metrics-generator
statefulsets start and function.

❯ docker run -it --entrypoint sh zalegrala/tempo:tempoNonRootImage-eefae10f9
/ $ whoami
tempo
/ $ id
uid=10001(tempo) gid=10001(tempo) groups=10001(tempo)
/ $ ls -ld /var/tempo
drwxr-xr-x    2 tempo    tempo         4096 Apr  2 19:40 /var/tempo

Note that the securityContext.fsGroup(10001) may be required for environments
that mount additional volumes which do not have read/write permissions for the
tempo user. Users may also wish to recursively chown the /var/tempo
directory for the new ownership. This will need to be done only once.

Which issue(s) this PR fixes:
Fixes #334

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@github-actions github-actions bot added the stale Used for stale issues / PRs label May 28, 2023
@zalegrala zalegrala added the keepalive Label to exempt Issues / PRs from stale workflow label May 30, 2023
@github-actions github-actions bot removed the stale Used for stale issues / PRs label May 31, 2023
@github-actions github-actions bot added stale Used for stale issues / PRs and removed stale Used for stale issues / PRs labels Oct 12, 2023
@zalegrala zalegrala force-pushed the tempoNonRootImage branch 2 times, most recently from fe17af1 to 22a27e7 Compare March 15, 2024 15:07
@zalegrala zalegrala marked this pull request as ready for review March 15, 2024 15:12
@zalegrala
Copy link
Contributor Author

Needs a mention of the breaking change and an entry in the changelog.

@grafana grafana deleted a comment from github-actions bot Mar 15, 2024
@grafana grafana deleted a comment from github-actions bot Mar 15, 2024
@zalegrala zalegrala force-pushed the tempoNonRootImage branch from c8f7675 to d102622 Compare April 2, 2024 19:16
@zalegrala
Copy link
Contributor Author

Thanks for the review folks.

@mdisibio I pushed a commit for your feedback on the jsonnet. Feel free to call out anything else you'd like to see addressed here.

@zalegrala
Copy link
Contributor Author

I caught up with mdisibio earlier and we're good on the changes here.

@zalegrala zalegrala merged commit 0d3bde5 into grafana:main Apr 4, 2024
14 checks passed
@zalegrala zalegrala deleted the tempoNonRootImage branch April 4, 2024 16:43
@zalegrala zalegrala self-assigned this Apr 4, 2024
@gangstead
Copy link

For anyone else whose helm release broke when this was released (chart version 1.9.0).

The correct place to add the securityContext at the container level in the chart is here: https://github.com/grafana/helm-charts/blob/main/charts/tempo/values.yaml#L234

    securityContext:
      fsGroup: 10001

gangstead referenced this pull request in grafana/helm-charts May 31, 2024
@zalegrala
Copy link
Contributor Author

Thanks for the note @gangstead. I checked the tempo-distributed chart before, but not the tempo chart, so apologies for the trouble. I think you will want more than fsGroup on your pods. The change would be something similar to what is in the tempo-distributed chart here: https://github.com/grafana/helm-charts/blob/main/charts/tempo-distributed/values.yaml#L68-L70

@StefanLobbenmeierObjego
Copy link

StefanLobbenmeierObjego commented Jun 4, 2024

In understood this as add this to the tempo values:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000

but actually I have to make sure to use the UID/GID from a previous release? Or use 10001?

level=error ts=2024-06-04T12:02:37.078578894Z caller=poller.go:245 msg="failed to write tenant index" tenant=single-tenant err="open /var/tempo/traces/single-tenant/index.json.gz: permission denied"

@StefanLobbenmeierObjego
Copy link

StefanLobbenmeierObjego commented Jun 4, 2024

Combining both and using 10001 seems to work:

securityContext:
  fsGroup: 10001
  runAsNonRoot: true
  runAsUser: 10001
  runAsGroup: 10001

@gangstead
Copy link

It worked for me with just fsGroup: 10001. I think this PR already sets the user/group with the USER 10001:10001 command in the dockerfile.

I'm not really sure on the interaction between the runAs* in the k8s securityContext and the user command in the dockerfile. They seem to be two ways of setting the user / group but I don't know if there's any subtle distinction.

@zalegrala
Copy link
Contributor Author

If only fsGroup is used, I expect that the reads would be functional because of the umask, but writes and deletes would not. A one-time chown of the data is also an approach to drop the security context handling entirely. A successul read of the local data is enough to let the ingester start.

Is the ingester writing data locally successfully after using your securityContext updates? @StefanLobbenmeierObjego @gangstead

@gangstead
Copy link

Yes. Looking in Grafana there are new traces being saved. Maybe I had some data loss and just didn't realize it? We only store traces for 36 hours due to the huge volume of storage it takes. It's too late for me to go back and verify. Losing some traces is fine in our case.

@StefanLobbenmeierObjego
Copy link

Is the ingester writing data locally successfully after using your securityContext updates?

yeah I also had to issues with my tempo configuration

@StefanLobbenmeierObjego
Copy link

Actually, no, something is off. I can see the traces from today without issues, but no traces from past days. I am assuming the writing to the disk fails but it still shows some cached in memory?

@StefanLobbenmeierObjego
Copy link

Nevermind, is seems that was just a hickup. I tried again now and I can see traces from today and traces from up to 14 days ago (which is the limit we configured). So everything is working, sorry for the confusion

@pharaujo
Copy link
Contributor

pharaujo commented Jun 7, 2024

While testing Tempo v2.5.0 with runAsNonRoot in our incubator environment, I noticed vulture was missed: https://github.com/grafana/tempo/blob/v2.5.0/cmd/tempo-vulture/Dockerfile

@zalegrala
Copy link
Contributor Author

Thanks @pharaujo, would you like to PR a change?

@zalegrala
Copy link
Contributor Author

@StefanLobbenmeierObjego Would you like to PR the tempo chart for this update?

pharaujo added a commit to pharaujo/tempo that referenced this pull request Jun 8, 2024
Follow-up to grafana#2265 which missed tempo-vulture

Signed-off-by: Pedro Araujo <[email protected]>
pharaujo added a commit to pharaujo/tempo that referenced this pull request Jun 19, 2024
Follow-up to grafana#2265 which missed tempo-vulture

Signed-off-by: Pedro Araujo <[email protected]>
pharaujo added a commit to pharaujo/tempo that referenced this pull request Jul 26, 2024
Follow-up to grafana#2265 which missed tempo-vulture

Signed-off-by: Pedro Araujo <[email protected]>
zalegrala pushed a commit that referenced this pull request Jul 29, 2024
Follow-up to #2265 which missed tempo-vulture

Signed-off-by: Pedro Araujo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keepalive Label to exempt Issues / PRs from stale workflow
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Allow for images to run as non-root
6 participants