Default workload identity always uses nomadproject.io audience #25079
Comments
Hi @shoeffner! The
(Emphasis added in bold.) Nomad doesn't mint workload identities for Vault at all unless you have a So therefore:
Change your jobspec to the following (note the empty
job "token-test" {
  datacenters = ["*"]
  type        = "batch"

  task "spawn" {
    driver = "docker"

    config {
      image      = "busybox"
      entrypoint = [""]
      command    = "/bin/sh"
      args       = ["-c", "echo $NOMAD_TOKEN | cut -d. -f 2 | base64 -d; sleep 5"]
    }

    vault {}

    identity {
      env = true
    }

    restart {
      attempts = 0
    }
  }

  reschedule {
    attempts = 0
  }
}
That'll allow Nomad to mint a Vault token for you using the audience you've specified. It'll be exposed to the workload as the (Oh also, your 1.8.2 hosts won't have the
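For checking which `aud` claim a token actually carries, the `cut -d. -f 2 | base64 -d` pipeline used in the jobspec can stumble: JWT segments are base64url-encoded without padding, which a plain `base64 -d` may reject or decode only partially. The following is a minimal sketch of a more tolerant decoder, assuming a POSIX shell and a `base64` that accepts `-d` (as GNU coreutils and busybox do). The helper name `jwt_payload` is hypothetical and not part of Nomad.

```shell
#!/bin/sh
# jwt_payload is a hypothetical helper (not a Nomad command).
# It prints the decoded payload (second dot-separated segment) of a JWT.
jwt_payload() {
  # Take the payload segment and map the URL-safe alphabet back to
  # standard base64: '_' becomes '/', '-' becomes '+'.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that JWT encoding strips.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
```

Inside a task this could be called as `jwt_payload "$NOMAD_TOKEN"`, and the `aud` field read from the printed JSON. Note that the sketch only decodes the claims; it does not verify the token's signature.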
Thanks for your thorough answer, @tgross ! Now my current understanding how it works with
So this clears up quite a bit of confusion, thanks! Thanks again for your answer, it helped me a lot! Feel free to close the issue again, as this then works as intended.

Maybe a bit on the background (as you asked whether we plan to log in manually): As I initially stated, we are migrating away from Vault tokens to workload identities.

We don't plan on manually authenticating with Vault, even though it would be a workaround to indirectly provide namespace-level workload policies (see also hashicorp/terraform-provider-nomad#500) – we decided that it's not worth the effort for the handful of jobs which would benefit from that.

In a web service to schedule jobs from predefined templates (think of nomad-pack in Web UI form with some job progress monitoring), the jobs it schedules (see also #24663) report back via an API when certain events happen. We create the API credentials and pass them via Vault (with a

We offered access to user-specified secrets in Vault so users wouldn't have to share their personal GitLab tokens etc. This is no longer possible with workload identities (and was only possible because we built around Vault's inflexibility in the first place, see hashicorp/vault#16183 – OIDC mostly resolved this, though) because we cannot know in advance all the job names users come up with. One way around this is to create a namespace for each user, but for now we will simply remove that feature.

Apart from those three things – which arguably stray a little from Nomad's ideal use cases (we have user-specified, arbitrary workloads vs. Nomad's preferred well-defined, user-unaware workloads) – so far the workload identities seem to make everything smoother, and our pilot users love how easy it is to get access to CSIs etc. It's just a little rough for the non-default things we do :)
Yeah that's all right. To be precise, Nomad always creates
There is no way to do that currently. Here's some of the rationale for how that works from an internal design document (NMD-174 for fellow Hashi folks):
That is, each identity should have a single use, and the default identity's intended use is accessing the Nomad API. We could make the value configurable, I suppose, but there doesn't seem to be a strong use case for that and it introduces configuration management problems (servers could have different values configured and then not be able to validate WIs signed by the leader).
Oof! Yeah, namespaces really are the smallest tenancy primitive in Nomad in terms of access to Vault. Even without workload identity, I suspect your users could read each other's secrets by grabbing them from each other's jobs via

I'm going to close this issue as resolved. But if you've got more thoughts on this and how we can try to improve your intended workflow, we're definitely open to hearing them!
Thanks again for your reply! I have a couple of thoughts on the aud below.
Yes, which is precisely what we do in "public" namespaces because of that.
Isn't this already the case for other things, e.g., the gossip key? Maybe it's not a big deal as that would cause failures early at startup time, rather than unexpectedly at runtime.
I don't really understand this line of reasoning. While I get the idea that this should prevent abuse (and it is worded with "if ... with audience="foo""), it seems very easy to forget to explicitly specify the audience. If I used

Proof of concept (with a job running
nomad agent -dev -acl-enabled > /dev/null &
NOMAD_PID=$!
sleep 2 # wait for nomad to start up
export NOMAD_TOKEN=$(nomad acl bootstrap -json | jq -r .SecretID)
sleep 1 # wait until keyring is ready
nomad var put nomad/jobs/token-test a=b
ALLOC=$(nomad job run token-test.hcl | grep Allocation | cut -d'"' -f 2)
sleep 5 # wait until the job runs and produces the output
export NOMAD_TOKEN=$(nomad alloc logs $ALLOC)
nomad var get nomad/jobs/token-test
kill $NOMAD_PID # stop the dev agent by PID (pkill would treat the PID as a name pattern)

So another party which gets the token is able to use it (while the job is running) to retrieve a secret for that job, which is ... kind of by design for such tokens?

Wouldn't this [the argument in NMD-174] actually be an argument for configuring an expected audience which is only set automatically if no identity block is specified, but explicitly not set if an identity block is specified, precisely to prevent this kind of unintended credential forwarding? For example, if my Nomad cluster only accepts tokens for the audience "my-nomad-cluster.io", it could silently use that to set up the job, write secrets from variables, etc. if a user does not request access to the token via file or env – the risk of it being distributed by the job is relatively low this way (unless I am missing something).

But for us, this is not a big issue right now; my initial confusion is cleared up. Thanks again :)
Yes! And as you note, without the right gossip key the server won't join the cluster properly. For other configuration values that aren't so fatal, we still have some of them in config files but... honestly we kinda hate that. 😀 Because it makes it unfortunately easy for cluster admins to break things in subtle ways that they don't notice until ex. a leader election. Over time you'll find we're trying to make more cluster configuration something we distribute via Raft instead. (But there are always bits that'll be impossible to do that with.)
I think the argument is that it prevents a token that's being passed intentionally to a third party for non-Nomad API uses from masquerading as the workload itself if the third party leaks it. A better example than your use case might be using WI as an authorizer for AWS IAM (ex. I want my workload to be able to log in to AWS to upload stuff to S3). Here the expected audience is set by the third party and it'll never accept the default WI w/
And yet, this is where that token could still be leaked, as Nomad shifts the responsibility to reject the token to the third party; the simplest way being in logging some error message which accidentally contains "Invalid token: ". Oops. I think even for the intentional case, at least the token which gets exposed to Nomad should not include that audience unless explicitly requested, because you could forget to specify the audience.

So I am currently thinking about two "mostly unrelated" things:

The first, not important thing: making nomadproject.io a configurable option, if only to distinguish, e.g., different Nomad clusters, or even to ensure that different Nomad clusters which have bugs do not accept tokens from others, which is... mostly hypothetical.

The second is to not expose the default audience to the job unless explicitly requested, to prevent intentional or unintentional passing to a third party. If I want AWS to control Nomad for me, fine, then I can put the nomadproject.io audience in there, but I will not accidentally send that token there, even though I intended to send some token there.

Still, thanks for your point of view, much appreciated that you took the time!
That's already the case; unless you have
Nomad version
Nomad v1.8.2
BuildDate 2024-07-26T12:22:15Z
Revision 919bd4e7602ed1c6e26e865186be6a51f5dc33e1
(this is a custom build, hence the date and revision will not match the official release; our custom patches are unrelated, though)
Nomad v1.9.1
BuildDate 2024-10-21T09:00:50Z
Revision d9ec23f
(this is the official release)
Operating system and Environment details
Ubuntu 22 (1.8) and 24 (1.9)
Issue
We are currently transitioning away from the Vault token integration to workload identities.
It seems the default audience (vault > default_identity > aud in the nomad server config) is not taken into account when the workload_identity tokens are generated; instead, the default nomadproject.io is used.
Specifying an audience for a specific identity works as expected.
Reproduction steps
Run a server with a config containing the following vault block:
To test the token, we can run this job:
To see the job-override in action, modify the job by extending the identity and updating the variable name:
Expected Result
Both jobs should have this audience claim at the beginning of the output:
Actual Result
The first job (using the default-identity) has the default audience:
The second job (using the test-identity) has the correct audience:
Job file (if appropriate)
See above.
It could be that we are configuring something wrong, but we don't know what else could have gone wrong other than the server config not being used.
If you need any more information about our setup or if I can assist in reproducing the issue, please let me know!