[question] Change consul SI tokens to be local? #8063
Comments
Good catch @jorgemarey! Considering each SI token is tied to the allocation in which a task is running, and each SI token is created/destroyed only by the Nomad Servers associated with the cluster of the requesting Nomad Client, yeah, I think it makes sense to create them as local tokens. Did your test work out as expected?
Hey @shoenig. Just tested it and everything seems to work fine. The tokens are created locally and I was able to deploy new allocations even when the primary DC was down. I don't know if this has any other implications, but I could make a PR with this change if you think that's ok.
A PR would be great @jorgemarey, thanks!
Ahh, so unfortunately there is a problem with using local tokens. We'll keep this issue and PR open for now, and merge it when the functionality is in place.
Thanks @shoenig, I'll be looking forward to this fix. We're working on federating our Consul DCs, but we don't want Nomad to fail (fail to run new allocations) if the connection to the primary Consul DC fails.
We should revisit this; according to the Consul team, this may just work now, as long as the remote DC's Consul default agent token is privileged enough for
Hi @shoenig, any news on this? We're having some problems, and I think they're due to this. Sometimes the Envoy sidecar fails to start (after a few tries it starts correctly), and I think it's because token creation is done in the primary datacenter, so until the token gets replicated the task fails to start.
Hi, sorry to ping again over here. Any news on this? |
Hey @jorgemarey, sorry this has taken 2 years, but we may be in a reasonable place now to make the switch. The fundamental change comes from hashicorp/consul#7414, which shipped in Consul 1.8 (now beyond EOL). As described in that issue, the implication is that the Consul agent in the remote DC will now require its anonymous ACL token to contain the permissions

```hcl
service_prefix "" { policy = "read" }
node_prefix "" { policy = "read" }
```

but that's not unusual.
Hi @shoenig. That's great. We have several federated DCs distributed all over the world. This change will allow me to sleep better :D (we never had any problem related to this, but the fear of losing contact with the primary and not being able to deploy on the secondary is there) |
Even with the default token set to something with sufficient ACLs (I've verified this), Consul Connect still doesn't work, nor does accessing dc-1 as proxied by a server in dc-2 with a local token; my guess is that the token stripping isn't taking place like it should.
If I log into Consul through dc-2 with the token generated by Nomad that's set
Indeed, reverting this change fixes all my issues.
Hi! I was testing Nomad + Consul Connect in some test clusters. I tried shutting down the primary Consul datacenter and saw that I wasn't able to run any Connect task, because the Consul SI token can't be created due to it being global. Does it make sense for the token to be local / can it be made local, so that if the primary DC fails for some reason, other datacenters can still work properly?
I will try to change the code here and set `Local: true` to test it: https://github.com/hashicorp/nomad/blob/master/nomad/consul.go#L220
Edit: to clarify, everything that's already running keeps running fine, but new allocations can't be deployed.
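For illustration, a minimal sketch of what creating the SI token as a local token looks like through Consul's Go API, assuming Nomad builds the token roughly this way (the description and service name here are made up, not Nomad's actual values):

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// A service identity token for a Connect task. Setting Local: true
	// keeps the token in the requesting datacenter, so creation does not
	// depend on the primary DC being reachable.
	token := &api.ACLToken{
		Description: "example SI token", // made-up description
		ServiceIdentities: []*api.ACLServiceIdentity{
			{ServiceName: "web"}, // made-up Connect service name
		},
		Local: true, // the change proposed in this issue
	}

	created, _, err := client.ACL().TokenCreate(token, nil)
	if err != nil {
		log.Fatal(err)
	}
	log.Println("created local SI token:", created.AccessorID)
}
```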