Output of "nomad run" seems wrong for system job with constraints. #2381
Comments
+1 to this. The non-zero exit status is the real issue for us.
I'm running into this problem on our Nomad clusters at Density with Nomad 0.7. Our CI/CD pipeline attempts to plan and run jobs via the Nomad API and reports failure with system jobs. The Nomad CLI's exit code 2 appears to reflect the failed allocations coming back from the API:

Lines 184 to 186 in 6a783e9
Lines 317 to 321 in 6a783e9

I'd be happy to contribute a fix for this, but it's not totally clear what the correct behavior should be. Should there simply be more exit codes to reflect different kinds of warnings?
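The referenced lines aren't reproduced above, so here is only a conceptual sketch of the mapping being described; the function name and signature are hypothetical, not Nomad's actual code:

```go
package main

// exitCodeForEval is a hypothetical illustration of the behaviour described
// above: if the monitored evaluation reports any failed task-group
// allocations, the CLI returns 2, which CI/CD callers read as a failure --
// even when those "failures" are only nodes filtered out by constraints.
func exitCodeForEval(evalStatus string, failedTGAllocs int) int {
	if evalStatus != "complete" || failedTGAllocs > 0 {
		return 2
	}
	return 0
}
```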
@dadgar any updates regarding this issue? Encountering the same issue with Nomad 0.7.1.
We are experiencing this as well.
a "quick" work-around is to submit it over the HTTP API rather than CLI and inspect the evaluation your self i would expect any placement due to lack of resources for a system job to fail like it does today though |
I ran into the same issue today as well, and it looks like this is more than an exit-code issue: the scheduler reports failed allocations over the HTTP API as well (so you get the same behaviour submitting over HTTP). The allocations do get scheduled properly, but the filtered nodes are reported as failed allocations.
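Given that behaviour, one way to tell "node filtered by a constraint" apart from a genuine placement failure is to look at the metrics attached to each FailedTGAllocs entry. A sketch, assuming metric field names such as NodesEvaluated, NodesFiltered, NodesExhausted and ConstraintFiltered as they appear in the evaluation JSON; treat the exact names and semantics as assumptions for your Nomad version:

```go
package main

// AllocMetric mirrors the subset of the evaluation's per-task-group
// placement metrics that this check needs (field names assumed).
type AllocMetric struct {
	NodesEvaluated     int
	NodesFiltered      int
	NodesExhausted     int
	ConstraintFiltered map[string]int
}

// onlyConstraintFiltered reports whether every "failed" task group was
// merely filtered by constraints rather than failing for lack of resources.
func onlyConstraintFiltered(failed map[string]AllocMetric) bool {
	for _, m := range failed {
		// Nodes removed by constraints show up in NodesFiltered /
		// ConstraintFiltered; exhausted resources show up in NodesExhausted.
		if m.NodesExhausted > 0 || m.NodesFiltered == 0 {
			return false
		}
	}
	return true
}
```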
Same thing here... Running Nomad 0.7.1, whenever I use constraints with a system job in the same workflow as described in this issue, I get placement errors even though the allocations succeed. It's as if Nomad treats a constrained node as a failed placement on system jobs when it actually isn't!
For the record, this error still appears on 0.8.1. Example code: https://pastebin.com/raw/f7yH5Q4U
Follow-up: after doing a fresh install of the Nomad server and running the same job above, no errors appear in the UI. Errors still persist when running the job via the CLI.
I've just run into this as well. I launch my jobs from Ansible, and now I have to tell Ansible that exit code 2 is OK, which is... suboptimal.
Same with 0.8.4, but with exit code 1...
Come on guys, this is really a bug and should be dealt with. Many, if not most, people who run a service are going to constrain it to a subset of nodes. Having it throw an error for such a common use case isn't good. Here's my workaround/backflip to take care of this in Ansible; at least it'll let some errors get trapped.
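The Ansible snippet itself isn't preserved above; as a rough sketch of the same idea in Go (treat exit code 2 from nomad run as success until the bug is fixed), with the binary name, job file path and accepted codes all assumptions:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Hypothetical job file path; adjust for your environment.
	cmd := exec.Command("nomad", "run", "example.nomad")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr

	err := cmd.Run()
	if exitErr, ok := err.(*exec.ExitError); ok {
		// Exit code 2 here means "scheduling warnings" (e.g. nodes filtered
		// by constraints); treat it as success for the pipeline's purposes.
		if exitErr.ExitCode() == 2 {
			os.Exit(0)
		}
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "nomad run failed:", err)
		os.Exit(1)
	}
}
```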
Any news on this?
We are getting hit by this. Hope you fix this fast.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Reference: https://groups.google.com/forum/#!topic/nomad-tool/t3bFTwSVgdQ
Nomad version
Nomad v0.5.4
Issue
Quoted from the mailing list: