Template rendering through nomad job failing on windows nodes #20034
Comments
I'm experiencing a similar problem. Template rendering fails on the following template:
I didn't have any issues with this template on 1.7.1, but after upgrading to 1.7.5 I started to get the |
I've experienced the same issue in both |
Hi @meowtini @hardselius and @hardselius; thanks for raising and contributing to this issue. I believe this is caused by the changes introduced within #19888 and therefore I would ask for some additional information to help us to understand the problem which our testing missed.
Thanks. |
@jrasell Please find the info you asked for.
Again, this is seen only on Windows Server nodes and not on Linux nodes (at least in our case) |
Yeah, the security update in 1.6.7 has a significantly different implementation on Windows than on any other operating system. We had to implement AppContainers rather than just chrooting the rendering subprocess. Unfortunately it looks like the client logs you've provided here are at info-level only, so we may be missing some context. Here are the only relevant bits:
{"@level":"info","@message":"(runner) creating new runner (dry: false, once: false)","@module":"agent","@timestamp":"2024-02-28T04:59:42.152055-05:00"}
{"@level":"info","@message":"(runner) creating watcher","@module":"agent","@timestamp":"2024-02-28T04:59:42.153260-05:00"}
{"@level":"info","@message":"(runner) starting","@module":"agent","@timestamp":"2024-02-28T04:59:42.153828-05:00"}
{"@level":"error","@message":"exit status 0xc0000142","@module":"agent","@timestamp":"2024-02-28T04:59:42.204358-05:00"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-02-28T04:59:42.204358-05:00","alloc_id":"de978c15-f7fe-ec5a-c386-a205a79ec2d5","failed":true,"msg":"Template failed: error rendering \"(dynamic)\" =\u003e \"C:\\\\ProgramData\\\\nomad\\\\alloc\\\\de978c15-f7fe-ec5a-c386-a205a79ec2d5\\\\control-plane-logging-task\\\\local\\\\log_config\": template render subprocess failed: exit status 0xc0000142","task":"control-plane-logging-task","type":"Killing"}
According to the MSFT error reference documentation (PDF), the exit code we're getting here is
It shouldn't make a difference, but just to help me eliminate possibilities, are you running Nomad as a Windows Service using the instructions in Register Nomad with Windows, or some other way? |
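For anyone collecting the extra context that info-level logs leave out, here is a minimal sketch of an agent configuration that raises log verbosity. It assumes the standard `log_level` and `log_json` agent options; the values are illustrative and are not taken from the reporter's setup.

```hcl
# Hypothetical excerpt from a Nomad agent configuration file.
# Raises verbosity so the template runner's debug/trace messages are captured.
log_level = "TRACE"

# Emit logs as JSON, matching the format of the log lines quoted in this thread.
log_json = true
```

Trace logging is noisy, so it is probably best enabled only while reproducing the failure and then reverted.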
trace_log_windows_client.json |
Thanks @pavanrangain! Sorry to get back so slowly on this. I just wanted to pop in and say we haven't forgotten you but I've been swamped with a couple of other items. We're going to hand this issue off to @angrycub who helped me work on the security fix that's at the heart of this, and he'll start looking at this once he's wrapped up his current task. Thanks for your patience! |
Any progress on this? We're still stuck on 1.7.3 in order to avoid this issue. |
Hi @DTTerastar! We have a solid idea of the problem, which is a difference in ambient credentials between running Nomad as a Windows service vs otherwise (which is what all our tests did! 🤦). It's taking longer than we'd like to figure out the solution however. We'll update this issue when we have more information. |
I am having the same issue on Ubuntu for many jobs.
|
@mikedvinci90 if your issue isn't on Windows, please open a new issue for that. The isolation mechanism is very different between the two OS. Debugging this is likely possible on Linux without the patch we're working on (slowly!) for Windows. |
Hi @tgross. I'm facing a similar issue.
Nomad version
1.7.7
Operating system and Environment details
Windows 11 Home
Issue
Template rendering works fine when the Nomad binary is run from PowerShell (Administrator mode), but it fails when Nomad runs as a Windows Service. On the Web UI, I saw the error message when I used the Nomad Windows Service. But the template file is actually there.
Reproduction steps
Just use "sc.exe create ..." to create the Windows Service, with "Local System" as the running user.
My server configuration
My Job
The Window Service Properties
|
@thfai2000 for now the only solution is to disable the file sandbox: https://developer.hashicorp.com/nomad/docs/configuration/client#disable_file_sandbox This sounds much worse than it really is, as you're already using |
hi @tgross Thanks for your advice. It works now after I use "disable_file_sandbox = true" in my server configuration file.
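As a rough sketch of the workaround described above: the linked documentation lists `disable_file_sandbox` under the client configuration, and the placement below assumes it sits in the `template` block inside `client`. Check the linked page for your Nomad version before relying on this layout.

```hcl
# Hypothetical excerpt from the Nomad agent configuration on the affected Windows client.
client {
  enabled = true

  template {
    # Workaround from this thread: turn off the template file sandbox.
    # Assumed placement (client -> template); verify against the linked docs.
    disable_file_sandbox = true
  }
}
```

The agent needs to be restarted for a change like this to take effect.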
|
Not the same issue but deeply interrelated: #20585 |
Disabling the file sandbox also worked for us, but we would like to see a proper fix for this. |
Hi @pavanrangain, we just merged 2 changes that will remedy this problem. Nomad 1.8.2 will no longer sandbox template rendering on Windows, and to address the security aspect (which is only relevant for running Docker with Process Isolation as |
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Nomad version
1.6.7
Operating system and Environment details
Windows Server 2019
Issue
Template rendering in a Nomad job fails on Windows nodes since 1.6.7 (the issue is also seen in 1.6.8). The issue was not present in version 1.6.6.
Reproduction steps
Expected Result
Job should get deployed successfully
Actual Result
Job fails with an error like the one below
Template failed: error rendering "(dynamic)" => "<path removed>//log_config": template render subprocess failed: exit status 0xc0000142
NOTE - actual path removed from error
Job file (if appropriate)
Nomad Server logs (if appropriate)
Nothing relevant
Nomad Client logs (if appropriate)
Just shows same error
Template failed: error rendering "(dynamic)" => "<path removed>//log_config": template render subprocess failed: exit status 0xc0000142
Observation:
The issue may be caused by this change that went into 1.6.7. No template rendering issue is seen on Linux nodes; the issue occurs only on Windows nodes.