core: synth of CustomResourceProvider hangs in Docker on Linux 5.6-5.10 #21379
Comments
Wow, this is horrible. Thanks for reporting. If I'm reading the linked issues correctly, it seems that
The linked thread in nodejs/node#40200 has not seen movement in about a year, and the issue there seems to be punted to a combination of Docker and kernel issues. I will leave this thread open for discussion and tracking, but I'm not too inclined to make any changes currently. I wouldn't even know how or what to do, properly. Detect that we're inside Docker as a special case, figure out which paths are mapped to volumes (if that's even something we can do), and then choose a different copy target? If we're in an environment where we can't trust the filesystem anymore... I mean... 🤷‍♂️ I give up.
Also, this only seems to happen on very specific instances of Docker on a particular Linux kernel, right?
As discussed here, this affects Linux kernels 5.6.x-5.10.x: https://lore.kernel.org/stable/[email protected]/ The workaround is setting
The CDK behavior is as follows:
The change was:
The problem was:
Full props to @nburtsev for figuring this out. I'm not sure I would have been able to put all of this together myself. In summary: the CDK does not communicate directly with the kernel; we just perform filesystem copies. Bugs in the interaction of other pieces of software cause the file copy to loop endlessly when the right combination of circumstances is hit.
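To make the copy mechanism concrete: as far as we can tell, Node's `fs.copyFile`/`fs.copyFileSync` go through libuv, which on Linux may use the `copy_file_range` syscall (the call seen looping in strace in the report below). The following is a minimal sketch, not CDK code, of a copy that sidesteps that syscall by moving bytes through a user-space buffer with ordinary read/write calls. `copyFileWithReadWrite` is a hypothetical name.

```typescript
import * as fs from "fs";

/**
 * Hypothetical helper (not part of the CDK): copy `src` to `dest` through a
 * fixed-size buffer using plain read/write calls, instead of fs.copyFileSync,
 * which on Linux may be implemented with the copy_file_range syscall.
 * Returns the number of bytes copied.
 */
function copyFileWithReadWrite(src: string, dest: string, bufferSize: number = 64 * 1024): number {
  const buf = Buffer.alloc(bufferSize);
  const inFd = fs.openSync(src, "r");
  try {
    const outFd = fs.openSync(dest, "w"); // create or truncate the destination
    try {
      let total = 0;
      for (;;) {
        // position = null: read from the current file offset
        const bytesRead = fs.readSync(inFd, buf, 0, buf.length, null);
        if (bytesRead === 0) break; // end of file
        // writeSync may write fewer bytes than requested, so loop until done
        let written = 0;
        while (written < bytesRead) {
          written += fs.writeSync(outFd, buf, written, bytesRead - written);
        }
        total += bytesRead;
      }
      return total;
    } finally {
      fs.closeSync(outFd);
    }
  } finally {
    fs.closeSync(inFd);
  }
}
```

The trade-off is speed: every byte passes through user space, so for large files this is slower than an in-kernel copy, in exchange for not depending on `copy_file_range` behaving correctly.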
A particular combination of software has a hard-to-recover bug. Add a check and warning for it. Closes #21379.
A particular combination of software has a hard-to-diagnose bug. Add a check and warning for it. Closes #21379. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license.
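The change is described only as "a check and warning"; the snippet below is not that code. It is a minimal sketch, assuming the check keys on the kernel release string returned by Node's `os.release()`, of how one might flag the 5.6.x-5.10.x range mentioned above and print a warning instead of failing. `isAffectedKernel` and `warnIfAffectedKernel` are hypothetical names, and the warning text is illustrative.

```typescript
import * as os from "os";

/**
 * Hypothetical helper: true if a Linux kernel release string (as reported by
 * os.release(), e.g. "5.4.188-104.359.amzn2.x86_64") is in the 5.6.x-5.10.x range.
 */
function isAffectedKernel(release: string): boolean {
  const m = /^(\d+)\.(\d+)/.exec(release);
  if (!m) return false; // unparseable release string: don't guess
  const major = Number(m[1]);
  const minor = Number(m[2]);
  return major === 5 && minor >= 6 && minor <= 10;
}

/**
 * Hypothetical helper: print a warning (not an error) on a possibly affected
 * Linux kernel. Returns true if a warning was printed.
 */
function warnIfAffectedKernel(platform: string = os.platform(), release: string = os.release()): boolean {
  if (platform !== "linux" || !isAffectedKernel(release)) return false;
  console.warn(
    `Warning: Linux kernel ${release} is in the 5.6-5.10 range, where file copies ` +
      `inside Docker have been reported to hang (see #21379). If synth hangs, ` +
      `try setting TMP to a directory inside your working directory.`
  );
  return true;
}
```

A version-number check like this is only a heuristic: the two kernel versions quoted in the bug report below (4.18.0-305.45.1.el8_4 and 5.4.188-104.359.amzn2) both fall outside 5.6-5.10, so a check keyed purely on the upstream version number would not flag either of them.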
Describe the bug
After updating to 2.34, we noticed that some of our deploys and tests hang in pipelines but work fine locally.
I was able to narrow it down to deploying S3 buckets with
autoDeleteObjects: true
and this change: #20953. The symptoms (endless copy_file_range calls in strace) look very similar to nodejs/node#40200, which in turn points to Docker/kernel bugs. A simple workaround is to set the TMP environment variable to a path inside the build working directory (i.e. the git tree):
mkdir tmp && export TMP=$PWD/tmp && cdk deploy
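The same workaround can be applied from inside a Node script. One detail worth knowing: on non-Windows platforms, Node's `os.tmpdir()` checks TMPDIR first, then TMP, then TEMP, so `export TMP=...` only takes effect when TMPDIR is unset. The sketch below assumes the affected copy targets a directory derived from `os.tmpdir()`; `useLocalTmp` is a hypothetical helper. It has to run before anything reads the temp directory, since a library may cache the value on first use.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

/**
 * Hypothetical helper mirroring `mkdir tmp && export TMP=$PWD/tmp`:
 * create <workDir>/tmp and make os.tmpdir() resolve to it.
 * Non-Windows only: on Windows, os.tmpdir() consults TEMP before TMP.
 */
function useLocalTmp(workDir: string = process.cwd()): string {
  const tmp = path.join(workDir, "tmp");
  fs.mkdirSync(tmp, { recursive: true }); // no error if it already exists
  // os.tmpdir() prefers TMPDIR over TMP, so clear it for TMP to take effect.
  delete process.env.TMPDIR;
  process.env.TMP = tmp;
  return os.tmpdir();
}
```

In the shell version, the equivalent precaution is to run `unset TMPDIR` (or set TMPDIR instead of TMP) if the container image defines it.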
Our pipelines run in containers inside an OCP 4.10.13 (k8s 1.23.5) cluster in AWS with no special configuration; node14 and node16 behave the same way. I have limited access to EKS 1.22, and with the same container image the problem does not occur there, which probably makes sense: the kernel versions seem to be different (4.18.0-305.45.1.el8_4.x86_64 vs. 5.4.188-104.359.amzn2.x86_64).
Expected Behavior
The stack is deployed.
Current Behavior
cdk deploy hangs in an endless loop.
Reproduction Steps
Any kind of test that renders this stack will also hang.
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.34.0 (build 633edab)
Framework Version
No response
Node.js Version
14.19.0
OS
Ubuntu 20.04.4
Language
Typescript
Language Version
No response
Other information
No response