Change timeout kill signals to SIGQUIT from SIGTERM #45864
I think we have a convention in this repo to do `JULIA_TEST_*` instead of `JULIA_TESTING_*`, so it would be good to be consistent.
fa24405 to 89321d0
89321d0 to 63a435d
Canceling buildkite build, since it's truly being tested here: JuliaCI/julia-buildkite#165
@vtjnash Could you explain why this is a bad idea?
Seemed somewhat awkward for the API, but I didn't really think much about it, so I didn't leave a review. It also may mean we won't generate a backtrace before exiting anymore.
Unrelatedly: currently this also appears to be killing much too fast, so our CI logs are getting incorrectly littered with errors from this, and it is leaving behind some tmp files that now sometimes seem to cause issues for the machines.
63a435d to 7e542aa
@staticfloat Can you rebase and fix the merge conflicts?
7e542aa to cdf0c36
SIGQUIT generally causes a process to coredump, so let's use that as the termination signal throughout our test suite. X-ref: JuliaLang/julia#45864
cdf0c36 to 8400403
8400403 to 1c21afc
After talking it over with Jameson, he recommended that we skip the whole customization thing and just unconditionally use
Why are you hard-coding SIGQUIT, instead of reading it from an environment variable?
😂 Left my comment before I saw your comment.
1c21afc to 535e402
SGTM
Jameson tells me that one complication with this is that we may not be able to use SIGQUIT on Windows.
Use SIGTERM on Windows, and SIGQUIT otherwise?
How does one even trigger a core dump on Windows?
There are a couple library functions for that I believe (notably ReportFault) https://learn.microsoft.com/en-us/windows/win32/dxtecharts/crash-dump-analysis
Ah, that's right... IIRC Keno had some code in I don't think that needs to block this PR though.
So Jameson what do you suggest here? Should we embed
I think we can patch libuv to make this fatal, but currently to define SIGQUIT=SIGTERM also
For extra bonus points, you could even make a dump file when sending SIGQUIT (since there is an API function for doing just that)
Here's my initial stab at this: JuliaLang/libuv#30
This commit provides two pieces; first, it enables building libuv with `SIGQUIT` support on Windows (useful for [0]), and second it enables coredumping when exceptions are hit, after our exception handler has finished printing out a backtrace. This is useful for using WER to create dump files when we segfault, etc... This commit requires libuv PRs [1] and [2] to properly function. [0] #45864 [1] JuliaLang/libuv#30 [2] JuliaLang/libuv#31
Switching this PR to draft until the following three PRs are merged:
Seems like those PRs are done now
We've been having a lot of timeouts on CI recently. Our coredumps might be helpful in tracking these down, but when we send `SIGTERM` or `SIGKILL` we don't get coredumps. This changes timeout signals to generally use `SIGQUIT` instead of `SIGTERM`, when sending the first kill signal.
535e402 to b487ae6
This allows for easier collection of core dumps. X-ref: JuliaLang/julia#45864
Okay, I rebased and moved the
We've been having a lot of timeouts on CI recently. Our coredumps might be helpful in tracking these down, but when we send `SIGTERM` or `SIGKILL` we don't get coredumps. This allows customization of which signal is sent during these timeout messages, to allow for CI to set it to `SIGSEGV` or similar, to force core dumps for debugging purposes.
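
For readers unfamiliar with the pattern, the timeout handling discussed in this thread can be sketched roughly as follows. This is an illustration in Python, not the Julia code from this PR. The environment variable name `EXAMPLE_TIMEOUT_SIGNUM` and both function names are hypothetical, and the fallback to `SIGTERM` where `SIGQUIT` is unavailable mirrors the suggestion made in the conversation rather than what was finally merged.

```python
import os
import signal
import subprocess
import sys


def timeout_signal(env=os.environ):
    """Pick the first signal to send when a child process times out.

    EXAMPLE_TIMEOUT_SIGNUM is a hypothetical variable name used only in
    this sketch; it holds a numeric signal value that overrides the default.
    """
    override = env.get("EXAMPLE_TIMEOUT_SIGNUM")
    if override is not None:
        return signal.Signals(int(override))
    # Python does not define SIGQUIT on Windows, so fall back to SIGTERM there.
    return getattr(signal, "SIGQUIT", signal.SIGTERM)


def run_with_timeout(cmd, timeout, grace=5.0, env=os.environ):
    """Run cmd; on timeout send the chosen signal, then force-kill after grace."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # First attempt: a signal whose default action may produce a core dump.
        proc.send_signal(timeout_signal(env))
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # Last resort: the process ignored or handled the first signal.
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    print("exit status:", run_with_timeout(sleeper, timeout=0.5))
```

On POSIX systems, whether a core file is actually written also depends on settings outside this sketch, such as the core size limit of the process.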