[CI] Watcher SingleNodeTests failed due to running threads #36782
Pinging @elastic/es-core-features
This one also caused an additional test failure, but I think they're related:
Another occurrence: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+intake/910/console
I've been working on reproducing this failure locally for about two days now by adjusting the timing with sleeps in places that appear to be related to the failure, but I have been unable to make it recur. The shutdown process intuitively seems a bit suspect to me, since it appears that watches can potentially be executed during the different stages of stopping the service if a watch is already queued up, but I have no evidence that it's the direct cause or technically incorrect. I would like to re-enable this test on master if there are no objections, with the caveat that we add the stack trace to the warning message for "failed to execute watch" so that we have more information should it happen again.
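A minimal sketch of what that logging change could look like (the class, method, and variable names here are illustrative, not the actual Watcher source); the key point is passing the caught exception to the logger so Log4j emits the stack trace:

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Hypothetical catch site in the watch execution path; WatchRunner,
// executeWatch, and Watch are illustrative names, not the actual
// Watcher source.
class WatchRunner {
    private static final Logger logger = LogManager.getLogger(WatchRunner.class);

    interface Watch {
        String id();
    }

    void run(Watch watch) {
        try {
            executeWatch(watch);
        } catch (Exception e) {
            // Passing the Throwable as the last argument makes Log4j emit
            // the full stack trace along with the warning message.
            logger.warn("failed to execute watch [" + watch.id() + "]", e);
        }
    }

    private void executeWatch(Watch watch) {
        // actual execution elided
    }
}
```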
@hub-cap What do you think? ^^
+1 to re-enabling with additional information that may help track down the root cause. These can be frustratingly difficult to reproduce on command. If this happens again we can re-mute, but grab the additional info.
Here are some logs, including the DEBUG logs that were exposed after unmuting:
@talevy thanks, will continue to investigate.
I think the error above is actually slightly different from the original, but I believe they all stem from the same thing. It appears to me that Watcher initiates its shutdown and drains the queue, but it's possible that a watch has already made it through the queue on a different thread, so it executes after the shutdown takes place and the watcher indices are already gone. Watcher then tries to recreate the indices, but it cannot because of the ordering of shutdown in Node. I believe the solution is to ensure that no watches are in progress during a watcher shutdown. @jakelandis Does this make sense? Or am I off base here?
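A minimal sketch of that idea, assuming a hypothetical execution service (none of these names come from the actual Watcher code): funnel watch executions through a single executor so shutdown can refuse new work and drain everything in flight before the stores and indices go away.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only, not the actual Watcher implementation.
class WatchExecutionService {
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    void execute(Runnable watch) {
        // After shutdown() below, this throws RejectedExecutionException,
        // so no new watch can start once stopping has begun.
        executor.execute(watch);
    }

    void shutdown() throws InterruptedException {
        executor.shutdown(); // stop accepting new watches
        // Block until every queued and in-flight watch finishes, so nothing
        // tries to write to (or recreate) the watcher indices afterwards.
        if (executor.awaitTermination(30, TimeUnit.SECONDS) == false) {
            executor.shutdownNow(); // interrupt stragglers as a last resort
        }
    }
}
```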
This test has been enabled for a while now and hasn't failed yet.
See https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+periodic/376/console
The log file is pretty clobbered, as two exceptions are written at the same time; this is the first one.
This is the second one, containing the jstack output.
Looks as if there is a problem with closing the bulk processor used in the history store and in the triggered watch store when shutting down.
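For reference, a minimal sketch of how a shutdown path can flush and close such a bulk processor using the public `BulkProcessor` API; the wrapping store class is illustrative, and the actual HistoryStore/TriggeredWatchStore logic may differ:

```java
import java.util.concurrent.TimeUnit;

import org.elasticsearch.action.bulk.BulkProcessor;

// Illustrative store wrapper, not the actual HistoryStore or
// TriggeredWatchStore code.
class IllustrativeStore {
    private final BulkProcessor bulkProcessor;

    IllustrativeStore(BulkProcessor bulkProcessor) {
        this.bulkProcessor = bulkProcessor;
    }

    void stop() {
        try {
            // awaitClose flushes buffered requests and waits for in-flight
            // bulk requests to complete; plain close() flushes but does not
            // wait, which can leave writes racing the node shutdown.
            if (bulkProcessor.awaitClose(10, TimeUnit.SECONDS) == false) {
                // timed out: some bulk requests may still be in flight
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```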