Fix reliability issues in the parallelize tool #1632
* output_collecting_pipe's destructor could hang on exit because it did not clear the running flag before cancelling. The threadpool callback would then start another IO operation which never completes.
* Use a manual reset event for overlapped IO to prevent potential memory corruption. AFAICT waiting on threadpool IO with named pipe handles is unreliable and may result in the kernel writing to a free'd OVERLAPPED struct.
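The ordering problem in the first bullet can be modelled without any Win32 calls. The sketch below is not the code from this PR: `pipe_model`, `on_io_complete`, `cancel_io`, `stop_buggy`, `stop_fixed`, and `leaves_read_pending` are hypothetical names, and the model assumes that cancelling an outstanding read runs the completion callback once. Under those assumptions it shows why clearing the running flag has to come before the cancel.

```cpp
// Portable model only: no Win32 calls. All names here are hypothetical.
struct pipe_model {
    bool running = true;   // stands in for the pipe's "running" flag
    int pending_reads = 1; // one read is outstanding while the pipe is live

    // Models the thread pool callback that fires when a read completes,
    // including a read that completes because it was cancelled.
    void on_io_complete() {
        --pending_reads;
        if (running) {
            ++pending_reads; // still running, so start the next read
        }
    }

    // Models cancellation: the outstanding read completes as aborted.
    void cancel_io() {
        if (pending_reads > 0) {
            on_io_complete();
        }
    }

    // Old order: the callback still sees running == true and re-arms.
    void stop_buggy() {
        cancel_io();
        running = false;
    }

    // New order: the flag is cleared first, so the callback does not re-arm.
    void stop_fixed() {
        running = false;
        cancel_io();
    }
};

// Returns true if a read is still outstanding after shutdown, which is the
// state in which a destructor that waits for outstanding IO would hang.
bool leaves_read_pending(bool use_fixed_order) {
    pipe_model pipe;
    if (use_fixed_order) {
        pipe.stop_fixed();
    } else {
        pipe.stop_buggy();
    }
    return pipe.pending_reads != 0;
}
```

In this model `leaves_read_pending(false)` reports a leftover read and `leaves_read_pending(true)` does not. The real tool is multithreaded, so the actual race is timing dependent; the model only captures the ordering argument.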
Thanks! I'm far from a Windows API expert but I compared this to the documentation (e.g. the
FYI @BillyONeal. Feel free to review if you like, mostly I want you to know we've found a race condition in parallelize that you may want to pick up if you've reused the tool elsewhere.
Looks good to me after clarifications to STL's questions.
The thread pool is responsible for performing that wait, and then we wait on the threadpool with
You need separate manual reset events if you issue multiple outstanding IOs on the same file handle. If an event is not supplied, the "secret event" in the file handle itself is used as the synchronization entity (https://devblogs.microsoft.com/oldnewthing/20190719-00/?p=102722). We never issue concurrent IOs here so the one in the file handle should be OK.
To my knowledge this isn't true. The docs state:
Completed here means that the result is written back to the OVERLAPPED structure (aborted or not). So in the repro case, there aren't any queued callbacks, just a single pending one (from the last call to
This is my understanding as well. But here the named pipe handle cannot be used to wait, it's never signaled (again?). But "waiting" with a loop of non-blocking |
I see, then maybe the right fix is to pass false rather than true there?
Yes, because we passed |
The gh1619-repro branch indeed repros for me, thanks. (I observe It's interesting that I think I'm satisfied with this investigation. We have a repro, we have a change that fixes it, the change doesn't appear to be harmful in any way, and although we can't really explain why it works, the documentation is overall very vague (as Billy noted) so this isn't too surprising. We know of no other solutions, so I think we should merge the fix and revise it later if we learn more. (And I ultimately suspect that this is responsible for occasional infrastructure hangs, so there is a real productivity improvement here.) |
I still believe that it's not fully correct (why it already works with just an event I don't know). Either:
I don't think just using |
I've mirrored this to the MSVC-internal repo for merging. If you can improve the fix with |
Thanks for improving this tool, and congratulations on your first microsoft/STL commit! 🎉 ⚙️ 😸
Fixes #1619