Linking with Lwt_unix can cause unexpected EINTR #738
Comments
We should indeed be able to delay this, until either of
Thanks for the quick response!
I will make sure this is resolved by the next release. I understood that this isn't an emergency for you presently, since you have worked around it. Is that right? If so, I'll do it on Lwt's ordinary release schedule :)
No emergency, and absolutely no rush on our end.
Great, and thanks for diagnosing that!
I have a minor concern, that fixing this will make the problem even more surprising and/or difficult to diagnose, since I'm still leaning toward fixing this, but would you mind commenting on the above? For example, while diagnosing, did you quickly find that Lwt is now being linked in, but (reasonably) could have no easy way of knowing that it is Lwt which is causing
That is a fair concern. For what it's worth, I opened this issue to make you aware of something that happened internally, but I'm not convinced that any change to Lwt is actually necessary. In the abstract, I feel that library code should always be careful to handle EINTR, since the programmer cannot know whether the program will use signals, as you noted. Application code can probably be a bit less careful, but in our case being lax about EINTR, combined with multiple programmers working with little coordination, led to issues. I was not the one who connected the unexpected EINTR to a change that only linked in Lwt_unix; I only connected a few dots to show that it was the sigchld handler which caused the change in behavior. I am not certain how the initial connection to Lwt_unix was made. As I understand, your concern is that people might write library code that does not properly handle EINTR, but wouldn't necessarily figure it out during testing, since their test harness might not call
It's more specific. When EINTR first appears for a user that has EINTR-unsafe code somewhere in their program (whether in their project or something it is linked with):
(1) seems better because...
In my mind, the main argument for delayed signal handler installation at this point is that the developers of a project that is suffering from EINTR might not have control over the EINTR-unsafe code that is linked into their project. However, in OCaml and Reason, I think most developers ultimately are in control of all the code, since it is generally open source or they have source access. ...which then suggests not to change the code of Lwt, but maybe to add a note to the docs of Lwt_unix, in hopes it will help people to diagnose this more quickly.
Yeah, I see what you mean. Both ways have problems, so perhaps the best course right now is to just document the behavior. I think the confusing part of the existing behavior is that people assume that module initialization alone will not affect runtime behavior. Is it the case that
That's a good idea. It's not necessary to call I'm still not 100% sure this is better than failing if Lwt is simply linked, but I think I will go with this approach. Could you get a comment from the person who connected this to Lwt_unix, about what would have helped them to diagnose the issue more quickly?
Some change was introduced that caused this behavior. I found the problem by bisecting to that commit. Then I reduced the changes in that commit until I got a repro. Here's my original comment on the bug:
I was only able to bisect it to the commit easily and narrow down the cause because it very reliably triggered I do think it would be more intuitive if we started getting
Excellent, thanks!
To this day, that binary doesn't use Lwt. But there was a time when I was considering running
If we make the change to defer installation, we would also document
In particular, we would mention the need for the whole program to be
If the text
The attached commit defers installation of the I also mentioned
@rvantonder Great :) The release should be next week. Just want to confirm: you have tried your application with Lwt from
Yes, I removed the special casing and compiled against Lwt
The Lwt_unix module's top-level code installs a sigchld handler, which can cause blocking system calls to be interrupted. Without a handler, the default behavior is to ignore the signal, so even if sigchld is received, blocking system calls will not be interrupted.
It's easy to write some code that does not check for EINTR which, while arguably wrong, works well enough because the program does not use signals. This happened to us in the Flow/Hack codebase where a shared module became linked with Lwt_unix and then linked into a binary that uses fork and select, but does not handle EINTR on the select.
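For illustration, here is a minimal sketch of what handling EINTR on a select could look like. The name `select_no_eintr` is made up for this example and does not come from the Flow/Hack codebase or from Lwt; the sketch assumes only the standard `unix` library.

```ocaml
(* Hypothetical helper: retry Unix.select when a signal handler
   (for example a sigchld handler) interrupts it with EINTR.
   Note: each retry restarts the full timeout, so the total wait
   can exceed [timeout] if signals keep arriving. *)
let rec select_no_eintr reads writes excepts timeout =
  try Unix.select reads writes excepts timeout
  with Unix.Unix_error (Unix.EINTR, _, _) ->
    select_no_eintr reads writes excepts timeout
```

A caller would use it in place of `Unix.select`, with the same arguments. Code that needs an accurate deadline would have to recompute the remaining timeout before each retry, which this sketch deliberately leaves out.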
The behavior was difficult to reproduce, since it depends on receiving the sigchld while in the select call. The issue was also difficult to diagnose, since no relevant code had recently changed. Once diagnosed, the issue was trivially resolved by separating the Lwt-using module from the non-Lwt code.
While this is an edge case, and I think it's fair to say that Lwt_unix is doing nothing wrong, it would be nice to defer setting the sigchld handler until later, so that code which is linked with Lwt_unix but does not actually use it does not have its blocking system calls interrupted by sigchld.
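To make the request concrete, here is a rough sketch of deferred installation using a lazy value. This is not Lwt's code and not necessarily how Lwt would do it: `handler_installed`, `install_handler`, and `ensure_sigchld_handler` are hypothetical names, and the empty handler body is a placeholder.

```ocaml
(* Hypothetical sketch of deferring handler installation.
   Nothing here runs at module initialization time. *)
let handler_installed = ref false

(* The handler is installed the first time this lazy value is forced. *)
let install_handler : unit Lazy.t =
  lazy begin
    Sys.set_signal Sys.sigchld (Sys.Signal_handle (fun _ -> ()));
    handler_installed := true
  end

(* Hypothetical entry point: any function that actually needs
   sigchld notifications would call this first. Calling it more
   than once is harmless, because a lazy value is forced only once. *)
let ensure_sigchld_handler () = Lazy.force install_handler
```

With this shape, a program that links the module but never calls `ensure_sigchld_handler` keeps the default sigchld behavior, so its blocking system calls are not interrupted by that signal.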