Race condition when installing several dune packages at the same time #2131
Comments
I'm wondering if this issue is caused by overly eager package loading in dune. To check this hypothesis, could you try to reproduce this issue with dune 1.9.2 or later? If this is indeed the cause, you should not observe this failure with those versions. However, even if it is the cause, 1.9.2+ does not actually fix it; it only mitigates it whenever variants aren't used. To truly fix this issue, dune should only ever try to load packages that are dependencies of the package it is currently building. Side note: this seems to be desirable behavior anyway from a sandboxing perspective, since dune should not observe packages that are not dependencies (where "dependencies" includes transitive dependencies). Now let's brainstorm how to fix this issue properly. First, I think we will need to implement the variants index discussed in the meeting yesterday. I'm also wondering if we should introduce the following restriction: a variant implementation is not visible unless it exists in one of the package dependencies of the package being built. Hence, when we specify variants when building an executable, we may also add package names so that dune knows where to look for variant implementations.
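The restriction proposed above can be illustrated with a minimal sketch. It assumes a simple in-memory map from each package to its declared dependencies; the functions `transitive_deps` and `visible_packages` and the example dependency lists are hypothetical and are not taken from dune or opam. The build of a package only gets to see installed packages in the transitive closure of its dependencies, so a package being installed concurrently (here `result`) is never read.

```python
from collections import deque


def transitive_deps(pkg, deps):
    """Return the set of packages reachable from `pkg` via dependency edges.

    `deps` maps each package name to the list of packages it depends on.
    """
    seen = set()
    queue = deque(deps.get(pkg, []))
    while queue:
        p = queue.popleft()
        if p in seen:
            continue
        seen.add(p)
        # Follow p's own dependencies too, giving the transitive closure.
        queue.extend(deps.get(p, []))
    return seen


def visible_packages(pkg, deps, installed):
    """Installed packages that the build of `pkg` would be allowed to read."""
    return transitive_deps(pkg, deps) & set(installed)


# Hypothetical dependency data: "result" is being installed at the same time
# but is not a (transitive) dependency of "cppo_ocamlbuild".
deps = {
    "cppo_ocamlbuild": ["cppo", "ocamlbuild"],
    "cppo": ["dune"],
    "ocamlbuild": [],
    "result": ["dune"],
}
installed = ["cppo", "ocamlbuild", "dune", "result"]
print(sorted(visible_packages("cppo_ocamlbuild", deps, installed)))
# prints ['cppo', 'dune', 'ocamlbuild']
```

In dune itself the dependency information would come from package metadata rather than a literal dict; the sketch only shows the filtering rule.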
Implementing the per-virtual-lib variant index we discussed yesterday should be enough. Should we just completely disable variants until this is implemented?
Is it enough? Opam's installation isn't atomic, so I think we could still have issues where, for example, the index is updated but the package still isn't fully installed (or vice versa).
BTW, @kit-ty-kate, we were thinking of marking 1.9.0 and 1.9.1 as unavailable in opam once 1.9.2 is released.
I'm not sure this is necessary, as we've shipped virtual libraries in an incomplete state before (behind a feature flag as well). So I think it's OK to leave them as 0.1.
@rgrinberg the index is never updated. It is constructed once and for all when building the virtual library and will live in the
I'll write down the concrete proposal we discussed to make sure we are all on the same page about this.
I wrote down the new proposal: #2134
I believe this can now be closed, as we've fixed this in 1.9.3 and made variants experimental.
I have observed this error twice, using dune 1.11.4.
Was it truly fixed? :)
Do you have a reproduction case we could look at?
I don't know how useful these logs will be to you, but you can see three examples here: To find the error, you can search for "ERROR while compiling" on the page. In the second case, the job was retried and succeeded: https://gitlab.com/ligolang/ligo/-/jobs/341466583 The third case just happened (thankfully outside of the special "debian 9" job, so I guess that was a red herring). The thing our build script was doing when the failure happened was:
You might notice in the logs a problem with our custom opam repository:
I wonder if that could somehow cause this. I have just added the missing repo file and will wait to see if it happens again. In all cases, the failure:
I haven't been able to reproduce it locally yet, and I'm not sure what other kinds of information would be relevant. I am happy to try to get more details for you if you have a clue where to look. I might try some more to reproduce it locally, but my current plan is to simply reduce the frequency with which our CI runs this step. That would be helpful for other reasons too.
Just a thought: do the packages for which you are observing issues have dependencies? i.e. do they have If there are optional dependencies that are not listed in the opam file, that would explain the race condition.
I couldn't find any. I haven't seen this happen again yet, since I fixed this problem with our custom opam repository:
Still waiting to see if the issue happens again...
Astounding! Thanks all!
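The optional-dependency explanation suggested in this exchange can be made concrete with a small sketch. It assumes that the package manager only orders builds along declared dependency edges, so two packages with no dependency path between them may be built in parallel. `depends_on`, `may_run_concurrently` and the packages `foo` and `bar` are hypothetical.

```python
def depends_on(a, b, deps):
    """True if package `a` (transitively) depends on package `b`."""
    stack = list(deps.get(a, []))
    seen = set()
    while stack:
        p = stack.pop()
        if p == b:
            return True
        if p not in seen:
            seen.add(p)
            stack.extend(deps.get(p, []))
    return False


def may_run_concurrently(a, b, deps):
    """Under the stated assumption, unordered packages may build in parallel."""
    return not depends_on(a, b, deps) and not depends_on(b, a, deps)


# Hypothetical: "foo" optionally uses "bar" at build time.
undeclared = {"foo": ["dune"], "bar": ["dune"]}
# Listing "bar" (for instance as an optional dependency) adds an ordering edge.
declared = {"foo": ["dune", "bar"], "bar": ["dune"]}

print(may_run_concurrently("foo", "bar", undeclared))  # prints True
print(may_run_concurrently("foo", "bar", declared))    # prints False
```

With the undeclared dependency, nothing stops `bar` from being installed while `foo` is building, which is exactly the window in which `foo`'s build could read a half-written file from `bar`.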
@diml: how about making dune aware of opam locks, i.e. taking a read-lock on <pfx>/.opam-switch/lock if it exists before loading the libs, and a write lock before processing dune install? It could make dune and opam play more nicely together in this kind of scenario.
Adding a write lock when one calls dune install manually would be quite nice, but I'm not so sure about the read lock. Dune does not try to read any libraries unless they're dependencies, and when installation starts, I think it's assumed that all the dependencies are already present.
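The lock proposal above could look roughly like the following minimal sketch, assuming advisory `flock(2)` locks. Opam's own locking mechanism may differ (for example, if it uses POSIX record locks, those do not interact with `flock` locks on Linux), so a real implementation would have to use exactly the kind of lock opam uses. The helper `switch_lock` is hypothetical; only the lock path `<pfx>/.opam-switch/lock` comes from the proposal.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def switch_lock(lock_path, exclusive):
    """Hold an advisory flock(2) lock on `lock_path` while the block runs.

    exclusive=False takes a shared (read) lock, so several builds can load
    libraries at the same time; exclusive=True takes an exclusive (write)
    lock, as `dune install` would under the proposal. If the lock file does
    not exist, the block runs without any lock.
    """
    try:
        fd = os.open(lock_path, os.O_RDONLY)
    except FileNotFoundError:
        fd = None
    if fd is None:
        yield
        return
    try:
        # Blocks until the requested lock is available.
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield
    finally:
        # Closing the descriptor also releases the flock lock.
        os.close(fd)


# Demo with a throwaway file standing in for <pfx>/.opam-switch/lock.
demo_dir = tempfile.mkdtemp()
demo_lock = os.path.join(demo_dir, "lock")
open(demo_lock, "w").close()
with switch_lock(demo_lock, exclusive=False):
    pass  # a build would load installed libraries here
```

Shared locks let several builds load libraries at once, while an installer asking for the exclusive lock would wait until no reader holds it. The read lock is the contentious part of the proposal.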
I don't like the idea of a read lock either. Regarding a write lock, my first thought is about people who do
Why does
Yep. I also feel like it's a workaround for a real issue that should be fixed. Dune should absolutely not be looking at library files from things that are not dependencies of what is being built.
Yeah, but I'm not sure this is possible with our current design. The library database is available for querying when setting up the build rules. So this means that if we have a directory with rules that look like this:
And we're installing The way to avoid the lookup to
Or do the lookup lazily? i.e. instead of using the
I'm not really sure how the lookup can be done lazily. As I understand the situation, we set up rules for a directory all at once. So any way to make things lazier here would require us to skip setting up the rules for
This is what I had in mind: right now, we do the library resolution inside the So in fact, even Regarding dune not setting up rules it doesn't execute: I'm not entirely sure what that entails code-wise. It might be a serious refactoring.
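The difference between resolving libraries eagerly, when rules are set up, and lazily, when a rule actually runs, can be sketched as a toy model. This is plain Python, not dune's rule engine; `LibDB` and `Rule` are hypothetical. Each rule only records the names of the libraries it mentions, and the database is queried when the rule is executed, so a rule that is set up but never run never reads the metadata of a library such as `result`.

```python
class LibDB:
    """Hypothetical library database that records every lookup it serves."""

    def __init__(self, installed):
        # Maps library name -> metadata (stands in for a parsed dune-package).
        self.installed = installed
        self.resolved = []

    def resolve(self, name):
        self.resolved.append(name)
        return self.installed[name]


class Rule:
    """A build rule that stores library names and resolves them only on run."""

    def __init__(self, target, libs, db):
        self.target = target
        self.libs = libs
        self.db = db

    def run(self):
        # Resolution happens here rather than in __init__ (rule setup), so a
        # rule that is set up but never executed never reads its libraries.
        return [self.db.resolve(lib) for lib in self.libs]


db = LibDB({"a": "metadata of a", "result": "metadata of result"})
# Setting up both rules performs no lookups yet.
rules = {
    "a.cmxa": Rule("a.cmxa", ["a"], db),
    "b.cmxa": Rule("b.cmxa", ["result"], db),
}
rules["a.cmxa"].run()  # only the requested target is built
print(db.resolved)  # prints ['a']
```

Getting this behaviour in dune would mean moving library resolution out of rule setup, which, as noted in the thread, may require significant refactoring.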
opam-health-check (check.ocamllabs.io) detected a transient failure. While building one of the packages, the following error happened:
The `result` package is not a dependency of the failing package, so it is built and installed while `cppo_ocamlbuild` is building. I'm guessing what is happening here is that dune is scanning all currently installed packages, and the `dune-package` file gets installed by opam in the meantime (first created, then filled up, I suppose; cc @AltGr @rjbou). With the right timing, dune parses the file in the interval after it is created but before it is filled, and fails.
The behaviour was observed with `dune.1.9.1`.
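On the installer side, the window described above closes if a file is never visible in a partially written state. A common technique, sketched here under the assumption of a POSIX filesystem, is to write the content to a temporary file in the same directory and then rename it over the final path, since `rename` replaces the target atomically. `write_atomically` is a hypothetical helper; this sketch does not claim that opam or dune install files this way.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write `data` to `path` so readers see either the old file or the new one.

    The content is written to a temporary file in the same directory, flushed
    to disk, and then renamed over `path`. On POSIX, os.replace (rename(2)) is
    atomic when both names are on the same filesystem, so no reader can open
    `path` and find it empty or half-written.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong, then re-raise.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Illustrative use in a throwaway directory; the content is a placeholder,
# not a complete dune-package file.
demo_dir = tempfile.mkdtemp()
target = os.path.join(demo_dir, "dune-package")
write_atomically(target, "(lang dune 1.9)\n")
```

The temporary name does not match `dune-package`, so a reader looking for that exact file name only ever sees the old file or the complete new one. This addresses torn reads only; the thread's broader point, that dune should not read packages outside the dependencies of what it is building, still stands.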