Profile: only initialise the repository during the migration #4900
I haven't yet added a test to prevent the regression that we have now had twice, but this should fix the bug and it improves the code and logic in general. I will think of a test soon; in the meantime the code can already be reviewed.
Codecov Report
@@ Coverage Diff @@
## develop #4900 +/- ##
===========================================
+ Coverage 80.06% 80.07% +0.01%
===========================================
Files 518 518
Lines 36680 36683 +3
===========================================
+ Hits 29366 29371 +5
+ Misses 7314 7312 -2
The new disk object store migration that will ship with `v2.0` requires the repository to be initialised once and only once; initialisation generates the necessary folders and configuration file. This was being done in the method `Profile.get_repository`, guarded by a check in case the repository was already initialised.

The problem was that the container could be initialised too early in the profile setup. As soon as the repository was fetched, it would be initialised, generating, among other things, the UUID. This would then trigger the check that the database contained the same UUID, which would of course fail, since the database was still empty. This could have been fixed by ignoring the check at this point, but the real problem is that the repository should not be initialised at this point at all. The only point at which the repo should be initialised is during the database migration that introduced the disk object store repository. Both existing and new profiles go through this migration, so it and it alone should be responsible for initialising the repository.

This approach did create problems for the unit tests though, as they would sometimes clean the repository. They would do this not by just removing the contents, but by deleting the entire container. The container then had to be recreated, but in normal operation this only happens during the migration (which during tests also happens only once, except perhaps in the migration tests themselves), so an error would be raised that the repository is not initialised. The solution is to reinitialise a new repo as soon as the old one is destroyed. Currently this is done by simply deleting the folder on disk and initialising an entirely new instance. In the future, it would be better if the existing container could be kept and its contents simply dropped, but this would require a feature in the `disk-objectstore` library.
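The control flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in this PR: `ToyContainer`, `migrate` and `reset_repository_for_tests` are hypothetical stand-ins, and the toy container merely writes a UUID to a file rather than using the real `disk-objectstore` API. It shows the three roles: fetching never initialises, the migration is the only place that initialises in normal operation, and the test cleanup deletes the folder and initialises a fresh instance.

```python
import shutil
import uuid
from pathlib import Path


class ToyContainer:
    """Hypothetical stand-in for a disk object store container (not the real API)."""

    def __init__(self, folder: Path) -> None:
        self.folder = Path(folder)

    @property
    def config_file(self) -> Path:
        return self.folder / 'config.txt'

    @property
    def is_initialised(self) -> bool:
        return self.config_file.exists()

    def init_container(self) -> str:
        """Create the folder and config file, generating a fresh UUID."""
        if self.is_initialised:
            raise RuntimeError('container is already initialised')
        self.folder.mkdir(parents=True, exist_ok=True)
        container_id = uuid.uuid4().hex
        self.config_file.write_text(container_id)
        return container_id


def get_repository(folder: Path) -> ToyContainer:
    """Fetch the container without ever initialising it as a side effect."""
    container = ToyContainer(folder)
    if not container.is_initialised:
        raise RuntimeError('repository is not initialised: run the migration first')
    return container


def migrate(folder: Path) -> str:
    """Assumed to be the single place that initialises the container in normal operation."""
    return ToyContainer(folder).init_container()


def reset_repository_for_tests(folder: Path) -> str:
    """Test helper: delete the folder on disk and initialise a brand new container."""
    shutil.rmtree(folder, ignore_errors=True)
    return ToyContainer(folder).init_container()
```

Because the reset helper creates a new container, it also produces a new UUID. Any component that compares the container UUID against a stored value would need to be updated after such a reset; how that is handled here is not shown in this sketch.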
79efc65 to 10dd224
I'll test this now (for a small DB only). Edit: Done. Tested on a DB I had, which represents part of Thibault's old data.
Thanks a lot for the testing, @CasperWA. As I responded in that document, the problem is a new one and unrelated to the issue that this PR fixes. Since you were able to migrate at all, #4897 can be considered fixed by this. We should investigate further how the failure in your test is possible at all. For reference, the migration that changes the
So it should not have been able to insert
I think this can be merged, since it seems to fix the specific issue it aims to address. The other failures mentioned by @CasperWA are edge cases in certain databases, caused by failures of other specific migrations, and should be addressed in a separate issue. @mbercx would you be able to review this PR?
Changes look good, thanks @sphuber! I've also tested the migration for a ~30k node database, and it seems to have gone smoothly. Also tested:
- Querying ✅
- Group operations ✅
- Running computations ✅
- `verdi archive create` - single `StructureData` (to see if "Repository: `verdi archive create` fails due to empty `repository_metadata`" #4892 is truly fixed) ✅
- `verdi archive create` - `Group` with `WorkChainNode`s ✅
Fixes #4897