-
-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added option to copy data when combining datasets #82
Added option to copy data when combining datasets #82
Conversation
a bit uglier than initially, but unfortunately because of the limitations of h5py you can't do it like in the first commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, it looks good, and it seems to perform as intended, creating combined datasets that are not broken when the source datasets are deleted.
I think also it would be good to include a test that makes sure the true-copied combined dataset is still intact after deleting the source datasets.
In a future PR, I think the true copy behavior should be made the default for the combine_datasets()
function, and also the default for the combine
command in the cli, though it should still be possible to enable the external link combination behavior in the cli with the optional flag --external-link
I think we should make there be a check to see if combined datasets depend on other datasets, because right now you can break combined datasets with no warning by deleting their dependency datasets using minari delete
.
Curious to hear your thoughts on this.
@balisujohn yup, I will add the test for deletion. I think the default behavior should still be to create a link, so that it does not take up too much disk space. Usually it is still a combination of datasets for the convenience of working with them, and not to move it somewhere or delete old ones. |
LGTM :^) |
@Howuhh Thanks for your thoughts on this. I can understand the argument against switching to true copy as the primary method, so I'll hold off on making that change for the time being. I will make a task to add am optional command line flag to enable true copy to the cli command |
Description
Related to #56
Type of change
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)pytest -v
and no errors are present.pytest -v
has generated that are related to my code to the best of my knowledge.