Inplace save & update dataset #102
@zhiltsov-max, is there any difficulty in resolving this: "The order of elements in a Dataset is maintained, but is not guaranteed to be the same after saving and loading"? I'm not sure it is critical, but I prefer deterministic behavior if it is easy to achieve.
@nmanovic, if a format represents a dataset with several subset files, it is impossible to reproduce the initial item ordering. Example:
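A minimal sketch of the effect, using hypothetical toy `save`/`load` helpers rather than any real Datumaro format (the item ids, subset names, and function names are assumptions made for illustration). Items from different subsets are interleaved in memory, but a format that writes one file per subset can only keep the order within each file:

```python
from collections import defaultdict

# Hypothetical in-memory dataset: (item id, subset name) pairs in insertion order.
items = [("a", "train"), ("b", "val"), ("c", "train"), ("d", "val")]


def save(items):
    """Toy 'save': one list (standing in for a file) per subset."""
    files = defaultdict(list)
    for item_id, subset in items:
        files[subset].append(item_id)
    return dict(files)


def load(files):
    """Toy 'load': read the subset files one after another."""
    return [(item_id, subset) for subset, ids in files.items() for item_id in ids]


restored = load(save(items))
print(restored)
```

In this sketch the order inside each subset survives the round trip, while the interleaving between subsets does not, because nothing in the per-subset files records it.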
@zhiltsov-max, should we update the documentation? Are you planning to add some short tutorials for the new use cases?
@nmanovic, I'd prefer to update the documentation after the new API for operations is introduced; otherwise the changes are hard to follow. Small catchy examples were added earlier and they still work, but now they also perform well thanks to the added transparent caching. Thorough documentation will be added with r0.2 (VCS) / r0.3 (stable API), together with the introduction of the stable API.
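To illustrate what lazy operations with transparent caching could look like, here is a rough sketch. `LazyDataset` and its methods are hypothetical names invented for this example and are not Datumaro's API; the sketch only shows the general idea that transforms are recorded first and evaluated once, on first iteration:

```python
class LazyDataset:
    """Toy dataset: transforms are queued and applied on first iteration."""

    def __init__(self, source):
        self._source = source      # callable that returns an iterable of items
        self._transforms = []      # queued per-item functions
        self._cache = None         # materialized items, filled on demand

    def transform(self, fn):
        # Only record the operation; drop any stale cache.
        self._transforms.append(fn)
        self._cache = None
        return self

    def __iter__(self):
        if self._cache is None:
            items = self._source()
            for fn in self._transforms:
                items = map(fn, items)
            self._cache = list(items)   # evaluated once, reused afterwards
        return iter(self._cache)


load_calls = []


def source():
    load_calls.append(1)   # count how often the source is actually read
    return [1, 2, 3]


ds = LazyDataset(source).transform(lambda x: x * 2)
calls_before_iteration = len(load_calls)
first = list(ds)
second = list(ds)
print(first, len(load_calls))
```

The source is not read when the transform is added, and it is read only once even though the dataset is iterated twice.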
Summary
- `Dataset` operations are finally made lazy
- The order of elements in a `Dataset` is maintained, but is not guaranteed to be the same after saving and loading
- `Datumaro` format
- `Dataset` interface with cache control, changed data info, source path and format info
- `Dataset.get()` returns `None` instead of raising an exception when the item doesn't exist
- `in` operator for `Dataset`
- `get` operation for `Extractor`
- `Dataset` class
- `Converter` interface is extended by optional operation to support partial data update (`patch()`). The default implementation uses the regular full-dataset saving.
- `Exception`
- `Dataset` can track updates and generate patches. Transform is considered updating the whole dataset
- `Dataset.get_subset` provides modifiable slices

TBD:
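The behaviours listed in the summary can be sketched with toy classes. Everything below (`ToyDataset`, `ToyConverter`, `IncrementalConverter`, the `put`/`remove` methods, and the status strings) is hypothetical and written for illustration only; it is not Datumaro's implementation, and the real signatures may differ:

```python
class ToyDataset:
    """Toy dataset that tracks which items changed since creation."""

    def __init__(self, items=None):
        self._items = dict(items or {})   # dict keeps insertion order
        self._updated = {}                # item id -> "added" | "modified" | "removed"

    def get(self, item_id):
        # Returns None instead of raising when the item is missing.
        return self._items.get(item_id)

    def __contains__(self, item_id):
        # Enables `item_id in dataset`.
        return item_id in self._items

    def items(self):
        return self._items.items()

    def put(self, item_id, value):
        if item_id not in self._items or self._updated.get(item_id) == "added":
            status = "added"
        else:
            status = "modified"
        self._items[item_id] = value
        self._updated[item_id] = status

    def remove(self, item_id):
        if item_id in self._items:
            del self._items[item_id]
            self._updated[item_id] = "removed"

    def patch(self):
        # Snapshot of the tracked changes.
        return dict(self._updated)


class ToyConverter:
    """Toy converter: `patch` defaults to a regular full save."""

    def __init__(self):
        self.saved = {}

    def save(self, dataset):
        self.saved = dict(dataset.items())

    def patch(self, dataset, patch):
        self.save(dataset)


class IncrementalConverter(ToyConverter):
    """Toy converter that applies only the changed items."""

    def patch(self, dataset, patch):
        for item_id, status in patch.items():
            if status == "removed":
                self.saved.pop(item_id, None)
            else:
                self.saved[item_id] = dataset.get(item_id)


ds = ToyDataset({"a": 1, "b": 2})
conv = IncrementalConverter()
conv.save(ds)

ds.put("c", 3)
ds.remove("a")
conv.patch(ds, ds.patch())
print(conv.saved, ds.get("a"), "c" in ds)
```

A converter that does not override `patch` in this sketch still produces a correct result, just by rewriting everything, which mirrors the fallback described in the summary.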
How to test
Checklist
- `develop` branch
License
Feel free to contact the maintainers if that's a concern.