New Process supporting Data Identification #35
How does this work procedurally? Is the persisted data a collection or a user job? If it's a collection at the back end, then load_collection with a filter on sci:doi is the easiest solution; this is now possible directly in load_collection. If it's a job result, then at the moment you can only load it by job id, as we haven't yet defined the details of job results. But I'd like to store it with STAC metadata, and then you could also add a filter option to load_results in the future. Still, I think that's a very specialized case which is better solved by a filter; otherwise people would always want convenience functions for all their special tasks. Edit: The filter callback would simply be:
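The filter callback itself was truncated in the thread. A minimal sketch of what such a property filter could look like, in the openEO process-graph style (the collection id and the DOI value here are purely illustrative placeholders, and exact field names may differ between API versions):

```json
{
  "process_id": "load_collection",
  "arguments": {
    "id": "EXAMPLE_COLLECTION",
    "properties": {
      "sci:doi": {
        "process_graph": {
          "eq1": {
            "process_id": "eq",
            "arguments": {
              "x": {"from_parameter": "value"},
              "y": "10.1234/example-doi"
            },
            "result": true
          }
        }
      }
    }
  }
}
```

The idea is that the property filter runs a small callback (here an equality check) against the collection's `sci:doi` metadata field while loading.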
This was just about a data collection at the back end, so I think a filter by DOI would be perfectly fine in this case, and I think also intuitive. load_result also sounds nice; it would make it easy to validate reproductions.
I implemented something similar (an extension of the get_collection process) in a test instance at the EODC back end. If this goes into the next version, EODC and I can implement it, so that we have some kind of "proof of concept" to show.
Have you looked into the new load_collection? (There's no get_collection in the next API version.) It allows filtering directly when loading a collection. This is also discussed in today's telco, so maybe we can better discuss it afterwards. What does the JSON definition of your new process look like?
Yes, I looked into it, and I think my current solution at the EODC test instance can basically use it, by simply adding the filter expression as you suggested before. "dataid": {
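The `"dataid"` snippet was cut off in the thread. Judging from the context, it presumably defined a parameter in an openEO-style process definition; a hypothetical sketch (all field values here are illustrative, not the author's actual definition) might be:

```json
"dataid": {
  "description": "Identifier (e.g. a DOI) of the persisted input data of a previous job.",
  "schema": {
    "type": "string"
  }
}
```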
Oh, I just realized that load_collection only works for loading a subset of data from a collection. |
What's the ToDo here? I think solving #104 will make this usable... |
If there exists a back end that persists the input data of jobs and provides a data identifier, it would be nice to have a process that automatically retrieves the same input data as a previous job. Here I define "data" in this context as the collection identifier plus all filter operations applied to it.
So if a back end provides a data identifier (something like a DOI), the user can call
e.g. "load_persisted_data" with the "doi" identifier as input and get the same data as the original job, without having to look into the filter operations or having to extract only the filtering operations from a shared process graph. It then replaces the "load_collection" and the filter processes of the graph.
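A process graph node using the proposed process could look roughly like this (a hypothetical sketch; neither the process name "load_persisted_data" nor its argument names are defined by the API, and the DOI is a placeholder):

```json
{
  "load1": {
    "process_id": "load_persisted_data",
    "arguments": {
      "doi": "10.1234/example-doi"
    },
    "result": true
  }
}
```

This single node would stand in for the original graph's "load_collection" node together with all of its filter processes.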