New Process supporting Data Identification #35

Closed
bgoesswe opened this issue Feb 21, 2019 · 7 comments

@bgoesswe
Member

If a back end persists the input data of jobs and provides a data identifier, it would be nice to have a process that automatically retrieves the same input data as a previous job. By "data" I mean the collection identifier plus all filter operations applied to it.
So if a back end provides a data identifier (something like a DOI), the user can call e.g. "load_persisted_data" with the "doi" identifier as input and get the same data as the original job, without having to look into the filter operations or extract only the filtering operations from a shared process graph. This process then replaces the "load_collection" and filter processes of the graph.
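
A minimal sketch of the idea, assuming a back end keeps a registry that maps each data identifier to the collection and filters used by the original job. The registry, the `PersistedInput` class, `persist_input` and the function bodies are all hypothetical; only the name `load_persisted_data` comes from the proposal above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PersistedInput:
    """Input data of a job: the collection id plus every filter applied to it."""
    collection_id: str
    filters: tuple = field(default_factory=tuple)


# Hypothetical in-memory registry; a real back end would persist this with the job.
_REGISTRY = {}


def persist_input(data_id, collection_id, filters):
    """Record which collection and filters a job used, keyed by its data identifier."""
    _REGISTRY[data_id] = PersistedInput(collection_id, tuple(filters))


def load_persisted_data(data_id):
    """Return the stored collection id and filters for a data identifier."""
    try:
        return _REGISTRY[data_id]
    except KeyError:
        raise KeyError(f"unknown data identifier: {data_id!r}") from None
```

With such a lookup, a shared process graph would only need the identifier, and the back end would resolve it to the original collection and filters.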

@m-mohr m-mohr added the question Further information is requested label Feb 21, 2019
@m-mohr
Member

m-mohr commented Feb 21, 2019

What's the procedure? Is the persisted data a collection or a user job? If it's a collection at the back end, then load_collection with a filter on sci:doi is the easiest solution; this is now possible directly in load_collection. If it's a job result, you can currently only load it by job ID, as we haven't yet defined the details of job results. I'd like to store results with STAC metadata, though, and then a filter option could also be added to load_results in the future. Still, I think this is a very specialized case that is better solved by a filter; otherwise people would always want convenience functions for all their special tasks.

Edit: The filter callback would simply be: eq(property('sci:doi'), '1234/56789')
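
To illustrate what that callback expresses, here is a small Python sketch that evaluates the same predicate against plain metadata dictionaries. The helpers `prop` and `eq` and the sample records are hypothetical stand-ins, not the openEO processes themselves.

```python
def prop(name):
    """Return a getter that reads a metadata property, or None if it is absent."""
    return lambda metadata: metadata.get(name)


def eq(getter, expected):
    """Return a predicate that is true when the property equals `expected`."""
    return lambda metadata: getter(metadata) == expected


# Same shape as eq(property('sci:doi'), '1234/56789') from the comment above.
has_doi = eq(prop("sci:doi"), "1234/56789")

# Hypothetical collection metadata records.
collections = [
    {"id": "a", "sci:doi": "1234/56789"},
    {"id": "b", "sci:doi": "1234/00000"},
    {"id": "c"},  # no DOI assigned
]

matching = [c["id"] for c in collections if has_doi(c)]
```

Records without a `sci:doi` property simply fail the comparison, so they are filtered out rather than raising an error.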

@bgoesswe
Member Author

bgoesswe commented Feb 21, 2019

This was just about a data collection at the back end, so I think a filter by DOI would be perfectly fine in this case, and also intuitive. load_result also sounds nice; it would make it easy to validate reproductions.

@bgoesswe
Member Author

I implemented something similar (an extension of the get_collection process) in a test instance at the EODC back end. If this goes into the next version, EODC and I can implement it, so that we have some kind of "proof of concept" to show.

@m-mohr
Member

m-mohr commented Feb 21, 2019

Have you looked into the new load_collection? (There's no get_collection in the next API version.) It allows filtering directly when loading a collection. This is also being discussed in today's telco, so maybe we can discuss it better afterwards. What does the JSON definition of your new process look like?

@bgoesswe
Member Author

bgoesswe commented Feb 21, 2019

Yes, I looked into it, and I think my current solution at the EODC test instance can basically use it by simply adding the filter expression you suggested before.
My current JSON definition is the same as get_collection from version 0.3.1, but with an additional filter for the data identifier (DOI):

"dataid": {
  "description": "Filter by data identifier.",
  "schema": {
    "type": "string",
    "examples": [
      "doi:10.1038/eodc1170"
    ]
  }
}

@m-mohr m-mohr added this to the v0.4 milestone Feb 21, 2019
@m-mohr
Member

m-mohr commented Feb 21, 2019

Oh, I just realized that load_collection only works for loading a subset of data from a collection.
The process you'd need to use is find_collection with the same callback as specified above:
get_collection(first(find_collection(callback)))
Maybe we can make that a little easier in the future though.
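
The composition above could be read as follows. This is only a sketch, assuming collections are plain dictionaries in an in-memory catalog; the Python functions mimic the process names from the comment, and the catalog contents and the `by_doi` helper are hypothetical.

```python
# Hypothetical catalog of collection metadata, keyed by collection id.
CATALOG = {
    "col_a": {"id": "col_a", "sci:doi": "1234/56789"},
    "col_b": {"id": "col_b", "sci:doi": "1234/00000"},
}


def find_collection(callback):
    """Return the ids of all collections whose metadata satisfies `callback`."""
    return [cid for cid, meta in CATALOG.items() if callback(meta)]


def first(items):
    """Return the first element, failing loudly when nothing matched."""
    if not items:
        raise LookupError("no collection matches the filter")
    return items[0]


def get_collection(collection_id):
    """Look up a collection by its id."""
    return CATALOG[collection_id]


def by_doi(doi):
    """Build the filter callback: true when the collection's sci:doi equals `doi`."""
    return lambda meta: meta.get("sci:doi") == doi


result = get_collection(first(find_collection(by_doi("1234/56789"))))
```

The explicit `first` step is where an empty search result surfaces, which is one reason a single combined process might be more convenient.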

@m-mohr m-mohr modified the milestones: v0.4, v0.5 Feb 21, 2019
@m-mohr
Member

m-mohr commented Dec 16, 2019

What's the ToDo here? I think solving #104 will make this usable...

@m-mohr m-mohr closed this as completed Dec 16, 2019