New Process supporting Data Identification #35

bgoesswe · 2019-02-21T09:03:11Z

If a back end persists the input data of jobs and provides a data identifier, it would be nice to have a process that automatically loads the same input data as a previous job. By "data" I mean here the collection identifier together with all filter operations applied to it.
So if a back end provides a data identifier (something like a DOI), the user can call, e.g., "load_persisted_data" with the identifier as input and get the same data as the original job, without having to look into the filter operations or extract them from a shared process graph. The call then replaces "load_collection" and the filter processes in the graph.
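One way to picture the proposal is the minimal Python sketch below. It is an assumption about how such a lookup could behave, not an existing openEO process or back-end implementation: `PERSISTED_INPUTS`, `persist_input`, the collection name, and the filter values are hypothetical, and only the `load_persisted_data` name and the identifier format come from this thread.

```python
# Hypothetical sketch: a back end records the resolved input of a job
# (collection id plus filter arguments) under a data identifier.

PERSISTED_INPUTS = {}


def persist_input(data_id, collection_id, filters):
    """Store the collection id and a copy of the filter arguments under data_id."""
    PERSISTED_INPUTS[data_id] = {"collection": collection_id, "filters": dict(filters)}


def load_persisted_data(data_id):
    """Return a copy of the stored input description for data_id."""
    try:
        entry = PERSISTED_INPUTS[data_id]
    except KeyError:
        raise LookupError(f"unknown data identifier: {data_id}") from None
    return {"collection": entry["collection"], "filters": dict(entry["filters"])}


# Original job: a collection plus filter operations (all values are made up).
persist_input(
    "doi:10.1038/eodc1170",
    "example_collection",
    {"bbox": [16.1, 47.2, 16.6, 48.6], "time": ["2017-01-01", "2017-02-01"]},
)

# A later job needs only the identifier to get the same input description.
same_input = load_persisted_data("doi:10.1038/eodc1170")
```

The point of the sketch is that the second job never has to restate the collection id or the filters; it only passes the identifier.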

m-mohr · 2019-02-21T09:29:27Z

What's the procedure? Is the persisted data a collection or a user job? If it's a collection at the back end, then load_collection with a filter on sci:doi is the easiest solution; this is now possible directly in load_collection. If it's a job result, you can currently only load it by job id, as we haven't yet defined the details of job results. But I'd like to store them with STAC metadata, and then a filter option could also be added to load_results in the future. I think that's a very specialized case that is better solved by a filter; otherwise people would want convenience functions for all their special tasks.

Edit: The filter callback would be simply: eq(property('sci:doi'), '1234/56789')
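To illustrate what that callback would select, here is a minimal Python sketch. It is not the openEO API or any client library: `prop`, `eq`, and the sample metadata records are hypothetical stand-ins that only mirror the shape of the `eq(property('sci:doi'), ...)` expression above.

```python
# Hypothetical stand-ins for the `property` and `eq` processes; not openEO client code.

def prop(name):
    """Return a function that reads one metadata property (None if absent)."""
    return lambda metadata: metadata.get(name)


def eq(getter, expected):
    """Return a predicate that is true when the property equals `expected`."""
    return lambda metadata: getter(metadata) == expected


# Equivalent of: eq(property('sci:doi'), '1234/56789')
doi_filter = eq(prop("sci:doi"), "1234/56789")

# Made-up metadata records, for illustration only.
records = [
    {"id": "record_a", "sci:doi": "1234/56789"},
    {"id": "record_b", "sci:doi": "1234/00000"},
    {"id": "record_c"},  # no DOI assigned
]

matching_ids = [r["id"] for r in records if doi_filter(r)]
```

Records without a `sci:doi` property simply fail the comparison in this sketch, so they are filtered out rather than raising an error.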

bgoesswe · 2019-02-21T09:35:22Z

This was just about a data collection at the back end, so I think a filter by DOI would be perfectly fine in this case, and also intuitive. load_result also sounds nice; it would make it easy to validate reproductions.

bgoesswe · 2019-02-21T09:39:43Z

I implemented something similar (an extension of the get_collection process) in a test instance at the EODC back end. If this goes into the next version, EODC and I can implement it, so that we have a kind of "proof of concept" to show.

m-mohr · 2019-02-21T09:42:25Z

Have you looked into the new load_collection? (There's no get_collection in the next API version.) It allows filtering directly when loading a collection. This is also being discussed in today's telco, so maybe we can discuss it further afterwards. What does the JSON definition of your new process look like?

bgoesswe · 2019-02-21T09:57:25Z

Yes, I looked into it, and I think my current solution at the EODC test instance can basically use it by simply adding the filter expression you suggested before.
My current JSON definition is the same as get_collection from version 0.3.1, but with an additional filter for the data identifier (DOI):

"dataid": {
  "description": "Filter by data identifier.",
  "schema": {
    "type": "string",
    "examples": [
      "doi:10.1038/eodc1170"
    ]
  }
}

m-mohr · 2019-02-21T10:41:39Z

Oh, I just realized that load_collection only works for loading a subset of data from a collection.
The process you'd need to use is find_collection with the same callback as specified above:
get_collection(first(find_collection(callback)))
Maybe we can make that a little easier in the future though.
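The chain `get_collection(first(find_collection(callback)))` can be sketched in Python as follows. This is only an illustration of the intended data flow under stated assumptions: `CATALOG` and its entries are made up, and the three functions are hypothetical in-memory stand-ins named after the processes mentioned above, not their actual definitions.

```python
# Hypothetical in-memory stand-ins; none of this is the openEO API.

CATALOG = {
    "collection_a": {"id": "collection_a", "sci:doi": "1234/56789"},
    "collection_b": {"id": "collection_b", "sci:doi": "1234/00000"},
}


def find_collection(callback):
    """Return the ids of all collections whose metadata satisfies the callback."""
    return [cid for cid, meta in CATALOG.items() if callback(meta)]


def first(items):
    """Return the first element, or None if the list is empty."""
    return items[0] if items else None


def get_collection(collection_id):
    """Look up a collection by id; fail loudly if nothing matched."""
    if collection_id is None:
        raise LookupError("no collection matched the filter")
    return CATALOG[collection_id]


def callback(meta):
    # Same condition as eq(property('sci:doi'), '1234/56789')
    return meta.get("sci:doi") == "1234/56789"


collection = get_collection(first(find_collection(callback)))
```

The explicit `None` check is a design choice of the sketch: a DOI that matches nothing should surface as an error instead of silently loading no data.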

m-mohr · 2019-12-16T15:35:27Z

What's the ToDo here? I think solving #104 will make this usable...

m-mohr added the question Further information is requested label Feb 21, 2019

m-mohr added this to the v0.4 milestone Feb 21, 2019

m-mohr modified the milestones: v0.4, v0.5 Feb 21, 2019

m-mohr closed this as completed Dec 16, 2019
