Fix/improve data filtering #84
Correct.
I don't have a process graph at hand, but it should be something like:
Notes:
Couldn't you just use the callback to create whatever filter you need? I mean, the callback could be more than just a set of parameters, e.g. a logical expression that combines conditions with `and`, `or`, etc.
I like this idea, but the STAC Query API is not finished yet and is likely to change in the next STAC/WFS sprint. So I'd like to wait for the sprint to happen before deciding on this approach.
@m-mohr Thank you for the explanations, appreciate it! We will try to implement something, and will try to keep it as close to specification as possible - at least this way we will be better prepared with concrete suggestions on how to improve it.
We could; our main concern is that one often wants to limit loading to a subset of the data, which is usually achieved by passing additional parameters to the backend. Generic filtering (which supports logical expressions) is more powerful than just specifying keys/values, so it would need to be implemented on the driver side (instead of just passing the appropriate parameters to the backend), because at least Sentinel Hub doesn't support a similar mechanism. This means that all data (or at least all metadata) would need to be loaded by the driver, only to be discarded by the filter, which is not ideal. The way I see it, there are three options:
I might be biased, but I find the 2nd and 3rd options much easier to implement efficiently. :)
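One way to avoid loading all metadata only to discard most of it is to push down whatever the backend can evaluate natively and apply only the remainder in the driver. A minimal Python sketch, assuming STAC-Query-style input and a hypothetical `BACKEND_SUPPORTED` capability table (the property/operator pairs in it are illustrative, not Sentinel Hub's actual capabilities):

```python
from typing import Any

# Hypothetical capability table: which operators the backend can apply
# natively for which property. Illustrative only.
BACKEND_SUPPORTED: dict[str, set[str]] = {
    "sar:pass_direction": {"eq"},
    "eo:cloud_cover": {"lt", "lte"},
}

Query = dict[str, dict[str, Any]]

def split_query(query: Query) -> tuple[Query, Query]:
    """Split a STAC-Query-style filter into (pushdown, residual).

    pushdown: conditions the backend applies while fetching metadata.
    residual: conditions the driver must apply to the returned items.
    """
    pushdown: Query = {}
    residual: Query = {}
    for prop, conditions in query.items():
        supported = BACKEND_SUPPORTED.get(prop, set())
        for op, value in conditions.items():
            target = pushdown if op in supported else residual
            target.setdefault(prop, {})[op] = value
    return pushdown, residual

pushdown, residual = split_query({
    "sar:pass_direction": {"eq": "ascending"},
    "eo:cloud_cover": {"lt": 50},
    "pl:item_type": {"startsWith": "PSScene"},
})
```

Splitting per operator rather than per property keeps as much work as possible on the backend; the driver then only inspects metadata for items that already passed the pushed-down conditions.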
👍 Thank you again for the example, will try to implement something similar.
@sinergise-anze How did you proceed with your implementation? Any lessons learned to share? Unfortunately, the STAC/OGC meeting ended with a less definitive solution than I would have hoped. OGC is trying to (re-)define a CQL-based query language. STAC will stick with an updated version of its query language for now (probably until CQL is ready). I have some concerns that the STAC query language is a bit flawed at the moment (see radiantearth/stac-spec#692) and might limit us. CQL is not ready at all yet. Our approach is also flawed (how do we define which field to work on?), but it is better aligned with our general data model and probably relatively easy to fix. Also, I think it is relatively easy to convert into something "STAC-ish".
Example of two equivalent queries. STAC:

```json
{
  "query": {
    "eo:cloud_cover": {
      "lt": 50
    },
    "provider": {
      "eq": "Planet"
    },
    "published": {
      "gte": "2018-02-12T00:00:00Z",
      "lte": "2018-03-18T12:31:12Z"
    },
    "pl:item_type": {
      "startsWith": "PSScene"
    },
    "product": {
      "in": ["foo", "bar"]
    }
  }
}
```

openEO, based on what I expect to be in 1.0 (might differ slightly):

```json
{
  "all": {
    "process_id": "all",
    "arguments": {
      "expressions": [
        {
          "process_id": "lt",
          "arguments": {
            "x": {"from_metadata": "eo:cloud_cover"},
            "y": 50
          }
        },
        {
          "process_id": "eq",
          "arguments": {
            "x": {"from_metadata": "provider"},
            "y": "Planet"
          }
        },
        {
          "process_id": "between",
          "arguments": {
            "x": {"from_metadata": "published"},
            "min": "2018-02-12T00:00:00Z",
            "max": "2018-03-18T12:31:12Z"
          }
        },
        {
          "process_id": "text_begins",
          "arguments": {
            "data": {"from_metadata": "pl:item_type"},
            "pattern": "PSScene",
            "case_sensitive": false
          }
        },
        {
          "process_id": "any",
          "arguments": {
            "expressions": [
              {
                "process_id": "array_contains",
                "arguments": {
                  "data": {"from_metadata": "product"},
                  "value": "foo"
                }
              },
              {
                "process_id": "array_contains",
                "arguments": {
                  "data": {"from_metadata": "product"},
                  "value": "bar"
                }
              }
            ]
          }
        }
      ]
    },
    "result": true
  }
}
```

Yes, that's more verbose...
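To make the "relatively easy to convert" point concrete, here is a minimal Python sketch that turns the body of a STAC `query` object into an openEO-style `all` expression like the draft above. It mirrors the draft mapping from that example (including `case_sensitive: false` for `startsWith` and `array_contains` for `in`), handles only the operators that appear in it, and is not a finalized openEO conversion.

```python
from typing import Any

COMPARISON_OPS = {"eq", "lt", "lte", "gt", "gte"}

def _meta(prop: str) -> dict[str, str]:
    # Draft "from_metadata" reference notation from the example above.
    return {"from_metadata": prop}

def stac_to_openeo(query: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Convert the body of a STAC "query" object into an openEO-style `all` node."""
    expressions: list[dict[str, Any]] = []
    for prop, conditions in query.items():
        conds = dict(conditions)  # copy, so handled operators can be popped
        if "gte" in conds and "lte" in conds:
            # A closed range maps onto a single `between` (max inclusive).
            expressions.append({
                "process_id": "between",
                "arguments": {"x": _meta(prop), "min": conds.pop("gte"), "max": conds.pop("lte")},
            })
        for op, value in conds.items():
            if op in COMPARISON_OPS:
                expressions.append({"process_id": op, "arguments": {"x": _meta(prop), "y": value}})
            elif op == "startsWith":
                expressions.append({
                    "process_id": "text_begins",
                    "arguments": {"data": _meta(prop), "pattern": value, "case_sensitive": False},
                })
            elif op == "in":
                expressions.append({
                    "process_id": "any",
                    "arguments": {"expressions": [
                        {"process_id": "array_contains", "arguments": {"data": _meta(prop), "value": v}}
                        for v in value
                    ]},
                })
            else:
                raise ValueError(f"unsupported STAC operator: {op!r}")
    return {"all": {"process_id": "all", "arguments": {"expressions": expressions}, "result": True}}

stac = {"query": {
    "eo:cloud_cover": {"lt": 50},
    "provider": {"eq": "Planet"},
    "published": {"gte": "2018-02-12T00:00:00Z", "lte": "2018-03-18T12:31:12Z"},
    "pl:item_type": {"startsWith": "PSScene"},
    "product": {"in": ["foo", "bar"]},
}}
result = stac_to_openeo(stac["query"])
```

The reverse direction is harder in general, since an openEO expression can nest `any`/`all` arbitrarily while a STAC query is an implicit AND over flat per-property conditions.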
@m-mohr We kept it simple. Since the only thing we needed was to pass some options to the backend, we implemented it like this:
Note that this approach proved sufficient for the use cases we have tried, and since it translates more or less 1:1 to what our service supports, I think it should suffice. That said, a more powerful mechanism is of course better, as long as it is not too difficult to implement.
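A 1:1 pass-through of this kind could look roughly like the following minimal Python sketch. The name mapping (`sar:pass_direction` to `orbitDirection`) and the choice to encode options as URL query parameters are assumptions for illustration, not the actual Sentinel Hub driver code. `null` is treated as "any", so it produces no parameter at all.

```python
from typing import Any, Optional
from urllib.parse import urlencode

# Hypothetical mapping from openEO/STAC property names to backend query parameters.
PARAM_NAMES = {"sar:pass_direction": "orbitDirection"}

def to_backend_params(properties: dict[str, Optional[Any]]) -> str:
    """Translate flat key/value properties into a backend query string."""
    params: dict[str, Any] = {}
    for name, value in properties.items():
        if value is None:
            continue  # null means "any": no restriction to send
        if name not in PARAM_NAMES:
            raise ValueError(f"backend cannot filter on {name!r}")
        params[PARAM_NAMES[name]] = value
    return urlencode(params)

print(to_backend_params({"sar:pass_direction": "ascending"}))  # orbitDirection=ascending
```

Rejecting unknown properties (rather than silently ignoring them) avoids returning more data than the user asked for.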
I finished work on this, see PR #128.
We are trying to implement support for Sentinel-1 GRD data in the `load_collection` process. The data in this collection has a property `orbitDirection` (`sar:pass_direction`), with possible values `"ascending"`, `"descending"` or `null` (any). Our backend supports filtering by this property.

In the `load_collection` process, I assume `properties` is meant for filtering the data that is to be loaded. However, it is unclear to me what the process graph would look like. Are there any examples available, or could someone please create a short example for this?

The issue with `properties` (as defined, that is: with callbacks) is that it would, imho, be quite difficult to convert the `properties` filters to query parameters for our backend, which means that `load_collection` would need to fetch all metadata, perform filtering on its end (for example, keep just the items with `orbitDirection` == `ascending`), and then request data only for the remaining items. Not impossible, just more difficult to implement and somewhat less efficient. Ideally we would want to pass the appropriate parameters when fetching metadata, so that the backend would already take care of filtering.

That said, since `properties` is not finalized yet, one alternative suggestion would be to use `filters` instead of `properties`, and use the notation of the STAC Query API. For example:

This would of course raise the question of how to describe which kinds of queries the backend supports (`startsWith` might be unsupported); alternatively, `load_collection` could simulate filtering if the backend lacks such support.

I would appreciate some thoughts on this.
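For reference, a `properties` filter on `sar:pass_direction` could be built as below, assuming the conventions that openEO API 1.0 adopted for `load_collection` (one condition process graph per property, with the property's value passed in as the `value` parameter). The collection id `SENTINEL1_GRD` and the null extents are placeholders, and `extract_eq_value` is a hypothetical helper showing how a driver could recognise the simple `eq` case and turn it into a backend query parameter instead of filtering metadata itself.

```python
import json
from typing import Any, Optional

def pass_direction_condition(direction: str) -> dict[str, Any]:
    """Condition process graph for one property (openEO 1.0 style, assumed)."""
    return {
        "process_graph": {
            "pd": {
                "process_id": "eq",
                # "value" is the parameter holding the item's property value.
                "arguments": {"x": {"from_parameter": "value"}, "y": direction},
                "result": True,
            }
        }
    }

def load_collection_node(collection_id: str, direction: str) -> dict[str, Any]:
    return {
        "process_id": "load_collection",
        "arguments": {
            "id": collection_id,
            "spatial_extent": None,   # placeholder: no spatial restriction
            "temporal_extent": None,  # placeholder: no temporal restriction
            "properties": {"sar:pass_direction": pass_direction_condition(direction)},
        },
        "result": True,
    }

def extract_eq_value(condition: dict[str, Any]) -> Optional[Any]:
    """Return the literal if the condition is a single `value == literal` node.

    Anything more complex returns None, meaning the driver would have to
    evaluate the condition itself after fetching metadata.
    """
    nodes = list(condition.get("process_graph", {}).values())
    if len(nodes) != 1 or nodes[0].get("process_id") != "eq":
        return None
    args = nodes[0].get("arguments", {})
    if args.get("x") == {"from_parameter": "value"} and not isinstance(args.get("y"), dict):
        return args.get("y")
    return None

graph = {"load": load_collection_node("SENTINEL1_GRD", "ascending")}
print(json.dumps(graph, indent=2))
cond = graph["load"]["arguments"]["properties"]["sar:pass_direction"]
print(extract_eq_value(cond))  # ascending
```

Recognising only this narrow pattern keeps the common case (a single equality on a supported property) as cheap as a query parameter, while still allowing arbitrary callbacks to fall back to driver-side filtering.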