Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider splitting up Simple CQL into multiple conformance classes #579

Closed
philvarner opened this issue May 12, 2021 · 38 comments · Fixed by #621 or #629
Closed

Consider splitting up Simple CQL into multiple conformance classes #579

philvarner opened this issue May 12, 2021 · 38 comments · Fixed by #621 or #629
Assignees
Labels

Comments

@philvarner
Copy link
Contributor

In the context of STAC Items, the biggest piece missing from the Part 1 "items" endpoint is a formalized mechanism for filtering based on fields in the Feature/Item's Properties. Implementations can support arbitrary query parameters to support this, but there's no "queryables" mechanism to discover what fields can be filtered on. For example, a common filter criteria is cloud cover, and there's no well-defined way to express a simple cloud_cover <= 10. An implementer may desire to only support this one predicate, and is satisfied with using the existing datetime and bbox for spatial and temporal filtering.

However, with the current Simple CQL conformance class, an implementer would be forced to support more than just simple comparison operations (e.g., gt, lt, etc.) and would also need to support all the logical operators and the more complex comparison operators, IN, LIKE, BETWEEN and IS NULL. (They would not actually need to implement INTERSECTS or ANYINTERACTS if they don't advertise any queryables of spatial or temporal types) This makes conforming to the CQL spec much more difficult that the needs of the implementation require, and will likely lead to implementers doing something proprietary or, even worse, only partially implementing CQL but advertising that it is conformant!

Additionally, I don't like the use of "Simple" for the class, as it is still a pretty complex language. I think a word like "basic" would be more appropriate here. Also, this class has "CQL" in the name, whereas as none of the other "operator" classes do.

I'd like to propose to divide Simple CQL into these separate conformance classes:

  • Basic Comparison Operators - =,<,<=. >, >=
  • Enhanced Comparison Operators - IN, LIKE, BETWEEN, IS NULL
  • Logical Operators - and, or, not (these could also go into a "basic logical and comparison operators" class, but it feels a little cleaner to split them out)
  • Basic Spatial Operators - intersects
  • Basic Temporal Operators - anyinteracts

While this does introduce some overhead, in that now an implementation has to advertise 5 classes instead of 1, I think there is a large benefit in allowing implementations with limited needs to properly implement a subset of these conformance classes, instead of simply not conforming to CQL at all. I think that this will result in both more robust implementations and provide an "on-ramp" for an implemenation to support more and more of the CQL conformance classes over time.

Thanks for your consideration of this!

@pvretano
Copy link
Contributor

pvretano commented May 12, 2021

@philvarner we have already been discussing removing the spatial and temporal operators for simple CQL altogether and simply having two classes (spatial operators, temporal operators) where all these operators reside. This was motivated by the fact that features already provides bbox and date-time for simple spatial and temporal filtering so there is no use in duplicating those capabilities in "simple" CQL.
I am not opposed to taking that logic one step further and breaking the operators down into finer classes. I don't think it would be such a big deal for implementations ... 1 URL, 5 URLs in the conformance sections ... not big deal! Would be a lot of document work though ... but ... meh! ;)

@cholmes
Copy link
Member

cholmes commented May 12, 2021

I agree with the logic of having the most 'basic' level enabling simple comparisons. That's one that seems like it could almost be in 'core', or at least is the clear 'next step', and shouldn't force people to do a whole lot more.

I think the key is to get the 'groupings' right. I believe as Phil laid them out it'll be 10 total classes, the five core plus functions, arithmetic expressions, arrays, enhanced spatial and enhanced temporal.

I could see whittling that down some, if we thought ten was a lot. Put logical in basic comparisons, and just have one 'spatial' and one 'temporal'. I think I do like 'enhanced comparison' as its own - there's a lot of extras in there that take more work.

Spatial I think I do like having 'intersects' and 'enhanced spatial operators' separated out. Intersects get you actual geometries vs just the core, and many spatial backends don't implement the full suite of spatial operators. In STAC's query language we just had 'intersects' (above what we got from features API).

Temporal I honestly still don't understand, but perhaps should try to dig in. But it seems like the basic comparisons should handle time? I don't really understand what anyinteracts does, but if basic comparisons handle time then there doesn't seem to be much value in having just one temporal operation as its own class.

I guess I don't see much that'd be wrong about having 9 or 10 conformance classes. I think temporal is the one I'd consider just doing one class of temporal operators, but I also don't fully understand it.

I do like this makes for an easier 'on ramp'. I would love us to consider actually breaking the spec into smaller pieces, so we don't present people with a 99 page document when they first hear about CQL. I likely have some funded hours to be able to help with document work if people are up for that (or I can focus my time on the 'guide'/'intro' texts to accompany the core spec text).

@philvarner
Copy link
Contributor Author

@philvarner we have already been discussing removing the spatial and temporal operators for simple CQL altogether and simply having two classes (spatial operators, temporal operators) where all these operators reside. This was motivated by the fact that features already provides bbox and date-time for simple spatial and temporal filtering so there is no use in duplicating those capabilities in "simple" CQL.

I would prefer to also split the spatial and temporal operators into basic (intersects, anyinteracts) and Enhanced. For spatial operators, one practical reason is that Elasticsearch only supports intersects, so if they were all in a single spatial class, someone using Elastic could not implement it. For temporal, I think most use cases only need anyinteracts, and the other ones just add complexity that isn't necessary for many implementations.

I would also argue against there being "no use" for intersects in simple CQL -- I think this will be used commonly in scientific applications. bbox is obviously restricted to a rectangle, and that doesn't work well for many of the scientific use cases i've encountered -- for example, several different small areas or points over a large geographic areas, or a complex shape whose bounding box has significantly more area that the shape itself.

I am not opposed to taking that logic one step further and breaking the operators down into finer classes. I don't think it would be such a big deal for implementations ... 1 URL, 5 URLs in the conformance sections ... not big deal! Would be a lot of document work though ... but ... meh! ;)

sounds good to me!

@philvarner
Copy link
Contributor Author

Temporal I honestly still don't understand, but perhaps should try to dig in. But it seems like the basic comparisons should handle time? I don't really understand what anyinteracts does, but if basic comparisons handle time then there doesn't seem to be much value in having just one temporal operation as its own class.

anyinteracts is effectively "do these two datetimes or intervals have any intersection with each other", and is effectively the time equivalent to spatial intersects. It's what the existing datetime query parameter semantics are now.

@cholmes
Copy link
Member

cholmes commented May 12, 2021

anyinteracts is effectively "do these two datetimes or intervals have any intersection with each other", and is effectively the time equivalent to spatial intersects. It's what the existing datetime query parameter semantics are now.

Then that would seem to me to indicate that a basic class with just 'anyinteracts' isn't so valuable since you can accomplish that with datetime in core already? Or am I misunderstanding? If so then I'd lean towards just a single 'time' conformance class (and then have spatial split in 2, that one I agree there's value in just 'intersects' above what features api core offers).

@pvretano
Copy link
Contributor

@cholmes correct ... which is why in the SWG we discussed just removing both intersect and anyinteracts from SimpleCQL and putting those operators into the Extended Spatial Operators and Extended Temporal Operators classes (and then dropping "Extended").
I believe that @philvarner is arguing that there is still value in having simpler spatial and temporal operator classes and also having more advanced spatial and temporal operator classes to accommodate various level of requirement from simple to more complex.

In the old OGC Filter specification we got around this be simply letting a server list which operators the server supported .... see here.

Its starting to sound like this idea might be usefully translated into the OGC APIs since no matter how we decide to partition the operators, someone is not going to be happy! ;)

@philvarner
Copy link
Contributor Author

anyinteracts is effectively "do these two datetimes or intervals have any intersection with each other", and is effectively the time equivalent to spatial intersects. It's what the existing datetime query parameter semantics are now.

Then that would seem to me to indicate that a basic class with just 'anyinteracts' isn't so valuable since you can accomplish that with datetime in core already? Or am I misunderstanding? If so then I'd lean towards just a single 'time' conformance class (and then have spatial split in 2, that one I agree there's value in just 'intersects' above what features api core offers).

The operation is the same, but you can't compose them with logical operators in core datetime. This is useful for queries that do period-over-period comparisons, e.g., this day in each of 5 years. I think there's also value in being able to push all individual query params into just a unified CQL statement instead of having some filters in each.

@jisantuc
Copy link

you can't compose them with logical operators in core datetime

^^ that to me seems like the important part of CQL -- since it's the only place where this kind of expressiveness is available (at least in STAC API spec) i don't think it's bad if specific combinations of parameters happen to duplicate functionality elsewhere.

Multiple conformance classes also makes sense to me -- as already talked about with ElasticSearch there are going to be queries that are easier / more difficult to support in different settings, and letting people build their conformance a la carte will be nicer than the alternatives I think. I don't have much of a horse in the race of how finely things are sliced -- Franklin will probably implement all of them, but I still think it's valuable not to have to.

@cholmes
Copy link
Member

cholmes commented May 13, 2021

Its starting to sound like this idea might be usefully translated into the OGC APIs since no matter how we decide to partition the operators, someone is not going to be happy! ;)

I think I prefer us to actually think through the best grouping, as I think we can get to groupings that make most people happy, and I see value in chunks of functionality. This can also help implementors figure out what to prioritize.

@cholmes
Copy link
Member

cholmes commented May 13, 2021

Then that would seem to me to indicate that a basic class with just 'anyinteracts' isn't so valuable since you can accomplish that with datetime in core already? Or am I misunderstanding? If so then I'd lean towards just a single 'time' conformance class (and then have spatial split in 2, that one I agree there's value in just 'intersects' above what features api core offers).

The operation is the same, but you can't compose them with logical operators in core datetime. This is useful for queries that do period-over-period comparisons, e.g., this day in each of 5 years. I think there's also value in being able to push all individual query params into just a unified CQL statement instead of having some filters in each.

Ah, gotcha. Cool, makes sense to me then.

@jampukka
Copy link
Contributor

jampukka commented May 14, 2021

I prefer groups/classes over listing each operator separately.

Like @cholmes and @philvarner I also think there's value in intersects being separate from rest of the spatial operators as it's the most commonly used and also supported by different backends plus it works with geometries not just envelopes.

@cportele
Copy link
Member

Meeting 2021-06-21: We do not want the conformance classes too fine grained (this did not work well in Filter Encoding with the filter capabilities). We have already moved intersects and anyinteracts to the spatial/temporal conformance class. We might also move IN, LIKE and BETWEEN to an enhanced comparison operator conformance class, but currently think that the basic comparison operators, the logical operators and IS NULL should be in "Simple CQL". @cportele to make a proposal. We are aware that we will never be able to make it correct for everyone.

Question during the discussion: Can prop = NULL be used to test for null? We think this is not consistently defined in SQL, @pvretano will check.

@cportele
Copy link
Member

cportele commented Jul 5, 2021

Proposal for the CQL2 conformance classes based on the discussion:

  • logical expressions (AND/OR/NOT), sub-expressions and basic comparison predicates on strings/numbers/boolean/instants with properties as first operand, literals as second operand - "BasicCQL"
  • LIKE/BETWEEN/IN operators, UPPER()/LOWER() - "Advanced comparison predicates"
  • spatial predicates
  • temporal predicates
  • array predicates
  • literals also as first operand, properties also as second+ operand - "???"
  • (custom) functions
  • arithmetic expressions

@philvarner
Copy link
Contributor Author

Could you explain what CQL2 is?

I think that this proposal doesn't account for datastores that don't support all the spatial and temporal predicates, and that's there's a need for classes that only support INTERSECTS and ANYINTERACTS but not any of the other predicates.

@philvarner
Copy link
Contributor Author

Also, to clarify "IS NULL" will be part of "BasicSQL"?

@philvarner
Copy link
Contributor Author

I would prefer to have "ILIKE" in Advanced Comparison Operators rather than UPPER()/LOWER() functions, so that an implementation can entirely ignore parsing functions in expressions.

@cportele
Copy link
Member

cportele commented Jul 5, 2021

Meeting 2021-07-05: general agreement that the breakdown makes sense.

@philvarner - regarding your comments/questions: @pvretano will respond to the first comment. Regarding IS NULL, yes it is the idea to have it in BasicCQL, it is seems to us to be an important basic test. Regarding the approach to LIKE (function or ILIKE etc, this is discussed in #541 and related issues and we should continue that discussion there).

@pvretano
Copy link
Contributor

pvretano commented Jul 5, 2021

@philvarner Part 3 was derived from the CQL that was defined in old Catalog standard and for a long time Part 3 was conformant to CQL from the old Catalog standard so it was safe to use the term "CQL" to refer to either. However, with all the recent changes, the two CQLs are diverging quite a bit and so the SWG felt that we should refer to the CQL from Part 3 as CQL2 to avoid any confusion.

@philvarner with regard to supporting INTERSECTS and ANYINTERACTS, the feeling of the SWG was that Part 1 already defines bbox and datetime that take care of basic spatial and temporal filtering and there is no use in duplicating that capability in the BasicCQL conformance class. There might be a discussion point for further sub-dividing the spatial and temporal predicate conformance classes into basic spatial/temporal and then the rest. That is something we will need to discuss at the next SWG meeting.

@philvarner
Copy link
Contributor Author

@philvarner Part 3 was derived from the the CQL that was defined in old Catalog standard and for a long time Part 3 was conformant to CQL from the old Catalog standard so it was safe to use the term "CQL" to refer to either. However, with all the recent changes, the two CQLs are diverging quite a bit and so the SWG felt that we should refer to the CQL from Part 3 as CQL2 to avoid any confusion.

Thanks for the details!

@philvarner with regard to supporting INTERSECTS and ANYINTERACTS, the feeling of the SWG was that Part 1 already defines bbox and datetime that take care of basic spatial and temporal filtering and there is no use in duplicating that capability in the BasicCQL conformance class. There might be a discussion point for further sub-dividing the spatial and temporal predicate conformance classes into basic spatial/temporal and then the rest. That is something we will need to discuss at the next SWG meeting.

I am in favor of dividing them. I'm going to break these up differently in the STAC API Filter Extension conformance classes for now, but hopefully we'll re-align at some point in the future.

The two main issues I see are (1_ that bboxes are too course-grained for many applications, for example, where scientists have a precise shapefile of the area of interest, and (2) the bbox or datetimes cannot be composed with "OR" when there are several related areas of interest or when trying to query for period-over-period data (e.g., the same month in multiple years)

Thanks!

cportele added a commit that referenced this issue Aug 16, 2021
reorganise conformance classes

Merge as agreed in the meeting today. The issue #579 will remain open while we resolve the open discussion topics.
@philvarner
Copy link
Contributor Author

Sorry I didn't get back to you about this before the meeting.

I have the same concern with having T_INTERSECTS in what is now the Enhanced Temporal Operators as I had for S_INTERSECTS. Some databases like Elasticsearch don't support all of those operators, so then that implementation can't support any conformance class for temporal operators.

@cportele
Copy link
Member

Thanks @philvarner. But does Elasticsearch support intervals and if the answer is "no", does it really need the temporal operators or are the comparison operators like >, =, etc. not enough? Temporal is really different from spatial in that respect. What filter expression do you want to support, where you need T_INTERSECTS() and the comparison operators are not sufficient? See #579 (comment).

@cportele cportele reopened this Aug 16, 2021
@philvarner
Copy link
Contributor Author

After talking with @pvretano earlier at the STAC API, it seems that one thing I missed was that the basic comparison operators support datetime comparison?

So if I only implement Comparison operators conformance class, I can write a CQL statement (assuming datetime is a queryable) datetime >= "2014-12-05T12:30:45Z" and must do the right temporal comparison?

If this is the case, it's not clear in the spec, and the grammar seems to indicate this is not allowed.

@pvretano
Copy link
Contributor

pvretano commented Aug 16, 2021

@philvarner just checked the BNF and the text encoding allows using standard comparison operators to evaluate simple temporal predicates involving time instants. There seems to be a bug in the JSON encoding, though, which I will fix. I'll create a PR to fix the JSON encoding problem and add more text to clarify this use case. Please stand by.

@pvretano
Copy link
Contributor

@philvarner OK, I patched the bug in the CQL2 JSON schema to allow standard comparison operators to be used to evaluate simple temporal predicates involving time instants. The Basic CQL2 clause already includes a permission for this. I added a reciprocal note to the "Temporal Operators" clause pointing back to the relevant sections in the Basic CQL2 clause. I hope that is sufficient clarity. Let me know if it is not ...

@cportele
Copy link
Member

Reopened, because the approach to case-insensitive comparisons is still open (#579 (comment)):

  • @pvretano will look into how the various backends support upper/lower functions. We will discuss this aspect after the analysis.
  • We will also decide, if the support for case-insensitive comparisons should be its own requirements class.

@cportele cportele reopened this Aug 17, 2021
@pvretano
Copy link
Contributor

@cportele oops! Didn't mean to close this. Thanks for reopening.

@philvarner
Copy link
Contributor Author

I can't remember if I commented on lower/upper or not, but I would prefer this be a separate conformance class because including it in the IN/BETWEEN/LIKE class would require implementers to implement function parsing (even if just for these two named function) just to advertise IN/BETWEEN/LIKE support. This makes it a much bigger piece of work and I worry could harm adoption.

@philvarner
Copy link
Contributor Author

@philvarner just checked the BNF and the text encoding allows using standard comparison operators to evaluate simple temporal predicates involving time instants. There seems to be a bug in the JSON encoding, though, which I will fix. I'll create a PR to fix the JSON encoding problem and add more text to clarify this use case. Please stand by.

Ah, I see this wasn't present in the draft for CQL Text, but was added last month 7c3e18c

I should probably go through the latest version again and start using that as a reference.

Thanks!

@philvarner
Copy link
Contributor Author

I just created a PR to realign STAC API - Filter Extension with this https://github.com/radiantearth/stac-api-spec/pull/202/files

The classes are now aligned with the exception that we do not require upper/lower as part of Advanced Comparison Ops.

Though, I am reconsidering that. Now that IS NULL is part of the Basic CQL, LIKE is the only operator that can't otherwise be expressed in Basic CQL (e.g., IN and BETWEEN can be converted to more verbose clauses).

I would prefer that an implementor didn't need to implement function parsing to support this class, but I think, ultimately, it's not worth having this minor difference between STAC API and OAFeat CQL2, so I'll defer to whatever y'all finalize.

@cportele
Copy link
Member

Meeting 2021-08-30: We will separate the case-insensitive comparison support into a separate requirements class. This satisfies the preference from the STAC community and also helps to separate this capability where we had lots of discussions into a separate module. What is still open is the details on how we provide the capability (still a @pvretano action). @cportele will introduce the new requirements class in the document.

@philvarner
Copy link
Contributor Author

Thanks!

cportele added a commit that referenced this issue Sep 13, 2021
closes #579, the open issue how to implement case-insensitive comparisons will be transferred to a new issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment