Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL Support IN operator for Date nanos #119772

Merged
merged 17 commits into from
Jan 14, 2025

Conversation

not-napoleon
Copy link
Member

Add support for using nanosecond dates with the IN operator. This behavior should be consistent with equals, and support comparisons between milliseconds and nanoseconds the same as the binary comparison operators support it.

Resolves #118578

@elasticsearchmachine
Copy link
Collaborator

Hi @not-napoleon, I've created a changelog YAML for you.

elasticsearchmachine and others added 4 commits January 8, 2025 16:17
 Conflicts:
	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/predicate/operator/comparison/In.java
@not-napoleon not-napoleon marked this pull request as ready for review January 9, 2025 14:10
@not-napoleon not-napoleon requested a review from bpintea January 9, 2025 14:10
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 9, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small notes, except for the check to prevent mixing millis and nonos values in the IN list:

This used to work, but no longer with these checks:

FROM employees
| EVAL x = NULL
| WHERE birth_date IN (TO_DATETIME("1958-02-19T00:00:00Z"), x)

(I think we should add this test, both for DATETIMEs and DATE_NANOS.)

I can write:

FROM date_nanos
| WHERE MV_FIRST(nanos) == TO_DATETIME("2023-10-23T12:27:28.948")

, but not:

FROM date_nanos
| WHERE MV_FIRST(nanos) IN (TO_DATETIME("2023-10-23T12:27:28.948"), TO_DATE_NANOS("2023-10-23T12:27:28.948"))

IMO we should make DATETIME and DATE_NANOS compatible and document the pitfalls. I expect it to be more useful for current/future dates, than dangerous with pre-epoch dates.

@@ -283,7 +321,7 @@ public EvalOperator.ExpressionEvaluator.Factory toEvaluator(ToEvaluator toEvalua
if (commonType == INTEGER) {
return new InIntEvaluator.Factory(source(), lhs, factories);
}
if (commonType == LONG || commonType == DATETIME || commonType == UNSIGNED_LONG) {
if (commonType == LONG || commonType == DATETIME || commonType == DATE_NANOS || commonType == UNSIGNED_LONG) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional:

Suggested change
if (commonType == LONG || commonType == DATETIME || commonType == DATE_NANOS || commonType == UNSIGNED_LONG) {
if (commonType == LONG || commonType.isDate() || commonType == UNSIGNED_LONG) {

Comment on lines 395 to 398
Boolean compResult = DateUtils.compareNanosToMillis(lhs, rhs[i]) == 0;
if (compResult == Boolean.TRUE) {
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need the boxing. Comparisons.eq() used in other methods can return null (though not sure if at the respective call sites this can actually occur).

Suggested change
Boolean compResult = DateUtils.compareNanosToMillis(lhs, rhs[i]) == 0;
if (compResult == Boolean.TRUE) {
return true;
}
if (DateUtils.compareNanosToMillis(lhs, rhs[i]) == 0) {
return true;
}

@not-napoleon not-napoleon added the auto-backport Automatically create backport pull requests when merged label Jan 10, 2025
@not-napoleon
Copy link
Member Author

This used to work, but no longer with these checks:

FROM employees
| EVAL x = NULL
| WHERE birth_date IN (TO_DATETIME("1958-02-19T00:00:00Z"), x)

You're right. I missed the null case, and we didn't have tests for it. I fixed it and added tests.

I can write:

FROM date_nanos
| WHERE MV_FIRST(nanos) == TO_DATETIME("2023-10-23T12:27:28.948")

, but not:

FROM date_nanos
| WHERE MV_FIRST(nanos) IN (TO_DATETIME("2023-10-23T12:27:28.948"), TO_DATE_NANOS("2023-10-23T12:27:28.948"))

Yes, but you could write

FROM date_nanos
| WHERE MV_FIRST(nanos) IN (TO_DATETIME("2023-10-23T12:27:28.948"), TO_DATETIME("2023-10-23T12:27:28.948"))

IMO we should make DATETIME and DATE_NANOS compatible and document the pitfalls. I expect it to be more useful for current/future dates, than dangerous with pre-epoch dates.

They are compatible, in almost every situation. Even this operator correctly handles the mixed case. The only thing we don't support right now (other than the remaining open work) is exactly having both millisecond dates and nanosecond dates in the match list for the IN. So far, we've managed to achieve that without any need for autocasting.

I have a couple of concerns with applying autocasting here. The largest is that the results are not the same. Imagine a clause like WHERE nanos IN (TO_DATETIME("1960-01-01), TO_DATETIME("2020-01-01")) OR foo == bar. Under the logic as I have it now, and presuming nanos doesn't match "2020-01-01", the result of the IN clause will be false, and the result of the OR clause will be whatever foo == bar is. If we cast to nanos however, the 1960 date will be null, which will result in the whole IN expression being null, which will result in the whole OR expression being null. Rows that would have matched on foo == bar are now excluded.

This gets even worse because equals does not display this behavior (since it correctly compares millisecond dates that are out of the nanosecond range), so nanos == TO_DATETIME("1960-01-01) OR nanos == TO_DATETIME("2020-01-01") OR foo == bar would have idfferent behavior than the above clause. Changing between IN and == OR == is something the optimizer does, and it seems very trappy to me to create a situation where the optimizer can change the result of a clause.

Bear in mind that currently, in all the released versions of Elasticsearch, it is not possible to have nanosecond in an IN operator, much less mixed nanoseconds and millisecond. So it's not like this changes existing behavior. We can always add mixed support via autocasting later, but it's much harder to remove that once we release it, if we decide it's a bad idea.

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@bpintea
Copy link
Contributor

bpintea commented Jan 14, 2025

If we cast to nanos however, the 1960 date will be null

Do we document this behaviour with pre-epoch values anywhere?
BTW, it seems we have the TO_DATE_NANOS doc files, but the function doesn't show up in the documentation -- maybe it's a matter of including a reference only?

@not-napoleon not-napoleon merged commit 187a6a7 into elastic:main Jan 14, 2025
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

not-napoleon added a commit to not-napoleon/elasticsearch that referenced this pull request Jan 14, 2025
Add support for using nanosecond dates with the IN operator. This behavior should be consistent with equals, and support comparisons between milliseconds and nanoseconds the same as the binary comparison operators support it.

Resolves elastic#118578

---------

Co-authored-by: elasticsearchmachine <[email protected]>
elasticsearchmachine added a commit that referenced this pull request Jan 14, 2025
* ESQL Support IN operator for Date nanos (#119772)

Add support for using nanosecond dates with the IN operator. This behavior should be consistent with equals, and support comparisons between milliseconds and nanoseconds the same as the binary comparison operators support it.

Resolves #118578

---------

Co-authored-by: elasticsearchmachine <[email protected]>

* remove use of future java functions

---------

Co-authored-by: elasticsearchmachine <[email protected]>
@not-napoleon
Copy link
Member Author

Do we document this behaviour with pre-epoch values anywhere?
BTW, it seems we have the TO_DATE_NANOS doc files, but the function doesn't show up in the documentation -- maybe it's a matter of including a reference only?

Apparently I missed linking those docs. Fixed in #120124. I also made it explicit that it will return null for out of range values.

not-napoleon added a commit that referenced this pull request Jan 21, 2025
This adds appropriate ignore annotations for the date tests I added in #119772 to not run the tests against very old versions. Should be a no-op for main, but fix some tests in 8.x BWC. Still, I'd rather keep the tests in sync, so I'm applying this to main first. It's not set for auto-backport because I'll need to add the unmutes to the backport.
not-napoleon added a commit to not-napoleon/elasticsearch that referenced this pull request Jan 21, 2025
This adds appropriate ignore annotations for the date tests I added in elastic#119772 to not run the tests against very old versions. Should be a no-op for main, but fix some tests in 8.x BWC. Still, I'd rather keep the tests in sync, so I'm applying this to main first. It's not set for auto-backport because I'll need to add the unmutes to the backport.
not-napoleon added a commit that referenced this pull request Jan 22, 2025
Resolves #120159
Resolves #120157
Resolves #120151
Resolves #120158
Resolves #120155
Resolves #120156
Resolves #120348
Resolves #120349

This adds appropriate ignore annotations for the date tests I added in #119772 to not run the tests against very old versions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Date Nanos for In function
3 participants