Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ESQL] Add TO_DATE_NANOS conversion function #112150

Merged
merged 28 commits into from
Sep 26, 2024

Conversation

not-napoleon
Copy link
Member

@not-napoleon not-napoleon commented Aug 23, 2024

Resolves #111842

This adds a conversion function that yields DATE_NANOS. Mostly this is straight forward.

It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that TO_DATE_NANOS extend AbstractConvertFunction, which itself extends UnaryScalarFunction, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds.

Making that assumption does, however, create some weirdness. Consider two comparisons:

TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360") will return true while TO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360") will return false. This is akin to casting between longs and doubles, where things may compare equal in one type that are not equal in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of.

@not-napoleon not-napoleon marked this pull request as ready for review September 11, 2024 13:36
@not-napoleon not-napoleon requested a review from a team as a code owner September 11, 2024 13:36
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 11, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@@ -347,6 +348,7 @@ private FunctionDefinition[][] functions() {
def(ToCartesianShape.class, ToCartesianShape::new, "to_cartesianshape"),
def(ToDatePeriod.class, ToDatePeriod::new, "to_dateperiod"),
def(ToDatetime.class, ToDatetime::new, "to_datetime", "to_dt"),
def(ToDateNanos.class, ToDateNanos::new, "to_date_nanos", "to_datenanos"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a to_datetime_nanos? If I were accustomed to to_datetime I'd expect a _nanos equivalent function.
Otherwise, I guess the name of the function maps on the datetype name, but I guess I do find it a tad confusing the difference between name types (some relevant comments found in #99006)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I don't think so. If I was going to add another alias, I'd opt for date_nanoseconds, to align with the title of https://www.elastic.co/guide/en/elasticsearch/reference/master/date_nanos.html, but really I think date_nanos is fine.


@ConvertEvaluator(extraName = "FromLong", warnExceptions = { IllegalArgumentException.class })
static long fromLong(long in) {
if (in < 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 0L

}

@ConvertEvaluator(extraName = "FromString", warnExceptions = { IllegalArgumentException.class })
static long fromKeyword(BytesRef in) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we want to have this method delegate to something in EsqlDataTypeConverter (similar to nanoTimeToString())? I imagine this conversion might be useful when we add support for parsing nanos or bucketing them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this conversion actually exists as org.elasticsearch.common.time.DateUtils#toLong, and in retrospect I should probably just delegate to that.

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add the type to the EsqlDataTypeConverter.TYPE_TO_CONVERTER_FUNCTION mapping and add a few tests with the :: conversion operator.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added few comments. Nothing major. Also:

  • some more tests in csv-spec would be welcomed:
    • a foldable function like concat("2023-01-01","T12:12:12") should be used as part of eval or row and its value computed in a to_date_nanos or ::date_nanos.
    • tests with null as a constant and as a computed value: to_date_nanos(null + 1::long), eval x = null, y = concat("2023", null) | limit 1 | stats max(123) by to_date_nanos(y)
    • some constant that fails the range check: eval g = to_date_nanos(-5::long)
    • tests with real indices data
  • some tests with union types (date and date_nanos for example, long and date_nanos)
  • date_nanos being a date data type, I think it's worth taking our date functions for a spin. I looked only at date_extract("nano_of_day", to_date_nanos(z)) which fails with second argument of [date_extract("nano_of_day", to_date_nanos(z))] must be [datetime], found value [to_date_nanos(z)] type [date_nanos] and it shouldn't. Indeed, I didn't want to extend the support you are adding with this PR and have it more contained, I only tried to test the new function in nested functions calls

string to date nanos, out of range
required_capability: date_nanos_type

ROW d = TO_DATE_NANOS("2262-04-12T00:00:00.000") | KEEP d;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the syntax ::date_nanos is not supported (yet?). Is this intentional?

try {
return Math.multiplyExact(in, 1_000_000L);
} catch (ArithmeticException e) {
// This seems like a more useful error message than "Long Overflow"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


@ConvertEvaluator(extraName = "FromString", warnExceptions = { IllegalArgumentException.class })
static long fromKeyword(BytesRef in) {
Instant parsed = DateFormatters.from(DEFAULT_DATE_NANOS_FORMATTER.parse(in.utf8ToString())).toInstant();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is something here throwing already on pre-1970 dates?
I don't see a test case for pre-1970 in ToDateNanosTests, would be nice adding it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this to use DateUtils.toLong() which does that check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'm missing checks for pre-1970 dates for all types here. A single test checking all would be enough.
Nitpick because we already have the ToDateNanosTests cases, but I'd add it here too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added row tests for pre-1970 dates.

@not-napoleon
Copy link
Member Author

@astefan

some tests with union types (date and date_nanos for example, long and date_nanos)

I have a separate ticket for this, as I expect it will be quite involved. #112885. That said, if you'd rather I did it in this PR, I would be fine with that. I just thought this PR was already getting pretty big.

date_nanos being a date data type, I think it's worth taking our date functions for a spin. I looked only at date_extract("nano_of_day", to_date_nanos(z)) which fails with second argument of [date_extract("nano_of_day", to_date_nanos(z))] must be [datetime], found value [to_date_nanos(z)] type [date_nanos] and it shouldn't. Indeed, I didn't want to extend the support you are adding with this PR and have it more contained, I only tried to test the new function in nested functions calls

Indeed, passing date nanos into date functions is not (yet) supported. The roadmap for the feature lists that work further down: #109352. At the moment we have it marked as not required for an MVP (i.e. before taking off the feature flag), but we can move it into the per-release work if we want to. It will definitely be required if we implement the auto casting into date nanos that we talked about recently.

@astefan
Copy link
Contributor

astefan commented Sep 18, 2024

I have a separate ticket for this, as I expect it will be quite involved. #112885.

That's ok 👍

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@not-napoleon
Copy link
Member Author

@elasticmachine update branch

@not-napoleon not-napoleon merged commit 122e728 into elastic:main Sep 26, 2024
15 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 112150

not-napoleon added a commit to not-napoleon/elasticsearch that referenced this pull request Sep 26, 2024
Resolves elastic#111842

This adds a conversion function that yields DATE_NANOS. Mostly this is straight forward.

It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that TO_DATE_NANOS extend AbstractConvertFunction, which itself extends UnaryScalarFunction, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds.

Making that assumption does, however, create some weirdness. Consider two comparisons:

TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360") will return true while TO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360") will return false. This is akin to casting between longs and doubles, where things may compare equal in one type that are not equal in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of.

---------

Co-authored-by: Elastic Machine <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Sep 27, 2024
Resolves #111842

This adds a conversion function that yields DATE_NANOS. Mostly this is straight forward.

It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that TO_DATE_NANOS extend AbstractConvertFunction, which itself extends UnaryScalarFunction, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds.

Making that assumption does, however, create some weirdness. Consider two comparisons:

TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360") will return true while TO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360") will return false. This is akin to casting between longs and doubles, where things may compare equal in one type that are not equal in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of.

---------

Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL backport pending >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ESQL] create TO_DATE_NANOS function
8 participants