-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ESQL] Add TO_DATE_NANOS conversion function #112150
Conversation
Conflicts: x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java
Pinging @elastic/es-analytical-engine (Team:Analytics) |
@@ -347,6 +348,7 @@ private FunctionDefinition[][] functions() { | |||
def(ToCartesianShape.class, ToCartesianShape::new, "to_cartesianshape"), | |||
def(ToDatePeriod.class, ToDatePeriod::new, "to_dateperiod"), | |||
def(ToDatetime.class, ToDatetime::new, "to_datetime", "to_dt"), | |||
def(ToDateNanos.class, ToDateNanos::new, "to_date_nanos", "to_datenanos"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a to_datetime_nanos
? If I were accustomed to to_datetime
I'd expect a _nanos
equivalent function.
Otherwise, I guess the name of the function maps on the datetype name, but I guess I do find it a tad confusing the difference between name types (some relevant comments found in #99006)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I don't think so. If I was going to add another alias, I'd opt for date_nanoseconds
, to align with the title of https://www.elastic.co/guide/en/elasticsearch/reference/master/date_nanos.html, but really I think date_nanos is fine.
...c/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToDateNanos.java
Outdated
Show resolved
Hide resolved
|
||
@ConvertEvaluator(extraName = "FromLong", warnExceptions = { IllegalArgumentException.class }) | ||
static long fromLong(long in) { | ||
if (in < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 0L
} | ||
|
||
@ConvertEvaluator(extraName = "FromString", warnExceptions = { IllegalArgumentException.class }) | ||
static long fromKeyword(BytesRef in) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we want to have this method delegate to something in EsqlDataTypeConverter
(similar to nanoTimeToString()
)? I imagine this conversion might be useful when we add support for parsing nanos or bucketing them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this conversion actually exists as org.elasticsearch.common.time.DateUtils#toLong
, and in retrospect I should probably just delegate to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add the type to the EsqlDataTypeConverter.TYPE_TO_CONVERTER_FUNCTION
mapping and add a few tests with the ::
conversion operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added few comments. Nothing major. Also:
- some more tests in csv-spec would be welcomed:
- a foldable function like
concat("2023-01-01","T12:12:12")
should be used as part ofeval
orrow
and its value computed in ato_date_nanos
or::date_nanos
. - tests with
null
as a constant and as a computed value:to_date_nanos(null + 1::long)
,eval x = null, y = concat("2023", null) | limit 1 | stats max(123) by to_date_nanos(y)
- some constant that fails the range check:
eval g = to_date_nanos(-5::long)
- tests with real indices data
- a foldable function like
- some tests with union types (
date
anddate_nanos
for example,long
anddate_nanos
) date_nanos
being a date data type, I think it's worth taking our date functions for a spin. I looked only atdate_extract("nano_of_day", to_date_nanos(z))
which fails withsecond argument of [date_extract("nano_of_day", to_date_nanos(z))] must be [datetime], found value [to_date_nanos(z)] type [date_nanos]
and it shouldn't. Indeed, I didn't want to extend the support you are adding with this PR and have it more contained, I only tried to test the new function in nested functions calls
string to date nanos, out of range | ||
required_capability: date_nanos_type | ||
|
||
ROW d = TO_DATE_NANOS("2262-04-12T00:00:00.000") | KEEP d; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the syntax ::date_nanos
is not supported (yet?). Is this intentional?
try { | ||
return Math.multiplyExact(in, 1_000_000L); | ||
} catch (ArithmeticException e) { | ||
// This seems like a more useful error message than "Long Overflow" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
...c/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToDateNanos.java
Outdated
Show resolved
Hide resolved
|
||
@ConvertEvaluator(extraName = "FromString", warnExceptions = { IllegalArgumentException.class }) | ||
static long fromKeyword(BytesRef in) { | ||
Instant parsed = DateFormatters.from(DEFAULT_DATE_NANOS_FORMATTER.parse(in.utf8ToString())).toInstant(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is something here throwing already on pre-1970 dates?
I don't see a test case for pre-1970 in ToDateNanosTests
, would be nice adding it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated this to use DateUtils.toLong()
which does that check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'm missing checks for pre-1970 dates for all types here. A single test checking all would be enough.
Nitpick because we already have the ToDateNanosTests
cases, but I'd add it here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added row tests for pre-1970 dates.
I have a separate ticket for this, as I expect it will be quite involved. #112885. That said, if you'd rather I did it in this PR, I would be fine with that. I just thought this PR was already getting pretty big.
Indeed, passing date nanos into date functions is not (yet) supported. The roadmap for the feature lists that work further down: #109352. At the moment we have it marked as not required for an MVP (i.e. before taking off the feature flag), but we can move it into the per-release work if we want to. It will definitely be required if we implement the auto casting into date nanos that we talked about recently. |
That's ok 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@elasticmachine update branch |
💔 Backport failed
You can use sqren/backport to manually backport by running |
Resolves elastic#111842 This adds a conversion function that yields DATE_NANOS. Mostly this is straight forward. It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that TO_DATE_NANOS extend AbstractConvertFunction, which itself extends UnaryScalarFunction, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds. Making that assumption does, however, create some weirdness. Consider two comparisons: TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360") will return true while TO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360") will return false. This is akin to casting between longs and doubles, where things may compare equal in one type that are not equal in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of. --------- Co-authored-by: Elastic Machine <[email protected]>
Resolves #111842 This adds a conversion function that yields DATE_NANOS. Mostly this is straight forward. It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that TO_DATE_NANOS extend AbstractConvertFunction, which itself extends UnaryScalarFunction, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds. Making that assumption does, however, create some weirdness. Consider two comparisons: TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360") will return true while TO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360") will return false. This is akin to casting between longs and doubles, where things may compare equal in one type that are not equal in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of. --------- Co-authored-by: Elastic Machine <[email protected]>
Resolves #111842
This adds a conversion function that yields
DATE_NANOS
. Mostly this is straight forward.It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that
TO_DATE_NANOS
extendAbstractConvertFunction
, which itself extendsUnaryScalarFunction
, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds.Making that assumption does, however, create some weirdness. Consider two comparisons:
TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360")
will return true whileTO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360")
will return false. This is akin to casting between longs and doubles, where things may compare equal in one type that are not equal in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of.