Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix fromDateTime malform handling #11258

Merged
merged 1 commit into from
Sep 25, 2023
Merged

Conversation

walterddr
Copy link
Contributor

@walterddr walterddr commented Aug 3, 2023

Allow malformed dateTime string to return default value configurable in the function signature (must explicitly state the timezone)
syntax

fromDateTime(colContainsMalformedStr, '<dateTimeFormat>', '<timezone>', <default_value>)

LIMITATION

this is a workaround of the more standard SQL

COALESCE(fromDateTime(colContainsMalformedStr, '<dateTimeFormat>', '<timezone>'), <default_value>)
  • currently, ScalarFunction doesn't support scalar function return null explicitly unless the input contains null
  • currently, ScalarFunction doesn't support type-based function registry, only with function arg number when function names are the same. (and thus timezone must be specified)

REFERENCE

@codecov-commenter
Copy link

codecov-commenter commented Aug 3, 2023

Codecov Report

Merging #11258 (d9bf40f) into master (9d7676e) will increase coverage by 14.33%.
Report is 250 commits behind head on master.
The diff coverage is 0.00%.

@@              Coverage Diff              @@
##             master   #11258       +/-   ##
=============================================
+ Coverage      0.11%   14.44%   +14.33%     
- Complexity        0      201      +201     
=============================================
  Files          2229     2343      +114     
  Lines        119803   125698     +5895     
  Branches      18126    19310     +1184     
=============================================
+ Hits            137    18160    +18023     
+ Misses       119646   106002    -13644     
- Partials         20     1536     +1516     
Flag Coverage Δ
integration 0.00% <0.00%> (?)
integration1temurin11 ?
integration1temurin20 ?
integration2 0.00% <0.00%> (?)
integration2temurin11 ?
integration2temurin17 ?
integration2temurin20 ?
java-11 14.44% <0.00%> (?)
java-17 14.44% <0.00%> (?)
java-20 14.44% <0.00%> (?)
temurin 14.44% <0.00%> (?)
unittests 14.44% <0.00%> (?)
unittests1temurin11 ?
unittests2 14.44% <0.00%> (?)
unittests2temurin11 ?
unittests2temurin17 ?
unittests2temurin20 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
.../pinot/common/function/DateTimePatternHandler.java 0.00% <0.00%> (ø)
...inot/common/function/scalar/DateTimeFunctions.java 0.00% <0.00%> (ø)

... and 901 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

try {
return dateTimeFormatter.parseMillis(dateTimeString);
} catch (Exception e) {
return 0L;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have concerns on doing this by default. one alternative is to add a

fromDateTime(String dateTimeString, String pattern, long defaultValue)

and only allow returning default value when malformed dateTimeString is encountered otherwise still throw an exception. please share your thoughts

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest doing it more explicitly by asking for an extra default value. This is similar to how we handle json path error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah unfortunately the function signature with extra default value is taken by the timezone :-( so we can't add that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? The default value should be long type

Copy link
Contributor Author

@walterddr walterddr Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function registry is using number of arguments to match scalar function with the same name, it doesn't consider type b/c v1 is not strong-typed

@walterddr walterddr marked this pull request as ready for review August 4, 2023 15:51
@@ -491,7 +492,7 @@ public static String[] toDateTimeMV(long[] millis, String pattern, String timezo
* Converts DateTime string represented by pattern to epoch millis
*/
@ScalarFunction
public static long fromDateTime(String dateTimeString, String pattern) {
public static long fromDateTime(@Nullable String dateTimeString, String pattern) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. We don't expect null values to be passed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the dateTimeString column can have nulls right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is null, this scalar function will return null, which is expected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, i think i mixed null and malformed handling , let's only target malformed string in this PR. i am not sure whether we should allow null in dateTimeString functions

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the method with a default value, we should have try catch, and return default in the catch; for the method without the default value, we should just throw the exception out

}

private static long parseMillis(String dateTimeString, DateTimeFormatter dateTimeFormatter) {
return parseMillis(dateTimeString, dateTimeFormatter, 0L);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should directly throw exception instead of returning 0. Using implicit default value can case unexpected behavior


public static long parseDateTimeStringToEpochMillis(String dateTimeString, String pattern, String timezoneId,
long defaultVal) {
DateTimeFormatter dateTimeFormatter = getDateTimeFormatter(pattern, timezoneId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to have a try-catch over getDateTimeFormatter as well

@walterddr walterddr merged commit 7cccc5e into apache:master Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants