Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datetime transform functions #8397

Merged

Conversation

richardstartin
Copy link
Member

@richardstartin richardstartin commented Mar 24, 2022

This PR starts with a small refactoring to allow TransformFunctionTypes to have aliases, then introduces TransformFunctions for the following functions:

  • year
  • yearOfWeek
  • yow
  • month
  • week
  • weekOfYear
  • quarter
  • dayOfWeek
  • dow
  • dayOfYear
  • doy
  • dayOfMonth
  • day
  • hour
  • minute
  • millisecond

The transform functions are tested against the @ScalarFunctions.

The motivation for the change is the high cost of using scalar functions in transforms. Here is the allocation rate observed for

select count(*), year(event_time) as y, month(event_time) as m from githubEvents group by y, m

Screenshot 2022-03-24 at 12 11 19

vs the similar

select count(*), dateTrunc('year', event_time) as y, dateTrunc('m', event_time) as y from githubEvents group by y, m

Screenshot 2022-03-24 at 12 11 33

When scalar function is used, 16% of samples were in verifying accessibility of the Method - @ScalarFunction should be used only as a last resort.

Screenshot 2022-03-24 at 12 12 00

@richardstartin richardstartin requested a review from xiangfu0 March 24, 2022 12:15
@richardstartin
Copy link
Member Author

I've reverted to JODA because I noticed that DateTime objects appear to scalarise in loops like this, where java.time results in allocation. I only tested for year but the difference is big enough to stay on JODA:

Benchmark                                                      (_size)  Mode  Cnt       Score       Error   Units
BenchmarkTimeAPI.convertJavaTime                                  1024  avgt    5      35.190 ±     1.019   us/op
BenchmarkTimeAPI.convertJavaTime:·gc.alloc.rate                   1024  avgt    5    3842.276 ±   112.354  MB/sec
BenchmarkTimeAPI.convertJavaTime:·gc.alloc.rate.norm              1024  avgt    5  212992.015 ±     0.002    B/op
BenchmarkTimeAPI.convertJavaTime:·gc.churn.G1_Eden_Space          1024  avgt    5    3845.330 ±   353.367  MB/sec
BenchmarkTimeAPI.convertJavaTime:·gc.churn.G1_Eden_Space.norm     1024  avgt    5  213156.517 ± 17414.389    B/op
BenchmarkTimeAPI.convertJavaTime:·gc.churn.G1_Old_Gen             1024  avgt    5       0.003 ±     0.009  MB/sec
BenchmarkTimeAPI.convertJavaTime:·gc.churn.G1_Old_Gen.norm        1024  avgt    5       0.190 ±     0.510    B/op
BenchmarkTimeAPI.convertJavaTime:·gc.count                        1024  avgt    5      91.000              counts
BenchmarkTimeAPI.convertJavaTime:·gc.time                         1024  avgt    5      54.000                  ms
BenchmarkTimeAPI.convertJodaTime                                  1024  avgt    5       6.541 ±     0.302   us/op
BenchmarkTimeAPI.convertJodaTime:·gc.alloc.rate                   1024  avgt    5      ≈ 10⁻⁴              MB/sec
BenchmarkTimeAPI.convertJodaTime:·gc.alloc.rate.norm              1024  avgt    5       0.003 ±     0.001    B/op
BenchmarkTimeAPI.convertJodaTime:·gc.count                        1024  avgt    5         ≈ 0              counts

@codecov-commenter
Copy link

Codecov Report

Merging #8397 (95369f4) into master (5e7c005) will increase coverage by 33.53%.
The diff coverage is 95.10%.

@@              Coverage Diff              @@
##             master    #8397       +/-   ##
=============================================
+ Coverage     30.54%   64.08%   +33.53%     
- Complexity        0     4275     +4275     
=============================================
  Files          1642     1610       -32     
  Lines         86112    84709     -1403     
  Branches      13000    12853      -147     
=============================================
+ Hits          26304    54285    +27981     
+ Misses        57420    26499    -30921     
- Partials       2388     3925     +1537     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 67.00% <95.10%> (?)
unittests2 14.13% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../transform/function/DateTimeTransformFunction.java 88.60% <88.60%> (ø)
...e/pinot/common/function/TransformFunctionType.java 100.00% <100.00%> (+15.47%) ⬆️
...r/transform/function/TransformFunctionFactory.java 83.59% <100.00%> (+1.77%) ⬆️
...va/org/apache/pinot/core/routing/RoutingTable.java 0.00% <0.00%> (-100.00%) ⬇️
...va/org/apache/pinot/common/config/NettyConfig.java 0.00% <0.00%> (-100.00%) ⬇️
...a/org/apache/pinot/common/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
...g/apache/pinot/common/metrics/ControllerMeter.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/BrokerQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/MinionQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
...apache/pinot/common/helix/ExtraInstanceConfig.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1353 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5e7c005...95369f4. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants