Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support SKEWNESS and KURTOSIS aggregates #10021

Merged
merged 3 commits into from
Dec 23, 2022
Merged

Conversation

agavra
Copy link
Contributor

@agavra agavra commented Dec 21, 2022

See commit message 😄 - uses the apache commons implementation as the internal format, if we are worried about performance we can copy the code inline but that would add a lot of error-prone mathematical code.

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we also add SKEWSAMP/KURTOSISSAMP if it's just the matter of dividing by n-1?
  2. Can we add the test to merge <EmptyFourthMoment, NonemptyFourthMoment> and <NonemptyFourthMoment, EmptyFourthMoment>? I made a mistake on this when I worked on VAR/STDDEV
  3. Can we run the quick benchmarking of the performance of doing standalone FourthMoment vs extending FourthMoment from Apache?

@codecov-commenter
Copy link

codecov-commenter commented Dec 21, 2022

Codecov Report

Merging #10021 (e7e0d04) into master (c034503) will decrease coverage by 7.68%.
The diff coverage is 82.25%.

@@             Coverage Diff              @@
##             master   #10021      +/-   ##
============================================
- Coverage     68.67%   60.99%   -7.69%     
+ Complexity     5653     5545     -108     
============================================
  Files          1991     1982       -9     
  Lines        107251   107109     -142     
  Branches      16302    16292      -10     
============================================
- Hits          73654    65330    -8324     
- Misses        28382    36942    +8560     
+ Partials       5215     4837     -378     
Flag Coverage Δ
integration1 ?
integration2 24.53% <3.22%> (?)
unittests1 67.94% <82.25%> (+0.08%) ⬆️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...tion/function/FourthMomentAggregationFunction.java 66.66% <66.66%> (ø)
...org/apache/pinot/core/common/ObjectSerDeUtils.java 91.37% <77.77%> (-0.07%) ⬇️
.../segment/local/customobject/PinotFourthMoment.java 96.49% <96.49%> (ø)
...gregation/function/AggregationFunctionFactory.java 81.81% <100.00%> (+0.25%) ⬆️
...che/pinot/segment/spi/AggregationFunctionType.java 90.62% <100.00%> (+0.19%) ⬆️
...g/apache/pinot/server/api/resources/ErrorInfo.java 0.00% <0.00%> (-100.00%) ⬇️
...pinot/minion/exception/TaskCancelledException.java 0.00% <0.00%> (-100.00%) ⬇️
...ot/broker/requesthandler/BrokerRequestHandler.java 0.00% <0.00%> (-100.00%) ⬇️
...pinot/controller/recommender/io/ConfigManager.java 0.00% <0.00%> (-100.00%) ⬇️
.../org/apache/pinot/client/AggregationResultSet.java 0.00% <0.00%> (-100.00%) ⬇️
... and 473 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@agavra
Copy link
Contributor Author

agavra commented Dec 21, 2022

@snleee

Can we also add SKEWSAMP/KURTOSISSAMP if it's just the matter of dividing by n-1?

I checked whether this is common or not, and it seems that sample skew/sample kurtosis aren't really used (e.g. presto doesn't define those two functions) so I'll go ahead and just rename the functions SKEWNESS and KURTOSIS

@agavra
Copy link
Contributor Author

agavra commented Dec 22, 2022

I won't inline - benchmark results from inlining everything:

Benchmark                       Mode  Cnt  Score    Error  Units
FourthMomentBenchmark.apacheFM  avgt    5  0.056 ±  0.001  ms/op
FourthMomentBenchmark.nativeFM  avgt    5  0.056 ±  0.002  ms/op
public class NativeFourthMoment {

  private long _n = 0;
  private double _m1 = Double.NaN;
  private double _m2 = Double.NaN;
  private double _m3 = Double.NaN;
  private double _m4 = Double.NaN;

  public void increment(double d) {
    if (_n == 0) {
      _m4 = 0d;
      _m3 = 0d;
      _m2 = 0d;
      _m1 = 0d;
    }

    _n++;

    double n = _n; // prevent overflow
    double dev = d - _m1;
    double nDev = dev / n;
    double nDevSq = nDev * nDev;

    _m4 = _m4 - 4.0 * nDev * _m3
        + 6.0 * nDevSq * _m2
        + ((n * n) - 3 * (n - 1)) * (nDevSq * nDevSq * (n - 1) * n);

    _m3 = _m3 - 3.0 * nDev * _m2
        + (n - 1) * (n - 2) * nDevSq * dev;

    _m2 += (n - 1) * dev * nDev;

    _m1 += nDev;
  }

  public double kurtosis() {
    double kurtosis = Double.NaN;
    if (_n > 3) {
      double variance = _m2 / (_n - 1);
      if (variance < 10E-20) {
        kurtosis = 0.0;
      } else {
        double n = _n;
        kurtosis =
            (n * (n + 1) * _m4 -
                3 * _m2 * _m2 * (n - 1)) /
                ((n - 1) * (n -2) * (n -3) * variance * variance);
      }
    }
    return kurtosis;
  }

@agavra agavra requested a review from snleee December 22, 2022 00:10
@snleee
Copy link
Contributor

snleee commented Dec 22, 2022

@agavra Is Serde failing due to our change?

[INFO] -------------------------------------------------------------
Error:  /home/runner/work/pinot/pinot/pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:[490,104] error: cannot infer type arguments for ObjectSerDe<T>

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor comments. LGTM! Thank you for adding this super quickly 👍

* specific language governing permissions and limitations
* under the License.
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove line?

* specific language governing permissions and limitations
* under the License.
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove line :)



/**
* A {@link Comparable} implementation of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the descriptions? e.g. which algorithm/library that we are depending on etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops! I was in the middle of writing it (clearly) and then forgot all about it...

@agavra
Copy link
Contributor Author

agavra commented Dec 22, 2022

@agavra Is Serde failing due to our change?

ah yes... JDK8 doesn't infer properly for anonymous classes 😢

@snleee
Copy link
Contributor

snleee commented Dec 23, 2022

@agavra Thank you for the quick turnaround on this!

@snleee snleee merged commit 6303fc9 into apache:master Dec 23, 2022
@snleee snleee changed the title support SKEW_POP and KURTOSIS_POP aggregates support SKEWNESS and KURTOSIS aggregates Dec 23, 2022
@agavra agavra deleted the skew_kurt branch December 23, 2022 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants