Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add t_statistic(), paired_t_statistic() agg functions (#15798) #15817

Closed
wants to merge 1 commit into from

Conversation

leepface
Copy link
Contributor

Summary

Test plan

Tested interactively and compared results to R:

$ java -jar presto-cli/target/presto-cli-*-executable.jar --catalog tpch --schema sf10

SELECT
  t_statistic(a) AS t_a,
  t_statistic(b) AS t_b,
  paired_t_statistic(a, b) as t_paired_diff
FROM (
VALUES
  (30.02, 29.89),
  (29.99, 29.93), 
  (30.11, 29.72),
  (39.97, 29.98),
  (30.01, 30.02),
  (29.99, 29.98)
) AS t (a, b);

        t_a        |       t_b        |   t_paired_diff    
-------------------+------------------+--------------------
 19.11105607133625 | 679.298607343007 | -1.069793672096585 
(1 row)

Query 20210311_162206_00001_i4ig7, FINISHED, 1 node
Splits: 6 total, 6 done (100.00%)
0:02 [0 rows, 0B] [0 rows/s, 0B/s]


$ R 
a <- c(30.02, 29.99, 30.11, 39.97, 30.01, 29.99)
b <- c(29.89, 29.93, 29.72, 29.98, 30.02, 29.98)

# paired
t.test(b, a, paired = TRUE)$statistic
##        t 
## -1.069794

# unpaired one-sample (b) 
t.test(a)$statistic
##        t 
## 19.11106

# unpaired one-sample (b)
t.test(b)$statistic
##        t 
## 679.2986 

Please make sure your submission complies with our Development, Formatting, and Commit Message guidelines. Don't forget to follow our attribution guidelines for any code copied from other projects.

Fill in the release notes towards the bottom of the PR description.
See Release Notes Guidelines for details.

== RELEASE NOTES ==

General Changes
* Added one-sample Student's t-test `t_statistic()` aggregation function
* Added paired two-sample Student's t-test `paired_t_statistic()` aggregation function

Copy link
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only some nits. Please also update the commit title to Add t_statistic and parired_t_statistic aggregation functions (uppercase first letter, remove ()). Thanks!

@InputFunction
public static void doubleInput(@AggregationState CentralMomentsState state, @SqlType(StandardTypes.DOUBLE) double x, @SqlType(StandardTypes.DOUBLE) double y)
{
double value = y - x;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the local variable is not necessary, just use updateCentralMomentsState(state, y - x); Generally speaking, if a variable is only used once, it's not necessary. Please update other places as well.

@OutputFunction(StandardTypes.DOUBLE)
public static void paired_t_statistic(@AggregationState CentralMomentsState state, BlockBuilder out)
{
long n = state.getCount();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can either remove n and use state.getCount() through out, or update the other places using state.getCount to use n instead.

@stale
Copy link

stale bot commented Apr 16, 2022

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!

@stale stale bot added the stale label Apr 16, 2022
@stale stale bot closed this Apr 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants