Adding the Poisson distribution #15814

Merged 1 commit on May 13, 2021
12 changes: 12 additions & 0 deletions presto-docs/src/main/sphinx/functions/math.rst
@@ -92,6 +92,13 @@ Mathematical Functions
    a real value and the standard deviation must be a real and positive value.
    The probability p must lie on the interval (0, 1).

.. function:: inverse_poisson_cdf(lambda, p) -> integer

    Compute the inverse of the Poisson cdf with given lambda (mean) parameter for the cumulative
    probability (p). It returns the smallest value of n such that P(N <= n; lambda) >= p.
    The lambda parameter must be a positive real number (of type DOUBLE).
    The probability p must lie on the interval [0, 1).

.. function:: normal_cdf(mean, sd, v) -> double

    Compute the Normal cdf with given mean and standard deviation (sd): P(N < v; mean, sd).
@@ -130,6 +137,11 @@ Mathematical Functions

    Returns the constant Pi.

.. function:: poisson_cdf(lambda, value) -> double

    Compute the Poisson cdf with given lambda (mean) parameter: P(N <= value; lambda).
    The lambda parameter must be a positive real number (of type DOUBLE) and value must be a non-negative integer.

.. function:: pow(x, p) -> double

    This is an alias for :func:`power`.
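
To make the two documented functions concrete, here is a minimal, self-contained Java sketch of the Poisson cdf and its inverse. It is not the PR's implementation, which delegates to Apache Commons Math's `PoissonDistribution`; the class `PoissonSketch` and its method names are hypothetical. The direct summation assumes a moderate lambda, since `Math.exp(-lambda)` underflows to zero for very large values.

```java
// Illustrative sketch only; names are hypothetical and not part of the PR.
final class PoissonSketch
{
    private PoissonSketch() {}

    // P(N <= n; lambda) = sum over k = 0..n of e^(-lambda) * lambda^k / k!
    static double cdf(double lambda, int n)
    {
        if (!(lambda > 0)) {
            throw new IllegalArgumentException("lambda must be greater than 0");
        }
        if (n < 0) {
            throw new IllegalArgumentException("value must be a non-negative integer");
        }
        double term = Math.exp(-lambda); // probability mass at k = 0
        double sum = term;
        for (int k = 1; k <= n; k++) {
            term *= lambda / k; // pmf(k) = pmf(k - 1) * lambda / k
            sum += term;
        }
        return Math.min(sum, 1.0);
    }

    // Smallest n such that P(N <= n; lambda) >= p.
    static int inverseCdf(double lambda, double p)
    {
        if (!(p >= 0 && p < 1)) {
            throw new IllegalArgumentException("p must be in the interval [0, 1)");
        }
        if (!(lambda > 0)) {
            throw new IllegalArgumentException("lambda must be greater than 0");
        }
        double term = Math.exp(-lambda);
        double sum = term;
        int n = 0;
        // Stop once the running cdf reaches p, or once the terms underflow to zero.
        while (sum < p && term > 0) {
            n++;
            term *= lambda / n;
            sum += term;
        }
        return n;
    }
}
```

With lambda = 3, `PoissonSketch.inverseCdf(3.0, 0.95)` returns 6 and `PoissonSketch.cdf(3.0, 5)` is about 0.92, in line with the values asserted in the PR's tests.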
@@ -33,6 +33,7 @@
import org.apache.commons.math3.distribution.BetaDistribution;
import org.apache.commons.math3.distribution.BinomialDistribution;
import org.apache.commons.math3.distribution.ChiSquaredDistribution;
import org.apache.commons.math3.distribution.PoissonDistribution;
import org.apache.commons.math3.special.Erf;

import java.math.BigDecimal;
@@ -761,6 +762,32 @@ public static double chiSquaredCdf(
        return distribution.cumulativeProbability(value);
    }

    @Description("Inverse of Poisson cdf given lambda (mean) parameter and probability")
    @ScalarFunction
    @SqlType(StandardTypes.INTEGER)
    public static long inversePoissonCdf(
            @SqlType(StandardTypes.DOUBLE) double lambda,
            @SqlType(StandardTypes.DOUBLE) double p)
    {
        checkCondition(p >= 0 && p < 1, INVALID_FUNCTION_ARGUMENT, "p must be in the interval [0, 1)");
        checkCondition(lambda > 0, INVALID_FUNCTION_ARGUMENT, "lambda must be greater than 0");
        PoissonDistribution distribution = new PoissonDistribution(lambda);
        return distribution.inverseCumulativeProbability(p);
    }

    @Description("Poisson cdf given the lambda (mean) parameter and value")
    @ScalarFunction
    @SqlType(StandardTypes.DOUBLE)
    public static double poissonCdf(
            @SqlType(StandardTypes.DOUBLE) double lambda,
            @SqlType(StandardTypes.INTEGER) long value)
    {
        checkCondition(value >= 0, INVALID_FUNCTION_ARGUMENT, "value must be a non-negative integer");
        checkCondition(lambda > 0, INVALID_FUNCTION_ARGUMENT, "lambda must be greater than 0");
Contributor:

Since the library API already performs this check, I'd just wrap the call in a try/catch and throw a user error.

Contributor Author:

Interesting. Since this is the approach used by all the distribution functions (normal, beta, chi-squared, binomial), do you think it should be changed there as well?
If so, could you please help with this change? I'm not experienced in Java, so I want to make sure I understand what you're proposing and which error will be thrown.

Contributor:

Just do something like:

try { ... } catch (NotStrictlyPositiveException notStrictlyPositiveException) { throw new PrestoException(GENERIC_USER_ERROR, ...); }

Look at StandardErrorCodes.java and other files to see the pattern they follow.

Contributor:

And yes, it would be good to do this for all of them.
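
The pattern suggested in this thread can be sketched as follows. To keep the example self-contained, `NotStrictlyPositive`, `UserError`, and `Distribution` are hypothetical stand-ins for Commons Math's `NotStrictlyPositiveException`, Presto's `PrestoException`, and `PoissonDistribution`; real code would catch and throw the actual types.

```java
// Illustrative sketch only; every type here is a hypothetical stand-in.
final class ErrorTranslationSketch
{
    private ErrorTranslationSketch() {}

    // Stand-in for the library's NotStrictlyPositiveException.
    static final class NotStrictlyPositive
            extends IllegalArgumentException
    {
        NotStrictlyPositive(String message)
        {
            super(message);
        }
    }

    // Stand-in for PrestoException carrying a user-error code.
    static final class UserError
            extends RuntimeException
    {
        UserError(String message, Throwable cause)
        {
            super(message, cause);
        }
    }

    // Stand-in for a distribution whose constructor validates its mean.
    static final class Distribution
    {
        private final double mean;

        Distribution(double mean)
        {
            if (!(mean > 0)) {
                throw new NotStrictlyPositive("mean must be positive: " + mean);
            }
            this.mean = mean;
        }

        double cumulativeProbability(int n)
        {
            double term = Math.exp(-mean);
            double sum = n < 0 ? 0 : term;
            for (int k = 1; k <= n; k++) {
                term *= mean / k;
                sum += term;
            }
            return Math.min(sum, 1.0);
        }
    }

    // Rely on the constructor's own validation and translate its failure
    // into a user-facing error instead of pre-checking lambda.
    static double poissonCdf(double lambda, int value)
    {
        try {
            return new Distribution(lambda).cumulativeProbability(value);
        }
        catch (NotStrictlyPositive e) {
            throw new UserError("lambda must be greater than 0", e);
        }
    }
}
```

The trade-off is that the error message is now written in one place per function while the validity rule itself lives in the library, so the two cannot drift apart.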

        PoissonDistribution distribution = new PoissonDistribution(lambda);
Contributor:

Is lambda generally going to be fixed within a query? If so, you should find a way to avoid creating a new object on every call, to reduce memory overhead.

Contributor Author:

Also interesting. As I wrote in reply to your other comment, this is the approach used by all the distribution functions (normal, beta, chi-squared, binomial). Do you think it should be changed there as well?
If so, could you please help with this change or propose how to do it?

Thanks in advance.

Contributor:

I don't have a good suggestion here :( I looked through the code but couldn't find other examples. I will look further and comment here if I find something.
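
One possible way to avoid re-creating the distribution when lambda is constant across rows is a per-thread cache of the most recently built object. This is offered only as an assumption, not something the PR adopted or the reviewers proposed. `LastValueCache` is a hypothetical helper; in the real function the factory might be something like `PoissonDistribution::new`.

```java
import java.util.function.DoubleFunction;

// Illustrative sketch only; LastValueCache is a hypothetical helper.
final class LastValueCache<T>
{
    private final DoubleFunction<T> factory;
    // One slot per thread, so concurrent callers never share mutable state.
    private final ThreadLocal<Entry<T>> last = new ThreadLocal<>();

    LastValueCache(DoubleFunction<T> factory)
    {
        this.factory = factory;
    }

    // Returns the cached object if the key matches the previous call on this
    // thread; otherwise builds a new one and remembers it.
    T get(double key)
    {
        Entry<T> entry = last.get();
        if (entry == null || Double.compare(entry.key, key) != 0) {
            entry = new Entry<>(key, factory.apply(key));
            last.set(entry);
        }
        return entry.value;
    }

    private static final class Entry<T>
    {
        final double key;
        final T value;

        Entry(double key, T value)
        {
            this.key = key;
            this.value = value;
        }
    }
}
```

A single-slot cache only helps when consecutive rows share the same lambda; if lambda varies per row, it adds a comparison and an extra allocation on every miss, so it would need measuring before adoption.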

        return distribution.cumulativeProbability((int) value);
    }

    @Description("round to nearest integer")
    @ScalarFunction("round")
    @SqlType(StandardTypes.TINYINT)
@@ -1427,6 +1427,30 @@ public void testChiSquaredCdf()
        assertInvalidFunction("chi_squared_cdf(3, -10)", "value must non-negative");
    }

    @Test
    public void testInversePoissonCdf()
    {
        assertFunction("inverse_poisson_cdf(3, 0.0)", INTEGER, 0);
        assertFunction("inverse_poisson_cdf(3, 0.3)", INTEGER, 2);
        assertFunction("inverse_poisson_cdf(3, 0.95)", INTEGER, 6);
        assertFunction("inverse_poisson_cdf(3, 0.99999999)", INTEGER, 17);

        assertInvalidFunction("inverse_poisson_cdf(-3, 0.3)", "lambda must be greater than 0");
        assertInvalidFunction("inverse_poisson_cdf(3, -0.1)", "p must be in the interval [0, 1)");
        assertInvalidFunction("inverse_poisson_cdf(3, 1.1)", "p must be in the interval [0, 1)");
        assertInvalidFunction("inverse_poisson_cdf(3, 1)", "p must be in the interval [0, 1)");
    }

    @Test
    public void testPoissonCdf()
    {
        assertFunction("round(poisson_cdf(10, 0), 2)", DOUBLE, 0.0);
        assertFunction("round(poisson_cdf(3, 5), 2)", DOUBLE, 0.92);

        assertInvalidFunction("poisson_cdf(-3, 5)", "lambda must be greater than 0");
        assertInvalidFunction("poisson_cdf(3, -10)", "value must be a non-negative integer");
    }

    @Test
    public void testWilsonInterval()
    {