-
-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add the negative binomial distribution to rand_distr. #1296
ENH: Add the negative binomial distribution to rand_distr. #1296
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The negative binomial makes no sense for r=0. But we can generalise to p=0, since our output is floating-point which has a representation for infinity?
The code style looks good.
As for the implementation, all I can say is that it correlates with the mentioned reference, 10.1007/978-1-4613-8643-8. I tried comparing with 10.1002/0471715816 (Johnson 2005, Univariate Discrete Distributions), but my understanding of statistics is lacking. Perhaps @saona-raimundo would be willing to take a look?
If p=0, the probability mass function of the distribution would be |
Hi!
Note that Wikipedia accepts Regarding the implementation, I confirm it corresponds to the citation. @dhardy, I will check your reference 10.1002/0471715816 to see if there are improved algorithms proposed. |
// and saved in the NegativeBinomial instance, because it | ||
// depends on just the parameters `r` and `p`. We have to | ||
// create a new Poisson instance for each variate generated. | ||
Poisson::<F>::new(gamma.sample(rng)).unwrap().sample(rng) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of unwrap
, one should take care of the case where gamma.sample(rng)
returns a float which should not be accepted.
I suggest introducing a loop which samples until one gets a finite sample.
The Float
trait has the method is_finite
for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this happen? The Gamma distribution should return strictly positive values, so Poisson::new
should never fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, my point was about handling infinity as the result of simulating the gamma variable.
Nothing ensures that a Gamma samples always a positive and finite float.
Not is the signature of the sample
method nor in its documentation.
At some point there was a discussion about who should handle infinity
out of the simulation: the library or the user. I thought the decision was that the user should handle infinity floats, maybe I am wrong. This is why I suggest handling a possible infinite value here with a loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I need to fix this. If p
is extremely small (e.g. 1e-40), then the scale passed to Gamma
is huge (1e+40), and with such a scale, Gamma
will generate samples that are infinity.
A simple loop would not be safe if we don't have a bound on how frequently infinity is generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was not even thinking on extreme values, just the unwrap
.
The thing is, if gamma samples infinity, then the Poisson "should" also infinity.
Then, instead of a loop, one should introduce an if
checking if the gamma samples infinity.
If it does, return infinity directly, if it does not, sample the Poisson (created with new
and unwrap
).
The reference 10.1002/0471715816 "Univariate Discrete Distributions" is really nice!
The reference If I am not mistaken, |
For the normal distribution, we are using tables (Ziggurat algorithm). However, I agree that the approach here is fine. |
Great, and thanks for the review. Then are we agreed to merge this (once the above is corrected)? I didn't review in detail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks! We just need to update the changelog.
Looks ready to merge @WarrenWeckesser? |
@dhardy, I'm still looking into the issue that @saona-raimundo raised here. I'm checking how extreme values of the parameters might break things. |
@WarrenWeckesser are you still working on this? |
I've been away from this (and most of my other open source work) for much of last year, but I haven't forgotten about it. I have a project that I need to finish up before I can get back to this. That might take a few weeks. |
Closing as stale. Feel free to re-open. |
No description provided.