std::rand::distributions::Exp1 is biased #10084
Comments
I'm working on this (as actively as I can).
huonw added a commit to huonw/rust that referenced this issue on Oct 27, 2013:
The Ziggurat implementation of Exp1 is incorrect: it appears to have an expected value of around 0.994 rather than 1, and the density of a large number of sample means does not match the normal distribution expected by the central limit theorem (specifically, it is a normal distribution centred at 0.994).

This replaces Ziggurat with a "plain" inverse-CDF implementation for now. This implementation has the expected properties, although it is approximately 4 times slower.

cc rust-lang#10084 (this does *not* fix it).
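The "plain" inverse-CDF approach mentioned in the commit message can be sketched as follows (a minimal illustration, not the actual committed code; the function name is mine). Since the Exp(1) CDF is F(x) = 1 - e^{-x}, inverting a uniform variate u in [0, 1) gives an exponential sample:

```rust
/// Minimal sketch of inverse-CDF Exp(1) sampling (illustration only):
/// F(x) = 1 - e^{-x}, so F^{-1}(u) = -ln(1 - u).
fn exp1_inv_cdf(u: f64) -> f64 {
    -(1.0 - u).ln()
}

fn main() {
    // The median of U(0,1) maps to the median of Exp(1), which is ln 2.
    println!("{}", exp1_inv_cdf(0.5)); // ≈ 0.6931
}
```

The `ln` call on every sample is what makes this roughly 4x slower than Ziggurat, which avoids the transcendental function on the vast majority of draws.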
bors added a commit that referenced this issue on Nov 1, 2013:
The code was using (in the notation of Doornik 2005) `f(x_{i+1}) - f(x_{i+2})` rather than `f(x_i) - f(x_{i+1})`. This corrects that, and removes the `F_DIFF` tables which caused the problem in the first place.

The `F_DIFF` tables are a micro-optimisation (in theory, they could easily be a micro-pessimisation): that `if` gets hit about 1% of the time for Exp/Normal, and the rest of the condition involves RNG calls and a floating-point `exp`, so it is unlikely that saving a single FP subtraction will be very useful (especially as more tables mean more memory reads and higher cache pressure, as well as taking up space in the binary, although only ~2k in this case).

Closes #10084. Notably, contrary to what that issue suggests, this wasn't a problem with the Exp tables alone. It affected Normal too, but since Normal is symmetric there was no bias in the mean (the bias was equal on the positive and negative sides and so cancelled out); it was instead visible as a variance slightly lower than it should be.

New plot: (image not captured in this text)

I've started writing some tests in [huonw/random-tests](https://github.com/huonw/random-tests) (not in the main repo because, due to randomness, they can and do fail occasionally; but it is on Travis and Rust-CI, so it will hopefully track the language). Unsurprisingly, they're [currently failing](https://travis-ci.org/huonw/random-tests/builds/13313987) (both exp and norm fail: the former due to both mean and variance, the latter due to variance alone), but they pass at the 0.01 level reliably with this change. (Currently the only test is essentially a quantitative version of the plots I've been showing, run on the `f64` `Rand` instance (uniform 0 to 1) and on the Normal and Exp distributions.)
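Concretely, the off-by-one can be illustrated with a schematic version of the Ziggurat wedge test (my own notation and function names, not the actual std::rand internals). A layer spans x in (x_lo, x_hi); a candidate that falls in the wedge is accepted by interpolating the pdf between the layer's *own* boundary values — the bug interpolated between the next layer's values instead:

```rust
// Exp(1) density (up to normalisation); decreasing, so f(x_hi) < f(x_lo).
fn f(x: f64) -> f64 {
    (-x).exp()
}

/// Correct wedge test: interpolate between f(x_lo) and f(x_hi), i.e.
/// between f(x_i) and f(x_{i+1}) in Doornik's notation, and accept the
/// candidate if the interpolated height lies under the true density.
/// The bug used f(x_{i+1}) and f(x_{i+2}) instead (shifted by one layer).
fn wedge_accept(x_lo: f64, x_hi: f64, u: f64, candidate: f64) -> bool {
    f(x_hi) + u * (f(x_lo) - f(x_hi)) < f(candidate)
}

fn main() {
    // With u = 0 the threshold is f(x_hi), so only candidates left of
    // x_hi can be accepted; with u = 1 it tightens to f(x_lo).
    println!("{}", wedge_accept(0.5, 1.0, 0.0, 0.9)); // accepted
    println!("{}", wedge_accept(0.5, 1.0, 0.0, 1.1)); // rejected
}
```

Shifting both table lookups down one layer makes the acceptance region systematically wrong in the wedges, which is exactly the kind of small, one-sided error that shows up as a ~0.4% bias in the mean.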
The following program generates 10,000 samples of the sample mean of 100,000 Exponential(1) random variates.
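The program itself was not captured in this text; the sketch below is a hedged reconstruction of the kind of harness described. The names (`XorShift64`, `exp1`) and the constants are mine, a tiny inline xorshift generator stands in for the 2013-era std::rand API so the sketch is self-contained, and the constants are scaled down from the issue's 10,000 × 100,000 so it runs quickly:

```rust
const NUM_MEANS: usize = 100;           // the issue used 10_000
const SAMPLES_PER_MEAN: usize = 10_000; // the issue used 100_000

/// Minimal xorshift64 RNG standing in for std::rand (assumption, not
/// the original code).
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    /// Uniform f64 in [0, 1).
    fn next_f64(&mut self) -> f64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        (x >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Exponential(1) variate via the inverse CDF: X = -ln(1 - U).
fn exp1(rng: &mut XorShift64) -> f64 {
    -(1.0 - rng.next_f64()).ln()
}

fn main() {
    let mut rng = XorShift64 { state: 0x9E3779B97F4A7C15 };
    let mut grand_total = 0.0;
    for _ in 0..NUM_MEANS {
        let mut sum = 0.0;
        for _ in 0..SAMPLES_PER_MEAN {
            sum += exp1(&mut rng);
        }
        let mean = sum / SAMPLES_PER_MEAN as f64;
        println!("{}", mean); // one sample mean per line
        grand_total += mean;
    }
    println!("overall mean: {}", grand_total / NUM_MEANS as f64);
}
```

With an unbiased sampler the overall mean printed at the end should converge to 1; the issue is that the original Ziggurat-based `Exp1` converged to roughly 0.9964 instead.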
The following plot shows the kernel density estimate of the above (solid line) against the expected density based on the central limit theorem (dashed line). They should match fairly closely with samples of this size. (The vertical line is the expected mean value of the samples, i.e. the peak should lie there.)
Note that the scale is slightly deceptive and the bias isn't huge: the sample mean is converging to approximately 0.9964 rather than 1. However, it is highly unlikely that this is due to chance with such large samples (1 billion samples in total), and my experimentation has shown it to be stable, i.e. running it several times with many different values of `NUM_MEANS` and `SAMPLES_PER_MEAN` all gives plots similar to the above, with overall means around 0.9964. (Plot created with R.)
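A quick back-of-the-envelope check of "highly unlikely by chance" (my arithmetic, not from the original issue): Exp(1) has standard deviation 1, so by the central limit theorem the grand mean of all N = 10^9 samples has standard deviation

    sigma_grand = 1 / sqrt(10^9) ≈ 3.2 × 10^-5

An observed grand mean of 0.9964 therefore sits roughly 0.0036 / (3.2 × 10^-5) ≈ 110 standard deviations below 1, far outside anything sampling noise could plausibly produce.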
FWIW, generating a similar sample in R matches the expected density almost exactly.)
Having done a very small amount of investigation, I think this might be an error in the Ziggurat tables for exp, or an error in the implementation of Ziggurat for non-symmetric random variables. (Reasons: the `f64` generator seems to match a uniform(0,1) distribution closely, and the normal generator (which also uses Ziggurat) matches its expected distribution closely too. That said, I didn't check either of these super closely.)