std::rand::distributions::Exp1 is biased #10084

Closed
huonw opened this issue Oct 26, 2013 · 1 comment · Fixed by #10196

Comments

huonw commented Oct 26, 2013

The following program generates 10,000 samples of the sample mean of 100,000 Exponential(1) random variates.

use std::rand::{StdRng, Rng};
use std::rand::distributions::Exp1;

static SAMPLES_PER_MEAN: uint = 100_000;
static NUM_MEANS: uint = 10_000;

fn main() {
    let mut rng = StdRng::new();
    let n = SAMPLES_PER_MEAN as f64;

    for _ in range(0, NUM_MEANS) {
        let mut s = 0.;
        for _ in range(0, SAMPLES_PER_MEAN) {
            s += *rng.gen::<Exp1>();
        }
        println!("{:.20}", s / n)
    }
}
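(For readers trying to reproduce this today: `std::rand` no longer exists in the standard library. The sketch below reruns the same experiment in modern, dependency-free Rust, substituting a small xorshift64 PRNG and inverse-CDF sampling, `X = -ln(U)`, for the old `Exp1`; the PRNG choice and seed are arbitrary, not what `StdRng` used.)

```rust
// Self-contained sketch: xorshift64 PRNG (Marsaglia's 13/7/17 variant) plus
// inverse-CDF Exp(1) sampling, standing in for the old std::rand Exp1.
struct XorShift64 {
    state: u64, // must be nonzero
}

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    // Uniform in (0, 1]: +1 on the 53-bit mantissa range avoids ln(0).
    fn next_f64(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }

    // Exp(1) via the inverse CDF: X = -ln(U).
    fn next_exp1(&mut self) -> f64 {
        -self.next_f64().ln()
    }
}

fn main() {
    let mut rng = XorShift64 { state: 0x9E37_79B9_7F4A_7C15 };
    let n = 100_000;
    let mean: f64 = (0..n).map(|_| rng.next_exp1()).sum::<f64>() / n as f64;
    // An unbiased Exp(1) sampler should give a mean near 1; the standard
    // error of the mean over 100_000 samples is about 1/sqrt(n) ≈ 0.003.
    println!("sample mean: {:.4}", mean);
    assert!((mean - 1.0).abs() < 0.05);
}
```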

The following plot shows the kernel density estimate of the above (solid line) against the expected density based on the central limit theorem (dashed line). They should match fairly closely with samples of this size. (The vertical line marks the expected mean of the samples, i.e. the peak should lie there.)

[Plot: exp-density — sample kernel density estimate vs. expected normal density]

Note that the scale is slightly deceptive and the bias isn't huge: the sample mean is converging to approximately 0.9964 rather than 1. However, it is highly unlikely that this is due to chance with such large samples (1 billion in total), and my experimentation has shown it to be stable: running it several times with many different values of NUM_MEANS and SAMPLES_PER_MEAN gives plots similar to the above, with overall means around 0.9964.
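(A quick back-of-the-envelope check of the "highly unlikely by chance" claim: Exp(1) has standard deviation 1, so over roughly 10^9 samples the standard error of the overall mean is 1/sqrt(10^9), and an observed mean of 0.9964 sits over a hundred standard errors from 1. The figures below are from this issue; the arithmetic is just the standard z-score.)

```rust
fn main() {
    // Observed overall mean vs. the true Exp(1) mean of 1, over ~1e9 samples.
    let observed = 0.9964_f64;
    let n = 1.0e9_f64;
    // Exp(1) has standard deviation 1, so the sample mean's standard error
    // is 1/sqrt(n) ≈ 3.16e-5.
    let se = 1.0 / n.sqrt();
    let z = (observed - 1.0).abs() / se;
    println!("z ≈ {:.0} standard errors from 1", z);
    assert!(z > 100.0);
}
```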

(Plot created with R via:

dat <- read.table('out.csv')$V1
x <- seq(0.98, 1.02, by=0.0001)
png('exp-density.png', width=600, height=600)
plot(density(dat), main='Density of 10000 means of 100000 samples of Exp(1)', lty=1)
lines(x, dnorm(x, 1, 1/sqrt(100000)), lty=2)
lines(c(1.0, 1.0), c(-1000, 1000), lty=3)
legend('topright', c('Sample density', 'True density'), lty=c(1,2))
dev.off()

FWIW, generating a similar sample in R matches the expected density almost exactly.)


Having done a very small amount of investigation, I think this might be an error in the Ziggurat tables for exp, or an error in the implementation of Ziggurat for non-symmetric random variables. (Reasons: the f64 generator seems to match a uniform(0,1) closely, and the normal generator (which also uses Ziggurat) matches its expected distribution closely too. That said, I didn't check either of these super closely.)

ghost assigned huonw Oct 26, 2013
huonw commented Oct 26, 2013

I'm working on this (as actively as I can).

huonw added a commit to huonw/rust that referenced this issue Oct 27, 2013
The Ziggurat implementation of Exp1 is incorrect: it appears to have an
expected value of around 0.994 rather than 1 and the density of a large
number of sample means does not match the normal distribution expected
by the central limit theorem (specifically, it's a normal distribution
centred at 0.994).

This replaces Ziggurat with a "plain" inverse-cdf implementation for
now. This implementation has the expected properties, although it is
approximately 4 times slower.

cc rust-lang#10084 (this does *not* fix it).
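(The "plain" inverse-CDF implementation that commit describes can be sketched as follows; this is an illustration of the technique, not the actual patch. For Exp(1), F(x) = 1 - e^(-x), so F⁻¹(p) = -ln(1 - p), and passing a Uniform(0,1) variate through F⁻¹ yields an Exp(1) variate.)

```rust
fn main() {
    // Inverse CDF of Exp(1): F^{-1}(p) = -ln(1 - p).
    let inv_cdf = |p: f64| -(1.0 - p).ln();

    // Spot-check the inverse against the CDF: F(F^{-1}(p)) == p.
    for &p in &[0.1, 0.5, 0.9] {
        let x = inv_cdf(p);
        let f = 1.0 - (-x).exp();
        assert!((f - p).abs() < 1e-12);
    }

    // The median of Exp(1) is ln 2 ≈ 0.693147.
    println!("median of Exp(1): {:.6}", inv_cdf(0.5));
}
```

This trades the Ziggurat table lookups for one `ln` call per sample, which is consistent with the roughly 4x slowdown the commit message reports.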
bors added a commit that referenced this issue Nov 1, 2013
The code was using (in the notation of Doornik 2005) `f(x_{i+1}) -
f(x_{i+2})` rather than `f(x_i) - f(x_{i+1})`. This corrects that, and
removes the F_DIFF tables which caused this problem in the first place.

The `F_DIFF` tables are a micro-optimisation (in theory, they could
easily be a micro-pessimisation): that `if` gets hit about 1% of the
time for Exp/Normal, and the rest of the condition involves RNG calls
and a floating point `exp`, so it is unlikely that saving a single FP
subtraction will be very useful (especially as more tables means more
memory reads and higher cache pressure, as well as taking up space in
the binary (although only ~2k in this case)).

Closes #10084. Notably, unlike what that issue suggests, this wasn't a
problem with the Exp tables. It affected Normal too, but since it is
symmetric, there was no bias in the mean (as the bias was equal on the
positive and negative sides and so cancelled out) but it was visible as
a variance slightly lower than it should be.
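(The mean-cancels-but-variance-doesn't point can be illustrated with a toy distortion, unrelated to the actual ziggurat code: apply a symmetric per-sample shrink to a symmetric set of values, and the mean stays put while the variance moves.)

```rust
fn main() {
    // A symmetric grid of values standing in for draws from a symmetric
    // distribution centred at 0.
    let xs: Vec<f64> = (-1000..=1000).map(|i| i as f64 / 100.0).collect();
    // Shrink each sample toward zero by 1% -- a symmetric distortion,
    // loosely analogous to a bias that is equal on both tails.
    let distorted: Vec<f64> = xs.iter().map(|&x| 0.99 * x).collect();

    let mean = |v: &[f64]| v.iter().sum::<f64>() / v.len() as f64;
    let var = |v: &[f64]| {
        let m = mean(v);
        v.iter().map(|x| (x - m).powi(2)).sum::<f64>() / v.len() as f64
    };

    // The mean is unchanged (both ~0)...
    println!("mean: {:.6} -> {:.6}", mean(&xs), mean(&distorted));
    // ...but the variance shrinks by a factor of 0.99^2.
    println!("var:  {:.4} -> {:.4}", var(&xs), var(&distorted));
    assert!(mean(&distorted).abs() < 1e-9);
    assert!(var(&distorted) < var(&xs));
}
```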

New plot:

![exp-density](https://f.cloud.github.com/assets/1203825/1445796/42218dfe-422a-11e3-9f98-2cd146b82b46.png)

I've started writing some tests in [huonw/random-tests](https://github.com/huonw/random-tests) (not in the main repo because they can and do fail occasionally, due to randomness, but the repo is on Travis and Rust-CI, so it will hopefully track the language). Unsurprisingly, they're [currently failing](https://travis-ci.org/huonw/random-tests/builds/13313987): both exp and norm fail, the former on both mean and variance, the latter on variance only. With this change they pass reliably at the 0.01 level.

(Currently the only test is essentially a quantitative version of the plots I've been showing, which is run on the `f64` `Rand` instance (uniform 0 to 1), and the Normal and Exp distributions.)
@huonw huonw closed this as completed in 1e2283d Nov 1, 2013
@huonw huonw removed their assignment Jun 16, 2014