Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Rng::pick method #198

Closed
wants to merge 10,000 commits into from
Closed

Add Rng::pick method #198

wants to merge 10,000 commits into from

Conversation

nvzqz
Copy link
Contributor

@nvzqz nvzqz commented Dec 1, 2017

Acts similarly to choose, except it is generic over all T: IntoIterator.

Jorge Aparicio and others added 30 commits January 29, 2015 07:49
Conflicts:
	src/libcoretest/iter.rs
Also some tidying up of a bunch of crate attributes
Nothing actually compiles here because `extern crate rand` is apparently
still importing the built-in one.
These are not necessary since the crate doesn’t participate in staged_api
Add random keyword to Cargo.toml
Make version bound less strict
Put benchmarks into a black box
@nvzqz
Copy link
Contributor Author

nvzqz commented Dec 1, 2017

I would like to discuss an alternative approach that would result in a breaking change.

A better way (imo) of doing this would be to define a trait Choose.

https://play.rust-lang.org/?gist=49c32c8585108afae51e9642c6872f68&version=nightly

RandomKit does something similar via the RandomRetrievable trait as far as library API goes.

trait Choose {
    type Item;

    fn choose<R: Rng>(self, rng: &mut R) -> Option<Self::Item>;
}

impl<T: IntoIterator> Choose for T {
    type Item = <T as IntoIterator>::Item;

    default fn choose<R: Rng>(self, rng: &mut R) -> Option<T::Item> {
        let mut value = None;
        for (i, elem) in self.into_iter().enumerate() {
            if i == 0 || rng.gen_range(0, i + 1) == 0 {
                value = Some(elem);
            }
        }
        value
    }
}

impl<'a, T> Choose for &'a [T] {
    fn choose<R: Rng>(self, rng: &mut R) -> Option<&'a T> {
        if self.is_empty() {
            None
        } else {
            Some(&self[rng.gen_range(0, self.len())])
        }
    }
}

impl<'a, T> Choose for &'a mut [T] {
    fn choose<R: Rng>(self, rng: &mut R) -> Option<&'a mut T> {
        if self.is_empty() {
            None
        } else {
            Some(&mut self[rng.gen_range(0, self.len())])
        }
    }
}

The downside is that this relies on specialization, which is currently unstable.

This would remove the need for the choose_mut mutable variant of choose within Rng:

trait Rng {
    ...

    fn choose<T: Choose>(&mut self, iter: T) -> Option<T::Item> where Self: Sized {
        iter.choose()
    }

    ...
}

@dhardy
Copy link
Member

dhardy commented Dec 1, 2017

See this code

Acts similarly to choose, except it is generic over all T: IntoIterator.
@nvzqz
Copy link
Contributor Author

nvzqz commented Dec 4, 2017

I think there should be some method that works with any iterator, not just slices.

@dhardy
Copy link
Member

dhardy commented Dec 4, 2017

I think an iterator implementation could be added. But we shouldn't rely on that alone, because iterators are O(n) (just getting the length requires that; size_hint() isn't a reliable alternative).

@nvzqz
Copy link
Contributor Author

nvzqz commented Dec 4, 2017

It's a much better alternative to collecting n elements into a Vec and retrieving a random value, which is the naive solution that people might reach for.

@dhardy
Copy link
Member

dhardy commented Dec 4, 2017

Not necessarily. If the length is known in advance, one can do iter.nth(rng.gen_range(0, length)).unwrap() (still O(n) unless the iterator code can be unravelled, but very efficient otherwise). If the length is unknown, the options are collect() and use the slice impl, or an algorithm like in this PR. But the problem with this above algorithm is that it must sample from the RNG for each item, which may or may not be faster than copying the whole lot to a buffer. A compromise between the two approaches using a fixed size buffer may be the fastest approach.

BTW I'm not going to accept a function like this being added into lib.rs; please start from #195 and use the seq module.

@burdges
Copy link
Contributor

burdges commented Dec 4, 2017

I think @dhardy is right that iter.nth(rng.gen_range(0, iter.len())) is the right way to do this, as it expresses the running time which we cannot improve, but users can improve by using a distribution that favors earlier elements. Also NLL will make the mut version of this one liner work eventually.

@dhardy
Copy link
Member

dhardy commented Dec 4, 2017

@burdges len is only available on ExactSizeIterator. I said if the length is known in advance ...

@dhardy dhardy added P-medium F-new-int Functionality: new, within Rand P-low Priority: Low and removed P-medium labels Mar 12, 2018
@pitdicker pitdicker mentioned this pull request Mar 29, 2018
@pitdicker pitdicker closed this Apr 4, 2018
@pitdicker
Copy link
Contributor

Sorry this PR got so suddenly closed. The cause was that we cleaned up the git history (#350) and forced-pushed it to master. Now there was a difference of about 30.000, and I suppose GitHub just gave up...

This PR, together with two others, stayed open for so long with little attention because it touches on an area of Rand that needs some design first. That is now slowly getting explored in dhardy#82.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F-new-int Functionality: new, within Rand P-low Priority: Low
Projects
None yet
Development

Successfully merging this pull request may close these issues.