Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for deterministic random number generation #131606

Open
1 of 4 tasks
joboet opened this issue Oct 12, 2024 · 28 comments
Open
1 of 4 tasks

Tracking Issue for deterministic random number generation #131606

joboet opened this issue Oct 12, 2024 · 28 comments
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@joboet
Copy link
Member

joboet commented Oct 12, 2024

Feature gate: #![feature(deterministic_random_chacha8)]

This is a tracking issue for DeterministicRandomSource, a deterministic random number generation that will always generate the same values for a given seed.

Public API

// core::random;

pub struct DeterministicRandomSource { .. }

impl DeterministicRandomSource {
    pub const fn from_seed(seed: [u8; 32]) -> DeterministicRandomSource;
    pub const fn seed(&self) -> &[u8; 32];
}

impl RandomSource for DeterministicRandomSource { .. }
impl Random for DeterministicRandomSource { .. }

Steps / History

Unresolved Questions

  • Is ChaCha8 the right random number generator to use? Given that this RNG should not be used for cryptographic purposes, ChaCha8 might be a bit overkill.
  • Does the seed function make sense?

Footnotes

  1. https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html

@joboet joboet added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Oct 12, 2024
joboet added a commit to joboet/rust that referenced this issue Oct 12, 2024
ACP: rust-lang/libs-team#394
Tracking issue: rust-lang#131606

The version implemented here uses ChaCha8 as RNG. Whether this is the right choice is still open for debate, so I've included the RNG name in the feature gate to make sure people need to recheck their code if we change the RNG.

Also, I've made one minor change to the API proposed in the ACP: in accordance with [C-GETTER](https://rust-lang.github.io/api-guidelines/naming.html#getter-names-follow-rust-convention-c-getter), `get_seed` is now named `seed`.
@Noratrieb
Copy link
Member

Given that this RNG should not be used for cryptographic purposes, ChaCha8 might be a bit overkill.

https://go.dev/blog/randv2 might be relevant here

joboet added a commit to joboet/rust that referenced this issue Oct 12, 2024
ACP: rust-lang/libs-team#394
Tracking issue: rust-lang#131606

The version implemented here uses ChaCha8 as RNG. Whether this is the right choice is still open for debate, so I've included the RNG name in the feature gate to make sure people need to recheck their code if we change the RNG.

Also, I've made one minor change to the API proposed in the ACP: in accordance with [C-GETTER](https://rust-lang.github.io/api-guidelines/naming.html#getter-names-follow-rust-convention-c-getter), `get_seed` is now named `seed`.
@juntyr
Copy link
Contributor

juntyr commented Oct 12, 2024

If a counter-based PRNG is chosen instead, exposing the current state (in addition or instead of?) the original seed would be valuable

@juntyr
Copy link
Contributor

juntyr commented Oct 12, 2024

Could the source also support seeding from a u64 (e.g. adopting how rand expands the seed), so that there is one well-supported approaches for this very common use case?

@hanna-kruppe
Copy link
Contributor

Is ChaCha8 the right random number generator to use? Given that this RNG should not be used for cryptographic purposes, ChaCha8 might be a bit overkill.

Since it's always supposed to give the same output forever after stabilization, it seems prudent to choose an algorithm that can be expected to last for a decade or more without regrets. It would be awkward to deprecate this part of std because it later turns out to have some flaws that can't be fixed without breaking reproducibility (see the Go math/rand/v2 story). A well-established cipher that's considered secure today almost certainly won't turn out have major flaws as a source of statistical randomness: any feasible way to distinguish the output from random is a big deal for cryptanalysis, and even an "academic break" (say, key recovery in 2^100 time) doesn't necessarily mean anything about the suitability for Monte Carlo simulations. The same can't be said about the myriad of non-cryptographic designs, where it's often only a matter of time and eyeballs until serious flaws are discovered.

For example, AES (Rijndael) is from 1998 and still going strong while the non-cryptographic MT19937 from 1997 was very popular for many years but is now considered flawed and obsolete. ChaCha is from 2008. Several years later, people were still publishing new non-crypto RNGs that fail TestU01, a suite of statistical tests dating to 2007--2009.

@dhardy
Copy link
Contributor

dhardy commented Nov 28, 2024

Forgive me, but why re-invent this functionality in the std lib?

Provision of an OS-getrandom API makes sense since much of the code is in std already. Provision of an alternative to rand::rng (aka ThreadRng) may also make sense: it's a single API which may make use of the OS's random source to varying (platform-dependant) extent.

Yes, I get it: rand is complicated, so it's tempting to start fresh. I fear you'd eventually end up in a similar place however, after multiple iterations of additional frequently-requested functionality.

PRNGs

There is no particular best PRNG. ChaCha8 should be fine for this use-case, but it's fundamentally a block-based RNG whereas a word-based PRNG like Xoshiro or PCG will be substantially faster for many use-cases. Since this is explicitly about a user-seeded PRNG with user-managed-state there is no good reason not to let the user choose the algorithm too (from some set of options).

The name DeterministicRandomSource does not imply portable results. rand::rngs::SmallRng is deterministic but not portable for example (different 32-bit and 64-bit impls). If you only want one generator with portable results, name it PortableRandomSource or maybe ReproducibleRandomSource to avoid the confusion.

Random trait

The proposed trait Random appears to be the equivalent of StandardUniform (possibly restricted to primitive integer types). This is only one form of random-value-generation, and while it's mostly easy to support (there may be questions around precision for FP types), it's not that useful on its own.

Uniform ranged sampling is the obvious other application. There are quite a few algorithms for this. If you want reproducible outputs, pick one of these and stick with it (or keep the impl unstable until the algorithm is fixed). Now implement for f32, Duration, char, simd::i16x8 etc.

On the topic of reproducibility, we recently removed support for sampling isize and usize in rand due to multiple reports of it being a portability hazard. Exception: we do support usize sampling for fixed ranges but use 32-bit sampling wherever possible.

Additional scope?

There are several obvious possibilities for scope creep:

  • shuffling slices
  • choosing one or multiple elements of slices
  • choosing one or multiple elements of iterators
  • sampling random alphanumeric values
  • sampling a bool with a defined probability or ratio
  • weighted sampling of sequences

This is most of what we elected to keep in rand (vs rand_distr). If some alternative to ThreadRng (now rand::rng()) were to be implemented within std, then rand could be shrunk down to almost just this (we'd no longer need the getrandom dependency, we could merge a simpler RNG trait from rand_core).

Proposal

What I'd propose therefore is:

  • Stabilise trait RandomSource and DefaultRandomSource from Tracking Issue for secure random data generation in std #130703. These serve cryptography applications and other needs for a random-byte-generator Edit: this trait may not be the best design; see discussion below
  • We drop ThreadRng along with rand::rngs::OsRng, ReseedingRng and likely StdRng, thus removing everything "cryptographic" from rand
  • We merge a simpler RngCore trait into rand, possibly just with next_u32 and next_u64, and remove the rand_core dependency
  • We implement rand::rngs::RngCore for all RandomSource and make rand::rng a wrapper over DefaultRandomSource
  • We also provide deterministic and portable PRNGs in rand::rngs
  • Publish a new rand version with significantly less code and dependencies

I believe this would reduce many people's issues with rand, namely that it's quite large and complex. (Not completely though; there would still be significant code involved in impls for Uniform etc.)

By itself, this wouldn't remove all usages of unsafe in rand; that is another issue which I believe is addressable (though possibly not without performance impact), if people care sufficiently about usage of unsafe outside std.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Nov 28, 2024

Under your proposal, why even have the RandomSource trait at all? It wouldn't even work very well as a vocabulary type if rand::RngCore would still exist and there's a impl<T: RandomSource> rand::RngCore for T. That blanket impl rules out directly implementing both traits for any given generator, and many generators will want to implement next_{u32,u64} directly instead of the detour through RandomSource::fill_bytes.

And if std drops the trait, then DefaultRandomSource could be simplified down to std::random::fill_bytes(). That's also the primary interface of the getrandom crate, modulo fallibility.


I can see how this would remove some of the most tricky parts of rand's internals/dependency tree. But it would not address the stated motivation of rust-lang/libs-team#394 - practically every use of randomness still requires a third-party crate, or manually re-implementing non-trivial parts of such a crate (likely in worse ways). Even getting a uniformly random usize from a byte stream has subtle portability/reproducibility pitfalls, as you point out. There is a risk of slippery-sloping the standard library into re-inventing all of rand in the limit. But the other extreme also has a risk: if you give people who want to limit themselves to the standard library just a raw byte stream, then many will implement basic operations like "pick an element from a slice" roughly like this:

let mut buf = [0; size_of::<usize>()];
DefaultRandomSouce::default().fill_bytes(&mut buf);
// ^ or a hand-rolled seedable PRNG of choice
let r = usize::from_ne_bytes(buf);
let elem = slice[r % slice.len()];

... and that's a worse outcome than std providing std::random::range(0..slice.len()) or something at roughly that level of abstraction.

@dhardy
Copy link
Contributor

dhardy commented Nov 29, 2024

Under your proposal, why even have the RandomSource trait at all? It wouldn't even work very well as a vocabulary type if rand::RngCore would still exist and there's a impl<T: RandomSource> rand::RngCore for T. That blanket impl rules out directly implementing both traits for any given generator, and many generators will want to implement next_{u32,u64} directly instead of the detour through RandomSource::fill_bytes.

This is a good point, but I think it's workable (though there is possibly reason to keep RngCore::fill_bytes too).

For block-based PRNGs like ChaCha as well as for OsRng / DefaultRandomSource (whether or not this uses a local cache), I see no reason to implement next_{u32,u64} directly.

For word-based PRNGs, it's questionable whether these should implement the RandomSource trait. Firstly, is there any implication regarding security of the output as there is with rand::CryptoRng? Second, the only reasons to use a word-based PRNG are simplicity of implementation, small state size, and performance of word-sized outputs. I can think of some cases where it might be useful having a fill_bytes type interface on such a PRNG (we use this in rand's SIMD implementations, and it might be used to seed another PRNG, though one needs to be careful to avoid correlations there), but for the most part I would expect a block-based PRNG to be a better choice when significant usage of byte-output is required. Thus, I think it is reasonable for word-based PRNGs not to implement RandomSource.

But it would not address the stated motivation of rust-lang/libs-team#394 - practically every use of randomness still requires a third-party crate, or manually re-implementing non-trivial parts of such a crate (likely in worse ways).

This goes for some other areas too, e.g. std::time can only tell you relative time, not local time, because that's complicated (though I suspect most uses of chrono will only use a small subset of its functionality like Local and NaiveDateTime). Similarly, I think an interface for seeding a randomised hash state or cryptographic key fits well within std, but that including machinery for generating random values is a slippery slope.

But the other extreme also has a risk: if you give people who want to limit themselves to the standard library just a raw byte stream, then many will implement basic operations like "pick an element from a slice" roughly like this:

I mean, we can tell them they shouldn't do that. But if you do want such functionality, then:

// surely it's better to support this:
let elt = slice.choose(&mut rng);
// instead of expecting people to write this:
let elt = slice[rng.random_range(..slice.len())];

? (This doesn't remove the need for fn random_range but is an obvious extension.)

And even if the above functionality is in std, I still don't see much need for trait Random, except possibly as a helper to implement the above. Random is easy to implement but most of the time not what people actually want to use (except possibly fuzzing, but arguably that might also want a different API).

@newpavlov
Copy link
Contributor

newpavlov commented Nov 29, 2024

I believe only the following things should be in std/core:

  • Secure "system" source.
  • Insecure "system" source used for stuff like seeding HashMaps and non-security-critical PRNGs.
  • ThreadRng, i.e. cryptographically strong and reasonably fast RNG periodically reseeded from the secure "system" source. It's implementation could benefit from the unstable #[thread_local] feature.
  • RNG traits to generalize over the previous items.

The "system" sources should allow registration of an alternative implementation similarly to #[global_allocator]. The latter 3 items can be considered "optional", but I think it's reasonable to have them as part of std. It also may be worth to add a weaker version of ThreadRng (i.e. SmallRng periodically reseeded from the "insecure" source), but its benefits are less clear.

Everything else should be just part of a third-party crate like rand.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Nov 30, 2024

@dhardy

For block-based PRNGs like ChaCha as well as for OsRng / DefaultRandomSource (whether or not this uses a local cache), I see no reason to implement next_{u32,u64} directly.

OS interfaces are generally byte-based, but anything ChaCha-based (and related constructions like Blake3 used as XOF) does all the computations over words. If your output buffer is [u32; N] anyway you might as well read from it in word sized chunks. rand_chacha does that via BlockRng! Even when the internal buffer is byte-based, it's easier to implement a good fast path for word-sized reads from that buffer than to make sure that the generic fill_bytes loop is optimized into something reasonable for small fixed-size reads. (And you have the option of simplifying it by skipping some bytes when the read would straddle a block boundary, which may be significantly faster when that case occurs frequently.)

Thus, I think it is reasonable for word-based PRNGs not to implement RandomSource.

This would make RandomSource much less useful than rand::Rng as a bound or trait object. Security-sensitive code that would be okay with the restriction to blocks of bytes will frequently just use a single, trusted source instead of abstracting over it. So what use cases are left? Who would implement such a trait, who would use it, and why do these two groups of people need the trait to be in the standard library instead of a crates.io library?

I mean, we can tell them they shouldn't do that. But if you do want such functionality, then [surely it's better to support slice.choose() than expecting people to sample 0..slice.len()]

Just telling people to not do it is unhelpful and unlikely to be heeded. I've done the modulo thing myself more than once, in full awareness of its issues, just because it was more expedient in that case and I judged it to be good enough. Including "random integer in range" is a sweet spot on the slippery slope to re-inventing all of rand because it's low cost and high value to provide:

  • Doing better than rand_u32() % n is just tricky and subtle enough that any given user is very unlikely to do it when needed. Even more so if they don't already know a better algorithm (the vast majority won't), or aren't even aware that there's a problem with modulo.
  • However, it's a small and simple interface backed by a very limited amount of (tricky, but completely portable and low-maintenance) code, so it's a much smaller addition to the standard library than basically anything else in this space.
  • Once this operation is available, a lot of other tasks becomes much easier to implement correctly. While slice.choose() is nicer, implementing it in terms of "random integer in range" is just a small amount of boilerplate and no harder to get wrong than any other slice indexing. Having random-in-range available is even useful for decidedly less trivial tasks like shuffling a slice (I'd rather re-implement Fisher-Yates than any of the random-in-range algorithms).

I think the situation with std::time is quite different: anything touching on time zones immediately opens huge can of worms, both in implementation and in API design. There doesn't seem to be a minimal "easy" subset of "local time" handling that would be useful to provide because it's an inherently difficult topic and glossing over the hard parts makes it easier to write incorrect programs.

@dhardy
Copy link
Contributor

dhardy commented Dec 2, 2024

If your output buffer is [u32; N] anyway you might as well read from it in word sized chunks. rand_chacha does that via BlockRng!

Yes, you are right. Replacing the RngCore for BlockRng impls as follows:

impl<R: BlockRngCore<Item = u32>> RngCore for BlockRng<R> {
    #[inline]
    fn next_u32(&mut self) -> u32 {
        super::impls::next_u32_via_fill(self)
    }

    #[inline]
    fn next_u64(&mut self) -> u64 {
        super::impls::next_u64_via_fill(self)
    }
    
    // ...
}

causes a large regression (approx 2-4x cost for u32 / u64 output). So then it does make sense to consider using a trait like RngCore instead of RandomSource (also since WASI has get-random-u64).


Who would implement such a trait, who would use it,

This is the wrong place to discuss RandomSource, but briefly: this allows writing generic RNG-agnostic code which can then be tested deterministically and benchmarked without concern for OS overhead. Compare with why rand::CryptoRng exists.

As for why this should be in std and not an external crate: there is call for a "getrandom" interface within std and this is part of what was proposed. But yes, it is worth asking whether we should instead do one of the following:

  • Add next_u32, next_u64 or maybe something like fn fill_words(&mut self, buf: &mut [u32]) to this trait
  • Or drop the RandomSource trait and only implement a function like fn fill_bytes(but: &mut [u8]), suggesting people use rand::RngCore whenever they want a trait (with rand::OsRng being a shim over fn core::random::fill_bytes).

Just telling people to not do it is unhelpful and unlikely to be heeded. I've done the modulo thing myself more than once, in full awareness of its issues, just because it was more expedient in that case and I judged it to be good enough.

My apology for the tongue-in-cheek comment ("tell them they shouldn't do that"). So there is a need for "random integer in a range".

The first question regarding random-value-generation-in-std is when do you stop the scope creep? Slice shuffling seems useful, as does choose for slices and maybe iterators. choose_multiple and choose_weighted also made the cut into rand proper (not rand_distr). Also ranged sampling for f32, f64, char, Duration, SIMD types. Also different implementations of ranged sampling for "sample once" and "make an object for repeated sampling".

The second question is whether bringing this into std better than recommending usage of rand? This is a question of how large the standard library should be and how much one should rely on third-party crates. It's also a question of trust (but less so for choose than for getrandom). It's also worth noting that rand could be a significantly small dependency if RngCore and a default CSPRNG were brought into std.

The third question is whether there might be some other advantages to merging a subset of rand's functionality into std. For example, IndexedRandom is a trait enabling additional functionality on implementations of core::ops::Index<usize>; this might not need a separate trait in std. Actually, it might: all other methods require fn len, which is not provided by Index<usize>. SliceRandom at least could have its functionality merged into [T].


I think the situation with std::time is quite different

Yes, time zones are complicated. An API for "give me a UTC time stamp for now" is not, though its implementation may be.

The fact that there are plenty of details to discuss regarding random generators and algorithms despite many people "just wanting a random integer in a range" shows the cases are not entirely dissimilar.

No, I am not fundamentally opposed to incorporating random-value functionality into std, though I dislike some of the proposals so far. Especially the way it ignores a lot of what has been built and field-tested already.

@joshtriplett
Copy link
Member

joshtriplett commented Dec 15, 2024

If it'd be substantially more efficient (or enable types of generators which are substantially more efficient) to add methods like next_u32, next_u64, next_u128, and similar to RandomSource (which I assume can have reasonable default implementations), then by all means we should do that. That trait was an attempt to implement a minimal interface, something sufficient to build initial consensus on for experimentation.

There was no desire or attempt to ignore rand here. Rather, I think that attempting to propose a larger set of functionality all at once would not have been successful. The original proposals here were, effectively, just enough to say "we should have a system RNG in the standard library" and "we should have a seedable portable deterministic RNG in the standard library for testing and replay". Those were the most controversial aspects, and needed to be settled first before we could consider what set of utility functionality to supply atop them.

Along similar lines, while I think "give me a random value of this type" is a useful trait, it's not by any means the only one, and we've already discussed having some mechanism for sampling a distribution so that we can support random floats and similar.

Slice shuffling seems useful, as does choose for slices and maybe iterators.

I expect that these at a minimum would make the cut, along with random-in-range functionality (e.g. for things like die rolls).

@dhardy
Copy link
Contributor

dhardy commented Dec 15, 2024

If it'd be substantially more efficient (or enable types of generators which are substantially more efficient) to add methods like next_u32, next_u64, next_u128,

Short answer (from memory): this is important for small (non-block) RNGs for the word size of the output (usually u32 or u64). For rand we considered that (almost?) all real targets are 32-bit or 64-bit native, and though CPUs may have some 128-bit support there isn't usually much application for this (other than generating keys/blocks which can use fill on &mut [u8] instead). Also e.g. pcg32 operates on 64-bit words internally (and pcg64 128-bit words while Xoshiro256++ operates on [u64; 4] for u64 output). next_u32 + next_u64 gives us loss-less outputs in the most important two cases.

If you want benchmarks, we should be able to hack rand removing RngCore::next_u* easily enough, to compare vs the current design.

The other important point here is the generator (and intended application). If std::random::RandomSource is only used to produce random keys/seeds direct from the OS, then I don't see any need for next_u*. But if it's intended to replace rand::RngCore supporting deterministic seeded RNGs in games/sims, then I think there is good justification for these methods.

This is the point I'm getting a little lost on here: is the intended scope to cover only getrandom or a large part of what rand currently does? Because in the latter case, I think it would be better to start from what rand has now and whittle it down (e.g. if weighted sampling is not wanted).

Along similar lines, while I think "give me a random value of this type" is a useful trait

  • For key-generation, yes, but this is what Rng::fill is for.
  • In other cases, for integer types, I don't think this is a very useful trait, except...
  • ...as part of the implementation of Rng::random_range, yes.
  • For floats, an impl which returns values in the range [0, 1) (0.0 .. 1.0 in Rust parlance) is common, though usually only to then multiply/offset into a different target range.
  • For other applications, though, I don't think it's that useful, which is why rand named this functionality rand::distr::StandardUniform implementing the same Distribution<T> trait used for other types of random distribution.

So this is another point of potential scope creep: do we want a Distribution trait in std which then gets implemented by e.g. rand_distr::Normal? Or should this trait stay in a third-party crate like rand?

Should a Uniform distribution object be included in std or only a method like Rng::random_range?

Without having answers to these questions it's hard to know what exactly should be included in std which is why above I proposed only having support for generating keys/seeds in std and recommending usage of rand for everything else. But we've already had talk of people using random() % range or similar quick hacks, so I don't think it's useful to only consider trait Random for now (and incrementally extend the scope later).

@newpavlov
Copy link
Contributor

If std::random::RandomSource is only used to produce random keys/seeds direct from the OS, then I don't see any need for next_u*.

Some targets (Hermit, RDRAND, RNDR, WASI p2) directly generate random u32/u64 values. Relying on the byte API may be less efficient in these cases. Yes, it's not super important, but it's a relatively cheap addition which also can be quite convenient in practice. Plus, I think it's reasonable to assume that system RNG, thread RNG, and user-defined PRNGs will use the same RNG trait (there is an open question about error handling, but let's not focus on it right now), and, as you've wrote, it's important for small PRNGs to have next_u32/64 methods.

@Noratrieb
Copy link
Member

I've posted it on the PR that added it already but I will repeat it here for visibility: I believe this entire feature does not belong in the Rust standard library, and it is a mistake to put there.
This feature tries to provide a "one-size-fits-all-forever" approach to random number generation, which I think is not only a rather rare use case, but a use case that the standard library, with its unique requirement to never release a 2.0, is ill-equipped to handle.

RNGs evolve over time, and better RNGs get made. In fact, what is a "good" RNG is of course entirely subjective! Some projects might really need quality, some might really need speed.
This RNG here leans on the side of quality. I think that's the right choice if a choice has to be made, but I believe that making a choice is the mistake in the first place! I have no doubt that ChaCha8 will be a high-quality RNG forever. But it is also clear that ChaCha8 is by no means the fastest RNG today, and won't be the fastest RNG in the future.

I think that when a project needs a long-term stable RNG, they should consider their choice carefully, and not just pick the easiest one. But that's exactly what std does here, provide the easiest one. A project should carefully evaluate its choice and then lock it in by choosing an exact algorithm. This should, like many other similar cases, happen via the crates.io ecosystem. A project might determine that ChaCha8 is the right algorithm for them. If they find that to be the case, they can depend on a ChaCha crate and use that forever. That crate can make breaking changes and their code will still work because it's still ChaCha, they can switch the crate to a different one etc. The key is having an algorithm that they chose.

I think that this API is an example of something where I may need to tell people "don't use std, use a crate" which I think is sad. std makes a tradeoff here that users should do themselves, and also locks itself into a potentially subpar RNG algorithm forever.

std should be careful with providing stability guarantees, it shouldn't provide guarantees it doesn't have to.

@hanna-kruppe
Copy link
Contributor

If they find that to be the case, they can depend on a ChaCha crate and use that forever. That crate can make breaking changes and their code will still work because it's still ChaCha, they can switch the crate to a different one etc. The key is having an algorithm that they chose.

Picking an algorithm and sticking to it is indeed the way to go if you want to ensure long-term reproducibility. I will say, however, that in practice this ends up meaning "picking the eclectic mix of implementation decisions made by one specific version of one specific crate" and not merely "picking ChaCha8" -- because no two crates implement it in exactly the same way and none of them want to commit to keeping every detail the same forever. I've looked at pretty much every PRNG crate on crates.io a couple months ago and none of them did that.

If you consider a ChaCha PRNG only as a long stream of bytes determined by a 32-byte key, or some non-cryptographic PRNG as producing a sequence of u64s, you may be able to find two implementations that agree and let you use seeds in the same format. But as soon as you mix different ways of pulling data from the same PRNG instance, crates won't even document how they do that, and they'll all handle it slightly differently. And don't even get me started if you want anything other than uniformly random integers -- even for something as simple as "integer in range" there's half a dozen different ways to do it. And no crate wants to commit to doing it the same way forever. I wrote one myself that tries to give such guarantees but this was only feasible because I didn't need anything other than "fill a byte buffer" and "get a uniform u32 or u64".

So, in practice, if you want the level of reproducibility that the standard library would aspire to offer (for whatever subset of functionality it includes), you end up pinning a crate version forever or vendoring the code or just writing your own. Which is fine if that's what the application in question needs. But it's not much better than similar code going into the standard library to die. It's better in that 10 years from now someone might pick a different algorithm that isn't (well-)known today, and ossify an implementation of that instead of picking the ossified ChaCha8 implementation from std. But it's substantially different from "using a crate" in the typical sense.

@workingjubilee
Copy link
Member

I have less issues with putting ChaCha8 in the stdlib and more issues with calling it something other than ChaCha8.

@newpavlov
Copy link
Contributor

newpavlov commented Feb 12, 2025

The rand crate exposes the StdRng PRNG. Under the hood it uses ChaCha8, but it explicitly opts out of portability guarantees, i.e. its output may change across platforms or even in new patch releases. It's primarily used for implementing ThreadRng.

Assuming that std will add RNG traits (I mean something like rand_core::{RngCore, SeedableRng}, not rand::Rng) and something like ThreadRng, I think it's reasonable for it to expose an StdRng-like PRNG as well.

@hanna-kruppe
Copy link
Contributor

If you let users seed it, some of them will assume (possibly unintentionally!) that the output for a given seed will remain the same. Lots of other languages' standard libraries have made that experience over the years (most recently Go). Third party crates like rand have the option of saying "don't assume that the output will remain the same forever" because if someone does depend on it, there's mitigations available: users can just not upgrade, or the crate's developers can bundle a PRNG change with a semver-incompatible release. Putting the PRNG in the standard library removes both of those options.

@dhardy
Copy link
Contributor

dhardy commented Feb 13, 2025

The rand crate exposes the StdRng PRNG. Under the hood it uses ChaCha8, but it explicitly opts out of portability guarantees, i.e. its output may change across platforms or even in new patch releases. It's primarily used for implementing ThreadRng.

No, it's ChaCha12Rng.

And yes, we've had many people complain having expected something supporting SeedableRng to offer reproducible output. We added some extra doc regarding that recently (not yet released).

@hanna-kruppe hits the nail on the head with various subtleties: a shuffling algorithm may change, a PRNG may change how it extracts a u64 or a bool from its byte-buffer, etc. rand has done a lot in the name of reproducibility (always using little-Endian byte conversions, special handling of usize ranges to always use u32 sampling where possible, testing the output of everything, slow release cycles to avoid exposing users to lots of churn), but we still reserve the right to make value-breaking changes. Once rand hits 1.0 we may revisit this, but I don't think we'll ever ban value-breaking changes (though they won't land in patch releases).

@joshtriplett
Copy link
Member

There are very much two use cases here: secure random number generation that makes no reproducibility promises, and indefinitely-reproducible random number generation that makes no security promises.

Both of those are useful, and both of them are things that libs-api specifically reviewed for suitability for the standard library, for different purposes. Nailing down an algorithm and implementation for the latter is the expected approach. The testsuite will ensure, among other things, that the same seed produces exactly the same values, so that it won't get unintentionally broken in the future.

Orthogonally to that, there are extremely valid concerns about the interface to that functionality, and it sounds like the interfaces to both may well need to be reworked. That's fine, and that was expected. The goal of these initial proposals wasn't to propose the perfect interface on the first try; it was to have a minimal sufficient interface to successfully evaluate the proposition of having random number generation in the standard library at all.

@newpavlov
Copy link
Contributor

newpavlov commented Feb 14, 2025

@joshtriplett
I think we have more valid use cases than the two listed by you:

  • Non-deterministic secure RNG (rand::{OsRng, ThreadRng})
  • Non-deterministic (potentially) insecure RNG
  • Deterministic secure PRNG (rand::StdRng)
  • Deterministic insecure (but really fast) PRNG (rand::SmallRng)

Non-deterministic RNGs are discussed in the separate issue, so let's not focus on them.

The main concern about deterministic RNGs exposed in std is whether they should keep output reproducibility across targets and/or compiler versions. We have 3 options:

  1. Do not expose deterministic PRNGs in std at all and leave them for third-party crates, even if std will include them internally for stuff like ThreadRng.
  2. Expose deterministic PRNGs and explicitly do not provide any reproducibility guarantees like we do with StdRng and SmallRng. As noted above, the main disadvantage is that some users may mistakenly depend on output reproducibility.
  3. Expose deterministic PRNGs and provide reproducibility guarantees. The main disadvantage is ossification risks, the chosen PRNG may become broken or suboptimal and we will not be able to change it because of the provided guarantees.

Personally, I would rank the options (from the most preferred to the least) as 2, 1, and 3. Yes, some users would make the mistake, but it's their fault and potential risks are relatively contained. It may be even worth to intentionally change PRNG output on each compiler release (e.g. by including compiler version during PRNG initialization).

@joshtriplett
Copy link
Member

2 would negate the value of shipping this feature in the first place. One of the use cases is for things like replaying recorded data and getting the same results, years later.

Also, I agree that there are other RNG use cases. There are a variety of hazards associated with the "deterministic secure RNG" case, and that one I'd be completely on board with "this makes absolutely no guarantees about reproducibility from version to version", because there's no way we could comfortably promise security otherwise.

@dhardy
Copy link
Contributor

dhardy commented Feb 14, 2025

2 would negate the value of shipping this feature in the first place. One of the use cases is for things like replaying recorded data and getting the same results, years later.

This use-case is easily solved by advising people to use some rand_foo::FooRng (external crate), or simply to copy the code (for non-crypto PRNGs not much code is required).

But as to value of a non-numerically-stable PRNG, rand::rngs::SmallRng is small and fast thing. We use it in benchmarks where we don't want the RNG to be a bottleneck. It appears to have plenty of uses in the wild. rand::rngs::StdRng was only really made public since it's a necessary dependency of ThreadRng, but it appears to have even more uses — possibly just because people picked it as a default choice, given that at a glance these usages don't appear to need the cryptographic property of StdRng.

The big caveat here is that not everyone can be bothered to read the docs and see that rand::rngs::StdRng does not guarantee reproducibility. We certainly have had people assume it would be since it supports SeedableRng.

@joshtriplett
Copy link
Member

joshtriplett commented Feb 14, 2025

This use-case is easily solved by advising people to use some rand_foo::FooRng (external crate)

Which is advocating for case 1 (leaving it out of the standard library). I'm advocating for case 3: this should be in the standard library. And if people are concerned that the PRNG chosen might become ossified, well, ossified is very much the intentional point.

@Noratrieb
Copy link
Member

Noratrieb commented Feb 14, 2025

I would say that ossifying the RNG is actually not the point, it's a mistake. What projects want is that when they pick an RNG, they want it to be the same forever. Notably, projects do not need their stable-forever RNG to be the same as every other project's stable-forever RNG.
Today, we pick ChaCha8 as the RNG for this job. I think if a new project is started today and needs a stable-forever RNG, ChaCha8 is indeed a great choice. But I do not think that ChaCha8 will be the best choice for this in 20 years. Now for the project that has been started today, it doesn't care, it will still need ChaCha8 in 20 years.
But what about a project that's started in 20 years? If we leave this job to external crates, they can come and go. Today using a ChaCha8 crate is popular, and that will still exist in 20 years and be used by old projects. But in 20 years we'll also have a crate for a new algorithm, which will become popular and used by new projects.
But if we put this in the standard library, projects in 20 years will still get ChaCha8 (a subpar outdated algorithm), because the the standard library can't change it obviously.
And in my opinion that's exactly why the standard library is the wrong place for a stable-forever RNG. It cannot ever evolve it without breaking anyone, while the crates ecosystem can evolve it without breaking anyone.

@joshtriplett
Copy link
Member

will still get ChaCha8 (a subpar outdated algorithm),

ChaCha8 will not become any worse of a choice, for this purpose. You shouldn't be using this unless you need forever-reproducible random numbers.

@workingjubilee
Copy link
Member

then why use ChaCha8 and not some other unfortunately antiquated algorithm that is easier to implement, validate, and maintain?

@joshtriplett
Copy link
Member

then why use ChaCha8 and not some other unfortunately antiquated algorithm that is easier to implement, validate, and maintain?

If there's a better algorithm that is likely to be reasonably fast on a wide range of CPUs, has a very long period, and generates output where every subset of bits is equally random, then by all means.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants