Tracking Issue for deterministic random number generation #131606
Comments
ACP: rust-lang/libs-team#394 Tracking issue: rust-lang#131606 The version implemented here uses ChaCha8 as RNG. Whether this is the right choice is still open for debate, so I've included the RNG name in the feature gate to make sure people need to recheck their code if we change the RNG. Also, I've made one minor change to the API proposed in the ACP: in accordance with [C-GETTER](https://rust-lang.github.io/api-guidelines/naming.html#getter-names-follow-rust-convention-c-getter), `get_seed` is now named `seed`.
https://go.dev/blog/randv2 might be relevant here |
If a counter-based PRNG is chosen instead, exposing the current state (in addition to, or instead of?) the original seed would be valuable |
Could the source also support seeding from a u64 (e.g. adopting how rand expands the seed), so that there is one well-supported approach for this very common use case? |
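A minimal sketch of what seeding from a `u64` could look like, assuming a 32-byte ChaCha-style seed and SplitMix64 as the expander. The helper name `expand_seed` and the choice of SplitMix64 are illustrative assumptions; this is not claimed to be the scheme rand uses, nor anything proposed in the ACP.

```rust
/// One step of SplitMix64: advances `state` and returns a mixed 64-bit word.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical helper: stretches a `u64` into a 32-byte seed.
/// The little-endian encoding is fixed so the result is the same on every target.
fn expand_seed(seed: u64) -> [u8; 32] {
    let mut state = seed;
    let mut out = [0u8; 32];
    for chunk in out.chunks_exact_mut(8) {
        chunk.copy_from_slice(&splitmix64(&mut state).to_le_bytes());
    }
    out
}

fn main() {
    // The same `u64` always expands to the same 32 bytes.
    println!("{:02x?}", expand_seed(42));
}
```

Whatever expander is picked, it becomes part of the reproducibility contract: changing it later changes every stream derived from a `u64` seed.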
Since it's always supposed to give the same output forever after stabilization, it seems prudent to choose an algorithm that can be expected to last for a decade or more without regrets. It would be awkward to deprecate this part of std because it later turns out to have some flaws that can't be fixed without breaking reproducibility (see the Go math/rand/v2 story). A well-established cipher that's considered secure today almost certainly won't turn out to have major flaws as a source of statistical randomness: any feasible way to distinguish the output from random is a big deal for cryptanalysis, and even an "academic break" (say, key recovery in 2^100 time) doesn't necessarily mean anything about the suitability for Monte Carlo simulations. The same can't be said about the myriad of non-cryptographic designs, where it's often only a matter of time and eyeballs until serious flaws are discovered. For example, AES (Rijndael) is from 1998 and still going strong while the non-cryptographic MT19937 from 1997 was very popular for many years but is now considered flawed and obsolete. ChaCha is from 2008. Several years later, people were still publishing new non-crypto RNGs that fail TestU01, a suite of statistical tests dating to 2007--2009. |
Forgive me, but why re-invent this functionality in the Provision of an OS-getrandom API makes sense since much of the code is in Yes, I get it:

PRNGs

There is no particular best PRNG. ChaCha8 should be fine for this use-case, but it's fundamentally a block-based RNG whereas a word-based PRNG like Xoshiro or PCG will be substantially faster for many use-cases. Since this is explicitly about a user-seeded PRNG with user-managed-state there is no good reason not to let the user choose the algorithm too (from some set of options). The name

Random trait

The proposed Uniform ranged sampling is the obvious other application. There are quite a few algorithms for this. If you want reproducible outputs, pick one of these and stick with it (or keep the impl unstable until the algorithm is fixed). Now implement for On the topic of reproducibility, we recently removed support for sampling

Additional scope?

There are several obvious possibilities for scope creep:
This is most of what we elected to keep in

Proposal

What I'd propose therefore is:
I believe this would reduce many people's issues with By itself, this wouldn't remove all usages of |
Under your proposal, why even have the And if std drops the trait, then I can see how this would remove some of the most tricky parts of

```rust
let mut buf = [0; size_of::<usize>()];
DefaultRandomSource::default().fill_bytes(&mut buf);
// ^ or a hand-rolled seedable PRNG of choice
let r = usize::from_ne_bytes(buf);
let elem = slice[r % slice.len()];
```

... and that's a worse outcome than std providing |
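To make the cost of the `r % slice.len()` shortcut concrete, here is a small sketch that enumerates every possible byte and maps it into `0..6` (a die roll), once with plain modulo and once with rejection sampling. The function names are hypothetical, and a `u8` source is used only so the whole input space can be counted exhaustively; `n` is assumed to be non-zero.

```rust
/// Maps a raw byte to `0..n` by plain modulo (biased unless `n` divides 256).
fn modulo(byte: u8, n: u8) -> u8 {
    byte % n
}

/// Maps a raw byte to `0..n` with rejection sampling: bytes that fall in the
/// incomplete final bucket yield `None`, and the caller would draw again.
fn rejection(byte: u8, n: u8) -> Option<u8> {
    // Largest multiple of `n` that is <= 256.
    let zone = 256u16 - (256u16 % u16::from(n));
    if u16::from(byte) < zone {
        Some(byte % n)
    } else {
        None
    }
}

/// Counts how often each output value occurs over all 256 possible bytes.
fn histogram(n: u8, map: impl Fn(u8) -> Option<u8>) -> Vec<u32> {
    let mut counts = vec![0u32; usize::from(n)];
    for byte in 0..=u8::MAX {
        if let Some(v) = map(byte) {
            counts[usize::from(v)] += 1;
        }
    }
    counts
}

fn main() {
    println!("modulo:    {:?}", histogram(6, |b| Some(modulo(b, 6))));
    println!("rejection: {:?}", histogram(6, |b| rejection(b, 6)));
}
```

With modulo, the outputs 0 to 3 each occur 43 times and 4 and 5 only 42 times; with rejection, every output occurs exactly 42 times and four byte values are discarded. For a `usize` source and a small slice the bias is far smaller, but it never disappears unless the length divides the source range.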
This is a good point, but I think it's workable (though there is possibly reason to keep For block-based PRNGs like ChaCha as well as for For word-based PRNGs, it's questionable whether these should implement the
This goes for some other areas too, e.g.
I mean, we can tell them they shouldn't do that. But if you do want such functionality, then:

```rust
// surely it's better to support this:
let elt = slice.choose(&mut rng);
// instead of expecting people to write this:
let elt = slice[rng.random_range(..slice.len())];
```

? (This doesn't remove the need for And even if the above functionality is in |
I believe only the following things should be in
The "system" sources should allow registration of an alternative implementation similarly to Everything else should be just part of a third-party crate like |
OS interfaces are generally byte-based, but anything ChaCha-based (and related constructions like Blake3 used as XOF) does all the computations over words. If your output buffer is
This would make
Just telling people to not do it is unhelpful and unlikely to be heeded. I've done the modulo thing myself more than once, in full awareness of its issues, just because it was more expedient in that case and I judged it to be good enough. Including "random integer in range" is a sweet spot on the slippery slope to re-inventing all of
I think the situation with |
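The byte-versus-word question raised in this comment can be sketched with a stand-in word-based generator. SplitMix64 is used purely for illustration (it is not proposed anywhere in this thread), and the method names are hypothetical. The sketch shows both directions: a byte-filling method built on native word output, and word output routed back through the byte-filling method.

```rust
/// Hypothetical word-based generator (SplitMix64), for illustration only.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    /// Native output: one 64-bit word per step.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Byte-oriented interface layered on top of the word output.
    /// A trailing partial chunk uses only the low bytes of one extra word.
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        for chunk in dest.chunks_mut(8) {
            let word = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&word[..chunk.len()]);
        }
    }

    /// Word output routed through the byte interface: the shape a
    /// bytes-only trait would force on callers that want integers.
    fn next_u64_via_fill(&mut self) -> u64 {
        let mut buf = [0u8; 8];
        self.fill_bytes(&mut buf);
        u64::from_le_bytes(buf)
    }
}

fn main() {
    let mut direct = SplitMix64 { state: 7 };
    let mut via_bytes = SplitMix64 { state: 7 };
    // Both paths yield the same value; they differ only in the work done.
    println!("{:#018x}", direct.next_u64());
    println!("{:#018x}", via_bytes.next_u64_via_fill());
}
```

The two paths agree here only because the byte order is pinned to little-endian in both directions; that choice is itself part of what would have to stay fixed for reproducible output.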
Yes, you are right. Replacing the

```rust
impl<R: BlockRngCore<Item = u32>> RngCore for BlockRng<R> {
    #[inline]
    fn next_u32(&mut self) -> u32 {
        super::impls::next_u32_via_fill(self)
    }
    #[inline]
    fn next_u64(&mut self) -> u64 {
        super::impls::next_u64_via_fill(self)
    }
    // ...
}
```

causes a large regression (approx 2-4x cost for
This is the wrong place to discuss As for why this should be in
My apology for the tongue-in-cheek comment ("tell them they shouldn't do that"). So there is a need for "random integer in a range". The first question regarding random-value-generation-in-std is when do you stop the scope creep? Slice shuffling seems useful, as does The second question is whether bringing this into The third question is whether there might be some other advantages to merging a subset of
Yes, time zones are complicated. An API for "give me a UTC time stamp for now" is not, though its implementation may be. The fact that there are plenty of details to discuss regarding random generators and algorithms despite many people "just wanting a random integer in a range" shows the cases are not entirely dissimilar. No, I am not fundamentally opposed to incorporating random-value functionality into |
If it'd be substantially more efficient (or enable types of generators which are substantially more efficient) to add methods like There was no desire or attempt to ignore Along similar lines, while I think "give me a random value of this type" is a useful trait, it's not by any means the only one, and we've already discussed having some mechanism for sampling a distribution so that we can support random floats and similar.
I expect that these at a minimum would make the cut, along with random-in-range functionality (e.g. for things like die rolls). |
Short answer (from memory): this is important for small (non-block) RNGs for the word size of the output (usually If you want benchmarks, we should be able to hack The other important point here is the generator (and intended application). If This is the point I'm getting a little lost on here: is the intended scope to cover only
So this is another point of potential scope creep: do we want a Should a Without having answers to these questions it's hard to know what exactly should be included in |
Some targets (Hermit, RDRAND, RNDR, WASI p2) directly generate random |
I've posted it on the PR that added it already but I will repeat it here for visibility: I believe this entire feature does not belong in the Rust standard library, and it is a mistake to put there. RNGs evolve over time, and better RNGs get made. In fact, what is a "good" RNG is of course entirely subjective! Some projects might really need quality, some might really need speed. I think that when a project needs a long-term stable RNG, they should consider their choice carefully, and not just pick the easiest one. But that's exactly what std does here, provide the easiest one. A project should carefully evaluate its choice and then lock it in by choosing an exact algorithm. This should, like many other similar cases, happen via the crates.io ecosystem. A project might determine that ChaCha8 is the right algorithm for them. If they find that to be the case, they can depend on a ChaCha crate and use that forever. That crate can make breaking changes and their code will still work because it's still ChaCha, they can switch the crate to a different one etc. The key is having an algorithm that they chose. I think that this API is an example of something where I may need to tell people "don't use std, use a crate" which I think is sad. std makes a tradeoff here that users should do themselves, and also locks itself into a potentially subpar RNG algorithm forever. std should be careful with providing stability guarantees, it shouldn't provide guarantees it doesn't have to. |
Picking an algorithm and sticking to it is indeed the way to go if you want to ensure long-term reproducibility. I will say, however, that in practice this ends up meaning "picking the eclectic mix of implementation decisions made by one specific version of one specific crate" and not merely "picking ChaCha8" -- because no two crates implement it in exactly the same way and none of them want to commit to keeping every detail the same forever. I've looked at pretty much every PRNG crate on crates.io a couple months ago and none of them did that. If you consider a ChaCha PRNG only as a long stream of bytes determined by a 32-byte key, or some non-cryptographic PRNG as producing a sequence of So, in practice, if you want the level of reproducibility that the standard library would aspire to offer (for whatever subset of functionality it includes), you end up pinning a crate version forever or vendoring the code or just writing your own. Which is fine if that's what the application in question needs. But it's not much better than similar code going into the standard library to die. It's better in that 10 years from now someone might pick a different algorithm that isn't (well-)known today, and ossify an implementation of that instead of picking the ossified ChaCha8 implementation from std. But it's substantially different from "using a crate" in the typical sense. |
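A small sketch of why "same algorithm, same key" does not by itself pin down the output: the same eight bytes of a keystream become different `u64` values depending on how an implementation assembles them. The function names are hypothetical and the byte values are arbitrary.

```rust
/// Reads eight stream bytes as one little-endian `u64`.
fn u64_le(stream: &[u8; 8]) -> u64 {
    u64::from_le_bytes(*stream)
}

/// Reads eight stream bytes as one big-endian `u64`.
fn u64_be(stream: &[u8; 8]) -> u64 {
    u64::from_be_bytes(*stream)
}

/// Builds a `u64` from two little-endian `u32` words,
/// placing the first word in the high half.
fn u64_words_high_first(stream: &[u8; 8]) -> u64 {
    let w0 = u32::from_le_bytes([stream[0], stream[1], stream[2], stream[3]]);
    let w1 = u32::from_le_bytes([stream[4], stream[5], stream[6], stream[7]]);
    (u64::from(w0) << 32) | u64::from(w1)
}

fn main() {
    // Stand-in for the first eight bytes of some generator's output.
    let stream = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08];
    println!("{:#018x}", u64_le(&stream));
    println!("{:#018x}", u64_be(&stream));
    println!("{:#018x}", u64_words_high_first(&stream));
}
```

All three are defensible conventions, which is the point: reproducibility requires fixing the extraction rule as well as the core algorithm.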
I have fewer issues with putting ChaCha8 in the stdlib and more issues with calling it something other than ChaCha8. |
The Assuming that |
If you let users seed it, some of them will assume (possibly unintentionally!) that the output for a given seed will remain the same. Lots of other languages' standard libraries have had that experience over the years (most recently Go). Third-party crates like |
No, it's And yes, we've had many people complain having expected something supporting @hanna-kruppe hits the nail on the head with various subtleties: a shuffling algorithm may change, a PRNG may change how it extracts a |
There are very much two use cases here: secure random number generation that makes no reproducibility promises, and indefinitely-reproducible random number generation that makes no security promises. Both of those are useful, and both of them are things that libs-api specifically reviewed for suitability for the standard library, for different purposes. Nailing down an algorithm and implementation for the latter is the expected approach. The testsuite will ensure, among other things, that the same seed produces exactly the same values, so that it won't get unintentionally broken in the future. Orthogonally to that, there are extremely valid concerns about the interface to that functionality, and it sounds like the interfaces to both may well need to be reworked. That's fine, and that was expected. The goal of these initial proposals wasn't to propose the perfect interface on the first try; it was to have a minimal sufficient interface to successfully evaluate the proposition of having random number generation in the standard library at all. |
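As an illustration of the kind of pinned, known-answer check described here, the sketch below implements the ChaCha quarter round and compares it against the test vector from RFC 8439, section 2.1.1. This is not the standard library's test suite, and it covers only one building block rather than a full seed-to-output check; it is meant to show the shape of a test that fails loudly if the arithmetic ever changes.

```rust
/// The ChaCha quarter round on four 32-bit words (RFC 8439, section 2.1).
fn quarter_round(mut a: u32, mut b: u32, mut c: u32, mut d: u32) -> [u32; 4] {
    a = a.wrapping_add(b);
    d = (d ^ a).rotate_left(16);
    c = c.wrapping_add(d);
    b = (b ^ c).rotate_left(12);
    a = a.wrapping_add(b);
    d = (d ^ a).rotate_left(8);
    c = c.wrapping_add(d);
    b = (b ^ c).rotate_left(7);
    [a, b, c, d]
}

fn main() {
    // Input and expected output taken from RFC 8439, section 2.1.1.
    let out = quarter_round(0x1111_1111, 0x0102_0304, 0x9b8d_6f43, 0x0123_4567);
    assert_eq!(out, [0xea2a_92f4, 0xcb1c_f8ce, 0x4581_472e, 0x5881_c4bb]);
    println!("quarter round matches the pinned vector");
}
```

A test for the full generator would pin complete output values for fixed seeds in the same way, so that any change to rounds, counter handling, or byte order is caught.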
@joshtriplett
Non-deterministic RNGs are discussed in the separate issue, so let's not focus on them. The main concern about deterministic RNGs exposed in
Personally, I would rank the options (from the most preferred to the least) as 2, 1, and 3. Yes, some users would make the mistake, but it's their fault and potential risks are relatively contained. It may even be worth intentionally changing the PRNG output on each compiler release (e.g. by including the compiler version during PRNG initialization). |
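A rough sketch of the "include the compiler version during initialization" idea, assuming the version is available as a string and is folded into the user's seed with a fixed hash. The function `effective_seed`, the use of FNV-1a, and the version strings are all illustrative assumptions, not part of any proposal in this thread.

```rust
/// FNV-1a over bytes, used here only as a simple, fixed string hash.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // 64-bit FNV offset basis
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // 64-bit FNV prime
    }
    h
}

/// Hypothetical: derives the seed actually fed to the generator from the
/// user's seed and a version tag, so output differs between releases.
fn effective_seed(user_seed: u64, version: &str) -> u64 {
    user_seed ^ fnv1a(version.as_bytes())
}

fn main() {
    // Example version strings; a real implementation would take the
    // version from the toolchain rather than from a literal.
    println!("{:#018x}", effective_seed(42, "1.80.0"));
    println!("{:#018x}", effective_seed(42, "1.81.0"));
}
```

Within one release the mapping from seed to stream stays deterministic; across releases it changes on purpose, which makes accidental reliance on long-term stability show up early.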
2 would negate the value of shipping this feature in the first place. One of the use cases is for things like replaying recorded data and getting the same results, years later. Also, I agree that there are other RNG use cases. There are a variety of hazards associated with the "deterministic secure RNG" case, and that one I'd be completely on board with "this makes absolutely no guarantees about reproducibility from version to version", because there's no way we could comfortably promise security otherwise. |
This use-case is easily solved by advising people to use some But as to value of a non-numerically-stable PRNG, The big caveat here is that not everyone can be bothered to read the docs and see that |
Which is advocating for case 1 (leaving it out of the standard library). I'm advocating for case 3: this should be in the standard library. And if people are concerned that the PRNG chosen might become ossified, well, ossified is very much the intentional point. |
I would say that ossifying the RNG is actually not the point, it's a mistake. What projects want is that when they pick an RNG, they want it to be the same forever. Notably, projects do not need their stable-forever RNG to be the same as every other project's stable-forever RNG. |
ChaCha8 will not become any worse of a choice, for this purpose. You shouldn't be using this unless you need forever-reproducible random numbers. |
Then why use ChaCha8 and not some other unfortunately antiquated algorithm that is easier to implement, validate, and maintain? |
If there's a better algorithm that is likely to be reasonably fast on a wide range of CPUs, has a very long period, and generates output where every subset of bits is equally random, then by all means. |
Feature gate: `#![feature(deterministic_random_chacha8)]`

This is a tracking issue for `DeterministicRandomSource`, a deterministic random number generator that will always generate the same values for a given seed.

Public API

Steps / History

Unresolved Questions

`seed` function make sense?

Footnotes
https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html ↩