Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the dependency on zerocopy optional #1574

Open
hpenne opened this issue Feb 3, 2025 · 32 comments
Open

Make the dependency on zerocopy optional #1574

hpenne opened this issue Feb 3, 2025 · 32 comments

Comments

@hpenne
Copy link

hpenne commented Feb 3, 2025

Background

Projects that need to vet their dependencies will suffer greatly now that you have introduced a dependency on zerocopy, which is a big crate with a huge amount of unsafe code. Since just about any major project uses rand, this now makes "every" major Rust project depend on zerocopy as well.
This is really problematic. Please make this optional and off by default.

I understand that some users want the absolute best performance and that you want to support this, but most applications do not need this and introducing a dependency like zerocopy will have a significant negative impact on many users, one that it is currently almost impossible to get out of.

Feature request

Please make the performance optimizations that require zerocopy optional and off by default.

@hpenne
Copy link
Author

hpenne commented Feb 3, 2025

Looking quickly at the commit that introduced this, it may seem like zerocopy (which contains lots of unsafe code) was introduced to remove a very small amount of unsafe code from rand. Was that the only motivation? If so, then this change is a huge negative for many users, as the amount of unsafe code in total in our projects has now suddenly increased by a very significant number. If this was the only motivation then please consider reverting this change.

@benjamin-lieser
Copy link
Member

Using zerocopy should make things safer and not more unsafe. It replaces potentially unsound unsafe code with safe semantics which are carefully checked and relay more on the compiler than on humans to verify.

I think it is a bit misguided to only focus on the amount of unsafe code. zerocopy reduces the possibility to write unsound code and that is a good thing in my opinion.

You could argue that we can leave some performance and remove unsafe code and zerocopy, but this is a different story.

@dhardy
Copy link
Member

dhardy commented Feb 3, 2025

I saw there was a discussion on reddit about this, including a number of comments supporting this move.

I'm not fully aware of how this affects different groups, but yes, zerocopy was introduced to reduce the amount of unsafe code in rand (though obviously it hasn't replaced all usages of unsafe). So if reverting back to use more unsafe in rand is better for more users I am receptive to making that change — it can be hard to judge reception of such a change.

See also this comment by @joshlf in #1349 and also #1502.

@jswrenn
Copy link

jswrenn commented Feb 3, 2025

I co-maintain zerocopy and lead Rust's Project Safe Transmute — happy to answer questions about either project.

I apologize for any inconvenience this has caused you, but I do believe it's a one-time cost. Zerocopy is a dependency of many major Rust crates, and thus already is a transitive dependency of many non-trivial Rust projects. If zerocopy's use results in more unsafe being concentrated into a single dependency in your project's tree, that means you only need to vet zerocopy — rather than n dependencies all doing their own unsafe transmutes.

We strive to make this vetting process as easy as possible. Zerocopy has extensive internal documentation, and most of our SAFETY comments directly quote from Rust's stdlib or reference documentation. If there's anything we can do to make the vetting process easier, we'd love to hear it.

Stepping back, the use of crates like bytemuck and zerocopy in the Rust ecosystem is incredibly important to Rust's Project Safe Transmute. Functionality requests get funneled to these crates and that, in turn, helps inform the design of the Rust's future support for safer transmutation.

@hpenne
Copy link
Author

hpenne commented Feb 3, 2025

Let me assure you that I have no doubt that zerocopy is well maintained and of very high quality.

Our situation is that we need to vet our dependencies. This is not a one time cost. Every time we upgrade to a new version of a crate, it needs to be vetted again (a delta review at least). This includes (but is not limited to) doing a code audit/review of all new or modified unsafe code. The fact that zerocopy is well documented and well maintained does not actually help that much in this regard.
This makes the new dependency on zerocopy painful and costly.

The proliferation of dependencies is a challenge for us. rand was a "good crate" in this regard as it has a limited and well defined purpose, and used no dependencies to achieve this. This ended with version 0.9, when zerocopy was introduced. We will now have to do a code audit of a new version zerocopy whenever we upgrade our dependencies. This will have a significant cost for us. It is probably much cheaper for us long term to replace or fork rand somehow to get rid of the dependency on zerocopy.

I suppose the majority of users of rand will be fine with this change. Most/many care little about dependencies at all. However, for the users that are trying to keep the number of dependencies low, or that have to vet their dependencies, the introduction of this dependency on zerocopy will cause pain and higher costs. If a fundamental and extensively used crate like rand does not care about the collateral damage from this, then who will?

If this was done only to remove unsafe code, by delegating this to zerocopy, the please consider fixing/reverting this. Alternatively, if this is only necessary for performance optimization, then please consider putting this performance optimization behind an optional feature so that users that do not need it can opt out.

@jswrenn
Copy link

jswrenn commented Feb 3, 2025

Every time we upgrade to a new version of a crate, it needs to be vetted again (a delta review at least). This includes (but is not limited to) doing a code audit/review of all new or modified unsafe code. The fact that zerocopy is well documented and well maintained does not actually help that much in this regard.

That does sound like it could be painful. However, rand presently uses three APIs from zerocopy: transmute!, as_bytes and as_mut_bytes. The implementation of these APIs is self-contained and does not involve vast swaths of unsafe code. I can't make promises, but we rarely (if ever) update these implementations. We haven't touched transmute! (other than to move it to a different file) in four years.

@mitsuhiko
Copy link

I saw there was a discussion on reddit about this, including a number of comments supporting this move.

I understand that reddit is very much in support of this, and seemingly so is a large part of the Rust community. That is disappointing. Unfortunately it means that for plenty of commercial contexts it's very likely that this will create a growing rift in the ecosystem.

It's becoming harder and harder to audit Rust projects because of changes like this. I understand the goal of zerocopy and I can see that it's a great crate. It is however a massive dependency.

In practical terms this will cause a lot of people to move to hand written RNGs or fastrand which I don't think is a good move for this ecosystem.

@benjamin-lieser
Copy link
Member

benjamin-lieser commented Feb 3, 2025

I would say in summary the change to zerocopy was good for most users, but for some it is quite a pain.

I think it is feasible to have zerocopy behind a feature-flag and have safe but possible slow code without it (or fast and unsafe code as alternative). There are only a handful of places which would need to change and be maintained (I found 4, in rand and rand_core). Not sure if the added complexity is worth it, but it is also not too bad if enough people feel strongly about it.

There is also hope this might become unnecessary with better compiler support.

If this is deemed worth it, I can do a PR for it.

@Bluefinger
Copy link

In practical terms this will cause a lot of people to move to hand written RNGs or fastrand which I don't think is a good move for this ecosystem.

I think if minimal/no dependencies and zero unsafe is your main focus, then you are probably hand-writing your own RNGs or using fastrand already. But at some point, efforts to reduce unsafe will converge on utilising well-tested/battle-hardened/verified work, and since that effort is particularly hard, folks will want to not have to repeat/retread the same ground over and over. So, they will use what's been provided already by folks who have fully committed into providing said utilities.

And if you are in the position where this is untenable, forking/vendoring is the answer. But I think this is a wrong move to undo efforts to make rand safer just because it invokes a new dependency whose entire purpose is to make things safer. Feature-gating also adds additional maintenance burden which should not be forgotten either. However, I am not a maintainer, so in the end, it will be their call.

@briansmith
Copy link

briansmith commented Feb 3, 2025

There are only two places in rand that use zerocopy. I believe at least one of them can be easily rewritten to avoid unsafe and avoid zerocopy. I think the other use can probably similarly be rewritten.

The dependency on zerocopy was added during a flurry of PRs that proposed adding zerocopy as a dependency to many crates, including one of mine. In my case, I was able to refactor my code to avoid unsafe and also avoid adding a dependency. I also remember reviewing another project's PR and found that the added zerocopy dependency was also superfluous.

Accordingly, I think spending effort on PRs might be a more useful use of time than the meta-discussion about dependency policies.

@mitsuhiko
Copy link

I don't want to drag this out excessively, but I do want to respond to this:

I think if minimal/no dependencies and zero unsafe is your main focus, then you are probably hand-writing your own RNGs or using fastrand already.

In a world where I control all my dependencies, maybe. But that's just unrealistic. For instance rand ends up in a ton of projects today because of the UUID's v4 feature. In some cases you don't even have the option to avoid it, because something else turns it on. At least today you can end up using rand via some other very large well established crates.

@hpenne
Copy link
Author

hpenne commented Feb 3, 2025

Every time we upgrade to a new version of a crate, it needs to be vetted again (a delta review at least). This includes (but is not limited to) doing a code audit/review of all new or modified unsafe code. The fact that zerocopy is well documented and well maintained does not actually help that much in this regard.

That does sound like it could be painful. However, rand presently uses three APIs from zerocopy: transmute!, as_bytes and as_mut_bytes. The implementation of these APIs is self-contained and does not involve vast swaths of unsafe code. I can't make promises, but we rarely (if ever) update these implementations. We haven't touched transmute! (other than to move it to a different file) in four years.

I'm sure that is all correct, but that does not help an audit process. You see as far as auditing is concerned, this is just hearsay and would have to be verified every time the a new version is audited. It is impractical to do an analysis of the call graph of rand when auditing zerocopy for a project. It is also possible that other crates have started to depend on it since your last audit. In the end, if you depend on it you do not have much choice other than doing a full audit.

@hpenne
Copy link
Author

hpenne commented Feb 3, 2025

I think an "almost std" crate like rand should have absolutely minimal (preferably zero) dependencies. This should be a top priority, to avoid forcing users of rand to depend on crates that they would otherwise need. Rand is the de facto standard for random. That success comes with responsibilities, and you should not take those lightly. That should mean avoiding taking any dependencies unless they are absolutely necessary, and zerocopy does not seem necessary.

@Bluefinger
Copy link

One final response and then I'll leave things here:

In a world where I control all my dependencies, maybe. But that's just unrealistic. For instance rand ends up in a ton of projects today because of the UUID's v4 feature. In some cases you don't even have the option to avoid it, because something else turns it on. At least today you can end up using rand via some other very large well established crates.

And zerocopy tends to appear in a lot of crate dependencies as well, either that or bytemuck. If you are in large enough project territory where rand is likely to appear in your dependency tree, then so is zerocopy and bytemuck. The ecosystem has mostly converged on the two for zero-copy shenanigans and safe transmutes. It's not just any random crate (heh) we are talking about here.

@mitsuhiko
Copy link

And zerocopy tends to appear in a lot of crate dependencies as well, either that or bytemuck. If you are in large enough project territory where rand is likely to appear in your dependency tree, then so is zerocopy and bytemuck. The ecosystem has mostly converged on the two for zero-copy shenanigans and safe transmutes. It's not just any random crate (heh) we are talking about here.

You are not wrong, but that is a very recent change. It's very disappointing and frustrating to see that crate show up in critical parts of the ecosystem at the moment for what was previously rather lightweight crates.

I'm not sure where the push towards zerocopy is coming from but it's disappointing. What is however particularly disappointing in the case of rand is that it also uses the derive feature which is much heavier. The other common dependency I have is hashbrown that pulls it in, and it at least does not use derive (See also discussion in tkaitchuck/aHash#180)

Up until the change in rand I did not have any zerocopy-derive in my dependency tree and with it I had plenty of projects without proc-macro2 and syn.

@jswrenn
Copy link

jswrenn commented Feb 3, 2025

What is however particularly disappointing in the case of rand is that it also uses the derive feature which is much heavier.

Hm, could you say more about this? Both rand and rand_core pull in zerocopy without the derive feature:

...is something else in the dependency tree enabling this feature?

@newpavlov
Copy link
Member

newpavlov commented Feb 3, 2025

It looks like ppv-lite86 pulls zerocopy with enabled derive feature. Yet another reason to move forward with migration to chacha20.

@joshlf
Copy link
Contributor

joshlf commented Feb 3, 2025

@hpenne Can you say more about who "we" is and what your vendoring processes look like? I work on the Fuchsia team at Google, and we have similar process, so I understand where you're coming from regarding being dependency-skeptical.

For this specific issue (leaving ppv-lite86 aside), if there's going to be any change, I'd advocate for it to be a feature flag which is on by default. The Reddit thread makes it clear that a large amount of the Rust community is in support of the tradeoff (less unsafe but the same performance in exchange for a new dependency). Those who want to avoid the dependency can still do so by disabling the feature at the expense of slightly worse performance.

@briansmith
Copy link

I did a quick sketch of replacing one of the uses of zerocopy with safe Rust code: PR #1575

The other use of zerocopy is very similar.

@briansmith
Copy link

Those who want to avoid the dependency can still do so by disabling the feature at the expense of slightly worse performance.

I don't know how to properly evaluate performance of the functions in rand_core that are using zerocopy or which use cases to optimize for. That's one of the advantages of zerocopy; what was a memcpy before was still a memcpy after.

I would also say that, at least as far as the code in PR #1575 is concerned, if that code isn't being optimized well by the compiler, it's a pretty bad indictment of the compiler; if I were on the compiler team, I would consider it to be a bug.

@hpenne
Copy link
Author

hpenne commented Feb 4, 2025

Those who want to avoid the dependency can still do so by disabling the feature at the expense of slightly worse performance.

I don't know how to properly evaluate performance of the functions in rand_core that are using zerocopy or which use cases to optimize for. That's one of the advantages of zerocopy; what was a memcpy before was still a memcpy after.

I would also say that, at least as far as the code in PR #1575 is concerned, if that code isn't being optimized well by the compiler, it's a pretty bad indictment of the compiler; if I were on the compiler team, I would consider it to be a bug.

Optimisers are amazing, and perform better than most developers can imagine. It's starting to seem like the dependency on zerocopy might be unnecessary from a performance point of view. Perhaps that is worth exploring further?

Now on to something more sinister.

I'm sure many of you are familiar with the OpenSSH and XZ/liblzma incident, which came very close to compromising a huge number of Linux systems. The community nallowly escaped this, but mostly due to a lucky discovery and the extraordinary efforts of one single person.

There was a lesson in this for the offensive state actors/hackers of the world. This lesson is that a small amount of money invested long term in supply chain attacks can have a huge return and allow you to compromise a vast number of systems. Any such state actor that did not already fund long term supply chain attack programs will now be doing so, after the XZ incident. The incident proves that it is stupid not to.

The Rust community will inevitably also be the target of such long term supply chain attack efforts. I suspect Google will be an interesting target due to the use of Rust in Android etc., and among Google's crates, zerocopy would be an interesting target indeed.

Making rand depend on zerocopy puts rand and all its dependents in harm's way for such a supply chain attack. Is that really a risk that you are willing to take on behalf of your users?

The dependency on zerocopy was added during a flurry of PRs that proposed adding zerocopy as a dependency to many crates, including one of mine. In my case, I was able to refactor my code to avoid unsafe and also avoid adding a dependency. I also remember reviewing another project's PR and found that the added zerocopy dependency was also superfluous.

This kept me awake in the early hours of the morning. If you are invested in a long term effort to do a supply chain attack through a crate, one of the things you would like to do while you are trying to get into a position to compromise the target crate is to try to make as may other crates depend on it as possible, in order to widen the impact of your efforts should you succeed.

I know that this will be considered paranoia by many, and that I risk offending members of the zerocopy community. That is is not my intention, and for that I apologize. My previous paragraph is likely to be wrong (and I certainly hope so), but Rust certainly will be the target of future supply chain attacks and zerocopy is a very attractive target, and even more so if it is introduced in more widely used crates, like rand.

I do hope that you can find the time to consider removing the dependency on zerocopy, especially if there is merit to the ideas I see that some/all of the unsafe might be unnecessary. I'd be happy to help out if I can, and I am reasonably certain that my employer will allow me to spend a little time on that. You should not trust a stranger like me to write unsafe code, but I can probably be trusted to remove some.

@Bluefinger
Copy link

Could we not get into conspiracy theory level thinking about dependencies, please? Let's not go assuming malice out of other open source contributors and maintainers simply because you don't like a dependency.

@dhardy
Copy link
Member

dhardy commented Feb 4, 2025

Lets... just side-step the political / conspiracy theories and focus on a more technical question: can we drop the zerocopy dependency without significant cost (in code complexity of rand, local usage of unsafe (which has IIRC never been zero) and performance)?

I thank @briansmith for getting started here with #1575.


On a slightly different but related tack, we already have proposals #934 and #1545 which would replace/drop the dependency on rand_chacha. The latter is obviously much simpler code-wise (not requiring any complex PRNG to ship with rand), but I'd like more feedback on performance before committing — it's also not something that should just land in a patch release, so is more likely to wait until 1.0.

@hpenne
Copy link
Author

hpenne commented Feb 4, 2025

Could we not get into conspiracy theory level thinking about dependencies, please? Let's not go assuming malice out of other open source contributors and maintainers simply because you don't like a dependency.

It was never my intention to imply that anyone here added this dependency to rand out of malice. I was trying to make the point that there will be malicious actors out there with the intent of doing supply chain attacks, which might be relevant to the discussion. I believe that the push to depend on zerocopy which I quoted is in all likelihood perfectly harmless, but that there is always a small non-zero probability that something like that might not be. However, taking the speculation that far was too much. Please accept my apologies for that.

It makes me happy to see that there is work in progress to try to do this without zerocopy. I hope that succeeds.

@josephlr
Copy link
Member

josephlr commented Feb 4, 2025

If we can find ways to implement things in a way that is safe, readable, and performant (on relevant benchmarks), I'm fine changing the implementation.

But if we can't do that (i.e. things would be an unreadable/unsafe mess otherwise), I would want to stick with zerocopy. It's a very well-tested and well-maintained crate, and at this point I would trust it more than unsafe code I would write myself.

@dhardy
Copy link
Member

dhardy commented Feb 4, 2025

@josephlr the point motivating this issue remains, however: zerocopy is a complex dependency and not easy to vet (e.g. v0.8.15's code contains 299 counts of unsafe and 73 of macro_rules). I would therefore consider alternatives using unsafe acceptable, provided these are easy enough to verify.

@briansmith
Copy link

Just to clarify what I said above, I think the zerocopy developers were doing a good thing when they contirbuted their PRs to projects. I think in many situations adding the zerocopy dependency wasn't the best solution to the problem, but when they had dozens (IIRC) of crates to contribute to, it wouldn't have been reasonable to create an optimal solution for every instance, especially when there are trade-offs to be made. (I, myself have, multiple times, contributed to dozens of crates in a single day to update them to use a new API.)

@joshlf
Copy link
Contributor

joshlf commented Feb 4, 2025

@dhardy Would you consider making zerocopy optional and on-by-default to address everyone's concerns here? It lets people who are concerned about the dependency act on that concern without affecting performance for most users. It's also probably an easier change to make than trying to develop a safe alternative to the current code that achieves performance parity.

On Fuchsia, we occasionally have issues with specific dependencies similar to what @hpenne is describing. What we usually do is just patch or fork in our vendor directory so that we don't have to review a particular crate. If rand had zerocopy as an optional feature, then anyone with a concern could just carry a small patch in their vendor directory to disable that feature for all of their dependencies which depend on rand.

@hpenne
Copy link
Author

hpenne commented Feb 4, 2025

@dhardy Would you consider making zerocopy optional and on-by-default to address everyone's concerns here? It lets people who are concerned about the dependency act on that concern without affecting performance for most users. It's also probably an easier change to make than trying to develop a safe alternative to the current code that achieves performance parity.

Perhaps it is better to see how the active PR turns out. It is not unlikely that the speed difference will be too small to matter, and keeping zerocopy as a dependency will make zero sense in such a scenario.

@the8472
Copy link

the8472 commented Feb 5, 2025

This includes (but is not limited to) doing a code audit/review of all new or modified unsafe code.

This seems like self-inflicted pain if you review more than what is used by the dependent crate. If you applied some form of cross-crate-dead-code-elimination before review the burden would become much smaller.

It is impractical to do an analysis of the call graph of rand when auditing zerocopy for a project.

Perhaps this is a matter of having a process without adequate tooling? Should rand and zerocopy be responsible for lightening your audit workload, rather than the audit process being responsible for maintaining efficiency/scalability?

@dhardy
Copy link
Member

dhardy commented Feb 5, 2025

@dhardy Would you consider making zerocopy optional and on-by-default to address everyone's concerns here?

I'd prefer only having a single implementation of functionality, assuming a reasonable solution can be found.

(And, considering that this crate does not have a policy against usage of unsafe, that is also an option, so long as the usage is local and easy to reason about.)

It looks like we have a solution to the usage of zerocopy in rand_core. The following usages remain in rand:

  • zerocopy::IntoBytes is used in src/rng.rs to implement Fill for various types.
  • zerocopy::transmute! is used in src/distr/integer.rs

Edit: the simplest solution to the above two usages would be to make the using impls dependent on an optional zerocopy. These impls aren't core parts of rand, though the loss of Fill for most integer types is significant.

Finally, rand_chacha uses ppv-lite86 which uses zerocopy. rand_chacha is already optional, but enabled by default for rand::rng(). Adoption of chacha20 and #1545 are both possibilities here, though I'm unsure if either will land in a patch release, so we may see rand v0.10 at some point.

@mitsuhiko
Copy link

@dhardy ppv-lite86 did not depend on zerocopy until recently and the total number of unsafe paths that was removed was a fraction of the total unsafe usage in that crate. I think the removal of zerocopy (at least the derive option) in that dependency at least should be discussed. I did open an issue there to have that discussion (cryptocorrosion/cryptocorrosion#84).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests