RFC - Zero-Sized References #2040
Is it possible to do this in a library with |
The main purpose is to help with polymorphic code, that usually doesn't deal with ZST. |
IIRC we haven't pursued this idea in the past because of the potential for breakage involved. |
text/0000-zero-sized-references.md
# How We Teach This | ||
[how-we-teach-this]: #how-we-teach-this | ||
|
||
For most rust users, this change will be invisible. Thier code will just become a tiny bit smaller. |
Thier -> Their
Users of unsafe rust might encounter this case. Therefore there should probably be a note in the nomicon that references might be optimized away for ZST. | ||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks |
It might also mess with monomorphization, because you can't know the size of a reference anymore without knowing the type. I'm not sure of the implications for generic functions. Currently `transmute<&T, usize>` works inside of generic functions. But we had the same issue with transmuting fn pointers, so there's some precedent for that.
But that's already the case: `&mut Trait` has a different size than `&mut i32`.
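The size difference mentioned here can be checked directly. The following is a minimal sketch, assuming a 2017-or-later compiler where trait objects are written `dyn Trait`; the helper names `reference_sizes` and `zst_reference_size` are made up for illustration and `Debug` merely stands in for `Trait`.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Sizes of a thin reference, a trait-object reference, and a slice reference.
/// (Hypothetical helper, not part of the RFC.)
fn reference_sizes() -> (usize, usize, usize) {
    (
        size_of::<&mut i32>(),       // thin pointer: one word
        size_of::<&mut dyn Debug>(), // data pointer + vtable pointer
        size_of::<&mut [i32]>(),     // data pointer + length
    )
}

/// Size of a reference to a zero-sized type under today's rules.
fn zst_reference_size() -> usize {
    size_of::<&()>()
}

fn main() {
    let word = size_of::<usize>();
    let (thin, trait_object, slice) = reference_sizes();
    assert_eq!(thin, word);
    assert_eq!(trait_object, 2 * word);
    assert_eq!(slice, 2 * word);
    // A reference to a ZST is still a full pointer today; the RFC would make it 0.
    assert_eq!(zst_reference_size(), word);
}
```

So reference size already depends on the pointee type, but only through sized-ness and the kind of unsized tail, not through the pointee's actual size.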
We do it based on sized-ness and the kind of "maybe-DST tail" when that is unknown.
Is sized-ness any easier to track than zero-sizedness?
Sized-ness is "easier" to track than zero-sizedness, because a type parameter is assumed sized unless one explicitly declares otherwise via the `?Sized` bound. Once you have `<T: ?Sized>`, the generic code cannot make assumptions about the sized-ness of `T`.
other_ffi_function(x); | ||
``` | ||
We might want to represent a pointer we got from FFI and checked it's not null as a reference to an object. | ||
Since that object is not accessible directly we represent it as an empty struct. |
iirc that is not how external structures should be represented, so breaking that at compile-time sounds like a win to me.
afaik that's currently the best way to represent external structures, see the discussion on the extern type RFC
|
||
### Mitigating breakage | ||
|
||
Safe rust code should be affected positively by this change. However unsafe code might break. |
That's a dangerous statement. Unsafe code isn't any less stable than safe code. I think there should be some review of existing code that might break before deciding whether to do this. If all the unsafe code that breaks is some code that should never have been written in the first place because there's better ways to do it, then imo that would be ok.
text/0000-zero-sized-references.md
|
||
### Specific examples - Pro | ||
|
||
The RFC isn't well-justified until it has at least one detailed use case where it helps. |
My opinion is that it's enough justification that this will solve all the issues around addresses and zsts once and for all.
I disagree completely with this change. |
This would also make references not |
Shouldn't |
"Breaking code" is a drawback big enough to make this Rust 2.0 material |
Even in an hypothetical Rust 2.0 (which by the way I hope won’t happen for many years!), breaking code that uses A zero-size reference-like thing can be useful, but I think it should be a separate type. Something like:
|
A couple points I want to add (a bit meta):
Also, clarification about breaking code: I want to find a solution that will not break valid unsafe code. I have no strong opinion about breaking non-valid unsafe code. However, since I have little knowledge about much of the unsafe code in the wild, the specifics of these should be discussed here. |
AFAICT there's two basic use cases here, which are both valid, but in tension:
How about an opt-in approach where authors can tag specific zero-sized types with "references to this type should also be ZSTs"? That satisfies the allocator use case (with a bit of manual annotation) without affecting any existing code. (My suspicion is that generic code would not pose a problem either because Rust implements it exclusively with monomorphization, but I haven't thought about it very deeply!) |
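For comparison with the opt-in annotation idea, the "separate zero-sized reference-like type" suggested earlier in the thread can be approximated in a library today. This is only a sketch under stated assumptions: `ZstRef` and `Marker` are hypothetical names, not an existing API, and the wrapper deliberately does not preserve the address of the original value.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ops::Deref;
use std::ptr::NonNull;

/// Hypothetical zero-sized stand-in for `&'a T` where `T` is itself zero-sized.
pub struct ZstRef<'a, T> {
    _marker: PhantomData<&'a T>,
}

impl<'a, T> ZstRef<'a, T> {
    /// Borrows `r` for `'a` but stores nothing; panics if `T` is not zero-sized.
    pub fn new(_r: &'a T) -> Self {
        assert_eq!(size_of::<T>(), 0, "ZstRef only supports zero-sized types");
        ZstRef { _marker: PhantomData }
    }
}

impl<'a, T> Deref for ZstRef<'a, T> {
    type Target = T;

    fn deref(&self) -> &T {
        // SAFETY: `new` checked that `T` is zero-sized and was handed a live
        // `&'a T`, so a value exists for the whole lifetime. A dangling but
        // non-null, well-aligned pointer is a valid reference to a ZST.
        unsafe { NonNull::<T>::dangling().as_ref() }
    }
}

/// Hypothetical zero-sized type used for the demonstration.
pub struct Marker;

impl Marker {
    pub fn name(&self) -> &'static str {
        "marker"
    }
}

fn main() {
    let m = Marker;
    let z = ZstRef::new(&m);
    assert_eq!(z.name(), "marker");
    assert_eq!(size_of::<ZstRef<'static, Marker>>(), 0);
    assert_eq!(size_of::<&Marker>(), size_of::<usize>());
}
```

The trade-off is the one the thread keeps returning to: the wrapper takes no space, but the reference it hands back points at a made-up address, so code that relies on the original address (FFI handles, for instance) cannot use it.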
You can transmute from/to |
@eddyb Is there actually generic code in the wild which does it? (I'm not doubtful, simply ignorant.) |
Yes, we had to add a special-case because there were about a dozen (IIRC) crates which transmuted between pointers to |
@RalfJung no. If you want more info, hit me up on irc, I don't want to write it up here. Basically, it's to do with correct types being the best documentation. If you expect a non-null pointer that doesn't alias anything else, take an
is much less informative than the (should eventually be equivalent):
|
@ubsan I would argue that the But maybe that doesn't actually work so well, and anyway it's enough for some people to write libraries the way you propose to make this a problem. |
@eddyb Hmm, what if both pointers and references to such opted-in types were ZSTs, with a fixed address of 0xAlign? Would that solve the |
What does that mean? A by-value coercion? Then this will still silently break code which doesn't use |
@eddyb Can you give an example of what kind of code you're thinking of? I just mean that both Hmm... that would break |
@glaebhoerl We really shouldn't touch
@eddyb Pointers to
@ubsan The specific examples you showed didn't use a ZST. For examples with ZST, the following comes to mind:
```rust
struct OpaqueStruct;
extern "C" {
    fn some_fn(s: &OpaqueStruct);
}
```
but I think it could be enough to allow writing the following
```rust
unsafe impl !Sized for OpaqueStruct {}
```
and give an error that hints at the fix, something like
Note that writing the |
@EpicatSupercell My point was that there were even transmutes between |
@eddyb Is it valid to transmute between pointers to different known sizes? So transmute Edit: Guess it is... Think it's gonna be a tough one. Is it invalid to transmute between |
@EpicatSupercell It has always been valid. Transmute only cares about immediate size, which is the same for all pointers to sized types.
Strictly speaking, Sadly rust-lang/rust#32939 doesn't seem to link to the original crater report, so I can't show you the examples for |
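The kind of generic code under discussion can be sketched as follows. The function names are hypothetical; the point is only that the transmute compiles for every sized `T` because `&T` is known to be pointer-sized without knowing `T`, which is exactly the assumption zero-sized references would invalidate.

```rust
use std::mem::{size_of, transmute};

/// Reads the address out of a reference by transmuting it.
/// Accepted in generic code today because `&T` (with `T: Sized`) always has
/// the size of `usize`, whatever `T` is.
fn address_via_transmute<T>(r: &T) -> usize {
    unsafe { transmute::<&T, usize>(r) }
}

/// The same address obtained through ordinary casts, for comparison.
fn address_via_cast<T>(r: &T) -> usize {
    r as *const T as usize
}

fn main() {
    let x = 7u64;
    assert_eq!(address_via_transmute(&x), address_via_cast(&x));

    // Works for a zero-sized pointee too: the reference still carries an address.
    let unit_ref = &();
    assert_eq!(address_via_transmute(unit_ref), address_via_cast(unit_ref));

    // Pointers to sized types of very different sizes have the same size.
    assert_eq!(size_of::<*const u8>(), size_of::<*const [u8; 1024]>());
}
```

If `&()` became zero-sized, `address_via_transmute::<()>` would be transmuting between types of different sizes, so the generic function could no longer be accepted as written.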
...Yes. That's what I said. |
@eddyb Actually since |
#1861 is already being merged, thankfully. |
With the approaching resolution of #1861 we get a tool for FFI to solve some concerns. The other concerns are about unspecified behavior. I am not in favor of breaking code, so right now this RFC is inapplicable. However, with the additions of epochs, a new possibility opens for us - allowing only references from a newer epoch to be ZST, while keeping all current code working. This RFC sounds fitting for a test run of epochs (assuming it's accepted at the time in the first place). I'm not sure if now is a fitting time for it or not. On a side note, due to personal circumstances, I didn't have the time to manage this RFC at all this past month :( |
A requirement for epochs is that "if your code compiles in the previous epoch without warnings, then it will keep working in the next epoch". There cannot be silent breakage. So we'd need to be able to reliably warn for any code that relies on the data layout of |
It sounds possible to me. You make &ZST be Zero-Sized in the new epoch (= new code) and non Zero-Sized in the old epoch. When calling a function from old epoch or using a struct from old epoch you generate a value for the reference so it's no longer Zero-Sized, and when you receive a value from one you drop it so it's Zero-Sized again. Old code keeps working because it's never ZST, new code will not allow you to do illegal operations anyway. |
That's still not epoch-compatible -- you can't just "generate a value for the reference", the value has to still be correct if you don't want to break backwards compatibility. Remember, using And the struct thing won't work at all, we can't have different representations for the same struct across crates; it would not allow them to be shared. Epochs allow for breaking changes that are not in the core language, as defined in the RFC. I'd argue that this is definitely a change to the core language since it observably changes representations. |
Different representation depending on where it is defined, not used. Therefore a use of an epoch 1 struct in epoch 2 code will follow conventions of epoch 1 struct, while checking the size of an epoch 2 struct containing &ZST will return size 0 even in epoch 1 code.
I come with the assumption that #1861 becomes the standard and |
The compiler would have to emit warnings wherever code would not work for ZST references. This includes FFI, casts to/from usize, various transmutes, raw pointers of ZST types, etc. The problematic part, however, is that generic code couldn't do those things on any generic references at all, unless a new auto-trait for non-ZSTs is added first (or ZSTs were removed from the |
With respect to interaction between epochs, imagine a specific case where epoch 2 code uses an epoch 1 FFI library that, for whatever irrelevant reason, passes around opaque payload as type So for one thing, epoch 2 would need to understand its own In other words, it would be possible, but it would be no small feat. |
Guess it could be possible in that case to create a type called My main reason I try and find a way to make this work is because I fear that if we decide a feature as simple (in my opinion) as this can't be used with Epochs, we will never be able to add any serious features without a major version (Rust 2.0). And with the collective desire to not go that route, I fear the language will stagnate even faster than C. |
I have some thoughts. I'm afraid I have no conclusions.

It seems to me there are two main ways to try to remove the overhead of references to ZSTs: make them ZSTs themselves (as proposed here), or try to eliminate them through optimization.

### Making them ZSTs

The former option is a breaking change to the language. I agree that it could probably be accomplished in a linking-compatible way by using something like a pair of

The next question is, is it worth it? Even with a mechanism for making breaking changes to the language, the desire is still to keep them small and rare. We would also be committing to keeping both types of references in the compiler for the foreseeable future. Therefore, I would want to see concrete benefits demonstrated before going down that road, such as a real-world application where it saves a significant amount of memory or code size.

I do agree that it seems reasonable to assume that using ZST references for FFI would become deprecated once we have a better alternative.

When I first came to Rust, I actually expected ZST references to also be ZSTs, and was surprised when they weren't. My preference would still be for them to be, I'm just not sure it's worth the breaking change.

Aside from backward compatibility, I would like to understand better what generic patterns would be complicated, given that containers et al already have to special case ZSTs in most cases. Are there examples of code that currently works with ZSTs without special casing, but would have to deal with them specially after this change? Is there any code that would silently break?

### Eliminating ZST references as an optimization

This can be done either conservatively or aggressively.

#### Conservatively

This would work by eliminating ZST references when the compiler can prove the address isn't observed, such as when a function never converts the reference to a raw pointer.

This could be accomplished through metadata flags, and, like

I imagine this approach would be able to avoid passing/storing ZST references here and there, but it wouldn't be reliable, and a change to a leaf function could force a ZST reference to be stored through most of the call graph.

#### Aggressively

I see this approach as kind of a hybrid. ZST references would take up no space in a lot more cases, but code could still treat ZST references as pointer-sized in generic code.

To make this work, the language would specify that for any

This would allow the compiler to always eliminate passing and returning a bare

This approach obviously requires that using

Finally, this could break some non-FFI code. The most obvious example I can think of is something like using https://github.com/Diggsey/rust-field-offset with a zero-sized field. In debug mode, calculating the offset would underflow and panic. In release mode, applying the offset to a given struct would result in a pointer to a random location. Now, dereferencing a random pointer to a ZST is unlikely to actually cause any trouble, since there will never be an actual read or write to that location. Further the address would effectively get normalized back to the proper one when |
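To make the field-offset concern concrete, here is a small sketch of how the address of a zero-sized field is observable today. It does not use the rust-field-offset crate; `WithMarker` and `marker_offset` are hypothetical names, and `#[repr(C)]` is assumed so that the field order (and therefore the offset) is fixed.

```rust
use std::mem::size_of;

/// Hypothetical struct with a zero-sized field placed after a 4-byte field.
#[repr(C)]
struct WithMarker {
    value: u32,
    marker: (),
}

/// Offset of the zero-sized `marker` field, computed from reference addresses.
/// This only works because `&s.marker` carries a real address today.
fn marker_offset(s: &WithMarker) -> usize {
    let base = s as *const WithMarker as usize;
    let field = &s.marker as *const () as usize;
    field - base
}

fn main() {
    let s = WithMarker { value: 1, marker: () };
    // With repr(C), `marker` sits right after the 4-byte `value` field.
    assert_eq!(marker_offset(&s), 4);
    assert_eq!(size_of::<WithMarker>(), 4);
    println!("value = {}, marker offset = {}", s.value, marker_offset(&s));
}
```

Under either proposal, the address behind `&s.marker` would no longer be guaranteed to lie inside `s`, so a subtraction like the one in `marker_offset` could underflow or produce a meaningless offset.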
🔔 This is now entering its final comment period, as per the review above. 🔔 |
The final comment period is now complete. |
Thanks all for the discussion, and @EpicatSupercell for the RFC! I'm going to close as per discussion and FCP, but there are still alternative avenues to explore here! |
Can someone explain why this is? What is preventing |
@kevincox For example they might use ZST’s to represent opaque types that are only manipulated through pointers or references (say, C++ objects whose memory layout might change across platforms). Using references over pointers is useful to get lifetime tracking. Then when it’s time to do something with that reference, it is cast back to a pointer and passed to some FFI function and needs to have preserved its address. |
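The pattern described here can be sketched without a real C library. In the example below, `Opaque`, `fake_ffi_new`, and `fake_ffi_free` are hypothetical stand-ins: the "foreign" object is just a boxed `u64`, and the only thing being demonstrated is that the address survives the trip through a reference to a zero-sized type.

```rust
use std::mem::size_of;

/// Hypothetical opaque handle type: zero-sized on the Rust side because the
/// real layout lives in foreign code.
#[repr(C)]
pub struct Opaque {
    _private: [u8; 0],
}

/// Stand-in for an FFI constructor; returns the address of a real allocation.
fn fake_ffi_new() -> *mut Opaque {
    Box::into_raw(Box::new(42u64)) as *mut Opaque
}

/// Stand-in for an FFI destructor; returns the stored value for checking.
///
/// # Safety
/// `p` must come from `fake_ffi_new` and must not have been freed already.
unsafe fn fake_ffi_free(p: *mut Opaque) -> u64 {
    unsafe { *Box::from_raw(p as *mut u64) }
}

/// Turns the raw handle into a reference (to get lifetime tracking) and back,
/// then reports whether the address was preserved.
fn round_trip_preserves_address() -> bool {
    let raw = fake_ffi_new();
    // A non-null, aligned pointer is a valid reference to a zero-sized type.
    let handle: &Opaque = unsafe { &*raw };
    let back = handle as *const Opaque as *mut Opaque;
    let same_address = back == raw;
    // Free through the original pointer; `back` is only compared, never read.
    let value = unsafe { fake_ffi_free(raw) };
    same_address && value == 42
}

fn main() {
    assert_eq!(size_of::<Opaque>(), 0);
    assert!(round_trip_preserves_address());
}
```

If `&Opaque` were zero-sized, `back` could not equal `raw`, because the reference would have nowhere to store the address.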
@SimonSapin That sounds like the general reason this RFC was denied, but I don't see how it translates to the |
- Saves one or more characters in parameter declarations. - Reduces the memory consumption, run-time overhead, and code size because `WM` can be a zero-sized type while `&WM` cannot. (rust-lang/rfcs#2040
It is indeed possible to do zero-sized references without breaking existing code:
```rust
struct S<'a> {
    #[allow_zero_sized]
    pub e: &'a Empty,
}
```
Then Please implement this feature. |