Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate pointer-width integer aliases #1400

Closed
josephlr opened this issue Jun 14, 2019 · 14 comments
Closed

Deprecate pointer-width integer aliases #1400

josephlr opened this issue Jun 14, 2019 · 14 comments

Comments

@josephlr
Copy link
Contributor

Similar to #1304, we guarantee that usize/isize is layout compatible with C's uintptr_t/intptr_t.

From the Rust documentation on usize:

The size of this primitive is how many bytes it takes to reference any location in memory. For example, on a 32 bit target, this is 4 bytes and on a 64 bit target, this is 8 bytes.

From the C11 standard (N1570) section 7.20.1.4:

7.20.1.4 Integer types capable of holding object pointers

The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
uintptr_t

There is no need to provide type aliases (e.g. libc::uintptr_t) for these, in the same way that we don't provide a libc::c_bool type alias for bool.

What about size_t and ptrdiff_t?

We could potentially deprecate libc::size_t for usize (and libc::ptrdiff_t for isize). This is a more interesting question, as they are defined to be the type returned by a specific operation.

From section 7.19 of the C11 standard:

The types are
ptrdiff_t
which is the signed integer type of the result of subtracting two pointers;
size_t
which is the unsigned integer type of the result of the sizeof operator;

For ptrdiff_t, Rust's pointer methods ptr::offset and ptr::offset_from use isize to represent the "difference between pointers".

For size_t, Rust's core::mem::size_of function returns usize.

@gnzlbg @alexcrichton, I'm not sure what we should do here; I would love to know your thoughts. The argument for deprecation seems stronger for size_t than for ptrdiff_t.

The reason we have size_t/ptrdiff_t and uintptr_t/intptr_t in C is to support segmented addressing, but on every system I could find (including x86 Real Mode) uintptr_t and size_t are the same. Also, Rust already seems to implicitly assume that size_t == usize, this would just make it explicit.

What about ssize_t?

Note that we definitely don't want to deprecate libc::ssize_t because:

  1. Is isn't in the C standard, but in the Open Gorup <sys/types.h> standard
  2. This type is not necessarily isize. In fact, ssize_t only needs to be able to have the range [-1, SSIZE_MAX] and SSIZE_MAX has no relation to size_t's SIZE_MAX.

Also, see #1244 (comment)

What about intmax_t and uintmax_t?

While I did try to address those in #1244 and they are part of the C standard, defining them to either be [u|i]size or [u|i]64 would be wrong, as it would preclude a platform from using a 128-bit integer as it's largest type.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jun 14, 2019

The UCGs specifies that usize/isize are uintptr_t/intptr_t - see: https://rust-lang.github.io/unsafe-code-guidelines/layout/scalars.html#isize-and-usize); a lot of unsafe code relies on being able to transmute "pointers to sized types" to usize and vice-versa - this is something we cannot break.

That is, we could deprecate libc::{intptr_t, uintptr_t} and tell people to just use usize/isize instead.


Note that because we define isize/usize to be identical to C's intptr_t/uintptr_t, we can't replace size_t/ptrdiff_t with either isize/usize.

This does not only matter for segmented architectures. An example of a modern architecture where this matters is ARM's CHERI. CHERI pointers are 128-bit wide, but they refer to a 64-bit address space - the other 64-bit are used for verification. This means that the maximum number of bytes that can be accessed, the maximum distance between two pointers, etc. are all values for a 64-bit address space. That is, in C, size_t/ptrdiff_t are all 64-bit types, so they could not be replaced with Rust's 128-bit wide usize/isize here.

This is problematic for Rust in other ways. While in C loop indices would be of type size_t/ssize_t (64-bit wide), Rust uses usize for loop indices, so those would be 128-bit wide in Rust. The same applies to the pointer offset functions, size functions, etc. They would all return 128-bit wide integers in Rust. In other words, Rust conflates "the size of a pointer" with "the size of the address space", C does not, and that's the difference between uintptr_t and size_t.

For us to deprecate libc::{size_t, ptrdiff_t} Rust would need to guarantee to never support architectures like CHERI or segmented memory. That would need an RFC to Rust lang.

@briansmith
Copy link

briansmith commented Jun 14, 2019

Note that because we define isize/usize to be identical to C's intptr_t/uintptr_t, we can't replace size_t/ptrdiff_t with either isize/usize.

In fact, libc::size_t is already defined to be usize and lots of code relies on this too. Probably more code relies on usize == libc::size_t than relies on usize == libc::uintptr_t. I don't think either size_t nor uintptr_t should be removed until a final decision has been made with an RFC about what to do about platforms where size_t is smaller than uintptr_t.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jun 14, 2019

In fact, libc::size_t is already defined to be usize and lots of code relies on this too

Code that relies on libc::size_t being exactly usize is not portable. Most code I see only assumes that libc::size_t is an integer type, which is a correct assumption. That is:

// given a libc API or type that uses `size_t`:
fn libc_foo(x: libc::size_t) {}

// most code I see is portable and does:
libc_foo(4);
// or:
libc_foo(4_i32 as _);

// Code that exploits the actual type of `libc::size_t` is not portable:
libc_foo(4_usize);

From libc's POV, libc_foo does not take an usize, it takes a libc::size_t, and code that relies on libc::size_t being a particular type is not portable. We have broken non-portable code like that many times in the past when fixing bugs in the library.

Sadly, Rust is not expressive enough to enforce this constrain of libc via the type system, which is why non-portable code is accepted by the compiler.

@briansmith
Copy link

It would be a huge breaking change that would require an RFC. The usize == uintpr_t doesn't make sense when you have uintptr_t as 128 or 256 bits and size_t is only 32 or 64 bits. It's a bug in the definition of usize that needs to be fixed.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jun 14, 2019 via email

@kornelski
Copy link
Contributor

kornelski commented Dec 12, 2019

  • RV128I RISC-V has 128-bit uintptr_t, but 64-bit size_t. Pointers are a (64-bit object ID + 64-bit offset), so it can't address sizes larger than 64-bits, but it has large pointers.

  • CHERI achitecture is a 64-bit ISA (64-bit size_t), but it has 128 or 256-bit pointers tagged with additional information (128/256-bit uintptr_t).

Rust already made a mistake of assuming uintptr_t == size_t, but maybe it should try to back out from it instead of solidifying it further?

@gnzlbg
Copy link
Contributor

gnzlbg commented Dec 13, 2019

Rust guarantees that usize is pointer sized, literal quote (emphasis mine):

The pointer-sized unsigned integer type.

Changing that would be a backward incompatible change that would break a lot of extremely subtle unsafe Rust code that goes from pointers to integers and back and that we guarantee that will work forever and that it is portable to all platforms Rust will ever support.

Deprecating uintptr_t does not guarantee anything new, it just makes the libc API consistent with the other integer types, e.g., Rust guarantees that i32 is identical to int32_t, and libc int32_t type alias is also deprecated.

I cannot find any official documentation that guarantees that there is a Rust type that's identical to size_t or ssize_t anywhere, so I don't see anything wrong with deprecating intptr_t/uintptr_t as long as we keep type aliases for size_t and ssize_t.

FWIW @kornelski this allows targeting CHERI and RV128 by just defining size_t = u64, and there is already a rustc branch that is able to target CHERI by doing exactly this. Were we to guarantee anywhere that usize also matches size_t, that would make it impossible for Rust to target either of these targets. The CHERI branch uses 128-bit indices/offsets in Rust in many places where 64-bit indices would suffice because Rust APIs use usize for indexing. This is sub-optimal, but sound. If you want to somehow improve that and have ideas about how to do so, this issue isn't the best place to discuss that, open an internal thread or a rust-lang/rust issue.

If you think the current definition of usize "is a bug" like @briansmith claimed, arguing about that in this issue doesn't change the fact that this is precisely how the Rust API docs define this type. Open an internal threads or a rust-lang/rust issue, ping the lang team, and tell them about the "bug" or ask them to clarify.

@briansmith
Copy link

If you think the current definition of usize "is a bug" like @briansmith claimed, arguing about that in this issue doesn't change the fact that this is precisely how the Rust API docs define this type. Open an internal threads or a rust-lang/rust issue, ping the lang team, and tell them about the "bug" or ask them to clarify.

I agree with that. However, the reason it was brought up in this issue is that, if/when that bug is fixed, then we will need a uintptr_t type.

@gnzlbg
Copy link
Contributor

gnzlbg commented Dec 13, 2019

If somebody manages to get the time to start such a discussion thread, please do ping me there. Solving that issue is kind of required to provide really good support for CHERI.

@kornelski
Copy link
Contributor

kornelski commented Dec 13, 2019

I was under impression that usize == size_t, because std::os::raw doesn't even bother to define size_t alias. libc aliases it to usize. So all my FFI crates just use usize directly in place of size_t :(

#include <stddef.h>
size_t len = 0;

bindgen test.h

/* automatically generated by rust-bindgen */
pub const len: usize = 0;

@gnzlbg
Copy link
Contributor

gnzlbg commented Dec 13, 2019

That assumption isn't wrong for all targets that rustc currently support, but if you try to compile those crates for CHERI, you would get a compilation error because size_t there is u64 and usize and u64 are different types.

That's not great, but it isn't that bad. Suppose we were to change the definition of usize to mean size_t. Then your code would compile and work, but code that does ptr as usize would still compile, yet subtly fail silently at run-time since the cast would lose information (notice that in Rust ptr as i8 is ok and truncates the pointer value). That wouldn't be great either.

I really don't know of any way to fix this that wouldn't break something, but I hope there is a way to do that. We probably need to throw way more people at this problem to figure out all the alternatives and constraints.

@briansmith
Copy link

Then your code would compile and work, but code that does ptr as usize would still compile, yet subtly fail silently

If we were to implement the breaking change where usize is fixed to be size_t instead of uintptr_t then at the same time we should also fix as so that it is no longer usable for (lossy) conversions.

@gnzlbg
Copy link
Contributor

gnzlbg commented Dec 14, 2019

@briansmith we probably want to fix that anyways (independently of what we do with usize) but there are a couple of options, and we need to evaluate the impact, see here: rust-lang/reference#729

@tgross35
Copy link
Contributor

This has been some more discussion about this, e.g. https://internals.rust-lang.org/t/pre-rfc-usize-semantics/19444. I don't think there is any point in changing something here before we know exactly how/if Rust might change (I suspect it will at some point given CHERI).

What that in mind, I am going to close this for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants