
Alignment of the bytes of Allocation to match align parameter #100467

Closed — wants to merge 25 commits (showing changes from 22 commits)

Commits (25)
0fe75b8
Stop x.py from being mean
maurer Aug 2, 2022
f37fe37
DO NOT MERGE - scratchwork
maurer Aug 3, 2022
fde4235
point to my miri fork
Aug 3, 2022
e0e8e00
no more direct hole punch to alloc bytes
Aug 4, 2022
a7b7f19
no more holepunch to bytes
Aug 4, 2022
f7a991b
commented out alignment check for int-aligned addrs -- need to make t…
Aug 6, 2022
e2ed272
proper check for size of allocation being aligned
Aug 7, 2022
51269b8
using the align parameter to properly align allocations
Aug 8, 2022
bca203e
redoing allocation bytes realignment -- still testing with miri
Aug 12, 2022
0ddff36
allocation bytes alignment, and cleanup .gitmodules
Aug 12, 2022
a72a057
double free detected in tcache 2: could not compile core in stage 1 c…
Aug 12, 2022
04f29dc
using slice directly, no intermediate vec
Aug 12, 2022
2fd7606
removing miri submodule updates
Aug 15, 2022
ab1a61f
removing miri submodule updates
Aug 15, 2022
b87f5ef
removing miri submodule updates
Aug 15, 2022
cade1c1
going back to previous version of miri to match commit of rustc
Aug 15, 2022
c31d404
moving AllocBytes into a trait -- partially done
emarteca Sep 5, 2022
e993680
more moving allocbytes into trait
emarteca Sep 7, 2022
17ac36b
allocbytes moved into a trait, implemented for `Box<[u8]>` as default
emarteca Sep 7, 2022
c2e142b
merging in changes from upstream master
emarteca Sep 8, 2022
99f6708
cleanup
emarteca Sep 8, 2022
a12d111
cleanup
emarteca Sep 8, 2022
f75649b
propagating Allocation Bytes type
emarteca Sep 14, 2022
8db066f
adding deref and derefmut to allocbytes, removing now unnecessary met…
emarteca Nov 9, 2022
f075a12
nit
emarteca Nov 9, 2022
7 changes: 7 additions & 0 deletions compiler/rustc_const_eval/src/interpret/memory.rs
@@ -517,6 +517,13 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
)
}

/// Get the base address for the bytes in an `Allocation` specified by the
/// `AllocID` passed in; error if no such allocation exists.
pub fn get_alloc_base_addr(&self, id: AllocId) -> InterpResult<'tcx, usize> {
Member:
Suggested change
/// Get the base address for the bytes in an `Allocation` specified by the
/// `AllocID` passed in; error if no such allocation exists.
pub fn get_alloc_base_addr(&self, id: AllocId) -> InterpResult<'tcx, usize> {
/// Get the base address for the bytes in an `Allocation` specified by the
/// `AllocID` passed in; error if no such allocation exists. The address will
/// be exposed (on the host system!).
///
/// It is up to the caller to take sufficient care when using this address:
/// there could be provenance or uninit memory in there, and other memory
/// accesses could invalidate the exposed pointer.
pub fn expose_alloc_base_addr(&self, id: AllocId) -> InterpResult<'tcx, usize> {

Also note that this only allows read-only access to the bytes. If C code writes to this memory, it is writing through a shared reference! So I think you probably want to make this `mut`.

let alloc = self.get_alloc_raw(id)?;
Ok(alloc.expose_base_addr())
}
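The shape of the accessor above can be illustrated with a small stand-alone sketch (hypothetical names, not the rustc API): exposing the base address of an allocation's bytes just means handing out the host address of the first byte as a plain integer.

```rust
// Hypothetical stand-alone sketch (not the rustc API): expose the host
// address of a heap buffer as a plain usize, the way `get_alloc_base_addr`
// hands out the address of an allocation's backing bytes.
pub fn expose_base_addr(bytes: &[u8]) -> usize {
    bytes.as_ptr() as usize
}

fn main() {
    let buf: Box<[u8]> = vec![0u8; 16].into_boxed_slice();
    let addr = expose_base_addr(&buf);
    // The exposed address is the address of the first byte of the buffer.
    assert_eq!(addr, buf.as_ptr() as usize);
}
```

As the review notes, once this integer escapes to foreign code, the interpreter can no longer see reads or writes through it, which is why the naming and mutability deserve care.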

/// Gives raw access to the `Allocation`, without bounds or alignment checks.
/// The caller is responsible for calling the access hooks!
///
2 changes: 2 additions & 0 deletions compiler/rustc_middle/src/lib.rs
@@ -23,6 +23,7 @@
//! This API is completely unstable and subject to change.

#![doc(html_root_url = "https://doc.rust-lang.org/nightly/nightly-rustc/")]
#![feature(maybe_uninit_write_slice)]
#![feature(allocator_api)]
#![feature(array_windows)]
#![feature(assert_matches)]
@@ -59,6 +60,7 @@
#![feature(intra_doc_pointers)]
#![feature(yeet_expr)]
#![feature(const_option)]
#![feature(vec_into_raw_parts)]
#![recursion_limit = "512"]
#![allow(rustc::potential_query_instability)]

156 changes: 133 additions & 23 deletions compiler/rustc_middle/src/mir/interpret/allocation.rs
@@ -4,6 +4,7 @@ use std::borrow::Cow;
use std::convert::{TryFrom, TryInto};
use std::fmt;
use std::hash;
use std::hash::Hash;
use std::iter;
use std::ops::{Deref, Range};
use std::ptr;
@@ -21,6 +22,98 @@ use super::{
};
use crate::ty;

/// Functionality required for the bytes of an `Allocation`.
pub trait AllocBytes:
Clone + core::fmt::Debug + Eq + PartialEq + PartialOrd + Ord + core::hash::Hash
Contributor:
If you make this also have `Deref<Target = [u8]>`, you can remove everything but `adjust_to_align`, `from_bytes`, and `uninit`, and should have to adjust much less code, as it would pick up the slice methods that `Box<[u8]>` also exposes via `Deref`.

Author:
Ah good point! I was hoping to keep AllocBytes more general though, so I can use it to represent foreign memory in Miri, that isn't represented by a [u8]. Given that, do you think it's ok to keep these extra functions in AllocBytes?

Contributor:
What kind of memory are you thinking of? You already support get_slice_from_range, which returns a &[u8]. Most of what I'm proposing is to get rid of that method and require the implementor of AllocBytes to put the logic into the Deref::deref method instead. Or are you planning to change the return type of the get_slice methods?

Author (emarteca, Sep 14, 2022):
Ah, if we can make a custom get_slice method with the same return type then that should work! In my implementation these methods still return &[u8] -- my current representation of foreign memory is represented as a pair of machine address and length, and the get_slice methods build the slice from std::slice::from_raw_parts.

Contributor:
yea, so you could implement Deref and DerefMut for your foreign memory representation and have the deref and deref_mut methods use from_raw_parts. Then you can remove the get_slice* methods and others from the trait, as you get them for free once Deref is a super trait

Author:
Oh I see -- that's smart, thanks for the suggestion! (Will push these changes soon)

Member:
I agree the trait has a lot more methods than I was expecting so this sounds good.

{
/// The length of the bytes.
fn get_len(&self) -> usize;
/// The address of the bytes.
fn expose_addr(&self) -> u64;
Comment on lines +37 to +38
Member:
I think it would be better to make this return a raw pointer, and leave exposing and things like that to the caller.

And in fact... why is this even needed? One can use the deref to get such a pointer, right?

/// Get a slice of the bytes corresponding to a specified range.
fn get_slice_from_range(&self, range: Range<usize>) -> &[u8];
/// Mutable slice of the bytes corresponding to a specified range.
fn get_slice_from_range_mut<'a>(&'a mut self, range: Range<usize>) -> &'a mut [u8];
/// Add to the pointer of the head of the bytes, and return a mutable pointer to this location.
fn add_ptr(&mut self, to_add: usize) -> *mut u8;
/// Hash the head and tail of the bytes.
/// This is required to satisfy the `Hash` trait.
fn hash_head_tail<H: hash::Hasher>(
&self,
_byte_count: usize,
_state: &mut H,
_max_bytes_to_hash: usize,
) {
}
/// Adjust the bytes to the specified alignment -- by default, this is a no-op.
fn adjust_to_align(self, _align: Align) -> Self {
self
}
Member:
What is this about? Alignment should be set at allocation time.
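The reviewer's point — that alignment belongs at allocation time — can be sketched on stable Rust with an explicit `Layout` (this is an illustration of the principle, not the PR's actual implementation):

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

// Hedged sketch of "set alignment at allocation time": ask the global
// allocator for zeroed memory with an explicit Layout, instead of
// re-aligning a Box<[u8]> after the fact. Callers must pass a non-zero size.
fn alloc_aligned_zeroed(size: usize, align: usize) -> (*mut u8, Layout) {
    let layout = Layout::from_size_align(size, align).expect("invalid layout");
    // SAFETY: size is non-zero, so alloc_zeroed with this layout is valid.
    let ptr = unsafe { alloc_zeroed(layout) };
    assert!(!ptr.is_null(), "allocation failed");
    (ptr, layout)
}

fn main() {
    let (ptr, layout) = alloc_aligned_zeroed(64, 32);
    // The returned pointer honors the requested 32-byte alignment.
    assert_eq!(ptr as usize % 32, 0);
    unsafe { dealloc(ptr, layout) };
}
```

With this approach, an `adjust_to_align` fix-up step would indeed be a no-op by construction.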

/// Create an `AllocBytes` from a slice of `u8`.
fn from_bytes<'a>(slice: impl Into<Cow<'a, [u8]>>, _align: Align) -> Self;
/// Create an uninitialized `AllocBytes` of the specified size and alignment;
/// call the callback error handler if there is an error in allocating the memory.
fn uninit<'tcx, F: Fn() -> InterpError<'tcx>>(
size: Size,
_align: Align,
handle_alloc_fail: F,
) -> Result<Self, InterpError<'tcx>>;
}

// Default `bytes` for `Allocation` is a `Box<[u8]>`.
impl AllocBytes for Box<[u8]> {
fn uninit<'tcx, F: Fn() -> InterpError<'tcx>>(
Member:
It's not uninit though, it returns a zeroed buffer?

Author:
Huh that's true -- I just took this code from here where it was originally in the file

Member:
The entire allocation is considered uninit based on its InitMask, yeah. But looking just at the bytes, which is the level of abstraction you have here, this is zeroed. The bytes cannot be uninit.

size: Size,
_align: Align,
handle_alloc_fail: F,
) -> Result<Self, InterpError<'tcx>> {
let bytes = Box::<[u8]>::try_new_zeroed_slice(size.bytes_usize())
.map_err(|_| handle_alloc_fail())?;
// SAFETY: the box was zero-allocated, which is a valid initial value for Box<[u8]>
let bytes = unsafe { bytes.assume_init() };
emarteca marked this conversation as resolved.
Ok(bytes)
}
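The naming discussion above is easy to verify: the buffer handed back by `uninit` is zero-initialized, and the "uninit-ness" is tracked separately in the allocation's `InitMask`. A stable stand-in for the nightly `try_new_zeroed_slice` + `assume_init` pair (an illustrative sketch, not the PR's code):

```rust
// Stable stand-in: produce a zeroed Box<[u8]> of the requested size,
// matching the observable behavior of try_new_zeroed_slice + assume_init.
fn zeroed_bytes(size: usize) -> Box<[u8]> {
    vec![0u8; size].into_boxed_slice()
}

fn main() {
    let b = zeroed_bytes(32);
    assert_eq!(b.len(), 32);
    // Every byte really is zero, not uninitialized garbage.
    assert!(b.iter().all(|&x| x == 0));
}
```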

fn from_bytes<'a>(slice: impl Into<Cow<'a, [u8]>>, _align: Align) -> Self {
Box::<[u8]>::from(slice.into())
}

/// The length of the bytes.
fn get_len(&self) -> usize {
self.len()
}

/// The real address of the bytes.
fn expose_addr(&self) -> u64 {
self.as_ptr() as u64
}

/// Slice of the bytes, for a specified range.
fn get_slice_from_range(&self, range: Range<usize>) -> &[u8] {
&self[range]
}

/// Mutable slice of the bytes, for a specified range.
fn get_slice_from_range_mut<'a>(&'a mut self, range: Range<usize>) -> &'a mut [u8] {
&mut self[range]
}

/// Pointer addition to the base address of the bytes.
fn add_ptr(&mut self, to_add: usize) -> *mut u8 {
self.as_mut_ptr().wrapping_add(to_add)
}

fn hash_head_tail<H: hash::Hasher>(
&self,
_byte_count: usize,
_state: &mut H,
_max_bytes_to_hash: usize,
) {
self[.._max_bytes_to_hash].hash(_state);
self[_byte_count - _max_bytes_to_hash..].hash(_state);
}
}
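The `hash_head_tail` strategy implemented above can be demonstrated in isolation: for large buffers, only the length plus a fixed-size prefix and suffix feed the hasher, so two buffers differing only in the middle intentionally collide. A self-contained sketch of the same scheme:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sketch of the head/tail hashing scheme for large buffers: hash the
// length plus a fixed-size prefix and suffix rather than every byte.
fn hash_head_tail(bytes: &[u8], max_bytes_to_hash: usize) -> u64 {
    let mut state = DefaultHasher::new();
    let byte_count = bytes.len();
    byte_count.hash(&mut state);
    bytes[..max_bytes_to_hash].hash(&mut state);
    bytes[byte_count - max_bytes_to_hash..].hash(&mut state);
    state.finish()
}

fn main() {
    let a = vec![0u8; 64];
    let mut b = a.clone();
    b[32] = 9; // change a byte outside the hashed head and tail
    // Only length, head, and tail feed the hash, so these collide by design.
    assert_eq!(hash_head_tail(&a, 8), hash_head_tail(&b, 8));
}
```

This is the deliberate trade-off described in the `Hash` impl below: fewer bytes hashed, at the cost of extra collisions for buffers sharing a prefix and suffix.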

/// This type represents an Allocation in the Miri/CTFE core engine.
///
/// Its public API is rather low-level, working directly with allocation offsets and a custom error
@@ -30,10 +123,10 @@ use crate::ty;
// hashed. (see the `Hash` impl below for more details), so the impl is not derived.
#[derive(Clone, Debug, Eq, PartialEq, PartialOrd, Ord, TyEncodable, TyDecodable)]
#[derive(HashStable)]
pub struct Allocation<Prov = AllocId, Extra = ()> {
pub struct Allocation<Prov = AllocId, Extra = (), Bytes = Box<[u8]>> {
/// The actual bytes of the allocation.
/// Note that the bytes of a pointer represent the offset of the pointer.
bytes: Box<[u8]>,
bytes: Bytes,
/// Maps from byte addresses to extra provenance data for each pointer.
/// Only the first byte of a pointer is inserted into the map; i.e.,
/// every entry in this map applies to `pointer_size` consecutive bytes starting
@@ -71,14 +164,13 @@ impl hash::Hash for Allocation {
fn hash<H: hash::Hasher>(&self, state: &mut H) {
// Partially hash the `bytes` buffer when it is large. To limit collisions with common
// prefixes and suffixes, we hash the length and some slices of the buffer.
let byte_count = self.bytes.len();
let byte_count = self.bytes.get_len();
if byte_count > MAX_HASHED_BUFFER_LEN {
// Hash the buffer's length.
byte_count.hash(state);

// And its head and tail.
self.bytes[..MAX_BYTES_TO_HASH].hash(state);
self.bytes[byte_count - MAX_BYTES_TO_HASH..].hash(state);
// And its head and tail, if it is a Box<[u8]>.
self.bytes.hash_head_tail(byte_count, state, MAX_BYTES_TO_HASH);
} else {
self.bytes.hash(state);
}
@@ -205,15 +297,17 @@ impl AllocRange {
}

// The constructors are all without extra; the extra gets added by a machine hook later.
impl<Prov> Allocation<Prov> {
impl<Prov, Bytes: AllocBytes> Allocation<Prov, (), Bytes> {
/// Creates an allocation initialized by the given bytes
// FIXME! ellen make this generic for bytes
pub fn from_bytes<'a>(
slice: impl Into<Cow<'a, [u8]>>,
align: Align,
mutability: Mutability,
) -> Self {
let bytes = Box::<[u8]>::from(slice.into());
let size = Size::from_bytes(bytes.len());
let bytes = Bytes::from_bytes(slice, align);
let size = Size::from_bytes(bytes.get_len());

Self {
bytes,
provenance: ProvenanceMap::new(),
@@ -233,7 +327,7 @@ impl<Prov> Allocation<Prov> {
///
/// If `panic_on_fail` is true, this will never return `Err`.
pub fn uninit<'tcx>(size: Size, align: Align, panic_on_fail: bool) -> InterpResult<'tcx, Self> {
let bytes = Box::<[u8]>::try_new_zeroed_slice(size.bytes_usize()).map_err(|_| {
let handle_alloc_fail = || -> InterpError<'tcx> {
// This results in an error that can happen non-deterministically, since the memory
// available to the compiler can change between runs. Normally queries are always
// deterministic. However, we can be non-deterministic here because all uses of const
@@ -246,9 +340,10 @@ impl<Prov> Allocation<Prov> {
tcx.sess.delay_span_bug(DUMMY_SP, "exhausted memory during interpretation")
});
InterpError::ResourceExhaustion(ResourceExhaustionInfo::MemoryExhausted)
})?;
// SAFETY: the box was zero-allocated, which is a valid initial value for Box<[u8]>
let bytes = unsafe { bytes.assume_init() };
};

let bytes = Bytes::uninit(size, align, handle_alloc_fail)?;

Ok(Allocation {
bytes,
provenance: ProvenanceMap::new(),
@@ -260,29 +355,33 @@ impl<Prov> Allocation<Prov> {
}
}

impl Allocation {
impl<Bytes: AllocBytes> Allocation<AllocId, (), Bytes> {
/// Adjust allocation from the ones in tcx to a custom Machine instance
/// with a different Provenance and Extra type.
// FIXME! ellen make this generic for Bytes
pub fn adjust_from_tcx<Prov, Extra, Err>(
self,
cx: &impl HasDataLayout,
extra: Extra,
mut adjust_ptr: impl FnMut(Pointer<AllocId>) -> Result<Pointer<Prov>, Err>,
) -> Result<Allocation<Prov, Extra>, Err> {
) -> Result<Allocation<Prov, Extra, Bytes>, Err> {
// Compute new pointer provenance, which also adjusts the bytes.
let mut bytes = self.bytes;
// Realign the pointer
let mut bytes = self.bytes.adjust_to_align(self.align);

let mut new_provenance = Vec::with_capacity(self.provenance.0.len());
let ptr_size = cx.data_layout().pointer_size.bytes_usize();
let endian = cx.data_layout().endian;
for &(offset, alloc_id) in self.provenance.iter() {
let idx = offset.bytes_usize();
let ptr_bytes = &mut bytes[idx..idx + ptr_size];
let ptr_bytes = bytes.get_slice_from_range_mut(idx..idx + ptr_size); //&mut bytes[idx..idx + ptr_size];
let bits = read_target_uint(endian, ptr_bytes).unwrap();
let (ptr_prov, ptr_offset) =
adjust_ptr(Pointer::new(alloc_id, Size::from_bytes(bits)))?.into_parts();
write_target_uint(endian, ptr_bytes, ptr_offset.bytes().into()).unwrap();
new_provenance.push((offset, ptr_prov));
}

// Create allocation.
Ok(Allocation {
bytes,
@@ -298,7 +397,7 @@ impl Allocation {
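The provenance loop in `adjust_from_tcx` above does in-place byte surgery on each stored pointer: read a little-endian pointer-sized integer out of the buffer, replace it with the adjusted offset, and remember the old value. A self-contained sketch of that pattern (assuming a 64-bit little-endian target; `read_target_uint`/`write_target_uint` handle endianness generically in the real code):

```rust
use std::convert::TryInto;

// Sketch of the byte surgery in the provenance loop: read a little-endian
// pointer-sized integer out of the buffer, overwrite it with an adjusted
// offset, and return the old value.
fn rewrite_ptr_bytes(bytes: &mut [u8], idx: usize, new_offset: u64) -> u64 {
    const PTR_SIZE: usize = 8; // assume a 64-bit little-endian target
    let ptr_bytes = &mut bytes[idx..idx + PTR_SIZE];
    let old = u64::from_le_bytes((&*ptr_bytes).try_into().unwrap());
    ptr_bytes.copy_from_slice(&new_offset.to_le_bytes());
    old
}

fn main() {
    let mut buf = vec![0u8; 16];
    buf[4..12].copy_from_slice(&0x1122u64.to_le_bytes());
    let old = rewrite_ptr_bytes(&mut buf, 4, 0x99);
    assert_eq!(old, 0x1122);
    assert_eq!(u64::from_le_bytes(buf[4..12].try_into().unwrap()), 0x99);
}
```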
/// Raw accessors. Provide access to otherwise private bytes.
impl<Prov, Extra> Allocation<Prov, Extra> {
pub fn len(&self) -> usize {
self.bytes.len()
self.bytes.get_len()
}

pub fn size(&self) -> Size {
@@ -310,7 +409,7 @@ impl<Prov, Extra> Allocation<Prov, Extra> {
/// edges) at all.
/// This must not be used for reads affecting the interpreter execution.
pub fn inspect_with_uninit_and_ptr_outside_interpreter(&self, range: Range<usize>) -> &[u8] {
&self.bytes[range]
self.bytes.get_slice_from_range(range)
}

/// Returns the mask indicating which bytes are initialized.
@@ -326,7 +425,16 @@ impl<Prov, Extra> Allocation<Prov, Extra> {

/// Byte accessors.
impl<Prov: Provenance, Extra> Allocation<Prov, Extra> {
/// Get the pointer of the `[u8]` of bytes.
pub fn expose_base_addr(&self) -> usize {
self.bytes.expose_addr().try_into().unwrap()
}

/// This is the entirely abstraction-violating way to just grab the raw bytes without
/// caring about provenance or initialization.
///
/// The last argument controls whether we error out when there are uninitialized or pointer
/// bytes. However, we *always* error when there are relocations overlapping the edges of the
/// range.
///
/// You should never call this; call `get_bytes` or `get_bytes_with_uninit_and_ptr` instead.
///
/// This function also guarantees that the resulting pointer will remain stable
@@ -372,7 +480,9 @@ impl<Prov: Provenance, Extra> Allocation<Prov, Extra> {
self.mark_init(range, true);
self.clear_provenance(cx, range)?;

Ok(&mut self.bytes[range.start.bytes_usize()..range.end().bytes_usize()])
Ok(self
.bytes
.get_slice_from_range_mut(range.start.bytes_usize()..range.end().bytes_usize()))
}

/// A raw pointer variant of `get_bytes_mut` that avoids invalidating existing aliases into this memory.
@@ -384,8 +494,8 @@ impl<Prov: Provenance, Extra> Allocation<Prov, Extra> {
self.mark_init(range, true);
self.clear_provenance(cx, range)?;

assert!(range.end().bytes_usize() <= self.bytes.len()); // need to do our own bounds-check
let begin_ptr = self.bytes.as_mut_ptr().wrapping_add(range.start.bytes_usize());
assert!(range.end().bytes_usize() <= self.bytes.get_len()); // need to do our own bounds-check
let begin_ptr = self.bytes.add_ptr(range.start.bytes_usize());
let len = range.end().bytes_usize() - range.start.bytes_usize();
Ok(ptr::slice_from_raw_parts_mut(begin_ptr, len))
}
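The raw-pointer variant above exists to avoid invalidating existing aliases: it builds a `*mut [u8]` for a subrange without ever materializing a `&mut [u8]`. A stand-alone sketch of the same idea (hypothetical function name):

```rust
use std::ptr;

// Sketch of the raw-pointer accessor: build a *mut [u8] for a subrange
// without materializing a &mut [u8] to the whole buffer contents, so other
// raw pointers into the buffer are not invalidated by a fresh unique borrow.
fn subrange_ptr(bytes: &mut Box<[u8]>, start: usize, len: usize) -> *mut [u8] {
    assert!(start + len <= bytes.len()); // do our own bounds-check
    let begin_ptr = bytes.as_mut_ptr().wrapping_add(start);
    ptr::slice_from_raw_parts_mut(begin_ptr, len)
}

fn main() {
    let mut buf: Box<[u8]> = vec![0u8; 8].into_boxed_slice();
    let p = subrange_ptr(&mut buf, 2, 4);
    // SAFETY: p points into buf, within bounds, and no other access overlaps.
    unsafe { (*p)[0] = 7 };
    assert_eq!(buf[2], 7);
}
```

This mirrors why the trait needs `add_ptr` (or, with the `Deref` redesign, a raw-pointer accessor) rather than going through a mutable slice each time.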