reuse RHS allocation for vec.extend(vec.into_iter()) when they do not fit into the LHS #77496
Conversation
r? @KodrAus (rust_highfive has picked a reviewer for you, use r? to override)
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit b0a249415002259095836413843f80f033c3d782 with merge 097a1aa29aaea51d74064470a76793b00bb9ec11...
CC @pickfire since you wanted to review vec changes
☀️ Try build successful - checks-actions, checks-azure
Queued 097a1aa29aaea51d74064470a76793b00bb9ec11 with parent 738d4a7, future comparison URL.
Finished benchmarking try commit (097a1aa29aaea51d74064470a76793b00bb9ec11): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below. Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
Nice, moderate improvements across the board :)
Oh wait I was looking at bootstrap - the instruction counts seem to be about the same.
Force-pushed from b0a2494 to c5af975 (Compare)
Force-pushed from b8af850 to 8a0a13a (Compare)
/// * `offset == 0` is always valid
/// * `offset` must be positive
unsafe fn into_vec(self, offset: isize) -> Vec<T> {
    let dst = unsafe { self.buf.as_ptr().offset(offset) };
We could probably use `add` instead of `offset` for `usize`, but I don't know if the compiler will optimize out the offset if it is 0. Does it, or should we have two functions, one with an offset and one without?
0 is a constant in the other callsite, so it should be easy to optimize for llvm.
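For reference, `add` is simply `offset` with a `usize` argument, so the two forms compile to the same thing; a minimal sketch (using a throwaway buffer, not the PR's code) illustrating the equivalence and the zero-offset case:

```rust
fn main() {
    let buf = [10i32, 20, 30, 40];
    let base = buf.as_ptr();
    unsafe {
        // `add(n)` is documented as equivalent to `offset(n as isize)`.
        assert_eq!(*base.offset(2), 30);
        assert_eq!(*base.add(2), 30);
        // With a constant 0 either form is a no-op the optimizer can remove.
        assert_eq!(base.offset(0), base.add(0));
    }
}
```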
cc @lzutao to see if he's interested in checking this out
&& self.capacity() - self.len() < iterator.len()
&& iterator.cap - iterator.len() >= self.len()
Can this be simplified to the following?
&& self.capacity() - self.len() < iterator.len()
&& iterator.cap - iterator.len() >= self.len()
&& iterator.len() - self.len() < iterator.cap - self.capacity()
I wonder if this will always grow the vec if the iterator is larger.
The proposed change would underflow if self is larger than iterator
Underflow?
This is technically still an overflow, but that `iterator.len() - self.len()` would panic or wrap if, say, `self.capacity() == 20_000` and `self.len() == 19_950`, and `iterator.len() == 100`.
I meant the case where `self.capacity() > iterator.cap`. The subtraction would underflow the `usize` result and thus lead to the inequality unexpectedly evaluating to true, which would then violate the safety constraints of `into_vec_with_uninit_prefix`.
Then how about?
&& self.capacity() - self.len() < iterator.len()
&& iterator.cap - iterator.len() >= self.len()
&& iterator.len().saturating_sub(self.len()) < iterator.cap.saturating_sub(self.capacity())
For a self with `len == 2 && cap == 2` and an iterator with `len == 2 && cap == 3`, that would evaluate to true and attempt to store 4 elements into an allocation of 3. 💣💥
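To make that failure mode concrete, here is a small sketch using the hypothetical values from this thread (not the actual `Vec` internals):

```rust
fn main() {
    // self has len 2 / cap 2; the iterator has len 2 / cap 3.
    let (iter_len, self_len, iter_cap, self_cap) = (2usize, 2usize, 3usize, 2usize);

    // Plain `self_cap - iter_cap` would underflow when self.capacity()
    // exceeds iterator.cap; checked_sub surfaces that as None instead.
    assert_eq!(self_cap.checked_sub(iter_cap), None);

    // The saturating_sub form evaluates to 0 < 1, i.e. true, here...
    assert!(iter_len.saturating_sub(self_len) < iter_cap.saturating_sub(self_cap));

    // ...even though the combined length (4) exceeds the target capacity (3),
    // which is exactly the over-write case described above.
    assert!(self_len + iter_len > iter_cap);
}
```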
Significant parts were changed since then, yes.
// ² memcpy
// ³ memmove
// ⁴ free
//
//
That line was intentionally left blank.
I think following the rest of the codebase convention here would have this blank line not have the `//` comment, and just be empty.
// ³ into_vec    ____------BBBB____--------------    Vec(0x00, 0, 4) Vec(0x0a, 4, 8)
// ⁴ *self = v   ----------BBBB____--------------    Vec(0x0a, 4, 8)
//
// ## empty self, pristine iterator
Do we even check for this case below?
By the way, nice diagram.
`into_vec_with_uninit_prefix` is pretty much just a struct conversion when there's nothing to move.
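A rough sketch of what "just a struct conversion" means: the allocation's raw parts are reassembled into a `Vec` without any copy. This is illustrative only, using the public `Vec::from_raw_parts` rather than the std-internal `IntoIter` fields:

```rust
fn main() {
    let mut v = vec![1u8, 2, 3];
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    std::mem::forget(v); // ownership of the allocation now lives in the raw parts

    // Rebuild a Vec around the same allocation: no new allocation, no memcpy.
    let v2 = unsafe { Vec::from_raw_parts(ptr, len, cap) };
    assert_eq!(v2, [1, 2, 3]);
}
```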
library/alloc/src/vec.rs (Outdated)
// ## insufficient capacity
//
// [initial]       AAAAA-----BBBBBB__--------------    Vec(0x00, 5, 5) IntoIter(0x0a, 0x0a, 0x0f, 8)
// ¹² reserve(6)   ----------BBBBBB__--AAAAA______-    Vec(0x14, 5, 11) IntoIter(0x0a, 0x0a, 0x0f, 8)
Wait, is this correct, or am I thinking about it wrongly? Shouldn't `reserve` only have one `malloc` (realloc) to reallocate the memory when the alignment is the same, which it should be in this case? Why does it need a `memmove`?
Lines 169 to 183 in e055f87
// SAFETY: `new_size` is non-zero as `old_size` is greater than or equal to `new_size`
// as required by safety conditions. Other conditions must be upheld by the caller
old_size if old_layout.align() == new_layout.align() => unsafe {
    let new_size = new_layout.size();
    // `realloc` probably checks for `new_size >= old_layout.size()` or something similar.
    intrinsics::assume(new_size >= old_layout.size());
    let raw_ptr = GlobalAlloc::realloc(self, ptr.as_ptr(), old_layout, new_size);
    let ptr = NonNull::new(raw_ptr).ok_or(AllocError)?;
    if zeroed {
        raw_ptr.add(old_size).write_bytes(0, new_size - old_size);
    }
    Ok(NonNull::slice_from_raw_parts(ptr, new_size))
},
In many cases realloc is just malloc+memcpy+free. Only in some cases it can extend the allocation in place.
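As a rough sketch (not the std allocator's actual code, names are made up), the fallback path is essentially:

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr;

/// A naive stand-in for the realloc fallback: malloc + memcpy + free.
/// Real allocators only do this when the block cannot grow in place.
unsafe fn grow_by_copy(old: *mut u8, old_layout: Layout, new_size: usize) -> *mut u8 {
    let new_layout = Layout::from_size_align(new_size, old_layout.align()).unwrap();
    let new = alloc(new_layout); // malloc
    assert!(!new.is_null());
    ptr::copy_nonoverlapping(old, new, old_layout.size()); // memcpy
    dealloc(old, old_layout); // free
    new
}

fn main() {
    unsafe {
        let layout = Layout::from_size_align(4, 1).unwrap();
        let p = alloc(layout);
        assert!(!p.is_null());
        p.copy_from_nonoverlapping([1u8, 2, 3, 4].as_ptr(), 4);

        // Growing moves the bytes to a fresh allocation.
        let q = grow_by_copy(p, layout, 8);
        assert_eq!(std::slice::from_raw_parts(q, 4), &[1, 2, 3, 4]);
        dealloc(q, Layout::from_size_align(8, 1).unwrap());
    }
}
```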
}
iterator.ptr = iterator.end;
iterator.move_into(self);
It would be cooler if the code were linked to the cool diagram.
iterator.move_into(self);
// Insufficient capacity
iterator.move_into(self);
Well, yes and no.
With the optimization present that is indeed all it covers. But it is the general codepath that also works without the optimization. So I don't want to give the impression that it can only handle that case.
Looks good to me but I think we should run the timer again.
@jyn514 poke for another perf run.
@bors try @rust-timer queue Thanks for the ping, I'd forgotten about this.
Awaiting bors try build completion
⌛ Trying commit 1885f38 with merge d99d46fbeabf815d807dea479ae158f7ae9041c2...
☀️ Try build successful - checks-actions, checks-azure
Queued d99d46fbeabf815d807dea479ae158f7ae9041c2 with parent b5c9e24, future comparison URL.
Finished benchmarking try commit (d99d46fbeabf815d807dea479ae158f7ae9041c2): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below. Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
Slight improvements in the bootstrap timings, otherwise not much change. Not sure if it's worth it. |
Slight change but the memory did improve, less memory used. |
I am pretty sure that the memory usage here is just noise. Noise in the 20% range is not unusual for max-rss. I think the improvements on the bootstrap timing is also likely noise, though that's less clear. |
@the8472 Hmm, was there a use case you had in mind originally when optimizing this?
@KodrAus just trying to reduce allocations and memcpys. But looks like it didn't work out. |
I tried a broader version of this optimization in #70793 but it regressed compile times because it emitted more IR for every single use of `extend()` and thus had to be removed from that PR. This attempt is narrower since it only applies to cases where we're extending/appending directly from another vec. The aim is to improve runtime behavior, not necessarily compile time, but I still want a perf run to make sure it's not regressing significantly, unlike the original attempt.