Support arrays of zeros in Vec's __rust_alloc_zeroed optimization #95362
Conversation
(rust-highfive has picked a reviewer for you, use r? to override)
Hm. The current IsZero implementations were all really cheap -- basically a single cmp -- while scanning an array to check that it's all zeros is comparatively expensive. That said, I think we don't currently use it for anything except the calloc optimization, and there it is likely to be a win to know up front that you're allocating a set of zeros. I suppose the possible exception is where the array is long and the total number of elements is low, so the extra pass over the array ends up being more expensive than the time saved by getting zeroed memory (memset(0)/calloc) instead of memcpy. But that case is probably quite rare.
r=me if you agree with that rationale, otherwise we can discuss more.
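For readers outside the thread, the checks being discussed have roughly this shape. This is a standalone sketch, not the actual `alloc` internals: the real `IsZero` trait is an unsafe, perma-unstable implementation detail, and everything here beyond the names `IsZero`/`is_zero` is made up for illustration.

```rust
// Simplified sketch: the existing scalar checks are a single comparison,
// while the array check proposed in this PR has to walk every element.
trait IsZero {
    fn is_zero(&self) -> bool;
}

impl IsZero for u32 {
    #[inline]
    fn is_zero(&self) -> bool {
        *self == 0 // basically a single cmp
    }
}

impl<T: IsZero, const N: usize> IsZero for [T; N] {
    #[inline]
    fn is_zero(&self) -> bool {
        // O(N): a full pass over the array, unless LLVM can fold it away.
        self.iter().all(IsZero::is_zero)
    }
}

fn main() {
    assert!(0u32.is_zero());
    assert!([0u32; 8].is_zero());
    assert!(![0u32, 1, 0, 0].is_zero());
}
```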
Yeah, I think this'll be too expensive for large arrays. It might make sense to do this only when the size is small enough, e.g. when it fits in a cache line.
Part of my worry is that due to various processor optimizations and e.g. the ability to calloc by directly asking the kernel for zeroed pages (in theory), there's a really hard balance to strike for this kind of thing. So for short arrays it definitely seems reasonable -- where the cost of checking is quite low. I guess it's purely an optimization though, so should be easy to make that happen.
The issue is that if the array is a chain of zeros followed by a non-zero byte, then we spend all that time checking it without ever taking the calloc path.
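A hypothetical input exhibiting that worst case might look like this (sizes chosen arbitrarily for illustration; this is not code from the thread):

```rust
fn main() {
    // Worst case for an unconditional runtime check: the element is zero
    // everywhere except its final byte, so an all-zeros scan walks the whole
    // 4096-byte array, fails at the very end, and the allocation still has to
    // take the ordinary (non-calloc) fill path.
    let mut template = [0u8; 4096];
    template[4095] = 1;
    let v = vec![template; 1024];
    assert_eq!(v[0][4095], 1);
}
```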
Yes, the trait isn't even public. Basically, the odds of a particularly long only-non-zero-at-the-end array getting passed to this seem low.

@bors r=Mark-Simulacrum

(It's not obvious where the threshold for stopping would be if we did make one, because I suspect that the things passed to this are […].)

Oh, and in the long term I look forward to having something from safe transmute, so we'll be able to just have a […] impl.
📌 Commit 8034c45 has been approved by `Mark-Simulacrum`
We could have a different optimization for literals/values visible to the optimizer. I am not convinced that a runtime check is a good one.
This doesn't seem true: https://godbolt.org/z/KcP1sjEj3 |
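The linked Compiler Explorer snippet isn't reproduced here, but the kind of test case involved is presumably along these lines (element type, lengths, and function name are made up, not taken from the link):

```rust
// Build with optimizations and inspect the assembly or LLVM IR: if the
// all-zeros check is constant-folded, this lowers to a single zeroed
// allocation with no per-element comparison loop.
pub fn zeroed_matrix() -> Vec<[u64; 8]> {
    vec![[0u64; 8]; 64]
}

fn main() {
    assert_eq!(zeroed_matrix().len(), 64);
}
```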
☀️ Test successful - checks-actions |
Finished benchmarking commit (bf61143): comparison url. Summary: This benchmark run did not return any relevant results. If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression |
Tweak the vec-calloc runtime check to only apply to shortish-arrays

r? `@Mark-Simulacrum`

`@nbdd0121` pointed out in rust-lang#95362 (comment) that LLVM currently doesn't constant-fold the `IsZero` check for long arrays, so that seems like a reasonable justification for limiting it. It appears that it's based on length, not byte size (https://godbolt.org/z/4s48Y81dP), so that's what I used in the PR. Maybe it's a ["the number of inlining shall be three"](https://youtu.be/s4wnuiCwTGU?t=320) sort of situation.

Certainly there's more that could be done here -- the generated code that checks long arrays byte-by-byte is highly suboptimal, for example -- but this is an easy, low-risk tweak.
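In sketch form, that gate looks something like the following. This is a self-contained illustration, not the library's actual code, and the concrete cutoff of 16 elements is a placeholder rather than whatever value the follow-up PR settled on:

```rust
// Standalone sketch: gate the array check on a small element-count threshold
// so LLVM can still constant-fold it and the worst-case runtime scan stays cheap.
trait IsZero {
    fn is_zero(&self) -> bool;
}

impl IsZero for u32 {
    #[inline]
    fn is_zero(&self) -> bool {
        *self == 0
    }
}

impl<T: IsZero, const N: usize> IsZero for [T; N] {
    #[inline]
    fn is_zero(&self) -> bool {
        // Threshold on element count, not byte size, as described above.
        // The concrete number is illustrative.
        N <= 16 && self.iter().all(IsZero::is_zero)
    }
}

fn main() {
    assert!([0u32; 8].is_zero());
    // Long arrays conservatively report "not zero", skipping the calloc fast
    // path rather than paying for a long scan.
    assert!(![0u32; 32].is_zero());
}
```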
Tests show that the current LLVM threshold is up to 98 elements (inclusive), so it may make sense to limit the check there.
The value I got is 49 elements. |
@ChayimFriedman2 I also got 49 -- but see #96596 where I've already limited this check. |
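The sort of probe behind those numbers is presumably something like this (hypothetical snippet; per the comments above, the exact element type appears to change where folding stops):

```rust
// Compare the optimized output for element lengths on either side of the
// suspected cutoff: past it, the zero check reportedly stops being
// constant-folded and a runtime comparison loop shows up instead.
pub fn short_element() -> Vec<[u8; 49]> {
    vec![[0u8; 49]; 100]
}

pub fn longer_element() -> Vec<[u8; 50]> {
    vec![[0u8; 50]; 100]
}

fn main() {
    assert_eq!(short_element().len(), 100);
    assert_eq!(longer_element().len(), 100);
}
```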
I used […]
I happened to notice in https://users.rust-lang.org/t/any-advantage-of-box-u64-16-16-16-over-vec-u64/73500/3?u=scottmcm that the calloc optimization wasn't applying to vectors-of-arrays, so here's the easy fix for that.
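The comparison from that thread is roughly the following (sizes taken from the linked post's title; this is a sketch, not the post's exact code):

```rust
// The flat case already hit the zeroed-allocation (`__rust_alloc_zeroed` /
// calloc) fast path; before this PR, the equivalent vector-of-arrays did not,
// even though every byte is zero.
pub fn flat() -> Vec<u64> {
    vec![0u64; 16 * 16 * 16]
}

pub fn nested() -> Vec<[[u64; 16]; 16]> {
    vec![[[0u64; 16]; 16]; 16]
}

fn main() {
    assert_eq!(flat().len(), nested().len() * 16 * 16);
}
```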