Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support arrays of zeros in Vec's __rust_alloc_zeroed optimization #95362

Merged
merged 1 commit into from
May 1, 2022

Conversation

scottmcm
Copy link
Member

I happened to notice in https://users.rust-lang.org/t/any-advantage-of-box-u64-16-16-16-over-vec-u64/73500/3?u=scottmcm that the calloc optimization wasn't applying to vectors-of-arrays, so here's the easy fix for that.

@rust-highfive
Copy link
Collaborator

r? @joshtriplett

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 27, 2022
@rust-log-analyzer

This comment has been minimized.

@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 24, 2022
@Dylan-DPC
Copy link
Member

r? @Mark-Simulacrum

@Mark-Simulacrum
Copy link
Member

Hm. The current IsZero implementations were all really cheap -- basically a single cmp -- while scanning an array to check it's all zeroes is comparatively expensive. That said, I think we don't currently use it for anything except the calloc optimization, and there I think it is likely to be a win to know up front you're allocating a set of zeroes.

I suppose with the possible exception where the array is long and total number of elements is low, so the extra pass over the array ends up being more expensive than the knowledge the time saved with memset(0) over memcpy. But probably that case is quite rare.

@Mark-Simulacrum
Copy link
Member

r=me if you agree with that rationale, otherwise we can discuss more.

@nbdd0121
Copy link
Contributor

Yeah, I think this'll be too expensive for large array. It might make sense to do this only when its size is small enough, e.g. containable in a cache line.

@Mark-Simulacrum
Copy link
Member

Part of my worry is that due to various processor optimizations and e.g. the ability to calloc by directly asking the kernel for zeroed pages (in theory), there's a really hard balance to strike for this kind of thing. So for short arrays it definitely seems reasonable -- where the cost of checking is quite low. I guess it's purely an optimization though, so should be easy to make that happen.

@nbdd0121
Copy link
Contributor

The issue is that if the array is a chain of zeros followed by a non-zero byte, then we spend all time checking it without using the calloc path.

@scottmcm
Copy link
Member Author

scottmcm commented Apr 30, 2022

Yes, the trait isn't even mem::IsZero, it's alloc::vec::IsZero, because it's only used for this.

Basically, the odds of a particularly long only-non-zero-at-the-end array getting passed to vec![x; n] seem low enough to me that it's ok as-is. And we can always tweak the check later if needed.

@bors r=Mark-Simulacrum

(It's not obvious where the threshold for stopping would be if we did make one, because I suspect that the things passed to vec! are often literals, where even something like vec![[0_u8; 4096]; n] is completely fine because the check would be const-folded.)

Oh, and in the long-term I look forward to having something from safe transmute so we'll be able to just have a Vec::zeroed(n) that's safe by using those traits to ensure sure values are safe to construct, so that people will be able to get this directly and obviously, rather than needing these tricks internally. At that point restricting it to shorter arrays would make more sense to me, since with runtime values it's less obviously useful to have this check.

@bors
Copy link
Contributor

bors commented Apr 30, 2022

📌 Commit 8034c45 has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 30, 2022
@nbdd0121
Copy link
Contributor

We could have different optimization for literals/values visible to optimizer. I am not convinced that a runtime check is a good one.

@nbdd0121
Copy link
Contributor

nbdd0121 commented May 1, 2022

(It's not obvious where the threshold for stopping would be if we did make one, because I suspect that the things passed to vec! are often literals, where even something like vec![[0_u8; 4096]; n] is completely fine because the check would be const-folded.)

This doesn't seem true: https://godbolt.org/z/KcP1sjEj3

@bors
Copy link
Contributor

bors commented May 1, 2022

⌛ Testing commit 8034c45 with merge bf61143...

@bors
Copy link
Contributor

bors commented May 1, 2022

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing bf61143 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 1, 2022
@bors bors merged commit bf61143 into rust-lang:master May 1, 2022
@rustbot rustbot added this to the 1.62.0 milestone May 1, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (bf61143): comparison url.

Summary: This benchmark run did not return any relevant results.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

bors added a commit to rust-lang-ci/rust that referenced this pull request May 2, 2022
…acrum

Tweak the vec-calloc runtime check to only apply to shortish-arrays

r? `@Mark-Simulacrum`

`@nbdd0121` pointed out in rust-lang#95362 (comment) that LLVM currently doesn't constant-fold the `IsZero` check for long arrays, so that seems like a reasonable justification for limiting it.

It appears that it's based on length, not byte size, (https://godbolt.org/z/4s48Y81dP), so that's what I used in the PR.  Maybe it's a ["the number of inlining shall be three"](https://youtu.be/s4wnuiCwTGU?t=320) sort of situation.

Certainly there's more that could be done here -- that generated code that checks long arrays byte-by-byte is highly suboptimal, for example -- but this is an easy, low-risk tweak.
@ChayimFriedman2
Copy link
Contributor

(It's not obvious where the threshold for stopping would be if we did make one, because I suspect that the things passed to vec! are often literals, where even something like vec![[0_u8; 4096]; n] is completely fine because the check would be const-folded.)

This doesn't seem true: https://godbolt.org/z/KcP1sjEj3

Test shows that current LLVM threshold is up to 98 elements (including), so it may make sense to limit N?

@nbdd0121
Copy link
Contributor

nbdd0121 commented May 6, 2022

The value I got is 49 elements.

@scottmcm
Copy link
Member Author

scottmcm commented May 6, 2022

@ChayimFriedman2 I also got 49 -- but see #96596 where I've already limited this check.

@ChayimFriedman2
Copy link
Contributor

The value I got is 49 elements.

I used -Copt-level=3, you probably used -O which is opt level 2.

workingjubilee pushed a commit to tcdi/postgrestd that referenced this pull request Sep 15, 2022
Tweak the vec-calloc runtime check to only apply to shortish-arrays

r? `@Mark-Simulacrum`

`@nbdd0121` pointed out in rust-lang/rust#95362 (comment) that LLVM currently doesn't constant-fold the `IsZero` check for long arrays, so that seems like a reasonable justification for limiting it.

It appears that it's based on length, not byte size, (https://godbolt.org/z/4s48Y81dP), so that's what I used in the PR.  Maybe it's a ["the number of inlining shall be three"](https://youtu.be/s4wnuiCwTGU?t=320) sort of situation.

Certainly there's more that could be done here -- that generated code that checks long arrays byte-by-byte is highly suboptimal, for example -- but this is an easy, low-risk tweak.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.