High memory usage compiling keccak benchmark #54208
Now for NLL. According to perf.rust-lang.org, an "Nll" build of `keccak-check` has a much higher `max-rss`. The three allocation sites are here:

rust/src/librustc_mir/borrow_check/mod.rs, lines 171 to 197 in 28bcffe

Each of them allocates a dense `num_blocks * bits_per_block` bitset here:

rust/src/librustc_mir/dataflow/mod.rs, line 710 in 28bcffe
In each case `num_blocks` is 25,994, and `bits_per_block` is 94,972 in the first two and 83,308 in the third.
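To make the scale concrete, here is a minimal sketch (mine, not rustc's code) of the footprint such a dense bitset implies, assuming each row is rounded up to whole 64-bit words:

```rust
// Hypothetical size calculation for a dense per-block bitset. The constants
// come from this issue; the word-rounding assumption is mine.
fn bitset_bytes(num_blocks: usize, bits_per_block: usize) -> usize {
    // Store each row as whole 64-bit words, rounding up.
    let words_per_block = (bits_per_block + 63) / 64;
    num_blocks * words_per_block * 8
}

fn main() {
    let num_blocks = 25_994;
    // Each of the first two sets: 308,600,768 bytes (~308.6 MB).
    println!("{}", bitset_bytes(num_blocks, 94_972));
    // The third set: 270,753,504 bytes (~270.8 MB).
    println!("{}", bitset_bytes(num_blocks, 83_308));
}
```

Together the three sets come to roughly 888 MB.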
I tried changing the representation of these sets, and one trivial idea looks like it would help. @nikomatsakis: any other thoughts here from the algorithmic side?
I have implemented this in #54211.

I have implemented this in #54213.

#54420 improves the non-NLL case some more.
Because of this, the NLL:non-NLL `max-rss` ratio for this benchmark will look worse.
@nnethercote two questions:

I guess this answers my question:
@nikomatsakis: I have run out of ideas on this one. If it helps, here is what the first bitset looks like: it is 25994 x 94976 bits (308.6MB), the rows start off almost entirely set and by the end drop down to about half set, and about 75% of the bits are set overall.

The second bitset is 25994 x 83328 bits (270.8MB). Apart from the second row, the rows start off almost empty and get fuller until they are 77% full by the end; about 38% of the bits are set overall. I didn't look at the third one.

I can't see how to represent this data more compactly, and I don't understand the algorithm in enough detail to know whether less data could be stored. I also looked into separating the lifetimes of the two structures, but they are used in tandem, as far as I can tell.
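For context on why a more compact representation is hard here: a dense row costs one bit per possible element regardless of how many are set, while a sparse row (say, a sorted `Vec<u32>` of set indices) costs 32 bits per set element, so sparse only wins below roughly 3% occupancy. At 75% or even 38% full, dense is already the cheaper form. A minimal sketch of a hybrid row in the spirit of the sparse/dense hybrid sets used elsewhere in rustc (the code and names are illustrative, not the actual implementation):

```rust
/// A hypothetical hybrid row: sparse while few bits are set, dense afterwards.
enum HybridRow {
    /// Sorted list of set bit indices; cheap while the row is nearly empty.
    Sparse(Vec<u32>),
    /// One bit per possible element; cheap once the row fills up.
    Dense(Vec<u64>),
}

/// Switch-over threshold; arbitrary for this sketch.
const SPARSE_MAX: usize = 8;

impl HybridRow {
    fn insert(&mut self, bit: u32, domain_size: usize) {
        let dense = match self {
            HybridRow::Sparse(elems) => {
                if let Err(pos) = elems.binary_search(&bit) {
                    elems.insert(pos, bit);
                }
                if elems.len() <= SPARSE_MAX {
                    return;
                }
                // Too many elements: build the dense form.
                let mut words = vec![0u64; (domain_size + 63) / 64];
                for &b in elems.iter() {
                    words[b as usize / 64] |= 1u64 << (b % 64);
                }
                HybridRow::Dense(words)
            }
            HybridRow::Dense(words) => {
                words[bit as usize / 64] |= 1u64 << (bit % 64);
                return;
            }
        };
        *self = dense;
    }
}
```

With rows this full, though, the hybrid degenerates to dense almost immediately, which matches the conclusion above.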
Discussed with @nikomatsakis during triage of NLL issues. We decided that the memory usage on this case should not block NLL's inclusion in RC2. In terms of whether to put this on the Release milestone or not, we decided that it would be a better idea, at least in the short-to-middle term, to focus effort more on Polonius, since that component might end up replacing the dataflow entirely, and thus the pay-off from optimizing the current dataflow could be short-lived.

So, tagging as NLL-deferred, with the intention of revisiting after we've learned more about what we plan to do with Polonius, if anything.
NLL triage. P-medium. WG-compiler-performance.
Unsure if this is still relevant, but retagging the wg- label to wg-nll.
#93984 has been merged. It reduced `max-rss` on CI from 974MB to 399MB, a 2.44x reduction. This wins back enough of the original 2.69x regression that I am happy to declare victory here 😄
According to perf.rust-lang.org, a "Clean" build of `keccak-check` has a `max-rss` of 637 MB. Here's a Massif profile of the heap memory usage.

![keccak-clean](https://user-images.githubusercontent.com/1940286/45520646-a6550a00-b7fd-11e8-8a09-bdd4ae9aa21d.png)

The spike is due to a single allocation of 500,363,244 bytes here:

rust/src/librustc/middle/liveness.rs, line 601 in 28bcffe
Each vector element is a `Users`, which is a three-field struct taking up 12 bytes. `num_live_nodes` is 16,371, `num_vars` is 2,547, and 12 * 16,371 * 2,547 = 500,363,244.

I have one idea to improve this: `Users` is a triple containing two `u32`s and a `bool`, which means it takes up 96 bits even though it only contains 65 bits of data. If we split it up so we have 3 vectors instead of a vector of triples, we'd end up with 4 * 16,371 * 2,547 + 4 * 16,371 * 2,547 + 1 * 16,371 * 2,547 = 375,272,433 bytes, which is a reduction of 125,090,811 bytes. This would get `max-rss` down from 637MB to 512MB, a reduction of 20%.
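A minimal sketch of that three-vector split; the field names and exact types here are assumptions for illustration, not the actual liveness.rs definitions:

```rust
// Before: a vector of structs. The 9 bytes of fields are padded to 12 bytes
// per element because of the u32 alignment.
struct Users {
    reader: u32,
    writer: u32,
    used: bool,
}

// After: a struct of vectors. No per-element padding, so each
// (live node, variable) pair costs exactly 4 + 4 + 1 = 9 bytes.
struct UsersTable {
    readers: Vec<u32>,
    writers: Vec<u32>,
    used: Vec<bool>,
}

impl UsersTable {
    fn new(num_live_nodes: usize, num_vars: usize) -> Self {
        let n = num_live_nodes * num_vars;
        UsersTable {
            readers: vec![0; n],
            writers: vec![0; n],
            used: vec![false; n],
        }
    }
}
```

The saving comes entirely from eliminating padding; each lookup is still O(1) indexing into a `row * num_vars + col` slot.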
Alternatively, if we packed the `bool`s into a bitset, we could get it down to 338,787,613 bytes, a reduction of 161,575,631 bytes. This would get `max-rss` down from 637MB to 476MB, a reduction of 25%. But it might slow things down... it depends on whether the improved locality is outweighed by the extra instructions needed for bit manipulation.
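And a sketch of the bitset variant: replace the `Vec<bool>` above with one bit per flag, at the price of some bit-twiddling on every access (again, hypothetical names):

```rust
// Packs the `used` flags at one bit each: 41,696,937 flags fit in ~5.2 MB
// of words instead of ~41.7 MB of one-byte bools.
struct UsedBits {
    words: Vec<u64>,
}

impl UsedBits {
    fn new(len: usize) -> Self {
        UsedBits { words: vec![0; (len + 63) / 64] }
    }

    fn set(&mut self, i: usize) {
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    fn get(&self, i: usize) -> bool {
        self.words[i / 64] & (1u64 << (i % 64)) != 0
    }
}
```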
@nikomatsakis: do you have any ideas for improving this on the algorithmic side? Is this dense `num_live_nodes * num_vars` representation avoidable?