Rollup of 4 pull requests #123372

GuillaumeGomez · 2024-04-02T16:18:52Z

Successful merges:

- #122614 (rustdoc-search: shard the search result descriptions)
- #123338 (Update to new browser-ui-test version)
- #123366 (Minor by_move_body impl cleanups)
- #123371 (Remove dangling `.mir.stderr` and `.thir.stderr` test files)

r? @ghost
@rustbot modify labels: rollup

The descriptions are, on almost all crates[^1], the majority of the size of the search index, even though they aren't really used for searching. This makes it relatively easy to separate them into their own files.

This commit also bumps us to ES8. Out of the browsers we support, all of them support async functions according to caniuse.

https://caniuse.com/async-functions

[^1]: <https://microsoft.github.io/windows-docs-rs/>, a crate with 44MiB of pure names and no descriptions for them, is an outlier and should not be counted.

This adds a bit more data than "pure sharding" by including information about which items have no description at all. This way, the search code can sort the results, then truncate them, and only then download the descriptions.

With the "e" bitmap: 2380KiB
Without the "e" bitmap: 2364KiB
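The sort, truncate, then download order described above can be sketched as follows. This is a minimal illustration rather than rustdoc's actual code: the `Hit` type, the `score` field, the function name, and the use of a `BTreeSet` in place of the real "e" bitmap are all assumptions made for the example.

```rust
use std::collections::BTreeSet;

/// Hypothetical search hit: an item index plus a ranking score (lower is better).
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    item: u32,
    score: u32,
}

/// Sort the hits, keep the best `limit`, and report which of the kept items
/// still need their description downloaded.
///
/// `empty` stands in for the "e" bitmap: the set of item indices that are
/// known to have no description at all, so no download is needed for them.
fn sort_truncate_then_pick(
    mut hits: Vec<Hit>,
    limit: usize,
    empty: &BTreeSet<u32>,
) -> (Vec<Hit>, Vec<u32>) {
    // Ranking needs no description text, so it can run before any download.
    hits.sort_by_key(|h| (h.score, h.item));
    hits.truncate(limit);
    // Only the surviving hits with a non-empty description are requested.
    let to_fetch: Vec<u32> = hits
        .iter()
        .map(|h| h.item)
        .filter(|item| !empty.contains(item))
        .collect();
    (hits, to_fetch)
}
```

Because ranking never looks at the description text, hits that are truncated away never trigger a download. Hits marked as empty can be skipped without a round trip, which is what the small extra cost of the bitmap pays for.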

…llible op

Co-authored-by: Guillaume Gomez <[email protected]>

…=GuillaumeGomez

rustdoc-search: shard the search result descriptions

## Preview

This makes no visual changes to rustdoc search. It's a pure perf improvement.

<details><summary>old</summary>

Preview: <http://notriddle.com/rustdoc-html-demo-10/doc/std/index.html?search=vec>

WebPageTest Comparison with before branch on a sort of worst case (searching `vec`, winds up downloading most of the shards anyway): <https://www.webpagetest.org/video/compare.php?tests=240317_AiDc61_2EM,240317_AiDcM0_2EN>

Waterfall diagram:
![image](https://github.com/rust-lang/rust/assets/1593513/39548f0c-7ad6-411b-abf8-f6668ff4da18)

</details>

Preview: <http://notriddle.com/rustdoc-html-demo-10/doc2/std/index.html?search=vec>

WebPageTest Comparison with before branch on a sort of worst case (searching `vec`, winds up downloading most of the shards anyway): <https://www.webpagetest.org/video/compare.php?tests=240322_BiDcCH_13R,240322_AiDcJY_104>

![image](https://github.com/rust-lang/rust/assets/1593513/4be1f9ff-c3ff-4b96-8f5b-b264df2e662d)

## Description

r? `@GuillaumeGomez`

The descriptions are, on almost all crates[^1], the majority of the size of the search index, even though they aren't really used for searching. This makes it relatively easy to separate them into their own files. Additionally, this PR pulls out information about whether there's a description into a bitmap. This allows us to sort, truncate, *then* download.

This PR also bumps us to ES8. Out of the browsers we support, all of them support async functions according to caniuse.

https://caniuse.com/async-functions

[^1]: <https://microsoft.github.io/windows-docs-rs/>, a crate with 44MiB of pure names and no descriptions for them, is an outlier and should not be counted. But this PR should improve it, by replacing a long line of empty strings with a compressed bitmap with a single Run section. Just not very much.

## Detailed sizes

```console
$ cat test.sh
set -ex
cp ../search-index*.js search-index.js
awk 'FNR==NR {a++;next} FNR<a-3' search-index.js{,} | awk 'NR>1 {gsub(/\],\\$/,""); gsub(/^\["[^"]+",/,""); print} {next}' | sed -E "s:\\\\':':g" > search-index.json
jq -c '.t' search-index.json > t.json
jq -c '.n' search-index.json > n.json
jq -c '.q' search-index.json > q.json
jq -c '.D' search-index.json > D.json
jq -c '.e' search-index.json > e.json
jq -c '.i' search-index.json > i.json
jq -c '.f' search-index.json > f.json
jq -c '.c' search-index.json > c.json
jq -c '.p' search-index.json > p.json
jq -c '.a' search-index.json > a.json
du -hs t.json n.json q.json D.json e.json i.json f.json c.json p.json a.json
$ bash test.sh
+ cp ../search-index1.78.0.js search-index.js
+ awk 'FNR==NR {a++;next} FNR<a-3' search-index.js search-index.js
+ awk 'NR>1 {gsub(/\],\\$/,""); gsub(/^\["[^"]+",/,""); print} {next}'
+ sed -E 's:\\'\'':'\'':g'
+ jq -c .t search-index.json
+ jq -c .n search-index.json
+ jq -c .q search-index.json
+ jq -c .D search-index.json
+ jq -c .e search-index.json
+ jq -c .i search-index.json
+ jq -c .f search-index.json
+ jq -c .c search-index.json
+ jq -c .p search-index.json
+ jq -c .a search-index.json
+ du -hs t.json n.json q.json D.json e.json i.json f.json c.json p.json a.json
64K     t.json
800K    n.json
8.0K    q.json
4.0K    D.json
16K     e.json
192K    i.json
544K    f.json
4.0K    c.json
36K     p.json
20K     a.json
```

These are, roughly, the size of each section in the standard library (this tool actually excludes libtest, for parsing-json-with-awk reasons, but libtest is tiny so it's probably not important).

* t = item type, like "struct", "free fn", or "type alias". Since one byte is used for every item, this implies that there are approximately 64 thousand items in the standard library.
* n = name, and that's now the largest section of the search index with the descriptions removed from it
* q = parent *module* path, stored parallel to the items within
* D = the size of each description shard, stored as vlq hex numbers
* e = empty description bit flags, stored as a roaring bitmap
* i = parent *type* index as a link into `p`, stored as decimal json numbers; used only for associated types; might want to switch to vlq hex, since that's shorter, but that would be a separate pr
* f = function signature, stored as lists of lists that index into `p`
* c = deprecation flag, stored as a roaring bitmap
* p = parent *type*, stored separately and linked into from `i` and `f`
* a = alias, as [[key, value]] pairs

## Search performance

http://notriddle.com/rustdoc-html-demo-11/perf-shard/index.html

For example, in stm32f4:

<table><thead><tr><th>before<th>after</tr></thead>
<tbody><tr><td>

```
Testing T -> U ... in_args = 0, returned = 0, others = 200
wall time = 617
Testing T, U ... in_args = 0, returned = 0, others = 200
wall time = 198
Testing T -> T ... in_args = 0, returned = 0, others = 200
wall time = 282
Testing crc32 ... in_args = 0, returned = 0, others = 0
wall time = 426
Testing spi::pac ... in_args = 0, returned = 0, others = 0
wall time = 673
```

</td><td>

```
Testing T -> U ... in_args = 0, returned = 0, others = 200
wall time = 716
Testing T, U ... in_args = 0, returned = 0, others = 200
wall time = 207
Testing T -> T ... in_args = 0, returned = 0, others = 200
wall time = 289
Testing crc32 ... in_args = 0, returned = 0, others = 0
wall time = 418
Testing spi::pac ... in_args = 0, returned = 0, others = 0
wall time = 687
```

</td></tr><tr><td>

```
user: 005.345 s
sys: 002.955 s
wall: 006.899 s
child_RSS_high: 583664 KiB
group_mem_high: 557876 KiB
```

</td><td>

```
user: 004.652 s
sys: 000.565 s
wall: 003.865 s
child_RSS_high: 538696 KiB
group_mem_high: 511724 KiB
```

</td></tr>
</table>

This perf tester is janky and unscientific enough that the apparent differences might just be noise. If it's not an order of magnitude, it's probably not real.

## Future possibilities

* Currently, results are not shown until the descriptions are downloaded. Theoretically, the description-less results could be shown. But actually doing that, and making sure it works properly, would require extra work (we have to be careful to avoid layout jumps).
* More than just descriptions can be sharded this way. But we have to be careful to make sure the size wins are worth the round trips. Ideally, data that’s needed only for display should be sharded while data needed for search isn’t.
* [Full text search](https://internals.rust-lang.org/t/full-text-search-for-rustdoc-and-doc-rs/20427) also needs this kind of infrastructure. A good implementation might store a compressed bloom filter in the search index, then download the full keyword in shards. But, we have to be careful not just of the amount readers have to download, but also of the amount that [publishers](https://gist.github.com/notriddle/c289e77f3ed469d1c0238d1d135d49e1) have to store.
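To make the roles of the `D` (shard sizes) and `e` (empty description flags) sections more concrete, here is a rough sketch of how an item index could be mapped to a shard. It rests on assumptions the description does not confirm: that each shard size counts descriptions rather than bytes, that empty descriptions are not stored in any shard, and that the bitmap has already been decoded into a sorted slice. The function names are hypothetical.

```rust
/// Position of `item` among the items that actually have a description,
/// or `None` if the item is flagged as empty.
///
/// `empty_sorted` stands in for the decoded "e" bitmap: a sorted list of
/// distinct item indices with no description (an assumption for this sketch).
fn rank_among_described(item: u32, empty_sorted: &[u32]) -> Option<usize> {
    match empty_sorted.binary_search(&item) {
        // The item itself is flagged empty: nothing to download.
        Ok(_) => None,
        // `empties_before` counts the empty items with a smaller index.
        Err(empties_before) => Some(item as usize - empties_before),
    }
}

/// Map a rank to `(shard number, offset inside that shard)`.
///
/// `shard_lens[i]` is assumed to be the number of descriptions in shard `i`.
fn locate_in_shards(shard_lens: &[usize], rank: usize) -> Option<(usize, usize)> {
    let mut remaining = rank;
    for (shard, &len) in shard_lens.iter().enumerate() {
        if remaining < len {
            return Some((shard, remaining));
        }
        remaining -= len;
    }
    // The rank lies past the last shard.
    None
}
```

Under these assumptions, a result page only has to request the shards that its top hits map to, which is why the worst case mentioned in the preview (a query like `vec` that touches most shards) still ends up downloading most of them.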

…est, r=notriddle Update to new browser-ui-test version This new version brings a lot of internal improvements (mostly around validating command input). It also improves some command names and arguments. r? `@notriddle`

…=compiler-errors Minor by_move_body impl cleanups r? `@compiler-errors`

…ompiler-errors Remove dangling `.mir.stderr` and `.thir.stderr` test files They are not needed since rust-lang#117673

GuillaumeGomez · 2024-04-02T16:19:05Z

@bors r+ p=4 rollup=never

bors · 2024-04-02T16:19:08Z

📌 Commit 4468068 has been approved by GuillaumeGomez

It is now in the queue for this repository.

bors · 2024-04-02T17:08:14Z

⌛ Testing commit 4468068 with merge 029cb1b...

bors · 2024-04-02T19:19:30Z

☀️ Test successful - checks-actions
Approved by: GuillaumeGomez
Pushing 029cb1b to master...

rust-timer · 2024-04-02T19:20:40Z

📌 Perf builds for each rolled up PR:

| PR# | Message | Perf Build Sha |
|----|----|----|
| #122614 | rustdoc-search: shard the search result descriptions | `90e44e7b54be0da0debdadeb4a2743f10f8a0ff9` (link) |
| #123338 | Update to new browser-ui-test version | `00b44ecf3db8b63f0354cf75f9d238dd4e08dcc9` (link) |
| #123366 | Minor by_move_body impl cleanups | `7402dd349cf1ec8dca7b7231c675400c306a45c0` (link) |
| #123371 | Remove dangling `.mir.stderr` and `.thir.stderr` test files | `781d2f5f1cf8011964ab44ecbfc6b588d647a6c4` (link) |

previous master: 36b6f9b58e

In the case of a perf regression, run the following command for each PR you suspect might be the cause: @rust-timer build $SHA

rust-timer · 2024-04-02T20:51:48Z

Finished benchmarking commit (029cb1b): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|----|----|----|----|
| Regressions ❌ (primary) | 2.5% | [2.3%, 2.6%] | 3 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.5% | [2.3%, 2.6%] | 3 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 667.623s -> 667.286s (-0.05%)
Artifact size: 315.65 MiB -> 315.68 MiB (0.01%)

notriddle and others added 16 commits March 15, 2024 17:49

rustdoc: clean up formatting

351890d

Fix style errors

2e368bf

Use promise.all to load sorted results in parallel

e860b9c

rustdoc-search: address nits

c65f7d8

Update to new browser-ui-test version

59120d0

Remove redundant code comments

0bb1ec7

Prefer UnordSet over FxHashSet where possible

b4993c4

Avoid an is_empty() followed by an index op in favor of a single fa…

6f3cc09

…llible op

Clean up src/librustdoc/html/render/search_index/encode.rs

a272007

Co-authored-by: Guillaume Gomez <[email protected]>

Remove dangling .mir.stderr and .thir.stderr test files

858a1df

Rollup merge of rust-lang#123366 - oli-obk:cleanups_async_closures, r…

0bf8140

…=compiler-errors Minor by_move_body impl cleanups r? `@compiler-errors`

Rollup merge of rust-lang#123371 - eduardosm:dangling-test-files, r=c…

4468068

…ompiler-errors Remove dangling `.mir.stderr` and `.thir.stderr` test files They are not needed since rust-lang#117673

bors added S-waiting-on-bors and removed S-waiting-on-review labels Apr 2, 2024

bors added the merged-by-bors label Apr 2, 2024

bors merged commit 029cb1b into rust-lang:master Apr 2, 2024
12 checks passed

rustbot added this to the 1.79.0 milestone Apr 2, 2024

GuillaumeGomez deleted the rollup-nwxdzev branch April 2, 2024 19:59
