Add benchmarks for u32/u64 functions. #582

josephlr · 2025-01-10T04:24:21Z

This allows us to see the effect of any specialized implementations for getrandom::u32 or getrandom::u64. As expected, on Linux (which just uses the default implementation in utils.rs) there is no change:

test bench_u32                 ... bench:         196.50 ns/iter (+/- 4.85) = 20 MB/s
test bench_u32_via_fill        ... bench:         198.25 ns/iter (+/- 1.78) = 20 MB/s
test bench_u64                 ... bench:         196.95 ns/iter (+/- 2.99) = 40 MB/s
test bench_u64_via_fill        ... bench:         197.62 ns/iter (+/- 2.24) = 40 MB/s

but when using the rdrand backend (which is specialized), there is a measurable difference:

test bench_u32                 ... bench:          16.84 ns/iter (+/- 0.09) = 250 MB/s
test bench_u32_via_fill        ... bench:          18.40 ns/iter (+/- 0.28) = 222 MB/s
test bench_u64                 ... bench:          16.62 ns/iter (+/- 0.06) = 500 MB/s
test bench_u64_via_fill        ... bench:          17.70 ns/iter (+/- 0.08) = 470 MB/s
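The MB/s column in both runs looks consistent with dividing the bytes produced per iteration (4 for u32, 8 for u64) by the median time truncated to whole nanoseconds. The sketch below reproduces the reported figures under that assumption. `mb_per_s` is a hypothetical helper written for illustration, not a function taken from the bench harness.

```rust
/// Throughput in MB/s, assuming the time per iteration is truncated to
/// whole nanoseconds before dividing (an assumption inferred from the
/// figures above, not taken from the harness source).
fn mb_per_s(bytes_per_iter: u64, ns_per_iter: f64) -> u64 {
    // Truncate to whole nanoseconds and avoid dividing by zero.
    let ns = (ns_per_iter as u64).max(1);
    // bytes per ns * 1000 = MB per second (with MB = 10^6 bytes).
    bytes_per_iter * 1000 / ns
}

fn main() {
    // (name, bytes per iteration, reported ns/iter) from the rdrand run.
    let rows = [
        ("bench_u32", 4u64, 16.84),
        ("bench_u32_via_fill", 4, 18.40),
        ("bench_u64", 8, 16.62),
        ("bench_u64_via_fill", 8, 17.70),
    ];
    for (name, bytes, ns) in rows {
        println!("{name}: {} MB/s", mb_per_s(bytes, ns));
    }
}
```

Because of the truncation, the MB/s column overstates small differences (16.84 ns and 16.62 ns both count as 16 ns), so the ns/iter column is the better one to compare.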

Signed-off-by: Joe Richey <[email protected]>

newpavlov

I don't think we need the bench_*_via_fill benchmarks.

josephlr · 2025-01-10T23:41:49Z

> I don't think we need the bench_*_via_fill benchmarks.

The main reason I added those was to test the effectiveness of the getrandom::u32() and getrandom::u64() specializations we have for some backends. I wanted to know if having different implementations for u32() (say for the rdrand backend) actually made things faster.

Does it make sense to keep them for this reason? I can add some comments explaining why if you think it's necessary.
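For readers following along, here is a minimal sketch of the comparison being described: one path returns a u32 directly, the other fills a 4-byte buffer and converts it. Everything here is an assumption made for illustration. `StubSource` is a hypothetical, non-cryptographic stand-in for a backend, and `time_ns_per_iter` is a rough timer, not the harness used in this PR.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical stand-in for a randomness backend. It uses a xorshift
/// generator so the sketch has no dependencies; it is NOT secure.
struct StubSource(u64);

impl StubSource {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Generic path: write random bytes into a caller-provided buffer.
    fn fill(&mut self, dest: &mut [u8]) {
        for chunk in dest.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }

    /// Specialized path: return the value directly, with no byte buffer.
    fn u32(&mut self) -> u32 {
        self.next_u64() as u32
    }
}

/// The "via fill" shape: go through the byte-buffer API, then convert.
fn u32_via_fill(src: &mut StubSource) -> u32 {
    let mut buf = [0u8; 4];
    src.fill(&mut buf);
    u32::from_le_bytes(buf)
}

/// Rough average time per call in nanoseconds; black_box keeps the
/// optimizer from discarding the result.
fn time_ns_per_iter(iters: u32, mut f: impl FnMut() -> u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}

fn main() {
    let mut src = StubSource(0x9E37_79B9_7F4A_7C15);
    let direct = time_ns_per_iter(1_000_000, || src.u32());
    let via_fill = time_ns_per_iter(1_000_000, || u32_via_fill(&mut src));
    println!("u32 direct:   {direct:.2} ns/iter");
    println!("u32 via fill: {via_fill:.2} ns/iter");
}
```

Both paths return the same value for the same generator state, so any timing gap comes from the buffer round trip rather than from the source itself.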

newpavlov · 2025-01-11T01:09:39Z

Performance was not the motivation for adding the u32/u64 functions in the first place. The motivation was user ergonomics and conceptual "zero-costness" on some platforms. In the worst case we will get some stack spilling and unnecessary copies, which will cost no more than several cycles.

josephlr requested a review from newpavlov January 10, 2025 04:24

josephlr added this to the Post 0.3 Release milestone Jan 10, 2025

newpavlov reviewed Jan 10, 2025

newpavlov approved these changes Jan 10, 2025

newpavlov merged commit aa96363 into master Jan 25, 2025
59 checks passed

newpavlov deleted the bench_u32_u64 branch January 25, 2025 11:28
