Improve sample rounding and clean up noise shaping leftovers #771

roderickvd · 2021-05-27T18:52:40Z

Two things:

Remove unused parameter (leftover from scrapped noise shaping)
Round half away from zero instead of truncating to lower quantisation error

I've been fighting with Rust over something as stupid as point 2 but am sure now:

assert_eq!( 0,  0.1_f32.round() as i16);
assert_eq!( 1,  0.5_f32.round() as i16);
assert_eq!( 1,  0.9_f32.round() as i16);
assert_eq!( 0, -0.1_f32.round() as i16);
assert_eq!(-1, -0.5_f32.round() as i16);
assert_eq!(-1, -0.9_f32.round() as i16);

Without round() this would all return 0.

roderickvd · 2021-05-27T19:08:21Z

b724603: Absolute best rounding method is "round to even". Although this is the default and recommended rounding mode in IEEE 754 ("round to nearest, ties to even"), there are no LLVM intrinsics for it, and it introduces a new dependency. Compiling with --release, this increases the binary by only 728 bytes (yes, bytes) compared to std::f32::round().

roderickvd · 2021-05-29T20:25:51Z

Going to rebase and merge this, if there is any late opposition against the added 728 bytes of b724603 we can always revert later 😆

JasonLG1979 · 2021-05-29T20:48:10Z

I don't think 728 bytes is going to make or break it for anyone,lol!!! A monolithic librespot binary compiled for ARMv6 with the ALSA backend is about 13.6 MB.

JasonLG1979 · 2021-05-30T02:19:25Z

@roderickvd I'm not sure what the deal is but this addition causes librespot to use about twice the CPU as before during normal use (just playing not downloading or caching). Going directly to hardware with the --device arg (so no resampling overhead) it went from 10 - 15% to 20 to 30%. Not a huge deal as there's still plenty of headroom and it doesn't cause any audible issues but a couple more bumps like that and librespot won't run a Pi Zero anymore.

Edit: That's with --format S32 so no dithering either. My guess is that the new rounding method is a little costly.

JasonLG1979 · 2021-05-30T03:36:43Z

For a frame of reference on my Pi 4 running 64bit Raspberry Pi OS it uses less than 10% CPU and on my Fedora x86_64 desktop it's about 0.5% CPU. So clearly not the end of the world just something to keep in the back of your mind.

Edit: Diving a little deeper it's got to be the rounding. If I switch to F32 (bypassing the rounding in librespot, letting pipewire handle it) on my desktop it cuts the CPU usage in about half.

JasonLG1979 · 2021-05-30T04:18:31Z

@roderickvd I'm sorry if I'm talking your ear off but another thing I noticed is the use of math::round::half_to_even I get that it replicates the IEEE 754 standard but I'm not so sure if that's what you'd want in this case. The point of that standard is to make rounding errors more uniform and predictable. Uniform and predictable to me sounds like signal correlated. Wouldn't math::round::stochastic be a better choice to randomize ties?

roderickvd · 2021-05-30T07:00:28Z

@roderickvd I'm sorry if I'm talking your ear off but another thing I noticed is the use of math::round::half_to_even I get that it replicates the IEEE 754 standard but I'm not so sure if that's what you'd want in this case. The point of that standard is to make rounding errors more uniform and predictable. Uniform and predictable to me sounds like signal correlated. Wouldn't math::round::stochastic be a better choice to randomize ties?

That's not the same.

Randomisation is what dithering is for (and does better). A stochastic rounding function was in the original dithering PR, but yanked because it behaves as a rectangular probability density function. That would be uniform, and it's too theoretical to explain but that causes intermodulation distortion. (If you're interested, read the publicly available works by Lipschitz and Wannamaker)

The Gaussian or triangular ditherers take care of non-uniform, uncorrelated randomisation. Next step, you need to round instead of truncate. The best way to round is with the least error, and that is with an unbiased function (i.e. round half to even/uneven, also called convergent rounding). "Normal" rounding is biased in that it rounds away from zero.

We can watch the CPU usage and revert to standard round() if necessary. We'll also have to see how the 64-bit PR impacts this. The libmath crate introduces another conversion from f32 to f64 and back again, which is no longer necessary with the 64-bit PR.

At this point I'm not too concerned and only expect CPU usage to drop with the 64-bit PR.

JasonLG1979 · 2021-05-30T07:09:17Z

That's not the same.

Randomisation is what dithering is for (and does better). A stochastic rounding function was in the original dithering PR, but yanked because it behaves as a rectangular probability density function. That would be uniform, and it's too theoretical to explain but that causes intermodulation distortion. (If you're interested, read the publicly available works by Lipschitz and Wannamaker)

The Gaussian or triangular ditherers take care of non-uniform, uncorrelated randomisation. Next step, you need to round instead of truncate. The best way to round is with the least error, and that is with an unbiased function (i.e. round half to even/uneven, also called convergent rounding). "Normal" rounding is biased in that it rounds away from zero.

Thanks for the explanation. I will take your word for it and skip the technical papers,lol!!! I was just curious.

We can watch the CPU usage and revert to standard round() if necessary. We'll also have to see how the 64-bit PR impacts this. The libmath crate introduces another conversion from f32 to f64 and back again, which is no longer necessary with the 64-bit PR.

At this point I'm not too concerned and only expect CPU usage to drop with the 64-bit PR.

Good deal. Rebase it and I'll give it a test.

roderickvd · 2021-05-30T07:43:41Z

Try it, just merged it into #773.

roderickvd · 2021-05-30T08:23:35Z

@roderickvd I'm not sure what the deal is but this addition causes librespot to use about twice the CPU as before during normal use (just playing not downloading or caching). Going directly to hardware with the --device arg (so no resampling overhead) it went from 10 - 15% to 20 to 30%. Not a huge deal as there's still plenty of headroom and it doesn't cause any audible issues but a couple more bumps like that and librespot won't run a Pi Zero anymore.

On my RPi 3B it seems about +5%

JasonLG1979 · 2021-05-30T08:48:36Z

On my RPi 3B it seems about +5%

That sounds about right considering it probably only used about 5% to begin with before. the Pi Zero is orders of magnitudes slower then even a Pi 3. As a frame of reference, it takes about 13 min to compile librespot on a Pi 4 and over 4 hours on a Pi Zero,lol!!!

JasonLG1979 · 2021-05-30T08:58:29Z

20 - 30% isn't a huge deal under normal playback conditions. My only concern is that CPU usage can spike momentarily to as high as in the 70's while caching a track. Another doubling of CPU usage and we hit 100% and audio starts to drop out. Is there a way to give the playback thread a higher priority or is that done already?

roderickvd · 2021-05-30T10:02:46Z

I'm not sure but inclined to say no. But let's cross that bridge if and when we get there.

I'm about done with sound quality improvements, don't expect any more CPU hogs
it's easy to swap for std::f64::round() if necessary

JasonLG1979 · 2021-05-30T10:40:41Z

I'm not sure but inclined to say no. But let's cross that bridge if and when we get there.

Fair enough. No need for premature optimization.

I'm about done with sound quality improvements, don't expect any more CPU hogs

it's easy to swap for std::f64::round() if necessary

Well on a selfish note, I'm currently softening the wife up to let me buy another Pi 4 and a Topping D10s DAC for audio streaming so I may not care that much for much longer,lol!!!

But generally I don't think it's a huge deal. It's not like a Pi Zero is a multi-tasking beast. Realistically you can only expect it to do one thing at a time well. My test set up is a stock install of Raspotify OS lite with nothing else running except the default services and librespot. I don't think that it's unreasonable to say that if you plan on running librespot on a Pi Zero that's all you're going to run on it.

roderickvd · 2021-05-30T11:43:54Z

That's a great deal. You should come on Gitter if you want to talk gear and other stuff.

JasonLG1979 · 2021-05-30T12:10:27Z

So the verdict is that the 64 bit PR didn't help. Comparing 5d43b7d (the last build I had laying around) it still basically doubles. There was a cargo update in between so I can build right after that to make sure it's not a dependency update that's actually causing the problem?

roderickvd · 2021-05-30T12:31:09Z

Nah I’m pretty sure this is it, I’m seeing it too. I’d like to arrive at a different conclusion though, namely that we can have 64-bit sample handling basically for free.

JasonLG1979 · 2021-05-30T20:24:09Z

Nah I’m pretty sure this is it, I’m seeing it too. I’d like to arrive at a different conclusion though, namely that we can have 64-bit sample handling basically for free.

It's for sure it. I replaced it with int_value.round() and CPU is back where it was. The builtin round is clearly just more optimized. It is what it is.

Round half away from zero and clean up noise shaping leftovers

4f20610

roderickvd added the audio label May 27, 2021

roderickvd self-assigned this May 27, 2021

Round ties to even for absolutely lowest error

b724603

roderickvd force-pushed the round-and-cleanup-dithering branch from fd4915a to b724603 Compare May 27, 2021 19:18

Merge branch 'dev' into round-and-cleanup-dithering

532ce83

roderickvd merged commit 8062bd2 into librespot-org:dev May 29, 2021

roderickvd deleted the round-and-cleanup-dithering branch May 29, 2021 20:53

roderickvd changed the title ~~Round half away from zero and clean up noise shaping leftovers~~ Improve sample rounding and clean up noise shaping leftovers May 29, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve sample rounding and clean up noise shaping leftovers #771

Improve sample rounding and clean up noise shaping leftovers #771

roderickvd commented May 27, 2021

roderickvd commented May 27, 2021 •

edited

Loading

roderickvd commented May 29, 2021

JasonLG1979 commented May 29, 2021

JasonLG1979 commented May 30, 2021 •

edited

Loading

JasonLG1979 commented May 30, 2021 •

edited

Loading

JasonLG1979 commented May 30, 2021

roderickvd commented May 30, 2021

JasonLG1979 commented May 30, 2021

roderickvd commented May 30, 2021

roderickvd commented May 30, 2021

JasonLG1979 commented May 30, 2021

JasonLG1979 commented May 30, 2021

roderickvd commented May 30, 2021

JasonLG1979 commented May 30, 2021

roderickvd commented May 30, 2021

JasonLG1979 commented May 30, 2021

roderickvd commented May 30, 2021

JasonLG1979 commented May 30, 2021 •

edited

Loading

Improve sample rounding and clean up noise shaping leftovers #771

Improve sample rounding and clean up noise shaping leftovers #771

Conversation

roderickvd commented May 27, 2021

roderickvd commented May 27, 2021 • edited Loading

roderickvd commented May 29, 2021

JasonLG1979 commented May 29, 2021

JasonLG1979 commented May 30, 2021 • edited Loading

JasonLG1979 commented May 30, 2021 • edited Loading

JasonLG1979 commented May 30, 2021

roderickvd commented May 30, 2021

JasonLG1979 commented May 30, 2021

roderickvd commented May 30, 2021

roderickvd commented May 30, 2021

JasonLG1979 commented May 30, 2021

JasonLG1979 commented May 30, 2021

roderickvd commented May 30, 2021

JasonLG1979 commented May 30, 2021

roderickvd commented May 30, 2021

JasonLG1979 commented May 30, 2021

roderickvd commented May 30, 2021

JasonLG1979 commented May 30, 2021 • edited Loading

roderickvd commented May 27, 2021 •

edited

Loading

JasonLG1979 commented May 30, 2021 •

edited

Loading

JasonLG1979 commented May 30, 2021 •

edited

Loading

JasonLG1979 commented May 30, 2021 •

edited

Loading