Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep samples in 64 bit #773

Merged
merged 7 commits into from
May 30, 2021

Conversation

roderickvd
Copy link
Member

@roderickvd roderickvd commented May 27, 2021

Perhaps counter-intuitively, the main reason for doing this is higher performance. Until now we are storing samples in f32 but casting them to f64 every time we manipulate them (normalisation, volume control), then cast them back again. The idea is that we can save CPU cycles and RAM copies by just keeping them in f64.

Added benefits are:

  • Support for F64 audio format
  • Even lower quantization error (bragging rights)

To re-iterate, 64 bit sample processing is already in, this just two places of casting back and forth while introducing one new one (reading lewton samples). So compared to the current situation, this should result in the following cases:

Volume control Normalisation Performance Sound quality
Softvol Disabled Equal Better
Softvol Enabled Faster Better
Softvol @ MAX Disabled Slower Equal
Softvol @ MAX Enabled Equal Better
Alsa mixer Disabled Slower Equal
Alsa mixer Enabled Equal Better

Perhaps counter-intuitively, the main reason for doing this is higher
performance. Until now we are storing samples in `f32` but casting them
to `f64` every time we manipulate them (normalisation, volume control),
then cast them back again. The idea is that we can save CPU cycles and
RAM copies by just keeping them in `f64`.

Added benefits are:

- Support for F64 audio format
- Even lower quantization error (bragging rights)

To re-iterate, 64 bit sample processing is already in, this just
two places of casting back and forth while introducing one new one
(reading `lewton` samples). So compared to the current situation,
this should result in the following cases:

Volume control | Normalisation | Performance | Sound quality
-------------- | ------------- | ----------- | -------------
Softvol        | Disabled      | Equal       | Better
Softvol        | Enabled       | Faster      | Better
Softvol @ MAX  | Disabled      | Slower      | Equal
Softvol @ MAX  | Enabled       | Equal       | Better
Alsa mixer     | Disabled      | Slower      | Equal
Alsa mixer     | Enabled       | Equal       | Better

This is a work in progress, currently in is support for Alsa, Rodio,
pipe and subprocess. The other backends are on the todo list.
@roderickvd
Copy link
Member Author

Contrary to what I had thought, cursory testing on Alsa has shown no significant change in CPU or RAM usage. In which case we "might as well".

@JasonLG1979
Copy link
Contributor

JasonLG1979 commented May 29, 2021

I am by no means an expert but in my mind, like you've said it makes sense to just to keep everything in 64 bit once it's converted even if it's not a tangible performance difference it's more straightforward.

As a side-note, the Gstreamer backend may still "just work" since it just uses parse_launch?

@JasonLG1979
Copy link
Contributor

One nitpick I have is the magic numbers in convert.rs The "factors". They would be much less magical if they were consts.

It could be something like (pseudo code, but you get the idea):

const i32_factor = 2147483648.
const i24_factor = 8388608.
const i16_factor = 32768.

@roderickvd
Copy link
Member Author

As a side-note, the Gstreamer backend may still "just work" since it just uses parse_launch?

Yes but due to a different mechanic. The default GStreamer sets ! audioconvert and that's where the magic happens. parse_launch is really just to parse and launch that pipeline.

Also note that the pipeline sets format={} which is aligned with the samples that librespot puts out, so there won't be a format conversion in GStreamer anyway (at least not on that account).

One nitpick I have is the magic numbers in convert.rs The "factors". They would be much less magical if they were consts.

Agreed that's why this is already in this PR 😉 (look at the code)

@JasonLG1979
Copy link
Contributor

JasonLG1979 commented May 29, 2021

Agreed that's why this is already in this PR (look at the code)

Where? In this PR you use for example:

            .map(|sample| self.scale(*sample, 2147483648.) as i32)

Just because it's not hex doesn't make it less magical,lol!!!

@roderickvd
Copy link
Member Author

Oh I see you're asking to make them named constants instead of just dropping them as parameters. Sure could do that.

@JasonLG1979
Copy link
Contributor

Oh I see you're asking to make them named constants instead of just dropping them as parameters. Sure could do that.

I'm not a rust person but to me it at least kinda makes it more clear what they are. I just hate naked numbers in code in general also,lol!!!

@JasonLG1979
Copy link
Contributor

Yes but due to a different mechanic. The default GStreamer sets ! audioconvert and that's where the magic happens. parse_launch is really just to parse and launch that pipeline.

Also note that the pipeline sets format={} which is aligned with the samples that librespot puts out, so there won't be a format conversion in GStreamer anyway (at least not on that account).

Gstreamer supports 64 bit float, so the question really becomes why bother to do any conversion? Why not just always give Gstreamer 64 bit float and let it worry about converting it? Having a static config would make things much easier I'd imagine?

@roderickvd
Copy link
Member Author

I guess you could, however:

  1. I don’t fancy magic black-box approaches because they are not guaranteed to work and somehow always produce more issue reports than just pulling the strings yourself.

  2. It would mean that GStreamer does dithering downstream, and it would be a burden to map librespot dithering to control what GStreamer does. One of the big review points of the dithering PR was too not cause too much coupling in the back ends. We’ve now nicely abstracted that away.

  3. It’s not very idiomatic in the way librespot tackles the other backends.

So yes we could but let’s not change a winning team.

@roderickvd
Copy link
Member Author

I am by no means an expert but in my mind, like you've said it makes sense to just to keep everything in 64 bit once it's converted even if it's not a tangible performance difference it's more straightforward.

Could you verify on your RPi Zero?

Indeed if it doesn't really matter CPU-wise and no-one raises any other concerns, I plan on merging this.

@roderickvd roderickvd marked this pull request as ready for review May 29, 2021 19:47
@JasonLG1979
Copy link
Contributor

I guess you could, however:

  1. I don’t fancy magic black-box approaches because they are not guaranteed to work and somehow always produce more issue reports than just pulling the strings yourself.
  2. It would mean that GStreamer does dithering downstream, and it would be a burden to map librespot dithering to control what GStreamer does. One of the big review points of the dithering PR was too not cause too much coupling in the back ends. We’ve now nicely abstracted that away.
  3. It’s not very idiomatic in the way librespot tackles the other backends.

So yes we could but let’s not change a winning team.

I guess for that matter Gstreamer also supports volume control and ReplayGain. It was just a thought.

Could you verify on your RPi Zero?

Yes of course. I test just about all of the audio related PR's on my zero 😉

Indeed if it doesn't really matter CPU-wise and no-one raises any other concerns, I plan on merging this.

In your 1st comment you mention that only doing the conversions at the edges in theory should use less CPU cycles and then in a follow up comment you said it really didn't make much of any difference. What I mean is that it makes more sense and is more straightforward to have "one format to rule them all" as far as internally and convert only at the edges even if there is no performance benefit. Regardless of anything else it's easier for a future dev to know that internally audio is always 64 bit float.

@roderickvd
Copy link
Member Author

Could you verify on your RPi Zero?

Yes of course. I test just about all of the audio related PR's on my zero wink

Great, thanks! Buying a Zero myself wouldn't be too much trouble but at the same time, this is great peer review, so very much appreciated.

Indeed if it doesn't really matter CPU-wise and no-one raises any other concerns, I plan on merging this.

In your 1st comment you mention that only doing the conversions at the edges in theory should use less CPU cycles and then in a follow up comment you said it really didn't make much of any difference. What I mean is that it makes more sense and is more straightforward to have "one format to rule them all" as far as internally and convert only at the edges even if there is no performance benefit. Regardless of anything else it's easier for a future dev to know that internally audio is always 64 bit float.

To double-check. You mean you agree with the approach in these commits? Internally it's now f32 to f64 from lewton to librespot AudioPacket, then stays there in f64 all through volume control and normalisation, then is converted down to the chosen audio format for the backend. So this is like you said: converting at the edges, same format internally everywhere.

@JasonLG1979
Copy link
Contributor

To double-check. You mean you agree with the approach in these commits? Internally it's now f32 to f64 from lewton to librespot AudioPacket, then stays there in f64 all through volume control and normalisation, then is converted down to the chosen audio format for the backend. So this is like you said: converting at the edges, same format internally everywhere.

Yes I agree that only converting twice is obviously more efficient technically and as I said it's easier to understand if the audio is 64 bit pretty much all the time internally even if it doesn't make for a tangible performance boost.

@JasonLG1979
Copy link
Contributor

Great, thanks! Buying a Zero myself wouldn't be too much trouble but at the same time, this is great peer review, so very much appreciated.

I'm not so sure if I would refer to myself as a "peer" as I clearly don't know rust,lol!!! Testing things and asking disruptive and often stupid questions are my way of contributing,lol!!! Contribution is the currency in an open source project. Pure consumers are of very little, to no use in open source. Basically if you don't contribute you can't complain,lol!!!

@JasonLG1979
Copy link
Contributor

Testing on my Pi Zero shows no noticeable difference performance or resource usage wise.

@JasonLG1979
Copy link
Contributor

@roderickvd as soon as you merge this I'll start my work on the wiki. It'll be nice from a "bragging rights" stand point to say that all internal audio processing is done in "64 bit Double-precision floating-point format to push requantization noise well below the physical noise floor of even the most state of the art DACs in production today."... Hows that for marketing speak,lol!!!

@roderickvd
Copy link
Member Author

🦾 heck yes. Roon and JRiver do it all in 64-bit and now we're here with them.
I'll leave this up here for one more day for others to comment.

@JasonLG1979
Copy link
Contributor

JasonLG1979 commented May 29, 2021

heck yes. Roon and JRiver do it all in 64-bit and now we're here with them.
I'll leave this up here for one more day for others to comment.

I think pretty much every modern media player/framework does stuff in 64 bit. Just not all of them brag about it,lol!!! The audio enthusiast community does love their superlatives and grand statements though. I promise to keep away from snake oil salesman territory though. I will keep things factual but there's no reason facts can't sound poetic,lol!!!

@roderickvd roderickvd merged commit fe2d5ca into librespot-org:dev May 30, 2021
@roderickvd roderickvd deleted the keep-samples-in-64-bit branch May 30, 2021 18:09
@JasonLG1979
Copy link
Contributor

@roderickvd again sorry to bug you, but to go from f64 to f32 it looks like you're casting a f64 as a f32 what does the conversion look like in that situation? Does it round, truncate?

@roderickvd
Copy link
Member Author

It drops 32 bits and what happens then is determined by the way floating point are encoded. It's not as easy to say if that's rounding or not, that's thinking in an integer paradigm when this is not.

You can play around with https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8fdaea3e5d7d51fda1c922f286c5fb5a and see what happens.

Remember that samples values are normalized between -1.0..=1.0.

This PR is closed now so as much as I enjoy the discussion, let's do that on Gitter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants