Reintroduce fast math functions #7495

messmerd · 2024-09-12T01:43:32Z

Added the fastPow function back into lmms_math.h after it was mistakenly removed in #7382, and removed the undefined behavior caused by type punning with a union.
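Background on the union problem: in C++, reading a union member other than the one last written is undefined behavior, while copying the bytes with std::memcpy is well defined. The sketch below shows a memcpy-based fastPow. It assumes IEEE-754 binary64 doubles whose byte order matches std::uint64_t, and a > 0. The constant 1072632447 is the one from the widely circulated Ankerl approximation, and I am assuming it matches the old union-based code. This is an illustration of the technique, not the exact code merged in this PR.

```cpp
#include <cstdint>
#include <cstring>

// Sketch of a union-free fastPow (assumptions: IEEE-754 binary64 doubles with the
// same byte order as std::uint64_t, and a > 0). Only the upper 32 bits of the
// double, which hold the sign, exponent and top of the mantissa, are used.
inline double fastPow(double a, double b)
{
	std::uint64_t bits;
	std::memcpy(&bits, &a, sizeof(bits)); // well defined, unlike reading an inactive union member

	const auto hi = static_cast<std::int32_t>(bits >> 32);
	// (hi - 1072632447) is roughly 2^20 * log2(a); scaling by b approximates 2^20 * log2(a^b)
	const auto newHi = static_cast<std::int32_t>(b * (hi - 1072632447) + 1072632447);

	// Put the result back into the upper word and zero the lower word
	bits = static_cast<std::uint64_t>(static_cast<std::uint32_t>(newHi)) << 32;

	double result;
	std::memcpy(&result, &bits, sizeof(result));
	return result;
}
```

For example, fastPow(2.0, 2.0) returns about 4.23 rather than 4, which shows why the accuracy question raised later in this thread matters.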

Replaced calls to std::fma with x * y + z (see discussion in review comments below).

I left the fast sqrt function out because it was unused.

include/lmms_math.h

Rossmaxx

All good.

include/lmms_math.h

Rossmaxx · 2024-09-14T11:29:47Z

@messmerd Having thought about it a bit, how about we just use the raw a * b + c instead of calling fma? @LostRobotMusic is there any chance this might be faster, since, as Dom said, FP_FAST_FMA* is rarely defined?

LostRobotMusic · 2024-09-25T00:27:10Z

> @LostRobotMusic is there any chance this might be faster, since, as Dom said, FP_FAST_FMA* is rarely defined?

0 chance, because the code already there uses a * b + c when the fast FMA flag isn't set.
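To make this concrete, a helper of the kind being discussed looks roughly like the sketch below. The name mulAdd is hypothetical and this is not the removed fastFma verbatim. When FP_FAST_FMA is not defined it already reduces to a * b + c, so writing a * b + c directly cannot be slower.

```cpp
#include <cmath>

// Hypothetical helper, roughly the shape of the fastFma function discussed here.
// FP_FAST_FMA is optionally defined by <cmath> when std::fma is expected to be
// about as fast as, or faster than, a separate multiply and add.
inline double mulAdd(double a, double b, double c)
{
#ifdef FP_FAST_FMA
	return std::fma(a, b, c); // one rounding
#else
	return a * b + c;         // two roundings, unless the compiler contracts it into an FMA
#endif
}
```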

> Having thought about it a bit, how about we just use the raw a * b + c instead of calling fma?

I have another idea: using FMA instructions instead of calling std::fma (explained below). But I have no strong opinion either way; a*b+c is already blazingly fast.

@DomClark @messmerd A quick check of Dom's findings shows he's correct: when using that -march flag, both the FMA and non-FMA versions of the code compile to use vfmadd132sd for me in GCC. I think this is because that -march flag enables -mfma, and -mfma on its own produces the same result. This also allows FP_FAST_FMA to be defined in GCC.

However, as messmerd mentioned on Discord, FMA has different precision behavior than the standard multiply-add, and I found it odd that the compiler would choose to optimize something in a way that objectively changes its behavior. As it turns out, GCC has -ffp-contract=fast by default, even with -fno-fast-math. Apparently these behavior differences are so incredibly minor that GCC decided to have the faster behavior enabled by default regardless of other settings.
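The precision difference is easy to demonstrate. In this sketch (function names are illustrative), a is 1 + 2^-27, so the exact value of a * a is 1 + 2^-26 + 2^-54. Rounding the product to double drops the 2^-54 term, so the separate version returns 0, while std::fma rounds only once and returns 2^-54. The volatile store is there so the compiler cannot contract the separate version into an FMA, which is exactly what -ffp-contract=fast would otherwise be allowed to do.

```cpp
#include <cmath>

// Multiply, round the product to double, then add: two roundings.
// The volatile store forces the intermediate rounding even under -ffp-contract=fast.
inline double separateMulAdd(double a, double b, double c)
{
	volatile double product = a * b;
	return product + c;
}

// std::fma computes a * b + c as if with infinite precision, then rounds once.
inline double fusedMulAdd(double a, double b, double c)
{
	return std::fma(a, b, c);
}
```

With a = 1.0 + std::ldexp(1.0, -27) and c = -(1.0 + std::ldexp(1.0, -26)), separateMulAdd(a, a, c) gives 0.0 and fusedMulAdd(a, a, c) gives std::ldexp(1.0, -54), about 5.55e-17. Differences of this size are why the contraction is usually harmless for audio.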

(I don't recommend this ↓)
If we wanted to have full control over this precision behavior, while also making use of FMA where applicable, the solution would be to enable FMA with -mfma, prevent the compiler from optimizing things to use FMA automatically via -ffp-contract=off (I don't know what other things this would impact aside from FMA), and then use FMA directly whenever we want to make use of that...

(I recommend this, if possible ↓)
However, since this whole time GCC would have been doing this anyway with certain march flags (as Dom demonstrated), I think it can safely be concluded that allowing the compiler to optimize things to use FMA globally via -mfma wouldn't noticeably change any behavior whatsoever inside of LMMS. Assuming I'm not missing some important detail, I highly recommend taking this approach...

IF all of the hardware we target supports FMA. I don't know whether this is the case. If it isn't, we can still allow it to be enabled for self-compiled builds, but it would have to be disabled for the releases on the website, or else the program simply wouldn't run on that hardware.

In any case, unless we take the aforementioned manual FMA approach with -ffp-contract=off (which I don't think we should do), there isn't any reason for fastFma to exist and it should be removed, if I'm not mistaken.

I don't know a lot about this topic, so hopefully I'm not being completely glaringly wrong on every front here in some way I'm not noticing. That would be embarrassing.

Rossmaxx · 2024-09-25T02:43:17Z

Thanks for the clarification, Lost. Now, another question: would it make sense to change other calls to std::pow to use fastPow instead?

LostRobotMusic · 2024-09-25T03:43:10Z

> Thanks for the clarification, Lost. Now, another question: would it make sense to change other calls to std::pow to use fastPow instead?

Probably not, only if you really know what you're doing. fastPow is an approximation, and I don't think anybody's measured its error yet. It runs 5x as fast on my computer and 14x as fast on my laptop, but its speed doesn't matter if it's too inaccurate to work for the task it's being used for. Backwards compatibility needs to be kept in mind as well. The first absolutely necessary step would be measuring its accuracy.
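As a starting point for that measurement, a hypothetical harness like the following compares any pow approximation against std::pow on an evenly spaced grid of positive bases and exponents, and reports the worst relative error. The function name, ranges and step count are placeholders; the ranges that matter depend on where fastPow would actually be used.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: worst relative error of approxPow against std::pow over
// an evenly spaced grid. Assumes aMin > 0 (so std::pow never returns 0) and steps > 0.
double maxRelativeError(double (*approxPow)(double, double),
	double aMin, double aMax, double bMin, double bMax, int steps)
{
	double worst = 0.0;
	for (int i = 0; i <= steps; ++i)
	{
		const double a = aMin + (aMax - aMin) * i / steps;
		for (int j = 0; j <= steps; ++j)
		{
			const double b = bMin + (bMax - bMin) * j / steps;
			const double exact = std::pow(a, b);
			const double error = std::abs(approxPow(a, b) - exact) / exact;
			worst = std::max(worst, error);
		}
	}
	return worst;
}
```

Assuming fastPow takes and returns double, a call such as maxRelativeError(fastPow, 0.01, 10.0, 0.1, 4.0, 200) would give a first rough number to compare against what a given use site can tolerate.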

I recommend not bothering with replacing past pow uses except in cases where its usage is causing a noticeable performance detriment. Do not use it in cases where the pow base is known at compile time (e.g. pow(2,x) or pow(10,x)), those can be optimized in significantly better and more accurate ways than a general-purpose pow approximation. I'll eventually make a PR to optimize LMMS's dBFS/amplitude conversion functions specifically.
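As an illustration of the known-base case (not the PR mentioned above): converting dBFS to linear amplitude is 10^(dBFS/20), and because the base is fixed it can be rewritten as a single std::exp2 call with a precomputed constant, using 10^(x/20) = 2^(x * log2(10) / 20). The function name is hypothetical, and whether this beats std::pow in practice depends on the platform and would need measuring.

```cpp
#include <cmath>

// Hypothetical helper: 10^(dbfs / 20) rewritten as 2^(dbfs * log2(10) / 20).
// The fixed base lets std::exp2 do the work instead of a general-purpose pow(base, exponent).
inline double dbfsToAmpExp2(double dbfs)
{
	static const double log2Of10Over20 = std::log2(10.0) / 20.0;
	return std::exp2(dbfs * log2Of10Over20);
}
```

For example, 0 dBFS maps to 1.0, +20 dBFS to about 10.0, and -6 dBFS to about 0.501.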

On GCC with -O1 or -O2 optimizations, this new implementation generates identical assembly to the old union-based implementation

include/lmms_math.h

Rossmaxx · 2024-09-29T15:59:27Z

@LostRobotMusic you did say you use std::pow in LOMM right? Wanna replace?

LostRobotMusic · 2024-09-29T16:40:41Z

> @LostRobotMusic you did say you use std::pow in LOMM right? Wanna replace?

No, I said I use the dBFS/amplitude conversion functions. In my last message I said I'll make a PR to optimize those.

Rossmaxx · 2024-09-29T23:31:20Z

Regarding LOMM, I remember you saying so somewhere on Discord, per: https://discord.com/channels/203559236729438208/784594576223633478/1286903926653714433

I dug out the exact chat this time to avoid flak.

LostRobotMusic · 2024-09-30T00:25:29Z

> Regarding LOMM, I remember you saying so somewhere on Discord, per: https://discord.com/channels/203559236729438208/784594576223633478/1286903926653714433
>
> I dug out the exact chat this time to avoid flak.

LOMM uses LMMS's dbfsToAmp function, and LMMS's dbfsToAmp function uses std::pow. std::pow is not directly used anywhere in LOMM's code, and as I said, I'll eventually make a PR to optimize LMMS's dBFS/amplitude conversion functions specifically.

Rossmaxx · 2024-09-30T00:30:40Z

Ok, I misunderstood that point.

* Add fast fma functions
* Use fast fma functions
* Add fast pow function
* Use fast pow function
* Fix build
* Remove fastFma
* Avoid UB in fastPow

On GCC with -O1 or -O2 optimizations, this new implementation generates identical assembly to the old union-based implementation

messmerd added 5 commits September 11, 2024 18:14

* Add fast fma functions (998becf)
* Use fast fma functions (ad363cf)
* Add fast pow function (652b25e)
* Use fast pow function (afb645d)
* Fix build (47c8295)

Rossmaxx reviewed Sep 12, 2024

include/lmms_math.h (outdated)

DomClark self-requested a review September 12, 2024 17:47

Rossmaxx approved these changes Sep 13, 2024

DomClark reviewed Sep 14, 2024

include/lmms_math.h (outdated)

include/lmms_math.h (outdated)

* Remove fastFma (e82932a)
* Avoid UB in fastPow (a61a6ff)

Rossmaxx reviewed Sep 29, 2024

include/lmms_math.h

michaelgregorius approved these changes Sep 29, 2024

LostRobotMusic approved these changes Oct 1, 2024

messmerd merged commit 121d608 into LMMS:master Oct 1, 2024
11 checks passed

messmerd deleted the fast-math branch October 1, 2024 18:35

messmerd added the performance label Nov 20, 2024