
Add fast tanh approximation #95

Merged: 1 commit into sdatkinson:main on Mar 4, 2023

Conversation

@mikeoliphant (Contributor)

This change adds a fast, accurate tanh approximation. On my machine, it speeds up processing by about 40%.

Note that this change has the potential to alter the perceived sound of the plugin. It sounds the same to my ears, though, and the performance gain is very significant.

Because it is not a transparent change, I have left it off by default. To enable it, switch the #define for tanh_impl_ to fast_tanh_ in dsp.cpp.

Maybe this could be switched based on a compile flag?
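
For readers unfamiliar with the technique, below is a minimal sketch of one commonly used fast tanh approximation: clamp the input, then evaluate a small rational polynomial. This is illustrative only; it is not necessarily the formula or the coefficients used by fast_tanh_ in this PR.

```cpp
#include <algorithm>

// Illustrative fast tanh (not the PR's exact implementation): a handful of
// multiplies and one divide replace the transcendental call. The rational
// polynomial equals +/-1 exactly at the clamp points, so the output stays
// within [-1, 1]. std::clamp requires C++17.
inline float fast_tanh_sketch(float x)
{
    x = std::clamp(x, -3.0f, 3.0f);
    const float x2 = x * x;
    return x * (27.0f + x2) / (27.0f + 9.0f * x2);
}
```
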

@sdatkinson (Owner) left a comment

Very cool!

My first reaction is that there's no reason the model has to use tanh--in fact I experimented with different activations (ReLU, etc.) when picking that model. This could simply be another activation, and folks could train their models explicitly intending to use this faster activation--there's no reason IMO why the model has to compute differently (up to the fast convs) in the plugin vs the trainer.

So...care to PR this as a new activation in neural-amp-modeler as well? 😄 (Not sure your proficiency w/ Python or PyTorch, but just in case?...)

In the meantime, I see that this PR as-is doesn't actually change the behavior of the plugin, and I think it's an interesting idea in the right direction, so I'm happy to approve and get you opened up to hacking on it with a smaller diff from main 😄

I don't want to change the default given how important accuracy is to the community, but I might approve something like a "fast activations" checkbox, available when the model architecture allows it (this supports other architectures beyond the "pre-packaged" ones most folks use), as long as it's not too intrusive (e.g. in some "advanced options" menu?)

@sdatkinson merged commit e1410f3 into sdatkinson:main on Mar 4, 2023
@mikeoliphant (Contributor, Author)

I was thinking the same thing with regard to also using it as an alternate training activation function. If/when I get a local training setup working on my machine, I'll look into it more.

On a related note, specifying fast rather than precise floating point in Visual Studio (/fp:fast instead of the default /fp:precise) also improves things significantly.

@alexlarsson

Random drive-by comment:

I did some profiling on NAMCore yesterday, and at least in my test case (Phillipe P VOXAC15-JonAr1.nam), tanh is by far the most expensive part of the DSP runtime, at 43%:
[profiler screenshot: nam-prof1]

However, even with fast tanh enabled, it is not exactly free, at 19%:
[profiler screenshot: nam-prof2]

So maybe an even better approach is to design an even faster activation function that is similar to tanh and re-train with that.

@mikeoliphant (Contributor, Author)

I have a slightly faster fast tanh version (using fabsf instead of fabs) - but yeah, I agree that there is room for improvement.

I looked into doing a custom activation function for PyTorch, but it was non-trivial enough that I stalled on it (I'm not a Python guy).

"ReLU" is supported by both the training and the plugin, and should be cheap. I assume that it didn't work as well, though?

@sdatkinson (Owner)

"ReLU" is supported by both the training and the plugin, and should be cheap. I assume that it didn't work as well, though?

Correct, unfortunately

@alexlarsson

I did not see any improvement using fabsf(), because my cmath header provides both float and double versions of fabs() via function overloading.

I do, however, think it is safe to enable fast tanh by default. To test this, I ran the Vox model above on a short (~12 sec) sample file with 16-bit input, once with std::tanh and once with fast_tanh, producing new 16-bit data in both cases.

There were a total of 557056 samples. The histogram of the absolute error between the two results, in 16-bit sample space (count of samples, then error in 16-bit units), is:

  92803 0
 195674 1
 150212 2
  58648 3
  26176 4
  13581 5
   7681 6
   4493 7
   2726 8
   1711 9
   1183 10
    759 11
    513 12
    341 13
    231 14
    139 15
     82 16
     42 17
     25 18
     14 19
     12 20
      4 21
      3 22
      1 23
      1 24
      1 26

That means no single error was larger than 26 units in 65536, and about 90% of the samples differed by no more than 3 units in 65536. The mean absolute error is 1.8 and the mean squared error is 6.0.
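
For anyone who wants to reproduce this kind of check, here is a minimal sketch of the comparison (function and variable names are my own; the idea is to render the same input twice, quantize both to 16-bit, and histogram the per-sample difference):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <vector>

// Histogram of absolute per-sample error between two 16-bit renders of the
// same input (e.g. one produced with std::tanh, one with fast_tanh).
std::map<int, std::size_t> ErrorHistogram(const std::vector<int16_t>& a,
                                          const std::vector<int16_t>& b)
{
    std::map<int, std::size_t> hist;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; i++)
    {
        const int err = std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
        hist[err]++;
    }
    return hist;
}
```
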

I don't think anyone could ever hear this difference.

@mikeoliphant (Contributor, Author)

I don't think anyone could ever hear this difference.

That is my suspicion as well, but since it does change the output, I didn't want to enable it by default.

I'm hoping to reorganize the activation code a bit and make it easy to swap to the fast function at runtime, but I'm holding out for this PR: sdatkinson/NeuralAmpModelerCore#13 so I don't have to juggle too many branches.

@mikeoliphant (Contributor, Author) commented Mar 30, 2023

It turns out that using a hard tanh (basically just clamping to the -1/1 range) seems to work just as well as regular tanh in training, and it is cheap to run and already implemented as an activation function in PyTorch.

Here is a PR to add the "Hardtanh" activation function to the core playback code: sdatkinson/NeuralAmpModelerCore#14
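
For reference, hard tanh is just a clamp, so the playback-side cost is essentially zero; a minimal sketch (PyTorch's nn.Hardtanh computes the same thing with its default limits of -1 and 1):

```cpp
#include <algorithm>

// Hard tanh: identity inside [-1, 1], saturated outside. No divide and no
// transcendental call, so it's about as cheap as an activation gets.
inline float hard_tanh(float x)
{
    return std::clamp(x, -1.0f, 1.0f);
}
```
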

@mikeoliphant deleted the tanh_approximation branch on April 5, 2023 at 22:45