
Add fast tanh approximation #95

Merged: 1 commit into sdatkinson:main on Mar 4, 2023

Conversation

@mikeoliphant (Contributor)

This change adds a fast, accurate tanh approximation. On my machine, it speeds up processing by about 40%.

Note that this change has the potential to alter the perceived sound of the plugin. It sounds the same to my ears, though, and the performance gain is very significant.

Because it is not a transparent change, I have left it off by default. To enable it, switch the #define for tanh_impl_ to fast_tanh_ in dsp.cpp.

Maybe this could be switched based on a compile flag?
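
For readers unfamiliar with the technique, below is a minimal sketch of one commonly used fast tanh approximation: clamp the input, then evaluate a small rational polynomial. This is illustrative only; it is not necessarily the formula or the coefficients used by fast_tanh_ in this PR.

```cpp
#include <algorithm>

// Illustrative fast tanh (not the PR's exact implementation): a handful of
// multiplies and one divide replace the transcendental call. The rational
// polynomial equals +/-1 exactly at the clamp points, so the output stays
// within [-1, 1]. std::clamp requires C++17.
inline float fast_tanh_sketch(float x)
{
    x = std::clamp(x, -3.0f, 3.0f);
    const float x2 = x * x;
    return x * (27.0f + x2) / (27.0f + 9.0f * x2);
}
```
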

@sdatkinson (Owner) left a comment

Very cool!

My first reaction is that there's no reason the model has to use tanh--in fact I experimented with different activations (ReLU, etc.) when picking that model. This could simply be another activation, and folks could train their models explicitly intending to use this faster activation--there's no reason IMO why the model has to compute differently (up to the fast convs) in the plugin vs the trainer.

So...care to PR this as a new activation in neural-amp-modeler as well? 😄 (Not sure your proficiency w/ Python or PyTorch, but just in case?...)

In the meantime, I see that this PR as-is doesn't actually change the behavior of the plugin, and I think it's an interesting idea in the right direction, so I'm happy to approve and get you opened up to hacking on it with a smaller diff from main 😄

I don't want to change the default given how important accuracy is to the community, but I might approve something like a "fast activations" checkbox, available when the model architecture allows it (this supports other architectures beyond the "pre-packaged" ones most folks use), as long as it's not too intrusive (e.g. in some "advanced options" menu?)

@sdatkinson merged commit e1410f3 into sdatkinson:main on Mar 4, 2023
@mikeoliphant (Contributor, Author)

I was thinking the same thing with regard to also using it as an alternate training activation function. If/when I get a local training setup working on my machine, I'll look into it more.

On a related note, specifying fast rather than precise floating point in Visual Studio (/fp:fast instead of the default /fp:precise) also improves things significantly.

@alexlarsson

Random drive-by comment:

I did some profiling on NAMCore yesterday, and at least in my test case (Phillipe P VOXAC15-JonAr1.nam), tanh is by far the most expensive part of the DSP runtime, at 43%:
[profiler screenshot: nam-prof1]

However, even with fast tanh enabled, it is not exactly free, at 19%:
[profiler screenshot: nam-prof2]

So maybe an even better approach is to design an even faster activation function that is similar to tanh and re-train with that.

@mikeoliphant (Contributor, Author)

I have a slightly faster fast tanh version (using fabsf instead of fabs) - but yeah, I agree that there is room for improvement.

I looked into doing a custom activation function for PyTorch, but it was non-trivial enough that I stalled on it (I'm not a Python guy).

"ReLU" is supported by both the training and the plugin, and should be cheap. I assume that it didn't work as well, though?

@sdatkinson (Owner)

"ReLU" is supported by both the training and the plugin, and should be cheap. I assume that it didn't work as well, though?

Correct, unfortunately

@alexlarsson

I did not see any improvement using fabsf(), because my cmath header provides both float and double versions of fabs() via function overloading.

I do, however, think it is safe to enable fast tanh by default. To test this, I ran the Vox model above on a short (~12 sec) sample file with 16-bit input, once with std::tanh and once with fast_tanh, producing new 16-bit data in both cases.

There were a total of 557056 samples. The histogram of the absolute error between the two results, in 16-bit sample space (count of samples, then error in 16-bit units), is:

  92803 0
 195674 1
 150212 2
  58648 3
  26176 4
  13581 5
   7681 6
   4493 7
   2726 8
   1711 9
   1183 10
    759 11
    513 12
    341 13
    231 14
    139 15
     82 16
     42 17
     25 18
     14 19
     12 20
      4 21
      3 22
      1 23
      1 24
      1 26

That means no single error was larger than 26 units in 65536, and about 90% of the samples differed by no more than 3 units in 65536. The mean absolute error is 1.8 and the mean squared error is 6.0.
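
For anyone who wants to reproduce this kind of check, here is a minimal sketch of the comparison (function and variable names are my own; the idea is to render the same input twice, quantize both to 16-bit, and histogram the per-sample difference):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <vector>

// Histogram of absolute per-sample error between two 16-bit renders of the
// same input (e.g. one produced with std::tanh, one with fast_tanh).
std::map<int, std::size_t> ErrorHistogram(const std::vector<int16_t>& a,
                                          const std::vector<int16_t>& b)
{
    std::map<int, std::size_t> hist;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; i++)
    {
        const int err = std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
        hist[err]++;
    }
    return hist;
}
```
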

I don't think anyone could ever hear this difference.

@mikeoliphant (Contributor, Author)

I don't think anyone could ever hear this difference.

That is my suspicion as well, but since it does change the output, I didn't want to enable it by default.

I'm hoping to reorganize the activation code a bit and make it easy to swap to the fast function at runtime, but I'm holding out for this PR: sdatkinson/NeuralAmpModelerCore#13 so I don't have to juggle too many branches.

@mikeoliphant (Contributor, Author) commented Mar 30, 2023

It turns out that using a hard tanh (basically just clamping to the -1/1 range) seems to work just as well as regular tanh in training, and it is cheap to run and already implemented as an activation function in PyTorch.

Here is a PR to add the "Hardtanh" activation function to the core playback code: sdatkinson/NeuralAmpModelerCore#14
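
For reference, hard tanh is just a clamp, so the playback-side cost is essentially zero; a minimal sketch (PyTorch's nn.Hardtanh computes the same thing with its default limits of -1 and 1):

```cpp
#include <algorithm>

// Hard tanh: identity inside [-1, 1], saturated outside. No divide and no
// transcendental call, so it's about as cheap as an activation gets.
inline float hard_tanh(float x)
{
    return std::clamp(x, -1.0f, 1.0f);
}
```
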

@mikeoliphant deleted the tanh_approximation branch on April 5, 2023 at 22:45