
Use tanh to speed up sigmoid calculation #149

Closed

Conversation

@mikeoliphant
Contributor

For LSTM networks, the sigmoid activation function is very much in the performance hot path.

This PR defines the sigmoid calculation based on tanh() instead of exp(). It results in a significant performance improvement - I think because you are using a faster tanh approximation in your Eigen code.
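For reference, the change relies on the identity sigmoid(x) = 0.5 * (1 + tanh(x / 2)). Here is a minimal sketch of the rewritten function, written as the sigmoid method of a MathsProvider-style struct for the Eigen backend (a sketch of the idea, not the exact diff in this PR):

```cpp
// Sketch only (not the exact diff): route the sigmoid through Eigen's
// vectorized tanh via the identity sigmoid(x) = 0.5 * (1 + tanh(x / 2)).
template <typename Matrix>
static auto sigmoid(const Matrix& x)
{
    return 0.5f * (x.array() * 0.5f).tanh() + 0.5f;
}
```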

It is hard to follow the twists through the Eigen code to the root tanh function, but I think this is what gets used?

```cpp
T generic_fast_tanh_float(const T& a_x)
```

It will obviously vary based on platform and network architecture, but in my testing I was seeing around 50% of CPU time going to the sigmoid calculation. This change cuts that roughly in half, so the overall performance improvement is about 25% (about 1.3x faster).

@jatinchowdhury18
Owner

jatinchowdhury18 commented Nov 1, 2024

I don't think we should merge this change for two reasons:

  1. The whole point of the MathsProvider pattern is that users can manually provide their own implementations of functions like sigmoid, on a per-model or even per-layer basis (see the sketch at the end of this comment).
  2. With that in mind, I think it's best for the "default" MathsProvider to provide the highest precision implementation of each function that can be expected relative to the constraints of the chosen backend. This is especially important for recurrent layers, since precision errors might "build" up over time, and potentially even lead to instabilities. If users know they can get by with less precision for certain layers/models, they can implement their own MathsProvider for those layers/models, and potentially even use more aggressive approximations.

Unless you have some proof that the tanh implementation would be equal to the exp implementation to within, let's say, 5 ULP for all possible single-precision floating-point numbers, I don't think the change is worth the potential risk.
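For readers who haven't used the pattern before, here is a minimal sketch of a user-supplied MathsProvider for the Eigen backend. It assumes the static-function interface mirrors DefaultMathsProvider in maths_eigen.h; the struct name and exact signatures here are illustrative, so check them against your RTNeural version.

```cpp
// Minimal sketch of a user-supplied MathsProvider for the Eigen backend.
// The method names/signatures are assumed to mirror DefaultMathsProvider in
// maths_eigen.h; verify against the header in your RTNeural version.
struct MyMathsProvider
{
    template <typename Matrix>
    static auto tanh(const Matrix& x)
    {
        return x.array().tanh(); // Eigen's coefficient-wise (fast) tanh
    }

    template <typename Matrix>
    static auto sigmoid(const Matrix& x)
    {
        // Keep the exp()-based form here, or drop in a cheaper approximation.
        return ((-x.array()).exp() + 1.0f).inverse();
    }

    template <typename Matrix>
    static auto exp(const Matrix& x)
    {
        return x.array().exp();
    }
};
```

The struct is then passed as the MathsProvider template argument on whichever layers (e.g. LSTMLayerT) should use it, leaving the other layers on the default.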

@mikeoliphant
Contributor Author

I would agree with you, except that expressing sigmoid in terms of tanh is only an approximation to the extent that your existing tanh is an approximation.
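For reference, the identity is exact in real arithmetic, so any error comes from the underlying tanh/exp implementations (and floating-point rounding), not from the rewrite itself:

$$
\tanh\left(\tfrac{x}{2}\right) = \frac{e^{x/2} - e^{-x/2}}{e^{x/2} + e^{-x/2}} = \frac{1 - e^{-x}}{1 + e^{-x}}
\quad\Longrightarrow\quad
\tfrac{1}{2}\left(1 + \tanh\left(\tfrac{x}{2}\right)\right) = \frac{1}{1 + e^{-x}} = \sigma(x)
$$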

@jatinchowdhury18
Owner

Sure, but that's Eigen's choice, so people using the Eigen backend won't be "surprised" by that behaviour.

My biggest concern is that this change could "break" existing users' recurrent models. The sigmoid function is expected to approach zero as x → -∞, and there's a high risk that this change would drastically increase the relative error for large negative inputs... exactly the kind of thing that could lead to instabilities. That seems like a silly risk to take when the MathsProvider pattern already allows users to provide their own approximations.

For example, here's a Desmos plot comparing the two implementations: https://www.desmos.com/calculator/9a8j1tb2lz. I would imagine Desmos is using double-precision implementations of both exp() and tanh(), but the relative error still becomes quite large as x goes negative.
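To make the cancellation concern concrete, here is a small standalone comparison (not part of this PR) of the two formulations in single precision. Once tanh saturates to -1, the tanh-based form returns exactly zero, while the exp-based form still returns an accurate small positive value:

```cpp
// Standalone demo of the relative-error concern for large negative inputs.
// Not library code -- just the two scalar formulations side by side.
#include <cmath>
#include <cstdio>

int main()
{
    for (float x : { -5.0f, -10.0f, -20.0f, -40.0f })
    {
        const float viaExp = 1.0f / (1.0f + std::exp(-x));          // 1 / (1 + e^-x)
        const float viaTanh = 0.5f * (1.0f + std::tanh(0.5f * x));  // 0.5 * (1 + tanh(x/2))
        std::printf("x = %6.1f  exp-form = %.6e  tanh-form = %.6e\n", x, viaExp, viaTanh);
    }
    return 0;
}
```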

@mikeoliphant
Contributor Author

I suspect that graph is more about precision issues on Desmos than anything else, but I'm not going to argue the point since I agree that having MathsProvider gives an easy work-around.

However, I actually did try to create my own MathsProvider first and ran into trouble.

I copied the DefaultMathsProvider unaltered from the "maths_eigen.h" code into my own struct and passed my struct into the LSTMLayerT template, and it broke (models that worked now produce no output).

Is there anything else that I need to do? Thanks.

@jatinchowdhury18
Owner

> However, I actually did try to create my own MathsProvider first and ran into trouble.
>
> I copied the DefaultMathsProvider unaltered from the "maths_eigen.h" code into my own struct and passed my struct into the LSTMLayerT template, and it broke (models that worked now produce no output).
>
> Is there anything else that I need to do? Thanks.

It's hard to say without seeing more of the code... if you have a repo or something with a small example, I'd be happy to have a look at it.

@mikeoliphant
Contributor Author

My code was pretty embedded, so I hacked up a branch on my fork of RTNeural-NAM to test:

https://github.com/mikeoliphant/RTNeural-NAM/tree/math_test

I'm creating two LSTM models, identical except that one uses my (identical) copy of the RTNeural DefaultMathsProvider.

They don't produce the same output.

That's not surprising, since when it loads the second model, it complains:

```
Loading a no-op layer!
Layer: lstm
  Dims: 16
Wrong layer type! Expected: Dense
```

Any ideas?

@mikeoliphant
Contributor Author

It looks like when I specify a non-default MathsProvider, it fails to match the loadLayer template for LSTM, and gets picked up by the no-op template.

I tried adding "typename MathsProvider" to the LSTM template, but then it fails for both models.

I'm just shooting in the dark here - I haven't worked much with template classes before.
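For anyone following along, here is a self-contained illustration of the kind of overload-matching pitfall described above. This is not RTNeural's actual loader code, just the general mechanism: an overload written for the default trailing template argument cannot deduce a layer instantiated with a custom one, so the catch-all overload wins.

```cpp
// Illustration only -- not RTNeural's loader code.
#include <cstdio>

struct DefaultProvider {};
struct CustomProvider {};

template <typename T, int N, typename Provider = DefaultProvider>
struct LSTMLayerT {};

// Overload written only for the default provider: T and N are deduced, but the
// provider is fixed to DefaultProvider, so this never matches a custom provider.
template <typename T, int N>
void loadLayer(LSTMLayerT<T, N>&) { std::puts("matched LSTM loader"); }

// Catch-all "no-op" overload that everything else falls into.
template <typename LayerT>
void loadLayer(LayerT&) { std::puts("Loading a no-op layer!"); }

int main()
{
    LSTMLayerT<float, 16> defaultLayer;
    LSTMLayerT<float, 16, CustomProvider> customLayer;
    loadLayer(defaultLayer); // prints "matched LSTM loader"
    loadLayer(customLayer);  // prints "Loading a no-op layer!"
    return 0;
}
```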

@jatinchowdhury18
Owner

jatinchowdhury18 commented Nov 2, 2024

Oh interesting... thanks for the investigation. I've mostly been using custom MathsProviders with PyTorch-trained models, so I load the weights directly via the torch_helpers methods.

We should put together some test cases to ensure the MathsProvider template arguments are being handled properly in each part of the system... I'll add that to my list of TODOs.
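Since the torch_helpers route came up, here is a rough sketch of that weight-loading path as I understand it; the header paths, loader function names, and JSON key prefixes below are assumptions from memory rather than from this thread, so verify them against the RTNeural examples:

```cpp
// Rough sketch of loading PyTorch-exported weights directly into a compile-time
// model. Function names ("loadLSTM", "loadDense"), the "lstm."/"dense." key
// prefixes, and the include paths are assumptions -- check RTNeural's examples.
#include <fstream>
#include <RTNeural/RTNeural.h>       // assumed include path
#include <RTNeural/torch_helpers.h>  // assumed include path

int main()
{
    std::ifstream jsonStream("model_weights.json"); // hypothetical exported weights file
    nlohmann::json modelJson;
    jsonStream >> modelJson;

    RTNeural::ModelT<float, 1, 1,
        RTNeural::LSTMLayerT<float, 1, 16>,
        RTNeural::DenseT<float, 16, 1>> model;

    RTNeural::torch_helpers::loadLSTM<float>(modelJson, "lstm.", model.get<0>());
    RTNeural::torch_helpers::loadDense<float>(modelJson, "dense.", model.get<1>());

    model.reset();
    return 0;
}
```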

@mikeoliphant
Contributor Author

👍

@mikeoliphant
Contributor Author

Saw your PR come through with the MathsProvider fixes. That worked - thanks!

It was still missing for dynamic model JSON parsing, though. PR here: #152
