Use tanh to speed up sigmoid calculation #149
Conversation
I don't think we should merge this change for two reasons:
Unless you have some proof that the …
I would agree with you, except that expressing sigmoid in terms of tanh is only an approximation to the extent that your existing tanh is an approximation.
Sure, but that's Eigen's choice, so people using the Eigen backend won't be "surprised" by that behaviour.
My biggest concern is that this change could "break" existing users' recurrent models. The sigmoid function is expected to approach zero as x → −∞. For example, here's a Desmos plot comparing the two implementations: https://www.desmos.com/calculator/9a8j1tb2lz. I would imagine Desmos is using double-precision implementations of both.
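(Side note for readers: here is a minimal standalone check, not code from this PR, comparing the two formulations in the far negative tail. It uses the standard library's tanh rather than Eigen's fast approximation, so it only demonstrates that the identity itself is exact; any tail discrepancy in practice would come from the fast tanh approximation.)

```cpp
#include <cmath>
#include <cstdio>

// Exp-based sigmoid vs. the tanh-based rewrite, using std::exp / std::tanh.
float sigmoid_exp(float x)  { return 1.0f / (1.0f + std::exp(-x)); }
float sigmoid_tanh(float x) { return 0.5f * (1.0f + std::tanh(0.5f * x)); }

int main()
{
    const float xs[] = { -5.0f, -10.0f, -20.0f, -40.0f };
    for (float x : xs)
        std::printf("x = %6.1f  exp-based = %.6g  tanh-based = %.6g\n",
                    x, sigmoid_exp(x), sigmoid_tanh(x));
}
```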
I suspect that graph is more about precision issues on Desmos than anything else, but I'm not going to argue the point, since I agree that having a MathsProvider gives an easy work-around. However, I actually did try to create my own MathsProvider first and ran into trouble. I copied the DefaultMathsProvider unaltered from the "maths_eigen.h" code into my own struct, passed my struct into the LSTMLayerT template, and it broke (models that worked now produce no output). Is there anything else that I need to do? Thanks.
It's hard to say without seeing more of the code... If you have a repo or something with a small example, I'd be happy to have a look at it.
My code was pretty embedded, so I hacked up a branch on my RTNeural-NAM fork to test: https://github.com/mikeoliphant/RTNeural-NAM/tree/math_test. I'm creating two LSTM models, identical except that one uses my (identical) copy of the RTNeural DefaultMathsProvider. They don't produce the same output, which is not surprising, since when it loads the second model it complains that:
Any ideas?
It looks like when I specify a non-default MathsProvider, it fails to match the loadLayer template for LSTM and gets picked up by the no-op template. I tried adding "typename MathsProvider" to the LSTM template, but then it fails for both models. I'm just shooting in the dark here - I haven't worked much with template classes before.
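For anyone following along, here's a stripped-down illustration of the failure mode being described (made-up names, not RTNeural's actual loadLayer code): an overload written for the original template parameter list stops matching once an extra MathsProvider parameter is specified, so the call silently falls through to a generic no-op.

```cpp
#include <cstdio>

// Hypothetical stand-ins for illustration only.
struct DefaultMathsProvider {};
struct MyMathsProvider {};

template <typename T, int size, typename MathsProvider = DefaultMathsProvider>
struct LSTMLayerT {};

// Generic fallback: unrecognised layer types load nothing.
template <typename LayerType>
void loadLayer(LayerType&) { std::puts("no-op loader: weights NOT loaded"); }

// Overload that predates the MathsProvider parameter: it only deduces
// LSTM layers which use the default provider.
template <typename T, int size>
void loadLayer(LSTMLayerT<T, size, DefaultMathsProvider>&) { std::puts("LSTM loader: weights loaded"); }

int main()
{
    LSTMLayerT<float, 8> defaultLayer;
    LSTMLayerT<float, 8, MyMathsProvider> customLayer;

    loadLayer(defaultLayer); // picks the LSTM overload
    loadLayer(customLayer);  // deduction fails for the LSTM overload -> falls through to the no-op
}
```

Deducing the provider as well (roughly `template <typename T, int size, typename MP> void loadLayer(LSTMLayerT<T, size, MP>&)`) is the usual way to make the specific overload match again, which appears to be the direction the later fix took.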
Oh interesting... thanks for the investigation. I've mostly been using custom MathsProviders myself, so I hadn't hit this. We should put together some test cases to ensure the MathsProvider support works as expected.
👍
Saw your PR come through with the MathsProvider fixes. That worked - thanks! I was still missing it for dynamic model JSON parsing, though. PR here: #152
For LSTM networks, the sigmoid activation function is very much in the performance hot path.
This PR defines the sigmoid calculation in terms of tanh() instead of exp(). It results in a significant performance improvement - I think because the Eigen code path uses a faster tanh approximation.
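The identity behind the change (a sketch of the idea only, not the exact code in this PR, which routes through RTNeural's maths providers):

```cpp
#include <cmath>

// sigmoid(x) = 1 / (1 + e^(-x)) = 0.5 * (1 + tanh(x / 2)) exactly,
// because tanh(x / 2) = (1 - e^(-x)) / (1 + e^(-x)).
template <typename T>
T sigmoid_via_tanh(T x)
{
    return T(0.5) * (T(1) + std::tanh(T(0.5) * x));
}
```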
It is hard to follow the twists through the Eigen code to the root tanh function, but I think this is what gets used: RTNeural/modules/Eigen/Eigen/src/Core/MathFunctionsImpl.h, line 166 (at commit 32b8664).
It will obviously vary based on platform and network architecture, but in my testing I was seeing around 50% of CPU going to the sigmoid calculation. This change cuts that roughly in half, so the overall performance improvement is about 25% (or about 1.3x faster).
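A quick sanity check on those numbers (assuming, as above, that the sigmoid accounts for roughly half of total CPU and this change makes that part about 2x faster):

```cpp
#include <cstdio>

int main()
{
    // Amdahl's law: p = fraction of time in the sigmoid, s = speedup of that part.
    const double p = 0.5, s = 2.0;
    const double speedup = 1.0 / ((1.0 - p) + p / s); // ~1.33x overall
    std::printf("%.2fx faster overall, %.0f%% less total CPU\n",
                speedup, (1.0 - 1.0 / speedup) * 100.0);
}
```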