
Use tanh to speed up sigmoid calculation #149

Closed

Conversation

@mikeoliphant
Contributor

For LSTM networks, the sigmoid activation function is very much in the performance hot path.

This PR defines the sigmoid calculation based on tanh() instead of exp(). It results in a significant performance improvement - I think because you are using a faster tanh approximation in your Eigen code.
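For reference, the change relies on the identity sigmoid(x) = 0.5 * (1 + tanh(x / 2)). Here is a minimal sketch of the rewritten function, written as the sigmoid method of a MathsProvider-style struct for the Eigen backend (a sketch of the idea, not the exact diff in this PR):

```cpp
// Sketch only (not the exact diff): route the sigmoid through Eigen's
// vectorized tanh via the identity sigmoid(x) = 0.5 * (1 + tanh(x / 2)).
template <typename Matrix>
static auto sigmoid(const Matrix& x)
{
    return 0.5f * (x.array() * 0.5f).tanh() + 0.5f;
}
```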

It is hard to follow the twists through the Eigen code to the root tanh function, but I think this is what gets used?

```cpp
T generic_fast_tanh_float(const T& a_x)
```

It will obviously vary based on platform and network architecture, but in my testing I was seeing around 50% of CPU time going to the sigmoid calculation. This change cuts that roughly in half, so the overall performance improvement is about 25% (about 1.3x faster).

@jatinchowdhury18
Owner

jatinchowdhury18 commented Nov 1, 2024

I don't think we should merge this change for two reasons:

  1. The whole point of the MathsProvider pattern is that users can manually provide their own implementations of functions like sigmoid, on a per-model or even per-layer basis (see the sketch at the end of this comment).
  2. With that in mind, I think it's best for the "default" MathsProvider to provide the highest precision implementation of each function that can be expected relative to the constraints of the chosen backend. This is especially important for recurrent layers, since precision errors might "build" up over time, and potentially even lead to instabilities. If users know they can get by with less precision for certain layers/models, they can implement their own MathsProvider for those layers/models, and potentially even use more aggressive approximations.

Unless you have some proof that the tanh implementation would be equal to the exp implementation to within, let's say, 5 ULP for all possible single-precision floating-point numbers, I don't think the change is worth the potential risk.
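For readers who haven't used the pattern before, here is a minimal sketch of a user-supplied MathsProvider for the Eigen backend. It assumes the static-function interface mirrors DefaultMathsProvider in maths_eigen.h; the struct name and exact signatures here are illustrative, so check them against your RTNeural version.

```cpp
// Minimal sketch of a user-supplied MathsProvider for the Eigen backend.
// The method names/signatures are assumed to mirror DefaultMathsProvider in
// maths_eigen.h; verify against the header in your RTNeural version.
struct MyMathsProvider
{
    template <typename Matrix>
    static auto tanh(const Matrix& x)
    {
        return x.array().tanh(); // Eigen's coefficient-wise (fast) tanh
    }

    template <typename Matrix>
    static auto sigmoid(const Matrix& x)
    {
        // Keep the exp()-based form here, or drop in a cheaper approximation.
        return ((-x.array()).exp() + 1.0f).inverse();
    }

    template <typename Matrix>
    static auto exp(const Matrix& x)
    {
        return x.array().exp();
    }
};
```

The struct is then passed as the MathsProvider template argument on whichever layers (e.g. LSTMLayerT) should use it, leaving the other layers on the default.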

@mikeoliphant
Contributor Author

I would agree with you, except that expressing sigmoid in terms of tanh is only an approximation to the extent that your existing tanh is an approximation.
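For reference, the identity is exact in real arithmetic, so any error comes from the underlying tanh/exp implementations (and floating-point rounding), not from the rewrite itself:

$$
\tanh\left(\tfrac{x}{2}\right) = \frac{e^{x/2} - e^{-x/2}}{e^{x/2} + e^{-x/2}} = \frac{1 - e^{-x}}{1 + e^{-x}}
\quad\Longrightarrow\quad
\tfrac{1}{2}\left(1 + \tanh\left(\tfrac{x}{2}\right)\right) = \frac{1}{1 + e^{-x}} = \sigma(x)
$$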

@jatinchowdhury18
Owner

Sure, but that's Eigen's choice, so people using the Eigen backend won't be "surprised" by that behaviour.

My biggest concern is that this change could "break" existing users' recurrent models. The sigmoid function is expected to approach zero as x → -∞, and there's a high risk that this change would drastically increase the relative error for large negative inputs... exactly the kind of thing that could lead to instabilities. That seems like a silly risk to take when the MathsProvider pattern already allows users to provide their own approximations.

For example, here's a Desmos plot comparing the two implementations: https://www.desmos.com/calculator/9a8j1tb2lz. I would imagine Desmos is using double-precision implementations of both exp() and tanh(), but the relative error still becomes quite large as x goes negative.
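To make the cancellation concern concrete, here is a small standalone comparison (not part of this PR) of the two formulations in single precision. Once tanh saturates to -1, the tanh-based form returns exactly zero, while the exp-based form still returns an accurate small positive value:

```cpp
// Standalone demo of the relative-error concern for large negative inputs.
// Not library code -- just the two scalar formulations side by side.
#include <cmath>
#include <cstdio>

int main()
{
    for (float x : { -5.0f, -10.0f, -20.0f, -40.0f })
    {
        const float viaExp = 1.0f / (1.0f + std::exp(-x));          // 1 / (1 + e^-x)
        const float viaTanh = 0.5f * (1.0f + std::tanh(0.5f * x));  // 0.5 * (1 + tanh(x/2))
        std::printf("x = %6.1f  exp-form = %.6e  tanh-form = %.6e\n", x, viaExp, viaTanh);
    }
    return 0;
}
```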

@mikeoliphant
Contributor Author

I suspect that graph is more about precision issues on Desmos than anything else, but I'm not going to argue the point since I agree that having MathsProvider gives an easy work-around.

However, I actually did try to create my own MathsProvider first and ran into trouble.

I copied the DefaultMathsProvider unaltered from the "maths_eigen.h" code into my own struct and passed my struct into the LSTMLayerT template, and it broke (models that worked now produce no output).

Is there anything else that I need to do? Thanks.

@jatinchowdhury18
Owner

> However, I actually did try to create my own MathsProvider first and ran into trouble.
>
> I copied the DefaultMathsProvider unaltered from the "maths_eigen.h" code into my own struct and passed my struct into the LSTMLayerT template, and it broke (models that worked now produce no output).
>
> Is there anything else that I need to do? Thanks.

It's hard to say without seeing more of the code... if you have a repo or something with a small example, I'd be happy to have a look at it.

@mikeoliphant
Contributor Author

My code was pretty embedded, so I hacked up a branch on my fork of RTNeural-NAM to test:

https://github.com/mikeoliphant/RTNeural-NAM/tree/math_test

I'm creating two LSTM models, identical except that one uses my (identical) copy of the RTNeural DefaultMathsProvider.

They don't produce the same output.

That's not surprising, since when it loads the second model, it complains:

```
Loading a no-op layer!
Layer: lstm
  Dims: 16
Wrong layer type! Expected: Dense
```

Any ideas?

@mikeoliphant
Contributor Author

It looks like when I specify a non-default MathsProvider, it fails to match the loadLayer template for LSTM, and gets picked up by the no-op template.

I tried adding "typename MathsProvider" to the LSTM template, but then it fails for both models.

I'm just shooting in the dark here - I haven't worked much with template classes before.
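For anyone following along, here is a self-contained illustration of the kind of overload-matching pitfall described above. This is not RTNeural's actual loader code, just the general mechanism: an overload written for the default trailing template argument cannot deduce a layer instantiated with a custom one, so the catch-all overload wins.

```cpp
// Illustration only -- not RTNeural's loader code.
#include <cstdio>

struct DefaultProvider {};
struct CustomProvider {};

template <typename T, int N, typename Provider = DefaultProvider>
struct LSTMLayerT {};

// Overload written only for the default provider: T and N are deduced, but the
// provider is fixed to DefaultProvider, so this never matches a custom provider.
template <typename T, int N>
void loadLayer(LSTMLayerT<T, N>&) { std::puts("matched LSTM loader"); }

// Catch-all "no-op" overload that everything else falls into.
template <typename LayerT>
void loadLayer(LayerT&) { std::puts("Loading a no-op layer!"); }

int main()
{
    LSTMLayerT<float, 16> defaultLayer;
    LSTMLayerT<float, 16, CustomProvider> customLayer;
    loadLayer(defaultLayer); // prints "matched LSTM loader"
    loadLayer(customLayer);  // prints "Loading a no-op layer!"
    return 0;
}
```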

@jatinchowdhury18
Owner

jatinchowdhury18 commented Nov 2, 2024

Oh interesting... thanks for the investigation. I've mostly been using custom MathsProviders with PyTorch-trained models, so I load the weights directly via the torch_helpers methods.

We should put together some test cases to ensure the MathsProvider template arguments are being handled properly in each part of the system... I'll add that to my list of TODOs.
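Since the torch_helpers route came up, here is a rough sketch of that weight-loading path as I understand it; the header paths, loader function names, and JSON key prefixes below are assumptions from memory rather than from this thread, so verify them against the RTNeural examples:

```cpp
// Rough sketch of loading PyTorch-exported weights directly into a compile-time
// model. Function names ("loadLSTM", "loadDense"), the "lstm."/"dense." key
// prefixes, and the include paths are assumptions -- check RTNeural's examples.
#include <fstream>
#include <RTNeural/RTNeural.h>       // assumed include path
#include <RTNeural/torch_helpers.h>  // assumed include path

int main()
{
    std::ifstream jsonStream("model_weights.json"); // hypothetical exported weights file
    nlohmann::json modelJson;
    jsonStream >> modelJson;

    RTNeural::ModelT<float, 1, 1,
        RTNeural::LSTMLayerT<float, 1, 16>,
        RTNeural::DenseT<float, 16, 1>> model;

    RTNeural::torch_helpers::loadLSTM<float>(modelJson, "lstm.", model.get<0>());
    RTNeural::torch_helpers::loadDense<float>(modelJson, "dense.", model.get<1>());

    model.reset();
    return 0;
}
```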

@mikeoliphant
Contributor Author

👍

@mikeoliphant
Contributor Author

Saw your PR come through with the MathsProvider fixes. That worked - thanks!

It was still missing for dynamic model JSON parsing, though. PR here: #152
