
Activation function refactor #28

Merged: 9 commits merged into sdatkinson:main from the activation_refactor branch on Apr 9, 2023

Conversation

@mikeoliphant (Contributor)

This PR refactors the activation function handling. It does the following:

  • centralizes activation functions in activation.h/activation.cpp
  • does function lookup by name at model initialization
  • ensures that all activation function application is done in efficient block operations

It also adds a flag to turn the fast tanh approximation on and off. You just need to call "activations::Activation::enable_fast_tanh()" before loading a model. Even with the potential of "Hardtanh" replacing it, it probably makes sense to implement this, given all of the existing models out there that could be sped up. Maybe a toggle switch in the UI ("Fast Mode")?
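For reference, a minimal sketch of the intended call order (the surrounding loader code is hypothetical; only "activations::Activation::enable_fast_tanh()" comes from this PR):

#include "activations.h"

void prepare_model()
{
  // Opt into the fast tanh approximation before any model is constructed,
  // so the flag is already in effect when activation functions are looked up.
  activations::Activation::enable_fast_tanh();

  // ... then load the model as usual (the loader call itself is omitted here).
}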

I haven't yet switched over the LSTM code, and there are also some hardcoded sigmoids in the WaveNet code.

@mikeoliphant (Contributor, Author)

Ok - I think there are more improvements that could be made here, but I'm going to leave it for now.

These changes shouldn't modify any existing behavior (other than making ConvNet a bit faster) or change the interface with the plugin (other than adding the fast-tanh enable/disable).

Comment on lines +4 to +8
{"Tanh", new activations::ActivationTanh()},
{"Hardtanh", new activations::ActivationHardTanh()},
{"Fasttanh", new activations::ActivationFastTanh()},
{"ReLU", new activations::ActivationReLU()},
{"Sigmoid", new activations::ActivationSigmoid()}};


Seems like this memory will be leaked?

@mikeoliphant (Contributor, Author)

Yeah, it makes me uncomfortable, too - even though it is global, so it should happen once and stick around for the lifetime of the program. My C++ chops are rusty - I'm not sure what the preferred modern C++ way of handling this is.

I also thought about creating individual global instances and just taking a pointer to them in the map.
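One common modern C++ pattern for this kind of global registry (shown only as a sketch of an alternative, not what this PR does) is to own the instances with std::unique_ptr behind a function-local static, so construction is thread-safe and the objects are destroyed at program exit:

#include <memory>
#include <string>
#include <unordered_map>
#include "activations.h"

// Sketch: a lazily-constructed registry that owns its activation instances.
static std::unordered_map<std::string, std::unique_ptr<activations::Activation>>& activation_registry()
{
  static std::unordered_map<std::string, std::unique_ptr<activations::Activation>> registry = [] {
    std::unordered_map<std::string, std::unique_ptr<activations::Activation>> m;
    m.emplace("Tanh", std::make_unique<activations::ActivationTanh>());
    m.emplace("Hardtanh", std::make_unique<activations::ActivationHardTanh>());
    m.emplace("Fasttanh", std::make_unique<activations::ActivationFastTanh>());
    m.emplace("ReLU", std::make_unique<activations::ActivationReLU>());
    m.emplace("Sigmoid", std::make_unique<activations::ActivationSigmoid>());
    return m;
  }();
  return registry;
}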

@sdatkinson (Owner)

> I also thought about creating individual global instances and just taking a pointer to them in the map.

This seems to work?

#include "activations.h"

activations::ActivationTanh _TANH = activations::ActivationTanh();
activations::ActivationFastTanh _FAST_TANH = activations::ActivationFastTanh();
activations::ActivationHardTanh _HARD_TANH = activations::ActivationHardTanh();
activations::ActivationReLU _RELU = activations::ActivationReLU();
activations::ActivationSigmoid _SIGMOID = activations::ActivationSigmoid();

std::unordered_map<std::string, activations::Activation*> activations::Activation::_activations = {
  {"Tanh", &_TANH},
  {"Hardtanh", &_HARD_TANH},
  {"Fasttanh", &_FAST_TANH},
  {"ReLU", &_RELU},
  {"Sigmoid", &_SIGMOID}
};

Seems fine to me, though I know that this is stretching my discipline w/ C++, so either of y'all can lmk if you think there's anything wrong with this ;)

Comment on lines +39 to +51
virtual void apply(Eigen::MatrixXf& matrix)
{
  apply(matrix.data(), matrix.rows() * matrix.cols());
}
virtual void apply(Eigen::Block<Eigen::MatrixXf> block)
{
  apply(block.data(), block.rows() * block.cols());
}
virtual void apply(Eigen::Block<Eigen::MatrixXf, -1, -1, true> block)
{
  apply(block.data(), block.rows() * block.cols());
}
virtual void apply(float* data, long size) {}


Putting functions which will be called per-sample (or maybe multiple times per-sample depending on the network architecture) behind a virtual interface will likely have a significant performance impact, since it prevents the compiler from inlining these function calls in most cases.

In RTNeural, this is a big reason why the "dynamic" API is much slower than the "static" API.
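To make the inlining concern concrete, here is a generic illustration (hypothetical types, not this repo's code) of the two call patterns being compared:

// Hypothetical interface for illustration only.
struct ActivationBase
{
  virtual ~ActivationBase() = default;
  virtual float apply_one(float x) const = 0;           // per-sample entry point
  virtual void apply(float* data, long size) const = 0; // per-block entry point
};

// Per-sample: one virtual dispatch per element. The callee cannot be inlined
// into this loop, so the dispatch overhead scales with the sample count.
void run_per_sample(const ActivationBase& act, float* data, long size)
{
  for (long i = 0; i < size; i++)
    data[i] = act.apply_one(data[i]);
}

// Per-block: a single virtual dispatch per buffer. The hot loop lives inside
// the callee, where it can still be unrolled and vectorized.
void run_per_block(const ActivationBase& act, float* data, long size)
{
  act.apply(data, size);
}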

@mikeoliphant (Contributor, Author)

I was worried about inlining, too, which is why I made sure to keep everything as block operations. I did not see any slowdown over the existing implementation in my testing, but that could be compiler-specific? I'm testing using VS on Windows.

The override methods definitely make it more convenient to use, but it would be worth giving that up if there is a clearly demonstrable performance gain to be had.

@sdatkinson (Owner)

The functions that apply over a block seem like a good idea. Back-of-the-envelope, this is looking at 64*16=1024 elementwise ops (buffer 64, 16-channel model). I'd ballpark the virtual table overhead around 1% or less?

Comment on lines +66 to +69
for (long pos = 0; pos < size; pos++)
{
  data[pos] = std::tanh(data[pos]);
}


It would probably be better to use Eigen's math functions for implementing some of these things, since they do some internal vectorization. (And that vectorization can be even better if the Matrix size is known at compile-time)
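For example, Eigen's array interface exposes coefficient-wise tanh, which picks up Eigen's internal vectorization (a sketch of the suggestion, assuming a reasonably recent Eigen release; not code from this PR):

#include <Eigen/Dense>

// Apply tanh elementwise via Eigen's vectorized array ops instead of a
// scalar std::tanh loop.
void tanh_inplace(Eigen::MatrixXf& m)
{
  m.array() = m.array().tanh();
}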

@mikeoliphant (Contributor, Author)

To be honest, tanh is best avoided completely...

I'm pretty sure I looked for a way to do a matrix-level tanh with Eigen, and didn't see one.

@mikeoliphant (Contributor, Author) commented on Apr 8, 2023

Btw, @jatinchowdhury18 - your work on rtneural (among other things) is much appreciated. I haven't yet used it directly myself, but it clearly has been very helpful throughout the audio ML community.

I considered trying to port NAM to sit on top of rtneural, but decided to focus on smaller changes instead.

@sdatkinson (Owner) left a comment


If you wanna grab the memory leak w/ what I wrote then it seems perfect to me. If not, I'll still merge & fix myself.

Thanks!



Comment on the gated-activation block in the WaveNet code:

if (this->_gated)
{
  sigmoid_(this->_z, channels, 2 * channels, 0, this->_z.cols());
  activations::Activation::get_activation("Sigmoid")->apply(this->_z.block(channels, 0, channels, this->_z.cols()));
@sdatkinson (Owner)

FYI, I wouldn't call this "hard-coded" in this case; this is the most common implementation of a gated activation (messing with the sigmoid is, to my knowledge, not really done).
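For context, the standard gated activation splits the channels in half and multiplies an activation branch by a sigmoid gate; a generic illustration (not this repo's exact code):

#include <Eigen/Dense>

// z has 2*channels rows: the first half feeds the tanh branch, the second
// half feeds the sigmoid gate, and the output is their elementwise product.
Eigen::MatrixXf gated_activation(const Eigen::MatrixXf& z, int channels)
{
  Eigen::ArrayXXf a = z.topRows(channels).array().tanh();
  Eigen::ArrayXXf g = (1.0f + (-z.bottomRows(channels).array()).exp()).inverse();
  return (a * g).matrix();
}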

@mikeoliphant (Contributor, Author)

> If you wanna grab the memory leak w/ what I wrote then it seems perfect to me. If not, I'll still merge & fix myself.

What you wrote is pretty much what I was thinking, so how about you go ahead and merge and add it.

@sdatkinson merged commit 3a139b9 into sdatkinson:main on Apr 9, 2023
@mikeoliphant deleted the activation_refactor branch on April 13, 2023