
Activation function refactor #28

Merged: 9 commits merged into sdatkinson:main from the activation_refactor branch on Apr 9, 2023

Conversation

@mikeoliphant (Contributor)

This PR refactors the activation function handling. It does the following:

  • centralizes activation functions in activation.h/activation.cpp
  • does function lookup by name at model initialization
  • ensures that all activation function application is done in efficient block operations

It also adds a flag to turn the fast tanh approximation on and off. You just need to call "activations::Activation::enable_fast_tanh()" before loading a model. Even with the potential of "Hardtanh" replacing it, it probably makes sense to implement this, given all of the existing models out there that could be sped up. Maybe a toggle switch in the UI ("Fast Mode")?
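For reference, a minimal sketch of the intended call order (the surrounding loader code is hypothetical; only "activations::Activation::enable_fast_tanh()" comes from this PR):

#include "activations.h"

void prepare_model()
{
  // Opt into the fast tanh approximation before any model is constructed,
  // so the flag is already in effect when activation functions are looked up.
  activations::Activation::enable_fast_tanh();

  // ... then load the model as usual (the loader call itself is omitted here).
}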

I haven't yet switched over the LSTM code, and there are also some hardcoded sigmoids in the WaveNet code.

@mikeoliphant (Contributor, Author)

Ok - I think there are more improvements that could be made here, but I'm going to leave it for now.

These changes shouldn't modify any existing behavior (other than making ConvNet a bit faster) or change the interface with the plugin (other than adding the fast-tanh enable/disable).

Comment on lines +4 to +8
{"Tanh", new activations::ActivationTanh()},
{"Hardtanh", new activations::ActivationHardTanh()},
{"Fasttanh", new activations::ActivationFastTanh()},
{"ReLU", new activations::ActivationReLU()},
{"Sigmoid", new activations::ActivationSigmoid()}};


Seems like this memory will be leaked?

@mikeoliphant (Contributor, Author)

Yeah, it makes me uncomfortable, too - even though it is global, so it should happen once and stick around for the lifetime of the program. My C++ chops are rusty - I'm not sure what the preferred modern C++ way of handling this is.

I also thought about creating individual global instances and just taking a pointer to them in the map.
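One common modern C++ pattern for this kind of global registry (shown only as a sketch of an alternative, not what this PR does) is to own the instances with std::unique_ptr behind a function-local static, so construction is thread-safe and the objects are destroyed at program exit:

#include <memory>
#include <string>
#include <unordered_map>
#include "activations.h"

// Sketch: a lazily-constructed registry that owns its activation instances.
static std::unordered_map<std::string, std::unique_ptr<activations::Activation>>& activation_registry()
{
  static std::unordered_map<std::string, std::unique_ptr<activations::Activation>> registry = [] {
    std::unordered_map<std::string, std::unique_ptr<activations::Activation>> m;
    m.emplace("Tanh", std::make_unique<activations::ActivationTanh>());
    m.emplace("Hardtanh", std::make_unique<activations::ActivationHardTanh>());
    m.emplace("Fasttanh", std::make_unique<activations::ActivationFastTanh>());
    m.emplace("ReLU", std::make_unique<activations::ActivationReLU>());
    m.emplace("Sigmoid", std::make_unique<activations::ActivationSigmoid>());
    return m;
  }();
  return registry;
}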

@sdatkinson (Owner)

> I also thought about creating individual global instances and just taking a pointer to them in the map.

This seems to work?

#include "activations.h"

activations::ActivationTanh _TANH = activations::ActivationTanh();
activations::ActivationFastTanh _FAST_TANH = activations::ActivationFastTanh();
activations::ActivationHardTanh _HARD_TANH = activations::ActivationHardTanh();
activations::ActivationReLU _RELU = activations::ActivationReLU();
activations::ActivationSigmoid _SIGMOID = activations::ActivationSigmoid();

std::unordered_map<std::string, activations::Activation*> activations::Activation::_activations = {
  {"Tanh", &_TANH},
  {"Hardtanh", &_HARD_TANH},
  {"Fasttanh", &_FAST_TANH},
  {"ReLU", &_RELU},
  {"Sigmoid", &_SIGMOID}
};

Seems fine to me, though I know that this is stretching my discipline w/ C++, so either of y'all can lmk if you think there's anything wrong with this ;)

Comment on lines +39 to +51
virtual void apply(Eigen::MatrixXf& matrix)
{
  apply(matrix.data(), matrix.rows() * matrix.cols());
}
virtual void apply(Eigen::Block<Eigen::MatrixXf> block)
{
  apply(block.data(), block.rows() * block.cols());
}
virtual void apply(Eigen::Block<Eigen::MatrixXf, -1, -1, true> block)
{
  apply(block.data(), block.rows() * block.cols());
}
virtual void apply(float* data, long size) {}


Putting functions which will be called per-sample (or maybe multiple times per-sample depending on the network architecture) behind a virtual interface will likely have a significant performance impact, since it prevents the compiler from inlining these function calls in most cases.

In RTNeural, this is a big reason why the "dynamic" API is much slower than the "static" API.
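To make the inlining concern concrete, here is a generic illustration (hypothetical types, not this repo's code) of the two call patterns being compared:

// Hypothetical interface for illustration only.
struct ActivationBase
{
  virtual ~ActivationBase() = default;
  virtual float apply_one(float x) const = 0;           // per-sample entry point
  virtual void apply(float* data, long size) const = 0; // per-block entry point
};

// Per-sample: one virtual dispatch per element. The callee cannot be inlined
// into this loop, so the dispatch overhead scales with the sample count.
void run_per_sample(const ActivationBase& act, float* data, long size)
{
  for (long i = 0; i < size; i++)
    data[i] = act.apply_one(data[i]);
}

// Per-block: a single virtual dispatch per buffer. The hot loop lives inside
// the callee, where it can still be unrolled and vectorized.
void run_per_block(const ActivationBase& act, float* data, long size)
{
  act.apply(data, size);
}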

@mikeoliphant (Contributor, Author)

I was worried about inlining, too, which is why I made sure to keep everything as block operations. I did not see any slowdown over the existing implementation in my testing, but that could be compiler-specific? I'm testing using VS on Windows.

The override methods definitely make it more convenient to use, but it would be worth giving that up if there is a clearly demonstrable performance gain to be had.

@sdatkinson (Owner)

The functions that apply over a block seem like a good idea. Back-of-the-envelope, this is looking at 64*16=1024 elementwise ops (buffer 64, 16-channel model). I'd ballpark the virtual table overhead around 1% or less?

Comment on lines +66 to +69
for (long pos = 0; pos < size; pos++)
{
  data[pos] = std::tanh(data[pos]);
}


It would probably be better to use Eigen's math functions for implementing some of these things, since they do some internal vectorization. (And that vectorization can be even better if the Matrix size is known at compile-time)
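For example, Eigen's array interface exposes coefficient-wise tanh, which picks up Eigen's internal vectorization (a sketch of the suggestion, assuming a reasonably recent Eigen release; not code from this PR):

#include <Eigen/Dense>

// Apply tanh elementwise via Eigen's vectorized array ops instead of a
// scalar std::tanh loop.
void tanh_inplace(Eigen::MatrixXf& m)
{
  m.array() = m.array().tanh();
}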

@mikeoliphant (Contributor, Author)

To be honest, tanh is best avoided completely...

I'm pretty sure I looked for a way to do a matrix-level tanh with Eigen, and didn't see one.

@mikeoliphant (Contributor, Author) commented on Apr 8, 2023

Btw, @jatinchowdhury18 - your work on rtneural (among other things) is much appreciated. I haven't yet used it directly myself, but it clearly has been very helpful throughout the audio ML community.

I considered trying to port NAM to sit on top of rtneural, but decided to focus on smaller changes instead.

@sdatkinson (Owner) left a comment


If you wanna grab the memory leak w/ what I wrote then it seems perfect to me. If not, I'll still merge & fix myself.

Thanks!



Comment on the gated-activation block in the WaveNet code:

if (this->_gated)
{
  sigmoid_(this->_z, channels, 2 * channels, 0, this->_z.cols());
  activations::Activation::get_activation("Sigmoid")->apply(this->_z.block(channels, 0, channels, this->_z.cols()));
@sdatkinson (Owner)

FYI, I wouldn't call this "hard-coded" in this case; this is the most common implementation of a gated activation (messing with the sigmoid is, to my knowledge, not really done).
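For context, the standard gated activation splits the channels in half and multiplies an activation branch by a sigmoid gate; a generic illustration (not this repo's exact code):

#include <Eigen/Dense>

// z has 2*channels rows: the first half feeds the tanh branch, the second
// half feeds the sigmoid gate, and the output is their elementwise product.
Eigen::MatrixXf gated_activation(const Eigen::MatrixXf& z, int channels)
{
  Eigen::ArrayXXf a = z.topRows(channels).array().tanh();
  Eigen::ArrayXXf g = (1.0f + (-z.bottomRows(channels).array()).exp()).inverse();
  return (a * g).matrix();
}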

@mikeoliphant (Contributor, Author)

> If you wanna grab the memory leak w/ what I wrote then it seems perfect to me. If not, I'll still merge & fix myself.

What you wrote is pretty much what I was thinking, so how about you go ahead and merge and add it.

@sdatkinson merged commit 3a139b9 into sdatkinson:main on Apr 9, 2023
@mikeoliphant deleted the activation_refactor branch on April 13, 2023