Activation function refactor #28
Conversation
Ok - I think there are more improvements that could be made here, but I'm going to leave it for now. These changes shouldn't modify any existing behavior (other than making ConvNet a bit faster) or change the interface with the plugin (other than adding the fast_tanh enable/disable).
{"Tanh", new activations::ActivationTanh()}, | ||
{"Hardtanh", new activations::ActivationHardTanh()}, | ||
{"Fasttanh", new activations::ActivationFastTanh()}, | ||
{"ReLU", new activations::ActivationReLU()}, | ||
{"Sigmoid", new activations::ActivationSigmoid()}}; |
Seems like this memory will be leaked?
Yeah, it makes me uncomfortable too - even though it's global, so it should happen once and stick around for the lifetime of the program. My C++ chops are rusty - not sure what the preferred modern C++ way of handling this is.
I also thought about creating individual global instances and just taking a pointer to them in the map.
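For illustration only, one common modern-C++ option would be to let the map own the instances via std::unique_ptr so they are released automatically at shutdown; this is a sketch of that idea, not what the PR does, and it would also require changing the declared type of _activations:
#include "activations.h"

#include <memory>
#include <string>
#include <unordered_map>

// Sketch only: the map owns the activation instances, so nothing is leaked.
// A lambda initializer is used because std::unique_ptr cannot sit in an
// initializer list (it is not copyable).
std::unordered_map<std::string, std::unique_ptr<activations::Activation>>
  activations::Activation::_activations = []() {
    std::unordered_map<std::string, std::unique_ptr<activations::Activation>> map;
    map.emplace("Tanh", std::make_unique<activations::ActivationTanh>());
    map.emplace("Hardtanh", std::make_unique<activations::ActivationHardTanh>());
    map.emplace("Fasttanh", std::make_unique<activations::ActivationFastTanh>());
    map.emplace("ReLU", std::make_unique<activations::ActivationReLU>());
    map.emplace("Sigmoid", std::make_unique<activations::ActivationSigmoid>());
    return map;
  }();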
I also thought about creating individual global instances and just taking a pointer to them in the map.
This seems to work?
#include "activations.h"
activations::ActivationTanh _TANH = activations::ActivationTanh();
activations::ActivationFastTanh _FAST_TANH = activations::ActivationFastTanh();
activations::ActivationHardTanh _HARD_TANH = activations::ActivationHardTanh();
activations::ActivationReLU _RELU = activations::ActivationReLU();
activations::ActivationSigmoid _SIGMOID = activations::ActivationSigmoid();
std::unordered_map<std::string, activations::Activation*> activations::Activation::_activations = {
{"Tanh", &_TANH},
{"Hardtanh", &_HARD_TANH},
{"Fasttanh", &_FAST_TANH},
{"ReLU", &_RELU},
{"Sigmoid", &_SIGMOID}
};
Seems fine to me, though I know that this is stretching my discipline w/ C++, so either of y'all can lmk if you think there's anything wrong with this ;)
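For reference, a minimal usage sketch of the lookup map, based on the get_activation() call that appears later in this PR (the matrix sizes here are arbitrary):
#include "activations.h"

#include <Eigen/Dense>

int main()
{
  // Look up activations by name and apply them in place.
  Eigen::MatrixXf z = Eigen::MatrixXf::Random(16, 64);
  activations::Activation::get_activation("Tanh")->apply(z);
  // middleCols() yields a contiguous column block, matching the
  // Eigen::Block<Eigen::MatrixXf, -1, -1, true> overload shown below.
  activations::Activation::get_activation("Sigmoid")->apply(z.middleCols(0, 32));
  return 0;
}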
virtual void apply(Eigen::MatrixXf& matrix)
{
  apply(matrix.data(), matrix.rows() * matrix.cols());
}
virtual void apply(Eigen::Block<Eigen::MatrixXf> block)
{
  apply(block.data(), block.rows() * block.cols());
}
virtual void apply(Eigen::Block<Eigen::MatrixXf, -1, -1, true> block)
{
  apply(block.data(), block.rows() * block.cols());
}
virtual void apply(float* data, long size) {}
Putting functions which will be called per-sample (or maybe multiple times per-sample depending on the network architecture) behind a virtual interface will likely have a significant performance impact, since it prevents the compiler from inlining these function calls in most cases.
In RTNeural, this is a big reason why the "dynamic" API is much slower than the "static" API.
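As a rough illustration of the static-dispatch alternative (names here are illustrative, not RTNeural's or this PR's API): when the activation type is a template parameter, the per-sample call can be inlined and auto-vectorized, which a virtual call generally blocks.
#include <cmath>

// Illustrative compile-time-dispatch sketch, not this PR's interface.
struct TanhOp
{
  static float apply(float x) { return std::tanh(x); }
};

template <typename Op>
void apply_activation(float* data, long size)
{
  // Op is known at compile time, so Op::apply() can be inlined into the loop;
  // a virtual Activation::apply(float*, long) usually cannot.
  for (long pos = 0; pos < size; pos++)
    data[pos] = Op::apply(data[pos]);
}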
I was worried about inlining, too, which is why I made sure to keep everything as block operations. I did not see any slowdown over the existing implementation in my testing, but that could be compiler-specific? I'm testing using VS on Windows.
The override methods definitely make it more convenient to use, but it would be worth giving that up if there is a clearly demonstrable performance gain to be had.
The functions that apply over a block seem like a good idea. Back-of-the-envelope, this is looking at 64*16 = 1024 elementwise ops (buffer of 64, 16-channel model). I'd ballpark the virtual-table overhead at around 1% or less?
for (long pos = 0; pos < size; pos++)
{
  data[pos] = std::tanh(data[pos]);
}
It would probably be better to use Eigen's math functions for implementing some of these things, since they do some internal vectorization. (And that vectorization can be even better if the Matrix size is known at compile-time)
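For illustration, the kind of coefficient-wise Eigen call being suggested might look like this (assuming a reasonably recent Eigen; tanh_in_place is an illustrative name, not part of this PR):
#include <Eigen/Dense>

// Sketch: let Eigen vectorize the elementwise tanh instead of a scalar loop.
void tanh_in_place(Eigen::MatrixXf& m)
{
  m.array() = m.array().tanh();
}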
To be honest, tanh is best avoided completely...
I'm pretty sure I looked for a way to do a matrix-level tanh with Eigen, and didn't see one.
Btw, @jatinchowdhury18 - your work on RTNeural (among other things) is much appreciated. I haven't yet used it directly myself, but it clearly has been very helpful throughout the audio ML community. I considered trying to port NAM to sit on top of RTNeural, but decided to focus on smaller changes instead.
If you wanna grab the memory leak w/ what I wrote then it seems perfect to me. If not, I'll still merge & fix myself.
Thanks!
{"Tanh", new activations::ActivationTanh()}, | ||
{"Hardtanh", new activations::ActivationHardTanh()}, | ||
{"Fasttanh", new activations::ActivationFastTanh()}, | ||
{"ReLU", new activations::ActivationReLU()}, | ||
{"Sigmoid", new activations::ActivationSigmoid()}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought about creating individual global instances and just taking a pointer to them in the map.
This seems to work?
#include "activations.h"
activations::ActivationTanh _TANH = activations::ActivationTanh();
activations::ActivationFastTanh _FAST_TANH = activations::ActivationFastTanh();
activations::ActivationHardTanh _HARD_TANH = activations::ActivationHardTanh();
activations::ActivationReLU _RELU = activations::ActivationReLU();
activations::ActivationSigmoid _SIGMOID = activations::ActivationSigmoid();
std::unordered_map<std::string, activations::Activation*> activations::Activation::_activations = {
{"Tanh", &_TANH},
{"Hardtanh", &_HARD_TANH},
{"Fasttanh", &_FAST_TANH},
{"ReLU", &_RELU},
{"Sigmoid", &_SIGMOID}
};
Seems fine to me, though I know that this is stretching my discipline w/ C++, so either of y'all can lmk if you think there's anything wrong with this ;)
virtual void apply(Eigen::MatrixXf& matrix) | ||
{ | ||
apply(matrix.data(), matrix.rows() * matrix.cols()); | ||
} | ||
virtual void apply(Eigen::Block<Eigen::MatrixXf> block) | ||
{ | ||
apply(block.data(), block.rows() * block.cols()); | ||
} | ||
virtual void apply(Eigen::Block<Eigen::MatrixXf, -1, -1, true> block) | ||
{ | ||
apply(block.data(), block.rows() * block.cols()); | ||
} | ||
virtual void apply(float* data, long size) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functions that apply
over a block seem like a good idea. Back-of-the-envelope, this is looking at 64*16=1024 elementwise ops (buffer 64, 16-channel model). I'd ballpark the virtual table overhead around 1% or less?
if (this->_gated)
{
  sigmoid_(this->_z, channels, 2 * channels, 0, this->_z.cols());
  activations::Activation::get_activation("Sigmoid")->apply(this->_z.block(channels, 0, channels, this->_z.cols()));
FYI, I wouldn't call this "hard-coded" in this case--this is the most common implementation of a gated activation (messing with the sigmoid is, to my knowledge, not really done).
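For context, the standard gated activation being referred to splits the channels in half, applies tanh to the "filter" half and sigmoid to the "gate" half, and multiplies the two elementwise; an Eigen sketch (variable and function names are illustrative, not this PR's code):
#include <Eigen/Dense>

// Illustrative sketch of a WaveNet-style gated activation; z has 2 * channels rows.
Eigen::MatrixXf gated(const Eigen::MatrixXf& z, const long channels)
{
  const Eigen::ArrayXXf filter = z.topRows(channels).array().tanh();
  const Eigen::ArrayXXf gate = ((-z.bottomRows(channels).array()).exp() + 1.0f).inverse();
  return (filter * gate).matrix();
}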
What you wrote is pretty much what I was thinking, so how about you go ahead and merge and add it.
This PR refactors the activation function handling: the activation functions now sit behind a shared activations::Activation interface, with a name-to-instance lookup map and apply() overloads for Eigen matrices and blocks.
It also adds a flag to turn the fast tanh approximation on and off. You just need to call activations::Activation::enable_fast_tanh() before loading a model. Even with the potential of "Hardtanh" replacing it, it probably makes sense to implement this given all of the existing models out there that could be sped up. Maybe a toggle switch in the UI ("Fast Mode")?
I haven't yet switched over the LSTM code, and there are also some hardcoded sigmoids in the WaveNet code.
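A minimal usage sketch of the fast-tanh toggle described above; the model-loading step is left as a placeholder comment, since that API isn't shown in this thread:
#include "activations.h"

void prepare(bool fast_mode)
{
  // Must be called before the model is loaded so "Tanh" resolves to the
  // fast approximation.
  if (fast_mode)
    activations::Activation::enable_fast_tanh();
  // ... then load the model as usual (loading API not shown in this PR thread).
}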