Pop reduction #539

Merged: 33 commits merged into master from pop_reduction on Oct 4, 2022
Conversation

@neworderofjamie (Contributor) commented Oct 3, 2022

Since #447, GeNN has supported 'batch reductions' where you can reduce (calculate the sum or maximum) a per-neuron/synapse state variable across the instances of a batched model. This PR introduces a basic implementation of 'neuron reductions' allowing the same reduction operations to be applied across a population of neurons e.g. to implement a softmax function on a population of output neurons. As is all too often the case, the majority of changes relate to refactoring the code that figures out how to index into variables to handle the new cases this PR enables!

Variable duplication modes

Variables that are shared across batched model instances are marked with an access mode containing the VarAccessDuplication::SHARED flag. As well as allowing memory savings, for example by sharing weights, this allows these variables to be used as the targets of batch reductions. Similarly, this PR introduces a new VarAccessDuplication::SHARED_NEURON flag for variables which are shared between all the neurons in a population (but duplicated across batches). Outside of custom updates, the only access mode where this is allowable is VarAccess::READ_ONLY_SHARED_NEURON, which results in a read-only variable shared between all neurons in a population.
Note: this functionality rather duplicates that of non-pointer extra global parameters, but this is something I intend to unify in GeNN 5.0.0.
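For illustration, here is a minimal sketch (not taken from this PR; the model and variable names are invented) of a hypothetical neuron model that uses the new access mode for a per-population scaling factor shared between all neurons but duplicated across batch instances:

// Hypothetical neuron model: "Scale" is read-only and shared between all
// neurons in the population, but still duplicated across batch instances
class ScaledNeuron : public NeuronModels::Base
{
    DECLARE_MODEL(ScaledNeuron, 0, 2);

    SET_SIM_CODE("$(V) += $(Isyn) * $(Scale) * DT;\n");

    SET_VARS({{"V", "scalar"},
              {"Scale", "scalar", VarAccess::READ_ONLY_SHARED_NEURON}});
};
IMPLEMENT_MODEL(ScaledNeuron);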

Variable access modes

With all the indexing fun required to deal with the new duplication modes in place, custom updates can now reduce into VarAccess::READ_ONLY_SHARED_NEURON variables using the VarAccessMode::REDUCE_SUM or VarAccessMode::REDUCE_MAX access modes, or define their own state variables with the VarAccess::REDUCE_NEURON_SUM and VarAccess::REDUCE_NEURON_MAX access modes.
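For example, a custom update with no state variables of its own could reduce a per-neuron source into an existing READ_ONLY_SHARED_NEURON variable through a reduction-mode variable reference. A minimal sketch (the names are assumptions, not from this PR):

// Hypothetical custom update: "maxV" references a READ_ONLY_SHARED_NEURON
// variable and is written with REDUCE_MAX semantics; "source" is an ordinary
// per-neuron variable that is only read
class MaxReduction : public CustomUpdateModels::Base
{
    DECLARE_CUSTOM_UPDATE_MODEL(MaxReduction, 0, 0, 2);

    SET_UPDATE_CODE("$(maxV) = $(source);\n");

    SET_VAR_REFS({{"maxV", "scalar", VarAccessMode::REDUCE_MAX},
                  {"source", "scalar", VarAccessMode::READ_ONLY}});
};
IMPLEMENT_MODEL(MaxReduction);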

SIMT implementation

As with our batch reductions, I copied the TensorFlow algorithm used in the common case (when the number of neurons is < 1024). This uses one warp of threads per batch:

  1. Each warp loops through the variable to be reduced, reading coalesced values
  2. Each thread accumulates a local reduction in a register
  3. A warp-level shuffle reduction combines the per-thread values
  4. The first thread in the warp writes the result back to global memory

This is tailored to e.g. small populations of output neurons (around warp size) combined with a large batch size, which can reasonably occupy the GPU using this approach. Implementation at https://github.com/genn-team/genn/blob/pop_reduction/src/genn/genn/code_generator/backendSIMT.cc#L934-L988.
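To make the pattern concrete, here is a standalone sketch of steps 1-4 above. This is illustrative only: it is not the code GeNN generates, it uses the CUDA warp intrinsic __shfl_down_sync (so it only compiles for the CUDA backend), and the kernel name and row-major variable layout are assumptions.

// One warp reduces one batch instance's copy of a per-neuron variable
__global__ void neuronReduceSum(const float *var, float *reduction,
                                unsigned int numNeurons, unsigned int batchSize)
{
    const unsigned int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    const unsigned int lane = threadIdx.x % 32;
    if(warp >= batchSize) {
        return;
    }

    // Steps 1 & 2: each thread strides through this batch's neurons,
    // accumulating a local sum in a register (consecutive lanes read
    // consecutive neurons, so loads are coalesced)
    float sum = 0.0f;
    for(unsigned int i = lane; i < numNeurons; i += 32) {
        sum += var[(warp * numNeurons) + i];
    }

    // Step 3: warp shuffle reduction combines the 32 per-thread sums
    for(unsigned int offset = 16; offset > 0; offset /= 2) {
        sum += __shfl_down_sync(0xFFFFFFFF, sum, offset);
    }

    // Step 4: the first thread in the warp writes the result back
    if(lane == 0) {
        reduction[warp] = sum;
    }
}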

The only fly in the ointment is that OpenCL 1.2 does not expose warp-level operations, so I have just added an error and left some rather CUDA-specific code generation in the SIMT backend. I can tidy this up another time.

CPU implementation

This is super-simple: it just applies the operation in a loop! https://github.com/genn-team/genn/blob/pop_reduction/src/genn/backends/single_threaded_cpu/backend.cc#L562-L586
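For a sum reduction this boils down to something like the following sketch (the function and parameter names are assumptions for illustration):

// Minimal sketch of the single-threaded equivalent: initialise the target to
// the operation's identity value, then apply the operation across all neurons
float reduceSum(const float *V, unsigned int numNeurons)
{
    float reduction = 0.0f;     // identity value for a sum reduction
    for(unsigned int i = 0; i < numNeurons; i++) {
        reduction += V[i];      // apply the reduction operation per neuron
    }
    return reduction;
}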

Syntax

Aside from the flags, no new syntax is required, so the sum of the membrane voltages of a population of Izhikevich neurons can be calculated like this:

C++

#include "modelSpec.h"

class Reduction : public CustomUpdateModels::Base
{
    DECLARE_CUSTOM_UPDATE_MODEL(Reduction, 0, 1, 1);

    SET_UPDATE_CODE("$(reduction) = $(source);\n");

    SET_VARS({{"reduction", "scalar", VarAccess::REDUCE_NEURON_SUM}})
    SET_VAR_REFS({{"source", "scalar", VarAccessMode::READ_ONLY}});
};
IMPLEMENT_MODEL(Reduction);

void modelDefinition(ModelSpec &model)
{
    NeuronModels::Izhikevich::ParamValues paramVals(0.02, 0.2, -65.0, 8.0);
    NeuronModels::Izhikevich::VarValues varVals(0.0, 0.0);
    auto *pop = model.addNeuronPopulation<NeuronModels::Izhikevich>("Pop", 10, paramVals, varVals);

    Reduction::VarReferences reduceVarReferences(createVarRef(pop, "V"));
    Reduction::VarValues reduceVars(0.0);
    auto *cu = model.addCustomUpdate<Reduction>("Reduction", "CustomUpdate",
                                                {}, reduceVars, reduceVarReferences);
}

Python

from pygenn.genn_model import create_custom_custom_update_class, create_var_ref
from pygenn import GeNNModel
from pygenn.genn_wrapper.Models import VarAccess_REDUCE_NEURON_SUM, VarAccessMode_READ_ONLY

reduction = create_custom_custom_update_class("reduction",
                                              var_name_types=[("reduction", "scalar", VarAccess_REDUCE_NEURON_SUM)],
                                              var_refs=[("source", "scalar", VarAccessMode_READ_ONLY)],
                                              update_code="$(reduction) = $(source);\n")
model = GeNNModel("float", "test")

pop = model.add_neuron_population("Pop", 10, "Izhikevich", {"a": 0.02, "b": 0.2, "c": -65.0, "d": 8.0},
                                  {"V": 0.0, "U": 0.0})
model.add_custom_update("Reduction", "Calculate", reduction, {}, {"reduction": 0.0},
                        {"source": create_var_ref(pop, "V")})

Finally, see the custom_update_neuron_reduction_batch_one feature test for a more complex example where I implement softmax.

Performance

To evaluate performance, I replaced the CUDA intrinsics previously used by mlGeNN to implement softmax with a version implemented using this system and trained a single epoch of latency MNIST using eProp. As you might expect, adding three extra kernel launches per timestep does reduce performance (mostly through CPU overheads):
[Figure: Latency MNIST eProp training time (1 epoch)]
However, this is pretty much the worst case: this is a tiny model with 100 hidden neurons, so the (approximately constant) time taken to calculate the reduction is more significant than it would be if, for example, we were using a larger model or the more computationally expensive ALIF model.

Future

  • At some point reductions probably need to be implemented using a strategy pattern, like presynaptic updates, so different algorithms can be implemented for different backends/reduction shapes and selected automatically.
  • Various other sorts of reduction, e.g. per synapse->per presynaptic neuron and per synapse->kernel, still need to be implemented.
  • Some reduction algorithms (like the SIMT approach added in this PR) don't require synchronisation between passes - this could be exploited if we added some syntax for describing multi-pass custom updates

Fixes #371

@neworderofjamie neworderofjamie added this to the GeNN 4.8.0 milestone Oct 3, 2022
@neworderofjamie neworderofjamie marked this pull request as ready for review October 3, 2022 15:57
codecov bot commented Oct 3, 2022

Codecov Report

Base: 87.04% // Head: 89.38% // Increases project coverage by +2.33% 🎉

Coverage data is based on head (23fa037) compared to base (206b436).
Patch coverage: 81.05% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #539      +/-   ##
==========================================
+ Coverage   87.04%   89.38%   +2.33%     
==========================================
  Files          84       73      -11     
  Lines       18099    10747    -7352     
==========================================
- Hits        15754     9606    -6148     
+ Misses       2345     1141    -1204     
Impacted Files Coverage Δ
include/genn/genn/code_generator/backendSIMT.h 99.10% <ø> (+0.91%) ⬆️
...genn/genn/code_generator/customUpdateGroupMerged.h 100.00% <ø> (ø)
...genn/genn/code_generator/neuronUpdateGroupMerged.h 100.00% <ø> (ø)
include/genn/genn/customUpdateInternal.h 0.00% <ø> (ø)
include/genn/genn/varAccess.h 100.00% <ø> (ø)
src/genn/genn/code_generator/modelSpecMerged.cc 74.44% <0.00%> (-21.03%) ⬇️
src/genn/genn/customUpdateModels.cc 100.00% <ø> (ø)
src/genn/genn/code_generator/backendSIMT.cc 89.67% <35.13%> (-6.16%) ⬇️
src/genn/backends/opencl/backend.cc 91.51% <66.66%> (-0.05%) ⬇️
src/genn/genn/weightUpdateModels.cc 95.00% <66.66%> (-0.84%) ⬇️
... and 84 more


@tnowotny (Member) left a comment
Looking good. Thanks for also updating the documentation.

@neworderofjamie neworderofjamie merged commit 5b14794 into master Oct 4, 2022
@neworderofjamie neworderofjamie deleted the pop_reduction branch October 4, 2022 15:49
Linked issue: Reductions within neuron kernels (#371)