Pop reduction #539
* …pdate`` and ``CustomUpdateWU`` and split into ``isBatchReduction`` and ``isNeuronReduction``
* added warning about bug
* …f variable with a delay buffer
Codecov Report
Base: 87.04% // Head: 89.38% // Increases project coverage by +2.33%.

Coverage Diff:

| | master | #539 | +/- |
|---|---|---|---|
| Coverage | 87.04% | 89.38% | +2.33% |
| Files | 84 | 73 | -11 |
| Lines | 18099 | 10747 | -7352 |
| Hits | 15754 | 9606 | -6148 |
| Misses | 2345 | 1141 | -1204 |
Looking good. Thanks for also updating the documentation.
Since #447, GeNN has supported 'batch reductions', where you can reduce (calculate the sum or maximum of) a per-neuron/synapse state variable across the instances of a batched model. This PR introduces a basic implementation of 'neuron reductions', allowing the same reduction operations to be applied across a population of neurons, e.g. to implement a softmax function on a population of output neurons. As is all too often the case, the majority of changes relate to refactoring the code that figures out how to index into variables to handle the new cases this PR enables!
Variable duplication modes
Variables that are shared across batched model instances are marked with an access mode containing the `VarAccessDuplication::SHARED` flag. As well as allowing for memory savings, for example by sharing weights, this allows these variables to be used as the targets for batch reductions. Similarly, this PR introduces a new `VarAccessDuplication::SHARED_NEURON` flag for variables which are shared between all the neurons in a population (but duplicated across batches). Outside of custom updates, the only access mode where this is allowed is `VarAccess::READ_ONLY_SHARED_NEURON`, which results in a read-only variable shared between all neurons in a population.

Note: this functionality ends up rather duplicating that of non-pointer extra global parameters, but this is something I intend to unify in GeNN 5.0.0.
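For illustration, a minimal sketch of how such a variable might be declared in a user-defined neuron model (the model, its dynamics and the `Bias` variable are hypothetical, not taken from this PR):

```cpp
#include "modelSpec.h"

// Hypothetical neuron model: "V" is duplicated per-neuron (and per-batch) as
// usual, while "Bias" uses the new access mode so a single read-only value is
// shared between all neurons in the population (but duplicated across batches)
class SharedNeuronExample : public NeuronModels::Base
{
public:
    DECLARE_MODEL(SharedNeuronExample, 0, 2);

    SET_VARS({{"V", "scalar"},
              {"Bias", "scalar", VarAccess::READ_ONLY_SHARED_NEURON}});

    // Simple leak towards the shared bias value, purely for illustration
    SET_SIM_CODE("$(V) += ($(Bias) - $(V)) * DT;\n");
};
IMPLEMENT_MODEL(SharedNeuronExample);
```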
Variable access modes
With all the indexing fun required to deal with the new duplication modes in place, custom updates can now reduce into `VarAccess::READ_ONLY_SHARED_NEURON` variables using the `VarAccessMode::REDUCE_SUM` or `VarAccessMode::REDUCE_MAX` access modes, or define their own state variables with the `VarAccess::REDUCE_NEURON_SUM` and `VarAccess::REDUCE_NEURON_MAX` access modes.
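As a concrete illustration of the first of these options, a custom update model that reduces a per-neuron source into a shared target might look something like this (a hedged sketch; the model and reference names are illustrative, not taken from the PR):

```cpp
#include "modelSpec.h"

// Hypothetical custom update model: "source" references an ordinary per-neuron
// variable and "target" references a READ_ONLY_SHARED_NEURON variable which
// receives the sum across the population
class ReduceIntoShared : public CustomUpdateModels::Base
{
public:
    DECLARE_CUSTOM_UPDATE_MODEL(ReduceIntoShared, 0, 0, 2);

    // Each neuron's contribution is written to the reduction target and the
    // backend combines the contributions into a single shared value
    SET_UPDATE_CODE("$(target) = $(source);\n");

    SET_VAR_REFS({{"source", "scalar", VarAccessMode::READ_ONLY},
                  {"target", "scalar", VarAccessMode::REDUCE_SUM}});
};
IMPLEMENT_MODEL(ReduceIntoShared);
```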
SIMT implementation
Similarly to our batch reductions, I copied the TF algorithm used in the common case (when the number of neurons is < 1024), which uses a warp of threads per batch. This is tailored to e.g. small populations of output neurons (around warp size) and large batch sizes, which can reasonably occupy the GPU using this approach. Implementation at https://github.com/genn-team/genn/blob/pop_reduction/src/genn/genn/code_generator/backendSIMT.cc#L934-L988.
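To make the warp-per-batch scheme concrete, here is an illustrative CUDA sketch of a sum reduction structured this way. This is not the code GeNN generates (that comes from the backend linked above); the kernel name, arguments and indexing are all hypothetical:

```cpp
// Hypothetical kernel: warp b sums g_V over the numNeurons neurons of batch b
// and writes the result to g_sum[b]
__global__ void neuronReduceSum(const float *g_V, float *g_sum,
                                unsigned int numNeurons, unsigned int numBatches)
{
    const unsigned int tid = (blockIdx.x * blockDim.x) + threadIdx.x;
    const unsigned int lane = tid % 32;     // Lane within the warp
    const unsigned int batch = tid / 32;    // One warp per batch
    if(batch >= numBatches) {
        return;
    }

    // Each lane strides across the neurons of its batch
    float sum = 0.0f;
    for(unsigned int n = lane; n < numNeurons; n += 32) {
        sum += g_V[(batch * numNeurons) + n];
    }

    // Warp-level tree reduction using shuffle-down
    for(unsigned int offset = 16; offset > 0; offset /= 2) {
        sum += __shfl_down_sync(0xFFFFFFFF, sum, offset);
    }

    // Lane 0 now holds the total for this batch
    if(lane == 0) {
        g_sum[batch] = sum;
    }
}
```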
The only fly in the ointment is that OpenCL 1.2 does not expose warp-level operations, so I have just added an error and left some rather CUDA-specific code generation in the SIMT backend. I can tidy this up another time.
CPU implementation
This is super-simple: it just applies the operation in a loop! https://github.com/genn-team/genn/blob/pop_reduction/src/genn/backends/single_threaded_cpu/backend.cc#L562-L586
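In spirit (a sketch rather than the generated code, with hypothetical names), the single-threaded CPU version of a sum reduction amounts to:

```cpp
// Sketch of the CPU fallback: loop over batches and, within each, over
// neurons, accumulating the per-neuron variable into one value per batch
// (a max reduction would use std::max instead of +=)
void neuronReduceSum(const float *V, float *Sum,
                     unsigned int numNeurons, unsigned int numBatches)
{
    for(unsigned int batch = 0; batch < numBatches; batch++) {
        float sum = 0.0f;   // Start from the reduction's identity value
        for(unsigned int n = 0; n < numNeurons; n++) {
            sum += V[(batch * numNeurons) + n];
        }
        Sum[batch] = sum;
    }
}
```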
Syntax
Aside from the flags, no new syntax is required, so a sum of the membrane voltages of a population of Izhikevich neurons can be calculated like this:
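A hedged C++ sketch of what this could look like (the `NeuronSum` model, the `SumV`/`CalculateSum` names and the Izhikevich parameter values below are illustrative, not taken from the PR):

```cpp
#include "modelSpec.h"

// Hypothetical custom update model whose "sum" state variable uses the new
// REDUCE_NEURON_SUM access mode, so assigning to it sums across the population
class NeuronSum : public CustomUpdateModels::Base
{
public:
    DECLARE_CUSTOM_UPDATE_MODEL(NeuronSum, 0, 1, 1);

    SET_UPDATE_CODE("$(sum) = $(source);\n");

    SET_VARS({{"sum", "scalar", VarAccess::REDUCE_NEURON_SUM}});
    SET_VAR_REFS({{"source", "scalar", VarAccessMode::READ_ONLY}});
};
IMPLEMENT_MODEL(NeuronSum);

void modelDefinition(ModelSpec &model)
{
    // Add a population of 10 Izhikevich neurons (parameter and initial
    // values here are just placeholders)
    NeuronModels::Izhikevich::ParamValues izkParams(0.02, 0.2, -65.0, 8.0);
    NeuronModels::Izhikevich::VarValues izkInit(-65.0, -13.0);
    auto *neurons = model.addNeuronPopulation<NeuronModels::Izhikevich>(
        "Neurons", 10, izkParams, izkInit);

    // Attach the custom update to the population's membrane voltage "V";
    // the reduction runs whenever the "CalculateSum" update group is triggered
    NeuronSum::VarValues sumVarValues(0.0);
    NeuronSum::VarReferences sumVarReferences(createVarRef(neurons, "V"));
    model.addCustomUpdate<NeuronSum>("SumV", "CalculateSum", {},
                                     sumVarValues, sumVarReferences);
}
```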
Finally, see the custom_update_neuron_reduction_batch_one feature test for a more complex example where I implement softmax.
Performance
To evaluate performance, I replaced the CUDA intrinsics previously used by mlGeNN to implement softmax with a version implemented using this system and trained a single epoch of latency MNIST using eProp. As you might expect, adding three extra kernel launches per timestep does reduce performance (mostly through CPU overheads):
![Latency MNIST eProp training time (1 epoch) (1)](https://user-images.githubusercontent.com/6793242/193834634-b5d9d761-8958-477f-9edf-a7453c85f926.png)
However, this is pretty much the worst case, as this is a tiny model with 100 hidden neurons, so the (approximately constant) time taken to calculate the reduction is more significant than it would be if, for example, we were using a larger model or the more computationally expensive ALIF model.
Future
Fixes #371