Batch reductions #447
Conversation
# Conflicts:
#	include/genn/genn/currentSource.h
#	include/genn/genn/neuronGroup.h
#	src/genn/genn/synapseGroup.cc
…`` and ``CustomUpdateWUGroupMergedBase::getVarRefIndex``
* added test of error
…:isReduction`` so they are set irrespective of actual batch size of model
…lSpec::addCustomUpdate`` rather than only when finalizing model (always a good thing)
…into ``BackendBase``
…ppers to handle transposes involving custom WU update variables; add additional error to prevent reduction and transpose operations being attempted simultaneously
Codecov Report
@@            Coverage Diff             @@
##           master     #447      +/-   ##
==========================================
+ Coverage   88.00%   88.08%   +0.07%
==========================================
  Files           78       78
  Lines        16605    16824     +219
==========================================
+ Hits         14614    14820     +206
- Misses        1991     2004      +13
Continue to review full report at Codecov.
Other than the comment about not initialising reduction-type variables below, it all makes sense ...
// Loop through variable references
for(const auto &v : cm->getVarRefs()) {
    // If variable reference is a reduction target, define variable initialised to correct initial value for reduction
    // **NOTE** by not initialising this, compilers should emit a warning if user code doesn't set it to something
uhm ... I am not quite sure I understand what is going on here. Is this an old comment (you seem to be initialising below after all) or am I missing the entire plot?
Good spot - I've moved the comment to where it belongs
# Conflicts:
#	include/genn/genn/currentSource.h
#	include/genn/genn/neuronGroup.h
So the batching system (#392) lets you do parallel inference but, in order to do parallel training, you need to be able to sum up (reduce) what you're learning online across all elements in the batch and apply it to the (shared) weights. This PR implements this via some new ``VarAccess`` modes, ``REDUCE_SUM`` and ``REDUCE_MAX``, which signal that writes to these variables should be reductions. I went backwards and forwards a lot about the syntax for this one but ended up not really adding any new syntax, so a gradient reduce and zeroing custom update might look like this:
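As a rough sketch only (assuming GeNN's C++ custom update model interface; the class name, variable names and update code below are illustrative rather than copied from the PR), such a model could be declared along these lines:

```cpp
#include "modelSpec.h"

// Illustrative only: reduce a batch-duplicated gradient into a REDUCE_SUM
// variable and zero the per-batch copy ready for the next batch
class GradientBatchReduce : public CustomUpdateModels::Base
{
public:
    DECLARE_CUSTOM_UPDATE_MODEL(GradientBatchReduce, 0, 1, 1);

    // The write to reducedGradient is what the new access mode turns into
    // a sum across the batch
    SET_UPDATE_CODE(
        "$(reducedGradient) = $(gradient);\n"
        "$(gradient) = 0.0;\n");

    SET_VARS({{"reducedGradient", "scalar", VarAccess::REDUCE_SUM}});
    SET_VAR_REFS({{"gradient", "scalar", VarAccessMode::READ_WRITE}});
};
IMPLEMENT_MODEL(GradientBatchReduce);
```

Using ``VarAccess::REDUCE_MAX`` instead would request a max reduction rather than a sum.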
The nice thing with this lack of syntax is that you can do stuff like implement softmax with ``$(reducedGradient) = exp($(gradient));`` (although that doesn't make a lot of sense reducing across batches), and backends which don't support batching (like the single-threaded CPU backend) can basically just stick a write back to global memory after the generic code generation, which automatically turns into an (unnecessary) copy operation.

Because, typically, ``NUM_BATCHES << NUM_WEIGHTS``, this reduction is quite different from those typically talked about in the literature (https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf), so I dug into the TF source to see how they implement reductions of this type and, for ``NUM_WEIGHTS > 4096``, they use this very simple algorithm:

* each thread handles one weight and loops over the batch dimension
* the reduction is accumulated in a register (with a little extra logic so the correct initial value for e.g. ``REDUCE_MAX`` can be established)
* the single reduced value is then written back to global memory

This makes sense as you get good coalescing of global memory reads and no need for atomics etc. and, as GeNN will fuse any compatible reductions together so they're run in parallel, I think any reasonable model will easily occupy the GPU (which I think the 4096 vaguely represents).
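For illustration only, here is a minimal CUDA sketch of that pattern for a sum reduction over a ``numBatches x numWeights`` gradient array (the kernel name and signature are made up; this is not GeNN's generated code):

```cuda
// One thread per weight, a sequential loop over the batch dimension,
// accumulation in a register and a single write at the end -
// no atomics or shared memory needed
__global__ void batchReduceSum(const float *gradients, float *reducedGradients,
                               unsigned int numWeights, unsigned int numBatches)
{
    const unsigned int i = (blockIdx.x * blockDim.x) + threadIdx.x;
    if(i < numWeights) {
        // For a REDUCE_MAX variant the accumulator would start at -FLT_MAX
        float sum = 0.0f;
        for(unsigned int b = 0; b < numBatches; b++) {
            // Consecutive threads read consecutive weights so reads coalesce
            sum += gradients[(b * numWeights) + i];
        }
        reducedGradients[i] = sum;
    }
}
```

Swapping the ``+=`` for ``fmax`` (and starting from ``-FLT_MAX``) would give the max-reduction variant.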
I've been using this to do parallel eProp where one of these reductions on the gradients is followed by an Adam optimizer custom update which applies the now-non-batched gradients to the shared weights (via #446). However, it's pretty flexible so you could actually use it with STDP or whatever - you'd apply your STDP rule to a deltaG variable which would be duplicated across the batches, reduce these and add them to the (shared) weights.
On the Titan V, increasing the batch size decreases the effective time to train a single stimulus by around 4.5x, as shown in the attached plot.