
Completion of merging implementation #316

Merged: 141 commits merged into master from python_microcircuit on Apr 21, 2020

Conversation

@neworderofjamie (Contributor) commented Apr 15, 2020

First of all, apologies that this has turned into such a beast. I was trying to cherry-pick features out into separate PRs but that turned into a mess, so this PR encompasses a lot of the things required to make the multi-area model work:

  1. In the initial implementation of kernel merging in master, mergeable groups had to have the same parameter values, which was pretty rubbish. This PR detects which parameters are "heterogeneous", adds their values to the merged struct and substitutes references to these rather than the hard-coded value.
  2. The PCI bus ID-based GPU identification doesn't work on machines like JADE where GPUs are virtualised using NVML - have added a flag to GENN_PREFERENCES to switch back to device ID-based selection.
  3. Ideally, in CUDA, you want both the cumulative thread indices for merged groups (which get binary-searched) and the merged structures themselves to live in the constant cache, but it's very small. In master there was a flag which let you set where the structs were located, but that:
    1. Required user input
    2. Didn't let you relocate the group indices on really large models (70,000 indices don't fit in 64KB)
    This is now fully automated - the backend returns a data structure describing its "memory spaces" and the merged structures are placed into them in order of preference (we don't care as much about initialization kernel performance, for example). This is a variant of the NP-hard bin-packing problem so finding the perfect solution is basically impossible 😨 - a greedy sketch follows this list.
  4. When using the fixed number total connector, you previously had to manually call a helper function in the sim code to precalculate the row lengths and shove them in an extra global parameter array. This was problematic because:
    1. The user had to add loads of simulation code (compare master to this PR)
    2. Aside from shoving these helpers in userprojects and adding more SWIG horror, it was hard to provide them to (especially Python) users, and it was a bit rubbish that this code wasn't part of the model class.
    3. Calling these functions thousands of times from Python was very slow
    4. All the boilerplate code for handling extra global parameters made the runner size, and hence the compile time, explode
    Weight update models now have the concept of host initialization code which is used to initialize EGPs on the host. This uses the normal code-generation tricks to expose EGP allocation etc. and kernel merging to avoid code duplication. This seems a bit special-case but maybe more use cases will appear!
  5. Various other small runner-size optimizations
    1. If all your model variables are located on the device only (as they are likely to be in a large model), you end up with a lot of empty pushXXXXStateToDevice and pullXXXXStateFromDevice functions - have added a flag to not generate these if they're empty
    2. The previous way of populating the merged structures overflowed the stack (especially on Windows) - switched to a better approach.
    3. When you have lots of array extra global parameters, their push, pull and allocate functions end up drastically expanding the runner - you seldom want to pull them as they are essentially read-only on the device, so added a flag to not generate the pull functions
  6. The merging algorithm was O(N²) where N is the number of populations, which got slow when N = 70,000! Have implemented a subtly different algorithm with O(NM) complexity, where M is the (typically much smaller) number of merged groups - a sketch of the idea appears at the end of this description.
  7. The merged group structs generally consist of a mixture of scalar values (32-bit) and pointers (64-bit) which, if ordered naively (as they were), incur a lot of padding overhead - and that really matters here because we want these structs to fit in the constant cache. We now sort fields by size (a classic solution from http://www.catb.org/esr/structure-packing/#_structure_reordering) to minimize padding; a small sketch follows this list.
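To make item 7 concrete, here is a minimal sketch of the size-based reordering idea; the `Field` record and `orderFields` helper are illustrative stand-ins rather than the actual GeNN types:

```cpp
// Sketch of item 7: sort merged-struct fields so the widest members come
// first, which minimises alignment padding inside the generated struct.
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Field
{
    std::string type;   // e.g. "scalar" or "unsigned int*"
    std::string name;
    size_t sizeBytes;   // 4 for 32-bit scalars, 8 for device pointers
};

void orderFields(std::vector<Field> &fields)
{
    // Stable sort keeps declaration order within each size class so the
    // generated struct layout stays deterministic from run to run.
    std::stable_sort(fields.begin(), fields.end(),
                     [](const Field &a, const Field &b)
                     {
                         return a.sizeBytes > b.sizeBytes;
                     });
}
```

Emitting all the 8-byte pointers before the 4-byte scalars means the only padding the compiler can add is a single 4-byte tail, rather than a 4-byte hole in front of every pointer that follows a scalar.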
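Similarly, a rough sketch of the greedy placement from item 3, assuming the backend describes its memory spaces as simple qualifier/capacity pairs (names and types here are hypothetical):

```cpp
// Sketch of item 3: place each merged-group struct into the first memory
// space, in preference order, that still has room. Global memory comes last
// with effectively unlimited capacity, so placement never fails in practice.
#include <cstddef>
#include <string>
#include <vector>

struct MemorySpace
{
    std::string qualifier;  // e.g. "__device__ __constant__" or "__device__"
    size_t freeBytes;       // remaining capacity in this space
};

std::string assignToMemorySpace(std::vector<MemorySpace> &spaces, size_t structBytes)
{
    for(MemorySpace &s : spaces) {
        if(s.freeBytes >= structBytes) {
            s.freeBytes -= structBytes;
            return s.qualifier;
        }
    }
    return {};  // only reachable if no space (not even global memory) fits
}
```

Because the merged groups are offered in priority order, the kernels whose performance matters most claim the constant cache first and everything else overflows into global memory. This is a greedy first-fit heuristic, not an optimal solution to the underlying bin-packing problem.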

Beyond these actual features, there is quite a lot of refactoring to make the kernel merging code less awful. It was scattered around MergedStructGenerator, generateRunner.cc and GroupMerged, but it is now centralized in a hierarchy of GroupMerged classes. This still ends up being quite a lot of code, but it's better.
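The change described in item 6 can be illustrated with a small, self-contained sketch (the real logic lives in the GroupMerged hierarchy and is considerably more involved): instead of comparing every pair of populations, each group is only compared against one "archetype" per existing merged group.

```cpp
// Sketch of item 6: build merged groups by scanning the existing merged
// groups (M of them) for each of the N populations, giving O(N*M) work
// rather than the O(N^2) of an all-pairs comparison.
#include <functional>
#include <vector>

template<typename Group>
std::vector<std::vector<const Group*>> mergeGroups(
    const std::vector<Group> &groups,
    const std::function<bool(const Group&, const Group&)> &canBeMerged)
{
    std::vector<std::vector<const Group*>> merged;
    for(const Group &g : groups) {
        bool found = false;
        for(auto &m : merged) {
            // The first member of each merged group acts as its archetype
            if(canBeMerged(*m.front(), g)) {
                m.push_back(&g);
                found = true;
                break;
            }
        }
        // No compatible merged group yet - start a new one
        if(!found) {
            merged.push_back({&g});
        }
    }
    return merged;
}
```

With N = 70,000 populations collapsing into a much smaller number of merged groups, M stays tiny relative to N, which is where the speed-up comes from.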

neworderofjamie and others added 30 commits December 31, 2019 13:04
* Base class for ``InitSparseConnectivity::Snippet::Init`` was not being detected - was a combination of SWIG being dumb and not having
* EGPs were totally broken for synapse groups
* Variable loading did not take variable location into account
…ues via merged struct

* Neuron parameters will be substituted
* Current source parameters will be substituted
* Moved unpleasant sorting of 'children' of neuron groups into ``NeuronGroupMerged`` to allow this information to be accessed when generating neuron update
codecov bot commented Apr 15, 2020

Codecov Report

Merging #316 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #316   +/-   ##
=======================================
  Coverage   82.34%   82.34%           
=======================================
  Files          64       64           
  Lines        9848     9848           
=======================================
  Hits         8109     8109           
  Misses       1739     1739           


@neworderofjamie added this to the GeNN 4.3.0 milestone Apr 15, 2020
@neworderofjamie marked this pull request as draft April 15, 2020 10:30
@neworderofjamie marked this pull request as ready for review April 15, 2020 11:02
@tnowotny (Member) left a comment

Arguably, the flag at

//! Should GeNN generate empty state push and pull functions
bool generateEmptyStatePushPull = true;

could be false by default? It is unclear what the empty push/pull functions would be good for (better for the user to get an error about a non-existent function rather than silently not getting any copy?)

Otherwise, this is hard to digest ... I have had a look through but don't think I can contribute much. We talked on Skype about possibly adding some developer notes; I think that would be helpful, especially covering the merging data structures and procedures.

@neworderofjamie merged commit 1433279 into master Apr 21, 2020
@neworderofjamie (Contributor, Author) commented

Totally agree that the default should change but, in general, I'm trying to version semantically and so not change things that could break existing models (however, this does mean that, by the end of a major version's life, it ends up a mess of flags...)
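In the meantime, a model that wants the stricter behaviour can opt out explicitly. The following is a sketch only, assuming the `generateEmptyStatePushPull` flag quoted above is reachable through `GENN_PREFERENCES` from the model definition, like the other preference flags mentioned in this PR:

```cpp
// Sketch, not the confirmed API surface: suppress generation of empty state
// push/pull functions. Assumes generateEmptyStatePushPull is exposed via the
// global GENN_PREFERENCES struct available to modelDefinition().
#include "modelSpec.h"

void modelDefinition(ModelSpec &model)
{
    GENN_PREFERENCES.generateEmptyStatePushPull = false;

    model.setName("example");
    model.setDT(0.1);
    // ... neuron populations and synapse groups would be added here ...
}
```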

@neworderofjamie deleted the python_microcircuit branch April 21, 2020 11:55
@tnowotny (Member) commented

I suppose, in the longer term, one should introduce a deprecation warning and then remove it in the version after next ...
