
Completion of merging implementation #316

Merged: 141 commits merged into master from python_microcircuit on Apr 21, 2020

Conversation

@neworderofjamie (Contributor) commented Apr 15, 2020

First of all, apologies that this has turned into such a beast. I was trying to cherry-pick features out into separate PRs but that turned into a mess, so this PR encompasses a lot of the things required to make the multi-area model work:

  1. In the initial implementation of kernel merging in master, mergeable groups had to have the same parameter values, which was pretty rubbish. This PR detects which parameters are "heterogeneous", adds their values to the merged struct and substitutes references to these rather than the hard-coded value.
  2. The PCI bus ID-based GPU identification doesn't work on machines like JADE where GPUs are virtualised using NVML - have added a flag to GENN_PREFERENCES to switch back to device ID-based selection.
  3. Ideally, in CUDA, you want both the cumulative thread indices for merged groups (which get binary-searched) and the merged structures themselves to live in the constant cache, but it's very small. In master there was a flag which let you set where the structs were located, but that:
    1. Required user input
    2. Didn't let you relocate the group indices on really large models (70,000 indices don't fit in 64KB)
    This is now fully automated - the backend returns a data structure describing its "memory spaces" and the merged structures are placed into them in order of preference (we don't care as much about initialization kernel performance, for example). This is a variant of the NP-hard bin-packing problem so finding the perfect solution is basically impossible 😨 - a greedy sketch follows this list.
  4. When using the fixed number total connector, you previously had to manually call a helper function in the sim code to precalculate the row lengths and shove them in an extra global parameter array. This was problematic because:
    1. The user had to add loads of simulation code (compare master to this PR)
    2. Aside from shoving these helpers in userprojects and adding more SWIG horror, it was hard to provide them to (especially Python) users, and it was a bit rubbish that this code wasn't part of the model class.
    3. Calling these functions thousands of times from Python was very slow
    4. All the boilerplate code for handling extra global parameters made the runner size, and hence the compile time, explode
    Weight update models now have the concept of host initialization code which is used to initialize EGPs on the host. This uses the normal code-generation tricks to expose EGP allocation etc. and kernel merging to avoid code duplication. This seems a bit special-case but maybe more use cases will appear!
  5. Various other small runner-size optimizations
    1. If all your model variables are located on the device only (as they are likely to be in a large model), you end up with a lot of empty pushXXXXStateToDevice and pullXXXXStateFromDevice functions - have added a flag to not generate these if they're empty
    2. The previous way of populating the merged structures overflowed the stack (especially on Windows) - switched to a better approach.
    3. When you have lots of array extra global parameters, their push, pull and allocate functions end up drastically expanding the runner - you seldom want to pull them as they are essentially read-only on the device, so added a flag to not generate the pull functions
  6. The merging algorithm was O(N²) where N is the number of populations, which got slow when N = 70,000! Have implemented a subtly different algorithm with O(NM) complexity, where M is the (typically much smaller) number of merged groups - a sketch of the idea appears at the end of this description.
  7. The merged group structs generally consist of a mixture of scalar values (32-bit) and pointers (64-bit) which, if ordered naively (as they were), incur a lot of padding overhead - and that really matters here because we want these structs to fit in the constant cache. We now sort fields by size (a classic solution from http://www.catb.org/esr/structure-packing/#_structure_reordering) to minimize padding; a small sketch follows this list.
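To make item 7 concrete, here is a minimal sketch of the size-based reordering idea; the `Field` record and `orderFields` helper are illustrative stand-ins rather than the actual GeNN types:

```cpp
// Sketch of item 7: sort merged-struct fields so the widest members come
// first, which minimises alignment padding inside the generated struct.
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Field
{
    std::string type;   // e.g. "scalar" or "unsigned int*"
    std::string name;
    size_t sizeBytes;   // 4 for 32-bit scalars, 8 for device pointers
};

void orderFields(std::vector<Field> &fields)
{
    // Stable sort keeps declaration order within each size class so the
    // generated struct layout stays deterministic from run to run.
    std::stable_sort(fields.begin(), fields.end(),
                     [](const Field &a, const Field &b)
                     {
                         return a.sizeBytes > b.sizeBytes;
                     });
}
```

Emitting all the 8-byte pointers before the 4-byte scalars means the only padding the compiler can add is a single 4-byte tail, rather than a 4-byte hole in front of every pointer that follows a scalar.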
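Similarly, a rough sketch of the greedy placement from item 3, assuming the backend describes its memory spaces as simple qualifier/capacity pairs (names and types here are hypothetical):

```cpp
// Sketch of item 3: place each merged-group struct into the first memory
// space, in preference order, that still has room. Global memory comes last
// with effectively unlimited capacity, so placement never fails in practice.
#include <cstddef>
#include <string>
#include <vector>

struct MemorySpace
{
    std::string qualifier;  // e.g. "__device__ __constant__" or "__device__"
    size_t freeBytes;       // remaining capacity in this space
};

std::string assignToMemorySpace(std::vector<MemorySpace> &spaces, size_t structBytes)
{
    for(MemorySpace &s : spaces) {
        if(s.freeBytes >= structBytes) {
            s.freeBytes -= structBytes;
            return s.qualifier;
        }
    }
    return {};  // only reachable if no space (not even global memory) fits
}
```

Because the merged groups are offered in priority order, the kernels whose performance matters most claim the constant cache first and everything else overflows into global memory. This is a greedy first-fit heuristic, not an optimal solution to the underlying bin-packing problem.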

Beyond these actual features, there is quite a lot of refactoring to make the kernel merging code less awful. It was scattered around MergedStructGenerator, generateRunner.cc and GroupMerged, but it is now centralized in a hierarchy of GroupMerged classes. This still ends up being quite a lot of code, but it's better.
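The change described in item 6 can be illustrated with a small, self-contained sketch (the real logic lives in the GroupMerged hierarchy and is considerably more involved): instead of comparing every pair of populations, each group is only compared against one "archetype" per existing merged group.

```cpp
// Sketch of item 6: build merged groups by scanning the existing merged
// groups (M of them) for each of the N populations, giving O(N*M) work
// rather than the O(N^2) of an all-pairs comparison.
#include <functional>
#include <vector>

template<typename Group>
std::vector<std::vector<const Group*>> mergeGroups(
    const std::vector<Group> &groups,
    const std::function<bool(const Group&, const Group&)> &canBeMerged)
{
    std::vector<std::vector<const Group*>> merged;
    for(const Group &g : groups) {
        bool found = false;
        for(auto &m : merged) {
            // The first member of each merged group acts as its archetype
            if(canBeMerged(*m.front(), g)) {
                m.push_back(&g);
                found = true;
                break;
            }
        }
        // No compatible merged group yet - start a new one
        if(!found) {
            merged.push_back({&g});
        }
    }
    return merged;
}
```

With N = 70,000 populations collapsing into a much smaller number of merged groups, M stays tiny relative to N, which is where the speed-up comes from.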

neworderofjamie and others added 30 commits December 31, 2019 13:04
* Base class for ``InitSparseConnectivity::Snippet::Init`` was not being detected - was a combination of SWIG being dumb and not having
* EGPs were totally broken for synapse groups
* Variable loading did not take variable location into account
…ues via merged struct

* Neuron parameters will be substituted
* Current source parameters will be substituted
* Moved unpleasant sorting of 'children' of neuron groups into ``NeuronGroupMerged`` to allow this information to be accessed when generating neuron update
codecov bot commented Apr 15, 2020

Codecov Report

Merging #316 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #316   +/-   ##
=======================================
  Coverage   82.34%   82.34%           
=======================================
  Files          64       64           
  Lines        9848     9848           
=======================================
  Hits         8109     8109           
  Misses       1739     1739           


@neworderofjamie added this to the GeNN 4.3.0 milestone Apr 15, 2020
@neworderofjamie marked this pull request as draft April 15, 2020 10:30
@neworderofjamie marked this pull request as ready for review April 15, 2020 11:02
@tnowotny (Member) left a comment

Arguably, the flag at

//! Should GeNN generate empty state push and pull functions
bool generateEmptyStatePushPull = true;

could be false by default? It is unclear what the empty push/pull functions would be good for (better for the user to get an error about a non-existent function rather than silently not getting any copy?)

Otherwise, this is hard to digest ... I have had a look through but don't think I can contribute much. We talked on Skype about possibly adding some developer notes; I think that would be helpful, especially covering the merging data structures and procedures.

@neworderofjamie merged commit 1433279 into master Apr 21, 2020
@neworderofjamie (Contributor, Author) commented

Totally agree that the default should change but, in general, I'm trying to version semantically and so not change things that could break existing models (however, this does mean that, by the end of a major version's life, it ends up a mess of flags...)
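In the meantime, a model that wants the stricter behaviour can opt out explicitly. The following is a sketch only, assuming the `generateEmptyStatePushPull` flag quoted above is reachable through `GENN_PREFERENCES` from the model definition, like the other preference flags mentioned in this PR:

```cpp
// Sketch, not the confirmed API surface: suppress generation of empty state
// push/pull functions. Assumes generateEmptyStatePushPull is exposed via the
// global GENN_PREFERENCES struct available to modelDefinition().
#include "modelSpec.h"

void modelDefinition(ModelSpec &model)
{
    GENN_PREFERENCES.generateEmptyStatePushPull = false;

    model.setName("example");
    model.setDT(0.1);
    // ... neuron populations and synapse groups would be added here ...
}
```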

@neworderofjamie deleted the python_microcircuit branch April 21, 2020 11:55
@tnowotny (Member) commented

I suppose, in the longer term, one should introduce a deprecation warning and then remove it in the version after next ...
