Does PPSeq.jl scale okay? #1
russelljjarvis announced in Announcements
Replies: 3 comments
-
This is the output from the first 750 cells from multi_area_model: https://github.com/INM-6/multi-area-model
-
Hi Russell,
Thanks for getting in touch! (Also, I've never used this GitHub
functionality, so I hope this email gets through to you!)
So, a quick clarification: this is not actually my repo, nor my idea, nor my
implementation! PPSeq was made by Williams et al. in the Linderman lab
(https://arxiv.org/abs/2010.04875). My only role is that my colleagues and I
modified PPSeq to look for replay, and in the process found and fixed a
super-minor bug (lindermanlab@cdda7b7).
So I'm only a contributor because of that tiny bug fix!
As such, it might be worth your while chatting to some of the original
authors.
Since I'm here, though, my only thoughts on your conundrum are:
- Yes, it gets very slow. The number of spikes is what slows it down (I
believe the bottleneck step is the inference over which sequence each spike
might belong to, though I could be wrong about that). This is useful to know
because it suggests that one of your fixes, grouping neurons together, won't
work: the run time scales with the number of spikes, not the number of
neurons, and grouping neurons together doesn't change the number of spikes.
- The approved method for speeding it up is to chop the recording time into
small segments and run on each of these segments in parallel (there's a rough
sketch of the idea after this list). This is what all the distributed code
does, and it worked well for me. The artificial slicing of the data introduces
errors (it effectively means the inference sometimes happens over only a
subset of the data; see the paper for more discussion), but if you have a long
recording this is definitely a good approach, as the errors are tiny.
Unfortunately for you, this deals with large-time but not large-neuron-number
inference.
- Using the distributed approach above I have been able to run hundreds of
cells at once (~400 max, the size of the dataset I was analysing, so I didn't
test whether it could go bigger); perhaps that is enough for you? It was
definitely possible to run the algorithm overnight in that case.
- Your suggestion to chop the neurons up into groups of 35 cells would
definitely make it possible to run. My worry is that PPSeq looks for sequences
that occur consistently across large populations of neurons; chopping the
neurons into arbitrary groups means you can only detect sequences within those
groups of 35 neurons, which probably defeats the point of using PPSeq?
I hope that helps somewhat!
Cheers,
Will
-
Thanks for the reply @WilburDoz. I am still tinkering with the code.
-
Hi @WilburDoz,
Your repository is totally awesome, thanks for coming up with the idea and implementing it. Sorry about creating this discussion here. I can delete this version of the discussion and paste it somewhere more appropriate if you agree?
I am trying to apply the algorithm to spike trains with 1 million cells, and I have noticed that the execution time is quite long. Note, though, that I have re-purposed the setup/configuration from the examples.
I was wondering if there is some known range of cell numbers for which execution time is reasonable?
Also, do you think there could be tricks for overcoming the prohibitively long run times associated with large-cell-count spike trains? I think I could lump the activity of many cells into one cell, so that a whole population of cells is represented by a single cell.
Or I could break the 1 million cells into chunks of 750 cells, run each chunk through PPSeq's sequence sorting/clustering, and collate the re-sorted spike trains back into a mutated copy of the original spike trains, until I have replaced every chunk with a sorted chunk?
It seems to finish in about 5 minutes for 750 cell IDs recorded over 2,000 ms, which seems reasonable. I think the full neuron count is 1 million cells.
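To be concrete about the lumping idea, here is a hypothetical sketch of what I mean in plain Julia (nothing PPSeq.jl-specific; the spike format is just an assumption):

```julia
# Hypothetical sketch of the "lumping" idea (nothing PPSeq.jl-specific):
# map each neuron id onto a group id, so a whole population becomes one pseudo-cell.
# `spikes` is assumed to be a Vector of (neuron_id, time) tuples.
lump(spikes, group_size) = [(cld(n, group_size), t) for (n, t) in spikes]

# e.g. lump(spikes, 35) maps neurons 1-35 to pseudo-cell 1, 36-70 to pseudo-cell 2, ...
# Note the total number of spikes is unchanged by this transformation.
```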
@aMarcireau,