Does PPSeq.jl scale okay? #1
russelljjarvis announced in Announcements
Replies: 3 comments
-
This is the output from the first 750 cells from multi_area_model: https://github.com/INM-6/multi-area-model
-
Hi Russell,
Thanks for getting in touch! (Also, I've never used this GitHub
functionality, so I hope this email gets through to you!)
So, a quick clarification: this is not actually my repo, nor my idea, nor my
implementation! PPSeq was made by Williams et al. in the Linderman lab
(https://arxiv.org/abs/2010.04875). My only role is that my colleagues and I
modified PPSeq to look for replay, and in the process found and fixed a
super-minor bug (lindermanlab@cdda7b7).
So I'm only a contributor because of that tiny bug fix!
As such, it might be worth your while chatting to some of the original
authors.
Since I'm here, though, my only thoughts on your conundrum are:
- Yes, it gets very slow. The number of spikes is what slows it down (I
believe the bottleneck step is the inference over which sequence each spike
might belong to, though I could be wrong about that). This is useful to know
because it suggests that one of your fixes, grouping neurons together, won't
work: the run time scales with the number of spikes, not the number of
neurons, and grouping neurons together doesn't change the number of spikes.
- The approved method for speeding it up is to chop the recording time into
small segments and run on each of these segments in parallel (there's a rough
sketch of the idea after this list). This is what all the distributed code
does, and it worked well for me. The artificial slicing of the data introduces
errors (it effectively means the inference sometimes happens over only a
subset of the data; see the paper for more discussion), but if you have a long
recording this is definitely a good approach, as the errors are tiny.
Unfortunately for you, this deals with large-time but not large-neuron-number
inference.
- Using the distributed approach above I have been able to run hundreds of
cells at once (~400 max, the size of the dataset I was analysing, so I didn't
test whether it could go bigger); perhaps that is enough for you? It was
definitely possible to run the algorithm overnight in that case.
- Your suggestion to chop the neurons up into groups of 35 cells would
definitely make it possible to run. My worry is that PPSeq looks for sequences
that occur consistently across large populations of neurons; chopping the
neurons into arbitrary groups means you can only detect sequences within those
groups of 35 neurons, which probably defeats the point of using PPSeq?
I hope that helps somewhat!
Cheers,
Will
-
Thanks for the reply @WilburDoz. I am still tinkering with the code.
-
Hi @WilburDoz,
Your repository is totally awesome, thanks for coming up with the idea and implementing it. Sorry about creating this discussion here. I can delete this version of the discussion and paste it somewhere more appropriate if you agree?
I am trying to apply the algorithm to spike trains with 1 million cells, and I have noticed that the execution time is quite long. Note, though, that I have re-purposed the setup/configuration from the examples.
I was wondering if there is some known range of cell numbers for which execution time is reasonable?
Also, do you think there could be tricks for overcoming the prohibitively long run times associated with large-cell-count spike trains? I think I could lump the activity of many cells into one cell, so that a whole population of cells is represented by a single cell.
Or I could break the 1 million cells into chunks of 750 cells, run each chunk through PPSeq's sequence sorting/clustering, and collate the re-sorted spike trains back into a mutated copy of the original spike trains, until I have replaced every chunk with a sorted chunk?
It seems to finish in about 5 minutes for 750 cell IDs recorded over 2,000 ms, which seems reasonable. I think the full neuron count is 1 million cells.
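To be concrete about the lumping idea, here is a hypothetical sketch of what I mean in plain Julia (nothing PPSeq.jl-specific; the spike format is just an assumption):

```julia
# Hypothetical sketch of the "lumping" idea (nothing PPSeq.jl-specific):
# map each neuron id onto a group id, so a whole population becomes one pseudo-cell.
# `spikes` is assumed to be a Vector of (neuron_id, time) tuples.
lump(spikes, group_size) = [(cld(n, group_size), t) for (n, t) in spikes]

# e.g. lump(spikes, 35) maps neurons 1-35 to pseudo-cell 1, 36-70 to pseudo-cell 2, ...
# Note the total number of spikes is unchanged by this transformation.
```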
@aMarcireau,