
Sparse Data Support #937

Closed
alemagnani wants to merge 1 commit into dev from sparse_data_pull_request

Conversation

alemagnani

This PR has been replaced by #2364

This adds basic support for sparse data in CSR format. The main additions are a SparseBlob to store the sparse data and an extension to InnerProduct that handles both dense and sparse input depending on what is presented. A new data layer is added to read sparse data from LevelDB.
Some more details:

  • subclass of Blob that handles matrices in CSR format (SparseBlob)
  • subclass of InnerProductLayer that accepts a sparse blob as input (it handles dense input as well)
  • a new data layer (DataLayerSparseInput) that reads sparse vectors from LevelDB and creates SparseBlobs
  • support for cuSPARSE
  • new sparse math operations on both GPU and CPU
  • small change to SyncedMem to make it possible to own or not own the GPU data
  • new proto object SparseDatum to store sparse vectors
  • changed the way Net creates Blobs to support sparse blobs. Basically the bottom layer decides which Blob type to create, so the DataLayerSparseInput produces SparseBlobs (see the new method in layer_factory.cpp). There could be other ways to achieve the same result.
  • extensive tests for the new math functions and the new layers
  • 2 different implementations of the GPU sparse kernels (one is commented out), because I found that cuSPARSE has very poor performance, at least in the way I used it
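As a rough illustration of the CSR layout a SparseBlob would store (the function and variable names here are illustrative, not the PR's actual API), a minimal CPU sparse-matrix-times-dense-vector product might look like:

```cpp
#include <vector>

// Sketch of a CSR sparse-dense product, assuming the usual CSR layout:
// `values` holds the nonzeros row by row, `col_indices` their column
// positions, and `row_ptr` (size rows + 1) the start of each row in
// `values`. This mirrors the kind of CPU sparse math the PR adds, but
// is not the PR's actual code.
std::vector<float> csr_matvec(int rows,
                              const std::vector<float>& values,
                              const std::vector<int>& col_indices,
                              const std::vector<int>& row_ptr,
                              const std::vector<float>& x) {
  std::vector<float> y(rows, 0.0f);
  for (int r = 0; r < rows; ++r) {
    // Accumulate only the nonzero entries of row r.
    for (int i = row_ptr[r]; i < row_ptr[r + 1]; ++i) {
      y[r] += values[i] * x[col_indices[i]];
    }
  }
  return y;
}
```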

If this is of interest, I also have locally a data layer for sparse data that uses a block of memory, plus an extension to pycaffe that makes it possible to work with sparse matrices from Python.

@Yangqing
Member

It's a pretty big PR, so I haven't looked through it yet, but one thing on my mind is that we probably want to separate the InnerProductLayer change out into a SparseInnerProductLayer. That would hopefully make things a bit clearer than putting everything in one chunk.

@alemagnani
Author

I initially thought of separating them, but I then realized that this way I was able to pass in both dense and sparse data. Moreover, the variables of the inner product are exactly the same whether the input is sparse or not, so conceptually it seemed cleaner to keep one layer. In any case the code is separated into different files as if there were different layers, and the change to the original InnerProductLayer is minimal.
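The argument above is that one layer can serve both input kinds by dispatching on the runtime blob type. A hypothetical sketch of that idea (Blob and SparseBlob are stand-ins mirroring the PR's names, not Caffe's actual classes):

```cpp
#include <string>

// Stand-in class hierarchy: a dense Blob base and a SparseBlob subclass,
// mirroring the PR's naming. Not Caffe's real API.
struct Blob { virtual ~Blob() {} };
struct SparseBlob : Blob {};

// A single forward path that picks dense or sparse math by inspecting
// the runtime type of the bottom blob.
std::string forward_path(const Blob& bottom) {
  if (dynamic_cast<const SparseBlob*>(&bottom) != nullptr) {
    return "sparse";  // would invoke the CSR sparse math routines
  }
  return "dense";     // would invoke the standard dense gemm
}
```

The weights of the layer are identical in both cases; only the multiply routine changes, which is why a single layer (rather than two) can be argued for.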

@alemagnani alemagnani force-pushed the sparse_data_pull_request branch from c82932c to 8b827d9 Compare August 24, 2014 07:31
@shelhamer shelhamer force-pushed the dev branch 3 times, most recently from 4278286 to c01f07a Compare August 28, 2014 07:00
@alemagnani alemagnani force-pushed the sparse_data_pull_request branch from 1afefce to 2a2a5de Compare September 17, 2014 04:58
@alemagnani
Author

I split the InnerProductLayer as you suggested. There is now a subclass of InnerProductLayer that supports sparse input. Let me know if you need any clarification on the code.

@alemagnani
Author

I rebased it onto the latest dev branch.

@shelhamer
Member

Thanks for the rebase! We will review this after the CVPR deadline.

@alemagnani alemagnani force-pushed the sparse_data_pull_request branch from de5308c to da35dfa Compare November 12, 2014 23:17
@jeffdonahue jeffdonahue mentioned this pull request Feb 15, 2015
@jeffdonahue jeffdonahue mentioned this pull request Mar 4, 2015
@shelhamer shelhamer added the JD label Mar 7, 2015
@alemagnani
Author

I have a rebase ready on top of the current master. Should I create a new PR against master?

@shelhamer
Member

That would be great. Thanks @alemagnani!

@jeffdonahue
Contributor

I'll close this since it has been replaced by #2364. Thanks for updating, @alemagnani!
