
RFC: Add MKL Sparse BLAS module #4525

Closed
wants to merge 1 commit into from

Conversation

andreasnoack
Member

Wraps the Sparse BLAS subroutines for the CSC storage format.

[Viral: Marking as RFC so that it does not get accidentally merged until 0.3]

@ViralBShah
Member

Thanks. Let's hold merging this until master opens for 0.3.

@andreasnoack
Member Author

Agree. Let's wait.

Just to quantify this: with @stevengj's Ai matrix, the difference is

julia> min([@elapsed Ai*b for i = 1:10])
0.005347529

julia> min([@elapsed Base.LinAlg.SparseBLAS.cscmv!('N',1.0,"GXXFXX",Ai,b,0.0,zeros(length(b))) for i = 1:10])
0.001363916

or about four times faster. MKL can also exploit the symmetry, which makes it five times faster:

julia> min([@elapsed Base.LinAlg.SparseBLAS.cscmv!('N',1.0,"SLNFXX",Ai,b,0.0,zeros(length(b))) for i = 1:10])
0.001052231

So for iterative methods this could make a difference.
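
(For reference: the string argument is MKL's six-character matrix descriptor, matdescra. Per the MKL documentation, position 1 is the structure (G = general, S = symmetric), position 2 the stored triangle (L = lower), position 3 the diagonal type (N = non-unit), position 4 the index base (F = one-based/Fortran), and the last two characters are unused. Schematically:

Base.LinAlg.SparseBLAS.cscmv!('N', 1.0, "GXXFXX", A, x, 0.0, y)  # y = A*x, A treated as a general matrix
Base.LinAlg.SparseBLAS.cscmv!('N', 1.0, "SLNFXX", A, x, 0.0, y)  # y = A*x, reading only the lower triangle of symmetric A

)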

@lindahua
Contributor

This is awesome!

@JeffBezanson
Member

This is great, but what do we do when MKL is not available?

@lindahua
Contributor

I thought we had a plan to ship MKL with Julia? (#4272).

When the user chooses not to use MKL, I think it should fall back to the old sparse module.
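
A minimal sketch of what such a fallback could look like (the use_mkl flag is hypothetical; cscmv! is the wrapper from this PR):

function spmv(A::SparseMatrixCSC{Float64,Int}, x::Vector{Float64})
    if use_mkl  # hypothetical global flag set at build or load time
        y = zeros(size(A, 1))
        return Base.LinAlg.SparseBLAS.cscmv!('N', 1.0, "GXXFXX", A, x, 0.0, y)
    else
        return A*x  # the existing pure-Julia sparse matvec
    end
end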

@andreasnoack
Member Author

I am not sure how this should be done in practice. When/if you get a federal budget, we could also look at the NIST Sparse BLAS, which, I think, the MKL Sparse BLAS is built upon.

@ViralBShah
Member

The fastest thing one can do is use OSKI / pOSKI, but it is not supported anymore and is not up to date with new architectures. MKL is probably the best thing to start with. I am reasonably certain that if we do some blocking for better cache reuse and exploit the matrix structure (small m×n dense blocks + SIMD instructions), we can get close to MKL, or perhaps even beat it.

It would actually be pretty amazing if we could write such high-performance kernels in Julia - and sparse matvec is an easy case to start experimenting with.
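
For reference, the unblocked CSC matvec kernel is only a few lines of Julia (a sketch using the internal colptr/rowval/nzval fields of SparseMatrixCSC; the blocking and SIMD tricks above would build on this):

function csc_matvec!(y, A::SparseMatrixCSC, x)
    fill!(y, zero(eltype(y)))
    for j = 1:A.n                                # loop over columns
        xj = x[j]
        for k = A.colptr[j]:(A.colptr[j+1] - 1)  # nonzeros stored in column j
            y[A.rowval[k]] += A.nzval[k]*xj      # scatter into the output
        end
    end
    return y
end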

@ViralBShah
Member

Quick overview of OSKI techniques - http://bebop.cs.berkeley.edu/oski/oski-cse2005-rev4.pdf

@jtravs
Contributor

jtravs commented Oct 15, 2013

Just thought I'd add a vote here for an MKL alternative. Quite a lot of Linux users (including very big cluster installations, which may end up as the biggest consumers of Julia per capita) will be building from source without MKL, so a good, feature-complete, and performant alternative to MKL really is necessary. I think it'd be a shame for major Julia functionality to depend on commercial libraries.

@ViralBShah
Member

@jtravs I fully sympathize. MKL also makes it difficult for those on AMD and non-x86 architectures. We will certainly not depend on it, but where it is available, we should certainly use it. I am quite hopeful that either NIST or a pure Julia implementation will help us get the desired performance.

@andreasnoack
Member Author

The NIST sites are back online. It seems that only the C++ version is maintained, and the older C and much older Fortran versions support only double precision, so I think NIST is a dead end.

@andreasnoack
Member Author

@ViralBShah What to do with this?

@ViralBShah
Member

We have two options:

  1. Include this in sysimg.jl only when USE_MKL is true
  2. Make this into an MKL package, which can then have other things from MKL as well.

I currently prefer the second option, because it will work with the standard binaries. I wonder if we could provide the MKL BLAS and LAPACK in the same way too - through a package.
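
(Option 1 would essentially amount to a guarded include in sysimg.jl, something like the following sketch; the flag and file names here are hypothetical:

USE_MKL && include("linalg/sparseblas.jl")  # only wrap the MKL Sparse BLAS when building against MKL

)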

@staticfloat
Member

@ViralBShah if we end up with an MKL package, we'd need some highly performant way of switching BLAS backends. If we can do that entirely in Julia code (perhaps by supplanting method definitions in Base? I have no idea how this would work), this would provide a much neater alternative to what is proposed in JuliaLang/LinearAlgebra.jl#27.

@ViralBShah
Member

We definitely need an MKL package for functionality like that in this PR, which is MKL-specific and will not interfere with anything else.

@JeffBezanson Is it possible to overwrite the BLAS and LAPACK wrappers in Base by importing new definitions from a package? All we need is for ccall to dlopen a different library for BLAS and LAPACK.

@JeffBezanson
Member

You don't want to overwrite the wrappers, since the exact same wrappers are needed.
One way to do it is to have a global variable for each function, and a switch_blas function that calls dlsym to assign each variable from the requested library.
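
A minimal sketch of that approach (the names here are hypothetical; dlopen and dlsym are already available in Base):

const dgemm_ptr = [C_NULL]              # one global slot per wrapped routine

function switch_blas(libname)
    lib = dlopen(libname)
    dgemm_ptr[1] = dlsym(lib, :dgemm_)  # look up each BLAS/LAPACK symbol in the requested library
end

Each wrapper would then ccall through the stored pointer instead of a fixed (symbol, library) pair.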

@staticfloat
Member

I was hoping to avoid the extra cost of loading an address from memory every time we call a BLAS function. Perhaps if we solved the recompilation-of-dependent-functions issue, we could create a get_blas_lib() function that we overwrite, then use that function in all our BLAS code. When we want to switch BLAS implementations, we change a switch inside that function (or overwrite it) and somehow trigger recompilation of all the BLAS functions that have already been compiled.
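
In other words, something like this hypothetical sketch:

get_blas_lib() = "libopenblas"   # every BLAS ccall would ask get_blas_lib() for the library name
# switching would redefine it...
get_blas_lib() = "libmkl_rt"
# ...and then somehow trigger recompilation of every already-compiled caller (the unsolved part).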

@JeffBezanson
Member

That's theoretically possible, but note that there are two basic approaches: either make the function addresses later-bound, or recompile everything. You can already do the recompile-everything approach by rebuilding the system image, and while we'd hope it would be faster to recompile a bit less stuff, there is still a cost.

Actually, we might change all ccalls to use the later-bound approach, allowing more code to be compiled in advance while still finding libraries at run time. PyCall is the typical example of a package that needs flexibility in finding libraries.

@staticfloat
Member

I figured that since switching BLAS implementations wouldn't be something users do often, it might be acceptable to have a >1 second delay when doing it. My intuition about performance is very often incorrect though, so I'm totally okay with having some kind of global variable that is read every time a BLAS call is made, if it turns out that's still highly performant.

@ViralBShah
Member

I prefer the later-binding approach for now, and I doubt the performance penalty will be measurable. There's only one way to find out.

@ViralBShah
Member

I think it is best to move this to an MKL.jl package.

@mlubin
Member

mlubin commented Dec 25, 2013

I agree. Commercial-only code shouldn't be in base.

@andreasnoack
Member Author

I'll close this one. After bounds checking was turned off, the time gain is also not that big.

@StefanKarpinski
Member

To be clear, bindings to proprietary libraries are not themselves proprietary. This has no effect on the license of our code, although I think what @mlubin meant was that code that's specifically for something proprietary, like MKL, should not be in Base Julia.

@mlubin
Member

mlubin commented Dec 25, 2013

Right, that's what I meant.

@andreasnoack andreasnoack deleted the anj/sparsemkl branch January 7, 2014 13:04