
RFC: Add MKL Sparse BLAS module #4525

Closed
wants to merge 1 commit into from

Conversation

andreasnoack
Member

Wraps the Sparse BLAS subroutines for the CSC storage format.

[Viral: Marking as RFC so that it does not get accidentally merged until 0.3]

@ViralBShah
Member

Thanks. Let's hold merging this until master opens for 0.3.

@andreasnoack
Member Author

Agree. Let's wait.

Just to quantify this: with @stevengj's Ai matrix, the difference is

julia> min([@elapsed Ai*b for i = 1:10])
0.005347529

julia> min([@elapsed Base.LinAlg.SparseBLAS.cscmv!('N',1.0,"GXXFXX",Ai,b,0.0,zeros(length(b))) for i = 1:10])
0.001363916

or about four times faster. MKL can also exploit the symmetry, which makes it five times faster:

julia> min([@elapsed Base.LinAlg.SparseBLAS.cscmv!('N',1.0,"SLNFXX",Ai,b,0.0,zeros(length(b))) for i = 1:10])
0.001052231

So for iterative methods this could make a difference.
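
(For reference: the string argument is MKL's six-character matrix descriptor, matdescra. Per the MKL documentation, position 1 is the structure (G = general, S = symmetric), position 2 the stored triangle (L = lower), position 3 the diagonal type (N = non-unit), position 4 the index base (F = one-based/Fortran), and the last two characters are unused. Schematically:

Base.LinAlg.SparseBLAS.cscmv!('N', 1.0, "GXXFXX", A, x, 0.0, y)  # y = A*x, A treated as a general matrix
Base.LinAlg.SparseBLAS.cscmv!('N', 1.0, "SLNFXX", A, x, 0.0, y)  # y = A*x, reading only the lower triangle of symmetric A

)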

@lindahua
Contributor

This is awesome!

@JeffBezanson
Member

This is great, but what do we do when MKL is not available?

@lindahua
Contributor

I thought we had a plan to ship MKL with Julia? (#4272).

When the user chooses not to use MKL, I think it should fall back to the old sparse module.
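
A minimal sketch of what such a fallback could look like (the use_mkl flag is hypothetical; cscmv! is the wrapper from this PR):

function spmv(A::SparseMatrixCSC{Float64,Int}, x::Vector{Float64})
    if use_mkl  # hypothetical global flag set at build or load time
        y = zeros(size(A, 1))
        return Base.LinAlg.SparseBLAS.cscmv!('N', 1.0, "GXXFXX", A, x, 0.0, y)
    else
        return A*x  # the existing pure-Julia sparse matvec
    end
end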

@andreasnoack
Member Author

I am not sure how this should be done in practice. When/if you get a federal budget, we could also look at the NIST Sparse BLAS, which, I think, the MKL Sparse BLAS is built upon.

@ViralBShah
Member

The fastest thing one can do is use OSKI / pOSKI, but it is not supported anymore and is not up to date with new architectures. MKL is probably the best thing to start with. I am reasonably certain that if we do some blocking for better cache reuse and exploit the matrix structure (small m×n dense blocks + SIMD instructions), we can get close to MKL, or perhaps even beat it.

It would actually be pretty amazing if we could write such high-performance kernels in Julia - and sparse matvec is an easy case to start experimenting with.
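
For reference, the unblocked CSC matvec kernel is only a few lines of Julia (a sketch using the internal colptr/rowval/nzval fields of SparseMatrixCSC; the blocking and SIMD tricks above would build on this):

function csc_matvec!(y, A::SparseMatrixCSC, x)
    fill!(y, zero(eltype(y)))
    for j = 1:A.n                                # loop over columns
        xj = x[j]
        for k = A.colptr[j]:(A.colptr[j+1] - 1)  # nonzeros stored in column j
            y[A.rowval[k]] += A.nzval[k]*xj      # scatter into the output
        end
    end
    return y
end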

@ViralBShah
Member

Quick overview of OSKI techniques - http://bebop.cs.berkeley.edu/oski/oski-cse2005-rev4.pdf

@jtravs
Contributor

jtravs commented Oct 15, 2013

Just thought I'd add a vote here for an MKL alternative. Quite a lot of Linux users (including very big cluster installations, which may end up as the biggest consumers of Julia per capita) will be building from source without MKL, so a good, feature-complete, and performant alternative to MKL really is necessary. I think it'd be a shame for major Julia functionality to depend on commercial libraries.

@ViralBShah
Member

@jtravs I fully sympathize. MKL also makes it difficult for those on AMD and non-x86 architectures. We will certainly not depend on it, but where it is available, we should certainly use it. I am quite hopeful that either NIST or a pure Julia implementation will help us get the desired performance.

@andreasnoack
Member Author

The NIST sites are back online. It seems that only the C++ version is maintained, and the older C and much older Fortran versions support only double precision, so I think NIST is a dead end.

@andreasnoack
Member Author

@ViralBShah What to do with this?

@ViralBShah
Member

We have two options:

  1. Include this in sysimg.jl only when USE_MKL is true
  2. Make this into an MKL package, which can then have other things from MKL as well.

I currently prefer the second option, because it will work with the standard binaries. I wonder if we could provide the MKL BLAS and LAPACK in the same way too - through a package.
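
(Option 1 would essentially amount to a guarded include in sysimg.jl, something like the following sketch; the flag and file names here are hypothetical:

USE_MKL && include("linalg/sparseblas.jl")  # only wrap the MKL Sparse BLAS when building against MKL

)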

@staticfloat
Member

@ViralBShah if we end up with an MKL package, we'd need some highly performant way of switching BLAS backends. If we can do that entirely in Julia code (perhaps by supplanting method definitions in Base? I have no idea how this would work), this would provide a much neater alternative to what is proposed in JuliaLang/LinearAlgebra.jl#27.

@ViralBShah
Member

We definitely need an MKL package for functionality like that in this PR, which is MKL-specific and will not interfere with anything else.

@JeffBezanson Is it possible to overwrite the BLAS and LAPACK wrappers in Base by importing new definitions from a package? All we need is for ccall to dlopen a different library for BLAS and LAPACK.

@JeffBezanson
Member

You don't want to overwrite the wrappers, since the exact same wrappers are needed.
One way to do it is to have a global variable for each function, and a switch_blas function that calls dlsym to assign each variable from the requested library.
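
A minimal sketch of that approach (the names here are hypothetical; dlopen and dlsym are already available in Base):

const dgemm_ptr = [C_NULL]              # one global slot per wrapped routine

function switch_blas(libname)
    lib = dlopen(libname)
    dgemm_ptr[1] = dlsym(lib, :dgemm_)  # look up each BLAS/LAPACK symbol in the requested library
end

Each wrapper would then ccall through the stored pointer instead of a fixed (symbol, library) pair.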

@staticfloat
Member

I was hoping to avoid the extra cost of loading an address from memory every time we call a BLAS function. Perhaps if we solved the recompilation-of-dependent-functions issue, we could create a get_blas_lib() function that we overwrite, then use that function in all our BLAS code. When we want to switch BLAS implementations, we change a switch inside that function (or overwrite it) and somehow trigger recompilation of all the BLAS functions that have already been compiled.
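
In other words, something like this hypothetical sketch:

get_blas_lib() = "libopenblas"   # every BLAS ccall would ask get_blas_lib() for the library name
# switching would redefine it...
get_blas_lib() = "libmkl_rt"
# ...and then somehow trigger recompilation of every already-compiled caller (the unsolved part).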

@JeffBezanson
Member

That's theoretically possible, but note that there are two basic approaches: either make the function addresses later-bound, or recompile everything. You can already do the recompile-everything approach by rebuilding the system image, and while we'd hope it would be faster to recompile a bit less stuff, there is still a cost.

Actually, we might change all ccalls to use the later-bound approach, allowing more code to be compiled in advance while still finding libraries at run time. PyCall is the typical example of a package that needs flexibility in finding libraries.

@staticfloat
Member

I figured that since switching BLAS implementations wouldn't be something users do often, it might be acceptable to have a >1 second delay when doing it. My intuition about performance is very often incorrect though, so I'm totally okay with having some kind of global variable that is read every time a BLAS call is made, if it turns out that's still highly performant.

@ViralBShah
Member

I prefer the later-binding approach for now, and I doubt the performance penalty will be measurable. There's only one way to find out.

@ViralBShah
Member

I think it is best to move this to an MKL.jl package.

@mlubin
Member

mlubin commented Dec 25, 2013

I agree. Commercial-only code shouldn't be in base.

@andreasnoack
Member Author

I'll close this one. After bounds checking was turned off, the time gain is also not that big.

@StefanKarpinski
Member

To be clear, bindings to proprietary libraries are not themselves proprietary. This has no effect on the license of our code, although I think what @mlubin meant was that code that's specifically for something proprietary, like MKL, should not be in Base Julia.

@mlubin
Member

mlubin commented Dec 25, 2013

Right, that's what I meant.

@andreasnoack andreasnoack deleted the anj/sparsemkl branch January 7, 2014 13:04