RFC: Add MKL Sparse BLAS module #4525
Conversation
Thanks. Let's hold merging this until master opens for 0.3. |
Agree. Let's wait. Just to quantify this: with @stevengj's
or about four times faster, and MKL can exploit the symmetry, which makes it five times faster,
so for the iterative methods this could make a difference. |
This is awesome! |
This is great, but what do we do when MKL is not available? |
I thought we had a plan to ship MKL with Julia? (#4272). When the user chooses not to use MKL, then I think it should fall back to the old sparse module. |
I am not sure how this should be done in practice. When/if you get a federal budget, we could also look at the NIST Sparse BLAS, which, I think, the MKL Sparse BLAS is built upon. |
The fastest thing one can do is use OSKI / pOSKI, but it is not supported anymore and is not up to date with new architectures. MKL is probably the best thing to start with. I am reasonably certain that if we do some blocking for better cache reuse and exploitation of matrix structure (small m×n dense blocks + SIMD instructions), we can get close to MKL or perhaps even better. It would actually be pretty amazing if we could write such high-performance kernels in Julia - and sparse matvec is an easy case to start experimenting with. |
Quick overview of OSKI techniques - http://bebop.cs.berkeley.edu/oski/oski-cse2005-rev4.pdf |
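The sparse matvec experiment suggested above is easy to sketch in pure Julia. This is only an illustrative baseline (the function name and argument layout are made up here, not from the PR); OSKI-style blocking and SIMD would build on top of a loop like this.

```julia
# Minimal sketch: y = A*x for a matrix stored in CSC form
# (colptr/rowval/nzval, one-based indices), the simple kernel that
# blocking and SIMD experiments would try to beat.
function cscmv!(y::Vector{Float64}, colptr::Vector{Int}, rowval::Vector{Int},
                nzval::Vector{Float64}, x::Vector{Float64})
    fill!(y, 0.0)
    @inbounds for j in eachindex(x)              # loop over columns of A
        xj = x[j]
        for k in colptr[j]:(colptr[j+1] - 1)     # nonzeros stored in column j
            y[rowval[k]] += nzval[k] * xj
        end
    end
    return y
end
```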
Just thought I'd add a vote here for an MKL alternative. Quite a lot of linux users (including very big cluster installations - which may end up as the biggest consumers of Julia per capita) will be building from source without MKL, so a good, feature complete and performant alternative to MKL really is necessary. I think it'd be a shame for major Julia functionality to depend on commercial libraries. |
@jtravs I fully sympathize. MKL also makes it difficult for those with AMD and non-x86 architectures. We will certainly not depend on it, but where available, we should certainly use it. I am quite hopeful that either NIST or a pure julia implementation will help us get the desired performance. |
The NIST sites are back online. It seems that only the C++ version is maintained, and the older C and much older Fortran versions are only in double precision, so I think NIST is a dead end. |
@ViralBShah What to do with this? |
We have two options:
Currently, I prefer the second option, because it will then work with standard binaries. I wonder if we could provide the MKL BLAS and LAPACK in the same way too - through a package.
@ViralBShah if we end up with an MKL package, we'd need some highly performant way of switching BLAS backends. If we can do that entirely in Julia-code, (perhaps by supplanting method definitions in Base? I have no idea how this would work) this would provide a much neater alternative to what is proposed in JuliaLang/LinearAlgebra.jl#27 |
We definitely need an MKL package for functionality like in this PR, which is MKL specific, and will not interfere with anything else. @JeffBezanson Is it possible to overwrite the blas and lapack wrappers in Base, by importing new definitions from a package? All we need is for |
You don't want to overwrite the wrappers, since the exact same wrappers are needed. |
I was hoping to avoid the extra cost of loading an address from memory. |
That's theoretically possible, but there are two basic approaches: either make the function addresses later-bound, or recompile everything. You can already do the recompile-everything approach by rebuilding the system image, and while we'd hope it would be faster to recompile a bit less stuff, there is still a cost. Actually, we might change all |
I figured that since switching BLAS implementations wouldn't be something users would do that often, it might be acceptable to have a >1 second delay when doing it. My intuition about performance is very often incorrect though, so I'm totally okay with having some kind of global variable or something that is pulled from every time a BLAS call is made if it turns out that's still highly performant. |
I prefer the later binding approach for now, and I doubt the performance penalty will be measurable. There's only one way to find out. |
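The later-binding idea can be sketched in a few lines. Everything below is a hypothetical stand-in (the real wrappers would route a `dlsym`'d C function pointer, not a Julia function), but it shows the shape of the trade-off: one extra load per call in exchange for runtime switching.

```julia
# Stand-ins for the two BLAS-backed dot-product wrappers; in reality
# these would be ccall wrappers into libopenblas / libmkl_rt.
openblas_dot(x, y) = sum(x .* y)
mkl_dot(x, y)      = sum(x .* y)

# Later-bound dispatch: the wrapper reads a global Ref on every call,
# which is the per-call memory load being debated above.
const dot_impl = Ref{Function}(openblas_dot)

blas_dot(x, y) = dot_impl[](x, y)

# Switching BLAS implementations at runtime is a single assignment:
dot_impl[] = mkl_dot
```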
I think it is best to move this to an |
I agree. Commercial-only code shouldn't be in base. |
I'll close this one. After the bounds checking was turned off, the time gain was also not that big. |
To be clear, bindings to proprietary libraries are not themselves proprietary. This has no effect on the license of our code, although I think what @mlubin meant was that code that's specifically for something that is proprietary, like MKL, should not be in Base Julia. |
Right, that's what I meant. |
Wraps the Sparse BLAS subroutines for CSC storage format.
[Viral: Marking as RFC so that it does not get accidentally merged until 0.3]
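For context, a wrapper of this kind might look roughly as follows. This is only a sketch, not the PR's actual code: it binds MKL's `mkl_dcscmv` routine (y := alpha*A*x + beta*y for CSC storage), and the library name `libmkl_rt`, 32-bit `MKL_INT`, the `"GXXF"` matrix descriptor (general matrix, one-based indexing), and the function name `cscmv!` are all assumptions for illustration. Running it requires MKL on the library path.

```julia
# Hypothetical sketch of an MKL CSC matvec binding: y := α*A*x + β*y.
# SparseMatrixCSC lived in Base at the time of this PR; its colptr,
# rowval, and nzval fields map directly onto the pntrb/pntre, indx,
# and val arguments of the NIST-style sparse BLAS interface.
function cscmv!(transa::Char, α::Float64, A::SparseMatrixCSC{Float64,Int32},
                x::Vector{Float64}, β::Float64, y::Vector{Float64})
    m, k = size(A)
    ccall((:mkl_dcscmv, "libmkl_rt"), Cvoid,
          (Ptr{UInt8}, Ref{Int32}, Ref{Int32}, Ref{Float64}, Ptr{UInt8},
           Ptr{Float64}, Ptr{Int32}, Ptr{Int32}, Ptr{Int32},
           Ptr{Float64}, Ref{Float64}, Ptr{Float64}),
          string(transa), Int32(m), Int32(k), α, "GXXF",
          A.nzval, A.rowval, A.colptr, pointer(A.colptr, 2),
          x, β, y)
    return y
end
```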