Performance suboptimal for small matrices #29
Comments
Hi Michael,
To answer the question: it's float32, if I'm not mistaken. Thanks for having a look. I've put everything into a gist, see https://gist.github.com/xor2k/b2b7d1d5e87bfe8a8a30c2d0c7e12f9e. The code is not especially polished, but the approach is very generic, and I think something like this could be interesting to have in OpenBLAS or NumPy for future benchmarking, especially the automated switching between environments, the integrated plotting, and the multidimensional parameter space that exhausts all meaningful options. Feel free to ask if something does not work out of the box. The full run takes a very long time, so for a quick test it makes sense to (drastically) reduce the size of the search space in test.py. A C/C++ version that benchmarks BLAS directly would also be conceivable.
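For readers who want a feel for what reducing the search space means here, the following is a minimal sketch of a multidimensional parameter grid and a trimmed-down version of it. The axis names, the values (apart from n=100, which is discussed in this thread), and the helpers `reduce_space` and `iter_configs` are assumptions made for illustration; the actual test.py in the gist may be organised quite differently.

```python
from itertools import product

# Hypothetical axes: names and values are illustrative, not taken from test.py.
FULL_SPACE = {
    "n": [2, 4, 8, 16, 32, 64, 100],
    "stride": [1, 2, 4],
    "dtype": ["float32", "float64"],
}


def reduce_space(space, keep=2):
    """Keep only the first `keep` values on every axis for a quick test run."""
    return {axis: values[:keep] for axis, values in space.items()}


def iter_configs(space):
    """Yield one dict per point of the Cartesian product of all axes."""
    axes = list(space)
    for combo in product(*(space[axis] for axis in axes)):
        yield dict(zip(axes, combo))
```

Iterating over `iter_configs(reduce_space(FULL_SPACE))` instead of `iter_configs(FULL_SPACE)` visits far fewer configurations, which is the kind of shortcut suggested above for a first smoke test.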
The version of AOCL was the 5.0 release. I used the Debian BLIS package and replaced the sources. I can also provide that package file if you are interested; only small modifications are necessary, and since AOCL is open source, it would make sense to integrate it into Debian-based distros.
Thanks Michael,
Can you also send mail to: ***@***.******@***.***>
Thanks,
Kiran V
Unfortunately, I cannot see your e-mail address in the message: GitHub replaced it with stars, and it does not support direct messages either. Can you write to me directly? You can find my current e-mail address on, e.g., https://doi.org/10.1016/j.cor.2013.04.002, where I'm the first author.
I just sent you a request to connect on LinkedIn.
Dear AOCL Team,
I'm currently working to improve NumPy's matmul for the strided case, and I ran a large grid search with different BLAS frameworks; see numpy/numpy#23752 (comment).
Here is a repost of the plots:
blas_benchmark_v2.pdf
The plots show the performance improvement of each BLAS framework, including the cost of copying, over naïve matrix multiplication.
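The quantity plotted can be illustrated with a minimal sketch, under several assumptions: the strided operands are produced by slicing a larger array, "copying" means `np.ascontiguousarray`, and a pure-Python triple loop stands in for the naïve baseline. The helper names `naive_matmul`, `copy_then_matmul` and `speedup` are hypothetical, and because the baseline here is interpreted Python, the absolute ratios are not comparable to those in the plots.

```python
import timeit

import numpy as np


def naive_matmul(a, b):
    """Textbook triple loop, used only as a stand-in non-BLAS baseline."""
    n, k = a.shape
    m = b.shape[1]
    out = np.zeros((n, m), dtype=a.dtype)
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i, p] * b[p, j]
            out[i, j] = acc
    return out


def copy_then_matmul(a, b):
    """Copy both operands to contiguous memory, then call np.matmul."""
    return np.matmul(np.ascontiguousarray(a), np.ascontiguousarray(b))


def speedup(n, stride, repeats=3, dtype=np.float32):
    """Return t_naive / t_copy_plus_matmul for n x n strided operands."""
    rng = np.random.default_rng(0)
    base_a = rng.random((n * stride, n * stride), dtype=dtype)
    base_b = rng.random((n * stride, n * stride), dtype=dtype)
    # Slicing with a step yields non-contiguous n x n views.
    a = base_a[::stride, ::stride]
    b = base_b[::stride, ::stride]
    t_naive = min(timeit.repeat(lambda: naive_matmul(a, b), number=1, repeat=repeats))
    t_copy = min(timeit.repeat(lambda: copy_then_matmul(a, b), number=1, repeat=repeats))
    return t_naive / t_copy
```

Evaluating `speedup(n, stride)` over a grid of `n` and `stride` values gives one number per grid point, which is the kind of data a chart like the ones above could be drawn from.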
AOCL is based on BLIS. It is clearly visible that for the case n=100, AOCL provides a substantial improvement over BLIS (see the purple shimmer). However, that is not the case for smaller matrices. Some countermeasures have been taken, leaving a triangular pattern in the performance chart.
I wonder whether performance for smaller matrices can be improved with the help of these plots. I can do more benchmarks and plots like these if you are interested, and I can also provide some code.
Best from Berlin, Michael