Comparison with torch jit #14
Hi, thanks! I'm not sure how it compares, as I've never used torch jit. In general, I have very little experience with the Python frameworks out there. This is one of the reasons to implement …
Hi @ggerganov
@ggerganov thanks for the reply. I am astonished by your C implementation of these neural network building blocks. Like @skaiware said, the downside of jit could be that it comes with a big linking burden. I think your solution has great potential to make big models run faster on any device!
How did you make sure that the C implementation renders the same results as the PyTorch functions? Any tests to guard this?
The results are not the same - it's hard to make them exactly the same due to round-off errors when using floating point numbers. Instead, I just verified manually that the numbers I get after each layer are similar to the ones from the original Python implementation. It would be great to add some tests in the future to make sure the results match the reference implementation within some tolerance.
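Such a tolerance check could be sketched roughly like this (plain Python; `check_close` is a hypothetical helper, and the two output lists are made-up stand-ins for the per-layer numbers dumped from the C implementation and from the PyTorch reference):

```python
import math

def check_close(c_output, reference, rel_tol=1e-4, abs_tol=1e-6):
    """Return True if every element of the C implementation's output
    matches the reference implementation within the given tolerances."""
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(c_output, reference)
    )

# Toy stand-ins: in a real test these would be the numbers captured
# after each layer from the C code and from the Python implementation.
c_layer_out = [0.10000012, -2.4999988, 3.1415925]
ref_layer_out = [0.1, -2.5, 3.1415927]

assert check_close(c_layer_out, ref_layer_out)
```

The tolerances are arbitrary here; in practice they would be tuned per layer, since round-off error accumulates with depth.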
@skaiware, how would you convert this C++ code to use CUDA? Any ideas?
IMHO it does not make sense to port this code to CUDA. If you want to use CUDA or another GPU framework, you are better off using one of the well-established Python frameworks (PyTorch, TensorFlow, etc.).

And here also comes the Apple Silicon hardware, which gives you the AMX matrix coprocessor, offering a great performance boost (at least according to my experiments). It is so easy to integrate it in your project (simply add …).

My understanding is that the Python frameworks at the moment do not fully support the Accelerate framework (I might be wrong), but soon they will, and probably you won't see much of a performance improvement when using …
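For reference, linking against Accelerate on macOS is typically a single compiler flag (a sketch only; the source file names here are hypothetical and do not come from this thread):

```shell
# macOS only: link Apple's Accelerate framework so BLAS routines
# (e.g. cblas_sgemm) dispatch to the optimized AMX-backed kernels.
cc -O3 main.c ggml.c -framework Accelerate -o main
```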
@ggerganov Sorry, I should have been clearer. I meant C++ with PyTorch's CUDA-enabled drivers. I thought a lot of the performance issues were with Python, but it wasn't as simple as that after looking into it with a profiler called scalene, which looks at Python code, C/C++ library code, system calls and various GPU metrics. We're now converting Whisper to use TorchScript with TensorRT and some other CUDA optimizations like pinning memory, etc. But again, this is all for CUDA-enabled devices. We think we're going to get about a 3x-5x speedup, but until it's completed this is just a theory with quick back-of-the-napkin calculations. In any case, thank you so much for this C++ version; it was quite an inspiration and great work. I might come back to it if I feel we can squeeze out more performance gains by remaining entirely native but with the PyTorch libs and the other tricks mentioned above.
Great work! I find the implementation of ggml especially interesting. It looks like you implement all the basic neural network building blocks with ggml. How do you compare it with the torch jit approach of using a PyTorch model in C++?