[SYCL] Add SYCL Backend registry, device and Event Interfaces #9705
Conversation
We cannot duplicate the async loading code in llama.cpp for each backend. In the coming days I will open a PR that will allow this code to work with any backend that implements the necessary ggml-backend interfaces (#9707).
Fair enough, @slaren, thanks for the hint. Will mark the current PR as a draft until then.
@OuadiElfarouki #9707 got merged 🎉
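For context on what implementing those interfaces can look like on the SYCL side, here is a minimal, hypothetical sketch of backend event hooks mapped onto SYCL events. The struct and function names below are illustrative assumptions, not the actual ggml-backend signatures; `ext_oneapi_submit_barrier` is a DPC++ extension rather than core SYCL.

```cpp
// Hypothetical sketch: mapping backend-style event hooks onto SYCL events.
// The wrapper struct and function names are invented for illustration and
// are NOT the ggml-backend interface; only the sycl:: calls are real APIs.
#include <sycl/sycl.hpp>

struct backend_event {            // illustrative event wrapper (assumption)
    sycl::event ev;
};

// "record": capture all work submitted to the queue so far.
// ext_oneapi_submit_barrier() is a DPC++ extension; a marker kernel could
// play the same role on other SYCL implementations (assumption).
static void event_record(sycl::queue & q, backend_event * e) {
    e->ev = q.ext_oneapi_submit_barrier();
}

// "synchronize": block the host until the recorded work has completed.
static void event_synchronize(backend_event * e) {
    e->ev.wait();
}

// "wait": make subsequent work on another queue depend on the event,
// without blocking the host thread.
static void event_wait(sycl::queue & q, backend_event * e) {
    q.ext_oneapi_submit_barrier({e->ev});
}
```

The split between a host-blocking `synchronize` and a device-side `wait` mirrors how stream/event models (e.g. CUDA) let one queue consume another queue's results without stalling the CPU.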
Force-pushed from 08e843a to b373dcd
Minor suggestions and (possibly) a dumb question regarding the use of events.
…rg#9705)
* implemented missing SYCL event APIs
* sycl : Added device and backend reg interfaces
* Restructured ggml-sycl.cpp (tests pass)
We add the Backend and Device registry introduced in #9707 for the SYCL backend. This re-enables `test-backend-ops` for SYCL, among others. This patch also implements most of the event APIs for the SYCL backend, fixes `set_tensor_async`, and enables async IO / H2D memory copies for model loading (similar to the CUDA backend implementation; a sketch of the idea follows below). Some improvement figures (load time):
* Nvidia A100 40GB + LLaMa 3.1 70B Q4 : 27.6s (master) -> 5.8s (patch)
* Intel Arc A770 + LLaMa 3.1 8B Q4 : 1.6s (master) -> 0.8s (patch)
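For illustration, here is a minimal sketch of the async H2D copy pattern behind these numbers, assuming only standard SYCL 2020 APIs. The surrounding structure (buffer names, `main`) is invented for the example and is not the PR's actual code.

```cpp
// Minimal sketch: non-blocking host-to-device copy with a sycl::event.
// Overlapping the copy with other host work (e.g. reading the next tensor
// from disk) is what shortens model load time versus a synchronous copy.
#include <sycl/sycl.hpp>
#include <vector>
#include <cstdio>

int main() {
    sycl::queue q{sycl::default_selector_v};

    const size_t n = 1 << 20;
    std::vector<float> host_data(n, 1.0f);          // host staging buffer
    float * dev = sycl::malloc_device<float>(n, q); // device destination

    // Enqueue the copy without blocking: memcpy returns a sycl::event
    // that can be waited on later, only when the data is actually needed.
    sycl::event e = q.memcpy(dev, host_data.data(), n * sizeof(float));

    // ... other host work (e.g. reading the next tensor) can proceed here ...

    e.wait(); // synchronize at the point of use
    std::printf("copy complete\n");

    sycl::free(dev, q);
    return 0;
}
```

Note that the host buffer must stay alive until the event completes, which is why an async loading path needs explicit event tracking rather than fire-and-forget copies.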
I have read the contributing guidelines
Self-reported review complexity: