Endless inferencing with cpu on DeepSeek-R1-Distill-Qwen-1.5B #1134
Comments
@basncy can you please try what is detailed here:
Hi EricLBuehler, I'm not sure if this is related to PagedAttention.
Here is part of the strace log captured during the inference loop:
Hi @basncy! Could you please remove the
I don't think it has to do with PagedAttention, unless the example is unchanged, and you are compiling with the
I think implementing PagedAttention for the CPU would be a relatively low priority for now, as deployment use cases would most likely target a GPU, for which we support both CUDA and Metal. A SYCL backend might be interesting, but I think implementing WGPU support would have a broader impact and make Vulkan, OpenGL, and other backends available.
Same result when debugging with the default option from "Debug example 'deepseekr1'".
Perhaps, but there are still many challenges, as it focuses on rendering at the moment.
To run the demo on CPU, I replaced deepseek-ai/DeepSeek-R1 with deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B in examples/deepseekr1/main.rs. With this change, the example application enters an endless loop after the dummy run completes.