NCCL hanging during inference #2770
https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/cudagraph.html
Looks like there is an issue between NCCL and CUDA graphs. I use
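For context on what the linked guide covers, here is a minimal sketch (not from this issue) of capturing an NCCL all-reduce inside a CUDA graph with PyTorch. It assumes a `torchrun --nproc_per_node=2` launch, NCCL >= 2.9.6, and that the environment-variable setting below is appropriate for your PyTorch version; treat it as an illustration of the pattern, not a repro of vLLM's internals.

```python
# Sketch: capture an NCCL all-reduce in a CUDA graph (assumes torchrun launch).
import os
import torch
import torch.distributed as dist

def main():
    # Assumption: older PyTorch versions may need the NCCL watchdog relaxed
    # for graph capture; harmless if your version does not require it.
    os.environ.setdefault("NCCL_ASYNC_ERROR_HANDLING", "0")

    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    x = torch.ones(1 << 20, device="cuda")

    # Warm up the collective on a side stream before capture, so the NCCL
    # communicator is created outside the captured region.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        dist.all_reduce(x)
    torch.cuda.current_stream().wait_stream(s)

    # Capture the all-reduce into a graph, then replay it.
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        dist.all_reduce(x)
    g.replay()

    torch.cuda.synchronize()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```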
More updates: the hang persists even after disabling CUDA graphs by setting
I reported the same issue in #2731. You need enforce_eager and disable_custom_all_reduce to work around it, but the real issue is how to fix CUDA graphs and custom all-reduce so they work together.
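For anyone hitting this in the meantime, a minimal sketch of the workaround described above, assuming the offline LLM API; the model name and tensor_parallel_size are placeholders, not values from this issue:

```python
# Workaround sketch: run eagerly (no CUDA graph capture) and fall back to
# NCCL all-reduce instead of the custom all-reduce kernel.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-13b-hf",   # placeholder; use your model
    tensor_parallel_size=2,              # multi-GPU setup where the hang appears
    enforce_eager=True,                  # skip CUDA graph capture
    disable_custom_all_reduce=True,      # use NCCL all-reduce instead
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
for out in outputs:
    print(out.outputs[0].text)
```

If you serve through the API server instead, the same engine arguments should be exposed as --enforce-eager and --disable-custom-all-reduce.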
Closed, as #2811 fixes this. Please feel free to re-open the issue if you find the bug persists.
@WoosukKwon I hit the same problem, but after upgrading to PyTorch 2.2.0 it was resolved.
Hi, do you use vLLM 0.2.7 or 0.3.0?
With vLLM v0.2.7, I saw NCCL hanging in all-reduce; after switching to v0.3.0 (with custom all-reduce), it hangs in gather instead.