[Bugfix] Fix CustomAllreduce pcie nvlink topology detection (#3974) #4159
Can you give some pointers to explain the behavior of |
Sure - Nvidia's documentation for nvmlDeviceGetNvLinkState reads:
This function queries the individual NVLink links/channels associated with the specified device. These are the 12 (PCIe) or 18 (SXM) physical lanes that connect that GPU with either its peer (PCIe) or NVSwitch (SXM), equivalent to "nvidia-smi nvlink -s -i [unit]". (Output of that command is attached as nvidia-smi-nvlink-s.txt as it's quite long; note that each card reports 18 possible connections, of which only 12 are active, with some differences in link activation among cards, which I suspect reflect die yield variation.) A "true" nvmlDeviceGetNvLinkState() response is equivalent to "link present" for Ethernet: it indicates that something is connected, but says nothing about the other device's identity. Docs for nvmlDeviceGetP2PStatus:
This method returns concrete information about connectivity between the two specified GPUs, which I believe matches the intent of _is_full_nvlink().
This should be a release blocker. cc @simon-mo
Also cc @hanzhi713 FYI
OK, what's the merge plan?
Nice catch!
I will merge it once the tests pass.
…3974) [Bugfix] Fix CustomAllreduce pcie nvlink topology detection (vllm-project#3974) (vllm-project#4159)
CustomAllreduce requires NVIDIA GPUs to be fully connected via NVLink, and attempts to disable itself when run on incompatible hardware (e.g. >2 PCIe GPUs where only specific pairs of cards are linked).
The current code calls nvmlDeviceGetNvLinkState(), but that method does not actually assess peer connectivity; instead, it queries the status of individual NVLink lanes on the current-rank GPU (equivalent to "nvidia-smi nvlink -s -i "). As a result, CustomAllreduce is intermittently and incorrectly enabled for >2 PCIe configurations, leading to hangs at model loading as discussed in #3974.

The new code calls nvmlDeviceGetP2PStatus() to determine topology.
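For readers who want to see the shape of a pairwise check, here is a minimal sketch. It is not the code in this PR. The helper names `make_nvml_p2p_check` and `all_pairs_nvlinked` are hypothetical, and the sketch assumes the `pynvml` bindings expose `nvmlDeviceGetP2PStatus`, `NVML_P2P_CAPS_INDEX_NVLINK` and `NVML_P2P_STATUS_OK`. The pairwise query is passed in as a callable so the topology logic can be exercised without GPUs.

```python
from itertools import combinations
from typing import Callable, Iterable


def make_nvml_p2p_check() -> Callable[[int, int], bool]:
    """Build a pairwise NVLink check backed by NVML (hypothetical helper).

    pynvml is imported lazily so the rest of this sketch runs without it.
    """
    import pynvml  # assumption: the pynvml bindings are installed

    pynvml.nvmlInit()

    def check(a: int, b: int) -> bool:
        handle_a = pynvml.nvmlDeviceGetHandleByIndex(a)
        handle_b = pynvml.nvmlDeviceGetHandleByIndex(b)
        try:
            status = pynvml.nvmlDeviceGetP2PStatus(
                handle_a, handle_b, pynvml.NVML_P2P_CAPS_INDEX_NVLINK
            )
        except pynvml.NVMLError:
            # Treat any NVML failure as "not connected" to stay on the safe side.
            return False
        return status == pynvml.NVML_P2P_STATUS_OK

    return check


def all_pairs_nvlinked(
    device_ids: Iterable[int], p2p_ok: Callable[[int, int], bool]
) -> bool:
    """Return True only if every distinct pair of devices passes p2p_ok."""
    return all(p2p_ok(a, b) for a, b in combinations(sorted(device_ids), 2))


if __name__ == "__main__":
    # Simulated 4-GPU PCIe box where only (0, 1) and (2, 3) share an NVLink bridge.
    bridged = {(0, 1), (2, 3)}

    def fake_p2p(a: int, b: int) -> bool:
        return (min(a, b), max(a, b)) in bridged

    print(all_pairs_nvlinked([0, 1], fake_p2p))
    print(all_pairs_nvlinked([0, 1, 2, 3], fake_p2p))
```

On real hardware, `all_pairs_nvlinked(range(n), make_nvml_p2p_check())` would run the same test against NVML. The simulated topology shows the failure mode described above: every card in the 4-GPU box has active links, yet the set as a whole is not fully connected.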
FIX #3974