[hardware][cuda] use device id under CUDA_VISIBLE_DEVICES for get_device_capability #6216
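Below is a minimal sketch of the idea in the title. It is not vLLM's code. The logical index a process sees (for example `cuda:0`) is translated through `CUDA_VISIBLE_DEVICES` to a physical index before querying NVML, which enumerates every GPU regardless of that variable. The names `physical_device_id` and `nvml_device_capability` are hypothetical. The sketch also assumes two things: `CUDA_VISIBLE_DEVICES` holds integer indices (not GPU UUIDs), and `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set, so CUDA and NVML number the devices the same way.

```python
import os
from typing import Mapping, Optional, Tuple


def physical_device_id(logical_id: int,
                       env: Optional[Mapping[str, str]] = None) -> int:
    """Map a process-local CUDA index to a physical GPU index.

    Hypothetical helper: assumes CUDA_VISIBLE_DEVICES contains
    comma-separated integer indices (GPU UUIDs are not handled).
    """
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # Every GPU is visible, so logical and physical ids coincide.
        return logical_id
    ids = [s.strip() for s in visible.split(",") if s.strip()]
    # Raises IndexError if the logical id is not visible to this process.
    return int(ids[logical_id])


def nvml_device_capability(logical_id: int = 0) -> Tuple[int, int]:
    """Query (major, minor) through NVML without creating a CUDA context.

    Hypothetical helper; requires the nvidia-ml-py package and a GPU.
    """
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(
            physical_device_id(logical_id))
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        return major, minor
    finally:
        pynvml.nvmlShutdown()
```

For example, with `CUDA_VISIBLE_DEVICES=2,3` the process's device 0 is physical GPU 2, so `physical_device_id(0, {"CUDA_VISIBLE_DEVICES": "2,3"})` returns `2`.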

youkaichao · 2024-07-08T16:47:55Z

mgoin

This looks reasonable, thanks for quick fix!

comaniac · 2024-07-08T17:26:04Z

Should we replace all usages of torch.cuda.get_device_capability() with this? For example, the attention selector and ROCm attention still use the native torch one, which may result in the same issue.

youkaichao · 2024-07-08T17:31:57Z

> Should we replace all usages of torch.cuda.get_device_capability() with this? For example attention selector and ROCm attention now still use the native torch one, which may result in the same issue.

I replaced most of them in #6080. The rest are mostly used under is_hip, and they still use torch.cuda.get_device_capability(). On ROCm, this function does not initialize the device.
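The split described in this reply can be sketched as a small dispatcher. On CUDA it uses a query that avoids early CUDA initialization. On ROCm it keeps the native torch call, which, per this comment, does not initialize the device there. `device_capability` and its injected query callables are hypothetical stand-ins, so the sketch runs without a GPU. In practice `rocm_query` would wrap `torch.cuda.get_device_capability`, and `nvml_query` would wrap an NVML lookup on the physical device index.

```python
from typing import Callable, Tuple

Capability = Tuple[int, int]


def device_capability(logical_id: int,
                      is_hip: bool,
                      rocm_query: Callable[[int], Capability],
                      nvml_query: Callable[[int], Capability]) -> Capability:
    """Hypothetical dispatcher choosing a capability query per platform."""
    if is_hip:
        # Assumed safe on ROCm: the native call does not initialize the device.
        return rocm_query(logical_id)
    # On CUDA, avoid torch.cuda so no CUDA context is created early.
    return nvml_query(logical_id)
```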

mgoin · 2024-07-08T17:48:08Z

@youkaichao I forget where, but I ran into CI issues with current_platform.get_device_capability() raising an exception in CPU tests, because current_platform is None in that case. Should we add better dummy behavior for CPU/unsupported platforms, such as returning 0?

youkaichao · 2024-07-08T17:51:13Z

> I forget where but I ran into CI issues with current_platform.get_device_capability() hitting an exception on CPU tests because current_platform returns as None in that case.

Links would be appreciated. I don't think CPU code should call this function. Even before this change, torch.cuda.get_device_capability() would also error out in the CPU case.
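The two positions in this exchange can be contrasted in a small sketch. One is a dummy value for unsupported platforms. The other is that CPU code should not call the function at all. `Platform`, `CpuPlatform` and `CudaPlatform` are hypothetical classes, not vLLM's, and `CudaPlatform` takes a fixed capability instead of querying hardware. The design shown always provides a platform object, so the caller never touches `None`. That avoids an `AttributeError` such as `'NoneType' object has no attribute 'get_device_capability'`. The base class also fails loudly with a descriptive message; a dummy return such as `(0, 0)` is the alternative the comment marks.

```python
from typing import Tuple


class Platform:
    """Hypothetical base class: a platform object exists even on CPU."""

    name = "unspecified"

    def get_device_capability(self, device_id: int = 0) -> Tuple[int, int]:
        # Fail loudly rather than returning a dummy such as (0, 0),
        # which callers might mistake for a real (very old) GPU.
        raise NotImplementedError(
            f"get_device_capability() is not supported on {self.name}")


class CpuPlatform(Platform):
    name = "cpu"


class CudaPlatform(Platform):
    name = "cuda"

    def __init__(self, capability: Tuple[int, int]) -> None:
        # Stand-in for a real hardware query (hypothetical).
        self._capability = capability

    def get_device_capability(self, device_id: int = 0) -> Tuple[int, int]:
        return self._capability
```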

comaniac · 2024-07-08T17:55:14Z

How about this one?

https://github.com/vllm-project/vllm/blob/main/vllm/attention/selector.py#L151

youkaichao · 2024-07-08T17:57:40Z

@comaniac these should be called after the process initializes CUDA, so they are fine. We can also replace them, though.

comaniac · 2024-07-08T18:04:42Z

> @comaniac these should be called after process initialize cuda, so they are fine. we can also replace them, though.

Yeah, it would be safer to replace this one as well.
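A toy example shows why a capability check like the selector's matters on mixed-GPU machines. Everything here is an illustrative assumption: the machine layout (a compute-capability 7.5 GPU at physical index 0 and an 8.0 GPU at index 1), the helpers `capability` and `supports_fa2`, and the `(8, 0)` threshold, which reflects FlashAttention-2 requiring Ampere or newer. Suppose a process restricted by `CUDA_VISIBLE_DEVICES=1` looks up physical index 0 instead of translating its logical index 0. It then reads the other GPU's capability and makes the wrong backend choice.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical machine: physical GPU index -> (major, minor) capability.
PHYSICAL_CAPABILITIES: Dict[int, Tuple[int, int]] = {0: (7, 5), 1: (8, 0)}


def capability(logical_id: int,
               env: Mapping[str, str],
               translate: bool) -> Tuple[int, int]:
    """Look up a capability either naively or through CUDA_VISIBLE_DEVICES."""
    physical = logical_id
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if translate and visible is not None:
        physical = int(visible.split(",")[logical_id])
    return PHYSICAL_CAPABILITIES[physical]


def supports_fa2(cap: Tuple[int, int]) -> bool:
    # Tuples compare lexicographically, so (7, 5) < (8, 0) <= (8, 6).
    return cap >= (8, 0)


env = {"CUDA_VISIBLE_DEVICES": "1"}
naive = capability(0, env, translate=False)      # reads physical GPU 0
translated = capability(0, env, translate=True)  # reads physical GPU 1
print(naive, supports_fa2(naive))                # (7, 5) False
print(translated, supports_fa2(translated))      # (8, 0) True
```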

T-Atlas

LGTM

comaniac · 2024-07-09T03:02:50Z

Merging this first. I'll change the usage in the selector in another PR.

e34748a · use device id under CUDA_VISIBLE_DEVICES for get_device_capability

youkaichao mentioned this pull request Jul 8, 2024

[Bug]: Using vllm as the inference engine, there is an incorrect recognition of GPU computing capabilities for different types. #6213 (Closed)

youkaichao requested a review from mgoin July 8, 2024 17:10

mgoin approved these changes Jul 8, 2024

T-Atlas approved these changes Jul 9, 2024

comaniac merged commit a3c9435 into vllm-project:main Jul 9, 2024
70 checks passed

youkaichao deleted the capability branch July 9, 2024 03:32

dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 17, 2024

a799814 · [hardware][cuda] use device id under CUDA_VISIBLE_DEVICES for get_device_capability (vllm-project#6216)

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024

3155a73 · [hardware][cuda] use device id under CUDA_VISIBLE_DEVICES for get_device_capability (vllm-project#6216)

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024

b6c0a96 · [hardware][cuda] use device id under CUDA_VISIBLE_DEVICES for get_device_capability (vllm-project#6216) Signed-off-by: Alvant
