WSL cannot use multiple GPUs simultaneously #12467
Logs are required for review from the WSL team. If this is a feature request, please reply with '/feature'. If this is a question, reply with '/question'.
How to collect WSL logs: download and execute collect-wsl-logs.ps1 in an administrative PowerShell prompt. The script will output the path of the log file once done. If this is a networking issue, please use collect-networking-logs.ps1 instead, following the instructions here. Once completed, please upload the output files to this GitHub issue.
/question
Diagnostic information
Windows Version
Windows Server 2022: 10.0.20348.2849
WSL Version
2.3.26.0
Are you using WSL 1 or WSL 2?
WSL 2
Kernel Version
5.15.167.4-1
Distro Version
20.04
Other Software
No response
Repro Steps
Qwen2-VL Dockerfile.
When I deploy Qwen2-VL with vLLM in multi-GPU (tensor-parallel) mode, or run the nccl-tests tool for multi-GPU communication testing, an "out of memory" error occurs whenever 4 or more GPUs are used; with 2 GPUs everything works. Some of my GPU-related configuration is shown below. I would like to know whether this is related to WSL 2, because the same setup runs fine in a native Ubuntu environment.
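For context, a minimal Python sketch of the kind of tensor-parallel launch that triggers the failure (the actual report goes through vLLM's OpenAI API server inside Docker; the model checkpoint and settings below are assumptions, not taken from the original Dockerfile):

```python
# Minimal sketch (assumed checkpoint and settings) of a tensor-parallel
# vLLM launch; tensor_parallel_size >= 4 hits the NCCL failure described
# above, while tensor_parallel_size = 2 works.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",  # assumed model; the report only says "Qwen2-VL"
    tensor_parallel_size=4,             # number of GPUs used for tensor parallelism
)

outputs = llm.generate(["Describe the image."], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```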
Expected Behavior
I would like to know whether this is an environment issue introduced by WSL or a problem with my own configuration.
Actual Behavior
As described above: when 4 or more GPUs are used, the vLLM worker processes fail during startup with NCCL "unhandled cuda error" and shared-memory allocation errors (see the diagnostic logs below); with 2 GPUs the deployment runs normally.
Diagnostic Logs
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Last error:
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Error while creating shared memory segment /dev/shm/nccl-g0wGsu (size 9637888), Traceback (most recent call last):
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Last error:
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Error while creating shared memory segment /dev/shm/nccl-342lst (size 9637888), Traceback (most recent call last):
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 222, in determine_num_available_blocks
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 222, in determine_num_available_blocks
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1211, in profile_run
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1211, in profile_run
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1538, in execute_model
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1538, in execute_model
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 860, in forward
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] inputs_embeds = self.model.embed_tokens(input_ids)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 860, in forward
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] inputs_embeds = self.model.embed_tokens(input_ids)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 406, in forward
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] output = tensor_model_parallel_all_reduce(output_parallel)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/communication_op.py", line 11, in tensor_model_parallel_all_reduce
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return get_tp_group().all_reduce(input)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 406, in forward
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/parallel_state.py", line 293, in all_reduce
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] output = tensor_model_parallel_all_reduce(output_parallel)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] torch.distributed.all_reduce(input, group=self.device_group)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/communication_op.py", line 11, in tensor_model_parallel_all_reduce
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/distributed/distributed_c10d.py", line 2288, in all_reduce
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return get_tp_group().all_reduce(input)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] work = group.allreduce([tensor], opts)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/parallel_state.py", line 293, in all_reduce
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] torch.distributed.all_reduce(input, group=self.device_group)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Last error:
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Error while creating shared memory segment /dev/shm/nccl-g0wGsu (size 9637888)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/distributed/distributed_c10d.py", line 2288, in all_reduce
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] work = group.allreduce([tensor], opts)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Last error:
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Error while creating shared memory segment /dev/shm/nccl-342lst (size 9637888)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Last error:
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Error while creating shared memory segment /dev/shm/nccl-6jaegD (size 9637888), Traceback (most recent call last):
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 222, in determine_num_available_blocks
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1211, in profile_run
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1538, in execute_model
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 860, in forward
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] inputs_embeds = self.model.embed_tokens(input_ids)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 406, in forward
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] output = tensor_model_parallel_all_reduce(output_parallel)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/communication_op.py", line 11, in tensor_model_parallel_all_reduce
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return get_tp_group().all_reduce(input)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/parallel_state.py", line 293, in all_reduce
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] torch.distributed.all_reduce(input, group=self.device_group)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/distributed/distributed_c10d.py", line 2288, in all_reduce
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] work = group.allreduce([tensor], opts)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Last error:
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Error while creating shared memory segment /dev/shm/nccl-6jaegD (size 9637888)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226]
ERROR 01-15 02:47:33 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 3986 died, exit code: -15
INFO 01-15 02:47:33 multiproc_worker_utils.py:123] Killing local vLLM worker processes
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 236, in run_rpc_server
server = AsyncEngineRPCServer(async_engine_args, usage_context, rpc_path)
File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 34, in init
self.engine = AsyncLLMEngine.from_engine_args(
File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 735, in from_engine_args
engine = cls(
File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 615, in init
self.engine = self._init_engine(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 835, in _init_engine
return engine_class(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 262, in init
super().init(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 338, in init
self._initialize_kv_caches()
File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 467, in _initialize_kv_caches
self.model_executor.determine_num_available_blocks())
File "/usr/local/lib/python3.8/dist-packages/vllm/executor/distributed_gpu_executor.py", line 39, in determine_num_available_blocks
num_blocks = self._run_workers("determine_num_available_blocks", )
File "/usr/local/lib/python3.8/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 199, in _run_workers
driver_worker_output = driver_worker_method(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 222, in determine_num_available_blocks
self.model_runner.profile_run()
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1211, in profile_run
self.execute_model(model_input, kv_caches, intermediate_tensors)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1538, in execute_model
hidden_or_intermediate_states = model_executable(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 860, in forward
inputs_embeds = self.model.embed_tokens(input_ids)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 406, in forward
output = tensor_model_parallel_all_reduce(output_parallel)
File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/communication_op.py", line 11, in tensor_model_parallel_all_reduce
return get_tp_group().all_reduce(input)
File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/parallel_state.py", line 293, in all_reduce
torch.distributed.all_reduce(input, group=self.device_group)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/distributed_c10d.py", line 2288, in all_reduce
work = group.allreduce([tensor], opts)
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
ncclUnhandledCudaError: Call to CUDA function failed.
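For reference, the immediate failure in these logs is NCCL's shared-memory transport being unable to create a segment under /dev/shm (about 9.6 MB per segment). A quick check of how much space the container actually exposes there, and of the NCCL_DEBUG setting the error message suggests, might look like the following sketch (illustrative only, not part of the original report):

```python
# Illustrative check (not from the original report): NCCL's shm transport
# allocates segments under /dev/shm, and Docker's default --shm-size is
# only 64 MB, which can be exhausted as the number of ranks grows.
import os
import shutil

total, _, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: total={total / 2**20:.0f} MiB, free={free / 2**20:.0f} MiB")

# The error text recommends NCCL_DEBUG=INFO; it must be set before NCCL
# initializes (i.e. before the first collective) to produce detailed logs.
os.environ.setdefault("NCCL_DEBUG", "INFO")
```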