WSL cannot use multiple GPUs simultaneously #12467
Logs are required for review from the WSL team. If this is a feature request, please reply with '/feature'. If this is a question, reply with '/question'.
How to collect WSL logs: download and execute collect-wsl-logs.ps1 in an administrative PowerShell prompt. The script will output the path of the log file once done. If this is a networking issue, please use collect-networking-logs.ps1 instead, following the instructions here. Once completed, please upload the output files to this GitHub issue.
/question
Diagnostic information
Windows Version
Windows Server 2022: 10.0.20348.2849
WSL Version
2.3.26.0
Are you using WSL 1 or WSL 2?
WSL 2
Kernel Version
5.15.167.4-1
Distro Version
20.04
Other Software
No response
Repro Steps
Qwen2-VL Dockerfile.
When I deploy Qwen2-VL with vLLM in multi-GPU (tensor-parallel) mode, or run the nccl-tests tool for multi-GPU communication testing, an "out of memory" error occurs whenever 4 or more GPUs are used; with 2 GPUs everything works. Some of my GPU-related configuration is shown below. I would like to know whether this is related to WSL 2, because the same setup runs fine in a native Ubuntu environment.
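For context, a minimal Python sketch of the kind of tensor-parallel launch that triggers the failure (the actual report goes through vLLM's OpenAI API server inside Docker; the model checkpoint and settings below are assumptions, not taken from the original Dockerfile):

```python
# Minimal sketch (assumed checkpoint and settings) of a tensor-parallel
# vLLM launch; tensor_parallel_size >= 4 hits the NCCL failure described
# above, while tensor_parallel_size = 2 works.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",  # assumed model; the report only says "Qwen2-VL"
    tensor_parallel_size=4,             # number of GPUs used for tensor parallelism
)

outputs = llm.generate(["Describe the image."], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```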
Expected Behavior
I would like to know whether this is an environment issue introduced by WSL or a problem with my own configuration.
Actual Behavior
As described above: when 4 or more GPUs are used, the vLLM worker processes fail during startup with NCCL "unhandled cuda error" and shared-memory allocation errors (see the diagnostic logs below); with 2 GPUs the deployment runs normally.
Diagnostic Logs
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Last error:
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Error while creating shared memory segment /dev/shm/nccl-g0wGsu (size 9637888), Traceback (most recent call last):
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Last error:
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Error while creating shared memory segment /dev/shm/nccl-342lst (size 9637888), Traceback (most recent call last):
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 222, in determine_num_available_blocks
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 222, in determine_num_available_blocks
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1211, in profile_run
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1211, in profile_run
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1538, in execute_model
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1538, in execute_model
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 860, in forward
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] inputs_embeds = self.model.embed_tokens(input_ids)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 860, in forward
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] inputs_embeds = self.model.embed_tokens(input_ids)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 406, in forward
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] output = tensor_model_parallel_all_reduce(output_parallel)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/communication_op.py", line 11, in tensor_model_parallel_all_reduce
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return get_tp_group().all_reduce(input)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 406, in forward
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/parallel_state.py", line 293, in all_reduce
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] output = tensor_model_parallel_all_reduce(output_parallel)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] torch.distributed.all_reduce(input, group=self.device_group)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/communication_op.py", line 11, in tensor_model_parallel_all_reduce
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/distributed/distributed_c10d.py", line 2288, in all_reduce
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return get_tp_group().all_reduce(input)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] work = group.allreduce([tensor], opts)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/parallel_state.py", line 293, in all_reduce
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] torch.distributed.all_reduce(input, group=self.device_group)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Last error:
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Error while creating shared memory segment /dev/shm/nccl-g0wGsu (size 9637888)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3985) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/distributed/distributed_c10d.py", line 2288, in all_reduce
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] work = group.allreduce([tensor], opts)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Last error:
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Error while creating shared memory segment /dev/shm/nccl-342lst (size 9637888)
(VllmWorkerProcess pid=3984) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Last error:
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Error while creating shared memory segment /dev/shm/nccl-6jaegD (size 9637888), Traceback (most recent call last):
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 222, in determine_num_available_blocks
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1211, in profile_run
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1538, in execute_model
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 860, in forward
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] inputs_embeds = self.model.embed_tokens(input_ids)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 406, in forward
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] output = tensor_model_parallel_all_reduce(output_parallel)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/communication_op.py", line 11, in tensor_model_parallel_all_reduce
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return get_tp_group().all_reduce(input)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/parallel_state.py", line 293, in all_reduce
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] torch.distributed.all_reduce(input, group=self.device_group)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.8/dist-packages/torch/distributed/distributed_c10d.py", line 2288, in all_reduce
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] work = group.allreduce([tensor], opts)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Last error:
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226] Error while creating shared memory segment /dev/shm/nccl-6jaegD (size 9637888)
(VllmWorkerProcess pid=3986) ERROR 01-15 02:47:32 multiproc_worker_utils.py:226]
ERROR 01-15 02:47:33 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 3986 died, exit code: -15
INFO 01-15 02:47:33 multiproc_worker_utils.py:123] Killing local vLLM worker processes
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 236, in run_rpc_server
server = AsyncEngineRPCServer(async_engine_args, usage_context, rpc_path)
File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 34, in init
self.engine = AsyncLLMEngine.from_engine_args(
File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 735, in from_engine_args
engine = cls(
File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 615, in init
self.engine = self._init_engine(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 835, in _init_engine
return engine_class(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 262, in init
super().init(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 338, in init
self._initialize_kv_caches()
File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 467, in _initialize_kv_caches
self.model_executor.determine_num_available_blocks())
File "/usr/local/lib/python3.8/dist-packages/vllm/executor/distributed_gpu_executor.py", line 39, in determine_num_available_blocks
num_blocks = self._run_workers("determine_num_available_blocks", )
File "/usr/local/lib/python3.8/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 199, in _run_workers
driver_worker_output = driver_worker_method(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 222, in determine_num_available_blocks
self.model_runner.profile_run()
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1211, in profile_run
self.execute_model(model_input, kv_caches, intermediate_tensors)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/worker/model_runner.py", line 1538, in execute_model
hidden_or_intermediate_states = model_executable(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 860, in forward
inputs_embeds = self.model.embed_tokens(input_ids)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 406, in forward
output = tensor_model_parallel_all_reduce(output_parallel)
File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/communication_op.py", line 11, in tensor_model_parallel_all_reduce
return get_tp_group().all_reduce(input)
File "/usr/local/lib/python3.8/dist-packages/vllm/distributed/parallel_state.py", line 293, in all_reduce
torch.distributed.all_reduce(input, group=self.device_group)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/distributed_c10d.py", line 2288, in all_reduce
work = group.allreduce([tensor], opts)
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
ncclUnhandledCudaError: Call to CUDA function failed.
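For reference, the immediate failure in these logs is NCCL's shared-memory transport being unable to create a segment under /dev/shm (about 9.6 MB per segment). A quick check of how much space the container actually exposes there, and of the NCCL_DEBUG setting the error message suggests, might look like the following sketch (illustrative only, not part of the original report):

```python
# Illustrative check (not from the original report): NCCL's shm transport
# allocates segments under /dev/shm, and Docker's default --shm-size is
# only 64 MB, which can be exhausted as the number of ranks grows.
import os
import shutil

total, _, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: total={total / 2**20:.0f} MiB, free={free / 2**20:.0f} MiB")

# The error text recommends NCCL_DEBUG=INFO; it must be set before NCCL
# initializes (i.e. before the first collective) to produce detailed logs.
os.environ.setdefault("NCCL_DEBUG", "INFO")
```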