[Bug]: I want to integrate vllm into LLaMA-Factory, a transformers-based LLM training framework. However, I encountered two bugs: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method & RuntimeError: NCCL error: invalid usage (run with NCCL_DEBUG=WARN for details) #9469
Comments
Can you try running vLLM with […]?
I'm also interested in whether 0.6.3 behaves better for you. It includes this change: #8823. With that change, vLLM should automatically use `spawn`.
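For reference, a minimal sketch of forcing the `spawn` worker method from Python before vLLM spins up its workers; the model name and parallel size below are placeholders, not values from this issue:

```python
import os

# Must be set before vLLM creates its worker processes; "spawn" avoids the
# "Cannot re-initialize CUDA in forked subprocess" error when CUDA has already
# been touched (e.g. by torch.cuda.is_available()) in the parent process.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

from vllm import LLM

# Placeholder model; tensor_parallel_size > 1 is what triggers multiprocessing workers.
llm = LLM(model="facebook/opt-125m", tensor_parallel_size=2)
```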
I tried this, but got an NCCL error:

NCCL bug (collapsed log)
What do you get when you set `NCCL_DEBUG=warn`?
I tried vllm 0.6.3.post2.dev12+g1ffc8a73 with the test demo below; however, the error still occurs.

demo code: (collapsed)

ERROR: (collapsed)
One solution is to use vLLM's OpenAI API server; then you don't need to worry about this problem.
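For context on that suggestion: with the API-server route, inference runs in a separate process and the training code only needs an HTTP client, so the training framework's own `torch.cuda` calls no longer conflict with vLLM's worker initialization. A rough sketch, assuming a vLLM OpenAI-compatible server was started separately (for example with `vllm serve <model>`) and is listening on the default port 8000; the model name and prompt are placeholders:

```python
from openai import OpenAI  # pip install openai

# Placeholder URL and credentials; vLLM's server accepts any API key by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="facebook/opt-125m",        # must match the model the server was launched with
    prompt="The capital of France is",
    max_tokens=8,
    logprobs=5,                       # per-token log-probabilities, useful for distillation
)
print(completion.choices[0].text)
print(completion.choices[0].logprobs.top_logprobs)
```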
@takagi97 you didn't provide the detailed NCCL log with `NCCL_DEBUG=WARN`.
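For completeness, the NCCL log can be enabled either by exporting the variable in the shell before launching, or from Python before vLLM (and hence NCCL) is initialized, roughly like this:

```python
import os

# Must happen before vLLM/NCCL initialization so the "invalid usage" failure
# is accompanied by a concrete reason in the log.
os.environ["NCCL_DEBUG"] = "WARN"   # use "INFO" for even more verbose output
```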
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
I also encountered this issue. Any solution available?
See #12084
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
I want to use vLLM to run LLM inference and efficiently obtain the output probability distribution for token-level knowledge distillation. To achieve this, I need to first use vLLM for inference and then use its output to train student models. For the implementation, I integrated vLLM (0.6.2) into LLaMA-Factory (0.9.0), a Transformers (4.45.0)-based LLM training framework. However, when I run my code, I encounter the following bug:
ERROR-1
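(As background, the intended usage described above, offline inference that also returns per-token probabilities for distillation, can be sketched roughly as follows; the model name and sampling settings are placeholders, not the author's actual configuration.)

```python
from vllm import LLM, SamplingParams

# Placeholder teacher model; in the author's setup this would be
# model_args.cwc_source_model_name_or_path with tensor_parallel_size=world_size.
teacher = LLM(model="facebook/opt-125m", tensor_parallel_size=1)

params = SamplingParams(
    max_tokens=32,
    temperature=0.0,
    logprobs=20,  # return the top-20 log-probabilities for every generated token
)

outputs = teacher.generate(["Explain knowledge distillation in one sentence."], params)
for out in outputs:
    for token_id, lp_dict in zip(out.outputs[0].token_ids, out.outputs[0].logprobs):
        # lp_dict maps candidate token ids to Logprob objects
        print(token_id, {k: v.logprob for k, v in lp_dict.items()})
```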
After debugging for hours and searching through issues, I realized that both LLaMA-Factory and Transformers are calling `torch.cuda` functions, such as `torch.cuda.is_available()`, before the vLLM model is initialized (`source_model = LLM(model=model_args.cwc_source_model_name_or_path, tensor_parallel_size=training_args.world_size)`), which leads to the bug. However, there are numerous calls to `torch.cuda` functions throughout the project, and I cannot remove all of them. :(

After reading this issue, I set `export VLLM_WORKER_MULTIPROC_METHOD=spawn`, but there is another bug:

ERROR-2
Then I checked my driver and hardware following the Python script provided at https://docs.vllm.ai/en/latest/getting_started/debugging.html. Below is the report; it seems that everything is okay.
driver & hardware checking report
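(As a reference point, the kind of GPU-communication sanity check that guide describes can be approximated with a small `torch.distributed` script; this is a simplified sketch, not the exact script from the vLLM docs.)

```python
# Launch with: torchrun --nproc-per-node=<num_gpus> sanity_check.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # torchrun supplies rank/world size via env vars
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

data = torch.ones(1, device="cuda")
dist.all_reduce(data)  # if NCCL is healthy, every rank ends up with world_size
print(f"rank {dist.get_rank()}: all_reduce result = {data.item()}")

dist.destroy_process_group()
```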
Is there any way to fix this bug? @DarkLight1337 @youkaichao