[ci][distributed] add distributed test gptq_marlin with tp = 2 #6010
Conversation
Force-pushed from 6a2004c to 8e128d2
Thanks for the PR! You need to move the test from https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml. In addition, because of some limitations, you should only test the tp=2 case: it is not safe to test two vLLM instances together.
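A minimal sketch (not the PR's actual code) of what restricting the test to the tp=2 case could look like in a pytest module; the test name, model handling, and structure are placeholders:

```python
import pytest


# Only the tp=2 case is exercised; running two vLLM instances (e.g. tp=1 and
# tp=2) in the same job is what the comment above warns against.
@pytest.mark.parametrize("tensor_parallel_size", [2])
def test_gptq_marlin_distributed(tensor_parallel_size: int) -> None:
    import torch

    if torch.cuda.device_count() < tensor_parallel_size:
        pytest.skip(f"requires at least {tensor_parallel_size} GPUs")
    # ... build the engine with tensor_parallel_size=2 and run the checks.
```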
Force-pushed from 63b9545 to 49141fb
Imo we should keep the original
Makes sense - so is it better to abstract the common test code that follows into a new code block (e.g.
Let's abstract out the code (similar to what I did for the multimodal distributed tests)
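Roughly what abstracting out the common code could look like, as a hedged sketch; the helper name, the `vllm_runner` parameter, and the `generate_greedy` call are placeholders here, not necessarily vLLM's actual test fixtures:

```python
# Hypothetical shared helper so that the existing single-GPU test and the new
# tp=2 test differ only in the tensor_parallel_size they pass in.
def check_gptq_marlin_model(vllm_runner, model: str, prompts: list,
                            tensor_parallel_size: int = 1,
                            max_tokens: int = 32) -> None:
    """Run `model` on `prompts` and do a basic sanity check on the outputs."""
    with vllm_runner(model, tensor_parallel_size=tensor_parallel_size) as llm:
        outputs = llm.generate_greedy(prompts, max_tokens)
    assert len(outputs) == len(prompts)


# The distributed test would then reduce to something like:
#     check_gptq_marlin_model(vllm_runner, MODEL_NAME, example_prompts,
#                             tensor_parallel_size=2)
```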
Force-pushed from 64c0686 to f12288d
```diff
@@ -17,8 +18,6 @@
 from .utils import check_logprobs_close

-os.environ["TOKENIZERS_PARALLELISM"] = "true"
```
Please keep this line as it avoids unnecessary warnings from HuggingFace
@DarkLight1337 it looks like the new unit test (
This happens because you initialized CUDA too early (probably indirectly via imports). Try to avoid importing torch-related stuff in the top-level code of your test.
If the issue persists, #6056 should help you.
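A small illustration of that advice, assuming a pytest-style test file; everything below is a placeholder sketch rather than the PR's code:

```python
# Problematic pattern: module-level code that touches torch.cuda can
# initialize CUDA as soon as pytest imports the file, before any distributed
# workers are spawned:
#
#     import torch
#     GPU_COUNT = torch.cuda.device_count()  # may already initialize CUDA
#
# Safer pattern: keep torch-related imports and calls inside the test body.
import pytest


def test_distributed_case() -> None:
    import torch  # only imported when the test actually runs

    if not torch.cuda.is_available():
        pytest.skip("CUDA is required for this test")
    # ... the rest of the distributed test goes here.
```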
Force-pushed from d1d19d7 to 74d32a3
Please merge the latest
Also it's really difficult to keep track of your changes if you keep force-pushing.
Thanks for your suggestion 👍 I will pay attention in the future. BTW - I have merged the latest main into this PR, so we will see the result.
With the latest main, the distributed tests still failed because of
I think it's because you initialized CUDA via
Looks like the distributed tests still failed.
Hmm I guess we can't even call
If you merged the latest main, the problem might be that you are testing multiple models and pytest uses a single process for all of them: after you test one model, the process already has CUDA initialized.
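One hedged way around the single-process issue described above is to give each model its own freshly spawned process, so CUDA state from one parametrization cannot leak into the next; the names below are illustrative, not the fix that was actually adopted:

```python
import multiprocessing

import pytest


def _check_one_model(model_name: str) -> None:
    # CUDA-touching work happens only inside the child process.
    import torch  # noqa: F401
    # ... load `model_name` with tensor_parallel_size=2 and check the outputs.


@pytest.mark.parametrize("model_name", ["placeholder-model-a",
                                        "placeholder-model-b"])
def test_models_in_fresh_processes(model_name: str) -> None:
    ctx = multiprocessing.get_context("spawn")
    proc = ctx.Process(target=_check_one_model, args=(model_name,))
    proc.start()
    proc.join()
    assert proc.exitcode == 0, f"child process failed for {model_name}"
```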
Thanks so much for the details. If we have to pick one model for the distributed test, which of the following is better to use?
If you can locally reproduce the problem, it would be easier to debug. Here is a simple script you can put in the test code to find out which function initializes CUDA:

```python
import torch
import sys
import traceback

found = False


def _trace_calls(frame, event, arg=None):
    if event in ['call', 'return']:
        # for every function call or return
        try:
            global found
            # Temporarily disable the trace function
            sys.settrace(None)
            # check condition here
            if not found and torch.cuda.is_initialized():
                found = True
                traceback.print_stack()
            # Re-enable the trace function
            sys.settrace(_trace_calls)
        except NameError:
            # modules are deleted during shutdown
            pass
    return _trace_calls


sys.settrace(_trace_calls)
```
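Presumably this snippet goes at the very top of the failing test module (or a conftest.py), before any other imports run, so that the printed stack trace points at the first call chain for which `torch.cuda.is_initialized()` becomes true.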
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!
Follow-up PR of #6007.