The model's non-streaming responses are incomplete, but setting stream=True produces the full output; setting max_tokens has no effect #310
Comments
@aimi0914 Apologies I had to translate that:
I can see you set max_tokens to 4096, but it just stops generating? Very weird. Tbh I'm not really sure how to debug this, since this seems to be an application you're creating.
My code is as follows: a simple call with the openai library, using Qwen1.5-14B-Chat. The model was deployed with FastChat.
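For context, here is a minimal sketch of the kind of call being described (the reporter's actual code was not included in this thread); the endpoint URL, API key, and prompt are placeholders, assuming a FastChat OpenAI-compatible server:

```python
from openai import OpenAI

# Assumed FastChat OpenAI-compatible endpoint; FastChat ignores the API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Non-streaming call: reportedly returns a truncated answer even with max_tokens=4096.
resp = client.chat.completions.create(
    model="Qwen1.5-14B-Chat",
    messages=[{"role": "user", "content": "Explain the attention mechanism."}],
    max_tokens=4096,
)
print(resp.choices[0].message.content)

# Streaming call with the same parameters: reportedly returns the complete answer.
stream = client.chat.completions.create(
    model="Qwen1.5-14B-Chat",
    messages=[{"role": "user", "content": "Explain the attention mechanism."}],
    max_tokens=4096,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```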
Replacing the model with Baichuan and testing with the same code works fine, so I suspect the issue is with the Qwen model.
vllm-project/vllm#3034 (comment)
@aimi0914 Oh interesting! Hmm, I wonder why the Qwen model doesn't work well.