Non-streaming model responses are incomplete, but setting stream=True produces complete output; setting max_tokens has no effect #310

Closed
aimi0914 opened this issue Apr 7, 2024 · 5 comments

Comments

aimi0914 commented Apr 7, 2024

[screenshot: the request code with max_tokens=4096 and the truncated non-streaming output]
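Since the screenshot is not preserved, below is a minimal sketch of the kind of call being reported, assuming a FastChat OpenAI-compatible endpoint and the Qwen1.5-14B-Chat model mentioned later in the thread. The base URL, API key, and prompt are placeholders:

```python
# Sketch of the reported non-streaming call (reconstruction; the original
# screenshot is gone). Endpoint, key, and prompt are placeholders; the model
# name and max_tokens value come from later comments in the thread.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="Qwen1.5-14B-Chat",
    messages=[{"role": "user", "content": "..."}],
    max_tokens=4096,   # set explicitly, yet the answer still truncates
    stream=False,      # non-streaming: content comes back incomplete
)
print(response.choices[0].message.content)
```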

@danielhanchen (Contributor)

@aimi0914 Apologies, I had to translate that:

> Non-streaming model responses are incomplete, but setting stream=True produces complete output; setting max_tokens has no effect #310

I can see you set max_tokens to 4096, but it just stops generating? Very weird. Tbh I'm not really sure how to debug this, since this seems to be an application you're creating.
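One way to narrow this down, offered as a sketch rather than something verified in the thread: check finish_reason on the non-streaming response. A value of "length" means the max_tokens cap was hit, while "stop" means the server believes a stop/EOS token fired, which would point at the model or its stop-token configuration rather than at max_tokens:

```python
# Sketch: check why generation ended (assumed FastChat endpoint;
# URL, key, and prompt are placeholders).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="Qwen1.5-14B-Chat",
    messages=[{"role": "user", "content": "..."}],
    max_tokens=4096,
)
choice = response.choices[0]
# "length" -> hit the max_tokens cap; "stop" -> the server saw a stop/EOS
# token, pointing at the model/template rather than at max_tokens.
print(choice.finish_reason, len(choice.message.content or ""))
```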

aimi0914 (Author) commented Apr 8, 2024

My code is as follows: a simple call with the openai library to Qwen1.5-14B-Chat. The model was deployed using FastChat.
[screenshot: the OpenAI-client call against the FastChat endpoint]
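For comparison, here is a sketch of the streaming variant that, per the issue title, does return the full answer. Same assumptions as above: a FastChat OpenAI-compatible endpoint with placeholder URL, key, and prompt:

```python
# Sketch of the streaming call that reportedly completes (assumed FastChat
# OpenAI-compatible endpoint; URL, key, and prompt are placeholders).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
stream = client.chat.completions.create(
    model="Qwen1.5-14B-Chat",
    messages=[{"role": "user", "content": "..."}],
    max_tokens=4096,
    stream=True,  # with streaming, the full answer reportedly arrives
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```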

aimi0914 (Author) commented Apr 8, 2024

Replacing the model with Baichuan and testing with the same code works fine, so I suspect it's the Qwen model.

aimi0914 (Author) commented Apr 8, 2024

Same question as vllm-project/vllm#3034 (comment).

aimi0914 closed this as completed Apr 8, 2024
@danielhanchen (Contributor)

@aimi0914 Oh interesting! Hmm, I wonder why the Qwen model doesn't work well.
