The model's non-streaming responses are incomplete, but setting stream=True produces the full output; setting max_tokens has no effect #310
Comments
@aimi0914 Apologies I had to translate that:
I can see you set max_tokens to 4096, but it just stops generating? Very weird. Tbh I'm not really sure how to debug this, since this seems to be an application you're creating.
My code is as follows: a simple call with the openai library, using Qwen1.5-14B-Chat. The model was deployed with FastChat.
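For context, here is a minimal sketch of the kind of call being described (the reporter's actual code was not included in this thread); the endpoint URL, API key, and prompt are placeholders, assuming a FastChat OpenAI-compatible server:

```python
from openai import OpenAI

# Assumed FastChat OpenAI-compatible endpoint; FastChat ignores the API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Non-streaming call: reportedly returns a truncated answer even with max_tokens=4096.
resp = client.chat.completions.create(
    model="Qwen1.5-14B-Chat",
    messages=[{"role": "user", "content": "Explain the attention mechanism."}],
    max_tokens=4096,
)
print(resp.choices[0].message.content)

# Streaming call with the same parameters: reportedly returns the complete answer.
stream = client.chat.completions.create(
    model="Qwen1.5-14B-Chat",
    messages=[{"role": "user", "content": "Explain the attention mechanism."}],
    max_tokens=4096,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```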
Replacing the model with Baichuan and testing with the same code works fine, so I suspect the issue is with the Qwen model.
vllm-project/vllm#3034 (comment)
@aimi0914 Oh interesting! Hmm, I wonder why the Qwen model doesn't work well.