
API response missing the last 10 characters when deploying Qwen1.5-7B-Chat #3034

Closed
gaijigoumeiren opened this issue Feb 26, 2024 · 4 comments

@gaijigoumeiren
When deploying Qwen1.5-7B-Chat, I found that the last 10 characters of the API response are missing, which is exactly the length of the stop token <|im_end|>.

nohup python -m vllm.entrypoints.openai.api_server \
    --model /Qwen/Qwen1.5-7B-Chat \
    --host 0.0.0.0 \
    --port 80 \
    --trust-remote-code &

Temporary workaround: pass include_stop_str_in_output=True when calling the API. For example (a minimal sketch, assuming the openai>=1.x Python client talking to the server launched above; vLLM forwards engine-specific sampling fields supplied via extra_body):
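    # Sketch: ask the engine to keep the stop string instead of truncating.
    # Assumes the openai>=1.x Python client and the server launched above;
    # vLLM-specific sampling fields are forwarded via extra_body.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:80/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="/Qwen/Qwen1.5-7B-Chat",
        messages=[{"role": "user", "content": "Hello, who are you?"}],
        extra_body={"include_stop_str_in_output": True},
    )
    print(resp.choices[0].message.content)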
The likely cause: include_stop_str_in_output defaults to False when calling the API, and in https://github.com/vllm-project/vllm/blob/main/vllm/engine/llm_engine.py#L966 the trailing stop string is cut off the output. But seq.output_text does not actually contain <|im_end|>, so the truncation removes 10 characters of real text instead:

    def _finalize_sequence(self, seq: Sequence,
                           sampling_params: SamplingParams,
                           stop_string: str) -> None:
        if not sampling_params.include_stop_str_in_output and stop_string:
            # Truncate the output text so that the stop string is
            # not included in the output.
            seq.output_text = seq.output_text[:-len(stop_string)]

Would changing it to something like the following fix it? The truncation should only happen when the output actually ends with the stop string. (The tempting one-liner seq.output_text.rstrip(stop_string) would be wrong here, since rstrip removes a set of characters rather than a suffix; see the demo after the code.)

    def _finalize_sequence(self, seq: Sequence,
                           sampling_params: SamplingParams,
                           stop_string: str) -> None:
        if not sampling_params.include_stop_str_in_output and stop_string:
            # Truncate only when the stop string is actually present,
            # so output that never contained it is left untouched.
            if seq.output_text.endswith(stop_string):
                seq.output_text = seq.output_text[:-len(stop_string)]
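A quick illustration of the rstrip pitfall (the strings here are made up for demonstration):

    # str.rstrip removes any trailing characters that appear in its argument,
    # treating it as a character set rather than a suffix, so it can also
    # eat legitimate text built from those characters.
    print("Hello!<|im_end|>".rstrip("<|im_end|>"))  # Hello!  (looks fine)
    print("undefined".rstrip("<|im_end|>"))         # undef   (over-stripped)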
@lcvcl
lcvcl commented Feb 27, 2024

I ran into the same problem; versions 0.3.1 and 0.3.2 both have it, and I hit it with both Llama models and Qwen1.5.

@currenttime
Solved for Qwen1.5-7B.
max_tokens defaults to 16, so pass a larger max_tokens when constructing SamplingParams:
output = llm.generate(text, sampling_params=SamplingParams(max_tokens=512))
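A fuller, self-contained version of the same idea (a sketch; the model path follows the one above and the prompt is a placeholder):

    # Sketch: offline generation with an explicit max_tokens.
    # SamplingParams.max_tokens defaults to 16, which truncates longer replies.
    from vllm import LLM, SamplingParams

    llm = LLM(model="/Qwen/Qwen1.5-7B-Chat", trust_remote_code=True)
    sampling_params = SamplingParams(max_tokens=512)
    outputs = llm.generate(["Hello, who are you?"], sampling_params=sampling_params)
    print(outputs[0].outputs[0].text)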

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale label Oct 30, 2024

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

@github-actions github-actions bot closed this as not planned Nov 29, 2024