
API response missing the last 10 characters when deploying Qwen1.5-7B-Chat #3034

Closed
gaijigoumeiren opened this issue Feb 26, 2024 · 4 comments

@gaijigoumeiren
When deploying Qwen1.5-7B-Chat, I found that the last 10 characters of the API response are missing, which is exactly the length of the stop token <|im_end|>.

nohup python -m vllm.entrypoints.openai.api_server \
    --model /Qwen/Qwen1.5-7B-Chat \
    --host 0.0.0.0 \
    --port 80 \
    --trust-remote-code &

Temporary workaround: pass include_stop_str_in_output=True when calling the API. For example (a minimal sketch, assuming the openai>=1.x Python client talking to the server launched above; vLLM forwards engine-specific sampling fields supplied via extra_body):
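    # Sketch: ask the engine to keep the stop string instead of truncating.
    # Assumes the openai>=1.x Python client and the server launched above;
    # vLLM-specific sampling fields are forwarded via extra_body.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:80/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="/Qwen/Qwen1.5-7B-Chat",
        messages=[{"role": "user", "content": "Hello, who are you?"}],
        extra_body={"include_stop_str_in_output": True},
    )
    print(resp.choices[0].message.content)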
The likely cause: include_stop_str_in_output defaults to False when calling the API, and in https://github.com/vllm-project/vllm/blob/main/vllm/engine/llm_engine.py#L966 the trailing stop string is cut off the output. But seq.output_text does not actually contain <|im_end|>, so the truncation removes 10 characters of real text instead:

    def _finalize_sequence(self, seq: Sequence,
                           sampling_params: SamplingParams,
                           stop_string: str) -> None:
        if not sampling_params.include_stop_str_in_output and stop_string:
            # Truncate the output text so that the stop string is
            # not included in the output.
            seq.output_text = seq.output_text[:-len(stop_string)]

Would changing it to something like the following fix it? The truncation should only happen when the output actually ends with the stop string. (The tempting one-liner seq.output_text.rstrip(stop_string) would be wrong here, since rstrip removes a set of characters rather than a suffix; see the demo after the code.)

    def _finalize_sequence(self, seq: Sequence,
                           sampling_params: SamplingParams,
                           stop_string: str) -> None:
        if not sampling_params.include_stop_str_in_output and stop_string:
            # Truncate only when the stop string is actually present,
            # so output that never contained it is left untouched.
            if seq.output_text.endswith(stop_string):
                seq.output_text = seq.output_text[:-len(stop_string)]
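A quick illustration of the rstrip pitfall (the strings here are made up for demonstration):

    # str.rstrip removes any trailing characters that appear in its argument,
    # treating it as a character set rather than a suffix, so it can also
    # eat legitimate text built from those characters.
    print("Hello!<|im_end|>".rstrip("<|im_end|>"))  # Hello!  (looks fine)
    print("undefined".rstrip("<|im_end|>"))         # undef   (over-stripped)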
@lcvcl
lcvcl commented Feb 27, 2024

I ran into the same problem; versions 0.3.1 and 0.3.2 both have it, and I hit it with both Llama models and Qwen1.5.

@currenttime
Solved for Qwen1.5-7B.
max_tokens defaults to 16, so pass a larger max_tokens when constructing SamplingParams:
output = llm.generate(text, sampling_params=SamplingParams(max_tokens=512))
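A fuller, self-contained version of the same idea (a sketch; the model path follows the one above and the prompt is a placeholder):

    # Sketch: offline generation with an explicit max_tokens.
    # SamplingParams.max_tokens defaults to 16, which truncates longer replies.
    from vllm import LLM, SamplingParams

    llm = LLM(model="/Qwen/Qwen1.5-7B-Chat", trust_remote_code=True)
    sampling_params = SamplingParams(max_tokens=512)
    outputs = llm.generate(["Hello, who are you?"], sampling_params=sampling_params)
    print(outputs[0].outputs[0].text)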

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale label Oct 30, 2024

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

@github-actions github-actions bot closed this as not planned Nov 29, 2024