[BUG] vLLM inference with Qwen-72B-Chat returns abnormal output #728
Comments
You need to use vLLM together with FastChat; please follow the instructions in the README. vLLM by itself does not provide support for Qwen chat models.
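For reference, a minimal sketch of querying a FastChat-managed vLLM worker through FastChat's OpenAI-compatible API server. The endpoint address and the registered model name are assumptions here; adjust them to match your deployment.

```python
# Minimal sketch: call a FastChat OpenAI-compatible API server backed by a
# vLLM worker. The base_url and model name below are placeholders -- replace
# them with whatever your FastChat deployment actually exposes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen-72B-Chat",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    temperature=0.7,
    top_p=0.8,
)
print(response.choices[0].message.content)
```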
+1. I reproduced a similar problem following both the official vLLM instructions and the README (after deploying with the vLLM API, the 1.8B chat model behaves this way, while 7B and 14B are fine). In addition, my LoRA-tuned model's output gets truncated under the vLLM framework, but is normal without vLLM.
@positive666 Did you adjust temperature/top_p or other sampling parameters?
Was this resolved in the end?
Same issue, output gets unexpectedly truncated.
@Modas-Li @magnificent1208 Truncated model output can have many different causes (in the OP's case it was probably due to not using FastChat). If you run into this, please open a separate issue or reply in this thread with the specifics of your case (e.g., model size, launch method, model output, decoding parameters, command-line output).
I used the official vLLM command, so everything should be at default values; the problem also reproduces through FastChat + vLLM. My guess is that if it is a vLLM issue, it may be in vLLM's generate function code. I'll take another look once I finish my current work.
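For anyone debugging the truncation with raw vLLM, a minimal sketch that makes the decoding parameters explicit so a small default `max_tokens` or a stray stop string can be ruled out. The model path and tensor-parallel size are placeholders, not values from this thread; checking `finish_reason` distinguishes a length cut-off from a genuine stop/EOS.

```python
# Minimal sketch: offline vLLM generation with explicit decoding parameters.
# Replace the model path and tensor_parallel_size with your own setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/Qwen-72B-Chat",  # placeholder path
    trust_remote_code=True,
    tensor_parallel_size=4,          # adjust to your GPU count
)

sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.8,
    max_tokens=512,  # raise this if outputs stop mid-sentence
    stop=None,       # make sure no unintended stop strings are set
)

outputs = llm.generate(["Hello, who are you?"], sampling_params)
for output in outputs:
    # finish_reason is "length" if the model hit max_tokens,
    # "stop" if it emitted a stop/EOS token
    print(output.outputs[0].finish_reason, output.outputs[0].text)
```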
I ran into the same problem. Was it ever resolved?
Based on your sample:
The configuration has issues.
The input has issues.
The tokenization has issues.
Using vLLM on its own is not recommended (a rough sketch of the chat formatting involved follows below).
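As context for why raw vLLM mishandles the chat checkpoints: Qwen chat models expect a ChatML-style prompt with `<|im_start|>`/`<|im_end|>` markers and should stop generating at the end-of-turn tokens. The sketch below builds such a prompt by hand; the special-token IDs (151643 for `<|endoftext|>`, 151645 for `<|im_end|>`) are my understanding of Qwen's tokenizer and should be verified, and the model path is a placeholder.

```python
# Rough sketch: hand-built ChatML prompt for a Qwen chat model under raw vLLM,
# with generation stopped on the chat end-of-turn tokens. Verify the special
# token IDs against your tokenizer; the model path is a placeholder.
from vllm import LLM, SamplingParams

def build_chatml_prompt(user_message: str,
                        system: str = "You are a helpful assistant.") -> str:
    # ChatML layout: system turn, user turn, then an open assistant turn
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

llm = LLM(model="/path/to/Qwen-72B-Chat", trust_remote_code=True)
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.8,
    max_tokens=512,
    stop_token_ids=[151643, 151645],  # <|endoftext|>, <|im_end|> (verify!)
)
outputs = llm.generate([build_chatml_prompt("Hello, who are you?")], sampling_params)
print(outputs[0].outputs[0].text)
```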
Is there an existing issue / discussion for this?
Is there an existing answer for this in FAQ?
Current Behavior
input:
output:
Expected Behavior
output:
Steps To Reproduce
1. Run command
2. Log
Environment
Anything else?
No response