[Bug]: RuntimeError: No suitable kernel. h_in=16 h_out=3424 dtype=Float out_dtype=BFloat16 #3793
Comments
@WoosukKwon How can I solve this problem?
The current Punica kernel can't process this size (h_out=3424).
Thanks, it works now, but I still want to use all GPUs, because the memory is not enough...
Hello, I hit a similar problem loading baichuan2-13b, on both 0.3.3 and 0.4: RuntimeError: No suitable kernel. h_in=32 h_out=15360 dtype=Float out_dtype=BFloat16
In the current vLLM version, the Punica kernels don't support 15360; my earlier PR missed this, sorry. Add the line `f(in_T, out_T, W_T, narrow, 15360) \` and then recompile vLLM (version 0.4.0).
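For reference, the edit looks roughly like this. This is only a sketch: the macro name and the neighboring size entries are from memory and may not match your checkout exactly; the `f(..., 15360)` line is the one to add.

```c
// csrc/punica/bgmv/bgmv_config.h (sketch; neighbors illustrative).
// Each f(...) entry instantiates the BGMV kernel for one hidden size,
// so a size missing from this list produces the
// "RuntimeError: No suitable kernel" at runtime.
#define FOR_BGMV_WIDE(f, in_T, out_T, W_T, narrow)      \
    /* ... existing sizes above ... */                  \
    f(in_T, out_T, W_T, narrow, 14336)                  \
    f(in_T, out_T, W_T, narrow, 15360) /* <- added */   \
    f(in_T, out_T, W_T, narrow, 16384)                  \
    /* ... existing sizes below ... */
```

After adding the line, rebuild as described later in this thread (VLLM_INSTALL_PUNICA_KERNELS=1, then pip install -e .).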
Yes, that's exactly how I added it too; I tested it and there are no problems on either 0.3.3 or 0.4.
Hi, could you open a PR to fix this?
I see you've been submitting PRs regularly; could you fold this change into your next PR so I don't have to open one just for it? Also, besides your earlier qkv PR being merged, what other changes went into 0.4, for example the part below? I need to decide whether to upgrade to 0.4, since I rewrote everything from ModelRunner and Worker up to LLMEngine based on 0.3.3.
@nlp-learner OK.
@jeejeelee I've run into a bug.
@jeejeelee Solved it; I added a kernel entry for size 640.
I hit this problem with chinese-alpaca-llama2-7B. Is there a way to fix it without recompiling?
h_in=16 h_out=3424
@jeejeelee Can you support this on 8 GPUs?
@jeejeelee
Which model has this size? Also, you can open a PR yourself to add support for this size.
I'm trying to solve this issue.
@jeejeelee
You can look up how to submit a PR to a GitHub project.
@Edisonwei54 From your code it looks like you're running qwen1.5-14b with 2 LoRAs. Did you ever hit the error cannot access local variable 'lora_b_k' where it is not associated with a value?
Has this been resolved?
Hello, I've already opened a related PR, see #5036, but it's still under development.
OK.
Hi, I installed the vllm package with pip install vllm, and I can't find the vllm/blob/main/csrc/punica/bgmv/bgmv_config.h you mentioned.
Hi, clone the source code and you'll find it at vllm/csrc/punica/bgmv/bgmv_config.h.
Hello, I hit the same problem when deploying a fine-tuned Qwen2-7B:
Size 18944 isn't supported; you can fix it the same way as 15360 above.
@liangxiao777 FYI #5441
Same error with a Qwen-72B-Instruct LoRA:
Hi, thanks for your help. I encountered a similar situation while fine-tuning. Could you tell me how to handle this? And how can the '--tensor-parallel-size' option avoid this error?
Hi, although the error messages are similar, your situations are not the same. 3424 is not divisible by 64, so the Punica kernel can't process h_out=3424; 19200, however, is divisible by 64 and can be processed. You can refer to #5441 to add support for it.
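A minimal illustration of that divisibility constraint (punica_can_handle is a hypothetical helper made up for this comment, not a real vLLM function):

```c
#include <stdio.h>

/* Hypothetical helper illustrating the constraint discussed above:
 * the Punica BGMV kernels only cover hidden sizes that are multiples
 * of 64, and any other h_out raises "No suitable kernel". */
static int punica_can_handle(int h_out) { return h_out % 64 == 0; }

int main(void) {
    printf("h_out=3424:  %s\n", punica_can_handle(3424) ? "ok" : "unsupported");  /* unsupported */
    printf("h_out=19200: %s\n", punica_can_handle(19200) ? "ok" : "unsupported"); /* ok */
    return 0;
}
```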
Thanks for your help! So you mean: refer to #5441, change the two files (csrc/punica/bgmv/bgmv_config.h and tests/lora/test_punica.py), then build and install again. I'm going to try that. Thanks!
Yes, you can try it by:
export VLLM_INSTALL_PUNICA_KERNELS=1  # build for multi-LoRA capability
pip install -e .  # This may take 5-10 minutes.
This should be resolved by the newly landed Triton kernels: #5036
Dear all, I'm using vLLM 0.4.3 and hit a similar error. Besides, I didn't find the … For now, I have to merge the LoRA into the base model for inference, but the merged model takes a lot of disk space. Is there anything else I can do to solve this issue?