Enable flashinfer for dsv2. #3751
@foreverlms could you paste the gsm8k results (triton backend vs. flashinfer backend)?
python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319
Is this for accuracy or perf? I tried some prompts and the results looked fine.
Accuracy.
Got:
Compared with mla not enabled:
Launch command:
There is a gap. @zhyncs
I'm investigating the accuracy issue now.
Hi @foreverlms @zhyncs, for dsv2, we should not use hardcoded After changing
As noted in the PR you mentioned, the newly added MLA backend will handle both v2 and v3, so I will close this PR. cc @zhyncs
Motivation
Enable the flashinfer backend for DeepSeek-V2.
Modifications
A one-line change.
Checklist