Refactor flashinfer logic for deepseek v3 and fix accuracy bug #3785

Merged
merged 6 commits into from
Feb 24, 2025

@Fridge003 (Collaborator) commented Feb 22, 2025

Motivation

The attention code in flashinfer_backend.py has become too complex. This PR extracts the MLA logic into a new file, flashinfer_mla_backend.py.

Also, #3716 and #3751 report an accuracy bug when flashinfer MLA is enabled. This PR fixes the bug by correctly handling rope scaling with yarn, with help from @yzh119.

Modifications

  • Define FlashInferMLAAttnBackend in flashinfer_mla_backend.py, starting from flashinfer_backend.py and removing the code irrelevant to MLA
  • Remove magic numbers from the code so that DeepSeek V2 can also be supported
  • Simplify the MLA code in forward
  • Change the flash attention backend to auto so that flashinfer's newest fa3 backend can be used
  • Fix the accuracy bug with yarn rope scaling
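
The description does not spell out the yarn fix, so the following is only a sketch of why rope scaling can affect attention accuracy, not the code from this PR. It assumes the convention used in DeepSeek's reference modeling code, where the softmax scale is multiplied by the square of a yarn `mscale` factor, and it reads head dimensions from a config mapping instead of hard-coding them. `MLADims`, `mla_softmax_scale`, and the example config values are hypothetical names and illustrative numbers, not taken from sglang or from any checkpoint.

```python
import math
from dataclasses import dataclass
from typing import Any, Mapping, Optional


def yarn_get_mscale(scale: float = 1.0, mscale: float = 1.0) -> float:
    """Yarn magnitude correction; 1.0 when the context is not extended."""
    if scale <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0


@dataclass(frozen=True)
class MLADims:
    """Hypothetical holder for MLA dimensions, read from config (no magic numbers)."""
    kv_lora_rank: int
    qk_nope_head_dim: int
    qk_rope_head_dim: int

    @classmethod
    def from_config(cls, config: Mapping[str, Any]) -> "MLADims":
        return cls(
            kv_lora_rank=config["kv_lora_rank"],
            qk_nope_head_dim=config["qk_nope_head_dim"],
            qk_rope_head_dim=config["qk_rope_head_dim"],
        )

    @property
    def qk_head_dim(self) -> int:
        # Query/key head size is the non-rotary part plus the rotary part.
        return self.qk_nope_head_dim + self.qk_rope_head_dim


def mla_softmax_scale(
    dims: MLADims, rope_scaling: Optional[Mapping[str, Any]]
) -> float:
    """Softmax scale, with the yarn correction applied when it is configured."""
    scale = dims.qk_head_dim ** -0.5
    if rope_scaling:
        mscale_all_dim = rope_scaling.get("mscale_all_dim", 0)
        if mscale_all_dim:
            m = yarn_get_mscale(rope_scaling["factor"], mscale_all_dim)
            scale *= m * m
    return scale


if __name__ == "__main__":
    # Illustrative values only (assumption), not read from a real model config.
    cfg = {"kv_lora_rank": 512, "qk_nope_head_dim": 128, "qk_rope_head_dim": 64}
    dims = MLADims.from_config(cfg)
    print(mla_softmax_scale(dims, None))
    print(mla_softmax_scale(dims, {"factor": 40.0, "mscale_all_dim": 1.0}))
```

Under this assumption, a backend that passes the plain `qk_head_dim ** -0.5` to the attention kernel while the model was trained with the yarn-corrected scale would compute slightly different attention weights, which is the kind of mismatch that shows up as an accuracy drop.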

Accuracy Test

For baseline results without flashinfer MLA enabled, see #3486.

Server

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --enable-flashinfer-mla

gsm8k

python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319
Accuracy: 0.960
Invalid: 0.000
Latency: 101.807 s
Output throughput: 1329.111 token/s

mmlu

bash benchmark/mmlu/download_data.sh
python3 benchmark/mmlu/bench_sglang.py --nsub 100 --ntrain 5 --parallel 2000
Total latency: 159.003
Average accuracy: 0.871


@Fridge003 Fridge003 changed the title from "[] Add flashinfer mla backend for deepseek v3" to "Refactor flashinfer logic for deepseek v3" Feb 22, 2025
@zhyncs zhyncs self-assigned this Feb 22, 2025
@Fridge003 Fridge003 changed the title from "Refactor flashinfer logic for deepseek v3" to "Refactor flashinfer logic for deepseek v3 and fix accuracy bug" Feb 23, 2025
@Fridge003 Fridge003 mentioned this pull request Feb 24, 2025
2 tasks
@zhyncs zhyncs merged commit b110084 into sgl-project:main Feb 24, 2025
5 of 18 checks passed
@Fridge003 Fridge003 deleted the deepseek branch February 25, 2025 06:28