
Enable flashinfer for dsv2. #3751

Closed

Conversation

foreverlms (Author) commented Feb 21, 2025

Motivation

Enable the flashinfer backend for DeepSeek-V2.

Modifications

A one-line change.

Checklist

zhyncs (Member) commented Feb 21, 2025

@foreverlms could you paste the GSM8K results (triton backend vs. flashinfer backend)?

python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319

foreverlms (Author) commented Feb 21, 2025

> @foreverlms may you paste the gsm8k result? (triton backend vs flashinfer backend)
>
> python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319

Is this for accuracy or performance? I tried some prompts and the results looked fine.

zhyncs (Member) commented Feb 21, 2025

accuracy

foreverlms (Author) commented Feb 22, 2025

Got:

Accuracy: 0.453
Invalid: 0.005
Latency: 37.627 s
Output throughput: 4522.078 token/s

Compared with the run without MLA enabled:

Accuracy: 0.659
Invalid: 0.002
Latency: 40.824 s
Output throughput: 4552.229 token/s

Launch command:

CUDA_VISIBLE_DEVICES=4 python -m sglang.launch_server  --model-path deepseek-ai/DeepSeek-V2-Lite-Chat --port 30000 --host 0.0.0.0 --trust-remote-code

There is a gap. @zhyncs

yzh119 (Collaborator) commented Feb 23, 2025

I'm investigating the accuracy issue now.

yzh119 (Collaborator) commented Feb 23, 2025

Hi @foreverlms @zhyncs, for dsv2 we should not use the hardcoded sm_scale of 1 / sqrt(192); it should be 0.1147213867929261, which is computed via yarn_get_mscale: https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite/blob/main/modeling_deepseek.py. Thanks to Zhiyao Cen from the vLLM team for the hint.

After changing sm_scale to the correct value, the accuracy becomes 0.650.
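For readers following along, the derivation above can be sketched as a few lines of Python mirroring the yarn_get_mscale logic from modeling_deepseek.py. The rope_scaling values below (factor=40, mscale_all_dim=0.707) and the head dimensions (128 no-PE + 64 rope = 192) are assumptions taken from the DeepSeek-V2-Lite configuration; check the model's config.json for the authoritative values.

```python
import math


def yarn_get_mscale(scale: float = 1.0, mscale: float = 1.0) -> float:
    """Attention magnitude scale used by YaRN rope scaling (as in modeling_deepseek.py)."""
    if scale <= 1:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0


# Head dimensions assumed from the DeepSeek-V2-Lite config.
qk_nope_head_dim = 128
qk_rope_head_dim = 64
q_head_dim = qk_nope_head_dim + qk_rope_head_dim  # 192

# rope_scaling parameters assumed from the DeepSeek-V2-Lite config.
scaling_factor = 40
mscale_all_dim = 0.707

# The hardcoded value: plain 1/sqrt(head_dim).
sm_scale = q_head_dim ** -0.5

# The correction: multiply by mscale squared.
mscale = yarn_get_mscale(scaling_factor, mscale_all_dim)
sm_scale = sm_scale * mscale * mscale

print(sm_scale)  # ~0.11472, not 1/sqrt(192) ~ 0.07217
```

With these assumed config values, the result matches the 0.1147213867929261 quoted above, which is why the plain 1/sqrt(192) scale degrades accuracy.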

foreverlms (Author) commented Feb 24, 2025

> Hi @foreverlms @zhyncs , for dsv2, we should not use hardcoded sm_scale: 1 / sqrt(192), it should be 0.1147213867929261, which is computed by yarn_get_mscale: https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite/blob/main/modeling_deepseek.py, thanks Zhiyao Cen from vllm team for the hint.
>
> After changing sm_scale to the correct value, the accuracy become 0.650

As noted in the PR you mentioned, the newly added MLA backend will handle both V2 and V3, so I will close this PR. cc @zhyncs

@foreverlms foreverlms closed this Feb 24, 2025