
Enable flashinfer for dsv2. #3751

Closed

Conversation

foreverlms (Author) commented Feb 21, 2025

Motivation

Enable the flashinfer backend for DeepSeek-V2.

Modifications

A one-line change.

Checklist

zhyncs (Member) commented Feb 21, 2025

@foreverlms could you paste the GSM8K results (triton backend vs. flashinfer backend)?

python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319

foreverlms (Author) commented Feb 21, 2025

> @foreverlms may you paste the gsm8k result? (triton backend vs flashinfer backend)
>
> python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319

Is this for accuracy or performance? I tried some prompts and the results looked fine.

zhyncs (Member) commented Feb 21, 2025

accuracy

foreverlms (Author) commented Feb 22, 2025

Got:

Accuracy: 0.453
Invalid: 0.005
Latency: 37.627 s
Output throughput: 4522.078 token/s

Compared with the run without MLA enabled:

Accuracy: 0.659
Invalid: 0.002
Latency: 40.824 s
Output throughput: 4552.229 token/s

Launch command:

CUDA_VISIBLE_DEVICES=4 python -m sglang.launch_server  --model-path deepseek-ai/DeepSeek-V2-Lite-Chat --port 30000 --host 0.0.0.0 --trust-remote-code

There is a gap. @zhyncs

yzh119 (Collaborator) commented Feb 23, 2025

I'm investigating the accuracy issue now.

yzh119 (Collaborator) commented Feb 23, 2025

Hi @foreverlms @zhyncs, for dsv2 we should not use the hardcoded sm_scale of 1 / sqrt(192); it should be 0.1147213867929261, which is computed via yarn_get_mscale: https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite/blob/main/modeling_deepseek.py. Thanks to Zhiyao Cen from the vLLM team for the hint.

After changing sm_scale to the correct value, the accuracy becomes 0.650.
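For readers following along, the derivation above can be sketched as a few lines of Python mirroring the yarn_get_mscale logic from modeling_deepseek.py. The rope_scaling values below (factor=40, mscale_all_dim=0.707) and the head dimensions (128 no-PE + 64 rope = 192) are assumptions taken from the DeepSeek-V2-Lite configuration; check the model's config.json for the authoritative values.

```python
import math


def yarn_get_mscale(scale: float = 1.0, mscale: float = 1.0) -> float:
    """Attention magnitude scale used by YaRN rope scaling (as in modeling_deepseek.py)."""
    if scale <= 1:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0


# Head dimensions assumed from the DeepSeek-V2-Lite config.
qk_nope_head_dim = 128
qk_rope_head_dim = 64
q_head_dim = qk_nope_head_dim + qk_rope_head_dim  # 192

# rope_scaling parameters assumed from the DeepSeek-V2-Lite config.
scaling_factor = 40
mscale_all_dim = 0.707

# The hardcoded value: plain 1/sqrt(head_dim).
sm_scale = q_head_dim ** -0.5

# The correction: multiply by mscale squared.
mscale = yarn_get_mscale(scaling_factor, mscale_all_dim)
sm_scale = sm_scale * mscale * mscale

print(sm_scale)  # ~0.11472, not 1/sqrt(192) ~ 0.07217
```

With these assumed config values, the result matches the 0.1147213867929261 quoted above, which is why the plain 1/sqrt(192) scale degrades accuracy.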

foreverlms (Author) commented Feb 24, 2025

> Hi @foreverlms @zhyncs , for dsv2, we should not use hardcoded sm_scale: 1 / sqrt(192), it should be 0.1147213867929261, which is computed by yarn_get_mscale: https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite/blob/main/modeling_deepseek.py, thanks Zhiyao Cen from vllm team for the hint.
>
> After changing sm_scale to the correct value, the accuracy become 0.650

As noted in the PR you mentioned, the newly added MLA backend will handle both V2 and V3, so I will close this PR. cc @zhyncs

@foreverlms foreverlms closed this Feb 24, 2025