
[Feature] support W8A8(FP8) and KV Cache FP8 for DeepSeek V2 #1156

Closed · 2 tasks done
zhyncs opened this issue Aug 19, 2024 · 3 comments
zhyncs (Member) commented Aug 19, 2024

Motivation

As titled. Make DeepSeek V2 MLA faster!

Related resources

No response

fengyang95 commented
Is there a specific timeline for this?

zhyncs (Member, Author) commented Aug 30, 2024

> Is there a specific timeline for this?

- bmm fp8 has been implemented in flashinfer-ai/flashinfer#469
- fp8 e5m2 KV cache has been implemented in #1204

DeepSeek V2 has not been adapted yet, as we are focusing on other higher-priority tasks. We expect to complete it within the next few days.
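
For reference, here is a minimal PyTorch sketch of the numerics behind these two pieces. The function names are hypothetical and this is not the SGLang/flashinfer API; the real implementations are fused CUDA kernels in flashinfer and SGLang.

```python
import torch

# Hypothetical sketch, not SGLang's actual API: illustrates the
# numerics of an fp8 e5m2 KV cache and a scaled fp8 batched matmul.

def quantize_kv_fp8_e5m2(kv: torch.Tensor) -> torch.Tensor:
    # fp8 e5m2 keeps fp16's 5 exponent bits, so the KV cache can be
    # stored with a direct cast, no per-tensor scale required.
    return kv.to(torch.float8_e5m2)

def bmm_fp8_emulated(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Emulates a batched fp8 (e4m3) matmul with per-tensor scales:
    # quantize both operands to fp8, then accumulate in fp16.
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    a_scale = (a.abs().amax() / fp8_max).clamp(min=1e-12)
    b_scale = (b.abs().amax() / fp8_max).clamp(min=1e-12)
    a_q = (a / a_scale).to(torch.float8_e4m3fn)
    b_q = (b / b_scale).to(torch.float8_e4m3fn)
    return (a_q.to(torch.float16) @ b_q.to(torch.float16)) * (a_scale * b_scale)

kv = torch.randn(2, 8, 64, dtype=torch.float16)
print(quantize_kv_fp8_e5m2(kv).dtype)                    # torch.float8_e5m2
print(bmm_fp8_emulated(kv, kv.transpose(-1, -2)).shape)  # (2, 8, 8)
```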

zhyncs (Member, Author) commented Sep 1, 2024

done

zhyncs closed this as completed on Sep 1, 2024.