Upstream merge 25 01 27 #391
Merged
gshtras merged 109 commits into main from upstream_merge_25_01_27 on Jan 28, 2025
+6,363 −1,987
Commits
Commits on Jan 21, 2025
[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (vllm-project#12222) (see the alignment sketch below)
[Documentation][AMD] Add information about prebuilt ROCm vLLM docker for perf validation purpose (vllm-project#12281)
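As context for the moe_align_block_size commit above: the op groups the token slots produced by top-k routing by expert and pads each expert's group up to a multiple of the kernel block size, so the fused MoE matmul can run over fixed-size blocks. Below is a minimal CPU-side sketch of that bookkeeping, assuming the upstream inputs (topk_ids, num_experts, block_size); the function and variable names are illustrative, and the real kernel does this work on-device. #12222 is about making this step fast for large expert counts and for CUDA graph capture.

```python
import torch

def moe_align_block_size_ref(topk_ids: torch.Tensor,
                             num_experts: int,
                             block_size: int):
    """CPU reference sketch: group token slots by routed expert and pad
    each group to a multiple of block_size (illustrative, not the kernel)."""
    flat = topk_ids.flatten()
    # A stable sort puts all slots routed to the same expert together.
    _, order = torch.sort(flat, stable=True)
    counts = torch.bincount(flat, minlength=num_experts)
    # Round each expert's slot count up to a multiple of block_size.
    padded = (counts + block_size - 1) // block_size * block_size
    # Padding slots carry a sentinel id one past the last real slot.
    sentinel = flat.numel()
    sorted_token_ids = torch.full((int(padded.sum()),), sentinel,
                                  dtype=torch.long)
    # One expert id per block of block_size slots.
    expert_ids = torch.repeat_interleave(torch.arange(num_experts),
                                         padded // block_size)
    src = dst = 0
    for e in range(num_experts):
        n = int(counts[e])
        sorted_token_ids[dst:dst + n] = order[src:src + n]
        src += n
        dst += int(padded[e])
    return sorted_token_ids, expert_ids
```

In the actual kernel the output buffers are allocated at worst-case size up front, so their shapes do not depend on a particular batch's routing; that is what keeps the op CUDA-graph friendly.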
Commits on Jan 23, 2025
[AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD (vllm-project#12282) (see the scaled-matmul sketch below)
[BugFix] Fix parameter names and process_after_weight_loading for W4A16 MoE Group Act Order (vllm-project#11528)
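As context for the TritonScaledMMLinearKernel commit above: a scaled-MM linear layer multiplies int8-quantized activations and weights with int32 accumulation, then dequantizes with the two quantization scales; a Triton kernel fuses the same math tile by tile, giving ROCm an int8 path independent of kernels that are unavailable there. The sketch below shows only the reference semantics; the names are illustrative, not vLLM's API.

```python
import torch

def scaled_mm_ref(a_q: torch.Tensor,      # int8 activations, [M, K]
                  b_q: torch.Tensor,      # int8 weights, [K, N]
                  scale_a: torch.Tensor,  # per-tensor scalar or [M, 1]
                  scale_b: torch.Tensor,  # per-tensor scalar or [1, N]
                  out_dtype: torch.dtype = torch.float16) -> torch.Tensor:
    """Reference semantics of a scaled int8 matmul. A Triton kernel
    computes the same result one tile at a time, keeping the int32
    accumulator in registers."""
    # Accumulate in int32 so int8 products cannot overflow.
    acc = a_q.to(torch.int32) @ b_q.to(torch.int32)
    # Dequantize: each element picks up the (broadcast) product of its
    # row's activation scale and its column's weight scale.
    return (acc.to(torch.float32) * scale_a * scale_b).to(out_dtype)

# Example with per-tensor scales:
a = torch.randint(-128, 127, (4, 8), dtype=torch.int8)
b = torch.randint(-128, 127, (8, 3), dtype=torch.int8)
out = scaled_mm_ref(a, b, torch.tensor(0.02), torch.tensor(0.01))
```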
Commits on Jan 24, 2025
[Bugfix][Kernel] FA3 Fix - RuntimeError: This flash attention build only supports pack_gqa (for build size reasons). (vllm-project#12405)
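As context for the FA3 commit above: in grouped-query attention several query heads share one KV head, and FlashAttention-3's pack_gqa mode handles that grouping inside the kernel instead of requiring replicated K/V heads. The sketch below shows only the reference semantics in plain PyTorch (names illustrative); the fix in #12405 addresses the RuntimeError raised when the attention call does not match what a size-reduced FA3 build supports.

```python
import torch
import torch.nn.functional as F

def gqa_ref(q: torch.Tensor,   # [batch, n_q_heads, seq, head_dim]
            k: torch.Tensor,   # [batch, n_kv_heads, seq, head_dim]
            v: torch.Tensor) -> torch.Tensor:
    """Reference grouped-query attention: replicate each KV head across
    its group of query heads. A packed-GQA kernel computes the same
    result without materializing the replicated K/V tensors."""
    assert q.shape[1] % k.shape[1] == 0, "n_q_heads must divide evenly"
    group = q.shape[1] // k.shape[1]  # query heads per KV head
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v)
```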