This repository has been archived by the owner on Oct 11, 2024. It is now read-only.
Upstream sync 2024 06 08 #288
Merged
andy-neuma merged 101 commits into main from upstream-sync-2024-06-08 on Jun 10, 2024
+17,573 -9,003
Commits
Commits on Jun 8, 2024
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (vllm-project#4799)
[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (vllm-project#4837)
[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5) (vllm-project#5136)
Revert "[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)" (vllm-project#5149)
Commits on Jun 9, 2024