This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Upstream sync 2024 06 08 #288

Merged
merged 101 commits into from
Jun 10, 2024


robertgshaw2-redhat
Collaborator

@robertgshaw2-redhat robertgshaw2-redhat commented Jun 8, 2024

Upstream sync 2024 06 08 (#288) - ties to v0.4.3 of vllm-upstream

SUMMARY:

  • Merge commits from vllm-project@f68470e to vllm-project@1197e02
  • Our GCP test instances do not have gcc or clang installed. The Triton kernels rely on gcc or clang for JIT compilation, so I disabled the affected tests for now, but we need to get a compiler installed (cc @andy-neuma). All are marked with:
@pytest.mark.skip("C compiler not installed in NM automation. "
                  "This codepath follows a triton pathway, which "
                  "JITs using clang or gcc. Since neither are installed "
                  "in our test instances, we need to skip this for now.")
  • Cherry-picked in the changes associated with Fp8 weight format from @mgoin
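
As a possible follow-up to the unconditional skips above, the check could be made conditional so the tests re-enable themselves once a compiler is installed. The following is a minimal sketch under assumptions, not what this PR does: `has_c_compiler` and `requires_c_compiler` are hypothetical names, and it assumes that finding `gcc` or `clang` on `PATH` is enough for Triton to JIT.

```python
import shutil

import pytest

# Assumption: Triton needs one of these compilers on PATH to JIT its kernels.
C_COMPILERS = ("gcc", "clang")


def has_c_compiler(candidates=C_COMPILERS):
    """Return True if at least one candidate compiler is found on PATH."""
    return any(shutil.which(name) is not None for name in candidates)


# Hypothetical reusable marker; not part of vLLM or nm-vllm.
requires_c_compiler = pytest.mark.skipif(
    not has_c_compiler(),
    reason="C compiler not installed. This codepath follows a triton "
           "pathway, which JITs using clang or gcc.",
)


@requires_c_compiler
def test_triton_codepath_placeholder():
    # Placeholder body standing in for a real Triton kernel test.
    assert has_c_compiler()
```

With a marker like this, the affected tests would be skipped on instances without a compiler and run everywhere else, with no need to edit each test again later.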

Note that vllm-project@f68470e is NOT included in this merge.

COMPARE vs UPSTREAM:

alexm-redhat and others added 30 commits June 8, 2024 16:39
Allow dummy load format for fp8,
torch.uniform_ doesn't support FP8 at the moment

Co-authored-by: Mor Zusman <[email protected]>
Pass the CUDA stream into the CUTLASS GEMMs, to avoid future issues with CUDA graphs
…ct#4893)

The 2nd PR for vllm-project#4532.

This PR supports loading FP8 kv-cache scaling factors from a FP8 checkpoint (with .kv_scale parameter).
Co-authored-by: Varun Sundar Rabindranath <[email protected]>
Co-authored-by: Varun Sundar Rabindranath <[email protected]>
@andy-neuma andy-neuma self-requested a review June 10, 2024 17:25
Member

@andy-neuma andy-neuma left a comment

thanks

@andy-neuma andy-neuma merged commit db9ed90 into main Jun 10, 2024
49 of 57 checks passed
robertgshaw2-redhat added a commit that referenced this pull request Jun 11, 2024
Upstream sync 2024 06 11
(#288)

SUMMARY:

* Merge commits from vllm-project@1197e02 to vllm-project@114332b
* Our GCP test instances do not have gcc or clang installed. The Triton kernels rely on gcc or clang for JIT compilation, so the affected tests are still disabled (cc @andy-neuma). All are marked with:
```python 
@pytest.mark.skip("C compiler not installed in NM automation. "
                  "This codepath follows a triton pathway, which "
                  "JITs using clang or gcc. Since neither are installed "
                  "in our test instances, we need to skip this for now.")
```

Note that vllm-project@1197e02 is NOT included in this merge.

COMPARE vs UPSTREAM:


https://github.com/neuralmagic/nm-vllm/compare/upstream-sync-2024-06-11..vllm-project:vllm:v0.5.0

---------

Signed-off-by: Ye Cao <[email protected]>
Signed-off-by: kevin <[email protected]>
Co-authored-by: Daniele <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Varun Sundar Rabindranath <[email protected]>
Co-authored-by: Varun Sundar Rabindranath <[email protected]>
Co-authored-by: Ye Cao <[email protected]>
Co-authored-by: Nadav Shmayovits <[email protected]>
Co-authored-by: chenqianfzh <[email protected]>
Co-authored-by: Zhuohan Li <[email protected]>
Co-authored-by: Daniil Arapov <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Simon Mo <[email protected]>
Co-authored-by: Avinash Raj <[email protected]>
Co-authored-by: Divakar Verma <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Antoni Baum <[email protected]>
Co-authored-by: Yuan <[email protected]>
Co-authored-by: Kaiyang Chen <[email protected]>
Co-authored-by: Kevin H. Luu <[email protected]>
Co-authored-by: Breno Faria <[email protected]>
Co-authored-by: Toshiki Kataoka <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
Co-authored-by: afeldman-nm <[email protected]>
Co-authored-by: zifeitong <[email protected]>
Co-authored-by: Jie Fu (傅杰) <[email protected]>
Co-authored-by: Li, Jiang <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: tomeras91 <[email protected]>
Co-authored-by: Cody Yu <[email protected]>
Co-authored-by: DriverSong <[email protected]>
Co-authored-by: qiujiawei9 <[email protected]>
Co-authored-by: Philipp Moritz <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Co-authored-by: Alex Wu <[email protected]>
Co-authored-by: Breno Faria <[email protected]>
Co-authored-by: liuyhwangyh <[email protected]>
Co-authored-by: mulin.lyh <[email protected]>
Co-authored-by: Matthew Goldey <[email protected]>
Co-authored-by: Jie Fu (傅杰) <[email protected]>
Co-authored-by: Itay Etelis <[email protected]>
Co-authored-by: limingshu <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
Co-authored-by: Calvinn Ng <[email protected]>
Co-authored-by: team <[email protected]>
Co-authored-by: Cheng Li <[email protected]>
Co-authored-by: Benjamin Kitor <[email protected]>
Co-authored-by: Hongxia Yang <[email protected]>
Co-authored-by: bnellnm <[email protected]>
Co-authored-by: Bla_ckB <[email protected]>
Co-authored-by: Roger Wang <[email protected]>