This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Upstream sync 2024 06 16 #307

Merged
merged 58 commits into from
Jun 20, 2024


@robertgshaw2-redhat (Collaborator) commented Jun 16, 2024

Upstream sync 2024 06 16 (#310) - v0.5.0.post of vllm

SUMMARY:

  • Merge commits from vllm-project@8f89d72 to vllm-project@0f0d8bc
  • Limit numpy to < 2.0
  • Updated `run-tests` to print the name of each test before it runs (to identify which test hangs in automation)
  • Disable usage stats in automation
  • Temporarily disable ENTRYPOINTS (to be re-enabled in Andy's single-whl PR)
  • Updated `run-tests` to treat pytest exit code 5 as a pass (exit code 5 means pytest collected no tests)
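The two `run-tests` changes above (logging each test's name before it runs, and not failing on pytest exit code 5) can be sketched in Python as follows. This is a hypothetical re-expression of the logic, not the actual bash script; `run_all`, `overall_success`, and the injected `runner` callable are illustrative names. pytest documents exit code 0 as "all tests passed" and exit code 5 as "no tests were collected".

```python
from typing import Callable, Dict, List

# pytest exit codes relevant here: 0 = all collected tests passed,
# 5 = no tests were collected.
PYTEST_OK = 0
PYTEST_NO_TESTS_COLLECTED = 5


def run_all(tests: List[str], runner: Callable[[str], int]) -> Dict[str, str]:
    """Run each test file via `runner` and classify its pytest exit code.

    `runner` is a hypothetical hook; a real driver might use
    lambda path: subprocess.run(["pytest", path]).returncode
    """
    results: Dict[str, str] = {}
    for test in tests:
        # Log the test name *before* running it, so a hang can be attributed.
        print(f"RUNNING TEST: {test}", flush=True)
        code = runner(test)
        if code == PYTEST_OK:
            results[test] = "PASSED"
        elif code == PYTEST_NO_TESTS_COLLECTED:
            # Nothing was collected (common when whole modules are skipped).
            results[test] = "SKIPPED"
        else:
            results[test] = "FAILED"
    return results


def overall_success(results: Dict[str, str]) -> bool:
    # Only FAILED entries fail the run; SKIPPED (exit code 5) counts as a pass.
    return all(status != "FAILED" for status in results.values())


# Usage with a fake runner instead of invoking pytest.
fake_codes = {"tests/test_a.py": 0, "tests/test_b.py": 5}
summary = run_all(list(fake_codes), fake_codes.__getitem__)
```

Injecting the runner keeps the classification logic testable without spawning pytest.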

Note that vllm-project@8f89d72 is NOT included in this merge.

COMPARE vs UPSTREAM:

https://github.com/neuralmagic/nm-vllm/compare/upstream-sync-2024-06-16..vllm-project:vllm:v0.5.0.post1

mgoin and others added 30 commits June 16, 2024 19:37
Inspired by vllm-project#5146, this PR improves the FP8 quantization kernel by vectorizing data transfers to make better use of memory bandwidth. A microbenchmark shows a 1.0x-1.5x speedup, with the largest gains at large hidden sizes.

In detail, we applied three optimizations:

- Use the inverted scale so that most divisions become multiplications.
- Unroll the loop 4x to improve instruction-level parallelism (ILP).
- Use 4-wide vectorized accesses to transfer data between HBM and SRAM.
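A scalar Python sketch of the first two optimizations (a precomputed inverse scale and a 4x-unrolled main loop with a tail loop) is shown below. It is illustrative only: the real kernel is CUDA, and Python cannot express vectorized HBM loads. `quantize_fp8_sketch` is a hypothetical name, and the sketch assumes the E4M3 FP8 format with a maximum finite magnitude of 448; it scales and saturates values but does not model FP8 rounding.

```python
from typing import List

FP8_E4M3_MAX = 448.0  # largest finite magnitude of FP8 E4M3 (float8_e4m3fn)


def _saturate(v: float) -> float:
    # Clamp into the representable FP8 E4M3 range.
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v))


def quantize_fp8_sketch(x: List[float], scale: float) -> List[float]:
    inv_scale = 1.0 / scale  # one division, hoisted out of the loop
    out = [0.0] * len(x)
    n_main = len(x) - len(x) % 4
    # Main loop, manually unrolled by 4; on a GPU each group of 4 would
    # correspond to one vectorized load and store.
    for i in range(0, n_main, 4):
        out[i] = _saturate(x[i] * inv_scale)
        out[i + 1] = _saturate(x[i + 1] * inv_scale)
        out[i + 2] = _saturate(x[i + 2] * inv_scale)
        out[i + 3] = _saturate(x[i + 3] * inv_scale)
    # Tail loop for the remaining 0-3 elements.
    for i in range(n_main, len(x)):
        out[i] = _saturate(x[i] * inv_scale)
    return out
```

The tail loop is what lets the unrolled body assume a multiple of 4 elements without reading past the end of the input.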
…#4990)

Signed-off-by: Travis Johnson <[email protected]>
Co-authored-by: Sanger Steel <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
…-project#5293)

[Core][Distributed] add coordinator to reduce code duplication in tp and pp (vllm-project#5293)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
…#5497)

Tune Qwen2-57B-A14B configs based on vllm-project#4921

Throughput performance
Command: `python benchmarks/benchmark_throughput.py --model=Qwen/Qwen2-57B-A14B-Instruct --input-len 1000 --output-len 50 -tp 2`

A100 GPU:

| Benchmark | No config | With this PR |
|---|---|---|
| tp=2 | 10.53 requests/s, 11058.17 tokens/s | 12.47 requests/s, 13088.57 tokens/s |
| tp=4 | 17.77 requests/s, 18662.95 tokens/s | 20.20 requests/s, 21212.32 tokens/s |
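The relative improvement in the A100 table can be computed directly from the reported numbers; the short script below does only that arithmetic. The dictionary layout and the `speedups` helper are illustrative. By these figures the tuned configs give roughly 1.18x throughput at tp=2 and roughly 1.14x at tp=4, on both requests/s and tokens/s.

```python
# Throughput numbers copied from the A100 table: (requests/s, tokens/s).
RESULTS = {
    "tp=2": {"no_config": (10.53, 11058.17), "with_pr": (12.47, 13088.57)},
    "tp=4": {"no_config": (17.77, 18662.95), "with_pr": (20.20, 21212.32)},
}


def speedups(results):
    """Return {tp: (request_speedup, token_speedup)} as with_pr / no_config."""
    out = {}
    for tp, r in results.items():
        base_req, base_tok = r["no_config"]
        new_req, new_tok = r["with_pr"]
        out[tp] = (new_req / base_req, new_tok / base_tok)
    return out


for tp, (req, tok) in speedups(RESULTS).items():
    print(f"{tp}: {req:.2f}x requests/s, {tok:.2f}x tokens/s")
```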
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: zifeitong <[email protected]>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
@andy-neuma (Member) left a comment

thanks

@@ -109,6 +109,10 @@ do
LOCAL_SUCCESS=0
RESULT_XML=$(echo ${TEST} | sed -e "s/${TEST_DIR}/${RESULTS_DIR}/" | sed -e "s/.py/.xml/")

# report which test is being run
# (in CI, if a test hangs, this logs *which* test is running *before* it hangs)
echo "RUNNING TEST: ${TEST}"
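The `RESULT_XML` line maps a test file under `TEST_DIR` to an XML result path under `RESULTS_DIR`. A hypothetical Python equivalent is sketched below (`result_xml_path` is an illustrative name, not part of the script). One behavioral difference is deliberate: in the sed expression `s/.py/.xml/`, the unescaped `.` matches any character and the pattern is not anchored, so for a path such as `tests/test_copy.py` sed rewrites the `opy` in `copy` rather than the `.py` suffix. The sketch replaces only the file suffix.

```python
from pathlib import PurePosixPath


def result_xml_path(test: str, test_dir: str, results_dir: str) -> str:
    """Map <test_dir>/.../name.py to <results_dir>/.../name.xml (hypothetical helper)."""
    # relative_to raises ValueError if `test` is not under `test_dir`.
    rel = PurePosixPath(test).relative_to(test_dir)
    # with_suffix swaps only the final extension, never text inside the name.
    return str(PurePosixPath(results_dir) / rel.with_suffix(".xml"))
```

For comparison, a bash-only fix would escape and anchor the pattern, e.g. `sed -e 's/\.py$/.xml/'`.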
Review comment (Member):

nice

hopefully, we can get rid of this script soon.


__all__ = [
"__version__",
Review comment (Member):

where did the version's value go?

Reply (Collaborator, author):

It's moved to version.py and then imported.
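The layout described in the reply (the version string lives in its own module and the package re-exports it) can be sketched as below. The package name `demo_pkg` and the version `"0.0.0"` are placeholders, not vLLM's actual files; the sketch writes both modules into a temporary directory so it runs standalone.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    pkg = Path(root) / "demo_pkg"  # hypothetical package name
    pkg.mkdir()
    # version.py is the single place the version string is defined.
    (pkg / "version.py").write_text('__version__ = "0.0.0"\n')  # placeholder
    # __init__.py imports it and keeps it in __all__.
    (pkg / "__init__.py").write_text(
        'from .version import __version__\n\n__all__ = ["__version__"]\n'
    )
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        demo_pkg = importlib.import_module("demo_pkg")
    finally:
        sys.path.remove(root)

print(demo_pkg.__version__)
```

Keeping the value in a separate module lets build tooling read the version without importing the whole package.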

@andy-neuma (Member) left a comment

thanks

echo "=== PASSED TEST: ${TEST} ==="
# if a file does not run any tests, pytest reports exit code 5;
# since we skip full modules in our skipping strategy, this is common
elif [[ $LOCAL_SUCCESS == 5 ]]; then
Review comment (Member):

yeah ... maybe we just echo "=== SKIPPED TEST: ${TEST} ===" ?

Reply (Collaborator, author):

updated

@robertgshaw2-redhat robertgshaw2-redhat merged commit dd39914 into main Jun 20, 2024
36 of 37 checks passed
@robertgshaw2-redhat robertgshaw2-redhat deleted the upstream-sync-2024-06-16 branch June 20, 2024 18:11
derekk-nm pushed a commit that referenced this pull request Jun 24, 2024
Upstream sync 2024 06 16 (#310) - v0.5.0.post of vllm

SUMMARY:

* Merge commits from vllm-project@8f89d72 to vllm-project@0f0d8bc
* Limit numpy to < 2.0
* Updated `run-tests` to print name of the test that is about to run (for debugging what hangs in automation)
* Disable usage stats in automation
* Temporarily disable ENTRYPOINTS (to be re-enabled in Andy's single whl PR)
* Updated `run-tests` to consider exit code 5 from pytest to be a pass (since exit code 5 from pytest means that we did not run any tests)

Note that vllm-project@8f89d72 is NOT included in this merge.

COMPARE vs UPSTREAM:

https://github.com/neuralmagic/nm-vllm/compare/upstream-sync-2024-06-16..vllm-project:vllm:v0.5.0.post1

---------

Signed-off-by: kevin <[email protected]>
Signed-off-by: Travis Johnson <[email protected]>
Signed-off-by: Wang, Yi A <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Simon Mo <[email protected]>
Co-authored-by: SangBin Cho <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
Co-authored-by: Li, Jiang <[email protected]>
Co-authored-by: Kevin H. Luu <[email protected]>
Co-authored-by: Cody Yu <[email protected]>
Co-authored-by: Arthur Kim <[email protected]>
Co-authored-by: Travis Johnson <[email protected]>
Co-authored-by: Sanger Steel <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Wang, Yi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: wenyujin333 <[email protected]>
Co-authored-by: Jianan Gu <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: zifeitong <[email protected]>
Co-authored-by: Philipp Moritz <[email protected]>
Co-authored-by: Antoni Baum <[email protected]>
Co-authored-by: Jie Fu (傅杰) <[email protected]>
Co-authored-by: Allen.Dou <[email protected]>