Upstream sync 2024 06 16 #307

robertgshaw2-redhat · 2024-06-16T19:53:49Z

Upstream sync 2024 06 16 (#310) - v0.5.0.post of vllm

SUMMARY:

* Merge commits from vllm-project@8f89d72 to vllm-project@0f0d8bc
* Limit numpy to < 2.0
* Updated `run-tests` to print name of the test that is about to run (for debugging what hangs in automation)
* Disable usage stats in automation
* Temporarily disable ENTRYPOINTS (to be re-enabled in Andy's single whl PR)
* Updated `run-tests` to consider exit code 5 from pytest to be a pass (since exit code 5 from pytest means that we did not run any tests)

Note that vllm-project@8f89d72 is NOT included in this merge.

COMPARE vs UPSTREAM:

https://github.com/neuralmagic/nm-vllm/compare/upstream-sync-2024-06-16..vllm-project:vllm:v0.5.0.post1

Inspired by vllm-project#5146, this PR improves the FP8 quantize kernel by vectorizing data transfer to better utilize memory bandwidth. A microbenchmark shows that the improved kernel achieves a 1.0x-1.5x speedup (especially when the hidden size is large). In detail, we applied 3 optimizations:

- Use an inverted scale so that most divisions become multiplications.
- Unroll the loop 4 times to improve ILP.
- Use vectorized transfers of width 4 to move data between HBM and SRAM.
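The inverted-scale and unroll-by-4 ideas can be illustrated with a small pure-Python sketch. This is not the CUDA kernel from the upstream PR: the function names are hypothetical, the clamp bound assumes the float8 e4m3fn format (largest finite value 448), and no actual FP8 rounding is performed.

```python
FP8_E4M3_MAX = 448.0  # assumption: target format is float8 e4m3fn


def clamp(value: float) -> float:
    """Saturate a value to the representable FP8 range."""
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, value))


def quantize_by_division(values, scale):
    """Baseline: one division per element."""
    return [clamp(v / scale) for v in values]


def quantize_by_inverted_scale(values, scale):
    """Invert the scale once, then multiply; handle 4 elements per step."""
    inv_scale = 1.0 / scale  # the only division, hoisted out of the loop
    out = [0.0] * len(values)
    n_vec = (len(values) // 4) * 4

    # Main loop: 4 elements per iteration, mimicking a width-4 unroll.
    for i in range(0, n_vec, 4):
        out[i] = clamp(values[i] * inv_scale)
        out[i + 1] = clamp(values[i + 1] * inv_scale)
        out[i + 2] = clamp(values[i + 2] * inv_scale)
        out[i + 3] = clamp(values[i + 3] * inv_scale)

    # Tail loop: leftover elements when the length is not a multiple of 4.
    for i in range(n_vec, len(values)):
        out[i] = clamp(values[i] * inv_scale)
    return out
```

With a power-of-two scale such as 0.5, both functions return identical results. For other scales, multiplying by the reciprocal can differ from division in the last bit, which is usually an acceptable trade-off ahead of a lossy 8-bit cast.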

…#5497)

Tune Qwen2-57B-A14B configs based on vllm-project#4921

Throughput Performance

command: python benchmarks/benchmark_throughput.py --model=Qwen/Qwen2-57B-A14B-Instruct --input-len 1000 --output-len 50 -tp 2

A100 GPU

| benchmark | no config | w/ PR |
| --- | --- | --- |
| tp=2 | 10.53 requests/s, 11058.17 tokens/s | 12.47 requests/s, 13088.57 tokens/s |
| tp=4 | 17.77 requests/s, 18662.95 tokens/s | 20.20 requests/s, 21212.32 tokens/s |
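To put the throughput figures above in relative terms, a minimal sketch follows. It uses only the requests/s values quoted in the benchmark; the dictionary layout and function names are illustrative assumptions, not part of the benchmark script.

```python
# Requests/s figures copied from the A100 benchmark above.
RESULTS = {
    "tp=2": {"no_config": 10.53, "with_pr": 12.47},
    "tp=4": {"no_config": 17.77, "with_pr": 20.20},
}


def speedup(baseline: float, tuned: float) -> float:
    """Ratio of tuned throughput to baseline throughput."""
    return tuned / baseline


def summarize(results):
    """Map each configuration name to its speedup ratio."""
    return {
        name: speedup(row["no_config"], row["with_pr"])
        for name, row in results.items()
    }


if __name__ == "__main__":
    for name, ratio in summarize(RESULTS).items():
        print(f"{name}: {ratio:.3f}x")
```

The tuned configs give roughly an 18% gain at tp=2 and a 14% gain at tp=4 on these numbers.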

andy-neuma

thanks

andy-neuma · 2024-06-18T16:20:58Z

.github/scripts/run-tests

@@ -109,6 +109,10 @@ do
    LOCAL_SUCCESS=0
    RESULT_XML=$(echo ${TEST} | sed -e "s/${TEST_DIR}/${RESULTS_DIR}/" | sed -e "s/.py/.xml/")

+    # report which test is being run
+    # (in CI, if a test hangs, this logs *which* test is running *before* it hangs)
+    echo "RUNNING TEST: ${TEST}"


nice

hopefully, we can get rid of this script soon.

andy-neuma · 2024-06-18T16:29:47Z

vllm/__init__.py


 __all__ = [
+    "__version__",


where did the version's value go?

It's moved to version.py and then imported
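The pattern described in this reply (keep the version string in its own module and re-export it from the package's `__init__.py`) can be sketched as follows. The package name `demo_pkg` and the helper functions are hypothetical; the sketch writes a throwaway package to a temporary directory and does not reproduce vLLM's actual files.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path, version: str) -> None:
    """Write a tiny package whose version lives in version.py."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "version.py").write_text(f'__version__ = "{version}"\n')
    (pkg / "__init__.py").write_text(
        "from .version import __version__\n"
        "\n"
        '__all__ = ["__version__"]\n'
    )


def load_demo_version(version: str = "0.0.0") -> str:
    """Import the throwaway package and read the re-exported version."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp), version)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_pkg")
            return module.__version__
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg", None)
            sys.modules.pop("demo_pkg.version", None)
```

Keeping the value in a separate module lets build tooling read it without importing the whole package, while `package.__version__` keeps working for users.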

andy-neuma

thanks

andy-neuma · 2024-06-20T14:48:38Z

.github/scripts/run-tests

+        echo "=== PASSED TEST: ${TEST} ==="
+    # if a file does not run any tests, pytest reports exit code of 5
+    # since we skip full modules in our skipping strategy, this is common
+    elif [[ $LOCAL_SUCCESS == 5 ]]; then


yeah ... maybe we just echo "=== SKIPPED TEST: ${TEST} ===" ?
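The exit-code handling discussed in this thread can be sketched in Python (rather than the script's bash) for illustration. pytest does use exit code 5 to mean that no tests were collected; the function names and the choice to label that case SKIPPED, as suggested above, are assumptions rather than the merged script's behavior.

```python
PYTEST_OK = 0
PYTEST_NO_TESTS_COLLECTED = 5  # pytest's exit code when no tests were run


def classify_exit_code(exit_code: int) -> str:
    """Map a pytest exit code to a status label."""
    if exit_code == PYTEST_OK:
        return "PASSED"
    if exit_code == PYTEST_NO_TESTS_COLLECTED:
        # Whole-module skips leave nothing to collect; not a failure.
        return "SKIPPED"
    return "FAILED"


def status_line(test: str, exit_code: int) -> str:
    """Format a log line in the style used by the run-tests output."""
    return f"=== {classify_exit_code(exit_code)} TEST: {test} ==="


def overall_success(exit_codes) -> bool:
    """The run succeeds if no individual test file failed."""
    return all(classify_exit_code(code) != "FAILED" for code in exit_codes)
```

Reporting the no-tests case under its own label keeps it visible in the logs while still counting it toward a passing run.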

Upstream sync 2024 06 16 (#310) - v0.5.0.post of vllm

SUMMARY:
* Merge commits from vllm-project@8f89d72 to vllm-project@0f0d8bc
* Limit numpy to < 2.0
* Updated `run-tests` to print name of the test that is about to run (for debugging what hangs in automation)
* Disable usage stats in automation
* Temporarily disable ENTRYPOINTS (to be re-enabled in Andy's single whl PR)
* Updated `run-tests` to consider exit code 5 from pytest to be a pass (since exit code 5 from pytest means that we did not run any tests)

Note that vllm-project@8f89d72 is NOT included in this merge.

COMPARE vs UPSTREAM:
https://github.com/neuralmagic/nm-vllm/compare/upstream-sync-2024-06-16..vllm-project:vllm:v0.5.0.post1

---------

Signed-off-by: kevin <[email protected]>
Signed-off-by: Travis Johnson <[email protected]>
Signed-off-by: Wang, Yi A <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Simon Mo <[email protected]>
Co-authored-by: SangBin Cho <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
Co-authored-by: Li, Jiang <[email protected]>
Co-authored-by: Kevin H. Luu <[email protected]>
Co-authored-by: Cody Yu <[email protected]>
Co-authored-by: Arthur Kim <[email protected]>
Co-authored-by: Travis Johnson <[email protected]>
Co-authored-by: Sanger Steel <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Wang, Yi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: wenyujin333 <[email protected]>
Co-authored-by: Jianan Gu <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: zifeitong <[email protected]>
Co-authored-by: Philipp Moritz <[email protected]>
Co-authored-by: Antoni Baum <[email protected]>
Co-authored-by: Jie Fu (傅杰) <[email protected]>
Co-authored-by: Allen.Dou <[email protected]>

mgoin and others added 30 commits June 16, 2024 19:37

[CI/Build] Add is_quant_method_supported to control quantization test configurations (vllm-project#5253)

0b02164

Revert "[CI/Build] Add is_quant_method_supported to control quantization test configurations" (vllm-project#5463)

ef29fa3

[CI] Upgrade codespell version. (vllm-project#5381)

d059698

[Hardware] Initial TPU integration (vllm-project#5292)

134aadc

[Bugfix] Add device assertion to TorchSDPA (vllm-project#5402)

2ca01e6

[ci] Add AMD, Neuron, Intel tests for AWS CI and turn off default soft fail for GPU tests (vllm-project#5464)

2226d70

Signed-off-by: kevin <[email protected]>

[Bugfix] TYPE_CHECKING for MultiModalData (vllm-project#5444)

b465102

[Frontend] [Core] Support for sharded tensorized models (vllm-project#4990)

c20bc35

Signed-off-by: Travis Johnson <[email protected]> Co-authored-by: Sanger Steel <[email protected]> Co-authored-by: Roger Wang <[email protected]>

[misc] add hint for AttributeError (vllm-project#5462)

2232e17

[Doc] Update debug docs (vllm-project#5438)

a658440

[Bugfix] Fix typo in scheduler.py (requeset -> request) (vllm-project#5470)

470363e

[Frontend] Add "input speed" to tqdm postfix alongside output speed (vllm-project#5425)

d2bfa2a

[Bugfix] Fix wrong multi_modal_input format for CPU runner (vllm-project#5451)

365e96b

[Core][Distributed] code deduplication in tp&pp with coordinator(vllm-project#5293)

04a28ad

[Core][Distributed] add coordinator to reduce code duplication in tp and pp (vllm-project#5293)

[ci] Use sccache to build images (vllm-project#5419)

14e332c

Signed-off-by: kevin <[email protected]>

[Bugfix]if the content is started with ":"(response of ping), client should i… (vllm-project#5303)

9c2ff81

Signed-off-by: Wang, Yi A <[email protected]> Co-authored-by: Roger Wang <[email protected]>

[Kernel] w4a16 support for compressed-tensors (vllm-project#5385)

0f53365

Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>

[CI/Build][REDO] Add is_quant_method_supported to control quantization test configurations (vllm-project#5466)

59d5682

[Hardware][Intel] Optimize CPU backend and add more performance tips (vllm-project#4971)

45e1f25

Co-authored-by: Jianan Gu <[email protected]>

[Docs] Add 4th meetup slides (vllm-project#5509)

d7edd60

[Misc] Add vLLM version getter to utils (vllm-project#5098)

c44efdc

[CI/Build] Simplify OpenAI server setup in tests (vllm-project#5100)

b9c0824

[Doc] Update LLaVA docs (vllm-project#5437)

a689156

Co-authored-by: Roger Wang <[email protected]>

[MISC] Remove FP8 warning (vllm-project#5472)

2752570

Co-authored-by: Philipp Moritz <[email protected]>

Seperate dev requirements into lint and test (vllm-project#5474)

fa895d1

Revert "[Core] Remove unnecessary copies in flash attn backend" (vllm-project#5478)

93072fb

[misc] fix format.sh (vllm-project#5511)

f4eb11e

DamonFool and others added 12 commits June 16, 2024 19:47

[Hardware][Intel] Support CPU inference with AVX2 ISA (vllm-project#5452)

61a038e

[Misc] Fix arg names in quantizer script (vllm-project#5507)

d72246c

bump version to v0.5.0.post1 (vllm-project#5522)

6b6bd26

fix up reqs

69f4dc1

format

a266560

fix version

50991ad

Merge branch 'main' into upstream-sync-2024-06-16

32ff52c

added skips to distributed tests

a580c65

added skip logic to newly added files

78f359d

committing formats

20abc3f

pin numpy

b2df72d

report which test is running in CI

62e346c

andy-neuma approved these changes Jun 18, 2024

robertgshaw2-redhat and others added 12 commits June 18, 2024 22:29

skip test llava next

12581a2

skip llava next

cca46d2

format

33ace95

format

dedebe7

disable usage tracking

a5cc26b

fixed

9ace1dc

format

c42e71a

Merge branch 'main' into upstream-sync-2024-06-16

38f6e88

format

3e016f0

format

ffa0be7

report test status in run-tests

ee0fc0a

count exit code 5 as 0

f6f7c0f

andy-neuma approved these changes Jun 20, 2024

fix typo

91ff8fc

robertgshaw2-redhat merged commit dd39914 into main Jun 20, 2024
36 of 37 checks passed

robertgshaw2-redhat deleted the upstream-sync-2024-06-16 branch June 20, 2024 18:11
