…st configurations (vllm-project#5253)
…ation test configurations" (vllm-project#5463)
…t fail for GPU tests (vllm-project#5464) Signed-off-by: kevin <[email protected]>
Inspired by vllm-project#5146, this PR improves the FP8 quantize kernel by vectorizing data transfer to better utilize memory bandwidth. A microbenchmark shows that the improved kernel achieves a 1.0x-1.5x speedup (especially when the hidden size is large). In detail, we applied 3 optimizations:
- Use an inverted scale so that most divisions become multiplications.
- Unroll the loop 4 times to improve ILP.
- Use 4-wide vectorized accesses to transfer data between HBM and SRAM.
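The inverted-scale optimization can be illustrated outside CUDA. What follows is a minimal Python sketch, not the kernel from the PR: it computes the reciprocal of the scale once and multiplies every element by it, then clamps to the FP8 range. The function name and the constant 448.0 (assumed here to be the largest finite value of the E4M3 format) are illustrative assumptions; rounding to actual FP8 bit patterns, loop unrolling, and vectorized loads are omitted.

```python
# Assumed range limit: largest finite value of the FP8 E4M3 format.
FP8_E4M3_MAX = 448.0


def quantize_with_inverted_scale(values, scale):
    """Scale and clamp values into the assumed FP8 range.

    Hypothetical helper for illustration only. It performs a single
    division (to invert the scale) and then one multiplication per
    element, instead of one division per element.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    inv_scale = 1.0 / scale  # the only division in the function
    quantized = []
    for value in values:
        scaled = value * inv_scale  # multiply instead of value / scale
        # Clamp to the representable range; real kernels would also
        # round to the nearest FP8 value, which is skipped here.
        quantized.append(max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, scaled)))
    return quantized
```

Multiplying by a precomputed reciprocal is not always bit-identical to dividing, so a real kernel has to decide whether that small difference is acceptable.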
…#4990) Signed-off-by: Travis Johnson <[email protected]> Co-authored-by: Sanger Steel <[email protected]> Co-authored-by: Roger Wang <[email protected]>
…-project#5293) [Core][Distributed] add coordinator to reduce code duplication in tp and pp (vllm-project#5293)
Signed-off-by: kevin <[email protected]>
…should i… (vllm-project#5303) Signed-off-by: Wang, Yi A <[email protected]> Co-authored-by: Roger Wang <[email protected]>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
…n test configurations (vllm-project#5466)
…#5497) Tune Qwen2-57B-A14B configs based on vllm-project#4921

Throughput Performance
command: python benchmarks/benchmark_throughput.py --model=Qwen/Qwen2-57B-A14B-Instruct --input-len 1000 --output-len 50 -tp 2

A100 GPU

| benchmark | no config | w/ PR |
| --- | --- | --- |
| tp=2 | 10.53 requests/s, 11058.17 tokens/s | 12.47 requests/s, 13088.57 tokens/s |
| tp=4 | 17.77 requests/s, 18662.95 tokens/s | 20.20 requests/s, 21212.32 tokens/s |
…llm-project#4971) Co-authored-by: Jianan Gu <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
Co-authored-by: Michael Goin <[email protected]> Co-authored-by: youkaichao <[email protected]> Co-authored-by: zifeitong <[email protected]> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Philipp Moritz <[email protected]>
thanks
.github/scripts/run-tests
Outdated
@@ -109,6 +109,10 @@ do
LOCAL_SUCCESS=0
RESULT_XML=$(echo ${TEST} | sed -e "s/${TEST_DIR}/${RESULTS_DIR}/" | sed -e "s/.py/.xml/")

# report which test is being run
# (in CI, if a test hangs, this logs *which* test is running *before* it hangs)
echo "RUNNING TEST: ${TEST}"
nice
hopefully, we can get rid of this script soon.
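If the script is eventually replaced, the same "announce before running" idea carries over. A minimal Python sketch, assuming a hypothetical `run_announced` helper (not part of the repository): the test name is logged and flushed before the command starts, so the name reaches the CI log even if the command never returns.

```python
import subprocess
import sys


def _log_now(message):
    # flush=True so the line is written immediately rather than
    # sitting in a buffer that is lost if the job is killed.
    print(message, flush=True)


def run_announced(name, cmd, log=_log_now):
    """Log which test is about to run, then run it.

    Hypothetical helper for illustration. Returns the exit code of
    the command.
    """
    log(f"RUNNING TEST: {name}")
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Stand-in command; a real runner would invoke pytest here.
    exit_code = run_announced(
        "tests/demo.py", [sys.executable, "-c", "raise SystemExit(0)"]
    )
    print(f"exit code: {exit_code}")
```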
__all__ = [
    "__version__",
where did the version's value go?
It's moved to version.py and then imported.
thanks
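For readers unfamiliar with the pattern described in this thread, here is a minimal, self-contained Python sketch of keeping the value in `version.py` and re-exporting it from the package's `__init__.py`. The package name `demo_pkg` and the value `"0.0.0"` are placeholders, not the project's real layout or version; the sketch writes the two files to a temporary directory only so that it can be run as a single script.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Contents of the two files. The version string is a placeholder.
VERSION_PY = '__version__ = "0.0.0"\n'
INIT_PY = (
    "from .version import __version__\n"
    "\n"
    "__all__ = [\n"
    '    "__version__",\n'
    "]\n"
)


def build_demo_package(root, name="demo_pkg"):
    """Create a throwaway package with version.py and __init__.py."""
    pkg = Path(root) / name
    pkg.mkdir()
    (pkg / "version.py").write_text(VERSION_PY)
    (pkg / "__init__.py").write_text(INIT_PY)
    return name


with tempfile.TemporaryDirectory() as tmp:
    pkg_name = build_demo_package(tmp)
    sys.path.insert(0, tmp)
    try:
        demo = importlib.import_module(pkg_name)
        # The value lives in version.py but is visible on the package.
        exported_version = demo.__version__
        exported_names = list(demo.__all__)
    finally:
        sys.path.remove(tmp)
```

Keeping the value in a small standalone module lets build tooling read it without importing the rest of the package.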
.github/scripts/run-tests
Outdated
echo "=== PASSED TEST: ${TEST} ==="
# if a file does not run any tests, pytest reports exit code of 5
# since we skip full modules in our skipping strategy, this is common
elif [[ $LOCAL_SUCCESS == 5 ]]; then,
yeah ... maybe we just echo "=== SKIPPED TEST: ${TEST} ===" ?
updated
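The exit-code handling discussed in this thread can be sketched in Python as follows. This is an illustration of the logic, not the `run-tests` script itself: pytest documents exit code 5 as "no tests were collected", and the sketch reports that case as skipped without counting it as a failure. The helper names and the `FAILED` label are assumptions; only the `PASSED` and `SKIPPED` messages appear in the review.

```python
# pytest's documented exit code when no tests were collected.
PYTEST_NO_TESTS_COLLECTED = 5


def describe_result(test_name, exit_code):
    """Return the status line to log for one test file (hypothetical helper)."""
    if exit_code == 0:
        return f"=== PASSED TEST: {test_name} ==="
    if exit_code == PYTEST_NO_TESTS_COLLECTED:
        # Whole module was skipped, so nothing ran; not a failure.
        return f"=== SKIPPED TEST: {test_name} ==="
    # Assumed label; the original script's failure message is not shown.
    return f"=== FAILED TEST: {test_name} ==="


def overall_success(exit_codes):
    """True when every file either passed or collected no tests."""
    return all(code in (0, PYTEST_NO_TESTS_COLLECTED) for code in exit_codes)
```

For example, `overall_success([0, 5, 0])` is true, while any other non-zero code in the list makes the run fail.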
Upstream sync 2024 06 16 (#310) - v0.5.0.post of vllm

SUMMARY:
* Merge commits from vllm-project@8f89d72 to vllm-project@0f0d8bc
* Limit numpy to < 2.0
* Updated `run-tests` to print name of the test that is about to run (for debugging what hangs in automation)
* Disable usage stats in automation
* Temporarily disable ENTRYPOINTS (to be re-enabled in Andy's single whl PR)
* Updated `run-tests` to consider exit code 5 from pytest to be a pass (since exit code 5 from pytest means that we did not run any tests)

Note that vllm-project@8f89d72 is NOT included in this merge.

COMPARE vs UPSTREAM:
https://github.com/neuralmagic/nm-vllm/compare/upstream-sync-2024-06-16..vllm-project:vllm:v0.5.0.post1

---------

Signed-off-by: kevin <[email protected]>
Signed-off-by: Travis Johnson <[email protected]>
Signed-off-by: Wang, Yi A <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Simon Mo <[email protected]>
Co-authored-by: SangBin Cho <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
Co-authored-by: Li, Jiang <[email protected]>
Co-authored-by: Kevin H. Luu <[email protected]>
Co-authored-by: Cody Yu <[email protected]>
Co-authored-by: Arthur Kim <[email protected]>
Co-authored-by: Travis Johnson <[email protected]>
Co-authored-by: Sanger Steel <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Wang, Yi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: wenyujin333 <[email protected]>
Co-authored-by: Jianan Gu <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: zifeitong <[email protected]>
Co-authored-by: Philipp Moritz <[email protected]>
Co-authored-by: Antoni Baum <[email protected]>
Co-authored-by: Jie Fu (傅杰) <[email protected]>
Co-authored-by: Allen.Dou <[email protected]>