Tensor Parallel profiling result #22

zhuohan123 · 2023-04-02T06:50:04Z

Will update the profiling results in this PR.

BS=8, input_len=32, output_len=128

OPT-13B
TP 1: 3.5404738585154214 seconds
TP 2: 4.742188215255737 seconds
TP 4: 4.907034238179524 seconds

OPT-30B
TP 1: OOM
TP 2: 5.9848620891571045 seconds
TP 4: 5.943212985992432 seconds
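The latency trend is easier to compare across tensor-parallel (TP) degrees once each figure is normalized. One way to do that is to divide by the latency of the smallest TP degree that did not run out of memory. The Python sketch below does this. It only restates the numbers reported above. The names `results` and `relative_latency` are hypothetical helpers for illustration and are not part of vLLM.

```python
# Latencies in seconds as reported for BS=8, input_len=32, output_len=128.
# None marks a configuration that ran out of memory (OOM).
results = {
    "OPT-13B": {1: 3.5404738585154214, 2: 4.742188215255737, 4: 4.907034238179524},
    "OPT-30B": {1: None, 2: 5.9848620891571045, 4: 5.943212985992432},
}


def relative_latency(latencies):
    """Return (baseline TP degree, {tp: latency / baseline latency}).

    The baseline is the smallest TP degree with a recorded latency,
    i.e. the smallest degree that did not OOM.
    """
    valid = {tp: t for tp, t in latencies.items() if t is not None}
    base_tp = min(valid)
    base = valid[base_tp]
    return base_tp, {tp: t / base for tp, t in sorted(valid.items())}


if __name__ == "__main__":
    for model, lat in results.items():
        base_tp, rel = relative_latency(lat)
        print(f"{model} (baseline TP {base_tp})")
        for tp, ratio in rel.items():
            print(f"  TP {tp}: {ratio:.2f}x baseline latency")
```

With these figures, OPT-13B at TP 2 and TP 4 takes roughly 1.34x and 1.39x the TP 1 latency. OPT-30B at TP 4 takes about 0.99x its TP 2 latency.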



WoosukKwon mentioned this issue Apr 5, 2023

Add CUDA graph-based all reduce launcher #26 (Merged)

WoosukKwon closed this as completed Jun 16, 2023

shanshanpt mentioned this issue Nov 17, 2023

Run long conetxt error : CUDA error: an illegal memory access was encountered #1700 (Closed)

junior-zsy mentioned this issue Nov 20, 2023

Error with 32k Long Text in chatglm2-6b-32k Model #1725 (Closed)

luo-cheng2021 pushed a commit to luo-cheng2021/vllm that referenced this issue Apr 3, 2024

Merge pull request vllm-project#22 from slyalin/extended_optimum_tran…

3992523

…sform_plus Extended vLLM transform for optimum-intel based models + Utilities

fxmarty pushed a commit to fxmarty/vllm-public that referenced this issue May 31, 2024

Merge pull request vllm-project#22 from ROCm/csrikris_pa_opt_shomy_1_16

87ec0c7

Integrate PagedAttention Optimization custom kernel into vLLM

alixiaodi mentioned this issue Aug 2, 2024

[Bug]: #7072 (Closed)

njhill pushed a commit that referenced this issue Oct 31, 2024

Merge pull request #22 from njhill/rework-splitcore

99f683e

split core process into separate class
