Tensor Parallel profiling result #22

Closed
zhuohan123 opened this issue Apr 2, 2023 · 0 comments

zhuohan123 commented Apr 2, 2023

I will update the profiling results in this issue.

Settings: batch size (BS) = 8, input_len = 32, output_len = 128

OPT-13B
TP 1: 3.5404738585154214 seconds
TP 2: 4.742188215255737 seconds
TP 4: 4.907034238179524 seconds

OPT-30B
TP 1: OOM
TP 2: 5.9848620891571045 seconds
TP 4: 5.943212985992432 seconds
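
To make the comparison easier to read, here is a minimal sketch that expresses each timing as a multiple of the smallest tensor-parallel (TP) degree that completed for the same model. The timings are copied from the results above. The `results` dictionary and the `relative_to_baseline` helper are hypothetical names introduced only for this illustration, and treating the OOM run as "no measurement" is an assumption.

```python
# Timings in seconds, copied from the results above; None marks the OOM run.
results = {
    "OPT-13B": {1: 3.5404738585154214, 2: 4.742188215255737, 4: 4.907034238179524},
    "OPT-30B": {1: None, 2: 5.9848620891571045, 4: 5.943212985992432},
}


def relative_to_baseline(timings):
    """Divide each completed timing by the timing of the smallest completed TP degree.

    Hypothetical helper for illustration; runs with no measurement (None) are skipped.
    """
    completed = {tp: t for tp, t in timings.items() if t is not None}
    baseline_tp = min(completed)
    baseline = completed[baseline_tp]
    return baseline_tp, {tp: t / baseline for tp, t in completed.items()}


for model, timings in results.items():
    baseline_tp, ratios = relative_to_baseline(timings)
    for tp, ratio in sorted(ratios.items()):
        print(f"{model} TP {tp}: {ratio:.2f}x the TP {baseline_tp} time")
```

On these numbers, OPT-13B takes about 1.34x as long at TP 2 and about 1.39x as long at TP 4 as it does at TP 1, while OPT-30B at TP 4 takes about 0.99x the TP 2 time.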
luo-cheng2021 pushed a commit to luo-cheng2021/vllm that referenced this issue Apr 3, 2024
…sform_plus

Extended vLLM transform for optimum-intel based models + Utilities
fxmarty pushed a commit to fxmarty/vllm-public that referenced this issue May 31, 2024
Integrate PagedAttention Optimization custom kernel into vLLM
@alixiaodi alixiaodi mentioned this issue Aug 2, 2024
njhill pushed a commit that referenced this issue Oct 31, 2024
split core process into separate class