Performance Regression between v0.4.0 and v0.4.1 #4210

Closed
Tracked by #4181
simon-mo opened this issue Apr 19, 2024 · 2 comments
Labels
performance (Performance-related issues), release-blocker (This PR/issue blocks the next release, therefore deserves highest priority)

Comments

@simon-mo
Collaborator

Anything you want to discuss about vllm.

#3550 seems to reduce throughput of vLLM

Before: Throughput: 20.13 requests/s, 10308.29 tokens/s
After: Throughput: 17.67 requests/s, 9048.03 tokens/s

(reported by @esmeetu and @youkaichao)
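To put the two measurements in proportion, here is a minimal sketch that computes the relative drop from the figures quoted above. The helper name `relative_drop` is made up for this illustration and is not part of vLLM or its benchmark scripts.

```python
def relative_drop(before: float, after: float) -> float:
    """Return the percentage decrease going from `before` to `after`."""
    return (before - after) / before * 100.0


# Figures copied verbatim from the Before/After lines in this report.
requests_before, requests_after = 20.13, 17.67      # requests/s
tokens_before, tokens_after = 10308.29, 9048.03     # tokens/s

req_drop = relative_drop(requests_before, requests_after)
tok_drop = relative_drop(tokens_before, tokens_after)

print(f"requests/s drop: {req_drop:.1f}%")
print(f"tokens/s drop:   {tok_drop:.1f}%")
```

Both metrics come out at roughly a 12% reduction, so the slowdown appears uniform across requests and tokens rather than tied to a change in output length.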

@simon-mo simon-mo added the misc, performance, and release-blocker labels and removed the misc label Apr 19, 2024
@simon-mo simon-mo mentioned this issue Apr 19, 2024
9 tasks
@rkooo567
Collaborator

@simon-mo I can investigate this on Sunday. Unfortunately, I already have plans this weekend that I cannot cancel...

@rkooo567 rkooo567 self-assigned this Apr 22, 2024
@njhill
Member

njhill commented May 15, 2024

For future reference, this was addressed by #4280.

3 participants