Performance Regression between v0.4.0 and v0.4.1 #4210
Labels
- performance: Performance-related issues
- release-blocker: This PR/issue blocks the next release, therefore deserves highest priority
- Anything you want to discuss about vllm.
#3550 appears to reduce the throughput of vLLM:

Before: Throughput: 20.13 requests/s, 10308.29 tokens/s
After: Throughput: 17.67 requests/s, 9048.03 tokens/s

That is roughly a 12% drop in both request and token throughput.

(reported by @esmeetu and @youkaichao)
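As a sanity check on the numbers above, the size of the regression can be computed directly from the two benchmark readings. The helper below is a hypothetical sketch, not part of vLLM or its benchmark scripts:

```python
# Hypothetical helper (not from vLLM): quantify the regression
# between the "Before" and "After" throughput numbers reported above.

def regression_pct(before: float, after: float) -> float:
    """Percentage drop going from `before` to `after`."""
    return (before - after) / before * 100

req_drop = regression_pct(20.13, 17.67)        # requests/s
tok_drop = regression_pct(10308.29, 9048.03)   # tokens/s

print(f"request throughput drop: {req_drop:.1f}%")  # 12.2%
print(f"token throughput drop:   {tok_drop:.1f}%")  # 12.2%
```

Both metrics fall by the same relative amount, which is consistent with a uniform slowdown per request rather than a change in output lengths.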