Performance Regression between v0.4.0 and v0.4.1 #4210
Labels
- performance: Performance-related issues
- release-blocker: This PR/issue blocks the next release, therefore deserves highest priority
- Anything you want to discuss about vllm.
#3550 appears to reduce the throughput of vLLM:

Before: Throughput: 20.13 requests/s, 10308.29 tokens/s
After: Throughput: 17.67 requests/s, 9048.03 tokens/s

That is roughly a 12% drop in both request and token throughput.

(reported by @esmeetu and @youkaichao)
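As a sanity check on the numbers above, the size of the regression can be computed directly from the two benchmark readings. The helper below is a hypothetical sketch, not part of vLLM or its benchmark scripts:

```python
# Hypothetical helper (not from vLLM): quantify the regression
# between the "Before" and "After" throughput numbers reported above.

def regression_pct(before: float, after: float) -> float:
    """Percentage drop going from `before` to `after`."""
    return (before - after) / before * 100

req_drop = regression_pct(20.13, 17.67)        # requests/s
tok_drop = regression_pct(10308.29, 9048.03)   # tokens/s

print(f"request throughput drop: {req_drop:.1f}%")  # 12.2%
print(f"token throughput drop:   {tok_drop:.1f}%")  # 12.2%
```

Both metrics fall by the same relative amount, which is consistent with a uniform slowdown per request rather than a change in output lengths.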