fix inaccurate batch size description

vllm-project · Jun 16, 2024 · 511c529
1 parent dc61dc8
commit 511c529
Showing 1 changed file with 2 additions and 2 deletions.
diff --git a/.buildkite/nightly-benchmarks/tests/descriptions.md b/.buildkite/nightly-benchmarks/tests/descriptions.md
@@ -19,7 +19,7 @@ This test suite aims to test vllm's throughput.
 
 - Input length: randomly sample 200 prompts from ShareGPT dataset (with fixed random seed).
 - Output length: the corresponding output length of these 200 prompts.
-- Batch size: no constraint, so that vllm can batch as many requests as GPU memory permits.
+- Batch size: dynamically determined by vllm to achieve maximum throughput.
 - Models: llama-3 8B, llama-3 70B, mixtral 8x7B.
 - Evaluation metrics: throughput.
 
@@ -33,7 +33,7 @@ This test suite aims to test vllm's real serving metrics.
 
 - Input length: randomly sample 200 prompts from ShareGPT dataset (with fixed random seed).
 - Output length: the corresponding output length of these 200 prompts.
-- Batch size: no constraint, so that vllm can batch as many requests as GPU memory permits.
+- Batch size: dynamically determined by vllm and the arrival pattern of the requests.
 - **Average QPS (query per second)**: 1, 4, 16 and inf. QPS = inf means all requests come at once. For other QPS values, the arrival time of each query is determined using a random Poisson process (with fixed random seed).
 - Models: llama-3 8B, llama-3 70B, mixtral 8x7B.
 - Evaluation metrics: throughput, TTFT (time to the first token, with mean, median and p99), ITL (inter-token latency, with mean, median and p99).
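
Note on the arrival pattern referenced in the updated description: the serving test says each query's arrival time comes from a Poisson process with a fixed seed, and that QPS = inf sends every request at once. The sketch below illustrates one way such a schedule could be generated. It is not taken from the benchmark scripts; the function name `arrival_times`, its parameters, and the default seed are hypothetical and chosen only for illustration.

```python
import math
import random


def arrival_times(num_requests, qps, seed=0):
    """Return arrival offsets (seconds from start) for num_requests requests.

    Hypothetical helper, not part of vllm. A Poisson process with rate `qps`
    has independent, exponentially distributed gaps with mean 1 / qps.
    """
    if qps <= 0:
        raise ValueError("qps must be positive")
    if math.isinf(qps):
        # QPS = inf: all requests arrive at once.
        return [0.0] * num_requests

    rng = random.Random(seed)  # fixed seed keeps the schedule reproducible
    now = 0.0
    times = []
    for _ in range(num_requests):
        now += rng.expovariate(qps)  # exponential gap with rate qps
        times.append(now)
    return times


if __name__ == "__main__":
    for rate in (1, 4, 16, float("inf")):
        schedule = arrival_times(200, rate)
        print(f"qps={rate}: last request arrives at {schedule[-1]:.1f}s")
```

Under this model, 200 requests at an average of 4 QPS span about 50 seconds in expectation. How many requests are batched together at any moment then depends on both this schedule and the engine's scheduling, which is what the revised "Batch size" line describes.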