Commit
fix inaccurate batch size description
KuntaiDu committed Jun 16, 2024
1 parent dc61dc8 commit 511c529
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions .buildkite/nightly-benchmarks/tests/descriptions.md
@@ -19,7 +19,7 @@ This test suite aims to test vllm's throughput.

- Input length: randomly sample 200 prompts from ShareGPT dataset (with fixed random seed).
- Output length: the corresponding output length of these 200 prompts.
-- Batch size: no constraint, so that vllm can batch as many requests as GPU memory permits.
+- Batch size: dynamically determined by vllm to achieve maximum throughput.
- Models: llama-3 8B, llama-3 70B, mixtral 8x7B.
- Evaluation metrics: throughput.
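
The "fixed random seed" in the input-length bullet is what makes the 200-prompt sample identical from one nightly run to the next. A minimal sketch of that idea follows, assuming the dataset is already loaded as a list; `sample_prompts`, the seed value, and the placeholder dataset are hypothetical and not taken from the benchmark scripts.

```python
import random


def sample_prompts(dataset, num_prompts=200, seed=0):
    """Return a reproducible random subset of `dataset` (hypothetical helper)."""
    # A dedicated Random instance keeps the draw independent of global state.
    rng = random.Random(seed)
    return rng.sample(dataset, num_prompts)


# Placeholder data standing in for ShareGPT prompts (assumption).
dataset = [f"prompt-{i}" for i in range(1000)]

first_run = sample_prompts(dataset)
second_run = sample_prompts(dataset)
```

With the same seed and the same dataset order, both calls return the same 200 prompts, so the output lengths tied to those prompts stay fixed as well.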

@@ -33,7 +33,7 @@ This test suite aims to test vllm's real serving metrics.

- Input length: randomly sample 200 prompts from ShareGPT dataset (with fixed random seed).
- Output length: the corresponding output length of these 200 prompts.
-- Batch size: no constraint, so that vllm can batch as many requests as GPU memory permits.
+- Batch size: dynamically determined by vllm and the arrival pattern of the requests.
- **Average QPS (query per second)**: 1, 4, 16 and inf. QPS = inf means all requests come at once. For other QPS values, the arrival time of each query is determined using a random Poisson process (with fixed random seed).
- Models: llama-3 8B, llama-3 70B, mixtral 8x7B.
- Evaluation metrics: throughput, TTFT (time to the first token, with mean, median and p99), ITL (inter-token latency, with mean, median and p99).
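
The QPS bullet above describes arrivals drawn from a Poisson process with a fixed seed, with QPS = inf meaning every request arrives at once. One way to sketch this is to draw exponentially distributed gaps between consecutive requests. The function name `arrival_times` and the seed value are assumptions for illustration and may not match how the benchmark client actually schedules requests.

```python
import math
import random


def arrival_times(num_requests, qps, seed=0):
    """Return arrival timestamps in seconds for a Poisson process (hypothetical helper)."""
    if math.isinf(qps):
        # QPS = inf: all requests arrive at time zero.
        return [0.0] * num_requests
    rng = random.Random(seed)
    times = []
    now = 0.0
    for _ in range(num_requests):
        # Gaps between Poisson arrivals are exponential with rate `qps`.
        now += rng.expovariate(qps)
        times.append(now)
    return times


# One schedule per QPS setting listed in the description.
schedules = {qps: arrival_times(200, qps) for qps in (1, 4, 16, math.inf)}
```

Because the arrival pattern decides how many requests are in flight at any moment, it also shapes the batch size the engine ends up using, which is the point of the revised batch-size wording in this hunk.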