
Add Prefix Caching Benchmark to Serving Benchmark #3194

Closed
wants to merge 11 commits into from

Conversation

@ywang96 (Member) commented Mar 5, 2024

This PR:

Not sure if we should move the prompt generation functions into a separate file so they can be reused by benchmark_throughput.py and benchmark_prefix_caching.py as well. For now I have kept them in the same file.
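The PR's actual helpers are not shown in this thread, but a prefix-caching benchmark needs prompts that share a long common prefix so the engine can reuse the prefix's KV cache across requests. A minimal sketch of such a generator (the name `sample_prefix_prompts` and its parameters are hypothetical, not from the PR):

```python
import random
import string

def sample_prefix_prompts(num_prompts: int,
                          prefix_len: int = 256,
                          suffix_len: int = 64,
                          seed: int = 0) -> list[str]:
    """Build prompts sharing one long common prefix, so automatic
    prefix caching can reuse the prefix's KV cache across requests.
    Hypothetical sketch; not the PR's actual implementation."""
    rng = random.Random(seed)
    # A fixed pool of pseudo-words keeps generation deterministic.
    words = ["".join(rng.choices(string.ascii_lowercase, k=5))
             for _ in range(prefix_len + suffix_len)]
    prefix = " ".join(words[:prefix_len])
    prompts = []
    for _ in range(num_prompts):
        # Each request differs only in its short random suffix.
        suffix = " ".join(rng.choices(words, k=suffix_len))
        prompts.append(f"{prefix} {suffix}")
    return prompts
```

Keeping this in its own module would let both benchmark scripts import it without duplicating the sampling logic.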

@ywang96 ywang96 marked this pull request as ready for review March 5, 2024 09:59
@ywang96 (Member, Author) commented Mar 5, 2024

cc @robertgshaw2-neuralmagic as we discussed offline on benchmarking automatic prefix caching. PTAL and let me know if you have any questions!

@ywang96 (Member, Author) commented Mar 5, 2024

@tattrongvu - I added some additional functionality to save the error, generated text, and actual output length to the result JSON file so it's easier to debug. I've already tested it myself with num_prompts=200 and request-rate='inf' on a Mixtral model hosted on 2xA100-80G with vLLM 0.3.3, so if you could give this PR a try, that would be great!
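The per-request fields described above (error, generated text, actual output length) could be recorded along these lines; the `RequestResult` dataclass and `save_results` helper below are illustrative names, not the PR's actual code:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RequestResult:
    """One benchmark request's outcome. Hypothetical sketch of the
    debugging fields mentioned in the comment, not the PR's schema."""
    prompt_len: int
    output_len: int        # actual number of generated tokens
    generated_text: str
    error: str = ""        # non-empty if the request failed

def save_results(results: list[RequestResult], path: str) -> None:
    """Dump all per-request records to a JSON file for later inspection."""
    with open(path, "w") as f:
        json.dump([asdict(r) for r in results], f, indent=2)
```

Persisting the raw generated text and token counts alongside any error message makes it straightforward to diagnose mismatches between expected and actual output lengths after a run.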

@ywang96 ywang96 closed this by deleting the head repository Mar 8, 2024