
Add Prefix Caching Benchmark to Serving Benchmark #3194

Closed
wants to merge 11 commits into from

Conversation

@ywang96 (Member) commented Mar 5, 2024

This PR:

Not sure if we should move the prompt generation functions into a separate file so they can be reused by benchmark_throughput.py and benchmark_prefix_caching.py as well. For now I have kept them in the same file.
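The PR's actual helpers are not shown in this thread, but a prefix-caching benchmark needs prompts that share a long common prefix so the engine can reuse the prefix's KV cache across requests. A minimal sketch of such a generator (the name `sample_prefix_prompts` and its parameters are hypothetical, not from the PR):

```python
import random
import string

def sample_prefix_prompts(num_prompts: int,
                          prefix_len: int = 256,
                          suffix_len: int = 64,
                          seed: int = 0) -> list[str]:
    """Build prompts sharing one long common prefix, so automatic
    prefix caching can reuse the prefix's KV cache across requests.
    Hypothetical sketch; not the PR's actual implementation."""
    rng = random.Random(seed)
    # A fixed pool of pseudo-words keeps generation deterministic.
    words = ["".join(rng.choices(string.ascii_lowercase, k=5))
             for _ in range(prefix_len + suffix_len)]
    prefix = " ".join(words[:prefix_len])
    prompts = []
    for _ in range(num_prompts):
        # Each request differs only in its short random suffix.
        suffix = " ".join(rng.choices(words, k=suffix_len))
        prompts.append(f"{prefix} {suffix}")
    return prompts
```

Keeping this in its own module would let both benchmark scripts import it without duplicating the sampling logic.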

@ywang96 ywang96 marked this pull request as ready for review March 5, 2024 09:59
@ywang96 (Member, Author) commented Mar 5, 2024

cc @robertgshaw2-neuralmagic as we discussed offline on benchmarking automatic prefix caching. PTAL and let me know if you have any questions!

@ywang96 (Member, Author) commented Mar 5, 2024

@tattrongvu - I added some additional functionality to save the error, generated text, and actual output length to the result JSON file so it's easier to debug. I've already tested it myself with num_prompts=200 and request-rate='inf' on a Mixtral model hosted on 2xA100-80G with vLLM 0.3.3, so if you could give this PR a try, that would be great!
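The per-request fields described above (error, generated text, actual output length) could be recorded along these lines; the `RequestResult` dataclass and `save_results` helper below are illustrative names, not the PR's actual code:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RequestResult:
    """One benchmark request's outcome. Hypothetical sketch of the
    debugging fields mentioned in the comment, not the PR's schema."""
    prompt_len: int
    output_len: int        # actual number of generated tokens
    generated_text: str
    error: str = ""        # non-empty if the request failed

def save_results(results: list[RequestResult], path: str) -> None:
    """Dump all per-request records to a JSON file for later inspection."""
    with open(path, "w") as f:
        json.dump([asdict(r) for r in results], f, indent=2)
```

Persisting the raw generated text and token counts alongside any error message makes it straightforward to diagnose mismatches between expected and actual output lengths after a run.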

@ywang96 ywang96 closed this by deleting the head repository Mar 8, 2024