
Batched benchmark script and more detailed benchmark metrics #25

Closed
zhuohan123 wants to merge 1 commit from the benchmark-script branch

Conversation

zhuohan123 (Member)

No description provided.
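The PR carries no description, but its title suggests a driver that sweeps batch sizes and records per-request metrics. A minimal sketch of what such a batched benchmark loop might look like, assuming a hypothetical `generate(prompts)` entry point; none of these names come from this PR:

```python
import time
import statistics

def run_batched_benchmark(generate, prompts, batch_sizes=(1, 4, 16, 64)):
    """Time a hypothetical generate(batch) callable across batch sizes
    and report latency and throughput per configuration."""
    results = []
    for bs in batch_sizes:
        latencies = []
        for i in range(0, len(prompts), bs):
            batch = prompts[i:i + bs]
            start = time.perf_counter()
            generate(batch)  # placeholder for the engine under test
            latencies.append(time.perf_counter() - start)
        results.append({
            "batch_size": bs,
            "mean_latency_s": statistics.mean(latencies),
            "p99_latency_s": sorted(latencies)[int(0.99 * (len(latencies) - 1))],
            "throughput_req_s": bs / statistics.mean(latencies),
        })
    return results
```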

@zhuohan123 zhuohan123 requested a review from WoosukKwon April 3, 2023 14:12
@zhuohan123 (Member, Author) commented:

Closing this PR since it's too outdated.

@zhuohan123 zhuohan123 closed this Jun 3, 2023
@zhuohan123 zhuohan123 deleted the benchmark-script branch June 18, 2023 07:22
@zhuohan123 zhuohan123 restored the benchmark-script branch June 18, 2023 07:22
@zhuohan123 zhuohan123 deleted the benchmark-script branch June 18, 2023 07:22
luo-cheng2021 pushed a commit to luo-cheng2021/vllm that referenced this pull request Apr 17, 2024
Disable weight compression on optimum-intel conversion path
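For context, disabling weight compression on the optimum-intel export path typically means requesting full-precision weights at conversion time. A hedged illustration; the exact flag this commit touches is not shown here:

```python
# Illustrative only: optimum-intel can export a model to OpenVINO IR while
# skipping 8-bit weight compression by passing load_in_8bit=False.
from optimum.intel import OVModelForCausalLM

model = OVModelForCausalLM.from_pretrained(
    "gpt2",              # any HF model id; chosen here only as an example
    export=True,         # convert to OpenVINO IR on the fly
    load_in_8bit=False,  # keep full-precision weights (no compression)
)
```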
z103cb referenced this pull request in z103cb/opendatahub_vllm May 9, 2024
Cherry-pick of fix commit 6100f4b from ODH:
opendatahub-io#17

---------

Signed-off-by: Travis Johnson <[email protected]>
Co-authored-by: Daniele Trifirò <[email protected]>
dtrifiro added a commit to dtrifiro/vllm that referenced this pull request May 21, 2024
fxmarty pushed a commit to fxmarty/vllm-public that referenced this pull request May 31, 2024
Updates to custom PagedAttention for supporting context lengths up to 32k.
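As a rough illustration of what longer contexts imply for PagedAttention: the KV cache is addressed in fixed-size blocks, so per-sequence block-table capacity grows linearly with context length. A small sketch, assuming a block size of 16 (a common vLLM default):

```python
MAX_CONTEXT_LEN = 32 * 1024
BLOCK_SIZE = 16  # assumed typical vLLM KV-cache block size

def max_blocks_per_seq(context_len: int, block_size: int = BLOCK_SIZE) -> int:
    """A 32k context needs ceil(32768 / block_size) KV-cache blocks
    per sequence; here that is 2048 blocks."""
    return (context_len + block_size - 1) // block_size

print(max_blocks_per_seq(MAX_CONTEXT_LEN))  # -> 2048
```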
tianyil1 pushed a commit to tianyil1/vllm that referenced this pull request Jun 5, 2024
* Fix setup.py for HPU

* Fix vllm._C import ops -> vllm.hpu import ops

* more of the same thing

* re-add hpex rmsnorm and rope; but rope is crashing

* remove unnecessary comments

* add vllm/hpu files

* add hpu autodetection

* Add HabanaAttention stub

* revert accidental changes

* revert non-habana backend attention changes

* add habana attention/worker/executor, sampling fails now

* Restore unnecessarily changed files

* enable HabanaMemoryProfiler

* Make sampler pass

* restore habana fused rope

* prefill is now working!!!

* fix prefill padding; decode is now working!!!!!

* revert accidental changes

* remove unused stuff in habana_paged_attn.py

* remove diagnostic stuff from llm_engine.py

* use HabanaExecutorAsync in async_llm_engine.py

* add habana copyright headers to habana_*.py files

* fix prefill attention conformance

* minor naming fixes

* remove naive attention from habana_attn (it never worked anyway)

* re-enable profile run

* Add fake HPUGraph support

* add more metrics

* indentation fix

* ~~recipe cache metrics don't work lalalala~~

* i'm done with metrics for now

* fix corner case in which hl-smi is not available but synapse is

* FIXME: temporary setup.py workaround

* WIP: add tensor parallelism stubs

* habana worker cleanup

* tensor parallelism is now working

* remove unused files

* remove unused func

* add hpugraphrunner

* improve hpu layernorm

* Port pipelined PA

* Port context length bucketing (see the sketch after this log)

* remove cudagraphrunner from hpu runner

* restore HPUGraphRunner back from FakeHPUGraphRunner

* handle rotary embeddings properly on gaudi3

* oopsie! captured_block_counts was incorrect!

* captured_block_counts.append doesn't do anything

* Restore habana_main KV cache memory layout

* fix memory profiler

* overhaul hpugraph capture

* memory profiling overhaul

* format memory properly in model warmup

* add graph compilation profiler for graph capture phase

* roll back log level on graph capture message

* Remove unnecessary view on residual connection in RMSNorm (vllm-project#25)

---------

Co-authored-by: madamczykhabana <[email protected]>
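One recurring theme in this log is context-length bucketing (referenced above), which rounds sequence lengths up to a small set of boundaries so compiled HPU graphs/recipes can be reused instead of recompiled for every new shape. A minimal sketch, with illustrative bucket boundaries rather than the actual values used in the port:

```python
# Hypothetical bucket boundaries; the real port derives them from config.
BUCKETS = (128, 256, 512, 1024, 2048, 4096)

def bucket_len(seq_len: int) -> int:
    """Round seq_len up to the nearest bucket so one compiled graph
    serves every sequence that falls into that bucket."""
    for boundary in BUCKETS:
        if seq_len <= boundary:
            return boundary
    raise ValueError(f"seq_len {seq_len} exceeds largest bucket {BUCKETS[-1]}")
```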
bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request Jun 25, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
@alixiaodi alixiaodi mentioned this pull request Aug 2, 2024
njhill pushed a commit to njhill/vllm that referenced this pull request Nov 6, 2024