Batched benchmark script and more detailed benchmark metrics #25
Closed

Conversation

No description provided.
Closing this PR since it's too outdated.
luo-cheng2021 pushed a commit to luo-cheng2021/vllm that referenced this pull request on Apr 17, 2024
Disable weight compression on optimum-intel conversion path
z103cb referenced this pull request in z103cb/opendatahub_vllm on May 9, 2024
Cherry-pick of fix commit 6100f4b from ODH: opendatahub-io#17

Signed-off-by: Travis Johnson <[email protected]>
Co-authored-by: Daniele Trifirò <[email protected]>
dtrifiro added a commit to dtrifiro/vllm that referenced this pull request on May 21, 2024
Ibm main update 2024-05-16
fxmarty pushed a commit to fxmarty/vllm-public that referenced this pull request on May 31, 2024
Updates to custom PagedAttention for supporting context len up to 32k.
tianyil1 pushed a commit to tianyil1/vllm that referenced this pull request on Jun 5, 2024
* Fix setup.py for HPU
* Fix vllm._C import ops -> vllm.hpu import ops
* more of the same thing
* re-add hpex rmsnorm and rope; but rope is crashing
* remove unnecessary comments
* add vllm/hpu files
* add hpu autodetection
* Add HabanaAttention stub
* revert accidental changes
* revert non-habana backend attention changes
* add habana attention/worker/executor, sampling fails now
* Restore unnecessarily changed files
* enable HabanaMemoryProfiler
* Make sampler pass
* restore habana fused rope
* prefill is now working!!!
* fix prefill padding; decode is now working!!!!!
* revert accidental changes
* remove unused stuff in habana_paged_attn.py
* remove diagnostic stuff from llm_engine.py
* use HabanaExecutorAsync in async_llm_engine.py
* add habana copyright headers to habana_*.py files
* fix prefill attention conformance
* minor naming fixes
* remove naive attention from habana_attn (it never worked anyway)
* re-enable profile run
* Add fake HPUGraph support
* add more metrics
* indentation fix
* ~~recipe cache metrics don't work lalalala~~
* i'm done with metrics for now
* fix corner case in which hl-smi is not available but synapse is
* FIXME: temporary setup.py workaround
* WIP: add tensor parallelism stubs
* habana worker cleanup
* tensor parallelism is now working
* remove unused files
* remove unused func
* add hpugraphrunner
* improve hpu layernorm
* Port pipelined PA
* Port context length bucketing
* remove cudagraphrunner from hpu runner
* restore HPUGraphRunner back from FakeHPUGraphRunner
* handle rotary embeddings properly on gaudi3
* oopsie! captured_block_counts was incorrect!
* captured_block_counts.append doesn't do anything
* Restore habana_main KV cache memory layout
* fix memory profiler
* overhaul hpugraph capture
* memory profiling overhaul
* format memory properly in model warmup
* add graph compilation profiler for graph capture phase
* roll back log lvl on graph capture message
* Remove unnecessary view on residual connection in RMSNorm (vllm-project#25)

Co-authored-by: madamczykhabana <[email protected]>
bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request on Jun 25, 2024
Add IPEX MOE CPU support
yukavio pushed a commit to yukavio/vllm that referenced this pull request on Jul 3, 2024
njhill pushed a commit to njhill/vllm that referenced this pull request on Nov 6, 2024