[CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang #7412
Merged: simon-mo merged 182 commits into vllm-project:main from KuntaiDu:kuntai-update-nightlybench on Oct 4, 2024.
Commits (182)
91b49b3
raise trt-llm to version 24.07
KuntaiDu e61fa29
adjust the way of processing protobuf files
KuntaiDu 00aaefa
bump up trt-llm to r24.07
KuntaiDu 7d4b1f0
avoid pip upgrade transformers -- no longer needed
KuntaiDu c67deaa
change tokenizer_dir
KuntaiDu f7901f1
fall back to using protobuf files from tensorrt-demo
KuntaiDu 2e9c063
replace python to python3
KuntaiDu 4e15409
include sglang
KuntaiDu 5f880ae
add sglang to the gateway script
KuntaiDu b74d95a
add sglang benchmarking script
KuntaiDu fc2f850
add vllm host ip
KuntaiDu 0a3bfae
only enable sglang for testing
KuntaiDu 89c7fe8
add sglang server parameters
KuntaiDu cbbcbd0
use Llama 3.1 instead of llama3
KuntaiDu 0648d9e
bring back vllm for testing
KuntaiDu 7f16b64
add sglang into backend request func
KuntaiDu 5155b77
adjust trt-llm launch script
KuntaiDu 85d39f9
enable trt-llm and sglang
KuntaiDu 4516373
bug fix: cd into triton_model_repo before calling fill_template.py
KuntaiDu dbf6607
upload zipped results to artifact -- for easy debugging
KuntaiDu a8cac72
upload
KuntaiDu e461c64
use Llama 3 8B instead --- trt-llm crashes with Llama 3 8.1B
KuntaiDu c935bda
update the documentation
KuntaiDu a7e12e7
update trt --- no need to update transformers
KuntaiDu 9605cf3
change to llama 2 7b --- I don't have access to llama3 8B for dev
KuntaiDu fcc3f52
adjust trt-llm backend version -- should be v0.11.0
KuntaiDu 760d70f
replace tokenizer_dir to the directory that has been downloaded (/tok…
KuntaiDu 8230734
no it should be
KuntaiDu 1ba468b
update llama model
KuntaiDu c7bfc22
update model_path
KuntaiDu 4c1500c
add model path
KuntaiDu 711b65f
log error
KuntaiDu 483e1b1
add logging when launching triton server
KuntaiDu 1c5f677
add debugging symbol
KuntaiDu a65763b
engine_path needs to be the path of compiled engine...
KuntaiDu aa278eb
adjust the way of killing vllm instance
KuntaiDu 5da3db1
add more QPS
KuntaiDu 45e94a2
disable radix cache and enable torch compile for SGLang
KuntaiDu a4cb503
move nightly benchmark script into scripts folder
KuntaiDu 03f7830
centralize run scripts alltogether
KuntaiDu 84fd15e
merge all running scripts into one.
KuntaiDu 797c4d6
merge server launching script into one place
KuntaiDu 6fd153a
adjust the testing cases
KuntaiDu 61a45b5
bug fix on finding nightly-tests.json
KuntaiDu eb592aa
wait for server before running benchmark_serving.py
KuntaiDu c7aafa0
adjust sonnet parameters
KuntaiDu 23b886a
add get_chat_template attribute
KuntaiDu ac7ecc5
check the existence of chat template via apply_chat_template
KuntaiDu aefae68
fall back to default way -- it is correct, I guess I need to use inst…
KuntaiDu ac95463
use instruct model
KuntaiDu 2337b39
trt does not work with llama 3.1 in r24.07
KuntaiDu fb834ff
add full pipeline for testing
KuntaiDu 2f8db9d
add upload_to_buildkite utility
KuntaiDu 7435538
add long decode workload
KuntaiDu 6fd1ac5
update the engine name of trt to tensorrt-llm, to match with benchmar…
KuntaiDu 2ff5429
bug fix: annotate the engine correctly in the benchmarking result
KuntaiDu a53fbc7
update test suite
KuntaiDu 02f22f9
update plotting script
KuntaiDu b17e76f
update how to annotate the results
KuntaiDu 594f35b
update nightly descriptions doc correspondingly
KuntaiDu b8a3f76
update plotting script
KuntaiDu a64aeab
rename to tensorrt-llm
KuntaiDu 12b1ec4
update transformers library
KuntaiDu f513995
Merge branch 'vllm-project:main' into kuntai-update-nightlybench
KuntaiDu c52c45e
Merge branch 'kuntai-update-nightlybench' of https://github.com/Kunta…
KuntaiDu 582b5b2
add support for ignore_eos flag, for benchmarking
KuntaiDu 7802c75
annotate that tgi and deepspeed_mii is not supported ignore_eos
KuntaiDu 8e3e269
set ignore_eos flag for benchmark
KuntaiDu 8409687
support total_input_tokens and total_output_tokens
KuntaiDu e138cca
generate markdown file in a separate file
KuntaiDu 3951a96
need to fallback to llama 3.0
KuntaiDu c23ccc6
no i need to update docker version instead
KuntaiDu 058e1aa
bug fix: no need to specify type when store_true
KuntaiDu e320941
update docker container
KuntaiDu 1b4946c
adjust the nameing of tensorrt
KuntaiDu 40ffa0a
set QPS
KuntaiDu 9efae6c
remove tgi: there is no way to constraint its output length
KuntaiDu d2072af
adjust the name in benchmarking script
KuntaiDu a1596ed
switch to 8B, change lmdeploy version name
KuntaiDu 4c7d73c
update to latest aws docker -- for multi-step scheduling
KuntaiDu f089fae
comment out sglang
KuntaiDu 35c2025
move to Llama 3.1 for local testing
KuntaiDu 8f411ec
make sure server args exist
KuntaiDu 3387919
raise max_model_len
KuntaiDu 8700543
export VLLM host ip
KuntaiDu 0b531f0
enable sglang
KuntaiDu 9a6a18a
disable torch compile --- it raises bug for 8b instruct model
KuntaiDu 26ad283
fix json syntax bug
KuntaiDu 16ce24a
bug fix
KuntaiDu b5f90fd
benchmark vllm again with num-schedule-step=1
KuntaiDu 3c40c2f
allow downloading results and scripts from buildkite annotation. Down…
KuntaiDu ca6a9fd
add vllm version-specific benchmarking
KuntaiDu c3696b0
distinguish between different versions of vllm
KuntaiDu 1c95549
use different set of parameters between different versions of vllm
KuntaiDu 0ee0688
fix vllm import issue
KuntaiDu 4befb1b
launch server from script
KuntaiDu 53b13a2
redirect backend to vllm if the engine is vllm055 or vllm 053post1
KuntaiDu 1410ce3
refer to instead of when benchmark_serving
KuntaiDu c75dbcd
udpate test cases
KuntaiDu 427e013
use tpot instead of ITL --- ITL is wrongfully too large for multi-step
KuntaiDu 9228035
update plotting script and benchmarking results
KuntaiDu 039f391
update sharegpt image
KuntaiDu a0a944d
also put raw benchmark results inside, in case people wants to reproduce
KuntaiDu 6fb655a
remove the results --- without causing footprint when merging into main
KuntaiDu ca36b0e
sanity check on the full set of benchmark
KuntaiDu 415cc0f
update--remove vllm 0.5.3.post1
KuntaiDu 0a8d641
vary different values of scheduler steps
KuntaiDu f72eeca
adjust parameters --- name vllm as vllm 0.5.5, so that we can vary it…
KuntaiDu 6e2a9d0
change launch_server
KuntaiDu f364a54
log NUM_SCHEDULER_STEPS
KuntaiDu c4a6dfd
adjust the way of injecting env var
KuntaiDu 9a2acda
fix typo: should be NUM_SCHEDULER_STEPS instead of NUM_SCHEDULER_STEP
KuntaiDu 208a111
add step 2 and 3
KuntaiDu 99153de
temporarily cache the results
KuntaiDu 52eabfe
cache plotting script
KuntaiDu 8ad3184
update nightly pipeline to benchmark async output processing
KuntaiDu 093f410
add nightly benchmark results for sharing
KuntaiDu 76ce5b7
test async output processing
KuntaiDu 2dfecb9
update the docker image and try again
KuntaiDu 6c1f754
fix zmq backend issue
KuntaiDu 9a8e8fa
use the image that is post-merge so that we have both zmq and async o…
KuntaiDu f9cd4bb
check step=1 performance
KuntaiDu e3ba754
update plotting script
KuntaiDu decf67b
test if successful or no
KuntaiDu 257087f
support latency
KuntaiDu 9882b17
initial test run
KuntaiDu 7d0e3c6
add vllm -- test if it is OK to raise to 0.95
KuntaiDu 8bf3308
sglang does not support torch.compile on Llama 3
KuntaiDu ceaaff7
add latency key
KuntaiDu 945a09b
remove max-model-len --- llama 3b is short context model
KuntaiDu 2f4a1cb
comment out sglang and lmdeploy
KuntaiDu bf61370
large-scale benchmark start
KuntaiDu 7e7435c
remove results
KuntaiDu c040060
bugfix: should be instead of
KuntaiDu 4a815f9
update benchmark
KuntaiDu 49190d7
skip mixtral
KuntaiDu b33055b
avoid injecting scheduler steps via envvar
KuntaiDu 811fdbf
small-scale test on vllm
KuntaiDu 57b4b7c
comment out other engines
KuntaiDu 1d2f0e2
do not reuse server
KuntaiDu 3970bfe
switch to latest docker
KuntaiDu ec61fb6
bring in the full test suite
KuntaiDu 93156a0
bring in the docker of all benchmarking engines
KuntaiDu 408fab0
update plotting script
KuntaiDu e43b66c
need to separate TRT benchmark to two steps.... Test variable
KuntaiDu 575d89a
Add to separate one trt-llm runs to two steps, so that the 1:30hr lo…
KuntaiDu 661633f
also update the comparison script
KuntaiDu 4276c99
adjust nightly test --- i guess the # of output tokens cannot be long…
KuntaiDu e6c94e5
cut down the test scale so that it fits within 1:30 minutes
KuntaiDu 42650a1
update test cases
KuntaiDu fbd27dc
use Alex's PR to rerun the benchmark
KuntaiDu e2373e8
adjust the test case: maximum length when generating output should no…
KuntaiDu 4ffa6f9
make sure that vllm benchmark runs first
KuntaiDu 501fea6
update to include more benchmarking metrics
KuntaiDu 555db07
test Alex and Rober'ts PR
KuntaiDu 79102e7
significantly reduce the test case --- please don't crash when killin…
KuntaiDu 094339c
update plotting script
KuntaiDu 5d054f2
bring back the full benchmarking suite
KuntaiDu 7f74875
bump up cuda version to 12.4, also update sglang version
KuntaiDu e7e6c57
udpate plotting script
KuntaiDu ba1c9ee
add ignore-eos flag
KuntaiDu 9163d52
fix: cannot reuse server if there is only one test
KuntaiDu 950219c
set sonnet output len to 400
KuntaiDu 8f8ed06
rerun trt , with --ignore-eos set off
KuntaiDu 6e3e6e1
switch to 8B
KuntaiDu ede9688
update nightly benchmarks -- add ignore_eos
KuntaiDu 3b994c5
move plotting scripts to a separate folder
KuntaiDu 6bc6777
avoid embedding vLLM version to current serving engine, to make upgra…
KuntaiDu 9d63edb
rename vllm 055 to vllm
KuntaiDu faf4083
reduce GPU util to 0.9
KuntaiDu 1b43f1c
enable torch compile for SGLang
KuntaiDu 3a5fa29
Merge branch 'main' into kuntai-update-nightlybench
KuntaiDu 7e12e84
make syntax checker happy
KuntaiDu 60892b6
Merge branch 'vllm-project:main' into kuntai-update-nightlybench
KuntaiDu 66ced32
remove plotting scripts -- visualization is sensitive and people need…
KuntaiDu 5bc0198
Merge branch 'kuntai-update-nightlybench' of https://github.com/Kunta…
KuntaiDu efc0bc4
fix comments & bump to bf16
KuntaiDu 9c20da0
update metric collection metric to incorporate latest benchmark_servi…
KuntaiDu e20d6b9
bug fix: latency has been removed.
KuntaiDu 3a76dc7
Remove QPS 2 to accelerate the benchmark (SGLang's benchmark is hitti…
KuntaiDu 141ce12
Merge branch 'kuntai-update-nightlybench' of https://github.com/Kunta…
KuntaiDu 4f1a72a
Bump up version of all containers
KuntaiDu

Files changed
@@ -0,0 +1,28 @@

## Description

This file contains the download links for the benchmarking results.

- [benchmarking pipeline](artifact://nightly-pipeline.yaml)
- [benchmarking results](artifact://results.zip)
- [benchmarking code](artifact://nightly-benchmarks.zip)

Please download the visualization scripts in the post.

## Results reproduction

- Find the docker image we use in `benchmarking pipeline`.
- Deploy the docker image, and inside the docker container:
  - Download `nightly-benchmarks.zip`.
  - In the same folder, run the following code:

```
export HF_TOKEN=<your HF token>
apt update
apt install -y git
unzip nightly-benchmarks.zip
VLLM_SOURCE_CODE_LOC=./ bash .buildkite/nightly-benchmarks/scripts/run-nightly-benchmarks.sh
```

And the results will be inside `./benchmarks/results`.
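Once the run above finishes, the raw numbers land as JSON files under `./benchmarks/results`. As a rough, hedged illustration (this helper is not part of the PR, and the exact field names depend on what `benchmark_serving.py` writes), a small script along these lines can dump whatever scalar metrics each result file contains:

```python
# Hypothetical helper, not included in this PR: print the scalar metrics found
# in each JSON result file produced by the nightly run.
import glob
import json

for path in sorted(glob.glob("./benchmarks/results/*.json")):
    with open(path) as f:
        record = json.load(f)
    if not isinstance(record, dict):
        continue  # skip result files with a different layout
    print(path)
    for key, value in sorted(record.items()):
        if isinstance(value, (int, float)):
            print(f"  {key}: {value}")
```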
@@ -1,45 +1,39 @@

# Nightly benchmark

The main goal of this benchmarking is two-fold:
- Performance clarity: Provide clarity on which one (vllm, tensorrt-llm, lmdeploy and tgi) leads in performance in what workload.
- Reproducible: one can run the exact same set of benchmarking commands inside the exact same docker by following reproducing instructions in [reproduce.md]().

## Docker images

We benchmark vllm, tensorrt-llm, lmdeploy and tgi using the following docker images:
- vllm/vllm-openai:v0.5.0.post1
- nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
- openmmlab/lmdeploy:v0.5.0
- ghcr.io/huggingface/text-generation-inference:2.1

<!-- Please check <a href="artifact://workspace/build/buildkite/vllm/performance-benchmark/.buildkite/nightly-benchmarks/nightly-pipeline.yaml">nightly-pipeline.yaml</a> artifact for more details on how we deploy the docker images. -->

## Hardware

One AWS node with 8x NVIDIA A100 GPUs.

## Workload description

We benchmark vllm, tensorrt-llm, lmdeploy and tgi using the following workload:

- Input length: randomly sample 500 prompts from ShareGPT dataset (with fixed random seed).
- Output length: the corresponding output length of these 500 prompts.
- Models: llama-3 8B, llama-3 70B, mixtral 8x7B.
- Average QPS (query per second): 4 for the small model (llama-3 8B) and 2 for other two models. For each QPS, the arrival time of each query is determined using a random Poisson process (with fixed random seed).
- Evaluation metrics: Throughput (higher the better), TTFT (time to the first token, lower the better), ITL (inter-token latency, lower the better).

<!-- Check <a href="artifact://workspace/build/buildkite/vllm/performance-benchmark/.buildkite/nightly-benchmarks/tests/nightly-tests.json">nightly-tests.json</a> artifact for more details. -->

## Plots

In the following plots, the dot shows the mean and the error bar shows the standard error of the mean. Value 0 means that the corresponding benchmark crashed.

<img src="artifact://nightly_results.png" alt="Benchmarking results" height=250 >

## Results

{nightly_results_benchmarking_table}
This benchmark aims to:
- Provide performance clarity: show which engine (vllm, tensorrt-llm, lmdeploy, or SGLang) leads in performance for which workload.
- Be reproducible: anyone can run the exact same set of benchmarking commands inside the exact same docker image by following the reproduction instructions.

Latest results: [results link](https://blog.vllm.ai/2024/09/05/perf-update.html), scroll to the end.

Latest reproduction guide: [github issue link](https://github.com/vllm-project/vllm/issues/8176)

## Setup

- Docker images:
  - vLLM: `vllm/vllm-openai:v0.6.2`
  - SGLang: `lmsysorg/sglang:v0.3.2-cu121`
  - LMDeploy: `openmmlab/lmdeploy:v0.6.1-cu12`
  - TensorRT-LLM: `nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3`
    - *NOTE: we use r24.07 as the current implementation only works for this version. We are going to bump this up.*
  - Check [nightly-pipeline.yaml](nightly-pipeline.yaml) for the concrete docker images, specs and commands we use for the benchmark.
- Hardware
  - 8x Nvidia A100 GPUs
- Workload:
  - Dataset
    - ShareGPT dataset
    - Prefill-heavy dataset (on average 462 input tokens, 16 output tokens)
    - Decode-heavy dataset (on average 462 input tokens, 256 output tokens)
    - Check [nightly-tests.json](tests/nightly-tests.json) for the concrete configuration of datasets we use.
  - Models: llama-3 8B, llama-3 70B.
    - We do not use llama 3.1, as it is incompatible with trt-llm r24.07 ([issue](https://github.com/NVIDIA/TensorRT-LLM/issues/2105)).
  - Average QPS (query per second): 2, 4, 8, 16, 32 and inf.
    - Queries are randomly sampled, and arrival patterns are determined via a Poisson process, all with a fixed random seed (see the sketch below).
  - Evaluation metrics: Throughput (higher is better), TTFT (time to first token, lower is better), ITL (inter-token latency, lower is better).

# Known issues

- TRT-LLM crashes with Llama 3.1 8B ([issue](https://github.com/NVIDIA/TensorRT-LLM/issues/2105)).
- TGI does not support the `ignore-eos` flag.
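To make the workload description above concrete, here is a minimal sketch (using numpy; the helper names are illustrative and this is not code from the PR) of how Poisson arrival times with a fixed seed can be generated for a given QPS, together with the TTFT/ITL definitions used as evaluation metrics:

```python
# Illustrative sketch, not code from this PR: Poisson request arrivals at a
# given QPS with a fixed seed, plus the TTFT/ITL metric definitions.
import numpy as np

def poisson_arrival_times(num_requests: int, qps: float, seed: int = 0) -> np.ndarray:
    """Cumulative arrival times (seconds) for a Poisson process at rate `qps`."""
    if np.isinf(qps):
        return np.zeros(num_requests)  # "inf" QPS: issue all requests at once
    rng = np.random.default_rng(seed)
    gaps = rng.exponential(scale=1.0 / qps, size=num_requests)  # i.i.d. inter-arrival gaps
    return np.cumsum(gaps)

def ttft_and_itl(request_start: float, token_times: list[float]) -> tuple[float, list[float]]:
    """TTFT: first-token time minus request start. ITL: gaps between consecutive tokens."""
    ttft = token_times[0] - request_start
    itl = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    return ttft, itl

if __name__ == "__main__":
    print(poisson_arrival_times(num_requests=5, qps=4.0, seed=0))
```

The fixed seed is what lets every engine see exactly the same request schedule, so differences in throughput, TTFT, and ITL come from the serving engine rather than from the workload.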
@@ -13,7 +13,7 @@ common_pod_spec: &common_pod_spec

common_container_settings: &common_container_settings
  command:
    - bash .buildkite/nightly-benchmarks/run-nightly-suite.sh
    - bash .buildkite/nightly-benchmarks/scripts/run-nightly-benchmarks.sh
  resources:
    limits:
      nvidia.com/gpu: 8

@@ -37,7 +37,10 @@ common_container_settings: &common_container_settings

steps:
  - block: ":rocket: Ready for comparing vllm against alternatives? This will take 4 hours."
  - label: "A100 trt benchmark"

  - label: "A100 vllm step 10"
    priority: 100
    agents:
      queue: A100

@@ -46,7 +49,21 @@ steps:
          podSpec:
            <<: *common_pod_spec
            containers:
              - image: nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
              - image: vllm/vllm-openai:v0.6.2
                <<: *common_container_settings

  - label: "A100 sglang benchmark"
    priority: 100
    agents:
      queue: A100
    plugins:
      - kubernetes:
          podSpec:
            <<: *common_pod_spec
            containers:
              - image: lmsysorg/sglang:v0.3.2-cu121
                <<: *common_container_settings

  - label: "A100 lmdeploy benchmark"

@@ -58,11 +75,13 @@ steps:
          podSpec:
            <<: *common_pod_spec
            containers:
              - image: openmmlab/lmdeploy:v0.5.0
              - image: openmmlab/lmdeploy:v0.6.1-cu12
                <<: *common_container_settings

  - label: "A100 vllm benchmark"

  - label: "A100 trt llama-8B"
    priority: 100
    agents:
      queue: A100

@@ -71,10 +90,25 @@ steps:
          podSpec:
            <<: *common_pod_spec
            containers:
              - image: vllm/vllm-openai:latest
              - image: nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
                <<: *common_container_settings
                env:
                  - name: VLLM_USAGE_SOURCE
                    value: ci-test
                  - name: HF_HOME
                    value: /root/.cache/huggingface
                  - name: VLLM_SOURCE_CODE_LOC
                    value: /workspace/build/buildkite/vllm/performance-benchmark
                  - name: HF_TOKEN
                    valueFrom:
                      secretKeyRef:
                        name: hf-token-secret
                        key: token
                  - name: TEST_SELECTOR
                    value: "llama8B"

  - label: "A100 tgi benchmark"

  - label: "A100 trt llama-70B"
    priority: 100
    agents:
      queue: A100

@@ -83,12 +117,54 @@ steps:
          podSpec:
            <<: *common_pod_spec
            containers:
              - image: ghcr.io/huggingface/text-generation-inference:2.1
              - image: nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
                <<: *common_container_settings
                env:
                  - name: VLLM_USAGE_SOURCE
                    value: ci-test
                  - name: HF_HOME
                    value: /root/.cache/huggingface
                  - name: VLLM_SOURCE_CODE_LOC
                    value: /workspace/build/buildkite/vllm/performance-benchmark
                  - name: HF_TOKEN
                    valueFrom:
                      secretKeyRef:
                        name: hf-token-secret
                        key: token
                  - name: TEST_SELECTOR
                    value: "llama70B"

  # FIXME(Kuntai): uncomment this after NVIDIA gives us their test docker image
  # - label: "A100 trt benchmark"
  #   priority: 100
  #   agents:
  #     queue: A100
  #   plugins:
  #     - kubernetes:
  #         podSpec:
  #           <<: *common_pod_spec
  #           containers:
  #             - image: nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
  #               <<: *common_container_settings

  # FIXME(Kuntai): uncomment this after TGI supports `--ignore-eos`.
  # - label: "A100 tgi benchmark"
  #   priority: 100
  #   agents:
  #     queue: A100
  #   plugins:
  #     - kubernetes:
  #         podSpec:
  #           <<: *common_pod_spec
  #           containers:
  #             - image: ghcr.io/huggingface/text-generation-inference:2.2.0
  #               <<: *common_container_settings

  - wait

  - label: "Plot"
  - label: "Collect the results"
    priority: 100
    agents:
      queue: A100

@@ -117,4 +193,4 @@ steps:
            name: hf-token-secret
            key: token

  - wait
  - block: ":rocket: check the results!"
This file was deleted.
Review comments

Maybe use 24.08?
Reply: I got the r24.07 protobuf template filling scripts from NVIDIA, and these scripts don't work for r24.08 right now. I confirmed with NVIDIA that there will be a test docker image suitable for benchmarking in the future, so I plan to use r24.07 for now and then switch to that test docker image after its release.