This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

andy-neuma triggered nightly on refs/heads/main #58

Triggered via schedule on April 4, 2024 01:23
Status: Failure
Total duration: 7h 26m 56s
Artifacts: 7

nightly.yml

on: schedule
Jobs:
- AWS-AVX2-32G-A10G-24G-Benchmark / BENCHMARK (7h 21m)
- NIGHTLY-MULTI / BUILD / BUILD (23m 56s)
- NIGHTLY-SOLO / BUILD / BUILD (45m 7s)
- AWS-AVX2-32G-A10G-24G-Accuracy / LM-EVAL (1h 44m)
- AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK (19s)
- Matrix: NIGHTLY-MULTI / TEST
- Matrix: NIGHTLY-SOLO / TEST

Annotations

2 errors and 8 warnings
NIGHTLY-SOLO / TEST (aws-avx2-192G-4-a10g-96G) / TEST
Failed to CreateArtifact: Received non-retryable error: Failed request: (409) Conflict: an artifact with this name already exists on the workflow run
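The 409 above indicates two jobs in the same workflow run tried to upload artifacts under the same name, which the artifact service rejects as a conflict; matrix TEST jobs are the usual culprit. The common fix, visible in this run's own artifact list (e.g. `cc-vllm-html-aws-avx2-192G-4-a10g-96G`), is to suffix the artifact name with a per-job identifier. A minimal sketch of that naming scheme (the helper name is hypothetical, not part of this workflow):

```python
def unique_artifact_name(base: str, job_label: str) -> str:
    """Append a per-matrix-job label (e.g. the runner label) to an
    artifact base name so parallel jobs in one run don't collide."""
    return f"{base}-{job_label}"

# One matrix job's upload name stays distinct from its siblings'.
name = unique_artifact_name("cc-vllm-html", "aws-avx2-192G-4-a10g-96G")
```

In a workflow file, the equivalent is interpolating the matrix value into the `name:` input of the upload step.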
AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK
:warning: **Performance Alert** :warning: A possible performance regression was detected for benchmark **'smaller_is_better'**: this commit's results are worse than the previous run's, exceeding the threshold `1.10`.

Current commit: 3d151aa4e77b4729bd591fdc72908096b7748909; previous commit: 5d256f536d8e112a97a2c0a09729c74f9552fde4. All rows share the same configuration: NVIDIA A10G x 1, vllm_version 0.1.0, Python 3.10.12, torch 2.1.2+cu121, max-model-len 4096, benchmark_serving with nr-qps-pair "1500,5" on the sharegpt dataset.

| Metric | Model | Sparsity | Current (ms) | Previous (ms) | Ratio |
|-|-|-|-|-|-|
| median_request_latency | neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4 (2:4 Sparse) | semi_structured_sparse_w16a16 | 14154.26 | 12107.79 | 1.17 |
| mean_tpot_ms | neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4 (2:4 Sparse) | semi_structured_sparse_w16a16 | 124.26 | 107.89 | 1.15 |
| median_tpot_ms | neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4 (2:4 Sparse) | semi_structured_sparse_w16a16 | 112.44 | 96.49 | 1.17 |
| median_request_latency | neuralmagic/OpenHermes-2.5-Mistral-7B-marlin (Dense) | None | 12421.51 | 10733.58 | 1.16 |
| mean_tpot_ms | neuralmagic/OpenHermes-2.5-Mistral-7B-marlin (Dense) | None | 95.61 | 81.97 | 1.17 |
| median_tpot_ms | neuralmagic/OpenHermes-2.5-Mistral-7B-marlin (Dense) | None | 98.41 | 83.48 | 1.18 |
| mean_ttft_ms | TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ (Dense) | None | 19808.06 | 17532.28 | 1.13 |
| median_ttft_ms | TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ (Dense) | None | 14854.81 | 12446.75 | 1.19 |
AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK
Performance alert! Eight metrics exceeded the ratio threshold of 1.1, each worse than its previous value:
- Previous 12107.788427500054, current 14154.261130999657 (1.1690x worse)
- Previous 107.88996677468617, current 124.25997606894775 (1.1517x worse)
- Previous 96.48709741669009, current 112.44488433213928 (1.1654x worse)
- Previous 10733.578026000032, current 12421.510741999555 (1.1573x worse)
- Previous 81.96528384593707, current 95.61047598259354 (1.1665x worse)
- Previous 83.47636292423746, current 98.40706089880393 (1.1789x worse)
- Previous 17532.284571051987, current 19808.064829272684 (1.1298x worse)
- Previous 12446.751371499886, current 14854.807530500693 (1.1935x worse)
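The alert logic above is straightforward: for a smaller-is-better metric, divide the current value by the previous one and flag the run when the ratio exceeds the configured threshold (1.1 here). A minimal sketch of that check (the function name is illustrative, not the benchmark action's actual API):

```python
def check_regression(previous: float, current: float,
                     threshold: float = 1.1) -> tuple[float, bool]:
    """For a smaller-is-better metric, return the current/previous
    ratio and whether it exceeds the alert threshold."""
    ratio = current / previous
    return ratio, ratio > threshold

# First alert above: median_request_latency regressed by ~1.169x.
ratio, regressed = check_regression(12107.788427500054, 14154.261130999657)
```

Note that the threshold applies to the ratio, not the absolute delta, so a 16 ms slip in a ~100 ms metric trips the alert just like a 2 s slip in a ~12 s one.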

Artifacts

Produced during runtime
| Name | Size | Status |
|-|-|-|
| 3.10.12-nm-vllm-0.1.0.tar.gz | 447 KB | Expired |
| 3.11.4-nm-vllm-0.1.0.tar.gz | 447 KB | Expired |
| 8547840978-aws-avx2-32G-a10g-24G | 123 KB | Expired |
| cc-vllm-html-aws-avx2-192G-4-a10g-96G | 1.65 MB | Expired |
| gh_action_benchmark_jsons-8547840978-aws-avx2-32G-a10g-24G | 28.1 KB | Expired |
| nm_vllm-0.1.0-cp310-cp310-linux_x86_64.whl | 87.1 MB | Expired |
| nm_vllm-0.1.0-cp311-cp311-linux_x86_64.whl | 87.1 MB | Expired |