mgoin triggered nightly on refs/heads/merge-upstream-0.4.0-to-main · neuralmagic/nm-vllm@1c2725c

# :warning: **Performance Alert** :warning:

Possible performance regression was detected for benchmark **'smaller_is_better'**.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold `1.10`.

| Benchmark suite | Current: b3d607a9022ebd492a4c220401cad0b1ae126f8c | Previous: bdfdb774576b34b4cae98a200b146c19cd24d24c | Ratio |
|-|-|-|-|
| `{"name": "median_tpot_ms", "description": "VLLM Serving - Dense\nmodel - TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ\nmax-model-len - 4096\nsparsity - None\nbenchmark_serving {\n \"nr-qps-pair_\": \"750,2.5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.1.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.1.2+cu121"}` | `99.5452885821862` ms | `85.08840363727279` ms | `1.17` |
| `{"name": "median_request_latency", "description": "VLLM Serving - Dense\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-marlin\nmax-model-len - 4096\nsparsity - None\nbenchmark_serving {\n \"nr-qps-pair_\": \"750,2.5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.1.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.1.2+cu121"}` | `2550.2822674989147` ms | `2304.388393999943` ms | `1.11` |
| `{"name": "mean_tpot_ms", "description": "VLLM Serving - Dense\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-marlin\nmax-model-len - 4096\nsparsity - None\nbenchmark_serving {\n \"nr-qps-pair_\": \"750,2.5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.1.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.1.2+cu121"}` | `18.67525323370096` ms | `16.874880133747496` ms | `1.11` |
| `{"name": "median_tpot_ms", "description": "VLLM Serving - Dense\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-marlin\nmax-model-len - 4096\nsparsity - None\nbenchmark_serving {\n \"nr-qps-pair_\": \"750,2.5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.1.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.1.2+cu121"}` | `16.787582091402747` ms | `15.064373402042957` ms | `1.11` |
| `{"name": "median_request_latency", "description": "VLLM Serving - 2:4 Sparse\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4\nmax-model-len - 4096\nsparsity - semi_structured_sparse_w16a16\nbenchmark_serving {\n \"nr-qps-pair_\": \"1500,5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.1.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.1.2+cu121"}` | `15734.700635499394` ms | `12165.00634949989` ms | `1.29` |
| `{"name": "mean_tpot_ms", "description": "VLLM Serving - 2:4 Sparse\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4\nmax-model-len - 4096\nsparsity - semi_structured_sparse_w16a16\nbenchmark_serving {\n \"nr-qps-pair_\": \"1500,5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.1.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.1.2+cu121"}` | `145.87599154684392` ms | `108.87166010577987` ms | `1.34` |
| `{"name": "median_tpot_ms", "description": "VLLM Serving - 2:4 Sparse\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4\nmax-model-len - 4096\nsparsity - semi_structured_sparse_w16a16\nbenchmark_serving {\n \"nr-qps-pair_\": \"1500,5\",\n \"dataset\": \"sharegpt\"\n}", "gpu_description": "NVIDIA A10G x 1", "vllm_version": "0.1.0", "python_version": "3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]", "torch_version": "2.1.2+cu121"}` | `133.68083343954564` ms | `97.20393972317379` ms | `1.38` |
| `{"name": "median_request_latency", "description": "VLLM Serving - Sparse\nmodel - neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50\nmax-model-len - 4096\nsparsity - sparse_w16a16\nbenchmark_serving {\n \"nr-qps-pair_\": \"1500,5\",\n \"dataset\": \"sharegpt\"\n}", "g

AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK

Performance alert! Previous value was 85.08840363727279 and current value is 99.5452885821862. It is 1.1699042916181894x worse than previous exceeding a ratio threshold 1.1

AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK

Performance alert! Previous value was 2304.388393999943 and current value is 2550.2822674989147. It is 1.1067067835175781x worse than previous exceeding a ratio threshold 1.1

AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK

Performance alert! Previous value was 16.874880133747496 and current value is 18.67525323370096. It is 1.1066895341290728x worse than previous exceeding a ratio threshold 1.1

AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK

Performance alert! Previous value was 15.064373402042957 and current value is 16.787582091402747. It is 1.1143896691465505x worse than previous exceeding a ratio threshold 1.1

AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK

Performance alert! Previous value was 12165.00634949989 and current value is 15734.700635499394. It is 1.293439574418821x worse than previous exceeding a ratio threshold 1.1

AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK

Performance alert! Previous value was 108.87166010577987 and current value is 145.87599154684392. It is 1.339889475416381x worse than previous exceeding a ratio threshold 1.1

AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK

Performance alert! Previous value was 97.20393972317379 and current value is 133.68083343954564. It is 1.3752614741774258x worse than previous exceeding a ratio threshold 1.1

AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK

Performance alert! Previous value was 63174.63465400033 and current value is 69967.52652400118. It is 1.1075256217500058x worse than previous exceeding a ratio threshold 1.1

AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK

Performance alert! Previous value was 26975.800461572657 and current value is 31053.952912183973. It is 1.151178181215445x worse than previous exceeding a ratio threshold 1.1

AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK

Performance alert! Previous value was 24898.172750000413 and current value is 29978.847964499437. It is 1.2040581558138213x worse than previous exceeding a ratio threshold 1.1
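
Each annotation above reports a ratio of the current value to the previous value and compares it against the threshold of 1.1. The following is a minimal sketch of that arithmetic, not the benchmark action's actual implementation. The function names `regression_ratio` and `exceeds_threshold` are hypothetical, and the strict `>` comparison is an assumption inferred from the word "exceeding" in the alert text.

```python
def regression_ratio(previous: float, current: float) -> float:
    """Return current / previous for a smaller-is-better metric.

    A result above 1.0 means the current run is slower than the previous one.
    """
    if previous <= 0:
        raise ValueError("previous value must be positive")
    return current / previous


def exceeds_threshold(previous: float, current: float, threshold: float = 1.1) -> bool:
    """Assumed alert rule: flag a regression when the ratio is above the threshold."""
    return regression_ratio(previous, current) > threshold


# (previous, current) pairs in ms, copied from two of the annotations above.
pairs = [
    (85.08840363727279, 99.5452885821862),
    (12165.00634949989, 15734.700635499394),
]

for previous, current in pairs:
    ratio = regression_ratio(previous, current)
    print(f"{previous} -> {current}: {ratio:.2f}x, alert={exceeds_threshold(previous, current)}")
```

Because the metrics here are latencies, a larger ratio is worse. A larger-is-better metric such as throughput would need the inverse ratio (previous / current).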

Artifacts

Produced during runtime

Name	Size
3.10.12-nm-vllm-0.1.0.tar.gz Expired	404 KB
3.11.4-nm-vllm-0.1.0.tar.gz Expired	404 KB
8513837051-aws-avx2-32G-a10g-24G Expired	124 KB
cc-vllm-html Expired	1.52 MB
gh_action_benchmark_jsons-8513837051-aws-avx2-32G-a10g-24G Expired	29.2 KB
nm_vllm-0.1.0-cp310-cp310-linux_x86_64.whl Expired	87 MB
nm_vllm-0.1.0-cp311-cp311-linux_x86_64.whl Expired	87 MB

mgoin triggered nightly on refs/heads/merge-upstream-0.4.0-to-main #53


nightly.yml

Annotations

Artifacts