This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

andy-neuma triggered nightly on refs/heads/main #58

Triggered via schedule on April 4, 2024 01:23
Status: Failure
Total duration: 7h 26m 56s
Artifacts: 7

nightly.yml

on: schedule
Jobs:
- AWS-AVX2-32G-A10G-24G-Benchmark / BENCHMARK (7h 21m)
- NIGHTLY-MULTI / BUILD / BUILD (23m 56s)
- NIGHTLY-SOLO / BUILD / BUILD (45m 7s)
- AWS-AVX2-32G-A10G-24G-Accuracy / LM-EVAL (1h 44m)
- AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK (19s)
- Matrix: NIGHTLY-MULTI / TEST
- Matrix: NIGHTLY-SOLO / TEST

Annotations

2 errors and 8 warnings
NIGHTLY-SOLO / TEST (aws-avx2-192G-4-a10g-96G) / TEST
Failed to CreateArtifact: Received non-retryable error: Failed request: (409) Conflict: an artifact with this name already exists on the workflow run
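The 409 above indicates two jobs in the same workflow run tried to upload artifacts under the same name, which the artifact service rejects as a conflict; matrix TEST jobs are the usual culprit. The common fix, visible in this run's own artifact list (e.g. `cc-vllm-html-aws-avx2-192G-4-a10g-96G`), is to suffix the artifact name with a per-job identifier. A minimal sketch of that naming scheme (the helper name is hypothetical, not part of this workflow):

```python
def unique_artifact_name(base: str, job_label: str) -> str:
    """Append a per-matrix-job label (e.g. the runner label) to an
    artifact base name so parallel jobs in one run don't collide."""
    return f"{base}-{job_label}"

# One matrix job's upload name stays distinct from its siblings'.
name = unique_artifact_name("cc-vllm-html", "aws-avx2-192G-4-a10g-96G")
```

In a workflow file, the equivalent is interpolating the matrix value into the `name:` input of the upload step.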
AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK
:warning: **Performance Alert** :warning: A possible performance regression was detected for benchmark **'smaller_is_better'**: this commit's results are worse than the previous run's, exceeding the threshold `1.10`.

Current commit: 3d151aa4e77b4729bd591fdc72908096b7748909; previous commit: 5d256f536d8e112a97a2c0a09729c74f9552fde4. All rows share the same configuration: NVIDIA A10G x 1, vllm_version 0.1.0, Python 3.10.12, torch 2.1.2+cu121, max-model-len 4096, benchmark_serving with nr-qps-pair "1500,5" on the sharegpt dataset.

| Metric | Model | Sparsity | Current (ms) | Previous (ms) | Ratio |
|-|-|-|-|-|-|
| median_request_latency | neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4 (2:4 Sparse) | semi_structured_sparse_w16a16 | 14154.26 | 12107.79 | 1.17 |
| mean_tpot_ms | neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4 (2:4 Sparse) | semi_structured_sparse_w16a16 | 124.26 | 107.89 | 1.15 |
| median_tpot_ms | neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4 (2:4 Sparse) | semi_structured_sparse_w16a16 | 112.44 | 96.49 | 1.17 |
| median_request_latency | neuralmagic/OpenHermes-2.5-Mistral-7B-marlin (Dense) | None | 12421.51 | 10733.58 | 1.16 |
| mean_tpot_ms | neuralmagic/OpenHermes-2.5-Mistral-7B-marlin (Dense) | None | 95.61 | 81.97 | 1.17 |
| median_tpot_ms | neuralmagic/OpenHermes-2.5-Mistral-7B-marlin (Dense) | None | 98.41 | 83.48 | 1.18 |
| mean_ttft_ms | TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ (Dense) | None | 19808.06 | 17532.28 | 1.13 |
| median_ttft_ms | TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ (Dense) | None | 14854.81 | 12446.75 | 1.19 |
AWS-AVX2-32G-A10G-24G-Benchmark / NM_GH_ACTION_BENCHMARK
Performance alert! Eight metrics exceeded the ratio threshold of 1.1, each worse than its previous value:
- Previous 12107.788427500054, current 14154.261130999657 (1.1690x worse)
- Previous 107.88996677468617, current 124.25997606894775 (1.1517x worse)
- Previous 96.48709741669009, current 112.44488433213928 (1.1654x worse)
- Previous 10733.578026000032, current 12421.510741999555 (1.1573x worse)
- Previous 81.96528384593707, current 95.61047598259354 (1.1665x worse)
- Previous 83.47636292423746, current 98.40706089880393 (1.1789x worse)
- Previous 17532.284571051987, current 19808.064829272684 (1.1298x worse)
- Previous 12446.751371499886, current 14854.807530500693 (1.1935x worse)
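The alert logic above is straightforward: for a smaller-is-better metric, divide the current value by the previous one and flag the run when the ratio exceeds the configured threshold (1.1 here). A minimal sketch of that check (the function name is illustrative, not the benchmark action's actual API):

```python
def check_regression(previous: float, current: float,
                     threshold: float = 1.1) -> tuple[float, bool]:
    """For a smaller-is-better metric, return the current/previous
    ratio and whether it exceeds the alert threshold."""
    ratio = current / previous
    return ratio, ratio > threshold

# First alert above: median_request_latency regressed by ~1.169x.
ratio, regressed = check_regression(12107.788427500054, 14154.261130999657)
```

Note that the threshold applies to the ratio, not the absolute delta, so a 16 ms slip in a ~100 ms metric trips the alert just like a 2 s slip in a ~12 s one.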

Artifacts

Produced during runtime
| Name | Size | Status |
|-|-|-|
| 3.10.12-nm-vllm-0.1.0.tar.gz | 447 KB | Expired |
| 3.11.4-nm-vllm-0.1.0.tar.gz | 447 KB | Expired |
| 8547840978-aws-avx2-32G-a10g-24G | 123 KB | Expired |
| cc-vllm-html-aws-avx2-192G-4-a10g-96G | 1.65 MB | Expired |
| gh_action_benchmark_jsons-8547840978-aws-avx2-32G-a10g-24G | 28.1 KB | Expired |
| nm_vllm-0.1.0-cp310-cp310-linux_x86_64.whl | 87.1 MB | Expired |
| nm_vllm-0.1.0-cp311-cp311-linux_x86_64.whl | 87.1 MB | Expired |