
[Bug]: benchmarking serving returns index -1 is out of bounds #4987

Closed
samos123 opened this issue May 22, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@samos123 (Contributor)

Your current environment

vLLM Version: 0.4.2
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] torch==2.3.0
[pip3] triton==2.3.0
[pip3] vllm-nccl-cu12==2.18.1.0.4.0

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.1.75+-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA L4
GPU 1: NVIDIA L4
GPU 2: NVIDIA L4
GPU 3: NVIDIA L4
GPU 4: NVIDIA L4
GPU 5: NVIDIA L4
GPU 6: NVIDIA L4
GPU 7: NVIDIA L4

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.29.2
Libc version: glibc-2.35

Nvidia driver version: 535.161.07

PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

🐛 Describe the bug

Steps to reproduce:

python3 benchmarks/benchmark_serving.py \
        --backend openai \
        --model meta-llama/Meta-Llama-3-70B-Instruct \
        --dataset-name sharegpt \
        --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
        --request-rate 100 \
        --num-prompts 1000

Error observed:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traffic request rate: 100.0
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:10<00:00, 93.32it/s]
/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/root/vllm/benchmarks/benchmark_serving.py", line 600, in <module>
    main(args)
  File "/root/vllm/benchmarks/benchmark_serving.py", line 410, in main
    benchmark_result = asyncio.run(
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/root/vllm/benchmarks/benchmark_serving.py", line 281, in benchmark
    metrics, actual_output_lens = calculate_metrics(
  File "/root/vllm/benchmarks/benchmark_serving.py", line 231, in calculate_metrics
    p99_tpot_ms=np.percentile(tpots, 99) * 1000,
  File "/usr/local/lib/python3.10/dist-packages/numpy/lib/function_base.py", line 4283, in percentile
    return _quantile_unchecked(
  File "/usr/local/lib/python3.10/dist-packages/numpy/lib/function_base.py", line 4555, in _quantile_unchecked
    return _ureduce(a,
  File "/usr/local/lib/python3.10/dist-packages/numpy/lib/function_base.py", line 3823, in _ureduce
    r = func(a, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/numpy/lib/function_base.py", line 4722, in _quantile_ureduce_func
    result = _quantile(arr,
  File "/usr/local/lib/python3.10/dist-packages/numpy/lib/function_base.py", line 4831, in _quantile
    slices_having_nans = np.isnan(arr[-1, ...])
IndexError: index -1 is out of bounds for axis 0 with size 0
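
For context: the IndexError comes from calling np.percentile on an empty array. If no request completes, the tpots list passed to np.percentile is empty, and with the NumPy 1.26.4 from the environment above the failure reproduces in isolation:

import numpy as np

# An empty metrics list triggers the same IndexError as the traceback:
# _quantile indexes arr[-1, ...] on an array with size 0.
np.percentile(np.array([]), 99)
# IndexError: index -1 is out of bounds for axis 0 with size 0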
@samos123 samos123 added the bug Something isn't working label May 22, 2024
@samos123 samos123 changed the title [Bug]: benchmarking serving is broken [Bug]: benchmarking serving returns index -1 is out of bounds May 22, 2024
@simon-mo (Collaborator)

@ywang96

@ywang96 (Member) commented May 22, 2024

@samos123 It seems to me that the server either never received these requests or rejected them, since it's not possible to actually process all 1000 requests within 10 seconds (1000/1000 [00:10<00:00]).

Could you also share the command that you used to launch the API server? Have you checked the logs from the API server as well?
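
(For illustration only - the actual launch command is not in this thread - a typical way to serve this model on an 8x L4 node would be:)

python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-70B-Instruct \
    --tensor-parallel-size 8 \
    --port 8080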

@ywang96 (Member) commented May 23, 2024

@samos123 Following up: if you're able to find any other issues, please let me know; otherwise I don't think this is a bug.

@samos123 (Contributor, Author) commented May 23, 2024

I was missing --port 8080, since my endpoint was listening on port 8080. Once I added it, I was able to crash my vLLM instance, which is a good sign! That means it's at least taking traffic. Closing this bug for now; I'll file a separate bug if needed once I investigate the crash.

Thanks @ywang96 for catching that it appeared to process all 1000 requests within 10 seconds. That gave me the hint that it was likely not sending any requests at all!

It would be helpful to catch an incorrect endpoint configuration early and provide a more helpful error message.
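
For reference, the fix is the original command with the port flag added (benchmark_serving.py defaults to port 8000):

python3 benchmarks/benchmark_serving.py \
        --backend openai \
        --model meta-llama/Meta-Llama-3-70B-Instruct \
        --dataset-name sharegpt \
        --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
        --request-rate 100 \
        --num-prompts 1000 \
        --port 8080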

@ywang96 (Member) commented May 23, 2024

> It would be helpful to catch an incorrect endpoint configuration early and provide a more helpful error message.

Yea - the error is essentially saying it's trying to compute the mean of an empty list when calculating the metrics, and I guess I could add a check for that to provide a better error message!
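
A minimal sketch of such a check (the placement inside calculate_metrics and the message are hypothetical; tpots is the list from the traceback above):

# Hypothetical guard before the percentile calculations:
if not tpots:
    raise ValueError(
        "No requests completed successfully, so there are no metrics to "
        "compute. Check that the benchmark is pointed at the right "
        "host/port and that the API server logs show incoming traffic.")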

@samos123 (Contributor, Author)
You could also send a single request and ensure a valid response before starting the benchmark, then report the error message along with whatever the response was so the end user can easily identify the issue.
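
A sketch of that idea, assuming an OpenAI-compatible /v1/completions endpoint; the helper name and payload here are illustrative, not part of benchmark_serving.py:

import requests

def preflight_check(base_url: str, model: str) -> None:
    # Send a single request before the benchmark and fail fast with the
    # raw response, so a misconfigured endpoint is caught immediately.
    resp = requests.post(
        f"{base_url}/v1/completions",
        json={"model": model, "prompt": "ping", "max_tokens": 1},
        timeout=60,
    )
    if resp.status_code != 200:
        raise RuntimeError(
            f"Preflight request to {base_url} failed "
            f"({resp.status_code}): {resp.text}")

preflight_check("http://localhost:8080", "meta-llama/Meta-Llama-3-70B-Instruct")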
