
[Bug]: benchmarking serving returns index -1 is out of bounds #4987

Closed
samos123 opened this issue May 22, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@samos123 (Contributor)

Your current environment

vLLM Version: 0.4.2
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] torch==2.3.0
[pip3] triton==2.3.0
[pip3] vllm-nccl-cu12==2.18.1.0.4.0

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.1.75+-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA L4
GPU 1: NVIDIA L4
GPU 2: NVIDIA L4
GPU 3: NVIDIA L4
GPU 4: NVIDIA L4
GPU 5: NVIDIA L4
GPU 6: NVIDIA L4
GPU 7: NVIDIA L4

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.29.2
Libc version: glibc-2.35

Nvidia driver version: 535.161.07

PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

🐛 Describe the bug

Steps to reproduce:

python3 benchmarks/benchmark_serving.py \
        --backend openai \
        --model meta-llama/Meta-Llama-3-70B-Instruct \
        --dataset-name sharegpt \
        --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
        --request-rate 100 \
        --num-prompts 1000

Error observed:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traffic request rate: 100.0
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:10<00:00, 93.32it/s]
/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/root/vllm/benchmarks/benchmark_serving.py", line 600, in <module>
    main(args)
  File "/root/vllm/benchmarks/benchmark_serving.py", line 410, in main
    benchmark_result = asyncio.run(
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/root/vllm/benchmarks/benchmark_serving.py", line 281, in benchmark
    metrics, actual_output_lens = calculate_metrics(
  File "/root/vllm/benchmarks/benchmark_serving.py", line 231, in calculate_metrics
    p99_tpot_ms=np.percentile(tpots, 99) * 1000,
  File "/usr/local/lib/python3.10/dist-packages/numpy/lib/function_base.py", line 4283, in percentile
    return _quantile_unchecked(
  File "/usr/local/lib/python3.10/dist-packages/numpy/lib/function_base.py", line 4555, in _quantile_unchecked
    return _ureduce(a,
  File "/usr/local/lib/python3.10/dist-packages/numpy/lib/function_base.py", line 3823, in _ureduce
    r = func(a, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/numpy/lib/function_base.py", line 4722, in _quantile_ureduce_func
    result = _quantile(arr,
  File "/usr/local/lib/python3.10/dist-packages/numpy/lib/function_base.py", line 4831, in _quantile
    slices_having_nans = np.isnan(arr[-1, ...])
IndexError: index -1 is out of bounds for axis 0 with size 0
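
For context: the IndexError comes from calling np.percentile on an empty array. If no request completes, the tpots list passed to np.percentile is empty, and with the NumPy 1.26.4 from the environment above the failure reproduces in isolation:

import numpy as np

# An empty metrics list triggers the same IndexError as the traceback:
# _quantile indexes arr[-1, ...] on an array with size 0.
np.percentile(np.array([]), 99)
# IndexError: index -1 is out of bounds for axis 0 with size 0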
@samos123 samos123 added the bug Something isn't working label May 22, 2024
@samos123 samos123 changed the title [Bug]: benchmarking serving is broken [Bug]: benchmarking serving returns index -1 is out of bounds May 22, 2024
@simon-mo (Collaborator)

@ywang96

@ywang96 (Member) commented May 22, 2024

@samos123 It seems to me that the server either never received these requests or rejected them, since it's not possible to actually process all 1000 requests within 10 seconds (1000/1000 [00:10<00:00]).

Could you also share the command that you used to launch the API server? Have you checked the logs from the API server as well?
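
(For illustration only - the actual launch command is not in this thread - a typical way to serve this model on an 8x L4 node would be:)

python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-70B-Instruct \
    --tensor-parallel-size 8 \
    --port 8080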

@ywang96 (Member) commented May 23, 2024

@samos123 Following up: if you're able to find any other issues, please let me know; otherwise I don't think this is a bug.

@samos123 (Contributor, Author) commented May 23, 2024

I was missing --port 8080, since my endpoint was listening on port 8080. Once I added it, I was able to crash my vLLM instance, which is a good sign! That means it's at least taking traffic. Closing this bug for now; I'll file a separate bug if needed once I investigate the crash.

Thanks @ywang96 for catching that it appeared to process all 1000 requests within 10 seconds. That gave me the hint that it was likely not sending any requests at all!

It would be helpful to catch an incorrect endpoint configuration early and provide a more helpful error message.
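
For reference, the fix is the original command with the port flag added (benchmark_serving.py defaults to port 8000):

python3 benchmarks/benchmark_serving.py \
        --backend openai \
        --model meta-llama/Meta-Llama-3-70B-Instruct \
        --dataset-name sharegpt \
        --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
        --request-rate 100 \
        --num-prompts 1000 \
        --port 8080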

@ywang96 (Member) commented May 23, 2024

> It would be helpful to catch an incorrect endpoint configuration early and provide a more helpful error message.

Yea - the error is essentially saying it's trying to compute the mean of an empty list when calculating the metrics, and I guess I could add a check for that to provide a better error message!
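
A minimal sketch of such a check (the placement inside calculate_metrics and the message are hypothetical; tpots is the list from the traceback above):

# Hypothetical guard before the percentile calculations:
if not tpots:
    raise ValueError(
        "No requests completed successfully, so there are no metrics to "
        "compute. Check that the benchmark is pointed at the right "
        "host/port and that the API server logs show incoming traffic.")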

@samos123 (Contributor, Author)
You could also send a single request and ensure a valid response before starting the benchmark, then report the error message along with whatever the response was so the end user can easily identify the issue.
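
A sketch of that idea, assuming an OpenAI-compatible /v1/completions endpoint; the helper name and payload here are illustrative, not part of benchmark_serving.py:

import requests

def preflight_check(base_url: str, model: str) -> None:
    # Send a single request before the benchmark and fail fast with the
    # raw response, so a misconfigured endpoint is caught immediately.
    resp = requests.post(
        f"{base_url}/v1/completions",
        json={"model": model, "prompt": "ping", "max_tokens": 1},
        timeout=60,
    )
    if resp.status_code != 200:
        raise RuntimeError(
            f"Preflight request to {base_url} failed "
            f"({resp.status_code}): {resp.text}")

preflight_check("http://localhost:8080", "meta-llama/Meta-Llama-3-70B-Instruct")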
