[Bug]: ValueError: bytes must be in range(0, 256) #3617

Closed
cosmic-heart opened this issue Mar 25, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@cosmic-heart

Your current environment

root@3b4826375ab0:/workspace# python collect_env.py
Collecting environment information...
PyTorch version: 2.1.2
Is debug build: False
CUDA used to build PyTorch: 12.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (ppc64le)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.28.4
Libc version: glibc-2.35

Python version: 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 16:04:32) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-100-generic-ppc64le-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.2.91
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 535.161.07
cuDNN version: Probably one of the following:
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn.so.8
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_ops_train.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False

CPU:
Architecture:                       ppc64le
Byte Order:                         Little Endian
CPU(s):                             128
On-line CPU(s) list:                0-127
Model name:                         POWER9, altivec supported
Model:                              2.2 (pvr 004e 1202)
Thread(s) per core:                 4
Core(s) per socket:                 16
Socket(s):                          2
Frequency boost:                    enabled
CPU max MHz:                        3800.0000
CPU min MHz:                        2300.0000
L1d cache:                          1 MiB (32 instances)
L1i cache:                          1 MiB (32 instances)
L2 cache:                           8 MiB (16 instances)
L3 cache:                           160 MiB (16 instances)
NUMA node(s):                       6
NUMA node0 CPU(s):                  0-63
NUMA node8 CPU(s):                  64-127
NUMA node252 CPU(s):                
NUMA node253 CPU(s):                
NUMA node254 CPU(s):                
NUMA node255 CPU(s):                
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Mitigation; RFI Flush, L1D private per thread
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Mitigation; RFI Flush, L1D private per thread
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Kernel entry/exit barrier (eieio)
Vulnerability Spectre v1:           Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
Vulnerability Spectre v2:           Mitigation; Indirect branch serialisation (kernel only)
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.1.2
[pip3] triton==2.1.0
[conda] cudatoolkit               11.8.0              hedcfb66_13    conda-forge
[conda] libmagma                  2.7.2                he288b6c_2    conda-forge
[conda] libmagma_sparse           2.7.2                h5b5c57a_3    conda-forge
[conda] magma                     2.7.2                h097a1ca_3    conda-forge
[conda] numpy                     1.24.3          py310h87cc683_0  
[conda] numpy-base                1.24.3          py310hac71eb6_0  
[conda] torch                     2.1.2                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.3.3
vLLM Build Flags:
CUDA Archs: 7.0; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV3     SYS     SYS     0-63    0               N/A
GPU1    NV3      X      SYS     SYS     0-63    0               N/A
GPU2    SYS     SYS      X      NV3     64-127  8               N/A
GPU3    SYS     SYS     NV3      X      64-127  8               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

from vllm import LLM, SamplingParams
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="./models", 
    dtype="float16", 
    tensor_parallel_size=4, 
    enforce_eager=False, 
    trust_remote_code=True, 
    load_format='safetensors'
)
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

With enforce_eager=False I get the following error; with enforce_eager=True I don't get any error at this stage.

root@3b4826375ab0:/workspace# python3 example.py 
WARNING 03-25 16:50:03 config.py:686] Casting torch.bfloat16 to torch.float16.
2024-03-25 16:50:06,721 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
INFO 03-25 16:50:09 llm_engine.py:68] Initializing an LLM engine (v0.3.3) with config: model='./models', tokenizer='./models', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=safetensors, tensor_parallel_size=4, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
Traceback (most recent call last):
  File "/workspace/example.py", line 10, in <module>
    llm = LLM(
  File "/root/vllm/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/root/vllm/vllm/engine/llm_engine.py", line 146, in from_engine_args
    engine = cls(*engine_configs,
  File "/root/vllm/vllm/engine/llm_engine.py", line 103, in __init__
    self.model_executor = executor_class(model_config, cache_config,
  File "/root/vllm/vllm/executor/ray_gpu_executor.py", line 60, in __init__
    self._init_workers_ray(placement_group)
  File "/root/vllm/vllm/executor/ray_gpu_executor.py", line 190, in _init_workers_ray
    self._run_workers("init_device",
  File "/root/vllm/vllm/executor/ray_gpu_executor.py", line 318, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/root/vllm/vllm/worker/worker.py", line 92, in init_device
    init_distributed_environment(self.parallel_config, self.rank,
  File "/root/vllm/vllm/worker/worker.py", line 276, in init_distributed_environment
    cupy_utils.init_process_group(
  File "/root/vllm/vllm/model_executor/parallel_utils/cupy_utils.py", line 90, in init_process_group
    _NCCL_BACKEND = NCCLBackendWithBFloat16(world_size, rank, host, port)
  File "/root/miniconda3/lib/python3.10/site-packages/cupyx/distributed/_nccl_comm.py", line 70, in __init__
    self._init_with_tcp_store(n_devices, rank, host, port)
  File "/root/miniconda3/lib/python3.10/site-packages/cupyx/distributed/_nccl_comm.py", line 93, in _init_with_tcp_store
    shifted_nccl_id = bytes([b + 128 for b in nccl_id])
ValueError: bytes must be in range(0, 256)
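
For reference, the failing cupy line shifts every byte of the NCCL unique ID by +128 before writing it to the TCP store, which only works if the ID is iterated as signed bytes (-128..127), as with a signed-char buffer on x86. On ppc64le, char is unsigned by default, so the same ID is likely iterated as values in 0..255 and the shift overflows. A minimal standalone sketch of the failure mode (no cupy or NCCL needed; the two example IDs are made up):

# Values as cupy expects them: signed bytes from a signed-char buffer.
signed_id = [-128, -1, 0, 127]
# Values as an unsigned-char platform would yield them.
unsigned_id = [0, 255, 128, 127]

# Works: b + 128 stays within range(0, 256) for every signed byte.
print(bytes([b + 128 for b in signed_id]))

# Fails: 255 + 128 == 383 is not a valid byte value.
try:
    bytes([b + 128 for b in unsigned_id])
except ValueError as e:
    print(e)  # bytes must be in range(0, 256)

# A wraparound shift would be safe for both representations.
print(bytes([(b + 128) % 256 for b in unsigned_id]))
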
@cosmic-heart cosmic-heart added the bug Something isn't working label Mar 25, 2024
@cosmic-heart cosmic-heart changed the title from [Bug]: to [Bug]: ValueError: bytes must be in range(0, 256) Mar 25, 2024
@youkaichao
Member

This is a bug in cupy, and we plan to remove the cupy dependency in #3442. Please stay tuned, or you can try the docker image built during CI of that PR, e.g. docker pull us-central1-docker.pkg.dev/vllm-405802/vllm-ci-test-repo/vllm-test:a3c2340ae36ce8ee782691d30111377eaf7ae6ce

Feedback is welcome!
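
As an interim workaround until the cupy dependency is removed, the reporter's observation above suggests running with enforce_eager=True, which skips CUDA graph capture and so appears never to reach cupy_utils.init_process_group. A minimal sketch of the adjusted constructor call (same arguments as the repro script):

from vllm import LLM

# Sketch: enforce_eager=True avoids the CUDA-graph path that initializes
# cupy's NCCL backend, per the behavior reported in this issue.
llm = LLM(
    model="./models",
    dtype="float16",
    tensor_parallel_size=4,
    enforce_eager=True,  # was False in the failing run
    trust_remote_code=True,
    load_format="safetensors",
)

Note this trades away CUDA graph performance until #3442 lands.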

@cosmic-heart
Author

Thanks. I can use that commit hash directly and build from source, right? My system is ppc64le; is there a vLLM docker container for that architecture?

@youkaichao
Member

This docker image already has a working vLLM installation, so you can use it directly. Alternatively, you can customize the Dockerfile to rebuild it for your architecture, say ppc64le, although I'm not sure whether NCCL supports it 👀

@cosmic-heart
Author

cosmic-heart commented Mar 26, 2024

Yes, it is supported. vLLM printed a warning prompting me to install NCCL through conda (conda install -y -c conda-forge nccl). After installing it I no longer get any error or warning from vLLM, but I still see the following message from NCCL:

root@3b4826375ab0:/workspace# NCCL_DEBUG=INFO python3 example.py
3b4826375ab0:14697:14697 [0] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
3b4826375ab0:14697:14697 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation

@youkaichao
Member

Then you are good to go 👍
