
[Installation]: VLLM on ARM machine with GH200 #10459

Open
1 task done
Phimanu opened this issue Nov 19, 2024 · 6 comments
Labels
installation Installation problems

Comments

@Phimanu

Phimanu commented Nov 19, 2024

Your current environment

(I cannot run collect_env since it requires vLLM to be installed.)

$ pip freeze
certifi==2022.12.7
charset-normalizer==2.1.1
filelock==3.16.1
fsspec==2024.10.0
idna==3.4
Jinja2==3.1.4
MarkupSafe==3.0.2
mpmath==1.3.0
networkx==3.4.2
numpy==2.1.3
pillow==10.2.0
pynvml==11.5.3
requests==2.28.1
sympy==1.13.1
torch==2.5.1
typing_extensions==4.12.2
urllib3==1.26.13

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy

I have an ARM CPU and an NVIDIA GH200 (Driver Version: 550.90.07, CUDA Version: 12.4).

How you are installing vllm

pip install torch numpy
pip install vllm

I get this error:

pip install vllm
Collecting vllm
  Using cached vllm-0.6.4.post1.tar.gz (3.1 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      /tmp/pip-build-env-8t3z_6ag/overlay/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
        cpu = _conversion_method_template(device=torch.device("cpu"))
      Traceback (most recent call last):
        File "/hpi/fs00/home/philipp.hildebrandt/armpython/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/hpi/fs00/home/philipp.hildebrandt/armpython/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/hpi/fs00/home/philipp.hildebrandt/armpython/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-8t3z_6ag/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
        File "/tmp/pip-build-env-8t3z_6ag/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-8t3z_6ag/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 320, in run_setup
          exec(code, locals())
        File "<string>", line 526, in <module>
        File "<string>", line 433, in get_vllm_version
      RuntimeError: Unknown runtime environment
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

I thought numpy was missing or there was some problem with torch, which is why I manually installed numpy and torch in a fresh venv before trying again. Torch reports CUDA as available, but the error looks like vLLM might be trying to use a CPU backend. I also tried manually installing pynvml, but it did not change anything.
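
For reference, a quick check of what the installed torch actually reports (if torch.version.cuda prints None, it is a CPU-only build, which would match vLLM failing to detect a CUDA environment):

$ python -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())'
# a CPU-only aarch64 wheel prints something like: 2.5.1 None False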

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Phimanu added the installation (Installation problems) label on Nov 19, 2024
Phimanu changed the title from "[Installation]:" to "[Installation]: VLLM on ARM machine with GH200" on Nov 19, 2024
@drikster80
Contributor

@Phimanu
PyTorch doesn't support Arm64+CUDA in the stable release, but you can now run it with the nightly version.

I just submitted a PR today (#10499) that updates the Dockerfile and adds a new requirements file specifically to fix this and allow for building an Arm64/GH200 version with CUDA from the main repo.

Side note: I've been maintaining a GH200-specific Docker container of vLLM until the PR is merged, if you want to try it (I haven't exhaustively tested everything, but I tried a couple of different models and options to confirm general functionality):
https://hub.docker.com/r/drikster80/vllm-gh200-openai/tags
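
For reference, installing the nightly looks roughly like this (cu124 matches the driver above; adjust the index URL to your CUDA version):

$ pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124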

@Phimanu
Author

Phimanu commented Nov 23, 2024

Hey, I tried it with the nightly PyTorch version and also your branch, but I still got the same error.

(vllm-arm) philipp.hildebrandt@ga01:~$ python
Python 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.6.0.dev20241120+cu124'
>>> print(torch.cuda.is_available())
True
>>> print(torch.version.cuda)
12.4
(vllm-arm) philipp.hildebrandt@ga01:~$ git clone https://github.com/drikster80/vllm.git
Cloning into 'vllm'...
remote: Enumerating objects: 41671, done.
remote: Counting objects: 100% (7682/7682), done.
remote: Compressing objects: 100% (487/487), done.
remote: Total 41671 (delta 7427), reused 7195 (delta 7195), pack-reused 33989 (from 1)
Receiving objects: 100% (41671/41671), 32.58 MiB | 21.70 MiB/s, done.
Resolving deltas: 100% (32302/32302), done.
(vllm-arm) philipp.hildebrandt@ga01:~$ cd vllm
(vllm-arm) philipp.hildebrandt@ga01:~/vllm$ pip install -e .
Obtaining file:///hpi/fs00/home/philipp.hildebrandt/vllm
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build editable did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      /tmp/pip-build-env-s6eeaoxg/overlay/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
        cpu = _conversion_method_template(device=torch.device("cpu"))
      Traceback (most recent call last):
        File "/hpi/fs00/home/philipp.hildebrandt/vllm-arm/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/hpi/fs00/home/philipp.hildebrandt/vllm-arm/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/hpi/fs00/home/philipp.hildebrandt/vllm-arm/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 144, in get_requires_for_build_editable
          return hook(config_settings)
        File "/tmp/pip-build-env-s6eeaoxg/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 483, in get_requires_for_build_editable
          return self.get_requires_for_build_wheel(config_settings)
        File "/tmp/pip-build-env-s6eeaoxg/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
        File "/tmp/pip-build-env-s6eeaoxg/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-s6eeaoxg/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 320, in run_setup
          exec(code, locals())
        File "<string>", line 526, in <module>
        File "<string>", line 433, in get_vllm_version
      RuntimeError: Unknown runtime environment
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

I am not sure, but is there some incorrect environment variable that makes vLLM try to use numpy (the CPU backend?)? Here is my environment:

SHELL=/bin/bash
CONDA_EXE=<HOME_DIR>/miniconda3/bin/conda
_CE_M=
LMOD_arch=x86_64
TMUX=/tmp/tmux-9798/default,476541,2
LMOD_DIR=/usr/share/lmod/lmod/libexec
PWD=<HOME_DIR>
SLURM_GTIDS=0
LOGNAME=<USER>
XDG_SESSION_TYPE=tty
CONDA_PREFIX=<HOME_DIR>/miniconda3
SLURM_JOB_PARTITION=sorcery
MODULESHOME=/usr/share/lmod/lmod
MANPATH=/usr/share/lmod/lmod/share/man::
LMOD_PREPEND_BLOCK=normal
MOTD_SHOWN=pam
LANG=C.UTF-8
VIRTUAL_ENV=<HOME_DIR>/vllm-arm
CONDA_PROMPT_MODIFIER=(base)
TMPDIR=/tmp
LMOD_VERSION=6.6
MODULEPATH_ROOT=/usr/modulefiles
CUDA_VISIBLE_DEVICES=0
XDG_SESSION_CLASS=user
LMOD_PKG=/usr/share/lmod/lmod
TERM=screen
_CE_CONDA=
USER=<USER>
TMUX_PANE=%2
CONDA_SHLVL=1
LMOD_SETTARG_CMD=:
SHLVL=3
BASH_ENV=/usr/share/lmod/lmod/init/bash
LMOD_FULL_SETTARG_SUPPORT=no
LMOD_sys=Linux
XDG_SESSION_ID=4503
CONDA_PYTHON_EXE=<HOME_DIR>/miniconda3/bin/python
LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:
LMOD_COLORIZE=yes
XDG_RUNTIME_DIR=/run/user/9798
PS1=(vllm-arm) ${debian_chroot:+($debian_chroot)}\u@\h:\w\$
CONDA_DEFAULT_ENV=base
CUDA_HOME=/usr/local/cuda-12.4
PATH=<HOME_DIR>/vllm-arm/bin:/usr/local/cuda-12.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:<HOME_DIR>/.local/bin:<HOME_DIR>/bin:<HOME_DIR>/.local/bin:<HOME_DIR>/bin
MODULEPATH=/etc/lmod/modules:/usr/share/lmod/lmod/modulefiles/
LMOD_CMD=/usr/share/lmod/lmod/libexec/lmod
SSH_TTY=/dev/pts/13
OLDPWD=<HOME_DIR>/vllm
SLURM_JOB_NODELIST=ga01
BASH_FUNC_ml%%=() {  eval $($LMOD_DIR/ml_cmd "$@")
}
BASH_FUNC_module%%=() {  eval $($LMOD_CMD bash "$@"); 
 [ $? = 0 ] && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
}
_=/usr/bin/env
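
One knob that setup.py does read is VLLM_TARGET_DEVICE; forcing it is a way to test the environment-variable theory, though I am not sure it helps if the torch visible at build time is CPU-only:

# setup.py picks the backend via VLLM_TARGET_DEVICE (cuda by default);
# forcing it here only to rule the environment-variable theory in or out
$ VLLM_TARGET_DEVICE=cuda pip install -e . --no-build-isolation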

@Bihan

Bihan commented Dec 24, 2024

To successfully run vLLM on the GH200, we followed these steps:

docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3

# Inside the container
$ pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124 # currently, only the PyTorch nightly has wheels for aarch64 with CUDA
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ python use_existing_torch.py # strip vLLM's pinned PyTorch requirements
$ pip install -r requirements-build.txt # install the remaining build-time dependencies
$ pip install -vvv -e . --no-build-isolation # build against the already-installed PyTorch

# Install Triton; otherwise vLLM fails with "Triton module not found"
$ git clone https://github.com/triton-lang/triton.git
$ cd triton
$ pip install ninja cmake wheel pybind11 # build-time dependencies
$ pip install -e python
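
The --no-build-isolation flag is the key part for the error above: with isolation on, pip installs its own stable torch (CPU-only on aarch64 at the time) into a temporary build environment, which is presumably why setup.py raised "Unknown runtime environment". A minimal smoke test after the build (facebook/opt-125m is just an example small model):

$ python -c 'from vllm import LLM; llm = LLM(model="facebook/opt-125m"); print(llm.generate("Hello, my name is")[0].outputs[0].text)'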

@johnnynunez

johnnynunez commented Dec 31, 2024

(Quoting @Bihan's build steps above.)

You can use the same scripts as jetson-containers; just use the Docker setup for SBSA. In NVIDIA's terminology:
ARM: Jetson and future Arm laptops.
SBSA: Grace.
https://github.com/dusty-nv/jetson-containers

@1556900941lizerui

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
May I ask which version of vLLM you compiled? I haven't been able to compile the latest version on my end.

@johnnynunez

johnnynunez commented Feb 25, 2025

PyTorch now officially supports aarch64 wheels.
Failing that, you can build it yourself with GitHub Arm runners and CUDA binaries; I ported cuda-toolkit to use GitHub Arm runners and SBSA CUDA:
https://github.com/Jimver/cuda-toolkit/releases/tag/v0.2.21
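
A sketch of the stable install, assuming your CUDA version has aarch64 wheels on the index (cu126 is only an example; check pytorch.org for the current URL):

$ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126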
