
[Installation]: VLLM on ARM machine with GH200 #10459

Open
1 task done
Phimanu opened this issue Nov 19, 2024 · 6 comments
Labels
installation Installation problems

Comments

@Phimanu

Phimanu commented Nov 19, 2024

Your current environment

(I cannot run collect_env since it requires vLLM to be installed.)

$ pip freeze
certifi==2022.12.7
charset-normalizer==2.1.1
filelock==3.16.1
fsspec==2024.10.0
idna==3.4
Jinja2==3.1.4
MarkupSafe==3.0.2
mpmath==1.3.0
networkx==3.4.2
numpy==2.1.3
pillow==10.2.0
pynvml==11.5.3
requests==2.28.1
sympy==1.13.1
torch==2.5.1
typing_extensions==4.12.2
urllib3==1.26.13

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy

I have an ARM CPU and an NVIDIA GH200 (Driver Version: 550.90.07, CUDA Version: 12.4).

How you are installing vllm

pip install torch numpy
pip install vllm

I get this error:

pip install vllm
Collecting vllm
  Using cached vllm-0.6.4.post1.tar.gz (3.1 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      /tmp/pip-build-env-8t3z_6ag/overlay/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
        cpu = _conversion_method_template(device=torch.device("cpu"))
      Traceback (most recent call last):
        File "/hpi/fs00/home/philipp.hildebrandt/armpython/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/hpi/fs00/home/philipp.hildebrandt/armpython/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/hpi/fs00/home/philipp.hildebrandt/armpython/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-8t3z_6ag/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
        File "/tmp/pip-build-env-8t3z_6ag/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-8t3z_6ag/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 320, in run_setup
          exec(code, locals())
        File "<string>", line 526, in <module>
        File "<string>", line 433, in get_vllm_version
      RuntimeError: Unknown runtime environment
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

I thought numpy was missing or there was some problem with torch, which is why I manually installed numpy and torch in a fresh venv before trying again. Torch reports CUDA as available, but the error looks like vLLM might be trying to use a CPU backend. I also tried manually installing pynvml, but it did not change anything.
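
For reference, a quick check of what the installed torch actually reports (if torch.version.cuda prints None, it is a CPU-only build, which would match vLLM failing to detect a CUDA environment):

$ python -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())'
# a CPU-only aarch64 wheel prints something like: 2.5.1 None False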

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Phimanu added the installation (Installation problems) label on Nov 19, 2024
Phimanu changed the title from "[Installation]:" to "[Installation]: VLLM on ARM machine with GH200" on Nov 19, 2024
@drikster80
Contributor

@Phimanu
PyTorch doesn't support Arm64+CUDA in the stable release, but you can now run it with the nightly version.

I just submitted a PR today (#10499) that updates the Dockerfile and adds a new requirements file specifically to fix this and allow for building an Arm64/GH200 version with CUDA from the main repo.

Side note: I've been maintaining a GH200-specific Docker container of vLLM until the PR is merged, if you want to try it (I haven't exhaustively tested everything, but I tried a couple of different models and options to confirm general functionality):
https://hub.docker.com/r/drikster80/vllm-gh200-openai/tags
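
For reference, installing the nightly looks roughly like this (cu124 matches the driver above; adjust the index URL to your CUDA version):

$ pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124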

@Phimanu
Author

Phimanu commented Nov 23, 2024

Hey, I tried it with the nightly PyTorch version and also your branch, but I still got the same error.

(vllm-arm) philipp.hildebrandt@ga01:~$ python
Python 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.6.0.dev20241120+cu124'
>>> print(torch.cuda.is_available())
True
>>> print(torch.version.cuda)
12.4
(vllm-arm) philipp.hildebrandt@ga01:~$ git clone https://github.com/drikster80/vllm.git
Cloning into 'vllm'...
remote: Enumerating objects: 41671, done.
remote: Counting objects: 100% (7682/7682), done.
remote: Compressing objects: 100% (487/487), done.
remote: Total 41671 (delta 7427), reused 7195 (delta 7195), pack-reused 33989 (from 1)
Receiving objects: 100% (41671/41671), 32.58 MiB | 21.70 MiB/s, done.
Resolving deltas: 100% (32302/32302), done.
(vllm-arm) philipp.hildebrandt@ga01:~$ cd vllm
(vllm-arm) philipp.hildebrandt@ga01:~/vllm$ pip install -e .
Obtaining file:///hpi/fs00/home/philipp.hildebrandt/vllm
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build editable did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      /tmp/pip-build-env-s6eeaoxg/overlay/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
        cpu = _conversion_method_template(device=torch.device("cpu"))
      Traceback (most recent call last):
        File "/hpi/fs00/home/philipp.hildebrandt/vllm-arm/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/hpi/fs00/home/philipp.hildebrandt/vllm-arm/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/hpi/fs00/home/philipp.hildebrandt/vllm-arm/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 144, in get_requires_for_build_editable
          return hook(config_settings)
        File "/tmp/pip-build-env-s6eeaoxg/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 483, in get_requires_for_build_editable
          return self.get_requires_for_build_wheel(config_settings)
        File "/tmp/pip-build-env-s6eeaoxg/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
        File "/tmp/pip-build-env-s6eeaoxg/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-s6eeaoxg/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 320, in run_setup
          exec(code, locals())
        File "<string>", line 526, in <module>
        File "<string>", line 433, in get_vllm_version
      RuntimeError: Unknown runtime environment
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

I am not sure, but is there some incorrect environment variable that makes vLLM try to use numpy (the CPU backend?)? Here is my environment:

SHELL=/bin/bash
CONDA_EXE=<HOME_DIR>/miniconda3/bin/conda
_CE_M=
LMOD_arch=x86_64
TMUX=/tmp/tmux-9798/default,476541,2
LMOD_DIR=/usr/share/lmod/lmod/libexec
PWD=<HOME_DIR>
SLURM_GTIDS=0
LOGNAME=<USER>
XDG_SESSION_TYPE=tty
CONDA_PREFIX=<HOME_DIR>/miniconda3
SLURM_JOB_PARTITION=sorcery
MODULESHOME=/usr/share/lmod/lmod
MANPATH=/usr/share/lmod/lmod/share/man::
LMOD_PREPEND_BLOCK=normal
MOTD_SHOWN=pam
LANG=C.UTF-8
VIRTUAL_ENV=<HOME_DIR>/vllm-arm
CONDA_PROMPT_MODIFIER=(base)
TMPDIR=/tmp
LMOD_VERSION=6.6
MODULEPATH_ROOT=/usr/modulefiles
CUDA_VISIBLE_DEVICES=0
XDG_SESSION_CLASS=user
LMOD_PKG=/usr/share/lmod/lmod
TERM=screen
_CE_CONDA=
USER=<USER>
TMUX_PANE=%2
CONDA_SHLVL=1
LMOD_SETTARG_CMD=:
SHLVL=3
BASH_ENV=/usr/share/lmod/lmod/init/bash
LMOD_FULL_SETTARG_SUPPORT=no
LMOD_sys=Linux
XDG_SESSION_ID=4503
CONDA_PYTHON_EXE=<HOME_DIR>/miniconda3/bin/python
LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:
LMOD_COLORIZE=yes
XDG_RUNTIME_DIR=/run/user/9798
PS1=(vllm-arm) ${debian_chroot:+($debian_chroot)}\u@\h:\w\$
CONDA_DEFAULT_ENV=base
CUDA_HOME=/usr/local/cuda-12.4
PATH=<HOME_DIR>/vllm-arm/bin:/usr/local/cuda-12.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:<HOME_DIR>/.local/bin:<HOME_DIR>/bin:<HOME_DIR>/.local/bin:<HOME_DIR>/bin
MODULEPATH=/etc/lmod/modules:/usr/share/lmod/lmod/modulefiles/
LMOD_CMD=/usr/share/lmod/lmod/libexec/lmod
SSH_TTY=/dev/pts/13
OLDPWD=<HOME_DIR>/vllm
SLURM_JOB_NODELIST=ga01
BASH_FUNC_ml%%=() {  eval $($LMOD_DIR/ml_cmd "$@")
}
BASH_FUNC_module%%=() {  eval $($LMOD_CMD bash "$@"); 
 [ $? = 0 ] && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
}
_=/usr/bin/env
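
One knob that setup.py does read is VLLM_TARGET_DEVICE; forcing it is a way to test the environment-variable theory, though I am not sure it helps if the torch visible at build time is CPU-only:

# setup.py picks the backend via VLLM_TARGET_DEVICE (cuda by default);
# forcing it here only to rule the environment-variable theory in or out
$ VLLM_TARGET_DEVICE=cuda pip install -e . --no-build-isolation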

@Bihan

Bihan commented Dec 24, 2024

To successfully run vLLM on the GH200, we followed these steps:

docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3

# Inside the container
$ pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124 # currently, only the PyTorch nightly has wheels for aarch64 with CUDA
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ python use_existing_torch.py # strip vLLM's pinned PyTorch requirements
$ pip install -r requirements-build.txt # install the remaining build-time dependencies
$ pip install -vvv -e . --no-build-isolation # build against the already-installed PyTorch

# Install Triton; otherwise vLLM fails with "Triton module not found"
$ git clone https://github.com/triton-lang/triton.git
$ cd triton
$ pip install ninja cmake wheel pybind11 # build-time dependencies
$ pip install -e python
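
The --no-build-isolation flag is the key part for the error above: with isolation on, pip installs its own stable torch (CPU-only on aarch64 at the time) into a temporary build environment, which is presumably why setup.py raised "Unknown runtime environment". A minimal smoke test after the build (facebook/opt-125m is just an example small model):

$ python -c 'from vllm import LLM; llm = LLM(model="facebook/opt-125m"); print(llm.generate("Hello, my name is")[0].outputs[0].text)'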

@johnnynunez

johnnynunez commented Dec 31, 2024

(Quoting @Bihan's build steps above.)

You can use the same scripts as jetson-containers; just use the Docker setup for SBSA. In NVIDIA's terminology:
ARM: Jetson and future Arm laptops.
SBSA: Grace.
https://github.com/dusty-nv/jetson-containers

@1556900941lizerui

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
May I ask which version of vLLM you compiled? I haven't been able to compile the latest version on my end.

@johnnynunez

johnnynunez commented Feb 25, 2025

PyTorch now officially supports aarch64 wheels.
Failing that, you can build it yourself with GitHub Arm runners and CUDA binaries; I ported cuda-toolkit to use GitHub Arm runners and SBSA CUDA:
https://github.com/Jimver/cuda-toolkit/releases/tag/v0.2.21
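
A sketch of the stable install, assuming your CUDA version has aarch64 wheels on the index (cu126 is only an example; check pytorch.org for the current URL):

$ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126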
