[Installation]: VLLM on ARM machine with GH200 #10459
@Phimanu I just submitted a PR today (#10499) that updates the Dockerfile and adds a new requirements file specifically to fix this and allow building an Arm64/GH200 version with CUDA from the main repo. Side note: I've been maintaining a GH200-specific Docker container of vLLM until the PR is merged, if you want to try that (I haven't exhaustively tested everything, but I tried a couple of different models and options to confirm general functionality):
Hey, I tried it with the nightly PyTorch version and also your branch, but I still got the same error.
I'm not sure, but is there some incorrect environment variable that makes vLLM try to use NumPy (the CPU backend?)?
To successfully run vLLM on the GH200, we followed these steps:
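For reference, a from-source build of this kind on an aarch64/GH200 host usually looks roughly like the following. This is a sketch, not the commenter's exact steps: the nightly index URL, the cu124 tag, and the `use_existing_torch.py` helper are assumptions based on the vLLM repository layout and PyTorch's wheel hosting.

```shell
# Fresh virtual environment so a system NumPy/torch cannot interfere
python -m venv .venv
source .venv/bin/activate

# Nightly PyTorch with CUDA for aarch64 (assumed index URL / CUDA tag)
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu124

# Build vLLM against the already-installed torch instead of a pinned one
git clone https://github.com/vllm-project/vllm.git
cd vllm
python use_existing_torch.py
pip install -r requirements-build.txt
pip install -e . --no-build-isolation
```

Building with `--no-build-isolation` matters here: with isolation enabled, pip would pull in its own x86-oriented torch pin during the build instead of using the aarch64 wheel installed above.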
You can use the same scripts from jetson-containers; just use the Docker image for SBSA.
PyTorch now officially ships aarch64 wheels.
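If that is the case, installing a CUDA-enabled aarch64 build may be as simple as pointing pip at the matching wheel index. The index URL below is an assumption based on PyTorch's usual naming scheme; check the PyTorch "Get Started" selector for the current one.

```shell
# Assumption: aarch64 (SBSA) CUDA 12.4 wheels are published on the cu124 index
pip install torch --index-url https://download.pytorch.org/whl/cu124
```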
Your current environment
(I cannot run collect_env, since it requires vLLM to be installed.)
I have an ARM CPU and an NVIDIA GH200 (Driver Version: 550.90.07, CUDA Version: 12.4).
How you are installing vllm
I get this error:
I thought NumPy was missing, or that there was some problem with torch, which is why I manually installed numpy and torch in a fresh venv before trying again. Torch reports that CUDA is available, but the error looks like vLLM might be trying to use a CPU backend. I tried manually installing pynvml, but it did not change anything.
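One quick thing to rule out the stray-environment-variable theory from the comments above: vLLM's backend selection can be steered by an environment variable. The sketch below assumes the variable is named `VLLM_TARGET_DEVICE` and defaults to `"cuda"`; the helper function is hypothetical, written only to make the check explicit.

```python
import os


def guess_vllm_device(env: dict) -> str:
    """Hypothetical sketch of backend selection: return the device vLLM
    would target, assuming it honors VLLM_TARGET_DEVICE and defaults
    to "cuda" when the variable is unset."""
    return env.get("VLLM_TARGET_DEVICE", "cuda")


# If this prints "cpu", something in the shell profile or container
# image is forcing the CPU backend despite CUDA being available.
print(guess_vllm_device(dict(os.environ)))
```

Running `env | grep -i vllm` in the same shell is an equivalent check without Python.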