Feature Request: Support RPC with -dev/-devd #10609

Closed
4 tasks done
person4268 opened this issue Dec 1, 2024 · 3 comments · Fixed by #11262

Labels: enhancement (New feature or request)

person4268 commented Dec 1, 2024

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

-dev/-devd currently don't appear to work with RPC, because the RPC devices are only created later in initialization:

130 person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
version: 4230 (0c39f44d)
built with cc (GCC) 14.2.1 20240910 for x86_64-pc-linux-gnu
person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server -m /mnt4/models/Mistral-Large-Instruct-2411-IQ3_XXS.gguf -ngl 18 -c 16384 --host 0.0.0.0 --log-colors -fa --no-mmap --rpc 192.168.0.104:50052 -md /mnt4/models/Ministral-8B-Instruct-2410.i1-Q6_K.gguf -devd "RPC[192.168.0.104:50052]" -ngld 99 -cd 8192 -dev ROCm0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
error while handling argument "-devd": invalid device: RPC[192.168.0.104:50052]

usage:
-dev,  --device <dev1,dev2,..>          comma-separated list of devices to use for offloading (none = don't
                                        offload)
                                        use --list-devices to see a list of available devices
                                        (env: LLAMA_ARG_DEVICE)


to show complete usage, run with -h
130 person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server --list-devices --rpc 192.168.0.104:50052
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
Available devices:
  ROCm0: AMD Radeon RX 6700 XT (12272 MiB, 11872 MiB free)

Motivation

I have one computer that can run a large model and fit nothing else. I have another computer that can fit a smaller draft model and run it pretty quickly, so it'd be nice if I could run the draft model over RPC. To do so, I need to set -dev to my local machine's GPU and -devd to the system over RPC.

Possible Implementation

RPC device creation would need to happen much earlier, before the -dev/-devd arguments are validated. I was trying to see if I could hack the feature in but wasn't sure how to approach it.
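
A minimal sketch of that approach, assuming the ggml_backend_rpc_add_device() and ggml_backend_device_register() functions from the current ggml headers; the helper and its name are illustrative, not the actual patch. The idea is for the argument parser to split the --rpc value and register each endpoint as a device before -dev/-devd are validated, so that names like "RPC[192.168.0.104:50052]" can be resolved.

// Illustrative sketch only: register RPC devices while parsing --rpc,
// before any -dev/-devd value is checked against the device registry.
#include <sstream>
#include <string>

#include "ggml-backend.h"
#include "ggml-rpc.h"

static void register_rpc_devices(const std::string & servers) {
    // servers is the raw --rpc value, e.g. "192.168.0.104:50052,host:50053"
    std::stringstream ss(servers);
    std::string endpoint;
    while (std::getline(ss, endpoint, ',')) {
        ggml_backend_dev_t dev = ggml_backend_rpc_add_device(endpoint.c_str());
        if (dev != nullptr) {
            ggml_backend_device_register(dev);
        }
    }
}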

person4268 added the enhancement (New feature or request) label Dec 1, 2024
slaren (Collaborator) commented Dec 1, 2024

We should remove rpc_servers from llama_model_params and instead treat them as any other device in the llama_model_params::devices list. That would require moving the RPC device initialization outside of llama.cpp, into the command line argument parser.
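
A rough sketch of what the calling side could look like under that proposal, assuming the NULL-terminated llama_model_params::devices field and ggml_backend_dev_by_name() from the current headers; the helper function and its static storage are illustrative, not llama.cpp code.

#include <string>
#include <vector>

#include "ggml-backend.h"
#include "llama.h"

// Device handles must outlive the model params that point at them.
static std::vector<ggml_backend_dev_t> g_devices;

static llama_model_params make_model_params(const std::vector<std::string> & dev_names) {
    llama_model_params params = llama_model_default_params();
    g_devices.clear();
    for (const auto & name : dev_names) {
        // resolves "ROCm0" and "RPC[192.168.0.104:50052]" alike, provided the
        // RPC devices were already registered during argument parsing
        ggml_backend_dev_t dev = ggml_backend_dev_by_name(name.c_str());
        if (dev != nullptr) {
            g_devices.push_back(dev);
        }
    }
    g_devices.push_back(nullptr);      // the devices list is NULL-terminated
    params.devices = g_devices.data();
    return params;
}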

rgerganov self-assigned this Dec 2, 2024
github-actions bot added the stale label Jan 2, 2025

github-actions bot commented:
This issue was closed because it has been inactive for 14 days since being marked as stale.

slaren reopened this Jan 16, 2025
rgerganov added a commit to rgerganov/llama.cpp that referenced this issue Jan 16, 2025
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.

ref: ggerganov#10609
slaren linked a pull request Jan 16, 2025 that will close this issue
github-actions bot removed the stale label Jan 17, 2025
rgerganov added a commit to rgerganov/llama.cpp that referenced this issue Jan 17, 2025
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.

ref: ggerganov#10609
rgerganov (Collaborator) commented Jan 17, 2025

PR #11262 will resolve this, but note that --list-devices should come after all of the --rpc flags:

$ bin/llama-cli --rpc localhost:50052,localhost:50053 --list-devices
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA T1200 Laptop GPU, compute capability 7.5, VMM: yes
Available devices:
  CUDA0: NVIDIA T1200 Laptop GPU (3735 MiB, 3127 MiB free)
  RPC[localhost:50052]: RPC[localhost:50052] (3735 MiB, 3235 MiB free)
  RPC[localhost:50053]: RPC[localhost:50053] (3735 MiB, 3181 MiB free)

vs.

$ bin/llama-cli --list-devices --rpc localhost:50052,localhost:50053               
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA T1200 Laptop GPU, compute capability 7.5, VMM: yes
Available devices:
  CUDA0: NVIDIA T1200 Laptop GPU (3735 MiB, 3127 MiB free)

The same applies to -dev and -devd. I will submit a follow-up patch to clarify this in the docs.

rgerganov added a commit that referenced this issue Jan 17, 2025
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.

ref: #10609