Feature Request: Support RPC with -dev/-devd #10609

Closed
4 tasks done
person4268 opened this issue Dec 1, 2024 · 3 comments · Fixed by #11262

Labels: enhancement (New feature or request)

person4268 commented Dec 1, 2024

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

-dev/-devd currently don't appear to work with RPC, because the RPC devices are only created later in initialization:

130 person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
version: 4230 (0c39f44d)
built with cc (GCC) 14.2.1 20240910 for x86_64-pc-linux-gnu
person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server -m /mnt4/models/Mistral-Large-Instruct-2411-IQ3_XXS.gguf -ngl 18 -c 16384 --host 0.0.0.0 --log-colors -fa --no-mmap --rpc 192.168.0.104:50052 -md /mnt4/models/Ministral-8B-Instruct-2410.i1-Q6_K.gguf -devd "RPC[192.168.0.104:50052]" -ngld 99 -cd 8192 -dev ROCm0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
error while handling argument "-devd": invalid device: RPC[192.168.0.104:50052]

usage:
-dev,  --device <dev1,dev2,..>          comma-separated list of devices to use for offloading (none = don't
                                        offload)
                                        use --list-devices to see a list of available devices
                                        (env: LLAMA_ARG_DEVICE)


to show complete usage, run with -h
130 person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server --list-devices --rpc 192.168.0.104:50052
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
Available devices:
  ROCm0: AMD Radeon RX 6700 XT (12272 MiB, 11872 MiB free)

Motivation

I have one computer that can run a large model and fit nothing else. I have another computer that can fit a smaller draft model and run it pretty quickly, so it'd be nice if I could run the draft model over RPC. To do so, I need to set -dev to my local machine's GPU and -devd to the system over RPC.

Possible Implementation

RPC device creation would need to happen much earlier, before the -dev/-devd arguments are validated. I was trying to see if I could hack the feature in but wasn't sure how to approach it.
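
A minimal sketch of that approach, assuming the ggml_backend_rpc_add_device() and ggml_backend_device_register() functions from the current ggml headers; the helper and its name are illustrative, not the actual patch. The idea is for the argument parser to split the --rpc value and register each endpoint as a device before -dev/-devd are validated, so that names like "RPC[192.168.0.104:50052]" can be resolved.

// Illustrative sketch only: register RPC devices while parsing --rpc,
// before any -dev/-devd value is checked against the device registry.
#include <sstream>
#include <string>

#include "ggml-backend.h"
#include "ggml-rpc.h"

static void register_rpc_devices(const std::string & servers) {
    // servers is the raw --rpc value, e.g. "192.168.0.104:50052,host:50053"
    std::stringstream ss(servers);
    std::string endpoint;
    while (std::getline(ss, endpoint, ',')) {
        ggml_backend_dev_t dev = ggml_backend_rpc_add_device(endpoint.c_str());
        if (dev != nullptr) {
            ggml_backend_device_register(dev);
        }
    }
}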

person4268 added the enhancement (New feature or request) label Dec 1, 2024
slaren (Collaborator) commented Dec 1, 2024

We should remove rpc_servers from llama_model_params and instead treat them as any other device in the llama_model_params::devices list. That would require moving the RPC device initialization outside of llama.cpp, into the command line argument parser.
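
A rough sketch of what the calling side could look like under that proposal, assuming the NULL-terminated llama_model_params::devices field and ggml_backend_dev_by_name() from the current headers; the helper function and its static storage are illustrative, not llama.cpp code.

#include <string>
#include <vector>

#include "ggml-backend.h"
#include "llama.h"

// Device handles must outlive the model params that point at them.
static std::vector<ggml_backend_dev_t> g_devices;

static llama_model_params make_model_params(const std::vector<std::string> & dev_names) {
    llama_model_params params = llama_model_default_params();
    g_devices.clear();
    for (const auto & name : dev_names) {
        // resolves "ROCm0" and "RPC[192.168.0.104:50052]" alike, provided the
        // RPC devices were already registered during argument parsing
        ggml_backend_dev_t dev = ggml_backend_dev_by_name(name.c_str());
        if (dev != nullptr) {
            g_devices.push_back(dev);
        }
    }
    g_devices.push_back(nullptr);      // the devices list is NULL-terminated
    params.devices = g_devices.data();
    return params;
}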

rgerganov self-assigned this Dec 2, 2024
github-actions bot added the stale label Jan 2, 2025

github-actions bot commented:
This issue was closed because it has been inactive for 14 days since being marked as stale.

slaren reopened this Jan 16, 2025
rgerganov added a commit to rgerganov/llama.cpp that referenced this issue Jan 16, 2025
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.

ref: ggerganov#10609
slaren linked a pull request Jan 16, 2025 that will close this issue
github-actions bot removed the stale label Jan 17, 2025
rgerganov added a commit to rgerganov/llama.cpp that referenced this issue Jan 17, 2025
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.

ref: ggerganov#10609
rgerganov (Collaborator) commented Jan 17, 2025

PR #11262 will resolve this, but note that --list-devices should come after all of the --rpc flags:

$ bin/llama-cli --rpc localhost:50052,localhost:50053 --list-devices
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA T1200 Laptop GPU, compute capability 7.5, VMM: yes
Available devices:
  CUDA0: NVIDIA T1200 Laptop GPU (3735 MiB, 3127 MiB free)
  RPC[localhost:50052]: RPC[localhost:50052] (3735 MiB, 3235 MiB free)
  RPC[localhost:50053]: RPC[localhost:50053] (3735 MiB, 3181 MiB free)

vs.

$ bin/llama-cli --list-devices --rpc localhost:50052,localhost:50053               
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA T1200 Laptop GPU, compute capability 7.5, VMM: yes
Available devices:
  CUDA0: NVIDIA T1200 Laptop GPU (3735 MiB, 3127 MiB free)

The same applies to -dev and -devd. I will submit a follow-up patch to clarify this in the docs.

rgerganov added a commit that referenced this issue Jan 17, 2025
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.

ref: #10609