This repository demonstrates how to set up a llama.cpp server and call it using a Python client for translation purposes. The goal is to provide a minimal working example of a llama.cpp server/client pair.
These instructions assume the server's host machine runs a Debian-based system, such as Ubuntu. The setup relies on custom bash scripts that must be run with sudo privileges, especially when using Nvidia GPUs, to keep the installation quick and convenient.
Before starting the setup, review the configuration file `config.conf`. This file contains default values optimized for a quick test of the llama.cpp server's capabilities. For more advanced use, such as deploying a larger model, you can easily modify the configuration (a sketch of such a file follows the list below).

One key setting is `USE_GPU`. Set this to `true` if you have Nvidia GPUs available and wish to take advantage of their capabilities.
- Default Setup (without GPU): The setup will require approximately 5.5 GB of disk space.
- With GPU Support (`USE_GPU=true`): An additional 10 GB will be required, bringing the total to around 15.5 GB.
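For orientation, a `config.conf` for this kind of setup might look like the sketch below. Only `USE_GPU`, `HUGGING_FACE_MODEL_REPO`, and `MODEL_FILE` are referenced in these instructions; the example values are assumptions for illustration, not the repository's actual defaults.

```bash
# Hypothetical config.conf sketch -- the values are illustrative assumptions.
USE_GPU=false                                      # set to true to use Nvidia GPUs
HUGGING_FACE_MODEL_REPO="TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
MODEL_FILE="mistral-7b-instruct-v0.2.Q4_K_M.gguf"  # must exist in ./models
```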
Run the following command to check if `docker` is installed:

```bash
docker --version
```

If no version number is reported, install Docker and start the daemon:

```bash
sudo apt install docker-ce
sudo systemctl start docker
```
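Once Docker is installed and the daemon is running, you can optionally confirm that containers actually start. The `hello-world` image is Docker's standard smoke test; this check is not part of the repository's scripts:

```bash
# Pulls a tiny test image, runs it once, and removes the container afterwards.
sudo docker run --rm hello-world
```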
This step is only required if you plan to use Nvidia GPUs (`USE_GPU=true`).

Run the following command to check if `nvidia-container-toolkit` is installed:

```bash
nvidia-container-toolkit --version
```

If no version number is reported, run the following script from the root of the repository:

```bash
sudo ./scripts/setup_nvidia_container_toolkit.sh
```
This step is only required if you plan to use Nvidia GPUs (`USE_GPU=true`).

If the command

```bash
sudo docker info | grep Runtimes
```

doesn't show `nvidia` as a runtime, you need to add it. Run the following script from the root of the repository:

```bash
sudo ./scripts/setup_nvidia_docker_runtime.sh
```
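After the script has run, you can optionally verify that containers can see your GPUs. The following one-liner is the usual check from Nvidia's documentation, not part of this repository, and assumes a working Nvidia driver on the host:

```bash
# Runs nvidia-smi inside a throwaway container via the nvidia runtime;
# on success it prints the same GPU table as nvidia-smi on the host.
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```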
You have two options to get a model file.

- Download a model file: You can download the model from Hugging Face by executing the following script (a manual alternative is sketched after this list):

  ```bash
  sudo ./scripts/download_model.sh
  ```

  This script utilizes the configuration variables `HUGGING_FACE_MODEL_REPO` and `MODEL_FILE` of `config.conf`.

- Use your own model file: Place your model file in the `./models` folder. Ensure that the model file name matches the `MODEL_FILE` variable in `config.conf`.
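If you prefer to fetch the file manually, Hugging Face serves raw model files under a predictable `resolve` URL. The sketch below is an illustration, not the repository's download script; it assumes `config.conf` uses plain `KEY=value` pairs so it can be sourced by bash:

```bash
# Manual download sketch -- assumes config.conf is bash-sourceable.
source config.conf
mkdir -p models
# -L follows Hugging Face's redirect to the CDN hosting the actual file.
curl -L -o "models/${MODEL_FILE}" \
  "https://huggingface.co/${HUGGING_FACE_MODEL_REPO}/resolve/main/${MODEL_FILE}"
```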
To set up the server, run:

```bash
sudo ./scripts/setup_server.sh
```

Start the server:

```bash
sudo ./scripts/start_server.sh
```

Stop the server:

```bash
sudo ./scripts/stop_server.sh
```
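Since the server presumably runs as a Docker container (hence the `sudo` in the scripts), you can inspect it with standard Docker tooling after starting it; the container's name is chosen by the repository's scripts and not assumed here:

```bash
# Lists running containers with their names, images, and exposed ports.
sudo docker ps
```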
You can check if the server is running locally on your machine:

```bash
sudo ./scripts/is_server_running_locally.sh
```
During startup, the script may return `false` for a few seconds; this is expected. If you suspect errors during the server startup, check the log files in `./logs`.
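You can also probe the server directly: a llama.cpp server exposes a `/health` endpoint. The sketch below assumes llama.cpp's default port 8080, which your `config.conf` may override:

```bash
# Returns a small JSON status object, e.g. {"status":"ok"}, once the model is loaded.
curl http://localhost:8080/health
```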
Once the server is running, you can try it out by giving it some tasks with a client call:

```bash
sudo ./scripts/translate.sh
```
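Under the hood, the client talks HTTP to the server. If you want to bypass the script, the sketch below sends a raw request to llama.cpp's `/completion` endpoint; the port (8080) and the prompt wording are assumptions, and the repository's Python client may format its prompts differently:

```bash
# Minimal translation request against the llama.cpp HTTP API (port assumed).
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "Translate to German: The weather is nice today.\nTranslation:",
        "n_predict": 64,
        "temperature": 0.2
      }'
```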
To clean up your environment, run the following command:

```bash
sudo ./scripts/cleanup/cleanup.sh
```

This will remove the components installed by the repository, i.e. the log and model files and the Docker-related volumes.
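If you want to double-check the cleanup, you can list the remaining Docker volumes yourself (the exact volume names are created by the repository's scripts and not assumed here):

```bash
# The volumes created for the server should no longer appear in this list.
sudo docker volume ls
```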