
llm_translate

This repository demonstrates how to set up a llama.cpp server and call it using a Python client for translation purposes. The goal is to provide a minimal working example of a llama.cpp server/client pair.

Setup

These instructions assume that the host machine runs a Debian-based system such as Ubuntu. To keep the setup quick and convenient, the provided bash scripts must be run with sudo privileges; this applies in particular to the Nvidia GPU steps.

Configuration

Before starting the setup, review the configuration file config.conf. This file contains default values optimized for a quick test of the llama.cpp server's capabilities. For more advanced use, such as deploying a larger model, you can easily modify the configuration.

One key setting is USE_GPU. Set this to true if you have Nvidia GPUs available and wish to take advantage of their capabilities.
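
For orientation, the variables referenced in this README might look like the following in config.conf. These are illustrative placeholder values only, not the repository's defaults; consult the actual file before changing anything:

# Illustrative values; see the real config.conf for the defaults
USE_GPU=false
HUGGING_FACE_MODEL_REPO=<hugging-face-repo-id>
MODEL_FILE=<model-file-name>.gguf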

Disk Space Requirements

  • Default Setup (without GPU): The setup will require approximately 5.5 GB of disk space.
  • With GPU Support (USE_GPU=true): An additional 10 GB will be required, bringing the total to around 15.5 GB.
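
Before starting, you can check the available disk space. Note that Docker stores its images under /var/lib/docker by default, which usually lives on the root filesystem:

df -h /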

Host setup

docker install

Run the following command to check if docker is installed:

docker --version

If no version number is reported, install Docker:

sudo apt install docker-ce

and

sudo systemctl start docker
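
Note that docker-ce is distributed through Docker's official apt repository; if apt cannot find the package, add that repository first (see Docker's installation documentation) or install Ubuntu's docker.io package instead. Once the daemon is running, you can optionally verify the installation with Docker's hello-world image:

sudo docker run --rm hello-world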

nvidia-container-toolkit install (only if using Nvidia GPUs)

This step is only required if you plan to use Nvidia GPUs (USE_GPU=true).

Run the following command to check if nvidia-container-toolkit is installed:

nvidia-container-toolkit --version

If no version number is reported, run the following script from the root of the repository:

sudo ./scripts/setup_nvidia_container_toolkit.sh
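
The toolkit only configures the container side and assumes a working Nvidia driver on the host. You can verify the driver itself with:

nvidia-smi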

nvidia docker runtime install (only if using Nvidia GPUs)

This step is only required if you plan to use Nvidia GPUs (USE_GPU=true).

If the command

sudo docker info | grep Runtimes

doesn't show nvidia as a runtime, you need to add it. Run the following script from the root of the repository:

sudo ./scripts/setup_nvidia_docker_runtime.sh
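
Afterwards, sudo docker info | grep Runtimes should list nvidia. As an optional sanity check, you can run nvidia-smi inside a container; the CUDA image tag below is only an example and must be compatible with your driver:

sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi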

Getting the model file

You have two options to get a model file.

  1. Download a model file: You can download the model from Hugging Face. To do this, execute the following script:

sudo ./scripts/download_model.sh

This script uses the configuration variables HUGGING_FACE_MODEL_REPO and MODEL_FILE from config.conf.

  2. Use your own model file: Place your model file in the ./models folder. Ensure that the model file name matches the MODEL_FILE variable in config.conf.
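
In either case, you can confirm that the model file is in place and has the expected size:

ls -lh ./models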

Server setup

To set up the server, run:

sudo ./scripts/setup_server.sh

Controlling the server

Start the server:

sudo ./scripts/start_server.sh

Stop the server:

sudo ./scripts/stop_server.sh

You can check if the server is running locally on your machine:

sudo ./scripts/is_server_running_locally.sh

During the startup phase, this script may report false for a few seconds even though everything is fine. If you suspect an error during server startup, check the log files in ./logs.
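
Assuming the server listens on llama.cpp's default port 8080 on localhost (the actual host and port may differ depending on config.conf), you can also probe the llama.cpp server's built-in health endpoint directly:

curl http://localhost:8080/health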

Client call

Once the server is running, you can try it out by sending it translation tasks through the client:

sudo ./scripts/translate.sh
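
Under the same assumption about host and port, you can also send a raw request to the llama.cpp server's completion endpoint with curl. The prompt below is only an illustration, not the prompt used by the repository's client:

curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Translate to German: Good morning!", "n_predict": 64}'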

Cleanup

To clean up your environment, run the following command:

sudo ./scripts/cleanup/cleanup.sh

This will remove the components installed by this repository, i.e. the log and model files and the Docker-related volumes.
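
To confirm the Docker side of the cleanup, you can list what remains:

sudo docker ps -a
sudo docker volume ls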
