
Run OneTrainer in Runpod

hyppyhyppo edited this page Feb 22, 2025 · 12 revisions

With Runpod you set up a virtual instance in the cloud. One advantage is that you benefit from a large GPU at a reasonable cost; another is that, since the training runs in the cloud, you can keep using your local machine for other work while it trains.

Note: the cloud training feature, released in January 2025, is simpler to use and brings many advantages; we recommend using it instead.

  1. Deploy a pod.
  2. Install OneTrainer.
  3. Copy your dataset, model and config. Edit the config.
  4. Call byobu and start your training.

First, create an account on Runpod and add some credit.

Then deploy a pod.

Select a GPU. GPUs are billed hourly while in use, and the cheapest prices are for previous-generation NVIDIA GPUs.

Choose a template. Here I'm using "RunPod VS Code Server", but others can work. Note that there is a template with OneTrainer already installed: search for "dxqbyd/onetrainer-cli:0.7". Remember to update OT when using it, as this template is only updated for major OT releases.

Review and edit the template, and check the volume space: OT takes 10GB, and you also need room for your dataset(s), cache and workspace. If you plan to use models from Hugging Face that require a token (SD3, Flux), you can set your HF_TOKEN as an environment variable.
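Once the pod is running, the token and the volume space can be checked quickly from the terminal. The sketch below assumes the volume is mounted at /workspace (the root folder used later in this guide) and that the token was set under the name HF_TOKEN; the function names are hypothetical helpers, not part of OneTrainer or Runpod.

```bash
# Succeeds if HF_TOKEN is set and non-empty, without printing the token itself.
check_hf_token() {
  if [ -n "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is set"
  else
    echo "HF_TOKEN is not set (needed for gated models such as SD3 or Flux)" >&2
    return 1
  fi
}

# Shows size, used and free space of the filesystem holding the given path
# (defaults to /workspace, assumed to be the pod volume).
show_volume_space() {
  df -h "${1:-/workspace}"
}

# Shows how much space a folder takes, e.g. a dataset, to compare with the
# free space above (also works on your local Linux/macOS machine).
folder_size() {
  du -sh "$1"
}

# Usage on the pod:
#   check_hf_token
#   show_volume_space
#   folder_size /workspace/my_dataset
```

Running folder_size on your dataset locally before deploying gives a rough idea of how much volume space to add on top of the 10GB needed by OT.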


Select a pricing plan and deploy the pod.


Start the pod with the blue arrow at the top right.


Before connecting to it, open its settings again (Edit Pod); you'll find the Jupyter Lab password there.


Connect to the pod and choose "Connect to HTTP Service (Port 8888)".


You'll be asked for the Jupyter password, and JupyterLab will open.


Open the terminal and install OneTrainer and byobu:

```bash
git clone https://github.com/Nerogar/OneTrainer.git
apt update
apt install ffmpeg byobu tmux aria2
cd OneTrainer/
./install.sh
```

Later, you can update OT by running ./update.sh in the OneTrainer directory.
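The install and update steps above can be combined into one function that is safe to re-run, sketched below. It assumes OneTrainer is cloned into /workspace/OneTrainer and that the terminal runs as root (so apt works without sudo); the function names are illustrative, while install.sh and update.sh are the repository scripts mentioned above.

```bash
# Prints the names of any tools that are not found on the PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Installs OneTrainer on the first run and updates it on later runs (hypothetical helper).
install_or_update_onetrainer() {
  local ot_dir="${1:-/workspace/OneTrainer}"   # assumed install location
  local missing
  # The aria2 package provides the aria2c command.
  missing=$(missing_tools ffmpeg byobu tmux aria2c)
  if [ -n "$missing" ]; then
    # -y answers yes to apt's confirmation prompt.
    apt update && apt install -y ffmpeg byobu tmux aria2 || return 1
  fi
  if [ -d "$ot_dir/.git" ]; then
    # Already cloned: update in place.
    (cd "$ot_dir" && ./update.sh)
  else
    git clone https://github.com/Nerogar/OneTrainer.git "$ot_dir" &&
      (cd "$ot_dir" && ./install.sh)
  fi
}

# Usage on the pod:
#   install_or_update_onetrainer
```

Checking for the tools first skips the apt step when reusing a pod (or the OneTrainer template) where they are already installed.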

Now open OneTrainer on your local computer and export your training configuration with the export button at the bottom right of the UI. Save it locally.

Back in Jupyter, upload your config, base model and dataset(s) under the root folder. If you're loading the base model from Hugging Face, you don't need to upload it.

Edit your config to reflect your dataset (and model) location, save it.


Make sure these paths start with the root folder /workspace/, or OT won't find your files.
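Before starting, you can list the path-like values in the config that don't start with /workspace/. This is a rough grep-based sketch, not a JSON parser: it assumes the exported config is pretty-printed JSON with "key": "value" pairs, and check_workspace_paths is a hypothetical name.

```bash
# Lists "key": "value" pairs whose value contains / or \ (so probably a path)
# but does not start with /workspace/. Prints nothing if all paths look fine.
check_workspace_paths() {
  local config="$1"
  [ -f "$config" ] || { echo "config not found: $config" >&2; return 1; }
  grep -oE '"[A-Za-z0-9_]+": *"[^"]*[/\\][^"]*"' "$config" \
    | grep -vE ': *"/workspace/' \
    || true   # no leftover local paths is not an error
}

# Usage:
#   check_workspace_paths /workspace/config.json
```

Hugging Face model IDs such as org/model also contain a slash, so they will appear in the output too; those can stay as they are.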

Finally, start the training from the OneTrainer directory:

```bash
byobu
source venv/bin/activate
python scripts/train.py --config-path "<path_to_config>"
```

For example:

```bash
python scripts/train.py --config-path "/workspace/config.json"
```
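The launch steps above can also be wrapped in a small function that checks the config before starting, sketched below. It assumes OneTrainer was cloned into /workspace/OneTrainer and that install.sh created the venv folder used above; start_training is a hypothetical name. Run it from inside byobu so the training keeps running if your terminal disconnects.

```bash
# Starts a OneTrainer training from the command line (hypothetical helper).
# The body runs in a subshell ( ... ) so the cd and venv activation
# don't leak into your current shell.
start_training() (
  config="$1"
  ot_dir="${2:-/workspace/OneTrainer}"   # assumed clone location
  # Require an absolute path so the config still resolves after the cd below.
  case "$config" in
    /*) ;;
    *) echo "use an absolute path, e.g. /workspace/config.json" >&2; exit 1 ;;
  esac
  [ -f "$config" ] || { echo "config not found: $config" >&2; exit 1; }
  cd "$ot_dir" || exit 1
  . venv/bin/activate || exit 1
  python scripts/train.py --config-path "$config"
)

# Usage inside byobu:
#   start_training /workspace/config.json
```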

You can stop the training with Ctrl+C inside the byobu session. It has the same effect as stopping a training from the UI: it creates a backup and saves the model.
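If you are in another terminal on the pod, you can send the training process the same signal that Ctrl+C sends (SIGINT), as sketched below. This assumes the process command line contains scripts/train.py and that OneTrainer handles a SIGINT sent this way like a Ctrl+C (Python raises KeyboardInterrupt for both unless a custom handler is installed); stop_training is a hypothetical name.

```bash
# Sends SIGINT (the signal Ctrl+C sends) to every process whose full command
# line matches the pattern; by default the OneTrainer training script.
# pkill exits with status 1 if no process matched.
stop_training() {
  pkill -INT -f -- "${1:-scripts/train.py}"
}

# Usage (from any terminal on the pod):
#   stop_training
```

Avoid kill -9 (SIGKILL): it cannot be caught, so the process would stop without creating the backup or saving the model.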

And there you go!
