Run OneTrainer on RunPod

With RunPod you set up a virtual instance in the cloud. This has two advantages: you get access to a large GPU at a reasonable cost, and since the training runs in the cloud, you can keep using your local machine for other work in the meantime.

Note: OneTrainer's cloud training feature, released in January 2025, is simpler to use and has many advantages. We recommend using it instead.

Deploy a pod.
Install OneTrainer.
Copy your dataset, model and config, then edit the config.
Start byobu and launch your training.

First, create an account on RunPod and add some credit to it.

Then deploy a pod.

Select a GPU. GPUs are billed hourly while the pod is running; the cheapest prices are for previous-generation NVIDIA GPUs.

Choose a template. I'm using "RunPod VS Code Server" here, but others should work too. There is also a template with OneTrainer already installed: search for "dxqbyd/onetrainer-cli:0.7". If you use it, remember to update OneTrainer, because this template is only refreshed for major OneTrainer releases.

Review and edit the template, and check the volume size. OneTrainer itself takes 10 GB, and you also need room for your dataset(s), cache and workspace. If you plan to use models from Hugging Face that require a token (SD3, Flux), you can set your HF_TOKEN as an environment variable.
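
Once the pod is running, a quick pre-flight check from the terminal can confirm that the volume has enough free space and that the token was picked up. The sketch below is a hypothetical helper, not part of OneTrainer or RunPod: the function names has_free_gb and hf_token_set are made up for this example, and it assumes the volume is mounted at /workspace.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of OneTrainer or RunPod).

# Succeeds if the filesystem holding PATH has at least MIN_GB GiB available.
has_free_gb() {
  local path="$1" min_gb="$2" free_kb
  # -P: one POSIX-format line per filesystem; -k: sizes in 1024-byte blocks.
  # Column 4 of the second output line is the available space.
  free_kb=$(df -Pk "$path" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [ -n "$free_kb" ] || return 1
  [ "$free_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Succeeds if HF_TOKEN is set and non-empty (gated models such as SD3 or Flux need it).
hf_token_set() {
  [ -n "${HF_TOKEN:-}" ]
}
```

For example, has_free_gb /workspace 50 || echo "Less than 50 GB free" warns when the volume is short; the 50 GB figure is only an illustrative budget, so size it from your own dataset and cache.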

Select a pricing plan and deploy the pod.

Start the pod with the blue arrow at the top right.

Before connecting, open the pod's settings again (Edit Pod): you'll find the JupyterLab password there.

Connect to the pod and choose "Connect to HTTP Service (Port 8888)".

Enter the Jupyter password when prompted, and JupyterLab will open.

Open the terminal and install OneTrainer and byobu:

git clone https://github.com/Nerogar/OneTrainer.git
apt update
apt install ffmpeg byobu tmux aria2
cd OneTrainer/
./install.sh

Later, you can update OneTrainer by running ./update.sh in the OneTrainer directory.
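
To confirm that the packages from the apt install step are available, you can check for their executables. This is a hypothetical helper written for this guide (missing_tools is not an existing command); note that the aria2 package provides the aria2c command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every named command that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example, missing_tools git ffmpeg byobu tmux aria2c prints nothing when everything is installed.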

Now open OneTrainer on your local computer and export your training configuration with the export button at the bottom right of the UI. Save it locally.

Back in JupyterLab, upload your config, base model and dataset(s) to the root folder. If you load the base model from Hugging Face, you don't need to upload it.
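
After an upload through JupyterLab, it is worth checking that no files were dropped. The sketch below counts image and caption files in a dataset folder so you can compare the numbers with your local copy. The function name count_dataset, the list of image extensions and the assumption that captions are .txt files are choices made for this example; adjust them to your dataset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count images and .txt caption files under DIR (recursively).
# The extension list is an assumption; extend it to match your dataset.
count_dataset() {
  local dir="$1" images captions
  images=$(find "$dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \
    -o -iname '*.png' -o -iname '*.webp' \) | wc -l)
  captions=$(find "$dir" -type f -iname '*.txt' | wc -l)
  # $((...)) strips the padding some wc implementations add.
  echo "images=$((images)) captions=$((captions))"
}
```

For example, count_dataset /workspace/my_dataset, where my_dataset is a placeholder for your own folder name.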

Edit your config so that it points to your dataset (and model) location, then save it.

Make sure these paths start with the root folder /workspace/, or OneTrainer won't find the files.
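
A config exported from your local machine usually still contains local paths, such as Windows drive paths. The hypothetical helper below greps an exported config for path-like string values that are not under /workspace. It pattern-matches the JSON text instead of parsing it, so treat its output as a list of candidates to review rather than a full validation; Hugging Face model names like "owner/model" are not flagged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list path-like JSON string values in CONFIG that are
# not under /workspace. Plain regex matching, not a JSON parser.
find_non_workspace_paths() {
  local config="$1"
  # Windows drive paths such as "C:\\Users\\me\\dataset" or "D:/models".
  grep -oE '"[A-Za-z]:[\\/][^"]*"' "$config" || true
  # Absolute Unix paths that are neither /workspace nor inside it.
  grep -oE '"/[^"]*"' "$config" | grep -vE '^"/workspace(/|")' || true
}
```

For example, find_non_workspace_paths /workspace/config.json prints nothing when every absolute path already points into /workspace.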

Finally, start the training from the OneTrainer directory:

byobu
source venv/bin/activate
python scripts/train.py --config-path "<path_to_config>"

Example: python scripts/train.py --config-path "/workspace/config.json"
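
If you relaunch training often, the venv activation and the train.py call can be wrapped in a small function that first checks that the config file exists. This is a hypothetical convenience wrapper, not part of OneTrainer: run_training and OT_DIR are names chosen for this sketch, and it assumes OneTrainer was cloned into /workspace/OneTrainer. Run it inside byobu, as above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around OneTrainer's scripts/train.py.
# Assumption: OneTrainer lives in /workspace/OneTrainer; override with OT_DIR.
OT_DIR="${OT_DIR:-/workspace/OneTrainer}"

run_training() {
  local config="${1:-}"
  if [ -z "$config" ]; then
    echo "usage: run_training /workspace/<config>.json" >&2
    return 2
  fi
  if [ ! -f "$config" ]; then
    echo "config not found: $config" >&2
    return 1
  fi
  # Resolve to an absolute path, since we cd into OT_DIR below.
  config="$(cd "$(dirname "$config")" && pwd)/$(basename "$config")"
  # Subshell: the cd and the venv activation do not leak into the caller.
  (
    cd "$OT_DIR" || exit 1
    source venv/bin/activate || exit 1
    python scripts/train.py --config-path "$config"
  )
}
```

Usage: run_training /workspace/config.json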

You can stop the training with Ctrl+C inside byobu. This has the same effect as stopping training from the UI: OneTrainer creates a backup and saves the model.

That's it!
