Run OneTrainer in RunPod
With RunPod you set up a virtual instance in the cloud. One advantage is access to a large GPU at a reasonable cost; another is that, since the training runs in the cloud, your local machine stays free for other work during a training run.
Note: cloud training was released in January 2025; it brings simplicity of use and many other advantages, so we recommend that solution instead.
- Deploy a pod.
- Install OneTrainer
- Copy your dataset, model and config. Edit the config.
- Call byobu and start your training.
First, create an account on RunPod and add some credit.
Then deploy a pod.
Select a GPU. Pods are billed hourly while running; the cheapest prices are on previous-generation NVIDIA cards.
Choose a template. Here I'm using "RunPod VS Code Server", but others can work. Note there is a template with OneTrainer already installed: search for "dxqbyd/onetrainer-cli:0.7". Just remember to update OT when using it, as this template is only refreshed for major OT releases.
Review and edit the template. Check the volume size: OT itself takes about 10 GB, and you also need room for your dataset(s), cache and workspace. If you plan to use models from Hugging Face that require a token (SD3, Flux), you can set your HF_TOKEN as an environment variable.
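If you prefer not to store the token in the pod template, you can also export it later in the pod terminal before launching the training. A minimal sketch, where the value is only a placeholder for your own Hugging Face token:

```
# Make the Hugging Face token available for gated models (SD3, Flux).
# Replace the placeholder with your own token from huggingface.co/settings/tokens.
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxx"
```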

Select a pricing plan and deploy the pod.
Start the pod with the blue arrow top right.
Before connecting to it, open its parameters again (edit pod); there you'll find the password for Jupyter Lab.
Connect to the pod and choose "Connect to HTTP Service (Port 8888)".
You'll be asked for the Jupyter password and Jupyter Lab will open.
Open the terminal and install OneTrainer together with byobu and a few other packages:
git clone https://github.com/Nerogar/OneTrainer.git
apt update
apt install ffmpeg byobu tmux aria2
cd OneTrainer/
./install.sh
Later you can update OT by running ./update.sh from the OneTrainer directory.
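While the terminal is open, it can also be worth confirming that the pod really exposes the GPU you selected (this assumes the template ships the NVIDIA drivers, which RunPod GPU templates normally do):

```
# Show the GPU(s) visible in the pod, the driver version and current VRAM usage
nvidia-smi
```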
Now open OneTrainer on your local computer and export your training configuration from the UI with the export button at the bottom right. Save it locally.
Back in Jupyter, move your config, base model and dataset(s) under the root folder. If you're loading the base model from Hugging Face, you don't need to upload it.
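For large files it can be faster to download them straight onto the pod rather than uploading them from your local machine. A minimal sketch using the aria2 installed above; the URL and file name are placeholders to replace with your own links:

```
# Download a base model checkpoint directly onto the pod with aria2
# (placeholder URL and file name — use your own download link)
mkdir -p /workspace/model
aria2c -x 16 -d /workspace/model -o base_model.safetensors \
  "https://example.com/base_model.safetensors"
```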
Edit your config to reflect your dataset (and model) locations, then save it.
Make sure every path starts with the root folder /workspace/ or OT won't find it.
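One quick way to check this is to list the absolute paths stored in the exported config; a sketch assuming jq is available (install it with apt install jq if it isn't):

```
# List every string in config.json that begins with "/",
# so you can verify that each one points under /workspace/
cd /workspace
jq -r '.. | strings | select(startswith("/"))' config.json
```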
Finally, start the training from the OneTrainer directory:
byobu
source venv/bin/activate
python scripts/train.py --config-path "<path_to_config>"
Ex: python scripts/train.py --config-path "/workspace/config.json"
You can stop the training with Ctrl+C inside byobu; it has the same effect as stopping a training from the UI: it creates a backup and saves the model.
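One reason to run the training inside byobu is that you can disconnect from the pod without killing it. This relies only on standard byobu key bindings, nothing OneTrainer-specific:

```
# Detach from the byobu session while the training keeps running: press F6.
# Later, from a new terminal on the pod, reattach to the running session:
byobu
```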
Et voilà!