In this example, we'll be training a PixArt Sigma model using the SimpleTuner toolkit with the `full` model type, since PixArt Sigma is a smaller model and full-rank training will likely fit in VRAM.
Make sure that you have Python installed. You can check this by running:

```bash
python --version
```
Clone the SimpleTuner repository and set up the Python venv:

```bash
git clone --branch=release https://github.com/bghira/SimpleTuner.git
cd SimpleTuner
python -m venv .venv
source .venv/bin/activate
pip install -U poetry pip
```
Depending on your system, you will run one of three commands:

```bash
# MacOS
poetry install --no-root -C install/apple

# Linux
poetry install --no-root

# Linux with ROCm
poetry install --no-root -C install/rocm
```
To run SimpleTuner, you will need to set up a configuration file, the dataset and model directories, and a dataloader configuration file.
Copy `sdxl-env.sh.example` to `sdxl-env.sh`:

```bash
cp sdxl-env.sh.example sdxl-env.sh
```
There, you will need to modify the following variables:

- `MODEL_TYPE` - Set this to `full`.
- `USE_BITFIT` - Set this to `false`.
- `PIXART_SIGMA` - Set this to `true`.
- `MODEL_NAME` - Set this to `PixArt-alpha/PixArt-Sigma-XL-2-1024-MS`.
- `BASE_DIR` - Set this to the directory where you want to store your outputs and datasets. It's recommended to use a full path here.
There are a few more if using a Mac M-series machine:

- `MIXED_PRECISION` should be set to `no`.
- `USE_XFORMERS` should be set to `false`.
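Putting the settings above together, the edited portion of `sdxl-env.sh` might look like the fragment below. This is only a sketch: the exact quoting and `export` usage follow the example file in your checkout, the `BASE_DIR` path is illustrative, and the last two lines only apply on Mac M-series machines.

```bash
# Fragment of sdxl-env.sh for PixArt Sigma full-rank training.
export MODEL_TYPE='full'
export USE_BITFIT=false
export PIXART_SIGMA=true
export MODEL_NAME='PixArt-alpha/PixArt-Sigma-XL-2-1024-MS'
export BASE_DIR='/home/user/simpletuner'   # illustrative path -- use your own

# Mac M-series only:
export MIXED_PRECISION='no'
export USE_XFORMERS=false
```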
It's crucial to have a substantial dataset to train your model on. There are limitations on the dataset size, and you will need to ensure that your dataset is large enough to train your model effectively. Note that the bare minimum dataset size is `TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS`; if the dataset is smaller than this, it will not be discoverable by the trainer.
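As a quick sanity check on that minimum, here is the arithmetic with illustrative values for the two settings (these are not recommendations):

```python
# Minimum dataset size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS.
# The values below are illustrative, not recommendations.
train_batch_size = 4
gradient_accumulation_steps = 2

minimum_images = train_batch_size * gradient_accumulation_steps
print(minimum_images)  # 8 -- a smaller dataset will not be discovered
```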
Depending on the dataset you have, you will need to set up your dataset directory and dataloader configuration file differently. In this example, we will be using pseudo-camera-10k as the dataset.
In your `BASE_DIR` directory, create a `multidatabackend.json`:
```json
[
  {
    "id": "pseudo-camera-10k-pixart",
    "type": "local",
    "crop": true,
    "crop_aspect": "square",
    "crop_style": "center",
    "resolution": 0.5,
    "minimum_image_size": 0.25,
    "maximum_image_size": 1.0,
    "target_downsample_size": 1.0,
    "resolution_type": "area",
    "cache_dir_vae": "cache/vae/pixart/pseudo-camera-10k",
    "instance_data_dir": "datasets/pseudo-camera-10k",
    "disabled": false,
    "skip_file_discovery": "",
    "caption_strategy": "filename",
    "metadata_backend": "json"
  },
  {
    "id": "text-embeds",
    "type": "local",
    "dataset_type": "text_embeds",
    "default": true,
    "cache_dir": "cache/text/pixart/pseudo-camera-10k",
    "disabled": false,
    "write_batch_size": 128
  }
]
```
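A malformed dataloader config is a common source of startup failures, so it can help to sanity-check the JSON before training. The script below is a hypothetical check, not part of SimpleTuner; the config is inlined here for illustration, and in practice you would read it from the file in `BASE_DIR`. The assertions reflect only the fields used above.

```python
import json

# Hypothetical sanity check for a multidatabackend.json -- not part of
# SimpleTuner. The config is inlined; in practice, read it from the file
# in BASE_DIR instead.
config_text = """
[
  {"id": "pseudo-camera-10k-pixart", "type": "local",
   "instance_data_dir": "datasets/pseudo-camera-10k",
   "caption_strategy": "filename"},
  {"id": "text-embeds", "type": "local",
   "dataset_type": "text_embeds", "default": true,
   "cache_dir": "cache/text/pixart/pseudo-camera-10k"}
]
"""
backends = json.loads(config_text)  # raises ValueError on malformed JSON

# Every backend needs an "id" and a "type".
assert all("id" in b and "type" in b for b in backends)

# Exactly one text-embed backend should be marked as the default.
defaults = [b for b in backends
            if b.get("dataset_type") == "text_embeds" and b.get("default")]
assert len(defaults) == 1
```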
Then, navigate to the `BASE_DIR` directory and create a `datasets` directory:

```bash
apt -y install git-lfs
mkdir -p datasets
pushd datasets
git clone https://huggingface.co/datasets/ptx0/pseudo-camera-10k
popd
```
This will download about 10k photograph samples to your `datasets/pseudo-camera-10k` directory, which will be automatically created for you.
You'll want to log in to WandB and HF Hub before beginning training, especially if you're using `PUSH_TO_HUB=true` and `--report_to=wandb`.

If you're going to be pushing items to a Git LFS repository manually, you should also run `git config --global credential.helper store`.
Run the following commands:

```bash
wandb login
```

and

```bash
huggingface-cli login
```
Follow the instructions to log in to both services.
From the SimpleTuner directory, one simply has to run:

```bash
bash train_sdxl.sh
```
This will begin the text embed and VAE output caching to disk.
For more information, see the dataloader and tutorial documents.