This example shows how to perform self-supervised image classifier training with BYOL using Determined's PyTorch API. This example is based on the byol-pytorch package.
Original BYOL paper:
Code and configuration details also sourced from the following BYOL implementations:
- (JAX, paper authors)
- (PyTorch)
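For orientation, the two core pieces of BYOL as described in the paper can be sketched in a few lines: a loss that compares the online network's prediction with the target network's projection, and an exponential moving average (EMA) update of the target weights. This is a minimal pure-Python illustration and not the code used in this example or in byol-pytorch; the function names and the flat-list representation of vectors and parameters are assumptions made for readability.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length (assumes the vector is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def byol_loss(online_prediction: List[float], target_projection: List[float]) -> float:
    """Squared error between L2-normalized vectors, equal to 2 - 2 * cosine similarity."""
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))


def ema_update(target_params: List[float], online_params: List[float], tau: float) -> List[float]:
    """Move the target parameters toward the online parameters with decay rate tau."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]
```

The loss is 0 when the two vectors point in the same direction and 4 when they point in opposite directions. In real training, gradients flow only through the online network, and the target network is changed only by the EMA step.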
- Backbone registry.
- Dataset downloading and metadata registry.
- Kicks off an evaluation run, for longer training of classifier heads.
- Script to generate a blob list from a GCS bucket + prefix. Used to support GCS streaming for ImageNet dataset.
- Core trial and callback definitions. This is the entrypoint for trials.
- Optimizer definitions and utilities.
- Custom reducers used for evaluation metrics.
- This script will automatically be run by Determined during startup of every container launched for this experiment. This script installs some additional dependencies.
- Simple utility functions and classes.
- const-cifar10.yaml: Train with CIFAR-10 on a single GPU with constant hyperparameter values.
- distributed-stl10.yaml: Train with STL-10 using 8 GPU distributed training with constant hyperparameter values.
- distributed-imagenet.yaml: Train with ImageNet using 64 GPU distributed training with constant hyperparameter values.
This repo uses three datasets:
- CIFAR-10 (32x32, 10 classes), automatically downloaded via torchvision.
- STL-10 (96x96, 10 classes), automatically downloaded via torchvision.
- ImageNet-1k (1000 classes), which must be stored in a GCS bucket along with a blob index. Information on downloading ImageNet-1k is available at the ImageNet website. See
for an example bucket configuration,
for a script to generate the blob list.
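As a rough illustration of what generating a blob list from a GCS bucket and prefix involves, the sketch below lists object names with the `google-cloud-storage` client and keeps only image files. It is not the script shipped with this example: the function names, the extension filter, and the one-name-per-line output format are all assumptions, and the format this example actually expects may differ.

```python
from typing import Iterable, List, Tuple


def list_gcs_blob_names(bucket_name: str, prefix: str) -> Iterable[str]:
    """Yield the names of all objects under `prefix` in the given bucket."""
    # Imported lazily so the pure helpers below work without the package installed.
    from google.cloud import storage

    client = storage.Client()
    return (blob.name for blob in client.list_blobs(bucket_name, prefix=prefix))


def build_blob_list(
    names: Iterable[str], extensions: Tuple[str, ...] = (".jpeg", ".jpg")
) -> List[str]:
    """Keep only image objects (case-insensitive extension match), sorted by name."""
    return sorted(name for name in names if name.lower().endswith(extensions))


def format_blob_list(names: Iterable[str]) -> str:
    """Serialize the list as one object name per line (assumed format)."""
    return "\n".join(build_blob_list(names)) + "\n"
```

Sorting the names makes the index deterministic, which helps when several workers need to agree on the same ordering of samples.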
If you have not yet installed Determined, installation instructions can be found under docs/install-admin.html
or at
Run the following command to kick off self-supervised training: det -m <master host:port> experiment create -f config/const-cifar10.yaml .
The other configurations can be run by specifying the appropriate configuration file in place of const-cifar10.yaml.
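If you launch experiments from a script, the command above can be assembled programmatically. The helper below is a hypothetical convenience, not part of this example; it assumes all three configuration files live in the `config/` directory, as `const-cifar10.yaml` does.

```python
from typing import List

# Configuration files listed in this README.
CONFIGS = (
    "const-cifar10.yaml",
    "distributed-stl10.yaml",
    "distributed-imagenet.yaml",
)


def experiment_create_command(
    master: str, config_name: str, context_dir: str = "."
) -> List[str]:
    """Build the `det experiment create` argument list for one configuration."""
    if config_name not in CONFIGS:
        raise ValueError(f"unknown configuration: {config_name}")
    return [
        "det", "-m", master,
        "experiment", "create",
        "-f", f"config/{config_name}",
        context_dir,
    ]
```

The returned list can be passed to `subprocess.run(command, check=True)` from the example's root directory, so that the trailing `.` uploads the correct context.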
To run classifier training and validation on a completed self-supervised training:
- Find the experiment ID of your self-supervised training.
- Run
python --experiment-id=<id> --classifier-train-epochs=<number>
This is necessary for ImageNet, where hyperparameters.validate_with_classifier is set to false during self-supervised training due to the time it takes to train the classifier. Other configs have hyperparameters.validate_with_classifier set to true to collect test_accuracy during the self-supervised training.
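The evaluation steps above can also be scripted. The sketch below only builds the argument list from the two flags shown in the command; `script_path` is a placeholder you must set to the evaluation script in this example, and the helper itself is hypothetical.

```python
from typing import List


def evaluation_command(
    script_path: str, experiment_id: int, classifier_train_epochs: int
) -> List[str]:
    """Build the classifier evaluation command for a finished self-supervised run."""
    if classifier_train_epochs <= 0:
        raise ValueError("classifier_train_epochs must be positive")
    return [
        "python",
        script_path,  # placeholder: path to the evaluation script
        f"--experiment-id={experiment_id}",
        f"--classifier-train-epochs={classifier_train_epochs}",
    ]
```

For example, `evaluation_command(script_path, 42, 80)` corresponds to evaluating experiment 42 with 80 classifier training epochs.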
For const-cifar10.yaml and distributed-stl10.yaml, results were taken from the best test_accuracy achieved over the self-supervised training duration. For distributed-imagenet.yaml, the result was taken from running
for 80 classifier training epochs.
| Config file | Test Accuracy (%) |
| --- | --- |
| const-cifar10.yaml | 74.91 |
| distributed-stl10.yaml | 91.10 |
| distributed-imagenet.yaml | 71.37 |