CMMVAE comes with a CLI that provides commands for submitting and monitoring experiments. Run cmmvae --help for more information.
To invoke the cmmvae CLI, use one of the following commands:
cmmvae --help
Usage: cmmvae [OPTIONS] COMMAND [ARGS]...
Main entry point for cmmvae CLI
Options:
--help Show this message and exit.
Commands:
logger Logger command group.
submit Submit experiments using configurations from a YAML file.
workflow Workflow commands for experiments.
Generates combinations of experiments to run. By default, the experiment configuration is read from experiments.yaml; this can be overridden with --config_file.
Each --config option is parsed as a key=value keyword argument (e.g. --config key=value --config key2=value2).
If the config file contains no key-value pairs, the default Snakemake config applies.
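The key=value handling described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the cmmvae implementation; the function name parse_config_options is hypothetical, and splitting on the first = is an assumption about how values containing = are treated.

```python
def parse_config_options(pairs):
    """Turn ["key=value", ...] into {"key": "value", ...}.

    Hypothetical helper for illustration; not part of cmmvae.
    """
    parsed = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        parsed[key] = value
    return parsed


# Mirrors: --config key=value --config key2=value2
options = parse_config_options(["key=value", "key2=value2"])
```

Later occurrences of the same key overwrite earlier ones in this sketch; whether cmmvae behaves the same way is not stated here.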
To configure multiple experiments at once, replace the value that changes with a dictionary containing the single key track. The value of track is a dictionary holding either a single key whose value is a list, or name-value pairs. The names are appended to the run_name with dot notation.
run_name: default
train_command:
  fit:
    model: configs/model/config.yaml
    trainer:
      track:
        test: configs/trainer/config.test.yaml
        # full: configs/trainer/config.yaml
    data:
      track:
        local: configs/data/local.yaml
        server: configs/data/server.yaml
The experiments.yaml configuration provided will create two experiments. One with the train_command:
fit --model configs/model/config.yaml --trainer configs/trainer/config.test.yaml --data configs/data/local.yaml --run_name=default.test.local
and the other:
fit --model configs/model/config.yaml --trainer configs/trainer/config.test.yaml --data configs/data/server.yaml --run_name=default.test.server
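The expansion of track entries into experiments can be sketched as a Cartesian product over the tracked values. The code below is a minimal illustration under stated assumptions, not the cmmvae implementation: the names expand_tracks and build_commands are hypothetical, only the name-value form of track is handled, and the configuration is written as a Python dictionary rather than loaded from YAML.

```python
from itertools import product


def expand_tracks(options):
    """Yield (names, resolved_options) for every combination of tracked values."""
    keys = list(options)
    choices = []
    for key in keys:
        value = options[key]
        if isinstance(value, dict) and set(value) == {"track"}:
            # Tracked option: one (name, value) choice per entry under "track".
            choices.append(list(value["track"].items()))
        else:
            # Fixed option: a single unnamed choice.
            choices.append([(None, value)])
    for combo in product(*choices):
        names = [name for name, _ in combo if name is not None]
        resolved = {key: item for key, (_, item) in zip(keys, combo)}
        yield names, resolved


def build_commands(config):
    """Build one train command string per experiment (illustrative only)."""
    subcommand, options = next(iter(config["train_command"].items()))
    for names, resolved in expand_tracks(options):
        # Track names are appended to run_name with dot notation.
        run_name = ".".join([config["run_name"], *names])
        flags = " ".join(f"--{key} {value}" for key, value in resolved.items())
        yield f"{subcommand} {flags} --run_name={run_name}"


EXAMPLE_CONFIG = {
    "run_name": "default",
    "train_command": {
        "fit": {
            "model": "configs/model/config.yaml",
            "trainer": {"track": {"test": "configs/trainer/config.test.yaml"}},
            "data": {
                "track": {
                    "local": "configs/data/local.yaml",
                    "server": "configs/data/server.yaml",
                }
            },
        }
    },
}

commands = list(build_commands(EXAMPLE_CONFIG))
```

With the example configuration, commands holds the two train commands listed above. Uncommenting the full trainer entry would double the number of experiments, since every tracked option multiplies the number of combinations.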
cmmvae submit --help
Usage: cmmvae submit [OPTIONS]
Submit experiments using configurations from a YAML file.
Args: config_file (str): Path to the YAML configuration file.
max_job_limit (int): Maximum number of jobs that can be run. preview
(bool): Preview job configurations without running them.
Options:
--config_file TEXT Path to configuration file. [default:
experiments.yaml]
--config TEXT Configuration options as key=value pairs
-m, --max_job_limit INTEGER Max number of jobs capable of outputting
without failure. [default: 10]
-t, --timestamp Added timestamp to end of run name.
-p, --preview Do not run subprocess, only preview job
configurations.
--help Show this message and exit.
Review stdout and stderr files from previous snakemake submissions.
cmmvae logger --help
Usage: cmmvae logger [OPTIONS] COMMAND [ARGS]...
Logger command group.
Options:
--help Show this message and exit.
Commands:
history Display the last n jobs in history.
job Display the job by submission jobid
last View the last job or a specified job.
Workflow commands responsible for training, running inference, and visualizing predictions.
cmmvae workflow --help
Usage: cmmvae workflow [OPTIONS] COMMAND [ARGS]...
Workflow commands for experiments.
Options:
--help Show this message and exit.
Commands:
cli Run using the LightningCli.
merge-predictions Merge saved embeddings and metadata into one npz and...
umap-predictions Plot UMAP embeddings and optionally log images to...
The CLI offers several options to manage and configure your experiments:
- --default_root_dir (str): Specifies the default directory for storing logs and checkpoints.
- --experiment_name (str): Defines the name of the experiment directory.
- --run_name (str): Specifies the name of the particular experiment run.
- --predict_dir (str): Indicates the directory where predictions will be saved after model fitting.
If you do not intend to run a new experiment (e.g., for inference using an existing model checkpoint), you can set the --run flag to False. In this case, you must also provide:
- --ckpt_path (str): Path to the checkpoint file to be loaded by the model.
- Before Fit: The CLI prints the model configuration and saves its hyperparameters before beginning the fitting process.
- After Fit: Upon completion of model fitting, the CLI automatically executes the predict subcommand using the best checkpoint found during training.
This CLI is designed to provide a seamless and user-friendly interface for conducting experiments, managing configurations, and logging results efficiently.
Snakemake manages execution according to the resources each rule requires. The Snakemake workflow for CMMVAE consists of the following rules:
- all: The default rule that specifies the final target files (evaluation files) that should be produced by the workflow.
- train: Trains the CMMVAE model using the specified configuration and saves the model checkpoint and predictions.
- merge_predictions: Merges the prediction files generated during training into a unified format for further analysis.
- umap_predictions: Generates UMAP visualizations based on the merged predictions and saves the resulting images.
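The order in which these rules run follows from their inputs and outputs. As a rough sketch, assuming a linear chain of dependencies inferred from the descriptions above (the actual Snakefile may differ), the order can be derived with a topological sort. The RULE_DEPENDENCIES mapping is an assumption made for illustration, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each rule maps to the set of rules whose outputs it needs.
RULE_DEPENDENCIES = {
    "train": set(),
    "merge_predictions": {"train"},
    "umap_predictions": {"merge_predictions"},
    "all": {"umap_predictions"},
}

# static_order() yields every rule after all of its dependencies.
execution_order = list(TopologicalSorter(RULE_DEPENDENCIES).static_order())
print(execution_order)
```

Snakemake resolves this order itself from the files each rule declares; the sketch only illustrates why requesting the all targets triggers training, merging, and UMAP generation in turn.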
The all rule is the last rule and marks completion of all jobs.
The train rule trains the model and outputs the best model checkpoint along with a config.yaml that can be used to reinitialize state.
In order to train models separately from Snakemake, run the following:
cmmvae workflow cli fit -c config/config.yaml --default_root_dir /path/to/root --experiment_name experiment_name --run_name run_name ...
By default, the Snakemake workflow appends these options to the train command as follows:
TRAIN_COMMAND += (
    f"--default_root_dir {ROOT_DIR} "
    f"--experiment_name {EXPERIMENT_NAME} "
    f"--run_name {RUN_NAME} "
    f"--seed_everything {SEED} "
    f"--predict_dir {PREDICT_SUBDIR} "
)
Therefore, to override the default config files, pass the values as Snakemake config entries, for example:
snakemake --profile workflow/profile/slurm --config train_command="fit -c ..." run_name="version000"
In the config.yaml files, you can specify class configurations by using the following structure:
class_path: path.to.Class
init_args:
  arg1: argument1
  arg2: 2
See the LightningCLI documentation for more on configuring YAML files.
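Conceptually, a class_path and init_args pair is resolved by importing the class and calling it with the given arguments. The sketch below illustrates that idea with the standard library only. It is not how LightningCLI implements it internally (LightningCLI relies on jsonargparse for this), and the instantiate helper is a hypothetical name. fractions.Fraction stands in for a model or data module class.

```python
import importlib


def instantiate(spec):
    """Import spec["class_path"] and call it with spec["init_args"] as keywords.

    Hypothetical helper for illustration only.
    """
    module_name, _, class_name = spec["class_path"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**spec.get("init_args", {}))


# Same shape as the YAML structure: class_path plus init_args.
spec = {
    "class_path": "fractions.Fraction",
    "init_args": {"numerator": 1, "denominator": 2},
}
half = instantiate(spec)
print(half)
```

The keys under init_args must match the keyword arguments of the class constructor; a misspelled key raises a TypeError when the class is called.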
To configure workflow execution, modify the configuration file in workflow/profile/slurm. This file contains the default settings for cluster execution, including where logs are stored.
By default, Snakemake creates a .snakemake directory in the directory where it is run; the Snakemake logs are stored there. When using the Slurm profile, the main directory also contains a .cmmvae folder with a subdirectory for each rule that is executed. These subdirectories contain the err and out files for each job, named according to the job and the job ID.
Workflow resource allocation is determined by the rule executed and the configuration provided in workflow/profile/slurm/config.yaml.
- Training Jobs: Slurm jobs for training (train rule) are configured to run on GPU nodes by default.
- Merging Predictions and UMAP Generation: Jobs for merging predictions (merge_predictions rule) and generating UMAPs (umap_predictions rule) are configured to run on high-memory nodes.
While Snakemake can be executed on the head node, it is recommended to run Snakemake from a submission node for long-running jobs to ensure stability and resource availability.
To test that your setup is working properly, you can run a quick pass through the pipeline in one of the following ways.
Run a single experiment based on workflow/config.yaml:
sbatch scripts/run-experiment.sh --config trainer=configs/trainer/config.test.yaml --config experiment_name=testing --config run_name=quick_test
Or run one or more experiments based on experiments.yaml:
cmmvae submit -t