In this repo, we show how to train a self-supervised model using a Global Contrastive Loss (GCL) on CC3M, a widely used bimodal image-text dataset. Initial experiments are run on CC3M_mini (a 100k subset).
Set up a new virtual environment with Conda:

```bash
conda env create -f config/env.yaml
conda activate AmCLR
```
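A quick sanity check after activation (this assumes the environment provides PyTorch with CUDA, which the training script relies on via `torchrun` and `CUDA_VISIBLE_DEVICES`):

```bash
# Verify that PyTorch imports and a CUDA device is visible from the new environment
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```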
For quick iteration on the data pipeline, run `python mock_dataset_creation.py` and point the training and evaluation scripts at the newly generated directories (see the sketch below). The mock data follows the intended CC3M structure.
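A rough sketch of what that routing looks like; the mock output directory names below are assumptions, so substitute whatever `mock_dataset_creation.py` actually creates:

```bash
# Generate a small mock dataset that mimics the CC3M layout
python mock_dataset_creation.py

# Then repoint the path variables at the top of parallel_train.sh / parallel_eval.sh,
# e.g. (hypothetical directory names -- match them to the script's output):
data_path=./mock_datasets
ann_path=./mock_clip_train
train_image_root=cc3m_subset_100k/
```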
- Download the data:
  - `cc3m_subset_100k.tar.gz`: a 100k subset of the Conceptual Captions (CC3M) dataset
  - `mscoco_val.tar.gz`: a 5k subset of the COCO val2014 dataset
  - `clip_train.tar.gz`: captions for the above datasets
  - `imagenet/val.tar`: the ImageNet validation set

  The code and data should be structured as follows (an extraction sketch follows the layout):
```
.
+--src (code)
|
+--clip_train (captions)
|  +--cc3m_train_subset.json
|  +--coco_val.json
|
+--datasets (images)
|  +--cc3m_subset_100k
|  +--mscoco_val
|  +--imagenet
|  |  +--val
```
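A minimal extraction sketch, assuming the four archives were downloaded into the repository root; whether each tarball contains a top-level folder is not specified here, so adjust the `-C` targets until the tree matches the layout above:

```bash
mkdir -p clip_train datasets datasets/imagenet

# Captions
tar -xzf clip_train.tar.gz -C clip_train
# Training and retrieval-evaluation images
tar -xzf cc3m_subset_100k.tar.gz -C datasets
tar -xzf mscoco_val.tar.gz -C datasets
# ImageNet validation set for zero-shot evaluation (downloaded as imagenet/val.tar)
tar -xf imagenet/val.tar -C datasets/imagenet
```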
- To train a model on CC3M, use `parallel_train.sh`; below is a sample for one type of experiment:

```bash
# Export environment variables
export PYTHONPATH="$PYTHONPATH:./src"
export HUGGINGFACE_HUB_CACHE='./checkpoints/huggingface'
export TORCH_DISTRIBUTED_DEBUG=DETAIL

# Paths and Configurations
data_path=./datasets
ann_path=./clip_train
train_image_root=cc3m_subset_100k/
data=cc3m
train_file=${data}_train_subset.json
gamma=0.8
epochs=30

# Ensure necessary directories exist
mkdir -p logs

# Function to run training
run_training() {
    local ita_type=$1
    local gpu_id=$2
    local optimizer=$3
    local port=$((4820 + gpu_id))
    local output_dir="output/${ita_type}/${ita_type}_${optimizer}_${data}_g${gamma}_e${epochs}"
    local log_dir="logs/${ita_type}"
    local log_file="${log_dir}/${ita_type}_${optimizer}_training.log"

    # Ensure output and log directories exist
    mkdir -p "${output_dir}"
    mkdir -p "${log_dir}"

    # Launch training
    CUDA_VISIBLE_DEVICES=${gpu_id} torchrun --nproc_per_node=1 --master_port=${port} ./src/clip.py \
        --data_path ${data_path} \
        --ann_path ${ann_path} \
        --train_file ${train_file} \
        --train_image_root ${train_image_root} \
        --output_dir ${output_dir} \
        --init_model \
        --use_amp \
        --ita_type ${ita_type} \
        --tau_init 0.01 \
        --opt ${optimizer} \
        --sogclr_gamma ${gamma} \
        --eta_init 0.03 --sched cosine \
        --distributed \
        --epochs ${epochs} > "${log_file}" 2>&1 &
}

# Call run_training with different configurations
run_training sogclraug_linear 3 adamp
run_training sogclraug_wSelf_linear 4 adamp
run_training sogclr 5 adamp

# Wait for all processes to finish
wait
```
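One way to launch and monitor the runs above (the log path follows the `${log_dir}/${ita_type}_${optimizer}_training.log` pattern set in `run_training`; adjust the hard-coded GPU ids 3-5 to match your machine):

```bash
# Launch all three background training runs
bash parallel_train.sh

# Follow one run's progress
tail -f logs/sogclraug_linear/sogclraug_linear_adamp_training.log
```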
- To test the performance of a model on MSCOCO and ImageNet, use `parallel_eval.sh`; below is a sample:

```bash
export PYTHONPATH="$PYTHONPATH:./src"
export HUGGINGFACE_HUB_CACHE='./checkpoints/huggingface'

# Constants
data_path=./datasets
ann_path=./clip_train
train_image_root=cc3m_subset_100k/
data=cc3m
train_file=${data}_train_subset.json
epochs=30

declare -A models
models=(
    ["sogclraug_linear"]="adamp"
    ["sogclraug_wSelf_linear"]="adamp"
    ["sogclr"]="adamp"
)

# Ensure the evaluation log directory exists before redirecting into it
mkdir -p logs/eval_logs

# Iterate over the models
for model in "${!models[@]}"; do
    optimizer=${models[$model]}
    gamma=0.8  # Adjust gamma if necessary
    checkpoint_path="./output/${model}/${model}_${optimizer}_${data}_g${gamma}_e${epochs}/checkpoint_30.pth"
    output_dir="output/eval/eval_${model}_${optimizer}_${data}_g${gamma}_e${epochs}"

    echo "Evaluating model: ${model}, optimizer: ${optimizer}"

    CUDA_VISIBLE_DEVICES=4 python ./src/clip.py \
        --data_path ${data_path} \
        --ann_path ${ann_path} \
        --train_file ${train_file} \
        --train_image_root ${train_image_root} \
        --output_dir ${output_dir} \
        --init_model \
        --use_amp \
        --ita_type ${model} \
        --tau_init 0.01 \
        --sogclr_gamma ${gamma} \
        --eta_init 0.03 --sched cosine \
        --no-distributed \
        --epochs ${epochs} \
        --evaluate \
        --checkpoint ${checkpoint_path} \
        --zs_dataset imagenet \
        --zs_datafolder ./datasets/imagenet/val > "logs/eval_logs/eval_${model}_${optimizer}.log" 2>&1 &
done

wait
```
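Likewise, once training has produced `checkpoint_30.pth` for each model, the sweep can be launched and inspected like this (log names follow the pattern in the loop above):

```bash
# Run evaluation on MSCOCO and ImageNet for all three checkpoints
bash parallel_eval.sh

# Inspect one model's evaluation log
tail -f logs/eval_logs/eval_sogclr_adamp.log
```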
The `--ita_type` values used in the scripts map to the methods in the paper as follows:

- AmCLR: `sogclraug_linear`
- xAmCLR: `sogclraug_wSelf_linear`
If you find this tutorial helpful, please cite:
```bibtex
@misc{jagannath2024amclrunifiedaugmentedlearning,
      title={AmCLR: Unified Augmented Learning for Cross-Modal Representations},
      author={Ajay Jagannath and Aayush Upadhyay and Anant Mehta},
      year={2024},
      eprint={2412.07979},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2412.07979},
}
```