
AmCLR & xAmCLR PyTorch Implementation based on SogCLR

In this repo, we show how to train a self-supervised model using Global Contrastive Loss (GCL) on CC3M, a widely used bimodal image-text dataset. Initial experiments are run on CC3M_mini (a 100k subset).

Environment

Set up a new virtual environment with Conda:

conda env create -f config/env.yaml
conda activate AmCLR
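
A quick sanity check that the environment resolved correctly and that PyTorch can see a GPU (a minimal sketch; the exact package set comes from config/env.yaml):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"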

[Optional] Mock Dataset Creation

For quick iteration on the data pipeline, run python mock_dataset_creation.py and point the training and evaluation scripts at the newly generated directories. The mock data adheres to the intended CC3M structure.
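
A minimal sketch of routing the scripts at the mock data (the output directory names below are hypothetical; use whatever mock_dataset_creation.py actually creates on your machine):

python mock_dataset_creation.py
# in parallel_train.sh / parallel_eval.sh, repoint the path variables:
data_path=./mock_datasets     # hypothetical mock image root
ann_path=./mock_clip_train    # hypothetical mock caption root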

Training and Evaluation

  1. Download the data: cc3m_subset_100k.tar.gz, a 100k subset of the Conceptual Captions dataset; mscoco_val.tar.gz, a 5k subset of the COCO val2014 dataset; clip_train.tar.gz, captions of the previous datasets; imagenet/val.tar, ImageNet validation set. The code and data should be structured as follows:
    .
    +--src (code)
    |
    +--clip_train (captions)
    |  +--cc3m_train_subset.json
    |  +--coco_val.json
    |
    +--datasets (images)
    |  +--cc3m_subset_100k
    |  +--mscoco_val
    |  +--imagenet
    |  |  +-- val
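    
    A minimal sketch for unpacking the archives into this layout (assuming the tarballs sit in the repository root and each extracts into a directory matching the names above):
    
    mkdir -p datasets
    tar -xzf cc3m_subset_100k.tar.gz -C datasets/
    tar -xzf mscoco_val.tar.gz -C datasets/
    tar -xzf clip_train.tar.gz                # captions -> ./clip_train
    mkdir -p datasets/imagenet
    tar -xf val.tar -C datasets/imagenet/     # ImageNet validation set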
    
  2. To train a model on CC3M, use parallel_train.sh; a sample for one type of experiment is shown below:
    # Export environment variables
    export PYTHONPATH="$PYTHONPATH:./src"
    export HUGGINGFACE_HUB_CACHE='./checkpoints/huggingface'
    export TORCH_DISTRIBUTED_DEBUG=DETAIL  
    
    # Paths and Configurations
    data_path=./datasets
    ann_path=./clip_train
    train_image_root=cc3m_subset_100k/
    data=cc3m
    train_file=${data}_train_subset.json
    gamma=0.8
    epochs=30
    
    # Ensure necessary directories exist
    mkdir -p logs
    
    # Function to run training
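    # Usage: run_training <ita_type> <gpu_id> <optimizer>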
    run_training() {
        local ita_type=$1
        local gpu_id=$2
        local optimizer=$3
        local port=$((4820 + gpu_id))
        local output_dir="output/${ita_type}/${ita_type}_${optimizer}_${data}_g${gamma}_e${epochs}"
        local log_dir="logs/${ita_type}"
        local log_file="${log_dir}/${ita_type}_${optimizer}_training.log"
    
        # Ensure output and log directories exist
        mkdir -p "${output_dir}"
        mkdir -p "${log_dir}"
    
        # Launch training
        CUDA_VISIBLE_DEVICES=${gpu_id} torchrun --nproc_per_node=1 --master_port=${port} ./src/clip.py \
            --data_path ${data_path} \
            --ann_path ${ann_path} \
            --train_file ${train_file} \
            --train_image_root ${train_image_root} \
            --output_dir ${output_dir} \
            --init_model \
            --use_amp \
            --ita_type ${ita_type} \
            --tau_init 0.01 \
            --opt ${optimizer} \
            --sogclr_gamma ${gamma} \
            --eta_init 0.03 --sched cosine \
            --distributed \
            --epochs ${epochs} > "${log_file}" 2>&1 &
    }
    
    # Call run_training with different configurations
    run_training sogclraug_linear 3 adamp
    run_training sogclraug_wSelf_linear 4 adamp
    run_training sogclr 5 adamp
    
    # Wait for all processes to finish
    wait
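    
    Each configuration writes to its own log file; to follow a run live (for example, the sogclr job launched above):
    
    tail -f logs/sogclr/sogclr_adamp_training.log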
    
  3. To test the performance of a model on MSCOCO and ImageNet, use parallel_eval.sh; a sample is shown below:
    export PYTHONPATH="$PYTHONPATH:./src"
    export HUGGINGFACE_HUB_CACHE='./checkpoints/huggingface'
    
    # Constants
    data_path=./datasets
    ann_path=./clip_train
    train_image_root=cc3m_subset_100k/
    data=cc3m
    train_file=${data}_train_subset.json
    epochs=30
    
    # Ensure the eval log directory exists (the per-model logs are redirected here)
    mkdir -p logs/eval_logs
    
    declare -A models
    models=(
        ["sogclraug_linear"]="adamp"
        ["sogclraug_wSelf_linear"]="adamp"
        ["sogclr"]="adamp"
    )
    # Iterate over the models
    for model in "${!models[@]}"; do
        optimizer=${models[$model]}
        gamma=0.8 # Adjust gamma if necessary
        checkpoint_path="./output/${model}/${model}_${optimizer}_${data}_g${gamma}_e${epochs}/checkpoint_30.pth"
        output_dir="output/eval/eval_${model}_${optimizer}_${data}_g${gamma}_e${epochs}"
    
        echo "Evaluating model: ${model}, optimizer: ${optimizer}"
    
        CUDA_VISIBLE_DEVICES=4 python ./src/clip.py \
            --data_path ${data_path} \
            --ann_path ${ann_path} \
            --train_file ${train_file} \
            --train_image_root ${train_image_root} \
            --output_dir ${output_dir} \
            --init_model \
            --use_amp \
            --ita_type ${model} \
            --tau_init 0.01 \
            --sogclr_gamma ${gamma} \
            --eta_init 0.03 --sched cosine \
            --no-distributed \
            --epochs ${epochs} \
            --evaluate \
            --checkpoint ${checkpoint_path} \
            --zs_dataset imagenet \
            --zs_datafolder ./datasets/imagenet/val > "logs/eval_logs/eval_${model}_${optimizer}.log" 2>&1 &
    done
    
    wait
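    
    To launch all evaluations, run the script and inspect the per-model logs, e.g.:
    
    bash parallel_eval.sh
    cat logs/eval_logs/eval_sogclr_adamp.log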

Note: Aliases Used

AmCLR: sogclraug_linear
xAmCLR: sogclraug_wSelf_linear
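
With the run_training helper from parallel_train.sh, the aliases map onto experiments like so (the GPU indices here are illustrative):

    run_training sogclraug_linear 0 adamp        # AmCLR
    run_training sogclraug_wSelf_linear 1 adamp  # xAmCLR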

Reference

If you find this tutorial helpful, please cite:

@misc{jagannath2024amclrunifiedaugmentedlearning,
      title={AmCLR: Unified Augmented Learning for Cross-Modal Representations}, 
      author={Ajay Jagannath and Aayush Upadhyay and Anant Mehta},
      year={2024},
      eprint={2412.07979},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2412.07979}, 
}
