Real-world Image Classification with Distributed PyTorch Training
A robust foundation model for visual inference and classification tasks on real-world images. The model combines a ResNet50 backbone with Bayesian inference for uncertainty estimation, distributed training over PyTorch's NCCL backend, and experiment logging through Weights & Biases.
- Distributed Training: Multi-GPU training across devices using PyTorch's NCCL backend.
- Bayesian Inference: Real-time uncertainty estimation for robust model outputs.
- Efficient Data Handling: Supports large-scale real-world image datasets.
- Automatic Logging: Logs training metrics and model checkpoints with Weights & Biases.
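The distributed-training feature can be sketched roughly as follows. `setup_distributed` is a hypothetical helper (not the repository's actual code), and the `gloo` fallback is only so the snippet also runs on CPU-only machines; the project itself targets NCCL on GPUs:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed() -> int:
    """Initialize the default process group and return this process's rank."""
    # torchrun sets these variables; default to a single-process group otherwise.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    # NCCL is used for multi-GPU training; gloo is a CPU-only fallback.
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend)
    return dist.get_rank()

if __name__ == "__main__":
    rank = setup_distributed()
    model = torch.nn.Linear(8, 2)  # stand-in for the real network
    ddp_model = DDP(model)         # gradients are synchronized across processes
    print(f"rank {rank} of {dist.get_world_size()} initialized")
    dist.destroy_process_group()
```

Under `torchrun`, one such process is launched per GPU and DDP averages gradients across them during the backward pass.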
- Python 3.8+
- CUDA 11.0+ for GPU-based training
- PyTorch >= 1.10
- NCCL backend for distributed training
- Weights & Biases for logging metrics
Install dependencies with:
pip install -r requirements.txt
1. Clone the repository:
   `git clone https://github.com/gaga1313/vfm.git`
   `cd visual-foundation-model`
2. Set up your dataset: place your dataset under `data/dataset/`. The dataset should contain images and corresponding labels.
3. Environment setup: create an `.env` file with your Weights & Biases credentials if you’d like to enable logging.
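An `.env` file for logging might look like the fragment below; `WANDB_API_KEY`, `WANDB_PROJECT`, and `WANDB_ENTITY` are environment variables Weights & Biases reads, and the values shown are placeholders to replace with your own:

```
WANDB_API_KEY=<your_api_key>
WANDB_PROJECT=visual-foundation-model
WANDB_ENTITY=<your_team_or_username>
```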
1. Initialize Distributed Training
Ensure your GPUs are correctly set up and launch training with:
python -m torch.distributed.run --nproc_per_node=<num_gpus> train.py
Our model uses the ResNet50 architecture, customized with:
- Bayesian Inference Layer: Enables uncertainty estimation for each prediction.
- Metric Logger: Logs losses, accuracies, and Bayesian confidence.
- Multi-Process DataLoader: Optimized for distributed data loading in multi-GPU environments.
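The Bayesian inference layer is described only at a high level above, so the sketch below shows one common way such per-prediction uncertainty is produced, Monte Carlo dropout. The helper name `mc_dropout_predict` and the tiny stand-in network are illustrative, not the repository's actual code:

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 20):
    """Run several stochastic forward passes; return mean prediction and spread."""
    model.train()  # keep dropout active at inference time
    with torch.no_grad():
        samples = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    mean = samples.mean(dim=0)        # averaged class probabilities
    uncertainty = samples.std(dim=0)  # per-class spread across passes
    return mean, uncertainty

# Tiny stand-in for a ResNet50 head with dropout.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(32, 10))
x = torch.randn(4, 16)
mean, unc = mc_dropout_predict(model, x)
print(mean.shape, unc.shape)  # torch.Size([4, 10]) torch.Size([4, 10])
```

High standard deviation across the stochastic passes flags inputs the model is uncertain about.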
The training loop is distributed across available GPUs using DistributedSampler. Metrics are logged for every epoch via Weights & Biases.
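A minimal sketch of how `DistributedSampler` splits an epoch across processes. `num_replicas` and `rank` are passed explicitly here so the snippet runs without an initialized process group; in the real training loop they come from the distributed environment:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# 100 random samples stand in for the real image dataset.
dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 10, (100,)))

# Each of the two simulated processes sees a disjoint half of the data.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True)
loader = DataLoader(dataset, batch_size=10, sampler=sampler)

for epoch in range(2):
    sampler.set_epoch(epoch)  # reshuffle differently each epoch
    for images, labels in loader:
        pass  # forward/backward pass and metric logging would go here

print(len(sampler))  # 50: this replica's share of the 100 samples
```

Calling `set_epoch` each epoch is what makes the shuffle order differ between epochs while staying consistent across replicas.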
Run the training script:
torchrun --nproc_per_node=<num_gpus> main.py --mode train
Flags:
- `--nproc_per_node`: Number of GPUs
- `--batch_size`: Batch size for each process
- `--epochs`: Total epochs
- `--log_interval`: Logging frequency
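The script-level flags above could be wired up with `argparse` roughly as below. The defaults are illustrative, and `--nproc_per_node` is consumed by `torchrun` itself, so it is not parsed by the script:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Distributed training entry point")
    parser.add_argument("--mode", choices=["train", "test"], default="train")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="batch size for each process")
    parser.add_argument("--epochs", type=int, default=10, help="total epochs")
    parser.add_argument("--log_interval", type=int, default=100,
                        help="logging frequency in steps")
    return parser

args = build_parser().parse_args(["--mode", "train", "--batch_size", "64"])
print(args.batch_size, args.epochs)  # 64 10
```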
Evaluate the trained model on a separate test dataset. To run the test evaluation, use:
torchrun --nproc_per_node=<num_gpus> main.py --mode test
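Internally, test-mode evaluation follows the usual PyTorch pattern sketched below; the tiny linear model and random tensors stand in for the real checkpoint and test set:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model: nn.Module, loader: DataLoader) -> float:
    """Return top-1 accuracy over the loader."""
    model.eval()  # disable dropout and batch-norm updates
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            preds = model(inputs).argmax(dim=-1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# Stand-ins for the trained model and held-out test split.
model = nn.Linear(16, 10)
data = TensorDataset(torch.randn(64, 16), torch.randint(0, 10, (64,)))
acc = evaluate(model, DataLoader(data, batch_size=16))
print(f"accuracy: {acc:.3f}")
```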
Sample results and logged metrics can be found in the `results/` directory. Metrics such as accuracy, loss, and Bayesian confidence are logged for detailed analysis.
Example Weights & Biases Dashboard (replace `<your_wandb_project_link>`):
This project is licensed under the MIT License - see the LICENSE file for details.
Feel free to submit issues or pull requests. For major changes, please open an issue first to discuss what you would like to change.
Happy Training! 💪