This repo has the code for the paper "How far can we go with ImageNet for Text-to-Image generation?"
The core idea: text-to-image generation models typically rely on vast, loosely curated datasets, prioritizing quantity over quality. We propose a different approach that leverages strategic data augmentation of a small, well-curated dataset to enhance model performance. We show that this method improves the quality of the generated images on several benchmarks.
Paper on arXiv: coming soon
Project website: coming soon
To install, first create a virtual environment with Python 3.9 or later and run

```shell
pip install -e .
```
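The environment setup above can be sketched as follows; the environment name `.venv` is just a common convention, not something the repo mandates:

```shell
# Create a virtual environment (any name works; .venv is a common choice)
python3 -m venv .venv
# Activate it (POSIX shells; on Windows use .venv\Scripts\activate instead)
. .venv/bin/activate
# Confirm the interpreter meets the minimum version (3.9)
python -c 'import sys; assert sys.version_info >= (3, 9)'
```

After activation, run `pip install -e .` from the repository root.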
If you want to use the training pipeline (see training/README.md):

```shell
pip install ".[train]"
```
Depending on your CUDA version, be careful when installing torch: make sure you install a build that matches your CUDA toolkit.
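For example, PyTorch publishes per-CUDA wheel indexes that can be selected with pip's `--index-url` flag. The `cu121` tag below is an assumption for illustration; replace it with the tag matching your installed CUDA toolkit (see pytorch.org for the current list):

```shell
# Install torch from the wheel index matching your CUDA version.
# "cu121" (CUDA 12.1) is only an example tag -- adjust it to your setup.
pip install torch --index-url https://download.pytorch.org/whl/cu121
```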
See data_augmentations/README.md for details on the data augmentations.
If you use this repo in your experiments, please acknowledge us by citing the following paper: