Please download the 558K subset of the LAION-CC-SBU dataset with BLIP captions that we use in the paper. Put the downloaded data under the folder `playground/data`:
```
playground/
└── data
    └── pretrain
        ├── blip_laion_cc_sbu_558k.json
        ├── blip_laion_cc_sbu_558k_meta.json
        └── images
```
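Once the JSON and images are in place, a quick consistency check can catch a truncated or incomplete download. A minimal sketch, assuming the standard LLaVA annotation format where each record carries an `image` field holding a path relative to the `images/` folder:

```python
import json
from pathlib import Path

def missing_pretrain_images(root):
    """Return the `image` entries from blip_laion_cc_sbu_558k.json whose
    files are absent under <root>/images (the field name is assumed from
    the usual LLaVA annotation format)."""
    root = Path(root)
    records = json.loads((root / "blip_laion_cc_sbu_558k.json").read_text())
    images = root / "images"
    return [r["image"] for r in records if not (images / r["image"]).exists()]
```

Calling `missing_pretrain_images("playground/data/pretrain")` should return an empty list when the download is complete.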
Please download the annotation of the final mixture of our instruction tuning data, `llava_v1_5_mix665k.json`, and download the images from the constituting datasets:
- COCO: train2017, val2014
- GQA: images
- OCR-VQA: download script; we save all files as `.jpg`
- TextVQA: train_val_images
- VisualGenome: part1, part2
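The note about saving all OCR-VQA files as `.jpg` can also be handled after the fact with a small rename pass. A sketch, assuming the files were downloaded into `ocr_vqa/images` with their original extensions; this only normalizes the suffix, which is sufficient because PIL-based loaders detect the real image format from the file contents:

```python
from pathlib import Path

def normalize_ocrvqa_extensions(image_dir):
    """Rename every file directly under image_dir to use a .jpg suffix.
    Only the suffix changes; the bytes are left untouched. Returns the
    number of files renamed."""
    renamed = 0
    for f in Path(image_dir).iterdir():
        if f.is_file() and f.suffix.lower() != ".jpg":
            f.rename(f.with_suffix(".jpg"))
            renamed += 1
    return renamed
```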
After downloading all of them, organize the data as follows in `./playground/data`:
```
playground/
└── data
    ├── llava_v1_5_mix665k.json
    ├── coco
    │   ├── val2014
    │   └── train2017
    ├── gqa
    │   └── images
    ├── ocr_vqa
    │   └── images
    ├── textvqa
    │   └── train_images
    └── vg
        ├── VG_100K
        └── VG_100K_2
```
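Before launching fine-tuning, the layout above can be verified with a short check. A sketch with the expected paths copied from the tree, relative to `playground/data`:

```python
from pathlib import Path

# Expected layout from the directory tree above, relative to playground/data.
EXPECTED = [
    "llava_v1_5_mix665k.json",
    "coco/val2014",
    "coco/train2017",
    "gqa/images",
    "ocr_vqa/images",
    "textvqa/train_images",
    "vg/VG_100K",
    "vg/VG_100K_2",
]

def missing_paths(data_root):
    """Return the expected files/folders that are absent under data_root."""
    root = Path(data_root)
    return [p for p in EXPECTED if not (root / p).exists()]
```

`missing_paths("./playground/data")` returning an empty list means every constituent dataset landed in the right place.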