[Feature] Support LLaVA #196

Merged
merged 218 commits into from Dec 26, 2023

Changes from all commits (218 commits)
6de3469
v1
LZHgrla Nov 1, 2023
70971d9
add load_image
LZHgrla Nov 1, 2023
9405944
update cfg image url
LZHgrla Nov 1, 2023
c946bd5
del fig
LZHgrla Nov 1, 2023
5b76a56
Merge branch 'main' into lzh/llava
LZHgrla Nov 1, 2023
551bb74
update
LZHgrla Nov 1, 2023
70fa7d9
temp
LZHgrla Nov 1, 2023
5013d3d
update convert
LZHgrla Nov 2, 2023
39f2fb3
update chat_mm
LZHgrla Nov 2, 2023
a65a1ae
add exclude_frozen_parameters for deepspeed
LZHgrla Nov 2, 2023
5dd244a
update chat
LZHgrla Nov 2, 2023
669f282
update xtuner help msg
LZHgrla Nov 2, 2023
b0f9ad0
fix bugs
LZHgrla Nov 2, 2023
dea64fb
revert bf16 deepspeed
LZHgrla Nov 2, 2023
c31ab61
Merge branch 'InternLM:main' into lzh/llava
LZHgrla Nov 2, 2023
1f4a97b
fix bugs
LZHgrla Nov 2, 2023
6ceeaa8
add visual_select_layer for chat
LZHgrla Nov 2, 2023
7502793
improve pth_to_hf
LZHgrla Nov 2, 2023
6f31402
Merge branch 'main' into lzh/llava
LZHgrla Nov 3, 2023
9268dbc
rename projecter_pth to pretrained_pth
LZHgrla Nov 6, 2023
5282b3c
temp
LZHgrla Nov 6, 2023
3e4b425
update requirements
LZHgrla Nov 7, 2023
fe30549
add cfgs
LZHgrla Nov 7, 2023
413111f
update
LZHgrla Nov 7, 2023
fac6cf8
fix pre-commit
LZHgrla Nov 7, 2023
da3f268
optim chat
LZHgrla Nov 7, 2023
b25913b
optim chat
LZHgrla Nov 7, 2023
0a9e480
Delete xtuner/model/unused.py
LZHgrla Nov 7, 2023
c0c4b8b
move dispatch to a deeper folder
LZHgrla Nov 8, 2023
74159ac
add projector
LZHgrla Nov 8, 2023
b92f075
update
LZHgrla Nov 8, 2023
04fca91
del model/projector
LZHgrla Nov 8, 2023
fba83bc
fix bugs
LZHgrla Nov 8, 2023
93f6616
add docs
LZHgrla Nov 8, 2023
d13f71e
update
LZHgrla Nov 8, 2023
e62132f
update
LZHgrla Nov 8, 2023
37b017e
update
LZHgrla Nov 8, 2023
8d67ed1
update
LZHgrla Nov 8, 2023
6b99303
Merge branch 'main' into lzh/llava
LZHgrla Nov 8, 2023
00c5d0c
enhance resume for map_fn
LZHgrla Nov 8, 2023
5496952
update import
LZHgrla Nov 8, 2023
a07a833
add llava_internlm_chat_7b_clip_vit_large_p14
LZHgrla Nov 9, 2023
21e649b
update dispatch
LZHgrla Nov 9, 2023
e0e0275
update dispatch
LZHgrla Nov 9, 2023
1a12477
add link
LZHgrla Nov 9, 2023
fbaf22f
update max_length
LZHgrla Nov 10, 2023
f31366f
update max_length
LZHgrla Nov 10, 2023
9786d3b
update hyp
LZHgrla Nov 10, 2023
db15bfa
align
LZHgrla Nov 11, 2023
1af5c4a
Merge branch 'main' into lzh/llava
LZHgrla Nov 13, 2023
c426c42
move yi flash attn
LZHgrla Nov 13, 2023
001bf8d
fix pre-commit
LZHgrla Nov 13, 2023
7b7b690
update deepspeed requirements
LZHgrla Nov 14, 2023
374f997
add mmbench script
LZHgrla Nov 14, 2023
bb9dcc3
install openpyxl
LZHgrla Nov 14, 2023
f4114ed
add entry_point for mmbench
LZHgrla Nov 14, 2023
041e96f
save args
LZHgrla Nov 14, 2023
c5c8437
update mmbench
LZHgrla Nov 14, 2023
0509cdd
Merge branch 'main' into lzh/llava
LZHgrla Nov 15, 2023
c3e3cf2
update max_length
LZHgrla Nov 15, 2023
20b5d74
add llama2 qlora
LZHgrla Nov 15, 2023
092f7e6
update mmbench
LZHgrla Nov 15, 2023
6ffffef
fix mmbench bugs
LZHgrla Nov 15, 2023
efc178b
Merge branch 'main' into lzh/llava
LZHgrla Nov 16, 2023
d30ba29
use osp instead of os.path
LZHgrla Nov 16, 2023
a3f2435
refactor pth_to_hf
LZHgrla Nov 16, 2023
aafe80c
update chat and mmbench to support --llava
LZHgrla Nov 16, 2023
8836b0b
align to chat
LZHgrla Nov 16, 2023
8393559
update entry_point
LZHgrla Nov 16, 2023
e0274f6
add vicuna template
LZHgrla Nov 16, 2023
927e207
add vicuna_7b_v15
LZHgrla Nov 16, 2023
33d6c76
Merge branch 'main' into lzh/llava
LZHgrla Nov 17, 2023
903e074
fix pre-commit
LZHgrla Nov 17, 2023
b66419e
add vicuna_7b_v1.5 qlora
LZHgrla Nov 20, 2023
6edf769
Merge branch 'main' into lzh/llava
LZHgrla Nov 20, 2023
5fe1e54
skip_special_tokens for decode text
LZHgrla Nov 21, 2023
376abb6
remove do_sample
LZHgrla Nov 22, 2023
82d8df6
Merge branch 'main' into lzh/llava
LZHgrla Nov 22, 2023
4dc6379
add warmup
LZHgrla Nov 22, 2023
67ffd70
fix pre-commit
LZHgrla Nov 22, 2023
76ad9e9
Update dataset_prepare.md
LZHgrla Nov 22, 2023
aeded33
Update dataset_prepare.md
LZHgrla Nov 22, 2023
3c44f94
Add KEEP_STSTEM for template
LZHgrla Nov 22, 2023
b537e9b
remove
LZHgrla Nov 22, 2023
c36c1ff
fix vicuna template
LZHgrla Nov 22, 2023
fb3f7da
clean cfgs
LZHgrla Nov 22, 2023
b0b4f1d
add cfgs
LZHgrla Nov 22, 2023
6c2dec5
Merge branch 'main' into lzh/llava
LZHgrla Nov 22, 2023
1543df7
fix pre-commit
LZHgrla Nov 22, 2023
db434f7
add --language for mmbench
LZHgrla Nov 23, 2023
9f3e44e
Merge branch 'main' into lzh/llava
LZHgrla Nov 23, 2023
eb2ad0d
fix bugs
LZHgrla Nov 23, 2023
349e37c
fix pretrain bug
LZHgrla Nov 23, 2023
bbbc62b
support visual_encoder lora
LZHgrla Nov 23, 2023
0357a93
fix bugs
LZHgrla Nov 23, 2023
f2295d2
add paramwise_cfg
LZHgrla Nov 23, 2023
72a986e
remove print_peft_model_trainable_parameters
LZHgrla Nov 23, 2023
e0583cb
Merge branch 'main' into lzh/llava
LZHgrla Nov 24, 2023
4416c56
fix bugs
LZHgrla Nov 24, 2023
0894e60
add paramwise_cfg for DeepSpeedOptimWrapper
LZHgrla Nov 24, 2023
ff4f15e
fix engine deepspeed paramwise_cfg bug
LZHgrla Nov 24, 2023
aa1dbf1
fix encode_fn bug
LZHgrla Nov 25, 2023
a046e0e
fix
LZHgrla Nov 25, 2023
9080be3
fix pad_image_to_square bugs
LZHgrla Nov 26, 2023
12c212a
Add space for system to avoid mismatch of 'USER' token
LZHgrla Nov 26, 2023
19bde6f
revert to adding bos_token at each conv
LZHgrla Nov 29, 2023
7c01831
revert for paramwise_cfg
LZHgrla Nov 29, 2023
ba9de6d
better cfgs?
LZHgrla Nov 29, 2023
baa1727
fix import bug
LZHgrla Nov 29, 2023
c5e61a5
fix import bug
LZHgrla Nov 29, 2023
fece023
pretrain align
LZHgrla Nov 30, 2023
273b24d
update prepare_inputs_labels_for_multimodal
LZHgrla Nov 30, 2023
b37dd8f
1792
LZHgrla Nov 30, 2023
e307624
support length_grouped_samplers
LZHgrla Dec 1, 2023
e25280b
1792
LZHgrla Dec 1, 2023
0a15676
remove KEEP_SYSTEM
LZHgrla Dec 1, 2023
e3b936a
remove system in cfg
LZHgrla Dec 1, 2023
580136b
update 336 cfg
LZHgrla Dec 1, 2023
683385e
Merge branch 'main' into lzh/llava
LZHgrla Dec 1, 2023
053eb84
add torch_dtype for mmbench and chat
LZHgrla Dec 2, 2023
f362a9f
group 50
LZHgrla Dec 2, 2023
12d7a1e
quant for pretrain
LZHgrla Dec 2, 2023
c4fd8db
update cfgs
LZHgrla Dec 4, 2023
245af61
refactor cfgs
LZHgrla Dec 4, 2023
4087168
add length for concat dataset
LZHgrla Dec 4, 2023
013a930
update requirements
LZHgrla Dec 4, 2023
8721427
Merge branch 'lzh/llava' of github.com:LZHgrla/xtuner into lzh/llava
LZHgrla Dec 4, 2023
491be19
fix typo
LZHgrla Dec 4, 2023
bf5d2da
Merge branch 'main' into lzh/llava
LZHgrla Dec 4, 2023
0e21e51
add template for internlm pretrain
LZHgrla Dec 6, 2023
d429961
no zh
LZHgrla Dec 6, 2023
8ce84c3
remove 20b cfgs
LZHgrla Dec 6, 2023
41a8794
fix pre-commit
LZHgrla Dec 6, 2023
ac80d1a
revert invalid input
LZHgrla Dec 7, 2023
2e94a52
rename
LZHgrla Dec 7, 2023
c5b9e75
Update README.md
LZHgrla Dec 7, 2023
2110f19
Update README_zh-CN.md
LZHgrla Dec 7, 2023
036fd72
fix pre-commit
LZHgrla Dec 7, 2023
a8eecbf
remove llava_zh from docs
LZHgrla Dec 8, 2023
0cc9bf8
qlora 512
LZHgrla Dec 9, 2023
bcaffd4
rename llava map_fn
LZHgrla Dec 10, 2023
0f0250c
update cfgs
LZHgrla Dec 10, 2023
a050926
update model urls
LZHgrla Dec 11, 2023
951b15a
add docs link
LZHgrla Dec 11, 2023
8dc0746
add llava docs
LZHgrla Dec 11, 2023
2e5c77a
Merge branch 'main' into lzh/llava
LZHgrla Dec 11, 2023
3aef652
update docs
LZHgrla Dec 11, 2023
24996d6
update urls
LZHgrla Dec 11, 2023
8787baa
Merge branch 'main' into lzh/llava
LZHgrla Dec 11, 2023
3eac2df
add citation
LZHgrla Dec 11, 2023
f65bf9a
fix README
LZHgrla Dec 14, 2023
aa6d525
move
LZHgrla Dec 14, 2023
ba8facd
update
LZHgrla Dec 14, 2023
b717f8e
vicuna pretrain with prompt
LZHgrla Dec 14, 2023
6dd1e63
rename
LZHgrla Dec 15, 2023
321b351
add results
LZHgrla Dec 15, 2023
c44f71b
fix pre-commit
LZHgrla Dec 15, 2023
cde826f
update
LZHgrla Dec 15, 2023
f2b6e3b
update
LZHgrla Dec 15, 2023
3b6c07a
update
LZHgrla Dec 15, 2023
b383667
update
LZHgrla Dec 15, 2023
b463465
update
LZHgrla Dec 15, 2023
261fc43
update
LZHgrla Dec 15, 2023
da3ed07
update
LZHgrla Dec 15, 2023
8045143
update
LZHgrla Dec 15, 2023
f7d14f8
update
LZHgrla Dec 15, 2023
ee9c026
update
LZHgrla Dec 15, 2023
adb4ba5
update
LZHgrla Dec 15, 2023
fa1ce76
update
LZHgrla Dec 15, 2023
95faa59
Update README.md
LZHgrla Dec 15, 2023
8054690
Update README_zh-CN.md
LZHgrla Dec 15, 2023
56e9507
Update README_zh.md
LZHgrla Dec 15, 2023
99c4e91
Update README_zh.md
LZHgrla Dec 15, 2023
479a5fd
Update README.md
LZHgrla Dec 15, 2023
47f4927
Update README_zh.md
LZHgrla Dec 15, 2023
367225b
Update README.md
LZHgrla Dec 15, 2023
c4007f7
Update README.md
LZHgrla Dec 15, 2023
3d4dee8
fix typo
LZHgrla Dec 15, 2023
eec012c
fix
LZHgrla Dec 15, 2023
b027cb8
Update README.md
LZHgrla Dec 15, 2023
6276d33
Update README_zh-CN.md
LZHgrla Dec 15, 2023
ad65fc8
rename
LZHgrla Dec 16, 2023
77d9809
auto cn_string
LZHgrla Dec 16, 2023
a318133
fix pre-commit
LZHgrla Dec 16, 2023
1dadc4b
rename
LZHgrla Dec 16, 2023
72ca5ee
remove language
LZHgrla Dec 16, 2023
197c292
add VLMEvalKit
LZHgrla Dec 16, 2023
11cbfdc
rename VLLM to VLM
LZHgrla Dec 21, 2023
63ed932
add the download links of MMBench
LZHgrla Dec 21, 2023
99a2b8e
update
LZHgrla Dec 21, 2023
8080a06
update readme
LZHgrla Dec 21, 2023
360b816
update
LZHgrla Dec 21, 2023
4ade82d
update
LZHgrla Dec 21, 2023
885c832
update merge
LZHgrla Dec 21, 2023
990d689
fix cfg bug
LZHgrla Dec 21, 2023
0e5d692
Update README.md
LZHgrla Dec 21, 2023
8225f9f
Update README_zh.md
LZHgrla Dec 21, 2023
648111d
update
LZHgrla Dec 21, 2023
6f06498
fix
LZHgrla Dec 21, 2023
76d1313
Merge branch 'main' into lzh/llava
LZHgrla Dec 21, 2023
b5124b1
update requirements
LZHgrla Dec 22, 2023
5973d6c
Merge branch 'main' into lzh/llava
LZHgrla Dec 22, 2023
311f9d0
Update runtime.txt
LZHgrla Dec 22, 2023
cbb7924
Update runtime.txt
LZHgrla Dec 22, 2023
d9a96af
Update runtime.txt
LZHgrla Dec 22, 2023
8332c2c
Update README.md
LZHgrla Dec 25, 2023
b9efc8a
Update README.md
LZHgrla Dec 25, 2023
7b29f81
Update README_zh.md
LZHgrla Dec 25, 2023
6c80ec7
fix pre-commit
LZHgrla Dec 25, 2023
034b4cb
fix
LZHgrla Dec 25, 2023
f7ec4da
update mmbench prompt
LZHgrla Dec 25, 2023
7231865
fix bugs
LZHgrla Dec 26, 2023
bf384de
fix bugs
LZHgrla Dec 26, 2023
80d7c11
update docs
LZHgrla Dec 26, 2023
7e68e1d
update
LZHgrla Dec 26, 2023
327a122
update
LZHgrla Dec 26, 2023
15c927a
Merge branch 'main' into lzh/llava
LZHgrla Dec 26, 2023
761a7ea
Update README.md
LZHgrla Dec 26, 2023
15 changes: 14 additions & 1 deletion README.md
@@ -23,7 +23,8 @@ English | [简体中文](README_zh-CN.md)

## 🎉 News

- **\[2023/12\]** 🔥 Support [Mixtral 8x7b](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) model! To get started, please check out the [docs](xtuner/configs/mixtral/README.md)!
- **\[2023/12\]** 🔥 Support multi-modal VLM pretraining and fine-tuning with [LLaVA-v1.5](https://github.com/haotian-liu/LLaVA) architecture! Click [here](xtuner/configs/llava/README.md) for details!
- **\[2023/12\]** 🔥 Support [Mixtral 8x7b](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) model! Click [here](xtuner/configs/mixtral/README.md) for details!
- **\[2023/11\]** Support [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b) model!
- **\[2023/10\]** Support [MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench) dataset, and the fine-tuned LLMs can be applied by [Lagent](https://github.com/InternLM/lagent)!
- **\[2023/10\]** Optimize the data processing to accommodate `system` context. More information can be found on [Docs](docs/en/user_guides/dataset_format.md)!
@@ -267,6 +268,18 @@ We appreciate all contributions to XTuner. Please refer to [CONTRIBUTING.md](.gi
- [Llama 2](https://github.com/facebookresearch/llama)
- [QLoRA](https://github.com/artidoro/qlora)
- [LMDeploy](https://github.com/InternLM/lmdeploy)
- [LLaVA](https://github.com/haotian-liu/LLaVA)

## 🖊️ Citation

```bibtex
@misc{2023xtuner,
title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
author={XTuner Contributors},
howpublished = {\url{https://github.com/InternLM/xtuner}},
year={2023}
}
```

## License

13 changes: 13 additions & 0 deletions README_zh-CN.md
@@ -23,6 +23,7 @@

## 🎉 News

- **\[2023/12\]** 🔥 Support multi-modal VLM pretraining and instruction fine-tuning with [LLaVA-v1.5](https://github.com/haotian-liu/LLaVA)! See this [doc](xtuner/configs/llava/README_zh.md) to get started!
- **\[2023/12\]** 🔥 Support the [Mixtral 8x7b](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) model! See this [doc](xtuner/configs/mixtral/README.md) to get started!
- **\[2023/11\]** Support the [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b) model!
- **\[2023/10\]** Support the [MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench) dataset; the fine-tuned LLMs can be used with the [Lagent](https://github.com/InternLM/lagent) framework!
@@ -267,6 +268,18 @@ xtuner chat meta-llama/Llama-2-7b-hf --adapter xtuner/Llama-2-7b-qlora-moss-003-
- [Llama 2](https://github.com/facebookresearch/llama)
- [QLoRA](https://github.com/artidoro/qlora)
- [LMDeploy](https://github.com/InternLM/lmdeploy)
- [LLaVA](https://github.com/haotian-liu/LLaVA)

## 🖊️ Citation

```bibtex
@misc{2023xtuner,
title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
author={XTuner Contributors},
howpublished = {\url{https://github.com/InternLM/xtuner}},
year={2023}
}
```

## License

76 changes: 76 additions & 0 deletions docs/en/user_guides/dataset_prepare.md
@@ -5,6 +5,7 @@
- [Arxiv Gentitle](#arxiv-gentitle)
- [MOSS-003-SFT](#moss-003-sft)
- [Chinese Lawyer](#chinese-lawyer)
- [LLaVA dataset](#llava-dataset)

## HuggingFace datasets

@@ -55,3 +56,78 @@ unzip moss-003-sft-with-tools-no-text2image.zip
The Chinese Lawyer dataset has two sub-datasets, which can be downloaded from https://github.com/LiuHC0428/LAW-GPT.

All lawyer configs assume the dataset paths are `./data/CrimeKgAssitant清洗后_52k.json` and `./data/训练数据_带法律依据_92k.json`. You can move and rename your data, or adjust the paths in these configs.

### LLaVA dataset

#### File structure

```
./data/llava_data
├── LLaVA-Pretrain
│   ├── blip_laion_cc_sbu_558k.json
│   ├── blip_laion_cc_sbu_558k_meta.json
│   └── images
├── LLaVA-Instruct-150K
│   └── llava_v1_5_mix665k.json
└── llava_images
    ├── coco
    │   └── train2017
    ├── gqa
    │   └── images
    ├── ocr_vqa
    │   └── images
    ├── textvqa
    │   └── train_images
    └── vg
        ├── VG_100K
        └── VG_100K_2
```

#### Pretrain

LLaVA-Pretrain

```shell
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain --depth=1
```

#### Finetune

1. Text data

1. LLaVA-Instruct-150K

```shell
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K --depth=1
```

2. Image data

1. COCO (coco): [train2017](http://images.cocodataset.org/zips/train2017.zip)

2. GQA (gqa): [images](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip)

3. OCR-VQA (ocr_vqa): [download script](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing)

1. ⚠️ Rename OCR-VQA's images so that they all have the `.jpg` extension!

```shell
#!/bin/bash
# Directory that contains the downloaded OCR-VQA images
ocr_vqa_path="<your-directory-path>"

find "$ocr_vqa_path" -type f | while read -r file; do
    extension="${file##*.}"
    if [ "$extension" != "jpg" ]; then
        # Copy to a .jpg name (the original file is kept; use mv to rename in place)
        cp -- "$file" "${file%.*}.jpg"
    fi
done
```

4. TextVQA (textvqa): [train_val_images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)

5. VisualGenome (VG): [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)
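
For reference, the downloaded archives can then be unpacked into the layout shown in the file structure above. The following is a minimal sketch, assuming the archives sit in the current directory under their upstream names (note that GQA's archive and VG part1 are both named `images.zip` upstream, so keep them apart or rename one; the `ocr_vqa/images` directory is produced by its download script):

```shell
mkdir -p ./data/llava_data/llava_images/{coco,gqa,textvqa,vg}

# Assumed archive names; each is expected to unpack into the
# sub-directory that the configs look for.
unzip train2017.zip -d ./data/llava_data/llava_images/coco            # -> coco/train2017
unzip images.zip -d ./data/llava_data/llava_images/gqa                # GQA -> gqa/images
unzip train_val_images.zip -d ./data/llava_data/llava_images/textvqa  # -> textvqa/train_images
unzip images.zip -d ./data/llava_data/llava_images/vg                 # VG part1 -> vg/VG_100K
unzip images2.zip -d ./data/llava_data/llava_images/vg                # VG part2 -> vg/VG_100K_2
```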
76 changes: 76 additions & 0 deletions docs/zh_cn/user_guides/dataset_prepare.md
@@ -5,6 +5,7 @@
- [Arxiv Gentitle (title generation)](#arxiv-gentitle-生成题目)
- [MOSS-003-SFT](#moss-003-sft)
- [Chinese Lawyer](#chinese-lawyer)
- [LLaVA dataset](#llava-dataset)

## HuggingFace datasets

@@ -55,3 +56,78 @@ unzip moss-003-sft-with-tools-no-text2image.zip
The Chinese Lawyer dataset has two sub-datasets, which can be downloaded from https://github.com/LiuHC0428/LAW-GPT.

All Chinese Lawyer configs assume the dataset paths are `./data/CrimeKgAssitant清洗后_52k.json` and `./data/训练数据_带法律依据_92k.json`. You can move and rename the data, or reset the data paths in the configs.

### LLaVA dataset

#### File structure

```
./data/llava_data
├── LLaVA-Pretrain
│   ├── blip_laion_cc_sbu_558k.json
│   ├── blip_laion_cc_sbu_558k_meta.json
│   └── images
├── LLaVA-Instruct-150K
│   └── llava_v1_5_mix665k.json
└── llava_images
    ├── coco
    │   └── train2017
    ├── gqa
    │   └── images
    ├── ocr_vqa
    │   └── images
    ├── textvqa
    │   └── train_images
    └── vg
        ├── VG_100K
        └── VG_100K_2
```

#### Pretrain

LLaVA-Pretrain

```shell
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain --depth=1
```

#### Finetune

1. Text data

1. LLaVA-Instruct-150K

```shell
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K --depth=1
```

2. Image data

1. COCO (coco): [train2017](http://images.cocodataset.org/zips/train2017.zip)

2. GQA (gqa): [images](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip)

3. OCR-VQA (ocr_vqa): [download script](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing)

1. ⚠️ Rename the downloaded OCR-VQA images so that they all have the `.jpg` extension!

```shell
#!/bin/bash
# Directory that contains the downloaded OCR-VQA images
ocr_vqa_path="<your-directory-path>"

find "$ocr_vqa_path" -type f | while read -r file; do
    extension="${file##*.}"
    if [ "$extension" != "jpg" ]; then
        # Copy to a .jpg name (the original file is kept; use mv to rename in place)
        cp -- "$file" "${file%.*}.jpg"
    fi
done
```

4. TextVQA (textvqa): [train_val_images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)

5. VisualGenome (VG): [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)
1 change: 1 addition & 0 deletions requirements/runtime.txt
@@ -8,6 +8,7 @@ lagent>=0.1.2
# Minimum 0.10.1 to support exclude_frozen_parameters for DeepSpeedStrategy,
# see https://github.com/open-mmlab/mmengine/pull/1415, https://github.com/open-mmlab/mmengine/pull/1424
mmengine>=0.10.1
openpyxl
# Minimum 0.4.0 to support QLoRA, see https://github.com/huggingface/peft/pull/476
peft>=0.4.0
scipy
@@ -100,7 +100,9 @@
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
dict(type=DatasetInfoHook, tokenizer=tokenizer),
dict(
type=DatasetInfoHook, tokenizer=tokenizer,
is_intern_repo_dataset=True),
dict(type=ThroughputHook)
]

92 changes: 92 additions & 0 deletions xtuner/configs/llava/README.md
@@ -0,0 +1,92 @@
# LLaVA Full Pipeline

## Data Preparation

Please refer to the [docs](../../../docs/en/user_guides/dataset_prepare.md#llava-dataset).

## Training

The training of LLaVA consists of two steps: alignment module (i.e., MLP) pretraining and instruction-following fine-tuning.

Note: this guide takes 8-GPU training of LLaVA-InternLM as an example. If GPU resources or memory are insufficient, you can reduce the batch size appropriately to lower memory consumption. The pretrained projector is saved to, and by default re-loaded from, `./work_dirs/llava_internlm_chat_7b_clip_vit_large_p14_336_e1_gpu8_pretrain/epoch_1.pth`.

1. Alignment module pretraining (saved by default in `./work_dirs/`)

```bash
NPROC_PER_NODE=8 xtuner train llava_internlm_chat_7b_clip_vit_large_p14_336_e1_gpu8_pretrain --deepspeed deepspeed_zero2
```

2. Instruction following fine-tuning (saved by default in `./work_dirs/`)

```bash
NPROC_PER_NODE=8 xtuner train llava_internlm_chat_7b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune --deepspeed deepspeed_zero2
```

## Model Convert (and Merge)

After training, we obtain a set of weights (*i.e.*, `epoch_1.pth`) that are not in the standard HuggingFace format, so we first need to convert them.

```bash
xtuner convert pth_to_hf $FINETUNE_CFG $PTH_PATH $SAVE_PATH
# e.g., xtuner convert pth_to_hf llava_internlm_chat_7b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune ./epoch_1.pth ./epoch_1_hf
```

At this point, we have obtained the relevant model (LLM or the corresponding LoRA).

Afterwards, if you want to merge the LoRA into the LLM or CLIP-ViT, use the following commands:

```bash
(LLM) xtuner convert merge $LLM $LLM_ADAPTER $SAVE_PATH
(CLIP) xtuner convert merge $CLIP $CLIP_ADAPTER $SAVE_PATH --is-clip
```
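
For example, with the converted output from the previous step (the adapter sub-directory names below are illustrative, not fixed outputs; substitute the actual paths produced by `pth_to_hf`):

```bash
# Merge the LLM LoRA into the base LLM (illustrative paths)
xtuner convert merge internlm/internlm-chat-7b ./epoch_1_hf/llm_adapter ./merged_llm
# Merge the visual-encoder LoRA into CLIP-ViT (illustrative paths)
xtuner convert merge openai/clip-vit-large-patch14-336 ./epoch_1_hf/visual_encoder_adapter ./merged_visual_encoder --is-clip
```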

## Chat

You can download the released LLaVA-InternLM-7B model from 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-internlm-7b) and 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-internlm-7b), and achieve image-text question answering with the following command!

```bash
xtuner chat internlm/internlm-chat-7b \
--visual-encoder openai/clip-vit-large-patch14-336 \
--llava xtuner/llava-internlm-7b \
--prompt-template internlm_chat \
--image $IMAGE_PATH
```

Here, `--llava` is the converted weight from the step above (in our example, `./epoch_1_hf`).
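
Similarly, to chat with your own fine-tuned model, point `--llava` at the converted directory and keep the base LLM and visual encoder consistent with your training config:

```bash
xtuner chat internlm/internlm-chat-7b \
  --visual-encoder openai/clip-vit-large-patch14-336 \
  --llava ./epoch_1_hf \
  --prompt-template internlm_chat \
  --image $IMAGE_PATH
```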

## Evaluation

XTuner's LLaVA models can be evaluated using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit).

For convenience, XTuner also integrates the [MMBench](https://mmbench.opencompass.org.cn/home) evaluation.

Users can download the MMBench datasets with

```
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_DEV_EN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_TEST_EN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_DEV_CN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_TEST_CN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/CCBench.tsv
```

After that, the evaluations can be run with

```bash
xtuner mmbench internlm/internlm-chat-7b \
--visual-encoder openai/clip-vit-large-patch14-336 \
--llava xtuner/llava-internlm-7b \
--prompt-template internlm_chat \
--data-path $DATA_PATH \
--work-dir $RESULT_PATH
```

Here, `$DATA_PATH` refers to one of the datasets downloaded as mentioned above, such as `MMBench_DEV_EN.tsv`.
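
For instance, a dev-split run could look like this (the work directory name is arbitrary):

```bash
xtuner mmbench internlm/internlm-chat-7b \
  --visual-encoder openai/clip-vit-large-patch14-336 \
  --llava xtuner/llava-internlm-7b \
  --prompt-template internlm_chat \
  --data-path ./MMBench_DEV_EN.tsv \
  --work-dir ./mmbench_dev_en
```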

After the evaluation completes, results for a development set are printed directly; for a test set, you need to submit `mmbench_result.xlsx` to the official MMBench evaluation service to obtain the final accuracy results!

| Model | MMBench Test (EN) | MMBench Dev (EN) | MMBench Test (CN) | MMBench Dev (CN) | CCBench Dev | MME | MMVet | SEEDBench_IMG | Configs | Pretrained Projector Checkpoints | Fine-tuned LLaVA Checkpoints |
| :------------------------- | :---------------: | :--------------: | :---------------: | :--------------: | :---------: | :--: | :---: | :-----------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------: |
| LLaVA-v1.5-7B (XTuner) | 67.7 | 69.2 | 61.0 | 59.7 | 27.6 | 1702 | 66.4 | 32.3 | [Pretrain](./vicuna_7b_v15_clip_vit_large_p14_336/pretrain/llava_vicuna_7b_v15_clip_vit_large_p14_336_e1_gpu8_pretrain.py) / [Fine-tune](./vicuna_7b_v15_clip_vit_large_p14_336/finetune/llava_vicuna_7b_v15_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune.py) | 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-v1.5-7b-xtuner-pretrain) / 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-v1.5-7b-xtuner-pretrain) | 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-v1.5-7b-xtuner) / 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-v1.5-7b-xtuner) |
| LLaVA-v1.5-13B (XTuner) | 68.9 | 69.5 | 64.7 | 63.1 | 32.2 | 1771 | 68.1 | 35.5 | [Pretrain](./vicuna_13b_v15_clip_vit_large_p14_336/pretrain/llava_vicuna_13b_v15_clip_vit_large_p14_336_e1_gpu8_pretrain.py) / [Fine-tune](./vicuna_13b_v15_clip_vit_large_p14_336/finetune/llava_vicuna_13b_v15_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune.py) | 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-v1.5-13b-xtuner-pretrain) / 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-v1.5-13b-xtuner-pretrain) | 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-v1.5-13b-xtuner) / 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-v1.5-13b-xtuner) |
| LLaVA-InternLM-7B (XTuner) | 69.0 | 68.5 | 66.7 | 63.8 | 35.8 | 1671 | 65.8 | 33.8 | [Pretrain](./internlm_chat_7b_clip_vit_large_p14_336/pretrain/llava_internlm_chat_7b_clip_vit_large_p14_336_e1_gpu8_pretrain.py) / [Fine-tune](./internlm_chat_7b_clip_vit_large_p14_336/finetune/llava_internlm_chat_7b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune.py) | 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-internlm-7b-pretrain) / 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-internlm-7b-pretrain) | 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-internlm-7b) / 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-internlm-7b) |