dataset | Google Drive | BaiduNetDisk | Description |
---|---|---|---|
E-IC | [Google Drive] | [BaiduNetDisk] | dataset for editing Image Captioning |
E-VQA | [Google Drive] | [BaiduNetDisk] | dataset for editing Visual Question Answering |
- All images used in E-IC and E-VQA are available for download from Google Drive or BaiduNetDisk.
- Locality is the same as in factual editing: it measures whether the outputs for unrelated facts are preserved after an edit (a toy sketch of this metric follows this list).
- Multimodal locality assesses the impact of editing on the visual module; it is analogous to regular (textual) locality.
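As a rough illustration (not EasyEdit's actual implementation), the sketch below computes a locality-style score by comparing a model's answers on unrelated questions before and after an edit; the function name and toy data are hypothetical.

# A minimal, hypothetical sketch of the locality idea: compare answers on
# unrelated questions before and after an edit. Illustration only, not
# EasyEdit's implementation.

def locality_score(pre_edit_answers, post_edit_answers):
    """Fraction of unrelated questions whose answer is unchanged by the edit."""
    assert len(pre_edit_answers) == len(post_edit_answers)
    unchanged = sum(a == b for a, b in zip(pre_edit_answers, post_edit_answers))
    return unchanged / len(pre_edit_answers)

# Toy example: the edit left 2 of 3 unrelated answers untouched.
print(locality_score(["Paris", "4", "blue"], ["Paris", "4", "green"]))  # ~0.667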
Dataset description
editing-data
├── caption
│   ├── caption_train_edit.json
│   └── caption_eval_edit.json
├── locality
│   ├── NQ dataset
│   │   ├── train.json
│   │   └── validation.json
├── multimodal_locality
│   ├── OK-VQA dataset
│   │   ├── okvqa_loc.json
└── vqa
    ├── vqa_train.json
    └── vqa_eval.json
Multimodal locality
(used to evaluate multimodal locality; see the paper for details of this dataset)
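If you want to peek at what a split contains before training, a quick schema-agnostic inspection such as the following works; the file path matches the tree above, and nothing is assumed about the record fields:

# Quick, schema-agnostic inspection of one editing split.
import json

with open('data/caption_eval_edit.json') as f:
    records = json.load(f)

print(type(records).__name__, len(records))
first = records[0] if isinstance(records, list) else next(iter(records.values()))
print(sorted(first.keys()) if isinstance(first, dict) else first)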
Note: Please use Python 3.9+ for EasyEdit. To get started, simply install conda and run:
git clone https://github.com/zjunlp/EasyEdit.git
conda create -n EasyEdit python=3.9.7
...
pip install -r requirements.txt
You should configure the `qformer_checkpoint` and `pretrained_ckpt` settings, which deviate from the original repository's guidelines; please refer to the Multimodal section in this file for the correct settings. `pretrained_ckpt` can be downloaded from here, and `qformer_checkpoint` can be found here.
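As a quick sanity check before launching training, you can verify that both paths are set in your config. This sketch assumes PyYAML is installed and uses one of the training configs referenced later in this guide; adjust the path to the config you actually use.

# Sanity-check the two checkpoint paths in a multimodal training config.
# Assumes PyYAML is installed; adjust the config path to the one you use.
import os
import yaml

with open('hparams/TRAINING/MEND/minigpt4.yaml') as f:
    cfg = yaml.safe_load(f)

for key in ('qformer_checkpoint', 'pretrained_ckpt'):
    path = cfg.get(key)
    status = '(exists)' if path and os.path.exists(path) else '(missing)'
    print(key, '->', path, status)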
- Meta-learning based: `MEND`
- Memory-based routing: `SERAC`
For the above editing methods, pre-training of the corresponding meta-networks or classifiers is required. Therefore, EasyEdit provides a unified framework for pre-training the relevant network structures. Take training SERAC as an example:
Step1: Define an MLLM as the object to be edited.
Choose the MLLM to be edited. EasyEdit supports a subset of multimodal models (`MiniGPT-4` and `BLIP2OPT` so far). The corresponding configuration file directory is `hparams/TRAINING/YOUR_METHOD/YOUR_MODEL.YAML` for training, e.g. `hparams/TRAINING/MEND/minigpt4.yaml`; set the corresponding `model_name` to select the object for editing. Use `hparams/YOUR_METHOD/YOUR_MODEL.YAML` for evaluating.
model_name: minigpt4
model_class: Blip2OPT
tokenizer_class: LlamaTokenizer
tokenizer_name: Vicuna
Step2: Choose the appropriate Editing Method
The selection of editing methods is a crucial step, as different methods have their own strengths and weaknesses. Users need to consider the trade-off between editing success rate, generalization, and preserving performance on unrelated inputs.
## In this case, we use SERAC method, so you should import `SERACMultimodalTrainingHparams` for training
from easyeditor import SERACMultimodalTrainingHparams
## Loading config from hparams/TRAINING/SERAC/minigpt4.yaml
training_hparams = SERACMultimodalTrainingHparams.from_hparams('./hparams/TRAINING/SERAC/minigpt4.yaml')
Step3: Provide the edit training set
The currently supported and available datasets are `Caption` and `VQA` (Google Drive). Please place them in the "data" directory and initialize the dataset class (`CaptionDataset` for Caption and `VQADataset` for VQA) to load the corresponding training set.
train_ds = CaptionDataset('data/caption_train_edit.json', config=training_hparams)
eval_ds = CaptionDataset('data/caption_eval_edit.json', config=training_hparams)
Step4: Combine them into a Trainer
trainer = MultimodalTrainer(
    config=training_hparams,
    train_set=train_ds,
    val_set=eval_ds
)
Step5: Run and Edit
Done! We can now run training and evaluation.
trainer.run()
- Run: the `CHECKPOINT` will be saved to the path `results_dir`.
- Edit: set the `archive` field in the hparams file to `CHECKPOINT`. EasyEdit will automatically load the corresponding pre-trained weights during the editing process (Go to edit); a short sketch follows below.
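You can either edit the YAML directly or override the loaded hparams in Python. A minimal sketch of the latter follows, assuming the loaded hparams object exposes `archive` as an attribute; the checkpoint path shown is a hypothetical placeholder, so substitute the path actually written under your `results_dir`.

# A minimal sketch: point the SERAC editing hparams at the trained checkpoint.
# The archive value below is a hypothetical placeholder path.
from easyeditor import SERACMultimodalHparams

hparams = SERACMultimodalHparams.from_hparams('hparams/SERAC/minigpt4.yaml')
hparams.archive = 'results/models/SERAC/minigpt4'  # hypothetical CHECKPOINT path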
Training Example
from easyeditor import SERACMultimodalTrainingHparams, CaptionDataset, MultimodalTrainer

training_hparams = SERACMultimodalTrainingHparams.from_hparams('hparams/TRAINING/SERAC/minigpt4.yaml')
train_ds = CaptionDataset('data/caption_train_edit.json', config=training_hparams)
eval_ds = CaptionDataset('data/caption_eval_edit.json', config=training_hparams)
trainer = MultimodalTrainer(
    config=training_hparams,
    train_set=train_ds,
    val_set=eval_ds
)
trainer.run()
Evaluating Example
from easyeditor import SERACMultimodalHparams, CaptionDataset, MultimodalTrainer

hparams = SERACMultimodalHparams.from_hparams('hparams/SERAC/minigpt4.yaml')
# train_ds = CaptionDataset('data/caption_train_edit.json', config=hparams)
eval_ds = CaptionDataset('data/caption_eval_edit.json', config=hparams)
trainer = MultimodalTrainer(
    config=hparams,
    train_set=eval_ds,
    val_set=eval_ds
)
trainer.run()
The results will include the following metrics:

- `rewrite_acc` $\rightarrow$ Reliability
- `rephrase_acc` $\rightarrow$ Generalization
- `image_rephrase_acc` $\rightarrow$ Generalization for Multimodal
- `locality_acc` $\rightarrow$ Locality
- `multimodal_locality_acc` $\rightarrow$ Locality for Multimodal
`MultimodalEditor` is the class for Multi-Modality Editing. You can choose the appropriate editing method (such as `IKE`) based on your specific needs.

- Due to different `transformers` versions and different GPU models, the editing results may fluctuate slightly.
Step1: Generate embedding files for IKE
You can use `Generate_Embedding_for_IKE()` in `multimodal_edit.py` to generate them directly.
## Generate embedding files for IKE
## Import paths below mirror the rest of this guide; see multimodal_edit.py in the
## repository for the exact imports used there.
from easyeditor import IKEMultimodalHyperParams, VQADataset, encode_ike_facts_multimodal
from sentence_transformers import SentenceTransformer

hparams = IKEMultimodalHyperParams.from_hparams('hparams/IKE/blip2.yaml')
train_ds = VQADataset('data/vqa_train.json', config=hparams)
sentence_model = SentenceTransformer(hparams.sentence_model_name).to(f'cuda:{hparams.device}')
encode_ike_facts_multimodal(sentence_model, train_ds, hparams)
Step2: Run and Edit!
Select a specific model and dataset, then use `test_IKE_MiniGPT4_Caption()` in `multimodal_edit.py` to run the experiments.
- For the Caption dataset, use the following code:
hparams = IKEMultimodalHyperParams.from_hparams('hparams/IKE/minigpt4.yaml')
editor = MultimodalEditor.from_hparams(hparams)
eval_ds = CaptionDataset('data/caption_eval_edit.json', config=hparams)
metrics, edited_model, _ = editor.edit_dataset(
    ds=eval_ds,
    train_ds=eval_ds,
    keep_original_weight=True
)
print_result(metrics)  # print_result is a helper defined in multimodal_edit.py
- For the VQA dataset, you should set the `template` as follows:
hparams = IKEMultimodalHyperParams.from_hparams('hparams/IKE/minigpt4.yaml')
editor = MultimodalEditor.from_hparams(hparams)
eval_ds = VQADataset('data/vqa_eval.json', config=hparams)
template = "Question: {} Short answer:"
metrics, edited_model, _ = editor.edit_dataset(
    ds=eval_ds,
    train_ds=eval_ds,
    keep_original_weight=True,
    template=template
)
print_result(metrics)
For `MEND` and `SERAC`, the `CHECKPOINT` mentioned in MultimodalTrainer Step5 is needed. Then you can edit models on any dataset using `MultimodalEditor`.
For example, to run experiments with `MEND` on the Caption dataset, use the following code:
hparams = MENDMultimodalHparams.from_hparams('hparams/MEND/minigpt4.yaml')
editor = MultimodalEditor.from_hparams(hparams)
eval_ds = CaptionDataset('data/caption_eval_edit.json', config=hparams)
metrics, edited_model, _ = editor.edit_dataset(
    ds=eval_ds,
    keep_original_weight=True
)
print_result(metrics)
We would like to express our sincere gratitude to the excellent works LAVIS, MiniGPT-4, SERAC, and MEND.
If you find this work useful for your research, please cite it as follows:
@inproceedings{DBLP:conf/emnlp/0008TL0WC023,
author = {Siyuan Cheng and
Bozhong Tian and
Qingbin Liu and
Xi Chen and
Yongheng Wang and
Huajun Chen and
Ningyu Zhang},
editor = {Houda Bouamor and
Juan Pino and
Kalika Bali},
title = {Can We Edit Multimodal Large Language Models?},
booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural
Language Processing, {EMNLP} 2023, Singapore, December 6-10, 2023},
pages = {13877--13888},
publisher = {Association for Computational Linguistics},
year = {2023},
url = {https://aclanthology.org/2023.emnlp-main.856},
timestamp = {Wed, 13 Dec 2023 17:20:20 +0100},
biburl = {https://dblp.org/rec/conf/emnlp/0008TL0WC023.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}