In this repository, we test VLAttack on the VQA and NLVR tasks, using the VQAv2 and NLVR2 datasets, respectively. We run VLAttack on 5K correctly predicted samples. Instructions are given below:
First, download the pretrained BLIP model weights (BLIP with ViT-B, 14M) from the original BLIP repository. We use these weights to generate adversarial samples.
- Download the VQAv2 dataset from the original website, then set `vqa_root` in `./configs/vqa.yaml`.
- Download the finetuned VQAv2 model weights (`model_vqa.pth`) from the original BLIP repository, then set `pretrain` in `./configs/vqa.yaml` to the path of `model_vqa.pth`.
- Find 5K correctly predicted samples by running `python prepare_vqa.py`. This generates `right_vqa_list.txt` and `right_vqa_ans_table.txt`, which store the indexes and predictions of the correctly predicted samples.
- To conduct VLAttack on the VQAv2 dataset, run `python attack_vqa.py` with one of the `--method` options below:
  - Method options:
    - BSA (ours)
    - VLAttack (ours)
    - Co-Attack
    - BERTAttack
  - Command (replace `METHOD_NAME` with one of the options above): `python attack_vqa.py --method METHOD_NAME`
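To compare the methods listed above, the attack script can be launched once per option. The following is a minimal sketch, not part of the repository: the helper names `build_attack_command` and `run_all_methods` are hypothetical, and it assumes that the strings accepted by `--method` are spelled exactly as in the list above, which the instructions do not confirm for every option.

```python
import subprocess

# Method names copied from the list above. That --method accepts exactly
# these spellings is an assumption, not something the README confirms.
METHODS = ["BSA", "VLAttack", "Co-Attack", "BERTAttack"]


def build_attack_command(method, script="attack_vqa.py"):
    """Return the argv list for one attack run, rejecting unknown methods."""
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {METHODS}")
    return ["python", script, "--method", method]


def run_all_methods(script="attack_vqa.py", dry_run=True):
    """Run the attack once per method; with dry_run=True, only print the commands."""
    commands = [build_attack_command(m, script) for m in METHODS]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_all_methods()` prints the four commands without executing anything; pass `dry_run=False` from the repository root to actually launch the runs.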
- Download the NLVR2 dataset from the original website, then set `image_root` in `./configs/nlvr.yaml`.
- Download the finetuned NLVR2 model weights (`model_base_nlvr.pth`) from the original BLIP repository, then set `pretrain` in `./configs/nlvr.yaml` to the path of `model_base_nlvr.pth`.
- Find 5K correctly predicted samples by running `python prepare_nlvr.py`. This generates `right_nlvr_list.txt` and `right_nlvr_ans_table.txt`, which store the indexes and predictions of the correctly predicted samples.
- To conduct VLAttack on the NLVR2 dataset, run `python attack_nlvr.py` with the same `--method` options as above. For example, the following command runs VLAttack: `python attack_nlvr.py --method VLAttack`
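Because the attack scripts depend on the files produced by the preparation step, it can help to check for those files before starting a long run. The sketch below is illustrative only: `missing_prepared_files` is a hypothetical helper, and it assumes the preparation scripts write their outputs to the current working directory, which the instructions do not state.

```python
from pathlib import Path

# File names as given in the instructions above. That they are written to the
# working directory is an assumption; pass a different `workdir` if they are not.
PREPARED_FILES = {
    "vqa": ["right_vqa_list.txt", "right_vqa_ans_table.txt"],
    "nlvr": ["right_nlvr_list.txt", "right_nlvr_ans_table.txt"],
}


def missing_prepared_files(task, workdir="."):
    """Return the prepared files for `task` that are absent or empty."""
    if task not in PREPARED_FILES:
        raise ValueError(f"unknown task {task!r}; expected 'vqa' or 'nlvr'")
    base = Path(workdir)
    missing = []
    for name in PREPARED_FILES[task]:
        path = base / name
        # Treat an empty file as missing: it suggests preparation did not finish.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

If `missing_prepared_files("nlvr")` returns a non-empty list, rerun `python prepare_nlvr.py` before launching the attack; the same check applies to the VQA task with `"vqa"`.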