We build the evaluation tool MMVMEvalKit based on VLMEvalKit.
Before running evaluation:
- Clone our MMVMEvalKit repository.
- Download match_bench.zip and mllm_match_eval_full.tsv from here, put them under the MMVMEvalKit folder, and unzip match_bench.zip.
- Environment requirements follow those of VLMEvalKit.
- Note: your OpenAI API key should be set in the .env file:
# OpenAI API
OPENAI_API_KEY=
OPENAI_API_BASE=
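Before launching a run, it can help to confirm both variables are actually filled in. A minimal shell sketch (not part of MMVMEvalKit) that warns when either key is missing or empty in .env:

```shell
# Warn if either required key is absent or has no value in .env.
# Assumes .env lives in the current (MMVMEvalKit) directory.
for key in OPENAI_API_KEY OPENAI_API_BASE; do
  grep -q "^${key}=..*" .env 2>/dev/null || echo "warning: $key is not set in .env"
done
```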
To evaluate an existing MLLM on the MMVM benchmark, e.g. InternVL2-2B, run
python run.py --data MMatch --model InternVL2-2B --verbose
To evaluate CoLVA-InternVL2-4B on MMVM benchmark, download the pretrained weights from here and run
python run.py --data MMatch --model colva_internvl2_4b --verbose
To evaluate CoLVA-Qwen2VL-2B on MMVM benchmark, download the pretrained weights from here and run
python run.py --data MMatch --model colva_qwen2vl_2b --verbose
To evaluate CoLVA-Qwen2VL-7B on MMVM benchmark, download the pretrained weights from here and run
python run.py --data MMatch --model colva_qwen2vl_7b --verbose
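The three CoLVA commands above differ only in the model name, so they can be generated in one loop. A small sketch that prints each command (pipe the output to `sh` to execute them; assumes the pretrained weights are already downloaded):

```shell
# Print the evaluation command for each CoLVA checkpoint listed above.
for model in colva_internvl2_4b colva_qwen2vl_2b colva_qwen2vl_7b; do
  echo "python run.py --data MMatch --model $model --verbose"
done
```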