Source code for the paper: HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization
We propose HIPPO, which represents tables using both text and images and optimizes multimodal large language models (MLLMs) to effectively learn more comprehensive table information from both modalities.
Specifically, HIPPO samples model responses from hybrid-modal table representations and designs a modality-consistent sampling strategy to enhance response diversity and mitigate modality bias during DPO training.
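For reference, the standard DPO objective that this kind of preference training builds on is shown below. In this setting, x would be a question paired with a table representation (text, image, or both), y_w a preferred response and y_l a dispreferred one, with π_ref the frozen reference model and β a temperature. This is the generic DPO loss, stated as background; HIPPO's exact objective and its modality-consistent pair construction are defined in the paper and may differ.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log\sigma\!\left(
      \beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
      - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}
    \right)\right]
```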
Clone the repository
git clone https://github.com/NEUIR/HIPPO.git
cd HIPPO
Install Dependencies
conda create -n hippo python=3.10
conda activate hippo
pip install -r requirments.txt
Download the MMTab Images
# test
mkdir -p hippo
wget https://huggingface.co/datasets/SpursgoZmy/MMTab/resolve/main/MMTab-eval_table_images_23K.zip
mv MMTab-eval_table_images_23K.zip hippo/
unzip hippo/MMTab-eval_table_images_23K.zip -d hippo/
# train
wget https://huggingface.co/datasets/SpursgoZmy/MMTab/resolve/main/MMTab-instruct_table_images_82K.zip
mv MMTab-instruct_table_images_82K.zip hippo/
unzip hippo/MMTab-instruct_table_images_82K.zip -d hippo/
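The download steps above repeat the same wget/move/unzip pattern for each archive. As a convenience, they can be wrapped in a small function. This is a hypothetical helper (the name `fetch_mmtab` and the `DRY_RUN` switch are made up, not part of the HIPPO repository); it assumes the archives should end up in `hippo/`, matching the directory used for the test images, and uses only standard `wget` and `unzip` flags.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with HIPPO: download one MMTab archive and
# extract it into a target directory (default: hippo/).
MMTAB_BASE_URL="https://huggingface.co/datasets/SpursgoZmy/MMTab/resolve/main"

fetch_mmtab() {
  local archive="$1"
  local dest="${2:-hippo}"
  # With DRY_RUN=1, only print the commands that would run.
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "wget -c -P ${dest} ${MMTAB_BASE_URL}/${archive}"
    echo "unzip -q -n ${dest}/${archive} -d ${dest}"
    return 0
  fi
  mkdir -p "$dest" || return 1
  # -c resumes a partial download; -P saves straight into $dest, so no mv is needed.
  wget -c -P "$dest" "${MMTAB_BASE_URL}/${archive}" || return 1
  # -q keeps output quiet; -n never overwrites files that were already extracted.
  unzip -q -n "${dest}/${archive}" -d "$dest"
}
```

Usage: `fetch_mmtab MMTab-eval_table_images_23K.zip` for the test images and `fetch_mmtab MMTab-instruct_table_images_82K.zip` for the training images. Because of `-c` and `-n`, re-running after an interrupted download should not redo finished work.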
You can download the HIPPO checkpoint directly from here, or train the HIPPO model yourself using the scripts as described below.
For training, first download the MiniCPM-V-2.6 model and the data. Then go to the scripts directory to construct the DPO data:
cd scripts
bash construct_dpo_data.bash
Alternatively, you can use the pre-constructed data directly: dpo_data.
Then train the model:
cd scripts
bash train.bash
For inference, go to the scripts directory and run inference with the HIPPO model:
cd scripts
bash inference.sh
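The training and inference stages above can also be chained from the repository root. The sketch below is a hypothetical helper, not part of the HIPPO repository: it only calls the three scripts named above, assumes each is run from inside `scripts/`, and adds two made-up switches, `USE_RELEASED_DPO_DATA` (skip DPO data construction when using the released dpo_data) and `DRY_RUN` (print commands instead of running them).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with HIPPO: chains the documented stages.
# Assumes it is sourced from the repository root, where scripts/ lives.
SCRIPTS_DIR="${SCRIPTS_DIR:-scripts}"

# Run one stage script inside SCRIPTS_DIR; with DRY_RUN=1, only print the command.
hippo_stage() {
  local script="$1"
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "cd ${SCRIPTS_DIR} && bash ${script}"
    return 0
  fi
  # Subshell so the caller's working directory is left unchanged.
  (cd "${SCRIPTS_DIR}" && bash "${script}")
}

# Construct DPO data (unless USE_RELEASED_DPO_DATA=1), train, then run inference.
# Stops at the first stage that fails.
hippo_pipeline() {
  if [[ "${USE_RELEASED_DPO_DATA:-0}" != "1" ]]; then
    hippo_stage construct_dpo_data.bash || return 1
  fi
  hippo_stage train.bash || return 1
  hippo_stage inference.sh || return 1
}
```

Usage, assuming the functions are saved in a file named (hypothetically) `run_hippo.sh`: `source run_hippo.sh && DRY_RUN=1 hippo_pipeline` to preview the commands, then `hippo_pipeline` to run them. Returning on the first failure avoids training on incomplete DPO data or running inference without a trained model.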
For evaluation, use src/eval/MMTab_evaluation.ipynb to evaluate the model's performance.
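On a headless server, the evaluation notebook can be executed without opening Jupyter by using `jupyter nbconvert`. This is a sketch under assumptions: Jupyter and nbconvert are installed in the environment, and the notebook's input paths already point at your inference outputs. The function name `hippo_eval`, the default output name, and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with HIPPO: execute the evaluation notebook
# headlessly with nbconvert. Assumes jupyter/nbconvert are installed.
NOTEBOOK="${NOTEBOOK:-src/eval/MMTab_evaluation.ipynb}"

hippo_eval() {
  local output_name="${1:-MMTab_evaluation.executed.ipynb}"
  # --execute runs every cell; timeout=-1 disables the per-cell time limit;
  # --output writes an executed copy instead of overwriting the original notebook.
  local cmd=(jupyter nbconvert --to notebook --execute
             --ExecutePreprocessor.timeout=-1
             --output "$output_name" "$NOTEBOOK")
  # With DRY_RUN=1, only print the command.
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
    return 0
  fi
  "${cmd[@]}"
}
```

Writing to a separate output notebook keeps the original clean, so the scores from each run can be kept side by side.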