Authors: Lingjun Mao, Zineng Tang, Alane Suhr
We study the perception of color illusions by vision-language models. Color illusions, in which a person's visual system perceives a color differently from its actual pixel value, are well studied in human vision. However, it remains underexplored whether vision-language models (VLMs), trained on large-scale human data, exhibit similar perceptual biases when confronted with such illusions. We propose an automated framework for generating color illusion images, resulting in RCID (Realistic Color Illusion Dataset), a dataset of 19,000 realistic illusion images. Our experiments show that all studied VLMs exhibit perceptual biases similar to human vision. Finally, we train a model that can report both human-perceived and actual pixel-level color differences.
- We propose an automated framework for generating realistic illusion images and create a large, realistic dataset of color illusion images, named Realistic Color Illusion Dataset (RCID), to enhance the fairness and accuracy of model testing.
- We investigate underlying mechanisms of color illusions in VLMs, highlighting the combined influence of the vision system and prior knowledge. We also explore how external prompts and instruction tuning impact the models' performance on these illusions.
- We propose a simple training method that enables models to understand human perception while also recognizing the actual pixel values.
The construction of our dataset involves three steps:
- Image Generation. For contrast and stripe illusions, we use procedural code to generate simple illusion images, which are then processed by ControlNet to create realistic illusion images. For filter illusions, we directly apply contrasting color filters to the original images. Each type of illusion also includes a corresponding control group without any illusions for comparison.
- Question Generation. We use GPT-4o to generate image-specific questions that are designed to evaluate the model's understanding of the illusion.
- Human Feedback. We collect human participants' feedback on these images and adjust the original classification of "illusion" and "non-illusion" based on whether participants are deceived.
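As a rough illustration of the filter-illusion step above, a contrasting color filter can be approximated by scaling an image's color channels in opposite directions. The scaling factors below are illustrative, not the ones used to build the dataset:

```python
import numpy as np

def apply_contrast_filter(image, warm=1.3, cool=0.7):
    """Apply a simple warm-vs-cool color filter by scaling the
    red and blue channels in opposite directions.

    image: H x W x 3 uint8 array; returns a filtered uint8 array.
    """
    img = image.astype(np.float32)
    img[..., 0] *= warm   # boost red -> warm color cast
    img[..., 2] *= cool   # suppress blue
    return np.clip(img, 0, 255).astype(np.uint8)

# A uniform mid-gray image picks up a warm tint under the filter.
gray = np.full((4, 4, 3), 128, dtype=np.uint8)
warm_gray = apply_contrast_filter(gray)
```

Under such a filter, regions that share identical pixel values in the filtered image can still be perceived as different colors, because the visual system discounts the global color cast.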
Our data can be found in the following link: RCID Dataset
The model used to generate the data is available at: Color Diffusion.
Clone the repository to your local machine:
```shell
git clone https://github.com/mao1207/RCID.git
```
After cloning the repository, set up your Conda environment. You need to install the following dependencies:
```shell
conda create -n rcid python=3.8
conda activate rcid
conda install pytorch torchvision -c pytorch
pip install "accelerate>=0.16.0"
pip install "transformers>=4.25.1"
pip install ftfy
pip install tensorboard
pip install datasets
```
Ensure that your environment is correctly configured with the above dependencies.
To simplify realistic images into basic shapes and colors for training ControlNet, first run the following Python script to quantize the original dataset images:
```shell
python ControlNet_training/generate_quantified_images.py
```
This script will convert realistic images into simplified versions to be used for training.
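The quantization idea can be sketched with Pillow's built-in palette quantization; the actual script's method and color count may differ:

```python
from PIL import Image

def quantize_image(img, n_colors=8):
    """Reduce an RGB image to at most n_colors flat colors using
    median-cut palette quantization, then convert back to RGB."""
    return img.quantize(colors=n_colors).convert("RGB")

# Build a horizontal red-to-blue gradient and quantize it into flat bands.
gradient = Image.new("RGB", (256, 32))
gradient.putdata([(x, 0, 255 - x) for _ in range(32) for x in range(256)])
quantized = quantize_image(gradient)
```

Collapsing each region to a single flat color gives ControlNet a clean shape-and-color conditioning signal, analogous to the simple procedural illusion images.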
Once the dataset is prepared, you can begin training ControlNet with the simplified dataset. Run the following command to start training:
```shell
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="Your_model_dir"

python train_controlnet.py \
  --pretrained_model_name_or_path=$MODEL_DIR \
  --output_dir=$OUTPUT_DIR \
  --dataset_name=/path/to/cocos-dataset/cocos \
  --resolution=512 \
  --learning_rate=5e-4 \
  --train_batch_size=4 \
  --num_train_epochs=20 \
  --checkpointing_steps=2000
```
- Replace `/path/to/cocos-dataset/cocos` with the actual path to your dataset.
- `MODEL_DIR` should point to the pretrained model you want to use, for example `runwayml/stable-diffusion-v1-5`.
- Adjust `OUTPUT_DIR` to the location where you want to save the trained model.
Once ControlNet has been trained, you can generate images with color illusions.
Run the following script to generate simplified images with color illusions:
```shell
python dataset_construction/simple_contrast_illusion_generation.py
```
This will create a set of simplified images with color illusions.
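For intuition, a minimal simultaneous-contrast stimulus of the kind this step produces can be drawn in a few lines: two pixel-identical gray patches on dark and light halves of the canvas. The actual generation code is more elaborate; the sizes and colors here are illustrative:

```python
from PIL import Image, ImageDraw

def make_contrast_illusion(size=200, patch=40):
    """Two identical mid-gray squares on dark and light backgrounds.
    Humans typically perceive the patch on the dark half as lighter."""
    img = Image.new("RGB", (2 * size, size))
    draw = ImageDraw.Draw(img)
    draw.rectangle([0, 0, size - 1, size - 1], fill=(60, 60, 60))            # dark half
    draw.rectangle([size, 0, 2 * size - 1, size - 1], fill=(200, 200, 200))  # light half
    half = patch // 2
    for cx in (size // 2, size + size // 2):                                 # identical patches
        draw.rectangle([cx - half, size // 2 - half, cx + half, size // 2 + half],
                       fill=(128, 128, 128))
    return img

img = make_contrast_illusion()
```

Because the two patches share exactly the same pixel values, any model that reports them as different colors is exhibiting the human-like perceptual bias rather than reading the pixels.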
To convert the simplified color illusion images into realistic images, use the following script:
```shell
python dataset_construction/realistic_illusion_generation.py
```
This script will generate realistic versions of the illusion images.
Finally, generate corresponding questions for each image using the following command:
```shell
python dataset_construction/question_generation.py
```
This will generate questions related to the color differences and illusions present in the images.
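The question-generation step sends an image-specific prompt to GPT-4o. The exact prompt lives in the script; a hypothetical helper that assembles such a prompt might look like this (the wording and field names below are illustrative, not the dataset's actual prompt):

```python
def build_question_prompt(illusion_type, region_a, region_b):
    """Assemble an image-specific prompt asking GPT-4o to write a
    question probing whether a model is fooled by the illusion.
    All field names here are illustrative."""
    return (
        f"The attached image contains a {illusion_type} color illusion.\n"
        f"Regions '{region_a}' and '{region_b}' have identical pixel colors, "
        "but humans typically perceive them as different.\n"
        "Write one multiple-choice question asking whether the two regions "
        "are the same color, with options (A) same and (B) different."
    )

prompt = build_question_prompt("simultaneous contrast", "left square", "right square")
```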
- Make sure your dataset is in the correct format before running the scripts.
- Ensure that you have sufficient disk space, as image generation and training require significant resources.
- You can adjust hyperparameters in the `train_controlnet.py` script based on your hardware and training requirements.
The source code of this repository is released under the Apache License 2.0. The model license and dataset license are listed on their corresponding webpages.
For more information, access to the dataset, and to contribute, please visit our Website.