
Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy

This repository contains the official implementation of the In-Subtree Probability (ISP) and the Subtree Coverage Score (SCS), two metrics for measuring the hypernymy capabilities of text-to-image models. It also contains the code for the experiments from the corresponding paper.

Installation

  1. Clone the repository.
  2. pip install -r requirements.txt
  3. pip install -e .
  4. If you want to evaluate GLIDE, install it following the instructions in the official GLIDE repository (a consolidated sketch of the installation commands is given after this list).
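
A minimal sketch of the non-GLIDE installation steps; the repository URL and directory name are placeholders:

```bash
# Sketch of the installation steps; <REPO_URL> and <REPO_DIR> are placeholders.
git clone <REPO_URL>
cd <REPO_DIR>

# Install the dependencies and the package itself in editable mode.
pip install -r requirements.txt
pip install -e .
```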

How to evaluate a model

  • Generate synset images from the model using generate_images.py.
    • Example: python scripts/generate_images.py --model_name=CompVis/ldm-text2im-large-256 --images_per_synset=32 --remove_leaves=True --batch_size=16 --image_size=224 --images_path=PATH_TO_IMAGES --gpu_id=0 --use_float16=True
  • Classify them using classify_images.py.
    • Example: python scripts/classify_images.py --model_name=vit_b_16 --images_path=PATH_TO_IMAGES --logits_path=PATH_TO_LOGITS --batch_size=256 --gpu_id=0
  • Calculate the metrics using calculate_metrics.py.
    • The metrics proposed in the paper are called SubtreeInProb (In-Subtree Probability) and SubtreeIS (from which the Subtree Coverage Score is derived).
    • Example: python scripts/calculate_metrics.py --logits_path=PATH_TO_LOGITS/vit_b_16_logits.npz --metric_names=SubtreeInProb,SubtreeIS --metrics_path=PATH_TO_METRICS
    • To obtain the leaderboard SCS value, divide the SubtreeIS value by ~1.6236 (an end-to-end sketch of these commands follows this list).
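
Putting these steps together, an end-to-end evaluation might look roughly as follows; the paths are placeholders, and the model names, batch sizes, and GPU id are simply the values from the examples above:

```bash
# Sketch of the full evaluation pipeline; adjust paths, models, and batch sizes to your setup.
IMAGES=PATH_TO_IMAGES
LOGITS=PATH_TO_LOGITS
METRICS=PATH_TO_METRICS

# 1. Generate images for WordNet synsets with a text-to-image model.
python scripts/generate_images.py --model_name=CompVis/ldm-text2im-large-256 \
    --images_per_synset=32 --remove_leaves=True --batch_size=16 --image_size=224 \
    --images_path=$IMAGES --gpu_id=0 --use_float16=True

# 2. Classify the generated images with an ImageNet classifier.
python scripts/classify_images.py --model_name=vit_b_16 --images_path=$IMAGES \
    --logits_path=$LOGITS --batch_size=256 --gpu_id=0

# 3. Compute the proposed metrics from the saved logits.
python scripts/calculate_metrics.py --logits_path=$LOGITS/vit_b_16_logits.npz \
    --metric_names=SubtreeInProb,SubtreeIS --metrics_path=$METRICS

# 4. Divide the resulting SubtreeIS value by ~1.6236 to obtain the leaderboard SCS value.
```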

For further details on how to run the scripts, please refer to the absl flag definitions in each script.
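
Since the scripts are absl apps, each one should print its flag names, defaults, and descriptions when invoked with absl's built-in help flag, for example:

```bash
# List the available flags of a script (absl also supports --helpfull for the full set).
python scripts/generate_images.py --help
```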

Structure of the repository

  • laion_400M/ – code for counting lemmas in LAION-400M.
  • notebooks/ – Jupyter notebooks with our analyses.
  • scripts/ – code for launching metric computations.
  • src/ – code for our metrics.
  • wordnet_classes/ – ImageNet-to-WordNet mappings.

Scripts

  • generate_images.py – generates images for synsets using a Diffusers text-to-image pipeline.
  • generate_images_glide.py – generates images for synsets using GLIDE.
  • classify_images.py – classifies the generated images.
  • calculate_metrics.py – calculates the metrics from the classifier logits.
  • generate_coco.py – generates MS-COCO samples using a Diffusers text-to-image pipeline.
  • generate_coco_glide.py – generates MS-COCO samples using GLIDE.

Notebooks

  • analysis.ipynb – computing the main metrics from logits, granular comparisons, and the analysis of unknown concepts.
  • sweeps.ipynb – sweep plots over the guidance scale and the number of diffusion steps.
  • human_eval.ipynb – human evaluation results.
  • coco_metrics.ipynb – computing MS-COCO metrics (FID and CLIPScore).
  • train_data_correlation.ipynb – computing the correlation between per-synset metric values and synset frequency in the training data.
