NADA

Official code for No Annotations for Object Detection in Art through Stable Diffusion (WACV 2025)

[📖 Paper] [🖥️ Project Page]

Setup

This repository is composed of three folders corresponding to different parts of training or evaluating NADA. The code is organized this way to prevent conflicting dependencies.

  • prompt-to-prompt

    This folder contains code for the class proposers not based on LLaVA and the class-conditioned detector. This uses code from Google's prompt-to-prompt repository and DAAM.

    Create a Python virtual environment and pip install the corresponding requirements file to set up the folder.

    • For the class-conditioned detector and weakly-supervised class proposer

       cd prompt-to-prompt
       python -m venv env
       source env/bin/activate
       pip install -r requirements.txt
    • For the non-LLaVA zero-shot class proposers

       cd prompt-to-prompt
       python -m venv cp_env
       source cp_env/bin/activate
       pip install -r cp_requirements.txt
  • detectron2

    Code for evaluating predictions made by NADA. Bounding boxes are saved in the COCO format, so we use Meta's Detectron2 library to evaluate them.

    Create a virtual environment and pip install from requirements.txt to set it up.

     cd detectron2
     python -m venv env
     source env/bin/activate
     pip install -r requirements.txt
  • LLaVA

    Code for generating outputs with LLaVA. We use LLaVA for our zero-shot class proposer and for caption prompt construction. This uses code from the official LLaVA repository.

    Create a Python virtual environment and install the package from the folder to set it up.

     cd LLaVA
     python -m venv env
     source env/bin/activate
     pip install -e .

Preparing data

Download ArtDL and IconArt and place the ArtDL and IconArt_v1 folders in a data folder at the root of the repository.
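
After downloading, the repository should look like this (a sketch of the expected layout; the folder names must match exactly):

nada/
└── data/
    ├── ArtDL/
    └── IconArt_v1/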

Using NADA

Using the class proposer

Weakly-supervised class proposer

Run prompt-to-prompt/classify/fc.py to train the weakly-supervised class proposer and perform inference with it (creating labels for use with the class-conditioned detector).

cd prompt-to-prompt
python classify/fc.py \
--dataset {artdl, iconart} \
--classification-type {single, multi} \
--data-type images \
--modes {train, eval, label} \
--num-layers {2, 3} \
--checkpoint checkpoints/{artdl, iconart}/checkpoint.ckpt \
--save-dir labels/{ex. artdl_wscp}

Specify --eval-label-split {} when eval or label (inference) is included in --modes. Refer to prompt-to-prompt/data/classify_with_labels.py for the splits per dataset. Items in {} are options/examples.
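
As a concrete sketch, a run that trains on ArtDL and then writes labels might look like the following. The flag values are illustrative picks from the options above; in particular, the split name test is an assumption, so check classify_with_labels.py for the actual split names:

cd prompt-to-prompt
python classify/fc.py \
--dataset artdl \
--classification-type single \
--data-type images \
--modes train eval label \
--eval-label-split test \
--num-layers 2 \
--checkpoint checkpoints/artdl/checkpoint.ckpt \
--save-dir labels/artdl_wscp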

Zero-shot class proposer

Run LLaVA/classify.py to perform inference with the zero-shot class proposer.

cd LLaVA
python classify.py \
--dataset {artdl, iconart} \
--prompt {who, score} \
--dataset-split {} \
--save-dir ../prompt-to-prompt/labels/{ex. artdl_zscp}

Use --prompt who (the choice prompt in the paper) for artdl and --prompt score (the score prompt in the paper) for iconart.
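
For instance, an ArtDL run following the paper's prompt choice might look like this (the split name test and the save directory are illustrative):

cd LLaVA
python classify.py \
--dataset artdl \
--prompt who \
--dataset-split test \
--save-dir ../prompt-to-prompt/labels/artdl_zscp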

Using the class-conditioned detector

The class-conditioned detector uses the labels inferred by the class proposer to perform detection and requires no training. The detector relies on a text prompt, and we support two kinds of prompt construction.

Template prompt construction

Template prompt construction inserts the labels into templates à la CLIP. Run prompt-to-prompt/generate.py:

cd prompt-to-prompt
python generate.py \
--dataset {artdl, iconart} \
--dataset-split {} \
--prompt-type {} \
--save-dir annotations/{ex. artdl_wscp} \
--label-dir labels/{ex. artdl_wscp}

In the paper, we use --prompt-type wikipedia for artdl and --prompt-type custom_1 for iconart.
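
Putting this together, an ArtDL run with the paper's prompt type might look like the following (the split name test is an assumption, and the directories are illustrative, matching the label directory written earlier):

cd prompt-to-prompt
python generate.py \
--dataset artdl \
--dataset-split test \
--prompt-type wikipedia \
--save-dir annotations/artdl_wscp \
--label-dir labels/artdl_wscp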

Caption prompt construction

Caption prompt construction uses a caption containing the label as a prompt. First, create captions using LLaVA/caption.py:

cd LLaVA
python caption.py \
--dataset {artdl, iconart} \
--dataset-split {} \
--prompt-type {} \
--label-dir {ex. ../prompt-to-prompt/labels/artdl_wscp} \
--save-dir {ex. ../prompt-to-prompt/captions/artdl_wscp}

Then run LLaVA/check_captions.py to check if the captions contain the labels at indices within the maximum input length of the diffusion model, and modify them if necessary.

python check_captions.py \
--dataset {artdl, iconart} \
--dataset-split {} \
--prompt-type {} \
--save-dir {ex. ../prompt-to-prompt/captions/artdl_wscp}

Once the captions are ready, use prompt-to-prompt/generate.py like in template prompt construction, but instead of --label-dir, use --caption-dir.
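
For example, reusing the ArtDL settings from above (the split and directories are illustrative; --caption-dir points to the captions saved by caption.py, resolved relative to prompt-to-prompt, and whether --prompt-type is still consumed when captions are supplied is worth confirming against generate.py):

cd prompt-to-prompt
python generate.py \
--dataset artdl \
--dataset-split test \
--prompt-type wikipedia \
--save-dir annotations/artdl_wscp_captions \
--caption-dir captions/artdl_wscp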

Evaluation

Use the nada_eval.ipynb notebook in the LLaVA folder.
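
One way to open it, assuming Jupyter is installed in the environment you use for evaluation:

cd LLaVA
jupyter notebook nada_eval.ipynb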

Citation

@InProceedings{Ramos_2025_WACV,
    author    = {Ramos, Patrick and Gonthier, Nicolas and Khan, Selina and Nakashima, Yuta and Garcia, Noa},
    title     = {No Annotations for Object Detection in Art through Stable Diffusion},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025}
}
