(March 2025)
The foundational paper for this repository: ColorizeDiffusion (e-print).
- Version 1 (WACV 2025), trained at 512px: ColorizeDiffusion, basic reference-based training. Released.
- Version 1.5 (CVPR 2025), trained at 512px: ColorizeDiffusion 1.5 (e-print), solving spatial entanglement. Released.
- Version 2, trained at 768px: enhancing background and style transfer. Paper and code available soon.
- Version XL, trained at 1024px: enhancing embedding guidance for character colorization and geometry disentanglement. Ongoing.
Model weights are available at https://huggingface.co/tellurion/colorizer.
This repository provides the implementation of ColorizeDiffusion. Currently, only the noisy model introduced in the paper, which utilizes the local tokens, is available.
To use the code in this repository, install the required dependencies:
```
conda env create -f environment.yaml
conda activate hf
```
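The released weights can also be fetched programmatically. Below is a minimal sketch using huggingface_hub; the local directory is an assumption for illustration, not a layout the repository requires.

```python
# Minimal sketch: download the released weights from Hugging Face.
# "./models/colorizer" is an assumed target directory, not a repo convention.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="tellurion/colorizer", local_dir="./models/colorizer")
```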
We also provide a Gradio-based Web UI. To run it:

```
python -u app.py
```

Then browse the UI at http://localhost:7860/.
| Options | Description |
|---|---|
| Mask guide mode | Activates mask-guided attention and the corresponding LoRA weights for colorization. |
| Crossattn scale | Diminishes artifacts caused by the distribution problem. |
| Pad reference with margin | Diminishes spatial entanglement by padding the reference to T times its current width. |
| Reference guidance scale | Classifier-free guidance scale for the reference image; 5 is suggested. A sketch of how the two scales compose follows the table. |
| Sketch guidance scale | Classifier-free guidance scale for the sketch image; 1 is suggested. |
| Attention injection | Strengthens similarity to the reference image. |
| Visualize | Used for local manipulation; visualizes the regions selected by each threshold. |
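For intuition, the two guidance scales above can be read as a two-condition classifier-free guidance. The sketch below shows one common way to compose such terms; the function and variable names are illustrative, not the repository's API.

```python
import torch

def composed_guidance(eps_uncond: torch.Tensor,
                      eps_sketch: torch.Tensor,
                      eps_sketch_ref: torch.Tensor,
                      s_sketch: float = 1.0,
                      s_ref: float = 5.0) -> torch.Tensor:
    """Compose sketch and reference classifier-free guidance terms.

    eps_uncond:     noise prediction with neither condition
    eps_sketch:     noise prediction conditioned on the sketch only
    eps_sketch_ref: noise prediction conditioned on sketch and reference
    """
    return (eps_uncond
            + s_sketch * (eps_sketch - eps_uncond)
            + s_ref * (eps_sketch_ref - eps_sketch))
```

With the suggested scales (reference 5, sketch 1), the reference term dominates the update while the sketch is followed at its conditioned strength.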
For artifacts caused by spatial entanglement (the distribution problem discussed in the paper), activate background enhance (optionally together with foreground enhance).
The colorization results can be manipulated using text prompts.
For local manipulations, a visualization is provided to show the correlation between each prompt and tokens in the reference image.
The manipulation result and correlation visualization use the following settings:
- Target prompt: the girl's blonde hair
- Anchor prompt: the girl's brown hair
- Control prompt: the girl's brown hair
- Target scale: 8
- Enhanced: false
- Thresholds: 0.5, 0.55, 0.65, 0.95
As shown, the manipulation unavoidably changes some unrelated regions, since it is applied to the reference embeddings.
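Conceptually, the manipulation shifts the reference embeddings along the text direction from the anchor prompt to the target prompt. The following is a minimal sketch of that idea, assuming pre-computed CLIP embeddings; the names are illustrative.

```python
import torch

def manipulate_tokens(ref_tokens: torch.Tensor,
                      target_emb: torch.Tensor,
                      anchor_emb: torch.Tensor,
                      target_scale: float) -> torch.Tensor:
    """Shift every reference token along the (target - anchor) text direction.

    ref_tokens:             (N, D) local tokens of the reference image
    target_emb, anchor_emb: (D,) text embeddings of the two prompts
    """
    direction = target_emb - anchor_emb
    direction = direction / direction.norm()  # unit direction in embedding space
    # The shift is applied to all tokens, which is why unrelated regions drift.
    return ref_tokens + target_scale * direction
```

The thresholds described below exist to localize this shift and limit the drift in unrelated regions.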
| Options | Description |
|---|---|
| Group index | The index of the selected manipulation sequence's parameter group. |
| Target prompt | The prompt specifying the desired visual attribute of the image after manipulation. |
| Anchor prompt | The prompt specifying the anchored visual attribute of the image before manipulation. |
| Control prompt | Used for local manipulation (crossattn-based models). The prompt specifying the target regions. |
| Enhance | Whether this manipulation should be enhanced (more likely to influence unrelated attributes). |
| Target scale | The scale used to progressively control the manipulation. |
| Thresholds | Used for local manipulation (crossattn-based models). Four hyperparameters that reduce the influence on irrelevant visual attributes, where 0.0 < threshold 0 < threshold 1 < threshold 2 < threshold 3 < 1.0. A sketch of threshold gating follows the table. |
| < Threshold 0 | Selects regions most related to the control prompt. Indicated by deep blue. |
| Threshold 0 - Threshold 1 | Selects regions related to the control prompt. Indicated by blue. |
| Threshold 1 - Threshold 2 | Selects neighbouring but unrelated regions. Indicated by green. |
| Threshold 2 - Threshold 3 | Selects unrelated regions. Indicated by orange. |
| > Threshold 3 | Selects the most unrelated regions. Indicated by brown. |
| Add | Click to save the current manipulation in the sequence. |
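As a rough sketch of how the four thresholds could gate a local manipulation, each token can be assigned a per-region strength based on how related it is to the control prompt. The bucket weights below are assumptions for illustration; lower relatedness values mean more related, matching the table above.

```python
import torch

def threshold_gate(relatedness: torch.Tensor,
                   thresholds: tuple[float, float, float, float],
                   target_scale: float) -> torch.Tensor:
    """Map per-token relatedness to a manipulation strength.

    relatedness: (N,) score in (0, 1); lower = more related to the
                 control prompt, as in the table above.
    thresholds:  (t0, t1, t2, t3) with 0.0 < t0 < t1 < t2 < t3 < 1.0.
    """
    t0, t1, t2, t3 = thresholds
    scale = torch.zeros_like(relatedness)
    scale[relatedness < t0] = target_scale                                # deep blue
    scale[(relatedness >= t0) & (relatedness < t1)] = 0.5 * target_scale  # blue
    scale[(relatedness >= t1) & (relatedness < t2)] = 0.1 * target_scale  # green
    # Tokens above t2 (orange) and t3 (brown) are left untouched.
    return scale
```

Combined with the earlier sketch, the shift becomes `ref_tokens + scale[:, None] * direction`, so only tokens related to the control prompt are manipulated.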
- Stable Diffusion v2
- Stable Diffusion XL
- SD-webui-ControlNet
- Stable-Diffusion-webui
- K-diffusion
- Deepspeed
- sketchKeras-PyTorch
@article{2024arXiv240101456Y,
  author  = {Yan, Dingkun and Yuan, Liang and Wu, Erwin and Nishioka, Yuma and Fujishiro, Issei and Saito, Suguru},
  title   = {ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text},
  journal = {arXiv e-prints},
  year    = {2024},
  doi     = {10.48550/arXiv.2401.01456},
}

@InProceedings{Yan_2025_WACV,
  author    = {Yan, Dingkun and Yuan, Liang and Wu, Erwin and Nishioka, Yuma and Fujishiro, Issei and Saito, Suguru},
  title     = {ColorizeDiffusion: Improving Reference-Based Sketch Colorization with Latent Diffusion Model},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2025},
  pages     = {5092-5102},
}

@article{2025arXiv250219937Y,
  author  = {Yan, Dingkun and Wang, Xinrui and Li, Zhuoru and Saito, Suguru and Iwasawa, Yusuke and Matsuo, Yutaka and Guo, Jiaxian},
  title   = {Image Referenced Sketch Colorization Based on Animation Creation Workflow},
  journal = {arXiv e-prints},
  year    = {2025},
  doi     = {10.48550/arXiv.2502.19937},
}