[ICLR 2025] 3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation


3DIS: DEPTH-DRIVEN DECOUPLED INSTANCE SYNTHESIS FOR TEXT-TO-IMAGE GENERATION

[Project Page] [3DIS Paper] [3DIS-FLUX Paper] [Huggingface Page]

🔥🔥🔥 News

  • 2025-01-22: Our paper 3DIS has been accepted to ICLR 2025.
  • 2025-01-27: We have released the code for rendering with the SD1.x model to meet the needs of more researchers.

To Do List

  • Code
  • Pretrained weights
  • 3DIS GUI
  • More Demos

Installation

Conda environment setup

conda create -n 3DIS python=3.10 -y
conda activate 3DIS
pip install -r requirement.txt
pip install -e .
cd segment-anything-2
pip install -e . --no-deps
cd ..
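
After installation, a quick sanity check can confirm that the environment is usable. The sketch below is not part of this repository. It assumes that the steps above make modules named `torch`, `threeDIS`, and `sam2` importable; these names are inferred from the folder layout and are not confirmed, so adjust them as needed.

```python
import importlib.util
import sys

# Assumed module names (not confirmed by this README); edit to match your install.
REQUIRED_MODULES = ("torch", "threeDIS", "sam2")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located in the active environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules can be located.")
```

The check only locates the modules without importing them, so it runs quickly and does not need a GPU.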

Checkpoints 🚀

Step 1: Download unet_0901.ckpt, the checkpoint of the fine-tuned Text-to-Depth model.

Step 2: Download layout_adapter.ckpt, the checkpoint of the trained Layout-to-Depth Adapter.

You can also get our pretrained weights from Huggingface🤗.

Step 3: Download sam2_hiera_large.pt, the SAM2 checkpoint.

Step 4: Put all three files under the 'pretrained_weights' folder:

├── pretrained_weights
│   ├── unet_0901.ckpt
│   ├── layout_adapter.ckpt
│   ├── sam2_hiera_large.pt
├── threeDIS
│   ├── ...
├── scripts
│   ├── ...
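
Before running any demo, it can help to confirm that the three checkpoints are where the layout above expects them. The following is a minimal sketch, not a script shipped with this repository; the helper name `missing_checkpoints` is hypothetical, and the file names are copied from the steps above.

```python
from pathlib import Path

# File names taken from the download steps above.
EXPECTED_CHECKPOINTS = (
    "unet_0901.ckpt",
    "layout_adapter.ckpt",
    "sam2_hiera_large.pt",
)


def missing_checkpoints(weights_dir="pretrained_weights"):
    """Return the expected checkpoint files that are absent from weights_dir."""
    root = Path(weights_dir)
    return [name for name in EXPECTED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```

Run it from the repository root so that the relative path 'pretrained_weights' resolves correctly.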

Layout-to-Depth Generation 🎨

Single Image Example

You can quickly run layout-to-depth inference with either of the following commands:

python scripts/inference_layout2depth_demo0.py

python scripts/inference_layout2depth_demo1.py

Rendering Generated Scene with Various Models 🌈

Rendering with FLUX ✨

You can quickly run FLUX rendering with any of the following commands:

python scripts/inference_flux_rendering_sam_demo0.py  --width=768 --height=1024 --i2i=4 --use_sam_enhance

python scripts/inference_flux_rendering_sam_demo1.py  --use_sam_enhance --res=512 --i2i=4

python scripts/inference_flux_rendering_sam_demo2.py  --use_sam_enhance --res=768 --i2i=3
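
To run the three FLUX demos above in one go, the command lines can be assembled programmatically. This is only a sketch: `build_command` and `run_all` are hypothetical helpers, not part of the repository. The script paths and flag names are copied from the commands above; nothing is assumed about what the flags do.

```python
import subprocess
import sys


def build_command(script, **flags):
    """Build an argv list for a demo script.

    True becomes a bare switch (e.g. --use_sam_enhance), False or None is
    omitted, and any other value becomes --name=value.
    """
    argv = [sys.executable, script]
    for name, value in flags.items():
        if value is True:
            argv.append(f"--{name}")
        elif value is False or value is None:
            continue
        else:
            argv.append(f"--{name}={value}")
    return argv


# The three FLUX demo invocations listed above.
DEMOS = [
    build_command("scripts/inference_flux_rendering_sam_demo0.py",
                  width=768, height=1024, i2i=4, use_sam_enhance=True),
    build_command("scripts/inference_flux_rendering_sam_demo1.py",
                  use_sam_enhance=True, res=512, i2i=4),
    build_command("scripts/inference_flux_rendering_sam_demo2.py",
                  use_sam_enhance=True, res=768, i2i=3),
]


def run_all(demos=DEMOS):
    """Run each demo in order, stopping at the first one that fails."""
    for argv in demos:
        subprocess.run(argv, check=True)
```

Call `run_all()` from the repository root. Using `sys.executable` keeps the demos inside the currently active conda environment.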

Rendering with SD1.x 🖼️

You can quickly run SD1.x rendering with either of the following commands:

python scripts/inference_sd1_rendering_sam_demo0.py  --control_CN  --fft

Because the SD1.x model has limited generation capability, you can also try stronger base models from Civitai, such as CetusMix or RV60B1, to get better results.

python scripts/inference_sd1_rendering_sam_demo1.py  --control_CN  --fft

More demos are coming soon!

End-to-end Layout-to-Image Generation 📐

You can quickly run end-to-end layout-to-image generation with the following command:

python scripts/inference_layout2image_demo0.py --use_sam_enhance

Rendering Real Scene Depth Maps 📚

You can also use our method to render a scene depth map extracted from a real-world image:

python scripts/inference_flux_rendering_sam_demo3.py  --height=512 --width=768 --i2i=4 --use_sam_enhance

python scripts/inference_flux_rendering_sam_demo5.py  --height=768 --width=640 --i2i=2

Rendering with LoRA 📚

Rendering with the Miku LoRA:

python scripts/inference_flux_rendering_sam_demo4.py  --height=1024 --width=768 --i2i=2

Create with 3DIS GUI ⭐️

Use the following command to create a scene depth map with 3DIS GUI:

cd 3dis_gui
python layout2depth_app.py --port=3421

Use the following command to render the scene depth map with 3DIS GUI using FLUX:

cd 3dis_gui
python flux_rendering_app.py --port=3477
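
If a GUI fails to start, the chosen port may already be taken. The sketch below checks whether a port can be bound before launching; `port_is_free` is a hypothetical helper, and it assumes the apps listen on the local host, which this README does not confirm.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # Ports used by the two GUI commands above.
    for port in (3421, 3477):
        state = "free" if port_is_free(port) else "in use"
        print(f"Port {port}: {state}")
```

If a port is reported as in use, pass a different value to `--port`.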

Citation

If you find this repository useful, please cite it using the following BibTeX entries.

@article{zhou20243dis,
  title={3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation},
  author={Zhou, Dewei and Xie, Ji and Yang, Zongxin and Yang, Yi},
  journal={arXiv preprint arXiv:2410.12669},
  year={2024}
}

@article{zhou20253disflux,
  title={3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering},
  author={Zhou, Dewei and Xie, Ji and Yang, Zongxin and Yang, Yi},
  journal={arXiv preprint arXiv:2501.05131},
  year={2025}
}
