[Project Page] [3DIS Paper] [3DIS-FLUX Paper] [Huggingface Page]
- 2025-01-22: Our paper 3DIS has been accepted to ICLR 2025.
- 2025-01-27: We have released the code for rendering with SD1.x models to support a broader range of researchers.
- Code
- Pretrained weights
- 3DIS GUI
- More Demos
conda create -n 3DIS python=3.10 -y
conda activate 3DIS
pip install -r requirement.txt
pip install -e .
cd segment-anything-2
pip install -e . --no-deps
cd ..
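After installation, a quick import check can confirm the environment is set up (a minimal sketch; it only assumes the packages installed above):

```python
# Sanity check: verify that the core dependencies import cleanly.
import torch
from sam2.build_sam import build_sam2  # installed from segment-anything-2 above

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
```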
Step 1: Download the checkpoint of the fine-tuned Text-to-Depth model, unet_0901.ckpt.
Step 2: Download the checkpoint of the trained Layout-to-Depth Adapter, layout_adapter.ckpt.
You can also get our pretrained weights from Huggingface 🤗.
Step 3: Download the SAM2 checkpoint, sam2_hiera_large.pt.
Step 4: Put them under the `pretrained_weights` folder:
├── pretrained_weights
│   ├── unet_0901.ckpt
│   ├── layout_adapter.ckpt
│   ├── sam2_hiera_large.pt
├── threeDIS
│   ├── ...
├── scripts
│   ├── ...
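If you prefer to fetch the checkpoints programmatically, here is a hedged sketch using `huggingface_hub`; the 3DIS repo ID below is a placeholder (see our Huggingface page for the actual repository), while facebook/sam2-hiera-large is the public SAM2 release:

```python
from huggingface_hub import hf_hub_download

# NOTE: "<3DIS-repo-id>" is a placeholder, not the official repository name;
# replace it with the Hugging Face repo that actually hosts the 3DIS weights.
for repo_id, filename in [
    ("<3DIS-repo-id>", "unet_0901.ckpt"),
    ("<3DIS-repo-id>", "layout_adapter.ckpt"),
    ("facebook/sam2-hiera-large", "sam2_hiera_large.pt"),
]:
    hf_hub_download(repo_id=repo_id, filename=filename, local_dir="pretrained_weights")
```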
You can quickly run inference for layout-to-depth generation using the following commands:
python scripts/inference_layout2depth_demo0.py
python scripts/inference_layout2depth_demo1.py
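These scripts take a layout as input and produce a scene depth map, which the rendering commands in the following sections turn into the final image.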
You can quickly run inference for FLUX rendering using the following commands:
python scripts/inference_flux_rendering_sam_demo0.py --width=768 --height=1024 --i2i=4 --use_sam_enhance
python scripts/inference_flux_rendering_sam_demo1.py --use_sam_enhance --res=512 --i2i=4
python scripts/inference_flux_rendering_sam_demo2.py --use_sam_enhance --res=768 --i2i=3
You can quickly run inference for SD1.x rendering using the following commands:
python scripts/inference_sd1_rendering_sam_demo0.py --control_CN --fft
Due to the limited generation capability of the SD1.x base model, you can also try stronger community checkpoints from Civitai, such as CetusMix or RV60B1, for better results (see the loading sketch after the next command).
python scripts/inference_sd1_rendering_sam_demo1.py --control_CN --fft
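For reference, a single-file Civitai checkpoint can typically be loaded into a diffusers SD1.x pipeline as sketched below; this is illustrative only (the rendering scripts may use a different loading path, and the file name is a placeholder):

```python
import torch
from diffusers import StableDiffusionPipeline

# NOTE: "cetusMix.safetensors" is a placeholder; download the actual
# checkpoint from Civitai first.
pipe = StableDiffusionPipeline.from_single_file(
    "cetusMix.safetensors", torch_dtype=torch.float16
).to("cuda")
image = pipe("a photo of a cat").images[0]
```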
More demos are coming soon!
You can quickly run inference for end-to-end layout-to-image generation using the following command:
python scripts/inference_layout2image_demo0.py --use_sam_enhance
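This chains the two stages above, layout-to-depth generation followed by rendering, in a single run.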
You can also apply our method to render a scene depth map extracted from a real-world image:
python scripts/inference_flux_rendering_sam_demo3.py --height=512 --width=768 --i2i=4 --use_sam_enhance
python scripts/inference_flux_rendering_sam_demo5.py --height=768 --width=640 --i2i=2
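The depth map itself can come from any monocular depth estimator. As an illustration (not necessarily the extractor used by the demo scripts; the model ID is one public option), here is a sketch using the transformers depth-estimation pipeline:

```python
from PIL import Image
from transformers import pipeline

# NOTE: this model ID is one publicly available depth estimator, chosen
# for illustration; the demo scripts may use a different extractor.
depth_estimator = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")
result = depth_estimator(Image.open("real_world_image.jpg"))
result["depth"].save("scene_depth_map.png")  # predicted depth as a PIL image
```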
Rendering with the Miku LoRA:
python scripts/inference_flux_rendering_sam_demo4.py --height=1024 --width=768 --i2i=2
Use the following commands to create a scene depth map with the 3DIS GUI:
cd 3dis_gui
python layout2depth_app.py --port=3421
Use the following commands to render the scene depth map with the 3DIS GUI using FLUX:
cd 3dis_gui
python flux_rendering_app.py --port=3477
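After launching either app, open the corresponding local URL in your browser (e.g., http://localhost:3421 for the layout-to-depth app or http://localhost:3477 for the FLUX rendering app, matching the --port argument).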
If you find this repository useful, please cite our work using the following BibTeX entries:
@article{zhou20243dis,
  title={{3DIS}: Depth-driven decoupled instance synthesis for text-to-image generation},
  author={Zhou, Dewei and Xie, Ji and Yang, Zongxin and Yang, Yi},
  journal={arXiv preprint arXiv:2410.12669},
  year={2024}
}

@article{zhou20253disflux,
  title={{3DIS-FLUX}: Simple and efficient multi-instance generation with {DiT} rendering},
  author={Zhou, Dewei and Xie, Ji and Yang, Zongxin and Yang, Yi},
  journal={arXiv preprint arXiv:2501.05131},
  year={2025}
}