SketchVideo: Sketch-based Video Generation and Editing


CVPR 2025

🚀 Introduction

We propose SketchVideo, which aims to achieve sketch-based spatial and motion control for video generation and to support fine-grained editing of real or synthetic videos. Please see our project page and paper for more details.

1. Sketch-based Video Generation

1.1 One Keyframe Input Sketch

Input frame Generated video Input frame Generated video

1.2 Two Keyframe Input Sketches

Input frame Generated video Input frame Generated video

2. Sketch-based Video Editing

2.1 One Keyframe Input Sketch

Input sketch Original video Generated video

2.2 Two Keyframe Input Sketches

Input sketch 1 Input sketch 2 Original video Generated video

📝 Changelog

  • [2025.04.01]: 🔥🔥 Released code and model weights.
  • [2025.03.30]: Launched the project page and updated the arXiv preprint.

🧰 Models

| Model | Resolution | GPU Mem. & Inference Time (A100, DDIM 50 steps) | Checkpoint |
|------------|---------|----------------|--------------|
| SketchGen  | 720x480 | ~21 GB & 95 s  | Hugging Face |
| SketchEdit | 720x480 | ~23 GB & 230 s | Hugging Face |

Our method is built on the pretrained CogVideo-2b model. We add an additional sketch conditional network for sketch-based generation and editing.

Currently, SketchVideo supports generating videos of up to 49 frames at a resolution of 720x480. For generation, input sketches are assumed to be 720x480; for editing, the input video is assumed to have 49 frames at 720x480.

The inference time can be reduced by using fewer DDIM steps.
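As a rough guide, if inference time is assumed to scale approximately linearly with the number of DDIM steps (an approximation that ignores fixed overhead such as VAE decoding), the benchmarked times in the table above can be extrapolated:

```python
# Rough estimate of inference time vs. DDIM step count, assuming near-linear
# scaling (an approximation, not a measured benchmark).
def estimate_time(baseline_seconds: float, baseline_steps: int, steps: int) -> float:
    """Scale a benchmarked time (e.g. ~95 s at 50 steps for SketchGen on an A100)."""
    return baseline_seconds * steps / baseline_steps

# Halving the steps for SketchGen would cut the time roughly in half:
print(estimate_time(95, 50, 25))  # -> 47.5
```

Whether quality holds up at lower step counts depends on the content, so it is worth comparing a few step settings on the same seed.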

⚙️ Setup

Install Environment via Anaconda (Recommended)

conda create -n sketchvideo python=3.10
conda activate sketchvideo
pip install -r requirements.txt

Notably, diffusers==0.30.1 is required.

💫 Inference

1. Sketch-based Video Generation

Download the pretrained SketchGen network [hugging face] and the pretrained CogVideo-2b [hugging face] video generation model. Then, modify --control_checkpoint_path and --cogvideo_checkpoint_path in the scripts to point to the corresponding paths.
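One possible local layout for the downloaded weights is sketched below; the SketchGen repo id is a placeholder (use the Hugging Face link in the table above), and the download commands are shown commented out:

```shell
# Sketch of a local checkpoint layout; adjust paths and repo ids as needed.
mkdir -p checkpoints

# Base video model:
# huggingface-cli download THUDM/CogVideoX-2b --local-dir checkpoints/CogVideoX-2b

# SketchGen control network (placeholder repo id -- see the table above):
# huggingface-cli download <sketchgen-repo-id> --local-dir checkpoints/SketchGen

# Then point the flags in generation/scripts/*.sh at these folders, e.g.
#   --control_checkpoint_path checkpoints/SketchGen
#   --cogvideo_checkpoint_path checkpoints/CogVideoX-2b
ls -d checkpoints
```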

Generate a video from a single keyframe sketch:

cd generation
sh scripts/test_sketch_gen_single.sh

Generate a video from two keyframe sketches:

cd generation
sh scripts/test_sketch_gen_two.sh

2. Sketch-based Video Editing

Download the pretrained SketchEdit network [hugging face] and the pretrained CogVideo-2b [hugging face] video generation model. Then, for each editing example, modify config.py in the editing/editing_exp folder: set controlnet_path to the SketchEdit weights path, and set vae_path and pipeline_path to the CogVideo weights path.
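The edited config.py entries would then look roughly like this (the paths below are placeholders for your own download locations, and the surrounding config may contain additional variables):

```python
# config.py (sketch) -- paths are placeholders, adjust to where you
# downloaded the weights.
controlnet_path = "checkpoints/SketchEdit"    # SketchEdit weights
vae_path = "checkpoints/CogVideoX-2b"         # CogVideo weights
pipeline_path = "checkpoints/CogVideoX-2b"    # CogVideo weights
```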

Edit a video based on keyframe sketches:

cd editing
sh scripts/test_sketch_edit.sh

The script covers editing examples based on one or two keyframe sketches.

😉 Citation

Please consider citing our paper if you find our code useful:

@inproceedings{Liu2025sketchvideo,
  author    = {Liu, Feng-Lin and Fu, Hongbo and Wang, Xintao and Ye, Weicai and Wan, Pengfei and Zhang, Di and Gao, Lin},
  title     = {SketchVideo: Sketch-based Video Generation and Editing},
  booktitle = {{IEEE/CVF} Conference on Computer Vision and Pattern Recognition},
  publisher = {{IEEE}},
  year      = {2025},
}

🙏 Acknowledgements

We thank the projects behind the video generation model CogVideoX and ControlNet. Our code introduction is adapted from the ToonCrafter template.

📢 Disclaimer

Our framework achieves interesting sketch-based video generation and editing, but due to the variability of the generative video prior, success is not guaranteed. Trying different random seeds can help obtain the best results.

This project strives to impact the domain of AI-driven video generation positively. Users are granted the freedom to create videos using this tool, but they are expected to comply with local laws and utilize it responsibly. The developers do not assume any responsibility for potential misuse by users.