SketchVideo: Sketch-based Video Generation and Editing


CVPR 2025

🚀 Introduction

We propose SketchVideo, which aims to achieve sketch-based spatial and motion control for video generation and to support fine-grained editing of real or synthetic videos. Please see our project page and paper for more details.

1. Sketch-based Video Generation

1.1 One Keyframe Input Sketch

Input frame Generated video Input frame Generated video

1.2 Two Keyframe Input Sketches

Input frame Generated video Input frame Generated video

2. Sketch-based Video Editing

2.1 One Keyframe Input Sketch

Input sketch Original video Generated video

2.2 Two Keyframe Input Sketches

Input sketch 1 Input sketch 2 Original video Generated video

📝 Changelog

  • [2025.04.01]: 🔥🔥 Released code and model weights.
  • [2025.03.30]: Launched the project page and updated the arXiv preprint.

🧰 Models

| Model | Resolution | GPU Mem. & Inference Time (A100, DDIM 50 steps) | Checkpoint |
|------------|---------|----------------|--------------|
| SketchGen  | 720x480 | ~21 GB & 95 s  | Hugging Face |
| SketchEdit | 720x480 | ~23 GB & 230 s | Hugging Face |

Our method is built on the pretrained CogVideo-2b model. We add an additional sketch conditional network for sketch-based generation and editing.

Currently, SketchVideo supports generating videos of up to 49 frames at a resolution of 720x480. For generation, input sketches are assumed to be 720x480; for editing, the input video is assumed to have 49 frames at 720x480.

The inference time can be reduced by using fewer DDIM steps.
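As a rough guide, if inference time is assumed to scale approximately linearly with the number of DDIM steps (an approximation that ignores fixed overhead such as VAE decoding), the benchmarked times in the table above can be extrapolated:

```python
# Rough estimate of inference time vs. DDIM step count, assuming near-linear
# scaling (an approximation, not a measured benchmark).
def estimate_time(baseline_seconds: float, baseline_steps: int, steps: int) -> float:
    """Scale a benchmarked time (e.g. ~95 s at 50 steps for SketchGen on an A100)."""
    return baseline_seconds * steps / baseline_steps

# Halving the steps for SketchGen would cut the time roughly in half:
print(estimate_time(95, 50, 25))  # -> 47.5
```

Whether quality holds up at lower step counts depends on the content, so it is worth comparing a few step settings on the same seed.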

⚙️ Setup

Install Environment via Anaconda (Recommended)

conda create -n sketchvideo python=3.10
conda activate sketchvideo
pip install -r requirements.txt

Notably, diffusers==0.30.1 is required.

💫 Inference

1. Sketch-based Video Generation

Download the pretrained SketchGen network [hugging face] and the pretrained CogVideo-2b [hugging face] video generation model. Then, modify --control_checkpoint_path and --cogvideo_checkpoint_path in the scripts to point to the corresponding paths.
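One possible local layout for the downloaded weights is sketched below; the SketchGen repo id is a placeholder (use the Hugging Face link in the table above), and the download commands are shown commented out:

```shell
# Sketch of a local checkpoint layout; adjust paths and repo ids as needed.
mkdir -p checkpoints

# Base video model:
# huggingface-cli download THUDM/CogVideoX-2b --local-dir checkpoints/CogVideoX-2b

# SketchGen control network (placeholder repo id -- see the table above):
# huggingface-cli download <sketchgen-repo-id> --local-dir checkpoints/SketchGen

# Then point the flags in generation/scripts/*.sh at these folders, e.g.
#   --control_checkpoint_path checkpoints/SketchGen
#   --cogvideo_checkpoint_path checkpoints/CogVideoX-2b
ls -d checkpoints
```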

Generate a video from a single keyframe sketch:

cd generation
sh scripts/test_sketch_gen_single.sh

Generate a video from two keyframe sketches:

cd generation
sh scripts/test_sketch_gen_two.sh

2. Sketch-based Video Editing

Download the pretrained SketchEdit network [hugging face] and the pretrained CogVideo-2b [hugging face] video generation model. Then, for each editing example, modify config.py in the editing/editing_exp folder: set controlnet_path to the SketchEdit weights path, and set vae_path and pipeline_path to the CogVideo weights path.
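The edited config.py entries would then look roughly like this (the paths below are placeholders for your own download locations, and the surrounding config may contain additional variables):

```python
# config.py (sketch) -- paths are placeholders, adjust to where you
# downloaded the weights.
controlnet_path = "checkpoints/SketchEdit"    # SketchEdit weights
vae_path = "checkpoints/CogVideoX-2b"         # CogVideo weights
pipeline_path = "checkpoints/CogVideoX-2b"    # CogVideo weights
```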

Edit a video based on keyframe sketches:

cd editing
sh scripts/test_sketch_edit.sh

The script covers editing examples based on one or two keyframe sketches.

😉 Citation

Please consider citing our paper if you find our code useful:

@inproceedings{Liu2025sketchvideo,
  author    = {Liu, Feng-Lin and Fu, Hongbo and Wang, Xintao and Ye, Weicai and Wan, Pengfei and Zhang, Di and Gao, Lin},
  title     = {SketchVideo: Sketch-based Video Generation and Editing},
  booktitle = {{IEEE/CVF} Conference on Computer Vision and Pattern Recognition},
  publisher = {{IEEE}},
  year      = {2025},
}

🙏 Acknowledgements

We thank the projects behind the video generation model CogVideoX and ControlNet. Our code introduction is adapted from the ToonCrafter template.

📢 Disclaimer

Our framework achieves interesting sketch-based video generation and editing, but due to the variability of the generative video prior, success is not guaranteed. Trying different random seeds can help obtain the best results.

This project strives to impact the domain of AI-driven video generation positively. Users are granted the freedom to create videos using this tool, but they are expected to comply with local laws and utilize it responsibly. The developers do not assume any responsibility for potential misuse by users.