Wenhao Sun,
Rong-Cheng Tu,
Jingyi Liao,
Dacheng Tao
Nanyang Technological University
Teaser video: teaser.mp4
If you find this repository helpful, please consider citing our paper:
```bibtex
@article{sun2024v2vsurvey,
  author  = {Wenhao Sun and Rong-Cheng Tu and Jingyi Liao and Dacheng Tao},
  title   = {Diffusion Model-Based Video Editing: A Survey},
  journal = {CoRR},
  volume  = {abs/2407.07111},
  year    = {2024}
}
```
Overview of diffusion-based video editing model components.
- Network and Training Paradigm | network and data modifications
- Attention Feature Injection | a class of training-free techniques (see the sketch after this list)
- Diffusion Latents Manipulation | directly manipulating the diffusion process
- Canonical Representation | efficient video representations
- Novel Conditioning | conditional video editing
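To make the "Attention Feature Injection" entry above more concrete, here is a minimal, self-contained PyTorch sketch of the general idea: self-attention keys/values are recorded while reconstructing (inverting) the source frames with a frozen diffusion model, then injected when denoising the edited frames so the edit inherits the source layout. This is an illustrative sketch rather than any specific method from the list; the class name `CachedSelfAttention`, the `mode` flag, and all tensor shapes are assumptions made for the example.

```python
# Minimal sketch of training-free attention feature injection (requires PyTorch >= 2.0).
# Illustrative only; not taken from any specific paper listed in this repository.
import torch
import torch.nn.functional as F
from torch import nn


class CachedSelfAttention(nn.Module):
    """Self-attention that records K/V on a source ("invert") pass and
    reuses them on a later editing ("edit") pass."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_out = nn.Linear(dim, dim)
        self.cache = {}          # timestep -> (k, v) recorded on the source pass
        self.mode = "invert"     # "invert": record features, "edit": inject them

    def forward(self, x: torch.Tensor, timestep: int) -> torch.Tensor:
        b, n, d = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        if self.mode == "invert":
            self.cache[timestep] = (k.detach(), v.detach())
        elif self.mode == "edit" and timestep in self.cache:
            # Inject source keys/values so the edited frame attends to source
            # content, which is what keeps layout consistent with the input video.
            k, v = self.cache[timestep]

        def split(t):  # (b, n, d) -> (b, heads, n, d // heads)
            return t.reshape(b, -1, self.heads, d // self.heads).transpose(1, 2)

        out = F.scaled_dot_product_attention(split(q), split(k), split(v))
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)


if __name__ == "__main__":
    attn = CachedSelfAttention(dim=64)
    src = torch.randn(1, 16, 64)   # stand-in for source-frame latents at one timestep
    edit = torch.randn(1, 16, 64)  # stand-in for latents of the frame being edited

    attn.mode = "invert"           # 1) inversion/reconstruction pass: record K/V
    _ = attn(src, timestep=10)

    attn.mode = "edit"             # 2) editing pass: reuse the recorded K/V
    out = attn(edit, timestep=10)
    print(out.shape)               # torch.Size([1, 16, 64])
```

In practice this record-and-inject pattern is applied inside the attention layers of a pretrained text-to-image UNet, typically together with an inversion of the source video (the "Diffusion Latents Manipulation" entry), rather than in a standalone module as shown here.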
> [!TIP]
> The papers are listed in reverse chronological order, formatted as follows: (Conference/Journal Year) Title, Authors
- (Preprint 24') VIA: Unified Spatiotemporal Video Adaptation Framework for Global and Local Video Editing, Gu et al.
- (ICML 24') Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices, Cohen et al.
- (ECCV 24') Video Editing via Factorized Diffusion Distillation, Singer et al.
- (CVPR 24') MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers, Ma et al.
- (Preprint 23') Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis, Wu et al.
- (CVPR 24') VidToMe: Video Token Merging for Zero-Shot Video Editing, Li et al.
- (CVPR 24') SimDA: Simple Diffusion Adapter for Efficient Video Generation, Xing et al.
- (NeurIPS 23') Towards Consistent Video Editing with Text-to-Image Diffusion Models, Zhang et al.
- (ICCV 23') Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation, Wu et al.
- (Preprint 24') EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing, Yang et al.
- (IJCAI 24') Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models, Duan et al.
- (CVPR 24') FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis, Liang et al.
- (Preprint 23') Motion-Conditioned Image Animation for Video Editing, Yan et al.
- (CVPR 24') LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation, Wu et al.
- (ICLR 24') Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models, Jeong and Ye
- (Preprint 23') CCEdit: Creative and Controllable Video Editing via Diffusion Models, Feng et al.
- (Preprint 23') MagicEdit: High-Fidelity and Temporally Coherent Video Editing, Liew et al.
- (Preprint 23') VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet, Hu and Xu
- (NeurIPS 23') VideoComposer: Compositional Video Synthesis with Motion Controllability, Wang et al.
- (Preprint 23') Structure and Content-Guided Video Synthesis with Diffusion Models, Esser et al.
- (Preprint 24') Generative Video Propagation, Liu et al.
- (Preprint 24') Movie Gen: A Cast of Media Foundation Models, Polyak et al.
- (Preprint 24') Still-Moving: Customized Video Generation without Customized Video Data, Chefer et al.
- (Preprint 24') EffiVED: Efficient Video Editing via Text-instruction Diffusion Models, Zhang et al.
- (ECCV 24') Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models, Ren et al.
- (Preprint 24') VASE: Object-Centric Appearance and Shape Manipulation of Real Videos, Peruzzo et al.
- (Preprint 23') Customizing Motion in Text-to-Video Diffusion Models, Materzynska et al.
- (ECCV 24') SAVE: Protagonist Diversification with Structure Agnostic Video Editing, Song et al.
- (CVPR 24') VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models, Jeong et al.
- (CVPR 24') DreamVideo: Composing Your Dream Videos with Customized Subject and Motion, Wei et al.
- (ICLR 24') Consistent Video-to-Video Transfer Using Synthetic Dataset, Cheng et al.
- (Preprint 23') VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models, Xing et al.
- (ECCV 24') MotionDirector: Motion Customization of Text-to-Video Diffusion Models, Zhao et al.
- (ICME 24') InstructVid2Vid: Controllable Video Editing with Natural Language Instructions, Qin et al.
- (Preprint 23') Dreamix: Video Diffusion Models are General Video Editors, Molad et al.
- (TMLR 24') AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks, Ku et al.
- (ECCV 24') Object-Centric Diffusion: Efficient Video Editing, Kahatapitiya et al.
- (Preprint 24') UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing, Bai et al.
- (Preprint 23') Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts, Zhao et al.
- (Preprint 23') Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models, Wang et al.
- (ICCV 23') FateZero: Fusing Attentions for Zero-shot Text-based Video Editing, Qi et al.
- (ACML 23') Edit-A-Video: Single Video Editing with Object-Aware Consistency, Shin et al.
- (Preprint 23') Video-P2P: Video Editing with Cross-attention Control, Liu et al.
- (CVPR 24') FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation, Yang et al.
- (ICLR 24') FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing, Cong et al.
- (ICLR 24') TokenFlow: Consistent Diffusion Features for Consistent Video Editing, Geyer et al.
- (CVPR 24') A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing, Li et al.
- (Preprint 23') Video ControlNet: Towards Temporally Consistent Synthetic-to-Real Video Translation Using Conditional Image Diffusion Models, Chu et al.
- (Preprint 23') Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models, Chen et al.
- (ICCV 23') Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators, Khachatryan et al.
- (ICML 24') FRAG: Frequency Adapting Group for Diffusion Video Editing, Yoon et al.
- (CVPR 24') GenVideo: One-shot target-image and shape-aware video editing using T2I diffusion models, Harsha et al.
- (Preprint 24') MotionClone: Training-Free Motion Cloning for Controllable Video Generation, Ling et al.
- (CVPR 24') RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models, Kara et al.
- (CVPR 24') Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer, Yatim et al.
- (Preprint 23') DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis, Duan et al.
- (SIGGRAPH Asia 23') Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation, Yang et al.
- (ICLR 23') ControlVideo: Training-free Controllable Text-to-Video Generation, Zhang et al.
- (ICCV 23') Pix2Video: Video Editing using Image Diffusion, Ceylan et al.
- (Preprint 23') Neural Video Fields Editing, Yang et al.
- (Preprint 23') DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing, Chang et al.
- (ICCV 23') StableVideo: Text-driven Consistency-aware Diffusion Video Editing, Chai et al.
- (CVPR 24') CoDeF: Content Deformation Fields for Temporally Consistent Video Processing, Ouyang et al.
- (TMLR 24') VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing, Couairon et al.
- (CVPR 23') Shape-aware Text-driven Layered Video Editing, Lee et al.
- (SIGGRAPH 24') MotionCtrl: A Unified and Flexible Motion Controller for Video Generation, Wang et al.
- (Preprint 23') Drag-A-Video: Non-rigid Video Editing with Point-based Interaction, Teng et al.
- (ECCV 24') DragVideo: Interactive Drag-style Video Editing, Deng et al.
- (CVPR 24') VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence, Gu et al.
- (Preprint 24') StableAnimator: High-Quality Identity-Preserving Human Image Animation, Tu et al.
- (SIGGRAPH Asia 24') Fashion-VDM: Video Diffusion Model for Virtual Try-On, Karras et al.
- (Preprint 24') Animate-X: Universal Character Image Animation with Enhanced Motion Representation, Tan et al.
- (SCIS 24') UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation, Wang et al.
- (Preprint 24') Zero-shot High-fidelity and Pose-controllable Character Animation, Zhu et al.
- (Preprint 23') Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation, Hu et al.
- (Preprint 23') MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model, Xu et al.
- (ICML 24') MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion, Chang et al.
- (CVPR 24') DisCo: Disentangled Control for Realistic Human Dance Generation, Wang et al.
- (ICCV 23') DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion, Karras et al.
- (AAAI 24') Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos, Ma et al.
- [28 Nov 2024] Updated the list format to enhance clarity.