Hong Kong University of Science and Technology, Hong Kong
Stars
[Arxiv'25] BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing
Wan: Open and Advanced Large-Scale Video Generative Models
A framework to convert any 2D video to immersive stereoscopic 3D
[CVPR 2025] Official code of "DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation"
Deep Sketch-guided Cartoon Video Inbetweening
HunyuanVideo: A Systematic Framework For Large Video Generative Models
Awesome diffusion Video-to-Video (V2V): a collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation, along with video editing benchmark code.
A curated list of recent diffusion models for video generation, editing, and various other applications.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Awesome Lists for Tenure-Track Assistant Professors and PhD students. (A survival guide for assistant professors and PhD students)
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
[NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models
Stable Video Diffusion Training Code and Extensions.
Generative Models by Stability AI
[ECCV 2024] HiFi-123: Towards High-fidelity One Image to 3D Content Generation
A unified framework for 3D content generation.
[ICCV 2023] Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior
Official code release for the ICCV 2023 paper "AG3D: Learning to Generate 3D Avatars from 2D Image Collections"
Inference code for "DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars"
Hosts the Multiface dataset, which is a multi-view dataset of multiple identities performing a sequence of facial expressions.
Collecting papers about novel view synthesis
This is the code for Deformable Neural Radiance Fields, a.k.a. Nerfies.