How difficult would it be to support I2V training?
Feature request
How difficult would it be to support I2V training?
I am new to video diffusion transformers; I am more of a generalist LLM engineer with application experience.
I suspect it would be as simple as:

1. Encode the input video; the encoded latent is the target.
2. The sample starts as a copy of the target latent.
3. Mask out everything other than the first frame.
4. Sample noise for the remaining frames.
5. Append a binary mask to the latent (e.g., as an extra channel) so the model knows which frames are conditioning.
6. The diffusion transformer predicts the noise to be removed from the sample latent, given the first frame, supervised against the target.

A rough sketch of this follows.
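To make this concrete, here is a minimal sketch of the batch preparation I am imagining. Everything in it is an assumption on my part, not this repo's actual I2V scheme: the `(B, F, C, H, W)` latent layout, the channel-wise concat, and the `prepare_i2v_batch` name are hypothetical; `scheduler.add_noise` refers to a diffusers-style scheduler.

```python
import torch
import torch.nn.functional as F


def prepare_i2v_batch(video_latents, scheduler, timesteps):
    """Hypothetical sketch of I2V input prep.

    video_latents: clean VAE latents, shape (B, F, C, H, W).
    scheduler: a diffusers-style scheduler exposing add_noise().
    timesteps: sampled diffusion timesteps, shape (B,).
    """
    b, f, c, h, w = video_latents.shape

    # Conditioning latent: a copy of the clean latent with every frame
    # except the first zeroed out (step 3 above).
    cond_latents = video_latents.clone()
    cond_latents[:, 1:] = 0.0

    # Binary mask channel: 1 on the conditioning (first) frame, 0 elsewhere,
    # so the model can tell conditioning content from noised content (step 5).
    mask = torch.zeros(
        b, f, 1, h, w, device=video_latents.device, dtype=video_latents.dtype
    )
    mask[:, 0] = 1.0

    # Noisy sample (step 4). This sketch noises all frames and lets the model
    # recover the first frame from the concatenated condition channels;
    # alternatively, the first frame could be left un-noised in the sample.
    noise = torch.randn_like(video_latents)
    noisy_latents = scheduler.add_noise(video_latents, noise, timesteps)

    # Channel-wise concat: [noisy sample | first-frame condition | mask],
    # giving the transformer 2C + 1 input channels.
    model_input = torch.cat([noisy_latents, cond_latents, mask], dim=2)
    return model_input, noise


# Usage (toy shapes; `transformer` and `text_embeds` are placeholders):
# scheduler = diffusers.DDPMScheduler()
# latents = torch.randn(2, 13, 16, 60, 90)
# t = torch.randint(0, scheduler.config.num_train_timesteps, (2,))
# model_input, noise = prepare_i2v_batch(latents, scheduler, t)
# loss = F.mse_loss(transformer(model_input, t, text_embeds), noise)
```

Whether the first frame stays clean in the noisy sample, or is noised and only recovered through the concatenated condition channels, looks like a design choice to me; the sketch does the latter.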
Does this sound correct?
Or do I have this wrong, @sayakpaul @a-r-r-o-w?
Motivation
I want to make a PR for this.
Your contribution
None as of yet; still having conversations.