FR: TAE for SVD #14

Ednaordinary · 2024-04-07T21:22:10Z

SVD can now run really fast step-wise, but it is limited by the slow speed of the VAE. Is there any way to distill the temporal-spatial autoencoder the same way as the regular autoencoder?

madebyollin · 2024-04-08T02:35:29Z

Definitely possible! I worry there's a fairly narrow band of usefulness for a TAESVD, though, since for cheap previews you can run TAESD per-frame and for max quality you should just run the SVD VAE.

(I started training a TAESVD with temporal layers a few months ago - top is GT, middle is per-frame TAESD, and bottom is TAESD with temporal layers - but I haven't gotten around to finishing it)
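As a rough illustration of the "TAESD per-frame" preview path mentioned above, here is a minimal sketch that treats a video's latents as a flat list of independent frames and decodes them in small chunks. `decode_per_frame`, `decode_batch`, and `chunk_size` are hypothetical names used only for illustration. `decode_batch` stands in for whatever single-image decoder is used and is not a real TAESD API.

```python
from typing import Callable, List, Sequence, TypeVar

LatentT = TypeVar("LatentT")
FrameT = TypeVar("FrameT")


def decode_per_frame(
    latents: Sequence[LatentT],
    decode_batch: Callable[[Sequence[LatentT]], List[FrameT]],
    chunk_size: int = 4,
) -> List[FrameT]:
    """Decode each frame independently, a few frames at a time.

    No information is shared between frames, so this is cheap but can
    flicker; `decode_batch` is a hypothetical stand-in for an image decoder.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    frames: List[FrameT] = []
    for start in range(0, len(latents), chunk_size):
        # Each chunk is decoded on its own, which bounds peak memory.
        frames.extend(decode_batch(latents[start:start + chunk_size]))
    return frames


# Toy stand-in decoder (assumption): records chunk sizes and labels each latent.
chunk_sizes: List[int] = []


def toy_decode_batch(chunk: Sequence[int]) -> List[str]:
    chunk_sizes.append(len(chunk))
    return [f"frame({z})" for z in chunk]


preview = decode_per_frame(list(range(5)), toy_decode_batch, chunk_size=2)
```

Because every frame is decoded in isolation, nothing in this path enforces consistency between neighboring frames, which is the gap temporal layers are meant to close.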

Ednaordinary · 2024-04-08T05:49:26Z

Since each frame after the first should only be a difference from the previous frame (not an entirely new frame), would it be viable to decode the first frame with the original SVD VAE (which should take about the same memory and time as the undistilled regular SD VAE in this context), then decode the rest with a VAE trained specifically on the difference between the current and previous frames' latents, outputting a difference map to apply to the last decoded frame? Kind of like a video codec. It does move away from the simplicity of decoding normally, though.
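To make the idea concrete, here is a minimal sketch of the control flow only, using plain Python lists in place of tensors. `decode_with_deltas`, `decode_keyframe`, and `decode_delta` are hypothetical names, and the toy decoders are linear stand-ins chosen so the result is easy to check. No trained difference decoder is assumed to exist.

```python
from typing import Callable, List, Sequence

Latent = List[float]
Frame = List[float]


def decode_with_deltas(
    latents: Sequence[Latent],
    decode_keyframe: Callable[[Latent], Frame],
    decode_delta: Callable[[Latent, Latent], Frame],
) -> List[Frame]:
    """Fully decode frame 0, then add one predicted difference map per later frame."""
    if not latents:
        return []
    frames = [decode_keyframe(latents[0])]
    for prev_latent, cur_latent in zip(latents, latents[1:]):
        # Hypothetical difference decoder: sees both latents, returns a pixel delta.
        delta = decode_delta(prev_latent, cur_latent)
        # Apply the difference map to the last decoded frame.
        frames.append([p + d for p, d in zip(frames[-1], delta)])
    return frames


# Toy stand-ins (assumption): a linear "decoder" that doubles every value.
def toy_keyframe_decoder(latent: Latent) -> Frame:
    return [2.0 * x for x in latent]


def toy_delta_decoder(prev_latent: Latent, cur_latent: Latent) -> Frame:
    return [2.0 * (c - p) for p, c in zip(prev_latent, cur_latent)]


latents = [[0.0, 1.0], [0.5, 1.0], [1.0, 2.0]]
frames = decode_with_deltas(latents, toy_keyframe_decoder, toy_delta_decoder)
```

With a linear toy decoder the accumulated result matches decoding every frame directly. With a real, nonlinear decoder, each frame is built on the previous output, so any error in a difference map would carry forward into later frames.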

weleen · 2024-09-24T04:00:54Z

> Definitely possible! I worry there's a fairly narrow band of usefulness for a TAESVD, though, since for cheap previews you can run TAESD per-frame and for max quality you should just run the SVD VAE.
>
> (I started training a TAESVD with temporal layers a few months ago - top is GT, middle is per-frame TAESD, and bottom is TAESD with temporal layers - but I haven't gotten around to finishing it)

Hi @madebyollin, thanks for your work! I am curious to know if there are any updates regarding the TAESVD model.

> Since each frame after the first should only be a difference from the previous frame (not an entirely new frame), is it viable to first decode the first frame with the original SVD vae (should only take about the same memory/speed as undistilled regular SD vae in this context) then decode the rest on a vae that is specifically trained on the difference between the current and previous frame in the latents, outputting a difference map to be applied to the last decoded frame? Kinda like a video codec. Moves away from the simplicity of decoding normally though

Hi @Ednaordinary, did you find any alternative approaches or repos to speed up the decoding process for SVD?

madebyollin · 2024-09-25T15:22:09Z

I've uploaded my initial TAESDV checkpoint + code to https://github.com/madebyollin/taesdv. It's still a bit WIP (see the TODOs in the README) but it should be capable of decoding much smoother videos than single-frame TAESD (while still being really fast).
