0. Article Information and Links
1. What do the authors try to accomplish?
Unsupervised discovery of semantic latent space directions.
![ezgif-4-be493b3e769e](https://user-images.githubusercontent.com/8121216/88115744-ba8d8f00-cb6b-11ea-80ea-db1e4635674e.gif)
2. What's great compared to previous research?
Unsupervised beats supervised here because 1) no labels are needed, and 2) you can discover more directions than labels alone would allow.
GANSpace's PCA approach requires sampling latent codes and computing PCA on the samples. This takes about 1.5 hours on BigGAN and about 2 minutes on StyleGAN.
Recap: PCA works by finding the most important dimensions of data points in a high-dimensional space. It requires many data points to be effective.
SeFa, however, uses a closed form and requires no PCA computation; it computes only on the weight values of the latent code's linear projection. This runs very fast, in under 1 second.
The authors show that we only need to examine the trained weights of the first linear projection to find semantic changes in the output image (Equation 3).
Quantitative and qualitative results show SeFa's directions to be more disentangled than GANSpace's.
3. Where are the key elements of the technology and method?
2.2 Unsupervised Semantic Factorization
"our goal turns into finding the directions n that can cause the
significant change of y."
"we make an assumption that a large change of [the latent code's linear projection,] y, will lead to a large content variation of [the output image]"
Equation 3: a change in the projection space, Δy, equals αAn. Since y = Az + b, editing the code to z + αn gives Δy = A(z + αn) + b - (Az + b) = αAn; the bias cancels, so only the weight A matters.
![image](https://user-images.githubusercontent.com/8121216/88111918-ae053880-cb63-11ea-8f3f-2322ff1cf4bb.png)
Equation 4: to get the largest change, frame it as a constrained optimization problem.
![image](https://user-images.githubusercontent.com/8121216/88113639-0984f580-cb67-11ea-9743-b535c051af45.png)
The constraint is that nᵀn = 1, i.e. unit vectors only. We maximize the squared L2 norm ||An||₂², akin to a distance. Why the constraint? Because taking the "max" over an infinite space is meaningless unless we establish a boundary.
Equation 5: the top-k version of Equation 4, for the k most meaningful directions.
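For reference, a LaTeX reconstruction of the two objectives (Equation 5 has no screenshot here, so this follows my reading of the paper, using the notes' notation):

```latex
% Eq. 4: the single unit-norm direction that changes the projection most
n^{*} = \operatorname*{argmax}_{\{ n \,:\, n^{T} n = 1 \}} \lVert A n \rVert_2^{2}

% Eq. 5: top-k version, summing over k unit-norm directions
N^{*} = \operatorname*{argmax}_{\{ n_i \,:\, n_i^{T} n_i = 1 \}} \sum_{i=1}^{k} \lVert A n_i \rVert_2^{2}
```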
Equation 6: the constrained optimization is easily solved with the Lagrangian.
![image](https://user-images.githubusercontent.com/8121216/88113872-8a43f180-cb67-11ea-8555-be19ca5cfe70.png)
Lagrangian recap: the left term f(n1, n2, n3, ...) is the objective over the unconstrained space; the right term g(n1, n2, n3, ...) is the constraint. The "- 1" encodes the unit-sphere constraint nᵀn - 1 = 0.
We simply put Equation 4/5 into Lagrangian form.
Equation 7: this is the derivative of the Lagrangian in Equation 6; we find the max by finding the critical points (where the derivative is 0).
![image](https://user-images.githubusercontent.com/8121216/88112778-6384bb80-cb65-11ea-9191-9715f42a984a.png)
Note: we can simplify by dropping the factor of 2, and the solution will be the same.
This implies that every solution n_i (every input that makes the derivative 0) must be an eigenvector of AᵀA.
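Spelled out (my reconstruction of the step between the Equation 6 and Equation 7 screenshots):

```latex
\mathcal{L}(n, \lambda) = \lVert A n \rVert_2^{2} - \lambda \left( n^{T} n - 1 \right)
                        = n^{T} A^{T} A \, n - \lambda \left( n^{T} n - 1 \right)

\frac{\partial \mathcal{L}}{\partial n} = 2 A^{T} A \, n - 2 \lambda n = 0
\quad\Longrightarrow\quad A^{T} A \, n = \lambda n
```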
Eigenvector recap: transforming an eigenvector n by a matrix (here AᵀA) is the same as scaling n by some scalar; that scalar is the eigenvalue λ.
Eigenvectors are important here because, for a symmetric matrix like AᵀA, they are pairwise orthogonal (see the sanity check below).
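A tiny numeric sanity check of the recap (hypothetical 2x2 matrix, nothing from the paper):

```python
import numpy as np

# Hypothetical symmetric matrix standing in for A^T A.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(M)  # M is symmetric, so eigh applies

n, lam = eigvecs[:, -1], eigvals[-1]  # top eigenvector and its eigenvalue
assert np.allclose(M @ n, lam * n)    # transforming n just scales it by lambda
assert np.isclose(eigvecs[:, 0] @ eigvecs[:, 1], 0.0)  # pairwise orthogonal
```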
2.3 Property of the Discovered Semantics
Equation 8: to find the eigenvalues, we can use the eigendecomposition AᵀA = QΛQᵀ. These eigenvalues correspond to eigenvectors n_i of the latent space. This is a special case of the eigendecomposition because AᵀA is positive semi-definite; see page 5 of these lecture notes, or Wikipedia.
![image](https://user-images.githubusercontent.com/8121216/88114794-8e710e80-cb69-11ea-80db-fa9ddfb618f2.png)
The eigenvectors are orthogonal. From the above lecture notes: "The important properties of a positive semi-definite matrix is that [...] eigenvectors are pairwise orthogonal when their eigenvalues are different."
"Obviously, each n_i is a column of Q". Why? NOT SURE.
Draft: Diagonal matrix Lambda are the eigen values,
Since eigenvectors are orthogonal, they represent independent semantic directions in latent space.
Equation 9: having orthogonal latent directions also produces orthogonal projections y, since (An_i)ᵀ(An_j) = n_iᵀ(AᵀA)n_j = λ_j n_iᵀn_j = 0 for i ≠ j.
![image](https://user-images.githubusercontent.com/8121216/88115067-27a02500-cb6a-11ea-907e-1dc0f0145a44.png)
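Putting the closed form together: a minimal sketch, assuming `weight` is the trained weight A of the first linear projection with shape [out_dim, latent_dim] (names and shapes are my assumptions, not the authors' released code):

```python
import numpy as np

def sefa_directions(weight: np.ndarray, k: int = 5) -> np.ndarray:
    """Top-k semantic directions from the projection weight A,
    taken as the eigenvectors of A^T A with the largest eigenvalues."""
    ata = weight.T @ weight                 # A^T A, symmetric and PSD
    eigvals, eigvecs = np.linalg.eigh(ata)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    return eigvecs[:, order].T              # rows are unit-norm directions n_i

# Hypothetical usage: edit a latent code along the strongest direction.
rng = np.random.default_rng(0)
A = rng.standard_normal((512, 128))  # stand-in for a trained weight matrix
dirs = sefa_directions(A, k=3)
z = rng.standard_normal(128)
z_edited = z + 2.0 * dirs[0]         # z' = z + alpha * n, with alpha = 2.0
```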
4. How do the authors measure success?
Quantitative
Compare the cosine distance of the found vectors to "ground-truth" directions learned by the supervised InterFaceGAN.
![image](https://user-images.githubusercontent.com/8121216/88117922-350cdd80-cb71-11ea-9c08-8127e88640bc.png)
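The metric itself is a plain cosine similarity; a minimal sketch with hypothetical vectors (not the paper's data):

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # Angle-based agreement between a discovered direction and a
    # supervised InterFaceGAN boundary; 1.0 means identical direction.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```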
Qualitative
Some test results:
![image](https://user-images.githubusercontent.com/8121216/88118263-0d6a4500-cb72-11ea-87bf-6510a2aca4e4.png)
Appears to disentangle moderately well for some attributes:
![image](https://user-images.githubusercontent.com/8121216/88124988-2f1ff800-cb83-11ea-8a00-65d5ddb0dee1.png)
Advantage of unsupervised over supervised:
![image](https://user-images.githubusercontent.com/8121216/88124942-19aace00-cb83-11ea-90f3-6677faacbada.png)
5. How did you verify that it works?
TODO: apply to generated clothes latent space.
6. Things to discuss? (e.g. weaknesses, potential for future work, relation to other work)
Weaknesses
The authors' stated goal is to find directions that cause the most perturbation in the output image; but what about entanglement? A big change is not necessarily semantically isolated; it could be a change composed of several semantics.
This is shown in Table 2: there is still entanglement.
![image](https://user-images.githubusercontent.com/8121216/88124863-f122d400-cb82-11ea-9cc6-58808dd1880e.png)
Supposedly, disentanglement is addressed by the eigendecomposition, which yields orthogonal directions.
But as acknowledged by the authors, orthogonal vectors don't totally disentangle: "(masculinizing + aging) and (feminizing + aging) are also two orthogonal directions in the latent space."
The authors make an assumption that a large change in the linear projection of the latent space will lead to a large variation in the output image (section 2.2, first ¶). However, that assumption is not proven in the paper.
The authors assume that the rate of change of the projected space y is monotonic, i.e. that the max change found under the unit-norm constraint will also hold outside the unit sphere. Is this guaranteed? Or can the rate of change vary across the latent space? Recall StyleGAN's concept of Perceptual Path Length: an entangled space might be drastically curved.
How does it compare to the PCA approach of GANSpace? PCA can also be done via an eigendecomposition.
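One way to see the connection, as my own sketch rather than anything from the paper (it holds only for a single linear layer; GANSpace actually runs PCA on sampled intermediate latents, which are nonlinear in z for StyleGAN's W space):

```latex
% With z ~ N(0, I) and w = A z + b:
\mathrm{Cov}(w) = A A^{T}
% GANSpace's PCA components would be eigenvectors of A A^T, while
% SeFa's directions are eigenvectors of A^T A. Via the SVD
% A = U \Sigma V^{T}, these are the left and right singular vectors
% of the same weight matrix.
```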
7. Are there any papers to read next?
8. References