[SeFa] Closed-Form Factorization of Latent Semantics in GANs (Jul 2020) #21


0. Article Information and Links

1. What do the authors try to accomplish?

Unsupervised discovery of semantic latent space directions.

2. What's great compared to previous research?

  • Unsupervised is better than supervised because (1) it needs no labels, and (2) it can discover more directions than labels alone would allow.
  • GANSpace's PCA approach requires sampling latent codes and computing PCA over the samples. This takes 1.5 hours on BigGAN and 2 minutes on StyleGAN.
    • Recap: PCA finds the most important directions of variation among data points in a high-dimensional space. It requires many data points to be effective.
  • SeFa, however, has a closed-form solution and requires no sampling or PCA computation; it operates only on the trained weight values of the latent code's linear projections. This runs in under 1 second.
    • The authors prove that we only need to examine the trained weights of the first linear projection to find semantic changes in the output image (Equation 3).
  • Quantitative and qualitative results show SeFa's directions to be more disentangled than GANSpace's.
    [Figure: SeFa vs. GANSpace comparison]

3. Where are the key elements of the technology and method?

2.2 Unsupervised Semantic Factorization

"our goal turns into finding the directions n that can cause the
significant change of y."

"we make an assumption that a large change of [the latent code's linear projection,] y, will lead to a large content variation of [the output image]"

Equation 3: A change in the projection space, ∆y, is equivalent to αAn.
$$
\Delta \mathbf{y} = \alpha A \mathbf{n}
$$
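A quick numeric check of Equation 3 (a toy sketch: the sizes and the affine form y = Az + b are stand-ins for the generator's actual first layer):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512                                  # latent dimensionality (stand-in value)
A = rng.normal(size=(1024, d))           # toy first-layer weight
b = rng.normal(size=1024)                # toy first-layer bias

def first_layer(z):
    return A @ z + b                     # y = Az + b

z = rng.normal(size=d)                   # a latent code
n = rng.normal(size=d)
n /= np.linalg.norm(n)                   # a unit direction
alpha = 3.0

delta_y = first_layer(z + alpha * n) - first_layer(z)  # the bias b cancels
assert np.allclose(delta_y, alpha * A @ n)             # Δy = αAn
```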

Equation 4: to get the largest change, frame it as a constrained optimization problem.
$$
\mathbf{n}^* = \underset{\{\mathbf{n} \,:\, \mathbf{n}^T \mathbf{n} = 1\}}{\arg\max} \; \|A \mathbf{n}\|_2^2
$$
The constraint is that nᵀn = 1, i.e. unit vectors only. We maximize the squared L2 norm, akin to distance. Why the constraint? Because ‖An‖ grows linearly as we scale n, so a "max" over an unbounded space is meaningless; the unit sphere establishes the boundary.
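To make that concrete, a tiny sketch (toy A): scaling n scales the objective, so without the unit-norm constraint the "max" runs off to infinity.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1024, 512))            # toy weight matrix
n = rng.normal(size=512)

for c in (1.0, 10.0, 100.0):
    print(np.linalg.norm(A @ (c * n)) ** 2)  # ||A(cn)||² = c²·||An||²: unbounded in c
```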

Equation 5: the top-k version of Equation 4, for the k most meaningful directions (reconstructed below).
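Reconstructed from the description, with each n_i constrained to unit norm:

$$
N^* = \underset{\{\mathbf{n}_i \,:\, \mathbf{n}_i^T \mathbf{n}_i = 1,\; i = 1, \dots, k\}}{\arg\max} \; \sum_{i=1}^{k} \|A \mathbf{n}_i\|_2^2
$$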

Equation 6: Constrained optimization can be easily solved with the Lagrangian.
$$
\mathcal{L} = \sum_{i=1}^{k} \mathbf{n}_i^T A^T A \, \mathbf{n}_i - \sum_{i=1}^{k} \lambda_i \left( \mathbf{n}_i^T \mathbf{n}_i - 1 \right)
$$

  • Lagrangian recap: the left term f(n_1, n_2, n_3, …) is the objective over the unconstrained space; the right term g(n_1, n_2, n_3, …) encodes the constraint. The "− 1" pins each n_i to the unit sphere (n_iᵀn_i − 1 = 0).
  • We simply rewrite Equation 4/5 in Lagrangian form.

Equation 7: This is the derivative of the Lagrangian in equation 6; we find the max by finding the critical points (where the derivative is 0).
$$
\frac{\partial \mathcal{L}}{\partial \mathbf{n}_i} = 2 A^T A \, \mathbf{n}_i - 2 \lambda_i \mathbf{n}_i = 0
$$

Note: we can simplify by dropping the factor of 2; the solutions are the same.

  • This implies that every solution n_i (any vector that zeroes the equation) must be an eigenvector of AᵀA, with λ_i the corresponding eigenvalue.
    • Eigenvector recap: applying the matrix AᵀA to one of its eigenvectors n is equivalent to scaling n by some scalar; that scalar is λ.
    • Eigenvectors matter here because, for a symmetric matrix like AᵀA, they are pairwise orthogonal (when their eigenvalues differ).
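A toy sanity check of Equation 7 (the matrix A is a made-up stand-in): plugging in an eigenvector of AᵀA, with λ its eigenvalue, makes the derivative vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1024, 512))            # toy first-layer weights
eigvals, eigvecs = np.linalg.eigh(A.T @ A)  # AᵀA is symmetric, so eigh applies

n, lam = eigvecs[:, -1], eigvals[-1]        # top eigenpair
grad = 2 * A.T @ A @ n - 2 * lam * n        # left-hand side of Equation 7
assert np.allclose(grad, 0.0, atol=1e-7)    # critical point: derivative ≈ 0
```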

2.3 Property of the Discovered Semantics

Equation 8: To find the eigenvalues, we use the eigendecomposition. These eigenvalues correspond to eigenvectors n_i of the latent space. This is a special case of the eigendecomposition because AᵀA is positive semi-definite; see page 5 of these lecture notes, or Wikipedia.
$$
A^T A = Q \Lambda Q^{-1} = Q \Lambda Q^T
$$

(The columns of Q are the eigenvectors and Λ is the diagonal matrix of eigenvalues; Q⁻¹ = Qᵀ because AᵀA is symmetric, which makes Q orthogonal.)

The eigenvectors are orthogonal. From the above lecture notes: "The important properties of a positive semi-definite matrix is that [...] eigenvectors are pairwise orthogonal when their eigenvalues are different."

"Obviously, each n_i is a column of Q". Why? NOT SURE.
Draft: Diagonal matrix Lambda are the eigen values,

Since eigenvectors are orthogonal, they represent independent semantic directions in latent space.
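Put together, the whole method is a few lines of linear algebra. A minimal sketch (my own, not the authors' released code; `weight` stands in for the generator's first projection matrix A):

```python
import numpy as np

def sefa_directions(weight, k=5):
    """Closed-form semantic directions: the top-k eigenvectors of AᵀA.

    weight: the first linear projection A, shape (out_dim, latent_dim).
    Returns k unit-norm directions as rows, sorted by eigenvalue (descending).
    """
    eigvals, eigvecs = np.linalg.eigh(weight.T @ weight)  # eigh: ascending order
    return eigvecs[:, ::-1][:, :k].T                      # reverse, take top k

# Toy usage: the returned directions come back orthonormal, as Section 2.3 claims.
A = np.random.default_rng(0).normal(size=(1024, 512))
dirs = sefa_directions(A, k=5)
assert np.allclose(dirs @ dirs.T, np.eye(5), atol=1e-8)
```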

Equation 9: Having orthogonal latent vectors also produces orthogonal projections y.
$$
(A \mathbf{n}_i)^T (A \mathbf{n}_j) = \mathbf{n}_i^T A^T A \, \mathbf{n}_j = \lambda_j \, \mathbf{n}_i^T \mathbf{n}_j = 0 \quad (i \neq j)
$$
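Continuing the sketch above, a quick numeric check of Equation 9: the projections A n_i of distinct directions are mutually orthogonal.

```python
proj = dirs @ A.T                             # row i is (A n_i)ᵀ, shape (5, 1024)
gram = proj @ proj.T                          # entry (i, j) = n_iᵀ AᵀA n_j
off_diag = gram - np.diag(np.diag(gram))
assert np.allclose(off_diag, 0.0, atol=1e-6)  # ≈ 0 off the diagonal
```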

4. How do the authors measure success?

Quantitative

Compare the cosine distance of the found vectors to the "ground-truth" directions learned by the supervised InterFaceGAN.
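That is, the standard cosine measure between a discovered direction n_1 and InterFaceGAN's supervised direction n_2 (notation mine):

$$
\cos(\mathbf{n}_1, \mathbf{n}_2) = \frac{\mathbf{n}_1^T \mathbf{n}_2}{\|\mathbf{n}_1\| \, \|\mathbf{n}_2\|}
$$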

Qualitative

Some test results

Appears to disentangle moderately well for some attributes

Advantage of unsupervised over supervised

5. How did you verify that it works?

TODO: apply to generated clothes latent space.

6. Things to discuss? (e.g. weaknesses, potential for future work, relation to other work)

Weaknesses

  • The authors' stated goal is to find directions that cause the most perturbation in the output image; but what about entanglement? A big change is not necessarily semantically isolated; it could be a change composed of several semantics.
    • This is shown in Table 2: there is still entanglement.

    • Supposedly, disentanglement is addressed by the eigendecomposition, which yields mutually orthogonal directions.

      • As acknowledged by the authors, orthogonal directions don't totally disentangle: "(masculinizing + aging) and (feminizing + aging) are also two orthogonal directions in the latent space."
  • The authors make an assumption that a large change in the linear projection of the latent space will lead to a large variation in the output image (section 2.2, first ¶). However, that assumption is not proven in the paper.
  • The authors assume that the rate of change of the projected space y is monotonic, and that the max change found within the unit-norm constraint also holds outside it. Is that guaranteed, or can the rate of change vary across the latent space? Recall StyleGAN's concept of Perceptual Path Length: an entangled space might be drastically curved.

How does this compare to GANSpace's PCA approach? PCA can also be done via an eigendecomposition (of the sample covariance matrix), so the key difference is what gets decomposed; see the sketch below.
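A sketch of the contrast (toy matrices, not GANSpace's actual pipeline): GANSpace runs PCA on many sampled projections, i.e. an eigendecomposition of their covariance, whereas SeFa eigendecomposes AᵀA directly, with no sampling.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1024, 512))                 # first-layer weights

# GANSpace-style: sample latents, project, then PCA (eigendecompose the covariance).
Z = rng.normal(size=(10_000, 512))               # needs many samples -> slow on big models
Y = Z @ A.T                                      # projected codes
pca_vals, pca_vecs = np.linalg.eigh(np.cov(Y, rowvar=False))

# SeFa-style: eigendecompose AᵀA; no sampling at all, so it runs in well under a second.
sefa_vals, sefa_vecs = np.linalg.eigh(A.T @ A)
```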

7. Are there any papers to read next?

8. References
