arXiv:2104.12476v1 [cs.CV] 26 Apr 2021

Abstract

Recent studies on Generative Adversarial Networks (GANs) reveal that different layers of a generative CNN hold different semantics of the synthesized images. However, few GAN models have explicit dimensions to control the semantic attributes represented in a specific layer. This paper proposes EigenGAN, which is able to unsupervisedly mine interpretable and controllable dimensions from different generator layers. Specifically, EigenGAN embeds one linear subspace with orthogonal basis into each generator layer. Via the adversarial training to learn a target distribution, these layer-wise subspaces automatically discover a set of "eigen-dimensions" at each layer corresponding to a set of semantic attributes or interpretable variations. By traversing the coefficient of a specific eigen-dimension, the generator can produce samples with continuous changes corresponding to a specific semantic attribute. Taking the human face for example, EigenGAN can discover controllable dimensions for high-level concepts such as pose and gender in the subspaces of deep layers, as well as low-level concepts such as hue and color in the subspaces of shallow layers. Moreover, under the linear circumstance, we theoretically prove that our algorithm derives the principal components as PCA does. Code can be found at https://github.com/LynnHo/EigenGAN-Tensorflow.

Figure 1. Examples of interpretable dimensions learned by EigenGAN (e.g., Gender at Layer 3, Hair Color at Layer 5, Hue at Layer 6, Painting Style at Layer 2, Pose at Layer 3). The smaller the index, the deeper the layer.
1. Introduction

Generative adversarial networks (GANs) [10] and their variants [25, 11, 5, 18] achieve great success in high-fidelity image synthesis. Strong evidence [39, 41, 2] shows that different layers of a discriminative CNN capture different semantic concepts in terms of abstraction level, e.g., shallower layers detect color and texture while deeper layers focus more on objects and parts. Accordingly, we can expect that a generative CNN has a similar property, and recent GAN studies confirm this fact [18, 38, 3]. StyleGAN [18] shows that deeper generator layers control higher-level attributes such as pose and glasses while shallower layers control lower-level features such as color and edge. Yang et al. [38] found a similar phenomenon in scene synthesis, showing that deep layers tend to determine the spatial layout while shallow layers determine the color scheme. A similar conclusion is also reached by Bau et al. [3] in the dissection analysis of GAN features at different layers. All this evidence reveals a property that different generator layers hold different semantics of the synthesized images in terms of abstraction level.

According to this property, one can identify semantic attributes from different layers of a well-trained generator by special algorithms [3, 12, 36, 38], and then manipulate these attributes on the synthesized images. For example, Bau et al. [3] identify the causal units for a specific concept (such as "tree") by dissection and intervention on each generator layer. Turning the causal units on or off causes the concept to appear or disappear on the synthesized image. However, these methods are all
post-processing algorithms for well-trained GAN generators. As for the generator itself, it operates as a black box and lacks explicit dimensions to directly control the semantic attributes represented in different layers. In other words, we do not know what attributes are represented in different generator layers or how to manipulate them, unless we deeply inspect each layer with post-processing algorithms [3, 12, 36, 38].

Based on the above discussion, this paper starts with a question: can a generator itself automatically/unsupervisedly learn explicit control of the semantic attributes represented in different layers? To this end, we propose EigenGAN, which equips a generator with interpretable dimensions for different layers in a completely unsupervised manner. Specifically, EigenGAN embeds a linear subspace model with orthogonal basis into each generator layer. On the one hand, since each subspace model is directly embedded into a specific layer, a direct link is established between the subspace and the semantic variations of the corresponding layer. On the other hand, driven by the adversarial learning, the generator tries to capture the principal variations of the data distribution, and these principal variations are separately represented in different layers in terms of their abstraction level. Then, with the help of the subspace model, the principal variations of a specific layer are further orthogonally separated into different basis vectors. Finally, each basis vector discovers an "eigen-dimension" that controls an attribute or interpretable variation corresponding to the semantics of its layer. For example, as shown at the top of Fig. 1, an eigen-dimension of the subspace embedded in a deep layer controls gender, while another of the subspace embedded in the shallowest layer controls the hue of the image. Furthermore, under the linear circumstance, i.e., a one-layer model, we theoretically prove that our EigenGAN is able to discover the principal components as PCA [15] does, which gives us a strong insight and reason to embed the subspace models into different generator layers. Besides, we also provide a manifold perspective showing that our EigenGAN decomposes the data generation modeling into layer-wise dimension expanding steps.

2. Related Works

2.1. Interpretability Learning for GANs

The first attempt to learn interpretable representations for GAN generators is InfoGAN [6], which employs mutual information maximization (MIM) between the latent variables and the synthesized samples. Including InfoGAN, MIM-based methods [6, 16, 17, 14, 20, 21, 22] can automatically discover interpretable dimensions which respectively control different semantic attributes such as pose, glasses, and emotion of the human face. However, the learning of these interpretable dimensions is mainly driven by the MIM objective, and there is no direct link from these dimensions to the semantics of any specific generator layer. Ramesh et al. [33] found that the principal right-singular subspace of the generator Jacobian shows a local disentanglement property; they then apply a spectral regularization to align the singular vectors with straight coordinates and finally obtain globally interpretable representations. However, this work also does not investigate the correspondence between these interpretable representations and the semantics of different generator layers. Different from these methods, the interpretability of our EigenGAN comes from the special design of layer-wise subspace embedding, rather than from imposing any objective or regularization. Moreover, our EigenGAN establishes an explicit connection between the interpretable dimensions and the semantics of a specific layer by directly embedding a subspace model into that layer.

The above methods try to learn a GAN generator with explicit interpretable representations; in contrast, another class of methods tries to reveal the interpretable factors from a well-trained GAN generator [9, 3, 35, 38, 32, 12, 36]. [9, 3, 35, 38] adopt pre-trained semantic predictors to identify the corresponding semantic factors in the GAN latent space, e.g., Yang et al. [38] use a layout estimator, a scene category recognizer, and an attribute classifier to find the decision boundaries for these concepts in the latent space. Without introducing external supervision, several methods search for interpretable factors in self-supervised [32] or unsupervised [12, 36] manners. Plumerault et al. [32] utilize simple image transforms (e.g., translation and zoom) to search for the axes of these transforms in the latent space. Härkönen et al. [12] apply PCA on the feature space of the early layers, and the resulting principal components represent interpretable variations. Shen and Zhou [36] show that the weight matrix of the very first fully-connected layer of a generator determines a set of critical latent directions which dominate the image synthesis, and moving along these directions controls a set of semantic attributes. Among these methods, [3, 35, 38, 12, 36] carefully investigate the semantics represented in different generator layers. However, this class of methods can only operate on well-trained GANs; on the contrary, our EigenGAN aims to discover the interpretable dimensions for each generator layer along with the GAN training in an end-to-end manner.

2.2. Generative Adversarial Networks

Generative adversarial network (GAN) [10] is a sort of generative model which can synthesize data samples from noise. The learning process of GAN is a competition between a generator and a discriminator. Specifically, the discriminator tries to distinguish the synthesized samples from the real ones, while the generator tries to make the synthesized samples as realistic as possible in order to fool the discriminator. When the competition reaches Nash equilibrium, the synthesized data distribution is identical to the real data distribution.

GANs show promising performance and properties on data synthesis. Therefore, plenty of research on GANs has appeared, including loss functions [30, 25, 1], regularizations [34, 26, 28], conditional generation [27, 31, 29], representation learning [24, 6, 8], architecture design [7, 5, 18], applications [13, 42, 40], etc. Our EigenGAN can be categorized into representation learning as well as architecture design for GANs.

Figure 2. Overview of the proposed EigenGAN. The main stream of the model is a chain of 2-stride transposed convolutional blocks which gradually enlarge the resolution of the feature maps and finally output a synthesized sample ("real or fake" to the discriminator). In the ith layer, we embed a linear subspace with orthonormal basis U_i = [u_{i1}, ..., u_{iq}], and each basis vector u_{ij} is intended to unsupervisedly discover an "eigen-dimension" which holds an interpretable variation of the synthesized samples.

3. EigenGAN

In this section, we first introduce the EigenGAN generator design with layer-wise subspace models in Sec. 3.1. Then in Sec. 3.2, we make a discussion from the linear case to the general case of EigenGAN and finally provide a manifold perspective.

3.1. Generator with Layer-Wise Subspaces

Fig. 2 shows our generator architecture. Our target is to learn a t-layer generator mapping from a set of latent variables {z_i ∈ R^q | z_i ~ N_q(0, I), i = 1, ..., t} to the synthesized image x = G(z_1, ..., z_t), where z_i is directly injected into the ith generator layer.

In the ith layer, we embed a linear subspace model S_i = (U_i, L_i, μ_i) where

• U_i = [u_{i1}, ..., u_{iq}] is the orthonormal basis of the subspace, and each basis vector u_{ij} ∈ R^{H_i × W_i × C_i} is intended to unsupervisedly discover an "eigen-dimension" which holds an interpretable variation of the synthesized samples.

• L_i = diag(l_{i1}, ..., l_{iq}) is a diagonal matrix with l_{ij} deciding the "importance" of the basis vector u_{ij}. To be specific, a high absolute value of l_{ij} means that u_{ij} controls a major variation of the ith layer while a low absolute value denotes a minor variation, which can also be viewed as a kind of dimension selection.

• μ_i denotes the origin of the subspace.

Then, we use the ith latent variable z_i = [z_{i1}, ..., z_{iq}]^T as the coordinates (linear combination coefficients) to sample a point from the subspace S_i:

    φ_i = U_i L_i z_i + μ_i                          (1)
        = Σ_{j=1}^{q} z_{ij} l_{ij} u_{ij} + μ_i.    (2)

This sample point φ_i will be added to the network feature of the ith layer as stated next.

Let h_i ∈ R^{H_i × W_i × C_i} denote the feature maps of the ith layer and x = h_{t+1} denote the final synthesized image; the forward relation between the adjacent layers is

    h_{i+1} = Conv2x(h_i + f(φ_i)),  i = 1, ..., t,   (3)

where "Conv2x" denotes transposed convolutions that double the resolution of the feature maps, and f can be the identity function or a simple transform (a 1x1 convolution in practice). As can be seen from Eq. (3), the sample point φ_i from the subspace S_i directly interacts with the network feature of the ith layer. Therefore, the subspace S_i directly determines the variations of the ith layer; more concretely, the q coordinates z_i = [z_{i1}, ..., z_{iq}]^T respectively control q different variations.

Besides, we also inject a noise input ε ~ N(0, I) at the bottom of the generator, intended to capture the rest of the variations missed by the subspaces, as follows,

    h_1 = FC(ε),   (4)

where "FC" denotes a fully-connected layer.
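The layer-wise sampling and injection of Eqs. (1)-(4) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the paper's implementation: the sizes are toy values, the learned parameters S_i = (U_i, L_i, μ_i) are random stand-ins, and "Conv2x", f, and "FC" are replaced by a plain linear map or the identity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, only for illustration: each "feature map" is a
# flattened d-dimensional vector and every layer owns a q-dim subspace.
d, q, t = 16, 6, 3  # feature size, subspace dimension, number of layers

# Per-layer subspace parameters S_i = (U_i, L_i, mu_i); in EigenGAN these
# are learned adversarially, here they are random stand-ins.
U = [np.linalg.qr(rng.normal(size=(d, q)))[0] for _ in range(t)]  # orthonormal bases
L = [rng.normal(size=q) for _ in range(t)]                        # diagonal "importance"
mu = [rng.normal(size=d) for _ in range(t)]                       # subspace origins
conv = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(t)]   # stand-in for Conv2x

def subspace_point(i, z):
    """Eqs. (1)-(2): phi_i = U_i L_i z_i + mu_i."""
    return U[i] @ (L[i] * z) + mu[i]

def generator(z_list, eps):
    """Eqs. (3)-(4) with FC and f taken as the identity and Conv2x as a
    plain linear map: h_1 = eps, h_{i+1} = conv_i @ (h_i + phi_i)."""
    h = eps
    for i in range(t):
        h = conv[i] @ (h + subspace_point(i, z_list[i]))
    return h

x = generator([rng.normal(size=q) for _ in range(t)],  # z_i ~ N_q(0, I)
              rng.normal(size=d))                      # bottom noise eps
print(x.shape)  # (16,)
```

In the actual model, each u_{ij} has the same shape as the layer's feature maps (H_i × W_i × C_i), the resolution doubles at every layer, and all parameters are trained jointly with the adversarial loss.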
Figure 3. Manifold perspective of EigenGAN (the panels illustrate directions such as "pose" and "gaze" being added and then bent by nonlinear mappings). At each layer, a linear subspace is added to the feature manifold, expanding the manifold with "straight" directions along which the variation of some semantic attributes is linear. At the end of each layer, nonlinear mappings "bend" these straight directions, yet another subspace at the next layer will continue to add new straight directions. Here, we only show one semantic direction of each subspace just for simplicity; generally, each subspace contains multiple orthogonal directions.
The bases {U_i}_{i=1}^t, the importance matrices {L_i}_{i=1}^t, the origins {μ_i}_{i=1}^t, and the convolution kernels are all learnable parameters, and the learning can be driven by various adversarial losses [10, 25, 1, 28]. In this paper, the hinge loss [28] is used for the adversarial training. Besides, the orthogonality of U_i is achieved by the regularization ‖U_i^T U_i − I‖_F^2. After training, each latent dimension z_{ij} can explicitly control an interpretable variation corresponding to the semantics of its layer.

3.2. Discussion

Linear Case To better understand how our model works, we first discuss the linear case of our EigenGAN. Adapting from Eq. (1), the linear model is formulated as below,

    x = ULz + μ + σε.   (5)

This equation relates a d-dimensional observation vector x to the corresponding q-dimensional (q < d) latent variables z ~ N_q(0, I) by an affine transform UL and a translation μ. Besides, a noise vector ε ~ N_d(0, I) is introduced to compensate for the missing energy. We also constrain U to have orthonormal columns and L to be a diagonal matrix, as in the general case in Sec. 3.1. This formulation can also be regarded as a constrained case of Probabilistic PCA [37].

To estimate U, L, μ, and σ in Eq. (5) with n observations {x_i}_{i=1}^n, an analytical solution is maximum likelihood estimation (MLE). Please refer to the appendix for the detailed derivation of the MLE results. One important result is that the columns of U^ML = [u_1^ML, ..., u_q^ML] are the eigenvectors of the data covariance corresponding to the q largest eigenvalues, which is exactly the same as the result of PCA [15]. That is to say, the linear EigenGAN is able to discover the principal dimensions, which gives us a strong insight and motivation to embed such a linear model (Eq. (5)) hierarchically into different generator layers as stated in Sec. 3.1.

EigenGAN (General Case) With the insight of the linear case, we suppose that the linear subspace model embedded in a specific layer can capture the principal semantic variations of that layer, and these principal variations are orthogonally separated into the basis vectors. In consequence, each basis vector discovers an "eigen-dimension" that controls an attribute or interpretable variation corresponding to the semantics of its layer.

Manifold Perspective Fig. 3 shows a manifold perspective of EigenGAN. From this aspect, the subspace of each layer expands the feature manifold with "straight" directions along which the variations of some semantic attributes are linear. At the end of each layer, nonlinear mappings "bend" these straight directions, yet another subspace at the next layer will continue to add new straight directions. In a word, EigenGAN decomposes the data generation modeling into hierarchical dimension expanding steps, i.e., expanding the feature manifold with linear semantic dimensions layer by layer.

4. Experiments

Dataset We test our method on CelebA [23], FFHQ [18], and Danbooru2019 Portraits [4]. CelebA contains 202,599 celebrity face images with annotations of 40 binary attributes. FFHQ contains 70,000 high-quality face images, and Danbooru2019 Portraits contains 302,652 anime face images. We use the CelebA attributes for the quantitative evaluations and use FFHQ and Danbooru2019 Portraits for more visual results.

Implementation Details We use the hinge loss [28] and the R1 penalty [26] for the adversarial training. We adopt the Adam solver [19] for all networks and parameter moving average for the generator. The generator is designed for 256 × 256 images and contains 6 upsampling convolutional blocks. A whole block with one upsampling is defined as a "layer", and one linear subspace with 6 basis vectors is embedded into the generator at each layer. Please refer to the appendix for detailed network architectures.
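The Linear Case claim of Sec. 3.2 can also be checked numerically without any GAN training: for Eq. (5), the covariance of x is U diag(l_1^2, ..., l_q^2) U^T + σ^2 I, so the top-q eigenvectors of the data covariance coincide with the columns of U. A small NumPy sketch with illustrative sizes (not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, q, n = 8, 3, 20000
sigma = 0.1

# Ground-truth linear model of Eq. (5): x = U L z + mu + sigma * eps,
# with orthonormal U, diagonal L, and standard Gaussian z and eps.
U = np.linalg.qr(rng.normal(size=(d, q)))[0]
L = np.array([3.0, 2.0, 1.0])  # "importance" of each basis vector
mu = rng.normal(size=d)

z = rng.normal(size=(n, q))
eps = rng.normal(size=(n, d))
X = (z * L) @ U.T + mu + sigma * eps  # n samples of x

# The covariance of x is U diag(L^2) U^T + sigma^2 I, so its top-q
# eigenvectors coincide with the columns of U: PCA recovers the basis.
Xc = X - X.mean(axis=0)
vals, vecs = np.linalg.eigh(Xc.T @ Xc / n)
top = vecs[:, np.argsort(vals)[::-1][:q]]

# |cos| similarity between each PCA direction and the matching column of U
# (absolute value, since eigenvector signs are arbitrary).
sims = np.abs(np.sum(top * U, axis=0))
print(np.round(sims, 3))  # all close to 1.0
```

This only verifies the identifiability of the linear model from samples; the paper's actual validation trains the linear EigenGAN adversarially (Sec. "Linear Case Study" below).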
[Figure 4 panels: L1 D5 (Layer 1, Dimension 5) "Facial Hair → Hat": Hat (45%), Sideburns (33%). L3 D1 "Age → Gender": Gender (89%), Lipstick (87%), Makeup (80%), Attractive (60%), Age (57%). L3 D4 "Bangs": Bangs (68%). L5 D2 "Hair Color": Black Hair (59%), Blond Hair (44%), Gray Hair (33%). L4 D5 "Smile": Smile (81%), High Cheekbones (67%), Mouth Open (55%), Narrow Eyes (43%). L6 D1 "Background Hue".]

Figure 4. Discovered semantic attributes at different layers for the CelebA dataset [23]. Traversing the coordinate value in [−4.5σ, 4.5σ], each dimension controls an attribute, colored in blue. The attributes colored in green are the most correlated CelebA attributes, and the bracketed value is the entropy coefficient: what fraction of the information of the CelebA attribute is contained in the corresponding dimension. "Li Dj" means the jth dimension of the ith layer. We only show the most meaningful dimensions; please refer to the appendix for all dimensions.
Figure 5. Interpretable dimensions of the FFHQ dataset [18] and the anime dataset [4] (panels: L2 D5 "Painting Style", L4 D2 "Mouth Shape", L6 D1 "Hue").
4.1. Discovered Semantic Attributes

Visual Analysis Fig. 4 shows the semantic attributes learned by the subspaces of different layers, where "Li Dj" means the jth dimension of the ith layer, and a smaller layer index means a deeper layer. As shown, moving along an eigen-dimension (i.e., a basis vector of a subspace), the synthesized images consistently change in an interpretable way. Shallower layers tend to learn lower-level attributes, e.g., L6 and L5 learn color-related attributes such as "Hue" in L6 and "Hair Color" in L5. As the layer goes deeper, the generator discovers attributes with higher-level or more complicated concepts. For example, L4 and L3 learn geometric or structural attributes such as "Face Shape" in L4 and "Body Side" in L3. Deep layers tend to learn multiple attributes in one dimension, e.g., L1 D5 learns "Facial Hair" on the left axis but "Hat" on the right axis. Besides, entanglement of attributes is likely to happen in deep-layer dimensions, e.g., L2 D2 learns to simultaneously change "Hair Side" and "Background Texture Orientation", because a complex attribute composition might mislead the network into believing the whole of it is one high-level attribute. In summary, shallow layers learn low-level or simple attributes while deep layers learn high-level or complicated attributes. Entanglement might happen in some deep-layer dimensions, and this is one of our limitations. Nonetheless, the entanglement is interpretable, i.e., we can identify what attributes are entangled in a dimension. Moreover, our method can still discover well-disentangled dimensions that are highly consistent with the visual concepts of humans. Fig. 5 shows additional results on the FFHQ dataset [18] and the Danbooru2019 Portraits dataset [4]. Please refer to the appendix for more results and more interpretable dimensions.

Identifying Well-Defined Attributes In the previous part, we visually identify semantic attributes for each dimension. In this part, we identify the attributes in a statistical manner, utilizing the 40 well-defined binary attributes in the CelebA dataset [23]. Specifically, we investigate the correlation between a dimension Z and a CelebA attribute Y in terms of the entropy coefficient (normalized mutual information), which represents what fraction of the information of Y is contained in Z:

    U(Y|Z) = I(Y;Z) / H(Y) = (H(Y) − H(Y|Z)) / H(Y) ∈ [0, 1]   (6)

where

    H(Y|Z) = ∫_Z p_Z(z) [ −p_{Y|Z}(y=1|z) ln(p_{Y|Z}(y=1|z)) − (1 − p_{Y|Z}(y=1|z)) ln(1 − p_{Y|Z}(y=1|z)) ] dz,   (7)

    H(Y) = −p_Y(y=1) ln(p_Y(y=1)) − (1 − p_Y(y=1)) ln(1 − p_Y(y=1)).   (8)

p_{Y|Z}(y=1|z) and p_Y(y=1) can be calculated by¹

    p_{Y|Z}(y=1|z) = ∫_X p_{Y|X}(y=1|x) p_G(x|z) dx,   (9)

    p_Y(y=1) = ∫_Z p_{Y|Z}(y=1|z) p_Z(z) dz,   (10)

where p_G(x|z) is the generator distribution, and p_{Y|X}(y=1|x) is the posterior distribution, which is approximated by a pre-trained attribute classifier on the CelebA dataset. We set

¹ y and z are conditionally independent given x, i.e., p_{Y|X,Z}(y=1|x,z) = p_{Y|X}(y=1|x).
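Once the integrals in Eqs. (7)-(10) are approximated by Monte-Carlo averages over sampled latent values z, the entropy coefficient of Eq. (6) reduces to a few array operations. A minimal NumPy sketch, assuming the classifier outputs p(y=1|z) for the sampled z are already available (the function name and interface are illustrative, not from the paper):

```python
import numpy as np

def entropy_coefficient(p_y_given_z):
    """Entropy coefficient U(Y|Z) = (H(Y) - H(Y|Z)) / H(Y) (Eq. 6),
    estimated from classifier outputs p(y=1|z) at latent values z sampled
    from p_Z, i.e., the integrals of Eqs. (7)-(10) become sample means."""
    p = np.asarray(p_y_given_z, dtype=float)

    def h(q):  # binary entropy in nats, numerically safe at 0 and 1
        q = np.clip(q, 1e-12, 1 - 1e-12)
        return -(q * np.log(q) + (1 - q) * np.log(1 - q))

    p_y = np.mean(p)              # Eq. (10): p(y=1) = E_z[p(y=1|z)]
    H_Y = h(p_y)                  # Eq. (8)
    H_Y_given_Z = np.mean(h(p))   # Eq. (7)
    return (H_Y - H_Y_given_Z) / H_Y

# If Y is fully determined by z, U(Y|Z) -> 1; if Y is independent of z,
# U(Y|Z) -> 0.
print(entropy_coefficient([0.0, 0.0, 1.0, 1.0]))  # close to 1
print(entropy_coefficient([0.5, 0.5, 0.5, 0.5]))  # close to 0
```

In the paper's setting, p(y=1|z) itself is a Monte-Carlo average of classifier outputs p(y=1|x) over images x drawn from the generator at that z (Eq. (9)).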
Figure 6. Qualitative comparison between (a) SeFa [36] from StyleGAN and (b) EigenGAN. Both are trained on FFHQ-256 [18].

Table 1. Correlation between the discovered attributes and the CelebA attributes in terms of the entropy coefficient. Each row denotes an attribute discovered by (a) SeFa [36] and (b) EigenGAN, and each column denotes a CelebA attribute.

(a) SeFa from StyleGAN trained on the FFHQ-256 dataset [18]:

                 Gender  Eyeglasses  Smiling  Black Hair
    Gender        49%       14%        2%        4%
    Eyeglasses     5%       49%        2%        0%
    Smiling        1%        1%       52%        8%
    Black Hair     1%        0%        1%       18%

(b) EigenGAN trained on the FFHQ-256 dataset [18]:

                 Gender  Eyeglasses  Smiling  Black Hair
    Gender        57%       14%       12%        2%
    Eyeglasses     2%       33%        0%        1%
    Smiling        1%        0%       55%        2%
    Black Hair     0%        0%        0%       38%
Figure 7. Effect of the layer-wise latent variables (top rows, bottom noise fixed) and the bottom noise (bottom rows, layer-wise latents fixed). (a) With the subspace models (EigenGAN), major variations are captured by the layer-wise latent variables. (b) Without the subspace models (typical GANs), major variations are captured by the bottom noise.

Table 2. Basis similarity with PCA, P = N_d(0, I). Each column header "a→b" denotes data rank a and subspace rank b.

    GAN Loss          5→1   5→3   10→1  10→3  10→5  20→1  20→5  20→10
    KL-f-GAN [30]     1.00  0.98  0.99  0.90  0.93  0.97  0.78  0.79
    Vanilla GAN [10]  1.00  0.99  1.00  0.90  0.94  0.98  0.77  0.81
    WGAN [11]         0.99  0.98  1.00  0.89  0.92  0.99  0.76  0.83
    LSGAN [25]        0.99  0.99  1.00  0.89  0.92  0.99  0.76  0.80
    HingeGAN [28]     0.99  0.99  1.00  0.92  0.93  0.96  0.77  0.81

Table 3. Basis similarity with PCA, P = U_d(0, 1). Each column header "a→b" denotes data rank a and subspace rank b.

    GAN Loss          5→1   5→3   10→1  10→3  10→5  20→1  20→5  20→10
    KL-f-GAN [30]     0.96  0.98  0.97  0.89  0.93  0.89  0.72  0.82
    Vanilla GAN [10]  0.97  0.97  0.97  0.92  0.92  0.92  0.76  0.84
    WGAN [11]         0.98  0.97  0.98  0.93  0.94  0.98  0.77  0.84
    LSGAN [25]        0.97  0.97  0.96  0.89  0.95  0.91  0.74  0.82
    HingeGAN [28]     0.97  0.98  0.97  0.87  0.94  0.92  0.75  0.82
variations, which is completely opposite to the original setting in Fig. 7a. In conclusion, the subspace model is the key to enabling the generator to put the major variations into the layer-wise variables, which further lets the layer-wise variables capture the different semantics of different layers.

Linear Case Study Sec. 3.2 theoretically proves that the linear case of EigenGAN can discover the principal components under maximum likelihood estimation (MLE). In this part, we validate this statement by applying adversarial training to the linear EigenGAN (we do not directly use MLE since we train the general EigenGAN with an adversarial loss rather than an MLE objective, and we keep this consistency between the linear and the general case). Specifically, we use the linear EigenGAN to learn a low-rank subspace model for toy datasets, then compare the basis vectors learned by our model and those obtained by PCA in terms of cosine similarity. The toy datasets are generated as follows,

    D_{A,b,P} = {y_i = A x_i + b | x_i ~ P}   (11)

where A is a random transform matrix, b is a random translation vector, and P is a distribution selected from N_d(0, I) or U_d(0, 1). We test typical adversarial losses including Vanilla GAN [10], LSGAN [25], WGAN [11], HingeGAN [28], and f-GAN [30] with KL divergence (KL-f-GAN). Note that the objective of KL-f-GAN is theoretically equivalent to MLE; thus we are actually also testing MLE in the adversarial training manner.

Table 2 and Table 3 report the average similarity between the EigenGAN basis vectors and the PCA basis vectors, where each result is the average over 100 random toy datasets. As can be seen, when the data rank is no more than 10, the EigenGAN basis is highly similar to the PCA basis, with cosine similarity about 0.9-1.0. When the data rank increases to 20, there are two situations: 1) if we only search for the single most principal basis vector (20→1), the vectors found by the linear EigenGAN and by PCA are still very close; 2) but if we want to find 5 or more basis vectors, the average similarity decreases to 0.7-0.8. We suppose the reason is that higher-dimensional data leads to the curse of dimensionality and further results in learning instability. Besides, the various GAN losses give very consistent results, which shows the potential generalizability of our theoretical results in Sec. 3.2 from KL divergence (MLE) to more general statistical distances such as the JS divergence and the Wasserstein distance. In conclusion, we experimentally verify the theoretical statement that the linear EigenGAN can indeed discover principal components.

5. Limitations and Future Works

Discovered semantic attributes are not always the same across different training runs, in two cases: 1) For example, sometimes gender and pose are learned as separate dimensions but sometimes are entangled in one dimension at a deeper layer. This is because, without supervision, some complex attribute compositions might mislead the model into believing the whole of them is one higher-level attribute. 2) Sometimes the model can discover a specific attribute, such as eyeglasses, but sometimes it cannot, mainly because these attributes appear less frequently in the dataset. Future works will study the layer-wise eigen-learning with better disentanglement techniques and more powerful GAN architectures.
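The toy-data generation of Eq. (11) and the cosine-similarity protocol behind Tables 2 and 3 can be sketched as follows. Training the linear EigenGAN itself is omitted; the snippet only reproduces the evaluation side, with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_dataset(d, rank, n=10000):
    """Eq. (11): D_{A,b,P} = {y_i = A x_i + b | x_i ~ P}, here with
    P = N(0, I). A is a random d x rank transform, so the data lies on a
    rank-dimensional affine subspace of R^d."""
    A = rng.normal(size=(d, rank))
    b = rng.normal(size=d)
    return rng.normal(size=(n, rank)) @ A.T + b

def pca_basis(Y, q):
    """Top-q principal directions: eigenvectors of the data covariance."""
    Yc = Y - Y.mean(axis=0)
    vals, vecs = np.linalg.eigh(Yc.T @ Yc / len(Y))
    return vecs[:, np.argsort(vals)[::-1][:q]]

def basis_similarity(U, V):
    """Average |cosine| between matched basis vectors, i.e., the metric
    reported in Tables 2 and 3 (signs of learned vectors are arbitrary)."""
    return float(np.mean(np.abs(np.sum(U * V, axis=0))))

# "Data rank 5 -> subspace rank 3" setting; in the paper, the second basis
# comes from a trained linear EigenGAN. As a sanity check of the protocol,
# a basis compared with itself yields similarity 1.0.
Y = toy_dataset(d=32, rank=5)
B = pca_basis(Y, q=3)
print(basis_similarity(B, B))  # 1.0
```

In the paper's experiment, `basis_similarity` would be evaluated between the PCA basis and the basis learned by the linear EigenGAN, averaged over 100 random datasets.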
References

[1] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. In Int. Conf. Mach. Learn., 2017.
[2] David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. Network dissection: Quantifying interpretability of deep visual representations. In IEEE Conf. Comput. Vis. Pattern Recog., 2017.
[3] David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, and Antonio Torralba. GAN dissection: Visualizing and understanding generative adversarial networks. In Int. Conf. Learn. Represent., 2019.
[4] Gwern Branwen, Anonymous, and Danbooru Community. Danbooru2019 Portraits: A large-scale anime head illustration dataset, 2019.
[5] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. In Int. Conf. Learn. Represent., 2018.
[6] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Adv. Neural Inform. Process. Syst., 2016.
[7] Emily Denton, Soumith Chintala, Arthur Szlam, and Rob Fergus. Deep generative image models using a Laplacian pyramid of adversarial networks. In Adv. Neural Inform. Process. Syst., 2015.
[8] Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. Adversarial feature learning. In Int. Conf. Learn. Represent., 2017.
[9] Lore Goetschalckx, Alex Andonian, Aude Oliva, and Phillip Isola. GANalyze: Toward visual definitions of cognitive image properties. In Int. Conf. Comput. Vis., 2019.
[10] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. In Adv. Neural Inform. Process. Syst., 2014.
[11] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of Wasserstein GANs. In Adv. Neural Inform. Process. Syst., 2017.
[12] Erik Härkönen, Aaron Hertzmann, Jaakko Lehtinen, and Sylvain Paris. GANSpace: Discovering interpretable GAN controls. In Adv. Neural Inform. Process. Syst., 2020.
[13] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In IEEE Conf. Comput. Vis. Pattern Recog., 2017.
[14] Insu Jeon, Wonkwang Lee, Myeongjang Pyeon, and Gunhee Kim. IB-GAN: Disentangled representation learning with information bottleneck GAN. In AAAI, 2021.
[15] Ian T Jolliffe. Principal component analysis. 1986.
[16] Takuhiro Kaneko, Kaoru Hiramatsu, and Kunio Kashino. Generative attribute controller with conditional filtered generative adversarial networks. In IEEE Conf. Comput. Vis. Pattern Recog., 2017.
[17] Takuhiro Kaneko, Kaoru Hiramatsu, and Kunio Kashino. Generative adversarial image synthesis with decision tree latent controller. In IEEE Conf. Comput. Vis. Pattern Recog., 2018.
[18] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In IEEE Conf. Comput. Vis. Pattern Recog., 2019.
[19] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Int. Conf. Learn. Represent., 2015.
[20] Wonkwang Lee, Donggyun Kim, Seunghoon Hong, and Honglak Lee. High-fidelity synthesis with disentangled representation. In Eur. Conf. Comput. Vis., 2020.
[21] Zinan Lin, Kiran Thekumparampil, Giulia Fanti, and Sewoong Oh. InfoGAN-CR and ModelCentrality: Self-supervised model training and selection for disentangling GANs. In Int. Conf. Mach. Learn., 2020.
[22] Bingchen Liu, Yizhe Zhu, Zuohui Fu, Gerard de Melo, and Ahmed Elgammal. OOGAN: Disentangling GAN with one-hot sampling and orthogonal regularization. In AAAI, 2020.
[23] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Int. Conf. Comput. Vis., 2015.
[24] Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. Adversarial autoencoders. In Int. Conf. Learn. Represent., 2016.
[25] Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In Int. Conf. Comput. Vis., 2017.
[26] Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. Which training methods for GANs do actually converge? In Int. Conf. Mach. Learn., 2018.
[27] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv:1411.1784, 2014.
[28] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In Int. Conf. Learn. Represent., 2018.
[29] Takeru Miyato and Masanori Koyama. cGANs with projection discriminator. In Int. Conf. Learn. Represent., 2018.
[30] Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. f-GAN: Training generative neural samplers using variational divergence minimization. In Adv. Neural Inform. Process. Syst., 2016.
[31] Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier GANs. In Int. Conf. Mach. Learn., 2017.
[32] Antoine Plumerault, Hervé Le Borgne, and Céline Hudelot. Controlling generative models with continuous factors of variations. In Int. Conf. Learn. Represent., 2019.
[33] Aditya Ramesh, Youngduck Choi, and Yann LeCun. A spectral regularizer for unsupervised disentanglement. arXiv:1812.01161, 2018.
[34] Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, and Thomas Hofmann. Stabilizing training of generative adversarial networks through regularization. In Adv. Neural Inform. Process. Syst., 2017.
[35] Yujun Shen, Ceyuan Yang, Xiaoou Tang, and Bolei Zhou. InterFaceGAN: Interpreting the disentangled face representation learned by GANs. IEEE Trans. Pattern Anal. Mach. Intell., 2020.
[36] Yujun Shen and Bolei Zhou. Closed-form factorization of latent semantics in GANs. In IEEE Conf. Comput. Vis. Pattern Recog., 2021.
[37] Michael E Tipping and Christopher M Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):611–622, 1999.
[38] Ceyuan Yang, Yujun Shen, and Bolei Zhou. Semantic hierarchy emerges in deep generative representations for scene synthesis. Int. J. Comput. Vis., 2021.
[39] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In Eur. Conf. Comput. Vis., 2014.
[40] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N Metaxas. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Int. Conf. Comput. Vis., 2017.
[41] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene CNNs. In Int. Conf. Learn. Represent., 2015.
[42] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Int. Conf. Comput. Vis., 2017.
Appendix A for EigenGAN
In Sec. 1-7, we derive the analytical maximum likelihood estimation (MLE) result for the linear case of the proposed EigenGAN. Sec. 8 discusses the MLE result and the relation among the linear EigenGAN, Principal Component Analysis (PCA) [1], and Probabilistic PCA [2].
1. The Likelihood
Then for $n$ observations $\{\mathbf{x}_i\}_{i=1}^{n}$, the log-likelihood is
$$\mathcal{L}_1 = -\frac{n}{2}\left\{ d\log 2\pi + \log|\mathbf{C}| + \frac{1}{n}\sum_{i=1}^{n}\left(\mathbf{x}_i-\boldsymbol{\mu}\right)^{T}\mathbf{C}^{-1}\left(\mathbf{x}_i-\boldsymbol{\mu}\right) \right\}. \tag{10}$$
According to Eq. (7), only the squared value of L affects the probability density function, therefore we can assume the elements of L to be non-negative. Further, for convenience of the following analysis, without loss of generality, we organize L by grouping and sorting it by the value of the diagonal elements:
$$\mathbf{L} = \operatorname{diag}\left(l_1\mathbf{I}_{d_1},\, l_2\mathbf{I}_{d_2},\, \cdots,\, l_p\mathbf{I}_{d_p}\right), \tag{11}$$
where $l_1 > l_2 > \cdots > l_p \ge 0$; $\mathbf{I}_{d_j}$ denotes a $d_j \times d_j$ identity matrix, $d_j \neq 0$, and $d_1 + d_2 + \cdots + d_p = q$. According to Eq. (9) and (11), M also has a grouped form:
$$\mathbf{M} = \operatorname{diag}\left(\left(\left(l_1^2+\sigma^2\right)^{-1}-\sigma^{-2}\right)\mathbf{I}_{d_1},\, \cdots,\, \left(\left(l_p^2+\sigma^2\right)^{-1}-\sigma^{-2}\right)\mathbf{I}_{d_p}\right). \tag{12}$$
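As a numerical sanity check of Eq. (12) (a sketch assuming, as in the linear model, that $\mathbf{C} = \mathbf{U}\mathbf{L}^2\mathbf{U}^T + \sigma^2\mathbf{I}$ with orthonormal U — the definition of C is not repeated in this excerpt), M satisfies the Woodbury-style identity $\mathbf{C}^{-1} = \sigma^{-2}\mathbf{I} + \mathbf{U}\mathbf{M}\mathbf{U}^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, q, sigma2 = 6, 3, 0.5

# Orthonormal basis U (d x q) via QR decomposition.
U, _ = np.linalg.qr(rng.standard_normal((d, q)))
L = np.diag([2.0, 2.0, 1.0])          # grouped diagonal L with l1 = l2 > l3

# Assumed model covariance: C = U L^2 U^T + sigma^2 I.
C = U @ L**2 @ U.T + sigma2 * np.eye(d)

# M from Eq. (12): (L^2 + sigma^2 I)^{-1} - sigma^{-2} I (diagonal).
M = np.linalg.inv(L**2 + sigma2 * np.eye(q)) - np.eye(q) / sigma2

# Woodbury-style identity: C^{-1} = sigma^{-2} I + U M U^T.
C_inv = np.eye(d) / sigma2 + U @ M @ U.T
assert np.allclose(C @ C_inv, np.eye(d))
```

The check passes for any orthonormal U and positive σ², which is why M in Eq. (12) only needs the grouped diagonal entries.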
2. Determination of µ
Since L1 is a concave function of µ, the stationary point above is also the global maximum.
3. Determination of U: Part (1)
Substituting Eq. (15) into the log-likelihood L1 (10), we obtain a new objective:
$$\mathcal{L}_2 = -\frac{n}{2}\left\{ d\log 2\pi + \log|\mathbf{C}| + \operatorname{tr}\left(\mathbf{S}\mathbf{C}^{-1}\right) \right\}, \tag{16}$$
where
$$\mathbf{S} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathbf{x}_i-\bar{\mathbf{x}}\right)\left(\mathbf{x}_i-\bar{\mathbf{x}}\right)^{T}, \tag{17}$$
i.e., the covariance matrix of the data. According to Eq. (4), the maximization of L2 (16) with respect to U is a constrained optimization as below:
$$\max_{\mathbf{U}}\ \mathcal{L}_2 \quad \text{subject to} \quad \mathbf{U}^{T}\mathbf{U} = \mathbf{I}.$$
Introducing the Lagrange multiplier H, the Lagrangian function is
$$\begin{aligned}\mathcal{L}_{\mathbf{U}} &= \mathcal{L}_2 + \operatorname{tr}\left(\mathbf{H}^{T}\left(\mathbf{U}^{T}\mathbf{U}-\mathbf{I}\right)\right)\\ &= -\frac{n}{2}\left\{ d\log 2\pi + \log|\mathbf{C}| + \operatorname{tr}\left(\mathbf{S}\mathbf{C}^{-1}\right)\right\} + \operatorname{tr}\left(\mathbf{H}^{T}\left(\mathbf{U}^{T}\mathbf{U}-\mathbf{I}\right)\right). \end{aligned} \tag{18}$$
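The step from Eq. (10) to Eq. (16) — substituting µ = x̄ and rewriting the quadratic sum as tr(SC⁻¹) — can be checked numerically; a minimal sketch (the data and the positive-definite C are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 4
X = rng.standard_normal((n, d))
C = np.cov(X.T) + np.eye(d)            # any symmetric positive-definite C
C_inv = np.linalg.inv(C)
mu = X.mean(axis=0)                    # the maximizing mu: the sample mean

# Eq. (10) with mu = x_bar: sum of per-sample quadratic forms.
quad = sum((x - mu) @ C_inv @ (x - mu) for x in X)
L1 = -n / 2 * (d * np.log(2 * np.pi) + np.linalg.slogdet(C)[1] + quad / n)

# Eq. (16): the same value via the data covariance S of Eq. (17).
S = (X - mu).T @ (X - mu) / n
L2 = -n / 2 * (d * np.log(2 * np.pi) + np.linalg.slogdet(C)[1]
               + np.trace(S @ C_inv))

assert np.isclose(L1, L2)
```

The identity behind the check is $\sum_i \mathbf{v}_i^T\mathbf{A}\mathbf{v}_i = n\operatorname{tr}(\mathbf{S}\mathbf{A})$ with $\mathbf{v}_i = \mathbf{x}_i - \bar{\mathbf{x}}$, which holds for any matrix A.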
Then the partial derivative of LU with respect to U is
$$\frac{\partial \mathcal{L}_{\mathbf{U}}}{\partial \mathbf{U}} = -n\left\{ \mathbf{U}\left[\left(\mathbf{L}^2+\sigma^2\mathbf{I}\right)^{-1}\mathbf{L}^2 + \frac{\mathbf{H}+\mathbf{H}^{T}}{2}\right] + \mathbf{S}\mathbf{U}\mathbf{M} \right\}. \tag{19}$$
At the stationary point,
$$\mathbf{S}\mathbf{U}\mathbf{M} = -\mathbf{U}\left[\left(\mathbf{L}^2+\sigma^2\mathbf{I}\right)^{-1}\mathbf{L}^2 + \frac{\mathbf{H}+\mathbf{H}^{T}}{2}\right]. \tag{20}$$
Left multiplying the above equation by Uᵀ and using UᵀU = I, we obtain
$$\mathbf{U}^{T}\mathbf{S}\mathbf{U}\mathbf{M} = -\left(\mathbf{L}^2+\sigma^2\mathbf{I}\right)^{-1}\mathbf{L}^2 - \frac{\mathbf{H}+\mathbf{H}^{T}}{2}. \tag{21}$$
The right-hand side of the above equation is a symmetric matrix, therefore the left-hand side UᵀSUM is also symmetric. Furthermore, since both UᵀSU and M are also symmetric, to satisfy the symmetry of UᵀSUM, according to the form of M in Eq. (12), UᵀSU must have a similar block diagonal form:
$$\mathbf{U}^{T}\mathbf{S}\mathbf{U} = \operatorname{diag}\left(\mathbf{A}_1, \mathbf{A}_2, \cdots, \mathbf{A}_p\right) \tag{22}$$
$$= \operatorname{diag}\left(\mathbf{Q}_1^{T}\boldsymbol{\Lambda}_1\mathbf{Q}_1,\, \mathbf{Q}_2^{T}\boldsymbol{\Lambda}_2\mathbf{Q}_2,\, \cdots,\, \mathbf{Q}_p^{T}\boldsymbol{\Lambda}_p\mathbf{Q}_p\right), \tag{23}$$
where Aj is a dj × dj symmetric matrix, and QjᵀΛjQj is the eigendecomposition of Aj. Using Eq. (20), (21), and (23), we can derive
$$\mathbf{S}\mathbf{U}\mathbf{M} = \mathbf{U}\cdot \operatorname{diag}\left(\mathbf{Q}_1^{T}\boldsymbol{\Lambda}_1\mathbf{Q}_1,\, \mathbf{Q}_2^{T}\boldsymbol{\Lambda}_2\mathbf{Q}_2,\, \cdots,\, \mathbf{Q}_p^{T}\boldsymbol{\Lambda}_p\mathbf{Q}_p\right)\cdot \mathbf{M}. \tag{24}$$
Substituting Eq. (12) and (13) into Eq. (24), we obtain
$$\left(\left(l_j^2+\sigma^2\right)^{-1}-\sigma^{-2}\right)\mathbf{S}\mathbf{U}_j\mathbf{Q}_j^{T} = \left(\left(l_j^2+\sigma^2\right)^{-1}-\sigma^{-2}\right)\mathbf{U}_j\mathbf{Q}_j^{T}\boldsymbol{\Lambda}_j \tag{25}$$
$$\Longrightarrow\quad \mathbf{S}\mathbf{U}_j\mathbf{Q}_j^{T} = \mathbf{U}_j\mathbf{Q}_j^{T}\boldsymbol{\Lambda}_j,\quad j = 1, 2, \cdots, p_0, \tag{26}$$
where
$$p_0 = \begin{cases} p, & l_p > 0,\\ p-1, & l_p = 0. \end{cases} \tag{27}$$
Eq. (26) states that the columns of UjQjᵀ are eigenvectors of S, therefore
$$\mathbf{U}_j = \mathbf{V}_j\mathbf{Q}_j,\quad j = 1, 2, \cdots, p_0, \tag{28}$$
where the columns of Vj are orthonormal eigenvectors of S with corresponding eigenvalues $\boldsymbol{\Lambda}_j = \operatorname{diag}\left(\lambda_{j1}, \lambda_{j2}, \cdots, \lambda_{jd_j}\right)$, and Qj is an arbitrary orthogonal matrix. Note that if p0 = p − 1, i.e., lp = 0, Up is an arbitrary matrix.
4. Determination of $\mathbf{L} = \operatorname{diag}\left(l_1\mathbf{I}_{d_1}, l_2\mathbf{I}_{d_2}, \cdots, l_p\mathbf{I}_{d_p}\right)$
Substituting Eq. (26), Eq. (7)-(8), and Eq. (11)-(12) into L2 (16) and after some manipulation, a new objective is derived:
$$\mathcal{L}_3 = -\frac{n}{2}\left\{ d\log 2\pi + \sum_{j=1}^{p_0}\left[ d_j\log\left(l_j^2+\sigma^2\right) + \left(l_j^2+\sigma^2\right)^{-1}\operatorname{tr}\left(\boldsymbol{\Lambda}_j\right)\right] + \left(d-q_0\right)\log\sigma^2 + \sigma^{-2}\left[\operatorname{tr}\left(\mathbf{S}\right) - \sum_{j=1}^{p_0}\operatorname{tr}\left(\boldsymbol{\Lambda}_j\right)\right]\right\}, \tag{29}$$
where
$$q_0 = \sum_{j=1}^{p_0} d_j. \tag{30}$$
Then the partial derivative of L3 with respect to lj is
$$\frac{\partial \mathcal{L}_3}{\partial l_j} = -n\left\{ \frac{d_j l_j}{l_j^2+\sigma^2} - \frac{l_j\operatorname{tr}\left(\boldsymbol{\Lambda}_j\right)}{\left(l_j^2+\sigma^2\right)^2}\right\},\quad j = 1, 2, \cdots, p_0, \tag{31}$$
and the stationary point is
$$l_j^2 = \frac{\operatorname{tr}\left(\boldsymbol{\Lambda}_j\right)}{d_j} - \sigma^2,\quad j = 1, 2, \cdots, p_0. \tag{32}$$
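The stationary point of Eq. (32) can be confirmed against the lj-dependent part of L3 by finite differences; a sketch with hypothetical values of d_j, tr(Λ_j), and σ²:

```python
import numpy as np

sigma2 = 0.3
dj, trLj = 2, 4.0                      # hypothetical d_j and tr(Lambda_j)

def term(lj):
    # The l_j-dependent part of L3 in Eq. (29), up to the -n/2 factor.
    return dj * np.log(lj**2 + sigma2) + trLj / (lj**2 + sigma2)

lj_star = np.sqrt(trLj / dj - sigma2)  # stationary point from Eq. (32)

# Central finite difference of the term at the stationary point.
eps = 1e-6
grad = (term(lj_star + eps) - term(lj_star - eps)) / (2 * eps)
assert abs(grad) < 1e-5                # derivative vanishes, as in Eq. (31)
```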
5. Determination of σ
6. Determination of Λj
5
According to Sec. 3, the diagonal elements of $\boldsymbol{\Lambda}_j = \operatorname{diag}\left(\lambda_{j1}, \lambda_{j2}, \cdots, \lambda_{jd_j}\right)$, $j = 1, \cdots, p_0$, are the eigenvalues of S, therefore the problem here is to select the suitable eigenvalues from S and separate them into different Λj's to maximize L5 (36). Using Jensen's inequality:
$$\log\frac{\operatorname{tr}\left(\boldsymbol{\Lambda}_j\right)}{d_j} = \log\frac{\lambda_{j1}+\cdots+\lambda_{jd_j}}{d_j} \ge \frac{\log\lambda_{j1}+\cdots+\log\lambda_{jd_j}}{d_j}, \tag{37}$$
and the equality holds if and only if $\lambda_{j1} = \cdots = \lambda_{jd_j}$. That means, no matter which eigenvalues we select, L5 (36) is maximized only by grouping equal eigenvalues together. Therefore the optimal grouping is
$$\boldsymbol{\Lambda}_j = \operatorname{diag}\left(\lambda_{j1}, \lambda_{j2}, \cdots, \lambda_{jd_j}\right) = \operatorname{diag}\left(\lambda_j, \lambda_j, \cdots, \lambda_j\right) = \lambda_j\mathbf{I}_{d_j}, \tag{38}$$
where λj is an eigenvalue of S whose algebraic multiplicity is ≥ dj, and without loss of generality, we can assume $\lambda_1 > \lambda_2 > \cdots > \lambda_{p_0}$.
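The inequality (37) and its equality condition are straightforward to confirm numerically; a minimal sketch:

```python
import numpy as np

def jensen_sides(lams):
    # Both sides of Eq. (37): log of the mean vs. mean of the logs.
    lams = np.asarray(lams, dtype=float)
    return np.log(lams.mean()), np.log(lams).mean()

# Unequal eigenvalues: strict inequality.
lhs, rhs = jensen_sides([4.0, 1.0, 1.0])
assert lhs > rhs

# Equal eigenvalues: equality, so grouping equal values is optimal.
lhs, rhs = jensen_sides([2.0, 2.0, 2.0])
assert np.isclose(lhs, rhs)
```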
been selected. Maximizing L6 (39) requires maximizing
$$\mathcal{F} = \frac{\sum_{i=q_0+1}^{d}\log\gamma_i}{d-q_0} - \log\frac{\sum_{i=q_0+1}^{d}\gamma_i}{d-q_0}, \tag{40}$$
which only requires $\gamma_i$, $i = q_0+1, \cdots, d$, to be adjacent in the ordered eigenvalues. However, according to Eq. (32), we need $\lambda_j > \sigma^2$, $j = 1, \cdots, p_0$, and then from Eq. (35), the only choice to maximize F is to let $\gamma_i$, $i = q_0+1, \cdots, d$, be the $d-q_0$ smallest eigenvalues. Meanwhile, a larger $q_0$ leads to a larger F, therefore,
$$p_0 = p, \tag{41}$$
$$q_0 = q = \sum_{j=1}^{p} d_j. \tag{42}$$
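A small numerical illustration of Eq. (40), using a hypothetical sorted spectrum: F is larger when the discarded γi are the adjacent smallest eigenvalues than when they are scattered, and it is never positive:

```python
import numpy as np

def F(gammas):
    # Eq. (40): mean of logs minus log of mean over the discarded eigenvalues.
    g = np.asarray(gammas, dtype=float)
    return np.log(g).mean() - np.log(g.mean())

# Hypothetical eigenvalues of S, sorted in descending order.
spectrum = [9.0, 5.0, 4.9, 1.0, 0.9, 0.8]

F_smallest = F(spectrum[-3:])      # discard the 3 smallest (adjacent) values
F_scattered = F([spectrum[0], spectrum[3], spectrum[5]])  # a scattered choice

assert F_smallest > F_scattered    # adjacent smallest eigenvalues win
assert F_smallest <= 0             # Jensen: F is never positive
```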
According to Eq. (28) and Eq. (38), the columns of Vj are orthonormal eigenvectors of S corresponding to the same eigenvalue λj. Since Qj is an arbitrary orthogonal matrix, the columns of Uj = VjQj are still orthonormal eigenvectors corresponding to the eigenvalue λj.
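This rotational freedom within an eigenspace can be checked directly: rotating orthonormal eigenvectors of a repeated eigenvalue by an arbitrary orthogonal Qj yields another orthonormal eigenbasis. A sketch with a synthetic S:

```python
import numpy as np

rng = np.random.default_rng(2)

# Symmetric S with a repeated eigenvalue lambda_j = 3 (multiplicity 2).
B, _ = np.linalg.qr(rng.standard_normal((4, 4)))
S = B @ np.diag([3.0, 3.0, 1.0, 0.5]) @ B.T
Vj = B[:, :2]                          # orthonormal eigenvectors for lambda_j = 3

# An arbitrary 2x2 orthogonal Q_j (here, a rotation).
t = 0.7
Qj = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

Uj = Vj @ Qj
assert np.allclose(S @ Uj, 3.0 * Uj)        # still eigenvectors of S
assert np.allclose(Uj.T @ Uj, np.eye(2))    # still orthonormal
```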
Summarizing the above analysis (Eq. (15), (32), (35), and Sec. 7), the global maximum of the likelihood with respect to the model parameters is
$$\boldsymbol{\mu} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i, \tag{43}$$
$$\sigma^2 = \frac{\operatorname{tr}\left(\mathbf{S}\right) - \operatorname{tr}\left(\boldsymbol{\Lambda}\right)}{d-q}, \tag{44}$$
$$\mathbf{L}^2 = \boldsymbol{\Lambda} - \sigma^2\mathbf{I}, \tag{45}$$
$$\mathbf{U} = \left[\mathbf{u}_1, \cdots, \mathbf{u}_q\right], \tag{46}$$
where the elements of the diagonal matrix Λ are the q largest eigenvalues of the data covariance S, and u1, ..., uq are the q principal eigenvectors corresponding to Λ. As can be seen, under maximum likelihood estimation, the basis vectors U of our linear model are exactly the same as those learned by PCA [1]. Moreover, the diagonal elements of L represent the "importance" or "energy" of the corresponding basis vectors, and from Eq. (45), when σ → 0, the elements of L² approach the q largest eigenvalues. Besides, as shown in Eq. (44), the energy (σ²) of the noise is the average of the discarded eigenvalues, which exactly compensates for the energy missed by the subspace model.
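Eqs. (43)-(46) can be exercised on synthetic data (a sketch; the dimensions and the data distribution are arbitrary choices): σ² from Eq. (44) equals the mean of the discarded eigenvalues, and L² from Eq. (45) stays positive when the kept eigenvalues exceed σ²:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, q = 500, 5, 2
# Axis-aligned data with two dominant directions.
X = rng.standard_normal((n, d)) * np.array([3.0, 2.0, 0.5, 0.4, 0.3])

mu = X.mean(axis=0)                          # Eq. (43): the sample mean
S = (X - mu).T @ (X - mu) / n                # data covariance, Eq. (17)

eigvals, eigvecs = np.linalg.eigh(S)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort descending

Lam = eigvals[:q]                            # the q largest eigenvalues
U = eigvecs[:, :q]                           # Eq. (46): principal eigenvectors
sigma2 = eigvals[q:].mean()                  # Eq. (44): mean of the discarded ones
L_sq = Lam - sigma2                          # Eq. (45): L^2 = Lambda - sigma^2

# Eq. (44) written as (tr(S) - tr(Lambda)) / (d - q) gives the same sigma^2.
assert np.isclose(sigma2, (np.trace(S) - Lam.sum()) / (d - q))
assert np.all(L_sq > 0)                      # kept eigenvalues exceed sigma^2
```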
Our model can be viewed as a constrained case of Probabilistic PCA (PPCA) [2]:
$$\mathbf{x} = \mathbf{W}\mathbf{z} + \boldsymbol{\mu} + \sigma\boldsymbol{\epsilon}, \tag{47}$$
whose maximum likelihood estimate is
$$\mathbf{W} = \mathbf{V}\left(\boldsymbol{\Lambda} - \sigma^2\mathbf{I}\right)^{\frac{1}{2}}\mathbf{Q}, \tag{48}$$
where the columns of V are the principal eigenvectors of the data covariance, Λ is a diagonal matrix whose elements are the corresponding eigenvalues, and Q is an arbitrary orthogonal matrix. Therefore, the MLE result of PPCA is nondeterministic due to the arbitrary Q. Although W contains the information of the principal eigenvectors, the columns of W itself do not exhibit explicit orthogonality. Our model (1) restricts the W of PPCA (47) to the special form UL, where U has orthonormal columns and L is a diagonal matrix. In consequence, the MLE result of our model is deterministic (Eq. (43)-(46)). Moreover, our model can build a linear subspace with the principal eigenvectors explicitly as the basis vectors, which is very suitable for extension to the nonlinear case to learn layer-wise interpretable dimensions, as introduced in the main text.
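The nondeterminism of the PPCA solution (48) can be illustrated numerically (a sketch with hypothetical V, Λ, and σ²): every orthogonal Q yields the same model covariance WWᵀ + σ²I, hence the same likelihood, while W itself differs:

```python
import numpy as np

rng = np.random.default_rng(4)
d, q, sigma2 = 4, 2, 0.2

V, _ = np.linalg.qr(rng.standard_normal((d, q)))  # hypothetical eigenvectors
Lam = np.diag([3.0, 1.0])                         # corresponding eigenvalues

def W_of(Q):
    # Eq. (48): W = V (Lambda - sigma^2 I)^{1/2} Q.
    return V @ np.sqrt(Lam - sigma2 * np.eye(q)) @ Q

t = 0.9
Q = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

W1, W2 = W_of(np.eye(q)), W_of(Q)
# Same model covariance (same likelihood) for every orthogonal Q ...
assert np.allclose(W1 @ W1.T, W2 @ W2.T)
# ... but the factor W itself is not unique.
assert not np.allclose(W1, W2)
```

In contrast, the UL factorization pins down the orthonormal columns and the diagonal scales separately, which is why the result in Eq. (43)-(46) is deterministic up to the signs and ordering of the eigenvectors.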
References
[1] Ian T Jolliffe. Principal component analysis. 1986. 1, 8
[2] Michael E Tipping and Christopher M Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):611-622, 1999. 1, 8
Appendix B for EigenGAN
1. All Interpretable Dimensions of Each Layer
Fig. 1 to Fig. 5 show all interpretable dimensions of each generator layer learnt by the proposed EigenGAN on the CelebA dataset [3], Fig. 6 to Fig. 9 show all dimensions learnt on the Anime dataset [1], and Fig. 10 to Fig. 14 show all dimensions learnt on the FFHQ dataset [2]. We traverse each dimension from −4.5σ to 4.5σ and omit the dimensions with almost no change. "None" in these figures means the corresponding part of the dimension is uninterpretable or difficult to assign an attribute name. The smaller the index, the deeper the layer.
3. Network Architectures
The architectures of the generator and the discriminator of EigenGAN are shown in Fig. 16.
References
[1] Gwern Branwen, Anonymous, and Danbooru Community. Danbooru2019 portraits: A large-scale anime head illustration dataset, 2019. 1, 7, 8, 9, 10
[2] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In IEEE Conf. Comput. Vis. Pattern Recog., 2019. 1, 11, 12, 13, 14, 15
[3] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Int. Conf. Comput. Vis., 2015. 1, 2, 3, 4, 5, 6
[4] Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. Rectifier nonlinearities improve neural network acoustic models. In Int. Conf. Mach. Learn., 2013. 17
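The traversal protocol described in Sec. 1 above can be sketched as follows; the `traverse` function and the dict-of-latents interface are hypothetical stand-ins for the actual EigenGAN generator inputs, and only the offsets match the figures:

```python
import numpy as np

# Traversal offsets used in the figures, in units of the prior std sigma.
OFFSETS = np.array([-4.5, -3.0, -2.0, -1.0, 1.0, 2.0, 3.0, 4.5])

def traverse(z, layer, dim, sigma=1.0):
    """Return one latent code per offset, varying a single eigen-dimension.

    z maps layer index -> latent vector; `layer` and `dim` pick the
    eigen-dimension to traverse (hypothetical interface).
    """
    codes = []
    for off in OFFSETS:
        z_new = {k: v.copy() for k, v in z.items()}
        z_new[layer][dim] = off * sigma   # set the coefficient to off * sigma
        codes.append(z_new)
    return codes

# Example: 6 layers with 6 dimensions each, traversing layer 3, dimension 0.
z = {l: np.zeros(6) for l in range(1, 7)}
codes = traverse(z, layer=3, dim=0)
assert len(codes) == len(OFFSETS)
assert codes[0][3][0] == -4.5 and codes[-1][3][0] == 4.5
```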
[Figure 1: Interpretable dimensions of layer 6 (the shallowest) for CelebA dataset [3]; traversals from −4.5σ to 4.5σ. Labeled dimensions: L6 D1 Hue (Blue-Pink), L6 D2 Hue (Blue-Orange), L6 D3 Hue (Yellow-Purple), L6 D4 Hue (Orange-White), L6 D5 Hue (Green-Red).]
[Figure 2: Interpretable dimensions of layer 5 (left) and layer 4 (right) for CelebA dataset [3]. Labeled dimensions: L5 D4 Lighting, L5 D5 Gaze; L4 D4 Lighting, L4 D5 Smiling.]
[Figure 3: Interpretable dimensions of layer 3 for CelebA dataset [3]. Labeled dimensions: L3 D3 Race, L3 D4 Bangs, L3 D5 Hair Style, L3 D6 Body Side.]
[Figure 4: Interpretable dimensions of layer 2 for CelebA dataset [3]. "None" means uninterpretable. Labeled dimension: L2 D5 Body Pose.]
[Figure 5: Interpretable dimensions of layer 1 (the deepest) for CelebA dataset [3]. "None" means uninterpretable. Labeled dimension: L1 D2 Distortion / None / Male (→) & Facial Hair & Hair or Hat.]
[Figure 6: Interpretable dimensions of layer 6 (left, the shallowest) and layer 5 (right) for Anime dataset [1]; traversals from −4.5σ to 4.5σ.]
[Figure 7: Interpretable dimensions of layer 4 (left) and layer 3 (right) for Anime dataset [1].]
[Figure 8: Interpretable dimensions of layer 2 for Anime dataset [1]. "None" means uninterpretable. Labeled dimensions: L2 D4 Matureness, L2 D5 Painting Style, L2 D6 Hair Length.]
[Figure 9: Interpretable dimensions of layer 1 (the deepest) for Anime dataset [1]. "None" means uninterpretable.]
[Figure 10: Interpretable dimensions of layer 6 (the shallowest) for FFHQ dataset [2]. Labeled dimensions: L6 D1 Hue (Blue-Yellow), L6 D2 Hue (Red-White), L6 D3 Hue (Red-Green).]
[Figure 11: Interpretable dimensions of layer 5 (left) and layer 4 (right) for FFHQ dataset [2]. Labeled dimensions: L5 D4 Squinting; L4 D4 Race, L4 D5 Race, L4 D6 Background (Symmetric).]
[Figure 12: Interpretable dimensions of layer 3 for FFHQ dataset [2]. Labeled dimensions: L3 D1 Pose (Yaw), L3 D3 Background, L3 D5 Smiling.]
[Figure 13: Interpretable dimensions of layer 2 for FFHQ dataset [2]. Labeled dimensions: L2 D3 Age, L2 D5 Hair Volume, L2 D6 Body Side.]
[Figure 14: Interpretable dimensions of layer 1 (the deepest) for FFHQ dataset [2]. "None" means uninterpretable. Labeled dimensions: L1 D1 Gender, L1 D6 Eyeglasses.]
Figure 15. Effect of the importance values $\mathbf{L}_i = \operatorname{diag}\left(l_{i1}, \ldots, l_{iq}\right)$. Dimensions with a large importance value control large variations, while dimensions with a small importance value control small variations.
[Figure 16 diagram: the generator maps the latent input through FC(512*4*4) → Reshape(512, 4, 4), followed by a stack of LReLU → DeConv(2^(10−i), 3, 2) upsampling blocks with DeConv(2^(11−i), 1, 1) subspace-injection branches, ending in LReLU → Conv(3, 7, 1) → Tanh; the discriminator stacks Conv → LReLU blocks from Conv(16, 7, 1) up through Conv(512, 3, 1), then FC(512) → LReLU → FC(1) producing the logit.]
Figure 16. Network architectures of EigenGAN. Conv(d, k, s) and DeConv(d, k, s) denote a convolutional layer and a transposed convolutional layer with d as the output dimensions, k as the kernel size, and s as the stride. FC(d) denotes a fully connected layer with d as the output dimensions. LReLU denotes Leaky ReLU [4].