Philip O. Ogunbona
1 Autoencoders
2 Generative Adversarial Networks
3 References
Conceptually, an autoencoder is a feedforward network trained to copy its input to its output (albeit imperfectly).
Any autoencoder architecture can be trained without the risk of excess capacity or of learning a trivial identity mapping by using regularization.
Regularization can impart properties to the loss function:
sparsity of representation
smallness of derivative of representation
robustness to noise
robustness to missing data
Sparse autoencoders are useful for learning features that can serve as input to other tasks, e.g. classification (think of semi-supervised classification).
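As an illustration, a sparse autoencoder can be sketched in a few dozen lines of NumPy: a sigmoid encoder, a linear decoder, and an L1 penalty on the code added to the reconstruction loss. The architecture, penalty weight, and toy data below are assumptions for illustration only, not a setup from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in R^4 lying near a 2-D subspace.
basis = rng.normal(size=(2, 4))
X = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 4))

# Encoder h = f(x) = sigmoid(x W1 + b1), linear decoder g(h) = h W2 + b2.
n_in, n_hid = 4, 8
W1 = 0.1 * rng.normal(size=(n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = 0.1 * rng.normal(size=(n_hid, n_in)); b2 = np.zeros(n_in)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

lam = 1e-3   # weight of the L1 sparsity penalty on the code
lr = 0.05    # gradient-descent step size

def loss(X):
    h = sigmoid(X @ W1 + b1)
    Xr = h @ W2 + b2
    return np.mean((Xr - X) ** 2) + lam * np.mean(np.abs(h))

loss_before = loss(X)
for _ in range(500):
    h = sigmoid(X @ W1 + b1)
    Xr = h @ W2 + b2
    n = X.shape[0]
    dXr = 2.0 * (Xr - X) / (n * n_in)          # gradient of the MSE term
    dW2 = h.T @ dXr; db2 = dXr.sum(0)
    dh = dXr @ W2.T + lam * np.sign(h) / (n * n_hid)  # + sparsity gradient
    da = dh * h * (1.0 - h)                    # back through the sigmoid
    dW1 = X.T @ da; db1 = da.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
loss_after = loss(X)
print(loss_before, loss_after)
```

After training, the code h can be fed to a downstream classifier; the sparsity penalty encourages many code units to stay near zero.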
log pmodel(h, x) = log pmodel(h) + log pmodel(x | h)    (3)
The training process forces f and g to implicitly learn the structure of pdata(x).
Contractive autoencoder
Regularization is introduced on the code h = f(x) to encourage the derivatives of f to be as small as possible:

Ω(h) = λ ||∂f(x)/∂x||_F^2
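The penalty can be estimated numerically for any encoder by building the Jacobian column by column with finite differences. The sigmoid encoder below is a stand-in chosen for illustration; its Jacobian also has a closed form, diag(h(1 − h)) W, which lets us cross-check the estimate.

```python
import numpy as np

def encoder(x, W, b):
    """A stand-in sigmoid encoder f(x) = sigmoid(W x + b)."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

def contractive_penalty(x, W, b, lam=0.1, eps=1e-5):
    """Omega(h) = lam * ||df(x)/dx||_F^2, Jacobian by central differences."""
    cols = []
    for i in range(x.size):
        e = np.zeros(x.size); e[i] = eps
        cols.append((encoder(x + e, W, b) - encoder(x - e, W, b)) / (2 * eps))
    J = np.stack(cols, axis=1)          # Jacobian df/dx, shape (n_hid, n_in)
    return lam * np.sum(J ** 2)         # squared Frobenius norm, scaled by lam

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4)); b = np.zeros(3); x = rng.normal(size=4)

# Closed-form Jacobian for this encoder: diag(h * (1 - h)) @ W.
h = encoder(x, W, b)
J_analytic = (h * (1 - h))[:, None] * W
print(contractive_penalty(x, W, b), 0.1 * np.sum(J_analytic ** 2))
```

In practice the Jacobian term would be computed by automatic differentiation and added to the reconstruction loss; the finite-difference version here is only a sanity check.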
Figure 5: Two models learned while training a GAN: the Discriminator (D) and the Generator (G). The models are implemented using neural networks, but any differentiable system (mapping) can be used (Creswell et al. 2018).
G : z ↦ G(z) ∈ R^|x|
D : x ↦ D(x) ∈ (0, 1)
pdata(x) denotes the probability density function over the data samples (in R^|x|) and pg(x) the distribution of the samples produced by the generator.
During training we define objective functions for the generator, JG(ΘG; ΘD), and for the discriminator, JD(ΘD; ΘG).
Training GAN
We find the parameters of the discriminator that maximize its classification accuracy, and the parameters of the generator that maximally confuse the discriminator.
The cost of training is evaluated with a value function; we solve the following mini-max problem:

max_D min_G V(G, D)

where

V(G, D) = E_pdata(x)[log D(x)] + E_pg(x)[log(1 − D(x))]
Parameters of one model are updated while the parameters of the other are fixed
Optimal discriminator is unique (Goodfellow et al. 2014)
D*(x) = pdata(x) / (pdata(x) + pg(x))
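This closed form can be checked numerically: at any fixed point x, the value function contributes pdata(x) log D(x) + pg(x) log(1 − D(x)), and maximizing that expression over D(x) on a fine grid recovers D*(x). The density values below are arbitrary illustrative numbers, not from the text.

```python
import numpy as np

# Assumed density values at one point x, for illustration only.
pdata, pg = 0.7, 0.2
d_star = pdata / (pdata + pg)   # the claimed optimal discriminator output

def pointwise_value(d):
    """Contribution of the value function V at this x, as a function of D(x)."""
    return pdata * np.log(d) + pg * np.log(1.0 - d)

# Brute-force maximization over a grid of discriminator outputs in (0, 1).
grid = np.linspace(0.01, 0.99, 99)
best = grid[np.argmax(pointwise_value(grid))]
print(d_star, best)
```

The grid maximizer lands on the grid point nearest d_star, consistent with the closed-form solution.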
Figure 7: The main loop of GAN training. Novel data samples, x′, may be drawn by
passing random samples, z, through the generator network. The gradient of the
discriminator may be updated k times before updating the generator. (Creswell et al.
2018)
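The main loop can be sketched on a one-dimensional toy problem. Everything below — a shift-only generator G(z) = z + θ, a logistic discriminator, the learning rates, and k = 1 discriminator steps — is an illustrative assumption, not the authors' setup; the point is only the alternating ascent structure.

```python
import numpy as np

rng = np.random.default_rng(0)
sig = lambda a: 1.0 / (1.0 + np.exp(-a))

# Real data: x ~ N(2, 1). Generator G(z) = z + theta with z ~ N(0, 1),
# so the generated distribution matches the data exactly when theta = 2.
mu_true = 2.0
theta = 0.0                       # generator parameter
w, b = 0.0, 0.0                   # logistic discriminator D(x) = sig(w*x + b)
lr_d, lr_g, batch, k = 0.1, 0.05, 128, 1

for _ in range(3000):
    for _ in range(k):            # k discriminator updates per generator update
        xd = mu_true + rng.normal(size=batch)     # real samples
        xg = theta + rng.normal(size=batch)       # generated samples
        sd, sg = sig(w * xd + b), sig(w * xg + b)
        # Ascend E[log D(x)] + E[log(1 - D(G(z)))] in (w, b).
        w += lr_d * (np.mean((1 - sd) * xd) - np.mean(sg * xg))
        b += lr_d * (np.mean(1 - sd) - np.mean(sg))
    # Non-saturating generator update: ascend E[log D(G(z))] in theta.
    xg = theta + rng.normal(size=batch)
    sg = sig(w * xg + b)
    theta += lr_g * np.mean((1 - sg) * w)

print(theta)   # should drift toward mu_true = 2
```

The gradients are written out by hand here because both models are tiny; in a real GAN each update is a backpropagation step through a neural network, with the parameters of one model held fixed while the other is updated.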
Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B. & Bharath, A. A. (2018),
‘Generative adversarial networks: An overview’, IEEE Signal Processing Magazine 35(1), 53–65.
Géron, A. (2017), Hands-on Machine Learning with Scikit-Learn and TensorFlow, O’Reilly Media,
Inc., CA, USA.
Goodfellow, I., Bengio, Y. & Courville, A. (2016), Deep Learning, MIT Press.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. &
Bengio, Y. (2014), Generative adversarial nets, in ‘Proc. Advances in Neural Information
Processing Systems Conf.’, Montreal, Quebec, Canada, pp. 2672–2680.