
Machine Learning: Algorithms and Applications

Philip O. Ogunbona

Advanced Multimedia Research Lab


University of Wollongong

Artificial Neural Networks and Deep Learning: An Introduction (III)


Autumn 2020



Outline

1 Autoencoders

2 Generative Adversarial Networks (GAN)

3 References



Autoencoders

Conceptually an autoencoder is a feedforward network trained to copy its input to its output (albeit imperfectly)

Structure (see Figure 1) has a hidden layer h describing the code representing the input

Autoencoder has two parts: an encoder function h = f(x) that generates the representative code of the input and a decoder function r = g(h) that produces a reconstruction from the code

Generalization of autoencoder to stochastic mappings: pencoder(h|x) and pdecoder(x|h)

Typical training strategy is similar to that used for feedforward networks - minibatch gradient descent

Figure 1: General structure of an autoencoder; input x maps to an output r (reconstruction) through internal representation or code h (Goodfellow et al. 2016)
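A minimal PyTorch sketch (not from the slides) of this structure and of one minibatch gradient-descent step; the layer sizes, activations, optimizer and MSE loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # encoder h = f(x) and decoder r = g(h)
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One minibatch gradient-descent step: the target is the input itself.
x = torch.rand(64, 784)                     # random minibatch standing in for data
loss = nn.functional.mse_loss(model(x), x)  # L(x, g(f(x)))
opt.zero_grad()
loss.backward()
opt.step()
```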



Stacked autoencoder

Practical autoencoder is a stack

Architecture of a stacked autoencoder is typically symmetrical with respect to the central hidden layer (the coding layer) (see Figure 2)

Figure 2: Example of stacked autoencoder used for the MNIST dataset; notice the 784 (28 × 28) input neurons; 300 hidden neurons; 150 central hidden neurons; a mirroring in the top layer (Géron 2017)
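A sketch of the symmetric 784-300-150-300-784 architecture in Figure 2, assuming PyTorch; activations and the final sigmoid are illustrative choices, not specified on the slide.

```python
import torch.nn as nn

stacked_ae = nn.Sequential(
    nn.Linear(784, 300), nn.ReLU(),     # hidden layer
    nn.Linear(300, 150), nn.ReLU(),     # central coding layer
    nn.Linear(150, 300), nn.ReLU(),     # mirrored hidden layer
    nn.Linear(300, 784), nn.Sigmoid(),  # reconstruction of the 28 x 28 input
)
```

A common refinement (not shown) is to tie the decoder weights to the transpose of the encoder weights so that the two mirrored halves share parameters.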



Undercomplete/Overcomplete autoencoders

Constraining h to have smaller dimension than x results in an undercomplete autoencoder

h captures the most salient features of the input

Learning entails minimizing a loss function

L(x, g(f(x)))     (1)

L penalizes g(f(x)) for being dissimilar to x

If the dimension of the code is greater than that of the input, we have an overcomplete autoencoder

Any autoencoder architecture can be trained without the risk of excess capacity learning a trivial identity mapping, by using regularization

Regularization can impart properties to the loss function (see the sketch after this list):
sparsity of representation
smallness of derivative of representation
robustness to noise
robustness to missing data
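A hedged sketch of the regularized objective implied above: reconstruction error plus a penalty on the code. The names f, g, omega and lam are placeholders, and the L1 penalty is just one example of a sparsity-imparting regularizer.

```python
import torch.nn.functional as F

def regularized_loss(x, f, g, omega, lam=1e-3):
    h = f(x)                                  # code
    r = g(h)                                  # reconstruction
    return F.mse_loss(r, x) + lam * omega(h)  # L(x, g(f(x))) + lam * Omega(h)

# Example penalty encouraging a sparse representation (first property above).
l1_penalty = lambda h: h.abs().mean()
```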



Autoencoders and Principal Component Analysis
(PCA)

With a linear decoder g(h) and mean squared error loss, an undercomplete autoencoder learns the same subspace as PCA

With a nonlinear encoder and decoder (respectively, f(x) and g(h)), an autoencoder can learn a more powerful generalization of PCA
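A small numerical sketch of the PCA connection, assuming PyTorch and NumPy with synthetic data: a linear autoencoder trained with squared error recovers (approximately) the subspace spanned by the top principal components. Sizes and hyperparameters are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn

X = np.random.randn(500, 10).astype(np.float32)
X -= X.mean(axis=0)                        # PCA assumes centred data

# Linear encoder/decoder, no activations, squared-error loss, 2-D code.
enc = nn.Linear(10, 2, bias=False)
dec = nn.Linear(2, 10, bias=False)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
xt = torch.from_numpy(X)
for _ in range(2000):
    loss = nn.functional.mse_loss(dec(enc(xt)), xt)
    opt.zero_grad(); loss.backward(); opt.step()

# Top-2 principal directions from the SVD for comparison: the columns of
# dec.weight span approximately the same subspace as Vt[:2].T.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
```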



Sparse autoencoders
Sparse autoencoder has cost function used for training in the form of
reconstruction error and sparsity penalty on the code layer h:

L(x, g(f (x))) + Ω(h) (2)

where h is the encoder output; h = f (x) typically (see Figure 1)

Sparse autoencoders are useful in learning features that can be input for
other tasks, e.g. classification (think about semi-supervised classification)

Sparse autoencoders can be interpreted as approximating maximum likelihood training of a generative model that has latent variables (in this case h)

In this respect, it is maximizing

log pmodel (h, x) = log pmodel (h) + log pmodel (x|h) (3)

log pmodel (h) can be sparsity-inducing
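One common concrete choice of the sparsity penalty Ω(h) in Eq. (2), sketched as an assumption (it is not specified on the slide): a KL-divergence term pushing the mean activation of each code unit toward a small target rho.

```python
import torch

def kl_sparsity_penalty(h, rho=0.05, eps=1e-8):
    # h: (batch, code_dim) activations in (0, 1), e.g. after a sigmoid encoder
    q = h.mean(dim=0).clamp(eps, 1 - eps)   # mean activation of each code unit
    return (rho * torch.log(rho / q)
            + (1 - rho) * torch.log((1 - rho) / (1 - q))).sum()
```

The penalty is added to the reconstruction loss, scaled by a sparsity weight, exactly in the form of Eq. (2).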


Denoising autoencoders

Denoising aims to reduce the noise in signals

Denoising autoencoders minimize

L(x, g(f (x̃))) (4)

where x̃ is a copy of x corrupted by some form of noise

Training process forces f and g to implicitly learn the structure of pdata (x)

Another form of regularization, λ Σi ||∇x hi||2, forces the learning of a function that does not change much when x changes slightly:

L(x, g(f(x))) + λ Σi ||∇x hi||2     (5)
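A minimal denoising step implementing Eq. (4), assuming PyTorch and any encoder/decoder model (e.g. the Autoencoder sketch earlier); Gaussian corruption and its standard deviation are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def denoising_step(model, x, opt, noise_std=0.3):
    x_tilde = x + noise_std * torch.randn_like(x)  # corrupted copy x~ of x
    loss = F.mse_loss(model(x_tilde), x)           # L(x, g(f(x~))): target is the clean x
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```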



Denoising autoencoders

Figure 3: Stacked convolutional denoising autoencoder

Figure 4: Comparison between output of stacked convolutional denoising autoencoder and median filter; Gaussian noise: µ = 0, σ = 1



More autoencoders - cost functions

Contractive autoencoder
Regularization is introduced on the code h = f(x) to encourage the derivatives of f to be as small as possible (see the autograd sketch below):

Ω(h) = λ ||∂f(x)/∂x||2F    (squared Frobenius norm of the encoder Jacobian)

Contractive autoencoder and denoising autoencoder are related when input noise is small and Gaussian (Goodfellow et al. 2016):
denoising autoencoders make the reconstruction function resist small but
finite-sized perturbations of the input;
contractive autoencoders make the feature extraction function resist
infinitesimal perturbations of the input
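A sketch (an assumption, not the slides' notation) of how the penalty Ω(h) above can be computed with automatic differentiation in PyTorch: one backward pass per code unit accumulates the squared Frobenius norm of the encoder Jacobian.

```python
import torch

def contractive_penalty(f, x):
    # Sum over the batch of the squared Frobenius norm of the Jacobian df/dx.
    x = x.clone().requires_grad_(True)
    h = f(x)                                   # (batch, code_dim)
    penalty = 0.0
    for i in range(h.shape[1]):                # one backward pass per code unit
        grad_i, = torch.autograd.grad(h[:, i].sum(), x, create_graph=True)
        penalty = penalty + (grad_i ** 2).sum()
    return penalty
```

The training cost is then the reconstruction error plus λ times this penalty.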



Generative adversarial networks (GAN)

Central problem addressed by GAN is density estimation; GAN implicitly captures the underlying data distribution

GAN can be used in both unsupervised and semi-supervised learning settings

Characterised by training two networks in competition:
There is a network, named the generator (G), trying to produce samples from a distribution that is learned from given data - mimicking or forging synthetic data
There is a second network, the discriminator (D), that is able to tell the synthetic samples from the real ones

Objective is to be able to generate synthetic signals that are no different from the real ones



Generative adversarial networks (GAN)

Figure 5: Two models learned while training a GAN: the Discriminator (D) and the Generator (G); models implemented using neural networks, but any differentiable system (mapping) can also be used (Creswell et al. 2018)



Generative adversarial networks (GAN)

In Figure 5, Generator network has no access to the real samples

Generator network is a mapping from some representation space (latent space) to the data sample space:

G : G(z) → R|x|

where z ∈ R|z| is a sample from the latent space, x ∈ R|x| is a data sample, and | · | denotes the number of dimensions

Discriminator network, D, maps a data sample to a probability that the sample is from the real data distribution and not the generator distribution:

D : D(x) → (0, 1)

pdata(x) represents the probability density function over the data samples (in R|x|) and pg(x) the distribution of the samples produced by the generator
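A sketch of the two mappings above, assuming PyTorch; the latent and data dimensions, layer widths and activations are illustrative. G maps a latent vector z to the data space R|x|; D maps a data sample to a probability in (0, 1).

```python
import torch.nn as nn

latent_dim, data_dim = 100, 784           # |z| and |x| are assumptions

G = nn.Sequential(                        # G : z -> R^|x|
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)

D = nn.Sequential(                        # D : x -> (0, 1)
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)
```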



Generative adversarial networks (GAN)

During training we set objective functions for the generator (JG(ΘG; ΘD)) and the discriminator (JD(ΘD; ΘG))

Note that JG and JD are co-dependent on the network parameters ΘG and ΘD as the networks are iteratively trained



Generative adversarial networks (GAN)

Figure 6: During GAN training, the generator is encouraged to produce a distribution of samples, pg(x), to match that of the real data, pdata(x) (Creswell et al. 2018)



Generative adversarial networks (GAN)

Training GAN

We find parameters of a discriminator that maximize its classification accuracy and find the parameters of a generator that maximally confuses the discriminator

Cost of training is evaluated using a value function; solve the following mini-max problem:

maxD minG V(G, D)

where

V(G, D) = Epdata(x) [log D(x)] + Epg(x) [log(1 − D(x))]

Parameters of one model are updated while the parameters of the other are fixed

Optimal discriminator is unique (Goodfellow et al. 2014):

D∗(x) = pdata(x) / (pdata(x) + pg(x))

Generator is optimal when (Goodfellow et al. 2014)

pg(x) = pdata(x)
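A compact sketch of this alternating mini-max procedure, assuming PyTorch. G and D are generator and discriminator modules (e.g. the sketch earlier), real_batch is a hypothetical data-loading helper, and all hyperparameters (steps, k, batch, lr) are illustrative.

```python
import torch

def train_gan(G, D, real_batch, latent_dim=100, steps=10000, k=1, batch=64, lr=2e-4, eps=1e-8):
    opt_D = torch.optim.Adam(D.parameters(), lr=lr)
    opt_G = torch.optim.Adam(G.parameters(), lr=lr)
    for _ in range(steps):
        # Discriminator: ascend E[log D(x)] + E[log(1 - D(G(z)))] with G fixed,
        # repeated k times per generator update (see Figure 7).
        for _ in range(k):
            x = real_batch(batch)                       # hypothetical data helper
            z = torch.randn(batch, latent_dim)
            v = (torch.log(D(x) + eps).mean()
                 + torch.log(1 - D(G(z).detach()) + eps).mean())
            opt_D.zero_grad()
            (-v).backward()                             # negate: gradient ascent on V
            opt_D.step()
        # Generator: descend E[log(1 - D(G(z)))] with D fixed.
        z = torch.randn(batch, latent_dim)
        loss_G = torch.log(1 - D(G(z)) + eps).mean()
        opt_G.zero_grad()
        loss_G.backward()
        opt_G.step()
```

Goodfellow et al. (2014) also suggest having the generator maximize log D(G(z)) instead, which gives stronger gradients early in training.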



Generative adversarial networks (GAN)

Figure 7: The main loop of GAN training. Novel data samples, x′, may be drawn by passing random samples, z, through the generator network. The gradient of the discriminator may be updated k times before updating the generator (Creswell et al. 2018)



Generative adversarial networks (GAN)

Other GAN architectures

Initial GAN architecture used fully connected neural networks
Difficult to train; successful only with a subset of datasets - stability issues
Deep convolutional GAN provided more stability
Conditional GAN - both the generator and the discriminator networks are class-conditional (Figure 8; see the sketch after this list)
Conditional GANs can provide better representations for multimodal data generation
InfoGAN decomposes the noise source into an incompressible source and a “latent code”; it attempts to discover latent factors of variation by maximizing the mutual information between the latent code and the generator’s output (Figure 9)
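A sketch of the class-conditioning idea behind Figure 8, assuming PyTorch: both networks receive the class label, here one-hot encoded and concatenated with their usual inputs. Sizes and architectures are illustrative, and only the conditioning mechanism is shown, not the training loop.

```python
import torch
import torch.nn as nn

latent_dim, data_dim, n_classes = 100, 784, 10

G_cond = nn.Sequential(                    # generator sees z and the label y
    nn.Linear(latent_dim + n_classes, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)
D_cond = nn.Sequential(                    # discriminator sees x and the label y
    nn.Linear(data_dim + n_classes, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(64, latent_dim)
y = nn.functional.one_hot(torch.randint(0, n_classes, (64,)), n_classes).float()
x_fake = G_cond(torch.cat([z, y], dim=1))      # generate conditioned on y
score = D_cond(torch.cat([x_fake, y], dim=1))  # discriminate conditioned on y
```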



Generative adversarial networks (GAN)

Other GAN architectures

Figure 8: Conditional GAN
Figure 9: InfoGAN



Bibliography

Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B. & Bharath, A. A. (2018), ‘Generative adversarial networks: An overview’, IEEE Signal Processing Magazine 35(1), 53–65.

Géron, A. (2017), Hands-on Machine Learning with Scikit-Learn and TensorFlow, O’Reilly Media, Inc., CA, USA.

Goodfellow, I., Bengio, Y. & Courville, A. (2016), Deep Learning, MIT Press.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. & Bengio, Y. (2014), Generative adversarial nets, in ‘Proc. Advances in Neural Information Processing Systems Conf.’, Montreal, Quebec, Canada, pp. 2672–2680.

