Generative adversarial networks: What GANs are and how they’ve evolved

By Kyle Wiggers | Dec. 27, 2019

Perhaps you’ve read about AI capable of producing humanlike speech or generating images of people that are difficult to distinguish from real-life photographs. More
often than not, these systems build upon generative adversarial networks (GANs),
which are two-part AI models consisting of a generator that creates samples and a
discriminator that attempts to differentiate between the generated samples and
real-world samples. This unique arrangement enables GANs to achieve impressive
feats of media synthesis, from composing melodies and swapping sheep for giraffes
to hallucinating footage of ice skaters and soccer players. In point of fact, it’s
because of this prowess that GANs have been used to produce problematic content like deepfakes, media in which a person in an existing image or video is replaced with someone else’s likeness.

The evolution of GANs — which Facebook AI research director Yann LeCun has
called the most interesting idea of the decade — is somewhat long and winding, and
very much continues to this day. They have their deficiencies, but GANs remain one
of the most versatile neural network architectures in use today.

History of GANs
The idea of pitting two algorithms against each other originated with Arthur Samuel,
a prominent researcher in the field of computer science who’s credited with popularizing the term “machine learning.” While at IBM, he devised a checkers game
— the Samuel Checkers-playing Program — that was among the first to successfully
self-learn, in part by estimating the chance of each side’s victory at a given position.

But if Samuel is the grandfather of GANs, Ian Goodfellow, former Google Brain
research scientist and director of machine learning at Apple’s Special Projects
Group, might be their father. In a seminal 2014 research paper simply titled
“Generative Adversarial Nets,” Goodfellow and colleagues describe the first working
implementation of a generative model based on adversarial networks.

Goodfellow has often stated that he was inspired by noise-contrastive estimation, a way of learning a data distribution by comparing it against a defined noise
distribution (i.e., a mathematical function representing corrupted or distorted data).
Noise-contrastive estimation uses the same loss function as GANs — in other words, the same measure of how well a model separates genuine data from noise.

Of course, Goodfellow wasn’t the only one to pursue an adversarial AI model design. Dalle Molle Institute for Artificial Intelligence Research co-director Juergen
Schmidhuber advocated predictability minimization, a technique that models
distributions through an encoder that maximizes the objective function (the
function that specifies the problem to be solved by the system) minimized by a
predictor. It adopts what’s known as a minimax decision rule, where the possible loss
for a worst case (maximum loss) scenario is minimized as much as possible.

And this is the paradigm upon which GANs are built.
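In Goodfellow and colleagues’ formulation, that game is written as a value function V(D, G) that the discriminator D tries to maximize and the generator G tries to minimize:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D(x) is the discriminator’s estimate that a sample is real, and G(z) is the sample the generator synthesizes from noise z.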

GAN architecture
Again, GANs consist of two parts: generators and discriminators. The generator
model produces synthetic examples (e.g., images) from random noise sampled from a distribution; these, along with real examples from a training data set, are fed to the
discriminator, which attempts to distinguish between the two. Both the generator
and discriminator improve in their respective abilities until the discriminator is
unable to tell the real examples from the synthesized examples with better than the
50% accuracy expected of chance.
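To make the two-part arrangement concrete, here is a minimal sketch in PyTorch; the layer sizes, the 64-dimensional noise vector, and the flattened 28-by-28 images are arbitrary choices for illustration, not the design of any published GAN:

```python
import torch
import torch.nn as nn

NOISE_DIM = 64      # size of the random noise vector fed to the generator
IMG_DIM = 28 * 28   # flattened image size (illustrative choice)

# Generator: maps random noise to a synthetic example.
generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, IMG_DIM),
    nn.Tanh(),      # outputs in [-1, 1], matching normalized real images
)

# Discriminator: maps an example (real or fake) to a probability of "real."
discriminator = nn.Sequential(
    nn.Linear(IMG_DIM, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),
)

# One forward pass: sample noise, synthesize a batch, and score it.
noise = torch.randn(16, NOISE_DIM)
fake_images = generator(noise)
scores = discriminator(fake_images)  # hovers near 0.5 once training converges
print(scores.shape)                  # torch.Size([16, 1])
```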

GANs train in an unsupervised fashion, meaning that they infer the patterns within
data sets without reference to known, labeled, or annotated outcomes. Interestingly,
the discriminator’s work informs that of the generator — every time the
discriminator correctly identifies a synthesized work, it tells the generator how to
tweak its output so that it might be more realistic in the future.
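That feedback is, concretely, a gradient: the generator improves by backpropagating the discriminator’s verdict on its fakes through both networks. Below is a sketch of one alternating training step, reusing the generator and discriminator defined above; the random real_batch stands in for genuine training images:

```python
bce = nn.BCELoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

real_batch = torch.rand(16, IMG_DIM) * 2 - 1  # stand-in for real images

# Discriminator step: learn to tell real from fake.
noise = torch.randn(16, NOISE_DIM)
fake = generator(noise).detach()              # detach: don't update G here
loss_d = bce(discriminator(real_batch), torch.ones(16, 1)) + \
         bce(discriminator(fake), torch.zeros(16, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: the discriminator's verdict flows back as a gradient.
noise = torch.randn(16, NOISE_DIM)
fake = generator(noise)                       # no detach: gradients reach G
loss_g = bce(discriminator(fake), torch.ones(16, 1))  # "fool D" objective
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```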

In practice, GANs suffer from a number of shortcomings owing to their architecture. The simultaneous training of generator and discriminator models is inherently
unstable. Sometimes the parameters — the configuration values internal to the
models — oscillate or destabilize, which isn’t surprising given that after every
parameter update, the nature of the optimization problem being solved changes.
Alternatively, the generator collapses (a failure known as mode collapse) and begins to produce data samples that are largely homogeneous in appearance.
Above: The architecture of a generative adversarial network (GAN).
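The article doesn’t prescribe remedies, but one widely used stabilizer is one-sided label smoothing (from Salimans et al.’s 2016 “Improved Techniques for Training GANs,” not something this article claims): soften the discriminator’s “real” targets so its gradients don’t swamp the generator. Continuing the training-step sketch above:

```python
# One-sided label smoothing: train the discriminator against targets of 0.9
# instead of 1.0 for real examples, leaving fake targets at 0.0. Blunting
# the discriminator's confidence damps the oscillation described above.
real_targets = torch.full((16, 1), 0.9)
fake = generator(torch.randn(16, NOISE_DIM)).detach()
loss_d = bce(discriminator(real_batch), real_targets) + \
         bce(discriminator(fake), torch.zeros(16, 1))
```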

The generator and discriminator also run the risk of overpowering each other. If the
generator becomes too accurate, it’ll exploit weaknesses in the discriminator that
lead to undesirable results, whereas if the discriminator becomes too accurate, it’ll
impede the generator’s progress toward convergence.

A lack of training data also threatens to impede GANs’ progress in the semantic
realm, which in this context refers to the relationships among objects. Today’s best
GANs struggle to reconcile the difference between palming and holding an object,
for example — a differentiation most humans make in seconds.

But as Hanlin Tang, senior director of Intel’s AI laboratory, explained to VentureBeat in a phone interview, emerging techniques get around these limitations. One entails building multiple discriminators into a model and fine-tuning them on specific data. Another involves feeding discriminators dense embedding representations, or numerical representations of data, so that they have more information from which to draw.
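A hedged sketch of the first idea: an ensemble of discriminators, one of which sees a dense embedding rather than raw inputs. The two-member ensemble, the stand-in linear encoder, and the simple averaging below are illustrative assumptions, not a reconstruction of the systems Tang described:

```python
import torch
import torch.nn as nn

IMG_DIM = 28 * 28
EMB_DIM = 32

def make_discriminator(in_dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(in_dim, 128), nn.LeakyReLU(0.2),
                         nn.Linear(128, 1), nn.Sigmoid())

# Several discriminators: one looks at raw pixels, another at a dense
# embedding of the same example (e.g., from a pretrained encoder).
pixel_d = make_discriminator(IMG_DIM)
embed_d = make_discriminator(EMB_DIM)
encoder = nn.Linear(IMG_DIM, EMB_DIM)  # stand-in for a learned embedding

def ensemble_score(x: torch.Tensor) -> torch.Tensor:
    # Average the verdicts so no single discriminator dominates training.
    return (pixel_d(x) + embed_d(encoder(x))) / 2

x = torch.randn(8, IMG_DIM)
print(ensemble_score(x).shape)  # torch.Size([8, 1])
```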

“There [aren’t] that many well-curated data sets to start … applying GANs to,” Tang
said. “GANs just follow where the data sets are going.”

On the subject of compute, Youssef Mroueh, a research staff member in the IBM
multi-modal algorithms and engines group, is working with colleagues to develop
lightweight models dubbed “small GANs” that reduce training time and memory
usage. The bulk of their research is concentrated in the MIT-IBM Watson AI Lab, a
joint AI research effort between the Massachusetts Institute of Technology and IBM.

“[It’s a] challenging business question: How can we change [the] modeling without
all the computation and hassle?” Mroueh said. “That’s what we’re working toward.”

GAN applications
Image and video synthesis
GANs are perhaps best known for their contributions to image synthesis.

StyleGAN, a model Nvidia developed, has generated high-resolution head shots of fictional people by learning attributes like facial pose, freckles, and hair. A newly
released version — StyleGAN 2 — makes improvements with respect to both
architecture and training methods, redefining the state of the art in terms of
perceived quality.

In June 2019, Microsoft researchers detailed ObjGAN, a novel GAN that could
understand captions, sketch layouts, and refine the details based on the wording.
The coauthors of a related study proposed a system — StoryGAN — that
synthesizes storyboards from paragraphs.

Such models have made their way into production. Startup Vue.ai’s GAN susses out
clothing characteristics and learns to produce realistic poses, skin colors, and other
features. From snapshots of apparel, it can generate model images in every size up
to five times faster than a traditional photo shoot.

Elsewhere, GANs have been applied to the problems of super-resolution (image upsampling) and pose estimation (object transformation). Tang says one of his
teams used GANs to train a model to upscale 200-by-200-pixel satellite imagery to
1,000 by 1,000 pixels, and to produce images that appear as though they were
captured from alternate angles.
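That 5x factor maps neatly onto sub-pixel convolution, a standard building block in GAN-based super-resolution generators. The sketch below shows only the shape arithmetic; the tiny network is illustrative, not the model Tang’s team built:

```python
import torch
import torch.nn as nn

SCALE = 5  # 200 x 200 -> 1,000 x 1,000

# Sub-pixel convolution: produce SCALE^2 channels per output channel, then
# let PixelShuffle rearrange them into a SCALE-times-larger image.
sr_generator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 3 * SCALE ** 2, kernel_size=3, padding=1),
    nn.PixelShuffle(SCALE),  # (N, 75, 200, 200) -> (N, 3, 1000, 1000)
)

low_res = torch.randn(1, 3, 200, 200)  # a 200-by-200 RGB tile
high_res = sr_generator(low_res)
print(high_res.shape)  # torch.Size([1, 3, 1000, 1000])
```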

Above: Examples of edits performed by GAN Paint Studio.


Scientists at Carnegie Mellon last year demoed Recycle-GAN, a data-driven approach
for transferring the content of one video or photo to another. When trained on
footage of human subjects, the GAN generated clips that captured subtle
expressions like dimples and lines that formed when subjects smiled and moved
their mouths.

More recently, researchers at Seoul-based Hyperconnect published MarioNETte, which synthesizes a reenacted face animated by a person’s movement while
preserving the face’s appearance.

On the object synthesis side of the equation, Google and MIT’s Computer Science
and Artificial Intelligence Laboratory (CSAIL) developed a GAN that can generate
images of 3D models with realistic lighting and reflections and enables shape and
texture editing, as well as viewpoint shifts.

Video
Predicting future events from only a few video frames — a task once considered
impossible — is nearly within grasp thanks to state-of-the-art approaches involving
GANs and novel data sets.

One of the newest papers on the subject from DeepMind details recent advances in
the budding field of AI clip generation. Thanks to “computationally efficient”
components and techniques and a new custom-tailored data set, researchers say
their best-performing model — Dual Video Discriminator GAN (DVD-GAN) — can
generate coherent 256-by-256-pixel videos of “notable fidelity” up to 48 frames in length.

In a twist on the video synthesis formula, Cambridge Consultants last year demoed a
model called DeepRay that invents video frames to mitigate distortion caused by
rain, dirt, smoke, and other debris.

Artwork
GANs are capable of more than generating images and video footage. When trained
on the right data sets, they’re able to produce de novo works of art.

Researchers at the Indian Institute of Technology Hyderabad and the Sri Sathya Sai
Institute of Higher Learning devised a GAN, dubbed SkeGAN, that generates stroke-based vector sketches of cats, firetrucks, mosquitoes, and yoga poses.

Scientists at Maastricht University in the Netherlands created a GAN that produces logos from one of 12 different colors.

Victor Dibia, a human-computer interaction researcher and Carnegie Mellon graduate, trained a GAN to synthesize African tribal masks.

Meanwhile, a team at the University of Edinburgh’s Institute for Perception and Institute for Astronomy designed a model that generates images of fictional galaxies that closely follow the distributions of real galaxies.

In March during its GPU Technology Conference (GTC) in San Jose, California,
Nvidia took the wraps off of GauGAN, a generative adversarial AI system that lets
users create lifelike landscape images that never existed. GauGAN — whose name
comes from post-Impressionist painter Paul Gauguin — improves upon Nvidia’s
Pix2PixHD system introduced last year, which was similarly capable of rendering
synthetic worlds but left artifacts in its images. The machine learning model
underpinning GauGAN was trained on more than one million images from Flickr,
imbuing it with an understanding of the relationships among over 180 objects
including snow, trees, water, flowers, bushes, hills, and mountains. In practice, trees
next to water have reflections, for instance, and the type of precipitation changes
depending on the season depicted.

Music
GANs are architecturally well-suited to generating media, and that includes music.

In a paper published in August, researchers hailing from the National Institute of Informatics in Tokyo describe a system that’s able to generate “lyrics-conditioned”
melodies from learned relationships between syllables and notes.

Not to be outdone, in December, Amazon Web Services detailed DeepComposer, a cloud-based service that taps a GAN to fill in compositional gaps in songs.

“For a long time, [GANs research] has been about improving the training instabilities
whatever the modality is — text, images, sentences, et cetera. Engineering is one
thing, but it’s also [about] coming up with [the right] architecture,” said Mroueh.
“It’s a combination of lots of things.”

Speech
Google and Imperial College London researchers recently set out to create a GAN-based text-to-speech system capable of matching (or besting) state-of-the-art methods. Their proposed system — GAN-TTS — consists of a neural network that
learned to produce raw audio by training on a corpus of speech with 567 pieces of
encoded phonetic, duration, and pitch data. To enable the model to generate
sentences of arbitrary length, the coauthors sampled 44 hours’ worth of two-second snippets together with the corresponding linguistic features computed for five-millisecond snippets. An ensemble of 10 discriminators — some of which assess linguistic conditioning, while others assess general realism — attempts to distinguish between real and synthetic speech.
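The ensemble idea can be sketched as a set of random-window discriminators: each scores a random crop of the raw waveform, and the verdicts are averaged. The window lengths, channel counts, 24 kHz sample rate, and five (rather than 10) members below are illustrative assumptions, and the linguistic-conditioning discriminators are omitted for brevity:

```python
import torch
import torch.nn as nn

SAMPLE_RATE = 24000  # assumed sample rate for raw audio

class WindowDiscriminator(nn.Module):
    """Scores a randomly cropped window of raw audio (illustrative sizes)."""
    def __init__(self, window: int):
        super().__init__()
        self.window = window
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=15, stride=4), nn.LeakyReLU(0.2),
            nn.Conv1d(32, 32, kernel_size=15, stride=4), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # Each forward pass judges a different random slice of the clip.
        start = torch.randint(0, audio.shape[-1] - self.window, (1,)).item()
        return self.net(audio[..., start:start + self.window])

# An ensemble over several window sizes; real/fake scores are averaged.
ensemble = nn.ModuleList(
    WindowDiscriminator(w) for w in (240, 480, 960, 1920, 3600))

two_seconds = torch.randn(4, 1, 2 * SAMPLE_RATE)  # batch of 2 s clips
score = torch.stack([d(two_seconds) for d in ensemble]).mean(dim=0)
print(score.shape)  # torch.Size([4, 1])
```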

Medicine
In the medical field, GANs have been used to produce data on which other AI
models — in some cases, other GANs — might train and to invent treatments for
rare diseases that to date haven’t received much attention.

In April, researchers at Imperial College London, the University of Augsburg, and the Technical University of Munich sought to synthesize data to fill in gaps in real data with a
model dubbed Snore-GAN. In a similar vein, researchers from Nvidia, the Mayo
Clinic, and the MGH and BWH Center for Clinical Data Science proposed a model
that generates synthetic magnetic resonance images (MRIs) of brains with
cancerous tumors.

Baltimore-based Insilico Medicine pioneered the use of GANs in molecular structure creation for diseases with a known ligand (a complex biomolecule) but no target (a
protein associated with a disease process). Its team of researchers is actively
working on drug discovery programs in cancer, dermatological diseases, fibrosis,
Parkinson’s, Alzheimer’s, ALS, diabetes, sarcopenia, and aging.

Robotics
The field of robotics has a lot to gain from GANs, as it turns out.

A tuned discriminator can determine whether a machine’s trajectory has been drawn from a distribution of human demonstrations or from synthesized examples.
In that way, it’s able to train agents to complete tasks accurately, even when it has
access only to the robot’s positional information. (Normally, training robot-directing
AI requires both positional and action data. The latter indicates which motors
moved over time.)
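This is the shape of adversarial imitation learning: a discriminator scores states (positions) alone, and its verdict doubles as the agent’s reward signal. A minimal sketch under those assumptions; every name, size, and the random stand-in data below are hypothetical rather than drawn from a specific system in the article:

```python
import torch
import torch.nn as nn

STATE_DIM = 8  # e.g., joint positions; no action/motor data required

# Discriminator over single states: "does this look like an expert state?"
trajectory_d = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.Tanh(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

def imitation_reward(states: torch.Tensor) -> torch.Tensor:
    # GAIL-style reward: high when the discriminator believes the agent's
    # states were drawn from the human demonstrations.
    d = trajectory_d(states)
    return -torch.log(1.0 - d + 1e-8)  # fed to the policy optimizer

expert_states = torch.randn(32, STATE_DIM)  # states from a few dozen demos
agent_states = torch.randn(32, STATE_DIM)   # states from policy rollouts

bce = nn.BCELoss()
loss_d = bce(trajectory_d(expert_states), torch.ones(32, 1)) + \
         bce(trajectory_d(agent_states.detach()), torch.zeros(32, 1))
print(float(loss_d), imitation_reward(agent_states).shape)
```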

“The idea of using adversarial loss for training agent trajectories is not new, but
what’s new is allowing it to work with a lot less data,” Tang said. “The trick to
applying these adversarial learning approaches is figuring out which inputs the
discriminator has access to — what information is available to avoid being tricked
[by the discriminator] … [In state-of-the-art approaches], discriminators need
access to [positional] data alone, allowing us to train with expert demonstrations
where all we have are the state data.”

Tang says this enables the training of much more robust models than was previously
possible — models that require only about two dozen human demonstrations. “If
you reduce the amount of data that the discriminator has access to, you’re reducing
the complexity of the data set that you have to provide to the model. These types of
adversarial learning methods actually work pretty well in low-data regimes,” he
added.

Deepfake detection
GANs’ ability to generate convincing photos and videos of people makes them ripe
targets for abuse. Already, malicious actors have used models to generate fake
celebrity pornography.

But preliminary research suggests GANs could root out deepfakes just as effectively
as they produce them. A paper published on the preprint server Arxiv.org in March
describes spamGAN, which learns from a limited corpus of annotated and
unannotated data. In experiments, the researchers say that spamGAN outperformed
existing spam detection techniques with limited labeled data, achieving accuracy of
between 71% and 86% when trained on as little as 10% of labeled data.

Future directions
What might the future hold with respect to GANs? Despite the leaps and bounds
brought by this past decade of research, Tang cautions that it’s still early days.

“GANs are still [missing] very fine-grained control,” he said. “[That’s] a big
challenge.”

For his part, Mroueh believes that GAN-generated content will become increasingly
difficult to distinguish from real content.

“My feeling is that the field will improve,” he said. “Comparing image generation in
2014 to today, I wouldn’t have expected the quality to become that good. If the
progress continues like this, [GANs] will remain a very important research project.”

https://venturebeat.com/2019/12/26/gan-generative-adversarial-network-explainer-ai-machine-learning/
