The evolution of GANs — which Facebook AI research director Yann LeCun has
called the most interesting idea of the decade — is somewhat long and winding, and
very much continues to this day. They have their deficiencies, but GANs remain one
of the most versatile neural network architectures in use today.
History of GANs
The idea of pitting two algorithms against each other originated with Arthur Samuel,
a prominent researcher in the field of computer science who's credited with
popularizing the term "machine learning." While at IBM, he devised a checkers game
— the Samuel Checkers-playing Program — that was among the first to successfully
self-learn, in part by estimating the chance of each side’s victory at a given position.
But if Samuel is the grandfather of GANs, Ian Goodfellow, former Google Brain
research scientist and director of machine learning at Apple’s Special Projects
Group, might be their father. In a seminal 2014 research paper simply titled
“Generative Adversarial Nets,” Goodfellow and colleagues describe the first working
implementation of a generative model based on adversarial networks.
GAN architecture
GANs consist of two parts: a generator and a discriminator. The generator
produces synthetic examples (e.g., images) from random noise sampled from a
distribution; these, along with real examples from a training data set, are fed to the
discriminator, which attempts to distinguish between the two. Both models improve
at their respective tasks until the discriminator can no longer tell the real examples
from the synthesized examples with better than the 50% accuracy expected of
chance.
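The adversarial loop described above can be sketched end to end on a toy problem. The following is a minimal illustration under invented assumptions, not a production GAN: the "generator" is a one-dimensional affine map a·z + b, the "discriminator" is logistic regression, both trained with hand-derived gradients, and the real data are drawn from a Gaussian centered at 4.

```python
import numpy as np

rng = np.random.default_rng(0)

# Real data: samples from N(4, 1). Generator: g(z) = a*z + b with z ~ N(0, 1).
# Discriminator: logistic regression D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.0, 0.0          # discriminator parameters
lr, batch = 0.05, 64

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(3000):
    real = rng.normal(4.0, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b

    # Discriminator step: ascend log D(real) + log(1 - D(fake))
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator step: descend -log D(fake) (the non-saturating loss)
    d_fake = sigmoid(w * fake + c)
    grad_x = -(1 - d_fake) * w       # dLoss/dfake for each sample
    a -= lr * np.mean(grad_x * z)
    b -= lr * np.mean(grad_x)

# After training, the generator's output distribution should have drifted
# toward the real data's mean of 4.
fake_mean = float(np.mean(a * rng.normal(0.0, 1.0, 10000) + b))
print(f"generator mean after training: {fake_mean:.2f} (real mean: 4.0)")
```

On this toy, the generator's offset b moves toward the real mean because the discriminator's feedback (its gradient) points fake samples toward regions the discriminator rates as real, which is the mechanism the paragraph above describes.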
GANs train in an unsupervised fashion, meaning they infer the patterns within
data sets without reference to known, labeled, or annotated outcomes. Interestingly,
the discriminator's work informs that of the generator: each time the discriminator
correctly flags a synthesized example, its feedback (in the form of gradients) tells
the generator how to tweak its output so that it might be more realistic in the future.
The generator and discriminator also run the risk of overpowering each other. If the
generator becomes too strong, it can exploit weaknesses in the discriminator that
lead to undesirable results, whereas if the discriminator becomes too strong, its
feedback dries up and it impedes the generator's progress toward convergence.
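The second failure mode can be made concrete numerically. With the original minimax ("saturating") generator loss log(1 − D(G(z))), the generator's gradient with respect to the discriminator's score shrinks toward zero as the discriminator grows confident that a sample is fake, while the commonly used non-saturating alternative −log D(G(z)) keeps a usable gradient. The sketch below uses toy numbers, not any specific model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# s is the discriminator's raw score for a fake sample; very negative s means
# the discriminator confidently rejects it (D = sigmoid(s) near 0).
for s in [0.0, -2.0, -6.0, -12.0]:
    d = sigmoid(s)
    # Saturating loss log(1 - D): gradient w.r.t. s is -D, vanishing as D -> 0
    grad_saturating = -d
    # Non-saturating loss -log D: gradient w.r.t. s is -(1 - D), staying near -1
    grad_nonsaturating = -(1.0 - d)
    print(f"s={s:6.1f}  D={d:.6f}  "
          f"saturating grad={grad_saturating:.6f}  "
          f"non-saturating grad={grad_nonsaturating:.6f}")
```

As the discriminator's rejection becomes confident (s = −12), the saturating gradient is effectively zero while the non-saturating one remains close to −1, which is why an overly accurate discriminator can stall training.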
A lack of training data also threatens to impede GANs’ progress in the semantic
realm, which in this context refers to the relationships among objects. Today’s best
GANs struggle to reconcile the difference between palming and holding an object,
for example — a differentiation most humans make in seconds.
“There [aren’t] that many well-curated data sets to start … applying GANs to,” Tang
said. “GANs just follow where the data sets are going.”
On the subject of compute, Youssef Mroueh, a research staff member in the IBM
multi-modal algorithms and engines group, is working with colleagues to develop
lightweight models dubbed “small GANs” that reduce training time and memory
usage. The bulk of their research is concentrated in the MIT-IBM Watson AI Lab, a
joint AI research effort between the Massachusetts Institute of Technology and IBM.
“[It’s a] challenging business question: How can we change [the] modeling without
all the computation and hassle?” Mroueh said. “That’s what we’re working toward.”
GAN applications
Image and video synthesis
GANs are perhaps best known for their contributions to image synthesis.
In June 2019, Microsoft researchers detailed ObjGAN, a novel GAN that could
understand captions, sketch layouts, and refine the details based on the wording.
The coauthors of a related study proposed a system — StoryGAN — that
synthesizes storyboards from paragraphs.
Such models have made their way into production. Startup Vue.ai's GAN susses out
clothing characteristics and learns to produce realistic poses, skin colors, and other
features. From snapshots of apparel, it can generate model images in every size up
to five times faster than a traditional photo shoot.
On the object synthesis side of the equation, Google and MIT’s Computer Science
and Artificial Intelligence Laboratory (CSAIL) developed a GAN that can generate
images of 3D models with realistic lighting and reflections and enables shape and
texture editing, as well as viewpoint shifts.
Video
Predicting future events from only a few video frames — a task once considered
impossible — is nearly within grasp thanks to state-of-the-art approaches involving
GANs and novel data sets.
One of the newest papers on the subject from DeepMind details recent advances in
the budding field of AI clip generation. Thanks to “computationally efficient”
components and techniques and a new custom-tailored data set, researchers say
their best-performing model — Dual Video Discriminator GAN (DVD-GAN) — can
generate coherent 256 x 256-pixel videos of “notable fidelity” up to 48 frames in
length.
In a twist on the video synthesis formula, Cambridge Consultants last year demoed a
model called DeepRay that invents video frames to mitigate distortion caused by
rain, dirt, smoke, and other debris.
Artwork
GANs are capable of more than generating images and video footage. When trained
on the right data sets, they’re able to produce de novo works of art.
Researchers at the Indian Institute of Technology Hyderabad and the Sri Sathya Sai
Institute of Higher Learning devised a GAN, dubbed SkeGAN, that generates stroke-
based vector sketches of cats, firetrucks, mosquitoes, and yoga poses.
Music
GANs are architecturally well-suited to generating media, and that includes music.
“For a long time, [GANs research] has been about improving the training instabilities
whatever the modality is — text, images, sentences, et cetera. Engineering is one
thing, but it’s also [about] coming up with [the right] architecture,” said Mroueh.
“It’s a combination of lots of things.”
Speech
Google and Imperial College London researchers recently set out to create a GAN-
based text-to-speech system capable of matching (or besting) state-of-the-art
methods. Their proposed system — GAN-TTS — consists of a neural network that
learned to produce raw audio by training on a corpus of speech with 567 pieces of
encoded phonetic, duration, and pitch data. To enable the model to generate
sentences of arbitrary length, the coauthors sampled 44 hours’ worth of two-
second snippets together with the corresponding linguistic features computed for
five-millisecond snippets. An ensemble of 10 discriminators, some assessing
linguistic conditioning and others general realism, attempts to distinguish
between real and synthetic speech.
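The ensemble idea can be sketched schematically. The code below is an invented toy, not GAN-TTS: each "discriminator" is a random linear scorer over an audio window, the conditional ones also see a linguistic-feature vector (mirroring the split between conditioning-aware and realism-only discriminators described above), and the ensemble simply averages the scores.

```python
import numpy as np

rng = np.random.default_rng(1)

AUDIO_DIM, LING_DIM = 32, 8  # toy dimensions, not GAN-TTS's actual sizes

def make_disc(conditional):
    # Each toy discriminator scores an audio window with a random linear map;
    # conditional ones also score the linguistic features.
    w_audio = rng.normal(size=AUDIO_DIM) * 0.1
    w_ling = rng.normal(size=LING_DIM) * 0.1 if conditional else None
    def score(audio, ling):
        s = audio @ w_audio
        if w_ling is not None:
            s += ling @ w_ling
        return s
    return score

# 10 discriminators: 5 conditional (linguistic conditioning), 5 unconditional.
discs = [make_disc(conditional=(i < 5)) for i in range(10)]

def ensemble_score(audio, ling):
    # Aggregate by averaging the per-discriminator scores.
    return float(np.mean([d(audio, ling) for d in discs]))

audio = rng.normal(size=AUDIO_DIM)   # stand-in for a 2-second audio window
ling = rng.normal(size=LING_DIM)     # stand-in for linguistic features
print("ensemble score:", ensemble_score(audio, ling))
```

The design point the sketch illustrates is that each discriminator can specialize on a different view of the input while the training signal combines all of them.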
Medicine
In the medical field, GANs have been used to produce data on which other AI
models — in some cases, other GANs — might train and to invent treatments for
rare diseases that to date haven’t received much attention.
Robotics
The field of robotics has a lot to gain from GANs, as it turns out.
“The idea of using adversarial loss for training agent trajectories is not new, but
what’s new is allowing it to work with a lot less data,” Tang said. “The trick to
applying these adversarial learning approaches is figuring out which inputs the
discriminator has access to — what information is available to avoid being tricked
[by the discriminator] … [In state-of-the-art approaches], discriminators need
access to [positional] data alone, allowing us to train with expert demonstrations
where all we have are the state data.”
Tang says this enables the training of much more robust models than was previously
possible — models that require only about two dozen human demonstrations. “If
you reduce the amount of data that the discriminator has access to, you’re reducing
the complexity of the data set that you have to provide to the model. These types of
adversarial learning methods actually work pretty well in low-data regimes,” he
added.
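The state-only trick Tang describes can be sketched in miniature. In this toy setup (an invented illustration, not any specific published system), a logistic discriminator sees only 2-D positions, never actions, so expert demonstrations without action labels still suffice; once trained, its output can serve as a reward that is higher near expert-visited states.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Expert demonstrations cluster near (1, 1); the current policy's states
# cluster near (0, 0). The discriminator sees only these positions.
expert_states = rng.normal([1.0, 1.0], 0.1, size=(256, 2))
agent_states = rng.normal([0.0, 0.0], 0.1, size=(256, 2))

w = np.zeros(2)  # logistic discriminator parameters over 2-D states
b = 0.0
lr = 0.1

for _ in range(200):
    d_e = sigmoid(expert_states @ w + b)
    d_a = sigmoid(agent_states @ w + b)
    # Ascend log D(expert state) + log(1 - D(agent state))
    w += lr * ((1 - d_e)[:, None] * expert_states).mean(0)
    w -= lr * (d_a[:, None] * agent_states).mean(0)
    b += lr * ((1 - d_e).mean() - d_a.mean())

# -log(1 - D(s)) as a reward signal: large where the discriminator believes
# the state looks expert-like, pushing the policy toward demonstrated states.
reward_near_expert = float(-np.log(1 - sigmoid(np.array([1.0, 1.0]) @ w + b)))
reward_far = float(-np.log(1 - sigmoid(np.array([0.0, 0.0]) @ w + b)))
print(f"reward near expert: {reward_near_expert:.3f}, far away: {reward_far:.3f}")
```

Because the discriminator's input is restricted to states, the data it must model is simpler, which is consistent with Tang's point that these adversarial methods can work in low-data regimes.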
Deepfake detection
GANs’ ability to generate convincing photos and videos of people makes them ripe
targets for abuse. Already, malicious actors have used models to generate fake
celebrity pornography.
But preliminary research suggests GANs could root out fake and fraudulent content
just as effectively as they produce it. A paper published on the preprint server
Arxiv.org in March describes spamGAN, a model that detects opinion spam by
learning from a limited corpus of annotated and unannotated data. In experiments,
the researchers say spamGAN outperformed existing spam detection techniques
with limited labeled data, achieving accuracy of between 71% and 86% when
trained on as little as 10% of labeled data.
Future directions
What might the future hold with respect to GANs? Despite the leaps and bounds
brought by this past decade of research, Tang cautions that it’s still early days.
“GANs are still [missing] very fine-grained control,” he said. “[That’s] a big
challenge.”
For his part, Mroueh believes that GAN-generated content will become increasingly
difficult to distinguish from real content.
“My feeling is that the field will improve,” he said. “Comparing image generation in
2014 to today, I wouldn’t have expected the quality to become that good. If the
progress continues like this, [GANs] will remain a very important research project.”
https://venturebeat.com/2019/12/26/gan-generative-adversarial-network-explainer-ai-machine-learning/