Generative Artificial Intelligence: Trends and Prospects

Mlađan Jovanović, Singidunum University
Mark Campbell, EVOTEK

Digital Object Identifier 10.1109/MC.2022.3192720
Date of current version: 26 September 2022
COMPUTER 0018-9162/22 © 2022 IEEE. Published by the IEEE Computer Society. October 2022.

IT INNOVATION

Generative modeling is an artificial intelligence (AI) technique that generates synthetic artifacts by analyzing training examples; learning their patterns and distribution; and then creating realistic facsimiles. Generative AI (GAI) uses generative modeling and advances in deep learning (DL) to produce diverse content at scale by utilizing existing media such as text, graphics, audio, and video.1,2 While mainly used in research settings, GAI is entering various domains and everyday scenarios. This article sheds light on the unique practical opportunities and challenges GAI brings.

GAI TECHNIQUES

Although there are many forms of GAI, we will look at four of the most common techniques being leveraged today.

Generative adversarial networks
A generative adversarial network (GAN) pits two neural networks against each other.3 The first, the generator, synthesizes candidate content, such as an image of a human face. The second, known as the discriminator, evaluates the authenticity of the generator's content (that is, whether the face is natural or fake). The networks repeat this generate/discriminate cycle until the generator produces content that the discriminator cannot distinguish from the real thing.

Generative Pre-trained Transformer
Generative Pre-trained Transformer (GPT) models generate text in different languages and can create human-sounding words, sentences, and paragraphs on almost any topic and in almost any writing style—from convincing news articles and essays to conversations in customer-service chatbots or characters in video games.4 These models have matured over several generations, each with an increased parameter set trained on a more extensive online textual corpus than the previous one. One recent example is OpenAI's GPT-3, which stunned the AI world by writing, without human assistance, a convincing article about scientists discovering a herd of unicorns in the Andes.5
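The generate/discriminate cycle can be sketched end to end on toy one-dimensional data. The sketch below is our own minimal illustration, not code from any system in this article: the "discriminator" is a logistic regression, the "generator" has only a mean and a spread, and both are updated with hand-derived gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Real data comes from N(4, 1); the generator starts far away, near N(0, 1).
mu, sigma = 0.0, 1.0          # generator parameters
w, b = 0.0, 0.0               # discriminator parameters (logistic regression)
lr, batch = 0.05, 128

for step in range(3000):
    real = rng.normal(4.0, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = mu + sigma * z

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    grad_w = np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake)
    grad_b = np.mean(d_real - 1.0) + np.mean(d_fake)
    w -= lr * grad_w
    b -= lr * grad_b

    # Generator step: push D(fake) toward 1, that is, fool the discriminator.
    d_fake = sigmoid(w * fake + b)
    grad_mu = np.mean(-(1.0 - d_fake) * w)
    grad_sigma = np.mean(-(1.0 - d_fake) * w * z)
    mu -= lr * grad_mu
    sigma = max(0.1, sigma - lr * grad_sigma)   # keep the spread positive

print(f"generator mean after training: {mu:.2f} (real data mean is 4.0)")
```

In practice both players are deep networks trained with a framework's automatic differentiation; the toy keeps only the alternating-update structure, and the generator's mean drifts toward the real data's mean until the linear discriminator can no longer separate the two.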
The generative diffusion model
The generative diffusion model (GDM) synthesizes content by taking a training data distribution, gradually adding noise, and learning how to recover the data as a reversal of the noise-addition process.6 This way, data are generated from randomly sampled noise through the learned denoising process.

Geometric DL
Geometric DL (GDL) attempts to understand, interpret, and describe AI models in terms of geometric principles. These principles have already been extensively studied over domains such as grids; transformations in homogeneous spaces; graphs; and vector bundles.7 Petar Veličković,8,9 a staff research scientist at DeepMind and affiliated lecturer at the University of Cambridge, describes it as

"…building machine learning (ML) models that respect invariances and symmetries inside data. For example, if you know your data lives on a grid and it should be translation symmetric, you should end up deriving something like a convolutional neural network [CNN]. So, our blueprint that consists of the chosen data domain and symmetry group you want to be resistant can guide different ML architectures such as CNNs, GNNs, transformers, or recurrent models. For example, transformers are a special case of attentional graph neural networks over the complete graph."

USING GAI

These and other GAI techniques are being used in a host of applications, including the following.

Natural language and music
GPTs can be readily applied to natural language (NL) text generation. GPT-3,4 mentioned previously, has been successfully scaled to 175 billion learnable parameters and trained on global-scale corpora of textual exemplars. Aside from showing high performance on a variety of NL processing (NLP) tasks, such as translation and question answering, it is also a competent text generator producing eerily human-like textual content.10

For example, a GPT-3 program wrote an entire student essay from a simple prompt ("The construct of 'learning styles' is problematic because…").11 The narrative flowed as if it was written by a human—plagiarism software did not detect copying, and a Google search showed that each sentence was original. The authors highlighted several potential uses of GAI tools in education, such as facilitating creative writing (for example, students and AI write paragraphs intermittently to explore alternatives and overcome writer's block) or developing a critical analysis of academic writing (for example, AI generates texts on a topic, and students critique and revise them).

Language Model for Dialogue Applications (LaMDA) is another example. This generative, textual conversational agent mimics human conversations,12 but unlike GPT models trained on text corpora, it is trained on dialog corpora. Objective-Reinforced GAN (ORGAN) is yet another example; it produces time-series artifacts in sequential media such as music.13

Computer graphics
AlphaFold is a neural network that creates highly accurate 3D protein structures14 by modeling and predicting protein structures as a graph inference problem in 3D space where nearby residues define the edges of the graph. The pair representation is encoded as a directed edge in a graph (that is, the connection between the residues). The NVIDIA Canvas application GauGAN transforms a textual phrase like "ocean waves hitting rocks on the beach" into virtual landscape images in real time. When adding adjectives like "sunset at a rocky beach" or swapping "sunset" for "afternoon" or "rainy day," the model modifies the picture instantly.15 Similarly, DALL•E is a compiled version of GPT-3 that produces images from text descriptions for concepts expressed in NL, taking text/image pairs as input.16 The latest GDM-based approaches for text-to-image generation are DALL•E 2,16,17 and Imagen,18 capable of producing diverse, high-quality artistic and realistic images, respectively. 3D-GAN creates 3D shapes19 that can be manipulated in 3D spaces (geometric transformation) and then scaled down to 2D image representations.

Computer vision
Using semantic label maps as an input, conditional GANs (CGANs) can produce high-fidelity images of urban scenes containing objects. Changing labels modifies scenes at the level of individual objects, such as replacing trees with buildings or changing colors or textures.20 TediGAN (Text-Guided Diverse Face Image Generation and Manipulation) creates human portrait drawings from facial photos with random changes to facial attributes.21 SinGAN22 is a single-image generative model that synthesizes realistic textures of arbitrary size and aspect ratio with significant variability.

Motion and Content decomposed GAN (MoCoGAN)23 conducts video synthesis with the two modalities separated: the same content can be given different movements and, conversely, the same motion can be applied to different content.
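The GDM recipe described earlier (gradually add noise to training data, then learn to reverse the corruption) can likewise be sketched on one-dimensional data, where the per-step noise predictor reduces to a straight-line fit. Everything below (the noise schedule, variable names, and the linear predictor) is our own illustrative choice, not the design of any model cited in this article.

```python
import numpy as np

rng = np.random.default_rng(1)

# Training data: 1D samples from N(4, 1). The model must learn to turn
# pure noise back into samples from this distribution.
x0 = rng.normal(4.0, 1.0, 20000)

T = 50
betas = np.linspace(1e-3, 0.15, T)   # forward noise schedule
alphas = 1.0 - betas
abar = np.cumprod(alphas)            # cumulative signal retention

# "Training": for each timestep, fit a linear noise predictor
# eps_hat(x_t) = a*x_t + c by least squares on simulated (x_t, eps) pairs.
coeffs = []
for t in range(T):
    eps = rng.normal(0.0, 1.0, x0.size)
    xt = np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps
    a, c = np.polyfit(xt, eps, 1)
    coeffs.append((a, c))

# Sampling: start from random noise and apply the learned denoising steps.
x = rng.normal(0.0, 1.0, 5000)
for t in reversed(range(T)):
    a, c = coeffs[t]
    eps_hat = a * x + c
    x = (x - betas[t] / np.sqrt(1.0 - abar[t]) * eps_hat) / np.sqrt(alphas[t])
    if t > 0:
        x += np.sqrt(betas[t]) * rng.normal(0.0, 1.0, x.size)

print(f"sampled mean {x.mean():.2f}, std {x.std():.2f} (target: 4.0, 1.0)")
```

Real GDMs replace the straight-line fit with a deep network conditioned on the timestep, but the two phases are the same: corrupt data forward with scheduled noise, then sample by running the learned denoiser backward from pure noise.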
[FIGURE 1. Generative models positioned by accuracy and complexity across media types (document, paragraph, music, texture, 3D object): GPTs (GPT-1, GPT-2, GPT-3, LaMDA), GANs (CGAN, SinGAN, ESRGAN, ORGAN, 3D-GAN, GauGAN, TediGAN, MoCoGAN), GDMs (DALL•E 2, Imagen), GDL (AlphaFold), and DALL•E.]
Explainability
Explaining the mechanisms of a deep neural network involves analyzing the input data properties (also known as features) to determine which affect the outcomes and to infer what happens inside the black box. However, determining which neurons affected the synthesis of which output objects remains problematic today. Moreover, in the case of GANs, quantifying the mutual behavior of a pair of networks is currently an intractable problem. A deeper understanding of GAI models' inner workings and explanations tailored to specific user groups are both still missing. Accordingly, Veličković9 notes the following:

"Explainability will be very, very important for generative modeling. In our recent research,28 mathematician collaborators took the explanations produced by our models and tried to prove a mathematical theory based on these explanations, which is significantly more rigorous than the original explainer's output. What came out from these models was, to us, quite unintelligible. The mathematicians were able to spot the signal hidden inside the tons of noise after looking at sent explanations for two weeks. So, explainability methods are rudimentary, and if we truly want to make it accessible to everyone, we really need explainability techniques that have a far better signal-to-noise ratio."

Fairness
Although generative language models such as GPTs offer marked improvements on various NLP tasks, they require massive amounts of unfiltered online text. Consequently, they can generate synthetic language with bias, stereotypes, and harmful content.29 To manage these risks, GAI providers should offer tools for preprocessing and curating training data; monitoring and moderating the media generation processes; and developing guidelines for responsible deployment models.

Ethics
GAI models can immediately synthesize artifacts at scale for many different contexts, from education to medical decision making. However, before diving into production deployment, model creators should clearly define their goals; identify beneficiaries; and confirm usage scenarios with target users to prevent unintended unethical product behavior. This requires that all affected stakeholders—GAI scientists, AI engineers, domain experts, regulatory authorities, and target users—are identified and actively participate.

Accountability
Prospective users must weigh GAI products' benefits against their risks. Organizations that are creating, training, and deploying GAI systems must diligently strive to reduce model behavior risks. This requires teams to be thorough, transparent, and proactive in communicating identified threats, blind spots, and areas where risks are unknown when highlighting GAI system benefits.

GAI WORKFLOW

To address these challenges, we envision a future GAI deployment workflow as in Figure 2. This proposed workflow can provide benefits beyond GAI models. Because synthetic data creation increases the data set size for a particular GAI model, it can become part of the model's iterative and incremental development and deployment cycle to continually improve performance. Specifically, if a GAI model works well overall but performs poorly on certain features, more data can be generated for those critical categories to help detect and correct errors.

Moreover, traditional ML workflows take good testing results on the data sets used as an indicator of having modeled the input data distribution. However, physical- and digital-world data sets collected over several model generations will change their class structure and internal connections. Newly synthesized objects or artifacts, such as a language, an environment, or a (human) being, are examples of such data sets. When observing and evaluating such changes in deployment, the straightforward way to expand generative capacity is to retrain the model with the modified class structure data.

GAI has become a key technology in synthesizing new virtual artifacts or enhancing semisynthesized augmented artifacts. GAI's achievements in many fields are paving the way for synthetic sciences to combine AI with basic disciplines such as engineering, biology, medicine, and environmental science. Although current advancements within a discipline are seldom connected with developments in others due to various sociotechnical factors like communication norms, cultural differences, AI models, data, and

[FIGURE 2. The future GAI model deployment workflow as a closed system with a feedback loop. Stages: (Re)Train, Mitigate, and Verify (offline/test); Deploy, Observe, and Evaluate (online/production).]
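The feedback loop of Figure 2 can be sketched as a plain control loop. The per-category "model," the error metric, and the `collect()` stand-in for synthetic data generation are all hypothetical choices of ours; the point is the structure: (re)train, evaluate per category, and generate more data for the weakest category before retraining.

```python
import numpy as np

rng = np.random.default_rng(3)

# True per-category data sources (stand-ins for real-world media streams).
true_mean = {"text": 0.0, "image": 5.0, "audio": -3.0}

def collect(category, n):
    # Stand-in for generating or collecting more data for one category.
    return rng.normal(true_mean[category], 1.0, n)

# Toy "generative model": one fitted mean per category. The "audio"
# category starts deliberately under-covered.
train_sets = {"text": collect("text", 500),
              "image": collect("image", 500),
              "audio": collect("audio", 3)}
holdout = {c: collect(c, 1000) for c in true_mean}

for round_ in range(6):
    model = {c: xs.mean() for c, xs in train_sets.items()}          # (Re)Train
    errors = {c: abs(model[c] - holdout[c].mean()) for c in model}  # Evaluate
    weakest = max(errors, key=errors.get)                           # Observe
    if errors[weakest] < 0.1:                                       # Verify
        break
    # Mitigate: synthesize extra data for the weak category, then retrain.
    train_sets[weakest] = np.concatenate([train_sets[weakest],
                                          collect(weakest, 200)])

print(f"weakest category: {weakest}, error {errors[weakest]:.3f}")
```

Each pass through the loop shrinks the worst per-category error, mirroring the article's point that synthetic data generation can target exactly the categories where a deployed model performs poorly.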
GAI models and the synthetic artifacts they create are ever increasing in prevalence, adoption, and sophistication. As verification, risk mitigation, and cross-disciplinary deployment techniques evolve, GAI solutions will become a foundational feature of any data architecture. Challenges like model explainability, cultural acceptance, and sociotechnical issues remain but will be surmounted as GAI applications evolve into an escalating arena of deployment.

ACKNOWLEDGMENT
We thank Petar Veličković for his constructive suggestions and helpful comments.

REFERENCES
1. J. Gui, Z. Sun, Y. Wen, D. Tao, and J. Ye, "A review on generative adversarial networks: Algorithms, theory, and applications," IEEE Trans. Knowl. Data Eng., early access, 2021, doi: 10.1109/TKDE.2021.3130191.
2. M. Abukmeil, S. Ferrari, A. Genovese, V. Piuri, and F. Scotti, "A survey of unsupervised generative models for exploratory data analysis and representation learning," ACM Comput. Surv., vol. 54, no. 5, pp. 1–40, 2021, doi: 10.1145/3450963.
3. I. J. Goodfellow et al., "Generative adversarial nets," in Proc. Adv. Neural Inf. Process. Syst., 2014, vol. 27, pp. 1–9.
4. T. Brown et al., "Language models are few-shot learners," in Proc. Adv. Neural Inf. Process. Syst., 2020, vol. 33, pp. 1877–1901.
6. J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Proc. Adv. Neural Inf. Process. Syst., 2020, vol. 33, pp. 6840–6851.
7. M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković, "Geometric deep learning: Grids, groups, graphs, geodesics, and gauges," 2021, arXiv:2104.13478.
8. C. K. Joshi, "Transformers are graph neural networks," The Gradient. https://thegradient.pub/transformers-are-graph-neural-networks/ (accessed Jul. 15, 2022).
9. P. Veličković, private communication, Jul. 5, 2022.
10. "Text generation examples," OpenAI. https://beta.openai.com/examples?category=generation (accessed Jul. 15, 2022).
11. M. Sharples and R. y. Pérez, Story Machines: How Computers Have Become Creative Writers. London, U.K.: Routledge, 2022.
12. R. Thoppilan et al., "LaMDA: Language models for dialog applications," 2022, arXiv:2201.08239.
13. G. L. Guimaraes, B. Sanchez-Lengeling, C. Outeiral, P. L. C. Farias, and A. Aspuru-Guzik, "Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models," 2017, arXiv:1705.10843.
14. J. Jumper et al., "Highly accurate protein structure prediction with AlphaFold," Nature, vol. 596, no. 7873, pp. 583–589, 2021.
16. A. Ramesh et al., "Zero-shot text-to-image generation," in Proc. 38th Int. Conf. Mach. Learn., 2021, pp. 8821–8831.
17. A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, "Hierarchical text-conditional image generation with CLIP latents," 2022, arXiv:2204.06125.
18. C. Saharia et al., "Photorealistic text-to-image diffusion models with deep language understanding," 2022, arXiv:2205.11487.
19. J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum, "Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling," in Proc. Adv. Neural Inf. Process. Syst., Barcelona, 2016, pp. 82–90.
20. T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro, "High-resolution image synthesis and semantic manipulation with conditional GANs," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Salt Lake City, 2018, pp. 8798–8807, doi: 10.1109/CVPR.2018.00917.
21. W. Xia, Y. Yang, J.-H. Xue, and B. Wu, "TediGAN: Text-guided diverse face image generation and manipulation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Nashville, 2021, pp. 2256–2265, doi: 10.1109/CVPR46437.2021.00229.
22. T. R. Shaham, T. Dekel, and T. Michaeli, "SinGAN: Learning a generative model from a single natural image," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Seoul, 2019, pp. 4570–4580, doi: 10.1109/ICCV.2019.00467.
23. S. Tulyakov, M.-Y. Liu, X. Yang, and J. Kautz, "MoCoGAN: Decomposing motion and content for video generation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Salt Lake City, 2018, pp. 1526–1535.
24. X. Wang et al., "ESRGAN: Enhanced super-resolution generative adversarial networks," in Proc. Eur. Conf. Comput. Vis. (ECCV) Workshops, Munich, 2018, pp. 63–79, doi: 10.1007/978-3-030-11021-5_5.
25. T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," 2018, arXiv:1710.10196.
26. Z. Cai, Z. Xiong, H. Xu, P. Wang, W. Li, and Y. Pan, "Generative adversarial networks: A survey toward private and secure applications," ACM Comput. Surv., vol. 54, no. 6, pp. 1–38, 2022, doi: 10.1145/3459992.
27. A. Vahdat and K. Kreis, "Improving diffusion models as an alternative to GANs," NVIDIA. https://developer.nvidia.com/blog/improving-diffusion-models-as-an-alternative-to-gans-part-1/ (accessed Jul. 15, 2022).
28. A. Davies et al., "Advancing mathematics by guiding human intuition with AI," Nature, vol. 600, no. 7887, pp. 70–74, 2021, doi: 10.1038/s41586-021-04086-x.
29. P. Schramowski, C. Turan, N. Andersen, C. A. Rothkopf, and K. Kersting, "Large pre-trained language models contain human-like biases of what is right and wrong to do," Nature Mach. Intell., vol. 4, no. 3, pp. 258–268, 2022, doi: 10.1038/s42256-022-00458-8.

MLAĐAN JOVANOVIĆ is an assistant professor at Singidunum University, Belgrade, 11000, Serbia. Contact him at mjovanovic@singidunum.ac.rs.

MARK CAMPBELL is the chief innovation officer for EVOTEK, San Diego, California, 92121, USA. Contact him at mark@evotek.com.