
IT INNOVATION
EDITOR MARK CAMPBELL, EVOTEK; mark@evotek.com

Generative Artificial Intelligence: Trends and Prospects

Mlađan Jovanović, Singidunum University
Mark Campbell, EVOTEK

Digital Object Identifier 10.1109/MC.2022.3192720
Date of current version: 26 September 2022
COMPUTER 0018-9162/22 © 2022 IEEE. Published by the IEEE Computer Society. October 2022.

Generative artificial intelligence can make powerful artifacts when used at scale, but developing trust in these artifacts and controlling their creation are essential for user adoption.

Generative modeling is an artificial intelligence (AI) technique that generates synthetic artifacts by analyzing training examples, learning their patterns and distribution, and then creating realistic facsimiles. Generative AI (GAI) uses generative modeling and advances in deep learning (DL) to produce diverse content at scale by utilizing existing media such as text, graphics, audio, and video.1,2 While mainly used in research settings, GAI is entering various domains and everyday scenarios. This article sheds light on the unique practical opportunities and challenges GAI brings.

GAI TECHNIQUES
Although there are many forms of GAI, we will look at four of the most common techniques being leveraged today.

Generative adversarial networks
Generative adversarial networks (GANs) are the most prevalent GAI technique being used today.3 A GAN uses a pair of neural networks. One, known as the generator, synthesizes the content (for example, an image of a human face). The second, known as the discriminator, evaluates the authenticity of the generator's content (that is, whether the face is natural or fake). The networks repeat this generate/discriminate cycle until the generator produces content that the discriminator cannot distinguish as real or synthetic.

Generative Pre-trained Transformer
Generative Pre-trained Transformer (GPT) models generate text in different languages and can create human-sounding words, sentences, and paragraphs on almost any topic and in almost any writing style—from convincing news articles and essays to conversations in customer-service chatbots or characters in video games.4 These models have matured over several generations, each with a larger parameter set trained on a more extensive online textual corpus than the previous one.
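The generate/discriminate cycle described in the GANs section above can be sketched on a deliberately tiny problem. The following is a minimal numpy sketch, not a real GAN: the generator is a single shift parameter, the discriminator is a logistic classifier on scalars, the gradients are written out by hand, and all function and parameter names are our own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_toy_gan(data_mean=3.0, steps=3000, batch=64,
                  lr_d=0.1, lr_g=0.02, seed=0):
    """One-parameter generator g(z) = theta + z versus a logistic
    discriminator D(x) = sigmoid(a*x + b), trained adversarially on
    1-D Gaussian data. Returns the generator's learned shift, which
    should drift toward the real data mean."""
    rng = np.random.default_rng(seed)
    theta, a, b = 0.0, 0.1, 0.0
    recent = []
    for _ in range(steps):
        x_real = rng.normal(data_mean, 1.0, batch)
        x_fake = theta + rng.normal(0.0, 1.0, batch)
        # Discriminator step: ascend log D(real) + log(1 - D(fake)).
        d_real = sigmoid(a * x_real + b)
        d_fake = sigmoid(a * x_fake + b)
        a += lr_d * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
        b += lr_d * (np.mean(1 - d_real) - np.mean(d_fake))
        # Generator step: ascend log D(fake), that is, fool the discriminator.
        d_fake = sigmoid(a * x_fake + b)
        theta += lr_g * np.mean((1 - d_fake) * a)
        recent.append(theta)
    # Average the oscillating tail for a stable estimate.
    return float(np.mean(recent[-500:]))
```

Under these assumptions, the learned shift settles near the real data mean as the two players alternate updates, which is the equilibrium intuition behind GANs: at convergence the discriminator can no longer tell real from fake.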
One recent example is OpenAI's GPT-3, which stunned the AI world by writing, without human assistance, a convincing article about scientists discovering a herd of unicorns in the Andes.5

The generative diffusion model
The generative diffusion model (GDM) synthesizes content by taking a training data distribution, gradually adding noise, and learning how to recover the data as a reversal of the noise-addition process.6 This way, data are generated from randomly sampled noise through the learned denoising process.

Geometric DL
Geometric DL (GDL) attempts to understand, interpret, and describe AI models in terms of geometric principles. These principles have already been extensively studied over domains such as grids, transformations in homogeneous spaces, graphs, and vector bundles.7 Petar Veličković,8,9 a staff research scientist at DeepMind and affiliated lecturer at the University of Cambridge, describes it as

"…building machine learning (ML) models that respect invariances and symmetries inside data. For example, if you know your data lives on a grid and it should be translation symmetric, you should end up deriving something like a convolutional neural network [CNN]. So, our blueprint that consists of the chosen data domain and the symmetry group you want to be resistant to can guide different ML architectures such as CNNs, GNNs, transformers, or recurrent models. For example, transformers are a special case of attentional graph neural networks over the complete graph."

USING GAI
These and other GAI techniques are being used in a host of applications, including the following.

Natural language and music
GPTs can be readily applied to natural language (NL) text generation. GPT-3,4 mentioned previously, has been successfully scaled to 175 billion learnable parameters and trained on global-scale corpora of textual exemplars. Aside from showing high performance on a variety of NL processing (NLP) tasks, such as translation and question answering, it is also a competent text generator producing eerily human-like textual content.10

For example, a GPT-3 program wrote an entire student essay from a simple prompt ("The construct of 'learning styles' is problematic because…").11 The narrative flowed as if it were written by a human—plagiarism software did not detect copying, and a Google search showed that each sentence was original. The authors highlighted several potential uses of GAI tools in education, such as facilitating creative writing (for example, students and AI write paragraphs alternately to explore alternatives and overcome writer's block) or developing a critical analysis of academic writing (for example, AI generates texts on a topic, and students critique and revise them).

Language Model for Dialogue Applications (LaMDA) is another example. This generative, textual conversational agent mimics human conversations,12 but unlike GPT models trained on text corpora, it is trained on dialog corpora. Objective-Reinforced GAN (ORGAN) is a further example that produces time-series artifacts in sequential media such as music.13

Computer graphics
AlphaFold is a neural network that creates highly accurate 3D protein structures14 by modeling and predicting protein structures as a graph inference problem in 3D space, where nearby residues define the edges of the graph. The pair representation is encoded as a directed edge in a graph (that is, the connection between the residues). The NVIDIA Canvas application GauGAN transforms a textual phrase like "ocean waves hitting rocks on the beach" into virtual landscape images in real time. When the user adds adjectives, as in "sunset at a rocky beach," or swaps "sunset" for "afternoon" or "rainy day," the model modifies the picture instantly.15 Similarly, DALL•E is a version of GPT-3 that produces images from text descriptions for concepts expressed in NL, taking text/image pairs as input.16 The latest GDM-based approaches for text-to-image generation are DALL•E 2,17 and Imagen,18 capable of producing diverse, high-quality artistic and realistic images, respectively. 3D-GAN creates 3D shapes19 that can be manipulated in 3D spaces (geometric transformation) and then scaled down to 2D image representations.

Computer vision
Using semantic label maps as an input, conditional GANs (CGANs) can produce high-fidelity images of urban scenes containing objects. Changing the labels modifies the scenes with respect to individual objects, such as replacing trees with buildings or changing colors or textures.20 TediGAN (Text-Guided Diverse Face Image Generation and Manipulation) creates human portrait drawings from facial photos with random changes to facial attributes.21 SinGAN22 is a single-image generative model that synthesizes realistic textures of arbitrary size and aspect ratio with significant variability.

Motion and Content decomposed GAN (MoCoGAN)23 conducts video synthesis with the two modalities separated: the same content can have different movements, and, conversely, the same action can be applied to varying content.
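The GDM's add-noise/learn-to-denoise principle described earlier has a simple closed form for the forward process. The following is a minimal numpy sketch with names of our own; in a real GDM the noise estimate would come from a trained network, whereas here the caller supplies it, so the sketch only illustrates the algebra that such a network is trained to invert.

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, alpha_bar, rng):
    """q(x_t | x_0): progressively destroy x0 with Gaussian noise.
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bar[t]
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps
    return xt, eps

def denoise_with_predicted_eps(xt, t, eps_hat, alpha_bar):
    """Recover an estimate of x0 from x_t given a noise prediction.
    In a trained GDM, eps_hat is the network's output."""
    ab = alpha_bar[t]
    return (xt - np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(ab)
```

When the supplied noise estimate equals the true noise, the reconstruction is exact; a trained denoiser only approximates this, which is why generation proceeds through many small reverse steps rather than one jump.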



Enhanced Super-Resolution GAN (ESRGAN) improves the quality of the media through super-resolution, a group of ML methods that upscale low-resolution image or video media to a higher resolution.24

Figure 1 illustrates the accuracy/complexity tradeoff for some typical GAI products. The accuracy refers to the intended performance of synthesized artifacts, while the complexity denotes their richness in media content and structure.

EVALUATION
Metrics that verify the efficiency and effectiveness of a GAI generation task can reuse current objective ML model evaluation techniques.1,2 For instance, GANs' functional quality metrics focus on their inner workings by evaluating outputs such as image quality; resolution; inception score (that is, the realism of generated images); and training time reduction.25 Nonfunctional evaluation, privacy, and security are also significant concerns for GANs.26 Some GAI evaluations, like a personal rating of the output for utility, aesthetics, clarity, or similarity to real-world content, are inherently subjective and difficult to quantify.

Currently, there are no standard means to determine whether a GAI is as realistic to a user as a non-GAI application. Consequently, reliable and consistent measurements of the effects of GANs are still undetermined. Relatedly, Veličković9 notes the following:

"Geometric approaches are going to be very important for generative modeling because the data you are generating will need to respect some kind of geometry if you want your solutions to be well constrained. Therefore, models that are mindful of geometry are, in my opinion, more likely to succeed in the long run."

EXPECTATIONS
Along with algorithmic improvements,25,26 future GAI solutions must meet the following expectations to gain user trust and adoption.

Efficiency
Training and deploying GAI models leave a significant carbon footprint and incur high computation costs. For example, GDMs naturally lag in sampling speed.27 Creating a fine-tuned, downsized model for the input data and parameter space is a cost-efficient approach for researchers and practitioners.
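The inception score mentioned under EVALUATION has a compact definition: a model scores high when each sample's predicted class distribution is sharp (confident) while the marginal distribution over all samples is broad (diverse). A minimal numpy sketch of the score itself, assuming per-sample class probabilities have already been obtained from a classifier:

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """Inception score from an (n_samples, n_classes) matrix of class
    probabilities p(y|x): IS = exp(E_x[KL(p(y|x) || p(y))]).
    Ranges from 1 (no confidence or no diversity) up to n_classes."""
    p_y = p_yx.mean(axis=0)                               # marginal label distribution
    kl = p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))  # per-sample KL terms
    return float(np.exp(kl.sum(axis=1).mean()))
```

For example, identical uniform rows give a score of 1 (confident about nothing), while one-hot rows spread evenly across C classes give a score of C (confident and diverse); in practice the probabilities would come from a pretrained image classifier applied to the generated samples.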

[Figure 1 plots generative model families (GPTs: GPT-1, GPT-2, GPT-3, LaMDA; GANs: ORGAN, 3D-GAN, GauGAN, TediGAN, SinGAN, CGAN, MoCoGAN, ESRGAN; GDMs: DALL•E 2, Imagen; plus DALL•E, GDL, and AlphaFold) against the accuracy and complexity of their artifacts (music, paragraphs, documents, essays, textures, 3D objects, dialog, urban scenes, landscape images, synthetic video, super-resolution images, artistic and realistic images, human portraits, and protein structures), with future GAI technology with learning capability at the upper right.]

FIGURE 1. The GAI landscape: generative models and artifacts.




Explainability
Explaining the mechanisms of a deep neural network involves analyzing the input data properties (also known as features) to determine which ones affect the outcomes and to infer what happens inside the black box. However, determining which neurons affected the synthesis of which output objects remains problematic today. Moreover, in the case of GANs, quantifying the mutual behavior of a pair of networks is currently an intractable problem. A deeper understanding of GAI models' inner workings and explanations tailored to specific user groups are both still missing. Accordingly, Veličković9 notes the following:

"Explainability will be very, very important for generative modeling. In our recent research,28 mathematician collaborators took the explanations produced by our models and tried to prove a mathematical theory based on these explanations, which is significantly more rigorous than the original explainer's output. What came out from these models was, to us, quite unintelligible. The mathematicians were able to spot the signal hidden inside the tons of noise after looking at the explanations for two weeks. So, explainability methods are rudimentary, and if we truly want to make it accessible to everyone, we really need explainability techniques that have a far better signal-to-noise ratio."

Fairness
Although generative language models such as GPTs offer marked improvements on various NLP tasks, they require massive amounts of unfiltered online text. Consequently, they can generate synthetic language with bias, stereotypes, and harmful content.29 To manage these risks, GAI providers should offer tools for preprocessing and curating training data; monitoring and moderating the media generation processes; and developing guidelines for responsible deployment models.

Ethics
GAI models can immediately synthesize artifacts at scale for many different contexts, from education to medical decision making. However, before diving into production deployment, model creators should clearly define their goals, identify beneficiaries, and confirm usage scenarios with target users to prevent unintended unethical product behavior. This requires that all affected stakeholders—GAI scientists, AI engineers, domain experts, regulatory authorities, and target users—are identified and actively participate.

Accountability
Prospective users must weigh GAI products' benefits against their risks. Organizations that are creating, training, and deploying GAI systems must diligently strive to reduce model behavior risks. This requires teams to be thorough, transparent, and proactive in communicating identified threats, blind spots, and areas where risks are unknown when highlighting GAI system benefits.

GAI WORKFLOW
To address these challenges, we envision a future GAI deployment workflow as in Figure 2. This proposed workflow can provide benefits beyond GAI models. Because synthetic data creation increases the data set size for a particular GAI model, it can become part of the model's iterative and incremental development and deployment cycle to continually improve performance. Specifically, if a GAI model works well overall but performs poorly on certain features, more data can be generated for those critical categories to help detect and correct errors.

FIGURE 2. The future GAI model deployment workflow as a closed system with a feedback loop, spanning offline/test and online/production stages: (Re)Train, Mitigate, Verify, Deploy, Observe, and Evaluate.

Moreover, traditional ML workflows take good testing results on the data sets used as an indicator of having modeled the input data distribution. However, physical- and digital-world data sets collected over several model generations will change their class structure and internal connections. Newly synthesized objects or artifacts, such as a language, an environment, or a (human) being, are examples of such data sets. When observing and evaluating such changes in deployment, the straightforward way to expand generative capacity is to retrain the model with the modified class structure data.

GAI has become a key technology in synthesizing new virtual artifacts and enhancing semisynthesized augmented artifacts. GAI's achievements in many fields are paving the way for synthetic sciences to combine AI with basic disciplines such as engineering, biology, medicine, and environmental science. Although current advancements within a discipline are seldom connected with developments in others due to various sociotechnical factors like communication norms, cultural differences, AI models, data, and procedures, the metaverse is a promising global, interdisciplinary testbed for resolving these obstacles. GAI would diminish the distinction between real and virtual artifacts to meld human experience and behavior across the virtual and physical worlds.
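The closed loop of Figure 2 can also be sketched as plain control flow. This is a hypothetical skeleton, not a real MLOps API: every stage function and field below is an illustrative stub of our own, standing in for real training, risk-mitigation, and monitoring machinery.

```python
from dataclasses import dataclass, field

@dataclass
class ModelState:
    quality: float = 0.5          # stand-in for measured task performance
    risks_mitigated: bool = False
    deployed: bool = False
    history: list = field(default_factory=list)

def retrain(m):                   # offline: refit on (synthetic-augmented) data
    m.quality = min(1.0, m.quality + 0.1)
    return m

def mitigate(m):                  # offline: reduce known model-behavior risks
    m.risks_mitigated = True
    return m

def verify(m):                    # offline gate before going to production
    return m.quality >= 0.8 and m.risks_mitigated

def deploy(m):                    # online: put the model into production
    m.deployed = True
    return m

def observe(m):                   # online: collect field telemetry
    return {"quality": m.quality}

def evaluate(obs):                # online: is production behavior acceptable?
    return obs["quality"] >= 0.9

def run_workflow(m, max_cycles=10):
    """(Re)Train -> Mitigate -> Verify -> Deploy -> Observe -> Evaluate,
    feeding unsatisfactory evaluations back into retraining."""
    for _ in range(max_cycles):
        m = mitigate(retrain(m))
        if not verify(m):
            continue              # stay offline until checks pass
        m = deploy(m)
        obs = observe(m)
        m.history.append(obs)
        if evaluate(obs):
            break                 # meets the bar in production
    return m
```

The key design point is the feedback edge: a failed online evaluation does not end the process but routes the model back through retraining, which is where the generated-data augmentation described above would be injected.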



GAI models and the synthetic artifacts they create are ever increasing in prevalence, adoption, and sophistication. As verification, risk mitigation, and cross-disciplinary deployment techniques evolve, GAI solutions will become a foundational feature of any data architecture. Challenges like model explainability, cultural acceptance, and sociotechnical issues remain but will be surmounted as GAI applications evolve into an escalating arena of deployment.

ACKNOWLEDGMENT
We thank Petar Veličković for his constructive suggestions and helpful comments.

REFERENCES
1. J. Gui, Z. Sun, Y. Wen, D. Tao, and J. Ye, "A review on generative adversarial networks: Algorithms, theory, and applications," IEEE Trans. Knowl. Data Eng., early access, 2021, doi: 10.1109/TKDE.2021.3130191.
2. M. Abukmeil, S. Ferrari, A. Genovese, V. Piuri, and F. Scotti, "A survey of unsupervised generative models for exploratory data analysis and representation learning," ACM Comput. Surv., vol. 54, no. 5, pp. 1–40, 2021, doi: 10.1145/3450963.
3. I. J. Goodfellow et al., "Generative adversarial nets," in Proc. Adv. Neural Inf. Process. Syst., 2014, vol. 27, pp. 1–9. [Online]. Available: https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf
4. T. Brown et al., "Language models are few-shot learners," in Proc. Adv. Neural Inf. Process. Syst., 2020, vol. 33, pp. 1877–1901. [Online]. Available: https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
5. "GPT-3 discovers unicorns." GPT-3. https://www.buildgpt3.com/post/88/ (Accessed: Jul. 1, 2022).
6. J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Proc. Adv. Neural Inf. Process. Syst., 2020, vol. 33, pp. 6840–6851. [Online]. Available: https://proceedings.neurips.cc/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf
7. M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković, "Geometric deep learning: Grids, groups, graphs, geodesics, and gauges," 2021, arXiv:2104.13478.
8. C. K. Joshi. "Transformers are graph neural networks." The Gradient. https://thegradient.pub/transformers-are-graph-neural-networks/ (Accessed: Jul. 15, 2022).
9. P. Veličković, private communication, Jul. 5, 2022.
10. "Text generation examples." OpenAI. https://beta.openai.com/examples?category=generation (Accessed: Jul. 15, 2022).
11. M. Sharples and R. y. Pérez, Story Machines: How Computers Have Become Creative Writers. London, U.K.: Routledge, 2022. [Online]. Available: https://www.routledge.com/Story-Machines-How-Computers-Have-Become-Creative-Writers/Sharples-Perez/p/book/9780367751975
12. R. Thoppilan et al., "LaMDA: Language models for dialog applications," Feb. 10, 2022. Accessed: Jul. 15, 2022. [Online]. Available: https://arxiv.org/abs/2201.08239
13. G. L. Guimaraes, B. Sanchez-Lengeling, C. Outeiral, P. L. C. Farias, and A. Aspuru-Guzik, "Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models," May 30, 2017. Accessed: Jul. 15, 2022. [Online]. Available: https://arxiv.org/abs/1705.10843
14. J. Jumper et al., "Highly accurate protein structure prediction with AlphaFold," Nature, vol. 596, no. 7873, pp. 583–589, 2021, doi: 10.1038/s41586-021-03819-2.
15. T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, "GauGAN: Semantic image synthesis with spatially adaptive normalization," in Proc. SIGGRAPH 2019 Real-Time Live!, New York, NY, USA, 2019, p. 1, doi: 10.1145/3306305.3332370.
16. A. Ramesh et al., "Zero-shot text-to-image generation," in Proc. 38th Int. Conf. Mach. Learn., 2021, pp. 8821–8831. [Online]. Available: https://proceedings.mlr.press/v139/ramesh21a.html
17. A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, "Hierarchical text-conditional image generation with CLIP latents," Apr. 13, 2022. Accessed: Jul. 15, 2022. [Online]. Available: https://arxiv.org/abs/2204.06125
18. C. Saharia et al., "Photorealistic text-to-image diffusion models with deep language understanding," May 23, 2022. Accessed: Jul. 15, 2022. [Online]. Available: https://aps.arxiv.org/abs/2205.11487
19. J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum, "Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling," in Proc. Adv. Neural Inf. Process. Syst., Barcelona, 2016, pp. 82–90, doi: 10.5555/3157096.3157106.
20. T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro, "High-resolution image synthesis and semantic manipulation with conditional GANs," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Salt Lake City, 2018, pp. 8798–8807, doi: 10.1109/CVPR.2018.00917.
21. W. Xia, Y. Yang, J.-H. Xue, and B. Wu, "TediGAN: Text-guided diverse face image generation and manipulation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Nashville, 2021, pp. 2256–2265, doi: 10.1109/CVPR46437.2021.00229.
22. T. R. Shaham, T. Dekel, and T. Michaeli, "SinGAN: Learning a generative model from a single natural image," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Seoul, 2019, pp. 4570–4580, doi: 10.1109/ICCV.2019.00467.
23. S. Tulyakov, M.-Y. Liu, X. Yang, and J. Kautz, "MoCoGAN: Decomposing motion and content for video generation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Salt Lake City, 2018, pp. 1526–1535. [Online]. Available: https://ieeexplore.ieee.org/document/8578263
24. X. Wang et al., "ESRGAN: Enhanced super-resolution generative adversarial networks," in Proc. Eur. Conf. Comput. Vis. (ECCV) Workshops, Munich, 2018, pp. 63–79, doi: 10.1007/978-3-030-11021-5_5.
25. T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," Feb. 26, 2018. Accessed: Jul. 15, 2022. [Online]. Available: https://arxiv.org/abs/1710.10196
26. Z. Cai, Z. Xiong, H. Xu, P. Wang, W. Li, and Y. Pan, "Generative adversarial networks: A survey toward private and secure applications," ACM Comput. Surv., vol. 54, no. 6, pp. 1–38, 2022, doi: 10.1145/3459992.
27. A. Vahdat and K. Kreis. "Improving diffusion models as an alternative to GANs." NVIDIA. https://developer.nvidia.com/blog/improving-diffusion-models-as-an-alternative-to-gans-part-1/ (Accessed: Jul. 15, 2022).
28. A. Davies et al., "Advancing mathematics by guiding human intuition with AI," Nature, vol. 600, no. 7887, pp. 70–74, 2021, doi: 10.1038/s41586-021-04086-x.
29. P. Schramowski, C. Turan, N. Andersen, C. A. Rothkopf, and K. Kersting, "Large pre-trained language models contain human-like biases of what is right and wrong to do," Nature Mach. Intell., vol. 4, no. 3, pp. 258–268, 2022, doi: 10.1038/s42256-022-00458-8.

MLAĐAN JOVANOVIĆ is an assistant professor at Singidunum University, Belgrade, 11000, Serbia. Contact him at mjovanovic@singidunum.ac.rs.

MARK CAMPBELL is the chief innovation officer for EVOTEK, San Diego, California, 92121, USA. Contact him at mark@evotek.com.
