Date: 13-06-2023
1. Introduction
● Generative AI is made possible by architectures such as transformers, generative adversarial
networks (GANs), and variational autoencoders (VAEs)
● Discuss the taxonomy of the main generative models in industry and analyze these
models by category
● Review of the applications of the models and the content they generate
2. Taxonomy of Generative AI
● Input to Output Format of model
2.1 Input to Output Format of model
2.2 Timeline of the release of model
2.3 Developer of the model
● Generative AI requires huge resources
● Collaboration of large companies with academia
3. Generative AI models categories
1. Text-to-image models
2. Text-to-3D models
3. Image-to-Text models
4. Text-to-Video models
5. Text-to-Audio models
6. Text-to-Text models
7. Text-to-Code models
8. Text-to-Science models
9. Other models
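The categories above, together with the models covered on the following slides, can be held in a small lookup table. This is only an illustrative sketch: the names `TAXONOMY`, `category_of`, and `modalities` are hypothetical, and the model spellings are normalized by assumption.

```python
# Category -> example models, as listed in this deck.
TAXONOMY = {
    "Text-to-Image": ["DALL-E 2", "Imagen", "Stable Diffusion", "Muse"],
    "Text-to-3D": ["DreamFusion", "Magic3D"],
    "Image-to-Text": ["Flamingo", "VisualGPT"],
    "Text-to-Video": ["Phenaki", "Soundify"],
    "Text-to-Audio": ["AudioLM", "Jukebox", "Whisper"],
    "Text-to-Text": ["ChatGPT", "LaMDA", "PEER"],
    "Text-to-Code": ["Codex", "AlphaCode"],
    "Text-to-Science": ["Galactica", "Minerva"],
    "Other": ["AlphaTensor", "Gato"],
}


def category_of(model_name):
    """Return the category a model is listed under, or None if it is absent."""
    wanted = model_name.lower()
    for category, models in TAXONOMY.items():
        if any(m.lower() == wanted for m in models):
            return category
    return None


def modalities(category):
    """Split 'X-to-Y' into (input, output); return None for 'Other'."""
    parts = category.split("-to-")
    return tuple(parts) if len(parts) == 2 else None
```

For example, `category_of("jukebox")` looks the model up case-insensitively, and `modalities("Text-to-3D")` splits a category name into its input and output formats.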
3.1 Text to Image Models
DALL-E 2
● Generates images and art from a text prompt
● Uses the Contrastive Language-Image Pre-training (CLIP) neural network
● CLIP learns the relation between textual semantics and their visual
representations
● A prior model maps the CLIP text embedding to a CLIP image embedding, and a decoder
based on GLIDE generates the image from it
● Applications: synthetic data generation and image editing
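The relation CLIP learns between text and images can be pictured as a similarity score between embeddings in a shared space. The sketch below is a toy illustration, not CLIP itself: the 3-dimensional vectors are hypothetical stand-ins for encoder outputs, and `rank_captions` and the temperature of 0.1 are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def rank_captions(image_embedding, caption_embeddings):
    """Return (caption, probability) pairs, best match first."""
    captions = list(caption_embeddings)
    sims = [cosine_similarity(image_embedding, caption_embeddings[c])
            for c in captions]
    probs = softmax(sims, temperature=0.1)
    return sorted(zip(captions, probs), key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; a real model would compute these with its
# image encoder and text encoder.
image_vec = [0.9, 0.1, 0.0]
captions = {
    "a cat wearing a beret": [1.0, 0.0, 0.1],
    "a bowl of soup": [0.0, 1.0, 0.2],
}
ranking = rank_captions(image_vec, captions)
```

The caption whose embedding points in nearly the same direction as the image embedding ends up first in `ranking`.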
Image generated from the prompt “A cat wearing a beret and black turtleneck”. Source: https://www.assemblyai.com/blog/how-dall-e-2-actually-works/
3.1 Text to Image Models
Imagen
● Based on a pretrained text encoder that maps text to a sequence of word embeddings
● A cascade of conditional diffusion models maps these embeddings to images
● Finding: large language models pretrained on text-only corpora are very effective at
encoding text for image synthesis
● Increasing the size of the language model boosts both sample fidelity and image-text
alignment more than increasing the size of the image diffusion model
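The encoder-plus-cascade structure described above can be sketched as a pipeline in which each stage receives the text embeddings and the previous, lower-resolution image. Everything here is a placeholder under stated assumptions: the function names are hypothetical, the "embeddings" are just word lengths, nearest-neighbour upsampling stands in for the diffusion models, and the toy resolutions (4, 8, 16) are chosen only to keep the example small.

```python
def encode_text(prompt):
    """Stand-in for a frozen text encoder: one toy 'embedding' per word."""
    return [[float(len(word))] for word in prompt.split()]


def base_stage(embeddings, size):
    """Stand-in for the base text-conditional model: a flat size x size image."""
    fill = sum(e[0] for e in embeddings) / len(embeddings)
    return [[fill] * size for _ in range(size)]


def super_resolution_stage(embeddings, image, size):
    """Stand-in for a text-conditional super-resolution model.

    Nearest-neighbour upsampling replaces the actual denoising process;
    `embeddings` is accepted only to show that every stage is conditioned
    on the text.
    """
    factor = size // len(image)
    return [[image[r // factor][c // factor] for c in range(size)]
            for r in range(size)]


def generate(prompt, resolutions=(4, 8, 16)):
    """Run the cascade: base image first, then each upsampling stage in turn."""
    embeddings = encode_text(prompt)
    image = base_stage(embeddings, resolutions[0])
    for size in resolutions[1:]:
        image = super_resolution_stage(embeddings, image, size)
    return image


image = generate("a dog wearing a beret")
```

The point of the design is that only the first stage has to produce an image from scratch; later stages refine an existing image while still seeing the text.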
“A Golden Retriever dog wearing a blue checkered beret and red dotted turtleneck”.
https://imagen.research.google/
3.1 Text to Image Models
Stable Diffusion
3.1 Text to Image Models
Muse
https://blog.metaphysic.ai/muse-googles-super-fast-text-to-image-model-abandons-latent-diffusion-for-transformers/#:~:text=The%20authors%20estimate%20that%20Muse,Stable%20Diffusion%20requires%20for%20inference.
3.2 Text to 3D Models
DreamFusion
3D model created with “cat wearing virtual reality headset in renaissance oil
painting high detail caravaggio” prompt
https://dreamfusion3d.github.io/gallery.html
3.2 Text to 3D Models
Magic3D
Textured 3D mesh
3D model for “a peacock on a surfboard”
https://research.nvidia.com/labs/dir/magic3d/
3.3 Image to Text Models
Flamingo
https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
3.3 Image to Text Models
VisualGPT
3.4 Text to Video Models
Phenaki
3.4 Text to Video Models
Soundify
https://chuanenlin.com/papers/soundify-neurips2021.pdf
3.5 Text to Audio Models
AudioLM
https://ai.googleblog.com/2022/10/audiolm-language-modeling-approach-to.html
3.5 Text to Audio Models
Jukebox
● Generates music with singing in a variety of genres and artist styles
● Raw audio is high-dimensional, which makes it computationally challenging to model
● Uses a hierarchical VQ-VAE architecture to compress audio to a lower-dimensional space
● Trained on 1.2 million songs along with lyrics and other metadata
● Can generate music from a given audio prompt and songs from given lyrics
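The compression idea behind a hierarchical VQ-VAE can be sketched as follows: each frame of audio is reduced to a latent vector, which is then replaced by the index of its nearest codebook entry, and several levels use different frame lengths. This is a toy sketch under loud assumptions: frame averaging stands in for a learned encoder, the three-entry codebook is hand-picked rather than learned, and the names `encode`, `decode`, and `hop` are hypothetical.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared L2 distance)."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def encode(samples, codebook, hop):
    """Average each `hop`-sample frame into a 1-D latent, then quantize it."""
    codes = []
    for start in range(0, len(samples) - hop + 1, hop):
        frame = samples[start:start + hop]
        latent = [sum(frame) / hop]
        codes.append(nearest_code(latent, codebook))
    return codes


def decode(codes, codebook, hop):
    """Crude reconstruction: repeat each code's value for `hop` samples."""
    out = []
    for index in codes:
        out.extend(codebook[index] * hop)
    return out


# Hand-picked toy codebook and signal (assumptions for illustration only).
codebook = [[-1.0], [0.0], [1.0]]
samples = [0.9, 1.1, 0.8, 1.2, -0.8, -1.2, -0.9, -1.1]

fine = encode(samples, codebook, hop=2)    # finer level: 4 codes for 8 samples
coarse = encode(samples, codebook, hop=4)  # coarser level: 2 codes for 8 samples
reconstruction = decode(coarse, codebook, hop=4)
```

The coarser level yields a shorter code sequence that keeps only broad structure, while the finer level keeps more detail; a generative model can then work on these short discrete sequences instead of raw audio.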
https://cdn.openai.com/papers/jukebox.pdf
3.5 Text to Audio Models
Whisper
https://openai.com/research/whisper
3.6 Text to Text Models
ChatGPT
3.6 Text to Text Models
LaMDA
https://ai.googleblog.com/2022/01/lamda-towards-safe-grounded-and-high.html
3.6 Text to Text Models
PEER
https://ai.facebook.com/research/publications/peer-a-collaborative-language-model/
3.6 Text to Text Models
https://ai.facebook.com/blog/ai-speech-brain-activity/
3.7 Text to Code Models
Codex
3.7 Text to Code Models
AlphaCode
https://www.deepmind.com/blog/competitive-programming-with-alphacode
3.8 Text to Science Models
Galactica
https://galactica.org/explore/
3.8 Text to Science Models
Minerva
https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html
3.9 Other Models
AlphaTensor
https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor
3.9 Other Models
Gato
https://www.deepmind.com/blog/a-generalist-agent
4. Conclusion
● Can help optimize both creative and non-creative tasks
● Lack of data and bias in data hinder progress
● Lack of understanding of ethics
● Generative AI and its purpose are still in a discovery phase