An AI Engineer's Guide to Machine Learning and Generative AI
by ai geek (wishesh)
TL;DR
In this blog I will provide an in-depth overview of the key ideas, scientific
innovations, and technologies powering modern AI applications and agents.
Introduction
Generative AI systems like GPTs, Llama, PaLM, Claude, and others are transforming
digital user experiences. They can be seamlessly integrated with various data
sources, allowing AI applications to interact with users in a highly personalized way,
delivering information tailored to their specific needs.
There are a few different ways you can get LLMs to do what you want. Prompt
Engineering and Fine-tuning are the two most promising places to get started. Fine-
tuning is a more advanced technique, and we will discuss it in more detail in a
separate blog post. But let’s talk about Prompt Engineering. It is an iterative process,
and it’s difficult to predict how well a prompt will perform for a specific task in
advance. This approach involves trying out different prompts, evaluating the
results, and deciding what to do next. For instance, out of the two prompts provided
below, the second one performs better:
Prompt 1: [Problem/question] State the answer and then explain your reasoning.
Prompt 2: [Problem/question] Explain your reasoning and then state the answer.
The second prompt works better because LLMs predict text one token at a time.
With the first prompt, the model can commit to an answer before any reasoning
has been generated, resulting in less effective outcomes.
State the answer and then explain your reasoning (there’s no subway at exactly 8:15 am)
Explain your reasoning and then state the answer (more reasonable answer)
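To make the iterative loop concrete, here is a minimal sketch of how you might A/B test the two prompt orderings. `call_llm` is a placeholder to wire up to whichever chat-completion API you use, and the word problem is an invented example:

```python
# A minimal sketch of iterative prompt evaluation. `call_llm` is a
# placeholder: plug in your provider's chat-completion API here.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider")

# An invented word problem for illustration.
PROBLEM = ("A subway runs every 20 minutes starting at 8:05 am. "
           "If I arrive at 8:15 am, how long do I wait?")

prompts = {
    # answer-first: the model may commit to an answer before reasoning
    "answer-first": f"{PROBLEM} State the answer and then explain your reasoning.",
    # reasoning-first: reasoning tokens are generated before the answer
    "reasoning-first": f"{PROBLEM} Explain your reasoning and then state the answer.",
}

for name, prompt in prompts.items():
    print(f"--- {name} ---")
    print(call_llm(prompt))  # evaluate, compare, decide what to try next
```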
In this blog, we will delve into the pivotal concepts, innovations, and technologies
powering modern AI applications. We will begin by focusing on the fundamentals
of machine learning, and subsequently, we will explore generative AI models in
greater depth. Additionally, I’ve included courses and hands-on tutorials at the
conclusion of each topic for those who wish to further immerse themselves in these
subjects.
Machine Learning
Machine learning, a crucial aspect of artificial intelligence, imitates how humans
learn by using data and algorithms to improve accuracy with experience. The term
was coined by IBM's Arthur Samuel, whose self-learning checkers program made
history in 1962, and the field has since led to revolutionary advancements like
Netflix's recommendation system, Tesla's self-driving cars, and OpenAI's ChatGPT.
It relies on statistical techniques to make predictions and uncover patterns in data.
TensorFlow and PyTorch are essential tools for creating machine learning models.
As this field progresses, it introduces both transformative possibilities and
important ethical dilemmas.
I am including a brief introduction to the key machine learning techniques that are
relevant to generative AI. Knowing the basics of these techniques and the
underlying ideas will help you better understand the model outputs and aid in
conceptualizing your AI application development and evaluation process. A good
place to start with machine learning is Andrew Ng’s Machine Learning
Specialization on Coursera.
Supervised Learning
Widely used algorithms like neural networks, Naive Bayes, and support vector
machines (SVM) make supervised learning applicable in various business areas,
such as image recognition, sentiment analysis, and text generation. Despite its
widespread use, there are ongoing challenges, including the need for expert model
design and the time-intensive nature of training. Nevertheless, supervised learning
continues to be crucial in generating predictive insights across different industries.
If you are interested in delving deeper into Supervised Learning, you can explore
the Supervised Learning course on Coursera.
If you are interested in delving deeper into Unsupervised Learning, you can explore
the Unsupervised Learning course on Coursera.
Semi-Supervised Learning: Bridging the Gap Between Labeled and Unlabeled Data
Semi-Supervised Learning is useful when there is a shortage of labeled data but an
abundance of unlabeled data. For example, if there are millions of pictures of
different real-world objects, but only 50k of them are labeled, it’s impractical and
expensive to label the rest manually. Instead, semi-supervised learning offers a
solution by training a model on the labeled subset and using it to predict labels for
the unlabeled majority. This not only saves time and resources but also maintains
accuracy.
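As a concrete illustration, here is a minimal self-training sketch with scikit-learn, using a synthetic dataset in which only 5% of the samples keep their labels; the classifier pseudo-labels the rest as it trains:

```python
# A sketch of semi-supervised self-training with scikit-learn: train on the
# small labeled subset, then let the model pseudo-label the confident rest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=5000, random_state=0)
labels = np.full(len(y), -1)   # -1 marks a sample as unlabeled
labels[:250] = y[:250]         # pretend only 5% of the labels exist

# The base classifier is retrained as high-confidence pseudo-labels are added.
model = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
model.fit(X, labels)
print("accuracy on all data:", model.score(X, y))
```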
If you are interested in delving deeper into this topic, you can explore the hands-on
tutorials by DigitalSreeni.
Self-Supervised Learning (SSL)
The self-supervised workflow starts with an unlabeled source dataset and a labeled target dataset. SSL enables
models to learn strong data representations through programmatic label generation, pre-training, and fine-tuning
(Source: arxiv.org)
You may be curious about how SSL differs from unsupervised, semi-supervised, and
supervised learning. Let’s explore that further.
SSL and unsupervised learning have distinct purposes. Although both operate
without labeled data, they vary in feedback mechanisms. Unsupervised learning is
broader, emphasizing model-centric approaches, while SSL focuses on data-centric
feedback. Unsupervised learning excels at clustering and dimensionality reduction,
while SSL sets the stage for regression and classification tasks, similar to supervised
learning.
In terms of the distinction between SSL and semi-supervised learning, SSL relies on
data structure and doesn’t need labeled data, whereas semi-supervised learning
uses a small amount of labeled data alongside unlabeled data. Both aim to reduce
dependency on labels, but they differ in approach and application.
If you are interested in delving deeper into this topic, you can explore the Self-
Supervised Learning Series by Yann LeCun.
Deep Learning based self-driving car. The architecture can be implemented either as a sequential
perception-planning-action pipeline (a), or as an End2End system (b) (Source: arxiv.org)
If you are interested in delving deeper into the topic of Reinforcement Learning,
you can explore the Stanford course CS234: Reinforcement Learning by Emma
Brunskill. And if you want to further explore Deep Reinforcement Learning, you
can learn from the UC Berkeley lectures: Deep Reinforcement Learning.
Neural Networks: Mimicking the Brain for Intelligent Computing
Neural networks, also known as artificial neural networks (ANNs), form the basis of
deep learning, a branch of machine learning. These networks are inspired by the
structure of the human brain and consist of nodes organized into layers, including
input, hidden, and output layers. Each node, or artificial neuron, processes data
using weights and thresholds, becoming active if the output exceeds a specific
threshold. Through training with data, their accuracy is refined, making them
valuable tools for tasks like speech and image recognition, image generation, as
well as natural language processing.
Each node operates like a linear regression model, with inputs, weights, a bias, and
an output. This allows the network to handle complex information and make
decisions. The process of passing data from one layer to the next defines a neural
network as feedforward. Sigmoid neurons, which produce values between 0 and 1,
are commonly used for this purpose.
Neural networks are widely used in image recognition, speech processing, and
natural language processing. They rely on supervised learning, using labeled
datasets to fine-tune their algorithms. Training involves minimizing the cost
function through gradient descent, gradually adjusting the model’s parameters.
Additionally, backpropagation allows for error calculation and parameter
adjustments, further improving the model’s accuracy.
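The following from-scratch NumPy sketch ties these pieces together: sigmoid neurons, a feedforward pass, and gradient descent via backpropagation, trained on the toy XOR problem:

```python
# A from-scratch sketch of the ideas above: sigmoid neurons, a feedforward
# pass, and gradient descent via backpropagation, trained on XOR.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output layer

lr = 1.0
for step in range(5000):
    # forward pass: each node computes sigmoid(inputs @ weights + bias)
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backward pass: gradients of the squared-error cost for each parameter
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # gradient descent: nudge every weight against its gradient
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(3).ravel())   # should approach [0, 1, 1, 0]
```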
Neural Network Training in Progress, Showing Forward and Backward Passes (Source: 3Blue1Brown)
There are various types of neural networks designed for specific purposes.
Perceptrons, the earliest form, paved the way for multi-layer perceptrons (MLPs)
and convolutional neural networks (CNNs), which are extensively used in image
recognition. Recurrent neural networks (RNNs) excel in making predictions for
time-series data, while autoencoders focus on creating abstract representations
from input data.
If you are interested in delving deeper into the topic of Neural Networks, you can
explore the hands-on tutorials by Sentdex.
Deep Learning: Revolutionizing AI and Industries with Neural Networks
Deep learning involves neural networks with three or more layers, inspired by the
human brain, which learn from extensive data sets. Additional layers improve
accuracy compared to single-layer networks, forming the basis for various AI
applications like digital assistants, computer vision, and self-driving cars.
If you are interested in delving deeper into the topic of Deep Learning, you can
explore MIT's Introduction to Deep Learning course by Alexander Amini and the
Practical Deep Learning for Coders course by Jeremy Howard.
Natural Language Processing (NLP)
The applications of NLP are broad and influential. They range from automating
tasks with chatbots and agents to enhancing search capabilities and organizing
large document collections. Industries such as healthcare, legal, finance, customer
service, and insurance are benefiting greatly by streamlining processes involving
unstructured text.
At the core of NLP are machine learning models, with deep learning leading the
way. Techniques like transfer learning and pretrained foundation models such as
LLMs allow adaptation to new tasks with minimal training data. API providers like
OpenAI offer pretrained LLMs tailored to various applications, further accelerating
development.
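As a small illustration of this adaptability, the sketch below uses a pretrained foundation model for zero-shot classification: a brand-new task is handled with no task-specific training at all. The example text and candidate labels are invented:

```python
# A sketch of transfer learning's payoff: a pretrained foundation model
# handles a new task zero-shot, with no task-specific training data.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "The claimant reports water damage to the kitchen ceiling.",
    candidate_labels=["insurance", "healthcare", "finance", "legal"],
)
print(result["labels"][0], round(result["scores"][0], 3))
```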
If you are interested in delving deeper into the topic of NLP, you can explore
Stanford's CS224N: Natural Language Processing with Deep Learning course.
The Image-based Joint-Embedding Predictive Architecture (I-JEPA) uses a single context block to predict
the representations of various target blocks originating from the same image (Source: MetaAI)
If you are interested in delving deeper into the topic of Autoregressive Language
Models, you can explore the UC Berkeley CS294 Deep Unsupervised Learning course.
Transformers have also led to a positive cycle in AI development, as they can make
precise predictions and generate more data for ongoing model improvement.
Components of the transformer architecture, such as input and positional
embeddings, encoder-decoder layers, and residual connections, have paved the way
for significant advancements in natural language understanding and generation.
In the broader context, there are three main types of transformers suited for
different tasks: auto-regressive, auto-encoding, and sequence-to-sequence models.
The choice of transformer depends on factors like dataset size, task complexity, and
desired outcomes. GPT-like models excel in text generation, BERT-like models in
text comprehension, and BART/T5-like models are adept at both.
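A quick sketch of the three families using Hugging Face pipelines; the small checkpoints chosen here (gpt2, bert-base-uncased, t5-small) are illustrative stand-ins, not the state-of-the-art models discussed above:

```python
# A sketch of the three transformer families via Hugging Face pipelines.
# The small checkpoints below are illustrative stand-ins.
from transformers import pipeline

# Auto-regressive (GPT-like): predicts the next token; excels at generation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Generative AI is", max_new_tokens=20)[0]["generated_text"])

# Auto-encoding (BERT-like): sees the whole sentence; excels at comprehension.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
print(unmasker("Transformers are a [MASK] architecture.")[0]["token_str"])

# Sequence-to-sequence (BART/T5-like): maps an input text to an output text.
summarizer = pipeline("summarization", model="t5-small")
text = ("Transformers process entire sequences in parallel using attention, "
        "which made large-scale pretraining on text corpora practical.")
print(summarizer(text, max_length=25, min_length=5)[0]["summary_text"])
```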
If you are interested in delving deeper into the topic of Transformers, you can
explore the Stanford CS25: Transformers United course.
Markov Chain
A Markov chain, originally conceptualized by mathematician Andrey Markov,
stands as a transformative force in stochastic modeling. It operates on the principle
that the subsequent state depends solely on the current one, eliminating the need
for a complete historical record. This inherent 'memorylessness' expands its scope
of application to fields as varied as economics, genetics, and finance. Central to its
functioning are state transitions, dictated by probabilities: the future state is
contingent upon the present state and the time elapsed, disregarding prior
trajectories.
These chains are typically represented as directed graphs, where each arrow
signifies a transition probability. Matrix representation is pivotal, providing a
compact depiction of transition probabilities between states, and higher-order
matrices capture multi-step transitions. Markov chains are further categorized into
discrete-time and continuous-time, each influencing the nature of transitions,
while properties such as irreducibility and periodicity offer valuable insights into
their behavior. They have greatly simplified the study of real-world processes and
play a crucial role in data science, spanning techniques like Markov Chain Monte
Carlo (MCMC), information theory, and Diffusion Models, which are modeled as a
Markov chain with T steps. While certain assumptions underlie their application,
Markov chains, as exemplified in something as everyday as meal choices,
demonstrate their effectiveness and versatility in practical scenarios.
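The meal-choice example can be made concrete in a few lines of NumPy; the transition probabilities below are invented for illustration:

```python
# A sketch of a meal-choice Markov chain: each row of the transition matrix
# gives P(next state | current state); matrix powers give multi-step moves.
import numpy as np

states = ["pizza", "salad", "pasta"]
P = np.array([[0.2, 0.6, 0.2],    # from pizza
              [0.3, 0.0, 0.7],    # from salad
              [0.5, 0.3, 0.2]])   # from pasta

# Two-step transition probabilities: the higher-order matrix P @ P.
print(np.linalg.matrix_power(P, 2))

# Long-run (stationary) distribution: eigenvector of P.T for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
print(dict(zip(states, (pi / pi.sum()).round(3))))
```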
The Markov Chain of forward/reverse diffusion process of generating a sample by slowly adding/removing
noise (Source: arxiv.org)
If you are interested in delving deeper into the topic of Markov Chains, you can
explore the explanation of using Markov Chains in Diffusion Models by Ari Seff. You
can also check out the in-depth tutorials on Markov Chains by Normalized Nerd.
Autoencoders
Autoencoders are designed to replicate their input as output, making them
invaluable in tasks like image reconstruction and noise reduction. The magic lies in
their ability to distill complex data into a compact, lower-dimensional
representation, known as the bottleneck. This compression, coupled with
structured data, enables autoencoders to excel at tasks where correlations between
input features exist.
An autoencoder uses an encoder to compress an input into a representation and a decoder to reconstruct
the input from the representation (Source: DeepLearning.ai)
The Decoder, on the other hand, is tasked with reconstructing the compressed
knowledge representation back into the original form. For simple autoencoders, the
output mirrors the input, albeit with reduced noise. However, variational
autoencoders (VAEs) generate entirely new content based on the input, showcasing
the versatility of this technology.
The size of the bottleneck is crucial. A smaller bottleneck reduces the risk of
overfitting but may lead to the loss of important information. Therefore, striking a
balance is essential to ensure optimal performance.
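Here is a minimal PyTorch sketch of the encoder-bottleneck-decoder structure; the bottleneck width (8 below) is the knob whose trade-off is described above, and the random input stands in for real structured data:

```python
# A minimal PyTorch sketch of the encoder-bottleneck-decoder structure.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_features=64, bottleneck=8):
        super().__init__()
        # encoder: compress the input down to the bottleneck representation
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                     nn.Linear(32, bottleneck))
        # decoder: reconstruct the original input from that representation
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 32), nn.ReLU(),
                                     nn.Linear(32, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 64)              # stand-in for structured input data

for _ in range(200):                  # train the network to reproduce x
    loss = nn.functional.mse_loss(model(x), x)
    opt.zero_grad(); loss.backward(); opt.step()
print("reconstruction MSE:", loss.item())
```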
If you are interested in delving deeper into the topic of Autoencoders, you can
explore the hands-on tutorials by DigitalSreeni.
CLIP pre-trains an image encoder and a text encoder to predict which images were paired with which texts
in our dataset. (Source: OpenAI)
GPT models power generative applications like ChatGPT, enabling the creation of
human-like text, images, and more. GPT's impact spans industries, from Q&A bots to
content generation. Its significance lies in the transformative potential of the
transformer architecture, automating tasks from language translation to content
creation. GPT’s versatility spans social media content creation, code writing, data
analysis, and even building interactive voice assistants. By understanding and
predicting language, GPT models represent a leap towards achieving artificial
general intelligence.
Generative AI Models
Generative AI encompasses models capable of producing high-quality content
across various mediums, including text, images, audio, and video, driven by their
training data. OpenAI’s ChatGPT exemplifies this revolution, crafting poems, jokes,
and essays that rival human creations. While the initial emphasis was on visual
generation, the spotlight has now shifted to natural language processing.
Generative models can also extend beyond language, generating software code,
molecular structures, and more.
By processing vast datasets, GPT-3, with its 175 billion parameters, attains a level
of proficiency and fluency that marks a paradigm shift in AI capabilities. From
improving customer feedback analysis to enhancing virtual reality interactions,
GPT is reshaping industries across the board.
Transformer architecture and training objectives used for training original GPT model (Source: OpenAI)
GPT performance on academic and professional exams. Exams are ordered from low to high based on GPT-
3.5 performance. GPT-4 outperforms GPT-3.5 on most exams tested. (Source: arxiv.org)
A survey of LLMs — May 2023. A timeline of existing large language models (having a size larger than 10B) in recent
years. The timeline was established mainly according to the release date of the technical paper for a model. (Source:
arxiv.org)
The evolutionary tree of modern LLMs traces the development of language models in recent years and highlights
some of the most well-known models. Models on the same branch have closer relationships. Transformer-based
models are shown in non-grey colors: decoder-only models in the blue branch, encoder-only models in the pink
branch, and encoder-decoder models in the green branch. The vertical position of the models on the timeline
represents their release dates. Open-source models are represented by solid squares, while closed-source models
are represented by hollow ones. The stacked bar plot in the bottom right corner shows the number of models from
various companies and institutions (Source: arxiv.org)
LLM Ecosystem Graph is a framework to document the foundation models ecosystem, namely both the assets
(datasets, models, and applications) and their relationships. (Source: Stanford.edu)
A broader overview of LLMs, dividing LLMs into four branches: 1. Training 2. Inference 3. Applications 4. Challenges
(arxiv.org)
Open-source (and open) LLMs are becoming increasingly capable, and fine-tuned
models designed for specific tasks have started to surpass even the most capable
models like GPT-4 (Phind-CodeLlama-34B-v2 for coding and Gorilla for writing API
calls). In contrast to their proprietary counterparts, which are restricted by
licensing agreements, open-source LLMs are generally freely accessible. This
accessibility allows AI engineers and researchers not only to employ them for
various purposes but also to enhance and distribute them. This democratization
brings forth a host of advantages.
Helpfulness human evaluation results for Llama 2-Chat compared to other open-source and closed-source
models. (Source: arxiv.org)
However, it’s essential to acknowledge and address potential risks, including issues
of bias, misinformation, consent, and security. Through education and robust AI
governance, these challenges can be mitigated, ensuring the responsible and
effective use of open-source LLMs in a variety of domains.
The landscape of open-source LLMs is evolving rapidly, and models like Llama-2
and Mistral are emerging as the preferred choices for many AI researchers and
engineers. HuggingFace Open LLM Leaderboard and Stanford Ecosystem Graph are
good places to keep track of open source (and open) LLMs.
For successful prompt engineering for your AI applications you need to understand
two crucial factors of LLMs: Temperature and Top P. Temperature influences text
randomness, with lower values favoring conservative predictions and higher values
fostering creativity. Adjusting temperature is key to tailoring output for specific
tasks. For fact-based questions, opt for a lower temperature to prioritize accuracy.
Conversely, creative tasks, like poetry generation, benefit from higher temperatures
for imaginative results. Top P, on the other hand, controls response determinism. A
lower value narrows token choices for more precise but potentially less diverse
outputs, while a higher value encourages diversity by considering a wider token
range. Choose a lower top_p for accuracy-driven tasks and increase it for more
diverse responses.
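To see what the two knobs actually do, here is a sketch of sampling a single token from raw logits with temperature scaling and top-p (nucleus) filtering; the logits are invented for illustration:

```python
# What temperature and top_p do under the hood: sample one token from raw
# logits with temperature scaling and top-p (nucleus) filtering.
import numpy as np

def sample_token(logits, temperature=1.0, top_p=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Temperature < 1 sharpens the distribution (conservative);
    # temperature > 1 flattens it (more creative).
    scaled = np.exp(logits / temperature)
    probs = scaled / scaled.sum()

    # Top-p: keep the smallest set of tokens whose total mass reaches top_p.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]

    masked = np.zeros_like(probs)
    masked[keep] = probs[keep]
    masked /= masked.sum()
    return rng.choice(len(logits), p=masked)

logits = np.array([2.0, 1.0, 0.5, -1.0])
print(sample_token(logits, temperature=0.2, top_p=0.5))  # almost surely token 0
print(sample_token(logits, temperature=1.5, top_p=1.0))  # far more varied
```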
If you are interested in delving deeper into the topic of Large Language Models, you
can explore these courses: Stanford CS324 — Large Language Models, Stanford CS
224N NLP with Deep Learning, Princeton COS 597G Understanding Large Language
Models, Stanford XCS224U Natural Language Understanding, MIT Generative AI For
Constructive Communication.
Diffusion Models
A high-level overview of unCLIP, the architecture behind OpenAI's DALL-E 2 (Source: arxiv.org)
These generative models operate by adding noise to training data and then learning
to reverse this process, producing coherent images from randomness. They excel
in tasks like text-to-image generation, denoising, and more. At their core, diffusion
models are parameterized Markov chains, honed through variational inference,
designed to generate data resembling their training data. Put simply, if these
models are trained on images of dogs, they can conjure remarkably lifelike canine
images.
Diffusion models smoothly perturb data by adding noise, then reverse this process to generate new data from noise.
(Source: arxiv.org)
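A sketch of the forward (noising) half of this Markov chain, using the standard DDPM closed-form jump to step t; the linear beta schedule and toy data are illustrative choices:

```python
# A sketch of the forward (noising) half of the diffusion Markov chain:
# data is gradually destroyed over T steps; a model is then trained to
# reverse the process, step by step, starting from pure noise.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # per-step noise schedule
alpha_bar = np.cumprod(1.0 - betas)    # cumulative signal retention

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """Jump straight to step t of the chain in closed form (DDPM)."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

x0 = np.ones(4)                        # stand-in for an image
for t in (0, 250, 999):
    print(t, q_sample(x0, t).round(2)) # the signal fades into pure noise
```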
If you are interested in delving deeper into the topic of Diffusion Models, you can
explore these two courses: the UC Berkeley CS 198 Lecture on Diffusion Models and
Practical Deep Learning: Deep Learning Foundations to Stable Diffusion by Jeremy
Howard.
Multimodal Models
Multimodal models simultaneously handle diverse sensory inputs like text, images,
audio, and video. Unlike traditional unimodal AI systems, they fuse information
from various sources, yielding a richer understanding of data with context and
supporting details. These models employ intricate deep learning techniques
involving encoder, mixer, and decoder layers, imitating how humans integrate
sensory input.
a) Comparison between the human brain and multimodal foundation model BriVL (Bridging-Vision-and-Language)
for coping with both vision and language information. b) Comparison between modeling weak semantic correlation
data and modeling strong semantic correlation data. (Source: Nature)
MotionLM autoregressively generates sequences of discrete motion tokens for a set of agents to produce
consistent interactive trajectory forecasts. (Source: arxiv.org)
Generative AI for Autonomy (GAIA-1) Architecture: a generative world model that leverages video, text, and action
inputs to generate realistic driving scenarios. (Source: arxiv.org)
Illustration of the lightweight multimodal alignment learning of encoding and decoding. (Source: arxiv.org)
The late fusion approach combines predictions from separate models trained on
individual modalities, resulting in a final prediction. Late fusion proves effective
when modalities aren’t directly related or offer complementary information. An
example is emotion recognition in music, where audio features and lyrics are
separately modeled and then combined for a more accurate prediction.
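A minimal late-fusion sketch for the music-emotion example: two unimodal classifiers trained separately on synthetic audio and lyrics features, with their predicted probabilities averaged at the end:

```python
# A sketch of late fusion: two unimodal models trained separately, with
# their predicted probabilities combined into one final prediction.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)                    # happy / sad labels
X_audio = rng.normal(size=(200, 16)) + y[:, None]   # toy audio features
X_lyrics = rng.normal(size=(200, 32)) + y[:, None]  # toy lyrics features

audio_model = LogisticRegression().fit(X_audio, y)
lyrics_model = LogisticRegression().fit(X_lyrics, y)

# Late fusion: average the two models' predicted class probabilities.
p = 0.5 * audio_model.predict_proba(X_audio) + \
    0.5 * lyrics_model.predict_proba(X_lyrics)
print("fused accuracy:", (p.argmax(axis=1) == y).mean())
```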
There are a few challenges in multimodal learning. The first is representation:
processing different modalities while preserving their unique characteristics is
difficult. Joint and coordinated representation strategies are generally employed
to address this; for instance, the MS COCO dataset requires both joint and
coordinated representation strategies to effectively handle multimodal challenges.
The next challenge is alignment. Tasks like audio-visual speech recognition demand
precise alignment of audio and visual data. Techniques like hidden Markov models
and dynamic time warping are used for synchronization.
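As a concrete example of the latter, here is a minimal dynamic time warping sketch that computes the cost of the best monotonic alignment between two toy feature tracks:

```python
# A sketch of dynamic time warping, one of the alignment techniques above:
# it finds the minimum-cost monotonic alignment between two sequences.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: match, insertion, or deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

audio = np.array([0.0, 0.1, 0.9, 1.0, 0.2])        # toy audio feature track
visual = np.array([0.0, 0.9, 1.0, 1.0, 0.1, 0.2])  # toy lip-movement track
print("DTW alignment cost:", dtw_distance(audio, visual))
```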
Overview of the VATT architecture and the self-supervised, multimodal learning strategy (Source: arxiv.org)
If you are interested in delving deeper into the topic of Multimodal Models, you can
explore the CMU Multimodal Machine Learning course.
Vision Language Models (VLM)
The convergence of vision and language is propelling us into an era of
unprecedented multimodal understanding. Vision-language models (VLMs), adept
at processing diverse modalities like images, text, and video, stand as a
monumental advancement in this journey.
VLMs can perform multimodal tasks through few-shot learning. With just a handful
of task-specific examples, VLMs excel in problem-solving without additional
training.
These models process interleaved images, videos, and text prompts to generate
associated language. Much like their linguistic counterparts, VLMs use a dual
interface to tackle multimodal tasks. By providing example pairs of visual inputs
and expected text responses, the model learns to answer questions based on new
images or videos. This versatile approach extends to image and video tasks, treating
them as text prediction challenges with visual input conditioning.
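A sketch of what such an interleaved few-shot prompt might look like; `call_vlm`, the file paths, and the message format are all hypothetical placeholders for whichever multimodal API you use:

```python
# A sketch of few-shot prompting a VLM with interleaved images and text.
# `call_vlm` and the segment format are hypothetical placeholders.
def call_vlm(segments: list) -> str:
    raise NotImplementedError("wire this to your vision-language model")

prompt = [
    {"image": "examples/fox.jpg"},
    {"text": "Q: What animal is this? A: A red fox."},
    {"image": "examples/heron.jpg"},
    {"text": "Q: What animal is this? A: A grey heron."},
    {"image": "query/unknown.jpg"},       # the new image to be described
    {"text": "Q: What animal is this? A:"},
]
print(call_vlm(prompt))  # the model completes the pattern it was shown
```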
VLMs stand out with their unique ability to process sequences of text tokens
interleaved with multimedia, bridging the gap between visual and linguistic
understanding. VLMs like Flamingo synergize pre-trained vision and language
models, enabling them to “perceive” visual scenes and engage in rudimentary
reasoning.
Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data
including clinical language, imaging, and genomics with the same model weights. (Source: Google
Research)
The model-grafting approach works by training a medical information adapter that maps the output of an
existing or refined image encoder into an LLM-understandable form (Source: Google Research)
IDEFICS is an 80-billion-parameter multimodal VLM that accepts sequences of images and text as input and
generates coherent text as output. (Source: HuggingFace.co)
Selected examples of inputs and outputs obtained from Google Deepmind’s Flamingo-80B (Source: arxiv.org)
A qualitative example generated by a visual language model — InstructBLIP Vicuna model (Source:
arxiv.org)
Since 2021, models like Google DeepMind's Flamingo, OpenAI's CLIP and GPT-4V,
and Alibaba's Qwen-VL have been redefining tasks such as image captioning and
visual question answering, showcasing the transformative potential of joint
vision-language models. This evolution has ushered in the era of zero-shot
generalization, opening up new practical applications across a multitude of
industries.
Underpinning these pre-training efforts are vast multi-modal datasets such as PMD,
COCO, Conceptual Captions, and Flickr30K. These rich and diverse datasets serve as
the foundation upon which these models are trained. For downstream tasks,
datasets like VQA, NLVR2, TextVQA, and Hateful Memes are instrumental in fine-
tuning models for specific applications.
If you are interested in delving deeper into the topic of Vision Language Models,
you can explore the Microsoft Research Vision-and-Language course.
Conclusion
In this blog I provided a comprehensive overview of the foundational concepts,
innovations, and technologies enabling the rapid advancement of AI applications
and agents. Understanding machine learning techniques like supervised learning,
neural networks, NLP, and reinforcement learning is key for AI Engineers looking
to leverage AI capabilities. Generative models like LLMs, diffusion models, and
VLMs are transforming user experiences across industries. Frameworks like
LangChain and Llama-Index are simplifying the integration of LLMs into real-world
applications. The future will likely see more specialized, adaptable AI applications
and open-source models, ensuring innovation flourishes. With this background
knowledge, developers can effectively engineer prompts, build demos, and create
production-ready AI applications that provide personalized, engaging experiences.