Professional Documents
Culture Documents
How Does ChatGPT Actually Work - An ML Engineer Explains - Scalable Path
How Does ChatGPT Actually Work - An ML Engineer Explains - Scalable Path
Calin Cretu
Machine Learning Engineer
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 1/17
4/26/23, 3:18 PM How Does ChatGPT Actually Work? An ML Engineer Explains | Scalable Path
Looking to hire?
In the first five days after its launch, over a million users had already used
Get Started
ChatGPT to answer questions on various topics. While its capabilities have
been impressive, from writing song lyrics to simulating a Linux terminal, the
inner workings of ChatGPT remain a mystery to many. However,
understanding how ChatGPT works is important not just for satisfying our
curiosity, but also for unlocking its full potential. By demystifying ChatGPT’s
inner workings, we can appreciate its capabilities better and identify areas
for improvement. So how does ChatGPT work, and how was it trained to
achieve such exceptional performance?
In this article, we’ll take a deep dive into the architecture of ChatGPT and
explore the training process that made it possible. Using my years of
experience as a machine learning engineer, I’ll break down the inner
workings of ChatGPT in a way that is easy to understand, even for those who
are new to AI.
Table Of Contents
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 3/17
4/26/23, 3:18 PM How Does ChatGPT Actually Work? An ML Engineer Explains | Scalable Path
Looking to hire?
Get Started
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 4/17
4/26/23, 3:18 PM How Does ChatGPT Actually Work? An ML Engineer Explains | Scalable Path
Looking to hire?
Get Started
Let’s dive deeper into ChatGPT’s architecture to learn more about what’s
happening between the input and the output.
Now imagine that the orchestra is learning to play a new piece of music. At
first, the musicians may make mistakes and play off-key, just as the neural
network may produce incorrect outputs. However, with practice and
feedback from the conductor, the musicians gradually adjust their playing to
minimize the errors and produce a more accurate rendition of the music.
Similarly, during the learning process, the neural network adjusts the weights
and biases of the connections between the neurons to minimize the
difference between its output and the desired output, improving its accuracy
over time.
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 6/17
4/26/23, 3:18 PM How Does ChatGPT Actually Work? An ML Engineer Explains | Scalable Path
Looking to hire?
Get Started
When designing a neural network, the sky’s the limit, but architectural
decisions can greatly impact its performance. The chosen architecture can
affect the network’s accuracy, training and inference speed, and overall size.
Since the first Transformer network was introduced in 2017, this architecture
has gained immense popularity. Initially used in Natural Language
Processing, it has more recently been applied to Computer Vision as well.
Some of the most popular applications of Transformers include DALL-E 2,
which can generate images based on text descriptions in natural language,
GitHub Copilot, which provides real-time programming code suggestions,
and ChatGPT.
At the core of the Transformer model lies a block called the Attention
Mechanism, which enables the network to weigh the importance of different
parts of the input when making predictions. This mechanism plays a critical
role in the network’s ability to process complex input data and make
accurate predictions.
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 7/17
4/26/23, 3:18 PM How Does ChatGPT Actually Work? An ML Engineer Explains | Scalable Path
Looking to hire?
To understand the Attention Mechanism, it’s useful to consider an analogy.
Get Started
Imagine you’re reviewing a textbook and using a highlighter to mark parts of
the page that are particularly important and relevant. In this scenario, the
highlighter is helping you more easily understand the overall context.
The pre-trained model used for ChatGPT was trained to predict the next
word in a sentence based on the context of the previous words. The training
dataset included a vast amount of text data from books, websites, and other
sources. While this training was successful, it needed further refinement for
the model to provide personalized and accurate outputs.
The model’s capability to predict the next word accurately didn’t necessarily
imply that it would generate useful and reliable responses in real-world
scenarios. For example, suppose a user asks the model, “How do I treat my
headache?” The model may be able to generate a response by completing
the prompt with the most probable words based on its training, such as:
“Take some aspirin, drink water, rest, and avoid bright lights.”
While this response may seem appropriate based on the prompt, it may not
be the right advice for the user. Depending on the cause and severity of the
headache, taking aspirin or other pain relievers may not be the best
treatment option. Also, some types of headaches may require medical
attention.
Therefore, while the model was good at predicting the next word in a
sentence, it still needed further refinement to understand the user’s specific
situation and provide personalized, accurate, and safe advice.
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 9/17
4/26/23, 3:18 PM How Does ChatGPT Actually Work? An ML Engineer Explains | Scalable Path
Looking to hire?
To improve ChatGPT’s ability to respond more accurately to user prompts, a
Get Started
three-step training process was employed, which involved human
intervention.
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 10/17
4/26/23, 3:18 PM How Does ChatGPT Actually Work? An ML Engineer Explains | Scalable Path
Looking to hire?
In the second step, the previously trained model generated multiple
Get Started
predictions for different user prompts, and human annotators ranked the
predictions from the least to the most helpful. Using this data, the Reward
Model was trained to predict how useful a response was to a given prompt.
Note: Steps 2 and 3 can be repeated multiple times. Using the newly trained
model from Step 3, a new reward model can be trained by repeating Step 2,
which is fed again into Step 3, and so on. ChatGPT used the same
architecture and training process as InstructGPT but with different data
collection.
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 11/17
4/26/23, 3:18 PM How Does ChatGPT Actually Work? An ML Engineer Explains | Scalable Path
Looking to hire?
Get Started
ChatGPT’s response shows that the model has the ability to understand the
user’s needs and tailor its responses accordingly. By asking questions and
seeking more information, the model can provide more accurate and helpful
advice based on the user’s context.
One of the most exciting aspects of ChatGPT is the newly released ChatGPT
API, which allows companies to take advantage of the capabilities of artificial
intelligence without having to invest significant resources in developing their
own models. This innovation has the potential to transform various
industries and create new opportunities for innovation. Companies can now
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 12/17
4/26/23, 3:18 PM How Does ChatGPT Actually Work? An ML Engineer Explains | Scalable Path
Looking to hire?
build on top of ChatGPT to develop new tools and services that leverage its
Get Started
powerful language processing capabilities.
• • •
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 13/17
4/26/23, 3:18 PM How Does ChatGPT Actually Work? An ML Engineer Explains | Scalable Path
Looking to hire?
Get Started
Subscribe
{} [+]
Comments
0 ● reply
Email Sign Up
Read Next
Data Science
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 14/17
4/26/23, 3:18 PM How Does ChatGPT Actually Work? An ML Engineer Explains | Scalable Path
Looking to hire?
What Is Bias in Machine Learning?
Get Started
As artificial intelligence, or AI, increasingly becomes a part of our everyday lives, the need for understanding
the systems behind this technology as well as their failings, becomes equally important. It’s simply not
acceptable to write AI off as a foolproof black box that outputs sage advice. In reality, AI can be as flawed...
Omar Trejo
Senior Data Scientist
Data Science
Nicolas Azevedo
Senior Data Scientist
Full-stack
Rafael Goulart
Senior Full-stack Developer
Hire Developers
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 15/17
4/26/23, 3:18 PM How Does ChatGPT Actually Work? An ML Engineer Explains | Scalable Path
Looking to hire?
Back-end Developers Front-end Developers
Get Started
.NET Developers Angular Developers
PHP Developers
Python Developers
Project Managers
QA Engineers
UI/UX Designers
Hire Now
Apply as a Freelancer
Site Map
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 16/17
4/26/23, 3:18 PM How Does ChatGPT Actually Work? An ML Engineer Explains | Scalable Path
Looking to hire?
Home
Get Started
For Clients
For Freelancers
Blog
Contact Us
Code of Conduct
Core Values
Newsletter
Join 23979+ subscribers already getting our original articles about software design and development. No
spam, just insightful content once a month.
Email Sign Up
Social
Looking to hire?
https://www.scalablepath.com/data-science/chatgpt-architecture-explained 17/17