What is an LLM?
Generative AI for Dummies
(Understand the technology that powers ChatGPT and other AI models)
Large Language Model (LLM)
An AI model trained on large datasets that uses
advanced neural network architectures, deep
learning techniques, and transformer-based
models to understand patterns and context.
Virtual Assistants (Siri, Alexa)
Chatbots (ChatGPT)
Text translation, generation,
and summarization
Analysis and prediction
Sentiment analysis
Content recommendations
Examples
GPT (Generative Pre-trained Transformers)
GPT-4, the most popular example, is a
multimodal model trained on vast amounts
of text and images.
BERT (Bidirectional Encoder Representations
from Transformers)
Reads context in both directions and processes
words in parallel, making it more efficient than
traditional sequential models like recurrent
neural networks (RNNs).
LaMDA (Language Model for Dialogue
Applications)
Conversational, transformer-based LLM by
Google, since succeeded by Gemini.
LLaMA (Large Language Model Meta AI) is an
auto-regressive language model, built on
transformer architecture by Meta AI.
Learn how they are built
Steps to build an LLM
Training Data Collection: books, articles,
images, and websites.
Cleaning and preprocessing: formatting and
segmenting the data.
Tokenization: convert the raw data into tokens
so the model can process text at a granular level.
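The tokenization step can be sketched in code. This is a toy word-level scheme for illustration; production LLMs use learned subword tokenizers such as BPE, and the `tokenize` and `build_vocab` helpers here are assumptions, not a real library API.

```python
# Toy tokenizer: split text into lowercase word and punctuation tokens,
# then map each unique token to an integer ID.
import re

def tokenize(text):
    # Keep words and individual punctuation marks as separate tokens.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(tokens):
    # Assign each unique token a stable integer ID.
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}

tokens = tokenize("The model reads text, one token at a time.")
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]  # the sequence the model actually sees
```

The model never sees raw characters: from this point on it works only with the integer IDs in `ids`.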
Model Architecture Design: design the neural
network architectures and transformer models to
handle sequences of data, and capture the
complex relationships between tokens.
Embedding: Each token is converted into a
numerical vector capturing semantic and
syntactic information about its corresponding
token.
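A minimal sketch of the embedding lookup described above: each token ID indexes one row of a vector table. In a real model these vectors are learned during training; here they are randomly initialized purely for illustration, and the sizes are arbitrary assumptions.

```python
# Toy embedding table: one vector of size embed_dim per vocabulary entry.
import random

random.seed(0)
vocab_size, embed_dim = 10, 4

embedding_table = [
    [random.uniform(-1, 1) for _ in range(embed_dim)]
    for _ in range(vocab_size)
]

def embed(token_ids):
    # Replace each token ID with its corresponding vector.
    return [embedding_table[i] for i in token_ids]

vectors = embed([3, 1, 4])  # a 3-token sequence becomes 3 vectors
```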
Training: the model is fed input tokens and
learns to predict the next token in a sequence. It
adjusts its internal parameters (weights) based
on the difference between its predictions and the
actual next tokens.
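The next-token objective can be illustrated with a toy bigram model that simply counts which token follows which. This stands in for the real process, where a transformer adjusts millions of weights by gradient descent; the counting approach here is a deliberate simplification, not how an LLM is actually trained.

```python
# Toy next-token predictor: count successor frequencies in a tiny corpus.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def predict_next(token):
    # Predict the most frequent successor seen during "training".
    return follow_counts[token].most_common(1)[0][0]
```

Calling `predict_next("the")` returns `"cat"`, the most common follower in the corpus; a real LLM does the same kind of prediction, but over a probability distribution computed by a neural network.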
Learning context and relationships: the model
learns how tokens relate to the context in which
they appear. Transformer models use self-attention
mechanisms to weigh the importance of different
tokens in a sequence.
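The self-attention weighting idea can be sketched as follows: each token vector is scored against every other with a dot product, and a softmax turns the scores into weights. Real transformers add learned query/key/value projections, scaling, and multiple heads; all of that is omitted in this simplified sketch.

```python
# Toy self-attention weights: dot-product scores followed by softmax.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(vectors):
    # weights[i][j] = how strongly token i attends to token j.
    weights = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) for k in vectors]
        weights.append(softmax(scores))
    return weights

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = attention_weights(seq)
```

Each row of `w` sums to 1, so every token distributes a fixed budget of attention across the sequence, attending most to the tokens whose vectors resemble its own.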
Fine-Tuning (Optional): additional training on
domain-specific data so the model can perform
particular tasks, such as legal document analysis
or medical report generation, and capture the
nuances of the new domain.
Post-processing: correcting grammar,
ensuring cohesiveness, or trimming
unnecessary parts to improve the readability
and relevance of the generated text.
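A minimal sketch of post-processing, assuming two simple cleanups: collapsing extra whitespace and capitalizing sentence starts. Production pipelines may also filter content, fix grammar, or trim repetition; the `postprocess` helper here is illustrative, not a standard API.

```python
# Toy post-processor: normalize whitespace, then capitalize sentences.
import re

def postprocess(text):
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Capitalize the first letter of the text and of each new sentence.
    return re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)

raw = "the model  generated this.   it needs cleanup."
clean = postprocess(raw)
```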
Learn how to build your
own LLM for free
With help from IBM Data Scientists
www.CognitiveClass.AI