Azure OpenAI
Artificial Intelligence
1956 Artificial Intelligence
the field of computer science that seeks to create
intelligent machines that can replicate or exceed
human intelligence
Machine Learning
1997 Machine Learning
subset of AI that enables machines to learn from
existing data and improve upon that data to make
decisions or predictions
Deep Learning 2017 Deep Learning
a machine learning technique in which layers of
neural networks are used to process data and make
decisions
Generative AI 2021 Generative AI
Create new written, visual, and auditory content
given prompts or existing data.
Traditional model development
High cost and slow deployment—each service is trained disjointly
Tagging Spatial Analysis Accessibility Spatial Azure Search,
DEPLOYMENTS Services Services Services Presenter Video Indexer
EAGLE
EAGLE
FLOWER PLAYING SOCCER
Classification Object Detection Object Tracking Action Entities Topics Sentiments
TASKS Recognition
Classification Detection Tracking Action Entity Topic Sentiment
Model Model Model Model Recognition Classification Analysis
INDIVIDUAL MODEL
(DISJOINTLY)
TRAINING DATA Tagging Detection Tracking Action Entity Topic Sentiment
(w/ ANNOTATION) data data data data data data data
Foundation Models Tasks
Question and
Answering
Data
Sentiment Analysis
Text
Images Information
Extraction
Speech Training Foundation Adaptation
Model
Structured Data Image Captioning
3d Signals Transformer Model
Object Recognition
Instruction Follow
GPT-3 (2020)
175B
GPT-3 | 2020
17B
Turing-NLG | 2020
2
340M
BERT-Large | 2018
1
Generative AI
GPT-3 DALL·E ChatGPT (GPT-Turbo) GPT-4
Prompt: Prompt: A white Siamese cat What is the fastest animal on land?
Write a tagline for an ice Response:
cream shop.
the cheetah (Acinonyx jubatus),
which can reach speeds of up to 60
miles (97 kilometers) per hour.
Response:
We serve up smiles with What makes them so fast?
every scoop!
Cheetahs are built for speed and
have several adaptations that make
them the fastest land animal: lean
body, long legs, flexible spine,
large nostrils and claws that don’t
retract.
Azure AI
Applications Partner Solutions
Business
Users
Application Platform
AI Builder
Power BI Power Apps Power Automate Power Virtual Agents
Scenario-Based Services
Applied AI Services
Bot Service Cognitive Search Form Recognizer Video Indexer Metrics Advisor Immersive Reader
Customizable AI Models
Cognitive Services Developers &
Vision Speech Language Decision OpenAI Service Data Scientists
ML Platform Azure Machine Learning
Azure OpenAI | Top 4 Capabilities & Use Cases
Content Code Semantic
Summarization generation
Generation Search
Call Center Analytics: Call Center Analytics: Summary Convert Natural Language to
Search reviews for a specific
Automatically generate of customer support SQL (or vice versa) for telemetry
product / service
responses to customer inquiries conversation logs data
Subject Matter Expert Document
Generate personalised UI for Convert Natural Language to Information Discovery and
Summarization (e.g. financial
your website Query Proprietary Data Models Knowledge Mining
reporting, analyst articles)
Social Media Trends
Code Documentation
Summarization
Examples of Multiple Model Use Cases
End to End Call Center Analytics: Classification, Sentiment, Entity Extraction, Summarization and Email Generation
Customer 360: Hyper-personalisation using timely Summarization of customer queries & trends, Search, and Content Generation
Business Process Automation: Search through structured & unstructured documentation, Generate Code to query data models, Content
Generation
Azure Open AI Enterprise Readiness
Microsoft Azure Cloud
Runs on trust
Your data is your data Data is stored encrypted in your Azure subscription
Your data from any fine-tuning Azure OpenAI Service provisioned in your Azure subscription
is not used to train the Model fine tuning stays in your Azure subscription
foundation AI models and never moves into the foundation AI models
Your data is protected by Encrypted with Customer Managed Keys
the most comprehensive enterprise Private Virtual Networks, Role Based Access Control
compliance and security controls Soc2, ISO, HIPPA, CSA STAR Compliant
Deployed in your Azure subscription,
secured by you, and tied to your datasets
and applications
Azure Large, pretrained AI models to unlock
OpenAI Service new scenarios
Custom AI models fine-tuned with your
data and hyperparameters
GPT-3 Codex
Built-in responsible AI to detect and
DALL·E ChatGPT mitigate harmful use
Enterprise-grade security with role-based
access control (RBAC) and private networks
Azure OpenAI Service Responsible AI
Custom content filters—
tailor tone and topics to your application
Abuse detection—
ensure responsible use of your application
Implementation guidelines, patterns,
and best practices
Large Language Model Adaptation
How to adapt GPT models for your task
Prompt Engineering with in-context learning
Zero-Shot One-Shot Few-Shot
The model predicts the answer given only In addition to the task description, the In addition to the task description, the
a natural language description of the task. model sees a single example of the task model sees a few examples of the task.
Fine Tuning
The model is trained via repeated gradient updates using a large corpus of example tasks.
Prepare and upload Train a new fined Use your fine-tuned
training data tuned model model
1. 2. 3. 4.
Higher quality results Ability to train on more examples Token savings due Lower latency requests
than prompt design than can fit in a prompt to shorter prompts
Prompt Engineering
2 min demo
Prompt Engineering Basic
Prompt engineering is the process of designing and refining prompts or input prompts for machine learning
models that enable them to produce relevant and useful outputs.
Instruction - a specific
task or instruction you
Output Indicator - indicates the type
want the model to
or format of output.
perform
Input Data - is the input or question
that we are interested to find a
response for
Context - can involve external
information or additional context
that can steer the model to better
responses
Basic applications
• Text Summarization
• Information Extraction
• Question Answering
• Text Classification
• Conversation
• Code Generation
• Data Generation
• Recommendations
Mini-challenge for audience
• Use Open AI to generate a customer support transcript for internet service provider ABC wireless
corp. Customer called to complain about technical issue and tech rep handled the call. The transcript
has a timestamp at the beginning to indicate date and time of the call
• Write a prompt to extract the information in the following structured format from the generated
customer support’s transcript in ISP company.
{ "issue": "product fault", "product": "Linksys router", "resolution": "resolved", "sentiment": "positive“,
“datetime:”Mar-12-2023 13:30:21” }
Advanced tips
• Add few-shot learning examples will improve clarity for complex scenario
• LLMs may suffer from greedy problem. Requires step by step reasoning or provide step by
step instruction will provide better result in very complex reasoning scenario
• Conversation: control style and behavior of chatbot
Fine Tuning
Model Adaptation with specific domain data
Fine-Tuning
GPT-3
Domain Data
Small Set of Labeled
Data Fine-tuning results is a new model being
• minimum of several
thousand examples high-dimensional generated with updated weights and biases.
• Maximum of 2.5M tokens
or 80-100mb size vector space
(Embeddings)
Fine-tuning This is in contrast to few-shot learning in which
model weights and biases are not updated.
Model parameters adjusted
(Weights and Bias)
Gradient Update
Fine-Tuned GPT-3 Model
Perform any domain-specific
NLP tasks
Azure OpenAI | Family of Models
GPT-3 models Codex models
GPT-4 Davinci-codex
GPT Turbo
Davinci
Curie Cushman-codex
Inferencing time
Babbage
Ada
Capability Capability
Fine tuning vs. prompt engineering
• Why fine tuning?
• You may have language patterns, context or communication or special
behavior that you need to hardwire into the LLM’s weights to simplify prompt
• Prompt engineering offers flexibility but comes at longer prompt -> higher
cost
• When to use fine-tuning
- It’s a repeated pattern across multiple applications (i.e. why attaching the
same instruction/context again and again?)
- Users expect the model/chatbot to already have a domain understanding
(context) which can be too large to insert into every prompt
- You have good quality data (prompt-completion format)
- You want your LLM to have special behavior based on input data alone that
prompt cannot be prepared in advance such as refusing to answer certain
question due to lack of information, asking human to chime in…
Hyperparameters specific to Fine-Tuning
Parameter Description Recommendation
n_epochs The number of epochs to train the Start from 4 and small datasets may need more epochs and large datasets may need fewer epochs.
controls how many times each model for. An epoch refers to one full If you see low training accuracy (underfitting), try increasing n_epochs. If you see high training
example is trained on cycle through the training dataset. accuracy but low validation accuracy (overfitting), try lowering n_epochs.
batch_size The batch size to use for training. The We’ve seen good performance in the range of 0.01% to 4% of training set size. In general, we've found
controls the number of training batch size is the number of training that larger batch sizes tend to work better for larger datasets.
examples used in a single examples used to train a single
training pass forward and backward pass.
learning_rate_multiplier The learning rate multiplier to use for We recommend experimenting with values in the range 0.02 to 0.2 to see what produces the best
controls rate at which the training. The fine-tuning learning rate results. Empirically, we've found that larger learning rates often perform better with larger batch sizes.
model weights are updated is the original learning rate used for
pre-training, multiplied by this value. Empirically, we found learning_rate_multiplier has minor impact compared to n_epochs and batch_size.
prompt_loss_weight The weight to use for loss on the When a model is fine-tuned, it learns to produce text it sees in both the prompt and the completion.
controls how much the model prompt tokens. This value controls In fact, from the point of view of the model being fine-tuned, the distinction between prompt and
learns from prompt tokens vs how much the model tries to learn to completion is mostly arbitrary. The only difference between prompt text and completion text is that
completion tokens generate the prompt (as compared to the model learns less from each prompt token than it does from each completion token. This ratio is
the completion, which always has a controlled by the prompt_loss_weight, which by default is 0.1. If prompts are extremely long (relative
weight of 1.0.) Increasing this value to completions), it may make sense to reduce this weight to avoid over-prioritizing learning the
can add a stabilizing effect to training prompt.
when completions are short.
Empirically, we found prompt_loss_weight has minor impact compared to n_epochs and batch_size.
Azure Open Solution Patterns
Retrieval Augmented Generation
Semantic Search Service
Retrieval Augmented Generation Demo
GPT + Enterprise data | Sample (entgptsearch.azurewebsites.net)
Natural Language Interface For Data Analysis
SQL query generation
Database engine
App UX Orchestrator
Code generation
Visualization
service
Result Interpretation
Natural Language Interface For Data Analysis
Summarization
Stuff
Summarized
Input document
docume Orchestrator Map-Reduce
nt
Refine