
An Introduction to AI:
Understanding the scope of what foundation models can do
Ryan S. Baker
University of Pennsylvania
First note
• If you have questions or comments, please post them in the chat
• I will keep an eye out for them and respond at a convenient moment
Who am I?
• Professor, University of Pennsylvania
• Founding (former) President, International Educational Data Mining
Society
• Founding Associate Editor, Journal of Educational Data Mining
• Founding Editor, Computer-Based Learning in Context
• Director, Penn Center for Learning Analytics
• Instructor, Big Data and Education, edX
• Faculty Director, online Masters in Learning Analytics
• 25K+ citations with 400+ co-authors
The Penn Center for Learning Analytics
• The Penn Center for Learning Analytics uses a blend of large-scale and
small-scale research methods, from analytics, data mining, and
foundation models to ethnographic and field observation methods, to
study learning and engagement.

• We conduct research on both state-of-the-art online learning environments and traditional classroom instruction, investigating what approaches and methods can best enhance students’ long-term outcomes.
By now…
• … Foundation models and large language models like ChatGPT probably need no introduction

• But just in case

Prompt from a talk I gave a few months ago, when ChatGPT 3.5 had just been released
• ChatGPT 5 will likely be able to perform tasks that are currently
beyond the capabilities of its predecessors. For example, it will likely
have an improved ability to understand and generate human-like
responses in multiple languages, making it a valuable tool for global
communication and understanding.
• In addition, ChatGPT 5 will likely have a more advanced
understanding of context and meaning, which will allow it to generate
more accurate and relevant responses, even in complex situations.
Right now, ChatGPT can come up with utter nonsense responses when
it is asked to reason about mathematics or anything that involves
common sense. They will fix that.
• It will also likely be able to handle tasks that require more advanced
reasoning and decision-making, such as providing financial advice or
diagnosing medical conditions. Although when it fails, it will still fail
spectacularly, for a while.
• Another exciting development will be the integration of ChatGPT 5
with various forms of media, such as audio and video, allowing it to
perform even more complex tasks, such as transcribing and
summarizing audio and video content, and even generating new audio
and video content based on written instructions.
• Current technology can already transcribe and then summarize audio
content. It’s surprisingly good at that already, as tools like AI Whisper
and Firefly demonstrate
• Overall, I believe that ChatGPT 5 will be a major step forward in the development of natural language processing technology, and I am excited to see what new applications and innovations will arise as a result.
GPT/ChatGPT is amazing
• I use GPT more than 3 times a day on average to save time (or accomplish
things I couldn’t accomplish otherwise), for tasks ranging from
• Checking and aligning the reading level on survey items my team creates for middle
school students
• Obtaining quick summaries on topics I don’t know well
• Writing Python code for simple functions where I don’t remember the operators off
the top of my head
• Fixing colleagues’ grammar/writing clarity in joint research articles (easier and better than Grammarly, although I haven’t tried the latest version of Grammarly)
• Getting example sentences in other languages for words I don’t know how to use
• First-drafting tenure letters
• Optimizing my workout routine
• Getting clear answers on cooking questions
Doing this
• As you saw here, I seldom just take what it gives me – I use it as a
starting point

• Prompt engineering – figuring out how to make your request to get what you need – has become an important skill
Talk Outline
·The Foundation Models Available Today and How They Work
·Getting Foundation Models to Do What You Want
·Examples, Examples, Examples
·Discussion
Previous-Generation AI
“Machine Learning”
• Artificial Intelligence didn’t just pop up in 2023
• A previous generation, machine learning, has been around since the
1980s (and symbolic AI even before that)
• Machine learning is still widely used today, and there are many things
it does better than the newest approaches
Previous-Generation AI
“Machine Learning”
• The classical “supervised learning” approach to machine learning
• Find examples of the thing you want to detect (“Does this essay deserve an A?”)
• Distill features of the data relevant to detecting it (“Does the essay use each of these 17 keywords? How long is the essay? Do these words occur in this order? Is there cohesion in what words are used between paragraphs?”)
• Build a special-purpose model that identifies the one thing you want to identify (maybe several such special-purpose models)
• Validate that the model works on new data and is not algorithmically biased

• Very powerful, but requires specific data and considerable effort to build each time (a minimal sketch of this workflow follows)
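
To make that pipeline concrete, here is a minimal sketch in Python, assuming scikit-learn and a hypothetical labeled essay dataset (the file name and feature columns are invented for illustration, not from any real study):

# A minimal supervised-learning sketch: hand-distilled features -> model -> validation.
# The CSV file and column names are hypothetical, for illustration only.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

data = pd.read_csv("essays_with_labels.csv")  # labeled examples of the thing to detect
features = ["keyword_count", "essay_length", "paragraph_cohesion"]  # distilled by hand
X, y = data[features], data["deserves_an_a"]

# Validate that the model works on data it never saw during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("AUC on new data:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
# A real study would also check the model for algorithmic bias across student groups.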
Foundation Models (Bommasani et al., 2021)
“AI is undergoing a paradigm shift with the rise of models (e.g., BERT,
DALL-E, GPT-3) trained on broad data (generally using self-supervision
at scale) that can be adapted to a wide range of downstream tasks. We
call these models foundation models to underscore their critically
central yet incomplete character… their scale results in new emergent
capabilities, and their effectiveness across so many tasks incentivizes
homogenization.”

Huge effort to build in very general fashion
Very little effort to adapt for a huge range of applications
Foundation Models
“Large Language Models” (LLMs): BERT, ChatGPT, BingChat, Bard, Claude, LLaMA, GitHub Copilot

Other Foundation Models: MathBERT, DALL-E 2, Stable Diffusion, Midjourney
Transformer/Foundation Models
• Can predict
• Words
• Sentences
• Pixels
• Computer program text
• Mathematical equations
• Anything?

• And then, in a sudden light-switch transformation, prediction became generation
DALL-E 2
• "Teddy bears working
on new AI research
underwater with 1990s
technology"
DALL-E 2
• “Sticking thumb out to hitchhike”
Stable Diffusion
• Open-source alternative to DALL-E 2 with several variants
• More powerful and easier to customize
• Significantly harder to set up and keep running
Text and image generation: not quite together yet (Randall Munroe)
Large Language Models
• ChatGPT
• BingChat
• Bard
• LLaMA
Prediction becomes Generation…
And Machine Learning Becomes Prompt Engineering

• Write a summary of how GPT works
• Write a summary of how GPT works, for a 5th grader, written at a 5th
grade reading level
• Write a summary of how GPT works, written for a PhD in Machine
Learning who has not read a research paper since 2017.
• Write a summary of how GPT works, written for a highly-intelligent
person who knows essentially nothing about artificial intelligence.
• Ryan Baker is a professor of Machine Learning. He is explaining to a
general audience how GPT works. He says, “
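
As a concrete illustration of trying prompt variants programmatically, here is a minimal sketch assuming the OpenAI Python client (openai >= 1.0); the model name is illustrative, and any chat-capable LLM API would work similarly:

# Minimal sketch: send several prompt variants to an LLM and compare the outputs.
# Assumes the openai package (>= 1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()
prompts = [
    "Write a summary of how GPT works",
    "Write a summary of how GPT works, for a 5th grader, "
    "written at a 5th grade reading level",
    "Write a summary of how GPT works, written for a highly-intelligent "
    "person who knows essentially nothing about artificial intelligence.",
]
for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(prompt, "->", response.choices[0].message.content[:200], "\n")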
Talk Outline
·The Foundation Models Available Today and How They Work
· A Brief Digression: What’s Under the Hood?
·Getting Foundation Models to Do What You Want
·Examples, Examples, Examples
·Discussion
What’s Under the Hood?
• For those who are interested

• You may also feel welcome to turn off your brains for the next 3
minutes
Neural Networks
◻ Compose extremely complex relationships by combining “perceptrons”
◻ Originally: take a set of inputs, figure out a weight for each input, and output 0 or 1 based on whether the weighted sum is under or over 0
◻ Nowadays: much more complex (pic in a second)
◻ Find very complicated models
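
A minimal sketch of that original perceptron idea in Python (the input and weight values here are invented for illustration):

# A single perceptron: weighted sum of inputs, thresholded to 0 or 1 at zero.
import numpy as np

def perceptron(inputs, weights, bias):
    # Output 1 if the weighted sum is over 0, otherwise 0
    return 1 if np.dot(inputs, weights) + bias > 0 else 0

# Illustrative values only
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
print(perceptron(x, w, bias=0.2))  # prints 0 or 1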
LSTM Unit

• Note the: hidden state (h), forget gate (Ft), input gate (It), output gate (Ot)
Image by fdeloche - CC BY-SA 4.0
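
For reference, these are the standard LSTM update equations behind that diagram, in the slide's notation (\sigma is the logistic sigmoid, \odot is elementwise multiplication):

F_t = \sigma(W_F x_t + U_F h_{t-1} + b_F)
I_t = \sigma(W_I x_t + U_I h_{t-1} + b_I)
O_t = \sigma(W_O x_t + U_O h_{t-1} + b_O)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = F_t \odot c_{t-1} + I_t \odot \tilde{c}_t
h_t = O_t \odot \tanh(c_t)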
A Very Simple Neural Network
• Red circles: Predictors
• Blue circles: Perceptrons
• Green circles: Predicteds

Image courtesy of glosser.ca used under Creative Commons Licensing


What you see here
• A single layer neural network
• A very simple one
• Generally hundreds/thousands/millions of hidden perceptrons
• GPT4 has 1.7 trillion parameters

Image courtesy of glosser.ca used under Creative Commons Licensing


Contemporary work builds more complex “deep learning” models
Multiple hidden layers

Image courtesy of IBM
Why does deep learning (usually) work better?
• Can capture multiple layers of abstraction

• Without having to do so in a way that human beings can understand
Often the term deep learning
• Reserved for recurrent neural networks (or more complex algorithms still)

• Recurrent neural networks fit on sequences of events
◻ Keeping some degree of “memory” about previous events
◻ And in later variants like LSTMs, the use of that memory decreases over time
◻ A node can, over time, influence itself
GPT4 is even more complex
• Converts inputs into matrix representations and feeds those matrix representations to the network word-by-word
• Uses a mechanism like a more complex version of an LSTM to process both early parts of the input and later parts of the input at the same time
GPT4 is even more complex
• Uses self-attentional mechanisms (a way of keeping some information over time and discarding other information) to decide which information is most important and what the relationships are between earlier and later material
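
The core of self-attention is the scaled dot-product attention formula from the Transformer literature, where Q, K, and V are the query, key, and value matrices computed from the input and d_k is the key dimension:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V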
GPT4 is even more complex
• Uses layer normalization to prevent overfitting (although it still
happens, and specific rare strings can cause GPT to behave weirdly)
(Rumbelow & mwatkins, 2023)

• The SolidGoldMagikarp phenomenon
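
For reference, layer normalization rescales a layer's activations x by their mean \mu and variance \sigma^2, with a learned gain \gamma and bias \beta (\epsilon is a small constant for numerical stability):

\mathrm{LayerNorm}(x) = \gamma \odot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta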


GPT4 is even more complex
• Uses a LOT of other neural network optimization techniques honed
over years of work
GPT4 is even more complex
• Uses reinforcement learning (from human feedback), where humans train GPT on which responses are good and which responses are bad
Whew…
Talk Outline
·The Foundation Models Available Today and How They Work
·Getting Foundation Models to Do What You Want
·Examples, Examples, Examples
·Discussion
Two types of prompts
• Request prompts
• You are a professor of Machine Learning. Please explain to a general audience how GPT works.
• Completion prompts
• Ryan Baker is a professor of Machine Learning. He is explaining to a general audience how GPT works. He says, “
What LLMs are quite good at
• Creating things that generally “look right”
• Putting ideas together in creative ways
• Factual information that’s hard to search for
• Adopting a voice/style where there are lots of examples
• Getting the right ballpark on the first try
• Explaining their reasoning
• Translation (across languages, across styles, etc.)
• Summarizing
Some current issues
• “Averaged reasoning”
• Ask them what a specific expert would say, and they will give you what a generic expert would say
• My Harry Potter adventure with my kids suddenly turned R rated, probably
based on GPT reading too much fan fiction

• “Not really getting it” errors
• Being too helpful in guessing what you “really want” (because what you asked for is uncommon/rare/weird)
• AI Interpreter is outright terrible, at least in the current version
• It misinterprets its own graphs, and skips lines of data
Hallucinations
• Making up facts or (more frequently) references

• GPT is surprisingly reliable for general facts, including facts that are hard to search up on the internet
• “Where can I buy AdBlue in Italy?”
• “How long should I boil chicken hearts for?”
• However, it can still get things wrong sometimes, for obscure questions

• It is absolutely terrible at giving correct citations for those facts
• BingChat can be OK at identifying websites but not scientific articles
• They will make up plausible articles with plausible authors, but the articles don’t really exist
Hallucinations
• ChatGPT used to make up citations – as of today, 4.0 refuses to give citations, and BingChat often now actually gets citations right
Just a “Stochastic Parrot”?
• These tools create the most probable next word
• So, some people refer to them as “stochastic parrots” (e.g. Bender et
al., 2021) – which implies no understanding whatsoever

• There’s an open debate, but I don’t think that’s quite right…
Just a “Stochastic Parrot”?
• There is some model underlying how the most likely next word is
produced

• Successful prompt engineering depends in part on being able to think like that model
• And think of how to make the answer you want become the most probable one
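
As a toy illustration of “most probable next word” (the vocabulary and probabilities below are invented; real models score tens of thousands of tokens):

# Toy next-token prediction: pick or sample the next word from a distribution.
# Vocabulary and probabilities are invented for illustration.
import random

next_word_probs = {"professor": 0.45, "student": 0.25, "parrot": 0.20, "blanket": 0.10}

# Greedy decoding: always take the single most probable word
greedy = max(next_word_probs, key=next_word_probs.get)

# Stochastic decoding: sample each word in proportion to its probability
words, probs = zip(*next_word_probs.items())
sampled = random.choices(words, weights=probs, k=1)[0]

print("greedy:", greedy, "| sampled:", sampled)
# Prompt engineering works by shifting this distribution so that the answer
# you want becomes the most probable continuation.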
Praise-and-prompt
• You often get a better response if you tell ChatGPT it’s an expert (and the more expert, the better: use words like “famous”, “award-winning”, or specific institutions)

• Doing so makes the answers associated with experts – and the characteristics associated with those answers – the most likely

• In general, ChatGPT and other LLMs are good at perspective-taking

• This is also the basis for a lot of the ways that people hack or “jail-break” LLMs (Nardo, 2023)
• A lot harder than it used to be with ChatGPT, still easy with LLaMA
Some notes
• If you don’t entirely like what it gives you at first, you can correct it, or
ask for clarification, or ask for modifications, or ask it to redo

• You can ask it to do things OR ask it to complete text

• It can handle novel and unusual concepts like “carnivorous blanket”
Now we will look at examples and discuss how they work…
Talk Outline
·The Foundation Models Available Today and How They Work
·Getting Foundation Models to Do What You Want
·Examples, Examples, Examples
· With Discussion
·Discussion
ChatGPT to summarize complex topics
ChatGPT to explain difficult concepts
ChatGPT (with plugin) to create diagrams
ChatGPT as a Tool for Adapting Content to Specific Audiences
(Example from my doctoral student Andres Zambrano)
ChatGPT as an Automated Tool for Improving Writing
(Example from my doctoral student Andres Zambrano)
ChatGPT as a Programming Assistant
(Example from my doctoral student Andres Zambrano)
ChatGPT as a Programming Assistant
ChatGPT as a Programming Assistant:
Converting Between Programming Languages
ChatGPT for Qualitative Coding of Data (Prompt design from
Zambrano et al. in press; data from Zhang et al., 2023)
ChatGPT as a Brainstorming Tool that Suggests Paper Titles
(Example from my doctoral student Andres Zambrano)
ChatGPT for Creating Creative New Content
(in this case, for my daughter’s bedtime stories)
ChatGPT to practice language skills
Automated Summarization of Lectures (Gonzalez et al., 2023)
Automated Teaching Assistants
Automated Feedback (Pankiewicz et al., 2023)
Personalizing for Interest (Yadav et al., 2023)

Call-out to 2013 published paper!
Auto-Grading
• GPT can grade for you (Slater et al., in preparation)
• Give it examples you have
• Ask it to create 30 new examples based on those examples (and check them)
• Then give it those examples and ask it to grade student work
Auto-Grading of Short Responses on Graph
Interpretation (Slater et al., in preparation)
Prompt:
I am an explanation-generation bot. I am trying to generate compare-contrast explanations for the graph I have chosen. Now I will generate 30 examples. I should make the sentence structures of these examples as various as possible. I should not use words like "while" or "whereas".
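
A minimal sketch of wiring up such a grading loop, assuming the OpenAI Python client; the graded examples, prompt wording, and model name here are hypothetical, not the actual Slater et al. design:

# Sketch of few-shot auto-grading: show the model graded examples, then ask it
# to grade a new student response. Examples and wording are hypothetical.
from openai import OpenAI

client = OpenAI()
graded_examples = [
    ("The line for Plant A rises faster than the line for Plant B.", "correct"),
    ("Plants grow because of water.", "incorrect"),
]
student_response = "Both lines go up, but Plant A's line is steeper."

prompt = "Grade the final response as correct or incorrect.\n\n"
for text, grade in graded_examples:
    prompt += f"Response: {text}\nGrade: {grade}\n\n"
prompt += f"Response: {student_response}\nGrade:"

result = client.chat.completions.create(
    model="gpt-4",  # illustrative
    messages=[{"role": "user", "content": prompt}],
)
print(result.choices[0].message.content)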
Dialogue Tutoring
• A natural-language tutoring conversation between a student and the AI

• Previously possible to do this really well, but really expensive and focused on a narrow range of content (e.g. Nye et al., 2014)

• Now a lot cheaper and easier
Dialogue Tutoring (Khanmigo)
Using Image Foundation Models
• Create images for presentations and reports
• Create persona images
Talk Outline
·The Foundation Models Available Today and How They Work
·Getting Foundation Models to Do What You Want
·Examples, Examples, Examples
·Discussion
LLMs and Foundation Models in general
• A big change in our society, with significant impacts on higher education

• Each of us has to decide how we want to use them, and how we want our students to use them

• With the goal of better supporting our students in achieving the future we want them to achieve
Before I Open The Floor For Discussion…

Ryan Baker
twitter.com/BakerEDMLab
“Big Data and Education”, running on edX now
Online Masters in Learning Analytics
Computer-Based Learning in Context
All publications available online – Google “Ryan Baker”
Please Share Your
• Questions
• Thoughts
• Comments

• Let’s discuss
