
Artificial Intelligence and

Machine Learning (CSET301)


Labs
Labs will be in Google Colab.
For projects, you will get access to the DGX on a request basis.
Artificial Intelligence is not a new term …
Obvious Questions
What is AI?
• Programs that behave externally like humans?
• Programs that operate internally as humans do?
• Computational systems that behave intelligently?
Turing Test (Alan Turing, 1950)
• Human beings are intelligent
• To be called intelligent, a machine must produce responses that are
  indistinguishable from those of a human
Machine Learning
• "Learning is any process by which a system improves performance from
  experience." (Herbert Simon)
• Machine learning is the science of getting computers to act (learn)
  without being explicitly programmed, from a given set of data, to
  achieve a desirable outcome
  – a machine that learns on its own
• Machine Learning (Tom Mitchell, 1998) is the study of algorithms that
  • improve their performance P
  • at some task T
  • with experience E.
• A well-defined learning task is given by <P, T, E>.
Why Machine Learning?
• Learning is used when:
  – Human expertise does not exist (navigating on Mars)
  – Humans are unable to explain their expertise (speech recognition)
  – The solution changes over time (routing on a computer network)
  – The solution needs to be adapted to particular cases (user biometrics)
• Develop systems that can automatically adapt and customize themselves
  to individual users.
• Discover new knowledge from large databases.

Defining the Learning Task
Improve on task T, with respect to performance metric P, based on
experience E.

T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself

T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing
   a human driver

T: Categorizing email messages as spam or legitimate
P: Percentage of email messages correctly classified
E: Database of emails, some with human-given labels
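The spam triple above can be turned into running code. A minimal sketch (all emails and labels invented here) that learns word counts from experience E, performs task T, and reports metric P:

```python
# Hedged sketch of the spam example as a <T, P, E> triple, with a toy
# word-frequency classifier (all data invented for illustration).
from collections import Counter

# E: experience -- a small, made-up database of labeled emails
emails = [("win money now", "spam"), ("cheap pills offer", "spam"),
          ("meeting at noon", "ham"), ("lunch tomorrow", "ham"),
          ("win a free offer", "spam"), ("project meeting notes", "ham")]

# "Learn" from E: count how often each word appears in each class
counts = {"spam": Counter(), "ham": Counter()}
for text, label in emails:
    counts[label].update(text.split())

# T: the task -- label a message spam or ham by which class's words it matches more
def classify(text):
    score = {c: sum(counts[c][w] for w in text.split()) for c in counts}
    return max(score, key=score.get)

# P: the performance metric -- fraction of messages classified correctly
correct = sum(classify(t) == y for t, y in emails)
print(correct / len(emails))
```

A real E would be a large labeled corpus, and a real learner would weight words probabilistically rather than by raw counts.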
Traditional Programming

Data + Program (features) → Output

Machine Learning

Data + Output → Program (features)
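The contrast between the two arrows can be sketched in a few lines (toy rule and data invented for illustration): in the traditional case we hand-write the program; in the learning case the "program" is derived from data plus outputs:

```python
# Hedged sketch of traditional programming vs machine learning
# (the even/odd rule and the data are invented for illustration).

# Traditional programming: we write the program (the rule) by hand.
def is_even_traditional(n):
    return n % 2 == 0          # Data + hand-written Program -> Output

# Machine learning: we give data plus outputs and let the machine find the rule.
data = [(0, True), (1, False), (2, True), (3, False), (4, True), (5, False)]

# A (very) simple learner: memorize outputs keyed by the feature n % 2
learned = {}
for n, out in data:
    learned[n % 2] = out       # Data + Output -> "Program" (a lookup table)

def is_even_learned(n):
    return learned[n % 2]

print(is_even_learned(10), is_even_traditional(10))
```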
A classic example of a task that requires machine learning:
it is very hard to write down rules that say what makes a handwritten
digit a "2".
• Statistics quantifies numbers
• Data Mining explains patterns
• Machine Learning predicts with models
• Artificial Intelligence behaves and reasons

1956
Birth of AI, early successes

Checkers (1952): Samuel's program learned weights and played at a
strong amateur level.

Problem solving (1955): Newell & Simon's Logic Theorist proved theorems
in Principia Mathematica using search + heuristics; later, the General
Problem Solver (GPS).
Overwhelming optimism...

"Machines will be capable, within twenty years, of doing any work a man
can do." (Herbert Simon)

"Within 10 years the problems of artificial intelligence will be
substantially solved." (Marvin Minsky)

"I visualize a time when we will be to robots what dogs are to humans,
and I'm rooting for the machines." (Claude Shannon)
...underwhelming results

Example: machine translation

"The spirit is willing but the flesh is weak."
(translated to Russian and back)
"The vodka is good but the meat is rotten."

1966: the ALPAC report cut off government funding for MT.
Implications of the early era

Problems:
• Limited computation: search spaces grew exponentially, outpacing
  hardware (100! ≈ 10^157 > 10^80)
• Limited information: complexity of AI problems (number of words,
  objects, concepts in the world)

Contributions:
• Lisp, garbage collection, time-sharing (John McCarthy)
• Key paradigm: separate modeling and inference
Knowledge-based systems (70-80s)

Expert systems: elicit specific domain knowledge from experts in the
form of rules:

    if [premises] then [conclusion]
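A minimal sketch of such an "if [premises] then [conclusion]" rule engine (rules and facts invented for illustration; real expert systems like MYCIN used hundreds of rules and certainty factors):

```python
# Hedged sketch of a forward-chaining rule engine
# (rules and facts are invented for illustration).
rules = [
    ({"fever", "infection"}, "prescribe_antibiotics"),
    ({"cough", "fever"}, "suspect_flu"),
]

def forward_chain(facts, rules):
    """Fire every rule whose premises are all contained in the known facts."""
    conclusions = set()
    for premises, conclusion in rules:
        if premises <= facts:          # all premises hold
            conclusions.add(conclusion)
    return conclusions

print(forward_chain({"fever", "cough"}, rules))
```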
Knowledge-based systems (70-80s)

DENDRAL: infer molecular structure from mass spectrometry

MYCIN: diagnose blood infections, recommend antibiotics

XCON: convert customer orders into parts specifications; saved DEC
$40 million a year by 1986
Knowledge-based systems

Contributions:
• First real applications that impacted industry
• Knowledge helped curb the exponential growth

Problems:
• Knowledge is not deterministic rules; uncertainty needs to be modeled
• Creating rules requires considerable manual effort, and they are hard
  to maintain

1987: collapse of the Lisp machines market
1943
Artificial neural networks

1943: McCulloch and Pitts introduced artificial neural networks,
connecting neural circuitry and logic.

1969: the Perceptrons book showed that linear models cannot solve XOR,
which stalled neural net research (Minsky).

Training networks

1986: popularization of backpropagation for training multi-layer
networks (Rumelhart, Hinton, Williams)

1989: LeCun applied convolutional neural networks to recognizing
handwritten digits for the USPS.
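The XOR limitation can be demonstrated directly. A sketch (the weight grid and the network weights are chosen by hand for illustration): a brute-force search finds no single linear unit that fits XOR, while a two-layer network does:

```python
# Sketch illustrating the 1969 XOR observation: no single linear
# threshold unit fits XOR, but a two-layer network does.
import itertools

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
step = lambda z: 1 if z > 0 else 0

# Brute-force search over a grid of weights and bias for one linear unit
grid = [x / 2 for x in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5
linear_fits = any(
    all(step(w1 * a + w2 * b + c) == y for (a, b), y in XOR.items())
    for w1, w2, c in itertools.product(grid, repeat=3)
)
print("some linear unit fits XOR:", linear_fits)  # False

# Two-layer network: hidden units compute OR and NAND, the output ANDs them
def xor_net(a, b):
    h1 = step(a + b - 0.5)        # OR
    h2 = step(-a - b + 1.5)       # NAND
    return step(h1 + h2 - 1.5)    # AND
print(all(xor_net(a, b) == y for (a, b), y in XOR.items()))  # True
```

The hidden layer re-represents the inputs so that the final unit's problem becomes linearly separable, which is exactly what a single perceptron cannot do on its own.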
Deep learning

AlexNet (2012): huge gains in object recognition; transformed the
computer vision community overnight

AlphaGo (2016): deep reinforcement learning; defeated world champion
Lee Sedol
A melting pot
• Bayes' rule (Bayes, 1763) from probability
• Least squares regression (Gauss, 1795) from astronomy
• First-order logic (Frege, 1893) from logic
• Maximum likelihood (Fisher, 1922) from statistics
• Artificial neural networks (McCulloch/Pitts, 1943) from neuroscience
• Minimax games (von Neumann, 1944) from economics
• Stochastic gradient descent (Robbins/Monro, 1951) from optimization
• Uniform cost search (Dijkstra, 1956) from algorithms
• Value iteration (Bellman, 1957) from control theory
Two broad views of AI

• AI agents: How can we create intelligence?
• AI tools: How can we benefit society?

An intelligent agent (human)

• Perception
• Robotics (actions)
• Language (communicate)
• Knowledge
• Reasoning (draw inferences and make decisions)
• Learning
Machine (AI agents) vs Human

Huge gap between the regimes machines and humans operate in.

Machine: narrow tasks, millions of examples
• AlphaGo learned from 19.6 million games, but can only do one thing:
  play Go

Human: diverse tasks, very few examples
• Humans learn from a much wider set of experiences and can do many
  things
Paradigm
• Modeling
• Inference
• Learning
Paradigm: Modeling

Real world → Model. Formulate the real-world problem as a graph:
• Nodes → points in the city
• Edges → roads
• Weights → traffic on that road

[Figure: weighted graph model of the city]
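A minimal sketch of this modeling step in code (the city layout and weights are invented for illustration), with the graph as a dictionary mapping each node to its weighted edges:

```python
# Hedged sketch of the modeling step: the city as a weighted graph.
# Nodes -> points in the city, edges -> roads, weights -> traffic on that road
# (all values invented for illustration).
city = {
    "A": {"B": 5, "C": 2},
    "B": {"A": 5, "D": 1},
    "C": {"A": 2, "D": 7},
    "D": {"B": 1, "C": 7},
}
print(city["A"])  # roads leaving point A, with their traffic weights
```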
Paradigm: Inference

Inference answers questions with respect to the model. The focus of
inference is usually on efficient algorithms that can answer these
questions.

[Figure: Model → Inference → Predictions, on the weighted city graph]
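One such inference question on a traffic model is "what is the least-traffic route?". A sketch answering it with Dijkstra's algorithm (the graph is invented for illustration):

```python
# Sketch of inference on a weighted-graph model: answering "what is the
# least-traffic route?" with Dijkstra's algorithm (weights invented here).
import heapq

def shortest_path_cost(graph, start, goal):
    """Return the minimum total weight from start to goal."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue                      # stale heap entry
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")

city = {"A": {"B": 5, "C": 2}, "B": {"A": 5, "D": 1},
        "C": {"A": 2, "D": 7}, "D": {"B": 1, "C": 7}}
print(shortest_path_cost(city, "A", "D"))  # 6, via A-B-D
```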
Paradigm: Learning

Model without parameters (edge weights unknown)
  + data
If we have the right type of data, we can run a machine learning
algorithm to tune the parameters of the model.
  → Model with parameters (edge weights filled in)

[Figure: graph with unknown weights ("?") → graph with learned weights]
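A minimal sketch of the learning step (the observed traffic readings are invented for illustration): estimate each road's weight from data, here simply by averaging the observations per edge:

```python
# Hedged sketch of the learning step: with the right data, tune the
# model's parameters -- here, set each road's traffic weight to the
# average of observed readings (observations invented for illustration).
observations = {  # edge -> observed traffic readings
    ("A", "B"): [4, 6, 5],
    ("A", "C"): [2, 2],
    ("B", "D"): [1, 1, 1],
}

weights = {edge: sum(xs) / len(xs) for edge, xs in observations.items()}
print(weights[("A", "B")])  # 5.0
```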
Machine learning

Data → Model

• The main driver of recent successes in AI
• Move from "code" to "data" to manage the information complexity
• Requires a leap of faith: generalization
Type of Data
• Relational Data (Tables/Transaction/etc)
• Text Data (Web)
• Semi-structured Data (XML, JSON)
• Graph Data
– Social Network, Semantic Web, …
• Streaming Data
– Network traffic, sensor data,…
• etc
Types of Learning
• Supervised Learning
  – Classification
  – Regression, etc.
• Unsupervised Learning
• Semi-Supervised Learning
• Etc.
Types of Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning

From Gartner, Recht
Supervised Learning: Uses
• Prediction of future cases: Use the rule to
predict the output of future inputs
• Knowledge extraction: The rule is easy to
understand
• Outlier detection: Exceptions not covered by
the rule, e.g., fraud

Classification
• Example: credit scoring
• Differentiating between low-risk and high-risk customers from their
  income and savings

Discriminant: IF income > θ1 AND savings > θ2
              THEN low-risk ELSE high-risk
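The credit-scoring discriminant written as code (the thresholds θ1 and θ2 are invented for illustration; in practice they would be learned from labeled customer data):

```python
# Hedged sketch of the credit-scoring discriminant rule.
# The thresholds are invented for illustration; a learner would fit them.
THETA1, THETA2 = 50_000, 10_000   # income and savings thresholds

def credit_risk(income, savings):
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

print(credit_risk(80_000, 20_000))  # low-risk
print(credit_risk(80_000, 5_000))   # high-risk
```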
Classification: Applications
• Face recognition: pose, make-up, hair style
• Character recognition: different handwriting styles
• Medical diagnosis: from symptoms to illnesses
• Biometrics: recognition/authentication using physical and/or
  behavioral characteristics: face, iris, signature, etc.
Regression
• Example: price of a used car
• x: car attributes; y: price
• Model: y = g(x | θ), where g(·) is the model and θ its parameters
• Linear case: y = wx + w0
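A sketch of fitting the linear model y = wx + w0 by least squares (toy used-car data invented here: age vs. price, with the points lying exactly on a line so the fit is exact):

```python
# Hedged sketch of least-squares fitting for y = w*x + w0
# (toy used-car data: x = age in years, y = price; values invented here).
xs = [1, 2, 3, 4, 5]            # car age
ys = [9.0, 8.0, 7.0, 6.0, 5.0]  # price, exactly linear in this toy set

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
# Closed-form least-squares solution for one input variable
w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
    / sum((x - mx) ** 2 for x in xs)
w0 = my - w * mx

print(w, w0)           # slope -1.0, intercept 10.0
print(w * 6 + w0)      # predicted price of a 6-year-old car: 4.0
```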
Unsupervised Learning
• Clustering: Grouping similar instances
• Some applications
– Customer segmentation
– Image compression

Unsupervised Learning

[Figure: organizing computing clusters, social network analysis, market
segmentation, astronomical data analysis]

Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
Slide credit: Andrew Ng
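A minimal k-means-style clustering sketch (1-D toy data invented for illustration, in the spirit of customer segmentation):

```python
# Hedged sketch of k-means clustering on 1-D data
# (toy "customer spend" values invented for illustration).
def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        groups = {c: [] for c in centers}
        for p in points:                       # assign each point to its nearest center
            nearest = min(centers, key=lambda c: abs(p - c))
            groups[nearest].append(p)
        # move each center to the mean of its group
        centers = [sum(g) / len(g) for g in groups.values() if g]
    return sorted(centers)

spend = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]      # two obvious customer segments
print(kmeans_1d(spend, centers=[0.0, 5.0]))   # centers near 1.0 and 10.0
```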
Reinforcement Learning
• Given a sequence of states and actions with
(delayed) rewards, output a policy
– Policy is a mapping from states ➔ actions that
tells you what to do in a given state
• Examples:
– Credit assignment problem
– Game playing
– Robot in a maze

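A hedged sketch of these ideas: tabular Q-learning on a tiny 1-D maze with a delayed reward at the goal (environment and hyperparameters invented for illustration); the learned policy maps each state to an action:

```python
# Hedged sketch of reinforcement learning: tabular Q-learning on a tiny
# 1-D maze (states 0..4, reward only at the right end; setup invented here).
import random
random.seed(0)

N, ACTIONS = 5, ("left", "right")
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    s2 = max(0, s - 1) if a == "left" else min(N - 1, s + 1)
    reward = 1.0 if s2 == N - 1 else 0.0     # delayed reward at the goal
    return s2, reward

alpha, gamma, eps = 0.5, 0.9, 0.3
for _ in range(300):                         # episodes
    s = 0
    while s != N - 1:
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned policy: the greedy action in each non-terminal state
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N - 1)}
print(policy)
```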
Semi-Supervised Learning
• This technique uses labeled as well as unlabeled data.
• Labeled data is available in small quantity, while unlabeled data is
  available in large amounts.
• First, an unsupervised algorithm forms groups (clusters) of the
  unlabeled data; then the existing labeled data is used to label the
  clustered unlabeled data.
• Elements closer (more similar) to each other are more likely to have
  the same output label.
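The cluster-then-label procedure above can be sketched as follows (toy 1-D data and labels invented for illustration; the "clustering" here is just a nearest-of-two-centers assignment):

```python
# Hedged sketch of semi-supervised learning: cluster unlabeled points,
# then use a few labeled points to name each cluster (toy data invented here).
unlabeled = [0.9, 1.1, 1.0, 7.9, 8.1, 8.0]
labeled = [(1.0, "cat"), (8.0, "dog")]        # small labeled set

# Step 1 (unsupervised): two crude cluster centers
centers = [min(unlabeled), max(unlabeled)]
def nearest(p):
    return min(range(len(centers)), key=lambda i: abs(p - centers[i]))

# Step 2: give each cluster the label of the labeled example that falls in it
cluster_label = {}
for x, y in labeled:
    cluster_label[nearest(x)] = y

# Step 3: propagate the cluster labels to all unlabeled points
labels = [cluster_label[nearest(p)] for p in unlabeled]
print(labels)
```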
Student example
• Supervised learning: faculty supervision at all times (data?)
• Unsupervised learning: the student has to figure out a concept
  himself (data?)
• Semi-supervised learning:
  – SL: faculty teaches some concepts in class
  – SSL: the student solves homework questions based on similar
    concepts taught by faculty in class
Common tasks
• Description
• Estimation
• Prediction
• Classification
• Clustering
• Association

Description
• Find ways to describe patterns and trends lying within data.
• For example, a pollster (a person who conducts or analyses opinion
  polls) may uncover evidence that those who have been laid off are
  less likely to support the present prime minister in the election.
• Decision trees provide an intuitive and human-friendly explanation
  of their results.
Estimation
• Estimation is similar to classification except that the target
  variable is numerical rather than categorical (divided into groups).
• For example, we might be interested in estimating the systolic blood
  pressure reading of a hospital patient, based on the patient's age,
  gender, body-mass index, and blood sodium levels.
• An estimation model can be applied to new cases.
• Methods: linear regression, neural networks
Estimation Examples
• Estimating the amount of money a randomly chosen family
of four will spend for back-to-school shopping this winter.
• Estimating the CGPA of a graduate student, based on that
student’s undergraduate CGPA.

Prediction
• Prediction is similar to classification and estimation, except
that for prediction, the results lie in the future.
– Predicting the price of a stock three months into the
future
– Predicting the percentage increase in traffic deaths next
year if the speed limit is increased
– Predicting whether a particular molecule in COVID-19 drug
discovery will lead to a profitable new drug for a pharmaceutical
company
Classification
• In classification, there is a target categorical variable, such as
  income bracket, that can be partitioned into different classes or
  categories:
  – High income,
  – Middle income, and
  – Low income.
Clustering
• Clustering refers to the grouping of records,
observations, or cases into groups of similar objects

Association
• The association task is the job of finding which attributes
“go together”
• Finding out which items in a supermarket are purchased
together and which items are never purchased together
• Apriori algorithm

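A minimal sketch of the association task (the transactions are invented for illustration): count which item pairs occur together; Apriori scales this idea up by pruning infrequent itemsets before counting larger ones:

```python
# Hedged sketch of market-basket association: count item pairs bought
# together (transactions invented for illustration).
from itertools import combinations
from collections import Counter

baskets = [{"bread", "butter", "milk"}, {"bread", "butter"},
           {"milk", "eggs"}, {"bread", "butter", "eggs"}]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(1))  # [(('bread', 'butter'), 3)]
```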
Identify the relevant task
• The present Indian PM's party would like to approximate how many
  seats their next opponent party will get in the coming election.
  – Estimation: estimating the number of seats (numeric target).
Identify the relevant task (cont'd)
• A political strategist is seeking groups to target for donations to
  his party in the coming elections.
  – Clustering: examine the profile of each homogeneous group derived
    from a particular state's population;
  – Association: discover interesting rules pertaining to a large
    proportion of the population.
Identify the relevant task (cont'd)
• Investigating the proportion of subscribers to a
company’s cell phone plan that respond positively to
an offer of a service upgrade.
• Predicting degradation in telecommunications
networks
• Examining the proportion of children whose parents
read to them who are themselves good readers
• Determining the proportion of cases in which a new
drug will exhibit dangerous side effects

The Agent-Environment Interface

[Figure: agent-environment interaction loop (states, actions, rewards)]
Reflex

[Spectrum of AI and machine learning models, from "low-level
intelligence" to "high-level intelligence"]
Reflex-based models
• A reflex-based model simply performs a fixed
sequence of computations on a given input.
• Common models in machine learning
• Examples: linear classifiers, deep neural networks
• Fully feed-forward (no backtracking)

Search problems
Markov decision processes
Adversarial games

[Spectrum: Reflex → States, from "low-level intelligence" to
"high-level intelligence"]
State-based models
Search problems: you control everything
Markov decision processes: against nature (outcomes involve chance)
Adversarial games: against an opponent (e.g., chess)
State-based models

[Figure: chess position, white to move]
State-based models
Model the states and the transitions between states that are triggered
by actions, as a graph G(V, E):
• States → nodes
• Transitions → edges
In state-based models, solutions are procedural.
Applications:
• Games: chess, Go, Pac-Man, StarCraft, etc.
• Robotics: motion planning
• Natural language generation: machine translation, image captioning
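A sketch of solving a search problem on such a state graph (the graph is invented for illustration): breadth-first search returns a procedural solution, i.e., a path of states:

```python
# Hedged sketch of a search problem: states -> nodes, transitions -> edges,
# solution -> a path (the graph is invented for illustration).
from collections import deque

def bfs_path(graph, start, goal):
    """Return a path with the fewest transitions from start to goal."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

states = {"S": ["A", "B"], "A": ["G"], "B": ["A"], "G": []}
print(bfs_path(states, "S", "G"))  # ['S', 'A', 'G']
```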
Search problems
Markov decision processes Constraint satisfaction
problems
Adversarial games Bayesian networks

Reflex States Variables

“Low-level AI and Machine learning “High-level


intelligence” intelligence”
75
Sudoku
In some applications, the order in which things are done isn't
important.

Goal: put digits in the blank squares so that each row, column, and 3x3
sub-block has the digits 1-9.

The order of filling squares doesn't matter in the evaluation criteria.
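A hedged sketch of Sudoku as a constraint satisfaction problem, shrunk to a 4x4 variant (digits 1-4, 2x2 sub-blocks, puzzle invented here) so the backtracking solver stays short:

```python
# Hedged sketch of Sudoku as a CSP, on a 4x4 variant (digits 1-4,
# 2x2 sub-blocks); 0 marks a blank. Puzzle invented for illustration.
def ok(grid, r, c, v):
    """Check the row, column, and 2x2 block constraints for placing v."""
    if v in grid[r] or v in (grid[i][c] for i in range(4)):
        return False
    br, bc = 2 * (r // 2), 2 * (c // 2)
    return all(grid[br + i][bc + j] != v for i in range(2) for j in range(2))

def solve(grid):
    for r in range(4):
        for c in range(4):
            if grid[r][c] == 0:
                for v in (1, 2, 3, 4):
                    if ok(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0      # backtrack
                return False                # no value fits: dead end
    return True                             # no blanks left

puzzle = [[1, 0, 3, 4],
          [0, 4, 0, 2],
          [2, 1, 0, 3],
          [4, 0, 2, 0]]
solve(puzzle)
print(puzzle)  # each row, column, and 2x2 block is a permutation of 1-4
```

Note that the solver never cares *in which order* blanks are filled; only the final constraint-satisfying assignment matters.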


Variable-based models
Constraint satisfaction problems: hard constraints
(e.g., Sudoku, scheduling)

Bayesian networks: soft dependencies (the variables are random
variables that depend on each other)
Search problems
Markov decision processes
Adversarial games
Constraint satisfaction problems
Bayesian networks

[Spectrum: Reflex → States → Variables → Logic, from "low-level
intelligence" to "high-level intelligence"]
Motivation: virtual assistant

Tell it information; ask it questions; use natural language.

It needs to:
• Digest heterogeneous information
• Reason deeply with that information
Optimization
Discrete optimization: find the best discrete object

    min_{p ∈ Paths} Cost(p)

Algorithmic tool: dynamic programming

Continuous optimization: find the best vector of real numbers

    min_{w ∈ R^d} TrainingError(w)

Algorithmic tool: gradient descent
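A sketch of the continuous case (toy data invented here): gradient descent on TrainingError(w), taken to be the mean squared error of predictions w·x:

```python
# Hedged sketch of continuous optimization by gradient descent:
# minimize TrainingError(w) = mean((w*x - y)^2) on toy data invented here.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                 # generated by the "true" w = 2

def grad(w):                          # d/dw of mean((w*x - y)^2)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w, lr = 0.0, 0.05                     # start at w = 0, small step size
for _ in range(200):
    w -= lr * grad(w)                 # step downhill along the gradient

print(round(w, 4))                    # converges toward 2.0
```

The step size matters: the error here is a quadratic bowl in w, and a learning rate that is too large would make the iterates diverge instead of settling into the minimum.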
