Intro To ML

Introduction to Machine Learning
Dr Yogeshwar Singh Dadwhal
Asst Professor- SoCEMS
Defence Institute of Advanced Technology
Pune
What is Machine Learning?
“Learning is any process by which a system improves
performance from experience.” - Herbert Simon
“Field of study that gives computers the ability to learn
without being explicitly programmed” - Arthur Samuel
[1959]
Definition by Tom Mitchell (1998):

Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P, T, E>.
Traditional Programming
Data + Program → Computer → Output
Machine Learning
Data + Output → Computer → Program
Slide credit: Pedro Domingos

When Do We Use Machine Learning?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
Learning isn’t always useful:

• There is no need to “learn” to calculate payroll
Based on slide by E. Alpaydin
A classic example of a task that requires machine learning:
It is very hard to say what makes a 2
Slide credit: Geoffrey Hinton

Some more examples of tasks that are best
solved by using a learning algorithm
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a nuclear power plant
• Prediction:
– Future stock prices or currency exchange rates
Slide credit: Geoffrey Hinton

Sample Applications
• Web search
• Computational biology
• Finance
• Defence
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]

Defining the Learning Task
Improve on task T, with respect to
performance metric P, based on experience E
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
T: Recognizing hand-written words

P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words
T: Driving on four-lane highways using vision sensors

P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while
observing a human driver.
T: Categorize email messages as spam or legitimate.

P: Percentage of email messages correctly classified.
E: Database of emails, some with human-given labels
Slide credit: Ray Mooney

State of the Art Applications of
Machine Learning
Digital Assistants
Autonomous Cars
• Nevada made it legal for

autonomous cars to drive on
roads in June 2011
• As of 2013, four states (Nevada,
Florida, California, and
Michigan) have legalized
autonomous cars
Penn’s Autonomous Car 
(Ben Franklin Racing Team)
Autonomous Car Sensors
Autonomous Car Technology
[Figure panels: Path Planning; Laser Terrain Mapping; Learning from Human Drivers; Adaptive Vision; Sebastian; Stanley]
Images and movies taken from Sebastian Thrun’s multimedia.

Google Duplex!
ML for Farm
Deep Learning in the Headlines
Deep Belief Net on Face Images
[Figure layers, top to bottom: object models; object parts (combination of edges); edges; pixels]
Based on materials by Andrew Ng
Learning of Object Parts
Slide credit: Andrew Ng

Training on Multiple Objects
Trained on 4 classes (cars, faces,

motorbikes, airplanes).
Second layer: Shared-features
and object-specific features.
Third layer: More specific
features.

Scene Labeling via Deep Learning
[Farabet et al. ICML 2012, PAMI 2013]

Inference from Deep Learned Models
Generating posterior samples from faces by “filling in” experiments
(cf. Lee and Mumford, 2003). Combine bottom-up and top-down inference.
[Figure rows: Input images; Samples from feedforward inference (control); Samples from full posterior inference]

Machine Learning in
Automatic Speech Recognition
A Typical Speech Recognition System
ML is used to predict phone states from the sound spectrogram
Deep learning has state-of-the-art results
# Hidden Layers      1     2     4     8     10    12
Word Error Rate %    16.0  12.8  11.4  10.9  11.0  11.1
Baseline GMM performance = 15.4%

[Zeiler et al. “On rectified linear units for speech
recognition” ICASSP 2013]
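The trend in the table can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers quoted on the slide; the variable names are illustrative and not from the cited paper.

```python
# Word error rates (%) by number of hidden layers, as quoted on the slide.
wer_by_layers = {1: 16.0, 2: 12.8, 4: 11.4, 8: 10.9, 10: 11.0, 12: 11.1}
baseline_gmm_wer = 15.4  # baseline GMM performance (%), from the slide

# Depth with the lowest word error rate.
best_layers = min(wer_by_layers, key=wer_by_layers.get)
best_wer = wer_by_layers[best_layers]

# Relative reduction in word error rate compared with the GMM baseline.
relative_reduction = (baseline_gmm_wer - best_wer) / baseline_gmm_wer

print(best_layers, best_wer, round(100 * relative_reduction, 1))
```

On these figures the error rate stops improving after 8 hidden layers, so adding depth alone does not keep lowering it.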
Other side of the coin
Slide credit: Li Deng, MS Research

“Bias” in Machine Learning
“Bias” addressed
“Racist” Machine Learning?
Where is the bride?
A “reality” check
Adversaries!
Causality vs. Correlation
Types of Learning
Types of Learning
• Supervised (inductive) learning

– Given: training data + desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions
Based on slide by Pedro Domingos

An example application
An emergency room in a hospital measures 17 variables (e.g.,
blood pressure, age, etc) of newly admitted patients.
A decision is needed: whether to put a new patient in an
intensive-care unit.
Due to the high cost of ICU, those patients who may survive
less than a month are given higher priority.
Problem: to predict high-risk patients and discriminate them
from low-risk patients.
Another application
A credit card company receives thousands of applications for
new cards. Each application contains information about an
applicant:
• age
• marital status
• annual salary
• outstanding debts
• credit rating
• etc.
Problem: to decide whether an application should be approved,
or to classify applications into two categories, approved and
not approved.
Machine learning and our focus
• Like human learning from past experiences.

• A computer does not have “experiences”.
• A computer system learns from data, which represent some
“past experiences” of an application domain.
• Our focus: learn a target function that can be used to predict
the values of a discrete class attribute, e.g., approved or not
approved, and high-risk or low-risk.
• The task is commonly called: Supervised learning,
classification, or inductive learning.
The data and the goal
Data: A set of data records (also called examples, instances or

cases) described by
k attributes: A1, A2, … Ak.
a class: Each example is labelled with a pre-defined class.
Goal: To learn a classification model from the data that can be
used to predict the classes of new (future, or test)
cases/instances.
An example: data (loan application)
Approved or not
An example: the learning task
Learn a classification model from the data
Use the model to classify future loan applications
into
Yes (approved) and
No (not approved)
What is the class for following case/instance?
Supervised vs. unsupervised Learning
Supervised learning: classification is seen as supervised learning
from examples.
Supervision: The data (observations, measurements, etc.) are labeled with
pre-defined classes. It is as if a “teacher” gives the classes (supervision).
Test data are classified into these classes too.
Supervised learning: discover patterns in the data that relate data
attributes with a target (class) attribute.
These patterns are then utilized to predict the values of the target attribute in future
data instances
Unsupervised learning (clustering)
Class labels of the data are unknown
Given a set of data, the task is to establish the existence of classes or clusters
in the data
Unsupervised learning: The data have no target attribute.
We want to explore the data to find some intrinsic structures in them.
• Knowledge Discovery in Databases: KDD may be defined as: "The
non-trivial process of identifying valid, novel, potentially useful, and
ultimately understandable patterns in data".
Examples in learning
• Supervised Models
– Neural Networks
– Multi Layer Perceptron
– Decision Trees
• Unsupervised Models
– Different Types of Clustering
– Distances and Normalization
– Kmeans
– Self Organizing Maps
• Combining different models
– Committee Machines
– Introducing a Priori Knowledge
– Sleeping Expert Framework
• KDD and Data Mining Tasks
• Finding the optimal approach
Supervised learning process: two steps
• Learning (training): Learn a model using the
training data
• Testing: Test the model using unseen test
data to assess the model accuracy
$\text{Accuracy} = \frac{\text{Number of correct classifications}}{\text{Total number of test cases}}$
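The accuracy ratio from the testing step can be written directly as code. This is a minimal sketch; the function name and the example labels are hypothetical and not taken from the slides.

```python
def accuracy(predicted, actual):
    """Fraction of test cases whose predicted class equals the true class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one test case")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical test labels and model predictions (not from the slides).
actual = ["Yes", "No", "Yes", "Yes", "No"]
predicted = ["Yes", "No", "No", "Yes", "No"]
print(accuracy(predicted, actual))  # 4 of 5 test cases are correct
```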
What do we mean by learning?
Given
a data set D,
a task T, and
a performance measure M,
a computer system is said to learn from D to perform
the task T if after learning the system’s performance on T
improves as measured by M.
In other words, the learned model helps the system to
perform T better as compared to no learning.
An example
Data: Loan application data
Task: Predict whether a loan should be approved or not.
Performance measure: accuracy.
No learning: classify all future applications (test data) to

the majority class (i.e., Yes):
Accuracy = 9/15 = 60%.
We can do better than 60% with learning.
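The "no learning" baseline can be reproduced in a few lines. This sketch assumes only the class counts implied by the slide (9 "Yes" out of 15) and, as the slide does, that the test data have the same class proportions; the ordering of the labels is arbitrary.

```python
from collections import Counter

# Class labels of the 15 loan applications: 9 "Yes" and 6 "No",
# matching the 9/15 figure on the slide (the order is arbitrary here).
labels = ["Yes"] * 9 + ["No"] * 6

# Always predicting the majority class is correct exactly as often
# as that class occurs.
majority_class, majority_count = Counter(labels).most_common(1)[0]
baseline_accuracy = majority_count / len(labels)
print(majority_class, baseline_accuracy)
```

Any learned model is worth keeping only if it beats this baseline on unseen data.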
Fundamental assumption of learning
Assumption: The distribution of training examples is identical
to the distribution of test examples (including future unseen
examples).
In practice, this assumption is often violated to a certain degree.

Strong violations will clearly result in poor classification
accuracy.
To achieve good accuracy on the test data, training examples
must be sufficiently representative of the test data.
Supervised Learning: Regression
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f (x) to predict y given x
– y is real-valued == regression
[Figure: September Arctic Sea Ice Extent (1,000,000 sq km) plotted against Year, 1970 to 2020; vertical axis from 0 to 9]
Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)
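One simple way to learn a real-valued f(x) is a least-squares line fit. The sketch below assumes NumPy is available and uses made-up, noise-free points rather than the sea-ice data; a straight line is only one possible choice of function.

```python
import numpy as np

# Hypothetical training pairs (x_i, y_i); these are not the sea-ice data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])  # generated from y = 2x + 1

# Fit f(x) = slope * x + intercept by ordinary least squares.
slope, intercept = np.polyfit(x, y, deg=1)

def f(x_new):
    """Predict a real-valued y for a new x using the fitted line."""
    return slope * x_new + intercept

print(f(5.0))  # close to 11.0 for this noise-free data
```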
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
– y is categorical == classification
[Figure: Breast Cancer (Malignant / Benign). Vertical axis: 1 (Malignant), 0 (Benign). Horizontal axis: Tumor Size]
Based on example by Andrew Ng

[Figure, repeated over three builds: the same (x1, y1), ..., (xn, yn) data plotted against Tumor Size, with a decision boundary separating a Predict Benign region from a Predict Malignant region]
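A one-dimensional classifier of this kind can be sketched as a threshold on tumor size. The code below is an illustration only: the data are invented, the function names are hypothetical, and it assumes that larger tumors are the malignant ones.

```python
def predict_malignant(tumor_size, threshold):
    """Return 1 (malignant) if tumor_size exceeds threshold, else 0 (benign)."""
    return 1 if tumor_size > threshold else 0

def best_threshold(sizes, labels):
    """Pick the midpoint between adjacent sizes that classifies most examples correctly."""
    ordered = sorted(set(sizes))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

    def n_correct(t):
        return sum(predict_malignant(s, t) == y for s, y in zip(sizes, labels))

    return max(candidates, key=n_correct)

# Hypothetical tumor sizes and labels (1 = malignant, 0 = benign).
sizes = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 1, 1, 1]
threshold = best_threshold(sizes, labels)
print(threshold, predict_malignant(2.5, threshold), predict_malignant(6.5, threshold))
```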
Supervised Learning
• x can be multi-dimensional
– Each dimension corresponds to an attribute
– Clump Thickness
– Uniformity of Cell Size
– Uniformity of Cell Shape
– …
[Figure axes: Tumor Size, Age]

Unsupervised Learning
• Given x1, x2, ..., xn (without labels)
• Output hidden structure behind the x’s
– E.g., clustering
Genomics application: group individuals by genetic similarity
[Figure: matrix of Genes by Individuals. Source: Daphne Koller]
[Figure panels: Organize computing clusters; Social network analysis; Market segmentation; Astronomical data analysis]
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)

• Independent component analysis – separate a
combined signal into its original sources
Image credit: statsoft.com Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html

What is Cluster Analysis?
Finding groups of objects in data such that the
objects in a group will be similar (or related) to
one another and different from (or unrelated to)
the objects in other groups
Intra-cluster distances are minimized
Inter-cluster distances are maximized
Applications of Cluster Analysis
Understanding: Group related documents for browsing, group genes and proteins that have similar functionality, or group stocks with similar price fluctuations.

Cluster | Discovered Clusters | Industry Group
1 | Applied-Matl-DOWN, Bay-Network-Down, 3-COM-DOWN, Cabletron-Sys-DOWN, CISCO-DOWN, HP-DOWN, DSC-Comm-DOWN, INTEL-DOWN, LSI-Logic-DOWN, Micron-Tech-DOWN, Texas-Inst-Down, Tellabs-Inc-Down, Natl-Semiconduct-DOWN, Oracl-DOWN, SGI-DOWN, Sun-DOWN | Technology1-DOWN
2 | Apple-Comp-DOWN, Autodesk-DOWN, DEC-DOWN, ADV-Micro-Device-DOWN, Andrew-Corp-DOWN, Computer-Assoc-DOWN, Circuit-City-DOWN, Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN, Motorola-DOWN, Microsoft-DOWN, Scientific-Atl-DOWN | Technology2-DOWN
3 | Fannie-Mae-DOWN, Fed-Home-Loan-DOWN, MBNA-Corp-DOWN, Morgan-Stanley-DOWN | Financial-DOWN
4 | Baker-Hughes-UP, Dresser-Inds-UP, Halliburton-HLD-UP, Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP | Oil-UP

Summarization: Reduce the size of large data sets.
[Figure: Clustering precipitation in Australia]
Types of Clusterings
A clustering is a set of clusters

Important distinction between hierarchical and partitional sets
of clusters
Partitional Clustering
A division of data objects into non-overlapping subsets (clusters) such that each data object
is in exactly one subset
Hierarchical clustering
A set of nested clusters organized as a hierarchical tree
Partitional Clustering
[Figure panels: Original Points; A Partitional Clustering]

Hierarchical Clustering
[Figure panels over points p1, p2, p3, p4: Traditional Hierarchical Clustering with its Traditional Dendrogram; Non-traditional Hierarchical Clustering with its Non-traditional Dendrogram]
Clustering Algorithms
K-means and its variants
Hierarchical clustering
Density-based clustering
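As an illustration of the first family, here is a minimal sketch of basic K-means (Lloyd's algorithm), assuming NumPy is available. The function name, the toy points, and the choice to initialise the centroids with the first k points are simplifying assumptions made to keep the example deterministic; practical implementations usually use randomised or smarter initialisation.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Basic K-means: returns (centroids, assignments)."""
    # Simplifying assumption: start from the first k points.
    centroids = points[:k].copy()
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignments = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[assignments == j].mean(axis=0) if np.any(assignments == j)
            else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assignments

# Hypothetical 2-D points forming two well-separated groups.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```

Each point ends up in exactly one cluster, so the result is a partitional clustering in the sense defined earlier.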
Knowledge Discovery in Databases
• KDD may be defined as: "The non-trivial process of identifying
valid, novel, potentially useful, and ultimately understandable
patterns in data".
• KDD is an interactive and iterative process involving several steps.
You got your data: what’s next?
What kind of analysis do you need? Which model is more appropriate for it? …
Clean your data!
• Data preprocessing transforms the raw data into
a format that will be more easily and effectively
processed for the purpose of the user.
• Use standard formats!
• Some tasks
– sampling: selects a representative subset
from a large population of data;
– noise treatment
– strategies to handle missing data: sometimes
your rows will be incomplete, and not all
parameters are measured for all samples.
– normalization
– feature extraction: pulls out specified data that is
significant in some particular context.
Missing Data
• Missing data are a part of almost all research, and we all have to decide
how to deal with them.
• Complete Case Analysis: use only rows with all the values
• Available Case Analysis
• Substitution
– Mean Value: replace the missing value with the mean value
for that particular attribute
– Regression Substitution: we can replace the
missing value with historical value from similar cases
– Matching Imputation: for each unit with a missing y, find a unit
with similar values of x in the observed data and take its y value
– Maximum Likelihood, EM, etc
• Some DM models can deal with missing data better than others.
• Which technique to adopt really depends on your data
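Two of the simpler strategies above, complete case analysis and mean value substitution, can be sketched with NumPy. The data matrix is invented for illustration, and NaN is used here as the (assumed) marker for a missing value.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are attributes;
# NaN marks a missing value.
data = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 30.0],
    [np.nan, 40.0],
])

# Complete case analysis: keep only rows with no missing values.
complete_rows = data[~np.isnan(data).any(axis=1)]

# Mean value substitution: replace each missing value with the mean of
# the observed values of that attribute (column).
column_means = np.nanmean(data, axis=0)
imputed = np.where(np.isnan(data), column_means, data)

print(complete_rows)
print(imputed)
```

Complete case analysis discards half of this small data set, while mean substitution keeps every row at the cost of shrinking the variance of the imputed attribute.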
Data Mining
• Crucial task within the KDD
• Data Mining is about automating the
process of searching for patterns in the
data.
• In more detail, the most relevant DM
tasks are:
– association
– sequence or path analysis
– clustering
– classification
– regression
– visualization
Finding Solution via Purposes
• You have your data, what kind of analysis do you need?
• Regression
– predict new values based on the past, inference
– compute the new values for a dependent variable based on the values of
one or more measured attributes
• Classification:
– divide samples in classes
– use a trained set of previously labeled data
• Clustering
– partitioning of a data set into subsets (clusters) so that data in each
subset ideally share some common characteristics
• Classification is in some ways similar to clustering, but requires that the
analyst know ahead of time how classes are defined.
Reinforcement Learning
• Given a sequence of states and actions with
(delayed) rewards, output a policy
– Policy is a mapping from states → actions that
tells you what to do in a given state
• Examples:
– Credit assignment problem
– Game playing
– Robot in a maze
– Balance a pole on your hand
The Agent-Environment Interface
Agent and environment interact at discrete time steps: $t = 0, 1, 2, \ldots$
Agent observes state at step $t$: $s_t \in S$
produces action at step $t$: $a_t \in A(s_t)$
gets resulting reward: $r_{t+1} \in \mathbb{R}$
and resulting next state: $s_{t+1}$
Trajectory: $\ldots\; s_t \xrightarrow{a_t} r_{t+1},\, s_{t+1} \xrightarrow{a_{t+1}} r_{t+2},\, s_{t+2} \xrightarrow{a_{t+2}} r_{t+3},\, s_{t+3} \xrightarrow{a_{t+3}} \ldots$
Slide credit: Sutton & Barto
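The interaction loop can be sketched in code. Everything here is a hypothetical illustration: the corridor environment, its reward, and the fixed policy are invented to show the observe, act, reward, next-state cycle, and no learning takes place.

```python
class Corridor:
    """Hypothetical environment: states 0..4 in a row; reaching state 4 ends the episode."""

    def __init__(self):
        self.state = 0

    def actions(self, state):
        """A(s): the actions available in a state (the same everywhere here)."""
        return ["left", "right"]

    def step(self, action):
        """Apply the action; return (reward, next_state, done)."""
        move = 1 if action == "right" else -1
        self.state = min(4, max(0, self.state + move))
        done = self.state == 4
        reward = 1.0 if done else 0.0  # reward only on reaching the goal
        return reward, self.state, done


def always_right(state, available_actions):
    """A fixed policy: a mapping from states to actions."""
    return "right"


def run_episode(env, policy, max_steps=100):
    """Agent-environment loop at discrete time steps t = 0, 1, 2, ..."""
    state, total_reward, trajectory = env.state, 0.0, []
    for _ in range(max_steps):
        action = policy(state, env.actions(state))   # agent picks an action for the current state
        reward, next_state, done = env.step(action)  # environment returns reward and next state
        trajectory.append((state, action, reward))
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward, trajectory


total_reward, trajectory = run_episode(Corridor(), always_right)
print(total_reward, len(trajectory))
```

Because the reward arrives only at the final step, this toy episode also shows what "delayed" reward means.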

Reinforcement Learning
https://www.youtube.com/watch?v=4cgWya-wjgY
Inverse Reinforcement Learning
• Learn policy from user demonstrations
Stanford Autonomous Helicopter

http://heli.stanford.edu/
https://www.youtube.com/watch?v=VCdxqn0fcnE
Framing a Learning Problem
Designing a Learning System
• Choose the training experience
• Choose exactly what is to be learned
– i.e. the target function
• Choose how to represent the target function
• Choose a learning algorithm to infer the target
function from the experience
[Diagram: Environment/Experience provides Training data to the Learner; the Learner produces Knowledge; Knowledge and Testing data feed the Performance Element]
Based on slide by Ray Mooney
Training vs. Test Distribution
• We generally assume that the training and
test examples are independently drawn from
the same overall distribution of data
– We call this “i.i.d.”, which stands for “independent
and identically distributed”
• If examples are not independent, collective classification
is required
• If the test distribution is different, transfer learning
is required
ML in a Nutshell
• Tens of thousands of machine learning
algorithms
– Hundreds new every year
• Every ML algorithm has three components:

– Representation
– Optimization
– Evaluation

Various Function Representations
• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic
• Instance-based functions
– Nearest-neighbor
– Case-based
• Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
– Hidden-Markov Models (HMMs)
– Probabilistic Context Free Grammars (PCFGs)
– Markov networks

Various Search/Optimization
Algorithms
• Gradient descent
– Perceptron
– Backpropagation
• Dynamic Programming
– HMM Learning
– PCFG Learning
• Divide and Conquer
– Decision tree induction
– Rule learning
• Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution

Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• etc.
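Precision and recall from the list above can be computed from counts of true positives, false positives, and false negatives. This is a minimal sketch with invented labels; the function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def precision_recall(predicted, actual, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)  # true positives
    fp = sum(p == positive and a != positive for p, a in pairs)  # false positives
    fn = sum(p != positive and a == positive for p, a in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical true labels and predictions (1 = positive class).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```

Here 2 of the 3 predicted positives are correct (precision 2/3) and 2 of the 4 actual positives are found (recall 1/2).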

ML in Practice
• Understand domain, prior knowledge, and goals
• Data integration, selection, cleaning, pre-processing, etc.
• Learn models
• Interpret results
• Consolidate and deploy discovered knowledge
(Loop: the steps are iterated.)
Based on a slide by Pedro Domingos

Lessons Learned about Learning
• Learning can be viewed as using direct or indirect
experience to approximate a chosen target function.
• Function approximation can be viewed as a search

through a space of hypotheses (representations of
functions) for one that best fits a set of training data.
• Different learning methods assume different

hypothesis spaces (representation languages) and/or
employ different search techniques.
Data Types in Machine
Learning
Data Types
• Data types are a way of classification that specifies which type of
value a variable can store and what type of mathematical operations,
relational, or logical operations can be applied to the variable without
causing an error.
• In machine learning, it is very important to know the appropriate
data types of the independent and dependent variables,
as this provides the basis for selecting classification or regression
models.
• Incorrect identification of data types leads to incorrect modeling
which in turn leads to an incorrect solution.
Different Types of Data
1. Quantitative Data Type
This data type consists of numerical values: anything which is
measured by numbers,
e.g., profit, quantity sold, height, weight, temperature, etc.
This is again of two types:
A.) Discrete Data Type
B.) Continuous Data Type
This data type tries to quantify things, and it does so by
considering numerical values that make it countable in nature.
The price of a smartphone, the discount offered, the number of
ratings on a product, the processor frequency of a
smartphone, or the RAM of that particular phone all
fall under the category of quantitative data types.
The key thing is that there can be an infinite number of values
a feature can take. For instance, the price of a smartphone can
vary from x amount to any value, and it can be further broken
down based on fractional values.
Examples of Quantitative Data:
• Height or weight of a person or object
• Room temperature
• Scores and marks (e.g., 59, 80, 60)
• Time
A.) Discrete Data Type
• Numeric data which have discrete values or whole numbers.
A variable of this type has no proper meaning if its value is
expressed in decimal format. Its values can be counted.
• E.g., number of cars you have, number of marbles in containers,
number of students in a class, etc.
Numerical values which are integers or whole
numbers are placed under this category. The number of
speakers in the phone, cameras, cores in the processor, and the
number of SIMs supported are some examples
of the discrete data type.
Examples of Discrete Data:
• Total number of students present in a class
• Cost of a cell phone
• Number of employees in a company
• The total number of players who participated
in a competition
• Days in a week
B.) Continuous Data Type
• Numerical measures which can take any value within a certain
range. A variable of this type has true meaning when its value
is expressed in decimal format. Its values cannot be counted,
only measured. The number of possible values can be infinite.
• E.g., height, weight, time, area, distance, measurement of
rainfall, etc.
Fractional numbers are considered continuous values.
These can take the form of the operating frequency of the
processors, the android version of the phone, wifi frequency,
temperature of the cores, and so on.
Examples of Continuous Data:
• Height of a person
• Speed of a vehicle
• “Time taken” to finish the work
• Wi-Fi frequency
• Market share price
Difference between Discrete and Continuous Data
Discrete Data | Continuous Data
Countable and finite; whole numbers or integers | Measurable; in the form of fractions or decimals
Represented mainly by bar graphs | Represented in the form of a histogram
Values cannot be divided into smaller pieces | Values can be divided into smaller pieces
Have spaces between the values | Form a continuous sequence
2. Qualitative Data Type
These are data types that cannot be expressed in numbers. They
describe categories or groups and are hence known as the categorical
data type.
This can be divided into:
A. Structured Data
B. Unstructured Data
Qualitative or categorical data describes the object under
consideration using a finite set of discrete classes. This means
that this type of data can’t be counted or measured easily
using numbers and is therefore divided into categories. The
gender of a person (male, female, or others) is a good example
of this data type.
These are usually extracted from audio, images, or text.
Another example is a smartphone brand, which
provides information about the current rating, the color of the
phone, the category of the phone, and so on. All this information
can be categorized as qualitative data.
Other examples of qualitative data are:
• What language do you speak
• Favourite holiday destination
• Opinion on something (agree, disagree, or neutral)
• Colours
A. Structured Data:
• This type of data is either numbers or words. It can
take numerical values, but mathematical operations
cannot be performed on it.
This type of data is expressed in tabular format.
• E.g., sunny=1, cloudy=2, windy=3, or binary data
like 0 or 1, good or bad, etc.
B. Unstructured Data:
• This type of data does not have a proper format and is
therefore known as unstructured data. It
comprises textual data, sounds, images, videos, etc.
Besides this, there are also other types, referred to as data type
preliminaries or data measures:
1. Nominal
2. Ordinal
3. Interval
4. Ratio
These can also be referred to as different scales of measurement.

I. Nominal Data Type:
• This is used to express names or labels which are not ordered or
measurable.
• E.g., male or female (gender), race, country, etc.
Gender (Female, Male), An Example Of Nominal Data Type
These are sets of values that don’t possess a natural ordering. Let’s understand
this with some examples. The color of a smartphone can be considered a
nominal data type, as we can’t compare one color with others.
It is not possible to state that ‘Red’ is greater than ‘Blue’. The gender of a person
is another one, where we can’t rank male, female, or others. Mobile
phone categories, whether midrange, budget segment, or premium smartphone,
are also a nominal data type.
Examples of Nominal Data:
• Colour of hair (blonde, red, brown, black, etc.)
• Marital status (single, widowed, married)
• Nationality (Indian, German, American)
• Gender (male, female, others)
• Eye color (black, brown, etc.)
II. Ordinal Data Type:
• This is also a categorical data type like nominal data, but it has some
natural ordering associated with it.
• E.g., Likert rating scale, shirt sizes, ranks, grades, etc.
Rating (Good, Average, Poor), An Example Of Ordinal Data Type
These types of values have a natural ordering while
maintaining their class of values. If we consider the sizes of a
clothing brand, then we can easily sort them according to their
name tag in the order small < medium < large. The grading
system used to mark candidates in a test can also be considered
an ordinal data type, where A+ is definitely better than a B grade.
These categories help us decide which encoding strategy can be applied
to which type of data. Data encoding for qualitative data is important
because machine learning models can’t handle these values directly; they
need to be converted to numerical types, as the models are
mathematical in nature.
For the nominal data type, where there is no comparison among the
categories, one-hot encoding, which is similar to binary coding, can be
applied when the number of categories is small. For the ordinal data type,
label encoding, which is a form of integer encoding, can be applied.
Examples of Ordinal Data:
• When companies ask for feedback, experience, or
satisfaction on a scale of 1 to 10
• Letter grades in the exam (A, B, C, D, etc.)
• Ranking of people in a competition (first, second,
third, etc.)
• Economic status (high, medium, and low)
• Education level (higher, secondary, primary)
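The two encoding strategies mentioned for nominal and ordinal data can be sketched in plain Python. The category values and function names below are hypothetical; libraries such as scikit-learn provide equivalent encoders.

```python
# Hypothetical category values for illustration.
colors = ["red", "blue", "green", "blue"]      # nominal: no natural order
sizes = ["small", "large", "medium", "small"]  # ordinal: small < medium < large

def one_hot_encode(values):
    """One-hot encoding for nominal data: one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def ordinal_encode(values, ordered_categories):
    """Integer (label) encoding for ordinal data that preserves the given order."""
    rank = {c: i for i, c in enumerate(ordered_categories)}
    return [rank[v] for v in values]

categories, one_hot = one_hot_encode(colors)
size_codes = ordinal_encode(sizes, ["small", "medium", "large"])
print(categories, one_hot)
print(size_codes)
```

One-hot columns carry no order, so a model cannot infer that one color is "greater" than another, while the integer codes for sizes keep small < medium < large.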
Difference between Nominal and Ordinal Data
Nominal Data | Ordinal Data
Can’t be quantified and has no intrinsic ordering | Gives a sequential order by position on the scale
Is qualitative or categorical data | Is said to be “in-between” qualitative and quantitative data
Provides no quantitative value, and no arithmetical operation can be performed | Provides sequence and can be assigned numbers, but arithmetical operations cannot be performed
Cannot be used to compare items with one another | Can be used to compare one item with another by ranking or ordering
III. Interval Data Type:
• This is numeric data that has a proper order and meaningful
differences between values, but no true zero. Here zero does not mean
complete absence; it is just another value on the scale.
This is the local scale.
• E.g., temperature measured in degrees Celsius, time, SAT score, credit
score, pH, etc. The difference between values is meaningful. In this
case, there is no absolute zero.
Temperature, An Example Of Interval Data Type
IV. Ratio Data Type:
• This quantitative data type is the same as the interval data type but has an absolute zero. Here
zero means complete absence and the scale starts from zero. This is the global scale.
• E.g., temperature in kelvin, height, weight, etc.
Weight, An Example Of Ratio Data Type
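The practical difference between the interval and ratio scales shows up when taking ratios. The following sketch uses two invented temperatures; only the Celsius-to-kelvin offset of 273.15 is a fixed fact.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15

# Two hypothetical temperatures in degrees Celsius.
t1_c, t2_c = 10.0, 20.0

# Differences agree on both scales, so they are meaningful on an interval scale.
difference_c = t2_c - t1_c
difference_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios disagree: "twice as hot" only makes sense on the kelvin (ratio) scale,
# where zero means complete absence.
ratio_c = t2_c / t1_c
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(difference_c, round(difference_k, 6))
print(ratio_c, round(ratio_k, 4))
```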
A Brief History of
Machine Learning
History of Machine Learning
• 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM

History of Machine Learning (cont.)
• 1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
• 1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning

History of Machine Learning (cont.)
• 2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics, Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
• 2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc.
– ???
Based on slide by Ray Mooney
