Professional Documents
Culture Documents
Machine Learning
Dr Yogeshwar Singh Dadwhal
Asst Professor- SoCEMS
Defence Institute of Advanced Technology
Pune
What is Machine Learning?
“Learning is any process by which a system improves
performance from experience.“ - Herbert Simon
Data
Computer Output
Program
Machine Learning
Data
Computer Program
Output
Sebastian
Stanley
15
Deep Belief Net on Face Images
object models
object parts
(combination
of edges)
edges
pixels
Based on materials
by Andrew Ng
Learning of Object Parts
Input images
Samples from
feedforward
Inference
(control)
Samples from
Full posterior
inference
# Hidden Layers 1 2 4 8 10 12
7
(1,000,000 sq km)
6
5
4
3
2
1
0
1970 1980 1990 2000 2010 2020
Year
Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f (x) to predict y given x
– y is categorical == classification
Breast Cancer (Malignant / Benign)
1(Malignant)
0(Benign)
Tumor Size
1(Malignant)
0(Benign)
Tumor Size
1(Malignant)
0(Benign)
Tumor Size
Predict Benign Predict Malignant
- Clump Thickness
- Uniformity of Cell Size
Age - Uniformity of Cell Shape
…
Tumor Size
Individuals
[Source: Daphne Koller]
Unsupervised Learning
Inter-cluster
Intra-cluster distances are
distances are maximized
minimized
Applications of Cluster Analysis
Discovered Clusters Industry Group
Understanding 1
Applied-Matl-DOWN,Bay-Network-Down,3-COM-DOWN,
Cabletron-Sys-DOWN,CISCO-DOWN,HP-DOWN,
DSC-Comm-DOWN,INTEL-DOWN,LSI-Logic-DOWN,
Technology1-DOWN
Group related documents for Micron-Tech-DOWN,Texas-Inst-Down,Tellabs-Inc-Down,
Natl-Semiconduct-DOWN,Oracl-DOWN,SGI-DOWN,
browsing, group genes and Sun-DOWN
2
Apple-Comp-DOWN,Autodesk-DOWN,DEC-DOWN,
proteins that have similar ADV-Micro-Device-DOWN,Andrew-Corp-DOWN,
Computer-Assoc-DOWN,Circuit-City-DOWN,
Technology2-DOWN
functionality, or group stocks Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN,
Motorola-DOWN,Microsoft-DOWN,Scientific-Atl-DOWN
with similar price fluctuations
3
Fannie-Mae-DOWN,Fed-Home-Loan-DOWN,
MBNA-Corp-DOWN,Morgan-Stanley-DOWN Financial-DOWN
4
Baker-Hughes-UP,Dresser-Inds-UP,Halliburton-HLD-UP,
Summarization Louisiana-Land-UP,Phillips-Petro-UP,Unocal-UP,
Schlumberger-UP
Oil-UP
Clustering precipitation in
Australia
Types of Clusterings
Hierarchical clustering
A set of nested clusters organized as a hierarchical tree
Partitional Clustering
p1
p3 p4
p2
p1 p2 p3 p4
Traditional Hierarchical Clustering Traditional Dendrogram
p1
p3 p4
p2
p1 p2 p3 p4
Non-traditional Hierarchical Clustering Non-traditional Dendrogram
Clustering Algorithms
Hierarchical clustering
Density-based clustering
Knowledge Discovery in Databases
• KDD may be defined as: "The non-trivial process of identifying
valid, novel, potentially useful, and ul2mately understandable
patterns in data".
• KDD is an interactive and iterative process involving several steps.
You got your data: what’s next?
What kind of analysis do you need? Which model is more appropriate for it? …
Clean your data!
... rt +1 s rt +2 s rt +3 s ...
st a t +1
at +1 t +2
at +2 t +3 at +3
t
https://www.youtube.com/watch?v=4cgWya-wjgY
Inverse Reinforcement Learning
• Learn policy from user demonstrations
Environment/
Experience Knowledge
Testing data
Performance
Element
Based on slide by Ray Mooney
Training vs. Test Distribution
• We generally assume that the training and
test examples are independently drawn from
the same overall distribution of data
– We call this “i.i.d” which stands for “independent
and identically distributed”
49
Slide credit: Ray Mooney
Data Types in Machine
Learning
Data Types
• Data types are a way of classification that specifies which type of
value a variable can store and what type of mathematical operations,
relational, or logical operations can be applied to the variable without
causing an error.
• In machine learning, it is very important to know appropriate
datatypes of independent and dependent variable.
• As it provides the basis for selecting classification or regression
models.
• Incorrect identification of data types leads to incorrect modeling
which in turn leads to an incorrect solution.
Different Types
Of Data Types
1. Quantitative Data Type
These are the data types that cannot be expressed in numbers. This
describes categories or groups and is hence known as the categorical
data type.
1.Nominal
2.Ordinal
3.Interval
4.Ratio