with
Semi-Supervised Learning
Algorithm
Presented by: Anurodh Kumar Sinha
2nd-Year MSLIS Student
DRTC, ISI Bangalore
2012-2013
12/3/2012
Agenda
What is Pattern Recognition?
What is Machine Learning and why do we need it?
Types of Learning Algorithm
Need for Semi-Supervised Learning
Conclusion
What is a Pattern?
An entity, vaguely defined, that could be given a name, e.g.:
handwritten word,
human face,
fingerprint image,
speech signal.
What is a Feature?
A feature is an individual measurable heuristic property
of a phenomenon being observed.
Examples
In speech recognition, features for recognizing
phonemes can include noise ratios, length of sounds,
relative power, filter matches and many others.
In spam detection algorithms, features may include
whether certain email headers are present or absent,
whether they are well formed, what language the email
appears to be in, and the grammatical correctness of the text.
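As a rough illustration of the spam-detection example, the sketch below turns one raw email into a few measurable features. The function name `extract_features`, the chosen feature names, and the sample message are all hypothetical; a real system would use many more features.

```python
def extract_features(raw_email: str) -> dict:
    """Turn one raw email into a small dictionary of measurable features."""
    # Headers and body are separated by the first blank line.
    header_text, _, body = raw_email.partition("\n\n")
    header_lines = header_text.splitlines()

    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:  # keep only well-formed "Name: value" lines
            headers[name.strip().lower()] = value.strip()

    words = body.split()
    return {
        # Presence or absence of certain headers
        "has_subject": "subject" in headers,
        "has_from": "from" in headers,
        # How many header lines are not well formed
        "malformed_header_lines": sum(1 for line in header_lines if ":" not in line),
        # Very crude stand-ins for properties of the text itself
        "body_word_count": len(words),
        "uppercase_word_ratio": (
            sum(w.isupper() for w in words) / len(words) if words else 0.0
        ),
    }


# Hypothetical sample message
sample = "From: a@example.com\nSubject: WIN NOW\n\nCLICK here to WIN a prize"
features = extract_features(sample)
```

Each email becomes one feature vector, which is the form of input the learning algorithms later in the deck expect.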
Methodology of Pattern Recognition
It consists of the following:
1. We observe patterns.
2. We study the relationships between the various patterns.
3. We study the relationships between patterns and ourselves, and thus arrive at situations.
4. We study the changes in situations and come to know about the events.
5. We study events and thus find the rules behind them.
6. Using the rules, we can predict future events.
An Example
Suppose that a fish packing plant wants to automate the process of sorting
incoming fish on a conveyor belt according to species.
There are two species:
Sea bass
Salmon
An Example
How do we distinguish one species from the other?
(length, width, weight, number and shape of fins, tail shape, etc.)
An Example
Suppose we also know that:
Sea bass are typically wider than salmon.
But a decision cannot always be made on a single feature.
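The point about a single feature can be sketched with made-up numbers. In the example below the measurements, the width threshold, and the width-to-length ratio are all hypothetical values chosen only for illustration; the data contain one narrow sea bass and one wide salmon, so width alone misclassifies them.

```python
# Hypothetical measurements: (width_cm, length_cm, species)
fish = [
    (14.0, 40.0, "sea bass"),
    (15.5, 45.0, "sea bass"),
    (11.0, 38.0, "sea bass"),  # an unusually narrow sea bass
    (10.0, 60.0, "salmon"),
    (12.0, 70.0, "salmon"),    # an unusually wide salmon
    (9.5, 55.0, "salmon"),
]


def by_width(width, length, threshold=11.5):
    """Decision rule using a single feature: width only."""
    return "sea bass" if width > threshold else "salmon"


def by_width_and_length(width, length, ratio=0.25):
    """Decision rule combining two features: width relative to length."""
    return "sea bass" if width / length > ratio else "salmon"


def accuracy(rule):
    """Fraction of the fish that a rule labels correctly."""
    return sum(rule(w, l) == species for w, l, species in fish) / len(fish)


width_only = accuracy(by_width)
two_features = accuracy(by_width_and_length)
```

On this toy data the width-only rule gets four of six fish right, while the two-feature rule separates all six.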
Examples of applications
Optical Character Recognition (OCR)
Handwritten: sorting letters by postal code, input device for PDAs.
Biometrics
Diagnostic systems
Military applications
Typical Example
Given:
9714 patient records, each describing a pregnancy and birth
Each patient record contains 215 features
Learn to predict:
Classes of future patients at high risk for Emergency Cesarean
Section
The Sub-Fields of Machine Learning
Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Supervised Learning
Supervised learning is the machine learning task of inferring
a function from labeled training data.
The training data consist of pairs, each made up of an input object
(typically a vector) and a desired output value (also called the
supervisory signal).
A supervised learning algorithm analyzes the training data
and produces an inferred function, which is called a classifier
(if the output is discrete) or a regression function (if the output
is continuous).
The inferred function should predict the correct output value
for any valid input object. This requires the learning algorithm
to generalize from the training data to unseen situations in a
"reasonable" way.
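To make "inferring a function from labeled training data" concrete, here is a minimal sketch that uses a nearest-centroid rule as the learning algorithm. That choice, the function name `train_nearest_centroid`, and the toy training pairs are assumptions made for illustration; the slides do not prescribe a particular algorithm here.

```python
import math
from collections import defaultdict


def train_nearest_centroid(training_pairs):
    """Infer a classifier (a function) from (input_vector, label) pairs."""
    grouped = defaultdict(list)
    for vector, label in training_pairs:
        grouped[label].append(vector)

    # One centroid per class: the per-coordinate mean of its training vectors.
    centroids = {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }

    def classify(vector):
        # Predict the class whose centroid is closest to the input.
        return min(centroids, key=lambda label: math.dist(vector, centroids[label]))

    return classify


# Hypothetical labeled training data: (input vector, desired output)
training_pairs = [
    ([1.0, 1.0], "A"), ([1.5, 2.0], "A"),
    ([5.0, 5.0], "B"), ([6.0, 5.5], "B"),
]
classifier = train_nearest_centroid(training_pairs)
prediction = classifier([5.5, 4.5])  # an input not seen during training
```

Because the output is a discrete label, the inferred function is a classifier; the call on an unseen vector is the generalization step described above.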
Accuracy
Example
A credit card company receives thousands of
applications for new cards. Each application
contains information about an applicant:
age
job
house
credit rating
etc.
Bayesian Classifier
The Simple Bayesian Classifier (SBC) uses probabilistic
methods for classification.
It is based on computing the probability of document
d being in class c.
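In the usual naive Bayes formulation, which is the one given in the Stanford IR book page listed in the references, this probability is taken as proportional to P(c) multiplied by the product of P(t|c) over the terms t in the document. The sketch below is one minimal way to compute it; the add-one smoothing, the log-space scoring, the function names, and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect the counts needed to estimate P(c) and P(t|c)."""
    class_doc_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labeled_docs:
        class_doc_counts[label] += 1
        term_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_doc_counts, term_counts, vocabulary


def log_score(tokens, label, model):
    """log P(c) + sum of log P(t|c): proportional to log P(c|d)."""
    class_doc_counts, term_counts, vocabulary = model
    score = math.log(class_doc_counts[label] / sum(class_doc_counts.values()))
    total_terms = sum(term_counts[label].values())
    for token in tokens:
        if token not in vocabulary:
            continue  # ignore terms never seen in training
        # Add-one (Laplace) smoothing avoids zero probabilities.
        p = (term_counts[label][token] + 1) / (total_terms + len(vocabulary))
        score += math.log(p)
    return score


def classify(tokens, model):
    """Pick the class with the highest score."""
    return max(model[0], key=lambda label: log_score(tokens, label, model))


# Hypothetical toy training documents
docs = [
    ("win money now".split(), "spam"),
    ("cheap money offer".split(), "spam"),
    ("meeting agenda today".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]
model = train_naive_bayes(docs)
label = classify("money offer now".split(), model)
```

Working with sums of logarithms instead of products of probabilities keeps the scores from underflowing on long documents.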
Unsupervised Learning
What is Unsupervised Learning?
Clustering is subjective
[Figure: the same set of people grouped four ways: Simpson's Family, School Employees, Females, Males]
K-means algorithm
Semi-Supervised learning
Supervised Learning versus Unsupervised Learning
Unsupervised clustering: group similar objects together
to find clusters.
Minimize intra-class distance
Maximize inter-class distance
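K-means, named earlier in the deck, is one common way to pursue the first of these goals: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a minimal version; the fixed iteration count, the caller-supplied initial centroids, and the toy points are assumptions made for illustration.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Alternate assignment and update steps for a fixed number of iterations."""
    centroids = [list(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical 2-D points forming two visible groups
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.3)]
assignments, centroids = k_means(points, initial_centroids=[points[0], points[3]])
```

No labels are used anywhere; the grouping comes only from distances between the points.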
Customer modeling
Images
Protein sequences
Medical outcomes
Web pages
Semi-Supervised Learning
Combines labeled and unlabeled data
during training to improve performance:
Semi-supervised classification: Training on labeled data exploits
additional unlabeled data, frequently resulting in a more accurate
classifier.
Semi-supervised clustering: Uses a small amount of labeled data to
aid and bias the clustering of unlabeled data.
[Figure: Unsupervised clustering, Semi-supervised learning, Supervised classification]
Semi-Supervised Classification
An initial classifier is designed using the labeled data set D(l).
This classifier is then used to assign class labels to the examples
in D(u). Then the classifier is re-trained using D(l) ∪ D(u).
The last two steps are usually repeated a given number of
times or until some stopping criterion is satisfied.
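The loop described above can be sketched as follows. The base classifier (a nearest-centroid rule), the stopping criterion (the assigned labels stop changing), the function names, and the toy data are assumptions chosen to keep the example short; any supervised learner could take the place of `fit_centroids`.

```python
import math
from collections import defaultdict


def fit_centroids(vectors, labels):
    """Base learner: one centroid per class."""
    grouped = defaultdict(list)
    for vector, label in zip(vectors, labels):
        grouped[label].append(vector)
    return {y: [sum(c) / len(vs) for c in zip(*vs)] for y, vs in grouped.items()}


def predict(centroids, vector):
    """Label of the nearest class centroid."""
    return min(centroids, key=lambda y: math.dist(vector, centroids[y]))


def self_train(labeled_x, labeled_y, unlabeled_x, max_rounds=10):
    # Step 1: initial classifier from the labeled set D(l) only.
    model = fit_centroids(labeled_x, labeled_y)
    pseudo_labels = None
    for _ in range(max_rounds):
        # Step 2: assign class labels to the examples in D(u).
        new_labels = [predict(model, x) for x in unlabeled_x]
        if new_labels == pseudo_labels:
            break  # stopping criterion: labels no longer change
        pseudo_labels = new_labels
        # Step 3: re-train on D(l) together with the newly labeled D(u).
        model = fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_labels)
    return model, pseudo_labels


# Hypothetical data: two labeled points and five unlabeled ones
labeled_x = [[0.0, 0.0], [4.0, 4.0]]
labeled_y = ["neg", "pos"]
unlabeled_x = [[0.5, 0.2], [1.0, 0.8], [3.5, 3.9], [3.0, 3.2], [2.2, 2.1]]
model, pseudo_labels = self_train(labeled_x, labeled_y, unlabeled_x)
```

This version trusts every assigned label. Variants that only keep the most confident predictions in each round are common, but that refinement is not part of the description above.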
Semi-Supervised Classification
Example
[Figure: scatter plot of example points]
Semi-Supervised Classification
Algorithms:
Semi-supervised EM [Ghahramani:NIPS94, Nigam:ML00].
Co-training [Blum:COLT98].
Transductive SVMs [Vapnik:98, Joachims:ICML99].
Graph-based algorithms
Assumptions:
Known, fixed set of categories given in the labeled
data.
Goal is to improve classification of examples into
these known categories.
Semi-Supervised Clustering
Input:
A set of unlabeled objects, each described by a set of attributes
(numeric and/or categorical)
A small amount of domain knowledge
Output:
A partitioning of the objects into k clusters (possibly with some
discarded as outliers)
Objective:
Maximum intra-cluster similarity
Minimum inter-cluster similarity
High consistency between the partitioning and the domain
knowledge
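One way to quantify "consistency between the partitioning and the domain knowledge" is to express the knowledge as must-link and cannot-link pairs, as in the illustrations in this deck, and count how many pairs a given partitioning violates. The function name, the document identifiers, and the constraint pairs below are hypothetical.

```python
def count_violations(assignment, must_link, cannot_link):
    """Count pairwise constraints that a partitioning fails to respect."""
    violations = 0
    for a, b in must_link:
        if assignment[a] != assignment[b]:  # should share a cluster but do not
            violations += 1
    for a, b in cannot_link:
        if assignment[a] == assignment[b]:  # should be separated but are not
            violations += 1
    return violations


# Hypothetical partitioning of four objects into two clusters
assignment = {"d1": 0, "d2": 0, "d3": 1, "d4": 1}
must_link = [("d1", "d2"), ("d2", "d3")]
cannot_link = [("d1", "d4"), ("d3", "d4")]

violations = count_violations(assignment, must_link, cannot_link)
consistency = 1 - violations / (len(must_link) + len(cannot_link))
```

A constrained clustering algorithm could use such a count either as a hard check when assigning a point or as a penalty term in its objective.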
Illustration: Must-link
[Figure: determine the label of point x under a must-link constraint]
Illustration: Cannot-link
[Figure: determine the label of a point under a cannot-link constraint]
Semi-Supervised Clustering
Depending on the domain knowledge given:
Users provide class labels (seeded points) a priori for
some of the documents.
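One possible way to use seeded points is to initialize each cluster centroid from the points that share a seed label and then run ordinary k-means. This is a sketch under that assumption, not necessarily the method the slides had in mind; the function names, the seed labels, and the toy points are hypothetical.

```python
import math
from collections import defaultdict


def mean(vectors):
    """Per-coordinate mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def seeded_k_means(points, seeds, iterations=10):
    """points: list of vectors; seeds: {point_index: class_label}."""
    # Initialization: one centroid per seed label, from the seeded points.
    grouped = defaultdict(list)
    for index, label in seeds.items():
        grouped[label].append(points[index])
    centroids = {label: mean(vectors) for label, vectors in grouped.items()}

    assignments = []
    for _ in range(iterations):
        # Assignment step: nearest centroid. Seeds only guide the start here
        # and may be reassigned like any other point.
        assignments = [
            min(centroids, key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: recompute each centroid from its current members.
        for label in centroids:
            members = [p for p, a in zip(points, assignments) if a == label]
            if members:
                centroids[label] = mean(members)
    return assignments


# Hypothetical documents as 2-D vectors; two of them carry user-given labels
points = [(0.0, 0.0), (0.5, 0.4), (1.0, 0.2), (6.0, 6.0), (6.5, 5.8), (5.6, 6.3)]
seeds = {0: "sports", 3: "politics"}
assignments = seeded_k_means(points, seeds)
```

Because the clusters start from labeled points, each resulting cluster inherits a meaningful class name instead of an arbitrary index.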
Co-Training Algorithm
Conclusions
Semi-supervised learning is an area of increasing
importance in machine learning.
References
Duda, Hart: Pattern Classification and Scene Analysis. J. Wiley & Sons, New York, 1982. (2nd edition 2000).
Fukunaga: Introduction to Statistical Pattern Recognition. Academic Press, 1990.
Sergios Theodoridis, Konstantinos Koutroumbas: Pattern Recognition. Elsevier (USA), 1982.
K. Nigam and R. Ghani. Analyzing the effectiveness and applicability of co-training. In Proceedings of the Ninth International Conference on Information and Knowledge Management, pages 86-93. ACM, 2000.
http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-textclassification-1.html
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html
Any questions, suggestions, or feedback?
Thank You