
What Is Prediction?

 (Numerical) prediction is similar to classification
  construct a model
  use the model to predict a continuous or ordered value for a given input
 Prediction is different from classification
  Classification predicts categorical class labels
  Prediction models continuous-valued functions
What Is Prediction?
 Major method for prediction: regression
  models the relationship between one or more independent (predictor) variables and a dependent (response) variable
 Regression analysis
 Linear and multiple regression
 Non-linear regression
 Other regression methods: generalized linear
model, Poisson regression, log-linear models,
regression trees
Simple Linear Regression Model
 Only one independent variable, x
 Relationship between x and y is
described by a linear function
 Changes in y are assumed to be caused
by changes in x
Simple Linear Regression Model
 Linear regression: involves a response
variable y and a single predictor variable x
 y = b + mx (a line)
 y = w0 + w1 x
  where w0 (y-intercept) and w1 (slope) are regression coefficients
What is “Linear”?
 Remember this:
 Y = mX + B?
  (Figure: a line with slope m and y-intercept B)
What’s Slope?
 A slope of 2 means that every 1-unit
change in X yields a 2-unit change in Y.

Linear Regression
 Method of least squares: estimates the
best-fitting straight line
 $w_1 = \dfrac{\sum_{i=1}^{|D|}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|}(x_i - \bar{x})^2} = \dfrac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$
 $w_0 = \bar{y} - w_1 \bar{x}$
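As a minimal sketch of how these formulas translate into code (Python with NumPy; the function name and data handling are illustrative, not part of the original slides):

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Estimate w0 (intercept) and w1 (slope) by the method of least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # w1 = sum((xi - x_mean)(yi - y_mean)) / sum((xi - x_mean)^2)
    w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # w0 = y_mean - w1 * x_mean
    w0 = y_mean - w1 * x_mean
    return w0, w1
```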
Example
 x is the number of years
of work experience of a
college graduate
 y is the corresponding
salary of the graduate
Example
 $w_1 = \dfrac{\sum_{i=1}^{|D|}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|}(x_i - \bar{x})^2}$
Example

 We can predict that the salary of a college graduate with 10 years of experience is $58,600.
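As a hedged illustration (the coefficients below are assumed values chosen only to be consistent with the prediction quoted above, with salary expressed in thousands of dollars):

```python
# Assumed fitted coefficients (illustrative; salary in $1000s).
w0, w1 = 23.6, 3.5
years_experience = 10
predicted_salary = w0 + w1 * years_experience   # y = w0 + w1 * x
print(predicted_salary)                          # 58.6, i.e., $58,600
```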
Application
 Consider the following dataset
SVM—Support Vector Machines
 A new classification method for both linear and
nonlinear data
 It uses a nonlinear mapping to transform the
original training data into a higher dimension
 With the new dimension, it searches for the linear
optimal separating hyperplane (i.e., “decision
boundary”)

SVM—Support Vector Machines
 With an appropriate nonlinear mapping to a
sufficiently high dimension, data from two classes
can always be separated by a hyperplane
 SVM finds this hyperplane using support vectors
(“essential” training tuples) and margins (defined
by the support vectors)
SVM—History and Applications
 Vapnik and colleagues (1992)—groundwork from
Vapnik & Chervonenkis’ statistical learning theory
in 1960s
 Used both for classification and prediction
 Applications:
 handwritten digit recognition, object
recognition, speaker identification,
benchmarking time-series prediction tests
Support Vector Machines

 Find a linear hyperplane (decision boundary) that will separate the data
Support Vector Machines
 One possible solution: hyperplane B1 (see figure)
Support Vector Machines

 Another possible solution: hyperplane B2 (see figure)
Support Vector Machines

 Other possible solutions (see figure)
Support Vector Machines
 Which one is better? B1 or B2? (see figure)
 How do you define better?
SVM (cont…)
 SVM works by mapping data to a high-
dimensional feature space so that data
points can be categorized, even when the
data are not otherwise linearly separable.
 A separator between the categories is
found, then the data are transformed in
such a way that the separator could be
drawn as a hyperplane.
SVM (cont…)
 How can you draw a straight line to separate the data points? (a sketch follows below)
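A minimal sketch of the idea (assuming scikit-learn and NumPy; the 1-D data below are made up for illustration): points of one class sit between points of the other class, so no single threshold separates them, but after mapping x to (x, x²) a straight line (a linear SVM) can.

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data that no single threshold can separate:
# class -1 sits in the middle, class +1 on both sides (illustrative values).
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])

# Explicit nonlinear mapping to two dimensions: x -> (x, x^2).
X_mapped = np.column_stack([x, x ** 2])

# In the mapped space a *linear* separator is enough.
clf = SVC(kernel='linear').fit(X_mapped, y)
print(clf.score(X_mapped, y))   # 1.0: perfectly separated after the mapping
```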
Parameters of SVM: Kernel, Regularization,
Gamma and Margin
 In machine learning, a “kernel” is usually used to
refer to the kernel trick, a method of using a linear
classifier to solve a non-linear problem
  It transforms the non-linear data points in such a way that they become separable.
 The kernel function is what is applied on each data
instance to map the original non-linear observations
into a higher-dimensional space in which they
become separable.
 Kernel types (illustrated in the sketch below)
  Linear
  Polynomial
  Radial basis function (RBF)
  Sigmoid
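As a sketch of how these kernel choices typically appear in practice (the parameter names are scikit-learn's, an assumption rather than something stated in the slides):

```python
from sklearn.svm import SVC

# The same classifier with different kernel functions.
linear_svm  = SVC(kernel='linear')
poly_svm    = SVC(kernel='poly', degree=3)      # polynomial kernel
rbf_svm     = SVC(kernel='rbf', gamma='scale')  # radial basis function (RBF)
sigmoid_svm = SVC(kernel='sigmoid')
```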
Regularization
 The Regularization parameter tells the
SVM optimization how much you want to
avoid misclassifying each training
example.
Gamma parameter
 The gamma parameter defines how far the
influence of a single training example
reaches, with low values meaning ‘far’ and
high values meaning ‘close’.
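A minimal sketch showing how the regularization parameter (called C in scikit-learn) and gamma are set together; the numeric values are illustrative only, not recommendations:

```python
from sklearn.svm import SVC

# Small C tolerates more misclassified training points (softer margin);
# large C penalizes misclassification heavily (tighter fit, risk of overfitting).
# Low gamma: a single example's influence reaches far; high gamma: it stays close.
soft_smooth_svm = SVC(kernel='rbf', C=0.1,   gamma=0.01)
hard_local_svm  = SVC(kernel='rbf', C=100.0, gamma=10.0)
```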
Margin

 The margin is the distance between the hyperplane on one side of the decision boundary and the hyperplane on the other side
 A good margin is one where this separation is large for both classes
  A good margin allows the points to stay in their respective classes without crossing into the other class
Margin (cont…)
 (Figure: hyperplanes B1 and B2, each with margin boundaries b11/b12 and b21/b22)
 Find the hyperplane that maximizes the margin => B1 is better than B2
Support Vector Machines

 Let the data D be (X1, y1), …, (X|D|, y|D|), where Xi is a training tuple with associated class label yi
 There are infinitely many lines (hyperplanes) separating the two classes, but we want to find the best one (the one that minimizes classification error on unseen data)
 SVM searches for the hyperplane with the largest margin, i.e., the maximum marginal hyperplane (MMH)
Margin (cont…)
 (Figure: hyperplanes B1 and B2 with their margins; the support vectors are the training tuples that lie on the margin boundaries)
Margin (cont…)
 Decision boundary B1: $\mathbf{w} \cdot \mathbf{x} + b = 0$
 Margin hyperplanes: $\mathbf{w} \cdot \mathbf{x} + b = 1$ and $\mathbf{w} \cdot \mathbf{x} + b = -1$
 Margin width: $\text{Margin} = \dfrac{2}{\|\mathbf{w}\|}$
 Classification rule: $f(\mathbf{x}) = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \le -1 \end{cases}$
Support Vector Machines
 A separating hyperplane can be written as
  W · X + b = 0
  where W = {w1, w2, …, wn} is a weight vector and b is a scalar (bias)
 For 2-D data it can be written as
  w0 + w1 x1 + w2 x2 = 0
 The hyperplanes defining the sides of the margin:
  H1: w0 + w1 x1 + w2 x2 ≥ 1 for yi = +1
  H2: w0 + w1 x1 + w2 x2 ≤ −1 for yi = −1
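As a hedged sketch of these quantities in code (assuming scikit-learn: for a linear-kernel SVC, coef_ holds W and intercept_ holds b; the toy data are made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable 2-D toy dataset (illustrative only).
X = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 0.5],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e3).fit(X, y)

W = clf.coef_[0]        # weight vector W = {w1, w2}
b = clf.intercept_[0]   # scalar bias b

margin_width = 2.0 / np.linalg.norm(W)   # margin = 2 / ||W||
predictions = np.sign(X @ W + b)         # side of the hyperplane W · X + b = 0
print(W, b, margin_width, predictions)
```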
References
 https://medium.com/machine-learning-101/chapter-2-svm-support-vector-machine-theory-f0812effc72
 https://www.ibm.com/support/knowledgecenter/de/SS3RA7_15.0.0/com.ibm.spss.modeler.help/svm_howwork.htm
 https://towardsdatascience.com/kernel-function-6f1d2be6091
Thank you
