Classification tasks
• Each class in turn is treated as positive, and the samples from all other classes as negative.
• To classify a sample, n such binary classifiers are used, and the class label with the highest certainty is assigned to the sample (a minimal sketch is given below).
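A minimal sketch of this one-vs-rest scheme, assuming scikit-learn's LogisticRegression as the underlying binary classifier; the helper names ovr_fit and ovr_predict are illustrative, not from the lecture:

import numpy as np
from sklearn.linear_model import LogisticRegression

def ovr_fit(X, y):
    # train one binary classifier per class: class c positive, the rest negative
    classifiers = {}
    for c in np.unique(y):
        clf = LogisticRegression()
        clf.fit(X, (y == c).astype(int))
        classifiers[c] = clf
    return classifiers

def ovr_predict(classifiers, X):
    # assign the label whose classifier is most certain (highest P(positive))
    classes = np.array(list(classifiers))
    scores = np.column_stack([classifiers[c].predict_proba(X)[:, 1] for c in classes])
    return classes[np.argmax(scores, axis=1)]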
Clustering
KNN algorithm:
1. Choose a value for the parameter k and a distance metric.
2. Find the k nearest neighbors of the sample you want to classify.
3. Assign the class label by majority voting (a sketch of these steps follows below).
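A minimal sketch of the three steps above, assuming the Euclidean metric; knn_predict is an illustrative name:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # step 1: k and the (Euclidean) metric are fixed by the arguments
    dists = np.linalg.norm(X_train - x, axis=1)   # distances to every training sample
    nearest = np.argsort(dists)[:k]               # step 2: indices of the k nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]  # step 3: majority vote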
KNN algorithm
• a classification algorithm,
• finds the k nearest data points to a query point X and uses them to determine which class X belongs to,
• classifying a point does not change the existing class assignments,
• requires only computing distances and picking the k nearest points (no training iterations).
K-means algorithm
• a clustering algorithm,
• uses the distances of the data points to k centroids,
• centroids are not necessarily data points,
• updates the centroids after each iteration,
• must iterate over the data until the centroids stabilize (a sketch follows below).
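A minimal K-means sketch under these assumptions: random data points as initial centroids, Euclidean distance, and no handling of empty clusters; kmeans is an illustrative name:

import numpy as np

def kmeans(X, k, n_iter=100, seed=1):
    rng = np.random.RandomState(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # start from k data points
    for _ in range(n_iter):
        # assign every sample to its nearest centroid
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None, :], axis=2), axis=1)
        # update each centroid to its cluster mean (need not be a data point)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # stop once the centroids stabilize
            break
        centroids = new_centroids
    return centroids, labels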
• If the desired output value y^i for the i-th sample is a class label (i.e. −1 or 1) and the neuron responds with the value ŷ^i, then the response difference y^i − ŷ^i can be calculated.
• The weight values should be corrected by a factor proportional to the error:

Δw_j = η · (y^i − ŷ^i) · x_j^i

where η is the learning rate.
• The new value of the weight w_j after the correction for sample i is:

w_j ⇐ w_j + Δw_j

A sketch of one training pass follows below.
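A minimal sketch of one pass of this rule, assuming class labels in {−1, 1} and a threshold at zero; perceptron_epoch is an illustrative name:

import numpy as np

def perceptron_epoch(X, y, w, eta=0.1):
    # apply Δw_j = η · (y^i − ŷ^i) · x_j^i once for every training sample
    for xi, yi in zip(X, y):
        y_hat = 1 if np.dot(w, xi) >= 0.0 else -1  # neuron response ŷ^i
        w = w + eta * (yi - y_hat) * xi            # correct weights by the error
    return w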
Multi-class classification
Regularization
An L2 penalty term, scaled by the parameter λ, is added to the cost function:

λ/2 · ||w||² = λ/2 · Σ_j w_j²    (1)
The distance between the positive and negative hyperplanes, projected onto the normal vector w, is maximized:

w^T · (x_pos − x_neg) / ||w||  ⇒ max

which is achieved by maximizing 2/||w||. Instead of maximizing 2/||w||, one can minimize ||w|| or its square ||w||², and “flexibilise” the hyperplane equations by introducing additional slack variables ζ_i:

w^T · x^i ≥ 1 − ζ_i    when y_i = 1
w^T · x^i ≤ −1 + ζ_i   when y_i = −1

The RBF (Gaussian) kernel:

k(x^i, x^j) = exp(−||x^i − x^j||² / (2σ²)) = exp(−γ · ||x^i − x^j||²)

Kernel functions for small and large values of γ = 1/(2σ²) (figure omitted); a kernel sketch follows below.
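A minimal sketch of the RBF kernel and of plugging it into a classifier; the gamma and C values are illustrative, and scikit-learn's SVC is assumed (the lecture does not name a library):

import numpy as np
from sklearn.svm import SVC

def rbf_kernel(xi, xj, gamma=1.0):
    # k(x^i, x^j) = exp(−γ · ||x^i − x^j||²), with γ = 1/(2σ²)
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

svm = SVC(kernel='rbf', gamma=0.1, C=1.0)  # larger gamma -> narrower, more local kernel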
Logit function

logit(p(y = 1|x)) = w_0·x_0 + w_1·x_1 + … + w_n·x_n = w^T · x

• z is the weighted sum z = w_0·x_0 + w_1·x_1 + … + w_n·x_n = w^T · x (see the sketch below).
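A minimal sketch of the logit and its inverse, the sigmoid φ:

import numpy as np

def logit(p):
    # log-odds of p, defined for p in (0, 1)
    return np.log(p / (1.0 - p))

def sigmoid(z):
    # φ(z) = 1/(1 + e^(−z)) maps the weighted sum z = w^T · x into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))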
• In the perceptron, the cost function was the sum of squared errors:

Σ_i (φ(z^i) − y^i)²

Learning the model involves minimizing the cost function J(w).
• The derivative of the activation function φ with respect to z is:

∂φ(z)/∂z = ∂/∂z [1/(1 + e^(−z))] = e^(−z)/(1 + e^(−z))² = φ(z)·(1 − φ(z))

• For the weight w_j, the gradient of the cost function is:

∂J(w)/∂w_j = −[y · 1/φ(z) − (1 − y) · 1/(1 − φ(z))] · ∂φ(z)/∂w_j = … = −(y − φ(z)) · x_j

• Taking a gradient descent step w_j ⇐ w_j − η · ∂J/∂w_j over all samples i, the weight w_j after the correction is:

w_j ⇐ w_j + η · Σ_i (y^i − φ(z^i)) · x_j^i = w_j + Δw_j
import numpy as np

class LogisticRegressionGD(object):
    def __init__(self, eta=0.05, n_iter=100, random_state=1):
        self.eta = eta                    # learning rate
        self.n_iter = n_iter              # number of passes over the training set
        self.random_state = random_state  # seed for weight initialization

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        # small random initial weights; w_[0] is the bias
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.cost_ = []
        for i in range(self.n_iter):
            net_input = self.net_input(X)
            output = self.activation(net_input)
            errors = (y - output)
            # gradient step: w_j <= w_j + eta * sum_i (y^i - phi(z^i)) * x_j^i
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            # logistic cost J(w), recorded to monitor convergence
            cost = (-y.dot(np.log(output)) - ((1 - y).dot(np.log(1 - output))))
            self.cost_.append(cost)
        return self

    def net_input(self, X):
        # weighted sum z = w^T x + bias
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, z):
        # logistic sigmoid, clipped for numerical stability (missing from the slide)
        return 1.0 / (1.0 + np.exp(-np.clip(z, -250, 250)))

    def predict(self, X):
        # class label 1 if z >= 0, else 0
        return np.where(self.net_input(X) >= 0.0, 1, 0)
J(w) = C · Σ_i [−y^i · log(φ(z^i)) − (1 − y^i) · log(1 − φ(z^i))] + 1/2 · ||w||²
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(C=1000.0, random_state=1)
lr.fit(X_train_std, y_train)
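In scikit-learn, C is the inverse of the regularization strength, so C = 1000.0 means weak regularization. A hedged usage sketch; X_test_std is an assumed standardized test set, not defined in the slides:

y_pred = lr.predict(X_test_std)        # predicted class labels
proba = lr.predict_proba(X_test_std)   # class membership probabilities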