

Machine Learning
Classification

Rijo Jackson Tom, 9500 191 494, rijo.j@cmrit.ac.in


Classification
Supervised Learning - K Nearest Neighbours, Decision Tree
05/03/2020
Classification
• Set of feature values, numerical or
categorical
• Labels are categorical
• A model is formed mapping features to labels
• Example



 Features: age, gender, income, profession
 Label: buyer, non-buyer
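
The feature/label setup above can be sketched in Python. The customer records and the gender encoding here are hypothetical, invented for illustration: numerical features pass through unchanged, while categorical values are mapped to numeric codes.

```python
# Hypothetical toy dataset: each row is one customer record.
customers = [
    {"age": 25, "gender": "F", "income": 40000, "label": "buyer"},
    {"age": 52, "gender": "M", "income": 28000, "label": "non-buyer"},
]

def encode(row, gender_codes={"F": 0, "M": 1}):
    """Map one customer record to a numerical feature vector."""
    return [row["age"], gender_codes[row["gender"]], row["income"]]

X = [encode(r) for r in customers]   # feature matrix
y = [r["label"] for r in customers]  # labels

print(X)  # [[25, 0, 40000], [52, 1, 28000]]
print(y)  # ['buyer', 'non-buyer']
```

A classifier is then trained on pairs from X and y so it can predict a label for an unseen feature vector.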

Other applications
• Medical Diagnosis
 Feature values: age, gender, history, symptom1-severity, symptom2-severity, test-result1, test-result2
 Label: disease
• Email spam detection
 Feature values: sender-domain, length, images, keyword1, keyword2, …, keywordN
 Label: spam or not-spam



• Credit card fraud detection
 Feature values: user, location, item, price
 Label: fraud or okay

Other applications
• Email spam classification
• Predicting bank customers' willingness to repay loans
• Identifying cancerous tumor cells
• Sentiment analysis
• Drug classification
• Facial key-point detection



• Pedestrian detection in autonomous driving.

K-Nearest Neighbour (K-NN)
• K-Nearest Neighbor (KNN from now on) is one of those algorithms that
are very simple to understand but work incredibly well in practice.

• Applications range from computer vision to proteins to
computational geometry to graphs.

• Most people learn the algorithm and do not use it much, which is a
pity, as a clever use of KNN can make things very simple.



• It also might surprise many to know that KNN is one of the top 10
data mining algorithms.
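The KNN rule described above can be sketched from scratch in a few lines: compute the distance from a query point to every training point, take the k closest, and vote. The 2-D toy points below are assumptions, chosen only for illustration.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train)
    )
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Toy 2-D dataset: two well-separated clusters.
X_train = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y_train = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(X_train, y_train, (2, 2)))  # -> a
print(knn_predict(X_train, y_train, (8, 7)))  # -> b
```

Note that KNN has no real training phase: the "model" is the stored training set, and all work happens at prediction time.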
K-Nearest Neighbour (KNN)

kNN
• Choosing the optimal value for K is best done by first
inspecting the data.
• In general, a larger K value reduces the overall noise and so tends
to be more precise, but there is no guarantee.
• Historically, the optimal K for most datasets has been
between 3 and 10.
• One major drawback in calculating distance measures directly from
the training set arises when variables have different measurement
scales or there is a mixture of numerical and categorical variables.
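
A common remedy for the scale problem above is to rescale numerical features before computing distances, so no single feature dominates. A minimal min-max scaling sketch (the age/income values are assumed toy data):

```python
def min_max_scale(X):
    """Rescale each feature column to [0, 1]."""
    cols = list(zip(*X))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in X
    ]

# Age (years) vs income: raw Euclidean distance is dominated by income.
X = [[25, 40000], [52, 28000], [31, 90000]]
X_scaled = min_max_scale(X)
print(X_scaled[0])  # both features now lie in [0, 1]
```

After scaling, a one-unit difference in age and a one-unit difference in income contribute comparably to the distance.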

KNN examples

Summary

Python
implementation
Text reading:
https://saravananthirumuruganathan.wordpress.com/2010/05/17/a-detailed-introduction-to-k-nearest-neighbor-knn-algorithm/
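
As a starting point for a Python implementation, here is a short scikit-learn sketch. The bundled iris dataset and the chosen hyperparameters (k = 5, a 30% test split) are illustrative assumptions, not part of the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit a 5-nearest-neighbour classifier and evaluate it.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```

Swapping `KNeighborsClassifier` for `DecisionTreeClassifier` (from `sklearn.tree`) gives the decision-tree variant mentioned in the lecture title with the same fit/score interface.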
