
chapter 3

unsupervised learning

1) Types of machine learning?
2) Machine learning techniques?
3) Unsupervised learning techniques?
4) K-means technique?

1) Types of machine learning?


Supervised learning (SL): the type of machine learning in which machines are trained using well-labelled (input, output) training data. On the basis of that data, the machine learns a model (a "mapping function" from inputs to outputs) and uses it to predict outputs.
- Labelled data means that the input data is already tagged with the correct output.
- Supervised learning is also called "labelled learning".
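To make "learning a mapping function from labelled data" concrete, here is a minimal sketch that assumes the mapping is a straight line y = a*x + b fitted by ordinary least squares. The notes do not fix a model type; `fit_line` and the toy data are hypothetical.

```python
def fit_line(pairs):
    """Learn y = a*x + b from labelled (x, y) pairs by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    a = cov / var
    b = mean_y - a * mean_x
    # The learned mapping function: input -> predicted output.
    return lambda x: a * x + b


# Toy labelled training data: each input is tagged with its correct output.
training_data = [(1, 3), (2, 5), (3, 7), (4, 9)]  # generated by y = 2x + 1
model = fit_line(training_data)
print(model(10))  # prediction for an input the model has not seen
```

Here the prediction for x = 10 is 21.0, because the fitted line recovers y = 2x + 1 exactly from the four labelled examples.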

Unsupervised learning: the type of machine learning in which models are trained using an unlabelled dataset and are allowed to act on that data without any supervision.
- In unsupervised learning, models are not supervised using a labelled training dataset. Instead, the model itself finds hidden patterns and insights in the given data.

Applications of unsupervised learning:
- Grouping the customers of a company (customer segmentation).

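Reinforcement learning: the machine learns by interacting with its environment and reacting to the feedback (rewards or penalties) it receives for its actions.
- Example: recommending videos on YouTube.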
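A minimal sketch of "learning by reacting", using the simplest reinforcement-learning setting, an epsilon-greedy multi-armed bandit. It assumes the recommender repeatedly shows one of three hypothetical videos, observes a click (reward 1) or no click (reward 0), and updates its estimate of each video's value. The click probabilities, `epsilon`, and `epsilon_greedy` are illustrative assumptions, not how YouTube actually works.

```python
import random


def epsilon_greedy(click_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which option earns the most reward by acting and reacting to feedback."""
    rng = random.Random(seed)
    n = len(click_probs)
    counts = [0] * n        # how often each video has been recommended
    estimates = [0.0] * n   # running average reward (click rate) per video
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                            # explore: random video
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit: best so far
        # Simulated user reacts: reward 1.0 for a click, 0.0 otherwise.
        reward = 1.0 if rng.random() < click_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the average reward for the chosen video.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical click probabilities for three videos (unknown to the learner).
estimates, counts = epsilon_greedy([0.1, 0.3, 0.7])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("best video:", best, "estimated click rates:", [round(e, 2) for e in estimates])
```

The small `epsilon` keeps the learner occasionally trying other videos, so it can discover that a video it rarely shows is actually the best one.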

2) Machine learning techniques?


3) Unsupervised learning techniques?
Clustering:
- Inter-cluster distance: the distance between two different clusters; a good clustering maximizes it.
- Intra-cluster distance: the distance between objects in the same cluster; a good clustering minimizes it.
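A small sketch of both quantities, assuming Euclidean distance, intra-cluster distance measured as the average distance between pairs of points in one cluster, and inter-cluster distance measured between cluster centroids. Other definitions (such as minimum or maximum pairwise distance) are also used; the two toy clusters are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean of each coordinate of the points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def intra_cluster_distance(cluster):
    """Average distance between every pair of points in the same cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def inter_cluster_distance(cluster_a, cluster_b):
    """Distance between the centroids of two different clusters."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))


a = [(0, 0), (0, 2)]
b = [(10, 0), (10, 2)]
print(intra_cluster_distance(a))     # small value: a compact cluster
print(inter_cluster_distance(a, b))  # large value: well-separated clusters
```

For these clusters the intra-cluster distance is 2.0 and the inter-cluster distance is 10.0, which is the pattern a good clustering aims for.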

Types of clustering:-
1) Partitioning algorithms
2) Hierarchical algorithms
3) Density-based algorithms

1) Partitioning algorithms:-
The data set is divided into groups (clusters) that do not overlap: each object belongs to exactly one cluster.
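PAM (k-Medoids), one of the partitioning techniques in these notes, represents each cluster by one of its own data points (a medoid) instead of a mean. The sketch below shows a simplified PAM swap phase, assuming Euclidean distance and, for brevity, the first k points as initial medoids instead of PAM's usual BUILD step; `pam` and the data are hypothetical.

```python
import math


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(points, k):
    medoids = list(points[:k])  # simplified initialisation (assumption)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        # Try swapping every medoid with every non-medoid point.
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:  # keep the swap only if it lowers the total cost
                    medoids, best = trial, cost
                    improved = True
    # Each point joins the cluster of its nearest medoid, so clusters never overlap.
    labels = [min(range(k), key=lambda j: math.dist(p, medoids[j])) for p in points]
    return medoids, labels


data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
medoids, labels = pam(data, k=2)
print(medoids, labels)
```

Because a medoid is always a real data point, this approach is less sensitive to outliers than using the mean, at the price of trying many swaps.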

techniques:-
"k-Means algorithm"
"PAM (k-Medoids algorithm)"
2) Hierarchical algorithms:-
The clusters are nested: smaller clusters are contained in larger ones, forming a tree of clusters (a dendrogram).
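AGNES, one of the hierarchical techniques in these notes, builds the tree bottom-up by repeatedly merging the two closest clusters. A minimal sketch, assuming single linkage (distance between the closest pair of points) and stopping once k clusters remain; a full AGNES run continues until one cluster is left. `agnes` and the data are hypothetical.

```python
import math


def single_link(c1, c2):
    """Distance between the closest pair of points from two clusters."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def agnes(points, k):
    clusters = [[p] for p in points]  # start: every point is its own cluster
    merges = []                       # history of merges (the dendrogram levels)
    while len(clusters) > k:
        # Find the pair of clusters that are closest to each other.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: single_link(clusters[ab[0]], clusters[ab[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        # Merge cluster j into cluster i; both sub-clusters live on inside the new one.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9)]
clusters, merges = agnes(data, k=2)
print(clusters)
```

The `merges` list records which clusters were joined at each level, which is exactly the nesting that a dendrogram draws.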

techniques:-
"DIANA" (DIvisive ANAlysis: top-down, splits clusters)
"AGNES" (AGglomerative NESting: bottom-up, merges clusters)
"ROCK"
3) Density-based algorithms:-
These depend on the density of points in a certain region: when a large number of points are packed together in an area, they form a cluster.
techniques:- "DBSCAN"
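A compact sketch of DBSCAN, assuming Euclidean distance. `eps` (neighbourhood radius) and `min_pts` (minimum number of points in a neighbourhood, counting the point itself, for it to be dense) are DBSCAN's two usual parameters, but the values and data here are hypothetical. Points that cannot be reached from any dense region are labelled -1 (noise).

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None = not visited yet, -1 = noise
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1  # not dense enough: noise for now
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep growing
        cluster_id += 1
    return labels


data = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (20, 20)]
print(dbscan(data, eps=1.5, min_pts=3))
```

Unlike k-means, the number of clusters is not given in advance; it comes out of the density of the data, and the isolated point (20, 20) is left as noise.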

4) K-means technique?

1) Choose the number of clusters (k).
2) Select k random initial centroids.
3) For each point in the data set, calculate the distance between the point and every centroid; the point joins the cluster of the closest centroid.
4) Compute the new centroids of the formed clusters by taking the mean of each cluster.
5) Repeat steps 3 and 4 until the new centroids equal the old centroids (the centroids stop changing).
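The five k-means steps can be sketched in plain Python as follows, assuming Euclidean distance, numeric points given as tuples, initial centroids drawn from the data points, and a fixed random seed so runs are repeatable. `kmeans` and the toy data are hypothetical.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 2: pick k distinct data points at random as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 3: each point joins the cluster of its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Step 4: new centroid = mean of each cluster (keep the old one if a cluster is empty).
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        # Step 5: stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(data, k=2)
print(centroids)
print(clusters)
```

Because the result depends on the random initial centroids, k-means is often run several times with different seeds, keeping the run with the lowest SSE.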
Example:- covered in the second lecture.

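To evaluate k-means:-
Calculate the sum of squared errors (SSE): in each cluster, compute the squared distance between each point and its centroid, then sum these values over all points in all clusters.
The lower the SSE, the better the clustering; the higher the SSE, the worse the clustering.
To reduce SSE (or SAE, the sum of absolute errors), increase the number of clusters k. SSE falls to 0 when k equals the number of points, so a lower SSE obtained only by raising k does not by itself mean a better clustering.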
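A sketch of the SSE computation, assuming squared Euclidean distance: SSE = sum over every cluster C_i and every point x in C_i of dist(c_i, x)^2, where c_i is the centroid of C_i. The points and groupings below are hypothetical; the three calls show SSE shrinking as k grows, down to 0 when every point is its own cluster.

```python
def sse(clusters):
    """Sum of squared distances from each point to its own cluster's centroid."""
    total = 0.0
    for cluster in clusters:
        centroid = tuple(sum(c) / len(cluster) for c in zip(*cluster))
        # Squared Euclidean distance of each point to the centroid, summed.
        total += sum(
            sum((xi - ci) ** 2 for xi, ci in zip(p, centroid)) for p in cluster
        )
    return total


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(sse([points]))                 # k = 1: one big cluster
print(sse([points[:2], points[2:]])) # k = 2: the natural grouping
print(sse([[p] for p in points]))    # k = 4: every point alone
```

The printed values are 104.0, 4.0 and 0.0: the jump from k = 1 to k = 2 reflects a real improvement, while the further drop to 0 at k = 4 only reflects that each point became its own centroid.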

Complexity of k-means:
T = O(n * m * k * I)
n = number of objects (records)
m = number of attributes
k = number of clusters
I = number of iterations
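One way to see where O(n * m * k * I) comes from: in every iteration, each of the n objects is compared with each of the k centroids, and each distance touches all m attributes. The hypothetical counter below tallies those per-attribute operations for the assignment step over I iterations; it is an illustration of the formula, not a benchmark.

```python
def count_assignment_ops(n, m, k, iterations):
    """Count per-attribute operations done by the k-means assignment step."""
    ops = 0
    for _ in range(iterations):        # I iterations
        for _obj in range(n):          # n objects
            for _centroid in range(k): # k distance computations per object
                for _attr in range(m): # each distance touches m attributes
                    ops += 1
    return ops


print(count_assignment_ops(n=100, m=4, k=3, iterations=10))  # 100 * 4 * 3 * 10 = 12000
```

Doubling any one of n, m, k or I doubles the count, which is why k-means scales linearly in the number of records.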

Advantages of k-means:-
k-Means is simple and can be used for a wide variety of object types.
It is also efficient in terms of both storage requirements and execution time. By saving
Disadvantages of k-means:-
- k-Means is not suitable for all types of data. For example, k-Means does not work on categorical data because the mean cannot be defined for it.
-
Task: how to download a data set from Kaggle?
https://www.kaggle.com/uciml/indian-liver-patient-records
