
Unsupervised Learning
TC2034 – Modelación del Aprendizaje con Inteligencia Artificial

https://miro.medium.com/max/1920/1*VACikYaZIHb2OctTRohn8A.jpeg
Unsupervised Learning
Definition
• Uses information that is neither classified nor labelled
• There are two categories of algorithms:
  • Clustering: discover the inherent grouping in the data; the algorithm (learner) has to discover patterns without guidance
  • Dimension reduction (association): discover rules that describe large portions of the data
• Usually groups information by:
  • Similarities
  • Patterns
  • Differences

Unsupervised Learning…

Applications
• News sections: categorize articles
• Computer vision: object recognition
• Medical imaging: image detection, classification, segmentation
• Anomaly detection: discover atypical points
• Customer persona profiles
• Recommendation engines

Challenges
• Computational complexity – high volume of data
• Longer training times
• High risk of inaccurate results
• Need for human intervention to validate results
• Lack of transparency into exactly how the data was organized

Clustering

https://www.researchgate.net/profile/Adam-Warner-3/publication/333333035/figure/fig2/AS:761900535119875@1558662651725/Clustered-gene-expression-visualized-with-t-SNE-Gene-expression-values-TPM-were.png

Clustering

Definition
• Grouping of unlabeled instances
• Useful to understand a dataset
• Needs a similarity measure to group similar instances
• Groups are characterized by their features
• The more features, the more difficult grouping is

Applications
• Market segmentation
• Social network analysis
• Medical imaging
• Image segmentation
• Anomaly detection

Types of clustering

Centroid-based Density-based

https://developers.google.com/machine-learning/clustering/
Types of clustering…

Distribution-based Hierarchical

https://developers.google.com/machine-learning/clustering/
Properties of clusters

• All data points in a cluster must be similar to each other
• Data points from different clusters should be as different as possible

https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/
Evaluation metrics for clustering

Intracluster distance (inertia)
• Calculate the distance from all points to the center of the cluster
• Tries to create very compact clusters
• The smaller the inertia, the better the cluster

Intercluster distance (Dunn index)
• Measures the distance between two clusters

$\text{Dunn index} = \dfrac{\min(\text{intercluster distance})}{\max(\text{inertia})}$

• The bigger the Dunn index, the more separated the clusters are
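As a rough illustration (not from the original slides), here is a minimal NumPy sketch of both metrics; the array names `X`, `labels`, and `centroids` are assumptions, and taking centroid-to-centroid distances for the intercluster term is just one of several possible definitions:

```python
import numpy as np

def inertia(X, labels, centroids):
    # Sum of squared distances from each point to its assigned centroid
    return sum(np.sum((X[labels == c] - centroids[c]) ** 2)
               for c in range(len(centroids)))

def dunn_index(X, labels, centroids):
    k = len(centroids)
    # Intercluster term: smallest centroid-to-centroid distance
    inter = min(np.linalg.norm(centroids[i] - centroids[j])
                for i in range(k) for j in range(i + 1, k))
    # Intracluster term: largest per-cluster inertia
    intra = max(np.sum((X[labels == c] - centroids[c]) ** 2)
                for c in range(k))
    return inter / intra
```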

Similarity
Measures

https://d1e4pidl3fu268.cloudfront.net/968ffd16-ad6b-47c7-a738-707352d3db33/Similarity.crop_709x532_0,93.preview.jpg

Similarity

• Measure of how much two objects are alike

• Distance with dimensions representing features of objects

• Highly dependent on domain and application


• For instance, use of shape vs. use of size

• Large distance means low similarity

• Usually, a number between 0 and 1

Euclidean Distance

• Shortest distance between two points

• Usually applied in 2D space, but generalizes to N-dimensional spaces

• Other names:
• Euclidean norm
• Euclidean metric
• L2 norm
• L2 metric
• Pythagorean metric

https://miro.medium.com/v2/resize:fit:1400/format:webp/1*j_h0BwDKraVoD6XbUUyn3g.png
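As a quick sketch (the slides do not prescribe a library), the Euclidean distance can be computed directly with NumPy:

```python
import numpy as np

p1 = np.array([1.0, 2.0, 3.0])
p2 = np.array([4.0, 6.0, 3.0])

# Euclidean (L2) distance: square root of the sum of squared differences
d = np.sqrt(np.sum((p1 - p2) ** 2))   # equivalent to np.linalg.norm(p1 - p2)
print(d)                              # 5.0
```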

Manhattan Distance

• Calculates the distance between two points along a grid-like path

• Also known as:
  • Taxicab geometry
  • City block distance
  • L1 norm

https://miro.medium.com/v2/resize:fit:1400/format:webp/1*XsJbklzmGXbx6tu7aX2Vug.png
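A minimal NumPy sketch of the same idea, summing the absolute coordinate differences:

```python
import numpy as np

p1 = np.array([1.0, 2.0])
p2 = np.array([4.0, 6.0])

# Manhattan (L1) distance: sum of absolute coordinate differences
d = np.sum(np.abs(p1 - p2))
print(d)  # 7.0
```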

Minkowski Distance

• Generalization of the Manhattan and Euclidean distances

• p is the order of the norm

• When p = 1, it calculates the Manhattan distance

• When p = 2, it calculates the Euclidean distance

https://cdn-images-1.medium.com/max/1067/1*Fb22fnJRABGUANpjcjwEow.png
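SciPy ships a Minkowski implementation; a short sketch showing how p = 1 and p = 2 recover the two special cases (the toy points are made up):

```python
from scipy.spatial.distance import minkowski

p1 = [1.0, 2.0]
p2 = [4.0, 6.0]

print(minkowski(p1, p2, p=1))  # 7.0 -> Manhattan distance
print(minkowski(p1, p2, p=2))  # 5.0 -> Euclidean distance
```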

Chebyshev Distance

• Maximum absolute difference in any one dimension between two N-dimensional points

https://miro.medium.com/v2/resize:fit:1400/format:webp/1*Ys2Wga9fvZAnlPUir2HRIA.png
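A minimal NumPy sketch, taking the largest per-coordinate difference:

```python
import numpy as np

p1 = np.array([1.0, 2.0, 3.0])
p2 = np.array([4.0, 6.0, 3.0])

# Chebyshev (L-infinity) distance: largest coordinate difference
d = np.max(np.abs(p1 - p2))
print(d)  # 4.0
```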

Hamming Distance

• Measures the difference between two strings of the same length

• The number of positions at which the corresponding characters of the two strings are different

https://miro.medium.com/v2/resize:fit:1286/format:webp/1*Q4rdY5KN91aZjvGXl7SRug.png
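A minimal pure-Python sketch (the function name `hamming` is just illustrative):

```python
def hamming(s1: str, s2: str) -> int:
    # Count positions where the corresponding characters differ
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

print(hamming("karolin", "kathrin"))  # 3
```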

Levenshtein Distance

• Number of operations needed to convert one string into another

• Operations to count:
  • Inserting a character
  • Deleting a character
  • Substituting a character

https://miro.medium.com/v2/resize:fit:1170/format:webp/1*Et8w4ttqh37UqaARPu6fqA.png
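A minimal sketch using the classic Wagner–Fischer dynamic-programming formulation:

```python
def levenshtein(s1: str, s2: str) -> int:
    # prev[j] holds the edit distance between s1[:i-1] and s2[:j]
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```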

Cosine Distance

• Measures the angle between two vectors (objects)

• Judges orientation rather than magnitude

• Same orientation = cosine similarity of 1

• Perpendicular orientation = cosine similarity of 0

• Opposed orientation = cosine similarity of -1

https://miro.medium.com/v2/resize:fit:1400/format:webp/1*RTaoSa-_04l5HpQFxP116Q.png
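A minimal NumPy sketch showing the three cases listed above:

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (|u| * |v|)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
print(cosine_similarity(u, u))   #  1.0 (same orientation)
print(cosine_similarity(u, v))   #  0.0 (perpendicular)
print(cosine_similarity(u, -u))  # -1.0 (opposed)
# Cosine distance is commonly defined as 1 - cosine similarity
```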

Cosine Distance…
Applied to text
• Three steps:
  1. Obtain all unique words in the original sentences
  2. Create a vector for each sentence
  3. Calculate the cosine distance between all pairs of sentences
• Check the notebook "Cosine Similarity Textos mios.ipynb"

Notice that each item can be a sentence or a complete document.
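The referenced notebook is not reproduced here; the following is a minimal sketch of the three steps, with made-up example sentences:

```python
import math
from collections import Counter

def sentence_vector(sentence, vocabulary):
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

s1 = "the cat sat on the mat"
s2 = "the dog sat on the log"

# Step 1: obtain all unique words in the original sentences
vocabulary = sorted(set(s1.lower().split()) | set(s2.lower().split()))
# Step 2: create a vector for each sentence
v1, v2 = sentence_vector(s1, vocabulary), sentence_vector(s2, vocabulary)
# Step 3: calculate the cosine similarity (or distance) between them
print(cosine_similarity(v1, v2))
```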

K-means

https://upload.wikimedia.org/wikipedia/commons/thumb/e/e5/KMeans-Gaussian-data.svg/434px-KMeans-Gaussian-data.svg.png
K-means
Definition
• Clustering method that tries to minimize the distance between points in a cluster (inertia)
• Centroid-based algorithm
• Distance-based algorithm
• Each cluster is associated with a centroid
• Assigns each point to a cluster based on its distance to the cluster's centroid

K-means

Algorithm
1. Select the number of clusters (K)
2. Select K random points from the dataset
   • These will be used as the centroids of our clusters
3. Assign the rest of the points to the closest cluster
4. Calculate new centroids for the current clusters
5. Repeat steps 3 and 4 until a termination criterion is met

Distance from a point to another point
• The Euclidean distance is used:

$P_1 = (x_1, x_2, \ldots, x_n), \qquad P_2 = (y_1, y_2, \ldots, y_n)$

$D(P_1, P_2) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

• In the case of two points in 2D, use:

$P_1 = (x_1, x_2), \qquad P_2 = (y_1, y_2)$

$D(P_1, P_2) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$
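A minimal NumPy sketch of the five steps above; the 100-iteration cap and the helper name `kmeans` are assumptions, and empty clusters are not handled:

```python
import numpy as np

def kmeans(X, k, max_iters=100):
    rng = np.random.default_rng()
    # Step 2: pick K distinct random points from the dataset as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign every point to the cluster with the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
        # Step 5 / termination: stop when the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on toy data
X = np.random.default_rng(0).random((100, 2))
labels, centroids = kmeans(X, k=3)
```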

K-means
Termination Criteria
1. Centroids do not change
   • After some iterations, the centroids remain the same
   • The algorithm is not learning new patterns
2. Points remain in the same cluster
   • Even if the centroids are still changing slightly, no data points switch clusters after several iterations
3. Maximum number of iterations reached

https://www.programmersought.com/images/437/0676d01a03e72b72253d6e62590a88c5.JPEG
K-means: disadvantages

• Clusters may have different sizes
• Clusters may have different densities

The solution is usually to add more clusters.

Choosing K
Elbow method
• Empirical method
• Perhaps the most famous method for choosing K

• Method
  • Calculate the Within-Cluster Sum of Squared Errors (WCSS, the inertia) for different values of K
  • WCSS = sum(square(distance from each point to its cluster centroid))
  • Choose the K at the "elbow" of the curve, after which the WCSS stops decreasing sharply

https://editor.analyticsvidhya.com/uploads/12158wcss.png

https://miro.medium.com/max/407/1*O_JmBi6rM6PlrPFx_uNrVQ.png
https://cdn.analyticsvidhya.com/wp-content/uploads/2019/08/Screenshot-from-2019-08-09-16-01-34.png
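A short sketch of the method using scikit-learn's `KMeans`, whose `inertia_` attribute is exactly the WCSS; the toy data is made up:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)  # toy data for illustration

# Fit K-means for several values of K and record the WCSS (inertia_)
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)

# Inspect (or plot) WCSS against K and pick the K at the "elbow" of the curve
for k, w in zip(range(1, 11), wcss):
    print(k, round(w, 2))
```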
Choosing K…
Silhouette Method
• Measures how similar a point is to its own cluster (cohesion) compared to other clusters (separation)
• d(i, j) = distance from point i to point j
• a(i) = average distance from point i to the other points in its own cluster (cohesion)
• b(i) = average distance from point i to the points in the nearest other cluster (separation)

$s(i) = \dfrac{b(i) - a(i)}{\max(a(i), b(i))}$

• s(i) varies from -1 to 1
  • A high value indicates a good number of clusters
  • A negative value indicates too many or too few clusters

https://miro.medium.com/max/389/1*Otg9XRLPqOjb80Png6B9Og.png
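A short sketch using scikit-learn's `silhouette_score`, which averages s(i) over all points; the toy data is made up:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.rand(200, 2)  # toy data for illustration

# Average silhouette over all points, for several candidate values of K
for k in range(2, 11):  # the silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```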

https://medium.com/@cmukesh8688/silhouette-analysis-in-k-means-clustering-cefa9a7ad111
K-means

Advantages
• Very simple to implement
• Adapts to new examples (instances) easily
• Can generalize to clusters of different sizes and shapes

Disadvantages
• Very sensitive to outliers
• Choosing K is not an easy task
• Not as efficient when the number of dimensions increases

Class Exercise

• Tutorial
• https://realpython.com/k-means-clustering-python/
• Work up to "Evaluating Clustering Performance Using Advanced Techniques"

