UNSUPERVISED LEARNING
AANAND BHANDARI
BU2023UCGS126
CONTENTS
Machine Learning
Unsupervised Learning
Working
Types of Unsupervised Learning
Algorithms Used
Real-World Applications
Advantages, Disadvantages & Challenges
Conclusion
What is Machine Learning?
• A field of artificial intelligence that allows computers to learn patterns from data rather than follow explicitly programmed rules.
Supervised vs. Unsupervised Learning
• Supervised Learning: Learns from labeled data (e.g., predicting house prices from past sales).
• Unsupervised Learning: Learns from unlabeled data, identifying hidden structures.
Why Unsupervised Learning?
• Helps uncover patterns and relationships in data when labels are unavailable or costly to produce.
What is Unsupervised Learning?
Definition:
• A machine learning approach where the
model finds patterns in data without
predefined labels.
Key Idea:
• Automatically identifies clusters,
relationships, and anomalies.
Example:
• Grouping customers based on shopping
behavior without prior labels.
How It Works
• Step 1: Input unlabeled data.
• Step 2: The algorithm identifies hidden patterns and relationships.
• Step 3: Data is grouped or structured meaningfully.
• Example: Organizing songs into genres based on listening habits.
Types of Unsupervised Learning
1. Clustering - Groups similar data points together.
Example: Categorizing news articles into topics.
2. Dimensionality Reduction - Reduces the number of features while retaining the most important information.
Example: Representing high-resolution images with fewer features while keeping key details.
Clustering Algorithms
1. K-Means Clustering
• Partitions data into K groups by assigning each point to the nearest cluster center.
• Example: Market segmentation of customers.
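A minimal sketch of the K-Means idea, assuming Euclidean distance and random initialization (Lloyd's algorithm). The function name `k_means` and the customer numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def k_means(points, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate between assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign every point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its assigned points;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: [annual spend, visits per month]
customers = np.array([
    [200.0, 1.0], [220.0, 2.0], [250.0, 1.0],
    [900.0, 8.0], [950.0, 9.0], [1000.0, 10.0],
])
labels, centroids = k_means(customers, k=2)
print(labels)
```

The low-spend and high-spend customers should land in different groups. In practice the features would usually be scaled first so that one column does not dominate the distance.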
2. Hierarchical Clustering
• Builds a tree of nested clusters (a dendrogram) by repeatedly merging or splitting groups.
• Example: Classifying different species based on genetic similarity.
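A short sketch of bottom-up (agglomerative) hierarchical clustering, assuming SciPy is available. The specimen feature values are hypothetical, and average linkage is just one possible choice of merge rule.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical feature vectors for six specimens.
specimens = np.array([
    [1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
    [5.0, 5.2], [5.1, 4.9], [4.8, 5.0],
])

# Agglomerative clustering: repeatedly merge the two closest clusters.
# Each row of `tree` records one merge, so n points give n - 1 merges.
tree = linkage(specimens, method="average")

# Cut the tree so that at most two flat clusters remain.
flat_labels = fcluster(tree, t=2, criterion="maxclust")
print(flat_labels)
```

Unlike K-Means, the number of groups does not have to be fixed up front: the same tree can be cut at different heights to obtain coarser or finer groupings.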
Dimensionality Reduction Algorithms
1. Principal Component Analysis (PCA)
• Projects data onto the directions of greatest variance, reducing dimensions while preserving key patterns.
• Example: Facial recognition data compression.
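A minimal sketch of PCA computed with a singular value decomposition, assuming only NumPy. The helper name `pca` and the synthetic 3D data are hypothetical and serve only to illustrate the projection step.

```python
import numpy as np

def pca(data, n_components):
    """Project data onto its top principal components via SVD."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by variance explained.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    projected = centered @ components.T
    # Fraction of total variance captured by each kept component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return projected, components, explained[:n_components]

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples in 3D that vary almost entirely along one direction.
t = rng.normal(size=(200, 1))
data = t @ np.array([[2.0, 1.0, 0.5]]) + 0.01 * rng.normal(size=(200, 3))

projected, components, explained = pca(data, n_components=1)
print(projected.shape, explained)
```

Here three features are reduced to one while nearly all of the variance is kept, because the data was constructed to lie close to a single line.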
2. t-SNE (t-Distributed Stochastic
Neighbor Embedding)
• Helps visualize high-dimensional data
in 2D or 3D.
• Example: Representing DNA data in
a simpler format for analysis.
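A brief sketch of producing a 2D t-SNE embedding, assuming scikit-learn is installed. The two synthetic groups and the perplexity value are hypothetical choices for illustration, not recommended settings.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Hypothetical high-dimensional data: 60 samples with 50 features each,
# drawn from two well-separated groups.
group_a = rng.normal(loc=0.0, scale=1.0, size=(30, 50))
group_b = rng.normal(loc=8.0, scale=1.0, size=(30, 50))
samples = np.vstack([group_a, group_b])

# perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(samples)
print(embedding.shape)
```

The resulting 2D points are meant for plotting. t-SNE preserves local neighborhoods, so distances between far-apart groups in the plot should not be read as meaningful.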
Real-World Applications
1. Market Segmentation
• Grouping customers based on purchasing behavior.
2. Anomaly Detection
• Detecting fraud in banking transactions.
3. Recommender Systems
• Suggesting movies on Netflix.
4. Image Compression
• Reducing file size while maintaining quality.
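To make the anomaly detection application above concrete, here is a simple label-free baseline: flag values with a large modified z-score (based on the median absolute deviation). This is a statistical sketch under assumed data, not how any particular bank detects fraud. The function name `flag_anomalies` and the transaction amounts are hypothetical.

```python
import numpy as np

def flag_anomalies(values, threshold=3.5):
    """Flag values whose modified z-score (median-based) exceeds the threshold."""
    median = np.median(values)
    mad = np.median(np.abs(values - median))  # median absolute deviation
    if mad == 0:
        # All values are (nearly) identical, so nothing stands out.
        return np.zeros(len(values), dtype=bool)
    modified_z = 0.6745 * (values - median) / mad
    return np.abs(modified_z) > threshold

# Hypothetical transaction amounts; one is far larger than the rest.
amounts = np.array([42.0, 39.5, 45.0, 41.2, 38.8, 44.1, 40.3, 5000.0])
is_anomaly = flag_anomalies(amounts)
print(amounts[is_anomaly])
```

No labels are needed: the unusual transaction is identified only because it lies far from the bulk of the data.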
Advantages
• No need for labeled data, reducing data preparation
costs
• Can discover hidden patterns and relationships in data
• Helps in exploring large and complex datasets
• Useful for anomaly detection and fraud prevention
• Adaptable to various domains like healthcare, finance,
and marketing
Disadvantages
• Difficult to evaluate results due to lack of ground truth
• Algorithms can be sensitive to parameter selection
• May struggle with high-dimensional data without
proper techniques
• Potentially produces irrelevant or misleading patterns
• Requires significant computational power for large
datasets
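One common way to partly address the evaluation and parameter-selection issues above is an internal metric such as the silhouette score, which needs no ground-truth labels. The sketch below assumes scikit-learn and uses hypothetical synthetic data; it compares several values of K for K-Means.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Hypothetical 2D data drawn around three well-separated centers.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
data = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centers])

# Silhouette score lies in [-1, 1]; higher means tighter, better-separated clusters.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    scores[k] = silhouette_score(data, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

A high score only says the clusters are compact and well separated. Whether they are meaningful for the problem still requires domain judgment.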
Challenges
• Results may not always be
interpretable.
• Requires good preprocessing and
fine-tuning.
• Sensitive to noise in data.
Conclusion
• Unsupervised learning finds
patterns in unlabeled data.
• Clustering and dimensionality
reduction are two key techniques.
• Used in diverse applications, from
fraud detection to
recommendation systems.
THANK YOU