MACHINE LEARNING R23 Material
UNIT-1
Introduction to Machine Learning
Machine learning (ML) is a subset of artificial intelligence (AI) that focuses on enabling
systems to learn from data and make decisions or predictions without being explicitly
programmed. Rather than following predetermined rules, ML algorithms allow computers to
recognize patterns, make inferences, and improve their performance over time through
experience.
Machine learning refers to the field of study that gives computers the ability to learn from
data, identify patterns, and make decisions with minimal human intervention. This involves
using algorithms to analyze and model data to make predictions or decisions.
Types of Machine Learning:
1. Supervised Learning:
o In supervised learning, the model is trained on labeled data (data that has both
input and corresponding output). The algorithm learns by example and
generalizes patterns to make predictions on new, unseen data.
o Example: Classifying emails as "spam" or "not spam."
2. Unsupervised Learning:
o In unsupervised learning, the model is given data without labels. The goal is to
find hidden patterns or structures within the data.
o Example: Customer segmentation, where a model groups customers based on
their purchasing behavior without predefined categories.
3. Reinforcement Learning:
o In reinforcement learning, an agent learns to make decisions by performing
actions in an environment to maximize a cumulative reward. The agent
receives feedback in the form of rewards or penalties based on its actions.
o Example: Training a robot to navigate through a maze.
4. Semi-supervised Learning (Optional category):
o This is a hybrid between supervised and unsupervised learning. It uses a small
amount of labeled data and a large amount of unlabeled data.
o Example: Image recognition where only a few images are labeled, but the
model can learn from a large amount of unlabeled images.
Common Machine Learning Algorithms:
1. Linear Regression:
o Used for predicting continuous values, such as house prices based on features
like size, location, etc.
2. Decision Trees:
o A model that makes decisions based on answering questions (features) and
traversing a tree structure to reach a prediction.
3. Random Forest:
o An ensemble method that uses multiple decision trees to improve performance
and reduce overfitting.
4. K-Nearest Neighbors (KNN):
o A classification algorithm that makes predictions based on the majority class
of the nearest data points in the feature space.
5. Support Vector Machines (SVM):
o A powerful classifier that tries to find the optimal hyperplane separating
classes in a high-dimensional space.
6. Neural Networks:
o Inspired by the human brain, these networks consist of layers of
interconnected nodes (neurons) and are used in deep learning applications like
image and speech recognition.
7. K-means Clustering:
o A method for unsupervised learning where the algorithm groups data into k
clusters based on feature similarity.
The typical workflow of a machine learning project involves several key stages:
1. Data Collection: Gathering the relevant data for the problem you're solving.
2. Data Preprocessing: Cleaning and preparing the data for modeling, including
handling missing values, normalizing, and encoding categorical variables.
3. Model Selection: Choosing the appropriate machine learning algorithm.
4. Training the Model: Feeding the training data into the model to allow it to learn
from the data.
5. Evaluation: Assessing the model's performance using metrics like accuracy,
precision, recall, or mean squared error (for regression).
6. Hyperparameter Tuning: Adjusting the model parameters for optimal performance.
7. Deployment: Integrating the trained model into a real-world application.
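The short Python sketch below ties these stages together using scikit-learn; the Iris dataset, the random-forest model, and the small tuning grid are illustrative assumptions rather than prescribed choices.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1-2. Data collection and preprocessing (scaling happens inside the pipeline)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3-4. Model selection and training
pipe = Pipeline([("scale", StandardScaler()),
                 ("model", RandomForestClassifier(random_state=42))])

# 6. Hyperparameter tuning with cross-validated grid search
grid = GridSearchCV(pipe, {"model__n_estimators": [50, 100]}, cv=5)
grid.fit(X_train, y_train)

# 5. Evaluation on held-out data; deployment would wrap grid.best_estimator_ in an application
print("Test accuracy:", accuracy_score(y_test, grid.predict(X_test)))

In practice the same pipeline pattern is reused with different preprocessing steps and models.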
The evolution of machine learning (ML) can be traced through various stages, each
representing a significant leap in our understanding and capability to make machines
"intelligent." From its early theoretical foundations to the modern-day applications powered
by deep learning, machine learning has undergone a profound transformation. Below is an
overview of the key milestones in the evolution of machine learning:
The roots of machine learning can be traced back to the early days of computing and artificial
intelligence (AI):
Turing's Work (1936-1937): The British mathematician Alan Turing laid the
groundwork for theoretical computing with the concept of the Turing machine.
Neural Networks and Perceptrons (1950s): In the 1950s, early work on neural
networks began with the creation of the Perceptron by Frank Rosenblatt.
Rule-based AI: During the 1950s to 1970s, AI research was dominated by symbolic approaches.
Early ML Algorithms: Researchers began exploring algorithms like decision trees and clustering methods, though the field was still in its infancy.
Despite early successes, progress in AI and machine learning slowed significantly during this
period due to limited computational resources and overly optimistic expectations:
Challenges in Data and Computing: The limitations of computers at the time, both
in terms of memory and processing power, constrained the development of more
advanced ML algorithms. Additionally, AI and machine learning models struggled to
perform well in real-world, noisy data scenarios.
AI Winter: This term refers to a period of reduced funding and interest in AI research
during the late 1970s to early 1990s, as results from early ML models did not live up
to expectations.
The 1990s saw a resurgence in machine learning, driven by the development of statistical
methods, the increase in computational power, and the availability of larger datasets.
The 2010s saw significant breakthroughs in machine learning, particularly in the area of deep
learning:
Big Data: The explosion of digital data provided the large training sets needed for more powerful models.
Rise of Deep Learning: Deep learning, based on neural networks with many layers, became practical thanks to advances in GPU computing and the availability of big data:
o ImageNet Breakthrough (2012): The ImageNet competition marked a
pivotal moment when deep learning models, especially convolutional neural
networks (CNNs), drastically outperformed traditional machine learning
algorithms in image classification tasks. This achievement sparked widespread
interest in deep learning.
Natural Language Processing (NLP) and Transformer Models: In the field of
NLP, algorithms like Word2Vec and later transformers (such as BERT and GPT)
revolutionized language understanding and generation, allowing machines to achieve
human-level performance on tasks like translation, question answering, and text
generation.
Reinforcement Learning Advancements: Reinforcement learning reached new heights, notably through deep Q-networks (DQN, which mastered Atari games) and DeepMind's AlphaGo, which combined deep neural networks with reinforcement learning and tree search to defeat top human Go players, solving complex decision-making problems.
Machine learning (ML) can be broadly categorized into different paradigms based on how the
algorithms learn from data and the nature of the problem they aim to solve. Each paradigm
has distinct approaches, techniques, and applications. The most common paradigms are
supervised learning, unsupervised learning, reinforcement learning, and semi-
supervised learning. Below is a detailed explanation of each paradigm:
1. Supervised Learning
Definition: In supervised learning, the model is trained using labeled data. The algorithm
learns the relationship between input data (features) and their corresponding outputs (labels)
during training, and then generalizes this knowledge to make predictions on new, unseen
data.
How it Works:
o Training Data: Consists of input-output pairs where the output (label) is
known.
o Objective: The goal is to learn a mapping function f(x) that maps inputs x to outputs y, so the model can predict the output for new, unseen inputs.
Key Algorithms:
o Linear Regression: For predicting continuous values.
o Logistic Regression: For binary classification problems.
o Decision Trees: Used for both classification and regression tasks.
o Support Vector Machines (SVM): Classification algorithm that finds the
optimal hyperplane.
o K-Nearest Neighbors (KNN): A simple algorithm that classifies data based
on the majority label of its nearest neighbors.
o Random Forests: An ensemble method that uses multiple decision trees for
more accurate predictions.
o Neural Networks: A powerful approach used in deep learning for tasks like
image and speech recognition.
Applications:
o Image classification (e.g., identifying cats and dogs in pictures).
o Email spam detection.
o Sentiment analysis (classifying reviews as positive or negative).
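As a minimal illustration of this paradigm, the sketch below fits a logistic regression classifier on labeled data and measures how well the learned mapping generalizes to held-out examples; the synthetic dataset is an assumption made purely for the example.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labeled data: features X with known labels y
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)   # learn the mapping f(x) -> y
print("Accuracy on unseen data:", clf.score(X_test, y_test))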
2. Unsupervised Learning
Definition: In unsupervised learning, the model is provided with data that has no labels. The
goal is to identify underlying patterns or structures in the data without predefined outputs.
How it Works:
o Training Data: Contains only input data, and the algorithm must find hidden
patterns or structures within this data.
o Objective: The model aims to explore the data and organize it into clusters,
dimensions, or other structures.
Key Algorithms:
o Clustering:
K-Means Clustering: Groups data into k clusters based on similarity.
Hierarchical Clustering: Builds a tree of clusters.
DBSCAN: Identifies clusters of varying shapes based on density.
o Dimensionality Reduction:
Principal Component Analysis (PCA): Reduces the number of
features while retaining variance.
t-Distributed Stochastic Neighbor Embedding (t-SNE): A technique
for visualizing high-dimensional data.
o Association Rules:
Apriori: Used for market basket analysis, discovering items that
frequently co-occur in transactions.
Applications:
o Customer segmentation (grouping customers based on purchasing behavior).
o Anomaly detection (detecting fraud or network intrusions).
o Topic modeling (grouping documents into topics based on word patterns).
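The sketch below illustrates unsupervised learning with K-Means: no labels are given, and the algorithm discovers two groups on its own. The synthetic two-feature "customer" data is an illustrative assumption.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two loose groups of points standing in for two customer segments (e.g., age, annual spend)
data = np.vstack([rng.normal(loc=[20, 500], scale=5, size=(50, 2)),
                  rng.normal(loc=[60, 100], scale=5, size=(50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print("First ten cluster labels:", kmeans.labels_[:10])
print("Cluster centers:", kmeans.cluster_centers_)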
3. Reinforcement Learning
Definition: Reinforcement learning (RL) is a paradigm where an agent learns to make
decisions by interacting with an environment to maximize a cumulative reward. The agent
does not know the right actions initially but learns from feedback after each action it takes.
How it Works:
o Agent: The decision-making entity that interacts with the environment.
o Environment: The world in which the agent operates.
o State: The current situation or configuration of the environment.
o Action: The choices the agent can make in the environment.
o Reward: The feedback the agent receives after taking an action. The objective
is to maximize the total reward over time.
o Policy: A strategy the agent uses to determine actions based on states.
o Value Function: A function that estimates the expected return (reward) for
each state.
Key Algorithms:
o Q-Learning: A model-free algorithm that learns the value of actions in
different states.
o Deep Q Networks (DQN): Combines Q-learning with deep neural networks
for complex environments.
o Policy Gradient Methods: Directly optimize the policy, used in algorithms
like Proximal Policy Optimization (PPO).
o Actor-Critic Methods: Combines policy-based and value-based methods.
Applications:
o Game playing (e.g., AlphaGo, chess, and video games).
o Robotics (e.g., teaching robots to walk or pick objects).
o Autonomous vehicles (e.g., self-driving cars making navigation decisions).
o Healthcare (e.g., personalized treatment recommendations).
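A minimal tabular Q-learning sketch is shown below on a made-up five-state corridor, where the agent must learn to walk right to reach a rewarding goal state; the environment, reward, and hyperparameters are all illustrative assumptions.

import numpy as np

n_states, n_actions = 5, 2              # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))     # tabular action-value estimates
alpha, gamma, epsilon = 0.1, 0.9, 0.3   # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0                               # each episode starts at the left end
    while s != n_states - 1:            # the right end is the goal state
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.round(Q, 2))   # after training, 'move right' should score higher in every state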
4. Semi-Supervised Learning
Definition: Semi-supervised learning combines a small amount of labeled data with a large amount of unlabeled data during training, sitting between supervised and unsupervised learning.
How it Works:
o Training Data: Consists of both labeled and unlabeled data.
o Objective: The algorithm attempts to use the small amount of labeled data to
make sense of the unlabeled data and make more accurate predictions or
classifications.
Key Algorithms:
o Semi-Supervised Support Vector Machines (S3VM): An extension of SVM
that works with both labeled and unlabeled data.
o Self-training: An iterative approach where the model initially trains on the
labeled data, and then iteratively labels the unlabeled data to improve
performance.
o Graph-based Methods: Use relationships (edges) between labeled and
unlabeled data points to propagate labels.
Applications:
o Image recognition: Labeled images may be scarce, but large unlabeled
datasets can help improve accuracy.
o Speech recognition: Labeled audio data may be limited, but using large
amounts of unlabeled audio can enhance training.
o Medical diagnostics: Labeling medical images can be time-consuming and
expensive, so semi-supervised techniques help utilize the available data
effectively.
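The sketch below shows one common semi-supervised approach, self-training, using scikit-learn's SelfTrainingClassifier; hiding about 90% of the labels and wrapping an SVM base classifier are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=1)
y_partial = y.copy()
rng = np.random.default_rng(1)
unlabeled = rng.random(len(y)) < 0.9          # hide roughly 90% of the labels
y_partial[unlabeled] = -1                     # -1 marks an unlabeled sample

base = SVC(probability=True, random_state=1)  # base learner used to pseudo-label the rest
model = SelfTrainingClassifier(base).fit(X, y_partial)
print("Accuracy on the points whose labels were hidden:", model.score(X[unlabeled], y[unlabeled]))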
5. Self-Supervised Learning
Definition: Self-supervised learning is a technique where a model generates its own labels
from the data itself. It creates a pretext task (an auxiliary task) to learn useful representations
from unlabeled data, which can then be fine-tuned for specific downstream tasks.
How it Works:
o Pretext Task: A task that is designed so the model learns useful features from
unlabeled data. For instance, predicting missing parts of an image, filling in
missing words in a sentence, or predicting the next frame in a video.
o Transfer Learning: The representations learned from the pretext task are
transferred to solve other tasks that require supervised learning.
Key Algorithms:
o Contrastive Learning: A method where the model learns by contrasting
positive and negative samples (e.g., SimCLR, MoCo).
o Masked Language Models (MLMs): In NLP, models like BERT are pre-
trained on tasks like predicting masked words to understand language
representations.
Applications:
o Natural Language Processing: Pretraining language models (e.g., GPT,
BERT) using self-supervised learning.
o Computer Vision: Pretraining models for tasks like object detection using
unlabeled images.
o Robotics: Learning representations from raw sensory data without explicit
supervision.
3. Learning by Rote:
o Rote learning is the simplest form of learning, in which the system memorizes (stores) the training examples as they are and recalls them when the same input appears again, without any generalization.
4. Learning by Induction:
Learning by Induction in Machine Learning
In machine learning, learning by induction refers to the process where a model generalizes
from specific examples (data points) to broader principles or patterns. Inductive learning is
central to most machine learning techniques, where algorithms make predictions or
classifications based on previously observed data. This approach contrasts with deductive
learning, where conclusions are drawn from known rules or facts.
1. Generalization:
o The primary goal of inductive learning is to generalize from a set of training
data to make predictions on unseen examples. The idea is that patterns or
relationships found in the training data will hold for new, similar data points.
o Example: If a model learns that "birds typically fly" from a training dataset of
various bird species, it might generalize that all birds can fly, although there
might be exceptions like penguins.
2. Hypothesis Formation:
o Inductive learning involves forming a hypothesis or a model that captures the
general relationships in the data. This hypothesis is derived from specific
examples.
o Example: In supervised learning, a model might form a hypothesis (or rule)
like “if an animal has feathers and a beak, it is a bird” based on training data
containing labeled examples of animals.
3. Overfitting and Underfitting:
o Overfitting occurs when the model learns the noise or specific details of the training data too well, resulting in poor performance on new, unseen data. It implies the model has become too complex and fails to generalize beyond the training set.
o Underfitting happens when the model is too simple to capture the underlying
patterns in the data, leading to poor performance even on training data.
4. Types of Inductive Learning Algorithms:
o Decision Trees: Algorithms like ID3 and C4.5 induce decision rules from
examples, which can be used for classification or regression tasks.
o Neural Networks: These models learn by adjusting weights based on training
data to generalize patterns for tasks like image classification or speech
recognition.
o Support Vector Machines (SVM): SVMs learn a decision boundary by
inducing patterns in the feature space to separate data into different classes.
o k-Nearest Neighbors (k-NN): This algorithm makes predictions based on the
majority class of the closest training data points, inducing general rules about
the relationships between data points in the feature space.
5. Inductive Bias:
o Inductive bias refers to the set of assumptions a learning algorithm makes to
generalize from the training data to unseen data. The nature of the inductive
bias affects how the model learns and how well it can generalize.
o Example: A decision tree algorithm assumes that the data can be split into
categories using binary decisions. This bias can work well for certain tasks but
may not perform well if the data cannot be easily split.
6. Inductive Learning Process:
o Training Phase: The model is provided with a set of examples, and it learns
from this data by creating a hypothesis (model).
o Testing Phase: The learned model is evaluated on unseen examples to check
how well it generalizes.
o Prediction: Once the model is trained and validated, it can be used to predict
the output for new inputs.
7. Example of Inductive Learning in Practice:
o Spam Email Classification:
In a supervised inductive learning task, a model might be trained on a
set of emails labeled as "spam" or "not spam". The algorithm
generalizes patterns (e.g., words like "free" or "offer") to create a
hypothesis about which future emails are likely to be spam.
8. Advantages of Inductive Learning:
o Scalability: Inductive learning algorithms can scale to handle large datasets
and automatically infer patterns.
o Flexibility: These models can adapt to different types of data and problems,
whether the task is classification, regression, or clustering.
o Automation: Inductive learning models reduce the need for manual rule-
making by learning patterns automatically from data.
9. Challenges:
o Overfitting: As the model tries to generalize, it might learn irrelevant patterns
that don't apply to new data.
o Bias in the Data: If the training data is biased, the learned hypothesis will also
be biased and inaccurate when applied to real-world data.
o Complexity: Some inductive learning models, especially deep learning, can
become complex and computationally expensive.
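To make the induction process concrete, the sketch below lets a decision tree induce a rule similar to "if an animal has feathers and a beak, it is a bird" from a handful of labeled examples; the toy dataset and feature names are assumptions made only for illustration.

from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [has_feathers, has_beak]; label: 1 = bird, 0 = not a bird
X = [[1, 1], [1, 1], [0, 1], [0, 0], [1, 0], [0, 0]]
y = [1, 1, 0, 0, 0, 0]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["has_feathers", "has_beak"]))  # the induced hypothesis
print("Prediction for a feathered, beaked animal:", tree.predict([[1, 1]])[0])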
5. Reinforcement Learning:
o Learning by trial and error: an agent interacts with an environment and adjusts its behaviour based on rewards and penalties, as described in the reinforcement learning paradigm above.
6. Types of Data:
Structured Data: Data that is organized in rows and columns, often found in
databases (e.g., spreadsheets, SQL databases).
Unstructured Data: Data that lacks a predefined format (e.g., text, images, audio).
Semi-Structured Data: Data that has some structure but is not fully organized (e.g.,
JSON, XML files).
7. Matching:
o Matching compares an input pattern with stored patterns or class representatives using a similarity or distance measure, and assigns the input to the best-matching class.
9. Data Acquisition:
Definition: Data acquisition refers to the process of gathering data from various
sources, such as sensors, databases, or web scraping.
Methods:
o Manual Collection: Collecting data by hand or using traditional methods.
o Automated Collection: Using software or scripts to gather data automatically.
o Public Datasets: Utilizing open-source datasets available for research or
development purposes.
Data Representation:
Definition: Data representation refers to how data is structured and stored so that
machine learning models can interpret it effectively.
Types:
o Numerical Representation: Using numbers to represent data, like integers or
floating-point numbers.
o Categorical Representation: Using categories or labels, often transformed
into numerical form using techniques like one-hot encoding.
o Vector Representation: Representing data in vectorized form, often used in
natural language processing or image processing.
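The snippet below shows a typical representation step, converting a categorical column into a numerical one-hot encoding with pandas; the tiny "color"/"size_cm" table is an illustrative assumption.

import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"],
                   "size_cm": [10.0, 12.5, 9.0, 11.0]})

encoded = pd.get_dummies(df, columns=["color"])  # one indicator column per category
print(encoded)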
Model Selection:
Definition: Model selection involves choosing the right machine learning model or
algorithm based on the problem type and the characteristics of the data.
Factors to consider:
o Type of Data: Structured, unstructured, time-series, etc.
o Task Type: Classification, regression, clustering, etc.
o Performance Metrics: Accuracy, precision, recall, F1 score, etc.
o Complexity: Simpler models may be more interpretable but less powerful,
whereas complex models may have better accuracy but be harder to interpret.
Model Learning:
Definition: Model learning refers to the process of training a machine learning model
by feeding it data so that it can learn patterns or relationships in the data.
Methods:
o Supervised Learning: Learning with labeled data (e.g., classification,
regression).
o Unsupervised Learning: Learning from unlabeled data (e.g., clustering,
dimensionality reduction).
o Semi-supervised Learning: A mix of labeled and unlabeled data.
o Reinforcement Learning: Learning through trial and error.
Model Prediction:
Definition: Model prediction refers to the process of using a trained model to make
predictions on new, unseen data.
Types:
o Classification: Predicting the class label of an input (e.g., spam or not spam).
o Regression: Predicting a continuous value (e.g., house prices).
o Clustering: Grouping data into clusters based on similarities.
Search and Learning:
Search: Involves exploring a problem space to find the best solution. It can be applied
in algorithms like search trees or optimization tasks.
o Types:
Depth-first Search (DFS): Explores as far as possible down one
branch before backtracking.
Breadth-first Search (BFS): Explores all nodes at the present depth
level before moving on to nodes at the next depth level.
Learning: Machine learning algorithms improve through experience, refining their
models and predictions over time.
Data Sets:
Definition: A data set is a collection of related data points organized for analysis,
training, or testing purposes.
Types:
o Training Set: Used to train a model.
o Test Set: Used to evaluate the model’s performance.
o Validation Set: Used for tuning hyperparameters or for model selection.
Examples:
o MNIST: A dataset of handwritten digits used for image recognition tasks.
o Iris: A classic dataset in machine learning for classification tasks.
UNIT-2
Nearest Neighbor-Based Models in Machine Learning
Nearest neighbor-based models, especially the k-nearest neighbors (k-NN) algorithm, are a
fundamental class of machine learning techniques that rely on the idea of proximity or
similarity between data points for making predictions. These models are used in both
classification and regression tasks and are based on the idea that similar points are likely to
have similar labels or values. Below is a detailed breakdown of the key concepts related to
nearest neighbor-based models.
Nearest Neighbor methods rely on the idea that similar instances have similar outputs.
Predictions are based on the proximity of a query point to data points in the training set.
Common distance metrics:
o Euclidean Distance: d(x, x') = \sqrt{\sum_{i=1}^{n} (x_i - x_i')^2}
o Manhattan Distance: d(x, x') = \sum_{i=1}^{n} |x_i - x_i'|
o Cosine Similarity: Measures the cosine of the angle between two vectors.
K-NN is a simple, instance-based algorithm that classifies a data point based on the majority
label of its k-nearest neighbors.
Algorithm:
1. Choose a value for k (number of neighbors).
2. Compute the distances between the query point and all points in the training set.
3. Identify the k closest points.
4. Assign the class label most common among these k points.
Key Considerations:
o A larger k smoothens the decision boundary (reduces variance but may increase bias).
o A small k may lead to overfitting.
K-NN Regression: Predicts the value of a query point as the average (or weighted average) of the values of its k-nearest neighbors:
\hat{y} = \frac{1}{k} \sum_{i=1}^{k} y_i
o Weighted K-NN: Weights closer points more heavily:
\hat{y} = \frac{\sum_{i=1}^{k} w_i y_i}{\sum_{i=1}^{k} w_i}, \quad w_i = \frac{1}{d(x, x_i)}
4. Distance Metrics
The choice of distance metric (Euclidean, Manhattan, Minkowski, cosine similarity, etc.) determines which points count as "nearest"; these measures are described in detail below.
Strengths:
Simple and intuitive; no explicit training phase (lazy, instance-based learning).
Naturally handles multi-class problems and non-linear decision boundaries.
Weaknesses:
High Computation Cost: Requires computing distances to all training points at inference.
Memory Intensive: Stores the entire training set.
Curse of Dimensionality: Performance deteriorates in high-dimensional spaces.
Sensitive to Noise: Outliers can affect predictions.
7. Improving K-NN
Scaling Features:
Normalize or standardize features so that attributes with large numeric ranges do not dominate the distance computation.
Dimensionality Reduction:
Techniques like PCA (Principal Component Analysis) reduce noise and improve performance.
Proximity Measures
Proximity measures refer to techniques that quantify the similarity or closeness between two
data points. The idea is that if two data points are close to each other in the feature space,
they are more likely to belong to the same class (in classification) or have similar values (in
regression). These proximity measures are essential for nearest neighbor algorithms.
Distance Measures
Distance measures are specific types of proximity measures that calculate the physical
distance between data points in a multi-dimensional feature space. The choice of distance
measure can have a significant impact on the performance of the nearest neighbor algorithm.
1. Euclidean Distance:
o Formula: d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
o The most common and widely used distance measure.
o It calculates the straight-line distance between two points in Euclidean space.
2. Manhattan Distance (also known as L1 distance or Taxicab Distance):
o Formula: d(x, y) = \sum_{i=1}^{n} |x_i - y_i|
o The sum of the absolute differences between corresponding components.
3. Minkowski Distance:
o A generalization of Euclidean and Manhattan distance.
o Formula: d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}
o When p = 2, it is the Euclidean distance; when p = 1, it is the Manhattan distance.
4. Cosine Similarity:
o A measure of similarity between two non-zero vectors of an inner product space.
o Formula: \text{cosine similarity} = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|}
o It measures the angle between the vectors, often used in text analysis or when
comparing documents.
5. Hamming Distance:
o Used for categorical data, particularly binary data.
o It counts the number of positions at which the corresponding elements of two
strings are different.
Not all similarity measures correspond to a "distance" in the strict mathematical sense. These
are non-metric similarity functions, often used in cases where the data is not naturally metric
(e.g., categorical or nominal data).
For binary patterns (data with values of 0 or 1), the distance or similarity can be computed
using measures that specifically handle binary data:
Hamming Distance: The number of differing bits between two binary patterns.
Jaccard Similarity: Works well for comparing binary sets, focusing on the
intersection over the union of two sets.
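The NumPy sketch below computes the measures listed above (Euclidean, Manhattan, Minkowski, cosine similarity, Hamming, and Jaccard) for two small example vectors and two binary patterns; the sample values are assumptions chosen only to demonstrate the formulas.

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 1.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))            # straight-line distance
manhattan = np.sum(np.abs(x - y))                    # sum of absolute differences
minkowski3 = np.sum(np.abs(x - y) ** 3) ** (1 / 3)   # Minkowski distance with p = 3
cosine_sim = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

a = np.array([1, 0, 1, 1, 0])                        # binary patterns
b = np.array([1, 1, 1, 0, 0])
hamming = np.sum(a != b)                             # number of differing positions
jaccard = np.sum(a & b) / np.sum(a | b)              # intersection over union

print(euclidean, manhattan, minkowski3, cosine_sim, hamming, jaccard)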
The k-Nearest Neighbors (k-NN) algorithm is one of the most straightforward and widely
used classification algorithms. It classifies new data points based on the majority class among
its k-nearest neighbors.
Algorithm Steps
Input: Training data D = {(x_1, y_1), ..., (x_n, y_n)}, a query point x_q, the number of neighbors k, and a distance metric d.
Procedure:
1. Compute Distances:
o Calculate the distance d(x_q, x_i) between the query point x_q and each data point x_i in the training set.
2. Sort Neighbors:
o Sort the training points by their distance to x_q.
3. Select Nearest Neighbors:
o Select the k-nearest neighbors (smallest distances).
4. Majority Vote:
o Determine the majority class label among the k-nearest neighbors.
5. Output:
o Assign the query point x_q to the majority class.
3. Distance Metrics
Any of the distance measures described earlier (most commonly Euclidean distance) can be used to compute d(x_q, x_i).
4. Choosing k
Small k:
o Captures local patterns but may overfit (sensitive to noise).
Large k:
o Smoothens predictions but may underfit (ignores local structures).
The optimal k is usually determined via cross-validation.
Example
Dataset:
Training data:
D = {(1, A), (2, A), (4, B), (5, B), (6, A)}
Query point: x_q = 3, with k = 3.
Steps:
1. Compute Distances:
o d(1, 3) = 2, d(2, 3) = 1, d(4, 3) = 1, d(5, 3) = 2, d(6, 3) = 3.
2. Sort Neighbors:
o Sorted by distance: (2, A), (4, B), (1, A), (5, B), (6, A).
3. Select k = 3 Nearest Neighbors:
o Neighbors: (2, A), (4, B), (1, A).
4. Majority Vote:
o Class labels: A, B, A.
o Majority class: A.
5. Prediction:
o Assign x_q = 3 to class A.
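The short Python function below implements these steps and reproduces the worked example (training data {(1, A), (2, A), (4, B), (5, B), (6, A)}, query point x_q = 3, k = 3); using the absolute difference as the 1-D distance is the only assumption.

from collections import Counter

train = [(1, "A"), (2, "A"), (4, "B"), (5, "B"), (6, "A")]

def knn_classify(x_q, data, k):
    # sort training points by distance to the query and keep the k closest
    neighbors = sorted(data, key=lambda point: abs(point[0] - x_q))[:k]
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]   # majority vote

print(knn_classify(3, train, k=3))   # prints 'A', matching the example above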
The Radius Nearest Neighbor (RNN) algorithm is an extension of the k-NN approach but
with a focus on the radius rather than the number of neighbors.
Algorithm Steps
Input: Training data D = {(x_1, y_1), ..., (x_n, y_n)}, a query point x_q, a radius r, and a distance metric d.
Procedure:
1. Compute Distances:
o Calculate the distance d(x_i, x_q) between the query point x_q and each data point x_i in the training set.
2. Identify Neighbors:
o Select all points x_i such that d(x_i, x_q) <= r.
3. Prediction:
o Classification:
Use a majority vote among the labels of the selected neighbors.
If no neighbors are found within r, a default class can be assigned, or a larger radius can be used.
o Regression:
Compute the mean (or weighted mean) of the target values y_i of the selected neighbors.
Output: The predicted class label (classification) or value (regression) for the query point x_q.
3. Pseudocode
Algorithm RDNN:
Input: Training data D = {(x1, y1), (x2, y2), ..., (xn, yn)}, query point x_q, radius r, distance metric d.
1. For each (x_i, y_i) in D: compute dist[i] = d(x_i, x_q).
2. N = { (x_i, y_i) : dist[i] <= r }; if N is empty, return a default class or retry with a larger r.
3. Classification: return the majority label in N. Regression: return the mean of y_i over N.
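For experimentation, scikit-learn provides RadiusNeighborsClassifier, sketched below on the same 1-D toy data; the radius value and the outlier_label fallback (used when no neighbors fall inside the radius) are illustrative assumptions.

from sklearn.neighbors import RadiusNeighborsClassifier

X_train = [[1], [2], [4], [5], [6]]
y_train = ["A", "A", "B", "B", "A"]

# outlier_label handles query points with no neighbors inside the radius
rnn = RadiusNeighborsClassifier(radius=3.5, outlier_label="most_frequent")
rnn.fit(X_train, y_train)
print(rnn.predict([[3]]))   # majority vote among all points within distance 3.5 of x_q = 3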
KNN Regression
In k-Nearest Neighbors Regression (k-NN regression), the output is continuous rather than
categorical. The prediction is made based on the average (or weighted average) of the target
values of the k-nearest neighbors.
Input: Training data D = {(x_1, y_1), ..., (x_n, y_n)}, a query point x_q, the number of neighbors k, and a distance metric d.
1. Compute Distances:
o Calculate the distance d(x_q, x_i) between the query point x_q and each training point x_i.
2. Find k-Nearest Neighbors:
o Identify the k-nearest neighbors by selecting the k points with the smallest distances to x_q.
3. Aggregate Neighbor Target Values:
o Simple Average:
Predict the target value as the mean of the target values of the k-nearest neighbors:
\hat{y}_q = \frac{1}{k} \sum_{i=1}^{k} y_i
o Weighted Average:
Assign weights inversely proportional to the distances:
\hat{y}_q = \frac{\sum_{i=1}^{k} w_i y_i}{\sum_{i=1}^{k} w_i}, \quad w_i = \frac{1}{d(x_q, x_i) + \epsilon}
Here, \epsilon is a small constant to prevent division by zero.
Output: The predicted continuous value \hat{y}_q for the query point x_q.
Pseudocode
Algorithm KNN-Regression:
Input: Training data D = {(x1, y1), ..., (xn, yn)}, query point x_q, number of neighbors k, distance metric d.
1. Compute distances:
For each (x_i, y_i) in D:
dist[i] = d(x_i, x_q)
2. Select the k points with the smallest dist[i].
3. Return the mean (or distance-weighted mean) of their target values y_i.
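The sketch below performs k-NN regression with scikit-learn's KNeighborsRegressor; the toy data, the choice k = 3, and the distance-based weighting are illustrative assumptions.

from sklearn.neighbors import KNeighborsRegressor

X_train = [[1], [2], [4], [5], [6]]
y_train = [1.2, 1.9, 4.1, 5.2, 5.9]

# weights="distance" gives closer neighbors more influence, as in the weighted-average formula above
knn_reg = KNeighborsRegressor(n_neighbors=3, weights="distance")
knn_reg.fit(X_train, y_train)
print(knn_reg.predict([[3]]))   # weighted average of the 3 nearest target values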
Performance of Classifiers
The performance of nearest neighbor classifiers depends on several factors:
1. Choice of k: Too small a k can lead to noisy predictions (overfitting), while too large a k can make the model too smooth (underfitting).
2. Distance Measure: The choice of distance measure can heavily affect the accuracy of
the classifier, especially when features are on different scales.
3. Dimensionality: As the number of features (dimensions) increases, the model can
struggle due to the curse of dimensionality.
Evaluating the performance of regression algorithms is essential to understand how well the model
predicts continuous target variables. The choice of metric depends on the problem's context and the
specific goals, such as minimizing large errors, understanding variability, or achieving interpretable
results.
Mean Absolute Error (MAE): Measures the average magnitude of absolute errors between predicted values (\hat{y}_i) and actual values (y_i).
Formula: \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
Advantages:
o Simple to interpret.
o Less sensitive to outliers than Mean Squared Error.
Disadvantages:
o Doesn't penalize large errors as heavily as other metrics.
Mean Squared Error (MSE): Measures the average of the squared differences between predicted and actual values.
Formula: \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
Advantages:
o Penalizes large errors more than small ones.
Disadvantages:
o Can be heavily influenced by outliers.
o Units are squared, making interpretation less intuitive.
Root Mean Squared Error (RMSE): The square root of MSE; expresses error in the same units as the target variable.
Formula: \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
Advantages:
o Penalizes large errors more than MAE.
o Easier to interpret than MSE.
Disadvantages:
o Still sensitive to outliers.
R-squared (R^2): Measures the proportion of variance in the target variable explained by the model.
Formula: R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, where \bar{y} is the mean of the actual values.
Range:
o R^2 = 1: Perfect fit.
o R^2 = 0: Model predicts as well as the mean.
o R^2 < 0: Model performs worse than the mean prediction.
Advantages:
o Intuitive measure of goodness-of-fit.
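The NumPy snippet below evaluates a small set of assumed predictions with all four metrics, following the formulas above directly.

import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # illustrative actual values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # illustrative model predictions

mae = np.mean(np.abs(y_true - y_pred))
mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")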
UNIT-3
Models Based on Decision Trees
Decision Trees are supervised learning models used for classification and regression tasks.
Working Principle:
o They split the data into subsets based on the feature that provides the most
homogeneous subsets (purity of classes).
o Each internal node represents a test on a feature, branches represent the outcome
of the test, and leaves represent class labels.
2. Impurity Measures
Impurity measures quantify how "mixed" the data is in terms of target classes. Common choices:
o Gini Index: \text{Gini} = 1 - \sum_{k} p_k^2, where p_k is the proportion of class k at a node.
o Entropy: \text{Entropy} = -\sum_{k} p_k \log_2 p_k; the information gain of a split is the reduction in entropy.
A pure node (containing a single class) has Gini = 0 and Entropy = 0.
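The small functions below compute these impurity measures for a list of class labels; the example label list is an illustrative assumption.

import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

labels = ["spam", "spam", "ham", "ham", "ham"]
print("Gini:", gini(labels), "Entropy:", entropy(labels))   # higher values mean a more mixed node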
Advantages:
o Easy to interpret and visualize.
o Handles both numerical and categorical data.
o Non-parametric (does not assume a distribution of the data).
Disadvantages:
o Prone to overfitting, especially with deep trees.
o Sensitive to small changes in data (instability).
5. Bias–Variance Trade-off
Deep, complex trees have low bias but high variance (they tend to overfit), while shallow trees have high bias but low variance (they tend to underfit); pruning or limiting tree depth balances the two.
The Bayes classifier is based on Bayes' theorem and predicts the class that maximizes the posterior probability:
\text{Class} = \arg\max_{k} P(C_k \mid X)
The Bayes classifier is optimal in minimizing the classification error if the true probabilities P(C_k \mid X) are known.
In practice, these probabilities are often estimated from the data.
4. Multi-Class Classification
Extends the Bayes classifier to handle multiple classes by computing posterior probabilities
for each class and choosing the one with the highest value.
Class Conditional Independence: Assumes that features X_1, X_2, \ldots, X_n are independent given the class C:
P(X \mid C) = \prod_{i=1}^{n} P(X_i \mid C)
Naive Bayes Classifier:
o Based on the independence assumption, it calculates posterior probabilities
efficiently.
o Works well for text classification and spam filtering, despite its simplicity.
o Advantages: Fast, scalable, and effective with high-dimensional data.
o Disadvantages: Assumption of feature independence may not hold in practice.
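The sketch below applies the Naive Bayes idea to a tiny spam-filtering example with scikit-learn; the six-message corpus and the CountVectorizer plus MultinomialNB pipeline are illustrative assumptions.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["free offer win money", "limited free prize", "meeting at noon",
         "project report attached", "win a free offer now", "lunch tomorrow?"]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())  # word counts + independence assumption
model.fit(texts, labels)
print(model.predict(["free money offer", "see you at the meeting"]))  # likely ['spam', 'ham']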
UNIT-4
1. Introduction to Linear Discriminants
A linear discriminant classifies an input x using a linear function g(x) = w^T x + b; the sign of g(x) determines the predicted class, and the decision boundary g(x) = 0 is a hyperplane in the feature space.
3. Perceptron Classifier
Goal: Adjust the weights w and bias b to correctly classify all training data.
Algorithm:
1. Initialize w = 0, b = 0.
2. For each misclassified point x_i:
Update w: w = w + \eta y_i x_i
Update b: b = b + \eta y_i
3. Repeat until no misclassifications. \eta is the learning rate.
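The NumPy sketch below implements this update rule on a small linearly separable dataset with labels in {-1, +1}; the data points and the learning rate are illustrative assumptions.

import numpy as np

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(X.shape[1])
b = 0.0
eta = 1.0                                # learning rate

for epoch in range(10):                  # repeat passes until no misclassifications
    errors = 0
    for x_i, y_i in zip(X, y):
        if y_i * (w @ x_i + b) <= 0:     # misclassified (or on the boundary)
            w += eta * y_i * x_i         # w <- w + eta * y_i * x_i
            b += eta * y_i               # b <- b + eta * y_i
            errors += 1
    if errors == 0:
        break

print("Learned weights:", w, "bias:", b)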
5. Support Vector Machines (SVMs)
SVMs find the hyperplane that maximizes the margin (distance between the
hyperplane and the nearest points of each class).
Objective: Minimize \frac{1}{2} \| w \|^2 subject to y_i (w^T x_i + b) \geq 1 for all i.
Results in better generalization compared to perceptrons.
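A linear SVM can be trained in a few lines with scikit-learn, as sketched below; the synthetic blob data and C = 1.0 are illustrative assumptions.

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
svm = SVC(kernel="linear", C=1.0).fit(X, y)   # finds the maximum-margin hyperplane

print("Support vectors per class:", svm.n_support_)
print("Hyperplane: w =", svm.coef_[0], " b =", svm.intercept_[0])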
7. Non-Linear SVM
When classes are not linearly separable, the data is mapped to a higher-dimensional feature space where a linear separating hyperplane can be found.
8. Kernel Trick
A kernel function K(x, x') computes the inner product in that higher-dimensional space without performing the mapping explicitly (e.g., polynomial and RBF kernels), making non-linear SVMs computationally feasible.
9. Logistic Regression
Models the probability of the positive class as P(y = 1 | x) = \sigma(w^T x + b), where \sigma(z) = 1 / (1 + e^{-z}); a class label is obtained by thresholding this probability (typically at 0.5).
UNIT-5
1. Introduction to Clustering
Clustering is an unsupervised machine learning technique used to group similar data points
into clusters based on their characteristics.
Objective: Maximize similarity within clusters and minimize similarity between clusters.
Applications: Market segmentation, image segmentation, document categorization, anomaly
detection.
2. Partitioning of Data
Partitioning involves dividing a dataset into clusters based on similarity or distance metrics.
Methods: Distance-based (e.g., Euclidean), density-based, or probabilistic.
3. Matrix Factorization
Matrix factorization decomposes a matrix (e.g., data matrix) into two or more lower-
dimensional matrices to reveal hidden patterns or structure.
Applications: Dimensionality reduction, recommendation systems, latent topic modeling.
4. Clustering of Patterns
Identifies groups (clusters) in data where patterns within the same cluster are similar, and
patterns in different clusters are dissimilar.
Uses distance metrics like Euclidean, Manhattan, or cosine similarity.
5. Divisive Clustering
A top-down hierarchical approach: all data points start in a single cluster, which is recursively split into smaller clusters.
6. Agglomerative Clustering
A bottom-up hierarchical approach: each data point starts as its own cluster, and the closest clusters are merged step by step until a stopping criterion is met.
7. Partitional Clustering
Divides the dataset into non-overlapping clusters, with each data point belonging to exactly
one cluster.
Example algorithms: K-Means, K-Medoids.
Focus is on optimizing a predefined objective function (e.g., minimizing intra-cluster
distance).
8. K-Means Clustering
Partitions data into k clusters by choosing k initial centroids, assigning each point to its nearest centroid, recomputing each centroid as the mean of its assigned points, and repeating until the assignments stop changing.
Soft Partitioning and Soft Clustering
Soft Partitioning allows data points to belong to multiple clusters with membership
probabilities.
Soft Clustering methods, such as Fuzzy C-Means, assign degrees of membership rather than
hard assignments.









