
PRML Assignment 3

Prepared by: Arman Gupta (B22CS014)


March 16, 2024

Question 1:

Preprocessing
The preprocessing phase involves loading the dataset from a publicly accessible URL
and preparing it for analysis. The dataset is loaded into a DataFrame using the pandas
library. The dataset includes features and labels, with the assumption that the first two
columns are the features and the last column represents the label.
import numpy as np
import pandas as pd

# URL to the raw CSV file in the GitHub repo
data_url = 'https://raw.githubusercontent.com/.../data.csv'
# Load the dataset
df = pd.read_csv(data_url)
# Assuming the last column is the label and the first two columns are the features
data = df.values
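Since the last column is treated as the label, the features and labels can also be separated explicitly; the names X and y below are illustrative and assume the two-feature layout described above.

# Separate features (first two columns) and labels (last column)
X = data[:, :2]
y = data[:, -1]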

Task 2: Show the LDA Projection Vector on a Plot


Linear Discriminant Analysis (LDA) is a dimensionality reduction technique used in
machine learning to find the linear combination of features that best separates two or
more classes of objects or events. The goal of LDA is to project the features in the dataset
onto a lower-dimensional space with good class-separability to avoid overfitting and also
reduce computational costs.
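In vector form, LDA seeks the projection direction w that maximizes the Fisher criterion

J(w) = (wᵀ SB w) / (wᵀ SW w),

where SB and SW are the between-class and within-class scatter matrices constructed in the steps below.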

LDA Steps:
1. Splitting the Dataset by Class: The dataset is divided based on the class labels,
resulting in two subsets of data corresponding to each class.
data_class_0 = data[data[:, -1] == 0][:, :-1]
data_class_1 = data[data[:, -1] == 1][:, :-1]

2. Calculating Class Means: We compute the mean vectors for each class, which will
be used to determine the direction of the projection that maximizes the distance
between the class means.

mean_0 = np.mean(data_class_0, axis=0)
mean_1 = np.mean(data_class_1, axis=0)

3. Computing Within-class Scatter Matrix (SW): The within-class scatter matrix is a measure of the spread of the classes themselves. It is calculated by summing up the scatter matrices of each class, which quantify how much the class samples spread out from the mean vector of each class.
SW = np.zeros((2, 2))
for x in data_class_0:
    SW += np.outer(x - mean_0, x - mean_0)
for x in data_class_1:
    SW += np.outer(x - mean_1, x - mean_1)

4. Computing Between-class Scatter Matrix (SB): The between-class scatter matrix measures the separation between different classes. It is calculated based on the distance between the mean vectors of the classes.
difference_means = mean_1 - mean_0
SB = np.outer(difference_means, difference_means)

5. Eigenvalue Decomposition: By computing the eigenvalues and eigenvectors of the matrix SW⁻¹ SB, we can find the optimal projection vector(s) that maximize the ratio of between-class scatter to within-class scatter along the projection direction.
SW_inv = np.linalg.inv(SW)
eigenvalues, eigenvectors = np.linalg.eig(np.dot(SW_inv, SB))
max_eig_vec = eigenvectors[:, np.argmax(eigenvalues)]

6. Projection and Visualization: The eigenvector corresponding to the largest eigenvalue is chosen as the LDA projection vector. This vector is then visualized on a scatter plot of the two classes, along with the class means. The projection vector visually represents the best direction to separate the classes linearly.
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
plt.scatter(data_class_0[:, 0], data_class_0[:, 1], color='red', label='Class 0')
plt.scatter(data_class_1[:, 0], data_class_1[:, 1], color='blue', label='Class 1')
plt.scatter(*mean_0, color='black', marker='x', s=100, label='Mean Class 0')
plt.scatter(*mean_1, color='black', marker='o', s=100, label='Mean Class 1')

Visualization
The scatter plot shows the two classes with different colors, along with their respec-
tive means. The LDA projection vector is drawn emanating from the mean of Class 0,
highlighting the direction that best separates the two classes. This visualization aids in
understanding the geometric interpretation of LDA as a linear separator.
proj_line = max_eig_vec * 10  # Scale for visualization
plt.quiver(*mean_0, *proj_line, color='green', scale=1, scale_units='xy',
           angles='xy', width=0.005, label='LDA Projection Vector')
plt.legend()
plt.show()

Figure 1: Visualization of the LDA projection vector.

Analyzing 1-NN Classifier Performance: Original vs. LDA Projected Data

This section outlines the process of comparing the performance of a 1-Nearest Neighbor (1-NN) classifier on the original dataset versus the data projected using Linear Discriminant Analysis (LDA).

Steps and Code Snippets


• Implementing LDA from Scratch
– Calculate the overall mean, class-specific means, within-class scatter matrix,
and between-class scatter matrix.
– Solve the eigenvalue problem for the matrix SW⁻¹ SB.
– Select the top eigenvectors and create the transformation matrix.
– Project the data onto the new LDA space.
• Evaluating 1-NN Classifier Performance
– Train and test the 1-NN classifier on the original data.

– Train and test the 1-NN classifier on the LDA projected data.
– Compare accuracies to observe the impact of LDA (a code sketch of these steps follows this list).
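A minimal sketch of this comparison is given below. It assumes the two-feature data array and the max_eig_vec projection vector from the earlier steps; the 80/20 split, the random_state, and the use of scikit-learn's KNeighborsClassifier are illustrative choices rather than the exact code behind the reported numbers.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Features and labels from the loaded data array
X, y = data[:, :-1], data[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Baseline: 1-NN on the original features
knn_orig = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
acc_orig = accuracy_score(y_test, knn_orig.predict(X_test))

# Project onto the single LDA direction found earlier
w = np.real(max_eig_vec).reshape(-1, 1)   # guard against a complex-typed eigenvector
X_train_lda, X_test_lda = X_train @ w, X_test @ w

# 1-NN on the one-dimensional LDA projection
knn_lda = KNeighborsClassifier(n_neighbors=1).fit(X_train_lda, y_train)
acc_lda = accuracy_score(y_test, knn_lda.predict(X_test_lda))

print(f"Accuracy (original data): {acc_orig:.4f}")
print(f"Accuracy (LDA projected data): {acc_lda:.4f}")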

Results and Observations


Upon evaluating the 1-Nearest Neighbor (1-NN) classifier’s performance on both the
original and Linear Discriminant Analysis (LDA) projected data, the following results
were observed:

• Accuracy with original data: 89.00%

• Accuracy with LDA projected data: 88.75%

Observations:

• The 1-NN classifier’s accuracy on the original data is 89.00%. This performance
serves as a baseline for comparing the effectiveness of dimensionality reduction
techniques.

• After projecting the data using LDA, the 1-NN classifier’s accuracy decreased to
88.75%.

• In high-dimensional spaces, 1-NN classifiers can be prone to overfitting. By reducing the dimensionality, LDA might help mitigate this risk to some extent. However, the slight drop in accuracy suggests that, in this particular case, overfitting was not significantly impacting the classifier's performance on the original data.

• The slight decrease in accuracy might also suggest that the LDA projection, while
maintaining most of the class-separability information, could have led to a slightly
less generalized model. This is a trade-off often encountered in dimensionality
reduction techniques, balancing between maintaining important features and the
risk of losing some generalizability due to the removal of features.

• Although not explicitly mentioned, one should note that using LDA-projected data likely improved the computational efficiency of the 1-NN classification. Dimensionality reduction reduces the number of features that the classifier needs to consider, which can significantly speed up the classification process, especially in cases of high-dimensional data.

Overall, the slight decrease in accuracy (from 89.00% to 88.75%) shows that the one-dimensional LDA projection preserved nearly all of the class-separability information the 1-NN classifier relies on. This result underscores the utility of LDA as a dimensionality reduction technique: it retains most of the discriminative structure of the data while working in a much lower-dimensional space.

Question 2:

Task-0: Dataset Splitting


This task involves splitting the dataset into training and testing sets, with 12 samples
allocated for training and the remaining 2 for testing.
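One simple way to produce this split is sketched below; the DataFrame name df and the column names Outlook, Temp, Humidity, Windy, and Play are assumed from the weather dataset used in this question, and the first 12 rows are taken as the training portion.

features = ['Outlook', 'Temp', 'Humidity', 'Windy']

# First 12 samples for training, remaining 2 for testing
X_train, y_train = df[features].iloc[:12], df['Play'].iloc[:12]
X_test, y_test = df[features].iloc[12:], df['Play'].iloc[12:]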

Task-1: Calculating Prior Probabilities


The objective of this task is to determine the prior probabilities of the events that indi-
viduals decide to play and not to play. Prior probabilities are foundational in Bayesian
inference, informing us about the general likelihood of outcomes before considering new
evidence.
To compute the prior probabilities, we utilize the training set labels (y_train). We calculate the proportion of 'yes' and 'no' labels in y_train as follows:
play_yes_prob = y_train.value_counts()['yes'] / y_train.shape[0]
play_no_prob = y_train.value_counts()['no'] / y_train.shape[0]

This calculation yields the following prior probabilities:

• P(Play=yes) = 7/12 ≈ 0.5833

• P(Play=no) = 5/12 ≈ 0.4167

These values indicate the overall likelihood of playing or not playing before considering the conditions described by the features.

Task-2: Calculating Likelihood Probabilities


Likelihood probabilities quantify how probable certain feature values are, given each class in the target variable. For this task, we calculate probabilities like P(Outlook=Sunny | Play=yes) and P(Temperature=Mild | Play=yes), among others, for each combination of feature values and target classes (Play=yes and Play=no).
To compute these probabilities, we first combine the training features (X_train) with the training labels (y_train) into a single dataset. Then, for each feature and its unique values, we calculate the conditional probability given the target class. The implementation is as follows:
def calculate_likelihood(feature, value, target, dataset):
    subset = dataset[dataset['Play'] == target]
    count = subset[subset[feature] == value].shape[0]
    probability = count / subset.shape[0]
    return probability
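As an illustrative use of this helper, the nested likelihood table consumed later by the prediction function can be assembled as follows; train_data denotes the combined X_train and y_train frame mentioned above.

import pandas as pd

# Combine training features and labels into one DataFrame
train_data = pd.concat([X_train, y_train], axis=1)

likelihoods = {'yes': {}, 'no': {}}
for target in ['yes', 'no']:
    for feature in ['Outlook', 'Temp', 'Humidity', 'Windy']:
        likelihoods[target][feature] = {
            value: calculate_likelihood(feature, value, target, train_data)
            for value in train_data[feature].unique()
        }

# Example query: P(Outlook=Sunny | Play=yes)
print(likelihoods['yes']['Outlook'].get('Sunny', 0))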

Feature=Value       P(Value | Play=yes) (%)   P(Value | Play=no) (%)
Outlook=Rainy       28.57                     60.00
Outlook=Overcast    42.86                      0.00
Outlook=Sunny       28.57                     40.00
Temp=Hot            28.57                     40.00
Temp=Cool           42.86                     20.00
Temp=Mild           28.57                     40.00
Humidity=High       28.57                     80.00
Humidity=Normal     71.43                     20.00
Windy=f             71.43                     40.00
Windy=t             28.57                     60.00

Table 1: Likelihood Probabilities for Playing based on Weather Conditions

Based on the calculated likelihood probabilities, we draw the following conclusions about the decision-making process for playing outdoor sports:

• Outlook:

– Overcast conditions significantly increase the likelihood of playing.


– Rainy conditions have a higher association with not playing.

• Temperature:

– Cooler conditions are slightly more favorable, though temperature has a lesser
impact overall.

• Humidity:

– Normal humidity levels greatly favor the decision to play.


– High humidity is a strong deterrent against playing.

• Windy:

– Lack of wind is preferred for playing.


– Windy conditions increase the likelihood of not playing.

• General Insights:

– The ideal conditions for playing include overcast skies, normal humidity, and
calm winds.
– High humidity and windy conditions are the primary adverse factors.
– Understanding these patterns is crucial for making predictions with the Naive
Bayes classifier, especially in applications related to planning outdoor activities
based on weather conditions.

Task 3: Calculating Posterior Probabilities
In this task, we use the Naive Bayes formula to calculate the posterior probabilities for
both classes (Play = yes and Play = no) for a given test instance. The process involves
multiplying the prior probabilities by the likelihoods of each feature given the class, and
normalizing these products to get probabilities that sum to 1.

Procedure
Given a test instance, we perform the following steps:

1. Initialize the product of likelihoods for each class with the prior probability of the class.

2. For each feature in the test instance, multiply the current product by the likelihood of the feature value given the class.

3. Sum the products for both classes to get the total likelihood.

4. Normalize each class product by the total likelihood to get the posterior probabilities.
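Concretely, for a test instance x = (x1, . . . , xn), these steps compute

P(yes | x) = [ P(yes) · Π_i P(xi | yes) ] / [ P(yes) · Π_i P(xi | yes) + P(no) · Π_i P(xi | no) ],

and analogously for P(no | x); the denominator is the total likelihood from step 3. Applying this to the given test instance yields: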
• Posterior Probability of Playing (Yes): 81.999%
• Posterior Probability of Not Playing (No): 18.0005%
Decision: Based on the higher posterior probability, the predicted decision for the
given test instance is Play = yes.

Task 4: Making Predictions


The prediction is made by choosing the class (i.e., Play = yes or Play = no) with the
higher posterior probability for each test instance.

Prediction Function
This function calculates the posterior probabilities for both classes and makes a prediction
based on which probability is higher.
def predict_naive_bayes(test_instance, prior_yes, prior_no, likelihoods, features):
    likelihood_product_yes = prior_yes
    likelihood_product_no = prior_no

    for feature in features:
        value = test_instance[feature]
        likelihood_product_yes *= likelihoods['yes'][feature].get(value, 1)
        likelihood_product_no *= likelihoods['no'][feature].get(value, 1)

    total_likelihood = likelihood_product_yes + likelihood_product_no
    posterior_yes = likelihood_product_yes / total_likelihood
    posterior_no = likelihood_product_no / total_likelihood

    prediction = 'yes' if posterior_yes > posterior_no else 'no'
    return prediction
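As a usage sketch, the function can be applied to every test row with the priors from Task 1 and the likelihood table from Task 2 (variable names follow the earlier snippets):

features = ['Outlook', 'Temp', 'Humidity', 'Windy']

predictions = [
    predict_naive_bayes(row, play_yes_prob, play_no_prob, likelihoods, features)
    for _, row in X_test.iterrows()
]
accuracy = sum(pred == actual for pred, actual in zip(predictions, y_test)) / len(y_test)
print(predictions, accuracy)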

Sample Predictions

• Test instance 1: Details={'Outlook': 'Sunny', 'Temp': 'Mild', 'Humidity': 'Normal', 'Windy': 'f'}
  Predicted Play=yes

• Test instance 2: Details={'Outlook': 'Overcast', 'Temp': 'Mild', 'Humidity': 'High', 'Windy': 't'}
  Predicted Play=yes

Accuracy = 100%

Task 5: Likelihood Calculation with Laplace Smoothing
The likelihood of feature values given a class is recalculated with a pseudocount to prevent
zero probabilities. The formula used is:
P(xi | y) = (N_{xi,y} + α) / (N_y + α · D)

where N_{xi,y} is the count of feature value xi in class y, N_y is the total count of class y, α is the pseudocount, and D is the number of distinct values of feature xi.
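A smoothed version of the earlier likelihood helper, following this formula, is sketched below; the values reported in Table 2 are consistent with a pseudocount of roughly α = 0.1, so that is used as the default here.

def calculate_likelihood_laplace(feature, value, target, dataset, alpha=0.1):
    subset = dataset[dataset['Play'] == target]
    count = subset[subset[feature] == value].shape[0]      # N_{xi,y}
    n_distinct = dataset[feature].nunique()                # D: distinct values of the feature
    return (count + alpha) / (subset.shape[0] + alpha * n_distinct)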

Calculated Likelihoods with Laplace Smoothing

Table 2: Probabilities with Laplace Smoothing by Feature and Outcome

Feature     Value      Play = Yes   Play = No
Outlook     Rainy      0.2877       0.5849
Outlook     Overcast   0.4247       0.0189
Outlook     Sunny      0.2877       0.3962
Temp        Hot        0.2877       0.3962
Temp        Cool       0.4247       0.2075
Temp        Mild       0.2877       0.3962
Humidity    High       0.2917       0.7885
Humidity    Normal     0.7083       0.2115
Windy       False      0.7083       0.4038
Windy       True       0.2917       0.5962

Predictions with Laplace Smoothing


Predictions are made using updated likelihoods incorporating Laplace smoothing. The
prediction for each test instance is based on comparing the posterior probabilities for
each class.

• Test instance 1: Details={'Outlook': 'Sunny', 'Temp': 'Mild', 'Humidity': 'Normal', 'Windy': 'f'}
  Predicted Play=yes

• Test instance 2: Details={'Outlook': 'Overcast', 'Temp': 'Mild', 'Humidity': 'High', 'Windy': 't'}
  Predicted Play=yes

Accuracy = 100%


Comparison of Predictions
The impact of Laplace smoothing on predictions is analyzed by comparing likelihoods
and posterior probabilities with and without Laplace smoothing.

Table 3: Comparison of Probabilities Without and With Laplace Smoothing

Feature     Value      Play=Yes (no Laplace)   Play=Yes (Laplace)   Play=No (no Laplace)   Play=No (Laplace)
Outlook     Sunny      0.2857                  0.2877               0.4                    0.3962
Outlook     Rainy      0.2857                  0.2877               0.6                    0.5849
Outlook     Overcast   0.4286                  0.4247               0.0                    0.0189
Temp        Hot        0.2857                  0.2877               0.4                    0.3962
Temp        Cool       0.4286                  0.4247               0.2                    0.2075
Temp        Mild       0.2857                  0.2877               0.4                    0.3962
Humidity    Normal     0.7143                  0.7083               0.2                    0.2115
Humidity    High       0.2857                  0.2917               0.8                    0.7885
Windy       t          0.2857                  0.2917               0.6                    0.5962
Windy       f          0.7143                  0.7083               0.4                    0.4038

Observations
Below are the summarized observations:

• Laplace smoothing adjusts probabilities to ensure no zero probabilities, aiding in more accurate predictions for unseen conditions.

• The Outlook of Overcast significantly increases the probability of playing, indicating a strong preference for this weather condition.

• Cooler temperatures are preferred for playing, as seen from the higher probability under the Cool category.

• High humidity is a strong deterrent to playing, whereas normal humidity conditions are favorable.

• Windy conditions are less preferred for playing, as indicated by the lower probability when it is true (t).
