
Peer Review Assignment 2 - Part II¶

Name:
Date: 12 September 2021
Instructions¶
• Work through the notebook, answer all questions, and do all problems
• You are allowed to consult the internet, and discuss on the module forum
• Your answers and solutions to the problems should be added to this notebook
• Submit your final work as an html file
• Note that the solutions to the problems used Python version 3.6.4.
Marking Scheme (Theoretical Questions)¶
• All questions are marked out of 3.
• No valid answer: 0 marks
• Demonstration of grasp of basic idea: 1 mark
• 'Perfect' answer: 3 marks
Marking Scheme (Practical Problems)¶
• All problems are marked out of 5.
• No valid answer: 0 marks
• Demonstration of grasp of basic idea: 2 marks
• Working code: 5 marks

Linear Discriminant Analysis (LDA)¶


The PCA encountered in the previous exercise can be viewed as a dimensionality
reduction scheme, projecting onto the directions with maximal variance.
LDA is also a dimensionality reduction scheme, but it operates on a very different
principle. Now we are given data that belongs to different classes: both the data
value $x$ and a class label $y$. If we have $k$ classes then $y$ takes on
$k$ labels, in Python typically the values 0 through $k-1$.
The idea is to project the data onto a lower dimensional space in such a way that
maximal class separation is achieved in the lower dimensional space.
You can learn more about the scikit-learn implementation at http://scikit-learn.org/stable/modules/generated/sklearn.lda.LDA.html
You will investigate the difference between PCA and LDA using the wine data set; for more
information see http://archive.ics.uci.edu/ml/datasets/Wine. Since the wine dataset is
13-dimensional, the difference between PCA and LDA is more pronounced than with, say,
the Iris data set.
We project down to 2 dimensions for easy visualization. In fact, since there are only 3
classes, LDA yields at most $k-1 = 2$ discriminant directions, so one does not retain
any more information by using higher dimensions (see the sketch below).
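As a quick illustration of the $k-1$ cap, here is a minimal sketch on synthetic data. The random 13-dimensional features below are made up purely for illustration; note that recent scikit-learn versions raise a ValueError when too many components are requested, while some older versions silently capped the value.
In [ ]:
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

np.random.seed(0)
X = np.random.randn(30, 13)        # 13 synthetic features, like the wine data
y = np.repeat([0, 1, 2], 10)       # 3 classes, so at most k - 1 = 2 components

Z = LDA(n_components=2).fit_transform(X, y)
print(Z.shape)                     # (30, 2)

try:
    LDA(n_components=3).fit(X, y)  # 3 > k - 1 = 2
except ValueError as err:
    print(err)                     # recent scikit-learn rejects this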
Import packages¶
In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Import different modules for use with the notebook
from IPython.display import display
from IPython.display import Image

Simple example¶
As a warm-up, run the example from the scikit-learn website.
In [2]:
# Create synthetic data
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

# Instantiate & fit the model: LDA


clf = LDA()
clf.fit(X, y)

print(clf.predict([[-0.8, -1]]))
[1]
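A short follow-on sketch, reusing the clf and X fitted above: with only $k = 2$ classes the fitted model projects onto a single discriminant direction, so transform() returns one coordinate per sample.
In [ ]:
Z = clf.transform(X)
print(Z.shape)    # (6, 1): k - 1 = 1 component
print(Z.ravel())  # for this symmetric toy data the two classes should land on opposite sides of 0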

Loading the data¶


Read the data, extract the class labels from the last column, then extract the names of
the classes using the convenient set function in Python.
In [3]:
# import training data
wine_train = np.loadtxt('./data/wine/wine_train.txt',delimiter = ',')
wine_train_labels = wine_train[:,-1]
wine_train_classes = list(set(wine_train_labels))
wine_train_classes = np.array(wine_train_classes, dtype=int)
wine_train_labels = np.array(wine_train_labels, dtype = int)
wine_train = wine_train[:,:-1]

# import testing data


wine_test = np.loadtxt('./data/wine/wine_test.txt', delimiter = ',')
wine_test_labels = wine_test[:,-1]
wine_test_classes = list(set(wine_test_labels))
wine_test_classes = np.array(wine_test_classes, dtype=int)
wine_test_labels = np.array(wine_test_labels, dtype = int)
wine_test = wine_test[:, :-1]
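A quick sanity check on the loaded splits can be useful here. The printed sizes depend on the files in ./data/wine/; the wine data has 13 feature columns and 3 classes.
In [ ]:
print(wine_train.shape, wine_test.shape)      # expected: (n_train, 13) (n_test, 13)
print(wine_train_classes, wine_test_classes)  # the same 3 class labels in both splits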

PCA¶
Problem 1: (5 marks)¶
Project the data onto 2 PCA components and display the classes of the dimension-reduced data.
You should see something like:
In [4]:
# Insert code to produce the image below

# Fit the model on training data, then project the test data
pca = PCA(n_components=2)
pca.fit(wine_train)
pr_data = pca.transform(wine_test)

# Plot the 3 classes


col = ['r*','yo','k+']
for cl in wine_test_classes:
    cl_labels = wine_test_labels == cl      # boolean mask for class cl
    dat_cl = pr_data[cl_labels, :]
    plt.plot(dat_cl[:, 0], dat_cl[:, 1], col[int(cl - 1)])

plt.title('The projection onto 2 PCA components')


plt.show()

In [5]:
display(Image(filename='./Wine_PCA.png'))
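An optional variant, not required by the problem: PCA is sensitive to feature scale, and the wine features have very different ranges, so a common alternative standardizes the features before projecting. A minimal sketch, reusing col and the arrays loaded above:
In [ ]:
from sklearn.preprocessing import StandardScaler

# Standardize using the training data, then project both splits
scaler = StandardScaler().fit(wine_train)
pca_std = PCA(n_components=2).fit(scaler.transform(wine_train))
pr_std = pca_std.transform(scaler.transform(wine_test))

for cl in wine_test_classes:
    mask = wine_test_labels == cl
    plt.plot(pr_std[mask, 0], pr_std[mask, 1], col[int(cl - 1)])
plt.title('PCA on standardized features')
plt.show()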

LDA¶
Problem 2: (5 marks)¶
Fit an LDA model to the data, using 2 components and display the different classes of
the projected data.
You should see:
In [6]:
# Insert code to produce the image below

# Fit LDA on training data


lda = LDA(n_components=2)
lda.fit(wine_train, wine_train_labels)

# Transform training and test data


wine_train_lda = lda.transform(wine_train)
wine_test_lda = lda.transform(wine_test)

# Plot the 3 classes


col = ['r*','yo','k+']
means = np.zeros((2,3))

for cl in wine_train_classes:
    cl_labels = wine_test_labels == cl      # boolean mask for class cl
    wine_cl = wine_test_lda[cl_labels, :]

    # store the class mean in the LDA plane (kept for reference)
    means[:, int(cl - 1)] = np.mean(wine_cl, axis=0)

    plt.plot(wine_cl[:, 0], wine_cl[:, 1], col[int(cl - 1)])

plt.title('Transformation of test data, 2 LDA components')


plt.show()

In [7]:
display(Image(filename='./LDA_pr.png'))
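As a side note, LDA is also a classifier, so one can quickly check how well the fitted model separates the held-out split (the exact numbers depend on the data files):
In [ ]:
print('test accuracy:', lda.score(wine_test, wine_test_labels))
pred = lda.predict(wine_test)
print('misclassified:', int(np.sum(pred != wine_test_labels)))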

Note that the LDA projection is much better at preserving the class structure.