
Peer Review Assignment 2 - Part II¶

Name:
Date: 12 September 2021
Instructions¶
• Work through the notebook, answer all questions, and do all problems
• You are allowed to consult the internet, and discuss on the module forum
• Your answers and solutions to the problems should be added to this notebook
• Submit your final work as an html file
• Note that the solutions to the problems used Python version 3.6.4.
Marking Scheme (Theoretical Questions)¶
• All questions are marked out of 3.
• No valid answer: 0 marks
• Demonstration of grasp of basic idea: 1 mark
• 'Perfect' answer: 3 marks
Marking Scheme (Practical Problems)¶
• All problems are marked out of 5.
• No valid answer: 0 marks
• Demonstration of grasp of basic idea: 2 marks
• Working code: 5 marks

Linear Discriminant Analysis (LDA)¶


The PCA encountered in the previous exercise can be viewed as a dimensionality
reduction scheme, projecting onto the directions with maximal variance.
LDA is also a dimensionality reduction scheme, but it operates on a very different
principle. Now we are given data that belongs to different classes: both the data
value $x$ and a class label $y$. If we have $k$ classes then $y$ takes on
$k$ labels, in Python typically the values 0 through $k-1$.
The idea is to project the data onto a lower dimensional space in such a way that
maximal class separation is achieved in the lower dimensional space.
You can learn more about the scikit-learn implementation at http://scikit-learn.org/stable/modules/generated/sklearn.lda.LDA.html
You will investigate the difference between PCA and LDA using the wine data set; for more
information see http://archive.ics.uci.edu/ml/datasets/Wine. Since the wine dataset is
13-dimensional, the difference between PCA and LDA is more pronounced than with, say,
the Iris data set.
We project down to 2 dimensions for easy visualization. In fact, since there are only 3
classes, LDA yields at most $k-1 = 2$ discriminant directions, so one does not retain
any more information by using higher dimensions (see the sketch below).
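As a quick illustration of the $k-1$ cap, here is a minimal sketch on synthetic data. The random 13-dimensional features below are made up purely for illustration; note that recent scikit-learn versions raise a ValueError when too many components are requested, while some older versions silently capped the value.
In [ ]:
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

np.random.seed(0)
X = np.random.randn(30, 13)        # 13 synthetic features, like the wine data
y = np.repeat([0, 1, 2], 10)       # 3 classes, so at most k - 1 = 2 components

Z = LDA(n_components=2).fit_transform(X, y)
print(Z.shape)                     # (30, 2)

try:
    LDA(n_components=3).fit(X, y)  # 3 > k - 1 = 2
except ValueError as err:
    print(err)                     # recent scikit-learn rejects this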
Import packages¶
In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Import different modules for use with the notebook
from IPython.display import display
from IPython.display import Image

Simple example¶
As a warm-up, run the example from the scikit-learn website.
In [2]:
# Create synthetic data
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

# Instantiate & fit the model: LDA


clf = LDA()
clf.fit(X, y)

print(clf.predict([[-0.8, -1]]))
[1]
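A short follow-on sketch, reusing the clf and X fitted above: with only $k = 2$ classes the fitted model projects onto a single discriminant direction, so transform() returns one coordinate per sample.
In [ ]:
Z = clf.transform(X)
print(Z.shape)    # (6, 1): k - 1 = 1 component
print(Z.ravel())  # for this symmetric toy data the two classes should land on opposite sides of 0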

Loading the data¶


Read the data, extract the class labels from the last column, then extract the names of
the classes using the convenient set function in Python.
In [3]:
# import training data
wine_train = np.loadtxt('./data/wine/wine_train.txt',delimiter = ',')
wine_train_labels = wine_train[:,-1]
wine_train_classes = list(set(wine_train_labels))
wine_train_classes = np.array(wine_train_classes, dtype=int)
wine_train_labels = np.array(wine_train_labels, dtype = int)
wine_train = wine_train[:,:-1]

# import testing data


wine_test = np.loadtxt('./data/wine/wine_test.txt', delimiter = ',')
wine_test_labels = wine_test[:,-1]
wine_test_classes = list(set(wine_test_labels))
wine_test_classes = np.array(wine_test_classes, dtype=int)
wine_test_labels = np.array(wine_test_labels, dtype = int)
wine_test = wine_test[:, :-1]
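A quick sanity check on the loaded splits can be useful here. The printed sizes depend on the files in ./data/wine/; the wine data has 13 feature columns and 3 classes.
In [ ]:
print(wine_train.shape, wine_test.shape)      # expected: (n_train, 13) (n_test, 13)
print(wine_train_classes, wine_test_classes)  # the same 3 class labels in both splits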

PCA¶
Problem 1: (5 marks)¶
Project the data onto 2 PCA components and display the classes of the dimension-reduced data.
You should see something like:
In [4]:
# Insert code to produce the image below

# Fit the model on training data, then project the test data
pca = PCA(n_components=2)
pca.fit(wine_train)
pr_data = pca.transform(wine_test)

# Plot the 3 classes


col = ['r*','yo','k+']
for cl in wine_test_classes:
    cl_labels = wine_test_labels == cl      # boolean mask for class cl
    dat_cl = pr_data[cl_labels, :]
    plt.plot(dat_cl[:, 0], dat_cl[:, 1], col[int(cl - 1)])

plt.title('The projection onto 2 PCA components')


plt.show()

In [5]:
display(Image(filename='./Wine_PCA.png'))
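An optional variant, not required by the problem: PCA is sensitive to feature scale, and the wine features have very different ranges, so a common alternative standardizes the features before projecting. A minimal sketch, reusing col and the arrays loaded above:
In [ ]:
from sklearn.preprocessing import StandardScaler

# Standardize using the training data, then project both splits
scaler = StandardScaler().fit(wine_train)
pca_std = PCA(n_components=2).fit(scaler.transform(wine_train))
pr_std = pca_std.transform(scaler.transform(wine_test))

for cl in wine_test_classes:
    mask = wine_test_labels == cl
    plt.plot(pr_std[mask, 0], pr_std[mask, 1], col[int(cl - 1)])
plt.title('PCA on standardized features')
plt.show()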

LDA¶
Problem 2: (5 marks)¶
Fit an LDA model to the data, using 2 components and display the different classes of
the projected data.
You should see:
In [6]:
# Insert code to produce the image below

# Fit LDA on training data


lda = LDA(n_components=2)
lda.fit(wine_train, wine_train_labels)

# Transform training and test data


wine_train_lda = lda.transform(wine_train)
wine_test_lda = lda.transform(wine_test)

# Plot the 3 classes


col = ['r*','yo','k+']
means = np.zeros((2,3))

for cl in wine_train_classes:
    cl_labels = wine_test_labels == cl      # boolean mask for class cl
    wine_cl = wine_test_lda[cl_labels, :]

    # store the class mean in the LDA plane (kept for reference)
    means[:, int(cl - 1)] = np.mean(wine_cl, axis=0)

    plt.plot(wine_cl[:, 0], wine_cl[:, 1], col[int(cl - 1)])

plt.title('Transformation of test data, 2 LDA components')


plt.show()

In [7]:
display(Image(filename='./LDA_pr.png'))
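As a side note, LDA is also a classifier, so one can quickly check how well the fitted model separates the held-out split (the exact numbers depend on the data files):
In [ ]:
print('test accuracy:', lda.score(wine_test, wine_test_labels))
pred = lda.predict(wine_test)
print('misclassified:', int(np.sum(pred != wine_test_labels)))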

Note that the LDA projection is much better at preserving the class structure.