
Peer Review Assignment 2 - Part II¶

Date: 12 September 2021
• Work through the notebook, answer all questions, and do all problems
• You are allowed to consult the internet, and discuss on the module forum
• Your answers and solutions to the problems should be added to this notebook
• Submit your final work as an html file
• Note that the solutions to the problems were written using Python version 3.6.4.
Marking Scheme (Theoretical Questions)¶
• All questions are marked out of 3.
• No valid answer: 0 marks
• Demonstration of grasp of basic idea: 1 mark
• 'Perfect' answer: 3 marks
Marking Scheme (Practical Problems)¶
• All problems are marked out of 5.
• No valid answer: 0 marks
• Demonstration of grasp of basic idea: 2 marks
• Working code: 5 marks

Linear Discriminant Analysis (LDA)¶

The PCA encountered in the previous exercise can be viewed as a dimensionality
reduction scheme, projecting onto the directions with maximal variance.
LDA is also a dimensionality reduction scheme but operates on a very different
principle. Now we are given data that belongs to different classes. We are given both
the data value $x$ and a class label $y$. If we have $k$ classes then $y$ will take on
$k$ labels, in Python typically the values 0 through $k-1$.
The idea is to project the data onto a lower dimensional space in such a way that
maximal class separation is achieved in the lower dimensional space.
You can learn more about the scikit-learn implementation at http://scikit-
You will investigate the difference between PCA and LDA using the wine data set.
Since the wine dataset is 13-dimensional, the difference between PCA and LDA is
more pronounced than with, say, the Iris data set.
We project down to 2 dimensions for easy visualization. In fact, since there are only 3
classes, LDA produces at most $k-1 = 2$ discriminant directions, so one does not retain
any more information by using higher dimensions.
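The claim that two dimensions suffice for three classes can be checked numerically. The following is a minimal sketch of the classical Fisher formulation of LDA, not of the scikit-learn implementation. The synthetic data and all names (`scatter_matrices`, `S_w`, `S_b`, `W`) are assumptions made for illustration. It builds the within-class and between-class scatter matrices and shows that the between-class scatter has rank $k-1$, so only $k-1$ directions carry class information.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return within-class (S_w) and between-class (S_b) scatter matrices."""
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for cl in np.unique(y):
        X_cl = X[y == cl]
        mean_cl = X_cl.mean(axis=0)
        centered = X_cl - mean_cl
        S_w += centered.T @ centered
        diff = (mean_cl - overall_mean).reshape(-1, 1)
        S_b += X_cl.shape[0] * (diff @ diff.T)
    return S_w, S_b

# Synthetic data: 3 classes in 5 dimensions (illustrative, not the wine data)
rng = np.random.default_rng(0)
k, n_per_class, d = 3, 40, 5
class_means = rng.normal(scale=3.0, size=(k, d))
X = np.vstack([rng.normal(loc=m, size=(n_per_class, d)) for m in class_means])
y = np.repeat(np.arange(k), n_per_class)

S_w, S_b = scatter_matrices(X, y)

# Discriminant directions are eigenvectors of S_w^{-1} S_b
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
order = np.argsort(eigvals.real)[::-1]
eigvals = eigvals.real[order]
W = eigvecs.real[:, order[:k - 1]]   # keep the k-1 informative directions
X_lda = X @ W                        # projected data, one row per sample

# Rank with an explicit tolerance, to be robust against rounding error
rank_S_b = np.linalg.matrix_rank(S_b, tol=1e-8 * np.linalg.norm(S_b))
print("rank of S_b:", rank_S_b)
print("eigenvalues:", np.round(eigvals, 6))
```

Only the first $k-1$ eigenvalues are meaningfully different from zero, which is why asking LDA for more than 2 components on a 3-class problem adds nothing.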
Import packages¶
In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

import numpy as np
from matplotlib import pylab as plt
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Import different modules for use with the notebook
from IPython.display import display
from IPython.display import Image

Simple example¶
As a warmup run the example from the scikit-learn website.
In [2]:
# Create synthetic data
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

# Instantiate & fit the model: LDA

clf = LDA().fit(X, y)

print(clf.predict([[-0.8, -1]]))
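Besides classifying, a fitted LDA model can also project data, which is how it is used in the rest of this notebook. A short sketch on the same toy data follows (the variable name `X_1d` is illustrative). With two classes there is at most one discriminant direction, so `transform` returns a single column.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

clf = LDA().fit(X, y)

# Two classes give at most k - 1 = 1 discriminant direction
X_1d = clf.transform(X)
print(X_1d.shape)                  # (6, 1)
print(clf.predict([[-0.8, -1]]))   # [1]
```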

Loading the data¶

Read the data, extract the class labels from the last column, then extract the names of
the classes using the convenient set function in Python.
In [3]:
# import training data
wine_train = np.loadtxt('./data/wine/wine_train.txt',delimiter = ',')
wine_train_labels = wine_train[:,-1]
wine_train_classes = list(set(wine_train_labels))
wine_train_classes = np.array(wine_train_classes, dtype=int)
wine_train_labels = np.array(wine_train_labels, dtype = int)
wine_train = wine_train[:,:-1]

# import testing data

wine_test = np.loadtxt('./data/wine/wine_test.txt', delimiter = ',')
wine_test_labels = wine_test[:,-1]
wine_test_classes = list(set(wine_test_labels))
wine_test_classes = np.array(wine_test_classes, dtype=int)
wine_test_labels = np.array(wine_test_labels, dtype = int)
wine_test = wine_test[:, :-1]
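The label-extraction steps above can be tried without the data files. The sketch below substitutes a small in-memory table for `wine_train.txt` (the values are made up for illustration) and uses `np.unique`, which returns the distinct labels in sorted order, whereas the iteration order of a `set` is not guaranteed.

```python
import io
import numpy as np

# Made-up stand-in for the CSV file: two features, class label in the last column
fake_file = io.StringIO(
    "14.2,1.7,1\n"
    "12.4,2.1,2\n"
    "13.1,3.3,3\n"
    "13.9,1.9,1\n"
)

data = np.loadtxt(fake_file, delimiter=',')
labels = data[:, -1].astype(int)   # last column holds the class label
features = data[:, :-1]            # remaining columns are the features
classes = np.unique(labels)        # sorted array of distinct labels

print(features.shape)   # (4, 2)
print(labels)           # [1 2 3 1]
print(classes)          # [1 2 3]
```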

Problem 1: (5 marks)¶
Project the data onto 2 PCA components and display the classes of the dimension-
reduced data.
You should see something like:
In [4]:
# Insert code to produce the image below

# fit the model on training data

pca = PCA(n_components=2).fit(wine_train)
pr_data = pca.transform(wine_test)

# Plot the 3 classes

col = ['r*','yo','k+']
for cl in wine_test_classes:
    cl_labels = np.array([wine_test_labels==cl]).flatten()
    dat_cl = pr_data[cl_labels,:]
    plt.plot(dat_cl[:,0], dat_cl[:,1], col[int(cl-1)])

plt.title('The projection onto 2 PCA components')
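The fit-on-training, transform-on-test pattern can also be tried without the assignment's data files. The sketch below assumes that `sklearn.datasets.load_wine` is an acceptable stand-in for them: it has 13 features and 3 classes, but its labels run from 0 to 2 and the train/test split here is our own, so the numbers need not match the assignment's. It also prints the explained variance ratio, which indicates how much of the total variance the two retained directions capture.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Stand-in data (assumption: not the assignment's wine_train.txt / wine_test.txt)
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

pca = PCA(n_components=2).fit(X_train)   # learn the directions from training data only
X_test_pca = pca.transform(X_test)       # apply the same projection to unseen data

print(X_test_pca.shape)
print(pca.explained_variance_ratio_)
```

On unscaled wine data the first component accounts for almost all of the variance, because one feature (proline) is measured on a much larger scale than the others. PCA ignores the class labels, so a high-variance direction is not necessarily one that separates the classes.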

In [5]:

Problem 2: (5 marks)¶
Fit an LDA model to the data, using 2 components and display the different classes of
the projected data.
You should see:
In [6]:
# Insert code to produce the image below

# Fit LDA on training data

lda = LDA(n_components=2).fit(wine_train, wine_train_labels)

# Transform training and test data

wine_train_lda = lda.transform(wine_train)
wine_test_lda = lda.transform(wine_test)

# Plot the 3 classes

col = ['r*','yo','k+']
means = np.zeros((2,3))

for cl in wine_train_classes:
    cl_labels = np.array([wine_test_labels == cl]).flatten()
    wine_cl = wine_test_lda[cl_labels, :]

    means[:, int(cl-1)] = np.mean(wine_cl, axis=0)

    plt.plot(wine_cl[:,0], wine_cl[:,1], col[int(cl-1)])

plt.title('Transformation of test data, 2 LDA components')

In [7]:

Note that the LDA projection is much better than the PCA projection at preserving the class structure.
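The visual impression can be backed by a rough number. The sketch below is one possible check, not part of the assignment: it projects onto 2 components with each method and scores a 3-nearest-neighbour classifier on the projected test set. The use of `load_wine` as a stand-in for the assignment's files, the split, the choice of classifier, and the helper name `projected_accuracy` are all assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption: not the assignment's wine_train.txt / wine_test.txt)
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

def projected_accuracy(projector):
    """Fit the projector on the training set, then score k-NN on the 2-D test projection."""
    projector.fit(X_train, y_train)   # PCA accepts and ignores y; LDA uses it
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(projector.transform(X_train), y_train)
    return knn.score(projector.transform(X_test), y_test)

acc_pca = projected_accuracy(PCA(n_components=2))
acc_lda = projected_accuracy(LDA(n_components=2))

print("accuracy on PCA projection:", acc_pca)
print("accuracy on LDA projection:", acc_lda)
```

One would expect the LDA projection to score higher here, since it uses the labels to choose its directions, but the exact values depend on the split and the classifier.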