
COMP2712 Heuristic Optimisation

Dr Trent Lewis
College of Science and Engineering

Feature Level Processing


Feature Level Processing
• Feature Normalisation (e.g. min/max, z-score)
• Feature Reduction (e.g. PCA)
Feature Normalisation
[Figure: bar chart of Feature A (elephant) and Feature B (mouse) on very different scales]

It is hard for the mouse to get noticed, as its scale is so different to the elephant's.

The values of inputs * weights for Feature A will dominate the summation in the
next layer compared to the Feature B values.

This will make convergence difficult, especially if Feature B is important for
class separability.
Feature Normalisation

[Figure: the same bar chart after normalisation - Feature A and Feature B on a common scale]

Ah, now we can see the mouse!

Now that both Feature A and Feature B are on the same scale, convergence will be
quicker, as the values are comparable.
Feature Normalisation
• Z-score: zero mean and unit standard deviation

• Min-Max scaling: [0, 1]


https://colab.research.google.com/drive/1nbxaVa7YElj9EdG78q1iHhXYDzIjLBH0?usp=sharing

Feature Normalisation
z = \frac{x - \mu}{\sigma}

X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}
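
A minimal sketch of both normalisations using scikit-learn (not necessarily what the Colab notebook does; the toy array X is illustrative):

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# toy data: Feature A on a much larger scale than Feature B
X = np.array([[1000.0, 0.2],
              [1500.0, 0.5],
              [2000.0, 0.1]])

# z-score: zero mean, unit standard deviation per feature
X_z = StandardScaler().fit_transform(X)

# min-max: rescale each feature to [0, 1]
X_mm = MinMaxScaler().fit_transform(X)

print(X_z.mean(axis=0), X_z.std(axis=0))   # approx. [0 0] and [1 1]
print(X_mm.min(axis=0), X_mm.max(axis=0))  # [0 0] and [1 1]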
Feature Reduction
• Curse of Dimensionality → overfitting
• Correlations between features → misleading results
• Principal Component Analysis (PCA) to the rescue
• Reduces the original data features to uncorrelated principal
components
• Each component represents a different set of correlated
features with a different amount of variation.
• “Retain components that account for 90% of the variation”
• Depends on the data
• Could reduce from hundreds of features down to tens
https://colab.research.google.com/drive/1rOL7B6PGb-bovZ7z26K0daqTCzErZJpX?usp=sharing
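
A minimal sketch of variance-based reduction with scikit-learn, using the Breast Cancer dataset from the following slides ("retain 90% of the variation", as above):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_breast_cancer(return_X_y=True)  # 30 original features

# standardise first - PCA is sensitive to feature scale
X_std = StandardScaler().fit_transform(X)

# a float n_components keeps as many components as needed
# to explain at least that fraction of the variance
pca = PCA(n_components=0.90)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                      # far fewer than 30 columns
print(pca.explained_variance_ratio_.sum())  # >= 0.90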

Feature Reduction: Breast Cancer

[Figures: PCA applied to the Breast Cancer dataset - see the Colab notebooks]

https://colab.research.google.com/drive/1rOL7B6PGb-bovZ7z26K0daqTCzErZJpX?usp=sharing
https://colab.research.google.com/drive/1CjzTBsyt7FsrPb4KCh0qfZEsvHpGTCBq?usp=sharing
https://medium.com/apprentice-journal/pca-application-in-machine-learning-4827c07a61db

Feature Reduction: PCA Limitations

• Model performance: PCA can reduce model performance on
datasets with little or no feature correlation, or that do not meet
the assumption of linearity.
• Classification accuracy: the variance-based PCA framework does
not consider the differentiating characteristics of the classes; the
information that distinguishes one class from another might lie in
the low-variance components and may be discarded.
• Outliers: PCA is also affected by outliers, so normalisation of the
data needs to be an essential component of any workflow.
• Interpretability: each principal component is a combination of the
original features and does not allow individual feature importance
to be recognised.
Output Transformation
• Categorical variables are often called nominal
• Some examples include:
• A “pet” variable with the values: “dog” and “cat”
• A “color” variable with the values: “red”, “green” and “blue”
• A “place” variable with the values: “first”, “second” and “third”
• A “passing grade”: “fail”, “pass”
• An “iris variety”: “Iris-setosa”, “Iris-versicolor”, “Iris-virginica”
• Each value represents a different category
• Classifiers (like an MLP) need numbers!
https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
Output Transformation
• Two Solutions
• Integer Encoding
• each unique category value is assigned an integer value.
• for example, “red” is 1, “green” is 2, and “blue” is 3.
• okay for ordinal (order matters), but not for true nominal
• One-Hot Encoding
• a new binary variable is added for each unique integer value.
• In the “color” variable example, there are 3 categories and therefore 3
binary variables are needed. A “1” value is placed in the binary variable
for that color and “0” values for the other colors.
import pandas as pd
import tensorflow as tf

# one hot encoding with pandas
y_oh = pd.get_dummies(df['class']).values
# one hot encoding - much easier using keras!
train_labels_oh = tf.keras.utils.to_categorical(train_labels)
Output Transformation

Original data:

Instance   Class
1          “Red”
2          “Green”
3          “Blue”
4          “Green”

Integer Encoding:

Instance   Class
1          1
2          2
3          3
4          2

One-Hot Encoding:

Instance   Red   Green   Blue
1          1     0       0
2          0     1       0
3          0     0       1
4          0     1       0
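
A minimal pandas sketch reproducing the tables above (the column name 'class' is illustrative):

import pandas as pd

df = pd.DataFrame({'class': ['Red', 'Green', 'Blue', 'Green']})

# integer encoding: assign each unique category an integer,
# matching the slide's mapping (Red=1, Green=2, Blue=3)
df['class_int'] = df['class'].map({'Red': 1, 'Green': 2, 'Blue': 3})

# one-hot encoding: one binary column per category
# (pd.get_dummies orders columns alphabetically: Blue, Green, Red)
one_hot = pd.get_dummies(df['class'])
print(df)
print(one_hot)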
Output Transformation
• Integer Encoding

[Figure: MLP with input, hidden, and output layers]

A single output neuron whose value ranges over the classes, e.g. [1, 4]

• One-Hot Encoding

[Figure: MLP with input, hidden, and output layers]

An output neuron for each class, each with a value in [0, 1];
the targets are the one-hot vectors [1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1]
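
A minimal Keras sketch of the two output layouts (layer sizes and feature count are illustrative):

import tensorflow as tf

n_features, n_classes = 4, 4  # illustrative sizes

# integer-encoded target: a single output neuron predicting the class value
integer_style = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1),
])
integer_style.compile(optimizer='adam', loss='mse')

# one-hot targets: one output neuron per class, softmax values in [0, 1]
one_hot_style = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(n_classes, activation='softmax'),
])
one_hot_style.compile(optimizer='adam', loss='categorical_crossentropy')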
Example
• Google Colaboratory Examples
• COMP2712 ML Review Feature Normalisation: Output
https://colab.research.google.com/drive/1LkHD_QTzqhmo6URWHxwlLXE8rytW02kD?usp=sharing

• COMP2712 Exploring feature reduction with PCA with ML


https://colab.research.google.com/drive/1TqDnU8D5M4mNd9hmpDJRofiuj7V9l5Gc?usp=sharing
