
COMP2712 Heuristic Optimisation

Dr Trent Lewis
College of Science and Engineering

Feature Level Processing


Feature Level Processing
• Feature Normalisation (e.g. min/max, z-score)
• Feature Reduction (e.g. PCA)
Feature Normalisation
[Figure: bar chart of Feature A (elephant) and Feature B (mouse) on very different scales]

It is hard for the mouse to get noticed, as its scale is so different to the elephant's.

The values of inputs * weights for Feature A will dominate the summation in the
next layer compared to the Feature B values.

This will make convergence difficult, especially if Feature B is important for
class separability.
Feature Normalisation

[Figure: the same bar chart after normalisation - Feature A and Feature B on a common scale]

Ah, now we can see the mouse!

Now that both Feature A and Feature B are on the same scale, convergence will be
quicker, as the values are comparable.
Feature Normalisation
• Z-score: zero mean and unit standard deviation

• Min-Max scaling: [0, 1]


https://colab.research.google.com/drive/1nbxaVa7YElj9EdG78q1iHhXYDzIjLBH0?usp=sharing

Feature Normalisation
z = \frac{x - \mu}{\sigma}

X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}
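
A minimal sketch of both normalisations using scikit-learn (not necessarily what the Colab notebook does; the toy array X is illustrative):

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# toy data: Feature A on a much larger scale than Feature B
X = np.array([[1000.0, 0.2],
              [1500.0, 0.5],
              [2000.0, 0.1]])

# z-score: zero mean, unit standard deviation per feature
X_z = StandardScaler().fit_transform(X)

# min-max: rescale each feature to [0, 1]
X_mm = MinMaxScaler().fit_transform(X)

print(X_z.mean(axis=0), X_z.std(axis=0))   # approx. [0 0] and [1 1]
print(X_mm.min(axis=0), X_mm.max(axis=0))  # [0 0] and [1 1]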
Feature Reduction
• Curse of Dimensionality → overfitting
• Correlations between features → misleading results
• Principal Component Analysis (PCA) to the rescue
• Reduces the original data features to uncorrelated principal
components
• Each component represents a different set of correlated
features with a different amount of variation.
• “Retain components that account for 90% of the variation”
• Depends on the data
• Could reduce from hundreds of features down to tens
https://colab.research.google.com/drive/1rOL7B6PGb-bovZ7z26K0daqTCzErZJpX?usp=sharing
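
A minimal sketch of variance-based reduction with scikit-learn, using the Breast Cancer dataset from the following slides ("retain 90% of the variation", as above):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_breast_cancer(return_X_y=True)  # 30 original features

# standardise first - PCA is sensitive to feature scale
X_std = StandardScaler().fit_transform(X)

# a float n_components keeps as many components as needed
# to explain at least that fraction of the variance
pca = PCA(n_components=0.90)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                      # far fewer than 30 columns
print(pca.explained_variance_ratio_.sum())  # >= 0.90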

Feature Reduction: Breast Cancer

[Figures: PCA applied to the Breast Cancer dataset - see the Colab notebooks]

https://colab.research.google.com/drive/1rOL7B6PGb-bovZ7z26K0daqTCzErZJpX?usp=sharing
https://colab.research.google.com/drive/1CjzTBsyt7FsrPb4KCh0qfZEsvHpGTCBq?usp=sharing
https://medium.com/apprentice-journal/pca-application-in-machine-learning-4827c07a61db

Feature Reduction: PCA Limitations

• Model performance: PCA can reduce model performance on
datasets with little or no feature correlation, or that do not meet
the assumption of linearity.
• Classification accuracy: the variance-based PCA framework does
not consider the differentiating characteristics of the classes; the
information that distinguishes one class from another might lie in
the low-variance components and may be discarded.
• Outliers: PCA is also affected by outliers, so normalisation of the
data needs to be an essential component of any workflow.
• Interpretability: each principal component is a combination of the
original features and does not allow individual feature importance
to be recognised.
Output Transformation
• Categorical variables are often called nominal
• Some examples include:
• A “pet” variable with the values: “dog” and “cat”
• A “color” variable with the values: “red”, “green” and “blue”
• A “place” variable with the values: “first”, “second” and “third”
• A “passing grade”: “fail”, “pass”
• An “iris variety”: “Iris-setosa”, “Iris-versicolor”, “Iris-virginica”
• Each value represents a different category
• Classifiers (like an MLP) need numbers!
https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
Output Transformation
• Two Solutions
• Integer Encoding
• each unique category value is assigned an integer value.
• for example, “red” is 1, “green” is 2, and “blue” is 3.
• okay for ordinal (order matters), but not for true nominal
• One-Hot Encoding
• a new binary variable is added for each unique integer value.
• In the “color” variable example, there are 3 categories and therefore 3
binary variables are needed. A “1” value is placed in the binary variable
for that color and “0” values for the other colors.
import pandas as pd
import tensorflow as tf

# one hot encoding with pandas
y_oh = pd.get_dummies(df['class']).values
# one hot encoding - much easier using keras!
train_labels_oh = tf.keras.utils.to_categorical(train_labels)
Output Transformation

Original data:

Instance   Class
1          “Red”
2          “Green”
3          “Blue”
4          “Green”

Integer Encoding:

Instance   Class
1          1
2          2
3          3
4          2

One-Hot Encoding:

Instance   Red   Green   Blue
1          1     0       0
2          0     1       0
3          0     0       1
4          0     1       0
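
A minimal pandas sketch reproducing the tables above (the column name 'class' is illustrative):

import pandas as pd

df = pd.DataFrame({'class': ['Red', 'Green', 'Blue', 'Green']})

# integer encoding: assign each unique category an integer,
# matching the slide's mapping (Red=1, Green=2, Blue=3)
df['class_int'] = df['class'].map({'Red': 1, 'Green': 2, 'Blue': 3})

# one-hot encoding: one binary column per category
# (pd.get_dummies orders columns alphabetically: Blue, Green, Red)
one_hot = pd.get_dummies(df['class'])
print(df)
print(one_hot)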
Output Transformation
• Integer Encoding

[Figure: MLP with input, hidden, and output layers]

A single output neuron whose value ranges over the classes, e.g. [1, 4]

• One-Hot Encoding

[Figure: MLP with input, hidden, and output layers]

An output neuron for each class, each with a value in [0, 1];
the targets are the one-hot vectors [1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1]
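
A minimal Keras sketch of the two output layouts (layer sizes and feature count are illustrative):

import tensorflow as tf

n_features, n_classes = 4, 4  # illustrative sizes

# integer-encoded target: a single output neuron predicting the class value
integer_style = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1),
])
integer_style.compile(optimizer='adam', loss='mse')

# one-hot targets: one output neuron per class, softmax values in [0, 1]
one_hot_style = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(n_classes, activation='softmax'),
])
one_hot_style.compile(optimizer='adam', loss='categorical_crossentropy')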
Example
• Google Colaboratory Examples
• COMP2712 ML Review Feature Normalisation: Output
https://colab.research.google.com/drive/1LkHD_QTzqhmo6URWHxwlLXE8rytW02kD?usp=sharing

• COMP2712 Exploring feature reduction with PCA with ML


https://colab.research.google.com/drive/1TqDnU8D5M4mNd9hmpDJRofiuj7V9l5Gc?usp=sharing
