The Problem with Accuracy
[Meme: a classifier labels everything "CAT" ("Everything is a CAT => 99% accuracy") while the dog asks "What about me?"]
• Unbalanced class distribution
• Truth: 80 cats (0), 20 dogs (1)
• Predict: 99 cats, 1 dog
Accuracy : 0.81
Precision : 0.81
Recall : 1.00
F1 : 0.89
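A minimal scikit-learn sketch checking these numbers, assuming the single predicted dog lands on a real dog and treating "cat" (0) as the positive class, as the slide does:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Truth: 80 cats (0), 20 dogs (1)
y_true = [0] * 80 + [1] * 20
# Prediction: 99 cats, 1 dog (the one dog call is correct; 19 real dogs are called cats)
y_pred = [0] * 80 + [1] + [0] * 19

print(accuracy_score(y_true, y_pred))                # 0.81
print(precision_score(y_true, y_pred, pos_label=0))  # 80/99 ≈ 0.81
print(recall_score(y_true, y_pred, pos_label=0))     # 80/80 = 1.00
print(f1_score(y_true, y_pred, pos_label=0))         # ≈ 0.89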
https://www.kaggle.com/residentmario/full-batch-mini-batch-and-online-learning
Introduction to Machine Learning with Python: A Guide for Data Scientists
Generalisation Error
Just avoid overfitting!
Example
• Google Colaboratory Example
• COMP2712 Evaluating Machine Learning
https://colab.research.google.com/drive/1tbbjAMc9QetoYQRsB19KagXBz6d9Pqwl?usp=sharing
Feature Normalisation
[Figure: Feature A vs Feature B scatter plots, before and after normalisation – "Ah, now we can see the mouse!"]
• Z-score: zero mean and unit standard deviation
Feature Normalisation
• Z-score: $z = \dfrac{x - \mu}{\sigma}$
• Min-max (rescales to $[0, 1]$): $X_{norm} = \dfrac{X - X_{min}}{X_{max} - X_{min}}$
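A minimal NumPy sketch of both rescalings (the feature values are illustrative):

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])  # illustrative feature values

# Z-score: zero mean, unit standard deviation
z = (x - x.mean()) / x.std()

# Min-max: rescale into [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())

print(z)       # mean ≈ 0, std ≈ 1
print(x_norm)  # [0.   0.333... 0.666... 1.]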
Curse of dimensionality
• Model fitting
• determine the relationship between the predictors and the outcome so that
future values can be predicted.
• The more predictors a model has, the more the model can learn from data.
• Real data contains random noise, redundancies, etc.
• The more predictors, the higher the probability that the model will learn fake
patterns within the data
• it GENERALISES THE NOISE
• OVERFITTING
• With fewer predictors,
• the model may not learn enough information
• UNDERFITTING
• SOLUTION:
• find an appropriate balance between simplicity and complexity (see the sketch below).
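As an illustration (not from the slides), a minimal scikit-learn sketch: polynomial models of increasing complexity fit to noisy data. Training error keeps falling with complexity, but validation error rises once the model starts fitting the noise:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=40)  # signal + noise

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for degree in (1, 3, 15):  # underfit, reasonable, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),    # train error
          mean_squared_error(y_val, model.predict(X_val)))  # validation error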
Feature Extraction
https://towardsdatascience.com/understanding-principal-component-analysis-ddaf350a363a
PCA
https://builtin.com/data-science/step-step-explanation-principal-component-analysis
• Assuming the first principal component does not account for
100% of the variation within the data set,
• use the second principal component:
• the linear combination of variables which maximizes variability among all
other linear combinations that are orthogonal to the first.
• Simply put, once the first principal component is accounted for, the second
principal component is the line perpendicular to the initial best-fit
line.
• NOW, use the principal components as features! (see the sketch below)
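A minimal scikit-learn sketch of this idea (the data set is illustrative):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                            # 100 samples, 5 features
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.1, size=100)  # add a redundant feature

pca = PCA(n_components=2)       # keep the first two principal components
X_pca = pca.fit_transform(X)    # project samples onto those components

print(pca.explained_variance_ratio_)  # variance captured by PC1 and PC2
# X_pca can now serve as the feature matrix for a classifier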
PCA
https://rstudio-pubs-static.s3.amazonaws.com/884549_bcb9c4e0773243f6946c2bac58950b54.html
Principal Components as Descriptors
• Features as principal components
• Directions of the data that explain a maximal amount of variance
• Lines that capture most information of the data
• E.g. features based on more than one image
• If we have a total of $n$ registered images,
• the corresponding pixels at the same spatial location in all images can be
arranged as an n-dimensional vector: $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$
• Mean vector $\mathbf{m}_x = E\{\mathbf{x}\}$ -- expected value of $\mathbf{x}$
• Covariance matrix $\mathbf{C}_x = E\{(\mathbf{x} - \mathbf{m}_x)(\mathbf{x} - \mathbf{m}_x)^T\}$: an $n \times n$ matrix
• $c_{ii}$ is the variance of $x_i$
• $c_{ij}$ is the covariance between $x_i$ and $x_j$
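A minimal NumPy sketch of these quantities (the image stack is illustrative random data):

import numpy as np

# n registered images of the same scene, stacked: shape (n, H, W)
n, H, W = 4, 32, 32
images = np.random.default_rng(0).normal(size=(n, H, W))

# Each spatial location gives one n-dimensional vector x
X = images.reshape(n, -1)   # shape (n, H*W): one column per pixel location

m_x = X.mean(axis=1)        # mean vector, E{x}
C_x = np.cov(X)             # n x n covariance matrix
# C_x[i, i] is the variance of x_i; C_x[i, j] the covariance of x_i and x_j

# Principal components = eigenvectors of the (symmetric) covariance matrix
eigvals, eigvecs = np.linalg.eigh(C_x)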
Output Transformation
• Categorical variables are often called nominal
• Some examples include:
• A “pet” variable with the values: “dog” and “cat“.
• A “color” variable with the values: “red“, “green” and “blue“.
• A “place” variable with the values: “first”, “second” and “third“.
• A “passing grade”: “fail”, “pass”
• An “iris variety”: “Iris-setosa”, “Iris-versicolor”, “Iris-virginica”
• Each value represents a different category
• Classifiers (like MLP) need numbers!
Output Transformation
• Two Solutions
• Integer Encoding
• each unique category value is assigned an integer value.
• for example, “red” is 1, “green” is 2, and “blue” is 3.
• okay for ordinal variables (where order matters), but not for truly nominal ones
• One-Hot Encoding
• a new binary variable is added for each unique integer value.
• In the “color” variable example, there are 3 categories and therefore 3
binary variables are needed. A “1” value is placed in the binary variable
for that color and “0” values for the other colors.
import pandas as pd
import tensorflow as tf

# one-hot encode the categorical 'class' column of DataFrame df
y_oh = pd.get_dummies(df['class']).values
# one-hot encoding - much easier using Keras!
train_labels_oh = tf.keras.utils.to_categorical(train_labels)
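For the integer-encoding route, a minimal sketch using scikit-learn's LabelEncoder (note it assigns integers alphabetically from 0, so the exact values differ from the red=1, green=2, blue=3 example above):

from sklearn.preprocessing import LabelEncoder

colors = ["red", "green", "blue", "green"]
encoder = LabelEncoder()
y_int = encoder.fit_transform(colors)  # blue=0, green=1, red=2
print(y_int)             # [2 1 0 1]
print(encoder.classes_)  # ['blue' 'green' 'red']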
Output Transformation
Instance   Class     Integer Encoding
1          “Red”     1
2          “Green”   2
3          “Blue”    3
4          “Green”   2