Introduction to
Machine Learning
Enrique Vinicio Carrera
2020
Outline
1 Introduction
2 Workflow Example
3 Supervised Classifiers
4 Regression
5 Unsupervised Learning
6 Deep Learning
Material
• Heart sound classifier
– Original dataset (PhysioNet)
– Feature matrix for Matlab (mat)
– Matlab scripts (zip)
• Matlab software
– Software download (Free trial)
– A quick tutorial (pdf)
• Other examples
– Matlab scripts (zip)
– Datasets and ML projects (Kaggle)
Introduction
Artificial Intelligence
Machine Learning
Supervised Learning
• The aim of supervised ML is to build a model that makes
predictions based on evidence in the presence of uncertainty
• A supervised learning algorithm takes a known set of input
data and known responses to the data (output) and trains a
model to generate reasonable predictions for the response to
new data
Supervised Learning
• Supervised learning uses classification and regression techniques to develop predictive models
• Classification techniques predict discrete responses
– Classification models classify input data into categories
• Regression techniques predict continuous responses
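The distinction can be sketched in a few lines. This is a hedged Python/scikit-learn illustration on synthetic data (the slides themselves use MATLAB); all names are invented for the example:

```python
# Same inputs, two kinds of response: discrete (classification) vs
# continuous (regression).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))              # two input features
y_cont = 3.0 * X[:, 0] - 2.0 * X[:, 1]     # continuous response
y_disc = (y_cont > 0).astype(int)          # discrete response (two classes)

clf = LogisticRegression().fit(X, y_disc)  # classification: predicts categories
reg = LinearRegression().fit(X, y_cont)    # regression: predicts real values

print(clf.predict(X[:3]))                  # category labels (0 or 1)
print(reg.predict(X[:3]))                  # real-valued predictions
```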
Unsupervised Learning
Workflow Example
Considered Application
• Heart sounds are a rich source of information for early diagnosis of cardiac pathologies
• Distinguishing normal from abnormal heart sounds requires a
specially trained clinician
• We’ll (i) acquire the data, (ii) extract features, (iii) explore various algorithms, (iv) tune the model to get good performance, and (v) deploy the model in a prototype application
Workflow Example
1. This example uses the dataset from the 2016 PhysioNet and
Computing in Cardiology challenge
– It includes 3240 recordings for model training and 301 recordings for model validation
– Recorded heart sounds range in length from 5 to 120 seconds
– The data structure can be created using the function
>> downloadData
• Let’s first understand what kind of data we’re working with
– Common ways to explore data include inspecting some examples, creating visualizations, and applying signal processing or clustering techniques to identify patterns
– So, let’s begin by listening to and plotting some examples
>> analyzeExamples
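For readers following along without MATLAB, a minimal Python sketch of the same kind of sanity check on a recording; the signal here is synthetic and the sampling rate is an assumption, not taken from the dataset:

```python
# Inspect one recording: check its duration against the dataset's stated
# 5-120 second range and summarize its amplitude.
import numpy as np

fs = 2000                              # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
recording = rng.normal(size=12 * fs)   # stand-in for a 12 s heart sound

duration = len(recording) / fs
print(f"duration: {duration:.1f} s, peak amplitude: {np.abs(recording).max():.2f}")
assert 5 <= duration <= 120            # matches the dataset's stated range
```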
Workflow Example
2. Feature extraction is one of the most important parts of ML
because it turns raw data into information that’s suitable for
ML algorithms
– Feature extraction eliminates the redundancy present in many
types of measured data, facilitating generalization during the
learning phase
– Generalization is critical to avoiding over-fitting the model to
specific examples
• We need to avoid using too many features
– A model with a larger number of features requires more computational resources during the training stage, and using too many features leads to over-fitting
• The challenge is to find the minimum number of features that
will capture the essential patterns in the data
Workflow Example
• Feature selection is the process of selecting those features
that are most relevant for your specific modeling problem and
removing unneeded or redundant features
– Common approaches to feature selection include stepwise regression, sequential feature selection, and regularization
• Feature extraction can be time-consuming if you have a large
dataset and many features
• In the heart sounds example, we extract the following types of
features:
– Summary statistics: mean, median, standard deviation, mean
absolute deviation, IQR, skewness, kurtosis, and entropy
– Frequency domain: dominant frequency, spectrum entropy,
and Mel Frequency Cepstral Coefficients (MFCC)
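A hedged Python sketch of the summary-statistics features listed above, computed with NumPy/SciPy on one synthetic window (the actual extraction is done by the course's MATLAB scripts):

```python
# Compute the summary-statistics features for one 5-second window.
import numpy as np
from scipy import stats

fs = 2000                               # assumed sampling rate (Hz)
rng = np.random.default_rng(1)
window = rng.normal(size=5 * fs)        # stand-in for 5 s of heart sound

features = {
    "mean": np.mean(window),
    "median": np.median(window),
    "std": np.std(window),
    "mad": np.mean(np.abs(window - np.mean(window))),  # mean absolute deviation
    "iqr": stats.iqr(window),           # interquartile range
    "skewness": stats.skew(window),
    "kurtosis": stats.kurtosis(window),
}
print(features)
```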
Workflow Example
• Thus, after splitting each recording into 5-second windows of heart sound, we extract 27 features
>> createFeatureMatrix;
>> display(feature_table(1:10, :))
>> grpstats(feature_table, 'class', 'mean')
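The windowing step behind createFeatureMatrix can be sketched in Python as follows; the sampling rate and variable names are assumptions, not taken from the scripts:

```python
# Split one recording into non-overlapping 5-second windows; each window
# later becomes one row of the feature matrix.
import numpy as np

fs = 2000                       # assumed sampling rate (Hz)
recording = np.zeros(23 * fs)   # stand-in for a 23-second recording
win = 5 * fs                    # samples per 5 s window

n_windows = len(recording) // win
windows = recording[: n_windows * win].reshape(n_windows, win)
print(windows.shape)            # 4 full windows; the remainder is dropped
```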
3. Because we know the true category (ground truth) of each
file in the dataset, we can apply supervised ML
• Developing a predictive model is an iterative process:
– Select the training and validation data
– Select a classification algorithm
– Interactively train and evaluate classification models
• Before training actual classifiers, we need to divide the data
into a training and a validation set
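A minimal Python/scikit-learn sketch of this split on synthetic data (the MATLAB workflow does the equivalent with its own partitioning tools; the 25% hold-out fraction is an assumption):

```python
# Divide the data into a training set and a held-out validation set.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 27))        # 27 features per window, as above
y = rng.integers(0, 2, size=300)      # normal vs abnormal labels (synthetic)

# Stratify so both sets keep similar class proportions
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
print(X_train.shape, X_val.shape)
```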
Workflow Example
• For each classifier, the accuracy is estimated either on the held-out validation data or using cross-validation
– Let’s try Linear Discriminant, Logistic Regression, SVM (Linear and Quadratic), Decision Tree, and KNN
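These candidate algorithms all have scikit-learn counterparts; a hedged sketch comparing them by cross-validated accuracy on synthetic data (the real comparison happens in MATLAB):

```python
# Train several classifier families and compare cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=27, random_state=0)

models = {
    "Linear Discriminant": LinearDiscriminantAnalysis(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (Linear)": SVC(kernel="linear"),
    "SVM (Quadratic)": SVC(kernel="poly", degree=2),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```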
Workflow Example
• It’s good if the initial classifier achieves an accuracy above
90%, but is it sufficient for a heart-screening application?
• Let’s understand what information the confusion matrix provides
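A small sketch of what the confusion matrix shows, using scikit-learn on hand-made labels (the 0 = normal, 1 = abnormal coding is an assumption):

```python
# Rows are true classes, columns are predicted classes; the off-diagonal
# entries are the two kinds of error.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1]   # 0 = normal, 1 = abnormal
y_pred = [0, 0, 1, 0, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]   3 true negatives, 1 false positive (normal flagged abnormal)
#  [1 3]]  1 false negative (missed abnormal sound), 3 true positives
```

For a screening application, the false negatives (missed abnormal sounds) are usually the costlier error, which motivates the cost-function idea discussed next.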
Workflow Example
4. To improve model performance, aside from trying other (more
complex) algorithms, we need to make changes to the process
• Common approaches focus on one of the following aspects:
– Tuning model parameters
– Adding to or modifying training data
– Transforming or extracting features
– Making task-specific trade-offs
• We have assumed that all classification errors are equally undesirable, but that’s not always the case
– This situation can also occur when the data is unbalanced
• The standard way to bias a classifier is to introduce a cost function that assigns higher penalties to undesired misclassifications
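A sketch of this idea in scikit-learn, which expresses misclassification cost as per-class weights rather than a full cost matrix; the data and the 10x penalty are illustrative:

```python
# Penalize missing the rare class (e.g. abnormal heart sounds) 10x more.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Unbalanced data: class 1 is only ~10% of the samples
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X, y)
costly = LogisticRegression(max_iter=1000, class_weight={0: 1, 1: 10}).fit(X, y)

# The cost-weighted model predicts the rare class more often
print(plain.predict(X).sum(), costly.predict(X).sum())
```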
Workflow Example
• Instead of using scatter plots and confusion matrices to explore the trade-off between true and false positives, we could use a receiver operating characteristic (ROC) curve
• You can use an ROC curve to select the best operating point
or cutoff for your classifier to minimize the overall ‘cost’ as
defined by your cost function
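A hedged Python sketch of the ROC analysis and of picking an operating point under an assumed cost function (synthetic data; the slides do this in MATLAB):

```python
# Sweep the decision threshold, trace the ROC curve, and pick the
# threshold that minimizes an assumed cost function.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, probs)   # one point per threshold
auc = roc_auc_score(y_te, probs)
print("AUC:", auc)

# Assumed cost function: a false negative is 5x worse than a false positive
costs = 5 * (1 - tpr) + fpr
best = costs.argmin()
print("best threshold:", thresholds[best])
```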
• ML algorithm parameters can be tuned to achieve a better fit
– The process of identifying the set of parameters that provides
the best model is often referred to as hyper-parameter tuning
– In this case, we can use automated grid search or Bayesian optimization
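Of the two, the grid search is easy to sketch with scikit-learn alone (Bayesian optimization would need an extra library, so it is omitted here); the parameter grid is illustrative:

```python
# Exhaustively search a small hyper-parameter grid with cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```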
• A final approach for optimizing performance entails feature
selection: removing features that are either redundant or not
carrying useful information
Workflow Example
• This reduces computational cost and storage requirements,
and it results in a simpler model that’s less likely to overfit
• Feature selection approaches include systematically evaluating
feature subsets and incorporating feature selection into the
model construction process by applying feature weights
– For example, we can apply neighborhood component analysis,
a powerful feature selection technique that can handle very
high-dimensional datasets
>> selectFeatures(feature_table)
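The slides apply neighborhood component analysis in MATLAB; as a hedged stand-in, here is a sketch of another approach named earlier, sequential feature selection, which is directly available in scikit-learn (synthetic data):

```python
# Greedily add features one at a time, keeping the subset that gives the
# best cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=27, n_informative=5,
                           random_state=0)

sfs = SequentialFeatureSelector(KNeighborsClassifier(),
                                n_features_to_select=5).fit(X, y)
print(sfs.get_support().nonzero()[0])   # indices of the selected features
```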
Workflow Example
Supervised Classifiers
• Use supervised learning if you have existing data for the output you’re trying to predict
• Use classification algorithms if your data can be separated
into specific groups or classes
• As mentioned, selecting an ML algorithm is a process of trial and error
• It’s also a trade-off between specific characteristics of the algorithms, such as:
– Speed of training
– Memory usage
– Predictive accuracy on new data
– Transparency or interpretability
Supervised Classifiers
• Important: using larger training datasets often yields models that generalize well to new data
• When you’re working on a classification problem, begin by
determining whether the problem is binary or multiclass
– In a binary classification problem, a single training or test item (instance) can be assigned to only one of 2 classes
– In a multiclass classification problem, it can be assigned to one of more than 2
• Consider that a multiclass classification problem is generally
more challenging because it requires a more complex model
– Certain algorithms are designed specifically for binary classification problems; during training, these algorithms tend to be more efficient than multiclass algorithms
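This is why binary algorithms are often extended to multiclass problems via decomposition schemes such as one-vs-rest; a scikit-learn sketch on the Iris dataset:

```python
# Wrap an inherently binary algorithm (a linear SVM) so it handles a
# 3-class problem: one binary classifier is trained per class.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)        # 3 classes
clf = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)
print(len(clf.estimators_))              # one binary classifier per class
```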
Supervised Classifiers
Regression
• If the nature of your response is a real number, use regression techniques
• Examples of regression are changes in temperature or fluctuations in electricity demand
• Applications include forecasting stock prices, handwriting
recognition, and acoustic signal processing
• The same considerations that apply when selecting an algorithm for classification tasks also apply to regression
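A minimal regression sketch in Python matching the examples above; the temperature/demand data and its coefficients are synthetic, invented for illustration:

```python
# Fit a continuous response: "electricity demand" as a linear function of
# temperature plus noise, then recover the underlying coefficients.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
temperature = rng.uniform(0, 35, size=(100, 1))
demand = 50 + 2.0 * temperature[:, 0] + rng.normal(scale=1.0, size=100)

model = LinearRegression().fit(temperature, demand)
print(model.coef_[0], model.intercept_)   # close to the true 2.0 and 50
```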
Unsupervised Learning
Clustering
Clustering Algorithms
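A hedged Python sketch of the clustering idea: k-means grouping unlabeled points, with no ground-truth labels involved (synthetic, well-separated blobs):

```python
# Group unlabeled data into 2 clusters with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Two well-separated blobs of 50 points each
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))   # points per discovered cluster
```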
Deep Learning
Main Differences
• As we said, DL is a subtype of ML
– With ML, you manually extract the relevant features of the
original data
– With DL, you feed the raw data directly into a deep neural
network that learns the features automatically
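As an illustration only, here is a very small neural network fed raw pixel values with no hand-crafted features, using scikit-learn's MLPClassifier; real deep learning models are far larger and are typically built with frameworks such as TensorFlow or PyTorch:

```python
# Feed raw 8x8 pixel values straight into a small neural network; no
# manual feature extraction step.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)      # raw pixel intensities as input
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)
print(net.score(X_te, y_te))             # held-out accuracy
```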
Main Differences
• In addition, DL often requires hundreds of thousands or millions of examples for the best results
– It’s also computationally intensive and requires a high-performance GPU
evcarrera@espe.edu.ec