Introduction to
Machine Learning
Enrique Vinicio Carrera
2020
Outline
1 Introduction
2 Workflow Example
3 Supervised Classifiers
4 Regression
5 Unsupervised Learning
6 Deep Learning
Material
• Heart sound classifier
– Original dataset (PhysioNet)
– Feature matrix for Matlab (mat)
– Matlab scripts (zip)
• Matlab software
– Software download (Free trial)
– A quick tutorial (pdf)
• Other examples
– Matlab scripts (zip)
– Datasets and ML projects (Kaggle)
Introduction
Artificial Intelligence
Machine Learning
Supervised Learning
• The aim of supervised ML is to build a model that makes
predictions based on evidence in the presence of uncertainty
• A supervised learning algorithm takes a known set of input
data and known responses to the data (output) and trains a
model to generate reasonable predictions for the response to
new data
Supervised Learning
• Supervised learning uses classification and regression techniques to develop predictive models
• Classification techniques predict discrete responses
– Classification models classify input data into categories
• Regression techniques predict continuous responses
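The distinction can be sketched in a few lines. This is a hedged Python/scikit-learn illustration on synthetic data (the slides themselves use MATLAB); all names are invented for the example:

```python
# Same inputs, two kinds of response: discrete (classification) vs
# continuous (regression).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))              # two input features
y_cont = 3.0 * X[:, 0] - 2.0 * X[:, 1]     # continuous response
y_disc = (y_cont > 0).astype(int)          # discrete response (two classes)

clf = LogisticRegression().fit(X, y_disc)  # classification: predicts categories
reg = LinearRegression().fit(X, y_cont)    # regression: predicts real values

print(clf.predict(X[:3]))                  # category labels (0 or 1)
print(reg.predict(X[:3]))                  # real-valued predictions
```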
Unsupervised Learning
Workflow Example
Considered Application
• Heart sounds are a rich source of information for early diagnosis of cardiac pathologies
• Distinguishing normal from abnormal heart sounds requires a
specially trained clinician
• We’ll (i) acquire the data, (ii) extract features, (iii) explore various algorithms, (iv) tune the model to get good performance, and (v) deploy the model in a prototype application
Workflow Example
1. This example uses the dataset from the 2016 PhysioNet and
Computing in Cardiology challenge
– It includes 3240 recordings for model training and 301 recordings for model validation
– Recorded heart sounds range in length from 5 to 120 seconds
– The data structure can be created using the function
>> downloadData
• Let’s first understand what kind of data we’re working with
– Common ways to explore data include inspecting some examples, creating visualizations, and applying signal processing or clustering techniques to identify patterns
– So, let’s begin by listening to and plotting some examples
>> analyzeExamples
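For readers following along without MATLAB, a minimal Python sketch of the same kind of sanity check on a recording; the signal here is synthetic and the sampling rate is an assumption, not taken from the dataset:

```python
# Inspect one recording: check its duration against the dataset's stated
# 5-120 second range and summarize its amplitude.
import numpy as np

fs = 2000                              # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
recording = rng.normal(size=12 * fs)   # stand-in for a 12 s heart sound

duration = len(recording) / fs
print(f"duration: {duration:.1f} s, peak amplitude: {np.abs(recording).max():.2f}")
assert 5 <= duration <= 120            # matches the dataset's stated range
```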
Workflow Example
2. Feature extraction is one of the most important parts of ML
because it turns raw data into information that’s suitable for
ML algorithms
– Feature extraction eliminates the redundancy present in many
types of measured data, facilitating generalization during the
learning phase
– Generalization is critical to avoiding over-fitting the model to
specific examples
• We need to avoid using too many features
– A model with a larger number of features requires more computational resources during the training stage, and using too many features leads to over-fitting
• The challenge is to find the minimum number of features that
will capture the essential patterns in the data
Workflow Example
• Feature selection is the process of selecting those features
that are most relevant for your specific modeling problem and
removing unneeded or redundant features
– Common approaches to feature selection include stepwise regression, sequential feature selection, and regularization
• Feature extraction can be time-consuming if you have a large
dataset and many features
• In the heart sounds example, we extract the following types of
features:
– Summary statistics: mean, median, standard deviation, mean
absolute deviation, IQR, skewness, kurtosis, and entropy
– Frequency domain: dominant frequency, spectrum entropy,
and Mel Frequency Cepstral Coefficients (MFCC)
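A hedged Python sketch of the summary-statistics features listed above, computed with NumPy/SciPy on one synthetic window (the actual extraction is done by the course's MATLAB scripts):

```python
# Compute the summary-statistics features for one 5-second window.
import numpy as np
from scipy import stats

fs = 2000                               # assumed sampling rate (Hz)
rng = np.random.default_rng(1)
window = rng.normal(size=5 * fs)        # stand-in for 5 s of heart sound

features = {
    "mean": np.mean(window),
    "median": np.median(window),
    "std": np.std(window),
    "mad": np.mean(np.abs(window - np.mean(window))),  # mean absolute deviation
    "iqr": stats.iqr(window),           # interquartile range
    "skewness": stats.skew(window),
    "kurtosis": stats.kurtosis(window),
}
print(features)
```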
Workflow Example
• Thus, after splitting each recording into 5-second windows of heart sound, we extract 27 features
>> createFeatureMatrix;
>> display(feature_table(1:10, :))
>> grpstats(feature_table, 'class', 'mean')
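The windowing step behind createFeatureMatrix can be sketched in Python as follows; the sampling rate and variable names are assumptions, not taken from the scripts:

```python
# Split one recording into non-overlapping 5-second windows; each window
# later becomes one row of the feature matrix.
import numpy as np

fs = 2000                       # assumed sampling rate (Hz)
recording = np.zeros(23 * fs)   # stand-in for a 23-second recording
win = 5 * fs                    # samples per 5 s window

n_windows = len(recording) // win
windows = recording[: n_windows * win].reshape(n_windows, win)
print(windows.shape)            # 4 full windows; the remainder is dropped
```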
3. Because we know the true category (ground truth) of each
file in the dataset, we can apply supervised ML
• Developing a predictive model is an iterative process:
– Select the training and validation data
– Select a classification algorithm
– Interactively train and evaluate classification models
• Before training actual classifiers, we need to divide the data
into a training and a validation set
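A minimal Python/scikit-learn sketch of this split on synthetic data (the MATLAB workflow does the equivalent with its own partitioning tools; the 25% hold-out fraction is an assumption):

```python
# Divide the data into a training set and a held-out validation set.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 27))        # 27 features per window, as above
y = rng.integers(0, 2, size=300)      # normal vs abnormal labels (synthetic)

# Stratify so both sets keep similar class proportions
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
print(X_train.shape, X_val.shape)
```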
Workflow Example
• For each classifier, the accuracy is estimated either on the held-out validation data or using cross-validation
– Let’s try Linear Discriminant, Logistic Regression, SVM (Linear and Quadratic), Decision Tree, and KNN
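These candidate algorithms all have scikit-learn counterparts; a hedged sketch comparing them by cross-validated accuracy on synthetic data (the real comparison happens in MATLAB):

```python
# Train several classifier families and compare cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=27, random_state=0)

models = {
    "Linear Discriminant": LinearDiscriminantAnalysis(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (Linear)": SVC(kernel="linear"),
    "SVM (Quadratic)": SVC(kernel="poly", degree=2),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```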
Workflow Example
• It’s good if the initial classifier achieves an accuracy above
90%, but is it sufficient for a heart-screening application?
• Let’s understand what information the confusion matrix provides
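A small sketch of what the confusion matrix shows, using scikit-learn on hand-made labels (the 0 = normal, 1 = abnormal coding is an assumption):

```python
# Rows are true classes, columns are predicted classes; the off-diagonal
# entries are the two kinds of error.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1]   # 0 = normal, 1 = abnormal
y_pred = [0, 0, 1, 0, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]   3 true negatives, 1 false positive (normal flagged abnormal)
#  [1 3]]  1 false negative (missed abnormal sound), 3 true positives
```

For a screening application, the false negatives (missed abnormal sounds) are usually the costlier error, which motivates the cost-function idea discussed next.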
Workflow Example
4. To improve model performance, aside from trying other (more
complex) algorithms, we need to make changes to the process
• Common approaches focus on one of the following aspects:
– Tuning model parameters
– Adding to or modifying training data
– Transforming or extracting features
– Making task-specific trade-offs
• We have assumed that all classification errors are equally undesirable, but that’s not always the case
– This situation can also occur when the data is unbalanced
• The standard way to bias a classifier is to introduce a cost function that assigns higher penalties to undesired misclassifications
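A sketch of this idea in scikit-learn, which expresses misclassification cost as per-class weights rather than a full cost matrix; the data and the 10x penalty are illustrative:

```python
# Penalize missing the rare class (e.g. abnormal heart sounds) 10x more.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Unbalanced data: class 1 is only ~10% of the samples
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X, y)
costly = LogisticRegression(max_iter=1000, class_weight={0: 1, 1: 10}).fit(X, y)

# The cost-weighted model predicts the rare class more often
print(plain.predict(X).sum(), costly.predict(X).sum())
```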
Workflow Example
• Instead of using scatter plots and confusion matrices to explore the trade-off between true and false positives, we could use a receiver operating characteristic (ROC) curve
• You can use an ROC curve to select the best operating point
or cutoff for your classifier to minimize the overall ‘cost’ as
defined by your cost function
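A hedged Python sketch of the ROC analysis and of picking an operating point under an assumed cost function (synthetic data; the slides do this in MATLAB):

```python
# Sweep the decision threshold, trace the ROC curve, and pick the
# threshold that minimizes an assumed cost function.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, probs)   # one point per threshold
auc = roc_auc_score(y_te, probs)
print("AUC:", auc)

# Assumed cost function: a false negative is 5x worse than a false positive
costs = 5 * (1 - tpr) + fpr
best = costs.argmin()
print("best threshold:", thresholds[best])
```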
• ML algorithm parameters can be tuned to achieve a better fit
– The process of identifying the set of parameters that provides
the best model is often referred to as hyper-parameter tuning
– In this case, we can use automated grid search or Bayesian optimization
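Of the two, the grid search is easy to sketch with scikit-learn alone (Bayesian optimization would need an extra library, so it is omitted here); the parameter grid is illustrative:

```python
# Exhaustively search a small hyper-parameter grid with cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```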
• A final approach for optimizing performance entails feature
selection: removing features that are either redundant or not
carrying useful information
Workflow Example
• This reduces computational cost and storage requirements,
and it results in a simpler model that’s less likely to overfit
• Feature selection approaches include systematically evaluating
feature subsets and incorporating feature selection into the
model construction process by applying feature weights
– For example, we can apply neighborhood component analysis,
a powerful feature selection technique that can handle very
high-dimensional datasets
>> selectFeatures(feature_table)
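The slides apply neighborhood component analysis in MATLAB; as a hedged stand-in, here is a sketch of another approach named earlier, sequential feature selection, which is directly available in scikit-learn (synthetic data):

```python
# Greedily add features one at a time, keeping the subset that gives the
# best cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=27, n_informative=5,
                           random_state=0)

sfs = SequentialFeatureSelector(KNeighborsClassifier(),
                                n_features_to_select=5).fit(X, y)
print(sfs.get_support().nonzero()[0])   # indices of the selected features
```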
Workflow Example
Supervised Classifiers
• Use supervised learning if you have existing data for the output you’re trying to predict
• Use classification algorithms if your data can be separated
into specific groups or classes
• As mentioned, selecting an ML algorithm is a process of trial and error
• It’s also a trade-off between specific characteristics of the algorithms, such as:
– Speed of training
– Memory usage
– Predictive accuracy on new data
– Transparency or interpretability
Supervised Classifiers
• Important: using larger training datasets often yields models that generalize well to new data
• When you’re working on a classification problem, begin by
determining whether the problem is binary or multiclass
– In a binary classification problem, a single training or test item (instance) can be assigned to only one of 2 classes
– In a multiclass classification problem, it can be assigned to one of more than 2
• Consider that a multiclass classification problem is generally
more challenging because it requires a more complex model
– Certain algorithms are designed specifically for binary classification problems; during training, these algorithms tend to be more efficient than multiclass algorithms
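This is why binary algorithms are often extended to multiclass problems via decomposition schemes such as one-vs-rest; a scikit-learn sketch on the Iris dataset:

```python
# Wrap an inherently binary algorithm (a linear SVM) so it handles a
# 3-class problem: one binary classifier is trained per class.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)        # 3 classes
clf = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)
print(len(clf.estimators_))              # one binary classifier per class
```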
Supervised Classifiers
Regression
• If the nature of your response is a real number, use regression techniques
• Examples of regression are changes in temperature or fluctuations in electricity demand
• Applications include forecasting stock prices, handwriting
recognition, and acoustic signal processing
• The same considerations that apply when selecting an algorithm for classification tasks also apply to regression
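A minimal regression sketch in Python matching the examples above; the temperature/demand data and its coefficients are synthetic, invented for illustration:

```python
# Fit a continuous response: "electricity demand" as a linear function of
# temperature plus noise, then recover the underlying coefficients.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
temperature = rng.uniform(0, 35, size=(100, 1))
demand = 50 + 2.0 * temperature[:, 0] + rng.normal(scale=1.0, size=100)

model = LinearRegression().fit(temperature, demand)
print(model.coef_[0], model.intercept_)   # close to the true 2.0 and 50
```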
Unsupervised Learning
Clustering
Clustering Algorithms
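A hedged Python sketch of the clustering idea: k-means grouping unlabeled points, with no ground-truth labels involved (synthetic, well-separated blobs):

```python
# Group unlabeled data into 2 clusters with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Two well-separated blobs of 50 points each
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))   # points per discovered cluster
```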
Deep Learning
Main Differences
• As we said, DL is a subtype of ML
– With ML, you manually extract the relevant features of the
original data
– With DL, you feed the raw data directly into a deep neural
network that learns the features automatically
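As an illustration only, here is a very small neural network fed raw pixel values with no hand-crafted features, using scikit-learn's MLPClassifier; real deep learning models are far larger and are typically built with frameworks such as TensorFlow or PyTorch:

```python
# Feed raw 8x8 pixel values straight into a small neural network; no
# manual feature extraction step.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)      # raw pixel intensities as input
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)
print(net.score(X_te, y_te))             # held-out accuracy
```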
Main Differences
• In addition, DL often requires hundreds of thousands or millions of examples for the best results
– It’s also computationally intensive and requires a high-performance GPU
evcarrera@espe.edu.ec