
Machine Learning

Scholarship Program
for
Microsoft Azure
LESSON 2

The types of Data


 Numerical
 Time-Series ; data collected at equally spaced intervals of time
 Categorical ; discrete values that belong to classes
 Text
 Image

Standardization rescales data so that it has a mean of 0 and a standard deviation of 1.

Normalization rescales the data into the range [0, 1].
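A minimal NumPy sketch of the two rescaling methods (the values are made up for illustration):

```python
import numpy as np

data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # made-up values

# Standardization: rescale to mean 0, standard deviation 1
standardized = (data - data.mean()) / data.std()

# Normalization (min-max scaling): rescale into the range [0, 1]
normalized = (data - data.min()) / (data.max() - data.min())

print(standardized.mean(), standardized.std())  # ~0.0, ~1.0
print(normalized.min(), normalized.max())       # 0.0, 1.0
```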

An image is defined as Height * Width * Channels (e.g. 3)

The number of channels required to represent the color is known as the color depth or
simply depth. For an RGB image, depth = 3, while a grayscale image has depth = 1

Text analysis pipeline ; text preprocessing (removing stop words, stemming & lemmatizing),
then feature extraction (vectorization)

Models are the specific representations learned from data

Algorithms are the processes of learning

Linear Regression
Linear Regression simplifies the target function Y to a line

Simple linear Regression ; 1 input


Y = B0 + B1 * X

More than 1 input is called Multiple Linear Regression


Y = B0 + B1 * X1 + B2 * X2 + ... + Bn * Xn

B0 ; Bias (intercept)
B0 - Bn are called coefficients; they are learned values that best fit the input variables
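As a hedged illustration, a multiple linear regression can be fitted with scikit-learn; the toy data below is invented, and the learned intercept and coefficients play the roles of B0 and B1..Bn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data with two input features (X1, X2)
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]])
y = np.array([9, 8, 19, 18, 29])

model = LinearRegression().fit(X, y)

print(model.intercept_)  # B0, the bias
print(model.coef_)       # B1 and B2, the learned coefficients
```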

Assumptions
 Assumes that the relationship between Input & Output is linear
 Assumes that input & output aren't noisy, so it is expected that you remove outliers/noise
 Remove collinearity ; the model overfits when there are highly correlated features, hence it is
important to calculate the pairwise correlation for the input data and drop the most
correlated features (see the sketch after this list)
 Gaussian distribution ; input and output should be Gaussian distributed
 Rescale inputs
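A possible way to check pairwise correlation with pandas (the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical feature table; column names are made up for illustration
df = pd.DataFrame({
    "sqft":     [800, 950, 1100, 1300, 1500],
    "rooms":    [2, 2, 3, 3, 4],
    "sqft_dup": [810, 940, 1120, 1290, 1510],  # nearly identical to sqft
})

# Absolute pairwise correlation between input features
corr = df.corr().abs()

# Flag feature pairs above a chosen threshold (0.95 here) as drop candidates
print(corr[(corr > 0.95) & (corr < 1.0)])
```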

Learning function
The core goal of machine learning prediction is to learn a useful mapping whose predictions
are as close as possible to the actual output values, i.e. one that minimizes the error.

Y = F(x) + e
where e is the irreducible error, which is independent of the input data (x): no matter
how good our model gets, we can't completely remove this error

Parametric vs. Non-parametric


Parametric
 Simplify the mapping to a known functional form (e.g. linear regression B0 + B1 * X1
etc.)
Benefits
 Simpler
 Faster to train and easier to explain
 Less training data

Limitations
 Highly constrained
 Limited complexity
 Poor fit

Non Parametric
 They make no assumptions regarding the form of the mapping between input
data and output; they are free to learn any form from the data (e.g. KNN)

Benefits
 High Flexibility and capable of fitting large number of functional forms
 High performance

Limitations
 More training data required
 Slower to train due to having more parameters to tune
 Prone to overfitting and difficult to explain

Classical ML vs. Deep Learning (DL)


DL represents a category of ML algorithms based on neural networks. All DL algorithms are
ML algorithms, but not vice versa

DL advantage
 Able to learn from large, complex datasets
DL disadvantage
 Black box due to difficulty in explanation
 Large computation resource required

Classical ML advantages:
 More suitable for small data
 Easier to interpret outcomes
 Cheaper to perform
 Can run on low-end machines
 Does not require large computational power

Classical ML disadvantages:
 Difficult to learn large datasets
 Require feature engineering
 Difficult to learn complex functions

Approaches to Machine learning


 Supervised ; the algorithm learns from data that contains both the inputs and the
labeled targets
o Classification ; output is a category / class e.g. Yes/No
o Regression ; output is numerical and continuous
o Similarity Learning ; learns from examples using a similarity function e.g. recommender
systems

 Unsupervised ; the algorithm learns from data that contains only inputs and finds hidden
structures in the data
o Clustering ; assigns entities to clusters or groups. The goal is to find inherent groups or
clusters within the data while maximizing intra-cluster similarity and inter-cluster
dissimilarity e.g. customer segmentation
o Feature learning ; features are learned from unlabeled data
o Anomaly detection ; learns what is normal/abnormal from unlabeled data
 Reinforcement learning (RL) ; learns how an agent should take actions in an
environment to maximize a reward function

What differentiates between Supervised, Unsupervised and RL is that the first 2 are passive
while RL is active. Passive in the sense that learning is performed without any action that
could influence what data could be observed in the future while RL is an active approach
where the action of the agent influences the environment.

Machine Learning Tradeoffs


 Bias vs Variance
 Underfitting vs Overfitting

 Bias ; Simplifying assumptions made by a model to make the target function easier to learn
 Variance ; Amount that the estimate of the target function will change if different training
data was used

As a general rule of thumb, parametric and linear algorithms often have high bias and low
variance, while non-parametric algorithms have low bias and high variance.
Overfitting ; memorizing the training data, so the model doesn't generalize well to new data
Underfitting ; neither modeling the training data nor generalizing to new data, because the
assumed learned function is over-simplified

Any machine learning algorithm and solution aims to reduce the prediction error, which
consists of;
Prediction error = Bias Error (BE) + Variance Error (VE) + Irreducible Error (IE)

IE = independent of the algorithm used; it can be caused by factors such as unobserved
variables not available to train the model
BE = induced by the model's bias; low bias corresponds to fewer assumptions about the
target function (e.g. decision trees and KNN are low-bias algorithms)
VE = the amount the model's estimate changes when different training data is used;
algorithms that make more assumptions about the target function tend to have lower
variance (e.g. linear and logistic regression are low-variance algorithms)

The goal of a machine learning problem is to have low bias and low variance. The optimal
model complexity is where the bias error curve crosses the variance error curve.

Overfitting ; good performance on training data and poor on new/test data, because the
model has memorized all the details of the training data.
Limiting Overfitting
 Resampling techniques like K-fold cross validation (see the sketch after this list)
 Hold back a validation dataset from the initial training data
 Simplify the model
 Use more data if available
 Reduce dimensionality in the training dataset
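A minimal sketch of K-fold cross validation with scikit-learn, assuming a simple built-in dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross validation: each fold takes a turn as the held-out set,
# giving a more stable estimate of generalization than a single split
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())
```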

Underfitting ; Can neither model the training data nor generalize on the new/test data

In summary
 High variance causes overfitting
 High bias causes underfitting
LESSON 3

Model training is the process through which data is transformed into a trained machine learning
model. It is one of the most important processes in the machine learning pipeline; it allows you to
build, train and check the quality of a machine learning model.

Data Import and Transformation


There is no perfect database or data source, hence well-prepared and formatted data is vital
before it is fed into the machine learning pipeline. With noisy data (e.g. missing values or
outliers), wrong and biased results will be produced, so data wrangling is very important. Data
wrangling is therefore an iterative (trial-and-error) process of cleaning and transforming
data to make it more appropriate for data analysis.

The processes involved in data wrangling include;


 Exploratory Data Analysis (EDA)
 Data Transformation
 Validate and publish the data

Data Store vs Data set

Data store ; offers a layer of abstraction over the supported Azure storage services; it stores all
the information required to connect to a particular storage service. Datastores can be shared and
accessed simultaneously by different instances of the process. They answer the question "How
do I securely connect to the data in my Azure storage?"
Specific data files in a datastore are accessed through datasets.

Data set ; answers how to get the specific data files that contain either the training or validation
data for the machine learning task. The process of data preparation is one of the most important in a
machine learning pipeline. Datasets are created from public datasets, URLs, or uploads from local files.
Datasets are therefore references that point to the data in the storage service

Datastores & datasets offer a secure, scalable and reproducible way to deliver data to all
machine learning tasks

The Data Access Workflow

Azure storage service >> Datastore >> Dataset

Steps of the data access workflow are;


1. Create a datastore
2. Create a dataset to be used to train a machine learning model
3. Create a dataset monitor to detect issues in the data (data drift)
o Data drift refers to the fact that the data fed to a machine learning model as input
might change over time, and a given drift can be problematic for model accuracy.
Dataset monitors can be set up to detect the advent of data drift
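A minimal sketch of steps 1-2 using the azureml-core SDK (v1); the datastore, container, account names, and file path below are placeholders, not values from the course:

```python
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()  # assumes a local config.json for the workspace

# Step 1: register a datastore (all names and credentials are placeholders)
datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="my_datastore",
    container_name="my-container",
    account_name="mystorageaccount",
    account_key="<storage-account-key>",
)

# Step 2: create and register a tabular dataset that references a file
dataset = Dataset.Tabular.from_delimited_files(
    path=(datastore, "training-data/data.csv")
)
dataset = dataset.register(workspace=ws, name="training_data",
                           create_new_version=True)
```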

Data Sets

Datasets are used to interact with the data in your datastores and to package data into consumable
objects for machine learning tasks. They are not copies of your data; they only reference
the original data kept in the storage service, hence no copy or duplicate of the data (which
might incur extra billing) is made when a new dataset is created.
Azure ML datasets are required when you need to access data for your local or remote ML
experiments, and are also used as input and outputs for ML pipelines
 One benefit is the ability to register data in the datastore once and reference it
multiple times for use
Types
 Tabular Datasets
 File datasets (e.g. from a web URL)

Dataset versioning plays a key role in bookmarking a data state for retraining and tracing of the
data. It is very useful;
o When new data is available for retraining
o When you are applying different data preparation or feature engineering
approaches.

Feature Engineering
One of the challenges in ML is selecting the best features, those most appropriate and
suitable for the type of algorithm you are trying to model. Also, many times the existing features
aren't enough, which leads to engineering new features for training the machine
learning model; this is called Feature Engineering (FE). Related to FE is Feature Selection
(FS), which is used to select the best features for training in scenarios where the dataset
has a large number of features the model could learn from, which is not optimal in many cases;
hence the need to reduce the features to an ideal subset using different selection techniques,
one family of which is called Dimensionality Reduction

FE helps increase the performance of a machine learning model by leveraging existing features to
generate new features that might be useful in improving model performance. FE isn't always
necessary, because there are cases where the existing features are significant enough to
train the model.
Classical ML is much more reliant on FE than DeepLearning (DL)

Examples of FE tasks (a small sketch follows this list)
 Aggregation
 Part-of (extract a part of a data structure, e.g. the month from a date)
 Binning
 Flagging
 Frequency-based
 Embedding
 Deriving by example
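A small pandas sketch of two of these tasks (part-of and binning); the column names and bin edges are made up for illustration:

```python
import pandas as pd

# Hypothetical rows illustrating two of the FE tasks above
df = pd.DataFrame({
    "signup_date": pd.to_datetime(["2020-01-15", "2020-06-03", "2020-11-20"]),
    "age": [23, 47, 65],
})

# Part-of: extract the month from a date
df["signup_month"] = df["signup_date"].dt.month

# Binning: turn a numeric column into discrete ranges
df["age_band"] = pd.cut(df["age"], bins=[0, 30, 50, 100],
                        labels=["young", "middle", "senior"])
print(df)
```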

FE processes are applied differently depending on the data types you are working with.

Feature Selection
FE is about creating new features in a dataset; FS, by contrast, helps in selecting the best features
to model the data, so that the feature space doesn't explode

Reasons for FS
 Elimination of irrelevant, redundant, or highly correlated features
 Reduce dimensionality

Techniques
The curse of dimensionality: many algorithms do not perform well when the data has a high
number of dimensions, hence the need for dimensionality reduction. Dimensionality reduction
algorithms include (see the PCA sketch after this list);
 PCA (Principal Component Analysis)
 t-SNE (t-Distributed Stochastic Neighbor Embedding)
 Feature embedding
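A minimal PCA sketch with scikit-learn, reducing the 64 pixel features of the built-in digits dataset to 10 components:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # 64 pixel features per image

pca = PCA(n_components=10)           # keep the 10 principal components
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```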

Azure ML prebuilt modules ;


 Filter-based feature selection ; identifies the columns with the highest predictive power
 Permutation feature importance ; determine the best features to use.

Data Drift ; the change in the input data for a model. It causes degradation of the model's
performance.
Causes
 Changes in the upstream process
 Data Quality issues
 Natural drift due to the intrinsic nature of the data
 Changes in the relationship between different features
Data drift is one of the top reasons model accuracy degrades over time. A model training process
doesn't finish after the first training; rather it's an iterative process, and there is a need to
constantly monitor the performance of the model. One way to do that is to
continuously monitor for data drift.

Data drift is monitored using dataset monitors. Scenarios for setting up data drift monitoring;
 Monitoring a model's input data for drift from the model's training data
 Monitoring a time series dataset for drift from a previous time period
 Performing analysis on past data

Model Training
The goal of training in the ML process is to produce a learned model that can later be used. The basic
training process includes;
 Understanding the data
 Preparing/transforming the data
 Creating new features (FE)
 Feature selection
 Data splitting ; Train, Test & Validation ( where Train & Validation is used in the training
process while the Test is used for evaluating and validating the outcome of the model)

Model training involves the iterative process of tuning the parameters and hyperparameters to
improve the performance of the model.

Model Training in Azure ML


Data Science Process
 Data Collection
 Data Preparation
 Model Training ; Train, Test & validation data selection and parameters tuning
 Model Evaluation
 Model Deployment
o It is very important to retrain the model on new data from time to time to keep
improving the performance of the model

Taxonomy of Azure ML
 Azure Workspace ; the interface that must be created before starting anything on Azure
ML
 Compute Instances ; they provide access to environments such as Jupyter notebooks, and also
offer a designer to perform codeless tasks
 Datasets ; they make data available for the ML process
 Experiments ; tasks performed in Azure ML studio; an experiment is a container that helps
group related artifacts together
o Run ; an action such as training or validation; every run produces outputs such as:
 Snapshot
 Output files
 Metrics
 Logs
 Possible compute targets ; Azure ML can run on various environments, such as local
compute, Data Science VMs, Databricks, Kubernetes, etc.
 Model Registry ; a service that enables versioning of trained models for easy bookmarking and
tracing
 Deployments ;

Classification Problem ; When the target is a class


Types
 Binary Classification ; True/False, 0/1
 Multi-class single-label classification ; e.g. recognizing a handwritten digit as exactly one class
 Multi-class multi-label classification ; there are multiple categories and the output can belong
to more than one class

Algorithms used
 Logistic Regression
 Support Vector Machine (SVM)

Regression Problem ; continuous value prediction


1. Regression to arbitrary values e.g. price of house, salary
2. Regression to values between 0 & 1 - here the output is bounded within a range, which
often relates somewhat to a classification problem

Algorithms used
 Linear Regressor
 Decision Forest Regressor

Evaluation of Model Performance


It helps check and determine the predictive power of the developed model. The test dataset is
used for evaluating the machine learning model

Confusion Matrix
 Accuracy ; the proportion of correct predictions ; (TP + TN) / (TP + TN + FP + FN)
 Precision ; the proportion of predicted positive cases that were correctly identified ; TP / (TP + FP)
 Recall ; the proportion of actual positive cases that were correctly identified ; TP / (TP + FN)
 F1 Score ; measures the balance between Recall and Precision ; 2 * (Precision * Recall) /
(Precision + Recall)
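These metrics can be computed with scikit-learn; the labels and predictions below are hypothetical:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels and predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))  # [[TN, FP], [FN, TP]]
print(accuracy_score(y_true, y_pred))    # (TP + TN) / all
print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(recall_score(y_true, y_pred))      # TP / (TP + FN)
print(f1_score(y_true, y_pred))          # harmonic mean of precision & recall
```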

Model Evaluation Chart (ROC ; Receiver Operating Characteristic, for classification) - the graph of
True Positive Rate against False Positive Rate
 Area Under the Curve (AUC; 0.5 = random, 1 = perfect)
 Gain and Lift charts
Regression Evaluation Metrics
1. RMSE (Root Mean Squared Error)
2. MAE (Mean Absolute Error)
3. R^2 (Coefficient of Determination) ; measures how well the model explains the
variability of the target
4. Spearman correlation

Model Evaluation Chart


 Histogram of Residuals

Strength in Numbers
No matter how well-trained an individual model is, there is still a significant chance that it could
perform poorly. Ensemble learning and automated ML help scale up the model training process by
combining the results/strengths of individual models

Ensemble Learning
It is used to combine multiple ML models to produce one powerful predictive model with
higher accuracy. There are 3 types (a sketch of all three follows this list);
1. Bagging or Bootstrap aggregation ; used to reduce overfitting for tree-based
algorithms such as decision trees. It involves using random subsampling of the training data
to produce a bag of trained models

2. Boosting ; combines a sequence of weak learners, where the final predictions are a weighted
average of the individual models

3. Stacking ; trains a number of completely different models and combines their outputs.
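A hedged sketch of all three ensemble types with scikit-learn, using a built-in dataset; the estimator choices and counts are illustrative, not prescribed by the course:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Bagging: random subsamples of the data, one tree per "bag"
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)

# Boosting: sequential weak learners, each one correcting the previous
boosting = GradientBoostingClassifier(n_estimators=50)

# Stacking: different model families combined by a meta-learner
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("svm", SVC())],
    final_estimator=LogisticRegression(max_iter=1000),
)

for name, clf in [("bagging", bagging), ("boosting", boosting),
                  ("stacking", stacking)]:
    print(name, cross_val_score(clf, X, y, cv=3).mean())
```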

Automated Machine Learning


Iterative, time-consuming tasks involved in model development, such as feature selection and
model hyperparameter tuning, can be automated, which allows for efficiency and scalability
at large
LESSON 4

Supervised & Unsupervised Learning


Supervised
 Classification
 Regression
 Automated Machine Learning

Unsupervised Learning
 Clustering
 Representation Learning

Supervised
 Classification ; when the predicted output is discrete and belongs to a class e.g.
o Classification on tabular data
o Classification on image or sound data
o Classification on text data

Example;
o Computer Vision
o Speech recognition
o Sentiment analysis
o Anomaly detection
o Credit risk scoring

There are 3 categories of classification;
 Two-Class Classification
o Two categories (Yes/No, True/False)
 Multi-Class Single-Label Classification
o Multiple categories, output belongs to a single category e.g. Red, Blue, Green
 Multi-Class Multi-Label Classification ; output can belong to one or more categories

Multi-Class Algorithm Hyper-parameters

Multi-Class Logistic Regression


 Optimization tolerance
 Regularization weight

Multi-Class Neural Network


 Number of hidden nodes
 Learning rate
 Number of learning iterations

Multi-Class Decision Forest (Ensemble of Decision Tree)


 Resampling method
 Number of decision trees
 Maximum depth of the decision trees
 Number of random splits per node
 Minimum number of sample leaves

Introduction to Regression

Common types of regression problems;


 Regression on tabular data
 Regression on image or sound data
 Regression on text data

Applications of Regression problems include;


 House price prediction
 Customer churn prediction
 Customer lifetime value prediction

Categories of Algorithms
 Linear Regression
o Linear relationship between independent variables and a numeric outcome
o Approaches
 Ordinary Least Squares Method
 Gradient Descent
 Decision Forest Regression
o Ensemble learning method using multiple decision trees
o Each tree outputs a decision
 Neural Network Regression
o Supervised learning method; therefore it requires a labeled target

Automating the Training of Regressors


Automating the conventional ML process covers;
 Features available in the datasets
 Algorithms that are suitable for the task
 Hyperparameter tuning
 Model evaluation
What then is AutoML?
It intelligently tests multiple algorithms and hyperparameters in parallel and returns the best model
for deployment in production at scale.

Unsupervised Machine Learning.

This groups the algorithms that rely only on inputs during the training process. The name comes
from the fact that there is no expected output; the algorithm learns from the input data to discover
hidden behavioral patterns. Unsupervised learning helps discover useful information
from unlabeled data. Applications of unsupervised ML include;
 Anomaly detection
 Customer segmentation

Types of Algorithm;
 Clustering ; occurs when entities from the input data must be assigned into a finite number
of subsets called clusters

 Feature learning ; used to transform a set of inputs into other inputs that are
potentially more useful in solving a given problem
 Anomaly Detection ; to identify 2 major groups of entities;
o Normal
o Abnormal (Anomaly)
 Application of Anomaly detection can be seen in Spam detection & Credit Card
Fraud detection
 Dimensionality Reduction

The underlying motivation for unsupervised ML problems is that the labeled data used for supervised
ML is often hard to acquire and expensive, whereas acquiring unlabeled data for unsupervised
ML tasks is usually inexpensive.

Semi-Supervised ML
It leverages the fraction of the training data that is labeled to derive labels for the
unlabeled data, using;
 Self-training ; the model is trained using the labeled data and then used to make predictions
for the unlabeled data, where the end result is a fully labeled dataset
 Multi-view training ; training multiple models on different views of the data, meaning
various feature subsets of the training data or various model architectures

 Self-ensemble training ; similar to multi-view training, except that a single base model
is used with different hyperparameters on different views of the data

Clustering
Clustering is a problem of organizing entities from the input data into a finite number of subsets
or clusters with the goal of maximizing;
 Intra cluster similarity
 Inter cluster dissimilarity

Applications of Clustering Algorithms

 Personalization and target marketing


 Document Classification ; to tag and classify documents based on similarity
 Fraud detection ; The goal is to isolate fraud cases from historical normal activity
 Medical imaging
 City planning

Clustering can be classified into 4 different categories


1. Centroid-based Clustering ; organizes data into clusters based on the distance of
members from the centroid of the cluster e.g. K-Means
2. Density-based Clustering ; the algorithm clusters members that are closely packed together
3. Distribution-based Clustering ; the underlying assumption in this case is that the data has
an inherent distribution type, such as a normal distribution, which the algorithm then uses to
cluster based on the probability of a member belonging to a distribution
4. Hierarchical Clustering ; the algorithm builds a tree of clusters

K-Means Clustering
It is a centroid-based unsupervised algorithm which creates up to a target number (K) of clusters
by grouping similar members; the objective is to minimize the squared error of each member
relative to its cluster centroid, i.e. to maximize intra-cluster similarity

The Steps for K-Means are as follows;

Initialize Centroids >> Cluster Assignment >> Move Centroids >> Check for Convergence
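A minimal K-Means sketch with scikit-learn on made-up 2-D points; fit_predict runs the initialize/assign/move/converge loop internally:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D points forming two loose groups (made-up values)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster assignment per point
print(kmeans.cluster_centers_)  # final centroids
print(kmeans.inertia_)          # sum of squared distances to centroids
```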
LESSON 5
Classical Machine Learning vs. Deep Learning

All DL algorithms ARE ML algorithms; ML algorithms ARE NOT NECESSARILY DL algorithms

One of the factors that separates DL from classical ML is its innate capability of learning new
features without explicit feature engineering, which is required by classical ML

What is Deep Learning ?


Machine learning is a subset of AI that focuses on creating programs that are capable of
learning without explicit instructions. With the advent of greater computational resources, a special
class of algorithms based on artificial neural networks emerged.
DL is a subset of the wider class of neural networks (regular & deep, where the deep variants
define the field of DL)

Artificial neural networks were inspired by the human brain; however, they do not copy the
human brain.

Characteristics of Deep Learning


 It has the capability of learning multidimensional, non-linear functions
 Capable of handling massive amounts of training data
 It excels with raw and unstructured data because it has an intrinsic capability of learning
new features from the data
 It has the capability of performing automatic feature extraction from data without an
explicit process
 Computationally expensive; specialized resources are required such as GPUs
(Graphics Processing Units) and TPUs (Tensor Processing Units). The more complex the neural
network, the more hidden layers it has and the higher the cost required to train it

Benefits and Application of Deep Learning


 It has the capability of learning arbitrarily complex functions, leveraging a non-parametric
approach
 They are capable of learning patterns without being explicitly shown them
 It has the capability of working with various data types and formats, which gives rise
to a large range of DL applications
 They can work on very large datasets, which is one of the limitations of some classical ML
 They can naturally be distributed for parallel training, e.g. across multiple machines.
 They are capable of reaching on-par performance with humans on certain activities such as
speech recognition and machine translation

Applications of Deep Learning


 Language Translation ; the task of taking input text in a given language and outputting it in
another language
 Image recognition
 Speech recognition
 Time-Series applications such as Time-Series forecasting
 Predictive Analytics
 Combination with Reinforcement Learning can be used in Autonomous vehicle
 Text Analytics for semantics detection, summarization, topic modelling, and clustering tasks
for identifying similarities between different documents

Approaches to Machine Learning


Widely used approaches include;
 Supervised ; Learns from data that contains both the inputs and expected outputs
o Applications includes;
 Classifications - Outputs are categorical
 Regression - Outputs are numerical
 Similarity Learning ; Learns from examples using a similarity function
 Feature Learning - Features are learned using labeled data
 Anomaly detection ; Learns from data labeled as normal /abnormal
 Unsupervised ; Learns from data that contains only inputs and finds hidden structures in
data
o Applications includes;
 Clustering ; Assigns entities to clusters or groups
 Feature Learning ; Features are learned from unlabeled data
 Anomaly Detection
 Reinforcement Learning ; Learns how an agent should take actions in an environment to
maximize a reward function
o Application include;
 Markov Decision Process

Semi-supervised = Supervised + Unsupervised

Specialized Cases of Model Training


There are some variations of the generic classes of ML problems that need to be
treated individually. Specialized cases include;
 Similarity Learning which has an application in Recommender System (Supervised)
 Text Classification (Supervised)
 Feature Learning (Supervised - Classification/ Unsupervised - Clustering)
 Anomaly Detection - Supervised (Classification), Unsupervised (Clustering)
 Forecasting

Similarity Learning
This is closely related to classification and regression; however, it uses a different type of objective
function. It is mostly applied in recommender systems and also used in solving verification
problems such as speech or face verification.
Similarity learning as a supervised learning approach can be treated as a classification task where a
similarity function maps pairs of entities to a finite number of similarity levels (0/1); similarly, it
can be treated as a regression approach where the similarity function maps pairs of entities to
numerical values.

The main aim of a recommendation system is to recommend one or more items to users of the
system. Approaches to recommendation engines include;
1. Content-Based ; makes use of features for both users and items
2. Collaborative Filtering ; uses only identifiers for users and items and gets its information from a
matrix of ratings.

Text Classification
Translating text into numerical formats is referred to as text embedding (word embedding &
scoring). In word embedding we transform every text in the dataset into numeric
features, whereas in scoring we aim to calculate some kind of score related
to the importance of a word in a text. The resulting numerical representation, which is usually
a vector, is then used as input to the classification algorithm.

Text preprocessing

Training a classification model with text follows the stages below


 Documents input or text to be preprocessed
 Text Normalization ; Tokenization, Part of Speech, Stop word removal , Stemming,
Lemmatization etc.
 Applying Feature Extraction ;
o Define Vocabulary (Corpus)
o Vectorize documents with TF-IDF (Term Frequency - Inverse Document Frequency) or
word embeddings
 Supervised DL/ML Algorithms
 Classification model
 Document labels, i.e. the outputs

Predicting a Classification from text
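A minimal sketch of training and then predicting a classification from text with scikit-learn; the documents and labels are invented, and TF-IDF vectorization stands in for the full preprocessing pipeline above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled documents (1 = positive, 0 = negative)
docs = ["great product, works well", "terrible, broke after a day",
        "really happy with this", "awful quality, do not buy"]
labels = [1, 0, 1, 0]

# TF-IDF vectorization (with built-in stop word removal) + classifier
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(),
)
model.fit(docs, labels)

print(model.predict(["the quality is great"]))  # classify unseen text
```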

Feature Learning
Feature engineering is one of the core techniques that can be used to increase the chances of
success in solving machine learning problems. As a part of feature engineering, feature learning
(also called representation learning) is a technique that you can use to learn or derive new
features in your dataset. Having the right input features is a vital prerequisite for
training an ML model.
Feature learning is used to transform a set of inputs into another, more useful set of inputs.
Approaches;
 Supervised Feature Learning ; New features are learned using data that has already been
labelled e.g.
o Datasets that have a high cardinality (one-hot encoding blows up the feature space,
that is, the dimensionality of the data)
o Image classification
 Unsupervised Feature Learning ; Based on learning the new features without having
labeled input data.
o Clustering can be seen as a form of feature learning: the cluster identifier becomes a new feature

Algorithms used in Feature Learning includes;


 PCA
 Independent Component analysis
 Auto-encoder
 Matrix factorization

Application of Feature Learning


 Image classification with Convolutional Neural Networks (CNNs):
CNNs are a special case of DL neural networks
o They learn a hierarchy of increasingly complex features
 Image search
 Feature embedding (categorical features with high cardinality)

Anomaly Detection
It is a machine learning technique concerned with finding data points of interest that deviate
significantly from the norm. It can be approached as both a supervised and an unsupervised ML task.
Usually the number of abnormal entities is much smaller than the number of normal entities; this
means anomaly detection generally creates an imbalanced dataset, which makes it quite difficult to
solve.

Supervised Anomaly Detection


It's a binary classification problem where entities must be classified as either normal (good) or
anomalous (bad). In this instance there is labeled data to be used to train the ML model.

Unsupervised Anomaly Detection


A problem of identifying two major groups (clusters) of entities, that is the normal and the abnormal.
There is no prior knowledge about the labels of the inputs, and the task is to detect the pattern
that clusters normal transactions into one group and abnormal ones into another

Applications of Anomaly Detection


 Condition monitoring (Industrial maintenance) and failure prevention
 Fraud detection
 Intrusion detection
 Anti-virus and anti-malware protection
 Data preparation (Outlier detection)
o Anomaly detection in Machine Maintenance ; An autoencoder is trained to recognize
normality
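The notes mention an autoencoder for machine maintenance; as a simpler hedged sketch, an unsupervised detector such as scikit-learn's IsolationForest can illustrate the idea on synthetic data:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # bulk of the data
outliers = rng.uniform(low=6.0, high=8.0, size=(5, 2))  # far from the norm
X = np.vstack([normal, outliers])

# contamination = assumed fraction of anomalies in the data
detector = IsolationForest(contamination=0.03, random_state=0)
pred = detector.fit_predict(X)  # +1 = normal, -1 = anomaly

print((pred == -1).sum(), "points flagged as anomalies")
```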

Forecasting
Forecasting deals with sets of events that can be ordered in time; this ordering usually implies a
connection with dates or timestamps.
Types of Forecasting Algorithms
 ARIMA - AutoRegressive Integrated Moving Average (an evolution of ARMA - AutoRegressive
Moving Average); see the sketch after this list
 Multi-Variate Regression - Time-series forecasting can be pictured as a form of regression
problem.
 Prophet ; works best with time series that have strong seasonal effects.
 Temporal Convolutional Network (TCN) (It is 1D convolutional) - It is capable of exhibiting a
longer memory than other types of Forecasting algorithm
 RNNs (Recurrent Neural Networks)
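A minimal ARIMA sketch with statsmodels; the series values and the (1, 1, 1) order are made up for illustration:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly series with a mild upward trend (made-up values)
y = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 141,
              150, 158], dtype=float)

# order=(p, d, q): autoregressive order, differencing, moving-average order
model = ARIMA(y, order=(1, 1, 1))
fitted = model.fit()

print(fitted.forecast(steps=3))  # forecast the next 3 periods
```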
LESSON 6

Managed Services for Machine Learning

Conventional ML requires several tools to go from preparing data to deployment, and can be tedious due to;
 Lengthy installation and setup processes
 Expertise needed to configure hardware
 A fair amount of troubleshooting

Managed services save the day with very little setup and easy configuration for any hardware,
since they are cloud based; that is, they provide a ready-made environment that is pre-optimized for
machine learning development and deployment

Examples of Compute resources


 Training clusters
 Inferencing clusters
 Compute instances
 Attached compute
 Local compute
 Etc.

Compute Resources

 Training clusters are compute used for training models; the local machine (local
compute) can also be used for small tasks. A training cluster is used for training and batch
inferencing. The training cluster option provides:
o Single- or multi-node clusters
o Can auto-scale each time you submit a run, due to its elastic capacity
o Automatic cluster management and job scheduling
o Support for both CPU & GPU resources to allow for various large tasks

Inferencing Compute
Inferencing is also called scoring. When you have a trained model, you will want to be able to
deploy it or use it when needed
 Real time & Batch Inferencing
o Real time - to make inferences for each new row of data in real time; for this,
packaging the model as a web service is the desired approach.
 Azure has an Inferencing Cluster for real time inferencing
 Azure Kubernetes Service (AKS)
o Batch Inferencing - to make inferences on multiple rows of data, called batches. Any
compute resource that can load the model and the data can be used.
 Any compute resource can be used for batch inferencing
 Azure ML training cluster
 Azure functions
 Azure IoT edge
 Azure Data Box Edge

High-level architecture of a possible deployment

Web app >> invokes the ML model via a web service >> model deployed in Azure Kubernetes
Service (server cluster, for high scalability) >> Azure ML used to train the model, which is stored in
Azure Container Registry

Azure Container Instances and Compute Clusters are managed by Azure Machine Learning

Managed Notebook Environments

The training of an ML model is the process through which a mathematical model is built from data
that contains inputs and expected outputs (or only inputs, in the case of unsupervised learning)

Basic Modeling
The steps involved in a generic Model training are as follows;
 Experiment ; a generic context for handling runs; it is like a folder that organizes the artifacts
used in the training process
 Runs ; they are used to build a trained model; a run contains all artifacts associated with the
training, such as the logs and script for that model. Azure ML records and takes a snapshot of all
artifacts associated with the training of the model, such as logs, metadata, etc.
 Model registry ; it keeps track of all the models in an Azure ML workspace. A run is used to
produce a model; a model is a piece of code that takes an input and produces an output,
and it can either be produced by a run or originate from outside of Azure ML
o Model = algorithm + data + hyperparameters

When training a model in Azure ML, the following steps are performed (a sketch follows this list);
 Create a new experiment for the run; An experiment is a generic context for handling and
organizing runs.
 You'll want to create one or more runs within the experiment once it is created
 You then register the final model in a model registry once you identify the best model.
After registration, you can then download or deploy the registered model and receive all
the files that were registered.
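A hedged sketch of these steps with the azureml-core SDK (v1); the experiment name, metric, model path, and model name are placeholders:

```python
from azureml.core import Experiment, Workspace
from azureml.core.model import Model

ws = Workspace.from_config()  # assumes a local config.json

# 1. Create an experiment: the context that groups related runs
experiment = Experiment(workspace=ws, name="my-experiment")

# 2. Start a run, log a metric, and complete it
run = experiment.start_logging()
run.log("accuracy", 0.91)  # example metric value
run.complete()

# 3. Register the best model in the model registry
#    (the file path and model name are placeholders)
model = Model.register(workspace=ws,
                       model_path="outputs/model.pkl",
                       model_name="my-model")
```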

Advanced Modelling
Processes involved in end-to-end ML model building:
 Data ingestion
 Data preparation (such as Normalization, Transformation, Validation & Featurization)
 Model building and training (Process such as Hyper-parameter tuning, Automatic Model
selection, Model testing, Model validation)
 Model deployment (Deployment & Batch scoring)

These steps are organized into machine learning pipelines, which are used to create and manage
workflows. ML pipelines are made up of distinct steps as highlighted above. Another property of ML
pipelines is that they are modular: they allow different data scientists / ML engineers to collaborate
while working on separate areas of the workflow

Concepts related to organizing steps into pipelines include;


 DevOps = process automation applied to classical software development. It enables;
o Code reproducibility
o Code testing
o App deployment

 MLOps = applying DevOps principles to Machine learning pipelines. MLOps on the other
side enables;
o Model reproducibility
o Model Validation
o Model deployment
o Model retraining
 It is DevOps for AI which includes;
 Automating the end-to-end ML lifecycle
 Monitor ML processes
 Capture traceability data

Operationalizing of Model
This is a term similar to deploying the model somewhere for use outside the testing or
development environment. A typical model deployment flow is as follows;
 Get the model file (any format)
 Create a scoring script(.py)
 Optionally create a schema file describing the web service input (.json)
 Create a real-time scoring web service
 Call the web service from your applications
 Repeat the process each time you want to re-train the model

The Azure ML service simplifies these steps. Model scoring (inferencing) can be done in real time (on
demand, i.e. as the service receives the data) or in batch (where the model is run on large quantities of
existing data, on a recurring schedule)

When is Batch inferencing used?


 No need for real-time
 Inferencing results can be persisted
 Post processing or analysis on the predictions is needed
 Inferencing is complex
LESSON 7

Responsible AI

Modern AI : Challenges and Principles


 Increasing Inequality - e.g. a healthcare model with features that were a proxy for health,
which made the model biased against the poor
 Weaponization
 Unintentional Bias
 Adversarial Attacks
 Killer Drones
 Deep Fakes ; weaponization of misinformation
 Data Poisoning and Bias
 Hype ; It drives totally unrealistic expectations

Approaches to Responsible AI
 Model Explainability
o Understand Global behavior/explanation
o Understand specific predictions generated by the model
 Evaluate the model for fairness
o Who is neglected
o Who is mis-represented ?

Microsoft AI Principles
1. Fairness ; AI systems should treat people fairly; you should build systems with a
diversified group that detects and eliminates bias
2. Reliability and Safety ; end users need to be certain the product will behave reliably and
responsibly in the expected way in a given scenario
3. Privacy and Security ; systems should be secure, data collection should be transparent, and
they should comply with the necessary regulations
4. Inclusiveness ; AI should benefit everyone and should eliminate unintentional bias.
5. Transparency ; when AI systems help make decisions that impact people's lives, it is important
that there is transparency
6. Accountability ; the developers and AI experts should be accountable for the
solutions they develop

Transparency and Accountability are vital in achieving other principles



Model Transparency and Explainability


The spectrum of model explainability ranges from models that are easy to explain
(self-explanatory, low opacity) to neural networks (difficult to explain)

Explainability in Azure ML
 Direct Explainers ; chosen based on the model type and used directly to explain the model.
They are used when one knows the best explainer for the job. Examples of model-specific
direct explainers include;
o SHAP Tree Explainer, used for tree-based or ensemble models
o SHAP Deep Explainer for DL
o Mimic Explainer
o SHAP Kernel Explainer
 Meta Explainers ; they select the right direct explainer to use for a given type of task
o Tabular Explainer
o Text Explainer
o Image Explainer
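As a hedged illustration of a direct explainer, the open-source shap library's TreeExplainer can explain a tree ensemble; the dataset and model choice here are illustrative:

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)  # 10 input features
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Tree explainer: the direct explainer suited to tree-based/ensemble models
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])

print(shap_values.shape)  # (5 samples, 10 features): per-feature contributions
```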

Model Fairness
FairLearn is a toolkit to identify and mitigate unfairness in machine learning models
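A minimal hedged sketch with Fairlearn's MetricFrame, computing a metric per group of a sensitive feature; all values below are hypothetical:

```python
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score

# Hypothetical labels, predictions, and a sensitive feature
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
group  = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Accuracy broken down per group: large gaps can indicate unfairness
mf = MetricFrame(metrics=accuracy_score, y_true=y_true, y_pred=y_pred,
                 sensitive_features=group)
print(mf.by_group)
print(mf.difference())  # gap between best- and worst-served group
```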
