
Because learning changes everything.

Chapter 07
Automated Machine Learning

Copyright 2022 © McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill
Education.
What Is Automated Machine Learning (AutoML)?

Recall there are two main types of machine learning methods.


• A supervised model is one with a defined target variable.
• An unsupervised model has no target variable.
Running each supervised technique individually and comparing accuracy
results is too time-consuming.
• An efficient alternative is Automated Machine Learning (AutoML).
• This is a supervised approach that explores and selects models using
different algorithms and compares their predictive performance.
Users still must understand the underlying elements involved in
developing the model.

What Questions Might Arise?

• How was the data collected and prepared for analysis?
• How did the model arrive at a particular conclusion?
• What is the blueprint of the model?
• Why did the model arrive at a particular conclusion?
• What variables had the greatest impact on the predicted outcome?
• What patterns exist in the data?
• What are the reasons behind why the recommended model produced the most
accurate decision?
• Are there data issues that could be impacting the validity of the model?
• Is the model consistent in its predictions?
• Why is the model a good predictor?
• How accurate is the model?

AutoML in Marketing

Forty percent of companies report already using machine learning to
improve sales and marketing performance.
The adoption rate for AutoML is expected to increase substantially.


Which Companies Are Actively Using AutoML?

• Facebook.
• AirBnB.
• Sumitomo Mitsui Card Company (SMCC).
• Kroger.
• The Philadelphia 76ers.
• Blue Health Intelligence (BHI).
• United Airlines.
• URBN.
• Disney.
• Pelephone.
• Salesforce Einstein.

What Are Key Steps in the Automated Machine Learning
Process?

There are four key steps.

Preparing the data.

Building models.

Creating ensemble models.

Recommending models.

Data Preparation

Data preparation may include handling:


• Missing data.
• Outliers.
• Variable selection.
• Data transformation.
• Data standardization in order to maintain a common format.
Invalid and unreliable data result in “garbage in, garbage out.”
• Appropriate data preparation is a fundamental first step in producing
accurate model predictions.
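
Two of the preparation tasks listed above, handling missing data and keeping values in a common format, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the choice of median imputation, and the sample values are hypothetical and are not taken from any particular AutoML tool.

```python
from statistics import median

def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def standardize_labels(labels):
    """Put text labels in one common format: trimmed and lowercase."""
    return [label.strip().lower() for label in labels]

# Hypothetical raw columns with a missing value and inconsistent labels.
incomes = [40.0, None, 60.0, 50.0]
regions = [" North", "north ", "SOUTH"]

print(impute_missing(incomes))      # [40.0, 50.0, 60.0, 50.0]
print(standardize_labels(regions))  # ['north', 'north', 'south']
```

Median imputation is one common choice among several; a real project would pick the strategy that suits each variable.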

Model Building

Many models are built automatically after the analyst specifies the
dependent variable.

The purpose of a model is to extract insights from data.

AutoML uses pre-established modeling techniques that make modeling
accessible to everyone from novices to data science experts.

Creating Ensemble Models

Sometimes the best approach is to combine different algorithms, blending
information from more than one model into a single “super model.”
• This type of model is referred to as an ensemble model.
• This process reduces issues such as noise, bias, and inconsistent or
skewed variance that cause prediction problems.
An ensemble model usually generates the best overall predictive
performance.
• Keep in mind that understanding how different variables have
contributed to an outcome can be difficult.

Simple Approaches to Ensemble Modeling

For continuous target variables, one method is to take the average of
predictions from multiple models.
• With two models, for example, you first run each model separately to
create two prediction scores.
• Then calculate the average of the two scores to create a new
ensemble score.

Another more advanced technique involves using a weighted average.
• The higher-quality data would be assigned greater importance and thus
weighted higher.

For categorical target variables, the most common category, or “majority”
rule, can be used.
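
The three simple approaches on this slide (plain average, weighted average, and majority rule) can be sketched as follows. The function names, scores, and weights are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter

def average_ensemble(predictions):
    """Plain average of the scores several models gave one record."""
    return sum(predictions) / len(predictions)

def weighted_ensemble(predictions, weights):
    """Weighted average; a larger weight gives that score more influence."""
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total

def majority_vote(labels):
    """Most common predicted category; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

# Two hypothetical models score the same record 0.25 and 0.75.
print(average_ensemble([0.25, 0.75]))           # 0.5
# The second score is trusted three times as much as the first.
print(weighted_ensemble([0.25, 0.75], [1, 3]))  # 0.625
# Three hypothetical classifiers vote on one borrower.
print(majority_vote(["default", "repay", "default"]))  # default
```

How the weights are chosen is left open here; in practice they would reflect how much each input is trusted.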

Advanced Ensemble Methods – Bagging

Bagging, short for “bootstrap aggregating,” involves two main steps.

Step 1 generates multiple small random samples from the larger sample.
• Because each observation is copied rather than removed from the
original sample, it can be copied again and placed in a second or third
sample.
• This process is referred to as “bootstrap sampling.”

Step 2 is to execute a model on each sample and then combine the results.
• For continuous outcomes, the combined result is the average of the
predictions across all samples.
• For categorical outcomes, the combined result is the majority of the
predicted categories.
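
The two bagging steps can be sketched for a continuous outcome as shown here. To keep the example short, the “model” is a deliberately trivial stand-in that predicts the mean of its sample; the function names, sample sizes, and target values are all hypothetical.

```python
import random
from statistics import mean

def bootstrap_sample(data, size, rng):
    """Step 1: draw `size` records with replacement, so a record can repeat."""
    return [rng.choice(data) for _ in range(size)]

def fit_mean_model(sample):
    """Toy stand-in for a model: always predicts the mean of its sample."""
    prediction = mean(sample)
    return lambda: prediction

def bagged_prediction(data, n_models=25, sample_size=5, seed=0):
    """Step 2: fit one model per bootstrap sample and average the results."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    models = [fit_mean_model(bootstrap_sample(data, sample_size, rng))
              for _ in range(n_models)]
    return mean(model() for model in models)

# Hypothetical continuous target values.
targets = [10.0, 12.0, 11.0, 13.0, 9.0, 14.0, 10.0, 12.0]
estimate = bagged_prediction(targets)
print(estimate)
```

For a categorical outcome, the final averaging step would be replaced by a majority vote over the models' predicted categories.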

Exhibit 7-3: Bagging (Bootstrap Aggregating)



Source: Amey Naik, “Bagging: Machine Learning Through Visuals. #1: What Is ‘Bagging’ Ensemble Learning?” Medium, June 24, 2018, https://medium.com/machine-learning-through-visuals/machine-learning-through-visuals-part-1-what-is-bagging-ensemble-learning-432059568cc8.
Advanced Ensemble Methods – Boosting

The objective of boosting is reducing error in the model.


• Boosting achieves this by observing the error records in a model and
then oversampling misclassified records in the next model created.
During the first step, the model is applied to a sample of the data.
• A new sample is drawn that is more likely to select records that were
misclassified in the first model.
Next, the second model is applied to the new sample.
• The steps are repeated multiple times by fitting a model over and over.
The purpose of boosting is to improve performance and reduce
misclassification.
• The final model typically has better predictive performance than any
of the individual models.
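
The resampling loop described on this slide can be sketched as follows. This is one possible reading, not a definitive implementation: it uses AdaBoost-style weight updates with weighted resampling and a one-variable threshold rule (a “decision stump”) as the model. All names and data values are hypothetical, and labels are coded as -1 and +1.

```python
import math
import random

def fit_stump(sample):
    """Pick the threshold and direction on one feature with the fewest errors.

    `sample` is a list of (x, y) pairs with y equal to -1 or +1.
    """
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for sign in (1, -1):
            errors = sum(1 for x, y in sample
                         if (sign if x >= threshold else -sign) != y)
            if best is None or errors < best[0]:
                best = (errors, threshold, sign)
    _, threshold, sign = best
    return lambda x: sign if x >= threshold else -sign

def boost(data, rounds=5, seed=0):
    rng = random.Random(seed)
    n = len(data)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, stump) pairs
    for _ in range(rounds):
        # Records with larger weights (earlier mistakes) are drawn more often.
        sample = rng.choices(data, weights=weights, k=n)
        stump = fit_stump(sample)
        error = sum(w for w, (x, y) in zip(weights, data) if stump(x) != y)
        if error >= 0.5:
            continue  # no better than guessing, so skip this round
        error = max(error, 1e-10)  # avoid dividing by zero for a perfect stump
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified records and lower the rest.
        weights = [w * math.exp(-alpha * y * stump(x))
                   for w, (x, y) in zip(weights, data)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps; a zero score defaults to +1."""
    score = sum(alpha * stump(x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical records: (feature value, label).
data = [(1, 1), (2, 1), (3, 1), (4, -1), (5, -1), (6, -1), (7, 1), (8, 1)]
ensemble = boost(data)
print([predict(ensemble, x) for x, _ in data])
```

No single threshold separates these labels, which is why several stumps are combined; how well the combined vote fits depends on the samples drawn.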

Exhibit 7-4: Boosting


Model Recommendation

Multiple predictive models are examined and the model with the most
accurate predictions is recommended.
• Accuracy is determined by how well a model identifies relationships
and patterns in a dataset and uses this knowledge to predict outcomes.
• Accuracy is judged by how well the model predicts new observations,
not the observations in the original dataset used to develop the model.
• The most accurate prediction model(s) is then used to make better
decisions.
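
The recommendation step can be sketched as scoring each candidate on records that were held out of model development and picking the most accurate one. The candidate rules, credit scores, and labels here are hypothetical; a real AutoML tool would compare trained models and often use other metrics as well.

```python
def accuracy(model, records):
    """Share of (feature, label) records the model classifies correctly."""
    correct = sum(1 for x, y in records if model(x) == y)
    return correct / len(records)

def recommend(models, holdout):
    """Return (name, accuracy) of the candidate with the best holdout accuracy."""
    scores = {name: accuracy(model, holdout) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical candidates: each maps a credit score to a predicted label.
candidates = {
    "always_repay": lambda score: "repay",
    "cutoff_600": lambda score: "repay" if score >= 600 else "default",
    "cutoff_700": lambda score: "repay" if score >= 700 else "default",
}

# Hypothetical holdout records that were not used to build the models.
holdout = [(550, "default"), (620, "repay"), (710, "repay"), (580, "default")]

best_name, best_accuracy = recommend(candidates, holdout)
print(best_name, best_accuracy)  # cutoff_600 1.0
```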

Case Study – Loan Data: Understanding When and How to
Support Fiscal Responsibility in Customers

Lending Club is a peer-to-peer (P2P) lending platform that wants to reduce
its default rate from 10 percent to 8 percent within a year.
• A supervised model is needed to identify borrowers with a high chance
of default.
• You will upload data into DataRobot for AutoML analysis.
• After identifying the target variable, you run the model and evaluate
the results before applying the model to predict new cases.
The AutoML results identified 2 customers out of 11 as more likely to
default on their loans.
• Lending Club can send targeted messages to these customers about
bill paying, penalties, and free access to financial advisors.
