Course Notes
SAS® Enterprise Miner™ Tour: Hands-on Workshop Course Notes was developed by Terry Woodfield
and Jeff Thompson. Additional contributions were made by Tom Grant and Lincoln Groves.
Instructional design, editing, and production support was provided by the Learning Design and
Development team.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
Copyright © 2020 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States
of America. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise,
without the prior written permission of the publisher, SAS Institute Inc.
ISBN 978-1-952363-51-1
Table of Contents
1.1 Introduction
2.1 Introduction
3.1 Introduction
4.1 Introduction
To learn more…
For information about other courses in the curriculum, contact the
SAS Education Division at 1-800-333-7660, or send e-mail to
training@sas.com. You can also find this information on the web at
http://support.sas.com/training/ as well as in the Training Course
Catalog.
For a list of SAS books (including e-books) that relate to the topics
covered in these course notes, visit https://www.sas.com/sas/books.html or
call 1-800-727-0025. US customers receive free shipping to US
addresses.
Lesson 1 Introduction to SAS®
Enterprise Miner™
Copyright © 2020, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.1 Introduction
Project panel
Properties panel
Help panel
Diagram workspace
Process flow
Node
• Append
• Data Partition
• File Import
• Filter
• Input Data
• Merge
• Sample
• Association
• Cluster
• DMDB
• Graph Explore
• Link Analysis
• Market Basket
• Multiplot
• Path Analysis
• SOM/Kohonen
• StatExplore
• Variable Clustering
• Variable Selection
• Drop
• Impute
• Interactive Binning
• Principal Components
• Replacement
• Rules Builder
• Transform Variables
• AutoNeural
• Decision Tree
• Dmine Regression
• DMNeural
• Ensemble
• Gradient Boosting
• Least Angle Regression
• MBR
• Model Import
• Neural Network
• Partial Least Squares
• Regression
• Rule Induction
• Two Stage
• Cutoff
• Decisions
• Model Comparison
• Score
• Segment Profile
• Incremental Response
• Survival
• TS Correlation
• TS Data Preparation
• TS Decomposition
• TS Dimension Reduction
• TS Exponential Smoothing
• TS Similarity
• Credit Exchange
• Interactive Grouping
• Reject Inference
• Scorecard
• Text Cluster
• Text Filter
• Text Import
• Text Parsing
• Text Profile
• Text Rule Builder
• Text Topic
Analytic workflow
Integrate deployment
Define analytic objective
Gather results
Validate input data
Pattern Discovery
Predictive Modeling
Lesson 2 Accessing and Assaying
Prepared Data
2.1 Introduction
Objectives
• Open SAS Enterprise Miner.
• Open a project.
• Create a data source.
• Explore the data source.
• Identify unwanted cases.
Data Source
Select table.
Define variable roles.
Define measurement levels.
Define table roles.
SAS Foundation Server Libraries
Analysis plan:
• Select segmentation inputs.
• Select the number of segments to create.
• Create segments with the Cluster tool.
• Interpret the segments.
2.2 Creating a SAS Enterprise Miner Data Source
This demonstration illustrates opening SAS Enterprise Miner, opening a project, and defining a data
source. The analysis continues into the next chapter.
5. Change the SAS Enterprise Miner Interactive Sampling defaults. From the main menu bar, select
Options > Preferences.
6. Change the Sample Method property under Interactive Sampling to Random and the Fetch Size
property to Max. Click OK.
7. After you expand the Data Sources and Diagrams folders, the Project panel appears as below:
Note: The INS5050 data source and the prediction and segmentation diagrams are used later.
8. To create the data source for this demonstration, right-click Data Sources and select
Create Data Source.
10. The first step is to select a metadata source. Select SAS Table from the drop-down menu
and click Next.
13. Double-click the Dmtour library to show the data tables in the library.
15. Click Next three times until the window below appears.
Note: For this demonstration, no changes are needed in the role or level settings for the
variables.
16. Click Next two times. The data source role of Raw is correct for this demonstration.
Note: In a later demonstration, the role of the data source must be changed.
17. Click Next. Step 8 provides summary details of the data source. Click Finish.
The data source is now created as shown in the Properties panel for the project.
This demonstration illustrates assaying and exploring a data source. Steps include selecting a
random sample, exploring data with graphs, and identifying and filtering unwanted cases.
2. Select the variables LocX, LocY, LogRegPop, and RegPop by holding down the Ctrl key
as you select their names.
3. To change the role to Rejected, click in the Role column next to one of the selected variables.
Note: The log variables are actually the log of one plus the original variable. This is done
to avoid creating missing values, because the log of zero is undefined.
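The effect of the log-of-one-plus transformation can be sketched outside SAS Enterprise Miner. The following Python fragment is only an illustration; the helper name `log_plus_one` and the sample values are hypothetical and do not come from the course data.

```python
import math

def log_plus_one(x):
    """Return log(1 + x), which is defined at x = 0 (unlike log(x))."""
    return math.log1p(x)

# Hypothetical raw values, including a zero.
raw_values = [0.0, 9.0, 99.0]
transformed = [log_plus_one(v) for v in raw_values]

# A raw value of zero maps to exactly zero rather than to an undefined
# (missing) result, so the case is retained in the transformed variable.
print(transformed)
```

Because a raw value of zero stays at zero after the transformation, zeros in the original variable appear as a spike at zero in the histogram of the log variable.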
5. Select all the remaining variables with the role of Input by first selecting one and then holding
down the Ctrl key and selecting the remaining input variables.
6. Click Explore.
Note: Histograms are produced for the six selected variables. Based on these plots, the
segments are constructed using LogMedHHInc, RegDens, and MeanHHSz. Before
creating the segments, explore the data further to identify and, if necessary, filter
unwanted cases.
11. Select the spike at zero and then restore the plot to its initial size by double-clicking again
on its title bar.
Note: The spike at zero is an indication that some census tracts have zero median household
incomes.
12. Expand the data table window. Notice that some rows have zeros for several of the variables.
Note: These cases are census tracts with no household income. They should be excluded from
the data before you perform the segmentation analysis.
13. Close the Explore window. Close the Variables window by clicking OK.
Lesson 3 Introduction to Pattern
Discovery
3.1 Introduction
Pattern Discovery
Pattern Discovery
Novelty detection
Profiling
Sequence analysis
3.2 Segmentation Analysis
Unsupervised Classification
Training Data Training Data
case 1: inputs, ? case 1: inputs, cluster 1
case 2: inputs, ? case 2: inputs, cluster 3
case 3: inputs, ? case 3: inputs, cluster 2
case 4: inputs, ? case 4: inputs, cluster 1
case 5: inputs, ? case 5: inputs, cluster 2
new case
This demonstration illustrates how to use the Cluster tool to segment the cases in the
CENSUSTRACTS data set. Steps include selecting segment variables, specifying the number of
segments, and exploring and profiling the segments.
1. Continue from the modified dmtour project of the previous demonstration. Right-click and open
the segmentation diagram. You should see the diagram, as shown below.
2. Select the CENSUSTRACTS data set and drag it into the segmentation diagram.
3. Position the mouse pointer on the right side of the CENSUSTRACTS node, and a pencil icon
appears. Then click and drag a connection to the left side of the Filter node. The segmentation
diagram should appear as seen below.
Note: In the previous lesson, some unwanted cases were identified in the CENSUSTRACTS
data. We need to configure the Filter and Cluster nodes to eliminate these cases and
then to generate appropriate segments for the data. Your instructor will review the
changes that have been made to the default settings in the Filter and Cluster nodes as
the demonstration continues.
4. Select the Filter node so that the segmentation diagram looks as shown below. The Filter node’s
properties appear in the Properties panel.
Recall that cases with a LogMedHHInc (log median household income) value of zero are
to be eliminated from the analysis before creating segments of the data.
The first step is to set the Default Filtering Method property to User-Specified Limits for the
Interval Variables.
Note: Under its default settings, the Filter node is an outlier eliminator. For Interval-Valued
Variables, it finds observations that are greater than 3 standard deviations away from
the mean of the variable and eliminates them. The default filtering method for interval
variables is Standard Deviation from the Mean.
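The idea behind a standard-deviation filter can be sketched in a few lines of Python. This is an illustration of the general technique only, not the Filter node's implementation; the function name `filter_by_std`, the use of the sample standard deviation, and the data values are assumptions made for the example.

```python
import statistics

def filter_by_std(values, n_std=3.0):
    """Keep values within n_std standard deviations of the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumed)
    lower = mean - n_std * std
    upper = mean + n_std * std
    return [v for v in values if lower <= v <= upper]

# Hypothetical interval variable: twenty typical values and one extreme value.
values = [10.0] * 20 + [1000.0]
kept = filter_by_std(values)

print(len(values), "cases before filtering,", len(kept), "after")
```

A filter of this kind targets extreme values in either tail. It is not designed to remove a specific known value such as zero, which is why the demonstration switches to user-specified limits.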
5. Setting the Default Filtering Method property to User-Specified Limits tells SAS Enterprise
Miner that you want to set limits for filtering. To see what these are, select the ellipsis next to the
Interval Variables property.
6. Under the limits set below, any case with values for LogMedHHInc less than 0.1 is eliminated
from the analysis.
7. Click OK.
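The user-specified lower limit described in the steps above amounts to a simple row filter. The Python sketch below shows the logic under stated assumptions: the helper `apply_lower_limit` and the four sample rows are hypothetical, and only the variable name LogMedHHInc and the limit 0.1 come from the demonstration.

```python
def apply_lower_limit(rows, variable, lower_limit):
    """Return the rows at or above the limit and the number of rows excluded."""
    kept = [row for row in rows if row[variable] >= lower_limit]
    return kept, len(rows) - len(kept)

# Hypothetical census tracts; two have a log median household income of zero.
tracts = [
    {"LogMedHHInc": 0.0},
    {"LogMedHHInc": 10.5},
    {"LogMedHHInc": 11.2},
    {"LogMedHHInc": 0.0},
]
kept_tracts, n_excluded = apply_lower_limit(tracts, "LogMedHHInc", 0.1)

print(n_excluded, "cases excluded;", len(kept_tracts), "cases kept")
```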
9. Click Results.
10. The output indicates that 1082 cases have variables whose values fall below the thresholds set
above.
12. Select the Cluster node. The properties of the Cluster node should appear in the Properties
panel.
The default for the Use column is Yes for input roles and No for rejected roles.
Based on exploration of the CENSUSTRACTS data in the previous lesson, the segmenting
variables are MeanHHSz, RegDens, and LogMedHHInc.
Consequently, LogMeanHHSz, LogRegDens, and MedHHInc have the Use column set to
No.
15. Right-click the Cluster node and select Run from the menu.
You can see from the values for LogMedHHInc that cluster 4 has the highest income. Other
tools are available in SAS Enterprise Miner to profile the segments.
19. Click the Assess tab. Drag the Segment Profile tool onto the diagram and connect
it to the Cluster node.
You often want to profile segments using variables that were not used to construct the
segments.
20. Select the Segment Profile node and select the ellipsis next to the Variables property.
Note: Two new variables, _SEGMENT_ and _SEGMENT_LABEL_, are in the results data set
generated by the Cluster node.
21. Select the three log-input variables and change the Use column to No. Select the remaining
three input variables and _SEGMENT_, and change Use to Yes.
The red clear histograms show the distribution of the entire population so that it is easy to
compare the distribution of each segment with the population.
3.3 Further Exploration of Segments (Self-Study)
Note: The Graph Explore tool can also be used to profile segments. One feature of the Graph
Explore node is that the graphs and plots that it generates persist when the node is
closed.
3. Under the Sample Properties group, change the Method property to Random and Size to Max.
The CENSUSTRACTS data is ordered geographically. Consequently, taking only the top portion
for exploring gives misleading results.
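The difference between a top-of-file sample and a random sample on ordered data can be shown with a small Python sketch. The region labels, record counts, and random seed below are hypothetical and chosen only to make the effect visible.

```python
import random

# Hypothetical records stored in geographic order: all "east" rows first.
records = ["east"] * 500 + ["west"] * 500

# Taking the top of the table sees only the first region.
top_sample = records[:100]

# A random sample draws from the whole table.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
random_sample = rng.sample(records, 100)

top_regions = set(top_sample)
random_regions = set(random_sample)
print("Top sample regions:", sorted(top_regions))
print("Random sample regions:", sorted(random_regions))
```

In this sketch the top sample contains only one region, while the random sample represents both, which is the reason for setting the sampling method to Random.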
4. Select the ellipsis next to the Variables property. For LocX and LocY, change the values from
No to Yes in the Use column. Click OK.
5. Run the Graph Explore node and select Results when it is completed.
9. Click Next.
16. Smaller markers will improve the plot. To change the marker size, right-click the graph and select
Graph Properties.
17. Clear the Autosize Markers box and slide the size scale to the left to size 3 as shown above.
The census tracts in segment 2 are highlighted in the US graph and in the data table.
22. Select different bars in the bar chart to see how the other segments are distributed
geographically.
Lesson 4 Introduction to
Predictive Modeling
4.1 Introduction
Predictive Modeling
Database marketing
Fraud detection
Process monitoring
Pattern detection
Analysis plan:
• Define modeling data.
• Build and compare predictive models.
Primary: Decision Tree, Regression, Neural Network
Multiple Model: Ensemble, Two Stage
Score Data
case 1: inputs ?
case 2: inputs ?
case 3: inputs ?
case 4: inputs ?
case 5: inputs ?
Only input values known
Predictions
Training Data Predictions
case 1: inputs target prediction
case 2: inputs target prediction
case 3: inputs target prediction
case 4: inputs target prediction
case 5: inputs target prediction
Score Data
case 1: inputs ? prediction
case 2: inputs ? prediction
case 3: inputs ? prediction
case 4: inputs ? prediction
case 5: inputs ? prediction
new case
Predict new cases.
Optimize complexity.
Decision Predictions
Training Data Decisions
case 1: inputs target primary
case 2: inputs target secondary
case 3: inputs target tertiary
case 4: inputs target primary
case 5: inputs target secondary
A trained model uses input measurements to make the best decision for each case.
Figure: Redundancy (axes x1 and x2) and Irrelevancy (axes x3 and x4)
Optimize complexity.
Fool’s Gold
I struck it rich!
Model Complexity
Too flexible
Not flexible enough
Data Splitting
4.2 Fitting and Comparing Predictive Models
This demonstration illustrates constructing and comparing predictive models. Steps include
accessing a data source, fitting a decision tree, imputing missing data, fitting a logistic regression
model, fitting a neural network, and comparing the three fitted models on validation data.
1. To open the prediction diagram, right-click prediction and select Open from the menu.
2. Select the INS5050 data source node so that its properties are displayed.
5. Scroll down until you see the variable Ins. Click on just this variable and change the role to
Target.
6. Select the variables, as shown below, by holding down the Ctrl key and clicking their names.
7. Click Explore.
Notice that the histogram for Ins shows that the numbers of event cases (cases where Ins=1)
and nonevent cases (Ins=0) are equal. The training data were constructed in this way.
The cases where Ins equals 1 are shaded. By comparing the distribution of the event cases with
the overall distribution, you can identify variables that are associated with the target and thus
might be good choices to include in a model.
9. Expand the DMTOUR.INS5050 data table window by selecting the maximize button.
Note: Some cases have missing values. For some modeling methods, missing values can
cause problems, and the data must be further prepared by replacing the missing values
in a reasonable way.
10. Close the Explore window, and close the Variables window by clicking OK.
11. Partition the raw data for honest assessment of the models. To do this, drag the Data Partition
node onto the workspace and connect it to the INS5050 data source as shown below.
After you select the Data Partition node, its properties are displayed.
12. Scroll down in the properties of the Data Partition node and change the values of the Data Set
Allocations property as shown below.
13. Run the flow from the Data Partition node. Select the results when the run is completed.
The output includes the distribution of the data partitioned by the target.
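One way to picture a partition that preserves the target distribution is a stratified split. The Python sketch below is an illustration only: the function `stratified_partition`, the 50/50 allocation, the seed, and the sample rows are hypothetical, and the sketch assumes that the partition is stratified on the target, which the notes do not state explicitly.

```python
import random
from collections import defaultdict

def stratified_partition(rows, target, train_fraction, seed=0):
    """Split rows into training and validation sets within each target level."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for row in rows:
        by_level[row[target]].append(row)
    train, validate = [], []
    for level_rows in by_level.values():
        shuffled = level_rows[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        validate.extend(shuffled[cut:])
    return train, validate

# Hypothetical balanced data: 50 event cases (Ins=1) and 50 nonevent cases (Ins=0).
rows = [{"Ins": 1} for _ in range(50)] + [{"Ins": 0} for _ in range(50)]
train, validate = stratified_partition(rows, "Ins", train_fraction=0.5)

print("Training events:", sum(r["Ins"] for r in train), "of", len(train))
print("Validation events:", sum(r["Ins"] for r in validate), "of", len(validate))
```

Splitting within each target level keeps the event proportion the same in both partitions, so validation assessments are computed on data that resembles the training data.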
16. Drag the Decision Tree tool (second from left) onto the workspace, and connect it as shown
below.
18. Change the Assessment Measure property to Misclassification. (In this example, Misclassification
is equivalent to the default setting, so this step is not strictly necessary.)
21. Select the first pair of bars in the Leaf Statistics window.
Note: The terminal leaf associated with the selected bars is outlined in the Treemap window,
and the tree plot focuses on the same leaf.
Note: You can display other charts by selecting from the menu that appears when the down
arrow is selected.
Before a logistic regression or neural network model is fit, the missing data must be replaced
with some reasonable value.
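A simple example of a "reasonable value" is the mean of the non-missing values of an interval input. The Python sketch below illustrates that idea under stated assumptions: the function `impute_mean`, the choice of the mean, and the sample values are hypothetical and are not taken from the Impute node's settings.

```python
import statistics

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill_value = statistics.mean(observed)
    return [fill_value if v is None else v for v in values]

# Hypothetical interval input with two missing values.
balances = [100.0, None, 300.0, None, 200.0]
imputed = impute_mean(balances)

print(imputed)
```

Models such as regression and neural networks need a complete set of input values for every case, so without a replacement value the cases with missing inputs could not be used.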
26. Select the Modify group and drag the Impute tool onto the workspace.
27. Connect the Impute node to the Data Partition node as shown below.
29. Run the Impute node and select Results when the run is completed.
31. Select the Model group and drag the Neural Network and Regression tools onto the
workspace and connect them to the Impute node as shown below.
Note: The properties for each node need to be changed so that they use the same criterion for
optimizing complexity as the decision tree.
32. Select the Regression node and scroll down to the Model Selection properties.
34. Change the Selection Criterion property to Validation Misclassification to agree with the
decision tree.
36. Scroll down to the Train properties and change the Model Selection Criterion property
to Misclassification to match the Decision Tree and Regression nodes.
39. Connect the Regression node, the Neural Network node, and finally the Decision Tree node
to the Model Comparison node as shown below.
40. Run the flow by right-clicking the Model Comparison node and selecting Run. All preceding
nodes that were not yet run will also run in the correct order.
Assessment information is calculated for all three models on both the training and validation
data sets.
Based on the common assessment criterion of validation misclassification, the neural network
model is the best.
Note: In the next lesson, you see how to score prospects using the chosen model. Because
the neural network is only slightly better than the decision tree, and a decision tree is
easier to understand, the scoring will be done with the decision tree.
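Comparing models on validation misclassification reduces to counting wrong predictions for each model on the same held-out cases. The Python sketch below shows that calculation; the target values and the three prediction lists are invented for illustration and are not results from the demonstration.

```python
def misclassification_rate(actual, predicted):
    """Return the fraction of cases whose predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)

# Hypothetical validation targets and model predictions.
validation_target = [1, 0, 1, 0, 1, 0, 1, 0]
predictions = {
    "Decision Tree": [1, 0, 1, 0, 0, 1, 1, 0],
    "Regression": [1, 1, 0, 0, 0, 1, 1, 0],
    "Neural Network": [1, 0, 1, 0, 1, 1, 1, 0],
}

rates = {
    name: misclassification_rate(validation_target, predicted)
    for name, predicted in predictions.items()
}
best_model = min(rates, key=rates.get)

for name, rate in rates.items():
    print(name, rate)
print("Lowest validation misclassification:", best_model)
```

The lowest rate identifies the best model by this criterion alone. As the note above points out, a small difference in the rate can be outweighed by how easy the model is to understand.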
Lesson 5 Model Implementation
5.1 Scoring with SAS Enterprise Miner
Model Implementation
◼ Internally scored data sets
◼ Score code modules

Training Data
case 1: inputs target=1
case 2: inputs target=0
case 3: inputs target=0
case 4: inputs target=1
case 5: inputs target=0

Score Data
case 1: inputs ? prediction
case 2: inputs ? prediction
case 3: inputs ? prediction
case 4: inputs ? prediction
case 5: inputs ? prediction
This demonstration illustrates scoring in SAS Enterprise Miner. Steps include creating a score data
source, choosing the scoring model, and scoring within SAS Enterprise Miner. Options for creating
external scoring modules are also illustrated.
The demonstration continues in the prediction diagram that is presented in the previous lesson.
1. Right-click Data Sources in the Project panel and select Create Data Source.
2. Click Next.
3. Click Browse.
Because this data set will be scored, its role needs to be changed to Score.
7. Select the down arrow at the end of the Role field. (The field currently has a value of Raw.)
10. Drag the PROSPECTS data source into the Prediction Diagram.
11. From the Assess group, drag the Score tool onto the workspace.
13. Connect the Decision Tree node to the Score node as shown below. (The diagram workspace
has been magnified to assist in viewing.) The Decision Tree is selected because it is
easier to explain to most audiences and performs nearly as well as the neural network model.
14. Run the PROSPECTS data source node and then browse the data by clicking the ellipsis next to
Exported Data. This reveals that there is no target and that there is missing data.
15. Right-click the Score node and select Run from the menu. Select Results when the run is
complete.
16. The Results window appears and contains four windows: Optimized SAS Code, SAS Code,
Output, and Output Variables.
The Output Variables window shows the scored data set that is created by the Score node. The
data set includes several new variables, including P_Ins0 and P_Ins1.
P_Ins1 is the predicted probability that a case will respond to the promotion based on the model.
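As a rough illustration of how these two variables relate, the Python sketch below assumes a binary target, so that P_Ins0 is taken to be the complement of P_Ins1. The column names prospect_id and predicted_Ins, the 0.5 cutoff, and the probability values are hypothetical and are not taken from the scored data set.

```python
def add_event_columns(scored_rows, cutoff=0.5):
    """Add P_Ins0 and a predicted class to rows that already carry P_Ins1."""
    out = []
    for row in scored_rows:
        p1 = row["P_Ins1"]
        if not 0.0 <= p1 <= 1.0:
            raise ValueError("P_Ins1 must be a probability between 0 and 1")
        new_row = dict(row)
        # Assumption: with a binary target the two probabilities sum to 1.
        new_row["P_Ins0"] = 1.0 - p1
        # predicted_Ins is a hypothetical column name used only in this sketch.
        new_row["predicted_Ins"] = 1 if p1 >= cutoff else 0
        out.append(new_row)
    return out


def rank_prospects(rows):
    """Order prospects from most to least likely to respond."""
    return sorted(rows, key=lambda r: r["P_Ins1"], reverse=True)


# Made-up scored cases; prospect_id is a hypothetical identifier.
scored = [
    {"prospect_id": "A", "P_Ins1": 0.25},
    {"prospect_id": "B", "P_Ins1": 0.75},
    {"prospect_id": "C", "P_Ins1": 0.5},
]

ranked = rank_prospects(add_event_columns(scored))
for row in ranked:
    print(row["prospect_id"], row["P_Ins1"], row["P_Ins0"], row["predicted_Ins"])
```

Sorting by P_Ins1 is one common way to decide which prospects to contact first when only a portion of them can receive the promotion.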
17. You revisit the results from the Score node after viewing the scored data set. Close the Results
window.
18. To browse the scored data set, click the ellipsis next to Exported Data in the Properties panel
of the Score node.
19. Select the Score data set and then click Explore.
The column labeled Probability for level 1 of Ins is the P_Ins1 variable mentioned previously.
(The column heading has been expanded.)
21. Close the Explore window and close the Exported Data window to return to the process flow.
The Base SAS score code can be browsed in the SAS Code window shown below.
23. The Score node also generates score code in C and Java, but these properties are turned off by
default in the Score node. Close the Results window.
24. In the Properties panel of the Score node, under Score Code Generation, change the C Score
and Java Score properties to Yes.
25. Rerun the Score node by right-clicking the node and selecting Run. View the results when the
run is complete.
Note: The Java score code can be browsed in a manner similar to that shown in the
demonstration above.
Note: The score code can be prepared for executing outside of SAS Enterprise Miner in
several ways.
30. The files can be exported from the Score node by selecting File > Save As.
Note: The file can be saved in any location by navigating to the desired folder.
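To give a feel for what a score code module does once it runs outside the modeling tool, here is a minimal Python sketch. It is not the code that the Score node generates (that code is Base SAS, C, or Java); the input names balance and age, the replacement values, the split points, and the leaf probabilities are all invented for illustration.

```python
# Hypothetical replacement values for missing inputs (made up for this sketch).
IMPUTE_VALUES = {"balance": 1000.0, "age": 45.0}


def score_case(inputs):
    """Score one case with a hand-written, made-up two-split decision tree."""
    # Replace missing inputs first, since score data can contain missing values.
    values = {}
    for name, default in IMPUTE_VALUES.items():
        raw = inputs.get(name)
        values[name] = default if raw is None else float(raw)

    # Invented tree rules; each leaf carries a predicted probability.
    if values["balance"] >= 5000.0:
        p1 = 0.75
    elif values["age"] < 30.0:
        p1 = 0.25
    else:
        p1 = 0.5

    return {"P_Ins1": p1, "P_Ins0": 1.0 - p1}


print(score_case({"balance": 6000, "age": 50}))
print(score_case({"balance": None, "age": 25}))
```

The point of the sketch is that the function depends only on the input values of a case: no training data and no modeling software are needed at scoring time.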