
UNCLASSIFIED / FOUO


National Guard
Black Belt Training
Module 36

Simple Linear Regression

This material is not for general distribution, and its contents should not be quoted, extracted for publication, or otherwise copied or distributed without prior coordination with the Department of the Army, ATTN: ETF.

CPI Roadmap – Analyze


8-STEP PROCESS
1. Validate the Problem
2. Identify Performance Gaps
3. Set Improvement Targets
4. Determine Root Cause
5. Develop Counter-Measures
6. See Counter-Measures Through
7. Confirm Results & Process
8. Standardize Successful Processes

Define Measure Analyze Improve Control

ACTIVITIES
• Identify Potential Root Causes
• Reduce List of Potential Root Causes
• Confirm Root Cause to Output Relationship
• Estimate Impact of Root Causes on Key Outputs
• Prioritize Root Causes
• Complete Analyze Tollgate

TOOLS
• Value Stream Analysis
• Process Constraint ID
• Takt Time Analysis
• Cause and Effect Analysis
• Brainstorming
• 5 Whys
• Affinity Diagram
• Pareto
• Cause and Effect Matrix
• FMEA
• Hypothesis Tests
• ANOVA
• Chi Square
• Simple and Multiple Regression

Note: Activities and tools vary by project. Lists provided here are not necessarily all-inclusive.

Learning Objectives
• Terminology and data requirements for conducting a regression analysis
• Interpretation and use of scatter plots
• Interpretation and use of correlation coefficients
• The difference between correlation and causation
• How to generate, interpret, and use regression equations

Application Examples
• Administrative – A financial analyst wants to predict the cash needed to support growth and increases in training
• Market/Customer Research – The main exchange wants to determine how to predict a customer’s buying decision from demographics and product characteristics
• Hospitality – The MWR Guest House wants to see if there is a relationship between room service delays and order size

When Should I Use Regression?

                          Independent Variable (X)
Dependent Variable (Y)    Continuous             Attribute
Continuous                Regression             ANOVA
Attribute                 Logistic Regression    Chi-Square (χ²) Test

The tool depends on the data type. Regression is typically used with a continuous input and a continuous response but can also be used with count or categorical inputs and outputs.
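
The matrix above can be read as a simple lookup from data types to tool. The following is a minimal Python sketch of that lookup; the names `TOOL_BY_DATA_TYPE` and `choose_tool` and the lowercase type labels are illustrative assumptions, not part of Minitab or the course material.

```python
# Lookup of analysis tool by data type, mirroring the 2x2 matrix above.
# Keys are (dependent Y type, independent X type); all names are illustrative.
TOOL_BY_DATA_TYPE = {
    ("continuous", "continuous"): "Regression",
    ("continuous", "attribute"): "ANOVA",
    ("attribute", "continuous"): "Logistic Regression",
    ("attribute", "attribute"): "Chi-Square Test",
}


def choose_tool(y_type: str, x_type: str) -> str:
    """Return the tool suggested for the given Y and X data types."""
    key = (y_type.lower(), x_type.lower())
    if key not in TOOL_BY_DATA_TYPE:
        raise ValueError(f"Unknown data type combination: {key}")
    return TOOL_BY_DATA_TYPE[key]


print(choose_tool("continuous", "continuous"))  # Regression
```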

General Strategy for Regression Modeling

Planning and Data Collection
• What variables?
• How will I get the data?
• How much data do I need?

Initial Analysis and Reduction of Variables
• What input variables have the biggest effect on the response variable?
• What are some candidate prediction models?

Select and Refine Models
• What is the best model?

Validate Model
• How well does the model predict new observations?

Regression Terminology
Types of Variables
• Input Variable (Xs)
  - These are also called predictor variables or independent variables
  - Best if the variables are continuous, but can be count or categorical
• Output Variable (Ys)
  - These are also called response variables or dependent variables (what we’re trying to predict)
  - Best if the variables are continuous, but can be count or categorical
[Diagram: inputs X1, X2, X3 and Error feed a Process or Product, which produces the output Y]

Visualize the Data – A Good Start!

Scatter Plot: A graph showing a relationship (or correlation) between two factors or variables
• Lets you “see” patterns in data
• Supports or refutes theories about the data
• Helps create or refine hypotheses
• Predicts effects under other circumstances (be careful extending predictions beyond the range of data used)

Be Careful: Correlation does not guarantee causation!

Correlation vs. Causation
• Correlation by itself does not imply a cause and effect relationship!
[Figure: example scatter plots; axis labels include Average life expectancy, Gas mileage, # divorces/10,000, and Price of automobiles]
Lurking variables!
Other examples?
When is it correct to infer causation?

Example: Mortgage Estimates
• A Belt is trying to reduce the call length for military clients calling for a good faith estimate on a VA loan
• The Belt thinks that there is a relationship between broker experience and call length, and creates a scatter plot to visualize the relationship

Example: Mortgage Estimate Scatter Plot

Hypothesis: Brokers with more experience can provide estimates in a shorter time.
[Scatter plot: Call Length (Y axis, 20 to 60) vs. Broker Experience (X axis, 10 to 30)]
Does it look like a relationship exists between Broker Experience and Call Length?

Scatter Plot - Structure
[Scatter plot: Call Length on the Y axis (result?) vs. Broker Experience on the X axis (suspected influence); each point is one pair of data]
Paired Data?
To use a scatter plot, you must have measured two factors for a single observation or item (ex: for a given measurement, you need to know both the call length and the broker’s experience). You have to make sure that the data “pair-up” properly in Minitab, or the diagram will be meaningless.
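
Outside Minitab, the same pairing requirement can be checked before plotting. The sketch below is a minimal Python illustration; the function name `pair_observations` and the sample values are hypothetical and are not taken from the course data.

```python
def pair_observations(x_values, y_values):
    """Pair each X with the Y measured on the same observation.

    Raises ValueError when the two columns differ in length, because the
    values can then no longer be matched row by row.
    """
    if len(x_values) != len(y_values):
        raise ValueError("X and Y must have one value each per observation")
    return list(zip(x_values, y_values))


# Hypothetical illustration values (not from the course data set).
experience = [10, 20, 30]    # broker experience for three calls
call_length = [55, 40, 25]   # call length for the same three calls
print(pair_observations(experience, call_length))
```

A length check only catches missing values; it cannot detect rows that were sorted or shifted independently, so the pairing still has to be verified at data entry.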

Input, Process, Output Context

PREDICTOR MEASURES (X)
Input (X)
• Arrival Time
• Accuracy
• Cost
• Key Specs
Process (X)
• Time Per Task
• In-Process Errors
• Labor Hours
• Exceptions

RESULTS MEASURES (Y)
Output (Y)
• Customer Satisfaction
• Total Defects
• Cycle Time
• Cost

X Axis – Independent Variable
Y Axis – Dependent Variable

Scatter Plots
[Figure: four example scatter plots: No Correlation, Negative, Curvilinear, Positive]
• See how one factor relates to changes in another
• Develop and/or verify hypotheses
• Judge strength of relationship by width or tightness of scatter

Don’t assume a causal relationship!


Exercise: Interpreting Scatter Plots


1. As a team, review assigned Scatter Plots – see next pages
2. What kind of correlation do you see? (Name)
3. What does it mean?
4. What can you conclude?
5. What data might this represent? (Example)


Example One


Example Two


Example Three

Minitab Example: Scatter Plot
• Next, we will work through a Minitab example using data collected at Anthony’s Pizza
• The Belt suspects that customers have to wait too long on days when Anthony’s Pizza has many deliveries to make

Minitab Example: Pizza Scatter Plot
• A month of data was collected and stored in the Minitab file Regression-Pizza.mtw

Pizza Scatter Plot (Cont.)
1. Open worksheet Regression-Pizza.mtw
2. Choose Graph > Scatterplot

Pizza Scatter Plot (Cont.)
When you click on Scatterplot, this is the first dialog box that comes up
3. Select the Simple Scatterplot
4. Click on OK to move to the next dialog box

Pizza Scatter Plot (Cont.)
5. Double click on C5 Wait Time to enter it as the Y variable, then double click on C6 Deliveries to enter it as the X variable
6. Edit dialog box options (Optional)
7. Click OK

Pizza Scatter Plot (Cont.)
Does it look like the number of Deliveries influences the customer’s Wait Time?
[Scatterplot of Wait Time vs Deliveries: Wait Time (Y axis, 35 to 55) against Deliveries (X axis, 10 to 35)]

Pizza Scatter Plot (Cont.)
Note: Hold your cursor over any point on the Scatterplot and Minitab will identify the Row, X-Value and Y-Value for that point

Correlation Coefficients (r & r²)
• Numbers that indicate the strength of the correlation between two factors
  - r: strength and direction of the linear relationship (also called Pearson’s Correlation Coefficient)
  - r²: percentage of the variation in Y attributable to the independent variable X
• Adds precision to a person’s visual judgment about correlation
• Test the power of your hypothesis
  - How much influence does this factor have?
  - Are there other, more important, “vital few” causes?

Interpreting Correlation Coefficients
• r falls on or between -1 and 1
• Calculate in Minitab
• Figures below -0.65 and above 0.65 indicate a meaningful correlation
• 1 = “Perfect” positive correlation
• -1 = “Perfect” negative correlation
• Use to calculate r²
[Figure: example scatter plots with r = 0 and r = -0.8]
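
The interpretation rules above can be written as a short check. This is a minimal Python sketch that applies the slide's ±0.65 rule of thumb; the function name `interpret_r`, the `threshold` parameter, and the returned wording are illustrative assumptions.

```python
def interpret_r(r: float, threshold: float = 0.65) -> str:
    """Classify a Pearson r using the +/-0.65 rule of thumb from the slide."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must fall on or between -1 and 1")
    if r > threshold:
        return "meaningful positive correlation"
    if r < -threshold:
        return "meaningful negative correlation"
    return "no meaningful correlation"


print(interpret_r(-0.8))  # meaningful negative correlation
print(interpret_r(0.0))   # no meaningful correlation
```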

Pearson Correlation Coefficient (r) – Mortgage
• Betty Black Belt used the scatter plot to get a visual picture of the relationship between broker experience and call length
• Now she uses the Pearson Correlation Coefficient, r, to quantify the strength of the relationship
[Scatter plot: Call Length (Y axis) vs. Broker Experience (X axis); r = -0.896 (a strong negative correlation)]

Exercise: Correlation
• The scatter plot shows that the customers are waiting longer when Anthony’s Pizza has to make more deliveries
• Next, the Belt wants to quantify the strength of that relationship
• To do that, we will calculate the Pearson Correlation Coefficient, r
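
For readers who want to see what the correlation calculation involves, here is a minimal pure-Python sketch of the Pearson formula. It is not Minitab's implementation; the function name `pearson_r` and the sample values are hypothetical and are not the Regression-Pizza.mtw data.

```python
import math


def pearson_r(x_values, y_values):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy) on paired observations."""
    n = len(x_values)
    if n != len(y_values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    syy = sum((y - mean_y) ** 2 for y in y_values)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only.
deliveries = [10, 15, 20, 25, 30]
wait_time = [38, 41, 43, 47, 50]
print(round(pearson_r(deliveries, wait_time), 3))
```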


Pizza Correlation
1. Choose Stat > Basic Statistics > Correlation

Correlation Input Window
2. Double click on C5 Wait Time and C6 Deliveries to add them to the Variables box
3. Uncheck the box, Display p-values
4. Click OK

Correlation Coefficient
Since r, the Pearson correlation, is 0.970, there is a meaningful correlation between the wait time and number of deliveries

Interpreting Coefficients – r²
• First, we obtained r from the Correlation analysis
• Next, in Regression, we will look at r² to see how good our model (regression equation) is
• r²: Compute by multiplying r x r (Pearson correlation squared)
• Example: With an r value of .970, in the Pizza example, the team computed r²: .970 x .970 = .941 or 94.1%
• So, 94% of the variation in wait time is explained by the variability in deliveries
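
The arithmetic in the example can be reproduced in a few lines. This is a small Python sketch; the function name `r_squared` is an illustrative assumption, and the only input is the r value reported above.

```python
def r_squared(r: float) -> float:
    """Square the Pearson correlation to get r-squared."""
    return r * r


r = 0.970  # Pearson correlation reported for the Pizza example
value = r_squared(r)
print(f"r^2 = {value:.4f} ({value:.1%})")  # r^2 = 0.9409 (94.1%)
```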

Regression Analysis
• Regression Analysis is used in conjunction with Correlation and Scatter Plots to predict future performance using past results
• While Correlation shows how strong the linear relationship between two variables is, Regression defines the relationship more precisely
• Use this tool when there is existing data over a defined range
• Regression analysis is a tool that uses data on relevant variables to develop a prediction equation, or model

Linear Regression
• In Simple Linear Regression, a single variable “X” is used to define/predict “Y”
  - e.g., Wait Time = B1 + (B2) x (Deliveries) + ε (error)
• Simple Regression Equation: Y = B1 + (B2) x (X) + ε, where B1 is the intercept and B2 is the slope
[Figure: fitted line on X-Y axes; B2 = Slope = Δy/Δx]

Exercise: Regression
• Since the Pearson Correlation (r) was .970, we know that there is a strong positive correlation between the number of deliveries and the wait time
• Next, the Belt would like to get an equation to predict how long the customers will be waiting

Regression (Cont.)
1. Choose Stat > Regression > Fitted Line Plot

Fitted Line Input Window
2. Double click on C5 Wait Time to enter it as the Response (Y) variable
3. Double click on C6 Deliveries to enter it as the Predictor (X) variable
4. Make sure Linear is checked for the type of Regression
5. Edit dialog box options (Optional)
6. Click OK

Pizza Regression Plot

Fitted Line Plot
Wait Time = 32.05 + 0.5825 Deliveries
S = 1.11885, R-Sq = 94.1%, R-Sq(adj) = 93.9%
[Fitted line plot: Wait Time (Y axis, 35 to 55) vs. Deliveries (X axis, 10 to 35)]

Regression Analysis Results – Session Window
[Minitab session window output, with a callout marking the Prediction Equation (Regression Model)]
R-Sq is the percentage of variation in the data explained by the model. Notice that 94.1% = .970 x .970 (.941): R-Sq is the square of the Pearson correlation from the previous analysis.

Using the Prediction Equation
• If we have 20 deliveries to make, how long will the customer have to wait for their order?
• Based on our 30-minute guarantee, how acceptable is our performance?
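
The two questions above can be worked directly from the fitted equation Wait Time = 32.05 + 0.5825 Deliveries. The Python sketch below assumes that Wait Time is measured in minutes (the slides do not state the unit) and uses illustrative names such as `predict_wait_time`.

```python
INTERCEPT = 32.05  # B1 from the fitted line plot
SLOPE = 0.5825     # B2 from the fitted line plot
GUARANTEE_MINUTES = 30  # assumes Wait Time is recorded in minutes


def predict_wait_time(deliveries: float) -> float:
    """Apply the prediction equation Wait Time = B1 + B2 x Deliveries."""
    return INTERCEPT + SLOPE * deliveries


predicted = predict_wait_time(20)
print(f"Predicted wait for 20 deliveries: {predicted:.2f}")  # 43.70
print("Within guarantee:", predicted <= GUARANTEE_MINUTES)   # False
```

Note that the intercept alone (32.05) already exceeds 30, so the equation predicts a wait above the guarantee for any number of deliveries. Predictions are most trustworthy inside the range of the plotted data, roughly 10 to 35 deliveries.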

Method of “Least Squares”

Regression – Technical Note
[Fitted line plot: Wait Time = 32.05 + 0.5825 Deliveries; the “fitted” observation is the point on the line, the true observation is the data point, and the vertical gap between them is the difference in Y]

Minitab will find the “best fitting” line for us. How does it do that?
• We want to have as little difference as possible between the true observations and the fitted line
• Minitab minimizes the sum of the squared vertical distances between the fitted and true observations
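
To make the least-squares idea concrete, here is a minimal pure-Python sketch of the standard closed-form solution for a straight line. It illustrates the technique and is not Minitab's implementation; the function names and sample values are hypothetical and are not the Regression-Pizza.mtw data.

```python
def least_squares_fit(x_values, y_values):
    """Return (intercept, slope) of the least-squares line through the data."""
    n = len(x_values)
    if n != len(y_values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    if sxx == 0:
        raise ValueError("X must vary to fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_of_squared_errors(x_values, y_values, intercept, slope):
    """Sum of squared vertical distances between the data and the line."""
    return sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(x_values, y_values))


# Hypothetical values for illustration only.
deliveries = [10, 15, 20, 25, 30]
wait_time = [38, 41, 43, 47, 50]

b1, b2 = least_squares_fit(deliveries, wait_time)
best = sum_of_squared_errors(deliveries, wait_time, b1, b2)
nudged = sum_of_squared_errors(deliveries, wait_time, b1, b2 + 0.05)
print(f"Fitted line: Y = {b1:.2f} + {b2:.3f} X")
print(f"SSE at fitted line: {best:.3f}; with a nudged slope: {nudged:.3f}")
```

Changing the slope or intercept away from the fitted values can only increase the sum of squared errors, which is what “best fitting” means here.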

Multiple Regression
• Use this when you want to consider more than one predictor variable
• The benefit is that you might need more predictors to create an accurate model
• In the case of our Anthony’s Pizza example, we may want to look at the impact that incorrect orders, damaged pizzas, and cold pizzas have on wait time

Individual Exercise: Pizza
• As an Anthony’s Pizza Belt, you suspect that the number of pizza defects increases when more pizzas are ordered. You want to visualize the data and quantify the relationship
• Use the Minitab file Pizza Exercise.mtw data to investigate the relationship between “Total Pizzas” and “Defects”
  - Create a scatter plot
  - Determine correlation
  - Create a fitted line plot
  - Determine the prediction equation

How many defects do we usually have when 50 pizzas are on order? What do you think of this model?

Another Exercise: Absentee Rate
• The human resources director of a chain of fast-food restaurants studied the absentee rate of employees. Whenever employees called in sick, or simply did not show up, the restaurant manager had to find replacements in a hurry, or else work short-handed
• The director had data on the number of absences per 100 employees per week (Y) and the average number of months’ experience at the restaurant (X) for 10 restaurants in the chain. The director expected that long-term employees would be more reliable and absent less often

Absentee Rate
1. Open a blank Minitab worksheet and input the data
2. Create a scatter plot and decide whether a straight line is a reasonable model
3. Conduct a regression analysis and get the linear prediction equation
4. Predict the number of absences for employees with 19.5 months of experience

Experience   Absences
18.1         31.5
20.0         33.1
20.8         27.4
21.5         24.5
22.0         27.0
22.4         27.8
22.9         23.3
24.0         24.7
25.4         16.9
27.3         18.1
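
As a cross-check on the Minitab results for this exercise, the same steps can be sketched in pure Python using the ten data pairs above. The function name `fit_line` is an illustrative assumption, and the calculation uses the standard least-squares and Pearson formulas rather than Minitab's own routines.

```python
import math

# Data from the exercise: months of experience (X) and absences (Y).
experience = [18.1, 20.0, 20.8, 21.5, 22.0, 22.4, 22.9, 24.0, 25.4, 27.3]
absences = [31.5, 33.1, 27.4, 24.5, 27.0, 27.8, 23.3, 24.7, 16.9, 18.1]


def fit_line(x_values, y_values):
    """Return (intercept, slope, r) for a least-squares straight line."""
    n = len(x_values)
    if n != len(y_values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    syy = sum((y - mean_y) ** 2 for y in y_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


intercept, slope, r = fit_line(experience, absences)
predicted = intercept + slope * 19.5  # step 4: 19.5 months of experience

print(f"Absences = {intercept:.2f} + ({slope:.3f}) x Experience")
print(f"r = {r:.3f}, r^2 = {r * r:.1%}")
print(f"Predicted absences at 19.5 months: {predicted:.1f}")
```

The value 19.5 lies inside the observed range of experience (18.1 to 27.3 months), so the prediction does not extrapolate beyond the data.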

Takeaways
• Start with a visual tool – create a scatter plot
• Calculate the Pearson correlation coefficient, r, to quantify the strength of the relationship
• Remember that correlation does not guarantee causation!
• Create and interpret the Regression Plot
• Use the prediction equation
• Validate the prediction model’s r-squared using new data (not part of the data set used in creating the prediction equation)
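
One way to carry out the validation step is to compute r-squared as 1 - SSE/SST on observations that were not used to fit the line. The Python sketch below assumes that definition; the function name `r_squared_on_new_data` and the "new" observations are hypothetical, and only the intercept and slope come from the Pizza fitted line plot.

```python
def r_squared_on_new_data(x_new, y_new, intercept, slope):
    """R-squared of an existing prediction equation on held-out data.

    Computed as 1 - SSE/SST, where SSE uses the equation's predictions and
    SST uses the mean of the new Y values. It can be negative when the
    equation predicts the new data worse than their own mean would.
    """
    if len(x_new) != len(y_new) or len(y_new) < 2:
        raise ValueError("need at least two paired observations")
    mean_y = sum(y_new) / len(y_new)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(x_new, y_new))
    sst = sum((y - mean_y) ** 2 for y in y_new)
    if sst == 0:
        raise ValueError("new Y values must vary")
    return 1 - sse / sst


# Hypothetical new observations (not from any course data file).
new_deliveries = [12, 18, 24, 30]
new_wait_time = [39.5, 42.0, 46.5, 49.0]

score = r_squared_on_new_data(new_deliveries, new_wait_time, 32.05, 0.5825)
print(f"R-squared on new data: {score:.1%}")
```

A value close to the original R-Sq suggests the equation generalizes; a much lower value suggests it was tuned to the original sample.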

What other comments or questions do you have?
