
Correlation

• The primary purpose of linear Correlation Analysis is to measure the strength of
linear association between two variables (X and Y).
• If X increases with no definite change in the value of Y, there is no correlation, or
no association, between X and Y.
• If X increases and there is a corresponding shift in the value of Y, there is a correlation.
• The correlation is positive when Y tends to increase with an increase in X and
negative when Y tends to decrease with an increase in X.
• If the ordered pairs (X, Y) tend to follow a straight-line path, there is a linear
correlation.
• The preciseness of the shift in Y as X increases determines the strength of the
linear correlation.
• To conduct a linear correlation analysis we need:
– Bivariate Data: two pieces of data that are variable
– Bivariate data consists of ordered pairs (X, Y)
– X is the independent variable
– Y is the dependent variable

3 © Open Source Six Sigma, LLC


Correlation Coefficient

Ho: No Correlation

Ha: There is Correlation

The Correlation Coefficient always assumes a value between –1 and +1.

The Correlation Coefficient of the population, R, is estimated by the sample
Correlation Coefficient, r.



Types and Magnitude of Correlation

[Figure: six scatter plots of Output versus Input. Top row: Strong, Moderate, and Weak Positive Correlation. Bottom row: Strong, Moderate, and Weak Negative Correlation.]



Limitations of Correlation

• The magnitude of the Correlation Coefficient is somewhat relative and should be
used with caution.
• As usual, statistical significance is judged by comparing a P-value with the chosen
degree of alpha risk.

• Guidelines for practical significance are as follows:

– If | r | > 0.80, relationship is practically significant

– If | r | < 0.20, relationship is not practically significant

[Figure: number line for r from -1.0 to +1.0. Area of negative linear correlation: -1.0 to -0.8. No linear correlation: -0.2 to 0.2. Area of positive linear correlation: 0.8 to +1.0.]
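The practical-significance guidelines above can be sketched as a small Python function. This is only an illustration: the function name and the "inconclusive" label for the range the guidelines leave unclassified (0.20 to 0.80) are assumptions, not part of the course material.

```python
def practical_significance(r):
    """Classify r using the |r| > 0.80 and |r| < 0.20 guidelines."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    magnitude = abs(r)
    if magnitude > 0.80:
        return "practically significant"
    if magnitude < 0.20:
        return "not practically significant"
    # The guidelines do not classify 0.20 <= |r| <= 0.80;
    # "inconclusive" is an assumed label for that middle range.
    return "inconclusive"


print(practical_significance(-0.9))
print(practical_significance(0.1))
print(practical_significance(0.5))
```

The sign of r is ignored on purpose: the guidelines apply to the magnitude, so a strong negative correlation is classified the same way as a strong positive one.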



Correlation Example

RB Stats Correlation.mtw

The Correlation Coefficient [r]:
• Is a positive value if one variable increases as the other variable increases.
• Is a negative value if one variable decreases as the other increases.

Correlation Formula

$$ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} $$



Correlation Analysis

Graph>Scatter Plot>Simple…




Correlation Example

Look at the graph. Do you observe any correlation in this graph?

Lowess stands for LOcally-WEighted Scatterplot Smoother.


Correlation Example

The Correlation Coefficient is high and the P-value is low. Reject the null
hypothesis; there is a correlation.

Results for: RB STATS CORRELATION.MTW


Scatterplot of Payton yards vs Payton carries
Correlations: Payton carries, Payton yards
Pearson correlation of Payton carries and Payton yards = 0.935
P-Value = 0.000
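The decision rule used here (compare the P-value with the chosen alpha risk) can be sketched in Python. The alpha of 0.05 and the sample size of 13 are assumptions for illustration only; neither is stated on the slide. The t statistic shown is the standard test statistic for a Pearson correlation with n - 2 degrees of freedom.

```python
import math


def correlation_decision(p_value, alpha=0.05):
    """Compare the P-value with the chosen alpha risk (0.05 is an assumed default)."""
    if p_value < alpha:
        return "Reject Ho: there is a correlation"
    return "Fail to reject Ho: no correlation demonstrated"


def t_statistic(r, n):
    """Test statistic for Ho: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# r and the P-value come from the session output above;
# n = 13 is a hypothetical sample size, not given on the slide.
print(correlation_decision(0.000))
print(round(t_statistic(0.935, 13), 2))
```

A larger |r| or a larger sample both increase the magnitude of t, which is what drives the P-value toward zero.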
Regression Analysis

The last step to proper analysis of Continuous Data is to determine the
Regression Equation.

The Regression Equation can mathematically predict Y for any given X.

MINITAB™ gives the BEST FIT for the plotted data.

Prediction Equations:
$Y = a + bx$ (Linear or 1st order model)
$Y = a + bx + cx^2$ (Quadratic or 2nd order model)
$Y = a + bx + cx^2 + dx^3$ (Cubic or 3rd order model)
$Y = a(b^x)$ (Exponential)
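For the linear (1st order) model, a best fit in the least-squares sense can be sketched in a few lines of Python. The function name and the data are hypothetical, and this is an illustration of ordinary least squares rather than a reproduction of the software's internals.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (constant) and b (coefficient) in Y = a + bx."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: change in Y per unit change in X
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Made-up data lying exactly on Y = 1 + 2x (an assumption for illustration)
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print(a, b)
```

The quadratic and cubic models are fitted on the same principle, with additional terms for x squared and x cubed.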
Simple versus Multiple Regression

Simple Regression:
– One X, One Y
– Analyze in MINITAB™ using
• Stat>Regression>Fitted Line Plot or
• Stat>Regression>Regression

Multiple Regression:
– Two or More X’s, One Y
– Analyze in MINITAB™ using:
• Stat>Regression>Regression

In both cases, the R-Sq value is the share of the output variation that is
explained by the input variation in the model.
Regression Analysis Graphical Output



Regression Analysis Statistical Output

Stat > Regression > Regression

R-Sq value of 87.3% = 1798587 / 2059413

R-Sq (adj) of 86.2% = (1798587 – 23711) / 2059413, where 23711 is the Mean Squares value for error

R-Sq value of 87.3% quantifies the strength of the association between
Carries and Yards. In this case our Prediction Equation explains 87.3% of
the total variation seen in “Yards”. 12.7% of the variation seen in “Yards”
is not explained by our equation.
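The R-Sq arithmetic on this slide can be checked with a short Python sketch. The three numbers are the ones shown above; the variable names (regression sum of squares, total sum of squares, error mean square) are an interpretation of those numbers, not labels quoted from the output.

```python
# Values as shown on the slide; the names are an assumed interpretation
ss_regression = 1798587   # variation explained by the equation
ss_total = 2059413        # total variation seen in "Yards"
ms_error = 23711          # the Mean Squares value used in the adjustment

r_sq = ss_regression / ss_total
r_sq_adj = (ss_regression - ms_error) / ss_total
unexplained = 1 - r_sq

print(f"R-Sq        = {r_sq:.1%}")
print(f"R-Sq (adj)  = {r_sq_adj:.1%}")
print(f"Unexplained = {unexplained:.1%}")
```

Rounded to one decimal place, these reproduce the 87.3%, 86.2% and 12.7% quoted on the slide.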
Regression (Prediction) Equation

The prediction equation has the form: Payton yards = Constant + Coefficient × (Level of X)

The solution, for X = 250 carries:
Payton yards = - 163.497 + 4.91622(250) = 1,065.6
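The same calculation as a small Python sketch. The constant and coefficient are the values shown in the solution above; the function and variable names are made up for illustration.

```python
CONSTANT = -163.497      # intercept from the prediction equation
COEFFICIENT = 4.91622    # yards per carry from the prediction equation


def predict_yards(carries):
    """Predicted Payton yards for a given number of carries (the level of X)."""
    return CONSTANT + COEFFICIENT * carries


print(round(predict_yards(250), 1))  # 1065.6
```

As with any regression equation, predictions are most trustworthy for values of X inside the range of the data used to fit it.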



Regression (Prediction) Equation

Compare to the Fitted Line.

~1067 yds



Regression Graphical Output

For a demonstration, check other Regression fits.

Stat>Regression>Fitted Line Plot

Quadratic and Cubic: check the R-Sq value of each against the linear model
to determine whether the difference in the variance explained by
the equations is significant.



Regression Graphical Output

[Figure: fitted line plots for the Quadratic and Cubic models.]

If the R-Sq value improves significantly, or if the assumptions
of the residuals are better met as a result of utilizing the quadratic or
cubic equation, you will want to use the best-fitting equation.



Residuals

As in ANOVA, the residuals should:
– Be Normally Distributed (normal plot of residuals)
– Be independent of each other
• no patterns (random)
• data must be time ordered (residuals vs. order graph)
– Have a constant variance (visual check: in the residuals versus fits chart there
should be approximately the same number of residuals above and below the
line, equally spread)
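A rough numerical companion to the visual checks above, sketched in Python. The data and the fitted line are hypothetical, and counting residuals above and below zero is only a crude stand-in for inspecting the residual plots.

```python
def residuals(xs, ys, a, b):
    """Residual = observed Y minus the Y predicted by Y = a + bx."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up data and a hypothetical fitted line Y = 0 + 2x (assumptions)
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
a, b = 0.0, 2.0

res = residuals(xs, ys, a, b)
above = sum(1 for e in res if e > 0)   # points above the fitted line
below = sum(1 for e in res if e < 0)   # points below the fitted line

print([round(e, 2) for e in res])
print(above, below)
```

Roughly equal counts above and below the line are consistent with the guideline, but the spread and any pattern over time still need to be judged from the residual plots themselves.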

