
Correlation

Farrokh Alemi, Ph.D. Kashif Haqqi M.D.


Additional Reading
• For additional reading, see Chapter 6 in Michael R. Middleton's Data Analysis Using Excel, Duxbury Thomson Publishers, 2000.
• See also Chapter 4, section 7, of Keller and Warrack's Statistics for Management and Economics, Fifth Edition, Duxbury Thomson Learning, 2000.
• Read any introductory statistics book about correlation.

Which Approach Is Appropriate When?
• Choosing the right method for the data is the key statistical expertise that you need to have.
• You might want to review a decision tool that we have organized for you to help you in choosing the right statistical method.

Do I Need to Know the Formulas?
• You do not need to know the exact formulas.
• You do need to understand the concepts behind them and the general statistical concepts embedded in the use of the formulas.
• You do need to know where they are in your reference book.
• You do not need to be able to do correlation and regression by hand. You must be able to do them on a computer using Excel or other software.

Table of Contents
• Objectives
• Independent and dependent variables
• Example
• Scatter plot
• Correlation coefficient
• Range of correlation coefficient
• Formula for correlation coefficient
• Example for correlation coefficient
• Possible relationships between variables

Objectives
• To learn the assumptions behind correlation and how to interpret it.
• To use Excel to calculate correlations.

Purpose of Correlation
Correlation determines whether the values of one variable are related to the values of another.

Independent and Dependent Variables
• Independent variable: a variable that can be controlled or manipulated.
• Dependent variable: a variable that cannot be controlled or manipulated. Its values are predicted from the independent variable.

Example
• Are these two variables related?

Student   Hours studied   Grade (%)
A         6               82
B         2               63
C         1               57
D         5               88
E         3               68
F         2               75

• The independent variable in this example is the number of hours studied.
• The grade the student receives is the dependent variable.
• The grade a student receives depends upon the number of hours he or she studies.

Scatter Plot
• The independent and dependent variables can be plotted on a graph called a scatter plot.
• By convention, the independent variable is plotted on the horizontal x-axis and the dependent variable on the vertical y-axis.

Example of Scatter Plot
• A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.
• Please use Excel to create a scatter plot.
[Figure: Scatter plot of Grade (%) (y-axis, 0 to 100) against Hours Studied (x-axis, 0 to 7)]
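Excel is the tool these slides use for scatter plots. If you only want a quick look at the ordered pairs without any plotting software, a rough text-only scatter plot can be sketched in plain Python. The helper name `text_scatter` and its `width`/`height` parameters are hypothetical, chosen for this illustration; the data are the hours-studied and grade values from the example table.

```python
def text_scatter(xs, ys, width=30, height=10):
    """Return a rough text scatter plot as a list of lines (hypothetical helper).

    x runs left to right and y runs bottom to top, following the
    convention that the independent variable goes on the horizontal axis.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero if all x are equal
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a grid cell; the largest y lands in the top row.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y_max - y) / y_span * (height - 1))
        grid[row][col] = "*"
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)  # horizontal axis
    return lines


# Hours studied (x) and grade in % (y) for students A-F, from the example table.
hours = [6, 2, 1, 5, 3, 2]
grades = [82, 63, 57, 88, 68, 75]

plot = text_scatter(hours, grades)
print("\n".join(plot))
```

The plot is deliberately coarse: points that fall into the same cell would overlap, so it is only a sanity check, not a replacement for a real chart.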

Interpret a Scatter Plot
• The graph suggests a positive relationship between hours of study and grades.
[Figure: Scatter plot of Grade (%) against Hours Studied]

Correlation Coefficient
• The correlation coefficient, computed from the sample data, measures the strength and direction of a linear relationship between two variables.
• The correlation coefficient is identified by r, and its range is –1 to +1.

Positive and Negative Correlations
• A positive relationship exists when both variables increase or decrease at the same time (weight and height).
• A negative relationship exists when one variable increases and the other decreases, or vice versa (strength and age).

Range of correlation coefficient
• In case of an exact positive linear relationship, the value of r is +1.
• In case of a strong positive linear relationship, the value of r will be close to +1.
[Figure: Correlation = +1; dependent variable plotted against independent variable]

Range of correlation coefficient
• In case of an exact negative linear relationship, the value of r is –1.
• In case of a strong negative linear relationship, the value of r will be close to –1.
[Figure: Correlation = –1; dependent variable plotted against independent variable]

Range of correlation coefficient
• In case of a weak relationship, the value of r will be close to 0.
[Figure: Correlation = 0; dependent variable plotted against independent variable]

Range of correlation coefficient
• In case of a nonlinear relationship, such as the curved pattern shown, the value of r can be close to 0 even though the variables are clearly related.
[Figure: Correlation = 0; curved relationship between dependent and independent variable]

Formula for correlation coefficient
The formula to compute a correlation coefficient is:
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
where n is the number of data pairs, x is the independent variable, and y is the dependent variable.
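The raw-sum formula translates almost line for line into code. This is a minimal sketch; the function name `pearson_r` is hypothetical and chosen for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Correlation coefficient r using the raw-sum formula (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of data pairs")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator


# Usage with the hours-studied (x) and grade (y) data from the earlier example.
print(round(pearson_r([6, 2, 1, 5, 3, 2], [82, 63, 57, 88, 68, 75]), 2))  # → 0.86
```

The raw-sum form mirrors the slide but can lose precision when the values are very large; library routines such as `statistics.correlation` subtract the means first, which is numerically safer.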

x2 and y2. Student Age A B C D E F Sum 43 48 56 61 67 70 345 Blood Age* Pressure BP 128 120 135 143 141 152 819 5504 5760 7560 8723 9447 age2 1849 2304 3136 3721 4489 BP2 16384 14400 18225 20449 19881 23104 112443 20 10640 4900 47634 20399 Go to Table of Content . y. • Using the data on age and blood pressure.Example for correlation coefficient • Let’s do an example. let’s calculate the x. xy.

Example for correlation coefficient
• Substitute in the formula and solve for r:
$$r = \frac{(6)(47634) - (345)(819)}{\sqrt{\left[(6)(20399) - 345^2\right]\left[(6)(112443) - 819^2\right]}} = \frac{3249}{\sqrt{(3369)(3897)}} \approx 0.897$$
• The correlation coefficient suggests a strong positive relationship between age and blood pressure.

Possible Relationships Between Variables
• Direct cause and effect, that is, x causes y. For example, water causes plants to grow.
• Both cause and effect, that is, y also causes x. For example, coffee consumption causes nervousness, and nervous people also drink more coffee.
• Relationship caused by a third variable. For example, deaths due to drowning and soft drink consumption both rise during summer. Both variables are related to heat and humidity (the third variable).

Possible Relationships Between Variables
• Complexity of interrelationships among many variables. For example, there is a relationship between students' high school grades and college grades, but other variables are involved too, such as IQ, hours of study, influence of parents, motivation, age, and instructors.
• Coincidental relationship. For example, an increase in the number of people exercising and an increase in the number of people committing crimes.

Interpretation
• The correlation between age and blood pressure is 0.90.
• There is a strong positive relationship between age and blood pressure.

Test of Correlation
• Null hypothesis: the correlation is zero.
• The test statistic is $t = r\sqrt{\frac{n-2}{1-r^2}}$.
• The statistic is distributed as a Student t distribution with n – 2 degrees of freedom.
• Excel does not calculate this statistic, but you can calculate it manually.
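Here is a sketch of the manual calculation for the age and blood pressure example (r from the worked example, n = 6). The function name `correlation_t` is hypothetical. The comparison assumes a two-sided test at the 5% level; the critical value 2.776 for 4 degrees of freedom is read from a standard Student t table, not computed by the code.

```python
from math import sqrt


def correlation_t(r, n):
    """Test statistic t = r * sqrt((n - 2) / (1 - r**2)) with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * sqrt((n - 2) / (1 - r ** 2))


# Age and blood pressure example: r from the worked example, 6 data pairs.
r = 3249 / sqrt(3369 * 3897)  # about 0.897
n = 6
t = correlation_t(r, n)

# Assumption: two-sided test, alpha = 0.05, df = n - 2 = 4 (value from a t table).
t_critical = 2.776
print(f"t = {t:.2f}, reject H0: {abs(t) > t_critical}")  # → t = 4.05, reject H0: True
```

Because t is larger than the critical value, the null hypothesis that the correlation is zero would be rejected at the 5% level under these assumptions.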

Take Home Lesson
• Correlation measures association and not causation.
• Correlation assumes a linear relationship.
• Values range between –1 and +1 and measure the strength and direction of the relationship.