
Linear Regression and Correlation

What is Correlation Analysis?


Correlation analysis is the study of the relationship between variables.

CORRELATION ANALYSIS A group of techniques to measure the strength of the
association between two variables.

The basic idea of correlation analysis is to report the strength of the association between
two variables. The usual first step is to plot the data in a scatter diagram.

SCATTER DIAGRAM A chart that portrays the relationship between two variables.

DEPENDENT VARIABLE The variable that is being predicted or estimated.

INDEPENDENT VARIABLE A variable that provides the basis for estimation. It is the
predictor variable.

It is common practice to scale the dependent variable on the vertical or Y-axis and the
independent variable on the horizontal or X-axis.

The Coefficient of Correlation


Originated by Karl Pearson about 1900, the coefficient of correlation describes the
strength of the relationship between two sets of interval-scaled or ratio-scaled variables.
Designated r, it is often referred to as Pearson's r and as the Pearson product-moment
correlation coefficient. It can assume any value from -1.00 to +1.00 inclusive. A
correlation coefficient of -1.00 or +1.00 indicates perfect correlation.

COEFFICIENT OF CORRELATION A measure of the strength of the linear
relationship between two variables.

How is the value of the coefficient of correlation determined?


The coefficient of correlation can be computed from a computational formula based on the actual values of
X and Y. The formula is:
COEFFICIENT OF CORRELATION

$$ r = \frac{n(\sum XY) - (\sum X)(\sum Y)}{\sqrt{\left[n(\sum X^2) - (\sum X)^2\right]\left[n(\sum Y^2) - (\sum Y)^2\right]}} $$

Where: n is the number of paired observations.
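
The computational formula can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pearson_r` and the three-point data set are assumptions made for the example, not part of the original notes.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the computational (raw-sums) formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of observations")
    n = len(xs)  # number of paired observations
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # X or Y has no variation, so r is undefined.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Made-up data (not from the notes): Y is exactly 2 * X, a perfect direct relationship.
r_perfect = pearson_r([1, 2, 3], [2, 4, 6])
# Reversing Y gives a perfect inverse relationship.
r_inverse = pearson_r([1, 2, 3], [6, 4, 2])
print(r_perfect, r_inverse)
```

The two calls illustrate the end points of the range: a perfectly direct relationship gives +1.00 and a perfectly inverse one gives -1.00.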

Example: The manager of a company selects a random sample of 10 sales representatives and
determines the number of sales calls each representative made last month and the number
of copiers sold.
Sales            Sales Calls   Copiers Sold
Representative       (X)           (Y)          X^2       Y^2        XY
A                     20            30           400       900       600
B                     40            60         1,600     3,600     2,400
C                     20            40           400     1,600       800
D                     30            60           900     3,600     1,800
E                     10            30           100       900       300
F                     10            40           100     1,600       400
G                     20            40           400     1,600       800
H                     20            50           400     2,500     1,000
I                     20            30           400       900       600
J                     30            70           900     4,900     2,100
Total                220           450         5,600    22,100    10,800

Using the above formula:

$$ r = \frac{10(10{,}800) - (220)(450)}{\sqrt{\left[10(5{,}600) - (220)^2\right]\left[10(22{,}100) - (450)^2\right]}} = 0.759 $$
How do we interpret a coefficient of 0.759? [A chart is written on the board]
First, it is positive, so we see there is a direct relationship between the number of sales
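calls and the number of copiers sold. The value of 0.759 is fairly close to 1.00, so we
conclude that the association is strong. To put it another way, representatives who make
more calls tend to sell more copiers. Note that r measures only the strength of the linear
association; by itself it does not say how many additional sales go with a given increase
in calls.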
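
As a check on the arithmetic, the worked example can be reproduced step by step. The sketch below uses the ten (X, Y) pairs from the table; the variable names are chosen for this illustration only.

```python
import math

# Sales calls (X) and copiers sold (Y) for representatives A through J.
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

n = len(calls)
sum_x = sum(calls)                                     # 220
sum_y = sum(copiers)                                   # 450
sum_x2 = sum(x * x for x in calls)                     # 5,600
sum_y2 = sum(y * y for y in copiers)                   # 22,100
sum_xy = sum(x * y for x, y in zip(calls, copiers))    # 10,800

numerator = n * sum_xy - sum_x * sum_y                 # 9,000
x_term = n * sum_x2 - sum_x ** 2                       # 7,600
y_term = n * sum_y2 - sum_y ** 2                       # 18,500

r = numerator / math.sqrt(x_term * y_term)
print(round(r, 3))  # 0.759
```

Every quantity stays an integer until the final division, so each intermediate value can be compared directly with the Total row of the table.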

THE COEFFICIENT OF DETERMINATION

The coefficient of determination is computed by squaring the coefficient of correlation.
In the example, the coefficient of determination, r^2, is 0.576, found by (0.759)^2. This
is a proportion or a percent; we can say that 57.6 percent of the variation in the number
of copiers sold is explained, or accounted for, by the variation in the number of sales
calls.

COEFFICIENT OF DETERMINATION The proportion of the total variation in the
dependent variable Y that is explained, or accounted for, by the variation in the
independent variable X.
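
The "proportion of variation explained" reading can be checked numerically for the copier example. The sketch below assumes the usual least-squares line Y' = a + bX, whose formulas are not stated in this section, and compares the explained variation with the total variation in Y. All variable names are illustrative.

```python
# Sales calls (X) and copiers sold (Y) for representatives A through J.
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

n = len(calls)
mean_x = sum(calls) / n
mean_y = sum(copiers) / n

# Assumption: standard least-squares slope and intercept for Y' = a + bX.
s_xx = sum((x - mean_x) ** 2 for x in calls)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(calls, copiers))
b = s_xy / s_xx
a = mean_y - b * mean_x

predicted = [a + b * x for x in calls]

# Total variation in Y, and the part of it reproduced by the fitted line.
ss_total = sum((y - mean_y) ** 2 for y in copiers)
ss_explained = sum((p - mean_y) ** 2 for p in predicted)

r_squared = ss_explained / ss_total
print(round(r_squared, 3))  # 0.576
```

The ratio agrees with (0.759)^2, which is why squaring r gives the share of the variation in Y accounted for by X.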
Example: A sample of 10 families in a certain area in the U.S.A. revealed the following
figures for family size and the amount spent on food per week.

Table 1
Family size                 3    6    5    6    6    3    4    4    5    3
Amount spent on food ($)   99  104  151  129  142  111   74   91  119   91

a) Compute the coefficient of correlation and interpret.


b) Determine the coefficient of determination and interpret.
c) Determine the regression equation and interpret.
d)
