
Correlation and Regression

A scatter plot gives a rough visual estimate of the relationship between two variables.
Correlation – measures the strength of the linear association between two variables (bivariate data).
- Only concerned with the strength of the relationship
- No causal effect is implied
Bivariate data – data sets in which each subject has two observations associated with it.

Types of Correlation:

1. Positive correlation – exists when high scores in one variable are associated with high scores
in the second variable or low scores in one variable are associated with low scores in the
other.
2. Negative correlation – exists when high scores in one variable are associated with low scores
in the second or vice versa.
3. Zero correlation – exists when the points on the scatter diagram are spread in a random
manner.
4. Perfect correlation – exists when all points lie on a straight line.

The strength or degree of the relationship is based on the following ranges of the correlation
coefficient:

Ranges of r          Degree/strength of relationship

±1.00                Perfect relationship
±0.90 to ±0.99       Very strong/very high
±0.70 to ±0.89       Strong/high
±0.40 to ±0.69       Moderate/substantial
±0.20 to ±0.39       Weak/small
±0.01 to ±0.19       Almost negligible to slight
0                    No correlation
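
The table can be turned into a small lookup. The following is a minimal sketch, not part of the original notes: the function name `describe_strength` is hypothetical, and it assumes each band starts at its printed lower bound, so a value that falls between two printed bands (for example 0.895) is assigned to the lower band.

```python
def describe_strength(r: float) -> str:
    """Map a correlation coefficient r to the verbal label in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")

    size = abs(r)  # the table applies equally to positive and negative r
    if size == 1.0:
        return "Perfect relationship"
    if size == 0.0:
        return "No correlation"

    # Assumption: each band begins at its lower bound (see lead-in).
    bands = [
        (0.90, "Very strong/very high"),
        (0.70, "Strong/high"),
        (0.40, "Moderate/substantial"),
        (0.20, "Weak/small"),
    ]
    for lower_bound, label in bands:
        if size >= lower_bound:
            return label
    return "Almost negligible to slight"


print(describe_strength(0.886))
print(describe_strength(-0.45))
```

The sign of r is ignored here because the table describes strength only; the direction (positive or negative) is read from the sign separately.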

[Figure: scatter plot examples]


Correlation Coefficient

- It is a descriptive measure usually denoted by r, which ranges from -1 to 1.


- It measures the degree of relationship between two variables.

Features of correlation coefficient (r)

- Unit free
- Ranges from -1 to 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker the linear relationship
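
The "unit free" property can be checked numerically. The sketch below uses made-up illustrative values (not data from these notes) and a hypothetical helper `pearson_r` written in the deviation form, which is algebraically equivalent to the usual computational formula. Converting x from inches to centimetres should leave r unchanged, and flipping the sign of y should flip only the sign of r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative measurements (assumption, not from the notes).
diameter_in = [6, 7, 8, 9, 11]
height_ft = [30, 28, 35, 47, 45]

# Same diameters expressed in centimetres instead of inches.
diameter_cm = [d * 2.54 for d in diameter_in]

r_inches = pearson_r(diameter_in, height_ft)
r_cm = pearson_r(diameter_cm, height_ft)
r_flipped = pearson_r(diameter_in, [-h for h in height_ft])

print(r_inches, r_cm)   # equal up to floating-point rounding
print(r_flipped)        # same magnitude, opposite sign
```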

Calculating the Correlation Coefficient

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Where:

r = sample correlation coefficient


n = sample size
x = value of the independent variable
y = value of the dependent variable
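
As a rough sketch of how the formula translates into code, the function below accumulates the five sums that appear in it (Σx, Σy, Σxy, Σx², Σy²) and combines them. The name `pearson_r` and the error handling are illustrative choices, not part of the original notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient from the sums used in the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Happens when every x (or every y) is the same value.
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator


print(pearson_r([1, 2, 3], [2, 4, 6]))  # points on a rising straight line
print(pearson_r([1, 2, 3], [6, 4, 2]))  # points on a falling straight line
```

The two calls illustrate perfect positive and perfect negative correlation, where all points lie exactly on a line.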

Calculation example

      Tree Height   Trunk Diameter
      y             x                xy      y²       x²
      35            8                280     1225     64
      49            9                441     2401     81
      27            7                189     729      49
      33            6                198     1089     36
      60            13               780     3600     169
      21            7                147     441      49
      45            11               495     2025     121
      51            12               612     2601     144
Sum   321           73               3142    14111    713

[Figure: scatter plot of Tree Height (y) against Trunk Diameter (x)]

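$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}}$$

r = 0.886, a relatively strong positive linear association between x and y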
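
The worked example can be re-checked with a few lines of Python. This is only a verification sketch: the variable names are illustrative, and the data are the eight (tree height, trunk diameter) pairs from the table.

```python
from math import sqrt

# Data from the calculation example: y = tree height, x = trunk diameter.
heights = [35, 49, 27, 33, 60, 21, 45, 51]
diameters = [8, 9, 7, 6, 13, 7, 11, 12]

n = len(heights)
sum_x = sum(diameters)
sum_y = sum(heights)
sum_xy = sum(x * y for x, y in zip(diameters, heights))
sum_x2 = sum(x * x for x in diameters)
sum_y2 = sum(y * y for y in heights)

# Column totals, in the same order as the table: y, x, xy, y^2, x^2.
print(sum_y, sum_x, sum_xy, sum_y2, sum_x2)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 3))
```

The printed totals should match the last row of the table, and the rounded r should match the value reported above.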

Coefficient of Determination

Terms:
Pearson product-moment correlation coefficient = r
Coefficient of determination = R²
The coefficient of determination indicates the proportion of the variance in one variable that is associated with the variance in the other variable.

It is the portion of the total variation in the dependent variable that is explained by variation in the independent variable.

Note: In the single independent variable case, the coefficient of determination is

$$R^2 = r^2$$

Where:
R² = coefficient of determination
r = simple correlation coefficient
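
Applied to the tree example, where r = 0.886, the relation gives the following (a worked illustration using the rounded r, so the last digit is approximate):

```latex
R^2 = r^2 = (0.886)^2 \approx 0.785
```

Under this reading, roughly 78.5% of the variation in tree height is associated with variation in trunk diameter.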

Regression analysis is used to:

- Predict the value of a dependent variable based on the value of at least one independent variable
- Explain the impact of changes in an independent variable on the dependent variable

Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the dependent variable

Simple Linear Regression Model

- Only one independent variable, x
- Relationship between x and y is described by a linear function
- Changes in y are assumed to be caused by changes in x

Coefficient of determination

$$R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares explained by regression}}{\text{total sum of squares}}$$

Note: in the single independent variable case, the coefficient of determination is

$$R^2 = r^2$$

Where:
R² = coefficient of determination
r = simple correlation coefficient
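
To see SSR/SST in action, the sketch below fits a straight line to the tree data and computes both sums of squares. The notes do not state how the line is fitted, so this assumes the standard least-squares estimates (slope = Sxy/Sxx, intercept = mean of y minus slope times mean of x). All variable names are illustrative.

```python
# Data from the calculation example: y = tree height, x = trunk diameter.
heights = [35, 49, 27, 33, 60, 21, 45, 51]
diameters = [8, 9, 7, 6, 13, 7, 11, 12]

n = len(diameters)
mean_x = sum(diameters) / n
mean_y = sum(heights) / n

# Assumption: ordinary least-squares fit of y on x.
sxx = sum((x - mean_x) ** 2 for x in diameters)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(diameters, heights))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Fitted heights on the regression line.
predicted = [intercept + slope * x for x in diameters]

# SSR: variation of the fitted values about the mean of y.
ssr = sum((p - mean_y) ** 2 for p in predicted)
# SST: total variation of the observed y values about their mean.
sst = sum((y - mean_y) ** 2 for y in heights)

r_squared = ssr / sst
print(round(slope, 3), round(intercept, 3))
print(round(r_squared, 3))
```

With a single independent variable, the printed R² should agree with the square of the correlation coefficient from the worked example (0.886² ≈ 0.785).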
