Correlation & Regression
SMT 1063
LECTURE 10

NUSKA RUFFIN
Introduction

Correlation and regression analyses explore relationships between two or more numerical (quantitative) variables.

Correlation – determines the strength of the linear relationship between variables.

Regression – describes the dependence of one variable on another and the nature of the relationship (positive or negative, linear or non-linear).
1. Are two or more variables related?

2. If so, what is the strength of the relationship?

3. What type of relationship exists?

4. What kind of predictions can be made from the relationship?
Scatter plots vs Correlation

A simple correlation study begins with drawing a scatter plot.

Two variables are involved: the independent variable and the dependent variable.

The independent variable is plotted on the x axis of the graph and the dependent variable is plotted on the y axis.

"A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y."
Example 1
Construct a scatter plot for the data shown for car rental companies in the United States for a recent year.

Example 2
Construct a scatter plot for the data obtained in a study on the number of hours that nine people exercise each week and the amount of milk (in ounces) each person consumes per week.
Correlation Coefficient

The correlation coefficient measures the strength and direction of a linear relationship between two variables.

Sample correlation coefficient: r

Population correlation coefficient: ρ (rho)

r is also known as the Pearson product moment correlation coefficient (PPMC).

A correlation coefficient is a statistical measure of the degree to which changes in the value of one variable predict changes in the value of another.

The range of the correlation coefficient is from -1 to +1.

A correlation coefficient close to +1 indicates a strong positive linear relationship.

A correlation coefficient close to -1 indicates a strong negative linear relationship.

A correlation coefficient close to 0 indicates no linear relationship or only a weak one.
Formula for Correlation Coefficient
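
The correlation coefficient is commonly computed from the column sums of the paired data. The form below is the standard textbook version, given here on the assumption that it matches the one used in this lecture; x and y denote the paired data values.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[\,n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[\,n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```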

n = number of data pairs


Note:
Round the value of r to three decimal places.
Example 3
Compute the correlation coefficient for the data in Example 1.

Complete the table (columns for x, y, xy, x², and y², with their sums).

Substitute the values into the formula and calculate the value of r.
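
The two steps above (build the column sums, then substitute) can be sketched in Python. This is a minimal illustration, not the lecture's own worked solution: the function name `pearson_r` and the data pairs are hypothetical, and the pairs are NOT the car rental data from Example 1.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r, computed from the column sums."""
    n = len(xs)                                   # number of data pairs
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # column of xy products
    sum_x2 = sum(x * x for x in xs)               # column of x squared
    sum_y2 = sum(y * y for y in ys)               # column of y squared

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Hypothetical illustrative pairs (assumption: not taken from the lecture).
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]

r_demo = round(pearson_r(x_demo, y_demo), 3)      # round to three decimals
print(r_demo)  # prints 0.775
```

Working from the sums mirrors the table-based hand calculation, so each intermediate value can be checked against the completed table.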

The correlation coefficient suggests a strong positive linear relationship between the number of cars a rental agency has and its annual income.
The significance of the correlation coefficient

When r is not equal to 0, either the value of r is large enough in magnitude to conclude that there is a significant linear relationship between the variables, or the value of r is due to chance.

Three methods can be used to decide between the two:

1. Traditional method with t value
2. P-value method
3. Critical values of PPMC method
Traditional method

Step 1 State the hypotheses
Step 2 Find the critical values
Step 3 Compute the test value
Step 4 Make the decision
Step 5 Summarize the results

H0: ρ = 0 This null hypothesis means that there is no correlation between the x and y variables.
H1: ρ ≠ 0 This alternative hypothesis means that there is a significant correlation between the variables.
t-test formula for the correlation coefficient
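
The test value for H0: ρ = 0 is usually written in the form below. This is the standard textbook expression, given on the assumption that it is the one intended here; r is the sample correlation coefficient and n is the number of data pairs.

```latex
t = r\sqrt{\frac{n-2}{1-r^{2}}}
```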

Degrees of freedom = n - 2
Example 4
Test the significance of the correlation coefficient found in Example 3.
Use α = 0.05 and r = 0.982.
State the hypotheses: H0: ρ = 0 and H1: ρ ≠ 0
Find the critical values from the t distribution: with α = 0.05 and d.f. = 6 - 2 = 4, the critical values are ±2.776.
Compute the test value: t ≈ 10.4 (from r = 0.982 and n = 6)

Make the decision: reject the null hypothesis, since the test value 10.4 falls beyond the critical value 2.776.

Summarize: there is a significant positive correlation.
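
As a check on Example 4, the traditional method can be sketched in a few lines of Python. The values r = 0.982, n = 6 and the critical value 2.776 come from the example; the function name `t_statistic` is a hypothetical helper, and the critical value is typed in from the t table rather than computed.

```python
import math


def t_statistic(r, n):
    """Test value for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = 0.982          # correlation coefficient from Example 3
n = 6              # number of data pairs
critical = 2.776   # two-tailed critical value, alpha = 0.05, d.f. = 4

t = t_statistic(r, n)
reject_h0 = abs(t) > critical   # two-tailed test: compare |t| with the critical value

print(round(t, 1), reject_h0)   # prints 10.4 True
```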
Perform a correlation test for the given data set.
Use α = 0.05.
Coefficient of determination
The coefficient of determination (R² or r²) measures how well a statistical model predicts an outcome.

Because r lies between -1 and +1, r² lies between 0 and 1, is never larger than |r|, and carries no sign.

If r is +0.7 or −0.7, r² will be 0.49.

Interpretation: 49% of the variation in y is explained by the linear relationship with x.

Note: When r² > 0.50 (50%), one variable accounts for more than half of the variation in the other, and the linear relationship is strong.
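
The loss of sign can be seen directly in a short Python sketch; the helper name `coefficient_of_determination` is hypothetical and the two r values are the ones used in the text.

```python
def coefficient_of_determination(r):
    """Square of the correlation coefficient; the sign of r is lost."""
    return r ** 2


for r_value in (0.7, -0.7):
    r_squared = round(coefficient_of_determination(r_value), 2)
    # Both signs of r give the same share of explained variation.
    print(r_value, r_squared, f"{r_squared:.0%} of the variation in y")
```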
Regression

The regression line is the data's line of best fit; it is drawn when the correlation coefficient between two variables is significant.

The purpose of the regression line is to enable the researcher to see the trend and make predictions based on the data.

When r is positive, the line slopes upward from left to right.

When r is negative, the line slopes downward from left to right.

Determination of the Regression Line Equation

In statistics, the equation of the regression line is written as

y′ = a + bx

where a is the y′ intercept and b is the slope of the line. The same sums used in computing the correlation coefficient are used to compute a and b.
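
The intercept and slope are usually obtained from the least-squares formulas below. These are the standard textbook forms, given on the assumption that they match the ones used in this lecture; n is the number of data pairs.

```latex
a = \frac{\left(\sum y\right)\left(\sum x^{2}\right) - \left(\sum x\right)\left(\sum xy\right)}
         {n\sum x^{2} - \left(\sum x\right)^{2}},
\qquad
b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {n\sum x^{2} - \left(\sum x\right)^{2}}
```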

Round the values of a and b to three decimal places.


Example 5
Find the equation of the regression line for the data in Example 1.
The values needed for the equation:
n = 6, ∑x = 153.8, ∑y = 18.7, ∑xy = 682.77, and ∑x² = 5859.26

Equation: y′ = 0.396 + 0.106x

Substitute two x values into the equation to get the corresponding y′ values.

Plot the two points on graph paper and draw the line of best fit through them.
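
The calculation in Example 5 can be reproduced with a short Python sketch using the standard least-squares formulas for a and b. The sums are the ones listed in the example; the function names and the two x values chosen for plotting (10 and 60) are arbitrary assumptions made for illustration.

```python
def regression_coefficients(n, sum_x, sum_y, sum_xy, sum_x2):
    """Intercept a and slope b of the least-squares line y' = a + b*x."""
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


# Sums quoted in Example 5.
a, b = regression_coefficients(
    n=6, sum_x=153.8, sum_y=18.7, sum_xy=682.77, sum_x2=5859.26
)
a, b = round(a, 3), round(b, 3)   # round to three decimal places
print(a, b)  # prints 0.396 0.106


def predict(x):
    """Predicted y' for a given x, using the rounded coefficients."""
    return a + b * x


# Two arbitrary x values (assumption) give two points that fix the line.
points = [(x, round(predict(x), 3)) for x in (10, 60)]
print(points)
```

Any two distinct x values within the range of the data would serve equally well for drawing the line.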

The sign of the correlation coefficient and the sign of the slope of the regression line are always the same.
Question:
Find the equation of the regression line for the data in Example 2.
