
Correlation (Pearson r) & Linear Regression

Azmi Mohd Tamil


The purpose of this topic is to answer
these questions statistically:

⚫ Are two or more variables related?

⚫ If so, what is the strength of the relationship?

⚫ What type of relationship exists?

⚫ What kind of predictions can be made from the relationship?

Scatter Plot and Correlation

• In simple correlation and regression studies, the researcher collects data on two numerical or quantitative variables to see whether a relationship exists between the variables.

• The scatter plot is a visual way to describe the nature of the relationship
between the independent and dependent variables.

• The scales of the variables can be different, and the coordinates of the
axes are determined by the smallest and largest data values of the
variables.

Example of Non-Linear Relationship
Yerkes-Dodson Law (not suitable for correlation)

[Figure: Performance (worse to better) plotted against Stress (low to high)]

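Why a relationship like this is "not for correlation": Pearson r only measures linear association, so a curved relationship can give r close to 0 even when y depends completely on x. A minimal sketch with made-up inverted-U numbers (not data from the slides):

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products of deviations and sums of squared deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical inverted-U data: performance is fully determined by stress, but not linearly
stress = [1, 2, 3, 4, 5]
performance = [-(s - 3) ** 2 for s in stress]   # [-4, -1, 0, -1, -4]
print(pearson_r(stress, performance))           # 0.0
```

The r of 0 here does not mean "no relationship"; it means no linear relationship, which is why a scatter plot should be checked before computing r.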
Correlation

• The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables.

• The symbol for the sample correlation coefficient is r.

• The symbol for the population correlation coefficient is ρ (Greek letter rho).

Correlation

The correlation coefficient gives two pieces of information:

⚫ The strength of the relationship
⚫ The direction of the relationship

Strength of relationship

⚫ r lies between -1 and +1. Values near 0 mean no (linear) correlation, and values near ±1 mean a very strong correlation.

[Scale: -1.0 (strong negative) ... 0.0 (no relationship) ... +1.0 (strong positive)]

How to interpret the value of r?

Correlation (+ direction)

⚫ Positive correlation: high values of one variable are associated with high values of the other.
⚫ Example: higher course entrance exam scores are associated with better course grades in the final exam.

[Figure: scatter plot, "Positive and Linear"]

Correlation (- direction)

⚫ Negative correlation: the negative sign means that the two variables are inversely related; that is, as one variable increases, the other variable decreases.
⚫ Example: an increase in body mass index is associated with reduced effort tolerance.

[Figure: scatter plot, "Negative and Linear"]

Pearson Correlation

⚫ 2 continuous variables
– linear relationship
– e.g., association between height and weight (positive)

⚫ measures the degree of linear association between two interval-scaled variables
⚫ analysis of the relationship between two quantitative outcomes, e.g., height and weight

How to calculate r?
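The formula shown on the original slide is not reproduced in this text. As a sketch, assuming the usual computational (raw-sums) form of Pearson's formula, r = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}, the calculation can be written as below. The data are made-up illustration values, not from the slides.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r, using the raw-sums formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    # Zero if all x (or all y) values are equal; r is then undefined
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 5, 4, 5]), 4))   # 0.7746
print(pearson_r(x, [10, 8, 6, 4, 2]))            # -1.0
```

The second call shows the lower end of the scale: points lying exactly on a falling straight line give r = -1.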
Once the strength of the relationship (r) has been computed, the significance of the relationship should be tested next.
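The slides do not show which significance test is used; one common choice, assumed here, is the t test of H0: ρ = 0, with t = r√[(n − 2)/(1 − r²)] and n − 2 degrees of freedom. The computed t is compared with the critical value from a t table; for example, the two-tailed critical value at α = 0.05 with 3 degrees of freedom is 3.182. A minimal sketch with illustrative numbers:

```python
import math

def t_statistic_for_r(r, n):
    """t statistic and degrees of freedom for testing H0: rho = 0."""
    if n < 3:
        raise ValueError("need at least 3 pairs of values")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1 (all points on a line)")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# Illustrative values: r of about 0.7746 from 5 made-up pairs
r, n = math.sqrt(0.6), 5
t, df = t_statistic_for_r(r, n)
t_critical = 3.182            # two-tailed, alpha = 0.05, df = 3 (from a t table)
print(round(t, 3), df, abs(t) > t_critical)   # 2.121 3 False
```

With only 5 pairs, even r ≈ 0.77 is not significant at α = 0.05, which is why significance is checked separately from strength.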
Causal Silence:
Correlation Does Not Imply Causality

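• Causality, a.k.a. cause and effect: the principle that everything has a cause.

• Directionality of Effect Problem: even when X and Y are correlated, the data alone cannot show whether X causes Y or Y causes X.

• Third Variable Problem: a third variable may cause changes in both X and Y, producing a correlation between them without any direct causal link.
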

Causality
[Diagram: X (Stress) → Y (Illness)]
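The third variable problem can be shown with made-up numbers: below, x and y are each computed only from a hypothetical third variable z, so neither causes the other, yet their correlation is perfect. The variable names are illustrative only.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical third variable z drives both x and y; x and y never affect each other
z = [20, 30, 40, 50, 60]
x = [2 * zi + 1 for zi in z]       # x depends only on z
y = [0.5 * zi - 3 for zi in z]     # y depends only on z
print(pearson_r(x, y))             # 1.0
```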
Linear Regression
Regression

[Diagram: X (independent variable) → Y (dependent variable)]

Predicting Y based on a given value of X


Regression and Prediction

[Diagram: X (Height) → Y (Weight)]
REGRESSION

⚫ Regression is used when:
– one variable is a direct cause of the other;
– or, if the value of one variable is changed, the other variable changes as a direct consequence;
– or the main purpose of the analysis is to predict one variable from the other.

– Regression: the dependence of the dependent variable (outcome) on the independent variable (RF).
– The relationship is summarised by a regression equation.

– The regression line
x: independent variable (horizontal axis)
y: dependent variable (vertical axis)
– Regression equation
y′ = a + bx
– a = y′-intercept
– b = slope of the line
– x = independent variable
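The equation y′ = a + bx is given above without the formulas for a and b. A common choice, assumed here, is the least-squares line: b = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²] and a = (Σy − bΣx)/n. A minimal sketch with made-up data; `fit_line` and `predict` are illustrative names:

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y' = a + bx."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

def predict(a, b, x_new):
    """Predicted value y' for a given value of the independent variable x."""
    return a + b * x_new

# Made-up illustration data
a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(a, b)                  # 2.2 0.6
print(predict(a, b, 3.5))
```

Predictions are most trustworthy for x values inside the range of the observed data.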
Regression Equation

⚫ Significance of relationship should be established prior to Linear Regression

y′ = a + bx
Regression Equation

⚫ Relationship and significance of relationship should be established prior to Linear Regression
Relationship:

Significance of Relationship:
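The order described above (relationship, then its significance, then the regression equation) can be sketched as one hypothetical helper, `correlate_then_predict`. It assumes the t test for H0: ρ = 0 and a least-squares line, and it expects the caller to supply the critical t value from a t table, because Python's standard `statistics` module does not provide the t distribution. The data are made up.

```python
import math

def correlate_then_predict(x, y, x_new, t_critical):
    """Hypothetical helper: compute r, test H0: rho = 0, predict y' only if significant.

    t_critical: two-tailed critical t for df = n - 2, looked up by the caller.
    Returns (r, t, prediction); prediction is None when r is not significant.
    """
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    # Shared building blocks of r and of the slope b
    s_xy = n * sum_xy - sum_x * sum_y
    s_xx = n * sum_x2 - sum_x ** 2
    s_yy = n * sum_y2 - sum_y ** 2
    r = s_xy / math.sqrt(s_xx * s_yy)
    if abs(r) >= 1:
        t = math.copysign(math.inf, r)       # perfect linear fit
    else:
        t = r * math.sqrt((n - 2) / (1 - r ** 2))
    if abs(t) <= t_critical:
        return r, t, None                    # not significant: do not predict
    b = s_xy / s_xx
    a = (sum_y - b * sum_x) / n
    return r, t, a + b * x_new

# Made-up data; 3.182 is the two-tailed critical t for alpha = 0.05, df = 3
x = [1, 2, 3, 4, 5]
result_a = correlate_then_predict(x, [2, 4, 5, 4, 5], 3.5, 3.182)
result_b = correlate_then_predict(x, [2, 4, 6, 8, 11], 3.5, 3.182)
print(result_a[2])           # None
print(result_b[2])
```

The first data set has r ≈ 0.77 but is not significant with n = 5, so no prediction is returned; the second passes the test and yields a prediction from its regression line.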
An emergency service wishes to see whether a relationship exists between the outside temperature (Fahrenheit) and the
number of emergency calls it receives for a 7-hour period. The data are shown. Aside from identifying the relationship,
predict the number of calls if the temperature (Fahrenheit) is 100.
A researcher wishes to determine if there is a relationship between the number of day care centers and the number of group
day care homes for counties in Pennsylvania. If there is a significant relationship, predict the number of group care homes a
county has if the county has 20 day care centers.
A car manufacturing company wanted to see the relationship between the number of cars produced and its income. It also tasked its statisticians and data analysts with predicting its income (in billions) if it decided to produce 45 (in tens of thousands) cars.
