
LINEAR REGRESSION AND CORRELATION

Correlation

■ Correlation is the extent to which two
variables are related.
■ If the two variables are highly related,
then knowing the value of one of them
will allow you to predict the other
variable with considerable accuracy.
■ The less closely related the variables are,
the less accurately you can predict one
when you know the other.
The Scatter Plot: Graphing Correlations
■ Also known as the scatter diagram, the scatter
plot allows us to visually see the relation between
two variables.
■ One variable is plotted on the ordinate and the
other on the abscissa.
– Although you can list either variable on either
axis, it is common to place the variable you
are attempting to predict on the ordinate.
– Positive correlations occur when both
variables move in the same direction (e.g., as
PUPCET scores increase, so too do GPAs).
– Negative correlations occur when, as one
variable increases, the other decreases (e.g.,
as age increases, the number of speeding
tickets decreases).
[Figure: Example of the scatterplot]
Remember this!
■ Often used as a means for prediction,
correlation tells us how related two
variables are.
■ However, note that even though two variables
may be highly correlated, you should not
assume that one variable causes the other.
■ CORRELATION DOES NOT IMPLY CAUSATION.
– For example, there is the third variable
possibility (i.e., there may be additional
variable(s) that are causing the two things
to change together).
■ The Pearson correlation is also known as
the “product moment correlation
coefficient” (PMCC) or simply
“correlation”.
■ Pearson correlations are suitable only for
metric variables (which include
dichotomous variables).
■ For ordinal variables, use the Spearman
correlation or Kendall’s tau.
■ For nominal variables, use Cramér’s V.
The Pearson Product-Moment Correlation Coefficient

■ The correlation coefficient is the single number
that represents the degree of relation
between two variables.
■ The Pearson Product-Moment Correlation
Coefficient (symbolized by r) is the most common
measure of correlation; researchers calculate it
when both the X variable and the Y variable are
interval or ratio scale measurements.
Mathematically, it can be defined as the average
of the cross-products of z-scores.
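■ The raw score formula for r is:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

where n is the number of (x, y) pairs.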
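As a rough illustration of the two descriptions of r given above, the sketch below computes it once with the standard raw-score (computational) formula and once as the average cross-product of z-scores. The function names and the sample values are made up for this illustration and do not come from the slides. The z-score version assumes the population standard deviation (dividing by n), which is the convention under which the two forms agree exactly.

```python
from math import sqrt

def pearson_r_raw(xs, ys):
    """Pearson r from the raw-score formula (hypothetical helper name)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def pearson_r_z(xs, ys):
    """Pearson r as the mean cross-product of z-scores (population SD assumed)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    cross_products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ]
    return sum(cross_products) / n

# Made-up illustrative values, not data from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_raw = pearson_r_raw(xs, ys)
r_z = pearson_r_z(xs, ys)
print(f"raw-score r = {r_raw:.4f}, z-score r = {r_z:.4f}")
```

Both functions should agree up to floating-point rounding; the raw-score form is usually easier to work by hand because it needs only column totals.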
Example:
■ Find the value of the correlation coefficient from the following table.
Table 1. Age and Glucose Level of the Respondents

Respondent   Age   Glucose Level
1            43    99
2            21    65
3            25    79
4            42    75
5            57    87
6            59    81
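One way to check a hand calculation for Table 1 is the short sketch below. It assumes the standard raw-score formula for r (the sums of x, y, xy, x² and y²), with age as x and glucose level as y; the variable names are chosen for this illustration only.

```python
from math import sqrt

# Data from Table 1: age (x) and glucose level (y) of six respondents.
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

n = len(ages)
sum_x = sum(ages)                                    # 247
sum_y = sum(glucose)                                 # 486
sum_xy = sum(x * y for x, y in zip(ages, glucose))   # 20485
sum_x2 = sum(x * x for x in ages)                    # 11409
sum_y2 = sum(y * y for y in glucose)                 # 40022

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.4f}")  # about 0.5298
```

A value near 0.53 suggests a moderate positive linear relationship between age and glucose level in this small sample.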
Example:
■ The following data show the hours per week
that a student spent playing Mobile Legends
and the student’s weekly algebra scores for
those same weeks.

Hours per week spent playing Mobile Legends   4    5    7    8    10
Weekly algebra test score                     52   60   72   79   83

Find the correlation coefficient and interpret it.
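The following sketch shows one possible way to work this example, again assuming the standard raw-score formula for r. The helper name `pearson_r` is hypothetical, and the interpretation in the closing note is a conventional reading of the size of r rather than a fixed rule.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the raw-score formula (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )

# Data from the example: weekly hours played (x) and algebra score (y).
hours = [4, 5, 7, 8, 10]
scores = [52, 60, 72, 79, 83]

r = pearson_r(hours, scores)
print(f"r = {r:.4f}")  # about 0.9771
```

With r close to +1, the five data points show a very strong positive linear relationship: in these weeks, higher playing hours went with higher algebra scores. This describes the sample only and does not show that one variable causes the other.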


Linear regression

■ It is the process of determining the linear
relationship between two variables.
■ Using the bivariate data, we will determine
the equation of the line of best fit.
■ The line of best fit is also called the
regression line or the least squares
line.
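The equation of the line of best fit is

$$ y = mx + b, $$

where

$$ m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \quad \text{and} \quad b = \frac{\sum y - m\sum x}{n}. $$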
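A minimal sketch of the slope and intercept calculation follows. It assumes the standard least-squares formulas, m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and b = (Σy − mΣx) / n. The function name `line_of_best_fit` and the sample points are made up for this illustration.

```python
def line_of_best_fit(xs, ys):
    """Return (m, b) for y = m*x + b using least squares (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

# Made-up points that lie exactly on y = 2x + 1, so the fit should recover it.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

m, b = line_of_best_fit(xs, ys)
print(f"y = {m}x + {b}")  # y = 2.0x + 1.0
```

The intercept formula is equivalent to b = ȳ − m·x̄, which says that the least squares line always passes through the point of means.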
Example:

Calories      80   70   60   40   70   50
Fat (grams)   9    8    7    4.5  8    5

Determine the correlation coefficient between the
number of calories and the number of grams of fat.
Determine the equation of the line of best fit for the
number of calories and the number of grams of fat.
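As a check on a hand solution, the sketch below computes both answers for this example. It assumes calories is the x variable and grams of fat is the y variable, and it uses the standard raw-score formula for r together with the standard least-squares formulas for m and b. The variable names are chosen for this illustration only.

```python
from math import sqrt

# Data from the example: calories (x) and grams of fat (y).
calories = [80, 70, 60, 40, 70, 50]
fat_grams = [9, 8, 7, 4.5, 8, 5]

n = len(calories)
sum_x = sum(calories)                                      # 370
sum_y = sum(fat_grams)                                     # 41.5
sum_xy = sum(x * y for x, y in zip(calories, fat_grams))   # 2690
sum_x2 = sum(x * x for x in calories)                      # 23900
sum_y2 = sum(y * y for y in fat_grams)                     # 303.25

# Shared building blocks of the formulas.
sxy = n * sum_xy - sum_x * sum_y     # 785
sxx = n * sum_x2 - sum_x ** 2        # 6500
syy = n * sum_y2 - sum_y ** 2        # 97.25

r = sxy / sqrt(sxx * syy)            # correlation coefficient
m = sxy / sxx                        # slope of the line of best fit
b = (sum_y - m * sum_x) / n          # y-intercept

print(f"r = {r:.4f}")                # about 0.9873
print(f"y = {m:.4f}x + ({b:.4f})")   # about y = 0.1208x - 0.5308
```

The correlation is very strong and positive, and the fitted slope suggests roughly 0.12 additional grams of fat per additional calorie within the range of these six items.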
