PROBABILITY
Learner’s Material
Lesson 18-20
NAME:________________________
GRADE & SECTION:___________
TEACHER:___________________
WEEK 18-20
LEARNING OUTCOMES:
LESSON OUTLINE:
1. Pearson’s correlation
2. Pearson correlation coefficient formula
3. Pearson correlation coefficient calculator
4. How to interpret the Pearson correlation coefficient
The Pearson correlation coefficient is one of the most widely used statistics. It looks at the relationship between two variables. It
seeks to draw a line through the data of two variables to show their relationship. The relationship of the variables is
measured with the help of a Pearson correlation coefficient calculator. This linear relationship can be positive or negative.
For example:
Positive linear relationship: In most cases, the income of a person increases as his/her age increases.
Negative linear relationship: If a vehicle increases its speed, the time taken to travel a given distance decreases, and vice versa.
From the examples above, it is evident that the Pearson correlation coefficient, r, tries to find out two things: the strength
and the direction of the relationship in the given sample.
The correlation coefficient formula measures the relationship between the variables. It returns a value between -1 and 1. A
Pearson correlation coefficient calculator, or the formula itself, can be used to measure the strength of the relationship between two variables.
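r = [nΣxy - (Σx)(Σy)] / sqrt([nΣx^2 - (Σx)^2] [nΣy^2 - (Σy)^2])
Where:
n = the number of pairs of values
Σxy = the sum of the products of the paired values
Σx = the sum of the x values
Σy = the sum of the y values
Σx^2 = the sum of the squared x values
Σy^2 = the sum of the squared y values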
Step one: Create a Pearson correlation coefficient table. Make a data chart, including both the variables. Label these
variables ‘x’ and ‘y.’ Add three additional columns – (xy), (x^2), and (y^2). Refer to this simple data chart.
Step three: Add up the values in each column.
If the result is negative, there is a negative correlation between the two variables. If the result is positive, there
is a positive correlation between the variables. The result also indicates the strength of the linear relationship, i.e.,
strong positive relationship, strong negative relationship, medium positive relationship, and so on.
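The table method described above can be sketched in Python. This is a minimal illustration, not the lesson's own calculator: it assumes the usual computational form of the formula, r = [nΣxy - (Σx)(Σy)] / sqrt([nΣx^2 - (Σx)^2][nΣy^2 - (Σy)^2]), and the data values and the function name pearson_r are hypothetical, chosen only for the example.

```python
from math import sqrt

# Hypothetical paired data (not taken from the lesson).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def pearson_r(x, y):
    """Pearson correlation coefficient using the table (column sums) method."""
    n = len(x)
    # Build the three additional columns: xy, x^2 and y^2.
    xy = [a * b for a, b in zip(x, y)]
    x2 = [a * a for a in x]
    y2 = [b * b for b in y]
    # Add up the values in each column.
    sum_x, sum_y = sum(x), sum(y)
    sum_xy, sum_x2, sum_y2 = sum(xy), sum(x2), sum(y2)
    # Substitute the column sums into the formula.
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

For this sample the result is positive, so the two variables have a positive correlation.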
The Pearson product-moment correlation coefficient, or simply the Pearson correlation coefficient r, determines the
strength of the linear relationship between two variables. The stronger the
association between the two variables, the closer your answer will lie to 1 or -1. A value of exactly 1 or -1 signifies
that all the data points fall on the straight line of ‘best fit,’ with no variation around it. The closer your answer
lies to 0, the more the data points vary around that line.
Below are the proposed guidelines for the Pearson coefficient correlation interpretation:
Note that the strength of the association between the variables depends on what you measure and on the sample size.
On a graph, one can notice the relationship between the variables and make assumptions before even calculating the coefficient.
If the points of the scatterplot lie close to the line, the relationship between the variables is strong. The closer the points lie
to the line, the stronger the relationship of the variables. The further they move from the line, the weaker the relationship
gets. If the line is nearly parallel to the x-axis because the points are randomly placed on the graph, it’s safe to assume that
there is no correlation between the two variables.
The terms ‘strength’ and ‘direction’ have specific statistical meanings. Here’s a straightforward explanation of the two words:
Strength: Strength signifies how closely the two variables are related, that is, how consistently one variable
changes as the other changes. Values that are close to +1 or -1 indicate a strong relationship. These values
are attained if the data points fall on or very close to the line. The further the data points move away, the weaker the
strength of the linear relationship. When there is no practical way to draw a straight line because the data points are
scattered, the linear relationship is at its weakest.
Direction: The direction of the line indicates a positive linear or negative linear relationship between variables. If the line
has an upward slope, the variables have a positive relationship. This means an increase in the value of one variable is
accompanied by an increase in the value of the other variable. A downward slope depicts a negative correlation. This means an
increase in the value of one variable is accompanied by a decrease in the value of the other variable.
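As a rough illustration of strength and direction, the sketch below computes r for three hypothetical data sets: points lying exactly on an upward-sloping line, points lying exactly on a downward-sloping line, and points with no clear linear pattern. The data and the helper name pearson_r are assumptions made for the example. The deviation form of the formula used here is algebraically equivalent to the column-sum form.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6]
upward = [2 * v + 1 for v in x]      # every point on an upward-sloping line
downward = [10 - 3 * v for v in x]   # every point on a downward-sloping line
scattered = [5, 1, 6, 2, 4, 3]       # hypothetical values with no clear pattern

r_up = pearson_r(x, upward)
r_down = pearson_r(x, downward)
r_scattered = pearson_r(x, scattered)

print(round(r_up, 3), round(r_down, 3), round(r_scattered, 3))
```

The first two values come out at essentially +1 and -1 (same strength, opposite direction), while the third is close to 0, at about -0.14.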
Let’s look at some visual examples to help you interpret a Pearson correlation coefficient table:
Large positive correlation:
Small negative correlation
In the figure above, the data points are not as close to the straight line as in the earlier examples.
It shows a negative linear correlation of approximately -0.5.
The slope is negative, so as one variable increases, the other tends to decrease.
An example of a small negative correlation would be: the more somebody eats, the less hungry they get.
Weak / no correlation
How to Calculate a Regression Line
In statistics, you can calculate a regression line for two variables if their scatterplot shows a linear pattern and the
correlation between the variables is very strong (for example, r = 0.98). A regression line is simply a single line that best fits
the data (in terms of having the smallest overall distance from the line to the points). Statisticians call this technique for
finding the best-fitting line a simple linear regression analysis using the least squares method.
The formula for the best-fitting line (or regression line) is y = mx + b, where m is the slope of the line and b is
the y-intercept. This equation itself is the same one used to find a line in algebra; but remember, in statistics the points
don’t lie perfectly on a line — the line is a model around which the data lie if a strong linear pattern exists.
The slope of a line is the change in y over the change in x. For example, a slope of 10/3
means that as the x-value increases (moves right) by 3 units, the y-value moves up by 10 units on average.
The y-intercept is the value on the y-axis where the line crosses. For example, in the equation y=2x – 6, the
line crosses the y-axis at the value b= –6. The coordinates of this point are (0, –6); when a line crosses the y-
axis, the x-value is always 0.
You may be thinking that you have to try lots and lots of different lines to see which one fits best. Fortunately,
you have a more straightforward option (although eyeballing a line on the scatterplot does help you think about what
you’d expect the answer to be). The best-fitting line has a distinct slope and y-intercept that can be calculated using
formulas (and these formulas aren’t too hard to calculate).
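The lesson does not show the formulas at this point, so the sketch below assumes the standard least squares results: m = [nΣxy - (Σx)(Σy)] / [nΣx^2 - (Σx)^2] and b = (Σy - mΣx) / n. The data and the function name least_squares_line are hypothetical.

```python
def least_squares_line(x, y):
    """Return slope m and y-intercept b of the best-fitting line y = mx + b."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    # Slope: how much y changes, on average, per one-unit increase in x.
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: chosen so the line passes through the point of means.
    b = (sum_y - m * sum_x) / n
    return m, b


# Hypothetical paired data (not taken from the lesson).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

m, b = least_squares_line(x, y)
print(round(m, 2), round(b, 2))  # 0.6 2.2
```

For this sample the best-fitting line is y = 0.6x + 2.2: each one-unit increase in x is matched, on average, by a 0.6-unit increase in y.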
Solved Example Problems for Regression Analysis
Regression analysis is a set of statistical methods used for the estimation of relationships between a
dependent variable and one or more independent variables. It can be utilized to assess the strength of the relationship
between variables and for modeling the future relationship between them.
Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. The most
common models are simple linear and multiple linear. Nonlinear regression analysis is commonly used for more
complicated data sets in which the dependent and independent variables show a nonlinear relationship.
Regression Analysis – Linear model assumptions
Linear regression analysis is based on six fundamental assumptions:
1. The dependent and independent variables show a linear relationship, described by a slope and an
intercept.
2. The independent variable is not random.
3. The expected value (mean) of the residual (error) is zero.
4. The variance of the residual (error) is constant across all observations.
5. The value of the residual (error) is not correlated across all observations.
6. The residual (error) values follow the normal distribution.
Regression Analysis – Simple linear regression
Simple linear regression is a model that assesses the
relationship between a dependent variable and an independent variable. The simple linear model is expressed using
the following equation:
Y = a + bX + ϵ
Where:
Y – Dependent variable
X – Independent (explanatory) variable
a – Intercept
b – Slope
ϵ – Residual (error)
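To make the roles of a, b and ϵ concrete, the sketch below estimates the intercept and slope by least squares and then computes each residual as the observed Y minus the fitted value a + bX. The data and the function name fit_line are hypothetical, and least squares is assumed as the estimation method.

```python
def fit_line(x, y):
    """Least squares estimates of intercept a and slope b in Y = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical paired data (not taken from the lesson).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]                   # values predicted by the line
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # estimated errors

print([round(e, 2) for e in residuals])
print(round(abs(sum(residuals)), 6))  # 0.0
```

With a least squares fit that includes an intercept, the residuals always add up to zero, apart from rounding error.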
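Regression Analysis – Multiple linear regression
Multiple linear regression is similar to the simple linear model, except that it uses several independent
variables. The model is expressed using the following equation:
Y = a + bX1 + cX2 + dX3 + ϵ
Where: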
Y – Dependent variable
X1, X2, X3 – Independent (explanatory) variables
a – Intercept
b, c, d – Slopes
ϵ – Residual (error)
Multiple linear regression follows the same conditions as the simple linear model. However, since there are
several independent variables in multiple linear analysis, there is another mandatory condition for the model:
Non-collinearity: Independent variables should show a minimum of correlation with each other. If the
independent variables are highly correlated with each other, it will be difficult to assess the true relationships between
the dependent and independent variables.
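One simple way to screen for collinearity is to compute the Pearson correlation between every pair of independent variables. The sketch below does this for three hypothetical variables. The data, the helper name pearson_r, and the 0.8 cutoff are all assumptions made for the example, not values from the lesson.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical independent variables; X2 is roughly twice X1.
predictors = {
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],
    "X3": [5, 1, 6, 2, 4, 3],
}

THRESHOLD = 0.8  # assumed cutoff for "highly correlated"; adjust as needed

flagged = []
for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
    r = pearson_r(a, b)
    if abs(r) > THRESHOLD:
        flagged.append((name_a, name_b))

print(flagged)  # [('X1', 'X2')]
```

Here X1 and X2 are flagged, which suggests that a model using both would have difficulty separating their individual effects on Y.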
7. Zoe needs to analyze the strength of the relationship between two variables. What is the correct name
for the test she needs to conduct?
a. PCC
b. Pearson Co
c. Pearson R
d. Correlation Co
8. Joe is doing an analysis of data for his Economics 104 class. He has the value of the foreign direct
investment and political stability of one country. What could he learn about this relationship from the
Pearson Coefficient?
a. Approximately 0.73
b. Approximately 0.90
c. Approximately 1.2
d. Approximately 0.52
11. Find the correlation coefficient between the Average Number of Assignments in Class and the Class
Absences. Deduce whether there is a positive or negative correlation.
12. Find the correlation coefficient between the two values. Is there a positive or negative correlation?