
STATISTICS AND PROBABILITY
Learner’s Material

Lesson 18-20

NAME:________________________
GRADE & SECTION:___________
TEACHER:___________________
WEEK 18-20

LEARNING OUTCOMES:

At the end of the lesson, the learners should be able to:

1. calculate Pearson’s sample correlation coefficient.
2. solve problems involving correlation analysis.
3. identify the independent and dependent variables.
4. calculate the slope and y-intercept of the regression line.
5. interpret the calculated slope and y-intercept of the regression line.
6. predict the value of the dependent variable given the value of the independent variable.
7. solve problems involving regression analysis.

LESSON OUTLINE:

1. Pearson’s correlation
2. Pearson correlation coefficient formula
3. Pearson correlation coefficient calculator
4. How to interpret the Pearson correlation coefficient

What does the Pearson correlation coefficient test do?

The Pearson correlation coefficient is a widely used statistic that measures the relationship between two variables. It
seeks to draw a line through the data of the two variables to show their relationship. How closely the data follow that
line is measured by the Pearson correlation coefficient, r. This linear relationship can be positive or negative.

For example:

• Positive linear relationship: In most cases, the income of a person increases as his/her age increases.
• Negative linear relationship: If a vehicle increases its speed, the time taken to travel a given distance decreases, and
vice versa.

From the examples above, it is evident that the Pearson correlation coefficient, r, tries to find out two things: the strength
and the direction of the relationship in the given sample.

Pearson correlation coefficient formula

The correlation coefficient formula measures the relationship between the variables. It returns a value between -1 and 1. Use
the formula below to measure the strength of the relationship between two variables.

Pearson correlation coefficient formula:

r = [NΣxy − (Σx)(Σy)] / √( [NΣx² − (Σx)²] [NΣy² − (Σy)²] )

Where:

N = the number of pairs of scores

Σxy = the sum of the products of paired scores

Σx = the sum of x scores

Σy = the sum of y scores

Σx² = the sum of squared x scores

Σy² = the sum of squared y scores
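
The sums defined above can be turned into a small Python function. This is only a sketch: the function name pearson_r and the sample scores are made up for illustration and are not part of the lesson.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the raw-score formula:

    r = (N*Sxy - Sx*Sy) / sqrt((N*Sx2 - Sx^2) * (N*Sy2 - Sy^2))
    """
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of scores")
    n = len(xs)                                  # N: number of pairs of scores
    sum_x = sum(xs)                              # sum of x scores
    sum_y = sum(ys)                              # sum of y scores
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of products of paired scores
    sum_x2 = sum(x * x for x in xs)              # sum of squared x scores
    sum_y2 = sum(y * y for y in ys)              # sum of squared y scores

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when all x or all y scores are equal")
    return numerator / denominator


# Hypothetical scores, invented only to show the call.
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))
```

For these made-up scores, N = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so r = 30 / √1500, or about 0.77.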

Pearson correlation coefficient calculator

Here is a step by step guide to calculating Pearson’s correlation coefficient:

Step one: Create a Pearson correlation coefficient table. Make a data chart, including both the variables. Label these
variables ‘x’ and ‘y.’ Add three additional columns – (xy), (x^2), and (y^2). Refer to this simple data chart.

Step two: Use basic multiplication to complete the table.

Step three: Add up each column to get its total.

Step four: Use the correlation formula to plug in the values.

If the result is negative, there is a negative correlation between the two variables. If the result is positive, there
is a positive correlation between the variables. The size of the result also describes the strength of the linear relationship,
i.e., strong positive relationship, strong negative relationship, medium positive relationship, and so on.
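
The four steps can also be traced in Python. The sketch below uses hypothetical paired scores (they are not the lesson's data chart) and prints the table, the column totals, and the resulting r.

```python
from math import sqrt

# Hypothetical paired scores, invented only to illustrate the four steps.
x = [2, 4, 6, 8]
y = [9, 7, 4, 2]

# Steps one and two: build the table with the extra xy, x^2 and y^2 columns.
rows = [(xi, yi, xi * yi, xi ** 2, yi ** 2) for xi, yi in zip(x, y)]

# Step three: add up each column.
totals = [sum(column) for column in zip(*rows)]
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = totals
n = len(rows)

# Step four: plug the totals into the correlation formula.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print("".join(f"{name:>6}" for name in ("x", "y", "xy", "x^2", "y^2")))
for row in rows:
    print("".join(f"{value:>6}" for value in row))
print("".join(f"{total:>6}" for total in totals), " <- totals")
print(f"r = {r:.3f}")
```

Here the totals are Σx = 20, Σy = 22, Σxy = 86, Σx² = 120 and Σy² = 150, and r comes out near -0.997, a strong negative correlation for this made-up data.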

Determining the strength of the Pearson product-moment correlation coefficient

The Pearson product-moment correlation coefficient, or simply the Pearson correlation coefficient r, determines the
strength of the linear relationship between two variables. The stronger the association between the two variables, the
closer your answer will be to 1 or -1. A value of exactly 1 or -1 signifies that all the data points lie on the straight line
of ‘best fit’, so there is no variation of the points around the line. The closer your answer lies to 0, the more the data
points vary around the line.

How to interpret the Pearson correlation coefficient

Below are the proposed guidelines for the Pearson coefficient correlation interpretation:

Note that the strength of the association of the variables depends on what you measure and on the sample size.
On a graph, one can notice the relationship between the variables and make assumptions before even calculating r.
If the points of the scatterplot lie close to the line, they show a strong relationship between the variables. The closer the
points lie to the line, the stronger the relationship. The further they move from the line, the weaker the relationship
gets. If the line is nearly parallel to the x-axis because the points are randomly placed on the graph, it’s safe to assume that
there is no correlation between the two variables.

What do the terms strength and direction mean?

The terms ‘strength’ and ‘direction’ each have a specific statistical meaning. Here’s a straightforward explanation of the two words:

• Strength: Strength signifies how closely the two variables are related. It means how consistently one variable
changes as the other changes. Values that are close to +1 or -1 indicate a strong relationship. These values
are attained if the data points fall on or very close to the line. The further the data points move away, the weaker the
strength of the linear relationship. When there is no practical way to draw a straight line because the data points are
scattered, the strength of the linear relationship is the weakest.
• Direction: The direction of the line indicates a positive linear or negative linear relationship between variables. If the line
has an upward slope, the variables have a positive relationship. This means an increase in the value of one variable
goes with an increase in the value of the other variable. A negative correlation depicts a downward slope. This means an
increase in the value of one variable goes with a decrease in the value of the other variable.
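
Reading off direction and strength from a computed r can be sketched as a short function. The cutoff values below are placeholder rule-of-thumb numbers assumed for illustration only; they are not this lesson's guideline table, so replace them with the cutoffs your teacher provides.

```python
def describe_correlation(r, small_cutoff=0.1, medium_cutoff=0.3, large_cutoff=0.5):
    """Return (direction, strength) labels for a Pearson r.

    The three cutoffs are ASSUMED placeholder values, not taken from the lesson.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")

    # Direction comes from the sign of r.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"

    # Strength comes from how far r is from 0, ignoring the sign.
    size = abs(r)
    if size >= large_cutoff:
        strength = "large"
    elif size >= medium_cutoff:
        strength = "medium"
    elif size >= small_cutoff:
        strength = "small"
    else:
        strength = "negligible"
    return direction, strength


print(describe_correlation(0.95))
print(describe_correlation(-0.4))
```

Keeping the cutoffs as parameters makes it easy to apply a different guideline without changing the function.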

Examples of Pearson’s correlation coefficient

Let’s look at some visual examples to help you interpret a Pearson correlation coefficient table:

• Large positive correlation:

The above figure depicts a correlation of almost +1.
The points on the scatterplot lie nearly on a straight line.
The slope is positive, which means that if one variable increases, the other variable also increases, showing a positive linear
relationship.
This suggests that a change in one variable goes with a nearly proportional change in the other variable.
An example of a large positive correlation would be: as children grow, so do their clothes and shoe sizes.

• Medium positive correlation:

The figure above depicts a positive correlation.
The correlation is above +0.8 but below +1.
It shows a pretty strong linear uphill pattern.
An example of a medium positive correlation would be: as the number of automobiles increases, so does the demand
for fuel.

• Small negative correlation:

In the figure above, the points of the scatterplot are not as close to the straight line as in the earlier examples.
It shows a negative linear correlation of approximately -0.5.
As the slope is negative, an increase in one variable tends to go with a decrease in the other variable.
An example of a small negative correlation would be: the more somebody eats, the less hungry they get.

• Weak / no correlation:

The points of the scatterplot are far away from the line.
It is tough to practically draw a line.
The correlation is approximately +0.15.
It can’t be judged whether one variable increases or decreases as the other variable changes.
An example of a weak/no correlation would be: fuel prices and the number of people adopting pets.

How to Calculate a Regression Line

In statistics, you can calculate a regression line for two variables if their scatterplot shows a linear pattern and the
correlation between the variables is very strong (for example, r = 0.98). A regression line is simply a single line that best fits
the data (in terms of having the smallest overall distance from the line to the points). Statisticians call this technique for
finding the best-fitting line a simple linear regression analysis using the least squares method.

The formula for the best-fitting line (or regression line) is y = mx + b, where m is the slope of the line and b is
the y-intercept. This equation itself is the same one used to find a line in algebra; but remember, in statistics the points
don’t lie perfectly on a line. The line is a model around which the data lie if a strong linear pattern exists.

• The slope of a line is the change in Y over the change in X. For example, a slope of 10/3 means that as the x-value
increases (moves right) by 3 units, the y-value moves up by 10 units on average.

• The y-intercept is the value on the y-axis where the line crosses. For example, in the equation y = 2x – 6, the
line crosses the y-axis at the value b = –6. The coordinates of this point are (0, –6); when a line crosses the
y-axis, the x-value is always 0.

You may be thinking that you have to try lots and lots of different lines to see which one fits best. Fortunately,
you have a more straightforward option (although eyeballing a line on the scatterplot does help you think about what
you’d expect the answer to be). The best-fitting line has a distinct slope and y-intercept that can be calculated using
formulas (and these formulas aren’t too hard to calculate).
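
One common form of those formulas uses the same column totals as the correlation table: m = [NΣxy − (Σx)(Σy)] / [NΣx² − (Σx)²] and b = (Σy − mΣx) / N. The Python sketch below applies them; the function names and the data are hypothetical, chosen only for illustration.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the best-fitting line y = mx + b by least squares."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("the slope is undefined when all x values are equal")
    m = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    b = (sum_y - m * sum_x) / n                     # y-intercept
    return m, b


def predict(m, b, x):
    """Predicted value of the dependent variable for a given x."""
    return m * x + b


# Hypothetical data, invented only to show the calculation.
m, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {m:.1f}x + {b:.1f}")
print(f"predicted y at x = 6: {predict(m, b, 6):.1f}")
```

For this made-up data the line works out to y = 0.6x + 2.2: y rises by 0.6 on average for each one-unit increase in x, the line crosses the y-axis at 2.2, and the predicted y at x = 6 is 5.8.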

Solved Example Problems for Regression Analysis

What is Regression Analysis?

Regression analysis is a set of statistical methods used for the estimation of relationships between a
dependent variable and one or more independent variables. It can be utilized to assess the strength of the relationship
between variables and for modeling the future relationship between them.

Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. The most
common models are simple linear and multiple linear. Nonlinear regression analysis is commonly used for more
complicated data sets in which the dependent and independent variables show a nonlinear relationship.

Regression analysis offers numerous applications in various disciplines, including finance.

Regression Analysis – Linear model assumptions

Linear regression analysis is based on six fundamental assumptions:

1. The dependent and independent variables show a linear relationship, described by the slope and the
intercept.
2. The independent variable is not random.
3. The expected (mean) value of the residual (error) is zero.
4. The variance of the residual (error) is constant across all observations.
5. The residual (error) values are not correlated across observations.
6. The residual (error) values follow the normal distribution.

Regression Analysis – Simple linear regression

Simple linear regression is a model that assesses the relationship between a dependent variable and an independent
variable. The simple linear model is expressed using the following equation:

Y = a + bX + ϵ

Where:

Y – Dependent variable
X – Independent (explanatory) variable
a – Intercept
b – Slope
ϵ – Residual (error)
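
To see each part of the equation in numbers, the sketch below fits a and b by least squares and then computes the residual for every observation. The data (hours studied and quiz scores) are hypothetical and exist only for illustration.

```python
# Hypothetical data: hours studied (X) and quiz score (Y), invented for illustration.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares estimates of the slope b and the intercept a.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)) / sum(
    (x - mean_x) ** 2 for x in hours
)
a = mean_y - b * mean_x

# Residual for each observation: observed Y minus the fitted value a + bX.
residuals = [y - (a + b * x) for x, y in zip(hours, scores)]

print(f"Y = {a:.1f} + {b:.1f}X")
for x, y, e in zip(hours, scores, residuals):
    print(f"X = {x}, Y = {y}, fitted = {a + b * x:.1f}, residual = {e:+.1f}")
```

With this made-up data, a = 46.2 and b = 5.6: the predicted score at zero hours is 46.2, and each additional hour goes with 5.6 more points on average. The residuals of a least-squares fit add up to zero, apart from rounding.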

Regression Analysis – Multiple linear regression


Multiple linear regression analysis is essentially similar to the simple linear model, with the exception that
multiple independent variables are used in the model. The mathematical representation of multiple
linear regression is:

Y = a + bX1 + cX2 + dX3 + ϵ

Where:

Y – Dependent variable
X1, X2, X3 – Independent (explanatory) variables
a – Intercept
b, c, d – Slopes
ϵ – Residual (error)

Multiple linear regression follows the same conditions as the simple linear model. However, since there are
several independent variables in multiple linear analysis, there is another mandatory condition for the model:

Non-collinearity: Independent variables should show a minimum of correlation with each other. If the
independent variables are highly correlated with each other, it will be difficult to assess the true relationships between
the dependent and independent variables.
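
A simple way to screen for collinearity is to compute Pearson's r for every pair of independent variables. The sketch below does this; the 0.8 threshold, the function names, and the data are assumptions made for illustration, not values from the lesson.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.8):
    """List the pairs of independent variables whose |r| reaches the threshold.

    The default threshold of 0.8 is an assumed cutoff for illustration.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        if abs(pearson_r(a, b)) >= threshold:
            flagged.append((name_a, name_b))
    return flagged


# Hypothetical independent variables; X2 is built to move almost in step with X1.
predictors = {
    "X1": [1, 2, 3, 4, 5],
    "X2": [2, 4, 6, 8, 11],
    "X3": [5, 1, 4, 2, 3],
}
print(collinear_pairs(predictors))
```

In this made-up example only X1 and X2 are flagged (r is about 0.996), which suggests keeping just one of them in the model.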

Regression analysis in finance


Regression analysis has several applications in finance. For example, the statistical method is fundamental to
the Capital Asset Pricing Model (CAPM). Essentially, the CAPM equation is a model that determines the relationship
between the expected return of an asset and the market risk premium.
The analysis is also used to forecast the returns of securities, based on different factors, or to forecast the
performance of a business.

Beta and CAPM


In finance, regression analysis is used to calculate the Beta (volatility of returns relative to the overall
market) for a stock. It can be done in Excel using the SLOPE function.
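
The same slope calculation can be sketched in Python: Beta is the slope of the regression of the stock's returns (dependent variable) on the market's returns (independent variable). The function name and the return figures below are hypothetical.

```python
def beta(stock_returns, market_returns):
    """Slope of the least-squares line of stock returns on market returns."""
    if len(stock_returns) != len(market_returns):
        raise ValueError("both return series must have the same length")
    n = len(market_returns)
    mean_m = sum(market_returns) / n
    mean_s = sum(stock_returns) / n

    # Slope = sum of cross-deviations / sum of squared market deviations.
    cross = sum(
        (m - mean_m) * (s - mean_s) for m, s in zip(market_returns, stock_returns)
    )
    market_spread = sum((m - mean_m) ** 2 for m in market_returns)
    if market_spread == 0:
        raise ValueError("Beta is undefined when market returns never change")
    return cross / market_spread


# Hypothetical period returns, invented only for illustration.
market = [0.01, 0.02, -0.01, 0.03, 0.00]
stock = [0.02, 0.03, -0.02, 0.05, 0.00]
print(f"Beta = {beta(stock, market):.2f}")
```

For these made-up returns the Beta is 1.70, meaning the stock's return moved about 1.7 times as much as the market's on average. In Excel, SLOPE takes the dependent values first: SLOPE(known_y's, known_x's).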

WEEK 18-20

NAME: ______________________________________________ SECTION: __________________________

ANSWER. Circle your answer (5 pts each)

7. Zoe needs to analyze the strength of the relationship between two variables. What is the correct name
for the test she needs to conduct?

a. PCC
b. Pearson Co
c. Pearson R
d. Correlation Co

8. Joe is doing an analysis of data for his Economics 104 class. He has the value of the foreign direct
investment and political stability of one country. What could he learn about this relationship from the
Pearson Coefficient?

a. How long ago something has occurred


b. How strong the relationship is
c. How varied the relationship is
d. Whether or not there is a causal link between the variables

9. Joe is doing an analysis of data for his Economics 104 class. He has the value of the foreign direct
investment and political stability of one country. What could he learn about this relationship from the
Pearson Coefficient?

a. How long ago something has occurred


b. How strong the relationship is
c. How varied the relationship is
d. Whether or not there is a causal link between the variables

10. Find the correlation coefficient, r between the X and Y values.

a. Approximately 0.73
b. Approximately 0.90
c. Approximately 1.2
d. Approximately 0.52

11. Find the correlation coefficient between the Average Number of Assignments in Class and the Class
Absences. Deduce whether there is a positive or negative correlation.

a. Approximately -0.95, Strong Negative Correlation


b. Approximately -0.95, Strong Positive Correlation
c. Approximately -1.42, Strong Negative Correlation
d. Approximately 0.89, Strong Positive Correlation

12. Find the correlation coefficient between the two values. Is there a positive or negative correlation?

a. Approximately -0.78, Strong Negative Correlation


b. Approximately -0.44, Negative Correlation
c. Approximately 0.34, Positive Correlation
d. Approximately -0.13, Weak Negative Correlation
