
Biñan Integrated National High School

Brgy. Sto. Domingo, Biñan City, Laguna

Term Paper in Statistics and

Probability

Submitted by:

Valencia, Lady Bianca E. Posecion, John Noel S.

Frondoza, Bethany F. Bocboc, Kyla

Esta, Keiza Mae Balon, Mary Rose

Abuan, Reylan C. Belenzo, Maybelene

Ibias, Katrina Cassandra T. Debuton, Shiela May

Submitted to:

Mrs. Yolanda P. Recuerdo


ABSTRACT

Correlation and regression analysis are statistical methods that can be used in real-life situations. These concepts, developed by Sir Francis Galton, are related to each other: correlation describes how strongly one variable is related to another, while regression describes how an independent variable is numerically related to a dependent variable.

The purpose of this study is to determine the difference between correlation and

regression analysis and also to know how to calculate and interpret the relationship of one

variable to another.

It is hoped that this study will help learners to learn and understand what correlation

and regression analysis are.


INTRODUCTION

This study discusses what correlation and regression analysis are. According to Surbhi (2017), the two differ from one another: correlation analysis determines the strength of the relationship between two variables, while regression analysis describes how an independent variable is numerically related to the dependent variable. Despite these differences, this study focuses on determining whether the variables given in a problem are related or not.

In this study, we will learn how to analyze and determine the relationship between two or more variables. We will also learn the types of correlation, which show how the variables are connected with each other, and how to interpret whether a correlation coefficient indicates positive, negative, or no correlation. Along with this, we will discuss linear regression.

It is important to learn correlation and regression because we can use them in real-life situations. For example, when we write a research paper, we can use correlation to determine whether the variables we have are connected to each other. As another example, if we are business owners, we can use regression to predict the expected sales in the upcoming month or year and to decide whether to expand the business or make a new product.
BACKGROUND OF THE STUDY

Correlation and regression analysis are related concepts. According to Brutlag (2007), they were developed by Sir Francis Galton in the 19th century. It is said that Galton developed the ideas of correlation and regression while studying sweet peas and human physical characteristics. According to Stanton (2017), the coefficient of correlation, also known as the Pearson product-moment correlation coefficient, was developed by Karl Pearson. The efforts of Galton and Pearson led to many general techniques of multiple regression and to the product-moment correlation coefficient.


METHODOLOGY

The researchers used a textbook as the primary source of reliable information to help them explain their research. To explain the topic further, the researchers used the internet as a secondary source to widen their understanding and strengthen the study. Gathering information from the internet is not easy because not all information is credible and reliable, so the researchers carefully checked whether sources such as articles contained concrete evidence. With the help of this information, the researchers are able to share their knowledge of the topic they studied and also to give new information.

Correlation analysis

Correlation analysis determines the strength of the relationship between two variables (the independent and the dependent variable).

Positive Correlation

According to Hayes (2019), positive correlation is a relationship between two variables in which both variables move in the same direction. It exists when the other variable decreases as one variable decreases, or when the other variable increases as one variable increases.

Example No. 1

If you look at the age of a child and the child’s height, you will find that as the child gets older, the child gets taller. Because both are going up, this is a positive correlation.

The relationship of child’s age to its height


Age              1     2     3     4     5     6     7     8
Height (inch)    33.5  33.7  37    39.5  42.5  45.5  47.7  50.5

Independent Variable x: Age

Dependent Variable y: Height (inch)

[Figure: scatter plot, "The relationship of child’s age to its height". x-axis: Age; y-axis: Height (inch)]

Example No. 2

The local ice cream shop keeps track of how much ice cream it sells versus the noon temperature on that day. Here are the figures for the last 13 days:

Ice Cream Sales vs Temperature

Temperature (°C)    Ice Cream Sales
14.5                $150
19                  $200
20                  $210
21                  $230
26.6                $280
28                  $316
29.2                $330
31                  $380
34                  $410
35.5                $425
38                  $435
39                  $450
40                  $462

Independent Variable x: Temperature

Dependent Variable y: Ice Cream Sales

[Figure: scatter plot, "Ice Cream Sales at different temperatures of the day". x-axis: Temperature (°C); y-axis: Ice cream sales]
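Negative Correlation

Negative correlation is a relationship between two variables in which the variables move in opposite directions: as one variable increases, the other decreases.

Example No. 1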

Internet usage of people aged 15-45 in a day

Hours of Internet usage per day    1 1 2 3 3 3 5 5 6 7 7 8 8 8 1 9 1
                                   8 9
                                   2 2 2

Age                                45 40 37 42 34 33 35 29 31 30 27 28 24 20 24 22 16

Independent variable x: Hours of Internet usage per day

Dependent variable y: Age

[Figure: scatter plot, "Internet usage of people aged 15-45 in a day". x-axis: Hours of internet usage per day; y-axis: Age]
Example No. 2:

Age of drivers who mostly get into accidents

Age of driver                             16  17  18  19  20  21  22  23  24
No. of people who get into an accident    25  23  20  16  15  17  14  13  10

Independent variable x: Age of driver

Dependent variable y: No. of people who get into an accident

[Figure: scatter plot, "Age of drivers who mostly get into accidents". x-axis: Age of driver; y-axis: No. of people who mostly get into an accident]
No Correlation

No correlation means that there is no relationship between the two variables. Unlike the other two types of correlation, the points in the scatter diagram show no pattern.

Example No. 1

Josh wants to know the relationship between the last digit of his ten classmates’ phone

numbers and their vocabulary quiz scores.

Last digit    0   1   4   1   3   5   7   5   8   8
Score         65  75  80  84  90  94  55  70  90  85

[Figure: scatter plot, "Last Digits of Phone Numbers and Quiz Scores". x-axis: Last digit of phone number; y-axis: Quiz score]
Example No. 2

The diameter of the wheels and the height of the drivers

Height (inches)      35  40  90  65  80  55  60  70  85
Diameter (inches)    5   16  18  10  19  15  4   10  19

[Figure: scatter plot, "The diameter of the wheels and the height of the drivers". x-axis: Height (inches); y-axis: Diameter (inches)]
Coefficient of a Correlation

The coefficient of correlation is a measure that describes how closely the points in the scatter diagram are spread around the line. The sample correlation coefficient is represented by r, while the population correlation coefficient is represented by ρ (rho). The formula for the coefficient of correlation is:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\cdot\sqrt{n\sum y^2 - (\sum y)^2}}$$

where:

x = independent variable

y = dependent variable

n = sample size

Interpreting Correlation Coefficient, r

0.91 - 1.00 Very high positive (negative) correlation

0.71 - 0.90 High positive (negative) correlation

0.51 - 0.70 Moderate positive (negative) correlation

0.31 - 0.50 Low positive (negative) correlation

0.00 - 0.30 Little or no linear correlation
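The formula and the interpretation scale above can be sketched in Python. This is only an illustrative sketch: the function names `pearson_r` and `interpret_r` and the sample data are hypothetical, and values that fall between two rows of the scale (for example 0.305) are assumed to belong to the higher row.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r, using the computational formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return numerator / denominator


def interpret_r(r):
    """Map r to the verbal scale (boundary handling is an assumption)."""
    size = abs(r)
    if size <= 0.30:
        return "little or no linear correlation"
    if size <= 0.50:
        level = "low"
    elif size <= 0.70:
        level = "moderate"
    elif size <= 0.90:
        level = "high"
    else:
        level = "very high"
    direction = "positive" if r > 0 else "negative"
    return f"{level} {direction} correlation"


# Hypothetical data lying exactly on a rising line, so r is 1 up to rounding.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(round(r, 3), interpret_r(r))
```

Computing the five sums first mirrors the table method used in the worked examples, so each intermediate value can be compared with a hand computation.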

Example No. 1

Olivia is studying for a test, and she wonders if her friend Laney is also studying for the test. She calls Laney and asks how long she has been studying for the test all week; Laney answers approximately 8 hours in total. Olivia has only been studying for a couple of hours. The next week, Olivia and Laney got a C. Olivia wonders if there is a correlation between the number of hours spent studying (x) and the grade a student earns (y). Take a look at the data Olivia collected from her classmates and see if you can find a correlation.

X Y

8 97

2 73

4 82

6 88

3 75

Solution:

X    Y     XY     X²    Y²
8    97    776    64    9409
2    73    146    4     5329
4    82    328    16    6724
6    88    528    36    7744
3    75    225    9     5625

∑X = 23    ∑Y = 415    ∑XY = 2,003    ∑X² = 129    ∑Y² = 34,831

$$r = \frac{5(2{,}003) - (23)(415)}{\sqrt{5(129) - (23)^2}\cdot\sqrt{5(34{,}831) - (415)^2}} = \frac{470}{\sqrt{116}\cdot\sqrt{1{,}930}} \approx 0.993$$

The number of hours spent studying and the grade a student earns have very high positive

correlation.
Example No. 2

Researchers want to determine the relationship between a person’s age, x, and the

time spent in exercise, y (hour), per week.

X 13 18 20 25 30 40

Y 12 10 8 5 4 2

Solution:

X Y XY 𝑋2 𝑌2

13 12 156 169 144

18 10 180 324 100

20 8 160 400 64

25 5 125 625 25

30 4 120 900 16

40 2 80 1600 4

∑ 𝑋 = 146 ∑ 𝑌 = 41 ∑ 𝑋𝑌 = 821 ∑ 𝑋 2 = 4,018 ∑ 𝑌 2 = 353

$$r = \frac{6(821) - (146)(41)}{\sqrt{6(4{,}018) - (146)^2}\cdot\sqrt{6(353) - (41)^2}} = \frac{-1{,}060}{\sqrt{2{,}792}\cdot\sqrt{437}} \approx -0.960$$

Age and time spent in exercise have very high negative correlation.
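The table method used in this example can be cross-checked with a short Python sketch. It takes the X and Y columns from the solution table, builds the XY, X² and Y² columns, adds them up, and substitutes the sums into the formula for r. The variable names are illustrative only.

```python
import math

ages = [13, 18, 20, 25, 30, 40]   # X column of the solution table
hours = [12, 10, 8, 5, 4, 2]      # Y column of the solution table

# One row per person: X, Y, XY, X squared, Y squared.
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(ages, hours)]

# Column totals, in the same order as the columns above.
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))
n = len(rows)

r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
)

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 146 41 821 4018 353
print(round(r, 2))                           # -0.96
```

The printed totals match the sums in the table, and r comes out at about -0.96, which falls in the "very high negative correlation" range.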
Regression

According to Beers (2019), regression is a statistical measurement used in finance, investing,

and other disciplines that attempts to determine the strength of the relationship between one

dependent variable and a series of other changing variables.

The formulas used in linear regression

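$$b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}$$

$$a = \bar{y} - b\bar{x}$$

$$y' = a + bx$$

where $\bar{x}$ and $\bar{y}$ are the means of x and y, n is the sample size, b is the slope, and a is the y-intercept.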
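The linear regression formulas can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function names `fit_line` and `predict` and the sample data are hypothetical, and the means are taken as the sum of each variable divided by n.

```python
def fit_line(xs, ys):
    """Return (a, b) for the regression line y' = a + bx."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    x_bar = sum(xs) / n                       # mean of x
    y_bar = sum(ys) / n                       # mean of y
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = sum_x2 - n * x_bar ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when all x values are equal")
    b = (sum_xy - n * x_bar * y_bar) / denominator
    a = y_bar - b * x_bar
    return a, b


def predict(a, b, x):
    """Predicted value y' for a given x."""
    return a + b * x


# Hypothetical data that lie exactly on the line y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)               # 1.0 2.0
print(predict(a, b, 10))  # 21.0
```

Note that the formula uses the means of x and y, not their sums. Substituting the sums in place of the means gives a wrong slope and intercept.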

Example No. 1

Weekly sales of a popular brand of chocolate and its price

Price (x) 16 7 8 10 3 5

Sales (y) 2 9 3 5 11 6

Solution:

X Y XY 𝑋2 𝑌2

16 2 32 256 4

7 9 63 49 81

8 3 24 64 9

10 5 50 100 25

3 11 33 9 121
5 6 30 25 36

∑ 𝑋 = 49 ∑ 𝑌 = 36 ∑ 𝑋𝑌 = 232 ∑ 𝑋 2 = 503 ∑ 𝑌 2 = 276

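Since n = 6, $\bar{x} = 49/6 \approx 8.1667$ and $\bar{y} = 36/6 = 6$:

$$b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \frac{232 - 6(8.1667)(6)}{503 - 6(8.1667)^2} = \frac{-62}{102.833} \approx -0.6029$$

$$a = \bar{y} - b\bar{x} = 6 - (-0.6029)(8.1667) \approx 10.924$$

Answer:

y’ = a + bx

y’ = 10.924 − 0.603x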

Example No. 2

The number of correct answers of the students and their attitude

Correct (x)     18  10  8   17  19  14  5   16  17  18

Attitude (y)    95  73  50  65  93  86  55  79  76  89

Solution:

X Y XY 𝑋2 𝑌2

18 95 1710 324 9025

10 73 730 100 5329

8 50 400 64 2500

17 65 1105 289 4225

19 93 1767 361 8649

14 86 1204 196 7396

5 55 275 25 3025
16 79 1264 256 6241

17 76 1292 289 5776

18 89 1602 324 7921

∑ 𝑋 = 142 ∑ 𝑌 = 761 ∑ 𝑋𝑌 = 11349 ∑ 𝑋 2 = 2228 ∑ 𝑌 2 = 60087

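Since n = 10, $\bar{x} = 142/10 = 14.2$ and $\bar{y} = 761/10 = 76.1$:

$$b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \frac{11{,}349 - 10(14.2)(76.1)}{2{,}228 - 10(14.2)^2} = \frac{542.8}{211.6} \approx 2.5652$$

$$a = \bar{y} - b\bar{x} = 76.1 - (2.5652)(14.2) \approx 39.674$$

Answer:

y’ = a + bx

y’ = 39.674 + 2.565x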
RESULT AND DISCUSSION

The study was able to show the difference between correlation and regression analysis. The two concepts are somewhat similar, which is why the researchers think they are still often confused. Based on the study, correlation analysis shows the strength of the relationship between two variables. Regression analysis, on the other hand, shows how the independent variable is numerically related to the dependent variable.

When calculating the coefficient of correlation and the linear regression, you should be careful in solving the problem, because if one of the values you have computed is wrong, the rest of your answer will also be wrong. However, if you answer correctly and understand the research well, it will be very helpful, because you can use these methods in real-life situations such as when you are conducting research.
