
Course Guide in

CIT204 – Quantitative Methods


(Midterm Period Only)

Prepared by:

June S. Garcia, Ph.D.


Course Facilitator

April 2021

Unit 2: Inferential Statistics

At the end of the unit, the student must have:


1. distinguished inferential statistics from descriptive statistics; and
2. applied linear correlation and regression analysis to the variables under study.

Inferential Statistics – the technique by which decisions about a statistical population are made based
only on a sample that has been observed or a judgement that has been obtained. This kind of statistics is
concerned more with generalizing information or making inferences about a population.
Normal Distribution (or Gaussian Distribution) – is a continuous probability distribution that
describes data that cluster around a mean.
- is a continuous, symmetric, bell-shaped distribution of a variable
- is often referred to as the Gaussian distribution in honor of
Carl Friedrich Gauss (1777 – 1855)
Normal Curve – the graph of normal distribution.

Abraham de Moivre (1667-1754) – derived the mathematical equation of the normal curve in 1733.
Skewness – is the degree of asymmetry, or departure from symmetry of a distribution.

Skewness > 0 – Right skewed distribution (or positive skewness) – most values are
concentrated on the left of the mean, with extreme values to the right.
Skewness < 0 – Left skewed distribution (or negative skewness) – most values are
concentrated on the right of the mean, with extreme values to the left.
Skewness = 0 (mean = median) – Symmetric or normal distribution – the distribution is
symmetrical around the mean.
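
As a rough check on the sign conventions above, the sketch below computes the moment coefficient of skewness (the third central moment divided by the 1.5 power of the second central moment). This guide does not give a formula for skewness, so the formula, the function name `skewness`, and the three small data sets are assumptions chosen only for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data sets, not taken from the guide.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 15]         # one extreme value on the right
left_skewed = [-15, -4, -3, -3, -3, -2, -2, -1]  # mirror image of the list above
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed) > 0)  # positive skewness
print(skewness(left_skewed) < 0)   # negative skewness
print(skewness(symmetric))         # zero for a symmetric data set
```

Other definitions of sample skewness apply a small-sample correction, so the magnitude may differ between software packages even though the sign agrees.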
Kurtosis - is a statistical measure that describes the degree to which scores cluster in the tails or
the peak of a frequency distribution. The peak is the tallest part of the distribution, and the tails are the
ends of the distribution. Kurtosis comes from the Greek word kyrtos (or kurtos), meaning bulging.
Three Types of Kurtosis
1. Leptokurtic (k>0 or positive kurtosis) distributions are those where values cluster heavily
or pile up in the center. These are tall distributions with narrow humps and long, heavy
tails.
• An extreme positive kurtosis indicates a distribution where more of the values are located
in the tails of the distribution rather than around the mean.
2. Mesokurtic (k=0 or normal distribution) distributions are intermediate distributions that are
neither too peaked nor too flat. The values are moderately distributed about the center.
3. Platykurtic (k<0 or negative kurtosis) distributions are flat distributions with values more evenly
distributed about the center, with broad humps and short tails.
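
The three types can be told apart numerically. The sketch below assumes that k is the excess kurtosis, that is, the fourth central moment divided by the square of the second central moment, minus 3, so that a normal distribution gives k = 0 as in the list above. The guide does not state this formula, so the formula, the function name `excess_kurtosis`, and the data sets are illustrative assumptions.

```python
def excess_kurtosis(values):
    """Excess kurtosis: m4 / m2**2 - 3 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


# Hypothetical data sets, not taken from the guide.
peaked_with_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # pile-up in the center, far tails
flat = [1, 2, 3, 4, 5]                                  # evenly spread values

print(excess_kurtosis(peaked_with_tails))  # k > 0, leptokurtic
print(excess_kurtosis(flat))               # k < 0, platykurtic
```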

Linear Correlation Analysis


Correlation Analysis – attempts to measure the strength of relationship between two variables by means
of a single number called a correlation coefficient.
Correlation – refers to the departure of two random variables from independence.
Pearson’s product-moment correlation coefficient or simply correlation coefficient (or
Pearson’s r) - is a measure of the strength of the linear association between two variables. It was
developed by Karl Pearson.
- It is the measure most widely used in statistics for the degree of the relationship between
linearly related variables.
$$r = \frac{N(\sum xy) - (\sum x)(\sum y)}{\sqrt{[N\sum x^2 - (\sum x)^2][N\sum y^2 - (\sum y)^2]}}$$

where:
𝑟 – correlation coefficient
𝑁 – the number of pairs of scores
Σ𝑥𝑦 – the sum of the products of paired scores
Σ𝑥 – the sum of x scores
Σ𝑦 – the sum of y scores
Σ𝑥² – the sum of squared x scores
Σ𝑦² – the sum of squared y scores
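
The formula above translates directly into code. The following is a minimal sketch, not a reference implementation: the function name `pearson_r` and the small test lists are assumptions made for illustration, while the variable names follow the symbols defined above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the raw-score formula, using N, Σx, Σy, Σxy, Σx², Σy²."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be paired and contain at least two scores")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero when either variable is constant; r is undefined then.
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical scores that lie exactly on a straight line.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # rising line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # falling line
```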

Direction of a Correlation

Graphical representation of different correlation coefficients

• A positive correlation (r>0) is a relationship between two variables in which both variables
move in the same direction: one variable increases as the other increases, or one variable
decreases as the other decreases. An example of positive correlation would be
height and weight. Taller people tend to be heavier.
• A negative correlation (r<0) is a relationship between two variables in which an increase in one
variable is associated with a decrease in the other. An example of negative correlation would be
height above sea level and temperature. As you climb the mountain (increase in height) it gets
colder (decrease in temperature).
• A zero correlation (r=0) exists when there is no relationship between two variables. For example
there is no relationship between the amount of tea drunk and level of intelligence.
• A perfect positive correlation (r=1) means the two variables move in exactly the same direction
all of the time: every point lies on a straight line that slopes upward.
• A perfect negative correlation (r=-1) means the two variables move in exactly opposite directions
all of the time: every point lies on a straight line that slopes downward.

Strength of a Correlation
r                 Description
+.70 or higher    Very strong positive relationship
+.40 to +.69      Strong positive relationship
+.30 to +.39      Moderate positive relationship
+.20 to +.29      Weak positive relationship
+.01 to +.19      No or negligible relationship
0                 No relationship [zero correlation]
-.01 to -.19      No or negligible relationship
-.20 to -.29      Weak negative relationship
-.30 to -.39      Moderate negative relationship
-.40 to -.69      Strong negative relationship
-.70 or lower     Very strong negative relationship
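
The table above can be expressed as a small lookup function. This is only a sketch: the function name `describe_correlation` is hypothetical, and values that fall between the printed ranges (for example 0.195) are assigned to the lower band, which is an assumption the table itself does not settle.

```python
def describe_correlation(r):
    """Return the verbal description of r according to the strength table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No relationship (zero correlation)"
    size = abs(r)
    if size < 0.20:
        return "No or negligible relationship"
    direction = "positive" if r > 0 else "negative"
    if size >= 0.70:
        strength = "Very strong"
    elif size >= 0.40:
        strength = "Strong"
    elif size >= 0.30:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction} relationship"


print(describe_correlation(0.5298))
print(describe_correlation(-0.75))
```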

Example:
Find the value of the correlation coefficient from the following table:

Subject    Age (x)    Glucose Level (y)
1          43         99
2          21         65
3          25         79
4          42         75
5          57         87
6          59         81

Solution:

How Do We Get It?


4257 = (43)(99)                1365 = (21)(65)
1849 = (43)² = (43)(43)        441 = (21)² = (21)(21)
9801 = (99)² = (99)(99)        4225 = (65)² = (65)(65)
𝑁 = 6 (the total number of subjects/items/terms/sample/number of pairs of scores)

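$$r = \frac{N(\sum xy) - (\sum x)(\sum y)}{\sqrt{[N\sum x^2 - (\sum x)^2][N\sum y^2 - (\sum y)^2]}} = \frac{6(20485) - (247)(486)}{\sqrt{[6(11409) - (247)^2][6(40022) - (486)^2]}}$$

$$r = \frac{122910 - 120042}{\sqrt{[68454 - 61009][240132 - 236196]}} = \frac{2868}{\sqrt{(7445)(3936)}} = \frac{2868}{\sqrt{29303520}} = \frac{2868}{5413.2726} = 0.5298$$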

The correlation coefficient is 𝑟 = 0.5298. Age and glucose level have a strong positive relationship.
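
The arithmetic in this solution can be checked with a short script. The sketch below recomputes each sum from the age and glucose table and then applies the same formula. The variable names are chosen here for illustration and are not part of the guide.

```python
from math import sqrt

ages = [43, 21, 25, 42, 57, 59]     # x, from the example table
glucose = [99, 65, 79, 75, 87, 81]  # y, from the example table

n = len(ages)                                      # N, number of pairs
sum_x = sum(ages)                                  # Σx
sum_y = sum(glucose)                               # Σy
sum_xy = sum(x * y for x, y in zip(ages, glucose)) # Σxy
sum_x2 = sum(x * x for x in ages)                  # Σx²
sum_y2 = sum(y * y for y in glucose)               # Σy²

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r, 4))
```

Printing the intermediate sums makes it easy to compare each one with the values substituted into the formula above.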

[Figure: scatterplot (scatter chart) titled "Strong Positive Correlation Between Age and Glucose Level"; x-axis: Age; y-axis: Glucose Level]


Linear Regression Analysis

Linear Regression

Regression equation – a mathematical equation that allows us to predict values of one dependent
variable from known values of one or more independent variables.

Purpose of Regression Analysis


The purpose of regression analysis is to determine the trend in how two variables that are assumed to be
related change together, that is, whether the trend is rising or falling. Once the trend has been
established, it can be used to determine the value of one variable when the value of the other variable is
given or assumed. This trend can be represented by a straight line, and from algebra we know that the
equation of any line can be determined. It is the equation of this line that is used to determine the value of
one variable when the value of the other variable is known. Furthermore, a significance test can be
conducted to evaluate whether the value of one variable predicts the value of the other variable.

$$b = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$$

$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N\sum X^2 - (\sum X)^2}$$

𝑦 = 𝑎 + 𝑏𝑥 (the slope-intercept form of a straight line, which serves as the simple linear
regression equation)

where: 𝑋 – independent variable


𝑌 – dependent variable
𝑁 - the number of pairs of scores
𝑏 – slope
𝑎 – y-intercept
𝑦 – the value predicted by the regression line for a given value of x
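
The two formulas for a and b can be sketched in code as follows. The function name `regression_coefficients` and the small data set are assumptions made for illustration. The data lie exactly on the line y = 1 + 2x, so the expected intercept and slope are easy to verify by eye.

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for y = a + bx using the raw-score formulas."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be paired and contain at least two scores")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Shared denominator; it is zero when every x value is the same.
    denominator = n * sum_x2 - sum_x ** 2
    b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y-intercept
    return a, b


# Hypothetical points on the line y = 1 + 2x.
a, b = regression_coefficients([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```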

Example 2:
A researcher is examining the relationship between stress levels and performance on a test of cognitive
performance. She hypothesizes that stress levels lead to an increase in performance to a point, and then
increased stress decreases performance. She tests ten participants, who have the following levels of stress
as shown in the table below. When she tests their levels of mental performance, she finds the following
cognitive performance scores:

Levels of Stress    Cognitive Performance Scores
10.94               5.24
12.76               4.64
7.62                4.68
8.17                5.04
7.83                4.17
12.22               6.20
9.23                4.54
11.17               6.55
11.88               5.79
8.18                3.17
1. Determine the simple linear regression equation.


2. What is the cognitive performance score if the participant’s level of stress is 13.2? 10.15?

Solution:

How Do We Get It?


119.68 = (10.94)² = (10.94)(10.94)        162.82 = (12.76)² = (12.76)(12.76)
57.33 = (10.94)(5.24)                     59.21 = (12.76)(4.64)
(squares and products are rounded to two decimal places)
N or n = 10

1. The simple linear regression equation is y = 1.8393 + 0.3163x.

$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N\sum X^2 - (\sum X)^2} = \frac{(50.02)(1035.95) - (100)(511.57)}{10(1035.95) - (100)^2} = 1.8393$$

$$b = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2} = \frac{10(511.57) - (100)(50.02)}{10(1035.95) - (100)^2} = 0.3163$$

y = a + bx
y = 1.8393 + 0.3163x

2. If x= 13.2, then
y = 1.8393 + 0.3163(13.2) = 1.8393 + 4.17516 = 6.01446
If x= 10.15, then
y = 1.8393 + 0.3163(10.15) = 1.8393 + 3.210445 = 5.049745
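
The solution can be reproduced with the sketch below. It mirrors the hand computation by rounding each square and product to two decimal places before summing, and by rounding a and b to four decimal places before predicting. Those rounding steps are inferred from the worked values, and the helper name `predict` is hypothetical. Without the intermediate rounding the coefficients may differ slightly in the later decimal places.

```python
stress = [10.94, 12.76, 7.62, 8.17, 7.83, 12.22, 9.23, 11.17, 11.88, 8.18]  # X
scores = [5.24, 4.64, 4.68, 5.04, 4.17, 6.20, 4.54, 6.55, 5.79, 3.17]       # Y

n = len(stress)
sum_x = sum(stress)
sum_y = sum(scores)
# Round each row to two decimals first, as in the hand computation.
sum_x2 = sum(round(x * x, 2) for x in stress)
sum_xy = sum(round(x * y, 2) for x, y in zip(stress, scores))

denominator = n * sum_x2 - sum_x ** 2
a = round((sum_y * sum_x2 - sum_x * sum_xy) / denominator, 4)  # y-intercept
b = round((n * sum_xy - sum_x * sum_y) / denominator, 4)       # slope


def predict(x):
    """Predicted cognitive performance score for stress level x."""
    return a + b * x


print(a, b)
print(predict(13.2), predict(10.15))
```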

[Figure: scatterplot (scatter chart) titled "The Simple Linear Regression Equation of the Level of Stress and the Cognitive Performance Scores", with the fitted line y = 0.3163x + 1.8393; x-axis: Level of Stress; y-axis: Cognitive Performance Scores]



Unit Test

The following table shows the final grades of ten students in Algebra and Statistics.

Algebra (X)    Statistics (Y)
75             82
80             78
93             86
65             72
87             91
71             80
98             95
68             72
84             89
77             74

1. Solve for the correlation coefficient.


2. What are the strength and direction of the correlation?
3. Find the simple linear regression equation.
4. What is a student’s expected grade in Statistics if his grade in Algebra is 78? 89? 95?

No one is more secure than the one who is held in God’s hands. Safety is not found in the
absence of danger but in the presence of God.

Prepared by Sir June

You will keep him in perfect peace, whose mind is stayed on you (Isaiah 26:3).
