Prepared by:
April 2021
Unit 2: Inferential Statistics
Inferential Statistics – the branch of statistics in which decisions about a statistical population are
made from a sample that has been observed or a judgement that has been obtained. It is concerned with
generalizing from the sample and making inferences about the population.
Normal Distribution (or Gaussian Distribution) – a continuous probability distribution that
describes data that cluster around a mean.
- is a continuous, symmetric, bell-shaped distribution of a variable
- is often referred to as the Gaussian distribution in honor of Carl Friedrich Gauss (1777 – 1855)
Normal Curve – the graph of a normal distribution.
Abraham de Moivre (1667-1754) – derived the mathematical equation of the normal curve in 1733.
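The symmetry and bell shape described above can be checked numerically. The following is a minimal Python sketch, assuming the standard density formula of the normal distribution; the function name `normal_pdf` and the parameter names `mu` and `sigma` are illustrative and do not come from this module.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x, for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Symmetry: points equally far from the mean have the same height.
left = normal_pdf(-1.5)
right = normal_pdf(1.5)

# Bell shape: the curve is highest at the mean and falls off on both sides.
peak = normal_pdf(0.0)

print(left, right, peak)
```

With the default values (mean 0, standard deviation 1), `left` and `right` agree and both are smaller than `peak`.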
Skewness – is the degree of asymmetry, or departure from symmetry of a distribution.
Skewness > 0 – Right-skewed distribution (or positive skewness) – most values are
concentrated to the left of the mean, with extreme values to the right.
Skewness < 0 – Left-skewed distribution (or negative skewness) – most values are
concentrated to the right of the mean, with extreme values to the left.
Skewness = 0 (mean = median) – Symmetric distribution, such as the normal distribution – the
distribution is symmetrical around the mean.
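The sign of the skewness can be illustrated with a short Python sketch. It assumes the moment-based coefficient of skewness (third central moment divided by the 1.5 power of the second central moment), which is one common definition and may differ from the one used in class. The function name and the small data sets are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets, chosen only to show the three cases.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]    # one extreme value on the right
left_skewed = [-v for v in right_skewed]    # mirror image of the data above
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed))  # positive
print(skewness(left_skewed))   # negative
print(skewness(symmetric))     # zero
```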
Kurtosis – a statistical measure used to describe the degree to which scores cluster in the tails or
the peak of a frequency distribution. The peak is the tallest part of the distribution, and the tails are the
ends of the distribution. Kurtosis is from the Greek word kyrtos or kurtos, meaning bulging.
Three Types of Kurtosis
1. Leptokurtic (k>0 or positive kurtosis) distributions are those in which values cluster heavily
or pile up in the center. These are tall distributions with narrow humps and long, heavy
tails.
• An extreme positive kurtosis indicates a distribution where more of the values are located
in the tails of the distribution rather than around the mean.
2. Mesokurtic (k=0, as in the normal distribution) distributions are intermediate distributions
that are neither too peaked nor too flat. The values are moderately distributed about the center.
3. Platykurtic (k<0 or negative kurtosis) distributions are flat, with values more evenly
distributed about the center, broad humps, and short tails.
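The three types can be told apart numerically. The sketch below assumes the moment-based excess kurtosis (fourth central moment divided by the squared second central moment, minus 3), which matches the convention above that a normal distribution has k = 0. The function name and the data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Excess kurtosis: m4 / m2**2 - 3 (assumed definition; normal gives 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0

# Hypothetical data: most values at the center, two extreme values in the tails.
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]
# Hypothetical data: values spread evenly, no pile-up and no extreme values.
evenly_spread = [1, 2, 3, 4, 5]

k_lepto = excess_kurtosis(heavy_tailed)    # k > 0, leptokurtic
k_platy = excess_kurtosis(evenly_spread)   # k < 0, platykurtic
print(k_lepto, k_platy)
```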
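The correlation coefficient is computed as

$$ r = \frac{N\Sigma xy - (\Sigma x)(\Sigma y)}{\sqrt{[N\Sigma x^2 - (\Sigma x)^2][N\Sigma y^2 - (\Sigma y)^2]}} $$

where:
r – correlation coefficient
N – the number of pairs of scores
Σxy – the sum of the products of paired scores
Σx – the sum of x scores
Σy – the sum of y scores
Σx² – the sum of squared x scores
Σy² – the sum of squared y scores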
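The quantities listed above translate directly into code. The following is a minimal Python sketch of the computational formula for r (the same one used in the worked solution later in this unit); the function name `pearson_r` and the sample scores are hypothetical and serve only as an illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r computed from the sums defined above."""
    n = len(xs)                                # N, the number of pairs
    sum_x = sum(xs)                            # sum of x scores
    sum_y = sum(ys)                            # sum of y scores
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of products of pairs
    sum_x2 = sum(x * x for x in xs)            # sum of squared x scores
    sum_y2 = sum(y * y for y in ys)            # sum of squared y scores
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired scores, not taken from this module.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))
```

The function assumes that both lists have the same length and that neither variable is constant, since a constant variable would make the denominator zero.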
Direction of a Correlation
• A positive correlation (r>0) is a relationship between two variables in which both variables
move in the same direction: one variable increases as the other increases, or one variable
decreases as the other decreases. An example of positive correlation would be
height and weight. Taller people tend to be heavier.
• A negative correlation (r<0) is a relationship between two variables in which an increase in one
variable is associated with a decrease in the other. An example of negative correlation would be
height above sea level and temperature. As you climb a mountain (increase in height), it gets
colder (decrease in temperature).
• A zero correlation (r=0) exists when there is no linear relationship between two variables. For example,
there is no relationship between the amount of tea drunk and level of intelligence.
• A perfect positive correlation (r=1) means that all pairs of scores fall exactly on a straight line
with a positive slope, so the two variables always move together in the same direction.
• A perfect negative correlation (r=-1) means that all pairs of scores fall exactly on a straight line
with a negative slope, so the two variables always move in opposite directions.
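The cases above can be reproduced with small made-up data sets. The Python sketch below uses the computational formula for the correlation coefficient; the helper name `corr` and all data values are hypothetical. Note that r measures linear association only, so the r = 0 example still shows a pattern that is not a straight line.

```python
import math

def corr(xs, ys):
    """Correlation coefficient from the sums of x, y, xy, x squared, y squared."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

xs = [1, 2, 3, 4]
r_perfect_pos = corr(xs, [3, 5, 7, 9])   # points on the line y = 2x + 1
r_perfect_neg = corr(xs, [9, 7, 5, 3])   # points on the line y = -2x + 11
r_zero = corr(xs, [1, 2, 2, 1])          # rises, then falls: no linear trend

print(r_perfect_pos, r_perfect_neg, r_zero)
```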
Strength of a Correlation
r Description
Example:
Find the value of the correlation coefficient from the following table:
Solution:
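$$ r = \frac{N(\Sigma xy) - (\Sigma x)(\Sigma y)}{\sqrt{[N\Sigma x^2 - (\Sigma x)^2][N\Sigma y^2 - (\Sigma y)^2]}} = \frac{6(20485) - (247)(486)}{\sqrt{[6(11409) - (247)^2][6(40022) - (486)^2]}} $$

$$ r = \frac{122910 - 120042}{\sqrt{[68454 - 61009][240132 - 236196]}} = \frac{2868}{\sqrt{(7445)(3936)}} = \frac{2868}{\sqrt{29303520}} = \frac{2868}{5413.2726} = 0.5298 $$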
The correlation coefficient is 𝑟 = 0.5298. Age and glucose level have a strong positive relationship.
[Figure: plot of Glucose Level (vertical axis) against Age (horizontal axis)]
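The arithmetic in the solution above can be reproduced from the sums alone. This is a minimal Python sketch; the variable names are illustrative, and the values are the ones that appear in the solution (N = 6, Σxy = 20485, Σx = 247, Σy = 486, Σx² = 11409, Σy² = 40022).

```python
import math

# Sums taken from the worked solution above.
n = 6
sum_xy = 20485
sum_x = 247
sum_y = 486
sum_x2 = 11409
sum_y2 = 40022

numerator = n * sum_xy - sum_x * sum_y    # 122910 - 120042 = 2868
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)  # (7445)(3936)
)
r = numerator / denominator

print(round(r, 4))  # 0.5298
```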
Linear Regression
Regression equation – a mathematical equation that allows us to predict values of one dependent
variable from known values of one or more independent variables.
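$$ b = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{N\Sigma X^2 - (\Sigma X)^2} $$

$$ a = \frac{(\Sigma Y)(\Sigma X^2) - (\Sigma X)(\Sigma XY)}{N\Sigma X^2 - (\Sigma X)^2} $$

$$ y = a + bx $$

(the slope-intercept form of a straight line, which serves as the simple linear
regression equation; b is the slope and a is the y-intercept)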
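The formulas for the slope b and the intercept a can be sketched in Python as follows. The function name `fit_line` and the sample data are hypothetical; the computation follows the two formulas above term by term.

```python
def fit_line(xs, ys):
    """Return (a, b) for the regression line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2       # shared by both formulas
    b = (n * sum_xy - sum_x * sum_y) / denominator        # slope
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y-intercept
    return a, b

# Hypothetical data, not taken from this module.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(a, b)  # intercept about 2.2, slope about 0.6
```

As a quick check on any result, the fitted line should pass through the point formed by the mean of x and the mean of y; here 2.2 + 0.6(3) = 4, the mean of the y values.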
Example 2:
A researcher is examining the relationship between stress levels and scores on a test of cognitive
performance. She hypothesizes that increasing stress improves performance up to a point, and that
stress beyond that point decreases performance. She tests ten participants, whose levels of stress
are shown in the table below. When she tests their levels of mental performance, she finds the following
cognitive performance scores:
Solution:
2. If x = 13.2, then
y = 1.8393 + 0.3163(13.2) = 1.8393 + 4.17516 = 6.01446
If x = 10.15, then
y = 1.8393 + 0.3163(10.15) = 1.8393 + 3.210445 = 5.049745
[Figure: plot against Level of Stress (horizontal axis) with the fitted line y = 0.3163x + 1.8393]
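The two predictions in the solution above can be reproduced with a short Python sketch. The function name `predict` is hypothetical; the coefficients a = 1.8393 and b = 0.3163 are the ones from the regression equation in this example.

```python
# Coefficients of the regression equation y = 1.8393 + 0.3163x from the example.
A = 1.8393   # y-intercept
B = 0.3163   # slope

def predict(x):
    """Predicted y for a given level of stress x."""
    return A + B * x

y1 = predict(13.2)    # approximately 6.01446
y2 = predict(10.15)   # approximately 5.049745
print(y1, y2)
```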
Unit Test
The following table shows the final grades of ten students in Algebra and Statistics.
You will keep him in perfect peace, whose mind is stayed on you (Isaiah 26:3).