MMW Module 4.2 - Statistics - Measures of Variation, Normal Distribution & Simple Regression

MATHEMATICS IN THE MODERN WORLD
MODULE 4.3
I. TOPIC: MEASURE OF VARIATION, NORMAL DISTRIBUTION & SIMPLE REGRESSION
II. OBJECTIVE(S):
1. Explain the importance of measuring variability
2. Calculate and interpret the range, the interquartile range, the variance, and the standard deviation
3. Identify the relative strengths and weaknesses of the measures
4. Understand the concepts of normal distribution and simple regression
III. INTRODUCTION:
Statistics (in the singular sense) is the scientific discipline that deals with the methods and theories for handling numerical data. It covers the analysis and interpretation of a data set so that one can make sound decisions and well-supported inferences.
Statistics (in the plural sense) are numerical data. Some examples are revenues, allowed kilograms for check-in luggage, stipends, tuition fees, ID numbers, military ranks, etc.
IV. DISCUSSION:
MEASURE OF VARIATION
A measure of variation is a single value that describes the spread of a distribution. It is needed because a measure of central tendency alone does not uniquely describe a distribution: two data sets can share the same mean yet differ widely in spread.
There are two types of measures of variation: (1) absolute measures of dispersion and (2) relative measures of dispersion.
The absolute measures of dispersion are the range, the interquartile range, the variance, and the standard deviation.
The only relative measure of dispersion covered here is the coefficient of variation.
A. RANGE
Range is the difference between the maximum and the minimum value in a data set.
R = MAX – MIN
Example:
Pulse rates of 15 male residents of a village
54 58 58 60 62 65 66 71 74 75 78 80 85
Range = 85 – 54 = 31
So, the range is 31.
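The arithmetic can be checked with a short Python sketch. It uses the pulse rates exactly as they are listed above (13 values appear in the list); the variable names are illustrative only.

```python
# Pulse rates as listed in the example above.
pulse_rates = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 78, 80, 85]

# Range = maximum value minus minimum value.
data_range = max(pulse_rates) - min(pulse_rates)
print(data_range)  # 31
```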
Properties of range:
1. The larger the value of the range, the more dispersed the observations are.
2. It is quick to compute and easy to understand.
3. It is only a rough measure of dispersion, since it uses just the two extreme values.
B. INTERQUARTILE RANGE
The interquartile range (IQR) is the difference between the third quartile and the first quartile.
IQR = Q3 – Q1
Properties of the interquartile range:
1. Reduces the influence of extreme values
2. Not as easy to calculate as the range
The following are the steps in calculating the interquartile range:
1. Quartiles are score points which divide the distribution into four equal parts.
2. The first quartile (Q1), or lower quartile, is the value that separates the lower 25% from the upper 75% of the scores.
3. The third quartile (Q3), or upper quartile, is the value that separates the lower 75% from the upper 25% of the scores.
4. Locating the quartiles is similar to locating the median.
a. The position of Q1 is ¼ * n and the position of Q3 is ¾ * n, counted from the lowest score, where n is the number of scores. If the position is not a whole number, round it up.
5. Arrange the scores in ascending order to locate Q1 and Q3.
Example:
First, compute Q1 and Q3.
Q1 = ¼ * 9 = 2.25, rounded up to the 3rd score from the lowest; thus Q1 = 43.
Seventy-five percent of the expenses are higher than PHP 43,000 and only 25% are below it.
Q3 = ¾ * 9 = 6.75, rounded up to the 7th score from the lowest; thus Q3 = 59.
Twenty-five percent of the expenses are higher than PHP 59,000 and 75% are below it.
Therefore, IQR = 59 – 43 = 14.
This means that the middle 50% of the housewives' expenses spans PHP 14,000.
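The positional rule used above (take ¼ * n or ¾ * n and round up) can be sketched in Python. Because the expense data for this example are not reproduced in the text, the sketch applies the rule to the pulse rates from the range example instead. The helper name `quartile_by_position` is hypothetical, and other quartile conventions (for instance, those that interpolate between scores) may give slightly different values.

```python
import math

def quartile_by_position(scores, fraction):
    """Return the score at position ceil(fraction * n), counted from the lowest (1-based)."""
    ordered = sorted(scores)
    position = math.ceil(fraction * len(ordered))
    return ordered[position - 1]

# Pulse rates from the range example (n = 13).
pulse_rates = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 78, 80, 85]

q1 = quartile_by_position(pulse_rates, 0.25)  # position ceil(3.25) = 4
q3 = quartile_by_position(pulse_rates, 0.75)  # position ceil(9.75) = 10
iqr = q3 - q1
print(q1, q3, iqr)  # 60 75 15
```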
C. VARIANCE
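Variance is an important measure of variation. It shows the variation about the mean.
Formula:
Population Variance:
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]
Sample Variance:
\[ s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} \]
where \mu is the population mean, \bar{X} is the sample mean, and N is the number of observations.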
D. STANDARD DEVIATION
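The standard deviation is the most important measure of variation. It is the square root of the variance and has the same units as the original data.
Formula:
Population Standard Deviation:
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]
Sample Standard Deviation:
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} \]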
Example:
Consider the following data:
10 12 14 15 17 18 18 24
N=8
Mean = 16
\[ s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{7}} = \sqrt{\frac{130}{7}} \]
s = 4.309
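The worked example can be reproduced with Python's standard `statistics` module. This is a minimal sketch rather than part of the module; it also prints the population versions (dividing by N) so the two can be compared.

```python
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]

mean = statistics.mean(data)                       # arithmetic mean
sample_variance = statistics.variance(data)        # divides by N - 1
sample_sd = statistics.stdev(data)                 # square root of the sample variance
population_variance = statistics.pvariance(data)   # divides by N
population_sd = statistics.pstdev(data)            # square root of the population variance

print(mean, round(sample_variance, 3), round(sample_sd, 3))
print(round(population_variance, 3), round(population_sd, 3))
```

The sample standard deviation agrees with the hand computation above to three decimal places; the population value is smaller because it divides by 8 instead of 7.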
E. COEFFICIENT OF VARIATION
The coefficient of variation is a measure of relative variation, usually expressed in percent. It shows variation relative to the mean and is used to compare two or more groups.
Formula:
\[ CV = \left( \frac{\mathrm{SD}}{\mathrm{MEAN}} \right) \times 100\% \]
Example:
The data below are the number of latecomers in a week from three sections in the College of Liberal Arts. Which section has the highest variability?
Section 1: 5, 4, 2, 1, 3, 1, 2
Section 2: 1, 0, 2, 1, 3, 1, 2
Section 3: 2, 1, 2, 1, 3, 1, 2
            MEAN     STANDARD DEVIATION   COEFFICIENT OF VARIATION
Section 1   2.571    1.511                58.77%
Section 2   1.429    0.976                68.30%
Section 3   1.714    0.76                 44.34%
The most dispersed section is Section 2, since it has the highest variability with a CV of 68.30%. Section 3 has the least variability with a CV of 44.34%.
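The comparison can be sketched in Python as follows. The sketch assumes the sample standard deviation (dividing by N − 1), which is what the tabulated values appear to use; the helper name `coefficient_of_variation` is illustrative. Because the script works with unrounded means and standard deviations, its percentages may differ from the table by a fraction of a percentage point, but the ranking of the sections is the same.

```python
import statistics

latecomers = {
    "Section 1": [5, 4, 2, 1, 3, 1, 2],
    "Section 2": [1, 0, 2, 1, 3, 1, 2],
    "Section 3": [2, 1, 2, 1, 3, 1, 2],
}

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cv = {name: coefficient_of_variation(values) for name, values in latecomers.items()}
for name, value in cv.items():
    print(f"{name}: CV = {value:.2f}%")

most_variable = max(cv, key=cv.get)
least_variable = min(cv, key=cv.get)
print("Highest variability:", most_variable)
print("Lowest variability:", least_variable)
```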
NORMAL DISTRIBUTION
The normal distribution is also known as the Gaussian distribution, after the mathematician and astronomer Carl Friedrich Gauss. It is a continuous distribution which is regarded by many as the most significant probability distribution in the entire theory of statistics, particularly in the field of statistical inference.
It is graphically represented by a symmetrical, bell-shaped curve known as the normal curve.
The normal distribution is characterized by the following:
1. The mean, median and mode have the same value, and therefore are plotted on the same point (the central point) along the horizontal axis.
2. The curve is symmetric about the vertical line which contains the mean.
3. The curve is asymptotic to the horizontal axis; that is, the curve extends indefinitely in both directions, approaching the axis without ever touching it.
4. The total area under the normal curve is equal to 1.
The standard normal distribution is a normal distribution of standardized values called z-scores. A z-score is measured in units of the standard deviation: z = (x − μ) / σ, where μ is the mean and σ is the standard deviation.
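Why Standardize?
Standardizing converts values from any normal distribution to a common scale, so that probabilities can be read from a single standard normal table and values from different distributions can be compared.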
Example:
The IQ scores of a large group of students are approximately normally distributed with a mean of 100 and a standard deviation of 15. What is the probability that a randomly chosen student from this group will have an IQ score:
a. above 120?
b. below 128?
c. below 93?
d. between 98 and 105?
Solution:
a. above 120?
\[ P(x > 120) \]
\[ z = \frac{x - \mu}{\sigma} = \frac{120 - 100}{15} = \frac{20}{15} = 1.33 \]
\[ P(z > 1.33) = 0.5 - 0.4082 = 0.0918 = 9.18\% \]
b. below 128?
\[ P(x < 128) \]
\[ z = \frac{x - \mu}{\sigma} = \frac{128 - 100}{15} = \frac{28}{15} = 1.87 \]
\[ P(z < 1.87) = 0.5 + 0.4693 = 0.9693 = 96.93\% \]
c. below 93?
\[ P(x < 93) \]
\[ z = \frac{x - \mu}{\sigma} = \frac{93 - 100}{15} = \frac{-7}{15} = -0.47 \]
\[ P(z < -0.47) = 0.5 - 0.1808 = 0.3192 = 31.92\% \]
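d. between 98 and 105?
\[ P(98 < x < 105) \]
\[ z_1 = \frac{x_1 - \mu}{\sigma} = \frac{98 - 100}{15} = -0.13 \]
\[ z_2 = \frac{x_2 - \mu}{\sigma} = \frac{105 - 100}{15} = 0.33 \]
\[ P(-0.13 < z < 0.33) = 0.0517 + 0.1293 = 0.1810 = 18.10\% \]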
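The table lookups in this solution can be cross-checked with `statistics.NormalDist` from the Python standard library. The sketch below is an illustration, not the module's own method: it standardizes each score, rounds z to two decimals to mimic a printed z-table, and then evaluates the standard normal cumulative distribution. The helper name `z_score` is hypothetical.

```python
from statistics import NormalDist

MEAN, SD = 100, 15
standard_normal = NormalDist()  # mean 0, standard deviation 1

def z_score(x, mean=MEAN, sd=SD):
    """Standardize x and round to two decimals, as a printed z-table requires."""
    return round((x - mean) / sd, 2)

p_above_120 = 1 - standard_normal.cdf(z_score(120))
p_below_128 = standard_normal.cdf(z_score(128))
p_below_93 = standard_normal.cdf(z_score(93))
p_between = standard_normal.cdf(z_score(105)) - standard_normal.cdf(z_score(98))

results = [
    ("above 120", p_above_120),
    ("below 128", p_below_128),
    ("below 93", p_below_93),
    ("between 98 and 105", p_between),
]
for label, p in results:
    print(f"P({label}) = {p:.4f}")
```

Without the rounding step, `NormalDist(100, 15).cdf(x)` gives slightly different values in the third decimal place, since a printed table only resolves z to two decimals.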
SIMPLE LINEAR REGRESSION ANALYSIS

Regression determines whether the independent variable x and the dependent variable y show a positive or negative relationship.
The variable x is used to explain or predict the value of the dependent variable; thus it is called the explanatory variable, predictor variable, or regressor.
The variable that is being explained or predicted is symbolized as y and is called the explained variable, predicted variable, or regressand.
Linear regression models a straight-line relationship between x and y.
It is represented by the linear equation y = a + bx,
where a: y-intercept of the line (regression constant)
b: slope of the line (regression coefficient)
Regression analysis aims to establish a line, called the regression line, that summarizes the stochastic relationship between x and y.
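Direct and Indirect Relationships
A positive slope (b > 0) indicates a direct relationship: y tends to increase as x increases. A negative slope (b < 0) indicates an indirect (inverse) relationship: y tends to decrease as x increases.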
The Method of Least Squares is a more precise method of finding the regression line; it minimizes the sum of the squared errors.
\[ \hat{y} = a + bx \]
\[ b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left( \sum x \right)^2}, \qquad a = \frac{\sum y - b \sum x}{n} \]
where n is the number of pairs.
Example: The manager of an art gallery wants to determine the relationship between the auction price of paintings, y, and the number of bidders, x. From the data,
a. Determine the regression model.
b. Find the estimated price of a painting if there are 20 bidders.
c. Find the estimated number of bidders if the price is P13k.
d. Find the correlation coefficient r and the coefficient of determination r².
y (in thousand of pesos)    12    8.5   9.6   11    7.3   6.9   10.5   9
x (bidders)                 9     12    14    16    17    15    10     13
a. Determine the regression model
\[ \hat{y} = a + bx = 14.3874 - 0.3802x \]
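The coefficients in part (a) can be verified by substituting the column sums of the data into the least-squares formulas. The sums below were computed from the eight (x, y) pairs in the table; a is evaluated with the unrounded slope, which is why the slope appears as a fraction in the last line.

```latex
\[ n = 8, \quad \sum x = 106, \quad \sum y = 74.8, \quad \sum xy = 970, \quad \sum x^2 = 1460 \]
\[ b = \frac{8(970) - (106)(74.8)}{8(1460) - (106)^2} = \frac{7760 - 7928.8}{11680 - 11236} = \frac{-168.8}{444} \approx -0.3802 \]
\[ a = \frac{74.8 - \left( \frac{-168.8}{444} \right)(106)}{8} \approx \frac{115.0991}{8} \approx 14.3874 \]
```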
b. Find the estimated price of a painting if there are 20 bidders
\[ \hat{y} = a + bx = 14.3874 - 0.3802(20) = 6.7834 \]
Since y is in thousands of pesos, the estimated price is about PHP 6,783.
c. Find the estimated number of bidders if the price is P13k.
\[ \hat{y} = a + bx \]
\[ 13 = 14.3874 - 0.3802x \]
\[ 0.3802x = 14.3874 - 13 \]
\[ x = \frac{14.3874 - 13}{0.3802} = 3.649 \approx 4 \text{ bidders} \]
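d. Correlation coefficient and coefficient of determination
\[ r = -0.6014, \qquad r^2 = 0.3617 = 36.17\% \]
The negative r confirms an indirect relationship, and about 36.17% of the variation in price is explained by the number of bidders.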
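The whole example can be reproduced with a short Python sketch that applies the least-squares formulas directly to the table. The formula used for r is the standard Pearson correlation coefficient, which this module does not state explicitly, so treat that line as an assumption about how r was obtained. Variable names are illustrative.

```python
import math

# Data from the art gallery example: x = number of bidders, y = price in thousand pesos.
x = [9, 12, 14, 16, 17, 15, 10, 13]
y = [12, 8.5, 9.6, 11, 7.3, 6.9, 10.5, 9]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# Least-squares slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Pearson correlation coefficient (assumed formula) and coefficient of determination.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r_squared = r ** 2

price_for_20_bidders = a + b * 20      # part b, in thousand pesos
bidders_for_price_13 = (13 - a) / b    # part c

print(f"y_hat = {a:.4f} + ({b:.4f})x")
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
print(f"price at 20 bidders: {price_for_20_bidders:.4f} thousand pesos")
print(f"bidders at price 13: {bidders_for_price_13:.3f}")
```

Because the script keeps the unrounded coefficients, its results for parts (b) and (c) can differ slightly in the last digits from hand computations based on rounded values.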
V. SUMMARY
A measure of variability is a summary statistic that represents the amount of dispersion in a dataset. In statistics, variability, dispersion, and spread are synonyms that denote the width of the distribution. The range is one of the most basic measures of variation. It is the difference between the smallest data item in the set and the largest. Quartiles divide your data into quarters: the lowest 25%, the next lowest 25%, the second highest 25% and the highest 25%. The interquartile range is one of the most popular measures of variation used in statistics. It measures the spread of the middle 50% of the data. The basic formula is: IQR = Q3 – Q1. Variance tells you how far a data set is spread out about the mean, but because it is expressed in squared units it is mostly used as a step toward calculating the standard deviation.
Normal Distribution is a continuous distribution which is regarded by many as the most
significant probability distribution in the entire theory of statistics, particularly in the field of
statistical inference.
Regression determines if the independent variable 𝑥 and the dependent variable 𝑦 show a
positive or negative relationship.
