
Statistical Methods, Dr. Asmaa El-Toony

Chapter 3
Data Description (Summarizing)

(A) Measures of Central Tendency

The Statistic and The Parameter

A Statistic: It is a descriptive measure computed from the data of a sample.

A Parameter: It is a descriptive measure computed from the data of a population.

Since it is difficult to measure a parameter from the whole population, a sample of size n is drawn, whose values are $x_1, x_2, \ldots, x_n$. From these data we compute the statistic.

The Mean

For ungrouped data:


The Population Mean: If $x_1, x_2, \ldots, x_N$ are the population values, then the population mean is

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N},$$

which is usually unknown, so we use the sample mean to estimate (approximate) it.

The Sample Mean: The mean (average) of a set of sample values $x_1, x_2, \ldots, x_n$ is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.$$

The sample mean $\bar{x}$ is a statistic (it is known) and is used to approximate (estimate) the population mean $\mu$.

Example 2.9: Here is a random sample of size 10 of ages:

$x_1 = 42$, $x_2 = 28$, $x_3 = 28$, $x_4 = 61$, $x_5 = 31$, $x_6 = 23$, $x_7 = 50$, $x_8 = 34$, $x_9 = 32$, $x_{10} = 37$.

$$\bar{x} = \frac{42 + 28 + \cdots + 37}{10} = \frac{366}{10} = 36.6$$
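The calculation in Example 2.9 can be checked with a minimal Python sketch. The function name `sample_mean` is an illustrative choice and does not come from the notes.

```python
# Ages from Example 2.9 (random sample of size 10)
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]


def sample_mean(values):
    """Return the arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


x_bar = sample_mean(ages)
print(x_bar)
```

Every value enters the sum, which is why a single extreme age would shift the result.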

Properties of the Mean:


1. Uniqueness. For a given set of data there is one and only one mean.
2. Simplicity. It is easy to understand and to compute.
3. Sensitivity to extreme values. Since every value enters the computation, extreme values affect the mean.
4. The mean can only be found for quantitative variables.

For grouped data:


A             B             C             D
Class         Frequency f   Midpoint Xm   f · Xm
5.5–10.5      1
10.5–15.5     2
15.5–20.5     3
20.5–25.5     5
25.5–30.5     4
30.5–35.5     3
35.5–40.5     2
              n = 20

Find the midpoint of each class and enter it in column C. For the first class,

$$X_m = \frac{5.5 + 10.5}{2} = 8.$$

Then multiply each frequency by its midpoint and enter the product in column D.

A             B             C             D
Class         Frequency f   Midpoint Xm   f · Xm
5.5–10.5      1             8             8
10.5–15.5     2             13            26
15.5–20.5     3             18            54
20.5–25.5     5             23            115
25.5–30.5     4             28            112
30.5–35.5     3             33            99
35.5–40.5     2             38            76
              n = 20                      ∑ f · Xm = 490

$$\bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{490}{20} = 24.5$$
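The grouped-data mean can be sketched in Python as follows. The data are the class boundaries and frequencies from the table; the names `classes` and `grouped_mean` are illustrative assumptions.

```python
# (lower boundary, upper boundary, frequency) for each class in the table
classes = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]


def grouped_mean(rows):
    """Approximate the mean by treating every value as its class midpoint."""
    n = sum(f for _, _, f in rows)                               # total frequency
    sum_f_xm = sum(f * (lo + hi) / 2 for lo, hi, f in rows)      # sum of f * Xm
    return sum_f_xm / n


print(grouped_mean(classes))
```

The result is an approximation of the mean of the raw data, because the individual values inside each class are replaced by the midpoint.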

The Median (M):


When the data are ordered, the median is the observation that divides the set of observations into two equal parts, such that half of the data lie before it and the other half lie after it.

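Let $x_1, x_2, \ldots, x_n$ be the sample values, and let $X_1 \le X_2 \le \cdots \le X_n$ denote the same values arranged in increasing order. Then:

If n is odd → the median is the middle observation, that is, the $\frac{n+1}{2}$th ordered observation:

$$\text{Median} = X_{\frac{n+1}{2}}$$

When n = 11, the median is the observation $X_6$.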



If n is even → there are two middle observations, and the median is their mean, that is, the mean of the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th ordered observations:

$$\text{Median} = \frac{X_{\frac{n}{2}} + X_{\frac{n}{2}+1}}{2}$$

When n = 12, the median is the "6.5th" observation, that is, the value halfway between the 6th and 7th ordered observations.

Example 2.11: For the same random sample of ages, the ordered observations are: 23, 28, 28, 31, 32, 34, 37, 42, 50, 61.

Solution:
Since n = 10, the median is the 5.5th observation, i.e. the mean of the 5th and 6th ordered observations:
M = (32 + 34)/2 = 33.

Example 2.12: Find the median for the sample values:
10, 54, 21, 38, 53.
Solution:
Since n = 5 (an odd number) and the ordered set is 10, 21, 38, 53, 54, the median is the (5 + 1)/2 = 3rd ordered observation:
Median = $X_3$ = 38
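The odd and even cases of the median rule can be sketched in Python as below. The function name `median` is an illustrative choice. Note that Python lists are indexed from 0, so the k-th ordered observation is `ordered[k - 1]`.

```python
def median(values):
    """Return the median using the odd/even rule on the ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the (n+1)/2-th ordered observation, i.e. index mid (0-based)
        return ordered[mid]
    # n even: mean of the (n/2)-th and (n/2 + 1)-th ordered observations
    return (ordered[mid - 1] + ordered[mid]) / 2


ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]   # Example 2.11, n = 10
values = [10, 54, 21, 38, 53]                     # Example 2.12, n = 5

print(median(ages))
print(median(values))
```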

Properties of the Median:


1. Uniqueness. For a given set of data there is one and only one median.
2. Simplicity. It is easy to calculate.
3. Unlike the mean, it is not affected by extreme values.
4. The median can only be found for quantitative variables.

The Mode:
The mode of a set of values is the value that occurs with the highest frequency.
If all values are different there is no mode, and sometimes there is more than one mode.

Example 2.13: Find the mode of the signing bonuses of eight NFL players for a
specific year. The bonuses in millions of dollars are:
18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10

Solution:
Since $10 million occurred 3 times, a frequency larger than that of any other value, the mode is $10 million.

Example 2.14: Find the mode for the number of branches that six banks have:
401, 344, 209, 201, 227, 353

Solution:
Since each value occurs only once, there is no mode.

Note: Do not say that the mode is zero. That would be incorrect, because in some
data, such as temperature, zero can be an actual value.

Example 2.15:
The data show the number of licensed nuclear reactors in the United States for a
recent 15-year period. Find the mode.

104 104 104 104 104
110 109 109 109 107
109 111 112 111 109

Solution:
Since the values 104 and 109 both occur 5 times, the modes are 104 and 109. The
data set is said to be bimodal.
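The three situations in Examples 2.13 to 2.15 (one mode, no mode, two modes) can be sketched in Python. The function name `modes` and the choice to return an empty list for "no mode" are assumptions made for illustration; they follow the convention of these notes that a data set in which every value occurs once has no mode.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of modes; an empty list means there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs exactly once: no mode (do not report zero).
        return []
    return sorted(v for v, c in counts.items() if c == highest)


bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]        # Example 2.13
branches = [401, 344, 209, 201, 227, 353]                    # Example 2.14
reactors = [104, 104, 104, 104, 104,
            110, 109, 109, 109, 107,
            109, 111, 112, 111, 109]                         # Example 2.15

print(modes(bonuses))
print(modes(branches))
print(modes(reactors))
```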

For grouped data:


The mode for grouped data is the modal class. The modal class is the class with the
largest frequency.

Example 2.16:

The modal class is 20.5–25.5, since it has the largest frequency. Sometimes
the midpoint of the class is used rather than the boundaries; hence, the
mode could also be given as 23 miles per week.

Properties of the Mode:


1. The mode is simple to find, but it is a crude measure: it may not exist, and it may not be unique.
2. The mode is not much affected by extreme values.
3. It may be used for describing qualitative data.

(B) Measures of Dispersion (Variation)


A measure of dispersion conveys information regarding the amount of variability
present in a set of data. The variation or dispersion in a set of values refers to how
spread out the values are from each other.

Note:
1. If all the values are the same, there is no dispersion.
2. If the values are not all the same, there is dispersion.
3. If the values are close to each other, the amount of dispersion is small.
4. If the values are widely scattered, the dispersion is greater.

Some measures of dispersion:



1. Range:
Range = largest value − smallest value
Note: The range is a poor measure of variation, since it considers only two of the values (the largest and the smallest).

Example 2.17: 43, 66, 61, 64, 65, 38, 59, 57, 57, 50.

Range = 66 − 38 = 28
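As a quick check of Example 2.17, a minimal Python sketch (the name `value_range` is an illustrative choice):

```python
data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]   # Example 2.17


def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


print(value_range(data))
```

Only `max(values)` and `min(values)` are used, so the other eight observations have no influence on the result.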

2. The Variance:

Population Variance and Standard Deviation

Example 2.18: A testing lab wishes to test two experimental brands of outdoor
paint to see how long each will last before fading. The testing lab makes 6
gallons of each paint to test. Since different chemical agents are added to each
group and only six cans are involved, these two groups constitute two small
populations. The results (in months) are shown. Find the variance and
standard deviation for the data set for brand A and B.

Solution:
For Brand A:

Sample Variance and Standard Deviation


The sample variance for a sample of n measurements is equal to the sum of the squared deviations from the mean divided by (n − 1). In symbols, using $S^2$ to represent the sample variance,

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

Note: A shortcut formula for calculating $S^2$ is

$$S^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$$

DEFINITION
The sample standard deviation, S, is defined as the positive square root of the sample variance $S^2$. Thus, $S = \sqrt{S^2}$.

The population variance, denoted by the symbol $\sigma^2$ (sigma squared), is the average of the squared distances of the measurements on all units in the population from the mean $\mu$, and $\sigma$ (sigma) is the square root of this quantity.

Example 2.19: We want to compute the sample variance of the following sample values:
3, 2, 5, 1, 9.

Solution:
n = 5 and $\bar{x} = 4$, so

$$S^2 = \frac{\sum_{i=1}^{5}(x_i - \bar{x})^2}{4} = \frac{(3-4)^2 + (2-4)^2 + (5-4)^2 + (1-4)^2 + (9-4)^2}{4} = \frac{40}{4} = 10 \ (\text{unit})^2$$

$x_i$     $x_i - \bar{x}$     $(x_i - \bar{x})^2$
3         −1                  1
2         −2                  4
5         1                   1
1         −3                  9
9         5                   25
Sum       0                   40

$$S^2 = \frac{40}{4} = 10$$
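The definition of the sample variance (sum of squared deviations from the mean, divided by n − 1) can be sketched in Python as below. The function name `sample_variance` is illustrative; the standard-library function `statistics.variance` also computes the sample variance and is used here only as a cross-check.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


sample = [3, 2, 5, 1, 9]            # Example 2.19
s2 = sample_variance(sample)

print(s2)
print(statistics.variance(sample))  # same quantity from the standard library
```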

Example 2.20: Calculate the variance and standard deviation of the following
sample:
2, 3, 3, 3, 4

Solution:
These can easily be obtained from the following tabulation:

$x$       2    3    3    3    4     $\sum x = 15$
$x^2$     4    9    9    9    16    $\sum x^2 = 47$

Then we use

$$S^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{47 - \frac{15^2}{5}}{5-1} = \frac{2}{4} = 0.5,$$

so $S = \sqrt{0.5} \approx 0.71$.
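The shortcut formula needs only n, the sum of the values and the sum of their squares. A minimal Python sketch for Example 2.20 follows; the function name `sample_variance_shortcut` is an illustrative assumption.

```python
import math


def sample_variance_shortcut(values):
    """Shortcut formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


sample = [2, 3, 3, 3, 4]                 # Example 2.20
s2 = sample_variance_shortcut(sample)
s = math.sqrt(s2)                        # sample standard deviation

print(s2, round(s, 2))
```

The shortcut and the definition are algebraically the same quantity. With floating-point arithmetic on values that are very large relative to their spread, the definitional form tends to be numerically safer.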

$S^2$ is a statistic because it is obtained from the sample values (it is known), and it is used to approximate (estimate) $\sigma^2$.

Calculating Variance and Standard Deviation from Grouped Data:

We assume that all values in a particular class interval are located at the midpoint of that interval. For calculating $\bar{x}$ and $S^2$ we need n, $\sum x_i f_i$ and $\sum x_i^2 f_i$.

Class Interval   Midpoint $x_i$   Freq. $f_i$       $x_i f_i$        $x_i^2 f_i$
1st class        $x_1$            $f_1$             $x_1 f_1$        $x_1^2 f_1$
2nd class        $x_2$            $f_2$             $x_2 f_2$        $x_2^2 f_2$
...              ...              ...               ...              ...
kth class        $x_k$            $f_k$             $x_k f_k$        $x_k^2 f_k$
Total                             $n = \sum f_i$    $\sum x_i f_i$   $\sum x_i^2 f_i$

Therefore, the approximations of $\bar{x}$ and $S^2$ are:

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$

and

$$S^2 = \frac{\sum x_i^2 f_i - \left(\sum x_i f_i\right)^2 / n}{\sum f_i - 1} \quad \text{or} \quad S^2 = \frac{\sum x_i^2 f_i - n\bar{x}^2}{\sum f_i - 1}$$

Example 2.21:

Classes   $f$    $x$    $x_i f_i$   $x_i^2 f_i$
15-19     8      17     136         2312
20-24     16     22     352         7744
25-29     32     27     864         23328
30-34     28     32     896         28672
35-39     12     37     444         16428
40-44     4      42     168         7056
Total     100           2860        85540

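$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{2860}{100} = 28.6 \text{ years}$$

$$S^2 = \frac{\sum x_i^2 f_i - \left(\sum x_i f_i\right)^2 / n}{\sum f_i - 1} = \frac{85540 - (2860)^2/100}{99} = \frac{3744}{99} \approx 37.8 \ (\text{years})^2$$

$$S = \sqrt{S^2} \approx 6.15 \text{ years}$$

$$C.V. = \frac{S}{\bar{x}} \times 100\% = \frac{6.15}{28.6} \times 100\% \approx 21.5\%$$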
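The grouped-data computations of Example 2.21 can be reproduced with the following Python sketch. The names `table` and `grouped_summary` are illustrative assumptions; the midpoints and frequencies are taken from the example's table.

```python
import math

# (midpoint x_i, frequency f_i) for each class in Example 2.21
table = [(17, 8), (22, 16), (27, 32), (32, 28), (37, 12), (42, 4)]


def grouped_summary(rows):
    """Return (mean, variance, standard deviation, C.V. in %) for grouped data."""
    n = sum(f for _, f in rows)                  # total frequency
    sum_xf = sum(x * f for x, f in rows)         # sum of x_i * f_i
    sum_x2f = sum(x * x * f for x, f in rows)    # sum of x_i^2 * f_i
    mean = sum_xf / n
    variance = (sum_x2f - sum_xf ** 2 / n) / (n - 1)
    sd = math.sqrt(variance)
    cv = sd / mean * 100
    return mean, variance, sd, cv


mean, variance, sd, cv = grouped_summary(table)
print(round(mean, 1), round(variance, 1), round(sd, 2), round(cv, 1))
```

Keeping the standard deviation unrounded until the last step gives a coefficient of variation of about 21.5%. Rounding S to one decimal place first would give a slightly smaller figure.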

INTERPRETING THE STANDARD DEVIATION


We’ve seen that if we are comparing the variability of two samples selected from
a population, the sample with the larger standard deviation is the more variable
of the two. Thus, we know how to interpret the standard deviation on a relative or
comparative basis, but we haven’t explained how it provides a measure of
variability for a single sample.

The Coefficient of Variation (C.V):


The variance and the standard deviation are useful as measures of variation of the
values of a single variable for a single population (or sample).

If we want to compare the variation of two variables we cannot use the variance or
the standard deviation because:
1. The variables might have different units.
2. The variables might have different means.

We need a measure of the relative variation that will not depend on either the
units or on how large the values are. This measure is the coefficient of variation
(C.V.) which is defined by:
$$C.V. = \frac{S}{\bar{x}} \times 100\%$$

where S is the sample standard deviation and $\bar{x}$ is the sample mean.

Example 2.22: Suppose two samples of human males yield the following data:

                     Sample 1        Sample 2
Age                  25-year-olds    11-year-olds
Mean weight          145 pounds      80 pounds
Standard deviation   10 pounds       10 pounds

We wish to know which sample is more variable.

Solution:
C.V. (Sample 1) = (10/145) × 100% = 6.9%
C.V. (Sample 2) = (10/80) × 100% = 12.5%
Hence the weights of the 11-year-olds (Sample 2) show more relative variation than those of the 25-year-olds.
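The comparison in Example 2.22 can be sketched in Python. The function name `coefficient_of_variation` is an illustrative choice; it takes the standard deviation and mean as given in the example's table.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (standard deviation / mean) * 100, expressed in percent."""
    return sd / mean * 100


cv_sample1 = coefficient_of_variation(10, 145)   # 25-year-olds
cv_sample2 = coefficient_of_variation(10, 80)    # 11-year-olds

print(round(cv_sample1, 1), round(cv_sample2, 1))
print("Sample 2 is more variable" if cv_sample2 > cv_sample1
      else "Sample 1 is more variable")
```

Both samples have the same standard deviation (10 pounds), so the difference in relative variation comes entirely from the difference in means.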
