EPS - Chapter - 1 - Descriptive Statistics - JNN - OK

Chapter 1
Descriptive Statistics
1.1 Measures of Central Tendency

Measures of central tendency are measures of the location of the middle, or center, of a distribution. Equivalently, a measure of central tendency is a sample value (a statistic) around which the distribution is centered.
The most common measures of central tendency are
1.) Mean
2.) Mode
3.) Median
1.1.1 The Mean

There are three categories of mean, namely the arithmetic mean, the geometric mean and the harmonic mean.
1.1.1.1 The Arithmetic Mean

The arithmetic mean is the most commonly used measure of central tendency, and the most widely used of the three categories of mean.
Arithmetic Mean for un-grouped data
If the observations are in raw form, the mean is computed by summing the observations and dividing the sum by the number of observations. That is,
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{\text{sum of all observations}}{\text{total number of observations}} \tag{1.1}$$
1.1. MEASURES OF CENTRAL
Example 1.1.1 Given the observations 2, 5, 50, 100, 200, 0 and 10, find their mean.
The mean is given by
$$\bar{X} = \frac{2 + 5 + 50 + 100 + 200 + 0 + 10}{7} = \frac{367}{7} \approx 52.43$$
Arithmetic mean for un-grouped data using the working (assumed) mean method. Here the mean is given by
$$\bar{X} = A + \frac{\sum_{i=1}^{n} d_i}{n} \tag{1.2}$$
where $d_i = X_i - A$, with $A$ the assumed mean.
Example 1.1.2 Let the observations in a given data set be 2, 3, 5 and 10. Using an assumed mean of 5, find their arithmetic mean.
A table for the assumed mean A = 5
Xi    di = Xi − 5
2     −3
3     −2
5     0
10    5
Total: $\sum d_i = 0$
Thus, from equation (1.2),
$$\bar{X} = A + \frac{\sum_{i=1}^{n} d_i}{n} = 5 + \frac{0}{4} = 5$$
Note 1.1.1 The working (assumed) mean may be chosen arbitrarily from within the range of the observations, that is, without special consideration. A convenient choice simplifies the computations.
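The two routes to the arithmetic mean can be checked against each other with a short Python sketch. The helper names `direct_mean` and `assumed_mean` are illustrative choices of our own and do not come from any library. The data are those of Examples 1.1.1 and 1.1.2.

```python
def direct_mean(xs):
    """Arithmetic mean, equation (1.1): sum of observations / number of observations."""
    return sum(xs) / len(xs)


def assumed_mean(xs, a):
    """Working (assumed) mean method, equation (1.2): A + (sum of d_i) / n, with d_i = X_i - A."""
    deviations = [x - a for x in xs]
    return a + sum(deviations) / len(xs)


example_1_1_1 = [2, 5, 50, 100, 200, 0, 10]
example_1_1_2 = [2, 3, 5, 10]

print(round(direct_mean(example_1_1_1), 2))  # 52.43
print(assumed_mean(example_1_1_2, 5))        # 5.0
print(assumed_mean(example_1_1_2, 3))        # 5.0: a different A gives the same mean
```

Changing `a` changes only the size of the deviations, never the result. This is why the working mean can be chosen freely for convenience.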
JNN & ELEMENTS OF PROBABILITY & STATISTICS Page 2 of 219

Arithmetic Mean for grouped data
For grouped data, the arithmetic mean is given by
$$\mu = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} \tag{1.3}$$
where $f_i$ is the frequency of the $i$th class, $X_i$ is the class mark of the $i$th class and $n$ is the number of classes.
Alternatively, we can compute the grouped mean using an assumed mean. The steps involved are:
1.) Choose the assumed mean A, preferably a value equal to or near the class mark of the class with the highest frequency.
2.) Compute the arithmetic mean using the formula
$$\mu = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}, \qquad d_i = X_i - A \tag{1.4}$$

Example 1.1.3 The frequency table below shows the marks obtained by students in a certain examination.
Class Frequency
60 - 62 5
63 - 65 18
66 - 68 42
69 - 71 27
72 - 74 8
Using the information above, find the arithmetic mean with and without using the working
mean.
Class Class Mark(Xi) Frequency(fi) fiXi di = Xi − 67 fidi
60 − 62 61 5 305 −6 −30
63 − 65 64 18 1152 −3 −54
66 − 68 67 42 2814 0 0
69 − 71 70 27 1890 3 81
72 − 74 73 8 584 6 48
Totals: $\sum_{i=1}^{5} f_i = 100$, $\sum_{i=1}^{5} f_i X_i = 6745$, $\sum_{i=1}^{5} f_i d_i = 45$
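Without the working mean,
$$\mu = \frac{\sum_{i=1}^{5} f_i X_i}{\sum_{i=1}^{5} f_i} = \frac{6745}{100} = 67.45$$
With the working mean $A = 67$,
$$\mu = A + \frac{\sum_{i=1}^{5} f_i d_i}{\sum_{i=1}^{5} f_i} = 67 + \frac{45}{100} = 67.45$$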
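Here is a minimal Python sketch of equations (1.3) and (1.4), applied to the class marks and frequencies of Example 1.1.3. The function names are hypothetical helpers of our own. Each function takes parallel lists of class marks and frequencies.

```python
def grouped_mean(marks, freqs):
    """Grouped mean, equation (1.3): sum(f_i * X_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(marks, freqs)) / sum(freqs)


def grouped_assumed_mean(marks, freqs, a):
    """Working-mean form, equation (1.4): A + sum(f_i * d_i) / sum(f_i), with d_i = X_i - A."""
    return a + sum(f * (x - a) for x, f in zip(marks, freqs)) / sum(freqs)


# Class marks and frequencies from Example 1.1.3
marks = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]

print(grouped_mean(marks, freqs))
print(grouped_assumed_mean(marks, freqs, 67))
```

Both calls give 67.45, up to floating-point rounding, in agreement with the hand computation.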

1.1.1.2 Harmonic Mean

Another measure of central tendency which is only occasionally used is the harmonic mean. It
is most frequently employed for averaging speeds where the distances for each section of the
journey are equal.
For $n$ observations $X_i$, $i = 1, 2, \ldots, n$, the harmonic mean is defined as
$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} \tag{1.5}$$
Example 1.1.4 Given the observations 2, 3, 3, 5. Find their Harmonic mean.

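$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} = \frac{4}{\frac{1}{2} + \frac{1}{3} + \frac{1}{3} + \frac{1}{5}} = \frac{4}{41/30} = \frac{120}{41} \approx 2.93$$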
1.1.1.3 Geometric Mean

The geometric mean is seldom used outside specialist applications. It is appropriate for data that show exponential growth. It can sometimes be difficult to decide whether the geometric mean or the arithmetic mean is the better choice.
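Definition 1.1.1 Given any $n$ observations $X_i$, $i = 1, 2, \ldots, n$, the geometric mean is defined as the $n$th root of their product:
$$G = \sqrt[n]{X_1 X_2 \cdots X_n} = \operatorname{Antilog}\left(\frac{\sum_{i=1}^{n} \log X_i}{n}\right) \tag{1.6}$$
The second form follows by taking logarithms: $\log G = \frac{1}{n}\sum_{i=1}^{n} \log X_i$. A proof is given in Example ?? (page ??).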
Example 1.1.5 Compute the geometric mean of the data in Example 1.1.4 above.
$$G = \sqrt[4]{3 \times 5 \times 3 \times 2} = \sqrt[4]{90} \approx 3.08$$
Example 1.1.6 In 1980 the population of a town was 300,000. In 1990 a new census revealed that it had risen to 410,000. Estimate the population in 1985.
Solution : If we assume that there was no net immigration or emigration, the growth of the population depends on its size (exponential growth), so the geometric mean is appropriate:
$$G = \sqrt{300{,}000 \times 410{,}000} \approx 350{,}714$$
■
Example 1.1.7 An aeroplane travels a distance of 900 miles. If it covers the first third and
the last third of the trip at a speed of 250 mph and the middle third at a speed of 300 mph,
find the average (harmonic) speed.
Solution :
$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} = \frac{3}{\frac{1}{250} + \frac{1}{300} + \frac{1}{250}} = \frac{3}{0.01133} \approx 264.7 \text{ mph}$$
■
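Equations (1.5) and (1.6) can be checked numerically with a small Python sketch on the data of Examples 1.1.4 to 1.1.7. The helpers `harmonic_mean` and `geometric_mean` are our own illustrative versions. The geometric mean goes through base-10 logarithms so that it mirrors the "Antilog" form. As a cross-check, the sketch assumes Python 3.8 or later, where the standard `statistics` module provides `harmonic_mean` and `geometric_mean`.

```python
import math
import statistics


def harmonic_mean(xs):
    """Equation (1.5): n divided by the sum of the reciprocals."""
    return len(xs) / sum(1 / x for x in xs)


def geometric_mean(xs):
    """Equation (1.6): antilog (base 10) of the mean of the base-10 logarithms."""
    return 10 ** (sum(math.log10(x) for x in xs) / len(xs))


data = [2, 3, 3, 5]                        # Examples 1.1.4 and 1.1.5
print(harmonic_mean(data))                 # about 2.9268 (= 120/41)
print(geometric_mean(data))                # about 3.0801 (= 90 ** (1/4))

# Cross-check with the standard library (Python 3.8+)
print(statistics.harmonic_mean(data), statistics.geometric_mean(data))

print(geometric_mean([300_000, 410_000]))  # Example 1.1.6: about 350,714
print(harmonic_mean([250, 300, 250]))      # Example 1.1.7: about 264.7 mph
```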

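Definition 1.1.2 The harmonic mean of frequency (grouped) data is given by
$$H = \frac{n}{\sum_{i} \frac{f_i}{X_i}} \tag{1.7}$$
Definition 1.1.3 The geometric mean of frequency (grouped) data is given by
$$G = \operatorname{Antilog}\left(\frac{\sum_{i} f_i \log X_i}{n}\right) \tag{1.8}$$
In both definitions $X_i$ is the class mark of the $i$th class, $f_i$ is its frequency and $n = \sum_i f_i$ is the total frequency.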
Example 1.1.8 Calculate the geometric mean and the harmonic mean of the following grouped data.
Class    Class mark (X)   f    f log X   f/X
2 - 4    3                3    1.4314    1
4 - 6    5                4    2.7959    0.8
6 - 8    7                2    1.6902    0.2857
8 - 10   9                1    0.9542    0.1111
Totals                    Σf = 10   Σf log X = 6.8717   Σf/X = 2.1968
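Solution :
$$G = \operatorname{Antilog}\left(\frac{\sum f_i \log X_i}{n}\right) = \operatorname{Antilog}\left(\frac{6.8717}{10}\right) = \operatorname{Antilog}(0.6872) = 4.866$$
$$H = \frac{n}{\sum \frac{f_i}{X_i}} = \frac{10}{2.1968} = 4.552$$
■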
Example 1.1.9 Calculate the geometric mean and the harmonic mean of the following grouped data.
Class 0-4 5-9 10 - 14 15 - 19 20 - 24 25 - 29
Frequency 2 10 7 5 3 8
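Solution : The class marks are 2, 7, 12, 17, 22 and 27, and $n = \sum f_i = 35$. Then $\sum f_i \log X_i = 38.2377$ and $\sum f_i / X_i = 3.7387$, so
$$G = \operatorname{Antilog}\left(\frac{\sum f_i \log X_i}{n}\right) = \operatorname{Antilog}\left(\frac{38.2377}{35}\right) = \operatorname{Antilog}(1.0925) = 12.3739$$
$$H = \frac{n}{\sum \frac{f_i}{X_i}} = \frac{35}{3.7387} = 9.3616$$
■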
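Equations (1.7) and (1.8) can be sketched in Python as below and applied to Examples 1.1.8 and 1.1.9. The function names are hypothetical. For Example 1.1.9 the sketch assumes the class marks are the class midpoints 2, 7, ..., 27.

```python
import math


def grouped_geometric_mean(marks, freqs):
    """Equation (1.8): Antilog(sum(f_i * log X_i) / n), with n = sum(f_i)."""
    n = sum(freqs)
    return 10 ** (sum(f * math.log10(x) for x, f in zip(marks, freqs)) / n)


def grouped_harmonic_mean(marks, freqs):
    """Equation (1.7): n / sum(f_i / X_i), with n = sum(f_i)."""
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(marks, freqs))


# Example 1.1.8: class marks of 2-4, 4-6, 6-8, 8-10
g1 = grouped_geometric_mean([3, 5, 7, 9], [3, 4, 2, 1])
h1 = grouped_harmonic_mean([3, 5, 7, 9], [3, 4, 2, 1])

# Example 1.1.9: midpoints of 0-4, 5-9, ..., 25-29
marks = [2, 7, 12, 17, 22, 27]
freqs = [2, 10, 7, 5, 3, 8]
g2 = grouped_geometric_mean(marks, freqs)
h2 = grouped_harmonic_mean(marks, freqs)

print(round(g1, 3), round(h1, 3))  # 4.866 4.552
print(round(g2, 3), round(h2, 3))
```

The second line prints values of about 12.374 and 9.362, in agreement with the worked solution.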

1.1.1.4 Properties of the Mean

1.) It can be calculated for every numerical data set, that is, it always exists.
2.) A set of numerical data has only one mean, that is, the mean is unique.
3.) It takes into account all observations in the data set.
4.) The sum of the deviations of a set of observations from their mean is 0, i.e.
$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$
5.) The sum of squared deviations from the mean is a minimum, i.e. it is smaller than the sum of squared deviations from any other value $x^\star$:
$$\sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - x^\star)^2,$$
where $x^\star \neq \bar{X}$ is any other value and $\bar{X}$ is the mean of the observations.
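Properties 4 and 5 can be checked exactly with Python's `fractions` module, which avoids floating-point residue. The sketch below uses the data of Example 1.1.1. The candidate centres in the loop are arbitrary choices for illustration. The helper `sum_sq_dev` is our own.

```python
from fractions import Fraction

data = [2, 5, 50, 100, 200, 0, 10]      # observations from Example 1.1.1
mean = Fraction(sum(data), len(data))    # exact mean, 367/7


def sum_sq_dev(xs, c):
    """Sum of squared deviations of the observations xs from the point c."""
    return sum((x - c) ** 2 for x in xs)


# Property 4: the deviations from the mean sum to exactly zero
print(sum(x - mean for x in data))       # 0

# Property 5: every other centre gives a larger sum of squares
for other in [Fraction(0), Fraction(50), mean + Fraction(1, 10), Fraction(200)]:
    print(other, sum_sq_dev(data, other) > sum_sq_dev(data, mean))  # prints True each time
```

Algebraically, the excess is $n(x^\star - \bar{X})^2$, which is positive whenever $x^\star \neq \bar{X}$.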
1.1.2 The Median

1.1.2.1 Median for un-grouped data
The median is a statistical measure that divides a data set into two equal halves. For un-grouped data, first arrange the items in ascending or descending order. The median is then the observation that falls in the middle when the number of observations is odd, and the average of the two middle observations when the number is even.
Example 1.1.10 Find the median for the data set,
1.) 1, 2, 8, 9, 4, 7, 6.
2.) 1, 2, 8, 10, 9, 4, 7, 6.
Solution : First arrange the data in ascending order.
1.) 1, 2, 4, 6, 7, 8, 9, so the median is 6.
2.) 1, 2, 4, 6, 7, 8, 9, 10, so the median is $\frac{6 + 7}{2} = 6.5$.
In general, for a set of $n$ ordered observations, the median is the $\left(\frac{n+1}{2}\right)$th term when $n$ is odd. When $n$ is even, it is the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n+2}{2}\right)$th terms.
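The odd/even rule above translates directly into a short Python sketch. `median` here is our own helper, with the 1-based positions converted to 0-based list indices. The standard `statistics.median` is used only as a cross-check.

```python
import statistics


def median(xs):
    """Median by the rule above: the ((n+1)/2)th ordered value for odd n,
    otherwise the average of the (n/2)th and ((n+2)/2)th ordered values."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]                    # 1-based position -> 0-based index
    return (s[n // 2 - 1] + s[(n + 2) // 2 - 1]) / 2


print(median([1, 2, 8, 9, 4, 7, 6]))                  # 6
print(median([1, 2, 8, 10, 9, 4, 7, 6]))              # 6.5
print(statistics.median([1, 2, 8, 10, 9, 4, 7, 6]))   # 6.5
```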
1.1.2.2 Median for grouped data

For grouped data the median can be estimated by
1. an ogive
2. linear interpolation

1.1.2.3 Median by Linear Interpolation method

The following steps are taken;
1.) Compute the cumulative frequencies.
2.) Divide the total frequency N by 2. Use the cumulative frequency column to locate the median class, which is the first class whose cumulative frequency is at least N/2.
3.) Compute the median using the formula
$$\text{Median} = l_m + \frac{\frac{N}{2} - cf_b}{f_m} \times c \tag{1.9}$$
where
$N$ - total frequency.
$l_m$ - lower class boundary of the median class.
$f_m$ - frequency of the median class.
$cf_b$ - cumulative frequency of the class just before the median class.
$c$ - the class width.
Median by Graphical Method

1. Compute $\frac{1}{2}\sum f$ and locate it on the y-axis (cumulative frequency axis) of the ogive.
2. From this point, draw a line perpendicular to the y-axis (a horizontal line) until it intersects the ogive.
3. From the point of intersection with the ogive, draw a perpendicular to the x-axis (class boundaries axis).
4. Read off the value of the median from the x-axis.
Example 1.1.11 Given the frequency distribution table as in the Example 1.1.3, find the
median mark using,
1.) Linear interpolation method
2.) The graphical method
Class Class Mark (Xi) Frequency(fi) Cumulative Frequency
60 - 62 61 5 5
63 - 65 64 18 23
66 - 68 67 42 65
69 - 71 70 27 92
72 - 74 73 8 100

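1.) Using the linear interpolation method.
Here $\frac{N}{2} = \frac{100}{2} = 50$, so the median class is 66 − 68, with
$l_m = 65.5$, $c = 3$, $cf_b = 23$, $f_m = 42$,
such that
$$\text{Median} = l_m + \frac{\frac{N}{2} - cf_b}{f_m} \times c = 65.5 + \frac{50 - 23}{42} \times 3 = 67.429$$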
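The interpolation formula (1.9) can be sketched in Python as follows. This assumes that all classes share one width and that the median class is the first class whose cumulative frequency reaches N/2. `grouped_median` is a hypothetical helper, not a library function.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median of grouped data by linear interpolation, equation (1.9).

    lower_boundaries[k] is the lower class boundary of class k; every class
    is assumed to have the same width c.
    """
    half = sum(freqs) / 2                      # N / 2
    cum_before = 0                             # cf_b: cumulative frequency before the class
    for lower, f in zip(lower_boundaries, freqs):
        if cum_before + f >= half:             # first class whose cumulative frequency reaches N/2
            return lower + (half - cum_before) / f * width
        cum_before += f
    raise ValueError("frequencies must be non-empty and positive")


# Example 1.1.11: classes 60-62, ..., 72-74 have lower boundaries 59.5, 62.5, ...
print(round(grouped_median([59.5, 62.5, 65.5, 68.5, 71.5], [5, 18, 42, 27, 8], 3), 3))  # 67.429
```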
2.) The graphical method
1.1.3 The Mode

1.1.3.1 Mode for un-grouped data
For un-grouped data, the mode is the observation with the highest frequency, that is, the value that appears most often in the data set. A data set may have more than one mode. A data set with one mode is called unimodal, one with two modes is bimodal, and one with more than two modes is multimodal.
Example 1.1.12 Given the data set
2, 3, 1, 3, 4, 5, 6, 7, 4
the modes are 3 and 4 (bimodal).
Example 1.1.13 The following are the margins of victory in the football matches of a league.
1 3 2 5 1 4 6 2 5 2 2 2 4 1 2 3 2 3 2 3
1 1 2 3 2 6 4 3 2 1 1 4 2 1 5 3 4 2 1 2

Since 2 occurs most often (14 times), the mode of the data is 2.
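For un-grouped data, Python's standard library can find all modes at once. The sketch below assumes Python 3.8 or later for `statistics.multimode`, which returns every value that ties for the highest frequency. It uses the data of Examples 1.1.12 and 1.1.13.

```python
from collections import Counter
import statistics

example_1_1_12 = [2, 3, 1, 3, 4, 5, 6, 7, 4]
margins = [1, 3, 2, 5, 1, 4, 6, 2, 5, 2, 2, 2, 4, 1, 2, 3, 2, 3, 2, 3,
           1, 1, 2, 3, 2, 6, 4, 3, 2, 1, 1, 4, 2, 1, 5, 3, 4, 2, 1, 2]

print(statistics.multimode(example_1_1_12))  # [3, 4]  (bimodal)
print(statistics.multimode(margins))         # [2]
print(Counter(margins)[2])                   # 14, the number of times 2 occurs
```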

1.1.3.2 Mode for grouped data

For grouped data, the mode is estimated from the class with the highest frequency, called the modal class, using the formula
$$\text{Mode} = l_m + \frac{d_1}{d_1 + d_2} \times c = l_m + \frac{f_m - f_b}{(f_m - f_b) + (f_m - f_a)} \times c \tag{1.10}$$
where $d_1 = f_m - f_b$, $d_2 = f_m - f_a$ and
$l_m$ : lower class boundary of the modal class
$f_m$ : frequency of the modal class
$f_b$ : frequency of the class just before the modal class
$f_a$ : frequency of the class just after the modal class
$c$ : the class width.
Alternatively, the mode can be estimated from the histogram.

Example 1.1.14 Given the frequency distribution table as in Example 1.1.3, find the modal
mark of the students.
The modal class, with the highest frequency, is 66 − 68, with
$l_m = 65.5$, $f_m = 42$, $f_a = 27$, $f_b = 18$, $c = 3$.
Therefore,
$$\text{Mode} = 65.5 + \frac{42 - 18}{(42 - 18) + (42 - 27)} \times 3 = 65.5 + \frac{24}{39} \times 3 \approx 67.35$$
Alternatively, it can be estimated from the histogram.
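Here is a minimal Python sketch of equation (1.10). The helper `grouped_mode` is hypothetical. It picks the modal class as the first class with the highest frequency. When the modal class is the first or last class, the formula lacks a neighbour, so the sketch raises an error in that case. Section 1.1.3.3 suggests using 3(median) − 2(mean) there instead.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode of grouped data by equation (1.10)."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # index of the modal class (first if tied)
    if k == 0 or k == len(freqs) - 1:
        raise ValueError("modal class is at an end of the distribution")
    f_m, f_b, f_a = freqs[k], freqs[k - 1], freqs[k + 1]
    d1, d2 = f_m - f_b, f_m - f_a                          # excesses over the neighbouring classes
    return lower_boundaries[k] + d1 / (d1 + d2) * width


# Example 1.1.14: lower class boundaries 59.5, 62.5, ... and class width 3
print(round(grouped_mode([59.5, 62.5, 65.5, 68.5, 71.5], [5, 18, 42, 27, 8], 3), 2))  # 67.35
```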
1.1.3.3 Properties of the Mode

1.) The mode does not always exist.
2.) It may or may not be unique.
3.) For grouped data, if the modal class happens to be the first or the last class in the distribution, the mode is estimated as mode ≈ 3(median) − 2(mean).
4.) The mode can be estimated from the histogram. Join the top-left corner of the tallest bar to the top-left corner of the bar immediately after it. Then join the top-right corner of the tallest bar to the top-right corner of the bar immediately before it. Draw a perpendicular from the intersection of these two lines to the x-axis, and read the mode from the class boundaries axis.

1.2. MEASURES OF
1.2 Measures of Position

1.2.1 Quartiles
Quartiles are measures that divide an ordered data set into four equal parts. The first quartile $Q_1$ marks off the lowest 25% of the observations, the second quartile $Q_2$ the lowest 50%, and the third quartile $Q_3$ the lowest 75%.
The interquartile range is the difference between the third quartile and the first quartile, $Q_3 - Q_1$.
1.2.1.1 Quartiles from un-grouped data

We arrange the observations in ascending order and locate the quartiles using the formula
$$\text{position of } Q_i = \frac{i(n+1)}{4}, \qquad i = 1, 2, 3,$$
where $n$ is the total number of observations in the data set. This expression gives the position of the quartile in the ordered data. When the position is not a whole number, we interpolate between the adjacent observations.
Example 1.2.1 The monthly salaries, in dollars, of secretaries in a certain organization are
441, 430, 515, 420, 490, 438, 435, 447, 445, 500, 510
Find the quartiles and the interquartile range.
We arrange the data in an ascending array as below,
420, 430, 435, 438, 441, 445, 447, 490, 500, 510, 515
The positions of the quartiles are
$Q_1$: $\frac{1(11+1)}{4} = 3$rd observation,
$Q_2$: $\frac{2(11+1)}{4} = 6$th observation,
$Q_3$: $\frac{3(11+1)}{4} = 9$th observation.
That is, $Q_1$ is in the third position, $Q_2$ in the sixth position and $Q_3$ in the ninth position. Thus
Q1 = 435, Q2 = 445 and Q3 = 500
The interquartile range is given by
Q3 − Q1 = 500 − 435 = 65
Example 1.2.2 Compute $Q_3$ for the data 2, 6, 8, 10.
The position of $Q_3$ is $\frac{3}{4}(4 + 1) = 3.75$. This lies three quarters of the way from the 3rd to the 4th observation, so
$$Q_3 = 8 + 0.75(10 - 8) = 9.5$$
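The position rule $i(n+1)/4$ with linear interpolation can be written as a short Python sketch. `quartile` is our own helper, and clamping to the first or last value for positions outside the data is an assumption of this sketch. As far as we know, Python's `statistics.quantiles(data, n=4)` uses the same $(n+1)$-based positions by default (method `'exclusive'`, Python 3.8+), so the sketch uses it as a cross-check.

```python
import statistics


def quartile(xs, i):
    """i-th quartile (i = 1, 2, 3) from the position i(n + 1)/4 in the ordered data,
    interpolating linearly when the position is not a whole number."""
    s = sorted(xs)
    pos = i * (len(s) + 1) / 4       # 1-based position, as in the formula above
    k = int(pos)                     # whole part of the position
    frac = pos - k                   # fractional part used for interpolation
    if k < 1:                        # position before the first value (assumed clamp)
        return s[0]
    if k >= len(s):                  # position at or beyond the last value (assumed clamp)
        return s[-1]
    return s[k - 1] + frac * (s[k] - s[k - 1])


salaries = [441, 430, 515, 420, 490, 438, 435, 447, 445, 500, 510]   # Example 1.2.1
q1, q2, q3 = (quartile(salaries, i) for i in (1, 2, 3))
print(q1, q2, q3, q3 - q1)                   # 435.0 445.0 500.0 65.0
print(statistics.quantiles(salaries, n=4))   # [435.0, 445.0, 500.0]
print(quartile([2, 6, 8, 10], 3))            # Example 1.2.2: 9.5
```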

1.2.1.2 Quartiles from grouped data
Here we find the position of the $i$th quartile, $\frac{i(n+1)}{4}$ for $i = 1, 2, 3$, and use the cumulative frequencies to locate the class that contains it. The quartile is then
$$Q_i = l_m + \frac{CF_i - cf_b}{f_w} \times c \tag{1.11}$$
where
$l_m$ : the lower class boundary of the $i$th quartile class
$CF_i$ : $\frac{i}{4} \times n$, the cumulative frequency at which the $i$th quartile lies
$cf_b$ : cumulative frequency of the class just before the $i$th quartile class
$f_w$ : frequency of the quartile class
$c$ : the class width.
Example 1.2.3 Given the following frequency distribution table use it to find the quartiles
and their interquartile range.
Age of students Number of students (f ) Cumulative Frequency
20 - 24 11 11
25 - 29 24 35
30 - 34 30 65
35 - 39 18 83
40 - 44 11 94
45 - 49 5 99
50 - 54 1 100
1.) Lower quartile $Q_1$:
$\frac{1}{4}(n + 1) = \frac{100 + 1}{4} = 25.25$th position,
so the lower quartile class is 25 − 29, with
$l_m = 24.5$, $c = 5$, $cf_b = 11$, $CF_1 = \frac{1}{4} \times 100 = 25$, $f_w = 24$.
$$Q_1 = 24.5 + \frac{25 - 11}{24} \times 5 = 27.42 \text{ years}$$
2.) Upper quartile $Q_3$:
$\frac{3}{4}(n + 1) = \frac{3(100 + 1)}{4} = 75.75$th position,
so the upper quartile class is 35 − 39, with
$l_m = 34.5$, $c = 5$, $cf_b = 65$, $CF_3 = \frac{3}{4} \times 100 = 75$, $f_w = 18$.
$$Q_3 = 34.5 + \frac{75 - 65}{18} \times 5 = 37.2778 \text{ years}$$
3.) Interquartile range: $Q_3 - Q_1 = 37.2778 - 27.4167 = 9.8611$ years.
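Equation (1.11) can be sketched in Python as below. The helper `grouped_quartile` is hypothetical. One assumption of the sketch is that it locates the quartile class as the first class whose cumulative frequency reaches $CF_i = in/4$; for this example that gives the same classes as the $(n+1)$-based position.

```python
def grouped_quartile(lower_boundaries, freqs, width, i):
    """Q_i = l_m + (CF_i - cf_b) / f_w * c, equation (1.11), with CF_i = i * n / 4."""
    n = sum(freqs)
    target = i * n / 4                        # CF_i
    cum_before = 0                            # cf_b
    for lower, f in zip(lower_boundaries, freqs):
        if cum_before + f >= target:          # first class whose cumulative frequency reaches CF_i
            return lower + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("i must be 1, 2 or 3 and frequencies must be positive")


# Example 1.2.3: ages 20-24, 25-29, ..., 50-54 (lower boundaries 19.5, 24.5, ...)
lower = [19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
freqs = [11, 24, 30, 18, 11, 5, 1]
q1 = grouped_quartile(lower, freqs, 5, 1)
q3 = grouped_quartile(lower, freqs, 5, 3)
print(round(q1, 4), round(q3, 4), round(q3 - q1, 4))  # 27.4167 37.2778 9.8611
```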

1.3. MEASURES OF
1.3 Measures of Variation

1.3.1 The Range
The range gives the spread between the largest and the smallest observation. For an un-grouped data set, the range is the difference between the largest and the smallest observation.
Example 1.3.1 For the data set
2, 4, 6, 5, 3, 21, 70
the range is 70 − 2 = 68.
For grouped data the range is given by the difference between the class mark of the last class
interval and the class mark of the first class interval.
Note 1.3.1 The greater the range, the wider the dispersion (spread), and vice versa.
Example 1.3.2 Given the following frequency distributions.
Class Class Mark(Xi) Frequency(fi)
60 - 62 61 5
63 - 65 64 18
66 - 68 67 42
69 - 71 70 27
72 - 74 73 8
The range is 73 − 61 = 12.
Although the range has the advantage of easy computation, it has some disadvantages:
1.) It may be misleading if either of the extreme values is an outlier (far smaller or far larger than the other observations).
2.) The range says nothing about how the observations that fall between the two extreme values are arranged.

1.3.2 The Variance, Standard Deviation and Mean Deviation of un-grouped data
1.) Mean deviation of un-grouped data
For a given data set $X_i$, $i = 1, 2, \ldots, n$, we define the mean deviation as the average of the absolute deviations from the mean:
$$S_{\bar{X}} = \frac{1}{n} \sum_{i=1}^{n} \left| X_i - \bar{X} \right| \tag{1.12}$$
2.) The variance is defined as the mean of the squared deviations of the individual observations from their arithmetic mean.
(a) For an un-grouped population, the population variance, denoted by $\sigma^2$, is defined as
$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2 \tag{1.13}$$
where $\mu = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the population mean, $n$ is the total number of observations in the population and $X_i$ is the $i$th observation.
(b) For an un-grouped sample, the sample variance, denoted by $S^2$, is given by
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \tag{1.14}$$
with the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, where $n$ is the total number of observations in the sample and $X_i$ is the $i$th observation.
Equation (1.13) gives the population variance and equation (1.14) gives the sample variance.
3.) The standard deviation is defined as the positive square root of the variance. Equation (1.15) gives the population standard deviation for un-grouped data and equation (1.16) gives the sample standard deviation for un-grouped data.
$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2} \tag{1.15}$$
$$S = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2} \tag{1.16}$$
Note 1.3.2
I. Expression (1.13) can be re-written as
$$\sigma^2 = \frac{\sum_{i=1}^{n} X_i^2}{n} - \left( \frac{\sum_{i=1}^{n} X_i}{n} \right)^2$$
II. Expression (1.14) can be re-written as
$$S^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n-1)}$$
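Example 1.3.3 The figures below show the production of a certain product in a Kampala-based factory:
98, 99, 99, 100, 100, 100, 101, 101, 102
Find
1.) The mean deviation
$$S_{\bar{X}} = \frac{1}{n} \sum_{i=1}^{n} \left| X_i - \bar{X} \right|$$
where
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{900}{9} = 100$$
The absolute deviations from 100 are 2, 1, 1, 0, 0, 0, 1, 1, 2, which sum to 8, so
$$S_{\bar{X}} = \frac{8}{9} \approx 0.889$$
2.) The variance and standard deviation
The sample variance (the data set has fewer than 30 items) is
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{12}{8} = 1.5$$
and the standard deviation is
$$S = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2} = \sqrt{1.5} \approx 1.2247$$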
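Here is a short Python sketch of equations (1.12), (1.14) and (1.16) on the data of Example 1.3.3. It also checks shortcut form II of Note 1.3.2. The standard `statistics.variance` (divisor n − 1) and `statistics.pvariance` (divisor n) correspond to the sample and population variances.

```python
import math
import statistics

production = [98, 99, 99, 100, 100, 100, 101, 101, 102]   # Example 1.3.3
n = len(production)
mean = sum(production) / n                                 # 100.0

mean_deviation = sum(abs(x - mean) for x in production) / n   # equation (1.12)
s2 = sum((x - mean) ** 2 for x in production) / (n - 1)       # equation (1.14)
s = math.sqrt(s2)                                             # equation (1.16)

# Shortcut form II of Note 1.3.2: (n * sum(X^2) - (sum X)^2) / (n(n - 1))
s2_shortcut = (n * sum(x * x for x in production) - sum(production) ** 2) / (n * (n - 1))

print(round(mean_deviation, 4), s2, s2_shortcut, round(s, 4))  # 0.8889 1.5 1.5 1.2247
print(statistics.variance(production), statistics.pvariance(production))
```

The last line prints the sample variance 1.5 next to the population variance 12/9. The two differ only in the divisor.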

1.3.3 Variance, Standard Deviation and Mean Deviation of Grouped Data
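1.) For a grouped population, with $n = \sum_i f_i$ the total frequency:
(a) Mean deviation
$$\sigma_\mu = \frac{1}{n} \sum_{i} f_i \left| X_i - \mu \right| \tag{1.17}$$
(b) The variance $\sigma^2$
$$\sigma^2 = \frac{1}{n} \sum_{i} f_i (X_i - \mu)^2 \tag{1.18}$$
which can be re-written as any of the following formulae:
$$\sigma^2 = \frac{1}{n}\left[\sum_{i} f_i X_i^2 - \frac{\left(\sum_{i} f_i X_i\right)^2}{n}\right]$$
$$\sigma^2 = \frac{1}{n^2}\left[n \sum_{i} f_i X_i^2 - \left(\sum_{i} f_i X_i\right)^2\right]$$
$$\sigma^2 = \frac{\sum_{i} f_i X_i^2}{n} - \frac{\left(\sum_{i} f_i X_i\right)^2}{n^2}$$
(c) Standard deviation: given by the positive square root of the variance.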
2.) For a grouped sample (a subset of the population) of size $n = \sum_i f_i$:
(a) Mean deviation
$$S_{\bar{X}} = \frac{1}{n} \sum_{i} f_i \left| X_i - \bar{X} \right| \tag{1.19}$$
(b) The variance, denoted by $S^2$, is defined by the formula
$$S^2 = \frac{1}{n-1} \sum_{i} f_i (X_i - \bar{X})^2 \tag{1.20}$$
and can be re-written as
$$S^2 = \frac{1}{n-1}\left[\sum_{i} f_i X_i^2 - \frac{\left(\sum_{i} f_i X_i\right)^2}{n}\right]$$
$$S^2 = \frac{1}{n(n-1)}\left[n \sum_{i} f_i X_i^2 - \left(\sum_{i} f_i X_i\right)^2\right]$$
$$S^2 = \frac{\sum_{i} f_i X_i^2}{n-1} - \frac{\left(\sum_{i} f_i X_i\right)^2}{n(n-1)}$$
(c) The standard deviations are, respectively, the positive square roots of the variances.
Example 1.3.4 Consider the small grouped data set in the table below.
Class      Class boundary   Frequency f   Cumulative frequency cf   Class mark X   fX       fX²
1 - 20     0.5 - 20.5       5             5                         10.5           52.5     551.25
21 - 40    20.5 - 40.5      25            30                        30.5           762.5    23256.25
41 - 60    40.5 - 60.5      37            67                        50.5           1868.5   94359.25
61 - 80    60.5 - 80.5      23            90                        70.5           1621.5   114315.75
81 - 100   80.5 - 100.5     8             98                        90.5           724      65522
Totals: Σf = 98, ΣfX = 5029, ΣfX² = 298004.5
Determine
1.) Mean:
$$\mu = \frac{\sum f_i X_i}{\sum f_i} = \frac{5029}{98} = 51.3163$$
2.) Median: $\frac{N}{2} = \frac{98}{2} = 49$, so the median class is (40.5 − 60.5) and
$$\text{Median} = l_m + \frac{\frac{N}{2} - cf_b}{f_m} \times c = 40.5 + \frac{49 - 30}{37} \times 20 = 50.7703$$
3.) Mode: the modal class is the class with the highest frequency, (40.5 − 60.5), so
$$\text{Mode} = l_m + \frac{f_m - f_b}{(f_m - f_b) + (f_m - f_a)} \times c = 40.5 + \frac{37 - 25}{(37 - 25) + (37 - 23)} \times 20 = 49.7308$$
4.) Lower quartile: the position of $Q_1$ is $\frac{1}{4}(n + 1) = \frac{98 + 1}{4} = 24.75$th in the cf column, so the quartile class is (20.5 − 40.5), and $CF_1 = \frac{1}{4} \times 98 = 24.5$:
$$Q_1 = l_m + \frac{CF_1 - cf_b}{f_w} \times c = 20.5 + \frac{24.5 - 5}{25} \times 20 = 36.1$$
5.) Upper quartile: the position of $Q_3$ is $\frac{3}{4}(n + 1) = \frac{3(98 + 1)}{4} = 74.25$th in the cf column, so the quartile class is (60.5 − 80.5), and $CF_3 = \frac{3}{4} \times 98 = 73.5$:
$$Q_3 = l_m + \frac{CF_3 - cf_b}{f_w} \times c = 60.5 + \frac{73.5 - 67}{23} \times 20 = 66.1522$$
6.) Variance of the sample:
$$S^2 = \frac{1}{n-1}\left[\sum f_i X_i^2 - \frac{\left(\sum f_i X_i\right)^2}{n}\right] = \frac{1}{98 - 1}\left[298004.5 - \frac{5029^2}{98}\right] = 411.6979$$

7.) Standard deviation
$$S = \sqrt{S^2} = \sqrt{411.6979} = 20.2903$$
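8.) Show that
(a) $Q_2 = 50.7703$ (the median).
(b) Range = 100.5 − 0.5 = 100, using the extreme class boundaries. The class-mark definition of Section 1.3.1 gives 90.5 − 10.5 = 80.
(c) Mean deviation $S_{\bar{X}} = \frac{1}{n}\sum f_i |X_i - \bar{X}| \approx 15.4019$. The value 0.3001 is the ratio of the mean deviation to the mean, 15.4019/51.3163.
(d) Quartile deviation $= \frac{Q_3 - Q_1}{2} = 15.0261$.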
Example 1.3.5 Consider the following frequency distribution table from a sample, with $\bar{X} = 67.45$ (Example 1.1.3):
Class   Class mark (Xi)   Freq (fi)   fiXi   |Xi − X̄|   (Xi − X̄)²   fi|Xi − X̄|   fi(Xi − X̄)²
60-62   61                5           305    6.45        41.6025      32.25         208.0125
63-65   64                18          1152   3.45        11.9025      62.1          214.245
66-68   67                42          2814   0.45        0.2025       18.9          8.505
69-71   70                27          1890   2.55        6.5025       68.85         175.5675
72-74   73                8           584    5.55        30.8025      44.4          246.42
Σ                         100         6745                            226.5         852.75
Find
1.) Mean deviation
$$S_{\bar{X}} = \frac{1}{\sum f_i} \sum_{i} f_i \left| X_i - \bar{X} \right| = \frac{226.5}{100} = 2.265$$
2.) The sample variance
$$S^2 = \frac{1}{n-1} \sum_{i} f_i (X_i - \bar{X})^2 = \frac{852.75}{99} = 8.6136$$
and the standard deviation $S = \sqrt{8.6136} = 2.9349$.
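The grouped sample formulas (1.19) and (1.20) can be sketched in Python as one helper and applied to Examples 1.3.4 and 1.3.5. `grouped_summary` is a hypothetical name. It treats every observation in a class as if it sat at the class mark, which is the usual grouped-data approximation.

```python
import math


def grouped_summary(marks, freqs):
    """Mean, mean deviation (1.19), sample variance (1.20) and standard deviation
    for grouped data, using class marks as stand-ins for the observations."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(marks, freqs)) / n
    mean_dev = sum(f * abs(x - mean) for x, f in zip(marks, freqs)) / n
    s2 = sum(f * (x - mean) ** 2 for x, f in zip(marks, freqs)) / (n - 1)
    return mean, mean_dev, s2, math.sqrt(s2)


# Example 1.3.4
mean_a, md_a, s2_a, s_a = grouped_summary([10.5, 30.5, 50.5, 70.5, 90.5], [5, 25, 37, 23, 8])
# Example 1.3.5
mean_b, md_b, s2_b, s_b = grouped_summary([61, 64, 67, 70, 73], [5, 18, 42, 27, 8])

print(round(mean_a, 4), round(md_a, 4), round(s2_a, 4), round(s_a, 4))
print(round(mean_b, 4), round(md_b, 4), round(s2_b, 4), round(s_b, 4))
```

The printed values should be about 51.3163, 15.4019, 411.6979 and 20.2903 for Example 1.3.4, and 67.45, 2.265, 8.6136 and 2.9349 for Example 1.3.5.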
Note 1.3.3 The greater the dispersion in a data set, the larger its standard deviation and variance. The standard deviation can therefore be used to compare the dispersion of two or more data sets. However, to arrive at a meaningful conclusion, the following conditions should be satisfied:
1.) The data sets should be expressed in the same units.
2.) The means of the data sets should be nearly the same in magnitude.
Once both conditions are satisfied, the smaller standard deviation indicates that its distribution exhibits less dispersion than the others under comparison.

Exercise 1.1 The systolic blood pressures of seven middle-aged ladies were as follows:
151, 124, 132, 170, 146, 124 and 113
Compute their
1.) Mean systolic blood pressure. [137.14]

2.) Median systolic blood pressure [132]
3.) Mode systolic blood pressure [124]
Exercise 1.2 Six men with high cholesterol participated in a study to investigate the effects
of diet on cholesterol level. At the beginning of the study, their cholesterol levels (mg/dL)
were as follows:
366, 327, 274, 292, 274 and 230
Find their
1.) Median cholesterol level [283]
2.) Modal cholesterol level [274]
Exercise 1.3 Define the median of a random sample of size n, distinguishing between the two cases n odd and n even. Show that the median has expected value 1/2 if the random sample is drawn from a uniform distribution on (0, 1).
Find its variance in the particular case when n is odd. What is the expected value of the
median if the random sample is drawn from a uniform distribution on (a, b)?
Exercise 1.4 The data below are the total patient care revenues for a sample of hospitals in Buddu County (Greater Masaka).
Hospital Revenue (in millions)

1 414.6
2 358.6
3 439.8
4 64.8
5 159.2
6 130.5
7 395.3
Table 1.1: Hospital Revenues in Buddu in 1996
Hospitals all have different levels of revenue.
The revenues range from 64.8 million to 439.8 million.
Hospital 4 appears to have unusually low revenue (an outlier?).
Compute
1.) The average revenue in 1996
2.) The median revenue in 1996
3.) The mode in 1996
4.) The third quartile Q3
5.) The range
6.) The variance

Exercise 1.5 Repeat the problems in Exercise 1.4 for the grouped data below.
Heights: 160 - 164 165 - 169 170 - 174 175 - 179 180 -184 185-189
Frequency: 7 11 17 20 16 6
Table 1.2: Height of employees in cm
Example 1.3.6 The wheat production (in Kg) of 20 acres is given as:
1120 1240 1320 1040 1080 1200 1440 1360 1680 1730
1785 1342 1960 1880 1755 1720 1600 1470 1750 1885
After arranging the observations in ascending order, we get
1040 1080 1120 1200 1240 1320 1342 1360 1440 1470
1600 1680 1720 1730 1750 1755 1785 1880 1885 1960
1.) $Q_1$ is at position $\frac{1}{4}(n + 1) = \frac{21}{4} = 5.25$, i.e. the 5th value plus 0.25(6th value − 5th value):
$$Q_1 = 1240 + 0.25(1320 - 1240) = 1240 + 20 = 1260$$
2.) $Q_3$ is at position $\frac{3}{4}(n + 1) = \frac{63}{4} = 15.75$, i.e. the 15th value plus 0.75(16th value − 15th value):
$$Q_3 = 1750 + 0.75(1755 - 1750) = 1750 + 3.75 = 1753.75$$
3.) $Q_2$ is at position $\frac{2}{4}(n + 1) = 10.5$, i.e. the 10th value plus 0.5(11th value − 10th value):
$$Q_2 = 1470 + 0.5(1600 - 1470) = 1470 + 65 = 1535$$
4.) The quartile deviation (Q.D.)
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{1753.75 - 1260}{2} = 246.875$$
5.) Coefficient of quartile deviation
$$\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{1753.75 - 1260}{1753.75 + 1260} = 0.164$$
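The quartile deviation and its coefficient for Example 1.3.6 can be reproduced with the short Python sketch below. It reuses the $(n+1)/4$ position rule with linear interpolation from Section 1.2.1.1. The `quartile` helper is our own, redefined here so the block stands alone.

```python
def quartile(xs, i):
    """i-th quartile from the position i(n + 1)/4 with linear interpolation (Section 1.2.1.1)."""
    s = sorted(xs)
    pos = i * (len(s) + 1) / 4
    k = int(pos)
    frac = pos - k
    if k < 1:
        return s[0]
    if k >= len(s):
        return s[-1]
    return s[k - 1] + frac * (s[k] - s[k - 1])


wheat = [1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
         1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885]   # Example 1.3.6

q1, q2, q3 = (quartile(wheat, i) for i in (1, 2, 3))
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)

print(q1, q2, q3)                                    # 1260.0 1535.0 1753.75
print(quartile_deviation, round(coefficient_qd, 3))  # 246.875 0.164
```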

Exercise 1.6 Given the age data of the 12 village members:
12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25, 65
1.) Find the ages’ upper quartile Q3. [40.5]
2.) Determine the median age. [23.5]
3.) Compute the interquartile range

4.) Find the Quartile Deviation (Q.D)
5.) Establish the Coefficient of Quartile Deviation
Exercise 1.7 Based on the grouped data below,
Time to Travel to Work Frequency
1 - 10 8
11 - 20 14
21 - 30 12
31 - 40 9
41 - 50 7
Find
1.) Median [23]
2.) Lower quartile Q1 [13.7143]
3.) Upper quartile Q3 [34.3889]
4.) Interquartile range [20.6746]
5.) Mode [18]
6.) Variance S²
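Exercise 1.8 Find the interquartile range of the data set:
{1, 3, 4, 5, 5, 6, 9, 14, 21}
[With the (n + 1)/4 interpolation method of Section 1.2.1.1, Q3 − Q1 = 11.5 − 3.5 = 8. Taking Q1 and Q3 as the 3rd and 7th ordered values instead gives Q3 − Q1 = 9 − 4 = 5.]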
Exercise 1.9 In a work study investigation, the times taken by 20 men in a firm to do a
particular job were tabulated as follows:
Time Taken (min) 8-10 11-13 14-16 17-19 20-22 23-25
Frequencies 2 4 6 4 3 1
Compute the second quartile Q2. [15.50]

Note 1.3.4 The second quartile Q2 is the median.

Example 1.3.7 The mean has one main disadvantage: it is particularly susceptible to the
influence of outliers. These are values that are unusual compared to the rest of the data set
by being especially small or large in numerical value. For example, consider the wages of staff
at a factory below:
Staff 1 2 3 4 5 6 7 8 9 10
Salary 5k 18k 16k 14k 15k 15k 12k 17k 90k 95k
The mean salary for these ten staff is \$29.7k. However, the raw data suggest that this mean might not be the best way to reflect the typical salary of a worker, since most workers have salaries in the \$12k to \$18k range. The two large salaries are pulling the mean upwards. In this situation, therefore, we would like a better measure of central tendency. As we will find out later, the median (here \$15.5k) would be a better measure of central tendency in this situation.
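The effect of the two outliers can be seen directly with Python's standard `statistics` module. The sketch uses the salaries of Example 1.3.7 in thousands of dollars. Dropping the last two values shows how much more the mean moves than the median.

```python
import statistics

salaries_k = [5, 18, 16, 14, 15, 15, 12, 17, 90, 95]   # Example 1.3.7, in thousands

print(statistics.mean(salaries_k))    # 29.7
print(statistics.median(salaries_k))  # 15.5
# Without the two large salaries, the mean drops sharply while the median barely moves
print(statistics.mean(salaries_k[:8]), statistics.median(salaries_k[:8]))
```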
Exercise 1.10 Find the median for the following data
65 , 55 , 89 , 56 , 35 , 14 , 56 , 55 , 87 , 45 , 92
[56]
Exercise 1.11 Given the data 1, 5, 8, 10, 7 and 5, calculate the
1.) mean (average) [6]
2.) range [9]
3.) median [6]
4.) mode [5]
Example 1.3.8 Assume that we have obtained the following 20 observations:
2, 4, 7, −20, 22, −1, 0, −1, 7, 15, 8, 4, −4, 11, 11, 12, 3, 12, 18, 1
To calculate the quartiles, we first sort the observations:
−20, −4, −1, −1, 0, 1, 2, 3, 4, 4, 7, 7, 8, 11, 11, 12, 12, 15, 18, 22
The position of the first quartile is 0.25(20 + 1) = 5.25, which this example rounds to 5. So Q1 is the 5th value of the sorted series, namely Q1 = 0. The other quartiles are obtained similarly. Q2 = 5.5 is the average of the 10th and 11th values (4 and 7). Q3 = 12 is the 16th value (position 15.75, rounded to 16). With the interpolation used in Example 1.2.2 instead, Q1 = 0.25 and Q3 = 11.75.

Example 1.3.9 Consider the aptitude test scores of ten children below:
95, 78, 69, 91, 82, 76, 76, 86, 88, 80
Find the mean, median, and mode.
1.) Mean
Solution :
$$\bar{X} = \frac{1}{10}(95 + 78 + 69 + 91 + 82 + 76 + 76 + 86 + 88 + 80) = 82.1$$
■
2.) Median
Solution : First, order the data.
69, 76, 76, 78, 80, 82, 86, 88, 91, 95
With n = 10, the median position is $\frac{10 + 1}{2} = 5.5$. Thus the median is the average of the fifth (80) and sixth (82) ordered values, so the median = 81.
■
3.) Mode
Solution : The most frequent value in this data set is 76. Therefore the mode
is 76.
■
Exercise 1.12
1.) What are measures of central tendency, as used in statistics?
2.) Mention any three measures of central tendency you know.
3.) Construct a frequency distribution table for the following figures of weights obtained from
36 Elements of mathematics students in a Ugandan university using a class width of 3 and
starting with the class 56 − 58.
66 70 68 67 71 60
64 70 68 65 64 61
71 66 67 65 68 59
67 65 68 66 69 58
66 65 65 71 70 56
57 60 62 56 59 72
4.) Using the frequency distribution table in (3) above, find

(a) the mean weight,
(b) the modal weight and
(c) the median weight of the students.
