Descriptive Statistics
1.) Mean
2.) Mode
3.) Median
If observations are in raw form then the mean is computed by summing up the observations
and then dividing their sum by the number of observations. That is,
\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{\text{Sum of all observations}}{\text{Their total number}} \tag{1.1}
\]
1.1. MEASURES OF CENTRAL
Example 1.1.1 Given the following observations, 2, 5, 50, 100, 200, 0 and 10.
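As a quick check of equation (1.1) on the observations of Example 1.1.1, the following minimal sketch sums the values and divides by their count. The variable names are illustrative only.

```python
# Observations listed in Example 1.1.1
observations = [2, 5, 50, 100, 200, 0, 10]

# Equation (1.1): sum of all observations divided by their total number
mean = sum(observations) / len(observations)

print(round(mean, 4))  # 52.4286
```

The sum is 367 over 7 observations, so the mean is about 52.43.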
Arithmetic mean for un-grouped data using the working (assumed) mean method. Here the
mean is given by
\[
\bar{X} = A + \frac{\sum_{i=1}^{n} d_i}{n} \tag{1.2}
\]
where $d_i = X_i - A$, with $A$ the assumed mean.
Example 1.1.2 Let the observations be 2, 3, 5, and 10 in a given data set. Using an assumed
mean of 5, find their arithmetic mean.
Xi    di
2     -3
3     -2
5      0
10     5
\[
\sum_{i=1}^{n} d_i = 0
\]
Thus, from
\[
\bar{X} = A + \frac{\sum_{i=1}^{n} d_i}{n}
\]
we obtain
\[
\bar{X} = 5 + \frac{0}{4} = 5
\]
Note 1.1.1 The working (assumed) mean is chosen arbitrarily, that is, without special
consideration, from within the range of the observations for easy computation.
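The working-mean method of equation (1.2) can be sketched as below. The function name `assumed_mean` is a hypothetical helper, not part of any library. The second call illustrates that the result does not depend on which assumed mean is picked.

```python
def assumed_mean(observations, a):
    """Arithmetic mean via equation (1.2): A + sum(d_i)/n, with d_i = X_i - A."""
    deviations = [x - a for x in observations]
    return a + sum(deviations) / len(observations)

data = [2, 3, 5, 10]            # observations of Example 1.1.2

print(assumed_mean(data, 5))    # 5.0 (deviations -3, -2, 0, 5 sum to 0)
print(assumed_mean(data, 3))    # 5.0 (deviations -1, 0, 2, 7 sum to 8)
```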
where fi is the frequency of the ith class, Xi is the class mark of the ith class, and n is the
number of classes.
Alternatively, we can compute the grouped mean using the assumed mean. The steps involved
using the working mean are
1.) Choose the assumed mean A where A is preferably a value near or equal to the class mark
of a class with highest frequency.
Example 1.1.3 The frequency table below shows the marks obtained by students in a certain
examination,
Class Frequency
60 - 62 5
63 - 65 18
66 - 68 42
69 - 71 27
72 - 74 8
Using the information above, find the arithmetic mean with and without using the working
mean.
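Class      Xi    fi     fiXi    di = Xi − A    fidi
60 − 62    61     5      305    −6             −30
63 − 65    64    18     1152    −3             −54
66 − 68    67    42     2814     0               0
69 − 71    70    27     1890     3              81
72 − 74    73     8      584     6              48
\[
\sum_{i=1}^{5} f_i = 100, \qquad \sum_{i=1}^{5} f_i X_i = 6745, \qquad \sum_{i=1}^{5} f_i d_i = 45
\]
\[
\mu = \frac{\sum_{i=1}^{5} f_i X_i}{\sum_{i=1}^{5} f_i} = \frac{6745}{100} = 67.45
\]
with the working mean $A = 67$,
\[
\mu = A + \frac{\sum_{i=1}^{5} f_i d_i}{\sum_{i=1}^{5} f_i} = 67 + \frac{45}{100} = 67.45.
\]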
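The two routes to the grouped mean in Example 1.1.3 can be checked with a short sketch. The class marks and frequencies are taken from the table; the variable names are illustrative.

```python
class_marks = [61, 64, 67, 70, 73]
frequencies = [5, 18, 42, 27, 8]
total = sum(frequencies)                      # sum of f_i = 100

# Direct method: sum(f_i * X_i) / sum(f_i)
direct = sum(f * x for f, x in zip(frequencies, class_marks)) / total

# Working-mean method with A = 67: A + sum(f_i * d_i) / sum(f_i)
A = 67
shortcut = A + sum(f * (x - A) for f, x in zip(frequencies, class_marks)) / total

print(round(direct, 2), round(shortcut, 2))   # 67.45 67.45
```

Choosing A at the class mark of the most frequent class makes the largest term of the sum of fidi vanish, which is what keeps the hand computation short.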
For n given observations Xi, i = 1, 2, . . . , n, the harmonic mean is defined as
\[
H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} \tag{1.5}
\]
Example 1.1.6 In 1980 the population of a town is 300,000. In 1990 a new census reveals it
has risen to 410,000. Estimate the population in 1985.
Solution : If we assume that there was no net immigration or emigration then the birth
rate will depend on the size of the population (exponential growth) so the geometric
mean is appropriate.
\[
G = \sqrt[2]{300{,}000 \times 410{,}000} \approx 350{,}714
\]
■
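The geometric-mean estimate of Example 1.1.6 can be reproduced with the standard library. This is a minimal sketch and the variable names are illustrative.

```python
import math

population_1980 = 300_000
population_1990 = 410_000

# Geometric mean of two values: square root of their product
estimate_1985 = math.sqrt(population_1980 * population_1990)

print(round(estimate_1985, 1))  # 350713.6
```

For comparison, the arithmetic mean of the two census figures is 355,000, which would overstate the mid-decade population under exponential growth.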
Example 1.1.7 An aeroplane travels a distance of 900 miles. If it covers the first third and
the last third of the trip at a speed of 250 mph and the middle third at a speed of 300 mph,
find the average (harmonic) speed.
Solution :
\[
H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} = \frac{3}{\frac{1}{250} + \frac{1}{300} + \frac{1}{250}} = \frac{3}{0.01133} = 264.7
\]
■
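The harmonic speed of Example 1.1.7 can be checked both by applying equation (1.5) directly and with `statistics.harmonic_mean` from the Python standard library. The list name `speeds` is illustrative.

```python
from statistics import harmonic_mean

speeds = [250, 300, 250]   # mph over three equal thirds of the trip

# Equation (1.5): n divided by the sum of reciprocals
manual = len(speeds) / sum(1 / v for v in speeds)

print(round(manual, 1))                  # 264.7
print(round(harmonic_mean(speeds), 1))   # 264.7
```

The harmonic mean is the right average here because each speed applies to an equal distance, not an equal time.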
Definition 1.1.2 The Harmonic mean of frequency and grouped data is given by
\[
H = \frac{n}{\sum_{i=1}^{n} \frac{f}{X_i}} \tag{1.7}
\]
Class     Mark (X)    f    f log(X)    f/X
2 - 4     3           3    1.4314      1
8 - 10    9           1    0.9542      0.1111
\[
\sum f = 10, \qquad \sum f \log X = 6.8717, \qquad \sum \frac{f}{X} = 2.1968
\]
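Solution :
\[
G = \text{Antilog}\left(\frac{\sum f \log X_i}{n}\right) = \text{Antilog}\left(\frac{6.8717}{10}\right) = \text{Antilog}(0.6872) = 4.866
\]
\[
H = \frac{n}{\sum_{i=1}^{n} \frac{f}{X_i}} = \frac{10}{2.1968} = 4.552
\]
■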
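The last step of this solution can be verified from the column totals alone. The sketch below assumes base-10 logarithms, so "Antilog" is taken to mean raising 10 to the given power. It uses only the totals quoted above (n = 10, 6.8717 and 2.1968), since the full table is not reproduced here.

```python
n = 10                 # total frequency, sum of f
sum_f_log_x = 6.8717   # column total of f * log10(X)
sum_f_over_x = 2.1968  # column total of f / X

# Geometric mean: antilog (base 10) of the mean of f * log10(X)
G = 10 ** (sum_f_log_x / n)

# Harmonic mean, equation (1.7): n divided by the sum of f / X
H = n / sum_f_over_x

print(round(G, 3), round(H, 3))  # 4.866 4.552
```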
Example 1.1.9 Calculate Geometric mean, Harmonic mean from the following grouped data
Frequency 2 10 7 5 3 8
Solution : With $n = \sum f = 35$,
\[
G = \text{Antilog}\left(\frac{\sum f \log X_i}{n}\right) = \text{Antilog}(1.0925) = 12.3739
\]
\[
H = \frac{n}{\sum_{i=1}^{n} \frac{f}{X_i}} = \frac{35}{3.7387} = 9.3616
\]
■
5.) The sum of squares of the deviations from the mean is minimal, i.e. the sum of squares
of deviations from the mean is less than the sum of squares of deviations from any
other value $x^\star \neq \bar{X}$:
\[
\sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - x^\star)^2 .
\]
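This minimising property can be illustrated numerically. The sketch below reuses the observations 2, 3, 5, 10 from Example 1.1.2 and compares the sum of squared deviations about the mean with that about a few other centres. The helper `sum_sq_dev` is a hypothetical name, and the inequality is strict only for centres different from the mean.

```python
def sum_sq_dev(data, centre):
    """Sum of squared deviations of the data about a chosen centre."""
    return sum((x - centre) ** 2 for x in data)

data = [2, 3, 5, 10]
mean = sum(data) / len(data)            # 5.0
about_mean = sum_sq_dev(data, mean)     # 9 + 4 + 0 + 25 = 38.0

# Any centre other than the mean gives a strictly larger sum of squares
for other in (2, 3, 4, 6, 10):
    assert about_mean < sum_sq_dev(data, other)

print(about_mean)  # 38.0
```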
odd, and for n even the value of the median is given by the average of the $\frac{n}{2}$th
and $\frac{n+2}{2}$th terms.
2.) Divide total frequency by 2 in order to ascertain the median class and locate this class using
the cumulative frequency column,
3. At the point of intersection with the graph (ogive) draw another perpendicular to the x -
axis (lower class boundaries axis).
Example 1.1.11 Given the frequency distribution table as in the Example 1.1.3, find the
median mark using,
Class      Class mark    f     cf
60 - 62    61             5      5
63 - 65    64            18     23
66 - 68    67            42     65
69 - 71    70            27     92
72 - 74    73             8    100
Median class is 66 − 68 with
\[
l_m = 65.5, \quad c = 3, \quad cf_b = 23, \quad f_m = 42
\]
such that
\[
\text{Median} = l_m + \left[\frac{\frac{N}{2} - cf_b}{f_m}\right] \times c = 65.5 + \frac{50 - 23}{42} \times 3 = 67.429
\]
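The grouped-median computation can be sketched as follows. The function `grouped_median` is a hypothetical helper that walks the cumulative frequencies to find the median class and then applies the interpolation formula used above. The lower class boundaries 59.5, 62.5, ... are an assumption derived from the class limits, consistent with lm = 65.5 for the class 66 − 68.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data: l_m + ((N/2 - cf_b) / f_m) * c."""
    half = sum(frequencies) / 2
    cf_before = 0                       # cumulative frequency below the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if cf_before + f >= half:       # first class whose cf reaches N/2
            return lower + (half - cf_before) / f * width
        cf_before += f

lower_boundaries = [59.5, 62.5, 65.5, 68.5, 71.5]
frequencies = [5, 18, 42, 27, 8]

median = grouped_median(lower_boundaries, frequencies, 3)
print(round(median, 3))  # 67.429
```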
Since 2 occurs most often (14 times), the mode of the given data is 2.
\[
\text{Mode} = l_m + c\,\frac{d_1}{d_1 + d_2}
\]
\[
\text{Mode} = l_m + \frac{(f_m - f_b)}{(f_m - f_b) + (f_m - f_a)} \times c \tag{1.10}
\]
Where $d_1 = f_m - f_b$ and $d_2 = f_m - f_a$.
66 − 68
with
\[
l_m = 65.5, \quad f_m = 42, \quad f_a = 27, \quad f_b = 18, \quad c = 3
\]
Therefore,
\[
\text{Mode} = 65.5 + \frac{(42 - 18)}{(42 - 18) + (42 - 27)} \times 3 = 67.3
\]
Alternatively, it can be estimated from the histogram.
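Equation (1.10) translates directly into a small function. The name `grouped_mode` and its parameter names are illustrative; the arguments follow the symbols lm, c, fm, fb and fa used in the text.

```python
def grouped_mode(lower_boundary, width, f_modal, f_before, f_after):
    """Mode of grouped data, equation (1.10)."""
    d1 = f_modal - f_before    # excess over the class before the modal class
    d2 = f_modal - f_after     # excess over the class after the modal class
    return lower_boundary + d1 / (d1 + d2) * width

# Modal class 66 - 68 of the marks table
print(round(grouped_mode(65.5, 3, 42, 18, 27), 1))    # 67.3

# Modal class (40.5 - 60.5) of Example 1.3.4
print(round(grouped_mode(40.5, 20, 37, 25, 23), 4))   # 49.7308
```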
3.) For grouped data if the modal class happens to be the first or the last class in the distribution
then we estimate the mode as mode = 3(median) - 2 (mean).
4.) The mode can be estimated practically from the histogram. This is done by drawing lines
diagonally from the upper corners of the tallest bar to the upper corner of the adjacent
bars and a perpendicular line is drawn from the point of intersection to the x - axis and
the mode is read from the class boundaries axis.
Interquartile range is the difference between the third quartile and the first quartile.
Example 1.2.1 Consider the monthly salaries of secretaries in a certain organization in dollars
as,
441, 430, 515, 420, 490, 438, 435, 447, 445, 500, 510
Find the quartiles together with the interquartile range.
420, 430, 435, 438, 441, 445, 447, 490, 500, 510, 515
The positions for the quartiles are,
\[
Q_1 = (1)\,\frac{11 + 1}{4} = 3\text{rd observation}
\]
\[
Q_2 = (2)\,\frac{11 + 1}{4} = 6\text{th observation}
\]
\[
Q_3 = (3)\,\frac{11 + 1}{4} = 9\text{th observation}
\]
That is, Q1 is in the third position, Q2 is in the sixth position and Q3 is in the ninth position,
thus
Q1 = 435, Q2 = 445 and Q3 = 500
The interquartile range is given by
Q3 − Q1 = 500 − 435 = 65
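The same quartiles can be obtained with `statistics.quantiles` from the Python standard library (Python 3.8 or later). Its `"exclusive"` method places the ith quartile at position i(n + 1)/4 of the ordered data, which matches the positions used in this example. The variable names are illustrative.

```python
from statistics import quantiles

salaries = [441, 430, 515, 420, 490, 438, 435, 447, 445, 500, 510]

# n=4 asks for the three cut points that split the data into quarters;
# the function sorts the data itself
q1, q2, q3 = quantiles(salaries, n=4, method="exclusive")

print(q1, q2, q3)   # 435.0 445.0 500.0
print(q3 - q1)      # 65.0
```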
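Here we locate the quartile classes with the help of the formula $\frac{i}{4}(n + 1)$, for i = 1, 2, 3, with
the aid of the cumulative frequencies, and then compute
\[
Q_i = l_m + \frac{CF_i - cf_b}{f_w} \times c \tag{1.11}
\]
where $CF_i = \frac{i}{4}\,n$, $cf_b$ is the cumulative frequency of the class before the quartile class, $f_w$ is the
frequency of the quartile class, $l_m$ is its lower class boundary and $c$ is the class width.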
Example 1.2.3 Given the following frequency distribution table use it to find the quartiles
and their interquartile range.
Age of students Number of students (f ) Cumulative Frequency
20 - 24 11 11
25 - 29 24 35
30 - 34 30 65
35 - 39 18 83
40 - 44 11 94
45 - 49 5 99
50 - 54 1 100
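1.) Lower quartile Q1
\[
Q_1 = \frac{1}{4}(n + 1) = \frac{100 + 1}{4} = 25.25\text{th position.}
\]
this implies that the lower quartile class is 25 − 29, with
\[
l_m = 24.5, \quad c = 5, \quad cf_b = 11, \quad CF_1 = \frac{1}{4} \times n = \frac{1}{4}(100) = 25, \quad f_w = 24
\]
\[
Q_1 = 24.5 + \frac{25 - 11}{24} \times 5 = 27.42 \text{ units.}
\]
For Q3,
\[
l_m = 34.5, \quad c = 5, \quad cf_b = 65, \quad CF_3 = \frac{3}{4} \times n = \frac{3}{4}(100) = 75, \quad f_w = 18
\]
\[
Q_3 = 34.5 + \frac{75 - 65}{18} \times 5 = 37.27778 \text{ units.}
\]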
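The grouped quartile computation of Example 1.2.3 can be sketched as below. The helper `grouped_quartile` is hypothetical. As a simplifying assumption it locates the quartile class by comparing the cumulative frequency with CFi = i·n/4 rather than with the position i(n + 1)/4, which selects the same classes for this table. The lower class boundaries are derived from the class limits.

```python
def grouped_quartile(i, lower_boundaries, frequencies, width):
    """i-th quartile of grouped data: l_m + ((CF_i - cf_b) / f_w) * c."""
    target = i * sum(frequencies) / 4   # CF_i
    cf_before = 0                       # cf_b, cumulative frequency below the class
    for lower, f in zip(lower_boundaries, frequencies):
        if cf_before + f >= target:     # first class whose cf reaches CF_i
            return lower + (target - cf_before) / f * width
        cf_before += f

lower_boundaries = [19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
frequencies = [11, 24, 30, 18, 11, 5, 1]

q1 = grouped_quartile(1, lower_boundaries, frequencies, 5)
q3 = grouped_quartile(3, lower_boundaries, frequencies, 5)

print(round(q1, 2), round(q3, 5))   # 27.42 37.27778
print(round(q3 - q1, 2))            # 9.86
```

The last line gives the interquartile range Q3 − Q1 asked for in the example.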
2, 4, 6, 5, 3, 21, 70
have a range of 70 − 2 = 68.
For grouped data the range is given by the difference between the class mark of the last class
interval and the class mark of the first class interval.
Note 1.3.1 The greater the range value the wider the dispersion/spread and vice versa.
60 - 62 61 5
63 - 65 64 18
66 - 68 67 42
69 - 71 70 27
72 - 74 73 8
Though the range has the advantage of easy computation, it has some disadvantages, viz.
1.) It may be misleading if either of the extreme values is an outlier (far smaller or far larger
than other observations).
2.) The range is silent about the arrangement of the observations that fall in between the two
extreme values.
For a given data set Xi, i = 1, 2, . . . , n we define the mean deviation as the average of the
absolute deviations from the mean, given by the formula
\[
\text{Mean Deviation} = S_{\bar{X}} = \frac{1}{n}\sum_{i=1}^{n} \left| X_i - \bar{X} \right| . \tag{1.12}
\]
2.) The Variance is defined as the mean of the squared deviations of individual observations
from their arithmetic mean.
(a) For the ungrouped population, the population variance, denoted by $\sigma^2$, is defined as
\[
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu)^2 \tag{1.13}
\]
where $\mu = \frac{\sum_{i=1}^{n} X_i}{n}$ for a population with n as the total number of observations in the
population. Xi is the ith observation.
(b) For the ungrouped sample, the sample variance denoted as $S^2$ is given by
\[
S^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2 \tag{1.14}
\]
with the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, n is the total number of observations in the
sample with Xi the ith observation.
Equation (1.13) gives the population variance and equation (1.14) gives the sample
variance.
3.) The standard deviation is defined as the positive square root of the variance. Equation
(1.15) gives the population standard deviation for the un-grouped data and equation (1.16)
gives the sample standard deviation for the un-grouped data.
\[
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu)^2} \tag{1.15}
\]
\[
S = \sqrt{\frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2} \tag{1.16}
\]
Note 1.3.2
Example 1.3.3 The figures below show production of a certain product in a Kampala based
factory
98, 99, 99, 100, 100, 100, 101, 101, 102
Find
but
\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{900}{9} = 100
\]
\[
\Rightarrow \text{Mean deviation} = S_{\bar{X}} = \frac{8}{9}
\]
2.) The variance and standard deviation
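For the production figures of Example 1.3.3, the mean deviation, variance and standard deviation can be checked with the sketch below. It applies equations (1.12) to (1.16) directly and reports both the population and the sample versions, since the example does not state which is intended. The variable names are illustrative.

```python
import math

production = [98, 99, 99, 100, 100, 100, 101, 101, 102]
n = len(production)
mean = sum(production) / n                                   # 900 / 9 = 100.0

# Equation (1.12): average absolute deviation from the mean
mean_deviation = sum(abs(x - mean) for x in production) / n  # 8 / 9

# Equations (1.13) and (1.14): squared deviations sum to 12
squared = sum((x - mean) ** 2 for x in production)
population_variance = squared / n                            # 12 / 9
sample_variance = squared / (n - 1)                          # 12 / 8

print(round(mean_deviation, 4))                              # 0.8889
print(round(population_variance, 4), round(math.sqrt(population_variance), 4))  # 1.3333 1.1547
print(sample_variance, round(math.sqrt(sample_variance), 4))                    # 1.5 1.2247
```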
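which can be re-written as any of the following formulae,
\[
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} f_i X_i^2 - \left(\frac{\sum_{i=1}^{n} f_i X_i}{n}\right)^2
\]
\[
\sigma^2 = \frac{1}{n^2}\left[\, n \sum_{i=1}^{n} f_i X_i^2 - \left(\sum_{i=1}^{n} f_i X_i\right)^2 \right]
\]
\[
\sigma^2 = \frac{\sum_{i=1}^{n} f_i X_i^2}{n} - \frac{\left(\sum_{i=1}^{n} f_i X_i\right)^2}{n^2}
\]
(c) Standard Deviation: Given by the square root of the Variance.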
\[
S^2 = \frac{1}{n(n - 1)}\left[\, n \sum_{i=1}^{n} f_i X_i^2 - \left(\sum_{i=1}^{n} f_i X_i\right)^2 \right]
\]
(c) Standard Deviations will respectively be the positive square roots of the Variance.
Example 1.3.4 Consider the following tiny grouped data set in Table below
3.) Mode: The modal class is the class with the highest frequency, (40.5 − 60.5)
\[
\text{Mode} = l_m + \frac{(f_m - f_b)}{(f_m - f_b) + (f_m - f_a)} \times c = 40.5 + \frac{(37 - 25)}{(37 - 25) + (37 - 23)} \times 20 = 49.7308
\]
4.) Lower quartile: $Q_1 = i\,\frac{n + 1}{4} = (1)\frac{98 + 1}{4} = 24.75$th value of the observation in cf
column, so class (20.5 − 40.5), and $CF_1 = \frac{i}{4}\,n = \frac{1}{4} \times 98 = 24.5$
5.) Upper quartile: $Q_3 = i\,\frac{n + 1}{4} = (3)\frac{98 + 1}{4} = 74.25$th value of the observation in cf
column, so class (60.5 − 80.5), and $CF_3 = \frac{i}{4}\,n = \frac{3}{4} \times 98 = 73.5$
\[
S = \sqrt{S^2} = \sqrt{411.6979} = 20.2903
\]
Example 1.3.5 Given the following frequency distribution table from a given sample
\[
S^2 = \frac{1}{n - 1}\sum_{i=1} f_i (X_i - \bar{X})^2 = \frac{852.75}{99} = 8.6136
\]
and Standard deviation $= \sqrt{8.6136} = 2.9349$.
Note 1.3.3 The greater the dispersion in a given data set, the larger will be the value of its
standard deviation and variance. Therefore the standard deviation can be used to compare
the dispersion of two or more data sets.
Exercise 1.1 The systolic blood pressure of seven middle aged ladies were as follows:
Compute their
Exercise 1.2 Six men with high cholesterol participated in a study to investigate the effects
of diet on cholesterol level. At the beginning of the study, their cholesterol levels (mg/dL)
were as follows:
366, 327, 274, 292, 274 and 230
Exercise 1.3 Define the median of the random sample, distinguishing between the two cases
n odd and n even. Show that the median has expected value 1/2 if the random sample is drawn
from a uniform distribution on (0, 1).
Find its variance in the particular case when n is odd. What is the expected value of the
median if the random sample is drawn from a uniform distribution on (a, b)?
Exercise 1.4 Consider the following: Data are Total Patient Care Revenues for a sample of
hospitals in Buddu County (Greater Masaka) Note that,
Compute
Exercise 1.5 Repeat problems in Exercise 1.4 for the grouped data
Heights: 160 - 164 165 - 169 170 - 174 175 - 179 180 -184 185-189
Frequency: 7 11 17 20 16 6
Example 1.3.6 The wheat production (in Kg) of 20 acres is given as:
1120 1240 1320 1040 1080 1200 1440 1360 1680 1730
1785 1342 1960 1880 1755 1720 1600 1470 1750 1885
1040 1080 1120 1200 1240 1320 1342 1360 1440 1470
1600 1680 1720 1730 1750 1755 1785 1880 1885 1960
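1.)
\[
\begin{aligned}
Q_1 &= \tfrac{1}{4}(n + 1)\text{th} = \tfrac{1}{4}(21)\text{th} = 5.25\text{th} = 5\text{th} + 0.25(6\text{th} - 5\text{th}) \\
    &= 1240 + 0.25(1320 - 1240) \\
    &= 1240 + 20 = 1260
\end{aligned}
\]
2.)
\[
\begin{aligned}
Q_3 &= \tfrac{3}{4}(n + 1)\text{th} = \tfrac{3}{4}(21)\text{th} = 15.75\text{th} = 15\text{th} + 0.75(16\text{th} - 15\text{th}) \\
    &= 1750 + 0.75(1755 - 1750) \\
    &= 1750 + 3.75 = 1753.75
\end{aligned}
\]
3.)
\[
\begin{aligned}
Q_2 &= \tfrac{2}{4}(n + 1)\text{th} = \tfrac{2}{4}(21)\text{th} = 10.5\text{th} = 10\text{th} + 0.5(11\text{th} - 10\text{th}) \\
    &= 1470 + 0.5(1600 - 1470) \\
    &= 1470 + 65 = 1535
\end{aligned}
\]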
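The interpolation used in Example 1.3.6 can be written out as a small function. The name `quartile` is a hypothetical helper; it takes the position i(n + 1)/4 and interpolates between the two neighbouring ordered values. It assumes the position falls inside the data, which holds for the 20 values here but can fail for very small samples.

```python
def quartile(sorted_data, i):
    """i-th quartile at position i(n+1)/4, interpolating between neighbours."""
    position = i * (len(sorted_data) + 1) / 4
    k = int(position)                 # whole part: the k-th ordered value (1-based)
    fraction = position - k           # fractional part of the position
    lower = sorted_data[k - 1]
    if fraction == 0:
        return float(lower)
    upper = sorted_data[k]            # the (k+1)-th ordered value
    return lower + fraction * (upper - lower)

wheat = sorted([1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
                1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885])

print(quartile(wheat, 1))   # 1260.0
print(quartile(wheat, 2))   # 1535.0
print(quartile(wheat, 3))   # 1753.75
```

Note that the second quartile works out as 1470 + 65 = 1535.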
1 - 10 8
11 - 20 14
21 - 30 12
31 - 40 9
41 - 50 7
Find
[Q3 − Q1 = 9 − 4 = 5]
Exercise 1.9 In a work study investigation, the times taken by 20 men in a firm to do a
particular job were tabulated as follows:
Frequencies 2 4 6 4 3 1
Example 1.3.7 The mean has one main disadvantage: it is particularly susceptible to the
influence of outliers. These are values that are unusual compared to the rest of the data set
by being especially small or large in numerical value. For example, consider the wages of staff
at a factory below:
Staff     1     2     3     4     5     6     7     8     9     10
Salary    15k   18k   16k   14k   15k   15k   12k   17k   90k   95k
The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that
this mean value might not be the best way to accurately reflect the typical salary of a worker,
as most workers have salaries in the $12k to $18k range. The mean is being skewed by the two
large salaries. Therefore, in this situation, we would like to have a better measure of central
tendency. As we will find out later, taking the median would be a better measure of central
tendency in this situation.
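The effect of the two large salaries can be seen by computing both measures. The sketch below assumes the first salary is 15k, the value consistent with the stated mean of $30.7k and with the $12k to $18k range; salaries are in thousands of dollars and the list name is illustrative.

```python
from statistics import mean, median

# Salaries in thousands of dollars; the first entry is assumed to be 15
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

print(mean(salaries_k))     # 30.7
print(median(salaries_k))   # 15.5
```

The median of 15.5k sits inside the range where most salaries fall, while the mean is pulled up by the two outliers.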
65 , 55 , 89 , 56 , 35 , 14 , 56 , 55 , 87 , 45 , 92
55.5
1.) mean/average 6
2.) range 9
3.) median 6
4.) and mode 5
Example 1.3.9 Consider the aptitude test scores of ten children below:
95, 78, 69, 91, 82, 76, 76, 86, 88, 80
Find the mean, median, and mode.
1.) Mean
Solution :
\[
\bar{X} = \frac{1}{10}(95 + 78 + 69 + 91 + 82 + 76 + 76 + 86 + 88 + 80) = 82.1
\]
■
2.) Median
Solution : First, order the data.
69, 76, 76, 78, 80, 82, 86, 88, 91, 95
With n = 10, the median position is found by $\frac{(10 + 1)}{2} = 5.5$. Thus, the median
is the average of the fifth (80) and sixth (82) ordered values, and the median = 81
■
3.) Mode
Solution : The most frequent value in this data set is 76. Therefore the mode
is 76.
■
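The three answers of Example 1.3.9 can be reproduced with a short sketch. The `median` function below is written out by hand to show the odd and even cases; `Counter` from the standard library is used to find the most frequent value. All names are illustrative.

```python
from collections import Counter

def median(values):
    """Middle ordered value for odd n, average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [95, 78, 69, 91, 82, 76, 76, 86, 88, 80]

mean = sum(scores) / len(scores)
mode = Counter(scores).most_common(1)[0][0]   # value with the highest count

print(mean)             # 82.1
print(median(scores))   # 81.0
print(mode)             # 76
```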
Exercise 1.12
1.) What are measures of central tendency as used in statistics?
2.) Mention any three measures of central tendency you know.
3.) Construct a frequency distribution table for the following figures of weights obtained from
36 Elements of mathematics students in a Ugandan university using a class width of 3 and
starting with the class 56 − 58.
66 70 68 67 71 60
64 70 68 65 64 61
71 66 67 65 68 59
67 65 68 66 69 58
66 65 65 71 70 56
57 60 62 56 59 72
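One way to tally such a table is sketched below, assuming integer weights and inclusive class limits (56 − 58, 59 − 61, and so on). The variable names are illustrative and the sketch is one possible approach, not a prescribed method.

```python
weights = [66, 70, 68, 67, 71, 60,
           64, 70, 68, 65, 64, 61,
           71, 66, 67, 65, 68, 59,
           67, 65, 68, 66, 69, 58,
           66, 65, 65, 71, 70, 56,
           57, 60, 62, 56, 59, 72]

width, start = 3, 56
table = {}                       # maps (lower limit, upper limit) to frequency
lower = start
while lower <= max(weights):
    upper = lower + width - 1    # inclusive upper class limit
    table[(lower, upper)] = sum(lower <= w <= upper for w in weights)
    lower += width

for (lo, hi), f in table.items():
    print(f"{lo} - {hi}   {f}")
```

The frequencies should add up to 36, the number of students, which is a useful check on any hand tally.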