STAT
Prepared by
Lecturer
Department of Mathematics
Syllabus
Books Recommended
Class Routine
Contents
Course Outline
Syllabus
Books Recommended
Class Routine
2.1. Data
3.2. Mean
3.3. Median
3.5. Mode
Chapter 5. Probability
5.1. Probability
Bibliography
Introduction to Statistics
1.3. TYPES OF STATISTICS 2
of numerical data.
statisticians is as follows.
two categories :
data. Constructing charts, tables etc on the basis of numerical data are the
ulation on the basis of a sample chosen from the population. Inferential statis-
tics use summarized data to predict or infer some general statements about the
Example 1.2 The statistics illustrate that the illiteracy rate has been de-
tistics. In singular sense, like an uncountable noun statistics refers to the field
niques, notions which are employed for studying data. In plural perspective,
statistics refers to numerical data which have been collected and analyzed.
causes.
statistics are very wide and increasing day by day. A few examples of different
(a) Business
and so on.
(b) There are certain phenomena or notions, such as beauty, honesty and intelligence, which cannot be quantified. So statistics cannot be applied here.
1.5. CHARACTERISTICS, USAGE AND LIMITATIONS OF STATISTICS 5
conclusion.
the population. Moreover, different surveys run over the same size of
(e) Statistics are collected for a fixed target. So data collected for one
For details on the topics of this chapter, the reader is referred to [2, 4, 5].
2.1. Data
Definition 2.1.1 [4, Subsection 1.6.1 of Page 11] Data are the observations
tions.
into various independent categories and then counting the frequency of the
2.1.4. Sources of data. There are mainly two sources of data, namely
2.1.5. Primary sources. [4, Page 12] These are raw materials of primary
of investigations.
Example 2.3 Data collected from survey, online survey, interview, observa-
2.1.6. Secondary sources. [4, Page 14] Those data which have been
Definition 2.1.2 [4, Page 30] According to Conner, the process of arranging
gives expression to the unity of attributes that may subsist amongst a diversity
of individuals.
2.1.8. Basis of classifying data. [2, Page 26 – 27] Data can be classified
geographic areas.
time.
2.2. LEVEL OF MEASUREMENT 8
variable and frequency. For example, the marks obtained by the EEE
students is frequency.
ment.
Definition 2.2.1 [4, Page 33] Measurement is essentially the task of assigning
level of measurement, the categories differ from one another only in name.
Example 2.5 Colour such as red, white etc; sex such as male, female; nu-
Note 2.1 Measurements involving numbers can also be nominal, for instance student ID, room number, NID number, birth registration number, etc. In these cases the numbers have no numerical value; rather, they are used as identifiers.
Unlike nominal level, here we get the typical relationships higher, lower, more
than, less than, more difficult, less favorable, more prejudiced etc.
Example 2.6 The level of education, such as MS, B.Sc. (Hons.), etc.
Example 2.8 Class performance like outstanding, excellent, very good, good.
2.2.5. Interval level. It includes all properties of the previous two levels.
Example 2.9 The difference between 21°C and 20°C on the centigrade temperature scale is
Note 2.2 In the interval level of measurement, 0 does not mean the absence of anything. For example, a temperature of 0°C does not mean the absence of temperature, 0000 hours does not mean the absence of time, and so on.
defined, though multiplication and division are not defined. For instance, 10°
2024 is not the half of the year 2048 or double of the year 1012 and so on.
ingful value and is the difference between the values is important are ratio level
of measurement.
tiplication and division are defined. For instance, a plant with height 16 feet
Note 2.5 Higher level of measure can be used as a lower level of measure,
called variable.
containing two or more values or categories that can vary from one individual
to another individual.
CUET.
These are :
able does not necessarily involve only integers or whole numbers; it can take
not take all numbers (for example 4.0123) between 4.00 and 5.00. It takes only some fixed numbers for the different grades, like GPA 5.00 for A+, GPA 4.00 for A, GPA 3.75 for A− and so on; a value such as GPA 3.00019 does not occur in the grading system.
Example 2.19 Laptop size, television size (there is no 14.234 inches size
television), Mobile phone size, video quality (there is no video with quality
Note 2.6 A variable taking only integer value is always discrete. For exam-
a discrete variable may not take only integral value. For example, CGPA of a
Note 2.7 A continuous variable can take any value between any two given values; for example, the height of a plant can be any value from 5 metres to 20 metres.
a constant.
Example 2.21 Total angle of a triangle, velocity of light, radius of the earth,
π, e, 32 etc.
referred as attributes.
qualitative) data.
Example 2.22 [1] Suppose that you are looking at the sales data of a clothing store. You may use attributes like colour and size to segment the data and better understand which products are selling well and which are not.
Example 2.23 [4] If someone notes down for each individual whether he/she possesses or does not possess a certain characteristic, like owns a laptop, smokes or does not smoke, holds an opinion on certain political issues - these character-
Definition 2.4.1 (Frequency) [2, Page 30] The repeated times of a value of
tion is a listing of a data set which divides the data in different mutually
tribution can be constructed for both categorical and numerical data. Mainly,
consideration.
represent categorical data we use different types of tables like univariate table,
(i) Choose the category into which the data are to be grouped.
is as follows :
The univariate frequency distribution table of the above data is displayed below
Muslim 5 50
Hindu 1 10
Christian 3 30
Buddhist 1 10
Total 10 100
Table 2.1.
ordinal variables a cross table is used. A cross table with r rows and c columns
r-th row and c-th column in a cross table is called the cell frequency of the (r, c)-th position of that table. In a cross table, usually, the row totals, column totals and percentage comparisons are shown. The totals in the columns and rows are called the marginal frequencies. A cross table that demonstrates
Example 2.25 [4, Table 2.6 of page 43] A contingency table showing rela-
tionship between education level and family size of 50 students is given below
Family size
None 4 6 1 11
Primary 6 8 5 19
higher 6 10 4 20
Column Total 16 24 10 50
Table 2.2.
Note 2.9 A cross table may be of mixed type in terms of variables. One
array first.
But organisation of data using array becomes cumbersome, when the num-
with each value of a variable in one column and its frequency in another col-
1 2 2 3 2
4 3 4 2 1
The ungrouped frequency distribution of the above data is given by Table 2.3.
1 2
2 4
3 2
4 2
Total 10
Table 2.3.
2.6.9.1. Range. Let L be the largest value and S be the smallest value in a
data set. Then L − S is called range of that data set. In other words, range is
2.6. TABULAR REPRESENTATION OF DATA 20
by R.
some chosen groups of appropriate size. Each of these groups is called a class.
bution should be neither too large nor too small. This should be of a reasonable
classes should be in the range 5 to 25 ([2, Page 31]). The choice of actual
number of classes depends on the number of observations and the size of class
interval desired. Sturges, a famous statistician, has proposed the following formula
\[ k = 1 + 3.322 \log_{10} N, \]
2.6.9.4. Class interval. The difference between the lower limit and upper
limit of a class is called the class width or class interval of that class. It is
denoted by w.
According to Sturges' approach, the formula $w = \frac{R}{k}$ determines the approximate class interval.
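The two formulas above can be tried out numerically. The following is a minimal sketch, not a prescribed procedure: the function names are our own, and the figures (50 observations ranging from 25 to 54) are assumed purely for illustration.

```python
import math


def sturges_class_count(n_obs: int) -> float:
    """Suggested number of classes, k = 1 + 3.322 * log10(N)."""
    return 1 + 3.322 * math.log10(n_obs)


def approximate_class_interval(largest: float, smallest: float, k: float) -> float:
    """Approximate class interval w = R / k, where R = largest - smallest."""
    return (largest - smallest) / k


# Assumed illustration: N = 50 observations, smallest value 25, largest value 54.
k = sturges_class_count(50)
w = approximate_class_interval(54, 25, k)
print(f"k = {k:.2f}, w = {w:.2f}")
```

In practice both k and w are usually rounded to convenient values before the classes are formed.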
2.6.9.5. Class-mark. The class-mark is the value that lies in the middle
of working days, number of workers, television size etc are discrete data. For-
Problem 2.1 [4, Example 2.4 of Page 52] The number of days absent of the
5 8 9 9 10 10 10 10 11 11
12 12 12 13 13 13 14 14 14 15
15 15 15 16 16 16 16 17 17 17
17 18 18 18 18 18 19 19 19 19
20 21 21 22 23 24 26 27 29 33.
Note 2.10 For discrete distributions, class limits are always inclusive in
nature.
Problem 2.2 [4, Example 2.5 of Page 55] The ages of 50 workers are given
as follows.
25 33 37 42 45 28 34 37 42 46
29 35 37 42 46 30 35 38 43 46
31 35 38 43 46 32 36 38 43 47
32 36 39 44 50 32 36 40 44 51
33 36 41 44 52 33 37 42 45 54.
size.
Note 2.11 For continuous distributions, class limits are always exclusive in
nature.
the i-th class and N is the total frequency, then the percentage of the cases
\[ P_i = \frac{f_i}{N} \times 100. \]
Note 2.12 $\sum \frac{f_i}{N} \times 100 = 100$.
2.8. GRAPHICAL REPRESENTATION OF DATA 23
present the same through graphs, diagrams or charts. Some of these graphs,
used :
Among these bar and pie diagrams are discussed below as these are com-
are represented by rectangle separated along one of the two axes, namely x-axis
Separation taken along the horizontal axis forms vertical bars, whereas sep-
aration taken along the vertical axis forms horizontal bars. Bar diagram is
Note 2.14 The widths of the bars have no significance, but are taken to
area is proportional to various part into which the whole quantity is divided.
Other data can also be employed to construct a pie chart after suitable and
meaningful classification or grouping of the data. Pie diagrams are also known
as pie chart.
\[ \text{angle} = \frac{\text{Percent value}}{100} \times 360^\circ \]
Note 2.15 $\sum \text{angle} = 360^\circ$.
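The angle formula can be checked with a short computation. The sketch below reuses the frequencies of Table 2.1; the function name `pie_chart_angles` is our own and is only meant as an illustration.

```python
def pie_chart_angles(frequencies: dict) -> dict:
    """Convert category frequencies into pie-chart angles in degrees."""
    total = sum(frequencies.values())
    angles = {}
    for category, frequency in frequencies.items():
        percent = frequency / total * 100       # percentage of the whole
        angles[category] = percent / 100 * 360  # angle = percent / 100 * 360
    return angles


# Frequencies taken from Table 2.1.
religion_counts = {"Muslim": 5, "Hindu": 1, "Christian": 3, "Buddhist": 1}
angles = pie_chart_angles(religion_counts)
for category, angle in angles.items():
    print(category, round(angle, 1))
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic.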
Note 2.17 [4, Page 81] The various parts of the pie chart drawn may be
data are of two types, namely discrete and continuous. These two types of
Beside these, discrete data can be represented by bar charts also [4, Page 91–
92].
(i) Histogram
are taken along horizontal axis and frequencies are taken along the vertical
axis. Drawing a rectangular bar with class boundary as its base and frequency
a class is used and here the area of a bar represents the corresponding class
Note 2.18 In bar diagram, width of a bar is not significant. But in histogram
it is significant.
2.8.10.1. Histogram for equal class interval. The following problem illus-
Problem 2.3 [4, Example 2.39 of page 93] House rent paid (as percentage
of their total income) by 80 urban families revealed the following data (see
Table 2.4).
4.5 – 9.5 08
9.5 – 14.5 29
14.5 – 19.5 27
19.5 – 24.5 12
24.5 – 29.5 04
Table 2.4.
2.8.10.2. Histogram for unequal class interval. The following example il-
Problem 2.4 [4, Example 2.40 of page 94] A frequency distribution (see
4.5 – 14.5 37
14.5 – 19.5 27
19.5 – 29.5 16
Table 2.5.
polygon, frequencies are taken against the mid-value (or class-mark) of a class
make it closed two classes with zero frequency are taken at the top and bottom
Problem 2.5 [4, Example 2.41 of page 95–96] Draw a frequency polygon for
against the upper limit of the class interval respectively along the horizontal
and vertical axis. It is referred to as the ogive. There are two types of ogive
Problem 2.6 Draw an ogive curve for the distribution given in Problem 2.3.
CHAPTER 3
Central Tendency
(a) A central value is useful for describing the position or location of a set
3.1. CENTRAL TENDENCY 29
(b) To compare two or more sets of data or series, a central value is used.
value is used. For example, if it is said that the average life expectancy
of sample data.
(e) In research, averages play a vital role in setting standards, estima-
expression.
(c) It should be based upon all the observations and can be calculated
easily.
of average values. Among them, some most commonly used averages are
(i) Mean
(ii) Median
(iii) Mode
3.2. Mean
3.2.1. Mean and its types. One of the well-known central tendencies is
ferred as mean.
Definition 3.2.1 [4, Definition 3.1 of Page 128] The arithmetic mean is
observations and then dividing this sum by the number of such observations.
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}. \]
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \]
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N}. \]
3.2.3. Arithmetic mean for ungrouped data. Suppose that the values
Problem 3.1 [4, Example 3.2 of Page 130] A sample survey of Bangladesh
Bureau of Statistics in a rural area collected the age of first marriage (AFM)
AFM 11 12 13 14 15 16 17 18 19 20
No. of women 17 28 37 52 70 48 36 23 11 08
Calculate the mean age at first marriage for the women in the sample.
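For data given as values with frequencies, the mean is the frequency-weighted sum divided by the total frequency. The following sketch applies this to the table of Problem 3.1; the function name `frequency_mean` is our own and the code is an illustration rather than the textbook's worked solution.

```python
def frequency_mean(values, frequencies):
    """Arithmetic mean of ungrouped data given as (value, frequency) pairs."""
    total = sum(frequencies)
    return sum(x * f for x, f in zip(values, frequencies)) / total


# Data of Problem 3.1: age at first marriage (AFM) and number of women.
afm = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
women = [17, 28, 37, 52, 70, 48, 36, 23, 11, 8]

mean_afm = frequency_mean(afm, women)
print(f"Total women = {sum(women)}, mean AFM = {mean_afm:.2f}")
```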
mean from grouped distribution, the mid-point of each class is taken as the
representative value of that class. The mid-values of different classes are mul-
tiplied by their respective class frequencies. Then the products are added.
Finally, the sum of the products is divided by the total number of frequen-
cies to get the required arithmetic mean. Suppose that there are k number of
Problem 3.2 [4, Example 3.3 of Page 132] Distribution of a group of workers
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
24.5 - 29.5 27 3 81
Therefore $\bar{y} = \frac{\sum f_i y_i}{\sum f_i} = \frac{1965}{50} = 39.3$.
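The direct method described above (mid-value times frequency, summed and divided by the total frequency) can be sketched as follows for the age distribution of Problem 3.2. The function name `grouped_mean` is our own.

```python
def grouped_mean(class_boundaries, frequencies):
    """Mean of a grouped distribution using class mid-values."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_boundaries]
    total = sum(frequencies)
    return sum(f * y for f, y in zip(frequencies, midpoints)) / total


# Age distribution of Problem 3.2.
boundaries = [(24.5, 29.5), (29.5, 34.5), (34.5, 39.5),
              (39.5, 44.5), (44.5, 49.5), (49.5, 54.5)]
freqs = [3, 9, 15, 12, 7, 4]

mean_age = grouped_mean(boundaries, freqs)
print(round(mean_age, 2))
```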
a constant value a from each of the values of x, then obtain a new set of values
\[ \bar{x} = a + \bar{d}, \]
where $\bar{d} = \frac{d_1 + d_2 + d_3 + \cdots + d_n}{n}$ and $d_i = x_i - a$.
Problem 3.3 [4, Example 3.4 of Page 133] Compute the arithmetic mean
of the values 249, 211, 447, 380, 410 and 190 by suitably changing the origin.
$\bar{x} = a + \bar{d} = 300 + 14.5 = 314.5$.
Answer 314.5.
yi = a + hdi
measurement.
Problem 3.4 [4, Example 3.5 of Page 134] Distribution of a group of workers
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Age (in years)   Mid-value ($y_i$)   Frequency ($f_i$)   $d_i = \frac{y_i - a}{h}$   $f_i d_i$
24.5 - 29.5   27   3   -2   -6
29.5 - 34.5   32   9   -1   -9
34.5 - 39.5   37   15   0   0
39.5 - 44.5   42   12   1   12
44.5 - 49.5   47   7   2   14
49.5 - 54.5   52   4   3   12
Total   –   $\sum f_i = 50$   –   $\sum f_i d_i = 23$
Table 3.2.
Here $a = 37$, $h = 5$ and $\bar{d} = \frac{\sum f_i d_i}{\sum f_i} = \frac{23}{50} = 0.46$. Therefore $\bar{y} = a + h\bar{d} = 37 + 5 \times 0.46 = 39.3$.
only when a and h are chosen wisely and h is uniform throughout the distri-
bution.
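The change of origin and scale used in Problem 3.4 can be sketched in a few lines. The code below assumes the same choices as the problem (a = 37, h = 5); the function name `step_deviation_mean` is our own.

```python
def step_deviation_mean(midpoints, frequencies, a, h):
    """Return (d_bar, mean) using d_i = (y_i - a) / h and mean = a + h * d_bar."""
    total = sum(frequencies)
    deviations = [(y - a) / h for y in midpoints]
    d_bar = sum(f * d for f, d in zip(frequencies, deviations)) / total
    return d_bar, a + h * d_bar


# Mid-values and frequencies of Table 3.2.
mid_values = [27, 32, 37, 42, 47, 52]
class_freqs = [3, 9, 15, 12, 7, 4]

d_bar, mean_value = step_deviation_mean(mid_values, class_freqs, a=37, h=5)
print(round(d_bar, 2), round(mean_value, 2))
```

The result agrees with the direct method because the transformation only shifts and rescales the mid-values.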
Note 3.3 There are many other means as well. For example, pooled mean,
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x} = n\bar{x} - n\bar{x} = 0. \]
Note 3.4 The above theorem implies that the sum of the deviations from an arbitrary value other than the mean is nonzero. So the mid-value of any class of the given distribution can be taken as the assumed mean. But it is more convenient to choose the mid-point of the class nearer to the center of the distribution as
observations $x_{11}, x_{12}, x_{13}, \cdots, x_{1m}$ with mean $\bar{x}_1$ and a second set consisting of $n$ observations $x_{21}, x_{22}, x_{23}, \cdots, x_{2n}$ with mean $\bar{x}_2$, then the combined mean $\bar{x}_c$ of all the $m + n$ observations is
\[ \bar{x}_c = \frac{m\bar{x}_1 + n\bar{x}_2}{m + n}. \]
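The combined mean formula can be verified against a direct computation. In the sketch below the two small data sets are hypothetical and chosen only for illustration; the function name `combined_mean` is our own.

```python
def combined_mean(m, mean_1, n, mean_2):
    """Combined mean of two sets of sizes m and n with means mean_1 and mean_2."""
    return (m * mean_1 + n * mean_2) / (m + n)


# Hypothetical data sets, used only to check the formula.
first_set = [2, 4, 6]
second_set = [10, 20]

mean_1 = sum(first_set) / len(first_set)
mean_2 = sum(second_set) / len(second_set)
pooled = combined_mean(len(first_set), mean_1, len(second_set), mean_2)

# Direct mean of all observations taken together, for comparison.
direct = sum(first_set + second_set) / (len(first_set) + len(second_set))
print(pooled, direct)
```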
deviation from the arithmetic mean is less than the sum of squared deviation
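\begin{align*}
\sum (x_i - a)^2 &= \sum (x_i - \bar{x} + \bar{x} - a)^2 = \sum \left((x_i - \bar{x}) + (\bar{x} - a)\right)^2 \\
&= \sum \left[(x_i - \bar{x})^2 + 2(x_i - \bar{x})(\bar{x} - a) + (\bar{x} - a)^2\right] \\
&= \sum (x_i - \bar{x})^2 + 2(\bar{x} - a)\sum (x_i - \bar{x}) + n(\bar{x} - a)^2, \quad \text{by Theorem A.1.5} \\
&= \sum (x_i - \bar{x})^2 + n(\bar{x} - a)^2, \quad \text{as } \sum (x_i - \bar{x}) = 0 \text{ by Theorem 3.2.2.}
\end{align*}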
Theorem 3.2.6 Arithmetic mean is the most stable measure of central ten-
Theorem 3.2.7 If a and b are constants such that x = a ± by, where x and
\[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i}. \]
\[ G = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{\frac{1}{n}}. \]
Note 3.5 [4, Page 182] Geometric mean is used with numbers that tend
values $x_1, x_2, x_3, \cdots, x_n$ is defined as
\[ H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x_i}}. \]
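The weighted, geometric and harmonic means defined above can be compared on a small example. The values and weights below are hypothetical; the three functions are our own direct transcriptions of the formulas, and the standard library's `statistics.geometric_mean` and `statistics.harmonic_mean` are used only as a cross-check.

```python
import math
import statistics


def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def geometric_mean(values):
    """Geometric mean: n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


def harmonic_mean(values):
    """Harmonic mean: n divided by the sum of reciprocals."""
    return len(values) / sum(1 / x for x in values)


# Hypothetical positive values and weights, for illustration only.
values = [2, 4, 8]
weights = [1, 2, 1]

print(weighted_mean(values, weights))
print(geometric_mean(values), statistics.geometric_mean(values))
print(harmonic_mean(values), statistics.harmonic_mean(values))
```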
Note 3.6 [4, Page 186] Harmonic mean is used when rates are expressed as
x per y and x is a constant. For example, miles per hour, production per acre,
3.3. Median
magnitude.
Definition 3.3.1 [4, definition 3.3 of Page 145] The median, denoted by
m̃, is the value in a set of ordered observations that divides the whole set of
middle of the data set after the values in the set have been placed in the
ordered way.
low.
(c) The median can be computed from distributions with open-ended classes
(d) Unlike the mean, the median can be obtained for all levels of data
given below.
(a) An overall pooled median can not be obtained from a set of medians.
3.3.4. Median for ungrouped data. To find the median for ungrouped
order.
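they are arranged in order of magnitude. Then the median $\tilde{m}$ is given by the formula
\[ \tilde{m} = \begin{cases} \left(\frac{n+1}{2}\right)\text{th observation}, & \text{if } n \text{ is odd} \\[4pt] \dfrac{\frac{n}{2}\text{th observation} + \left(\frac{n}{2} + 1\right)\text{th observation}}{2}, & \text{if } n \text{ is even.} \end{cases} \]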
Example 3.2 Suppose that we are given the ages (in years) of 7 boys as 7,
4, 5, 6, 7, 8, 10, 11.
Here 7 is lying in the middle of the ordered data set. So 7 is the median.
Example 3.3 Suppose that we are given the ages (in years) of eight boys as
Here two values, namely 7 and 8, lie in the middle of the ordered data set. So here the median is $\frac{7 + 8}{2} = 7.5$.
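The odd and even cases of the median formula can be sketched as below. The seven-value list is the ordered data of Example 3.2; the eight-value list is a hypothetical extension, assumed here only to exercise the even case. The function name `median_ungrouped` is our own.

```python
def median_ungrouped(values):
    """Median of ungrouped data following the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # (n + 1) / 2-th observation (1-based), i.e. index (n + 1) // 2 - 1.
        return ordered[(n + 1) // 2 - 1]
    # Average of the n/2-th and (n/2 + 1)-th observations.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


seven_ages = [4, 5, 6, 7, 8, 10, 11]        # ordered data of Example 3.2
eight_ages = [4, 5, 6, 7, 8, 10, 11, 12]    # hypothetical, for the even case

print(median_ungrouped(seven_ages))
print(median_ungrouped(eight_ages))
```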
Problem 3.5 [4, Example 3.17] Result of a survey conducted among 100
families to know their family size produces the following distribution (see Ta-
ble 3.3).
Family size 1 2 3 4 5 6 7 8 9
Number of families 2 6 12 18 19 15 11 11 6
Table 3.3.
Family size 1 2 3 4 5 6 7 8 9
Frequency, fi 2 6 12 18 19 15 11 11 6
$\frac{1(100 + 1)}{2} = 50.5$-th ordered position.
= 5 + 0.5(5 − 5) = 5.
\[ \tilde{m} = l_m + \frac{h}{f_m}\left(\frac{n}{2} - F_{(m)-1}\right), \]
where
Problem 3.6 [4, Example 3.19 of Page 154] Distribution of a group of work-
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Table 3.5.
\[ \tilde{m} = l_m + \frac{h}{f_m}\left(\frac{n}{2} - F_{(m)-1}\right) = 38.83. \]
24.5 - 29.5 3 3
29.5 - 34.5 9 12
34.5 - 39.5 15 27
39.5 - 44.5 12 39
44.5 - 49.5 7 46
49.5 - 54.5 4 50
Total 50 –
Table 3.6.
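The grouped median formula can be applied mechanically once the cumulative frequencies are known. The sketch below uses the distribution of Problem 3.6 and assumes every class has a positive frequency; the function name `grouped_median` is our own.

```python
def grouped_median(classes):
    """Median of a grouped distribution.

    classes: list of (lower_boundary, upper_boundary, frequency) tuples,
    in increasing order, each with a positive frequency (assumed).
    """
    n = sum(f for _, _, f in classes)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:
            h = upper - lower
            return lower + (h / f) * (n / 2 - cumulative)
        cumulative += f
    raise ValueError("the distribution is empty")


# Age distribution of Problem 3.6 (see Table 3.6).
age_classes = [(24.5, 29.5, 3), (29.5, 34.5, 9), (34.5, 39.5, 15),
               (39.5, 44.5, 12), (44.5, 49.5, 7), (49.5, 54.5, 4)]

median_age = grouped_median(age_classes)
print(round(median_age, 2))
```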
Problem 3.7 [4, Example 3.18] The rate of sales tax as a percentage of the
from 0% to 25%. The distribution (see Table 3.7) of the tax payers by sales
intervals of 5%.
00 – 05 75
05 – 10 128
10 – 15 100
15 – 20 68
20 – 25 29
Table 3.7.
3.4. QUARTILES, PERCENTILES AND DECILES 44
Problem 3.8 [4, Page151–152] The longevity (in years) of 40 rats as ob-
1.45 – 1.95 2
1.95 – 2.45 1
2.45 – 2.95 4
2.95 – 3.45 15
3.45 – 3.95 10
3.95 – 4.45 5
4.45 – 4.95 3
Table 3.8. Longevity of rats’ life in years
dian graphically. For detail, go through [4, Page 155 – 156]. Also by ogive
3.4.1. Quartiles. Quartiles are three such values related to a given data
(i) the first quartile, namely Q1 , is the value from which 25% of all obser-
vations are smaller and remaining 75% of all observations are greater.
(ii) the second quartile, namely Q2 , is the value from which 50% of all
greater.
(iii) the third quartile, namely Q3 , is the value from which 75% of all
greater.
3.4.1.1. Quartiles for ungrouped data. The point of location for the quartile value $Q_r$ is the $L_r = \frac{r(n + 1)}{4}$-th ordered position. If
(i) Lr is an integer, then the particular numerical observation correspond-
quartile Qr = V5 + 0.25(V6 − V5 ).
Problem 3.9 [4, Example 3.23 of Page 161] Calculate Q1 , Q2 and Q3 for
the following ordered observations :14, 17, 19, 23, 27, 32, 40, 49, 54, 59, 71,
80.
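The location rule with linear interpolation between neighbouring ordered values can be sketched as follows. The function `quantile_ungrouped` and its `parts` parameter are our own; `parts` is 4 for quartiles, 10 for deciles and 100 for percentiles, so the same sketch also covers the deciles and percentiles treated later. It is an illustration of the rule, not the textbook's worked solution.

```python
def quantile_ungrouped(ordered, r, parts):
    """r-th quantile of ordered data using location L = r * (n + 1) / parts."""
    n = len(ordered)
    location = r * (n + 1) / parts
    if location < 1 or location > n:
        raise ValueError("location falls outside the data")
    k = int(location)            # whole part of the location
    fraction = location - k      # fractional part of the location
    if fraction == 0:
        return ordered[k - 1]    # the k-th ordered value (1-based)
    # Interpolate between the k-th and (k + 1)-th ordered values.
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])


# Ordered observations of Problem 3.9.
observations = [14, 17, 19, 23, 27, 32, 40, 49, 54, 59, 71, 80]
q1 = quantile_ungrouped(observations, 1, 4)
q2 = quantile_ungrouped(observations, 2, 4)
q3 = quantile_ungrouped(observations, 3, 4)
print(q1, q2, q3)
```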
3.4.1.2. Quartiles for grouped data. $Q_r = l_r + \frac{h}{f_r}\left(L_r - F_{(r)-1}\right)$ for $r = 1, 2, 3$, where $L_r = \frac{r(n + 1)}{4}$ is the location of the $r$-th quartile, $F_{(r)-1}$ is the cumulative frequency of the class prior to the $r$-th quartile class, $f_r$ is the frequency of the $r$-th quartile class and $l_r$ is the lower limit of the $r$-th quartile class.
Problem 3.10 [4, Example 3.25 of Page 163] Compute the first and third
quartile from the following age distribution (see Table 3.9) of the 50 workers
of a company.
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Table 3.9.
24.5 - 29.5 3 3
29.5 - 34.5 9 12
34.5 - 39.5 15 27
39.5 - 44.5 12 39
44.5 - 49.5 7 46
49.5 - 54.5 4 50
Total 50 –
Table 3.10.
\[ Q_1 = l_1 + \frac{h}{f_1}\left(L_1 - F_{(1)-1}\right), \]
and $L_1 = \frac{1(50 + 1)}{4} = 12.75$. Observe that 12.75 lies in the class 34.5 − 39.5. So $l_1 = 34.5$, $h = 5$, $f_1 = 15$ and $F_{(1)-1} = 12$. Hence
\[ Q_1 = 34.5 + \frac{5}{15}(12.75 - 12) = 34.75. \]
\[ Q_3 = l_3 + \frac{h}{f_3}\left(L_3 - F_{(3)-1}\right), \]
and $L_3 = \frac{3(50 + 1)}{4} = 38.25$. Observe that 38.25 lies in the class 39.5 − 44.5. So $l_3 = 39.5$, $h = 5$, $f_3 = 12$ and $F_{(3)-1} = 27$. Therefore
\[ Q_3 = 39.5 + \frac{5}{12}(38.25 - 27) = 44.19. \]
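The computation above follows a fixed pattern: find the location, find the class containing it, then interpolate inside that class. A minimal sketch is given below, using the location convention $r(n+1)/\text{parts}$ of these notes. The function `quantile_grouped` and its `parts` parameter are our own (4 for quartiles, 10 for deciles, 100 for percentiles), and positive class frequencies are assumed.

```python
def quantile_grouped(classes, r, parts):
    """r-th quantile of a grouped distribution.

    classes: list of (lower_boundary, upper_boundary, frequency) tuples.
    Uses location L = r * (n + 1) / parts and interpolation inside the class.
    """
    n = sum(f for _, _, f in classes)
    location = r * (n + 1) / parts
    cumulative = 0  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= location:
            h = upper - lower
            return lower + (h / f) * (location - cumulative)
        cumulative += f
    raise ValueError("location falls outside the distribution")


# Age distribution of the 50 workers (Table 3.10).
workers = [(24.5, 29.5, 3), (29.5, 34.5, 9), (34.5, 39.5, 15),
           (39.5, 44.5, 12), (44.5, 49.5, 7), (49.5, 54.5, 4)]

first_quartile = quantile_grouped(workers, 1, 4)
third_quartile = quantile_grouped(workers, 3, 4)
print(round(first_quartile, 2), round(third_quartile, 2))
```

Calling the same function with `parts=100` or `parts=10` gives percentiles and deciles of the same distribution.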
3.4.3. Percentiles. Percentiles are 99 such values related to a given data
3.4.3.1. Percentiles for ungrouped data. The point of location for the percentile value $P_r$ is the $L_r = \frac{r(n + 1)}{100}$-th ordered position. If
(i) Lr is an integer, then the particular numerical observation correspond-
Problem 3.11 [4, Example 3.27 of Page 166] Calculate 29th and 75th per-
centiles for the following ordered observations : 11, 14, 17, 23, 27, 32, 40, 49,
3.4.3.2. Percentiles for grouped data. $P_r = l_r + \frac{h}{f_r}\left(L_r - F_{(r)-1}\right)$ for $r = 1, 2, 3, \cdots, 99$, where $L_r = \frac{r(n + 1)}{100}$ is the location of the $r$-th percentile, $F_{(r)-1}$ is the cumulative frequency of the class prior to the $r$-th percentile class, $f_r$ is the frequency of the $r$-th percentile class and $l_r$ is the lower limit of the $r$-th percentile class.
Problem 3.12 [4, Example 3.28 of Page 167] Compute the 30th percentile
from the following age distribution (see Table 3.11) of the 50 workers of a
company.
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Table 3.11.
3.4.4. Deciles. Deciles are 9 such values related to a given data set of
3.4.4.1. Deciles for ungrouped data. The point of location for the decile value $D_r$ is the $L_r = \frac{r(n + 1)}{10}$-th ordered position. If
V5 = 5-th ordered value and V6 = 6-th ordered value. Then the decile
Dr = V5 + 0.77(V6 − V5 ).
Problem 3.13 [4, Example 3.30 of Page 168] Calculate 4th decile for the
following ordered observations : 14, 17, 23, 27, 32, 40, 49, 54, 59, 71, 80.
3.4.4.2. Deciles for grouped data. $D_r = l_r + \frac{h}{f_r}\left(L_r - F_{(r)-1}\right)$ for $r = 1, 2, 3, \cdots, 9$, where $L_r = \frac{r(n + 1)}{10}$ is the location of the $r$-th decile, $F_{(r)-1}$ is the cumulative frequency of the class prior to the $r$-th decile class, $f_r$ is the frequency of the $r$-th decile class and $l_r$ is the lower limit of the $r$-th decile class.
Problem 3.14 [4, Example 3.31 of Page 169] Age distribution of 50 workers
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Table 3.12.
3.5. Mode
Definition 3.5.1 [4, definition 3.4 of Page 170] The mode is denoted by $M_0$
3.5.3. Mode for grouped distribution. The class in which mode lies
formula
\[ M_0 = l_0 + h\,\frac{\Delta_1}{\Delta_1 + \Delta_2}, \]
where
Definition 3.5.2 If a series of observations has more than one mode, then the
Problem 3.15 [4, Example 3.34] Age distribution of some workers of a com-
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Table 3.13.
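The mode formula for grouped data can be sketched for the distribution of Table 3.13. The code assumes the usual reading of the symbols, which is an assumption on our part: $\Delta_1$ is the modal class frequency minus the frequency of the preceding class, $\Delta_2$ is the modal class frequency minus the frequency of the following class, and a missing neighbour is treated as having frequency 0. The function name `grouped_mode` is our own.

```python
def grouped_mode(classes):
    """Mode of a grouped distribution: M0 = l0 + h * d1 / (d1 + d2).

    classes: list of (lower_boundary, upper_boundary, frequency) tuples.
    Assumes a single modal class that stands out from its neighbours.
    """
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                       # position of the modal class
    lower, upper, f_modal = classes[i]
    f_prev = freqs[i - 1] if i > 0 else 0             # assumed 0 if no class before
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0
    delta_1 = f_modal - f_prev
    delta_2 = f_modal - f_next
    return lower + (upper - lower) * delta_1 / (delta_1 + delta_2)


# Age distribution of Table 3.13.
age_distribution = [(24.5, 29.5, 3), (29.5, 34.5, 9), (34.5, 39.5, 15),
                    (39.5, 44.5, 12), (44.5, 49.5, 7), (49.5, 54.5, 4)]

mode_age = grouped_mode(age_distribution)
print(round(mode_age, 2))
```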
Measure of Dispersion
sion, we mean how the observations or values of a data set are scattered, or
varied or dispersed. For a detailed study the reader is referred to [4, Chapter 4] and for more problems the reader is referred to [3, Chapter 25].
give any idea as to how the individual values differ from the central value,
more specifically, whether they are closely packed around the central value
or widely scattered away from it. The magnitude of such a variation of the
Definition 4.1.1 [2, Page 121] The distance of different individual values
are as follows.
4.1. WHAT IS MEASURE OF DISPERSION 53
the distribution.
the distribution.
(iii) zero (or absent) dispersion indicates the perfect uniformity of the ob-
When all observations of a distribution are identical, then the dispersion be-
To summarise a large data set, that is, to locate the center of data set or
of variation among the observations, that is, when all the observations are
not identical, then to get the precise descriptive summary of a data set both
compare between two distributions. Two different distributions may have ex-
actly the same averages, but they may have differences in variability. For
4.2. CLASSIFICATION OF MEASURES OF DISPERSIONS 54
example, suppose that three students have obtained the following marks (see
1 61 49 40 50 21
2 52 53 45 50 8
3 51 50 49 50 2
Table 4.1.
In Table 4.1 the three distributions are not identical, but their averages are the same. The differences lie in the dispersion of their scores. The first student shows the largest variation in his scores. The second student shows relatively less variation, while the third student shows the least variation in his scores among the three students. The scores of third one secured in
the fact that how the values or the observations of a data set are spread out
following categories :
(i) range
(iv) variance
4.3.1.1. Range for ungrouped data. In case of ungrouped data, if the highest
value and lowest value of a given data set are H and L respectively, then its
range is R = H − L .
4.3.1.2. Range for grouped data. Difference between the upper boundary
4.5 – 14.5 37
14.5 – 24.5 27
24.5 – 34.5 16
Table 4.2.
Example 4.2 Range of the above grouped distribution (see Table 4.2) is
Example 4.3 The coefficient of range of the data set given in Example 4.1 is $C_R = \frac{4}{7 + 3} \times 100\% = 40\%$.
4.4.1. Mean (or average) deviation. Mean deviations are defined for
sample of observations. Then the mean deviation about any arbitrary value $a$ is $M_d(a) = \frac{\sum |x_i - a|}{n}$. If we replace $a$ by
(i) mean $\bar{x}$, then we get $M_d(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n}$, which is called mean deviation about the mean.
(ii) median $\tilde{m}$, then we get $M_d(\tilde{m}) = \frac{\sum |x_i - \tilde{m}|}{n}$, which is called mean deviation about the median.
(iii) mode $M_0$, then we get $M_d(M_0) = \frac{\sum |x_i - M_0|}{n}$, which is called mean deviation about the mode.
Problem 4.1 [6, Episode 10] Calculate the mean deviation of the data set
given below.
1, 2, 3, 4, 5, 6, 7.
Solution $\bar{x} = \frac{1 + 2 + 3 + 4 + 5 + 6 + 7}{7} = 4$.
\[ \sum |x_i - \bar{x}| = 3 + 2 + 1 + 0 + 1 + 2 + 3 = 12. \]
Therefore $M_d(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n} = \frac{12}{7} = 1.714$.
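The solution of Problem 4.1 can be reproduced with a short function. The sketch below is only an illustration; the function name `mean_deviation` and its optional `about` argument (the value from which deviations are measured, the mean by default) are our own.

```python
def mean_deviation(values, about=None):
    """Mean deviation of ungrouped data about a chosen value.

    If `about` is None, deviations are measured from the arithmetic mean.
    """
    n = len(values)
    if about is None:
        about = sum(values) / n
    return sum(abs(x - about) for x in values) / n


data = [1, 2, 3, 4, 5, 6, 7]   # data set of Problem 4.1
md_about_mean = mean_deviation(data)
print(round(md_about_mean, 3))
```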
Food for thought 4.1 Why is the absolute value of (xi − x̄) taken in defi-
Then the mean deviation about any arbitrary value $a$ is $M_d(a) = \frac{\sum f_i |x_i - a|}{n}$. If we replace $a$ by
(i) mean $\bar{x}$, then we get $M_d(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{n}$, which is called mean deviation about the mean.
(ii) median $\tilde{m}$, then we get $M_d(\tilde{m}) = \frac{\sum f_i |x_i - \tilde{m}|}{n}$, which is called mean deviation about the median.
(iii) mode $M_0$, then we get $M_d(M_0) = \frac{\sum f_i |x_i - M_0|}{n}$, which is called mean deviation about the mode.
Note 4.2 [4, Page 213] Among Md (x̄), Md (m̃) and Md (a) the mean deviation
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Table 4.3.
(a) mean
(b) median
Solution Construct the following Table 4.4 and calculate the mean x̄.
4.4. MEAN DEVIATION AND COEFFICIENT OF IT 59
Age (in years) Mid-value (xi ) Frequency (fi ) fi xi |xi − x̄| fi |xi − x̄|
(a) Clearly, $\bar{x} = \frac{\sum f_i x_i}{n} = \frac{1965}{50} = 39.3$. Therefore from the Table 4.4, we obtain that $M_d(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{50} = \frac{274.2}{50} = 5.48$.
(b) From Problem 3.6, we get median $\tilde{m} = 38.83$. By constructing an appropriate table we can calculate $\sum f_i |x_i - \tilde{m}|$. Here
\[ \sum f_i |x_i - \tilde{m}| = \sum f_i |x_i - 38.83| = 272.32. \]
Therefore $M_d(\tilde{m}) = \frac{\sum f_i |x_i - 38.83|}{50} = \frac{272.32}{50} = 5.45$.
(c) By constructing a convenient table we can calculate the required mean deviation about $a = 42$. Here
\[ \sum f_i |x_i - a| = \sum f_i |x_i - 42| = 285. \]
Therefore $M_d(42) = \frac{\sum f_i |x_i - 42|}{50} = \frac{285}{50} = 5.7$.
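The three mean deviations above can be recomputed directly from the mid-values and frequencies of Table 4.3. The following sketch is an illustration only; the function name `grouped_mean_deviation` is our own.

```python
def grouped_mean_deviation(midpoints, frequencies, about):
    """Mean deviation of a grouped distribution about the value `about`."""
    n = sum(frequencies)
    return sum(f * abs(x - about) for x, f in zip(midpoints, frequencies)) / n


# Mid-values and frequencies of the age distribution in Table 4.3.
mid_ages = [27, 32, 37, 42, 47, 52]
age_freqs = [3, 9, 15, 12, 7, 4]

mean_of_ages = sum(f * x for x, f in zip(mid_ages, age_freqs)) / sum(age_freqs)

md_mean = grouped_mean_deviation(mid_ages, age_freqs, mean_of_ages)   # about the mean
md_median = grouped_mean_deviation(mid_ages, age_freqs, 38.83)        # about the median
md_42 = grouped_mean_deviation(mid_ages, age_freqs, 42)               # about a = 42
print(round(md_mean, 2), round(md_median, 2), round(md_42, 2))
```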
the corresponding central location (like mean, median, mode) is called the
4.6. VARIANCE AND COEFFICIENT OF VARIANCE 60
(i) the mean $\bar{x}$, then the coefficient of mean deviation is denoted by $CM_d(\bar{x})$ and defined as $CM_d(\bar{x}) = \frac{M_d(\bar{x})}{\bar{x}} \times 100\%$.
(ii) the median $\tilde{m}$, then the coefficient of mean deviation is denoted by $CM_d(\tilde{m})$ and defined as $CM_d(\tilde{m}) = \frac{M_d(\tilde{m})}{\tilde{m}} \times 100\%$.
(iii) the mode $M_0$, then the coefficient of mean deviation is denoted by $CM_d(M_0)$ and defined as $CM_d(M_0) = \frac{M_d(M_0)}{M_0} \times 100\%$.
third quartile. Then the quantity $Q_3 - Q_1$ is called the inter quartile range and the quantity $\frac{Q_3 - Q_1}{2}$ is called the quartile deviation. Quartile deviation is denoted by $QD$ and is also known as semi-inter quartile range. So
\[ QD = \frac{Q_3 - Q_1}{2}. \]
measure of dispersion.
Definition 4.6.1 [2, Page 125] The positive square root of the mean of the
squared deviations of a set of values from their mean is called the standard
However, if the deviations are measured from any other value a other than
Food for thought 4.2 What is the minimum value of standard deviation?
set of values. Then their standard deviation can be calculated by the formula
\[ \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}, \]
where $\bar{x}$ is the mean.
Theorem 4.6.1 $\sigma = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2}$.
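That the defining formula and the formula of Theorem 4.6.1 give the same value can be checked numerically. The sketch below reuses the data set 1, 2, ..., 7 of Problem 4.1 as an illustration; both function names are our own.

```python
import math


def sd_definition(values):
    """Standard deviation from the definition: sqrt(sum((x - mean)^2) / n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_shortcut(values):
    """Standard deviation via sqrt(sum(x^2) / n - (sum(x) / n)^2)."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


sample = [1, 2, 3, 4, 5, 6, 7]
print(sd_definition(sample), sd_shortcut(sample))
```

The second form avoids computing each deviation separately, which is convenient when the mean is not a round number.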
Theorem 4.6.2 If we consider a as the assumed mean of the given data set
avoid applying the formula given in Theorem 4.6.1. Because if we apply this
2, 3, 4, 5, 6.
Hints Apply the formula $\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$.
\[ \sigma = h \times \sqrt{\frac{\sum f_i d_i^2}{n} - \left(\frac{\sum f_i d_i}{n}\right)^2}, \]
avoid applying the formula given in Theorem 4.6.3. Because if we apply this
Note 4.3 [4, Equation 4.19 of Page 216] In the textbooks for higher studies,
is denoted by σ 2 .
standard deviation.
Theorem 4.6.5 [2, Theorem 2 of Page 134] Variance depends on scale not
on origin.
\[ C_v = \frac{\sigma}{\bar{x}} \times 100\%. \]
Problem 4.6 [6, Episode 11] Sales (in Taka) of 25 days of a company is
Sales (Lakh) 10 - 20 20 - 30 30 - 40 40 - 50 50 - 60
No. of days 03 06 11 03 02
Table 4.5.
Calculate its (a) standard deviation, (b) variance and (c) coefficient of
variance.
Solution Construct the following Table 4.6 and calculate the mean x̄.
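Sales   $x_i$   $f_i$   $f_i x_i$   $(x_i - \bar{x})^2$   $f_i (x_i - \bar{x})^2$
10 – 20   15   03   45   324   972
20 – 30   25   06   150   64   384
30 – 40   35   11   385   4   44
40 – 50   45   03   135   144   432
50 – 60   55   02   110   484   968
Total   –   25   825   –   2800
Table 4.6.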
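Here $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{825}{25} = 33$.
(a) Standard deviation, $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}} = \sqrt{\frac{2800}{25}} = 10.58$.
(b) Variance, $\sigma^2 = \frac{2800}{25} = 112$.
(c) Coefficient of variance, $C_v = \frac{\sigma}{\bar{x}} \times 100\% = \frac{10.58}{33} \times 100\% = 32.06\%$.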
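Alternative method. To solve the problem using the formula
\[ \sigma = h \times \sqrt{\frac{\sum f_i d_i^2}{n} - \left(\frac{\sum f_i d_i}{n}\right)^2}, \]
we construct the following Table 4.7. Let us consider the assumed mean, $a = 35$. Here $h = 10$. By the formula $d_i = \frac{x_i - a}{h}$, we calculate $d_i$.
(a) Standard deviation,
\[ \sigma = h \times \sqrt{\frac{\sum f_i d_i^2}{\sum f_i} - \left(\frac{\sum f_i d_i}{\sum f_i}\right)^2} = 10 \times \sqrt{\frac{29}{25} - \left(\frac{-5}{25}\right)^2} = 10.58. \]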
Sales xi fi di fi di fi d2i
10 – 20 15 03 -2 -6 12
20 – 30 25 06 -1 -6 06
30 – 40 35 11 0 00 00
40 – 50 45 03 1 03 03
50 – 60 55 02 2 04 08
Total – 25 – -5 29
Table 4.7.
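(c) Here $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = 33$. Alternatively,
\[ \bar{x} = a + h\,\frac{\sum f_i d_i}{\sum f_i} = 35 + \frac{-5}{25} \times 10 = 33. \]
Therefore coefficient of variance, $C_v = \frac{\sigma}{\bar{x}} \times 100\% = \frac{10.58}{33} \times 100\% = 32.06\%$.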
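Both methods used in Problem 4.6 can be compared in code. The sketch below takes the mid-values and frequencies of Table 4.7 with the same assumed mean a = 35 and class width h = 10; the function names are our own. Because the code keeps full precision, the coefficient of variance may differ in the last digit from a value computed with the rounded σ = 10.58.

```python
import math

mid_sales = [15, 25, 35, 45, 55]    # mid-values x_i
day_counts = [3, 6, 11, 3, 2]       # frequencies f_i


def grouped_sd_direct(midpoints, frequencies):
    """Return (mean, sigma) using sigma = sqrt(sum(f * (x - mean)^2) / sum(f))."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n
    return mean, math.sqrt(variance)


def grouped_sd_step(midpoints, frequencies, a, h):
    """Sigma via d = (x - a) / h: h * sqrt(sum(f d^2)/n - (sum(f d)/n)^2)."""
    n = sum(frequencies)
    d = [(x - a) / h for x in midpoints]
    sum_fd = sum(f * di for f, di in zip(frequencies, d))
    sum_fd2 = sum(f * di * di for f, di in zip(frequencies, d))
    return h * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


mean_sales, sigma_direct = grouped_sd_direct(mid_sales, day_counts)
sigma_step = grouped_sd_step(mid_sales, day_counts, a=35, h=10)
cv = sigma_direct / mean_sales * 100
print(round(sigma_direct, 2), round(sigma_step, 2), round(cv, 2))
```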
Exercise 4.1 [6, Episode 12] Sales (in kg) of 25 days of a grocery is given
Sales (kg) 01 – 05 06 – 10 11 – 15 16 – 20 21 – 25 26 – 30
No. of days 02 03 05 08 04 03
Table 4.8.
Calculate its (a) mean deviation, (b) standard deviation, (c) variance and
Solution Construct the following Table 4.9 and calculate the mean $\bar{x}$.
Here $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{415}{25} = 16.6$.
(a) Mean deviation, $M_d(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{\sum f_i}$
00.5 – 05.5 03 02 06
05.5 – 10.5 08 03 24
10.5 – 15.5 13 05 65
15.5 – 20.5 18 08 144
20.5 – 25.5 23 04 92
25.5 – 30.5 28 03 84
Total – 25 415 –
Table 4.9.
(b) Standard deviation, $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}} = \sqrt{\frac{-}{-}} = --$.
(c) Variance, $\sigma^2 = (-)^2 = --$.
(d) Coefficient of variance, $C_v = \frac{\sigma}{\bar{x}} \times 100\% = \frac{--}{--} \times 100\% = --\%$.
Note 4.4 Since in Exercise 4.1 it is also asked to find mean deviation and in it $|x_i - \bar{x}|$ is used, we employ the formula $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n}}$ to find the standard deviation and according to the demand of this formula we construct
Problem 4.7 [6, Episode 13] The share prices (in Taka) during first 10 days
DSE 105 120 115 118 130 127 109 110 104 112
CSE 108 117 120 130 100 125 125 120 110 135
Table 4.10.
For Chattogram Stock Exchange (CSE), $\bar{x} = 119$, $\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = 10.09$ and $C_v = \frac{\sigma}{\bar{x}} \times 100\% = 8.48\%$.
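The comparison of the two exchanges can be reproduced from the ten prices listed in Table 4.10. The sketch below uses the standard library's `statistics.pstdev`, the population standard deviation, which divides by n as in the formula of these notes. The function name `coefficient_of_variation` is our own.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variance in percent: population SD / mean * 100."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


# Share prices (in Taka) from Table 4.10.
dse = [105, 120, 115, 118, 130, 127, 109, 110, 104, 112]
cse = [108, 117, 120, 130, 100, 125, 125, 120, 110, 135]

cv_dse = coefficient_of_variation(dse)
cv_cse = coefficient_of_variation(cse)
print(f"DSE: mean = {statistics.mean(dse)}, Cv = {cv_dse:.2f}%")
print(f"CSE: mean = {statistics.mean(cse)}, Cv = {cv_cse:.2f}%")
```

The series with the smaller coefficient of variance is the less variable one relative to its mean.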
Since the coefficient of variance of Dhaka Stock Exchange is less than that
Probability
5.1. Probability
Bibliography
[2] M. Abdul Aziz, Statistics First Paper, Seventh ed., The Angel Publications, Dhaka,
January 2022.
[3] B. S. Grewal, Higher Engineering Mathematics, Forty-third ed., Khanna Publishers,
2015.
[4] M. N. Islam, An Introduction to Statistics and Probability, Fifth ed., Mullick & Brothers,
[5] R. E. Walpole, R. H. Myers, S. L. Myers and K. E. Ye, Probability and Statistics for
youtube.com/playlist?list=PLxSt9YDBipm6UxKzXpeyzSsED94ukGEBJ, 2018.
[7] S. M. Ross, Introduction to Probability and Statistics for Engineers and Scientists, Fifth
APPENDIX A
$x_1 + x_2 + x_3 + \cdots + x_n$ is simply represented by $\sum_{i=1}^{n} x_i$.
\[ \sum_{i=1}^{n} \alpha x_i = \alpha \sum_{i=1}^{n} x_i. \]
Theorem A.1.2 [2, Page 12] If x is a variable and α, β are two constants,
then
\[ \sum_{i=1}^{n} (\alpha x_i - \beta) = \alpha \sum_{i=1}^{n} x_i - n\beta. \]
Theorem A.1.3 [2, Page 12] If x is a variable and α, β, γ are three constants,
then
\[ \sum_{i=1}^{n} \left(\alpha x_i^2 - \beta x_i + \gamma\right) = \alpha \sum_{i=1}^{n} x_i^2 - \beta \sum_{i=1}^{n} x_i + n\gamma. \]
Theorem A.1.4 [2, Page 13] If x, y are two variables and α, β are two
constants, then
\[ \sum_{i=1}^{n} (\alpha x_i - \beta y_i) = \alpha \sum_{i=1}^{n} x_i - \beta \sum_{i=1}^{n} y_i. \]
A.1. SIGMA NOTATION 71
Theorem A.1.5 [2, Page 13] If x, y are two variables and α, β are two
constants, then
\[ \sum_{i=1}^{n} (\alpha x_i - \beta)^2 = \alpha^2 \sum_{i=1}^{n} x_i^2 - 2\alpha\beta \sum_{i=1}^{n} x_i + n\beta^2. \]