STAT

Lecture Notes on Statistics
Prepared by
Muhammad Shahnewaz Bhuyan
Lecturer
Department of Mathematics
Chittagong University of Engineering & Technology
Chottogram - 4349, Bangladesh
Former Lecturer at Sylhet Cadet College, Sylhet-3101, Bangladesh
Started from : November 20, 2023
Last Updated : February 7, 2024

“There are three kinds of lies: lies, damned
lies and statistics”

— British prime minister Benjamin Disraeli
Course Outline
Syllabus
Standard deviation, Variance and coefficient of variation, Correlation and
regression, Probability, Bayes’ theorem, Discrete and random variable,
Probability mass function, Probability density function, Mathematical expectations.
Books Recommended
(1) M. N. Islam, Introduction to Statistics and Probability, Fifth edition,
Mullick & Brothers, Dhaka, June 2022
(2) R. E. Walpole, R. H. Myers, S. L. Myers and K. E. Ye, Probability and
Statistics for Engineers and Scientists, Ninth edition, Pearson, 2022.
(3) S. M. Ross, Introduction to Probability and Statistics for Engineers
and Scientists, Fifth edition, Elsevier, 2014.
Class Routine
Sunday Monday Tuesday Wednesday
09:00 - 09:50 Sec A
09:50 - 10:40 Sec C Sec C
11:00 - 11:50 Sec B Sec B
11:50 - 12:40 Sec A Sec B
12:40 - 01:30 Sec A Sec C
Contents
Course Outline
Syllabus
Books Recommended
Class Routine
Chapter 1. Introduction to Statistics
1.1. Population and Sample
1.2. Definition of Statistics
1.3. Types of Statistics
1.4. Singular and Plural Perspective of Statistics
1.5. Characteristics, Usage and Limitations of Statistics
Chapter 2. Data Summarization and Representation
2.1. Data
2.2. Level of Measurement
2.3. Variable and Attribute
2.4. Frequency Distribution
2.5. Representation of Data
2.6. Tabular Representation of Data
2.7. Other Forms of Frequency Distributions
2.8. Graphical Representation of Data
Chapter 3. Central Tendency
3.1. Central Tendency
3.2. Mean
3.3. Median
3.4. Quartiles, Percentiles and Deciles
3.5. Mode
3.6. Comparison and Relationship between the Averages
Chapter 4. Measure of Dispersion
4.1. What is Measure of Dispersion
4.2. Classification of Measures of Dispersions
4.3. Range and Coefficient of It
4.4. Mean Deviation and Coefficient of It
4.5. Quartile Deviation and Coefficient of It
4.6. Variance and Coefficient of Variance
Chapter 5. Probability
5.1. Probability
Bibliography
Appendix A. Some Theorems Related to Sigma Notation
A.1. Sigma Notation

CHAPTER 1
Introduction to Statistics
For a detailed discussion, the reader is referred to [2, 4, 5].
1.1. Population and Sample
1.1.1. Statistical population and sample. In statistics, the notions of
population and sample are of great importance. Simply put, statistics is the
study of a population through a sample.
Definition 1.1.1 [4, Definition 1.2 of Page 11] A statistical population is
the collection of all items of interest in a particular study.
Definition 1.1.2 [4, Definition 1.2 of Page 11] A sample is a representative
subset (or part) of a population.
1.2. Definition of Statistics
1.2.1. Definition of statistics. Etymologically, statistics means the
number-based description of any event or subject. R. A. Fisher is known as the
father of statistics. Different statisticians have defined statistics
from their own perspectives.
Definition 1.2.1 According to R. A. Fisher, the science of statistics is
essentially a branch of applied mathematics and may be regarded as
mathematics applied to observational data.
Definition 1.2.2 According to Croxton and Cowden, Statistics may be
defined as the science of collection, presentation, analysis and interpretation
of numerical data.
A summarized version of the definitions of statistics given by different
statisticians is as follows.
Definition 1.2.3 [4, Definition 1.1 of Page 9] Statistics is concerned with
scientific methods for collecting, organising, summarising, presenting and
analyzing sample data from a specified population of interest as well as drawing
valid conclusions, and making inferences about the population characteristics
and finally reaching a reasonable decision.
Simply, statistics can be treated as a methodology for collecting, presenting,
analyzing and interpreting data.
1.3. Types of Statistics
1.3.1. Sub-divisions of statistics. Mainly, statistics is sub-divided into
two categories :
(i) descriptive statistics
(ii) inferential statistics.
1.3.2. Descriptive statistics. It includes methods for describing,
organizing or summarizing a data set. It can be referred to as a superficial
study of data. Constructing charts, tables etc. on the basis of numerical data
is part of descriptive statistics.

1.3.3. Inferential statistics. It finds out some characteristics of the
population on the basis of a sample chosen from the population. Inferential
statistics uses summarized data from a study run over a sample to predict or
infer some general statements about the population. It is also referred to as
predictive statistics or inductive statistics.
1.4. Singular and Plural Perspective of Statistics
1.4.1. Singular perspective. In the singular sense, statistics refers to the
field of study as a subject.
Example 1.1 “He is good at statistics.” Here the singular sense is used.
1.4.2. Plural perspective. In the plural sense, statistics refer to information
in terms of numerical data.
Example 1.2 The statistics illustrate that the illiteracy rate has
decreased by 5% in the last two years.
1.4.3. Difference between singular and plural perspective of
statistics. In the singular sense, like an uncountable noun, statistics refers
to the field of study as a whole. This meaning is used to discuss the different
principles, techniques and notions which are employed for studying data. In the
plural sense, statistics refers to numerical data which have been collected and
analyzed.
1.5. Characteristics, Usage and Limitations of Statistics
1.5.1. Characteristics of statistics. The statistics we collect should
possess the following characteristics.

(a) Statistics should deal with aggregate or average of observations of
individuals rather than with individuals alone.
(b) Statistics should be expressed as numerical figures.
(c) Statistics should have the property of being varied by multiplicity of
causes.
(d) Statistics should be of reasonable standards of accuracy.
(e) Statistics should be obtained for pre-determined purposes.
(f) Statistics collected should allow comparison with other data.
1.5.2. Usages and importance of statistics. The scope and usages of
statistics are very wide and increasing day by day. A few examples of different
usages are given below.
(a) Business
(b) Pharmaceutical companies
(c) Nutrition survey
(d) Manufacturing industries
(e) Insurance companies
and so on.
1.5.3. Limitations of statistics. Statistics has a number of limitations,
some of which are given below.
(a) Statistical results are not 100% precise, unlike those of mathematics.
(b) There are certain phenomena or notions like beauty, honesty,
intelligence etc. which cannot be quantified. So statistics cannot be
applied here.
(c) Statistics reveals the average behaviour. So if the decision taken on
an average concept is applied on an individual, it may lead to a wrong
conclusion.
(d) Statistics studies population through sampling. Sampling is generally
used as it is not physically possible to cover all the units constituting
the population. Moreover, different surveys run over samples of the same
size may give different results.
(e) Statistics are collected for a fixed target. So data collected for one
purpose may not be relevant for another purpose.
(f) Statistics studies relationships between two or more variables. Such
a relationship shows the similarity or dissimilarity in the movement of
the variables. But these relationships do not indicate causation.

CHAPTER 2
Data Summarization and Representation
For details of this chapter, the reader is referred to [2, 4, 5].
2.1. Data
Definition 2.1.1 [4, Subsection 1.6.1 of Page 11] Data are the observations
or chance outcomes that occur in planned experiments or scientific
investigations.
2.1.1. Classification of data. In a broad sense, statistical data are
classified into two categories as follows.
(i) Quantitative data
(ii) Qualitative data.
2.1.2. Quantitative data. These are data which can be measured in quantitative units.
Example 2.1 Measurements of weight, height, temperature of a body.
2.1.3. Qualitative data. These are generated by assigning observations
into various independent categories and then counting the frequency of the
occurrences within these categories.
Example 2.2 Eye colour, IQ, religion etc.
2.1.4. Sources of data. There are mainly two sources of data, namely
(a) primary sources
(b) secondary sources
2.1.5. Primary sources. [4, Page 12] These are the raw materials of an
investigation.
Example 2.3 Data collected from surveys, online surveys, interviews,
observations, experiments etc.
2.1.6. Secondary sources. [4, Page 14] These are data which have
already been collected by, and are readily available from, other sources.
Example 2.4 Salary statement collected from a bank.
2.1.7. Classification of data. [2, Page 26] Classification is a process
of arranging the available information into homogeneous groups according to
similarities or same characteristics.
Definition 2.1.2 [4, Page 30] According to Connor, classification is the process
of arranging things in groups or classes according to their resemblances and
affinities, giving expression to the unity of attributes that may subsist
amongst a diversity of individuals.
2.1.8. Basis of classifying data. [2, Page 26 – 27] Data can be classified
based on the following categories :
(a) Geographical classification : data are classified on the basis of
geographic areas.
(b) Chronological classification : data are classified on the basis of
time.
(c) Quantitative classification : data are classified in terms of
magnitude. In this classification there are two important elements, namely
variable and frequency. For example, consider the marks obtained by the EEE
students in statistics. Here marks is the variable and the number of EEE
students obtaining each mark is the frequency.
(d) Qualitative classification : data are classified on the basis of some
attributes. For example, on the basis of religion, sex etc.
2.1.9. Importance of classifying data. [4, Page 31] Summarizing and
classifying data is necessary
(a) to simplify the complex data.
(b) to economize space.
(c) to identify omissions and errors.
(d) to facilitate statistical analysis.
(e) to depict trend.
(f) to use it as future reference.
2.2. Level of Measurement
2.2.1. Measurement. Statistical data are obtained through
measurement.
Definition 2.2.1 [4, Page 33] Measurement is essentially the task of assigning
numbers to observations according to certain rules.

2.2.2. Levels of measurements. The way in which numbers are
assigned to observations determines the scale of measurement. There are four
levels of measurement :
(i) nominal level
(ii) ordinal level
(iii) interval level and
(iv) ratio level.
2.2.3. Nominal level. All qualitative measurements are nominal. In this
level of measurement, the categories differ from one another only in name.
That is, one category of a characteristic is not necessarily higher or lower,
greater or smaller than the other category.
Example 2.5 Colour such as red, white etc; sex such as male, female;
numerals such as bank account number, student registration number etc.
Note 2.1 Measurements involving numbers can be nominal also. For
instance, student ID, room number, NID number, birth registration number etc.
In this case these numbers have no numerical value, rather these are used as
identity.
2.2.4. Ordinal level. There is an ordered relationship among the categories.
Unlike the nominal level, here we get the typical relationships higher, lower,
more than, less than, more difficult, less favorable, more prejudiced etc.
Example 2.6 The level of education such as MS, B.Sc(Hons) etc.
Example 2.7 Official position like Professor, Associate Professor, Assistant
Professor, Lecturer etc.

Example 2.8 Class performance like outstanding, excellent, very good, good.
2.2.5. Interval level. It includes all properties of the previous two levels.
In addition, the difference between values is known and of constant size. In
this measurement level zero is an arbitrary reference point, not a true zero.
Example 2.9 The difference between 21°C and 20°C on the centigrade temperature
scale is the same as that between 13°C and 12°C.
Note 2.2 In interval level of measurement 0 does not mean the absence of
anything. For example, a temperature of 0°C does not mean the absence of
temperature, 0000 hours does not mean the absence of time and so on.
Note 2.3 In interval level of measurement, addition and subtraction are
defined, though multiplication and division are not defined. For instance, a
temperature of 10°C is not double the temperature 5°C, the year
2024 is not half of the year 4048 or double the year 1012 and so on.
2.2.6. Ratio level. Numerical measurements in which zero (0) is a
meaningful value and the difference between the values is important are at the
ratio level of measurement.
Example 2.10 Height, weight, wage etc.
Note 2.4 Unlike the interval level of measurement, in the ratio level of
measurement multiplication and division are defined. For instance, a plant
with height 16 feet is twice as tall as a plant with height 8 feet.

Note 2.5 Data measured at a higher level can be treated as data at a lower
level, but the converse is not true.
2.3. Variable and Attribute
2.3.1. Variable. A characteristic which varies from unit to unit is
called a variable.
Definition 2.3.1 (Variable) [4, Definition 2.5 of Page 36] A variable is
a characteristic or property, often but not always quantitatively measured,
containing two or more values or categories that can vary from one individual
to another individual.
Example 2.11 Fathers’ occupation of the students of EEE Department of
CUET.
Example 2.12 Income, as it varies from person to person.
Example 2.13 Age, as it varies from person to person.
2.3.2. Classification of variables. There are two types of variables mainly.
These are :
(i) qualitative variable and
(ii) quantitative variable.
2.3.3. Qualitative variable. A characteristic of a unit or of an item
that cannot be expressed in numerical form is called a qualitative variable.
It is also referred to as a categorical variable or nominal variable.

Example 2.14 Religion of the students of EEE Department.
Example 2.15 Gender
2.3.4. Quantitative variable. A characteristic of a unit or of an item
that can be expressed in numerical form is called a quantitative variable. This
type of variable is also known as a metric variable.
Example 2.16 Fathers’ monthly income of the students of EEE Department.
2.3.5. Classification of quantitative variables. Quantitative variables
are also divided into two categories :
(i) discrete variable and
(ii) continuous variable.
2.3.6. Discrete variable. A quantitative variable that can take only
some (pre-)specified isolated values is called a discrete variable. A discrete
variable does not necessarily involve only integers or whole numbers; it can
take fractional or decimal values also.
Example 2.17 The grade in the SSC exam is a discrete variable, because it does
not take all numbers (for example 4.0123) between 4.00 and 5.00. It takes only
some fixed numbers for different grades, like GPA 5.00 for A+, GPA 4.00 for A,
GPA 3.75 for A− and so on, but a value such as GPA 3.00019 does not occur.
Example 2.18 Number of road accidents per day in Chottogram.

Example 2.19 Laptop size, television size (there is no 14.234 inches size
television), Mobile phone size, video quality (there is no video with quality
725.4 or in between 720–1080) etc.
Note 2.6 A variable taking only integer values is always discrete. For
example, the number of students in different Departments of CUET. On the other
hand, a discrete variable need not take only integral values. For example, the
CGPA of a student may be 3.71.
2.3.7. Continuous variable. A quantitative variable that can take any
value within a range or limit is called a continuous variable.
Example 2.20 Height and weight of the students of EEE Department.
Note 2.7 A continuous variable can take any value between any two given
values, like the height of a plant can be any value from 5 metres to 20 metres.
2.3.8. Constant. A numerical characteristic that never changes is called
a constant.
Example 2.21 Sum of the angles of a triangle, velocity of light, radius of the
earth, π, e, 32 etc.
2.3.9. Attribute. The distinct categories of a qualitative variable are
referred to as attributes.
Definition 2.3.2 (Attribute) [4, Definition 2.10 of Page 38] An attribute
is a quality, character or characteristic ascribed to someone or something. In
other words, attribute refers to the property or characteristic of categorical
(or qualitative) data.
Attribute data is typically used in conjunction with other forms of data
to provide additional context and insights.
Example 2.22 [1] Suppose that you are looking at the sales data of a clothing
store. You may use attributes like colour and size to segment the data and
better understand which products are selling well and which are not.
Example 2.23 [4] If someone notes down for each individual whether he/she
possesses or does not possess a certain characteristic, like owns a laptop,
smokes or does not smoke, holds an opinion on certain political issues, then
these characteristics may be called attributes.
2.4. Frequency Distribution
2.4.1. Frequency and frequency distribution. Frequency means how
many times an observation occurs.
Definition 2.4.1 (Frequency) [2, Page 30] The number of times a value of
the variable is repeated is called the frequency of that value.
Definition 2.4.2 (Frequency distribution) [2, Page 30] A frequency
distribution is a listing of a data set which divides the data into different
mutually exclusive (non-overlapping) classes and gives a count of the number of
observations in each class.

2.4.2. Types of frequency distribution. [2, Page 30] A frequency
distribution can be constructed for both categorical and numerical data. Mainly,
frequency distributions are of two types :
(i) discrete or ungrouped frequency distribution
(ii) continuous or grouped frequency distribution.
In a discrete frequency distribution, data are represented against the discrete
variable. When every observed value of a variable is listed with its frequency,
an ungrouped frequency distribution is formed. When data are grouped into
classes and organised in a frequency distribution, a grouped frequency
distribution is formed.
2.5. Representation of Data
We need to classify, categorise and summarise data in a convenient and
interpretable form. We can do it by tabular and graphical procedures.
2.5.1. Methods of data representation. Largely there are two ways
to represent data. These are
(i) Tabular representation
(ii) Graphical representation.
The concept of frequency distribution is introduced as a tabular method of
summarising data. Frequency distribution can be demonstrated graphically.
2.6. Tabular Representation of Data
2.6.1. Tabular representation of data. [2, Page 28] Tabular
representation of data is an orderly and logical form of information expressed
in numbers. According to L. R. Connor,

Tabulation involves the orderly and systematic presentation of
numerical data in a form designed to elucidate the problem under
consideration.
2.6.2. Frequency table. A frequency table displays a frequency
distribution. Usually, a frequency distribution is presented in tabular form.
Note 2.8 Frequency distribution may also be represented graphically or by
some rules for pairing a class of observations with its frequency.
2.6.3. Construction of frequency table for categorical data. To
represent categorical data we use different types of tables like univariate table,
bi-variate table, cross table etc.
2.6.4. Univariate table for categorical data. A univariate table
is one in which the frequency distribution of a single categorical variable is
shown. To construct such a table the following steps are followed :
(i) Choose the categories into which the data are to be grouped.
(ii) Sort (or tally) the data into the appropriate categories.
(iii) Count the number of items falling in each category.
(iv) Display the result in a table.
Sometimes in a univariate table a column of percentages is added. It is to be
noted that the percentages must add up to 100.

Example 2.24 Suppose that the religion of 10 students of a specific department
is as follows :
Muslim   Hindu       Christian   Buddhist   Muslim
Muslim   Christian   Christian   Muslim     Muslim
The univariate frequency distribution table of the above data is displayed below
(see Table 2.1).
Religion     Frequency   Percentage
Muslim           5           50
Hindu            1           10
Christian        3           30
Buddhist         1           10
Total           10          100
Table 2.1.
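The four construction steps of Subsection 2.6.4 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name univariate_table is a hypothetical helper, and the data are the ten observations of Example 2.24.

```python
from collections import Counter

# Religion of the 10 students listed in Example 2.24.
religions = [
    "Muslim", "Hindu", "Christian", "Buddhist", "Muslim",
    "Muslim", "Christian", "Christian", "Muslim", "Muslim",
]

def univariate_table(observations):
    """Return (category, frequency, percentage) rows in order of first appearance."""
    counts = Counter(observations)      # steps (i)-(iii): tally each category
    total = sum(counts.values())
    return [(cat, freq, 100 * freq / total) for cat, freq in counts.items()]

rows = univariate_table(religions)      # step (iv): display the result
for category, frequency, percentage in rows:
    print(f"{category:<10} {frequency:>9} {percentage:>10.0f}")
```

The rows agree with Table 2.1, and the percentage column adds up to 100.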
2.6.5. Cross table. To describe the relationship between two categorical or
ordinal variables a cross table is used. A cross table with r rows and c columns
is referred to as an r × c table (which is read as r by c table). The entry in
the i-th row and j-th column of a cross table is called the cell frequency of
the (i, j)-th position of that table. In a cross table, usually, row totals,
column totals and percentage comparisons are shown. The totals in the columns
and rows are called the marginal frequencies. A cross table that demonstrates
the relationship between qualitative data is referred to as a contingency table.

Example 2.25 [4, Table 2.6 of page 43] A contingency table showing the
relationship between education level and family size of 50 students is given
below (see Table 2.2).
                        Family size
Education level   Large   Medium   Small   Row total
None                4        6       1        11
Primary             6        8       5        19
Higher              6       10       4        20
Column total       16       24      10        50
Table 2.2.
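As a small check on Table 2.2, the marginal frequencies can be recomputed from the cell frequencies. The sketch below is illustrative only; the variable names are assumptions and the cell values are copied from the table.

```python
# Cell frequencies of Table 2.2 (rows: education level, columns: family size).
row_labels = ["None", "Primary", "Higher"]
col_labels = ["Large", "Medium", "Small"]
cells = [
    [4, 6, 1],
    [6, 8, 5],
    [6, 10, 4],
]

# Marginal frequencies: sum along each row and along each column.
row_totals = [sum(row) for row in cells]
col_totals = [sum(col) for col in zip(*cells)]
grand_total = sum(row_totals)

for label, total in zip(row_labels, row_totals):
    print(f"{label:<8} row total    = {total}")
for label, total in zip(col_labels, col_totals):
    print(f"{label:<8} column total = {total}")
print("Grand total =", grand_total)
```

The row totals and the column totals both add up to the same grand total, which is the number of students.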
Note 2.9 A cross table may be of mixed type in terms of variables. One
variable may be categorical and another may be numerical.
2.6.6. Multi-variate tables. A table showing relations among three variables
is referred to as a tri-variate table and so on.
2.6.7. Construction of frequency distribution of numerical data.
To construct a frequency table with numerical data, usually we try to form an
array first.
Definition 2.6.1 [4, Definition 2.13] An array is an ordering of the values
of a variable by means of their magnitude usually in ascending order.
But organisation of data using an array becomes cumbersome when the
number of observations is very large. So we arrange the data into a number of
mutually exclusive classes. In such a way we get a decent arrangement of raw
data, which is known to us as a frequency distribution.
2.6.8. Ungrouped frequency distribution. If we construct a table
with each value of a variable in one column and its frequency in another
column, then an ungrouped frequency distribution is formed.
Example 2.26 (Example of ungrouped frequency distribution) The numbers of
family members of 10 EEE students are given below :
1 2 2 3 2
4 3 4 2 1
The ungrouped frequency distribution of the above data is given by
Table 2.3.
Family size   Frequency
1                 2
2                 4
3                 2
4                 2
Total            10
Table 2.3.
2.6.9. Grouped frequency distribution. At this stage we discuss some
terminologies which are used to form a grouped frequency distribution.
2.6.9.1. Range. Let L be the largest value and S be the smallest value in a
data set. Then L − S is called range of that data set. In other words, range is
the difference of the two extreme values of a set of observations. It is denoted
by R.
2.6.9.2. Class. In the process of condensation, raw data are assigned to
some chosen groups of appropriate size. Each of these groups is called a class.
The number of classes, usually denoted by k, in a grouped frequency
distribution should be neither too large nor too small. It should be a
reasonable and suitable figure. According to most statisticians the number of
classes should be in the range 5 to 25 ([2, Page 31]). The choice of the actual
number of classes depends on the number of observations and the size of class
interval desired. Sturges, a famous statistician, proposed the following formula
to determine the approximate number of classes
$k = 1 + 3.322 \log_{10} N$,
where N is the total number of observations or total frequency. Another
proposal of Sturges to determine the number of classes k is as follows.
k should be the smallest whole number such that $2^k \ge N$, where
N is the total number of observations.
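The two rules above for choosing the number of classes can be compared numerically. The following is a minimal sketch; the function names are hypothetical, and N = 50 is chosen only as an illustration.

```python
import math

def classes_by_log_formula(n):
    """Approximate number of classes from k = 1 + 3.322 log10(N)."""
    return 1 + 3.322 * math.log10(n)

def classes_by_power_rule(n):
    """Smallest whole number k such that 2**k >= N."""
    k = 0
    while 2 ** k < n:
        k += 1
    return k

n = 50  # illustrative number of observations
print(round(classes_by_log_formula(n), 2))
print(classes_by_power_rule(n))
```

For N = 50 the logarithmic formula gives about 6.64, while the power rule gives 6, since 2^5 = 32 is less than 50 and 2^6 = 64 is not.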
2.6.9.3. Frequency. The number of observations lying into each class is
called the class frequency, or simply frequency.
2.6.9.4. Class interval. The difference between the lower limit and upper
limit of a class is called the class width or class interval of that class. It is
denoted by w.
According to Sturges' approach, the formula $w = R/k$ gives the
approximate class interval.
2.6.9.5. Class-mark. The class-mark is the value that lies in the middle
of the class. It is obtained by averaging two class boundaries.
2.6.10. Grouped frequency distribution for discrete data. Number
of working days, number of workers, television size etc. are discrete data.
Formation of a frequency distribution for such discrete data is illustrated in
the following problem.
Problem 2.1 [4, Example 2.4 of Page 52] The numbers of days workers were absent
from their work due to illness are as follows.
5 8 9 9 10 10 10 10 11 11
12 12 12 13 13 13 14 14 14 15
15 15 15 16 16 16 16 17 17 17
17 18 18 18 18 18 19 19 19 19
20 21 21 22 23 24 26 27 29 33
Construct a grouped frequency distribution of the above data set.
Solution N = 50. Range R = 33 − 5 = 28. Number of classes
$k = 1 + 3.322 \log_{10} 50 \approx 6.64$; we take k = 6, as $2^6 = 64 \ge 50$.
Class width $w = R/k = 28/6.64 \approx 4.2 \approx 5$ (rounded up to the next
whole number).
Note 2.10 For discrete distributions, class limits are always inclusive in
nature.
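With k = 6 classes of width 5, the tallying step of Problem 2.1 can be sketched as follows. This is a sketch under stated assumptions, not the textbook's table: the first class is assumed to start at the smallest observation (5), the class limits are taken as inclusive as in Note 2.10, and grouped_inclusive is a hypothetical helper name.

```python
# Days absent for the 50 workers of Problem 2.1.
absent_days = [
    5, 8, 9, 9, 10, 10, 10, 10, 11, 11,
    12, 12, 12, 13, 13, 13, 14, 14, 14, 15,
    15, 15, 15, 16, 16, 16, 16, 17, 17, 17,
    17, 18, 18, 18, 18, 18, 19, 19, 19, 19,
    20, 21, 21, 22, 23, 24, 26, 27, 29, 33,
]

def grouped_inclusive(data, start, width, k):
    """Tally integer data into k classes with inclusive limits [lower, upper]."""
    table = []
    for i in range(k):
        lower = start + i * width
        upper = lower + width - 1   # inclusive upper limit for integer data
        freq = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, freq))
    return table

# Assumption: the first class starts at the smallest value, 5.
table = grouped_inclusive(absent_days, start=5, width=5, k=6)
for lower, upper, freq in table:
    print(f"{lower:>2} - {upper:<2} {freq:>3}")
```

Under these assumptions the classes are 5 - 9, 10 - 14, ..., 30 - 34 and the class frequencies add up to N = 50. A different starting point would give different class limits and frequencies.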
2.6.11. Grouped frequency distribution for continuous data. Weight,
age, temperature etc. are continuous data. Formation of a frequency distribution
for such continuous data is illustrated in the following problem.

Correction factor, $F = \dfrac{\text{Lower limit of second class} - \text{Upper limit of first class}}{2}$
Problem 2.2 [4, Example 2.5 of Page 55] The ages of 50 workers are given
as follows.
25 33 37 42 45 28 34 37 42 46
29 35 37 42 46 30 35 38 43 46
31 35 38 43 46 32 36 38 43 47
32 36 39 44 50 32 36 40 44 51
33 36 41 44 52 33 37 42 45 54
Construct a suitable frequency distribution with class width of appropriate
size.
Solution N = 50. Range R = 54 − 25 = 29. Number of classes
$k = 1 + 3.322 \log_{10} 50 \approx 6.64$; we take k = 6, as $2^6 = 64 \ge 50$.
Class width $w = R/k = 29/6 \approx 4.83 \approx 5$ (rounded up to the next
whole number).
Note 2.11 For continuous distributions, class limits are always exclusive in
nature.
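For the ages of Problem 2.2, a tally with exclusive class limits might look like the sketch below. It assumes that the first class starts at the smallest age (25) and that each class contains its lower limit but not its upper limit; grouped_exclusive is a hypothetical helper name.

```python
# Ages of the 50 workers of Problem 2.2.
ages = [
    25, 33, 37, 42, 45, 28, 34, 37, 42, 46,
    29, 35, 37, 42, 46, 30, 35, 38, 43, 46,
    31, 35, 38, 43, 46, 32, 36, 38, 43, 47,
    32, 36, 39, 44, 50, 32, 36, 40, 44, 51,
    33, 36, 41, 44, 52, 33, 37, 42, 45, 54,
]

def grouped_exclusive(data, start, width, k):
    """Tally data into k half-open classes: lower <= x < upper."""
    table = []
    for i in range(k):
        lower = start + i * width
        upper = lower + width       # upper limit is excluded from the class
        freq = sum(lower <= x < upper for x in data)
        table.append((lower, upper, freq))
    return table

# Assumption: the first class starts at the smallest value, 25.
table = grouped_exclusive(ages, start=25, width=5, k=6)
for lower, upper, freq in table:
    print(f"{lower} - {upper}: {freq}")
```

Here an age of exactly 30 falls in the class 30 - 35, not in 25 - 30, which is what the exclusive convention means in practice.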
2.7. Other Forms of Frequency Distributions
2.7.1. Percentage frequency distribution. If $f_i$ is the frequency of
the i-th class and N is the total frequency, then the percentage of the cases
(observations) lying in the i-th class is
$P_i = \dfrac{f_i}{N} \times 100$.
Note 2.12 $\sum \dfrac{f_i}{N} \times 100 = 100$.
2.7.2. Relative frequency distribution. The relative frequency of the i-th
class is $R_i = \dfrac{f_i}{N}$.
Note 2.13 $\sum \dfrac{f_i}{N} = 1$.
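As a quick numerical illustration of the percentage and relative frequencies, the sketch below uses the frequencies of Table 2.3 (family sizes 1 to 4). The variable names are illustrative assumptions.

```python
# Frequencies of family sizes 1, 2, 3, 4 from Table 2.3.
frequencies = [2, 4, 2, 2]
total = sum(frequencies)                              # N, the total frequency

relative = [f / total for f in frequencies]           # f_i / N
percentage = [100 * f / total for f in frequencies]   # (f_i / N) x 100

print(relative)
print(percentage)
print(sum(percentage))
```

The relative frequencies add up to 1 (up to floating-point rounding) and the percentages add up to 100, as stated in the two notes.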
2.7.3. Cumulative frequency distribution. decumulative
2.8. Graphical Representation of Data
In addition to presenting a frequency distribution in tabular form, one can
present the same through graphs, diagrams or charts. Some of these graphs,
diagrams or charts are as follows.
2.8.1. Graphical representation of categorical data. [4, Page 65]
To represent categorical or qualitative data the following diagrams are usually
used :
(i) Bar diagram
(ii) Stacked bar diagram
(iii) Cluster bar diagram
(iv) Pie diagram
(v) Pareto diagram
(vi) Dot plot.
Among these, bar and pie diagrams are discussed below as these are
commonly used at this level.
2.8.2. Bar diagram. In a bar diagram, frequencies against the categories
are represented by rectangles separated along one of the two axes, namely the
x-axis and the y-axis. It is commonly used to represent categorical data. Bar
diagrams are of two types :
(i) vertical bars
(ii) horizontal bars.
Separation taken along the horizontal axis forms vertical bars, whereas
separation taken along the vertical axis forms horizontal bars. A bar diagram is
also known as a bar chart.
Note 2.14 The widths of the bars have no significance, but are taken to
make the chart look attractive.
2.8.3. Pictogram. Attractive visual appearance.
2.8.4. Stacked bar chart. Also known as component bar chart or
sub-divided bar chart.
Example 2.27 [4, Example 2.19 of page 75]
2.8.5. Cluster bar chart. Also known as multiple bar diagram.
Example 2.28 [4, Table 2.21 of page 79]
2.8.6. Pie diagram. A pie diagram is useful to represent categorical data.
A pie diagram consists of a circle subdivided into sectors, whose areas
are proportional to the various parts into which the whole quantity is divided.
Other data can also be employed to construct a pie chart after suitable and
meaningful classification or grouping of the data. Pie diagrams are also known
as pie charts.
$\text{angle} = \dfrac{\text{Percent value}}{100} \times 360^\circ$
Note 2.15 $\sum \text{angle} = 360^\circ$.
Note 2.16 Percentage frequency is used in pie chart.
Note 2.17 [4, Page 81] The various parts of the pie chart drawn may be
identified either by angles in degrees or in percentage values.
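To illustrate the angle formula for a pie diagram, the sketch below converts the percentage column of Table 2.1 into sector angles. The dictionary and variable names are illustrative assumptions.

```python
# Percentage frequencies from Table 2.1.
percentages = {"Muslim": 50, "Hindu": 10, "Christian": 30, "Buddhist": 10}

# angle = (percent value / 100) x 360 degrees
angles = {name: pct * 360 / 100 for name, pct in percentages.items()}

for name, angle in angles.items():
    print(f"{name:<10} {angle:6.1f} degrees")
print("Total:", sum(angles.values()))
```

The sectors come out as 180, 36, 108 and 36 degrees, which add up to 360 degrees.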
2.8.7. Graphical representation of quantitative data. Quantitative
data are of two types, namely discrete and continuous. These two types of
data are represented graphically in different ways.
2.8.8. Graphical representation of discrete data. To represent
discrete data two types of lines are usually used :
(i) Dot lines
(ii) Continuous lines.
Besides these, discrete data can be represented by bar charts also
[4, Page 91–92].
Example 2.29 [4, Example 2.38 of page 92].
2.8.9. Graphical representation of continuous data. [4, Page 93]
Usually in one of the following ways, we represent continuous data :
(i) Histogram
(ii) Frequency polygon
(iii) Ogive curve.
2.8.10. Histogram. This is the most commonly used graph for continuous data.
In a histogram, class boundaries are taken along the horizontal axis and
frequencies are taken along the vertical axis. Each class is represented by a
rectangular bar with the class interval as its base and height proportional to
its frequency. In a histogram, the width of a class is used, and the area of a
bar represents the corresponding class frequency. The proportional frequencies
are often referred to as frequency densities.
Note 2.18 In a bar diagram, the width of a bar is not significant. But in a
histogram it is significant.
2.8.10.1. Histogram for equal class interval. The following problem
illustrates the process of constructing a histogram for equal class intervals.
Problem 2.3 [4, Example 2.39 of page 93] House rent paid (as percentage
of their total income) by 80 urban families revealed the following data (see
Table 2.4).
House rent      No. of families
4.5 – 9.5              8
9.5 – 14.5            29
14.5 – 19.5           27
19.5 – 24.5           12
24.5 – 29.5            4
Table 2.4.
Construct a histogram using the above data.
2.8.10.2. Histogram for unequal class interval. The following example
illustrates the process of constructing a histogram for unequal class intervals.
Problem 2.4 [4, Example 2.40 of page 94] A frequency distribution (see
Table 2.5) with unequal class width is given below.
Class boundary   Frequency
4.5 – 14.5           37
14.5 – 19.5          27
19.5 – 29.5          16
Table 2.5.
Construct a histogram using the above data.
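Since the area of a bar represents the class frequency, one common convention for unequal class widths is to draw each bar with height equal to frequency divided by class width (the frequency density). The sketch below applies this to Table 2.5. It is an assumption about the convention, not necessarily the exact scaling used in [4].

```python
# (lower boundary, upper boundary, frequency) for each class of Table 2.5.
classes = [(4.5, 14.5, 37), (14.5, 19.5, 27), (19.5, 29.5, 16)]

# Frequency density = frequency / class width, so that area = frequency.
densities = [freq / (upper - lower) for lower, upper, freq in classes]

for (lower, upper, freq), density in zip(classes, densities):
    width = upper - lower
    print(f"{lower} - {upper}: width {width}, frequency {freq}, height {density:.2f}")
```

The middle class has the tallest bar even though its frequency is not the largest, because its width is only half that of the other two classes.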
2.8.11. Frequency polygon. This is an alternative to the histogram. In a
frequency polygon, frequencies are taken along the vertical axis against the
mid-values (or class-marks) of the classes along the horizontal axis. It is a
closed figure. To make it closed, two classes with zero frequency are added, one
before the first class and one after the last class of the given frequency
distribution.
Problem 2.5 [4, Example 2.41 of page 95–96] Draw a frequency polygon for
the distribution given in Problem 2.3.
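The points to be joined in the frequency polygon of Problem 2.5 can be computed as in the sketch below. It only prepares the coordinates and does not draw the figure; the variable names are illustrative assumptions.

```python
# Class boundaries and frequencies from Table 2.4 (Problem 2.3).
boundaries = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5]
frequencies = [8, 29, 27, 12, 4]

width = boundaries[1] - boundaries[0]   # equal class width

# Class-mark = average of the two class boundaries.
class_marks = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Add one zero-frequency class at each end so that the polygon closes on the axis.
x_points = [class_marks[0] - width] + class_marks + [class_marks[-1] + width]
y_points = [0] + frequencies + [0]

for x, y in zip(x_points, y_points):
    print(x, y)
```

Joining the points (x, y) in order by straight lines gives the frequency polygon.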
2.8.12. Cumulative frequency polygon. Cumulative frequencies are plotted
along the vertical axis against the upper limits of the class intervals along
the horizontal axis. The resulting curve is referred to as the ogive. There are
two types of ogive, namely the less than type and the greater than type; the
description above is of the less than type, while the greater than type uses
the lower limits of the class intervals.
Problem 2.6 Draw an ogive for the distribution given in Problem 2.3.
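As a hedged illustration for Problem 2.6, the points to be plotted for both types of ogive can be tabulated from Table 2.4 as follows. This is only a sketch of the tabulation step (the plotting itself is left out), and the variable names are assumptions made here.

```python
from itertools import accumulate

# Class boundaries and frequencies from Table 2.4 (Problem 2.3).
boundaries = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5]
frequencies = [8, 29, 27, 12, 4]

# Less than type: cumulative frequency up to each upper boundary.
less_than = list(accumulate(frequencies))
less_than_points = list(zip(boundaries[1:], less_than))

# Greater than type: number of observations at or above each lower boundary.
total = sum(frequencies)
greater_than = [total - c for c in [0] + less_than[:-1]]
greater_than_points = list(zip(boundaries[:-1], greater_than))

print(less_than_points)
print(greater_than_points)
```

Each pair is (class limit, cumulative frequency); joining the pairs of one list in order gives the corresponding ogive.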
CHAPTER 3
Central Tendency
For details the reader is referred to [4].
3.1. Central Tendency
3.1.1. Central Tendency. A measure of central tendency is a numerical
index that attempts to answer the question :
What is the typical value of the observations in this distribution?
It represents the average character of the data. According to Simpson and
Kafka, a measure of central tendency is a typical value around which other
figures congregate. According to D. A. Lind, W. G. Marchal and R. D. Mason,
a measure of central tendency is a single value that summarizes a set of data;
it locates the center of the values. According to Murray R. Spiegel, an average
is a value which is typical or representative of a set of data.
3.1.2. Functions or purposes or usages of central value. There are
many objectives of averaging data. Some of them are as follows :
(a) A central value is useful for describing the position or location of a set
of observations or values. For example, it is very hard to memorize
the weights of all the EEE students of CUET, whereas an average
value in this case expresses their overall weight.
(b) To compare two or more sets of data or series, a central value is used.
For example, to compare the height of male and female students of
CUET we can compare the averages of both groups.
(c) To express an overall condition of a set of data or observations, a central
value is used. For example, if it is said that the average life expectancy
of a newborn in Bangladesh is 60 years, then we easily understand
that a citizen of Bangladesh lives about 60 years on the whole.
(d) Averages help to obtain the picture of a complete universe by means
of sample data.
(e) In research, averages play a vital role in setting standards, making
estimations and taking other managerial decisions.
(f) To trace a mathematical relationship among different groups, an
average becomes essential.
3.1.3. Requisites of a measure of central tendency. The requisites
of a measure of central tendency are given below :
(a) It should not be indefinite or ill-defined. It should be rigidly defined.
(b) It should be easily understandable and should have an overall general
expression.
(c) It should be based upon all the observations and can be calculated
easily.
(d) It should be the representative of the data.
(e) It should be suitable for further algebraic treatment.
(f) It should give a good standard for comparison.
(g) It should be less affected by sampling fluctuations.

3.1.4. Different measures of central tendency. There are several types
of average values. Among them, some most commonly used averages are
(i) Mean
(ii) Median
(iii) Mode
3.2. Mean
3.2.1. Mean and its types. One of the well-known measures of central
tendency is the mean. It is of three types, namely
(i) arithmetic mean
(ii) geometric mean
(iii) harmonic mean.
3.2.2. Arithmetic mean. The arithmetic mean is sometimes simply referred
to as the mean.
Definition 3.2.1 [4, Definition 3.1 of Page 128] The arithmetic mean is
an average or central value of a set of observations obtained by summing these
observations and then dividing this sum by the number of such observations.
The arithmetic mean $\bar{x}$ of the $n$ real numbers $x_1, x_2, x_3, \cdots, x_n$ is
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}. \]
Arithmetic mean may be positive, negative or zero.
Example 3.1 If the marks of six students in mathematics are 3, 5, 9, 8, 5
and 3, then the arithmetic mean of their marks is
\[ \frac{3+5+9+8+5+3}{6} = \frac{33}{6} = 5.5. \]
Remark 1 (Sample mean versus population mean) Suppose that a sample
of size $n$ is chosen from a population of size $N$ and $x_i$ denotes the $i$-th
observation. Then the sample mean is denoted by $\bar{x}$ and defined as
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \]
whereas the population mean is denoted by $\mu$ and defined as
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N}. \]
3.2.3. Arithmetic mean for ungrouped data. Suppose that the values
$x_1, x_2, x_3, \cdots, x_n$ of the variable $x$ occur respectively
$f_1, f_2, f_3, \cdots, f_n$ times. Then the formula for the arithmetic mean
takes the form
\[ \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}, \]
or simply $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$.
A dummy table for computing arithmetic mean of ungrouped distribution
is in [4, Table 3.1 of Page 130].
Problem 3.1 [4, Example 3.2 of Page 130] A sample survey of Bangladesh
Bureau of Statistics in a rural area collected the age of first marriage (AFM)
in years of 330 newly married women. The distribution was as follows.
AFM 11 12 13 14 15 16 17 18 19 20
No. of women 17 28 37 52 70 48 36 23 11 08
Calculate the mean age at first marriage for the women in the sample.
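The formula $\bar{x} = \sum f_i x_i / \sum f_i$ can be applied to the data of Problem 3.1 directly. The lines below are a minimal Python sketch of that calculation; the variable names are chosen here for illustration.

```python
# Age at first marriage (AFM) and number of women, from Problem 3.1.
afm = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
women = [17, 28, 37, 52, 70, 48, 36, 23, 11, 8]

total_women = sum(women)                               # sum of f_i
weighted_sum = sum(f * x for f, x in zip(women, afm))  # sum of f_i * x_i
mean_afm = weighted_sum / total_women

print(total_women, weighted_sum, round(mean_afm, 2))   # 330 4945 14.98
```

So the mean age at first marriage in this sample is about 14.98 years.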
3.2.4. Arithmetic mean for grouped data. To compute the arithmetic
mean from a grouped distribution, the mid-point of each class is taken as the
representative value of that class. The mid-values of the different classes are
multiplied by their respective class frequencies. Then the products are added.
Finally, the sum of the products is divided by the total frequency to get the
required arithmetic mean. Suppose that there are $k$ classes and
$y_1, y_2, y_3, \cdots, y_k$ are the mid-values of the classes with frequencies
$f_1, f_2, f_3, \cdots, f_k$ respectively. Then the formula for the arithmetic
mean takes the form
\[ \bar{y} = \frac{\sum f_i y_i}{\sum f_i}. \]
Problem 3.2 [4, Example 3.3 of Page 132] Distribution of a group of workers
by their age (in years) is as follows.
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Calculate their mean age.
Solution Let us produce the following Table 3.1.
Age (in years)   Mid-value ($y_i$)   Frequency ($f_i$)   $f_i y_i$
24.5 - 29.5      27                  3                   81
29.5 - 34.5      32                  9                   288
34.5 - 39.5      37                  15                  555
39.5 - 44.5      42                  12                  504
44.5 - 49.5      47                  7                   329
49.5 - 54.5      52                  4                   208
Total            –                   $\sum f_i = 50$     $\sum f_i y_i = 1965$
Table 3.1.
Therefore
\[ \bar{y} = \frac{\sum f_i y_i}{\sum f_i} = \frac{1965}{50} = 39.3. \]
Answer 39.3 years.
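The grouped-data calculation of Problem 3.2 can be checked with a few lines of Python. This is only a sketch; the function name `grouped_mean` and its arguments are assumptions made for this illustration.

```python
def grouped_mean(boundaries, frequencies):
    """Arithmetic mean of a grouped distribution using class mid-values."""
    # Mid-value of each class = average of its two class boundaries.
    mid_values = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    total = sum(frequencies)
    return sum(f * y for f, y in zip(frequencies, mid_values)) / total

# Class boundaries and frequencies from Problem 3.2.
boundaries = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]
frequencies = [3, 9, 15, 12, 7, 4]

mean_age = grouped_mean(boundaries, frequencies)
print(round(mean_age, 1))  # 39.3
```

The result agrees with $1965/50 = 39.3$ years.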
3.2.5. Short-cut method to find arithmetic mean of ungrouped
data. Suppose that a set of $n$ values $x_1, x_2, x_3, \cdots, x_n$ is given.
If we subtract a constant value $a$ from each of the values of $x$, then we
obtain a new set of values $d_1, d_2, d_3, \cdots, d_n$. Then the formula to
calculate the mean for the ungrouped distribution takes the form
\[ \bar{x} = a + \frac{\sum d_i}{n}, \]
where $a$ is called the origin factor, or assumed mean, or provisional mean.
This formula is written as
\[ \bar{x} = a + \bar{d}, \]
where $\bar{d} = \frac{d_1 + d_2 + d_3 + \cdots + d_n}{n}$ and $d_i = x_i - a$.
Problem 3.3 [4, Example 3.4 of Page 133] Compute the arithmetic mean
of the values 249, 211, 447, 380, 410 and 190 by suitably changing the origin.
Solution Let the origin be $a = 300$. Now we compute $d_i$ as follows.
$x_i$               249   211   447   380   410   190
$d_i = x_i - a$     -51   -89   147    80   110  -110
Here $\sum d_i = 87$. So $\bar{d} = \frac{\sum d_i}{n} = \frac{87}{6} = 14.5$. Therefore
\[ \bar{x} = a + \bar{d} = 300 + 14.5 = 314.5. \]
Answer 314.5.
3.2.6. Short-cut method to find arithmetic mean of grouped data.
Let $y$ be a quantitative variable taking on values $y_1, y_2, y_3, \cdots, y_k$,
the corresponding class frequencies being $f_1, f_2, f_3, \cdots, f_k$, and let
$n = \sum f_i$. Suppose that $d$ is a new variable taking on the values
$d_1, d_2, d_3, \cdots, d_k$ such that $d_i = \frac{y_i - a}{h}$, where $a$ is an
arbitrary value and $h$ is a common class width. Then
\begin{align*}
y_i &= a + h d_i \\
\text{or, } f_i y_i &= a f_i + h f_i d_i \\
\text{or, } \sum f_i y_i &= a \sum f_i + h \sum f_i d_i \\
\text{or, } \sum f_i y_i &= a n + h \sum f_i d_i \\
\text{or, } \frac{\sum f_i y_i}{n} &= a + h \frac{\sum f_i d_i}{n} \\
\therefore \ \bar{y} &= a + h \frac{\sum f_i d_i}{n} = a + h \bar{d}.
\end{align*}
The above derivation proves the following theorem.
Theorem 3.2.1 Arithmetic mean is dependent on both origin and scale of
measurement.
Problem 3.4 [4, Example 3.5 of Page 134] Distribution of a group of workers
by their age (in years) is as follows.
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Calculate their mean age by short-cut method.

Age (in years)   Mid-value ($y_i$)   Frequency ($f_i$)   $d_i = \frac{y_i - a}{h}$   $f_i d_i$
24.5 - 29.5      27                  3                   -2                          -6
29.5 - 34.5      32                  9                   -1                          -9
34.5 - 39.5      37                  15                  0                           0
39.5 - 44.5      42                  12                  1                           12
44.5 - 49.5      47                  7                   2                           14
49.5 - 54.5      52                  4                   3                           12
Total            –                   $\sum f_i = 50$     –                           $\sum f_i d_i = 23$
Table 3.2.
Here $a = 37$, $h = 5$ and $\bar{d} = \frac{\sum f_i d_i}{\sum f_i} = \frac{23}{50} = 0.46$. Therefore
\[ \bar{y} = a + h\bar{d} = 37 + 5 \times 0.46 = 39.3. \]
Answer 39.3 years.
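The short-cut (coding) method of Problem 3.4 can be sketched in Python as below. The function name `shortcut_mean` is hypothetical, and the second call with $a = 42$ is an extra illustration, not part of [4], showing that the result does not depend on the choice of the assumed mean.

```python
def shortcut_mean(mid_values, frequencies, a, h):
    """Mean by the coding method: y_bar = a + h * d_bar, d_i = (y_i - a) / h."""
    d = [(y - a) / h for y in mid_values]
    d_bar = sum(f * di for f, di in zip(frequencies, d)) / sum(frequencies)
    return a + h * d_bar

# Mid-values and frequencies from Table 3.2.
mid_values = [27, 32, 37, 42, 47, 52]
frequencies = [3, 9, 15, 12, 7, 4]

mean_coded = shortcut_mean(mid_values, frequencies, a=37, h=5)
mean_other_origin = shortcut_mean(mid_values, frequencies, a=42, h=5)
print(round(mean_coded, 1), round(mean_other_origin, 1))
```

Both calls give 39.3 up to rounding; only the size of the intermediate numbers changes with the choice of $a$.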
Note 3.1 The short-cut method is also referred to as the coding method.
Note 3.2 (Drawback of short-cut method) Short-cut method is rewarding
only when a and h are chosen wisely and h is uniform throughout the distri-
bution.
Note 3.3 There are many other means as well, for example, the pooled mean
and the weighted mean.
3.2.7. Properties of arithmetic mean. The next theorems state some
properties of arithmetic mean.

Theorem 3.2.2 The algebraic sum of the deviations of the values x1 , x2 , x3 ,
· · · , xn from their arithmetic mean x̄ is zero.

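Proof. We have $\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$. So $n\bar{x} = \sum_{i=1}^{n} x_i$.
Again, $\sum_{i=1}^{n} \bar{x} = n\bar{x}$. Therefore
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x} = n\bar{x} - n\bar{x} = 0. \]
Hence the statement.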
Note 3.4 The above theorem concerns deviations from the mean itself; the sum
of the deviations from any value other than the mean is nonzero. In the
short-cut method the mid-value of any class of the given distribution can be
taken as the assumed mean. But it is more convenient to choose the mid-point
of a class near the center of the distribution that has a relatively large
frequency.
Theorem 3.2.3 The arithmetic mean is sensitive to extreme value.
Theorem 3.2.4 (Pooled mean or combined mean) If a set of $m$ observations
$x_{11}, x_{12}, x_{13}, \cdots, x_{1m}$ has mean $\bar{x}_1$ and a second set
of $n$ observations $x_{21}, x_{22}, x_{23}, \cdots, x_{2n}$ has mean
$\bar{x}_2$, then the combined mean $\bar{x}_c$ of all the $m + n$ observations is
\[ \bar{x}_c = \frac{m\bar{x}_1 + n\bar{x}_2}{m + n}. \]
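The pooled mean formula can be checked numerically. In the sketch below the first group reuses the marks of Example 3.1, while the second group is made-up data used purely for illustration; the function name `pooled_mean` is likewise an assumption.

```python
def pooled_mean(m, mean1, n, mean2):
    """Combined mean of two groups of sizes m and n with means mean1 and mean2."""
    return (m * mean1 + n * mean2) / (m + n)

group1 = [3, 5, 9, 8, 5, 3]   # marks from Example 3.1
group2 = [4, 6, 10, 8]        # hypothetical second group, for illustration only

mean1 = sum(group1) / len(group1)
mean2 = sum(group2) / len(group2)

combined = pooled_mean(len(group1), mean1, len(group2), mean2)
direct = sum(group1 + group2) / (len(group1) + len(group2))
print(combined, direct)
```

The pooled mean computed from the two group means coincides with the mean computed directly from all ten observations.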
Theorem 3.2.5 (Least-square property of the mean) The sum of squared
deviation from the arithmetic mean is less than the sum of squared deviation
from any other value.
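Proof. Let $\bar{x}$ be the arithmetic mean of a set of observations
$x_1, x_2, x_3, \cdots, x_n$ and $a$ be any constant. We have to show that
$\sum (x_i - \bar{x})^2 \le \sum (x_i - a)^2$.
We can write that
\begin{align*}
\sum (x_i - a)^2 &= \sum (x_i - \bar{x} + \bar{x} - a)^2 = \sum \left( (x_i - \bar{x}) + (\bar{x} - a) \right)^2 \\
&= \sum \left( (x_i - \bar{x})^2 + 2(x_i - \bar{x})(\bar{x} - a) + (\bar{x} - a)^2 \right) \\
&= \sum (x_i - \bar{x})^2 + 2(\bar{x} - a) \sum (x_i - \bar{x}) + n(\bar{x} - a)^2, \quad \text{by Theorem A.1.5} \\
&= \sum (x_i - \bar{x})^2 + n(\bar{x} - a)^2, \quad \text{as } \sum (x_i - \bar{x}) = 0 \text{ by Theorem 3.2.2.}
\end{align*}
Since $n(\bar{x} - a)^2 \ge 0$, it follows that
$\sum (x_i - \bar{x})^2 \le \sum (x_i - a)^2$.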
Theorem 3.2.6 Arithmetic mean is a more stable measure of central tendency
than any other measure.
Theorem 3.2.7 If a and b are constants such that x = a ± by, where x and
y are two variables assuming the values x1 , x2 , x3 , · · · , xn and y1 , y2 , y3 , · · · ,
yn respectively, then x̄ = a ± bȳ.
Theorem 3.2.8 The arithmetic mean $\bar{x}$ of the first $n$ natural numbers is
$\bar{x} = \frac{n+1}{2}$.
3.2.8. Weighted mean. The weighted arithmetic mean or simply
weighted mean, denoted by $\bar{x}_w$, of a set of observations
$x_1, x_2, x_3, \cdots, x_n$ with respective weights
$w_1, w_2, w_3, \cdots, w_n$ is defined as
\[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i}. \]
3.2.9. Geometric mean. The geometric mean of $n$ non-zero positive
values $x_1, x_2, x_3, \cdots, x_n$ is denoted by $G$ and defined as
\[ G = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{\frac{1}{n}}. \]
Note 3.5 [4, Page 182] Geometric mean is used with numbers that tend
to increase or decrease geometrically rather than arithmetically, that is, each
number is same multiple of the preceding number.
3.2.10. Harmonic mean. The harmonic mean, denoted by $H$, of $n$
values $x_1, x_2, x_3, \cdots, x_n$ is defined as
\[ H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x_i}}. \]
Note 3.6 [4, Page 186] Harmonic mean is used when rates are expressed as
x per y and x is a constant. For example, miles per hour, production per acre,
income per household etc.
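To compare the three means on the same numbers, the definitions above can be written out in Python. This is a minimal sketch; the data set 2, 4, 8 is hypothetical and was chosen because each value is the same multiple of the preceding one.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # n-th root of the product; the values must be positive.
    return math.prod(values) ** (1 / len(values))

def harmonic_mean(values):
    # n divided by the sum of reciprocals; the values must be non-zero.
    return len(values) / sum(1 / x for x in values)

values = [2, 4, 8]  # hypothetical data, for illustration only

am = arithmetic_mean(values)
gm = geometric_mean(values)
hm = harmonic_mean(values)
print(round(am, 3), round(gm, 3), round(hm, 3))
```

For these values the geometric mean equals the middle term 4 (up to floating-point rounding), while the arithmetic mean is larger and the harmonic mean is smaller.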
3.3. Median
3.3.1. Median. Like the mean, the median is also a measure of central
tendency. In general, the median is the middle-most observation when the
observations or set of values of a particular study are arranged in ascending
or descending order of magnitude.
Definition 3.3.1 [4, definition 3.3 of Page 145] The median, denoted by
m̃, is the value in a set of ordered observation that divides the whole set of
observations into two parts of equal size.

The median is sometimes referred to as the positional average, as it lies in
the middle of the data set after the values in the set have been placed in
order.
3.3.2. Properties of median. Some properties of median are given be-
low.
(a) The median is unique.
(b) It is unaffected by the extremely large or small values.
(c) The median can be computed from distributions with open-end classes,
for example, less than 10, 25 or more, etc.
(d) Unlike the mean, the median can be obtained for all levels of data
except the nominal.
3.3.3. Disadvantages of median. Some disadvantages of median are
given below.
(a) An overall pooled median can not be obtained from a set of medians.
(b) Medians are less stable.
(c) Median can not be calculated for the nominal data.
3.3.4. Median for ungrouped data. To find the median for ungrouped
data, the given data set should first be arranged in ascending or descending
order.
Let $x_1, x_2, x_3, \cdots, x_n$ be a data set with $n$ observations. Suppose
that they are arranged in order of magnitude. Then the median $\tilde{m}$ is
given by the formula
\[ \tilde{m} = \begin{cases} \left(\dfrac{n+1}{2}\right)\text{-th observation}, & \text{if } n \text{ is odd} \\[2ex] \dfrac{\dfrac{n}{2}\text{-th observation} + \left(\dfrac{n}{2} + 1\right)\text{-th observation}}{2}, & \text{if } n \text{ is even.} \end{cases} \]
Example 3.2 Suppose that we are given the ages (in years) of 7 boys as 7,
10, 5, 8, 4, 11 and 6. Arranging them in ascending order,
4, 5, 6, 7, 8, 10, 11.
Here 7 is lying in the middle of the ordered data set. So 7 is the median.
Example 3.3 Suppose that we are given the ages (in years) of eight boys as
7, 10, 5, 8, 4, 11, 12 and 6. Arranging them in ascending order,
4, 5, 6, 7, 8, 10, 11, 12.
Here two values, namely 7 and 8, are lying in the middle of the ordered data
set. So here the median is $\frac{7+8}{2} = 7.5$.
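The odd and even cases of the median formula can be sketched in Python as follows, using the ages of Examples 3.2 and 3.3. The function name `median` is chosen here for illustration; Python lists are indexed from 0, so the $k$-th ordered observation is `ordered[k - 1]`.

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # n odd: the (n + 1)/2-th observation (index middle, counting from 0).
        return ordered[middle]
    # n even: average of the n/2-th and (n/2 + 1)-th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2

ages_odd = [7, 10, 5, 8, 4, 11, 6]        # Example 3.2
ages_even = [7, 10, 5, 8, 4, 11, 12, 6]   # Example 3.3

print(median(ages_odd), median(ages_even))  # 7 7.5
```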
3.3.5. Median for ungrouped distribution. If discrete frequency dis-
tribution is given in ungrouped form, then we compute the median following
the method illustrated in the problem below.
Problem 3.5 [4, Example 3.17] Result of a survey conducted among 100
families to know their family size produces the following distribution (see
Table 3.3).
Family size          1   2   3   4   5   6   7   8   9
Number of families   2   6   12  18  19  15  11  11  6
Table 3.3.
Calculate the median family size.
Solution Let us construct the Table 3.4 using cumulative frequency.
Family size                  1   2   3   4   5   6   7   8   9
Frequency, $f_i$             2   6   12  18  19  15  11  11  6
Cumulative Frequency, $F_i$  2   8   20  38  57  72  83  94  100
Table 3.4.
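Here the location of the median is the
\[ \frac{1(100 + 1)}{2} = 50.5\text{-th ordered position.} \]
But 50.5 is not an integer. From the cumulative frequencies, the 39-th to the
57-th ordered observations all equal 5, so both the 50-th and the 51-th
observations are 5. So the required median is
\begin{align*}
\tilde{m} &= 50\text{-th observation} + 0.5\,(51\text{-th observation} - 50\text{-th observation}) \\
&= 5 + 0.5(5 - 5) = 5.
\end{align*}
Alternative way. Since $n = 100$ is even, the median is
\begin{align*}
\tilde{m} &= \frac{\frac{100}{2}\text{-th observation} + \left(\frac{100}{2} + 1\right)\text{-th observation}}{2} \\
&= \frac{50\text{-th observation} + 51\text{-th observation}}{2} \\
&= \frac{5 + 5}{2} = 5.
\end{align*}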
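The cumulative-frequency lookup used in Problem 3.5 can be sketched in Python. The helper `nth_observation` is a hypothetical name introduced here; it returns the value of a given ordered position by scanning the cumulative frequencies.

```python
from itertools import accumulate

def nth_observation(values, cumulative, position):
    """Value of the position-th ordered observation (positions start at 1)."""
    for value, cum in zip(values, cumulative):
        if position <= cum:
            return value
    raise ValueError("position exceeds the total frequency")

# Family sizes and frequencies from Table 3.3.
sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
families = [2, 6, 12, 18, 19, 15, 11, 11, 6]

cumulative = list(accumulate(families))   # the F_i row of Table 3.4
n = cumulative[-1]

# n = 100 is even: average the n/2-th and (n/2 + 1)-th observations.
median_size = (nth_observation(sizes, cumulative, n // 2)
               + nth_observation(sizes, cumulative, n // 2 + 1)) / 2
print(cumulative, median_size)
```

The printed cumulative frequencies reproduce the last row of Table 3.4, and the median family size comes out as 5.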
3.3.6. Median for grouped distribution. Let $n$ be the total frequency.
The class in which the $\frac{n}{2}$-th observation lies is called the median
class. To calculate the median $\tilde{m}$ the following theorem is used.
Theorem 3.3.1 Let $n$ be the total frequency. Then the median
\[ \tilde{m} = l_m + \frac{h}{f_m}\left(\frac{n}{2} - F_{(m)-1}\right), \]
where
$l_m$ = lower limit of the median class,
$f_m$ = the frequency of the median class,
$F_{(m)-1}$ = cumulative frequency of the pre-median class and
$h$ = width of the median class.
Proof. See [4, Page 151–153].
Problem 3.6 [4, Example 3.19 of Page 154] Distribution of a group of work-
ers by their age (in years) is as follows (see Table 3.5).
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Table 3.5.
Calculate their median age.

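Solution Construct the following Table 3.6 using cumulative frequency.
Age (in years)   Frequency ($f_i$)   Cumulative frequency $F_i$
24.5 - 29.5      3                   3
29.5 - 34.5      9                   12
34.5 - 39.5      15                  27
39.5 - 44.5      12                  39
44.5 - 49.5      7                   46
49.5 - 54.5      4                   50
Total            50                  –
Table 3.6.
Here $n = 50$, so $\frac{n}{2} = 25$ and the median $\tilde{m}$ lies at the
25-th ordered position. From Table 3.6, this position falls in the class
34.5 – 39.5, which is therefore the median class. Hence $l_m = 34.5$, $h = 5$,
$F_{(m)-1} = 12$ and $f_m = 15$. Therefore
\[ \tilde{m} = l_m + \frac{h}{f_m}\left(\frac{n}{2} - F_{(m)-1}\right) = 34.5 + \frac{5}{15}(25 - 12) = 38.83. \]
Answer 38.83 years.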
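Theorem 3.3.1 can be turned into a short routine. The following Python sketch locates the median class from the running cumulative frequency and then interpolates inside it; the function name `grouped_median` is an assumption made for this illustration.

```python
def grouped_median(boundaries, frequencies):
    """Median of a grouped distribution: l_m + (h / f_m) * (n/2 - F_(m)-1)."""
    n = sum(frequencies)
    cumulative_before = 0  # cumulative frequency of the classes already passed
    for i, f in enumerate(frequencies):
        if cumulative_before + f >= n / 2:          # median class found
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            return lower + width / f * (n / 2 - cumulative_before)
        cumulative_before += f
    raise ValueError("empty distribution")

# Class boundaries and frequencies from Table 3.5 (Problem 3.6).
boundaries = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]
frequencies = [3, 9, 15, 12, 7, 4]

median_age = grouped_median(boundaries, frequencies)
print(round(median_age, 2))  # 38.83
```

The same routine can be applied to the distributions of Problems 3.7 and 3.8 by changing the two lists.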
Problem 3.7 [4, Example 3.18] The rate of sales tax as a percentage of the
sales paid by 400 shopkeepers of a market during an assessment year ranged
from 0% to 25%. The distribution (see Table 3.7) of the tax payers by sales
tax paid is summarised in the form of a frequency distribution below taking
intervals of 5%.
Sales tax Frequency
00 – 05 75
05 – 10 128
10 – 15 100
15 – 20 68
20 – 25 29
Table 3.7.
Compute the median value.
Problem 3.8 [4, Page151–152] The longevity (in years) of 40 rats as ob-
tained in an experimental set up is given below (see Table 3.8).
Life lengths Number of rats
1.45 – 1.95 2
1.95 – 2.45 1
2.45 – 2.95 4
2.95 – 3.45 15
3.45 – 3.95 10
3.95 – 4.45 5
4.45 – 4.95 3
Table 3.8. Longevity of rats’ life in years
Calculate median longevity of the rats.
3.3.7. Locating median graphically. A histogram can be used to locate the
median graphically. For details, go through [4, Page 155 – 156]. The median
can also be located graphically by means of an ogive (see [4, Page 157]).
3.4. Quartiles, Percentiles and Deciles
3.4.1. Quartiles. Quartiles are three values related to a given data set
or distribution, usually denoted by $Q_1$, $Q_2$ and $Q_3$, which divide the
whole data into 4 equal parts. Among these
(i) the first quartile, namely $Q_1$, is the value below which 25% of all
observations lie and above which the remaining 75% lie.
(ii) the second quartile, namely $Q_2$, is the value below which 50% of all
observations lie and above which the remaining 50% lie.
(iii) the third quartile, namely $Q_3$, is the value below which 75% of all
observations lie and above which the remaining 25% lie.
Clearly, the second quartile is identical with the median.
3.4.1.1. Quartiles for ungrouped data. The point of location for the $r$-th
quartile $Q_r$ is the $L_r = \frac{r(n+1)}{4}$-th ordered position. If
(i) $L_r$ is an integer, then the particular numerical observation
corresponding to that point is chosen for the corresponding quartile.
(ii) $L_r$ is not an integer, then the corresponding quartile is determined by
a rule, which is clarified by an example. Suppose that $L_r = 5.25$. Let
$V_5$ = 5-th ordered value and $V_6$ = 6-th ordered value. Then the
quartile $Q_r = V_5 + 0.25(V_6 - V_5)$.
Problem 3.9 [4, Example 3.23 of Page 161] Calculate Q1 , Q2 and Q3 for
the following ordered observations :14, 17, 19, 23, 27, 32, 40, 49, 54, 59, 71,
80.
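The location rule $L_r = r(n+1)/4$ with linear interpolation can be sketched in Python and applied to Problem 3.9. The function `quantile_by_position` and its `parts` argument are hypothetical names introduced here; with `parts=100` or `parts=10` the same rule gives the percentile and decile locations defined later in these notes.

```python
def quantile_by_position(ordered, r, parts):
    """r-th dividing value of ordered data, located at L = r(n + 1)/parts."""
    n = len(ordered)
    location = r * (n + 1) / parts
    if not 1 <= location <= n:
        raise ValueError("location falls outside the data")
    whole = int(location)            # integer part of the location
    fraction = location - whole      # fractional part of the location
    if fraction == 0:
        return ordered[whole - 1]    # lists are indexed from 0
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)

# Ordered observations from Problem 3.9.
data = [14, 17, 19, 23, 27, 32, 40, 49, 54, 59, 71, 80]

q1 = quantile_by_position(data, 1, 4)
q2 = quantile_by_position(data, 2, 4)
q3 = quantile_by_position(data, 3, 4)
print(q1, q2, q3)  # 20.0 36.0 57.75
```

Here $L_1 = 3.25$, $L_2 = 6.5$ and $L_3 = 9.75$, so for instance $Q_1 = 19 + 0.25(23 - 19) = 20$.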
3.4.1.2. Quartiles for grouped data.
\[ Q_r = l_r + \frac{h}{f_r}\left(L_r - F_{(r)-1}\right) \quad \text{for } r = 1, 2, 3, \]
where $L_r = \frac{r(n+1)}{4}$ is the location of the $r$-th quartile,
$F_{(r)-1}$ is the cumulative frequency of the class prior to the $r$-th
quartile class, $f_r$ is the frequency of the $r$-th quartile class, $l_r$ is
the lower limit of the $r$-th quartile class and $h$ is the width of the
$r$-th quartile class.
Problem 3.10 [4, Example 3.25 of Page 163] Compute the first and third
quartile from the following age distribution (see Table 3.9) of the 50 workers
of a company.
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Table 3.9.
Solution Construct the following Table 3.10 using cumulative frequency.
Age (in years) Frequency (fi ) Cumulative frequency Fi
24.5 - 29.5 3 3
29.5 - 34.5 9 12
34.5 - 39.5 15 27
39.5 - 44.5 12 39
44.5 - 49.5 7 46
49.5 - 54.5 4 50
Total 50 –
Table 3.10.
Here $n = 50$. We have for the first quartile
\[ Q_1 = l_1 + \frac{h}{f_1}\left(L_1 - F_{(1)-1}\right), \]
and $L_1 = \frac{1(50 + 1)}{4} = 12.75$. Observe that 12.75 lies in the class
34.5 – 39.5. So $l_1 = 34.5$, $h = 5$, $f_1 = 15$ and $F_{(1)-1} = 12$. Hence
\[ Q_1 = 34.5 + \frac{5}{15}(12.75 - 12) = 34.75. \]
We have for the third quartile
\[ Q_3 = l_3 + \frac{h}{f_3}\left(L_3 - F_{(3)-1}\right), \]
and $L_3 = \frac{3(50 + 1)}{4} = 38.25$. Observe that 38.25 lies in the class
39.5 – 44.5. So $l_3 = 39.5$, $h = 5$, $f_3 = 12$ and $F_{(3)-1} = 27$. Therefore
\[ Q_3 = 39.5 + \frac{5}{12}(38.25 - 27) = 44.19. \]
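The grouped-data formula can be sketched in Python in the same way. The function `grouped_quantile` is a hypothetical helper; it uses the location $L_r = r(n+1)/\text{parts}$ exactly as defined in these notes, so `parts=4` gives quartiles.

```python
def grouped_quantile(boundaries, frequencies, r, parts):
    """l_r + (h / f_r) * (L_r - F_(r)-1), with L_r = r(n + 1)/parts."""
    n = sum(frequencies)
    location = r * (n + 1) / parts
    cumulative_before = 0  # cumulative frequency of the classes already passed
    for i, f in enumerate(frequencies):
        if cumulative_before + f >= location:       # quantile class found
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            return lower + width / f * (location - cumulative_before)
        cumulative_before += f
    raise ValueError("location exceeds the total frequency")

# Class boundaries and frequencies from Table 3.9 (Problem 3.10).
boundaries = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]
frequencies = [3, 9, 15, 12, 7, 4]

q1 = grouped_quantile(boundaries, frequencies, 1, 4)
q3 = grouped_quantile(boundaries, frequencies, 3, 4)
print(round(q1, 2), round(q3, 2))
```

The two values agree with the hand computation: $Q_1 = 34.75$ and $Q_3 = 44.1875 \approx 44.19$.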
3.4.2. Locating quartiles geometrically. We can represent quartiles
using ogive. For detail, see [4, Page 164 – 165].
3.4.3. Percentiles. Percentiles are 99 values related to a given data
set or distribution, usually denoted by $P_1, P_2, P_3, \cdots, P_{99}$, which
divide the whole data into 100 equal parts. Among these
(i) $P_{25}$ represents the first quartile $Q_1$.
(ii) $P_{50}$ represents the second quartile $Q_2$ or median.
(iii) $P_{75}$ represents the third quartile $Q_3$.
Clearly, the 50th percentile is identical with the median.
3.4.3.1. Percentiles for ungrouped data. The point of location for the $r$-th
percentile $P_r$ is the $L_r = \frac{r(n+1)}{100}$-th ordered position. If
(i) $L_r$ is an integer, then the particular numerical observation
corresponding to that point is chosen for the corresponding percentile.
(ii) $L_r$ is not an integer, then the corresponding percentile is determined
by a rule, which is clarified by an example. Suppose that $L_r = 5.75$.
Let $V_5$ = 5-th ordered value and $V_6$ = 6-th ordered value. Then
the percentile $P_r = V_5 + 0.75(V_6 - V_5)$.
Problem 3.11 [4, Example 3.27 of Page 166] Calculate 29th and 75th per-
centiles for the following ordered observations : 11, 14, 17, 23, 27, 32, 40, 49,
54, 59, 71, 80.
3.4.3.2. Percentiles for grouped data.
\[ P_r = l_r + \frac{h}{f_r}\left(L_r - F_{(r)-1}\right) \quad \text{for } r = 1, 2, 3, \cdots, 99, \]
where $L_r = \frac{r(n+1)}{100}$ is the location of the $r$-th percentile,
$F_{(r)-1}$ is the cumulative frequency of the class prior to the $r$-th
percentile class, $f_r$ is the frequency of the $r$-th percentile class, $l_r$
is the lower limit of the $r$-th percentile class and $h$ is the width of the
$r$-th percentile class.
Problem 3.12 [4, Example 3.28 of Page 167] Compute the 30th percentile
from the following age distribution (see Table 3.11) of the 50 workers of a
company.
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Table 3.11.
3.4.4. Deciles. Deciles are 9 values related to a given data set or
distribution, usually denoted by $D_1, D_2, D_3, \cdots, D_9$, which divide the
whole data into 10 equal parts. Clearly, $D_5 = P_{50} = Q_2$.

3.4.4.1. Deciles for ungrouped data. The point of location for the $r$-th
decile $D_r$ is the $L_r = \frac{r(n+1)}{10}$-th ordered position. If
(i) $L_r$ is an integer, then the particular numerical observation
corresponding to that point is chosen for the corresponding decile.
(ii) $L_r$ is not an integer, then the corresponding decile is determined by a
rule, which is clarified by an example. Suppose that $L_r = 5.77$. Let
$V_5$ = 5-th ordered value and $V_6$ = 6-th ordered value. Then the decile
$D_r = V_5 + 0.77(V_6 - V_5)$.
Problem 3.13 [4, Example 3.30 of Page 168] Calculate 4th decile for the
following ordered observations : 14, 17, 23, 27, 32, 40, 49, 54, 59, 71, 80.
3.4.4.2. Deciles for grouped data.
\[ D_r = l_r + \frac{h}{f_r}\left(L_r - F_{(r)-1}\right) \quad \text{for } r = 1, 2, 3, \cdots, 9, \]
where $L_r = \frac{r(n+1)}{10}$ is the location of the $r$-th decile,
$F_{(r)-1}$ is the cumulative frequency of the class prior to the $r$-th decile
class, $f_r$ is the frequency of the $r$-th decile class, $l_r$ is the lower
limit of the $r$-th decile class and $h$ is the width of the $r$-th decile
class.
Problem 3.14 [4, Example 3.31 of Page 169] Age distribution of 50 workers
of a company is given in the Table 3.12.
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Table 3.12.
Compute the 4th decile of the given distribution.

3.5. Mode
3.5.1. Mode. Like the mean and the median, the mode is also a measure of
central tendency.
Definition 3.5.1 [4, definition 3.4 of Page 170] The mode is denoted by $M_0$
and defined as the most frequently occurring value in a set of observations.
3.5.2. Properties of mode. Some properties of mode are given below.
(a) It is useful for non-quantitative data.
(b) It is not affected by extreme values.
(c) It can be calculated even when there are open-end class intervals.
3.5.3. Mode for grouped distribution. The class in which mode lies
is called the modal class.
To calculate mode the following theorem is used.
Theorem 3.5.1 The mode of a grouped distribution is calculated by the
formula
\[ M_0 = l_0 + h \, \frac{\Delta_1}{\Delta_1 + \Delta_2}, \]
where
$l_0$ = lower limit of the modal class,
$\Delta_1$ = frequency difference between the modal and pre-modal classes,
$\Delta_2$ = frequency difference between the modal and post-modal classes,
$h$ = width of the modal class.
Proof. See [4, Section 3.6.2].

Definition 3.5.2 If a series of observations has more than one mode, then the
mode is said to be ill-defined.
Problem 3.15 [4, Example 3.34] Age distribution of some workers of a com-
pany is given in the Table 3.13.
Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5
Frequency 03 09 15 12 07 04
Table 3.13.
Compute the modal age.
Solution Here 34.5 – 39.5 is the modal class. So $l_0 = 34.5$,
$\Delta_1 = 15 - 9 = 6$, $\Delta_2 = 15 - 12 = 3$ and $h = 5$. So the mode
\[ M_0 = 34.5 + 5 \times \frac{6}{6 + 3} = 37.83. \]
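The modal-class formula of Theorem 3.5.1 can be sketched in Python as below. The function `grouped_mode` is a hypothetical helper; as a simplifying assumption it takes the first class with the highest frequency as the modal class and treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(boundaries, frequencies):
    """Mode of a grouped distribution: l_0 + h * delta1 / (delta1 + delta2)."""
    i = frequencies.index(max(frequencies))   # first class with highest frequency
    # Assumption: a class outside the table is treated as having frequency 0.
    before = frequencies[i - 1] if i > 0 else 0
    after = frequencies[i + 1] if i + 1 < len(frequencies) else 0
    delta1 = frequencies[i] - before
    delta2 = frequencies[i] - after
    if delta1 + delta2 == 0:
        raise ValueError("modal class is not distinguished from its neighbours")
    lower = boundaries[i]
    width = boundaries[i + 1] - boundaries[i]
    return lower + width * delta1 / (delta1 + delta2)

# Class boundaries and frequencies from Table 3.13 (Problem 3.15).
boundaries = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]
frequencies = [3, 9, 15, 12, 7, 4]

modal_age = grouped_mode(boundaries, frequencies)
print(round(modal_age, 2))  # 37.83
```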
3.5.4. Locating mode graphically. The mode can be located graphically by
using a histogram. For details see [4, Page 171 – 176].
3.6. Comparison and Relationship between the Averages
3.6.1. Comparing different averages. See [4, Page 191 – 194]
3.6.2. Relationship between averages. See [4, Page 194 – 200]

CHAPTER 4
Measure of Dispersion
The literal meaning of dispersion is the scatteredness of a distribution. By
dispersion, we mean how the observations or values of a data set are scattered,
or varied or dispersed. For a detailed study the reader is referred to
[4, Chapter 4] and for more problems the reader is referred to [3, Chapter 25].
4.1. What is Measure of Dispersion
4.1.1. Measure of dispersion. Measures of central tendencies fail to
give any idea as to how the individual values differ from the central value,
more specifically, whether they are closely packed around the central value
or widely scattered away from it. The magnitude of such a variation of the
individual values from the central value is referred to as dispersion.
Definition 4.1.1 [2, Page 121] The distance of different individual values
from the central value is called the dispersion.
According to Spiegel, the degree to which numerical data tend to spread
about an average value is called the variation or dispersion of data.
4.1.2. Interpretation of the dispersion of a distribution. Dispersion
of a data set or distribution may be small, large or zero. These cases are
interpreted as follows.
(i) small dispersion indicates the high uniformity of the observations in
the distribution.
(ii) large dispersion indicates low uniformity of the observations in
the distribution.
(iii) zero (or absent) dispersion indicates the perfect uniformity of the ob-
servations in the distribution.
When all observations of a distribution are identical, then the dispersion be-
comes zero. In this situation, description of any single observation is sufficient
to know about the distribution. In real life situations it happens rarely.
4.1.3. Purpose of the measure of dispersion. It serves two purposes :
(i) To characterise a frequency distribution.
(ii) To compare between two or more frequency distributions.
4.1.4. Necessity to know the degree of dispersion of a data set.
To summarise a large data set, that is, to locate the center of data set or
distribution, we use the measure of central tendency. If there is any degree
of variation among the observations, that is, when all the observations are
not identical, then to get the precise descriptive summary of a data set both
central tendency and dispersion become necessary to know. By describing the
measure of central tendency and measure of dispersion, we can describe almost
all distributions with a reasonable degree of accuracy.
Knowing only the measure of central tendency may not be sufficient to
compare between two distributions. Two different distributions may have ex-
actly the same averages, but they may have differences in variability. For
example, suppose that three students have obtained the following marks (see
the Table 4.1) in Statistics, Physics and Chemistry.
Serial Statistics Physics Chemistry Average Range
1 61 49 40 50 21
2 52 53 45 50 8
3 51 50 49 50 2
Table 4.1.
In Table 4.1 all three distributions are not identical. But their averages are
same. The differences lie in the dispersion of their scores. The first student
shows the largest variation in his scores. The second student shows relatively
less variation, while the third student shows least variation in his secured
scores among the stated three students. The scores of third one secured in
three subjects are very close to one another.
In conclusion, though the measures of central tendency do exhibit one of the
important characteristics of a distribution, these averages do not describe
how the values or the observations of a data set are spread out or scattered
among themselves. So we need the measures of dispersion.
4.2. Classification of Measures of Dispersions
4.2.1. Types of dispersions. Dispersion is nothing but the variation of
an item around an average. It is mainly of two categories :
(i) absolute measures of dispersions
(ii) relative measures of dispersions.

4.2.2. Absolute measures of dispersions. In this case, dispersions are
measured in original units. This type of measure of dispersion includes the
following categories :
(i) range
(ii) quartile deviation
(iii) mean or average deviation
(iv) variance
(v) standard deviation.
4.2.3. Relative measures of dispersions. When two or more data sets
are expressed in different units, then relative measure of dispersion is used.
This type of measure of dispersion includes the following categories :
(i) coefficient of range
(ii) coefficient of quartile deviation
(iii) coefficient of mean deviation
(iv) coefficient of variance.
Some important measures of dispersions are detailed below.
4.3. Range and Coefficient of It
4.3.1. Range. Difference between two extreme values of a set of observa-
tions is called the range. It is denoted by R.
Note 4.1 The range is always nonnegative.

4.3.1.1. Range for ungrouped data. In case of ungrouped data, if the highest
value and lowest value of a given data set are H and L respectively, then its
range is R = H − L .
Example 4.1 Range of the data set 5, 4, 3, 6, 3, 7 is R = 7 − 3 = 4.
4.3.1.2. Range for grouped data. In case of grouped data, the range is the
difference between the upper boundary of the highest class and the lower
boundary of the lowest class.
Class boundary Frequency
4.5 – 14.5 37
14.5 – 24.5 27
24.5 – 34.5 16
Table 4.2.
Example 4.2 Range of the above grouped distribution (see Table 4.2) is
R = 34.5 − 4.5 = 30.
4.3.2. Coefficient of range. Coefficient of the range is denoted by $CR$
and defined as
\[ CR = \frac{R}{H + L} \times 100\%. \]
Example 4.3 Coefficient of the range of the data set given in Example 4.1
is $CR = \frac{4}{7 + 3} \times 100\% = 40\%$.
4.4. Mean Deviation and Coefficient of It
4.4.1. Mean (or average) deviation. Mean deviations are defined for
two types of data - ungrouped and grouped.

4.4.1.1. Mean deviation for ungrouped data. Let $x_1, x_2, x_3, \cdots, x_n$
form a sample of observations. Then the mean deviation about any arbitrary
value $a$ is
\[ M_d(a) = \frac{\sum |x_i - a|}{n}. \]
If we replace $a$ by
(i) mean $\bar{x}$, then we get $M_d(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n}$,
which is called mean deviation about the mean.
(ii) median $\tilde{m}$, then we get $M_d(\tilde{m}) = \frac{\sum |x_i - \tilde{m}|}{n}$,
which is called mean deviation about the median.
(iii) mode $M_0$, then we get $M_d(M_0) = \frac{\sum |x_i - M_0|}{n}$,
which is called mean deviation about the mode.
Problem 4.1 [6, Episode 10] Calculate the mean deviation of the data set
given below.
1, 2, 3, 4, 5, 6, 7.
Solution $\bar{x} = \frac{1+2+3+4+5+6+7}{7} = 4$.
\[ \sum |x_i - \bar{x}| = 3 + 2 + 1 + 0 + 1 + 2 + 3 = 12. \]
Therefore $M_d(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n} = \frac{12}{7} = 1.714$.
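The calculation of Problem 4.1 can be checked with a short Python sketch. The function name `mean_deviation` and its `about` argument are assumptions introduced for this illustration; any central value can be passed as `about`.

```python
def mean_deviation(values, about):
    """Mean of the absolute deviations of the values from the point `about`."""
    return sum(abs(x - about) for x in values) / len(values)

data = [1, 2, 3, 4, 5, 6, 7]            # data set of Problem 4.1
mean = sum(data) / len(data)            # arithmetic mean, here 4.0

md_mean = mean_deviation(data, mean)
print(mean, round(md_mean, 3))          # 4.0 1.714
```

Without the absolute value the deviations would simply cancel, since their algebraic sum about the mean is zero by Theorem 3.2.2.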
Problem 4.2 [4, Example 4.1 of Page 210 – 211]
Food for thought 4.1 Why is the absolute value of (xi − x̄) taken in defi-
nition of mean deviation?
4.4.1.2. Mean deviation for grouped data. Let the values (or class mid-values)
$x_1, x_2, x_3, \cdots, x_k$ with frequencies $f_1, f_2, f_3, \cdots, f_k$
respectively form a grouped sample of observations, and let $n = \sum f_i$.
Then the mean deviation about any arbitrary value $a$ is
\[ M_d(a) = \frac{\sum f_i |x_i - a|}{n}. \]
If we replace $a$ by
(i) mean $\bar{x}$, then we get $M_d(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{n}$,
which is called mean deviation about the mean.
(ii) median $\tilde{m}$, then we get $M_d(\tilde{m}) = \frac{\sum f_i |x_i - \tilde{m}|}{n}$,
which is called mean deviation about the median.
(iii) mode $M_0$, then we get $M_d(M_0) = \frac{\sum f_i |x_i - M_0|}{n}$,
which is called mean deviation about the mode.
Note 4.2 [4, Page 213] Among Md (x̄), Md (m̃) and Md (a) the mean deviation
about the median Md (m̃) is the smallest of all.
Problem 4.3 [4, Example 4.2] Age distribution of 50 workers of a company
is given in Table 4.3.
Age         24.5 - 29.5   29.5 - 34.5   34.5 - 39.5   39.5 - 44.5   44.5 - 49.5   49.5 - 54.5
Frequency        3             9            15            12             7             4
Table 4.3.
Compute the mean deviations about the
(a) mean
(b) median
(c) an arbitrary value 42
of the age distribution of the group of workers.
Solution Construct the following Table 4.4 and calculate the mean x̄.
Age (in years) Mid-value (xi ) Frequency (fi ) fi xi |xi − x̄| fi |xi − x̄|
24.5 - 29.5 27 3 81 12.3 36.9
29.5 - 34.5 32 9 288 7.3 65.7
34.5 - 39.5 37 15 555 2.3 34.5
39.5 - 44.5 42 12 504 2.7 32.4
44.5 - 49.5 47 7 329 7.7 53.9
49.5 - 54.5 52 4 208 12.7 50.8
Total – 50 1965 – 274.2

Table 4.4.
(a) Clearly, $\bar{x} = \frac{\sum f_i x_i}{n} = \frac{1965}{50} = 39.3$. Therefore from the Table 4.4,
we obtain that $M_d(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{50} = \frac{274.2}{50} = 5.48$.
(b) From Problem 3.6, we get median $\tilde{m} = 38.83$. By constructing an
appropriate table we can calculate $\sum f_i |x_i - \tilde{m}|$. Here
$\sum f_i |x_i - \tilde{m}| = \sum f_i |x_i - 38.83| = 272.32$.
Therefore $M_d(\tilde{m}) = \frac{\sum f_i |x_i - 38.83|}{50} = \frac{272.32}{50} = 5.45$.
(c) By constructing a convenient table we can calculate the required mean
deviation about $a = 42$. Here
$\sum f_i |x_i - a| = \sum f_i |x_i - 42| = 285$.
Therefore $M_d(42) = \frac{\sum f_i |x_i - 42|}{50} = \frac{285}{50} = 5.7$.
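The three parts of Problem 4.3 can be reproduced with a minimal Python sketch. The function name `grouped_mean_deviation` is an assumption made for illustration, and the median 38.83 is simply taken from Problem 3.6 rather than recomputed.

```python
def grouped_mean_deviation(midpoints, freqs, about):
    """Return sum(f_i * |x_i - about|) / sum(f_i) for grouped data."""
    n = sum(freqs)
    return sum(f * abs(x - about) for x, f in zip(midpoints, freqs)) / n


midpoints = [27, 32, 37, 42, 47, 52]   # mid-values of Table 4.4
freqs = [3, 9, 15, 12, 7, 4]           # frequencies of Table 4.3

n = sum(freqs)
mean_age = sum(f * x for x, f in zip(midpoints, freqs)) / n
median_age = 38.83                     # taken from Problem 3.6

for label, centre in [("mean", mean_age), ("median", median_age), ("a = 42", 42)]:
    print(label, round(grouped_mean_deviation(midpoints, freqs, centre), 2))
```

Of the three values, the one about the median comes out smallest, which agrees with Note 4.2.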
4.4.2. Coefficient of mean deviation. The ratio of mean deviation to
the corresponding central location (like mean, median, mode) is called the
coefficient of mean deviation. Coefficient of mean deviation is expressed
as percentage. If the mean deviation is calculated about
(i) the mean $\bar{x}$, then the coefficient of mean deviation is denoted by
$CM_d(\bar{x})$ and defined as $CM_d(\bar{x}) = \frac{M_d(\bar{x})}{\bar{x}} \times 100\%$.
(ii) the median $\tilde{m}$, then the coefficient of mean deviation is denoted by
$CM_d(\tilde{m})$ and defined as $CM_d(\tilde{m}) = \frac{M_d(\tilde{m})}{\tilde{m}} \times 100\%$.
(iii) the mode $M_0$, then the coefficient of mean deviation is denoted by
$CM_d(M_0)$ and defined as $CM_d(M_0) = \frac{M_d(M_0)}{M_0} \times 100\%$.
4.5. Quartile Deviation and Coefficient of It
4.5.1. Quartile deviation. Let $Q_1$ be the first quartile and $Q_3$ be the
third quartile. Then the quantity $Q_3 - Q_1$ is called the inter quartile range
and the quantity $\frac{Q_3 - Q_1}{2}$ is called the quartile deviation. Quartile
deviation is denoted by QD and is also known as semi-inter quartile range. So
$QD = \frac{Q_3 - Q_1}{2}$.
4.5.2. Coefficient of quartile deviation. Let $Q_1$ be the first quartile
and $Q_3$ be the third quartile. Then the coefficient of quartile deviation is
denoted by CQD and defined as $CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\%$.
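As a small illustration, the two formulas can be written as Python functions that take the quartiles as inputs. The quartile values used below ($Q_1 = 20$, $Q_3 = 30$) are hypothetical numbers chosen only for this sketch; how the quartiles themselves are obtained is covered in Section 3.4.

```python
def quartile_deviation(q1, q3):
    """Return QD = (Q3 - Q1) / 2, the semi-inter quartile range."""
    return (q3 - q1) / 2


def coefficient_of_quartile_deviation(q1, q3):
    """Return CQD = (Q3 - Q1) / (Q3 + Q1) * 100 as a percentage."""
    return 100 * (q3 - q1) / (q3 + q1)


q1, q3 = 20, 30                                   # hypothetical quartiles
print(quartile_deviation(q1, q3))                 # 5.0
print(coefficient_of_quartile_deviation(q1, q3))  # 20.0
```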
4.6. Variance and Coefficient of Variance
4.6.1. Standard deviation. Standard deviation is the most important
measure of dispersion.
Definition 4.6.1 [2, Page 125] The positive square root of the mean of the
squared deviations of a set of values from their mean is called the standard
deviation. It is usually denoted by σ.
However, if the deviations are measured from a value $a$ other than
$\bar{x}$, then it is called the root mean square deviation.
Food for thought 4.2 What is the minimum value of standard deviation?
In what situation does the standard deviation take this value?
4.6.1.1. Standard deviation for ungrouped data. Let $x_1, x_2, x_3, \cdots, x_n$ be a
set of values. Then their standard deviation can be calculated by the formula
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$, where $\bar{x}$ is the mean.
Theorem 4.6.1 $\sigma = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2}$.
Proof. Left as an exercise.
Theorem 4.6.2 If we consider $a$ as the assumed mean of the given data set
$x_1, x_2, x_3, \cdots, x_n$ and assume $d_i = x_i - a$, then their standard deviation can
be calculated by the formula $\sigma = \sqrt{\frac{\sum d_i^2}{n} - \left(\frac{\sum d_i}{n}\right)^2}$.
Caution 4.1 To calculate standard deviation for ungrouped data, we try to
avoid the formula given in Theorem 4.6.1, because applying it requires more
calculation.

Problem 4.4 Calculate the standard deviation of the data set :
2, 3, 4, 5, 6.
Hints Apply the formula $\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$.
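The three ungrouped formulas (the definition, Theorem 4.6.1 and Theorem 4.6.2) can be compared numerically with the following Python sketch. The function names and the assumed mean $a = 3$ are choices made here for illustration only.

```python
from math import sqrt


def sd_definition(values):
    """sigma = sqrt(sum((x_i - mean)^2) / n)."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / n)


def sd_raw_sums(values):
    """Theorem 4.6.1: sigma = sqrt(sum(x_i^2)/n - (sum(x_i)/n)^2)."""
    n = len(values)
    return sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


def sd_assumed_mean(values, a):
    """Theorem 4.6.2 with d_i = x_i - a for an assumed mean a."""
    n = len(values)
    d = [x - a for x in values]
    return sqrt(sum(di * di for di in d) / n - (sum(d) / n) ** 2)


data = [2, 3, 4, 5, 6]                 # data set of Problem 4.4
print(sd_definition(data))
print(sd_raw_sums(data))
print(sd_assumed_mean(data, a=3))      # a = 3 is an arbitrary assumed mean
```

All three calls agree up to floating-point rounding, whatever value is chosen for the assumed mean.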
4.6.1.2. Standard deviation for grouped data. Let $x_1, x_2, x_3, \cdots, x_k$ be a set
of values which occur with the frequencies $f_1, f_2, f_3, \cdots, f_k$ respectively, and
let $n = \sum f_i$ be the total frequency. Then
their standard deviation can be calculated by the formula $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n}}$,
where $\bar{x}$ is the mean.
Theorem 4.6.3 $\sigma = \sqrt{\frac{\sum f_i x_i^2}{n} - \left(\frac{\sum f_i x_i}{n}\right)^2}$.
Proof. Left as an exercise.
Theorem 4.6.4 If we consider $a$ as the assumed mean of the given data
$x_1, x_2, x_3, \cdots, x_k$ with frequencies $f_1, f_2, f_3, \cdots, f_k$ respectively and assume
$d_i = \frac{x_i - a}{h}$, then their standard deviation can be calculated by the formula
$\sigma = h \times \sqrt{\frac{\sum f_i d_i^2}{n} - \left(\frac{\sum f_i d_i}{n}\right)^2}$,
where $h$ is the class interval and $n = \sum f_i$ is the total frequency.
Caution 4.2 To calculate standard deviation for grouped data, we try to
avoid the formula given in Theorem 4.6.3, because applying it requires more
calculation.

Note 4.3 [4, Equation 4.19 of Page 216] In more advanced textbooks, the
denominator $n$ in the formula for the sample standard deviation or sample
variance is replaced by $n - 1$, since the number of degrees of freedom is then $n - 1$.
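The difference between the two denominators can be seen with Python's standard `statistics` module, where `pstdev` divides by $n$ and `stdev` divides by $n - 1$. The data set below is the one from Problem 4.4; the variable names are illustrative.

```python
from statistics import pstdev, stdev

data = [2, 3, 4, 5, 6]

population_sd = pstdev(data)   # denominator n, as in these notes
sample_sd = stdev(data)        # denominator n - 1

print(population_sd)
print(sample_sd)
```

The $n - 1$ version is always the larger of the two, and the gap shrinks as $n$ grows.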
4.6.2. Variance. Square of standard deviation is called the variance. It
is denoted by $\sigma^2$.
Problem 4.5 Disprove the statement — variance is always greater than
standard deviation.
Disproof. Left as an exercise to the reader.
Theorem 4.6.5 [2, Theorem 2 of Page 134] Variance depends on scale not
on origin.
Proof. See [2, Theorem 2 of Page 134]
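Theorem 4.6.5 can be illustrated numerically: if every value is transformed as $y_i = c + b x_i$, the variance becomes $b^2 \sigma^2$ regardless of $c$. The sketch below uses `pvariance` from Python's standard `statistics` module (denominator $n$); the shift 10 and the scale factor 3 are arbitrary values chosen for the illustration.

```python
from statistics import pvariance

data = [2, 3, 4, 5, 6]
shifted = [x + 10 for x in data]   # change of origin only
scaled = [3 * x for x in data]     # change of scale only

var_data = pvariance(data)
var_shifted = pvariance(shifted)
var_scaled = pvariance(scaled)

print(var_data, var_shifted, var_scaled)
```

The shifted data keep the original variance, while the scaled data have $3^2 = 9$ times the original variance.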
4.6.3. Coefficient of variance. The coefficient of variance (also known as the
coefficient of variation) is denoted by $C_v$ and defined as
$C_v = \frac{\sigma}{\bar{x}} \times 100\%$.
Problem 4.6 [6, Episode 11] The sales (in lakh Taka) of a company over 25 days
are given by the following distribution (see Table 4.5).
Sales (Lakh)   10 - 20   20 - 30   30 - 40   40 - 50   50 - 60
No. of days       3         6        11         3         2
Table 4.5.
Calculate its (a) standard deviation, (b) variance and (c) coefficient of
variance.
Solution Construct the following Table 4.6 and calculate the mean $\bar{x}$.
Sales      $x_i$   $f_i$   $f_i x_i$   $(x_i - \bar{x})^2$   $f_i (x_i - \bar{x})^2$
10 – 20     15      3       45         324              972
20 – 30     25      6      150          64              384
30 – 40     35     11      385           4               44
40 – 50     45      3      135         144              432
50 – 60     55      2      110         484              968
Total        –     25      825           –             2800
Table 4.6.
Here $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{825}{25} = 33$.
(a) Standard deviation, $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}} = \sqrt{\frac{2800}{25}} = 10.58$.
(b) Variance, $\sigma^2 = (10.58)^2 = 111.9364$ (without rounding $\sigma$, $\sigma^2 = \frac{2800}{25} = 112$).
(c) Coefficient of variance, $C_v = \frac{\sigma}{\bar{x}} \times 100\% = \frac{10.58}{33} \times 100\% = 32.06\%$.
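Alternative method. To solve the problem using the formula
$\sigma = h \times \sqrt{\frac{\sum f_i d_i^2}{n} - \left(\frac{\sum f_i d_i}{n}\right)^2}$,
we construct the following Table 4.7. Let us consider the assumed mean,
$a = 35$. Here $h = 10$. By the formula $d_i = \frac{x_i - a}{h}$, we calculate $d_i$.
Sales      $x_i$   $f_i$   $d_i$   $f_i d_i$   $f_i d_i^2$
10 – 20     15      3     -2      -6        12
20 – 30     25      6     -1      -6         6
30 – 40     35     11      0       0         0
40 – 50     45      3      1       3         3
50 – 60     55      2      2       4         8
Total        –     25      –      -5        29
Table 4.7.
(a) Standard deviation,
$\sigma = h \times \sqrt{\frac{\sum f_i d_i^2}{\sum f_i} - \left(\frac{\sum f_i d_i}{\sum f_i}\right)^2} = 10 \times \sqrt{\frac{29}{25} - \left(\frac{-5}{25}\right)^2} = 10.58$.
(b) Variance, $\sigma^2 = (10.58)^2 = 111.9364$ (without rounding $\sigma$, $\sigma^2 = 100 \times 1.12 = 112$).
(c) Here $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = 33$. Alternatively,
$\bar{x} = a + h \frac{\sum f_i d_i}{\sum f_i} = 35 + \frac{-5}{25} \times 10 = 33$.
Therefore coefficient of variance, $C_v = \frac{\sigma}{\bar{x}} \times 100\% = \frac{10.58}{33} \times 100\% = 32.06\%$.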
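Both methods for Problem 4.6 can be cross-checked with a short Python sketch. The function names `grouped_sd_direct` and `grouped_sd_step` are hypothetical, introduced only for this illustration; the mid-values and frequencies are those of Table 4.6.

```python
from math import sqrt


def grouped_sd_direct(midpoints, freqs):
    """Mean and SD from sigma = sqrt(sum(f_i (x_i - mean)^2) / sum(f_i))."""
    n = sum(freqs)
    x_bar = sum(f * x for x, f in zip(midpoints, freqs)) / n
    var = sum(f * (x - x_bar) ** 2 for x, f in zip(midpoints, freqs)) / n
    return x_bar, sqrt(var)


def grouped_sd_step(midpoints, freqs, a, h):
    """Mean and SD via d_i = (x_i - a) / h (assumed mean a, class interval h)."""
    n = sum(freqs)
    d = [(x - a) / h for x in midpoints]
    sum_fd = sum(f * di for di, f in zip(d, freqs))
    sum_fd2 = sum(f * di * di for di, f in zip(d, freqs))
    x_bar = a + h * sum_fd / n
    sigma = h * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return x_bar, sigma


midpoints = [15, 25, 35, 45, 55]
freqs = [3, 6, 11, 3, 2]

print(grouped_sd_direct(midpoints, freqs))
print(grouped_sd_step(midpoints, freqs, a=35, h=10))
```

Both functions return the same mean and standard deviation up to floating-point rounding; the step-deviation version only changes how much arithmetic is done by hand.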
Exercise 4.1 [6, Episode 12] The sales (in kg) of a grocery over 25 days are
given by the following distribution (see Table 4.8).
Sales (kg)    01 – 05   06 – 10   11 – 15   16 – 20   21 – 25   26 – 30
No. of days      2         3         5         8         4         3
Table 4.8.
Calculate its (a) mean deviation, (b) standard deviation, (c) variance and
(d) coefficient of variance.
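Solution Construct the following Table 4.9 and calculate the mean $\bar{x}$.
Here $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{415}{25} = 16.6$.
Sales          $x_i$   $f_i$   $f_i x_i$   $|x_i - \bar{x}|$   $f_i |x_i - \bar{x}|$   $f_i (x_i - \bar{x})^2$
0.5 – 5.5        3      2        6
5.5 – 10.5       8      3       24
10.5 – 15.5     13      5       65
15.5 – 20.5     18      8      144
20.5 – 25.5     23      4       92
25.5 – 30.5     28      3       84
Total            –     25      415
Table 4.9.
(a) Mean deviation, $M_d(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{\sum f_i}$
(b) Standard deviation, $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}} = \sqrt{\frac{\underline{\qquad}}{\underline{\qquad}}} = \underline{\qquad}$.
(c) Variance, $\sigma^2 = (\underline{\qquad})^2 = \underline{\qquad}$.
(d) Coefficient of variance, $C_v = \frac{\sigma}{\bar{x}} \times 100\% = \frac{\underline{\qquad}}{\underline{\qquad}} \times 100\% = \underline{\qquad}\%$.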
Note 4.4 Since Exercise 4.1 also asks for the mean deviation, which uses
$|x_i - \bar{x}|$, we employ the formula $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n}}$ to find the
standard deviation, and we construct Table 4.9 according to the requirements
of this formula.
Problem 4.7 [6, Episode 13] The share prices (in Taka) during the first 10 days
of a month at Dhaka Stock Exchange (DSE) and Chattogram Stock Exchange
(CSE) are given as follows (see Table 4.10).
DSE   105   120   115   118   130   127   109   110   104   112
CSE   108   117   120   130   100   125   125   120   110   135
Table 4.10.
On the basis of the given data, in which share market will you invest? Why?
Solution For Dhaka Stock Exchange (DSE), $\bar{x} = 115$ and
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = 8.33$, so $C_v = \frac{\sigma}{\bar{x}} \times 100\% = 7.24\%$.
For Chattogram Stock Exchange (CSE), $\bar{x} = 119$ and
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = 10.09$, so $C_v = \frac{\sigma}{\bar{x}} \times 100\% = 8.48\%$.
Since the coefficient of variance of Dhaka Stock Exchange is less than that
of Chattogram Stock Exchange, the data of Dhaka Stock Exchange is
comparatively more uniform. Thus I will invest in Dhaka Stock Exchange.
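The comparison in Problem 4.7 can be checked with a minimal Python sketch built on the standard `statistics` module. The function name `coefficient_of_variance` is an illustrative choice, and `pstdev` is used because these notes divide by $n$ rather than $n - 1$.

```python
from statistics import mean, pstdev


def coefficient_of_variance(values):
    """Return C_v = sigma / mean * 100, with sigma computed using denominator n."""
    return 100 * pstdev(values) / mean(values)


dse = [105, 120, 115, 118, 130, 127, 109, 110, 104, 112]
cse = [108, 117, 120, 130, 100, 125, 125, 120, 110, 135]

for name, prices in [("DSE", dse), ("CSE", cse)]:
    print(name, mean(prices), round(pstdev(prices), 2),
          round(coefficient_of_variance(prices), 2))
```

The market with the smaller coefficient is the more uniform one, which is the criterion used in the solution.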

CHAPTER 5
Probability
For this chapter, the reader is referred to [4, 5, 7].
5.1. Probability
Definition 5.1.1 Probability is the measure of uncertainty.
Bibliography
[1] LEANSCAPE, URL: https://leanscape.io/what-is-attribute-data/.
[2] M. Abdul Aziz, Statistics First Paper, Seventh ed., The Angel Publications, Dhaka,
January 2022.
[3] B. S. Grewal, Higher Engineering Mathematics, Forty-third ed., Khanna Publishers,
2015.
[4] M. N. Islam, An Introduction to Statistics and Probability, Fifth ed., Mullick & Brothers,
Dhaka, June 2022.
[5] R. E. Walpole, R. H. Myers, S. L. Myers and K. E. Ye, Probability and Statistics for
Engineers and Scientists, Ninth ed., Pearson, 2022.
[6] M. A. Nayeem, Measures of Dispersion, OnnoRokom Pathshala, URL:
https://www.youtube.com/playlist?list=PLxSt9YDBipm6UxKzXpeyzSsED94ukGEBJ, 2018.
[7] S. M. Ross, Introduction to Probability and Statistics for Engineers and Scientists, Fifth
ed., Elsevier, 2014.
APPENDIX A
Some Theorems Related to Sigma Notation
A.1. Sigma Notation
Definition A.1.1 Let the variable $x$ take $n$ values $x_1, x_2, x_3, \cdots, x_n$.
Then the sum of the values of the variable $x$,
$x_1 + x_2 + x_3 + \cdots + x_n$,
is simply represented by $\sum_{i=1}^{n} x_i$.
Theorem A.1.1 [2, Page 12] If $x$ is a variable and $\alpha$ is a constant, then
$\sum_{i=1}^{n} \alpha x_i = \alpha \sum_{i=1}^{n} x_i$.
Theorem A.1.2 [2, Page 12] If $x$ is a variable and $\alpha, \beta$ are two constants,
then
$\sum_{i=1}^{n} (\alpha x_i - \beta) = \alpha \sum_{i=1}^{n} x_i - n\beta$.
Theorem A.1.3 [2, Page 12] If $x$ is a variable and $\alpha, \beta, \gamma$ are three constants,
then
$\sum_{i=1}^{n} \left(\alpha x_i^2 - \beta x_i + \gamma\right) = \alpha \sum_{i=1}^{n} x_i^2 - \beta \sum_{i=1}^{n} x_i + n\gamma$.
Theorem A.1.4 [2, Page 13] If $x, y$ are two variables and $\alpha, \beta$ are two
constants, then
$\sum_{i=1}^{n} (\alpha x_i - \beta y_i) = \alpha \sum_{i=1}^{n} x_i - \beta \sum_{i=1}^{n} y_i$.
Theorem A.1.5 [2, Page 13] If $x$ is a variable and $\alpha, \beta$ are two
constants, then
$\sum_{i=1}^{n} (\alpha x_i - \beta)^2 = \alpha^2 \sum_{i=1}^{n} x_i^2 - 2\alpha\beta \sum_{i=1}^{n} x_i + n\beta^2$.
Theorem A.1.6 [2, Page 13] If $x$ is a variable, then
$\left(\sum_{i=1}^{n} x_i\right)^2 = \sum_{i=1}^{n} x_i^2 + \sum\sum_{i \neq j} x_i x_j$.
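Theorems A.1.5 and A.1.6 can be spot-checked on a small data set with the Python sketch below. The data and the constants $\alpha = 2$, $\beta = 1$ are arbitrary illustrative choices; `itertools.permutations(x, 2)` enumerates every ordered pair of positions $(i, j)$ with $i \neq j$.

```python
from itertools import permutations

x = [2, 3, 4, 5, 6]
n = len(x)
alpha, beta = 2, 1   # arbitrary constants for the check

# Theorem A.1.5: sum (alpha x_i - beta)^2 expanded term by term
lhs_5 = sum((alpha * xi - beta) ** 2 for xi in x)
rhs_5 = alpha ** 2 * sum(xi ** 2 for xi in x) - 2 * alpha * beta * sum(x) + n * beta ** 2

# Theorem A.1.6: (sum x_i)^2 = sum x_i^2 + sum over i != j of x_i x_j
lhs_6 = sum(x) ** 2
rhs_6 = sum(xi ** 2 for xi in x) + sum(xi * xj for xi, xj in permutations(x, 2))

print(lhs_5, rhs_5)
print(lhs_6, rhs_6)
```

A numerical check on one data set is not a proof, but it is a quick way to catch a sign or index error when applying these identities.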

