
Lecture Notes on Statistics

Prepared by

Muhammad Shahnewaz Bhuyan

Lecturer

Department of Mathematics

Chittagong University of Engineering & Technology

Chottogram - 4349, Bangladesh

Former Lecturer at Sylhet Cadet College, Sylhet-3101, Bangladesh

Started from : November 20, 2023

Last Updated : February 7, 2024


“There are three kinds of lies: lies, damned lies and statistics”


— British prime minister Benjamin Disraeli
Course Outline

Syllabus

Standard deviation, Variance and coefficient of variation, Correlation and regression, Probability, Bayes’ theorem, Discrete and random variable, Probability mass function, Probability density function, Mathematical expectations.

Books Recommended

(1) M. N. Islam, Introduction to Statistics and Probability, Fifth edition, Mullick & Brothers, Dhaka, June 2022.

(2) R. E. Walpole, R. H. Myers, S. L. Myers and K. E. Ye, Probability and Statistics for Engineers and Scientists, Ninth edition, Pearson, 2022.

(3) S. M. Ross, Introduction to Probability and Statistics for Engineers and Scientists, Fifth edition, Elsevier, 2014.

Class Routine

Sunday Monday Tuesday Wednesday

09:00 - 09:50 Sec A

09:50 - 10:40 Sec C Sec C

11:00 - 11:50 Sec B Sec B

11:50 - 12:40 Sec A Sec B

12:40 - 01:30 Sec A Sec C

Contents

Course Outline
Syllabus
Books Recommended
Class Routine

Chapter 1. Introduction to Statistics
1.1. Population and Sample
1.2. Definition of Statistics
1.3. Types of Statistics
1.4. Singular and Plural Perspective of Statistics
1.5. Characteristics, Usage and Limitations of Statistics

Chapter 2. Data Summarization and Representation
2.1. Data
2.2. Level of Measurement
2.3. Variable and Attribute
2.4. Frequency Distribution
2.5. Representation of Data
2.6. Tabular Representation of Data
2.7. Other Forms of Frequency Distributions
2.8. Graphical Representation of Data

Chapter 3. Central Tendency
3.1. Central Tendency
3.2. Mean
3.3. Median
3.4. Quartiles, Percentiles and Deciles
3.5. Mode
3.6. Comparison and Relationship between the Averages

Chapter 4. Measure of Dispersion
4.1. What is Measure of Dispersion
4.2. Classification of Measures of Dispersions
4.3. Range and Coefficient of It
4.4. Mean Deviation and Coefficient of It
4.5. Quartile Deviation and Coefficient of It
4.6. Variance and Coefficient of Variance

Chapter 5. Probability
5.1. Probability

Bibliography

Appendix A. Some Theorems Related to Sigma Notation
A.1. Sigma Notation


CHAPTER 1

Introduction to Statistics

For a detailed discussion, the reader is referred to [2, 4, 5].

1.1. Population and Sample

1.1.1. Statistical population and sample. In statistics, the notions of population and sample are of great importance. Put simply, statistics is the study of a population through a sample.

Definition 1.1.1 [4, Definition 1.2 of Page 11] A statistical population is

the collection of all items of interest in a particular study.

Definition 1.1.2 [4, Definition 1.2 of Page 11] A sample is a representative subset (or part) of a population.

1.2. Definition of Statistics

1.2.1. Definition of statistics. Etymologically, statistics means the number-based description of any event or subject. R. A. Fisher is known as the father of statistics. Different statisticians have defined statistics from their own perspectives.

Definition 1.2.1 According to R. A. Fisher, the science of statistics is essentially a branch of applied mathematics and may be regarded as mathematics applied to observational data.


Definition 1.2.2 According to Croxton and Cowden, statistics may be

defined as the science of collection, presentation, analysis and interpretation

of numerical data.

A summarized version of the definitions of statistics given by different

statisticians is as follows.

Definition 1.2.3 [4, Definition 1.1 of Page 9] Statistics is concerned with scientific methods for collecting, organising, summarising, presenting and analysing sample data from a specified population of interest, as well as drawing valid conclusions, making inferences about the population characteristics and finally reaching a reasonable decision.

Simply, statistics can be treated as a methodology for collecting, presenting, analyzing and interpreting data.

1.3. Types of Statistics

1.3.1. Sub-divisions of statistics. Mainly, statistics is sub-divided into

two categories :

(i) descriptive statistics

(ii) inferential statistics.

1.3.2. Descriptive statistics. It includes methods for describing, organizing or summarizing a data set. It can be regarded as a surface-level study of data. Constructing charts, tables etc. on the basis of numerical data is part of descriptive statistics.



1.3.3. Inferential statistics. It determines some characteristics of the population on the basis of a sample chosen from that population. Inferential statistics uses summarized data from a study run over a sample to predict or infer general statements about the population. It is also referred to as predictive statistics or inductive statistics.

1.4. Singular and Plural Perspective of Statistics

1.4.1. Singular perspective. In the singular sense, statistics refers to the field of study as a subject.

Example 1.1 In the sentence “He is good at statistics”, the word is used in the singular sense.

1.4.2. Plural perspective. In the plural sense, statistics refers to information in terms of numerical data.

Example 1.2 The statistics illustrate that the illiteracy rate has decreased by 5% in the last two years.

1.4.3. Difference between singular and plural perspective of statistics. In the singular sense, statistics is used like an uncountable noun and refers to the field of study as a whole. This meaning is used when discussing the principles, techniques and notions employed for studying data. In the plural sense, statistics refers to numerical data which have been collected and analyzed.

1.5. Characteristics, Usage and Limitations of Statistics

1.5.1. Characteristics of statistics. The statistics we collect should possess the following characteristics.



(a) Statistics should deal with aggregate or average of observations of

individuals rather than with individuals alone.

(b) Statistics should be expressed as numerical figures.

(c) Statistics should be affected by a multiplicity of causes.

(d) Statistics should be of reasonable standards of accuracy.

(e) Statistics should be obtained for pre-determined purposes.

(f) Statistics collected should allow comparison with other data.

1.5.2. Usages and importance of statistics. The scope and usages of

statistics are very wide and increasing day by day. A few examples of different

usages are given below.

(a) Business

(b) Pharmaceutical companies

(c) Nutrition survey

(d) Manufacturing industries

(e) Insurance companies

and so on.

1.5.3. Limitations of statistics. Statistics has a number of limitations, some of which are given below.

(a) Statistical results are not 100% precise, unlike results in mathematics.

(b) There are certain phenomena or notions like beauty, honesty, intelligence etc. which cannot be quantified. Statistics cannot be applied to them.

(c) Statistics reveals the average behaviour. So if the decision taken on

an average concept is applied on an individual, it may lead to a wrong

conclusion.

(d) Statistics studies a population through sampling. Sampling is generally used because it is not physically possible to cover all the units constituting the population. Moreover, different surveys run over samples of the same size may give different results.

(e) Statistics are collected for a fixed target. So data collected for one

purpose may not be relevant for another purpose.

(f) Statistics studies relationships between two or more variables. Such a relationship shows the similarity or dissimilarity in the movement of the variables, but it does not indicate cause.


CHAPTER 2

Data Summarization and Representation

For details in this chapter, the reader is referred to [2, 4, 5].

2.1. Data

Definition 2.1.1 [4, Subsection 1.6.1 of Page 11] Data are the observations or chance outcomes that occur in planned experiments or scientific investigations.

2.1.1. Classification of data. In a broad sense, statistical data are classified into two categories as follows.

(i) Quantitative data

(ii) Qualitative data.

2.1.2. Quantitative data. These are data that can be measured in numerical units.

Example 2.1 Measurements of weight, height, temperature of a body.

2.1.3. Qualitative data. These are generated by assigning observations

into various independent categories and then counting the frequency of the

occurrences within these categories.

Example 2.2 Eye colour, IQ, religion etc.

2.1.4. Sources of data. There are mainly two sources of data, namely

(a) primary sources


(b) secondary sources

2.1.5. Primary sources. [4, Page 12] These provide the raw materials of a primary investigation, that is, data collected first-hand.

Example 2.3 Data collected from surveys, online surveys, interviews, observations, experiments etc.

2.1.6. Secondary sources. [4, Page 14] These provide data which have already been collected by, and are readily available from, other sources.

Example 2.4 Salary statement collected from a bank.

2.1.7. Classification of data. [2, Page 26] Classification is a process

of arranging the available information into homogeneous groups according to

similarities or same characteristics.

Definition 2.1.2 [4, Page 30] According to Connor, classification is the process of arranging things in groups or classes according to their resemblances and affinities, and it gives expression to the unity of attributes that may subsist amongst a diversity of individuals.

2.1.8. Basis of classifying data. [2, Page 26 – 27] Data can be classified

based on the following categories :

(a) Geographical classification : data are classified on the basis of

geographic areas.

(b) Chronological classification : data are classified on the basis of

time.

(c) Quantitative classification : data are classified in terms of magnitude. In this classification there are two important elements, namely variable and frequency. For example, consider the marks obtained by the EEE students in statistics. Here the marks are the variable and the number of EEE students obtaining each mark is the frequency.

(d) Qualitative classification : data are classified on the basis of some attributes, for example on the basis of religion, sex etc.

2.1.9. Importance of classifying data. [4, Page 31] Summarizing and classifying data is necessary

(a) to simplify the complex data.

(b) to economize space.

(c) to identify omissions and errors.

(d) to facilitate statistical analysis.

(e) to depict trend.

(f) to use it as future reference.

2.2. Level of Measurement

2.2.1. Measurement. Statistical data are obtained through measurement.

Definition 2.2.1 [4, Page 33] Measurement is essentially the task of assigning numbers to observations according to certain rules.



2.2.2. Levels of measurements. The way in which numbers are assigned to observations determines the scale of measurement. There are four levels of measurement :

(i) nominal level

(ii) ordinal level

(iii) interval level and

(iv) ratio level.

2.2.3. Nominal level. All qualitative measurements are nominal. In this

level of measurement, the categories differ from one another only in name.

That is, one category of a characteristic is not necessarily higher or lower,

greater or smaller than the other category.

Example 2.5 Colour such as red, white etc.; sex such as male, female; numerals such as bank account number, student registration number etc.

Note 2.1 Measurements involving numbers can also be nominal, for instance student ID, room number, NID number, birth registration number etc. In this case the numbers have no numerical value; rather, they are used as identifiers.

2.2.4. Ordinal level. There is an ordered relationship among the categories. Unlike the nominal level, here we get typical relationships such as higher, lower, more than, less than, more difficult, less favorable, more prejudiced etc.

Example 2.6 The level of education, such as MS, B.Sc (Hons) etc.

Example 2.7 Official position like Professor, Associate Professor, Assistant

Professor, Lecturer etc.



Example 2.8 Class performance like outstanding, excellent, very good, good.

2.2.5. Interval level. It includes all properties of the previous two levels. In addition, the difference between values is known and of constant size. At this level of measurement zero is an arbitrary reference point and not a true zero.

Example 2.9 The difference between 21°C and 20°C on the centigrade temperature scale is the same as the difference between 13°C and 12°C.

Note 2.2 In the interval level of measurement, 0 does not mean the absence of anything. For example, a temperature of 0°C does not mean the absence of temperature, 0000 hours does not mean the absence of time, and so on.

Note 2.3 In the interval level of measurement, addition and subtraction are defined, though multiplication and division are not. For instance, a temperature of 10°C is not double a temperature of 5°C, the year 2024 is not half of the year 2048 or double the year 1012, and so on.
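The claim that ratios are not meaningful on an interval scale can be checked numerically. The sketch below is only an illustration: it re-expresses the same temperatures in Fahrenheit (a scale not used in these notes, introduced here as an assumption) and compares ratios and differences.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Ratio of the same two temperatures on each scale
ratio_celsius = 10 / 5
ratio_fahrenheit = celsius_to_fahrenheit(10) / celsius_to_fahrenheit(5)

# Two one-degree Celsius gaps, re-expressed in Fahrenheit
gap_high = celsius_to_fahrenheit(21) - celsius_to_fahrenheit(20)
gap_low = celsius_to_fahrenheit(13) - celsius_to_fahrenheit(12)

print(ratio_celsius, ratio_fahrenheit)  # the two ratios differ
print(gap_high, gap_low)                # the two gaps agree (up to rounding)
```

The ratio changes when the unit changes, so "twice as hot" has no fixed meaning, whereas equal differences remain equal on both scales.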

2.2.6. Ratio level. Numerical measurements in which zero (0) is a meaningful value and the difference between values is also meaningful are at the ratio level of measurement.

Example 2.10 Height, weight, wage etc.

Note 2.4 Unlike the interval level, in the ratio level of measurement multiplication and division are defined. For instance, a plant with a height of 16 feet is twice as tall as a plant with a height of 8 feet.



Note 2.5 Data measured at a higher level can be treated as data at a lower level, but the converse is not true.

2.3. Variable and Attribute

2.3.1. Variable. A characteristic which varies from unit to unit is called a variable.

Definition 2.3.1 (Variable) [4, Definition 2.5 of Page 36] A variable is a characteristic or property, often but not always quantitatively measured, containing two or more values or categories that can vary from one individual to another individual.

Example 2.11 Fathers’ occupation of the students of EEE Department of

CUET.

Example 2.12 Income, as it varies person to person.

Example 2.13 Age, as it varies person to person.

2.3.2. Classification of variables. There are two types of variables mainly.

These are :

(i) qualitative variable and

(ii) quantitative variable.

2.3.3. Qualitative variable. A characteristic of a unit or of an item that cannot be expressed in numerical form is called a qualitative variable. It is also referred to as a categorical variable or nominal variable.



Example 2.14 Religion of the students of EEE Department.

Example 2.15 Gender

2.3.4. Quantitative variable. A characteristic of a unit or of an item that can be expressed in numerical form is called a quantitative variable. This type of variable is also known as a metric variable.

Example 2.16 Fathers’ monthly income of the students of EEE Department.

2.3.5. Classification of quantitative variables. Quantitative variables

are also divided into two categories :

(i) discrete variable and

(ii) continuous variable.

2.3.6. Discrete variable. A quantitative variable that can take only some (pre-)specified isolated values is called a discrete variable. A discrete variable does not necessarily involve only integers or whole numbers; it can also take fractional or decimal values.

Example 2.17 The grade point of the SSC exam is a discrete variable, because it does not take every number between 4.00 and 5.00 (for example 4.0123). It takes only some fixed values for the different grades, like GPA 5.00 for A+, GPA 4.00 for A, GPA 3.50 for A− and so on; a grade point such as 3.00019 never occurs.

Example 2.18 Number of road accidents per day in Chottogram.



Example 2.19 Laptop size, television size (there is no 14.234 inch television), mobile phone size, video quality (there is no video with quality 725.4, or any other value strictly between 720 and 1080) etc.

Note 2.6 A variable taking only integer values is always discrete, for example the number of students in the different Departments of CUET. On the other hand, a discrete variable need not take only integral values. For example, the CGPA of a student may be 3.71.

2.3.7. Continuous variable. A quantitative variable that can take any value within a range or limit is called a continuous variable.

Example 2.20 Height and weight of the students of EEE Department.

Note 2.7 A continuous variable can take any value between any two given values; for example, the height of a plant can be any value from 5 metres to 20 metres.

2.3.8. Constant. A numerical characteristic that never changes is called

a constant.

Example 2.21 Total angle of a triangle, velocity of light, radius of the earth,

π, e, 32 etc.

2.3.9. Attribute. The distinct categories of a qualitative variable are referred to as attributes.

Definition 2.3.2 (Attribute) [4, Definition 2.10 of Page 38] An attribute is a quality, character or characteristic ascribed to someone or something. In other words, an attribute is a property or characteristic of categorical (or qualitative) data.

Attribute data are typically used in conjunction with other forms of data to provide additional context and insight.

Example 2.22 [1] Suppose that you are looking at the sales data of a clothing store. You may use attributes like colour and size to segment the data and better understand which products are selling well and which are not.

Example 2.23 [4] If someone notes down for each individual whether he/she possesses or does not possess a certain characteristic, such as owning a laptop, smoking or not smoking, or holding an opinion on certain political issues, then these characteristics may be called attributes.

2.4. Frequency Distribution

2.4.1. Frequency and frequency distribution. Frequency means how many times an observation occurs.

Definition 2.4.1 (Frequency) [2, Page 30] The number of times a value of the variable is repeated is called the frequency of that value.

Definition 2.4.2 (Frequency distribution) [2, Page 30] A frequency distribution is a listing of a data set which divides the data into different mutually exclusive (non-overlapping) classes and gives a count of the number of observations in each class.



2.4.2. Types of frequency distribution. [2, Page 30] A frequency distribution can be constructed for both categorical and numerical data. Mainly, frequency distributions are of two types :

(i) discrete or ungrouped frequency distribution

(ii) continuous or grouped frequency distribution.

In a discrete frequency distribution, data are represented against the discrete variable. When every observed value of a variable is listed with its frequency, an ungrouped frequency distribution is formed. When the data are first grouped into classes and then organised in a frequency distribution, a grouped frequency distribution is formed.

2.5. Representation of Data

We need to classify, categorise and summarise data in a convenient and

interpretable form. We can do it by tabular and graphical procedures.

2.5.1. Methods of data representation. Largely there are two ways

to represent data. These are

(i) Tabular representation

(ii) Graphical representation.

The concept of frequency distribution is introduced as a tabular method of

summarising data. Frequency distribution can be demonstrated graphically.

2.6. Tabular Representation of Data

2.6.1. Tabular representation of data. [2, Page 28] Tabular representation of data is an orderly and logical form of information expressed in numbers. According to L. R. Connor,

Tabulation involves the orderly and systematic presentation of numerical data in a form designed to elucidate the problem under consideration.

2.6.2. Frequency table. A frequency table demonstrates a frequency distribution. Usually, a frequency distribution is presented in tabular form.

Note 2.8 Frequency distribution may also be represented graphically or by

some rules for pairing a class of observations with its frequency.

2.6.3. Construction of frequency table for categorical data. To

represent categorical data we use different types of tables like univariate table,

bi-variate table, cross table etc.

2.6.4. Univariate table for categorical data. A univariate table is one in which the frequency distribution of a single categorical variable is shown. To construct such a table the following steps are followed :

(i) Choose the categories into which the data are to be grouped.

(ii) Sort (or tally) the data into the appropriate categories.

(iii) Count the number of items falling in each category.

(iv) Display the result in a table.

Sometimes a column of percentages is added to a univariate table. Note that the percentages must add up to 100.



Example 2.24 Suppose that religion of 10 students of a specific department

is as follows :

Muslim Hindu Christian Buddhist Muslim

Muslim Christian Christian Muslim Muslim

The univariate frequency distribution table of the above data is displayed below

(see Table 2.1).

Religion     Frequency   Percentage
Muslim           5           50
Hindu            1           10
Christian        3           30
Buddhist         1           10
Total           10          100

Table 2.1.
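The tallying steps for a univariate table can be sketched in a few lines of Python. This is only an illustration using the data of Example 2.24; the function name `univariate_table` is a hypothetical helper, not something taken from the reference books.

```python
from collections import Counter

# Raw observations from Example 2.24
religions = ["Muslim", "Hindu", "Christian", "Buddhist", "Muslim",
             "Muslim", "Christian", "Christian", "Muslim", "Muslim"]

def univariate_table(values):
    """Return (category, frequency, percentage) rows for categorical data."""
    counts = Counter(values)        # tally each category
    total = sum(counts.values())    # total number of observations
    return [(cat, f, 100 * f / total) for cat, f in counts.items()]

table = univariate_table(religions)
for category, frequency, percentage in table:
    print(f"{category:<10} {frequency:>3} {percentage:>6.1f}")
```

The rows come out in the order in which each category first appears in the data, which here matches the order used in Table 2.1.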

2.6.5. Cross table. To describe the relationship between two categorical or ordinal variables a cross table is used. A cross table with r rows and c columns is referred to as an r × c table (which is read as r by c table). The entry in the i-th row and j-th column of a cross table is called the cell frequency of the (i, j)-th position of that table. In a cross table, usually, row totals, column totals and percentage comparisons are shown. The totals in the columns and rows are called the marginal frequencies. A cross table that demonstrates the relationship between qualitative variables is referred to as a contingency table.



Example 2.25 [4, Table 2.6 of page 43] A contingency table showing the relationship between education level and family size of 50 students is given below (see Table 2.2).

                         Family size
Education level   Large   Medium   Small   Row total
None                4       6        1        11
Primary             6       8        5        19
Higher              6      10        4        20
Column total       16      24       10        50

Table 2.2.
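The marginal frequencies of a cross table are just row and column sums of the cell frequencies. A minimal sketch using the cell frequencies of Table 2.2 follows; the variable names are illustrative assumptions.

```python
# Cell frequencies of Table 2.2: one list per education level,
# ordered as family size Large, Medium, Small
cells = {
    "None":    [4, 6, 1],
    "Primary": [6, 8, 5],
    "Higher":  [6, 10, 4],
}

# Marginal frequencies: sum across each row and down each column
row_totals = {level: sum(counts) for level, counts in cells.items()}
column_totals = [sum(column) for column in zip(*cells.values())]
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

Both sets of marginal frequencies must add up to the same grand total, which is a quick consistency check on any cross table.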

Note 2.9 A cross table may be of mixed type in terms of variables. One variable may be categorical and another may be numerical.

2.6.6. Multi-variate tables. A table showing the relations among three variables is referred to as a tri-variate table, and so on.

2.6.7. Construction of frequency distribution of numerical data.

To construct a frequency table with numerical data, usually we try to form an

array first.

Definition 2.6.1 [4, Definition 2.13] An array is an ordering of the values

of a variable by means of their magnitude usually in ascending order.

But organising data using an array becomes cumbersome when the number of observations is very large. So we arrange the data into a number of mutually exclusive classes. In this way we get a decent arrangement of the raw data, which is known to us as a frequency distribution.

2.6.8. Ungrouped frequency distribution. If we construct a table with each value of a variable in one column and its frequency in another column, then an ungrouped frequency distribution is formed.

Example 2.26 (Example of ungrouped frequency distribution) The numbers of family members of 10 EEE students are given below :

1 2 2 3 2
4 3 4 2 1

The ungrouped frequency distribution of the above data is given by Table 2.3.

Family size   Frequency
1                 2
2                 4
3                 2
4                 2
Total            10

Table 2.3.
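The route from raw data to an array and then to an ungrouped frequency distribution can be sketched as follows. The data are those of Example 2.26; sorting first and then counting runs of equal values is one possible approach, chosen here only to mirror the definition of an array.

```python
from itertools import groupby

# Raw data of Example 2.26
family_sizes = [1, 2, 2, 3, 2, 4, 3, 4, 2, 1]

# Step 1: form the array (values in ascending order)
array = sorted(family_sizes)

# Step 2: count each run of equal values in the array
distribution = [(value, len(list(group))) for value, group in groupby(array)]

print(array)
print(distribution)
```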

2.6.9. Grouped frequency distribution. At this stage we discuss some

terminologies which are used to form a grouped frequency distribution.

2.6.9.1. Range. Let L be the largest value and S be the smallest value in a data set. Then L − S is called the range of that data set. In other words, the range is the difference between the two extreme values of a set of observations. It is denoted by R.

2.6.9.2. Class. In the process of condensation, raw data are assigned to some chosen groups of appropriate size. Each of these groups is called a class.

The number of classes, usually denoted by k, in a grouped frequency distribution should be neither too large nor too small. It should be a reasonable and suitable figure. According to most statisticians, the number of classes should be in the range 5 to 25 ([2, Page 31]). The choice of the actual number of classes depends on the number of observations and the size of class interval desired. Sturges, a famous statistician, proposed the following formula to determine the approximate number of classes

$$k = 1 + 3.322 \log_{10} N,$$

where N is the total number of observations or total frequency. Another proposal attributed to Sturges for determining the number of classes k is as follows.

k should be the smallest whole number such that $2^k \ge N$, where N is the total number of observations.
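Both rules for choosing the number of classes are easy to evaluate by machine. The sketch below is an illustration only; the function names are hypothetical, and the second function starts its search at k = 1 on the assumption that at least one class is always wanted.

```python
import math

def classes_by_formula(n):
    """Approximate number of classes from k = 1 + 3.322 log10(N)."""
    return 1 + 3.322 * math.log10(n)

def classes_by_power_rule(n):
    """Smallest whole number k (at least 1) with 2**k >= N."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

n = 50
print(round(classes_by_formula(n), 2))
print(classes_by_power_rule(n))
```

For N = 50 the formula gives a value between 6 and 7, while the power rule gives a whole number directly, so the two rules need not agree after rounding.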

2.6.9.3. Frequency. The number of observations falling in each class is called the class frequency, or simply frequency.

2.6.9.4. Class interval. The difference between the upper limit and the lower limit of a class is called the class width or class interval of that class. It is denoted by w.

According to Sturges' approach, the formula $w = R/k$ determines the approximate class interval.

2.6.9.5. Class-mark. The class-mark is the value that lies in the middle

of the class. It is obtained by averaging two class boundaries.

2.6.10. Grouped frequency distribution for discrete data. Number of working days, number of workers, television size etc. are discrete data. The formation of a frequency distribution for such discrete data is illustrated in the following problem.

Problem 2.1 [4, Example 2.4 of Page 52] The numbers of days on which workers were absent from work due to illness are as follows.

5 8 9 9 10 10 10 10 11 11

12 12 12 13 13 13 14 14 14 15

15 15 15 16 16 16 16 17 17 17

17 18 18 18 18 18 19 19 19 19

20 21 21 22 23 24 26 27 29 33

Construct a grouped frequency distribution of the above data set.

Solution N = 50. Range R = 33 − 5 = 28. Sturges' formula gives k = 1 + 3.322 log10 50 ≈ 6.64, and we take k = 6 classes, as $2^6 = 64 \ge 50$. Class width w = 28/6.64 ≈ 4.2 ≈ 5 (rounded up to the next whole number).

Note 2.10 For discrete distributions, class limits are always inclusive in

nature.
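The solution of Problem 2.1 stops at the class width. One way to finish the tabulation is sketched below, under the assumption (not stated in the notes) that the first class starts at the smallest observation, 5, so that the inclusive classes are 5–9, 10–14 and so on. The function name is a hypothetical helper.

```python
# Data of Problem 2.1 (days absent)
absences = [
    5, 8, 9, 9, 10, 10, 10, 10, 11, 11,
    12, 12, 12, 13, 13, 13, 14, 14, 14, 15,
    15, 15, 15, 16, 16, 16, 16, 17, 17, 17,
    17, 18, 18, 18, 18, 18, 19, 19, 19, 19,
    20, 21, 21, 22, 23, 24, 26, 27, 29, 33,
]

def grouped_inclusive(data, start, width, k):
    """Count integer data in k classes with inclusive limits lower..upper."""
    table = []
    for i in range(k):
        lower = start + i * width
        upper = lower + width - 1   # inclusive upper limit
        frequency = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, frequency))
    return table

# Assumed starting point: the smallest observation
table = grouped_inclusive(absences, start=5, width=5, k=6)
for lower, upper, frequency in table:
    print(f"{lower:>2} - {upper:>2}  {frequency}")
```

A different starting point would give different class limits and frequencies, so the result should be read as one valid grouping rather than the only one.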

2.6.11. Grouped frequency distribution for continuous data. Weight, age, temperature etc. are continuous data. The formation of a frequency distribution for such continuous data is illustrated in the following problem.



$$\text{Correction factor, } F = \frac{\text{Lower limit of second class} - \text{Upper limit of first class}}{2}$$
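A common use of the correction factor is to turn inclusive class limits into class boundaries, by subtracting F from each lower limit and adding F to each upper limit. The notes do not spell this step out, so the sketch below should be read as an assumption about how F is applied; the class limits used are illustrative.

```python
def class_boundaries(limits):
    """Convert inclusive (lower, upper) class limits into class boundaries.

    Assumes the classes are in ascending order and equally spaced.
    """
    # F = (lower limit of second class - upper limit of first class) / 2
    correction = (limits[1][0] - limits[0][1]) / 2
    return [(lower - correction, upper + correction) for lower, upper in limits]

# Illustrative inclusive classes
limits = [(5, 9), (10, 14), (15, 19)]
boundaries = class_boundaries(limits)
print(boundaries)
```

After the correction, the upper boundary of each class coincides with the lower boundary of the next, which is what a histogram requires.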

Problem 2.2 [4, Example 2.5 of Page 55] The ages of 50 workers are given

as follows.

25 33 37 42 45 28 34 37 42 46

29 35 37 42 46 30 35 38 43 46

31 35 38 43 46 32 36 38 43 47

32 36 39 44 50 32 36 40 44 51

33 36 41 44 52 33 37 42 45 54

Construct a suitable frequency distribution with class width of appropriate

size.

Solution N = 50. Range R = 54 − 25 = 29. Sturges' formula gives k = 1 + 3.322 log10 50 ≈ 6.64, and we take k = 6 classes, as $2^6 = 64 \ge 50$. Class width w = 29/6 ≈ 4.83 ≈ 5 (rounded up to the next whole number).

Note 2.11 For continuous distributions, class limits are always exclusive in

nature.
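For Problem 2.2 the tabulation can be sketched with exclusive class limits, where each class contains its lower limit but not its upper limit. The starting point 25 (the smallest age) is an assumption, as is the function name.

```python
# Data of Problem 2.2 (ages of 50 workers)
ages = [
    25, 33, 37, 42, 45, 28, 34, 37, 42, 46,
    29, 35, 37, 42, 46, 30, 35, 38, 43, 46,
    31, 35, 38, 43, 46, 32, 36, 38, 43, 47,
    32, 36, 39, 44, 50, 32, 36, 40, 44, 51,
    33, 36, 41, 44, 52, 33, 37, 42, 45, 54,
]

def grouped_exclusive(data, start, width, k):
    """Count data in k classes of the form lower <= x < upper."""
    table = []
    for i in range(k):
        lower = start + i * width
        upper = lower + width       # exclusive upper limit
        frequency = sum(lower <= x < upper for x in data)
        table.append((lower, upper, frequency))
    return table

# Assumed starting point: the smallest observation
table = grouped_exclusive(ages, start=25, width=5, k=6)
for lower, upper, frequency in table:
    print(f"{lower} - {upper}  {frequency}")
```

With exclusive limits an age of exactly 30 is counted in the class 30–35 and not in 25–30, so no observation is counted twice.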

2.7. Other Forms of Frequency Distributions

2.7.1. Percentage frequency distribution. If $f_i$ is the frequency of the i-th class and N is the total frequency, then the percentage of the cases (observations) lying in the i-th class is

$$P_i = \frac{f_i}{N} \times 100.$$

Note 2.12 $\sum \left( \frac{f_i}{N} \times 100 \right) = 100.$

2.7.2. Relative frequency distribution. The relative frequency of the i-th class is $R_i = \frac{f_i}{N}$.

Note 2.13 $\sum \left( \frac{f_i}{N} \right) = 1.$
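Percentage and relative frequencies can be checked on the small distribution of Table 2.3. This is only a sketch; the dictionary layout is an illustrative choice.

```python
# Frequencies of Table 2.3: family size -> frequency
frequencies = {1: 2, 2: 4, 3: 2, 4: 2}

N = sum(frequencies.values())   # total frequency

# R_i = f_i / N and P_i = f_i / N * 100
relative = {value: f / N for value, f in frequencies.items()}
percentage = {value: 100 * f / N for value, f in frequencies.items()}

print(relative)
print(percentage)
print(sum(relative.values()), sum(percentage.values()))
```

The relative frequencies add up to 1 and the percentages to 100, apart from possible floating-point rounding in the last digits.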

2.7.3. Cumulative frequency distribution. decumulative

2.8. Graphical Representation of Data

In addition to presenting a frequency distribution in tabular form, one can present the same through graphs, diagrams or charts. Some of these graphs, diagrams or charts are as follows.

2.8.1. Graphical representation of categorical data. [4, Page 65]

To represent categorical or qualitative data the following diagrams are usually

used :

(i) Bar diagram

(ii) Stacked bar diagram

(iii) Cluster bar diagram

(iv) Pie diagram

(v) Pareto diagram

(vi) Dot plot.

Among these, bar and pie diagrams are discussed below as these are commonly used at this level.

2.8.2. Bar diagram. In a bar diagram, the frequencies of the categories are represented by rectangles separated along one of the two axes, namely the x-axis and the y-axis. It is commonly used to represent categorical data. Bar diagrams are of two types :

(i) vertical bars

(ii) horizontal bars.

Separation taken along the horizontal axis forms vertical bars, whereas separation taken along the vertical axis forms horizontal bars. A bar diagram is also known as a bar chart.

Note 2.14 The widths of the bars have no significance, but are chosen to make the chart look attractive.

2.8.3. Pictogram. A pictogram represents data by pictures or symbols, which gives it an attractive visual appearance.

2.8.4. Stacked bar chart. Also known as component bar chart or

sub-divided bar chart.

Example 2.27 [4, Example 2.19 of page 75]

2.8.5. Cluster bar chart. Also known as multiple bar diagram.

Example 2.28 [4, Table 2.21 of page 79]

2.8.6. Pie diagram. A pie diagram is useful for representing categorical data. A pie diagram consists of a circle subdivided into sectors, whose areas are proportional to the various parts into which the whole quantity is divided. Other data can also be used to construct a pie chart after suitable and meaningful classification or grouping of the data. A pie diagram is also known as a pie chart.

$$\text{angle} = \frac{\text{Percent value}}{100} \times 360^\circ$$

Note 2.15 $\sum \text{angle} = 360^\circ$.

Note 2.16 Percentage frequency is used in pie chart.

Note 2.17 [4, Page 81] The various parts of the pie chart drawn may be

identified either by angles in degrees or in percentage values.
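As an illustration of the angle formula, the sector angles for the percentages of Table 2.1 can be computed as follows. Using Table 2.1 here is a choice made for this sketch; the notes do not draw a pie chart for that table.

```python
# Percentage frequencies from Table 2.1
percentages = {"Muslim": 50, "Hindu": 10, "Christian": 30, "Buddhist": 10}

# angle = (percent value / 100) * 360 degrees
angles = {category: p * 360 / 100 for category, p in percentages.items()}

for category, angle in angles.items():
    print(f"{category:<10} {angle:>6.1f} degrees")
print(sum(angles.values()))
```

Because the percentages add up to 100, the angles add up to a full circle of 360 degrees.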

2.8.7. Graphical representation of quantitative data. Quantitative

data are of two types, namely discrete and continuous. These two types of

data are represented graphically in different ways.

2.8.8. Graphical representation of discrete data. To represent discrete data two types of lines are usually used :

(i) Dot lines

(ii) Continuous lines.

Besides these, discrete data can also be represented by bar charts [4, Page 91–92].

Example 2.29 [4, Example 2.38 of page 92].

2.8.9. Graphical representation of continuous data. [4, Page 93]

Usually in one of the following ways, we represent continuous data :

(i) Histogram

(ii) Frequency polygon

(iii) Ogive curve.

2.8.10. Histogram. This is the most commonly used graph for continuous data. In a histogram, class boundaries are taken along the horizontal axis and frequencies are taken along the vertical axis. Each class is represented by a rectangular bar with the class interval as its base and a height proportional to its frequency. In a histogram the width of a class is used, and the area of a bar represents the corresponding class frequency. The frequency per unit of class width is often referred to as the frequency density.

Note 2.18 In bar diagram, width of a bar is not significant. But in histogram

it is significant.

2.8.10.1. Histogram for equal class interval. The following problem illustrates the process of constructing a histogram for equal class intervals.

Problem 2.3 [4, Example 2.39 of page 93] House rent paid (as a percentage of total income) by 80 urban families is shown in Table 2.4.

House rent     No. of families
 4.5 –  9.5          8
 9.5 – 14.5         29
14.5 – 19.5         27
19.5 – 24.5         12
24.5 – 29.5          4

Table 2.4.

Construct a histogram using the above data.

2.8.10.2. Histogram for unequal class interval. The following example il-

lustrates the process of constructing histogram for unequal class interval.



Problem 2.4 [4, Example 2.40 of page 94] A frequency distribution (see

Table 2.5) with unequal class width is given below.

Class boundary Frequency

4.5 – 14.5 37

14.5 – 19.5 27

19.5 – 29.5 16
Table 2.5.

Construct a histogram using the above data.
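When class widths are unequal, the bar heights cannot simply be the frequencies. A minimal sketch, assuming that the frequency density of a class is its frequency divided by its class width (so that bar area equals frequency), computes the bar heights for Table 2.5. The variable names are illustrative.

```python
# Class boundaries and frequencies from Table 2.5: (lower, upper, frequency).
classes = [(4.5, 14.5, 37), (14.5, 19.5, 27), (19.5, 29.5, 16)]

# Assumed definition: frequency density = frequency / class width,
# so that bar area (width * height) equals the class frequency.
densities = []
for lower, upper, freq in classes:
    width = upper - lower
    density = freq / width
    densities.append(density)
    print(f"{lower} - {upper}: width = {width}, height = {density}")
```

With these heights the first class, although it has the largest frequency, is drawn lower than the second class because it is twice as wide.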

2.8.11. Frequency polygon. The frequency polygon is an alternative to the histogram. In a frequency polygon, the frequencies are plotted along the vertical axis against the mid-values (or class marks) of the classes along the horizontal axis. It is a closed figure. To close it, two extra classes with zero frequency are added, one before the first class and one after the last class of the given frequency distribution.

Problem 2.5 [4, Example 2.41 of page 95–96] Draw a frequency polygon for

the distribution given in Problem 2.3.

2.8.12. Cumulative frequency polygon. Cumulative frequencies are plotted along the vertical axis against the upper limits of the class intervals along the horizontal axis. The resulting curve is referred to as the ogive. There are two types of ogive curves, namely the less than type and the greater than type.

Problem 2.6 Draw an ogive curve for the distribution given in Problem 2.3.
CHAPTER 3

Central Tendency

For details the reader is referred to [4].

3.1. Central Tendency

3.1.1. Central Tendency. A measure of central tendency is a numerical index that attempts to answer the question :

What is the typical value of the observations in this distribution?

It represents the average character of the data. According to Simpson and Kafka, a measure of central tendency is a typical value around which other figures congregate. According to D. A. Lind, W. G. Marchal and R. D. Mason, a measure of central tendency is a single value that summarizes a set of data. It locates the center of the values. According to Murray R. Spiegel, an average is a value which is typical or representative of a set of data.

3.1.2. Functions or purposes or usages of central value. There are

many objectives of averaging data. Some of them are as follows :

(a) A central value is useful for describing the position or location of a set of observations or values. For example, it is very tough to memorize the weights of all the EEE students of CUET, whereas an average value expresses their overall weight.

(b) To compare two or more sets of data or series, a central value is used.

For example, to compare the height of male and female students of

CUET we can compare the averages of both groups.

(c) To express the overall condition of a set of data or observations, a central value is used. For example, if it is said that the average life expectancy of a newborn in Bangladesh is 60 years, then we easily understand that the overall lifetime of a citizen of Bangladesh is about 60 years.

(d) Averages help to obtain the picture of a complete universe by means

of sample data.

(e) In research, averages play a vital role in setting standards, making estimates and taking other managerial decisions.

(f) To trace a mathematical relationship among different groups an aver-

age becomes essential.

3.1.3. Requisites of a measure of central tendency. The requisites

of a measure of central tendency are given below :

(a) It should not be indefinite or ill-defined. It should be rigidly defined.

(b) It should be easily understandable and should have an overall general

expression.

(c) It should be based upon all the observations and can be calculated

easily.

(d) It should be the representative of the data.

(e) It should be suitable for further arithmetic and algebraic treatment.

(f) It should give a good standard for comparison.

(g) It should be less affected by sampling fluctuations.



3.1.4. Different measures of central tendency. There are several types

of average values. Among them, some most commonly used averages are

(i) Mean

(ii) Median

(iii) Mode

3.2. Mean

3.2.1. Mean and its types. One of the well-known measures of central tendency is the mean. It is of three types, namely

(i) arithmetic mean

(ii) geometric mean

(iii) harmonic mean.

3.2.2. Arithmetic mean. The arithmetic mean is often referred to simply as the mean.

Definition 3.2.1 [4, Definition 3.1 of Page 128] The arithmetic mean is an average or central value of a set of observations obtained by summing these observations and then dividing this sum by the number of such observations. The arithmetic mean $\bar{x}$ of the $n$ real numbers $x_1, x_2, x_3, \dots, x_n$ is

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}. \]

Arithmetic mean may be positive, negative or zero.

Example 3.1 If the marks of six students in mathematics are 3, 5, 9, 8, 5 and 3, then the arithmetic mean of their marks is

\[ \frac{3 + 5 + 9 + 8 + 5 + 3}{6} = \frac{33}{6} = 5.5. \]

Remark 1 (Sample mean versus population mean) Suppose that a sample of size $n$ is chosen from a population of size $N$ and $x_i$ denotes the $i$-th observation. Then the sample mean is denoted by $\bar{x}$ and defined as

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \]

whereas the population mean is denoted by $\mu$ and defined as

\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N}. \]

3.2.3. Arithmetic mean for ungrouped data. Suppose that the values $x_1, x_2, x_3, \dots, x_n$ of the variable $x$ occur respectively $f_1, f_2, f_3, \dots, f_n$ times. Then the formula for the arithmetic mean takes the form

\[ \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}, \]

or simply $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$.
A dummy table for computing arithmetic mean of ungrouped distribution

is in [4, Table 3.1 of Page 130].

Problem 3.1 [4, Example 3.2 of Page 130] A sample survey of Bangladesh

Bureau of Statistics in a rural area collected the age of first marriage (AFM)

in years of 330 newly married women. The distribution was as follows.

AFM 11 12 13 14 15 16 17 18 19 20

No. of women 17 28 37 52 70 48 36 23 11 08

Calculate the mean age at first marriage for the women in the sample.
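The frequency-weighted formula of Section 3.2.3 can be checked numerically. The following is a minimal sketch, not the worked solution of [4]; it only applies $\bar{x} = \sum f_i x_i / \sum f_i$ to the data of Problem 3.1, and the variable names are illustrative.

```python
# Data of Problem 3.1: age at first marriage (AFM) and number of women.
afm = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
women = [17, 28, 37, 52, 70, 48, 36, 23, 11, 8]

total_f = sum(women)                                  # sum of f_i
total_fx = sum(f * x for f, x in zip(women, afm))     # sum of f_i * x_i
mean_afm = total_fx / total_f

print("sum f_i     =", total_f)
print("sum f_i x_i =", total_fx)
print("mean AFM    =", round(mean_afm, 2), "years")
```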

3.2.4. Arithmetic mean for grouped data. To compute the arithmetic mean from a grouped distribution, the mid-point of each class is taken as the representative value of that class. The mid-values of the different classes are multiplied by their respective class frequencies. Then the products are added. Finally, the sum of the products is divided by the total frequency to get the required arithmetic mean. Suppose that there are $k$ classes and $y_1, y_2, y_3, \dots, y_k$ are the mid-values of the classes with frequencies $f_1, f_2, f_3, \dots, f_k$ respectively. Then the formula for the arithmetic mean takes the form

\[ \bar{y} = \frac{\sum f_i y_i}{\sum f_i}. \]

Problem 3.2 [4, Example 3.3 of Page 132] Distribution of a group of workers

by their age (in years) is as follows.

Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5

Frequency 03 09 15 12 07 04

Calculate their mean age.

Solution Let us produce the following Table 3.1.

Age (in years)   Mid-value (y_i)   Frequency (f_i)   f_i y_i
24.5 - 29.5      27                3                 81
29.5 - 34.5      32                9                 288
34.5 - 39.5      37                15                555
39.5 - 44.5      42                12                504
44.5 - 49.5      47                7                 329
49.5 - 54.5      52                4                 208
Total            –                 Σ f_i = 50        Σ f_i y_i = 1965
Table 3.1.

Therefore $\bar{y} = \dfrac{\sum f_i y_i}{\sum f_i} = \dfrac{1965}{50} = 39.3$.

Answer 39.3 years.
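The tabular computation above can be reproduced with a short script. This is a minimal sketch under the assumption that each class is represented by the mid-point of its boundaries, as described in Section 3.2.4; the variable names are illustrative.

```python
# Class boundaries and frequencies of Problem 3.2.
boundaries = [(24.5, 29.5), (29.5, 34.5), (34.5, 39.5),
              (39.5, 44.5), (44.5, 49.5), (49.5, 54.5)]
freqs = [3, 9, 15, 12, 7, 4]

# Mid-value y_i of each class.
mid_values = [(lower + upper) / 2 for lower, upper in boundaries]

# Products f_i * y_i, one per class.
products = [f * y for f, y in zip(freqs, mid_values)]

grouped_mean = sum(products) / sum(freqs)

print("f_i y_i :", products)
print("mean age:", round(grouped_mean, 2), "years")
```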

3.2.5. Short-cut method to find arithmetic mean of ungrouped data. Suppose that a set of $n$ values $x_1, x_2, x_3, \dots, x_n$ is given. If we subtract a constant value $a$ from each of the values of $x$, then we obtain a new set of values $d_1, d_2, d_3, \dots, d_n$. Then the formula to calculate the mean for the ungrouped distribution takes the form

\[ \bar{x} = a + \frac{\sum d_i}{n}, \]

where $a$ is called the origin factor, or assumed mean, or provisional mean. This formula is written as

\[ \bar{x} = a + \bar{d}, \]

where $\bar{d} = \dfrac{d_1 + d_2 + d_3 + \cdots + d_n}{n}$ and $d_i = x_i - a$.

Problem 3.3 [4, Example 3.4 of Page 133] Compute the arithmetic mean

of the values 249, 211, 447, 380, 410 and 190 by suitably changing the origin.

Solution Let the origin be a = 300. Now we compute $d_i$ as follows.

x_i             249   211   447   380   410   190
d_i = x_i − a   -51   -89   147    80   110  -110

Here $\sum d_i = 87$. So $\bar{d} = \dfrac{\sum d_i}{n} = \dfrac{87}{6} = 14.5$. Therefore

\[ \bar{x} = a + \bar{d} = 300 + 14.5 = 314.5. \]

Answer 314.5.

3.2.6. Short-cut method to find arithmetic mean of grouped data. Let $y$ be a quantitative variable taking on values $y_1, y_2, y_3, \dots, y_k$, the corresponding class frequencies being $f_1, f_2, f_3, \dots, f_k$. Suppose that $d$ is a new variable taking on the values $d_1, d_2, d_3, \dots, d_k$ such that $d_i = \dfrac{y_i - a}{h}$, where $a$ is an arbitrary value and $h$ is a common class width. Then, writing $n = \sum f_i$ for the total frequency,

\begin{align*}
y_i &= a + h d_i \\
\text{or, } f_i y_i &= a f_i + h d_i f_i \\
\text{or, } \sum f_i y_i &= a \sum f_i + h \sum f_i d_i \\
\text{or, } \sum f_i y_i &= a n + h \sum f_i d_i \\
\text{or, } \frac{\sum f_i y_i}{n} &= a + h \left( \frac{\sum f_i d_i}{n} \right) \\
\therefore \ \bar{y} &= a + h \left( \frac{\sum f_i d_i}{n} \right) = a + h \bar{d}.
\end{align*}

The above derivation proves the following theorem.

Theorem 3.2.1 Arithmetic mean is dependent on both origin and scale of

measurement.

Problem 3.4 [4, Example 3.5 of Page 134] Distribution of a group of workers

by their age (in years) is as follows.

Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5

Frequency 03 09 15 12 07 04

Calculate their mean age by short-cut method.

Solution Let us produce the following Table 3.2.

Age (in years)   Mid-value (y_i)   Frequency (f_i)   d_i = (y_i − a)/h   f_i d_i
24.5 - 29.5      27                3                 -2                  -6
29.5 - 34.5      32                9                 -1                  -9
34.5 - 39.5      37                15                0                   0
39.5 - 44.5      42                12                1                   12
44.5 - 49.5      47                7                 2                   14
49.5 - 54.5      52                4                 3                   12
Total            –                 Σ f_i = 50        –                   Σ f_i d_i = 23
Table 3.2.

Here $a = 37$, $h = 5$ and $\bar{d} = \dfrac{\sum f_i d_i}{\sum f_i} = \dfrac{23}{50} = 0.46$. Therefore

\[ \bar{y} = a + h\bar{d} = 37 + 5 \times 0.46 = 39.3. \]

Answer 39.3 years.
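The coding computation can be compared with the direct method in a few lines. This is a minimal sketch using the mid-values and frequencies of Problem 3.4, with $a = 37$ and $h = 5$ as in the solution; the variable names are illustrative.

```python
# Mid-values and frequencies of Problem 3.4.
mid_values = [27, 32, 37, 42, 47, 52]
freqs = [3, 9, 15, 12, 7, 4]

a = 37   # assumed mean (origin)
h = 5    # common class width (scale)

# Coded values d_i = (y_i - a) / h.
d = [(y - a) / h for y in mid_values]

n = sum(freqs)
d_bar = sum(f * di for f, di in zip(freqs, d)) / n

shortcut_mean = a + h * d_bar
direct_mean = sum(f * y for f, y in zip(freqs, mid_values)) / n

print("d_i           :", d)
print("d_bar         :", d_bar)
print("short-cut mean:", round(shortcut_mean, 2))
print("direct mean   :", round(direct_mean, 2))
```

Both methods give the same value, which illustrates that the choice of $a$ and $h$ affects only the amount of arithmetic, not the result.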

Note 3.1 The short-cut method is also referred to as the coding method.

Note 3.2 (Drawback of short-cut method) Short-cut method is rewarding

only when a and h are chosen wisely and h is uniform throughout the distri-

bution.

Note 3.3 There are many other means as well, for example the pooled mean and the weighted mean.

3.2.7. Properties of arithmetic mean. The next theorems state some

properties of arithmetic mean.



Theorem 3.2.2 The algebraic sum of the deviations of the values x1 , x2 , x3 ,

· · · , xn from their arithmetic mean x̄ is zero.


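Proof. We have $\bar{x} = \dfrac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n}$. So $n\bar{x} = \sum_{i=1}^{n} x_i$. Again, $\sum_{i=1}^{n} \bar{x} = n\bar{x}$. Therefore

\[ \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x} = n\bar{x} - n\bar{x} = 0. \]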

Hence the statement.

Note 3.4 The above theorem implies that the sum of the deviations from an arbitrary value other than the mean is nonzero. So the mid-value of any class of the given distribution can be taken as the assumed mean. But it is more convenient to choose as the assumed mean the mid-point of a class near the center of the distribution that has a relatively large frequency.

Theorem 3.2.3 The arithmetic mean is sensitive to extreme value.

Theorem 3.2.4 (Pooled mean or combined mean) If a set consisting of $m$ observations $x_{11}, x_{12}, x_{13}, \dots, x_{1m}$ has mean $\bar{x}_1$ and a second set consisting of $n$ observations $x_{21}, x_{22}, x_{23}, \dots, x_{2n}$ has mean $\bar{x}_2$, then the combined mean $\bar{x}_c$ of all the $m + n$ observations is

\[ \bar{x}_c = \frac{m\bar{x}_1 + n\bar{x}_2}{m + n}. \]
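A quick numerical check of the combined-mean formula is sketched below. The two data sets are hypothetical (they do not appear in [4]) and `combined_mean` is an illustrative helper name.

```python
def combined_mean(m, mean1, n, mean2):
    """Combined mean of two sets with sizes m, n and means mean1, mean2."""
    return (m * mean1 + n * mean2) / (m + n)

# Hypothetical data sets, for illustration only.
first = [2, 4, 6]
second = [10, 20]

mean1 = sum(first) / len(first)
mean2 = sum(second) / len(second)

pooled = combined_mean(len(first), mean1, len(second), mean2)

# Mean computed directly from all m + n observations.
direct = sum(first + second) / (len(first) + len(second))

print("pooled mean:", pooled)
print("direct mean:", direct)
```

Note that the pooled mean is not the simple average of the two means unless the two sets have the same size.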

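Theorem 3.2.5 (Least-square property of the mean) The sum of squared deviations from the arithmetic mean is less than the sum of squared deviations from any other value.

Proof. Let $\bar{x}$ be the arithmetic mean of a set of observations $x_1, x_2, x_3, \dots, x_n$ and $a$ be any constant. We have to show that $\sum (x_i - \bar{x})^2 \le \sum (x_i - a)^2$.

We can write that

\begin{align*}
\sum (x_i - a)^2 &= \sum (x_i - \bar{x} + \bar{x} - a)^2 = \sum \big( (x_i - \bar{x}) + (\bar{x} - a) \big)^2 \\
&= \sum \big[ (x_i - \bar{x})^2 + 2(x_i - \bar{x})(\bar{x} - a) + (\bar{x} - a)^2 \big] \\
&= \sum (x_i - \bar{x})^2 + 2(\bar{x} - a) \sum (x_i - \bar{x}) + n(\bar{x} - a)^2, \quad \text{by Theorem A.1.5} \\
&= \sum (x_i - \bar{x})^2 + n(\bar{x} - a)^2, \quad \text{as } \sum (x_i - \bar{x}) = 0 \text{ by Theorem 3.2.2.}
\end{align*}

Since $n(\bar{x} - a)^2 \ge 0$, with equality only when $a = \bar{x}$, it follows that $\sum (x_i - \bar{x})^2 \le \sum (x_i - a)^2$.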
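The identity used in the proof, $\sum (x_i - a)^2 = \sum (x_i - \bar{x})^2 + n(\bar{x} - a)^2$, can be checked numerically. The sketch below uses the marks of Example 3.1; the trial values of $a$ are arbitrary choices made for illustration.

```python
# Marks from Example 3.1.
marks = [3, 5, 9, 8, 5, 3]
n = len(marks)
mean = sum(marks) / n

def sum_sq_dev(values, a):
    """Sum of squared deviations of the values from a."""
    return sum((x - a) ** 2 for x in values)

# Theorem 3.2.2: deviations from the mean add up to zero.
print("sum of deviations from mean:", sum(x - mean for x in marks))

at_mean = sum_sq_dev(marks, mean)
for a in [4, 5, 5.5, 6, 7]:          # arbitrary trial values
    lhs = sum_sq_dev(marks, a)
    rhs = at_mean + n * (mean - a) ** 2
    print(f"a = {a}: sum (x_i - a)^2 = {lhs}, identity gives {rhs}")
```

Every trial value other than the mean gives a strictly larger sum of squares than the mean itself.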

Theorem 3.2.6 The arithmetic mean is a more stable measure of central tendency than any other measure.

Theorem 3.2.7 If a and b are constants such that x = a ± by, where x and

y are two variables assuming the values x1 , x2 , x3 , · · · , xn and y1 , y2 , y3 , · · · ,

yn respectively, then x̄ = a ± bȳ.

Theorem 3.2.8 The arithmetic mean $\bar{x}$ of the first $n$ natural numbers is $\bar{x} = \dfrac{n + 1}{2}$.

3.2.8. Weighted mean. The weighted arithmetic mean or simply weighted mean, denoted by $\bar{x}_w$, of a set of observations $x_1, x_2, x_3, \dots, x_n$ with respective weights $w_1, w_2, w_3, \dots, w_n$ is defined as

\[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i}. \]

3.2.9. Geometric mean. The geometric mean of $n$ positive values $x_1, x_2, x_3, \dots, x_n$ is denoted by $G$ and defined as

\[ G = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{\frac{1}{n}}. \]

Note 3.5 [4, Page 182] The geometric mean is used with numbers that tend to increase or decrease geometrically rather than arithmetically, that is, when each number is the same multiple of the preceding number.

3.2.10. Harmonic mean. The harmonic mean, denoted by $H$, of $n$ values $x_1, x_2, x_3, \dots, x_n$ is defined as

\[ H = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dfrac{1}{x_3} + \cdots + \dfrac{1}{x_n}} = \frac{n}{\sum \dfrac{1}{x_i}}. \]
Note 3.6 [4, Page 186] Harmonic mean is used when rates are expressed as

x per y and x is a constant. For example, miles per hour, production per acre,

income per household etc.
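The two definitions above translate directly into code. The following is a minimal sketch with hypothetical values that grow geometrically (each is twice the preceding one); `geometric_mean` and `harmonic_mean` are illustrative helper names, and `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

def harmonic_mean(values):
    """n divided by the sum of the reciprocals of n nonzero values."""
    return len(values) / sum(1 / x for x in values)

# Hypothetical data, for illustration only.
data = [2, 4, 8]

g = geometric_mean(data)
h_mean = harmonic_mean(data)

print("geometric mean:", round(g, 4))
print("harmonic mean :", round(h_mean, 4))
```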

3.3. Median

3.3.1. Median. Like the mean, the median is also a measure of central tendency. In general, the median is the middlemost observation when the observations or set of values of a particular study are arranged in ascending or descending order of magnitude.

Definition 3.3.1 [4, definition 3.3 of Page 145] The median, denoted by $\tilde{m}$, is the value in a set of ordered observations that divides the whole set of observations into two parts of equal size.

The median is sometimes referred to as the positional average, as it lies in the middle of the data set after the values in the set have been placed in order.

3.3.2. Properties of median. Some properties of median are given be-

low.

(a) The median is unique.

(b) It is unaffected by the extremely large or small values.

(c) The median can be computed from distributions with open-ended classes, for example, less than 10, 25 or more, etc.

(d) Unlike the mean, the median can be obtained for all levels of data

except the nominal.

3.3.3. Disadvantages of median. Some disadvantages of median are

given below.

(a) An overall pooled median can not be obtained from a set of medians.

(b) Medians are less stable.

(c) Median can not be calculated for the nominal data.

3.3.4. Median for ungrouped data. To find the median for ungrouped data, the given data set should first be arranged in ascending or descending order.

Let $x_1, x_2, x_3, \dots, x_n$ be a data set with $n$ observations. Suppose that they are arranged in order of magnitude. Then the median $\tilde{m}$ is given by the formula

\[
\tilde{m} =
\begin{cases}
\left( \dfrac{n+1}{2} \right)\text{th observation}, & \text{if } n \text{ is odd} \\[2ex]
\dfrac{\left( \dfrac{n}{2} \right)\text{th observation} + \left( \dfrac{n}{2} + 1 \right)\text{th observation}}{2}, & \text{if } n \text{ is even.}
\end{cases}
\]

Example 3.2 Suppose that we are given the ages (in years) of 7 boys as 7,

10, 5, 8, 4, 11 and 6. Arranging them in ascending order,

4, 5, 6, 7, 8, 10, 11.

Here 7 is lying in the middle of the ordered data set. So 7 is the median.

Example 3.3 Suppose that we are given the ages (in years) of eight boys as

7, 10, 5, 8, 4, 11, 12 and 6. Arranging them in ascending order,

4, 5, 6, 7, 8, 10, 11, 12.

Here two values, namely 7 and 8, are lying in the middle of the ordered data set. So here the median is $\dfrac{7 + 8}{2} = 7.5$.
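The odd and even cases of the median formula can be written as one small function. This is a minimal sketch applied to the ages of Examples 3.2 and 3.3; `median` is an illustrative helper name, and the positions in the formula are 1-based while Python lists are 0-based.

```python
def median(values):
    """Median of ungrouped data by the odd/even position rule."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n % 2 == 1:
        # (n + 1)/2-th observation; subtract 1 for 0-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # Average of the (n/2)-th and (n/2 + 1)-th observations.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

ages_odd = [7, 10, 5, 8, 4, 11, 6]          # Example 3.2
ages_even = [7, 10, 5, 8, 4, 11, 12, 6]     # Example 3.3

print("median (7 boys):", median(ages_odd))
print("median (8 boys):", median(ages_even))
```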

3.3.5. Median for ungrouped distribution. If discrete frequency dis-

tribution is given in ungrouped form, then we compute the median following

the method illustrated in the problem below.

Problem 3.5 [4, Example 3.17] Result of a survey conducted among 100 families to know their family size produces the following distribution (see Table 3.3).

Family size          1   2   3    4    5    6    7    8    9
Number of families   2   6   12   18   19   15   11   11   6
Table 3.3.

Calculate the median family size.

Solution Let us construct the Table 3.4 using cumulative frequency.

Family size                  1   2   3    4    5    6    7    8    9
Frequency, f_i               2   6   12   18   19   15   11   11   6
Cumulative Frequency, F_i    2   8   20   38   57   72   83   94   100
Table 3.4.

Here the location of the median is the

\[ \frac{1(100 + 1)}{2} = 50.5\text{-th ordered position.} \]

But 50.5 is not an integer. So the required median is

\begin{align*}
\tilde{m} &= 50\text{-th observation} + 0.5\,(51\text{-th observation} - 50\text{-th observation}) \\
&= 5 + 0.5(5 - 5) = 5.
\end{align*}

Alternative way. Since $n = 100$ is even, the median is

\begin{align*}
\tilde{m} &= \frac{\left( \frac{100}{2} \right)\text{th observation} + \left( \frac{100}{2} + 1 \right)\text{th observation}}{2} \\
&= \frac{50\text{-th observation} + 51\text{-th observation}}{2} \\
&= \frac{5 + 5}{2} = 5.
\end{align*}
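The look-up of the 50-th and 51-st observations through cumulative frequencies can be sketched as follows. This is an illustration of the method of Problem 3.5, not code from [4]; `observation` is a hypothetical helper that returns the $k$-th ordered value.

```python
from itertools import accumulate

# Table 3.3: family sizes and number of families.
sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
freqs = [2, 6, 12, 18, 19, 15, 11, 11, 6]

# Cumulative frequencies F_i, as in Table 3.4.
cum = list(accumulate(freqs))

def observation(k):
    """Return the k-th ordered observation (1-based)."""
    for size, big_f in zip(sizes, cum):
        if k <= big_f:
            return size
    raise ValueError("k exceeds the total frequency")

n = cum[-1]
location = (n + 1) / 2            # 50.5-th ordered position
lower = int(location)             # 50
fraction = location - lower       # 0.5

median_size = observation(lower) + fraction * (
    observation(lower + 1) - observation(lower)
)

print("cumulative frequencies:", cum)
print("median family size    :", median_size)
```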

3.3.6. Median for grouped distribution. Let $n$ be the total frequency. The class in which the $\dfrac{n}{2}$-th observation lies is called the median class. To calculate the median $\tilde{m}$ the following theorem is used.

Theorem 3.3.1 Let $n$ be the total frequency. Then the median

\[ \tilde{m} = l_m + \frac{h}{f_m} \left( \frac{n}{2} - F_{(m)-1} \right), \]

where

lm = lower limit of the median class,

fm = the frequency of the median class,

F(m)−1 = cumulative frequency of the pre-median class and

h = width of the median class.

Proof. See [4, Page 151–153].

Problem 3.6 [4, Example 3.19 of Page 154] Distribution of a group of work-

ers by their age (in years) is as follows (see Table 3.5).

Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5

Frequency 03 09 15 12 07 04

Table 3.5.

Calculate their median age.

Solution Let us produce the following Table 3.6.

Age (in years)   Frequency (f_i)   Cumulative frequency F_i
24.5 - 29.5      3                 3
29.5 - 34.5      9                 12
34.5 - 39.5      15                27
39.5 - 44.5      12                39
44.5 - 49.5      7                 46
49.5 - 54.5      4                 50
Total            50                –
Table 3.6.

Here $n = 50$. So $\tilde{m}$ lies in the $\dfrac{n}{2} = \dfrac{50}{2} = 25$-th ordered position. Thus 34.5 − 39.5 is the median class. Hence $l_m = 34.5$, $h = 5$, $F_{(m)-1} = 12$ and $f_m = 15$. Therefore

\[ \tilde{m} = l_m + \frac{h}{f_m} \left( \frac{n}{2} - F_{(m)-1} \right) = 34.5 + \frac{5}{15}(25 - 12) = 38.83. \]

Answer 38.83 years.
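The steps of Theorem 3.3.1 (locate the median class, then interpolate inside it) can be sketched in code. The snippet assumes a common class width $h = 5$ and uses the data of Problem 3.6; the variable names are illustrative.

```python
from itertools import accumulate

# Lower class boundaries and frequencies of Table 3.5.
lower_bounds = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
freqs = [3, 9, 15, 12, 7, 4]
h = 5                                  # common class width

n = sum(freqs)
cum = list(accumulate(freqs))          # cumulative frequencies F_i

# Median class: first class whose cumulative frequency reaches n/2.
m = next(i for i, big_f in enumerate(cum) if big_f >= n / 2)

# Cumulative frequency of the pre-median class (0 if there is none).
f_prev = cum[m - 1] if m > 0 else 0

median_age = lower_bounds[m] + h / freqs[m] * (n / 2 - f_prev)

print("median class lower limit:", lower_bounds[m])
print("median age              :", round(median_age, 2), "years")
```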

Problem 3.7 [4, Example 3.18] The rate of sales tax as a percentage of the

sales paid by 400 shopkeepers of a market during an assessment year ranged

from 0% to 25%. The distribution (see Table 3.7) of the tax payers by sales

tax paid is summarised in the form of a frequency distribution below taking

intervals of 5%.

Sales tax Frequency

00 – 05 75

05 – 10 128

10 – 15 100

15 – 20 68

20 – 25 29
Table 3.7.

Compute the median value.

Problem 3.8 [4, Page 151–152] The longevity (in years) of 40 rats as obtained in an experimental setup is given below (see Table 3.8).

Life lengths Number of rats

1.45 – 1.95 2

1.95 – 2.45 1

2.45 – 2.95 4

2.95 – 3.45 15

3.45 – 3.95 10

3.95 – 4.45 5

4.45 – 4.95 3
Table 3.8. Longevity of rats’ life in years

Calculate median longevity of the rats.

3.3.7. Locating median graphically. A histogram can be used to locate the median graphically. For details, go through [4, Page 155 – 156]. The median can also be located graphically by means of an ogive (see [4, Page 157]).

3.4. Quartiles, Percentiles and Deciles

3.4.1. Quartiles. Quartiles are three values of a given data set or distribution, usually denoted by $Q_1$, $Q_2$ and $Q_3$, which divide the whole data into 4 equal parts. Among these

(i) the first quartile, namely $Q_1$, is the value below which 25% of all observations lie and above which the remaining 75% of all observations lie.

(ii) the second quartile, namely $Q_2$, is the value below which 50% of all observations lie and above which the remaining 50% of all observations lie.

(iii) the third quartile, namely $Q_3$, is the value below which 75% of all observations lie and above which the remaining 25% of all observations lie.

Clearly, the second quartile is identical with the median.

3.4.1.1. Quartiles for ungrouped data. The point of location for the $r$-th quartile value $Q_r$ is the $L_r = \dfrac{r(n + 1)}{4}$-th ordered position. If

(i) $L_r$ is an integer, then the particular numerical observation corresponding to that point is chosen for the corresponding quartile.

(ii) $L_r$ is not an integer, then the corresponding quartile is determined by a rule, which is clarified by an example. Suppose that $L_r = 5.25$. Let $V_5$ = 5-th ordered value and $V_6$ = 6-th ordered value. Then the quartile $Q_r = V_5 + 0.25(V_6 - V_5)$.

Problem 3.9 [4, Example 3.23 of Page 161] Calculate Q1 , Q2 and Q3 for

the following ordered observations :14, 17, 19, 23, 27, 32, 40, 49, 54, 59, 71,

80.
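The location rule of Section 3.4.1.1 can be sketched as a small function. This is an illustration of the rule applied to the data of Problem 3.9, not the worked solution of [4]; `quantile_by_position` is a hypothetical helper, and it assumes that the location $L_r$ falls between 1 and $n$.

```python
def quantile_by_position(ordered, r, parts):
    """r-th of `parts` quantiles by the L_r = r(n + 1)/parts rule.

    `ordered` must already be sorted in ascending order.
    Assumes 1 <= L_r <= n (no extrapolation beyond the data).
    """
    n = len(ordered)
    location = r * (n + 1) / parts
    k = int(location)                 # integer part of L_r
    fraction = location - k           # fractional part of L_r
    if fraction == 0:
        return ordered[k - 1]         # k-th ordered value (1-based)
    # V_k + fraction * (V_{k+1} - V_k)
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])

# Ordered observations of Problem 3.9.
data = [14, 17, 19, 23, 27, 32, 40, 49, 54, 59, 71, 80]

q1, q2, q3 = (quantile_by_position(data, r, 4) for r in (1, 2, 3))
print("Q1 =", q1, " Q2 =", q2, " Q3 =", q3)
```

The `parts` argument is 4 here because the quartiles divide the data into 4 equal parts.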

3.4.1.2. Quartiles for grouped data. The $r$-th quartile of a grouped distribution is

\[ Q_r = l_r + \frac{h}{f_r} \left( L_r - F_{(r)-1} \right) \quad \text{for } r = 1, 2, 3, \]

where $L_r = \dfrac{r(n + 1)}{4}$ is the location of the $r$-th quartile, $F_{(r)-1}$ is the cumulative frequency of the class prior to the $r$-th quartile class, $f_r$ is the frequency of the $r$-th quartile class, $l_r$ is the lower limit of the $r$-th quartile class and $h$ is the width of that class.

Problem 3.10 [4, Example 3.25 of Page 163] Compute the first and third

quartile from the following age distribution (see Table 3.9) of the 50 workers

of a company.

Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5

Frequency 03 09 15 12 07 04

Table 3.9.

Solution Construct the following Table 3.10 using cumulative frequency.

Age (in years) Frequency (fi ) Cumulative frequency Fi

24.5 - 29.5 3 3

29.5 - 34.5 9 12

34.5 - 39.5 15 27

39.5 - 44.5 12 39

44.5 - 49.5 7 46

49.5 - 54.5 4 50

Total 50 –
Table 3.10.

Here $n = 50$. We have for the first quartile

\[ Q_1 = l_1 + \frac{h}{f_1} \left( L_1 - F_{(1)-1} \right), \]

and $L_1 = \dfrac{1(50 + 1)}{4} = 12.75$. Observe that 12.75 lies in the class 34.5 − 39.5. So $l_1 = 34.5$, $h = 5$, $f_1 = 15$ and $F_{(1)-1} = 12$. Hence

\[ Q_1 = 34.5 + \frac{5}{15}(12.75 - 12) = 34.75. \]

We have for the third quartile

\[ Q_3 = l_3 + \frac{h}{f_3} \left( L_3 - F_{(3)-1} \right), \]

and $L_3 = \dfrac{3(50 + 1)}{4} = 38.25$. Observe that 38.25 lies in the class 39.5 − 44.5. So $l_3 = 39.5$, $h = 5$, $f_3 = 12$ and $F_{(3)-1} = 27$. Therefore

\[ Q_3 = 39.5 + \frac{5}{12}(38.25 - 27) = 44.19. \]
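The two quartile computations above follow the same pattern, which can be sketched as one function. The snippet assumes a common class width $h = 5$ and the location rule $L_r = r(n + 1)/4$ used in these notes; `grouped_quantile` is an illustrative helper name.

```python
from itertools import accumulate

# Lower class boundaries and frequencies of Table 3.9.
lower_bounds = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
freqs = [3, 9, 15, 12, 7, 4]
h = 5                                   # common class width

def grouped_quantile(r, parts):
    """r-th of `parts` quantiles of the grouped distribution above."""
    n = sum(freqs)
    location = r * (n + 1) / parts      # L_r
    cum = list(accumulate(freqs))       # cumulative frequencies
    for i, big_f in enumerate(cum):
        if location <= big_f:           # class containing L_r
            previous = cum[i - 1] if i > 0 else 0
            return lower_bounds[i] + h / freqs[i] * (location - previous)
    raise ValueError("location lies beyond the last class")

q1 = grouped_quantile(1, 4)
q3 = grouped_quantile(3, 4)
print("Q1 =", round(q1, 2), " Q3 =", round(q3, 2))
```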

3.4.2. Locating quartiles geometrically. We can locate the quartiles using an ogive. For details, see [4, Page 164 – 165].

3.4.3. Percentiles. Percentiles are 99 values of a given data set or distribution, usually denoted by $P_1, P_2, P_3, \dots, P_{99}$, which divide the whole data into 100 equal parts. Among these

(i) $P_{25}$ is the first quartile $Q_1$.

(ii) $P_{50}$ is the second quartile $Q_2$, that is, the median.

(iii) $P_{75}$ is the third quartile $Q_3$.

Clearly, the 50th percentile is identical with the median.

3.4.3.1. Percentiles for ungrouped data. The point of location for the $r$-th percentile value $P_r$ is the $L_r = \dfrac{r(n + 1)}{100}$-th ordered position. If

(i) $L_r$ is an integer, then the particular numerical observation corresponding to that point is chosen for the corresponding percentile.

(ii) $L_r$ is not an integer, then the corresponding percentile is determined by a rule, which is clarified by an example. Suppose that $L_r = 5.75$. Let $V_5$ = 5-th ordered value and $V_6$ = 6-th ordered value. Then the percentile $P_r = V_5 + 0.75(V_6 - V_5)$.

Problem 3.11 [4, Example 3.27 of Page 166] Calculate 29th and 75th per-

centiles for the following ordered observations : 11, 14, 17, 23, 27, 32, 40, 49,

54, 59, 71, 80.

3.4.3.2. Percentiles for grouped data. The $r$-th percentile of a grouped distribution is

\[ P_r = l_r + \frac{h}{f_r} \left( L_r - F_{(r)-1} \right) \quad \text{for } r = 1, 2, 3, \dots, 99, \]

where $L_r = \dfrac{r(n + 1)}{100}$ is the location of the $r$-th percentile, $F_{(r)-1}$ is the cumulative frequency of the class prior to the $r$-th percentile class, $f_r$ is the frequency of the $r$-th percentile class, $l_r$ is the lower limit of the $r$-th percentile class and $h$ is the width of that class.

Problem 3.12 [4, Example 3.28 of Page 167] Compute the 30th percentile

from the following age distribution (see Table 3.11) of the 50 workers of a

company.

Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5

Frequency 03 09 15 12 07 04

Table 3.11.

3.4.4. Deciles. Deciles are 9 values of a given data set or distribution, usually denoted by $D_1, D_2, D_3, \dots, D_9$, which divide the whole data into 10 equal parts. $D_5 = P_{50} = Q_2$.

3.4.4.1. Deciles for ungrouped data. The point of location for the $r$-th decile value $D_r$ is the $L_r = \dfrac{r(n + 1)}{10}$-th ordered position. If

(i) Lr is an integer, then the particular numerical observation correspond-

ing to that point is chosen for the corresponding decile.

(ii) Lr is not an integer, then the corresponding decile is determined by a

rule, which is clarified by an example. Suppose that Lr = 5.77. Let

V5 = 5-th ordered value and V6 = 6-th ordered value. Then the decile

Dr = V5 + 0.77(V6 − V5 ).

Problem 3.13 [4, Example 3.30 of Page 168] Calculate 4th decile for the

following ordered observations : 14, 17, 23, 27, 32, 40, 49, 54, 59, 71, 80.

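3.4.4.2. Deciles for grouped data. The $r$-th decile of a grouped distribution is

\[ D_r = l_r + \frac{h}{f_r} \left( L_r - F_{(r)-1} \right) \quad \text{for } r = 1, 2, 3, \dots, 9, \]

where $L_r = \dfrac{r(n + 1)}{10}$ is the location of the $r$-th decile, $F_{(r)-1}$ is the cumulative frequency of the class prior to the $r$-th decile class, $f_r$ is the frequency of the $r$-th decile class, $l_r$ is the lower limit of the $r$-th decile class and $h$ is the width of that class.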

Problem 3.14 [4, Example 3.31 of Page 169] Age distribution of 50 workers

of a company is given in the Table 3.12.

Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5

Frequency 03 09 15 12 07 04

Table 3.12.

Compute the 4th decile of the given distribution.



3.5. Mode

3.5.1. Mode. Like the mean and the median, the mode is also a measure of central tendency.

Definition 3.5.1 [4, definition 3.4 of Page 170] The mode is denoted by $M_0$ and defined as the most frequently occurring value in a set of observations.

3.5.2. Properties of mode. Some properties of mode are given below.

(a) It is useful for non-quantitative data.

(b) It is not affected by extreme values.

(c) It can be calculated even when there are open-ended class intervals.

3.5.3. Mode for grouped distribution. The class in which mode lies

is called the modal class.

To calculate mode the following theorem is used.

Theorem 3.5.1 The mode of a grouped distribution is calculated by the formula

\[ M_0 = l_0 + h \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right), \]

where

$l_0$ = lower limit of the modal class,

$\Delta_1$ = frequency difference between the modal and pre-modal classes,

$\Delta_2$ = frequency difference between the modal and post-modal classes,

$h$ = width of the modal class.

Proof. See [4, Section 3.6.2].


Definition 3.5.2 If a series of observations has more than one mode, then the mode is said to be ill-defined.

Problem 3.15 [4, Example 3.34] Age distribution of some workers of a com-

pany is given in the Table 3.13.

Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5

Frequency 03 09 15 12 07 04

Table 3.13.

Compute the modal age.

Solution Here 34.5 − 39.5 is the modal class. So $l_0 = 34.5$, $\Delta_1 = 15 - 9 = 6$, $\Delta_2 = 15 - 12 = 3$ and $h = 5$. So the mode

\[ M_0 = 34.5 + 5 \left( \frac{6}{6 + 3} \right) = 37.83. \]
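The formula of Theorem 3.5.1 can be sketched in code for the data of Problem 3.15. The snippet assumes a common class width $h = 5$ and a single modal class; treating a missing neighbouring class as having zero frequency is an assumption of this sketch, not a statement from [4].

```python
# Lower class boundaries and frequencies of Table 3.13.
lower_bounds = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
freqs = [3, 9, 15, 12, 7, 4]
h = 5                                   # common class width

# Modal class: the class with the largest frequency.
m = max(range(len(freqs)), key=freqs.__getitem__)

# Neighbouring frequencies (assumed 0 if the modal class is at an end).
f_before = freqs[m - 1] if m > 0 else 0
f_after = freqs[m + 1] if m < len(freqs) - 1 else 0

delta1 = freqs[m] - f_before            # modal minus pre-modal
delta2 = freqs[m] - f_after             # modal minus post-modal

mode = lower_bounds[m] + h * delta1 / (delta1 + delta2)

print("modal class lower limit:", lower_bounds[m])
print("delta1 =", delta1, " delta2 =", delta2)
print("mode =", round(mode, 2), "years")
```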

3.5.4. Locating mode graphically. The mode can be located graphically by using a histogram. For details, see [4, Page 171 – 176].

3.6. Comparison and Relationship between the Averages

3.6.1. Comparing different averages. See [4, Page 191 – 194]

3.6.2. Relationship between averages. See [4, Page 194 – 200]


CHAPTER 4

Measure of Dispersion

The literal meaning of dispersion is the scatteredness of a distribution. By dispersion, we mean how the observations or values of a data set are scattered, varied or dispersed. For a detailed study the reader is referred to [4, Chapter 4], and for more problems to [3, Chapter 25].

4.1. What is Measure of Dispersion

4.1.1. Measure of dispersion. Measures of central tendency fail to give any idea as to how the individual values differ from the central value, more specifically, whether they are closely packed around the central value or widely scattered away from it. The magnitude of such variation of the individual values from the central value is referred to as dispersion.

Definition 4.1.1 [2, Page 121] The distance of different individual values from the central value is called the dispersion.

According to Spiegel, the degree to which numerical data tend to spread about an average value is called the variation or dispersion of the data.

4.1.2. Interpretation of the dispersion of a distribution. The dispersion of a data set or distribution may be small, large or zero. These cases are interpreted as follows.

(i) small dispersion indicates high uniformity of the observations in the distribution.

(ii) large dispersion indicates low uniformity of the observations in the distribution.

(iii) zero (or absent) dispersion indicates the perfect uniformity of the ob-

servations in the distribution.

When all observations of a distribution are identical, then the dispersion be-

comes zero. In this situation, description of any single observation is sufficient

to know about the distribution. In real life situations it happens rarely.

4.1.3. Purpose of the measure of dispersion. It serves two purposes :

(i) To characterise a frequency distribution.

(ii) To compare between two or more frequency distributions.

4.1.4. Necessity to know the degree of dispersion of a data set.

To summarise a large data set, that is, to locate the center of data set or

distribution, we use the measure of central tendency. If there is any degree

of variation among the observations, that is, when all the observations are

not identical, then to get the precise descriptive summary of a data set both

central tendency and dispersion become necessary to know. By describing the

measure of central tendency and measure of dispersion, we can describe almost

all distributions with a reasonable degree of accuracy.

Knowing only the measure of central tendency may not be sufficient to compare two distributions. Two different distributions may have exactly the same average, but they may differ in variability. For example, suppose that three students have obtained the following marks (see Table 4.1) in Statistics, Physics and Chemistry.

Serial Statistics Physics Chemistry Average Range

1 61 49 40 50 21

2 52 53 45 50 8

3 51 50 49 50 2
Table 4.1.

In Table 4.1 the three distributions are not identical, but their averages are the same. The differences lie in the dispersion of their scores. The first student shows the largest variation in his scores. The second student shows relatively less variation, while the third student shows the least variation among the three students. The scores of the third student in the three subjects are very close to one another.

In conclusion, although the measures of central tendency exhibit one of the important characteristics of a distribution, these averages do not describe how the values or observations of a data set are spread out or scattered among themselves. So we need the measures of dispersion.

4.2. Classification of Measures of Dispersions

4.2.1. Types of dispersions. Dispersion is nothing but the variation of the items around an average. Measures of dispersion are mainly of two categories :

(i) absolute measures of dispersions

(ii) relative measures of dispersions.



4.2.2. Absolute measures of dispersions. In this case, dispersions are

measured in original units. This type of measure of dispersion includes the

following categories :

(i) range

(ii) quartile deviation

(iii) mean or average deviation

(iv) variance

(v) standard deviation.

4.2.3. Relative measures of dispersions. When two or more data sets

are expressed in different units, then relative measure of dispersion is used.

This type of measure of dispersion includes the following categories :

(i) coefficient of range

(ii) coefficient of quartile deviation

(iii) coefficient of mean deviation

(iv) coefficient of variation.

Some important measures of dispersions are detailed below.

4.3. Range and Coefficient of It

4.3.1. Range. Difference between two extreme values of a set of observa-

tions is called the range. It is denoted by R.

Note 4.1 Range is always nonnegative, it can not be negative.



4.3.1.1. Range for ungrouped data. In case of ungrouped data, if the highest

value and lowest value of a given data set are H and L respectively, then its

range is R = H − L .

Example 4.1 Range of the data set 5, 4, 3, 6, 3, 7 is R = 7 − 3 = 4.

4.3.1.2. Range for grouped data. In case of grouped data, the range is the difference between the upper boundary of the highest class and the lower boundary of the lowest class.

Class boundary Frequency

4.5 – 14.5 37

14.5 – 24.5 27

24.5 – 34.5 16
Table 4.2.

Example 4.2 Range of the above grouped distribution (see Table 4.2) is

R = 34.5 − 4.5 = 30.

4.3.2. Coefficient of range. The coefficient of range is denoted by CR and defined as $CR = \dfrac{R}{H + L} \times 100\%$.

Example 4.3 The coefficient of range of the data set given in Example 4.1 is $CR = \dfrac{4}{7 + 3} \times 100\% = 40\%$.
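The two definitions above can be checked with a few lines of Python. This is only a minimal sketch for ungrouped data; the function names `data_range` and `coefficient_of_range` are illustrative choices, not notation from these notes.

```python
def data_range(values):
    """Return R = H - L, the difference between the extreme values."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Return CR = R / (H + L) * 100, expressed in percent."""
    high, low = max(values), min(values)
    return (high - low) / (high + low) * 100


data = [5, 4, 3, 6, 3, 7]                    # data set of Example 4.1
print(data_range(data))                      # 4
print(round(coefficient_of_range(data), 2))  # 40.0
```

For grouped data the same formulas apply with H and L taken as the upper boundary of the highest class and the lower boundary of the lowest class.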

4.4. Mean Deviation and Coefficient of It

4.4.1. Mean (or average) deviation. Mean deviations are defined for

two types of data - ungrouped and grouped.



4.4.1.1. Mean deviation for ungrouped data. Let $x_1, x_2, x_3, \cdots, x_n$ form a sample of observations. Then the mean deviation about any arbitrary value $a$ is $M_d(a) = \dfrac{\sum |x_i - a|}{n}$. If we replace $a$ by

(i) the mean $\bar{x}$, then we get $M_d(\bar{x}) = \dfrac{\sum |x_i - \bar{x}|}{n}$, which is called the mean deviation about the mean.

(ii) the median $\tilde{m}$, then we get $M_d(\tilde{m}) = \dfrac{\sum |x_i - \tilde{m}|}{n}$, which is called the mean deviation about the median.

(iii) the mode $M_0$, then we get $M_d(M_0) = \dfrac{\sum |x_i - M_0|}{n}$, which is called the mean deviation about the mode.

Problem 4.1 [6, Episode 10] Calculate the mean deviation of the data set

given below.

1, 2, 3, 4, 5, 6, 7.

Solution $\bar{x} = \dfrac{1+2+3+4+5+6+7}{7} = 4$.

$\sum |x_i - \bar{x}| = 3 + 2 + 1 + 0 + 1 + 2 + 3 = 12$.

Therefore $M_d(\bar{x}) = \dfrac{\sum |x_i - \bar{x}|}{n} = \dfrac{12}{7} = 1.714$.
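The calculation in Problem 4.1 can be reproduced with a short Python sketch. The helper name `mean_deviation` is an illustrative assumption; it accepts any central value, so the same function covers the mean deviation about the mean, the median or an arbitrary value $a$.

```python
from statistics import mean, median


def mean_deviation(values, about):
    """Return sum(|x_i - about|) / n for ungrouped data."""
    return sum(abs(x - about) for x in values) / len(values)


data = [1, 2, 3, 4, 5, 6, 7]                          # data set of Problem 4.1
print(round(mean_deviation(data, mean(data)), 3))    # 1.714
print(round(mean_deviation(data, median(data)), 3))  # 1.714 (mean and median coincide here)
```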

Problem 4.2 [4, Example 4.1 of Page 210 – 211]

Food for thought 4.1 Why is the absolute value of $(x_i - \bar{x})$ taken in the definition of mean deviation?

4.4.1.2. Mean deviation for grouped data. Let $x_1, x_2, x_3, \cdots, x_k$ with frequencies $f_1, f_2, f_3, \cdots, f_k$ respectively form a grouped sample of observations, and let $n = \sum f_i$ be the total frequency. Then the mean deviation about any arbitrary value $a$ is $M_d(a) = \dfrac{\sum f_i |x_i - a|}{n}$. If we replace $a$ by

(i) the mean $\bar{x}$, then we get $M_d(\bar{x}) = \dfrac{\sum f_i |x_i - \bar{x}|}{n}$, which is called the mean deviation about the mean.

(ii) the median $\tilde{m}$, then we get $M_d(\tilde{m}) = \dfrac{\sum f_i |x_i - \tilde{m}|}{n}$, which is called the mean deviation about the median.

(iii) the mode $M_0$, then we get $M_d(M_0) = \dfrac{\sum f_i |x_i - M_0|}{n}$, which is called the mean deviation about the mode.

Note 4.2 [4, Page 213] Among $M_d(\bar{x})$, $M_d(\tilde{m})$ and $M_d(a)$ for an arbitrary value $a$, the mean deviation about the median, $M_d(\tilde{m})$, is the smallest.

Problem 4.3 [4, Example 4.2] Age distribution of 50 workers of a company

is given in Table 4.3.

Age 24.5 - 29.5 29.5 - 34.5 34.5-39.5 39.5 - 44.5 44.5 - 49.5 49.5 - 54.5

Frequency 03 09 15 12 07 04

Table 4.3.

Compute the mean deviations about the

(a) mean

(b) median

(c) an arbitrary value 42

of the age distribution of the group of workers.

Solution Construct the following Table 4.4 and calculate the mean x̄.

Age (in years) Mid-value (xi ) Frequency (fi ) fi xi |xi − x̄| fi |xi − x̄|

24.5 - 29.5 27 3 81 12.3 36.9

29.5 - 34.5 32 9 288 7.3 65.7

34.5 - 39.5 37 15 555 2.3 34.5

39.5 - 44.5 42 12 504 2.7 32.4

44.5 - 49.5 47 7 329 7.7 53.9

49.5 - 54.5 52 4 208 12.7 50.8

Total – 50 1965 – 274.2


Table 4.4.

(a) Clearly, $\bar{x} = \dfrac{\sum f_i x_i}{n} = \dfrac{1965}{50} = 39.3$. Therefore from Table 4.4 we obtain $M_d(\bar{x}) = \dfrac{\sum f_i |x_i - \bar{x}|}{n} = \dfrac{274.2}{50} = 5.48$.

(b) From Problem 3.6, we get the median $\tilde{m} = 38.83$. By constructing an appropriate table we can calculate $\sum f_i |x_i - \tilde{m}|$. Here

$\sum f_i |x_i - \tilde{m}| = \sum f_i |x_i - 38.83| = 272.32$.

Therefore $M_d(\tilde{m}) = \dfrac{\sum f_i |x_i - 38.83|}{50} = \dfrac{272.32}{50} = 5.45$.

(c) By constructing a convenient table we can calculate the required mean deviation about $a = 42$. Here

$\sum f_i |x_i - a| = \sum f_i |x_i - 42| = 285$.

Therefore $M_d(42) = \dfrac{\sum f_i |x_i - 42|}{50} = \dfrac{285}{50} = 5.7$.
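The three parts of Problem 4.3 can be verified with the following Python sketch. The function names are illustrative assumptions, and the median 38.83 is simply taken from the solution above rather than recomputed.

```python
mid_values = [27, 32, 37, 42, 47, 52]   # class mid-values from Table 4.4
frequencies = [3, 9, 15, 12, 7, 4]      # class frequencies from Table 4.4


def grouped_mean(xs, fs):
    """Return sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def grouped_mean_deviation(xs, fs, about):
    """Return sum(f_i * |x_i - about|) / sum(f_i)."""
    return sum(f * abs(x - about) for x, f in zip(xs, fs)) / sum(fs)


x_bar = grouped_mean(mid_values, frequencies)
median = 38.83  # value quoted in the solution, not recomputed here

print(round(x_bar, 2))                                                   # 39.3
print(round(grouped_mean_deviation(mid_values, frequencies, x_bar), 2))  # 5.48
print(round(grouped_mean_deviation(mid_values, frequencies, median), 2)) # 5.45
print(round(grouped_mean_deviation(mid_values, frequencies, 42), 2))     # 5.7
```

The output agrees with Note 4.2: of the three values, the mean deviation about the median is the smallest.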

4.4.2. Coefficient of mean deviation. The ratio of the mean deviation to the corresponding central location (like mean, median, mode) is called the coefficient of mean deviation. The coefficient of mean deviation is expressed as a percentage. If the mean deviation is calculated about

(i) the mean $\bar{x}$, then the coefficient of mean deviation is denoted by $CM_d(\bar{x})$ and defined as $CM_d(\bar{x}) = \dfrac{M_d(\bar{x})}{\bar{x}} \times 100\%$.

(ii) the median $\tilde{m}$, then the coefficient of mean deviation is denoted by $CM_d(\tilde{m})$ and defined as $CM_d(\tilde{m}) = \dfrac{M_d(\tilde{m})}{\tilde{m}} \times 100\%$.

(iii) the mode $M_0$, then the coefficient of mean deviation is denoted by $CM_d(M_0)$ and defined as $CM_d(M_0) = \dfrac{M_d(M_0)}{M_0} \times 100\%$.
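As an illustration, the coefficient of mean deviation can be evaluated for the age distribution of Problem 4.3, where Table 4.4 gives $\sum f_i |x_i - \bar{x}| = 274.2$ with $\bar{x} = 39.3$, and the solution gives $\sum f_i |x_i - \tilde{m}| = 272.32$ with $\tilde{m} = 38.83$. The function name below is an illustrative assumption.

```python
def coefficient_of_mean_deviation(mean_dev, center):
    """Return (mean deviation / central value) * 100, in percent."""
    return mean_dev / center * 100


# Figures from Problem 4.3 (n = 50 workers)
cmd_mean = coefficient_of_mean_deviation(274.2 / 50, 39.3)
cmd_median = coefficient_of_mean_deviation(272.32 / 50, 38.83)

print(round(cmd_mean, 2))    # 13.95
print(round(cmd_median, 2))  # 14.03
```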

4.5. Quartile Deviation and Coefficient of It

4.5.1. Quartile deviation. Let $Q_1$ be the first quartile and $Q_3$ be the third quartile. Then the quantity $Q_3 - Q_1$ is called the interquartile range and the quantity $\dfrac{Q_3 - Q_1}{2}$ is called the quartile deviation. Quartile deviation is denoted by QD and is also known as the semi-interquartile range. So $QD = \dfrac{Q_3 - Q_1}{2}$.

4.5.2. Coefficient of quartile deviation. Let $Q_1$ be the first quartile and $Q_3$ be the third quartile. Then the coefficient of quartile deviation is denoted by CQD and defined as $CQD = \dfrac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\%$.
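A small Python sketch of the two formulas follows. The quartile values $Q_1 = 20$ and $Q_3 = 40$ are hypothetical numbers chosen only for illustration; computing the quartiles themselves is not covered here.

```python
def quartile_deviation(q1, q3):
    """Return QD = (Q3 - Q1) / 2, the semi-interquartile range."""
    return (q3 - q1) / 2


def coefficient_of_quartile_deviation(q1, q3):
    """Return CQD = (Q3 - Q1) / (Q3 + Q1) * 100, in percent."""
    return (q3 - q1) / (q3 + q1) * 100


q1, q3 = 20, 40  # hypothetical quartiles, for illustration only
print(quartile_deviation(q1, q3))                           # 10.0
print(round(coefficient_of_quartile_deviation(q1, q3), 2))  # 33.33
```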

4.6. Variance and Coefficient of Variance

4.6.1. Standard deviation. Standard deviation is the most important measure of dispersion.

Definition 4.6.1 [2, Page 125] The positive square root of the mean of the

squared deviations of a set of values from their mean is called the standard

deviation. It is usually denoted by σ.

However, if the deviations are measured from a value $a$ other than $\bar{x}$, then the result is called the root mean square deviation.

Food for thought 4.2 What is the minimum value of standard deviation?

In what situation does the standard deviation take this value?

4.6.1.1. Standard deviation for ungrouped data. Let $x_1, x_2, x_3, \cdots, x_n$ be a set of values. Then their standard deviation can be calculated by the formula $\sigma = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n}}$, where $\bar{x}$ is the mean.

Theorem 4.6.1 $\sigma = \sqrt{\dfrac{\sum x_i^2}{n} - \left(\dfrac{\sum x_i}{n}\right)^2}$.

Proof. Left as an exercise.

Theorem 4.6.2 If we consider $a$ as the assumed mean of the given data set $x_1, x_2, x_3, \cdots, x_n$ and assume $d_i = x_i - a$, then their standard deviation can be calculated by the formula $\sigma = \sqrt{\dfrac{\sum d_i^2}{n} - \left(\dfrac{\sum d_i}{n}\right)^2}$.

Caution 4.1 To calculate the standard deviation for ungrouped data, we try to avoid applying the formula given in Theorem 4.6.1, because applying this formula requires more calculation.



Problem 4.4 Calculate the standard deviation of the data set :

2, 3, 4, 5, 6.
Hint Apply the formula $\sigma = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n}}$.
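To see that the defining formula and the formulas of Theorem 4.6.1 and Theorem 4.6.2 agree, the following Python sketch evaluates all three on the data of Problem 4.4. The function names and the assumed mean $a = 3$ are illustrative choices, not part of the problem.

```python
from math import sqrt


def sd_definition(values):
    """sigma = sqrt(sum((x_i - mean)^2) / n)."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / n)


def sd_raw_sums(values):
    """Theorem 4.6.1: sigma = sqrt(sum(x_i^2)/n - (sum(x_i)/n)^2)."""
    n = len(values)
    return sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


def sd_assumed_mean(values, a):
    """Theorem 4.6.2 with d_i = x_i - a for an assumed mean a."""
    n = len(values)
    d = [x - a for x in values]
    return sqrt(sum(di * di for di in d) / n - (sum(d) / n) ** 2)


data = [2, 3, 4, 5, 6]                       # data set of Problem 4.4
print(round(sd_definition(data), 4))         # 1.4142
print(round(sd_raw_sums(data), 4))           # 1.4142
print(round(sd_assumed_mean(data, a=3), 4))  # 1.4142 (a = 3 is an arbitrary choice)
```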

4.6.1.2. Standard deviation for grouped data. Let $x_1, x_2, x_3, \cdots, x_k$ be a set of values which occur with the frequencies $f_1, f_2, f_3, \cdots, f_k$ respectively, and let $n = \sum f_i$ be the total frequency. Then their standard deviation can be calculated by the formula $\sigma = \sqrt{\dfrac{\sum f_i (x_i - \bar{x})^2}{n}}$, where $\bar{x}$ is the mean.

Theorem 4.6.3 $\sigma = \sqrt{\dfrac{\sum f_i x_i^2}{n} - \left(\dfrac{\sum f_i x_i}{n}\right)^2}$.

Proof. Left as an exercise.

Theorem 4.6.4 If we consider $a$ as the assumed mean of the given data $x_1, x_2, x_3, \cdots, x_k$ with frequencies $f_1, f_2, f_3, \cdots, f_k$ respectively and assume $d_i = \dfrac{x_i - a}{h}$, then their standard deviation can be calculated by the formula

\[ \sigma = h \times \sqrt{\dfrac{\sum f_i d_i^2}{n} - \left(\dfrac{\sum f_i d_i}{n}\right)^2}, \]

where $h$ is the width of the class interval and $n = \sum f_i$ is the total frequency.

Caution 4.2 To calculate the standard deviation for grouped data, we try to avoid applying the formula given in Theorem 4.6.3, because applying this formula requires more calculation.



Note 4.3 [4, Equation 4.19 of Page 216] In textbooks for higher studies, the denominator $n$ in the formula for the sample standard deviation or the sample variance is replaced by $n - 1$, which is the number of degrees of freedom.
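Python's standard `statistics` module provides both conventions: `pstdev` divides by $n$ (the formula used throughout these notes) and `stdev` divides by $n - 1$. The sketch below compares them on a small data set chosen here only for illustration.

```python
from statistics import pstdev, stdev

data = [2, 3, 4, 5, 6]  # small illustrative data set

# Divisor n: matches sigma = sqrt(sum((x_i - mean)^2) / n)
print(round(pstdev(data), 4))  # 1.4142

# Divisor n - 1: the sample standard deviation of higher-level textbooks
print(round(stdev(data), 4))   # 1.5811
```

The two values differ noticeably for small $n$ and approach each other as $n$ grows.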

4.6.2. Variance. The square of the standard deviation is called the variance. It is denoted by $\sigma^2$.

Problem 4.5 Disprove the statement — variance is always greater than

standard deviation.

Disproof. Left as an exercise to the reader.

Theorem 4.6.5 [2, Theorem 2 of Page 134] Variance is independent of a change of origin but depends on a change of scale.

Proof. See [2, Theorem 2 of Page 134]
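The idea behind the theorem can be sketched as follows. This is only an outline under the substitution $y_i = (x_i - a)/h$, where the new origin $a$ and the scale $h \neq 0$ are arbitrary constants; it is not a reproduction of the proof given in [2].

```latex
\begin{align*}
y_i &= \frac{x_i - a}{h}, \qquad \bar{y} = \frac{\bar{x} - a}{h}, \\
y_i - \bar{y} &= \frac{x_i - \bar{x}}{h}, \\
\sigma_y^2 &= \frac{\sum (y_i - \bar{y})^2}{n}
            = \frac{1}{h^2} \cdot \frac{\sum (x_i - \bar{x})^2}{n}
            = \frac{\sigma_x^2}{h^2}.
\end{align*}
```

The origin $a$ cancels out in the second line, while the scale $h$ survives in the final result. Taking square roots gives $\sigma_x = |h| \, \sigma_y$, which for a positive class width $h$ matches the factor $h$ appearing in Theorem 4.6.4.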

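4.6.3. Coefficient of variance. The coefficient of variance (more commonly called the coefficient of variation) is denoted by $C_v$ and defined as

$C_v = \dfrac{\sigma}{\bar{x}} \times 100\%$.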

Problem 4.6 [6, Episode 11] The sales (in lakh Taka) of a company over 25 days are given by the following distribution (see Table 4.5).

Sales (Lakh) 10 - 20 20 - 30 30 - 40 40 - 50 50 - 60

No. of days 03 06 11 03 02

Table 4.5.

Calculate its (a) standard deviation, (b) variance and (c) coefficient of

variance.

Solution Construct the following Table 4.6 and calculate the mean x̄.

Sales $x_i$ $f_i$ $f_i x_i$ $(x_i - \bar{x})^2$ $f_i (x_i - \bar{x})^2$

10 – 20 15 03 45 324 972

20 – 30 25 06 150 64 384

30 – 40 35 11 385 4 44

40 – 50 45 03 135 144 432

50 – 60 55 02 110 484 968

Total – 25 825 – 2800


Table 4.6.

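Here $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{825}{25} = 33$.

(a) Standard deviation, $\sigma = \sqrt{\dfrac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}} = \sqrt{\dfrac{2800}{25}} = 10.58$.

(b) Variance, $\sigma^2 = \dfrac{2800}{25} = 112$.

(c) Coefficient of variance, $C_v = \dfrac{\sigma}{\bar{x}} \times 100\% = \dfrac{10.58}{33} \times 100\% = 32.06\%$.
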
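Alternative method. To solve the problem using the formula

\[ \sigma = h \times \sqrt{\dfrac{\sum f_i d_i^2}{n} - \left(\dfrac{\sum f_i d_i}{n}\right)^2}, \]

we construct the following Table 4.7. Let us consider the assumed mean $a = 35$. Here $h = 10$. By the formula $d_i = \dfrac{x_i - a}{h}$, we calculate $d_i$.

(a) Standard deviation,

\[ \sigma = h \times \sqrt{\dfrac{\sum f_i d_i^2}{\sum f_i} - \left(\dfrac{\sum f_i d_i}{\sum f_i}\right)^2} = 10 \times \sqrt{\dfrac{29}{25} - \left(\dfrac{-5}{25}\right)^2} = 10.58. \]

(b) Variance, $\sigma^2 = 10^2 \times \left[\dfrac{29}{25} - \left(\dfrac{-5}{25}\right)^2\right] = 112$.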



Sales $x_i$ $f_i$ $d_i$ $f_i d_i$ $f_i d_i^2$

10 – 20 15 03 -2 -6 12

20 – 30 25 06 -1 -6 06

30 – 40 35 11 0 00 00

40 – 50 45 03 1 03 03

50 – 60 55 02 2 04 08

Total – 25 – -5 29
Table 4.7.

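(c) Here $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = 33$. Alternatively,

$\bar{x} = a + h \, \dfrac{\sum f_i d_i}{\sum f_i} = 35 + \dfrac{-5}{25} \times 10 = 33$.

Therefore the coefficient of variance is $C_v = \dfrac{\sigma}{\bar{x}} \times 100\% = \dfrac{10.58}{33} \times 100\% = 32.06\%$.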
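Both methods of Problem 4.6 can be checked side by side with the Python sketch below. The variable names are illustrative assumptions. Working without intermediate rounding gives $\sigma^2 = 112$ and $C_v \approx 32.07\%$; rounding $\sigma$ to 10.58 before dividing shifts the last digit slightly.

```python
from math import sqrt

mid_values = [15, 25, 35, 45, 55]  # class mid-values x_i (Table 4.5)
frequencies = [3, 6, 11, 3, 2]     # numbers of days f_i
n = sum(frequencies)

# Direct method: deviations measured from the actual mean
x_bar = sum(f * x for x, f in zip(mid_values, frequencies)) / n
variance = sum(f * (x - x_bar) ** 2 for x, f in zip(mid_values, frequencies)) / n
sigma = sqrt(variance)

# Alternative method: d_i = (x_i - a) / h with assumed mean a and class width h
a, h = 35, 10
d = [(x - a) / h for x in mid_values]
sum_fd = sum(f * di for di, f in zip(d, frequencies))        # -5
sum_fd2 = sum(f * di * di for di, f in zip(d, frequencies))  # 29
sigma_alt = h * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
x_bar_alt = a + h * sum_fd / n

print(round(x_bar, 2), round(x_bar_alt, 2))  # 33.0 33.0
print(round(sigma, 2), round(sigma_alt, 2))  # 10.58 10.58
print(round(variance, 2))                    # 112.0
print(round(sigma / x_bar * 100, 2))         # 32.07
```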

Exercise 4.1 [6, Episode 12] The sales (in kg) of a grocery over 25 days are given by the following distribution (see Table 4.8).

Sales (kg) 01 – 05 06 – 10 11 – 15 16 – 20 21 – 25 26 – 30

No. of days 02 03 05 08 04 03
Table 4.8.

Calculate its (a) mean deviation, (b) standard deviation, (c) variance and

(d) coefficient of variance.

Solution Construct the following Table 4.9 and calculate the mean $\bar{x}$.

Here $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{415}{25} = 16.6$.

(a) Mean deviation, $M_d(\bar{x}) = \dfrac{\sum f_i |x_i - \bar{x}|}{\sum f_i}$.

Sales $x_i$ $f_i$ $f_i x_i$ $|x_i - \bar{x}|$ $f_i |x_i - \bar{x}|$ $f_i (x_i - \bar{x})^2$

00.5 – 05.5 03 02 06

05.5 – 10.5 08 03 24

10.5 – 15.5 13 05 65

15.5 – 20.5 18 08 144

20.5 – 25.5 23 04 92

25.5 – 30.5 28 03 84

Total – 25 415 –
Table 4.9.

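(b) Standard deviation, $\sigma = \sqrt{\dfrac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}} = \sqrt{\dfrac{\underline{\hspace{2em}}}{\underline{\hspace{2em}}}} = \underline{\hspace{2em}}$.

(c) Variance, $\sigma^2 = (\underline{\hspace{2em}})^2 = \underline{\hspace{2em}}$.

(d) Coefficient of variance, $C_v = \dfrac{\sigma}{\bar{x}} \times 100\% = \dfrac{\underline{\hspace{2em}}}{\underline{\hspace{2em}}} \times 100\% = \underline{\hspace{2em}}\%$.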

Note 4.4 Since Exercise 4.1 also asks for the mean deviation, which uses $|x_i - \bar{x}|$, we employ the formula $\sigma = \sqrt{\dfrac{\sum f_i (x_i - \bar{x})^2}{n}}$ to find the standard deviation, and we construct Table 4.9 according to the demand of this formula.
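After completing Table 4.9 by hand, the entries can be checked with a sketch like the one below. It fills the three remaining columns row by row, reusing $|x_i - \bar{x}|$ for both the mean deviation and the standard deviation as Note 4.4 suggests. The variable names are illustrative assumptions.

```python
from math import sqrt

mid_values = [3, 8, 13, 18, 23, 28]  # class mid-values x_i of Table 4.9
frequencies = [2, 3, 5, 8, 4, 3]     # numbers of days f_i

n = sum(frequencies)
x_bar = sum(f * x for x, f in zip(mid_values, frequencies)) / n

sum_abs = 0.0  # running total of f_i * |x_i - x_bar|
sum_sq = 0.0   # running total of f_i * (x_i - x_bar)^2
for x, f in zip(mid_values, frequencies):
    abs_dev = abs(x - x_bar)
    sum_abs += f * abs_dev
    sum_sq += f * abs_dev ** 2
    # One row of the table: x_i, f_i, f_i*x_i, |dev|, f_i*|dev|, f_i*dev^2
    print(x, f, f * x, round(abs_dev, 2), round(f * abs_dev, 2), round(f * abs_dev ** 2, 2))

mean_dev = sum_abs / n
variance = sum_sq / n
sigma = sqrt(variance)
cv = sigma / x_bar * 100

print("mean deviation:", round(mean_dev, 2))
print("standard deviation:", round(sigma, 2))
print("variance:", round(variance, 2))
print("coefficient of variance (%):", round(cv, 2))
```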

Problem 4.7 [6, Episode 13] The share prices (in Taka) during the first 10 days of a month at Dhaka Stock Exchange (DSE) and Chattogram Stock Exchange (CSE) are given as follows (see Table 4.10).


On the basis of the given data in which share market will you invest? Why?
Solution For Dhaka Stock Exchange (DSE), $\bar{x} = 115$, $\sigma = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n}} = 8.33$ and $C_v = \dfrac{\sigma}{\bar{x}} \times 100\% = 7.24\%$.


DSE 105 120 115 118 130 127 109 110 104 112

CSE 108 117 120 130 100 125 125 120 110 135
Table 4.10.

For Chattogram Stock Exchange (CSE), $\bar{x} = 119$, $\sigma = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n}} = 10.09$ and $C_v = \dfrac{\sigma}{\bar{x}} \times 100\% = 8.48\%$.

Since the coefficient of variance of Dhaka Stock Exchange is less than that of Chattogram Stock Exchange, the data of Dhaka Stock Exchange is comparatively more uniform. Thus I will invest in Dhaka Stock Exchange.
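The comparison can be reproduced with the Python sketch below, which uses the ten prices listed for each exchange in Table 4.10 and the divisor-$n$ standard deviation (`statistics.pstdev`). The function name `coefficient_of_variance` follows the terminology of these notes and is an illustrative choice.

```python
from statistics import mean, pstdev

dse = [105, 120, 115, 118, 130, 127, 109, 110, 104, 112]
cse = [108, 117, 120, 130, 100, 125, 125, 120, 110, 135]


def coefficient_of_variance(values):
    """Return sigma / mean * 100 (in percent), with sigma using divisor n."""
    return pstdev(values) / mean(values) * 100


for name, prices in (("DSE", dse), ("CSE", cse)):
    print(name, mean(prices), round(pstdev(prices), 2),
          round(coefficient_of_variance(prices), 2))
# DSE: mean 115, sigma about 8.33, Cv about 7.24
# CSE: mean 119, sigma about 10.09, Cv about 8.48
```

The exchange with the smaller coefficient of variance is the more uniform one, which here is DSE.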


CHAPTER 5

Probability

For this chapter, the reader is referred to [4, 5, 7].

5.1. Probability

Definition 5.1.1 Probability is the measure of uncertainty.

Bibliography

[1] LEANSCAPE, URL:https://leanscape.io/what-is-attribute-data/.

[2] M. Abdul Aziz, Statistics First Paper, Seventh ed., The Angel Publications, Dhaka,

January 2022.

[3] B. S. Grewal, Higher Engineering Mathematics, Forty-third ed., Khanna Publishers, 2015.

[4] M. N. Islam, An Introduction to Statistics and Probability, Fifth ed., Mullick & Brothers,

Dhaka, June 2022.

[5] R. E. Walpole, R. H. Myers, S. L. Myers and K. E. Ye, Probability and Statistics for Engineers and Scientists, Ninth ed., Pearson, 2022.

[6] M. A. Nayeem, Measures of Dispersion, OnnoRokom Pathshala, URL: https://www.youtube.com/playlist?list=PLxSt9YDBipm6UxKzXpeyzSsED94ukGEBJ, 2018.

[7] S. M. Ross, Introduction to Probability and Statistics for Engineers and Scientists, Fifth

ed., Elsevier, 2014.

APPENDIX A

Some Theorems Related to Sigma Notation

A.1. Sigma Notation

Definition A.1.1 Let the variable $x$ have $n$ values $x_1, x_2, x_3, \cdots, x_n$. Then the sum of the values of the variable $x$,

\[ x_1 + x_2 + x_3 + \cdots + x_n, \]

is simply represented by $\sum_{i=1}^{n} x_i$.

Theorem A.1.1 [2, Page 12] If $x$ is a variable and $\alpha$ is a constant, then

\[ \sum_{i=1}^{n} \alpha x_i = \alpha \sum_{i=1}^{n} x_i. \]

Theorem A.1.2 [2, Page 12] If $x$ is a variable and $\alpha$, $\beta$ are two constants, then

\[ \sum_{i=1}^{n} (\alpha x_i - \beta) = \alpha \sum_{i=1}^{n} x_i - n\beta. \]

Theorem A.1.3 [2, Page 12] If $x$ is a variable and $\alpha$, $\beta$, $\gamma$ are three constants, then

\[ \sum_{i=1}^{n} \left(\alpha x_i^2 - \beta x_i + \gamma\right) = \alpha \sum_{i=1}^{n} x_i^2 - \beta \sum_{i=1}^{n} x_i + n\gamma. \]

Theorem A.1.4 [2, Page 13] If $x$, $y$ are two variables and $\alpha$, $\beta$ are two constants, then

\[ \sum_{i=1}^{n} (\alpha x_i - \beta y_i) = \alpha \sum_{i=1}^{n} x_i - \beta \sum_{i=1}^{n} y_i. \]


Theorem A.1.5 [2, Page 13] If $x$ is a variable and $\alpha$, $\beta$ are two constants, then

\[ \sum_{i=1}^{n} (\alpha x_i - \beta)^2 = \alpha^2 \sum_{i=1}^{n} x_i^2 - 2\alpha\beta \sum_{i=1}^{n} x_i + n\beta^2. \]

Theorem A.1.6 [2, Page 13] If $x$ is a variable, then

\[ \left(\sum_{i=1}^{n} x_i\right)^2 = \sum_{i=1}^{n} x_i^2 + \mathop{\sum\sum}_{i \neq j} x_i x_j. \]
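The identities of Theorems A.1.2, A.1.5 and A.1.6 can be spot-checked numerically. The sketch below uses hypothetical integer values for $x$, $\alpha$ and $\beta$, chosen only for illustration; a numerical check of this kind illustrates the identities but does not prove them.

```python
x = [2, 5, 7, 11]    # hypothetical values x_1, ..., x_n
alpha, beta = 3, 4   # hypothetical constants
n = len(x)

# Theorem A.1.2: sum(alpha*x_i - beta) = alpha*sum(x_i) - n*beta
lhs_2 = sum(alpha * xi - beta for xi in x)
rhs_2 = alpha * sum(x) - n * beta

# Theorem A.1.5: sum((alpha*x_i - beta)^2)
#   = alpha^2 * sum(x_i^2) - 2*alpha*beta*sum(x_i) + n*beta^2
lhs_5 = sum((alpha * xi - beta) ** 2 for xi in x)
rhs_5 = alpha ** 2 * sum(xi ** 2 for xi in x) - 2 * alpha * beta * sum(x) + n * beta ** 2

# Theorem A.1.6: (sum x_i)^2 = sum(x_i^2) + sum over all pairs i != j of x_i*x_j
lhs_6 = sum(x) ** 2
rhs_6 = sum(xi ** 2 for xi in x) + sum(
    x[i] * x[j] for i in range(n) for j in range(n) if i != j
)

print(lhs_2 == rhs_2, lhs_5 == rhs_5, lhs_6 == rhs_6)  # True True True
```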
