
BS127 STAT-E

B.Sc.

FIRST YEAR SEMESTER - I


STATISTICS

DESCRIPTIVE STATISTICS &


PROBABILITY

“We may forgo material benefits of civilization, but we cannot
forgo our right and opportunity to reap the benefits of the
highest education to the fullest extent…”
Dr. B.R. Ambedkar

Dr. B.R. AMBEDKAR OPEN UNIVERSITY


HYDERABAD
COURSE TEAM
Course Development Team (CBCS)
Editor
Prof. V.V. Haragopal
Associate Editors
Prof. Vedanabhatla Srinivas
Writers
Dr. K.S. Harish (Unit 1-3, 11)
Dr. D. Vijaya Lakshmi (Unit 7-10)
Dr. J. Srinivas (Units 4-6, 12)
Cover Design
G. Venkata Swamy

First Edition : 2018

(c) 2018, Dr. B.R. Ambedkar Open University, Hyderabad, A.P.

All rights reserved. No part of this book may be reproduced in any form without the permission
in writing from the University.

The text forms part of Dr. B.R.Ambedkar Open University Programme.

Further information on Dr. B.R.Ambedkar Open University courses may be obtained from
the Director (Academic), Dr. B.R.Ambedkar Open University, Road No. 46, Prof. G. Ram
Reddy Marg, Jubilee Hills, Hyderabad - 500033.

Web: www.braou.ac.in

E-mail: info@braou.ac.in

Printed on behalf of Dr. B.R.Ambedkar Open University, Hyderabad by the Registrar.

Lr. No. _____________________________________________________________

Printed at : ___________________________________________________________

PREFACE

The course material of Statistics is prepared in accordance with the guidelines issued
by the University Grants Commission and the Telangana State Commission of Higher Education
to suit the Choice Based Credit Format in the semester system. Keeping in view the diversity
of students of the open university with regard to their age, learning abilities and the minimum
academic pre-requirements, the course material is prepared in self-instructional mode with all
the inputs to help the students to learn by themselves. This text material is supplemented with
limited face to face instruction and audio/video support. From this year a separate Practical
Manual cum Record Book is prepared and supplied along with this course material. The students
are expected to attend the practical classes and practice the problem solving skills by solving
the problems given in the manual under the guidance of a counsellor. The text material of
theory part carries a weightage of 4 credits and the practicals carry 1 credit.

In the open distance learning system the onus of learning rests with the learner. The
university provides an opportunity to realize the academic aspirations of the learner and helps
the learner to ‘LEARN TO LEARN’. To make this idea achievable for the student, every
effort is made to make learning a pleasure.

Any suggestions to improve the content of the course material are welcome, and the
university will consider them without fail.

CONTENTS

BLOCK/UNIT TITLE PAGE

Block-I: Data 1

Unit-1: Method of Collection and Editing of Primary Data 3-13

Unit-2: Method of Collection and Editing of Secondary Data 14-21

Unit-3: Classification and Tabulation of the Data 22-38

Block-II: Measures of Central Tendency and Dispersion 39

Unit -4: Mean, Median, Mode, GM, HM 41-71

Unit -5: Range, QD, MD, SD 72-95

Unit -6: Moments, Central and Non-central Moments, Skewness 96-113

Block-III: Addition Rule 115

Unit - 7: Definition and Basic Concepts 117-130

Unit - 8: Addition Theorem of Probability 131-141

Unit - 9: Some More Important Theorems 142-147

Block-IV: Baye’s Theorem 149

Unit - 10: Conditional Probability 151-158

Unit - 11: Independent Events 159-169

Unit - 12: Baye’s Theorem 170-176

Model Examination Question Paper 177-180

BLOCK-1: DATA

The units included in this block are:

Unit-1: Method of Collection and Editing of Primary Data

Unit-2: Method of Collection and Editing of Secondary Data

Unit-3: Classification and Tabulation of the Data

UNIT-1: METHODS OF COLLECTION AND EDITING OF
PRIMARY DATA
Contents

1.0 Objectives

1.1 Introduction

1.2 Data: Definition and Examples

1.3 Primary Data Definition and Examples

1.4 Methods of collection and editing of Primary data

1.5 Summary

1.6 Check Your Progress - Model Answers

1.7 Model Examination Questions

1.0 OBJECTIVES

After studying this unit, you should be able to

• Define and identify Primary data

• Identify sources for primary data

• Collect primary data by using various methods.

• Edit primary data by using various methods of editing primary data

1.1 INTRODUCTION

We are living in a world where everything can be monitored and measured. If you
speak about something without being able to measure it, your knowledge of it is limited and
unsatisfactory; when you can measure what you are speaking about, you know something about
it. Without data it is not possible to measure a phenomenon. Data collection
and analysis therefore play a vital role in decision making in various fields. For example, companies
maintain a variety of data about their employees, customers and business operations. Data on
employees include salaries, age, years of experience, etc. These data can be obtained from the
internal records of the company and help in understanding various facts about the employees.
Other internal records contain data on sales, advertising expenses, distribution costs, inventory
levels, etc., which help the manager to know facts such as expenditure, sales and profits
and so support the decision-making process. There are many types of data, broadly classified
into primary and secondary data. This unit focuses on primary data and on the methods of
collecting and editing primary data.

1.2 STATISTICAL SURVEY

A statistical survey is nothing but a systematic search for truth. It seeks some authentic
answers to a problem which is quantifiable and, therefore amenable to statistical treatment.
The statistical inquiry has to pass through the following four stages.

1.2.1 Observation

This concerns the preliminary assessment of the problem. It requires an understanding of the
nature and magnitude of the problem, how it started and how long it has been there. Some
important facts related to the problem may have to be collected and analysed, and some
hypotheses can be framed. In this process there is a chance of subjectivity creeping in. This can be
avoided by using deductive logic.

1.2.2 Laying Down of Hypotheses

The next step is to lay down hypotheses on the basis of the available information, possibly by
using deductive logic. For example, a business firm finds that the sales of one of its products
have been declining and undertakes to investigate the causes. The preliminary investigation
might yield the following tentative results.

1. Quality of the raw material is not good.

2. Package is not attractive

3. Delays in the transportation.

4. Quality of the product is poor etc.

1.2.3 Prediction

It refers to anticipations about the future deduced from the facts of the preliminary investigation
or based on the opinion of experts. The verification of the truth would reveal how far a particular
prediction was correct.

1.2.4 Verification

The final stage is the actual verification of the truth of the prediction. For this
purpose, experiments have to be conducted and observations recorded. This will
reveal whether the methods followed in prediction were sound. Deviations within the tolerance
limits may be ignored. But if the difference is significant, the methods adopted for prediction
have to be modified and improved.

1.3 PLANNING OF STATISTICAL SURVEY

1.3.1. Purpose

Before a survey is conducted, one should be clear about the purpose underlying it. A
clear statement of the purpose is necessary when the survey is handed over to any agency.
A clear purpose statement also helps us to identify the steps we need to follow.

1.3.2. Scope of survey

The scope of the survey specifies the various aspects that have to be covered to achieve the given
objectives. For example, in the case of declining sales of a product, it should be clearly stated
whether the quality aspect has to be covered and whether the external forces affecting sales have
to be considered.

1.3.3. Definition of terms

The statements of the purpose and scope of the work may involve
many terms. To give a clear idea of these statements and to avoid mistakes arising from the use of
terminology, the various terms should be properly defined.

1.3.4 Preparation of dummy reports

Sometimes the scope of the survey is not clearly known either to the persons
conducting the survey or to the agency entrusted with the task. The best course in such
circumstances is to get a dummy report prepared on the basis of some broad ideas about the
nature and scope of the work. Such reports will indicate the formats of the tables in which the
final results will be presented and will give a clear picture of the scope of work to those who
have commissioned the study and those who are going to conduct it.

1.4 COLLECTION OF DATA

A sound structure of statistical investigation is based on a systematic collection of data.
Data are generally classified into two groups, viz., 1. Primary data 2. Secondary data.

1.5 PRIMARY DATA

Primary data refers to data collected for the first time. Primary data are originated
by a researcher for the specific purpose of addressing the problem at hand. For example, the
statistics collected by a Government organisation relating to the population are primary data for
that organisation, since they have been collected to achieve a particular objective. A data set
collected by an organisation from its customers to know their perceptions is another example of
primary data. Primary data may be qualitative or quantitative in nature. Quantitative data
are obtained from phenomena which can be measured in terms of units. For example,
height, weight, length, temperature, etc. can be measured in appropriate units
such as inches, kilograms, metres and degrees centigrade. On the other hand, data
which cannot be measured in terms of units are called qualitative data. For example,
honesty, bravery, kindness, satisfaction, perceptions, etc. cannot be measured in terms of units.
These can be measured by using qualitative scales. The measurements of such qualitative
phenomena are called qualitative data sets.
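As a small illustration of measuring a qualitative phenomenon with a qualitative scale, the sketch below codes verbal satisfaction responses on a five-point scale. The scale labels, the numeric codes and the sample responses are assumptions made for this example only; they are not taken from any survey.

```python
from collections import Counter

# Hypothetical five-point scale for a qualitative phenomenon (satisfaction).
SATISFACTION_SCALE = {
    "very dissatisfied": 1,
    "dissatisfied": 2,
    "neutral": 3,
    "satisfied": 4,
    "very satisfied": 5,
}


def code_responses(responses):
    """Convert verbal responses into scale codes (ranks, not physical units)."""
    return [SATISFACTION_SCALE[r.strip().lower()] for r in responses]


# Made-up responses used only to show the coding step.
responses = ["Satisfied", "Neutral", "Very satisfied", "Satisfied"]
codes = code_responses(responses)
frequency = Counter(codes)  # how often each code occurs
print(codes)
print(frequency)
```

The codes only rank the responses; the gap between two codes is not a measurement in units, which is why such data remain qualitative.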

1.5.1 Methods of Collecting Primary Data

The following are the methods of collecting primary data. Primary data can be collected
by applying any of these methods.

1. Direct Personal Interviews.

2. Indirect Oral Interviews.

3. Information from Correspondents.

4. Mailed Questionnaire Methods.

1. Direct Personal Interviews

This method is a face-to-face interview: the investigator contacts the respondents from
whom the information is to be obtained. The interviewer asks the respondents questions
pertaining to the survey and collects the required information. Thus, if a person wants to collect
data regarding perceptions of the working conditions of the employees of an IT company in
Hyderabad, he will go to the company concerned, contact the employees working in the
organisation and obtain the desired information. The information collected in this manner is
first hand and original in character; it is known as primary data.

Personal interviews can be classified into six categories. They are:

1. Door-to-Door Interview

In this method the investigator contacts the respondents at their residence. This method
gives the investigator an opportunity to uncover some real facts and figures which would
otherwise be very difficult to obtain.

2. Mall intercept Interview

The investigator, stationed at the entrance of a shopping mall, invites the respondents to
participate in a structured interview. It is a cost-efficient approach. In addition, owing to
the limited coverage area, the researcher can control the interview to some extent, which
is otherwise extremely difficult when the respondents are scattered over a large geographical
area. A researcher can also make efficient use of the large respondent pool available at
different mall locations.

3. Office Interview

This method is appropriate when the researcher’s objective is to uncover attitudes
towards an industrial product or service, or opinions on the government’s new policies, among
various categories of employees. To achieve this objective the investigator conducts an office
interview.

4. Self-administered questionnaire

No interviewer is involved; a series of questions is presented to the respondents
without the intervention of the investigator. This method eliminates investigator bias in any
form, but the respondent cannot obtain personal clarification of the survey questions from the
investigator.

5. Computer assisted personal interview

In this method the respondents enter their information through a computer
terminal by using a keyboard. Investigators use this method to collect large amounts of data.
6. Observation

Under this method the investigator collects the information merely by observing the respondent
and recording the data. This is the most useful method in some marketing studies where
consumer behaviour is studied.

There are many merits and demerits of these methods, which are discussed below.

Merits

1. Respondents feel free to give the required information when contacted personally.

2. Most of the time the data collected through this method are more accurate because of
the face-to-face interaction.

3. This method also provides the scope for getting supplementary information from the
respondent, because while interviewing it is possible to ask some supplementary questions
which may be of greater use later.

4. The interviewer can change his language according to the educational status and level
of understanding of the respondents to avoid inconvenience and confusions of the
respondent.

Demerits

1. This method is expensive if the respondents are spread widely over a large
geographical area.

2. There is a greater chance of personal bias and prejudice under this method as compared
to other methods.

3. The interviewers have to be thoroughly trained and experienced; otherwise they may
not be able to obtain the desired information. Untrained or poorly trained interviewers
may spoil the entire work.

4. This method is more time-consuming than other methods. This is because
interviews can be held only at the convenience of the informants.

Conclusion

Though there are some demerits, we cannot say that the method is not useful. It can be used
when the area of study is limited. Nowadays, because of advances in communication
systems, the investigator can also collect data by phone, by email or by mailing questionnaires.

2. Indirect Oral Interviews

According to this method the investigator meets third parties, generally called
‘witnesses’, who are capable of supplying the necessary information. Generally this method is
employed when the data to be collected are complex in nature and the respondent may not be
willing to give information directly.

For example, when the researcher is trying to obtain data on drug addiction, on AIDS
or on the habit of drinking alcohol, the respondent may not show interest in sharing information
with the investigator. In these situations the researcher approaches a third person or third parties
who have information regarding the respondent. This method of data collection is called the
indirect oral interview. The accuracy of the data depends upon the ability of the investigator to
draw information from the third parties.

Merits

1. A wide area can be covered

2. It is economical in terms of time, cost and manpower.

3. Confidential information can be collected.

4. This method is relatively simple to understand.

Demerits

1. The degree of accuracy of information sometimes is less and it may not be reliable

3. Information from Correspondents

According to this method the investigator appoints local agents or correspondents in
different places to collect information. These correspondents collect the information and pass it
to the central office, where the data are processed. This method is generally adopted by newspaper
agencies: correspondents posted at different places supply information to the head
office. This method is also generally adopted by government departments in cases
where regular information is to be collected from a wide area. For example, in the construction
of wholesale price index numbers, regular information is obtained from correspondents
appointed in different areas. This method is economical and appropriate for extensive
investigations. But it may not always ensure accurate results because of the personal prejudice
and bias of the correspondents.

Merits

1. It is an economical method and the investigation is extensive.

2. This method is widely used to supply information on a continuous basis.

3. In case of skilled and experienced local agents, the data obtained are of good quality.

Demerits

1. The data obtained by this method may not be reliable because of personal prejudices.

4. Mailed Questionnaire Method

A questionnaire is a set of questions for collecting the desired information from the
respondents. A questionnaire can be administered personally or mailed to the respondents. It is
an efficient method of collecting primary data when the investigator knows exactly what is
required and how to measure the variables of interest, such as behaviour (past, present or
intended), demographic characteristics (age, gender, income and occupation), level of knowledge,
and attitudes and opinions. The questionnaire design process is a step-wise and structured process
which begins with converting the study objectives into information needs and specifying the
population.

Merits

1. Data can be easily collected by using this method

2. Cost of data collection by using this method is less

3. This method takes less time for data collection.

4. By using this method it is easy to collect personal information of the respondent

Demerits

1. This method cannot be used with illiterate respondents.

2. Most of the time the data collected by this method may not be precise.

3. We may not get complete information from the respondent.

4. It is very difficult to verify the accuracy of the data.

1.5.2 Methods of Editing the Primary Data

Data editing is a process which reviews and adjusts the collected data. The main objective of
editing is to detect possible errors and irregularities and to improve the quality of the data. The
task of editing is a highly specialized one and requires great care and attention. It cannot be
overlooked because, if the collected data are not representative of the population, we may get
useless findings which do not serve the purpose. If the data are collected from internal sources
or from published sources, editing becomes simple. If the data are collected from surveys,
extensive editing is needed. In any case, it should be noted that editing after data collection is
mandatory to check the representativeness of the data. Data editing can be done manually or with
the help of a computer. The following are some of the methods of primary data editing.

1. Interactive editing

2. Selective editing

3. Macro editing

4. Automatic editing

1. Interactive Editing

The term interactive editing is commonly used for modern computer-assisted manual
editing. Most interactive data editing tools applied at National Statistical Institutes (NSIs) allow
one to check the specified edits during or after data entry and, if necessary, to correct erroneous
data immediately. Several approaches can be followed to correct erroneous data:

• Recollect the data from the respondent.

• Compare the data given by the respondent with the data collected earlier from the
same respondent.

• Compare the respondent’s data with the data of similar respondents.

• Use common sense and subject-matter knowledge to judge the data collected from the
respondents.

Interactive editing is a standard way to edit data. It can be used to edit both categorical
and continuous data. Interactive editing reduces the time frame needed to complete the cyclical
process of review and adjustment.

2. Selective Editing

In selective editing we identify the influential errors and outliers. The
data are split into two parts or streams: 1. the critical stream and 2. the non-critical
stream. The critical stream consists of records that are more likely to contain influential errors.
These critical records are edited in a traditional interactive manner. The records in the
non-critical stream, which are unlikely to contain influential errors, are not edited in a
computer-assisted manner.
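The split into a critical and a non-critical stream can be sketched as follows. The score used here (sampling weight multiplied by the absolute difference between the reported and the expected value), the threshold and the record fields are all assumptions chosen for illustration; the text does not prescribe a particular score.

```python
def split_streams(records, threshold):
    """Split records into (critical, non_critical) lists.

    Each record is a dict with hypothetical keys: 'reported' (value given
    by the respondent), 'expected' (e.g. the value from an earlier period)
    and 'weight' (how much the record influences the published total).
    """
    critical, non_critical = [], []
    for rec in records:
        # Assumed score: a large weighted deviation suggests an influential error.
        score = rec["weight"] * abs(rec["reported"] - rec["expected"])
        if score > threshold:
            critical.append(rec)
        else:
            non_critical.append(rec)
    return critical, non_critical


# Made-up records used only to demonstrate the split.
records = [
    {"id": 1, "reported": 105, "expected": 100, "weight": 2},
    {"id": 2, "reported": 9000, "expected": 90, "weight": 10},
    {"id": 3, "reported": 48, "expected": 50, "weight": 1},
]
critical, non_critical = split_streams(records, threshold=1000)
print([r["id"] for r in critical])      # records sent for interactive editing
print([r["id"] for r in non_critical])  # records left out of interactive editing
```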

3. Macro Editing

There are two methods of macro editing

1. Aggregation Method

This method is followed in almost every statistical agency before publication: verifying
whether the figures to be published seem plausible. This is accomplished by comparing quantities in
publication tables with the same quantities in previous publications. If an unusual value is observed,
a micro-editing procedure is applied to the individual records and fields contributing to the
suspicious quantity.
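A minimal sketch of the aggregation method is given below. It compares each total in the current table with the same total in the previous publication and flags large relative changes. The 25 per cent tolerance, the function name and the regional totals are assumptions for illustration only.

```python
def flag_aggregates(current, previous, max_rel_change=0.25):
    """Return the table cells whose totals changed by more than
    max_rel_change relative to the previous publication."""
    flagged = []
    for cell, value in current.items():
        old = previous.get(cell)
        if old is None or old == 0:
            # No basis for comparison, so the cell is inspected manually.
            flagged.append(cell)
        elif abs(value - old) / abs(old) > max_rel_change:
            flagged.append(cell)
    return flagged


# Made-up totals for two successive publications.
previous = {"North": 1200, "South": 950, "East": 700}
current = {"North": 1260, "South": 1900, "East": 690}
suspicious = flag_aggregates(current, previous)
print(suspicious)
```

Only the flagged cells are traced back to their individual records, which keeps the amount of micro-editing small.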

2. Distribution Method

Data available is used to characterize the distribution of the variables. Then all individual
values are compared with the distribution. Records containing values that could be considered
uncommon (given the distribution) are candidates for further inspection and possibly for editing.
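The distribution method can be sketched with quartiles. In the example below a value is treated as uncommon when it lies more than 1.5 times the inter-quartile range below the first quartile or above the third quartile. This fence rule is one common convention assumed here; the text does not fix a rule, and the ages are made-up numbers.

```python
import statistics


def unusual_values(values, k=1.5):
    """Return the values lying outside the quartile-based fences
    Q1 - k*IQR and Q3 + k*IQR (an assumed convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Illustrative ages reported by respondents.
ages = [21, 23, 22, 25, 24, 26, 23, 95]
unusual = unusual_values(ages)
print(unusual)  # candidates for further inspection
```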

4. Automatic Editing

In automatic editing records are edited by a computer without human intervention.
Prior knowledge of the values of a single variable or a combination of variables can be formulated
as a set of edit rules which specify or constrain the admissible values.
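Edit rules of this kind can be written down as simple conditions that every record must satisfy. The sketch below only detects which rules a record violates; the automatic correction step is not shown. The variables (age, experience, salary) and the limits in the rules are hypothetical.

```python
# Each edit rule is a (description, condition) pair; the condition returns
# True when the record is admissible. All rules here are assumed examples.
EDIT_RULES = [
    ("age must be between 0 and 120", lambda r: 0 <= r["age"] <= 120),
    ("years of experience cannot exceed age", lambda r: r["experience"] <= r["age"]),
    ("salary must be non-negative", lambda r: r["salary"] >= 0),
]


def failed_edits(record):
    """Return the descriptions of all edit rules violated by a record."""
    return [description for description, ok in EDIT_RULES if not ok(record)]


# Made-up employee records.
records = [
    {"age": 34, "experience": 10, "salary": 42000},
    {"age": 25, "experience": 40, "salary": 30000},
    {"age": 150, "experience": 5, "salary": -10},
]
report = [failed_edits(r) for r in records]
for record, failures in zip(records, report):
    print(record, failures)
```

The first two rules constrain a single variable or a combination of two variables, matching the two kinds of prior knowledge mentioned above.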

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

1.5.3 Define primary data.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
1.5.4 Write about different sources of primary data.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

1.5.5 What methods would you employ to collect primary data?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
1.5.6 What is data editing? Why should we edit primary data?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

1.5 SUMMARY

In this unit you have defined and identified primary data. You have learnt to identify
sources of primary data. You have looked into various methods used to collect primary data.
You have learnt to edit primary data using various methods of editing primary data. To conclude
you have acquired skills to collect and edit primary data.

1.6 CHECK YOUR PROGRESS - MODEL ANSWERS

1.5.3 It is data collected for the first time.

1.5.4 Statistics collected by the government organization related to the population.

1.5.5 1. Direct personal interviews

2. Indirect oral interviews

1.5.6 A process which reviews and adjusts the collected data.

1.7 MODEL EXAMINATION QUESTIONS

Section - A (Long Answers)

1. A company would like to conduct a survey to know about its customer’s satisfaction.
In this case which data (primary or secondary) will you recommend to the company?
What are the data collection methods you suggest to the company for data collection ?
Why? State the appropriate reasons.

Section - B (Short Answers)

2. State the preliminary steps you would take for planning a statistical enquiry.

3. What is macro editing? When is it appropriate? Explain with example.

UNIT - 2: METHODS OF COLLECTION AND EDITING OF
SECONDARY DATA
Contents

2.0 Objectives

2.1 Introduction

2.2 Secondary Data Definition and Examples

2.3 Sources of Secondary Data

2.4 Precautions in the Use of Secondary Data

2.5 Editing of Secondary Data

2.6 Summary

2.7 Check Your Progress - Model Answers

2.8 Model Examination Questions

2.0 OBJECTIVES

After studying this unit, you should be able to

• Define and identify Secondary data

• Identify sources for Secondary data

• Collect Secondary data by using various methods.

• Edit Secondary data by using various methods of editing secondary data

2.1 INTRODUCTION

The aim of any governmental or non-governmental department is to produce undisputed and
up-to-date statistics about the department or institution as a whole. This requires up-to-date
and reliable data. These could be data that the organisation itself collects (primary data) or data
that are available in the outside world (secondary data). Secondary data can, for instance,
come from administrative sources maintained by other governmental organisations and from
sources nowadays identified as ‘Big data’, such as data available on the internet and data
generated by sensors. Mindful of the costs and response burden involved in the collection of
primary data, more and more organisations such as NSSO, CSO, ICMR and NFHS aim to maximize
the use of secondary data for statistical purposes. The entire process of collecting already
existing data is generally referred to as the collection of secondary data. This unit discusses the
various sources of secondary data and the advantages and disadvantages of this approach from an
organisation’s point of view.
2.2 SECONDARY DATA

Secondary data are data that have already been collected by someone else, before
the current needs of the researcher arose. The researcher only uses the data, with the related
reference, and does not collect them from the field. Compared with primary data, secondary data
can be collected easily and at a lower cost in time and money.

2.3 SOURCES OF SECONDARY DATA

Secondary data sources can be broadly classified into two categories. They are

1. Published sources

2. Unpublished sources.

1. Published Sources

The governmental, international and local agencies publish statistical data. Some of
them are explained below:

(A) International Publications

There are some international institutions and bodies like I.M.F, I.B.R.D, I.C.A.F.E and
U.N.O which publish regular and occasional reports. These reports provide a lot of data to the
researcher. The accuracy and the quality of the data are unquestionable.

(B) Official Publications of Central and State Governments

Several departments of the Central and the State Governments regularly publish reports
on a number of subjects. Some of the important publications are the Reserve Bank of India
Bulletin, Census of India, Statistical Abstracts of States, Agricultural Statistics of India and the
Indian Trade Journal. These, along with the publications of the CSO, NSSO, etc., provide a large
amount of data to any researcher.

(C) Semi - official Publications

Semi-Government institutions like Municipal Corporations, District Boards, Panchayats,
etc. publish reports relating to socio-economic data.

(D) Publications of Research Institutions

Indian Statistical Institute (I.S.I), Central Statistical Organisation (CSO), National Sample Surveys
Organisation (NSSO), Indian Council of Agricultural Research (I.C.A.R), Indian Agricultural
Statistics Research Institute (I.A.S.R.I), etc. publish data regarding various phenomena and
also present the findings of their research projects.

(E) Reports of Various Committees and Commissions Appointed by the
Government

Reports of various committees and commissions appointed by the Government, such as the
Raj Committee’s Report on Agricultural Taxation, the Wanchoo Committee’s Report on Taxation
and Black Money, etc., are also important sources of secondary data.

(F) Journals and Newspapers

Journals and newspapers are very important and powerful sources of secondary
data. Current and important material on statistics and socio-economic problems can be obtained
from journals and newspapers like Economic Times, Commerce, Capital, Indian Finance, Monthly
Statistics of Trade, etc.

2. Unpublished Sources

Unpublished data can be obtained from many unpublished sources, such as records
maintained by various government and private offices and university libraries. Researchers can
gather data from these sources for their research work.

2.4 PRECAUTIONS IN THE USE OF SECONDARY DATA

Since secondary data have already been obtained, it is highly desirable that a proper
scrutiny of such data is made before they are used by the investigator. In this context Prof.
Bowley rightly points out that “Secondary data should not be accepted at their face value.”
This is because they may be biased or may suffer from inadequate sample size, substitution errors,
arithmetical errors, etc. Even if there is no error, such data may not be suitable for achieving the
objectives of the researcher. Therefore, before using secondary data the investigator should
consider the following factors:

THE SUITABILITY OF DATA

The investigator must satisfy himself that the data available are suitable to achieve
formulated research objectives.

ADEQUACY OF DATA

If the data are suitable for the purpose of investigation then we must judge whether the
data can provide adequate information to the present study.

RELIABILITY OF DATA

The reliability of data depends upon the various methods and various measurements
adopted by the organisation for data collection.

Once data have been obtained, the next step in the statistical investigation is to edit the
data. The chief objective of editing is to detect possible errors and irregularities. The task of
editing is a highly specialized one and requires great care and attention. While editing data, the
following considerations should be borne in mind:

1. The data should be complete in every respect

2. The data should be accurate

3. The data should be consistent

4. The data should be homogeneous

2.5 EDITING OF SECONDARY DATA

1. Editing for Completeness

While editing, the editor should see that each schedule and questionnaire is complete in all
respects. He should see to it that the answers to each and every question have been furnished.
If some questions are not answered and if they are of vital importance, the informants should
be contacted again either personally or through correspondence. Even after all these efforts it
may happen that a few questions remain unanswered. In such cases, the editor should mark
‘No answer’ in the space provided for answers, and if the questions are of vital importance then
the schedule or questionnaire should be dropped.
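The completeness check described above can be sketched as follows. Unanswered questions are marked ‘No answer’, and a questionnaire is dropped when a vital question remains unanswered. The field names and the choice of income as the vital question are assumptions for illustration.

```python
def check_completeness(questionnaire, vital_questions):
    """Return (edited_copy, keep).

    Blank answers are replaced by 'No answer'. keep is False when any
    vital question is unanswered, meaning the questionnaire is dropped.
    """
    edited = dict(questionnaire)
    keep = True
    for question, answer in questionnaire.items():
        if answer is None or str(answer).strip() == "":
            edited[question] = "No answer"
            if question in vital_questions:
                keep = False
    return edited, keep


# Made-up questionnaire with two unanswered questions.
form = {"respondent_id": 17, "income": "", "occupation": None}
edited, keep = check_completeness(form, vital_questions={"income"})
print(edited)
print(keep)
```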

2. Editing for Consistency

At the time of editing the data for consistency, the editor should see that the answers to
questions are not contradictory in nature. If there are mutually contradictory answers, he should
try to obtain the correct answers either by referring back to the questionnaire or by contacting,
wherever possible, the informant in person. For example, if, amongst others, two questions in the
questionnaire are (a) Are you a student? (b) In which class do you study? and the reply to the first
question is ‘no’ and to the latter ‘tenth’, then there is a contradiction and it should be clarified.
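The student example above can be turned into a simple consistency check. The field names `is_student` and `class_studied` are hypothetical, and the responses are made up for the demonstration.

```python
def is_contradictory(response):
    """Flag a respondent who says they are not a student but still
    reports the class in which they study."""
    is_student = response["is_student"].strip().lower() == "yes"
    has_class = bool(response.get("class_studied", "").strip())
    return (not is_student) and has_class


# Made-up replies to questions (a) and (b).
responses = [
    {"is_student": "No", "class_studied": "tenth"},
    {"is_student": "Yes", "class_studied": "tenth"},
    {"is_student": "No", "class_studied": ""},
]
flags = [is_contradictory(r) for r in responses]
print(flags)  # True marks a questionnaire that needs clarification
```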

3. Editing for Accuracy

The reliability of conclusions depends basically on the correctness of information. If
the information supplied is wrong, conclusions can never be valid. It is, therefore, necessary for
the editor to see that the information is accurate in all respects. If the inaccuracy is due to
arithmetical errors, it can be easily detected and corrected. But if the cause of inaccuracy is
faulty information supplied by the informant, it may be difficult to verify; information relating
to income, age etc. is an example of this kind.

4. Editing for Homogeneity

Homogeneity means the condition in which all the questions have been understood in
the same sense. The editor must check all the questions for uniform interpretation. For example,
if a question on income is asked and some informants give their monthly income, others their
annual income and still others their weekly or even daily income, no comparison can be made.
Therefore, it is an essential duty of the editor to check that the information supplied
by the various people is homogeneous and uniform.

Choice Between Primary and Secondary Data

As we have already seen, there are many differences in the methods of collecting
Primary and Secondary data. Primary data, which are collected originally, involve an entire
plan starting with the definitions of the various terms used, the units to be employed, the type of
enquiry to be conducted, the extent of accuracy aimed at etc. For the collection of secondary data,
a mere compilation of the existing data would be sufficient. A proper choice between the types
of data needed for any particular statistical investigation is to be made after taking into
consideration the nature, objective and scope of the enquiry; the time and the finances at the
disposal of the agency; the degree of precision aimed at; and the status of the agency (whether
government, state or central, or a private institution or an individual).

In using secondary data, it is best to obtain the data from the primary source as far
as possible. By doing so, we would at least save ourselves from the errors of transcription
which might have inadvertently crept into the secondary source. Moreover, the primary source
will also provide us with a detailed discussion of the terminology used, the statistical units employed,
the size of the sample and the technique of sampling (if a sampling method was used), and the methods of
data collection and analysis of results, so that we can ascertain for ourselves whether these would suit our
purpose. Nowadays, secondary data are generally used in a large number of statistical enquiries
because fairly reliable published data on a large number of diverse fields are now available
in the publications of governments, private organizations and research institutions, agencies,
periodicals and magazines etc. In fact, primary data are collected only if no
secondary data suited to the investigation under study exist. In some investigations both primary
and secondary data may be used.

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

2.5.1 Define secondary data.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2.5.2 Explain the different sources of secondary data and the precautions in using secondary
data.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2.5.3 What is editing of secondary data? Why is it required?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2.5.4 What are the different types of editing of secondary data?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

2.6 SUMMARY

There are two types of data, primary and secondary. Data which are collected first
hand are called Primary data, and data which have already been collected and used by somebody
are called Secondary data. There are two methods of collecting data: (a) the Survey method
or total enumeration method and (b) the Sample method. When a researcher investigates
all the units of the subject, it is called the survey method. On the other hand, if he/she
investigates only a few units of the subject and gives the result on the basis of those, it is known
as the sample survey method. There are different sources of Primary and Secondary
data. Some of the important sources of Primary data are Direct Personal Interviews, Indirect
Oral Interviews, Information from Correspondents, the Mailed questionnaire method, Schedules
sent through enumerators and so on. As each of these methods has its relative merits and
demerits, a researcher should choose a particular method with great care.
There are basically two sources of secondary data: (a) Published sources and (b)
Unpublished sources. Published sources include the publications of different government and
semi-government departments, research institutions, agencies etc., whereas unpublished sources
include records maintained by different government departments, unpublished theses of
different universities etc. Editing of secondary data is necessary for different purposes:
editing for completeness, editing for consistency, editing for accuracy and editing for
homogeneity.

It is always a tough task for the researcher to choose between primary and secondary
data. Though primary data are more authentic and accurate, the time, money and labour involved in
obtaining them often prompt the researcher to go for secondary data. There is
a certain amount of doubt about the authenticity and suitability of secondary data, but with the
arrival of many government and semi-government agencies and some private institutions in the field
of data collection, most of the apprehensions in the mind of the researcher have been removed.

2.7 CHECK YOUR PROGRESS - MODEL ANSWERS

2.5.1 Data that have already been collected by someone else before the current needs of the
researcher.

2.5.2 Published sources and unpublished sources. The precautions are to check suitability,
adequacy and reliability.

2.5.3 Editing is the process of detecting possible errors and irregularities. It is required because
the secondary data may be biased, may be based on an inadequate sample size or may contain
substitution errors.

2.5.4 Editing for completeness, consistency, accuracy and homogeneity.

2.8 MODEL EXAMINATION QUESTIONS

Section - A (Long Answers)

1. Write a note on published sources of secondary data.

2. Write a note on unpublished sources of secondary data.

3. Give an explanation of editing of secondary data.

Section - B (Short Answers)

4. Define secondary data and categorize.

5. Write a brief note on precautions in the use of secondary data.

UNIT - 3 : CLASSIFICATION AND TABULATION OF THE DATA
Contents

3.0 Objectives

3.1 Introduction

3.2 Classification of data

3.3 Rules for Classification

3.4 Types of Classification

3.5 Tabulation

3.6 Summary

3.7 Check Your Progress - Model Answers

3.8 Model Examination Questions

3.0 OBJECTIVES

After studying this unit, you should be able to

• Understand the concept of classification

• Understand the concept of tabulation.

• Classify the data by using different classification techniques.

• Construct a frequency distribution for quantitative and qualitative data.

• Interpret a frequency distribution.

3.0 INTRODUCTION

In the last unit we learned about various types of data and about methods of collecting data
from primary and secondary sources. Data presented just as they were obtained do not
give any idea of, or insight into, the problem under study. Therefore, we need to present data in a
systematic way. Such an arrangement is called classification and tabulation of the data. In
data classification we arrange data into mutually exclusive groups on the basis of certain
properties.

3.1 CLASSIFICATION OF DATA:

Arranging data in groups/classes on the basis of certain properties is called data
classification. The main objectives of data classification are:

1. To condense the raw data.

2. To reveal the patterns of the variables.

3. To compare the groups and bring out facts about them.

4. To establish relations between the groups or variables.

5. To present data in orderly form for statistical analysis.

3.2 RULES FOR CLASSIFYING DATA


Raw data will not give us any information regarding the characteristics of the data.
The first step in statistical analysis is to classify the data, and in doing so we should
follow these guidelines:

a. Unambiguous - the classes should be rigid and unambiguous (clear). An unclear
classification can have severe consequences and can also affect all further statistical
treatments.

b. Exhaustive - the classification must be exhaustive in the sense that every item of data
should belong to one of the classes or categories.

c. Stability - in order to facilitate effective comparisons of data, it is important that the
classified data are stable. Classified data should be stable in the sense that the same
classification pattern must be adopted throughout the analysis. Adopting different
classification techniques for the same analysis would lead to ambiguity.

d. Suitable for the purpose - it is crucial to remember the objective of the report or
analysis while classifying data. Avoid classifying the data in a manner that does not suit
the purpose of the inquiry.

e. Flexibility - it is important to classify data in a manner that allows future modification.
Due to changing conditions, there may arise the need to change the statistical methods
and data classifications. In such a situation, a flexible classification of data would solve
many issues.

3.3 TYPES OF CLASSIFICATION


We use the following types of classification techniques for data classification:

1. Temporal Classification

2. Spatial Classification

3. Quantitative Classification
4. Qualitative Classification

3.3.1. TEMPORAL CLASSIFICATION

If the basis of classification is time, then the classification is called Temporal
Classification. For example, the profits of a company over the years, the population of a country
over the years, the revenue generated by an organisation over the months etc. represent temporal
classification. This classification is also known as chronological classification. The following is
an example of temporal classification:
Illustration 1:
Table 3.1

Year                2002  2003  2004  2005  2006  2007  2008  2009  2010

Population (crore)  25.8  43.2  54.2  63    68    70    72    77    80

3.3.2. SPATIAL CLASSIFICATION

If the basis of classification is geographical region, then the classification is
called Spatial Classification. For example, population density in different cities, sales of a company
in different regions of a city, rainfall statistics of different regions etc. represent spatial
classification.

For example, the sales of TV sets in various regions are given in Table 3.2.

Table 3.2
Sales of TV sets in different regions
Region Nellore Vijayawada Warangal Kareemnagar Hyderabad

Sales of TV 1010 1300 1200 1121 6034

Table 3.3
Sales of products in different regions
(Figures in Numbers)

Region/Products  Hyderabad  Secunderabad  Khammam  Nalgonda  Warangal  Total

Product – 1      200        100           150      100       150       700

Product – 2      300        400           200      150       100       1150

Product – 3      600        500           400      300       200       2000

Total            1100       1000          750      550       450       3850
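
The row and column totals of a two-way table such as Table 3.3 can be recomputed directly from the body of the table. The following is a minimal Python sketch, assuming the sales figures are typed in exactly as they appear in Table 3.3; the variable names (regions, sales, row_totals, column_totals) are illustrative only.

```python
# Sales figures copied from Table 3.3 (rows: products, columns: regions).
regions = ["Hyderabad", "Secunderabad", "Khammam", "Nalgonda", "Warangal"]
sales = {
    "Product 1": [200, 100, 150, 100, 150],
    "Product 2": [300, 400, 200, 150, 100],
    "Product 3": [600, 500, 400, 300, 200],
}

# Row totals: total sales of each product over all regions.
row_totals = {product: sum(values) for product, values in sales.items()}

# Column totals: total sales in each region over all products.
column_totals = {
    region: sum(values[i] for values in sales.values())
    for i, region in enumerate(regions)
}

# The grand total must be the same whether we add the row totals
# or the column totals.
grand_total = sum(row_totals.values())
assert grand_total == sum(column_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

Checking that the row totals and the column totals lead to the same grand total is a simple way of detecting arithmetical errors in a table.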

3.3.3 CLASSIFICATION OF QUALITATIVE DATA

If the basis of classification is some attribute like honesty, kindness, gender, literacy, region,
occupation etc., which can’t be quantified, then the classification is called qualitative
classification. To classify qualitative data we construct a frequency distribution. A frequency
distribution lists the various attributes observed in the data together with the number of times
each attribute occurs, which is called its frequency. Table 3.4 shows the satisfaction ratings
collected randomly from 113 guests who stayed at a hotel.

Table 3.4

Satisfaction rating                   No. of guests

Very much dissatisfied                13

Dissatisfied                          12

Neither satisfied nor dissatisfied    34

Satisfied                             42

Very much satisfied                   12

3.3.4 QUANTITATIVE CLASSIFICATION

If the classification of data is based on characteristics which can be measured,
such as height, weight, distance, income etc., then it is called quantitative
classification. The following Table 3.5 shows a frequency distribution of this kind.

Table 3.5

No. of children 0 1 2 3 4

No. of families 15 34 76 23 10

3.3.4.1 FREQUENCY DISTRIBUTION

One of the most common ways of grouping data is through the use of frequency
distribution. A frequency distribution divides data into several ordered nonoverlapping classes
or groups or categories. Raw data collected needs to be grouped together in a meaningful way.
One of those meaningful presentations of the data is frequency distribution. The number of
observations in each class is known as frequency. A table which represents values or classes
and their frequencies is called frequency distribution. A frequency distribution tells us how
values are distributed over the classes. We need to understand the process of converting data
into a frequency distribution.

3.3.4.2 FREQUENCY DISTRIBUTION FOR A QUALITATIVE DATA

To summarize qualitative data we construct a frequency distribution. In this case we
identify the attributes which occur repeatedly in the data and arrange
them in a table with their corresponding frequencies. Here the frequency of an attribute is
the number of times it occurs in the data.

Example 3.1

Guests staying at a 3-star hotel were asked to rate the quality of their
accommodations as being excellent, above average, average, below average or poor.
The ratings provided by a sample of 25 guests are shown below.

Below Average Average Above Average Above Average Above Average Above Average
Above Average Below Average Below Average Poor Poor Above Average
Excellent Above Average Average Above Average Average Above Average
Average Excellent Above Average Above Average Below Average Below Average

Frequency Distribution

Rating Frequency

Poor 2

Below Average 4

Average 3

Above Average 9

Excellent 2

From the frequency distribution, we can observe that the most common rating given by the
guests for the quality of the accommodation is above average, since
the frequency of the rating above average is the highest.

3.3.4.3 RELATIVE FREQUENCY DISTRIBUTION

We might want to compare the various opinions of the guests in the example, or to
express the same information in terms of the relative importance of each rating rather than
its frequency. To do so, we convert the above frequency distribution into a relative frequency
distribution. We calculate the relative frequency of each rating by dividing the frequency of
that category by the total number of observations. The relative frequencies add up to one. The
following table is the relative frequency distribution for the example.

Relative Frequency Distribution

Rating Frequency Relative Frequency

Poor 2 0.10

Below Average 4 0.20

Average 3 0.15

Above Average 9 0.45

Excellent 2 0.10

We can easily convert the relative frequencies into percent frequencies by multiplying
the relative frequencies by 100. For instance, 45% of the respondents rated the quality as
above average and only 10% of the respondents rated it as excellent.
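
The conversion from frequencies to relative and percent frequencies can be sketched in a few lines of Python. This is only an illustration, assuming the frequencies are taken from the frequency distribution of Example 3.1; the names frequency, relative and percent are chosen here for convenience.

```python
# Frequencies copied from the frequency distribution of Example 3.1.
frequency = {
    "Poor": 2,
    "Below Average": 4,
    "Average": 3,
    "Above Average": 9,
    "Excellent": 2,
}

# Total number of observations recorded in the table.
total = sum(frequency.values())

# Relative frequency = frequency of the category / total number of observations.
relative = {rating: count / total for rating, count in frequency.items()}

# Percent frequency = relative frequency multiplied by 100.
percent = {rating: 100 * value for rating, value in relative.items()}

for rating in frequency:
    print(f"{rating:15s} {frequency[rating]:3d} "
          f"{relative[rating]:6.2f} {percent[rating]:6.1f}%")
```

Because every relative frequency is a share of the same total, the relative frequencies always add up to one and the percent frequencies to 100.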

3.3.4.4 FREQUENCY DISTRIBUTION FOR QUANTITATIVE DATA

Case I: If the data set consists of repeated values

Quantitative data in which a few values occur repeatedly
can be summarized by a discrete frequency distribution. To summarize such data
we arrange the values in ascending order and count their frequencies by using tally bars.

Example: 2

The following data represent the number of accidents observed at a busy centre of a
city during thirty days (one month). Construct a frequency distribution to summarize the data.

Number of Accidents per day

0 1 1 1 1 2 2 2 0 0

0 0 0 0 0 0 0 0 1 1

3 0 0 0 0 0 0 0 0 0

Solution: The above data consist of 30 observations, but only four distinct values (0, 1, 2 and 3)
occur, each repeated several times. To summarize the data we can use the
following frequency distribution, which is also called a discrete frequency distribution.

Number of accidents   0    1    2    3

Number of days        20   6    3    1
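
The counting that is done by hand with tally bars can also be sketched in Python. The example below is a minimal illustration using the accident data of this example; Counter from the standard library plays the role of the tally bars, and the names accidents, tally and distribution are illustrative.

```python
from collections import Counter

# Number of accidents per day, copied row by row from the example above.
accidents = [
    0, 1, 1, 1, 1, 2, 2, 2, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
    3, 0, 0, 0, 0, 0, 0, 0, 0, 0,
]

# Counter works like tally bars: it counts how often each value occurs.
tally = Counter(accidents)

# Arrange the distinct values in ascending order with their frequencies.
distribution = sorted(tally.items())

for value, days in distribution:
    print(value, days)
```

The frequencies must add up to the number of observations (here 30), which is a useful check on the tally.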

3.4.4 Case II: If the data set consists of values with few repetitions:

In this case we use class intervals to construct the frequency distribution. To construct
such a distribution we need the following terminology.

3.4.4.1 CLASS INTERVAL

To construct a frequency distribution, the entire range of the data is divided into
various subranges. Each subrange is known as a class. Each class is represented by two
values, the lower limit and the upper limit. The lower limit represents the smallest possible
value of the class and the upper limit represents the largest possible value of the class. The
length of the class is called the class interval; it is determined by calculating the difference
between the upper limit and the lower limit. The number of observations in each class is called
the class frequency.

For example, in the following frequency distribution 10-20, 20-30, 30-40 and so on are
called classes. 10, 20, 30, … are the lower limits of the first, second, third classes etc.
respectively. 20, 30, 40, … are the upper limits of the first, second, third classes etc.
respectively. The difference between the upper limit and the lower limit of a class is the
length of its class interval: 20-10 = 10, 30-20 = 10 etc.

Class Interval Frequency

10 – 20 8

20 – 30 12

30 – 40 15

40 – 50 8

50 – 60 6

60 – 70 2

3.4.4.2 INCLUSIVE CLASS INTERVALS

In a frequency distribution, if the upper limit of each class is included in that class,
then the class intervals are called inclusive class intervals. For example, in the following
frequency distribution the upper limit of the first class, 9, is included in the first class; the
upper limit of the second class, 19, is included in the second class; and so on.

Class Interval Frequency

0–9 8

10–19 12

20–29 13

30–39 6

3.4.4.3 EXCLUSIVE CLASS INTERVALS

In a frequency distribution, if the upper limit of each class is excluded from that class, then
the class intervals are known as exclusive class intervals. For example, in the following frequency
distribution the upper limit 10 of the first class interval 0 – 10 is excluded from the first class
and included in the second class. Similarly, 20 is excluded from the second class and included in
the third class, and so on.

Class Interval Frequency

0–10 8

10–20 12

20–30 13

30–40 6
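
The difference between inclusive and exclusive class intervals shows up only for values lying on a class limit. The following Python sketch illustrates this under stated assumptions: whole-number data, classes of width 10 starting at 0 as in the two tables above, and two hypothetical helper functions (exclusive_class and inclusive_class) written only for this illustration.

```python
def exclusive_class(value, lower=0, width=10):
    """Return (lower limit, upper limit) of the exclusive class holding value.

    Exclusive classes look like 0-10, 10-20, ...; the upper limit is NOT
    part of the class, so the value 10 belongs to the class 10-20.
    """
    start = lower + ((value - lower) // width) * width
    return (start, start + width)


def inclusive_class(value, lower=0, width=10):
    """Return (lower limit, upper limit) of the inclusive class holding value.

    Inclusive classes look like 0-9, 10-19, ...; the upper limit IS part
    of the class, so the value 9 belongs to the class 0-9.
    This sketch assumes whole-number data.
    """
    start = lower + ((value - lower) // width) * width
    return (start, start + width - 1)


# A value on a class limit is the interesting case.
print(exclusive_class(10))   # class 10-20, not 0-10
print(inclusive_class(9))    # class 0-9
print(inclusive_class(19))   # class 10-19
```

In both schemes every value falls into exactly one class, which keeps the classes mutually exclusive.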

Frequency distributions take various forms:

a) Open End Frequency Distribution

b) Frequency Distribution with Unequal Class Width

c) Cumulative Frequency Distribution

d) Relative Frequency Distribution

3.4.4.4 OPEN END FREQUENCY DISTRIBUTION

An open-end frequency distribution is one which has at least one of its ends open:
either the lower limit of the first class or the upper limit of the last class, or both, are not
specified. The words “below” or “less than” and “above” or “more than” are used. In the former
case the values extend to −∞ and in the latter to +∞. Examples of such frequency distributions
are given in the tables below.

Class Interval Frequency

Less than 10 8
10–20 12
20–30 13
30–40 6

Class Interval Frequency

0–10 8
10–20 12
20–30 13
More than 30 6

3.4.4.5 A Frequency Distribution with unequal class width

The classes of a frequency distribution may or may not be of equal width. A frequency
distribution with unequal class widths is shown in the following table. Here, the width of the
1st, 2nd and 5th classes is 10, while that of the 3rd is 5 and that of the 4th is 15.

Class Interval Frequency

0–10 8

10–20 12

20–25 13

25–40 6

40–50 2

3.4.4.6 RELATIVE FREQUENCY DISTRIBUTION

For quantitative data, a relative frequency distribution identifies the proportion of the
values that fall into each class, that is,

Class relative frequency = Class frequency / Total number of values

Cumulative Frequency Distribution:

In a cumulative frequency distribution, the cumulative frequencies are calculated by
adding successive class frequencies. The cumulative frequency of a given class is equal to the
sum of the frequencies of all the preceding classes and the frequency of the class itself.
A cumulative frequency of the less than type represents the total frequency of the class to
which it relates and of all the classes before it. A cumulative frequency of the more than type
represents the total frequency of the class to which it relates and of all the classes after it.

Example 1:

No. of defectives   No. of days   Less than Cumulative   Greater than Cumulative
produced                          frequency (LCF)        frequency (GCF)

0                   18            18                     30

1                   5             18+5=23                30–18=12

2                   3             23+3=26                12–5=7

3                   2             26+2=28                7–3=4

4                   1             28+1=29                4–2=2

5                   1             29+1=30                2–1=1

Example 2:

Class interval   Frequency   Less than Cumulative   Greater than Cumulative
                             frequency (LCF)        frequency (GCF)

0–10             8           8                      39

10–20            12          8+12=20                39–8=31

20–30            13          20+13=33               31–12=19

30–40            6           33+6=39                19–13=6

Note: LCF = Less than Cumulative Frequency, GCF = Greater than Cumulative
Frequency

The above table shows at a glance the number of observations lying below or above
a particular value. It may be noted that the cumulative frequencies correspond to the
class limits or class boundaries. The less than cumulative frequencies correspond to the
upper limits or upper boundaries of the classes and the greater than cumulative frequencies
correspond to the lower limits or lower boundaries of the classes.

Less than Cumulative Frequency Distribution

Upper Limits Less than cumulative frequency

Less than 10 8

Less than 20 20

Less than 30 33

Less than 40 39

Greater than Cumulative Frequency Distribution

Lower Limits Greater than cumulative frequency

Greater than 0 39

Greater than 10 31

Greater than 20 19

Greater than 30 6
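
The two cumulative frequency distributions above can be reproduced with running totals. The following is a minimal Python sketch, assuming the class frequencies of Example 2 (8, 12, 13, 6); the names lcf and gcf are shorthand for the less than and greater than cumulative frequencies.

```python
from itertools import accumulate

# Class intervals and frequencies copied from Example 2.
classes = ["0-10", "10-20", "20-30", "30-40"]
frequencies = [8, 12, 13, 6]

# Less than cumulative frequency: running total starting from the first class.
lcf = list(accumulate(frequencies))

# Greater than cumulative frequency: running total starting from the last
# class, then put back into the original class order.
gcf = list(accumulate(reversed(frequencies)))[::-1]

for name, f, less, greater in zip(classes, frequencies, lcf, gcf):
    print(f"{name:6s} {f:3d} {less:4d} {greater:4d}")
```

The last less than cumulative frequency and the first greater than cumulative frequency both equal the total frequency, which gives a quick check on the arithmetic.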

Construction of a Frequency Distribution

To construct a frequency distribution we follow these steps:

1. Find the range of the data by using the following formula.

Range = Max Value – Min Value of the given data set.

2. Fix the number of class intervals. There are no fixed rules for choosing the number of class
intervals. As a rule of thumb, the number of class intervals should be not less than 5
and not more than 15. The final decision is left to the discretion of the researcher.

3. Determine the width of the class interval by using the following formula.

Width of the Class Interval = Range / Number of class intervals

Example 3:

A company organised a training programme. After the first week, the company officials
evaluated the training programme. The scores (out of 100) of 50 employees are presented
below:

32 36 31 67 65 74 43 42 39 56 78 61 46
42 39 56 78 61 46 56 34 78 75 78 56 30
65 42 45 79 64 46 54 59 56 54 53 62 58
71 85 87 73 88 65 54 78 47 56 59

Construct a frequency distribution for the above data.

The first step in the construction of a frequency distribution is to find the range of
the given data.

Range = Max value – Min value = 88 – 30 = 58

Let the number of classes be 6.

Width of the Class Interval = Range / Number of class intervals = 58/6 = 9.67

For convenience, the width of the interval is rounded off to 10.

To construct the frequency distribution, the first class interval must start from a value lower
than or equal to the lowest value of the ungrouped data and the last must end at a value higher
than or equal to the highest value of the ungrouped data. In this case the lowest score is 30 and
the highest score is 88. We can therefore start the distribution at 30 and end it at 88 or
more; with 6 class intervals of width 10, the classes are 30–40, 40–50, 50–60, 60–70, 70–80
and 80–90.
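
The steps of the construction can be sketched in Python as follows. This is an illustration only: it assumes the scores are entered exactly as listed in Example 3, that the classes are exclusive (lower limit included, upper limit excluded), and that the width is rounded off to 10 as in the text; all variable names are illustrative.

```python
# Scores copied from Example 3, in the order listed in the text.
scores = [
    32, 36, 31, 67, 65, 74, 43, 42, 39, 56, 78, 61, 46,
    42, 39, 56, 78, 61, 46, 56, 34, 78, 75, 78, 56, 30,
    65, 42, 45, 79, 64, 46, 54, 59, 56, 54, 53, 62, 58,
    71, 85, 87, 73, 88, 65, 54, 78, 47, 56, 59,
]

# Step 1: range of the data.
data_range = max(scores) - min(scores)

# Step 2: number of classes, chosen by the researcher.
number_of_classes = 6

# Step 3: width of the class interval before and after rounding off.
raw_width = data_range / number_of_classes
width = 10

# The first class starts at the lowest score.
start = min(scores)

# Exclusive classes: lower limit <= value < upper limit.
classes = [(start + i * width, start + (i + 1) * width)
           for i in range(number_of_classes)]

# Count how many scores fall into each class.
frequency = {c: 0 for c in classes}
for score in scores:
    index = (score - start) // width   # position of the class holding the score
    frequency[classes[index]] += 1

for (lower, upper), count in frequency.items():
    print(f"{lower}-{upper}: {count}")
```

The class frequencies should add up to the number of scores entered; if they do not, some value has fallen outside the chosen classes.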

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

3.4.5.1 Distinguish the following, by giving at least two points of distinctions.

1. Simple and cumulative frequency distributions.

2. Exclusive and Inclusive class intervals.

3. Frequency distribution, relative frequency distribution.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3.4.5.2 Explain the following terms giving examples:

a) Ungrouped data b) Class mark c) Open end classes

d) Class limits e) Class boundaries f) Class frequencies

g) Tally bar h) Relative frequencies

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

3.4.5.3 What points are to be kept in mind while taking decisions for preparing a frequency
distribution in respect of a) The number of classes, and b) Width of the class interval?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3.4.5.4 Construct relative frequency distribution, less than and more than type cumulative
frequency distributions from the following data:

Class 10–20 20–30 30–40 40–50 50–60 60–70

Frequency 5 8 10 12 8 7

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

3.5 TABULATION OF DATA

A table or a statistical table is a systematic arrangement of data in columns and rows,
with a predetermined and well decided objective. A row of a table represents a horizontal
arrangement while a column represents a vertical arrangement of data. To explain the nature
of the information given in a table, its rows and columns are designated by appropriate stubs and
captions (or headings or sub-headings) respectively. Presentation of data in a tabular form
should be simple, planned, unambiguous and logical.

A statistical table has been described as “the logical listing of related quantitative data in
vertical columns and horizontal rows of numbers with sufficient explanatory and qualifying
words, phrases and statements in the form of titles, headings and explanatory notes to make
clear the full meaning, context and the origin of the data.”

Table 3.15 is based on hypothetical figures of the imports and exports of India with
four countries for the three years 1995, 1996 and 1997.

Table 3.15 Imports and Exports of India with selected countries during 1995-1997
(in crores of rupees)

Year         1995               1996               1997

Country      Imports  Exports   Imports  Exports   Imports  Exports

Nepal        60       70        50       60        40       50

Bangladesh   45       50        60       70        80       50

China        50       60        70       80        60       70

Singapore    60       70        70       80        80       90

Total        215      250       250      290       260      260

Note: Figures are quick estimates.

Source: Trade Bulletin, 1998, Ministry of Foreign Trade of X.

From this table it is clear that its purpose is to show the imports and the exports of India
vis-a-vis the four countries listed. Note that each entry of the table refers to a row and
a column. For example, the entry at the intersection of the row for Nepal and the column for
1996 imports indicates that in 1996 India imported goods and services worth Rs.50 crore from
Nepal. This figure can then be compared with other import and export figures to draw important
interpretations.
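
Reading an entry by its row and column, and recomputing the Total row from the country rows, can be sketched in Python as follows. This is a hedged illustration: the country rows are copied from Table 3.15, while the names columns, trade, lookup and column_totals are introduced here only for the example.

```python
# Column labels and country rows copied from Table 3.15 (crores of rupees).
columns = [
    "1995 Imports", "1995 Exports",
    "1996 Imports", "1996 Exports",
    "1997 Imports", "1997 Exports",
]
trade = {
    "Nepal":      [60, 70, 50, 60, 40, 50],
    "Bangladesh": [45, 50, 60, 70, 80, 50],
    "China":      [50, 60, 70, 80, 60, 70],
    "Singapore":  [60, 70, 70, 80, 80, 90],
}


def lookup(country, column):
    """Return the entry at the intersection of a country row and a column."""
    return trade[country][columns.index(column)]


# Recompute the Total row by adding each column over all countries.
column_totals = [
    sum(row[i] for row in trade.values())
    for i in range(len(columns))
]

print(lookup("Nepal", "1996 Imports"))
print(dict(zip(columns, column_totals)))
```

Recomputing the totals in this way is a simple check on the arithmetic of a table before it is used for comparisons.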

Objectives of Tabulation

The major objectives of tabulation are

1. It simplifies the complex data by avoiding unnecessary information.

2. Data presented in the form of a table reveal the trend or pattern of data which otherwise
cannot be understood in a descriptive form of presentation.

3. Tabulation facilitates quick comparisons among its observations.

4. It helps as a reference for future needs.

Types of Tables

Basically, we have two types of tables. They are

1) Reference tables or general purpose tables

2) Text tables or special purpose tables.

1) Reference tables are tables prepared for a general purpose. They are used
for storing information with the objective of presenting data meaningfully. The information
derived from these tables is secondary data. Examples are the tables presented
by different government departments, ministries, the Reserve Bank of India, Economic Surveys
etc.; preparing them is routine work for these departments.

Population Census tables prepared by the Registrar General of India giving detailed
information on the demographic features of India is another important example. Students are
advised to consult the latest issue of “Economic Survey” which is issued every year along with
the union budget of India.

2) Text tables are special purpose tables. These tables are smaller in size and are
prepared from the reference tables in order to study a particular aspect.
For example, from the Population Census tables we may collect information regarding various
demographic factors like religion, age, gender, income, languages spoken etc. Similarly,
from various publications of the Reserve Bank of India we may be able to extract information, in
tabular form, on money supply or on the rates of interest offered by banks for the last ten years
or so.

Features of a Table

A table should have the following features.

1. Table number

A table should have its own identification. For example, when a report is generated
from data stored in various tables, it is necessary to mention the table from
which each part of the report is drawn. Therefore a table should be identified by a number.
Generally the table number is mentioned at the top of the table.

2. Title of the table

The title of the table is necessary to understand what is in the table. The title should
be brief and precise; avoid lengthy sentences. Present the title in bold or capital letters.

3. Head note

The head note, also called the prefatory note, is written just below the title. It shows the
contents and the unit of measurement, like (rupees crore) or (lakh tonnes) or (thousand bales). It
should be written in brackets and should appear at the top right, just below the title. However,
not every table needs a head note; a table showing the number of students in each class, for
example, does not.

4. Stubs

Stubs are used to designate rows. They appear on the left hand column of the table.
Stubs consist of two parts: a) Stub head describes the nature of stub entry. b) Stub entry is the
description of row entries.

5. Captions

Captions, also called box heads, designate the data presented in the columns of the table. A caption may contain more than one column head, and each column head may be sub-divided into more than one sub-head. For example, we can divide the students of a college into hostellers and non-hostellers and then again into males and females. This will help us to know the number of male hostellers in, say, first year, second year and third year.

6. Main body of the table

The main body, also called the field of the table, is its most important and bulkiest part. It contains the relevant numerical information, about which a hint is already given in the title of the table.

7. Foot Note

Foot note is a qualifying statement put just below the table (at the bottom). Its purpose
is to caution about the limitations of the data or certain omissions. Source of data may be the
last part of a table, yet it is important. It speaks about the authenticity of the data quoted.

Taking all these points into consideration, the format of a hypothetical table is presented below:

Table Number and Title [Head or prefatory note (if any)]

Caption

Sub heading Subhead Subhead

Column – head Column – head Column – head Column – head Column – head

Sub entries

Total (Columns)

Footnote:

Source Note:

Remarks:

Information that is not available should be indicated by the letters N.A. or by a dash (-) in the body of the table. Ditto marks (“), ‘etc.’, and abbreviated forms should be avoided in the table.

Importance of Tables

Numerical information arranged in tabular form has distinct advantages over other forms of presentation. First, tabulated data are easy to understand and interpret. Secondly, one can make quick comparisons between different characteristics, for example, ‘Are imports greater than exports in all three years?’ or ‘Are exports increasing?’ Thirdly, tables open the door to further investigation. Fourthly, they leave a more lasting impression on the human mind than textual statements. Needless to say, statistical tables are used extensively in almost all fields of human inquiry.

3.6 SUMMARY

In this unit you have understood the concepts of classification and tabulation. You have
learnt different classification techniques to classify the data. For quantitative and qualitative
data you have been taught how to construct a frequency distribution. Finally you have learnt
how to interpret a frequency distribution.

3.7 CHECK YOUR PROGRESS - MODEL ANSWERS

3.4.5.1 In simple frequency distribution, frequencies are tabulated class interval - wise. In
cumulative frequency distribution, frequencies of successive class intervals are added.

3.4.5.2 Example of 3.3.4.4.

3.4.5.3 Number of classes is between 5 and 15, width of the class interval is range divided by
number of class intervals.

3.4.5.4 LCF 5, 13, 23, 35, 43, 50

3.8 MODEL EXAMINATION QUESTIONS.

Section - A (Long Answers)

1) Distinguish between

a) Caption, stub-head and stub-entries

b) One-way and two-way tables

c) Reference tables and text tables

d) Column entry and row entry

e) Head note and foot note

2. Draw a blank table to show the number of candidates gender wise appearing in the
pre-university, first year, second year, and third year examinations of a university in
the faculties of Arts, Science, and Commerce in a certain year.

3. What is a statistical table? Explain clearly the essential features of a table.

BLOCK - II: MEASURES OF CENTRAL TENDENCY AND
DISPERSION
In this block, you will learn about measures of central tendency and measures of dispersion. In unit 4, you will study the measures of central tendency, i.e., mean, median, mode, GM and HM. In unit 5, we will discuss the measures of dispersion, i.e., range, QD, MD and SD. In unit 6, we shall introduce moments (central and non-central) and skewness.

The units included in this block are

Unit -4: Mean, Median, Mode, GM, HM

Unit -5: Range, QD, MD, SD

Unit -6: Moments, Central and Non-central Moments, Skewness

UNIT - 4: MEAN, MEDIAN, MODE, GM, HM
Contents

4.0 Objectives

4.1 Introduction

4.2 Mean

4.3 Median

4.4 Mode

4.5 Geometric Mean

4.6 Harmonic Mean

4.7 Exercise

4.8 Summary

4.9 Check Your Progress - Model Answers

4.10 Model Examination Questions

4.0 OBJECTIVES

After completion of this unit, you should be able to:

• Describe central tendency.

• Define and compute the various measures of central tendency, viz., mean, median, mode, geometric mean and harmonic mean, for different forms of data, i.e., (i) raw data, (ii) data arranged in the form of a frequency distribution, (iii) data arranged in the form of a grouped or continuous frequency distribution.

4.1 INTRODUCTION

One of the most important objectives of statistical analysis is to get one single value that describes the characteristic of the entire mass of the data. Such a value is called the central value or an average. The word average is very commonly used in day-to-day conversation. For example, we often talk of an average student in a class. When we say he is an average student, we mean that he is neither very good nor very bad, just a mediocre type of student. However, in statistics the term average has a different meaning.

An average value is a single value within the range of the data that is used to represent all the values in the series. It is also known as the central value or a measure of central tendency.

Measure of Central Tendency

The following are the various measures of central tendency.

(1) Arithmetic Mean or Mean

(2) Median

(3) Mode

(4) Geometric Mean

(5) Harmonic Mean

4.2 MEAN

The most popular and widely used measure of central tendency is the mean.

4.2.1 Calculation of Mean in Individual Series

The following two methods are used for calculating the arithmetic mean of an individual series.

(i) Direct method

(ii) Short - cut method

(i) Direct Method:

The process of calculating the arithmetic mean (or mean) of an individual series is very simple. Let $x_1, x_2, x_3, \ldots, x_n$ be $n$ observations; then the mean is denoted by $\bar{x}$ and defined as

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example 1: Marks obtained by 10 students in statistics are 52, 76, 70, 40, 56, 43, 65, 36, 48, 64. Calculate the arithmetic mean.

Solution: The mean marks of the students is

$$\bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{52 + 76 + 70 + 40 + 56 + 43 + 65 + 36 + 48 + 64}{10} = \frac{550}{10} = 55$$

The mean marks of the students is 55.


(ii) Short-cut Method:

The arithmetic mean can also be calculated by short-cut method or indirect method.
This method reduces the amount of calculation. It involves the following steps.

(i) Presume any one value as an assumed mean (A), which is also known as working
mean or provisional mean or arbitrary average.

(ii) Find the deviation or difference of each value from the assumed mean, $d = x - A$.

(iii) Add all the deviations, i.e., find $\sum d$.

(iv) Now apply the formula $\bar{x} = A + \dfrac{\sum d}{n}$.

Example 2: Let us solve the previous example by the short-cut method, i.e., calculate the mean marks of the 10 students in statistics by the indirect or short-cut method:

52, 76, 70, 40, 56, 43, 65, 36, 48, 64

Marks (x)    Deviation from assumed mean, d = x - A, where A = 56
52           -4
76           +20
70           +14
40           -16
56           0
43           -13
65           +9
36           -20
48           -8
64           +8
Total        $\sum d = -10$

$$\text{Mean} = \bar{x} = A + \frac{\sum d}{n} = 56 + \frac{-10}{10} = 56 - 1 = 55$$

Mean marks of the students is 55, which is same as computed in the direct method.
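The two methods above can be checked with a short Python sketch. The function names `mean_direct` and `mean_shortcut` are illustrative only and are not part of the course text; the data are the marks from Examples 1 and 2.

```python
def mean_direct(values):
    """Direct method: sum of the observations divided by their number."""
    return sum(values) / len(values)


def mean_shortcut(values, assumed_mean):
    """Short-cut method: A + (sum of deviations d = x - A) / n."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


marks = [52, 76, 70, 40, 56, 43, 65, 36, 48, 64]
print(mean_direct(marks))        # 55.0
print(mean_shortcut(marks, 56))  # 55.0
```

Any value may be chosen as the assumed mean; a value near the middle of the data simply keeps the deviations small when working by hand.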

4.2.2 Calculation of mean in Discrete Series

In a discrete frequency series, there are frequencies corresponding to the different values in the series. There are two methods of calculating the mean of a discrete series.

(a) Direct method

(b) Short - cut method

(a) Direct Method: The formula for the mean of a discrete series using the direct method is

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{N}$$

where $f$ is the frequency corresponding to $x$ and $N = \sum f$.

(b) Short-cut Method: The formula for the mean of a discrete series using the short-cut method is

$$\bar{x} = A + \frac{\sum fd}{N}, \quad \text{where } N = \sum f,\ d = x - A.$$

Example 3: Find the arithmetic mean of the following frequency distribution using (a) Direct
method and (b) Short - cut method.

x 1 2 3 4 5 6 7
f 5 9 12 17 14 10 6

Solution: (a) Direct method

x        f               fx
1        5               5
2        9               18
3        12              36
4        17              68
5        14              70
6        10              60
7        6               42
Total    $\sum f = 73$   $\sum fx = 299$

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{299}{73} \approx 4.10$$

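(b) In the short-cut method the mean is given by

$$\bar{x} = A + \frac{\sum fd}{N}, \quad \text{where } N = \sum f,\ d = x - A.$$

Taking $A = 4$:

x        f               d = x - 4    fd
1        5               -3           -15
2        9               -2           -18
3        12              -1           -12
4        17              0            0
5        14              1            14
6        10              2            20
7        6               3            18
Total    $\sum f = 73$                $\sum fd = 7$

$$\text{Mean} = \bar{x} = 4 + \frac{7}{73} = \frac{292 + 7}{73} = \frac{299}{73} \approx 4.10$$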
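As a cross-check of Example 3, the following Python sketch applies both formulas to the same frequency distribution. The helper name `mean_discrete` and its `assumed_mean` parameter are assumptions made for illustration.

```python
def mean_discrete(values, freqs, assumed_mean=None):
    """Mean of a discrete frequency distribution.

    With assumed_mean=None the direct formula sum(f*x)/N is used;
    otherwise the short-cut formula A + sum(f*d)/N with d = x - A.
    """
    n = sum(freqs)  # N = total frequency
    if assumed_mean is None:
        return sum(f * x for x, f in zip(values, freqs)) / n
    return assumed_mean + sum(
        f * (x - assumed_mean) for x, f in zip(values, freqs)
    ) / n


x = [1, 2, 3, 4, 5, 6, 7]
f = [5, 9, 12, 17, 14, 10, 6]
print(round(mean_discrete(x, f), 4))                  # 4.0959 (direct)
print(round(mean_discrete(x, f, assumed_mean=4), 4))  # 4.0959 (short-cut)
```

Both methods give 299/73, which is 4.10 when rounded to two decimal places.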

4.2.3 Calculation of Mean for Continuous Series (Grouped Data)

In a continuous series (continuous frequency distribution) the procedure for calculating the arithmetic mean is the same as in the case of a discrete frequency distribution. The only difference is that in a continuous frequency distribution the frequencies are given corresponding to class intervals. Using the mid-values of the classes and the corresponding frequencies, the arithmetic mean for a continuous frequency distribution can be calculated by any of the methods used in the discrete case.

(a) Direct Method

The formula for the mean in a continuous series is given by

$$\bar{x} = \frac{\sum fm}{\sum f}, \quad \text{where } m = \text{mid-value of the class interval.}$$

(b) Short-cut Method

The formula for the mean in a continuous series is given by

$$\bar{x} = A + \frac{\sum fd}{\sum f}, \quad \text{where } d = \text{deviation of the mid-value} = m - A.$$
Example: For the following data compute arithmetic mean by (i) direct method, (ii) short-cut
method.

Marks              0-10   10-20   20-30   30-40   40-50   50-60
No. of Students    5      10      25      30      20      10

(i) In the direct method, the formula for the mean is

$$\bar{x} = \frac{\sum fm}{\sum f}, \quad \text{where } m = \text{mid-value of the class interval.}$$

Marks    Mid-value (m)    No. of students (f)    fm
0-10     5                5                      25
10-20    15               10                     150
20-30    25               25                     625
30-40    35               30                     1050
40-50    45               20                     900
50-60    55               10                     550
Total                     $\sum f = 100$         $\sum fm = 3300$

$$\bar{x} = \frac{\sum fm}{\sum f} = \frac{3300}{100} = 33$$

Average marks = 33.

(ii) In the short-cut method, the formula for the mean is

$$\bar{x} = A + \frac{\sum fd}{\sum f}, \quad \text{where } d = \text{deviation of the mid-value} = m - A.$$

Taking $A = 35$:

Marks    Mid-value (m)    d = m - 35    f                 fd
0-10     5                -30           5                 -150
10-20    15               -20           10                -200
20-30    25               -10           25                -250
30-40    35               0             30                0
40-50    45               10            20                200
50-60    55               20            10                200
Total                                   $\sum f = 100$    $\sum fd = -200$

$$\text{Mean} = \bar{x} = 35 + \frac{-200}{100} = 35 + (-2) = 35 - 2 = 33$$
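The grouped-data calculation can be sketched in Python as follows. The function names and the representation of each class as a `(lower, upper)` pair are assumptions for illustration; the data are those of the marks example above.

```python
def class_midpoints(classes):
    """Mid-value m = (lower limit + upper limit) / 2 for each class."""
    return [(lower + upper) / 2 for lower, upper in classes]


def grouped_mean(classes, freqs):
    """Direct method: sum(f*m) / sum(f)."""
    mids = class_midpoints(classes)
    return sum(f * m for m, f in zip(mids, freqs)) / sum(freqs)


def grouped_mean_shortcut(classes, freqs, assumed_mean):
    """Short-cut method: A + sum(f*d) / sum(f), with d = m - A."""
    mids = class_midpoints(classes)
    total_fd = sum(f * (m - assumed_mean) for m, f in zip(mids, freqs))
    return assumed_mean + total_fd / sum(freqs)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
students = [5, 10, 25, 30, 20, 10]
print(grouped_mean(classes, students))               # 33.0
print(grouped_mean_shortcut(classes, students, 35))  # 33.0
```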
4.2.4 Merits and Demerits of Arithmetic Mean

Merits

1. It is rigidly defined.

2. It is easy to understand and easy to calculate.

3. If the number of items is sufficiently large, it is more accurate and more reliable.

4. It is a calculated value and is not based on its position in the series.

5. It is possible to calculate even if some of the details of the data are lacking.

6. Of all averages, it is least affected by fluctuations of sampling.

7. It provides a good basis for comparison.

Demerits

1. It can neither be obtained by inspection nor located through a frequency graph.

2. It cannot be used in the study of qualitative phenomena not capable of numerical measurement, e.g., intelligence, beauty, honesty, etc.

3. It is very much affected by extreme values.

4. It cannot be calculated for open-end classes.

5. It may lead to fallacious conclusions, if the details of the data from which it is computed
are not given.

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

1. The following are the wages (Rs.) of the workers 215, 265, 275, 280, 412, 495, 672,
890, 1115, 1245. Calculate the mean wage of the workers by (i) direct method (ii)
short-cut method.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2. Following table gives the wages paid to 125 workers in a factory. Calculate the arithmetic
mean of the wages using (i) direct method (ii) short-cut method.

Wages (Rs.)       200   210   220   230   240   250   260
No. of workers    5     15    32    42    15    12    4

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3. Calculate the mean for the following frequency distribution using (i) direct method (ii)
short-cut method.

Class Interval    0-8   8-16   16-24   24-32   32-40   40-48
Frequency         8     7      16      24      15      7

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

4.3 MEDIAN

The second measure of central tendency is the median. It is defined as the value of the middle item (or the mean of the two middle items) when the items are arranged in increasing or decreasing order of magnitude. The method of determining the median is explained below.

4.3.1 Calculation of Median for Individual Series

The median is determined by using the following steps in individual series.

Step 1: Arrange the given observations either in ascending or descending order of magnitude.
Generally, we follow ascending order.

Step 2: Let N be the number of observations.

Case (i): If N is odd, the median is the value of the $\left(\frac{N+1}{2}\right)$th item.

Case (ii): If N is even, the median is the average of the $\left(\frac{N}{2}\right)$th and $\left(\frac{N+2}{2}\right)$th items.

Example: Find the median of the following items 5, 19, 40, 11, 55, 32, 21, 60, 58, 38, 30.

Solution: Arranging the given observations in ascending order (AO)

A.O: 5, 11, 19, 21, 30, 32, 38, 40, 55, 58, 60

N = number of items = 11 (odd)

Median = value of the $\left(\frac{N+1}{2}\right)$th item, and $\frac{11+1}{2} = 6$.

The median is the 6th item, i.e., 32.

Example: Find the median for the following data 50, 30, 5, 20, 10, 25, 15, 45, 35, 40.

Solution: A.O: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50

N = 10 (even)

$$\frac{N}{2} = 5, \qquad \frac{N+2}{2} = \frac{12}{2} = 6$$

The median is the average of the values of the 5th and 6th items.

$$\text{Median} = \frac{25 + 30}{2} = 27.5$$
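The two cases can be expressed compactly in Python. This is a minimal sketch; the function name `median_individual` is illustrative, and Python's 0-based indexing means the 1-based $\frac{N+1}{2}$th item sits at index `n // 2` when N is odd.

```python
def median_individual(values):
    """Median of an individual series (odd and even cases)."""
    ordered = sorted(values)      # Step 1: ascending order
    n = len(ordered)              # Step 2: number of observations
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # the (N+1)/2-th item
    # average of the N/2-th and (N+2)/2-th items
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_individual([5, 19, 40, 11, 55, 32, 21, 60, 58, 38, 30]))  # 32
print(median_individual([50, 30, 5, 20, 10, 25, 15, 45, 35, 40]))      # 27.5
```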

4.3.2 Calculation of Median in a discrete frequency distribution

In the case of a discrete frequency distribution the procedure for determining the median is the same as that for an individual series. The following steps are involved in calculating the median.

Step 1: Find $\frac{N}{2}$, where $N = \sum f$.

Step 2: Locate the cumulative frequency just greater than $\frac{N}{2}$.

Step 3: The corresponding value of x is the median.

Example: Obtain the median for the following frequency distribution.

x 1 2 3 4 5 6 7 8 9
f 8 10 11 16 20 25 15 9 6

Solution:

x        f                Cumulative frequency (cf)
1        8                8
2        10               18
3        11               29
4        16               45
5        20               65
6        25               90
7        15               105
8        9                114
9        6                120
Total    $\sum f = 120$

Here $N = \sum f = 120$, so

$$\frac{N}{2} = \frac{120}{2} = 60.$$

The cumulative frequency just greater than $\frac{N}{2}$ is 65, and the value of x corresponding to 65 is 5.

Therefore, the median is 5.
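The three steps translate directly into a short Python sketch. The function name `median_discrete` is illustrative, and the code assumes the values of x are already listed in ascending order, as in the example.

```python
from itertools import accumulate


def median_discrete(values, freqs):
    """Median of a discrete frequency distribution (values in ascending order)."""
    half = sum(freqs) / 2                      # Step 1: N / 2
    cumulative = accumulate(freqs)             # running totals: 8, 18, 29, ...
    for x, cf in zip(values, cumulative):
        if cf > half:                          # Step 2: first c.f. just greater than N/2
            return x                           # Step 3: corresponding x
    raise ValueError("empty distribution")


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
f = [8, 10, 11, 16, 20, 25, 15, 9, 6]
print(list(accumulate(f)))    # [8, 18, 29, 45, 65, 90, 105, 114, 120]
print(median_discrete(x, f))  # 5
```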

4.3.3 Calculation of Median for a Continuous Frequency Distribution

In the case of continuous frequency distribution the following steps are involved to
calculate median.

Step 1: Obtain the cumulative frequencies (c.f.'s).

Step 2: Locate the c.f. which is just greater than $\frac{N}{2}$; the corresponding class is called the median class.

Step 3: The value of the median is obtained from the following formula.

$$\text{Median} = l + \left(\frac{N}{2} - c\right) \times \frac{h}{f}$$

where l is the lower limit of the median class,

f is the frequency of the median class,

h is the magnitude (width) of the median class,

c is the c.f. of the class preceding the median class,

and $N = \sum f$.

Example: From the following data calculate the median.

Class interval 0-5 5-10 10-15 15-20 20-25 25-30 30-35 35-40 40-45 45-50
Frequency 5 7 10 18 20 12 8 6 4 1

Solution:

Class interval    Frequency (f)      Cumulative frequency (cf)
0-5               5                  5
5-10              7                  12
10-15             10                 22
15-20             18                 40
20-25             20                 60
25-30             12                 72
30-35             8                  80
35-40             6                  86
40-45             4                  90
45-50             1                  91
Total             $N = \sum f = 91$

$$\frac{N}{2} = \frac{91}{2} = 45.5$$

From the above table, the cumulative frequency just greater than 45.5 is 60, and the corresponding class 20-25 is the median class.

$$\text{Median} = l + \left(\frac{N}{2} - c\right) \times \frac{h}{f}$$

l = lower limit of the median class = 20

f = frequency of the median class = 20

h = magnitude of the median class = 25 - 20 = 5

c = c.f. of the class preceding the median class = 40

$$\text{Median} = 20 + \frac{5}{20}\,(45.5 - 40) = 20 + 1.375 = 21.375$$

Example: Calculate the median from the following data.

Weight (in gms)    410-419   420-429   430-439   440-449   450-459   460-469   470-479
No. of apples      14        20        42        54        45        18        7

Solution: This is an example of an inclusive series, in which the upper class limit of the first class ends at 419 but the lower class limit of the second class begins at 420, leaving a gap of 1. We should convert it to an exclusive series by deducting 0.5 from the lower limits and adding 0.5 to the upper limits.

Weight         Frequency (f)        Cumulative frequency (cf)
409.5-419.5    14                   14
419.5-429.5    20                   34
429.5-439.5    42                   76
439.5-449.5    54                   130
449.5-459.5    45                   175
459.5-469.5    18                   193
469.5-479.5    7                    200
Total          $N = \sum f = 200$

$$\frac{N}{2} = \frac{200}{2} = 100$$

The c.f. just greater than 100 is 130. Therefore, 439.5-449.5 is the median class.

From the above table,

l = 439.5, c = 76, h = 10, f = 54

$$\text{Median} = l + \left(\frac{N}{2} - c\right) \times \frac{h}{f} = 439.5 + \frac{10}{54}\,(100 - 76) = 439.5 + 4.44 = 443.94$$
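Both grouped-median examples can be reproduced with the sketch below. The names `to_exclusive` and `grouped_median` are illustrative assumptions; classes are given as `(lower, upper)` pairs in ascending order.

```python
def to_exclusive(classes, gap=1):
    """Convert inclusive classes (e.g. 410-419, 420-429) to class boundaries."""
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]


def grouped_median(classes, freqs):
    """Median = l + (N/2 - c) * h / f for the median class."""
    half_n = sum(freqs) / 2
    c = 0                                   # c.f. of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if c + f > half_n:                  # first c.f. just greater than N/2
            h = upper - lower               # width of the median class
            return lower + (half_n - c) * h / f
        c += f
    raise ValueError("empty distribution")


classes = [(lo, lo + 5) for lo in range(0, 50, 5)]          # 0-5, 5-10, ..., 45-50
freqs = [5, 7, 10, 18, 20, 12, 8, 6, 4, 1]
print(grouped_median(classes, freqs))                        # 21.375

weights = [(lo, lo + 9) for lo in range(410, 480, 10)]      # 410-419, ..., 470-479
apples = [14, 20, 42, 54, 45, 18, 7]
print(round(grouped_median(to_exclusive(weights), apples), 2))  # 443.94
```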

Merits and Demerits of Median

Merits

1. Median is not influenced by extreme values because it is a positional average.

2. Median can be calculated in case of distribution with open - end intervals.

3. Median can be located even if the data are incomplete.

4. Median can be located even for qualitative factors such as ability, honesty, etc.

Demerits

1. A slight change in the series may bring drastic change in median value.

2. In the case of an even number of items or a continuous series, the median is an estimated value other than any value in the series.

3. It is not suitable for further mathematical treatment except its use in mean deviation.

4. It does not take into account all the observations.

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

4. Marks obtained by 9 students in statistics are 55, 48, 63, 78, 36, 45, 67, 59, 61. Find
median.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

5. Find the median size of the following series.

Size X 4 5 6 7 8 9 10
Frequency 6 12 15 28 20 14 5

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

6. Calculate the median for the following frequency distribution.

Marks 5-10 10-15 15-20 20-25 25-30 30-35 35-40 40-45 45-50
Frequency 7 15 24 31 42 30 26 15 10

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
7. Calculate the median from the following data.

Income (Rs.)      0-99   100-199   200-299   300-399   400-499   500-599
No. of Persons    10     18        25        12        8         3

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

4.4 MODE

The mode is another measure of central tendency. The mode or modal value is the value in a series of observations which occurs with the greatest frequency.

Example: Find the mode of the following individual series of scores:

9, 9, 10, 11, 11, 11, 12, 14, 14, 17

Solution: Here are only ten observations and the observation 11 has the maximum frequency
3.

Therefore 11 is the mode.

4.4.1 Determination of Mode in Discrete Frequency Distribution

In the case of discrete frequency distribution mode can be located simply by inspection.
Here the variate having maximum frequency will be taken as mode.

Example: Determine mode from the following data.

x 10 12 14 16 18 20 22

f 4 6 10 11 21 10 5

Solution: In the above discrete frequency distribution the variate value 18 has the maximum frequency, 21. Therefore, 18 is the mode of the given data.

4.4.2 Calculation of Mode in Continuous Frequency Distribution

If the data are given in class intervals then the following formula is used to calculate the mode.

$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$$

where $l$ = lower limit of the modal class

$f_1$ = frequency of the modal class

$f_0$ = frequency of the class prior to the modal class

$f_2$ = frequency of the class subsequent to the modal class

$c$ = width of the modal class

Example: From the following data regarding the incomes of 100 families, find the average income by means of the mode.

Income         No. of families
Upto 1500      18
1500-2500      10
2500-4000      15  ($f_0$)
4000-5000      25  ($f_1$, modal class)
5000-6000      12  ($f_2$)
6000-8000      11
8000-10000     7
Above 10000    2

From the given data, the highest frequency is 25. The class corresponding to 25 is 4000-5000 and is called the modal class.

$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$$

$$l = 4000,\ f_1 = 25,\ f_0 = 15,\ f_2 = 12,\ c = 1000$$

$$\text{Mode} = 4000 + \frac{25 - 15}{2(25) - 15 - 12} \times 1000 = 4000 + \frac{10}{23} \times 1000 = 4000 + 434.78 = 4434.78$$
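The mode formula is easy to evaluate in code. The sketch below takes the quantities of the modal class as arguments, since the open-ended classes in the example have no numeric limits; the function name `grouped_mode` is an illustrative assumption.

```python
def grouped_mode(lower, width, f1, f0, f2):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * c for the modal class.

    lower : lower limit l of the modal class
    width : width c of the modal class
    f1    : frequency of the modal class
    f0    : frequency of the class before it
    f2    : frequency of the class after it
    """
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * width


mode = grouped_mode(lower=4000, width=1000, f1=25, f0=15, f2=12)
print(round(mode, 2))  # 4434.78
```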

Merits and Demerits of Mode

Merits

1. It is comparatively easy to understand.

2. It is the simplest descriptive measure of average.

3. It is not affected by extreme items. It can be obtained even if the extreme values are
not given.

4. It can be determined for open-end distributions.

5. It can be located graphically while mean cannot be ascertained through graphs.

6. Mode has been defined as the most typical value of a distribution. Therefore, it is a
useful average for many practical situations, such as, average size of shoe, average
price of a commodity, the average type of dress, average wages and so on.

Demerits

1. It is not precisely defined.

2. It is not based on all the observations.

3. It is not capable of being handled algebraically as its value is not based on all the
observations.

4. The mode does not exist in many cases, while there may be more than one mode in other cases; it is not useful as an average in such situations.

5. The value of mode is significantly affected by the size of the class interval which is the
basis of grouping the frequencies.

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

8. Find the mode for the following data.

2, 8, 8, 3, 6, 7, 2, 8, 9, 2, 8

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
9. Calculate mode from the following data.

x 3 4 7 8 9 11 12
f 2 6 5 14 10 6 3

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

10. Calculate the mode for the following continuous frequency distribution.

Size 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100
Frequency 5 7 10 12 18 10 6 3 2 1

_____________________________________________________________________

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
4.4.4 Determination of Mode from Mean and Median Using an Empirical Relationship

In a symmetrical distribution mean, median and mode are identical, i.e., have the same
value. However, for moderately asymmetrical distributions, the values of mean, median and
mode are observed to have the following empirical relationship.

Mode = 3 Median - 2 Mean

If any two values out of the three are known, the third can be calculated by using the
above relation.

Example: In a moderately asymmetrical distribution, the mode and the mean are 64.2 and 67.4
respectively. Find the median.

Solution: The empirical relation among mean, median and mode is

Mode = 3 Median - 2 Mean

64.2 = 3 Median - 2(67.4)

3 Median = 64.2 + 134.8 = 199.0

$$\text{Median} = \frac{199}{3} = 66.33$$
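Since the empirical relation links three quantities, any one of them can be written in terms of the other two. The following Python sketch shows the three rearrangements; the function names are illustrative, and the results are only approximations that hold for moderately asymmetrical distributions.

```python
def mode_from(mean, median):
    """Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


def median_from(mean, mode):
    """Median = (Mode + 2 Mean) / 3."""
    return (mode + 2 * mean) / 3


def mean_from(median, mode):
    """Mean = (3 Median - Mode) / 2."""
    return (3 * median - mode) / 2


print(round(median_from(mean=67.4, mode=64.2), 2))  # 66.33
```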

4.5 GEOMETRIC MEAN

The most important measures of central tendency are the mean, median and mode because of their wide usefulness, simplicity and general applicability. Under certain conditions other measures of central tendency may be useful, and we shall therefore consider the geometric mean.

The geometric mean (GM) is the nth root of the product of n positive values. The formula is

$$GM = (x_1 \cdot x_2 \cdots x_n)^{1/n}$$

In the case of a frequency distribution in which the value $x_i$ occurs with frequency $f_i$, $i = 1, 2, \ldots, n$, the GM is

$$GM = \left(x_1^{f_1}\, x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}, \quad \text{where } N = \sum f_i.$$

Example: Find the geometric mean of 4, 16 and 27.

Solution: The geometric mean of the given values is

$$GM = (4 \times 16 \times 27)^{1/3} = (1728)^{1/3} = 12$$

Example: The frequency table shows the number of goals scored in netball by Iris in 12 games
played.

No. of goals 1 2 3 4 5 6

frequency 2 3 2 1 2 2

Find the geometric mean of goals scored per game.

Solution: Here the total frequency is $N = \sum f_i = 12$, so

$$GM = \left(1^2 \cdot 2^3 \cdot 3^2 \cdot 4^1 \cdot 5^2 \cdot 6^2\right)^{1/12} = (259200)^{1/12} \approx 2.826 \approx 3 \text{ goals per game.}$$

Alternative method to compute geometric mean

Geometric mean is defined as the nth root of the product of n positive items. When the number of items is three or more, the task of multiplying the numbers and extracting the root becomes excessively difficult. To simplify the calculations logarithms are used. The geometric mean is then calculated as follows:

$$\log GM = \frac{\log x_1 + \log x_2 + \cdots + \log x_N}{N}$$

$$GM = \text{Antilog}\left(\frac{\sum \log x}{N}\right)$$

In a discrete series,

$$GM = \text{Antilog}\left(\frac{\sum f \log x}{N}\right), \quad N = \sum f$$

In a continuous series,

$$GM = \text{Antilog}\left(\frac{\sum f \log m}{N}\right), \quad \text{where } m = \text{mid-value and } N = \sum f.$$

Example: Find the GM of 2, 2.5, 7, 15.05 and 6.75

Solution:

x        log x
2        0.30103
2.5      0.39794
7        0.84509
15.05    1.17753
6.75     0.82930
Total    3.55089

$$GM = \text{Antilog}\left(\frac{\sum \log x}{N}\right) = \text{Antilog}\left(\frac{3.55089}{5}\right) = \text{Antilog}(0.710178) \approx 5.13$$

Example: Find the geometric mean for the following data.

x 110 115 118 119 120

f 4 11 21 6 2

Solution:

x        f               log x     f log x
110      4               2.0414    8.1656
115      11              2.0607    22.6677
118      21              2.0719    43.5099
119      6               2.0755    12.4530
120      2               2.0792    4.1584
Total    $\sum f = 44$             $\sum f \log x = 90.9546$

$$\text{Geometric Mean (GM)} = \text{Antilog}\left(\frac{\sum f \log x}{\sum f}\right) = \text{Antilog}\left(\frac{90.9546}{44}\right) \approx 116.7$$

Example: Find the geometric mean for the following data.

Classes: 0-10 10-20 20-30 30-40 40-50

frequency: 1 2 6 6 5

Solution: The geometric mean for a continuous frequency distribution is given by

$$GM = \text{Antilog}\left(\frac{\sum f \log m}{\sum f}\right)$$

Classes    Mid-value (m)    Frequency (f)    log m     f log m
0-10       5                1                0.6990    0.6990
10-20      15               2                1.1761    2.3522
20-30      25               6                1.3979    8.3874
30-40      35               6                1.5441    9.2646
40-50      45               5                1.6532    8.2660
Total                       $\sum f = 20$              $\sum f \log m = 28.9692$

$$GM = \text{Antilog}\left(\frac{\sum f \log m}{\sum f}\right) = \text{Antilog}\left(\frac{28.9692}{20}\right) = 28.08$$
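The logarithm method for all three kinds of series can be sketched with one Python function. The name `geometric_mean` and its optional `freqs` argument are assumptions for illustration; for a continuous series the mid-values are passed as the values. Base-10 logarithms are used to mirror the log tables in the text.

```python
import math


def geometric_mean(values, freqs=None):
    """GM = antilog( sum(f * log x) / sum(f) ), using base-10 logarithms."""
    if freqs is None:
        freqs = [1] * len(values)          # individual series: every f is 1
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n)             # antilog


print(round(geometric_mean([4, 16, 27]), 2))                       # 12.0
print(round(geometric_mean([2, 2.5, 7, 15.05, 6.75]), 2))          # 5.13
print(round(geometric_mean([110, 115, 118, 119, 120],
                           [4, 11, 21, 6, 2]), 1))                 # 116.7
print(round(geometric_mean([5, 15, 25, 35, 45], [1, 2, 6, 6, 5]), 2))  # 28.08
```

Working with logarithms avoids forming the very large product that the direct definition would require.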

4.5.1 Merits and Demerits of Geometric Mean

Merits

1. It is rigidly defined and thus can be precisely determined.

2. Its computation takes into account every observation.

3. It is suitable for further algebraic treatment.

4. It is not much affected by the fluctuations of sampling.

5. It is very useful in dealing with ratio, rates etc.

Demerits

1. It is defined only for positive values of the variable. The GM cannot be meaningfully computed if any observation is negative or zero.

2. It is difficult to understand.

3. Its computation is complex.

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

11. Daily income of ten families of a particular place is given below. Find geometric mean.
85, 70, 15, 75, 500, 8, 45, 250, 40, 36

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
12. Calculate the geometric mean of the following data.

x 50 60 70 80 90 100 110
f 2 4 7 10 9 6 2

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
13. Calculate the geometric mean of the following data.

Marks 0-10 10-20 20-30 30-40 40-50


No. of students 4 8 10 6 7

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

4.6 HARMONIC MEAN (H.M)

The harmonic mean, like the geometric mean, is a measure of central tendency used in solving special types of problems. It is the reciprocal of the arithmetic mean of the reciprocals of the values of the variable. It is a specialised average suited to problems involving variables expressed as rates per unit of time, such as speed.

Determination of Harmonic Mean:

Individual Series

$$HM = \frac{N}{\sum \left(\frac{1}{x}\right)}$$

where $\sum \left(\frac{1}{x}\right)$ = sum of the reciprocal values of the variable x,

N = number of items.

Example: From the data 5, 10, 17, 24, 30 calculate harmonic mean.

Solution:

x        Reciprocal value of x (1/x)
5        0.2000
10       0.1000
17       0.0588
24       0.0417
30       0.0333
N = 5    $\sum \left(\frac{1}{x}\right) = 0.4338$

$$\text{Harmonic mean} = HM = \frac{5}{0.4338} = 11.53$$

Discrete Series

If the data are in a discrete series, the harmonic mean is calculated with the formula

$$HM = \frac{N}{\sum \left(f \cdot \frac{1}{x}\right)}, \quad \text{where } N = \sum f = \text{total frequency.}$$

Example:Find the harmonic mean of the marks obtained in a class test given below.

Marks 11 12 13 14 15
No. of students 3 7 8 5 2

Solution:

Marks (x)    No. of students (f)    1/x       f × (1/x)
11           3                      0.0909    0.2727
12           7                      0.0833    0.5831
13           8                      0.0769    0.6152
14           5                      0.0714    0.3570
15           2                      0.0667    0.1334
Total        N = 25                           $\sum \left(f \cdot \frac{1}{x}\right) = 1.9614$

$$\text{Harmonic mean} = \frac{N}{\sum \left(f \cdot \frac{1}{x}\right)} = \frac{25}{1.9614} = 12.75$$

Continuous Series

The harmonic mean for a continuous series can be calculated by using the following formula.

$$HM = \frac{N}{\sum \left(f \cdot \frac{1}{m}\right)}, \quad \text{where } m = \text{mid-value of the class and } N = \sum f.$$

Example: From the following data compute the value of harmonic mean.

Class interval 10-20 20-30 30-40 40-50 50-60


Frequency 4 6 10 7 3

Solution:

Class intervals    Mid-values (m)    Frequency (f)    1/m      f(1/m)
10-20              15                4                0.067    0.267
20-30              25                6                0.040    0.240
30-40              35                10               0.029    0.286
40-50              45                7                0.022    0.156
50-60              55                3                0.018    0.055
                                     $N = \sum f$ = 30         $\sum \left( f \cdot \frac{1}{m} \right)$ = 1.004

Harmonic Mean (HM) = $\frac{N}{\sum \left( f \cdot \frac{1}{m} \right)} = \frac{30}{1.004} = 29.88$.
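For a continuous series the only extra step is replacing each class by its mid-value. A minimal Python sketch, assuming classes are given as (lower, upper) pairs; the names `class_midpoints` and `hm_grouped` are illustrative only:

```python
def class_midpoints(classes):
    """Mid-value m of each class given as a (lower, upper) pair."""
    return [(lower + upper) / 2 for lower, upper in classes]


def hm_grouped(classes, freqs):
    """Harmonic mean of a continuous series: N / sum(f / m)."""
    mids = class_midpoints(classes)
    n = sum(freqs)
    return n / sum(f / m for m, f in zip(mids, freqs))


classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [4, 6, 10, 7, 3]
hm_classes = hm_grouped(classes, freqs)
print(round(hm_classes, 2))   # 29.93
```

The unrounded value is about 29.93. The worked answer of 29.88 is slightly lower because the column f(1/m) was rounded to three decimals before it was summed.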

4.6.1 Merits and Demerits of Harmonic Mean

Merits

1. It is rigidly defined.

2. Its computation is based on all the observations

3. It is capable of further algebraic treatment.

4. It is also not affected much by sampling fluctuations.

5. It is very useful for measuring average relative changes in certain types of rates or
ratios.

Demerits

1. It is not easy to understand.

2. It is rather complicated to calculate.

3. It gives more weight to small observations and thus may lead to fallacious results.
However, in view of this property, the harmonic mean is more useful when more
weight is to be given to smaller observations.

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

14. Find the harmonic mean from the following data.

2574, 475, 75, 5, 0.8, 0.08, 0.005, 0.0009

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
15. Number of tomatoes per plant is given below. Calculate the harmonic mean.

No. of tomatoes per plant 20 21 22 23 24 25


No. of plants 4 2 7 1 3 1

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
16. Find the harmonic mean of the following data.

Class Interval 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80
Frequency 5 8 11 21 35 30 22 18

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

4.7 EXERCISE

1. Calculate mean for the following data.

52, 76, 70, 40, 56, 43, 65, 36, 48

(Ans. 54)

2. Calculate arithmetic mean of the following data.

Size of item 10 20 30 40 50 60 70
Frequency 8 12 18 26 16 12 8

(Ans. mean = 38)

3. Calculate the mean for the following frequency distribution.

Marks 0-20 20-40 40-60 60-80 80-100


Frequency 3 17 27 20 9

(Ans. 54)

4. Find the median from the following data.

80, 180, 150, 200, 250, 500, 350, 220, 400

(Ans. 220)

5. The following is the frequency distribution of ages of 670 students of a school. Compute
the median of the data.

Age (in years) 5 6 7 8 9 10 11 12 13 14


Frequency 25 45 90 165 112 96 81 26 18 12

(Ans. 9 years)

6. Find median for the following data.

Class Interval 10-20 20-30 30-40 40-50 50-60 60-70 70-80


Frequency 24 42 56 66 108 130 154

(Ans. 52.62)

7. From the following data determine mode.

6, 5, 3, 4, 3, 7, 8, 5, 9, 5, 4

8. Calculate mode for the following data.

Marks 10 20 30 40 50 60
No. of students 8 23 45 65 75 80

(Ans. 27.78)

9. Calculate mode from the following data.

Income (Rs.) 10-20 20-30 30-40 40-50 50-60 60-70 70-80


No. of Persons 24 42 56 66 108 130 154

(Ans. 71.34)

10. Daily income of ten families of a particular place is given below. Find out geometric
mean.

85, 70, 15, 75, 500, 8, 45, 250, 40, 36

(Ans. GM = 58.03)

11. Find the geometric mean of the following distribution.

Values 352 220 230 160 190


Frequency 48 10 8 12 15

(Ans. GM= 72.6)

12. Compute the GM of the following distribution.

Marks 0-10 10-20 20-30 30-40


No. of Persons 5 8 3 4

(Ans. GM = 12.58)

13. From the given data calculate H.M.

5, 10, 17, 24, 30

(Ans. 11.526)

14. From the following data compute the value of harmonic mean.

Marks 10 20 25 40 50
No. of students 20 30 50 15 5

(Ans. 20.08)

15. From the following data compute the value of harmonic mean.

Class interval 10-20 20-30 30-40 40-50 50-60


Frequency 4 6 10 7 3

(Ans. 29.88)

4.8 SUMMARY

One of the most important aspects of describing a distribution is the central value around
which the observations are distributed. A statistical measure used for representing the centre or
central value of a set of observations is known as a measure of central tendency or measure of
location. There are three measures of central tendency in common use: arithmetic mean,
median and mode.

Arithmetic mean is obtained by dividing the sum of the given observations by their
number. Apart from having most of the characteristics of a good average, arithmetic mean
possesses some simple but important algebraic properties. Because of these properties, arithmetic
mean is one of the most widely used averages.

The median is the value of the variable which divides the group of observations into
two equal parts.

Mode is another measure of location. The mode of a distribution is the value around
which the observations tend to be most heavily concentrated. For a moderately asymmetrical
distribution, mean, median and mode are connected by the empirical relation

Mode = 3 Median - 2 Mean

The remaining two averages are geometric mean and harmonic mean. The geometric
mean of a set of n observations is defined as the nth root of their product. GM is often used in
the construction of index numbers.

The harmonic mean of a set of observations is the reciprocal of the arithmetic average
of the reciprocals of the observations. If all items in a series have the same value, then the
arithmetic mean, GM and HM of the series coincide,

i.e. AM = GM = HM.

4.9 CHECK YOUR PROGRESS - MODEL ANSWERS

1. Mean wage is Rs.587

2. Mean = Rs.227.92

3. Mean = 25.404

4. Median = 59

5. Median = 7

6. Median = 27.74

7. Median = 239.5

8. Mode = 8

9. Mode = 8

10. Mode = 44.28

11. GM = 58.03

12. GM = 80.02

13. GM = 22.06 marks

14. HM = 0.006

15. HM = 21.91

16. HM = 33.52

4.10 MODEL EXAMINATION QUESTIONS

Section - A (Long Answers)

1. What are measures of central tendency? Explain their objectives and functions.

2. What is arithmetic mean? Discuss its merits and demerits.

3. Explain the procedure of calculation of mean in three series.

4. Define median, mention its merits and demerits.

5. What is mode? Explain merits and demerits of mode.

6. Explain the relationship between various averages.

7. Briefly explain the merits and demerits of geometric mean.

8. Define harmonic mean, explain the merits and demerits of harmonic mean.

Section - B (Short Answers)

1. What is average?

2. Explain advantages of mean.

3. Define median.

4. What are merits of A.M?

5. What is mode?

6. What are the uses of mode?

7. Write the relation between mean, median and mode.

8. Define the geometric mean.

9. Define the harmonic mean.

10. Describe the uses of geometric mean and harmonic mean.

UNIT - 5: RANGE, QD, MD, SD
Contents

5.0 Objectives

5.1 Introduction

5.2 Characteristics of an Ideal Measure of Dispersion

5.3 Absolute and Relative Measures of Dispersion

5.4 Range

5.5 Quartile Deviation

5.6 Mean Deviation

5.7 Standard Deviation

5.8 Exercise

5.9 Summary

5.10 Check Your Progress - Model Answers

5.11 Model Examination Questions

5.0 OBJECTIVES

After studying this unit you should be able to

• Explain the concept of dispersion and the significance of measuring it.

• Define and compute different measures of dispersion viz., range, quartile deviation,
mean deviation and standard deviation for different types of data.

5.1 INTRODUCTION

The different measures of central tendency give us an idea of the central value, whereas
measures of dispersion are measures of spread about an average. The central value may
be the same in two or more distributions that nevertheless differ in dispersion. The measures of
dispersion help us in studying this important characteristic of a distribution. They measure the
extent to which individual observations differ from the central value.

The literal meaning of dispersion is 'scatteredness'. We study dispersion to have an idea
about the homogeneity or heterogeneity of the data.

5.2 CHARACTERISTICS OF AN IDEAL MEASURE OF DISPERSION

A good measure of dispersion should possess, as far as possible, the following
characteristics.

1. It should be simple to understand and easy to compute.

2. It should be rigidly defined.

3. It should be based on all the observations.

4. It should be amenable to further mathematical treatment.

5. It should have sampling stability.

6. It should not be unduly affected by extreme items.

Method of Measuring Dispersion

For ascertaining the deviations from the central value there are certain specific measures
of dispersion. The important measures of dispersion are:

1. Range

2. Quartile deviation

3. Mean deviation

4. Standard deviation

Measures of Dispersion

Various measures of dispersion can be classified into two categories.

1. The measures which express the spread of observations in terms of distance between
the values of the selected observations. These are also termed as distance measures,
eg., range and quartile deviation.

2. The measures which express the spread of observations in terms of the average of the
deviations of observations from some central value eg., mean deviation and standard
deviation.

5.3 ABSOLUTE AND RELATIVE MEASURES OF DISPERSION

Measures of dispersion may be either absolute or relative. An absolute measure of
dispersion can be compared with another only if the two belong to the same population. Absolute
measures of dispersion are always expressed in the same units as the original data. For example,
they are expressed in cms, kgs, tonnes etc., depending upon the unit of measurement of the
original data. The absolute measures of dispersion do not help us in cases where the variables
are expressed in different units.

Relative measure of dispersion is called coefficient of dispersion. The relative measure
of dispersion is the ratio of the absolute measure of dispersion to the mean and is expressed as
a percentage.

Relative measure of dispersion = $\frac{\text{Absolute measure of dispersion}}{\text{mean}} \times 100$.

5.4 RANGE

Range is the simplest method of studying dispersion. It is defined as the difference
between the value of the largest observation and the smallest observation included in the
distribution. It is given by

Range = $X_{max} - X_{min} = L - S$, where L = largest observation, S = smallest observation.

A relative measure of range called the coefficient of range is given by

Coefficient of range = $\frac{L - S}{L + S}$.

Example: Find the range and coefficient of range for the following data.

10, 8, 5, 10, 9, 14, 7

Solution: Largest value (L) = 14, Smallest value (S) = 5

Range = L - S = 14 - 5 = 9

Coefficient of range = $\frac{L - S}{L + S} = \frac{14 - 5}{14 + 5} = \frac{9}{19} \approx 0.47$.

Merits and Demerits of Range

Merits

1. It is simple to understand and easy to calculate.

2. In certain types of problems like quality control, weather forecasts, share price analysis
etc., range is most widely used.

Demerits

1. It is very much affected by the extreme items.

2. It is based only on two extreme observations.

3. It is not suitable for mathematical treatment.

4. It is not at all a reliable measure of dispersion.

5.5 QUARTILE DEVIATION

The range as a measure of dispersion has certain limitations, since it is based only on the two
extreme observations. For this reason a measure based on the quartiles has been developed.
Quartiles divide the ordered observations into four approximately equal parts. The difference
between the third and first quartiles, $Q_3 - Q_1$, is the interquartile range, and half of it is
called the quartile deviation. Quartile deviation is definitely a better measure than the range,
as it makes use of the middle 50% of the data, but it ignores the other 50% of the data.
Quartile deviation is given by

Q.D = $\frac{Q_3 - Q_1}{2}$, coefficient of Q.D = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$.

Example: Calculate quartile deviation and its coefficient for the following data.

20, 28, 40, 12, 30, 15, 50


Solution: Arranging the given observations in ascending order:

12, 15, 20, 28, 30, 40, 50

$Q_1$ = size of the $\left( \frac{N+1}{4} \right)$th item = size of the $\left( \frac{7+1}{4} \right)$th item = 2nd item

$\therefore Q_1 = 15$

$Q_3$ = size of the $3\left( \frac{N+1}{4} \right)$th item = size of the $3\left( \frac{8}{4} \right)$th item = 6th item

$\therefore Q_3 = 40$

Quartile deviation (Q.D) = $\frac{Q_3 - Q_1}{2} = \frac{40 - 15}{2} = 12.5$.

Coefficient of Q.D = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 15}{40 + 15} = \frac{25}{55} = 0.455$.

Example: Calculate quartile deviation and its coefficient for the following data.

Marks 10 20 30 40 50 60

No. of students 4 7 15 8 7 2

Solution:

Marks No. of students (f) c. f


10 4 4
20 7 11
30 15 26
40 8 34
50 7 41
60 2 43
N = 43

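$Q_1$ = size of the $\left( \frac{N+1}{4} \right)$th item = size of the $\left( \frac{43+1}{4} \right)$th item = 11th item

$\therefore Q_1 = 20$

$Q_3$ = size of the $3\left( \frac{N+1}{4} \right)$th item = size of the $3\left( \frac{44}{4} \right)$th item = 33rd item

$\therefore Q_3 = 40$

Quartile deviation (Q.D) = $\frac{Q_3 - Q_1}{2} = \frac{40 - 20}{2} = 10$.

Coefficient of Q.D = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 20}{40 + 20} = \frac{20}{60} = \frac{1}{3} = 0.333$.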

Example: The following is the age distribution of 799 workers.

Age–Group 20–25 25–30 30–35 35–40 40–45 45–50 50–55 55–60

No. of workers 50 70 100 180 150 120 70 59

Find QD and its coefficient.

Solution: To find QD and its coefficient, we first find the values of Q1 and Q3 from the given
data.

Age–group No. of workers (f) c. f


20–25 50 50
25–30 70 120
30–35 100 220
35–40 180 400
40–45 150 550
45–50 120 670
50–55 70 740
55–60 59 799
N = 799

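$Q_1$ is the value of the $\frac{N}{4} = \frac{799}{4} = 199.75$th item, which lies in the 30-35 class. Thus
30-35 is the first quartile class.

$Q_1 = l + \frac{\frac{N}{4} - c}{f} \times h$

where l = lower limit of the quartile class, c = cumulative frequency of the preceding class,
f = frequency of the quartile class and h = width of the class.

$= 30 + \frac{199.75 - 120}{100} \times 5$

$= 30 + \frac{398.75}{100} = 30 + 3.99 = 33.99 \approx 34$ years.

$Q_3$ is the value of the $\frac{3N}{4} = \frac{3 \times 799}{4} = 599.25$th item, which lies in the 45-50 class.

$Q_3 = l + \frac{\frac{3N}{4} - c}{f} \times h$

$= 45 + \frac{599.25 - 550}{120} \times 5$

$= 45 + \frac{5 \times 49.25}{120}$

$= 45 + 2.05 = 47.05 \approx 47$ years.

Q.D = $\frac{Q_3 - Q_1}{2} = \frac{47 - 34}{2} = \frac{13}{2} = 6.5$

Coefficient of QD = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{47 - 34}{47 + 34} = \frac{13}{81} = 0.16$.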
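The interpolation within the quartile class can be sketched in Python as below. The function name `quartile_grouped` and the (lower, upper) representation of classes are assumptions made for this illustration.

```python
def quartile_grouped(classes, freqs, k):
    """k-th quartile of a continuous series: l + ((kN/4 - c) / f) * h."""
    n = sum(freqs)
    target = k * n / 4
    cum_before = 0                       # c: cumulative frequency before the class
    for (lower, upper), f in zip(classes, freqs):
        if cum_before + f >= target:     # quartile class found
            return lower + (target - cum_before) / f * (upper - lower)
        cum_before += f
    raise ValueError("quartile class not found")


ages = [(20, 25), (25, 30), (30, 35), (35, 40),
        (40, 45), (45, 50), (50, 55), (55, 60)]
workers = [50, 70, 100, 180, 150, 120, 70, 59]
q1_age = quartile_grouped(ages, workers, 1)
q3_age = quartile_grouped(ages, workers, 3)
coef_qd = (q3_age - q1_age) / (q3_age + q1_age)
print(round(q1_age, 2), round(q3_age, 2), round(coef_qd, 2))   # 33.99 47.05 0.16
```

Working with the unrounded quartiles gives a quartile deviation of about 6.53; rounding the quartiles to 34 and 47 years first gives 6.5, and in either case the coefficient is about 0.16 (13/81).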

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

1. For the following data find range, quartile deviation. Also find their coefficients.

8, 2, 4, 15, 11, 10, 9

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

2. Calculate QD and its coefficient from the following data.

Weight (Kg) 60 61 62 63 65 70 75 80

No. of workers 1 3 5 7 10 3 1 1

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3. Compute quartile deviation for the following data.

Marks 0–10 10–20 20–30 30–40 40–50 50–60 60–70

No. of students 6 5 8 15 7 6 3

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
Merits and Demerits of Quartile deviation

Merits

1. It is simple to understand and easy to calculate.

2. It is not affected by extreme values.

3. It can be calculated for data with open-end classes also.

Demerits

1. It is not based on all the items. It is based on two positional values of Q1 and Q3 and
ignores the extreme 50% of the items.

2. It is not amenable to further mathematical treatment.

3. It is affected by sampling fluctuations.

5.6 MEAN DEVIATION (MD)

Previously we have studied range and quartile deviation, which are the positional
measures of dispersion. Their computation is not based on all the observations. In the strict
sense, they are not measures of dispersion, as they do not measure the scatteredness of
observations around an average.

Mean deviation of a series is the arithmetic mean of the absolute deviations of various
items from some central value, such as mean, median and mode. Symbolically, the mean deviation
about mean, median or mode can be expressed as follows:

a. Mean deviation from mean $\bar{X}$:

$M.D_{\bar{X}} = \frac{1}{N} \sum |X - \bar{X}|$

b. Mean deviation from median $M_d$:

$M.D_{M_d} = \frac{1}{N} \sum |X - M_d|$

c. Mean deviation from mode $M_0$:

$M.D_{M_0} = \frac{1}{N} \sum |X - M_0|$

where N = the number of observations,

$\sum |X - \bar{X}|$ = sum of absolute deviations taken from mean,

$\sum |X - M_d|$ = sum of absolute deviations taken from median,

$\sum |X - M_0|$ = sum of absolute deviations taken from mode.

In case of grouped data or frequency distributions, the above formulae can be given as:

$M.D_{\bar{X}} = \frac{1}{N} \sum f |X - \bar{X}|$

$M.D_{M_d} = \frac{1}{N} \sum f |X - M_d|$

$M.D_{M_0} = \frac{1}{N} \sum f |X - M_0|$

Coefficient of Mean Deviation:

Mean deviation is an absolute measure of dispersion. The corresponding relative
measure, called the coefficient of mean deviation, is obtained by dividing the mean deviation
by the average or central value used for calculating it.

Coefficient of MD = $\frac{MD}{\text{Mean or Median or Mode}}$

Example: Compute mean deviation and its coefficient from the following data.

10, 70, 50, 53, 20, 95, 55, 42, 60, 48, 80

Solution:

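X                 Absolute deviation from mean |X - 53|
10                43
70                17
50                3
53                0
20                33
95                42
55                2
42                11
60                7
48                5
80                27
$\sum X$ = 583    $\sum |X - \bar{X}|$ = 190

Mean = $\bar{X} = \frac{\sum X}{N} = \frac{583}{11} = 53$

$M.D_{\bar{X}} = \frac{1}{N} \sum |X - \bar{X}| = \frac{1}{11} \times 190 = 17.27$.

Coefficient of MD = $\frac{MD}{\text{Mean}} = \frac{17.27}{53} = 0.33$.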

Example: Compute the mean deviation from the mean for the following data.

Size 2 4 6 8 10 12 14 16

Frequency 2 2 4 5 3 2 1 1

Solution:
x      f      fx     $|x - \bar{x}| = |x - 8|$    $f|x - 8|$
2      2      4      6                            12
4      2      8      4                            8
6      4      24     2                            8
8      5      40     0                            0
10     3      30     2                            6
12     2      24     4                            8
14     1      14     6                            6
16     1      16     8                            8

N = 20        $\sum fx$ = 160                     $\sum f|x - 8|$ = 56

$\bar{X} = \frac{\sum fx}{N} = \frac{160}{20} = 8$

Mean deviation (MD) = $\frac{\sum f|x - \bar{x}|}{N} = \frac{56}{20} = 2.8$.

Example: Calculate the mean deviation (MD) for the following data. Also find coefficient of
MD.

Class interval 0–20 20–40 40–60 60–80 80–100 100–120

Frequency 5 50 84 32 10 6

Solution:

CI         Mid value (x)    f      fx      $|x - \bar{x}| = |x - 51.07|$    $f|x - \bar{x}|$
0–20       10               5      50      41.07                            205.35
20–40      30               50     1500    21.07                            1053.50
40–60      50               84     4200    1.07                             89.88
60–80      70               32     2240    18.93                            605.76
80–100     90               10     900     38.93                            389.30
100–120    110              6      660     58.93                            353.58

                            N = 187    $\sum fx$ = 9550                     $\sum f|x - \bar{x}|$ = 2697.37

$\bar{X} = \frac{\sum fx}{N} = \frac{9550}{187} = 51.07$

MD = $\frac{1}{N} \sum f|x - \bar{x}| = \frac{1}{187} \times 2697.37 = 14.42$

Coefficient of MD = MD/Mean = $\frac{14.42}{51.07} = 0.28$.
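For frequency distributions the same idea applies with each absolute deviation weighted by its frequency. The following Python sketch (the name `mean_deviation_grouped` is hypothetical) takes the values of a discrete series, or the mid-values of a continuous series, together with their frequencies.

```python
def mean_deviation_grouped(mids, freqs):
    """Return (mean, MD about the mean, coefficient of MD) for grouped data."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(mids, freqs)) / n
    md = sum(f * abs(m - mean) for m, f in zip(mids, freqs)) / n
    return mean, md, md / mean


mids = [10, 30, 50, 70, 90, 110]        # mid-values of 0-20, ..., 100-120
freqs = [5, 50, 84, 32, 10, 6]
mean_x, md, coef_md = mean_deviation_grouped(mids, freqs)
print(round(mean_x, 2), round(md, 2), round(coef_md, 2))   # 51.07 14.42 0.28
```

Calling the same function with the discrete series of the previous example, sizes 2, 4, ..., 16 with frequencies 2, 2, 4, 5, 3, 2, 1, 1, gives a mean of 8 and a mean deviation of 2.8.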
Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

4. Calculate the mean deviation and its coefficient for the following data.

2, 5, 3, 6, 3, 4, 4

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
5. Calculate mean deviation for the following series.

x 10 11 12 13 14

f 3 12 18 12 3

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
6. Compute mean deviation from mean, for the following data.

Marks 0–10 10–20 20–30 30–40 40–50 50–60 60–70

No. of students 6 5 8 15 7 6 3

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

Merits and Demerits of Mean Deviation

Merits

1. It is simple to understand and easy to compute.

2. It is rigidly defined.

3. It is based on all items of the series.

4. It is not much affected by the fluctuations of sampling.

5. It is less affected by the extreme items.

6. It is flexible, because it can be calculated from any average.

Demerits

1. It is not a very accurate measure of dispersion.

2. It is not suitable for further mathematical calculation.

3. It is rarely used.

4. Algebraic positive and negative signs are ignored. It is mathematically unsound and
illogical.

5.7 STANDARD DEVIATION

Introduction

The concept of standard deviation was introduced by Karl Pearson in 1893. It is by far the
most important and widely used measure of dispersion. Its significance lies in the fact
that it is free from those defects from which the earlier methods suffer and satisfies most of the
properties of a good measure of dispersion. Standard deviation is also known as root mean
square deviation because it is the square root of the mean of the squared deviations
from the arithmetic mean. Standard deviation is denoted by the small Greek letter σ (read as
sigma).

The standard deviation measures the absolute dispersion, or variability, of a distribution:
the greater the standard deviation, the greater will be the magnitude of the deviations of
the values from their mean. A small standard deviation means a high degree of uniformity
of the observations as well as homogeneity of a series; a large standard deviation means just
the opposite. Thus, if we have two or more comparable series with identical or nearly
identical means, it is the distribution with the smallest standard deviation that has the most
representative mean. Hence standard deviation is extremely useful in judging the
representativeness of the mean.

Computation of Standard Deviation
Standard Deviation-Individual Series
In case of individual observations standard deviation may be computed by applying any
of the following two methods:
1. By taking deviation of the items from the actual mean.
2. By taking deviations of the items from an assumed mean.
Deviations taken from Actual Mean (Direct Method)
When deviations are taken from actual mean the following formula is applied:

$S.D = \sigma = \sqrt{\frac{1}{N} \sum (X - \bar{X})^2}$

Deviations taken from Assumed Mean (Short-cut method)

When the actual mean is in fractions, say 123.674, it would be too cumbersome to
take deviations from it and then obtain squares of these deviations. In such a case either the
mean may be approximated or else the deviations may be taken from an assumed mean and the
necessary adjustment made in the value of the standard deviation. The former method of
approximation is less accurate and, therefore, in such a case deviations are invariably taken
from an assumed mean.

When deviations are taken from assumed mean the following formula is applied:

$\sigma = \sqrt{\frac{\sum d^2}{N} - \left( \frac{\sum d}{N} \right)^2}$, where d = (X - A) and A is the assumed mean.

Example 1: Calculate S.D. from the following data with the help of direct method.
X: 10 11 17 25 7 13 21 10 12 14
Solution:

Here, $\bar{X} = \frac{\sum X}{N} = \frac{140}{10} = 14$ (a whole number)

$S.D = \sigma = \sqrt{\frac{1}{N} \sum (X - \bar{X})^2}$

$= \sqrt{\frac{1}{10} \times 274}$

$= \sqrt{27.4}$

= 5.23.
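The direct method can be reproduced with a short Python sketch. The function name `sd_direct` is illustrative; `statistics.pstdev` from the standard library, which also divides by N, is used as a cross-check.

```python
from math import sqrt
from statistics import pstdev  # population standard deviation (divides by N)


def sd_direct(values):
    """Square root of the mean of squared deviations from the actual mean."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)


x = [10, 11, 17, 25, 7, 13, 21, 10, 12, 14]
sd = sd_direct(x)
print(round(sd, 2))   # 5.23
```

The squared deviations from the mean of 14 add up to 274, so the result is the square root of 27.4.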

Example 2.

Calculate standard deviation with the help of assumed mean for the following data.

240, 260, 290, 245, 255, 288, 272, 263, 277, 251.

Solution. Calculation of Standard Deviation by the assumed mean method:

Let A = 264

$\sigma = \sqrt{\frac{\sum d^2}{N} - \left( \frac{\sum d}{N} \right)^2}$

$\sum d^2 = 2689$, $\sum d = 1$, N = 10

$\sigma = \sqrt{\frac{2689}{10} - \left( \frac{1}{10} \right)^2}$

$= \sqrt{268.9 - 0.01} = 16.398$

Standard Deviation-Discrete Series

For calculating standard deviation in discrete series, any of the following methods may
be applied:

a. Actual mean method.(Direct method)

b. Assumed mean method. (Short-Cut Method)

c. Step deviation method

(a) Actual Mean Method. (Direct method)

When this method is applied, deviations are taken from the actual mean, i.e., we find
$(X - \bar{X})$ and denote these deviations by x. These deviations are then squared and multiplied
by the respective frequencies. The following formula is applied:

$\sigma = \sqrt{\frac{\sum f x^2}{N}}$, where $x = (X - \bar{X})$.

However, in practice this method is rarely used because if the actual mean is in fractions
the calculations take a lot of time.

(b) Assumed Mean Method. (Short-Cut Method)

When this method is used, the following formula is applied:

$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left( \frac{\sum f d}{N} \right)^2}$, where d = (X - A).

(c) Step Deviation Method

When this method is used we take deviations of midpoints from an assumed mean and
divide these deviations by the width of the class interval, i.e., 'i'. In case the class intervals
are unequal, we divide the deviations of midpoints by the lowest common factor and use 'c'
instead of 'i'. The formula for calculating standard deviation is:

$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left( \frac{\sum f d}{N} \right)^2} \times i$

where $d = \frac{(x - A)}{i}$ and i = class interval.

The use of the above formula simplifies calculations.

Example 3:

Use direct method to calculate the S.D. of the following discrete frequency distribution.

Solution: Computation of S.D. (Direct Method)

Here, we first calculate: $\bar{X} = \frac{\sum fx}{N} = \frac{706}{100} = 7.06$

$S.D = \sigma = \sqrt{\frac{1}{N} \sum f (X - \bar{X})^2}$

$= \sqrt{\frac{1}{100} \times 237.64}$

$= \sqrt{2.3764} = 1.54$.

Example 4: Calculate the standard deviation from the data given below.

Size of item 3.5 4.5 5.5 6.5 7.5 8.5 9.5

Frequency 3 7 22 60 85 32 8

Solution: Calculation of standard deviation

Size of item (X)    f      d = (X - 6.5)    fd      $fd^2$
3.5                 3      -3               -9      27
4.5                 7      -2               -14     28
5.5                 22     -1               -22     22
6.5                 60     0                0       0
7.5                 85     +1               +85     85
8.5                 32     +2               +64     128
9.5                 8      +3               +24     72

                    N = 217                 $\sum fd$ = +128    $\sum fd^2$ = 362

$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left( \frac{\sum f d}{N} \right)^2}$

$\sum fd^2 = 362$, $\sum fd = 128$, N = 217

$\sigma = \sqrt{\frac{362}{217} - \left( \frac{128}{217} \right)^2}$

$= \sqrt{1.668 - 0.348} = 1.149$.



Example 5. The annual salaries of employees are given in the following table:

Calculate the standard deviation of the series.

Solution: Calculation of Standard deviation

$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left( \frac{\sum f d}{N} \right)^2} \times i$

Here, $\sum fd^2 = 240$, $\sum fd = 36$, N = 50, i = 5

$\sigma = \sqrt{\frac{240}{50} - \left( \frac{36}{50} \right)^2} \times 5$

$= \sqrt{4.8 - 0.5184} \times 5 = 10.35$

Standard Deviation-Continuous Series

In continuous series any of the methods discussed above for discrete frequency
distribution can be used. However, in practice it is the step deviation method that is most used.
The formula is

$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left( \frac{\sum f d}{N} \right)^2} \times i$

where $d = \frac{(m - A)}{i}$, m = mid-value of the class and i = class interval.

Example 6: Calculate mean and standard deviation of following frequency distribution of
marks:

Solution: Calculation of mean and Standard deviation

$\bar{X} = A + \frac{\sum fd}{N} \times i$

$= 35 + \frac{118}{200} \times 10 = 35 + 5.9 = 40.9$

$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left( \frac{\sum f d}{N} \right)^2} \times i$

$= \sqrt{\frac{510}{200} - \left( \frac{118}{200} \right)^2} \times 10$

$= \sqrt{2.55 - 0.3481} \times 10$

= 1.4839 x 10 = 14.839.

Merits and Demerits of Standard Deviation

Merits

1. It is rigidly defined.

2. Its computation is based on all the observations.

3. It is amenable to further algebraic treatment which makes it the most important and
widely used measure of dispersion.

4. Among all the measures of dispersion, it is least affected by sampling fluctuations.

5. S.D. enables us to determine the reliability of means of two or more series having equal
means. In such a situation, the series having the minimum S.D. will have the most
representative mean.

Demerits

1. S.D. is comparatively difficult to calculate.

2. It gives greater weight to extreme observations.

3. It is an absolute measure of dispersion and cannot be used for comparing the variability
of two or more distributions expressed in different units.

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

7. Calculate the standard deviation for the following data.

10, 13, 17, 22, 27, 30, 31, 32

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
8. Calculate the standard deviation for the following data.

x 7 8 9 10 11 12

f 13 13 18 17 15 14

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

9. Calculate the standard deviation for the following data.

C.I          130–140   140–150   150–160   160–170   170–180   180–190   190–200

Frequency    22        44        66        90        77        51        31

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

5.8 EXERCISE

1. From the following data, find range and quartile deviation. Also determine their respective
coefficients.

78, 80, 80, 82, 82, 84, 84, 86, 86, 88, 88, 90

Ans. Range = 12, QD = 3.5, CR = 0.07, CQD = 0.04

2. Find out the value of quartile deviation for the following data.

Roll No. 1 2 3 4 5 6 7

Marks 20 28 40 12 30 15 50

Ans. QD = 12.5, CQD = 0.455

3. Calculate QD for the following data.

C.I 0–5 5–10 10–15 15–20 20–25 25–30 30–35 35–40

Frequency 4 5 6 10 11 9 4 1

Ans. QD = 6.458

4. Calculate mean deviation from mean for the following data.

5, 10, 15, 20, 25, 30, 35

Ans. MD = 8.5714

5. Compute mean deviation from the following series.

x 10 11 12 13 14

f 3 12 18 12 3

Ans. MD = 0.75

6. Find the mean deviation from the following data.

Size 0–10 10–20 20–30 30–40 40–50 50–60 60–70

Frequency 7 12 18 25 16 14 8

Ans. MD = 13.14

7. Using short-cut method and step-deviation method, obtain standard deviation of the
following data.

420, 440, 400, 420, 470, 480, 440, 480, 450, 500

Ans. 30.33

8. Data recorded on the number of seeds per fruit. Calculate the standard deviation.

6, 7, 9, 11, 12, 14, 17, 20, 25, 30

Ans. SD = 6.75

9. Calculate standard deviation from the following data.

32, 33, 35, 35, 37, 40, 40, 41, 42, 42

Ans. SD = 3.77

10. Calculate SD for the following data.

Size 6 7 8 9 10 11 12

Frequency 3 6 9 13 8 5 4

Ans. SD = 1.6

11. Calculate the standard deviation from the following data.

Age 20–30 30–40 40–50 50–60 60–70 70–80 80–90

No. of members 3 61 132 153 140 51 2

Ans. 11.88

12. Calculate the standard deviation from the following data.

C.I 130–134 135–139 140–144 145–149 150–154 155–159 160–164

Frequency 3 12 21 28 19 12 5

Ans. SD = 7.21

5.9 SUMMARY

An average or central value alone cannot describe a distribution adequately.
Thus, a measure of the scatter of the observations around their average is necessary to get
a better description of the data. The extent or degree to which data tend to spread around an
average is called dispersion. Measures of dispersion may be absolute or relative. Absolute
measures of dispersion are expressed in the units of the given observations. These measures are
useful to compare variation in two or more distributions in which the units of measurement are
the same. Relative measures of dispersion are unitless numbers, useful for comparing the variation
in two or more distributions in which the units of measurement are different.

Absolute measures of dispersion are (1) range (2) inter quartile range and quartile
deviation (3) mean deviation (4) standard deviation

The range is defined as the difference between the two extreme observations; the inter quartile
range is the difference between the third and first quartiles.

Quartile deviation is given by $\mathrm{QD} = \dfrac{Q_3 - Q_1}{2}$.

Mean deviation is defined as the arithmetic mean of the absolute deviations of various
items from an average value such as mean, median or mode.

Standard deviation is defined as the positive square root of the arithmetic mean of the
squares of deviations of observations from the arithmetic mean. The square of standard deviation
is known as variance. The standard deviation is the best measure of dispersion as it satisfies
most of the desirable properties.
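
The absolute measures summarised above can be checked numerically. The following is a minimal Python sketch, not part of the original course material; the function names `quartile` and `dispersion_summary` are illustrative, and the quartiles are assumed to be located at the k(n + 1)/4-th item with linear interpolation, which is one of several conventions in use.

```python
from statistics import mean, pstdev


def quartile(sorted_values, k):
    """k-th quartile, taken as the k(n + 1)/4-th item with linear interpolation."""
    n = len(sorted_values)
    pos = k * (n + 1) / 4                 # 1-based position of the quartile
    lower = min(max(int(pos), 1), n)      # item just below (clamped to the data)
    upper = min(lower + 1, n)             # item just above
    frac = pos - int(pos)
    low_val, up_val = sorted_values[lower - 1], sorted_values[upper - 1]
    return low_val + frac * (up_val - low_val)


def dispersion_summary(values):
    """Range, quartile deviation, mean deviation about the mean and SD."""
    data = sorted(values)
    q1, q3 = quartile(data, 1), quartile(data, 3)
    centre = mean(data)
    return {
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "qd": (q3 - q1) / 2,
        "coeff_qd": (q3 - q1) / (q3 + q1),
        "md_mean": mean([abs(x - centre) for x in data]),
        "sd": pstdev(data),               # divisor N, as in the text
    }


if __name__ == "__main__":
    marks = [20, 28, 40, 12, 30, 15, 50]  # data of Exercise 2 above
    print(dispersion_summary(marks))
```

Under the stated quartile convention, the marks of Exercise 2 give Q1 = 15 and Q3 = 40, which agrees with the printed answer QD = 12.5.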

5.10 CHECK YOUR PROGRESS - MODEL ANSWERS

1. Range = L − S = 15 − 2 = 13

QD = 3.5

Coefficient of range = $\dfrac{L - S}{L + S} = \dfrac{15 - 2}{15 + 2} = \dfrac{13}{17} = 0.76$.

Coefficient of QD = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{11 - 4}{11 + 4} = \dfrac{7}{15} = 0.467$.

2. QD = 1.5 kg, coefficient of QD = 0.024

3. QD = $\dfrac{1}{2}(Q_3 - Q_1) = \dfrac{1}{2}(44.64 - 22.19) = 11.23$.

4. MD = 1, coefficient of MD = 0.25

5. MD = 0.75

6. MD = 13.184

7. SD = 8.58

8. SD = 3.15

9. SD = 16.41

5.11 MODEL EXAMINATION QUESTIONS

Section - A (Long Answers)

1. What do you understand by dispersion? Explain the characteristics of an ideal measure of
dispersion.

2. Distinguish between absolute and relative measures of dispersion.

3. Explain quartile deviation and coefficient of quartile deviation. Also mention its merits
and demerits.

4. Define mean deviation and explain its merits and demerits.

5. Explain the properties of standard deviation. Why is it called the best measure of dispersion?

6. Explain the procedure for computing the standard deviation of individual data when deviations
are taken from the actual mean and when they are taken from an assumed mean.

Section - B (Short Answers)

1. Explain with suitable example the term dispersion.

2. Define range and quartile deviation.

3. Define mean deviation and coefficient of mean deviation.

4. Distinguish between measures of central tendency and dispersion.

5. Define standard deviation and state any four of its important characteristics.

UNIT - 6 : MOMENTS, CENTRAL AND NONCENTRAL
MOMENTS, SKEWNESS
Contents

6.0 Objectives

6.1 Introduction

6.2 Moments

6.3 Sheppard’s Correction for Moments

6.4 Measures of Skewness

6.5 Kurtosis

6.6 Exercise

6.7 Summary

6.8 Check Your Progress - Model Answers

6.9 Model Examination Questions

6.0 OBJECTIVES

After studying this unit, you should be able to

• Define and compute central moments ($\mu_r$) and non-central moments ($\mu_r'$) for the
data.

• Understand the use of moments.

6.1 INTRODUCTION

The concept of moments has been introduced in statistics in an attempt to represent the
maximum information about the original data in relatively few quantities. A moment may be defined as the
arithmetic mean of various powers of the deviations of the observations in a distribution from the mean $\bar{X}$.
Accordingly, there can be moments of various orders, viz. first order, second order, third order,
fourth order and so on.

6.2 MOMENTS

The rth raw moment of X about the origin is denoted by $\mu_r'$ and defined as

$$\mu_r' = \frac{1}{N}\sum_i f_i X_i^r, \quad \text{where } N = \sum_i f_i .$$

The first raw moment about the origin, $\mu_1'$, is called the mean of X and is denoted by $\bar{X}$,

$$\text{i.e., } \mu_1' = \bar{X} = \frac{\sum_i f_i X_i}{N} .$$

The rth moment of a variable X about any point A is also denoted by $\mu_r'$ and defined as

$$\mu_r' = \frac{1}{N}\sum_i f_i (X_i - A)^r = \frac{1}{N}\sum_i f_i d_i^r, \quad \text{where } d_i = X_i - A .$$

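The rth moment of a variable X about the mean $\bar{X}$ is called the rth central moment. It is denoted
by $\mu_r$ and given by

$$\mu_r = \frac{1}{N}\sum_i f_i (X_i - \bar{X})^r .$$

In particular,

$$\mu_0 = \frac{1}{N}\sum_i f_i (X_i - \bar{X})^0 = 1, \qquad \mu_1 = \frac{1}{N}\sum_i f_i (X_i - \bar{X}) = 0,$$

since the algebraic sum of deviations from the mean is zero, and

$$\mu_2 = \frac{1}{N}\sum_i f_i (X_i - \bar{X})^2 = \sigma^2 .$$

To express the central moments in terms of the moments about any point A, write

$$\mu_r = \frac{1}{N}\sum_i f_i (X_i - \bar{X})^r = \frac{1}{N}\sum_i f_i (X_i - A + A - \bar{X})^r, \quad \text{where A is a constant,}$$

$$= \frac{1}{N}\sum_i f_i (d_i + A - \bar{X})^r, \quad \text{where } d_i = X_i - A,$$

$$= \frac{1}{N}\sum_i f_i (d_i - \mu_1')^r, \quad \text{since } \bar{X} = A + \mu_1',$$

$$= \frac{1}{N}\sum_i f_i \left[ d_i^r - {}^rC_1\, d_i^{r-1}\mu_1' + {}^rC_2\, d_i^{r-2}\mu_1'^2 - {}^rC_3\, d_i^{r-3}\mu_1'^3 + \dots + (-1)^r \mu_1'^r \right]$$

$$\mu_r = \mu_r' - {}^rC_1\, \mu_{r-1}'\mu_1' + {}^rC_2\, \mu_{r-2}'\mu_1'^2 - \dots + (-1)^r \mu_1'^r \qquad \dots (1)$$

In particular, on putting r = 2, 3 and 4 in (1) and simplifying, we get

$$\mu_2 = \mu_2' - \mu_1'^2$$

$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3$$

$$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4$$

Similarly,

$$\mu_r' = \frac{1}{N}\sum_i f_i (x_i - A)^r = \frac{1}{N}\sum_i f_i (x_i - \bar{x} + \bar{x} - A)^r$$

$$= \frac{1}{N}\sum_i f_i (y_i + \mu_1')^r, \quad \text{where } y_i = x_i - \bar{x} \text{ and } \bar{x} - A = \mu_1',$$

$$= \frac{1}{N}\sum_i f_i \left[ y_i^r + {}^rC_1\, y_i^{r-1}\mu_1' + {}^rC_2\, y_i^{r-2}\mu_1'^2 + \dots + \mu_1'^r \right]$$

$$= \mu_r + {}^rC_1\, \mu_{r-1}\mu_1' + {}^rC_2\, \mu_{r-2}\mu_1'^2 + \dots + \mu_1'^r$$

In particular, taking r = 2, 3, 4 and noting that $\mu_1 = 0$, we get

$$\mu_2' = \mu_2 + \mu_1'^2$$

$$\mu_3' = \mu_3 + 3\mu_2\mu_1' + \mu_1'^3$$

$$\mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2\mu_1'^2 + \mu_1'^4$$

These formulae enable us to find the moments about any arbitrary point, once the mean
and the moments about mean are known.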
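
The two conversions between moments about a point A and moments about the mean can be sketched in Python as follows. This is an illustrative sketch rather than part of the original text: the function names are assumptions, and the binomial sums are written with `math.comb` (Python 3.8 or later) using the convention that the moment of order zero equals 1.

```python
from math import comb


def central_from_raw(raw):
    """Central moments mu_1..mu_k from moments mu'_1..mu'_k about a point A.

    Implements mu_r = sum_j C(r, j) * mu'_(r-j) * (-mu'_1)**j with mu'_0 = 1.
    """
    raw_full = [1.0] + list(raw)
    m1 = raw_full[1]
    return [
        sum(comb(r, j) * raw_full[r - j] * (-m1) ** j for j in range(r + 1))
        for r in range(1, len(raw_full))
    ]


def raw_from_central(central, shift):
    """Moments about a point A from central moments, where shift = mean - A.

    Implements mu'_r = sum_j C(r, j) * mu_(r-j) * shift**j with mu_0 = 1, mu_1 = 0.
    """
    cen_full = [1.0] + list(central)
    return [
        sum(comb(r, j) * cen_full[r - j] * shift ** j for j in range(r + 1))
        for r in range(1, len(cen_full))
    ]


if __name__ == "__main__":
    # Moments about A = 4 used in a worked example later in this unit.
    central = central_from_raw([-1.5, 17, -30, 108])
    print(central)
    # Moments about the origin: shift = mean - 0 = 4 + (-1.5) = 2.5
    print(raw_from_central(central, 2.5))
```

For r = 2, 3 and 4 the sums reduce to the three particular formulae in each direction given above.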

6.2.2 Effect of Change of Origin and Scale on Moments

Theorem

Central moments are independent of change of origin but not of scale.

Proof: Let $u_i = \dfrac{x_i - A}{h}$, so that $x_i = A + h u_i$. Then

$$\bar{x} = A + h\bar{u} \quad \text{and} \quad x_i - \bar{x} = h(u_i - \bar{u}) .$$

The rth moment of X about the mean is

$$\mu_r = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^r = \frac{1}{N}\sum_i f_i \left[ h(u_i - \bar{u}) \right]^r = h^r \cdot \frac{1}{N}\sum_i f_i (u_i - \bar{u})^r$$

$$= h^r \, [\text{rth central moment of the variable } u]$$

Thus, the rth central moment of the variable x is $h^r$ times the rth central moment of the variable u.
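
The theorem can be verified numerically. The sketch below is only an illustration under stated assumptions: it uses the small frequency table (x = 11, 13, 14, 15, 16) from a later worked example of this unit, and the origin A = 10 and scale h = 2 are arbitrary values chosen for the check, not values taken from the text.

```python
def central_moment(values, freqs, r):
    """rth central moment of a discrete frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n


def step_deviations(values, origin, scale):
    """Transformed values u_i = (x_i - A) / h."""
    return [(x - origin) / scale for x in values]


x = [11, 13, 14, 15, 16]
f = [1, 2, 3, 3, 1]
A, h = 10, 2                     # hypothetical origin and scale
u = step_deviations(x, A, h)

if __name__ == "__main__":
    for r in (2, 3, 4):
        # The two printed numbers should agree: mu_r(x) = h**r * mu_r(u)
        print(r, central_moment(x, f, r), h ** r * central_moment(u, f, r))
```

Changing A leaves both columns unchanged, while changing h rescales the moments of u by the factor $h^r$, which is what the proof shows.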

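6.2.3 Theorem: Non-central moments are independent of change of origin but not scale.

Proof: Let $u_i = \dfrac{x_i - A}{h}$, so that $x_i - A = h u_i$. Then

$$\bar{x} = A + h\bar{u} \quad \text{and} \quad x_i - \bar{x} = h(u_i - \bar{u}) .$$

Now,

$$\mu_r' = \frac{1}{N}\sum_i f_i (x_i - A)^r = \frac{1}{N}\sum_i f_i (h u_i)^r = h^r \cdot \frac{1}{N}\sum_i f_i u_i^r$$

i.e. the rth non-central moment of the variable X (taken about the point A) is $h^r$ times the rth
non-central moment of the variable u (taken about the origin).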

6.3 SHEPPARD’S CORRECTION FOR MOMENTS

In the case of a grouped frequency distribution, while calculating moments we assume that
the frequencies are concentrated at the middle point of the class interval. If the distribution is
symmetrical or slightly asymmetrical and the class intervals are not greater than one-twentieth of
the range, this assumption is very nearly true. But since the assumption is not in general true,
some error, called the ‘grouping error’, creeps into the calculation of moments. Sheppard’s
correction is used to adjust for this grouping error. Sheppard proved that if

(i) the frequency distribution is continuous and

(ii) the frequencies taper off to zero in both directions,

the effect due to grouping at the midpoints of the intervals can be corrected by the following formulae.

$$\mu_2 \text{ (corrected)} = \mu_2 - \frac{i^2}{12}$$

$$\mu_3 \text{ (corrected)} = \mu_3$$

$$\mu_4 \text{ (corrected)} = \mu_4 - \frac{i^2}{2}\mu_2 + \frac{7}{240} i^4$$

where ‘i’ is the width of the class interval.

Note: The first and third moments require no correction, as positive and negative deviations
themselves compensate the grouping error.
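
As a small illustration, the corrections can be written as a Python function. This is a sketch only; the function name and argument names are assumptions introduced here, and `width` stands for the class width i.

```python
def sheppard_corrected(mu2, mu3, mu4, width):
    """Apply Sheppard's corrections to the second, third and fourth central moments.

    width is the common class width i of the grouped frequency distribution.
    """
    mu2_c = mu2 - width ** 2 / 12
    mu3_c = mu3                                   # third moment needs no correction
    mu4_c = mu4 - (width ** 2 / 2) * mu2 + (7 / 240) * width ** 4
    return mu2_c, mu3_c, mu4_c


if __name__ == "__main__":
    # Uncorrected moments and class width here are made-up numbers for illustration.
    print(sheppard_corrected(12.0, 0.0, 400.0, 6))
```

The corrections are meaningful only when conditions (i) and (ii) above hold, so the function should not be applied to J-shaped or U-shaped distributions.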

Skewness

Skewness means ‘lack of symmetry’. We study skewness to have an idea about the
shape of the curve which we can draw with the help of the given data. A distribution is said to
be skewed if

(i) Mean, median and mode fall at different points, i.e., Mean ≠ Median ≠ Mode

(ii) Quartiles are not equidistant from median and

(iii) The curve drawn with the help of the given data is not symmetrical but stretched more
to one side than to the other.

6.4 MEASURES OF SKEWNESS

Skewness can be measured in absolute terms by finding the difference between the
mean and the mode, or between the mean and the median,

i.e. Skewness = Mean − Mode

or Skewness = Mean − Median

Any positive value obtained from either of the above formulae indicates the extent of
positive skewness, and any negative value indicates the extent of
negative skewness. If the result is zero, then Mean = Median = Mode.

Hence, for a symmetrical distribution mean, median and mode coincide.

Relative Skewness:

As in dispersion, for comparing two series we do not calculate these absolute measures
but we calculate the relative measures called the coefficients of skewness which are pure
numbers independent of units of measurement. The following are the measures of skewness.

(i) Karl Pearson’s coefficient of skewness

(ii) Bowley’s coefficient of skewness

6.4.1 (i) Karl Pearson’s coefficient of skewness

According to Karl Pearson, the skewness of a series should be measured by the difference between the mean
and the mode, because the mean is the average most affected by the
extreme values of a series and the mode is the average least affected by them. Thus

$S_K(P)$ = Mean − Mode

If the mode is ill-defined, i.e., when it has more than one value, Karl Pearson proposed to
find the skewness by the following formula.

$S_K(P)$ = 3(Mean − Median)

This formula is based on the empirical relationship between mean, median and mode,
which is given by

Mode = 3 Median − 2 Mean

so that

$S_K(P)$ = Mean − [3 Median − 2 Mean]

= Mean − 3 Median + 2 Mean

= 3 Mean − 3 Median

= 3(Mean − Median)

For a relative measure of skewness, Pearson suggested that the standard deviation should
be taken as the divisor of the absolute skewness. This is because the standard deviation
possesses algebraic properties and is regarded as the best measure of dispersion. Karl Pearson’s
coefficient of skewness is given by

$$\text{Coefficient of skewness} = S_K(P) = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$

When the mode is ill-defined, this formula is modified as below.

$$\text{Coefficient of skewness} = S_K(P) = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} = \frac{3(M - M_d)}{\sigma}$$

6.4.2 Remark

Limits for Karl Pearson’s coefficient of skewness:

The limits of $S_K(P) = \dfrac{3(M - M_d)}{\sigma}$ are $\pm 3$. However, in practice these limits are rarely attained.

Example: Calculate skewness and its coefficient from the following data using Karl Pearson’s
formula.

Wages (Rs.) 10 11 12 13 14 15 16

No. of workers 4 7 9 15 8 5 2

Solution:

Wages (x)   No. of workers (f)   x²     fx     fx²

10          4                    100    40     400
11          7                    121    77     847
12          9                    144    108    1296
13          15                   169    195    2535
14          8                    196    112    1568
15          5                    225    75     1125
16          2                    256    32     512

Total       N = Σf = 50          Σx² = 1211    Σfx = 639    Σfx² = 8283

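Mean = $\bar{X} = \dfrac{\sum fx}{\sum f} = \dfrac{639}{50} = 12.78$

Standard deviation $\sigma = \sqrt{\dfrac{\sum fx^2}{N} - \left(\dfrac{\sum fx}{N}\right)^2}$

$= \sqrt{\dfrac{8283}{50} - \left(\dfrac{639}{50}\right)^2}$

$= \sqrt{165.66 - 163.33}$

$= \sqrt{2.33}$

= 1.527

The highest frequency is 15.

Therefore, Mode = 13.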

Skewness = Mean - Mode = 12.78 - 13 = -0.22

The negative value indicates that the data are negatively skewed.

Karl Pearson’s coefficient of skewness is given by

$$S_K(P) = \frac{\text{Mean} - \text{Mode}}{\sigma} = \frac{12.78 - 13}{1.527} = -0.144 .$$

A negative coefficient indicates that the tail on the left side of the distribution is longer or
fatter than the tail on the right side.
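
The calculation in this example can be reproduced with a short Python sketch. It is an illustration rather than the method prescribed by the text: the function name is an assumption, and the mode is taken simply as the value with the highest frequency, which presumes a single, well-defined mode.

```python
from math import sqrt


def pearson_skewness(values, freqs):
    """Mean, SD, mode and Karl Pearson's coefficient (Mean - Mode) / SD.

    Assumes a discrete frequency distribution with a unique mode.
    """
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    variance = sum(f * x * x for x, f in zip(values, freqs)) / n - mean ** 2
    sd = sqrt(variance)
    mode = values[freqs.index(max(freqs))]    # value carrying the highest frequency
    return mean, sd, mode, (mean - mode) / sd


if __name__ == "__main__":
    wages = [10, 11, 12, 13, 14, 15, 16]
    workers = [4, 7, 9, 15, 8, 5, 2]
    print(pearson_skewness(wages, workers))
```

When the mode is ill-defined, the last line of the function would be replaced by 3 × (mean − median) / sd, with the median computed separately.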

6.4.3 Bowley’s Coefficient of Skewness

Bowley suggested a measure of skewness based on the median and the two
quartiles. Bowley’s coefficient of skewness is also known as the quartile coefficient of skewness.
It is used when the mode is ill-defined and extreme observations are present in the data, and
also when the distribution has open-end classes or unequal class intervals. In these situations
Pearson’s coefficient of skewness cannot be used.

Bowley’s coefficient of skewness is given by

$$S_K(B) = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$

where $Q_1$ and $Q_3$ are the first and third quartiles. For a symmetrical distribution the quartiles
are equidistant from the median, as is clear from the diagram.

[Diagram: $Q_1$, the median and $Q_3$ marked at the points A, B and C]

In the diagram AB = BC, i.e.

$Q_3$ − Median = Median − $Q_1$

$Q_3 + Q_1$ − 2 Median = 0

so that $S_K(B) = 0$ for a symmetrical distribution.

Example: Find the coefficient of skewness from the quartiles and median.

Size 4–8 8–12 12–16 16–20 20–24 24–28 28–32 32–36 36–40

f 6 10 18 30 15 12 10 6 2

Solution:

Size Frequency cf

4–8 6 6
8–12 10 16
12–16 18 34
16–20 30 64
20–24 15 79
24–28 12 91
28–32 10 101
32–36 6 107
36–40 2 109

N = 109

To find $Q_1$:

Size of the $\dfrac{N}{4}$th item $= \dfrac{109}{4} = 27.25$, which falls in the c.f. 34. Thus, the class corresponding
to $Q_1$ is 12–16.

Using the formula $Q_1 = l_1 + \left(\dfrac{N}{4} - c\right)\dfrac{h}{f}$, with

$l_1 = 12,\ \dfrac{N}{4} = 27.25,\ c = 16,\ h = l_2 - l_1 = 4,\ f = 18$,

$$Q_1 = 12 + (27.25 - 16)\frac{4}{18} = 14.5$$

To find the median:

$\dfrac{N}{2} = \dfrac{109}{2} = 54.5$th item, which falls in the c.f. 64.

Thus, the median class is 16–20.

Median $= l_1 + \left(\dfrac{N}{2} - c\right)\dfrac{h}{f}$

$= 16 + (54.5 - 34)\dfrac{4}{30}$

= 16 + (20.5) × (0.133)

= 16 + 2.73

= 18.73.

To find $Q_3$:

$\dfrac{3N}{4} = \dfrac{3 \times 109}{4}$th item = 81.75th item

It lies in the c.f. 91. Thus, the class interval for $Q_3$ is 24–28.

$$Q_3 = l_1 + \left(\frac{3N}{4} - c\right)\frac{h}{f} = 24 + (81.75 - 79)\frac{4}{12} = 24.92 .$$

Bowley’s coefficient of skewness

$$S_K(B) = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} = \frac{24.92 + 14.5 - 2(18.73)}{24.92 - 14.5} = \frac{39.42 - 37.46}{10.42} = \frac{1.96}{10.42} = 0.188$$

Bowley’s coefficient of skewness is 0.188.

This gives an idea about the concentration of higher or lower data values around the
central value of the data. $S_K(B) = 0.188$ is positive, which indicates that the data are asymmetric, with a slight positive skew.
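
The quartile and median formulae used in this example share one pattern, l + (pN − c)h/f with p = 1/4, 1/2 and 3/4. The following Python sketch is offered only as an illustration of that pattern; the function names are assumptions, and equal class widths are assumed.

```python
def grouped_quantile(lower_limits, width, freqs, p):
    """Value l + (p*N - c) * h / f for a grouped distribution with equal class widths.

    l is the lower limit of the class containing the (p*N)th item,
    c the cumulative frequency before that class and f its frequency.
    """
    n = sum(freqs)
    target = p * n
    cumulative = 0
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) * width / f
        cumulative += f
    raise ValueError("p must lie between 0 and 1")


def bowley_skewness(lower_limits, width, freqs):
    """Bowley's coefficient (Q3 + Q1 - 2 Median) / (Q3 - Q1)."""
    q1 = grouped_quantile(lower_limits, width, freqs, 0.25)
    med = grouped_quantile(lower_limits, width, freqs, 0.50)
    q3 = grouped_quantile(lower_limits, width, freqs, 0.75)
    return (q3 + q1 - 2 * med) / (q3 - q1)


if __name__ == "__main__":
    lowers = [4, 8, 12, 16, 20, 24, 28, 32, 36]
    freqs = [6, 10, 18, 30, 15, 12, 10, 6, 2]
    print(bowley_skewness(lowers, 4, freqs))
```

Because the sketch does not round the quartiles and the median before combining them, its result (about 0.187) differs slightly in the third decimal place from the hand calculation, which uses rounded intermediate values.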

6.4.4 Measure of Skewness Based on Moments

The measure of skewness can also be defined by using moments. It is obtained by
making use of the third moment about the mean. The relative measure
of skewness is defined as

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$

For a perfectly symmetrical distribution the value of $\beta_1$ will be zero. The greater the
value of $\beta_1$, the more skewed is the distribution.

It is to be noted that this measure of skewness can never give a negative value, because
although the value of $\mu_3$ may be positive or negative, $\mu_3^2$ is never negative, while the value of
$\mu_2$ (variance) is always positive. As such, the direction of skewness, i.e. whether positive or
negative, cannot be ascertained from this measure. In view of this limitation,
Prof. R. A. Fisher introduced another measure of skewness, which is given below:

$$\gamma_1 = \pm\sqrt{\beta_1}$$

where the sign is taken to be that of $\mu_3$.

If $\gamma_1 = 0$ then there is no skewness in the distribution.

If $\gamma_1 > 0$ then there is positive skewness in the distribution.

If $\gamma_1 < 0$ then there is negative skewness in the distribution.

6.5 KURTOSIS
Even if we know the measures of central tendency, dispersion and skewness, we still cannot
form a complete idea about the distribution. In addition to these measures, we should know one
more measure, which Prof. Karl Pearson calls the “convexity of the frequency curve” or kurtosis.
Kurtosis enables us to have an idea about the ‘flatness or peakedness’ of the frequency curve.
It is measured by the coefficient $\beta_2$ or its derivative $\gamma_2$, given by

$$\beta_2 = \frac{\mu_4}{\mu_2^2}, \qquad \gamma_2 = \beta_2 - 3$$

A curve of type A, which is neither flat nor peaked, is called the normal curve or mesokurtic
curve, and for such a curve $\beta_2 = 3$, i.e. $\gamma_2 = 0$. A curve of type B, which is flatter than the normal
curve, is known as platykurtic, and for such a curve $\beta_2 < 3$, i.e. $\gamma_2 < 0$. A curve of type C, which
is more peaked than the normal curve, is called leptokurtic, and for such a curve $\beta_2 > 3$, i.e.
$\gamma_2 > 0$.

Example: The first four moments of a distribution about the value 4 of the variable are −1.5, 17,
−30 and 108. Find the moments about the mean, $\beta_1$ and $\beta_2$. Find also the moments about
(i) the origin and (ii) the point x = 2.

Solution: We are given $A = 4,\ \mu_1' = -1.5,\ \mu_2' = 17,\ \mu_3' = -30$ and $\mu_4' = 108$.

Moments about mean:

$$\mu_2 = \mu_2' - \mu_1'^2 = 17 - (-1.5)^2 = 17 - 2.25 = 14.75$$

$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = -30 - 3(17)(-1.5) + 2(-1.5)^3 = 39.75$$

$$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4$$

$$= 108 - 4(-30)(-1.5) + 6(17)(-1.5)^2 - 3(-1.5)^4$$

= 142.3125

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(39.75)^2}{(14.75)^3} = 0.4924$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{142.3125}{(14.75)^2} = 0.6541$$

Mean $\bar{X} = A + \mu_1' = 4 + (-1.5) = 2.5$

(i) Moments about the origin: we have $\bar{x} = 2.5,\ \mu_2 = 14.75,\ \mu_3 = 39.75$ and $\mu_4 = 142.3125$.

We know that mean $\bar{X} = A + \mu_1'$.

Taking A = 0, $\mu_1'$ = mean $= \bar{x} = 2.5$

$$\mu_2' = \mu_2 + \mu_1'^2 = 14.75 + (2.5)^2 = 14.75 + 6.25 = 21$$

$$\mu_3' = \mu_3 + 3\mu_2\mu_1' + \mu_1'^3 = 39.75 + 3(14.75)(2.5) + (2.5)^3 = 166$$

$$\mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2\mu_1'^2 + \mu_1'^4$$

$$= 142.3125 + 4(39.75)(2.5) + 6(14.75)(2.5)^2 + (2.5)^4 = 1132$$

(ii) Moments about the point x = 2: we have $\bar{X} = A + \mu_1'$ with A = 2, so

$$\mu_1' = \bar{x} - 2 = 2.5 - 2 = 0.5$$

$$\mu_2' = \mu_2 + \mu_1'^2 = 14.75 + 0.25 = 15$$

$$\mu_3' = \mu_3 + 3\mu_2\mu_1' + \mu_1'^3 = 39.75 + 3(14.75)(0.5) + (0.5)^3 = 62$$

$$\mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2\mu_1'^2 + \mu_1'^4$$

$$= 142.3125 + 4(39.75)(0.5) + 6(14.75)(0.5)^2 + (0.5)^4 = 244$$

Example: The first four moments of a distribution about the value 5 are −4, 22, −117 and 560.
Find the corresponding moments about the mean. Also find $\beta_1$ and $\beta_2$.

Solution: We are given $\mu_1' = -4,\ \mu_2' = 22,\ \mu_3' = -117,\ \mu_4' = 560$.

Moments about mean are

$$\mu_2 = \mu_2' - \mu_1'^2 = 22 - (-4)^2 = 6$$

$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = -117 - 3(22)(-4) + 2(-4)^3 = 19$$

$$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4$$

$$= 560 - 4(-117)(-4) + 6(22)(-4)^2 - 3(-4)^4 = 32 .$$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(19)^2}{(6)^3} = 1.67$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{32}{(6)^2} = \frac{32}{36} = 0.89$$

Recall that for a symmetric distribution $\beta_1 = 0$, and for a mesokurtic (normal) curve $\beta_2 = 3$.
Here $\beta_1 = 1.67$, so the data are not symmetric, and $\beta_2 = 0.89 < 3$, so the curve is platykurtic.
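
The interpretation of $\beta_1$, $\beta_2$, $\gamma_1$ and $\gamma_2$ can be sketched in Python as below. The function names and the wording of the labels are assumptions made for illustration; $\gamma_1$ is computed as $\mu_3 / \mu_2^{3/2}$, which has the magnitude $\sqrt{\beta_1}$ and the sign of $\mu_3$.

```python
def shape_coefficients(mu2, mu3, mu4):
    """Return (beta1, beta2, gamma1, gamma2) from the central moments."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    gamma1 = mu3 / mu2 ** 1.5      # sqrt(beta1) carrying the sign of mu3
    gamma2 = beta2 - 3
    return beta1, beta2, gamma1, gamma2


def describe_shape(mu2, mu3, mu4):
    """Verbal description of skewness and kurtosis based on gamma1 and gamma2."""
    _, _, gamma1, gamma2 = shape_coefficients(mu2, mu3, mu4)
    if gamma1 > 0:
        skew = "positively skewed"
    elif gamma1 < 0:
        skew = "negatively skewed"
    else:
        skew = "symmetrical"
    if gamma2 > 0:
        kurt = "leptokurtic"
    elif gamma2 < 0:
        kurt = "platykurtic"
    else:
        kurt = "mesokurtic"
    return skew, kurt


if __name__ == "__main__":
    # Central moments obtained in the example above.
    print(shape_coefficients(6, 19, 32))
    print(describe_shape(6, 19, 32))
```

In practice exact equality with zero is rare for computed values, so a small tolerance would normally be used before calling a distribution symmetrical or mesokurtic.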

Example: Calculate the first four moments about mean for the following data.

x 11 13 14 15 16

f 1 2 3 3 1

Solution:

x     f     fx     d = X − $\bar{X}$ = X − 14     fd     fd²     fd³     fd⁴

11    1     11     −3     −3     9     −27     81
13    2     26     −1     −2     2     −2      2
14    3     42     0      0      0     0       0
15    3     45     1      3      3     3       3
16    1     16     2      2      4     8       16

Total   Σf = 10   Σfx = 140   Σfd = 0   Σfd² = 18   Σfd³ = −18   Σfd⁴ = 102

Mean = $\bar{X} = \dfrac{\sum fx}{N} = \dfrac{140}{10} = 14$

$$\mu_1 = \frac{\sum fd}{N} = 0, \qquad \mu_2 = \frac{\sum fd^2}{N} = \frac{18}{10} = 1.8$$

$$\mu_3 = \frac{\sum fd^3}{N} = \frac{-18}{10} = -1.8, \qquad \mu_4 = \frac{\sum fd^4}{N} = \frac{102}{10} = 10.2$$

Thus, the four central moments are

$\mu_1 = 0,\ \mu_2 = 1.8,\ \mu_3 = -1.8,\ \mu_4 = 10.2$

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

1. The first four moments of a distribution about the value 4 of the variable are as under

$\mu_1' = 1,\ \mu_2' = 4,\ \mu_3' = 10$ and $\mu_4' = 45$

Find out the mean of the distribution and calculate central moments.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

2. From the following data calculate moments about

(i) assumed mean is 25 (ii) actual mean.

Variable: 0-10 10-20 20-30 30-40

frequency: 1 3 4 2

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3. The first four moments of a distribution about x = 2 are 1, 2.5, 5.5 and 16. Calculate the
four moments about the mean $\bar{x}$ and about zero.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

4. Find the first four central moments from the following data. Also find $\beta_1$ and $\beta_2$.

Hours worked per week 30–33 33–36 36–39 39–42 42–45 45–48

No. of officers 4 8 14 36 30 8

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

6.6 EXERCISE
1. The first four central moments of distribution are 0, 2.5, 0.7 and 18.75. Comment on
the Skewness and Kurtosis of the distribution.

[Ans. $\beta_1 = 0.031$, slightly skewed; $\beta_2 = 3$, mesokurtic]

2. The first four moments of a distribution about x = 2 are −2, 12, −20 and 100. Calculate
the four moments about the mean. Also calculate $\beta_2$ and show whether the distribution is
leptokurtic or platykurtic.

[Ans. $\mu_2 = 8,\ \mu_3 = 36,\ \mu_4 = 180,\ \beta_2 = 2.8125$, platykurtic]

3. Calculate the first four moments about the mean from the following data. Also calculate
the values of $\beta_1$ and $\beta_2$.

Marks 0–10 10–20 20–30 30–40 40–50 50–60 60–70

No. of students 5 12 18 40 15 7 3

[Ans. $\mu_2 = 177.39,\ \mu_3 = 47.982,\ \mu_4 = 95009.364,\ \beta_1 = 0.0004,\ \beta_2 = 3.02$]

4. Obtain Karl Pearson’s coefficient of Skewness for the following data.

Class 5–10 10–15 15–20 20–25 25–30 30–35

Frequency 6 8 17 21 15 11

[Ans. Mean = 21.6, Mode = 22, $\sigma = 7.137$, Karl Pearson’s coefficient of skewness =
−0.056]

5. Calculate Bowley’s coefficient of skewness for the following data.

Class 5–10 10–15 15–20 20–25 25–30 30–35 35–40

No. of families 45 26 18 13 12 12 4

[Ans. Bowley’s coefficient of skewness = 0.2851]

6.7 SUMMARY

Moments may be defined as the arithmetic mean of various powers of the deviations of the
observations in a distribution from the mean $\bar{x}$. There can be moments of various orders, viz., first order,
second order, third order, fourth order and so on. Moments up to the fourth order are generally used to
describe the main features of a distribution.

Skewness describes the lack of symmetry in the shape of a distribution. If the
distribution of data is not symmetrical, it is called asymmetrical or skewed. A
distribution is symmetrical if the frequencies are symmetrically distributed about the mean,
i.e., values of the variable equidistant from the mean have equal frequencies. If this property
is not satisfied, the distribution is called asymmetrical or skewed. Skewness can be
measured by Karl Pearson’s coefficient of skewness and Bowley’s coefficient of skewness.
Karl Pearson’s coefficient of skewness is based on the mean, median, mode and standard deviation,
whereas Bowley’s coefficient of skewness is based on the quartiles.

Kurtosis is another measure which gives an idea about flatness of the curve. If a curve
is peaked like a normal curve it is called mesokurtic, if a curve is more peaked than normal
curve it is called leptokurtic and if it is flatter than the normal curve it is called platy kurtic.

6.8 CHECK YOUR PROGRESS - MODEL ANSWERS

1. Mean = 5, $\mu_2 = 3,\ \mu_3 = 0,\ \mu_4 = 26$

2. $\mu_1' = -3,\ \mu_2' = 90,\ \mu_3' = -900,\ \mu_4' = 21000,\ \mu_2 = 81,\ \mu_3 = -144,\ \mu_4 = 14817$

3. The moments about mean are $\mu_1 = 0,\ \mu_2 = 1.5,\ \mu_3 = 0,\ \mu_4 = 6$

The moments about zero are $\mu_1' = 3,\ \mu_2' = 10.5,\ \mu_3' = 40.5,\ \mu_4' = 168$

4. $\mu_2 = 12.95,\ \mu_3 = -29.49,\ \mu_4 = 518.48$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{869.66}{2171.75} = 0.400, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{518.48}{(12.95)^2} = 3.09$$

6.9 MODEL EXAMINATION QUESTIONS

Section - I (Long Answers)

1. Define central and non-central moments. Derive the relation between them.

2. Show that the moments are independent of change of origin but not scale.

3. Explain Sheppard’s correction for moments.

4. What is Skewness? Explain various measures of skewness.

5. Explain measures of Skewness based on moments.

Section - II (Short Answers)

1. Define central and non-central moments.

2. Write a short note on Skewness.

3. Explain Karl Pearson’s coefficient of Skewness.

4. Explain Bowley’s coefficient of Skewness.

5. What is kurtosis? Write a short note on it.

BLOCK - III: ADDITION RULE
In this block, you shall study the basic concepts of probability, addition rule of probability.
In unit 7, you can find the basic terminology used in probability and different approaches to
theory of probability. In unit 8, we introduce addition theorem of probability for two events and
for n events. In unit 9, you will learn some more important theorems and their uses in solving
problems.

The units included in this block are:

Unit - 7: Definition and Basic Concepts

Unit - 8: Theorems of Addition Rule of Probability

Unit - 9: Some More Important Theorems

UNIT - 7: DEFINITION AND BASIC CONCEPTS
Contents

7.0 Objectives

7.1 Introduction

7.2 Basic Terminology Used in Probability

7.3 Fundamental Rules of Counting

7.4 Definitions of Probability

7.5 Exercises

7.6 Summary

7.7 Check Your Progress - Model Answers

7.8 Model Examination Questions

7.0 OBJECTIVES

After studying this unit, you should be able to:

• Know the basic terminology used in probability.

• Define the different approaches to the theory of probability, namely classical (mathematical)
probability, empirical probability and axiomatic probability.

• Appreciate the use of probability theory in our day-to-day life and in decision
making in the face of uncertainty.

7.1 INTRODUCTION

The concept of probability is very commonly used by each of us, knowingly or
unknowingly, in our day-to-day conversation. For example, we come across statements like
“Probably it may rain tonight”, “Probably you are right”, “It is possible that I may not be able to
join you at the tea party”. All these terms (possible, probably, likely, etc.) convey the same sense.
These statements involve an element of uncertainty. A numerical measure of uncertainty is
provided by a very important branch of Mathematics called the ‘Theory of Probability’.

The foundation of the mathematical theory of probability was laid in the 17th century
by two French mathematicians, Blaise Pascal (1623-62) and Pierre de Fermat (1601-65), who succeeded
in obtaining exact probabilities for certain gambling problems involving dice. Since then, there
has been a steady and continuous development in the theory of probability. Starting with games
of chance, probability has become a part of our everyday life. Probability theory is now being
applied in the analysis of social, economic, physical, computer, biological, business and
management sciences.

For understanding the concept of probability theory, the following concepts must be
clearly grasped.

7.2 BASIC TERMINOLOGY USED IN PROBABILITY

In this section we shall explain the various terms which are used in the definition of
probability under different approaches.

7.2.1 Experiment: An activity that yields a result or an outcome is called an experiment.

Random Experiment: If in each trial of an experiment conducted under identical conditions
the outcome is not unique, but may be any one of the possible outcomes, then such an experiment
is called a random experiment.

Example: Tossing a coin, throwing a die, drawing a card from a pack of 52 cards.

7.2.2 Outcome: The result of a random experiment is called an outcome.

7.2.3 Trial: Conducting a random experiment is called a trial.

Example: Tossing of a coin is a trial or random experiment.

7.2.4 Sample Space: The set of all possible outcomes of a random experiment is called the
sample space. It is denoted by S or $\Omega$.

Eg: (1) If a coin is tossed, either a head or a tail appears. The sample space is
S = {H, T}.

(2) If a die is thrown, the possible outcomes are 1, 2, 3, 4, 5 and 6, so the sample
space is S = {1, 2, 3, 4, 5, 6}.

7.2.5 Event: The possible outcomes of a random experiment are called events, i.e. any
element or member of the sample space is an event.

Simple Event or Elementary Event

An event which corresponds to a single possible outcome of an experiment, i.e., an
outcome of a random experiment that cannot be decomposed further into smaller events.

Eg: When a die is thrown, each of the events 1, 2, 3, 4, 5, 6 is an elementary event.

When two coins are tossed, the event “two heads” is an elementary event.

Compound (Composite) Event

An event which is not simple or elementary is called a compound event. Every compound
event can be represented by the union of a set of elementary events.

Eg: When a die is thrown, the event of getting an even number is composite; it consists
of the elementary events 2, 4, 6.

7.2.6 Exhaustive Event:

The total number of all possible outcomes of a random experiment is known as the
exhaustive events.

Example: 1. In rolling a die, there are six exhaustive events: 1, 2, 3, 4, 5 and 6.

2. In tossing a coin, there are two exhaustive events: head and tail.

In tossing an unbiased coin, head and tail are equally likely events.

In throwing an unbiased die, all the six faces are equally likely events.

7.2.9 Favourable Events:

The number of cases favourable to an event in a random experiment is the number of
outcomes which entail the happening of the event.

Example: In throwing a die, the number of cases favourable to getting an even
number is 3, namely 2, 4 and 6.

In drawing a card from a pack of cards, the number of cases favourable to drawing a
diamond is 13 and to drawing a black card is 26.

7.2.10 Independent Events:

Two or more events are said to be independent if the happening or non-happening of
one event is not affected by the happening or non-happening of the other events.

Example: If a coin is tossed twice, the result of the second toss does not depend upon the
result of the first toss. If a die is thrown twice, the result of the first throw does not affect the result
of the second throw.

7.2.11 Sure (or) Certain Event

An event is said to be a sure event if all the possible outcomes of an experiment are
favourable to the event, i.e., if P(E) = 1 then E is called a sure or certain event.

Impossible Event

When none of the outcomes is favourable to the event, it is called an impossible
event, i.e. if P(E) = 0 then E is called an impossible event.

7.3 FUNDAMENTAL RULES OF COUNTING

In computing probabilities of complex events, it is sometimes difficult to count the
number of favourable or exhaustive cases. To make this easier, we discuss a few fundamental
rules of counting.

1. If an event can happen in any one of ‘m’ ways, and another event can occur in
any one of ‘n’ ways, then the number of ways in which both events can happen
together is m × n = mn.

Eg: If two coins are tossed simultaneously, the first coin can land in any one of 2 ways.
For each of these two ways the second coin can land in 2 ways. The two coins can land in
2 × 2 = 4 ways.

i.e. Sample Space = {HH, HT, TH, TT}

2. If an event A can occur in a total of m ways and a different event B can occur in n
ways, then the event A or B can occur in (m + n) ways, provided the two events are
mutually exclusive.

Example: Suppose in a certain class a class representative is to be chosen from 3 female
students and 4 male students. A female representative can be chosen in 3 ways and a male in
4 ways. Therefore, the number of ways a class representative can be chosen will be 3 + 4 = 7
ways.

Factorial: In the following rules we will observe that products of consecutive integers are
involved. We represent such a product by the factorial symbol.

Example: 6 × 5 × 4 × 3 × 2 × 1 can be written as 6! and is referred to as 6 factorial.

In general, for any positive integer n,

$$n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1$$

Permutation

A permutation is an arrangement of all or some of a set of objects.

Let us consider the three letters X, Y and Z. The possible permutations of the three
letters are XYZ, XZY, YXZ, YZX, ZXY, ZYX. Thus, we get six different arrangements
of three letters or objects. We have 3 choices for the first position, 2 for the second,
leaving only 1 for the last position, giving a total of 3 × 2 × 1 = 6 permutations.

In general, the number of permutations of n objects will be $n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1 = n!$.

Example: The number of ways in which five students can be lined up to get on a bus is 5! = 5 × 4 × 3 × 2 × 1 = 120.

Permutations of n Objects taken r at a time

The number of permutations of the three letters a, b and c will be 3! = 6. Now, let us
consider the number of permutations that are possible by taking the 3 letters a, b and c two at a
time. These permutations would be ab, ac, ba, ca, bc, cb, i.e. a total of 3 × 2 = 6 permutations.

In general, n distinct objects taken ‘r’ at a time can be arranged in ${}^nP_r$ ways,

$$\text{i.e. } {}^nP_r = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}$$

Combinations

A combination of n different objects taken r at a time is a selection of r out of the n
objects with no regard to the order of arrangement. The number of such combinations is
denoted by ${}^{n}C_{r}$ and defined as

$${}^{n}C_{r} = \frac{n!}{r!\,(n-r)!}$$

Example: The number of ways of selecting 2 boys out of 5 boys is

$${}^{5}C_{2} = \frac{5!}{2!\,3!} = 10$$
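
The counting formulas above can be checked numerically. The following is a minimal Python sketch, assuming Python 3.8 or later (needed for `math.perm` and `math.comb`); the variable names are illustrative only, and the numbers are those used in the examples of this section.

```python
import math
from itertools import combinations, permutations

# n! : number of arrangements of n distinct objects
six_factorial = math.factorial(6)      # 6 x 5 x 4 x 3 x 2 x 1

# nPr = n! / (n - r)! : ordered arrangements of r objects out of n
p_3_2 = math.perm(3, 2)

# nCr = n! / (r! (n - r)!) : unordered selections of r objects out of n
c_5_2 = math.comb(5, 2)

# Listing the arrangements and selections explicitly gives the same counts.
arrangements = ["".join(p) for p in permutations("abc", 2)]
selections = list(combinations(range(5), 2))

print(six_factorial, p_3_2, c_5_2)     # 720 6 10
print(arrangements)
print(len(selections))
```

Listing the arrangements is practical only for small n, since n! grows very quickly; the formulas give the counts without enumeration.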

7.4 DEFINITIONS OF PROBABILITY

We will discuss the following definitions of probability

1. Mathematical or classical or a priori probability.

2. Statistical or empirical probability

3. Axiomatic approach to probability.

7.4.1 Mathematical or Classical or A Priori Probability

If a random experiment has n exhaustive, mutually exclusive and equally likely
outcomes, out of which m are favourable to the occurrence of an event E, then the probability
of occurrence of the event E is denoted by P(E) and defined as

$$P(E) = \frac{\text{Number of favourable cases}}{\text{Total number of exhaustive cases}} = \frac{m}{n}$$

and the probability that the event E does not occur is

$$P(\bar{E}) = \frac{\text{Number of cases unfavourable to the event } E}{\text{Total number of exhaustive cases}} = \frac{n-m}{n} = 1 - \frac{m}{n}$$

$$P(\bar{E}) = 1 - P(E)$$

$$P(E) + P(\bar{E}) = 1$$

Remark: Since m is the number of favourable cases, clearly $0 \le m \le n$ and $n > 0$, so

$$0 \le \frac{m}{n} \le 1$$

$$\therefore\ 0 \le P(E) \le 1.$$

Example 1: An unbiased die is thrown once. Find the probability of getting an even number.

Solution: Sample space = {1, 2, 3, 4, 5, 6}

∴ Exhaustive number of cases (n) = 6

The cases favourable to getting an even number are 2, 4, 6, i.e. m = 3.

Let E be the event of getting an even number. Then

$$P(E) = \frac{m}{n} = \frac{3}{6} = \frac{1}{2}.$$

Example 2: What is the probability of getting exactly one head if two coins are tossed simultaneously?

Solution: Sample space = {(H H), (H T), (T H), (T T)}

∴ Exhaustive number of cases (n) = 4

The cases favourable to getting exactly one head are (H T), (T H), i.e. m = 2.

Let E be the event of getting exactly one head. Then

$$P(E) = \frac{m}{n} = \frac{2}{4} = \frac{1}{2}.$$
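
The classical definition translates directly into counting. The sketch below is a minimal Python illustration, assuming equally likely outcomes; the helper name `classical_probability` is hypothetical, and Example 2 is read here as "exactly one head", which is what its favourable cases (H T) and (T H) describe.

```python
from fractions import Fraction
from itertools import product

def classical_probability(sample_space, is_favourable):
    """Return m/n, where n counts all outcomes and m the favourable ones."""
    outcomes = list(sample_space)
    m = sum(1 for outcome in outcomes if is_favourable(outcome))
    return Fraction(m, len(outcomes))

# Example 1: one throw of an unbiased die, event "even number".
die = range(1, 7)
p_even = classical_probability(die, lambda x: x % 2 == 0)

# Example 2: two coins tossed together, event "exactly one head".
two_coins = product("HT", repeat=2)
p_one_head = classical_probability(two_coins, lambda toss: toss.count("H") == 1)

print(p_even, p_one_head)   # 1/2 1/2
print(1 - p_even)           # probability that E does not occur: 1/2
```

Using `Fraction` keeps the answers as exact ratios m/n rather than rounded decimals.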

7.4.2 Statistical or Empirical Probability

In the classical definition of probability, n is finite and all cases are equally likely. These
are very restrictive conditions and, as such, cannot cover all situations. To overcome these
limitations, the statistical or empirical definition of probability is useful.

Definition: Let a random experiment be conducted repeatedly under homogeneous and identical
conditions. Then the limiting value of the ratio of the number of times the event occurs to the
number of trials, as the number of trials becomes indefinitely large, is called the probability of
happening of the event, it being assumed that the limit is finite and unique.

Symbolically, if in N trials an event E happens M times, then the probability of happening
of the event E, denoted by P(E), is defined as

$$P(E) = \lim_{N \to \infty} \frac{M}{N}$$

where M/N is called the relative frequency or frequency ratio of the event E connected with
the random experiment.

The statistical definition of probability removes the limitations of the mathematical
definition. Its only limitation is that it is difficult to prove the existence
of a limit to the relative frequency.

Example: Consider a coin tossing experiment and let E be the event that a throw results in a
head. If the coin is tossed 10 times resulting in 6 heads and 4 tails, the relative frequency of
heads is 6/10 = 0.6. However, if the experiment is carried out a very large number of times, we
expect the relative frequency of heads to become stable and tend towards 0.50. This
indicates that, though the result of an individual experiment is unpredictable, the average
results of a long sequence of random experiments show a very striking regularity and are
somewhat predictable.
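
The stabilising of the relative frequency can be observed with a simulated coin. The following Python sketch is only an illustration: it assumes a pseudo-random generator is an acceptable stand-in for a fair coin, and the function name and the seed value are arbitrary choices, not part of the text.

```python
import random

def relative_frequency_of_heads(num_tosses, seed=0):
    """Toss a simulated fair coin num_tosses times and return M/N."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    heads = sum(1 for _ in range(num_tosses) if rng.random() < 0.5)
    return heads / num_tosses

# The ratio M/N for increasing numbers of trials N.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

For small N the ratio can be far from 0.5, as in the 6 heads out of 10 tosses above; for large N it is expected to settle close to 0.5, although a simulation can only suggest the limit and cannot prove that it exists.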

7.4.3 Axiomatic Approach to Probability

The axiomatic approach to probability, which closely relates the theory of probability
with the modern metric theory of functions and also set theory, was proposed by Kolmogorov,
a Russian mathematician in 1933. The axiomatic definition of probability includes both the
classical and the statistical definition as particular cases and overcomes the deficiencies of
each of them.

Definition: P(A) is the probability function defined on a σ-field B of events if the following
axioms hold:

1. For each $A \in B$, P(A) is defined, is real and $P(A) \ge 0$.

2. P(S) = 1, where S is the sure event.

3. If $\{A_n\}$ is any finite or infinite sequence of mutually exclusive events in B, then

$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$$

Now, if the sample space S consists of n equally likely sample points and an event A consists
of m of them, then the probability of the event A will be

$$P(A) = \frac{m}{n} = \frac{\text{Number of sample points in } A}{\text{Number of sample points in } S} = \frac{\text{Number of cases favourable to event } A}{\text{Number of all possible outcomes in the experiment}} = \frac{n(A)}{n(S)}$$

where n(A) = number of distinct elements in A

n(S) = number of distinct elements in S

Example 3: In a simultaneous throw of two dice, find the probability that (i) the total is 6,
(ii) the total on the dice is greater than 8, (iii) the total of the numbers on the dice is any
number from 2 to 12, both inclusive.

Solution: In a throw of two dice, the total number of cases is 6 x 6 = 36.

S = { (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3,
1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2), (5, 3),
(5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6) }

Exhaustive number of cases (n) = 36

(i) The cases favourable to getting a total of 6 are

(1, 5), (2, 4), (3, 3), (4, 2), (5, 1), i.e. m = 5

∴ Probability that the total on the two dice is 6 = 5/36.

(ii) The cases favourable to getting a total of more than 8 are

(3, 6), (4, 5), (4, 6), (5, 4), (5, 5), (5, 6), (6, 3), (6, 4), (6, 5), (6, 6)

i.e. m = 10

∴ Probability that the total on the two dice is greater than 8 = 10/36 = 5/18.

(iii) The probability that the total of the numbers on the dice is any number from 2 to 12 is
one, as the total of the numbers on the two dice certainly ranges from 2 to 12.

∴ The given event is a sure event.

Example 4: Four cards are drawn at random from a pack of 52 cards. Find the probability that

(i) they are two kings and two queens,

(ii) two are black and two are red.

Solution: Four cards can be drawn from a well shuffled pack of 52 cards in ${}^{52}C_{4}$ ways, which
gives the exhaustive number of cases.

(i) Two kings can be drawn in ${}^{4}C_{2}$ ways.

Two queens can be drawn in ${}^{4}C_{2}$ ways.

$$P[\text{two kings and two queens}] = \frac{{}^{4}C_{2} \times {}^{4}C_{2}}{{}^{52}C_{4}}.$$

(ii) There are 26 black cards and 26 red cards in a pack of cards, so two black cards can be
drawn in ${}^{26}C_{2}$ ways and two red cards in ${}^{26}C_{2}$ ways.

$$P[\text{two black and two red}] = \frac{{}^{26}C_{2} \times {}^{26}C_{2}}{{}^{52}C_{4}}.$$
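
The two card probabilities can be evaluated numerically. The sketch below is a minimal Python illustration, assuming Python 3.8 or later for `math.comb`; in part (ii) it counts the ways of choosing 2 of the 26 black cards and 2 of the 26 red cards. The variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

total_hands = comb(52, 4)  # exhaustive number of cases

# (i) two of the 4 kings and two of the 4 queens
p_kings_queens = Fraction(comb(4, 2) * comb(4, 2), total_hands)

# (ii) two of the 26 black cards and two of the 26 red cards
p_black_red = Fraction(comb(26, 2) * comb(26, 2), total_hands)

print(total_hands)      # 270725
print(p_kings_queens)   # 36/270725
print(p_black_red)      # 325/833
```

As decimals these are roughly 0.00013 and 0.39, so a hand of two black and two red cards is quite common, while two kings with two queens is very rare.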

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

1. A coin is successively tossed three times. Find the probability of getting (i) exactly one
head (ii) exactly two heads (iii) exactly one head or exactly two heads.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2. One card is randomly drawn from a pack of 52 cards. Find the probability that (i) the
drawn card is red (ii) the drawn card is red and a king.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

3. A bag contains 10 white and 8 green balls. Two balls are drawn at random from the
bag. Find the probability that (i) both of them are white and (ii) one is white and other
is green.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
4. An integer is chosen at random from the first 50 positive integers. What is the probability
that the integer is divisible by 6 or 8?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

7.5 EXERCISES

1. Define the following with an example.

(a) Sample space (b) Mutually exclusive events

(c) Exhaustive events (d) Equally likely events

2. In an experiment of throwing a coin three times, write down the sample space. How
many points are there in the sample space?

3. State the mathematical definition of probability.

4. There are 6 defective items in a sample of 30 items. Find the probability that an item
chosen at random from the sample is (i) non-defective (ii) defective.

Ans. (i) 4/5 (ii) 1/5

5. What is the probability of getting a tail in a throw of a coin?

Ans. 1/2

6. If a die is thrown, what is the probability of getting the face five?

Ans. 1/6

7. If two dice are thrown simultaneously, what is the probability of getting a total of 6?

Ans. 5/36

8. Two dice are thrown simultaneously. What is the sample space?

9. Give the statistical definition of probability.

10. Explain the axiomatic approach to probability.

7.6 SUMMARY

The probability theory provides an idea of the likelihood of occurrence of different
events resulting from a random experiment in terms of quantitative measures between zero and
one. The probability of an impossible event is zero while that of a sure event is unity.

Probability can be defined in three ways. According to the mathematical definition, the
probability of an event A is

$$P(A) = \frac{m}{n} = \frac{\text{Number of favourable cases of the event } A}{\text{Exhaustive number of cases}}$$

and the probability that the event A does not happen is

$$P(\bar{A}) = 1 - P(A) = 1 - \frac{m}{n} = \frac{n-m}{n}$$

In the statistical or empirical definition, the probability is defined as the limiting value of the
ratio of the number of times the event A happens (M) to the number of trials (N), as the number
of trials becomes infinite,

$$\text{i.e. } P(A) = \lim_{N \to \infty} \frac{M}{N}.$$

According to the axiomatic approach to probability, the probability P of an event A with
regard to a sample space S of an experiment satisfies the following axioms:

(a) $0 \le P(A) \le 1$

(b) $P(S) = 1$

(c) $P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$, where $A_1, A_2, \ldots, A_n$ are mutually exclusive events.

7.7 CHECK YOUR PROGRESS - MODEL ANSWERS

1. In tossing a coin three times, the sample space is

S = {H H H, H H T, H T H, T H H, T T H, T H T, H T T, T T T}

n = 8.

(i) Favourable cases for getting exactly one head are

{H T T, T H T, T T H} and m = 3

$$P(E) = \frac{m}{n} = \frac{3}{8}.$$

(ii) Favourable cases for getting exactly two heads are

{H H T, H T H, T H H}, and m = 3

$$P(E) = \frac{3}{8}$$

(iii) Favourable cases for getting one or two heads are

{H T T, T H T, T T H, H H T, H T H, T H H}, and m = 6

$$P(E) = \frac{6}{8} = \frac{3}{4}.$$

2. In randomly drawing a card from 52 cards we have 52 possible outcomes, so n = 52.

(i) The total number of red cards is 26.

∴ m = 26

$$P(\text{red card}) = \frac{26}{52} = \frac{1}{2}$$

(ii) There are only two cards which are red kings.

∴ m = 2

$$P(\text{red and a king}) = \frac{2}{52} = \frac{1}{26}.$$

3. Total number of balls = 10 + 8 = 18.

Two balls can be drawn out of 18 in ${}^{18}C_{2} = \frac{18 \times 17}{1 \times 2} = 153$ ways.

∴ n = 153.

(i) The number of ways of selecting 2 white balls out of 10 white balls is

$${}^{10}C_{2} = \frac{10 \times 9}{1 \times 2} = 45$$

∴ m = 45

$$P(\text{both balls white}) = \frac{45}{153} = \frac{15}{51} = \frac{5}{17}$$

(ii) One white and one green ball can be drawn in ${}^{10}C_{1} \times {}^{8}C_{1} = 80$ ways.

$$P(\text{one white and one green ball}) = \frac{80}{153}.$$

4. Let A be the event of choosing an integer divisible by 6 from the first 50 positive integers.

Similarly, let B be the event of choosing an integer divisible by 8 from the first 50 positive integers.

Favourable cases of A are {6, 12, 18, 24, 30, 36, 42, 48}, i.e. 8

Favourable cases of B are {8, 16, 24, 32, 40, 48}, i.e. 6

Total number of integers = 50

Favourable cases of A and B, i.e. $A \cap B$ = {24, 48}, i.e. 2

$$P(A) = \frac{8}{50},\quad P(B) = \frac{6}{50},\quad P(A \cap B) = \frac{2}{50}$$

$$P(A \text{ or } B) = P(A \cup B) = \frac{8}{50} + \frac{6}{50} - \frac{2}{50} = \frac{12}{50} = \frac{6}{25}.$$
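
This answer can be cross-checked by listing the integers. The Python sketch below assumes that the question refers to the integers 1 to 50 (the reading used in the favourable cases above); it counts the union directly and also combines the three separate probabilities.

```python
from fractions import Fraction

integers = range(1, 51)  # assumption: the integers 1, 2, ..., 50

A = {k for k in integers if k % 6 == 0}   # divisible by 6
B = {k for k in integers if k % 8 == 0}   # divisible by 8
n = len(integers)

p_a, p_b = Fraction(len(A), n), Fraction(len(B), n)
p_both = Fraction(len(A & B), n)
p_either = Fraction(len(A | B), n)        # union counted directly

print(sorted(A & B))                      # [24, 48]
print(p_either, p_a + p_b - p_both)       # 6/25 6/25
```

Subtracting P(A ∩ B) is what prevents 24 and 48 from being counted twice.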

7.8 MODEL EXAMINATION QUESTIONS

Section - I (Long Answers)

1. Explain the concept of probability with an example.

2. Define the following terms with an example.

i) Sample Space

ii) Mutually Exclusive Events

iii) Equally likely events

iv) Independent and dependent events.

3. Explain mathematical, statistical and axiomatic definitions of probability.

4. In a simultaneous throw of two dice, find the probability that

i) the total is 6,

ii) the total on the dice is greater than 8,

iii) the total of the numbers on the dice is any number from 2 to 12, both inclusive.

Section - II (Short Answer Questions)

1. Define experiment and random experiment.

2. What is impossible event?

3. Four cards are drawn at random from a pack of 52 cards. Find the probability that

i) they are two kings and two queens

ii) Two are black and two are red.

UNIT - 8: ADDITION THEOREM OF PROBABILITY
Contents

8.0 Objectives

8.1 Introduction

8.2 Addition Theorem of Probability

8.3 Exercises

8.4 Summary

8.5 Check Your Progress - Model Answers

8.6 Model Examination Questions

8.0 OBJECTIVES

After studying this unit, you should be able to:

 Understand the addition theorem and its use in solving problems in various diversified
situations.

8.1 INTRODUCTION

The probability of the happening of a single event can easily be found using the definition of
probability. But the definition alone is not convenient for finding the probability of the happening
of at least one of several given events. The addition theorem of probability gives the
probability that one or more of the given events occur.

8.2 ADDITION THEOREM OF PROBABILITY FOR TWO EVENTS

Statement: For any two events A and B defined on the sample space S,

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Proof: Consider the Venn diagram

Fig. 8.2.1

From the Venn diagram in Fig. 8.2.1 we can observe that A and $\bar{A} \cap B$ are disjoint and

$$A \cup B = A \cup (\bar{A} \cap B)$$

$$P(A \cup B) = P[A \cup (\bar{A} \cap B)] = P(A) + P(\bar{A} \cap B)$$

Adding and subtracting $P(A \cap B)$ on the right side,

$$P(A \cup B) = P(A) + [P(\bar{A} \cap B) + P(A \cap B)] - P(A \cap B)$$

$$= P(A) + P(B) - P(A \cap B)$$

since $\bar{A} \cap B$ and $A \cap B$ are disjoint and their union is B, so that $P(\bar{A} \cap B) + P(A \cap B) = P(B)$.

Therefore, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

8.2.1 Extension of Addition Theorem of Probability for n Events

For n events $A_1, A_2, \ldots, A_n$ we have

$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - \sum_{1 \le i < j \le n} P(A_i \cap A_j) + \sum_{1 \le i < j < k \le n} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} P(A_1 \cap A_2 \cap \cdots \cap A_n)$$

Proof: This theorem can be proved by the principle of mathematical induction.

For two events $A_1$ and $A_2$, we have

$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2) \qquad \ldots (1)$$

$$\text{i.e. } P\left(\bigcup_{i=1}^{2} A_i\right) = \sum_{i=1}^{2} P(A_i) + (-1)^{2-1} P(A_1 \cap A_2)$$

Hence the result is true for n = 2.

Let us now suppose that the result is true for n = r (say):

$$P\left(\bigcup_{i=1}^{r} A_i\right) = \sum_{i=1}^{r} P(A_i) - \sum_{1 \le i < j \le r} P(A_i \cap A_j) + \cdots + (-1)^{r-1} P(A_1 \cap A_2 \cap \cdots \cap A_r) \qquad \ldots (2)$$

Now, we have to prove that the result is true for n = r + 1.

$$P\left(\bigcup_{i=1}^{r+1} A_i\right) = P\left[\left(\bigcup_{i=1}^{r} A_i\right) \cup A_{r+1}\right]$$

$$= P\left(\bigcup_{i=1}^{r} A_i\right) + P(A_{r+1}) - P\left[\left(\bigcup_{i=1}^{r} A_i\right) \cap A_{r+1}\right] \qquad \text{(using (1))}$$

$$= \sum_{i=1}^{r} P(A_i) - \sum_{1 \le i < j \le r} P(A_i \cap A_j) + \cdots + (-1)^{r-1} P(A_1 \cap A_2 \cap \cdots \cap A_r) + P(A_{r+1}) - P\left[\bigcup_{i=1}^{r} (A_i \cap A_{r+1})\right] \qquad \text{(using (2))}$$

$$= \sum_{i=1}^{r+1} P(A_i) - \sum_{1 \le i < j \le r} P(A_i \cap A_j) + \cdots + (-1)^{r-1} P(A_1 \cap A_2 \cap \cdots \cap A_r)$$

$$\quad - \left[\sum_{i=1}^{r} P(A_i \cap A_{r+1}) - \sum_{1 \le i < j \le r} P(A_i \cap A_j \cap A_{r+1}) + \cdots + (-1)^{r-1} P(A_1 \cap A_2 \cap \cdots \cap A_r \cap A_{r+1})\right] \qquad \text{(using (2))}$$

$$= \sum_{i=1}^{r+1} P(A_i) - \sum_{1 \le i < j \le r+1} P(A_i \cap A_j) + \cdots + (-1)^{r} P(A_1 \cap A_2 \cap \cdots \cap A_{r+1})$$

∴ The result is true for n = r + 1 events. Hence, by the principle of mathematical
induction, the result is true for all positive integral values of n.
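
The alternating sum in the n-event theorem can be checked on a small sample space. The following Python sketch is a hypothetical illustration (one integer drawn from 1 to 30, with three events chosen only for this check); it assumes equally likely outcomes, and the function name `union_probability` is not from the text.

```python
from fractions import Fraction
from itertools import combinations

def union_probability(events, sample_space):
    """P(A1 u ... u An) by the n-event addition theorem (equally likely outcomes)."""
    n_s = len(sample_space)
    total = Fraction(0)
    for size in range(1, len(events) + 1):
        sign = (-1) ** (size - 1)          # +, -, +, ... as in the theorem
        for group in combinations(events, size):
            common = set.intersection(*group)
            total += sign * Fraction(len(common), n_s)
    return total

# Hypothetical illustration: one integer drawn at random from 1 to 30.
S = set(range(1, 31))
events = [{k for k in S if k % d == 0} for d in (2, 3, 5)]

by_theorem = union_probability(events, S)
by_counting = Fraction(len(set.union(*events)), len(S))
print(by_theorem, by_counting)   # 11/15 11/15
```

The theorem needs one term for every non-empty group of events, that is, 2^n - 1 terms, so counting the union directly is usually simpler when the outcomes can be listed.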

Example 1: A card is drawn at random from a well shuffled pack of 52 cards. Find the
probability of getting an ace or a spade.

Solution: Let A be the event of getting an ace and B the event of getting a spade.

Then A = set of all aces, B = set of all spades and $A \cap B$ = the ace of spades.

n(A) = 4, n(B) = 13 and $n(A \cap B) = 1$ (n(·) denotes the number of elements)

n(S) = 52

$$P(A) = \frac{4}{52},\quad P(B) = \frac{13}{52},\quad P(A \cap B) = \frac{1}{52}$$

∴ P[an ace or a spade] = P(A or B) = $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

$$= \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}.$$

Example 2: A construction company is bidding for two contracts A and B. The probability that
the company will get contract A is 3/5, the probability that it will get contract B is 1/4 and the
probability that it gets both contracts is 1/8. What is the probability that the company will get
contract A or B?

Solution: Let A and B be the respective events of getting the contracts A and B. Then, we are
given that

$$P(A) = \frac{3}{5},\quad P(B) = \frac{1}{4} \text{ and } P(A \cap B) = \frac{1}{8}$$

∴ The required probability that the company will get contract A or B is

$$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{3}{5} + \frac{1}{4} - \frac{1}{8} = \frac{29}{40}.$$

Example 3: A bag contains 30 balls numbered from 1 to 30. One ball is drawn at random. Find
the probability that the number on the drawn ball is a multiple of 4 or 9.

Solution: Let A be the event that the drawn number is a multiple of 4. The cases
favourable to A are {4, 8, 12, 16, 20, 24, 28}, i.e. 7.

Let B be the event that the drawn number is a multiple of 9. The cases
favourable to B are {9, 18, 27}, i.e. 3.

No number from 1 to 30 is a multiple of both 4 and 9, so $A \cap B$ is empty.

Total number of cases = 30

$$P(A) = \frac{7}{30},\quad P(B) = \frac{3}{30},\quad P(A \cap B) = \frac{0}{30} = 0$$

P[number on the drawn ball is a multiple of 4 or 9] = P(A or B)

$$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{7}{30} + \frac{3}{30} - 0 = \frac{10}{30} = \frac{1}{3}.$$

Example 4: Find the probability of getting more than 4 in tossing a die.

Solution: The numbers more than 4 on a die are 5 and 6.

Let A and B be the events of getting 5 and 6 respectively.

$$P(A) = \frac{1}{6},\quad P(B) = \frac{1}{6}$$

A and B are mutually exclusive, so $P(A \cap B) = 0$.

∴ P(A or B) = P(5 or 6) = P(A) + P(B)

$$= \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}.$$

Example 5: Three newspapers A, B and C are published in a certain city. It is estimated from a
survey that, of the adult population, 20% read A, 16% read B, 14% read C, 8% read both
A and B, 5% read both A and C, 4% read both B and C, and 2% read all three. What percentage
read at least one of the papers?

Solution: Let X, Y and Z denote the events that an adult reads newspapers A, B and C
respectively.

We are given

$$P(X) = \frac{20}{100},\quad P(Y) = \frac{16}{100},\quad P(Z) = \frac{14}{100}$$

$$P(X \cap Y) = \frac{8}{100},\quad P(X \cap Z) = \frac{5}{100},\quad P(Y \cap Z) = \frac{4}{100},\quad P(X \cap Y \cap Z) = \frac{2}{100}$$

P[an adult reads at least one of the newspapers A, B and C]

$$= P(X \cup Y \cup Z) = P(X) + P(Y) + P(Z) - P(X \cap Y) - P(Y \cap Z) - P(X \cap Z) + P(X \cap Y \cap Z)$$

$$= \frac{20}{100} + \frac{16}{100} + \frac{14}{100} - \frac{8}{100} - \frac{4}{100} - \frac{5}{100} + \frac{2}{100} = \frac{35}{100}$$

Hence 35% of the adult population reads at least one of the newspapers.

Example 6: If two dice are thrown, what is the probability that the sum is (a) greater than 8,
(b) neither 7 nor 11?

Solution: (a) If S denotes the sum on the two dice, then we want P(S > 8).

The required event can happen in the following mutually exclusive ways:

S = 9, S = 10, S = 11, S = 12

By the addition theorem of probability,

$$P(S > 8) = P(S = 9) + P(S = 10) + P(S = 11) + P(S = 12)$$

If two dice are thrown, the sample space contains $6^2 = 36$ sample points. The favourable
cases are:

S = 9: (3, 6), (6, 3), (4, 5), (5, 4), i.e. 4 sample points.

$$P(S = 9) = \frac{4}{36}$$

S = 10: (4, 6), (6, 4), (5, 5), i.e. 3 sample points.

$$P(S = 10) = \frac{3}{36}$$

S = 11: (5, 6), (6, 5), i.e. 2 sample points.

$$P(S = 11) = \frac{2}{36}$$

S = 12: (6, 6), i.e. only one sample point.

$$P(S = 12) = \frac{1}{36}$$

$$\therefore\ P(S > 8) = \frac{4}{36} + \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = \frac{10}{36} = \frac{5}{18}.$$

(b) Let A be the event of getting a sum of 7 and let B be the event of getting a sum of 11
with a pair of dice.

S = 7: (1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3), i.e. 6 favourable cases.

$$P(A) = P(S = 7) = \frac{6}{36} = \frac{1}{6}$$

S = 11: (5, 6), (6, 5), i.e. 2 favourable cases.

$$P(B) = P(S = 11) = \frac{2}{36} = \frac{1}{18}.$$

∴ Required probability = P[sum neither 7 nor 11]

$$= P(\bar{A} \cap \bar{B}) = 1 - P(A \cup B)$$

$$= 1 - [P(A) + P(B)] \qquad \text{(since A and B are mutually exclusive)}$$

$$= 1 - \frac{1}{6} - \frac{1}{18} = \frac{7}{9}.$$
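
Both parts of Example 6 can be cross-checked by listing all 36 outcomes of the two dice. This is a minimal Python sketch, assuming fair dice; the helper name `prob` is illustrative only.

```python
from fractions import Fraction
from itertools import product

throws = list(product(range(1, 7), repeat=2))  # the 36 equally likely outcomes

def prob(event):
    """Probability of an event given as a condition on the pair (first, second)."""
    favourable = [t for t in throws if event(t)]
    return Fraction(len(favourable), len(throws))

p_sum_gt_8 = prob(lambda t: sum(t) > 8)
p_not_7_or_11 = prob(lambda t: sum(t) not in (7, 11))

print(p_sum_gt_8)      # 5/18
print(p_not_7_or_11)   # 7/9
```

The same helper also gives `prob(lambda t: sum(t) == 7)` as 1/6, matching P(A) in part (b).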

Example 7: Two dice are tossed. Find the probability of getting an even number on the first die
or a total of 8.

Solution: In tossing of two dice, sample space contains 36 sample points.

Let A be the event of getting an even number on the first die and let B be the event that
the sum of the points obtained on the two dice is 8.

These events are represented by the following subsets of S.

$A = \{2, 4, 6\} \times \{1, 2, 3, 4, 5, 6\}$, so $n(A) = 3 \times 6 = 18$

$B = \{(2, 6), (6, 2), (3, 5), (5, 3), (4, 4)\}$, so $n(B) = 5$

$A \cap B = \{(2, 6), (6, 2), (4, 4)\}$, so $n(A \cap B) = 3$

$$P(A) = \frac{n(A)}{n(S)} = \frac{18}{36} = \frac{1}{2},\quad P(B) = \frac{n(B)}{n(S)} = \frac{5}{36} \text{ and } P(A \cap B) = \frac{3}{36} = \frac{1}{12}.$$

Hence, the required probability is given by

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{18}{36} + \frac{5}{36} - \frac{3}{36} = \frac{20}{36} = \frac{5}{9}.$$

Example 8: The probability that a student passes a Physics test is 2/3 and the probability that
he passes both the Physics test and an English test is 14/45. The probability that he passes at
least one test is 4/5. What is the probability that he passes the English test?

Solution: Let A be the event that the student passes the Physics test and let B be the event
that the student passes the English test.

We are given

$$P(A) = \frac{2}{3},\quad P(A \cap B) = \frac{14}{45},\quad P(A \cup B) = \frac{4}{5}$$

and we want P(B).

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

$$\frac{4}{5} = \frac{2}{3} + P(B) - \frac{14}{45}$$

$$P(B) = \frac{4}{5} + \frac{14}{45} - \frac{2}{3} = \frac{36 + 14 - 30}{45} = \frac{20}{45} = \frac{4}{9}.$$

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

1. An integer is chosen at random from the first 50 positive integers. What is the probability
that the integer is divisible by 6 or 8?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2. The probability that a student passes a Physics test is 2/3 and the probability that he
passes both the Physics test and an English test is 14/45. The probability that he passes
at least one test is 4/5. What is the probability that he passes the English test?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

3. A card is drawn from a pack of 52 cards. Find the probability of getting a King or a
heart or a red card.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

8.3 EXERCISE

1. State and prove the addition theorem of probability for two events.

2. State and prove addition theorem of probability for n - events.

3. In a single throw of three dice, find the probability of getting a total of 17 or 18?

Ans. 1/54

4. A card is drawn at random from a well-shuffled pack of cards. Find the probability that
it is a heart or a queen?

Ans. 4/13

5. If the probability of player A winning a game is 1/8 and that of player B winning the
same is 1/4, what is the probability that one of the players will win?

Ans. 3/8

6. Find the probability of getting a total of 7 or 11 when two dice are thrown.

Ans. 2/9

8.4 SUMMARY

The addition theorem is a fundamental rule of probability that simplifies the calculation of
probabilities. It gives the probability that either event A or event B occurs, or both occur.
The event "A or B" is denoted by $A \cup B$, read as "A union B".

The addition theorem is given by

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

If A and B are mutually exclusive events, the addition rule reduces to $P(A \cup B) = P(A) + P(B)$.

8.5 CHECK YOUR PROGRESS - MODEL ANSWERS

1. Let A be the event of choosing an integer divisible by 6 from the first 50 positive integers.

Similarly, let B be the event of choosing an integer divisible by 8 from the first 50 positive integers.

Favourable cases of A are {6, 12, 18, 24, 30, 36, 42, 48}, i.e. 8

Favourable cases of B are {8, 16, 24, 32, 40, 48}, i.e. 6

Favourable cases of $A \cap B$ are {24, 48}, i.e. 2

$$P(A) = \frac{8}{50},\quad P(B) = \frac{6}{50},\quad P(A \cap B) = \frac{2}{50}$$

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{8}{50} + \frac{6}{50} - \frac{2}{50} = \frac{12}{50} = \frac{6}{25}.$$

2. Let A be the event that the student passes the Physics test.

Let B be the event that the student passes the English test.

$$P(A) = \frac{2}{3},\quad P(A \cap B) = \frac{14}{45},\quad P(A \cup B) = \frac{4}{5}$$

and we want P(B).

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

$$\frac{4}{5} = \frac{2}{3} + P(B) - \frac{14}{45}$$

$$P(B) = \frac{4}{5} + \frac{14}{45} - \frac{2}{3} = \frac{36 + 14 - 30}{45} = \frac{20}{45} = \frac{4}{9}.$$

3. Let the events be

A = the card drawn is a king

B = the card drawn is a heart

C = the card drawn is a red card

A, B, C are not mutually exclusive.

$A \cap B$: the card drawn is the king of hearts, so $n(A \cap B) = 1$

$B \cap C = B$: the card drawn is a heart, so $n(B \cap C) = 13$

$C \cap A$: the card drawn is a red king, so $n(C \cap A) = 2$

$A \cap B \cap C = A \cap B$: the card drawn is the king of hearts, so $n(A \cap B \cap C) = 1$

$$P(A) = \frac{4}{52},\quad P(B) = \frac{13}{52},\quad P(C) = \frac{26}{52},\quad P(A \cap B) = \frac{1}{52},\quad P(B \cap C) = \frac{13}{52}$$

$$P(C \cap A) = \frac{2}{52},\quad P(A \cap B \cap C) = \frac{1}{52}$$

The probability of getting a king or a heart or a red card is given by

$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$$

$$= \frac{4}{52} + \frac{13}{52} + \frac{26}{52} - \frac{1}{52} - \frac{13}{52} - \frac{2}{52} + \frac{1}{52} = \frac{28}{52} = \frac{7}{13}.$$
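
The three-event calculation can be verified by building the pack of cards explicitly. The Python sketch below is an illustration only; the rank and suit labels are arbitrary names chosen for the example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "clubs", "hearts", "diamonds"]
deck = set(product(ranks, suits))              # 52 (rank, suit) pairs

A = {c for c in deck if c[0] == "K"}                       # kings
B = {c for c in deck if c[1] == "hearts"}                  # hearts
C = {c for c in deck if c[1] in ("hearts", "diamonds")}    # red cards

def p(event):
    return Fraction(len(event), len(deck))

by_theorem = (p(A) + p(B) + p(C)
              - p(A & B) - p(B & C) - p(C & A)
              + p(A & B & C))
by_counting = p(A | B | C)                     # union counted directly

print(by_theorem, by_counting)   # 7/13 7/13
```

The union consists of the 26 red cards together with the 2 black kings, which is why the count is 28.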

8.6 MODEL EXAMINATION QUESTIONS

Section - I (Long Answers)

1. State and prove addition theorem of probability for two events . Also extend for n
events.

2. If two dice are thrown, what is the probability that the sum is

a) greater than 8 b) neither 7 nor 11?

Section - II (Short Answers)

1. State and prove addition theorem of probability.

2. A card is drawn at random from a well shuffled pack of 52 cards. Find the probability
of getting an ace or a spade.

UNIT - 9: SOME MORE IMPORTANT THEOREMS
Contents

9.0 Objectives

9.1 Introduction

9.2 Theorems of Probability

9.3 Boole’s Inequality

9.4 Exercises

9.5 Summary

9.6 Check Your Progress - Model Answers

9.7 Model Examination Questions

9.0 OBJECTIVES

After completion of this unit, you should be able to:

 learn some important theorems on probability and their use in solving problems.

9.1 INTRODUCTION

In this unit, we shall prove a few simple theorems which help us to evaluate the
probabilities of some complicated events in a rather simple way.

Boole’s inequality is also known as the union bound. It is used when we need to show
that the probability of the union of some events does not exceed a particular value. It is very
simple yet useful.

9.2 THEOREMS OF PROBABILITY

9.2.1 Theorem

Probability of an impossible event $\phi$ is zero, i.e. $P(\phi) = 0$.

Proof: An impossible event contains no sample point. The certain event S and the impossible
event $\phi$ are mutually exclusive, and

$$S \cup \phi = S$$

$$P(S \cup \phi) = P(S)$$

From the axiomatic definition of probability,

$$P(S \cup \phi) = P(S) + P(\phi)$$

$$\Rightarrow P(S) + P(\phi) = P(S)$$

$$\Rightarrow P(\phi) = 0.$$

9.2.2 Theorem

The probability of the complement of an event A is given by $P(\bar{A}) = 1 - P(A)$.

The symbol $\bar{A}$ denotes the complement of A.

Proof: A and $\bar{A}$ are mutually exclusive events and $A \cup \bar{A} = S$, so

$$P(A \cup \bar{A}) = P(S)$$

From axioms 2 and 3 of probability,

$$P(A) + P(\bar{A}) = 1$$

$$\Rightarrow P(\bar{A}) = 1 - P(A).$$

9.2.3 Theorem

For any two events A and B defined on the sample space S,

$$P(\bar{A} \cap B) = P(B) - P(A \cap B).$$

Proof: Consider the Venn diagram

From the Venn diagram, it can be seen that $\bar{A} \cap B$ and $A \cap B$ are disjoint and

$$B = (\bar{A} \cap B) \cup (A \cap B)$$

$$P(B) = P(\bar{A} \cap B) + P(A \cap B)$$

$$\Rightarrow P(\bar{A} \cap B) = P(B) - P(A \cap B)$$

9.3 BOOLE’S INEQUALITY

9.3.1 Statement

For n events $E_1, E_2, \ldots, E_n$ we have

(a) $P\left(\bigcap_{i=1}^{n} E_i\right) \ge \sum_{i=1}^{n} P(E_i) - (n-1)$

(b) $P\left(\bigcup_{i=1}^{n} E_i\right) \le \sum_{i=1}^{n} P(E_i)$

Proof: (a) We can prove this by the principle of mathematical induction.

For two events $E_1$ and $E_2$, we have

$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) \le 1$$

$$\Rightarrow P(E_1 \cap E_2) \ge P(E_1) + P(E_2) - 1$$

$$\text{i.e. } P\left(\bigcap_{i=1}^{2} E_i\right) \ge \sum_{i=1}^{2} P(E_i) - (2-1)$$

Hence the result is true for n = 2.

Let us assume that the result is true for n = k events, so that

$$P\left(\bigcap_{i=1}^{k} E_i\right) \ge \sum_{i=1}^{k} P(E_i) - (k-1)$$

Now, we have to prove that the result is true for n = k + 1 events.

$$P\left(\bigcap_{i=1}^{k+1} E_i\right) = P\left[\left(\bigcap_{i=1}^{k} E_i\right) \cap E_{k+1}\right] \ge P\left(\bigcap_{i=1}^{k} E_i\right) + P(E_{k+1}) - 1$$

$$\ge \sum_{i=1}^{k} P(E_i) - (k-1) + P(E_{k+1}) - 1$$

$$\Rightarrow P\left(\bigcap_{i=1}^{k+1} E_i\right) \ge \sum_{i=1}^{k+1} P(E_i) - [(k+1) - 1]$$

∴ The result is true for n = k + 1.

Hence, by the principle of mathematical induction, the result is true for all positive
integral values of n.

(b) We can prove this result by the principle of mathematical induction.

For two events $E_1$ and $E_2$, we have

$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$$

$$\le P(E_1) + P(E_2), \quad \text{since } P(E_1 \cap E_2) \ge 0$$

$$\Rightarrow P\left(\bigcup_{i=1}^{2} E_i\right) \le \sum_{i=1}^{2} P(E_i)$$

∴ The result is true for n = 2 events.

Let us suppose that the result is true for n = k events:

$$P\left(\bigcup_{i=1}^{k} E_i\right) \le \sum_{i=1}^{k} P(E_i)$$

Now, we have to prove that the result is true for n = k + 1 events.

$$P\left(\bigcup_{i=1}^{k+1} E_i\right) = P\left[\left(\bigcup_{i=1}^{k} E_i\right) \cup E_{k+1}\right]$$

$$\le P\left(\bigcup_{i=1}^{k} E_i\right) + P(E_{k+1})$$

$$\le \sum_{i=1}^{k} P(E_i) + P(E_{k+1})$$

$$\Rightarrow P\left(\bigcup_{i=1}^{k+1} E_i\right) \le \sum_{i=1}^{k+1} P(E_i)$$

∴ The result is true for n = k + 1 events. Hence, by the principle of mathematical
induction, the result is true for all positive integral values of n.
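
Both parts of Boole's inequality can be checked on a small sample space. The Python sketch below is a hypothetical illustration (one integer drawn from 1 to 12, with three events chosen only for this check) and assumes equally likely outcomes.

```python
from fractions import Fraction

S = set(range(1, 13))  # hypothetical: one integer drawn at random from 1 to 12
events = [
    {k for k in S if k >= 3},        # E1: at least 3
    {k for k in S if k <= 10},       # E2: at most 10
    {k for k in S if k % 2 == 0},    # E3: even
]

def p(event):
    return Fraction(len(event), len(S))

sum_p = sum(p(e) for e in events)
p_union = p(set.union(*events))
p_inter = p(set.intersection(*events))

# Part (a): P(intersection) >= sum of P(Ei) - (n - 1)
print(p_inter, ">=", sum_p - (len(events) - 1))   # 1/3 >= 1/6
# Part (b): P(union) <= sum of P(Ei)
print(p_union, "<=", sum_p)                       # 1 <= 13/6
```

Part (b) can give a bound larger than 1, as here, in which case it carries no information; the inequality is most useful when the individual probabilities are small.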

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

1. If $B \subset A$, then prove that

(i) $P(A \cap \bar{B}) = P(A) - P(B)$

(ii) $P(B) \le P(A)$

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

9.4 EXERCISES

1. State and prove Boole’s inequality.

2. What is the probability of a sure event?

Ans. 1

3. Find the probability that the Sun rises in the East?

Ans. 1

9.5 SUMMARY

Some important theorems which have been proved are

(i) Probability of an impossible event is zero.

(ii) $P(\bar{A}) = 1 - P(A)$

(iii) $P(\bar{A} \cap B) = P(B) - P(A \cap B)$

Boole’s inequality is given by

(a) $P\left(\bigcap_{i=1}^{n} E_i\right) \ge \sum_{i=1}^{n} P(E_i) - (n-1)$

(b) $P\left(\bigcup_{i=1}^{n} E_i\right) \le \sum_{i=1}^{n} P(E_i)$
9.6 CHECK YOUR PROGRESS - MODEL ANSWERS

1. (i) When $B \subset A$, B and $A \cap \bar{B}$ are mutually exclusive events such that

$$A = B \cup (A \cap \bar{B})$$

$$P(A) = P[B \cup (A \cap \bar{B})] = P(B) + P(A \cap \bar{B})$$

$$\Rightarrow P(A \cap \bar{B}) = P(A) - P(B)$$

(ii) $P(A \cap \bar{B}) \ge 0 \Rightarrow P(A) - P(B) \ge 0$

$$\Rightarrow P(B) \le P(A)$$

Hence, $P(B) \le P(A)$.

9.7 MODEL EXAMINATION QUESTIONS

Section - I (Long Answers)

1. State and prove Boole’s inequality.

Section - II (Short Answers)

1. For any two events A and B defined on the sample space S, prove that

$P(\bar{A} \cap B) = P(B) - P(A \cap B)$.

2. Show that the probability of the null event is zero.
BLOCK - IV: BAYES' THEOREM
The units included in this block are:

Unit - 10: Conditional Probability

Unit - 11: Independent Events

Unit - 12: Bayes' Theorem
UNIT - 10: CONDITIONAL PROBABILITY
Contents

10.0 Objectives

10.1 Introduction

10.2 Definition

10.3 Multiplication Theorem of Probability

10.4 Exercise

10.5 Summary

10.6 Check Your Progress - Model Answers

10.7 Model Examination Questions

10.0 OBJECTIVES

After studying this unit, you should be able to:

 Understand the definition of conditional probability

 Learn how to use the formula for conditional probability.

 Understand how to use the multiplication rule to find the probability of the intersection
of two events.

10.1 INTRODUCTION

Conditional probability is one of the most fundamental concepts in probability theory. It
measures the probability of an event given that another event has occurred. P(A), the
probability of an event A, represents the likelihood that a random experiment will result in an
outcome in the set A, relative to the sample space S of the random experiment. Often, however,
when evaluating the probability of some event we already have some information stemming
from the experiment. A conditional probability is the probability that a certain event will occur
given such knowledge about the outcome of some other event.
10.2 DEFINITION

The conditional probability of an event A, given that an event B has already occurred,
is denoted by P(A/B) (read as "the conditional probability of A given B") and is defined as

$P(A/B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$

Example 1: A die is rolled once. If the outcome is an odd number, what is the probability that it
is prime?

Solution: In rolling a die, the sample space is S = {1, 2, 3, 4, 5, 6}.

Let A be the event of getting an odd number and B be the event of getting a prime number.

Then A = {1, 3, 5}, B = {2, 3, 5} and $A \cap B = \{3, 5\}$.

$P(A) = \frac{3}{6}, \quad P(B) = \frac{3}{6}, \quad P(A \cap B) = \frac{2}{6}$

$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{2/6}{3/6} = \frac{2}{3}$.

Example 2: Two coins are tossed. What is the conditional probability of getting two heads
given that at least one coin shows a head?

Solution: Sample space S = {HH, HT, TH, TT}

A = event that at least one coin shows a head, so A = {HT, TH, HH}

B = event of getting two heads, so B = {HH}

$A \cap B = \{HH\}$

$P(A) = \frac{3}{4}, \quad P(B) = \frac{1}{4}, \quad P(A \cap B) = \frac{1}{4}$

$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/4}{3/4} = \frac{1}{3}$.
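The counting used in Examples 1 and 2 can also be checked mechanically. The following is a minimal Python sketch, assuming a finite sample space of equally likely outcomes; the helper names `prob` and `cond_prob` are illustrative and are not part of the course text.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """P(event) when all outcomes in the finite sample space are equally likely."""
    return Fraction(len(event & space), len(space))

def cond_prob(event, given, space):
    """P(event / given) = P(event and given) / P(given); needs P(given) > 0."""
    if not (given & space):
        raise ValueError("conditioning event has probability zero")
    return prob(event & given, space) / prob(given, space)

# Example 1: one roll of a die
die = set(range(1, 7))
odd = {1, 3, 5}
prime = {2, 3, 5}
p_prime_given_odd = cond_prob(prime, odd, die)

# Example 2: two coins, outcomes written as strings such as "HT"
coins = {"".join(t) for t in product("HT", repeat=2)}
at_least_one_head = {w for w in coins if "H" in w}
two_heads = {"HH"}
p_two_heads_given_head = cond_prob(two_heads, at_least_one_head, coins)

print(p_prime_given_odd, p_two_heads_given_head)
```

Exact fractions are used instead of floating-point numbers so that the results can be compared directly with the hand calculation.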

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

1. In a class, 40% of the students read Statistics, 25% read Mathematics and 15% read both
Mathematics and Statistics. One student is selected at random. Find the probability:

(i) that he studies Statistics, if it is known that he reads Mathematics.

(ii) that he studies Mathematics, if it is known that he reads Statistics.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

10.3 MULTIPLICATION THEOREM OF PROBABILITY

The probability of the simultaneous occurrence of the events A and B is given by the
product of the unconditional probability of the event A and the conditional probability of B
given that A has happened.

10.3.1 Theorem

For two events A and B,

$P(A \cap B) = P(A) \cdot P(B/A), \quad P(A) > 0$

$= P(B) \cdot P(A/B), \quad P(B) > 0$

where P(B/A) represents the conditional probability of occurrence of B when the event
A has already happened and P(A/B) is the conditional probability of occurrence of A given
that B has already happened.

Proof: We have

$P(A) = \frac{n(A)}{n(S)}; \quad P(B) = \frac{n(B)}{n(S)} \quad \text{and} \quad P(A \cap B) = \frac{n(A \cap B)}{n(S)}$ ...(*)

For the conditional event A/B, the favourable outcomes must be among the sample points
of B, i.e. for the event A/B, the sample space is B and out of the n(B) sample points,
$n(A \cap B)$ pertain to the occurrence of the event A.

Hence, $P(A/B) = \frac{n(A \cap B)}{n(B)}$

Rewriting (*), we get

$P(A \cap B) = \frac{n(B)}{n(S)} \times \frac{n(A \cap B)}{n(B)} = P(B) \cdot P(A/B)$

Similarly, we get from (*)

$P(A \cap B) = \frac{n(A)}{n(S)} \times \frac{n(A \cap B)}{n(A)} = P(A) \cdot P(B/A)$

Thus, the probability of the simultaneous occurrence of two events A and B is equal to
the product of the probability of one of these events and the conditional probability of the other,
given that the first one has occurred.
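The counting argument of the proof can be replayed on a small case. The sketch below assumes a hypothetical bag of 5 white and 3 black balls from which two balls are drawn without replacement; the variable and function names are illustrative. It counts P(A ∩ B) directly and compares it with the product P(A) . P(B/A).

```python
from fractions import Fraction
from itertools import permutations

# Assumed bag: 5 white (W) and 3 black (B) balls, labelled so that every
# ordered pair of distinct balls is an equally likely outcome.
balls = [f"W{i}" for i in range(5)] + [f"B{i}" for i in range(3)]
space = list(permutations(balls, 2))   # two draws without replacement

def prob(pred, outcomes=space):
    """Relative frequency of outcomes satisfying pred among equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if pred(o)), len(outcomes))

def first_black(o):
    return o[0].startswith("B")

def second_black(o):
    return o[1].startswith("B")

p_a = prob(first_black)                                   # P(A)
p_ab = prob(lambda o: first_black(o) and second_black(o)) # P(A and B), counted directly

# P(B/A): the sample space shrinks to the outcomes in which A has occurred
in_a = [o for o in space if first_black(o)]
p_b_given_a = prob(second_black, in_a)

product_rule = p_a * p_b_given_a                          # P(A) . P(B/A)
print(p_ab, product_rule)
```

Both routes give the same value, which is what the theorem asserts.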

10.3.2 Extension of Multiplication Theorem of Probability to n - events

Theorem

For n events $A_1, A_2, \ldots, A_n$, we have

$P(A_1 \cap A_2 \cap \ldots \cap A_n) = P(A_1) \cdot P(A_2/A_1) \cdot P(A_3/A_1 \cap A_2) \ldots P(A_n/A_1 \cap A_2 \cap \ldots \cap A_{n-1})$

where $P(A_i/A_j \cap A_k \cap \ldots \cap A_l)$ represents the conditional probability of the event $A_i$
given that the events $A_j, A_k, \ldots, A_l$ have already happened.

Proof: We can prove this theorem by the principle of mathematical induction.

For two events $A_1$ and $A_2$,

$P(A_1 \cap A_2) = P(A_1) \cdot P(A_2/A_1)$

For three events,

$P(A_1 \cap A_2 \cap A_3) = P[A_1 \cap (A_2 \cap A_3)]$

$= P(A_1) \cdot P(A_2 \cap A_3/A_1)$

$= P(A_1) \cdot P(A_2/A_1) \cdot P(A_3/A_1 \cap A_2)$

Therefore, the result is true for n = 2 and 3 events.
Let us suppose that the result is true for n = k events, i.e.

$P(A_1 \cap A_2 \cap \ldots \cap A_k) = P(A_1) \cdot P(A_2/A_1) \cdot P(A_3/A_1 \cap A_2) \ldots P(A_k/A_1 \cap A_2 \cap \ldots \cap A_{k-1})$

Now we have to prove that the result is true for n = k + 1 events.

$P(A_1 \cap A_2 \cap \ldots \cap A_k \cap A_{k+1}) = P(A_1 \cap \ldots \cap A_k) \cdot P(A_{k+1}/A_1 \cap A_2 \cap \ldots \cap A_k)$

$= P(A_1) \cdot P(A_2/A_1) \cdot P(A_3/A_1 \cap A_2) \ldots P(A_k/A_1 \cap A_2 \cap \ldots \cap A_{k-1}) \cdot P(A_{k+1}/A_1 \cap A_2 \cap \ldots \cap A_k)$

The result is true for n = k + 1. Hence, by the principle of mathematical induction, the
result is true for all the positive integral values.
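As an illustration of the n-event rule, the sketch below multiplies a chain of conditional probabilities for drawing a queen, then a king, then a jack from a 52-card pack without replacement. The function name `chain_rule` is an assumption made for this example, not notation from the text.

```python
from fractions import Fraction

def chain_rule(conditionals):
    """Multiply P(A1), P(A2/A1), P(A3/A1 and A2), ... to get P(A1 and A2 and ... and An)."""
    result = Fraction(1)
    for p in conditionals:
        result *= p
    return result

p_qkj = chain_rule([
    Fraction(4, 52),   # P(A1): queen on the first draw
    Fraction(4, 51),   # P(A2/A1): king on the second draw, 51 cards left
    Fraction(4, 50),   # P(A3/A1 and A2): jack on the third draw, 50 cards left
])
print(p_qkj, float(p_qkj))
```

Each factor is conditioned on all the earlier draws, exactly as in the statement of the theorem.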

Example 3: Let A and B be two events such that $P(A) = \frac{1}{2}$, $P(B) = \frac{1}{3}$ and $P(A \cap B) = \frac{1}{4}$.
Obtain the probabilities (i) $P(A/B)$ (ii) $P(A \cup B)$ and (iii) $P(\bar{A} \cap \bar{B})$.

Solution: We are given $P(A) = \frac{1}{2}$, $P(B) = \frac{1}{3}$ and $P(A \cap B) = \frac{1}{4}$.

(i) $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{1/3} = \frac{3}{4}$

(ii) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

$= \frac{1}{2} + \frac{1}{3} - \frac{1}{4} = \frac{7}{12}$

(iii) $P(\bar{A} \cap \bar{B}) = 1 - P(A \cup B)$

$= 1 - \frac{7}{12} = \frac{5}{12}$.

Example 4: Two dice are thrown. Find the probability that the sum of the numbers on the two
dice is 10, given that the first die shows a six.

Solution: Let A be the event that the sum of the numbers is 10.

Let B be the event that the first die shows 6.

Thus, A = {(4, 6), (5, 5), (6, 4)}; B = {(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}

and $A \cap B = \{(6, 4)\}$. Also, $n(S) = 36$.

Thus, $P(A) = \frac{n(A)}{n(S)} = \frac{3}{36}, \quad P(B) = \frac{n(B)}{n(S)} = \frac{6}{36}$

$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{1}{36}$

Thus, the required probability is

$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/36}{6/36} = \frac{1}{6}$.

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

2. Let A and B be events such that $P(A) = \frac{1}{3}$, $P(B) = \frac{1}{5}$ and $P(A \cup B) = \frac{11}{30}$. Find

(i) $P(A/B)$ (ii) $P(B/A)$.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

10.4 EXERCISE

1. A bag contains 5 white and 3 black balls. Two balls are drawn at random one after the
other without replacement. Find the probability that both balls drawn are black.

(Ans. $P(A \cap B) = \frac{3}{28}$)

2. Find the probability of drawing a queen, a king and a knave (Jack) in that order from a
pack of cards in three consecutive draws, the cards drawn not being replaced.

[Ans. 0.00048]

3. From a city population, the probability of selecting

(i) a male or a smoker is 7/10,

(ii) a male smoker is 2/5, and

(iii) a male, if a smoker is already selected, is 2/3.

Find the probability of selecting

a) a non-smoker, b) a male and c) a smoker, if a male is first selected.

[Ans. With A = male and B = smoker: a) $P(B) = 3/5$, so $P(\bar{B}) = 2/5$, b) P(A) = 1/2, c) P(B/A) = 4/5]

4. A bag contains 17 counters marked with the numbers 1 to 17. A counter is drawn and
replaced; a second drawing is then made. What is the probability that the first number
drawn is even and the second odd?

[Ans. $P(A \cap B) = \frac{8}{17} \times \frac{9}{17} = \frac{72}{289}$]

5. Sixty percent of the employees of the XYZ corporation are college graduates. Of
these, ten percent are in sales. Of the employees who did not graduate from college,
eighty percent are in sales. What is the probability that an employee selected at random
is in sales?

[Ans. P(B) = 0.38]

10.5 SUMMARY

Conditional probability is the probability of the conditional event, denoted by P(B/A) and
given by

$P(B/A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) > 0$.

Thus, the probability of an event B occurring when it is known that some event A has
already occurred is called the conditional probability.

Another fundamental rule of probability is the multiplication rule. According to the
multiplication rule, if A and B are two dependent events then

$P(A \cap B) = P(A) \cdot P(B/A)$

The multiplication rule for n events is given by

$P(A_1 \cap A_2 \cap \ldots \cap A_n) = P(A_1) \cdot P(A_2/A_1) \cdot P(A_3/A_1 \cap A_2) \ldots P(A_n/A_1 \cap A_2 \cap \ldots \cap A_{n-1})$
10.6 CHECK YOUR PROGRESS - MODEL ANSWERS

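1. Suppose the class has 100 students, so that n(S) = 100. Let A be the event that the selected
student reads Statistics and B the event that he reads Mathematics.

n(A) = 40, $P(A) = \frac{40}{100}$

n(B) = 25, $P(B) = \frac{25}{100}$

$n(A \cap B) = 15$, $P(A \cap B) = \frac{15}{100}$

i) $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{15/100}{25/100} = \frac{15}{25} = \frac{3}{5}$

ii) $P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{15/100}{40/100} = \frac{15}{40} = \frac{3}{8}$

2. $P(A \cap B) = P(A) + P(B) - P(A \cup B) = \frac{1}{3} + \frac{1}{5} - \frac{11}{30} = \frac{1}{6}$

$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/5} = \frac{5}{6}$

$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/6}{1/3} = \frac{3}{6} = \frac{1}{2}$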

10.7 MODEL EXAMINATION QUESTIONS

Section - I (Long Answer Questions)

1. State and prove multiplication theorem of probability for two events and also extend it
for ‘n’ events.

Section - II (Short Answer Questions)

1. State conditional probability.

2. State Multiplication theorem of probability.

3. Let A and B be two events such that $P(A) = \frac{1}{2}$, $P(B) = \frac{1}{3}$ and $P(A \cap B) = \frac{1}{4}$. Obtain
the probabilities

i) $P(A/B)$

ii) $P(A \cup B)$ and $P(\bar{A} \cap \bar{B})$.
UNIT - 11: INDEPENDENT EVENTS
Contents

11.0 Objectives

11.1 Introduction

11.2 Independent Events

11.3 Conditional Probability

11.4 Examples

11.5 Summary

11.6 Check Your Progress - Model Answers

11.7 Model Examination Questions

11.0 OBJECTIVES

After studying this unit you should be able to determine:

 Whether two given events are independent or not

 the conditional probability of an event given another event.

11.1 INTRODUCTION

In a random experiment, the occurrence of an event E may or may not affect the
occurrence of another event F. If E does not affect F, we call the events independent.
If E affects F, we call E and F dependent.

In the random experiment of rolling a die repeatedly, the event of getting ‘4’ in the first
throw is independent of getting ‘4’ in the second, third or subsequent throws.

In drawing cards from a pack of 52 cards, the outcome of the second draw depends
upon the card drawn in the first draw. However, if the card drawn in the first draw is put back
in the pack before drawing the second card, then the outcome of the second draw is
independent of the first draw.

11.2 INDEPENDENT EVENTS

Two or more events are said to be independent if the occurrence of any one of them does not
affect the occurrence of the others. For example, when a coin is tossed twice, getting a head on
the first toss and getting a tail on the second toss are independent events, because the outcome
of the first toss does not affect the outcome of the second; the second trial is independent of
the first trial. But if one card is drawn out of 52 cards, then
only 51 cards are left. Unless the card is replaced, the composition stands changed and the
probability for the second card is affected. In this case the events are dependent, i.e. the second
draw depends on the first draw, the third draw depends on the first and second draws, and so on.

When events are independent, the probability of occurrence of both events A and B is equal to the
product of their unconditional probabilities, i.e. P(A and B) = $P(A \cap B) = P(A) \cdot P(B)$.

11.2.1 Illustration

A box contains 5 white and 3 black balls. A ball is drawn at random from the box and
replaced. Another ball is drawn at random after the replacement. Find the probability that
both balls drawn are black.

Solution: Let A be the event of getting a black ball in the first draw and B be the event of getting
a black ball in the second draw.

There are 5 + 3 = 8 balls in the box, of which 3 are black. The probability of
getting a black ball in the first draw is P(A) = 3/8.

Since the first ball is replaced, the box again contains 8 balls at the second draw, so the
probability of getting a black ball is not affected by the first draw and P(B) = 3/8.

Hence P(A and B) = $P(A \cap B) = P(A) \cdot P(B) = \frac{3}{8} \times \frac{3}{8} = \frac{9}{64}$.

More than two Events

A finite set of events $\{A_i\}_{i=1}^{n}$ is pairwise independent if every pair of events is
independent, that is, if and only if for all distinct pairs of indices m, k,

$P(A_m \cap A_k) = P(A_m) P(A_k)$.

A finite set of events is mutually independent if every event is independent of any
intersection of the other events, that is, if and only if for every k-element subset $\{B_i\}_{i=1}^{k}$ of
$\{A_i\}_{i=1}^{n}$,

$P\left(\bigcap_{i=1}^{k} B_i\right) = \prod_{i=1}^{k} P(B_i)$

This is called the multiplication rule for independent events. Note that it is not a single
condition involving only the product of all the probabilities of all single events; it must hold true
for all subsets of events.

For more than two events, a mutually independent set of events is (by definition) pairwise
independent; but the converse is not necessarily true.

Self - independence

Note that an event is independent of itself if and only if

$P(A) = P(A \cap A) = P(A) \cdot P(A) \Leftrightarrow P(A) = 0 \text{ or } 1$

Thus an event is independent of itself if and only if it almost surely occurs or its complement
almost surely occurs; this fact is useful when proving zero-one laws.

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

11.2.2 In the random experiment of tossing a coin twice, are the events of getting a head on the
first toss and a tail on the second toss independent?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
11.2.3 In the random experiment of tossing a coin twice, are the events of getting a tail on each
of the tosses independent?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

11.3 CONDITIONAL PROBABILITY

When the events are not independent:

If two events A and B are so related that the occurrence of B is affected by the
occurrence of A, then A and B are called dependent events. More generally, if the happening
or non-happening of an event affects the happening or non-happening of the other events, the
events are said to be not independent. The probability of event B given that event A has
occurred is called the conditional probability of B given A and is written as P(B/A), which
may be read as “the probability of B given A”. In this case the probability that both the events A
and B will occur is given by

$P(A \cap B) = P(A) \cdot P(B/A), \ P(A) > 0$, or $P(A \cap B) = P(B) \cdot P(A/B), \ P(B) > 0$
Remarks

1. We note that P(A/B) = P(A) and P(B/A) = P(B) if A and B are independent, because
event B does not affect the event A and event A does not affect the event B.

2. The multiplication rule can be generalised to cover more than two events. For example,
if A, B and C are any three events, the probability of their simultaneous occurrence
is given by

P(A and B and C) = $P(A \cap B \cap C) = P(A) \cdot P(B/A) \cdot P(C/A \cap B)$

3. If A and B are independent events, A and $B^c$ are also independent.

$P(A \cap B^c) = P(A) - P(A \cap B)$

$= P(A) - P(A) \cdot P(B)$ (A and B are independent)

$= P(A)[1 - P(B)]$

$= P(A) \cdot P(B^c)$. Therefore, A and $B^c$ are also independent.

4. If A and B are independent events, $A^c$ and B are also independent.

$P(A^c \cap B) = P(B) - P(A \cap B)$

$= P(B) - P(A) \cdot P(B)$ (A and B are independent)

$= P(B)[1 - P(A)]$

$= P(B) \cdot P(A^c)$. Therefore, $A^c$ and B are also independent.

5. If events A and B are independent, then $A^c$ and $B^c$ are also independent.

$P(A^c \cap B^c) = 1 - P(A \cup B)$

$= 1 - P(A) - P(B) + P(A) \cdot P(B)$, since A and B are independent

$= [1 - P(A)][1 - P(B)]$

$= P(A^c) \cdot P(B^c)$. Therefore, $A^c$ and $B^c$ are also independent.
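Remarks 3 to 5 can be spot-checked on a concrete sample space. The sketch below uses two rolls of a die and two events chosen purely for illustration (they do not come from the text): A, the first roll is even, and B, the second roll is 5 or 6.

```python
from fractions import Fraction
from itertools import product

space = set(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

def prob(event):
    return Fraction(len(event), len(space))

def independent(e, f):
    """Product condition P(E and F) = P(E) . P(F)."""
    return prob(e & f) == prob(e) * prob(f)

A = {o for o in space if o[0] % 2 == 0}       # first roll is even (assumed event)
B = {o for o in space if o[1] > 4}            # second roll is 5 or 6 (assumed event)
A_c, B_c = space - A, space - B               # complements

checks = {
    "A, B": independent(A, B),
    "A, B^c": independent(A, B_c),
    "A^c, B": independent(A_c, B),
    "A^c, B^c": independent(A_c, B_c),
}
print(checks)
```

A single example does not prove the remarks, but it shows the product condition surviving each complementation.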
11.4 EXAMPLES

11.4.1 Rolling Dice

The event of getting a 5 the first time a die is rolled and the event of getting a 5 the
second time are independent. By contrast, the event of getting a 5 the first time a die is rolled
and the event that the sum of the numbers seen on the first and second trials is 8 are not
independent.
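Both claims about the dice can be confirmed by enumeration. This is a minimal sketch over the 36 equally likely outcomes of two rolls; the set names are illustrative.

```python
from fractions import Fraction
from itertools import product

space = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(space))

first_5 = {o for o in space if o[0] == 5}
second_5 = {o for o in space if o[1] == 5}
sum_8 = {o for o in space if sum(o) == 8}

# Independent: P(both) equals the product of the two probabilities.
indep_rolls = prob(first_5 & second_5) == prob(first_5) * prob(second_5)
# Not independent: the only common outcome is (5, 3), and 1/36 differs from (1/6)(5/36).
indep_sum = prob(first_5 & sum_8) == prob(first_5) * prob(sum_8)
print(indep_rolls, indep_sum)
```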

11.4.2 Drawing cards

If two cards are drawn with replacement from a deck of cards, the event of drawing a
red card on the first trial and that of drawing a red card on the second trial are independent. By
contrast, if two cards are drawn without replacement from a deck of cards, the event of
drawing a red card on the first trial and that of drawing a red card on the second trial are not
independent, because a deck that has had a red card removed has proportionately fewer red
cards.

11.4.3 Pairwise and mutual independence

Consider the two probability spaces shown. In both cases, P(A) = P(B) = 1/2 and
P(C) = 1/4. The events in the first space are pairwise independent because
P(A/B) = P(A/C) = 1/2 = P(A), P(B/A) = P(B/C) = 1/2 = P(B), and P(C/A) = P(C/B) = 1/4 =
P(C); but the three events are not mutually independent. The events in the
second space are both pairwise independent and mutually independent. To illustrate the
difference, consider conditioning on two events. In the pairwise independent case, although any
one event is independent of each of the other two individually, it is not independent of the
intersection of the other two (here BC denotes $B \cap C$, and so on):

$P(A/BC) = \frac{4/40}{4/40 + 1/40} = \frac{4}{5} \ne P(A)$

$P(B/AC) = \frac{4/40}{4/40 + 1/40} = \frac{4}{5} \ne P(B)$

$P(C/AB) = \frac{4/40}{4/40 + 6/40} = \frac{2}{5} \ne P(C)$

In the mutually independent case, however,

$P(A/BC) = \frac{1/16}{1/16 + 1/16} = \frac{1}{2} = P(A)$

$P(B/AC) = \frac{1/16}{1/16 + 1/16} = \frac{1}{2} = P(B)$

$P(C/AB) = \frac{1/16}{1/16 + 3/16} = \frac{1}{4} = P(C)$

Remark: Mutual Independence

It is possible to create a three-event example in which

$P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C)$,

and yet no two of the three events are pairwise independent (and hence the set of
events is not mutually independent). This example shows that mutual independence involves
requirements on the products of probabilities of all combinations of events, not just the single
events as in this example. For another example, take A to be empty and B and C to be identical
events with probability strictly between 0 and 1. Then, since B and C are the same event, they
are not independent, but the probability of the intersection of the three events is zero, which
equals the product of the probabilities.

11.4.4 Example

A box contains four tickets marked with numbers 112, 121, 211, 222 and one ticket is
drawn at random. Let $B_i$ (i = 1, 2, 3) be the event that the ith digit of the number of the ticket
drawn is 1. Discuss the independence of the events $B_1$, $B_2$ and $B_3$.

Number of exhaustive cases = ${}^4C_1 = 4$. Cases favourable to the event $B_1$ are those in
which the 1st digit of the number is 1, i.e. 112 and 121, i.e. 2 in all.

$\therefore P(B_1) = \frac{2}{4} = \frac{1}{2}$.

Cases favourable to the event $B_2$ are those in which the 2nd digit of the number is 1,
i.e. 112, 211, i.e. 2 in all.

$\therefore P(B_2) = \frac{2}{4} = \frac{1}{2}$

Similarly, cases favourable to the event $B_3$ are 121, 211, i.e. 2 in all.

$\therefore P(B_3) = \frac{2}{4} = \frac{1}{2}$

Cases favourable to the event $B_1 \cap B_2$ are those numbers in which the first as well as
the second digit is 1, viz. 112, i.e. only 1.

$\therefore P(B_1 \cap B_2) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = P(B_1) \cdot P(B_2)$

$\Rightarrow B_1, B_2$ are independent.

Cases favourable to the event $B_2 \cap B_3$ are those numbers in which the 2nd as well as the 3rd
digit is 1, viz. 211, i.e. only 1.

$\therefore P(B_2 \cap B_3) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = P(B_2) \cdot P(B_3)$

$\Rightarrow B_2, B_3$ are independent.

Similarly, the only case favourable to the event $B_1 \cap B_3$ is 121.

$\therefore P(B_1 \cap B_3) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = P(B_1) \cdot P(B_3)$

$\Rightarrow B_1, B_3$ are independent.

Hence $B_1$, $B_2$ and $B_3$ are pairwise independent. But because there is no case favourable
to the event $B_1 \cap B_2 \cap B_3$ (i.e. all the three digits of the number are 1’s), we get

$P(B_1 \cap B_2 \cap B_3) = P(\phi) = 0 \ne \frac{1}{8} = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = P(B_1) P(B_2) P(B_3)$.

Hence, $B_1$, $B_2$ and $B_3$, though pairwise independent, are not mutually independent.
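The ticket example is small enough to check by enumeration. The sketch below tests every pair and then the triple; note that the Python list `B` is 0-indexed, so `B[0]` stands for the event B1 of the text.

```python
from fractions import Fraction
from itertools import combinations

tickets = ["112", "121", "211", "222"]        # equally likely outcomes

def prob(event):
    return Fraction(len(event), len(tickets))

# B[i] = set of tickets whose (i+1)-th digit is 1
B = [{t for t in tickets if t[i] == "1"} for i in range(3)]

pairwise = all(
    prob(B[i] & B[j]) == prob(B[i]) * prob(B[j])
    for i, j in combinations(range(3), 2)
)
triple_ok = prob(B[0] & B[1] & B[2]) == prob(B[0]) * prob(B[1]) * prob(B[2])
mutual = pairwise and triple_ok
print(pairwise, mutual)
```

For three events, mutual independence needs the three pairwise conditions and the triple condition, which is why both are tested.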

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

11.4.5 A card is drawn at random from a pack of 52 cards. Let B be the event that the card
drawn is a heart and A the event that it is an ace. What is the conditional probability
P(A/B)?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
11.4.6 A cubical die is rolled and B is the event of getting an even outcome. If
B has occurred, what is the conditional probability P(A/B), where A is the event of getting
the outcome ‘2’?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

11.5 SUMMARY

In this unit you have learnt the concept of independent events. You also learnt the
concept of conditional probability. You studied multiplication rule and independence of events.
You have observed that this can be extended to any finite number of mutually independent
events.

11.6 CHECK YOUR PROGRESS - MODEL ANSWERS

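11.2.2 $P(HT) = \frac{1}{4}$, $P(H) = \frac{1}{2}$, $P(T) = \frac{1}{2}$

$P(HT) = \frac{1}{2} \cdot \frac{1}{2} = P(H) \cdot P(T)$

Hence getting a head on the first toss and a tail on the second toss are independent events.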

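11.2.3 $P(TT) = \frac{1}{4}$, $P(T) = \frac{1}{2}$

$P(TT) = \frac{1}{2} \cdot \frac{1}{2} = P(T) \cdot P(T)$

Hence getting a tail on the first toss and a tail on the second toss are independent events.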

11.4.5 $P(B) = \frac{13}{52} = \frac{1}{4}$

$P(A \cap B) = \frac{1}{52}$

$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/52}{1/4} = \frac{1}{13}$.

11.4.6 $A \cap B = \{2\}$, so $P(A \cap B) = \frac{1}{6}$

$P(B) = \frac{3}{6} = \frac{1}{2}$

$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = \frac{1}{3}$.

11.7 MODEL EXAMINATION QUESTIONS

1. The probability of an economic decline in the year 2000 is 0.23. There is a probability
of 0.64 that we will elect a republican president in the year 2000. If we elect a republican
president, there is a 0.35 probability of an economic decline. Let “D” represent the
event of an economic decline, and “R” represent the event of election of a Republican
president.

a. Are “R” and “D” independent events?

b. What is the probability of a Republican president and economic decline in the year
2000?

c. If we experience an economic decline in the year 2000, what is the probability that
there will be a Republican president?

d. What is the probability of economic decline or a Republican president in the year 2000?
Hint: You want to find $P(D \cup R)$.

Answers:

a. No, because P(D) = 0.23 while P(D | R) = 0.35 b. 0.224

c. 0.9739 d. 0.646

2. A survey of a sample of business students resulted in the following information regarding
the genders of the individuals and their selected major.

                    Selected Major
Gender     Management   Marketing   Others   Total
Male           40          10         30       80
Female         30          20         70      120
Total          70          30        100      200

a. What is the probability of selecting an individual who is majoring in Marketing?

b. What is the probability of selecting an individual who is majoring in Management, given
that the person is female?

c. Given that a person is male, what is the probability that he is majoring in Management?

d. What is the probability of selecting a male individual?

Answers: a. 0.15 b. 0.25 c. 0.50 d. 0.40

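The answers to Question 2 follow directly from the cell counts of the table. A minimal sketch, with the dictionary layout and function names chosen here for illustration:

```python
from fractions import Fraction

# Counts from the table in Question 2: counts[gender][major]
counts = {
    "Male":   {"Management": 40, "Marketing": 10, "Others": 30},
    "Female": {"Management": 30, "Marketing": 20, "Others": 70},
}
total = sum(sum(row.values()) for row in counts.values())

def p_major(major):
    """Marginal probability of a major (column total over grand total)."""
    return Fraction(sum(row[major] for row in counts.values()), total)

def p_gender(gender):
    """Marginal probability of a gender (row total over grand total)."""
    return Fraction(sum(counts[gender].values()), total)

def p_major_given_gender(major, gender):
    """Conditional probability: cell count over the row total of the given gender."""
    return Fraction(counts[gender][major], sum(counts[gender].values()))

answers = {
    "a": p_major("Marketing"),
    "b": p_major_given_gender("Management", "Female"),
    "c": p_major_given_gender("Management", "Male"),
    "d": p_gender("Male"),
}
print({k: float(v) for k, v in answers.items()})
```

Conditioning on gender replaces the grand total of 200 by the row total, which is the reduced sample space of the conditional probability.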
3. As a company manager for Claimstat Corporation there is a 0.40 probability that you
will be promoted this year. There is a 0.72 probability that you will get a promotion or
a raise. The probability of getting a promotion and a raise is 0.25.

a. If you get a promotion, what is the probability that you will also get a raise?

b. Are getting a raise and being promoted independent events? Explain using probabilities.

c. Are these two events mutually exclusive? Explain using probabilities.

Answers:

a. 0.625

b. No, because P(R) = 0.72 - 0.40 + 0.25 = 0.57, which differs from P(R | P) = 0.625

c. No, because $P(R \cap P) = 0.25 \ne 0$

4. A bank has the following data on the gender and marital status of 200 customers.

            Male   Female
Single       20      30
Married     100      50

a. What is the probability of finding a single female customer?

b. What is the probability of finding a married male customer?

c. If a customer is female, what is the probability that she is single?

d. What percentage of customers is male?

e. If a customer is male, what is the probability that he is married?

f. Are gender and marital status mutually exclusive?

g. Is marital status independent of gender? Explain using probabilities.

Answers:

a. 0.15 b. 0.5 c. 0.375 d. 60%

e. 0.833 f. No, the intersection is not zero.

g. They are not independent because P(male) = 0.6, P(male | single) = 0.4

UNIT - 12: BAYES' THEOREM
Contents

12.0 Objectives

12.1 Introduction

12.2 Bayes' Theorem

12.3 Exercise

12.4 Summary

12.5 Check Your Progress - Model Answers

12.6 Model Examination Questions

12.0 OBJECTIVES

After studying this unit, you should be able to:

 Understand Bayes' theorem.

 Understand the concepts of prior probability, posterior probability and the likelihood used
in Bayes' theorem.

12.1 INTRODUCTION

The concept of conditional probability discussed earlier takes into account information
about the occurrence of one event to predict the probability of another event. This concept can
be extended to revise probabilities based on new information and to determine the probability
that a particular effect was due to a specific cause. The procedure for revising these probabilities
is known as Bayes' theorem. The principle was given by Thomas Bayes in 1763. By this
principle, starting from certain prior probabilities and the likelihoods, the posterior
probabilities are obtained.

12.2 BAYES' THEOREM

If $E_1, E_2, \ldots, E_n$ are mutually exclusive events with $P(E_i) \ne 0$ $(i = 1, 2, \ldots, n)$, then for any
arbitrary event A which is a subset of $\bigcup_{i=1}^{n} E_i$ such that $P(A) > 0$, we have

$P(E_i/A) = \frac{P(E_i) \cdot P(A/E_i)}{\sum_{i=1}^{n} P(E_i) \cdot P(A/E_i)}; \quad i = 1, 2, \ldots, n$

Proof: Consider the following venn diagram

Since $A \subset \bigcup_{i=1}^{n} E_i$, we have $A = A \cap \left(\bigcup_{i=1}^{n} E_i\right) = \bigcup_{i=1}^{n} (A \cap E_i)$

Since $(A \cap E_i) \subset E_i$, $i = 1, 2, 3, \ldots, n$ are mutually disjoint events, we have by the addition
theorem of probability

$P(A) = P\left[\bigcup_{i=1}^{n} (A \cap E_i)\right] = \sum_{i=1}^{n} P(A \cap E_i) = \sum_{i=1}^{n} P(E_i) \cdot P(A/E_i)$ ... (1)

The conditional probability of an event $E_i$ given that A has already occurred is given
by

$P(E_i/A) = \frac{P(A \cap E_i)}{P(A)}$

$= \frac{P(E_i) \cdot P(A/E_i)}{P(A)}$ ... (2)

Using equation (1), equation (2) can be written as

$P(E_i/A) = \frac{P(E_i) \cdot P(A/E_i)}{\sum_{i=1}^{n} P(E_i) \cdot P(A/E_i)}$

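Equations (1) and (2) translate into a few lines of code. The following is a minimal sketch, not a library routine: the function name `posteriors` is an assumption, and the numbers used to try it out are those of Example 1 of this unit.

```python
def posteriors(priors, likelihoods):
    """Return the list of P(E_i / A) from priors P(E_i) and likelihoods P(A / E_i)."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per prior")
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(E_i) . P(A / E_i)
    p_a = sum(joint)                                      # equation (1)
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return [j / p_a for j in joint]                       # equation (2)

# Two machines producing 30% and 70% of the output,
# with 5% and 1% defective items respectively.
post = posteriors([0.30, 0.70], [0.05, 0.01])
print([round(p, 3) for p in post])
```

Because every posterior is divided by the same P(A), the posteriors always add up to 1.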
12.2.1 Remark:

1. The probabilities $P(E_1), P(E_2), \ldots, P(E_n)$ are termed the ‘a priori probabilities’
because they exist before we gain any information from the experiment itself.

2. The probabilities $P(A/E_i)$, $i = 1, 2, 3, \ldots, n$ are called ‘likelihoods’ because they indicate
how likely the event A under consideration is to occur, given each of the events $E_i$.

3. The probabilities $P(E_i/A)$, $i = 1, 2, 3, \ldots, n$ are called ‘posterior probabilities’ because they
are determined after the results of the experiment are known.

Example 1: Suppose a factory has two machines. Past records show that machine 1 produces
30% of the items of output and machine 2 produces 70% of the items. Further, 5% of the items
produced by machine 1 were defective and only 1% produced by machine 2 were defective. If
a defective item is drawn at random, what is the probability that it was produced
by machine 1? By machine 2?

Solution: Let $E_1$ be the event of drawing an item produced by machine 1 and
$E_2$ be the event of drawing an item produced by machine 2.

Let A be the event of drawing a defective item produced either by machine 1 or
machine 2.

Given that $P(E_1) = 30\% = 0.30$, $P(E_2) = 70\% = 0.70$, $P(A/E_1) = 5\% = 0.05$,
$P(A/E_2) = 1\% = 0.01$.

$P(A) = P(E_1) \cdot P(A/E_1) + P(E_2) \cdot P(A/E_2)$

$= (0.30)(0.05) + (0.70)(0.01) = 0.022$.

The probability that the defective item was produced by machine 1 is

$P(E_1/A) = \frac{P(E_1) \cdot P(A/E_1)}{P(A)} = \frac{(0.30)(0.05)}{0.022} = 0.682$.

The probability that the defective item was produced by machine 2 is

$P(E_2/A) = \frac{P(E_2) \cdot P(A/E_2)}{P(A)} = \frac{(0.70)(0.01)}{0.022} = 0.318$.

Example 2: The probabilities of X, Y and Z becoming managers are $\frac{4}{9}$, $\frac{2}{9}$ and $\frac{1}{3}$ respectively.
The probabilities that a bonus scheme will be introduced if X, Y and Z become managers are
$\frac{3}{10}$, $\frac{1}{2}$ and $\frac{4}{5}$ respectively. If the bonus scheme has been introduced, what is the probability
that the manager appointed was X?

Solution: Let $E_1$, $E_2$, $E_3$ denote the events that X, Y and Z become managers respectively and
A denote the event that the bonus scheme is introduced.

$P(E_1) = \frac{4}{9}, \quad P(E_2) = \frac{2}{9}, \quad P(E_3) = \frac{1}{3}$

$P(A/E_1) = \frac{3}{10}, \quad P(A/E_2) = \frac{1}{2}, \quad P(A/E_3) = \frac{4}{5}$

$P(A) = P(E_1 \cap A) + P(E_2 \cap A) + P(E_3 \cap A)$

$= P(E_1) \cdot P(A/E_1) + P(E_2) \cdot P(A/E_2) + P(E_3) \cdot P(A/E_3)$

$= \frac{4}{9} \cdot \frac{3}{10} + \frac{2}{9} \cdot \frac{1}{2} + \frac{1}{3} \cdot \frac{4}{5} = \frac{23}{45}$

Thus the required probability is

$P(E_1/A) = \frac{P(E_1) \cdot P(A/E_1)}{P(A)} = \frac{12/90}{23/45} = 0.26$.
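Example 2 can be reproduced with exact fractions, which avoids rounding until the final step. A minimal sketch; the dictionary names are illustrative.

```python
from fractions import Fraction as F

priors = {"X": F(4, 9), "Y": F(2, 9), "Z": F(1, 3)}          # P(E_i)
likelihoods = {"X": F(3, 10), "Y": F(1, 2), "Z": F(4, 5)}     # P(A / E_i)

# Total probability of the bonus scheme being introduced
p_a = sum(priors[m] * likelihoods[m] for m in priors)
# Posterior probability of each manager, given that the scheme was introduced
post = {m: priors[m] * likelihoods[m] / p_a for m in priors}
print(p_a, post["X"], round(float(post["X"]), 2))
```

The exact posterior for X is 6/23, which rounds to the 0.26 obtained in the solution.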

Check Your Progress:

Note: (a) Space is given below for writing your answer.

(b) Compare your answer with the one given at the end of this unit.

1. Suppose 5 men out of 100 and 25 women out of 10,000 are colour blind. A colour blind
person is chosen at random. What is the probability of the person being a male?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

2. A businessman goes to hotels X, Y and Z 20%, 50% and 30% of the time respectively.
It is known that 5%, 4% and 8% of the rooms in hotels X, Y and Z respectively have
faulty plumbing. What is the probability that a room with faulty plumbing assigned to
the businessman is in hotel Z?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3. A manufacturing company produces pipes in 2 plants I and II with daily production of
1500 and 2000 respectively. The fractions defective of the pipes produced by two
plants I and II are 0.006 and 0.008 respectively. If a pipe is selected at random from
the day’s production and found to be defective, what is the probability that it has come
from plant II?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________

12.3 EXERCISE

1. In a factory, machine A produces 40% of the output and machine B produces 60%. On
the average, 9 items in 1000 produced by A are defective and 1 item in 250 produced
by B is defective. An item drawn at random from a day’s output is defective. What are
the probabilities that it was produced by A and by B respectively?

Ans. P(A/E) = 0.6, P(B/E) = 0.4

2. The first box contains 2 black, 3 red and 1 white balls; the second box contains 1 black,
1 red and 2 white balls; and the third box contains 5 black, 3 red and 4 white balls. A box
is selected at random and a ball is drawn at random from it. If the ball is red, find the
probability that it is from the second box.

Ans. $P(E_2/A) = \frac{1}{4} = 0.25$

3. Three members X, Y and Z of a private club have been nominated for the office of
president. The probability that Mr. X will be elected is 0.3, the probability that Mr. Y
will be elected is 0.5 and the probability that Mr. Z will be elected is 0.2. Should Mr. X
be elected, the probability of an increase in the membership fee is 0.8. Should Y or Z be
elected, the corresponding probabilities of an increase in the fee are 0.2 and 0.3. If the fee
has been increased, what is the probability that

(i) Mr. X was elected president of the club?

(ii) Mr. Y was elected president of the club?

Ans. $P(E_1/A) = 0.60$, $P(E_2/A) = 0.25$

4. In 2019 there will be three candidates, A, B and C, for the position of principal, whose
chances of getting the appointment are in the proportion 4:2:3 respectively. The
probability that A, if selected, would introduce coeducation in the college is 0.3. The
probabilities of B and C doing the same are 0.5 and 0.8 respectively. If there is coeducation
in the college in 2020, what is the probability that C is the principal?

Ans. $P(E_3/A) = \frac{12}{23}$

5. In a factory manufacturing electric razors, machines X, Y and Z manufacture 30%,
30% and 40% of the total production of electric razors respectively. Of their output,
4%, 5% and 10% of the electric razors respectively are defective. If one electric razor
is selected at random and found to be defective, what is the probability that it was
manufactured by machine Z?

Ans. P(Z/A) = 40/67

12.4 SUMMARY

In this unit we have introduced Bayes’ theorem. The concept of conditional probability
takes into account the information about the occurrence of one event to predict the probability
of another event. Bayes’ theorem is used to determine the probability of some event A given
that another event B has been observed.

The theorem states that if $E_1, E_2, \ldots, E_n$ are $n$ mutually exclusive and exhaustive events
in a sample space $S$ such that $P(E_i) > 0$ $(i = 1, 2, \ldots, n)$, and $A$ is any other event in $S$
intersecting with every $E_i$ such that $P(A) > 0$, then

$$P(E_i/A) = \frac{P(A/E_i)\,P(E_i)}{\sum_{i=1}^{n} P(A/E_i)\,P(E_i)}$$

The $P(E_i)$ are the ‘a priori probabilities’, known even before the experiment; the $P(A/E_i)$ are
the ‘likelihoods’; and the $P(E_i/A)$ are the ‘posteriori probabilities’, determined after the result
of the experiment.
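
One way to see what a posteriori probability means is to repeat the experiment many times and look only at the trials in which A occurred. The following Python sketch is a hedged illustration and not part of the original unit: the function names, the number of trials and the seed are all assumptions, and the figures (priors 0.30 and 0.70, likelihoods 0.05 and 0.01) are the two-machine values used earlier in this unit. It compares the relative frequencies from a simulation with the values given by the formula.

```python
import random


def exact_posterior(priors, likelihoods):
    """Posterior probabilities P(E_i/A) from Bayes' theorem."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # P(A)
    return [j / total for j in joint]


def simulated_posterior(priors, likelihoods, trials=200_000, seed=2018):
    """Estimate P(E_i/A) as a relative frequency among trials where A occurred."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    indices = list(range(len(priors)))
    hits = [0] * len(priors)            # trials in which both E_i and A occurred
    for _ in range(trials):
        i = rng.choices(indices, weights=priors)[0]  # which E_i occurs
        if rng.random() < likelihoods[i]:            # does A occur, given E_i?
            hits[i] += 1
    total_a = sum(hits)                 # trials in which A occurred
    return [h / total_a for h in hits]


priors = [0.30, 0.70]        # a priori probabilities P(E_i)
likelihoods = [0.05, 0.01]   # likelihoods P(A/E_i)

print([round(x, 3) for x in exact_posterior(priors, likelihoods)])
print([round(x, 3) for x in simulated_posterior(priors, likelihoods)])
```

The simulated values should lie close to the exact ones, and the agreement generally improves as the number of trials is increased.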

12.5 CHECK YOUR PROGRESS - MODEL ANSWERS

1. The probability that the chosen person is male, $P(M) = \frac{1}{2}$.

Probability that the chosen person is female, $P(F) = \frac{1}{2}$.

Let A be the event that the chosen person is colour blind.

Given that $P(A/M) = \frac{5}{100} = 0.05$, $P(A/F) = \frac{25}{10000} = 0.0025$

$P(A) = P(M)\,P(A/M) + P(F)\,P(A/F) = 0.02625 \approx 0.026$

$P(M/A) = \frac{P(M)\,P(A/M)}{P(A)} = \frac{0.025}{0.02625} \approx 0.95$

2. $P(X) = \frac{2}{10}$, $P(Y) = \frac{5}{10}$, $P(Z) = \frac{3}{10}$

$P(E/X) = \frac{1}{20}$, $P(E/Y) = \frac{1}{25}$, $P(E/Z) = \frac{2}{25}$, $P(Z/E) = \frac{4}{9} = 0.44$

3. $P(E_1) = \frac{3}{7}$, $P(E_2) = \frac{4}{7}$, $P(A/E_1) = 0.006$, $P(A/E_2) = 0.008$, $P(E_2/A) = 0.64$.
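
The answer to question 3 can also be reached by counting expected defective pipes instead of working with probabilities: plant I is expected to produce 1500 × 0.006 = 9 defective pipes a day and plant II 2000 × 0.008 = 16, so 16 of the 25 expected defectives come from plant II. The short Python sketch below is an illustration only and not part of the original answers; the variable names are assumptions.

```python
# Daily output and fraction defective for each plant, as given in question 3
daily_output = {"I": 1500, "II": 2000}
fraction_defective = {"I": 0.006, "II": 0.008}

# Expected number of defective pipes per day from each plant
expected_defectives = {
    plant: daily_output[plant] * fraction_defective[plant]
    for plant in daily_output
}
total_defectives = sum(expected_defectives.values())

# Share of all defectives that comes from each plant, i.e. P(E_i/A)
share = {
    plant: expected_defectives[plant] / total_defectives
    for plant in expected_defectives
}

print(round(share["II"], 2))   # 0.64
```

This counting view gives the same result as the formula because dividing each count by the total daily production of 3500 turns it into the corresponding joint probability.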

12.6 MODEL EXAMINATION QUESTIONS

Section - I (Long Answers)

1. State and prove Bayes’ theorem.

Section - II (Short Answers)

1. Explain the importance of Bayes’ theorem.

2. State Bayes’ theorem.

Dr. B. R. Ambedkar Open University
Faculty of Science
First Semester, I Year (3 Year Degree Programme)
MODEL EXAMINATION QUESTION PAPER
STATISTICS BS127STAT-E
DESCRIPTIVE STATISTICS & PROBABILITY
[Time: 3 hours] [Max. Marks: 80]
Section - A
(Short Answer Questions)
[Marks: 4 x 5 = 20]

Note: (a) Answer any FOUR of the following EIGHT questions.


(b) Each question carries 5 marks.

1. [Block - I] State the preliminary steps you would take for planning a statistical enquiry.

2. [Block - I] What is macro editing? When is it appropriate? Explain with example.

3. [Block - II] What is average?

4. [Block - II] What are merits of A.M?

5. [Block - III] What is impossible event?

6. [Block - III] State and prove addition theorem of probability.

7. [Block - IV] State Multiplication theorem of probability.

8. [Block - IV] What is the probability of economic decline or a Republican president in
the year 2000? Hint: You want to find $P(D \cup R)$.

Section - B
[Long Answer Questions]
[Marks: 4 x 10 = 40]
Note: (a) Answer the following questions.
(b) Each question carries 10 marks.

9. [Block - I] A company would like to conduct a survey to learn about its customers’
satisfaction. Which type of data (primary or secondary) would you recommend to the
company in this case? Which data collection methods would you suggest to the company,
and why? State the appropriate reasons.

OR

10. [Block - I] Write a note on published sources of secondary data.

11. [Block - II] What are measures of central tendency? Explain their objectives and
functions.

OR

12. [Block - II] Explain the computation procedure of standard deviation in case of
deviation taken from the actual mean and deviation taken from an assumed mean for
individual data.

13. [Block - III] In a simultaneous throw of two dice, find the probability that

i) the total of the numbers on the dice is 6.

ii) the total of the numbers on the dice is greater than 8.

iii) the total of the numbers on the dice is any number from 2 to 12, both inclusive.

OR

14. [Block - III] If two dice are thrown, what is the probability that the sum is

a) greater than 8 b) neither 7 nor 11?

15. [Block - IV] State and prove multiplication theorem of probability for two events and
also extend it for ‘n’ events.

OR

16. [Block - IV] The probability of an economic decline in the year 2000 is 0.23. There is
a probability of 0.64 that we will elect a Republican president in the year 2000. If we
elect a Republican president, there is a 0.35 probability of an economic decline. Let
“D” represent the event of an economic decline, and “R” represent the event of
election of a Republican president.

Section - C
(Objective Type Questions)
(Marks : 20)
Total number of questions 20 [=15 from (theory) and 5 from (practicals)].

I. Multiple Choice questions. (10 marks)


1. The mean of 1, 3, 5, 7, 9 is
(a) 3 (b) 9 (c) 5 (d) 1
2. The median of 1, 3, 5, 7, 9, 11 is
(a) 65 (b) 7 (c) 9 (d) 6
3. The mode of 1, 1, 3, 5, 5, 5, 7, 9, 9, 11, 11 is
(a) 7 (b) 5 (c) 9 (d) 11
4. The variance of a data is 8.27. Its standard deviation is
(a) 2.87 (b) 8.27 (c) 8.272 (d) 4.14
5. The harmonic mean of 1, 2, 3, 4 is
(a) 2.5 (b) 0.4 (c) 0.48 (d) 2.008
6. The geometric mean of 2, 4, 8, 12, 16, 24 is
(a) 11 (b) 8.158 (c) 0.01 (d) 12
7. The arithmetic mean of two observations is 127.5 and their geometric mean is 60.
Then harmonic mean is
(a) 30 (b) 25 (c) 28.24 (d) 31
8. Variance is

(b) E  X  (c) E  X   E  X  (d) E  X   E  X 


2 2 2
(a) E  X 

9. The nature of the frequency curve about which kurtosis speaks is


(a) zeros (b) flatness (c) asymptote (d) tangents
10. If for a data mean is 220, median is 250, mode is
(a) 205 (b) 100 (c) 210 (d) 200

II. Match the following. (5 marks)
1. The less than and the more than ogives are reflections in   [ ]   (a) $\bar{X}$
2. Quartile deviation   [ ]   (b) $x = \frac{3N}{4}$
3. $\frac{1}{N}\sum fx$   [ ]   (c) negative
4. S.D. is never   [ ]   (d) $\frac{Q_3 - Q_1}{2}$
5. $\sigma^2$ is   [ ]   (e) $x = \frac{N}{4}$
                         (f) $x = \frac{N}{2}$
                         (g) variance
III. Fill in the blanks (5 marks)
1. Inequalities satisfied by AM, GM, HM of a data are ___________
2. The sum of deviations of data from its mean is ___________
3. Geometric mean of a data can be found only when the data values are ___________
4. A distribution is negatively skewed if ___________ (an inequality between the median and the mean).
5. x, y are negatively correlated. If x increases, then y ___________
***
