B.Sc.
All rights reserved. No part of this book may be reproduced in any form without the permission
in writing from the University.
Further information on Dr. B.R.Ambedkar Open University courses may be obtained from
the Director (Academic), Dr. B.R.Ambedkar Open University, Road No. 46, Prof. G. Ram
Reddy Marg, Jubilee Hills, Hyderabad - 500033.
Web: www.braou.ac.in
E-mail: info@braou.ac.in
Printed at : ___________________________________________________________
PREFACE
The course material of Statistics is prepared in accordance with the guidelines issued
by the University Grants Commission and the Telangana State Commission of Higher Education
to suit the Choice Based Credit Format in the semester system. Keeping in view the diversity
of students of the open university with regard to their age, learning abilities and the minimum
academic pre-requirements, the course material is prepared in self-instructional mode with all
the inputs to help the students to learn by themselves. This text material is supplemented with
limited face to face instruction and audio/video support. From this year a separate Practical
Manual cum Record Book is prepared and supplied along with this course material. The students
are expected to attend the practical classes and practice the problem solving skills by solving
the problems given in the manual under the guidance of a counsellor. The text material of
theory part carries a weightage of 4 credits and the practicals carry that of 1 credit.
In the open distance learning system the onus of learning rests with the learner. The
university provides an opportunity to realize the academic aspirations of the learner and facilitate
the learner to ‘LEARN TO LEARN’. To make this idea achievable for the student, every
effort is made to make learning a pleasure.
Any suggestions to improve the content of the course material are welcome and the
university will consider them without fail.
CONTENTS
Block-I: Data
BLOCK-1: DATA
UNIT-1: METHODS OF COLLECTION AND EDITING OF
PRIMARY DATA
Contents
1.0 Objectives
1.1 Introduction
1.5 Summary
1.0 OBJECTIVES
1.1 INTRODUCTION
We are living in a world where almost everything can be monitored and measured. If you
speak about something without measuring it, your knowledge of it is limited and unsatisfactory;
when you can measure what you are speaking about, you know something about it. Without
data it is not possible to measure a phenomenon, and so data collection and analysis play a vital
role in decision making in various fields. For example, companies maintain a variety of data
about their employees, customers and business operations. Data on employees include salaries,
age, years of experience etc. These data can be obtained from the internal records of the
company and help in understanding various facts about the employees. Other internal records
contain data on sales, advertising expenses, distribution costs, inventory levels etc., which tell
the manager about expenditure, sales and profits and thereby help in the decision-making
process. Data are of many types, broadly classified into primary and secondary data. This unit
focuses on primary data and on the methods of collecting and editing them.
1.2 STATISTICAL SURVEY
A statistical survey is nothing but a systematic search for truth. It seeks some authentic
answers to a problem which is quantifiable and, therefore, amenable to statistical treatment.
The statistical inquiry has to pass through the following four stages.
1.2.1 Observation
The next step is to lay down hypotheses on the basis of the available information, possibly by
using deductive logic. For example, a business firm finds that the sales of one of its products
have been declining and undertakes to investigate the causes. The preliminary investigation
might yield the following tentative results.
1.2.3 Prediction
It refers to anticipations about the future deduced from the facts of the preliminary investigation
or based on the opinion of experts. Verification would later reveal how far a particular
prediction was correct.
1.2.4 Verification
The final stage is the actual verification of the truth of the prediction. For this
purpose, experiments have to be conducted and observations recorded. This will
reveal whether the methods followed in prediction were sound. Deviations within the tolerance
limits may be ignored. But if the difference is significant, the methods adopted for prediction
have to be modified and improved.
1.3 PLANNING OF STATISTICAL SURVEY
1.3.1. Purpose
Before a survey is conducted, one should be clear about the purpose underlying it. A
clear statement of the purpose is necessary when the survey is handed over to any agency.
A clear purpose statement also helps us identify the steps we need to follow.
The scope of the survey explains the various aspects that have to be covered to achieve the
given objectives. For example, in the case of declining sales of a product, it should be clearly
stated whether the quality aspect has to be covered and whether the external forces affecting
the sales have to be considered.
The statement of purpose and scope of the work may cover many things. To give a clear
idea of the statement and to avoid mistakes arising from the use of terminology, the various
terms used should be properly defined.
Sometimes the scope of the survey is not clearly known either to the persons
conducting the survey or to the agency entrusted with the task. The best course in such
circumstances is to get a dummy report prepared on the basis of some broad ideas about the
nature and scope of the work. Such a report will indicate the formats of the tables in which the
final results will be presented and give a clear picture of the scope of work to those who have
commissioned the study and those who are going to conduct it.
Primary data are data collected for the first time. They are originated by a researcher
for the specific purpose of addressing the problem at hand. For example, the statistics
collected by a Government organisation relating to the population are primary data for that
organisation since they have been collected to achieve a particular objective. A data set
collected by an organisation from its customers to know their perceptions is another example
of primary data. Primary data may be qualitative or quantitative in nature. Quantitative data
are obtained from phenomena which can be measured in terms of units. For example,
height, weight, length, temperature etc. can be measured in appropriate units such as
inches, kilograms, metres, degrees centigrade etc. On the other hand, data which cannot be
measured in terms of numbers are called qualitative data. For example, honesty, bravery,
kindness, satisfaction, perceptions etc. cannot be measured in terms of units. These can be
measured by using qualitative scales. The measurements of such qualitative phenomena are
called qualitative data.
The following are the methods of collecting primary data. By applying any of these
methods we can collect primary data.
This method is a face-to-face interview: the investigator contacts the respondents from
whom the information is to be obtained. The interviewer asks the respondents questions
pertaining to the survey and collects the required information. Thus, if a person wants to collect
data regarding perceptions of the working conditions of the employees of an IT company in
Hyderabad, he will go to the company concerned, contact the employees working in the
organisation and obtain the desired information. The information collected in this manner is
first hand and original in character; it is known as primary data.
1. Door-to-Door Interview
In this method the investigator contacts the respondents at their residence. This method
gives the investigator an opportunity to uncover some real facts and figures which are
otherwise very difficult to obtain.
The investigator, stationed at the entrance of a shopping mall, invites respondents to
participate in a structured interview. It is a cost-efficient approach. In addition, because of
the limited coverage space, the researcher can control the interview to some extent, which
is otherwise extremely difficult when the respondents are scattered over a large geographical
area. The researcher can also make efficient use of the large respondent pool available at
different mall locations.
3. Office Interview
This method is appropriate when the researcher’s objective is to uncover consumer
attitudes towards an industrial product or service, or opinions on the government’s new
policies, from various categories of employees. To achieve this objective the investigator
conducts an office interview.
4. Self-administered questionnaire
In this method the respondents enter their information at a computer
terminal by using a keyboard. Investigators use this method to collect large amounts of data.
6. Observation
Under this method the investigator collects the information merely by observing the
respondent and recording the data. This is the most useful method in some marketing studies
where consumer behaviour is studied.
There are many merits and demerits of these methods, which are discussed below.
Merits
1. Respondents feel free to give the required information when contacted personally.
2. Most of the time the data collected through this method are more accurate because of
the face-to-face interaction.
3. This method also provides scope for getting supplementary information from the
respondent, because while interviewing it is possible to ask some supplementary questions
which may be of greater use later.
4. The interviewer can adapt his language to the educational status and level
of understanding of the respondents, to avoid inconvenience and confusion on the part of
the respondent.
Demerits
2. There is a greater chance of personal bias and prejudice under this method as compared
to other methods.
3. The interviewers have to be thoroughly trained and experienced; otherwise they may
not be able to obtain the desired information. Untrained or poorly trained interviewers
may spoil the entire work.
4. This method is more time-consuming than other methods. This is because
interviews can be held only at the convenience of the informants.
Conclusion
Though there are some demerits, we cannot say that the method is not useful. It can be used
when the area of study is limited. Nowadays, because of advances in communication
systems, the investigator can also collect data by phone, by email or by mailing questionnaires.
2. Indirect Oral Interviews
According to this method the investigator meets third parties, generally called
‘witnesses’, who are capable of supplying the necessary information. Generally this method is
employed when the data to be collected are complex in nature and the respondent may not be
willing to give information directly.
For example, when the researcher is trying to obtain data on drug addiction, on AIDS
or on the habit of drinking alcohol, the respondent may not be willing to share information
with the investigator. In these situations the researcher approaches a third person or third
parties who have information about the respondent. This method of data collection is called
the indirect oral interview. The accuracy of the data depends upon the ability of the
investigator to draw information from the third parties.
Merits
Demerits
1. The degree of accuracy of information sometimes is less and it may not be reliable
of wholesale price index numbers, regular information is obtained from correspondents
appointed in different areas. This method is more economical and appropriate for extensive
investigation. But it may not always ensure accurate results because of the personal prejudice
and bias of the correspondents.
Merits
3. In case of skilled and experienced local agents, the data obtained are of good quality.
Demerits
1. The data obtained by this method may not be reliable because of personal prejudices.
A questionnaire is a set of questions for collecting the desired information from the
respondents. A questionnaire can be administered personally or mailed to the respondents. It is
an efficient method of collecting primary data when the investigator knows exactly what is
required and how to measure the variables of interest, such as:
• Behaviour - past, present or intended
• Demographic characteristics - age, gender, income and occupation
• Level of knowledge
• Attitudes and opinions
The questionnaire design process is a step-wise, structured process which begins with
converting the study objectives into information needs and specifying the populations.
Merits
Demerits
2. Most of the times the collected data by this method may not be precise
1.5.2 Methods of Editing the Primary Data
Data editing is a process which reviews and adjusts the collected data. The main objective of
editing is to detect possible errors and irregularities and to improve the quality of the data. The
task of editing is a highly specialized one and requires great care and attention. It cannot be
overlooked because, if the collected data are not representative of the population, we may get
useless findings which do not serve the purpose. If the data are collected from internal sources
or from published sources, editing becomes simple. If the data are collected from surveys,
extensive editing is needed. In either case, editing after data collection is mandatory to check
the representativeness of the data. Data editing can be done manually or with the help of a
computer. The following are some of the methods of primary data editing.
1. Interactive editing
2. Selective editing
3. Macro editing
4. Automatic editing
1. Interactive Editing
The term interactive editing is commonly used for modern computer-assisted manual
editing. Most interactive data editing tools applied at National Statistical Institutes (NSIs) allow
one to check the specified edits during or after data entry and, if necessary, to correct erroneous
data immediately. Several approaches can be followed to correct erroneous data:
• Compare the data given by the respondent with the data collected earlier from the
same respondent.
• Use common sense and your knowledge to gauge the data collected from the
respondents.
Interactive editing is a standard way to edit data. It can be used to edit both categorical
and continuous data. Interactive editing reduces the time frame needed to complete the cyclical
process of review and adjustment.
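The idea of checking specified edits during data entry can be sketched in Python. This is only a minimal illustration: the rule names, the field names (age, monthly_income, years_experience) and the limits are assumptions made for the example, not rules prescribed in this unit.

```python
# Hypothetical edit rules: each name maps to a check applied to one record.
# The fields and limits are illustrative assumptions, not prescribed values.
EDIT_RULES = {
    "age_in_range": lambda r: 0 <= r["age"] <= 120,
    "income_non_negative": lambda r: r["monthly_income"] >= 0,
    "experience_below_age": lambda r: r["years_experience"] < r["age"],
}


def failed_edits(record):
    """Return the names of the edit rules that the record violates."""
    return [name for name, rule in EDIT_RULES.items() if not rule(record)]


# Illustrative record as it might be keyed in during data entry.
entry = {"age": 25, "monthly_income": 30000, "years_experience": 40}
print(failed_edits(entry))  # the editor would review the listed fields at once
```

A record that violates no rule returns an empty list, so the editor only has to look at the records for which something is reported.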
2. Selective Editing
In selective editing we identify the influential errors and outliers. The data are split into
two parts or streams: 1. the critical stream and 2. the non-critical stream. The critical stream
consists of records that are more likely to contain influential errors. These critical records are
edited in a traditional interactive manner. The records in the non-critical stream, which are
unlikely to contain influential errors, are not edited in a computer-assisted manner.
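The split into two streams can be sketched as follows. The score used here (sampling weight multiplied by the absolute deviation of the reported value from an anticipated value) and the threshold are assumptions chosen for illustration; the unit does not prescribe a particular score function.

```python
def split_streams(records, threshold):
    """Split record ids into a critical and a non-critical stream."""
    critical, non_critical = [], []
    for rec in records:
        # Assumed score: weight times the absolute deviation of the reported
        # value from an anticipated value (for example last period's value).
        score = rec["weight"] * abs(rec["reported"] - rec["anticipated"])
        if score > threshold:
            critical.append(rec["id"])
        else:
            non_critical.append(rec["id"])
    return critical, non_critical


# Hypothetical records; the figures are invented for the example.
records = [
    {"id": "A", "weight": 10, "reported": 105, "anticipated": 100},
    {"id": "B", "weight": 10, "reported": 900, "anticipated": 100},
    {"id": "C", "weight": 1, "reported": 130, "anticipated": 100},
]
critical, non_critical = split_streams(records, threshold=100)
print(critical, non_critical)
```

Only record B has a score above the threshold, so it alone would go to interactive editing.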
3. Macro Editing
1. Aggregation Method
This method is followed in almost every statistical agency before publication: verifying
whether the figures to be published seem plausible. This is accomplished by comparing
quantities in publication tables with the same quantities in previous publications. If an unusual
value is observed, a micro-editing procedure is applied to the individual records and fields
contributing to the suspicious quantity.
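A minimal sketch of the aggregation method compares each cell of the current table with the same cell in the previous publication. The 25 per cent tolerance and the regional totals below are hypothetical values used only to show the mechanics.

```python
def suspicious_cells(current, previous, tolerance=0.25):
    """Return the table cells whose relative change exceeds the tolerance."""
    flagged = []
    for cell, value in current.items():
        old = previous.get(cell)
        if old is None or old == 0:
            continue  # no basis for comparison with the earlier publication
        change = abs(value - old) / abs(old)
        if change > tolerance:
            flagged.append(cell)
    return flagged


# Hypothetical published totals for two successive periods.
previous = {"Hyderabad": 1200, "Warangal": 800, "Khammam": 500}
current = {"Hyderabad": 1260, "Warangal": 1900, "Khammam": 480}
print(suspicious_cells(current, previous))
```

The flagged cells are the ones whose contributing records would then be examined individually.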
2. Distribution Method
Data available is used to characterize the distribution of the variables. Then all individual
values are compared with the distribution. Records containing values that could be considered
uncommon (given the distribution) are candidates for further inspection and possibly for editing.
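One simple way to sketch the distribution method is to describe the distribution by its median and its median absolute deviation (MAD) and to flag values that lie far from the median. The choice of the MAD and of the cut-off k = 5 is an assumption made for this example; other descriptions of the distribution could be used equally well.

```python
import statistics


def uncommon_values(values, k=5.0):
    """Return values lying more than k MADs away from the median."""
    centre = statistics.median(values)
    mad = statistics.median([abs(v - centre) for v in values])
    if mad == 0:
        # No spread around the median: anything different is uncommon.
        return [v for v in values if v != centre]
    return [v for v in values if abs(v - centre) > k * mad]


# Hypothetical reported values; one of them is far from the rest.
reported = [52, 48, 50, 51, 49, 47, 53, 50, 250]
print(uncommon_values(reported))
```

The values returned are candidates for further inspection, not automatic corrections.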
4. Automatic Editing
(b) Compare your answer with the one given at the end of this unit.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
1.5.4 Write about different sources of primary data.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
1.5.5 What methods would you employ to collect primary data?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
1.5.6 What is data editing? Why should we edit primary data?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
1.5 SUMMARY
In this unit you have defined and identified primary data. You have learnt to identify
sources of primary data. You have looked into various methods used to collect primary data.
You have learnt to edit primary data using various methods of editing primary data. To conclude
you have acquired skills to collect and edit primary data.
1. A company would like to conduct a survey to learn about its customers’ satisfaction.
In this case which type of data (primary or secondary) will you recommend to the company?
Which data collection methods would you suggest to the company? Why? State appropriate
reasons.
Section - A (Short Answers)
2. State the preliminary steps you would take for planning a statistical enquiry.
UNIT - 2: METHODS OF COLLECTION AND EDITING OF
SECONDARY DATA
Contents
2.0 Objectives
2.1 Introduction
2.6 Summary
2.0 OBJECTIVES
2.1 INTRODUCTION
Secondary data are data that have already been collected by someone else before
the current needs of the researcher arose. The researcher only uses the data, with a reference
to their source, and does not collect them from the field. Compared with primary data,
secondary data can be collected easily and with less time and cost.
Secondary data sources can be broadly classified into two categories. They are
1. Published sources
2. Unpublished sources.
1. Published Sources
The governmental, international and local agencies publish statistical data. Some of
them are explained below:
There are some international institutions and bodies like I.M.F, I.B.R.D, I.C.A.F.E and
U.N.O which publish regular and occasional reports. These reports provide a lot of data to the
researcher. The accuracy and the quality of the data are unquestionable.
Several departments of the Central and the State Governments regularly publish reports
on a number of subjects. Some of the important publications are: The Reserve Bank of India
Bulletin, Census of India, Statistical Abstracts of States, Agricultural Statistics of India and
Indian Trade Journal. These, together with the publications of the CSO, NSSO etc., provide a
large body of data to any researcher.
Indian Statistical Institute (I.S.I), Central Statistical Organisation (CSO), National Sample Surveys
Organisation (NSSO), Indian Council of Agricultural Research (I.C.A.R), Indian Agricultural
Statistics Research Institute (I.A.S.R.I), etc. publish data regarding various phenomena and
also present the findings of their research projects.
(E) Reports of Various Committees and Commissions Appointed by the
Government
Journals and newspapers are very important and powerful sources of secondary
data. Current and important material on statistics and socioeconomic problems can be obtained
from journals and newspapers like Economic Times, Commerce, Capital, Indian Finance, Monthly
Statistics of Trade etc.
2. UNPUBLISHED SOURCES
Unpublished data can be obtained from many unpublished sources, such as records
maintained by various government and private offices and material held in university libraries.
Researchers can gather data from these sources for a wide range of research purposes.
Since secondary data have already been obtained, it is highly desirable that a proper
scrutiny of such data is made before they are used by the investigator. In this context Prof.
Bowley rightly points out that “Secondary data should not be accepted at their face value.”
This is because such data may be biased or may suffer from inadequate sample size,
substitution errors, arithmetical errors etc. Even if there is no error, such data may not be
suitable for achieving the objectives of the researcher. Therefore, before using secondary data
the investigator should consider the following factors:
The investigator must satisfy himself that the data available are suitable to achieve
formulated research objectives.
ADEQUACY OF DATA
If the data are suitable for the purpose of investigation then we must judge whether the
data can provide adequate information to the present study.
RELIABILITY OF DATA
The reliability of data depends upon the various methods and various measurements
adopted by the organisation for data collection.
Once data have been obtained, the next step in the statistical investigation is to edit the
data. The chief objective of editing is to detect possible errors and irregularities. The task of
editing is a highly specialized one and requires great care and attention. While editing data, the
following considerations should be borne in mind:
While editing, the editor should see that each schedule and questionnaire is complete in all
respects. He should see to it that the answers to each and every question have been furnished.
If some questions are not answered and they are of vital importance, the informants should
be contacted again either personally or through correspondence. Even after all these efforts it
may happen that a few questions remain unanswered. In such cases, the editor should mark
‘No answer’ in the space provided for answers, and if the questions are of vital importance
the schedule or questionnaire should be dropped.
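The completeness rule described above can be sketched in Python. The sketch assumes that the follow-up with the informant has already failed; the question names and the choice of which question is vital are hypothetical.

```python
def edit_for_completeness(answers, vital_questions):
    """Mark unanswered questions 'No answer'.

    Returns the edited answers, or None when a vital question is still
    unanswered and the schedule should therefore be dropped.
    """
    edited = {}
    for question, answer in answers.items():
        missing = answer is None or str(answer).strip() == ""
        if missing and question in vital_questions:
            return None  # drop the whole schedule
        edited[question] = "No answer" if missing else answer
    return edited


# Hypothetical schedules; 'income' is assumed to be the vital question.
schedule_1 = {"age": 34, "occupation": "", "income": 25000}
schedule_2 = {"age": 41, "occupation": "teacher", "income": None}
print(edit_for_completeness(schedule_1, {"income"}))
print(edit_for_completeness(schedule_2, {"income"}))
```

The first schedule is kept with the blank occupation marked, while the second is dropped because the vital income question has no answer.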
At the time of editing the data for consistency, the editor should see that the answers to
questions are not contradictory in nature. If there are mutually contradictory answers, he should
try to obtain the correct answers either by referring back to the questionnaire or by contacting,
wherever possible, the informant in person. For example, if, amongst others, two questions in a
questionnaire are (a) Are you a student? (b) In which class do you study? and the reply to the
first question is ‘no’ and to the latter ‘tenth’, then there is a contradiction and it should be clarified.
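The student and class example can be written as a small consistency check. The field names is_student and study_class are assumptions made for the illustration.

```python
def is_consistent(response):
    """Return False when a non-student nevertheless reports a class."""
    is_student = response["is_student"].strip().lower() == "yes"
    has_class = response["study_class"].strip() != ""
    # Contradiction: answer (a) is 'no' but answer (b) names a class.
    return is_student or not has_class


# Hypothetical responses to questions (a) and (b).
print(is_consistent({"is_student": "no", "study_class": "tenth"}))
print(is_consistent({"is_student": "yes", "study_class": "tenth"}))
```

A response that fails the check is not corrected automatically; it is referred back to the questionnaire or to the informant, as described above.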
Homogeneity means the condition in which all the questions have been understood in
the same sense. The editor must check all the questions for uniform interpretation. For example,
if a question on income is asked and some informants have given monthly income, others annual
income and some others weekly or even daily income, no comparison can be made.
Therefore, it becomes an essential duty of the editor to check that the information supplied
by the various people is homogeneous and uniform.
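Where the reporting period of each answer is known, incomes can be brought to a common basis before comparison. The sketch below converts everything to annual income; the conversion factors (12 months, 52 weeks and 365 days per year) are simplifying assumptions, and the figures are invented.

```python
# Assumed number of reporting periods in one year.
PERIODS_PER_YEAR = {"annual": 1, "monthly": 12, "weekly": 52, "daily": 365}


def to_annual(amount, period):
    """Convert an income reported for the given period to an annual figure."""
    return amount * PERIODS_PER_YEAR[period]


# Hypothetical replies given on different bases.
replies = [(10000, "monthly"), (150000, "annual"), (2000, "weekly")]
annual_incomes = [to_annual(amount, period) for amount, period in replies]
print(annual_incomes)
```

Once all replies are on an annual basis they can be compared; replies whose period is unknown would still have to be referred back to the informant.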
As we have already seen, there are a lot of differences in the methods of collecting
primary and secondary data. Primary data, which are to be collected originally, involve an entire
plan starting with the definitions of the various terms used, units to be employed, type of
enquiry to be conducted, extent of accuracy aimed at etc. For the collection of secondary data,
a mere compilation of the existing data would be sufficient. A proper choice between the two
types of data for any particular statistical investigation is to be made after taking into
consideration the nature, objective and scope of the enquiry; the time and the finances at the
disposal of the agency; the degree of precision aimed at; and the status of the agency (whether
government, state or central, or a private institution or an individual).
In using the secondary data, it is best to obtain the data from the primary source as far
as possible. By doing so, we would at least save ourselves from the errors of transcription
which might have inadvertently crept in the secondary source. Moreover, the primary source
will also provide us with detailed discussion about the terminology used, statistical units employed,
size of the sample and the technique of sampling (if sampling method was used), methods of
data collection and analysis of results and we can ascertain ourselves if these would suit our
purpose. Now-a-days in a large number of statistical enquiries, secondary data are generally
used because fairly reliable published data on a large number of diverse fields are now available
in the publications of governments, private organizations and research institutions, agencies,
periodicals and magazines etc. In fact, primary data are collected only if there do not exist any
secondary data suited to the investigation under study. In some of the investigations both primary
as well as secondary data may be used.
Check Your Progress:
(b) Compare your answer with the one given at the end of this unit.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2.5.2 Explain the different sources of secondary data and the precautions in using secondary
data.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2.5.3 What is editing of secondary data? Why is it required?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2.5.4 What are the different types of editing of secondary data?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2.6 SUMMARY
There are two types of data, primary and secondary. Data which are collected first
hand are called primary data, and data which have already been collected and used by somebody
are called secondary data. There are two methods of collecting data: (a) Survey method
or total enumeration method and (b) Sample method. When a researcher investigates
all the units of the subject, it is called the survey method. On the other hand, if he/she
investigates only a few units of the subject and gives the result on the basis of those, it is known
as the sample survey method. There are different sources of primary and secondary
data. Some of the important sources of primary data are Direct Personal Interviews, Indirect
Oral Interviews, Information from Correspondents, Mailed questionnaire method, Schedules
sent through enumerators and so on. As all these sources or methods of collecting primary data
have their relative merits and demerits, a researcher should choose a particular method with a
lot of care. There are basically two sources of secondary data: (a) Published sources and (b)
Unpublished sources. Published sources include publications of different government and
semi-government departments, research institutions and agencies etc., whereas unpublished
sources include records maintained by different government departments and unpublished
theses of different universities etc. Editing of secondary data is necessary for different
purposes: editing for completeness, editing for consistency, editing for accuracy and editing
for homogeneity.
It is always a tough task for the researcher to choose between primary and secondary
data. Though primary data are more authentic and accurate, the time, money and labour involved
in obtaining them often prompt the researcher to go for secondary data. There is a
certain amount of doubt about the authenticity and suitability of secondary data, but with the
arrival of many government and semi-government agencies and some private institutions in the
field of data collection, most of the apprehensions in the mind of the researcher have been removed.
2.5.1 Data that have already been collected by someone else before the current needs of the
researcher.
2.5.2 Published sources and unpublished sources. Precautions are suitability, adequacy,
reliability.
2.5.3 It is required because the secondary data may be biased, may be based on an inadequate
sample size or may contain substitution errors. Editing is to detect possible errors and irregularities.
2.8 MODEL EXAMINATION QUESTIONS
UNIT - 3 : CLASSIFICATION AND TABULATION OF THE
DATA
Contents
3.0 Objectives
3.1 Introduction
3.5 Tabulation
3.6 Summary
3.0 OBJECTIVES
3.1 INTRODUCTION
In the last unit we learned about various types of data and about methods of collecting data
from primary and secondary sources. Data presented exactly as collected do not give any
insight into the problem under study. Therefore, we need to present data in a
systematic way. Such arrangement is called classification and tabulation of the data. In
data classification we arrange data into mutually exclusive groups on the basis of certain
properties.
3.2 CLASSIFICATION OF DATA
b. Exhaustive - the classes must be exhaustive in the sense that every item of data should
belong to one of the classes or categories.
c. Suitable for the purpose - it is crucial to remember the objective of the report or
analysis while classifying data. Avoid classifying the data in a manner that does not suit
the purpose of the inquiry.
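Whether a proposed set of classes is exhaustive, and whether the classes are mutually exclusive as required in the introduction to this unit, can be checked mechanically. The sketch below is only an illustration: the age values and class limits are hypothetical, and each class is assumed to include its lower limit and exclude its upper limit.

```python
def classes_containing(value, classes):
    """Return the labels of all classes whose interval contains the value."""
    return [label for label, (low, high) in classes.items() if low <= value < high]


def check_classification(values, classes):
    """Return (values in no class, values in more than one class)."""
    unclassified = [v for v in values if len(classes_containing(v, classes)) == 0]
    overlapping = [v for v in values if len(classes_containing(v, classes)) > 1]
    return unclassified, overlapping


# Hypothetical ages and two candidate sets of classes (lower limit included).
ages = [5, 17, 18, 42, 70]
good_classes = {"0-17": (0, 18), "18-59": (18, 60), "60 and above": (60, 200)}
bad_classes = {"0-18": (0, 19), "18-59": (18, 60)}
print(check_classification(ages, good_classes))
print(check_classification(ages, bad_classes))
```

With the second set of classes the value 70 belongs to no class (not exhaustive) and the value 18 belongs to two classes (not mutually exclusive).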
1. Temporal Classification
2. Spatial Classification
3. Quantitative Classification
4. Qualitative Classification
3.3.1. TEMPORAL CLASSIFICATION
Year                2002  2003  2004  2005  2006  2007  2008  2009  2010
Population (crore)  25.8  43.2  54.2  63    68    70    72    77    80
For example, sales of TV sets in various regions are given in Table 3.2.
Table 3.2
Sales of TV sets in different regions
Region Nellore Vijayawada Warangal Kareemnagar Hyderabad
Table 3.3
Sales of products in different regions
(Figures in Numbers)
Region/products  Hyderabad  Secunderabad  Khammam  Nalgonda  Warangal  Total
3.3.3 CLASSIFICATION OF QUALITATIVE DATA
If the basis of classification is some attribute like honesty, kindness, gender, literacy, region,
occupation etc. which cannot be quantified, the classification is called qualitative
classification. To classify qualitative data we construct a frequency distribution. A frequency
distribution lists the attributes observed in the data and the number of times each attribute
occurs, which is called its frequency. Table 3.4 shows satisfaction data collected randomly
from 113 guests who stayed at a hotel.
Table 3.4
No. of guests 13 12 34 42 12
Table 3.5
No. of children 0 1 2 3 4
No. of families 15 34 76 23 10
One of the most common ways of grouping data is through the use of frequency
distribution. A frequency distribution divides data into several ordered nonoverlapping classes
or groups or categories. Raw data collected needs to be grouped together in a meaningful way.
One of those meaningful presentations of the data is frequency distribution. The number of
observations in each class is known as frequency. A table which represents values or classes
and their frequencies is called frequency distribution. A frequency distribution tells us how
values are distributed over the classes. We need to understand the process of converting data
into a frequency distribution.
Example 3.1
For example guests staying at a 3 star hotel were asked to rate the quality of their
accommodations as being excellent, above average, average, below average or poor.
The ratings provided by a sample of 25 guests are shown below.
Below average Average Above Average Above Average Above Average Above Average
Above Average Below Average Below Average Poor Poor Above Average
Excellent Above Average Average Above Average Average Above Average
Average Excellent Above Average Above Average Below Average Below Average
Frequency Distribution
Rating Frequency
Poor 2
Below Average 4
Average 3
Above Average 9
Excellent 2
From the frequency distribution, we can observe that most of the guests visiting the hotel
feel that the quality of the service provided by the hotel staff is above average, since
the frequency of the rating "above average" is the highest.
We might want to compare the opinions of the guests in the example, or to express the
relative importance of each rating instead of its frequency. To do so, we convert the above
frequency distribution into a relative frequency distribution. We calculate the relative
frequency of each rating by dividing that category's frequency by the total number of
observations. The relative frequencies sum to one. The following table is the relative
frequency distribution of the example.
Rating           Frequency   Relative Frequency
Poor             2           0.10
Below Average    4           0.20
Average          3           0.15
Above Average    9           0.45
Excellent        2           0.10
We can easily convert the relative frequencies into percent frequencies by multiplying
them by 100. For instance, 45% of the respondents rated the service quality as above
average, and only 10% of the respondents rated the service quality offered by the hotel
as excellent.
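The conversion from frequencies to relative and percent frequencies can be sketched in a few lines of Python. This is only an illustration: the frequencies are copied from the frequency distribution of Example 3.1, and the variable names are our own.

```python
# Frequencies copied from the frequency distribution in Example 3.1.
frequency = {
    "Poor": 2,
    "Below Average": 4,
    "Average": 3,
    "Above Average": 9,
    "Excellent": 2,
}

total = sum(frequency.values())  # total number of observations in the table

# Relative frequency = class frequency / total number of observations.
relative = {rating: f / total for rating, f in frequency.items()}

# Percent frequency = relative frequency x 100.
percent = {rating: 100 * r for rating, r in relative.items()}

for rating, f in frequency.items():
    print(f"{rating:15s} {f:3d}  {relative[rating]:.2f}  {percent[rating]:.0f}%")
```

The relative frequencies add up to one, which is a quick check that no category has been missed.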
Quantitative data consisting of a few values that repeat many times within the data set
can be summarized with a discrete frequency distribution. To summarize such data
we arrange the values in ascending order and count their frequencies by using tally bars.
Example: 2
0 1 1 1 1 2 2 2 0 0
0 0 0 0 0 0 0 0 1 1
3 0 0 0 0 0 0 0 0 0
Solution: The above data consists of 30 observations, but only four distinct values, which
repeat many times. To summarize the data we can use the following frequency distribution,
which is also called a discrete frequency distribution.
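As a rough sketch, the tally for this example can be reproduced in Python with `collections.Counter`, which counts how often each value occurs. The data are the 30 observations listed above; the variable names are our own.

```python
from collections import Counter

# The 30 observations of the example, row by row.
data = [
    0, 1, 1, 1, 1, 2, 2, 2, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
    3, 0, 0, 0, 0, 0, 0, 0, 0, 0,
]

counts = Counter(data)               # value -> number of occurrences
distribution = sorted(counts.items())  # values in ascending order

print("Value  Frequency")
for value, freq in distribution:
    print(f"{value:5d}  {freq:9d}")
```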
3.4.4 Case II: If the data set consists of values with few repetitions:
To construct a frequency distribution the entire range of the data is divided into
various subranges. Each subrange is known as a class. Each class is represented by two
values, the lower limit and the upper limit. The lower limit represents the smallest possible
value of the class and the upper limit represents the largest possible value of the class. The
length of the class is called the class interval; it is the difference between the upper limit
and the lower limit. The number of observations in each class is called the class
frequency.
For example in the following frequency distribution 10-20, 20-30, 30-40, and so on are
called classes. 10, 20, 30, … are called lower limits of the first, second, third classes etc.
respectively. 20, 30, 40, … are called upper limits of the first, second, third classes etc.
respectively. The differences between the upper and lower limits of the classes are called
the lengths of the class intervals: 20-10 = 10, 30-20 = 10, etc.
Class Interval   Frequency
10 – 20          8
20 – 30          12
30 – 40          15
40 – 50          8
50 – 60          6
60 – 70          2
In a frequency distribution, if the upper limit of each class is included in that class,
the class intervals are called inclusive class intervals. For example, in the following frequency
distribution the upper limit of the first class, 9, is included in the first class, the upper limit
of the second class, 19, is included in the second class, and so on.
Class Interval   Frequency
0–9              8
10–19            12
20–29            13
30–39            6
In a frequency distribution, if the upper limit of each class is excluded from that class,
the class intervals are known as exclusive class intervals. For example, in the following
frequency distribution, in the first class interval 0 – 10 the value 10 is excluded from the
first class and included in the second class. Similarly, 20 is excluded from the second class
and included in the third class, and so on.
Class Interval Frequency
0–10 8
10–20 12
20–30 13
30–40 6
An open-end frequency distribution is one which has at least one of its ends open:
either the lower limit of the first class or the upper limit of the last class, or both, are not
specified. The words “below” or “less than” and “above” or “more than” are used. In the
former the value extends to −∞ and in the latter to +∞. An example of such a frequency
distribution is given in Table.
3.4.4.5 A Frequency Distribution with unequal class width
The classes of a frequency distribution may or may not be of equal width. A frequency
distribution with unequal class width is reproduced in table. Here, the width of 1st, 2nd and 5th
classes is 10, while that of 3rd is 5 and that of 4th is 15.
Class Interval   Frequency
0–10             8
10–20            12
20–25            13
25–40            6
40–50            2
For quantitative data, a relative frequency distribution identifies the proportion of the
values that fall into each class, that is,
\[ \text{Class relative frequency} = \frac{\text{Class frequency}}{\text{Total number of values}} \]
Example 1:
x    f     LCF         GCF
0    18    18          30
1    5     18+5=23     30–18=12
2    3     23+3=26     12–5=7
3    2     26+2=28     7–3=4
4    1     28+1=29     4–2=2
5    1     29+1=30     2–1=1
Example 2:
Class Interval   f     LCF          GCF
0–10             8     8            39
10–20            12    8+12=20      39–8=31
20–30            13    20+13=33     31–12=19
30–40            6     33+6=39      19–13=6
Note: Where LCF = Less than Cumulative Frequency, GCF = Greater than Cumulative
frequency
The above table quickly indicates the number of observations that are less than or more
than a particular value. Note that the cumulative frequencies correspond to the class limits
or class boundaries: the less than cumulative frequencies correspond to the upper limits
or upper boundaries of the classes, and the more than cumulative frequencies correspond
to the lower limits or lower boundaries of the classes.
Less than 10 8
Less than 20 20
Less than 30 33
Less than 40 39
Greater than 0 39
Greater than 10 31
Greater than 20 19
Greater than 30 6
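A short Python sketch of both cumulative frequency columns follows. It uses the class frequencies 8, 12, 13 and 6 of Example 2 and the standard-library function `itertools.accumulate` for running totals; the names `lcf` and `gcf` simply mirror the abbreviations in the note above.

```python
from itertools import accumulate

classes = ["0-10", "10-20", "20-30", "30-40"]
freq = [8, 12, 13, 6]

# Less than cumulative frequency: running total from the first class downwards.
lcf = list(accumulate(freq))

# Greater than cumulative frequency: running total from the last class upwards.
gcf = list(accumulate(reversed(freq)))[::-1]

print("Class   f   LCF  GCF")
for c, f, less, more in zip(classes, freq, lcf, gcf):
    print(f"{c:6s} {f:3d} {less:5d} {more:4d}")
```

The last LCF value and the first GCF value both equal the total frequency, which is a convenient check on the arithmetic.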
Construction of a Frequency Distribution
2. Fix the number of class intervals. There are no fixed rules for choosing the number of
class intervals. As a rule of thumb, the number of class intervals should be not less than 5
and not more than 15. The final decision is at the discretion of the researcher.
3. Determine the width of the class interval by using the following formula
\[ \text{Width of the class interval} = \frac{\text{Range}}{\text{Number of class intervals}} \]
Example 3:
A company organised a training programme. After the first week, the company officials
evaluated the training programme. The scores (out of 100) of 40 employees are presented
below;
32 36 31 67 65 74 43 42 39 56 78 61 46
42 39 56 78 61 46 56 34 78 75 78 56 30
65 42 45 79 64 46 54 59 56 54 53 62 58
71 85 87 73 88 65 54 78 47 56 59
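The first step in the construction of a frequency distribution is to find the range of
the given data. Here the lowest score is 30 and the highest score is 88, so
Range = 88 − 30 = 58. Taking 6 class intervals,
\[ \text{Width of the class interval} = \frac{\text{Range}}{\text{Number of class intervals}} = \frac{58}{6} \approx 9.67 \]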
To construct the frequency distribution, the first class interval must start at a value lower
than or equal to the lowest value in the ungrouped data and the last must end at a value higher
than or equal to the highest value in the ungrouped data. In this case the lowest mark is 30 and
the highest mark is 88. We can start the distribution at 30 and end it at 88 or
more, with 6 class intervals, as represented in the following table.
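The width calculation can be sketched as follows. The lowest and highest marks (30 and 88) and the choice of 6 classes come from the example; rounding the width up to the next whole number and using exclusive class limits are assumptions made here for illustration, not rules stated in the text.

```python
import math

lowest, highest = 30, 88   # smallest and largest marks in the example
num_classes = 6            # number of class intervals chosen in the example

data_range = highest - lowest           # 88 - 30 = 58
raw_width = data_range / num_classes    # about 9.67

# Assumption: round the width up to a whole number so that the classes
# cover the whole range.
width = math.ceil(raw_width)

# Exclusive class limits starting at the lowest mark.
limits = [(lowest + i * width, lowest + (i + 1) * width)
          for i in range(num_classes)]

print(f"Range = {data_range}, raw width = {raw_width:.2f}, width used = {width}")
for lower, upper in limits:
    print(f"{lower} - {upper}")
```

With these choices the last class ends at 90, which is above the highest mark, as the text requires.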
Check Your Progress:
(b) Compare your answer with the one given at the end of this unit.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3.4.5.2 Explain the following terms giving examples:
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3.4.5.3 What points are to be kept in mind while taking decisions for preparing a frequency
distribution in respect of a) The number of classes, and b) Width of the class interval?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3.4.5.4 Construct relative frequency distribution, less than and more than type cumulative
frequency distributions from the following data:
Frequency 5 8 10 12 8 7
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
“The logical listing of related quantitative data in vertical columns and horizontal rows
of numbers with sufficient explanatory and qualifying words, phrases and statements in the
form of titles, headings and explanatory notes to make clear the full meaning, context and the
origin of the data “A saying”.
Table 3.15 is based on hypothetical figures of the imports and exports of India with
four countries for the three years 1995, 1996 and 1997.
Table 3.15 Imports and Exports of India with country B during 1995-1997
(in crores of rupees)
Nepal 60 70 50 60 40 50
Bangladesh 45 50 60 70 80 50
China 50 60 70 80 60 70
Singapore 60 70 70 80 80 90
In this table it is clear that the purpose is to show the imports and the exports of India
vis-a-vis the rest of the world. Note that a particular entry of the table refers to a column and
a row. For example, an entry at the intersection of second row and fourth column indicates that
in 1996 India imported goods and services worth Rs.50 crore from country Nepal. This figure
then can be compared with other import and export figures to seek important interpretations.
Objectives of Tabulation
2. Data presented in the form of a table reveal the trend or pattern of data which otherwise
cannot be understood in a descriptive form of presentation.
Types of Tables
1) Reference tables are prepared for a general purpose and are used for storing
information, with the objective of presenting data meaningfully. Information derived from
these tables is also called secondary data. Tables presented by different government
departments, ministries, the Reserve Bank of India, Economic Surveys, etc. are examples of
reference tables, and preparing them is routine work of these departments.
Population Census tables prepared by the Registrar General of India giving detailed
information on the demographic features of India is another important example. Students are
advised to consult the latest issue of “Economic Survey” which is issued every year along with
the union budget of India.
Text tables are a special type of table. These tables are smaller in size and are
prepared from the reference tables in order to study a particular aspect.
For example, from the Population Census tables we may collect information regarding various
demographic factors such as religion, age, gender, income and languages spoken. Similarly,
from various publications of the Reserve Bank of India we may extract information, in
tabular form, on money supply or the rates of interest offered by banks for the last ten years or so.
Features of a Table
1. Table number
A table should have its own identification. For example in case of report generation
based on the data stored in various tables it is mandatory to mention the table identification from
which we are generating the report. Therefore a table should be identified by a number. Generally
table number can be mentioned at the top of the table.
2. Title
The title of the table is necessary to understand what is in the table. The title should
be brief and precise; avoid lengthy sentences. Present the title in bold or capital letters.
3. Head note
The head note, also called the prefatory note, is written just below the title. It shows the
contents and the unit of measurement, such as (rupees crore), (lakh tonnes) or (thousand bales).
It should be written in brackets and should appear at the top right, just below the title.
However, not every table needs a head note; a table of the number of students in each class,
for example, does not.
4. Stubs
Stubs are used to designate rows. They appear on the left hand column of the table.
Stubs consist of two parts: a) Stub head describes the nature of stub entry. b) Stub entry is the
description of row entries.
5. Captions
Captions also called box heads, designate the data presented in the columns of the
table. It may contain more than one column heads, and each column head may be sub-divided
into more than one sub-head. For example, we can divide the students of a college into hostellers
and non-hostellers and then again into males and females. This will help us to know the number
of male hostellers in, say, first year, second year and third year.
6. Body
The body, also called the field of the table, is its most important and bulkiest part. It
contains the relevant numerical information, about which a hint is already given in the title
of the table.
7. Foot Note
Foot note is a qualifying statement put just below the table (at the bottom). Its purpose
is to caution about the limitations of the data or certain omissions. Source of data may be the
last part of a table, yet it is important. It speaks about the authenticity of the data quoted.
Taking all these points into consideration, the format of a hypothetical table is presented below:
Caption
Column – head Column – head Column – head Column – head Column – head
Sub entries
Total (Columns)
Footnote:
Source Note:
Remarks:
Information that is not available should be indicated by the letters N.A. or by a dash (-) in
the body of the table. Ditto marks (“), ‘etc.’ and abbreviated forms should be avoided in the table.
Importance of Tables
Numerical information arranged in tabular form has distinct advantage over other forms of
presentation. First, tabulated data are easy to understand and interpret. Secondly, one can
make quick comparison between different characteristics, for example, ‘Are imports greater
than exports over all the three years?’ or ‘Are exports increasing?’ Thirdly, it opens doors for
further investigations. Fourthly, they have a more lasting impression on human mind than the
textual statements. Needless to say, that the statistical tables are used extensively in almost all
fields of human inquiry.
3.6 SUMMARY
In this unit you have understood the concepts of classification and tabulation. You have
learnt different classification techniques to classify the data. For quantitative and qualitative
data you have been taught how to construct a frequency distribution. Finally you have learnt
how to interpret a frequency distribution.
3.4.5.1 In simple frequency distribution, frequencies are tabulated class interval - wise. In
cumulative frequency distribution, frequencies of successive class intervals are added.
3.4.5.3 Number of classes is between 5 and 15, width of the class interval is range divided by
number of class intervals.
1) Distinguish between
2. Draw a blank table to show the number of candidates gender wise appearing in the
pre-university, first year, second year, and third year examinations of a university in
the faculties of Arts, Science, and Commerce in a certain year.
BLOCK - II: MEASURES OF CENTRAL TENDENCY AND
DISPERSION
In this block, you will learn about the concepts of measures of central tendency and
measures of dispersion. In unit 4, you will study the measures of central tendency, i.e., mean,
median, mode, GM and HM. In unit 5, we will discuss the measures of dispersion, i.e., Range,
QD, MD and SD. In unit 6, we shall introduce moments, central and non-central moments, and
skewness.
UNIT - 4: MEAN, MEDIAN, MODE, GM, HM
Contents
4.0 Objectives
4.1 Introduction
4.2 Mean
4.3 Median
4.4 Mode
4.7 Exercise
4.8 Summary
4.0 OBJECTIVES
Define and compute the various measures of central tendency viz., mean, median,
mode, geometric mean and harmonic mean for different forms of data, i.e., (i) raw
data, (ii) data arranged in the form of a frequency distribution, (iii) data arranged in the
form of grouped or continuous frequency distribution.
4.1 INTRODUCTION
One of the most important objectives of statistical analysis is to get one single value
that describes the characteristic of the entire mass of the data. Such a value is called the
central value or an average. The word average is very commonly used in day-to-day
conversation. For example, we often talk of an average student in a class. When we say he is an
average student, we mean that he is neither very good nor very bad, just a mediocre type of
student. However, in statistics the term average has a different meaning.
An average value is a single value within the range of the data that is used to represent
all the values in the series. It is also known as the central value or measure of central tendency.
Measure of Central Tendency
(2) Median
(3) Mode
4.2 MEAN
4.2.1 Mean
The most popular and widely used measure of central tendency is mean.
Following two methods are used for calculating arithmetic mean of an individual series.
The process of calculating the arithmetic mean, or mean, in the case of an individual series
is very simple. Let $x_1, x_2, x_3, \ldots, x_n$ be n observations; then the mean is denoted by
$\bar{x}$ and defined as
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Example 1: Marks obtained by 10 students in statistics are 52, 76, 70, 40, 56, 43, 65, 36, 48, 64.
Calculate the arithmetic mean.
Solution: Mean marks of the students is
\[ \bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{52 + 76 + 70 + 40 + 56 + 43 + 65 + 36 + 48 + 64}{10} = \frac{550}{10} = 55 \]
The arithmetic mean can also be calculated by short-cut method or indirect method.
This method reduces the amount of calculation. It involves the following steps.
(i) Presume any one value as an assumed mean (A), which is also known as working
mean or provisional mean or arbitrary average.
(ii) Find the deviation or difference of each value from the assumed mean, d = x - A.
(iii) Add all the deviations to obtain $\sum d$.
(iv) Now apply the formula
\[ \bar{x} = A + \frac{\sum d}{n} \]
Example 2: Let us solve the previous example by short-cut method.
d = x - A, where A = 56

x      d = x - 56
52     -4
76     +20
70     +14
40     -16
56     0
43     -13
65     +9
36     -20
48     -8
64     +8
       Σd = -10

\[ \text{Mean} = \bar{x} = A + \frac{\sum d}{n} = 56 + \frac{-10}{10} = 56 - 1 = 55 \]
Mean marks of the students is 55, which is same as computed in the direct method.
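Both methods can be checked with a minimal Python sketch. The marks and the assumed mean A = 56 are those of Examples 1 and 2; the variable names are our own.

```python
marks = [52, 76, 70, 40, 56, 43, 65, 36, 48, 64]
n = len(marks)

# Direct method: sum of the observations divided by their number.
direct_mean = sum(marks) / n

# Short-cut method: deviations d = x - A from an assumed mean A.
A = 56
deviations = [x - A for x in marks]
shortcut_mean = A + sum(deviations) / n

print("Sum of marks      =", sum(marks))        # 550
print("Sum of deviations =", sum(deviations))   # -10
print("Direct mean       =", direct_mean)
print("Short-cut mean    =", shortcut_mean)
```

Any other choice of A gives the same mean; a value of A near the centre of the data merely keeps the deviations small.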
In discrete frequency series, there are frequencies corresponding to the different values
in the series. There are two methods of estimating mean of the discrete series.
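(a) Direct Method: The formula for the mean of a discrete series using the direct method is
\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{N} \]
(b) Short-cut method: The formula for the mean of a discrete series using the short-cut
method is
\[ \bar{x} = A + \frac{\sum fd}{N}, \quad \text{where } N = \sum f, \; d = x - A. \]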
Example 3: Find the arithmetic mean of the following frequency distribution using (a) Direct
method and (b) Short - cut method.
x 1 2 3 4 5 6 7
f 5 9 12 17 14 10 6
x     f     fx
1     5     5
2     9     18
3     12    36
4     17    68
5     14    70
6     10    60
7     6     42
      Σf = 73     Σfx = 299

\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{299}{73} \approx 4.10 \]
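\[ \bar{x} = A + \frac{\sum fd}{N}, \quad \text{where } N = \sum f, \; d = x - A \]

x     f     d = x - A = x - 4     fd
1     5     -3                    -15
2     9     -2                    -18
3     12    -1                    -12
4     17    0                     0
5     14    1                     14
6     10    2                     20
7     6     3                     18
      Σf = 73                     Σfd = 7

\[ \text{Mean} = \bar{x} = 4 + \frac{7}{73} = \frac{292 + 7}{73} = \frac{299}{73} \approx 4.10 \]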
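The two calculations of Example 3 can be sketched in Python as follows. The values of x and f are those of the example and A = 4 is the assumed mean used in the short-cut table; the variable names are our own.

```python
x = [1, 2, 3, 4, 5, 6, 7]
f = [5, 9, 12, 17, 14, 10, 6]

N = sum(f)  # total frequency

# Direct method: mean = sum(f * x) / N.
sum_fx = sum(fi * xi for fi, xi in zip(f, x))
direct_mean = sum_fx / N

# Short-cut method: mean = A + sum(f * d) / N with d = x - A.
A = 4
sum_fd = sum(fi * (xi - A) for fi, xi in zip(f, x))
shortcut_mean = A + sum_fd / N

print(f"N = {N}, sum fx = {sum_fx}, sum fd = {sum_fd}")
print(f"Direct mean    = {direct_mean:.4f}")
print(f"Short-cut mean = {shortcut_mean:.4f}")
```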
4.2.3 Calculation of Mean for Continuous Series (Grouped Data)
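(a) Direct Method
The formula for the mean in a continuous series is given by
\[ \bar{x} = \frac{\sum fm}{\sum f}, \quad \text{where } m = \text{mid-value of the class interval.} \]
(b) Short-cut Method
The formula for the mean in a continuous series is given by
\[ \bar{x} = A + \frac{\sum fd}{\sum f}, \quad \text{where } d = \text{deviation of the mid-point} = m - A. \]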
Example: For the following data compute arithmetic mean by (i) direct method, (ii) short-cut
method.
\[ \bar{x} = \frac{\sum fm}{\sum f}, \quad \text{where } m = \text{mid-value of the class interval.} \]
\[ \bar{x} = \frac{\sum fm}{\sum f} = \frac{3300}{100} = 33 \]
Average marks = 33.
\[ \bar{x} = A + \frac{\sum fd}{\sum f}, \quad \text{where } d = \text{deviation of the mid-point} = m - A. \]

Marks     Mid-value m     d = m - A = m - 35     f      fd
0-10      5               -30                    5      -150
10-20     15              -20                    10     -200
20-30     25              -10                    25     -250
30-40     35              0                      30     0
40-50     45              10                     20     200
50-60     55              20                     10     200
                                                 Σf = 100     Σfd = -200

\[ \text{Mean} = \bar{x} = 35 + \frac{-200}{100} = 35 + (-2) = 35 - 2 = 33 \]
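For grouped data the only extra step is to replace each class by its mid-value. The sketch below uses the classes and frequencies shown in the short-cut table above, with the same assumed mean A = 35; the variable names are our own.

```python
# Class limits and frequencies taken from the table above.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
f = [5, 10, 25, 30, 20, 10]

# Mid-value of each class: m = (lower limit + upper limit) / 2.
m = [(lower + upper) / 2 for lower, upper in classes]

total_f = sum(f)

# Direct method: mean = sum(f * m) / sum(f).
sum_fm = sum(fi * mi for fi, mi in zip(f, m))
direct_mean = sum_fm / total_f

# Short-cut method: mean = A + sum(f * d) / sum(f) with d = m - A.
A = 35
sum_fd = sum(fi * (mi - A) for fi, mi in zip(f, m))
shortcut_mean = A + sum_fd / total_f

print(f"sum f = {total_f}, sum fm = {sum_fm}, sum fd = {sum_fd}")
print(f"Direct mean = {direct_mean}, short-cut mean = {shortcut_mean}")
```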
4.2.4 Merits and Demerits of Arithmetic Mean
Merits
1. It is rigidly defined.
3. If the number of items is sufficiently large, it is more accurate and more reliable.
5. It is possible to calculate even if some of the details of the data are lacking.
Demerits
4. It cannot be calculated for open-end classes.
5. It may lead to fallacious conclusions, if the details of the data from which it is computed
are not given.
(b) Compare your answer with the one given at the end of this unit.
1. The following are the wages (Rs.) of the workers 215, 265, 275, 280, 412, 495, 672,
890, 1115, 1245. Calculate the mean wage of the workers by (i) direct method (ii)
short-cut method.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2. Following table gives the wages paid to 125 workers in a factory. Calculate the arithmetic
mean of the wages using (i) direct method (ii) short-cut method.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3. Calculate the mean for the following frequency distribution using (i) direct method (ii)
short-cut method.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
4.3 MEDIAN
Second measure of central tendency is median. It is defined as the value of the middle
item (or the mean of two middle items) when the items are arranged in an increasing or decreasing
order of magnitude. Method to determine the median is explained below.
Step 1: Arrange the given observations either in ascending or descending order of magnitude.
Generally, we follow ascending order.
Case (i) If N is odd, the median is the value of the $\left(\frac{N+1}{2}\right)$th item.
Case (ii) If N is even, the median is the average of the $\left(\frac{N}{2}\right)$th and
$\left(\frac{N+2}{2}\right)$th items.
Example: Find the median of the following items 5, 19, 40, 11, 55, 32, 21, 60, 58, 38, 30.
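Solution: A.O: 5, 11, 19, 21, 30, 32, 38, 40, 55, 58, 60
N = 11 (odd)
\[ \text{Median} = \left(\frac{N+1}{2}\right)\text{th item} = \left(\frac{11+1}{2}\right)\text{th item} = 6\text{th item} = 32 \]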
Example: Find the median for the following data 50, 30, 5, 20, 10, 25, 15, 45, 35, 40.
Solution: A.O: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50
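N = 10 (even)
The $\frac{N}{2}$th and $\frac{N+2}{2}$th items are the $\frac{10}{2} = 5$th and
$\frac{12}{2} = 6$th items, i.e. 25 and 30.
\[ \text{Median} = \frac{25 + 30}{2} = 27.5 \]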
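The two cases can be sketched as a small Python function. The function name `median` and its structure are our own; the two data sets are those of the examples above.

```python
def median(values):
    """Median of an individual series, following the two cases above."""
    ordered = sorted(values)        # Step 1: arrange in ascending order
    n = len(ordered)
    if n % 2 == 1:
        # Case (i): N odd, take the ((N + 1) / 2)th item (1-based position).
        return ordered[(n + 1) // 2 - 1]
    # Case (ii): N even, average the (N / 2)th and ((N + 2) / 2)th items.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


odd_case = median([5, 19, 40, 11, 55, 32, 21, 60, 58, 38, 30])
even_case = median([50, 30, 5, 20, 10, 25, 15, 45, 35, 40])
print(odd_case, even_case)
```

The subtraction of 1 in the indices only converts the textbook's 1-based item positions into Python's 0-based list positions.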
4.3.2 Calculation of Median in a discrete frequency distribution
Step 1: Find $\frac{N}{2}$, where $N = \sum f$.
Step 2: Locate the cumulative frequency just greater than $\frac{N}{2}$.
x 1 2 3 4 5 6 7 8 9
f 8 10 11 16 20 25 15 9 6
Solution:
x f Cumulative
frequency (cf)
1 8 8
2 10 18
3 11 29
4 16 45
5 20 65
6 25 90
7 15 105
8 9 114
9 6 120
Σf = 120
Here $N = \sum f = 120$, so $\frac{N}{2} = \frac{120}{2} = 60$.
The cumulative frequency just greater than $\frac{N}{2}$ is 65, and the value of x
corresponding to 65 is 5.
Median is 5.
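The same search through the cumulative frequencies can be sketched in Python. The values of x and f are those of the example; `itertools.accumulate` builds the cumulative frequency column, and the remaining names are our own.

```python
from itertools import accumulate

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
f = [8, 10, 11, 16, 20, 25, 15, 9, 6]

N = sum(f)                    # total frequency
cf = list(accumulate(f))      # cumulative frequencies

# The median is the value of x whose cumulative frequency is the first
# one just greater than N / 2.
median = next(xi for xi, c in zip(x, cf) if c > N / 2)

print("N/2 =", N / 2)
print("Cumulative frequencies:", cf)
print("Median =", median)
```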
4.3.3 Calculation of Median for a Continuous Frequency Distribution
In the case of continuous frequency distribution the following steps are involved to
calculate median.
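Step 2: Locate the cumulative frequency (c.f.) which is just greater than $\frac{N}{2}$; the
corresponding class is called the median class.
\[ \text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) \]
where l = lower limit of the median class, h = width of the median class, f = frequency of the
median class, c = cumulative frequency of the class preceding the median class,
and $N = \sum f$.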
Class interval 0-5 5-10 10-15 15-20 20-25 25-30 30-35 35-40 40-45 45-50
Frequency 5 7 10 18 20 12 8 6 4 1
Solution:
Class Cumulative
Frequency
Interval frequency (cf)
0-5 5 5
5-10 7 12
10-15 10 22
15-20 18 40
20-25 20 60
25-30 12 72
30-35 8 80
35-40 6 86
40-45 4 90
45-50 1 91
N = Σf = 91
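\[ \frac{N}{2} = \frac{91}{2} = 45.5 \]
From the above table, the cumulative frequency just greater than 45.5 is 60 and the
corresponding class, 20-25, is the median class. Here l = 20, h = 5, f = 20 and c = 40.
\[ \text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 20 + \frac{5}{20}(45.5 - 40) = 20 + 1.375 = 21.375 \]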
Weight (in gms) 410-419 420-429 430-439 440-449 450-459 460-469 470-479
No. of apples 14 20 42 54 45 18 7
Solution: This is an example of an inclusive series, in which the upper limit of the first
class is 419 but the lower limit of the second class is 420, leaving a gap of 1. We
should convert it to an exclusive series by subtracting 0.5 from the lower limits and adding
0.5 to the upper limits.
\[ \frac{N}{2} = \frac{200}{2} = 100 \]
The c.f. just greater than 100 is 130. Therefore, 439.5-449.5 is the median class.
\[ \text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 439.5 + \frac{10}{54}(100 - 76) = 439.5 + 4.44 = 443.94 \]
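A minimal Python sketch of the interpolation formula Median = l + (h/f)(N/2 − c) follows. The function name `grouped_median` is our own, and it assumes the classes are passed as exclusive class boundaries in ascending order. It is applied to the two examples above; for the apples, the boundaries are written out after the 0.5 correction.

```python
def grouped_median(boundaries, freqs):
    """Median of a continuous frequency distribution.

    boundaries: list of (lower, upper) class boundaries in ascending order.
    freqs: the corresponding class frequencies.
    """
    N = sum(freqs)
    c = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if c + f > N / 2:            # first c.f. just greater than N/2
            h = upper - lower        # width of the median class
            return lower + (h / f) * (N / 2 - c)
        c += f
    raise ValueError("frequencies must contain at least one positive value")


# First example: classes 0-5, 5-10, ..., 45-50.
marks_classes = [(i, i + 5) for i in range(0, 50, 5)]
marks_freqs = [5, 7, 10, 18, 20, 12, 8, 6, 4, 1]
marks_median = grouped_median(marks_classes, marks_freqs)

# Second example: inclusive classes 410-419, ..., converted to boundaries.
apple_classes = [(409.5 + 10 * i, 419.5 + 10 * i) for i in range(7)]
apple_freqs = [14, 20, 42, 54, 45, 18, 7]
apple_median = grouped_median(apple_classes, apple_freqs)

print(marks_median, round(apple_median, 2))
```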
Merits
4. Median can be located even for qualitative factors such as ability, honesty, etc.
Demerits
1. A slight change in the series may bring drastic change in median value.
3. It is not suitable for further mathematical treatment except its use in mean deviation.
(b) Compare your answer with the one given at the end of this unit.
4. Marks obtained by 9 students in statistics are 55, 48, 63, 78, 36, 45, 67, 59, 61. Find
median.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
5. Find the median size of the following series.
Size X 4 5 6 7 8 9 10
Frequency 6 12 15 28 20 14 5
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
Marks 5-10 10-15 15-20 20-25 25-30 30-35 35-40 40-45 45-50
Frequency 7 15 24 31 42 30 26 15 10
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
7. Calculate the median from the following data.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
4.4 MODE
Mode is another measure of central tendency. The mode, or modal value, is the
value in a series of observations which occurs with the greatest frequency.
Solution: Here are only ten observations and the observation 11 has the maximum frequency
3.
In the case of discrete frequency distribution mode can be located simply by inspection.
Here the variate having maximum frequency will be taken as mode.
x 10 12 14 16 18 20 22
f 4 6 10 11 21 10 5
Solution: In the above discrete frequency distribution the variate value 18 has the maximum
frequency, 21. Therefore, 18 is the mode of the given data.
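If the data are given in class intervals, the following formula is used to calculate the
mode.
\[ \text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c \]
where l = lower limit of the modal class, $f_1$ = frequency of the modal class, $f_0$ = frequency
of the class preceding the modal class, $f_2$ = frequency of the class succeeding the modal
class, and c = width of the modal class.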
Example: From the following data regarding the incomes of 100 families, find the average
income by means of the mode.
Income No. of
families
Upto 1500 18
1500-2500 10
2500-4000 15 – f0
Modal class 4000-5000 25 – f1
5000-6000 12 – f2
6000-8000 11
8000-10000 7
Above 10000 2
From the given data, highest frequency is 25. The class corresponding to 25 is
4000-5000 and is called modal class.
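Here l = 4000, $f_1$ = 25, $f_0$ = 15, $f_2$ = 12 and c = 1000.
\[ \text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c = 4000 + \frac{25 - 15}{50 - 15 - 12} \times 1000 = 4000 + \frac{10}{23} \times 1000 \]
= 4000 + 434.78
= 4,434.78.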
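The substitution can be verified with a few lines of Python. The values of l, c, f0, f1 and f2 are read from the income table above; the variable names simply mirror the symbols in the formula.

```python
# Modal class 4000-5000, read from the income table.
l = 4000      # lower limit of the modal class
c = 1000      # width of the modal class
f1 = 25       # frequency of the modal class
f0 = 15       # frequency of the preceding class (2500-4000)
f2 = 12       # frequency of the succeeding class (5000-6000)

# Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * c
mode = l + (f1 - f0) / (2 * f1 - f0 - f2) * c

print(f"Mode = {mode:.2f}")
```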
Merits
3. It is not affected by extreme items. It can be obtained even if the extreme values are
not given.
6. Mode has been defined as the most typical value of a distribution. Therefore, it is a
useful average for many practical situations, such as, average size of shoe, average
price of a commodity, the average type of dress, average wages and so on.
Demerits
3. It is not capable of being handled algebraically as its value is not based on all the
observations.
4. The mode does not exist in many cases, while there may be more than one mode in
other cases; it is not useful as an average in such situations.
5. The value of mode is significantly affected by the size of the class interval which is the
basis of grouping the frequencies.
(b) Compare your answer with the one given at the end of this unit.
2, 8, 8, 3, 6, 7, 2, 8, 9, 2, 8
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
9. Calculate mode from the following data.
x 3 4 7 8 9 11 12
f 2 6 5 14 10 6 3
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
10. Calculate the mode for the following continuous frequency distribution.
Size 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100
Frequency 5 7 10 12 18 10 6 3 2 1
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
4.4.4 Determination of Mode from Mean and Median Using an Empirical Relationship
In a symmetrical distribution the mean, median and mode are identical, i.e., they have the
same value. However, for moderately asymmetrical distributions, the values of the mean, median
and mode are observed to have the following empirical relationship.
Mode = 3 Median − 2 Mean
If any two values out of the three are known, the third can be calculated by using the
above relation.
Example: In a moderately asymmetrical distribution, the mode and the mean are 64.2 and 67.4
respectively. Find the median.
199
Median = 66.33 .
3
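The empirical relationship is easy to check numerically. The following Python sketch is only an illustration; the function names are hypothetical and are not part of the course material.

```python
def median_from_mean_and_mode(mean, mode):
    """Solve Mode = 3*Median - 2*Mean for the median."""
    return (mode + 2 * mean) / 3


def mode_from_mean_and_median(mean, median):
    """Empirical relation for moderately asymmetrical data."""
    return 3 * median - 2 * mean


# Worked example from the text: mode = 64.2, mean = 67.4
median = median_from_mean_and_mode(mean=67.4, mode=64.2)
print(round(median, 2))
```

Substituting the computed median back into the second function returns the original mode, which is a quick way to confirm that the rearrangement is correct.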
The most important measures of central tendency are mean, median and mode because
of their wide usefulness, simplicity and general applicability. Under certain conditions other
measures of central tendency may be useful and we shall, therefore, consider the geometric
mean.
The geometric mean (GM) is the nth root of the product of n positive values. The
formula is
$GM = (x_1 \cdot x_2 \cdots x_n)^{1/n}$.
For a frequency distribution,
$GM = (x_1^{f_1} \, x_2^{f_2} \cdots x_n^{f_n})^{1/n}$, where $n = \sum f_i$.
For example, the geometric mean of 4, 16 and 27 is
$GM = (4 \times 16 \times 27)^{1/3} = 12$.
Example: The frequency table shows the number of goals scored in netball by Iris in 12 games
played.
No. of goals 1 2 3 4 5 6
frequency 2 3 2 1 2 2
Solution: $GM = (x_1^{f_1} \, x_2^{f_2} \cdots x_n^{f_n})^{1/n} = (1^2 \cdot 2^3 \cdots 6^2)^{1/12} \approx 2.83$, where $n = \sum f_i = 12$.
Geometric mean is defined as the nth root of the product of n positive items. When the
number of items is three or more, multiplying the numbers and extracting the root by hand
becomes tedious. To simplify the calculation, logarithms are used. The geometric mean
is then calculated as follows:
In individual series, $GM = \text{Antilog}\left(\dfrac{\sum \log x}{N}\right)$.
In discrete series, $GM = \text{Antilog}\left(\dfrac{\sum f \log x}{N}\right)$, where $N = \sum f$.
In continuous series, $GM = \text{Antilog}\left(\dfrac{\sum f \log m}{N}\right)$, where m = mid-value.
Example: Find the GM of 2, 2.5, 7, 15.05 and 6.75
Solution:
x        log x
2        0.30103
2.5      0.39794
7        0.84509
15.05    1.17753
6.75     0.82930
Total    3.55089
$GM = \text{Antilog}\left(\dfrac{\sum \log x}{N}\right) = \text{Antilog}\left(\dfrac{3.55089}{5}\right) = \text{Antilog}(0.71018) = 5.13$.
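The logarithmic method can be sketched in Python as follows. This is an illustrative sketch, not part of the original text; the function name `geometric_mean_by_logs` is hypothetical, and base-10 logarithms are used to mirror the log tables above.

```python
import math


def geometric_mean_by_logs(values):
    """GM = Antilog(sum(log x) / N), using base-10 logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log  # taking the antilog


# Data from the worked example
gm = geometric_mean_by_logs([2, 2.5, 7, 15.05, 6.75])
print(round(gm, 2))
```

Working through the logarithms avoids forming the full product, which matters for long series where the product could overflow or lose precision.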
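x    110   115   118   119   120
f      4    11    21     6     2
Solution:
x      f     log x     f log x
110    4     2.0414     8.1656
115    11    2.0607    22.6677
118    21    2.0719    43.5099
119    6     2.0755    12.4530
120    2     2.0792     4.1584
       $\sum f = 44$    $\sum f \log x = 90.9546$
Geometric Mean (GM) = $\text{Antilog}\left(\dfrac{\sum f \log x}{\sum f}\right) = \text{Antilog}\left(\dfrac{90.9546}{44}\right) = \text{Antilog}(2.0672) = 116.7$.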
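For a discrete series each logarithm is weighted by its frequency. A minimal Python sketch under that formula is given below; the function name `weighted_geometric_mean` is an assumption made for illustration.

```python
import math


def weighted_geometric_mean(values, freqs):
    """GM = Antilog(sum(f * log x) / sum(f)) for a discrete series."""
    total_f = sum(freqs)
    sum_f_log = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (sum_f_log / total_f)


# Data from the worked example
x = [110, 115, 118, 119, 120]
f = [4, 11, 21, 6, 2]
gm = weighted_geometric_mean(x, f)
print(round(gm, 1))
```

For a continuous series the same function can be used by passing the class mid-values in place of `values`.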
Example: Find the geometric mean for the following data.
frequency: 1 2 6 6 5
$GM = \text{Antilog}\left(\dfrac{\sum f \log m}{\sum f}\right) = \text{Antilog}\left(\dfrac{28.9692}{20}\right) = \text{Antilog}(1.4485) = 28.08$.
Merits
Demerits
1. It is defined only for positive values of the variable. GM cannot be computed meaningfully if
any observation is negative or zero.
2. It is difficult to understand.
Check Your Progress:
(b) Compare your answer with the one given at the end of this unit.
11. Daily income of ten families of a particular place is given below. Find geometric mean.
85, 70, 15, 75, 500, 8, 45, 250, 40, 36
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
12. Calculate the geometric mean of the following data.
x 50 60 70 80 90 100 110
f 2 4 7 10 9 6 2
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
13. Calculate the geometric mean of the following data.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
4.6 HARMONIC MEAN (H.M)
Harmonic mean, like geometric mean, is a measure of central tendency used in solving special
types of problems. Harmonic mean is the reciprocal of the arithmetic mean of the reciprocals
of the values of the variable. It is a specialised average suited to problems in which the
variable is expressed as a rate, such as a quantity per unit of time.
Individual Series
$HM = \dfrac{N}{\sum \frac{1}{x}}$
where $\sum \frac{1}{x}$ = sum of the reciprocal values of the variable x,
N = number of items.
Example: From the data 5, 10, 17, 24, 30 calculate harmonic mean.
Solution:
x        Reciprocal values of x (1/x)
5        0.2000
10       0.1000
17       0.0588
24       0.0417
30       0.0333
N = 5    $\sum \frac{1}{x} = 0.4338$
Harmonic mean = $HM = \dfrac{N}{\sum \frac{1}{x}} = \dfrac{5}{0.4338} = 11.53$.
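The calculation for an individual series can be sketched in Python as below. The function name `harmonic_mean` is chosen here for illustration; the cross-check uses `statistics.harmonic_mean` from the Python standard library.

```python
import statistics


def harmonic_mean(values):
    """HM = N / sum(1/x) for an individual series of positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs strictly positive values")
    return len(values) / sum(1 / v for v in values)


data = [5, 10, 17, 24, 30]
hm = harmonic_mean(data)
print(round(hm, 2))
# Cross-check against the standard library implementation
print(round(statistics.harmonic_mean(data), 2))
```

The small difference from the hand calculation arises only because the table rounds each reciprocal to four decimal places.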
Discrete Series
If the data is in discrete series, harmonic mean is calculated with the formula
$HM = \dfrac{N}{\sum f \cdot \frac{1}{x}}$, where $N = \sum f$ = total frequency.
Example:Find the harmonic mean of the marks obtained in a class test given below.
Marks 11 12 13 14 15
No. of students 3 7 8 5 2
Solution:
Marks (x)    No. of students (f)    1/x       f (1/x)
11           3                      0.0909    0.2727
12           7                      0.0833    0.5831
13           8                      0.0769    0.6152
14           5                      0.0714    0.3570
15           2                      0.0667    0.1334
             N = 25                           $\sum f \cdot \frac{1}{x} = 1.9614$
Harmonic mean = $\dfrac{N}{\sum f \cdot \frac{1}{x}} = \dfrac{25}{1.9614} = 12.75$.
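With frequencies, each reciprocal is weighted by f. The sketch below is an illustration of the discrete-series formula; the function name `weighted_harmonic_mean` is hypothetical.

```python
def weighted_harmonic_mean(values, freqs):
    """HM = N / sum(f * 1/x), where N = sum(f)."""
    total_f = sum(freqs)
    return total_f / sum(f / x for x, f in zip(values, freqs))


# Data from the worked example
marks = [11, 12, 13, 14, 15]
students = [3, 7, 8, 5, 2]
hm = weighted_harmonic_mean(marks, students)
print(round(hm, 2))
```

Without the four-decimal rounding used in the table, the result is about 12.74, which agrees with the tabulated 12.75 to within rounding error. For a continuous series, pass the class mid-values as `values`.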
Continuous Series
Harmonic mean for continuous series can be calculated by using the following formula.
$HM = \dfrac{N}{\sum f \cdot \frac{1}{m}}$, where m = mid-value of the class and $N = \sum f$.
Example: From the following data compute the value of harmonic mean.
Solution:
Harmonic Mean (HM) = $\dfrac{N}{\sum f \cdot \frac{1}{m}} = \dfrac{30}{1.004} = 29.88$.
Merits
1. It is rigidly defined
5. It is very useful for measuring average relative changes in certain types of rates or
ratios.
Demerits
3. It gives more weight to small observation and thus may lead to fallacious results.
However, in view of this property, the harmonic mean is more useful when more
weights are to be given to smaller observations.
Check Your Progress:
(b) Compare your answer with the one given at the end of this unit.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
15. Number of tomatoes per plant is given below. Calculate the harmonic mean.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
16. Find the harmonic mean of the following data.
Class Interval 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80
Frequency 5 8 11 21 35 30 22 18
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
4.7 EXERCISE
(Ans. 54)
Size of item 10 20 30 40 50 60 70
Frequency 8 12 18 26 16 12 8
(Ans. 54)
(Ans. 220)
5. The following is the frequency distribution of ages of 670 students of a school. Compute
the median of the data.
(Ans. 9 years)
(Ans. 52.62)
7. From the following data determine mode.
6, 5, 3, 4, 3, 7, 8, 5, 9, 5, 4
Marks 10 20 30 40 50 60
No. of students 8 23 45 65 75 80
(Ans. 27.78)
(Ans. 71.34)
10. Daily income of ten families of a particular place is given below. Find out geometric
mean.
(Ans. GM = 58.03)
(Ans. GM = 12.58)
(Ans. 11.526)
14. From the following data compute the value of harmonic mean.
Marks 10 20 25 40 50
No. of students 20 30 50 15 5
(Ans. 20.08)
15. From the following data compute the value of harmonic mean.
(Ans. 29.88)
4.8 SUMMARY
One of the most important aspects of describing distribution is the central value around
which the observations are distributed. A statistical measure used for representing the centre or
central value of a set of observations is known as a measure of central tendency or measure of
location. There are three measures of central tendency in common use: arithmetic mean,
median and mode.
Arithmetic mean is obtained by dividing the sum of the given observations by their
number. Apart from having most of the characteristics of a good average, arithmetic mean
possesses some simple but important algebraic properties. Because of these properties arithmetic
mean is one of the most widely used averages.
The median is the value of the variable which divides the group of observations into
two equal parts.
Mode is another measure of location. The mode of a distribution is the value around
which the observations tend to be most heavily concentrated.
The remaining two averages are geometric mean and harmonic mean. The geometric
mean of a set of n observations is defined as the nth root of their product. GM is often used in
the construction of index numbers.
The harmonic mean of a set of observations is the reciprocal of the arithmetic average
of the reciprocals of the observations. If all items in a series have the same value then the
arithmetic mean, GM and HM of the series coincide,
i.e., AM = GM = HM.
4.9 CHECK YOUR PROGRESS - MODEL ANSWERS
2. Mean = Rs.227.92
3. Mean = 25.404
4. Median = 59
5. median = 7
6. Median = 27.74
7. Median = 239.5
8. Mode = 8
9. Mode = 8
11. GM = 58.03
12. GM = 80.02
14. HM = 0.006
15. HM = 21.91
16. HM = 33.52
1. What are measures of central tendency? Explain their objectives and functions.
8. Define harmonic mean, explain the merits and demerits of harmonic mean.
Section - B (Short Answers)
1. What is average?
3. Define median.
5. What is mode?
UNIT - 5: RANGE, QD, MD, SD
Contents
5.0 Objectives
5.1 Introduction
5.4 Range
5.8 Exercise
5.9 Summary
5.0 OBJECTIVES
Define and compute different measures of dispersion viz., range, quartile deviation,
mean deviation and standard deviation for different types of data.
5.1 INTRODUCTION
The different measures of central tendency give us an idea of the central values, whereas
measures of dispersion are measures of spread about an average. The central value may
be the same in two or more distributions that nevertheless differ in dispersion. The measures of
dispersion help us in studying this important characteristic of a distribution. They measure the
extent to which there are differences between individual observations and the central value.
5.2 CHARACTERISTICS OF AN IDEAL MEASURE OF
DISPERSION
For ascertaining the deviations from the central value there are certain specific measures
of dispersion. The important measures of dispersion are:
1. Range
2. Quartile deviation
3. Mean deviation
4. Standard deviation
Measures of Dispersion
1. The measures which express the spread of observations in terms of distance between
the values of the selected observations. These are also termed as distance measures,
eg., range and quartile deviation.
2. The measures which express the spread of observations in terms of the average of the
deviations of observations from some central value eg., mean deviation and standard
deviation.
Relative measure of dispersion is called coefficient of dispersion. The relative measure
of dispersion is the ratio of the absolute measure of dispersion to the mean and is expressed as
a percentage.
5.4 RANGE
Coefficient of range = $\dfrac{L - S}{L + S}$, where L is the largest and S the smallest observation.
Example: Find the range and coefficient of range for the following data.
Range = L - S = 14 - 5 = 9
Coefficient of range = $\dfrac{L - S}{L + S} = \dfrac{14 - 5}{14 + 5} = \dfrac{9}{19} = 0.47$.
Merits
2. In certain types of problems like quality control, weather forecasts, share price analysis
etc., range is most widely used.
Demerits
5.5 QUARTILE DEVIATION
The range as a measure of dispersion has certain limitations. It is based only on the two extreme
observations. For this reason measures based on the quartiles have been developed: the
interquartile range, $Q_3 - Q_1$, and the quartile deviation, which is half of it. Quartiles divide
the ordered observations into four approximately equal parts. Quartile deviation is a better
measure than the range, as it makes use of the middle 50% of the data, but it ignores the other
50% of the data. Quartile deviation is given by
$Q.D = \dfrac{Q_3 - Q_1}{2}$, coefficient of $Q.D = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$.
Example: Calculate quartile deviation and its coefficient for the following data.
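$Q_1 = 15$
$Q_3$ = size of $3\left(\dfrac{N + 1}{4}\right)$th item = size of $\dfrac{3 \times 8}{4}$ = 6th item
$Q_3 = 40$
Quartile deviation (Q.D) = $\dfrac{Q_3 - Q_1}{2} = \dfrac{40 - 15}{2} = 12.5$.
Coefficient of Q.D = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{40 - 15}{40 + 15} = \dfrac{25}{55} = 0.455$.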
Example: Calculate quartile deviation and its coefficient for the following data.
Marks 10 20 30 40 50 60
No. of students 4 7 15 8 7 2
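Solution:
$Q_1$ = size of $\left(\dfrac{N + 1}{4}\right)$th item = size of $\dfrac{43 + 1}{4}$ = 11th item
$Q_1 = 20$
$Q_3$ = size of $3\left(\dfrac{N + 1}{4}\right)$th item = size of $3 \times \dfrac{44}{4}$ = 33rd item
$Q_3 = 40$
Quartile deviation (Q.D) = $\dfrac{Q_3 - Q_1}{2} = \dfrac{40 - 20}{2} = 10$.
Coefficient of Q.D = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{40 - 20}{40 + 20} = \dfrac{20}{60} = \dfrac{1}{3} = 0.333$.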
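Locating the quartiles of a discrete series through cumulative frequencies can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes the values are listed in ascending order, as in the example.

```python
from itertools import accumulate


def value_at_position(values, freqs, position):
    """Return the first value whose cumulative frequency reaches `position`."""
    for value, cum_f in zip(values, accumulate(freqs)):
        if cum_f >= position:
            return value
    raise ValueError("position exceeds the total frequency")


def quartile_deviation(values, freqs):
    """Q1 and Q3 as the (N+1)/4-th and 3(N+1)/4-th items, then QD and its coefficient."""
    n = sum(freqs)
    q1 = value_at_position(values, freqs, (n + 1) / 4)
    q3 = value_at_position(values, freqs, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


# Data from the worked example
marks = [10, 20, 30, 40, 50, 60]
students = [4, 7, 15, 8, 7, 2]
q1, q3, qd, coefficient = quartile_deviation(marks, students)
print(q1, q3, qd, round(coefficient, 3))
```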
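Solution: To find QD and its coefficient, we first find the values of Q1 and Q3 from the given
data.
$Q_1$ is the value of the $\dfrac{N}{4} = \dfrac{799}{4} = 199.75$th item; thus 30-35 is the first quartile class.
$Q_1 = l_1 + \dfrac{\frac{N}{4} - c}{f} \times h$
where $l_1$ is the lower limit of the quartile class, c the cumulative frequency of the preceding
class, f the frequency of the quartile class and h the class width.
$Q_1 = 30 + \dfrac{199.75 - 120}{100} \times 5 = 30 + \dfrac{398.75}{100} = 30 + 3.99 = 33.99 \approx 34$ years.
$Q_3$ is the value of the $\dfrac{3N}{4} = \dfrac{3 \times 799}{4} = 599.25$th item, which lies in the 45-50 class.
$Q_3 = l_1 + \dfrac{\frac{3N}{4} - c}{f} \times h = 45 + \dfrac{599.25 - 550}{120} \times 5 = 45 + \dfrac{5 \times 49.25}{120} = 45 + 2.05 = 47.05 \approx 47$ years.
$Q.D = \dfrac{Q_3 - Q_1}{2} = \dfrac{47 - 34}{2} = \dfrac{13}{2} = 6.5$
Coefficient of QD = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{47 - 34}{47 + 34} = \dfrac{13}{81} = 0.16$.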
(b) Compare your answer with the one given at the end of this unit.
1. For the following data find range, quartile deviation. Also find their coefficients.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2. Calculate QD and its coefficient from the following data.
Weight (Kg) 60 61 62 63 65 70 75 80
No. of workers 1 3 5 7 10 3 1 1
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3. Compute quartile deviation for the following data.
No. of students 6 5 8 15 7 6 3
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
Merits and Demerits of Quartile deviation
Merits
Demerits
1. It is not based on all the items. It is based on two positional values of Q1 and Q3 and
ignores the extreme 50% of the items.
5.6 MEAN DEVIATION (MD)
Previously we have studied range and quartile deviation, which are positional
measures of dispersion. Their computation is not based on all the observations. In a strict sense,
they are not measures of dispersion, as they do not measure the scatter of the observations
around an average.
Mean deviation of a series is the arithmetic mean of the absolute deviations of various
items from some central value, such as mean, median or mode. Symbolically, the mean deviation
about mean, median or mode can be expressed as follows:
$M.D_{\bar{X}} = \dfrac{1}{N} \sum |X - \bar{X}|$
$M.D_{M_d} = \dfrac{1}{N} \sum |X - M_d|$
$M.D_{M_0} = \dfrac{1}{N} \sum |X - M_0|$
where N = the number of observations.
In case of grouped data or frequency distributions, the above formulae can be given as:
$M.D_{\bar{X}} = \dfrac{1}{N} \sum f |X - \bar{X}|$
$M.D_{M_d} = \dfrac{1}{N} \sum f |X - M_d|$
$M.D_{M_0} = \dfrac{1}{N} \sum f |X - M_0|$
Coefficient of Mean Deviation:
Coefficient of MD = $\dfrac{MD}{\text{Mean or Median or Mode}}$
Example: Compute mean deviation and its coefficient from the following data.
10, 70, 50, 53, 20, 95, 55, 42, 60, 48, 80
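Solution:
X       Absolute deviation from mean |X - 53|
10      43
70      17
50      3
53      0
20      33
95      42
55      2
42      11
60      7
48      5
80      27
$\sum X = 583$     $\sum |X - \bar{X}| = 190$
Mean = $\bar{X} = \dfrac{\sum X}{N} = \dfrac{583}{11} = 53$
$M.D_{\bar{X}} = \dfrac{1}{N} \sum |X - \bar{X}| = \dfrac{1}{11} \times 190 = 17.27$.
Coefficient of MD = $\dfrac{M.D}{\text{Mean}} = \dfrac{17.27}{53} = 0.33$.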
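The mean deviation about the mean and its coefficient can be computed with a few lines of Python. This is a sketch for illustration; the function name `mean_deviation` and its optional `center` argument are assumptions, not notation from the text.

```python
def mean_deviation(values, center=None):
    """Average absolute deviation from `center` (the arithmetic mean by default)."""
    n = len(values)
    if center is None:
        center = sum(values) / n
    return sum(abs(v - center) for v in values) / n


# Data from the worked example
data = [10, 70, 50, 53, 20, 95, 55, 42, 60, 48, 80]
mean = sum(data) / len(data)
md = mean_deviation(data)
coefficient = md / mean
print(mean, round(md, 2), round(coefficient, 2))
```

Passing the median or the mode as `center` gives the mean deviation about that average instead.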
Example: Compute the mean deviation from the mean for the following data.
Size 2 4 6 8 10 12 14 16
Frequency 2 2 4 5 3 2 1 1
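Solution:
x        f     fx     |x - 8|    f|x - 8|
2        2     4      6          12
4        2     8      4          8
6        4     24     2          8
8        5     40     0          0
10       3     30     2          6
12       2     24     4          8
14       1     14     6          6
16       1     16     8          8
Total    20    160               56
$\bar{X} = \dfrac{\sum fx}{N} = \dfrac{160}{20} = 8$
$M.D_{\bar{X}} = \dfrac{1}{N} \sum f |x - \bar{X}| = \dfrac{56}{20} = 2.8$.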
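For a frequency distribution the absolute deviations are weighted by the frequencies. The following Python sketch illustrates this; the function name `grouped_mean_deviation` is hypothetical.

```python
def grouped_mean_deviation(values, freqs):
    """Return (mean, mean deviation about the mean) for a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    md = sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n
    return mean, md


# Data from the worked example
size = [2, 4, 6, 8, 10, 12, 14, 16]
frequency = [2, 2, 4, 5, 3, 2, 1, 1]
mean, md = grouped_mean_deviation(size, frequency)
print(mean, md)
```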
Frequency 5 50 84 32 10 6
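Solution:
CI     Mid value (x)     f     fx     |x - 51.07|     f|x - 51.07|
$\bar{X} = \dfrac{\sum fx}{N} = \dfrac{9550}{187} = 51.07$
MD = $\dfrac{1}{N} \sum f |x - \bar{x}| = \dfrac{1}{187} \times 2697.37 = 14.42$
Coefficient of MD = MD/Mean = $\dfrac{14.42}{51.07} = 0.28$.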
Check Your Progress:
(b) Compare your answer with the one given at the end of this unit.
4. Calculate the mean deviation and its coefficient for the following data.
2, 5, 3, 6, 3, 4, 4
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
5. Calculate mean deviation for the following series.
x 10 11 12 13 14
f 3 12 18 12 3
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
6. Compute mean deviation from mean, for the following data.
No. of students 6 5 8 15 7 6 3
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
Merits and Demerits of Mean Deviation
Merits
2. It is rigidly defined
Demerits
3. It is rarely used.
4. Algebraic positive and negative signs are ignored. It is mathematically unsound and
illogical.
Introduction
The concept of standard deviation was introduced by Karl Pearson in 1893. It is by far the
most important and widely used measure of dispersion. Its significance lies in the fact
that it is free from those defects from which the earlier methods suffer and satisfies most of the
properties of a good measure of dispersion. Standard deviation is also known as root mean
square deviation for the reason that it is the square root of the mean of the squared deviations
from the arithmetic mean. Standard deviation is denoted by the small Greek letter $\sigma$ (read as
sigma).
The standard deviation measures the absolute dispersion (or variability) of a distribution:
the greater the standard deviation, the greater the
magnitude of the deviations of the values from their mean. A small standard deviation
means a high degree of uniformity of the observations as well as homogeneity of the series; a
large standard deviation means just the opposite. Thus, if we have two or more comparable
series with identical or nearly identical means, it is the distribution with the smallest standard
deviation that has the most representative mean. Hence standard deviation is extremely useful
in judging the representativeness of the mean.
Computation of Standard Deviation
Standard Deviation-Individual Series
In case of individual observations standard deviation may be computed by applying any
of the following two methods:
1. By taking deviation of the items from the actual mean.
2. By taking deviations of the items from an assumed mean.
Deviations taken from Actual Mean (Direct Method)
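When deviations are taken from actual mean the following formula is applied:
$S.D = \sigma = \sqrt{\dfrac{1}{N} \sum (X - \bar{X})^2}$
When deviations d = X - A are taken from an assumed mean A, the formula becomes
$\sigma = \sqrt{\dfrac{\sum d^2}{N} - \left(\dfrac{\sum d}{N}\right)^2}$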
Example 1: Calculate S.D. from the following data with the help of direct method.
X: 10 11 17 25 7 13 21 10 12 14
Solution:
Here, $\bar{X} = \dfrac{\sum X}{N} = \dfrac{140}{10} = 14$ (a whole number)
$S.D = \sqrt{\dfrac{1}{N} \sum (X - \bar{X})^2} = \sqrt{\dfrac{1}{10} \times 274} = \sqrt{27.4} = 5.23$.
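The direct method can be sketched in Python as shown below. The function name `sd_direct` is chosen for illustration; `statistics.pstdev` from the Python standard library computes the same population standard deviation and is used here only as a cross-check.

```python
import math
import statistics


def sd_direct(values):
    """sigma = sqrt(sum((X - mean)^2) / N), deviations taken from the actual mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


# Data from Example 1
data = [10, 11, 17, 25, 7, 13, 21, 10, 12, 14]
sd = sd_direct(data)
print(round(sd, 2))
print(round(statistics.pstdev(data), 2))  # same quantity from the standard library
```

Note that the divisor is N, as in the text, and not N - 1; `statistics.stdev` would give the sample standard deviation instead.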
Example 2.
Calculate standard deviation with the help of assumed mean for the following data.
240, 260, 290, 245, 255, 288, 272, 263, 277, 251.
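Let A = 264 and d = X - A.
$\sigma = \sqrt{\dfrac{\sum d^2}{N} - \left(\dfrac{\sum d}{N}\right)^2}$
$\sum d^2 = 2689, \ \sum d = 1, \ N = 10$
$\sigma = \sqrt{\dfrac{2689}{10} - \left(\dfrac{1}{10}\right)^2} = \sqrt{268.9 - 0.01} = \sqrt{268.89} = 16.40$.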
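A short Python sketch of the assumed mean method follows. The function name `sd_assumed_mean` is hypothetical. The second call uses a different assumed mean (250, chosen arbitrarily) to illustrate that the result does not depend on the choice of A.

```python
import math


def sd_assumed_mean(values, assumed_mean):
    """sigma = sqrt(sum(d^2)/N - (sum(d)/N)^2), where d = X - A."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    return math.sqrt(sum(di * di for di in d) / n - (sum(d) / n) ** 2)


# Data from Example 2
data = [240, 260, 290, 245, 255, 288, 272, 263, 277, 251]
sd_a = sd_assumed_mean(data, 264)      # A = 264, as in the text
sd_other = sd_assumed_mean(data, 250)  # any other A gives the same answer
print(round(sd_a, 2), round(sd_other, 2))
```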
For calculating standard deviation in discrete series, any of the following methods may
be applied:
Actual mean method: When this method is applied, deviations are taken from the actual mean, i.e., we find
$X - \bar{X}$ and denote these deviations by x. These deviations are then squared and multiplied
by the respective frequencies. The following formula is applied:
$\sigma = \sqrt{\dfrac{\sum f x^2}{N}}$, where $x = X - \bar{X}$.
However, in practice this method is rarely used because if the actual mean is a fraction
the calculations take a lot of time.
Assumed mean method: When deviations are taken from an assumed mean A,
$\sigma = \sqrt{\dfrac{\sum f d^2}{N} - \left(\dfrac{\sum f d}{N}\right)^2}$, where d = (X - A).
Step deviation method: When this method is used we take deviations of midpoints from an assumed mean and
divide these deviations by the width of the class interval, i.e., ‘i’. In case class intervals are unequal,
we divide the deviations of midpoints by a common factor and use ‘c’ instead of ‘i’ in
the formula. The formula for calculating standard deviation is:
$\sigma = \sqrt{\dfrac{\sum f d^2}{N} - \left(\dfrac{\sum f d}{N}\right)^2} \times i$
where $d = \dfrac{x - A}{i}$ and i = class interval.
Example 3:
Use direct method to calculate the S.D. of the following discrete frequency distribution.
$S.D = \sqrt{\dfrac{1}{N} \sum f (X - \bar{X})^2} = \sqrt{\dfrac{1}{100} \times 237.64} = \sqrt{2.3764} = 1.54$.
Example 4:Calculate the standard deviation from the data given below.
Frequency 3 7 22 60 85 32 8
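Size of item (X)    f      d = (x - 6.5)    fd     fd^2
3.5                 3      -3               -9     27
4.5                 7      -2               -14    28
5.5                 22     -1               -22    22
6.5                 60     0                0      0
7.5                 85     +1               +85    85
8.5                 32     +2               +64    128
9.5                 8      +3               +24    72
Total               217                     128    362
$\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$
$\sum fd^2 = 362, \ \sum fd = 128, \ N = 217$
$\sigma = \sqrt{\dfrac{362}{217} - \left(\dfrac{128}{217}\right)^2} = \sqrt{1.6682 - 0.3479} = \sqrt{1.3203} = 1.15$.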
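The assumed mean and step deviation methods for a frequency distribution differ only in the scaling factor i, so one function can cover both. The Python sketch below is an illustration; the name `sd_step_deviation` and its `step` parameter are assumptions. With `step=1` it reduces to the assumed mean method used in Example 4.

```python
import math


def sd_step_deviation(values, freqs, assumed_mean, step=1.0):
    """sigma = sqrt(sum(f d^2)/N - (sum(f d)/N)^2) * i, where d = (x - A) / i."""
    n = sum(freqs)
    d = [(x - assumed_mean) / step for x in values]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * step


# Data from Example 4
sizes = [3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]
freqs = [3, 7, 22, 60, 85, 32, 8]
sd = sd_step_deviation(sizes, freqs, assumed_mean=6.5)
print(round(sd, 3))
```

For a continuous series, pass the class mid-values as `values` and the class width as `step`.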
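Solution: Calculation of Standard deviation
$\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} \times i$
Here, $\sum fd^2 = 240, \ \sum fd = 36, \ N = 50, \ i = 5$
$\sigma = \sqrt{\dfrac{240}{50} - \left(\dfrac{36}{50}\right)^2} \times 5 = \sqrt{4.8 - 0.5184} \times 5 = 2.0692 \times 5 = 10.35$.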
In continuous series any of the methods discussed above for discrete frequency
distribution can be used. However, in practice it is the step deviation method that is most used.
The formula is
$\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} \times i$
where $d = \dfrac{m - A}{i}$, m = mid-value of the class and i = class interval.
Example 6: Calculate mean and standard deviation of following frequency distribution of
marks:
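$\bar{X} = A + \dfrac{\sum fd}{N} \times i = 35 + \dfrac{118}{200} \times 10 = 35 + 5.9 = 40.9$
$\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} \times i = \sqrt{\dfrac{510}{200} - \left(\dfrac{118}{200}\right)^2} \times 10$
= 1.4839 x 10 = 14.839.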
Merits
1. It is rigidly defined.
3. It is amenable to further algebraic treatment which makes it the most important and
widely used measure of dispersion.
5. s.d enables us to determine the reliability of means of two or more series having equal
means. In such a situation, a series having minimum S.D. will have the most
representative mean.
Demerits
(b) Compare your answer with the one given at the end of this unit.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
8. Calculate the standard deviation for the following data.
x 7 8 9 10 11 12
f 13 13 18 17 15 14
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
9. Calculate the standard deviation for the following data.
180– 190–
C.I 130–140 140–150 150–160 160–170 170–180
190 200
Frequency 22 44 66 90 77 51 31
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
5.8 EXERCISE
1. From the following data, find range and quartile deviation. Also determine their respective
coefficients.
78, 80, 80, 82, 82, 84, 84, 86, 86, 88, 88, 90
2. Find out the value of quartile deviation for the following data.
Roll No. 1 2 3 4 5 6 7
Marks 20 28 40 12 30 15 50
Frequency 4 5 6 10 11 9 4 1
Ans. QD = 6.458
Ans. MD = 8.5714
5. Compute mean deviation from the following series.
x 10 11 12 13 14
f 3 12 18 12 3
Ans. MD = 0.75
Frequency 7 12 18 25 16 14 8
MD = 13.14
7. Using short-cut method and step-deviation method, obtain standard deviation of the
following data.
420, 440, 400, 420, 470, 480, 440, 480, 450, 500
Ans. 30.33
8. Data recorded on the number of seeds per fruit. Calculate the standard deviation.
Ans. SD = 6.75
Ans. SD = 3.77
Size 6 7 8 9 10 11 12
Frequency 3 6 9 13 8 5 4
Ans. SD = 1.6
Ans. 11.88
12. Calculate the standard deviation from the following data.
Frequency 3 12 21 28 19 12 5
Ans. SD = 7.21
5.9 SUMMARY
An average or the central value alone cannot describe the distribution adequately.
Thus, the measure of scatteredness of observations around their average is necessary to get
better description of the data. The extent or degree to which data tend to spread around an
average is called dispersion. Measures of dispersion may be absolute or relative. Absolute
measures of dispersion are expressed in the unit of given observation. These measures are
useful to compare variations in two or more distributions in which the units of measurement are the
same. Relative measures of dispersion are unitless numbers useful for comparing the variation
in two or more distributions in which units of measurement are different.
Absolute measures of dispersion are (1) range (2) inter quartile range and quartile
deviation (3) mean deviation (4) standard deviation
The Range is defined as the difference between two extreme observations, inter quartile
range is the difference between third and first quartiles.
Quartile deviation is given by $QD = \dfrac{Q_3 - Q_1}{2}$.
Mean deviation is defined as the arithmetic mean of the absolute deviations of various
items from an average value such as mean, median or mode.
Standard deviation is defined as the positive square root of the arithmetic mean of the
squares of deviations of observations from the arithmetic mean. The square of standard deviation
is known as variance. The standard deviation is the best measure of dispersion as it satisfies
most of the desirable properties.
1. Range = L - S = 15 - 2 = 13
QD = 3.5
Coefficient of range = $\dfrac{L - S}{L + S} = \dfrac{15 - 2}{15 + 2} = \dfrac{13}{17} = 0.76$.
Coefficient of QD = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{11 - 4}{11 + 4} = \dfrac{7}{15} = 0.467$.
2. QD = 1.5 kg, coefficient of QD = 0.024
3. QD = $\dfrac{1}{2}(Q_3 - Q_1) = \dfrac{1}{2}(44.64 - 22.19) = 11.23$.
4. MD = 1, coefficient of MD = 0.25
5. MD = 0.75
6. MD = 13.184
7. SD = 8.58
8. SD = 3.15
9. SD = 16.41
3. Explain quartile deviation and coefficient of quartile deviation. Also mention its merits
and demerits.
5. Explain the properties of standard deviation. Why is it called the best measure of dispersion?
5. Define standard deviation and state any four of its important characteristics.
UNIT - 6 : MOMENTS, CENTRAL AND NONCENTRAL
MOMENTS, SKEWNESS
Contents
6.0 Objectives
6.1 Introduction
6.2 Moments
6.5 Kurtosis
6.6 Exercise
6.7 Summary
6.0 OBJECTIVES
Define and compute central moments ($\mu_r$) and non-central moments ($\mu_r'$) for the
data.
6.1 INTRODUCTION
The concept of moments has been introduced in statistics in an attempt to represent the
maximum information about the original data in relatively few quantities. A moment may be defined as the
arithmetic mean of various powers of deviations taken in any distribution from the mean $\bar{X}$.
Accordingly, there can be moments of various order viz. first order, second order, third order,
fourth order and so on.
6.2 MOMENTS
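The rth raw moment of X about the origin is denoted by $\mu_r'$ and defined as
$\mu_r' = \dfrac{1}{N} \sum_i f_i X_i^r$, where $N = \sum_i f_i$.
The first raw moment about the origin, $\mu_1'$, is called the mean of X and is denoted by $\bar{X}$,
i.e., $\mu_1' = \dfrac{\sum_i f_i X_i}{N} = \bar{X}$.
The rth moment of a variable X about any point A is also denoted by $\mu_r'$ and defined as
$\mu_r' = \dfrac{1}{N} \sum_i f_i (X_i - A)^r = \dfrac{1}{N} \sum_i f_i d_i^r$, where $d_i = X_i - A$.
The rth moment of a variable X about the mean $\bar{X}$ is called the rth central moment, denoted
by $\mu_r$, and given by
$\mu_r = \dfrac{1}{N} \sum_i f_i (X_i - \bar{X})^r$.
In particular,
$\mu_0 = \dfrac{1}{N} \sum_i f_i (X_i - \bar{X})^0 = 1$,
$\mu_1 = \dfrac{1}{N} \sum_i f_i (X_i - \bar{X}) = 0$,
$\mu_2 = \dfrac{1}{N} \sum_i f_i (X_i - \bar{X})^2 = \sigma^2$.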
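Central moments can be expressed in terms of the moments about an arbitrary point A:
$\mu_r = \dfrac{1}{N} \sum_i f_i (X_i - \bar{X})^r$
$= \dfrac{1}{N} \sum_i f_i (X_i - A + A - \bar{X})^r$, where A is a constant
$= \dfrac{1}{N} \sum_i f_i (d_i + A - \bar{X})^r$, where $d_i = X_i - A$
$= \dfrac{1}{N} \sum_i f_i (d_i - \mu_1')^r$, since $\mu_1' = \bar{X} - A$
$= \dfrac{1}{N} \sum_i f_i \left[ d_i^r - {}^rC_1 \, d_i^{r-1} \mu_1' + {}^rC_2 \, d_i^{r-2} (\mu_1')^2 - {}^rC_3 \, d_i^{r-3} (\mu_1')^3 + \dots + (-1)^r (\mu_1')^r \right]$
$\mu_r = \mu_r' - {}^rC_1 \, \mu_{r-1}' \mu_1' + {}^rC_2 \, \mu_{r-2}' (\mu_1')^2 - \dots + (-1)^r (\mu_1')^r$ ... (1)
Putting r = 2, 3, 4 in (1):
$\mu_2 = \mu_2' - (\mu_1')^2$
$\mu_3 = \mu_3' - 3 \mu_2' \mu_1' + 2 (\mu_1')^3$
$\mu_4 = \mu_4' - 4 \mu_3' \mu_1' + 6 \mu_2' (\mu_1')^2 - 3 (\mu_1')^4$
Similarly,
$\mu_r' = \dfrac{1}{N} \sum_i f_i (x_i - A)^r = \dfrac{1}{N} \sum_i f_i (x_i - \bar{x} + \bar{x} - A)^r$
$= \dfrac{1}{N} \sum_i f_i (y_i + \mu_1')^r$, where $y_i = x_i - \bar{x}$ and $\bar{x} - A = \mu_1'$
$= \dfrac{1}{N} \sum_i f_i \left[ y_i^r + {}^rC_1 \, y_i^{r-1} \mu_1' + {}^rC_2 \, y_i^{r-2} (\mu_1')^2 + \dots + (\mu_1')^r \right]$
$\mu_r' = \mu_r + {}^rC_1 \, \mu_{r-1} \mu_1' + {}^rC_2 \, \mu_{r-2} (\mu_1')^2 + \dots + (\mu_1')^r$
Since $\mu_1 = 0$, putting r = 2, 3, 4 gives
$\mu_2' = \mu_2 + (\mu_1')^2$
$\mu_3' = \mu_3 + 3 \mu_2 \mu_1' + (\mu_1')^3$
$\mu_4' = \mu_4 + 4 \mu_3 \mu_1' + 6 \mu_2 (\mu_1')^2 + (\mu_1')^4$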
These formulas enable us to find the moments about any arbitrary point, once the mean
and the moments about mean are known.
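The relation between moments about an arbitrary point and central moments can be verified numerically. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the data are the size and frequency values used in an earlier mean deviation example, and the point A = 6 is an arbitrary choice.

```python
from math import comb


def moment_about(values, freqs, point, r):
    """r-th moment about `point`: (1/N) * sum(f * (x - point)^r)."""
    n = sum(freqs)
    return sum(f * (x - point) ** r for x, f in zip(values, freqs)) / n


def central_from_raw(raw):
    """Convert moments about A (raw[0] = 1, raw[1], ...) to central moments.

    Uses mu_r = sum_j C(r, j) * raw[r - j] * (-raw[1])^j, the binomial expansion.
    """
    m1 = raw[1]
    return [
        sum(comb(r, j) * raw[r - j] * (-m1) ** j for j in range(r + 1))
        for r in range(len(raw))
    ]


x = [2, 4, 6, 8, 10, 12, 14, 16]
f = [2, 2, 4, 5, 3, 2, 1, 1]
A = 6  # arbitrary point chosen for illustration

raw = [moment_about(x, f, A, r) for r in range(5)]
converted = central_from_raw(raw)

mean = moment_about(x, f, 0, 1)
direct = [moment_about(x, f, mean, r) for r in range(5)]
print([round(m, 4) for m in converted])
print([round(m, 4) for m in direct])
```

Both printed lists should agree, and the second and third entries of each illustrate that the zeroth central moment is 1 and the first is 0.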
6.2.2 Effect of Change of Origin and Scale on Moments
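Theorem: Central moments are independent of change of origin but not of scale.
Proof: Let $u_i = \dfrac{x_i - A}{h}$, so that $x_i = A + h u_i$,
$\bar{x} = A + h \bar{u}$
and $x_i - \bar{x} = h (u_i - \bar{u})$.
$\mu_r = \dfrac{1}{N} \sum_i f_i (x_i - \bar{x})^r$
$= \dfrac{1}{N} \sum_i f_i \, h^r (u_i - \bar{u})^r$
$= h^r \cdot \dfrac{1}{N} \sum_i f_i (u_i - \bar{u})^r$
Thus, the rth central moment of the variable x is $h^r$ times the rth central moment of the variable u.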
6.2.3 Theorem: Noncentral moments are independent of change of origin but not scale.
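Proof: Let $u_i = \dfrac{x_i - A}{h}$, so that $x_i - A = h u_i$,
$\bar{x} = A + h \bar{u}$ and $x_i - \bar{x} = h (u_i - \bar{u})$.
Now, $\mu_r' = \dfrac{1}{N} \sum_i f_i (x_i - A)^r = \dfrac{1}{N} \sum_i f_i (h u_i)^r = h^r \cdot \dfrac{1}{N} \sum_i f_i u_i^r$,
i.e., the rth non-central moment of the variable x about A is $h^r$ times the rth non-central moment of the
variable u about the origin.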
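The effect of a change of origin and scale on central moments can be illustrated numerically. In the Python sketch below the function name is hypothetical, the data are the marks and frequencies from an earlier quartile deviation example, and the values A = 30 and h = 10 are arbitrary choices made for illustration.

```python
import math


def central_moment(values, freqs, r):
    """r-th central moment: (1/N) * sum(f * (x - mean)^r)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n


x = [10, 20, 30, 40, 50, 60]
f = [4, 7, 15, 8, 7, 2]
A, h = 30, 10  # arbitrary origin and scale
u = [(xi - A) / h for xi in x]

# For each r, compare mu_r(x) with h^r * mu_r(u)
checks = [
    math.isclose(central_moment(x, f, r), h ** r * central_moment(u, f, r), rel_tol=1e-9)
    for r in (2, 3, 4)
]
print(checks)
```

Changing A leaves every comparison unchanged, while changing h rescales the rth moment by $h^r$, in line with the theorem.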
6.3 SHEPPARD’S CORRECTION FOR MOMENTS
(ii) The frequencies taper off to zero in both directions, the effect due to grouping at the
midpoints of the intervals can be corrected by the following formulae, where i is the width of the class interval.
$\mu_2$ (corrected) = $\mu_2 - \dfrac{i^2}{12}$
$\mu_3$ (corrected) = $\mu_3$
$\mu_4$ (corrected) = $\mu_4 - \dfrac{i^2}{2} \mu_2 + \dfrac{7}{240} i^4$
Note: First and third moments require no correction as positive and negative deviations
themselves compensate the grouping error.
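Sheppard's corrections are simple to apply once the uncorrected central moments are known. The following Python sketch is an illustration; the function name `sheppard_corrections` is hypothetical and the input values are made-up numbers, not taken from the text.

```python
def sheppard_corrections(mu2, mu3, mu4, width):
    """Apply Sheppard's corrections for grouping; `width` is the class interval i."""
    mu2_corrected = mu2 - width ** 2 / 12
    mu3_corrected = mu3  # the third moment needs no correction
    mu4_corrected = mu4 - 0.5 * width ** 2 * mu2 + (7 / 240) * width ** 4
    return mu2_corrected, mu3_corrected, mu4_corrected


# Hypothetical uncorrected moments with class width i = 2
mu2_c, mu3_c, mu4_c = sheppard_corrections(mu2=10.0, mu3=2.0, mu4=250.0, width=2)
print(round(mu2_c, 4), mu3_c, round(mu4_c, 4))
```

Note that the correction to the fourth moment uses the uncorrected second moment, as in the formula above.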
Skewness
Skewness means ‘lack of symmetry’. We study skewness to have an idea about the
shape of the curve which we can draw with the help of the given data. A distribution is said to
be skewed if
(i) Mean, median and mode fall at different points, i.e., Mean $\neq$ Median $\neq$ Mode
(iii) The curve drawn with the help of the given data is not symmetrical but stretched more
to one side than to the other.
Skewness can be measured in absolute terms by finding the difference between the
mean and the mode or mean and median.
Absolute skewness = Mean - Mode, or
Absolute skewness = Mean - Median
Any positive value obtained by any of the above formulae is taken as the extent of
the positive skewness. Any negative value obtained by any of the above formulae is taken as
the extent of the negative skewness. If the result is zero, then Mean = Median = Mode.
Relative Skewness:
As in dispersion, for comparing two series we do not calculate these absolute measures
but we calculate the relative measures called the coefficients of skewness which are pure
numbers independent of units of measurement. The following are the measures of skewness.
According to Karl Pearson, the skewness of a series should be measured by the difference between
the mean and the mode, because the mean is the average most affected by the
extreme values of a series and the mode is the average least affected by them. Thus
$SK_P$ = Mean - Mode
If the mode is ill-defined, i.e., when it has more than one value, Karl Pearson proposed to
find the skewness by the following formula:
$SK_P$ = 3(Mean - Median)
This formula is based on the empirical relationship between mean, median and mode,
Mode = 3 Median - 2 Mean, which gives
Mean - Mode = Mean - (3 Median - 2 Mean)
= 3 Mean - 3 Median
= 3(Mean - Median)
For a relative measure of skewness, Pearson suggested that the standard deviation should
be taken as the divisor of the absolute skewness. This is because the standard deviation
possesses good algebraic properties and is regarded as the best measure of dispersion. Karl Pearson’s
coefficient of skewness is given by
Coefficient of skewness = $SK_P = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \dfrac{M - M_0}{\sigma}$
or, when the mode is ill-defined,
Coefficient of skewness = $SK_P = \dfrac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} = \dfrac{3(M - M_d)}{\sigma}$
6.4.2 Remark
The limits of $SK_P = \dfrac{3(M - M_d)}{\sigma}$ are $\pm 3$. However, in practice these limits are rarely attained.
Example: Calculate skewness and its coefficient from the following data using Karl Pearson’s
formula.
Wages (Rs.) 10 11 12 13 14 15 16
No. of workers 4 7 9 15 8 5 2
Solution:
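x       f      x²     fx     fx²
10      4     100     40     400
11      7     121     77     847
12      9     144    108    1296
13     15     169    195    2535
14      8     196    112    1568
15      5     225     75    1125
16      2     256     32     512
Total  50            639    8283
$$\text{Mean} = \bar{X} = \frac{\sum fx}{\sum f} = \frac{639}{50} = 12.78$$
$$\text{Standard deviation} = \sqrt{\frac{\sum fx^2}{N} - \left(\frac{\sum fx}{N}\right)^2} = \sqrt{\frac{8283}{50} - \left(\frac{639}{50}\right)^2}$$
$$= \sqrt{165.66 - 163.33} = \sqrt{2.33} = 1.527$$
The highest frequency is 15, so
Mode = 13.
Skewness = Mean - Mode = 12.78 - 13 = -0.22
$$SK_P = \frac{M - M_o}{\sigma} = \frac{12.78 - 13}{1.527} = -0.144$$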
The negative skewness indicates that the tail on the left side of the distribution is longer or
fatter than the tail on the right side.
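The calculation above can be checked with a short script. This is a minimal sketch, not part of the course text: the function name `pearson_skewness` is an illustrative assumption, and the mode is simply taken as the value with the highest frequency, as in the worked solution.

```python
from math import sqrt

# Wage distribution from the example: wages (Rs.) and number of workers
wages = [10, 11, 12, 13, 14, 15, 16]
workers = [4, 7, 9, 15, 8, 5, 2]

def pearson_skewness(values, freqs):
    """Return (mean, sd, mode, absolute skewness, Pearson's coefficient)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    mean_sq = sum(f * x * x for x, f in zip(values, freqs)) / n
    sd = sqrt(mean_sq - mean ** 2)          # sqrt(sum fx^2 / N - (sum fx / N)^2)
    mode = values[freqs.index(max(freqs))]  # value with the highest frequency
    skew = mean - mode                      # absolute measure: Mean - Mode
    return mean, sd, mode, skew, skew / sd  # last item is the relative measure

mean, sd, mode, skew, coeff = pearson_skewness(wages, workers)
print(round(mean, 2), round(sd, 3), mode, round(skew, 2), round(coeff, 3))
```

The printed values can be compared with the mean, standard deviation, mode and coefficient obtained by hand.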
Bowley suggested a measure of skewness based on the median and both the
quartiles. Bowley’s coefficient of skewness is also known as the quartile coefficient of skewness.
It is used when the mode is ill defined and extreme observations are present in the data, and
also when the distribution has open-end classes or unequal class intervals. In these situations
Pearson’s coefficient of skewness cannot be used.
$$SK_B = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$
where $Q_1$ and $Q_3$ are the quartiles. In a symmetrical distribution the quartiles are
equidistant from the median, as the diagram (not reproduced here) shows.
In the diagram AB = BC, i.e.
$$Q_3 - \text{Median} = \text{Median} - Q_1$$
$$Q_3 + Q_1 - 2\,\text{Median} = 0$$
Example: Find the coefficient of skewness from the quartiles and median.
Size 4–8 8–12 12–16 16–20 20–24 24–28 28–32 32–36 36–40
f 6 10 18 30 15 12 10 6 2
Solution:
Size Frequency cf
4–8 6 6
8–12 10 16
12–16 18 34
16–20 30 64
20–24 15 79
24–28 12 91
28–32 10 101
32–36 6 107
36–40 2 109
N = 109
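To find $Q_1$:
Size of the $\frac{N}{4}$th item $= \frac{109}{4} = 27.25$, which falls in the c.f. 34; thus the class corresponding
to $Q_1$ is 12–16.
Using the formula
$$Q_1 = l_1 + \left(\frac{N}{4} - c\right)\frac{h}{f}$$
with $l_1 = 12$, $\frac{N}{4} = 27.25$, $c = 16$, $h = l_2 - l_1 = 4$, $f = 18$:
$$Q_1 = 12 + (27.25 - 16)\cdot\frac{4}{18} = 14.5$$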
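To find the Median:
$\frac{N}{2} = \frac{109}{2} = 54.5$th item, which falls in the c.f. 64, i.e. the class 16–20.
$$\text{Median} = l_1 + \left(\frac{N}{2} - c\right)\frac{h}{f} = 16 + (54.5 - 34)\cdot\frac{4}{30}$$
= 16 + (20.5) x (0.133)
= 16 + 2.73
= 18.73.
To find $Q_3$:
$\frac{3N}{4}$th item $= \frac{3 \times 109}{4} = 81.75$th item, which falls in the c.f. 91, i.e. the class 24–28.
$$Q_3 = l_1 + \left(\frac{3N}{4} - c\right)\frac{h}{f} = 24 + (81.75 - 79)\cdot\frac{4}{12}$$
= 24.92.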
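$$\text{Bowley’s coefficient of skewness} = SK_B = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} = \frac{24.92 + 14.5 - 2(18.73)}{24.92 - 14.5} = \frac{1.96}{10.42} = 0.188$$
This gives an idea about the concentration of higher or lower data values around the
central value of the data. $SK_B = 0.188$ indicates that the data show a positive asymmetry.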
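The quartile calculations above all use the same interpolation formula, so they can be sketched as one helper. This is only an illustration under stated assumptions: `grouped_quantile` is a hypothetical name, and the function assumes contiguous classes listed in ascending order.

```python
# Class intervals (lower, upper) and frequencies from the example
classes = [(4, 8), (8, 12), (12, 16), (16, 20), (20, 24),
           (24, 28), (28, 32), (32, 36), (36, 40)]
freqs = [6, 10, 18, 30, 15, 12, 10, 6, 2]

def grouped_quantile(classes, freqs, p):
    """Interpolated quantile l1 + (p*N - c) * h / f for grouped data."""
    n = sum(freqs)
    target = p * n
    cum = 0  # cumulative frequency below the current class (c in the formula)
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= target:
            return lower + (target - cum) * (upper - lower) / f
        cum += f
    raise ValueError("p must lie between 0 and 1")

q1 = grouped_quantile(classes, freqs, 0.25)
median = grouped_quantile(classes, freqs, 0.50)
q3 = grouped_quantile(classes, freqs, 0.75)
sk_b = (q3 + q1 - 2 * median) / (q3 - q1)  # Bowley's coefficient
print(round(q1, 2), round(median, 2), round(q3, 2), round(sk_b, 3))
```

Carrying full precision gives a coefficient of about 0.187; the hand calculation gives 0.188 because it uses the quartiles rounded to two decimal places.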
6.4.4 Measure of Skewness Based on Moments
The measure of skewness can be defined by using the moments. It is obtained by making
use of the third moment about the mean. The relative measure of skewness is defined as
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$
For a perfectly symmetrical distribution the value of $\beta_1$ will be zero. The greater the
value of $\beta_1$, the more skewed is the distribution.
It is to be noted that this measure of skewness can never give a negative value: the value
of $\mu_3$ may be positive or negative, but $\mu_3^2$ will always be positive, while the value of
$\mu_2$ (variance) is always positive. As such, the direction of skewness, i.e. whether positive or
negative, cannot be ascertained from $\beta_1$. In view of this limitation
Prof. R. A. Fisher introduced another measure of skewness, which is given below:
$$\gamma_1 = \pm\sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}}$$
where the sign of $\gamma_1$ is the sign of $\mu_3$.
6.5 KURTOSIS
If we know the measures of central tendency, dispersion and skewness, we still cannot
form a complete idea about the distribution. In addition to these measures, we should know one
more measure which Prof. Karl Pearson calls the “Convexity of the frequency curve or kurtosis”.
Kurtosis enables us to have an idea about the ‘flatness or peakedness’ of the frequency curve.
It is measured by the coefficient $\beta_2$ or its derivative $\gamma_2$, given by
$$\beta_2 = \frac{\mu_4}{\mu_2^2}, \qquad \gamma_2 = \beta_2 - 3$$
A curve of type A, which is neither flat nor peaked, is called the normal curve or mesokurtic
curve, and for such a curve $\beta_2 = 3$, $\gamma_2 = 0$. A curve of type B, which is flatter than the normal
curve, is known as platykurtic, and for such a curve $\beta_2 < 3$, i.e. $\gamma_2 < 0$. A curve of type C, which
is more peaked than the normal curve, is called leptokurtic, and for such a curve $\beta_2 > 3$, i.e.
$\gamma_2 > 0$.
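Example: The first four moments of a distribution about the value 4 of the variable are -1.5, 17,
-30 and 108. Find the moments about the mean, $\beta_1$ and $\beta_2$. Find also the moments about
(i) the origin and (ii) any arbitrary point x = 2.
Solution: We are given $A = 4$, $\mu_1' = -1.5$, $\mu_2' = 17$, $\mu_3' = -30$ and $\mu_4' = 108$.
$$\mu_2 = \mu_2' - \mu_1'^2 = 17 - (-1.5)^2 = 17 - 2.25 = 14.75$$
$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = -30 - 3(17)(-1.5) + 2(-1.5)^3 = 39.75$$
$$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4$$
$$= 108 - 4(-30)(-1.5) + 6(17)(-1.5)^2 - 3(-1.5)^4$$
= 142.3125
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(39.75)^2}{(14.75)^3} = 0.4924$$
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{142.3125}{(14.75)^2} = 0.6541$$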
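(i) Moments about the origin: we have $\bar{x} = A + \mu_1' = 2.5$, $\mu_2 = 14.75$, $\mu_3 = 39.75$ and $\mu_4 = 142.31$.
Here $\mu_1' = \bar{x} = 2.5$.
$$\mu_2' = \mu_2 + \mu_1'^2 = 14.75 + (2.5)^2 = 14.75 + 6.25 = 21$$
$$\mu_3' = \mu_3 + 3\mu_2\mu_1' + \mu_1'^3 = 39.75 + 3(14.75)(2.5) + (2.5)^3 = 166$$
$$\mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2\mu_1'^2 + \mu_1'^4 = 142.3125 + 4(39.75)(2.5) + 6(14.75)(2.5)^2 + (2.5)^4 = 1132$$
(ii) Moments about the point x = 2: here $\mu_1' = \bar{x} - 2 = 0.5$.
$$\mu_2' = \mu_2 + \mu_1'^2 = 14.75 + 0.25 = 15$$
$$\mu_3' = \mu_3 + 3\mu_2\mu_1' + \mu_1'^3 = 39.75 + 3(14.75)(0.5) + (0.5)^3 = 62$$
$$\mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2\mu_1'^2 + \mu_1'^4 = 142.3125 + 4(39.75)(0.5) + 6(14.75)(0.5)^2 + (0.5)^4 = 244$$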
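The conversions used in this example (moments about an arbitrary point to central moments, and back to moments about another point) can be sketched in a few lines. The function names `central_from_raw` and `raw_from_central` are illustrative assumptions, not notation from the text.

```python
def central_from_raw(m1, m2, m3, m4):
    """Central moments (mu2, mu3, mu4) from the first four moments about a point A."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4

def raw_from_central(d, mu2, mu3, mu4):
    """Moments about a point B, where d = mean - B is the first moment about B."""
    m2 = mu2 + d ** 2
    m3 = mu3 + 3 * mu2 * d + d ** 3
    m4 = mu4 + 4 * mu3 * d + 6 * mu2 * d ** 2 + d ** 4
    return d, m2, m3, m4

A = 4
raw = (-1.5, 17, -30, 108)           # moments about A = 4
mean = A + raw[0]                    # mean = A + first moment about A
mu2, mu3, mu4 = central_from_raw(*raw)
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
about_origin = raw_from_central(mean - 0, mu2, mu3, mu4)
about_two = raw_from_central(mean - 2, mu2, mu3, mu4)
print(mean, mu2, mu3, mu4, round(beta1, 4), round(beta2, 4))
print(about_origin, about_two)
```

Running the sketch reproduces the central moments and both sets of raw moments found above, which is a quick way to catch sign slips in the hand calculation.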
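Example: The first four moments of a distribution about the value 5 are -4, 22, -117 and 560.
Find the corresponding moments about the mean. Also find $\beta_1$ and $\beta_2$.
Solution: We are given $\mu_1' = -4$, $\mu_2' = 22$, $\mu_3' = -117$ and $\mu_4' = 560$.
$$\mu_2 = \mu_2' - \mu_1'^2 = 22 - (-4)^2 = 6$$
$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = -117 - 3(22)(-4) + 2(-4)^3 = 19$$
$$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4$$
$$= 560 - 4(-117)(-4) + 6(22)(-4)^2 - 3(-4)^4 = 32.$$
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{19^2}{6^3} = 1.67$$
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{32}{6^2} = \frac{32}{36} = 0.89$$
Recall that for a symmetrical distribution $\beta_1 = 0$, and for a mesokurtic (normal) curve
$\beta_2 = 3$. Here $\beta_1 = 1.67$, so the distribution is not symmetric, and $\beta_2 = 0.89 < 3$, so the curve is platykurtic.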
Example: Calculate the first four moments about mean for the following data.
x 11 13 14 15 16
f 1 2 3 3 1
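Solution:
x      f     fx    d = X – 14    fd    fd²    fd³    fd⁴
11     1     11       –3         –3     9     –27     81
13     2     26       –1         –2     2      –2      2
14     3     42        0          0     0       0      0
15     3     45        1          3     3       3      3
16     1     16        2          2     4       8     16
Total 10    140                   0    18     –18    102
$$\text{Mean} = \bar{X} = \frac{\sum fx}{N} = \frac{140}{10} = 14$$
$$\mu_1 = \frac{\sum fd}{N} = 0, \qquad \mu_2 = \frac{\sum fd^2}{N} = \frac{18}{10} = 1.8$$
$$\mu_3 = \frac{\sum fd^3}{N} = \frac{-18}{10} = -1.8, \qquad \mu_4 = \frac{\sum fd^4}{N} = \frac{102}{10} = 10.2$$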
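The table method above amounts to averaging powers of the deviations from the mean. A minimal sketch, assuming a hypothetical helper named `central_moments`, is:

```python
def central_moments(values, freqs):
    """Return the mean and the first four central moments of a frequency table."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    # r-th central moment: sum of f * (x - mean)^r divided by N
    moments = [sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n
               for r in (1, 2, 3, 4)]
    return mean, moments

x = [11, 13, 14, 15, 16]
f = [1, 2, 3, 3, 1]
mean, (mu1, mu2, mu3, mu4) = central_moments(x, f)
print(mean, mu1, mu2, mu3, mu4)
```

Because the deviations are taken from the actual mean, the first central moment always comes out as zero, as in the table.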
Check Your Progress:
(b) Compare your answer with the one given at the end of this unit.
1. The first four moments of a distribution about the value 4 of the variable are as under
Find out the mean of the distribution and calculate central moments.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
frequency: 1 3 4 2
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3. The first four moments of a distribution about x = 2 are 1, 2.5, 5.5 and 16. Calculate the
four moments about the mean $\bar{x}$ and about zero.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
4. Find the first four central moments from the following data. Also find $\beta_1$ and $\beta_2$.
Hours worked per week 30–33 33–36 36–39 39–42 42–45 45–48
No. of officers 4 8 14 36 30 8
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
6.6 EXERCISE
1. The first four central moments of distribution are 0, 2.5, 0.7 and 18.75. Comment on
the Skewness and Kurtosis of the distribution.
2. The first four moments of a distribution about x = 2 are -2, 12, -20 and 100. Calculate
the four moments about the mean. Also calculate $\beta_2$ and show whether the distribution is
leptokurtic or platykurtic.
3. Calculate the first four moments about the mean from the following data. Also calculate
the values of $\beta_1$ and $\beta_2$.
No. of students 5 12 18 40 15 7 3
Frequency 6 8 17 21 15 11
[Ans. Mean = 216, Mode = 22, 7.137 , Karl Pearson coefficient of skewness =
0.056]
5. Calculate Bowley’s coefficient of skewness for the following data.
No. of families 45 26 18 13 12 12 4
6.7 SUMMARY
Moments may be defined as the arithmetic means of various powers of the deviations of the
values in a distribution from the mean $\bar{x}$. There can be moments of various orders, viz. first order,
second order, third order, fourth order and so on. Moments up to the fourth order are generally
sufficient to describe the main features of the distribution.
Skewness measures the lack of symmetry in the shape of a distribution. If the
distribution of data is not symmetrical, it is called asymmetrical or skewed. A distribution
is symmetrical if the frequencies are symmetrically distributed about the mean,
i.e., values of the variable equidistant from the mean have equal frequencies. If this property
is not satisfied the distribution is called asymmetrical or skewed. Measures of skewness are
given by Karl Pearson’s coefficient of skewness and Bowley’s coefficient of skewness.
Karl Pearson’s coefficient of skewness is based on mean, median, mode and standard deviation,
whereas Bowley’s coefficient of skewness is based on quartiles.
Kurtosis is another measure, which gives an idea about the flatness of the curve. If a curve
is peaked like a normal curve it is called mesokurtic, if a curve is more peaked than the normal
curve it is called leptokurtic, and if it is flatter than the normal curve it is called platykurtic.
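1. Mean = 5, $\mu_2 = 3$, $\mu_3 = 0$, $\mu_4 = 26$
2. $\mu_1' = 3$, $\mu_2' = 90$, $\mu_3' = 900$, $\mu_4' = 2100$, $\mu_2 = 81$, $\mu_3 = 144$, $\mu_4 = 14817$
The moments about zero are $\mu_1' = 3$, $\mu_2' = 10.5$, $\mu_3' = 40.5$, $\mu_4' = 168$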
6.9 MODEL EXAMINATION QUESTIONS
1. Define central and non-central moments. Derive the relation between them.
2. Show that the moments are independent of change of origin but not scale.
BLOCK - III: ADDITION RULE
In this block, you shall study the basic concepts of probability, addition rule of probability.
In unit 7, you can find the basic terminology used in probability and different approaches to
theory of probability. In unit 8, we introduce addition theorem of probability for two events and
for n events. In unit 9, you will learn some more important theorems and their uses in solving
problems.
UNIT - 7: DEFINITION AND BASIC CONCEPTS
Contents
7.0 Objectives
7.1 Introduction
7.5 Exercises
7.6 Summary
7.0 OBJECTIVES
Appreciate the use of probability theory in our day-to-day life and in the decision
making in the face of uncertainty.
7.1 INTRODUCTION
The first foundation of the mathematical theory of probability was laid in the 17th century
by two French mathematicians, Blaise Pascal (1623-62) and Pierre Fermat (1601-65), who succeeded
in obtaining exact probabilities for certain gambling problems involving dice. Since then, there
has been a steady and continuous development in the theory of probability. Starting with games
of chance, probability has become a part of our everyday life. Probability theory is now being
applied in the analysis of social, economic, physical, computer, biological, business and
management sciences.
For understanding the concept of probability theory, the following concepts must be
clearly grasped.
In this section we shall explain the various terms which are used in the definition of
probability under different approaches.
Example: Tossing a coin, throwing a dice, drawing a card from a pack of 52 cards.
7.2.4 Sample Space: The set of all possible outcomes of a random experiment is called the
sample space. It is denoted by $S$ or $\Omega$.
Eg: (1) If a coin is tossed, either a head or a tail appears. The sample space is
$S = \{H, T\}$.
(2) If a die is thrown, the possible outcomes are 1, 2, 3, 4, 5 and 6, so the sample
space is $S = \{1, 2, 3, 4, 5, 6\}$.
7.2.5 Event: The possible outcomes of a random experiment are called events, i.e. any
element or member of the sample space is an event.
When two coins are tossed, the event “two heads” is an elementary event.
Compound (Composite) Event
An event which is not simple or elementary is called a compound event. Every compound
event can be represented by the union of a set of elementary events.
The total number of all possible outcomes of a random experiment is known as the
exhaustive events.
2. In tossing a coin, there are two exhaustive events, head and tail. In tossing an
unbiased coin, head and tail are equally likely events.
In throwing an unbiased die, all the six faces are equally likely events.
In drawing a card from a pack of cards the number of cases favourable to drawing a
diamond is 13 and for drawing a black card is 26.
Example: If a coin is tossed twice, the result of the second toss does not depend upon the
result of the first toss. If a die is thrown twice, the result of the first throw does not affect the result
of the second throw.
7.2.11 Sure (or) Certain Event
An event is said to be a sure event if all the possible outcomes of an experiment are
favourable to the event, i.e., if $P(E) = 1$ then E is called a sure or certain event.
Impossible Event
When none of the outcomes is favourable to the event, it is called an impossible
event, i.e. if $P(E) = 0$ then E is called an impossible event.
7.3 FUNDAMENTAL RULES OF COUNTING
1. If an event can happen in any one of ‘m’ ways, and another event can occur in
any one of ‘n’ ways, then the number of ways in which both events can happen
together is m x n = mn.
Eg: If two coins are tossed simultaneously, the first coin can land in any one of 2 ways.
For each of these two ways the second coin can land in 2 ways. The two coins can land in
2 x 2 = 4 ways.
2. If an event A can occur in a total of m ways and a different event B can occur in n
ways, then the event A or B can occur in m + n ways provided the two events are
mutually exclusive.
Factorial: In the following rules we will observe that products of consecutive integers are
involved. We represent such a product by the factorial symbol.
$$n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1$$
Permutation
Let us consider the three letters X, Y and Z. The possible permutations of the three
letters are XYZ, XZY, YXZ, YZX, ZXY, ZYX. Thus, we get six different arrangements
of three letters or objects: we have 3 choices for the first position, 2 for the second,
leaving only 1 for the last position, giving a total of 3 x 2 x 1 = 6 permutations.
Permutations of n Objects taken r at a time
The number of permutations of the three letters a, b and c will be 3! = 6. Now, let us
consider the number of permutations that are possible by taking the 3 letters a, b and c two at a
time. These permutations would be ab, ac, ba, ca, bc, cb, i.e. a total of 3 x 2 = 6 permutations.
Combinations
$$^{n}C_r = \frac{n!}{r!\,(n-r)!}$$
For example,
$$^{5}C_2 = \frac{5!}{2!\,3!} = 10$$
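The counting rules above can be checked by listing the arrangements directly. The following sketch uses only the Python standard library (`math.perm` and `math.comb` need Python 3.8 or later); the variable names are illustrative.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

letters = ["a", "b", "c"]

# All arrangements of the three letters: 3! of them
all_orders = ["".join(p) for p in permutations(letters)]

# Arrangements taken two at a time (order matters)
pairs_ordered = ["".join(p) for p in permutations(letters, 2)]

# Selections taken two at a time (order does not matter)
pairs_unordered = ["".join(c) for c in combinations(letters, 2)]

print(len(all_orders), factorial(3))
print(pairs_ordered, perm(3, 2))
print(pairs_unordered, comb(3, 2))
print(comb(5, 2))  # the 5C2 example
```

Comparing `pairs_ordered` with `pairs_unordered` shows the difference between a permutation and a combination: ab and ba count twice in the first list and once in the second.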
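If a random experiment has ‘n’ exhaustive, mutually exclusive and equally likely
outcomes, out of which ‘m’ are favourable to the occurrence of an event E, then the probability
of occurrence of event E is denoted by $P(E)$ and defined as
$$P(E) = \frac{\text{Number of cases favourable to the event } E}{\text{Total number of exhaustive cases}} = \frac{m}{n}$$
and the probability that the event E does not occur will be
$$P(\bar{E}) = \frac{\text{Number of cases unfavourable to the event } E}{\text{Total number of exhaustive cases}} = \frac{n - m}{n} = 1 - \frac{m}{n}$$
$$P(\bar{E}) = 1 - \frac{m}{n} = 1 - P(E)$$
$$P(E) + P(\bar{E}) = 1$$
Since $0 \le m \le n$,
$$0 \le \frac{m}{n} \le 1$$
$$0 \le P(E) \le 1.$$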
Example 1: An unbiased die is thrown once. Find the probability of getting an even number.
The favourable cases are 2, 4, 6, i.e., m = 3, and n = 6.
$$P(E) = \frac{m}{n} = \frac{3}{6} = \frac{1}{2}.$$
Example 2: What is the probability of getting a head, if two coins are tossed simultaneously?
$$P(E) = \frac{m}{n} = \frac{2}{4} = \frac{1}{2}.$$
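The classical definition, favourable cases divided by exhaustive cases, translates directly into a count over an explicitly listed sample space. The sketch below is illustrative only: `classical_probability` is a hypothetical helper, and Example 2 is read as "exactly one head", an assumption chosen because it matches m = 2 and n = 4.

```python
from fractions import Fraction
from itertools import product

def classical_probability(sample_space, is_favourable):
    """P(E) = m / n for a finite sample space of equally likely outcomes."""
    outcomes = list(sample_space)
    m = sum(1 for outcome in outcomes if is_favourable(outcome))
    return Fraction(m, len(outcomes))

# Example 1: an even number on one throw of a die
die = range(1, 7)
p_even = classical_probability(die, lambda face: face % 2 == 0)

# Example 2: two coins, read here as "exactly one head" (assumption)
two_coins = list(product("HT", repeat=2))
p_one_head = classical_probability(two_coins, lambda toss: toss.count("H") == 1)

# Complement rule: P(not E) = 1 - P(E)
p_odd = 1 - p_even
print(p_even, p_one_head, p_odd)
```

Using `Fraction` keeps the answers as exact ratios, which makes them easy to compare with the hand-worked values.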
7.4.2 Statistical or Empirical Probability
In the classical definition of probability, ‘n’ is finite and all cases are equally likely. These
are very restrictive conditions and, as such, cannot cover all situations. To overcome such
limitations, the statistical or empirical definition of probability is useful.
$$P(E) = \lim_{N \to \infty} \frac{M}{N}$$
where M is the number of times the event E occurs in N trials, and $\frac{M}{N}$ is called the relative
frequency or frequency ratio of the event E connected with a random experiment.
This statistical definition of probability removes the limitations of the mathematical
definition. The only limitation of the statistical definition is that it is difficult to prove the existence
of a limit to the relative frequency.
Example: Consider a coin tossing experiment and let E be the event that a throw results in a
head. If the coin is tossed 10 times resulting in 6 heads and 4 tails, the relative frequency of
heads is $\frac{6}{10} = 0.6$. However, if the experiment is carried out a very large number of times we
expect that the relative frequency of heads will become stable and tend towards 0.50. This
indicates that though the results of an individual experiment are unpredictable, the average
results of a long sequence of random experiments show a very striking regularity and are
somewhat predictable.
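The stabilising of the relative frequency can be observed with a small simulation. This is a sketch under stated assumptions: a fair coin is simulated with Python's pseudo-random generator, the seed value is arbitrary, and `relative_frequency` is an illustrative name.

```python
import random

def relative_frequency(trials, seed=0):
    """Relative frequency M/N of heads in `trials` simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency(n))
```

For small N the ratio can be far from 0.5, while for large N it settles close to 0.5. A simulation illustrates this tendency but does not prove that the limit exists.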
The axiomatic approach to probability, which closely relates the theory of probability
with the modern metric theory of functions and also set theory, was proposed by Kolmogorov,
a Russian mathematician in 1933. The axiomatic definition of probability includes both the
classical and the statistical definition as particular cases and overcomes the deficiencies of
each of them.
2. $P(S) = 1$, S is a sure event.
3. If $\{A_n\}$ is any infinite or finite sequence of mutually exclusive events in B, then
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$$
Now, if an event A consists of m sample points, then the probability of the event will be
$$P(A) = \frac{n(A)}{n(S)}$$
Example 3: In a simultaneous throw of two dice (i) find the probability of getting a total of 6 (ii)
the total number on the dice is greater than 8, (iii) the total of the numbers on the dice is any
number from 2 to 12, both inclusive.
S = { (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3,
1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2), (5, 3),
(5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6) }
(i) The cases favourable to a total of 6 are
(1, 5), (2, 4), (3, 3), (4, 2), (5, 1), i.e. m = 5
Probability that the total of the numbers on the two dice is 6 $= \frac{5}{36}$.
(ii) The cases favourable to a total greater than 8 are
(3, 6), (4, 5), (4, 6), (5, 4), (5, 5), (5, 6), (6, 3), (6, 4), (6, 5), (6, 6)
i.e. m = 10
Probability that the total of the numbers on the two dice is greater than eight $= \frac{10}{36} = \frac{5}{18}$.
(iii) The probability that the total of the numbers on the dice is any number from 2 to 12 is
one, as the total of the numbers on the two dice certainly ranges from 2 to 12.
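Example 4: Four cards are drawn at random from a pack of 52 cards. Find the probability that
(i) They are two kings and two queens
(ii) They are two black and two red cards
Solution: Four cards can be drawn from a well shuffled pack of 52 cards in $^{52}C_4$ ways, which
gives the exhaustive number of cases.
(i) Two kings can be drawn from four kings in $^{4}C_2$ ways, and two queens from four queens in $^{4}C_2$ ways.
$$P[\text{two kings and two queens}] = \frac{^{4}C_2 \times {}^{4}C_2}{^{52}C_4}.$$
(ii) There are 26 black cards and 26 red cards in a pack of cards, so
$$P[\text{two black and two red}] = \frac{^{26}C_2 \times {}^{26}C_2}{^{52}C_4}.$$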
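The two expressions above can be evaluated numerically with `math.comb`. This is a small check rather than part of the original solution, and the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

total = comb(52, 4)  # exhaustive number of cases: 52C4

# (i) two kings out of 4 and two queens out of 4
p_kings_queens = Fraction(comb(4, 2) * comb(4, 2), total)

# (ii) two black cards out of 26 and two red cards out of 26
p_black_red = Fraction(comb(26, 2) * comb(26, 2), total)

print(total, p_kings_queens, p_black_red)
print(float(p_kings_queens), float(p_black_red))
```

`Fraction` reduces each ratio to lowest terms automatically, so the exact and decimal forms can be read off side by side.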
(b) Compare your answer with the one given at the end of this unit.
1. A coin is successively tossed three times. Find the probability of getting (i) exactly one
head (ii) exactly two heads (iii) exactly one head or exactly two heads.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2. One card is randomly drawn from a pack of 52 cards. Find the probability that (i) the
drawn card is red (ii) the drawn card is red and a king.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3. A bag contains 10 white and 8 green balls. Two balls are drawn at random from the
bag. Find the probability that (i) both of them are white and (ii) one is white and other
is green.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
4. An integer is chosen at random from the first 50 positive integers. What is the probability
that the integer is divisible by 6 or 8?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
7.5 EXERCISES
2. In an experiment of throwing a coin three times, write down the sample space. How
many points are there in the sample space.
4. There are 6 defective items in a sample of 30 items. Find the probability that an item
chosen at random from the sample is (i) non-defective (ii) defective.
Ans. 1/2
Ans. 1/6
7. If two dice are thrown simultaneously, what is the probability of getting a total of 6?
Ans. 5/36
8. Two dice are thrown simultaneously. What is the Sample space.
7.6 SUMMARY
Probability can be defined in three ways. According to the mathematical definition, the
probability of an event A is
$$P(A) = \frac{m}{n} = 1 - P(\bar{A}) = 1 - \frac{n - m}{n}$$
In the statistical or empirical definition, the probability is defined as the limiting value of the
ratio of the number of times the event A happens to the number of trials, as the number of trials
becomes infinite,
$$\text{i.e. } P(A) = \lim_{n \to \infty} \frac{m}{n}.$$
In the axiomatic definition, the probability satisfies
(a) $0 \le P(A) \le 1$
(b) $P(S) = 1$
(c) $P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$, where $A_1, A_2, \ldots, A_n$ are mutually exclusive events.
7.7 CHECK YOUR PROGRESS - MODEL ANSWERS
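S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}
n = 8.
(i) Exactly one head: {HTT, THT, TTH} and m = 3
$$P(E) = \frac{m}{n} = \frac{3}{8}.$$
(ii) Exactly two heads: {HHT, HTH, THH}, and m = 3
$$P(E) = \frac{3}{8}$$
(iii) Exactly one head or exactly two heads: {HTT, THT, TTH, HHT, HTH, THH}, and m = 6
$$P(E) = \frac{6}{8} = \frac{3}{4}.$$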
(i) There are 26 red cards in the pack
m = 26
$$P(\text{red}) = \frac{26}{52} = \frac{1}{2}$$
(ii) There are only two cards which are red Kings
m = 2
$$P(\text{getting a red colour and a King}) = \frac{2}{52} = \frac{1}{26}.$$
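Two balls can be drawn out of 18 in $^{18}C_2 = \frac{18 \times 17}{1 \times 2} = 153$ ways, so
n = 153.
(i) Two white balls can be drawn out of 10 in $^{10}C_2 = \frac{10 \times 9}{1 \times 2} = 45$ ways, so
m = 45
$$P(\text{both white balls}) = \frac{45}{153} = \frac{15}{51} = \frac{5}{17}$$
(ii) One white and one green ball can be drawn in $^{10}C_1 \times {}^{8}C_1 = 80$ ways.
$$P(\text{one white and one green ball}) = \frac{80}{153}.$$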
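Favourable cases of A are {6, 12, 18, 24, 30, 36, 42, 48} i.e. 8
Favourable cases of B are {8, 16, 24, 32, 40, 48} i.e. 6
Favourable cases of $A \cap B$ are {24, 48} i.e. 2
$$P(A) = \frac{8}{50}, \quad P(B) = \frac{6}{50}, \quad P(A \cap B) = \frac{2}{50}$$
$$P(A \text{ or } B) = P(A \cup B) = \frac{8}{50} + \frac{6}{50} - \frac{2}{50} = \frac{12}{50} = \frac{6}{25}.$$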
i) Sample Space
3. Explain mathematical, statistical and axiomatic definitions of probability.
iii) The total of the numbers on the dice is any number from 2 to 12, both inclusive.
3. Four cards are drawn at random from a pack of 52 cards. Find the probability that
UNIT - 8: ADDITION THEOREM OF PROBABILITY
Contents
8.0 Objectives
8.1 Introduction
8.3 Exercises
8.4 Summary
8.0 OBJECTIVES
Understand the addition theorem and its use in solving problems in various diversified
situations.
8.1 INTRODUCTION
The probability that an event happens can easily be found using the definition of
probability. But the definition alone cannot be used to find the probability that at least one
of several given events happens. The addition theorem of probability gives a way of
determining the probability that one or more events occur.
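Statement: For any two events A and B defined on the sample space S,
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
Proof: Consider the Venn diagram
Fig. 8.2.1 (Venn diagram of the events A and B; figure not reproduced)
From the Venn diagram in figure 8.2.1 we can observe that $A$ and $\bar{A} \cap B$ are disjoint.
$$A \cup B = A \cup (\bar{A} \cap B)$$
$$P(A \cup B) = P[A \cup (\bar{A} \cap B)]$$
$$= P(A) + P(\bar{A} \cap B)$$
$$P(A \cup B) = P(A) + [P(\bar{A} \cap B) + P(A \cap B)] - P(A \cap B)$$
Since $\bar{A} \cap B$ and $A \cap B$ are disjoint and their union is B, $P(\bar{A} \cap B) + P(A \cap B) = P(B)$.
Therefore, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.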
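$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - \sum_{1 \le i < j \le n} P(A_i \cap A_j) + \sum_{1 \le i < j < k \le n} P(A_i \cap A_j \cap A_k) - \ldots + (-1)^{n-1} P(A_1 \cap A_2 \cap \ldots \cap A_n)$$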
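Proof: This theorem can be proved by the principle of mathematical induction.
For n = 2,
$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2) \qquad \ldots (1)$$
$$\text{i.e. } P\left(\bigcup_{i=1}^{2} A_i\right) = \sum_{i=1}^{2} P(A_i) + (-1)^{2-1} P(A_1 \cap A_2)$$
Assume that the result is true for n = r events, i.e.
$$P\left(\bigcup_{i=1}^{r} A_i\right) = \sum_{i=1}^{r} P(A_i) - \sum_{1 \le i < j \le r} P(A_i \cap A_j) + \ldots + (-1)^{r-1} P(A_1 \cap A_2 \cap \ldots \cap A_r) \qquad \ldots (2)$$
Then
$$P\left(\bigcup_{i=1}^{r+1} A_i\right) = P\left[\left(\bigcup_{i=1}^{r} A_i\right) \cup A_{r+1}\right]$$
$$= P\left(\bigcup_{i=1}^{r} A_i\right) + P(A_{r+1}) - P\left[\bigcup_{i=1}^{r} (A_i \cap A_{r+1})\right] \qquad \text{(using (1))}$$
$$= \sum_{i=1}^{r} P(A_i) - \sum_{1 \le i < j \le r} P(A_i \cap A_j) + \ldots + (-1)^{r-1} P(A_1 \cap A_2 \cap \ldots \cap A_r)$$
$$\quad + P(A_{r+1}) - P\left[\bigcup_{i=1}^{r} (A_i \cap A_{r+1})\right] \qquad \text{(using (2))}$$
$$= \sum_{i=1}^{r+1} P(A_i) - \sum_{1 \le i < j \le r} P(A_i \cap A_j) + \ldots + (-1)^{r-1} P(A_1 \cap A_2 \cap \ldots \cap A_r)$$
$$\quad - \left[\sum_{i=1}^{r} P(A_i \cap A_{r+1}) - \sum_{1 \le i < j \le r} P(A_i \cap A_j \cap A_{r+1}) + \ldots + (-1)^{r-1} P(A_1 \cap A_2 \cap \ldots \cap A_r \cap A_{r+1})\right]$$
(using (2))
$$= \sum_{i=1}^{r+1} P(A_i) - \sum_{1 \le i < j \le r+1} P(A_i \cap A_j) + \ldots + (-1)^{r} P(A_1 \cap A_2 \cap \ldots \cap A_{r+1})$$
The result is true for n = r + 1 events. Hence, by the principle of mathematical
induction, the result is true for all positive integral values of n.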
Example 1: A card is drawn at random from a well shuffled pack of 52 cards. Find the
probability of getting an ace or a spade.
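Then A = set of all aces, B = set of all spades and $A \cap B$ = the ace of spades.
$$n(S) = 52$$
$$P(A) = \frac{4}{52}, \quad P(B) = \frac{13}{52}, \quad P(A \cap B) = \frac{1}{52}$$
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}.$$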
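The addition theorem, for two events and for n events, can be verified by brute force on a pack of cards. The sketch below is illustrative: the card encoding and the names `prob` and `union_by_inclusion_exclusion` are assumptions made for the demonstration, and the third event (kings) is added only to exercise the n-event formula.

```python
from fractions import Fraction
from itertools import combinations, product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = set(product(RANKS, SUITS))  # 52 equally likely cards

def prob(event):
    """Classical probability n(E) / n(S) of a set of cards."""
    return Fraction(len(event), len(DECK))

def union_by_inclusion_exclusion(events):
    """Right-hand side of the addition theorem for n events."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k - 1)  # +, -, +, ... for 1, 2, 3, ... events at a time
        for group in combinations(events, k):
            total += sign * prob(set.intersection(*group))
    return total

aces = {card for card in DECK if card[0] == "A"}
spades = {card for card in DECK if card[1] == "spades"}
kings = {card for card in DECK if card[0] == "K"}

p_two = union_by_inclusion_exclusion([aces, spades])
p_three = union_by_inclusion_exclusion([aces, spades, kings])
print(p_two, prob(aces | spades))
print(p_three, prob(aces | spades | kings))
```

In each printed pair the first value comes from the theorem and the second from counting the union directly, so the two should agree.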
Example 2: A construction company is bidding for two contracts A and B. The probability that
the company will get contract A is 3/5, the probability that it will get contract B is 1/4, and the
probability that the company gets both the contracts is 1/8. What is the probability that the
company will get contract A or B?
Solution: Let A and B be the respective events of getting the contracts A and B. Then, we are
given that
$$P(A) = \frac{3}{5}, \quad P(B) = \frac{1}{4} \quad \text{and} \quad P(A \cap B) = \frac{1}{8}$$
$$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
$$= \frac{3}{5} + \frac{1}{4} - \frac{1}{8} = \frac{29}{40}.$$
Example 3: A bag contains 30 balls numbered from 1 to 30. One ball is drawn at random. Find
the probability that number of drawn ball is a multiple of 4 or 9.
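Solution: Let A be the event that the drawn number is a multiple of 4; then the cases
favourable to A are {4, 8, 12, 16, 20, 24, 28} i.e. 7.
Let B be the event that the drawn number is a multiple of 9; then the cases
favourable to B are {9, 18, 27} i.e. 3.
No number from 1 to 30 is a multiple of both 4 and 9, so A and B are mutually exclusive.
Total number of cases = 30
$$P(A) = \frac{7}{30}, \quad P(B) = \frac{3}{30}, \quad P(A \cap B) = \frac{0}{30} = 0$$
$$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
$$= \frac{7}{30} + \frac{3}{30} - 0 = \frac{10}{30} = \frac{1}{3}.$$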
$$P(A) = \frac{1}{6}, \quad P(B) = \frac{1}{6}$$
$$P(A \text{ or } B) = P(5 \text{ or } 6) = P(A) + P(B)$$
$$= \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}.$$
Example 5: Three newspapers A, B and C are published in a certain city. It is estimated from a
survey of the adult population that 20% read A, 16% read B, 14% read C, 8% read both
A and B, 5% read both A and C, 4% read both B and C, and 2% read all three. What percentage
read at least one of the papers?
Solution: Let X, Y and Z denote the events that an adult reads newspapers A, B and C
respectively.
We are given
$$P(X) = \frac{20}{100}, \quad P(Y) = \frac{16}{100}, \quad P(Z) = \frac{14}{100}$$
$$P(X \cap Y) = \frac{8}{100}, \quad P(X \cap Z) = \frac{5}{100}, \quad P(Y \cap Z) = \frac{4}{100}, \quad P(X \cap Y \cap Z) = \frac{2}{100}$$
$$P(X \cup Y \cup Z) = P(X) + P(Y) + P(Z) - P(X \cap Y) - P(X \cap Z) - P(Y \cap Z) + P(X \cap Y \cap Z)$$
$$= \frac{20}{100} + \frac{16}{100} + \frac{14}{100} - \frac{8}{100} - \frac{5}{100} - \frac{4}{100} + \frac{2}{100} = \frac{35}{100}$$
Hence 35% of the adult population reads at least one of the newspapers.
Example 6: If two dice are thrown, what is the probability that the sum is (a) greater than 8 (b)
neither 7 nor 11?
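(a) Let S denote the sum on the two dice. The required event can happen in the following
mutually exclusive ways:
$$P(S > 8) = P(S = 9) + P(S = 10) + P(S = 11) + P(S = 12)$$
If two dice are thrown, then the sample space contains $6^2 = 36$ sample points. The
favourable cases are
S = 9: (3, 6), (6, 3), (4, 5), (5, 4) i.e. 4 sample points.
S = 10: (4, 6), (6, 4), (5, 5) i.e. 3 sample points.
S = 11: (5, 6), (6, 5) i.e. 2 sample points.
S = 12: (6, 6) i.e. 1 sample point.
$$P(S = 9) = \frac{4}{36}, \quad P(S = 10) = \frac{3}{36}, \quad P(S = 11) = \frac{2}{36}, \quad P(S = 12) = \frac{1}{36}$$
$$P(S > 8) = \frac{4}{36} + \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = \frac{10}{36} = \frac{5}{18}.$$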
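(b) Let A be the event of getting a sum of 7 and let B be the event of getting a sum of 11
with a pair of dice.
S = 7: (1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3), i.e. 6 favourable cases.
$$P(A) = P(S = 7) = \frac{6}{36} = \frac{1}{6}$$
$$P(B) = P(S = 11) = \frac{2}{36} = \frac{1}{18}.$$
Since A and B are mutually exclusive,
$$P(\bar{A} \cap \bar{B}) = 1 - P(A \cup B)$$
$$= 1 - [P(A) + P(B)]$$
$$= 1 - \frac{1}{6} - \frac{1}{18} = \frac{7}{9}.$$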
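Both parts of this example can be confirmed by enumerating the 36 outcomes. This is a minimal sketch; `prob` is an illustrative helper defined here, not notation from the text.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of throwing two dice
rolls = list(product(range(1, 7), repeat=2))

def prob(condition):
    """Fraction of outcomes (first, second) that satisfy the condition."""
    favourable = sum(1 for roll in rolls if condition(roll))
    return Fraction(favourable, len(rolls))

# (a) sum greater than 8
p_gt8 = prob(lambda roll: sum(roll) > 8)

# (b) neither 7 nor 11, via the complement of two mutually exclusive events
p_7 = prob(lambda roll: sum(roll) == 7)
p_11 = prob(lambda roll: sum(roll) == 11)
p_neither = 1 - (p_7 + p_11)

# Direct count of the same event, as a cross-check
p_neither_direct = prob(lambda roll: sum(roll) not in (7, 11))
print(p_gt8, p_7, p_11, p_neither, p_neither_direct)
```

The direct count and the complement rule give the same value because the events "sum is 7" and "sum is 11" cannot occur together.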
Example 7: Two dice are tossed. Find the probability of getting an even number on the first die
or a total of 8.
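Let A be the event of getting an even number on the first die and let B be the event that
the sum of the points obtained on the two dice is 8.
$$P(A) = \frac{n(A)}{n(S)} = \frac{18}{36} = \frac{1}{2}, \quad P(B) = \frac{n(B)}{n(S)} = \frac{5}{36} \quad \text{and} \quad P(A \cap B) = \frac{3}{36} = \frac{1}{12}.$$
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{18}{36} + \frac{5}{36} - \frac{3}{36} = \frac{20}{36} = \frac{5}{9}.$$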
Example 8: The probability that a student passes a physics test is 2/3 and the probability that
he passes both physics test and English test is 14/45. The probability that he passes atleast one
test is 4/5. What is the probability that he passes the English test?
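Solution: Let A be the event that the student passes the Physics test and let B be the event that
the student passes the English test.
We are given
$$P(A) = \frac{2}{3}, \quad P(A \cap B) = \frac{14}{45}, \quad P(A \cup B) = \frac{4}{5} \quad \text{and we want } P(B).$$
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
$$\frac{4}{5} = \frac{2}{3} + P(B) - \frac{14}{45}$$
$$P(B) = \frac{4}{5} + \frac{14}{45} - \frac{2}{3} = \frac{36 + 14 - 30}{45} = \frac{4}{9}.$$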
(b) Compare your answer with the one given at the end of this unit.
1. An integer is chosen at random from the first 50 positive integers. What is the probability
that the integer is divisible by 6 or 8?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2. The probability that a student passes Physics test is 2/3 and the probability that he
passes both Physics test and an English test is 14/45. The probability that he passes
atleast one test is 4/5. What is the probability that he passes the English test?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3. A card is drawn from a pack of 52 cards. Find the probability of getting a King or a
heart or a red card.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
8.3 EXERCISE
1. State and prove the addition theorem of probability for two events.
3. In a single throw of three dice, find the probability of getting a total of 17 or 18?
Ans. 1/54
4. A card is drawn at random from a well-shuffled pack of cards. Find the probability that
it is a heart or a queen?
Ans. 4/13
5. If the probability of player A winning a game is 1/8 and that of player B winning the
same is 1/4, what is the probability that one of the players will win?
Ans. 3/8
Ans. 2/9
8.4 SUMMARY
The addition theorem is a fundamental rule of probability which is useful for simplifying the
calculation of probabilities. It determines the probability that either event A or event B
occurs, or both occur. The operation between the two events A and B is denoted by $\cup$ and
pronounced as union.
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
8.5 CHECK YOUR PROGRESS - MODEL ANSWERS
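Favourable cases of A are {6, 12, 18, 24, 30, 36, 42, 48} = 8
$$P(A) = \frac{8}{50}, \quad P(B) = \frac{6}{50}, \quad P(A \cap B) = \frac{2}{50}$$
$$P(A \cup B) = \frac{8}{50} + \frac{6}{50} - \frac{2}{50} = \frac{6}{25}.$$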
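$$P(A) = \frac{2}{3}, \quad P(A \cap B) = \frac{14}{45}, \quad P(A \cup B) = \frac{4}{5} \quad \text{and we want } P(B).$$
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
$$\frac{4}{5} = \frac{2}{3} + P(B) - \frac{14}{45}$$
$$P(B) = \frac{4}{5} + \frac{14}{45} - \frac{2}{3} = \frac{36 + 14 - 30}{45} = \frac{4}{9}.$$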
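$C \cap A$: The card drawn is a red king, $n(C \cap A) = 2$
$$P(A) = \frac{4}{52}, \quad P(B) = \frac{13}{52}, \quad P(C) = \frac{26}{52}, \quad P(A \cap B) = \frac{1}{52}, \quad P(B \cap C) = \frac{13}{52}$$
$$P(C \cap A) = \frac{2}{52}, \quad P(A \cap B \cap C) = \frac{1}{52}$$
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$$
$$= \frac{4}{52} + \frac{13}{52} + \frac{26}{52} - \frac{1}{52} - \frac{13}{52} - \frac{2}{52} + \frac{1}{52}$$
$$= \frac{28}{52} = \frac{7}{13}.$$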
1. State and prove addition theorem of probability for two events . Also extend for n
events.
2. If two dice are thrown, what is the probability that the sum is
2. A card is drawn at random from a well shuffled pack of 52 cards. Find the probability
of getting an ace or a spade.
UNIT - 9: SOME MORE IMPORTANT THEOREMS
Contents
9.0 Objectives
9.1 Introduction
9.4 Exercises
9.5 Summary
9.0 OBJECTIVES
learn some important theorems on probability and their use in solving problems.
9.1 INTRODUCTION
In this unit, we shall prove a few simple theorems which help us to evaluate the
probabilities of some complicated events in a rather simple way.
Boole’s inequality is also known as the union bound. It is used when we have to show
that the probability of the union of some events is less than a particular value. It is very
simple yet useful.
9.2.1 Theorem
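Proof: The impossible event $\phi$ contains no sample point. The certain event S and the impossible
event $\phi$ are mutually exclusive, and
$$S \cup \phi = S$$
From the axiomatic definition of probability
$$P(S \cup \phi) = P(S) + P(\phi)$$
$$P(S) + P(\phi) = P(S)$$
$$P(\phi) = 0.$$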
9.2.2 Theorem
$P(A \cup \bar{A}) = P(S)$
$P(A) + P(\bar{A}) = 1$
$P(\bar{A}) = 1 - P(A).$
9.2.3 Theorem
$B = (A \cap B) \cup (\bar{A} \cap B)$
$P(B) = P(A \cap B) + P(\bar{A} \cap B)$
$P(\bar{A} \cap B) = P(B) - P(A \cap B)$
9.3.1 Statement
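(a) $P\left(\bigcap_{i=1}^{n} E_i\right) \ge \sum_{i=1}^{n} P(E_i) - (n - 1)$
(b) $P\left(\bigcup_{i=1}^{n} E_i\right) \le \sum_{i=1}^{n} P(E_i)$
Proof of (a): For n = 2,
$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) \le 1$
$P(E_1 \cap E_2) \ge P(E_1) + P(E_2) - 1$
i.e. $P\left(\bigcap_{i=1}^{2} E_i\right) \ge \sum_{i=1}^{2} P(E_i) - (2 - 1)$
Let us assume that the result is true for n = k events such that
$P\left(\bigcap_{i=1}^{k} E_i\right) \ge \sum_{i=1}^{k} P(E_i) - (k - 1)$
Then
$P\left(\bigcap_{i=1}^{k+1} E_i\right) = P\left(\left(\bigcap_{i=1}^{k} E_i\right) \cap E_{k+1}\right) \ge P\left(\bigcap_{i=1}^{k} E_i\right) + P(E_{k+1}) - 1$
$\ge \sum_{i=1}^{k} P(E_i) - (k - 1) + P(E_{k+1}) - 1$
$P\left(\bigcap_{i=1}^{k+1} E_i\right) \ge \sum_{i=1}^{k+1} P(E_i) - ((k + 1) - 1)$
Hence the result is true for n = k + 1 and, by induction, for every n.
Proof of (b): For n = 2,
$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$
$\le P(E_1) + P(E_2)$, since $P(E_1 \cap E_2) \ge 0$
i.e. $P\left(\bigcup_{i=1}^{2} E_i\right) \le \sum_{i=1}^{2} P(E_i)$
Let us assume that the result is true for n = k events such that
$P\left(\bigcup_{i=1}^{k} E_i\right) \le \sum_{i=1}^{k} P(E_i)$
Then
$P\left(\bigcup_{i=1}^{k+1} E_i\right) = P\left(\left(\bigcup_{i=1}^{k} E_i\right) \cup E_{k+1}\right)$
$\le P\left(\bigcup_{i=1}^{k} E_i\right) + P(E_{k+1})$
$\le \sum_{i=1}^{k} P(E_i) + P(E_{k+1})$
$P\left(\bigcup_{i=1}^{k+1} E_i\right) \le \sum_{i=1}^{k+1} P(E_i)$
Hence the result is true for n = k + 1 and, by induction, for every n.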
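Both parts of Boole's inequality can be checked numerically on a small sample space. The sketch below is an illustration only: the sample space of 12 equally likely outcomes and the three events are hypothetical choices, and the helper name prob is an assumption.

```python
from fractions import Fraction

# Hypothetical sample space: 12 equally likely outcomes.
sample_space = set(range(1, 13))

# Three hypothetical events E1, E2, E3.
events = [
    {n for n in sample_space if n % 2 == 0},  # even numbers
    {n for n in sample_space if n % 3 == 0},  # multiples of 3
    {n for n in sample_space if n > 4},       # numbers greater than 4
]

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))

union = set().union(*events)
intersection = set(sample_space).intersection(*events)
sum_p = sum(prob(e) for e in events)
n = len(events)

# (b) P(union) <= sum of probabilities
upper_ok = prob(union) <= sum_p
# (a) P(intersection) >= sum of probabilities - (n - 1)
lower_ok = prob(intersection) >= sum_p - (n - 1)

print(prob(union), sum_p, upper_ok)
print(prob(intersection), sum_p - (n - 1), lower_ok)
```

For these events the union has probability 11/12, which is below the sum 3/2, and the intersection has probability 1/6, which is above 3/2 - 2 = -1/2. The lower bound in (a) is only informative when the individual probabilities are close to 1.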
Check Your Progress:
(b) Compare your answer with the one given at the end of this unit.
(i) P A B P A P B
(ii) P B P A
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
9.4 EXERCISES
Ans. 1
Ans. 1
9.5 SUMMARY
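(ii) $P(\bar{A}) = 1 - P(A)$
(iii) $P(\bar{A} \cap B) = P(B) - P(A \cap B)$
(a) $P\left(\bigcap_{i=1}^{n} E_i\right) \ge \sum_{i=1}^{n} P(E_i) - (n - 1)$
(b) $P\left(\bigcup_{i=1}^{n} E_i\right) \le \sum_{i=1}^{n} P(E_i)$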
9.6 CHECK YOUR PROGRESS - MODEL ANSWERS
A B A B
P A P B A B
P B P A B
P A B P A P B
(ii) P A B 0 P A P B 0
P B P A
Hence, P B P A
1. For any two events A and B defined on the sample space S, prove that
$P(\bar{A} \cap B) = P(B) - P(A \cap B)$.
BLOCK - IV: BAYES' THEOREM
The units included in this block are:
UNIT - 10: CONDITIONAL PROBABILITY
Contents
10.0 Objectives
10.1 Introduction
10.2 Definition
10.4 Exercise
10.5 Summary
10.0 OBJECTIVES
Understand how to use the multiplication rule to find the probability of the intersection
of two events.
10.1 INTRODUCTION
Conditional probability is one of the most fundamental and important concepts in
probability theory. It measures the probability of an event given that another event has
occurred. P(A) is the probability of an event A; it represents the likelihood that a random
experiment will result in an outcome in the set A relative to the sample space S of the
experiment. Often, while evaluating the probability of some event, we already have some
information about the outcome of the experiment. A conditional probability is the
probability that a certain event will occur given such knowledge about the outcome of some
other event.
10.2 DEFINITION
The conditional probability that an event A will occur, given that an event B has already
occurred, is denoted by P(A/B) (read as the conditional probability of A given B) and is defined as
$P(A/B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$
Example 1:A die is rolled once. If the outcome is an odd number, what is the probability that it
is prime?
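Solution: Let A be the event that the outcome is odd, A = {1, 3, 5}, and B the event that it is
prime, B = {2, 3, 5}, so that $A \cap B = \{3, 5\}$.
$P(A) = \frac{3}{6}, \quad P(B) = \frac{3}{6}, \quad P(A \cap B) = \frac{2}{6}$
$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{2/6}{3/6} = \frac{2}{3}.$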
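The definition can also be applied by counting: once the conditioning event is known to have occurred, it acts as the new sample space. The sketch below is a minimal illustration for equally likely outcomes; the function name conditional and the predicate names are assumptions introduced here.

```python
from fractions import Fraction

def conditional(event, given, sample_space):
    """P(event / given) on a finite space of equally likely outcomes.

    The outcomes satisfying `given` form the reduced sample space.
    """
    reduced = [s for s in sample_space if given(s)]
    if not reduced:
        raise ValueError("the conditioning event has probability zero")
    favourable = [s for s in reduced if event(s)]
    return Fraction(len(favourable), len(reduced))

die = range(1, 7)
is_odd = lambda x: x % 2 == 1
is_prime = lambda x: x in (2, 3, 5)

p_prime_given_odd = conditional(is_prime, is_odd, die)
print(p_prime_given_odd)
```

For the die of Example 1 this gives 2/3, the same value as the ratio of P(A ∩ B) to P(A).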
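Example 2: Two coins are tossed. What is the conditional probability of getting two heads
given that at least one coin shows a head?
Solution: Let A be the event that at least one coin shows a head; then A = {(HT), (TH), (HH)}.
Let B be the event of getting two heads; then B = {(HH)} and
$A \cap B = \{(HH)\}$
$P(A) = \frac{3}{4}, \quad P(B) = \frac{1}{4}, \quad P(A \cap B) = \frac{1}{4}$
$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/4}{3/4} = \frac{1}{3}.$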
Check Your Progress:
(b) Compare your answer with the one given at the end of this unit.
1. In a class, 40% of the students read Statistics, 25% read Mathematics and 15% read both
Mathematics and Statistics. One student is selected at random. Find the probability:
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
10.3.1 Theorem
$P(A \cap B) = P(A) \cdot P(B/A), \quad P(A) > 0$
$= P(B) \cdot P(A/B), \quad P(B) > 0$
Proof: We have $P(A) = \frac{n(A)}{n(S)}$; $P(B) = \frac{n(B)}{n(S)}$ and $P(A \cap B) = \frac{n(A \cap B)}{n(S)}$ ...(*)
For the conditional event A/B, the favourable outcomes must be among the sample points
of B, i.e. for the event A/B the sample space is B, and out of the n(B) sample points,
$n(A \cap B)$ pertain to the occurrence of the event A.
Hence, $P(A/B) = \frac{n(A \cap B)}{n(B)}$
$P(A \cap B) = \frac{n(B)}{n(S)} \cdot \frac{n(A \cap B)}{n(B)} = P(B) \cdot P(A/B)$
Similarly,
$P(A \cap B) = \frac{n(A)}{n(S)} \cdot \frac{n(A \cap B)}{n(A)} = P(A) \cdot P(B/A)$
Thus, the probability of the simultaneous occurrence of two events A and B is equal to
the product of the probability of one of these events and the conditional probability of the other,
given that the first one has occurred.
Theorem
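$P(A_1 \cap A_2) = P(A_1) \cdot P(A_2/A_1)$
$P(A_1 \cap A_2 \cap A_3) = P(A_1 \cap (A_2 \cap A_3))$
$= P(A_1) \cdot P(A_2 \cap A_3 / A_1)$
$= P(A_1) \cdot P(A_2/A_1) \cdot P(A_3 / A_1 \cap A_2)$
The result is true for n = 2 and 3 events.
Let us suppose that the result is true for n = k events. Applying the two-event result to
$A_1 \cap A_2 \cap \dots \cap A_k$ and $A_{k+1}$ gives
$P(A_1 \cap \dots \cap A_{k+1}) = P(A_1 \cap \dots \cap A_k) \cdot P(A_{k+1} / A_1 \cap \dots \cap A_k)$
so the result is true for n = k + 1. Hence, by the principle of mathematical induction, the
result is true for all positive integers n.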
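The three-event form of the multiplication theorem can be compared with a direct count. The sketch below is an illustration under assumed details: the example (drawing three aces in succession without replacement) and the labelling of the cards by index, with indices 0 to 3 standing for the aces, are choices made here and do not come from the text.

```python
from fractions import Fraction
from itertools import permutations

# 52 cards identified by index; indices 0-3 stand for the four aces (assumed labelling).
deck = range(52)
is_ace = lambda card: card < 4

# Multiplication theorem: P(A1 ∩ A2 ∩ A3) = P(A1) · P(A2/A1) · P(A3/A1 ∩ A2)
by_chain_rule = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Direct count over all ordered draws of three distinct cards.
draws = list(permutations(deck, 3))
favourable = sum(1 for d in draws if all(is_ace(c) for c in d))
direct = Fraction(favourable, len(draws))

print(by_chain_rule, direct)
```

Both values equal 1/5525: each conditional factor simply reflects the composition of the pack after the earlier draws.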
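Example 3: Let A and B be two events such that $P(A) = \frac{1}{2}$, $P(B) = \frac{1}{3}$ and $P(A \cap B) = \frac{1}{4}$.
Obtain the probabilities (i) $P(A/B)$ (ii) $P(A \cup B)$ and (iii) $P(\bar{A} \cap \bar{B})$.
Solution: We are given $P(A) = \frac{1}{2}$, $P(B) = \frac{1}{3}$ and $P(A \cap B) = \frac{1}{4}$.
(i) $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{1/3} = \frac{3}{4}$
(ii) $P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{1}{2} + \frac{1}{3} - \frac{1}{4} = \frac{7}{12}$
(iii) $P(\bar{A} \cap \bar{B}) = 1 - P(A \cup B) = 1 - \frac{7}{12} = \frac{5}{12}$.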
Example 4: Two dice are thrown. Find the probability that the sum of the numbers in the two
dice is 10, given that the first die shows a six.
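Solution: Let A be the event that the sum of the numbers is 10 and B the event that the first
die shows a six.
Thus, A = {(4, 6), (5, 5), (6, 4)}; B = {(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
and $A \cap B = \{(6, 4)\}$. Also $n(S) = 36$.
Thus, $P(A) = \frac{n(A)}{n(S)} = \frac{3}{36}$, $P(B) = \frac{n(B)}{n(S)} = \frac{6}{36}$
$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{1}{36}$
$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/36}{6/36} = \frac{1}{6}.$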
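The two-dice calculation can be reproduced by listing all 36 outcomes. The sketch below is an illustration only; the variable names are assumptions. It also compares the conditional probability with the unconditional one, which shows that knowing the first die changes the chance of a sum of 10.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (first die, second die).
S = list(product(range(1, 7), repeat=2))

A = {s for s in S if sum(s) == 10}   # sum of the two dice is 10
B = {s for s in S if s[0] == 6}      # first die shows a six

p_a = Fraction(len(A), len(S))
p_b = Fraction(len(B), len(S))
p_a_and_b = Fraction(len(A & B), len(S))

# Definition of conditional probability: P(A/B) = P(A ∩ B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(p_a, p_a_given_b)
```

Here P(A) = 1/12 while P(A/B) = 1/6, so the information that the first die shows a six doubles the probability of a sum of 10.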
(b) Compare your answer with the one given at the end of this unit.
2. Let A and B be events such that $P(A) = \frac{1}{3}$, $P(B) = \frac{1}{5}$ and $P(A \cup B) = \frac{11}{30}$. Find
(i) $P(A/B)$ (ii) $P(B/A)$.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
10.4 EXERCISE
1. A bag contains 5 white and 3 black balls. Two balls are drawn at random one after the
other without replacement. Find the probability that both balls drawn are black.
(Ans. $P(A \cap B) = \frac{3}{28}$)
2. Find the probability of drawing a queen, a king and a knave (Jack) in that order from a
pack of cards in three consecutive draws, the cards drawn not being replaced.
[Ans. 0.00048]
(iii) a male, if a smoker is already selected is 2/3.
4. A bag contains 17 counters marked with the numbers 1 to 17. A counter is drawn and
replaced; a second drawing is then made. What is the probability that the first number
drawn is even and the second odd?
[Ans. $P(A \cap B) = \frac{8}{17} \cdot \frac{9}{17} = \frac{72}{289}$]
5. Sixty percent of the employees of the XYZ corporation are college graduates. Of
these, ten percent are in sales. Of the employees who did not graduate from college,
eighty percent are in sales. What is the probability that an employee selected at random
is in sales?
10.5 SUMMARY
The conditional probability of an event B given an event A is
given by $P(B/A) = \frac{P(A \cap B)}{P(A)}$, $P(A) > 0$.
Thus, the probability of an event B occurring when it is known that some event A has
already occurred is called the conditional probability.
$P(A \cap B) = P(A) \cdot P(B/A)$
10.6 CHECK YOUR PROGRESS - MODEL ANSWERS
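1. Number of A = 40, $P(A) = \frac{40}{100}$
Number of B = 25, $P(B) = \frac{25}{100}$
Number of A and B = 15, $P(A \cap B) = \frac{15}{100}$
i) $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{15/100}{25/100} = \frac{15}{25} = \frac{3}{5}$
ii) $P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{15/100}{40/100} = \frac{15}{40} = \frac{3}{8}$
2. $P(A \cap B) = P(A) + P(B) - P(A \cup B) = \frac{1}{3} + \frac{1}{5} - \frac{11}{30} = \frac{1}{6}$
$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/5} = \frac{5}{6}$
$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/6}{1/3} = \frac{3}{6} = \frac{1}{2}$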
1. State and prove multiplication theorem of probability for two events and also extend it
for ‘n’ events.
3. Let A and B be two events such that $P(A) = \frac{1}{2}$, $P(B) = \frac{1}{3}$ and $P(A \cap B) = \frac{1}{4}$. Obtain
the probabilities
i) $P(A/B)$
ii) $P(A \cup B)$ and $P(\bar{A} \cap \bar{B})$.
UNIT - 11: INDEPENDENT EVENTS
Contents
11.0 Objectives
11.1 Introduction
11.4 Examples
11.5 Summary
11.0 OBJECTIVES
11.1 INTRODUCTION
In a random experiment, the occurrence of an event E may or may not affect the
occurrence of another event F. If E does not affect F, we call the events independent.
If E affects F, we call E and F dependent.
When a die is thrown repeatedly, the event of getting ‘4’ in the first throw is
independent of getting ‘4’ in the second, third or any subsequent throw.
In drawing cards from a pack of 52 cards, the outcome of the second draw depends
on the card drawn in the first draw. However, if the card drawn in the first draw is put back
in the pack before the second card is drawn, then the outcome of the second draw is
independent of the first draw.
Two or more events are said to be independent if the occurrence of any one of them does
not affect the occurrence of the others. For example, when a coin is tossed twice, getting a
head on the first toss and getting a tail on the second toss are independent events, because
the outcome of the first toss does not affect the outcome of the second. But if one card is
drawn out of 52 cards, then only 51 cards are left. Unless the card is replaced, the
composition of the pack is changed and the probability for the second card is affected. In
this case the events are dependent, i.e. the second draw depends on the first draw, the third
draw depends on the first and second draws, and so on.
When events A and B are independent, the probability that both occur is equal to the
product of their unconditional probabilities: $P(A \text{ and } B) = P(A \cap B) = P(A) \cdot P(B)$.
11.2.1 Illustration
A box contains 5 white and 3 black balls. A ball is drawn at random from the box and
replaced. Another ball is drawn at random after the replacement. Find the probability that
both balls drawn are black.
Solution: Let A be the event of getting a black ball in the first draw and B be the event of getting
a black ball in the second draw.
Since there are 5W + 3B = 8 balls in the box, of which 3 are black, the probability of
getting a black ball in the first draw is P(A) = 3/8.
The first ball is replaced before the second draw, so the box again contains 8 balls. The
probability of getting a black ball in the second draw is therefore not affected by the first
draw, and P(B) = 3/8.
Hence $P(A \text{ and } B) = P(A \cap B) = P(A) \cdot P(B) = \frac{3}{8} \cdot \frac{3}{8} = \frac{9}{64}$.
A finite set of events $\{A_i\}_{i=1}^{n}$ is pairwise independent if every pair of events is
independent, that is, if and only if for all distinct pairs of indices m, k,
$P(A_m \cap A_k) = P(A_m) P(A_k)$.
The events are mutually independent if and only if, for every subset $\{B_1, \dots, B_k\}$ of
them with $2 \le k \le n$,
$P\left(\bigcap_{i=1}^{k} B_i\right) = \prod_{i=1}^{k} P(B_i)$
This is called the multiplication rule for independent events. Note that it is not a single
condition involving only the product of the probabilities of all the single events; it must hold true
for all subsets of events.
For more than two events, a mutually independent set of events is (by definition) pairwise
independent; but the converse is not necessarily true.
Self - independence
$P(A) = P(A \cap A) = P(A) \cdot P(A) \iff P(A) = 0 \text{ or } 1$
Thus an event is independent of itself if and only if it almost surely occurs or its complement
almost surely occurs; this fact is useful when proving zero-one laws.
(b) Compare your answer with the one given at the end of this unit.
11.2.2 In the random experiment of tossing a coin twice, are the cases of getting head first
and then the tail independent events?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
11.2.3 In the random experiment of tossing a coin twice the cases of getting all the tails are
independent events?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
If the events A and B are so related that the occurrence of B is affected by the
occurrence of A, then A and B are called dependent events. More generally, if the happening
or non-happening of an event affects the happening or non-happening of the other events,
then the events are said to be not independent. The probability of event B given the
occurrence of event A is called the conditional probability and is written as P(B/A), which
may be read, “the probability of B given A”. In this case the probability that both the events A
and B will occur is given by
$P(A \cap B) = P(A) \cdot P(B/A), \ P(A) > 0$, or $P(A \cap B) = P(B) \cdot P(A/B), \ P(B) > 0$
Remarks
1. We note that P(A/B) = P(A) and P(B/A) = P(B) if A and B are independent, because
event B does not affect the event A and event A does not affect the event B.
2. The multiplication rule can be generalised to cover more than two events. For example,
if A, B and C are any three events, the probability of their simultaneous occurrence
is given by
$P(A \cap B \cap C) = P(A) \cdot P(B/A) \cdot P(C / A \cap B)$
3. If A and B are independent events, then A and $B^c$ are also independent.
$P(A \cap B^c) = P(A) - P(A \cap B)$
$= P(A) - P(A) \cdot P(B)$
$= P(A)[1 - P(B)]$
$= P(A) \cdot P(B^c)$. Hence A and $B^c$ are also independent.
4. If A and B are independent events, then $A^c$ and B are also independent.
$P(A^c \cap B) = P(B) - P(A \cap B)$
$= P(B) - P(A) \cdot P(B)$
$= P(B)[1 - P(A)]$
$= P(B) \cdot P(A^c)$. Hence $A^c$ and B are also independent.
5. If events A and B are independent, then $A^c$ and $B^c$ are also independent.
$P(A^c \cap B^c) = 1 - P(A \cup B)$
$= 1 - P(A) - P(B) + P(A) \cdot P(B)$
$= [1 - P(A)][1 - P(B)]$
$= P(A^c) \cdot P(B^c)$. Hence $A^c$ and $B^c$ are also independent.
11.4 EXAMPLES
The event of getting a 5 the first time a die is rolled and the event of getting a 5 the
second time are independent. By contrast, the event of getting a 5 the first time a die is rolled
and the event that the sum of the numbers seen on the first and second trials is 8 are not
independent.
If two cards are drawn with replacement from a deck of cards, the event of drawing a
red card on the first trial and that of drawing a red card on the second trial are independent. By
contrast, if two cards are drawn without replacement from a deck of cards, the event of
drawing a red card on the first trial and that of drawing a red card on the second trial are not
independent, because a deck that has had a red card removed has proportionately fewer red
cards.
Consider the two probability spaces shown. In both cases, P(A) = P(B) = 1/2 and
P(C) = 1/4. The events in the first space are pairwise independent because
P(A/B) = P(A/C) = 1/2 = P(A), P(B/A) = P(B/C) = 1/2 = P(B), and P(C/A) = P(C/B) = 1/4 =
P(C); but the three events are not mutually independent. The events in the
second space are both pairwise independent and mutually independent. To illustrate the
difference, consider conditioning on two events. In the pairwise independent case, although any
one event is independent of each of the other two individually, it is not independent of the
intersection of the other two:
$P(A / B \cap C) = \frac{4/40}{4/40 + 1/40} = \frac{4}{5} \ne P(A)$
$P(B / A \cap C) = \frac{4/40}{4/40 + 1/40} = \frac{4}{5} \ne P(B)$
$P(C / A \cap B) = \frac{4/40}{4/40 + 6/40} = \frac{2}{5} \ne P(C)$
In the mutually independent case, however,
$P(A / B \cap C) = \frac{1/16}{1/16 + 1/16} = \frac{1}{2} = P(A)$
$P(B / A \cap C) = \frac{1/16}{1/16 + 1/16} = \frac{1}{2} = P(B)$
$P(C / A \cap B) = \frac{1/16}{1/16 + 3/16} = \frac{1}{4} = P(C)$
Remark: Mutual Independence
It is possible to construct three events A, B and C for which
$P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C)$,
and yet no two of the three events are pairwise independent (and hence the set of
events is not mutually independent). This shows that mutual independence involves
requirements on the products of probabilities of all combinations of events, not just on the
product over all the single events. For another example in which the product condition holds
without mutual independence, take A to be empty and B and C to be identical events with
probability strictly between 0 and 1. Then, since B and C are the same event, they are not
independent, but the probability of the intersection of the three events is zero, which is the
product of the probabilities.
11.4.4 Example
A box contains four tickets marked with numbers 112, 121, 211, 222 and one ticket is
drawn at random. Let Bi (i = 1, 2, 3) be the event that the ith digit of the number of the ticket
drawn is 1. Discuss the independence of the events B1 , B2 and B3 .
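Solution: Cases favourable to the event $B_1$ are those in which the 1st digit of the number is 1,
i.e. 112, 121, i.e. 2 in all.
$P(B_1) = \frac{2}{4} = \frac{1}{2}$.
Cases favourable to the event $B_2$ are those in which the 2nd digit of the number is 1,
i.e. 112, 211, i.e. 2 in all.
$P(B_2) = \frac{2}{4} = \frac{1}{2}$
Similarly, cases favourable to the event $B_3$ are 121, 211, i.e. 2 in all.
$P(B_3) = \frac{2}{4} = \frac{1}{2}$
Cases favourable to the event $B_1 \cap B_2$ are those numbers in which the first as well as
the second digit is 1, viz. 112, i.e. only 1.
$P(B_1 \cap B_2) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = P(B_1) \cdot P(B_2)$
$B_1, B_2$ are independent.
Cases favourable to the event $B_2 \cap B_3$ are those numbers in which the 2nd as well as the 3rd
digit is 1, viz. 211, i.e. only 1.
$P(B_2 \cap B_3) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = P(B_2) \cdot P(B_3)$
$B_2, B_3$ are independent.
Cases favourable to the event $B_1 \cap B_3$ are those numbers in which the 1st as well as the 3rd
digit is 1, viz. 121, i.e. only 1.
$P(B_1 \cap B_3) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = P(B_1) \cdot P(B_3)$
$B_1, B_3$ are independent.
Hence $B_1$, $B_2$ and $B_3$ are pairwise independent. But because there is no case favourable
to the event $B_1 \cap B_2 \cap B_3$ (i.e. all the three digits of the number are 1’s), we get
$P(B_1 \cap B_2 \cap B_3) = P(\phi) = 0 \ne \frac{1}{8} = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = P(B_1) P(B_2) P(B_3)$.
Hence $B_1$, $B_2$ and $B_3$ are pairwise independent but not mutually independent.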
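The ticket example is small enough to check every product condition by enumeration. The sketch below is an illustration only; the helper prob and the list B of digit events are names introduced here.

```python
from fractions import Fraction
from itertools import combinations

tickets = ["112", "121", "211", "222"]

def prob(event):
    """Probability of an event (a predicate on tickets) for one random draw."""
    return Fraction(sum(1 for t in tickets if event(t)), len(tickets))

# B[i] is the event "digit number i + 1 of the drawn ticket is 1".
B = [lambda t, i=i: t[i] == "1" for i in range(3)]

# Pairwise independence: the product rule must hold for every pair.
pairwise = all(
    prob(lambda t: B[i](t) and B[j](t)) == prob(B[i]) * prob(B[j])
    for i, j in combinations(range(3), 2)
)

# Mutual independence additionally needs the product rule for all three events.
p_all_three = prob(lambda t: all(b(t) for b in B))
mutual = pairwise and p_all_three == prob(B[0]) * prob(B[1]) * prob(B[2])

print(pairwise, p_all_three, mutual)
```

The check reports that every pair satisfies the product rule, while the triple intersection has probability 0 instead of 1/8, so the events are pairwise but not mutually independent.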
(b) Compare your answer with the one given at the end of this unit.
11.4.5 A card is drawn at random from a pack of 52 cards. Let B be the event that the
card drawn is a heart. What is the conditional probability P(A/B), where A is the
event of getting an ace?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
11.4.6 A cubical die is rolled at random. Let B be the event of getting an even outcome. If
B has occurred, what is the conditional probability P(A/B), where A is the event of getting the outcome ‘2’?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
11.5 SUMMARY
In this unit you have learnt the concept of independent events. You also learnt the
concept of conditional probability. You studied multiplication rule and independence of events.
You have observed that this can be extended to any finite number of mutually independent
events.
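11.2.2 $P(H \cap T) = \frac{1}{4}$, $P(H) = \frac{1}{2}$, $P(T) = \frac{1}{2}$
$P(H \cap T) = \frac{1}{2} \cdot \frac{1}{2} = P(H) \cdot P(T)$
So H and T are independent.
11.2.3 $P(H \cap H \cap H) = \frac{1}{8}$, $P(H) = \frac{1}{2}$
$P(H \cap H \cap H) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \dots$
So the events are independent.
11.4.5 $P(B) = \frac{13}{52} = \frac{1}{4}$
$P(A \cap B) = \frac{1}{52}$
$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/52}{1/4} = \frac{1}{13}$.
11.4.6 $P(A \cap B) = \frac{1}{6}$
$P(B) = \frac{3}{6} = \frac{1}{2}$
$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = \frac{1}{3}$.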
11.7 MODEL EXAMINATION QUESTIONS
1. The probability of an economic decline in the year 2000 is 0.23. There is a probability
of 0.64 that we will elect a Republican president in the year 2000. If we elect a Republican
president, there is a 0.35 probability of an economic decline. Let “D” represent the
event of an economic decline, and “R” represent the event of election of a Republican
president.
b. What is the probability of a Republican president and economic decline in the year
2000?
c. If we experience an economic decline in the year 2000, what is the probability that
there will be a Republican president?
d. What is the probability of economic decline or a Republican president in the year 2000?
Hint: You want to find $P(D \cup R)$.
Answers:
Selected Major
Male 40 10 30 80
Female 30 20 70 120
c. Given that a person is male, what is the probability that he is majoring in Management?
3. As a company manager for Claimstat Corporation there is a 0.40 probability that you
will be promoted this year. There is a 0.72 probability that you will get a promotion or
a raise. The probability of getting a promotion and a raise is 0.25.
a. If you get a promotion, what is the probability that you will also get a raise?
b. Are getting a raise and being promoted independent events? Explain using probabilities.
Answers:
a. 0.625 b. 0.57
4. A bank has the following data on the gender and marital status of 200 customers.
            Male    Female
Single       20       30
Married     100       50
Answers:
g. They are not independent because P(male)= 0.6, P(male | single) = 0.4
UNIT - 12: BAYES' THEOREM
Contents
12.0 Objectives
12.1 Introduction
12.3 Exercise
12.4 Summary
12.0 OBJECTIVES
Understand the concepts of prior probability, posterior probability and likelihood used in
Bayes' theorem.
12.1 INTRODUCTION
The concept of conditional probability discussed earlier takes into account information
about the occurrence of one event to predict the probability of another event. This concept can
be extended to revise probabilities based on new information and to determine the probability
that a particular effect was due to a specific cause. The procedure for revising these probabilities
is known as Bayes' theorem. The principle is due to Thomas Bayes and was published in 1763.
By this principle, starting from assumed prior probabilities and the likelihoods, the posterior
probabilities are obtained.
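If $E_1, E_2, \dots, E_n$ are mutually exclusive and exhaustive events with $P(E_i) > 0$, and A is
any event with $P(A) > 0$, then
$P(E_i/A) = \frac{P(E_i) \cdot P(A/E_i)}{\sum_{j=1}^{n} P(E_j) \cdot P(A/E_j)}; \quad i = 1, 2, \dots, n$
Proof: Since $A \subset \bigcup_{i=1}^{n} E_i$, we have $A = A \cap \left(\bigcup_{i=1}^{n} E_i\right) = \bigcup_{i=1}^{n} (A \cap E_i)$.
The events $A \cap E_i$ are mutually exclusive, so
$P(A) = P\left(\bigcup_{i=1}^{n} (A \cap E_i)\right) = \sum_{i=1}^{n} P(A \cap E_i) = \sum_{i=1}^{n} P(E_i) \cdot P(A/E_i)$ ... (1)
The conditional probability of an event $E_i$ given that A has already occurred is given
by
$P(E_i/A) = \frac{P(A \cap E_i)}{P(A)} = \frac{P(E_i) \cdot P(A/E_i)}{P(A)}$ ... (2)
Substituting (1) in (2),
$P(E_i/A) = \frac{P(E_i) \cdot P(A/E_i)}{\sum_{j=1}^{n} P(E_j) \cdot P(A/E_j)}$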
12.2.1 Remark:
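Example 1: Suppose a factory has two machines. Past records show that machine 1 produces
30% of the items of output and machine 2 produces 70% of the items. Further, 5% of the items
produced by machine 1 were defective and only 1% produced by machine 2 were defective. If
a defective item is drawn at random, what is the probability that it was produced
by machine 1? By machine 2?
Solution: Let $E_1$ and $E_2$ denote the events that the item was produced by machine 1 and
machine 2 respectively, and let A denote the event that the item is defective. Then
$P(E_1) = 0.30$, $P(E_2) = 0.70$, $P(A/E_1) = 0.05$ and $P(A/E_2) = 0.01$.
$P(A) = P(E_1) \cdot P(A/E_1) + P(E_2) \cdot P(A/E_2) = 0.30 \times 0.05 + 0.70 \times 0.01 = 0.022$
$P(E_1/A) = \frac{P(E_1) \cdot P(A/E_1)}{P(A)} = \frac{0.30 \times 0.05}{0.022} = 0.682$.
$P(E_2/A) = \frac{P(E_2) \cdot P(A/E_2)}{P(A)} = \frac{0.70 \times 0.01}{0.022} = 0.318$.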
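Bayes' theorem translates directly into a short routine: multiply each prior by its likelihood, then divide by the total. The sketch below is a minimal illustration; the function name posterior and its argument names are assumptions introduced here, and exact fractions are used so that no rounding enters the calculation.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return the posterior probabilities P(E_i / A).

    priors      -- P(E_i) for mutually exclusive and exhaustive events E_i
    likelihoods -- P(A / E_i), in the same order
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(E_i) * P(A / E_i)
    total = sum(joint)                                    # P(A), total probability
    if total == 0:
        raise ValueError("P(A) is zero, so the posterior is undefined")
    return [j / total for j in joint]

# Two-machine example: priors 0.30 and 0.70, defect rates 0.05 and 0.01.
priors = [Fraction("0.30"), Fraction("0.70")]
likelihoods = [Fraction("0.05"), Fraction("0.01")]
post = posterior(priors, likelihoods)

print(post, [round(float(p), 3) for p in post])
```

For the two machines the exact posteriors are 15/22 and 7/22, which round to 0.682 and 0.318. Machine 1 produces fewer items but is responsible for most of the defectives.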
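Example 2: The probabilities of X, Y and Z becoming managers are $\frac{4}{9}$, $\frac{2}{9}$ and $\frac{1}{3}$ respectively.
The probabilities that a bonus scheme will be introduced if X, Y and Z become managers are
$\frac{3}{10}$, $\frac{1}{2}$ and $\frac{4}{5}$ respectively. If the bonus scheme has been introduced, then what is the probability
that the manager appointed was X?
Solution: Let $E_1, E_2, E_3$ denote the events that X, Y and Z become managers respectively and
A denote the event that the bonus scheme is introduced.
$P(E_1) = \frac{4}{9}, \quad P(E_2) = \frac{2}{9}, \quad P(E_3) = \frac{1}{3}$
$P(A/E_1) = \frac{3}{10}, \quad P(A/E_2) = \frac{1}{2}, \quad P(A/E_3) = \frac{4}{5}$
$P(A) = P(E_1 \cap A) + P(E_2 \cap A) + P(E_3 \cap A)$
$= P(E_1) \cdot P(A/E_1) + P(E_2) \cdot P(A/E_2) + P(E_3) \cdot P(A/E_3)$
$= \frac{4}{9} \cdot \frac{3}{10} + \frac{2}{9} \cdot \frac{1}{2} + \frac{1}{3} \cdot \frac{4}{5} = \frac{23}{45}$
$P(E_1/A) = \frac{P(E_1) \cdot P(A/E_1)}{P(A)} = \frac{12/90}{23/45} = \frac{6}{23} \approx 0.26$.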
(b) Compare your answer with the one given at the end of this unit.
1. Suppose 5 men out of 100 and 25 women out of 10,000 are colour blind. A colour blind
person is chosen at random. What is the probability of the person being a male?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
2. A businessman goes to hotels X, Y and Z 20%, 50% and 30% of the time respectively.
It is known that 5%, 4% and 8% of the rooms in hotels X, Y and Z have faulty plumbing.
If the businessman is assigned a room with faulty plumbing, what is the probability that
the room is in hotel Z?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
3. A manufacturing company produces pipes in 2 plants I and II with daily production of
1500 and 2000 respectively. The fractions defective of the pipes produced by two
plants I and II are 0.006 and 0.008 respectively. If a pipe is selected at random from
the day’s production and found to be defective, what is the probability that it has come
from plant II?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
______________________________________________________________________
12.3 EXERCISE
1. In a factory, machine A produces 40% of the output and machine B produces 60%. On
average, 9 items in 1000 produced by A are defective and 1 item in 250 produced
by B is defective. An item drawn at random from a day’s output is defective. What is
the probability that it was produced by A? By B?
2. The first box contains 2 black, 3 red and 1 white balls; the second box contains 1 black,
1 red and 2 white balls; and the third box contains 5 black, 3 red and 4 white balls. A box is
selected at random and a ball is drawn at random from it. If the ball is red, find the
probability that it is from the second box.
Ans. $P(E_2/A) = \frac{1}{4} = 0.25$
3. Three members X, Y and Z of a private club have been nominated for the office of the
president. The probability that Mr. X will be elected is 0.3, the probability that Mr. Y
will be elected is 0.5 and the probability that Mr. Z will be elected is 0.2. Should Mr. X
be elected, the probability of an increase in the membership fee is 0.8. Should Y or Z be
elected, the corresponding probabilities of an increase in the fee are 0.2 and 0.3. If the fee has
been increased, what is the probability that
4. In 2019 there will be three candidates for the position of principal, A, B and C, whose
chances of getting the appointment are in the proportion 4:2:3 respectively. The
probability that A, if selected, would introduce coeducation in the college is 0.3. The
probabilities of B and C doing the same are 0.5 and 0.8 respectively. If there is coeducation
in the college in 2020, what is the probability that C is the principal?
Ans. $P(E_3/A) = \frac{12}{23}$
12.4 SUMMARY
In this unit we have introduced Bayes' theorem. The concept of conditional probability
takes into account the information about the occurrence of one event to predict the probability
of another event. Bayes' theorem is used to determine the probability of some event A given
that another event B has been observed.
The theorem states that if $E_1, E_2, \dots, E_n$ are n mutually exclusive and exhaustive events
in a sample space S such that $P(E_i) > 0$ (i = 1, 2, ..., n), and A is any other event in S intersecting
with every $E_i$ such that $P(A) > 0$, then
$P(E_i/A) = \frac{P(A/E_i) \cdot P(E_i)}{\sum_{j=1}^{n} P(A/E_j) \cdot P(E_j)}$
12.5 CHECK YOUR PROGRESS - MODEL ANSWERS
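1. The probability that the chosen person is male, $P(M) = \frac{1}{2}$.
Probability that the chosen person is female, $P(F) = \frac{1}{2}$.
Given that $P(A/M) = \frac{5}{100} = 0.05$, $P(A/F) = \frac{25}{10000} = 0.0025$
$P(A) = P(M) \cdot P(A/M) + P(F) \cdot P(A/F) \approx 0.026$
P(M/A) = 0.95
2. $P(X) = \frac{2}{10}, \quad P(Y) = \frac{5}{10}, \quad P(Z) = \frac{3}{10}$
$P(E/X) = \frac{1}{20}, \quad P(E/Y) = \frac{1}{25}, \quad P(E/Z) = \frac{2}{25}, \quad P(Z/E) = \frac{4}{9} = 0.44$
3. $P(E_1) = \frac{3}{7}, \quad P(E_2) = \frac{4}{7}, \quad P(A/E_1) = 0.006, \quad P(A/E_2) = 0.008, \quad P(E_2/A) = 0.64$.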
Section - II (ShortAnswers)
Dr. B. R. Ambedkar Open University
Faculty of Science
First Semester, I Year (3 Year Degree Programme)
MODEL EXAMINATION QUESTION PAPER
STATISTICS BS127STAT-E
DESCRIPTIVE STATISTICS & PROBABILITY
[Time: 3 hours] [Max. Marks: 80]
Section - A
(Short Answer Questions)
[Marks: 4 x 5 = 20]
1. [Block - I] State the preliminary steps you would take for planning a statistical enquiry.
Section - B
[Long Answer Questions]
[Marks: 4 x 10 = 40]
Note: (a) Answer the following questions.
(b) Each question carries 10 marks.
9. [Block - I] A company would like to conduct a survey to know about its customer’s
satisfaction. In this case which data (primary or secondary) will you recommend to the
company? What are the data collection methods you suggest to the company for data
collection ? Why? State the appropriate reasons.
OR
11. [Block - II] What are measures of central tendency? Explain their objectives and
functions.
OR
12. [Block - II] Explain the computation procedure of standard deviation in case of deviation
taken from the actual mean and deviation taken from an assumed mean for individual
data.
iii) The total of the numbers on the dice is any number from 2 to 12, both inclusive.
OR
14. [Block - III] If two dice are thrown, what is the probability that the sum is
15. [Block - IV] State and prove multiplication theorem of probability for two events and
also extend it for ‘n’ events.
OR
16. [Block - IV] The probability of an economic decline in the year 2000 is 0.23. There is
a probability of 0.64 that we will elect a Republican president in the year 2000. If we
elect a Republican president, there is a 0.35 probability of an economic decline. Let
“D” represent the event of an economic decline, and “R” represent the event of election
of a Republican president.
Section - C
(Objective Type Questions)
(Marks : 20)
Total number of questions 20 [=15 from (theory) and 5 from (practicals)].
II. Match the following. (5 marks)
1. The less than and the more than ogives are reflections in [ ] (a) $\bar{X}$
2. Quartile deviation [ ] (b) $x = \frac{3N}{4}$
3. $\frac{1}{N} \sum fx$ [ ] (c) negative
4. S.D. is never [ ] (d) $\frac{Q_3 - Q_1}{2}$
5. $\sigma^2$ is [ ] (e) $x = \frac{N}{4}$
(f) $x = \frac{N}{2}$
(g) variance
III. Fill in the blanks (5 marks)
1. Inequalities satisfied by AM, GM, HM of a data are ___________
2. The sum of deviations of data from its mean is ___________
3. Geometric mean of a data can be found only when the data values are ___________
4. A distribution is negatively skewed if ___________ (an inequality by median, mean).
5. x, y are negatively correlated. If x increases, then y ___________
***