
DESCRIPTIVE STATISTICS AND PROBABILITY

DESCRIPTIVE STATISTICS

Introduction

Statistics enters into almost every phase of life in some way. A daily news broadcast may start
with a weather forecast and end with an analysis of the stock market. Statistics provides a
systematic basis for investigation in many fields of knowledge, such as the social, physical,
engineering, medical and biological sciences, education, business and management. Information on
a topic is acquired in the form of numbers; these data are analysed in order to obtain a
better understanding of the phenomenon of interest, and some conclusions may be drawn. Often
generalizations are sought; their validity is assessed by further investigation.

Statistics is a discipline that deals with uncertainty. It consists of methods or techniques to
create information for making better decisions under uncertainty. There are methods for collecting,
analyzing, interpreting, and drawing conclusions from data. Data are observed as the outcomes of
experiments.

Data collection and Data Presentation

The totality of individuals under consideration is called the population and each individual in the
population is called a unit. A particular aspect about which we require information is called a
characteristic. Sometimes values for all individuals in the population of relevance are obtained,
but often only a set of individuals that can be considered representative of that population
is observed; such a set of individuals constitutes a sample.

Census Survey and Sample Survey

The method of collecting data from all the units of the population is called census survey or
complete enumeration and the method of collecting data from a sample is called sample survey.

Organising and Summarising Data

A researcher who has to deal with data works more efficiently if the data are presented
in a properly tabulated, easy-to-read form. This facilitates quick assimilation of data and
decision-making. However, the data needed for the purpose of analysis are generally not available in
a proper format. The analyst is, therefore, required to undertake the task of
organizing the data into a proper format.

Raw Data: The data in the form originally collected, completely devoid of arrangement by
size or sequence, are known as raw data. That is, the unorganized data are called raw data.
Raw data are not amenable even to simple reading and do not highlight any
characteristic or trend.

Frequency Distribution: A frequency distribution can be either grouped or ungrouped. If the
number of distinct values in a set of data is small, we can write the observed values in one column and
the corresponding numbers of repetitions (frequencies) in another column. This type of
frequency distribution is called an ungrouped frequency distribution. If the number of distinct values in
the set of data is large, we can group the values into different classes and count the number
of observations in each class. The distribution is then called a grouped
frequency distribution.
For example, the numbers of children in 20 households can be summarized as follows:
Number of    Frequency    Relative     Cumulative
Children                  Frequency    Frequency

0            2            0.1          2
1            4            0.2          6
2            7            0.35         13
3            5            0.25         18
4            2            0.1          20

The first and second columns together form an ungrouped frequency table. The third column
gives the relative frequency, which is obtained on dividing each frequency by the total
frequency. The last column gives the cumulative frequency.
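
As an illustration, the relative and cumulative frequency columns of the table above can be reproduced with a few lines of Python. This is only a sketch; the names `frequencies` and `summarize` are chosen here for illustration and are not part of the original notes.

```python
from itertools import accumulate

# Frequencies from the table above: number of children -> number of households.
frequencies = {0: 2, 1: 4, 2: 7, 3: 5, 4: 2}

def summarize(freq):
    """Return rows of (value, frequency, relative frequency, cumulative frequency)."""
    total = sum(freq.values())
    values = sorted(freq)
    cumulative = accumulate(freq[v] for v in values)
    return [(v, freq[v], freq[v] / total, c) for v, c in zip(values, cumulative)]

rows = summarize(frequencies)
for row in rows:
    print(row)
```

Each printed row corresponds to one line of the table: the value, its frequency, the frequency divided by the total (20), and the running total of frequencies.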

The following table gives the marks of 50 students in an examination. The data is
summarized in the form of a grouped frequency table.
Marks      Frequency

0 - 10     2
10 - 20    5
20 - 30    25
30 - 40    15
40 - 50    3

If the upper limit of a class is the same as the lower limit of the next class, then the distribution is
continuous. Here the upper limit of each class is not included in that class, so this type of classification
is called exclusive classification.

Classification of the form 0 - 10, 11 - 20, 21 - 30, … is called inclusive classification. The
difference between the upper limit and the lower limit of a class is known as the class width. In the
above table of marks, the class width is 10.

An inclusive type of classification can be converted to an exclusive type by making
some minor adjustments:
i) Find the difference between the upper limit of a class and the
lower limit of the next class and divide it by 2.
ii) Subtract the resulting quantity from all the
lower limits and add it to all the upper limits.
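
The two steps can be sketched in Python as follows. This is a minimal illustration that assumes the gap between consecutive classes is the same throughout the table; the function name `to_exclusive` and the sample classes are hypothetical.

```python
def to_exclusive(classes):
    """Convert inclusive class limits [(lower, upper), ...] to exclusive ones.

    Assumes at least two classes and an equal gap between consecutive classes.
    """
    # Step i: half the gap between one upper limit and the next lower limit.
    gap = classes[1][0] - classes[0][1]
    adjustment = gap / 2
    # Step ii: subtract from every lower limit, add to every upper limit.
    return [(lower - adjustment, upper + adjustment) for lower, upper in classes]

inclusive = [(1, 10), (11, 20), (21, 30)]
exclusive = to_exclusive(inclusive)
print(exclusive)
```

With a gap of 1 between classes, the adjustment is 0.5, so the class 1 - 10 becomes 0.5 - 10.5, and each upper limit now coincides with the next lower limit.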

Measures of Central Tendency (Averages)

After a set of data has been collected, it must be organized and condensed or categorized for
purposes of analysis. In addition to graphical displays, numerical indices can be computed
that summarize the primary features of the data set. One is an indicator of location or central
tendency that specifies where the set of measurements is “located”. That is, an average is a
value which is relatively close to all the observations and acts as their representative.
The commonly used averages are the Arithmetic Mean (AM), Median and Mode.

1. Arithmetic Mean

Arithmetic mean is defined as the sum of the values divided by the total number of values
in the data.
If x1, x2, …, xn are the values, then their AM, denoted by $\bar{X}$, is
\[ \bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \]
Example 1: The numbers of children in 10 families are: 5, 2, 2, 3, 1, 4, 3, 2, 1, 2.
Solution:
\[ \text{AM} = \frac{5 + 2 + 2 + 3 + 1 + 4 + 3 + 2 + 1 + 2}{10} = \frac{25}{10} = 2.5 \]

If the data is in the form of an ungrouped frequency table as follows:

Values : x1 x2 x3 … xn
Frequency: f1 f2 f3 … fn, then,
x1× f1 gives the sum of all observations with value x1, x2× f2 gives the sum of all
observations with value x2, …, xn× fn gives the sum of all observations with value xn. Therefore,
the sum of all observations will be x1× f1 + x2× f2 + … + xn× fn.

Arithmetic Mean,
\[ \bar{X} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_n f_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} \]


Example 2: Find the arithmetic mean of the following distribution:

x:  1   2   3    4    5    6    7
f:  5   9   12   17   14   10   6

Solution
x f f×x

1 5 5
2 9 18
3 12 36
4 17 68
5 14 70
6 10 60
7 6 42

Total 73 299

\[ \bar{X} = \frac{\sum f x}{\sum f} = \frac{299}{73} = 4.096 \]
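
The calculation in Example 2 can be checked with a short Python sketch. The function name `frequency_mean` is chosen here for illustration only.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of an ungrouped frequency table: sum(x*f) / sum(f)."""
    total_frequency = sum(freqs)
    weighted_sum = sum(x * f for x, f in zip(values, freqs))
    return weighted_sum / total_frequency

# Data of Example 2.
values = [1, 2, 3, 4, 5, 6, 7]
freqs = [5, 9, 12, 17, 14, 10, 6]

mean = frequency_mean(values, freqs)
print(round(mean, 3))
```

The same function applies to a grouped table if the class mid-values are passed as `values`.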

Calculating the AM from a grouped frequency table

In a grouped frequency table we don’t know the actual values of the observations falling
in a class. We only know that the values of observations falling in a class lie between its lower
limit and upper limit. So, we cannot find the exact AM. To calculate an approximate AM
from a grouped table, we assume that the values of the observations falling in a class
are all equal to the mid-value of that class. We then treat the class mid-values as x-values
and use the formula for the ungrouped frequency table.

Example 3: Find the AM of the following data:

Daily wages (in Rs.):  50 - 60   60 - 70   70 - 80   80 - 90   90 - 100
No. of workers:        2         5         7         6         5

Solution:
Wages       Mid-value (x)   Frequency (f)   f × x

50 - 60     55              2               110
60 - 70     65              5               325
70 - 80     75              7               525
80 - 90     85              6               510
90 - 100    95              5               475

Total                       25              1945

\[ \bar{X} = \frac{\sum f x}{\sum f} = \frac{1945}{25} = 77.8 \]

Short-cut Method to find AM


If the values of x are very large, the calculation of AM becomes time consuming. Let the
mid-values of k classes be x1, x2, …, xk and f1, f2, …, fk be the corresponding frequencies. We
use the transformation
\[ u_i = \frac{x_i - A}{C}, \quad i = 1, 2, \dots, k. \]
Here A and C can be any two numbers (with C non-zero). But it is better to take A as a number among the middle
part of the mid-values. If all the classes are of equal width, C can be taken as the class width.
Then
\[ \bar{X} = A + C\,\bar{u}, \quad \text{where } \bar{u} = \frac{\sum f u}{\sum f}. \]

Example 4: Find the AM of the following data:

Marks:            0 - 10   10 - 20   20 - 30   30 - 40   40 - 50   50 - 60
No. of students:  12       18        27        20        17        6

Solution: Take A = 35 and C = 10, so that $u_i = \frac{x_i - 35}{10}$.

Marks      Mid-value (x)   u_i    Frequency (f)   f × u

0 - 10     5               -3     12              -36
10 - 20    15              -2     18              -36
20 - 30    25              -1     27              -27
30 - 40    35              0      20              0
40 - 50    45              1      17              17
50 - 60    55              2      6               12

Total                             100             -70

\[ \bar{u} = \frac{\sum f u}{\sum f} = \frac{-70}{100} = -0.7 \]
\[ \bar{X} = A + C\,\bar{u} = 35 + 10 \times (-0.7) = 35 - 7 = 28 \]
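
A small Python sketch can confirm that the short-cut method gives the same answer as the direct formula for the data of Example 4. The function name `shortcut_mean` is illustrative, not part of the original notes.

```python
def shortcut_mean(mid_values, freqs, A, C):
    """Mean via the transformation u = (x - A) / C, then X = A + C * mean(u)."""
    u = [(x - A) / C for x in mid_values]
    u_bar = sum(f * ui for f, ui in zip(freqs, u)) / sum(freqs)
    return A + C * u_bar

# Data of Example 4: class mid-values and frequencies.
mid_values = [5, 15, 25, 35, 45, 55]
freqs = [12, 18, 27, 20, 17, 6]

shortcut = shortcut_mean(mid_values, freqs, A=35, C=10)
direct = sum(x * f for x, f in zip(mid_values, freqs)) / sum(freqs)
print(shortcut, direct)
```

Both values agree (up to floating-point rounding) whatever A and non-zero C are chosen, which is why A and C can be picked purely for convenience.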

Properties of AM

1. The algebraic sum of deviations of a set of values from their AM is zero.
2. The sum of squares of deviations of a set of values is minimum when the deviations are
taken about the AM.

Combined Mean of Two Groups

Let $\bar{x}_1$ and $\bar{x}_2$ be the means of two groups. Let there be n1 observations in the first group
and n2 observations in the second group. Then $\bar{x}$, the mean of the combined group, can be
obtained as
\[ \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} \]

Example 5: Average daily wage of 60 male workers in a firm is Rs. 120 and that of 40
females is Rs.100. Find the mean wage of all the workers.

Solution: Here n1 = 60, $\bar{x}_1$ = 120 and n2 = 40, $\bar{x}_2$ = 100.
\[ \text{Combined Mean} = \frac{60 \times 120 + 40 \times 100}{60 + 40} = \frac{11200}{100} = 112 \]

Weighted AM

When calculating AM we assume that all the observations have equal importance. If some
items are more important than others, proper weightage should be given in accordance with
their importance. Let w1, w2, …, wn be the weights attached to the items x1, x2, …, xn, then the
weighted AM is defined as

\[ \text{Weighted mean} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} \]

Example 6: A teacher has decided to use a weighted average in figuring final grades for his
students. The midterm examination will count 40%, the final examination will count 50% and
quizzes 10%. Compute the average mark obtained for a student who got 90 marks for
midterm examination, 80 marks for final and 70 for quizzes.

Solution: Here w1 = 40, x1 = 90
w2 = 50, x2 = 80
w3 = 10, x3 = 70
\[ \text{Weighted mean} = \frac{40 \times 90 + 50 \times 80 + 10 \times 70}{40 + 50 + 10} = \frac{8300}{100} = 83 \]
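
The weighted mean is straightforward to compute in code. The sketch below (with the illustrative name `weighted_mean`) reproduces Example 6; it also shows that the combined mean of two groups in Example 5 is the same calculation with the group sizes used as weights.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w*x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 6: marks weighted by the share of each component in the final grade.
final_grade = weighted_mean([90, 80, 70], [40, 50, 10])

# Example 5: group means weighted by group sizes give the combined mean.
combined_wage = weighted_mean([120, 100], [60, 40])

print(final_grade, combined_wage)
```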

2. Median
The median of a set of observations is a value that divides the set of observations in half, so that
the observations in one half are less than or equal to the median and the observations in the other
half are greater than or equal to the median value.

In finding the median of a set of data it is often convenient to put the observations in ascending or
descending order. If the number of observations is odd, the median is the middle observation. For
example, if the values are 52, 55, 61, 67, and 72, the median is 61. If there were 4 values instead
of 5, say 52, 55, 61, and 67, there would not be a middle value. Here any number between 55 and
61 could serve as a median; but it is desirable to use a specific number for the median and we
usually take the AM of the two middle values, i.e., (55 + 61)/2 = 58.

The median is the primary measure of location for variables measured on an ordinal scale because it
indicates which observation is central without attention to how far above or below the median the
other observations fall.

Example 7: Find the median of 10, 2, 4, 8, 5, 1, 7

Solution: Observations in ascending order of magnitude are 1, 2, 4, 5, 7, 8, 10


Here there are 7 observations, so median is the 4th observation.
That is, median = 5

Median for a grouped frequency distribution

In a grouped frequency distribution, we do not know the exact values falling in each class.
So, the median can only be approximated, by interpolation. Let the total number of observations be N.
For calculating the median we assume that the observations in the median class are uniformly
distributed. The median class is the class to which the (N/2)th observation belongs. We also take the
median to be the (N/2)th observation.

Here the frequency table must be continuous. If it is not, convert it into a continuous table.
Prepare a less-than cumulative frequency table and find the median class. Let ‘l’ be the lower
limit of the median class, ‘f’ the frequency of the median class, and ‘c’ the class width of the
median class. By the assumption of uniform distribution, the ‘f’ observations in the median
class are $l + \frac{c}{f},\ l + \frac{2c}{f},\ \dots,\ l + \frac{fc}{f}$. Let ‘m’ be the cumulative frequency of the class preceding the
median class. Then the median is the $\left(\frac{N}{2} - m\right)$th observation in the median class. That is,
\[ \text{Median} = l + \left(\frac{N}{2} - m\right)\frac{c}{f} \]

Example 8: Calculate the median of the following data:
class frequency

1 - 10 4
11 - 20 12
21 - 30 24

31 - 40 36
41 - 50 20
51 - 60 16
61 - 70 8
71 - 80 5

Solution: Since the frequency table is of the inclusive type, convert it into the exclusive type by subtracting 0.5
from the lower limits and adding 0.5 to the upper limits.

Class          Frequency    Cumulative frequency

0.5 - 10.5     4            4
10.5 - 20.5    12           16
20.5 - 30.5    24           40
30.5 - 40.5    36           76
40.5 - 50.5    20           96
50.5 - 60.5    16           112
60.5 - 70.5    8            120
70.5 - 80.5    5            125

Here $\frac{N}{2} = \frac{125}{2} = 62.5$, which lies in the 30.5 - 40.5 class (the median class). So, l =
30.5, f = 36, m = 40 and c = 10.
\[ \text{Median} = l + \left(\frac{N}{2} - m\right)\frac{c}{f} = 30.5 + (62.5 - 40)\,\frac{10}{36} = 36.75 \]
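
The interpolation formula, median = l + (N/2 - m)c/f, can be sketched in Python as below. The function name `grouped_median` is illustrative; the sketch assumes a continuous table with classes of equal width listed in ascending order.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a continuous grouped table by linear interpolation."""
    N = sum(freqs)
    half = N / 2
    cumulative = 0  # cumulative frequency of the classes before the current one (m)
    for l, f in zip(lower_limits, freqs):
        if cumulative + f >= half:  # the (N/2)th observation lies in this class
            return l + (half - cumulative) * width / f
        cumulative += f
    raise ValueError("empty frequency table")

# Data of Example 8 after conversion to exclusive classes.
lower_limits = [0.5, 10.5, 20.5, 30.5, 40.5, 50.5, 60.5, 70.5]
freqs = [4, 12, 24, 36, 20, 16, 8, 5]

median = grouped_median(lower_limits, 10, freqs)
print(median)
```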

Property of Median: The sum of absolute deviations of a set of values is minimum when the
deviations are taken from the median.

3. Mode

The mode of a categorical or a discrete numerical variable is that category or value which occurs
with the greatest frequency.

Example 8: The mode of the data 2, 5, 4, 4, 7, 8, 3, 4, 6, 4, 3 is 4 because 4 occurs the greatest
number of times.

Mode of a grouped frequency distribution

In a grouped frequency distribution, to find the mode, first locate the modal class. The modal
class is the class with maximum frequency. Let l be the lower limit of the modal class, c be the
class width, f1 be the frequency of the modal class, f0 be the frequency of the class preceding
and f2 be the frequency of the class succeeding the modal class. Then,
\[ \text{Mode} = l + \frac{c\,(f_1 - f_0)}{2f_1 - f_0 - f_2} \]

Example 9: Find the mode of the distribution given below

Class      Frequency

10 - 15    3
15 - 20    9
20 - 25    16
25 - 30    12
30 - 35    7
35 - 40    5
40 - 45    2

Solution: Here the modal class is the class 20 - 25. That is,
l = 20, c = 5, f0 = 9, f1 = 16 and f2 = 12.
\[ \text{Mode} = l + \frac{c\,(f_1 - f_0)}{2f_1 - f_0 - f_2} = 20 + \frac{5\,(16 - 9)}{32 - 9 - 12} = 20 + \frac{35}{11} = 23.18 \]
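
For reference, the following Python sketch computes the mode of a grouped table using the widely used interpolation formula l + c(f1 - f0)/(2f1 - f0 - f2), where f0 and f2 are the frequencies of the classes preceding and succeeding the modal class. The function name `grouped_mode` is illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of a grouped table: l + c*(f1 - f0) / (2*f1 - f0 - f2)."""
    i = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # class preceding the modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # class succeeding the modal class
    # The denominator is zero only if f0 = f1 = f2; that case is not handled here.
    return lower_limits[i] + width * (f1 - f0) / (2 * f1 - f0 - f2)

# Data of Example 9.
lower_limits = [10, 15, 20, 25, 30, 35, 40]
freqs = [3, 9, 16, 12, 7, 5, 2]

mode = grouped_mode(lower_limits, 5, freqs)
print(round(mode, 2))
```

Because the class after the modal class (frequency 12) is larger than the class before it (frequency 9), the interpolated mode falls in the upper half of the class 20 - 25.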

4. Quartiles, Deciles and Percentiles

The median, as has been indicated, is a locational average which divides the frequency distribution
into two equal parts. Quartiles, deciles and percentiles are not averages. They are partition
values, which divide the distribution into a certain number of equal parts.

Quartiles

Quartiles are the values which divide a frequency distribution into four equal parts so that
25% of the data fall below the first quartile (Q1), 50% below the second quartile (Q2), and
75% below the third quartile (Q3). The values of Q1 and Q3 can be found as in the case of
Q2 (Median). For raw data arranged in ascending order, Q1 is the (n/4)th observation and Q3 is the (3n/4)th observation.

For a grouped table,
\[ Q_1 = l_1 + \left(\frac{N}{4} - m_1\right)\frac{c_1}{f_1} \]
where N is the total frequency, l1 is the lower limit of the first quartile class (the class
to which the (N/4)th observation belongs), m1 is the cumulative frequency of the class preceding the
first quartile class, f1 is the frequency of the first quartile class and c1 is the width of the first
quartile class. Similarly,
\[ Q_3 = l_3 + \left(\frac{3N}{4} - m_3\right)\frac{c_3}{f_3} \]
where l3 is the lower limit of the third quartile class (the class to which the (3N/4)th observation
belongs), m3 is the cumulative frequency of the class preceding the third quartile class, f3 is the
frequency of the third quartile class and c3 is the width of the third quartile class.

Deciles and Percentiles

Deciles are nine in number and divide the frequency distribution into 10 equal parts.
Percentiles are 99 in number and divide the frequency distribution into 100 equal
parts.

Selecting the Most Appropriate Measure of Central Tendency

Generally speaking, in analyzing the distribution of a variable only one of the possible
measures of central tendency would be used. Its selection is largely a matter of judgment
based upon the kind of data, the aspect of the data to be examined, and the research question.
Some of the points that must be considered are the following.

Central tendency for interval data is generally represented by the A.M., which takes into
account the available information about distances between scores. For ranked (ordinal) data,
the median is generally most appropriate, and for nominal data, the mode.
If the distribution is badly skewed, one may prefer the median to the mean, because the
median would not be affected as much by unusual extreme scores. For this reason, for
example, the median income of people is usually reported rather than the A.M.
If one is interested in prediction, the mode is the best value to predict if an exact score in a
group has to be picked.

Measures of Dispersion

So far we have discussed averages as sample values used to represent data. But the average
cannot describe the data completely.
Consider two sets of data:
Set I:  5, 10, 15, 20, 25
Set II: 15, 15, 15, 15, 15
Here we observe that both sets have the same mean, 15. But in set I, the observations are more
scattered about the mean. This shows that, even though they have the same mean, the two sets
differ. This reveals the necessity of introducing measures of dispersion.

A measure of dispersion is defined as a measure of the scatter of observations about an average.

Commonly used measures of dispersion are Range, Mean deviation, Standard deviation, and
quartile deviation.

1. Range
Range of a set of observations is the difference between the largest and the smallest
observations. In the case of grouped frequency table, range is the difference between the upper
bound of last class and the lower bound of the first class.

Example 1: The range of the set of data 9, 12, 25, 42, 45, 62, 65 is 65 – 9 = 56

Range is the simplest measure of dispersion but its demerit is that it depends only on the extreme
values.

2. Mean deviation about the Mean:

You have seen that range is a measure of dispersion, which does not depend on all observations.
Let us think about another measure of dispersion, which will depend on all observations.

One measure of dispersion that you may suggest now is the sum of the deviations of observations
from mean. But we know that the sum of deviations of observations from the A.M is always
zero. So we cannot take the sum of deviations of observations from the mean as a measure.

One method to overcome this is to take the sum of absolute values of these deviations. But if we
have two sets with different numbers of observations this cannot be justified. To make it
meaningful we will take the average of the absolute deviations. Thus mean deviation (MD) about
the mean is the mean of the absolute deviations of observations from the arithmetic mean. If x1, x2,
…, xn are n observations, then,
\[ \text{MD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}| \]

Example 2: Find the MD for the following data: 12, 15, 21, 24, 28
Solution:
\[ \bar{x} = \frac{12 + 15 + 21 + 24 + 28}{5} = \frac{100}{5} = 20 \]

x        |x_i - x̄|

12       8
15       5
21       1
24       4
28       8

Total    26

\[ \text{MD} = \frac{26}{5} = 5.2 \]
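
The mean deviation about the mean is easy to verify in code. The following Python sketch (the function name `mean_deviation` is illustrative) reproduces Example 2.

```python
def mean_deviation(data):
    """Mean of the absolute deviations of the observations from their AM."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)

data = [12, 15, 21, 24, 28]
md = mean_deviation(data)
print(md)
```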

Mean deviation about mean for a frequency table


Let x1, x2, …, xn be the values and f1, f2, …, fn be the corresponding frequencies. Let N be the
sum of the frequencies. Then,
\[ \text{MD} = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - \bar{x}| \]

In the case of a grouped frequency table, take the mid-values as x-values and use the same
method given above.

Example 3: Find the mean deviation of the heights of 100 students given below:
Height in cm    Frequency

160 - 162       5
163 - 165       18
166 - 168       42
169 - 171       27
172 - 174       8

Solution:
Height in cm    Mid-value (x)   Frequency (f)   fx       |x_i - x̄|   f_i |x_i - x̄|

160 - 162       161             5               805      6.45         32.25
163 - 165       164             18              2952     3.45         62.10
166 - 168       167             42              7014     0.45         18.90
169 - 171       170             27              4590     2.55         68.85
172 - 174       173             8               1384     5.55         44.40

Total                           100             16745                 226.50

\[ \bar{x} = \frac{16745}{100} = 167.45 \]
\[ \text{MD} = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - \bar{x}| = \frac{226.5}{100} = 2.265 \]

3. Variance and Standard Deviation

When we take the deviations of the observations from their A.M., both positive and negative
values occur. For defining the mean deviation we took absolute values of the deviations. Another
method to avoid this problem is to take the squares of the deviations. So, variance is the
mean of the squares of the deviations from the A.M. The positive square root of the variance is called the standard
deviation.
If x1, x2, …, xn are n observations, then
\[ \text{Variance} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \]
and the standard deviation (SD) is defined as
\[ \text{SD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Example 4: Find the variance and standard deviation of the following data:
42, 39, 44, 40, 36, 39, 30, 46, 48, 36
Solution: Arithmetic mean, $\bar{X} = \frac{400}{10} = 40$
\[ \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{10}\left[(42 - 40)^2 + (39 - 40)^2 + \dots + (36 - 40)^2\right] = \frac{254}{10} = 25.4 \]
Variance = 25.4
S.D = $\sqrt{25.4}$ = 5.04
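
The definitions above divide by n, so they correspond to what Python's standard library calls the population variance (`statistics.pvariance`). The sketch below computes the variance directly from the definition and cross-checks it against the library; the function name `variance` is illustrative.

```python
import math
import statistics

def variance(data):
    """Mean of squared deviations from the arithmetic mean (divides by n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

data = [42, 39, 44, 40, 36, 39, 30, 46, 48, 36]
var = variance(data)
sd = math.sqrt(var)
library_var = statistics.pvariance(data)  # standard-library population variance
print(var, round(sd, 2))
```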

Variance and Standard deviation for a frequency table

Let x1, x2, …, xn be the values and f1, f2, …, fn be the corresponding frequencies. Let N be the
sum of the frequencies. Then,
\[ \text{Variance} = \frac{1}{N}\sum_{i=1}^{n} f_i\,(x_i - \bar{x})^2 \]
and
\[ \text{Standard deviation} = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i\,(x_i - \bar{x})^2} \]

The above formula for the variance can also be expressed as
\[ \text{Variance} = \frac{1}{N}\sum f_i x_i^2 - \bar{X}^2 \]
In the case of a grouped frequency table, take the mid-values as x-values and use the same method
given above.

Example 5: Find the variance and standard deviation of the following data:
Class      Frequency

0 - 10     3
10 - 20    4
20 - 30    6
30 - 40    10
40 - 50    7

Solution:
Class      Mid-value (x)   Frequency (f)   fx     fx²

0 - 10     5               3               15     75
10 - 20    15              4               60     900
20 - 30    25              6               150    3750
30 - 40    35              10              350    12250
40 - 50    45              7               315    14175

Total                      30              890    31150

\[ \text{Variance} = \frac{1}{N}\sum f_i x_i^2 - \bar{X}^2 \]
Here N = 30, $\bar{X} = \frac{890}{30} = 29.67$ and $\sum f_i x_i^2 = 31150$.
\[ \text{Variance} = \frac{31150}{30} - \left(\frac{890}{30}\right)^2 = 1038.33 - 880.11 = 158.22 \]
Standard deviation = $\sqrt{158.22}$ = 12.58

Short-cut method to find standard deviation

If the values of x are very large, the calculation of SD becomes time consuming. Let the
mid-values of k classes be x1, x2, …, xk and f1, f2, …, fk be the corresponding frequencies.
We use the transformation
\[ u_i = \frac{x_i - A}{C}, \quad i = 1, 2, \dots, k. \]
Here A and C can be any two numbers (with C non-zero). But it is better to take A as a number among the middle
part of the mid-values. If all the classes are of equal width, C can be taken as the class width.
The variance of the ui’s is
\[ \text{Var}(u) = \frac{1}{N}\sum f_i u_i^2 - \bar{u}^2 \]
Then the variance of the xi’s is Var(x) = C² × Var(u).
That is, SD(x) = C × SD(u)

Example 6: Consider the problem in Example 5. Let us find the SD using the
short-cut method.
Solution: Take A = 25 and C = 10, so that $u_i = \frac{x_i - 25}{10}$.

Class      Mid-value (x)   u_i    Frequency (f)   fu    fu²

0 - 10     5               -2     3               -6    12
10 - 20    15              -1     4               -4    4
20 - 30    25              0      6               0     0
30 - 40    35              1      10              10    10
40 - 50    45              2      7               14    28

Total                             30              14    54

\[ \bar{u} = \frac{\sum f u}{N} = \frac{14}{30} = 0.4667, \quad \sum f_i u_i^2 = 54, \quad N = 30 \]
\[ \text{Var}(u) = \frac{54}{30} - \left(\frac{14}{30}\right)^2 = 1.8 - 0.2178 = 1.5822 \]
Var(x) = 10² × 1.5822 = 158.22

SD(x) = $\sqrt{158.22}$ = 12.58
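
The equivalence of the direct and short-cut calculations can be checked numerically. In the Python sketch below, the function names `grouped_variance` and `shortcut_variance` are illustrative. Because the code keeps full precision for the means, its result may differ in the second decimal place from a hand calculation that rounds intermediate values.

```python
import math

def grouped_variance(values, freqs):
    """Variance of a frequency table: sum(f*x^2)/N - mean^2."""
    N = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / N
    return sum(f * x * x for x, f in zip(values, freqs)) / N - mean ** 2

def shortcut_variance(values, freqs, A, C):
    """Variance via u = (x - A)/C, using Var(x) = C^2 * Var(u)."""
    u = [(x - A) / C for x in values]
    return C ** 2 * grouped_variance(u, freqs)

# Mid-values and frequencies of Example 5.
mid_values = [5, 15, 25, 35, 45]
freqs = [3, 4, 6, 10, 7]

direct = grouped_variance(mid_values, freqs)
shortcut = shortcut_variance(mid_values, freqs, A=25, C=10)
sd = math.sqrt(direct)
print(round(direct, 2), round(shortcut, 2), round(sd, 2))
```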

Combined Variance
If there are two sets of data consisting of n1 and n2 observations with $s_1^2$ and $s_2^2$ as their
respective variances, then the variance of the combined set consisting of n1 + n2 observations
is
\[ S^2 = \frac{n_1 (s_1^2 + d_1^2) + n_2 (s_2^2 + d_2^2)}{n_1 + n_2} \]
where d1 and d2 are the differences of the means, $\bar{x}_1$ and $\bar{x}_2$, from the combined mean $\bar{x}$
respectively.

Example 7: Find the combined standard deviation of two series A and B

                      A      B
Mean                  50     40
Standard deviation    5      6
No. of items          100    150

Solution:

Given $\bar{x}_1 = 50$ and $\bar{x}_2 = 40$, $s_1^2 = 25$ and $s_2^2 = 36$, n1 = 100 and n2 = 150.
\[ \text{Combined mean, } \bar{x} = \frac{100 \times 50 + 150 \times 40}{100 + 150} = 44 \]
$d_1 = \bar{x}_1 - \bar{x} = 50 - 44 = 6$, and $d_2 = \bar{x}_2 - \bar{x} = 40 - 44 = -4$
\[ \text{Combined variance} = \frac{100\,(25 + 36) + 150\,(36 + 16)}{100 + 150} = \frac{13900}{250} = 55.6 \]
Therefore, combined SD = $\sqrt{55.6}$ = 7.46
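
The combined variance formula can be sketched in Python as follows; the function name `combined_variance` and its argument names are illustrative.

```python
import math

def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Variance of two pooled groups from their sizes, means and variances."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean  # deviation of the first group mean
    d2 = mean2 - combined_mean  # deviation of the second group mean
    return (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / (n1 + n2)

# Series A and B of Example 7 (variances are the squares of the given SDs).
var_ab = combined_variance(100, 50, 5 ** 2, 150, 40, 6 ** 2)
sd_ab = math.sqrt(var_ab)
print(var_ab, round(sd_ab, 2))
```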

4. Quartile Deviation
Quartile deviation (semi-interquartile range) is one-half of the difference between the third
quartile and the first quartile. That is,
\[ \text{Quartile deviation, Q.D} = \frac{Q_3 - Q_1}{2} \]

Example 8: Estimate an appropriate measure of dispersion for the following data:


Income (Rs.)    No. of persons

Less than 50    54
50 - 70         100
70 - 90         140
90 - 110        300
110 - 130       230
130 - 150       125
Above 150       51

Total           1000

Solution:
Since the data have open-ended classes, Q.D would be a suitable measure.

Income (Rs.)    No. of persons (f)   Cumulative frequency

Less than 50    54                   54
50 - 70         100                  154
70 - 90         140                  294
90 - 110        300                  594
110 - 130       230                  824
130 - 150       125                  949
Above 150       51                   1000

Total           1000

\[ Q_1 = l_1 + \left(\frac{N}{4} - m_1\right)\frac{c_1}{f_1}, \qquad Q_3 = l_3 + \left(\frac{3N}{4} - m_3\right)\frac{c_3}{f_3} \]
Here N = 1000, $\frac{N}{4} = 250$ and $\frac{3N}{4} = 750$.
The class 70 - 90 is the first quartile class and 110 - 130 is the third quartile class.

l1 = 70, m1 = 154, c1 = 20, f1 = 140

l3 = 110, m3 = 594, c3 = 20, f3 = 230

\[ Q_1 = 70 + (250 - 154)\,\frac{20}{140} = 83.71 \]
\[ Q_3 = 110 + (750 - 594)\,\frac{20}{230} = 123.57 \]
\[ \text{Q.D} = \frac{123.57 - 83.71}{2} = \text{Rs. } 19.93 \]
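
The quartile calculation can be sketched in Python using the interpolation formula Q = l + (pN - m)c/f with p = 1/4 or 3/4. The function name `grouped_quantile` and the use of `None` to mark an unknown limit of an open-ended class are assumptions of this sketch. The code keeps full precision, so its results may differ slightly in the last digit from values rounded by hand.

```python
def grouped_quantile(classes, p):
    """Interpolated quantile of a grouped table.

    classes: list of (lower, upper, frequency); a limit may be None for an
    open-ended class. p is the fraction of observations below the quantile.
    """
    N = sum(f for _, _, f in classes)
    target = p * N
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cumulative + f >= target:
            if lower is None or upper is None:
                raise ValueError("quantile falls in an open-ended class")
            return lower + (target - cumulative) * (upper - lower) / f
        cumulative += f
    raise ValueError("empty frequency table")

# Income data of the example above; the first and last classes are open-ended.
income = [(None, 50, 54), (50, 70, 100), (70, 90, 140), (90, 110, 300),
          (110, 130, 230), (130, 150, 125), (150, None, 51)]

q1 = grouped_quantile(income, 0.25)
q3 = grouped_quantile(income, 0.75)
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(qd, 2))
```

The open-ended classes never enter the calculation here because neither quartile falls in them, which is why the quartile deviation remains usable for such data.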

Relative Measures

The absolute measures of dispersion discussed above do not always facilitate comparison of two
or more data sets in terms of their variability. If the units of measurement of two or more sets of
data are the same, comparison between such sets of data is possible directly in terms of absolute
measures. But when the conditions for direct comparison are not met, the desired comparison can be made
in terms of relative measures.
The coefficient of variation is a relative measure of dispersion which expresses the standard deviation (σ)
as a percentage of the mean. That is,
\[ \text{Coefficient of variation, C.V} = \frac{\sigma}{\bar{x}} \times 100 \]
Another relative measure, in terms of quartiles, is the coefficient of quartile deviation, which
is defined as
\[ Q_r = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100 \]

Example 9: An analysis of the monthly wages paid to workers in two firms A and B, belonging to
the same industry, gives the following results:
                         Firm A    Firm B

Number of workers        586       648
Average monthly wage     52.5      47.5
Standard deviation       10        11

In which firm, A or B, is there greater variability in individual wages?

Solution:
\[ \text{Coefficient of variation for firm A} = \frac{10}{52.5} \times 100 \approx 19\% \]
\[ \text{Coefficient of variation for firm B} = \frac{11}{47.5} \times 100 \approx 23\% \]
There is greater variability in wages in firm B.
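
The comparison of the two firms can be reproduced with a short Python sketch; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv_a = coefficient_of_variation(10, 52.5)  # firm A
cv_b = coefficient_of_variation(11, 47.5)  # firm B
print(round(cv_a, 2), round(cv_b, 2))
```

Since the coefficient of variation is unit-free, the same comparison would remain valid even if the two firms reported wages in different currencies.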

SKEWNESS and KURTOSIS

1. Skewness

Very often it becomes necessary to have a measure that reveals the direction of dispersion about
the center of the distribution. Measures of dispersion indicate only the extent to which individual
values are scattered about an average. These do not give information about the direction of
scatter. Skewness refers to the direction of dispersion leading to departures from symmetry, or lack
of symmetry in a direction.

If the frequency curve of a distribution has longer tail to the right of the center of the distribution,
then the distribution is said to be positively skewed. On the other hand, if the distribution has a
longer tail to the left of the center of the distribution, then distribution is said to be negatively
skewed. Measures of skewness indicate the magnitude as well as the direction of skewness in a
distribution.

Empirical Relationship between Mean, Median and Mode

The relationship between these three measures depends on the shape of the frequency
distribution. In a symmetrical distribution the value of the mean, median and the mode is the
same. But as the distribution deviates from symmetry and tends to become skewed, the extreme
values in the data start affecting the mean.

In a positively skewed distribution, the presence of exceptionally high values affects the mean
more than those of the median and the mode. Consequently the mean is highest, followed, in a
descending order, by the median and the mode. That is, for a positively skewed distribution, Mean
> Median> Mode. In a negatively skewed distribution, on the other hand, the presence of
exceptionally low values makes the values of the mean the least, followed, in an ascending order,
by the median and the mode. That is, for a negatively skewed distribution, Mean < Median <
Mode.

Empirically, if the number of observations in any set of data is large enough to make its
frequency distribution smooth and moderately skewed, then, Mean – Mode = 3(Mean – Median)

Measures of Skewness

1. Karl Pearson’s measure of skewness: Prof. Karl Pearson developed this
measure from the fact that when a distribution drifts away from symmetry, its
mean, median and mode tend to deviate from each other.
Karl Pearson’s measure of skewness is defined as
\[ Sk_P = \frac{\text{Mean} - \text{Mode}}{\text{SD}} \]
2. Bowley’s measure of skewness: Developed by Prof. Bowley, this measure
of skewness is derived from quartile values.
It is defined as
\[ Sk_B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} \]

3. Moment measure of skewness:


If x1, x2, …, xn are n observations, then the rth moment about the mean is defined as
\[ m_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r \]
The moment measure of skewness is defined as $\beta_1 = m_3/(\text{SD})^3$.

In a perfectly symmetrical distribution $\beta_1 = 0$, and a greater or smaller absolute value of $\beta_1$
indicates a greater or smaller degree of skewness.

2. Kurtosis

Kurtosis refers to the degree of peakedness, or flatness, of the frequency curve. If the curve is
more peaked than the normal curve, the curve is said to be leptokurtic. If the curve is flatter
than the normal curve, the curve is said to be platykurtic. The normal
curve is also called mesokurtic. The moment measure of kurtosis is
\[ \beta_2 = \frac{m_4}{m_2^2} \]
The value of $\beta_2$ is 3 if the distribution is normal; more than 3 if the distribution is leptokurtic; and
less than 3 if the distribution is platykurtic.

Example 1: Given m2(variance) = 40, m3 = -100. Find a measure of skewness.

Solution:
Moment measure of skewness,
\[ \beta_1 = \frac{m_3}{(\text{SD})^3} = \frac{-100}{(\sqrt{40})^3} = -0.4 \]
Hence, there is negative skewness.

Example 2: The first four moments of a distribution about the mean are 0, 2.5, 0.7, and 18.75.
Comment on the kurtosis of the distribution.

Solution:
Moment measure of kurtosis is
\[ \beta_2 = \frac{m_4}{m_2^2} = \frac{18.75}{2.5^2} = 3 \]
So, the curve is normal.
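
The moment measures defined in this section can be sketched in Python as follows. The function names are illustrative; the skewness measure follows the definition used in these notes, m3/(SD)³, with SD taken as the square root of m2.

```python
def central_moment(data, r):
    """r-th moment about the mean: (1/n) * sum((x - mean)^r)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** r for x in data) / len(data)

def moment_skewness(m2, m3):
    """Moment measure of skewness: m3 / SD^3, where SD = sqrt(m2)."""
    return m3 / m2 ** 1.5

def moment_kurtosis(m2, m4):
    """Moment measure of kurtosis: m4 / m2^2."""
    return m4 / m2 ** 2

skew = moment_skewness(40, -100)    # Example 1
kurt = moment_kurtosis(2.5, 18.75)  # Example 2

# A symmetric data set has a zero third moment, hence zero skewness.
symmetric = [1, 2, 3, 4, 5]
skew_symmetric = moment_skewness(central_moment(symmetric, 2),
                                 central_moment(symmetric, 3))
print(round(skew, 2), kurt, skew_symmetric)
```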

PROBABILITY

Introduction

Each of us has some intuitive notion of what “probability” is. Everyday conversation is full of
references to it: “He is likely to win the game”, “He will probably be selected for the job”. The use
of the words ‘likely’ and ‘probably’ indicates that there is an element of uncertainty about these
statements. The theory of probability provides a numerical measure of the element of uncertainty.
It enables us to take decisions under uncertainty with a certain amount of risk.

Random Experiment
In science we come across phenomena which follow a certain pattern without fail. A stone dropped
from a cliff follows Newton’s laws of motion. But there are experiments whose results cannot be
predicted in advance.
A random experiment is an experiment which does not necessarily give the same result each time it is conducted under
identical conditions.

Examples:
1. Tossing a coin and observing the face that turns up
2. Rolling a die and observing the face that turns up

Sample Space, Outcomes and Events

The set S of all possible outcomes of a random experiment, together with a sigma field A
defined on S, is called a sample space (S, A).

Examples:
1. Consider the random experiment of tossing a coin and observing the face that turns up.
S = {H, T}, where H denotes Head and T denotes Tail.
2. Rolling a die and observing the face that turns up.
S = {1, 2, 3, 4, 5, 6}
An outcome of the experiment is an element of S, which is also known as a sample point.

An event is any subset of the sample space. In the example of tossing a coin, H and T are sample
points, but φ (null event), {H}, {T} and {H, T} (sure event) are events. The event φ is an impossible
event because it can never occur. But the event {H, T} is a sure event, which occurs in every trial.
An event A is said to have occurred in a trial if the outcome is a sample point which
belongs to A.

The set consisting of exactly one sample point is called an elementary event. For example, in the
experiment of throwing a die, {1}, {2}, {3}, {4}, {5}, and {6} are elementary events, but 1, 2, 3,
4, 5, and 6 are sample points. That is, elementary events are events, which cannot be further split
up. Events, which can be further split up are called compound events. For example, {2, 4, 6} is a
compound event.

Algebra of Events

1. Event not A ( complement of A)

Corresponding to an event A, we can define another event, which contains the outcomes
in the sample space that are not in A. It is called the complement of A and is denoted by $\bar{A}$, A', or
A^c.
Example: In the random experiment of rolling a die and observing the number shown, let
A = {2, 4, 6}. Then $\bar{A}$ = {1, 3, 5}. Here the event A is ‘an even number is shown’ and $\bar{A}$ is
‘an odd number is shown’.

2. All events (intersection)

If A and B are two events in the same experiment, the event which represents the
simultaneous occurrence of A and B is A ∩ B.

Example: In a die rolling trial, let A be the event ‘a prime number happened’ and B be the
event ‘an odd number happened’. That is, A = {2, 3, 5} and B = {1, 3, 5}. Then, the event
that represents ‘the number happened is both prime and odd’ is A ∩ B = {3, 5}.

3. At least one among events (union)

If A and B are two events in the same experiment, the event that represents the occurrence
of at least one of them (A or B) is A ∪ B.
Example: In a die-rolling trial, let A be the event ‘a prime number occurs’ and B be the
event ‘an odd number occurs’. That is, A = {2, 3, 5} and B = {1, 3, 5}.
Then the event ‘at least one of a prime number or an odd number occurs’ is
A ∪ B = {1, 2, 3, 5}.

4. A and not B (difference)

If A and B are two events in the same experiment, the event that represents A and not B
is A ∩ B', where B' is the complement of B.
Example: In a die-rolling trial, let A be the event ‘a prime number occurs’ and B be the
event ‘an odd number occurs’. That is, A = {2, 3, 5} and B = {1, 3, 5}. Then the event
‘the number is prime but not odd’ is A ∩ B' = {2}.

5. Exactly one (symmetric difference)

If A and B are two events in the same experiment, the event that represents the
occurrence of exactly one of them is (A ∩ B') ∪ (A' ∩ B).
Example: In a die-rolling trial, let A be the event ‘a prime number occurs’ and B be the
event ‘an odd number occurs’. That is, A = {2, 3, 5} and B = {1, 3, 5}. Then the event
‘exactly one of A and B occurs’ is (A ∩ B') ∪ (A' ∩ B) = {2} ∪ {1} = {1, 2}.
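
The operations in this algebra of events correspond directly to set operations. As a small sketch, Python's built-in set type can reproduce the die examples above; the variable names are illustrative only.

```python
# Sample space and events from the die-rolling examples.
S = {1, 2, 3, 4, 5, 6}
A = {2, 3, 5}   # a prime number occurs
B = {1, 3, 5}   # an odd number occurs

not_A = S - A          # complement of A
both = A & B           # intersection: A and B
at_least_one = A | B   # union: A or B
A_not_B = A - B        # difference: A and not B
exactly_one = A ^ B    # symmetric difference: exactly one of A, B

print(sorted(not_A))         # [1, 4, 6]
print(sorted(both))          # [3, 5]
print(sorted(at_least_one))  # [1, 2, 3, 5]
print(sorted(A_not_B))       # [2]
print(sorted(exactly_one))   # [1, 2]

# The symmetric difference agrees with its definition as a union of two differences.
print(exactly_one == (A - B) | (B - A))
```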

Mutually Exclusive Events (Disjoint Events)

Two events are said to be disjoint if the occurrence of one event prevents the occurrence
of the other. That is, if A and B are disjoint events, they cannot occur simultaneously.
Therefore A ∩ B = φ.
Example: In a die-rolling trial, let A be the event ‘an even number occurs’ and B be the
event ‘an odd number occurs’. That is, A = {2, 4, 6} and B = {1, 3, 5}. Since the
occurrence of an even number prevents the occurrence of an odd number in the same
trial, the events A and B are mutually exclusive. Note that A ∩ B = φ.

Exhaustive Events and Equally Likely Events

A list of elementary events of a random experiment is said to be exhaustive if their union
is the sample space. If every elementary event of a random experiment has an equal
chance of occurrence, then the elementary events are said to be equally likely.
Example: In a die rolling trial, the events {1}, {2}, {3}, {4}, {5}, and {6} are exhaustive
events since their union is the sample space. Since there is no preference for any one
event over another, these events are also equally likely.

Definitions of Probability

Generally speaking, probability is a measure of the chance that an uncertain event
happens. That is, probability is used to measure the uncertainty of an event. The value of a
probability ranges between 0 and 1. If it is certain that an event will happen, its
probability is 1, and if it is certain that the event will not happen, its probability
is 0.

There are three different conceptual approaches to the study of probability. They are:
1. Classical approach.
2. Frequency approach.
3. Axiomatic approach.
1. Classical Definition

This is the earliest approach to the theory of probability. Laplace, the French
mathematician, gave this definition of probability. Using this definition, we can
determine the probability of an event even before a trial is performed. So classical
probability is often called ‘a priori probability’.

Definition: If the elementary events of a random experiment with a finite sample space
are mutually exclusive, equally likely and exhaustive, then the probability of an event
A is the “ratio of the number of outcomes favourable to A to the total number of
possible outcomes”. That is, if an event A can occur in m ways out of n equally
likely ways, then P(A) = m/n.
Note that the outcomes that result in the occurrence of the desired event are called
favourable outcomes.

Example 1: Consider the random experiment of tossing two coins and observing the
faces that turn up. The sample space is S = {(H,H), (H,T), (T,H), (T,T)}. Let A be the event
‘getting at least one tail’. Then P(A) = 3/4, since three of the four outcomes contain at
least one tail.
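
The classical definition amounts to counting favourable outcomes in an enumerated sample space. The sketch below does this for the two-coin example; the function name classical_probability is an assumption made for illustration, and the method is valid only when all listed outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

def classical_probability(sample_space, is_favourable):
    """P(A) = m/n for a finite sample space of equally likely outcomes (hypothetical helper)."""
    outcomes = list(sample_space)
    m = sum(1 for outcome in outcomes if is_favourable(outcome))
    return Fraction(m, len(outcomes))

# Sample space for tossing two coins: (H,H), (H,T), (T,H), (T,T).
two_coins = list(product("HT", repeat=2))

# Event A: at least one tail.
p_at_least_one_tail = classical_probability(two_coins, lambda o: "T" in o)
print(p_at_least_one_tail)  # 3/4
```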

2. Frequency (Empirical) Definition

In many situations the elementary events are not equally likely, so the classical
definition of probability does not apply. In these situations, another approach is to
find the probability from past experience, that is, on the basis of the relative
frequency of the event in the past. However, the relative frequency should always be
estimated from a large number of past observations: the larger the number of
observations, the more accurate the result. Since in the relative frequency approach
probabilities are calculated on the basis of past experience, these probabilities are
called a posteriori probabilities.
Definition: In the frequency approach, probability is defined as P(A) = lim_{n→∞} (f/n),
where f is the frequency of A and n is the number of trials.
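
The frequency definition can be illustrated by simulation. The following sketch assumes a fair die simulated with Python's pseudo-random generator (a stand-in for real past observations) and estimates the probability of an even number as f/n for increasing n; the function names are illustrative.

```python
import random

def relative_frequency(is_event, trial, n, seed=0):
    """Estimate P(A) as f/n, where f counts occurrences of A in n simulated trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    f = sum(1 for _ in range(n) if is_event(trial(rng)))
    return f / n

def roll_die(rng):
    """One simulated roll of a fair six-sided die (an assumption of this sketch)."""
    return rng.randint(1, 6)

def is_even(x):
    return x % 2 == 0

# The estimate settles near the classical value 0.5 as n grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(is_even, roll_die, n))
```

The estimates fluctuate for small n and stabilise only for large n, which is why the definition insists on a large number of observations.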

3. Axiomatic Definition

In the classical and frequency definitions, probability is defined under certain
assumptions. There is another definition of probability, which we shall now discuss,
called the axiomatic definition of probability, where probability is defined as a function
whose domain is the class of events, taking values in the real line.

Definition: A set function P from a sigma field of events to the real line is a probability
if it satisfies the following axioms:
Axiom 1 P(A) ≥ 0, for every event A
Axiom 2 P(S) = 1, where S is the sample space
Axiom 3 If A1, A2, … are disjoint events, then
P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + …

Probability space
The triplet (S, A, P) is called a probability space, where S is the set of outcomes of a
random experiment, A is a sigma field defined on S, and P is the probability measure
defined on A.

Some Results in Probability

1. If A is an event and A' its complement, then P(A) + P(A') = 1
2. For any event A, 0 ≤ P(A) ≤ 1
3. Addition Theorem on Probability
For any two events A and B, P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
If A and B are disjoint, then P(A ∪ B) = P(A) + P(B)

Independence of Events
Two events A and B are said to be independent if and only if
P(A∩B) = P(A)×P(B)
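
These results can be checked numerically on the fair-die sample space. The sketch below assumes equally likely outcomes; the events C and D are hypothetical examples chosen only to show a pair that satisfies the independence condition.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # fair die: six equally likely outcomes

def P(event):
    """Classical probability of an event (a subset of S)."""
    return Fraction(len(set(event) & S), len(S))

def independent(E, F):
    """True when P(E ∩ F) = P(E) × P(F)."""
    return P(set(E) & set(F)) == P(E) * P(F)

A = {2, 3, 5}   # prime number
B = {1, 3, 5}   # odd number

# Complement rule and addition theorem.
print(P(A) + P(S - A) == 1)
print(P(A | B) == P(A) + P(B) - P(A & B))

# A and B are not independent: P(A ∩ B) = 1/3 but P(A) × P(B) = 1/4.
print(independent(A, B))

# Hypothetical pair: C = even number, D = number at most 2.
C = {2, 4, 6}
D = {1, 2}
print(independent(C, D))  # P(C ∩ D) = 1/6 = 1/2 × 1/3
```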
CONDITIONAL PROBABILITY

We often face situations where the probability of an event A is influenced by the
information that another event B has already occurred. Probability being a measure of
chance, our assessment of the probability of an event will change if we know that
another event has occurred. This reassessment of the probability of one event, conditional
on the occurrence or non-occurrence of another event, is called conditional probability.

Let A and B be two events in a sample space. Then P(B/A) (read as ‘the probability of
B given A’) is the probability of the event B given that the event A has occurred; it is
called the conditional probability of B given A.
It is defined as P(B/A) = P(A ∩ B) / P(A), if P(A) > 0,
and P(A/B) = P(A ∩ B) / P(B), if P(B) > 0.

Example: Suppose a card is selected at random from a pack of cards. The card selected is
an ace. What is the probability that the card selected is a red one?

Solution:
Let A be the event that the card selected is an ace and B be the event that the card
selected is a red one. The required probability is P(B/A).
By definition, P(B/A) = P(A ∩ B) / P(A).
P(A) = Probability that the card selected is an ace = 4/52 = 1/13 (since there are 4 aces in a
pack of 52 cards).
P(A ∩ B) = Probability that the card selected is a red ace = 2/52 = 1/26.
Therefore, P(B/A) = (1/26) / (1/13) = 1/2.
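
The card example can be verified by enumerating a standard 52-card pack and applying the definition of conditional probability directly. This is a sketch only; the rank and suit labels and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely cards

def P(is_event):
    """Classical probability of an event over the pack."""
    return Fraction(sum(1 for card in deck if is_event(card)), len(deck))

def conditional(is_b, is_a):
    """P(B/A) = P(A ∩ B) / P(A), defined only when P(A) > 0."""
    p_a = P(is_a)
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return P(lambda card: is_a(card) and is_b(card)) / p_a

def is_ace(card):
    return card[0] == "A"

def is_red(card):
    return card[1] in ("hearts", "diamonds")

print(P(is_ace))                    # 1/13
print(conditional(is_red, is_ace))  # 1/2
```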

Multiplication Theorem on Probability

If A and B are two events in a sample space, then the multiplication theorem states that
P(A ∩ B) = P(A) P(B/A) if P(A) > 0 and
= P(B) P(A/B) if P(B) > 0

If two events A and B are independent, then P(B/A) = P(B) and P(A/B) = P(A).

Example: Two cards are drawn from a well-shuffled pack of cards. Find the probability
that they are both aces if the first card is (a) replaced (b) not replaced.
Solution:
Let A be the event “an ace is selected on the first draw” and B be the event
“an ace is selected on the second draw”.
Then we require P(A ∩ B). By the multiplication theorem, P(A ∩ B) = P(A) P(B/A).
(a) For the first draw, there are 4 aces in 52 cards, so P(A) = 4/52. The card
is replaced before the second draw, so P(B/A) = 4/52.
∴ P(A ∩ B) = 4/52 × 4/52 = 1/169
(b) If the card is not replaced after the first draw, only 3 aces remain among the
51 cards available for the second draw.
P(A) is the same as in the first case, but P(B/A) = 3/51.
∴ P(A ∩ B) = 4/52 × 3/51 = 1/221
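
Both answers can be cross-checked by listing every ordered pair of draws. In the sketch below, drawing with replacement is modelled by itertools.product and drawing without replacement by itertools.permutations; the list deck and the function both_aces are illustrative names.

```python
from fractions import Fraction
from itertools import permutations, product

# Only the ace / non-ace distinction matters: 4 aces and 48 other cards.
deck = ["ace"] * 4 + ["other"] * 48

def both_aces(ordered_draws):
    """Fraction of equally likely ordered pairs in which both cards are aces."""
    draws = list(ordered_draws)
    hits = sum(1 for first, second in draws if first == "ace" and second == "ace")
    return Fraction(hits, len(draws))

# (a) First card replaced: 52 × 52 ordered pairs.
with_replacement = both_aces(product(deck, repeat=2))

# (b) First card not replaced: 52 × 51 ordered pairs of distinct cards.
without_replacement = both_aces(permutations(deck, 2))

print(with_replacement)     # 1/169
print(without_replacement)  # 1/221
```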

Bayes’ Theorem

Bayes’ Theorem is used to revise the probability of an event when new information is
available. The idea of revising probabilities is used by all of us in daily life even though
we may not know anything about probability. For example, a person while going out may
start without taking a raincoat, but as soon as he comes out of his home and sees a large
mass of cloud in the sky he may decide to take a raincoat with him. Bayes’ theorem
formalises this revision: it gives the revised (a posteriori) probabilities.
Statement: Let B1, B2, …, Bn be n mutually exclusive events whose union is the
sample space. If A is any event with P(A) > 0, then, for each i = 1, 2, …, n,
P(Bi/A) = P(Bi) P(A/Bi) / ∑ P(Bj) P(A/Bj),
where the sum is taken over j = 1, 2, …, n.
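
Bayes’ theorem translates into a short computation: multiply each prior by its likelihood and divide by the sum of these products. The following sketch uses a hypothetical function bayes_posteriors and takes its inputs from Solved Problems 11 and 12 of this section (the scooter plants and the two boxes).

```python
from fractions import Fraction

def bayes_posteriors(priors, likelihoods):
    """Return P(Bi/A) for each i, given P(Bi) and P(A/Bi) (hypothetical helper)."""
    if sum(priors) != 1:
        raise ValueError("priors must sum to 1 (the Bi must be exhaustive)")
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(Bi) P(A/Bi)
    total = sum(joint)                                    # P(A)
    if total == 0:
        raise ValueError("P(A) must be positive")
    return [j / total for j in joint]

# Problem 11: plants I and II make 70% and 30% of scooters;
# 80% and 90% of them respectively are of standard quality.
scooters = bayes_posteriors(
    [Fraction(7, 10), Fraction(3, 10)],
    [Fraction(8, 10), Fraction(9, 10)],
)
print(scooters)  # posteriors 56/83 and 27/83

# Problem 12: boxes X and Y chosen with probability 1/2 each;
# P(red/X) = 3/5 and P(red/Y) = 5/9.
boxes = bayes_posteriors(
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(3, 5), Fraction(5, 9)],
)
print(boxes)  # posteriors 27/52 and 25/52
```

Exact fractions are used so that the results can be compared directly with the hand calculations; the posteriors always sum to 1.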

1C.13. Solved Problems

1. Write down the sample space of the random experiment of throwing two dice
simultaneously and observing the face numbers.

Solution: Sample space S is given by:

S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
(2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
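
The same sample space can be generated mechanically as the set of all ordered pairs of faces. A minimal Python sketch:

```python
from itertools import product

# Every ordered pair (first die, second die) with faces 1 to 6.
S = list(product(range(1, 7), repeat=2))

# Print the sample space row by row, one row per face of the first die.
for first in range(1, 7):
    print([pair for pair in S if pair[0] == first])

print(len(S))  # 36 sample points
```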

2. If a box contains 10 red and 6 blue balls, what is the probability that a ball drawn at
random is red? Find also the probability that the ball drawn is blue.

Solution: Number of red balls = 10
Number of blue balls = 6
Total number of balls = 16
By classical definition, P(the ball drawn is red) = 10/16 = 5/8
P(the ball drawn is blue) = 6/16 = 3/8
3. A speaks the truth in 60% of cases and B in 70% of cases. In what percentage of cases are
they likely to contradict each other in stating the same fact?

Solution:
A contradiction takes place only if one of them speaks the truth and the other tells a lie.
The probability that A speaks the truth = 60/100 = 0.6
The probability that A tells a lie = 1 – 0.6 = 0.4
The probability that B speaks the truth = 70/100 = 0.7
The probability that B tells a lie = 1 – 0.7 = 0.3
Since A and B speak independently, the probability that A speaks the truth and B tells a lie
= Probability that A speaks the truth × Probability that B tells a lie = 0.6 × 0.3. Similarly,
the probability that A tells a lie and B speaks the truth = 0.4 × 0.7.
Thus the probability that A speaks the truth and B tells a lie, or A tells a lie and B speaks
the truth = 0.6 × 0.3 + 0.4 × 0.7 = 0.18 + 0.28 = 0.46.
That is, in 46% of cases they contradict each other.

4. The odds against A speaking the truth are 4 : 6 while the odds in favour of B speaking
the truth are 7:3. (i) What is the probability that A and B contradict each other in
stating the same fact? (ii) If A and B agree on a statement, what is the probability that
this statement is true?

Solution:
Odds of 4 : 6 against A speaking the truth mean that A speaks the truth in 6 cases out of 10.
The probability that A speaks the truth = 6/10 = 0.6
The probability that A tells a lie = 1 – 0.6 = 0.4
The probability that B speaks the truth = 7/10 = 0.7
The probability that B tells a lie = 1 – 0.7 = 0.3
(i) A and B will contradict each other if one of them tells a lie and the other
speaks the truth.
The required probability = 0.6 × 0.3 + 0.4 × 0.7
= 0.18 + 0.28
= 0.46
(ii) A and B agree on a statement if both speak the truth or both tell a lie.
Probability that both speak the truth = 0.6 × 0.7 = 0.42
Probability that both tell a lie = 0.4 × 0.3 = 0.12
Probability that both agree on a statement = 0.42 + 0.12 = 0.54
Required probability = 0.42/0.54 = 7/9
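
The two parts of this problem can be reproduced by listing the four truth/lie combinations and weighting each by its probability. The sketch assumes, as the solution does, that A and B speak independently; the names used are illustrative.

```python
from fractions import Fraction
from itertools import product

p_truth = {"A": Fraction(6, 10), "B": Fraction(7, 10)}

def joint(a_truth, b_truth):
    """P(A's behaviour and B's behaviour), assuming A and B act independently."""
    pa = p_truth["A"] if a_truth else 1 - p_truth["A"]
    pb = p_truth["B"] if b_truth else 1 - p_truth["B"]
    return pa * pb

cases = list(product([True, False], repeat=2))  # (A truthful?, B truthful?)

# (i) Contradiction: exactly one of them speaks the truth.
p_contradict = sum(joint(a, b) for a, b in cases if a != b)

# (ii) Agreement, and the chance the statement is true given that they agree.
p_agree = sum(joint(a, b) for a, b in cases if a == b)
p_true_given_agree = joint(True, True) / p_agree

print(float(p_contradict))  # 0.46
print(float(p_agree))       # 0.54
print(p_true_given_agree)   # 7/9
```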
5. Three light bulbs are chosen at random from 15 bulbs of which 5 are defective. Find
the probability that (i) none is defective, (ii) exactly one is defective, (iii) at least one is
defective.

Solution:
There are 15C3 = 455 ways to choose 3 bulbs from 15 bulbs.
(i) Since there are 10 non-defective bulbs, there are 10C3 = 120 ways to choose 3
non-defective bulbs. Thus, P(none is defective) = 120/455 ≈ 0.26
(ii) Since there are 5 defective bulbs, one defective bulb can be chosen in 5 different
ways, and the 2 non-defective bulbs can be chosen in 10C2 = 45 different ways. Hence, there
are 5 × 45 = 225 ways to choose 3 bulbs of which exactly one is defective. Thus,
P(exactly one is defective) = 225/455 ≈ 0.49
(iii) The event that at least one is defective is the complement of the event ‘none is
defective’. By (i), P(none is defective) ≈ 0.26.
Hence, P(at least one is defective) ≈ 1 – 0.26 = 0.74
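
The counting in this solution can be written as a single formula with binomial coefficients. The sketch below uses math.comb (available from Python 3.8); the function name p_defective and its default arguments are assumptions taken from the problem data.

```python
from fractions import Fraction
from math import comb

def p_defective(k, defective=5, good=10, drawn=3):
    """P(exactly k defective bulbs among those drawn), by counting selections."""
    favourable = comb(defective, k) * comb(good, drawn - k)
    return Fraction(favourable, comb(defective + good, drawn))

p_none = p_defective(0)       # 120/455
p_one = p_defective(1)        # 225/455
p_at_least_one = 1 - p_none   # complement of 'none is defective'

print(round(float(p_none), 2))          # 0.26
print(round(float(p_one), 2))           # 0.49
print(round(float(p_at_least_one), 2))  # 0.74
```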

6. A box contains 5 white and 7 black balls. If three balls are drawn at random, what is
the probability that one is white and two are black?

Solution:
One white ball can be chosen in 5 ways and 2 black balls can be chosen in 7C2 = 21 different
ways. Also, 3 balls can be chosen from the 12 in 12C3 = 220 different ways.
Thus, the required probability = (5 × 21)/220 = 21/44
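
As a cross-check on the counting argument, the 220 possible selections can be listed one by one and the favourable ones counted. This brute-force sketch labels the balls by position; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# 5 white and 7 black balls, distinguished by their position in the list.
balls = ["W"] * 5 + ["B"] * 7

# Every unordered selection of 3 distinct balls.
draws = list(combinations(range(len(balls)), 3))

# Favourable selections contain exactly one white ball (hence two black).
favourable = [d for d in draws if sum(balls[i] == "W" for i in d) == 1]

print(len(draws))                              # 220
print(len(favourable))                         # 105
print(Fraction(len(favourable), len(draws)))   # 21/44
```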

7. Box I contains 8 red and 7 blue balls. Box II contains 6 red and 6 blue balls.
One ball is selected at random from box I and transferred to box II. Then one
ball is drawn at random from box II. What is the probability that it is a red ball?

Solution:
Let A be the event that the ball selected from box II is red. Then A can
happen in the following ways: a red ball is transferred from box I to box II and then a
red ball is selected from box II, or a blue ball is transferred from box I to box II and then
a red ball is selected from box II. After the transfer, box II contains 13 balls.
P(transfer a red ball from box I to box II and then select a red ball from box II) =
8/15 × 7/13 = 56/195
P(transfer a blue ball from box I to box II and then select a red ball from box II) =
7/15 × 6/13 = 42/195
So, the required probability = 56/195 + 42/195 = 98/195
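
The solution conditions on the colour of the transferred ball and adds the two cases. A short sketch of that calculation follows; the dictionaries and the function name p_red_after_transfer are illustrative assumptions.

```python
from fractions import Fraction

box1 = {"red": 8, "blue": 7}
box2 = {"red": 6, "blue": 6}

def p_red_after_transfer(source, target):
    """P(red drawn from target) after one random ball moves from source to target."""
    n_source = sum(source.values())
    total = Fraction(0)
    for colour, count in source.items():
        p_transfer = Fraction(count, n_source)   # chance this colour is transferred
        updated = dict(target)
        updated[colour] += 1                     # target box after the transfer
        p_red = Fraction(updated["red"], sum(updated.values()))
        total += p_transfer * p_red              # one case of the sum
    return total

print(p_red_after_transfer(box1, box2))  # 98/195
```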

8. If P(A) = 0.4, P(B) = 0.7, and P(A ∩ B) = 0.3, what is the probability that A or B
occurs?
Solution:
By the addition theorem on probability, P(A or B) = P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
That is, P(A ∪ B) = 0.4 + 0.7 – 0.3 = 0.8

9. Given P(A) = 3/8, P(B) = 5/8 and P(A ∪ B) = 3/4, are A and B independent?

Solution:
Two events A and B are independent if P(A ∩ B) = P(A) × P(B).
By the addition theorem on probability, P(A ∪ B) = P(A) + P(B) – P(A ∩ B).
So, P(A ∩ B) = P(A) + P(B) – P(A ∪ B)
= 3/8 + 5/8 – 3/4 = 1/4
P(A) × P(B) = 3/8 × 5/8 = 15/64
Thus, P(A ∩ B) ≠ P(A) × P(B); hence A and B are not independent.

10. The probability that a contractor will get a contract for road construction is 0.5 and
the probability that he will get a contract for the construction of water tank is 0.7.
What is the probability of getting at least one contract?

Solution:
Let A be the event of getting the contract for road construction and B be the event of
getting the contract for construction of the water tank.
By the addition theorem on probability,
P(at least one) = P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
Taking the two contracts to be awarded independently, A and B are independent, so
P(A ∩ B) = P(A) × P(B) = 0.5 × 0.7 = 0.35
Hence, P(A ∪ B) = 0.5 + 0.7 – 0.35 = 0.85

11. A company has two plants to manufacture scooters. Plant I manufactures 70% of the
scooters and plant II manufactures 30%. At plant I, 80% of scooters are rated standard
quality and at plant II, 90% of scooters are rated standard quality. A scooter is selected
at random and is found to be of standard quality. What is the chance that it has come
from (a) plant I (b) plant II.

Solution:
Let A be the event ‘scooter selected is of standard quality’.
Let B1 be the event ‘scooter manufactured at plant I’ and B2 be the event ‘scooter
manufactured at plant II’.
P(B1) = 0.7, P(B2) = 0.3, P(A/B1) = 0.8, and P(A/B2) = 0.9
(a) Required probability = P(B1/A) = P(B1) P(A/B1) / [P(B1) P(A/B1) + P(B2) P(A/B2)]
= (0.7 × 0.8) / (0.7 × 0.8 + 0.3 × 0.9)
= 0.56/0.83 = 56/83
(b) Required probability = P(B2/A) = P(B2) P(A/B2) / [P(B1) P(A/B1) + P(B2) P(A/B2)]
= (0.3 × 0.9) / (0.7 × 0.8 + 0.3 × 0.9)
= 0.27/0.83 = 27/83

12. A box X contains 2 white and 3 red balls. Another box Y contains 4 white and 5 red
balls. One ball is drawn at random from one of the boxes and is found to be red. Find
the probability that it was drawn from box Y.

Solution:
Let A be the event ‘the ball drawn is red’, B1 be the event ‘box X has been chosen’, and
B2 be the event ‘box Y has been chosen’.
Required probability is P(B2/A) = P(B2) P(A/B2) / [P(B1) P(A/B1) + P(B2) P(A/B2)]
P(B1) = 1/2, P(B2) = 1/2, P(A/B1) = 3/5, and P(A/B2) = 5/9
P(B2/A) = (1/2 × 5/9) / (1/2 × 3/5 + 1/2 × 5/9)
= (5/18) / (52/90)
= 25/52
