
1

Statistical Distributions
2
STATISTICAL DISTRIBUTIONS
Variables Discrete & Continuous
Probability Distribution, Frequency Function, Probability Density
Function
Types of Statistical Distributions
Binomial
Poisson
Uniform
Normal
Lognormal
Pareto
Exponential
Some Sampling Distributions
Chi Square
Student's t
Fisher's F
Simulation
3
Applications
A service provider needs information about the arrival pattern of
customers, and the distribution of service times for various
categories of services, to decide the number of service counters
and the service rendering procedures.
Distribution of Arrival and Service Times
A telephone company has to assess the number and duration of calls
made and received by its customers to evolve a strategy for planning
and managing its network.
Distribution of Talk Time
A life insurance company keeps studying the life distribution of
individuals to evolve suitable policies for various age groups.
Distribution of Life of Individuals
An investor keeps track of the distribution of earnings and dividend
payment records of various companies to plan an investment portfolio.
Distribution of Dividend
4
Applications
A bank must know the distribution of its deposits in various maturity
periods for credit allocation / commitment for varying periods of
time, known in banking parlance as asset-liability management.
Distribution of Deposit Amount
A car battery manufacturer keeps track of the distribution of lives of
batteries being manufactured, both for assessing quality and for
setting the guarantee period.
Distribution of Life of Car Batteries
A production manager keeps track of parameters like length, diameter,
breaking strength, etc., which are considered relevant for improving
the quality of products.
Distribution of Characteristics of Items
Each company listed on a stock exchange announces its own EPS
(earnings per share).
5
Applications
Distribution of EPS for Various Companies
The index of a stock exchange keeps changing from day to day. One can
calculate the rate of return (positive, negative or even zero) on the
stock index on a daily basis.
Distribution of Daily Rate of Return on a stock index like BSE over
a period, say one year, or on a specific stock of an enterprise like
ICICI Bank
The starting salary of an MBA from a management institute varies
from candidate to candidate.
Distribution of Starting Salaries

6
Random Variable Continuous and
Discrete
A continuous variable can take any value
within its range. For example, the height
of a person can take any value in a certain
range but, for the sake of convenience, it is
measured only to the nearest inch or
centimetre.
7
Discrete Variable
A discrete variable takes only certain values
in a range. For example, the number of children
in a family, the number on a die, the number of
defective items in packets of 10 items each,
etc., can take only integral values.
8
Probability Distributions
When two coins are tossed, the number of Heads is a random
variable which can take the values 0, 1 or 2. The associated
probabilities are 1/4, 1/2 and 1/4, and are depicted in the
following diagram:
[Figure: bar chart of Probability against Number of Heads (0, 1, 2), with bars of height 1/4, 2/4 and 1/4]
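The probabilities 1/4, 1/2 and 1/4 can be checked by listing every equally likely outcome. The sketch below assumes fair, independent coins; the function name `heads_distribution` is illustrative and not part of the original material.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_coins=2):
    """Probability of each possible number of heads for fair coins."""
    # Every sequence of H/T is equally likely for fair, independent coins.
    outcomes = list(product("HT", repeat=n_coins))
    counts = Counter(outcome.count("H") for outcome in outcomes)
    total = len(outcomes)
    return {heads: Fraction(c, total) for heads, c in sorted(counts.items())}


dist = heads_distribution(2)
for heads, prob in dist.items():
    print(f"P(heads = {heads}) = {prob}")
```

Exact fractions are used so that the probabilities add up to exactly 1.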
9
Parameters
Location : A parameter which indicates the mid point of
the range of the variable or a value around which all the
values of the variable tend to get located.

Scale: A parameter which determines the variability or the
scale of measurement of the variable.

Shape: It determines the shape (in a sense different from
location and scale) of the distribution within a family of
shapes associated with a specified type of variable.
10
Binomial Distribution
Let us visualise a conceptual or practical
situation where a trial or an experiment
results in only two outcomes, say Success
and Failure. Further, the result of one trial
does not influence the result of the next trial,
and the probability of success is the same
from trial to trial.
11
Binomial Distribution
Tossing a coin - Head or Tail
Birth of a baby - either a Girl or a Boy
Inspection of an item - item declared as Defective
or Non-Defective
Auditing a Bill or an Invoice - contains an error or not
A stock's closing price - Higher or not
An advertisement on TV - recalled by viewer or not
Repayment by a borrower - pays regularly or not
12
The conditions for the applicability of a
Binomial Distribution
There is a fixed number, n, of trials
Each trial has only two possible outcomes
The probabilities of the two outcomes remain
constant from trial to trial
The trials are independent.

13
Binomial Distribution
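If x is the random variable representing the
number of successes, the probability of
getting r successes and n - r failures in n
trials is given by the probability function

$$P(x = r) = {}^{n}C_{r}\, p^{r} q^{n-r}, \qquad r = 0, 1, 2, \ldots, n$$

where p is the probability of success in a single trial and q = 1 - p is the probability of failure.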
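The probability function can be evaluated directly. The sketch below is a minimal illustration with assumed values n = 10 and p = 0.5; it also computes the mean and variance from the probabilities so they can be compared with np and npq.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(x = r): probability of r successes in n independent trials."""
    q = 1.0 - p
    return comb(n, r) * p**r * q ** (n - r)


n, p = 10, 0.5  # assumed example values
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

# Mean and variance computed from the probabilities themselves.
mean = sum(r * f for r, f in enumerate(pmf))
variance = sum((r - mean) ** 2 * f for r, f in enumerate(pmf))

print(f"sum of probabilities = {sum(pmf):.6f}")
print(f"mean = {mean:.4f}, np = {n * p:.4f}")
print(f"variance = {variance:.4f}, npq = {n * p * (1 - p):.4f}")
```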
14
Mean & Variance
Thus, for Binomial Distribution,
Mean = np
Variance = npq
Standard Deviation = √(npq)
15
Properties of Binomial Distribution
f(x) ≥ 0 for each and every value of x which it can take
Σ f(x) = 1, the summation being over all values of x which it can take
In addition, the other properties are
Range of the variable x is from 0 to n, where n is the number of
trials
Mean = np
Variance = npq, which is always less than the mean (since q < 1)
Shape of the distribution is symmetrical for p = 1/2, skewed to
the right when p < 1/2, and skewed to the left when p > 1/2.
16
Relationship with Poisson Distribution
The Binomial Distribution can be
approximated by the Poisson Distribution,
when n is very large and p is very small such
that their product np is a finite quantity.
For practical purposes, n has to be greater
than 30, and p has to be smaller than 0.1.
17
Relationship with Normal Distribution
The Binomial distribution tends to the Normal
distribution, described later in the chapter, when
n is large, and p is neither close to zero nor 1.
The standardised variable is

$$z = \frac{x - np}{\sqrt{npq}}$$
18
Poisson Distribution
The Poisson Distribution was introduced by S.
D. Poisson as a distribution of rare events,
i.e. events whose probability of
occurrence is very small but for which the
number of trials that could lead to the
occurrence of the event is very large.
19
Poisson Distribution
Number of air accidents in a day
Number of customers arriving at a bank's counter every
minute
Number of telephone calls coming to an office every minute
Number of defects in pieces of clothes of a certain length, say
one meter
Number of goals scored per match in a soccer tournament
20
Poisson Distribution
It is defined as

$$f(x) = \frac{e^{-m}\, m^{x}}{x!}, \qquad x = 0, 1, 2, 3, \ldots$$

where m is the mean of the distribution.
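As a small illustration, the Poisson probabilities can be computed straight from the definition. The value m = 3 is an assumed example, and the infinite range is truncated at 60 terms, where the remaining probability is negligible for this m.

```python
from math import exp, factorial


def poisson_pmf(x, m):
    """f(x) = e^(-m) * m^x / x! for x = 0, 1, 2, ..."""
    return exp(-m) * m**x / factorial(x)


m = 3.0  # assumed mean number of occurrences
pmf = [poisson_pmf(x, m) for x in range(60)]  # truncated range

mean = sum(x * f for x, f in enumerate(pmf))
variance = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))

for x in range(6):
    print(f"f({x}) = {pmf[x]:.4f}")
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

The computed mean and variance should both come out close to m.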
21
The conditions for the applicability of a
Poisson distribution
The variable (number of occurrences) is a
discrete variable
The occurrences are random
The occurrences are rare.
22
Mean and Variance
Thus the mean and variance of Poisson
distribution are equal.
Mean = m
Variance = m
Standard Deviation = √m
23
Properties of Poisson Distribution
Range of the variable is from 0 to ∞
Mean and Variance are equal
As m increases, the distribution becomes more
symmetrical and tends to the normal distribution
24
Poisson Distribution as an Approximation to
Binomial Distribution
When n is large, say greater than 30, and p
is quite small, say less than 0.1, then the
Binomial distribution can be approximated
by the Poisson distribution with mean np
and variance also np.
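To see how close the approximation is, the two sets of probabilities can be placed side by side. The sketch below uses assumed values n = 100 and p = 0.02 (so np = 2), which satisfy the rule of thumb above.

```python
from math import comb, exp, factorial


def binomial_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def poisson_pmf(x, m):
    return exp(-m) * m**x / factorial(x)


n, p = 100, 0.02  # assumed example: n > 30 and p < 0.1
m = n * p         # Poisson mean set equal to np

rows = [(r, binomial_pmf(r, n, p), poisson_pmf(r, m)) for r in range(8)]
max_gap = max(abs(b - q) for _, b, q in rows)

for r, b, q in rows:
    print(f"r = {r}: binomial = {b:.4f}, poisson = {q:.4f}")
print(f"largest difference = {max_gap:.4f}")
```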
25
Relationship with Normal Distribution
When the mean m is large, say > 30, the Poisson
distribution is well approximated by the
Normal distribution with mean m and variance m
26
Continuous Probability Distributions
For a continuous variable, the probability
distribution is called a probability density
function (abbreviated as p.d.f.) because it is
defined at every point in the range and not
only at certain values, as in the Binomial and
Poisson distributions.
27
Probability as Area Under a Curve
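[Figure: curve of f(x) against x; the area under the curve between x and x+dx represents the probability that the variable lies in that interval]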
28
Uniform/ Rectangular Distribution
This is the simplest type of distribution of a
continuous variable. Its mathematical form
is

$$f(x) = \frac{1}{b - a}, \qquad a \le x \le b$$

and the graphical form is as shown below:
29
Uniform/ Rectangular Distribution
[Figure: rectangle of constant height f(x) = 1/(b - a) plotted against x]
30
The mean and variance are as follows

$$\text{Mean} = \frac{a + b}{2}$$

$$\text{Variance} = \frac{(b - a)^{2}}{12}$$
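The two formulas can be checked against a simulated sample. This is only a sketch: the limits a = 2 and b = 8, the sample size and the seed are assumed values chosen for illustration.

```python
import random
from statistics import fmean, pvariance


def uniform_mean(a, b):
    return (a + b) / 2


def uniform_variance(a, b):
    return (b - a) ** 2 / 12


a, b = 2.0, 8.0          # assumed limits of the range
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [rng.uniform(a, b) for _ in range(100_000)]

sample_mean = fmean(sample)
sample_var = pvariance(sample)

print(f"formula mean = {uniform_mean(a, b)}, sample mean = {sample_mean:.3f}")
print(f"formula variance = {uniform_variance(a, b)}, sample variance = {sample_var:.3f}")
```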
31
Normal Distribution
Some of the situations where normal distribution is
applicable are :
Life of items subjected to wear and tear like bulbs,
batteries, currency notes, tyres, etc.
Length and Diameter of certain manufactured
products like pipes, screws, and discs
Breaking strength of textile thread, bursting / tensile
strength of paper and plastic bags, metallic wire, etc.
Weekly sales of an item in a store
32
Normal Distribution
Daily rate of return for a stock or a market index like
SENSEX (BSE) or NSE
Height and weight of children at birth
Aggregate marks obtained by students in an examination
Yield of a crop on different plots treated with a fertiliser
Filling of a liquid item by an automatic machine in a
container
33
Normal Distribution
34
Normal Distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^{2}}, \qquad -\infty < x < +\infty$$
35
Standard Normal Distribution
The normal distribution with mean (m) = 0 and s.d. (σ) = 1
is called the standard normal distribution. The random
variable that follows this distribution is denoted by z. If a
variable x follows a normal distribution with mean m and s.d.
σ, the variable z defined as

$$z = \frac{x - m}{\sigma}$$

has the standard normal distribution with mean 0 and s.d. 1.
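The density and the standardising transformation can be written out as two short functions. The mean 60 and s.d. 5 used below are assumed values for illustration only.

```python
from math import exp, pi, sqrt


def normal_pdf(x, m, sigma):
    """Normal density with mean m and standard deviation sigma."""
    z = (x - m) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


def z_score(x, m, sigma):
    """Standardise x so that it follows the standard normal distribution."""
    return (x - m) / sigma


m, sigma = 60.0, 5.0  # assumed mean and s.d.
x = 70.0
z = z_score(x, m, sigma)

print(f"z = {z}")
print(f"density of x = {normal_pdf(x, m, sigma):.5f}")
print(f"standard normal density at z = {normal_pdf(z, 0.0, 1.0):.5f}")
```

The density of x equals the standard normal density at z divided by σ, which is why tables of the standard normal distribution suffice for any m and σ.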
36
Properties of Normal Distribution
The range of the variable is from -∞ to +∞.
It is a bell-shaped curve, symmetrical with respect to its mean.
Being symmetrical with respect to its mean, its measure of skewness is
zero.
The measure of flatness or peakedness of the curve, i.e. kurtosis, is 3.
This value is taken as the reference point for indicating whether a curve
is flatter than this (platykurtic) or more peaked than this (leptokurtic).
The normal curve itself is referred to as mesokurtic.
The mean, median and mode are all equal.
The mean being equal to median divides the curve in two equal parts.
37
Properties of Normal Distribution
The mean being equal to mode, the curve has maximum value at the
mean.
The Mean Deviation of the Normal distribution is approximately (4/5)σ = 0.8σ.
The area under the curve from m - σ to m + σ is 68.27 %.
The area under the curve from m - 2σ to m + 2σ is 95.45 %.
50 % of the area under the curve lies between m - 0.6745σ (first
quartile) and m + 0.6745σ (third quartile).
95 % of the area under the curve lies between m - 1.96σ and m + 1.96σ.
99.73 % of the area under the curve lies between m - 3σ and m + 3σ.
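The areas can be computed for comparison with the figures quoted above. The sketch relies on the standard identity that the area between m - kσ and m + kσ equals erf(k/√2); the helper name `area_within` is illustrative.

```python
from math import erf, sqrt


def area_within(k):
    """Area under the normal curve between m - k*sigma and m + k*sigma."""
    return erf(k / sqrt(2))


for k in (0.6745, 1, 1.96, 2, 3):
    print(f"within {k} s.d. of the mean: {100 * area_within(k):.2f} %")
```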

38
Normal Distribution as an
Approximation to Binomial Distribution
The binomial distribution tends to the normal
distribution when n is large and p is neither close
to zero nor 1. To be more precise, the binomial
distribution can be very well approximated by a
normal distribution whenever both np and nq are
greater than 5. The standardised variable is

$$z = \frac{x - np}{\sqrt{npq}}$$
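A quick numerical comparison shows how well the approximation works. The values n = 100, p = 0.5 and the cut-off of 55 successes are assumed for illustration, and the half-unit continuity correction is a common refinement added here; it is not part of the slide.

```python
from math import comb, erf, sqrt


def binomial_cdf(k, n, p):
    """Exact P(x <= k) for a binomial variable."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))


def normal_cdf(z):
    """P(Z <= z) for the standard normal variable."""
    return 0.5 * (1 + erf(z / sqrt(2)))


n, p = 100, 0.5  # assumed example: np and nq are both 50
q = 1 - p
k = 55

exact = binomial_cdf(k, n, p)
# Continuity correction: treat "x <= 55" as "x <= 55.5" on the normal scale.
z = (k + 0.5 - n * p) / sqrt(n * p * q)
approx = normal_cdf(z)

print(f"exact binomial P(x <= {k}) = {exact:.4f}")
print(f"normal approximation      = {approx:.4f}")
```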
39
Lognormal Distribution
If a variable x has lognormal distribution, then log
x has normal distribution.
Source: www.wikipedia.com
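The defining property can be illustrated by simulation: draw lognormal values, take their logarithms and check that the logs behave like a normal sample. The parameters, sample size and seed below are assumed values.

```python
import random
from math import log
from statistics import fmean, pstdev

rng = random.Random(7)  # fixed seed so the run is repeatable
mu, sigma = 1.0, 0.5    # assumed mean and s.d. of log x

x = [rng.lognormvariate(mu, sigma) for _ in range(50_000)]
log_x = [log(v) for v in x]

print(f"smallest x = {min(x):.4f} (a lognormal variable is always positive)")
print(f"mean of log x = {fmean(log_x):.3f}")
print(f"s.d. of log x = {pstdev(log_x):.3f}")
```

The mean and s.d. of log x should come out close to the assumed mu and sigma.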
40
Pareto Distribution
The probability density function is

$$f(x) = \frac{k}{x_{0}} \left(\frac{x_{0}}{x}\right)^{k+1}, \qquad x \ge x_{0}$$
41
Exponential Distribution Applications
Life of an electronic item like transistor which fails due to
sudden fluctuation in voltage or surge in current
Distance between defects in a woven cloth
Time interval between successive customers in a queue
Time interval between successive telephonic calls in an office
Time interval between successive breakdowns of a machine
Duration of STD / ISD calls
42
Exponential Distribution

$$f(x) = \frac{1}{m}\, e^{-x/m}, \qquad 0 \le x < \infty$$

where the range of the variable (life in the above
case) is from 0 to ∞, and m (> 0) is the mean.
The exponential distribution is also written as

$$f(x) = \lambda e^{-\lambda x}$$

where λ = 1/m.
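The sketch below writes the density in terms of the mean m and, as an assumption made here for illustration, in terms of a rate parameter equal to 1/m, then confirms numerically that the mean of the distribution is m. The value m = 4 is an assumed example.

```python
from math import exp


def exponential_pdf(x, m):
    """Density written in terms of the mean m."""
    return exp(-x / m) / m


def exponential_pdf_rate(x, lam):
    """Same density written with an assumed rate parameter lam = 1/m."""
    return lam * exp(-lam * x)


m = 4.0  # assumed mean, e.g. mean life in years
lam = 1 / m

# Numerical mean: integrate x * f(x) with the midpoint rule on [0, 200].
step = 0.001
numeric_mean = sum(
    (i + 0.5) * step * exponential_pdf((i + 0.5) * step, m) * step
    for i in range(int(200 / step))
)

print(f"f(1) using m = {exponential_pdf(1.0, m):.5f}")
print(f"f(1) using rate = {exponential_pdf_rate(1.0, lam):.5f}")
print(f"numerical mean = {numeric_mean:.4f}")
```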
43
Properties of Exponential Distribution
Mean = E(x) = m
Variance = E(x - m)² = m²
Median = m ln 2 ≈ 0.693 m
It is positively skewed
44
Relationship Between Poisson and
Exponential Distribution
The two distributions, viz. Poisson and Exponential,
are interrelated. If the time intervals between
successive events are independent and follow an
exponential distribution with mean m, then the
number of events in a unit time interval follows
a Poisson distribution with mean 1/m.
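The relationship can be illustrated by simulation: generate exponential gaps between events, count the events falling in each unit interval, and check that the counts have nearly equal mean and variance, as a Poisson variable should. The mean gap of 0.5, the number of intervals and the seed are assumed values, and independent gaps are assumed.

```python
import random
from statistics import fmean, pvariance


def counts_per_unit_interval(mean_gap, n_intervals, seed=0):
    """Count events in each unit interval when gaps are exponential."""
    rng = random.Random(seed)
    counts = [0] * n_intervals
    # random.expovariate takes the rate, i.e. 1 / (mean gap).
    t = rng.expovariate(1.0 / mean_gap)
    while t < n_intervals:
        counts[int(t)] += 1
        t += rng.expovariate(1.0 / mean_gap)
    return counts


mean_gap = 0.5  # assumed mean time between events
counts = counts_per_unit_interval(mean_gap, 20_000)

avg = fmean(counts)
var = pvariance(counts)
print(f"mean count per interval = {avg:.3f}")
print(f"variance of counts      = {var:.3f}")
```

With a mean gap of 0.5, both figures should be close to 2 events per unit interval.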
45
Sampling Distributions
Three important sampling distributions
Chi Square (χ²) Distribution,
Student's t Distribution, and
Fisher's F Distribution
46
Chi Square (χ²) Distribution
If x1, x2, ..., xn are n independent standard normal
variables, i.e. each of them is distributed as
normal with mean 0 and s.d. 1, then the
statistic

$$\chi^{2} = \sum_{i=1}^{n} x_{i}^{2}$$

is said to be distributed as χ² with n degrees of freedom (d.f.).
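The definition lends itself to a simulation check: repeatedly sum the squares of n standard normal draws and look at the mean and variance of the results, which should be near n and 2n, the values quoted for this distribution in the deck. The choice n = 5, the number of repetitions and the seed are assumed.

```python
import random
from statistics import fmean, pvariance


def chi_square_draw(n, rng):
    """One value of the statistic: sum of squares of n standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


n = 5                     # assumed number of standard normal variables
rng = random.Random(123)  # fixed seed so the run is repeatable
draws = [chi_square_draw(n, rng) for _ in range(50_000)]

print(f"smallest value = {min(draws):.4f}")
print(f"sample mean = {fmean(draws):.3f} (n = {n})")
print(f"sample variance = {pvariance(draws):.3f} (2n = {2 * n})")
```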
47
Applications of χ² Distribution
The χ² Distribution is used in studying association between
two factors or attributes, each of them being at two or
more levels. Some examples are
Educational Background( Science, Commerce and Arts ) of
MBA students and their Final Grades in MBA .
Credit worthiness of borrowers for personal loans and their
age groups
Coaching of students and their results in an examination.
Attitude toward stock market (Bearish, Neutral, Bullish)
and age groups of investors.
48
Applications of χ² Distribution
Returns on stocks and sectors of stocks like
Banking, Pharmaceutical, Information
Technology, etc.
Training received and performance of staff like
salesmen.
Yield of a crop with levels of a fertiliser used
49
Properties of χ² Distribution
χ², being a sum of squares, is never negative. In
fact, the range of χ² is from 0 to ∞, since each
xi ranges from -∞ to +∞.
The Mean, Mode and Variance of χ² are as follows:
Mean = n
Mode = n - 2 (for n ≥ 2)
Variance = 2n
The shape of the χ² distribution depends on the value
of n. The shapes for 3 values of n are given below.
50
Properties of χ² Distribution
[Figure: χ² density curves for n = 10, n = 20 and n = 40]
51
Properties of χ² Distribution
For large values of n, say > 30, $\sqrt{2\chi^{2}}$ is approximately distributed
as Normal with mean $\sqrt{2n - 1}$ and s.d. = 1.

$$\frac{n s^{2}}{\sigma^{2}}$$

is distributed as χ² with (n - 1) d.f., where

$$s^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \bar{x})^{2}$$

and the x_i's are distributed as N(m, σ²).
52
Student's t Distribution
This distribution was introduced by W. S. Gosset in
1908, who published it under his pen name
"Student". It is the distribution of the ratio of a
variable distributed normally (0, 1) to the square root
of an independent χ² variable divided by its d.f.
53
Student's t Distribution
Thus, if x is a variable distributed normally with
mean zero and s.d. 1, and x1, x2, ..., xn are n further
independent observations on such a variable, then the
variable

$$t = \frac{x}{\sqrt{\sum x_{i}^{2} / n}}$$

is said to be distributed as Student's t with n d.f.
54
Shape of the t Distribution
[Figure: density curve of the t distribution compared with the Normal curve]
55
Properties of t Distribution
Mean = 0

$$\text{Variance} = \frac{n}{n - 2} \qquad (n > 2)$$

n, a positive integer representing the d.f., is the
shape parameter deciding the shape of the
distribution. The larger the value of n, the closer the
curve gets to the normal curve. For n > 30, it is almost
identical with the normal distribution.
56
Conditions for Use of t Distribution
The following conditions need to be satisfied for
using the t distribution.
The variable on which the observations x1, x2, ...,
xn are recorded follows a normal
distribution in the population.
The sample size ( n ) is small, say < 30.
The s.d. of the variable, in the population, is
unknown.
57
Fisher's F Distribution
F is defined as the ratio of two independent variables,
each distributed as χ² and divided by its respective d.f.
If U is distributed as χ² with m d.f. and V is distributed
as χ² with n d.f., and U and V are independent, then the
variable

$$F_{m,n} = \frac{U / m}{V / n} = \frac{\chi_{m}^{2} / m}{\chi_{n}^{2} / n}$$

is distributed as F with (m, n) d.f.
58
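Fisher's F Distribution
The F distribution is also called the variance-ratio
distribution:

$$F_{n_1 - 1,\; n_2 - 1} = \frac{s_{1}^{2}}{s_{2}^{2}}$$

where s1² and s2² are two sample variances (each computed
with divisor n - 1) based on n1 and n2 observations,
respectively, from normal populations with equal variance.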
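The variance ratio is simple to compute from two samples. The sketch below uses made-up data purely for illustration and assumes the sample variances are computed with divisor n - 1, in which case the degrees of freedom are n1 - 1 and n2 - 1; the helper name `variance_ratio` is hypothetical.

```python
from statistics import variance


def variance_ratio(sample_1, sample_2):
    """Return the ratio of sample variances and its degrees of freedom."""
    s1_sq = variance(sample_1)  # statistics.variance uses divisor n - 1
    s2_sq = variance(sample_2)
    d_f = (len(sample_1) - 1, len(sample_2) - 1)
    return s1_sq / s2_sq, d_f


# Made-up observations, for illustration only.
sample_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_2 = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

f_value, d_f = variance_ratio(sample_1, sample_2)
print(f"F = {f_value:.4f} with d.f. {d_f}")
```

The computed F would then be compared with tabulated values of the F distribution for the same degrees of freedom.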
59
Properties of 'F' Distribution
F, being the ratio of two positive quantities, is always positive. Since
the range of χ² is from 0 to ∞, the range of F is also from 0 to ∞.
m and n represent the degrees of freedom, and are shape parameters deciding
the shape of the distribution, as shown below.
[Figure: F density curves for (5, 10), (10, 15) and (15, 20) d.f.]
60
Simulation
Simulation is defined by T. H. Naylor as "a
numerical technique for conducting experiments
on a digital computer, which involves certain types
of mathematical and logical relationships necessary
to describe the behaviour and structure of a
complex real-world system over an extended period of
time."
61
Advantages of Simulation
The important advantages of the simulation
techniques are as follows:
It is useful in solving problems where all values of
the variables are either not known, or partially
known.
In situations where it is difficult to predict or
identify bottlenecks, simulation is used to foresee
these unknown difficulties.
