
Module 7: Probability and Statistics

Lecture 1: Sampling Distribution and Parameter Estimation


1. Introduction
This lecture deals with sampling distributions and the estimation of parameters. Random
sampling, point estimation, the desirable properties of point estimators and methods of
point estimation are discussed in detail. Interval estimation for various types of random
samples is also discussed. Example problems on these topics are presented
alongside the theoretical descriptions.
2. Population and Sample
A population is the complete set of all values representing a particular random process. For
example, the streamflow in a certain stream over an infinite timeline represents the population. A
sample is any subset of the entire population. For example, the streamflow in the stream over the
last 30 years is a sample.
2.1 Random sample
As it is impractical and/or uneconomical to observe the entire population, a sample (i.e., a
subset) is selected from the population for analysis. A sample is said to be a random sample
when it is representative of the population and probability theory can be applied to it to infer
results that pertain to the entire population.
2.2 Random sample from finite and infinite population
An observation set $(X_1, X_2, X_3, \ldots, X_n)$ selected from a finite population of size $N$ is said to be
a random sample if its values are such that each $X_i$ of the $n$ has the same probability of being
selected.
An observation set $(X_1, X_2, X_3, \ldots, X_n)$ selected from an infinite population $f(x)$ is said to be
a random sample if its values are such that each $X_i$ of the $n$ has the same distribution $f(x)$
and the $n$ random variables are independent.
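As a rough illustration of the two definitions, the sketch below draws a random sample from a small finite population and from an infinite population described by a density. The population values, the exponential density with mean 6.0 and the fixed seed are assumptions made only for this example.

```python
import random

rng = random.Random(42)  # fixed seed (assumed) so that the run is repeatable

# Finite population of size N = 10 (hypothetical values).
population = [12.1, 9.8, 15.3, 11.0, 8.7, 13.4, 10.2, 14.9, 9.1, 12.6]
n = 4

# random.sample draws without replacement and gives every member of the
# population the same chance of being selected.
finite_sample = rng.sample(population, n)

# Infinite population with density f(x): n independent draws from the same
# distribution (an exponential density with mean 6.0, an assumed choice).
mean_value = 6.0
infinite_sample = [rng.expovariate(1.0 / mean_value) for _ in range(n)]

print(finite_sample)
print(infinite_sample)
```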
3. Classical approach to estimation of parameters
The classical approach of parameter estimation is of two types:
- Point Estimation - A single parameter value is estimated from the observed dataset.
- Interval Estimation - A certain interval is determined from the observed dataset; it can be
said with a definite confidence level that the parameter value will lie within that interval.
4. Random sampling and Point Estimation
In real life scenarios, the parameters of the distribution of a population are unknown and it is
not feasible to obtain them by studying the entire population, hence a random sample is
generally selected. The parameters of the distribution that are computed based on analysis of
the sample values are called the estimators of the parameters. Thus, parameters correspond to
the population, while estimators correspond to the sample.
4.1 Desirable Properties of a Point Estimator
The desirable properties of a point estimator are unbiasedness, consistency, efficiency, and
sufficiency.
Unbiasedness: The bias of an estimator is the difference between the
estimator's expected value and the true value of the parameter. For an unbiased estimator of the
parameter, expected value = true value.
Consistency: This is an asymptotic property whereby the error in the estimator
decreases as the sample size $n$ increases. Thus as $n \to \infty$, the estimated value
approaches the true value of the parameter.
Efficiency: An estimator with a smaller variance is said to be more efficient than
one with a greater variance, other conditions remaining the same.
Sufficiency: If a point estimator utilizes all the information that is available from the
random sample, then it is called a sufficient estimator.
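Unbiasedness can be checked exactly on a small case by enumerating every equally likely sample. The sketch below assumes a population made of the six faces of a fair die and samples of size 2 drawn with replacement; these choices and the helper names are illustrative only. It compares the sample mean and two variance estimators (divisor n and divisor n - 1) against the population values.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]   # assumed population: faces of a fair die
N = len(population)
mu = Fraction(sum(population), N)                          # population mean
sigma2 = sum((x - mu) ** 2 for x in population) / N        # population variance

n = 2                                                      # sample size
samples = list(product(population, repeat=n))              # all equally likely samples

def mean(xs):
    return Fraction(sum(xs), len(xs))

def var(xs, divisor):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / divisor

# Expected value of each estimator = average over all possible samples.
exp_mean = sum(mean(s) for s in samples) / len(samples)
exp_var_n = sum(var(s, n) for s in samples) / len(samples)
exp_var_n1 = sum(var(s, n - 1) for s in samples) / len(samples)

print(exp_mean - mu)         # bias of the sample mean
print(exp_var_n - sigma2)    # bias of the variance estimator with divisor n
print(exp_var_n1 - sigma2)   # bias of the variance estimator with divisor n - 1
```

In this small case the sample mean and the divisor n - 1 variance estimator have zero bias, while the divisor n estimator underestimates the population variance by a factor (n - 1)/n.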
4.2 Methods of Point Estimation of Parameters
The most commonly used methods of point estimation of parameters are:
- Method of Moments
- Method of Maximum Likelihood
4.2.1 Method of Moments
The method of moments is based on the fact that the moments of a random variable have
some relationship with the parameters of the distribution.
If a probability distribution has m number of parameters, then the first m moments of the
distribution are equated to the first m sample moments. The resulting m number of
equations can then be solved to determine the m number of parameters.
For a sample of size $n$, the sample mean and sample variance are:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ; \qquad s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Thus $\bar{x}$ and $s^2$ are point estimates of the population mean and population variance
respectively and the parameters of the distribution can be determined from these. If needed,
other higher order sample moments can also be obtained to calculate all the parameters.
The relations between the parameters of some common distributions and the moments are:
- In case of normal distribution, the parameters $\mu$ and $\sigma^2$ are equal to the mean and variance:
$E(X) = \mu ; \quad Var(X) = \sigma^2$
- In case of gamma distribution, the parameters $\alpha$ and $\beta$ are related to the mean and
variance as follows:
$E(X) = \alpha\beta ; \quad Var(X) = \alpha\beta^2$
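For the gamma case, equating E(X) and Var(X) to the sample mean and sample variance and solving gives beta = variance / mean and alpha = mean / beta. A minimal sketch of this step follows; the function name and the three data values are hypothetical, and the variance is taken with divisor n (the second central sample moment), which is an assumption of the sketch.

```python
def gamma_method_of_moments(sample):
    """Return (alpha, beta) such that alpha*beta and alpha*beta**2 match
    the sample mean and the sample variance (divisor n)."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / n
    beta = variance / mean    # from Var(X) / E(X) = beta
    alpha = mean / beta       # from E(X) = alpha * beta
    return alpha, beta

data = [2.0, 4.0, 6.0]        # hypothetical observations
alpha_hat, beta_hat = gamma_method_of_moments(data)
print(alpha_hat, beta_hat)
```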
4.2.2 Method of Maximum Likelihood
The method of maximum likelihood can be used to obtain the point estimators of the
parameters of a distribution directly.
If the sample values of a random variable $X$ with density function $f(x; \theta)$ are $(x_1, x_2, \ldots, x_n)$, then the
maximum likelihood method is aimed at finding that value of $\theta$ which maximizes the
likelihood of obtaining the set of observations $(x_1, x_2, \ldots, x_n)$.
The likelihood of obtaining a particular sample value $x_i$ is proportional to the function value
of the pdf at $x_i$.
The likelihood function for obtaining the set of observations $(x_1, x_2, \ldots, x_n)$ is given by
$$L(x_1, x_2, \ldots, x_n; \theta) = f(x_1; \theta)\, f(x_2; \theta) \cdots f(x_n; \theta)$$
Differentiating the likelihood function w.r.t. $\theta$ and equating the derivative to zero, we get the value of
$\hat{\theta}$, which is the maximum likelihood estimator of the parameter $\theta$:
$$\frac{\partial L(x_1, x_2, \ldots, x_n; \theta)}{\partial \theta} = 0$$
The solution $\hat{\theta}$ can also be obtained by maximizing the logarithm of the likelihood function $L$:
$$\frac{\partial \log L(x_1, x_2, \ldots, x_n; \theta)}{\partial \theta} = 0$$

If there are $m$ parameters of the distribution, then the likelihood function is
$$L(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_m) = \prod_{i=1}^{n} f(x_i; \theta_1, \theta_2, \ldots, \theta_m)$$
and the maximum likelihood estimators are obtained by solving the following simultaneous
equations:
$$\frac{\partial L(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_m)}{\partial \theta_j} = 0 ; \quad j = 1, 2, \ldots, m$$
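Because the likelihood is a product of density values, its logarithm is a sum, which is usually easier to differentiate. The logarithm is a monotonically increasing function, so $L$ and $\log L$ attain their maximum at the same parameter values. Written out in the same notation (a restatement of the conditions above, not an additional result):

```latex
\log L(x_1,\ldots,x_n;\theta_1,\ldots,\theta_m)
  = \sum_{i=1}^{n} \log f(x_i;\theta_1,\ldots,\theta_m),
\qquad
\frac{\partial \log L}{\partial \theta_j}
  = \sum_{i=1}^{n} \frac{\partial \log f(x_i;\theta_1,\ldots,\theta_m)}{\partial \theta_j} = 0,
\quad j = 1, 2, \ldots, m
```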

4.3 Problem on Point Estimate
Q. The interarrival time of vehicles on a certain stretch of a highway is expressed by an
exponential distribution
$$f_T(t) = \frac{1}{\lambda} e^{-t/\lambda}$$
The times between successive arrivals of vehicles were observed as 2.2 s, 4.0 s, 7.3 s, 11.1 s, 6.2 s,
3.4 s and 8.1 s.
Determine the mean interarrival time by (a) the method of moments and (b) the maximum
likelihood method.
Soln.
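(a) The first moment about the origin of $f_T(t)$ is
$$\mu = \frac{1}{\lambda}\int_0^\infty t\, e^{-t/\lambda}\, dt$$
$$\text{or, } \mu = -\left[ t\, e^{-t/\lambda} + \lambda\, e^{-t/\lambda} \right]_0^\infty$$
$$\text{or, } \mu = \lambda$$
Therefore, $\hat{\lambda} = \bar{X} = \frac{1}{7}\sum_{i=1}^{7} t_i = 6.04$ s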

(b) Assuming random sampling, the likelihood function of the observed values is
$$L(t_1, t_2, \ldots, t_7; \lambda) = \prod_{i=1}^{7} \frac{1}{\lambda}\exp\left(-\frac{t_i}{\lambda}\right) = \lambda^{-7}\exp\left(-\frac{1}{\lambda}\sum_{i=1}^{7} t_i\right)$$


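The estimator can now be obtained by differentiating the likelihood function $L$ with respect to
$\lambda$.
Hence,
$$\frac{\partial L}{\partial \lambda} = -7\lambda^{-8}\exp\left(-\frac{1}{\lambda}\sum_{i=1}^{7} t_i\right) + \lambda^{-7}\left(\frac{1}{\lambda^2}\sum_{i=1}^{7} t_i\right)\exp\left(-\frac{1}{\lambda}\sum_{i=1}^{7} t_i\right) = 0$$
$$\text{or, } \lambda^{-8}\exp\left(-\frac{1}{\lambda}\sum_{i=1}^{7} t_i\right)\left\{-7 + \frac{1}{\lambda}\sum_{i=1}^{7} t_i\right\} = 0$$
$$\text{or, } \frac{1}{\lambda}\sum_{i=1}^{7} t_i = 7$$
$$\text{or, } \hat{\lambda} = \frac{1}{7}\sum_{i=1}^{7} t_i$$
$$\text{or, } \hat{\lambda} = 6.04 \text{ s}$$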
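Both estimates can be cross-checked numerically. The sketch below recomputes the sample mean for part (a) and, for part (b), maximizes the log-likelihood of the exponential density by a simple ternary search instead of differentiating. The search bracket [0.1, 50] and the iteration count are assumptions of the sketch, not part of the lecture.

```python
import math

times = [2.2, 4.0, 7.3, 11.1, 6.2, 3.4, 8.1]   # observed interarrival times (s)

# (a) Method of moments: the first moment of the density is lambda,
# so the estimate is the sample mean.
lam_mom = sum(times) / len(times)

# (b) Maximum likelihood: log L = -n*log(lam) - sum(t)/lam.
def log_likelihood(lam):
    return -len(times) * math.log(lam) - sum(times) / lam

# Ternary search on an assumed bracket; log L rises up to its single
# maximum and falls afterwards, so the search converges to that maximum.
lo, hi = 0.1, 50.0
for _ in range(200):
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if log_likelihood(m1) < log_likelihood(m2):
        lo = m1
    else:
        hi = m2
lam_mle = (lo + hi) / 2

print(round(lam_mom, 2), round(lam_mle, 2))   # prints 6.04 6.04
```

For the exponential distribution the two methods give the same estimator, which is why the numerical maximum coincides with the sample mean.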

5. Interval Estimation
In case of a point estimate, the chance that the true value of the parameter will
exactly coincide with the estimated value is very low.
Hence it is sometimes useful to specify an interval within which the parameter is expected to
lie. The interval is associated with a certain confidence level, i.e., it can be stated with a certain
degree of confidence that the parameter will lie within that interval.
5.1 Confidence interval of Mean with known variance
For a large sample ($n > 30$), if $\bar{x}$ is the calculated sample mean and $\sigma^2$ is the known
variance of the population, then $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is a standard normal variate. The confidence
interval of the mean is given by
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $100(1-\alpha)\%$ is the degree of confidence, and $-z_{\alpha/2}$ and $z_{\alpha/2}$ are the values of
the standard normal variate at cumulative probability levels $\alpha/2$ and $(1 - \alpha/2)$ respectively.
5.2 Confidence interval of Mean with unknown variance
For a small sample ($n < 30$), if $\bar{x}$ is the calculated sample mean and $s^2$ is the calculated
sample variance, then the random variable $(\bar{X} - \mu)/(s/\sqrt{n})$ follows a $t$-distribution with
$(n-1)$ degrees of freedom. The confidence interval of the mean is given by
$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$
where $100(1-\alpha)\%$ is the degree of confidence, and $-t_{\alpha/2}$ and $t_{\alpha/2}$ are the values of the
$t$-distribution variate with $(n-1)$ degrees of freedom at cumulative probability levels $\alpha/2$ and $(1 - \alpha/2)$ respectively. They
can be obtained from the t-distribution table. Though it is assumed that the sample is drawn
from a normal population, the expression applies roughly for non-normal populations also.
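Where a t-distribution table is not at hand, the tabulated value can be approximated numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule and inverts the cumulative probability by bisection. The function names, the number of integration steps and the bisection bracket are assumptions of this sketch; a statistics library would normally be used instead.

```python
import math

def t_pdf(x, dof):
    """Density of Student's t-distribution with `dof` degrees of freedom."""
    log_c = math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
    c = math.exp(log_c) / math.sqrt(dof * math.pi)
    return c * (1 + x * x / dof) ** (-(dof + 1) / 2)

def t_cdf(x, dof, steps=2000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus symmetry about 0."""
    h = x / steps
    total = t_pdf(0.0, dof) + t_pdf(x, dof)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, dof)
    return 0.5 + total * h / 3

def t_quantile(p, dof):
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 200.0            # assumed bracket for the quantile
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(x_bar, s, n, confidence):
    """Two-sided confidence interval of the mean when the variance is unknown."""
    alpha = 1 - confidence
    half_width = t_quantile(1 - alpha / 2, n - 1) * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width
```

For example, `t_quantile(0.995, 24)` should come out close to the tabulated value 2.797 for 24 degrees of freedom.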
5.3 Problem on Confidence interval of Mean with known variance
Q. Thirty concrete cubes are prepared under a certain condition. The sample mean strength of these cubes
is found to be 24 kN/m$^3$. If the standard deviation is known to be 4 kN/m$^3$, determine the
99% and the 95% confidence intervals of the mean strength of the concrete cubes.
Soln.
(a) For the 99% confidence interval,
$\alpha = (1 - 0.99) = 0.01$
From the standard normal table,
$$P(Z \le z_{\alpha/2}) = 1 - \frac{\alpha}{2}$$
$$\text{or, } P(Z \le z_{0.005}) = 1 - 0.005$$
$$\text{or, } P(Z \le z_{0.005}) = 0.995$$
$$\text{or, } z_{0.005} = 2.575$$
Now, $\frac{\sigma}{\sqrt{n}}\, z_{\alpha/2} = \frac{4}{\sqrt{30}}(2.575) = 1.88$
The 99% confidence interval of the mean strength of the concrete cubes is
$(24 - 1.88;\ 24 + 1.88)$ kN/m$^3$
i.e., $(22.12;\ 25.88)$ kN/m$^3$

(b) To determine the 95% confidence interval,
$$P(Z \le z_{0.025}) = 1 - \frac{0.05}{2}$$
$$\text{or, } P(Z \le z_{0.025}) = 0.975$$
$$\text{or, } z_{0.025} = 1.96$$
Now, $\frac{\sigma}{\sqrt{n}}\, z_{\alpha/2} = \frac{4}{\sqrt{30}}(1.96) = 1.43$
The 95% confidence interval of the mean strength of the concrete cubes is
$(24 - 1.43;\ 24 + 1.43)$ kN/m$^3$
i.e., $(22.57;\ 25.43)$ kN/m$^3$

The wider interval is more likely to contain the mean value than the narrower one.
Hence the 99% confidence interval is wider than the 95% confidence interval.
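The two intervals of this problem can be re-computed with the standard library, which provides the inverse of the standard normal cumulative distribution. This is a minimal sketch; the function name `z_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Two-sided confidence interval of the mean with known variance."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

ci_99 = z_interval(24.0, 4.0, 30, 0.99)
ci_95 = z_interval(24.0, 4.0, 30, 0.95)
print([round(v, 2) for v in ci_99])   # prints [22.12, 25.88]
print([round(v, 2) for v in ci_95])   # prints [22.57, 25.43]
```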

5.4 Problem on Confidence interval of Mean with unknown variance
Q. A random sample of 25 concrete cubes was selected from a batch of concrete cubes
prepared under a certain process. The sample mean strength of the 25 concrete cubes is found to be 24
kN/m$^3$ and the sample standard deviation is 4 kN/m$^3$. Determine the 99% and the 95%
confidence intervals of the mean strength of the concrete cubes.
Soln.
Here $n = 25$.
So, $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a $t$-distribution with $(n - 1) = 24$ degrees of freedom (d.o.f.).
For the 99% confidence interval, $\alpha/2 = 0.005$
From the $t$-distribution table, we get the value of $t_{0.005,24}$ for $p = 0.995$ and $f = 24$,
$t_{0.005,24} = 2.797$
Now, $\frac{s}{\sqrt{n}}\, t_{\alpha/2,\,n-1} = \frac{4}{\sqrt{25}}(2.797) = 2.24$
The 99% confidence interval of the mean strength of the concrete cubes is
$\langle\mu\rangle_{0.99} = (24 - 2.24;\ 24 + 2.24)$ kN/m$^3$
i.e., $\langle\mu\rangle_{0.99} = (21.76;\ 26.24)$ kN/m$^3$


It may be noted that this interval is larger compared to that where the standard deviation of
the population was known. This is expected because uncertainty is greater when the standard
deviation is unknown.
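The problem also asks for the 95% confidence interval, which follows the same steps. Taking the tabulated value $t_{0.025,24} = 2.064$ (a standard table value that is not quoted in the lecture), a sketch of the computation is:

```latex
\begin{aligned}
\alpha/2 &= 0.025, \qquad t_{0.025,\,24} = 2.064 \\
\frac{s}{\sqrt{n}}\, t_{\alpha/2,\,n-1} &= \frac{4}{\sqrt{25}}\,(2.064) = 1.65 \\
\text{95\% interval} &= (24 - 1.65;\ 24 + 1.65) = (22.35;\ 25.65)\ \text{kN/m}^3
\end{aligned}
```

As with the known-variance case, the 95% interval is narrower than the 99% interval.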
6. Concluding Remarks
The basics of sampling distributions and estimation of parameters are discussed in this lecture.
Point estimation and interval estimation for various types of samples are also presented.
The next lecture discusses the one-sided confidence interval of the mean for known and unknown variance,
the confidence interval of the variance and the estimation of proportions in detail.
Hypothesis testing is also introduced in the next lecture.
