
Estimation Theory

We have to estimate the unknown parameters of a parent universe on the basis of results obtained
from various samples. The general theory concerned with estimation is known as the theory of
estimation.
Estimation of population parameters (mean, variance, proportion, correlation coefficient, etc.)
from the corresponding sample statistics is a part of statistical inference.
Example:

1) A manufacturer may be interested in estimating the future demand for his product in the
market.
2) A manufacturer of bulbs or tubes may be interested in estimating the average life of his
product.
3) A manufacturer of equipment may be interested in knowing the quality of the product by
estimating the proportion of defective pieces in the lot.

Let a random sample x₁, x₂, x₃, …, x_{n−1}, xₙ from a parent universe have a density function
f(x, θ), where θ is an unknown parameter to be estimated. Our problem is to obtain an estimate of θ
in terms of the sample values. There are infinitely many ways of choosing an estimator, and our
problem is to choose the best estimator. By best we mean that the distribution of the estimator
should be concentrated near the true value of the parameter θ.

The theory of estimation is divided into two groups.


1) Point estimation
2) Interval estimation
In point estimation a single sample statistic (numerical value) is used to provide an estimate of the
population parameter, whereas in interval estimation a probable range is specified within which the
true value of the parameter might be expected to lie.

Point estimation:
A particular value of a statistic which is used to estimate a given parameter is known as a point
estimate of the parameter.
Let us suppose that some characteristic of the elements in a population is represented by the random
variable X which has pdf f(x, θ), where the form of f is known except for the fact that it contains an
unknown parameter θ. If θ were somehow determined, then f would be completely specified. The
problem of point estimation is to pick a statistic T(X₁, …, Xₙ) that represents or estimates θ. The
numerical value of T when the realization is x₁, x₂, …, xₙ is called an estimate of θ. Thus an
estimator is a statistic T that estimates the parameter θ, whereas an estimate is a value of T.

The problem of interval estimation is to define two statistics T₁(X₁, …, Xₙ) and T₂(X₁, …, Xₙ),
where T₁(X₁, …, Xₙ) < T₂(X₁, …, Xₙ), such that (T₁(X₁, …, Xₙ), T₂(X₁, …, Xₙ)) is a random
interval for which the probability that it will contain the unknown parameter θ is specified.

Example: Let X₁, …, Xₙ be a random sample from N(μ, σ²). Then

X̄ = (1/n) Σ_{i=1}^{n} Xᵢ

is a possible point estimator of the parameter μ, and

( X̄ − 2S/√n , X̄ + 2S/√n )

is a possible interval estimator of μ,

where

S² = (1/(n−1)) Σ_{i=1}^{n} (Xᵢ − X̄)²
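A minimal numerical sketch of these two kinds of estimates, assuming a small made-up sample and the rough 95% interval X̄ ± 2S/√n given above (all data values below are hypothetical):

```python
import math

# Hypothetical sample, assumed drawn from a normal population N(mu, sigma^2)
x = [4.2, 5.1, 3.8, 4.9, 5.4, 4.6, 5.0, 4.4]
n = len(x)

# Point estimate of mu: the sample mean
x_bar = sum(x) / n

# Sample variance with divisor (n - 1), as in the definition of S^2 above
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s = math.sqrt(s2)

# Interval estimate of mu: (x_bar - 2*s/sqrt(n), x_bar + 2*s/sqrt(n))
half_width = 2 * s / math.sqrt(n)
print("point estimate   :", round(x_bar, 3))
print("interval estimate:", (round(x_bar - half_width, 3), round(x_bar + half_width, 3)))
```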
Parameter Space:
The set of all admissible values of the parameter θ, associated with the population pdf f(x, θ), is called
the parameter space, denoted by Θ.
The parameter θ may be a vector of parameters. For example, for a normal population
N(μ, σ²), θ = (μ, σ²) is a vector of the parameters μ and σ². Hence the parameter space is
Θ = { (μ, σ²) : −∞ < μ < ∞, σ² > 0 },
when both μ and σ² are unknown. If μ = μ₀ (given) and σ² is not known, then
Θ = { (μ₀, σ²) : σ² > 0 }.

There are many methods of obtaining point estimates.


A good estimator is one which is as close to the true value of the parameter as possible.
The following are some of the criteria which should be satisfied by a good estimator or the
properties of good estimators.
1) Unbiasedness
2) Consistency
3) Efficiency
4) Sufficiency

Consistency:
X 1 , X 2 … be a sequence of independent and identically distributed r,v with common pdf
F ( x 1 ,θ ) ,θ ∈ Θ. A sequence of point estimators T n ( X 1 , … X n )=T n will be called consistent if:
T n P θ for each ¿ θ∈ Θ

Or if for every ∈>0
P (|1 T n−θ|>∈ ) → 0 as n→ ∞
Or if,
lim P (|1 T n −θ|<∈ ) =1 ∀ ∈> 0
n→∞
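As an illustrative sketch (not part of the original notes), the sample mean of i.i.d. observations is a consistent estimator of the population mean; the population, sample sizes and tolerance ε below are assumptions for a quick Monte Carlo check showing P(|Tₙ − θ| > ε) shrinking as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0          # true population mean (exponential distribution with mean 2)
eps = 0.1            # tolerance epsilon
reps = 2000          # Monte Carlo repetitions per sample size

for n in (10, 100, 1000, 5000):
    # T_n = sample mean of n i.i.d. observations, repeated `reps` times
    samples = rng.exponential(scale=theta, size=(reps, n))
    t_n = samples.mean(axis=1)
    prob_far = np.mean(np.abs(t_n - theta) > eps)
    print(f"n={n:5d}  estimated P(|T_n - theta| > eps) = {prob_far:.3f}")
```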
Invariance property of consistent estimators.
Theorem: If Tₙ is a consistent estimator of θ and Ψ(θ) is a continuous function of θ, then Ψ(Tₙ) is a
consistent estimator of Ψ(θ).
Proof: Since Tₙ is a consistent estimator of θ,
Tₙ → θ in probability as n → ∞, i.e. for every ε > 0, η > 0,

∃ a positive integer m = m(ε, η) such that
P{ |Tₙ − θ| < ε } > 1 − η, ∀ n ≥ m.
Since Ψ(·) is a continuous function, for every ε₁ > 0, however small, ∃ a positive number ε such
that
|Ψ(Tₙ) − Ψ(θ)| < ε₁ whenever |Tₙ − θ| < ε, i.e.
|Tₙ − θ| < ε ⇒ |Ψ(Tₙ) − Ψ(θ)| < ε₁.
For two events A and B, if A ⇒ B, then P(A) ≤ P(B). Hence
P{ |Ψ(Tₙ) − Ψ(θ)| < ε₁ } ≥ P{ |Tₙ − θ| < ε } > 1 − η, ∀ n ≥ m,
i.e. Ψ(Tₙ) → Ψ(θ) in probability, so Ψ(Tₙ) is a consistent estimator of Ψ(θ).

Cramer-Rao Inequality
Theorem: If t is an unbiased estimator for γ(θ), a function of the parameter θ, then

var(t) ≥ [ d γ(θ)/dθ ]² / E[ ( ∂ log L/∂θ )² ] = [ γ′(θ) ]² / I(θ),

where I(θ) is the information on θ supplied by the sample.
Proof: Regularity Conditions for the Cramer-Rao Inequality
1) The parameter space Θ is a non-degenerate open interval on the real line R¹ = (−∞, ∞).
2) For almost all x = (x₁, x₂, …, xₙ) and ∀ θ ∈ Θ, ∂L(x, θ)/∂θ exists; the exceptional set, if any,
is independent of θ.
3) The range of integration is independent of the parameter θ, so that f(x, θ) can be differentiated
under the integral sign.
4) The conditions of uniform convergence of integrals are satisfied, so that differentiation under
the integral sign is valid.
5) I(θ) = E[ ( ∂ log L(x, θ)/∂θ )² ] exists and is positive for all θ ∈ Θ.
Let x be a random variable following the probability density function f(x, θ), and let L be the
likelihood function of the random sample (x₁, x₂, …, xₙ) from this population. Then

L = L(x, θ) = Π_{i=1}^{n} f(xᵢ, θ).

Since L is the joint probability density function of (x₁, x₂, …, xₙ),

∫ L(x, θ) dx = 1,
where ∫ dx = ∫∫ ⋯ ∫ dx₁ dx₂ ⋯ dxₙ.

Differentiating with respect to θ and using the regularity conditions given above, we get

∫ (∂L/∂θ) dx = 0
⇒ ∫ (1/L)(∂L/∂θ) L dx = ∫ ( ∂ log L/∂θ ) L dx = 0
⇒ E( ∂ log L/∂θ ) = 0.    [ ∵ E(x) = ∫ x f(x) dx ]

Let t = t(x₁, x₂, …, xₙ) be an unbiased estimator of γ(θ), such that

E(t) = γ(θ) ⇒ ∫ t · L dx = γ(θ).

Differentiating with respect to θ, we get

∫ t (∂L/∂θ) dx = γ′(θ) ⇒ ∫ t ( ∂ log L/∂θ ) L dx = γ′(θ)
⇒ E[ t ( ∂ log L/∂θ ) ] = γ′(θ).

Hence

cov( t, ∂ log L/∂θ ) = E[ t ( ∂ log L/∂θ ) ] − E(t) E( ∂ log L/∂θ ) = γ′(θ).

We have, for the correlation coefficient r,

r² ≤ 1 and r = cov(x, y) / (σₓ σᵧ)
⇒ r² = [ cov(x, y) ]² / [ var(x) var(y) ] ≤ 1
⇒ [ cov(x, y) ]² ≤ var(x) · var(y).

∴ [ cov( t, ∂ log L/∂θ ) ]² ≤ var(t) · var( ∂ log L/∂θ )

⇒ [ γ′(θ) ]² ≤ var(t) [ E( ( ∂ log L/∂θ )² ) − { E( ∂ log L/∂θ ) }² ]

⇒ [ γ′(θ) ]² ≤ var(t) E[ ( ∂ log L/∂θ )² ]

⇒ var(t) ≥ [ γ′(θ) ]² / E[ ( ∂ log L/∂θ )² ],

which is the Cramer-Rao Inequality.
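As a hedged illustration of the bound (an assumed example, not taken from the notes): for a random sample from N(μ, σ²) with σ² known, I(μ) = n/σ², so the Cramer-Rao lower bound for an unbiased estimator of μ is σ²/n, and the sample mean attains it. A quick simulation comparing the empirical variance of x̄ with the bound:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 5.0, 2.0, 25, 20000

# Empirical variance of the sample mean over many repeated samples
x_bar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
empirical_var = x_bar.var(ddof=1)

# Cramer-Rao lower bound: [gamma'(mu)]^2 / I(mu) = 1 / (n / sigma^2) = sigma^2 / n
crlb = sigma**2 / n
print("empirical var(x_bar):", round(empirical_var, 5))
print("Cramer-Rao bound    :", round(crlb, 5))
```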

*Conditions for the equality sign in the CR inequality

If the likelihood function L is expressible in the form

∂ log L/∂θ = [ t − γ(θ) ] / λ(θ) = [ t − γ(θ) ] A(θ),   where A(θ) = 1/λ(θ),

then:
1) t is an unbiased estimator of γ(θ);
2) a minimum variance bound (MVB) estimator t for γ(θ) exists; and
3) var(t) = | γ′(θ) / A(θ) | = | γ′(θ) λ(θ) |.

Efficiency:
If the sampling distributions of two statistics have the same mean, the statistic with the smaller
variance is called an efficient estimator of the mean, while the other statistic is called an inefficient
estimator.
If, for two consistent estimators T₁, T₂ of a certain parameter θ, we have

V(T₁) < V(T₂) for all n,

then T₁ is more efficient than T₂ for all sample sizes.

Most Efficient Estimator:


If in a class of consistent estimators for a parameter there exists one whose sampling variance is less
than that of any other such estimator, it is called the most efficient estimator. Whenever such an estimator
exists, it provides a criterion for measuring the efficiency of the other estimators.

If T₁ is the most efficient estimator with variance V₁ and T₂ is any other estimator with
variance V₂, then the efficiency E of T₂ is defined as

E = V₁ / V₂, where E ≤ 1.
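A small simulated sketch of relative efficiency (assumed normal data; the numbers are illustrative only): for a normal population both the sample mean and the sample median estimate μ, but the mean has the smaller variance, so the efficiency of the median relative to the mean comes out near 2/π ≈ 0.64 for large samples:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.0, 1.0, 101, 20000

data = rng.normal(mu, sigma, size=(reps, n))
var_mean = data.mean(axis=1).var(ddof=1)            # V1: variance of the more efficient estimator
var_median = np.median(data, axis=1).var(ddof=1)    # V2: variance of the competing estimator

print("var(mean)   :", round(var_mean, 5))
print("var(median) :", round(var_median, 5))
print("efficiency E = V1/V2 =", round(var_mean / var_median, 3))
```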

Sufficiency:
An estimator T n is said to be sufficient if it contains all the information in the sample regarding the
population parameter θ .
If T = T(x₁, x₂, …, xₙ) is an estimator of a parameter θ, based on a sample x₁, x₂, …, xₙ of size n
from the population with density f(x, θ), such that the conditional distribution of x₁, x₂, …, xₙ
given T is independent of θ, then T is a sufficient estimator of θ.

Properties of sufficient estimators:


1) If a sufficient estimator exists for some parameter, then it is also the most efficient estimator.
2) It is always consistent.
3) It may or may not be unbiased.
4) A minimum variance unbiased estimator (MVUE) for a parameter exists if and only if there
exists a sufficient estimator for it.

Testing of Hypothesis

Hypothesis:

A statistical hypothesis is some assumption or statement or assertion about a population or


equivalently about the probability distribution characterizing a population which we want to verify
on the basis of information available from a sample.

Example :
1) A quality control manager is to determine whether a process is working properly.
2) A drug chemist is to decide whether a new drug is really effective in curing a disease.
3) A statistician has to decide whether a given coin is biased.

Simple Hypothesis:

If the statistical hypothesis specifies the population completely then it is termed as a simple
statistical hypothesis.

Example :
If x₁, x₂, …, xₙ is a random sample of size n from a normal population with mean μ and variance σ²,
then the hypothesis
H₀: μ = μ₀, σ² = σ₀²
is a simple hypothesis.

Composite Hypothesis:
A hypothesis which does not specify the population completely is termed a composite
hypothesis.
If x₁, x₂, …, xₙ is a random sample of size n from a normal population with mean μ and variance
σ², then, for example, each of the following is a composite hypothesis:
I. H₀: μ = μ₀ (σ² not specified)
II. H₀: σ² = σ₀² (μ not specified)
III. H₀: μ < μ₀, σ² < σ₀²
IV. H₀: μ = μ₀, σ² < σ₀²

Test of Hypothesis:
A test of a statistical hypothesis is a procedure or a rule for deciding whether to accept or reject
the hypothesis on the basis of the sample values obtained.
For example, let x₁, x₂, …, xₙ be a random sample from a normal population.
Also let
H: μ ≤ 15. One possible test is: reject H if and only if …
A test is usually described in terms of some statistic T = T(x₁, …, xₙ) which reduces the
experimental data. Such a statistic associated with the test is called a test statistic.

Null hypothesis:
A hypothesis which is tested under the assumption that it is true is called a null hypothesis. The
null hypothesis asserts that there is no (significant) difference between the statistic and the
population parameter and whatever observed difference is there is merely due to fluctuations
in sampling from the same population. It is denoted by H 0.
R.A. Fisher: "The null hypothesis is the hypothesis which is tested for possible rejection under the assumption that it is
true."
Symbolically, the above definition can be explained as:
Let X₁, …, Xₙ be a random sample from a population with distribution function F_θ, θ ∈ Θ, where Θ is the parameter
space.
Then the null hypothesis is
H₀: θ ∈ Θ₀ ⊂ Θ.
The null hypothesis is simple if Θ₀ is a singleton set, otherwise it is composite.

Alternative Hypothesis
Any hypothesis which contradicts the null hypothesis, i.e. the conclusion we accept whenever we
reject the null hypothesis, is called the alternative hypothesis, and it is denoted by
H₁.
The two hypotheses H₀ and H₁ are such that if one is true, the other is false, and vice versa.
Symbolically, it is explained as
H₁: θ ∈ Θ − Θ₀.
The alternative hypothesis is simple if Θ − Θ₀ is a singleton set, otherwise it is composite.
If we have to test whether the population mean μ has a specified value μ₀, then the null
hypothesis is
H₀: μ = μ₀.

Unit - 4
Non-Parametric Test-
Almost all the exact (small) sample tests of significance are based on the fundamental assumptions
that:
1) The parent population is normal.
2) They are concerned with testing or estimating the means and variances of these populations.
The tests which deal with the parameters of the population are known as parametric tests. The
parametric techniques are not distribution-free. Thus, a parametric test is a test whose model
specifies certain conditions about the parameters of the population from which the samples are
drawn.

On the other hand, a non-parametric test is a test that does not depend on the particular form of the
basic frequency function from which the samples are drawn, i.e. a non-parametric test does not make
any assumption regarding the form of the population. This means the non-parametric techniques are
distribution-free.

The hypotheses of a non-parametric test are concerned with something other than the value of a
population parameter. A large number of these tests exist, but the following are a few of the better known and more
widely used ones.
1) The sign test for paired data, where positive or negative signs are substituted for quantitative
values.
2) A rank sum test, often called the Mann-Whitney U test, which can be used to determine
whether two independent samples have been drawn from the same population. It uses more
information than the sign test.
3) Another rank sum test, the Kruskal-Wallis test, which generalizes the analysis of variance to
enable us to dispense with the assumption that populations are normally distributed.
4) The one-sample runs test, a method for determining the randomness with which sampled
items have been selected.
5) Rank correlation, a method for doing correlation analysis when the data are not available to
use in numerical form, but when information sufficient to rank the data first, second and so
on is available.
6) The Kolmogorov-Smirnov test, another method for determining the goodness of fit between
an observed sample and a theoretical distribution.
Assumptions—
Certain assumptions associated with non-parametric tests are:
a) Sample observations are independent.
b) The variable under study is continuous.
c) The probability density function is continuous.
d) Lower order moments exist.

Considerations for the Non-parametric statistics —


The non-parametric statistics may be effectively used on the basis of the following considerations—
1) They can be used with non-probability samples as well as probability samples.
2) The scores have a free distribution; it is a distribution-free technique.
3) The focus is on the difference of the medians of two or more sets of data.
4) The calculations are simply counting of frequencies rather than arithmetical operations.
5) The data are on nominal and ordinal scales; the methods may also be applied to interval-scale data.
6) The population is not known and not well defined.
7) The statistics are not used for estimating parameters; they are interpreted or described as
they are.
Non-parametric tests can be used as short-cut replacements for more complicated tests. They are
especially valuable in dealing with non-numerical data, such as arise when consumers rank cereals or
other products in order of preference.
Advantages of Non-parametric test —
Non-parametric tests have the following advantages—
a) They require few assumptions.
b) They impose few assumptions or conditions on the data being analysed.
c) No assumption is made about the form of the frequency function of the parent population
from which sampling is done.
d) They are simple and easy to apply and do not require complicated sampling theory.
e) Non-parametric tests can be used in situations where actual measurements are not available and
the data are obtained only as ranks.
f) Non-parametric tests find application in psychometry, sociology and educational
statistics.
g) They do not require us to assume that a population is distributed in the shape of
a normal curve or another specific shape.
h) Generally they are easier to do and to understand. Most non-parametric tests do not
demand the kind of laborious computations often required, for example, to calculate a standard deviation.
i) Sometimes even formal ordering or ranking is not required.

Disadvantages Of Non-Parametric Test—


a) Non-parametric tests can be used only if the measurements are nominal or ordinal.
b) No non-parametric methods exist for testing interactions in the analysis of variance model
unless special assumptions about the additivity of the model are made.
c) Non-parametric tests are designed to test statistical hypotheses only and not for estimating
parameters.
d) They ignore a certain amount of information. We have demonstrated how the values 1, 2, 3, 4
and 5 can replace the numbers 13.33, 76.50, 101.79, 113.45 and 189.42.

Parametric value:       113.45   189.42   76.50   13.33   101.79
Non-parametric value:   4        5        2       1       3

Yet if we represent 189.42 by 5, we lose information that is contained in the value 189.42. Even if
the value became 1189.42, it would still be the fifth, or largest, value in the list.
e) They are often not as efficient or sharp as parametric tests, because their assumptions are fewer
and much weaker than those associated with parametric tests.

Sign test, Wilcoxon's test, Wilcoxon's signed rank test.


for one sample:
The sign test was developed in 1946 by Dixon and Mood as a simple non-parametric test. It is an
alternative to the parametric 't' test and is useful in situations where quantitative measurement is not
practicable.

The sign test is specifically designed for testing hypotheses about the median of any continuous
population. Like the mean, the median is a measure of the centre or location of a distribution; therefore the sign
test is sometimes called a test for location.

Let x₁, x₂, …, xₙ be a random sample from a population with unknown median θ. Suppose we are
required to test the hypothesis H₀: θ = θ₀ (some specified value) against the one-sided alternative
H₁: θ < θ₀ or H₁: θ > θ₀, or the
two-sided alternative H₁: θ ≠ θ₀.
If the sample comes from a distribution with median θ₀, then on average half of the observations will be
greater than θ₀ and half will be smaller than θ₀.
The name of the test comes from the fact that it is based on the direction (the signs, plus or minus) of the
observations, not on their numerical magnitude.
Steps:
1) Replace each observation greater than θ₀ by a plus (+) sign and each observation smaller than θ₀
by a minus (−) sign. Sample values equal to θ₀ may be ignored.
2) Count the number of plus (+) signs and denote it by r, and the number of minus (−) signs by s, with
r + s ≤ n.
The conditional distribution of r given r + s is binomial with p = P[x > θ₀]. The number r of plus signs may therefore be used to test
H₀, which is equivalent to testing H₀: p = 1/2 for the binomial parameter p.
Note: If the P-value ≤ α, then H₀ is rejected; otherwise H₀ is accepted.
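A minimal sketch of these steps (the data and the hypothesised median θ₀ below are hypothetical; the binomial P-value is computed directly rather than read from a table):

```python
from scipy.stats import binom

# Hypothetical sample and hypothesised median theta0
x = [12.1, 9.8, 14.3, 11.7, 10.2, 13.5, 12.8, 9.4, 15.0, 11.1]
theta0 = 10.0
alpha = 0.05

signs = [xi - theta0 for xi in x if xi != theta0]   # drop values equal to theta0
r = sum(1 for d in signs if d > 0)                  # number of plus signs
m = len(signs)                                      # r + s

# Two-sided P-value for H0: p = 1/2 based on the binomial distribution of r
p_value = 2 * min(binom.cdf(r, m, 0.5), 1 - binom.cdf(r - 1, m, 0.5))
p_value = min(1.0, p_value)
print("r =", r, "out of", m, "   P-value =", round(p_value, 4))
print("reject H0" if p_value <= alpha else "do not reject H0")
```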

Wilcoxon's signed rank test-


 One-sample Wilcoxon signed rank test: This test is more efficient for testing the
median of a distribution than the sign test, provided that the distribution is continuous and
symmetric, since it considers both the signs and the magnitudes (ranks) of the deviations.
Consider a distribution which is continuous and symmetric. Let x₁, x₂, …, xₙ be n random and
independent observations from this distribution. Suppose we are required to test a hypothesis
about the median (θ):
H₀: θ = θ₀.
Then under H₀, the differences dᵢ = xᵢ − θ₀ are independent and come from a continuous
distribution which is symmetric about zero. So positive and negative dᵢ of the same absolute value
have equal probabilities of occurring.
1) Assign ranks to the absolute differences |dᵢ| from the smallest to the largest.
Ignore zero differences, as they are assumed to occur with zero probability.
2) If observations are repeated (tied), use the mid-rank (the average of the tied ranks).
Now let T⁺ be the sum of the ranks of the positive dᵢ and T⁻ be that of the negative dᵢ. Then

T⁺ + T⁻ = m(m + 1)/2,

m being the number of non-zero dᵢ and m ≤ n.

The null distributions of T⁺ and T⁻ are identical, each being symmetric about m(m + 1)/4 and ranging
from 0 to m(m + 1)/2. Also, since T⁺ + T⁻ = m(m + 1)/2, test statistics based on T⁺ and T⁻ are related and provide
equivalent test criteria.

It is more convenient to work with T, the smaller of the two sums T⁺ and T⁻. Tables give critical values
T_α of this statistic such that P(T ≤ T_α) = α under H₀; H₀ is rejected against the stated alternative when
the appropriate sum falls in the corresponding critical region.
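A minimal sketch of the ranking procedure (the observations and θ₀ are hypothetical; mid-ranks would be used for ties, and the result is cross-checked against scipy's implementation):

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

# Hypothetical observations and hypothesised median theta0
x = np.array([11.2, 9.1, 12.5, 10.8, 9.7, 13.0, 10.4, 8.9, 12.1, 11.6])
theta0 = 10.0

d = x - theta0
d = d[d != 0]                       # ignore zero differences
ranks = rankdata(np.abs(d))         # ranks of |d|, mid-ranks for ties
t_plus = ranks[d > 0].sum()         # T+ : sum of ranks of positive differences
t_minus = ranks[d < 0].sum()        # T- : sum of ranks of negative differences
m = len(d)

print("T+ =", t_plus, " T- =", t_minus, " T+ + T- =", m * (m + 1) / 2)

# Cross-check: scipy reports the smaller of the two sums and a two-sided P-value
stat, p = wilcoxon(x - theta0)
print("scipy statistic =", stat, " P-value =", round(p, 4))
```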

Unit-5

Analysis of Variance:
Analysis of variance is one of the most powerful tools of statistical analysis and this technique is used
to test whether the difference between the means of three or more populations is significant or not.
The systematic procedure of this statistical technique was first developed by R.A. Fisher, and the F-
distribution was named in his honour. Earlier, this technique was used in agricultural experiments;
now analysis of variance is widely used in the natural, social and physical sciences.
Example:
1) We can examine, using the technique of analysis of variance, whether different varieties of
seeds or fertilizers or soils differ significantly or not as regards average yields of land.
2) A manager of a firm may use this technique to know whether there is a significant difference
in the average sales figures of different salesmen employed by the firm.
3) The difference in various types of drugs manufactured by a company to cure a particular
disease may be studied through this technique.
Thus, through the ANOVA technique one can, in general, investigate any number of factors which are
hypothesized or said to influence the dependent variable. One may as well investigate the
differences amongst various categories within each of these factors which may have a large number
of possible values.

The technique of analysis of variance splits up the total variance into various components. Usually the
total variance is split into two parts:
(a) Variance between the samples.
(b) Variance within the samples.

According to H.T. Hayslett :


The analysis of variance is a technique that separates the variation that is present into independent
components; then these components are analysed in order to test certain hypotheses.

Assumptions in Analysis of Variance : The Analysis of Variance (ANOVA) is based on the following
assumptions :
1) The samples are independently or randomly drawn from the populations.
2) All the populations from which samples have been drawn are normally distributed.
3) The variances of all the populations are equal.

Technique of Analysis of Variance (ANOVA): The observations may be classified according to one
factor (criterion) or two factors. The classifications according to one factor and two factors are
respectively called one-way classification and two-way classification.

Basic Principle of ANOVA :


The basic principle of ANOVA is to test for differences among the means of the populations by
examining the amount of variation within each of these samples, relative to the amount of
variation between the samples. In terms of variation within a given population, it is assumed that
the values of X_ij differ from the mean of this population only because of random effects, i.e. there
are influences on X_ij which are unexplainable, whereas in examining differences between populations
we assume that the difference between the mean of the jth population and the grand mean is
attributable to what is called a 'specific factor'.

We have to make two estimates of the population variance, viz. one based on the between-sample
variance and the other based on the within-sample variance. The two estimates of the population
variance are compared with the F-test: the calculated F is compared with the tabulated F, and if
F(calculated) > F(tabulated), then we may say that there are significant differences between the sample means.

One-way classification.
In one-way classification the observations are classified according to one factor.

Example:
The yields of several plots of land may be classified according to one or more types of fertilisers.
Following are the methods by which we can perform ANOVA:
(a) Direct Method
(b) Shortcut Method
(c) Coding Method

 Direct Method:- Let the Null Hypothesis be H₀: μ₁ = μ₂ = ⋯ = μₖ.
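A minimal sketch of a one-way ANOVA on hypothetical yield data for three fertiliser types, using scipy's F-test; the group values below are assumptions for illustration only:

```python
from scipy.stats import f_oneway

# Hypothetical yields of plots treated with three different fertilisers
fert_a = [20, 22, 19, 24, 21]
fert_b = [28, 26, 27, 25, 29]
fert_c = [18, 17, 21, 16, 19]

# H0: mu_a = mu_b = mu_c (all population means equal)
f_stat, p_value = f_oneway(fert_a, fert_b, fert_c)
print("F =", round(f_stat, 3), " P-value =", round(p_value, 5))
# If the P-value is below the chosen significance level, reject H0:
# at least one fertiliser mean differs from the others.
```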

Design of Experiments:-

Gamma Distribution Function:-

The continuous random variable X is distributed according to the probability law

f(x) = e^{−x} x^{λ−1} / Γ(λ),  if λ > 0, 0 < x < ∞;
f(x) = 0, otherwise,

where Γ(λ) = ∫₀^∞ e^{−x} x^{λ−1} dx. This is known as the Gamma distribution and is denoted by γ(λ).

Moment Generating Function of the Gamma Distribution:

M_X(t) = ∫ e^{tx} f(x) dx

M_X(t) = ∫₀^∞ e^{tx} · e^{−x} x^{λ−1} / Γ(λ) dx

M_X(t) = (1/Γ(λ)) ∫₀^∞ e^{−x(1−t)} x^{λ−1} dx

On putting x(1−t) = y (with t < 1),
x = y/(1−t), dx = dy/(1−t):

M_X(t) = (1/Γ(λ)) ∫₀^∞ e^{−y} ( y/(1−t) )^{λ−1} dy/(1−t)

M_X(t) = 1/( Γ(λ) (1−t)^{λ} ) ∫₀^∞ e^{−y} y^{λ−1} dy

[ ∵ ∫₀^∞ e^{−y} y^{λ−1} dy = Γ(λ) ]

∴ M_X(t) = Γ(λ) / ( Γ(λ) (1−t)^{λ} )

M_X(t) = (1−t)^{−λ}

Hence this is the expression for the moment generating function of the Gamma distribution.
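A quick numerical sanity check of this expression (the shape value λ and the t values are assumed for illustration; the check is Monte Carlo, so the agreement is approximate):

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 3.0                                   # shape parameter lambda
x = rng.gamma(shape=lam, scale=1.0, size=500_000)

for t in (0.1, 0.2, 0.3):                   # must have t < 1 for the MGF to exist
    empirical = np.mean(np.exp(t * x))      # E[e^{tX}] estimated from the sample
    theoretical = (1 - t) ** (-lam)         # (1 - t)^{-lambda}
    print(f"t={t}: empirical={empirical:.4f}  theoretical={theoretical:.4f}")
```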

From the MGF the moments μ′₁, μ′₂, … and the cumulants (constants) k₁, k₂, … of the distribution can be obtained.

Constants of the Gamma Distribution (cumulant generating function K_X(t)):

∵ K_X(t) = log M_X(t)

⇒ K_X(t) = log (1−t)^{−λ}

⇒ K_X(t) = −λ log(1−t)

[ ∵ log(1−t) = −t − t²/2 − t³/3 − t⁴/4 − … ]

K_X(t) = −λ [ −t − t²/2 − t³/3 − t⁴/4 − … ]

K_X(t) = λ [ t + t²/2 + t³/3 + t⁴/4 + … ]

Since k_r = coefficient of t^r / r! in K_X(t):

k₁ = coefficient of t/1! in K_X(t) ⇒ k₁ = λ = μ′₁ (mean)
k₂ = coefficient of t²/2! in K_X(t) ⇒ k₂ = λ = μ₂ (variance)
k₃ = coefficient of t³/3! in K_X(t) ⇒ k₃ = 2λ = μ₃
k₄ = coefficient of t⁴/4! in K_X(t) ⇒ k₄ = 6λ

μ₄ = k₄ + 3k₂²
μ₄ = 6λ + 3λ²
μ₄ = 3λ(2 + λ)

These are the constants of the Gamma distribution.
Skewness:
β₁ = μ₃² / μ₂³ = (2λ)² / λ³ = 4/λ

Kurtosis:
β₂ = μ₄ / μ₂² = 3λ(2 + λ) / λ² = 3(2 + λ)/λ
Additive property of the Gamma Distribution:

"The sum of independent Gamma variates is also a Gamma variate."

Let xᵢ (where i = 1, 2, 3, …, k) be independent continuous random variables having Gamma distributions with
parameters λᵢ. We know that the moment generating function of the Gamma distribution is given as

M_{xᵢ}(t) = (1−t)^{−λᵢ}.

Since the moment generating function of the sum of independent variates x₁ + x₂ + x₃ + … + x_k is the
product of the individual moment generating functions,

M_{x₁+x₂+…+x_k}(t) = M_{x₁}(t) · M_{x₂}(t) ⋯ M_{x_k}(t)

{ ∴ by the property of moment generating functions }

∴ M_{x₁+x₂+…+x_k}(t) = (1−t)^{−λ₁} (1−t)^{−λ₂} (1−t)^{−λ₃} ⋯ (1−t)^{−λ_k}

M_{x₁+x₂+…+x_k}(t) = (1−t)^{−(λ₁+λ₂+λ₃+…+λ_k)},

which is again the moment generating function of a Gamma variate, with parameter λ₁ + λ₂ + … + λ_k.
Therefore, the result follows by the uniqueness theorem.
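A brief simulated check of this property (the shape values below are assumptions): summing independent Gamma(λ₁), Gamma(λ₂), Gamma(λ₃) variates with scale 1 should give a Gamma(λ₁+λ₂+λ₃) variate, whose mean and variance both equal the total shape parameter (since k₁ = k₂ = λ as derived above):

```python
import numpy as np

rng = np.random.default_rng(4)
lams = [1.5, 2.0, 3.5]                 # shape parameters of the independent variates
n = 200_000

# Sum of independent Gamma variates (scale 1)
total = sum(rng.gamma(shape=lam, scale=1.0, size=n) for lam in lams)

lam_sum = sum(lams)                    # expected shape of the sum: lambda_1 + ... + lambda_k
print("sample mean of sum     :", round(total.mean(), 3), " (theory:", lam_sum, ")")
print("sample variance of sum :", round(total.var(), 3), " (theory:", lam_sum, ")")
```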

Beta Distribution :-
There are two types of Beta distribution −
I. Beta distribution of the 1st kind
II. Beta distribution of the 2nd kind

Beta distribution of the 1st kind :-
The continuous random variable which is distributed according to the probability law

f(x) = (1/B(μ, v)) x^{μ−1} (1−x)^{v−1},  (μ, v) > 0, 0 < x < 1;
f(x) = 0, otherwise,

where B(m, n) = ∫₀¹ x^{m−1} (1−x)^{n−1} dx, is known as the Beta distribution of the 1st kind.
The constants of the Beta distribution are given by the relation

μ′_r = ∫₀¹ x^r f(x) dx

μ′_r = ∫₀¹ x^r · (1/B(μ, v)) x^{μ−1} (1−x)^{v−1} dx

μ′_r = (1/B(μ, v)) ∫₀¹ x^{μ+r−1} (1−x)^{v−1} dx

{ ∵ B(m, n) = ∫₀¹ x^{m−1} (1−x)^{n−1} dx, and B(m, n) = Γ(m) Γ(n) / Γ(m+n) }

μ′_r = (1/B(μ, v)) · B(μ + r, v)

μ′_r = [ Γ(μ + v) / ( Γ(μ) Γ(v) ) ] · [ Γ(μ + r) Γ(v) / Γ(μ + r + v) ]

μ′_r = Γ(μ + v) Γ(μ + r) / [ Γ(μ) Γ(μ + r + v) ]

{ using Γ(n + 1) = n Γ(n) }

Now, mean:

μ′₁ = Γ(μ + v) Γ(μ + 1) / [ Γ(μ) Γ(μ + v + 1) ]

μ′₁ = Γ(μ + v) · μ Γ(μ) / [ Γ(μ) · (μ + v) Γ(μ + v) ]

μ′₁ = μ / (μ + v)   (mean)

μ′₂ = Γ(μ + v) Γ(μ + 2) / [ Γ(μ) Γ(μ + v + 2) ]

μ′₂ = μ(μ + 1) Γ(μ) Γ(μ + v) / [ (μ + v + 1)(μ + v) Γ(μ + v) Γ(μ) ]

μ′₂ = μ(μ + 1) / [ (μ + v + 1)(μ + v) ]

Since the variance is

μ₂ = μ′₂ − (μ′₁)²

μ₂ = μ(μ + 1) / [ (μ + v + 1)(μ + v) ] − [ μ / (μ + v) ]²

μ₂ = μ(μ + 1) / [ (μ + v + 1)(μ + v) ] − μ² / (μ + v)²

μ₂ = [ μ(μ + 1)(μ + v) − μ²(μ + v + 1) ] / [ (μ + v + 1)(μ + v)² ]

μ₂ = μ [ μ² + μv + μ + v − μ² − μv − μ ] / [ (μ + v + 1)(μ + v)² ]

μ₂ = μv / [ (μ + v)² (μ + v + 1) ]   (variance)
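A short check of the mean and variance formulas just derived against scipy's Beta distribution (the parameter values μ = 2, v = 5 are assumed for illustration):

```python
from scipy.stats import beta

mu, v = 2.0, 5.0                       # assumed shape parameters
dist = beta(mu, v)

mean_formula = mu / (mu + v)
var_formula = mu * v / ((mu + v) ** 2 * (mu + v + 1))

print("mean    :", dist.mean(), "vs formula", round(mean_formula, 6))
print("variance:", round(dist.var(), 6), "vs formula", round(var_formula, 6))
```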

Beta distribution of the 2nd kind :-

The continuous random variable X which is distributed according to the probability law

f(x) = (1/B(μ, v)) · x^{μ−1} / (1 + x)^{μ+v},  (μ, v) > 0, 0 < x < ∞;
f(x) = 0, otherwise,

is known as the Beta distribution of the 2nd kind.

The constants of the Beta distribution are given by the relation

μ′_r = ∫₀^∞ x^r f(x) dx

μ′_r = ∫₀^∞ x^r · (1/B(μ, v)) x^{μ−1} / (1 + x)^{μ+v} dx

μ′_r = (1/B(μ, v)) ∫₀^∞ x^{μ+r−1} / (1 + x)^{(μ+r)+(v−r)} dx

{ ∵ B(m, n) = ∫₀^∞ x^{m−1} / (1 + x)^{m+n} dx }

μ′_r = (1/B(μ, v)) · B(μ + r, v − r)   (for r < v)

μ′_r = [ Γ(μ + v) / ( Γ(μ) Γ(v) ) ] · [ Γ(μ + r) Γ(v − r) / Γ(μ + v) ]

μ′_r = Γ(μ + r) Γ(v − r) / [ Γ(μ) Γ(v) ]   … (3)

In particular, when r = 1, from eqn (3):

Mean,

μ′₁ = Γ(μ + 1) Γ(v − 1) / [ Γ(μ) Γ(v) ]

μ′₁ = μ Γ(μ) Γ(v − 1) / [ Γ(μ) (v − 1) Γ(v − 1) ]

μ′₁ = μ / (v − 1)   (mean, for v > 1)

μ′₂ = Γ(μ + 2) Γ(v − 2) / [ Γ(μ) Γ(v) ]

μ′₂ = μ(μ + 1) Γ(μ) Γ(v − 2) / [ Γ(μ) (v − 1)(v − 2) Γ(v − 2) ]

μ′₂ = μ(μ + 1) / [ (v − 1)(v − 2) ]   (for v > 2)

and variance

μ₂ = μ(μ + 1) / [ (v − 1)(v − 2) ] − μ² / (v − 1)²

μ₂ = [ μ / (v − 1) ] [ (μ + 1)/(v − 2) − μ/(v − 1) ]

μ₂ = [ μ / (v − 1) ] [ ( (μ + 1)(v − 1) − μ(v − 2) ) / ( (v − 1)(v − 2) ) ]

μ₂ = [ μ / (v − 1) ] [ ( μv − μ + v − 1 − μv + 2μ ) / ( (v − 1)(v − 2) ) ]

μ₂ = μ(μ + v − 1) / [ (v − 1)² (v − 2) ]   (variance)
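Similarly, a short check of these results against scipy's Beta distribution of the second kind (betaprime), again with assumed parameter values (v must exceed 2 for the variance to exist):

```python
from scipy.stats import betaprime

mu, v = 3.0, 6.0                       # assumed shape parameters, v > 2
dist = betaprime(mu, v)

mean_formula = mu / (v - 1)
var_formula = mu * (mu + v - 1) / ((v - 1) ** 2 * (v - 2))

print("mean    :", dist.mean(), "vs formula", round(mean_formula, 6))
print("variance:", round(dist.var(), 6), "vs formula", round(var_formula, 6))
```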

Exam Plan
Unit-I
1) Estimation Theory
a) Point estimation
b) Interval estimation
2) Parameter Space
3) Consistency and sufficient condition for consistency
4) Unbiasedness
5) Efficiency and most efficient estimator
6) Sufficiency and Properties of sufficient estimators.
7) Properties of Maximum Likelihood Estimators(M.L.E)
8) Method of Minimum Variance(M.M.V)
9) Cramer-Rao Inequality and conditions for the equality sign in the CR Inequality
Unit-II
11) Hypothesis and Example of it.
12) Simple Hypothesis and Example of it.
13) Composite Hypothesis
14) Test of Hypothesis
15) Null Hypothesis
16) Alternative Hypothesis
17) Two-types of errors in sampling
a) Type I errors
b) Type II errors
18) Probability forms
19) Critical Region
20) Critical Value
21) Best Critical Region
22) One Sided Test, Left Sided Test & Right Sided Test

Exam Question
1) Explain the method of maximum likelihood. State the properties of the maximum likelihood
estimator.
2) State and prove Cramer-Rao Inequality. Let x 1 , x 2 … … x n be a random sample from N ( μ , σ 2 )
where σ 2 is known. Obtain M.V.U.E. for μ.
3) Explain with suitable illustrative examples, the problem of estimation, and the criteria of
unbiasedness, consistency, efficiency and sufficiency for an estimator.
4) Find the maximum likelihood estimator of θ for a random sample x₁, x₂, …, xₙ from a
distribution:
a) with probability density function f(x) = (1/θ) e^{−x/θ}; 0 < x
b) with probability mass function p(x) = e^{−θ} θ^x / x!; x = 0, 1, 2, …
5) Explain the problem of estimation in statistical theory, with a suitable example.
If x₁, x₂, x₃, …, xₙ are independent random observations from a normal population with mean
μ and variance σ², where σ² is finite, and

x̄ = (1/n) Σ_{i=1}^{n} xᵢ,  s² = (1/n) Σ_{i=1}^{n} (xᵢ − x̄)²,

state, giving reasons, which of the following statements are true:


a) x̄ is an unbiased estimator of μ
b) s² is an unbiased estimator of σ².
c) x̄ + 1/n is a consistent estimator of μ.
d) x̄ is a sufficient estimator of μ.
6) What are maximum likelihood estimators ? State their important properties. State Cramer Rao
inequality.
7) Define sufficiency of an estimator. Give one example of a sufficient estimator. Show also that if a
maximum likelihood estimator exists, then it is a function of the sufficient estimator.
8) For a Poisson population, estimate the parameter using the maximum likelihood method of estimation
and show that this estimator possesses the minimum variance given by the Cramer-Rao Inequality.
9) What are estimators? When is an estimator said to be
a) Unbiased
b) Consistent
c) Minimum variance bound estimator.
10) State the Cramer-Rao inequality and state the condition under which the equality sign is attained.
11) For a random sample of
