(Teaching Notes)
Introduction:
Statistics is the science of inference.
Data, summarized or otherwise, are used in the inference along with
tools of probability theory and inductive or deductive reasoning.
Definition
Statistical inference comprises those methods concerned with the
analysis of a subset of data leading to predictions or inferences about the
entire set of data.
Also, statistical inference means making a probability judgment
concerning a population on the basis of one or more samples.
There are two subdivisions within statistics: Descriptive statistics and
inferential statistics.
Descriptive statistics simply summarize the given data, bringing
out their important features; no attempt is made to infer
anything that pertains to more than the data themselves.
E.g.
In the financial year 2008/2009, seventy-one (71) out of 133
district councils obtained a clean financial audit, which is 53.4%.
Inferential statistics uses a number of quantitative techniques that
enable us to make appropriate generalizations from limited
observations.
e.g.
Suppose the department of mathematics and statistics wants to
establish a masters programme in the academic year 2011/2012. To
meet the market demand for statisticians, the department will conduct a
survey in various institutions to explore which skills are lacking, so that the
masters curriculum can focus on them.
Note: Statistical inference relies on the theory of probability, regardless of
whether we are dealing with point or interval estimation, tests of hypotheses
or correlation.
By Josephat Peter  UDOM
A numerical measure of a population is called a population
parameter, or simply parameter.
A numerical measure of the sample is called a sample statistic, or
simply a statistic.
Population parameters are estimated by sample statistics. When a
sample statistic is used to estimate a population parameter, the
statistic is called an estimator of the parameter.
Statistical inference is the subject of this course.
Two important problems in statistical inference are estimation and tests of
hypotheses.
Topic 1: ESTIMATION
Assume that some characteristic of the elements in a population can be
represented by a random variable X whose density is f(x; θ),
where the form of the density is assumed known except that it contains an
unknown parameter θ (if θ were known, the density function would be
completely specified, and there would be no need to make inferences about
it).
Furthermore, assume that the values x₁, x₂, x₃, ..., xₙ of a random sample
X₁, X₂, ..., Xₙ from f(x; θ) can be observed. On the basis of the observed
sample values x₁, x₂, x₃, ..., xₙ it is desired to estimate the value of the
unknown parameter θ or the value of some function, say τ(θ), of the
unknown parameter. This estimation can be made in two ways (types of
estimates): point and interval estimation.
1.1 POINT ESTIMATION
Definition
 Is to let the value of some statistic represent or estimate the unknown
parameter.
 Is a single number which is used to estimate an unknown population
parameter.
e.g. The sample mean X̄ is the sample statistic used as an estimator
of the population mean μ: X̄ is an estimator of the mean μ, and
x̄ = (1/n) Σxᵢ is an estimate of μ.
Methods of Moments
In mechanics, moment denotes the rotating effect of a force. In
statistics, it is used to indicate peculiarities of a frequency distribution: we
can measure central tendency, dispersion or variability, skewness and the
peakedness of the curve.
The moments about the actual arithmetic mean are:
First moment: μ₁ = (1/N) Σ(X − X̄)
Second moment: μ₂ = (1/N) Σ(X − X̄)² = σ² (VARIANCE)
Third moment: μ₃ = (1/N) Σ(X − X̄)³ (SKEWNESS)
Fourth moment: μ₄ = (1/N) Σ(X − X̄)⁴ (KURTOSIS)
Let f(x; θ₁, ....., θ_k) be a density of a random variable X which has k
parameters θ₁, ....., θ_k. Let μ′_r denote the r-th moment about zero,
μ′_r = E(X^r); in general μ′_r is a function of the parameters,
μ′_r = μ′_r(θ₁, ....., θ_k). Let M_j denote the j-th sample moment.
Form the k equations
μ′_j(θ₁, ....., θ_k) = M_j,   j = 1, ....., k,
in k variables θ₁, ....., θ_k and let θ̂₁, ....., θ̂_k be their solution (we assume that there is a
unique solution). We say that the estimator (θ̂₁, ....., θ̂_k), where θ̂_j
estimates θ_j, is the estimator of (θ₁, ....., θ_k) obtained by the method of moments. The
estimators were obtained by replacing population moments by sample
moments.
For simplicity it can be defined that:
 Population moment. Let X follow a specific population distribution. The k-th moment of the population distribution with pdf f(x) is E(X^k).
 Sample moment. Let X₁, X₂, ...., Xₙ be a random sample from a pdf f(x).
The k-th sample moment is (1/n) Σᵢ Xᵢ^k.
First moment: M₁ = (1/n) Σᵢ Xᵢ (these are moments about zero)
Second moment: M₂ = (1/n) Σᵢ Xᵢ², etc.
Sample moments can be used to estimate population moments.
Example 1
Let X₁, X₂, ...., Xₙ be a random sample from a normal distribution with
mean μ and variance σ². Let (θ₁, θ₂) = (μ, σ). Estimate the parameters
μ and σ.
M₁ = (1/n) Σᵢ Xᵢ = X̄ = μ̂
(Treat μ as a first moment.)
M₂ = (1/n) Σᵢ Xᵢ² = σ̂² + μ̂²
(Treat σ² as coming from the second moment, then equate the second-moment formula with
the formula for calculating variance: E(X²) = σ² + μ².)
The method of moments estimator of σ is
σ̂ = [ (1/n) Σᵢ Xᵢ² − ((1/n) Σᵢ Xᵢ)² ]^(1/2)
  = [ (1/n) Σᵢ Xᵢ² − X̄² ]^(1/2)
  = [ (1/n) Σᵢ (Xᵢ − X̄)² ]^(1/2)
Note: the estimator of σ² is not S².
Example 2
Let X₁, ...., Xₙ be a random sample from N(μ, σ²). Estimate μ and σ².
Solution
μ̂ = (1/n) Σᵢ Xᵢ  and  μ̂² + σ̂² = (1/n) Σᵢ Xᵢ²
Solving, we get μ̂ = X̄ and σ̂² = (1/n) Σᵢ (Xᵢ − X̄)².
NOTE:
Method of moments estimators are not uniquely defined. So far we have
been using the first k raw moments, but central moments could also be used to
obtain equations whose solution would also produce estimators that would
be labelled method of moments estimators. Also, moments other than the first k
could be used to obtain estimators.
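The method-of-moments calculation for the normal example above can be sketched in Python. This is a minimal sketch: the simulated sample, its true mean 5 and standard deviation 2, and the function name `mom_normal` are invented for illustration.

```python
import math
import random

def mom_normal(xs):
    # Equate the first two sample moments M1, M2 to the population
    # moments mu and sigma^2 + mu^2, then solve for mu and sigma^2.
    n = len(xs)
    m1 = sum(xs) / n                  # first sample moment
    m2 = sum(x * x for x in xs) / n   # second sample moment
    mu_hat = m1
    var_hat = m2 - m1 * m1            # equals (1/n) * sum((x - xbar)^2)
    return mu_hat, var_hat

random.seed(1)
sample = [random.gauss(5.0, 2.0) for _ in range(10000)]
mu_hat, var_hat = mom_normal(sample)
print(mu_hat, math.sqrt(var_hat))     # close to 5 and 2
```

With 10,000 simulated observations the estimates land close to the true values, illustrating that the sample moments estimate the population moments.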
Exercise
1. Let X₁, ......, Xₙ be a random sample from a uniform distribution on
(θ − 3, θ + 3). Use the method of moments to estimate the
parameter θ.
2. Let X₁, ......, Xₙ be a random sample from a Poisson distribution with
parameter λ. Estimate λ.
Maximum Likelihood
This technique of finding estimators was first used and developed by Sir
R. A. Fisher in 1922, who called it the maximum likelihood method.
The maximum likelihood method provides estimators with desirable
properties such as efficiency, consistency and sufficiency. It usually does not
give an unbiased estimator.
Example 1:
Suppose we want to estimate the average grade μ of the LG university
examination. A random sample of size n = 128 is taken and the sample mean
x̄ is found to be 64 marks.
Clarification:  The assumption is that a sample of n = 128 represents the
population.
 From which population did x̄ = 64 most probably come? A
population with μ = 60, 64 or 75?
Note: The population mean μ that makes the observed sample most
probable is chosen as the estimate; here that is μ = 64.
Example 2:
Suppose balls are drawn, with replacement, from a box of black and white
balls. The number of black balls x drawn follows the binomial density
f(x; p) = C(n, x) p^x q^(n−x),  for x = 0, 1, 2, ..., n,
where q = 1 − p and p is the probability of drawing a black ball. Suppose it
is known that p = 1/4 or p = 3/4.
 Here, we draw a sample of three balls, n = 3, with
replacement and attempt to estimate the unknown parameter p
of the distribution.
 The choice is to be done between only two numbers, 1/4 and 3/4.
The possible outcomes and their probabilities are as follows:
Outcome: x      0      1      2      3
f(x; 3/4)     1/64   9/64  27/64  27/64
f(x; 1/4)    27/64  27/64   9/64   1/64
For example, if we found x = 0 in a sample of 3, the estimate 1/4 for p would
be preferred over 3/4 because the probability 27/64 is greater than 1/64.
Generally we should estimate p by 1/4 when x = 0 or 1 and by 3/4 when
x = 2 or 3. The estimator may be defined as
p̂ = p̂(x) = 0.25 for x = 0, 1;  0.75 for x = 2, 3.
The estimator thus selects for every possible x the value of p̂ such that
f(x; p̂) > f(x; p′), where p′ is an alternative value of p.
Furthermore:
If several alternative values of p were possible, we might reasonably proceed
in the same manner. Thus if we found x = 6 in a sample of 25 from a
binomial population, we should substitute all possible values of p in the
expression
f(6; p) = C(25, 6) p^6 (1 − p)^19,  for 0 ≤ p ≤ 1,
and choose as our estimate that value of p which maximizes f(6; p).
The maximum value can be found by equating the first derivative to zero,
i.e.
(d/dp) f(6; p) = C(25, 6) [6 p^5 (1 − p)^19 − 19 p^6 (1 − p)^18] = 0.
We find that p = 0, 1, 6/25 are the roots.
The root which gives the maximum value is p̂ = 6/25.
The estimate has the property f(6; p̂) ≥ f(6; p′), where p′ is an alternative
value of p in the interval 0 ≤ p ≤ 1.
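The argument above can be checked numerically: evaluating f(6; p) over a grid of p values and picking the maximizer recovers p̂ = 6/25. A minimal sketch; the grid resolution is an arbitrary choice.

```python
from math import comb

def binom_pmf(x, n, p):
    # f(x; p) = C(n, x) p^x (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Evaluate f(6; p) for p on a fine grid in [0, 1] and pick the maximiser.
grid = [i / 10000 for i in range(10001)]
p_hat = max(grid, key=lambda p: binom_pmf(6, 25, p))
print(p_hat)   # 0.24, i.e. 6/25
```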
Definition of Likelihood Function (likelihood = chance = probability)
The likelihood function of n random variables X₁, X₂, ....., Xₙ is defined to
be the joint density of the n random variables, say f(x₁, ....., xₙ; θ),
which is considered to be a function of θ. In particular, if X₁, ....., Xₙ is a
random sample from the density f(x; θ), then the likelihood function is
f(x₁; θ) f(x₂; θ) ..... f(xₙ; θ).
The notation for the likelihood function is L(θ; x₁, ....., xₙ).
Definition of Maximum likelihood estimator:
Let L(θ) = L(θ; x₁, ....., xₙ) be the likelihood function for the random
variables X₁, X₂, ....., Xₙ. If θ̂ [where θ̂ = ϑ(x₁, x₂, ....., xₙ) is a function of
the observations x₁, ....., xₙ] is the value of θ which maximizes L(θ),
then Θ̂ = ϑ(X₁, X₂, ....., Xₙ) is the maximum-likelihood estimator of θ, and
θ̂ = ϑ(x₁, ....., xₙ) is the maximum-likelihood estimate of θ. When the
likelihood is differentiable, the maximum can be located by solving
dL/dθ = 0.
Also, L(θ) and ln L(θ) have their maxima at the same value of θ, and it is
sometimes easier to find the maximum of the natural logarithm of the
likelihood.
If the likelihood function contains k parameters, i.e.
L(θ₁, θ₂, ....., θ_k) = ∏ᵢ₌₁ⁿ f(xᵢ; θ₁, θ₂, ....., θ_k),
then the maximum-likelihood estimators of the parameters θ₁, θ₂, ....., θ_k are
the random variables Θ̂₁ = ϑ₁(X₁, ....., Xₙ), Θ̂₂ = ϑ₂(X₁, ....., Xₙ), .....,
Θ̂_k = ϑ_k(X₁, ....., Xₙ), where θ̂₁, θ̂₂, ....., θ̂_k are the values which
maximize L(θ₁, θ₂, ....., θ_k).
If certain regularity conditions are satisfied, the point where the likelihood is
a maximum is a solution of the k equations
∂L(θ₁, ....., θ_k)/∂θ₁ = 0
∂L(θ₁, ....., θ_k)/∂θ₂ = 0
.
.
.
∂L(θ₁, ....., θ_k)/∂θ_k = 0
In this case it may also be easier to work with the natural logarithm of the
likelihood.
Example
Suppose that a random sample of size n is drawn from the Bernoulli
distribution
f(x; p) = p^x q^(1−x),  0 ≤ p ≤ 1 and q = 1 − p.
The sample values x₁, x₂, ....., xₙ will be a sequence of 0s and 1s.
The likelihood function is
L(p) = ∏ᵢ₌₁ⁿ p^(xᵢ) q^(1−xᵢ) = p^(Σxᵢ) q^(n−Σxᵢ)
By applying ln we get:
ln L(p) = Σxᵢ ln p + (n − Σxᵢ) ln q
First derivative:
d ln L(p)/dp = Σxᵢ/p − (n − Σxᵢ)/q
By substituting q = 1 − p and equating to zero,
we find the estimate as:
p̂ = Σxᵢ/n
Consider n = 3; the likelihood function can be represented by the
following four curves:
L₀ = L(p; Σxᵢ = 0) = (1 − p)³
L₁ = L(p; Σxᵢ = 1) = p(1 − p)²
L₂ = L(p; Σxᵢ = 2) = p²(1 − p)
L₃ = L(p; Σxᵢ = 3) = p³
[Figure: the four curves L₀, L₁, L₂, L₃ of L(p) plotted for p from 0 to 1.]
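The closed-form result p̂ = Σxᵢ/n can be verified against a grid search over the likelihood. A minimal sketch; the sample of 0s and 1s is invented for illustration.

```python
def likelihood(p, xs):
    # L(p) = p^(sum x_i) * (1 - p)^(n - sum x_i)
    s = sum(xs)
    return p**s * (1 - p)**(len(xs) - s)

data = [1, 0, 1, 1, 0, 1, 0, 1]      # hypothetical Bernoulli sample
p_hat = sum(data) / len(data)        # MLE: the sample mean, 5/8 = 0.625

grid = [i / 1000 for i in range(1001)]
p_grid = max(grid, key=lambda p: likelihood(p, data))
print(p_hat, p_grid)   # both 0.625
```

The grid maximizer agrees with the sample mean, as the derivation predicts.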
Example
A random sample of size n from the normal distribution has the likelihood
L(μ, σ²) = ∏ᵢ₌₁ⁿ (1/√(2πσ²)) exp[−(xᵢ − μ)²/(2σ²)]
        = (1/(2πσ²))^(n/2) exp[−(1/(2σ²)) Σ(xᵢ − μ)²]
Taking ln:
ln L(μ, σ²) = −(n/2) ln(2π) − (n/2) ln σ² − (1/(2σ²)) Σ(xᵢ − μ)²,
where σ > 0 and −∞ < μ < ∞.
To locate the maximum, compute the first derivatives
∂ ln L/∂μ = (1/σ²) Σ(xᵢ − μ)
and
∂ ln L/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ(xᵢ − μ)²
Equating the equations to 0, we get
μ̂ = (1/n) Σᵢ xᵢ = x̄
σ̂² = (1/n) Σᵢ (xᵢ − x̄)²
Note that dL/dθ = 0 locates both minima and maxima, and hence one
must avoid using a root of the equation which actually
locates a minimum.
[Figure: a likelihood curve L(θ) with a local minimum; the root chosen must
be the one giving L(θ̂) as the maximum.]
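The normal-distribution result can be checked numerically: the closed-form estimates should give a log-likelihood at least as large as nearby parameter values. A minimal sketch; the simulated data and the true values 10 and 3 are invented for illustration.

```python
import math
import random

random.seed(7)
xs = [random.gauss(10.0, 3.0) for _ in range(5000)]
n = len(xs)

# Closed-form maximum-likelihood estimates derived above.
mu_hat = sum(xs) / n
var_hat = sum((x - mu_hat) ** 2 for x in xs) / n    # divisor n, not n - 1

def log_lik(mu, var):
    # ln L = -(n/2) ln(2 pi) - (n/2) ln(var) - (1/(2 var)) sum (x - mu)^2
    return (-n / 2 * math.log(2 * math.pi) - n / 2 * math.log(var)
            - sum((x - mu) ** 2 for x in xs) / (2 * var))

best = log_lik(mu_hat, var_hat)
print(mu_hat, math.sqrt(var_hat))    # close to 10 and 3
```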
 The maximum likelihood estimator has desirable optimum
properties beyond its natural (intuitive) appeal.
 Maximum likelihood estimators possess a property which is
sometimes called the invariance property of maximum
likelihood estimators.
Theorem: Invariance property of maximum likelihood estimators
Let Θ̂ = ϑ(X₁, X₂, ....., Xₙ) be the maximum likelihood estimator of θ in the
density f(x; θ), where θ is assumed unidimensional. If τ(θ) is a function
with a single-valued inverse, then the maximum likelihood estimator of τ(θ)
is τ(Θ̂).
For example, in the normal density with μ = 0, the maximum likelihood
estimator of σ² is (1/n) Σᵢ Xᵢ², so the maximum likelihood estimator of
ln σ² is ln[(1/n) Σᵢ Xᵢ²].
Extension of invariance property of maximum likelihood estimators
Extension is done in two ways:
1. First, suppose we wish to estimate τ(μ, σ²) = μ + σ. The maximum
likelihood estimate of μ is x̄, but τ is not a one-to-one function of μ and
σ². Since we know the estimates of μ and σ², the estimate will be
τ̂ = x̄ + [(1/n) Σᵢ (xᵢ − x̄)²]^(1/2)
Theorem
Let Θ̂ = (Θ̂₁, ....., Θ̂_k), where Θ̂_j = ϑ_j(X₁, ....., Xₙ), be a maximum likelihood
estimator of θ = (θ₁, ....., θ_k) in the density f(x; θ₁, ....., θ_k). If
τ(θ) = (τ₁(θ), ....., τ_r(θ)) for 1 ≤ r ≤ k is a transformation of the parameter space,
then a maximum likelihood estimator of τ(θ) = [τ₁(θ), ....., τ_r(θ)] is
τ(Θ̂) = [τ₁(Θ̂), ....., τ_r(Θ̂)]. Note that τ̂_j = τ_j(θ̂₁, ....., θ̂_k); so the maximum
likelihood estimator of τ_j(θ₁, ....., θ_k) is τ_j(θ̂₁, ....., θ̂_k), j = 1, ....., r.
Exercise 2
1. Uniform distribution
Least Squares
Regression refers to the statistical technique of modeling the relationship
between variables.
Consider the following simple linear regression
[Figure: scatter plot of data points with a fitted regression line.]
 The points on the graph are randomly chosen observations
of the two variables, X and Y.
 The straight line describes the general movement in the data.
We would like our model to explain as much as possible about the process
underlying our data. However, due to the uncertainty inherent in all real-world
situations, our model will probably not explain everything, and we
will always have some remaining errors. The errors are due to unknown
outside factors that affect the process generating our data.
A good statistical model uses as few mathematical terms as possible to
describe the real situation. The model captures the systematic behavior of
the data, leaving out the factors that are nonsystematic and cannot be
foreseen or predicted: the errors.
Data = Systematic component + Random errors
(The model extracts everything systematic in the data, leaving purely
random errors.)
The errors are denoted by ε. The simple linear regression model is
Y = β₀ + β₁X + ε    (1)
where β₀ is the population intercept and β₁ is the population slope.
Equation 1 above is composed of two components: a nonrandom component,
which is the line itself (β₀ + β₁X), and a purely random component, the
error term ε.
The nonrandom part of the model, the straight line, is the equation for the
mean of Y, given X, i.e. E(Y|X). If the model is correct, the average value of
Y for a given value of X falls right on the regression line.
The conditional mean of Y:
E(Y|X) = β₀ + β₁X    (2)
Sometimes E(Y) or μ(x) is used instead of E(Y|X) to denote the conditional
mean of Y, for a given value of X.
As X increases, the average population value of Y also increases, assuming a
positive slope of the line, and vice versa.
The actual population value of Y is equal to the average Y conditional on X,
plus a random error ε:
Y = E(Y|X) + ε = β₀ + β₁X + ε
[Figure: population regression line with slope β₁; the points are the
population values of X and Y.]
Model assumptions:
1. The relationship between X and Y is a straight-line relationship.
2. The values of the independent variable X are assumed fixed (not
random); the only randomness in the values of Y comes from the error
term, ε.
3. The errors ε ~ N(0, σ²), which can be written as E(εᵢ) = 0 and
var(εᵢ) = σ².
Estimation
So far, we have described the population model, that is, the assumed true
relationship between the two variables X and Y. Our interest is focused on
this unknown population relationship, and we want to estimate it using
sample information.
We want to find good estimates of the regression parameters, β₀ and β₁. A
method that gives us good estimates of the regression coefficients is the
method of least squares (as compared to other methods such as minimizing
the sum of the absolute errors).
The estimated regression equation:
Y = b₀ + b₁X + e
In terms of data, it can be written as follows, with the subscript i to signify
each particular data point:
yᵢ = b₀ + b₁xᵢ + eᵢ
where i = 1, 2, 3, ....., n
Generally:
Yᵢ = β₀ + β₁Xᵢ + εᵢ  (population model)
Ŷᵢ = b₀ + b₁xᵢ      (fitted value)
eᵢ = Yᵢ − Ŷᵢ        (residual)
Sum of squares for error:
SSE = Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)²
The estimates b₀ and b₁ are the values of β₀ and β₁ that minimize SSE.
Under the normality assumption the likelihood is
L(β₀, β₁, σ²) = ∏ᵢ₌₁ⁿ (1/√(2πσ²)) exp[−(yᵢ − β₀ − β₁xᵢ)²/(2σ²)]
and
ln L(β₀, β₁, σ²) = −(n/2) ln(2π) − (n/2) ln σ² − (1/(2σ²)) Σᵢ₌₁ⁿ (yᵢ − β₀ − β₁xᵢ)²
The partial derivatives of L(β₀, β₁, σ²) with respect to β₀, β₁, σ² are
obtained and set equal to zero. We have three equations:
Σᵢ₌₁ⁿ (yᵢ − β₀ − β₁xᵢ) = 0
Σᵢ₌₁ⁿ (yᵢ − β₀ − β₁xᵢ) xᵢ = 0
Σᵢ₌₁ⁿ (yᵢ − β₀ − β₁xᵢ)² = nσ²
The first two equations are called the normal equations. Solving the above
equations we get
b₁ = Σ(yᵢ − ȳ)(xᵢ − x̄) / Σ(xᵢ − x̄)²
b₀ = ȳ − b₁x̄
σ̂² = (1/n) Σᵢ₌₁ⁿ (yᵢ − b₀ − b₁xᵢ)²
These solutions require Σ(xᵢ − x̄)² ≠ 0;
that is, there must be at least two distinct values for the xᵢ.
(Properties of point estimation such as minimum variance should be
shown later).
b. Suppose now the only assumption is that the Yᵢ are pairwise uncorrelated;
that is, cov[Yᵢ, Yⱼ] = 0 for all i ≠ j, i, j = 1, 2, ....., n. Point estimation of
β₀, β₁, σ² and of μ(x) = β₀ + β₁x for any x will be discussed.
For this case, Y₁, Y₂, ......, Yₙ are pairwise uncorrelated random
variables with means β₀ + β₁x₁, β₀ + β₁x₂, ......, β₀ + β₁xₙ and variances
σ². Since the joint density of the Yᵢ is not specified, maximum
likelihood estimators of β₀, β₁ and σ² cannot be obtained. In models
where the joint density of the observable random variables is not given,
a method of estimation called least squares can be utilized.
i.e. The values of β₀, β₁ that minimize the sum of squares
Σᵢ₌₁ⁿ (Yᵢ − β₀ − β₁xᵢ)²
are defined to be the least-squares estimators of β₀ and β₁.
From the normal equations shown above, we get:
β̂₁ = Σ(Yᵢ − Ȳ)(xᵢ − x̄) / Σ(xᵢ − x̄)²
and
β̂₀ = Ȳ − β̂₁x̄
The least-squares method gives no estimator for σ², but an estimator of
σ² based on the least-squares estimators of β₀ and β₁ is
σ̂² = (1/(n − 2)) Σᵢ₌₁ⁿ (Yᵢ − β̂₀ − β̂₁xᵢ)²
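The least-squares formulas can be applied directly. A minimal sketch: the data points, chosen to lie near y = 2 + 3x, are invented for illustration.

```python
def least_squares(xs, ys):
    # Slope and intercept from the normal equations:
    # b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),  b0 = ybar - b1*xbar
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    b0 = ybar - b1 * xbar
    return b0, b1

xs = [0, 1, 2, 3, 4, 5]
ys = [2.1, 4.9, 8.2, 10.8, 14.1, 17.0]
b0, b1 = least_squares(xs, ys)
print(b0, b1)   # roughly 2 and 3
```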
1.2 INTERVAL ESTIMATE
Point estimates are useful, yet they leave something to be desired. When the
point estimator under consideration has a probability density function, the
probability that the estimator actually equals the value of the parameter
being estimated is zero (the probability that a continuous random variable
equals any one value is 0).
Hence it seems desirable that a point estimate should be accompanied by some
measure of the possible error of the estimate. i.e. Instead of making the
inference of estimating the true value of the parameter to be a point, we
might make the inference of estimating that the true value of the parameter is
contained in some interval (this is interval estimation).
Interval estimate is an estimate constituting an interval of numbers rather
than a single number. An interval estimate is an interval believed likely to
contain the unknown population parameter. It conveys more information
than just the point estimate on which it is based.
Like point estimation, the problem of interval estimation is twofold.
There is the problem of finding interval estimators (we need
methods of finding a confidence interval).
There is the problem of determining good or optimal interval
estimators (we need criteria for comparing competing
confidence interval or for assessing the goodness of a
confidence interval).
An interval estimate of a population parameter θ is an interval of the form
θ̂₁ < θ < θ̂₂, where θ̂₁ and θ̂₂ depend on the value of the statistic Θ̂ for a
particular sample and also on the sampling distribution of Θ̂.
e.g. A random sample of Matriculation examination scores for students
entering B.A. Statistics at the University of Dar es Salaam in the year 2002
produces an interval 50 < μ < 70 within which we expect to find the true
average of all scores. The values of the end points 50 and 70 will depend on the
computed sample mean x̄ and the sampling distribution of X̄.
As the sample size increases, we know that σ²_X̄ = σ²/n decreases, and
consequently our estimate is likely to be closer to the parameter μ,
resulting in a shorter interval. Thus the interval estimate indicates, by its
length, the accuracy of the point estimate.
Since different samples will generally yield different values of Θ̂ and,
therefore, different values of θ̂₁ and θ̂₂, these end points are values of
random variables Θ̂₁ and Θ̂₂.
The interval θ̂₁ < θ < θ̂₂, computed from the selected sample, is then called
a 100(1 − α)% confidence interval; the fraction 1 − α is called the confidence
coefficient or the degree of confidence, and the end points θ̂₁ and θ̂₂ are
called the lower and upper confidence limits.
Note: 95% is the most commonly used confidence level.
e.g. It is better to be 95% confident that the average life of an LG
refrigerator is between 7 and 8 years than to be 99% confident that it is
between 4 and 11. We prefer a short interval with a high degree of
confidence.
Sometimes restrictions on the size of our sample prevent us from
achieving short intervals.
In practice, estimates are often given in the form of the estimate plus or
minus a certain amount. e.g. The National Bureau of Statistics, department
of labor statistics, may estimate the number of unemployed in a certain area
to be 5.7 ± 0.2 million at a given time, feeling rather sure that the actual
number is between 5.5 and 5.9 million.
Suppose that a random sample (1.2, 3.4, 0.6, 5.6) of four observations is
drawn from a normal population with an unknown mean μ and a known
standard deviation 3. The maximum likelihood estimate of μ is the mean of
the sample observations: x̄ = 2.7.
We wish to determine upper and lower limits which are rather certain to
contain the true unknown parameter value between them.
For a sample of size 4 from a normal distribution, Z = (X̄ − μ)/(3/2) will be
normally distributed with mean 0 and unit variance.
Hence f(z) = (1/√(2π)) e^(−z²/2). We can compute the probability that Z will be
between any two arbitrarily chosen numbers. Consider 95%:
P[−1.96 < Z < 1.96] = ∫ from −1.96 to 1.96 of f(z) dz = 0.95
Substituting Z, we get
P[−1.96 < (X̄ − μ)/(3/2) < 1.96] = 0.95
P[X̄ − 1.96(3/2) < μ < X̄ + 1.96(3/2)] = 0.95
P[2.7 − 2.94 < μ < 2.7 + 2.94] = 0.95
(−0.24, 5.64)
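The interval just derived can be reproduced in a few lines, using only the data and constants from the example above.

```python
import math

sample = [1.2, 3.4, 0.6, 5.6]   # the four observations
sigma = 3.0                     # known population standard deviation
z = 1.96                        # z value for 95% confidence

n = len(sample)
xbar = sum(sample) / n          # 2.7
half_width = z * sigma / math.sqrt(n)     # 1.96 * 3/2 = 2.94
lower, upper = xbar - half_width, xbar + half_width
print(round(lower, 2), round(upper, 2))   # -0.24 5.64
```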
The method for finding a confidence interval that has been used in the example
above is a general method. This technique is applicable in many important
problems, but in others it is not, because in those others it is either impossible
to find functions of the desired form or impossible to rewrite the derived
probability statements.
1.2.1 Confidence Interval of MEAN
There are really two cases to consider, depending on whether or not σ² is
known.
Confidence Interval of Mean when the population
Standard Deviation is known
The central limit theorem tells us that when we select a large random sample
from any population with mean μ and standard deviation σ, the sample
mean X̄ is (at least approximately) normally distributed with mean μ and
standard deviation σ/√n. If the population itself is normal, X̄ is normally
distributed for any sample size.
Transforming Z to the random variable X̄ with mean μ and standard
deviation σ/√n, we find that before the sampling there is a 1 − α
probability that X̄ will fall within the interval:
μ ± z_(α/2) σ/√n
Once we have obtained our random sample, we have a particular value, x̄.
This particular x̄ either lies within the range of values specified by the
formula above or does not lie within this range.
Since the random sampling has already taken place and a particular x̄ has
been computed, we no longer have a random variable and may no longer
talk about probabilities. We may say that we are 95% confident that x̄ lies
within the interval (about 95% of the values of X̄ obtained in a large
number of repeated samplings will fall within the interval).
Note: We cannot say that there is a 0.95 probability that μ is inside the
interval, because the interval x̄ ± 1.96 σ/√n is not random, and neither is μ.
A 100(1 − α)% confidence interval for the population mean μ when σ² is
known is
x̄ ± z_(α/2) σ/√n
The z value for a 90% CI is 1.645.
For a 99% CI it is 2.58 (or, using interpolation, 2.576).
For a 95% CI it is 1.96.
Note: When sampling from the same population, using a fixed sample
size, the higher the confidence level, the wider the interval.
e.g. An 80% CI for μ with n = 25, x̄ = 122 and σ = 20 is
(116.88, 127.12),
but the 95% CI is (114.16, 129.84);
the 80% interval is narrow compared to the 95% interval.
That means a wider interval has more of a presampling chance of capturing
the unknown population parameter. If we want a 100% CI for a parameter, the
interval must be (−∞, ∞), since the probability of capturing the parameter is
then 1. Such probability is obtained by allowing Z to be anywhere from
−∞ to ∞.
If we want both a narrow interval and a high degree of confidence, we need
a large amount of information, because the larger the sample size, the
narrower the interval.
When sampling from the same population, using a fixed confidence level, the
larger the sample size, n, the narrower the confidence interval.
Confidence interval for the mean when the standard
deviation is unknown
In constructing confidence intervals for μ, both μ and σ are population
parameters and have to be estimated. When σ is unknown it is replaced by
the sample standard deviation s, and the t distribution with n − 1 degrees of
freedom is used. When the df is large we may use the standard normal
distribution, e.g. a sample size of 200 (df is 199).
Estimation problems can be divided into two kinds;
Small sample problems ( sample is less than 30)
Large sample problems (sample is 30 or more)
Example
A stock market analyst wants to estimate the average return on a certain
stock. A random sample of 15 days yields an average (annualized) return of
x̄ = 10.37% and a standard deviation of s = 3.5%. Assuming a normal
population of returns, give a 95% confidence interval for the average return
on this stock.
Solution
x̄ ± t_(α/2) s/√n = 10.37 ± 2.145 (3.5/√15) = [8.43, 12.31]
Thus the analyst may be 95% sure that the average annualized return on the
stock is anywhere from 8.43% to 12.31%.
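The analyst's interval can be reproduced as follows; the t value 2.145 for 14 degrees of freedom is taken from tables, as in the solution above.

```python
import math

xbar, s, n = 10.37, 3.5, 15
t = 2.145                       # t value, 14 df, area 0.025 to the right

half_width = t * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(round(lower, 2), round(upper, 2))   # 8.43 12.31
```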
Theorem: Error in estimating μ
If x̄ is used as an estimate of μ, we can then be 100(1 − α)% confident that the
error will not exceed z_(α/2) σ/√n.
Frequently, we wish to know how large a sample is necessary to ensure that
the error in estimating μ is small. If x̄ is used as an estimate of μ, we can be
100(1 − α)% confident that the error will not exceed a specified amount e
when the sample size is
n = (z_(α/2) σ / e)²
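The sample-size formula rounds up to the next whole observation. A minimal sketch; the values sigma = 10 and e = 2 are invented for illustration.

```python
import math

def sample_size_for_mean(z, sigma, e):
    # Smallest n with z * sigma / sqrt(n) <= e
    return math.ceil((z * sigma / e) ** 2)

# Hypothetical: sigma = 10, error at most 2, 95% confidence (z = 1.96).
print(sample_size_for_mean(1.96, 10, 2))   # 97
```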
1.2.2 Confidence Interval of the difference
between two Means in Paired and Independent
Samples
If we have two populations with means μ₁ and μ₂ and variances σ₁² and σ₂²,
respectively, a point estimator of the difference between μ₁ and μ₂ is
given by the statistic X̄₁ − X̄₂. To estimate μ₁ − μ₂, we shall select two
independent random samples, one from each population, of sizes n₁ and n₂,
and compute the difference x̄₁ − x̄₂ of the sample means.
If the independent sample means (samples greater than 30) are selected from
normal populations, we can establish a confidence interval for μ₁ − μ₂ by
considering the sampling distribution of X̄₁ − X̄₂.
We know the sampling distribution:
X̄₁ − X̄₂ ~ N(μ_(X̄₁−X̄₂), σ²_(X̄₁−X̄₂)),
where μ_(X̄₁−X̄₂) = μ₁ − μ₂ and σ²_(X̄₁−X̄₂) = σ₁²/n₁ + σ₂²/n₂
Then
Z = [(X̄₁ − X̄₂) − (μ₁ − μ₂)] / √(σ₁²/n₁ + σ₂²/n₂)
With probability:
P[−z_(α/2) < ((X̄₁ − X̄₂) − (μ₁ − μ₂)) / √(σ₁²/n₁ + σ₂²/n₂) < z_(α/2)] = 1 − α
Confidence Interval for μ₁ − μ₂; σ₁² and σ₂² Known
If x̄₁ and x̄₂ are the means of independent random samples of sizes n₁ and
n₂ from populations with known variances σ₁² and σ₂², respectively, a
100(1 − α)% confidence interval for μ₁ − μ₂ is given by
(x̄₁ − x̄₂) − z_(α/2) √(σ₁²/n₁ + σ₂²/n₂) < μ₁ − μ₂ < (x̄₁ − x̄₂) + z_(α/2) √(σ₁²/n₁ + σ₂²/n₂)
where z_(α/2) is the value leaving an area of α/2 to the right.
For small samples we use the t distribution when the populations are
approximately normally distributed.
Example
A standardized chemistry test was given to 50 girls and 75 boys. The girls made an
average grade of 76 with a standard deviation of 6, while the boys made an average grade
of 82 with a standard deviation of 8. Find a 96% confidence interval for the difference of
the means, where the first mean is the mean score of all boys and the second mean is the
mean score of all girls who might take this test. (Answer: 3.43 < μ₁ − μ₂ < 8.57)
Small Sample Confidence Interval for μ₁ − μ₂; σ₁² = σ₂² Unknown
If x̄₁ and x̄₂ are the means of small independent random samples of sizes n₁
and n₂, respectively, from approximately normal populations with unknown
but equal variances, a 100(1 − α)% confidence interval for μ₁ − μ₂ is given by
(x̄₁ − x̄₂) − t_(α/2) s_p √(1/n₁ + 1/n₂) < μ₁ − μ₂ < (x̄₁ − x̄₂) + t_(α/2) s_p √(1/n₁ + 1/n₂)
where s_p is the pooled estimate of the population standard deviation and t_(α/2) is the t value
with v = n₁ + n₂ − 2 degrees of freedom, leaving an area of α/2 to the right.
s_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)
Example
A course in statistics is taught to 12 students by the conventional classroom procedure. A
second group of 10 students was given the same course by means of programmed
materials. At the end of the semester the same examination was given to each group. The
12 students meeting in the classroom made an average grade of 85 with a standard
deviation of 4, while the 10 students using programmed materials made an average of 81
with a standard deviation of 5. Find a 90% confidence interval for the difference between
the population means, assuming the populations are approximately normally distributed
with equal variances. (Answer: 0.69 < μ₁ − μ₂ < 7.31)
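The pooled-variance interval from the example can be checked directly; t = 1.725 for 20 degrees of freedom is taken from tables.

```python
import math

x1, s1, n1 = 85, 4, 12   # classroom group
x2, s2, n2 = 81, 5, 10   # programmed-materials group
t = 1.725                # t value, 20 df, area 0.05 to the right

sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
half_width = t * sp * math.sqrt(1 / n1 + 1 / n2)
lower, upper = (x1 - x2) - half_width, (x1 - x2) + half_width
print(round(lower, 2), round(upper, 2))   # 0.69 7.31
```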
Small Sample Confidence Interval for μ₁ − μ₂; σ₁² ≠ σ₂² Unknown
If x̄₁ and s₁², and x̄₂ and s₂², are the means and variances of small
independent samples of sizes n₁ and n₂, respectively, from approximately
normal distributions with unknown and unequal variances, an approximate
100(1 − α)% confidence interval for μ₁ − μ₂ is given by
(x̄₁ − x̄₂) − t_(α/2) √(s₁²/n₁ + s₂²/n₂) < μ₁ − μ₂ < (x̄₁ − x̄₂) + t_(α/2) √(s₁²/n₁ + s₂²/n₂)
where t_(α/2) is the t value with
v = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]
degrees of freedom, leaving an area of α/2 to the right.
Example
Records for the past 15 years have shown the average rainfall in a certain region of the
country for the month of May to be 4.93 centimetres, with a standard deviation of 1.14
centimetres. A second region of the country has had an average rainfall in May of 2.64
centimetres with a standard deviation of 0.66 centimetres during the past 10 years. Find a
95% confidence interval for the difference of the true average rainfalls in these two
regions, assuming that the observations come from normal populations with different
variances. (Answer: 1.54 < μ₁ − μ₂ < 3.04)
Difference of two means when the samples are not
independent and the variances of the two populations
are not necessarily equal
This will be true if the observations in the two samples occur in pairs so that
the two observations are related.
e.g. If we run a test for second-year B.Sc. Statistics on a new ST 200 lecturer
using 22 students, the scores before and after form our two samples.
Observations in the two samples made on the same students are related and
hence form a pair. To determine the effectiveness of the new lecturer we have to
consider the differences of scores.
e.g. 2. Investigate maize output using different fertilizers but the same
area/soil/land.
Confidence Interval for μ_D = μ₁ − μ₂ for paired observations
If d̄ and s_d are the mean and standard deviation of the differences of n
random pairs of measurements, a 100(1 − α)% confidence interval for
μ_D = μ₁ − μ₂ is
d̄ − t_(α/2) s_d/√n < μ_D < d̄ + t_(α/2) s_d/√n
where t_(α/2) is the t value with v = n − 1 degrees of freedom, leaving an area of
α/2 to the right.
Example
Twenty college freshmen were divided into 10 pairs, each member of the pair having
approximately the same IQ. One of each pair was selected at random and assigned to a
statistics section using programmed materials only. The other member of each pair was
assigned to a section in which the professor lectured. At the end of the semester each
group was given the same examination and the following results were recorded.
Pair 1 2 3 4 5 6 7 8 9 10
Programmed Material 76 60 85 58 91 75 82 64 79 88
Lecturer 81 52 87 70 86 77 90 63 85 83
Find a 98% confidence interval for the true difference in the two learning
procedures. (Answer: −7.29 < μ_D < 4.09)
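The paired interval from the example can be computed from the table of scores; t = 2.821 for 9 degrees of freedom is taken from tables, and the differences are programmed minus lecture.

```python
import math

programmed = [76, 60, 85, 58, 91, 75, 82, 64, 79, 88]
lecture    = [81, 52, 87, 70, 86, 77, 90, 63, 85, 83]
d = [a - b for a, b in zip(programmed, lecture)]

n = len(d)
dbar = sum(d) / n
sd = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))
t = 2.821                # t value, 9 df, area 0.01 to the right

half_width = t * sd / math.sqrt(n)
lower, upper = dbar - half_width, dbar + half_width
print(round(lower, 2), round(upper, 2))   # -7.29 4.09
```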
1.2.3 Confidence Interval of PROPORTION
Sometimes our interest is in a qualitative rather than a quantitative variable.
Interest may be in the relative frequency of occurrence of some characteristic in a
population, e.g. the proportion of the population who are users of Colgate.
A point estimator of the proportion $p$ in a binomial experiment is given by the statistic $\hat{P} = X/n$.
Therefore for $n$ large the distribution of $\hat{P}$ is approximately normal with mean
$$\mu_{\hat{P}} = E(\hat{P}) = E\left(\frac{X}{n}\right) = \frac{np}{n} = p,$$
and variance
$$\sigma_{\hat{P}}^2 = \sigma_{X/n}^2 = \frac{\sigma_X^2}{n^2} = \frac{npq}{n^2} = \frac{pq}{n},$$
so that
$$Z = \frac{\hat{P} - p}{\sqrt{pq/n}}$$
is approximately standard normal. A $(1-\alpha)100\%$ large-sample confidence interval for $p$ is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},$$
where the sample proportion, $\hat{p}$, is equal to the number of successes in the sample, $x$, divided by the number of trials (the sample size), $n$, and $\hat{q} = 1 - \hat{p}$.
Example
A market research firm wants to estimate the share that foreign companies
have in the Tanzania market for certain products. A random sample of 100
consumers is obtained, and it is found that 34 people in the sample are users
of foreign-made products; the rest are users of domestic products. Give a 95%
confidence interval for the share of foreign products in this market.
Solution
We have x = 34 and n = 100, so
$$\hat{p} = \frac{x}{n} = \frac{34}{100} = 0.34.$$
The 95% confidence interval is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96\sqrt{\frac{(0.34)(0.66)}{100}} = [0.2472,\ 0.4328].$$
Thus, the firm may be 95% confident that foreign manufactures control
anywhere from 24.72% to 43.28% of the market.
Suppose the firm is not happy with such a wide confidence interval. What
can be done about it? Answer: either increase the sample size or, if the sample
cannot be increased, reduce the confidence level, say to 90%.
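The interval above can be reproduced with a short calculation. This is a minimal sketch in Python, taking the z value as the table value 1.96 used in the text:

```python
import math

def proportion_ci(x, n, z=1.96):
    """Large-sample confidence interval for a proportion p = x/n."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Market research example: 34 users of foreign-made products out of 100.
low, high = proportion_ci(34, 100)
print(round(low, 4), round(high, 4))  # 0.2472 0.4328
```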
Note: When estimating proportions using small samples, the binomial
distribution may be used in forming confidence intervals. Since the
distribution is discrete, it may not be possible to construct an interval with an
exact, prespecified confidence level such as 95% or 99%.
If $\hat{p}$ is used as an estimate of p, then we can be $(1-\alpha)100\%$ confident that the
error will not exceed a specified amount $e$ when the sample size is
$$n = \frac{z_{\alpha/2}^2\,\hat{p}\hat{q}}{e^2}.$$
e.g. How large a sample is required if we want to be 95% confident that our
estimate of $p$ is within 0.02? Let p = 0.32.
Solution
$$n = \frac{z_{\alpha/2}^2\,\hat{p}\hat{q}}{e^2} = \frac{(1.96)^2(0.32)(0.68)}{(0.02)^2} \approx 2090.$$
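As a quick check of the arithmetic, a minimal sketch in Python; the conservative variant used when no estimate of p is available (replacing $\hat{p}\hat{q}$ by its maximum value 1/4) is shown as well, and the result is rounded up as is standard for sample sizes:

```python
import math

def sample_size(e, z=1.96, p_hat=None):
    """Sample size so the error in estimating p stays within e.

    If no estimate of p is available, fall back on the conservative
    bound p*q <= 1/4.
    """
    pq = p_hat * (1 - p_hat) if p_hat is not None else 0.25
    return math.ceil(z**2 * pq / e**2)

print(sample_size(0.02, p_hat=0.32))  # 2090
print(sample_size(0.02))              # 2401 (no estimate of p)
```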
Since the sample size is obtained after estimating p, sometimes it is not
possible to estimate p (p is not given and cannot be computed). In that case the
following conservative technique, which replaces $\hat{p}\hat{q}$ by its maximum possible
value 1/4, is used:
$$n = \frac{z_{\alpha/2}^2}{4e^2}.$$
The difference of two sample proportions, $\hat{P}_1 - \hat{P}_2$, has approximate variance
$$\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2},$$
where
$$Z = \frac{(\hat{P}_1 - \hat{P}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$$
is approximately standard normal.
When sampling from a finite population of size N, the standard error is multiplied by the finite population correction factor
$$\sqrt{\frac{N-n}{N-1}}.$$
Note: The correction factor is close to 1 when the sample size is small relative to
the population size. The expression approaches zero as the sample size approaches
the population size, as required.
A large-sample $(1-\alpha)100\%$ confidence interval for $\mu$ using a finite population correction:
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}.$$
A large-sample $(1-\alpha)100\%$ confidence interval for p using a finite
population correction:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\sqrt{\frac{N-n}{N-1}}.$$
Example
A company has 1000 accounts receivable. To estimate the average amount
of these accounts, a random sample of 100 accounts is chosen. In the
sample, the average amount is $\bar{x} = 532.35$ units and the standard deviation is
$s = 61.22$ units. Give a 95% confidence interval for the average of all 1000
accounts.
Solution
The sampling fraction is $n/N = 100/1000 = 0.10$. Since the fraction is greater than 0.05, we
need to use a confidence interval with a finite population correction factor.
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = 532.35 \pm 1.96\,\frac{61.22}{10}\sqrt{\frac{900}{999}} = 532.35 \pm 11.39,$$
giving [520.96, 543.74].
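The correction-factor interval above can be sketched in a few lines of Python (values from the accounts-receivable example, with z taken as the table value 1.96):

```python
import math

def mean_ci_fpc(x_bar, s, n, N, z=1.96):
    """Confidence interval for the mean with a finite population correction."""
    fpc = math.sqrt((N - n) / (N - 1))
    half_width = z * s / math.sqrt(n) * fpc
    return x_bar - half_width, x_bar + half_width

low, high = mean_ci_fpc(532.35, 61.22, 100, 1000)
print(round(low, 2), round(high, 2))  # 520.96 543.74
```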
1.3.2 CONFIDENCE INTERVAL OF VARIANCE
In some situations, our interest centers on the population variance (or population standard deviation); this happens in production processes, queuing processes and other situations.
To compute confidence intervals for the population variance, we must have knowledge of the chi-square distribution, denoted by $\chi^2$. If $S^2$ is the variance of a random sample of size n from a normal population, then
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$$
has a chi-square distribution with n − 1 degrees of freedom.
The probability that a random sample produces a $\chi^2$ value between $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ is $1-\alpha$; that is, we are asserting that
$$P\left(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\right) = 1-\alpha.$$
Substituting the value of $\chi^2$ we get
$$P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}\right) = 1-\alpha.$$
A $(1-\alpha)100\%$ confidence interval for the population variance $\sigma^2$ (where the
population is assumed normal):
$$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right],$$
where $\chi^2_{\alpha/2}$ is the value of the chi-square distribution with n − 1 degrees of freedom that cuts off an
area of $\alpha/2$ to its right and $\chi^2_{1-\alpha/2}$ is the value of the distribution that cuts
off an area of $\alpha/2$ to its left (equivalently, an area of $1-\alpha/2$ to its right).
From the sample data (omitted in the source), we can be 95% sure that the population variance is between [11765, 33604].
1.3.3 CONFIDENCE INTERVAL OF THE RATIO OF TWO Variances
A point estimate of the ratio of two population variances $\sigma_1^2/\sigma_2^2$ is given
by the ratio $s_1^2/s_2^2$ of the sample variances. If $\sigma_1^2$ and $\sigma_2^2$ are the variances
of normal populations, we can establish an interval estimate of $\sigma_1^2/\sigma_2^2$ by
using the statistic
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2},$$
with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom.
The number of degrees of freedom associated with the numerator is stated first,
followed by that of the sample variance in the denominator. The curve of F depends not
only on the two parameters $v_1$ and $v_2$ but also on the order in which we
state them.
The distribution is similar to the chi-square in that it is not symmetric, and it is represented similarly.
Writing $f_\alpha(v_1, v_2)$ for the critical value $f_\alpha$ with $v_1$ and $v_2$ degrees of freedom, we have
$$f_{1-\alpha}(v_1, v_2) = \frac{1}{f_\alpha(v_2, v_1)}.$$
(Sketch: F density with critical values $f_{1-\alpha/2}$ and $f_{\alpha/2}$ marked on the horizontal axis.)
We can establish a confidence interval for $\sigma_1^2/\sigma_2^2$ from
$$P\left[f_{1-\alpha/2}(v_1, v_2) < F < f_{\alpha/2}(v_1, v_2)\right] = 1-\alpha,$$
where $f_{1-\alpha/2}(v_1, v_2)$ and $f_{\alpha/2}(v_1, v_2)$ are the values of the F distribution with
$v_1$ and $v_2$ degrees of freedom.
Substituting for F we get
$$P\left[f_{1-\alpha/2}(v_1, v_2) < \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2} < f_{\alpha/2}(v_1, v_2)\right] = 1-\alpha.$$
Hence the confidence interval is
$$\frac{s_1^2}{s_2^2}\,\frac{1}{f_{\alpha/2}(v_1, v_2)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2}\,f_{\alpha/2}(v_2, v_1).$$
Example
A standardized placement test in ST 205 was given to 11 females and 80
males. Females made an average grade of 82 with a standard deviation of 8,
while males made an average grade of 78 with a standard deviation of 7.
Find a 98% confidence interval for $\sigma_1^2/\sigma_2^2$ and $\sigma_1/\sigma_2$, where $\sigma_1^2$ and $\sigma_2^2$ are
the variances of the populations of grades for all females and males, respectively.
Assume the populations to be normal.
Solution
$n_1 = 11$, $n_2 = 80$, $s_1 = 8$, $s_2 = 7$.
For 98%, $\alpha = 0.02$.
Reading from the table, $f_{0.01}(10, 79) \approx 2.47$ (this is an approximation since $v_2 = 79$ is
not shown) and $f_{0.01}(79, 10) \approx 4$.
$$\frac{64}{49}\,\frac{1}{2.47} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{64}{49}\,(4),$$
that is, $0.53 < \sigma_1^2/\sigma_2^2 < 5.22$; taking square roots, $0.73 < \sigma_1/\sigma_2 < 2.29$.
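The interval arithmetic can be checked with a short sketch. The F critical values 2.47 and 4 are the approximate table values quoted in the text (a statistics library such as scipy could compute them exactly):

```python
import math

def var_ratio_ci(s1_sq, s2_sq, f_upper_12, f_upper_21):
    """CI for sigma1^2/sigma2^2 given the two F-table critical values."""
    ratio = s1_sq / s2_sq
    return ratio / f_upper_12, ratio * f_upper_21

low, high = var_ratio_ci(64, 49, 2.47, 4)
print(round(low, 2), round(high, 2))                        # 0.53 5.22
print(round(math.sqrt(low), 2), round(math.sqrt(high), 2))  # bounds for sigma1/sigma2
```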
,
_
:
1
]
1
+
n
s
z x
,
A left hand
( ) % 100 1
confidence interval for
: 1
]
1
+ ,
n
s
z x
Note:
replaces
2
because we have only one side where an error of
probability
Topic 2: PROPERTIES OF ESTIMATORS
The sample mean, as well as other sample statistics, is used as an estimator of a population parameter. We may
ask ourselves: are some of the many possible estimators better in some
sense than the others?
There are several criteria by which we can evaluate the quality of a statistic
as an estimator. We are going to discuss: unbiasedness, efficiency, sufficiency,
minimum variance, the Cramer-Rao inequality and consistency.
Unbiasedness
This is a very important property that an estimator should possess. If we
take all possible samples of the same size from a population and calculate
their means, the mean of these sample means will equal the mean $\mu$ of the population.
Repeated samples are drawn by resampling while keeping the values of
the independent variables unchanged. Bias is often assessed by
characterizing the sampling distribution of an estimator.
Definition:
An estimator is said to be unbiased if its expected value is equal to the
population parameter it estimates.
- An estimator $\hat{\theta}$ is said to be unbiased if $E(\hat{\theta}) = \theta$.
- $E(\bar{x}) = \mu$. This is to say that the sample mean $\bar{x}$ is an unbiased estimator of the
population mean.
This is an important property of the estimator because it means that there
is no systematic bias away from the parameter of interest.
Suppose we take the smallest sample observation as an estimator of the
population mean $\mu$; then $E(X_{\min}) < \mu$. Thus the estimator is biased downwards.
The extent of bias (systematic deviation) is the difference between the
expected value of the estimator and the value of the parameter,
$$Bias = E(\hat{\theta}) - \theta.$$
Any systematic deviation of the estimator away from the parameter of
interest is called bias. $\hat{\theta}$ is said to be unbiased if $B(\hat{\theta}) = E(\hat{\theta}) - \theta = 0$.
Note: In reality we usually sample once and obtain our estimate.
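A small simulation can illustrate the contrast between an unbiased and a biased estimator. This is a sketch under assumed settings (a uniform population on (0, 10), so $\mu = 5$, and samples of size 5):

```python
import random

random.seed(42)
mu = 5.0          # population mean of Uniform(0, 10)
n, reps = 5, 2000

means, minima = [], []
for _ in range(reps):
    sample = [random.uniform(0, 10) for _ in range(n)]
    means.append(sum(sample) / n)   # unbiased estimator of mu
    minima.append(min(sample))      # smallest observation: biased downwards

avg_mean = sum(means) / reps
avg_min = sum(minima) / reps
print(round(avg_mean, 2))  # close to 5 (unbiased)
print(round(avg_min, 2))   # well below 5 (biased downwards)
```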
Consistency
An estimator is said to be consistent if its probability of being close to the
parameter it estimates increases as the sample size increases.
The sample mean, $\bar{x}$, is a consistent estimator of $\mu$: its probability of being
close to $\mu$ increases as the sample size increases.
 A consistent estimator is one that concentrates in a narrower and
narrower band around its target as sample size increases indefinitely.
Mean Squared Error (MSE)
The mean squared error of an estimator is defined as
$$MSE(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right].$$
Let $B = E(\hat{\theta}) - \theta$ denote the bias. Then
$$MSE(\hat{\theta}) = E\left[\left(\hat{\theta} - E(\hat{\theta}) + B\right)^2\right]
= E\left[\left(\hat{\theta} - E(\hat{\theta})\right)^2\right] + 2B\,E\left[\hat{\theta} - E(\hat{\theta})\right] + B^2
= Var(\hat{\theta}) + B^2,$$
since $E\left[\hat{\theta} - E(\hat{\theta})\right] = 0$. That is,
MSE = variance of estimator + $(bias)^2$.
If the estimator is unbiased, then
$$MSE(\hat{\theta}) = Var(\hat{\theta}).$$
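The decomposition MSE = variance + bias² also holds exactly for empirical (population-style) moments, which makes it easy to check numerically. A sketch with simulated estimates of an assumed parameter:

```python
import random

random.seed(7)
theta = 5.0  # true parameter value (illustrative assumption)
# Simulated estimates, deliberately biased upwards by about 0.5.
estimates = [theta + random.gauss(0.5, 1.0) for _ in range(1000)]

mean_est = sum(estimates) / len(estimates)
mse = sum((e - theta) ** 2 for e in estimates) / len(estimates)
var = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)
bias = mean_est - theta

# The identity MSE = Var + Bias^2 holds exactly for these empirical moments.
print(abs(mse - (var + bias ** 2)) < 1e-9)  # True
```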
Example
A company has 4,000 employees whose average monthly wage comes to
Tshs 480,000 with a standard deviation of Tshs 120,000. Let $\bar{x}$ be the
mean monthly wage for a random sample of employees selected
from this company. Find the mean and standard deviation of $\bar{x}$ for
sample sizes of 40 and 100.
Solution
$N = 4{,}000$, $\mu = \text{Tshs } 480{,}000$ and $\sigma = \text{Tshs } 120{,}000$.
For a sample size of 40:
The mean of the sample means is $\mu_{\bar{x}} = \mu = \text{Tshs } 480{,}000$.
With $n = 40$ and $N = 4{,}000$, $n/N = 0.01$. As this value is less than 0.05, the
correction factor is not considered.
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{120{,}000}{\sqrt{40}} \approx 18{,}973.67.$$
For a sample size of 100: $n/N = 0.025$ is still less than 0.05, so again $\mu_{\bar{x}} = 480{,}000$ and $\sigma_{\bar{x}} = 120{,}000/\sqrt{100} = 12{,}000$.
From the example we learn that the mean of the sample mean is equal to the population
mean regardless of the sample size. The standard deviation is affected
by sample size: as the sample size increases, it decreases.
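The standard errors above can be verified directly. A minimal sketch that also applies the finite population correction when the sampling fraction exceeds 0.05 (the rule used in these notes):

```python
import math

def std_error(sigma, n, N=None):
    """Standard deviation of the sample mean; applies the finite
    population correction only when n/N exceeds 0.05."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(round(std_error(120_000, 40, N=4_000), 2))   # 18973.67
print(round(std_error(120_000, 100, N=4_000), 2))  # 12000.0
```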
Efficiency (not really the same as minimum variance)
(Remember efficiency differs from consistency because efficiency is
based on relative comparison, i.e. comparisons between two
estimators.)
Efficiency is a relative property. We say that one estimator is efficient
relative to another. This means that the estimator has a smaller variance
(also standard deviation) than the other. Efficiency is measured in terms of
the size of the standard error of the statistic. Since an estimator is a random
variable, it is necessarily characterized by a certain amount of variability.
This is to say some estimates may be more variable than others.
Definition
An estimator is efficient if it has a relatively small variance (and standard
deviation).
If $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of $\theta$, then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$
if $Var(\hat{\theta}_1) < Var(\hat{\theta}_2)$. Usually the estimator is selected based on MSE.
For example
In large samples, the variance of the sample mean is $Var(\bar{x}) = \sigma^2/n$. As the
sample size increases, the variance becomes smaller, so the estimate
becomes more efficient.
Consider the probability distributions of two estimators A and B (A concentrated, B spread out around the true value).
Curve A shows the distribution of sample means. It is a more precise
estimator as compared to curve B.
Estimator A is biased, though it may yield an estimate that will be close
to the true value (though it is likely to be wrong).
Estimator B, though unbiased, can give estimates that are far away from
the true value.
As such we would prefer estimator A.
e.g. The sampling distributions of the mean and the median have the same mean,
that is, the population mean. However, the variance of the sampling
distribution of the means is smaller than the variance of the sampling
distribution of the medians. As such the sample mean is an efficient
estimator of the population mean, while the sample median is an inefficient
estimator.
More examples
- The sample mean $\bar{Y}$ and the single observation $Y_1$ are both unbiased; which estimator is more efficient
than the other?
Sufficiency
An estimator is said to be sufficient if it contains all the information in
the data about the parameter it estimates.
$\bar{x}$ is a sufficient statistic because it utilizes all the information a sample
contains about the parameter to be estimated. We say $\bar{x}$ is a sufficient
estimator of the population mean $\mu$.
Minimum variance unbiased estimator
An estimator $\hat{\theta}$ is defined to be the uniformly minimum variance unbiased estimator of $\theta$ if
and only if:
i. $E(\hat{\theta}) = \theta$ (that is, unbiased);
ii. $Var(\hat{\theta}) \le Var(\tilde{\theta})$ for any other unbiased estimator $\tilde{\theta}$ of $\theta$.
Cramer-Rao Inequality
Let $\hat{\theta}$ be an unbiased estimator of $\theta$. We consider $f(x; \theta)$ as the probability density function that
satisfies the following assumptions, called regularity conditions:
i. $\dfrac{d}{d\theta}\ln f(x; \theta)$ exists for all x and all $\theta$.
ii. $\dfrac{d}{d\theta}\displaystyle\int\cdots\int \prod_{i=1}^{n} f(x_i; \theta)\,dx_1\cdots dx_n = \displaystyle\int\cdots\int \dfrac{d}{d\theta}\prod_{i=1}^{n} f(x_i; \theta)\,dx_1\cdots dx_n$
iii. $\dfrac{d}{d\theta}\displaystyle\int\cdots\int l(x_1,\dots,x_n)\prod_{i=1}^{n} f(x_i; \theta)\,dx_1\cdots dx_n = \displaystyle\int\cdots\int l(x_1,\dots,x_n)\dfrac{d}{d\theta}\prod_{i=1}^{n} f(x_i; \theta)\,dx_1\cdots dx_n$
iv. $0 < E\left[\left(\dfrac{d}{d\theta}\ln f(X; \theta)\right)^2\right] < \infty$ for all $\theta$.
The above assumptions are based on a continuous density function; the same applies to a discrete density function.
Under the assumptions above,
$$Var(\hat{\theta}) \ge \frac{1}{n\,E\left[\left(\dfrac{d}{d\theta}\ln f(X; \theta)\right)^2\right]}.$$
The above expression is what is called the Cramer-Rao inequality. The
right-hand side is called the Cramer-Rao lower bound for the variance of
unbiased estimators.
The Cramer-Rao lower bound is a limit to the variance that can be attained by an unbiased estimator of a parameter
of a distribution.
Given a certain estimator we expect it to have a low mean squared error
(MSE). But the question is: what is the smallest variance that can be attained by an unbiased estimator of $\theta$? An answer is given
by the Cramer-Rao inequality.
Example
Let $X_1, \dots, X_n$ be a random sample from $f(x; \theta) = \dfrac{e^{-\theta}\theta^x}{x!}$ for x =
0, 1, 2, … Find the Cramer-Rao lower bound.
Solution
$$\frac{d}{d\theta}\ln f(x; \theta) = \frac{d}{d\theta}\ln\frac{e^{-\theta}\theta^x}{x!} = \frac{d}{d\theta}\left(-\theta + x\ln\theta - \ln x!\right) = -1 + \frac{x}{\theta}.$$
Therefore
$$E\left[\left(\frac{d}{d\theta}\ln f(X; \theta)\right)^2\right] = E\left[\left(\frac{X}{\theta} - 1\right)^2\right] = \frac{1}{\theta^2}E\left[(X - \theta)^2\right] = \frac{Var(X)}{\theta^2} = \frac{\theta}{\theta^2} = \frac{1}{\theta}.$$
Hence
$$Var(\hat{\theta}) \ge \frac{1}{n(1/\theta)} = \frac{\theta}{n};$$
this is the Cramer-Rao lower bound.
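The Fisher information computed above, $E[(d\ln f/d\theta)^2] = 1/\theta$, can be checked numerically by summing over the Poisson probabilities. A sketch with $\theta = 3$ (an arbitrary choice), building the pmf iteratively to avoid huge factorials:

```python
import math

theta = 3.0
info = 0.0
p = math.exp(-theta)          # P(X = 0) for Poisson(theta)
for x in range(150):          # tail beyond 150 is negligible
    info += (x / theta - 1) ** 2 * p
    p *= theta / (x + 1)      # P(X = x+1) from P(X = x)

print(abs(info - 1 / theta) < 1e-9)  # True: information equals 1/theta
```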
2.6 Minimum Variance
APPLICATION OF THE PROPERTIES OF ESTIMATORS
Normally Distributed Population
A normal population implies symmetry.
Unbiased
Both the sample mean and the median are unbiased estimators of the population
mean $\mu$.
Efficiency
The mean is more efficient than the sample median. This is because the variance
of the sample median happens to be about 1.57 times as large as the variance of the
sample mean, i.e.
$$Var(median) \approx 1.57\,\frac{\sigma^2}{n}.$$
Sufficiency
The sample mean is more sufficient than the median because its computation uses the
entire data set. The median is not sufficient because it is found as the point in the
middle of the data set regardless of the exact magnitudes of all other data
elements.
Consistency
The mean is also consistent.
Proportion
Unbiased
The sample proportion $\hat{P}$ is an unbiased estimator of the population proportion p.
It also has the smallest variance of all unbiased estimators of p.
Sample Variance, $S^2$
$$S^2 = \frac{\sum\left(X - \bar{X}\right)^2}{n - 1}$$
It seems logical to divide the sum of squared deviations by n rather than n − 1
because we are seeking the average squared deviation from the sample
mean. The reasoning for this is explained by the concept of degrees of
freedom. If we divide by n − 1, $S^2$ becomes unbiased, and if we divide by n, $S^2$
becomes biased.
Note: While $S^2$ is an unbiased estimator of the population
variance $\sigma^2$, the sample standard deviation S is not an unbiased estimator of
the population standard deviation $\sigma$.
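A short simulation illustrates why the n − 1 divisor matters. This is a sketch assuming a normal population with $\sigma^2 = 4$:

```python
import random

random.seed(1)
sigma_sq = 4.0    # true population variance (sd = 2)
n, reps = 5, 5000

unbiased, biased = [], []
for _ in range(reps):
    sample = [random.gauss(0, 2) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    unbiased.append(ss / (n - 1))  # divisor n-1: unbiased for sigma^2
    biased.append(ss / n)          # divisor n: biased downwards

print(round(sum(unbiased) / reps, 2))  # near 4
print(round(sum(biased) / reps, 2))    # near 3.2 = 4 * (n-1)/n
```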
Degree of freedom
The number of degrees of freedom is equal to the total number of
measurements (these are not always raw data points), less the total number
of restrictions on the measurements. A restriction is a quantity computed
from the measurements.
By Josephat Peter  UDOM
e.g. Given 10, 12, 16 and 18, the mean is 14. Knowing the mean, we are able to
recover only one unknown value: say $(10 + 12 + 16 + x)/4 = 14$, so $x = 18$. With the mean fixed, only $n - 1 = 3$ of the values are free to vary.
If we have two samples and we know their means, the degrees of freedom
become
$$df = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$$
(sample 1 contributes $n_1 - 1$ and sample 2 contributes $n_2 - 1$).
Topic 3: TESTING OF HYPOTHESES
Concept of Hypothesis
A hypothesis is a proposition that we want to verify. Collection of relevant
information is required; we process it using statistical techniques and then test
the proposition. Hypotheses help us make proper decisions. They are very
helpful in examining the validity of theories. A hypothesis is not always
necessary except for a problem-oriented study.
There are two types of hypothesis: null and alternative. Amir (1989) defines
a null hypothesis as an assertion about one or more population parameters.
This is the assertion we hold as true until we have sufficient statistical
evidence.
Normally the null hypothesis is denoted by $H_0$. This is the hypothesis of no
effect or difference. Consider the following current saga in Bagamoyo
District Council: some officials are accused of fund misuse. If they are
taken before a magistrate, then (before the verdict) the persons are considered not
to have committed fund misuse. So the statement "Bagamoyo District Council
officials did not misuse funds" is called the null hypothesis.
The alternative hypothesis, denoted by $H_1$, is the assertion of all situations not
covered by the null hypothesis (Amir, 1989). Beri (2003) defines the alternative
hypothesis as the opposite of the null hypothesis. From the example of the
Bagamoyo saga, the alternative hypothesis is "Bagamoyo District Council
officials misused funds".
Generally, whenever a null hypothesis is specified, the alternative hypothesis
must also be specified. It should be noted that it is not possible for the null and
alternative hypotheses to be true at once. There are only two ways we can
make a conclusion on a proposition: either not to reject the null hypothesis,
which means the alternative hypothesis is untrue, or reject the null hypothesis
while accepting the alternative hypothesis.
It is possible to have two or more alternative hypotheses, but they should be tested
one at a time against the null hypothesis.
In both the null and alternative hypotheses, sample statistics such as $\bar{x}$, $\hat{p}$
are not used. Instead, population parameters such as $\mu$, p are used.
Example (statistical example)
Consider that a drug manufacturing company has installed a machine that
automatically fills 5 grams into a small bottle.
Solution
At the beginning we assume that what the company claims is true. Thus:
$$H_0: \mu = 5$$
$$H_1: \mu \ne 5$$
Procedure in hypothesis testing
There are five steps involved in testing a hypothesis.
1. Formulate the hypotheses. This is the first step, where the two
hypotheses $H_0$ and $H_1$ are set up.
2. Set up a suitable significance level. In testing the validity of a
hypothesis we need a certain level of significance. The confidence
with which a null hypothesis is rejected or accepted depends
upon the significance level used for the purpose. E.g. a
significance level of 5% means that we have about a 5% risk of making
a wrong decision: accepting a false hypothesis or rejecting a true
hypothesis.
3. Select the test criterion. Selection of an appropriate statistical technique
as a test criterion is the third step. There are many
statistical tests: the z test for n > 30, the t test for n < 30, etc. The statistical tests
normally used in hypothesis testing are Z, t, F and $\chi^2$.
4. Compute. Computation of the test statistic and other necessary
computations.
5. Make a decision. This is the final step, where the statistical decision is
made involving the acceptance or rejection of the null hypothesis.
This depends on whether the computed value of the test criterion falls
in the region of acceptance or in the region of rejection at the given
level of significance. The statement rejecting the hypothesis is
stronger than the statement accepting it: it is much easier to prove
something false than to prove it true.
Often, we wish to test the null hypothesis and see whether we can reject it in
favour of the alternative hypothesis. In a test of the value of the population
parameter we normally employ a test statistic.
A test statistic is a sample statistic computed from the data. The value of
the test statistic is used in determining whether or not we may reject the null
hypothesis.
We decide whether or not to reject the null hypothesis by following a rule
called the decision rule.
The decision rule of a statistical hypothesis test is a rule that specifies the
conditions under which the null hypothesis may be rejected.
Two types of errors in hypothesis testing
In testing a hypothesis there are four possibilities;
1. The hypothesis is true but our test leads to its rejection
2. The hypothesis is false but our test leads to its acceptance
3. The hypothesis is true and our test leads to its acceptance
4. The hypothesis is false and our test leads to its rejection
The first two lead to an erroneous decision. The first possibility leads to a
Type I error $(\alpha)$ and the second possibility leads to a Type II error $(\beta)$.

                     State of Nature
Decision       | $H_0$ is true            | $H_0$ is false
Accept $H_0$   | Correct decision         | Type II error $(\beta)$
Reject $H_0$   | Type I error $(\alpha)$  | Correct decision

i.e.
$\alpha$ = P(Reject $H_0$ | $H_0$ is true)
$\beta$ = P(Accept $H_0$ | $H_0$ is false)
Note
The word "accept" above needs care.
Usually, before carrying out the actual test to try to reject the null
hypothesis, the probability that we will make a Type I error is known.
This probability $(\alpha)$ is preset small, say 0.05. Knowing the probability
of making a Type I error, i.e. rejecting a null hypothesis which should not
be rejected, makes our rejection of a null hypothesis a strong
conclusion.
We cannot say that we are accepting the null hypothesis, because we
do not know the probability of making a Type II error $(\beta)$, i.e. failing to
reject a false null hypothesis; this is a weak conclusion.
When we reject the null hypothesis, we feel fairly confident that the
hypothesis should indeed be rejected. When we fail to reject the null
hypothesis, we feel that we did not have enough evidence to reject the
hypothesis. Either the null hypothesis is indeed true, or more
evidence is needed for it to be rejected.
We emphasize, however, that "accept" will mean that there is not enough
evidence to reject the null hypothesis.
Note
The level of significance has a big role in committing either of these two
errors. If we choose a level of significance which is very small (we are
avoiding making a Type I error), we increase the probability of committing a Type
II error. Similarly, if the level of significance is high (avoiding a Type II error),
there is an increased chance of making a Type I error. The solution is to choose a
level of significance which is neither too small nor too large. The only way to reduce
both errors at once is to increase the sample size.
Definition
The level of significance of a statistical hypothesis test is $\alpha$, the probability
of committing a Type I error.
Definition
The rejection region of a statistical hypothesis test is the range of numbers
that will lead us to reject the null hypothesis in case the test statistic falls
within this range. The rejection region, also called the critical region, is
defined by the critical points. The rejection region is designed so that,
before the sampling takes place, our test statistic will have a probability $\alpha$
of falling within the rejection region if the null hypothesis is true.
(Sketch: acceptance region in the centre, with rejection regions in the two tails bounded by the tabulated critical values.)
Definition
The acceptance region is the range of values that will lead us not to reject
the null hypothesis if the test statistic should fall within this region. The
acceptance region is designed so that, before the sampling takes place, our
test statistic will have a probability $1 - \alpha$ of falling in the acceptance region
if the null hypothesis is true.
Tails of a test
A rejection region in a hypothesis test can be on both sides of the curve,
with the non-rejection region in between the two rejection regions.
A hypothesis test with two rejection regions is called a two-tail test
and a test with one rejection region is called a one-tail test. The
one rejection region can be on either side: right (right-tail
test) or left (left-tail test).
How do we find out whether a particular test is a two-tail, right-tail or left-tail
test?
Signs in the tails of a test
                    Two-tail test | Left-tail test | Right-tail test
Sign in $H_0$:      =             | = or $\ge$     | = or $\le$
Sign in $H_1$:      $\ne$         | <              | >
Rejection region:   in both tails | in the left tail | in the right tail
e.g.
$$H_0: \mu = 45,\qquad H_1: \mu \ne 45$$
Note: We say that a statistical result is significant at the level of
significance $\alpha$.
Testing Hypotheses about Mean (large sample)
Consider the problem of testing the hypothesis that the mean $\mu$ of a
population, with known variance $\sigma^2$, equals a specified value $\mu_0$ against the
two-sided alternative that the mean is not equal to $\mu_0$, i.e.
$$H_0: \mu = \mu_0$$
$$H_1: \mu \ne \mu_0$$
An appropriate statistic on which we base our decision criterion is the
random variable $\bar{X}$. By using the significance level $\alpha$, it is possible to
find two critical values, $\bar{x}_1$ and $\bar{x}_2$, such that the interval $\bar{x}_1 < \bar{x} < \bar{x}_2$ defines
the acceptance region and the two tails of the distribution, $\bar{x} < \bar{x}_1$ and $\bar{x} > \bar{x}_2$,
constitute the critical region.
The critical region is given in terms of Z values by means of the
transformation
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.$$
From the population we select a random sample of size n and compute the
sample mean $\bar{x}$; we reject $H_0$ if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$.
(Sketch: normal curve centred at $\mu_0$ with critical values $\bar{x}_1$, $\bar{x}_2$ and tail areas $\alpha/2$.)
Example
A company manufacturing automobile tyres finds that tyre life is normally
distributed with a mean of 40,000 km and standard deviation of 3,000 km. It
is believed that a change in the production process will result in a better
product and the company has developed a new tyre. A sample of 100 new
tyres has been selected. The company has found that the mean life of these
new tyres is 40,900 km. Can it be concluded that the new tyre is
significantly better that the old ones, using the significance level of 0.01?
Solution
We are interested in testing whether or not there has been an increase in the
mean life of tyres.
Steps
1. $H_0: \mu = 40{,}000$ km; $H_1: \mu > 40{,}000$ km
2. Level of significance: $\alpha = 0.01$
3. Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
4. Computation: $z = \dfrac{40{,}900 - 40{,}000}{3{,}000/\sqrt{100}} = \dfrac{900}{300} = 3$
5. z tabulated = $z_{0.01} = 2.33$. Comparing, we see
that z computed is greater than z tabulated, so we reject the null
hypothesis.
i.e. since $z_{computed} > z_{tabulated}$ we reject the null hypothesis
that $\mu = 40{,}000$ km. That means that the new tyre is significantly
better than the old ones.
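The five steps above can be sketched in a few lines of Python (values from the tyre example; 2.33 is the right-tail table value $z_{0.01}$):

```python
import math

def z_test(x_bar, mu0, sigma, n):
    """Z statistic for a one-sample test of the mean (large sample)."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

z = z_test(40_900, 40_000, 3_000, 100)
z_critical = 2.33               # table value for alpha = 0.01, right tail
print(z, z > z_critical)        # 3.0 True -> reject H0
```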
The power of a statistical test
The power of a statistical test, given as $1 - \beta$ = P(reject $H_0$ when $H_0$ is
false), measures the ability of the test to perform as required.
$1 - \beta$ is called the power of the test.
When $1 - \beta$ is low (a value very close to zero) it is an indication that our
hypothesis test is working poorly. In contrast, if $1 - \beta$ is large (very close to
1), we can be sure that our hypothesis test is working quite well.
The power of a statistical hypothesis test depends on the following factors:
- The power depends on the distance between the value of the
parameter under the null hypothesis and the true value of the
parameter in question. The greater this distance, the greater the power.
- The power depends on the population standard deviation. The smaller
the population standard deviation, the greater the power.
- The power depends on the sample size used. The larger the sample,
the greater the power.
- The power depends on the level of significance of the test. The
smaller the level of significance, the smaller the power.
Testing Hypotheses about Mean (small sample)
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
When the population is normally distributed and the null hypothesis is true,
the test statistic has a t distribution with $n - 1$ degrees of freedom.
Example
A manufacturer of electric batteries claims that the average capacity of a
certain type of battery that the company produces is at least 140 ampere-hours,
with a standard deviation of 2.66 ampere-hours. An independent
sample of 20 batteries gave a mean of 138.47 ampere-hours. Test at a 5 percent
significance level the null hypothesis that the mean life is 140 ampere-hours
against the alternative that it is lower. Can the manufacturer's claim be sustained
on the basis of this sample?
Solution
$H_0$: The mean life of batteries is 140 ampere-hours
$H_1$: The mean life of batteries is < 140 ampere-hours
Level of significance: $\alpha = 0.05$
Test statistic: t
Computation:
$$t_{computed} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{138.47 - 140}{2.66/\sqrt{20}} = \frac{-1.53}{2.66/4.47} = -2.57$$
$$t_{tabulated} = t_{\alpha,\,n-1} = t_{0.05,\,19} = 1.729,$$
so the rejection region is $t < -1.729$.
We reject the null hypothesis since $t_{computed} = -2.57$ falls within the rejection region. Hence
we conclude that the mean is less than 140 ampere-hours.
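A sketch of the same t test in Python; the critical value 1.729 is the table value $t_{0.05,19}$ quoted above:

```python
import math

def t_statistic(x_bar, mu0, s, n):
    """One-sample t statistic."""
    return (x_bar - mu0) / (s / math.sqrt(n))

t = t_statistic(138.47, 140, 2.66, 20)
t_critical = -1.729             # left-tail table value, 19 df
print(round(t, 2), t < t_critical)  # -2.57 True -> reject H0
```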
Testing Hypotheses about DIFFERENCE BETWEEN TWO
POPULATION Means
Testing an individual mean is referred to as a one-sample test. In some cases we are
required to test whether there is any difference between two means; in such a
case we need samples from each group. This is known as a two-sample test.
The procedure for testing the hypothesis is similar to that used in one-sample
tests. Here, we have two populations and our concern is to test the claim as
to a difference in their means.
e.g. the government may claim that there is no difference between the
average monthly pension of its central and local government retired
employees.
From the example we have the average monthly pension for central government
employees $(\mu_1)$ and for local government employees $(\mu_2)$. We take random
samples of sizes $n_1$ and $n_2$ and determine their means $\bar{x}_1$ and $\bar{x}_2$, along
with sample standard deviations $s_1$ and $s_2$.
When $n > 30$ the Z statistic takes the following form:
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\quad\text{when } H_0: \mu_1 = \mu_2$$
and $\sigma_1$ and $\sigma_2$ are unknown (estimated by $s_1$ and $s_2$).
Example
A potential buyer wants to decide which of the two brands of electric bulbs
he should buy as he has to buy them in bulk. As a specimen, he buys 100
bulbs of each of the two brands A and B. On using these bulbs, he finds
that brand A has a mean life of 1,200 hours with a standard deviation of 50
hours and brand B has a mean life of 1,150 hours with a standard deviation
of 40 hours. Do the two brands differ significantly in quality? Use $\alpha = 0.05$.
Solution
Step 1:
$$H_0: \mu_1 - \mu_2 = 0,\qquad H_1: \mu_1 \ne \mu_2,$$
where $\mu_1$ and $\mu_2$ are the mean lives of brands A and B.
Steps 2-4: $\alpha = 0.05$; the test statistic is
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{1200 - 1150}{\sqrt{\dfrac{50^2}{100} + \dfrac{40^2}{100}}} = \frac{50}{\sqrt{41}} \approx 7.81.$$
Step 5: decision.
This is a two-tail test, so the critical Z value is $\pm 1.96$. The calculated Z value falls in the
rejection region, so we reject the null hypothesis and therefore conclude
that the bulbs of the two brands differ significantly in quality.
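The two-sample calculation can be sketched as follows (bulb data from the example):

```python
import math

def two_sample_z(x1, x2, s1, s2, n1, n2):
    """Z statistic for the difference of two means (large samples)."""
    return (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

z = two_sample_z(1200, 1150, 50, 40, 100, 100)
print(round(z, 2), abs(z) > 1.96)  # 7.81 True -> reject H0
```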
When $n \le 30$, the t test is used:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},\qquad\text{where}\quad
\hat{\sigma}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$
with $n_1 + n_2 - 2$ degrees of freedom.
Testing Hypotheses for the population proportion (large sample)
We know that, when the sample size is large, the distribution of the sample
proportion $\hat{P}$ is approximately normal, so
$$Z = \frac{x - np_0}{\sqrt{np_0 q_0}}\quad\text{(binomial approximation)}.$$
We use $p_0$, the hypothesized value of p under the null hypothesis, and $q_0 = 1 - p_0$.
Example
A commonly prescribed drug on the market for relieving nervous tension is
believed to be only 60% effective. Experimental results with a new drug
administered to a random sample of 100 adults who were suffering from
nervous tension showed that 70 received relief. Is this sufficient evidence to
conclude that the new drug is superior to the one commonly prescribed? Use
$\alpha = 0.05$.
Solution
$$H_0: p = 0.6,\qquad H_1: p > 0.6,\qquad \alpha = 0.05$$
Critical region: $z > 1.645$
Computed:
$$z = \frac{70 - 60}{\sqrt{100(0.6)(0.4)}} = 2.04$$
Decision: Reject the null hypothesis and conclude that the new drug is superior
(z computed > z tabulated).
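A sketch of the proportion test (data from the drug example; 1.645 is the table value $z_{0.05}$):

```python
import math

def proportion_z(x, n, p0):
    """Z statistic for a test of a proportion (binomial approximation)."""
    return (x - n * p0) / math.sqrt(n * p0 * (1 - p0))

z = proportion_z(70, 100, 0.6)
print(round(z, 2), z > 1.645)  # 2.04 True -> reject H0
```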
Testing Hypotheses for the DIFFERENCE BETWEEN TWO proportions
The test statistic z for tests concerning the difference between two population
proportions is
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}.$$
When $H_0: p_1 - p_2 = 0$, the test statistic z is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_0\hat{q}_0\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},\qquad\text{where the pooled}\quad \hat{p}_0 = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2},\quad \hat{q}_0 = 1 - \hat{p}_0.$$
e.g.
You obtain a large number of components to an identical specification from
two sources. You may notice that some of the components are from the
supplier's own plant in Msalato and some are from the plant located at Makuru.
You would like to know whether the proportions of defective components
are the same or there is a difference between the two. You take a random
sample of 600 components from each plant and find that the rejection rate
$\hat{p}_1$ is 0.015 for Msalato components as compared to $\hat{p}_2 = 0.017$ for
Makuru components. Set up the null hypothesis and test it at the 5 percent level
of significance.
Solution
$H_0: p_1 = p_2$
$H_1: p_1 \neq p_2$
where $p_1$ and $p_2$ are the proportions of defective components from Msalato and Makuru respectively.
This is a two-tailed test.
Level of significance is 0.05; both samples are large.
Tabulated z is $\pm 1.96$.
Computed z:

$$ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}_0 \bar{q}_0\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \quad \bar{p}_0 = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} $$

$$ \bar{p}_0 = \frac{600(0.015) + 600(0.017)}{600 + 600} = \frac{9 + 10.2}{1200} = 0.016 $$

$$ z = \frac{0.015 - 0.017}{\sqrt{0.016(1 - 0.016)\left(\dfrac{1}{600} + \dfrac{1}{600}\right)}} = \frac{-0.002}{0.00724} \approx -0.28 $$
We do not reject the null hypothesis since z computed does not fall in the
rejection region. Thus, there is no difference in the rejection rates of
components from Msalato and Makuru.
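The pooled two-proportion computation above can be reproduced as follows (all figures are from the example):

```python
import math

# Two-proportion example: 600 components sampled from each plant
n1 = n2 = 600
p1_hat, p2_hat = 0.015, 0.017
p0 = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)       # pooled proportion = 0.016
se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))  # pooled standard error
z = (p1_hat - p2_hat) / se                         # about -0.28
reject = abs(z) > 1.96                             # two-tailed test at alpha = 0.05
```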
Testing Hypotheses for POPULATION VARIANCE
Sometimes we may be interested in drawing a conclusion on whether a population variance exceeds some level.

Test statistic for the population variance:

$$ \chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} $$

with $n - 1$ degrees of freedom, where $\sigma_0^2$ is the hypothesized value of the population variance under the null hypothesis.

Reading the chi-square table with $\alpha = 0.05$ and 30 df, the value is 43.77 $\Rightarrow$ reject the null hypothesis.
For $\alpha = 0.025$, the tabulated chi-square value is 46.98.
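A quick numerical illustration of the one-variance test; the sample variance and hypothesized variance below are hypothetical, chosen so the statistic falls between the two critical values quoted above (43.77 and 46.98):

```python
# Hypothetical figures: n = 31 observations (30 df), sigma0^2 = 100, s^2 = 150
n, sigma0_sq, s_sq = 31, 100.0, 150.0
chi2 = (n - 1) * s_sq / sigma0_sq    # (n - 1) s^2 / sigma0^2 = 45.0
reject_at_5pct = chi2 > 43.77        # True:  reject at alpha = 0.05
reject_at_2_5pct = chi2 > 46.98      # False: do not reject at alpha = 0.025
```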
Suppose we wish to compare the variances of two processes, $\sigma_1^2$ and $\sigma_2^2$. The data pertaining to the two processes are given below:

            Sample A    Sample B
n           16          21
x-bar       1200 hr     1300 hr
s           60 hr       50 hr

Test that the variability of the two processes is the same.
Solution
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$
$\alpha = 0.05$
Test statistic is F
Computations:
$$ F = \frac{\hat{s}_1^2}{\hat{s}_2^2} = \frac{n_1 s_1^2/(n_1 - 1)}{n_2 s_2^2/(n_2 - 1)} = \frac{16(60)^2/(16 - 1)}{21(50)^2/(21 - 1)} = \frac{57600/15}{52500/20} = \frac{3840}{2625} \approx 1.46 $$
This is a two-tailed test, so 1.46 is compared with $F_{0.05,15,20} = 2.20$. As 2.20 is greater than 1.46, we do not reject the null hypothesis, indicating that there is no significant difference in the variability of the two processes.
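The F computation can be replicated from the table values:

```python
# Two-process example: recover unbiased variances from the n-divisor form
n1, s1 = 16, 60.0
n2, s2 = 21, 50.0
v1 = n1 * s1**2 / (n1 - 1)   # 16 * 3600 / 15 = 3840
v2 = n2 * s2**2 / (n2 - 1)   # 21 * 2500 / 20 = 2625
F = v1 / v2                  # about 1.46, below the tabulated 2.20
```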
THE p-VALUE
So far we have been arbitrarily specifying the level of significance. As such, mere acceptance or rejection of a hypothesis fails to show the full strength of the sample evidence.
An alternative is to use the p-value approach.
Example
Let n = 600
$H_0: p = 0.0096$
$H_1: p > 0.0096$
$$ z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = 0.519 $$
If we let $\alpha = 0.10$, the critical point is $z = +1.28$; since 0.519 < 1.28, we do not reject the null hypothesis.
Again, if $\alpha = 0.05$, the critical point is 1.645, and we do not reject the null hypothesis.
Question: is it not possible to reject the null hypothesis at a value of $\alpha$ larger than 0.10?
Answer: we can simply compute the smallest possible $\alpha$ at which we may reject the null hypothesis; this smallest level is the p-value.
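The smallest α at which the null hypothesis would be rejected is the upper-tail probability of the computed z. A sketch using the standard normal CDF via the error function:

```python
import math

def p_value_upper(z):
    # P(Z > z) for the standard normal, using the error function
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

p = p_value_upper(0.519)   # about 0.30
```

Since the p-value (about 0.30) exceeds both 0.05 and 0.10, the null hypothesis is not rejected at either level.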
If that is the case, at what level of significance may we reject?

When sampling from a finite population of size N, the standard error is adjusted by multiplying it by the finite population correction factor $\sqrt{\dfrac{N - n}{N}}$, so long as the sample size, n, represents 5% or more of the population.
Sample size determination for hypothesis tests:
The minimum required sample size in hypothesis tests of the population mean $\mu$, to satisfy a given significance level and a given power, is

$$ n = \frac{(z_0 + z_1)^2 \sigma^2}{(\mu_1 - \mu_0)^2} $$

where $z_0$ and $z_1$ are the required z values determined by the probabilities $\alpha$ and $\beta$, and $\mu_0$ and $\mu_1$ are the values of the mean under the null and alternative hypotheses.

For tests of the population proportion p, the minimum required sample size is

$$ n = \frac{\left(z_0\sqrt{p_0 q_0} + z_1\sqrt{p_1 q_1}\right)^2}{(p_1 - p_0)^2} $$

where $z_0$ and $z_1$ are the required z values determined by the probabilities $\alpha$ and $\beta$.
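A minimal sketch of the mean-test sample-size formula; all inputs (σ = 10, μ₀ = 100, μ₁ = 105, z₀ = 1.645 for α = 0.05 and z₁ = 1.28 for power 0.90) are hypothetical:

```python
import math

# Hypothetical one-tailed test of the mean
z0, z1 = 1.645, 1.28               # z values for alpha = 0.05 and beta = 0.10
sigma, mu0, mu1 = 10.0, 100.0, 105.0
n = (z0 + z1)**2 * sigma**2 / (mu1 - mu0)**2
n_required = math.ceil(n)          # round up to the next whole observation
```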
The usefulness of the $\chi^2$ statistic is limited. The $\chi^2$ test is neither capable nor expected to detect fit deficiencies for all possible problems. It is a powerful and versatile tool, but it should not be considered the ultimate solution to every goodness-of-fit problem.
e.g. Consider the tossing of a die. If we hypothesized that the toss is fair
(uniform distribution of outcomes) and the die is tossed 120 times, then we
expect that each face will occur 20 times.
Faces:     1   2   3   4   5   6
Observed: 20  22  17  18  19  24
Expected: 20  20  20  20  20  20
By comparing the observed frequencies with the corresponding expected
frequencies we must decide whether these discrepancies are likely to occur
due to sampling fluctuations and the die is balanced, or the die is not honest
and the distribution of outcomes is not uniform.
The appropriate statistic on which we base our decision criterion for an experiment involving k cells is defined as follows.

A goodness-of-fit test between observed and expected frequencies is based on the quantity

$$ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} $$

where $O_i$ and $E_i$ are the observed and expected frequencies of the i-th cell, with $\nu = k - 1$ degrees of freedom.

For the die example:

$$ \chi^2 = \frac{(20-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(17-20)^2}{20} + \frac{(18-20)^2}{20} + \frac{(19-20)^2}{20} + \frac{(24-20)^2}{20} = \frac{34}{20} = 1.7 $$

With $\nu = 6 - 1 = 5$ degrees of freedom, the tabulated value is $\chi^2_{0.05,5} = 11.07$. Since the computed chi-square is less than the tabulated chi-square, we do not reject the null hypothesis.
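The die computation in code, using the observed and expected counts from the table:

```python
# Goodness-of-fit statistic for the die-tossing example
observed = [20, 22, 17, 18, 19, 24]
expected = [20] * 6
chi2 = sum((o - e)**2 / e for o, e in zip(observed, expected))  # 34/20 = 1.7
# 1.7 is well below the 5% critical value on 5 df, so the die looks fair
```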
Test for Independence
 The chi-square test is also used to test for the independence of two variables.
 The observed frequencies are presented in a contingency table with r rows and c columns.
 The expected frequency of each cell is (row total × column total)/grand total.
 The test statistic is

$$ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} $$

with $\nu = (r - 1)(c - 1)$ degrees of freedom.
 Reject the null hypothesis of independence if the computed $\chi^2$ exceeds the tabulated value at the chosen level of significance; otherwise, accept the null hypothesis.
Test for Independence
The chi-square statistic for testing independence is also applicable when testing the hypothesis that k binomial populations have the same parameter p. Hence we are interested in testing the hypothesis

$$ H_0: p_1 = p_2 = \cdots = p_k $$

against the alternative hypothesis that the population proportions are not all equal.
To perform this test, we first select independent random samples of sizes $n_1, n_2, \ldots, n_k$ from the k populations and arrange the data in a 2 × k contingency table. The expected cell frequencies are calculated as above and substituted, together with the observed frequencies, into the chi-square formula for independence,

$$ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} $$

with $\nu = (2 - 1)(k - 1) = k - 1$ degrees of freedom. The conclusion is reached by rejecting the null hypothesis when $\chi^2_{\text{computed}} > \chi^2_{\text{tabulated}}$.
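The 2 × k homogeneity test can be sketched as follows; the defective counts for three hypothetical plants are invented for illustration:

```python
# Hypothetical 2 x 3 contingency table: 100 components from each of 3 plants
defective     = [10, 15, 20]
non_defective = [90, 85, 80]
grand_total = sum(defective) + sum(non_defective)

chi2 = 0.0
for j in range(len(defective)):
    col_total = defective[j] + non_defective[j]
    for row in (defective, non_defective):
        # Expected frequency = (row total x column total) / grand total
        e = sum(row) * col_total / grand_total
        chi2 += (row[j] - e)**2 / e
# df = (2 - 1)(3 - 1) = 2; compare chi2 with the tabulated chi-square value
```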