
CHAPTER 4

Estimation
ESTIMATION
• Sampling process: used to draw statistical
inferences about the characteristics of a
population or process of interest.
• One usually does not have enough information to
calculate the exact value of a parameter (such as μ, σ and P)
and hence makes the best estimate of this value
from the corresponding sample statistic (such as
x̅, s and p̂).
• Using sample statistics to draw conclusions
about parameters is a fundamental
application of statistical inference in business
and economics.
FOR INSTANCE, STATISTICAL ESTIMATION
COULD BE USED IN THE FOLLOWING CASES:

• A bank needs to understand the
proportion of consumers aware
of its services and credit
schemes.
• Any service center needs to
determine the average amount
of time a customer spends in
queue.
ESTIMATION

In all such cases, a decision-maker
needs to examine the two
concepts of estimation and
hypothesis testing, which are
useful for drawing statistical
inferences about unknown
population or process
parameters based upon random
samples.
IN THIS SECTION

• Methods to estimate unknown
population parameters, and
• Determining the range of values
(confidence interval) likely to contain
the parameter values
will be discussed.
ESTIMATION …

Estimation is a procedure of
assigning numerical values to
a population parameter
based on information
collected from a
corresponding random
sample statistic.
TWO TYPES OF ESTIMATES THAT WE CAN
MAKE ABOUT A POPULATION:

1. A point estimate and


2. An interval estimate.
 Point estimation - statistical
procedure in which we use a single
value to estimate unknown population
parameter.
 A single number that is used as an
estimate of unknown population
parameter
 it is obtained from random sample data.

ESTIMATION …. POINT … INTERVAL
 Instead of saying that the mean housing
expenditure per month for all households is Birr
450, we obtain an interval by subtracting from
and adding to Birr 450.
 Then we state that this interval contains the
population mean µ.
 For purposes of illustration, we can subtract and
add Birr 50 to Birr 450.
 Then, we obtain the interval Birr 400 – Birr 500.

 This interval is likely to contain the population


mean µ and the procedure is called interval
estimation.
 The value Birr 400 is called the lower limit of the
A DRAWBACK OF POINT ESTIMATE
COMPARED TO INTERVAL ESTIMATE

• A point estimate is based on a single
value chosen from a sampling
distribution (not a range of values), yet the
unknown parameter may lie
above or below the estimate.
• It conveys little information about the
accuracy of the estimate; it does not tell us
how confident we can be that the
estimate is close to the parameter it is
estimating.
INTERVAL ESTIMATION …
 On the contrary, interval
estimation
 Gives the estimate in ranges or
intervals and
 Specifies the level of confidence
concerning the reliability of the
estimate.
 When we make an estimate of a
population parameter, we use a
sample statistic.
THIS SAMPLE STATISTIC IS AN ESTIMATOR,
THE BEST ESTIMATOR SHOULD BE HIGHLY RELIABLE
AND HAVE THE FOLLOWING DESIRABLE PROPERTIES:


PROPERTIES …
 Consistency: As n increases the
standard error decreases and the
probability of being close to the
parameter it estimates increases.
• Sufficiency: the estimator uses all the
information about the population parameter
that is available in the sample.
• E.g. the sample mean is sufficient (unlike
the mode and median)
TO OVERCOME THE DRAWBACK OF POINT
ESTIMATION, INTERVAL ESTIMATION IS
USED.

 Interval estimation: A statistical


procedure in which we find a
random interval with a specified
probability of containing the
parameter being estimated.
 Interval estimate: range of values
within which the population
parameter is expected to occur ( it
has upper and lower bounds).
CONFIDENCE INTERVAL:

A range of values constructed from sample
data so that the parameter lies within that
range with a specified probability.
• The specified probability is called the level of
confidence; it is denoted by 1 − α.
• The confidence level in decimal form is called the
confidence coefficient.
• The most common values are 90%, 95% and 99%.
• The corresponding confidence coefficients are
0.90, 0.95 and 0.99.
• α is called the significance level.
CONFIDENCE LEVEL
• Confidence levels (1 − α) are
probabilities specifying the level of
accuracy of the interval estimate;
• The degree of sureness that the interval
estimate contains the parameter being
estimated.
• Although any value of confidence level
can be chosen, popular confidence
levels are 90%, 95% and 99%.
CONFIDENCE LEVEL

• Significance level (α): the
probability specifying the degree of
error in interval estimates.
• The probability that the parameter may lie
outside the interval estimate being
developed.
• For the above confidence levels, the
significance levels are 10%, 5% and 1%
respectively.
CONFIDENCE INTERVAL

• A 95% confidence interval means:
• About 95% of similarly constructed
intervals are expected to contain
the parameter being estimated.
• 95% of the sample means for a specified
sample size will lie within 1.96 standard
errors (1.96 × σ/√n) of the population mean µ.
• N.B.: To find the Z value for a given
confidence level, we divide the confidence
level by 2, giving the area on each side (left
and right) of µ; e.g. 0.95/2 = 0.4750, which
corresponds to Z = 1.96.
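The Z values for the common confidence levels can also be checked without a table. The snippet below is a minimal sketch using Python's standard library; the helper name `z_critical` is only illustrative and is not part of the original slides.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical Z value for a confidence level such as 0.95."""
    alpha = 1.0 - confidence
    # Half of alpha lies in each tail, so take the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> Z = {z_critical(level):.3f}")
# 90% confidence -> Z = 1.645
# 95% confidence -> Z = 1.960
# 99% confidence -> Z = 2.576
```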
CONFIDENCE INTERVAL
• The confidence interval estimate of a population
parameter is obtained by applying the formula:
Point estimate ± Margin of error (E)
• Where Margin of error = Zc × Standard error
of a particular statistic
• Zc = critical value of the standard normal variable that
corresponds to the confidence level (probability of being
correct), such as 0.90 or 0.95.
• The number that is subtracted from and added
to a point estimate to obtain an interval estimate
is the margin of error; it depends on the standard
error (that is, on the standard deviation and the
sample size) and on the confidence level.
Statistical Estimation: Confidence
Interval for the Population mean
Using Normal Distribution
CONFIDENCE INTERVAL…

• The normal probability distribution is used to
construct the confidence interval for the
population mean:
1. whenever n ≥ 30, because of the central limit
theorem (if the population standard deviation
σ is unknown, it can be approximated by the
sample standard deviation s), or
2. when n < 30 but the population is normally
distributed and the population standard
deviation σ is known.
CONFIDENCE INTERVAL …
• Accordingly, the confidence interval for the
population mean at a given confidence level (1 − α)
is computed as follows:
x̅ ± Zα/2 × σ/√n (with s in place of σ when σ is unknown and n ≥ 30)
CONFIDENCE INTERVAL …

EXAMPLE 1
 The sponsor of TV program targeted at the
children's market (age 4-10) wants to find out the
average amount of time children spend watching
TV. A random sample of 100 children indicated
the average time spent by these children
watching TV per week to be 27.2 hours. From
previous experience, the population standard
deviation of the weekly TV watched is known to
be 8 hours. A confidence level of 95% is adequate.
a) What is the population mean of weekly TV
watching time for children?
b) What is the best estimate of the population
mean? What is this value called?
d. We can conclude with 95% confidence that a
child on average spends between 25.6 and 28.8
hours per week watching television; or

About 95% of the intervals constructed in this way from
all possible samples of 100 children would
include the population mean. However, it should be
understood that 5% of the time our conclusion would still be
wrong: 2.5% of the time the population mean would lie below
the interval and 2.5% of the time above it.
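The interval in Example 1 can be reproduced with a few lines of Python. This is a sketch only: the function name `mean_ci_z` is hypothetical, and it assumes the known-σ formula point estimate ± Z × σ/√n.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_z(x_bar, sigma, n, confidence):
    """Z-based confidence interval for a population mean (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # margin of error E
    return x_bar - margin, x_bar + margin

# Example 1: n = 100 children, sample mean 27.2 hours, sigma = 8 hours.
lower, upper = mean_ci_z(27.2, 8, 100, 0.95)
print(f"95% CI: {lower:.2f} to {upper:.2f} hours")  # 95% CI: 25.63 to 28.77 hours
```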
EXAMPLE 2
• It is desired to estimate the average age of
students who graduate with an MBA degree in
the university system. A random sample of 64
graduating students showed that the average age
was 27 years with a standard deviation of 4
years.
a) Construct a 95% confidence interval estimate
of the true average (population mean) age of
all such graduating students at the
university.
b) How would the confidence interval limits
change if the confidence level was increased
from 95% to 99%?
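A short sketch for Example 2, assuming that with n = 64 the sample standard deviation s may stand in for σ (the large-sample rule stated earlier). Variable names are illustrative. It shows how the interval widens when the confidence level rises.

```python
from math import sqrt
from statistics import NormalDist

x_bar, s, n = 27, 4, 64          # figures from Example 2
std_error = s / sqrt(n)          # 0.5 years

intervals = {}
for confidence in (0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std_error
    intervals[confidence] = (x_bar - margin, x_bar + margin)
    lo, hi = intervals[confidence]
    print(f"{confidence:.0%}: {lo:.2f} to {hi:.2f} years")
# 95%: 26.02 to 27.98 years
# 99%: 25.71 to 28.29 years
```

The 99% interval is wider: more confidence is bought at the cost of a less precise estimate.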
EXAMPLE 3
• In a certain small city, to estimate the mean
monthly expenditure for food, a random sample of
25 households was selected, yielding a
mean of 200 Birr. From experience, it is known
that such expenditures are normally distributed
with a standard deviation of 50 Birr.
a) What is the point estimate of the mean
monthly expenditure for food of all
households in the city?
b) Find a 95% confidence interval for the mean
monthly expenditure for food of all
households in the city.
Statistical Estimation:
Confidence Interval for the
Population Proportion Using
Normal Distribution
CONFIDENCE INTERVAL FOR POPULATION
PROPORTION




EXAMPLE

SOLUTION …
Statistical Estimation: Confidence
Interval for the Population Mean
Using t-distribution
CONFIDENCE INTERVAL FOR THE POPULATION
MEAN USING T-DISTRIBUTION

So far, in estimating a
population mean we have used the normal
distribution for:
• Any large sample (n ≥ 30), and
• A small sample (n < 30),
but only if the population is normally
distributed and the population standard
deviation σ is known.
T-DISTRIBUTION …

• When the sample is small (n < 30) and
the population is normal or
approximately normal,
• But σ is not known,
• The standardized sample mean (computed with s)
does not follow the standard normal distribution
• We cannot use the normal
distribution for determining
confidence intervals for the
unknown population mean,
 But we can use special type of
T-DISTRIBUTION …
• Note that when σ is approximated by the sample
standard deviation (S), the standard error
will be somewhat different from sample to sample,
due to the variability of S.
• As a result, when S is used in the Z conversion
formula for small samples, it results in converted
values that are not distributed as Z values.
T-DISTRIBUTION …

Instead, the values are distributed
according to the t distribution.
• Developed by William S. Gosset in 1908.
• Gosset worked at the Guinness
Brewery in Ireland and published a paper about the
t-distribution using the pen name
‘Student’.
• The t distribution has many characteristics
similar to the z distribution.
CHARACTERISTICS OF T-DISTRIBUTION:

• It is a continuous distribution;
• It is bell shaped and symmetrical.
• There is no one t distribution, but
rather a family of t-distributions.
• All have the same mean of 0, but
• The standard deviation varies according to
sample size;
• Thus, a different t distribution exists for
each sample size
CHARACTERISTICS …
• It is more spread out (flatter) and wider
than the z distribution, thus:
• The standard deviation of t is greater than
that of z, and
• Thus it has a standard deviation greater than one.
• The variance of the t-distribution = df/(df − 2), for df > 2;
note: df stands for degrees of freedom
• The value of t for a given level of
confidence is larger in magnitude
than the corresponding z value.
T DISTRIBUTION
• It approaches the z distribution as the
sample size n increases.
• In other words, the t-distribution is
approximately normal for n ≥ 30.
• The t-distribution is defined by the
degrees of freedom (df), which is
equal to n − 1; that is its only
parameter.
DEGREE OF FREEDOM
 Number of items in a sample that are free to vary.
 Assume that the mean of four numbers is known to be
5.
 The four numbers are 7, 4, 1, and 8.
 The deviations of these numbers from the mean
must total 0.
 The deviations of +2, −1, −4, and +3 do total 0.
 If the deviations of +2, −1, and −4 are known, then
the value of +3 is fixed (restricted) in order to
satisfy the condition that the sum of the deviations
must equal 0.
 Thus, 1 degree of freedom is lost in a sampling
problem involving the standard deviation of the
sample because one number (the arithmetic mean)
is known.
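The four-number illustration can be checked directly. The sketch below (variable names are illustrative) shows that once three deviations are known, the fourth is forced by the requirement that the deviations sum to zero.

```python
values = [7, 4, 1, 8]
mean = sum(values) / len(values)             # 5.0
deviations = [v - mean for v in values]      # [2.0, -1.0, -4.0, 3.0]

# Only n - 1 deviations are free to vary: the last one is implied.
known = deviations[:-1]
implied_last = -sum(known)

print(deviations, implied_last)              # [2.0, -1.0, -4.0, 3.0] 3.0
degrees_of_freedom = len(values) - 1         # 3
```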
COMPUTING T VALUES:



EXAMPLE

SOLUTION:


SOLUTION …

From the t distribution table,
the value of t for df = 9 and
0.025 area in the right tail (or an
area of 0.05 in the two tails) is
2.262.
 Thus, we can state with 95% confidence that the
mean operating life for all bulbs lies
approximately between 3857 hours and 4143
hours.
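The t-based interval has the same shape as the Z-based one, with t in place of Z. In the sketch below, the table value t = 2.262 (df = 9) is the one quoted above. The sample mean of 4000 hours and sample standard deviation of 200 hours are assumptions, chosen only because they reproduce the quoted interval; the original figures are not shown in this text.

```python
from math import sqrt

n = 10                 # sample of 10 bulbs, so df = n - 1 = 9
t_table = 2.262        # t for df = 9 and 0.025 in the right tail (from the table)
x_bar = 4000.0         # ASSUMED sample mean (hours)
s = 200.0              # ASSUMED sample standard deviation (hours)

margin = t_table * s / sqrt(n)       # margin of error using t instead of Z
lower, upper = x_bar - margin, x_bar + margin
print(f"95% CI: {lower:.0f} to {upper:.0f} hours")  # 95% CI: 3857 to 4143 hours
```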
• Sometimes we might be provided with the raw
data for the sample (10 bulbs).
• In that case, we are first required to calculate the sample
mean and the sample standard deviation using
the formula
Interval Estimation of the Population
mean
Statistical Estimation:
Confidence Interval for Finite
Populations
CI FOR FINITE POPULATION
 The populations sampled so far have been very large
or infinite.
 What if the sampled population is not very large?
 Adjustments in calculating standard error of the sample
means and standard error of the sample proportions are
required.
 Thus, for a finite population, where the total number
of objects or individuals is N and the number of
objects or individuals in the sample is n, we need to
adjust the standard errors in the confidence
interval formulas for the population mean and
proportion.
 This adjustment is called the finite-population
correction factor (FPC).
 Particularly, it is needed when the sampling is done
without replacement from a small population; and
 When the sample constitutes more than 5% of the
population (n/N>0.05)
• Multiplying the standard error by this
correction factor reduces the standard error.
• If the sample is a substantial percentage
of the population, the estimate of the
population parameter is more precise.
• As N becomes larger relative to n, n/N
becomes small and so the FPC approaches unity.
• If n/N ≤ 0.05, in other words if the sample size
is not more than 5% of the population size, then
the FPC may be omitted.
 Accordingly, to develop a confidence interval
for the mean from a finite population and
unknown population standard deviation the
formula is as follows:
IN CASE OF PROPORTION FOR SIMILAR
CASES, THE CONFIDENCE INTERVAL IS:
EXAMPLE
 Suppose, 250 families reside around Unity
University; and a random sample of 40 of these
families revealed their mean annual community
contribution was $450 and the standard
deviation of this was $75.
 What is the population mean?
 What is the best estimate of the population
mean annual contribution?
 Develop a 90% confidence interval for the
population mean.
 What are the endpoints of the confidence
interval?
 Using the confidence interval, explain why the
SOLUTION

c) The former can be a possibility, as it
is in the confidence interval; but the
latter is not likely, it is not within
the range.
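A sketch of the finite-population interval for this example. It assumes the usual correction factor FPC = √((N − n)/(N − 1)) and uses a Z value because n = 40 ≥ 30 (the rule stated earlier). A t value with 39 degrees of freedom would give a slightly wider interval. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def finite_mean_ci(x_bar, s, n, N, confidence):
    """Z-based CI for a mean, with the finite-population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    fpc = sqrt((N - n) / (N - 1))          # finite-population correction
    margin = z * (s / sqrt(n)) * fpc
    return x_bar - margin, x_bar + margin

# 40 of 250 families sampled (n/N = 0.16 > 0.05, so the FPC applies).
lower, upper = finite_mean_ci(450, 75, 40, 250, 0.90)
print(f"90% CI: ${lower:.2f} to ${upper:.2f}")   # roughly $432.09 to $467.91
```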
EXAMPLE 2
 The same study on community contributions, in
the above case, revealed that 15 of the 40 families
sampled participate in community wide green
initiatives regularly. Construct the 95%
confidence interval for the proportion of families
participating in community wide green initiatives
regularly.
SOLUTION
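A minimal sketch of this calculation, assuming the standard proportion interval p̂ ± Z × √(p̂(1 − p̂)/n) multiplied by the same finite-population correction as for the mean. Names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n, N = 15, 40, 250                 # 15 of the 40 sampled families, N = 250
p_hat = x / n                         # sample proportion = 0.375

z = NormalDist().inv_cdf(0.975)       # 95% confidence
fpc = sqrt((N - n) / (N - 1))         # needed because n/N = 0.16 > 0.05
margin = z * sqrt(p_hat * (1 - p_hat) / n) * fpc

lower, upper = p_hat - margin, p_hat + margin
print(f"95% CI for P: {lower:.3f} to {upper:.3f}")   # about 0.237 to 0.513
```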

Statistical Estimation: Sample
Size Determination
SAMPLE SIZE DETERMINATION
• Resources at researchers’ disposal
are limited, which compels us not to take a
census or a large sample,
• as long as a small sample can
satisfactorily help us achieve the
research objective/result.
• Too large a sample wastes resources,
• Too small a sample may not be
representative, making the resulting
conclusion uncertain.
SAMPLE SIZE DETERMINATION

The correct sample size depends on


three factors:
 Level of confidence desired
 The margin of error the researcher
will tolerate (maximum allowable
error)
 The variation of the population being
studied
SAMPLE SIZE DETERMINATION

• The degree of confidence is usually
95% or 99%, but it could be any level.
• It is directly related to sample size
(n).
Note:
• Larger sample sizes, and
• more time and money to collect
the sample,
• correspond with higher levels of confidence.
• Maximum allowable error
(sampling error)
• Tolerable error at a specified level of
confidence;
• Difference between an estimator and the
parameter.
• It is one half the width of the
corresponding confidence interval.
• A small allowable error will require a
large sample and vice versa.


 An equation for determining sample
size can be derived from margin of
error (E) formula, by solving for n.
SAMPLE SIZE DETERMINATION FOR
ESTIMATING A POPULATION MEAN, µ:

n = (Zα/2 × σ / E)²

N.B.: n = sample size;
Zα/2 = standard normal value for the
corresponding confidence level
σ = population standard deviation
E = margin of error (the maximum
allowable error)
SAMPLE SIZE DETERMINATION FOR
ESTIMATING THE POPULATION
PROPORTION:
SAMPLE SIZE DETERMINATION FOR FINITE POPULATION
• When samples are drawn without replacement from a finite
population of size N,
• The use of the finite population correction factor
reduces the standard error by a factor equal to
√((N − n)/(N − 1)).
• Accordingly, the sample size determination
formulas for estimating the population mean and
proportion are adjusted by the FPC.
• The revised sample size, taking into
consideration the size of the population, is given
by:
WHERE …

n=Revised sample size


𝑛0= the sample size without
adjustment (without using the
finite correction factor)
N= Population size
EXAMPLE 1
1. A marketing research firm wants to conduct a survey
to estimate the average amount spent on
entertainment by each person visiting a popular
resort. The people who plan the survey would like to
determine the average amount spent by all people
visiting the resort to within $120, with 95%
confidence. From past operation of the resort, an
estimate of the population standard deviation is
$400. What is the minimum required sample size?
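Solving the margin of error formula E = Z × σ/√n for n gives n = (Z × σ/E)², rounded up to the next whole number. A sketch for Example 1 follows; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, margin, confidence):
    """Minimum n so that the margin of error for the mean is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)   # always round up

n = sample_size_mean(sigma=400, margin=120, confidence=0.95)
print(n)   # 43
```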
EXAMPLE 2
 The manufacturers of a sports car want to estimate
the proportion of people in a given income bracket
who are interested in the model. The company
wants to estimate the population proportion, P, to
within 0.01 with 99% confidence. Current company
records indicate that the proportion P may be
around 0.25. What is the minimum required sample
size for this survey?
 For a population of 1000, what should be the
sampling size necessary to estimate the
population mean at 95 per cent confidence with a
sampling error of 5 and the standard deviation
equal to 20?
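The two questions above can be sketched as follows. The proportion formula n = Z² × P(1 − P)/E² is the standard one. For the finite population, the adjustment n = n0 × N/(n0 + N − 1) is an assumption (a common form of the correction, which may differ from the one on the original slide), and n0 is left unrounded before adjusting. Rounded table values of Z (e.g. 2.58) give slightly larger answers.

```python
from math import ceil
from statistics import NormalDist

def z_for(confidence):
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Sports car survey: P around 0.25, E = 0.01, 99% confidence.
z99 = z_for(0.99)
n_prop = ceil(z99 ** 2 * 0.25 * (1 - 0.25) / 0.01 ** 2)
print(n_prop)        # 12441 with the exact Z; a table value of 2.58 gives more

# Population of 1000: sigma = 20, E = 5, 95% confidence.
N = 1000
n0 = (z_for(0.95) * 20 / 5) ** 2            # unadjusted size, about 61.5
n_finite = ceil(n0 * N / (n0 + N - 1))      # ASSUMED finite-population adjustment
print(n_finite)      # 58
```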
Chapter 5:
Hypothesis Testing
HYPOTHESIS TESTING

• Statistical analysis in which we put our
assumptions about a population
parameter to the test.
• It is used to estimate the relationship
between 2 statistical variables
• Statistical inference used to decide
whether the data at hand sufficiently
support a particular hypothesis.
• Allows us to make probabilistic statements
about population parameters.
HYPOTHESIS TESTING
 Testing hypothesis about population parameters
(such as µ, P, σ)
 Another fundamental aspect of statistical inference
and statistical analysis.
 In testing hypothesis, we start by making
assumption or claim with regard to an
unknown population characteristic (parameter).
 We then take a random sample from the
population, and on the basis of the
corresponding sample characteristic
(sample statistic), we either accept or reject the
hypothesis with a particular degree of confidence
(1- 𝛼).
STATISTICAL HYPOTHESIS
Any assumption/claim/statement
about the true value of an
unknown population parameter,
or about the population distribution,
developed for the purpose of
testing.
A hypothesis is something that
has not yet been proven to be
HYPOTHESIS TESTING

A procedure that is used to


check/determine the validity
of hypothesis (to accept or
reject it) based on sample
evidence and probability
theory.
HYPOTHESIS TESTING
• In a hypothesis testing problem, we are faced with
a pair of hypotheses such that one and only
one of them is true.
1. Null hypothesis (H0) - a tentative statement
about a population parameter that is assumed to be true
• Reflects the status quo, until sufficient
statistical evidence is provided to conclude
otherwise.
• In H0, H stands for hypothesis and the
subscript 0 (nought) implies ‘no difference’.
• Accepting it would lead to null (no) action.
2. Alternative hypothesis (Ha or H1)
• The hypothesis that is
accepted if the null hypothesis is
rejected because of
sufficient sample evidence.
• Negation of H0. H0 and H1 are
formulated in such a way that they
are opposite or mutually exclusive.
ERRORS IN HYPOTHESIS TESTING

• Two types of errors can be made in hypothesis
testing due to chance arising from random
sampling.
1. Type I error - the error of rejecting a null hypothesis (H0)
that is in fact true.
• Analogous to the case where a judge convicts an
innocent person.
2. Type II error - the error of accepting (failing to reject)
a null hypothesis (H0) that is actually false.
• Analogous to a judge’s error of acquitting a guilty
person, pronouncing him/her innocent.
ERROR…
• The probability of making a type I
error
is denoted by alpha (α), also called the level
of significance; and
1 − α is the level of confidence of the
test.
• The probability of making a type II error
is denoted by beta (β); and
1 − β, called the power of the test, is the
probability of rejecting a false
null hypothesis.
ERROR

 Often analogy of criminal trial of a court case
is used to explain hypothesis testing.
 When a person is accused of a crime, he or she
faces a trial.
 The prosecution presents its case, and a jury
must make a decision on the basis of the
evidence presented.
 In fact, the jury conducts a test of hypothesis.
 Thus the hypothesis is formulated as,
 H0: The defendant is innocent, and
 H1: The defendant is guilty.
 Of course, the jury does not know which
hypothesis is correct.
 The members must make a decision on
the basis of the evidence presented by
both the prosecution and the defense.
 There are only two possible decisions.
 Convict (find guilty) or
 acquit (free) the defendant.
In statistical terms,
 Convicting the defendant is
equivalent to rejecting the null
hypothesis in favor of the alternative;
 i.e the jury is saying that there was enough
evidence to conclude that the defendant was
guilty.
 Acquitting a defendant is phrased as
not rejecting the null hypothesis in
favor of the alternative,
 that is , the jury decided that there was not
enough evidence to conclude that the
defendant was guilty.
 Failing to reject H0 does not prove that H0
is true; it means we have failed to
disprove H0.
 To prove without any doubt the null
hypothesis is true, the population
parameter would have to be known via
census.
 The scenario in criminal trial and
statistical hypothesis testing, in terms of
types of decision and consequences, is
summarized in Table below.
APPLICATION:
 Learning to formulate hypothesis
correctly needs practice.
 The following three cases illustrate
three situations in which hypothesis
testing procedures are commonly
employed
 Example 1: Testing research hypothesis
 Example 2: Testing the validity of a
claim
 Example 3: Decision making situation
EXAMPLE 1: TESTING RESEARCH
HYPOTHESIS

• In contrast to the existing brand of Teff,
which yields an average of less than or
equal to 100 quintals per hectare,
agricultural researchers developed a
new method claimed to increase the average
per hectare yield of Teff beyond 100
quintals.
• N.B.: the research hypothesis is
formulated as the alternative
hypothesis.
• H0: µ ≤ 100   H1: µ > 100
EXAMPLE 2: TESTING THE VALIDITY OF A CLAIM

Testing the validity of the claim of a
manufacturer of bottled water
stating that each bottle has a
volume of at least 500 ml on
average
H0: µ ≥ 500   H1: µ < 500
EXAMPLE 3: DECISION MAKING SITUATION

The bolts in a shipment are expected
to have a diameter of 2 inches; a bolt with any other
diameter will not be accepted.
 If H0 is not rejected, shipment will be
accepted; but, if H0 is rejected shipment is
not accepted.
 H0: µ= 2
 H1: µ≠2
 The value of the test statistic is used
(as evidence) in determining whether or
not we may reject the null hypothesis.
EXAMPLE 3…

• Test statistic - a sample statistic
computed from sample data.
• It is evaluated against the decision
rule of a statistical hypothesis test:
a rule that specifies the
conditions under which the
null hypothesis may be
rejected.
 Critical value of the test statistic
based on the significance level of the
test is identified from the respective
statistical table; and
 is used to formulate the decision rule.
• Generally, the three approaches
used to test a hypothesis are:
Critical value,

P-value, and

Confidence interval approach.


A HYPOTHESIS TEST

 Can be two-tailed (non-directional) or


 One-tailed (directional) test.
 One-tailed test can be right-tailed or
left-tailed test.
 The type of test is determined by looking at
the sign of H1; and
 It indicates which tail is involved in
representing the rejection area of the test,
under the concerned frequency curve.
FORMS OF HYPOTHESIS, TYPES OF TAILS AND
CRITICAL RANGES (REJECTION REGION) OF A
HYPOTHESIS TEST.
Hypothesis Testing (One Sample): Testing
Procedure for Population Mean using the
Normal Probability Distribution
(Two-tailed Test)

THE FOLLOWING SIX-STEP PROCEDURE
IS USED TO TEST HYPOTHESIS:
1. State (H0) and (H1): depending on the
nature of the problem, the hypothesis
may be formulated as one tailed or
two tailed test.
 If H0 has an = and H1 has (≠) sign; two
tail test is conducted.
 If H0 has (≤) sign, and H1 has (>) sign;
or
 if H0 has (≥) sign, and H1 has (<) sign,
one tail test is carried out.
NOTE:
• The former is a right-tail test, and the latter is a left-tail test (look at
the sign in H1).
• H1 never contains an equal sign.
• Depending on which tail(s) of the frequency curve the rejection region
is located in, a hypothesis test can be a one-tailed
(one-sided) test or a two-tailed (two-sided) test.
STEPS

STEPS …

STEPS …

4. Formulate the decision rule:
Specific conditions under which the null
hypothesis is rejected and under which it is not
rejected.
 The critical value (s) of the test statistic (z
in this case) that is determined from table,
based on the significance level, is used to
formulate the decision rule.
 Critical value (s) - dividing point between the
region where the null hypothesis is rejected and
the region where it is not rejected, on the curve.
STEPS …
 On the standard normal curve above, the
(shaded) critical region in the tails (depending
on the type of test) indicates the rejection
(critical) region,
 The region that is not shaded represents the
acceptance region.
• If the Z value from the sample falls in the rejection
region, H0 will be rejected; but if Z from the sample
falls in the acceptance region, H0 is accepted
STEPS ….
5. Determine (calculate) the actual value of the test
statistic:
 Take sample and compute the test statistic; a sample
statistic computed from sample data.
 The value of the test statistic is used in determining
whether or not we may reject the null hypothesis.
• If the normal distribution is used, the test statistic can be
standardized as follows:
Z = (x̅ − µ) / (σ/√n)
• This is the observed/calculated value of z obtained from the sample.


STEPS …
6. Make Decision and interpret results:
 Decision about H0 based on the sample
information; eventually, comparing the z
value from the sample (z observed/calculated),
with z value from table (critical value of
/tabulated value of z), as per the decision
rule, we decide either to accept or reject
H0.
 However, because the decision is based
on a sample, it is always possible to
make either of two decision errors (type
I & II)
EXAMPLE: TWO-TAIL TEST
 3F Furniture Company manufactures and assembles
desks and other office equipment at several plants in
Addis Ababa.
 The weekly production of the office desk at Saris Plant
follows a normal probability distribution with a mean of
200 and a standard deviation of 16.
 Recently, because of market expansion, new production
methods have been introduced and new employees hired.
 The vice president of manufacturing would like to
investigate whether there has been a change in the
weekly production of the office desk.
 Is the mean number of desks produced at the Saris Plant
different from 200 at the .01 significance level?
 Interpretation: We conclude that the
population mean is not different from
200.
 Sample evidence does not show (fails to
indicate) that the new production
methods resulted in a change in the 200-
desks-per-week production rate.
 The difference between the population
mean of 200 per week and the sample
mean of 203.5 could simply be due to
chance (sampling error).
N.B.: We did not prove µ is still
200, but we failed to disprove H0.
Failing to disprove the
hypothesis that the population
mean is 200 is not the same
thing as proving it to be true.
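The two-tailed test for the desk example can be sketched in a few lines, using the sample figures quoted in the interpretation (sample mean 203.5 over 50 weeks) together with µ = 200, σ = 16 and α = 0.01. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 200, 16          # hypothesized mean and population std. deviation
x_bar, n = 203.5, 50          # sample mean and sample size
alpha = 0.01

z = (x_bar - mu0) / (sigma / sqrt(n))            # observed Z, about 1.55
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 2.58 for a two-tailed test

reject_h0 = abs(z) > z_crit
print(f"Z = {z:.2f}, critical value = ±{z_crit:.2f}, reject H0: {reject_h0}")
# Z = 1.55, critical value = ±2.58, reject H0: False
```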
Hypothesis Testing (One Sample): Testing
Procedure for Population Mean using the
Normal Probability Distribution (One Tail
Test)
ONE SAMPLE …. ONE TAILED
 In a one-tailed test, the question of interest is
whether the population parameter is
greater than (or less than) a hypothesized
value.
 There is only one region of rejection for a one-
sided test.
 The region of rejection for a one-sided test is
always in the tail that represents support of the
alternative hypothesis.
 If H0 has (≤) sign, and H1 (>) sign, it is right tail
test; or
 If H0 has (≥) sign, and H1 (<) sign , it is a left tail
test.
 Critical values for one-sided tests
differ from two-sided tests because
the given proportion of area (𝛼) is all
in one tail of the distribution.
 The critical (rejection) region for
testing hypothesis about the mean
using the normal distribution is:
Z<−Z𝛼 for left tail test, and Z>Z𝛼 for
right tail test.
EXAMPLE
 Based on the above example, it was emphasized
that the concern only was reporting to the vice
president whether there had been a change in the
mean number of desks assembled at the Saris
Plant. The concern was not whether the change
was an increase or a decrease in the production. To
illustrate a one-tailed test, let’s change the
problem.
 Suppose the vice president wants to know whether
there has been an increase in the number of units
assembled. Can we conclude, because of the
improved production methods, that the mean
number of desks assembled in the last 50 weeks
was more than 200? Use the same significance level
 Note: In the two tail test in Example above, we
wanted to know whether there was a difference
in the mean number assembled, but now we want
to know whether there has been an increase.
 H0:µ≤200
 H1:µ>200
 Significance level (𝛼=0.01)
 As the sample taken is large n≥30,
the sampling distribution of x̅
is approximately normal, Z distribution is
used to find the test statistic
• The critical value: for the one-tailed
test, Zα = Z0.01, i.e. Z at an area of
0.5 − 0.01 = 0.4900, so the
critical value is Zc = 2.33;
• Decision rule: if the test statistic Z > 2.33,
reject H0 and accept H1
RIGHT TAILED TEST
• The mean number of desks produced per week over
the last year, based on a sample
of 50 weeks (because the plant was shut
down 2 weeks for vacation), is 203.5, with a
standard deviation of 16
• The test statistic, Z calculated or observed,
is 1.55
Decision: as per the decision
rule, 1.55 is below Z critical of
2.33 (this is in the acceptance
region); therefore, accept H0 and
reject H1.
Conclusion: population mean is
not greater than 200. Sample
evidence does not show (fails to
indicate) that the new production
methods resulted in an increase in
the 200-desks-per-week
production rate.
 Suppose the new method (change)
introduced was met with stiff resistance
on the part of the employees in the
company; and the vice president doubts
that the confusion that resulted may have
caused a decrease in the number of units
assembled.
 Can we conclude, because of the
resistance to change related to the new
production methods, that the mean
number of desks assembled in the last 50
weeks was less than 200? Use the same
significance level
ONE-TAILED TEST (LEFT-TAILED TEST)
• Most of the solution items are similar to
those of the right-tailed test above; only the
different ones are indicated below:
• H0: µ ≥ 200   H1: µ < 200
• The critical value is −Z0.01 = −2.33;
• Decision rule: if the test statistic Z < −2.33,
reject H0 and accept H1.
• The following figure shows the
critical (rejection) region and the
acceptance region.
LEFT TAILED TEST
 The test statistic is the same as in
Example above ; Z=1.55. As 1.55 is
above -2.33; the decision: Accept H0
and reject H1.
 We conclude that the population
mean is not less than 200.
 Sample evidence does not show (fails
to indicate) that the new production
methods resulted in a decrease in
the 200-desks-per-week production
rate.
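Both one-tailed versions of the desk example use the same observed Z but look at a single tail. A minimal sketch, with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, x_bar, n, alpha = 200, 16, 203.5, 50, 0.01

z = (x_bar - mu0) / (sigma / sqrt(n))        # about 1.55, as in the two-tailed test
z_alpha = NormalDist().inv_cdf(1 - alpha)    # about 2.33: all of alpha in one tail

reject_right = z > z_alpha     # right-tailed: H0: mu <= 200, H1: mu > 200
reject_left = z < -z_alpha     # left-tailed:  H0: mu >= 200, H1: mu < 200

print(f"Z = {z:.2f}, Z_alpha = {z_alpha:.2f}")
print("right-tailed: reject H0?", reject_right)   # False
print("left-tailed:  reject H0?", reject_left)    # False
```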
Hypothesis Testing (One Sample):
Use of p-Values
USE OF P-VALUES
 Determining the p-value not only
results in a decision regarding H0, but
also gives additional insight into the
strength of the decision.
Consistent with the critical value
approach described in the preceding
sections, the idea is that
 A low p-value indicates that the
sample would be unlikely to occur when
the null hypothesis is true;
 Therefore, obtaining a low p-value leads
to rejection of the null hypothesis.
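As an illustration of the idea, the sketch below computes the p-value for the right-tailed desk-production test from the earlier example (Z of about 1.55, α = 0.01). It is a minimal sketch that assumes the standard normal distribution applies, as in that example.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the desk example: n = 50 weeks, mean 203.5, standard deviation 16.
z = (203.5 - 200) / (16 / sqrt(50))

# Right-tailed p-value: area under the standard normal curve to the right of z.
p_value = 1 - NormalDist().cdf(z)

alpha = 0.01
reject_h0 = p_value < alpha
print(f"Z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The p-value comes out at about 0.06, well above 0.01, so the decision agrees with the critical value approach: H0 is not rejected.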

AS A RULE OF THUMB:
 When the p-value is smaller than
0.01, the result is called very
significant.
 When the p-value is between 0.01 and
0.05, the result is called significant.
 When the p-value is between 0.05 and
0.10, the result is considered by some
as marginally significant (and by
most as not significant).
When the p-value is greater than
EXAMPLE
 ABC, a manufacturer of ketchup, uses a
particular machine to dispense 16 ounces
of its ketchup into containers.
 From many years of experience with the
particular dispensing machine, ABC
knows the amount of product in each
container follows a normal distribution
with a mean of 16 ounces and a standard
deviation of 0.15 ounce.
 A sample of 50 containers filled last hour
revealed the mean amount per container
was 16.017 ounces.
 Does this evidence suggest that the mean
amount dispensed is more than 16
ounces?
a) State the null hypothesis and the alternate
hypothesis under these conditions.
b) What is the decision rule under the new conditions
stated in part (a)?
c) A second sample of 50 filled containers revealed the
mean to be 16.040 ounces. What is the value of the
test statistic for this sample?
d) What is your decision regarding the null
hypothesis?
e) Interpret, in a single sentence, the result of the
statistical test.
f) What is the p-value? What is your decision
regarding the null hypothesis based on the p-value?
Is this the same conclusion reached in part (d)?
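A possible way to work parts (c) to (f) numerically is sketched below. The exercise text does not state a significance level, so α = 0.05 is an assumption made here for illustration; the hypotheses H0: µ ≤ 16 and H1: µ > 16 follow from the question "more than 16 ounces". The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU0 = 16.0      # hypothesized mean fill, ounces
SIGMA = 0.15    # known population standard deviation, ounces
N = 50          # containers per sample
ALPHA = 0.05    # ASSUMED significance level (not given in the exercise text)


def right_tailed_z_test(sample_mean):
    """Return (z, p_value) for H0: mu <= 16 against H1: mu > 16."""
    z = (sample_mean - MU0) / (SIGMA / sqrt(N))
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


for label, sample_mean in [("first sample", 16.017), ("second sample", 16.040)]:
    z, p = right_tailed_z_test(sample_mean)
    verdict = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"{label}: Z = {z:.2f}, p-value = {p:.4f} -> {verdict}")
```

Comparing the p-value with α gives the same decision as comparing Z with the critical value, which is the point of part (f).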
Topic: Hypothesis Testing
(One Sample): For Population
mean using t- Distribution
 When n < 30, the t distribution is the
appropriate basis for determining the
standardized test statistic when the
sampling distribution of the mean is
normally distributed but the population
standard deviation σ is not known.
 Thus, the sample standard deviation s is
used to estimate σ. To find the value of
the test statistic, we use the following
formula, with n − 1 degrees of freedom:
t = (x̅ − µ)/(s/√n)
EXAMPLE
 The mean life of a battery used in a
digital clock is 305 days.
 The lives of the batteries follow the
normal distribution.
 The battery was recently modified
to last longer.
 A sample of 20 of the modified
batteries had a mean life of 311
days with a standard deviation of
12 days.
 Did the modification increase the
mean life of the battery?
a) State the null hypothesis and
the alternate hypothesis.
b) Show the decision rule. Use the
.05 significance level.
c) Compute the value of t. What is
your decision regarding the null
hypothesis?
d) Summarize your results.
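The battery example can be worked through as follows. This is a sketch under the hypotheses H0: µ ≤ 305 and H1: µ > 305; the critical value 1.729 is the one-tailed t table entry for α = 0.05 with 19 degrees of freedom, hard-coded because the Python standard library has no t distribution.

```python
from math import sqrt

mu0 = 305          # mean battery life before the modification, days
n = 20             # modified batteries sampled
sample_mean = 311  # sample mean life, days
s = 12             # sample standard deviation, days

df = n - 1
t = (sample_mean - mu0) / (s / sqrt(n))

T_CRITICAL = 1.729  # one-tailed t table value for alpha = 0.05, df = 19
reject_h0 = t > T_CRITICAL

print(f"t = {t:.3f} with {df} degrees of freedom")
print(f"critical t = {T_CRITICAL}, reject H0: {reject_h0}")
```

Because the computed t exceeds the critical value, the sample supports the claim that the modification increased mean battery life at the .05 level.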
Topic: Hypothesis Testing (One
Sample): For Population
Proportion using Normal
Probability Distribution
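PROPORTION
 Large-sample test statistic for the
population proportion:
Z = (p̂ − p0)/√(p0(1 − p0)/n)
where p̂ is the sample proportion, p0 is the proportion
stated in H0, and n is the sample size.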
EXAMPLE
 A marketing manager of an enterprise is facing a
decision whether to introduce a new product into the
market or not.
 Consumer acceptance measured in a blind comparison
test is agreed upon as an appropriate basis for
evaluation.
 Marketing of the new product will be pursued only if the
acceptance rate exceeds 30%.
 Otherwise, the new product will not be introduced in the
market.
 A random sample of 200 consumers reveals that the
acceptance rate is 32%. Using a level of significance of
0.05, perform the hypothesis testing and recommend
your action.
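SOLUTION
 H0: P ≤ 0.30 H1: P > 0.30 (right-tailed test)
 At α = 0.05 the critical value is Z0.05 = 1.645; reject H0 if Z > 1.645.
 Z = (0.32 − 0.30)/√(0.30 × 0.70/200) = 0.02/0.0324 = 0.62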
SOLUTION …
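The calculation behind this solution can be sketched as below, using the figures from the example (hypothesized rate 30%, sample rate 32%, n = 200) and checking both the 5% and the 1% significance levels. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.30      # acceptance rate stated in H0
p_hat = 0.32   # sample acceptance rate
n = 200        # consumers sampled

standard_error = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error
p_value = 1 - NormalDist().cdf(z)   # right-tailed test, H1: P > 0.30

for alpha in (0.05, 0.01):
    critical = NormalDist().inv_cdf(1 - alpha)
    decision = "reject H0" if z > critical else "do not reject H0"
    print(f"alpha = {alpha}: Z = {z:.2f}, critical = {critical:.3f} -> {decision}")
print(f"p-value = {p_value:.3f}")
```
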
 We have no evidence to reject the null hypothesis based
on the sample data at 5% level of significance.
 In this case even at 1% level of significance, we cannot
reject H0.
 This implies that you accept H0 and conclude that the
population proportion of consumer acceptance is less
than or equal to 30%.
 Hence, the manager should not introduce the new
product in the market.
 You may wonder why, when the sample proportion
is 32%, you should not introduce the new
product.
 Is 32% not better than the stipulated 30%? Yes, but
statistically speaking, a sample proportion of 32% can
easily arise by chance even when the true rate is 30% or less.
 This is why you say statistically not significant. As long
as statistical significance does not take place, you
Hypothesis Testing (Two Independent
Samples): About Differences between the
Means of Two Populations Using the Normal
Probability Distribution
SESSION LEARNING OBJECTIVES:
 By the end of this session students
are expected to:
 Test hypotheses about the
difference between the means of two
populations (µ1 − µ2) using large
and independent samples
 Construct a confidence interval for
(µ1 − µ2) using the z distribution.
HYPOTHESIS TESTING (TWO INDEPENDENT
SAMPLES):
 The procedure for testing a
hypothesis concerning the
difference between two
population means is similar to
that for testing a hypothesis
concerning the value of one
population mean.
HYPOTHESIS TESTING (TWO INDEPENDENT SAMPLES):
 Use of the normal distribution is based on
the same conditions as in the one-sample
case, except that two independent
random samples are involved.
 Two samples are independent if
 they are drawn from two different populations, and
 the elements of one sample have no
relationship to the elements of the second
sample.
 If the samples are related in some way, they are
dependent.
HYPOTHESIS TESTING (TWO INDEPENDENT SAMPLES):
 When the standard deviations (σ1 and σ2) of the two
populations are known, the general formula for the z value
used to test a hypothesis concerning the difference between
two means is:
Z = ((x̅1 − x̅2) − (µ1 − µ2))/√(σ1²/n1 + σ2²/n2)
where (µ1 − µ2) is the difference stated in H0 (usually 0).
HYPOTHESIS TESTING (TWO INDEPENDENT
SAMPLES):
 Note: if the population standard
deviations are unknown, the respective
sample standard deviations (S1 & S2)
are used to approximate (σ1 & σ2),
as long as n ≥ 30 for both samples. Thus:
Z = ((x̅1 − x̅2) − (µ1 − µ2))/√(S1²/n1 + S2²/n2)
 Note: S1 & S2 are the standard deviations of the two samples.
ASSUMPTIONS FOR THIS TEST ARE:
 The populations should be normal
 The standard deviations (σ) of both
populations should be known; if not, use
the respective sample standard
deviations (S) to approximate the
population standard deviations, as long
as n ≥ 30 for both samples.
 The null & alternate hypotheses are
formulated in any of the following ways:
 H0: µ1 = µ2 or µ1 − µ2 = 0;
 H1: µ1 ≠ µ2 (µ1 − µ2 ≠ 0); or µ1 > µ2 (µ1 − µ2 > 0); or µ1 < µ2 (µ1 − µ2 < 0)
 Moreover, a confidence interval for the difference
between the means of two populations, µ1 − µ2, at a
given confidence level is constructed from
independent samples as follows:
(x̅1 − x̅2) ± Zα/2 √(σ1²/n1 + σ2²/n2)
EXAMPLE
A civil group in the city claims that a female
college graduate earns less than a male college
graduate.
 To test this claim, a survey of the starting salaries of
60 male graduates and 50 female graduates was
taken. The average starting
salary for female graduates was Birr 29,500 with a
standard deviation of Birr 500, and the average starting
salary for male graduates was Birr 30,000 with a
standard deviation of Birr 600.
a) At 1% level of significance, test if the claim of
this civil group is valid;
b) Develop confidence interval for the difference of
the means salaries at the specified confidence
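One way to work both parts of this example is sketched below. It treats the female graduates as sample 1 and the male graduates as sample 2, tests H0: µ1 ≥ µ2 against H1: µ1 < µ2 at α = 0.01, and uses the sample standard deviations in place of σ1 and σ2 since both samples have n ≥ 30. The 99% confidence level is an assumption chosen to match the 1% significance level.

```python
from math import sqrt
from statistics import NormalDist

# Sample 1 = female graduates, sample 2 = male graduates (figures from the example).
n1, mean1, s1 = 50, 29_500, 500
n2, mean2, s2 = 60, 30_000, 600

# a) Left-tailed test: H0: mu1 >= mu2, H1: mu1 < mu2, alpha = 0.01.
alpha = 0.01
standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
z = (mean1 - mean2) / standard_error
critical = NormalDist().inv_cdf(alpha)        # about -2.33
reject_h0 = z < critical

# b) Two-sided interval for mu1 - mu2 at the ASSUMED 99% confidence level.
z_half = NormalDist().inv_cdf(1 - alpha / 2)  # about 2.58
margin = z_half * standard_error
lower = (mean1 - mean2) - margin
upper = (mean1 - mean2) + margin

print(f"Z = {z:.2f}, critical = {critical:.2f}, reject H0: {reject_h0}")
print(f"99% CI for mu1 - mu2: ({lower:.1f}, {upper:.1f}) Birr")
```

The whole interval lies below zero, which is consistent with rejecting H0 in part (a).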
Topic: Hypothesis Testing (Two
Dependent/Paired Samples):
About Difference between
Two Population Means
…. PAIRED OR DEPENDENT
 Inferences about the difference between
parameters of two populations may be based on
independent or dependent random samples
(paired/matched observations).
 Independent random samples are in no way
related to each other; we observe different groups of
persons or things at different times or under different
sets of circumstances.
 In contrast, in paired observations (dependent
samples), the same group of persons or things is
observed at two different times (‘before’ and ‘after’)
or under two different sets of circumstances or
‘treatments’.
 In fact, population parameters may differ at
two different times or under two different
sets of circumstances or treatments because:
 The circumstances differ between times or
treatments, and the people or things in the
different groups are themselves different.
 Thus, compared to independent samples, by
looking at paired-observations, we are able to
minimize the ‘between group’, extraneous
variation.
 In sum, when we can pair or match observations
that measure differences for a common variable, a
hypothesis test based on dependent samples is
more sensitive to detecting a significant difference
than a hypothesis test based on independent
samples.
THERE ARE TWO TYPES OF DEPENDENT SAMPLES
(SITUATIONS):
1. A matching or pairing of the observations; for
instance, postgraduate students defending their
thesis in a university system are evaluated by 2
examiners, an external and an internal examiner.
 Suppose a postgraduate program coordinator wants to
assess whether the average
evaluations of external and internal examiners are
consistent, and takes a random sample of 12 postgraduate
students’ scores.
 This sample, consisting of 12 paired observations of
scores by the external and internal examiners, is a good
example of a dependent sample.
 Each pair relates to one student.
2. Those characterized by a
measurement, an intervention of some
type, and then another measurement.
 In the example below, training believed
to improve production efficiency was
given to a random sample of 10 workers.
 Their efficiency was measured before
and after the training (intervention),
and efficiency scores were recorded.
THE TEST FOR PAIRED DIFFERENCES
 The hypothesis test is interested in the
distribution of the differences.
 The symbol µd is used to indicate the
population mean of the distribution of
differences.
 Most commonly, it is investigated
whether the mean of the distribution of
differences concerning the values of the
variable under study is 0
THE TEST FOR PAIRED DIFFERENCES
 E.g. H0: µd = 0; H1: µd ≠ 0, where µd is the true mean difference.
 Note: the test can be done as one-tailed or two-tailed.
 We assume the population of
differences is approximately normally distributed and
the population standard deviation is unknown; hence, the
test statistic follows the t distribution.
 Consequently, for a random sample of paired observations
the 'difference' between each pair of data is first
calculated and then these differences are treated as a
single set of data (sample) in order to consider whether
there has been any significant change or whether
differences could have occurred by chance.
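 The test statistic is calculated as follows:
t = d̅/(sd/√n), with n − 1 degrees of freedom,
where d̅ is the mean of the sample differences, sd is their
standard deviation, and n is the number of pairs.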
EXAMPLE
 The production manager wants to find out whether a
unique training program will increase employee
efficiency. He plans to take a random sample of 10 and
record their efficiency before the training starts. After
completion of the program, the efficiency of the same
sample of employees will be recorded. There will be a
pair of efficiency ratings for each member of the sample.
A test of hypothesis with 5% significance level is
conducted to find out if there is a difference between
rating before and after the training program. It is called
paired difference test.
 For the test of hypothesis to be
conducted, there is only one sample,
not two.
 We are testing the hypothesis that
the distribution of the differences
has a mean (µd) 0.
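The paired difference test can be sketched as below. The example gives no efficiency ratings, so the before and after scores in this sketch are hypothetical values invented purely for illustration; only the procedure (difference each pair, then run a one-sample t test on the differences) comes from the text. The critical value 2.262 is the two-tailed t table entry for α = 0.05 with 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (mean difference, t) for paired observations, df = n - 1."""
    differences = [a - b for a, b in zip(after, before)]
    d_bar = mean(differences)
    s_d = stdev(differences)  # sample standard deviation (n - 1 divisor)
    t = d_bar / (s_d / sqrt(len(differences)))
    return d_bar, t


# HYPOTHETICAL efficiency ratings for 10 employees; not data from the example.
before = [60, 62, 58, 65, 70, 55, 63, 68, 61, 59]
after = [64, 63, 60, 69, 71, 59, 66, 70, 65, 60]

d_bar, t = paired_t_statistic(before, after)
T_CRITICAL = 2.262  # two-tailed t table value for alpha = 0.05, df = 9
reject_h0 = abs(t) > T_CRITICAL

print(f"mean difference = {d_bar:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```

Only the single list of differences enters the test, which is why the procedure is treated as a one-sample problem.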
