
Chapters 15, 16, 17, 21: Inferences for Proportions

We will now start to cover inferential statistics, which is making conclusions about the population
given our sample data. The results we obtained for sampling distributions are based on knowing
the true population parameters in the models. Generally, these values are unknown, but we
would like to estimate the true parameters.

A Confidence Interval

Recall: The sampling distribution of the proportion is approximately Normal with mean p and
standard deviation sqrt(p(1 - p)/n).

We typically do not know the true population proportion p, so we take an SRS and estimate it with
p̂. When we use a single number to estimate a parameter, this is called a point estimate.
E.g. the sample proportion p̂ is a point estimate of the population proportion p, and the sample
mean x̄ is a point estimate of the population mean μ.

Whenever we estimate the standard deviation of a sampling distribution, we call it a standard
error. Thus, given a sample proportion p̂, the standard error is

SE(p̂) = sqrt(p̂(1 - p̂)/n)
What can we say about the true population proportion given its sample proportion and standard
error?

We can use an interval estimate to provide an interval containing p with a stated level of
confidence. These are called confidence intervals. A confidence interval generally takes the form

Estimate ± ME = Estimate ± Critical Value × SE(Estimate)

Margin of Error

In a confidence interval, the extent of the interval on either side of the observed statistic value is
called the margin of error (ME). A margin of error is typically the product of a critical value
from the sampling distribution and a standard error from the data.

For a proportion, the margin of error is given by

ME = z* × SE(p̂)

where z* is the critical value. A critical value is the number of standard errors to move away
from the mean of the sampling distribution to correspond to a specified level of confidence.
For a standard Normal model, we denote the critical value z*. For a 95% confidence level, the
critical value is z* = 1.96 since 95% of values fall within 1.96 standard deviations of the mean.

Example 15.1: Find the critical value z* corresponding to a 99% confidence level.

We need the central 99% of the standard Normal, which leaves (1 - 0.99)/2 = 0.005 in each tail.
So we need z* with P(Z < z*) = 0.995, which gives z* = 2.5758... ≈ 2.576.
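As a quick check, z* can be computed from the inverse CDF of the standard Normal. A minimal sketch using Python's standard library (the helper name `critical_value` is ours, not from the notes):

```python
from statistics import NormalDist

def critical_value(confidence: float) -> float:
    """z* such that the central `confidence` fraction of the standard
    Normal lies within +/- z* of the mean."""
    # Leave (1 - C)/2 in each tail, so P(Z < z*) = (1 + C)/2.
    return NormalDist().inv_cdf((1 + confidence) / 2)

print(round(critical_value(0.99), 3))  # 2.576, as in Example 15.1
print(round(critical_value(0.95), 3))  # 1.96
```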
One-proportion z-interval

Assumptions and Conditions

Independence Assumption: If the data were sampled at random or generated from a properly
randomized experiment, they satisfy the Randomization Condition. Proper randomization can
help ensure independence.

10% Condition: The sample should be no more than 10% of the population; sampling more than
10% of the population will make the usual formula overestimate the true SE.

Success/Failure Condition (sample size): np̂ ≥ 10 and n(1 - p̂) ≥ 10.


When the conditions are met, we are ready to find a level C confidence interval for the
population proportion p.

The confidence interval is

p̂ ± z* × SE(p̂), where SE(p̂) = sqrt(p̂(1 - p̂)/n),

and the critical value z* specifies the number of SEs needed for C% of random samples to yield
confidence intervals that capture the true parameter value.

Commonly used confidence levels and critical values for the z-interval:

Confidence Level (%)    Critical Value
90                      1.645
95                      1.96
99                      2.575
Example 15.2: Consider flipping a coin 100 times. The results show 39 heads. Based on this
result, what interval captures the most likely 95% of the values for the true proportion of heads?
Does the confidence interval suggest that this coin is fair?

p̂ = 39/100 = 0.39, z* = 1.96
SE(p̂) = sqrt(0.39(1 - 0.39)/100) ≈ 0.04877

Confidence interval: p̂ ± z* SE(p̂)
= 0.39 ± 1.96(0.04877...)
= 0.39 ± 0.09559...
= (0.2944, 0.4856)

We are 95% confident that the true proportion of heads is between 29.4% and 48.6%. This does
NOT mean that there is a 95% chance that the true proportion of heads is between 29.4% and
48.6%. Since 50% is not within the interval, the data suggest that the coin is not fair.

Interpreting Confidence Intervals
What do we mean by saying we have 95% confidence that our interval contains the true
proportion?

What we mean is that "95% of samples of this size will produce confidence intervals that capture
the true proportion." We typically say "We are 95% confident that the true proportion lies in our
interval."

So if we repeatedly take random samples of the same size and calculate confidence intervals
for each sample proportion, then 95% of the intervals will capture the true proportion.

The following graph shows 20 confidence intervals obtained from 20 random samples. Each dot
represents the sample proportion and the horizontal line is the true proportion. Notice that some
confidence intervals capture the true proportion while others do not.
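The one-proportion z-interval from Example 15.2 can be sketched end to end in Python; this is an illustrative helper (not part of the notes), using only the standard library:

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(x, n, confidence=0.95):
    """Level-C confidence interval for a proportion: p-hat +/- z* SE(p-hat)."""
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)               # standard error of p-hat
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return p_hat - z_star * se, p_hat + z_star * se

lo, hi = one_prop_z_interval(39, 100)                # Example 15.2
print(round(lo, 4), round(hi, 4))                    # 0.2944 0.4856
# 0.5 is outside the interval, consistent with the coin not being fair.
```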

Example 15.3: For a school project, a student randomly sampled 182 other students at a
university to determine if the majority of students were in favour of a proposal for a new
cafeteria. The student found that 75 were in favour of the proposal.

A) Find the 95% confidence interval for the proportion of students that are in favour of
the proposal.

p̂ = 75/182 = 0.41209..., z* = 1.96
SE(p̂) = sqrt(0.41209(1 - 0.41209)/182) ≈ 0.03648

Confidence interval: p̂ ± z* SE(p̂)
= 0.41209... ± 1.96(0.03648...)
= 0.41209... ± 0.07151...
= (0.341, 0.484)

We are 95% confident that the true proportion of students in favour of the proposal is between
34.1% and 48.4%. Since the entire interval lies below 50%, the data do not suggest that a
majority of students are in favour.
B) Find the 99% confidence interval for p.

p̂ = 0.41209..., z* = 2.575, SE(p̂) ≈ 0.03648

Confidence interval: p̂ ± z* SE(p̂)
= 0.41209... ± 2.575(0.03648...)
= 0.41209... ± 0.09395...
= (0.318, 0.506)

We are 99% confident that the true proportion of students in favour of the proposal is between
31.8% and 50.6%.

C) Find the 95% confidence interval for p if the student had randomly sampled 364
students instead and found that 148 were in favour of the proposal.

p̂ = 148/364 = 0.40659..., z* = 1.96
SE(p̂) = sqrt(0.40659(1 - 0.40659)/364) ≈ 0.02575

Confidence interval: p̂ ± z* SE(p̂)
= 0.40659... ± 1.96(0.02575...)
= 0.40659... ± 0.05046...
= (0.356, 0.457)

We are 95% confident that the true proportion of students in favour of the proposal is between
35.6% and 45.7%.
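Parts A and C use roughly the same sample proportion but double the sample size, so the margin of error shrinks by about a factor of sqrt(2). A short sketch to verify this (illustrative code, not part of the notes):

```python
from math import sqrt
from statistics import NormalDist

z_star = NormalDist().inv_cdf(0.975)   # 95% critical value, about 1.96

def margin_of_error(x, n):
    p_hat = x / n
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

me_a = margin_of_error(75, 182)        # part A: about 0.0715
me_c = margin_of_error(148, 364)       # part C: about 0.0505
print(me_a / me_c)                     # close to sqrt(2) = 1.414...
```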


The Plus Four Confidence Interval for Small Samples

When the Success/Failure Condition fails, we can make an adjustment to the confidence
interval by adding four observations to the sample: two successes and two failures. This is
called the Plus Four Confidence Interval. So instead of p̂ = x/n, we use the adjusted
proportion

p̃ = (x + 2)/(n + 4)

and the confidence interval is

p̃ ± z* sqrt(p̃(1 - p̃)/(n + 4))

This method gives better performance overall, especially for proportions near 0 or 1.

Example 15.4: Surgeons examined their results to compare two methods for a surgical
procedure used to alleviate pain on the outside of the wrist. A new method was compared with
the traditional "freehand" method for the procedure. Of 45 operations using the "freehand"
method, 3 were unsuccessful, for a failure rate of 6.7%. With only three failures, the data
doesn't satisfy the Success/Failure Condition, so we can't use the regular Normal-based
confidence interval. What is the 95% confidence interval using the Plus Four method?

x = 3, n = 45, p̃ = (3 + 2)/(45 + 4) = 5/49 = 0.10204..., z* = 1.96

Confidence interval: p̃ ± z* sqrt(p̃(1 - p̃)/(n + 4))
= 0.10204... ± 1.96(0.04324...)
= 0.10204... ± 0.08476...
= (0.017, 0.187)

Using the Plus Four method, we are 95% confident that the true failure rate is between 1.7%
and 18.7%.
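The Plus Four computation above can be sketched as follows (hypothetical helper name, standard library only):

```python
from math import sqrt
from statistics import NormalDist

def plus_four_interval(x, n, confidence=0.95):
    """Plus Four interval: add 2 successes and 2 failures, then proceed
    as for the usual one-proportion z-interval."""
    p_tilde = (x + 2) / (n + 4)
    se = sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return p_tilde - z_star * se, p_tilde + z_star * se

lo, hi = plus_four_interval(3, 45)     # Example 15.4
print(round(lo, 3), round(hi, 3))      # 0.017 0.187
```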
Choosing a Sample Size
An important step in planning any study is choosing a sample size. If we want to attain a certain
margin of error with a specified level of confidence, then we can rearrange the formula for ME:

ME = z* sqrt(p̂(1 - p̂)/n)
sqrt(n) = z* sqrt(p̂(1 - p̂))/ME
n = (z*/ME)^2 p̂(1 - p̂)

When computing sample sizes we always round up (to ensure the confidence level we want).
Note: This formula requires p̂ to be known or estimated. If we don't know p̂, we can guess a
value. To be safe, we take p̂(1 - p̂) to be the largest value, which happens when p̂ = 0.5.

Example 15.5: Suppose a TV executive would like to find a 95% confidence interval estimate to
within 0.03 for the proportion of all households that watch The Big Bang Theory regularly.

A) How large of a sample is needed? z* = 1.96, ME = 0.03, p̂ = 0.5

n = (1.96/0.03)^2 (0.5)(1 - 0.5) = 1067.11...

Our sample size should be at least 1068.

B) How large of a sample is needed if a prior estimate p̂ = 0.15 was obtained?

p̂ = 0.15, z* = 1.96, ME = 0.03

n = (1.96/0.03)^2 (0.15)(1 - 0.15) = 544.23...

Our sample size should be at least 545.
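The sample-size formula can be sketched as a small function; the `ceil` implements the round-up rule, and the function name is ours:

```python
from math import ceil
from statistics import NormalDist

def sample_size(me, confidence=0.95, p_hat=0.5):
    """Smallest n whose margin of error is at most `me`.  The default
    p_hat = 0.5 is the conservative choice: p(1 - p) is largest there."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return ceil((z_star / me) ** 2 * p_hat * (1 - p_hat))

print(sample_size(0.03))               # 1068 (part A)
print(sample_size(0.03, p_hat=0.15))   # 545 (part B)
```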

Testing Hypotheses About Proportions

A hypothesis test is a test using sample statistics to make a decision between two different
claims about a population parameter.

Hypotheses are models that we adopt temporarily, until we can test them once we have data.

The null hypothesis is the claim being assessed in a hypothesis test, denoted by H0. We
assume that the null hypothesis is true until we have evidence to suggest that it is false. It
generally is the hypothesis of "no change" or "no effect", and we write H0: p = p0.

The alternative hypothesis, denoted by HA, proposes what we should conclude if we find the
null hypothesis to not be plausible. These can take the form HA: p ≠ p0, HA: p > p0, or
HA: p < p0.

The test statistic is the calculated statistic that measures the number of standard deviations
the estimate in the sample is away from the hypothesized value in H0.

The P-value is the probability of observing a test statistic at least as far from the hypothesized
parameter as the statistic value actually observed, if the null hypothesis is true.

For a hypothesis test with H0: p = p0:
a two-sided alternative hypothesis is HA: p ≠ p0 (two-tailed test)
a one-sided alternative hypothesis focuses on deviations from the null hypothesis
in only one direction: HA: p < p0 (lower-tailed test) or HA: p > p0 (upper-tailed test)

Conclusions about any null hypothesis should always be accompanied by the P-value of the test.

Choosing the Hypotheses

The appropriate null arises from the context of the problem.

To write a null hypothesis, identify a parameter and choose a null value that relates to the
question at hand.

Example 16.1: Write the appropriate null and alternative hypotheses for each of the following
scenarios.

A) I wish to test if a coin is unfair. Let p represent the proportion of heads.
(Coin is fair / Coin is not fair)
H0: p = 0.5    HA: p ≠ 0.5

B) I wish to test if a coin is biased towards heads.

H0: p = 0.5    HA: p > 0.5

C) I wish to test if more than 5% of students fall asleep during class.

H0: p = 0.05    HA: p > 0.05

D) I wish to test if less than 60% of students like vanilla ice cream.

H0: p = 0.6    HA: p < 0.6


How to think about P-values

A P-value is a conditional probability. It tells us the probability of getting results at least as
unusual as the observed statistic, given that the null hypothesis is true:

P-value = P(observed statistic value [or even more extreme] | H0)

A small P-value means that the result we observed is unlikely to occur if the null hypothesis is
true. The P-value should serve as a measure of the strength of the evidence against the null
hypothesis.

A big P-value means that the result we observed is not surprising. The results are in line with
our assumption that the null hypothesis models the world, so we have no reason to reject it. A
big P-value doesn't prove that the null hypothesis is true, but it offers no evidence that it's not
true.

The calculation of the P-value is dependent on the test type:

Test Type    Lower Tail    Upper Tail    Two Tails
P-value      P(z < z0)     P(z > z0)     2P(z < -|z0|) or 2P(z > |z0|)
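Each row of the table can be computed directly from the standard Normal CDF. A minimal sketch (illustrative Python, using the stdlib's `NormalDist`; the helper name is ours):

```python
from statistics import NormalDist

Phi = NormalDist().cdf                 # standard Normal CDF

def p_value(z0, tail):
    if tail == "lower":
        return Phi(z0)
    if tail == "upper":
        return 1 - Phi(z0)
    return 2 * (1 - Phi(abs(z0)))      # two-tailed

print(round(p_value(-2.2, "two"), 4))  # 0.0278, as in Example 16.2 below
```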

How do we decide whether to reject the null hypothesis?

We can define a threshold called the alpha level or significance level, α.

If our P-value falls below α, then we reject the null hypothesis. We call such results
statistically significant. Common alpha values are 0.10, 0.05, and 0.01.

The alpha level is chosen appropriately for the situation. For example, if we are assessing the
safety of air bags, then you will want a low alpha level, say α = 0.01 or smaller. If we want to know
whether people prefer pizza with or without pineapple, then you might be fine with α = 0.10.
We must select the alpha level before we look at the data; otherwise we can manipulate our
significance level to suit the data.

When the P-value is greater than α, we fail to reject the null hypothesis: there is
insufficient evidence to reject the null hypothesis.

Steps in a Hypothesis Test

1. State the null and alternative hypotheses, H0 and HA.
2. Check the assumptions and conditions.
3. Calculate the test statistic.
4. Calculate a P-value or an interval containing the P-value.
5. Decide to reject or fail to reject the null hypothesis.
6. State your conclusion in the context of the problem.

The one-proportion z-test is a test of the null hypothesis that the proportion parameter for a
single population equals a specified value (H0: p = p0) by referring the statistic

z = (p̂ - p0) / sqrt(p0(1 - p0)/n)

to a standard Normal model.

Assumptions and Conditions

Independence Assumption: Sampled values are independent of each other.

Randomization Condition: SRS or random assignment.

Success/Failure Condition: Expected counts are at least 10,

np0 ≥ 10 and n(1 - p0) ≥ 10.

10% Condition: Sample is no larger than 10% of the population.

Example 16.2: Let p be the probability of obtaining heads on a coin flip. Mike claims that the
coin is fair, but Rosana wants to test whether Mike's claim is true. Rosana took a random
sample of size 100 and the results showed 39 heads. Carry out a test of hypothesis with a
significance level of α = 0.05 and state your conclusion.

H0: p = 0.5, HA: p ≠ 0.5 (two-sided alternative)

Independence: the Randomization Condition is satisfied.
Success/Failure Condition: np0 = 100(0.5) = 50 ≥ 10, n(1 - p0) = 100(0.5) = 50 ≥ 10.
10% Condition: n = 100 is not larger than 10% of the population of all possible coin flips.

p̂ = 39/100 = 0.39

Test statistic: z = (p̂ - p0)/sqrt(p0(1 - p0)/n) = (0.39 - 0.5)/sqrt(0.5(0.5)/100) = -0.11/0.05 = -2.2

Two-tailed test: P-value = 2P(z < -2.2) = 2(0.0139) = 0.0278

Since the P-value = 0.0278 < α = 0.05, we reject the null hypothesis. There is sufficient
evidence to conclude that the coin is not fair.
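The test in Example 16.2 can be sketched end to end; an illustrative helper (not from the notes), standard library only:

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0, tail="two"):
    """One-proportion z-test; returns (z, P-value).  Note the SD in the
    denominator uses the hypothesized p0, not the sample p-hat."""
    z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    Phi = NormalDist().cdf
    if tail == "lower":
        return z, Phi(z)
    if tail == "upper":
        return z, 1 - Phi(z)
    return z, 2 * (1 - Phi(abs(z)))

z, p = one_prop_z_test(39, 100, 0.5)   # Example 16.2
print(round(z, 2), round(p, 4))        # -2.2 0.0278
```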
Confidence Intervals and Hypothesis Tests

Both confidence intervals and hypothesis tests have the same assumptions and conditions.
The results for a hypothesis test should be consistent with the corresponding confidence
interval. Since confidence intervals extend to both sides of the point estimate, they correspond
to two-tailed tests.

For a significance level α of a two-tailed hypothesis test, the confidence level of the
corresponding confidence interval is (1 - α) · 100%.

For example, a two-tailed hypothesis test with α = 0.05 has a confidence interval with a
confidence level of (1 - 0.05) · 100% = (0.95) · 100% = 95%.

For a significance level α of a one-tailed hypothesis test, the confidence level of the
corresponding confidence interval is (1 - 2α) · 100%.

For example, a one-tailed hypothesis test with α = 0.05 has a confidence interval with a
confidence level of (1 - 2(0.05)) · 100% = (0.90) · 100% = 90%.

Recall: The 95% confidence interval for tossing 100 coins in Example 15.2 was (29.4%, 48.6%).
The hypothesis test with a significance level of α = 0.05 produced a P-value of 0.0278, which showed
evidence to support the claim that the coin is not fair. This is consistent with the 95%
confidence interval, since 50% lies outside it.

Example 16.3: If we had chosen our significance level to be α = 0.01 instead for the coin flip
hypothesis test, would the conclusions be the same as with the significance level of α = 0.05?
Verify the result by constructing the corresponding confidence interval.

P-value = 0.0278 > 0.01 = α, so we would fail to reject the null hypothesis. We don't have
sufficient evidence to conclude that the coin isn't fair.

Corresponding confidence level: (1 - α) · 100% = (1 - 0.01) · 100% = 99%.

From Example 15.2: p̂ = 0.39, SE(p̂) ≈ 0.04877, z* = 2.575

Confidence interval: p̂ ± z* SE(p̂) = 0.39 ± 2.575(0.04877...) = (0.264, 0.516)

Since 50% lies inside this interval, the confidence interval is consistent with failing to reject
the null hypothesis.
Example 16.4: Suppose that the proportion of adults above 40 that participate in fitness
activities was 0.8 last year. An advertising campaign that promotes fitness activities is launched
this year and you want to test whether the proportion is higher this year. Assume you take a
random sample of 100 people and find that 85 participate in the activities. Carry out a
hypothesis test with a significance level of α = 0.01 to test your claim.

H0: p = 0.8, HA: p > 0.8 (one-sided alternative)

Randomization: random sample, so we can assume independence.
Success/Failure: expected counts np0 = 100(0.8) = 80 ≥ 10 and n(1 - p0) = 100(0.2) = 20 ≥ 10.
10% Condition: n = 100 is not larger than 10% of the population of adults over 40.

p̂ = 85/100 = 0.85

Test statistic: z = (p̂ - p0)/sqrt(p0(1 - p0)/n) = (0.85 - 0.8)/sqrt(0.8(0.2)/100) = 0.05/0.04 = 1.25

One-tailed test: P-value = P(z > 1.25) = 1 - P(z < 1.25) = 1 - 0.8944 = 0.1056

Since the P-value = 0.1056 > α = 0.01, we fail to reject the null hypothesis. There is insufficient
evidence to conclude that the proportion of adults over 40 that participate in physical activities
is higher this year.

Decision Errors

When we perform a hypothesis test, there are two ways that we can make mistakes:

Type I Error: The null hypothesis is true, but we mistakenly reject it.
Type II Error: The null hypothesis is false, but we fail to reject it.

We can compare the types of errors by using an analogy of the criminal justice system:
H0: the person is innocent    HA: the person is guilty

                       Truth
                       Innocent        Guilty
Decision  Guilty       Type I Error    OK
          Not Guilty   OK              Type II Error

During a trial, the jury may make the following mistakes:

Type I Error: Convict an innocent person.
Type II Error: Set a guilty person free.

                             Truth
                             H0 is true      H0 is false
Decision  Reject H0          Type I Error    OK
          Fail to Reject H0  OK              Type II Error

The probability of a Type I Error, α, is the probability of rejecting H0 even though H0 is true:
P(Reject H0 | H0 is true) = α

The probability of a Type II Error, β, is the probability of failing to reject H0 even though H0 is
false: P(Fail to reject H0 | H0 is false) = β

The power of the test, 1 - β, is the probability of correctly rejecting a false H0.

Note: The two types of errors that occur in a hypothesis test depend on each other. Lowering
the value of α will increase the value of β, and vice versa. The value of α is determined by the
experimenter. The value of β is difficult to calculate since we don't know the true value of the
parameter. Obtaining a larger sample size will decrease β, thus increasing the power of the
test.
Example 17.1: Suppose a report shows that a drug that alleviates migraines turns out to have a
nasty side effect. A researcher wants to study the drug and look for evidence that such side
effects exist.

H0: The drug has no side effect.    HA: The drug has a side effect.

A) If the test yields a low P-value and the researcher rejects the null hypothesis, but there
is actually no ill side effect of the drug, what type of error occurred and what are the
consequences?

Since the researcher rejected a true null hypothesis, this is a Type I error. The drug would not
be produced, and people would be missing out on a drug that could alleviate their migraines.

B) If the test yields a high P-value and the researcher fails to reject the null hypothesis, but
there is a bad side effect of the drug, what type of error occurred and what are the
consequences?

Since the researcher failed to reject the null hypothesis even though the null hypothesis is
false, this is a Type II error. The drug would be produced, but people taking the drug would
suffer the side effects.
Comparing Two Proportions

Given two proportions p̂1 and p̂2, with sample sizes n1 and n2 respectively, the standard
deviation of each sample proportion is

SD(p̂1) = sqrt(p1(1 - p1)/n1),    SD(p̂2) = sqrt(p2(1 - p2)/n2)

Since the samples are independent, the variances add, so for the difference between the two
proportions:

Var(p̂1 - p̂2) = Var(p̂1) + Var(p̂2) = p1(1 - p1)/n1 + p2(1 - p2)/n2

SD(p̂1 - p̂2) = sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)

Assumptions and Conditions when Comparing Proportions

Independence Assumptions

Independent Responses Assumption: Within each group, the data should be based on
results for independently responding individuals.

Randomization Condition: Ideally, the data in each group should be drawn independently and
at random from the target population or generated by a randomized comparative experiment.
Otherwise, we need to argue that the cases may be considered reasonably representative of
some larger populations to which your conclusions may apply.

The 10% Condition: Each sample should be no more than 10% of its population.

Independent Groups Assumption: The two groups we are comparing must also be
independent of each other. Usually, the independence of the groups from each other is evident
from the way the data were collected.

Sample Size Condition

Each of the groups must be large enough. We need larger groups to estimate proportions that
are near 0% or 100%.

Success/Failure Condition: Both groups are big enough that at least 10 successes and at
least 10 failures have been observed in each, or will be expected in each (when testing a
hypothesis): n1p̂1 ≥ 10, n1(1 - p̂1) ≥ 10, n2p̂2 ≥ 10, n2(1 - p̂2) ≥ 10

A Confidence Interval for the Difference Between Two Proportions

The Sampling Distribution Model for a Difference Between Two Independent Proportions

Provided that the sampled responses are independent, the samples are independent, and the
sample sizes are large enough, the sampling distribution of p̂1 - p̂2 can be modelled by a
Normal model with mean p1 - p2 and standard deviation

SD(p̂1 - p̂2) = sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)

However, we don't know the true values, so we estimate the standard deviation with the
observed proportions.

A Two-Proportion z-Interval

When the conditions are met, we are ready to find the confidence interval for the difference of
two proportions, p1 - p2. The confidence interval is

(p̂1 - p̂2) ± z* · SE(p̂1 - p̂2)

where we find the standard error of the difference from the observed proportions:

SE(p̂1 - p̂2) = sqrt(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2)
Example 21.1: Suppose we want to compare two different therapies. The criterion for the
comparison is the probability of surviving at least 5 years after therapy. The study produced the
following data:

                                              Therapy 1    Therapy 2
Number of people sampled                      100          80
Number of people who survived at least 5 yr   90           70

Find the 95% confidence interval for the difference in proportions.

Independence: It is reasonable to assume that the survival of one person does not affect the
survival of another. We also need to assume that each sample is representative of its population.
10% Condition: We must assume that each sample is no more than 10% of its population.
Independent Groups: We assume that the survival rate in one group does not affect the survival
rate of the other group.
Success/Failure Condition: p̂1 = 90/100 = 0.9, p̂2 = 70/80 = 0.875
n1p̂1 = 100(0.9) = 90 ≥ 10, n1(1 - p̂1) = 100(0.1) = 10 ≥ 10
n2p̂2 = 80(0.875) = 70 ≥ 10, n2(1 - p̂2) = 80(0.125) = 10 ≥ 10

z* = 1.96 (95% confidence)
SE(p̂1 - p̂2) = sqrt(0.9(0.1)/100 + 0.875(0.125)/80) ≈ 0.04761

Confidence interval: (p̂1 - p̂2) ± z* SE(p̂1 - p̂2)
= (0.9 - 0.875) ± 1.96(0.04761...)
= 0.025 ± 0.09332...
= (-0.068, 0.118)

We are 95% confident that the true difference between the proportions is between -6.8% and
11.8%. Since 0 lies inside the interval, the data do not show a significant difference between
the two therapies.
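The interval in Example 21.1 can be sketched as follows (illustrative helper name, standard library only):

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

lo, hi = two_prop_z_interval(90, 100, 70, 80)   # Example 21.1
print(round(lo, 3), round(hi, 3))               # -0.068 0.118
# 0 is inside the interval: no significant difference between the therapies.
```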

The z-test for a Difference Between Proportions

Hypothesis Testing for the true difference between proportions:

H0: p1 - p2 = 0

We know that

SE(p̂1 - p̂2) = sqrt(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2)

But since we assumed that H0 is true, the two proportions are equal, so we should have a
single p̂ in the SE formula.

Whenever we have data from different sources or different groups that we believe came from
the same underlying population, we can combine the successes to get an overall proportion.
This is called pooling.

When we have counts for each group, we can find the pooled proportion as

p̂pooled = (Success1 + Success2)/(n1 + n2) = (n1p̂1 + n2p̂2)/(n1 + n2)

When we have proportions and not the counts for each group, then we have to reconstruct the
number of successes, and we typically round these to whole numbers:

Success1 = n1p̂1,    Success2 = n2p̂2

SEpooled(p̂1 - p̂2) = sqrt(p̂pooled(1 - p̂pooled)/n1 + p̂pooled(1 - p̂pooled)/n2)
                 = sqrt(p̂pooled(1 - p̂pooled)(1/n1 + 1/n2))

Two-Proportion z-Test

The conditions for the two-proportion z-test are the same as for the two-proportion z-interval.
We are testing the hypothesis

H0: p1 - p2 = 0

Because we hypothesize that the proportions are equal, we pool the groups to find

p̂pooled = (n1p̂1 + n2p̂2)/(n1 + n2)

and use that pooled value to estimate the standard error

SEpooled(p̂1 - p̂2) = sqrt(p̂pooled(1 - p̂pooled)(1/n1 + 1/n2))

We can find the test statistic

z = (p̂1 - p̂2 - 0) / SEpooled(p̂1 - p̂2)

When the conditions are met and the null hypothesis is true, the statistic follows the standard
Normal model, so we can use that model to obtain a P-value.

Example 21.2: Suppose that the lottery commissioner's office in a state wants to find out if the
proportions of men and women who play the lottery are different. A random sample of 500 men
taken by the commissioner's office showed that 160 of them play the lottery and a random
sample of 300 women showed that 66 of them play the lottery. Carry out a hypothesis test
with a significance level of α = 0.01. Can we conclude that the proportions of all men and
women in this state who play the lottery are different?

H0: pM - pW = 0, HA: pM - pW ≠ 0

Independence/Randomization Condition: random samples in each group, so we assume that
one person playing the lottery is independent of another person playing the lottery.
10% Condition: we can assume that 500 men and 300 women are no larger than 10% of the
men and women in the state.
Success/Failure Condition: for the men there are 160 successes and 340 failures; for the
women there are 66 successes and 234 failures. All counts are at least 10, so we have large
enough samples.

p̂M = 160/500 = 0.32, p̂W = 66/300 = 0.22

Pooled proportion: p̂pooled = (160 + 66)/(500 + 300) = 226/800 = 0.2825

SEpooled(p̂M - p̂W) = sqrt(0.2825(1 - 0.2825)(1/500 + 1/300)) ≈ 0.03288

Test statistic: z = (0.32 - 0.22)/0.03288 ≈ 3.04

Two-tailed test: P-value = 2P(z > 3.04) = 2(0.0012) = 0.0024

Since the P-value = 0.0024 < α = 0.01, we reject the null hypothesis. There is sufficient
evidence to conclude that the proportions of men and women in this state who play the lottery
are different.