
Stat 110, Lecture 12

Sampling Distributions, Estimation, and Hypothesis Testing (II)
bheavlin@stat.stanford.edu
Statistics

• No data → Probability
• Some data → Inferential Statistics (sampling distributions, hypothesis testing, estimation)
• Way too much data → Descriptive Statistics

Stat 110 bheavlin@stat.stanford.edu


topics

• comparing proportions
• paired vs two-sample (again)
• sample size calculations
• hypothesis testing
• power transformations
• the other two-sample test
• the k-sample problem



odd and even columns have different test heads

           even   odd   total
yields      40     28     68
no yield    60     72    132
total      100    100    200

Three measures:
1. risk reduction: 40/100 – 28/100 = 0.12
2. relative risk: (72/100)/(60/100) = 1.20
3. odds ratio: (40/60)/(28/72) = 1.71
Terms for Comparing Two Probabilities
Risk reduction:
• Rate1 – Rate2 (good≡bad)
Relative risk:
• Rate1 / Rate2 (bad, usually)
Odds:
• Rate / (1– Rate) (good≡bad)
Odds ratio:
• ratio of two odds (good≡bad)
• [Rate1/(1– Rate1)]/[Rate2/(1– Rate2)]
… comparing probabilities

Risk reduction (delta)
• pluses: simplest; aids cost-benefit analyses; smaller sample sizes
• minuses: awkward to model; additive model less physical

Relative risk
• pluses: more physical for modeling and extrapolation
• minuses: no symmetry between Pr{A} and Pr{not A}; requires prospective data; larger sample sizes

Odds ratio
• pluses: easy to model; can use with retrospective datasets, rare events; simple formula for std err
• minuses: less physical than relative risk; harder to explain


… comparing probabilities

          New   Old
#fail      a     b
#pass      c     d
total     nN    nO

notation: nN = a+c, nO = b+d, pN = a/nN, pO = b/nO

Which Index          Estimate                                  Standard Error²
Risk reduction       pN – pO                                   pN(1–pN)/nN + pO(1–pO)/nO
Loge relative risk   loge(pN/pO)                               (1–pN)/a + (1–pO)/b
Loge odds ratio      loge(ad/bc) = loge[pN(1–pO)/((1–pN)pO)]   1/a + 1/b + 1/c + 1/d


Example confidence intervals

                 point estimate   standard error   lower conf limit   upper conf limit
risk reduction        0.12             0.066            –0.013             0.253
relative risk         1.2                                0.977             1.47
(log RR)              0.182            0.103            –0.023             0.388
odds ratio            1.71                               0.937             3.14
(log OR)              0.539            0.302            –0.065             1.14
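These intervals can be reproduced from the 2×2 yield table (even: 40 yields / 60 no-yields; odd: 28 / 72) with the standard-error formulas on the previous slide. A minimal Python sketch (not part of the original slides):

```python
import math

# risk reduction on the yield scale: 40/100 - 28/100
p_even, p_odd = 40 / 100, 28 / 100
rr = p_even - p_odd
rr_se = math.sqrt(p_even * (1 - p_even) / 100 + p_odd * (1 - p_odd) / 100)
rr_lo, rr_hi = rr - 2 * rr_se, rr + 2 * rr_se

# log relative risk of failure, odd vs even: (72/100)/(60/100)
log_rr = math.log((72 / 100) / (60 / 100))
log_rr_se = math.sqrt((1 - 72 / 100) / 72 + (1 - 60 / 100) / 60)

# log odds ratio: (40/60)/(28/72), se^2 = 1/a + 1/b + 1/c + 1/d
log_or = math.log((40 / 60) / (28 / 72))
log_or_se = math.sqrt(1 / 40 + 1 / 28 + 1 / 60 + 1 / 72)
or_lo = math.exp(log_or - 2 * log_or_se)   # back-transform the log-scale CI
or_hi = math.exp(log_or + 2 * log_or_se)
```

The log-scale intervals are computed first and then exponentiated, which is why the table shows standard errors only on the log rows.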


odds ratios and “recommend indices”

notation:
• Pr{“Y”(0)} = this year's RI = 1 – Pr{“N”(0)}
• Pr{“Y”(-1)} = last year's RI = 1 – Pr{“N”(-1)}
• Pr{“Y”(0)|“Y”(-1)} = this year's retention rate
• Pr{“Y”(0)|“N”(-1)} = this year's re-enlistment rate
• Pr{“N”(0)|“Y”(-1)} = this year's de-enlistment rate = 1 – Pr{“Y”(0)|“Y”(-1)} = 1 – retention rate
• conversion rates: Pr{“Y”(0)|“N”(-1)} and Pr{“N”(0)|“Y”(-1)}
• equilibrium: RI/(1–RI) = Pr{“Y”(0)|“N”(-1)} / Pr{“N”(0)|“Y”(-1)}
Schredder-Schredder chess match

             AMD=W   Draw   AMD=L
AMD white      16     44     11     (= Intel black)
AMD black      11     40     19     (= Intel white)

Ignoring draws:
• White odds = 35:22, which ignores AMD vs Intel effects and any sample-size imbalances
• AMD white odds = 16:11
• AMD black odds = 11:19, i.e. Intel white odds = 19:11
• AMD odds = 27:30
vs log(AMD white odds) + log(Intel white odds), which implicitly does adjust for those effects.
confidence intervals for odds

White odds = 35:22 = 1.59:
• ln(35/22) = 0.464
• 1/35 = 0.0286, 1/22 = 0.0455
• s.e. = [0.0286 + 0.0455]^1/2 = 0.272
• 0.464 ± 2×0.272 = (–0.080, 1.008) as log odds
• (0.923, 2.741) as odds

95% confidence interval for White odds = AMD white odds × Intel white odds = (16/11)×(19/11):
• log odds “ratio” = 0.921
• s.e. = [1/16 + 1/11 + 1/19 + 1/11]^1/2 = [0.297]^1/2 = 0.5449
• 0.921 ± 2×0.5449 = (–0.169, 1.089) as log odds
• (0.845, 2.97) as odds


sample sizes from confidence intervals

• old and new processes: any difference in yield?

Suppose we know σo = σ+ = σ.
• standard error for the difference in means = σd(n) = σ(1/no + 1/n+)^1/2 = σ(2/n)^1/2, where no = n+ = n
• with approx 95% confidence interval d ± 2σd(n)
sample sizes (solution)

Fix the length of the confidence interval = Δ, and solve for n:
Δ = (d + 2σd(n)) – (d – 2σd(n)) = 4σd(n) = 4σ(2/n)^1/2, or
Δ² = 16σ²(2/n) = 32σ²/n
n = 32σ²/Δ²

e.g. Δ = 2σ: n = 32σ²/(2σ)² = 8 per group
e.g. Δ = σ: n = 32 per group
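The solved formula is easy to mechanize. A small sketch (the helper name is ours, not the slides'):

```python
def n_per_group(delta, sigma):
    """Per-group n so a ~95% CI for the difference of two means has
    total length delta, with known common sigma: n = 32*sigma^2/delta^2."""
    return 32 * sigma**2 / delta**2
```

With delta = 2*sigma this gives 8 per group, and with delta = sigma it gives 32, matching the slide's examples.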


sample sizes (one-sample version)

When: process monitoring, paired data. Suppose we know σd.
• standard error for the mean difference = σd/n^1/2
• with approx 95% confidence interval d ± 2σd/n^1/2

Fix the length of the confidence interval = Δ, and solve for n:
• Δ = (d + 2σd/√n) – (d – 2σd/√n) = 4σd/√n, or
• n = 16σd²/Δ²

e.g. Δ = 2σd: n = 16σd²/(2σd)² = 4 pairs
e.g. Δ = σd: n = 16σd²/σd² = 16 pairs
The price of two-sample testing

1. Assuming σd is comparable to σx, the two-sample problem requires twice the number of observations. This is because the value of its control group is random.
2. In addition, the cost of a pair is usually less than that of two unrelated observations.
3. Finally, when pairing is feasible, it is reasonable to expect σd < σx. When pairing is at random, σd = σx√2, and the one-sample test is burdened by the loss of degrees of freedom.
Hypothesis testing

1. null hypothesis, whereby the population parameter is “uninteresting, unremarkable, default, null=zero.”
2. alternative hypothesis, which is implicitly accepted if the null hypothesis is rejected. The alternative hypothesis is usually not unique.
3. test statistic, computed from the observed sample.
4. rejection region, which defines the values of the test statistic that would reject the null hypothesis.


e.g. one-sample mean, σ known

Null hypothesis Ho: Δ = 0.
Alternative HΔ: Δ = ΔA.
Test statistic z = d/(σ/n^1/2) = d×n^1/2/σ
Rejection region: z > 1.645

[standard normal density, with the rejection region z > 1.645 shaded]
…p-value version

Null hypothesis Ho: Δ = 0.
Alternative HΔ: Δ = ΔA.
Test statistic z = d/(σ/n^1/2) = d×n^1/2/σ

One-sided p-value:
p1-value = P(Z > zobs | Ho)
= P(Z > d/(σ/n^1/2) | Δ = 0)
= 1 – Φ(d/(σ/n^1/2))
= Φ(–d×n^1/2/σ)
…two-tailed p-value

Null hypothesis Ho: Δ = 0.
Alternative HΔ: Δ = ΔA.
Test statistic z = d/(σ/n^1/2) = d×n^1/2/σ

Two-sided p-value:
p2-value = P(|Z| > |zobs| | Ho)
= 1 – [Φ(|d|/(σ/n^1/2)) – Φ(–|d|/(σ/n^1/2))]
= 2×Φ(–|d|×n^1/2/σ)
= 2×p1-value
one-sample mean, σ unknown, p-value

Null hypothesis Ho: Δ = 0.
Alternative HΔ: Δ = ΔA.
Test statistic t = d/(s/n^1/2) = d×n^1/2/s
Rejection region: |t| > t(df=n–1, α/2)

Two-sided p-value:
p2-value = P(|T| > |tobs| | Ho)
= 2×T(–|d|×n^1/2/s, df=n–1),
where T(·, df) denotes the t cumulative distribution function.


Example: overetch yield experiment

lot  split            01-12   13-24   delta
1    clearout 01-12     75      68       7
2    clearout 01-12     45      61     –16
3    clearout 01-12     81      79       2
4    clearout 01-12     78      87      –9
5    clearout 01-12     57      77     –20

mean = –7.20, stdev = 11.52
t = d√n/s = –7.2×√5 / 11.52 = –1.40
t(df=4, 0.975) = 2.776; p-value = 0.117
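Assuming SciPy is available, the paired analysis is just a one-sample t-test on the five within-lot deltas; halving the two-sided p-value reproduces the slide's 0.117, which suggests that figure is a one-sided tail area:

```python
from scipy import stats

# within-lot deltas (01-12 minus 13-24) from the clearout experiment
deltas = [7, -16, 2, -9, -20]

t, p_two = stats.ttest_1samp(deltas, 0.0)  # paired analysis = one-sample t on deltas
p_one = p_two / 2                           # one-sided tail, since t < 0 here
```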
Wafer position sequence effects

How to adjust for this effect in the 1st-12 vs 2nd-12 split experiment?

[plot of yield vs wafer processing sequence (1–24), showing a zig-zag even-vs-odd effect and an early-wafer effect]
21 concurrent unsplit lots

lot  split       01-12   13-24      lot  split       01-12   13-24
6    no splits    62.5    58.8      17   no splits    66.7    72.9
7    no splits    50.5    30.6      18   no splits    75.4    68.6
8    no splits    72.5    71.6      19   no splits    78.3    81.4
9    no splits    86.0    73.8      20   no splits    75.5    77.8
10   no splits    68.6    59.3      21   no splits    84.0    73.6
11   no splits    76.6    78.2      22   no splits    79.6    78.5
12   no splits    55.6    44.3      23   no splits    77.9    74.7
13   no splits    64.6    71.3      24   no splits    64.8    61.7
14   no splits    73.5    77.5      25   no splits    69.6    70.2
15   no splits    81.3    77.7      26   no splits    70.7    71.4
16   no splits    66.0     3.1
Adjusting for the wafer position bias

[dot plots of delta, –30 to 20, for the clearout 01-12 lots vs the no-splits lots]

split            n    mean    stdev
clearout 01-12   5    –7.20   11.52
no splits       21     3.49    7.04

df = 24; difference of means = –10.69; pooled stdev = 7.96
t = –10.69 / [7.96×(1/5 + 1/21)^1/2] = –2.70 with df = 24
two-sided p-value = 0.0126
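Assuming SciPy, the pooled comparison can be run directly from the summary statistics:

```python
from scipy import stats

# summary statistics from the slide
t, p = stats.ttest_ind_from_stats(
    mean1=-7.20, std1=11.52, nobs1=5,    # clearout 01-12 lots
    mean2=3.49,  std2=7.04,  nobs2=21,   # concurrent unsplit lots
    equal_var=True,                      # pooled ("classical") t, df = 24
)
```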
Two errors

Type I error is the probability, given that the null hypothesis is true, that the statistical procedure rejects the null hypothesis.

Type II error is the probability, given that the alternative hypothesis is true, of not rejecting the null hypothesis. Type II error is a strong function of the particular alternative.

                    truly delta=0    truly delta≠0
we say "delta=0"    1–α              β = type II
we say "delta≠0"    α = type I       1–β = power
IF … THEN power(Δ)

IF Δ = 0, THEN the probability of a significant result = α.
IF Δ = ΔA, THEN the probability of a significant result = 1–β(ΔA).

[power curve: power(Δ) rises from α at Δ = 0 toward 1 as Δ grows]


one-sample mean… power(ΔA)

Null Ho: Δ = 0. Alt HΔ: Δ = ΔA.
statistic z = d/(σ/n^1/2) = d×n^1/2/σ
reject z > 1.645
P(z > 1.645 | Δ=0) = α = 0.05

power(ΔA) = 1–β
= P(z > zα | Δ=ΔA) = P(d×n^1/2/σ > zα | Δ=ΔA)
= P((d – ΔA + ΔA)×n^1/2/σ > zα | Δ=ΔA)
= P((d – ΔA)×n^1/2/σ + ΔA×n^1/2/σ > zα | Δ=ΔA)
= P(Z + ΔA×n^1/2/σ > zα) = P(Z > zα – ΔA×n^1/2/σ)
= Φ(–zα + ΔA×n^1/2/σ)
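The closing formula can be sketched as a small function (the name is ours, not the slides'); at Δ = 0 it returns α, as the IF/THEN slide states:

```python
from math import sqrt
from scipy.stats import norm

def power(delta, sigma, n, alpha=0.05):
    """One-sided one-sample z-test power: Phi(-z_alpha + delta*sqrt(n)/sigma)."""
    z_alpha = norm.ppf(1 - alpha)
    return norm.cdf(-z_alpha + delta * sqrt(n) / sigma)
```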
connection to hypothesis tests:
• When the confidence interval contains zero, then
the conventional null hypothesis that the
population parameter is zero cannot be rejected
(at the given confidence level=1–significance).
• Confidence intervals consist of those null
hypotheses that cannot be rejected (at the given
confidence level=1–significance).
• Confidence intervals have sufficient information
to determine whether the null hypothesis is to be
rejected.



one-sample mean…

power(ΔA) = 1–β = P(Z > zα – ΔA×n^1/2/σ), so
z1–β = zα – ΔA×n^1/2/σ, or
–zβ = zα – ΔA×n^1/2/σ, or
ΔA×n^1/2/σ = zα + zβ, or
n^1/2 = (zα + zβ)×σ/ΔA, or
n = (zα + zβ)²×σ²/ΔA²

[normal density panels for Δ√n/σ = 0, 1, 2, showing the power beyond zα growing with Δ√n/σ]


Two-sample version:

σn² = σ²(1/n + 1/n) = 2σ²/n, so
n = 2[zα/2 + zβ]²×σ²/(μ1–μ0)²

Guenther’s refinement:
n = 2[zα/2 + zβ]²×σ²/(μ1–μ0)² + zα/2²/4
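A sketch of both formulas (helper name is ours); for Δ/σ = 1 at α = 0.05 it reproduces the n* and Guenther entries in the comparison table:

```python
from scipy.stats import norm

def n_two_sample(delta_over_sigma, alpha=0.05, target_power=0.9):
    """Per-group n for the two-sided two-sample z test, plus
    Guenther's refinement n* + z_{alpha/2}^2 / 4."""
    za = norm.ppf(1 - alpha / 2)
    zb = norm.ppf(target_power)
    n_star = 2 * (za + zb) ** 2 / delta_over_sigma ** 2
    return n_star, n_star + za ** 2 / 4
```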


Comparison of Sample Size Calculations (Two-Sample Problem)

                     power=0.5                       power=0.9
alpha  Δ/σ    n*     Guenther  noncentral    n*      Guenther  noncentral
0.05   0.25  122.93  123.89    123.88       336.24   337.2     337.2
0.05   0.5    30.73   31.69     31.71        84.06    85.02     85.03
0.05   0.75   13.66   14.62     14.67        37.36    38.32     38.34
0.05   1       7.68    8.64      8.73        21.01    21.98     22.02
0.05   1.25    4.92    5.88      6.02        13.45    14.41     14.48
0.05   1.5     3.41    4.37      4.57         9.34    10.3      10.4
0.05   2       1.92    2.88      3.17         5.25     6.21      6.39


Examples:

Yield: standard process 100 dpw; “new” process 110 dpw?
Δ = new – std = 10 dpw; σ = 25 dpw

Reliability: standard process 30; “new” process 35, 40
Δ = new – std = 5, 10; σ = 6


paired data (binary)

             Corporation    Named
             of Interest    Competitor   count
yeasayers =>     Yes            Yes        798
                 Yes            No         406
                 No             Yes         95
naysayers =>     No             No         220
                                total     1519

The key information is the patterns (yes,no) & (no,yes).
We proceed conditionally: CoI vs NC odds = 406:95 = 4.27
Log odds 95% CI = 1.45 ± 2×0.114 = (1.22, 1.68)
Odds 95% CI = (3.40, 5.36)
95% CI for Yes fraction = Odds/(1+Odds) = (0.773, 0.843)
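The conditional analysis uses only the two discordant counts. A sketch of the whole chain of calculations:

```python
import math

n_yn, n_ny = 406, 95   # discordant pairs: (yes,no) and (no,yes)

odds = n_yn / n_ny
log_odds = math.log(odds)
se = math.sqrt(1 / n_yn + 1 / n_ny)           # std err of the log odds
lo, hi = log_odds - 2 * se, log_odds + 2 * se  # 95% CI on the log scale
odds_lo, odds_hi = math.exp(lo), math.exp(hi)
frac_lo = odds_lo / (1 + odds_lo)              # back to the Yes-fraction scale
frac_hi = odds_hi / (1 + odds_hi)
```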
One-sample variance problem

Null hypothesis Ho: σ = σo
Alternative HA: σ > σo
Test statistic: ν s²/σo²
Rejection region: ν s²/σo² > χ²(df=ν, 0.95)

One-sided p-value:
p1-value = P(χ²(df=ν) > ν s²/σo² | Ho)
= 1 – χ²cdf(ν s²/σo², df=ν)
Two-sample variance problem

Null hypothesis Ho: σ1 = σ2
Alternative HA: σ1 ≠ σ2
Test statistic: s1²/s2²
Rejection region: s1²/s2² < F(ν1, ν2, 0.025) or s1²/s2² > F(ν1, ν2, 0.975)

Two-sided p-value (label so that s1 > s2):
p2-value = 2×P(F(ν1, ν2) > s1²/s2² | Ho)
= 2×[1 – Fcdf(s1²/s2², ν1, ν2)]
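Assuming SciPy, the two-sided F p-value is one survival-function call; at s1 = s2 the statistic sits at the F(ν, ν) median, so the p-value is 1:

```python
from scipy.stats import f

def two_sided_f_pvalue(s1, s2, nu1, nu2):
    """Two-sided p-value for Ho: sigma1 = sigma2.
    Label so s1 > s2, then p = 2 * P(F(nu1, nu2) > s1^2/s2^2)."""
    if s1 < s2:
        s1, s2, nu1, nu2 = s2, s1, nu2, nu1
    return 2 * f.sf(s1**2 / s2**2, nu1, nu2)
```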
CPU times (reprise)

[normal quantile plots of the CPU times on three scales: linear, square roots, and log (base 2)]
variability tracking with mean

[dot plots of the raw data by group, on a 0–10 scale]

raw data:
1.46 0.58 4.31 1.02    1.30 8.24 3.51 6.87    5.92 1.86 1.41 1.70
0.17 2.92 0.91 0.43    1.43 1.44 4.49 4.21    2.02 1.65 1.40 .40

group   mean   stdev
1       2.89    2.62
2       3.56    4.10
3       3.08    1.50
4       3.20    3.20
5       1.21    0.95
6       2.00    0.80
7       2.27    1.94
8       2.01    0.96
a few power transformations

[scatterplots of group stdev vs group mean on three scales: linear, square roots (sqrt), and logs, showing the stdev-vs-mean trend flattening under the transformations]
Why power transformations?

Theoretical reasons
• align physical relationships to (linear) statistical models.
Empirical reasons
• reduce the correlation of group variances with group means.
• reduce the influence of large values without making them into outliers.
• reduce the skewness in right-skewed data (λ < 1).
• resolve an ambiguity in scale (e.g. a rate vs its reciprocal).
Preference order:
• λ = 0 (logs), 1/2 (square roots), –1 (inverses), 1/3 (cube roots ~ logs with zeros)
Box-Cox transformations

What are they?
• Response y → y^λ.
• Note: y → (y^λ – 1)/λ equals 0 at y = 1, with slope 1 there.

“poor man’s” Box-Cox procedure
1. For each group, calculate the mean and the standard deviation.
2. Plot log(stdev) vs log(mean).
3. Estimate the slope, say r.
4. The recommended power for transforming the raw data is 1 – r (suitably rounded).
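A sketch of the poor man's procedure (NumPy assumed; function name is ours), checked on groups whose stdev is exactly proportional to their mean, where the slope is 1 and the suggested power is 0 (logs):

```python
import numpy as np

def poor_mans_boxcox_power(groups):
    """Slope r of log(stdev) on log(mean) across groups; suggested power = 1 - r."""
    means = np.array([np.mean(g) for g in groups])
    sds = np.array([np.std(g, ddof=1) for g in groups])
    r = np.polyfit(np.log(means), np.log(sds), 1)[0]
    return 1 - r

# groups whose stdev is exactly proportional to their mean
base = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
groups = [base * s for s in (1, 2, 4, 8)]
```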


two examples

theoretical:
Suppose the standard deviation is proportional to the mean: σ(μ) = μ×σo.
Box-Cox plots log(σ(μ)) vs log(μ):
log(σ(μ)) = 1×log(μ) + log(σo),
so the slope r is 1, 1 – r = 0, and we transform by taking logs of the raw data.

empirical Box-Cox:
[scatterplot of log stdev vs log mean]
slope = 1.24, so 1 – r = –0.24, which suggests logs or reciprocal square roots.
linear vs log: plots of transformed data

[dot plots of Ra226 by group A–H: linear scale (0 to 9) vs log2 scale (–3 to 3)]
Mis-calibration:

[sketch of thickness vs time: target thickness is β×to, actual thickness is b×to, so the thickness deviation (β – b)×to is proportional to the mean]

So multiplicative relationships tend to promote constant coefficients of variation, and log transforms.
sums of small positive errors

actual thickness = Σ bi Δti,
with variance = Σ Δti² Var(bi) = (Δt Var(b)) Σi Δti = σb² Δt×to,
so the variance is proportional to the mean (the standard deviation grows with the square root of the mean).

Examples:
• Poisson: mean = λ, variance = λ; sums of independent Poissons are Poisson.
• Chi-square (gamma): mean = ν, variance = 2ν; sums of independent chi-squares are still chi-squares.
Why Box-Cox works:

Background theory:
g(X) ≈ g(μ) + g′(μ)(X – μ), or
g(X) – g(μ) ≈ g′(μ)(X – μ), so
E(g(X) – g(μ))² ≈ g′(μ)² E(X – μ)², so
Var(g(X)) ≈ g′(μ)² Var(X)

Setup:
log(σ(μ)) = k + r log(μ), or
log(σ²(μ)) = 2k + 2r log(μ), or
σ²(μ) = c μ^2r

Suppose g(x) = x^(1–r); then
g′(x) = (1–r) x^–r, or
g′(x)² = (1–r)² x^–2r, so
Var(g(X)) ≈ g′(μ)² Var(X) = (1–r)² μ^–2r × c μ^2r
≈ constant with respect to μ
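A seeded numeric check of this claim (not in the slides): for Poisson data the variance equals the mean, so r = 1/2 and g(x) = x^(1–r) = √x, and Var(g(X)) should sit near 1/4 regardless of the mean:

```python
import numpy as np

# Poisson: variance = mean, so r = 1/2 and the stabilizing transform is sqrt.
rng = np.random.default_rng(0)
stabilized = [np.sqrt(rng.poisson(lam, 200_000)).var(ddof=1)
              for lam in (10, 50, 200)]   # sample Var(sqrt(X)) at three means
```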
The other two-sample t-test

In general, for independent observations from two populations,
E(X1 – X2) = μ1 – μ2,
Var(X1 – X2) = σ1²/n1 + σ2²/n2

So a natural two-sample t-statistic is
(x1 – x2) / [s1²/n1 + s2²/n2]^1/2,
not the “classical”
(x1 – x2) / (sp [1/n1 + 1/n2]^1/2),
where sp² = [(n1–1)s1² + (n2–1)s2²] / [(n1–1) + (n2–1)]
issues

The two t-statistics differ when n1 ≠ n2 or s1 ≠ s2.
In larger samples the differences among s1, s2, sp can be worrisome, but M&S distinguish between them by whether n1 ≠ n2 or n1 = n2.
For the “unequal variances” procedure, there is no clear theory for its sampling distribution… in particular, we need to figure out the associated degrees of freedom.


Degrees of freedom for unequal variances t

Lemma: Let s be a standard deviation from independent normals with the same mean and variance σ², and ν degrees of freedom. Then Var(s²) = 2σ⁴/ν.

So Var(s1²/n1 + s2²/n2)
= 2[σ1²/n1]²/ν1 + 2[σ2²/n2]²/ν2, which we set equal to
2σ⁴/ν, where σ² = σ1²/n1 + σ2²/n2.
Of course, we don’t know σ1 or σ2, so we “plug in” s1, s2:
[s1²/n1 + s2²/n2]²/ν = (s1²/n1)²/ν1 + (s2²/n2)²/ν2,
from which we solve for ν.
Clearout 5 split + 21 unsplit

                             split      unsplit      sum
ng                             5          21
mean = xg                    –7.2          3.49
standard deviation = sg      11.52         7.04
df = νg = ng – 1               4          20
sg²/ng                       26.5421       2.360076   28.902
[sg²/ng]²/νg                176.1205       0.278498  176.399

calc’d df = 28.902²/176.399 = 4.735
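The table's arithmetic can be sketched directly from the summary statistics:

```python
import math

# summary statistics from the clearout comparison
n1, m1, s1 = 5, -7.2, 11.52    # split (clearout) lots
n2, m2, s2 = 21, 3.49, 7.04    # unsplit lots

v1, v2 = s1**2 / n1, s2**2 / n2
# Welch-Satterthwaite degrees of freedom
nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
# unequal-variances t statistic
t = (m1 - m2) / math.sqrt(v1 + v2)
```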


…continued

t = (–7.2 – 3.49) / [28.902]^1/2 = –1.988 with df = 4.735

p-value (one tail) = 0.059, or
p-value (two tail) = 0.118

[t density with df = 4.735, showing 0.059 in each tail]

The same statistic referred to the conventional pooled df = 24 gives p-values = 0.0292, 0.0583.


comments

Different groups can have different variances.
• This can often be of intrinsic interest, with groups with smaller variation usually more desirable.
• When variation tracks with the mean level (higher usually going with higher), Box-Cox power transformations are suggested.
• When differences in means are still of interest (in spite of differences among groups in variation), the alternative t-test conservatively adjusts the degrees of freedom.
• Note df ≈ 5 vs 24, p-value = 0.059 vs 0.029.
• This low power is why M&S recommend the Wilcoxon.
Metrology study

Monitor of the same linewidth (same spot) on 10 days, 5 readings each day.

date     mean   stdev
17-Sep   1051    2.2
22-Sep   1062    4.3
28-Sep   1063    3.1
28-Sep   1058    4.7
29-Sep   1057    3.6
30-Sep   1060    3.3
1-Oct    1062    4.1
2-Oct    1066    4.7
5-Oct    1061    4.1
6-Oct    1060    3.4


components of variance

• day-to-day: σday
• meas-to-meas (repeatability): σmeas
• total variation (reproducibility): σtotal = [σday² + σmeas²]^1/2
Estimating these two variances

Pooled within-day standard deviation = 3.82 = RMS(2.2, 4.3, …, 3.4)

Standard deviation of the daily averages = 4.055
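A sketch reproducing both numbers from the metrology table above:

```python
import math
import statistics

stdevs = [2.2, 4.3, 3.1, 4.7, 3.6, 3.3, 4.1, 4.7, 4.1, 3.4]
means = [1051, 1062, 1063, 1058, 1057, 1060, 1062, 1066, 1061, 1060]

# pooled within-day sd = RMS of the daily sds (equal n of 5 readings per day)
pooled_within = math.sqrt(sum(s**2 for s in stdevs) / len(stdevs))
# between-day spread: sd of the 10 daily averages
sd_daily_means = statistics.stdev(means)
```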