
# Stat 110, Lecture 12

## Sampling Distributions, Estimation, and Hypothesis Testing (II)
bheavlin@stat.stanford.edu
Statistics, by how much data you have:

| No Data | Some Data | Way Too Much Data |
|---|---|---|
| Probability | Inferential Statistics | Descriptive Statistics |

Inferential statistics: sampling distributions, hypothesis testing, estimation.

## Stat 110 bheavlin@stat.stanford.edu

## topics

• comparing proportions
• paired vs two-sample (again)
• sample size calculations
• hypothesis testing
• power transformations
• the other two-sample test
• the k-sample problem


## odd and even columns

| | even column | odd column | total |
|---|---|---|---|
| yields | 40 | 28 | 68 |
| no yield | 60 | 72 | 132 |
| total | 100 | 100 | 200 |

Three measures:
1. risk reduction: 40/100 − 28/100 = 0.12
2. relative risk: (72/100)/(60/100) = 1.20
3. odds ratio: (40/60)/(28/72) = 1.71
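The three measures can be checked numerically. A minimal Python sketch (the helper name `compare_proportions` is mine, not from the lecture):

```python
# Three ways to compare two proportions, applied to the even/odd yield table.
# Columns of 100 wafers each: yields (40, 28), no yield (60, 72).
def compare_proportions(a, b, c, d):
    """a, b = yields in groups 1, 2; c, d = no-yields."""
    n1, n2 = a + c, b + d
    p1, p2 = a / n1, b / n2
    risk_reduction = p1 - p2
    relative_risk = (d / n2) / (c / n1)   # the slide compares the no-yield rates
    odds_ratio = (a / c) / (b / d)
    return risk_reduction, relative_risk, odds_ratio

rr, rel, orr = compare_proportions(40, 28, 60, 72)
print(round(rr, 2), round(rel, 2), round(orr, 2))  # 0.12 1.2 1.71
```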
## Terms for Comparing Two Probabilities

Risk reduction:
• Rate1 − Rate2
Relative risk:
• Rate1 / Rate2 (bad, usually)
Odds:
• Rate / (1 − Rate) (good≡bad)
Odds ratio:
• ratio of two odds (good≡bad)
• [Rate1/(1 − Rate1)] / [Rate2/(1 − Rate2)]
## … comparing probabilities

| Index | Pluses | Minuses |
|---|---|---|
| Risk reduction (delta) | simplest; aids cost-benefit analyses; smaller sample sizes | awkward to model; additive model less physical |
| Relative risk | more physical for modeling; … extrapolation | no symmetry between Pr{A} and Pr{not A}; requires prospective data; larger sample sizes |
| Odds ratio | easy to model; can use with retrospective datasets, rare events; simple formula for std err | less physical than relative risk; harder to explain |


## … comparing probabilities: notation

| | New | Old |
|---|---|---|
| #fail | a | b |
| #pass | c | d |
| total | nN | nO |

nN = a + c, nO = b + d, pN = a/nN, pO = b/nO

| Index | Estimate | Standard Error² |
|---|---|---|
| Risk Reduction | pN − pO | pN(1−pN)/nN + pO(1−pO)/nO |
| Loge Relative Risk | loge( pN / pO ) | (1−pN)/a + (1−pO)/b |
| Loge Odds Ratio | loge( ad / bc ) = loge[ pN(1−pO) / ((1−pN)pO) ] | 1/a + 1/b + 1/c + 1/d |


## Example confidence intervals

| | point estimate | standard error | lower conf limit | upper conf limit |
|---|---|---|---|---|
| risk reduction | 0.12 | 0.066 | −0.013 | 0.253 |
| relative risk | 1.2 | | 0.977 | 1.47 |
| (log RR) | 0.182 | 0.103 | −0.023 | 0.388 |
| odds ratio | 1.71 | | 0.937 | 3.14 |
| (log OR) | 0.539 | 0.302 | −0.065 | 1.14 |

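These intervals follow from the standard-error formulas above. A minimal Python sketch for the odds-ratio row (variable names are mine):

```python
# 95% CI for the odds ratio via the log-odds standard error
# sqrt(1/a + 1/b + 1/c + 1/d), using the even/odd yield table.
import math

a, b, c, d = 40, 28, 60, 72
log_or = math.log((a * d) / (b * c))
se = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo, hi = log_or - 2 * se, log_or + 2 * se
print(round(math.exp(log_or), 3), round(math.exp(lo), 3), round(math.exp(hi), 3))
# 1.714 0.937 3.137
```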

## odds ratios and “recommend indices”

notation:
• Pr{“Y”(0)} = this year's RI = 1 − Pr{“N”(0)}; Pr{“Y”(−1)} = last year's RI = 1 − Pr{“N”(−1)}
• Pr{“Y”(0) | “Y”(−1)} = this year's retention rate
• Pr{“Y”(0) | “N”(−1)} = this year's re-enlistment rate
• Pr{“N”(0) | “Y”(−1)} = this year's de-enlistment rate (attrition) = 1 − Pr{“Y”(0) | “Y”(−1)} = 1 − retention rate
• conversion rates: Pr{“Y”(0) | “N”(−1)} and Pr{“N”(0) | “Y”(−1)}
• equilibrium: RI/(1 − RI) = Pr{“Y”(0) | “N”(−1)} / Pr{“N”(0) | “Y”(−1)}
## Schredder-Schredder chess match

| | Intel=L, AMD=W | Draw | Intel=W, AMD=L |
|---|---|---|---|
| AMD white (Intel black) | 16 | 44 | 11 |
| AMD black (Intel white) | 11 | 40 | 19 |

Ignoring draws:
• White odds = 35:22 — this ignores AMD vs Intel effects, and ignores any sample-size imbalances.
• AMD white odds = 16:11; AMD black odds = 11:19, i.e. Intel white odds = 19:11.
• AMD odds = 27:30.
• Compare White odds = 35:22 with combining log(AMD white odds) + log(Intel white odds).
## confidence intervals for odds

95% confidence interval for White odds:
White odds = 35:22 = 1.59
ln(35/22) = 0.464
1/35 = 0.0286, 1/22 = 0.0455
s.e. = [0.0286 + 0.0455]^1/2 = 0.272
0.464 ± 2×0.272 = (−0.080, 1.008) as log odds, or (0.923, 2.741) as odds

Alternatively, via AMD white odds × Intel white odds = (16/11)×(19/11):
log odds “ratio” = 0.921
s.e. = [1/16 + 1/11 + 1/19 + 1/11]^1/2 = [0.297]^1/2 = 0.5449
0.921 ± 2×0.5449 = (−0.169, 2.011) as log odds, or (0.845, 7.47) as odds
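The left-hand calculation can be reproduced directly; a minimal Python sketch:

```python
# 95% CI for the White-win odds (draws ignored): log odds +/- 2 s.e.,
# with s.e. = sqrt(1/wins + 1/losses).
import math

white_wins, black_wins = 35, 22
log_odds = math.log(white_wins / black_wins)
se = math.sqrt(1 / white_wins + 1 / black_wins)
lo, hi = log_odds - 2 * se, log_odds + 2 * se
print(round(math.exp(lo), 3), round(math.exp(hi), 3))  # 0.923 2.741
```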


## sample sizes from confidence intervals

• old and new processes — any difference in yield?

Suppose we know σo = σ+ = σ.
• standard error for the difference in means = σd(n) = σ(1/no + 1/n+)^1/2 = σ(2/n)^1/2, where no = n+ = n.
• with approx 95% confidence interval d ± 2σd(n)
## sample sizes (solution)

Fix the length of the confidence interval = Δ and solve for n:
Δ = (d + 2σd(n)) − (d − 2σd(n)) = 4σd(n) = 4×σ(2/n)^1/2, or
Δ² = 16×σ²×(2/n) = 32σ²/n, so
n = 32σ²/Δ²

e.g. Δ = 2σ: n = 32σ²/(2σ)² = 8 per group
e.g. Δ = σ: n = 32 per group
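The rule above in a short sketch (the function name is mine; it rounds up to a whole observation):

```python
# Sample size per group so the ~95% CI for the difference in means has
# total length Delta:  n = 32 sigma^2 / Delta^2.
import math

def n_per_group(sigma, delta):
    return math.ceil(32 * sigma**2 / delta**2)

print(n_per_group(1.0, 2.0))  # 8   (Delta = 2 sigma)
print(n_per_group(1.0, 1.0))  # 32  (Delta = sigma)
```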

## sample sizes (one-sample version)

When: process monitoring, paired data. Suppose we know σd.
• standard error for the mean difference = σd/n^1/2
• with approx 95% confidence interval d ± 2σd/n^1/2

Fix the length of the confidence interval = Δ and solve for n:
• Δ = (d + 2σd/√n) − (d − 2σd/√n) = 4σd/√n, or
• n = 16σd²/Δ²

e.g. Δ = 2σd: n = 16σd²/(2σd)² = 4 pairs
e.g. Δ = σd: n = 16 pairs
## The price of two-sample testing

1. Assuming σd is comparable to σx, the two-sample problem requires twice the number of observations, because the value of its control group is random.
2. In addition, the cost of a pair is usually less than that of two unrelated observations.
3. Finally, when pairing is feasible, it is reasonable to expect σd < σx. When pairing is at random, σd = σx√2, and the one-sample test is burdened by the loss of degrees of freedom.
## Hypothesis testing

1. null hypothesis, whereby the population parameter is “uninteresting, unremarkable, default, null = zero.”
2. alternative hypothesis, which is implicitly accepted if the null hypothesis is rejected. The alternative hypothesis is usually not unique.
3. test statistic, computed from the observed sample.
4. rejection region, which defines the values of the test statistic that would reject the null hypothesis.

## e.g. one-sample mean, σ known

Null hypothesis Ho : Δ = 0.
Alternative HΔ : Δ = ΔA.
Test statistic z = d/(σ/n^1/2) = d×n^1/2/σ
Rejection region: z > 1.645

(Plot: standard normal z density, −6 to 6.)
## …p-value version

Null hypothesis Ho : Δ = 0.
Alternative HΔ : Δ = ΔA.
Test statistic z = d/(σ/n^1/2) = d×n^1/2/σ

One-sided p-value:
p1-value = P( z > zobs | Ho )
= P( z > d×n^1/2/σ | Δ = 0 )
= 1 − Φ( d×n^1/2/σ )
= Φ( −d×n^1/2/σ )

(Plot: standard normal z density.)
## …two-tailed p-value

Null hypothesis Ho : Δ = 0.
Alternative HΔ : Δ = ΔA.
Test statistic z = d/(σ/n^1/2) = d×n^1/2/σ

Two-sided p-value:
p2-value = P( |z| > |zobs| | Ho )
= P( |z| > |d|×n^1/2/σ | Δ = 0 )
= 1 − [ Φ(|d|×n^1/2/σ) − Φ(−|d|×n^1/2/σ) ]
= 2×Φ( −|d|×n^1/2/σ )
= 2×p1-value

(Plot: standard normal z density.)
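The one- and two-sided formulas can be sketched with Φ built from `math.erf` (the function names and the example numbers d = 1, σ = 2, n = 16 are mine):

```python
# One- and two-sided p-values for the one-sample z test.
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_pvalues(d, sigma, n):
    z = d * math.sqrt(n) / sigma
    return z, phi(-z), 2 * phi(-abs(z))   # z, one-sided, two-sided

z, p1, p2 = z_pvalues(d=1.0, sigma=2.0, n=16)
print(round(z, 2), round(p1, 4), round(p2, 4))  # 2.0 0.0228 0.0455
```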
## one-sample mean, σ unknown, p-value

Null hypothesis Ho : Δ = 0.
Alternative HΔ : Δ = ΔA.
Test statistic t = d/(s/n^1/2) = d×n^1/2/s
Rejection region: |t| > t_{df=n−1}(α/2)

Two-sided p-value:
p2-value = P( |t| > |tobs| | Ho )
= 2×T( −|d|×n^1/2/s, df = n−1 ), where T is the t CDF

(Plot: t density with df = 4.)

## Example: overetch yield experiment

| lot | split | 01-12 | 13-24 | delta |
|---|---|---|---|---|
| 1 | clearout 01-12 | 75 | 68 | 7 |
| 2 | clearout 01-12 | 45 | 61 | −16 |
| 3 | clearout 01-12 | 81 | 79 | 2 |
| 4 | clearout 01-12 | 78 | 87 | −9 |
| 5 | clearout 01-12 | 57 | 77 | −20 |

mean = −7.20, stdev = 11.52
t = d√n/s = −7.2×√5 / 11.52 = −1.40
t(df=4, 0.975) = 2.776; p-value = 0.117
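A sketch checking the paired-t arithmetic from the deltas (lot 5's delta taken as −20 so the mean matches the slide's −7.20):

```python
# Paired (one-sample) t statistic for the overetch split-lot deltas.
import math
import statistics

deltas = [7, -16, 2, -9, -20]
d_bar = statistics.mean(deltas)
s = statistics.stdev(deltas)
t = d_bar * math.sqrt(len(deltas)) / s
print(round(d_bar, 2), round(s, 2), round(t, 2))  # -7.2 11.52 -1.4
```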
## Wafer position sequence effects

(Plot: yield, 0–120, vs wafer processing sequence, 1–24, showing a zig-zag even-vs-odd effect and an early-wafer effect — potential biases for the 1st-12 vs 2nd-12 split experiment.)
## 21 concurrent unsplit lots

| lot | split | 01-12 | 13-24 | | lot | split | 01-12 | 13-24 |
|---|---|---|---|---|---|---|---|---|
| 6 | no splits | 62.5 | 58.8 | | 17 | no splits | 66.7 | 72.9 |
| 7 | no splits | 50.5 | 30.6 | | 18 | no splits | 75.4 | 68.6 |
| 8 | no splits | 72.5 | 71.6 | | 19 | no splits | 78.3 | 81.4 |
| 9 | no splits | 86.0 | 73.8 | | 20 | no splits | 75.5 | 77.8 |
| 10 | no splits | 68.6 | 59.3 | | 21 | no splits | 84.0 | 73.6 |
| 11 | no splits | 76.6 | 78.2 | | 22 | no splits | 79.6 | 78.5 |
| 12 | no splits | 55.6 | 44.3 | | 23 | no splits | 77.9 | 74.7 |
| 13 | no splits | 64.6 | 71.3 | | 24 | no splits | 64.8 | 61.7 |
| 14 | no splits | 73.5 | 77.5 | | 25 | no splits | 69.6 | 70.2 |
| 15 | no splits | 81.3 | 77.7 | | 26 | no splits | 70.7 | 71.4 |
| 16 | no splits | 66.0 | 3.1 | | | | | |
## Adjusting for the wafer position bias

(Plot: deltas, −30 to 20, for the clearout vs no-splits lots.)

diff means = −10.69
pooled stdev = 7.96, df = 24
t = −10.69 / [7.96×(1/5 + 1/21)^1/2] = −2.70, with df = 24
two-sided p-value = 0.0126
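The pooled two-sample t can be reproduced from the split/unsplit summary statistics on these slides; a minimal sketch:

```python
# Pooled two-sample t: clearout (split) vs concurrent unsplit lots.
import math

n1, m1, s1 = 5, -7.20, 11.52    # clearout (split) lots
n2, m2, s2 = 21, 3.49, 7.04     # unsplit lots
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))
print(round(sp, 2), round(t, 2))  # 7.96 -2.7
```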
## Two errors

Type I error is the probability that, given the null hypothesis is true, the statistical procedure rejects the null hypothesis.

Type II error is the probability that, given the alternative hypothesis is true, the procedure does not reject the null hypothesis. Type II error is a strong function of the particular alternative.

| | truly delta=0 | truly delta≠0 |
|---|---|---|
| we say “delta=0” | 1−α | β = type II |
| we say “delta≠0” | α = type I | 1−β = power |
## IF … THEN power(Δ)

IF Δ = 0, THEN the probability of a significant result = α.
IF Δ = ΔA, THEN the probability of a significant result = 1−β(ΔA).

(Plot: power curve, 0.0 to 1.0, vs Δ from −1 to 3.)

## one-sample mean… power(ΔA)

Null Ho : Δ = 0. Alt HΔ : Δ = ΔA.
statistic z = d/(σ/n^1/2) = d×n^1/2/σ
reject z > 1.645
P( z > 1.645 | Δ=0 ) = α = 0.05

power(ΔA) = 1−β
= P( z > zα | Δ=ΔA ) = P( d×n^1/2/σ > zα | Δ=ΔA )
= P( (d − ΔA + ΔA)×n^1/2/σ > zα | Δ=ΔA )
= P( (d − ΔA)×n^1/2/σ + ΔA×n^1/2/σ > zα | Δ=ΔA )
= P( z + ΔA×n^1/2/σ > zα ) = P( z > zα − ΔA×n^1/2/σ )
= Φ( −zα + ΔA×n^1/2/σ )

(Plot: power(ΔA) vs ΔA.)
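The closed form power(ΔA) = Φ(−zα + ΔA×n^1/2/σ) in a short sketch (Φ via `math.erf`; the example values are mine):

```python
# Power of the one-sided z test: power(Delta) = Phi(-z_alpha + Delta*sqrt(n)/sigma).
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def power(delta, sigma, n, z_alpha=1.645):
    return phi(-z_alpha + delta * math.sqrt(n) / sigma)

print(round(power(0.0, 1.0, 25), 3))  # 0.05: at Delta = 0 the power is just alpha
print(round(power(0.5, 1.0, 25), 3))
```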
## connection to hypothesis tests:
• When the confidence interval contains zero, then
the conventional null hypothesis that the
population parameter is zero cannot be rejected
(at the given confidence level=1–significance).
• Confidence intervals consist of those null
hypotheses that cannot be rejected (at the given
confidence level=1–significance).
• Confidence intervals have sufficient information
to determine whether the null hypothesis is to be
rejected.


## one-sample mean… sample size

power(ΔA) = 1−β = P( z > zα − ΔA×n^1/2/σ )

so
z_{1−β} = zα − ΔA×n^1/2/σ, or
−zβ = zα − ΔA×n^1/2/σ, or
ΔA×n^1/2/σ = zα + zβ, or
n^1/2 = (zα + zβ)×σ/ΔA, or
n = (zα + zβ)²×σ²/ΔA²

(Plots: null and shifted normal densities for Δ√n/σ = 1 and Δ√n/σ = 2, with cutoff zα.)

## Two sample version:

n = 2[zα/2 + zβ]²σ²/(μ1−μ0)²

Guenther's refinement:

n = 2[zα/2 + zβ]²σ²/(μ1−μ0)² + zα/2²/4
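Both formulas in a short sketch (the z values 1.959964 and 1.281552 correspond to α = 0.05 two-sided and power 0.9, matching a row of the comparison table):

```python
# Per-group sample size for the two-sample problem, with Guenther's refinement:
#   n = 2 (z_{a/2} + z_b)^2 (sigma/Delta)^2 + z_{a/2}^2 / 4
def n_two_sample(delta_over_sigma, z_alpha2=1.959964, z_beta=1.281552):
    base = 2 * (z_alpha2 + z_beta) ** 2 / delta_over_sigma ** 2
    return base, base + z_alpha2 ** 2 / 4

base, guenther = n_two_sample(0.5)   # alpha = 0.05, power = 0.9
print(round(base, 2), round(guenther, 2))  # 84.06 85.02
```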

## Comparison of Sample Size Calculations (Two-Sample Problem)

| alpha | Δ/σ | n* (power=0.5) | Guenther | non-central | n* (power=0.9) | Guenther | non-central |
|---|---|---|---|---|---|---|---|
| 0.05 | 0.25 | 122.93 | 123.89 | 123.88 | 336.24 | 337.2 | 337.2 |
| 0.05 | 0.5 | 30.73 | 31.69 | 31.71 | 84.06 | 85.02 | 85.03 |
| 0.05 | 0.75 | 13.66 | 14.62 | 14.67 | 37.36 | 38.32 | 38.34 |
| 0.05 | 1 | 7.68 | 8.64 | 8.73 | 21.01 | 21.98 | 22.02 |
| 0.05 | 1.25 | 4.92 | 5.88 | 6.02 | 13.45 | 14.41 | 14.48 |
| 0.05 | 1.5 | 3.41 | 4.37 | 4.57 | 9.34 | 10.3 | 10.4 |
| 0.05 | 2 | 1.92 | 2.88 | 3.17 | 5.25 | 6.21 | 6.39 |

## Examples:

Yield: standard process 100 dpw; “new” process 110 dpw?
Δ = new − std = 10 dpw; σ = 25 dpw

Reliability: standard process 30; “new” process 35, 40
Δ = new − std = 5, 10; σ = 6

## paired data (binary)

| Corporation of Interest | Named Competitor | count |
|---|---|---|
| Yes | Yes | 798 (yeasayers) |
| Yes | No | 406 |
| No | Yes | 95 |
| No | No | 220 (naysayers) |
| total | | 1519 |

The key information is in the discordant patterns (yes,no) & (no,yes). We proceed conditionally:
CoI vs NC odds = 406:95 = 4.27
Log odds 95% CI = 1.45 ± 2×0.114 = (1.22, 1.68)
Odds 95% CI = (3.40, 5.36)
95% CI for the Yes fraction = Odds/(1+Odds) = (0.773, 0.843)
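A sketch of the conditional (discordant-pair) calculation:

```python
# Conditional odds CI from the discordant pairs (Yes,No)=406 and (No,Yes)=95.
import math

b, c = 406, 95
log_odds = math.log(b / c)
se = math.sqrt(1 / b + 1 / c)
lo, hi = math.exp(log_odds - 2 * se), math.exp(log_odds + 2 * se)
print(round(lo, 2), round(hi, 2))                        # 3.4 5.37
print(round(lo / (1 + lo), 3), round(hi / (1 + hi), 3))  # 0.773 0.843
```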
## One-sample variance problem

Null hypothesis Ho : σ = σo
Alternative HA : σ > σo
Test statistic: νs²/σo²; rejection region: νs²/σo² > χ²(df=ν, 0.95)

one-sided p-value:
p1-value = P( χ²(df=ν) > νs²/σo² | Ho )
= 1 − χ²CDF( df=ν, νs²/σo² )

(Plot: chi-square density, 0 to 8.)
## Two-sample variance problem

Null hypothesis Ho : σ1 = σ2
Alternative HA : σ1 ≠ σ2
Test statistic: s1²/s2²; rejection region: s1²/s2² < F(ν1, ν2, 0.025) or > F(ν1, ν2, 0.975)

two-sided p-value (label so that s1 > s2):
p2-value = 2×P( F(ν1, ν2) > s1²/s2² | Ho )
= 2×[ 1 − FCDF(ν1, ν2, s1²/s2²) ]
## CPU times (reprise)

(Normal quantile plots of the CPU times on three scales: linear, square roots, and log base 2.)
## variability tracking with mean

raw data (four values per group, plotted 0–10): 1.46 0.58 4.31 1.02; 1.30 8.24 3.51 6.87; 5.92 1.86 1.41 1.70; 0.17 2.92 0.91 0.43; 1.43 1.44 4.49 4.21; 2.02 1.65 1.40 .40

| group | mean | stdev |
|---|---|---|
| 1 | 2.89 | 2.62 |
| 2 | 3.56 | 4.10 |
| 3 | 3.08 | 1.50 |
| 4 | 3.20 | 3.20 |
| 5 | 1.21 | 0.95 |
| 6 | 2.00 | 0.80 |
| 7 | 2.27 | 1.94 |
| 8 | 2.01 | .96 |
## a few power transformations

(Plots of group stdev vs group mean on three scales: linear, square roots, and logs.)
## Why power transformations?

Theoretical reasons
• align physical relationships to (linear) statistical models.
Empirical reasons
• reduce correlations of group variances with group means.
• reduce the influence of large values without making them into outliers.
• reduce the skewness in right-skewed data (λ < 1).
• resolve an ambiguity in scale (e.g. a rate vs its reciprocal).
Preference order:
• λ = 0 (logs), 1/2 (square roots), −1 (inverses), 1/3 (cube roots ~ logs with zeros)
## Box-Cox transformations

What are they? Response y → y^λ. Note: the scaled form y → (y^λ − 1)/λ equals 0 at y = 1, with slope 1.

“poor man's” Box-Cox procedure:
1. For each group, calculate the mean and the standard deviation.
2. Plot log(stdev) vs log(mean).
3. Estimate the slope, say r.
4. The recommended power for transforming the raw data is 1 − r (suitably rounded).
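The four-step recipe in a short sketch (the function name is mine; the toy groups are constructed so stdev is proportional to mean, giving slope r = 1 and recommended power 0):

```python
# "Poor man's" Box-Cox: regress log(stdev) on log(mean) across groups;
# the recommended power is 1 - slope.
import math
import statistics

def recommended_power(groups):
    x = [math.log(statistics.mean(g)) for g in groups]
    y = [math.log(statistics.stdev(g)) for g in groups]
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
    return 1 - slope

# Toy groups whose stdev grows in proportion to the mean (r = 1):
groups = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(round(recommended_power(groups), 2))  # ~0 -> take logs
```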

## two examples

Theoretical: suppose the standard deviation is proportional to the mean, σ(μ) = μ×σo. Box-Cox plots log(σ(μ)) vs log(μ):
log(σ(μ)) = 1×log(μ) + log(σo),
so the slope r is 1, 1 − r = 0: transform by taking logs of the raw data.

Empirical Box-Cox: (plot of log stdev vs log mean) slope r = 1.24, so 1 − r = −0.24, suggesting logs or reciprocal square roots.
## linear vs log: plots of transformed data

(Dot plots of Ra226 by group A–H: linear scale, 0–9, vs log2 scale, −3 to 3.)
## Mis-calibration:

Target thickness is β×to; actual thickness is b×to (deposition time to, target rate β, actual rate b). The thickness deviation (β − b)×to is proportional to the mean. So multiplicative relationships tend to promote constant coefficients of variation, and log transforms.
## sums of small positive errors

actual thickness = ( Σ bi Δti ), with variance = ( Σ Δti² Var(bi) ) = (Δt Var(b)) Σi Δti = σb² Δ to — so the variance is proportional to the mean (the stdev to its square root).

Examples:
• Poisson: mean = λ, variance = λ; sums of independent Poissons are Poisson.
• Chi-square (gamma): mean = ν, variance = 2ν; sums of independent chi-squares are still chi-square.
## Why Box-Cox works:

Background theory:
g(X) ≈ g(μ) + g′(μ)(X − μ), or
g(X) − g(μ) ≈ g′(μ)(X − μ), so
E[g(X) − g(μ)]² ≈ g′(μ)² E(X − μ)², so
Var(g(X)) ≈ g′(μ)² Var(X)

Setup:
log( σ(μ) ) = k + r log(μ), or
log( σ²(μ) ) = 2k + 2r log(μ), or
σ²(μ) = c μ^2r

Suppose g(x) = x^(1−r); then
g′(x) = (1−r) x^(−r), so g′(x)² = (1−r)² x^(−2r), so
Var(g(X)) ≈ g′(μ)² Var(X) = (1−r)² μ^(−2r) × c μ^(2r) ≈ constant with respect to μ.
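A small simulation consistent with this result: with σ(μ) proportional to μ (r = 1), the log transform (power 1 − r = 0) gives roughly the same spread at different means (all names and numbers here are illustrative):

```python
# Delta-method check: if sigma(mu) = 0.1*mu, then log(X) has stdev ~ 0.1
# regardless of mu, i.e. the log transform stabilizes the variance.
import math
import random
import statistics

random.seed(0)

def stdev_of_logs(mu, cv=0.1, n=20000):
    xs = [random.gauss(mu, cv * mu) for _ in range(n)]
    return statistics.stdev(math.log(x) for x in xs)

print(round(stdev_of_logs(5), 3), round(stdev_of_logs(50), 3))
```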
## The other two-sample t-test

In general, for independent observations from two populations,
E( X1 − X2 ) = μ1 − μ2,
Var( X1 − X2 ) = σ1²/n1 + σ2²/n2

So a natural two-sample t-statistic is
( x1 − x2 ) / [ s1²/n1 + s2²/n2 ]^1/2,
not the “classical”
( x1 − x2 ) / ( sp [1/n1 + 1/n2]^1/2 ),
where sp² = [ (n1−1)s1² + (n2−1)s2² ] / [ (n1−1) + (n2−1) ].
## issues

The two t-statistics differ when n1 ≠ n2 or s1 ≠ s2.
In larger samples, the differences among s1, s2, and sp can be worrisome, but M&S distinguish between them by whether n1 ≠ n2 or n1 = n2.
For the “unequal variances” procedure there is no clear theory for its sampling distribution… in particular, we need to figure out the associated degrees of freedom.

## Degrees of freedom for unequal variances t

Lemma: let s be a standard deviation from independent normals with the same mean and variance σ², with ν degrees of freedom. Then Var( s² ) = 2σ⁴/ν.

So Var( s1²/n1 + s2²/n2 ) = 2[σ1²/n1]²/ν1 + 2[σ2²/n2]²/ν2; set this equal to 2σ⁴/ν, where σ = [σ1²/n1 + σ2²/n2]^1/2. Of course, we don't know σ1 or σ2, so we “plug in” s1, s2:
[s1²/n1 + s2²/n2]²/ν = (s1²/n1)²/ν1 + (s2²/n2)²/ν2, from which we solve for ν.
## Clearout: 5 split + 21 unsplit

| | split | unsplit | sum |
|---|---|---|---|
| ng | 5 | 21 | |
| mean = xg | −7.2 | 3.49 | |
| standard deviation = sg | 11.52 | 7.04 | |
| df = νg = ng − 1 | 4 | 20 | |
| sg²/ng | 26.5421 | 2.360076 | 28.902 |
| [sg²/ng]²/νg | 176.1205 | 0.278498 | 176.399 |

## …continued

t = (−7.2 − 3.49) / [28.902]^1/2 = −1.988, with df = 4.735

p-value (one tail) = 0.059, or p-value (two tail) = 0.118

Conventional pooled t with df = 24: p-values = 0.0292, 0.0583

(Plot: t density with ν = 4.735, tail areas of 0.059 on each side.)
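A sketch reproducing the unequal-variances (Welch) t and its degrees of freedom from the summary table above:

```python
# Unequal-variances t and its df for the 5 split vs 21 unsplit lots.
import math

n1, m1, s1 = 5, -7.2, 11.52
n2, m2, s2 = 21, 3.49, 7.04
v1, v2 = s1**2 / n1, s2**2 / n2
t = (m1 - m2) / math.sqrt(v1 + v2)
df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print(round(t, 3), round(df, 3))
```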

## Different variances in different groups

• This can often be of intrinsic interest; groups with smaller variation are usually more desirable.
• When variation tracks with the mean level (higher usually going with higher), Box-Cox power transformations are suggested.
• When differences in means are still of interest (in spite of differences among groups in variation), the alternative t-test conservatively adjusts the degrees of freedom.
• Note df ≈ 5 vs 24, p-value = 0.053 vs 0.029.
• This low power is why M&S recommend Wilcoxon.
## Metrology study

Monitor of the same linewidth (same spot) on 10 days:

| date | mean | stdev |
|---|---|---|
| 17-Sep | 1051 | 2.2 |
| 22-Sep | 1062 | 4.3 |
| 28-Sep | 1063 | 3.1 |
| 28-Sep | 1058 | 4.7 |
| 29-Sep | 1057 | 3.6 |
| 30-Sep | 1060 | 3.3 |
| 1-Oct | 1062 | 4.1 |
| 2-Oct | 1066 | 4.7 |
| 5-Oct | 1061 | 4.1 |
| 6-Oct | 1060 | 3.4 |

## components of variance

day-to-day: σday
meas-to-meas (repeatability): σmeas
total variation (reproducibility): σtotal = [σday² + σmeas²]^1/2
## Estimating these two variances

Pooled within-day standard deviation = 3.82 = RMS( 2.2, 4.3, …, 3.4 )
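A sketch of the RMS pooling (equal numbers of measurements per day assumed):

```python
# Pooled within-day stdev as the root-mean-square of the ten daily stdevs.
import math

stdevs = [2.2, 4.3, 3.1, 4.7, 3.6, 3.3, 4.1, 4.7, 4.1, 3.4]
pooled = math.sqrt(sum(s * s for s in stdevs) / len(stdevs))
print(round(pooled, 2))  # 3.82
```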