Statistics 514: Design of Experiments

Topic 2
Topic Overview
This topic will cover
• Basic Statistical Concepts (Montgomery 2-1, 2-2)
• Commonly Used Densities (Montgomery 2-3)
Basic Statistical Concepts
• Random Variable - Y
– Quantity (response) capable of taking on a set of values
– Discrete or continuous
– Described by a probability distribution (density f):
∑ᵢ Pr(Y = yᵢ) = 1 (discrete) or ∫ f(y) dy = 1 (continuous)
• Numerical Summaries of a Variable
– Center - Mean: µ, E(·)
– Spread - Variance: σ², Var(·)

         Discrete                   Continuous
µ:       ∑ y Pr(Y = y)              ∫ y f(y) dy
σ²:      ∑ (y − µ)² Pr(Y = y)       ∫ (y − µ)² f(y) dy
• Independence – Observations are statistically independent if the value of one observation
does not influence the value of any other observation.
• Elementary Results of Numerical Summaries
– E(aY ± b) = aE(Y) ± b
– Var(aY ± b) = a² Var(Y)
– E(Y₁ ± Y₂) = E(Y₁) ± E(Y₂)
– Cov(Y₁, Y₂) = E(Y₁Y₂) − E(Y₁)E(Y₂)
– If Y₁ and Y₂ are independent, then Cov(Y₁, Y₂) = 0.
– Var(Y) = E(Y²) − E(Y)² = E[(Y − E(Y))²]
– Var(Y₁ ± Y₂) = Var(Y₁) + Var(Y₂) ± 2Cov(Y₁, Y₂)
– E(Y₁ × Y₂) = E(Y₁)E(Y₂), if Y₁, Y₂ independent.
– However, E(Y₁/Y₂) ≠ E(Y₁)/E(Y₂) in general.
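These identities are easy to sanity-check by simulation; a minimal R sketch (the distribution choices below are arbitrary):

# Monte Carlo check of the identities above
set.seed(514)
n  <- 1e6
y1 <- rnorm(n, mean = 2, sd = 1)   # Y1 ~ N(2, 1)
y2 <- runif(n, 1, 3)               # Y2 ~ Uniform(1, 3), independent of Y1
mean(y1 * y2) - mean(y1) * mean(y2)    # Cov(Y1, Y2): approximately 0
c(mean(y1 * y2), mean(y1) * mean(y2))  # E(Y1 Y2) = E(Y1)E(Y2) under independence
c(mean(y1 / y2), mean(y1) / mean(y2))  # but E(Y1/Y2) != E(Y1)/E(Y2): about 1.10 vs 1.00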
Common Sample Summaries
• Sample mean (Ȳ)
If Y₁, . . . , Yₙ are independent with mean µ and variance σ²,
E((1/n) ∑ Yᵢ) = (1/n) ∑ E(Yᵢ) = (1/n) nµ = µ
Var((1/n) ∑ Yᵢ) = (1/n²) ∑ Var(Yᵢ) = (1/n²) nσ² = σ²/n
What is the distribution of Ȳ?
If Yᵢ Normal → Ȳ Normal
If Yᵢ other → Ȳ ≈ Normal
The Central Limit Theorem
If Y₁, . . . , Yₙ are independent R.V.’s with mean µ and variance σ²,
(∑ Yᵢ − nµ) / √(nσ²) ∼ N(0, 1), approximately, for large n.
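A quick R simulation makes this concrete; the skewed exponential population below is an arbitrary choice:

# CLT demo: sample means from a skewed (exponential) population
set.seed(514)
n    <- 30
reps <- 10000
mu <- 2; sigma2 <- 4                  # Exp(rate 1/2) has mean 2, variance 4
ybar <- replicate(reps, mean(rexp(n, rate = 1/2)))
mean(ybar)                            # close to mu = 2
var(ybar)                             # close to sigma2/n = 4/30
hist(ybar, breaks = 50, freq = FALSE) # roughly bell-shaped
curve(dnorm(x, mu, sqrt(sigma2/n)), add = TRUE)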
• Sample variance (S² = (1/(n−1)) ∑ (Yᵢ − Ȳ)²)
E(Yᵢ − Ȳ) = E(Yᵢ) − E(Ȳ) = 0
Var(Yᵢ − Ȳ) = Var(Yᵢ) + Var(Ȳ) − 2Cov(Yᵢ, Ȳ)
            = σ² + σ²/n − 2σ²/n
            = ((n − 1)/n) σ²
E(S²) = (1/(n − 1)) ∑ Var(Yᵢ − Ȳ)
      = (1/(n − 1)) · n · ((n − 1)/n) · σ²
      = σ²
• Sample standard deviation (S = √S²)
What is the distribution of S²?
If Yᵢ Normal, then
(n − 1)S²/σ² ∼ χ²(n−1),
where n − 1 is the degrees of freedom.
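A small R simulation illustrates both facts: the average of S² is close to σ², and (n − 1)S²/σ² matches the χ²(n−1) density:

# Sampling distribution of S^2 under normal data
set.seed(514)
n <- 10; sigma2 <- 4
s2 <- replicate(10000, var(rnorm(n, mean = 0, sd = sqrt(sigma2))))
mean(s2)                                  # close to sigma2 = 4 (unbiased)
hist((n - 1) * s2 / sigma2, breaks = 50, freq = FALSE)
curve(dchisq(x, df = n - 1), add = TRUE)  # matches the chi-square(n-1) density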
Setup
• Goal: Learn about a population from (randomly) drawn data/sample
• Model and Parameter: Assume the population (Y) follows a certain model (distribution)
that depends on a set of unknown constants (parameters): Y ∼ f(y; θ)
Example: Y is the yield of a tomato plant
Y ∼ N(µ, σ²)
f(y) = (1/√(2πσ²)) exp(−(y − µ)²/(2σ²))
θ = (µ, σ²)
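As a sanity check on the density formula, R’s dnorm computes exactly this expression (the example values are arbitrary):

# N(mu, sigma^2) density by the formula vs. R's built-in dnorm
mu <- 10; sigma2 <- 4; y <- 11.5
1 / sqrt(2 * pi * sigma2) * exp(-(y - mu)^2 / (2 * sigma2))
dnorm(y, mean = mu, sd = sqrt(sigma2))   # same value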
Random sample/observations
• Random sample (conceptual)
X₁, X₂, . . . , Xₙ ∼ f(x; θ)
• Random sample (realized/actual numbers)
x₁, x₂, . . . , xₙ
Example: 0.0 4.9 -0.5 -1.2 2.1 2.8 1.2 0.8 0.9 -0.9
Populations/Samples
A parameter is the true value of some aspect of the population. (Examples: mean, median,
variance, slope)
An estimator is a
• statistic that corresponds to a parameter.
• random variable (not based on any particular data).
An estimate is a particular value of the estimator, computed from the sample data. It is
considered fixed, given the data.
Sampling from a population:
                                 Collection of possible values    Numerical Summary
What you want to know            Population                       Parameter
What you actually get to see     Sample                           Statistic
Estimator: θ̂ = g(Y₁, . . . , Yₙ)
Estimate: θ̂ = g(y₁, . . . , yₙ)
Example
Estimators for µ and σ²:
µ̂ = Ȳ = ∑ᵢ Yᵢ / n ;  σ̂² = S² = ∑ᵢ (Yᵢ − Ȳ)² / (n − 1)
Estimates (computed from the ten observations in the earlier example):
µ̂ = ȳ = 1.01 ;  σ̂² = s² = 3.49
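These numbers can be reproduced directly in R:

y <- c(0.0, 4.9, -0.5, -1.2, 2.1, 2.8, 1.2, 0.8, 0.9, -0.9)
mean(y)  # 1.01
var(y)   # 3.4943, i.e. 3.49 after rounding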
Variance vs. Bias
[Figure: dot plots of repeated estimates, contrasting random spread (variance) with systematic shift (bias)]
While variance refers to random spread, bias refers to a systematic drift in an estimator.
Most of the time we will be concerned with the variance and bias of estimators, not of
populations (so, for example, an estimate of a variance may itself be biased). Bias and
variance are inherent in a statistical method, and thus often subject to manipulation, not
in the sampling.
An estimator θ̂ of θ is unbiased if E(θ̂) = θ.
Unbiased            Biased
sample mean         sample standard deviation
sample variance     F-ratio
Degrees of Freedom of a sum is equal to the number of elements in that sum that are
independent (i.e., free to vary).
For example, if you are told the sum of three elements equals five, you only need
to know two of the three elements to know all of them.
General Result:
If Yᵢ has variance σ² and SS = ∑ (Yᵢ − Ȳ)² with k degrees of freedom, then
E(SS/k) = σ².
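A brief R simulation illustrates the table above: the sample variance is unbiased while the sample standard deviation is biased:

# S^2 is unbiased for sigma^2, but S is biased for sigma (normal data)
set.seed(514)
n <- 5; sigma <- 2
s2 <- replicate(20000, var(rnorm(n, 0, sigma)))
mean(s2)        # close to sigma^2 = 4
mean(sqrt(s2))  # noticeably below sigma = 2 (about 1.88): S is biased downward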
Sampling/Reference Distribution
Statistical inference/testing: making decisions in the presence of variability. Is the result
of an experiment easily explained by chance variation, or is it “unusual”?
• “Unusual”: Is it unlikely if only chance variation?
• Need distribution of results assuming only chance variation (null distribution)
• Compare observed result with distribution of outcomes
• Example 1: t-test (comparing two means)
– Calculate observed t-test statistic
– The t distribution summarizes outcomes under the null hypothesis
– Compare observed result with distribution
• Example 2: randomization test
– Chance variation due to randomization
– Generate all possible outcomes (each equally likely)
– Compare observed result with distribution of outcomes
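For illustration, a tiny randomization test in R (the data here are made up): with three observations per group, all 20 relabelings can be enumerated:

# Tiny randomization test for a difference in means
a <- c(5.1, 4.8, 5.6); b <- c(4.2, 4.5, 4.0)
obs  <- mean(a) - mean(b)
pool <- c(a, b)
idx  <- combn(6, 3)   # all 20 ways to assign 3 of the 6 values to group "a"
null <- apply(idx, 2, function(i) mean(pool[i]) - mean(pool[-i]))
mean(abs(null) >= abs(obs))   # randomization p-value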
Standard Normal Distribution
Take Zᵢ ∼ N(0, 1) independent, i = 1, . . . , n.
So P(a < Zᵢ < b) = ∫ₐᵇ f(z) dz, where
f(z) = (1/√(2π)) e^(−z²/2)
[Figure: standard normal density curve over (−4, 4)]
The density of Z = (Z₁, . . . , Zₙ) is
f(z) = (2π)^(−n/2) e^(−(1/2) ∑ᵢ zᵢ²)
Normal with mean µ and variance σ²
X = µ + σZ ∼ N(µ, σ²)
Standardizing
Z = (X − µ)/σ ∼ N(0, 1)
Application
Used to model random observations.
Y = µ(X) + eσ, where e ∼ N(0, 1).
Given X = x, Y ∼ N(µ(x), σ²).
Multivariate Normal
X = µ + AZ ∼ N(µ, AA′).
Z = A⁻¹(X − µ) ∼ N(0, I).
Special case: A orthogonal, A′A = AA′ = I.
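A minimal R sketch of this construction (Sigma below is an arbitrary example covariance):

# Generate X = mu + A Z with Cov(X) = A A' equal to a target Sigma
mu    <- c(1, 2)
Sigma <- matrix(c(2, 0.8, 0.8, 1), 2, 2)
A     <- t(chol(Sigma))   # lower triangular, so A %*% t(A) = Sigma
z     <- rnorm(2)         # Z ~ N(0, I)
x     <- mu + A %*% z     # one draw from N(mu, Sigma)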
SAS Code
data normal;
qtl = probit(0.975); /* to get z value for 95% confidence interval */
pval = 2*(1-probnorm(2.3)); /* to get p-value of 2-sided z-score 2.3 */
run;
proc print data = normal;
run;
Obs qtl pval
1 1.95996 0.0214
R Code
> qtl = qnorm(0.975)
> qnorm(0.975)
[1] 1.959964
> 2*(1-pnorm(qtl))
[1] 0.05
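For parity with the SAS example, the two-sided p-value for z = 2.3 in R:

2 * (1 - pnorm(2.3))   # 0.0214, matching the SAS pval above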
Chisquare distribution
Chisquare distribution on n degrees of freedom: sums of squares of standard normals,
usually arising in variance estimates.
S² = ∑ᵢ Zᵢ² ∼ χ²(n)
∑ (Yᵢ − Ȳ)²/σ² ∼ χ²(n−1)
[Figure: chisquare densities on 2, 5, 10 d.f.]
E(S²) = n, Var(S²) = 2n
• For large n, χ²(n) ≈ N(n, 2n) by the CLT.
• In SAS, use probchi(q, df) for p-values and cinv(p, df) for quantiles.
• In R, use pchisq(q, df) for p-values and qchisq(p, df) for quantiles.
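For instance, an upper-tail p-value and the matching quantile in R:

1 - pchisq(16.92, df = 9)   # upper-tail p-value, about 0.05
qchisq(0.95, df = 9)        # 95% quantile, about 16.92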
Student’s t Distribution
If X ∼ N(0, 1) and s² ∼ (1/d) χ²(d), independent, then t = X/s ∼ t(d).
Recipe: t(d) = N(0, 1)/√(indep χ²(d)/d).
[Figure: t densities on 2, 5, 10, ∞ d.f.]
small d ⇒ heavy tails
Handy theorem
Xᵢ ∼ N(µ, σ²) independent. Let
X̄ = (1/n) ∑ᵢ Xᵢ,  s² = (1/(n−1)) ∑ᵢ (Xᵢ − X̄)²
Then
X̄ ∼ N(µ, σ²/n),  s² ∼ σ²(n − 1)⁻¹ χ²(n−1),  independent.
Sample standardization
So, if Xᵢ ∼ N(µ, σ²) independent, then
t = √n (X̄ − µ)/s ∼ t(n−1)
• In SAS, use probt(q, df) for p-values and tinv(p, df) for quantiles.
• In R, use pt(q, df) for p-values and qt(p, df) for quantiles.
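A short R illustration (simulated data) comparing the hand-computed t statistic with t.test:

# One-sample t statistic by hand vs. t.test
set.seed(514)
x   <- rnorm(12, mean = 5, sd = 2)
mu0 <- 4
sqrt(length(x)) * (mean(x) - mu0) / sd(x)   # hand-computed t
t.test(x, mu = mu0)$statistic               # identical value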
Fisher’s F Distribution
If SS_N ∼ χ²(n) independent of SS_D ∼ χ²(d), then
F = ((1/n) SS_N)/((1/d) SS_D) ≡ MS_N/MS_D ∼ F(n,d)
Notes
• 1/F(n,d) ∼ F(d,n)
• F(1,d) = t²(d)
• As d → ∞, F(n,d) → χ²(n)/n.
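The second identity is easy to check numerically in R:

# F(1, d) quantiles equal squared t(d) quantiles
qf(0.95, df1 = 1, df2 = 10)   # about 4.96
qt(0.975, df = 10)^2          # same value (t is two-tailed, F upper-tailed)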
[Figure: F(2,d) densities with d ∈ {2, 5, 10, ∞}]
[Figure: F(5,d) densities with d ∈ {2, 5, 10, ∞}]
[Figure: F(10,d) densities with d ∈ {2, 5, 10, ∞}]
• In SAS, use probf(q, df1, df2) for p-values and finv(p, df1, df2) for quantiles.
• In R, use pf(q, df1, df2) for p-values and qf(p, df1, df2) for quantiles.
Noncentral Distributions
Noncentral chisquare
Xᵢ ∼ N(aᵢ, 1) independent.
C = ∑ᵢ Xᵢ² ∼ χ²(n)(φ),  φ = ∑ᵢ aᵢ²
Miracle: the distribution depends on the aᵢ only through φ.
Arises: mean square when µ ≠ 0 (i.e., under the alternative hypothesis).
• In SAS, use probchi(q, df, φ) for p-values and cinv(p, df, φ) for quantiles.
• In R, use pchisq(q, df, φ) for p-values and qchisq(p, df, φ) for quantiles.
Noncentral F
F′(n,d)(φ) = (χ²(n)(φ)/n)/(χ²(d)/d)
Arises: probability of a significant F-test under the alternative (i.e., power of the F test).
• In SAS, use probf(q, df1, df2, φ) for p-values and finv(p, df1, df2, φ) for quantiles.
• In R, use pf(q, df1, df2, φ) for p-values.
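For example, the power of a level-0.05 F test with 3 and 20 degrees of freedom at noncentrality φ = 8 (values chosen for illustration):

# Power of the F test: P(F' > cutoff) under the noncentral F
cut <- qf(0.95, df1 = 3, df2 = 20)        # central-F cutoff, about 3.10
1 - pf(cut, df1 = 3, df2 = 20, ncp = 8)   # power under this alternative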
Noncentral t
t′(d)(a₁) = N(a₁, 1)/√(χ²(d)/d)
Arises: power of the t test.
• In SAS, use probt(q, df, a₁) for p-values and tinv(p, df, a₁) for quantiles.
• In R, use pt(q, df, a₁) for p-values.
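A matching sketch for a two-sided level-0.05 t test, where a₁ plays the role of a standardized effect size:

# Power of a two-sided t test with 14 df and noncentrality a1 = 2
cut <- qt(0.975, df = 14)   # about 2.14
(1 - pt(cut, df = 14, ncp = 2)) + pt(-cut, df = 14, ncp = 2)   # power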
Doubly noncentral F
F′′(n,d)(φ_n, φ_d) = (χ²(n)(φ_n)/n)/(χ²(d)(φ_d)/d)
For power when the error mean square is corrupted.
Noncentral distributions
Widely tabled
Watch parameterization closely
See
1. Encyclopedia of the statistical sciences
2. Johnson and Kotz’s books on distributions
3. the Web
Review on Finding p-values
Building block for p-values: cumulative distribution function (cdf)
• Let X be a random variable.
• The cdf for X is a function of x such that cdf(x) = P(X < x).
• Note: P(X > x) = 1 −P(X < x)
• Examples of functions which evaluate cdf for different distributions: probnorm, probchi,
probt, probf.
Figure 1: Example of cumulative distribution (for standard normal random variable)
1-sided p-values
• Used in F-tests and in t-tests with > or < alternatives (not ≠).
• Procedure: Get test statistic u.
• If the alternative hypothesis is θ > 0, then usually the p-value is P(X > u).
• Example
– If the alternative hypothesis is µ₁ − µ₂ > 0 and the t statistic is 2.5 with 14 degrees of
freedom, the p-value is 1 − P(t(14) < 2.5) = 0.0127.
2-sided p-values
• Used in t-tests with ≠ alternative.
• Procedure: Get test statistic u.
• If the alternative hypothesis is θ ≠ 0, then usually the p-value is P(|X| > |u|) =
2(1 − cdf(|u|)).
• Example
– If the alternative hypothesis is µ₁ − µ₂ ≠ 0 and the t statistic is 2.5 with 14 degrees of
freedom, the p-value is 2(1 − P(t(14) < 2.5)) = 0.0255.
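Both example p-values can be reproduced in R:

1 - pt(2.5, df = 14)        # 1-sided p-value: 0.0127
2 * (1 - pt(2.5, df = 14))  # 2-sided p-value: 0.0255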
[Figure: density of t(14), with both tails shaded]
Figure 2: Two-sided p-value corresponds to area in both shaded regions
• Notes
– Most of the time, SAS output will report the p-value associated with a particular
test. However, this will not always be true.
– Sometimes a test statistic is so large that the p-value is reported as 0. This, of
course, is not true; in this case you should report that the p-value is too small
to be determined numerically.
Cutoff values/rejection regions
• Often, we report a cutoff value for a particular test at a particular α level.
• Test statistics that are larger (usually) than the cutoff value are in the rejection region,
so called because being in the rejection region means that the null hypothesis is rejected.
• The main function for finding cutoffs is the quantile function, which takes as its input
a probability. (For 1-sided tests, the input is usually 1 − α; for 2-sided tests, the input
is usually 1 − α/2.)
• The quantile function is the inverse of the cdf.
• Examples of quantile functions in SAS include probit, cinv, tinv, and finv.
• Example
– Suppose you are running an F-test at level 0.05 with 3 and 35 degrees of freedom.
– The cutoff for this test is 2.874 (the 95% quantile of an F distribution with 3 and
35 degrees of freedom); thus, F-ratios greater than 2.874 will result in the null
hypothesis being rejected.
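In R, this cutoff is:

qf(0.95, df1 = 3, df2 = 35)   # 2.874, the level-0.05 cutoff with 3 and 35 df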
