
Statistical Methods for Evaluation

Wilcoxon Rank-Sum Test


ANOVA, t-test
Week 5
Parametric vs. Non-parametric tests
Hypothesis Testing: Wilcoxon Rank Sum Test
▪ Wilcoxon with n1 and n2 both < 10, or both ≥ 10

▪ When we test a hypothesis about the difference between two independent population means, we do so using the difference between two sample means.

▪ When the two sample variances are tested and found not to be equal:
• we cannot pool the sample variances,
• so we cannot use the t-test for independent samples. Instead, we use the Wilcoxon Rank Sum Test.
[Figure: Population 1 (mean µ1) and Population 2 (mean µ2), with Sample 1 and Sample 2 drawn from them (sample means X̄1 and X̄2). The sample mean tells us about µ, and µ tells us about the population.]
Wilcoxon Rank Sum Test
The Z test and the t test are “parametric tests”: they answer a question about the difference between populations by comparing sample statistics (e.g., X̄1 and X̄2) and making an inference about the population parameters (μ1 and μ2).

The Wilcoxon test, in contrast, allows inferences about whole populations.

[Figure: Distribution A and Distribution B plotted on the same X axis, each with its mean μ marked. Note that distribution B is shifted to the right of distribution A.]
Small samples, independent groups
Wilcoxon Rank Sum Test
• First, combine the two samples and rank all the observations in order
• The smallest number has rank 1 and the largest number has rank N (= n1 + n2); tied observations each receive the average of the ranks they occupy
• Separate the samples again and add up the ranks for the smaller sample (if n1 = n2, choose either one)
• Test statistic: the rank sum T for the smaller sample
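The ranking steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not code from the slides: the helper names `average_ranks`, `rank_sums` and `wilcoxon_T` are hypothetical, and ties receive the average of the ranks they occupy, which is how the worked examples assign ranks.

```python
# Hypothetical helpers illustrating the rank sum procedure; not from the slides.

def average_ranks(values):
    """Rank values from smallest (rank 1) to largest (rank N).

    Tied values each receive the average of the ranks they occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end hold ranks start+1..end+1; share their average.
        shared_rank = (start + end + 2) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def rank_sums(sample_a, sample_b):
    """Combine both samples, rank all observations, return (T_a, T_b)."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a = len(sample_a)
    return sum(ranks[:n_a]), sum(ranks[n_a:])


def wilcoxon_T(sample_a, sample_b):
    """Rank sum of the smaller sample (sample_a when the sizes are equal)."""
    t_a, t_b = rank_sums(sample_a, sample_b)
    return t_a if len(sample_a) <= len(sample_b) else t_b


cajun = [3500, 4200, 4100, 4700, 4200, 3705, 4100]
creole = [3100, 4700, 2700, 3500, 2000, 3100, 1550]
print(rank_sums(cajun, creole))  # (70.0, 35.0)
```

With unequal sample sizes, `wilcoxon_T` returns the rank sum of the smaller sample, matching the rule that T is computed for the smaller sample.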
Wilcoxon – Two-tailed Hypotheses
H0: Prob. distributions for 2 sampled populations are identical.
HA: Prob. distribution for Population A is shifted to the right or left of the distribution for Population B.
Wilcoxon – Rejection region:
(With the sample taken from Population A being smaller than the sample from Population B) reject H0 if
TA ≥ TU or TA ≤ TL,
where TL and TU are the lower and upper critical values from the Wilcoxon rank sum table for the given sample sizes and α.
Wilcoxon for n1 ≥ 10 and n2 ≥ 10:

Test statistic (normal approximation, with TA the rank sum of the sample of size n1):
$$Z = \frac{T_A - \frac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$

Rejection region:
One-tailed: Z > Zα
Two-tailed: |Z| > Zα/2

Note: use this only when n1 ≥ 10 and n2 ≥ 10.
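A minimal Python sketch of the large-sample rule, assuming the standard normal approximation in which the rank sum TA of the sample of size n1 has mean n1(n1 + n2 + 1)/2 and variance n1·n2·(n1 + n2 + 1)/12. The helper names are hypothetical, the rank sum in the usage lines (T = 130 with n1 = n2 = 10) is made up for illustration, and 1.645 and 1.96 are the usual standard normal critical values for α = .05 one-tailed and two-tailed.

```python
import math

# Hypothetical helpers; the formula is the standard normal approximation.


def rank_sum_z(t_a, n1, n2):
    """Z statistic for the rank sum T_A of the sample of size n1."""
    if n1 < 10 or n2 < 10:
        raise ValueError("use the normal approximation only when n1 >= 10 and n2 >= 10")
    mean_t = n1 * (n1 + n2 + 1) / 2
    sd_t = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (t_a - mean_t) / sd_t


def reject_h0(z, z_crit, two_tailed=True):
    """Two-tailed: |Z| > Z_alpha/2. One-tailed (shift to the right): Z > Z_alpha."""
    return abs(z) > z_crit if two_tailed else z > z_crit


# Hypothetical data: n1 = n2 = 10 and T_A = 130.
z = rank_sum_z(130, 10, 10)
print(f"{z:.2f}")                             # 1.89
print(reject_h0(z, 1.645, two_tailed=False))  # True  (one-tailed, alpha = .05)
print(reject_h0(z, 1.96))                     # False (two-tailed, alpha = .05)
```

The same Z can lead to different decisions depending on whether the alternative is one- or two-tailed, which is why the hypotheses are fixed before the rejection region is chosen.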

Wilcoxon
Example 1
- These are small samples, and they are independent (“random
samples of Cajun and Creole dishes”)
- Therefore, we must begin with the test of equality of variances
Cajun Creole
3500 3100
4200 4700
4100 2700
4700 3500
4200 2000
3705 3100
4100 1550
Test of hypothesis of equal variances
H0: $\sigma_1^2 = \sigma_2^2$
HA: $\sigma_1^2 \neq \sigma_2^2$

Test statistic: $F = \dfrac{S_1^2}{S_2^2}$

Rej. region: F > Fα/2 = F(6,6,.025) = 5.82 (use a table, or Excel: =F.INV.RT(prob,df1,df2))
or F < (1/5.82) = .172
S²Cajun = (385.27)² = 148432.14
S²Creole = (1027.54)² = 1055833.33

▪ If F-value < F-critical >>>> Don’t reject Null
▪ If F-value > F-critical >>>> Reject Null

Putting the larger variance in the numerator:
$F_{obt} = \dfrac{S^2_{Creole}}{S^2_{Cajun}} = \dfrac{1055833.33}{148432.14} = 7.11 > 5.82$

Reject H0: the variances are not equal, so we do the Wilcoxon rank sum test.
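The variance-ratio test above can be checked with Python's standard `statistics` module, which uses the n − 1 divisor for sample variances. This is a sketch, not part of the slides: `variance_ratio` and `reject_equal_variances` are hypothetical helper names, and because the standard library has no F quantile function, the critical values (5.82 and 1/5.82 for F(6,6,.025)) are passed in from the F table.

```python
import statistics

# Hypothetical helpers; critical F values come from a table, not from this code.


def variance_ratio(sample_1, sample_2):
    """F = S1^2 / S2^2 using sample variances (divisor n - 1)."""
    return statistics.variance(sample_1) / statistics.variance(sample_2)


def reject_equal_variances(f, f_lower, f_upper):
    """Two-tailed rule: reject H0 (equal variances) if F > f_upper or F < f_lower."""
    return f > f_upper or f < f_lower


cajun = [3500, 4200, 4100, 4700, 4200, 3705, 4100]
creole = [3100, 4700, 2700, 3500, 2000, 3100, 1550]

# Larger variance (Creole) in the numerator, so F > 1.
f_obt = variance_ratio(creole, cajun)
print(round(f_obt, 2))                                # 7.11
print(reject_equal_variances(f_obt, 1 / 5.82, 5.82))  # True
```

Taking both bounds as arguments keeps the helper usable when the two samples have different degrees of freedom, where the lower bound is not simply 1 divided by the upper one.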
Example 1 – Wilcoxon Rank Sum Test
H0: Prob. distributions for Cajun and Creole populations are
identical
HA: Prob. distribution for Cajun is shifted to right of distribution for
Creole

Statistical test: T
Rejection region:
Reject H0 if TCajun > 66 (or if TCreole < 39)

(Note: We shall give lower heat values lower rank values)


Cajun   Rank   Creole   Rank
3500    6.5    3100     4.5
4200    11.5   4700     13.5
4100    9.5    2700     3
4700    13.5   3500     6.5
4200    11.5   2000     2
3705    8      3100     4.5
4100    9.5    1550     1
Σ       70              35
Calculation check:
The sum of all the ranks should equal $\dfrac{n(n+1)}{2}$, with n = n1 + n2 = 14:
$70 + 35 = 105 = \dfrac{(14)(15)}{2}$
TCajun = 70 > 66 (and TCreole = 35 < 39)

Therefore, reject H0: Cajun dishes are significantly hotter than Creole dishes.

Example 2 – Wilcoxon Rank Sum Test
H0: $\sigma_1^2 = \sigma_2^2$
HA: $\sigma_1^2 \neq \sigma_2^2$

Test statistic: $F = \dfrac{S_1^2}{S_2^2}$

Rej. region: F > Fα/2 = F(7,8,.025) = 4.53
or F < 1/F(8,7,.025) = (1/4.90) = .204

Sample 1   Sample 2
6.4        2.7
1.7        3.9
3.2        4.6
5.9        3.0
2.0        3.4
3.6        4.1
5.4        3.4
7.2        4.7
           3.8
$F_{obt} = \dfrac{4.316}{0.46} = 9.38 > 4.53$

Reject H0 – do Wilcoxon
H0: Prob. distributions for females and males populations are
identical.
HA: Prob. distribution for females is shifted to left of distribution for
males.

Statistical test: T
Rejection region:
T > TU = 90 (or T < TL = 54)
Sample 1   Rank   Sample 2   Rank
6.4        16     2.7        3
1.7        1      3.9        10
3.2        5      4.6        12
5.9        15     3.0        4
2.0        2      3.4        6.5
3.6        8      4.1        11
5.4        14     3.4        6.5
7.2        17     4.7        13
                  3.8        9
Σ          78                75
T = 78 < TU = 90 (and T = 78 > TL = 54)

Therefore, do not reject H0: there is no evidence that the distribution of distances for females is shifted to the left of that for males.

Example 3 – Wilcoxon Rank Sum Test
H0: $\sigma_1^2 = \sigma_2^2$
HA: $\sigma_1^2 \neq \sigma_2^2$

Test statistic: $F = \dfrac{S_1^2}{S_2^2}$

Rej. region: F > Fα/2 = F(5,5,.025) = 7.15
or F < (1/7.15) = .140

Hoodoo   Mukluk
2        6
6        8
4        7
23       10
7        8
6        4
$F_{obt} = \dfrac{(7.563)^2}{(2.04)^2} = \dfrac{57.20}{4.16} \approx 13.7 > 7.15$

Rej. region: F > Fα/2 = F(5,5,.025) = 7.15

Reject H0 – do Wilcoxon
H0: Prob. distributions for Hoodoo and Mukluk populations are identical
HA: Prob. distribution for Hoodoos is shifted to right or left of distribution for Mukluks.

Statistical test: T
Rejection region: TH ≥ TU = 52 or TH ≤ TL = 26

Hoodoo   Rank   Mukluk   Rank
2        1      6        5
6        5      8        9.5
4        2.5    7        7.5
23       12     10       11
7        7.5    8        9.5
6        5      4        2.5
Σ        33              45

Check: $T_H + T_M = 33 + 45 = 78 = \dfrac{(12)(13)}{2}$

TH = 33 > 26 and < 52

Do not reject H0: no evidence for a significant difference between teams.
Hypothesis Testing, Student’s t-test
▪ Field 1 and Field 2
▪ Took a sample from Field 1 and Field 2
▪ Can you tell which one has the higher yield?
▪ Null hypothesis: There is no statistically significant difference between the samples.
▪ Decision:
▪ If t-value < t-critical >>>> Don’t reject Null
▪ If t-value > t-critical >>>> Reject Null

[Table: yield samples from Field 1 and Field 2, 32 values in total, listed in extracted order: 15.2, 15.9, 15.3, 15.9, 16, 15.2, 15.8, 16.6, 15.6, 15.2, 14.9, 15.8, 15, 15.8, 15.4, 15.6, 16.2, 15.6, 15.7, 15.6, 15.5, 15.8, 15.2, 15.5, 15.5, 15.1, 15.5, 15.5, 15.3, 15, 14.9, 15.9]
Student’s t-test
[Figure: t-table]

Dof = n1 + n2 − 2

Null hypothesis:
There is no statistically significant difference between the samples.
▪ t-value > t-critical
▪ 2.3 > 2.04
▪ That means there is a statistically significant difference between the samples, so we reject the null hypothesis.
[Figure: t-table]

Student’s t-test
Null hypothesis:
There is no statistically significant difference between the samples.
▪ t-value < t-critical ??
▪ t-value > t-critical ??
▪ Decision:
▪ If t-value < t-critical >>>> Don’t reject Null
▪ If t-value > t-critical >>>> Reject Null

           Plant 1    Plant 2
           38         32
           52         39
           48         54
           25         47
           39         41
           51         34
           46         30
           55         36
           46         36
           53         38
           45         46
           42         58
           54         52
           65         29
           56         76
           67         46
           74         46
           32         45
           43         78
           34         80
           45         40
Mean       48.09524   46.80952
STD        11.77669   15.11826
Variance   138.6905   228.5619
n          21         21
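As a hedged sketch, the Student's t statistic with Dof = n1 + n2 − 2 can be computed with Python's `statistics` module. The pooled-variance formula used here is the standard one for two independent samples with equal population variances; the slides do not show it, so treat it as an assumption. `pooled_t_test` is a hypothetical helper name, and the two lists reproduce the Plant 1 and Plant 2 data above.

```python
import math
import statistics

# Hypothetical helper; assumes equal population variances (pooled t-test).


def pooled_t_test(sample_1, sample_2):
    """Return (t, dof) for two independent samples, with dof = n1 + n2 - 2."""
    n1, n2 = len(sample_1), len(sample_2)
    dof = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * statistics.variance(sample_1)
           + (n2 - 1) * statistics.variance(sample_2)) / dof
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample_1) - statistics.mean(sample_2)) / standard_error
    return t, dof


plant_1 = [38, 52, 48, 25, 39, 51, 46, 55, 46, 53, 45,
           42, 54, 65, 56, 67, 74, 32, 43, 34, 45]
plant_2 = [32, 39, 54, 47, 41, 34, 30, 36, 36, 38, 46,
           58, 52, 29, 76, 46, 46, 45, 78, 80, 40]

t_value, dof = pooled_t_test(plant_1, plant_2)
print(f"t = {t_value:.3f}, Dof = {dof}")
```

Compare the printed t-value with t-critical from the t-table for the printed Dof, then apply the decision rule above.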
Analysis of variance (ANOVA)
Group 1   Group 2   Group 3
1         2         2
2         4         3
5         2         4

Step 1:
Null hypothesis:
There is no difference between means
µ1 = µ2 = µ3
Alternative:
At least one of the means differs from the others
Alpha = 0.05

Step 2
Find the critical F-value
- Dof (between, numerator) = k − 1 = 3 − 1 = 2, where k is the number of conditions (groups)
- Dof (within, denominator) = N − k = 9 − 3 = 6, where N is the total number of scores in the sample
- Dof (total) = N − 1 = 8
- F-critical = 5.14 (from table)
ANOVA
Rubber supplier 1   Rubber supplier 2   Rubber supplier 3
1                   2                   2
2                   4                   3
5                   2                   4
Step 3:
Analysis of sum of squares (total variability)
Mean for each condition/group:
- Mean x1 = 2.67
- Mean x2 = 2.67
- Mean x3 = 3.00

Grand mean (G) = Σx/N = (1+2+5+2+4+2+2+3+4)/9 = 25/9
G = 2.78

SS (total) = Sum of squares (total)
$SS_{total} = \sum (x - G)^2$
$= (1-2.78)^2 + (2-2.78)^2 + (5-2.78)^2 + (2-2.78)^2 + (4-2.78)^2 + (2-2.78)^2 + (2-2.78)^2 + (3-2.78)^2 + (4-2.78)^2$
$\approx 13.556$ (exactly 122/9)

SS (within) = Sum of squares (within)
$SS_{within} = \sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2 + \sum (x_3 - \bar{x}_3)^2$
$= (1-2.67)^2 + (2-2.67)^2 + (5-2.67)^2 + (2-2.67)^2 + (4-2.67)^2 + (2-2.67)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2$
$\approx 13.333$ (exactly 120/9 = 40/3)

SS (between) = SS (total) − SS (within)
$SS_{between} = \dfrac{122}{9} - \dfrac{120}{9} = \dfrac{2}{9} \approx 0.222$

Step 4:
Variance (between) = mean square MS (between)
= SS (between)/Dof (between)
= 0.222/2
= 0.111
Variance (within) = mean square MS (within)
= SS (within)/Dof (within)
= 13.333/6
= 2.222

Step 5:
F-value = MS (between)/MS (within)
= 0.111/2.222
≈ 0.05
F-critical = 5.14
Remember!!!
F-value < F-critical
0.05 < 5.14
Conclusion:
We fail to reject the null hypothesis (there is no difference between means, µ1 = µ2 = µ3).

▪ Decision:
▪ If F-value < F-critical >>>> Don’t reject Null
▪ If F-value > F-critical >>>> Reject Null
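The arithmetic in Steps 3 to 5 can be reproduced with a short plain-Python sketch (no libraries). `one_way_anova` is a hypothetical helper name, not from the slides. It computes SS (total) and SS (within) directly from the scores, obtains SS (between) by subtraction as in Step 3, and keeps full precision instead of rounding the group means. F-critical still comes from the F table (5.14 for Dof 2 and 6 at α = 0.05).

```python
# Hypothetical helper reproducing Steps 3 to 5 of a one-way ANOVA.


def one_way_anova(*groups):
    """Return sums of squares, (Dof between, Dof within) and the F-value."""
    scores = [x for group in groups for x in group]
    n_total, k = len(scores), len(groups)
    grand_mean = sum(scores) / n_total

    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    # Each score's squared deviation from its own group mean.
    ss_within = sum(
        sum((x - sum(group) / len(group)) ** 2 for x in group) for group in groups
    )
    ss_between = ss_total - ss_within

    dof_between, dof_within = k - 1, n_total - k
    ms_between = ss_between / dof_between
    ms_within = ss_within / dof_within
    return {
        "ss_total": ss_total,
        "ss_within": ss_within,
        "ss_between": ss_between,
        "dof": (dof_between, dof_within),
        "F": ms_between / ms_within,
    }


result = one_way_anova([1, 2, 5], [2, 4, 2], [2, 3, 4])
print(result["dof"])             # (2, 6)
print(f"F = {result['F']:.3f}")  # F = 0.050
```

Since the F-value is far below F-critical = 5.14, the decision matches the slide: fail to reject the null hypothesis.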
Practicing ANOVA

Machine supplier 1   Machine supplier 2   Machine supplier 3
30                   19                   60
34                   22                   51
39                   26                   53
