NOTE: The assumption of normality is only valid when both $n_1, n_2 > 8$.

Step 4 The sampling distribution of U is symmetrical and has a mean and variance given respectively by the formulas $\mu_U = N_1 N_2 / 2$ and $\sigma_U^2 = N_1 N_2 (N_1 + N_2 + 1)/12$, which need to be calculated.

Step 5 We compute the Z value as $Z = (U - \mu_U)/\sigma_U$, compare it with the relevant Z table and draw the conclusion.

2.2 Example

Given the following data about the strength of the cables made from two different alloys, I and II, determine using the Mann-Whitney U Test whether there is a significant difference between the strength of the cables made from alloy I and alloy II.

| Alloy I | Alloy II |
|---------|----------|
| 18.3    | 12.6     |
| 16.4    | 14.1     |
| 22.7    | 20.5     |
| 17.8    | 10.7     |
| 18.9    | 15.9     |
| 25.3    | 19.6     |
| 16.1    | 12.9     |
| 24.2    | 15.2     |
|         | 11.8     |
|         | 14.7     |

Solution We organise the data into an array starting from the smallest to the largest and give ranks from 1 to 18 as follows:

| Alloy I | Rank |
|---------|------|
| 18.3    | 12   |
| 16.4    | 10   |
| 22.7    | 16   |
| 17.8    | 11   |
| 18.9    | 13   |
| 25.3    | 18   |
| 16.1    | 9    |
| 24.2    | 17   |
| Total   | 106  |

Computing the sum of ranks for Alloy II we have:

| Alloy II | Rank |
|----------|------|
| 12.6     | 3    |
| 14.1     | 5    |
| 20.5     | 15   |
| 10.7     | 1    |
| 15.9     | 8    |
| 19.6     | 14   |
| 12.9     | 4    |
| 15.2     | 7    |
| 11.8     | 2    |
| 14.7     | 6    |
| Total    | 65   |

Since the alloy I samples have the smaller sample size $N_1 = 8$, we assign $R_1 = 106$ and $R_2 = 65$. Then we have:

$$U = (8)(10) + \frac{8(8 + 1)}{2} - 106 = 10$$

We now compute the mean as $\mu_U = N_1 N_2 / 2 = (8)(10)/2 = 40$ and the variance as $\sigma_U^2 = N_1 N_2 (N_1 + N_2 + 1)/12 = (8)(10)(8 + 10 + 1)/12 = 126.67$. Then the standard deviation is $\sigma_U = \sqrt{126.67} = 11.25$.

Now computing the Z statistic we have $Z = (U - \mu_U)/\sigma_U = (10 - 40)/11.25 = -2.67$.
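The arithmetic of this example can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the variable names are ours, the ranking shortcut relies on the data having no ties, and the closing comparison with 1.96 assumes a two-sided test at the 5% level, which the example itself does not state.

```python
import math

alloy_1 = [18.3, 16.4, 22.7, 17.8, 18.9, 25.3, 16.1, 24.2]
alloy_2 = [12.6, 14.1, 20.5, 10.7, 15.9, 19.6, 12.9, 15.2, 11.8, 14.7]

# Rank the pooled observations from smallest (rank 1) to largest.
# This data set has no ties, so positional ranks are sufficient.
pooled = sorted(alloy_1 + alloy_2)
rank_of = {value: position for position, value in enumerate(pooled, start=1)}

r1 = sum(rank_of[v] for v in alloy_1)  # rank sum of the smaller sample
r2 = sum(rank_of[v] for v in alloy_2)
n1, n2 = len(alloy_1), len(alloy_2)

# U statistic, its mean and standard deviation, and the Z value.
u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
mu_u = n1 * n2 / 2
sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mu_u) / sigma_u

# Consistency check: the two rank sums must add up to N(N + 1)/2.
assert r1 + r2 == (n1 + n2) * (n1 + n2 + 1) // 2

print(f"R1 = {r1}, R2 = {r2}, U = {u:.0f}")
print(f"mu_U = {mu_u:.0f}, sigma_U = {sigma_u:.2f}, Z = {z:.2f}")

# Assumption: two-sided test at the 5% level (critical value 1.96).
print("significant at 5%" if abs(z) > 1.96 else "not significant at 5%")
```

Under that assumed significance level, |Z| = 2.67 exceeds 1.96, so the sketch reports a significant difference between the two alloys.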
$$U = N_1 N_2 + \frac{N_1 (N_1 + 1)}{2} - R_1$$

and its standardised value $Z = (U - \mu_U)/\sigma_U$ follows N(0,1). The sampling distribution of U is symmetrical and has a mean and variance given respectively by the formulas $\mu_U = N_1 N_2 / 2$ and $\sigma_U^2 = N_1 N_2 (N_1 + N_2 + 1)/12$.

2. The Mann-Whitney U test should be avoided if $N_1$ or $N_2$ is ≤ 8. In such a situation it is better to use the t-test.

3. $U_1 + U_2 = N_1 N_2$ and $R_1 + R_2 = (N_1 + N_2)(N_1 + N_2 + 1)/2$. These provide a check on the correctness of the calculation.

$$H_C = H/C \quad \text{where} \quad C = 1 - \frac{\sum (T^3 - T)}{N^3 - N}$$

where T is the number of ties corresponding to each observation and where the sum is taken over all the observations. If there are no ties then T = 0 and C reduces to 1, so that no correction is needed. In practice, the correction is usually negligible (i.e. not enough to warrant a change in the decision).

The H test provides a non-parametric method in the analysis of variance for one-way classification, or one-factor experiments, and generalisations can be made.

4 The Sign Test

4.1 Sign test for small samples

Example Use the sign test to see if there is a difference between the number of days until the collection of accounts receivable before and after a new collection policy is implemented. Use the 0.05 level of significance.
| Before | After | Sign of (1st − 2nd) |
|--------|-------|---------------------|
| 30     | 32    | −                   |
| 28     | 29    | −                   |
| 34     | 33    | +                   |
| 35     | 32    | +                   |
| 40     | 37    | +                   |
| 42     | 43    | −                   |
| 33     | 40    | −                   |
| 38     | 41    | −                   |
| 34     | 37    | −                   |
| 45     | 44    | +                   |
| 28     | 27    | +                   |
| 27     | 33    | −                   |
| 25     | 30    | −                   |
| 41     | 38    | +                   |
| 36     | 36    | 0                   |

Solution The sign of the difference for each pair is shown along with the raw data in the table above.

From the above we see that there are 8 (−)ve signs, 6 (+)ve signs and 1 zero. As per the convention we drop the pair giving rise to zero. Then n = 15 − 1 = 14 and S = 6, as the (+)ve sign is less frequent. Calculating the value of K we have:

$$K = \frac{14 - 1}{2} - (0.98)\sqrt{14} = 6.50 - 3.67 = 2.83$$

Example The following data relate to the daily production of cement (in million tons) of a cement plant for 30 days. Use the sign test to test the null hypothesis that the plant's daily average production of cement is 11.2 million tons against the alternative hypothesis that it is less than 11.2 million tons at the 0.05 level of significance:

11.5 10.0 11.2 10.0 12.3
11.1 10.2 9.6 8.7 9.3
9.3 10.7 11.3 10.4 11.4
12.3 11.4 10.2 11.6 9.5
10.8 11.9 12.4 9.6 10.5
11.6 8.3 9.3 10.4 11.5

Solution Putting + or − signs after comparing with 11.2 we have:

+ − 0 − +
− − − − −
− − + − +
+ + − + −
− + + − −
+ − − − +

The number of plus signs: 11
The number of minus signs: 18
Number of zeros: 1
Total Sample Size: 30

Hence we have the following: X = 11, n = 29 and p = 1/2. Substituting the values in the formula we have:
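Both sign-test calculations above can be checked with a minimal Python sketch. Several things here are assumptions rather than part of the text: the helper name `count_signs`, the small-sample decision rule "reject the null hypothesis when S ≤ K", and the large-sample statistic Z = (X − np)/√(np(1 − p)) without a continuity correction, compared against the one-tailed 5% critical value −1.645.

```python
import math

def count_signs(values, reference):
    """Return (plus, minus, zeros) after comparing each value with reference."""
    plus = sum(1 for v in values if v > reference)
    minus = sum(1 for v in values if v < reference)
    zeros = sum(1 for v in values if v == reference)
    return plus, minus, zeros

# Small-sample example: days to collection before and after the new policy.
before = [30, 28, 34, 35, 40, 42, 33, 38, 34, 45, 28, 27, 25, 41, 36]
after = [32, 29, 33, 32, 37, 43, 40, 41, 37, 44, 27, 33, 30, 38, 36]
diffs = [b - a for b, a in zip(before, after)]

plus, minus, zeros = count_signs(diffs, 0)
n_small = plus + minus            # zero differences are dropped
s = min(plus, minus)              # count of the less frequent sign
k = (n_small - 1) / 2 - 0.98 * math.sqrt(n_small)

print(f"n = {n_small}, S = {s}, K = {k:.2f}")
# Assumed decision rule: reject the null hypothesis only when S <= K.
print("reject H0" if s <= k else "do not reject H0")

# Large-sample example: daily cement production against 11.2 million tons.
production = [
    11.5, 10.0, 11.2, 10.0, 12.3, 11.1, 10.2, 9.6, 8.7, 9.3,
    9.3, 10.7, 11.3, 10.4, 11.4, 12.3, 11.4, 10.2, 11.6, 9.5,
    10.8, 11.9, 12.4, 9.6, 10.5, 11.6, 8.3, 9.3, 10.4, 11.5,
]
c_plus, c_minus, c_zeros = count_signs(production, 11.2)
n_large = c_plus + c_minus
p = 0.5

# Assumed normal approximation to the binomial, no continuity correction.
z = (c_plus - n_large * p) / math.sqrt(n_large * p * (1 - p))
print(f"X = {c_plus}, n = {n_large}, Z = {z:.2f}")
# Assumed one-tailed 5% critical value for the "less than" alternative.
print("reject H0" if z < -1.645 else "do not reject H0")
```

Under these assumptions neither example rejects the null hypothesis: S = 6 is larger than K = 2.83, and Z is about −1.30, which is not below −1.645.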
2. It is one of the few tests that can be employed when the only information available is that one observation exceeds another or vice versa.

3. According to some statisticians, the sign test should always be used with caution, as rejection or non-rejection depends on random pairing. In other words, one pairing of the same data may lead to rejection whereas another may lead to acceptance.

5 Wilcoxon Signed-Rank T Test

This test is used for testing dependent samples in which data are collected in matched pairs. It takes into account both the direction of the differences within a pair of observations and the relative magnitude of the differences. It gives more weight to pairs showing large differences than to pairs showing small differences. To use this test, measurements must be at least ordinally scaled within pairs.

For the Wilcoxon Signed-Rank Test, the basic assumption is that the samples are from the same population. If this assumption is true, then the differences between the pairs (either + or −) should be symmetrically distributed around a central value.

Assume that there are N pairs of values $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$. Let the null hypothesis $H_0$ be that the N pairs of observations have been drawn from identical populations. Compute the list of differences $\delta_j = x_j - y_j$. Next, sort the absolute values of the differences $\{|\delta_j|\}$ into ascending order and rank them. Add up the ranks assigned to the positive differences and call it $W^+$. Similarly, find the sum of the ranks assigned to the negative differences and call it $W^-$.

For N > 25 the following test statistic is applicable:

$$z = \frac{T - \mu_T}{\sigma_T} \quad \text{follows } N(0,1)$$

where

$$\mu_T = \frac{N(N + 1)}{4}, \qquad \sigma_T = \sqrt{\frac{N(N + 1)(2N + 1)}{24}}, \qquad T = \min\{W^+, W^-\}$$

6 Wald Wolfowitz Run Test

The Wald Wolfowitz Run test is a non-parametric test of the null hypothesis that the distribution functions of two continuous populations are the same.

Suppose $x_1, x_2, x_3, \ldots, x_{n_1}$ is an ordered sample from a population with density function $f_1(\cdot)$ and let $y_1, y_2, y_3, \ldots, y_{n_2}$ be an independent ordered sample from another population with density function $f_2(\cdot)$. What we want to test is whether the samples have been drawn from the same population or from populations with the same density functions, i.e. $f_1(\cdot) = f_2(\cdot)$. Let us combine the two samples and arrange the observations in order of magnitude to give the combined ordered sample, for example $x_1 x_2 x_3 y_1 y_2 y_3 y_4 x_4 x_5 \ldots$

A run is defined as a sequence of letters of one kind surrounded by sequences of letters of the other kind, and the number of elements in a run is usually referred to as the length (l) of the run. In the above example we have, in order, a run of x (l = 3), a run of y (l = 4), a run of x (l = 2) etc.

If both samples come from the same population then there would be a thorough mingling of x's and y's, and consequently the number of runs in the combined sample would be large. On the other hand, if the samples come from two different populations whose ranges do not overlap, then there would be only two runs, of the type $x_1, x_2, x_3, \ldots, x_{n_1}$ and $y_1, y_2, y_3, \ldots, y_{n_2}$.

In order to test the null hypothesis $H_0: f_1(\cdot) = f_2(\cdot)$, i.e. that the samples have come from the same population, we count the number of runs (U) in the combined ordered sample. The null hypothesis is rejected if $U < u_0$, where the value of $u_0$ is determined from the distribution of U under $H_0$.

Given that $n_1, n_2$ are the numbers of observations of x and y respectively, under the null hypothesis we have:

$$E(U) = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$

$$Var(U) = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$$

and we can use the normal test statistic

$$Z = \frac{U - E(U)}{\sqrt{Var(U)}}.$$
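As a rough illustration of the run count and the formulas for E(U), Var(U) and Z, the following Python sketch applies them to the alloy strength data from the Mann-Whitney example. The function names and the labels "x" and "y" are ours, and since the alloy I sample has only 8 observations the normal approximation is shown purely to illustrate the arithmetic.

```python
import math

def count_runs(labels):
    """Count maximal blocks of identical consecutive labels."""
    return sum(
        1 for i, label in enumerate(labels)
        if i == 0 or label != labels[i - 1]
    )

def runs_test(labels):
    """Return (U, E(U), Var(U), Z) for a sequence made of two kinds of label."""
    kinds = sorted(set(labels))
    if len(kinds) != 2:
        raise ValueError("exactly two kinds of label are required")
    n1 = labels.count(kinds[0])
    n2 = labels.count(kinds[1])
    u = count_runs(labels)
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (
        2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
        / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    )
    z = (u - expected) / math.sqrt(variance)
    return u, expected, variance, z

alloy_1 = [18.3, 16.4, 22.7, 17.8, 18.9, 25.3, 16.1, 24.2]
alloy_2 = [12.6, 14.1, 20.5, 10.7, 15.9, 19.6, 12.9, 15.2, 11.8, 14.7]

# Combine the samples, order by magnitude and keep only the labels.
combined = sorted([(v, "x") for v in alloy_1] + [(v, "y") for v in alloy_2])
labels = [label for _, label in combined]

u, expected, variance, z = runs_test(labels)
print("".join(labels))
print(f"U = {u}, E(U) = {expected:.2f}, Var(U) = {variance:.2f}, Z = {z:.2f}")
```

For this data the ordered labels form only four runs (eight y, five x, two y, three x), well below the expected number of about 9.89, which gives a Z value of about −2.90.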
Z follows N(0, 1) asymptotically. This approximation is fairly good if $n_1, n_2 \ge 10$. Since the alternative hypothesis is 'too few runs', the test is ordinarily one-tailed, with only negative values of Z leading to rejection. The test has very low power; its relative efficiency compared to the traditional t test for equal variances is zero. Furthermore, it has the least power compared to other non-parametric tests applied to the same data.

7 Run Test for Randomness

Another application of the 'run' test is in testing the randomness of a given set of observations. Let $x_1, x_2, x_3, \ldots, x_n$ be the set of observations arranged in the order in which they occur, i.e. $x_i$ is the ith observation in the outcome of an experiment. Then, for each of the observations, we see whether it is below or above the median of the observations, and we write A if the observation is above the median and B if it is below the median. Thus we get a sequence of A's and B's of the type A A A B A B B B B A B A B, say.

Then, under the null hypothesis $H_0$ that the set of observations is random, the number of runs, denoted by U, is a random variable with

8 Advantages of Non-Parametric Test

1. Non-parametric tests are readily comprehensible, easy to apply and do not require complicated sample theory.

parametric methods are readily applicable.

4. Since socio-economic data are not, in general, normally distributed, non-parametric tests have found application in various social sciences such as Psychometry, Sociology and Educational Statistics.

5. Non-parametric tests are available to deal with data which are given in ranks or grades.

9 Disadvantages of Non-Parametric Test

1. Non-parametric tests can be used only if the measurements are nominal or ordinal. Even in that case, if a parametric test exists it is more powerful than the non-parametric test.

2. Non-parametric tests are designed to test statistical hypotheses only and not for estimating parameters.

3. So far, no non-parametric test is available for testing interactions in an ANOVA model unless specific assumptions are made about the additivity of the model.
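To round off the matched-pairs procedure of Section 5 (differences, ranks of absolute differences, $W^+$, $W^-$, T and z), here is a minimal Python sketch applied to the before/after collection data from the sign test example. It rests on assumptions the text does not spell out: zero differences are dropped, tied absolute differences share the mean of their ranks, and the function names are ours. With only 14 usable pairs the data fall short of the N > 25 condition, so the z value is shown only to illustrate the arithmetic.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def wilcoxon_matched_pairs(x, y):
    """Return (N, W+, W-, T, z) for paired samples x and y."""
    # Assumption: pairs with a zero difference are discarded.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    n = len(diffs)
    t = min(w_plus, w_minus)
    mu_t = n * (n + 1) / 4
    sigma_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return n, w_plus, w_minus, t, (t - mu_t) / sigma_t

before = [30, 28, 34, 35, 40, 42, 33, 38, 34, 45, 28, 27, 25, 41, 36]
after = [32, 29, 33, 32, 37, 43, 40, 41, 37, 44, 27, 33, 30, 38, 36]

n, w_plus, w_minus, t, z = wilcoxon_matched_pairs(before, after)
print(f"N = {n}, W+ = {w_plus}, W- = {w_minus}, T = {t}, z = {z:.2f}")
```

A quick check on the ranking is that $W^+ + W^-$ equals N(N + 1)/2, which is 105 for N = 14.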