
Non-Parametric Statistics

Interdepartmental MSc Programme in
Occupational and Environmental Health -
Management and Economic Evaluation

Δημήτρης Φουσκάκης
Introduction
So far in the course we have assumed that the
data come from some known distribution, e.g.
the Normal, or that the Central Limit Theorem holds.
Methods of estimation and hypothesis testing
have been based on these assumptions. Such
procedures are usually called parametric
statistical methods. If these assumptions are
not met, non-parametric statistical methods
must be used.
Revision – Inferential Statistics
Hypothesis testing versus Confidence Intervals
Parametric versus Nonparametric
Quantitative data
Categorical data
Relation between two variables
Relation between several variables
What does inferential statistics do?
It helps to quantify how certain we can be
when we make inferences from a given
sample.
The three approaches:
a) Hypothesis testing
b) Confidence Intervals
c) Both

I know how to do a t-test, but I don’t know when!


Hypothesis Testing
H0: w = wa
HA: w ≠ wa
α: the Type I error rate, or significance level of the test, is
usually set to a value such as 5%.
Power = 1 − β: the power of the test; a common target value is 80%.
Power calculations: have I chosen an appropriate number of
observations? (A minimal code sketch follows after the table below.)
                             Is H0 really true?
                             Yes                   No
Researcher's   Reject H0     Type I error (α)      Correct decision (Power)
decision       Accept H0     Correct decision      Type II error (β)
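
A minimal sketch, not part of the original slides, of a power and sample-size calculation for a two-sample t-test using Python's statsmodels package; the standardized effect size of 0.5 is a hypothetical value chosen only for illustration.

# A minimal sketch (assumed setup, not from the slides): power / sample-size
# calculation for a two-sample t-test with statsmodels.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# How many subjects per group are needed for 80% power at α = 0.05,
# assuming a hypothetical standardized effect size (Cohen's d) of 0.5?
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                   alternative='two-sided')
print(f"required sample size per group: {n_per_group:.1f}")

# Conversely: what power does a study with 30 subjects per group achieve?
achieved_power = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05,
                                      alternative='two-sided')
print(f"power with n = 30 per group: {achieved_power:.2f}")
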
Statistical and clinical significance
Statistical significance (Pvalue):
The probability of drawing a sample like this one from
a population with characteristics consistent with
H0 was low enough to reject H0 (usual rule:
reject H0 if Pvalue < 0.05; but why 0.05 and not
0.04?).
Clinical (practical) significance:
An important finding with implications for your
clinical practice.
Summary points for Pvalues
Pvalues, or significance levels, measure the strength
of the evidence against the null hypothesis; the
smaller the Pvalue, the stronger the evidence.
The arbitrary division of results into "significant" or
"not significant" according to the Pvalue was not the
intention of the founders of significance testing.
A Pvalue of 0.05 provides some, but not strong,
evidence against the null hypothesis; it is
reasonable to say that a Pvalue < 0.001 does provide strong evidence.
Results of medical research should not be
reported simply as "significant" or "not significant" but should be
interpreted in the context of the type of study
and other available evidence.
Correct Definition of the Pvalue
The Pvalue is the chance of getting a test
statistic as extreme as, or more extreme than, the
observed one, assuming the null hypothesis is true.

The Pvalue is NOT the chance of the null
hypothesis being right.
Confidence Intervals (C.I.)
The wrong definition:
There is a 95% (say) chance that the parameter of interest
falls within the particular interval.
The exact definition:
If we take a series of samples from the same population
and construct a 95% (say) confidence interval from each
sample, then 95% of these confidence intervals will
contain the true parameter. (A small coverage simulation
is sketched below.)

Connection to hypothesis testing:
Check whether the interval includes wa in order to decide
whether to reject the null hypothesis.
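
A minimal simulation sketch, not part of the original slides, illustrating the exact definition above; the population mean, standard deviation and sample size are arbitrary illustrative choices.

# A minimal sketch (hypothetical population values) illustrating the coverage
# property of a 95% confidence interval for a Normal mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mean, true_sd, n, n_sim = 100.0, 15.0, 20, 10_000

covered = 0
for _ in range(n_sim):
    sample = rng.normal(true_mean, true_sd, size=n)
    xbar, s = sample.mean(), sample.std(ddof=1)
    half_width = stats.t.ppf(0.975, df=n - 1) * s / np.sqrt(n)
    if xbar - half_width <= true_mean <= xbar + half_width:
        covered += 1

# The printed proportion should be close to 0.95: about 95% of such intervals
# contain the true parameter.
print(f"coverage: {covered / n_sim:.3f}")
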
How to choose a statistical test ...
The type of data
  - continuous versus categorical
The distribution
  - parametric versus non-parametric
The sample size
The number of samples
The relation of samples to each other
  - paired versus unpaired
The number of variables
  - univariate versus multivariate
Parametric versus Non-Parametric
Parametric methods:
make distributional assumptions
  - usually assume a Normal distribution or rely on the
    Central Limit Theorem
  - comparable standard deviations across groups
Non-parametric methods:
"distribution-free"
  - typically Pvalue(non-parametric) ≥ Pvalue(parametric),
    because non-parametric tests are less powerful
  - usually no confidence intervals are provided with
    non-parametric tests
Statistical methods for continuous data
Univariate tests to compare means:

Number of samples   Parametric          Non-parametric
1                   One-sample t-test   Wilcoxon signed rank sum test
2, paired           Paired t-test       Wilcoxon matched pairs signed rank sum test
2, unpaired         Two-sample t-test   Mann-Whitney U test
3 or more           One-way ANOVA       Kruskal-Wallis test
One Sample
Table 1: Average daily energy intake (kJ) over 10 days of 11 healthy women.
Subject   Average daily energy intake (kJ)
1 5260
2 5470
3 5640
4 6180
5 6390
6 6515
7 6808
8 7515
9 7515
10 8230
11 8770

Mean 6753.6
SD 1142.1

What can we say about the energy intake of these women in relation to a
recommended daily intake of 7725 kJ?
One Sample
To answer the question we can carry out a test of
the null hypothesis that our data are a sample from
a population with a specific hypothesized mean.
The test is called the one sample t-test.
t = \frac{\text{sample mean} - \text{hypothesized mean}}{\text{standard error of sample mean}}
  = \frac{\bar{x} - k}{s/\sqrt{n}} = \frac{6753.6 - 7725}{1142.1/\sqrt{11}} = -2.821

Reject H0 if t > t_{n-1, α/2} or t < -t_{n-1, α/2}.
Pvalue = 2 × (area to the right of |t| under the t distribution with n - 1 = 10 df).
From the t table: Pvalue < 0.02, so reject H0.
One Sample
Alternatively we could calculate a 95% C.I. for the
mean intake:

\bar{x} \pm t_{10, 0.025} \cdot s/\sqrt{n} = 6753.6 \pm 2.228 \times 344.4 = (5986, 7521)

This range does not include the recommended level
of 7725 kJ. If we assume that the women are a
representative sample, then we can infer that for all
women of this age the average daily energy
consumption is less than recommended. (A code
sketch of the test and the interval is given below.)
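
A minimal sketch of the one-sample t-test and confidence interval above, using Python with SciPy and the data of Table 1.

# One-sample t-test of the Table 1 intakes against the recommended 7725 kJ,
# and the corresponding 95% confidence interval.
import numpy as np
from scipy import stats

intake = np.array([5260, 5470, 5640, 6180, 6390, 6515,
                   6808, 7515, 7515, 8230, 8770], dtype=float)
k = 7725.0  # hypothesized (recommended) daily intake in kJ

t_stat, p_value = stats.ttest_1samp(intake, popmean=k)
print(f"t = {t_stat:.3f}, Pvalue = {p_value:.3f}")      # about t = -2.82, P = 0.018

# 95% C.I. for the mean: mean ± t(10, 0.025) * s / sqrt(n)
n = len(intake)
se = intake.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
print(f"95% C.I.: ({intake.mean() - t_crit * se:.0f}, {intake.mean() + t_crit * se:.0f})")
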
One Sample
Assumptions:
- The data come from a Normal distribution.
- If the sample size is > 30 then, because of the
  Central Limit Theorem, we can perform the test
  even if the data do not look very close to Normal.
- For small samples that are not Normally distributed
  we should use a non-parametric method
  such as the Sign Test or the Wilcoxon signed
  rank sum test.
One Sample
The Sign Test (or Binomial Test)
If there were no differences on average between the sample values
and the hypothesized specific value we would expect an equal
number of observations above and below the specific value. We can
thus use the Binomial distribution, or the Normal approximation of it,
to evaluate the probability of the observed frequencies when the true
probability of exceeding the recommended intake is p = 1/2. In our dataset 2
women had daily intakes above 7725 kJ and 9 below. We calculate
the following test statistic:

Reject H0 if z > z_{α/2} or z < -z_{α/2}.
Pvalue = 2 × (area to the right of |z| under the N(0,1) distribution).

z = \frac{r - np}{\sqrt{np(1-p)}} = \frac{9 - 5.5}{1.658} = 2.11

or, equivalently, counting the women above the recommended intake,

z = \frac{r - np}{\sqrt{np(1-p)}} = \frac{2 - 5.5}{1.658} = -2.11

From the Normal table: Pvalue = 0.035, so reject H0.
One Sample
The Sign Test (or Binomial Test)
If any of the observations is exactly the same as the
hypothesized value then we ignore it in the calculation.
Thus the sample size is the number of observations that
differ from the hypothesized value.
Because of the small sample size it would be better to use the
continuity correction in the Normal approximation,
i.e. to subtract ½ from the absolute value of the numerator:

z = \frac{|r - np| - 1/2}{\sqrt{np(1-p)}} = \frac{|9 - 5.5| - 0.5}{1.658} = 1.81

From the Normal table: Pvalue = 0.07, so do NOT reject H0.
(A code sketch of both versions of the sign test is given below.)
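
A minimal sketch of the sign test above in Python: the Normal approximation with continuity correction as on the slide, plus the exact Binomial version for comparison (the exact version assumes SciPy 1.7 or later for stats.binomtest).

# Sign test of the Table 1 intakes against 7725 kJ.
import numpy as np
from scipy import stats

intake = np.array([5260, 5470, 5640, 6180, 6390, 6515,
                   6808, 7515, 7515, 8230, 8770], dtype=float)
k = 7725.0

r = np.sum(intake < k)          # r = 9 observations below the hypothesized value
n = np.sum(intake != k)         # observations equal to k are ignored; here n = 11
p = 0.5

# Normal approximation with continuity correction, as on the slide
z = (abs(r - n * p) - 0.5) / np.sqrt(n * p * (1 - p))
p_normal = 2 * stats.norm.sf(z)
print(f"z = {z:.2f}, Pvalue = {p_normal:.2f}")          # about z = 1.81, P = 0.07

# Exact Binomial (sign) test; requires SciPy >= 1.7
p_exact = stats.binomtest(int(r), n=int(n), p=p).pvalue
print(f"exact Binomial Pvalue = {p_exact:.3f}")
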
One Sample
The Wilcoxon Signed rank Test
Calculate the difference between each
observation and the value of interest.
Ignoring the signs of the differences, rank
them in order of magnitude.
Calculate the sum of the ranks of all the
negative (or positive) differences and find the Pvalue
from the corresponding table.
This is a more powerful test than the sign test.
One Sample
The Wilcoxon Signed rank Test

The sum of the ranks of the positive differences (the two intakes
above 7725 kJ) is 3 + 5 = 8. From the Wilcoxon signed rank test
table: Pvalue < 0.05, so reject H0. (A code sketch is given below.)
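
A minimal sketch of the one-sample Wilcoxon signed rank test above, using SciPy on the differences between the intakes of Table 1 and the value 7725 kJ.

# Wilcoxon signed rank test of the Table 1 intakes against 7725 kJ.
import numpy as np
from scipy import stats

intake = np.array([5260, 5470, 5640, 6180, 6390, 6515,
                   6808, 7515, 7515, 8230, 8770], dtype=float)
diffs = intake - 7725.0

# The reported statistic is the smaller of the two rank sums (8, as on the slide);
# because of tied absolute differences SciPy uses a Normal approximation here.
stat, p_value = stats.wilcoxon(diffs)
print(f"W = {stat:.0f}, Pvalue = {p_value:.3f}")
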
Two Groups of Paired
Observations
Paired data arise when the same individuals are
studied more than once, usually in different
circumstances.
Paired data also arise when we have two different groups of
subjects who have been individually matched,
for example in a matched-pair case-control
study.
Very common in Medical Research.
We are interested in the average difference
between the observations for each individual
and the variability of these differences.
Two Groups of Paired
Observations
Table 2: Mean daily energy intake (kJ) over 10 pre-menstrual and 10 post-menstrual days.

Subject   Pre-menstrual   Post-menstrual   Difference
1         5260            3910             1350
2         5470            4220             1250
3         5640            3885             1755
4         6180            5160             1020
5         6390            5645             745
6         6515            4680             1835
7         6808            5265             1540
8         7515            5975             1540
9         7515            6790             725
10        8230            6900             1330
11        8770            7335             1435
Mean      6753.6          5433.2           1320.5
SD        1142.1          1216.8           366.7

We can use the one-sample t-test to calculate a Pvalue for the comparison:
the observed mean difference of 1320.5 kJ is tested against the hypothesized
value of zero, i.e. the null hypothesis that pre- and post-menstrual dietary
intake is the same.

t = \frac{\bar{d} - 0}{se(\bar{d})} = \frac{1320.5 - 0}{366.7/\sqrt{11}} = 11.94

From the t distribution with n - 1 = 10 df: Pvalue < 0.001, so reject H0.
(A code sketch is given below.)
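
A minimal sketch of the paired t-test above, using SciPy with the pre- and post-menstrual intakes of Table 2.

# Paired t-test on the Table 2 data: equivalent to a one-sample t-test of the
# within-subject differences against zero.
import numpy as np
from scipy import stats

pre  = np.array([5260, 5470, 5640, 6180, 6390, 6515,
                 6808, 7515, 7515, 8230, 8770], dtype=float)
post = np.array([3910, 4220, 3885, 5160, 5645, 4680,
                 5265, 5975, 6790, 6900, 7335], dtype=float)

t_stat, p_value = stats.ttest_rel(pre, post)
print(f"t = {t_stat:.2f}, Pvalue = {p_value:.1e}")      # about t = 11.9, P < 0.001

# The same result from the differences directly
d = pre - post
print(stats.ttest_1samp(d, popmean=0.0))
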
Two Groups of Paired
Observations
Alternatively we could calculate a 95% C.I. for
the mean difference:

\bar{d} \pm t_{10, 0.025} \cdot s_d/\sqrt{n} = 1320.5 \pm 2.228 \times 110.6 = (1074.2, 1566.8)

This range does not include the value of 0 kJ. If we
assume that the women are a representative sample,
then we can infer that dietary intake is much lower
in the post-menstrual period.
Two Groups of Paired
Observations
The same assumptions as before hold for the
difference data (thus we require Normality of
the differences, not of each set of data). If
these assumptions are not met, we can
apply the same non-parametric techniques as
before to the difference data. For example, all
11 differences have the same sign, so the test
statistic of the sign test with the continuity
correction is

z = \frac{|r - np| - 0.5}{\sqrt{np(1-p)}} = \frac{|11 - 5.5| - 0.5}{1.658} = 3.02

From the Normal table: Pvalue = 0.003, so reject H0.
Two Independent Groups of
Observations
This is the most common type of statistical analysis,
arising e.g. in clinical trials or observational studies
that compare different groups of subjects.
Table: 24 hour total energy expenditure (MJ/day) in groups of lean and obese women.

          Lean (n=13)   Obese (n=9)
1         6.13          8.79
2         7.05          9.19
3         7.48          9.21
4         7.48          9.68
5         7.53          9.69
6         7.58          9.97
7         7.90          11.51
8         8.08          11.85
9         8.09          12.79
10        8.11
11        8.40
12        10.15
13        10.88
Mean      8.066         10.298
SD        1.238         1.398

Is there a true difference in the 24 hour total energy expenditure
between lean and obese women?
Two Independent Groups of
Observations
To answer this question we can carry out
a test of the null hypothesis that the
two populations, obese and lean women,
have the same mean total energy expenditure.
The test is called the two-sample t-test.

t = \frac{\bar{x}_1 - \bar{x}_2}{se(\bar{x}_1 - \bar{x}_2)} = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}} = \frac{10.298 - 8.066}{0.5656} = 3.95

where s_p is the pooled standard deviation,

s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}},

with s_i^2 the variance of the i-th group.

Reject H0 if t > t_{n1+n2-2, α/2} or t < -t_{n1+n2-2, α/2}.
From the t distribution with n1 + n2 - 2 = 20 df: Pvalue < 0.001, so reject H0.
(A code sketch is given below.)
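
A minimal sketch of the two-sample (pooled-variance) t-test above, using SciPy with the lean and obese groups from the table.

# Two-sample t-test comparing total energy expenditure in obese and lean women.
import numpy as np
from scipy import stats

lean = np.array([6.13, 7.05, 7.48, 7.48, 7.53, 7.58, 7.90,
                 8.08, 8.09, 8.11, 8.40, 10.15, 10.88])
obese = np.array([8.79, 9.19, 9.21, 9.68, 9.69, 9.97, 11.51, 11.85, 12.79])

# equal_var=True gives the classical pooled-variance two-sample t-test
t_stat, p_value = stats.ttest_ind(obese, lean, equal_var=True)
print(f"t = {t_stat:.2f}, Pvalue = {p_value:.4f}")      # about t = 3.95, P < 0.001

# 95% C.I. for the difference in means, using the pooled standard deviation
n1, n2 = len(obese), len(lean)
sp = np.sqrt(((n1 - 1) * obese.var(ddof=1) +
              (n2 - 1) * lean.var(ddof=1)) / (n1 + n2 - 2))
se = sp * np.sqrt(1 / n1 + 1 / n2)
diff = obese.mean() - lean.mean()
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)
print(f"95% C.I.: ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f})")  # about (1.05, 3.41)
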


Two Independent Groups of
Observations
Alternatively we could calculate a 95% C.I.
for the mean difference:

\bar{x}_1 - \bar{x}_2 \pm t_{n1+n2-2, 0.025} \cdot s_p \sqrt{1/n_1 + 1/n_2} = 2.232 \pm 2.086 \times 0.5656 = (1.05, 3.41)

This range does not include the value of
0 MJ/day. Thus the total energy
expenditure in the obese women is greater
than that of the lean women.
Two Independent Groups of
Observations
Assumptions:
- Each set of observations is sampled from a
  population with a Normal distribution, and the
  variances of the two populations are the same.
- If the sample sizes of both groups are > 30 then,
  because of the Central Limit Theorem, we can perform
  the test even if the data do not look very close to Normal
  in either or both groups.
- For small samples that are not Normally distributed, and/or
  for populations with unequal variances, we should
  use a non-parametric method, the Mann-Whitney
  test (also known as the Wilcoxon rank sum test).
Two Independent Groups of
Observations – Mann-Whitney Test
The Mann-Whitney test requires all
observations to be ranked as if they were
from a single sample. Then T = the sum of the
ranks in the smaller group (either group
can be taken if they are of equal size) is
calculated and a Pvalue is found from the
Mann-Whitney table.
In our case T = 150, so Pvalue < 0.01 and we
reject H0. (A code sketch is given below.)
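
A minimal sketch of the Mann-Whitney test above, using SciPy; note that SciPy reports the U statistic rather than the rank sum T, but it is the same test.

# Mann-Whitney test comparing the obese and lean groups.
import numpy as np
from scipy import stats

lean = np.array([6.13, 7.05, 7.48, 7.48, 7.53, 7.58, 7.90,
                 8.08, 8.09, 8.11, 8.40, 10.15, 10.88])
obese = np.array([8.79, 9.19, 9.21, 9.68, 9.69, 9.97, 11.51, 11.85, 12.79])

u_stat, p_value = stats.mannwhitneyu(obese, lean, alternative='two-sided')
print(f"U = {u_stat:.0f}, Pvalue = {p_value:.4f}")

# The rank-sum statistic T used on the slide can be recovered from U:
# T = U + n1*(n1 + 1)/2, where n1 is the size of the first group passed in.
n1 = len(obese)
print(f"T = {u_stat + n1 * (n1 + 1) / 2:.0f}")          # 150, as on the slide
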
Testing the Assumptions
How do we test Normality? Most people simply draw a histogram of the
data and check whether it looks bell-shaped. Remember, though, that
the assumption is not that the sample has a Normal distribution but
that it comes from a population which does. For large samples we
expect to see a bell-shaped histogram if the population is Normal,
but with small samples it is quite unlikely that we will get a symmetric
histogram even if the population is Normally distributed. There are also
formal tests for Normality, available in most statistical packages,
such as the Shapiro-Wilk test or the Shapiro-Francia test. You can
also use common sense and ask whether it is reasonable to assume
that the population of interest is Normally distributed.
When the data are not Normally distributed and are skewed, it is
often better to try a transformation first, such as the logarithmic one,
in order to make their shape more symmetric, and then perform a
parametric test on the transformed data, rather than going directly to
a non-parametric test. (A code sketch is given below.)
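
A minimal sketch, not part of the original slides, of a formal Normality check with the Shapiro-Wilk test before and after a log transformation; the lean group from the earlier table is reused purely as an illustration.

# Shapiro-Wilk Normality test on the raw and log-transformed data.
import numpy as np
from scipy import stats

lean = np.array([6.13, 7.05, 7.48, 7.48, 7.53, 7.58, 7.90,
                 8.08, 8.09, 8.11, 8.40, 10.15, 10.88])

w_raw, p_raw = stats.shapiro(lean)
print(f"raw data:  W = {w_raw:.3f}, Pvalue = {p_raw:.3f}")

w_log, p_log = stats.shapiro(np.log(lean))
print(f"log scale: W = {w_log:.3f}, Pvalue = {p_log:.3f}")

# A small Pvalue suggests a departure from Normality; if the log-transformed
# data look closer to Normal, a parametric test on the log scale may be preferable.
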
Testing the Assumptions
How do we test equality of variances? Most
people just look at how close the two
sample variances are. Instead, one can
carry out a hypothesis test with the null
hypothesis that the two variances are
equal; this test is called the F test.
Testing the Assumptions
Table: Serum thyroxine level (nmol/l) in 16 hypothyroid infants by severity of
symptoms (Hulse et al., 1979).

          Marked symptoms (n=7)   Slight or no symptoms (n=9)
1         5                       34
2         8                       45
3         18                      49
4         24                      55
5         60                      58
6         84                      59
7         96                      60
8                                 62
9                                 86
Mean      42.1                    56.4
SD        37.48                   14.22

We wish to compare thyroxine levels in the two groups defined by severity of
symptoms, but the sample standard deviations are markedly different.

F = \frac{s_1^2}{s_2^2} = \left(\frac{37.48}{14.22}\right)^2 = 6.95

where s_i is the standard deviation of the i-th group.

Reject H0 if F < F_{n1-1, n2-1, 1-α/2} or F > F_{n1-1, n2-1, α/2}.
Pvalue: area to the right of F under the F distribution with n1 - 1 = 6 and n2 - 1 = 8 df.
From the F table: Pvalue < 0.01, so reject H0. (A code sketch is given below.)
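
A minimal sketch of the F test above, computed directly from the F distribution in SciPy (SciPy has no dedicated two-sample variance F test); the 95% confidence interval for the variance ratio from the next slide is included as well.

# F test for equality of variances of the thyroxine data.
import numpy as np
from scipy import stats

marked = np.array([5, 8, 18, 24, 60, 84, 96], dtype=float)            # marked symptoms
slight = np.array([34, 45, 49, 55, 58, 59, 60, 62, 86], dtype=float)  # slight or no symptoms

s1, s2 = marked.std(ddof=1), slight.std(ddof=1)
F = (s1 / s2) ** 2
df1, df2 = len(marked) - 1, len(slight) - 1

# Two-sided Pvalue: twice the smaller tail area beyond the observed ratio
p_value = 2 * min(stats.f.sf(F, df1, df2), stats.f.cdf(F, df1, df2))
print(f"F = {F:.2f} on ({df1}, {df2}) df, Pvalue = {p_value:.3f}")

# 95% C.I. for the variance ratio (see the following slide)
ci_low = F / stats.f.ppf(0.975, df1, df2)
ci_high = F / stats.f.ppf(0.025, df1, df2)
print(f"95% C.I. for the variance ratio: ({ci_low:.2f}, {ci_high:.2f})")
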
Testing the Assumptions
Alternatively we could calculate a 95% C.I. for the
variance ratio:

\left( \frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{n1-1, n2-1, 0.975}}, \; \frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{n1-1, n2-1, 0.025}} \right)
= \left( \left(\frac{37.48}{14.22}\right)^2 \cdot \frac{1}{4.65}, \; \left(\frac{37.48}{14.22}\right)^2 \cdot \frac{1}{0.18} \right) = (1.49, 38.61)

This range does not include the value of 1, so the
variance in the marked symptoms group is larger than
the variance in the slight or no symptoms group. We
therefore cannot use the t-test and have to perform a
non-parametric method.
Testing the Assumptions
The F test is not robust to a violation of
Normality. Alternatively one can use
Levene's test, available in statistical packages,
which is not strongly dependent on the
assumption of Normality of the two groups.
(A code sketch is given below.)
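
A minimal sketch, not from the original slides, of Levene's test applied to the thyroxine data with SciPy; by default SciPy centres the data at the group medians (the Brown-Forsythe variant), which adds to its robustness against non-Normality.

# Levene's test for equality of variances, less sensitive to non-Normality
# than the F test.
import numpy as np
from scipy import stats

marked = np.array([5, 8, 18, 24, 60, 84, 96], dtype=float)
slight = np.array([34, 45, 49, 55, 58, 59, 60, 62, 86], dtype=float)

stat, p_value = stats.levene(marked, slight)   # default: center='median'
print(f"Levene statistic = {stat:.2f}, Pvalue = {p_value:.3f}")
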
