Statistics M
Tom Sensky
HOW TO USE THIS POWERPOINT
PRESENTATION
PDQ stands for ‘Pretty Darn Quick’ – a series of publications
AIM OF THIS PRESENTATION
VARIABLES
QUANTITATIVE QUALITATIVE
[Figure: box plot of Pain (VAS) for Female (N=74) and Male (N=27), marking the 2.5th, 25th, 50th (median), 75th and 97.5th centiles and the inter-quartile range]
STANDARD DEVIATION – MEASURE
OF THE SPREAD OF VALUES OF A
SAMPLE AROUND THE MEAN
THE SQUARE OF THE SD IS KNOWN AS THE VARIANCE
$SD = \sqrt{\dfrac{\sum (\text{Value} - \text{Mean})^2}{\text{Number of values}}}$
SD decreases as a function of:
• smaller spread of values
about the mean
• larger number of values
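The SD formula above can be sketched in a few lines of Python. This is a minimal illustration only: the data values are hypothetical, and the function name `sd_as_on_slide` is an assumption, not part of the presentation.

```python
import math
import statistics

def sd_as_on_slide(values):
    """SD as defined above: sqrt(sum((value - mean)^2) / number of values)."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / len(values))

# Hypothetical data, chosen only so the arithmetic is easy to follow.
values = [2, 4, 4, 4, 5, 5, 7, 9]

sd = sd_as_on_slide(values)
variance = sd ** 2  # the square of the SD is the variance

print(sd, variance)
# statistics.pstdev uses the same "divide by n" definition;
# statistics.stdev divides by n - 1 instead (the usual sample estimate).
print(statistics.pstdev(values), statistics.stdev(values))
```

Note that statistical packages often report the n - 1 version by default, so their SD can be slightly larger than the one given by the formula above.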
IN A NORMAL
DISTRIBUTION, 95%
OF THE VALUES WILL
LIE WITHIN 2 SDs OF
THE MEAN
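The "95% within 2 SDs" rule can be checked numerically. The sketch below uses only the standard library; the helper name `fraction_within` is an assumption made for illustration.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

print(round(fraction_within(2), 4))     # within 2 SDs of the mean
print(round(fraction_within(1.96), 4))  # within 1.96 SDs of the mean
```

The exact multiplier for 95% is about 1.96 SDs; "2 SDs" is the convenient rounded figure and covers slightly more than 95%.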
STANDARD DEVIATION AND
SAMPLE SIZE
As sample size increases, so SD decreases
[Figure: distributions shown for n=10, n=50 and n=150]
SKEWED DISTRIBUTION
MEAN
MEDIAN – 50% OF
VALUES WILL LIE
ON EITHER SIDE
OF THE MEDIAN
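A small sketch of why the median is preferred for skewed data. The values are hypothetical: a single large value pulls the mean upwards, while the median still splits the values in half.

```python
import statistics

# Hypothetical right-skewed data: one large value in the tail.
values = [1, 2, 2, 3, 3, 4, 20]

mean = statistics.mean(values)
median = statistics.median(values)

below = sum(v < median for v in values)
above = sum(v > median for v in values)

print(mean, median, below, above)
```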
DOES A VARIABLE FOLLOW A
NORMAL DISTRIBUTION?
NORMAL SKEWED
DISTRIBUTION DISTRIBUTION
$\dfrac{107.5 - 100}{3.0} = 2.5$
[Figure: distribution of sample means centred on 100, x-axis marked at 94, 97, 100, 103 and 106]
This ratio tells us how far out on the standard distribution we are: the higher the number, the further we are from the population mean
EXAMPLE: IQ
Look up this figure (2.5) in a table of
values of the normal distribution
From the table, the area in the tail
to the right of our sample mean is
0.006 (approximately 1 in 160)
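The table lookup can be reproduced with the standard library. This is a sketch using the figures from the IQ example (population mean 100, sample mean 107.5, standard error 3.0); the variable names are illustrative.

```python
import math

population_mean = 100.0
sample_mean = 107.5
standard_error = 3.0

# How many standard errors the sample mean lies from the population mean.
z = (sample_mean - population_mean) / standard_error

# Area in the upper tail of the standard normal distribution beyond z.
upper_tail = 0.5 * math.erfc(z / math.sqrt(2))

print(z, round(upper_tail, 4), round(1 / upper_tail))
```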
SAMPLE B
COMPARING TWO SAMPLES
$SE = \dfrac{SD}{\sqrt{\text{Sample Size}}}$
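A minimal sketch of the standard error formula. The SD of 15 and the sample sizes are assumptions chosen for illustration (they are not stated in the presentation); with SD = 15 and n = 25 the standard error is 3.0.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

assumed_sd = 15.0  # hypothetical SD, not taken from the slides

for n in (10, 25, 50, 150):
    print(n, round(standard_error(assumed_sd, n), 2))
```

For a fixed SD, the standard error shrinks as the sample size grows, which is why larger samples give more precise estimates of the mean.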
COMPARING TWO SAMPLES
We start by assuming that our sample came from the
original population
Our null hypothesis (to be tested) is that IQ=107.5 is
not significantly different from IQ=100
The larger ,
the greater the
chance that the
sample comes
from the ‘Red’
population
100 110
COMPARING TWO SAMPLES
The α level represents the probability of finding a significant
difference between the two means when none exists
This is known as a
Type I error
This is known as
the α level and is
normally set at
0.05
Type II (β) error
• ‘False negative’
• Fail to find a significant difference even though one exists
• Usually set at 0.20 (20%)
• Power = 1 − β (ie usually 80%)
ABNORMAL
POPULATION
DISTRIBUTION
FIRST POSSIBLE CUT-OFF (a): OUTSIDE THE RANGE OF THE DYSFUNCTIONAL POPULATION
DISTRIBUTION OF DYSFUNCTIONAL SAMPLE
DISTRIBUTION OF FUNCTIONAL (‘NORMAL’) SAMPLE
UNPAIRED OR INDEPENDENT-
SAMPLE t-TEST: PRINCIPLE
The two distributions
are widely separated,
so their means are clearly
different
The distributions
overlap, so it is unclear
whether the samples
come from the same
population
$SE = \dfrac{SD}{\sqrt{\text{Sample Size}}}$
10 0.401
20 0.641
SUBGROUP ANALYSIS
Papers sometimes report analyses of
subgroups of their total dataset
Criteria for subgroup analysis:
Must have large sample
Must have a priori hypothesis
Must adjust for baseline differences
between subgroups
Must retest analyses in an independent
sample
TORTURED DATA - SIGNS
Subject   A    B
1         10   11
2         0    3
3         60   65
4         27   31
• Note that the variation between subjects is much wider than that within subjects, ie the variance in the columns swamps the variance in the rows
• Treating A and B as entirely separate, t=-0.17, p=0.89
• Treating the values as paired, t=3.81, p=0.03
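The two t values quoted for the four subjects can be reproduced as follows. This is a sketch using only the standard library; it computes the t statistics but not the p values (those require the t distribution), and the function names are illustrative.

```python
import math
import statistics

# Values for subjects 1 to 4 under conditions A and B.
a = [10, 0, 60, 27]
b = [11, 3, 65, 31]

def unpaired_t(x, y):
    """Independent-sample t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se

def paired_t(x, y):
    """Paired t statistic computed on the within-subject differences (y - x)."""
    diffs = [yi - xi for xi, yi in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

t_unpaired = unpaired_t(a, b)
t_paired = paired_t(a, b)
print(round(t_unpaired, 2), round(t_paired, 2))
```

The paired test works on the four differences (1, 3, 5, 4), which vary far less than the raw scores, so the same data give a much larger t.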
SUMMARY THUS FAR …
ONE-SAMPLE t-TEST
INDEPENDENT-SAMPLE t-TEST: used to compare means of two independent samples
COMPARING PROPORTIONS:
THE CHI-SQUARE TEST
                              A     B
Number of patients            100   50
Actual % discharged           15    30
Actual number discharged      15    15
Expected number discharged    20    10

$\chi^2 = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
$\chi^2 = \dfrac{(15 - 20)^2}{20} + \dfrac{(15 - 10)^2}{10} = \dfrac{25}{20} + \dfrac{25}{10} = 1.25 + 2.5 = 3.75$
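The chi-square arithmetic can be sketched as below. It follows the calculation shown here, summing over the discharged counts only; a full contingency-table test would also include the patients who were not discharged. The function and variable names are illustrative.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over the supplied cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

patients = [100, 50]            # groups A and B
observed_discharged = [15, 15]  # actual number discharged in A and B

# Expected counts if both groups shared the overall discharge rate (30 of 150).
total_discharged = sum(observed_discharged)
total_patients = sum(patients)
expected_discharged = [n * total_discharged / total_patients for n in patients]

chi2 = chi_square(observed_discharged, expected_discharged)
print(expected_discharged, chi2)
```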
ANOVA table (columns: Sum of Squares, df, Mean Square, F, Sig.)
Total: Sum of Squares = 2709.69, df = 67
• If scores in A were more highly ranked than those in B, all the A scores would be on the left, and B
scores on the right
• If there were no difference between A and B, their respective scores would be evenly spread by
rank
Rank 1 2 3 4 5 6 7 8 9 10 11 12
Group A A A B A B A B B B B B
MANN-WHITNEY U TEST
• Generate a total score (U) representing the number of times an A score precedes each B
Rank 1 2 3 4 5 6 7 8 9 10 11 12
Group A A A B A B A B A B B B
• The first B is preceded by 3 A’s
• The second B is preceded by 4 A’s etc
• Number of A’s preceding each B: 3 4 5 6 6 6
• U = 3+4+5+6+6+6 = 30
• Look up significance of U from tables (generated automatically by SPSS)
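The counting rule for U can be sketched in Python. The ranked order used here (A A A B A B A B A B B B) is an assumption: it is the order consistent with the counts 3, 4, 5, 6, 6, 6 given above. The function name is illustrative.

```python
def mann_whitney_u(ranked_groups, first="A", second="B"):
    """Count, for each `second` score, how many `first` scores precede it."""
    u = 0
    seen_first = 0
    for group in ranked_groups:
        if group == first:
            seen_first += 1
        elif group == second:
            u += seen_first
    return u

# Group labels in rank order (rank 1 first); assumed order, see the note above.
ranked_groups = list("AAABABABABBB")

u = mann_whitney_u(ranked_groups)
print(u)
```

This simple count assumes there are no tied scores; statistical packages handle ties and the significance lookup automatically.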
SUMMARY OF BASIC
STATISTICAL TESTS
Continuous variables: Independent t-test (two groups); ANOVA (more than two groups)
Ordinal variables (not normally distributed): Mann-Whitney U test or Median test (two groups); Kruskal-Wallis ANOVA (more than two groups)
KAPPA
• (Non-parametric) measure of agreement
Kappa Agreement
<0.20 Poor
0.21-0.40 Slight
0.41-0.60 Moderate
0.61-0.80 Good
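A minimal sketch of how a kappa value is obtained, assuming Cohen's kappa for two raters: kappa = (observed agreement − agreement expected by chance) / (1 − agreement expected by chance). The ratings table is hypothetical and the function name is illustrative.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater 1, columns: rater 2)."""
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical counts: the two raters agree on 20 + 15 of 50 cases.
ratings = [[20, 5],
           [10, 15]]

kappa = cohen_kappa(ratings)
print(round(kappa, 2))
```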
107.5 ± 5.88
Thus we can be 95%
confident that the true
mean lies between
101.62 and 113.38
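The interval for the IQ example can be reproduced as follows. This sketch assumes the standard error of 3.0 used earlier in the example and the usual multiplier of 1.96 for a 95% interval; the variable names are illustrative.

```python
sample_mean = 107.5
standard_error = 3.0   # taken from the IQ example
multiplier = 1.96      # normal-distribution multiplier for a 95% interval

margin = multiplier * standard_error
lower = sample_mean - margin
upper = sample_mean + margin

print(round(margin, 2), round(lower, 2), round(upper, 2))
```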
CONFIDENCE INTERVAL (CI)
Gives a measure of the precision (or
uncertainty) of the results from a particular
sample
The X% CI gives the range of values which we
can be X% confident includes the true value
CIs are useful because they quantify the size of
effects or differences
Probabilities (p values) only measure strength
of evidence against the null hypothesis
CONFIDENCE INTERVALS
$se = \sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}$
$se(ARR) = \sqrt{\dfrac{0.13(1 - 0.13)}{23} + \dfrac{0.52(1 - 0.52)}{23}}$
NB This formula is given for convenience. You are not required to commit any of
these formulae to memory – they can be obtained from numerous textbooks
CONFIDENCE INTERVAL OF
ABSOLUTE RISK REDUCTION
• ARR = 0.39
• se = 0.13
• 95% CI of ARR = ARR ± 1.96 x se
• 95% CI = 0.39 ± 1.96 x 0.13
• 95% CI = 0.39 ± 0.25 = 0.14 to 0.64
• The calculated value of ARR is 39%, and the 95% CI
indicates that the true ARR could be as low as 14% or as
high as 64%
• Key point – result is statistically ‘significant’ because the
95% CI does not include zero
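The ARR calculation can be sketched end to end using the proportions (0.13 and 0.52) and group sizes (23 each) from the standard error formula. The variable names are illustrative, and 1.96 is the usual multiplier for a 95% interval.

```python
import math

p1, n1 = 0.13, 23  # event rate and size of the first group
p2, n2 = 0.52, 23  # event rate and size of the second group

arr = p2 - p1  # absolute risk reduction
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

lower = arr - 1.96 * se
upper = arr + 1.96 * se

# The result is 'significant' at the 5% level if the interval excludes zero.
excludes_zero = lower > 0 or upper < 0

print(round(arr, 2), round(se, 2), round(lower, 2), round(upper, 2), excludes_zero)
```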
INTERPRETATION OF CONFIDENCE
INTERVALS
• Remember that the mean estimated from a
sample is only an estimate of the population mean
• The actual mean can lie anywhere within the 95%
confidence interval estimated from your data
• For an Odds Ratio, if the 95% CI passes through
1.0, this means that the Odds Ratio is unlikely to
be statistically significant
• For an Absolute Risk Reduction or Absolute
Benefit increase, this is unlikely to be significant if
its 95% CI passes through zero
CORRELATION
[Figure: scatter plot of HADS Depression (y-axis, 0 to 14) against SIS (x-axis, 0 to 30)]
Here, there are two variables (HADS depression and SIS)
The question is: do HADS scores correlate with SIS ratings?
CORRELATION
[Figure: scatter plot of HADS Depression against SIS with a fitted line; x1 to x4 mark the deviations of individual points from the line]
... minimised
Because deviations can be negative or positive, each is first squared, then the squared deviations are added together, and the square root taken
CORRELATION
[Figure: two scatter plots of HADS Depression against SIS; left panel r2=0.34, right panel r2=0.06]
CORRELATION
y = A + Bx
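A minimal sketch of fitting the line y = A + Bx by least squares, that is, choosing A and B so that the sum of squared deviations of the points from the line is as small as possible. The data are hypothetical (they are not the HADS and SIS values plotted in the figures), and the function name is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept A, slope B and r-squared for y = A + Bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

# Hypothetical x and y values, for illustration only.
xs = [0, 5, 10, 15, 20, 25, 30]
ys = [2, 3, 5, 4, 8, 9, 11]

a, b, r2 = fit_line(xs, ys)
print(round(a, 2), round(b, 2), round(r2, 2))
```

The r-squared value is the proportion of the variance in y that is accounted for by the fitted line.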
REGRESSION
                            Beta    t       p        R2
Disease Activity (RADAI)    .02     .01     0.91     .00
Sense of Coherence          -.40    -4.40   <0.001   .23
[Figure: follow-up chart with one line per case, x-axis: Year of Study (0 to 5)]
Patients who have not relapsed at the end of the study are described as ‘censored’
SURVIVAL ANALYSIS: ASSUME
ALL CASES RECRUITED AT TIME=0
X=Relapsed, W=Withdrew, C=Censored
Case:    1  2  3  4  5  6  7  8  9  10
Status:  X  C  W  X  C  W  W  C  X  X
[Figure: follow-up chart with one line per case, x-axis: Year of Study (0 to 5)]
SURVIVAL ANALYSIS:
EVENTS IN YEAR 1
[Figure: the same follow-up chart by case, highlighting events in year 1]
Case 6 withdrew within
SURVIVAL CURVE
[Figure: survival curve, x-axis: Year]
KAPLAN-MEIER SURVIVAL
ANALYSIS