
Making sense of Results

Rafael Perera
Essential “Statistics” for Readers

1. Types of measurements

2. Fundamental equation of error

3. Measuring the random error

4. Some statistical terms


• Odds, Risks, NNT
• Ratio vs. Difference
• Regression to the mean
Different types of measurements use different types of statistics
Type of measurement (examples): statistics
• Dichotomous (male, female OR infected, non-infected): Proportion, Risk
• Categorical (red, green, blue): Mode, Proportions
• Ordinal (nil, +, ++ of glucose): Mode, Median?
• Interval (temperature): Mean, Median
Fundamental Equation of Error
• Measure = Truth + Bias + Random Error
• Researcher:
– Use good study design (against bias)
– Use large numbers (against random error)
• Reader:
– Critically appraise the design (for bias)
– Confidence intervals and P-values (for random error)
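The equation can be illustrated with a small simulation. This is only a sketch: the true value, the bias and the spread of the random error are hypothetical numbers chosen for illustration, and the random error is assumed to be normally distributed. It shows that larger numbers shrink the random error in the average, but leave the bias untouched.

```python
import random
import statistics


def simulate_measurements(truth, bias, sd, n, seed=0):
    """Return n simulated measurements: truth + bias + random error."""
    rng = random.Random(seed)
    return [truth + bias + rng.gauss(0, sd) for _ in range(n)]


# Hypothetical values for illustration only (not taken from the slides):
# a head circumference of 56 cm, measured 1 cm too high on average.
TRUTH, BIAS, SD = 56.0, 1.0, 2.0

for n in (10, 1000, 100000):
    sample = simulate_measurements(TRUTH, BIAS, SD, n)
    mean = statistics.fmean(sample)
    # As n grows the mean settles near TRUTH + BIAS, not near TRUTH.
    print(f"n={n:>6}: mean={mean:.2f}, error vs truth={mean - TRUTH:+.2f}")
```

Only a better design (here, setting the bias to 0) moves the average back to the truth.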
Bias versus Random error
[Figure: 2 x 2 grid showing low and high bias against low and high random error, relative to the true result]
Bias and Measurement error

Groups of 3-4 people


1 – subject
2 – measurers
Measurers – measure (twice) and
record the head size of the
subject. Keep measurements
hidden.
Bias and Measurement error

Intra-Observer variability
Measurement error
• Same answer
• Varied by < 0.5 cm
• Varied by < 1cm
• Varied by < 2 cm
• Varied by >2 cm
Bias and Measurement error

Inter-Observer variability
Measurement error
• Same answer
• Varied by < 0.5 cm
• Varied by < 1cm
• Varied by < 2 cm
• Varied by >2 cm
Bias and Measurement error

Bias
Included ears?
Included nose?
Which part of the head?
Other?
Does it matter?

In paediatric practice following meningitis, a head circumference that increases by 7 mm in a day will result in urgent head imaging

In obstetrics, measurements of the fundal height can vary by up to 5 cm (enough to decide whether or not a baby is delivered early because of suspected IUGR)

The question is: can you reproduce the test in your setting, and will it perform as well there?
Measuring Random error

Most things don’t work!


Two methods of assessing the role of random error
• P-values (Hypothesis Testing)
– use statistical test to examine the ‘null’ hypothesis
– if p<0.05 then result is statistically significant
• Confidence Intervals (Estimation)
– estimates the range of values that is likely to include the true value

Relationship between p-values and confidence intervals


If the ‘no effect’ value falls outside the CI then the result is statistically
significant
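A minimal sketch of this relationship, assuming a normal approximation, an effect measured as a difference (so "no effect" is 0) and a 95% interval. The estimates and standard error are hypothetical; `p_value_and_ci` is an illustrative helper, not a library function.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_value_and_ci(estimate, se, no_effect=0.0):
    """Two-sided p-value and 95% CI under a normal approximation."""
    z = (estimate - no_effect) / se
    p = 2.0 * (1.0 - normal_cdf(abs(z)))
    half_width = 1.96 * se
    return p, (estimate - half_width, estimate + half_width)


# Hypothetical estimates, both with a standard error of 1.0.
for estimate in (2.0, 1.5):
    p, (low, high) = p_value_and_ci(estimate, se=1.0)
    outside = not (low <= 0.0 <= high)
    print(f"estimate={estimate}: p={p:.3f}, "
          f"95% CI=({low:.2f}, {high:.2f}), no-effect value outside CI: {outside}")
```

In both cases the two views agree: p is below 0.05 exactly when 0 lies outside the 95% interval.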
Supercalifragilisticexpialidocious!
The Steps in Testing a Hypothesis
1. State the null hypothesis H0
2. Choose the test statistic that summarizes the data
3. Based on H0, calculate the probability of getting the value of the test statistic
4. Interpret the P-value
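The four steps can be sketched for a single proportion. The data (60 of 100 patients preferring a treatment) and the null value of 50% are hypothetical, and the test uses a normal approximation; `one_sample_proportion_test` is an illustrative name.

```python
import math


def one_sample_proportion_test(successes, n, p0):
    """Z-test of an observed proportion against a null value p0."""
    # Step 1: the null hypothesis H0 says the true proportion is p0.
    # Step 2: the test statistic is the z-score of the observed proportion.
    observed = successes / n
    se_under_h0 = math.sqrt(p0 * (1.0 - p0) / n)
    z = (observed - p0) / se_under_h0
    # Step 3: probability, under H0, of a z-score at least this extreme
    # in either direction (two-sided p-value).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical data: 60 of 100 patients prefer treatment A; H0 is 50%.
z, p_value = one_sample_proportion_test(60, 100, 0.5)
# Step 4: interpret the p-value as the level of evidence against H0.
print(f"z = {z:.2f}, p = {p_value:.3f}")
```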
Some Statistical tests
• Comparing groups
– T-tests (1 or 2 groups, normally distributed)
– Chi-squared (2 or more groups, categorical or binary data)
– Mann-Whitney U (2 groups, non-normal data)
– Log-rank test (2 groups, survival data)
– ANOVA (multiple groups, normally distributed)
–…
• Tips:
– Understand what the hypothesis being tested is
– Use the p-value to assess the level of evidence against it
– (Experienced) Assess if the test was adequate for the question
and data analysed
Reading confidence intervals
What is “no effect”?
• For a ratio (risk ratio, odds ratio, etc.)?
ratio = 1
• For a difference (risk difference, difference in means, etc.)?
difference = 0
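As a small sketch, the check a reader performs on a reported interval can be written out as follows. The intervals are hypothetical and `is_statistically_significant` is an illustrative helper name.

```python
# The "no effect" value depends on how the effect is expressed.
NO_EFFECT = {"ratio": 1.0, "difference": 0.0}


def is_statistically_significant(ci_low, ci_high, measure):
    """True if the 'no effect' value lies outside the confidence interval."""
    null_value = NO_EFFECT[measure]
    return not (ci_low <= null_value <= ci_high)


# Hypothetical confidence intervals, for illustration only.
examples = [
    ("ratio", 0.55, 0.80),
    ("ratio", 0.80, 1.20),
    ("difference", -4.0, -1.0),
    ("difference", -1.0, 3.0),
]
for measure, low, high in examples:
    verdict = is_statistically_significant(low, high, measure)
    print(f"{measure:<10} CI ({low}, {high}): significant = {verdict}")
```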
Statistically versus clinically
significant

Statistically significant
= effect not explained by chance (P<0.05)

Clinically significant
= effect is clinically important
Clinically significant

Vitamin X shortens a 5 day cold

Would you take it twice per day if it


shortened the cold by:
50%
20%
10%
5%
1%
Which are clinically significant?
[Figure: four results with confidence intervals, (a) to (d), on an axis marked 0 (no difference), 10 and 20, with a line for the minimum clinically important difference]
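One way to read such a figure is to compare each confidence interval with both reference lines. The sketch below assumes a larger difference means more benefit; the intervals and the minimum clinically important difference (MCID) of 10 are hypothetical, and `classify` is an illustrative helper.

```python
def classify(ci_low, ci_high, mcid, no_difference=0.0):
    """Describe a confidence interval against 'no difference' and the MCID."""
    if ci_low > no_difference or ci_high < no_difference:
        statistical = "statistically significant"
    else:
        statistical = "not statistically significant"

    if ci_low >= mcid:
        clinical = "clinically important"
    elif ci_high < mcid:
        clinical = "not clinically important"
    else:
        clinical = "possibly clinically important"
    return statistical, clinical


# Hypothetical intervals and MCID, for illustration only.
MCID = 10.0
intervals = {"a": (12.0, 18.0), "b": (2.0, 8.0), "c": (4.0, 16.0), "d": (-3.0, 7.0)}
for label, (low, high) in intervals.items():
    statistical, clinical = classify(low, high, MCID)
    print(f"({label}) {low} to {high}: {statistical}, {clinical}")
```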
Some Statistical Concepts
• Risks, Odds, NNT

• Ratios vs Differences for representing effect

• Regression to the Mean


Effect of daily aspirin on risk of
cancer metastasis
• Five large randomised trials of daily aspirin (≥75
mg daily) versus control for the prevention of
vascular events in the UK

• 1000 new cancers in 16000 participants

• 600 in the placebo group (out of 8000) and 400 in the aspirin group (out of 8000)
Risk (Control)
• 8000 people in the placebo group, and
600 cancers
• risk of cancer
= 600 cancers/8000 people
= 6/80 = 0.075 = 7.5%

risk = number of events of interest / total number of observations
Odds (Control)
• 8000 people in the placebo group, and
600 cancers
• odds of cancer
= 600 cancers/7400 non-cancers
= 6/74 = 0.08 (not usually as %)

odds = number of events of interest / number without the event
Now you do the maths
400 Cancers out of 8000 in the aspirin group

• Risk of Cancer in the aspirin group

• Odds of Cancer in the aspirin group


Now you do the maths
• Risk of Cancer in the aspirin group
400/8000 = 1/20 = 0.05 = 5%

• Odds of Cancer in the aspirin group
400/7600 = 1/19 = 0.053

Does it matter which one I use?
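The two calculations can be checked with a short sketch using the trial counts from the slides; `risk` and `odds` are illustrative helper names. When events are uncommon, as here, the two numbers are close; they drift apart as the event becomes more frequent.

```python
def risk(events, total):
    """Events of interest divided by the total number of observations."""
    return events / total


def odds(events, total):
    """Events of interest divided by the number without the event."""
    return events / (total - events)


# Counts from the slides: cancers out of participants in each group.
groups = {"placebo": (600, 8000), "aspirin": (400, 8000)}
for name, (events, total) in groups.items():
    print(f"{name}: risk = {risk(events, total):.3f}, "
          f"odds = {odds(events, total):.3f}")
```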


How do I communicate risk?
Risk Placebo = 7.5%; Risk Aspirin = 5%

• Maximise impact of intervention?

• What does it mean to an individual?


Relative vs Absolute Risk
• Relative Risk
5%/7.5% = 0.05/0.075 = 0.67
There is a relative risk reduction of about 33%

• Absolute Risk Difference
5% - 7.5% = -2.5 percentage points
There is an absolute risk reduction of 2.5 percentage points
NNT
• How many need to take aspirin to prevent
one cancer?

NNT = 1/ARD = 1/.025 = 40
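The relative risk, absolute risk difference and NNT can be reproduced from the same counts. This is a sketch using the numbers on the slides; `effect_measures` is an illustrative helper name.

```python
def effect_measures(events_treated, n_treated, events_control, n_control):
    """Return relative risk, absolute risk difference and NNT."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    relative_risk = risk_treated / risk_control
    risk_difference = risk_treated - risk_control
    # NNT is the reciprocal of the absolute risk difference (as a proportion).
    nnt = 1.0 / abs(risk_difference)
    return relative_risk, risk_difference, nnt


# Counts from the slides: 400/8000 cancers on aspirin, 600/8000 on placebo.
rr, ard, nnt = effect_measures(400, 8000, 600, 8000)
print(f"relative risk = {rr:.2f} (relative risk reduction {1 - rr:.0%})")
print(f"absolute risk difference = {ard * 100:.1f} percentage points")
print(f"NNT = {nnt:.0f}")
```

The same trial result sounds larger as a relative reduction than as an absolute one, which is why both should be reported.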


The problem with Before and After studies

SPEED CAMERAS
Road Accidents
• The number of cars has increased steadily for the last 50 years (around 3-6% per year)

• In the US there are 256M vehicles (pop 310M)

• An estimated 1.2 million people are killed in road crashes each year, and as many as 50 million are injured

• If present trends continue, road traffic injuries are predicted to be the third-leading contributor to the global burden of disease and injury by 2020
Speed Cameras
• I can guarantee that these will reduce the
number of traffic accidents
Instructions
• Each one of you represents a Street in London
• We are going to 'simulate' the number of road accidents in each Street
• On each table there are two dice
• Each person rolls both dice once and writes down the sum (2-12)
Evaluation of Speed Cameras
• Installed in places where 8 or more serious road accidents have taken place in the last 3 years
• Measured the number of accidents after the cameras were installed
• Adjusted estimates give us an effect that is around 20% of the reported drop
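The dice exercise can be sketched in code. It assumes each street's accident count is the sum of two fair dice in both periods, that cameras go only to streets with 8 or more accidents in the first period, and that the cameras have no effect at all. The number of streets and the seed are arbitrary choices.

```python
import random
import statistics


def roll_two_dice(rng):
    """Simulated accident count for one street: the sum of two fair dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def simulate(n_streets=10000, threshold=8, seed=1):
    """Mean accidents before and after on the streets given a camera."""
    rng = random.Random(seed)
    before = [roll_two_dice(rng) for _ in range(n_streets)]
    # Cameras are installed only where the first count was high.
    selected = [count for count in before if count >= threshold]
    # The second count is a fresh roll: the camera does nothing here.
    after = [roll_two_dice(rng) for _ in selected]
    return statistics.fmean(selected), statistics.fmean(after)


mean_before, mean_after = simulate()
print(f"streets with cameras: before = {mean_before:.2f}, after = {mean_after:.2f}")
```

The selected streets show a drop even though nothing changed, because they were chosen for an unusually high first count. This is regression to the mean, and it is why a before-and-after comparison overstates the effect.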
Thank you
Flowchart of Statistical Tests for Hypothesis Testing
Hypothesis testing and assessing differences:
• Between distributions (χ² tests, McNemar’s test)
• Between means for continuous data (t-tests, rank sum test, sign test, ANOVA, Kruskal-Wallis)
• Between proportions for categorical data (Z-scores)
Flowchart of Statistical Tests for Hypothesis Testing
Between distributions
• Between one observed variable and a theoretical distribution: χ² test for goodness of fit
• Independence between two or more variables: χ² test for independence
• Related groups: McNemar’s test
Flowchart of Statistical Tests for Hypothesis Testing
Between means for continuous data
• Two samples
– Parametric: t-test for independent samples; t-test of differences for related samples
– Non-parametric: rank sum test for independent samples; sign test for related samples
• More than two samples
– Parametric: ANOVA
– Non-parametric: Kruskal-Wallis
Flowchart of Statistical Tests for Hypothesis Testing
Between proportions for categorical data
• One sample vs. H0: Z-score
• Two samples: Z-score for equal proportions

Summarising proportions
One sample: Risk, Odds
Two samples: Relative risk, Odds ratios, Risk differences
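As an illustration of the two-sample Z-score for equal proportions, the sketch below applies it to the aspirin trial counts from the earlier slides (600/8000 cancers on placebo, 400/8000 on aspirin). It uses the pooled-proportion form of the test with a normal approximation; `two_proportion_z_test` is an illustrative helper name.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Z-score and two-sided p-value for H0: the two proportions are equal."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Under H0 both groups share one proportion, estimated by pooling.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Placebo versus aspirin, counts from the slides.
z, p_value = two_proportion_z_test(600, 8000, 400, 8000)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```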
Diagnostic studies
• Population prevalence
• Diagnostic test: sensitivity, specificity
• Positive predictive value
• Negative predictive value

Loong, TW. (2003) BMJ


Abstract 3
A patient comes in and you notice that he has a bilateral earlobe crease (ELC)
What is the likelihood that this patient has Coronary Artery Disease (CAD)?

• CAD has a prevalence of 6%
• ELC has a sensitivity of 50% and a specificity of 85% for detecting CAD
ELC: Prevalence of 6%, Sensitivity of 50%, Specificity of 85% [FPR=15%]

Out of 100 patients:
• CAD +ve: 6, of whom 3 test positive
• CAD -ve: 94, of whom 14 test positive
• 17 positive tests in total, of which 3 have the disease: about 18%
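The natural-frequency reasoning above can be sketched as a small function. The inputs are the figures from the slide; `natural_frequencies` is an illustrative helper name, and it keeps fractional people rather than rounding as the slide does (14.1 false positives instead of 14).

```python
def natural_frequencies(prevalence, sensitivity, false_positive_rate,
                        population=100):
    """Split a population into true and false positives and return the PPV."""
    diseased = population * prevalence
    healthy = population - diseased
    true_positives = diseased * sensitivity
    false_positives = healthy * false_positive_rate
    total_positives = true_positives + false_positives
    return {
        "diseased": diseased,
        "healthy": healthy,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "total_positives": total_positives,
        # Positive predictive value: share of positive tests with disease.
        "ppv": true_positives / total_positives,
    }


# Earlobe crease example from the slide.
elc = natural_frequencies(prevalence=0.06, sensitivity=0.50,
                          false_positive_rate=0.15)
print(f"{elc['total_positives']:.1f} positive tests, "
      f"{elc['true_positives']:.1f} with CAD: PPV = {elc['ppv']:.0%}")
```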
Now consider the FOB screening tests

You find out that your father has undertaken the test and has a positive result. He asks you whether he has cancer.

• Prevalence of disease is 0.3%
• Sensitivity of 50%
• False positive rate (1 - specificity) of 3%
FOB: Prevalence of 0.3%, Sensitivity of 50%, FPR of 3%

Out of 1000 people:
• Cancer +ve: 3, of whom 1.5 test positive
• Cancer -ve: 997, of whom 30 test positive
Probability of having
Cancer

After a positive FOB

>90%
60 - 90%
20 - 60%
10 - 20%
5- 10%
<5%
FOB: Prevalence of 0.3%, Sensitivity of 50%, FPR of 3%

Out of 1000 people:
• Cancer +ve: 3, of whom 1.5 test positive
• Cancer -ve: 997, of whom 30 test positive
• 31.5 positive tests in total, of which 1.5 have the disease: about 5%
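The same answer follows from Bayes' theorem. The worked equation below is a sketch using the slide's figures, with sens for sensitivity, prev for prevalence and FPR for the false positive rate.

```latex
\[
P(\text{cancer} \mid \text{positive})
  = \frac{\text{sens} \times \text{prev}}
         {\text{sens} \times \text{prev} + \text{FPR} \times (1 - \text{prev})}
  = \frac{0.5 \times 0.003}{0.5 \times 0.003 + 0.03 \times 0.997}
  = \frac{0.0015}{0.03141}
  \approx 0.048
\]
```

Multiplying the numerator and denominator by 1000 gives the counts in the tree: 1.5 true positives out of about 31.5 positive tests.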
5%
Doctors with an average of 14 yrs experience
Answers ranged from 1% to 99%
half of them estimating the probability as 50%
Gigerenzer G BMJ 2003;327:741-744

• Thank you

• Any questions
