
Statistics for

Non-Statisticians

THE BASIC IDEA

Statistics are used in clinical trials to make inferences


about new treatments based on the evidence of the
patients in the trial

E.g. New drug for treatment of lung cancer: does it work or not?

Ideally, design a trial that includes all patients with lung cancer

Not really practical!!

Can only test the new treatment on a representative


sample of the population

Statistics allow us to draw conclusions about the likely


effect on the population using data from the sample

USING STATISTICS

But what exactly do we want the statistics to assess?

Assess the weight of evidence that a treatment works (or doesn't)
Give an estimate (and likely range) of the treatment effect
Test to see how likely it is that this effect would have been seen by chance

BUT

Statistics can never PROVE anything


beyond any doubt, just beyond
reasonable doubt!!

STATISTICAL DATA
ANALYSIS METHODS

WHO TO INCLUDE?

In any clinical trial, one is likely to find:


Ineligible patients included by mistake
Protocol violators: those who don't adhere to the treatment regimen allocated
Patients who withdraw or get lost to follow-up

To avoid bias, keep these to a minimum

Follow up all patients randomised into a trial

Should we include them in the analysis?

INTENTION TO TREAT ANALYSIS

As a general rule, all patients randomised


should be analysed by treatment allocated
(regardless of whether they actually
received this treatment)

Reasons for ITT:

Avoids or certainly minimises risk of bias


Is more pragmatic: reflects real life
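The rule of analysing patients by the treatment allocated can be illustrated with a small sketch. The patient records, field names and the `death_rate_by` helper below are hypothetical and exist only for illustration. The point is that the intention to treat analysis groups by the allocated arm, whatever was actually received.

```python
from collections import defaultdict

# Hypothetical patient records: the arm each patient was randomised to,
# the treatment actually received, and whether the patient died.
patients = [
    {"allocated": "A", "received": "A", "died": False},
    {"allocated": "A", "received": "B", "died": True},   # protocol violator
    {"allocated": "A", "received": "A", "died": False},
    {"allocated": "B", "received": "B", "died": True},
    {"allocated": "B", "received": "B", "died": False},
    {"allocated": "B", "received": None, "died": True},  # withdrew before treatment
]

def death_rate_by(records, key):
    """Death rate per group, grouping patients by the given field."""
    deaths, totals = defaultdict(int), defaultdict(int)
    for record in records:
        group = record[key]
        if group is None:   # patient has no group under this definition
            continue
        totals[group] += 1
        deaths[group] += record["died"]
    return {group: deaths[group] / totals[group] for group in totals}

itt = death_rate_by(patients, "allocated")        # intention to treat
as_treated = death_rate_by(patients, "received")  # shown for contrast only

print("ITT:", itt)
print("As treated:", as_treated)
```

In this made-up data the as-treated grouping silently drops the patient who withdrew and moves the protocol violator to the other arm, which is how bias can creep in.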

HYPOTHESIS TESTING

We want to compare the outcomes in different


treatment arms (A and B)

Testing two hypotheses

H0: A = B (Null hypothesis: no difference)
H1: A ≠ B

Calculate test statistic based on the assumption


that H0 is true (i.e. there is no real difference)
Test will give us a p-value: how likely data at least as extreme as those collected would be if H0 were true
If this is unlikely (small p-value), we reject H0

THE LURE OF THE P-VALUE

The p-value is the probability of observing data at least as extreme as ours when the null hypothesis is true

Typically if the p-value is less than 0.05, people say that the
trial gives statistically significant evidence that there is a
difference

Tend to ignore results where the p-value is greater than 0.05

However, 0.05 is a purely arbitrary value, and not really that small: when H0 is true, one time in twenty we will reject it wrongly!

That is, we state that a difference exists when it doesn't (false positive)

Don't become wedded to the p-value: there is not much difference between 0.051 and 0.049
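The "one time in twenty" figure can be checked by simulation. The sketch below is an illustration, not part of the original slides: it assumes Normally distributed outcomes with a known standard deviation of 1, uses a simple z-test, and the function names, sample sizes and seed are arbitrary choices.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard Normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_false_positive_rate(n_trials=2000, n_per_arm=30, alpha=0.05, seed=1):
    """Simulate trials in which H0 is true and count how often p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Both arms are drawn from the same Normal(0, 1) population, so H0 holds.
        arm_a = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        arm_b = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        diff = sum(arm_a) / n_per_arm - sum(arm_b) / n_per_arm
        se = math.sqrt(2 / n_per_arm)   # known SD of 1 in each arm
        if two_sided_p(diff / se) < alpha:
            rejections += 1
    return rejections / n_trials

rate = simulate_false_positive_rate()
print(f"Proportion of null trials declared significant: {rate:.3f}")
```

The proportion should come out close to 0.05, even though no treatment in the simulation has any effect.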

ESTIMATE OF TREATMENT EFFECT

Better still, use the data collected in the


trial to give an estimate of the treatment
effect size, together with a measure of how
certain we are of our estimate

CONFIDENCE INTERVALS (CI)

To estimate the true treatment effect, we calculate the confidence interval for our point estimate

CI is a range of values within which the true treatment effect is believed to be found, with a given level of confidence.
95% CI: if the trial were repeated many times, 95% of the intervals calculated in this way would contain the true treatment effect

Generally, 95% CI is calculated as

Sample Estimate ± 1.96 × Standard Error

Use the confidence interval to assess the true treatment


effect, and not just p-values
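As a minimal sketch of the general formula, the function below returns the estimate plus and minus 1.96 standard errors. The function name and the example numbers are hypothetical, and the 1.96 multiplier assumes the estimate is approximately Normally distributed.

```python
def confidence_interval_95(estimate, standard_error):
    """Return (lower, upper) for estimate +/- 1.96 x standard error."""
    margin = 1.96 * standard_error
    return estimate - margin, estimate + margin

# Hypothetical example: estimated treatment effect 10.0, standard error 2.0.
lower, upper = confidence_interval_95(10.0, 2.0)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```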

DATA ANALYSIS

How do we do this?
What type of analysis should be performed?

Depending on the sort of outcome measure,


different types of analysis are appropriate

Because the actual analyses are now done mainly


by computer, the skill is now:

In choosing the appropriate test


Correctly interpreting the results

COMMON OUTCOME MEASURES

Categorical
Continuous
Survival

CATEGORICAL DATA

Outcomes like good/bad, yes/no or present/absent

In testing categorical data, we are looking to see if


there is any relationship between the outcome
category and the treatment given

H0: No association between variables


H1: Association between variables

For categorical data, the chi-squared test is appropriate if the categories aren't ordered
For ordered categories, use a trend test

ISIS TRIAL OF ASPIRIN TO


PREVENT MORTALITY AFTER MI
             Dead    Alive    Total
Aspirin       804     7783     8587
No Aspirin   1016     7584     8600
Total        1820   15,367   17,187

- Use chi-squared test of association to determine whether to reject the null


hypothesis of no association between aspirin and death

ISIS TRIAL OF ASPIRIN TO


PREVENT MORTALITY AFTER MI
             Dead             Alive              Total
Aspirin      804 (E=909.3)    7783 (E=7677.7)    8587
No Aspirin   1016 (E=910.7)   7584 (E=7689.3)    8600
Total        1820             15,367             17,187

- Use chi-squared test of association to determine whether to reject the null


hypothesis of no association between aspirin and death
- χ² (1 df) = (804 - 909.3)² / 909.3 + ... + (7584 - 7689.3)² / 7689.3 = 27.26
- χ² (1 df) = 27.26 (P<0.0001)
- Strong evidence of an association between aspirin and mortality
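The chi-squared calculation can be reproduced from the table with a short sketch. It uses only the Python standard library and computes the statistic without a continuity correction, which appears to match the 27.26 quoted. The function name is an illustrative choice.

```python
import math

# Observed counts from the ISIS table: rows are aspirin / no aspirin,
# columns are dead / alive.
observed = [[804, 7783],
            [1016, 7584]]

def chi_squared_2x2(table):
    """Chi-squared statistic (no continuity correction) and p-value on 1 df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if there were no association (H0 true).
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

chi2, p = chi_squared_2x2(observed)
print(f"chi-squared = {chi2:.2f} on 1 df, p = {p:.2g}")
```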

MEASURES OF TREATMENT EFFECT

Tested hypothesis and found strong evidence of an


association between aspirin use and mortality
Not very informative - is aspirin harmful or
beneficial?
Various measures of treatment effect:

Absolute Risk Reduction


Number Needed to Treat
Relative Risk
Relative Risk Reduction
Odds Ratio
Odds Reduction

ODDS RATIO & ODDS REDUCTION

Odds ratio = (804 x 7584) / (7783 x 1016) = 0.77

Estimate of treatment effect

<1 so odds of dying smaller with aspirin


95% CI for the odds ratio = 0.70 to 0.85

Odds reduction = 23%

With true treatment effect based on CI ranging from a 15% reduction to a 30% reduction in the odds of death with aspirin

Moderate treatment effect, narrow-ish CI and P<0.0001


Good evidence that aspirin reduces risk of death following
MI
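The odds ratio and its confidence interval can be checked from the four cell counts. The slides do not say how the interval was obtained; the sketch below assumes the usual log odds ratio method (Woolf's standard error), which reproduces 0.70 to 0.85.

```python
import math

# ISIS counts: a, b = dead, alive with aspirin; c, d = dead, alive without.
a, b, c, d = 804, 7783, 1016, 7584

odds_ratio = (a * d) / (b * c)

# Standard error of log(odds ratio) by Woolf's method (an assumption here).
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

odds_reduction = 1 - odds_ratio

print(f"Odds ratio = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"Odds reduction = {odds_reduction:.0%}")
```

The interval is built on the log scale and then transformed back, which is why it is not symmetric around 0.77.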

SUMMARISING BINARY DATA IN TWO


GROUP PROSPECTIVE STUDY
Risk in standard treatment (P1) and Risk in new treatment (P2)

Term                                       Formula          ISIS Example
Absolute Risk Reduction (ARR)              P1 - P2          0.118 - 0.094 = 0.024 (i.e. 2.4% in favour of new Rx)
Number needed to treat/harm (NNT/NNH)      1 / |P1 - P2|    1 / 0.024 = 41.7, so NNT = 42 (i.e. need to treat 42 patients with aspirin in order to prevent 1 death)
Relative Risk (RR)                         P2 / P1          0.094 / 0.118 = 0.80 (<1) (i.e. risk of death lower with aspirin)
Relative Risk Reduction (RRR)              (P1 - P2) / P1   (0.118 - 0.094) / 0.118 = 0.20 (i.e. aspirin reduces the risk of death by 20%)
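These measures can be computed directly from the ISIS counts, as in the sketch below (variable names are illustrative). One caveat: the slide works from risks rounded to three decimal places, whereas the code uses the unrounded proportions, so the results differ slightly. For example, 1 / ARR comes out at about 40.8, giving NNT = 41 rather than 42, and RR is about 0.79 rather than 0.80.

```python
import math

deaths_standard, n_standard = 1016, 8600   # no aspirin: risk P1
deaths_new, n_new = 804, 8587              # aspirin: risk P2

p1 = deaths_standard / n_standard
p2 = deaths_new / n_new

arr = p1 - p2                    # absolute risk reduction
nnt = math.ceil(1 / abs(arr))    # number needed to treat, rounded up
rr = p2 / p1                     # relative risk
rrr = (p1 - p2) / p1             # relative risk reduction

print(f"P1 = {p1:.3f}, P2 = {p2:.3f}")
print(f"ARR = {arr:.3f}, NNT = {nnt}, RR = {rr:.2f}, RRR = {rrr:.2f}")
```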

CONTINUOUS DATA

Outcomes like blood pressure, weight or scores,


summarised using measures of the centre and spread of
the distribution

Measures of the centre of the distribution

Mean: what we think of as an average (add up all data and divide by number of items)
Median: midpoint of the data (half the data below the median, and the other half above)
Mode: most frequently occurring observation

Measures of spread

Variance and standard deviation


Standard deviation (the square root of the variance) is roughly the typical distance of individual observations from the mean
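These summary measures are all available in Python's standard `statistics` module. The readings below are made-up values used only for illustration.

```python
import statistics

# Hypothetical diastolic blood pressure readings (mmHg).
readings = [88, 90, 90, 92, 95, 97, 99]

mean = statistics.mean(readings)
median = statistics.median(readings)
mode = statistics.mode(readings)
variance = statistics.variance(readings)   # sample variance (n - 1 divisor)
sd = statistics.stdev(readings)            # square root of the sample variance

print(f"mean = {mean}, median = {median}, mode = {mode}")
print(f"variance = {variance:.2f}, SD = {sd:.2f}")
```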

CONTINUOUS DATA

In continuous data, we are comparing the


means in the two groups and assessing
whether the two groups come from the
same population

H0: Mean A = Mean B

H1: Mean A ≠ Mean B

Use Student's t-test

ANOVA if comparing >2 treatment groups

NORMAL DISTRIBUTION

T-test and ANOVA assume data are Normally distributed

However, if the data are very skewed or have multiple peaks, we use a non-parametric test which doesn't assume any particular shape for the data

Wilcoxon-Mann-Whitney test

As a rule, non-parametric tests are more general, but less sensitive

STUDY COMPARING TWO ANTIHYPERTENSIVE DRUGS ON BP


Diastolic BP compared in two groups of hypertensive patients given
two different drug treatments

              N     Mean      SD
Treatment A   41    91mmHg    5.5
Treatment B   43    95mmHg    5.5

- Use Student's t-test to assess whether means are from the same population (i.e. Mean with Treatment A = Mean with Treatment B)

TESTING FOR A DIFFERENCE

Treatment A: N=41, Mean=91mmHg, SD=5.5
Treatment B: N=43, Mean=95mmHg, SD=5.5
Use t-test to assess evidence for or against null hypothesis (mean A = mean B)
t = -3.33 on 82 df (df = n1 + n2 - 2)
P=0.0013
So there is evidence against H0
Evidence that the mean diastolic BP differs between the two treatment groups
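The t statistic and p-value can be reproduced from the summary figures alone. The sketch below assumes the equal-variance (pooled) form of Student's t-test, which is consistent with df = n1 + n2 - 2. Because the standard library has no t distribution, the p-value is obtained by numerically integrating the t density with Simpson's rule. All function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

def pooled_t_test(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t-test from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    return t, df, two_sided_p(t, df)

t, df, p = pooled_t_test(41, 91, 5.5, 43, 95, 5.5)
print(f"t = {t:.2f} on {df} df, p = {p:.4f}")
```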

MEASURE OF TREATMENT EFFECT

Tested hypothesis and found evidence that mean diastolic BP differs between the two groups
Not very informative: which of treatment A or B is better?
Point estimate of the treatment effect: calculate the difference between the two means and the confidence interval

Difference = 91 - 95 mmHg = -4 mmHg (favours treatment A)
95% CI: -6.39 to -1.61 mmHg

So the difference in mean diastolic BP between groups is statistically significant (P=0.0013)
With treatment A being more effective in reducing diastolic BP
However, although the observed difference is 4 mmHg in favour of treatment A, the true difference could plausibly be as small as 1.6 mmHg or as large as 6.4 mmHg.
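The confidence interval for the difference in means follows from the same standard error as the t-test. In this sketch the critical value of t on 82 degrees of freedom (about 1.989) is taken from tables and hard-coded, which is an assumption of the example. With a sample this small it is used in place of the Normal value 1.96.

```python
import math

n1, mean1, sd1 = 41, 91.0, 5.5   # treatment A
n2, mean2, sd2 = 43, 95.0, 5.5   # treatment B

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided 95% critical value of t on 82 df, taken from tables (about 1.989).
T_CRIT_82 = 1.989

diff = mean1 - mean2
lower = diff - T_CRIT_82 * se
upper = diff + T_CRIT_82 * se

print(f"Difference = {diff:.0f} mmHg, 95% CI {lower:.2f} to {upper:.2f} mmHg")
```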

SURVIVAL DATA

Why are survival data different?

Interested in studying the time between randomisation


and a subsequent event (say death)
These times are unlikely to be normally distributed
Cannot afford to wait until events have happened to all
subjects, for example until all are dead.
Some people may have left the study early and become
lost to follow up - only information we have about some
patients is that they were still alive at last follow-up.

Use survival analysis methods to analyse time to


event data, not just the number of events

Take into account that not all patients may have had an
event

KAPLAN-MEIER SURVIVAL ANALYSIS

Basic idea: we split the trial up into distinct time


intervals
In each time interval: a certain number, N, patients
enter that time period alive and still on follow-up, and
some of these, D, have an event:
Then the probability of surviving that time interval
(assuming you live that long) is (1-D/N)
Multiply all these probabilities together to give the
probability of survival up to a given time point
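The calculation described above can be sketched in a few lines. The follow-up times are made up, and the example assumes that the time intervals are those ending at each distinct event time, as in the usual Kaplan-Meier estimator. Censored patients count in N until they leave follow-up but never count in D.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up time for each patient
    events: True if the patient had the event, False if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # N: patients still alive and on follow-up just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # D: patients who have an event at time t.
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical follow-up times in months; False marks a censored patient.
times = [2, 3, 3, 5, 6, 8, 8, 10]
events = [True, True, False, True, False, True, True, False]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: estimated survival {s:.3f}")
```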

EXAMPLE SURVIVAL FUNCTION

AIM-HIGH TRIAL OF INTERFERON


FOR MALIGNANT MELANOMA
          Dead   Alive   Total   Median Survival
IFN       151    187     338     ~4 years
No IFN    156    180     336     ~4 years
Total     307    367     674

Want to assess whether the time to death is the same for the two treatments

COMPARING SURVIVAL
BETWEEN GROUPS

We will have two graphs: how do we say whether


one group survives longer than the other?

Could do one test at say 1 year; compare proportions (as


before)
Could keep testing at small intervals

What are the drawbacks to these methods?


Use logrank test to determine whether the survival function is the same for the two treatment groups

H0: Survival function/curve same for both groups

H1: Survival function/curve different across groups
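A minimal sketch of the logrank test for two groups is given below. At each event time it compares the observed number of events in one group with the number expected if both groups shared the same survival function. The example data and function name are hypothetical, and the sketch is not the analysis used in any particular trial.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Logrank chi-squared statistic (1 df) and p-value for two groups."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in pooled if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in pooled if u >= t)               # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)  # at risk, group A
        d = sum(1 for u, e, _ in pooled if u == t and e)         # events, both groups
        d_a = sum(1 for u, e, g in pooled if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n   # events expected in group A under H0
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    statistic = (observed_a - expected_a) ** 2 / variance
    # Upper tail of chi-squared on 1 df is erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2))

# Hypothetical follow-up times in months; False marks a censored patient.
times_a = [6, 7, 10, 15, 19, 25]
events_a = [True, False, True, True, False, True]
times_b = [1, 2, 4, 6, 8, 12]
events_b = [True, True, True, True, False, True]

statistic, p_value = logrank_test(times_a, events_a, times_b, events_b)
print(f"logrank chi-squared = {statistic:.2f}, p = {p_value:.3f}")
```

Unlike a comparison of proportions at a single time point, this uses the whole follow-up period and handles censored patients.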

SURVIVAL IN MELANOMA:
INTERFERON VS. OBSERVATION

MEASURE OF TREATMENT EFFECT

Assessed the evidence and found that there is no evidence


that time to death differs between the treatment groups
Despite lack of difference, should still calculate point estimate and confidence interval for treatment effect
Use Cox regression to calculate hazard ratio and confidence interval

HR = 0.94 (CI = 0.75 to 1.18)

IFN non-significantly reduces the risk of death by 6%, with


the true treatment effect based on the confidence interval
ranging from a 25% reduction in mortality to an adverse
18% increase in mortality with IFN.
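The percentages quoted come straight from the hazard ratio and its confidence limits. The sketch below shows the arithmetic, with a helper name chosen for illustration, and also checks whether the interval includes 1, the value corresponding to no difference.

```python
def percent_change_in_hazard(hazard_ratio):
    """Percentage change in the hazard; a negative value is a reduction."""
    return (hazard_ratio - 1) * 100

hr, ci_lower, ci_upper = 0.94, 0.75, 1.18

for label, value in [("HR", hr), ("CI lower", ci_lower), ("CI upper", ci_upper)]:
    change = percent_change_in_hazard(value)
    direction = "reduction" if change < 0 else "increase"
    print(f"{label} {value:.2f}: {abs(change):.0f}% {direction} in risk of death")

# An interval that contains 1 is compatible with no treatment effect.
includes_no_effect = ci_lower <= 1 <= ci_upper
print("CI includes HR = 1:", includes_no_effect)
```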

ANALYSES GOOD PRACTICE

Report the primary/secondary outcomes as stated


in the protocol

Do not explore all endpoints until you find one that


is significant (data dredging)

Don't give minor endpoints undue prominence in the paper

Looking at multiple outcomes increases the chance of finding something significant
With 20 outcomes tested at the 5% level, on average 1 will be significant just by chance
Is this real, or the play of chance?

Solution: Don't have too many endpoints
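The effect of looking at many outcomes is simple arithmetic. The sketch below assumes the outcomes are independent and that no treatment effect is real for any of them. Under those assumptions, with 20 outcomes tested at the 5% level we expect one false positive on average, and the chance of at least one is about 64%.

```python
def chance_of_false_positive(n_outcomes, alpha=0.05):
    """P(at least one p < alpha) across independent outcomes when H0 is true for all."""
    return 1 - (1 - alpha) ** n_outcomes

def expected_false_positives(n_outcomes, alpha=0.05):
    """Average number of outcomes that reach significance by chance alone."""
    return n_outcomes * alpha

for n in (1, 5, 20):
    print(f"{n:2d} outcomes: expected false positives = "
          f"{expected_false_positives(n):.2f}, "
          f"chance of at least one = {chance_of_false_positive(n):.0%}")
```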

ANALYSES GOOD PRACTICE

Give confidence intervals where possible,


and not just p-values

Keep subgroup analyses to a minimum


Subgroup analyses should be pre-specified
When interpreting subgroups, assess whole
picture
Do not focus upon one subgroup and
individual p-values

FINAL WORDS

The idea of statistics is to look at the strength of the


evidence for a given hypothesis and determine the
reliability of the treatment effect observed in the
trial
Calculations are based on formulas, but the
application of the formulas and the interpretation of
the results is an art rather than a science
Significance is not black and white

P>0.05 is not evidence of absence of effect, merely


absence of evidence of an effect

A little common sense can go a long way in


medical statistics
If in doubt, ask a statistician!

To call in the statistician after the


experiment is done may be no more
than asking him to perform a post
mortem examination: he may be
able to say what the experiment
died of.
Sir R.A. Fisher
Indian Statistical Congress, Sankhya, c. 1938

BOOK LIST

Swinscow TDV and Campbell MJ. Statistics at Square One


(10th edition). BMJ Books 2002

Campbell MJ. Statistics at Square Two. BMJ Books 2001

Altman D, Machin D, Bryant T and Gardner M. Statistics


with Confidence. BMJ Books 2000

Pereira-Maxwell F. A-Z of Medical Statistics. Arnold 1998
