
Statistics for Non-Statisticians

THE BASIC IDEA

Statistics are used in clinical trials to make inferences about new treatments based on the evidence of the patients in the trial

E.g. a new drug for the treatment of lung cancer: does it work or not?

Ideally, design a trial that includes all patients with lung cancer

Not really practical!

Can only test the new treatment on a representative sample of the population

Statistics allow us to draw conclusions about the likely effect on the population using data from the sample

USING STATISTICS

But what exactly do we want the statistics to do?

- Assess the weight of evidence that a treatment works (or doesn't)
- Give an estimate (and likely range) of the treatment effect
- Test to see how likely it is that this effect would have been seen by chance

BUT

Statistics can never PROVE anything beyond any doubt, just beyond reasonable doubt!!

STATISTICAL DATA ANALYSIS METHODS

WHO TO INCLUDE?

In any clinical trial, one is likely to find:


- Ineligible patients included by mistake
- Protocol violators: those who don't adhere to the treatment regimen allocated
- Patients who withdraw or get lost to follow-up

To avoid bias, keep these to a minimum

Follow up all patients randomised into a trial

Should we include them in the analysis?

INTENTION TO TREAT ANALYSIS

As a general rule, all patients randomised should be analysed by treatment allocated (regardless of whether they actually received this treatment): INTENTION TO TREAT (ITT) ANALYSIS

Reasons for ITT:

- Avoids, or certainly minimises, risk of bias
- Is more pragmatic: reflects real life

HYPOTHESIS TESTING

We want to compare the outcomes in different treatment arms (A and B)

Testing two hypotheses:

H0: A = B (null hypothesis: no difference)
H1: A ≠ B

- Calculate a test statistic based on the assumption that H0 is true (i.e. there is no real difference)
- The test will give us a p-value: how likely data like those collected would be if H0 were true
- If this is unlikely (small p-value), we reject H0

THE LURE OF THE P-VALUE

The p-value is the probability of observing data at least as extreme as ours when the null hypothesis is true

- Typically, if the p-value is less than 0.05, people say that the trial gives statistically significant evidence that there is a difference
- People tend to ignore results where the p-value is greater than 0.05
- However, 0.05 is a purely arbitrary value, and not really that small: when H0 is true, one time in twenty we will reject it wrongly!

That is, we state that a difference exists when one doesn't (false positive)

Don't become wedded to the p-value: there is not much difference between 0.051 and 0.049

ESTIMATE OF TREATMENT EFFECT

Better still, use the data collected in the trial to give an estimate of the treatment effect size, together with a measure of how certain we are of our estimate

CONFIDENCE INTERVALS (CI)

To quantify our uncertainty about the true treatment effect, we calculate the confidence interval (CI) for our point estimate

- A CI is a range of values within which the true treatment effect is believed to be found, with a given level of confidence
- 95% CI: if the trial were repeated many times, about 95% of the intervals calculated this way would contain the true treatment effect
- Generally, the 95% CI is calculated as Sample Estimate ± 1.96 x Standard Error
- Use the confidence interval to assess the true treatment effect, and not just p-values
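The general formula can be sketched in a few lines of Python. The estimate and standard error below are hypothetical numbers chosen only to show the arithmetic, and the function name is an illustration rather than part of any library.

```python
def confidence_interval_95(estimate, standard_error):
    """Return (lower, upper) for estimate +/- 1.96 x standard error."""
    margin = 1.96 * standard_error
    return estimate - margin, estimate + margin


# Hypothetical estimate and standard error, for illustration only.
lower, upper = confidence_interval_95(estimate=10.0, standard_error=2.0)
print(f"95% CI: {lower:.2f} to {upper:.2f}")  # 95% CI: 6.08 to 13.92
```

The multiplier 1.96 comes from the Normal distribution; with small samples a slightly larger multiplier from the t distribution is typically used instead.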

DATA ANALYSIS

How do we do this? What type of analysis should be performed?

Depending on the sort of outcome measure, different types of analysis are appropriate

Because the actual analyses are now done mainly by computer, the skill is now:

- In choosing the appropriate test
- In correctly interpreting the results

COMMON OUTCOME MEASURES

- Categorical
- Continuous
- Survival

CATEGORICAL DATA

Outcomes like good/bad, yes/no or present/absent

In testing categorical data, we are looking to see if there is any relationship between the outcome category and the treatment given

H0: No association between variables
H1: Association between variables

For categorical data, the chi-squared test is appropriate if the categories aren't ordered

For ordered categories, use a trend test

ISIS TRIAL OF ASPIRIN TO PREVENT MORTALITY AFTER MI


             Dead    Alive    Total
Aspirin       804     7783     8587
No Aspirin   1016     7584     8600
Total        1820   15,367   17,187

- Use chi-squared test of association to determine whether to reject the null hypothesis of no association between aspirin and death

ISIS TRIAL OF ASPIRIN TO PREVENT MORTALITY AFTER MI


             Dead              Alive               Total
Aspirin      804 (E=909.3)     7783 (E=7677.7)      8587
No Aspirin   1016 (E=910.7)    7584 (E=7689.3)      8600
Total        1820              15,367             17,187

(E = count expected if there were no association between aspirin and death)

- Use chi-squared test of association to determine whether to reject the null hypothesis of no association between aspirin and death
- χ² (1 df) = (804 - 909.3)² / 909.3 + ... + (7584 - 7689.3)² / 7689.3 = 27.26
- χ² (1 df) = 27.26 (P<0.0001)

- Strong evidence of an association between aspirin and mortality
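As a rough check on the arithmetic, the expected counts and the chi-squared statistic can be recomputed from the observed ISIS counts. This is a minimal sketch without continuity correction; the function name is hypothetical, and the p-value shortcut used here is valid only for 1 degree of freedom.

```python
import math

# Observed counts from the ISIS table.
# Rows: aspirin, no aspirin. Columns: dead, alive.
observed = [[804, 7783],
            [1016, 7584]]


def chi_squared_2x2(table):
    """Chi-squared statistic (1 df, no continuity correction) and p-value."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if outcome and treatment were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail area is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


chi2, p = chi_squared_2x2(observed)
print(f"chi-squared = {chi2:.2f}, p = {p:.1e}")
```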

MEASURES OF TREATMENT EFFECT

Tested hypothesis and found strong evidence of an association between aspirin use and mortality

Not very informative: is aspirin harmful or beneficial?

Various measures of treatment effect:

- Absolute Risk Reduction
- Number Needed to Treat
- Relative Risk
- Relative Risk Reduction
- Odds Ratio
- Odds Reduction

ODDS RATIO & ODDS REDUCTION

Odds ratio = (804 x 7584) / (7783 x 1016) = 0.77

- <1, so the odds of dying are smaller with aspirin
- 95% CI for the odds ratio = 0.70 to 0.85

Estimate of treatment effect

- Odds reduction = 23%
- The true treatment effect, based on the CI, ranges from a 15% reduction to a 30% reduction in the odds of death with aspirin

Moderate treatment effect, narrow-ish CI and P<0.0001

Good evidence that aspirin reduces the risk of death following MI
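The odds ratio and its interval can be reproduced from the four cell counts. The slides do not say how the interval was calculated; the sketch below assumes the common log-scale (Woolf) method, which happens to give the same figures to two decimal places. The function name is hypothetical.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with a log-scale CI.

    a, b -- dead, alive in the treated group
    c, d -- dead, alive in the control group
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio); this is the assumed (Woolf) method.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# ISIS counts: aspirin (804 dead, 7783 alive), no aspirin (1016 dead, 7584 alive).
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(804, 7783, 1016, 7584)
odds_reduction_percent = (1 - odds_ratio) * 100
print(f"OR = {odds_ratio:.2f} ({ci_low:.2f} to {ci_high:.2f}), "
      f"odds reduction = {odds_reduction_percent:.0f}%")
```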

SUMMARISING BINARY DATA IN TWO GROUP PROSPECTIVE STUDY


Risk in standard treatment (P1) and risk in new treatment (P2)

Absolute Risk Reduction (ARR)
  Formula: P1 - P2
  ISIS example: 0.118 - 0.094 = 0.024 (i.e. 2.4% in favour of new Rx)

Number needed to treat/harm (NNT/NNH)
  Formula: 1 / |P1 - P2|
  ISIS example: 1 / 0.024 = 41.7, so NNT = 42 (i.e. need to treat 42 patients with aspirin in order to prevent 1 death)

Relative Risk (RR)
  Formula: P2 / P1
  ISIS example: 0.094 / 0.118 = 0.80 (<1) (i.e. risk of death lower with aspirin)

Relative Risk Reduction (RRR)
  Formula: (P1 - P2) / P1
  ISIS example: (0.118 - 0.094) / 0.118 = 0.20 (i.e. aspirin reduces the risk of death by 20%)
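The four formulas can be applied directly to the ISIS counts, as in the sketch below (the function name is hypothetical). Because it works from the unrounded risks rather than 0.118 and 0.094, its results differ slightly from the worked figures: the NNT comes out as 41 rather than 42, and the relative risk as about 0.79 rather than 0.80.

```python
import math


def risk_measures(events_control, n_control, events_treated, n_treated):
    """ARR, NNT, RR and RRR for a two-group study with a binary outcome."""
    p1 = events_control / n_control   # risk on standard treatment
    p2 = events_treated / n_treated   # risk on new treatment
    arr = p1 - p2
    return {
        "P1": p1,
        "P2": p2,
        "ARR": arr,
        # Rounded up to a whole number of patients; undefined if ARR is 0.
        "NNT": math.ceil(1 / abs(arr)),
        "RR": p2 / p1,
        "RRR": arr / p1,
    }


# ISIS: 1016 deaths among 8600 without aspirin, 804 among 8587 with aspirin.
measures = risk_measures(1016, 8600, 804, 8587)
for name, value in measures.items():
    print(f"{name}: {value:.3f}" if name != "NNT" else f"{name}: {value}")
```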

CONTINUOUS DATA

Outcomes like blood pressure, weight or scores, summarised using measures of the centre and spread of the distribution

Measures of the centre of the distribution


- Mean: what we think of as an average; add up all the data and divide by the number of items
- Median: midpoint of the data; half the data lie below the median, and the other half above
- Mode: most frequent observation

Measures of spread

- Variance and standard deviation
- Standard deviation is, roughly, the typical distance of individual observations from the mean
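These summaries are available in Python's standard `statistics` module. The readings below are hypothetical values made up for illustration.

```python
import statistics

# Hypothetical diastolic blood pressure readings (mmHg), for illustration only.
readings = [88, 90, 90, 92, 95, 97]

mean = statistics.mean(readings)          # sum of the data / number of items
median = statistics.median(readings)      # midpoint of the sorted data
mode = statistics.mode(readings)          # most frequent value
variance = statistics.variance(readings)  # sample variance (divides by n - 1)
sd = statistics.stdev(readings)           # square root of the sample variance

print(f"mean={mean}, median={median}, mode={mode}, "
      f"variance={variance:.1f}, sd={sd:.2f}")
```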

CONTINUOUS DATA

In continuous data, we are comparing the means in the two groups and assessing whether the two groups come from the same population

H0: Mean A = Mean B
H1: Mean A ≠ Mean B

Use Student's t-test

ANOVA if comparing >2 treatment groups

NORMAL DISTRIBUTION

T-test and ANOVA assume the data are Normally distributed

However, if the data are very skewed or have multiple peaks, we use a non-parametric test, which doesn't assume any particular shape for the data

Wilcoxon Mann-Whitney

As a rule, non-parametric tests are more general, but less sensitive

STUDY COMPARING TWO ANTIHYPERTENSIVE DRUGS ON BP


Diastolic BP compared in two groups of hypertensive patients given two different drug treatments

              N     Mean      SD
Treatment A   41    91mmHg    5.5
Treatment B   43    95mmHg    5.5

- Use Student's t-test to assess whether means are from the same population (i.e. Mean with Treatment A = Mean with Treatment B)

TESTING FOR A DIFFERENCE


Treatment A: N=41, Mean=91mmHg, SD=5.5
Treatment B: N=43, Mean=95mmHg, SD=5.5

- Use t-test to assess evidence for or against null hypothesis (mean A = mean B)
- t = -3.33 on 82 df (df = n1 + n2 - 2)
- P = 0.0013
- So there is evidence against H0
- Evidence that the mean diastolic BP differs between the two treatment groups
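The t statistic and degrees of freedom can be recomputed from the summary figures alone. This sketch assumes the equal-variance (pooled) form of Student's t-test, which is consistent with df = n1 + n2 - 2; the function name is hypothetical. Turning t into a p-value needs the t distribution from tables or statistical software, so that step is left out.

```python
import math


def pooled_t_statistic(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic with pooled variance, plus df and standard error."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_difference = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se_difference
    return t, df, se_difference


# Treatment A: N=41, mean 91, SD 5.5. Treatment B: N=43, mean 95, SD 5.5.
t_stat, deg_freedom, se_diff = pooled_t_statistic(41, 91.0, 5.5, 43, 95.0, 5.5)
print(f"t = {t_stat:.2f} on {deg_freedom} df")  # t = -3.33 on 82 df
```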

MEASURE OF TREATMENT EFFECT


Tested hypothesis and found evidence that mean diastolic BP in the two groups is different

Not very informative: which of treatment A or B is better?

Point estimate of the treatment effect: calculate the difference between the two means and the confidence interval

- Difference = 91 - 95mmHg = -4mmHg (favours treatment A)
- 95% CI: -6.39 to -1.61mmHg

So the difference in mean diastolic BP between groups is statistically significant (P=0.0013), with treatment A being more effective in reducing diastolic BP

However, the true difference in favour of treatment A, observed as 4mmHg, could be as small as 1.6mmHg or as large as 6.4mmHg.
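The interval can be recovered as difference ± multiplier x standard error. With 82 degrees of freedom the multiplier is slightly larger than 1.96; the value 1.989 below is an assumed figure read from t tables, not something stated in the slides.

```python
import math

# Assumed two-sided 5% critical value of t on 82 df (from tables).
T_CRITICAL_82_DF = 1.989

n_a, mean_a, sd_a = 41, 91.0, 5.5
n_b, mean_b, sd_b = 43, 95.0, 5.5

difference = mean_a - mean_b
# Both groups have the same SD, so the pooled SD is also 5.5.
se_difference = sd_a * math.sqrt(1 / n_a + 1 / n_b)
ci_lower = difference - T_CRITICAL_82_DF * se_difference
ci_upper = difference + T_CRITICAL_82_DF * se_difference

print(f"Difference = {difference:.0f}mmHg, "
      f"95% CI {ci_lower:.2f} to {ci_upper:.2f}mmHg")
```

Using 1.96 instead would give a marginally narrower interval, which is why the t multiplier is preferred for samples of this size.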

SURVIVAL DATA

Why are survival data different?


- Interested in studying the time between randomisation and a subsequent event (say death)
- These times are unlikely to be normally distributed
- Cannot afford to wait until events have happened to all subjects, for example until all are dead
- Some people may have left the study early and become lost to follow-up: the only information we have about some patients is that they were still alive at last follow-up

Use survival analysis methods to analyse time-to-event data, not just the number of events

Take into account that not all patients may have had an event

KAPLAN-MEIER SURVIVAL ANALYSIS

Basic idea: we split the trial up into distinct time intervals

- In each time interval, a certain number, N, of patients enter that time period alive and still on follow-up, and some of these, D, have an event
- Then the probability of surviving that time interval (assuming you live that long) is (1 - D/N)
- Multiply all these probabilities together to give the probability of survival up to a given time point
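The multiplication of (1 - D/N) terms can be sketched as below, using each distinct event time as the boundary of an interval. The follow-up data are hypothetical, and the function is an illustration, not a replacement for a survival analysis package.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- True if the patient had the event, False if censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, had_event in zip(times, events) if had_event})
    for t in event_times:
        # N: patients still alive and on follow-up just before time t.
        at_risk = sum(1 for time in times if time >= t)
        # D: patients who have the event at time t.
        deaths = sum(1 for time, had_event in zip(times, events)
                     if time == t and had_event)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times in weeks; False marks a censored patient.
times = [5, 8, 8, 12, 15, 20]
events = [True, True, False, True, False, True]
curve = kaplan_meier(times, events)
for week, probability in curve:
    print(f"week {week}: S = {probability:.3f}")
```

Censored patients count towards N up to the time they leave follow-up, which is how the method uses the partial information they provide.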

EXAMPLE SURVIVAL FUNCTION


[Figure: survival function plotted against time in weeks (0 to 120), with survival probability from 0.0 to 1.0 and censored observations marked]

AIM-HIGH TRIAL OF INTERFERON FOR MALIGNANT MELANOMA


          Dead   Alive   Total   Median Survival
IFN        151     187     338   ~4 years
No IFN     156     180     336   ~4 years
Total      307     367     674

Want to assess whether the time to death is the same for the two treatments

COMPARING SURVIVAL BETWEEN GROUPS

We will have two graphs: how do we say whether one group survives longer than the other?

- Could do one test at, say, 1 year; compare proportions (as before)
- Could keep testing at small intervals

What are the drawbacks to these methods?

Use the logrank test to determine whether the survival function is the same for the two treatment groups

H0: Survival function/curve same for both groups
H1: Survival function/curve different across groups
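For readers curious about what the logrank test computes, here is a minimal sketch for two groups. At each event time it compares the observed number of events in group A with the number expected if both groups shared one survival curve. The data are hypothetical and the function name is an illustration; real analyses should use an established statistics package.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Logrank chi-squared statistic (1 df) and p-value for two groups."""
    data = ([(t, e, "A") for t, e in zip(times_a, events_a)] +
            [(t, e, "B") for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _, g in data if time >= t and g == "A")
        n_b = sum(1 for time, _, g in data if time >= t and g == "B")
        d_a = sum(1 for time, e, g in data if time == t and e and g == "A")
        d_b = sum(1 for time, e, g in data if time == t and e and g == "B")
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n          # events expected in A under H0
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    statistic = (observed_a - expected_a) ** 2 / variance
    # Upper tail area of chi-squared on 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical follow-up times; False marks a censored patient.
group_a = ([6, 7, 10, 15], [True, True, False, True])
group_b = ([9, 13, 18, 20], [True, True, True, False])
statistic, p_value = logrank_test(*group_a, *group_b)
print(f"logrank chi-squared = {statistic:.2f}, p = {p_value:.2f}")
```

Unlike a comparison of proportions at a single time point, this uses the whole follow-up period and involves only one test.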

SURVIVAL IN MELANOMA: INTERFERON VS. OBSERVATION

MEASURE OF TREATMENT EFFECT


Assessed the evidence and found no evidence that time to death differs between the treatment groups

Despite the lack of difference, should still calculate the point estimate and confidence interval for the treatment effect

Use Cox regression to calculate the hazard ratio (HR) and confidence interval

HR = 0.94 (CI = 0.75 to 1.18)

IFN non-significantly reduces the risk of death by 6%, with the true treatment effect based on the confidence interval ranging from a 25% reduction in mortality to an adverse 18% increase in mortality with IFN.
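The percentages quoted are simply the hazard ratio and its limits re-expressed as a change from 1. A small sketch, with a hypothetical helper name:

```python
def percent_change_in_hazard(hazard_ratio):
    """Percentage change in hazard implied by a hazard ratio (negative = reduction)."""
    return (hazard_ratio - 1) * 100


# Values from the AIM-HIGH slide: HR 0.94, CI 0.75 to 1.18.
for label, hr in [("estimate", 0.94), ("CI lower", 0.75), ("CI upper", 1.18)]:
    print(f"{label}: HR {hr:.2f} -> {percent_change_in_hazard(hr):+.0f}%")
```

Because the interval runs from a reduction to an increase (it includes HR = 1), the data are compatible with benefit, no effect, or harm.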

ANALYSES GOOD PRACTICE

Report the primary/secondary outcomes as stated in the protocol

Don't give minor endpoints undue prominence in the paper

Do not explore all endpoints until you find one that is significant (data dredging)

- Looking at multiple outcomes increases the chance of finding something significant
- With 20 outcomes and no real effects, on average 1 outcome will be significant at the 5% level just by chance
- Is this real, or the play of chance?

Solution: Don't have too many endpoints
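The arithmetic behind this warning can be sketched as follows. It assumes the outcomes are independent and that no treatment effect is real, which is a simplification since trial endpoints are usually correlated.

```python
def chance_of_false_positive(n_outcomes, alpha=0.05):
    """Probability of at least one p < alpha among independent tests under H0."""
    return 1 - (1 - alpha) ** n_outcomes


expected_false_positives = 20 * 0.05
probability_any = chance_of_false_positive(20)
print(f"Expected false positives in 20 outcomes: {expected_false_positives:.0f}")
print(f"Chance of at least one: {probability_any:.0%}")
```

Under these assumptions, testing 20 outcomes gives roughly a two in three chance of at least one spuriously significant result.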

ANALYSES GOOD PRACTICE

- Give confidence intervals where possible, and not just p-values
- Keep subgroup analyses to a minimum
- Subgroup analyses should be pre-specified
- When interpreting subgroups, assess the whole picture
- Do not focus upon one subgroup and individual p-values

FINAL WORDS

- The idea of statistics is to look at the strength of the evidence for a given hypothesis and determine the reliability of the treatment effect observed in the trial
- Calculations are based on formulas, but the application of the formulas and the interpretation of the results is an art rather than a science
- Significance is not black and white

P>0.05 is not evidence of absence of effect, merely absence of evidence of an effect

A little common sense can go a long way in medical statistics

If in doubt, ask a statistician!

To call in the statistician after the experiment is done may be no more than asking him to perform a post mortem examination: he may be able to say what the experiment died of.
Sir R.A. Fisher
Indian Statistical Congress, Sankhya, c. 1938

BOOK LIST

Swinscow TDV and Campbell MJ. Statistics at Square One (10th edition). BMJ Books 2002
Campbell MJ. Statistics at Square Two. BMJ Books 2001
Altman D, Machin D, Bryant T and Gardner M. Statistics with Confidence. BMJ Books 2000
Pereira-Maxwell F. A-Z of Medical Statistics. Arnold 1998
