
Session 4
DESCRIPTIVE COMPARISON OF GROUPS
Dr Andrea Venn

AIMS OF SESSION

• Understand what is meant by exposure and outcome variables
• For continuous outcome variable:
  o compare between two ‘exposure’ groups
  o compute mean difference as measure of effect
  o extend to compare >2 groups
• For categorical outcome variable:
  o compare binary outcome between two ‘exposure’ groups
  o compute risk difference, risk ratio and odds ratio as measures of effect
  o extend to compare >2 exposure groups
  o extend to categorical outcome with >2 levels

EXPOSURE AND OUTCOME VARIABLES

• In research we have an outcome variable of interest (disease, disease marker, death, etc.)
• Often investigate the association between some exposure and the outcome
• Exposure variable may be a demographic or lifestyle factor (eg gender, smoking status, whether living near power lines, etc.) or a treatment/experimental group
• Assess association by comparing the amount of disease between exposure groups

EXPOSURE AND OUTCOME VARIABLES (cont)

• A study of UK adults to investigate whether C-reactive protein levels differ between ethnic groups
  Outcome: C-reactive protein level (continuous)
  Exposure: Ethnic group (categorical)
• A randomised controlled trial to see whether people on a new cancer treatment are more likely to recover from disease than those on standard treatment
  Outcome: Recovery (binary)
  Exposure: Treatment (binary)

COMPARING OUTCOME BETWEEN EXPOSURE GROUPS

• Methods covered in this session are appropriate for studies where the outcome is measured at one point in time
• Methods for longitudinal data come later
• No statistical tests at this stage
• Look at three situations:
  o Continuous outcome variable
  o Binary outcome variable
  o Categorical outcome variable (>2 levels)

1. Comparison of continuous outcome

Research question: Does lung function differ between men and women?

Outcome = FEV1    Exposure = gender
STEP 1: IS THE OUTCOME VARIABLE ROUGHLY NORMALLY DISTRIBUTED?

• Draw histogram

[Histogram: FEV1 in litres in 2000 (x-axis) against Frequency (y-axis); Mean = 2.4359, Std. Dev. = 0.7773, N = 177]

STEP 2: COMPUTE SUMMARY MEASURES IN EACH OF THE TWO GROUPS

• If Normal give the mean and sd for each group
  Men: mean FEV1 = 2.64 L, sd = 0.78 L
  Women: mean FEV1 = 2.00 L, sd = 0.56 L
• If not Normal give median and range or interquartile range

LOOK AT THE 95% CIS AROUND MEANS

If Normal:
• Men: 2.64 (95% CI 2.50 to 2.79)
• Women: 2.00 (95% CI 1.85 to 2.14)

What do these mean?

95% confident the true mean FEV1 for men lies somewhere between 2.50 and 2.79 litres, and for women, between 1.85 and 2.14 litres

No overlap, suggesting the true mean is different for men and women

STEP 3: COMPUTE MEASURE OF EFFECT

Difference in means = mean in exposed – mean in unexposed

• Sometimes called ‘mean difference’
• Not always “exposed” and “unexposed”, eg gender (men and women)
• Also make clear which group is being treated as the comparison group (reference/baseline/unexposed group)
• Here computing men – women, so women are the reference group

MEAN DIFFERENCE

Difference in means (men – women) or mean difference
= 2.64 – 2.00 = 0.64 litres

On average, FEV1 is 0.64 litres greater in men than women

So a value of 0 means no association between exposure and outcome

95% CI AROUND MEAN DIFFERENCE

Mean difference (men – women) = 0.64 L (95% CI 0.45 to 0.85)

Estimate of the true difference is 0.64 L, but 95% sure it is between 0.45 and 0.85 L higher in men than women

Suppose 95% CI = -0.23 to 1.51
How is this interpreted?
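
The mean difference and its 95% CI can be sketched in a few lines of Python. This is an illustration only, not the SPSS method: it takes the group means, standard deviations and sample sizes from the SPSS output for this example, uses the unequal-variance standard error, and (as a simplifying assumption) a Normal quantile rather than a t quantile. The function name `mean_difference_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_ci(mean1, sd1, n1, mean0, sd0, n0, level=0.95):
    """Mean difference (group 1 minus reference group 0) with an approximate CI.

    Assumption: Normal-quantile interval with unequal-variance standard
    error; SPSS uses a t quantile, so its limits differ very slightly.
    """
    diff = mean1 - mean0
    se = sqrt(sd1 ** 2 / n1 + sd0 ** 2 / n0)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Men (n = 120) versus women (n = 57, reference group)
diff, lower, upper = mean_difference_ci(2.6446, 0.78246, 120,
                                        1.9967, 0.55515, 57)
print(f"Mean difference = {diff:.4f} L (95% CI {lower:.2f} to {upper:.2f})")
```

If the interval contains 0 (as in the ‘suppose’ example), the data are compatible with no difference between the groups.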

DIFFERENCE IN MEDIANS

If outcome not Normal, compute:

Difference in medians = median in exposed – median in unexposed

Note: Can’t get 95% CIs around difference in medians easily (although can get in STATA package)

USING SPSS

Same example: Does lung function (FEV1) differ between men and women?

• Outcome: FEV1
• Exposure: Gender (male or female)

STEP 1: DRAW A HISTOGRAM

Graphs, legacy dialogs, histogram

[Histogram: FEV1 in litres in 2000 (x-axis) against Frequency (y-axis); Mean = 2.4359, Std. Dev. = 0.7773, N = 177]

STEP 2: COMPUTE SUMMARY MEASURES IN EACH OF THE TWO GROUPS

Analyse, descriptive statistics, explore

Descriptives: FEV1 in litres in 2000

                                    Sex: male                Sex: female
                                    Statistic   Std. Error   Statistic   Std. Error
Mean                                2.6446      .07143       1.9967      .07353
95% CI for Mean, Lower Bound        2.5031                   1.8494
95% CI for Mean, Upper Bound        2.7860                   2.1440
5% Trimmed Mean                     2.6681                   2.0041
Median                              2.6750                   2.0900
Variance                            .612                     .308
Std. Deviation                      .78246                   .55515
Minimum                             .70                      .57
Maximum                             4.16                     3.11
Range                               3.46                     2.54
Interquartile Range                 1.13                     .86
Skewness                            -.291       .221         -.191       .316
Kurtosis                            -.217       .438         -.464       .623

BOX AND WHISKER PLOT

• Top line = Maximum
• Black line = Median
• Box = Interquartile range
• Bottom line = Minimum
• Any outliers will be plotted separately
STEP 3: COMPUTE MEASURE OF EFFECT

Analyse, compare means, independent samples t-test

Group Statistics: FEV1 in litres in 2000

Sex       N     Mean     Std. Deviation   Std. Error Mean
male      120   2.6446   .78246           .07143
female     57   1.9967   .55515           .07353

Independent Samples Test: FEV1 in litres in 2000

Levene's Test for Equality of Variances: F = 7.075, Sig. = .009

t-test for Equality of Means
                                                                   Mean         Std. Error   95% CI of the Difference
                              t       df        Sig. (2-tailed)   Difference   Difference   Lower     Upper
Equal variances assumed       5.613   175       .000              .64792       .11544       .42009    .87574
Equal variances not assumed   6.320   149.081   .000              .64792       .10251       .44535    .85048

Mean difference = 0.64
95% CI = 0.45 to 0.85 (from the ‘equal variances not assumed’ row)

EXAMPLE FROM A PAPER

Forsyth J et al. BMJ, May 2003; 326: 953

COMPARING MORE THAN TWO GROUPS

Example: FEV1 by body mass index category

Group         N     Mean   SD     Mean difference (95% CI)
Underweight    31   2.91   0.67   ref
Normal        338   3.20   0.84   0.29 (-0.02, 0.60)
Overweight    415   3.21   0.86   0.30 (-0.01, 0.61)
Obese         189   2.94   0.82   0.03 (-0.28, 0.33)

• Choose reference category/group (here ‘underweight’)
• Compute mean difference for each group compared to reference
• Careful in interpreting 95% CIs; need to do a significance test to decide if there is an overall association
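
The ‘mean difference versus reference’ column can be reproduced with a short Python sketch. The group means come from the FEV1 by BMI table; the dictionary layout and the function name `mean_differences` are choices made for this illustration only.

```python
# Group means from the FEV1 by BMI table; 'Underweight' is the reference.
means = {"Underweight": 2.91, "Normal": 3.20, "Overweight": 3.21, "Obese": 2.94}
reference = "Underweight"


def mean_differences(means, reference):
    """Mean in each non-reference group minus mean in the reference group."""
    ref_mean = means[reference]
    return {group: round(mean - ref_mean, 2)
            for group, mean in means.items() if group != reference}


diffs = mean_differences(means, reference)
for group, d in diffs.items():
    print(f"{group} - {reference}: {d:.2f}")
```

Changing `reference` changes every difference, which is why the reference group should always be stated.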

2. Comparison of binary outcome

Research question: Does the proportion of people with disease differ between smokers and non-smokers?

Outcome = Disease    Exposure = smoking status

STEP 1: COMPUTE PROPORTION WITH OUTCOME IN TWO GROUPS

Cross tabulate exposure with outcome

                  Smoker
                  Yes    No     Total
Disease   Yes      15      5     20
          No       63    117    180
          Total    78    122    200
STEP 1: COMPUTE PROPORTION WITH OUTCOME IN TWO GROUPS

Compute proportion with disease, or

Risk of disease = number with disease / total number

Can express as decimal number or %

Risk in all = 20/200 = 0.1 or 10%
Risk in smokers = 15/78 = 0.19 or 19%
Risk in non-smokers = 5/122 = 0.04 or 4%

STEP 1: COMPUTE PROPORTION WITH OUTCOME IN TWO GROUPS (cont)

Table shows column percentages

• If exposure is columns, compute column %s
• If exposure is rows, compute row %s

                  Smoker
                  Yes          No           Total
Disease   Yes     15    19%      5     4%    20    10%
          No      63    81%    117    96%   180    90%
          Total   78   100%    122   100%   200   100%
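
As a rough illustration, the three risks can be computed from the cell counts of the smoking/disease cross-tabulation. The dictionary layout and the helper name `risk` are assumptions made for this sketch.

```python
# Cell counts from the smoking/disease cross-tabulation (exposure in columns).
table = {
    "smoker":     {"disease": 15, "no_disease": 63},
    "non_smoker": {"disease": 5,  "no_disease": 117},
}


def risk(cell):
    """Risk = number with disease / total number in the group."""
    return cell["disease"] / (cell["disease"] + cell["no_disease"])


risks = {group: risk(cell) for group, cell in table.items()}

# Overall risk uses the column totals.
total_diseased = sum(cell["disease"] for cell in table.values())
total_n = sum(cell["disease"] + cell["no_disease"] for cell in table.values())
overall_risk = total_diseased / total_n

for group, r in risks.items():
    print(f"Risk in {group}: {r:.2f} ({r:.0%})")
print(f"Risk in all: {overall_risk:.2f} ({overall_risk:.0%})")
```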

EXAMPLE: ROW OR COLUMN %?

             Disease recurrence
BMI >30      Yes          No           Total
Yes          34   35%      63   65%     97   100%
No           23   10%     203   90%    226   100%
Total        57   18%     266   82%    323   100%

ROW OR COLUMN %s? ROW

STEP 2: COMPUTE MEASURE OF EFFECT

We have seen that disease seems to be more common amongst smokers than non-smokers… but how much more?

Three measures of effect for binary outcomes:
o Risk difference
o Risk ratio
o Odds ratio

RISK DIFFERENCE

Risk difference = risk of disease in exposed – risk in unexposed
= 0.19 – 0.04 = 0.15 or 19% – 4% = 15%

                  Smoker
                  Yes          No           Total
Disease   Yes     15    19%      5     4%    20    10%
          No      63    81%    117    96%   180    90%
          Total   78   100%    122   100%   200   100%

RISK DIFFERENCE

Interpretation:
• Tells us, on an absolute scale, how much greater the risk of disease is in the exposed group compared to the unexposed
• Increase in disease prevalence we might expect by virtue of being a smoker (15%)
• Value of 0 means no association
RISK DIFFERENCE: 95% CI

Risk difference = 15%
95% CI = 7% to 23%

95% confident the truth lies between 7% and 23% higher prevalence in smokers than non-smokers

Suppose 95% CI = -2% to 32%
What would this mean?

RISK RATIO

Risk ratio = risk of disease in exposed / risk of disease in unexposed
= 0.19 / 0.04 (or 19% / 4%) = 4.7

                  Smoker
                  Yes          No           Total
Disease   Yes     15    19%      5     4%    20    10%
          No      63    81%    117    96%   180    90%
          Total   78   100%    122   100%   200   100%
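
A minimal Python sketch of the risk difference and risk ratio for the smoking example follows. The confidence interval uses the usual log-scale Normal approximation for a risk ratio; the slides do not say which method produced their interval, so treat this as one plausible approach. The function name `risk_measures` is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def risk_measures(a, n1, c, n0, level=0.95):
    """a of n1 exposed have disease; c of n0 unexposed have disease.

    Returns (risk difference, risk ratio, RR lower limit, RR upper limit).
    Assumption: log-scale Normal approximation for the RR interval.
    """
    risk1, risk0 = a / n1, c / n0
    rr = risk1 / risk0
    se_log_rr = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return risk1 - risk0, rr, lower, upper


# Smokers: 15 of 78 diseased; non-smokers (reference): 5 of 122 diseased
rd, rr, rr_lower, rr_upper = risk_measures(15, 78, 5, 122)
print(f"Risk difference = {rd:.0%}")
print(f"Risk ratio = {rr:.1f} (95% CI {rr_lower:.1f} to {rr_upper:.1f})")
```

The interval is not symmetric around the risk ratio because it is built on the log scale and then transformed back.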

RISK RATIO

Interpretation:
• How many times more likely the exposed are to have disease than the unexposed
• RR=4.7 means smokers are nearly 5 times as likely to have disease as non-smokers
• Tells us about the strength of association between exposure and disease

RISK RATIO

Interpretation (cont):
• Ratio measures are always greater than 0, ie can’t take a negative value
• RR=1 means no association
• RR>1 means exposure increases risk of disease
• RR<1 (between 0 and 1) means exposure is ‘protective’

RISK RATIO

• RR=3
  3 times more likely

If close to 1 or <1, easier to understand if use % more/less likely:
• RR=1.5
  1.5 – 1 = 0.5 ….. 50% more likely
• RR=0.4
  0.4 – 1 = -0.6 ….. 60% less likely
• RR=0.9
  0.9 – 1 = -0.1 ….. 10% less likely

RISK RATIO: 95% CI

RR=4.7
95% CI=1.8 to 12.4

What does this mean?
95% confident that smokers are between about 2 and 12 times more likely to have disease than non-smokers

Suppose 95% CI = 0.9 to 16.1
What would this mean?
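
The ‘RR – 1’ rule for wording a risk ratio can be sketched as a small Python helper. The function name `describe_rr` and the cut-off of 2 for switching to ‘times as likely’ are arbitrary choices for this illustration, not part of the session material.

```python
def describe_rr(rr):
    """Word a risk ratio as 'x% more/less likely' using the RR - 1 rule.

    Assumption: ratios of 2 or more are worded as 'times as likely';
    the cut-off is an arbitrary choice for this sketch.
    """
    if rr >= 2:
        return f"{rr:g} times as likely"
    percent = round((rr - 1) * 100)
    if percent > 0:
        return f"{percent}% more likely"
    if percent < 0:
        return f"{-percent}% less likely"
    return "no difference in risk"


for value in (3, 1.5, 0.4, 0.9, 1):
    print(f"RR={value}: {describe_rr(value)}")
```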

ODDS RATIO

Up until now dealt with risks

Risk = number diseased / total number

What is an odds?

Odds = number diseased / number disease-free

RISK VERSUS ODDS

                  Smoker
                  Yes    No     Total
Disease   Y        15      5     20
          N        63    117    180
          Tot      78    122    200

Smokers:
Risk of disease = 15/78 = 0.19
Odds of disease = 15/63 = 0.24

Non-smokers:
Risk = 5/122 = 0.041
Odds = 5/117 = 0.043
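
Risk and odds describe the same counts in two ways, and one can be converted to the other. A short Python sketch, using the smoker and non-smoker counts from the table (the function names are hypothetical):

```python
def odds_from_risk(risk):
    """Odds = risk / (1 - risk)."""
    return risk / (1 - risk)


def risk_from_odds(odds):
    """Risk = odds / (1 + odds)."""
    return odds / (1 + odds)


smoker_risk = 15 / 78        # diseased / total
nonsmoker_risk = 5 / 122

smoker_odds = odds_from_risk(smoker_risk)        # same as 15 / 63
nonsmoker_odds = odds_from_risk(nonsmoker_risk)  # same as 5 / 117

print(f"Smokers: risk {smoker_risk:.2f}, odds {smoker_odds:.2f}")
print(f"Non-smokers: risk {nonsmoker_risk:.3f}, odds {nonsmoker_odds:.3f}")
```

When the risk is small, 1 – risk is close to 1, so the odds and the risk are nearly equal, as for the non-smokers here.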

ODDS RATIO

Odds ratio = odds of disease in exposed / odds of disease in unexposed

OR = (15/63) / (5/117) = 0.238 / 0.043 = 5.57

                  Smoker
                  Yes    No     Total
Disease   Y        15      5     20
          N        63    117    180
          Tot      78    122    200

ODDS RATIO

Interpretation:
• Meaning less intuitive than risk ratio but if rare disease (<10% prevalence):
  o OR similar to risk ratio
  o Can interpret as if risk ratio
• If not rare disease
  o OR will exaggerate the risk ratio (lie further from 1)
• ORs used more than RRs because multivariate methods to control for confounding only deal with ORs

ODDS RATIO: 95% CI

Smoking & disease example:

OR=5.57, 95% CI=1.93 to 16.04
Overall prevalence=10%

Come to similar interpretation as with RR
• smokers about 5 times more likely to have disease
• truth between 2 and 16 times as likely

ODDS RATIO: EXAMPLES

Is exercise related to breast cancer?
OR=0.8 (0.6 to 1.1)

Is oral contraceptive use associated with obesity?
OR=2.1 (1.3 to 3.9)
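
For the smoking and disease example, the odds ratio and an approximate 95% CI can be sketched in Python as below. The interval uses the standard log-scale Normal approximation; the function name `odds_ratio_ci` is an assumption for this illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """a, b = diseased, disease-free among exposed; c, d = same among unexposed.

    Assumption: log-scale Normal approximation for the interval.
    """
    odds_ratio = (a / b) / (c / d)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Smokers: 15 diseased, 63 disease-free; non-smokers: 5 diseased, 117 disease-free
or_value, or_lower, or_upper = odds_ratio_ci(15, 63, 5, 117)
print(f"OR = {or_value:.2f} (95% CI {or_lower:.2f} to {or_upper:.2f})")
```

An interval that includes 1, as in the exercise and breast cancer example, is compatible with no association.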

EXERCISE

Data on 800 high risk women (family history of breast cancer)

                  Exposure
                  Smoker   Non-smoker   Total
Breast    Yes       19        21          40
cancer    No       181       579         760
          Total    200       600         800

1. What is overall prevalence of breast cancer?
2. Compute appropriate %s
3. What is risk of breast cancer in each smoking group?
4. Compute risk ratio, treating non-smokers as unexposed
5. Compute odds of disease in each group
6. Compute odds ratio and interpret
7. Are risk ratio and odds ratio similar? Explain why.
8. 95% CI for OR = 1.5 to 5.8. What does this mean?

USING SPSS

Same example: Does proportion with disease differ between smokers and non-smokers?

• Outcome: disease y/n
• Exposure: smoker y/n

STEP 1: COMPUTE PROPORTION WITH OUTCOME IN TWO GROUPS

Analyse, descriptive statistics, crosstabs

STEP 2: COMPUTE MEASURE OF EFFECT

Tick ‘Risk’ to get odds ratio

SPSS CROSSTAB OUTPUT

DISEASE * SMOKE Crosstabulation

                                  SMOKE
                                  yes       no        Total
DISEASE   yes   Count              15         5        20
                % within SMOKE    19.2%      4.1%     10.0%
          no    Count              63       117       180
                % within SMOKE    80.8%     95.9%     90.0%
Total           Count              78       122       200
                % within SMOKE   100.0%    100.0%    100.0%

Risk Estimate

                                                  95% Confidence Interval
                                         Value    Lower     Upper
Odds Ratio for DISEASE (1.00 / 2.00)     5.571    1.935     16.040
For cohort SMOKE = 1.00                  2.143    1.553     2.957
For cohort SMOKE = 2.00                   .385     .179      .828
N of Valid Cases                         200
COMPARING MORE THAN TWO GROUPS

           Age group
Disease    <20         20-29      30-39      40-49      50-59      60+     Total
Yes          5           7          6          7          8         10      43
No          17          28         36         41         46        140     308
Total       22          35         42         48         54        150     351

OR         4.12        3.50       2.33       2.39       2.43        1
95% CI     1.3, 13.5   1.2, 9.9   0.8, 6.8   0.9, 6.7   0.9, 6.5

Worked example, <20 versus 60+ (reference group):

         <20    60+
Y          5     10
N         17    140
Total     22    150

OR = odds of disease in <20s / odds of disease in 60+
   = (5/17) / (10/140) = 4.12
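
The same calculation can be repeated for every age group against the 60+ reference. A minimal Python sketch, with the counts taken from the age group table (the variable and function names are illustrative only):

```python
# (diseased, disease-free) counts by age group, from the age group table.
counts = {
    "<20": (5, 17),
    "20-29": (7, 28),
    "30-39": (6, 36),
    "40-49": (7, 41),
    "50-59": (8, 46),
    "60+": (10, 140),
}
reference = "60+"


def odds(diseased, disease_free):
    """Odds = number diseased / number disease-free."""
    return diseased / disease_free


reference_odds = odds(*counts[reference])
odds_ratios = {group: odds(*cell) / reference_odds
               for group, cell in counts.items()}

for group, value in odds_ratios.items():
    print(f"{group} vs {reference}: OR = {value:.2f}")
```

The reference group always has an odds ratio of exactly 1, because its odds are divided by themselves.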

NOTES ON COMPARING > 2 GROUPS

• Need to use a significance test to assess whether there is an overall association, rather than looking at individual CIs
• SPSS does not display ORs and 95% CIs for tables bigger than 2x2.
• Need to select out the 2 exposure categories of interest (data, select cases) and then run the cross-tab.

WHICH MEASURES TO USE?

• Ratio measures tell you about whether exposure is likely to be causal or not (aetiological strength)
• Risk difference tells you about implications for individuals
• Risk difference can also tell you about public health implications, but may need to consider frequency of exposure also (can compute population attributable risk, PAR; see appendix in notes)

WHICH MEASURES TO USE?

OC and hepatocellular adenoma
• RR=5, hence strong risk factor
• Risk difference = 0.005% – 0.001% = 0.004%, hence absolute increase in risk small
• Additional risk to individuals small and public health importance low as rare in exposed and unexposed

HRT and breast cancer
• RR=1.1, hence weak risk factor
• Risk difference = 22% – 20% = 2%, hence fairly large absolute risk increase
• Important implications to individuals on HRT (increased risk from 1 in 5 to 1 in 4.5) and potential public health importance as common disease (need to also consider how common exposure is)

3. Comparison of a categorical outcome

Research question: Does disease severity (mild, moderate, severe) differ between those on treatment and those on placebo?

Outcome = disease severity    Exposure = treatment arm

CROSS-TABULATE

                       Treatment arm
                       Active       Placebo      Total
Disease    Mild        25   63%     18   45%     43
severity   Moderate    10   25%     14   35%     24
           Severe       5   12%      8   20%     13
           Total       40  100%     40  100%     80

• Compute appropriate %’s and use to describe/compare distribution of outcome in each exposure group
• Can’t compute ‘measure of effect’ unless recode outcome to make binary, eg moderate/severe -> ‘diseased’… but lose info!

SUMMARY

Type of outcome variable         Summary statistics computed in each group       Measure of effect
Continuous Normal                Mean and standard deviation                     Difference in means
Continuous non-Normal            Median and range/interquartile range            Difference in medians
Binary variable                  Proportion (or %) with disease                  Risk difference, risk ratio or odds ratio
Categorical variable (>2 cats)   Proportion (or %) with each level of outcome    None

Remember: For difference measures 0=no association
          For ratio measures 1=no association
