
Outline

Statistics in practice
CMED6100 – Session 10

ST Ali

School of Public Health


The University of Hong Kong

20 November 2021

sli.do/#hkubiostat21

ST Ali CMED6100 – Session 10 Slide 2

Outline

Module aims

• To provide students with a foundation in biostatistics


– For those who require elementary skills in biostatistics to
complete their projects and dissertations.
– For those who require elementary skills in their workplace.
– For those who require elementary skills to understand and
interpret the medical literature.

ST Ali CMED6100 – Session 10 Slide 3

Outline

Module objectives
After completing this module, students will be able to:

1. Present data using appropriate tabular and graphical formats;

2. Define and calculate standard measures of location and dispersion of data;

3. Define probability and recognize common probability distributions


including the binomial and Normal distributions;

4. Calculate and interpret p-values for simple hypothesis tests;

5. Interpret parameter estimates and confidence intervals from linear


regression, logistic regression and proportional hazards regression models;

6. Perform power and sample size calculations for one- and two-group
studies.

ST Ali CMED6100 – Session 10 Slide 4


Descriptive tables Graphs Bump plots Results

Part I

Presenting information

ST Ali CMED6100 – Session 10 Slide 5

Descriptive tables Graphs Bump plots Results

Rounding

Rounding the column percentages to one decimal place makes the comparison more difficult.

Table: Proportion of lung cancer cases and healthy men with different smoking habits.
Smoking habits    Lung cancer cases    Healthy men
                  (n = 86)             (n = 86)
Heavy smoker      65.1%                36.0%
Light smoker      31.4%                47.7%
Non-smoker        3.5%                 16.3%

Rounding to the nearest percent is sufficient here.

Table: Proportion of lung cancer cases and healthy men with different smoking habits.
Smoking habits    Lung cancer cases    Healthy men
                  (n = 86)             (n = 86)
Heavy smoker      65%                  36%
Light smoker      31%                  48%
Non-smoker        3%                   16%

Percentages have been rounded so sums may not total.

Note: if we wanted to round these numbers to 2 significant figures, we would change the “3%” to “3.5%”.
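The two rounding rules can be compared with a short sketch. The helper name `round_sig` is hypothetical, and the count of 3 non-smokers among the 86 cases is an assumption inferred from the 3.5% in the table.

```python
import math


def round_sig(x, sig=2):
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0:
        return 0.0
    # Number of decimal places needed to keep `sig` significant figures
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)


# Assumed count: 3 non-smokers among 86 lung cancer cases
pct = 100 * 3 / 86

nearest_percent = round(pct)       # rounding to the nearest percent
two_sig_figs = round_sig(pct, 2)   # rounding to 2 significant figures

print(nearest_percent, two_sig_figs)
```

For the larger percentages in the table the two rules agree, since 65.1% is already 65% to 2 significant figures; they differ only for values below 10%.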
ST Ali CMED6100 – Session 10 Slide 6

Descriptive tables Graphs Bump plots Results

Include a succinct summary


Table: Proportion of lung cancer cases and healthy men with different smoking habits.

Smoking habits Lung cancer cases Healthy men


(n = 86) (n = 86)
Heavy smoker 65% 36%
Light smoker 31% 48%
Non-smoker 3% 16%

Percentages have been rounded so sums may not total.

Example description in text:


The smoking habits of 86 men with lung cancer and 86 healthy men were
compared (Table). Many more of the lung cancer cases than of the healthy men
were smokers. The proportions of heavy smokers among the lung cancer cases
and healthy men were 65% and 36%, respectively.

ST Ali CMED6100 – Session 10 Slide 7


Descriptive tables Graphs Bump plots Results

Formatting tables

Table: Tuberculosis notification rates in Hong Kong, 2012-2018, by age


group.
Age groups Notification rate, per 100,000
2012 2013 2014 2015 2016 2017 2018
65-69 124.39 106.03 99.23 98.24 110.18 107.2 96.32
70-74 163.73 136.9 156.75 134.05 132.7 132.96 123.97
75-79 190.91 187.17 189.5 155.79 163.28 154.47 155.45
80-84 296.49 221.03 244.7 209.36 208.98 185.53 193.55
85 or over 321.69 303.6 311.11 278.8 245.24 245.22 246.53

Source: Notifiable Infectious Diseases, Centre for Health Protection

How could you improve this table?

ST Ali CMED6100 – Session 10 Slide 8

Descriptive tables Graphs Bump plots Results

Improved?

Table: Tuberculosis notification rates in Hong Kong, 2012-2018, by age


group.

Age groups 2012 2013 2014 2015 2016 2017 2018


85 or over 322 304 311 279 245 245 247
80-84 296 221 245 209 209 186 194
75-79 191 187 190 156 163 154 155
70-74 164 137 157 134 133 133 124
65-69 124 106 99 98 110 107 96

We have rounded, reordered the rows, and removed the gridlines. Patterns in
the data are clearer.
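The rounding and reordering steps can be sketched in a few lines. The rates are copied from the original table; the variable names and the choice to sort by the 2012 rate are illustrative assumptions.

```python
# Notification rates per 100,000, copied from the original table
rates = {
    "65-69": [124.39, 106.03, 99.23, 98.24, 110.18, 107.2, 96.32],
    "70-74": [163.73, 136.9, 156.75, 134.05, 132.7, 132.96, 123.97],
    "75-79": [190.91, 187.17, 189.5, 155.79, 163.28, 154.47, 155.45],
    "80-84": [296.49, 221.03, 244.7, 209.36, 208.98, 185.53, 193.55],
    "85 or over": [321.69, 303.6, 311.11, 278.8, 245.24, 245.22, 246.53],
}
years = list(range(2012, 2019))

# Reorder rows so the highest rates (by 2012 value) come first
ordered = sorted(rates.items(), key=lambda item: item[1][0], reverse=True)

# Round to whole numbers and right-align every column
lines = [f"{'Age groups':<12}" + "".join(f"{y:>6}" for y in years)]
for group, values in ordered:
    lines.append(f"{group:<12}" + "".join(f"{round(v):>6}" for v in values))

table = "\n".join(lines)
print(table)
```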

ST Ali CMED6100 – Session 10 Slide 9

Descriptive tables Graphs Bump plots Results

Successful graphs

• As a general principle of constructing graphs, we will try to


show maximum information with minimum ink.
• Successful graphs will communicate with ease, and
– Show trends and relationships.
– Not deceive the reader.
– Improve on the data being shown in a table or in text.

ST Ali CMED6100 – Session 10 Slide 10


Descriptive tables Graphs Bump plots Results

TB example by Microsoft Excel


Figure: TB notification rates per 100,000 population by age groups in Hong
Kong, 2018.
[Bar chart: age group (65 - 69 to 85 & over) on the x-axis, notification rate (0 to 300) on the y-axis.]

We can make a number of improvements to this plot...


ST Ali CMED6100 – Session 10 Slide 11

Descriptive tables Graphs Bump plots Results

Bar graph: improved TB example


Figure: TB notification rates per 100,000 population by age groups in Hong
Kong, 2018.
[Horizontal bar chart: age group (85 or over at the top, 65-69 at the bottom) on the y-axis, notification rate (0 to 300) on the x-axis.]

Source: Notifiable Infectious Diseases, Centre for Health Protection


ST Ali CMED6100 – Session 10 Slide 12

Descriptive tables Graphs Bump plots Results

Line graph: TB notifications

Figure: TB notification rate per 100,000 population in Hong Kong by age
group, 2012-2018.
[Line graph: notification rate (0 to 350) by year (2012 to 2018), one line per age group. Legend: 85 or over, 80-84, 75-79, 70-74, 65-69.]

ST Ali CMED6100 – Session 10 Slide 13


Descriptive tables Graphs Bump plots Results

Line graph: TB notifications

Figure: At least we should reorder the legend
[Line graph: notification rate (0 to 350) by year (2012 to 2018), one line per age group. Legend: 65-69, 70-74, 75-79, 80-84, 85 or over.]

ST Ali CMED6100 – Session 10 Slide 14

Descriptive tables Graphs Bump plots Results

Line graph: TB notifications

Figure: Or, even better, label the lines directly
[Line graph: notification rate (50 to 350) by year (2012 to 2018), with each line labelled directly: 85 or over, 80-84, 75-79, 70-74, 65-69.]

ST Ali CMED6100 – Session 10 Slide 15

Descriptive tables Graphs Bump plots Results

Time to influenza infection


Figure: Time to laboratory confirmation of influenza infection in
household contacts of a laboratory-confirmed index case, following
application of a non-pharmaceutical intervention.
[Survival curves: proportion not infected (0% to 100%) by day since intervention (0 to 8), for the hand hygiene, mask+HH and control arms.]

ST Ali CMED6100 – Session 10 Slide 16


Descriptive tables Graphs Bump plots Results

Time to influenza infection


Figure: It can be misleading to start the y-axis away from 0 in survival
analysis, as it can overstate differences between curves (people are used
to seeing Kaplan-Meier plots on a 0-1 scale).
[Survival curves: proportion not infected (90% to 100%) by day since intervention (0 to 8), for the hand hygiene, mask+HH and control arms.]

ST Ali CMED6100 – Session 10 Slide 17

Descriptive tables Graphs Bump plots Results

Time to influenza infection


Figure: Good to add a gap between the y-axis and x-axis to highlight that
the y-axis does not go down to zero.
[Survival curves: proportion not infected (90% to 100%) by day since intervention (0 to 8), for the hand hygiene, mask+HH and control arms, with a gap between the axes.]

ST Ali CMED6100 – Session 10 Slide 18

Descriptive tables Graphs Bump plots Results

Time to influenza infection

Figure: Or it can be even better to reverse the y-axis in these situations.
[Cumulative incidence curves: proportion infected (0% to 10%) by day since intervention (0 to 8), for the control, mask+HH and hand hygiene arms.]

ST Ali CMED6100 – Session 10 Slide 19


Descriptive tables Graphs Bump plots Results

Health expenditures vs life expectancy

Life expectancy by health expenditures per capita, 2007
Health expenditures are total (public and private), in PPP-converted US dollars. Data source: OECD.

Life expectancy by health expenditures per capita, 1970-2008
The data points are years. The other countries are Australia, Austria, Belgium, Canada, Denmark,
Finland, France, Germany, Ireland, Italy, Japan, the Netherlands, New Zealand, Norway, Portugal,
Spain, Sweden, Switzerland, and the United Kingdom. Data source: OECD.
ST Ali CMED6100 – Session 10 Slide 20

Descriptive tables Graphs Bump plots Results

Bump plots
Bump plots are similar to line graphs:
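Figure: Tuberculosis notifications per 100,000 population in Hong Kong
by age group, 2012-2018.
[Bump plot: one line per age group joining its 2012 value (left) to its 2018 value (right), labelled directly with the values below.]

Age group     2012    2018
85 or over     322     247
80-84          296     194
75-79          191     155
70-74          164     124
65-69          124      96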

ST Ali CMED6100 – Session 10 Slide 21

Descriptive tables Graphs Bump plots Results

Characteristics of bump plots

• Ranking is clear from direct labeling on at least one side of


the chart.

• No scale is needed: the scale can be read from the labels of exact
values on one or both sides of the chart.

• Very useful for comparing changes in ranking over time.


• Also very useful to show differences in various measures
between two or more groups.
– e.g. could plot men on left and women on right, and display
TB incidence by gender.
ST Ali CMED6100 – Session 10 Slide 22
Descriptive tables Graphs Bump plots Results

An example

These numbers are from an article in the Lancet. How are they ordered?

Would you choose a different order?

Note: the relative survival rate is the survival rate relative to the anticipated
mortality in the general population – thus it is a measure of excess mortality.
See the original article for a more detailed explanation of relative survival.

[Slide image: page 1133 of the article, showing Table 3 (most recent cohort estimates) and Table 4 (most recent period estimates) of relative survival rates, by cancer site.]

Table 4: Most recent period estimates of relative survival rates, by cancer site.

                                         Relative survival rate, % (SE)
Cancer site                              5 years      10 years     15 years     20 years
Oral cavity and pharynx                  56·7 (1·3)   44·2 (1·4)   37·5 (1·6)   33·0 (1·8)
Oesophagus                               14·2 (1·4)    7·9 (1·3)    7·7 (1·6)    5·4 (2·0)
Stomach                                  23·8 (1·3)   19·4 (1·4)   19·0 (1·7)   14·9 (1·9)
Colon                                    61·7 (0·8)   55·4 (1·0)   53·9 (1·2)   52·3 (1·6)
Rectum                                   62·6 (1·2)   55·2 (1·4)   51·8 (1·8)   49·2 (2·3)
Liver and intrahepatic bile duct          7·5 (1·1)    5·8 (1·2)    6·3 (1·5)    7·6 (2·0)
Pancreas                                  4·0 (0·5)    3·0 (0·5)    2·7 (0·6)    2·7 (0·8)
Larynx                                   68·8 (2·1)   56·7 (2·5)   45·8 (2·8)   37·8 (3·1)
Lung and bronchus                        15·0 (0·4)   10·6 (0·4)    8·1 (0·4)    6·5 (0·4)
Melanomas                                89·0 (0·8)   86·7 (1·1)   83·5 (1·5)   82·8 (1·9)
Breast                                   86·4 (0·4)   78·3 (0·6)   71·3 (0·7)   65·0 (1·0)
Cervix uteri                             70·5 (1·6)   64·1 (1·8)   62·8 (2·1)   60·0 (2·4)
Corpus uteri and uterus, NOS             84·3 (1·0)   83·2 (1·3)   80·8 (1·7)   79·2 (2·0)
Ovary                                    55·0 (1·3)   49·3 (1·6)   49·9 (1·9)   49·6 (2·4)
Prostate                                 98·8 (0·4)   95·2 (0·9)   87·1 (1·7)   81·1 (3·0)
Testis                                   94·7 (1·1)   94·0 (1·3)   91·1 (1·8)   88·2 (2·3)
Urinary bladder                          82·1 (1·0)   76·2 (1·4)   70·3 (1·9)   67·9 (2·4)
Kidney and renal pelvis                  61·8 (1·3)   54·4 (1·6)   49·8 (2·0)   47·3 (2·6)
Brain and other nervous system           32·0 (1·4)   29·2 (1·5)   27·6 (1·6)   26·1 (1·9)
Thyroid                                  96·0 (0·8)   95·8 (1·2)   94·0 (1·6)   95·4 (2·1)
Hodgkin’s disease                        85·1 (1·7)   79·8 (2·0)   73·8 (2·4)   67·1 (2·8)
Non-Hodgkin lymphomas                    57·8 (1·0)   46·3 (1·2)   38·3 (1·4)   34·3 (1·7)
Multiple myeloma                         29·5 (1·6)   12·7 (1·5)    7·0 (1·3)    4·8 (1·5)
Leukaemias                               42·5 (1·2)   32·4 (1·3)   29·7 (1·5)   26·2 (1·7)

Rates derived from SEER 1973–98 database (both sexes, all ethnic groups).
NOS=not otherwise specified.

Source: Brenner, H. Long-term survival rates of cancer patients achieved by the end of
the 20th century: a period analysis. Lancet, 2002; 360:1131-1135.

ST Ali CMED6100 – Session 10 Slide 23

Descriptive tables Graphs Bump plots Results

An example – cancer survival

Perhaps it would be clearer to re-order the table by 5-year survival
rates (see right).

Alternatively we would order by 20-year survival rates.

These data can be graphically presented in a ‘bump plot’ ...

ST Ali CMED6100 – Session 10 Slide 24

Descriptive tables Graphs Bump plots Results

Figure: The 5-, 10-, 15-, and 20-year relative survival rates for various cancers.

ST Ali CMED6100 – Session 10 Slide 25


Descriptive tables Graphs Bump plots Results

Presenting your statistical analyses

• Remember to be consistent when rounding numbers

• Be consistent with column alignment (should usually use


right-alignment)

• Including sample size can aid interpretation

• Clarify abbreviations and methods in footnotes

ST Ali CMED6100 – Session 10 Slide 26

Descriptive tables Graphs Bump plots Results

Influenza study results

Table: Influenza secondary infection risks in households.

Risk of infection (95% CI)∗ p-value†


Control (n=279) Hand hygiene (n=257) Mask+HH (n=258)
Lab-confirmed influenza 0.10 (0.06, 0.14) 0.05 (0.03, 0.09) 0.07 (0.04, 0.11) 0.22
Clinical influenza(1) 0.19 (0.14, 0.24) 0.16 (0.12, 0.21) 0.21 (0.16, 0.27) 0.40
Clinical influenza(2) 0.05 (0.02, 0.08) 0.04 (0.02, 0.06) 0.07 (0.04, 0.11) 0.28

∗ 95% confidence intervals.

† By Pearson chi-square test adjusted for within-household correlation.

(1) is at least 2 of fever ≥37.8°C, cough, headache, sore throat, aches or pains in muscles or joints.

(2) is fever ≥37.8°C plus cough or sore throat.

ST Ali CMED6100 – Session 10 Slide 27

Descriptive tables Graphs Bump plots Results

Factors affecting influenza virus transmission

Characteristic n Adjusted OR∗ 95% CI for OR


Control arm 279 1.00
Hand hygiene arm 259 0.57 (0.26, 1.22)
Mask+HH arm 258 0.77 (0.38, 1.55)

Child (aged ≤5y) 44 1.91 (0.69, 5.30)


Child (aged 6 − 15y) 88 2.87 (1.42, 5.78)
Adult (aged 16+y) 662 1.00

Not vaccinated 688 1.00


Vaccinated in past 1 year 106 0.33 (0.12, 0.91)

Adjusted odds ratios of lab-confirmed infection estimated under a
multivariable logistic regression model adjusting for sex of household
contact, age, sex and antiviral use of corresponding index case, and
allowing for within-household clustering.

ST Ali CMED6100 – Session 10 Slide 28


Probability distributions Inference Comparing groups Choice of statistical methods

Part II

From probability to inferential statistics

ST Ali CMED6100 – Session 10 Slide 29

Probability distributions Inference Comparing groups Choice of statistical methods

The normal distribution


[Figure: density curve of the Normal (20,1) distribution, with values from 17 to 23 on the x-axis and density from 0.0 to 0.4 on the y-axis.]
ST Ali CMED6100 – Session 10 Slide 30

Probability distributions Inference Comparing groups Choice of statistical methods

The normal distribution


If we sampled from a N(20,1) distribution, the distribution of a
sample of size 64 might look like this ...

[Histogram: frequency of Y, with Y from 17 to 24 on the x-axis.]
ST Ali CMED6100 – Session 10 Slide 31

Probability distributions Inference Comparing groups Choice of statistical methods

Means of repeated samples

• If we have a single sample of size 64 from a Normal (µ,1)


distribution, the best estimate of the mean µ is the sample
mean x̄.

• According to the central limit theorem, under repeated
sampling the sample means will follow a normal distribution
with mean µ and standard error of the mean σ/√n, i.e., 1/8
in our example since σ = 1 and n = 64.
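The claim can be checked by simulation. This is a minimal sketch: the value µ = 20 is assumed from the earlier N(20,1) example, and the seed and the 2,000 repetitions are arbitrary choices.

```python
import math
import random
import statistics

mu, sigma, n = 20.0, 1.0, 64   # mu = 20 is assumed from the N(20,1) example

# Theoretical standard error of the mean: sigma / sqrt(n) = 1/8
sem = sigma / math.sqrt(n)

# Draw many samples of size n and record each sample mean
rng = random.Random(1)
means = [
    statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(2000)
]

# The spread of the sample means should be close to the standard error
sd_of_means = statistics.stdev(means)
print(sem, round(sd_of_means, 3))
```

The standard deviation of the individual observations stays near 1 whatever the sample size; only the spread of the sample means shrinks as n grows.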

ST Ali CMED6100 – Session 10 Slide 41

Probability distributions Inference Comparing groups Choice of statistical methods

Standard error versus standard deviation


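Figure: Distribution of sample means vs original distribution. The
standard deviation refers to the spread of data. The standard error refers
to the variability of the mean under repeated sampling.
[Two density curves centred at µ: a wide curve for the original distribution (spread marked s.d.) and a narrow curve for the sample means (spread marked s.e.).]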
ST Ali CMED6100 – Session 10 Slide 42
Probability distributions Inference Comparing groups Choice of statistical methods

Standard error versus standard deviation

Standard deviation                                    Standard error

Describes variability in data                         Describes variability in sample mean

Not affected by sample size                           Decreases with increasing sample size

Most observations will fall within ±2SD of the mean   Mean ±2SE gives a 95% confidence interval for the mean

ST Ali CMED6100 – Session 10 Slide 43

Probability distributions Inference Comparing groups Choice of statistical methods

CI for the sample mean


• Recall the central limit theorem: If X follows a distribution with mean µ
and standard deviation σ, and we take a random sample of size n,
provided that n is sufficiently large the sample mean X̄ will follow a
normal distribution, X̄ ∼ Normal(µ, σ²/n).
• Hence drawing repeated random samples of size n from the population,
we could say 95% of X̄s fall within µ ± 1.96σ/√n.
ST Ali CMED6100 – Session 10 Slide 44

Probability distributions Inference Comparing groups Choice of statistical methods

Possible case for X̄

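Figure: In other random samples X̄ could be sampled here.
[Sampling distribution of X̄ centred at µ, with 2.5% of the area in each tail beyond µ − 1.96σ/√n and µ + 1.96σ/√n. An observed X̄ is marked together with its interval from X̄ − 1.96σ/√n to X̄ + 1.96σ/√n.]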

ST Ali CMED6100 – Session 10 Slide 45


Probability distributions Inference Comparing groups Choice of statistical methods

Possible case for X̄

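Figure: In 5% of samples X̄ will be in the tails of the distribution and
then the 95% CI will not include µ.
[Sampling distribution of X̄ centred at µ, with 2.5% of the area in each tail beyond µ − 1.96σ/√n and µ + 1.96σ/√n. An observed X̄ is marked together with its interval from X̄ − 1.96σ/√n to X̄ + 1.96σ/√n.]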
ST Ali CMED6100 – Session 10 Slide 46

Probability distributions Inference Comparing groups Choice of statistical methods

Definition of a confidence interval


• Under repeated sampling, we can say that P% of P%
confidence intervals will contain the true population value.
• For example, 95% of all 95% confidence intervals will cover the
true population value.
• A single CI may or may not cover the true value.
• We can say that we have 95% confidence that a single 95% CI
will cover the true value, but this is simply a short version of
the definition above.
• Strictly speaking, we cannot say that there is a 95% chance
that a single 95% CI will cover the true value.
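The repeated-sampling definition can be illustrated by simulation. This is a sketch under assumed values: a Normal(20, 1) population with known σ, samples of size 64, and 5,000 repetitions with an arbitrary seed.

```python
import math
import random
import statistics

mu, sigma, n = 20.0, 1.0, 64   # assumed population and sample size
half_width = 1.96 * sigma / math.sqrt(n)

rng = random.Random(2021)
reps = 5000
covered = 0
for _ in range(reps):
    # One sample, one sample mean, one 95% confidence interval
    xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1

# Proportion of the intervals that contain the true mean
coverage = covered / reps
print(coverage)
```

Each individual interval either contains µ or it does not; the 95% describes the long-run proportion across intervals.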
ST Ali CMED6100 – Session 10 Slide 47

Probability distributions Inference Comparing groups Choice of statistical methods

Comparing groups

[Figure: two groups with sample means x̄1 and x̄2.]

Null hypothesis – assume both groups are samples from the same
distribution. What is the chance of getting a difference x̄1 − x̄2 as
unusual or more unusual than the difference observed?

ST Ali CMED6100 – Session 10 Slide 48


Probability distributions Inference Comparing groups Choice of statistical methods

Comparing groups

Under the null hypothesis, x̄1 − x̄2 will have a Normal distribution
with mean 0 and variance σ1²/n1 + σ2²/n2.
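A minimal sketch of the resulting test, assuming the standard deviations are known. The function name `two_sample_z` and all the numbers passed to it are hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """Standardised difference and two-sided p-value under the null."""
    # Standard error of the difference: sqrt(sd1^2/n1 + sd2^2/n2)
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean1 - mean2) / se
    # Probability of a difference at least this extreme in either direction
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical data: two groups of 64 with means 21.0 and 20.5, both sd = 1
z, p = two_sample_z(21.0, 20.5, 1.0, 1.0, 64, 64)
print(round(z, 2), round(p, 4))
```

When the standard deviations are estimated from small samples, a t-test would normally be used instead of the Normal distribution.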

ST Ali CMED6100 – Session 10 Slide 49

Probability distributions Inference Comparing groups Choice of statistical methods

UN Survey – Results from session 4


Figure: Responses of 61 students given X = 65 and 58 given X = 10.
[Two histograms, one for X = 65 and one for X = 10: frequency (0 to 25) by % of member states in Africa (0 to 80).]


ST Ali CMED6100 – Session 10 Slide 50

Probability distributions Inference Comparing groups Choice of statistical methods

Observed difference versus sampling distribution


Figure: An observed standardised difference of 5 is at the extremes of the
sampling distribution under the null hypothesis.
[Density (0.00 to 0.20) of X1 − X2 under the null hypothesis, x-axis from −6 to 6, with the observed difference marked by a point.]
ST Ali CMED6100 – Session 10 Slide 51
Probability distributions Inference Comparing groups Choice of statistical methods

Plausibility of results under the null hypothesis


Figure: If the null hypothesis were true, i.e. no difference between means, it
would be very unusual to observe such a large difference (whether less than −5
or greater than 5). We would only observe such a large difference in 1% of
repeated experiments.
[Density (0.00 to 0.20) of X1 − X2 under the null hypothesis, x-axis from −6 to 6, with the observed difference marked by a point.]
ST Ali CMED6100 – Session 10 Slide 52

Probability distributions Inference Comparing groups Choice of statistical methods

How do we interpret this?


• If we repeated this experiment many times, and if the null hypothesis
were true, we would only see differences greater than 5 (or less than −5)
in 1% of those experiments.

• The value of 0.01 or 1% is often referred to as a p-value

• Notice that the p-value is a conditional probability – it is conditional on


the null hypothesis being true.

• Small p-values, indicating that observed differences are unlikely under the
null hypothesis, are usually taken as evidence against the null hypothesis

• A common threshold is p < 0.05; in that case p-values less than 0.05 are
called ‘statistically significant’.
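The link between a standardised difference and a two-sided p-value can be sketched with the standard Normal distribution. The helper name `two_sided_p` is hypothetical, and the sketch assumes the difference has already been divided by its standard error.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def two_sided_p(z):
    """Two-sided p-value for a standardised difference z."""
    return 2 * (1 - std_normal.cdf(abs(z)))


# A standardised difference of 1.96 sits at the usual 0.05 threshold
p_196 = two_sided_p(1.96)

# Standardised difference needed for a two-sided p-value of 0.01
z_for_1pct = std_normal.inv_cdf(1 - 0.01 / 2)
p_check = two_sided_p(z_for_1pct)

print(round(p_196, 3), round(z_for_1pct, 2), round(p_check, 3))
```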

ST Ali CMED6100 – Session 10 Slide 53

Probability distributions Inference Comparing groups Choice of statistical methods

Common misunderstandings about p-values

1. The p-value is not the probability that the null hypothesis is


true.

• The p-value is p(such unusual data | null hypothesis is true),


not p(null hypothesis is true | such unusual data).

• We cannot derive the second probability without some


assumption about p(null hypothesis is true) and
p(such unusual data)
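Bayes' theorem shows why those assumptions matter. In this sketch every number is hypothetical: the probability of such data under the null is set to 0.01, the probability under the alternative to 0.5, and two different prior probabilities for the null are tried.

```python
def prob_null_given_data(p_data_given_null, p_data_given_alt, prior_null):
    """p(null | data) by Bayes' theorem, for a null and a single alternative."""
    numerator = p_data_given_null * prior_null
    # p(data) = p(data | null) p(null) + p(data | alternative) p(alternative)
    p_data = numerator + p_data_given_alt * (1 - prior_null)
    return numerator / p_data


# Same data, different prior beliefs about the null hypothesis
posterior_even_prior = prob_null_given_data(0.01, 0.5, prior_null=0.5)
posterior_sceptical = prob_null_given_data(0.01, 0.5, prior_null=0.99)

print(round(posterior_even_prior, 3), round(posterior_sceptical, 3))
```

With identical data the probability that the null hypothesis is true ranges from about 2% to about 66% depending on the prior, so it cannot be read off the p-value alone.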

ST Ali CMED6100 – Session 10 Slide 54


Probability distributions Inference Comparing groups Choice of statistical methods

Common misunderstandings about p-values


2. The p-value is not the probability that a finding is “merely due
to chance”.

• As the calculation of a p-value is conditional on the


assumption that a finding is the product of chance alone, it
cannot simultaneously be used to gauge the probability of
that assumption being true.

• The p-value is the probability of observing a finding at least this
extreme if the null hypothesis is true, i.e. if the finding is “merely
due to chance”.

• Even low probability events happen sometimes.


ST Ali CMED6100 – Session 10 Slide 55

Probability distributions Inference Comparing groups Choice of statistical methods

Common misunderstandings about p-values

3. The p-value does not indicate the size or importance of the


observed effect (compare with effect size).

• In a large sample, the standard errors will be small, and


therefore even small differences may be associated with small
(and therefore highly ‘significant’) p-values.

ST Ali CMED6100 – Session 10 Slide 56

Probability distributions Inference Comparing groups Choice of statistical methods

Statistical versus practical significance

• In large samples, even small differences may lead to p-values


less than 0.05

• Sometimes such small ‘statistically significant’ differences may


not have practical or clinical importance.

• Sometimes in smaller samples we will not be able to identify


statistically significant effects even if they would be practically
significant.
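The effect of sample size on the p-value can be sketched with a two-group comparison of means. The difference of 0.1, the standard deviation of 1 and the group sizes are hypothetical, and the standard deviation is assumed known.

```python
import math
from statistics import NormalDist


def p_value_for_difference(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal groups."""
    # Standard error of the difference shrinks as the groups get larger
    se = sd * math.sqrt(2 / n_per_group)
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same small difference, observed in a small and in a large study
p_small_study = p_value_for_difference(0.1, 1.0, n_per_group=50)
p_large_study = p_value_for_difference(0.1, 1.0, n_per_group=5000)

print(round(p_small_study, 3), p_large_study)
```

The observed effect is identical in both studies; only the precision differs, which is why an effect size and confidence interval should be reported alongside the p-value.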

ST Ali CMED6100 – Session 10 Slide 57


Probability distributions Inference Comparing groups Choice of statistical methods

Common misunderstandings about p-values

4. A p-value of 1.00 does not mean the null hypothesis is true

• A p-value of 1.00 indicates that the observed data were


completely consistent with no effect (for example the primary
outcome occurred at exactly the same rate in two groups)

• Study of 10 people, Group A: 2/5 vs Group B: 2/5 experience


the event of interest – p-value for difference = 1.00

• Study of 1000 people, Group A: 200/500 vs Group B: 200/500


experience the event of interest – p-value for difference = 1.00
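Confidence intervals make the contrast between the two studies visible. This sketch uses the simple Wald interval for a difference in proportions, which is only a rough approximation with 5 people per group; the function name `diff_ci` is hypothetical.

```python
import math


def diff_ci(x1, n1, x2, n2, z=1.96):
    """Approximate 95% Wald CI for the difference in two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Both studies observe no difference, so both have p = 1.00
small_study = diff_ci(2, 5, 2, 5)
large_study = diff_ci(200, 500, 200, 500)

print([round(v, 2) for v in small_study])
print([round(v, 2) for v in large_study])
```

The small study is compatible with differences of roughly 60 percentage points in either direction, while the large study rules out differences much beyond 6 percentage points.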

ST Ali CMED6100 – Session 10 Slide 58

Probability distributions Inference Comparing groups Choice of statistical methods

ST Ali CMED6100 – Session 10 Slide 59

Probability distributions Inference Comparing groups Choice of statistical methods

Choose a hypothesis test

Explanatory variable       Outcome variable or response variable
or predictor variable      2 categories           3+ categories
2 categories               Chi-squared test (or Fisher’s exact test, McNemar’s test)
3+ categories              Chi-squared test (or Fisher’s exact test, McNemar’s test)
Ordinal                    Logistic regression    – (a)
Continuous                 Logistic regression    – (a)

(a) Methods for these kinds of data are outside the scope of this course.

ST Ali CMED6100 – Session 10 Slide 60


Probability distributions Inference Comparing groups Choice of statistical methods

Choose a hypothesis test (cont)

Explanatory variable       Outcome variable or response variable (b)
or predictor variable      Ordinal (e.g. Likert scale)    Continuous
2 categories               t-test, Mann-Whitney U         t-test, Mann-Whitney U
3+ categories              ANOVA                          ANOVA
Ordinal                    ANOVA                          ANOVA
Continuous                 Regression/correlation         Regression/correlation

(b) Methods for testing a hypothesis about a single variable or paired difference
include the 1-sample t-test, paired t-test, and the Wilcoxon signed rank test.

ST Ali CMED6100 – Session 10 Slide 61

Probability distributions Inference Comparing groups Choice of statistical methods

Multivariable regression models

Outcome variable Regression model


2 categories Logistic
Continuous Linear
Time-to-event data Proportional hazards
Count data∗ Poisson
Repeated measure∗ Hierarchical model


∗ Methods for these kinds of data are outside the scope of this course.

ST Ali CMED6100 – Session 10 Slide 62

Probability distributions Inference Comparing groups Choice of statistical methods

Goodness of fit of a regression model

• In linear regression the value R² quantifies goodness of fit as the
proportion of variation in the outcome variable explained by the
predictors.

• Sometimes R² can be ‘adjusted’ to take into account the number of
predictors – including more predictors can only increase R², never
decrease it, but it is often preferable to have a parsimonious model.

• In logistic regression there are a few other measures of goodness of fit
including the AIC, the area under the ROC curve (sometimes called the
c-statistic), and the Hosmer-Lemeshow statistic.
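The usual adjustment to R² can be written in one line. The R² values, the sample size of 100 and the numbers of predictors below are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical models fitted to the same 100 observations
simple_model = adjusted_r2(0.30, n=100, k=2)    # 2 predictors
complex_model = adjusted_r2(0.31, n=100, k=10)  # 10 predictors

print(round(simple_model, 3), round(complex_model, 3))
```

In this example the extra 8 predictors raise R² from 0.30 to 0.31 but lower the adjusted value, which favours the more parsimonious model.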

ST Ali CMED6100 – Session 10 Slide 63


Probability distributions Inference Comparing groups Choice of statistical methods

Choosing which predictors to include in a model


• There is no consensus on how to select predictors for inclusion in a

regression model. Alternatives include


– Forced entry – select which predictors you would like to include
regardless of whether they are important in the model
– Forward selection – start from a simple model and then add
predictors one by one as long as they are important (maybe with
p < 0.2?), not including unimportant ones.
– Backward selection - start from a model with many predictors and
remove the ones that are least important (maybe with p > 0.2?)
until only important ones are left.
– Stepwise selection – alternate forward and backward selection.
• Forced entry or backward selection are recommended.
ST Ali CMED6100 – Session 10 Slide 64

Errors in assessment Misleading presentation Infographics Dishonest presentation

Part III

Misleading use of statistics

ST Ali CMED6100 – Session 10 Slide 65

Errors in assessment Misleading presentation Infographics Dishonest presentation

Does getting fit reduce mortality?

• Seminal study by Blair et al.∗

• 10,000 men enrolled.

• Fitness measured by “treadmill test duration” at baseline and
at a 2nd examination about 5 years later.

• Subsequent follow-up for 5 years.

• “... patients will reduce risk of mortality by increasing physical
activity and improving fitness.”

∗ Blair SN et al. Changes in physical fitness and all-cause mortality: a prospective study of healthy and unhealthy
men. JAMA, 1995; 273(14): 1093-8.

ST Ali CMED6100 – Session 10 Slide 66


Errors in assessment Misleading presentation Infographics Dishonest presentation

Figure: All-cause death rates per 10,000 man-years (log scale) in 9,777
men by age groups and change or lack of change in physical
fitness. Death rates are shown atop the bars and the numbers of
deaths within the bars. Source: Blair et al., JAMA, 1995
ST Ali CMED6100 – Session 10 Slide 67

Errors in assessment Misleading presentation Infographics Dishonest presentation

Revised plot of Blair’s findings


[Figure: line plot of all-cause death rates per 10,000 man-years (vertical
axis, 0 to 1000) against age group (horizontal axis, 20 to 70), with separate
lines for Unfit -> Unfit, Unfit -> Fit and Fit -> Fit.]

All-cause death rates per 10,000 man-years in 9,777 men by age
groups and change or lack of change in physical fitness.
ST Ali CMED6100 – Session 10 Slide 68

Errors in assessment Misleading presentation Infographics Dishonest presentation

Criticisms by Williams∗
• Blair only took one baseline measurement of fitness (and one
measurement at follow-up).
• What if a particular patient was feeling more energetic than
usual, on the day of his test?

[Figure: a patient's ‘true’ level and observed level marked on an axis of
baseline fitness, measured via treadmill test duration (minutes).]
∗ Williams PT. The illusion of improved physical fitness and reduced mortality.
Medicine & Science in Sports & Exercise. 2003; 35(5): 736-40.
ST Ali CMED6100 – Session 10 Slide 69
Errors in assessment Misleading presentation Infographics Dishonest presentation

Fitness not accurately evaluated

• The treadmill duration might not accurately represent the
level of fitness of a given patient.

• We could have wrongly assessed baseline and follow-up fitness
levels.

• How might this affect the conclusions of Blair et al.?

ST Ali CMED6100 – Session 10 Slide 70

Errors in assessment Misleading presentation Infographics Dishonest presentation

Scatterplot of 10,000 simulated men. Nobody’s fitness changed
during the study.
‘Observed fitness’ was measured with error, in both assessments.
Bottom 20% are “unfit”.
Note that the apparent changers are in between those classified
as consistently fit or unfit.
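A simulation of this kind can be sketched in a few lines. The version below is an illustration rather than the code behind the slide: it assumes true fitness is standard Normal, that each assessment adds independent Normal error with a standard deviation of 0.5, and that “unfit” means the bottom 20% of observed values at each assessment. All of these settings are assumptions.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

N_MEN = 10_000
ERROR_SD = 0.5  # assumed measurement-error SD, relative to a true-fitness SD of 1

# True fitness is fixed for each man: nobody changes during the study.
true_fitness = [random.gauss(0, 1) for _ in range(N_MEN)]
# Each examination observes true fitness plus independent measurement error.
exam1 = [t + random.gauss(0, ERROR_SD) for t in true_fitness]
exam2 = [t + random.gauss(0, ERROR_SD) for t in true_fitness]


def unfit_cutoff(values):
    """Value below which the bottom 20% of observations fall."""
    return sorted(values)[len(values) // 5]


cut1, cut2 = unfit_cutoff(exam1), unfit_cutoff(exam2)

groups = {"Unfit -> Unfit": [], "Unfit -> Fit": [], "Fit -> Unfit": [], "Fit -> Fit": []}
for t, e1, e2 in zip(true_fitness, exam1, exam2):
    first = "Unfit" if e1 < cut1 else "Fit"
    second = "Unfit" if e2 < cut2 else "Fit"
    groups[f"{first} -> {second}"].append(t)

mean_true = {name: statistics.mean(vals) for name, vals in groups.items()}
for name, vals in groups.items():
    print(f"{name}: n = {len(vals)}, mean true fitness = {mean_true[name]:.2f}")
```

Even though no one's true fitness changes, the men classified as “Unfit -> Fit” have a mean true fitness between the two consistent groups, so any outcome that depends on true fitness would also fall in between for them.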

ST Ali CMED6100 – Session 10 Slide 71

Errors in assessment Misleading presentation Infographics Dishonest presentation

Stand your ground and gun deaths - the original figure

The graph counts the number of gun deaths in Florida. A line rises,
bounces a little, reaches its second-highest peak, labeled “2005, Florida
enacted its ‘Stand Your Ground’ law,” and falls precipitously. The
‘Stand Your Ground’ law removed the duty to retreat before using
force in self-defense.

ST Ali CMED6100 – Session 10 Slide 72


Errors in assessment Misleading presentation Infographics Dishonest presentation

Comparing non-square areas


The “iceberg” of frequent heartburn or acid regurgitation: Proportion at each level
based on the current study findings of people with frequent reflux symptoms.

[Figure: iceberg diagram divided into three levels, from the tip down.]
GI consulters: 10%
Primary care consulters: 44%
Non-consulting GER subjects: 46%

The iceberg of disease is a great concept. However it is not well suited to displaying
quantitative information. To be correct, the area (not the height) of each section
should be proportional to the percentage of interest. The correct version is on the
right-hand side.
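The area-versus-height point can be worked through numerically. The sketch below assumes the iceberg is drawn as a triangle with its tip at the top (a simplification, since the original drawing is not reproduced here), so the area above a given depth is proportional to the square of that depth. The function name is hypothetical.

```python
import math

# Sections listed from the tip of the iceberg downwards, with their percentages.
sections = [("GI consulters", 10), ("Primary care consulters", 44),
            ("Non-consulting GER subjects", 46)]


def section_heights(percentages):
    """Heights (as fractions of total height) giving each section of a triangle
    an area proportional to its percentage, counting from the apex down."""
    total = sum(percentages)
    heights = []
    cumulative_area = 0.0
    previous_depth = 0.0
    for pct in percentages:
        cumulative_area += pct / total
        depth = math.sqrt(cumulative_area)  # area above depth d is d**2
        heights.append(depth - previous_depth)
        previous_depth = depth
    return heights


heights = section_heights([pct for _, pct in sections])
for (name, pct), h in zip(sections, heights):
    print(f"{name}: {pct}% of area -> {100 * h:.1f}% of height")
```

Under this assumption the 10% tip needs roughly a third of the total height, which is why drawing section heights in proportion to the percentages badly understates the tip and overstates the base.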
ST Ali CMED6100 – Session 10 Slide 74

Errors in assessment Misleading presentation Infographics Dishonest presentation

Not confusing height with area


Figure: The “iceberg” of frequent heartburn or acid regurgitation: Proportion
at each level based on the current study findings of people with frequent reflux
symptoms.
[Figure: horizontal bar chart of Proportion (horizontal axis, 0% to 50%) for
GI consulters, Primary care consulters and Non-consulting GER subjects.]

Source: Nandurkar et al., 2005 Am J Gastroenterol.

The horizontal bar chart can still give the general idea of a (half-) iceberg
shape, and this time the quantitative interpretation is correct.
ST Ali CMED6100 – Session 10 Slide 75

Errors in assessment Misleading presentation Infographics Dishonest presentation

Infographics
Source:
http://www.forbes.com/sites/matthewherper/2013/02/19/a-graphic-that-drives-home-how-vaccines-have-changed-our-world/

“Information graphics or
infographics are graphic visual
representations of information,
data or knowledge intended to
present complex information
quickly and clearly”
(Wikipedia).

ST Ali CMED6100 – Session 10 Slide 76


Errors in assessment Misleading presentation Infographics Dishonest presentation

Full graphic: http://scienceblog.cancerresearchuk.org/wp-content/uploads/2011/12/Attributable-risk-circles-web-preview-550px-darker.gif
ST Ali CMED6100 – Session 10 Slide 77

Errors in assessment Misleading presentation Infographics Dishonest presentation

How important was the introduction of measles vaccine?


 
 
[Figure 1 – Canada: Measles reported incidence (1935-1983). Line plot with a
vertical axis from 0 to 800 and a horizontal axis from 1935 to 1983. An
annotation marks “Measles vaccines introduced: live 1963 / inactivated 1964”.]

Source: http://genesgreenbook.com/content/proof-vaccines-didnt-save-us

It appears that measles incidence had declined most dramatically
before the vaccine was introduced.

Figure source: Adapted from Public Health Agency of Canada, Figure 8 – Measles Reported
Incidence, Canada. http://www.phac-aspc.gc.ca/publicat/cig-gci/p04-meas-roug-eng.php
ST Ali CMED6100 – Session 10 Slide 79
Errors in assessment Misleading presentation Infographics Dishonest presentation

How important was the introduction of measles vaccine?

Source: http://www.phac-aspc.gc.ca/publicat/cig-gci/p04-meas-roug-eng.php

The original data give a very different picture.

ST Ali CMED6100 – Session 10 Slide 80

Errors in assessment Misleading presentation Infographics Dishonest presentation

Trends in air pollution

[Figure: scatterplot of Concentration (µg/m³), vertical axis 0 to 90, by year
from 1999 to 2010, with one fitted line: y = 0.4182x − 770.55, R² = 0.0159.]

Figure: Cool season average RSP concentration at Tung Chung station -
fit a single line

ST Ali CMED6100 – Session 10 Slide 81

Errors in assessment Misleading presentation Infographics Dishonest presentation

Trends in air pollution

[Figure: scatterplot of Concentration (µg/m³), vertical axis 0 to 90, by year
from 1999 to 2010, with two fitted lines: y = 6x − 11945 and
y = −4.2286x + 8555. The R² labels on the plot are 0.686 and 0.6754.]

Figure: Cool season average RSP concentration at Tung Chung station -
fit two separate lines

ST Ali CMED6100 – Session 10 Slide 82


Errors in assessment Misleading presentation Infographics Dishonest presentation

What does R² mean?

• The first figure was used to argue that pollutant levels have not changed
in recent years; the second figure was used to argue that pollutant levels
are now declining rapidly due to effective government interventions.
– The second figure was argued to be preferable because the
regression lines had much higher R² values.
• R² describes the amount of variation in the response variable that is
explained by variation in the predictor.

• If air pollutant levels remained similar from year to year, then variation in
pollutant levels could not be explained by time. The correlation would be
low and so R² would be low, but this does not imply that a horizontal line
does not fit the data . . .
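The last point can be checked with a small made-up series. The sketch below uses hypothetical yearly concentrations that hover around 60 with no trend (these are not the Tung Chung data) and a hand-written least-squares helper, `fit_line`, which is a name introduced here for illustration.

```python
import math


def fit_line(x, y):
    """Least-squares slope, intercept, R^2 and residual SD for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot, math.sqrt(ss_res / (n - 2))


# Hypothetical yearly concentrations that hover around 60 with no trend.
years = list(range(1999, 2011))
flat = [61, 59, 60, 62, 58, 60, 61, 59, 62, 58, 60, 60]

slope, intercept, r2, resid_sd = fit_line(years, flat)
print(f"slope = {slope:.3f} per year, R^2 = {r2:.3f}, residual SD = {resid_sd:.2f}")
```

The fitted line is almost horizontal and the points lie within a couple of units of it, yet R² is close to zero, because there is hardly any variation over time for the predictor to explain. A low R² therefore says nothing against a flat line as a description of such data.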
ST Ali CMED6100 – Session 10 Slide 83

Sample size Missing data Replication Review

Part IV

Practical issues

ST Ali CMED6100 – Session 10 Slide 84

Sample size Missing data Replication Review

Sample size calculations

• Choose the primary outcome measure

• If the aim is to estimate a value or proportion to a specified degree of
precision, base the sample size on the desired width of the confidence
interval.
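For a single proportion, the precision-based approach can be sketched with the usual Normal approximation, n = z² p(1 − p) / d², where d is the desired half-width of the 95% confidence interval. The function name and the planning values below are illustrative assumptions, not figures from the course.

```python
import math


def n_for_proportion(p, half_width, z=1.96):
    """Sample size so that a 95% CI for a proportion is about +/- half_width.

    Uses the Normal approximation n = z^2 * p * (1 - p) / half_width^2.
    """
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)


# Hypothetical planning values: expected prevalence 25%, desired CI of +/- 5 points.
print(n_for_proportion(0.25, 0.05))
# The most conservative choice is p = 0.5, which maximises p * (1 - p).
print(n_for_proportion(0.50, 0.05))
```

Halving the desired half-width roughly quadruples the required sample size, since the half-width enters the formula squared.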

ST Ali CMED6100 – Session 10 Slide 85


Sample size Missing data Replication Review

Practical 3 scenario
Suppose you would like to test H0: OR = 1 at α = 0.05 (two-sided) in a case-control
study. The prevalence of exposure in the control population is assumed to be 25%.
You can request funding from a local agency, but the budget must be no higher than
$120,000. The cost of recruiting a case is $400, while controls are easier to find and
recruit and will only cost $200 each. The following table shows the power of
alternative possible study designs to detect odds ratios of 1.5, 1.8 and 2.0:

            Case-to-control ratio
            2:1     1:1     1:2     1:4     1:8
OR = 2.0    0.781   0.875   0.879   0.802   0.630
OR = 1.8    0.620   0.737   0.746   0.653   0.484
OR = 1.5    0.314   0.404   0.416   0.349   0.247

The 1:2 design with 150 cases and 300 controls has the highest power, with power
of 75% and 88% to detect ORs of 1.8 and 2.0 respectively.
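The sample sizes behind each column follow from the budget arithmetic. The sketch below assumes each design spends as much of the $120,000 as the ratio allows; that assumption is not stated in the scenario but reproduces the 150 cases and 300 controls quoted for the 1:2 design. The function name is hypothetical, and the power values in the table are not recomputed here.

```python
BUDGET = 120_000
COST_CASE = 400
COST_CONTROL = 200

# Case-to-control ratios from the table, as (cases, controls) per "block".
ratios = [(2, 1), (1, 1), (1, 2), (1, 4), (1, 8)]


def affordable_design(cases_per_block, controls_per_block):
    """Largest number of cases and controls at this ratio within the budget."""
    block_cost = cases_per_block * COST_CASE + controls_per_block * COST_CONTROL
    blocks = BUDGET // block_cost
    return blocks * cases_per_block, blocks * controls_per_block


designs = {f"{a}:{b}": affordable_design(a, b) for a, b in ratios}
for ratio, (cases, controls) in designs.items():
    print(f"{ratio}: {cases} cases, {controls} controls")
```

Moving towards more controls per case buys a larger total sample for the same money, but with fewer cases, which is consistent with power peaking at an intermediate ratio in the table.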
ST Ali CMED6100 – Session 10 Slide 86

Sample size Missing data Replication Review

Sample size for regression adjustment

• As a rule of thumb, using regression adjustment for potential
confounders will increase the power of a comparison between
two or more groups, provided the data set is large enough to
allow regression.
• Multiple regression – should have 10 observations for each
variable considered for inclusion in the model.
– some confounders may be represented by multiple variables,
e.g. age categorized in 5 groups would contribute 4 variables.

ST Ali CMED6100 – Session 10 Slide 87

Sample size Missing data Replication Review

Sample size for regression adjustment

• Logistic regression – should have 10 events (e.g. deaths) for
each variable considered for inclusion in the model.
– if the event is extremely common then this rule is reversed -
should have 10 non-events for each variable.

• Survival analysis – should have 10 events (e.g. deaths) for
each variable considered for inclusion in the model.
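The logistic regression rule of thumb can be written as a one-line calculation. This is a sketch of the heuristic as stated above, with a hypothetical function name and made-up study sizes; it is a planning aid, not a formal power calculation.

```python
def max_candidate_variables(n_events, n_non_events, per_variable=10):
    """Rule-of-thumb limit on variables for logistic regression.

    Uses the rarer of events and non-events, divided by 10.
    """
    return min(n_events, n_non_events) // per_variable


# Hypothetical study: 1,000 subjects of whom 80 died.
print(max_candidate_variables(n_events=80, n_non_events=920))   # 8
# If the event is very common, non-events become the limiting count.
print(max_candidate_variables(n_events=950, n_non_events=50))   # 5
```

Note that the limit counts variables, not confounders, so a confounder entered as several indicator variables uses up several of the available slots.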

ST Ali CMED6100 – Session 10 Slide 88


Sample size Missing data Replication Review

Dealing with missing data

• Missing data on explanatory variables is a common problem in
biomedical research.

• Data not recorded (e.g. comorbid conditions); subject refused
to answer (e.g. income); data lost (e.g. laboratory technician
dropped the specimen).
• It is difficult to deal with missing data in statistical analysis!
– try to take steps to minimize the problem before/during the data
collection phase!!

ST Ali CMED6100 – Session 10 Slide 89

Sample size Missing data Replication Review

Dealing with missing data

• Many alternatives:
• Complete case analysis
– Exclude all subjects with missing data on any variable of
interest.
• Pairwise exclusion
– Exclude a subject only from those analyses that use a variable
on which that subject has missing data.

• Include missing values as a separate category in the regression model.

ST Ali CMED6100 – Session 10 Slide 90

Sample size Missing data Replication Review

Dealing with missing data (continued)


• Mean imputation
– Assign any missing values to take the mean observed value.
• Last observation carried forward
– If patient ID102 has unknown age, assign them the age of
patient ID101.

• ‘Hot deck’ method


• Multiple imputation
– Use a statistical model to predict values of unknown
explanatory variables from all known information, and use this
model to adjust the analyses of the outcome measures.
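Two of the simpler options can be compared on a toy example. The sketch below uses made-up ages with missing values coded as `None` and shows complete case analysis next to mean imputation; it does not attempt multiple imputation, which needs a proper statistical model.

```python
import statistics

# Hypothetical ages with two missing values recorded as None.
ages = [34, 51, None, 47, 62, None, 29, 55]

# Complete case analysis: drop subjects with a missing value.
complete = [a for a in ages if a is not None]

# Mean imputation: replace each missing value with the observed mean.
observed_mean = statistics.mean(complete)
imputed = [observed_mean if a is None else a for a in ages]

print(f"complete cases: n = {len(complete)}, SD = {statistics.stdev(complete):.2f}")
print(f"mean imputed:   n = {len(imputed)}, SD = {statistics.stdev(imputed):.2f}")
```

Mean imputation keeps the sample size and the mean but shrinks the standard deviation, because the filled-in values sit exactly at the centre of the distribution. This makes the data look more precise than they really are.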
ST Ali CMED6100 – Session 10 Slide 91
Sample size Missing data Replication Review

Dealing with missing data (continued)

• The two best choices are the complete case analysis and
multiple imputation.

• If it is fair to assume that missing data are missing completely
at random (e.g. laboratory specimens destroyed by accident),
a complete case analysis is appropriate.

• If the chance that a value is missing may be related to other
observed values (e.g. older subjects less likely to report their
income), a complete case analysis may be biased.

ST Ali CMED6100 – Session 10 Slide 92

Sample size Missing data Replication Review

Dealing with missing data (continued)

• In either case multiple imputation should give the most
appropriate results (a complete case analysis can be less
powerful, since the sample size is smaller).
• If the chance that a value is missing may be related to other
unobserved values then we may need specialized methods
– e.g. censoring of survival times in a trial of treatments for
cancer, where sicker patients dropped out.

ST Ali CMED6100 – Session 10 Slide 93

Sample size Missing data Replication Review

Data management

• https://www.youtube.com/watch?v=N2zK3sAtr-4

• Store data in simple format if possible (e.g. CSV)

• Avoid storing data on USB drives

• Keep multiple copies of data

• Make a data dictionary, use meaningful variable names

ST Ali CMED6100 – Session 10 Slide 94


Sample size Missing data Replication Review

Reproducibility and replication


• Concerns about reproducibility of epidemiologic research,
• e.g. Peng 2006 AJE – “The replication of important findings
by multiple independent investigators is fundamental to the
accumulation of scientific evidence. . . . However, because of
the time, expense, and opportunism of many current
epidemiologic studies, it is often impossible to fully replicate
their findings. An attainable minimum standard is
reproducibility, which calls for data sets and software to be
made available for verifying published findings and conducting
alternative analyses.”
ST Ali CMED6100 – Session 10 Slide 95

Sample size Missing data Replication Review

Reproducibility and replication


• Wicherts et al. 2011 PLoS ONE

• The authors asked for the data from 49 studies published
in two major psychology journals. Many authors did not comply even
though they were explicitly under contract with the journals
to provide the data.
• Papers whose authors withheld data had more reporting errors
(inconsistencies in their tables and p-values).
• The unwillingness to share data was particularly clear when
reporting errors had a bearing on statistical significance.
ST Ali CMED6100 – Session 10 Slide 96

Sample size Missing data Replication Review

Releasing raw data

Releasing raw data can:


• Promote reproducibility of results;

• Allow other investigators to conduct their own analyses;

• Allow other investigators to compare data with theirs, for example
to explore similarities and differences between research findings;
• Allow other investigators to plan their own studies.

It is likely that open access to published data will become the
standard in the medium to long term; see for example the new
data release policy of PLoS journals.
ST Ali CMED6100 – Session 10 Slide 97
Sample size Missing data Replication Review

Making sure your results are reproducible

• Save your raw data and cleaned dataset, and document all changes
made during the cleaning process.
• Consider including dates or version numbers in your dataset
filenames.
• Document all of the steps taken in your analyses, including the
specific datasets used and the sample sizes included in each analysis.
• Software which allows you to write a series of commands is
particularly useful, as this ‘script’ or ‘syntax’ can be saved and used
again later to reproduce results.

ST Ali CMED6100 – Session 10 Slide 98

Sample size Missing data Replication Review

Review

• This module has covered descriptive statistics and elementary probability,
and has introduced basic topics in inferential biostatistics, including
regression, confidence intervals and hypothesis tests.

• The module is designed for postgraduate students in the Faculty of
Medicine who require elementary skills in biostatistics to complete their
projects and dissertations; therefore the primary focus of the course has
been on the practical use and interpretation of statistical methods.

• Practical sessions introduced the use of SPSS as a tool to aid statistical
analysis.

ST Ali CMED6100 – Session 10 Slide 99

Sample size Missing data Replication Review

Further reading
• Altman DG, Bland JM. Missing data. BMJ, 2007; 334: 424.
• Critical care series on medical statistics
http://ccforum.com/series/CC_Medical
• Statistics at square one http://www.bmj.com/statsbk/
• Peng RD, Dominici F, Zeger SL. Reproducible epidemiologic
research. Am J Epidemiol, 2006; 163(9): 783-9.
• Wicherts JM, Bakker M, Molenaar D. Willingness to Share
Research Data Is Related to the Strength of the Evidence and
the Quality of Reporting of Statistical Results. PLoS ONE,
2011; 6(11): e26828.
ST Ali CMED6100 – Session 10 Slide 100
Sample size Missing data Replication Review

Course evaluation

1. Student Feedback on Teaching and Learning (SFTL)
http://sftl.hku.hk

2. MPH Student Learning Experience Survey on Course Moodle
page

ST Ali CMED6100 – Session 10 Slide 101
