Biostatistics of HKU MMEDSC Session10handoutprint3

Outline
Statistics in practice
CMED6100 – Session 10
ST Ali
School of Public Health

The University of Hong Kong
20 November 2021
sli.do/#hkubiostat21
Module aims
• To provide students with a foundation in biostatistics
– For those who require elementary skills in biostatistics to complete their projects and dissertations.
– For those who require elementary skills in their workplace.
– For those who require elementary skills to understand and interpret the medical literature.
Module objectives
After completing this module, students will be able to:
1. Present data using appropriate tabular and graphical formats;
2. Define and calculate standard measures of location and dispersion of data;
3. Define probability and recognize common probability distributions including the binomial and Normal distributions;
4. Calculate and interpret p-values for simple hypothesis tests;
5. Interpret parameter estimates and confidence intervals from linear regression, logistic regression and proportional hazards regression models;
6. Perform power and sample size calculations for one- and two-group studies.

Descriptive tables Graphs Bump plots Results
Part I
Presenting information
Rounding
Rounding the column percentages to one decimal place makes the comparison more difficult.

Table: Proportion of lung cancer cases and healthy men with different smoking habits.
Smoking habits   Lung cancer cases   Healthy men
                 (n = 86)            (n = 86)
Heavy smoker     65.1%               36.0%
Light smoker     31.4%               47.7%
Non-smoker        3.5%               16.3%

Rounding to the nearest percent is sufficient here.

Table: Proportion of lung cancer cases and healthy men with different smoking habits.
Smoking habits   Lung cancer cases   Healthy men
                 (n = 86)            (n = 86)
Heavy smoker     65%                 36%
Light smoker     31%                 48%
Non-smoker        3%                 16%
Percentages have been rounded so sums may not total.

Note: if we wanted to round these numbers to 2 significant figures, we would change the “3%” to “3.5%”.
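The two rounding rules can be compared with a minimal Python sketch. The helper `round_sig` is a hypothetical name, and the counts (56, 27 and 3 of 86 cases) are an assumption: they are not given in the handout but are the only whole numbers consistent with the percentages in the table.

```python
from math import floor, log10

def round_sig(x, sig=2):
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - floor(log10(abs(x))))

# Assumed counts among the 86 lung cancer cases (inferred, not stated in the handout)
case_counts = {"Heavy smoker": 56, "Light smoker": 27, "Non-smoker": 3}
n_cases = 86

for habit, count in case_counts.items():
    pct = 100 * count / n_cases
    # Nearest percent versus 2 significant figures
    print(f"{habit}: {round(pct)}% (nearest percent), {round_sig(pct):g}% (2 s.f.)")
```

For the non-smokers the two rules differ (3% versus 3.5%), which is the case described in the note above; for the larger percentages they agree.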
Include a succinct summary

Table: Proportion of lung cancer cases and healthy men with different smoking habits.
Smoking habits Lung cancer cases Healthy men

(n = 86) (n = 86)
Heavy smoker 65% 36%
Light smoker 31% 48%
Non-smoker 3% 16%
Percentages have been rounded so sums may not total.
Example description in text:

The smoking habits of 86 men with lung cancer and 86 healthy men were
compared (Table). Many more of the lung cancer cases were smokers than the
healthy men. The proportions of heavy smokers among the lung cancer cases
and healthy men were 65% and 36% respectively.

Formatting tables
Table: Tuberculosis notification rates in Hong Kong, 2012-2018, by age

group.
Age groups Notification rate, per 100,000
2012 2013 2014 2015 2016 2017 2018
65-69 124.39 106.03 99.23 98.24 110.18 107.2 96.32
70-74 163.73 136.9 156.75 134.05 132.7 132.96 123.97
75-79 190.91 187.17 189.5 155.79 163.28 154.47 155.45
80-84 296.49 221.03 244.7 209.36 208.98 185.53 193.55
85 or over 321.69 303.6 311.11 278.8 245.24 245.22 246.53
Source: Notifiable Infectious Diseases, Centre for Health Protection
How could you improve this table?
Improved?
Table: Tuberculosis notification rates in Hong Kong, 2012-2018, by age

group.
Age groups 2012 2013 2014 2015 2016 2017 2018

85 or over 322 304 311 279 245 245 247
80-84 296 221 245 209 209 186 194
75-79 191 187 190 156 163 154 155
70-74 164 137 157 134 133 133 124
65-69 124 106 99 98 110 107 96
We have rounded, reordered the rows, and removed the gridlines. Patterns in
the data are clearer.
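The same tidying steps (round, reorder, right-align) can be sketched in Python. The rates are copied from the original table; ordering the rows by the 2012 rate is one possible choice, taken here as an assumption.

```python
# Notification rates per 100,000, copied from the original table (2012-2018)
rates = {
    "65-69": [124.39, 106.03, 99.23, 98.24, 110.18, 107.2, 96.32],
    "70-74": [163.73, 136.9, 156.75, 134.05, 132.7, 132.96, 123.97],
    "75-79": [190.91, 187.17, 189.5, 155.79, 163.28, 154.47, 155.45],
    "80-84": [296.49, 221.03, 244.7, 209.36, 208.98, 185.53, 193.55],
    "85 or over": [321.69, 303.6, 311.11, 278.8, 245.24, 245.22, 246.53],
}
years = list(range(2012, 2019))

# Put the row with the highest 2012 rate first, then round to whole numbers
ordered = sorted(rates.items(), key=lambda item: item[1][0], reverse=True)
improved = [(group, [round(v) for v in values]) for group, values in ordered]

# Right-align the numeric columns so digits line up
print(f"{'Age group':<12}" + "".join(f"{y:>6}" for y in years))
for group, values in improved:
    print(f"{group:<12}" + "".join(f"{v:>6}" for v in values))
```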
Successful graphs
• As a general principle of constructing graphs, we will try to

show maximum information with minimum ink.
• Successful graphs will communicate with ease, and
– Show trends and relationships.
– Not deceive the reader.
– Improve on the data being shown in a table or in text.

TB example by Microsoft Excel

Figure: TB notification rates per 100,000 population by age groups in Hong Kong, 2018.
[Vertical bar chart. y-axis: notification rate, 0 to 300; x-axis: age group, 65-69 to 85 & over.]
We can make a number of improvements to this plot...

Bar graph: improved TB example

Figure: TB notification rates per 100,000 population by age groups in Hong Kong, 2018.
[Horizontal bar chart. y-axis: age group, from 85 or over at the top to 65-69 at the bottom; x-axis: notification rate, 0 to 300.]
Source: Notifiable Infectious Diseases, Centre for Health Protection

Line graph: TB notifications
Figure: TB notification rate per 100,000 population in Hong Kong by age group, 2012-2018.
[Line graph with one line per age group and a separate legend. y-axis: notification rate, 0 to 350; x-axis: year, 2012 to 2018.]

Figure: At least we should reorder the legend
[Same line graph with the legend entries reordered. y-axis: notification rate, 0 to 350; x-axis: year, 2012 to 2018.]
Figure: Or, even better, label the lines directly
[Same line graph without a legend; each line is labelled directly, from top to bottom: 85 or over, 80-84, 75-79, 70-74, 65-69. y-axis: notification rate, 50 to 350; x-axis: year, 2012 to 2018.]
Time to influenza infection

Figure: Time to laboratory confirmation of influenza infection in household contacts of a laboratory-confirmed index case, following application of a non-pharmaceutical intervention.
[Survival curves for the hand hygiene, mask+HH and control groups. y-axis: proportion not infected, 0% to 100%; x-axis: day since intervention, 0 to 8.]


Figure: It can be misleading to start the y-axis away from 0 in survival analysis, as it can overstate differences between curves (people are used to seeing Kaplan-Meier plots on a 0-1 scale).
[Same curves for the hand hygiene, mask+HH and control groups. y-axis: proportion not infected, 90% to 100%; x-axis: 0 to 8.]

Figure: Good to add a gap between the y-axis and x-axis to highlight that the y-axis does not go down to zero.
[Same curves, y-axis from 90% to 100%, drawn with a gap between the y-axis and the x-axis.]
Figure: Or it can be even better to reverse the y-axis in these situations.
[Curves for the control, mask+HH and hand hygiene groups. y-axis: proportion infected, 0% to 10%; x-axis: 0 to 8.]

Health expenditures vs life expectancy
[Left figure] Life expectancy by health expenditures per capita, 2007. Health expenditures are total (public and private), in PPP-converted US dollars. Data source: OECD.
[Right figure] Life expectancy by health expenditures per capita, 1970-2008. The data points are years. The other countries are Australia, Austria, Belgium, Canada, Denmark, Finland, France, Germany, Ireland, Italy, Japan, the Netherlands, New Zealand, Norway, Portugal, Spain, Sweden, Switzerland, and the United Kingdom. Data source: OECD.
Bump plots
Bump plots are similar to line graphs:
Figure: Tuberculosis notifications per 100,000 population in Hong Kong by age group, 2012-2018.
[Bump plot: each age group is labelled on both sides, with its value in 2012 on the left and in 2018 on the right.]
Age group     2012   2018
85 or over     322    247
80-84          296    194
75-79          191    155
70-74          164    124
65-69          124     96
Characteristics of bump plots
• Ranking is clear from direct labeling on at least one side of

the chart.
• No scale needed, the scale can be seen by labels of exact

values on one or both sides of the chart.
• Very useful for comparing changes in ranking over time.

• Also very useful to show differences in various measures
between two or more groups.
– e.g. could plot men on left and women on right, and display
TB incidence by gender.
An example
These numbers are from an article in the Lancet. How are they ordered? Would you choose a different order?

Note: the relative survival rate is the survival rate relative to the anticipated mortality in the general population; thus it is a measure of excess mortality. See the original article for a more detailed explanation of relative survival.

Table 3: Most recent cohort estimates of relative survival rates, by cancer site
Relative survival rate, % (SE)
Cancer site                         5 years      10 years     15 years     20 years
Oral cavity and pharynx             54·8 (1·2)   39·3 (1·4)   35·5 (1·6)   32·4 (1·8)
Oesophagus                          13·0 (1·3)    6·4 (1·1)    4·1 (1·1)    1·8 (0·9)
Stomach                             19·9 (1·1)   18·7 (1·2)   13·7 (1·3)   12·2 (1·5)
Colon                               61·4 (0·8)   57·4 (1·0)   49·5 (1·2)   47·2 (1·5)
Rectum                              61·5 (1·2)   50·3 (1·4)   39·8 (1·6)   39·3 (2·0)
Liver and intrahepatic bile duct     4·1 (0·8)    3·7 (1·0)    4·2 (1·4)    3·4 (1·6)
Pancreas                             3·4 (0·5)    2·4 (0·5)    2·4 (0·5)    0·8 (0·4)
Larynx                              62·4 (2·1)   56·3 (2·3)   45·3 (2·6)   39·3 (3·1)
Lung and bronchus                   14·9 (0·4)    9·4 (0·3)    8·2 (0·4)    7·2 (0·5)
Melanomas                           89·7 (0·9)   84·5 (1·2)   79·3 (1·6)   73·0 (2·1)
Breast                              85·9 (0·4)   76·2 (0·6)   58·1 (0·8)   51·8 (0·9)
Cervix uteri                        70·7 (1·6)   67·9 (1·7)   61·1 (2·1)   57·0 (2·3)
Corpus uteri and uterus, NOS        84·3 (1·0)   82·2 (1·3)   78·1 (1·6)   83·7 (1·8)
Ovary                               48·9 (1·3)   44·5 (1·5)   36·7 (1·7)   34·7 (1·9)
Prostate                            97·6 (0·4)   75·6 (1·0)   54·6 (1·5)   43·9 (2·1)
Testis                              95·2 (1·0)   93·0 (1·4)   86·4 (2·0)   84·1 (2·8)
Urinary bladder                     81·8 (1·0)   76·4 (1·4)   66·5 (1·9)   62·5 (2·5)
Kidney and renal pelvis             62·4 (1·3)   53·3 (1·7)   46·8 (2·1)   41·7 (2·6)
Brain and other nervous system      32·4 (1·4)   26·9 (1·4)   19·5 (1·5)   19·9 (1·7)
Thyroid                             95·6 (0·9)   94·9 (1·3)   91·1 (1·8)   96·3 (2·0)
Hodgkin’s disease                   81·0 (1·7)   73·9 (2·0)   66·2 (2·3)   57·4 (2·7)
Non-Hodgkin lymphomas               53·4 (1·0)   43·4 (1·2)   37·0 (1·5)   30·8 (1·8)
Multiple myeloma                    30·7 (1·7)   10·0 (1·3)    7·2 (1·4)    3·7 (1·2)
Leukaemias                          45·3 (1·2)   33·4 (1·3)   24·9 (1·4)   20·6 (1·5)
Rates derived from SEER 1973–98 database (both sexes, all ethnic groups). NOS=not otherwise specified.

Table 4: Most recent period estimates of relative survival rates, by cancer site
Relative survival rate, % (SE)
Cancer site                         5 years      10 years     15 years     20 years
Oral cavity and pharynx             56·7 (1·3)   44·2 (1·4)   37·5 (1·6)   33·0 (1·8)
Oesophagus                          14·2 (1·4)    7·9 (1·3)    7·7 (1·6)    5·4 (2·0)
Stomach                             23·8 (1·3)   19·4 (1·4)   19·0 (1·7)   14·9 (1·9)
Colon                               61·7 (0·8)   55·4 (1·0)   53·9 (1·2)   52·3 (1·6)
Rectum                              62·6 (1·2)   55·2 (1·4)   51·8 (1·8)   49·2 (2·3)
Liver and intrahepatic bile duct     7·5 (1·1)    5·8 (1·2)    6·3 (1·5)    7·6 (2·0)
Pancreas                             4·0 (0·5)    3·0 (0·5)    2·7 (0·6)    2·7 (0·8)
Larynx                              68·8 (2·1)   56·7 (2·5)   45·8 (2·8)   37·8 (3·1)
Lung and bronchus                   15·0 (0·4)   10·6 (0·4)    8·1 (0·4)    6·5 (0·4)
Melanomas                           89·0 (0·8)   86·7 (1·1)   83·5 (1·5)   82·8 (1·9)
Breast                              86·4 (0·4)   78·3 (0·6)   71·3 (0·7)   65·0 (1·0)
Cervix uteri                        70·5 (1·6)   64·1 (1·8)   62·8 (2·1)   60·0 (2·4)
Corpus uteri and uterus, NOS        84·3 (1·0)   83·2 (1·3)   80·8 (1·7)   79·2 (2·0)
Ovary                               55·0 (1·3)   49·3 (1·6)   49·9 (1·9)   49·6 (2·4)
Prostate                            98·8 (0·4)   95·2 (0·9)   87·1 (1·7)   81·1 (3·0)
Testis                              94·7 (1·1)   94·0 (1·3)   91·1 (1·8)   88·2 (2·3)
Urinary bladder                     82·1 (1·0)   76·2 (1·4)   70·3 (1·9)   67·9 (2·4)
Kidney and renal pelvis             61·8 (1·3)   54·4 (1·6)   49·8 (2·0)   47·3 (2·6)
Brain and other nervous system      32·0 (1·4)   29·2 (1·5)   27·6 (1·6)   26·1 (1·9)
Thyroid                             96·0 (0·8)   95·8 (1·2)   94·0 (1·6)   95·4 (2·1)
Hodgkin’s disease                   85·1 (1·7)   79·8 (2·0)   73·8 (2·4)   67·1 (2·8)
Non-Hodgkin lymphomas               57·8 (1·0)   46·3 (1·2)   38·3 (1·4)   34·3 (1·7)
Multiple myeloma                    29·5 (1·6)   12·7 (1·5)    7·0 (1·3)    4·8 (1·5)
Leukaemias                          42·5 (1·2)   32·4 (1·3)   29·7 (1·5)   26·2 (1·7)
Rates derived from SEER 1973–98 database (both sexes, all ethnic groups). NOS=not otherwise specified.

Source: Brenner H. Long-term survival rates of cancer patients achieved by the end of the 20th century: a period analysis. Lancet, 2002; 360: 1131-1135.

An example – cancer survival
Perhaps it would be clearer to re-order the table by 5-year survival rates (see right).
Alternatively we would order by 20-year survival rates.
These data can be graphically presented in a ‘bump plot’ ...
Figure: The 5-, 10-, 15-, and 20-year relative survival rates for various cancers.
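Re-ordering the rows by a chosen column is a one-line sort. The sketch below uses a handful of period estimates (5-year and 20-year relative survival, %) taken from Table 4 of the article; the choice of these six sites is only for illustration.

```python
# (cancer site, 5-year, 20-year) period estimates of relative survival, %
period = [
    ("Colon", 61.7, 52.3),
    ("Pancreas", 4.0, 2.7),
    ("Lung and bronchus", 15.0, 6.5),
    ("Breast", 86.4, 65.0),
    ("Prostate", 98.8, 81.1),
    ("Thyroid", 96.0, 95.4),
]

# Highest survival first, by 5-year and by 20-year rates
by_5_year = sorted(period, key=lambda row: row[1], reverse=True)
by_20_year = sorted(period, key=lambda row: row[2], reverse=True)

for label, rows in [("5-year", by_5_year), ("20-year", by_20_year)]:
    print(f"Ordered by {label} survival: " + ", ".join(site for site, _, _ in rows))
```

The two orderings differ (prostate leads at 5 years, thyroid at 20 years), and a bump plot makes exactly this kind of change in ranking visible.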

Presenting your statistical analyses
• Remember to be consistent when rounding numbers
• Be consistent with column alignment (should usually use

right-alignment)
• Including sample size can aid interpretation
• Clarify abbreviations and methods in footnotes
Influenza study results
Table: Influenza secondary infection risks in households.
Risk of infection (95% CI)∗ p-value†

Control (n=279) Hand hygiene (n=257) Mask+HH (n=258)
Lab-confirmed influenza 0.10 (0.06, 0.14) 0.05 (0.03, 0.09) 0.07 (0.04, 0.11) 0.22
Clinical influenza(1) 0.19 (0.14, 0.24) 0.16 (0.12, 0.21) 0.21 (0.16, 0.27) 0.40
Clinical influenza(2) 0.05 (0.02, 0.08) 0.04 (0.02, 0.06) 0.07 (0.04, 0.11) 0.28
∗ 95% confidence intervals.
† By Pearson chi-square test adjusted for within-household correlation.
(1) At least 2 of: fever ≥37.8°C, cough, headache, sore throat, aches or pains in muscles or joints.
(2) Fever ≥37.8°C plus cough or sore throat.
Factors affecting influenza virus transmission
Characteristic n Adjusted OR∗ 95% CI for OR

Control arm 279 1.00
Hand hygiene arm 257 0.57 (0.26, 1.22)
Mask+HH arm 258 0.77 (0.38, 1.55)
Child (aged ≤5y) 44 1.91 (0.69, 5.30)
Child (aged 6-15y) 88 2.87 (1.42, 5.78)
Adult (aged 16+y) 662 1.00
Not vaccinated 688 1.00

Vaccinated in past 1 year 106 0.33 (0.12, 0.91)
∗ Adjusted odds ratios of lab-confirmed infection estimated under a
multivariable logistic regression model adjusting for sex of household
contact, age, sex and antiviral use of corresponding index case, and
allowing for within-household clustering.

Probability distributions Inference Comparing groups Choice of statistical methods
Part II
From probability to inferential statistics
The normal distribution

[Figure: density curve of the Normal(20, 1) distribution. x-axis: 17 to 23; y-axis: density, 0.0 to 0.4.]

If we sampled from a N(20,1) distribution, the distribution of a
sample sized 64 might look like this ...
[Figure: histogram of the sample. x-axis: Y, 17 to 24; y-axis: frequency.]

Means of repeated samples
• If we have a single sample of size 64 from a Normal(µ, 1) distribution, the best estimate of the mean µ is the sample mean x̄.
• According to the central limit theorem, under repeated sampling the sample means will follow a normal distribution with mean µ and standard error of the mean σ/√n, i.e., 1/8 in our example since σ = 1 and n = 64.
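A small simulation can illustrate this. The sketch below draws repeated samples of size 64 from a Normal(20, 1) distribution and compares the spread of the sample means with σ/√n = 1/8; the seed and the number of repeats are arbitrary assumptions.

```python
import random
import statistics

random.seed(10)  # arbitrary seed so the sketch is reproducible

MU, SIGMA, N = 20, 1, 64
REPEATS = 2000  # number of repeated samples (arbitrary choice)

sample_means = []
for _ in range(REPEATS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.mean(sample))

empirical_se = statistics.stdev(sample_means)  # spread of the sample means
theoretical_se = SIGMA / N ** 0.5              # sigma / sqrt(n) = 0.125

print(f"SD of the sample means: {empirical_se:.3f}")
print(f"sigma / sqrt(n):        {theoretical_se:.3f}")
```

The two numbers should be close, while the standard deviation within any single sample stays near σ = 1.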
Standard error versus standard deviation

Figure: Distribution of sample means vs original distribution. The
standard deviation refers to the spread of data. The standard error refers
to the variability of the mean under repeated sampling.
[Figure: the original distribution and the distribution of sample means, both centred on µ, with the s.d. and the s.e. marked.]
Standard deviation
• Describes variability in data
• Not affected by sample size
• Most observations will fall within ±2SD of the mean
Standard error
• Describes variability in the sample mean
• Decreases with increasing sample size
• Mean ±2SE gives a 95% confidence interval for the mean
CI for the sample mean

• Recall the central limit theorem: If X follows a distribution with mean µ and standard deviation σ, and we take a random sample of size n, provided that n is sufficiently large the sample mean X̄ will follow a normal distribution, X̄ ∼ Normal(µ, σ²/n).
• Hence drawing repeated random samples of size n from the population, we could say 95% of X̄s fall within µ ± 1.96σ/√n.
[Figure: sampling distribution of X̄ centred on µ, with 2.5% in each tail beyond µ − 1.96σ/√n and µ + 1.96σ/√n. A possible value of X̄ is marked together with the interval from X̄ − 1.96σ/√n to X̄ + 1.96σ/√n.]
Figure: In other random samples X̄ could be sampled here.
[Figure: the same sampling distribution, with another possible value of X̄ and its interval X̄ ± 1.96σ/√n.]
Figure: In 5% of samples X̄ will be in the tails of the distribution and then the 95% CI will not include µ.
Definition of a confidence interval

• Under repeated samples, we can say that P% of P%
confidence intervals will contain the true population value.
• For example, 95% of all 95% confidence intervals will cover the
true population value.
• A single CI may or may not cover the true value.
• We can say that we have 95% confidence that a single 95% CI
will cover the true value, but this is simply a short version of
the definition above.
• Strictly speaking, we cannot say that there is a 95% chance
that a single 95% CI will cover the true value.
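The repeated-sampling definition can be checked by simulation. This is a sketch under stated assumptions: samples of size 64 from a Normal(20, 1) distribution, σ treated as known, and an arbitrary seed and number of repeats.

```python
import random
import statistics

random.seed(2021)  # arbitrary seed so the sketch is reproducible

MU, SIGMA, N = 20, 1, 64
REPEATS = 4000  # number of repeated samples (arbitrary choice)
half_width = 1.96 * SIGMA / N ** 0.5  # 1.96 standard errors

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.mean(sample)
    # Does this sample's 95% CI contain the true mean?
    if xbar - half_width <= MU <= xbar + half_width:
        covered += 1

coverage = covered / REPEATS
print(f"Proportion of 95% CIs covering the true mean: {coverage:.3f}")
```

The proportion should be close to 0.95. Each individual interval either covers µ or does not; the 95% describes the procedure over many samples.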
Comparing groups
[Figure: data from two groups, with the sample means x̄1 and x̄2 marked.]
Null hypothesis – assume both groups are samples from the same
distribution. What is the chance of getting a difference x̄1 − x̄2 as
unusual or more unusual than the difference observed?

Comparing groups
Under the null hypothesis, x̄1 − x̄2 will have a Normal distribution
with mean 0 and variance σ₁²/n₁ + σ₂²/n₂.
UN Survey – Results from session 4

[Figure: two histograms of student responses, one for the group given X = 65 and one for the group given X = 10. x-axis: % of member states in Africa, 0 to 80; y-axis: frequency, 0 to 25.]
Figure: Responses of 61 students given X = 65 and 58 given X = 10.

Observed difference versus sampling distribution

[Figure: sampling distribution of X1 − X2 under the null hypothesis, with the observed difference marked. x-axis: −6 to 6; y-axis: density, 0.00 to 0.20.]
Figure: An observed standardised difference of 5 is at the extremes of the
sampling distribution under the null hypothesis.
Plausibility of results under the null hypothesis

[Figure: the same sampling distribution of X1 − X2 under the null hypothesis, with the observed difference marked. x-axis: −6 to 6; y-axis: density, 0.00 to 0.20.]
Figure: If the null hypothesis were true, i.e. no difference between means, it
would be very unusual to observe such a large difference (whether less than −5
or greater than 5). We would only observe such a large difference in 1% of
repeated experiments.
How do we interpret this?

• If we repeated this experiment many times, and if the null hypothesis
were true, we would only see differences greater than 5 (or less than −5)
in 1% of those experiments.
• The value of 0.01 or 1% is often referred to as a p-value
• Notice that the p-value is a conditional probability – it is conditional on

the null hypothesis being true.
• Small p-values, indicating that observed differences are unlikely under the
null hypothesis, are usually taken as evidence against the null hypothesis
• A common threshold is p < 0.05; in that case p-values less than 0.05 are
called ‘statistically significant’.
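As a rough check of the 1% figure, the two-sided p-value can be computed from the normal distribution. The standard error of 2 is an assumption read off the plotted null distribution (its peak density of about 0.20 matches a normal distribution with standard deviation 2); it is not stated in the text.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

observed_difference = 5
standard_error = 2  # assumption: inferred from the figure, not given in the text

z = observed_difference / standard_error
# Probability of a difference at least this far from 0 in either direction
p_value = 2 * (1 - normal_cdf(abs(z)))

print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

Under these assumptions the p-value is a little above 0.01, in line with the 1% quoted above.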
Common misunderstandings about p-values
1. The p-value is not the probability that the null hypothesis is

true.
• The p-value is p(such unusual data | null hypothesis is true),

not p(null hypothesis is true | such unusual data).
• We cannot derive the second probability without some

assumption about p(null hypothesis is true) and
p(such unusual data)


2. The p-value is not the probability that a finding is “merely due
to chance”.
• As the calculation of a p-value is conditional on the

assumption that a finding is the product of chance alone, it
cannot simultaneously be used to gauge the probability of
that assumption being true.
• The p-value is the probability that a finding is “merely due to

chance” if the null hypothesis is true.
• Even low probability events happen sometimes.

3. The p-value does not indicate the size or importance of the

observed effect (compare with effect size).
• In a large sample, the standard errors will be small, and

therefore even small differences may be associated with small
(and therefore highly ‘significant’) p-values.
Statistical versus practical significance
• In large samples, even small differences may lead to p-values

less than 0.05
• Sometimes such small ‘statistically significant’ differences may

not have practical or clinical importance.
• Sometimes in smaller samples we will not be able to identify

statistically significant effects even if they would be practically
significant.
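The effect of sample size on the p-value can be sketched with a pooled two-proportion z-test. The proportions (52% versus 50%) and the group sizes are hypothetical, and the function name is illustrative only.

```python
from math import erfc, sqrt

def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value from a pooled z-test for two proportions (equal group sizes)."""
    pooled = (p1 + p2) / 2
    se = sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p1 - p2) / se
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))

# The same hypothetical 2 percentage point difference at two sample sizes
p_small = two_proportion_p_value(0.52, 0.50, n_per_group=100)
p_large = two_proportion_p_value(0.52, 0.50, n_per_group=100_000)

print(f"n = 100 per group:     p = {p_small:.3f}")
print(f"n = 100,000 per group: p = {p_large:.2e}")
```

The difference is identical in both cases; only the standard error changes, so the p-value says nothing about whether a 2 percentage point difference matters in practice.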

4. A p-value of 1.00 does not mean the null hypothesis is true
• A p-value of 1.00 indicates that the observed data were

completely consistent with no effect (for example the primary
outcome occurred at exactly the same rate in two groups)
• Study of 10 people, Group A: 2/5 vs Group B: 2/5 experience

the event of interest – p-value for difference = 1.00
• Study of 1000 people, Group A: 200/500 vs Group B: 200/500

experience the event of interest – p-value for difference = 1.00
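Both studies can be run through a Pearson chi-square test for a 2x2 table. This is a sketch without continuity correction; with counts as small as 2/5, Fisher's exact test would usually be preferred, but the point about p = 1.00 is the same.

```python
from math import erfc, sqrt

def chi_square_2x2(events_a, n_a, events_b, n_b):
    """Pearson chi-square statistic and p-value (1 df, no continuity correction)."""
    observed = [[events_a, n_a - events_a], [events_b, n_b - events_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [events_a + events_b, total - events_a - events_b]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(statistic / 2))  # upper tail of chi-square with 1 df
    return statistic, p_value

print(chi_square_2x2(2, 5, 2, 5))          # study of 10 people
print(chi_square_2x2(200, 500, 200, 500))  # study of 1000 people
```

Both calls give a statistic of 0 and a p-value of 1, yet the larger study constrains the true difference far more tightly. A confidence interval for the difference would show this; the p-value does not.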
Choose a hypothesis test
Explanatory variable      Outcome variable or response variable
or predictor variable     2 categories             3+ categories
2 categories              Chi-squared test (or Fisher’s exact test, McNemar’s test)
3+ categories             Chi-squared test (or Fisher’s exact test, McNemar’s test)
Ordinal                   Logistic regression      – (a)
Continuous                Logistic regression      – (a)
(a) Methods for these kinds of data are outside the scope of this course.

Choose a hypothesis test (cont)
Explanatory variable      Outcome variable or response variable (b)
or predictor variable     Ordinal (e.g. Likert scale)    Continuous
2 categories              t-test, Mann-Whitney U         t-test, Mann-Whitney U
3+ categories             ANOVA                          ANOVA
Ordinal                   ANOVA                          ANOVA
Continuous                Regression/correlation         Regression/correlation
(b) Methods for testing a hypothesis about a single variable or paired difference
include the 1-sample t-test, paired t-test, and the Wilcoxon signed rank test.
Multivariable regression models
Outcome variable          Regression model
2 categories              Logistic
Continuous                Linear
Time-to-event data        Proportional hazards
Count data∗               Poisson
Repeated measure∗         Hierarchical model
∗ Methods for these kinds of data are outside the scope of this course.
Goodness of fit of a regression model
• In linear regression the value R² quantifies goodness of fit as the proportion of variation in the outcome variable explained by the predictors.
• Sometimes R² can be ‘adjusted’ to take into account the number of predictors: including more predictors can only increase R² (it never decreases), but it is often preferable to have a parsimonious model.
• In logistic regression there are a few other measures of goodness of fit including the AIC, the area under the ROC curve (sometimes called the c-statistic), and the Hosmer-Lemeshow statistic.
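The usual adjustment is adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p predictors. A minimal sketch with hypothetical numbers:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a linear model with p predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical example: adding a sixth predictor raises R-squared only slightly
five_predictors = adjusted_r2(0.300, n=50, p=5)
six_predictors = adjusted_r2(0.301, n=50, p=6)

print(f"5 predictors: R2 = 0.300, adjusted R2 = {five_predictors:.3f}")
print(f"6 predictors: R2 = 0.301, adjusted R2 = {six_predictors:.3f}")
```

Here R² rises with the extra predictor but the adjusted value falls, which favours the more parsimonious model.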

Choosing which predictors to include in a model

• There is no consensus on how to select predictors for inclusion in a
regression model. Alternatives include

– Forced entry – select which predictors you would like to include
regardless of whether they are important in the model
– Forward selection – start from a simple model and then add
predictors one by one as long as they are important (maybe with
p < 0.2?), not including unimportant ones.
– Backward selection - start from a model with many predictors and
remove the ones that are least important (maybe with p > 0.2?)
until only important ones are left.
– Stepwise selection – alternate forward and backward selection.
• Forced entry or backward selection are recommended.
Errors in assessment Misleading presentation Infographics Dishonest presentation
Part III
Misleading use of statistics
Does getting fit reduce mortality?
• Seminal study by Blair et al.∗
• 10,000 men enrolled.
• Fitness measured by “treadmill test duration” at baseline and at a 2nd examination after 5 years.
• Subsequent follow-up for 5 years.
• “... patients will reduce risk of mortality by increasing physical activity and improving fitness.”
∗ Blair SN et al. Changes in physical fitness and all-cause mortality: a prospective study of healthy and unhealthy men. JAMA, 1995; 273(14): 1093-8.

All-cause death rates per 10,000 man-years (log scale) in 9,777

men by age groups and change or lack of change in physical
fitness. Death rates are shown atop the bars and the numbers of
deaths within the bars. Source: Blair et al., JAMA, 1995
Revised plot of Blair’s findings

[Line graph with one line per fitness group: Unfit −> Unfit, Unfit −> Fit, Fit −> Fit. y-axis: all-cause death rates per 10,000 man-years, 0 to 1000; x-axis: age group, 20 to 70.]
All-cause death rates per 10,000 man-years in 9,777 men by age groups and change or lack of change in physical fitness.
Criticisms by Williams∗
• Blair only took one baseline measurement of fitness (and one
measurement at follow-up).
• What if a particular patient was feeling more energetic than
usual, on the day of his test?
[Figure: distribution of baseline fitness measured via treadmill test duration (minutes), marking one patient's ‘true’ level and observed level.]
∗ Williams PT. The illusion of improved physical fitness and reduced mortality.
Medicine & Science in Sports & Exercise. 2003; 35(5): 736-40.
Fitness not accurately evaluated
• The treadmill duration might not accurately represent the

level of fitness of a given patient.
• We could have wrongly assessed baseline and follow-up fitness

levels.
• How might this affect the conclusions of Blair et al?
[Figure: scatterplot of 10,000 simulated men.]
Nobody’s fitness changed during the study. ‘Observed fitness’ was measured with error, in both assessments. The bottom 20% are “unfit”.
Note that the apparent changers are in between those classified as consistently fit or unfit.
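A simulation of this kind can be sketched as follows. The scale of fitness, the size of the measurement error and the seed are assumptions chosen for illustration; they are not the values used for the slide.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the sketch is reproducible

N = 10_000
ERROR_SD = 0.5  # assumed measurement error, relative to a true-fitness SD of 1

# True fitness never changes; each assessment adds independent error
true_fitness = [random.gauss(0, 1) for _ in range(N)]
first = [t + random.gauss(0, ERROR_SD) for t in true_fitness]
second = [t + random.gauss(0, ERROR_SD) for t in true_fitness]

def unfit_cutoff(values):
    """Value below which the bottom 20% of observations fall."""
    return sorted(values)[len(values) // 5]

cut1, cut2 = unfit_cutoff(first), unfit_cutoff(second)

groups = {"unfit->unfit": [], "unfit->fit": [], "fit->unfit": [], "fit->fit": []}
for t, a, b in zip(true_fitness, first, second):
    key = ("unfit" if a < cut1 else "fit") + "->" + ("unfit" if b < cut2 else "fit")
    groups[key].append(t)

mean_true = {k: statistics.mean(v) for k, v in groups.items()}
for k, v in mean_true.items():
    print(f"{k}: n = {len(groups[k])}, mean true fitness = {v:.2f}")
```

The apparent "unfit to fit" group has a mean true fitness between the two consistent groups even though nobody changed, so it would show intermediate mortality without any benefit from getting fit.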
Stand your ground and gun deaths - the original figure
The graph counts the number of

gun deaths in Florida. A line rises,
bounces a little, reaches a 2nd
highest peak labeled “2005, Florida
enacted its ‘Stand Your Ground’
law,” and falls precipitously. The
‘Stand Your Ground’ law removed
the duty to retreat before using
force in self-defense.

Comparing non-square areas

The “iceberg” of frequent heartburn or acid regurgitation: Proportion at each level based on the current study findings of people with frequent reflux symptoms.
[Iceberg diagram with three levels, from top to bottom: GI consulters 10%; primary care consulters 44%; non-consulting GER subjects 46%.]
The iceberg of disease is a great concept. However it is not well suited to displaying
quantitative information. To be correct, the area (not the height) of each section
should be proportional to the percentage of interest. The correct version is on the
right-hand side.
Not confusing height with area

Figure: The “iceberg” of frequent heartburn or acid regurgitation: Proportion at each level based on the current study findings of people with frequent reflux symptoms.
[Horizontal bar chart. Bars from top to bottom: GI consulters, primary care consulters, non-consulting GER subjects; x-axis: proportion, 0% to 50%.]
Source: Nandurkar et al., 2005 Am J Gastroenterol.
The horizontal bar chart can still give the general idea of a (half-) iceberg
shape, and this time the quantitative interpretation is correct.
Infographics
“Information graphics or infographics are graphic visual representations of information, data or knowledge intended to present complex information quickly and clearly” (Wikipedia).
Source: http://www.forbes.com/sites/matthewherper/2013/02/19/a-graphic-that-drives-home-how-vaccines-have-changed-our-world/
Full graphic: http://scienceblog.cancerresearchuk.org/wp-content/uploads/2011/12/Attributable-risk-circles-web-preview-550px-darker.gif
How important was the introduction of measles vaccine?

[Figure 1: Canada, measles reported incidence (1935-1983). y-axis: 0.00 to 800.00; x-axis ticks: 1935, 1947, 1959, 1971, 1983; annotation: measles vaccines introduced, live 1963 / inactivated 1964. Note on the figure: adapted from Public Health Agency of Canada, Figure 8, Measles Reported Incidence Canada, http://www.phac-aspc.gc.ca/publicat/cig-gci/p04-meas-roug-eng.php]
Source: http://genesgreenbook.com/content/proof-vaccines-didnt-save-us
It appears that measles incidence had declined most dramatically before the vaccine was introduced.
How important was the introduction of measles vaccine?
Source: http://www.phac-aspc.gc.ca/publicat/cig-gci/p04-meas-roug-eng.php
The original data give a very different picture.
Trends in air pollution
[Scatterplot with a single fitted line: y = 0.4182x − 770.55, R² = 0.0159. y-axis: concentration (µg/m³), 0 to 90; x-axis: year, 1999 to 2010.]
Figure: Cool season average RSP concentration at Tung Chung station - fit a single line
Trends in air pollution
[Same scatterplot with two fitted lines: y = 6x − 11945 (R² = 0.686) for the earlier years and y = −4.2286x + 8555 (R² = 0.6754) for the later years. y-axis: concentration (µg/m³), 0 to 90; x-axis: year, 1999 to 2010.]
Figure: Cool season average RSP concentration at Tung Chung station - fit two separate lines

What does R² mean?

• The first figure was used to argue that pollutant levels have not changed
in recent years; the second figure was used to argue that pollutant levels are
now declining rapidly due to effective government interventions.
– The second figure was argued to be preferable because the
regression lines had much higher R² values.
• R² describes the proportion of variation in the response variable that is
explained by variation in the predictor.
• If air pollutant levels remained similar from year to year, then variation in
pollutant levels could not be explained by time. If the correlation is low,
the R² will be low, but this does not imply that a horizontal line does
not fit the data . . .
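
The point about a low R² can be checked numerically. The sketch below uses made-up concentrations (an assumption, not the Tung Chung measurements) that fluctuate around 50 with no trend. The function names are illustrative only. R² for the fitted line is close to zero, yet the points all lie within a few units of a horizontal line.

```python
# Hypothetical cool-season averages (NOT the Tung Chung data): values
# fluctuate around 50 with no trend over time.
years = list(range(1999, 2011))
conc = [50, 52, 48, 51, 49, 50, 50, 49, 51, 48, 52, 50]


def mean(values):
    return sum(values) / len(values)


def r_squared(x, y):
    """R^2 of the least-squares line of y on x (the squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def rmse_about_mean(y):
    """Root-mean-square distance of the points from the horizontal line y = mean(y)."""
    my = mean(y)
    return (sum((b - my) ** 2 for b in y) / len(y)) ** 0.5


r2 = r_squared(years, conc)
rmse = rmse_about_mean(conc)
print(f"R^2 = {r2:.4f}; RMSE about the horizontal line = {rmse:.2f}")
```

A low R² here reflects the absence of a time trend, not a poor fit: the horizontal line describes these data well.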
Sample size Missing data Replication Review
Part IV
Practical issues
Sample size calculations
• Choose the primary outcome measure
• If determining the value/proportion to a specified degree of
precision, consider the width of the confidence interval.
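
As an illustration of sizing a study by confidence interval width, the sketch below uses the usual normal-approximation formula n = z² p(1 − p) / d² for a single proportion, where d is the desired half-width of the interval. The function name and the example inputs (an anticipated proportion of 25%, estimated to within ±5 percentage points) are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_for_proportion(p, half_width, conf_level=0.95):
    """Sample size so that the normal-approximation CI for a proportion
    has roughly the requested half-width (assumes simple random sampling)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)  # 1.96 for a 95% CI
    return ceil(z ** 2 * p * (1 - p) / half_width ** 2)


# Hypothetical target: anticipated proportion 25%, 95% CI of about +/- 5 points.
n_25 = n_for_proportion(0.25, 0.05)
# Most conservative case (p = 0.5) for the same precision.
n_50 = n_for_proportion(0.50, 0.05)
print(n_25, n_50)
```

Halving the half-width roughly quadruples the required sample size, because d enters the formula squared.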

Practical 3 scenario
Suppose you would like to test H0 : OR = 1 at α = 0.05 (two-sided) in a case-control
study. The prevalence of exposure in the control population is assumed to be 25%.
You can request funding from a local agency, but the budget must be no higher than
$120,000. The cost of recruiting a case is $400, while controls are easier to find and
recruit and will only cost $200 each. The following table shows the power of
alternative possible study designs to detect odds ratios of 1.5, 1.8 and 2.0:
                 Case-to-control ratio
            2:1     1:1     1:2     1:4     1:8
OR = 2.0    0.781   0.875   0.879   0.802   0.630
OR = 1.8    0.620   0.737   0.746   0.653   0.484
OR = 1.5    0.314   0.404   0.416   0.349   0.247
The 1:2 design with 150 cases and 300 controls has the highest power, with power
of 75% and 88% to detect ORs of 1.8 and 2.0 respectively.
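
The sample sizes behind each design follow from the budget arithmetic. The sketch below assumes that the whole $120,000 is spent on recruitment at the stated unit costs; the function name is illustrative. Only the 1:2 design (150 cases, 300 controls) is stated in the scenario; the other sizes are derived under that assumption.

```python
def design_for_budget(cases_per_unit, controls_per_unit, budget=120_000,
                      cost_case=400, cost_control=200):
    """Largest study with the given case-to-control ratio that fits the budget."""
    unit_cost = cases_per_unit * cost_case + controls_per_unit * cost_control
    units = budget // unit_cost  # number of complete case/control "bundles"
    return cases_per_unit * units, controls_per_unit * units


ratios = [(2, 1), (1, 1), (1, 2), (1, 4), (1, 8)]
designs = {f"{a}:{b}": design_for_budget(a, b) for a, b in ratios}
for label, (cases, controls) in designs.items():
    print(f"{label}: {cases} cases, {controls} controls")
```

Under these assumptions the 1:1 design recruits more cases (200) but fewer subjects overall (400) than the 1:2 design (450), which is the trade-off the power table summarizes.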
Sample size for regression adjustment
• As a rule of thumb, using regression adjustment for potential
confounders will increase the power of a comparison between
two or more groups, provided the data set is large enough to
allow regression.
• Multiple regression – should have 10 observations for each
variable considered for inclusion in the model.
– some confounders may be represented by multiple variables,
e.g. age categorized in 5 groups would contribute 4 variables.
Sample size for regression adjustment
• Logistic regression – should have 10 events (e.g. deaths) for
each variable considered for inclusion in the model.
– if the event is extremely common then this rule is reversed -
should have 10 non-events for each variable.
• Survival analysis – should have 10 events (e.g. deaths) for
each variable considered for inclusion in the model.
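
The rules of thumb above can be written as a small helper. This is a sketch only: the function name and the example counts are hypothetical, and the "10 per variable" figure is the rule of thumb from the slides, not a guarantee of adequate power.

```python
def max_candidate_variables(n_events, n_non_events=None, per_variable=10):
    """Number of candidate variables supported by the 10-per-variable rule.

    For multiple regression pass the number of observations as n_events.
    For logistic regression pass both counts; the rarer outcome is limiting.
    """
    limiting = n_events if n_non_events is None else min(n_events, n_non_events)
    return limiting // per_variable


# Hypothetical examples.
linear = max_candidate_variables(200)                    # 200 observations
logistic = max_candidate_variables(60, 440)              # 60 deaths among 500 subjects
common_event = max_candidate_variables(470, 30)          # event extremely common
print(linear, logistic, common_event)

# Age in 5 groups uses 4 of the available variables.
remaining_after_age = logistic - 4
```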

Dealing with missing data
• Missing data on explanatory variables is a common problem in
biomedical research.
• Data not recorded (e.g. comorbid conditions); subject refused
to answer (e.g. income); data lost (e.g. laboratory technician
dropped the specimen)
• It is difficult to deal with missing data in statistical analysis!
– try to take steps to minimize the problem before and during the data
collection phase!
Dealing with missing data
• Many alternatives:
• Complete case analysis
– Exclude all subjects with missing data on any variable of
interest
• Pairwise exclusion
– Exclude subjects only from those analyses that use a variable
on which their data are missing (analysis-by-analysis basis).
• Include missing values as a separate category in the regression model.
Dealing with missing data (continued)

• Mean imputation
– Assign any missing values to take the mean observed value.
• Last observation carried forward
– If patient ID102 has unknown age, assign them the age of
patient ID101.
• ‘Hot deck’ method

• Multiple imputation
– Use a statistical model to predict values of unknown
explanatory variables from all known information, and use this
model to adjust the analyses of the outcome measures.
• The two best choices are the complete case analysis and
multiple imputation.
• If it is fair to assume that missing data are missing completely
at random (e.g. laboratory specimens destroyed by accident),
a complete case analysis is appropriate.
• If the chance that a value is missing may be related to other
observed values (e.g. older subjects less likely to report their
income), a complete case analysis may be biased.
• In either case multiple imputation should give the most
appropriate results (a complete case analysis can be less
powerful, since the sample size is smaller).
• If the chance that a value is missing may be related to other
unobserved values then we may need specialized methods
– e.g. censoring of survival times in a trial of treatments for
cancer, where sicker patients dropped out.
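
A small numerical sketch may help show one reason why mean imputation is not among the preferred approaches. Using made-up ages (an assumption for illustration), filling in the observed mean leaves the mean unchanged but shrinks the standard deviation, so the data look more precise than they are.

```python
from statistics import mean, stdev

# Hypothetical ages; None marks a missing value.
ages = [30, 40, None, 50, None, 60]

# Complete case analysis: drop subjects with a missing value.
complete = [a for a in ages if a is not None]

# Mean imputation: replace each missing value with the observed mean.
fill = mean(complete)
imputed = [fill if a is None else a for a in ages]

print(f"complete cases: n={len(complete)}, mean={mean(complete)}, sd={stdev(complete):.2f}")
print(f"mean imputed:   n={len(imputed)}, mean={mean(imputed)}, sd={stdev(imputed):.2f}")
```

The imputed data set has the larger n and the smaller standard deviation, so standard errors computed from it would be too small.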
Data management
• https://www.youtube.com/watch?v=N2zK3sAtr-4
• Store data in simple format if possible (e.g. CSV)
• Avoid storing data on USB drives
• Keep multiple copies of data
• Make a data dictionary, use meaningful variable names

Reproducibility and replication

• Concerns about reproducibility of epidemiologic research,
• e.g. Peng 2006 AJE – “The replication of important findings
by multiple independent investigators is fundamental to the
accumulation of scientific evidence. . . . However, because of
the time, expense, and opportunism of many current
epidemiologic studies, it is often impossible to fully replicate
their findings. An attainable minimum standard is
reproducibility, which calls for data sets and software to be
made available for verifying published findings and conducting
alternative analyses.”
Reproducibility and replication

• Wicherts et al. 2011 PLoS ONE
• The authors asked for replication data for 49 studies published
in two major psychology journals. Many did not comply even
though they were explicitly under contract with the journals
to provide the data.
• Papers whose authors withheld data had more reporting errors
(inconsistencies in their tables and p-values).
• The unwillingness to share data was particularly clear when
reporting errors had a bearing on statistical significance.
Releasing raw data
Releasing raw data can:

• Promote reproducibility of results;
• Allow other investigators to conduct their own analyses;
• Allow other investigators to compare data with theirs, for example
to explore similarities and differences between research findings;
• Allow other investigators to plan their own studies.
It is likely that open access to published data will become the
standard in the medium to long term, see for example the new
data release policy of PLoS journals.
Making sure your results are reproducible
• Save your raw data and cleaned dataset, and document all changes
made during the cleaning process.
• Consider including dates or version numbers in your dataset
filenames.
• Document all of the steps taken in your analyses, including the
specific datasets used and the sample sizes included in each analysis.
• Software which allows you to write a series of commands is
particularly useful, as this ‘script’ or ‘syntax’ can be saved and used
again later to reproduce results.
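
As a sketch of what a saved script can record, the example below logs which dataset an analysis used, a checksum of its contents, and the sample sizes involved. The filename, column names and function name are hypothetical, and the data are held in a string so that the example is self-contained; in practice the text would be read from the saved CSV file.

```python
import csv
import hashlib
import io


def describe_dataset(csv_text, name):
    """Summarize a CSV dataset so an analysis can be traced back to it later."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # A subject is a complete case if no field is empty.
    complete = [r for r in rows if all(v != "" for v in r.values())]
    return {
        "dataset": name,
        "sha256": hashlib.sha256(csv_text.encode("utf-8")).hexdigest(),
        "n_rows": len(rows),
        "n_complete": len(complete),
    }


# Hypothetical cleaned dataset with a dated filename.
raw = "id,age,sbp\n101,54,132\n102,,141\n103,61,128\n"
log = describe_dataset(raw, "bp_study_2021-11-20.csv")
for key, value in log.items():
    print(f"{key}: {value}")
```

If the checksum recorded with a result no longer matches the file, the dataset has changed since the analysis was run.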
Review
• This module has covered descriptive statistics and elementary probability,
and has introduced basic topics in inferential biostatistics, including
regression, confidence intervals and hypothesis tests.
• The module is designed for postgraduate students in the Faculty of
Medicine who require elementary skills in biostatistics to complete their
projects and dissertations; therefore the primary focus of the course has
been on the practical use and interpretation of statistical methods.
• Practical sessions introduced the use of SPSS as a tool to aid statistical
analysis.
Further reading
• Altman DG, Bland JM. Missing data. BMJ, 2007; 334:424.
• Critical Care series on medical statistics.
http://ccforum.com/series/CC_Medical
• Statistics at Square One. http://www.bmj.com/statsbk/
• Peng RD, Dominici F, Zeger SL. Reproducible epidemiologic
research. Am J Epidemiol. 2006;163(9):783-9.
• Wicherts JM, Bakker M, Molenaar D. Willingness to Share
Research Data Is Related to the Strength of the Evidence and
the Quality of Reporting of Statistical Results. PLoS ONE,
2011; 6(11): e26828.
Course evaluation
1. Student Feedback on Teaching and Learning (SFTL)
http://sftl.hku.hk
2. MPH Student Learning Experience Survey on Course Moodle
page
