
Lecture #2: Descriptive Statistics

I. Sampling

1. General Info
a. Impossible to obtain data from every member of a population
i. Need to take representative sample of underlying population
1. then draw inferences about the population from the sample.
b. Sample always involves an element of random variation/error.
c. Sampling statistics are essentially about characterizing the nature and magnitude of this random error.

2. Random Error
a. Def. the variation that is due to chance
i. Inherent feature of sampling, statistical inference, and measurement of biological phenomena
1. Blood pressure measurement
b. Statistics as a field concerned w/ random error
i. Thus, a significant p-value or a precise confidence interval CANNOT tell you if the underlying data is biased
1. P-value/CI are only statistical: they give no info on biases
2. The further the point estimate is from the null, the smaller the p-value
a. ↑ distance from the null : ↓ p-value
c. Not as important as systematic error

3. Systematic Error
a. Def. any process that acts to distort data or findings from their true value.
b. More important than random error.
i. It can be removed by better processes
c. Can be seen as selection bias, measurement bias, or confounding bias

4. Statistical Inference
a. Def. the process whereby one draws conclusions regarding a population from the results observed in a
sample taken from that population
b. Types
i. Estimation: estimating the specific value of a parameter
1. Used with confidence intervals
ii. Hypothesis Testing: making a decision about a hypothesized value of a parameter

II. Variation in Clinical Data

1. General Info
a. There is inherent variability in data
b. All variation is additive
i. The net observed variation is the result of all the individual sources of variation
c. Two main categories (biological and measurement)

2. Biological Variation
a. Def. variation in the actual entity being measured
b. Outside of the science stuff, can be subdivided
i. Variation within a person (intra-person)
1. Your BP changes as a result of stimuli (time of day, posture, emotions)
ii. Variation between people (inter-person)
c. Without it, there would be nothing for epidemiologists to measure
i. The presence of biological variation is sine qua non
d. Net effect: it adds to the level of random error in any measurement process
i. Can be reduced with repeated measurements

3. Measurement Variation
a. Def. variation due to the measurement process
b. Causes
i. Instrument error (inaccuracy in the instrument)

Husaini 1 of 40
ii. Operator error (inaccuracy in the person operating the test)
c. Can introduce BOTH random and systematic error
i. Systematic differences are why different laboratories establish their own reference ranges
d. Types
i. Inter-observer variability: different observers reading the same test
ii. Intra-observer variability: same person observing the test at different times
e. Net effect: the use of specific operational standards can reduce the impact of measurement bias

III. Validity and Reliability

1. Validity
a. Def. the degree to which a measurement process tends to measure what it is intended to measure
i. It is the accuracy
ii. A valid instrument/test is free of any systematic error/bias
1. Will be close to the underlying true value
b. Can be determined by comparing to an accepted gold standard
c. When no gold standards exist, we measure some specific phenomena or construct
i. Constructs are then used to develop a clinical scale which can then be used to measure the
phenomenon in practice.
d. Types of validity
i. Content validity: includes all the dimensions to be measured
1. If measuring for pain, you would include questions on aching, throbbing, burning (but not itching, nausea, tingling)
ii. Construct validity: the scale correlates with other known measures
1. A scale for depression includes questions related to it such as those about fatigue and
iii. Criterion validity: scale predicts a directly observable phenomenon
1. To see if responses to pain bear a predictable relationship to pain of known severity
e. For dichotomous data, validity is usually expressed in terms of sensitivity and specificity
f. For continuous data, can use mean, SD, correlation, and regression analysis

2. Reliability
a. Def. the extent that repeated measures of a phenomenon tend to yield the same results regardless of the
i. It is the reproducibility
ii. No comparison to a reference or gold standard
iii. Refers to the lack of random error
b. Classified as intra-observer variability or inter-observer variability
c. If not direct observation, reliability can be assessed with the test-retest method
i. Respondents answer the same question at two different times.
ii. Measures a form of intra-person reliability
d. Type of data measured dictates the exact statistical approach
i. Categorical data: Kappa
ii. Interval data: intra-class correlation

Validity = get what you're supposed to (ask family/neighbors/gold standard)
Reliability = always get the same result (test-retest)

IV. Statistical aspects of variability

1. General Info
a. Measures of variation
i. Is basically a measure of dispersion
1. variance (σ²), SD (σ), and range
b. Measures of agreement
i. Correlation (r) and Kappa

2. Standard Deviation (σ)
a. Def. a measure of the typical distance of individual values from the mean. It is calculated by taking the square root of the variance
1. Mean ± 1 SD covers about 68% of total observations (normal distribution)
2. Mean ± 2 SD covers about 95% of total observations (normal distribution)
a. More than 2 SD away from the mean is considered abnormal
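
The variance and SD definitions above can be sketched in a few lines. This is a minimal illustration, assuming the population form of the variance (dividing by n; a sample variance would divide by n - 1). The readings and function names are hypothetical.

```python
import math

def variance(values):
    """Population variance: the mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def standard_deviation(values):
    """SD is the square root of the variance."""
    return math.sqrt(variance(values))

# Hypothetical systolic BP readings (mmHg), for illustration only.
readings = [110, 120, 130, 140, 150]
mean = sum(readings) / len(readings)
sd = standard_deviation(readings)

# "Abnormal" by the unusualness rule: more than 2 SD from the mean.
lower_cutoff = mean - 2 * sd
upper_cutoff = mean + 2 * sd
print(f"mean={mean:.1f}, SD={sd:.1f}, cut-offs=({lower_cutoff:.1f}, {upper_cutoff:.1f})")
```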

3. Correlation (r)
a. Def. the correlation coefficient expresses the reliability of a continuous measurement (interval data)
i. It measures the strength of the linear relationship between two continuous variables
b. Ranges from -1 to 1 (zero is no correlation)
c. Takeaway
i. If one set of values is the actual/true values, then correlation is a test of validity
ii. In most cases, correlation assesses reliability
d. It is possible to have high r values, yet have little direct agreement between observers
i. A perfect r (1.0) can be obtained if Lab A results are always exactly 10 mg/dl higher than those of Lab B
e. It is also often used in test-retest studies
i. Used for intra-rater or intra-person variability
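
The Lab A vs. Lab B point can be checked numerically. The sketch below computes Pearson's r by hand; the lab values are hypothetical and the helper name is an assumption made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)

# Hypothetical glucose results (mg/dl); Lab A always reads 10 mg/dl higher.
lab_b = [80.0, 95.0, 110.0, 140.0, 200.0]
lab_a = [value + 10.0 for value in lab_b]

r = pearson_r(lab_a, lab_b)
exact_matches = sum(a == b for a, b in zip(lab_a, lab_b))
print(f"r = {r:.3f}, samples on which the labs agree exactly: {exact_matches}")
```

A systematic offset leaves r untouched, which is why r alone says little about direct agreement.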

4. Kappa
a. Def. a measure used to characterize reliability for categorical/qualitative data
i. It corrects for the degree of chance in the overall level of agreement
ii. It tells us how much of the possible agreement over and above chance the reviewers have achieved
b. The ability of kappa to adjust for chance agreement is important clinically
i. The prevalence of a particular condition being evaluated affects the likelihood that observers will
agree purely due to chance
1. Even if two people have no idea what they are doing, there will be substantial agreement
by chance alone
2. The magnitude of the agreement by chance increases as the proportion of positive (or
negative) assessments increases.
3. Two people each repeatedly toss a coin
a. Four possible options (HH, TT, TH, HT)
i. agreement 50% of the time due to chance
ii. Thus any percentage above the 50% is what we care about.
ii. If the prevalence of the attribute is either high or low, then the overall percent agreement will also be high. In other words, if something is obviously right or obviously wrong, people are more likely to agree as such.
1. ↑ prevalence : ↑ overall agreement
2. ↓ prevalence : ↑ overall agreement
3. Prevalence near 50% : ↓ overall agreement
c. Kappa ranges from -1 to 1
i. Negative: agreement that is worse than chance
ii. Zero: agreement is no better than chance
iii. Positive: the amount of agreement above chance
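
The chance correction can be sketched for two observers making yes/no calls. This assumes the standard Cohen's kappa formula, kappa = (observed - expected) / (1 - expected), which the notes describe but do not write out. All counts are hypothetical.

```python
def cohen_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Cohen's kappa for two observers and a dichotomous rating."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / n
    a_pos_rate = (both_pos + a_pos_b_neg) / n
    b_pos_rate = (both_pos + a_neg_b_pos) / n
    # Agreement expected by chance if the two observers rated independently.
    expected = a_pos_rate * b_pos_rate + (1 - a_pos_rate) * (1 - b_pos_rate)
    return (observed - expected) / (1 - expected)

# Two people tossing coins: 50% raw agreement, all of it due to chance.
coin_toss = cohen_kappa(25, 25, 25, 25)

# Each observer calls 90% of cases positive at random:
# 82% raw agreement, yet none of it is beyond chance.
high_prevalence = cohen_kappa(81, 9, 9, 1)

# 80% raw agreement with balanced ratings: real agreement above chance.
balanced = cohen_kappa(40, 10, 10, 40)

print(f"{coin_toss:.2f} {high_prevalence:.2f} {balanced:.2f}")
```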

V. Types of Data

1. Categorical
a. Nominal: no order
i. Alive vs. dead, male vs. female, blood type (A, B, AB, O)

b. Ordinal: in a natural order, but not equally spaced
i. 1st/2nd/3rd degree burn, pain scale for migraines (none, mild, moderate, severe), Glasgow Coma Scale

2. Interval (form of the variable is decided by the investigator)

a. Discrete: on a number line
i. There is equal spacing between values (no fractions)
ii. Examples: # of live births, stools/day, # of sexual partners
b. Continuous: lots of possible values within the range clinically possible
i. BP, IQ, BMI, random blood glucose

VI. Distributions and Measures of Central Tendency

1. General Info
a. The normal (Gaussian) distribution is bell-shaped
i. The mean, median, and mode are all equal
b. Two ways to summarize a distribution
i. Central tendency: mean, median, mode
ii. Dispersion: standard deviation

2. Central Tendency
a. Mean: pulled by outliers
i. The center of gravity of the distribution
b. Median: best when values are skewed
i. Will be between mode and mean when the data is skewed
c. Mode: least sensitive to skewed data
i. The most frequent value (the peak of the distribution)
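
A small, hypothetical right-skewed data set shows how the three measures separate. This sketch uses Python's standard `statistics` module; the values are invented for illustration.

```python
import statistics

# Hypothetical lengths of hospital stay (days); one extreme value skews the data.
stays = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(stays)      # pulled toward the outlier
median = statistics.median(stays)  # middle value, resistant to the outlier
mode = statistics.mode(stays)      # most frequent value

print(f"mode={mode}, median={median}, mean={mean}")
```

With right-skewed data the median lands between the mode and the mean.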

3. Dispersion
a. Standard deviation: used for normal (or near normal) distributions
i. Mean ± 1 SD = about 2/3 of the observations
ii. Mean ± 2 SD = about 95% of the observations

VII. Misc.

1. Abnormality
a. Abnormality depends on the population and their respective distribution
i. The cut-off will differ b/w populations
b. Best definition of abnormality
i. Being unusual: greater than 2 SD from the mean
ii. Sick: observation regularly associated with disease
1. Most common definition
iii. Treatable: only considered abnormal if treatment leads to improved outcome

2. Sub-group sampling
a. May need to obtain a larger sample from important subgroups and select subjects at random within subgroup

3. Regression to the mean: some outliers may be due to random error; retesting them will cause them to move closer to the mean.

Lecture #3: Frequency Measures

I. Quantifying Uncertainty

1. General Info
a. Uncertainty can be characterized as
i. Qualitatively: unlikely, possible, suspected, etc.
ii. Quantitatively: probability and odds
1. Can be converted back and forth
b. Often used to quantify a physician's opinion

c. Quantitative odds can force one to be more exact than is justified

2. Probability
a. Expresses uncertainty explicitly
b. Numerical value between 0 and 1
c. Calculated as a proportion
i. P = a/b, where a = number of events and b = the total number at risk

3. Odds
a. The ratio of the probability of the event occurring over the probability of the event not occurring

4. Relationship between probability and odds

a. small probability (<10%): little difference b/w probability and odds
b. large probabilities: big difference b/w probability and odds
c. Probability and Odds are more alike the lower the absolute P (risk)
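
The conversion between probability and odds follows directly from the definition of odds above (odds = P / (1 - P)). A minimal sketch with hypothetical function names:

```python
def probability_to_odds(p):
    """Odds = probability of the event / probability of no event."""
    return p / (1 - p)

def odds_to_probability(odds):
    """Inverse conversion: P = odds / (1 + odds)."""
    return odds / (1 + odds)

# Small probabilities give nearly identical odds; large ones diverge.
for p in (0.01, 0.05, 0.10, 0.50, 0.80):
    print(f"P = {p:.2f} -> odds = {probability_to_odds(p):.3f}")
```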

II. Measures of Disease Frequency

1. Ratios
a. Expressed as (A/B) where A is NOT a part of B
b. In other words, A & B are mutually exclusive frequencies
c. Ratio of blacks to whites in a school was 15/300 or 1:20

2. Proportion
a. Expressed as (A/B) where A is INCLUDED in B
b. Based on a fraction in which the numerator (frequency of disease or condition) is included in the denominator
(the population)
c. The proportion of blacks in the school was 15/315 or 4.8%

3. Rates
a. Special types of proportions that are evaluated over a specified time period
i. Express the relationship b/w an event (e.g. disease) during a given time period and a defined
population at risk over the same time period
b. Must have a population-at-risk and a specific time period

III. Basic Measures of Disease Frequency

1. Prevalence
a. The proportion of the number of cases observed compared to the population at risk at a given point in time.
i. no time dimension
ii. Is the pretest probability
b. Refers to all cases of disease observed at a given moment
c. Is a function of both the incidence rate and the mean duration of the disease in the population
i. Ex. Arthritis: no cure, so there is a long duration. Thus, the prevalence is high (for a given incidence rate)
ii. Ex. Rabies: lethal disease, so the duration is very short. Thus, the prevalence is very low (for a given incidence rate)
d. Conveys the disease burden; preferred by epidemiological studies for disease burden

Prevalence = burden
Incidence = risk
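
The dependence of prevalence on incidence and duration can be sketched with the usual steady-state approximation, prevalence ≈ incidence rate × mean duration. That formula is an assumption added here (it holds when prevalence is low and rates are stable); the numbers are hypothetical.

```python
def steady_state_prevalence(incidence_rate, mean_duration):
    """Approximate prevalence as incidence rate x mean duration (low-prevalence assumption)."""
    return incidence_rate * mean_duration

# Same hypothetical incidence rate (cases per person-year) for both diseases.
incidence = 0.001

chronic = steady_state_prevalence(incidence, mean_duration=20.0)  # long duration, no cure
acute = steady_state_prevalence(incidence, mean_duration=0.05)    # rapidly fatal

print(f"chronic prevalence = {chronic:.4f}, acute prevalence = {acute:.6f}")
```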

2. Cumulative Incidence Rate

a. The most commonly used measure of incidence.
i. Used for fixed/closed populations, only counts the first event (you only die once), and normally measures stable rates (cancer rates)
1. Example: fixed cohort (a medical school class)
b. Def. the proportion of a fixed population that becomes diseased during a stated period of time.
c. A measure of average risk.
i. The probability that a person develops the disease in a specified time period
ii. If the 5-year CIR of a disease is 10%, then you have a 10% chance of developing the disease over the next five years.
d. Range of 0 to 1 and must have a reference to time
i. Thus, it must increase with time
1. ↑ Time : ↑ CIR
e. It is the event rate in the context of randomized trials
i. Control event rate (CER): for the baseline/control group
ii. Experimental event rate (EER): for the treatment group

CFR = Deaths/Cases
Mortality = Deaths/Population
Incidence rate = Cases/Population
f. Case-Fatality Rate (CFR): proportion of affected individuals that die from the disease
i. CFR = deaths/affected; thus, we need the number of affected in the denominator
1. This contrasts with the mortality rate, in which the denominator is the entire population
ii. Associated with the seriousness and/or virulence of the disease
1. ↑ CFR : ↑ virulence of the disease
iii. Best measure for the lethality of the condition
g. Attack Rate: number of people affected divided by the number at risk
i. Used as a measure of morbidity (illness) in outbreak investigations
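
The three proportions above differ only in their denominators. A minimal sketch on a hypothetical closed cohort (all counts and function names are invented for illustration):

```python
def cumulative_incidence(new_cases, population_at_risk):
    """CIR: proportion of a fixed population that becomes diseased in the period."""
    return new_cases / population_at_risk

def case_fatality_rate(deaths, cases):
    """CFR: deaths among the affected (denominator = cases)."""
    return deaths / cases

def mortality_proportion(deaths, population):
    """Mortality: deaths in the whole population (denominator = population)."""
    return deaths / population

# Hypothetical closed cohort followed for 5 years.
population = 1000
cases = 100
deaths = 10

cir = cumulative_incidence(cases, population)
cfr = case_fatality_rate(deaths, cases)
mortality = mortality_proportion(deaths, population)
print(f"5-year CIR={cir:.2%}, CFR={cfr:.2%}, mortality={mortality:.2%}")
```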

3. Incidence Density Rate (IDR)

a. More commonly used in larger epidemiologic studies
i. Used for open populations with a variable starting point (randomized trial where it takes time to enroll patients), single or multiple outcomes (UTIs), and measures highly variable rates (outbreaks)
1. Example: an open cohort (a randomized controlled trial)
b. It represents the speed/instantaneous rate at a given point in time at which disease is occurring in the population (analogous to the miles/hr a car is traveling)
i. An incidence rate of 25 cases per 100,000 population-years expresses the instantaneous speed at which the disease is affecting the population.
ii. It is a dynamic measure that can freely change
iii. IDR = 0: disease is not occurring in the population
iv. IDR = infinity: theoretical maximum; implies instantaneous, universal effect (nuclear explosion!)
c. The numerator is the same as CIR; however, the denominator is now person-time
i. Denominator = the sum of the disease-free time experience
ii. In chronic studies, the standard unit is 100,000 person-years
iii. In outbreaks, smaller units make more sense (such as person-days)
iv. Person time is approximated by estimating the population size midway through the time period
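
The person-time denominator can be sketched by summing each subject's disease-free follow-up. The follow-up times and event count below are hypothetical.

```python
def incidence_density_rate(events, disease_free_times):
    """IDR = number of events / total disease-free person-time."""
    return events / sum(disease_free_times)

# Hypothetical open cohort: disease-free follow-up (years) per subject.
follow_up_years = [2.0, 5.0, 3.5, 1.5, 4.0, 4.0]
events = 2

idr = incidence_density_rate(events, follow_up_years)
print(f"IDR = {idr:.2f} per person-year = {idr * 100_000:,.0f} per 100,000 person-years")
```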

4. The Mortality Rate

a. Def. the frequency of death in a defined population during a specified time period
i. Measure using either CIR (cumulative Incidence Rate) or IDR (Incidence Density Rate)
1. Often given as # per person time (which is IDR)
b. Measures #s of deaths rather than # of disease events
c. Biggest contributor of mortality is incidence
i. The mortality rate is therefore some fraction of the underlying incidence rate depending on
the lethality of the condition
d. Termed all-cause mortality when all deaths are combined (regardless of cause)
e. The denominator is the population at risk for dying from the condition
i. The denominator in the CFR (Case Fatality Rate) is the number of affected individuals

1. Why CFR is the best measure for the lethality of a condition
2. CFRs can be similar between two populations when intuition tells you otherwise (aka, once you have it, the damage is done; it's just a matter of time)
a. In other words, CFR doesn't tell you true mortality or incidence, as you don't know who is going to get what and how often they may do so. What CFR does tell you is that once you get it, this is how lethal it is, based on those who died divided by those who have it.
b. For example, acute myocardial infarction in males and females: CFR was 10% in males and 12% in females, while mortality was 110/100,000 person-years for males and 35/100,000 person-years for females.

Lecture #4: Effect Measures

I. Risks and Measures of Effect

1. General Idea
a. In clinical studies, it is common to calculate the risk/CIR of an event in different populations
i. By taking the ratio or difference between these two measures, we can calculate two fundamental
measures of effect
1. The relative risk
2. The absolute risk
b. The risk in the control group = baseline risk

2. Relative Risk (RR)

a. Def. the ratio of the risk in the treated group (Riskt) relative to the risk in the control group (Riskc).
i. It measures the strength/magnitude of the effect of the new treatment on mortality relative to the
effect of the standard.
ii. Null value = 1.0
iii. Magnitude is important: a larger effect is less likely to be explained by chance/confounding biases
1. Factor that increases risk
a. <2 = small effect
b. 2-5 = moderate effect
c. >5 = large effect
i. High # indicates higher risk in treatment group
2. Factor that decreases risk
a. >0.5 = small effect
b. 0.5-0.2 = medium effect
c. <0.2 = large effect
i. Low # indicates lower risk in treatment group
3. If RR < 1.0: treatment group lowered event rate
4. If RR > 1.0: treatment group increased event rate
a. Harmed more than the control.
(Relative Risk = efficacy of a treatment)
b. In cohort studies, RR measures the strength/magnitude of association between an exposure and an outcome
c. Only measured where the actual incidence or risk of an event can be measured
i. RCTs and cohort studies
ii. CCS & XS cannot measure the actual incidence/risk in a population, and thus cannot measure RR
d. RR is favored by epidemiologists b/c it is fairly constant across different populations
i. Can be transported from one study to another
e. Limitations
i. Limited in clinical usefulness: fails to convey information on the likely effectiveness of clinical
ii. Is not symmetrical: the OR is symmetrical
iii. No measure of impact: not a very useful measure of the impact of a risk factor on a population
1. No info on frequency or prevalence of risk factor

3. Relative Risk Reduction

a. Applied in the context of a treatment that reduces the risk of some adverse outcome
i. It indicates the magnitude of the treatment effect in relative terms
1. <10% = small treatment effect

2. 10-30% = moderate treatment effect
3. >30% = large treatment effect
ii. A RRR of 38% would be interpreted as the death rate being 38% lower after the new treatment compared to the old treatment
iii. It represents the proportion of the original baseline (control) risk that is removed by the treatment
b. It is nothing more than a re-expression of the RR (hence they add to 1)
c. Commonly used in context of RCT (RRR = how much risk was removed)
d. More clinically important: more direct meaning
i. It indicates by how much, in relative terms, the event rate is decreased by the treatment
ii. RRR × Riskc = (Riskc - Riskt)

4. Absolute Risk Reduction (ARR)

a. It is simply the absolute difference in risks between the control and the treatment groups
i. Simple and direct measure of the impact of treatment
b. More clinically useful: it is a measure of the absolute benefit of treatment
i. Preferred measure when discussing the benefits of clinical intervention at an individual patient level
c. It will vary based on the baseline risk in the control group
i. At constant RRR, the ARR will vary based on the baseline risk
ii. The absolute benefit of treatment depends upon how much risk there is in the population before the treatment is applied
1. An ARR in one study CANNOT be transported to another
d. Null value is zero


Comparison of RR, RRR, and ARR

Clinically important: RR = least; RRR = middle; ARR = most
Portability: RR and RRR = yes (assumption: same for all); ARR = will vary b/w populations
Equation: RR = Riskt/Riskc; RRR = 1 - RR; ARR = Riskc - Riskt
Measures:
RR: strength of the effect of T relative to C (magnitude of association between exposure and outcome)
RRR: T reduces outcome by X amount (magnitude of the treatment effect; the proportion of the baseline that was removed by treatment)
ARR: absolute difference b/w T & C
Interpretation:
RR: "The risk of outcome is X times lower (or higher) in T compared to C"
RRR: "The outcome is X% lower in T compared to C"
ARR: "The outcome is (C - T) lower in T compared to C"
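
The three measures can be computed side by side. The sketch below uses hypothetical risks to show that a constant RRR gives very different ARRs at different baseline risks; the function name is an assumption for illustration.

```python
def effect_measures(risk_control, risk_treated):
    """Return (RR, RRR, ARR) from the risks in the control and treated groups."""
    rr = risk_treated / risk_control
    rrr = 1 - rr
    arr = risk_control - risk_treated
    return rr, rrr, arr

# Hypothetical trials with the same relative effect but different baseline risk.
high_rr, high_rrr, high_arr = effect_measures(risk_control=0.20, risk_treated=0.15)
low_rr, low_rrr, low_arr = effect_measures(risk_control=0.02, risk_treated=0.015)

print(f"High baseline: RR={high_rr:.2f}, RRR={high_rrr:.0%}, ARR={high_arr:.3f}")
print(f"Low baseline:  RR={low_rr:.2f}, RRR={low_rrr:.0%}, ARR={low_arr:.3f}")
```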

5. Number Needed to Treat (NNT)

a. The number of patients who would need to be treated in order to prevent one adverse effect
i. It is the amount of work required to take advantage of the potential clinical benefit of an intervention
ii. ↑ # : lots of work to gain any benefit
b. It is a simpler way to interpret absolute probabilities
i. Converts the probabilities into real numbers
c. As a result of relationship with ARR
i. Needs to be accompanied by a time frame
1. As ↑ time : ↑ ARR : ↓ NNT
ii. Will be influenced by baseline risk (baseline risk & RRR are inversely related to NNT)
1. As ↓ baseline risk : ↑ NNT
a. The less risk there is, the more people we need to treat to show anything
d. Also depends on the relative efficacy of the treatment (RRR)
i. As ↓ RRR : ↑ NNT
1. As treatment gets less effective, we need to treat more to get the same result

e. Use NNT and NNH in concert with each other to make decisions
i. Will describe in absolute terms the trade off in both benefits and harm
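
The link between NNT and ARR can be sketched using the standard definition NNT = 1/ARR, rounded up to a whole patient. That formula is not written out in the notes, so treat it as an assumption of this example. Exact fractions avoid floating-point rounding surprises; the risks are hypothetical.

```python
import math
from fractions import Fraction

def number_needed_to_treat(risk_control, risk_treated):
    """NNT = 1 / ARR, rounded up to the next whole patient."""
    arr = risk_control - risk_treated
    if arr <= 0:
        raise ValueError("treatment does not reduce risk; NNT is undefined")
    return math.ceil(1 / arr)

# Same 25% relative risk reduction, different baseline risks (hypothetical).
nnt_high_baseline = number_needed_to_treat(Fraction(20, 100), Fraction(15, 100))
nnt_low_baseline = number_needed_to_treat(Fraction(20, 1000), Fraction(15, 1000))

print(nnt_high_baseline, nnt_low_baseline)
```

A lower baseline risk at the same RRR means a smaller ARR and therefore a larger NNT.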

II. Population-based Measures of Effect

1. General Info
a. In order to understand the impact of a risk factor on the incidence of disease in a population, we need to know
i. The relative effect of the risk factor
ii. The prevalence of the risk factor in the population
b. In order to quantify the impact of the risk factors, we have the implicit assumption that the risk factor is a
cause of the disease.
c. PAR and PARF indicate the potential public health significance of a risk factor
i. Risk factor with big effect (RR =10) but is rare (P = 0.01%) has a PARF of 1%
ii. Risk factor with small effect (RR=2) but is common (P=40%) has a PARF of 44%.

2. Population Attributable Risk (PAR)
PAR = RD × Prevalence of Risk Factor (RD = risk difference)

a. The excess disease in a population that is associated with a risk factor
b. In other words, it is the excess disease (incidence) in the population that is caused by the risk factor

3. Population Attributable Risk Fraction (PARF)
PARF = PAR / Total Incidence

a. The fraction of total disease in the population that is attributable to the risk factor
b. In other words, it is the proportion of the total incidence in the population that is
attributable to the risk factor
i. Prevalence is the prevalence of the risk factor
c. It also represents the maximum potential impact of prevention efforts on the incidence of disease in the population if the risk factor was eliminated
d. Used in cohort studies

PAR: excess disease associated with prevalence of risk factor

PARF: proportion of total incidence that is attributable to the risk factor
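
The two formulas above (PAR = RD × prevalence, PARF = PAR / total incidence) can be sketched as follows. The risks and prevalence are hypothetical, and total incidence is taken as the prevalence-weighted average of the risks in the exposed and unexposed, which is an assumption of this example.

```python
def population_attributable_risk(risk_exposed, risk_unexposed, prevalence):
    """PAR = risk difference x prevalence of the risk factor."""
    return (risk_exposed - risk_unexposed) * prevalence

def population_attributable_risk_fraction(risk_exposed, risk_unexposed, prevalence):
    """PARF = PAR / total incidence in the population."""
    par = population_attributable_risk(risk_exposed, risk_unexposed, prevalence)
    total_incidence = prevalence * risk_exposed + (1 - prevalence) * risk_unexposed
    return par / total_incidence

# Hypothetical cohort: 5% risk if exposed, 1% if not, 25% of people exposed.
par = population_attributable_risk(0.05, 0.01, 0.25)
parf = population_attributable_risk_fraction(0.05, 0.01, 0.25)
print(f"PAR = {par:.3f}, PARF = {parf:.0%}")
```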

III. The Odds Ratio

OR = odds of exposure (cases) / odds of exposure (controls) = (a/c) / (b/d) = ad/bc
1. The Odds Ratio
a. Measure of effect of choice for case-control studies (CCS)
i. CCS is not able to quantify the actual incidence or risk of disease
b. Is a good approximation of the RR
i. When outcome of interest is rare (<10%), OR more closely approximates RR
1. Odds and probability are more alike when the risk is small
2. Odds ratio can only be interpreted as RR when baseline risk <10%
c. It is the odds of exposure in cases compared to the odds of exposure in controls
i. Describes both the magnitude and strength of an association between exposure and disease
1. Null value = 1.0
2. OR >1.0 = positive association between exposure and the disease
3. OR <1.0 = negative association between exposure and the disease
d. Calculate the ratio of odds of exposure in the cases (95/5) divided by the odds of exposure among the controls (56/44)

i. The odds of death due to lung cancer were 15.6 times higher in smokers compared to non-smokers
ii. Odds ratio is symmetrical: if you calculated the odds of disease among the exposed (a/b) and divided it by the odds of disease among the non-exposed (c/d), you would get the same odds ratio.

2. Limitations of the Odds Ratio

a. OR deviates from the true RR as the baseline risk in the untreated group increases
i. Noticeable once risk >10%; often the case in RCTs
b. OR deviates from the true RR as the treatment effect gets larger
c. OR is always further away from the null value of 1.0 than the RR
i. thus the treatment effect is always over-estimated (OR overestimates treatment effect)
d. There is nothing clinically intuitive about using the OR
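
The symmetry of the OR and its drift away from the RR can be sketched on 2x2 tables. Cell labels follow the OR = ad/bc layout used above (a = exposed with disease, b = exposed without, c = unexposed with disease, d = unexposed without). All counts are hypothetical cohort data, chosen so the RR is exactly 2 in both tables.

```python
def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = ad / bc."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """RR = risk in exposed / risk in unexposed (only valid for cohort-type data)."""
    return (a / (a + b)) / (c / (c + d))

rare = dict(a=10, b=990, c=5, d=995)       # baseline risk 0.5%
common = dict(a=400, b=600, c=200, d=800)  # baseline risk 20%

for label, table in (("rare outcome", rare), ("common outcome", common)):
    print(f"{label}: RR={relative_risk(**table):.2f}, OR={odds_ratio(**table):.2f}")

# Symmetry: odds of disease in exposed / odds of disease in unexposed gives the same OR.
disease_odds_ratio = (common["a"] / common["b"]) / (common["c"] / common["d"])
```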

Lecture 5 & 6: Statistics

I. Classical Hypothesis (Significance) Testing

1. General Overview
a. Concerned with making a decision about the value of an unknown parameter
b. Views experimentation as a decision making process
c. Null Hypothesis (Ho): no difference in the groups being compared with respect to the measured quantity of interest
d. Alternative Hypothesis (HA): the groups being compared are different
i. can be specified for direction (one-sided alternative) instead of any difference (two-sided alternative)
ii. if a difference is found, regardless of direction, it is called the treatment effect
iii. we can never prove Ha is true, we can only reject Ho
e. Process of testing null hypothesis consists of calculating the probability of obtaining the results observed
assuming the null hypothesis is true
i. This probability is known as the p-value
1. It is the probability of observing a test statistic at least as large as the one observed, under the assumption that the null hypothesis is true
2. P = probability of seeing a result at least this extreme, assuming the null hypothesis is true
3. The p-value is NOT the probability that Ho is true
f. Alpha (α): the significance level
i. By convention, set to 5%; can be altered to suit the researcher's needs.
g. If the p-value is less than alpha, the null hypothesis is rejected (as results this extreme are less likely under Ho than the threshold we define as significant).

2. Steps in Hypothesis Testing

a. Define the null hypothesis
b. Define the alternative hypothesis
c. Calculate the p-value: assuming the null hypothesis is true, this is the probability of obtaining the results found in the data
d. Reject or fail to reject the null hypothesis
i. If the probability of observing the actual data under the null hypothesis is smaller than the significance level (p < α), then we reject the null.
e. If the null is rejected, accept the alternative hypothesis

3. The T-test
a. Compares means between two groups using continuous data, assuming the data is normally distributed
b. Larger values of t result in smaller p values, which are more consistent with Ho being false
i. Numerator: larger differences in the means result in larger t values
1. ↑ difference : ↑ t : ↓ P : Ho being false
ii. Denominator: measure of the standard error of the difference
1. As sample size increases, the denominator decreases, and t increases
a. ↑ sample size : ↑ t : ↓ P : Ho being false
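
The numerator/denominator behavior can be sketched with the equal-variance (pooled) two-sample t statistic. The notes do not say which form of the t-test is meant, so the pooled form is an assumption here, and the data are hypothetical.

```python
import math
import statistics

def two_sample_t(group_a, group_b):
    """Pooled two-sample t statistic: difference in means / standard error of the difference."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

# Hypothetical measurements; both groups have the same spread.
treated = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
t_small = two_sample_t(treated, control)

# Same means and similar spread, but twice as many subjects per group.
t_large = two_sample_t(treated * 2, control * 2)

print(f"t with n=5 per group: {t_small:.2f}; t with n=10 per group: {t_large:.2f}")
```

Doubling the sample size shrinks the standard error, so the same difference in means produces a larger t.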

4. Type I (α) and Type II (β) errors

a. Type I (FP) error
i. Occurs when we determine a difference exists when there is not one
ii. A statistically significant p value is obtained
1. Even though there is no underlying difference between the groups being compared
iii. The rate at which false positives occur is the significance level (α)
1. Also known as the Type I error rate
2. Set at 5% b/c scientists are by nature cautious
a. Want to avoid false alarms: only find the person guilty beyond a reasonable doubt
3. This makes sense b/c every p-value under this 5% will be deemed significant and the null hypothesis will be rejected, even when there is no real difference. In other words, when Ho is true, about 5 times out of 100 the p-value will fall below alpha (0.05) by chance alone, giving a Type I error.

Type 1 Error (FP) = significance level (α)
Type 2 Error (FN) = beta (β)
b. Type II (FN) error
i. Occurs when we determine a difference does not exist when in fact it does
1. When we accept Ho even though it is false
ii. A statistically non-significant p-value is obtained
1. Even though there is a difference between the groups being compared
iii. The rate at which false negatives occur is beta (β)
1. Also known as the Type II error rate
2. Sample size estimates are based on setting beta at either 20% or as low as 10%
3. At beta = 20%, a real difference would be missed 20% of the time
iv. For smaller studies, the probability of a Type II error is a lot higher
c. α & β have an inverse relationship
i. As one increases, the other decreases

5. Power and Sample Size
Power = (1-β) = sensitivity = probability of correctly rejecting Ho when Ho is false
a. Power
i. The complement of the Type II error rate: power = (1-β)
ii. The probability of correctly rejecting Ho when Ho is false
1. The probability of the study finding a difference when a difference truly exists.
iii. Most studies have power = 0.8 or greater
iv. Power is analogous to sensitivity
v. Easiest way to increase power = ↑ sample size
b. 4 parameters
i. α (FP) error rate
1. a smaller alpha increases beta, which would lower the power of the study, making it harder to identify a real difference
a. a low α: more stringent test: harder to prove a difference exists (harder to reject Ho)
i. More likely to get a Type II error (β)
2. as ↓α : ↑β : ↓Power
ii. β (FN) error rate
1. the smaller the beta, the easier it will be to identify a difference
a. this can be accomplished by increasing the sample size or increasing
2. ↓β : easier to find a difference (↑ power)
iii. Effect Size
1. The magnitude of the treatment difference you are trying to detect
a. Bigger differences are easier to detect than smaller differences
b. Size does matter
2. The study will be powered to find the minimal clinically important difference (the smallest difference b/w 2 treatments that would be clinically beneficial)
iv. The variability of the data
1. The greater the variability in the data, the harder it will be to detect a difference
a. ↑ variability : ↓ power
2. It is harder to detect the true signal when there is a lot of noise to contend with
3. Also true with rare events (death, relapse in follow-up study)
a. rare outcome : ↓ power
c. Problem with low power studies
i. It is difficult to interpret negative results
1. No effect? Or was there a failure to detect a true effect b/c of too small #s or outcomes?
2. Low power studies also indicate imprecise measurements (wide CIs)
ii. Low power studies: ↑ Type II errors (β)
iii. Low power studies: no effect on Type I errors (α)
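
How the four parameters trade off can be sketched with a common normal-approximation formula for comparing two means: n per group = 2 × ((z for α + z for power) × SD / difference)². This formula is an assumption added for illustration (two-sided test, equal group sizes, known SD); it is not taken from the notes, and the inputs are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(alpha, power, sd, difference):
    """Approximate subjects needed per group to detect a difference in means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # power = 1 - beta
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return math.ceil(n)

baseline = sample_size_per_group(alpha=0.05, power=0.80, sd=10, difference=5)
stricter_alpha = sample_size_per_group(alpha=0.01, power=0.80, sd=10, difference=5)
more_power = sample_size_per_group(alpha=0.05, power=0.90, sd=10, difference=5)
smaller_effect = sample_size_per_group(alpha=0.05, power=0.80, sd=10, difference=2.5)
noisier_data = sample_size_per_group(alpha=0.05, power=0.80, sd=20, difference=5)

print(baseline, stricter_alpha, more_power, smaller_effect, noisier_data)
```

A stricter alpha, higher power, a smaller effect size, or more variable data each push the required sample size up.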

II. Estimation, Point Estimates, and Confidence Intervals

1. In a nutshell
a. Approaches statistical inference as a measurement exercise
i. Estimating the specific value and the precision in which the specific value is measured
b. Same info as p-value; however also gives
i. Size of treatment difference

ii. The precision of the estimated difference
iii. Information that aids in interpreting a negative result

2. Point Values and CIs

a. Point estimate: observed single best estimate
i. Conveys the magnitude of an effect
b. Confidence Interval: set of all possible values that are consistent with the data
i. It quantifies the precision
ii. CI = point estimate +/- (percentile of distribution × standard error)
1. Percentile of distribution: measure of confidence
2. Standard error = σ/sqrt(n)
a. ↑ N : ↓ standard error : narrower CI
iii. Not a uniform distribution: #s closer to the estimate are more likely
1. the 95% CI is symmetrical
a. (estimate + 1) has the same probability as (estimate - 1)
b. However, (estimate + 0.5) has a larger probability of occurring than (estimate + 1) b/c (estimate + 0.5) is closer to the point estimate, so it is more likely
iv. the further the point estimate from the null, the more extreme the p-value
1. At one extreme, or the positive end (in a study to prove an increased effect)
a. then p-value would be less than 0.05 (assuming 95% CI)
2. At the other extreme, or the negative end (in the same study to prove an increased effect)
a. then p-value would be much greater than 0.05 (assuming 95% CI)
v. All the values outside the 95% CI would be statistically significantly different from the point estimate at p<0.05
vi. If the CI includes the null value (0 for a difference, 1.0 for a ratio such as RR or OR): results are not statistically significant
1. However, results from these negative trials may still be clinically significant
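
The CI formula above can be sketched for a mean, assuming a normal distribution so that the percentile for 95% confidence is about 1.96. The point estimate, SD, and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(point_estimate, sd, n, level=0.95):
    """CI = point estimate +/- (normal percentile x standard error)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    standard_error = sd / math.sqrt(n)
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

# Same hypothetical estimate and SD, two sample sizes.
low_100, high_100 = confidence_interval(50.0, sd=10.0, n=100)
low_400, high_400 = confidence_interval(50.0, sd=10.0, n=400)

print(f"n=100: ({low_100:.2f}, {high_100:.2f})")
print(f"n=400: ({low_400:.2f}, {high_400:.2f})")
```

Quadrupling n halves the standard error, so the interval is half as wide.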

3. Clinical Relevance
a. Clinicians should only adjust their practices if there is a treatment difference and that treatment is large enough
to be clinically important.
b. With wide confidence intervals, clinicians can determine what they think is clinically important and then reach
conclusions appropriate for their practice.

III. Multiple Comparisons

1. When two identical groups of patients are compared, there is a chance (α) that a statistically significant p value will be
obtained (type I error)
a. When multiple comparisons are performed, the risk of one or more false-positive p values increases
i. If enough outcomes are examined, some will eventually be statistically significant by chance alone
2. Bonferroni Correction
a. Method for reducing the overall Type I error risk when making multiple comparisons
b. Divide the overall type I error risk desired (0.05) by the number of comparisons; the new value is the α for
each individual test
c. Controls the type I error risk, but reduces power (↑ type II error risk)
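
A short sketch of why multiple comparisons inflate the type I error risk and what the Bonferroni correction does about it. It assumes the comparisons are independent, which the notes do not state; the function names are illustrative.

```python
def familywise_error(alpha, k):
    """Chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(alpha, k):
    """Per-test alpha after dividing the overall alpha by k comparisons."""
    return alpha / k

k = 10
print(f"uncorrected: {familywise_error(0.05, k):.3f}")   # about 0.40
per_test = bonferroni_alpha(0.05, k)                      # 0.005 per test
print(f"corrected:   {familywise_error(per_test, k):.3f}")  # just under 0.05
```

The price of the stricter per-test α is lower power for each individual comparison.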

Lecture 7: Clinical Testing

I. Clinical Testing (Diagnostic Strategies)

1. Hypothetico-deductive reasoning
a. Diagnostic strategy that nearly all clinicians use most of the time
b. Steps
i. Formulate hypotheses for patient's primary problem
ii. First consider explanations that are most likely and/or those that are particularly harmful to miss
1. Simultaneously rule out those that would be particularly harmful or catastrophic and try to
rule-in those that are considered to be most likely.
iii. Continue until list is shortened and/or candidate disease has very high likelihood (>90%)
c. The list of possibilities is reduced by considering the evidence for and against each, discarding those which are
very unlikely and conducting further tests to increase the likelihood of the most plausible candidates

II. Clinical Test Characteristics

1. Se & Sp Overview
a. Due to inherent variability in biological systems

i. FPs and FNs will always occur
b. Interpretation of diagnostic results is essentially concerned with comparing the relative frequencies of the
incorrect results (FN/FP) to the correct results (TP/TN)
c. As tests normally have a continuum of values, positive and negative results are separated by a cut-off point that
differentiates between normal and abnormal
i. To the extent that the two populations (see figure) have similar
measurements, the test will not be able to discriminate between them
1. Degree of overlap = measure of the test effectiveness
a. Sp and Se quantify this
ii. ↑ overlap: ↓ discriminatory power of the test
d. The presence or absence of a disease must be determined by a gold standard

2. Sensitivity
a. Defined as, the proportion of individuals with disease that have a positive test
result or the ability of a test to detect a disease when it is present
i. The true-positive rate
1. Test positives divided by total disease positives
ii. Calculated from diseased individuals
iii. The conditional probability of being test positive given the disease is present
1. Se = P (T+|D+)
2. When the disease is present, the test will be positive
b. The more sensitive a test, the better the NPV
c. A perfectly sensitive test
(Summary: all diseased patients test (+); no FN results; all test-negative patients are disease free; a sizable portion of D- may still test positive)
i. Test recognizes all actual positives; it does not miss disease
ii. ↓ Type II error (FN): we won't miss the disease
1. Negative results rule out disease and should be reassuring
2. All negatives must be TNs
3. No FNs
iii. Does not tell you if disease is present
1. Test gives no information on false positives
iv. Three scenarios when high sensitivity tests should be selected
1. Early stages of work-up when large # of potential diseases are being considered
a. (-) result rules out that disease; helps narrow down choices
2. When there is an important penalty for missing the disease
a. TB, syphilis, etc
b. ↓ FNs: since they are treatable, we want to make sure we don't miss them
3. Probability of disease is relatively low (low prevalence)
a. Purpose is to discover asymptomatic disease
Patients with this | will have this test result | X% of the time (indicates Se)
Duodenal ulcer | History of ulcer, 50+ years, pain relieved by eating or pain after eating | 95%
Favorable prognosis following non-traumatic | Positive corneal reflex | 92%
↑ intracranial pressure | Absence of spont. pulsation of retinal veins | 100%
DVT | (+) D-dimer | 89%
Pancreatic cancer | (+) ERCP | 95%

d. It is the percentage of sick people who are correctly identified as sick

e. Most helpful when test is negative: rules out disease
i. (+) results will depend on the rate of FPs (specificity)
f. Used for screening in diseases with low prevalence
g. SnNOUT: a highly SeNsitive test, when Negative, rules OUT disease (↓ FNs)
i. SnNOUT = ↑ seNsitivity = ↓ FNs = ↑ NPV
3. Specificity
(SpPIN = ↑ sPecificity = ↓ FPs = ↑ PPV)
a. Defined as, the proportion of individuals without disease that have a negative test result or the ability of a
test to indicate non-disease when disease is not present.
i. The true-negative rate
1. Test negatives divided by total disease negatives
ii. Calculated from non-diseased individuals
iii. The conditional probability of being test negative given the disease is absent
1. Sp = P(T-|D-)
2. When the disease is absent, the test will be negative
iv. If Sp were 75%, the FP rate would be 25%
b. Low specificity
i. Will have lots of FPs
ii. We want to make sure we don't miss anything
(similar to airport screening)
iii. Analogous to high sensitivity

c. The more specific a test, the better the PPV
d. A perfectly specific test
(Summary: all non-diseased patients test (-); no FP results; all test-positive patients have disease; a sizable portion of D+ may still test negative)
i. Test recognizes all actual negatives; confirms health
ii. ↓ Type I error (FP): a positive result means you have it
1. Positive results rule in disease: true bad news
2. All positives must be TPs
iii. Does not tell you if disease is absent
1. Test gives no information on false negatives
iv. Two scenarios when high specificity tests should be selected
1. To rule-in a diagnosis that has been suggested by other tests
2. When FPs can harm the patient physically or emotionally
a. Confirmation of HIV or cancer
b. When we want to be absolutely sure a condition is present

Patients without this | will have this test result | X% of the time (indicates Sp)
Alcohol dependency | "No" to 3 or more of the 4 CAGE questions | 99.7%
Fe-deficiency anemia | (-) serum ferritin | 90%
Breast cancer | (-) fine needle aspirate | 98%
Strep throat | (-) pharyngeal gram stain | 96%

e. It is the percentage of healthy people that are correctly identified as healthy

f. SpPIN: a highly SPecific test, when Positive, rules IN disease (↓ FPs)
4. Trade off Between Sensitivity and Specificity
a. No such thing as a perfect test; must have a trade-off
b. For continuous scales, location of cut-off point is arbitrary
i. Can be modified for the purpose of the test
c. Lowering the cutoff point → ↑ Se but ↓ Sp
i. ↓ FN; ↑ FP
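
The trade-off can be sketched with two small made-up samples, assuming higher test values indicate disease and that a value at or above the cutoff counts as test positive. The data and the name `se_sp` are hypothetical.

```python
def se_sp(diseased, healthy, cutoff):
    """Sensitivity and specificity when values >= cutoff are test positive."""
    tp = sum(v >= cutoff for v in diseased)   # diseased correctly flagged
    tn = sum(v < cutoff for v in healthy)     # healthy correctly cleared
    return tp / len(diseased), tn / len(healthy)

# Hypothetical test values for 8 diseased and 8 healthy subjects
diseased = [90, 110, 120, 130, 150, 160, 180, 200]
healthy = [60, 70, 80, 85, 95, 100, 115, 125]

for cutoff in (140, 105, 75):
    se, sp = se_sp(diseased, healthy, cutoff)
    print(f"cutoff={cutoff}: Se={se:.3f}, Sp={sp:.3f}")
```

As the cutoff drops from 140 to 75, Se climbs from 0.5 to 1.0 while Sp falls from 1.0 to 0.25.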

5. Receiver Operator Characteristic (ROC) curves

a. Uses
i. Compare the accuracy of two or more tests
ii. Illustrate the trade-off b/w Se & Sp as the cut-point is changed
1. Slope of ROC curve (TP rate : FP rate) is the likelihood ratio
b. Constructed by plotting the sensitivity (true positive rate) against the false positive rate (1-specificity) for a series
of cut-points
c. Best discriminating tests lie further to the north-west (upper left) corner
i. ↓ FN rates (indicated by high Se)
ii. ↓ FP rates (indicated by high Sp)
d. To discriminate b/w diseased and non-disease individuals
i. Area under ROC curve (AUROCC) indicates the
overall accuracy of the test
1. 0.5 = no discriminating ability
2. 1.0 = perfect accuracy
e. Tests w/ no discriminating ability → diagonal straight line
i. Equivalent to likelihood ratio (LR) of 1.0
f. Deciding on cut point
i. Influenced by
1. Likelihood of disease (prevalence)
2. Relative costs (risk/benefit) associated
with errors in diagnosis
a. Includes both FN & FP
ii. If cost of missing a diagnosis (FN) is high
(compared to FP) we want a low FN
1. Operate on horizontal part of the curve (60 units)
a. FN result are minimized at the expense of FP results
b. Maximizes sensitivity (~90%) while providing reasonable specificity (~50%)
iii. If cost of falsely labeling a healthy person as diseased (FP) is high compared to the cost of
missing a diagnosis (FN) we want a low FP
1. Operate on the vertical part of the curve (320 units)
a. FP results are minimized at the expense of FN results
b. Maximizes specificity (~99%) while providing moderate sensitivity (~40%).
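
A rough sketch of how an ROC curve and its area can be built from raw test values: each distinct value is tried as a cut-point, and the area is summed with the trapezoid rule. It assumes higher values indicate disease; the data and function names are hypothetical.

```python
def roc_points(diseased, healthy):
    """(FP rate, TP rate) pairs, one per candidate cut-point."""
    cutoffs = sorted(set(diseased) | set(healthy)) + [float("inf")]
    points = []
    for c in cutoffs:
        tpr = sum(v >= c for v in diseased) / len(diseased)  # sensitivity
        fpr = sum(v >= c for v in healthy) / len(healthy)    # 1 - specificity
        points.append((fpr, tpr))
    return sorted(points)

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

diseased = [90, 110, 120, 130, 150, 160, 180, 200]
healthy = [60, 70, 80, 85, 95, 100, 115, 125]
print(f"AUC = {auc(roc_points(diseased, healthy)):.3f}")
```

Two identical samples give an area of 0.5 (the diagonal), and two fully separated samples give 1.0.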

III. Prevalence and Predictive Value

1. General Idea

a. Se & Sp can only be calculated if the true disease status is known
i. They are conditional on the disease status being either positive (Se) or negative (Sp)
b. Clinician is using test precisely b/c they do not know the disease state
i. We want to know the conditional probability of disease given a test result

2. Positive Predictive Value (PPV)

a. Defined as, the probability of the disease in a patient with a positive (abnormal) test
i. True Positives
b. The conditional probability of being diseased given that the test was positive
i. PVP = P(D+|T+)
ii. Sp & PVP are linked as both provide info on FP rate
c. Influenced by Sp and Prevalence
i. Decreases as prevalence decreases
1. The relative size of D+ (cell a) to D- (cell b) is now much smaller
2. ↓ prevalence = ↓ PPV
ii. Increases as specificity increases
1. The more specific the test, the fewer FPs there will be and thus, more likely those that test
positive will actually have the disease/condition
2. ↑ Sp = ↑ PPV
d. A highly specific test helps rule-in disease b/c PVP is maximized
i. SpPIN implies that 1-PPV is very small
1. ↓ # = they probably have it (as PPV is ↑)
2. ↑ # = less able to rule in disease (not all test positive patients require follow-up)
ii. ↑ test specificity = better the PPV

3. Negative Predictive Value (NPV)

a. Defined as, the probability of not having the disease when the test result is negative (normal)
i. True Negatives
b. The conditional probability of not being diseased given the test was negative
i. PVN = P(D-|T-)
ii. Se & PVN are linked as both provide info on FN rate
c. Influenced by Se & Prevalence
i. Increases as prevalence decreases
1. The relative size of D- (cell d) to D+ (cell c) is much larger
2. ↓ prevalence = ↑ NPV
d. A highly sensitive test helps to rule-out disease because PVN is maximized
i. ↑ test sensitivity = better NPV
e. Clinically, more interested in the complement of PVN (1-PVN)
i. = P(D+|T-)
ii. Tells the clinician what the probability is of having the disease despite testing negative
1. Informs on the rate of FNs among all negative test results
2. SnNOUT implies that 1-NPV is very small
a. ↓ # = they really don't have it (as NPV is ↑)
b. ↑ # = less able to rule out the disease (can't send a test-negative patient
home with a clean bill of health)
f. ↑ PVN = few FN among all test negative results
i. Indicates an alternative diagnosis should be sought

4. Prevalence
a. Represents the proportion of the total population tested that has the disease
b. It is the third force that often goes unnoticed before revealing
its influence in dramatic fashion
i. Has dramatic influence on PVN & PVP
c. AKA likelihood of disease, prior probability, prior belief, prior odds,
pre-test probability & pre-test odds.
d. As prevalence falls
i. ↓ PPV
ii. ↑ NPV
e. As prevalence increases
i. ↑ PPV
ii. ↓ NPV
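
The influence of prevalence can be checked numerically. The sketch below holds Se and Sp fixed at an assumed 90% each and varies only the prevalence; the function name `predictive_values` is illustrative.

```python
def predictive_values(se, sp, prev):
    """Return (PPV, NPV) for a given sensitivity, specificity, and prevalence."""
    tp = se * prev                # true positives (as a fraction of everyone)
    fn = (1 - se) * prev          # false negatives
    tn = sp * (1 - prev)          # true negatives
    fp = (1 - sp) * (1 - prev)    # false positives
    return tp / (tp + fp), tn / (tn + fn)

for prev in (0.50, 0.10, 0.01):
    ppv, npv = predictive_values(0.90, 0.90, prev)
    print(f"prevalence={prev:.2f}: PPV={ppv:.3f}, NPV={npv:.3f}")
```

With the same test, PPV drops from 0.90 at 50% prevalence to roughly 0.08 at 1% prevalence, while NPV rises toward 1.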

IV. Bayes' Theorem

1. General Idea
a. Calculating the predictive values for any combination of Se, Sp, & prevalence (using 2x2 tables)
b. The process by which disease probabilities are revised in face of new test information
c. Can be used to calculate PPV, NPV, Se, Sp, and Prev.

2. Example w/ Se = 90%, Sp = 80%, & prevalence = 10%

a. Make grid, fix N, and then calculate the expected number of diseased by applying the prevalence rate to N
b. Calculate D+|T+ (cell a) by multiplying # diseased by Se
i. Place remaining under D+|T- (cell c) as these are the
1. FNs = (1-Se)* # diseased
c. Calculate D-|T- (cell d) by multiplying # healthy by Sp
i. Place remaining under D-|T+ (cell b) as these are the
1. FPs = (1-Sp)* # healthy
d. Use the top row (cells a & b) to calculate PPV & the bottom
row (cells c & d) to calculate NPV
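
The grid steps above can be sketched as follows for Se = 90%, Sp = 80%, and prevalence = 10%. N = 1000 is an assumed convenient total and the function name `two_by_two` is illustrative; cells follow the a/b/c/d labels used in these notes.

```python
def two_by_two(se, sp, prev, n=1000):
    """Fill the 2x2 grid from Se, Sp, and prevalence, then derive PPV and NPV."""
    diseased = prev * n
    healthy = n - diseased
    a = se * diseased     # TP: D+ and T+
    c = diseased - a      # FN: D+ and T-
    d = sp * healthy      # TN: D- and T-
    b = healthy - d       # FP: D- and T+
    return {"a": a, "b": b, "c": c, "d": d,
            "PPV": a / (a + b), "NPV": d / (c + d)}

grid = two_by_two(se=0.90, sp=0.80, prev=0.10)
for name, value in grid.items():
    print(f"{name}: {value:.3f}")
```

For these inputs the grid is a = 90, b = 180, c = 10, d = 720, so PPV = 90/270 (about 33%) and NPV = 720/730 (about 99%).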
V. Multiple Testing Strategies

1. General Idea
a. Tests with sufficiently high Se & Sp that can simultaneously
rule out and rule in are very rare
i. We only have access to an array of imperfect tests
b. Get more information by combining tests

2. Parallel Testing
a. Situation where several tests are run simultaneously
i. Any one positive test leads to further evaluation
b. Used in early phases of work-up when trying to rule conditions out
i. Lots of negative tests = condition ruled out
c. Positive test only means that more testing is needed
i. b/c of ↑ in FP
d. Very costly, highly inefficient, dangerous to the patient, and is bad medicine
e. Best when need highly sensitive test (yet only have a couple insensitive tests)
i. Combining the relatively insensitive tests in parallel maximizes your chance of identifying
diseased subjects.
f. Net effect
i. ↑ likelihood of detecting disease
1. ↑ Se (there are multiple opportunities to find a positive test result)
2. ↑ PVN
ii. ↑ risk of FP
1. ↓ Sp
2. ↓ PVP
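
A sketch of the net effect for two tests run in parallel, where either positive counts as positive. It assumes the two tests err independently of each other given disease status, which is an idealization not stated in the notes; the example Se/Sp values are made up.

```python
def parallel(se1, sp1, se2, sp2):
    """Net (Se, Sp) when a positive on EITHER test counts as positive."""
    se = 1 - (1 - se1) * (1 - se2)  # missed only if both tests miss
    sp = sp1 * sp2                  # negative only if both tests are negative
    return se, sp

se, sp = parallel(0.60, 0.90, 0.70, 0.90)
print(f"parallel: Se={se:.2f}, Sp={sp:.2f}")
```

Two relatively insensitive tests (60% and 70%) combine to a net Se of 0.88, at the cost of Sp falling from 0.90 to 0.81.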

3. Serial Testing
a. Situation where several tests are run in order
i. Each subsequent test is only run if the prior test was positive
b. Any negative test → work-up stopped
c. Best used when
i. We want to be sure a disease is ruled in w/ certainty
1. Used when we have no time constraints
ii. If the definitive test is expensive, difficult, or invasive
1. To avoid overusing, we make sure the patient is positive to other tests before advancing
a. Example colonoscopy after (+) fecal occult blood test
d. Great example of the logic of Bayes' theorem to revise probabilities
i. The results of the first test are used to provide pre-test probabilities for the second test
e. Net effect
i. ↑ Sp & ↑ PPV b/c each case has to test positive to multiple tests
1. ↓ FPs
ii. ↓ Se & ↓ NPV
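
A sketch of serial testing, again assuming the tests err independently given disease status. It shows both the net Se/Sp and the Bayesian step in which the post-test probability from the first test becomes the pre-test probability for the second. The numbers (Se 90%, Sp 80%, prevalence 10%) are reused from the Bayes example for illustration only.

```python
def serial(se1, sp1, se2, sp2):
    """Net (Se, Sp) when BOTH tests must be positive to call disease."""
    se = se1 * se2                  # detected only if both tests detect
    sp = 1 - (1 - sp1) * (1 - sp2)  # false positive only if both are wrong
    return se, sp

def ppv(se, sp, pretest):
    """Post-test probability of disease after a positive result."""
    tp = se * pretest
    fp = (1 - sp) * (1 - pretest)
    return tp / (tp + fp)

after_first = ppv(0.90, 0.80, 0.10)          # prevalence is the first pre-test probability
after_second = ppv(0.90, 0.80, after_first)  # first result feeds the second test
print(f"after test 1: {after_first:.3f}, after test 2: {after_second:.3f}")
print("net Se, Sp:", serial(0.90, 0.80, 0.90, 0.80))
```

Two positives in a row raise the probability of disease from 10% to about 33% and then to about 69%, while net Se falls to 0.81 and net Sp rises to 0.96.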

Lecture 8: Screening and Prevention

I. Introduction

1. General Idea
a. Goal of screening: ↓ mortality & morbidity (and/or ↓ need for expensive or toxic treatment)
i. Is a form of secondary prevention
ii. Designed to detect disease early in asymptomatic phase
1. Early treatment either slows disease progression or provides a cure
iii. Premise is based on concept that early treatment will stop/retard disease progression
iv. Screening has both diagnostic and therapeutic components

2. Results of screening
a. Unlikely to have the disease (both FN & TN)
b. Likely to have disease; therefore requires further diagnostic evaluation

Testing | Screening
Sick patients are tested | Healthy, non-patients are screened
Diagnostic intent | No diagnostic intent
to disease prevalence | to disease prevalence

3. Two fundamentally different types of screening

a. Mass/population based: application of screening tests to large, unselected populations
i. Mammography screening for breast cancer in women <40 years
b. Case finding: use of screening by clinicians to identify disease in patients who present for other unrelated problems
i. Blood pressure measurements

II. Characteristics of Disease

1. Pre-clinical phase (PCP)

a. Defined as, the period between when early detection by screening is possible and when the clinical
diagnosis would usually have been made
b. Must be sufficiently long in order for a disease to be a suitable candidate for screening
i. The point that a typical person seeks medical attention depends upon availability of medical care, as
well as the level of medical awareness in the population
c. Examples
i. Long PCP → screening might be useful: colorectal cancer (PCP = 7-10 years)
ii. Short PCP → screening unlikely to be useful: childhood diabetes (PCP = weeks to months)
d. The prevalence of detectable pre-clinical disease in a population is a critical determinant of the
potential utility of screening
i. Prevalence of disease itself is not the critical component
ii. Depends on
1. Incidence of disease
2. Average duration of pre-clinical phase
3. Recent screening (↓ prevalence)
4. Detection capabilities of the test (↑ sensitive test = ↑ prevalence)

III. Lead Time

1. General Idea
a. Defined as, the interval from detection by screening to the time at which diagnosis would have been
made without screening
i. It is the central rationale of screening as it equals the amount of time by which treatment is
advanced or made early
ii. Results in longer awareness of disease
b. Does not necessarily imply any improved outcome
i. After lead time has occurred, early treatment must then be effective for screening to be beneficial

2. Lead time is not a theory or a statistical artifact
a. It is what is expected w/ an early diagnosis
b. It must occur for screening to be worthwhile
i. It is therefore a necessary condition for screening to be effective in reducing mortality
3. Distribution of lead time is important
a. It indicates the length of time by which detection and treatment must be advanced in order to achieve a
level of improved mortality
b. Suggests how often screens should be done

IV. Characteristics of Screening Tests

1. Sensitivity
a. Defined as, the proportion of cases with a positive screening test among all cases of pre-clinical disease
b. In order to be accurate, all pre-clinical disease individuals must be identified w/an acceptable gold standard
diagnostic test
i. The true disease status of negative screening individuals is impossible to verify
1. No justification for a full diagnostic work-up
a. Excellent example of verification bias
c. The more sensitive a test, the better the NPV
d. Imperfect sensitivity affects a few (the cases)
i. An imperfectly sensitive test will have a lot of FNs, so a lot of diseased people will be
classified as negative; thus, it is affecting the cases
e. Can only be estimated in screening studies by counting the # of interval cases that occur over a
specified period in persons who tested negative to the screening test
i. In other words, count the people who got the disease but tested negative
1. ↑ false negatives (FNs) = ↓ screening Se

2. Specificity
a. Defined as, the ability of screening test to designate as negative people who do not have pre-clinical disease
b. Imperfect specificity affects many (the healthy!)
i. An imperfectly specific test will have lots of FPs, so a lot of healthy people will be classified
as positive; thus, it is affecting healthy people
c. The more specific a test, the better the PPV
d. For screening to be feasible the FP rate (1-Sp) needs to be sufficiently low
i. Since prevalence of pre-clinical disease is always low, the positive predictive value (PPV) will be low in
most screening programs
1. ↓ pre-clinical prevalence → ↓ PPV; thus, we need ↓ FP
ii. PPV can be improved by
1. screening only high risk populations
2. using a lower frequency of screening (which ↑ pre-clinical prevalence)
a. repeated screening will catch everyone
i. ↓ pre-clinical prevalence
1. No one else to catch
ii. PVP will ↓ in a successful screening program
1. Fewer people to catch

3. Yield
a. Defined as, the amount of previously unrecognized disease that is diagnosed and brought to treatment
as a result of screening
b. Affected by
i. the sensitivity of the screening test
1. ↓ Se = smaller fraction of diseased individuals are detected at any screening
ii. Pre-clinical disease prevalence in the population
1. ↑ pre-clinical prevalence = ↑ yield
a. Aiming screening programs at at-risk populations will ↑ efficiency

V. Evaluation of Screening Outcomes

1. Methods
a. Experimental
i. Conduct an RCT of the screening modality
1. compare the disease specific cumulative mortality rate
a. groups randomized to screening
b. control
2. allows one to study effects of early treatment
3. estimate distribution of lead times
4. identify prognostic factors
ii. randomized design critical
1. eliminating confounding (known & unknown)
2. allowing a valid comparison between groups

a. unaffected by time bias
1. Expensive, time, ethical concerns
b. Non-experimental
i. Cohort: comparison of advanced illness or death rate b/w people who choose to be screened and
those who do not
ii. CCS: comparison of screening history b/w people w/ advanced disease/death and those unaffected
iii. Ecological: correlation of screening patterns and disease experience of several populations
c. Problems with non-experimental
i. Confounding due to health awareness
1. Those that choose to get screened are more health conscious and have lower mortality
ii. Poor quality, often retrospective data
iii. Difficult to distinguish screening from diagnostic examinations

2. Measures of effect
a. Comparison of survival experience/duration
i. the efficacy of a screening program cannot be assessed by comparing the duration of survival of
screen detected cases versus cases diagnosed clinically
1. Although common, they over-estimate the effect of screening
a. Selection bias: patients who choose to get screened are more health
conscious, better educated, and have an inherently better prognosis
i. may also occur when subjects decide to get screened b/c they have
b. Lead time bias: screen-detected cases will survive longer even without benefit of
early treatment
i. Simply b/c they were detected earlier!
ii. Survival is increased due to lead time
c. Length-biased sampling: screen-detected cases represent a sample of
cases prevalent in the asymptomatic pre-clinical phase
i. It is not simply a sample of all cases in a population
ii. screening preferentially identifies slow growing, indolent cases that
have a long pre-clinical phase
1. Slow growing tumors will obviously have a better prognosis
as they have a long pre-clinical phase and a long clinical phase
b. Disease-specific mortality rate (DSMR)
i. The only true valid measure of the efficacy of a screening program is to conduct a randomized
screening trial where the DSMRs are compared b/w groups assigned screening or no screening
ii. Unlike survival time, the DSMR will not be changed by early diagnosis/lead time
1. DSMR accurately reflects benefit of screening
iii. One major problem with DSMR
1. Within the confines of a randomized screening trial, the specific cause of death is usually
assigned by an adjudication committee
a. Since they get all the information they need to properly figure out the cause of
death, they can pretty much figure out what screening group they were in
i. Study becomes un-blinded
ii. Tendencies in a breast cancer screening trial
1. Deaths in the mammography group tend to be judged not breast cancer related
2. Deaths in the control group tend to be over-diagnosed as having breast cancer as
cause of death
2. Debate is now if the ideal measure should be all-cause mortality
a. Not subject to these biases
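
A small arithmetic sketch of why survival time is misleading while a mortality measure is not. It assumes a single hypothetical patient whose age at death is unchanged by screening (that is, early treatment has no effect), with a lead time of 3 years; all ages are made up.

```python
def survival_years(age_at_diagnosis, age_at_death):
    """Survival measured from diagnosis to death."""
    return age_at_death - age_at_diagnosis

age_at_death = 70          # same with or without screening in this scenario
clinical_diagnosis = 67    # age at which symptoms would prompt diagnosis
lead_time = 3              # years by which screening advances diagnosis
screen_diagnosis = clinical_diagnosis - lead_time

print("survival, clinically diagnosed:", survival_years(clinical_diagnosis, age_at_death))
print("survival, screen detected:     ", survival_years(screen_diagnosis, age_at_death))
print("age at death in both cases:    ", age_at_death)
```

Survival doubles from 3 to 6 years purely because the clock started earlier. The patient still dies at 70, so a mortality rate comparison shows no benefit.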

VI. Pseudo-disease or Over-Diagnosis

1. General Idea
a. A potential negative side-effect of screening is pseudo-disease or over-diagnosis
i. Identifying disease that wouldn't become clinically apparent if not for screening
b. Involves three forms
i. Over-diagnosis
1. Cases detected that would never have progressed to a clinical state
a. Ex. Cancer cases w/limited malignant potential; PSA testing and low-grade
prostate cancer; mammography and ductal carcinoma in situ
b. Is an extreme form of length-biased sampling
2. Pap testing
a. ↓ incidence of invasive cervical cancer
b. ↑ in overall incidence of cervical cancer b/c of over-diagnosis of carcinoma in situ
ii. Competing risks

1. Cases are identified whose clinical course would have been interrupted by an unrelated death
a. Identification of prostate cancer in an 85 year old man who would have died of
iii. Serendipity
1. The identification of disease due to non-related diagnostic test
a. Chest X-ray for TB identifies lung cancer

VII. Feasibility & Need for Screening

1. Prevention paradox
a. Occurs when a majority of the patients come from a low to moderate risk pool (low to moderate
hypertension) while only a few come from a high risk pool (extreme hypertension). Also seen in mothers
who have Down Syndrome babies (A majority of Down Syndrome babies come from younger, lower risk
mothers than the older, higher risk mothers)
i. Paradox = It is common and logical to equate high risk populations with a majority of the
b. A preventative measure may provide a large benefit to the community at-large, but very little to the individual. It
explains how the absolute benefit provided by a preventive action to the individual can be small, yet, collectively
the benefit may be significant. Example, if everyone in a community always used a seatbelt, over the lifetime
one subject out of 400 would be saved from dying in a motor vehicle accident. The net benefit on an individual
level is small, but it is large when judged from the community level.
2. Other important Issues
a. Assessability
i. Program should be convenient, free of discomfort, efficient, and economical
b. Efficiency
i. ↓ PPV = wasteful program
1. Most of the test positive individuals will not have the disease
ii. ↑ PPV = normally good
1. Can still be associated w/only a few cases detected and thus, only a small reduction in
overall mortality
iii. No reduction in mortality if
1. Mortality from the disease is normally low
2. Risk of death from other causes is high (in the aged)
c. Cost-effectiveness
i. Should these health dollars be spent on this program?
1. Most population based screening programs are about 30-50K/year of life saved

3. Evaluating the Need and feasibility of screening

a. Divide the subjects who would develop the condition you are targeting into one of the following three groups
i. A cure is necessary, but not possible (Nec, NotPos)
1. If target population is death from lung cancer, these subjects are going to die regardless →
screening would not be beneficial
ii. Cure is possible but not necessary (Pos, NotNec)
1. If target subjects includes those who develop lung cancer but will not die of it (over-diagnosis;
will die of something else before dying of lung cancer) → screening would not
be beneficial
iii. Cure is necessary and maybe possible (Nec, Pos)
1. This target group is the only group that can benefit from screening. It represents the cases
of lung cancer that individuals would have died from if not for the screening program
b. It is helpful to consider the relative sizes of these three groups
i. A reasonable estimate can be made on knowledge of the natural history of disease, the potential of
the intervention to identify the condition early, potential treatment to impact the outcome, and the
potential to identify undiagnosed but benign disease

Lecture 9: The RCT

I. Introduction to the RCT

1. General Idea
a. Experimental study conducted on clinical patients
b. Investigator controls everything
i. The exposure type, amount, and duration
ii. Randomization who receives what
c. Most scientifically rigorous study
i. Groups are equivalent w/respect to baseline prognosis
1. The unpredictable random assignment eliminates/reduces confounding from known and
unknown prognostic factors.
ii. No biased measurements

1. Blinding ensures that outcomes are measured with the same degree of accuracy and
completeness in every patient
iii. Main potential biases are selection and measurement and they are small compared to cohort,
d. Can confidently attribute cause and effect
i. As a result of the conditions, the presence or absence of treatment is the only thing that differed
between the two groups
1. thus, any effect is a result of the respective treatment
ii. Has a high internal validity
1. The experimental design ensures that strong cause and effects conclusions can be drawn
from the results
e. Gold standard to determine the efficacy of treatment

2. Two Types of RCTs

a. Prophylactic trials
i. Evaluate the efficacy of an intervention designed to prevent disease
1. Vaccine, vitamin supplement, patient education, screening
b. Treatment trials
i. Evaluate efficacy of a curative drug or intervention
1. Designed to manage/mitigate signs and symptoms of disease

3. Levels of RCTs
a. Individual level highly select group, tightly controlled conditions
b. Community level large groups, less rigidly controlled conditions
i. Test interventions for primary prevention purposes

II. An Overview of the RCT Design and Process

1. Inclusion criteria
a. Done in order to optimize
i. The rate of the primary outcome
ii. The expected efficacy of the treatment
iii. The generalizability of the results
iv. The recruitment, follow-up, and compliance of patients
b. Goal is to identify sub-population in whom the intervention is feasible and will produce the desired effect
i. Choice of inclusion criteria represents a balance between
1. Picking the people who are most likely to benefit
2. Not sacrificing the generalizability of the study
ii. If too restrictive → study population is so unique, it can't be applied to other populations

2. Exclusion criteria
a. Valid reasons for excluding patients that would mess-up the study
i. The risk of treatment/placebo is unacceptable
ii. Treatment is unlikely to be effective for the respective patient
1. Disease is too severe, too mild, or treatment has already failed in the patient
iii. Co-morbidities: interfere w/ intervention, measure of outcome, expected length of follow up (terminal
iv. Compliance: patient unlikely to complete follow-up or adhere to protocol
v. Other practical reasons
1. Language, cognitive barriers, no phone
b. Goal is still to identify sub-population in whom the intervention is feasible and will produce the desired effect
i. Avoid excessive exclusions
1. Will add to complexity of screening process (every patient needs to be assessed, so
↑ exclusions = ↑ time) and ultimately decrease recruitment
a. ↑ exclusions = ↑ complexity = ↓ recruitment
ii. Trade off between
1. Patients more likely to make the study a success
2. Sacrificing generalizability
a. If too restrictive, won't apply to real world
b. ↑ internal validity; ↓ external validity

3. Baseline Measurements
a. Necessary to collect sufficient (but not excessive) baseline data to show that the randomization process
resulted in comparable groups
b. Baseline info to be collected
i. Demographics of participants
1. Important to demonstrate that randomization process worked
ii. Contact & identifying info from patient and contact info from friends, family, etc
1. Important to track subject during study → prevents loss-to-follow up

iii. Major clinical and prognostic factors for the primary outcome that can be evaluated in pre-specified
subgroup analyses
1. If we thought treatment effect would depend on gender, we would collect info on gender

4. Randomization and Concealment

a. Randomization process should be reproducible, unpredictable, & tamper proof
i. The process of generating the randomization scheme & steps taken to ensure concealment should be
described in detail
ii. The scheme itself should be unpredictable (can't predict what group the next person is going to be in)
1. Unpredictability is assured through concealment
a. Concealment critical in preventing selection bias (the potential for
investigators to manipulate who gets what treatment)
b. Results in balance of known and unknown confounders
c. Randomization and concealment are separate from blinding
i. Concealment ≠ Blinding
1. Concealment (before study has begun): designed to prevent selection bias
2. Randomization: prevents confounding bias
3. Blinding (after study has begun): reduces measurement bias
ii. They are mutually independent (you can have one w/o the other)
d. Randomization schemes
i. Simple: coin flip
ii. Blocked Randomization: randomization done w/in blocks of 4-8 subjects
1. Ensures that there is an equal balance b/w groups
iii. Stratified Blocked Randomization: strata are defined according to a critically important
factor/subgroup (gender, disease severity, or study center)
1. Ensures balance b/w groups and within each subgroup
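
A minimal sketch of blocked randomization for two arms, assuming permuted blocks of a fixed size with equal allocation. The arm labels, block size, and function name are illustrative; a real trial would also need the concealment steps described above.

```python
import random

def blocked_randomization(n_subjects, block_size=4, arms=("A", "B"), seed=None):
    """Assign subjects to arms in shuffled blocks so the arms stay balanced."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seeded generator makes the scheme reproducible
    schedule = []
    while len(schedule) < n_subjects:
        block = list(arms) * (block_size // len(arms))  # equal arms per block
        rng.shuffle(block)                              # order within block is random
        schedule.extend(block)
    return schedule[:n_subjects]

schedule = blocked_randomization(16, block_size=4, seed=2024)
print(schedule)
print("A:", schedule.count("A"), "B:", schedule.count("B"))
```

For stratified blocked randomization, the same function could be called once per stratum (for example, once per study center), which keeps the arms balanced within each subgroup.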

5. Intervention
a. Important to balance potential benefits vs. risks of intervention
i. Everyone is exposed to potential side effects of interventions
ii. Yet, not everyone will benefit from the intervention
1. Not everyone will develop the outcome; no intervention is ever 100% effective
iii. Thus, caution dictates using the lowest effective dose
b. RCTs designed on premise that serious side effects will occur much less frequently than the outcome
i. Thus, RCTs are very under-powered to detect side effects (rare events = low power)
1. Phase IV post-marketing surveillance studies are around to check for serious side effects once
drugs make it to market; b/c many more people are exposed, power increases, so rare but serious side
effects can be uncovered.
c. Control group measures the cumulative effects of all other
factors except for the treatment
i. Spontaneous improvements due to natural history
ii. Hawthorne effect: subjects improve simply b/c they
are being studied
iii. Placebo effect: it's in your head

6. Blinding (masking)
a. Cardinal feature of RCT
i. It preserves the benefits of randomization by
preventing biased assessment of outcomes
b. Blinding = prevents measurement bias
c. Helps reduce non-compliance, contamination, & cross-overs
i. Especially true in control group (they are unaware
that they are not getting the active treatment)
d. Types
i. Single blind: either patient or physician is blinded
ii. Double blind: both patient and physician are blinded
iii. Ideally, all are blinded: patients, caregivers, collectors of outcome data, adjudicators of the outcome data, & the
data analyst
e. Placebo
i. Defined as any agent or process that attempts to mask, or blind, the identity of the true active treatment
ii. Common feature of drug trials
iii. Especially important when primary outcome measure is non-specific ("soft")
1. Soft = patients self-reporting pain, nausea, depression, etc
2. Placebo effect: the tendency for such "soft" outcomes to improve in study
participants (regardless of control vs. treatment)
a. The effect is regarded as the baseline against which to measure the effect of
active treatment
iv. Placebos are not justified when a known standard of care already exists

1. Must give patients the minimum standard of care
a. Ex. In stroke prevention trial w/ anti-platelet drugs, aspirin would be the minimum
standard of care that would be used as the control group
f. Many times blinding/placebo are not feasible (surgical interventions)
i. Difficult to mask who got surgery and who didn't
ii. Study referred to as an "open" trial
iii. Blinding may be hard to maintain
1. When treatment has clear and obvious benefit or side effect
a. Turns urine orange, etc
iv. In such cases, use "hard" outcomes to standardize treatment/data collection
1. Blinding not feasible: use hard outcomes (any-cause death)

7. Loss to follow up, non-compliance, & Contamination

a. General Idea
i. All patients should be accounted for throughout the trial
ii. All three of these are potential problems with RCTs
1. Will reduce the number of patients, so this reduces power
2. If occurs in a non-random fashion, will introduce bias
iii. Will eventually translate to the slow, prolonged death of the trial
1. The original RCT degenerates into an observational (cohort) study
a. The active/compliant participants in each arm are no longer under the control of
the study investigator.
iv. Trialists go to great lengths to reduce these problems
1. Try to enroll patients who are more likely to be compliant and not LTFU
a. Use two screening visits (to weed out time wasters) and pre-randomization run-in periods for
drug trials (to weed out early non-adherers)
2. Once patients are enrolled, imperative to minimize LTFU by tracking and following up on them
b. Loss to follow-up (LTFUs)
i. Normally related to outcomes of interest
1. LTFU more likely if side effects, patient moved away, got worse, got better, or simply lost interest
2. Death also is a cause for LTFU
3. LTFU = smaller sample size = lower power of the study
ii. If final outcome of subjects LTFU is unknown, that patient cannot be included in the final analysis
1. Can have significant negative effect on the study's conclusions
a. Lower power (smaller sample size)
b. Biased results (differential LTFU, i.e., not equal b/w groups)
iii. Negative effects of LTFU CANNOT be easily corrected by the intention-to-treat analysis
1. w/o knowledge of final outcome status, these patients must be dropped from the analysis.
iv. Major problem with trying to do an intention-to-treat-analysis (see below)
c. Poor Compliance
i. Can be related to outcomes of interest
1. Presence of side-effects, iatrogenic drug reactions, patient got better/worse or simply lost interest
ii. Non-compliance = expected to have worse outcome (than those who comply)
1. Regardless of what treatment group they were assigned to
iii. Important to assess the degree of non-compliance and the degree to which it differentially affects one
arm of the study vs. the other
d. Contamination
i. Defined as, the situation when subjects cross-over from one arm of the study into the other; thereby
contaminating the initial randomization process
ii. Ex. Early AIDS RCTs; patients got treatments assayed by private labs to see if they were getting
the placebo or active drug (AZT). If they were getting the placebo, they would buy AZT on the street
e. Solutions
i. Intention-to-treat-analysis (ITT)
1. Defined as, the principle that ALL participants are analyzed according to their
original randomization group or arm regardless of protocol violations
a. All subjects should be included in both numerator and denominator of group
event rates
2. Gold standard for RCT
3. LTFUs make ITT very problematic
a. If subject included in denominator but not numerator
i. The event rate in that group will be underestimated
b. If subjects simply dropped from the study (which is common)
i. Analysis can be biased
1. De facto per protocol (PP) analysis if dropped b/c
outcome status is unknown

c. In order to mitigate problems
i. Trials will impute/extrapolate an outcome for the missing data
1. Using the last or worst observation
2. Attempting to predict the unobserved outcome based on the
characteristics of the subject
ii. Results should always be viewed with caution
ii. "5 & 20" rule
1. Technique to assess the likely impact of poor compliance & LTFU; the percentages are
those of the study participants affected by LTFU or non-compliance
a. If it affects <5% = bias is minimal
b. If it affects >20% = bias is likely to be considerable
iii. best case/worst case sensitivity analysis
1. Assess potential impact of LTFU
2. Best case
a. LTFU subjects assumed to have best outcome (no adverse outcomes)
b. Event rates calculated counting all LTFU in denominator but not in the numerator
3. Worst case
a. LTFU subjects assumed to have worst outcome
b. Event rates calculated counting all LTFU in both numerator and denominator
4. Overall potential impact is then gauged by comparing the actual results with the range of
findings from the best case/worst case sensitivity analysis
a. A wide range of estimates implies the study's findings are questionable
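
As a rough illustration of the best case/worst case sensitivity analysis described above, the sketch below computes the event rate for one study arm three ways. The counts (20 events among 180 subjects followed, 20 LTFU) and the function name `ltfu_sensitivity` are hypothetical and chosen only for the arithmetic.

```python
def ltfu_sensitivity(events_observed, n_followed, n_ltfu):
    """Return (complete-case, best-case, worst-case) event rates for one arm."""
    n_randomized = n_followed + n_ltfu
    complete = events_observed / n_followed           # LTFU subjects simply dropped
    best = events_observed / n_randomized             # LTFU in denominator only
    worst = (events_observed + n_ltfu) / n_randomized # LTFU in numerator and denominator
    return complete, best, worst

# Hypothetical arm: 200 randomized, 20 LTFU (10%, between the 5% and 20% thresholds)
complete, best, worst = ltfu_sensitivity(events_observed=20, n_followed=180, n_ltfu=20)
print(f"complete-case {complete:.3f}, best {best:.3f}, worst {worst:.3f}")
```

Here the event rate could lie anywhere from 10% (best case) to 20% (worst case); repeating this for each arm shows whether the trial's conclusion survives across that range.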

8. Measuring outcomes, Sub-group analyses, and Surrogate end points

a. 1° & 2° (primary & secondary) study outcomes (w/ associated definitions) should be defined before the study is started
i. Termed "a priori" or pre-specified comparisons
b. Best outcomes are hard, clinically relevant end points (disease rates, death, recovery, complications,
hospital/ER use)
i. Need to be measured w/ accuracy, precision, and measured in the same manner in both groups
ii. Outcomes need to be clinically relevant to the patients themselves
1. Death, recovery, complications = patient-important or patient-relevant outcomes
c. Surrogate end points
i. Hard outcomes cannot always be used
1. It takes too long to measure disease mortality
2. Thus, these end points are used to reduce the length and/or size of the intended study
3. Should be based on validated, biologically relevant endpoints
ii. Need to ensure that the surrogate end point used is an adequate measure of the real outcome of interest
iii. Ideally, a prior RCT should prove that the end point is a valid surrogate measure for the real
outcome of interest
1. Ex. Study designed to reduce stroke incidence
a. Degree of BP reduction is considered a valid surrogate end point b/c of the
known causal relationship b/w BP & stroke risk
d. Pre vs. post-hoc sub-group analyses
i. Sub-group analyses
1. Defined as, the examination of the primary outcome among study sub-groups
defined by key prognostic variables such as age, gender, race, disease severity, etc
2. Identifies whether treatment has different effect w/in specific sub-populations
a. Differences in efficacy of treatment b/w sub-groups may be described in terms of
a treatment-by-sub-group interaction
3. All sub-group analyses need to be PRE-SPECIFIED ahead of time
a. Authors of trials that didn't show a positive result have a natural tendency to
go "fishing" for any positive results w/in the subgroup comparisons
b. Leads to multiple comparisons = higher potential for false positives (FPs)
i. Thus, post-hoc (non pre-specified) comparisons should be
regarded as exploratory findings that should be re-examined in
future RCTs as pre-planned comparisons
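
To see why "fishing" through sub-groups inflates false positives, a small sketch can compute the chance of at least one false positive across k comparisons. It assumes the comparisons are independent and each is tested at alpha = 0.05, which is a simplification; the function names and the Bonferroni correction shown are illustrative additions, not part of the lecture.

```python
def familywise_error_rate(n_comparisons, alpha=0.05):
    """P(at least one false positive) across independent tests, all nulls true."""
    return 1 - (1 - alpha) ** n_comparisons

def bonferroni_alpha(n_comparisons, alpha=0.05):
    """Per-comparison threshold that keeps the overall error rate at or below alpha."""
    return alpha / n_comparisons

for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(k), 3), bonferroni_alpha(k))
```

Under these assumptions the chance of at least one spurious "significant" sub-group is roughly 23% with 5 comparisons, 40% with 10, and 64% with 20.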

9. Statistical Analyses
a. Should be straightforward, as the design should have created balance b/w all the factors except for the treatment
i. Simple matter of comparing 1° outcomes b/w groups
1. Continuous data: t-test
2. Categorical outcomes: chi-square
3. Small samples or non-Gaussian data: non-parametric methods
4. Survival-type studies: Kaplan-Meier survival curves or Cox regression modeling (will
measure the fraction of patients living for a certain amount of time after treatment)
b. Intention-to-Treat Analysis (ITT)
i. Most important concept for RCT

ii. Compares outcomes based on the original treatment arm that each individual participant was
randomized to regardless of protocol violations
1. Violations include ineligibility, non-compliance, contamination, or LTFU
iii. Results are the most valid, but conservative estimate of the true treatment effect
1. Approach is the truest to the principles of randomization (which sticks to the perfectly
comparable groups at study outset)
2. However, ITT cannot fix the problem of LTFU unless the missing outcomes are imputed
using a valid method (which can never be fully verified)
a. Thus, no amount of statistical analysis can fix the problem that the final
outcome is unknown for a sub-set of subjects
c. Per Protocol (PP)
i. Fundamentally flawed
ii. Persons dropped
1. Those in treatment arm who did not get treated
2. Control subjects who got treated
iii. Analyzed
1. Only those who complied w/ the original randomization
iv. Answers the question as to whether the treatment works among those who complied
1. It can never provide an unbiased assessment of the true treatment effect
a. The decision to comply w/ treatment is unlikely to occur at random
2. Basically the same thing as when, during an ITT analysis, subjects are dropped b/c of
unknown outcome
v. Aka. Efficacy, exploratory, or effectiveness analyses
d. As Treated (AT)
i. Fundamentally Flawed
ii. Analyzed
1. Everyone, according to whether subjects actually got the treatment or did not (regardless of which group they
were originally assigned to)
iii. Basically the same as analyzing the trial as if a cohort study had been done; completely destroys
any of the advantages afforded by randomization
1. Everyone decided themselves whether to get treated or not
iv. Published studies use AT when studies do not show a positive ITT analysis
1. Have to ask, what was the point of doing the trial in the first place if you ended up doing an
AT analysis? AT approach is w/o merit
e. Example
i. Note the very high death rate in the 26 subjects who were slated for surgery but received medical treatment
1. At baseline, were probably a very sick group of patients who died before surgery or were
too sick to undergo it
ii. Note the 50 subjects who should have gotten medical treatment but got surgery instead and their
much lower death rate
1. At baseline, these men were probably healthier, so impossible to judge the relative merits of
surgery based on this info
iii. Analysis
1. ITT: surgery has a small, significant benefit
2. PP or AT: surgery has a larger and statistically significant benefit
a. This estimate is biased!
i. The 26 high risk subjects were either dropped from the surgery
group (PP) or moved to the medical group (AT, which makes
medical look much worse)
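
The difference between the three analyses can be made concrete with a small sketch. Only the cross-over group sizes (26 and 50) come from the example above; every other count and all death counts below are hypothetical, chosen so that the cross-over groups behave as described (very sick vs. relatively healthy). The `cells` layout and `death_rate` function are illustrative names.

```python
# (assigned arm, arm actually received): (subjects, deaths)
# Hypothetical counts, except the 26 and 50 cross-over group sizes
cells = {
    ("surgery", "surgery"): (374, 15),
    ("surgery", "medical"): (26, 6),   # slated for surgery, got medical treatment
    ("medical", "medical"): (350, 28),
    ("medical", "surgery"): (50, 2),   # slated for medical treatment, got surgery
}

def death_rate(arm, analysis):
    """Death rate for one arm under ITT, PP, or AT analysis."""
    n = deaths = 0
    for (assigned, received), (subjects, died) in cells.items():
        if analysis == "ITT":    # grouped by original randomization
            include = assigned == arm
        elif analysis == "PP":   # only those who complied with their assignment
            include = assigned == arm and received == arm
        elif analysis == "AT":   # grouped by treatment actually received
            include = received == arm
        else:
            raise ValueError(f"unknown analysis: {analysis}")
        if include:
            n += subjects
            deaths += died
    return deaths / n

for analysis in ("ITT", "PP", "AT"):
    print(analysis,
          f"surgery {death_rate('surgery', analysis):.3f}",
          f"medical {death_rate('medical', analysis):.3f}")
```

With these made-up counts the surgery vs. medical gap is smallest under ITT and widens under PP and AT, b/c the 26 high-risk subjects are either dropped from the surgery arm or counted against the medical arm.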

10. Meta-analyses
a. Assessing trial quality, trial reporting, and trial registration; improve effect size estimates (narrow the CI)
b. Meta-analyses fast becoming the undisputed king of the evidence based tree
c. Three important implications for RCTs
i. Assessment of study quality
1. b/c of the variability in the quality of published RCTs, meta-analysts will attempt to assess
their quality to determine whether the quality of a trial has an impact on the overall results
2. all approaches focus on
a. a description of randomization process
b. the use of concealment & blinding
c. a description of the LTFU and non-compliance rates
ii. Trial reporting
1. Reports on quality assessment of trials (using Jadad scale or similar tool)
2. If trial is of marginal or poor quality
a. Probably did not report info on key quality criteria
i. Randomization, concealment, blinding, LTFU
b. Not sure if author simply failed to mention these or simply did not follow them
3. Led to development of specific guidelines for the reporting of clinical trials
a. CONSORT Statement: aims to make sure trials are reported in a consistent
fashion and that specific descriptions are included so the validity can be
independently assessed
iii. Trial registration
1. Big problem for meta-analyses is potential for publication bias
2. Results from meta-analyses can be seriously biased if there is a tendency to not
publish trials w/ negative or null results
a. Thus, when we collect data, we are collecting relatively much more positive data
than what is truly representative
i. The negative data is hidden
b. Unpublished negative trials
i. Either small (low power)
ii. Or large, drug company sponsored trials
1. Company doesn't want to release info
3. International Committee of Medical Journal Editors (ICMJE)
a. Requires that for a trial to be published in any of these journals, it must
have been registered prior to starting it
i. Thus, the scientific community will then have a registry of all trials undertaken on
a respective subject

11. Equivalence and Non-inferiority Design

a. As many conditions have a standard treatment that makes use of a placebo trial ethically unacceptable, new
drugs need to be compared to this active control/standard treatment
i. However, it is increasingly difficult to prove that a new drug is better than an existing drug
b. Alternative approach = prove that the new drug is NO WORSE than the active control (w/in a given
tolerance or equivalence margin)
i. Increasing emphasis at the federal level on the conduct of comparative effectiveness trials
1. Trials done directly to compare alternative treatments
c. Equivalence Trials
i. Most often used to prove a new drug is equivalent to an existing
standard drug w/in a given tolerance or equivalence margin
1. Most often used in generic drug development
a. Prove similar bioavailability, pharmacology
d. Non-inferiority Trials
i. Designed to prove that a new drug is no less effective than an existing
standard drug
1. One-sided equivalence test
2. Interest in non-inferiority trials assumes the new drug has other advantages

a. Better safety profile (fewer side effects, less monitoring)
b. Easier dosing schedule
i. Better compliance
c. Cheaper
3. May involve the evaluation of the same drug given using a different strategy, dose, or
ii. Methodological challenges
1. Null hypothesis is opposite that of typical superiority trial
a. Superiority trial
i. Ho: new drug = active control
ii. Ha: new drug ≠ active control
b. Non-inferiority trial
i. Ho: new drug + equivalence margin < active control
1. The active control is substantially better than the new drug
2. Rejecting Ho = new drug is not inferior to the active
control within the bounds of the equivalence margin
ii. Ha: new drug + equivalence margin ≥ active control
2. Equivalence margin = how much worse we are willing to accept the new drug to be
a. Set by clinically deciding how big a difference there would have to be between
the two drugs before we would decide that the drugs are clinically not equivalent
b. It is the critical determinant of the success of the trial & its sample size
c. Smaller #s = more conservative
d. Larger #s = more liberal
iii. Other problems/limitations of non-inferiority trials
1. Assay sensitivity
a. A poorly conducted trial may falsely show that the 2 drugs are equivalent
i. Poor trial conduct (compliance, follow-up, blinding, etc) will favor the finding of equivalence
2. Blinding
a. Vital step to reduce measurement bias in superiority trials
b. It cannot protect against the investigators giving the same outcomes/ratings to all subjects
i. Thereby showing non-inferiority
3. ITT analysis
a. ITT is the gold standard in superiority trials
b. ITT in non-inferiority trials tends to bias towards finding non-inferiority
i. Including non-compliers in treatment/control tends to minimize the
differences b/w groups
1. Thus, this can show an inferior drug to be non-inferior
c. PP analysis can introduce bias in either direction
i. Not recommended as it can compound the problem
d. Best bet = do both ITT & PP and hope findings are consistent
i. Even so, accepting Ha doesn't rule out possibility of bias
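
One common way to carry out the one-sided non-inferiority test is to check whether the lower confidence bound for (new drug - active control) stays above the negative of the equivalence margin. The sketch below is a simplified illustration that assumes a binary "success" outcome, a normal approximation for the difference in proportions, and a one-sided alpha of 0.025; the function name and all counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def noninferiority_check(success_new, n_new, success_ctrl, n_ctrl, margin, alpha=0.025):
    """Return (difference, lower bound, non-inferior?) for new minus control."""
    p_new = success_new / n_new
    p_ctrl = success_ctrl / n_ctrl
    diff = p_new - p_ctrl
    # Standard error of a difference in two independent proportions
    se = sqrt(p_new * (1 - p_new) / n_new + p_ctrl * (1 - p_ctrl) / n_ctrl)
    z = NormalDist().inv_cdf(1 - alpha)
    lower = diff - z * se
    # Reject Ho (new + margin < control) when the lower bound clears -margin
    return diff, lower, lower > -margin

# Hypothetical trial: 425/500 successes on the new drug vs. 430/500 on the active control
for margin in (0.10, 0.05):
    diff, lower, ok = noninferiority_check(425, 500, 430, 500, margin)
    print(f"margin {margin:.2f}: diff {diff:.3f}, lower bound {lower:.3f}, non-inferior: {ok}")
```

The same data pass with a liberal margin of 0.10 but fail with a more conservative margin of 0.05, which illustrates why the margin is the critical determinant of the trial's success.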

III. Advantages & Disadvantages of RCTs

1. Advantages
a. High internal validity
b. Control of exposure (amount, timing, frequency, duration)
c. Randomization
i. Ensures balance of factors that could influence outcome
1. controls the effect of known and unknown confounders
d. A true measure of efficacy
2. Disadvantages
a. Limited external validity
b. Artificial environment
i. Strict eligibility criteria and conducted in specialized tertiary care medical centers
1. Greatly limits generalizability
c. Difficult/complex to conduct
i. Takes time and is expensive
d. Limited scope due to ethical concerns
i. Mostly therapeutic/preventive only

Lecture 10&11: XS, Cohort Studies, & Case Control Study

I. General Info

1. Overview

a. Observational study = investigator has no control over exposure
b. Descriptive
i. Case reports & case series (Clinical)
1. Profile of a clinical case or case series which should
a. Illustrate a new finding
b. Emphasize a clinical principle
c. Generate a new hypothesis
2. It is not a measure of disease occurrence
3. As there is no control or comparison group, we usually cannot identify risk factors or
the cause
a. Exception: 12 cases w/ salmonella infection, 10 had eaten cantaloupe
ii. Cross-sectional (Epidemiological): prevalence, or collecting data
c. Analytical
i. Cohort
ii. Case-control
iii. Ecological
1. We don't know individual exposures, but the people who are affected are related, so we study the
relationship (e.g., workers and asbestos)

2. Risk Factor
d. Heard daily with cholesterol (heart disease), HPV (cervical cancer), cell phones (brain cancer), TV watching
(childhood obesity), etc
i. However, association does not mean causation
ii. Ex. Almost perfect overlap b/w the CHD and non-CHD groups in the
distribution of serum cholesterol among men who did and did not
develop coronary heart disease
1. Even though cholesterol is a proven risk factor
2. If you are just given one # for an individual, it is
difficult to predict outcome b/c of the almost perfect overlap
e. How are risk factors used
i. Identifying individuals/groups at risk
1. Even though ability to predict future disease in
individual patients is very limited (even for well
established risk factors like cholesterol/CHD), it
still helps identify populations
ii. Causation: causative agent vs. a marker
iii. Establish pretest probability (Bayes' theorem)
iv. Risk stratification
1. Helps to identify target populations (age >40 for mammography screening)
v. Prevention
1. Remove causative agent to prevent disease

3. Causation vs. Association

a. The relationship between a risk factor and disease can be due to
i. The risk factor being a cause of disease = causative agent
ii. The risk factor is NOT a cause but merely associated w/ the
disease = a marker
b. Need to guard against thinking, A causes B when really, B causes A
i. Called reverse causation

4. Prevention
c. Removing a true cause = lower disease incidence
i. Decrease aspirin use = fewer cases of Reye's syndrome
ii. Discourage prone position while infants are sleeping
1. "Back to Sleep" = fewer cases of SIDS

5. Observational Studies
d. XS, Cohort, and CCS are all analytical observational studies

II. Cross Sectional Studies

1. General Idea
a. Exposure & Outcome at the same time
b. Also called prevalence study
i. Prevalence measured by conducting a survey of the population
of interest
c. Mainstay of descriptive epidemiology
i. Patterns of occurrence by time, place, and person
ii. Estimate disease frequency (prevalence) and time trends

iii. Used when trying to get a handle on an idea, i.e., trying to get clues on the origin of disease by looking at these patterns
d. Useful for
i. Program planning
ii. Resource allocation
iii. Generate hypotheses

2. How
a. Select sample of individual subjects and report disease prevalence (%)
b. Can also simultaneously classify subjects according to exposure and disease status to draw inferences
i. Describe association using the Odds Ratio (OR)

3. Examples
a. Prevalence of asthma in school-aged children in MI
b. Trends and changing epidemiology of hepatitis in Italy
c. Characteristics of teenage smokers in MI
d. Prevalence of stroke in Olmsted County, MN

4. Advantages/Disadvantages

Prevalence study: no time dimension. Does not measure incidence (no RR or AR); only use Odds Ratio.

a. Advantages
i. Quick
ii. Inexpensive
iii. Useful
b. Disadvantages
i. Uncertain temporal relationship
ii. Survivor effect
iii. Low prevalence due to
1. Rare disease
2. Short duration

III. Cohort Study

Exposure → Outcome. Incident study: finds incidence through nature.
Good for rare exposure; not good for rare outcomes.
Has very high confounding bias (selection and measurement bias also occur).

1. General Idea
a. Exposure → Outcome
b. Is a group w/ something in common (an exposure)
c. Start with disease-free at-risk population
i. They are susceptible to the disease of interest
ii. Have control and non-control
d. Determine eligibility and exposure status
e. Follow-up and count incident status
f. Very similar to RCT; however, in cohort studies exposures are chosen by
nature rather than by randomization

2. Types of Cohort Studies

a. Population based (one sample)
i. Select entire population (N) or a known fraction of the population (n)
ii. p(exposed) in population can be determined
iii. Exp (+) = IDR exposed; Exp (-) = IDR unexposed
b. Multi Sample
i. Select subgroups with known exposure
1. Ex. Smokers vs. non-smokers; coal miners vs. uranium miners
ii. p(exposed) in population cannot be determined
iii. Multi-sample cohort studies are done in occupational studies
1. Firemen and cancer risk?
2. Exp (+) = firemen; Exp (-) = non-firemen (police)

3. Relative Risk
a. The standard measure of association in cohort studies
b. Describes the magnitude and direction of the association
c. Incidence can be measured as IDR or CIR
d. Interpretation
i. RR = 1.0: null
ii. RR = 2.0: risk is twice as high in exposed vs. non-exposed
iii. RR = 0.5: risk in exposed is half that in non-exposed

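[Figure: RR scale marked 0, 0.2, 0.5, 1, 2, 3, 4, 5, 6, labeled from left to right Big, Moderate, Small, Moderate, Big; the effect is small near RR = 1 and gets bigger in either direction]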
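
A minimal sketch of the relative risk calculation, using cumulative incidence in each group. The cohort counts are hypothetical and the function names are illustrative; the attributable risk (AR, the risk difference) is included b/c it is the other incidence-based measure a cohort study can provide.

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """RR = incidence in exposed / incidence in non-exposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)

def attributable_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """AR = incidence in exposed - incidence in non-exposed."""
    return cases_exposed / n_exposed - cases_unexposed / n_unexposed

# Hypothetical cohort: 30 cases among 1000 exposed, 15 cases among 1000 non-exposed
rr = relative_risk(30, 1000, 15, 1000)
ar = attributable_risk(30, 1000, 15, 1000)
print(f"RR = {rr:.2f}, AR = {ar:.3f}")  # risk is twice as high in the exposed
```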

4. Sources of Cohorts

a. Geographically defined groups
i. Framingham, MA heart study
b. Special resource groups
i. Medical plans (Kaiser Permanente), Medical professionals (Physicians' Health Study, Nurses' Health
Study), Veterans, College Grads
c. Special Exposure Groups
i. Occupational exposures
1. Lead workers, uranium miners
a. If everyone was exposed, then you need an external (non-exposed) cohort for
comparison purposes
i. Compare lead workers to car assembly workers
5. Cohort Design Options
a. Variation in timing of exposure and disease measurement
b. Types
i. Prospective
ii. Historical: look back at the entire cohort (exposure) and see who gets the outcome
iii. Retrospective
1. Go back in time to figure out exposure
2. Comparing exposure and non-exposure
3. Doesn't happen often, but sweet way to do it.
4. Examples
a. Aware of cases of fibromyalgia in women within a large HMO. Go back and
determine who had silicone breast implants (past exposure). Compare incidence
of disease in exposed and non-exposed.
i. Go back and look
1. Who had fibromyalgia
2. Who had silicone breast implants
ii. Case control study would look like this
1. Ask women w/fibromyalgia if they had silicone implants
2. Determine a control group & then only compare the 2 groups
(no population comparison)
b. Framingham Study: used frozen blood bank to determine baseline levels of hs-CRP
and then measure incidence of CHD by risk groups (quartiles)
i. They measured the CRP levels in the blood from the '60s and then
figured out who currently had CHD
5. We know that the population is composed of cases and non-cases. In case control
studies we find the cases/controls and then track backward to determine exposure.
In retrospective cohort studies, we start by figuring out exposure retroactively and
then track them forward to figure out if they developed into cases
6. Need complete population data in order to do this
a. need to classify everyone


6. Population Attributable Risk (PAR & PARF)
a. General idea
i. Important question for public health
1. How much can we lower disease incidence if we intervene to remove this risk factor?
2. We want to know how much disease did an exposure cause in a respective population
ii. PAR & PARF assume that the risk factor is causal (it caused the disease)
b. Population Attributable Risk
i. The incidence of disease in a population that is associated with a risk factor
ii. PAR = (attributable risk) × (P), where P = prevalence of the risk factor in the population
iii. Equals the excess incidence of X in the population due to risk factor Y
c. Population Attributable Risk Fraction
i. The fraction of disease in a population that is attributed to a risk factor
ii. PARF = PAR/total incidence
1. = P (RR-1)/[1+P(RR-1)]
a. A risk factor w/ a small RR but a large prevalence can cause more disease in a
population than a risk factor with a big RR and a low prevalence
b. High prevalence & low RR can trump low prevalence & high RR
iii. Represents the maximum potential impact on disease incidence if risk factor was removed
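
The PAR and PARF formulas above can be checked numerically with a short sketch. The two risk factors compared below (one common with a small RR, one rare with a big RR) use hypothetical values, and the function names are illustrative.

```python
def par(incidence_exposed, incidence_unexposed, prevalence):
    """PAR = attributable risk x prevalence of the risk factor (P)."""
    return (incidence_exposed - incidence_unexposed) * prevalence

def parf(prevalence, rr):
    """PARF = P(RR - 1) / [1 + P(RR - 1)]."""
    excess = prevalence * (rr - 1)
    return excess / (1 + excess)

# Hypothetical risk factors
common_weak = parf(prevalence=0.50, rr=2.0)   # common exposure, small RR
rare_strong = parf(prevalence=0.01, rr=10.0)  # rare exposure, big RR
print(f"common/weak PARF = {common_weak:.3f}, rare/strong PARF = {rare_strong:.3f}")

# Consistency check: PARF also equals PAR / total incidence
i_unexposed, rr, p = 0.01, 2.0, 0.50
i_exposed = rr * i_unexposed
total_incidence = p * i_exposed + (1 - p) * i_unexposed
print(round(par(i_exposed, i_unexposed, p) / total_incidence, 3))
```

With these values the common, weak risk factor accounts for about a third of the disease in the population, while the rare, strong one accounts for under a tenth.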

7. Bias
a. Selection Bias
i. Can occur when the cohort is first assembled
1. Patients assembled for the study differ in many ways other than the exposure under
study and these factors may determine the outcome
a. Ex. Only the uranium miners at the highest risk for lung cancer (smokers, prior
family history) agree to participate.
ii. Can occur during the study
1. Differential LTFU in exposed vs. non-exposed

a. LTFU doesn't occur at random
iii. It's basically inevitable
b. Confounding Bias
i. As the exposure of interest is not assigned at random & other risk factors may be associated
w/ both the exposure and the disease, confounding bias can occur in these cohort studies
ii. Confounding bias is the big one for cohort studies

8. Advantages/Disadvantages
a. Advantages
i. Can measure disease incidence, can study the natural history of disease, provides strong
evidence of a causal association b/w E/D (b/c time order is known), provides info on time lag,
multiple diseases can be examined, good choice if exposure is rare (assemble special exposure
cohort), generally less susceptible to bias vs. CCS
b. Disadvantages
i. Takes time, large samples, is expensive, complicated to implement and conduct, not useful for
rare diseases/outcomes, problems of selection bias (assembling at start and LTFU during) &
prolonged time period compounds LTFU, and confounding

IV. Case Control Studies (CCS)

Effect → cause. Begin with the OUTCOME (case) and then GO BACK (recall bias) looking for ODDS OF EXPOSURE.
Good for RARE OUTCOMES; cohort for rare exposures.
Cannot calculate incidence (no RR or AR).
High bias for everything (RECALL, selection, confounding, & measurement).

1. General Idea
a. An alternative observational design to identify risk factors for a disease/outcome
i. Two samples are selected
1. Patients who had developed the disease in question
2. Otherwise similar people who did not develop the disease in question
ii. Match a case (45-year-old female) with a control (45-year-old female)
1. Distribution of age and gender are the same b/w the groups, so they can no longer confound
iii. They already have the outcome
b. Question: how do diseased cases differ from non-diseased (controls) w/ respect to prior exposure
c. In the population we find those that are diseased, and then match controls that are not diseased. In
other words, we figure out cases and controls first and then figure out if exposures occurred.
i. We know that the population is composed of cases and non-cases. In case control studies we find
the cases/controls and then track backward to determine exposure. In retrospective cohort
studies, we start by figuring out exposure retroactively and then track them forward to figure
out if they developed into cases
d. Compare the frequency of exposure among cases and control
i. Effect → cause
e. Cannot calculate disease incidence rates b/c the CCS does not
follow a disease free population over time
f. For CCS, all the cases had the outcome
g. Basically, we identify cases and then look backward to find
causes of disease (& non-disease)
i. Look for common exposure
ii. Still set up a control group & then look back at that group as well

2. Nested CCS
a. Study of MHG in infants
i. Not only did they look forward to see how the infants were
affected, they set up a control group in both those w/MHG &
those w/o MHG & looked back for exposures

3. Examples of CCS
a. Outbreak investigations (what is causing young women to die of
toxic shock)
b. Birth defects (drug exposures and heart teratology)
c. New (unrecognized) disease (DES and vaginal cancer in adolescents)

4. Essential features of CCS design

a. Directionality: outcome to exposure
b. Timing: retrospective for exposure, but case ascertainment can be either retrospective or prospective
c. Rare/new disease: design of choice if disease is rare or if a quick answer is needed
i. Cohort design is not useful
d. Challenging: the most difficult type of study to design and execute
e. Design options

i. Selection of cases
1. Requires case definition
a. Need for standard diagnostic criteria, consider severity of disease, and consider
duration of disease (prevalent or incident case?)
2. Requires eligibility criteria
a. Area of residence, age, gender, etc
ii. Sources of cases
1. Population based
a. Identify and enroll all incident cases from a defined population
i. Ex. Disease registry, defined geographic area, vital records
2. Hospital based
i. Popular in USA b/c we don't have good national/regional databases
b. Identify cases where you can find them
i. Hospitals, clinics
ii. Issues of representativeness, prevalent or incident cases?
iii. Selection of controls
1. Controls reveal the normal/expected level of exposure in the population that gave
rise to the cases
2. Should have the same eligibility criteria as the cases
3. Issue
a. Control comparability to cases: concept of the study base
i. Controls should be from the same underlying population
ii. Need to ask: if the control had developed the disease,
would s/he be included as a case in the study?
1. If no, then don't include as a control
iv. Sources of controls
1. Population based
a. Ideal as it represents the exposure distribution in the general population
b. However, if there is a low participation rate, response bias is likely (selection bias)
2. Hospital based
a. Used when population based controls are not feasible
b. Much more susceptible to bias
c. Advantages
i. Similar to cases? (it is a hospital after all), more likely to participate,
and efficient (they're in a hospital)
d. Disadvantages
i. Are they representative of the study base?
ii. They already have some kind of disease/co-morbidity
1. Don't select if risk factor for their disease is similar to the
disease under study (COPD & lung cancer)
3. Other sources
a. Relatives, neighbors, friends (of cases)
i. Advantages
1. Similar to cases and more willing to cooperate
ii. Disadvantages
1. More time consuming, may not be willing to give info, may
have similar risk factors

5. Analysis of CCS
a. The only valid measure of association for the CCS is the Odds Ratio (OR)
b. Under reasonable assumptions (the rare disease assumption), the OR approximates the RR
c. Odds Ratio
i. Odds of exposure among cases = a/c
ii. Odds of exposure among controls = b/d
iii. Similar interpretation as RR
iv. Provides the same information as RR if
1. Controls represent the target population
2. Cases represent all cases
3. Rare disease assumption holds
a. Or if the CCS is undertaken w/ population based sampling
v. OR can be calculated for any design
1. OR is the only valid measure for the CCS
2. RR can only be calculated for RCT & cohort studies
3. Publications will occasionally mislabel OR & RR
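
As a worked illustration of the odds ratio described above, here is a minimal Python sketch using the 2x2 layout implied by these notes (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls). The function name and the counts are hypothetical and only for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a = exposed cases,    b = exposed controls
    c = unexposed cases,  d = unexposed controls
    """
    odds_cases = a / c       # odds of exposure among cases
    odds_controls = b / d    # odds of exposure among controls
    return odds_cases / odds_controls  # algebraically equal to (a*d)/(b*c)


# Hypothetical counts, for illustration only
example_or = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {example_or:.2f}")
```

An OR above 1 means the odds of exposure are higher among cases than among controls. The sketch does not guard against zero cells.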

6. Confounding
a. Exposure of interest may be confounded by a factor that is associated with the exposure and
is an independent risk factor for the disease
b. Can be controlled
i. At the design phase
1. Randomization, restriction, matching
ii. At the analysis phase
1. Age-adjustment, stratification, multivariable adjustment
c. Matching
i. Used to control an extraneous variable by matching controls to cases on a factor you know is
an important risk factor or marker for disease
1. Age (w/in 5 years), sex, neighborhood
ii. If the factor is fixed to be the same in both the cases and controls, then it can't confound the association
iii. Analysis of matched CCS needs to account for the matched case-control pairs
1. Only pairs that are discordant with respect to exposure provide useful information
2. McNemar's OR = b/c
a. Case (+/-) vs. Control (+/-), with the matched pairs entered in a 2X2 table
i. Each cell entry is a pair of one case and one control
ii. Concordant pair = both smokers or both non-smokers
iii. Discordant cells = contribute to the Odds Ratio
1. b = case is a smoker, control is not
2. c = case is a non-smoker, control is a smoker
b. Only discordant pairs give any information
iv. Can increase power by matching more than one control per case
1. 4 controls to 1 case = more power
2. Useful if few cases are available
v. Over-matching
1. Matching can result in controls being so similar to cases that all exposures are the same
a. Ex. 8 cases of GRID (LA county 1981) in which all cases were gay men, so they
were matched using a 4:1 matching ratio to other gay men who did not have
signs of GRID (32 controls)
i. No differences found in sexual or other lifestyle habits
d. Recall Bias
i. Presence of disease may affect ability to recall or report the exposure
1. Is a form of measurement bias
2. Ex. OTC drug use during pregnancy as reported by moms of normal and congenitally
abnormal babies
a. It's pretty hard to remember if/when you may have taken a Tylenol
ii. To lessen potential
1. Blind participants and study personnel to study hypothesis
2. Use explicit definitions for exposure
3. Use controls w/ an unrelated but similar disease
a. Ex. Heart tetralogy (cases), hypospadias (controls)
e. Reverse Causation
i. The disease, or sub-clinical manifestations of it, results in a change in behavior (exposure)
ii. Ex. Obese children found to be less physically active than non-obese children
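
The matched-pair analysis described under Matching above (McNemar's OR = b/c, using only discordant pairs) can be sketched in Python as follows. This is a minimal sketch: the function name, the pair representation, and the counts are hypothetical, and b is taken to be the pairs where only the case is exposed.

```python
def matched_odds_ratio(pairs):
    """Matched-pair (McNemar) odds ratio.

    pairs: iterable of (case_exposed, control_exposed) booleans,
           one tuple per matched case-control pair.
    """
    pairs = list(pairs)  # allow generators; we iterate twice
    # b: case exposed, control unexposed (discordant)
    b = sum(1 for case, ctrl in pairs if case and not ctrl)
    # c: case unexposed, control exposed (discordant)
    c = sum(1 for case, ctrl in pairs if ctrl and not case)
    if c == 0:
        raise ValueError("no pairs with only the control exposed; OR undefined")
    return b / c  # concordant pairs are ignored


# Hypothetical data: 100 matched pairs
pairs = ([(True, True)] * 30      # concordant: both smokers
         + [(True, False)] * 25   # discordant: only the case smokes
         + [(False, True)] * 10   # discordant: only the control smokes
         + [(False, False)] * 35) # concordant: neither smokes
print(matched_odds_ratio(pairs))
```

Changing the number of concordant pairs leaves the result unchanged, which is the sense in which only discordant pairs carry information.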

7. Advantages/Disadvantages

a. Advantages
i. Quick & cheap (relatively)
1. So ideal for outbreaks
ii. Can study rare or new diseases
iii. Can evaluate multiple exposures
b. Disadvantages
i. Uncertain of E → D relationship
1. Especially uncertain of timing
ii. Cannot estimate disease rates
iii. Worry about representativeness of controls
iv. Inefficient if exposures are rare
v. Bias: selection, confounding, measurement

Measures that can be calculated, by study design:

Measure      RCT   Cohort   CCS   XS
Odds Ratio   Yes   Yes      Yes   Yes
RR           Yes   Yes      No    No
AR (RD)      Yes   Yes      No    No
PARF         No    Yes      No    No

[Table: selection, confounding, and measurement bias by study design (RCT, Cohort, CCS, XS); cell entries not recoverable]

Cohort (incidence) | Case Control | Cross-Sectional (prevalence)
Begins with a defined population at risk | Population at risk may be undefined | Begins with a defined population
Cases not selected but ascertained by surveillance | Cases selected by investigator from an available pool of patients | Cases not selected but ascertained by a single examination of the population
Controls (the comparison group) are not selected, they evolve naturally | Controls selected by investigator to resemble cases | Noncases include those free of disease at the single examination
Exposure measured before the development of disease | Exposure measured, reconstructed, or recollected after development of disease | Exposure measured at the same time as disease
Risk, incidence of disease, & RR measured directly | Risk & incidence cannot be measured directly | Risk & incidence cannot be measured directly
(none) | RR can be estimated by the odds ratio | RR can be estimated by the odds ratio

EPI 547
I. Random Stuff

1. Bias
a. It is a deviation from the truth (Grimes, Bias and Causal Associations, Lancet 2002)
b. Systematic Error (Bias) - Error in study design which may skew the results, leading to a deviation from the truth
i. This is when all measurements are consistently too high or too low.
1. Ex: A spectroscopy machine consistently gives high readings because it wasn't calibrated.
c. Three broad classes of Bias
i. Confounding
1. Factor that distorts the true relationship of the study variable of interest by being related to both
a. The outcome of interest
b. The study variable
c. Confounding = bias that we can control
2. Two mechanisms
a. Confounding by Indication
i. Intractable problem where prognostic factors influence treatment decisions
1. Problematic when evaluating treatment effects from
observational data
ii. Ex. Asthma studies of the 1980s showed an association between a
β-agonist (Fenoterol) and death from asthma
1. However, it was argued that patients who had more severe
asthma were at a higher risk of mortality from
asthma and thus were more likely to be prescribed Fenoterol in
the first place.
2. The severity of the disease confounds the association
between the drug and the adverse outcome
b. Channeling effect (bias)
i. Tendency for clinicians to prescribe certain treatments based on a
patient's underlying prognosis or comorbidity profile
1. Results in differences in baseline risk
ii. Solution
1. Adapt the design of the study
2. Statistically adjust for the baseline risk differences
ii. Selection
1. Internal validity questions for Dx & Px
2. Cohort studies
3. Selection bias = bias that we cannot control (compared to confounding)
iii. Measurement
1. Internal validity questions for harm
2. Case Control Study

Reducing Bias
Confounding: use RANDOMIZATION
Selection: use CONCEALMENT
Measurement: use BLINDING

2. A non-exhaustive list of specific types of bias can include:

a. Assessment / Measurement Bias: Occurs when one group of patients has a better (or worse) chance of
having their outcome detected than another group (EPI 547 CP).
i. Particularly likely to occur for soft outcomes like side effects, mild disabilities, sub-clinical disease or
specific cause of death
ii. Minimize bias by adhering to the following 3 principles
1. Ensure that all observations are carried out by observers who are blinded to the exposure
status of the particular patients.
a. Blinding eliminates measurement bias: outcomes are measured with the same
degree of accuracy and completeness in every participant
2. Develop (and use) careful criteria or rules for deciding whether an outcome event has occurred
3. Apply equally rigorous efforts to ascertain all events regardless of exposure group
b. Information Bias / Observation Bias / Classification Bias / Measurement Bias: Results from incorrect
determination of exposure or outcome, or both (Grimes 2002)
c. Interviewer bias: error introduced by an interviewer's conscious or subconscious gathering of selective data
(the interviewer might think that people are sicker than they really are).
d. Recall bias: error due to differences in accuracy or completeness of recall of past events or
experiences. Particularly relevant to case control studies (CCS).
e. Selection bias: an error in patient assignment between groups that permits a confounding variable to arise
from the study design rather than by chance alone.
i. Occurs when the groups of exposed and non-exposed assembled for the study differ in some way
other than the prognostic factors under study.
ii. When extraneous variables affect the outcome of the study
iii. This stems from an absence of comparability between groups being studied
iv. Spectrum Bias
1. Defined as: the difference in both the spectrum and severity of disease between
a. The population among whom the test was first developed (the study population)
i. Phase I evaluations to see if (+) in sick people
1. The sickest of the sick
a. Easier time picking out the obvious
b. Se will be overestimated
ii. Phase I evaluations to see if (-) in normal people
1. The wellest of the well
a. Healthier and younger than clinically relevant
b. Less likely to have other Dx or co-morbidities
c. Fewer FP results (or so many TN..)
d. Overestimates Sp
iii. NET EFFECT: new diagnostic tests are overly optimistic
a. Overestimate Se and Sp
b. The population in which the test will be used in practice (clinically relevant population)
i. Phase II evaluations: clinical population that has a whole array of
conditions that are a part of the DDX
a. Conditions that cause FPs
b. This lowers the observed Sp
c. Opposite of the wellest of the well
v. Assembly or Susceptibility bias: Is an example of selection bias, since the bias occurs when the
subjects are first selected.
1. Survival Cohort: This is a special type of assembly bias where only the patients that
survived the outcome are taken into account
2. A survival cohort describes the past history of prevalent cases and NOT that of a true
inception cohort
3. Individuals who would have been included in a true inception cohort are not accounted for
because they died soon after the onset of treatment
vi. Migration Bias / Loss-to-Follow-Up: This is another form of selection bias which occurs when
patients drop out of the study prematurely
vii. Referral / Sampling Bias: This is the selective referral of patients to tertiary (academic) medical
centers, where many studies concerning prognostic aspects of disease are conducted; this
selection bias alters the clinical spectrum of disease
1. The proportion of more severe or unusual cases tends to be artificially higher at tertiary
care centers.
2. People who are treated at primary care centers are often TOO SICK to be referred to or
even make the trip to the tertiary care center.
3. The people that survived the referral to the academic centers are the ones getting studied.
f. Volunteer bias: people who choose to enroll in clinical research may be systematically different (e.g.
healthier, or more motivated) from your patients.
g. Verification Bias / Workup bias: when the decision to conduct the confirmatory or reference standard test is
influenced by the result of the diagnostic test under study.
i. Ask: Were all the patients subjected to a gold standard?
ii. Fecal occult blood test and colonoscopy
1. FOBT (-) patients are not referred to colonoscopy, but they could be FNs
a. Overestimates Se (b/c the # of FNs is underestimated)
b. Underestimates Sp (b/c the # of TNs is underestimated)
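
To make the direction of verification bias concrete, here is a small Python sketch with entirely hypothetical counts. It assumes every test-positive patient gets the gold standard but only 10% of test-negative patients do; the true Se and Sp, the verification fraction, and the function name are illustrative assumptions, not figures from the notes.

```python
def se_sp(tp, fn, fp, tn):
    """Sensitivity and specificity from the four cells of a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical full cohort: 100 diseased, 900 non-diseased
tp, fn, fp, tn = 70, 30, 90, 810
true_se, true_sp = se_sp(tp, fn, fp, tn)

# Assumption: all test-positives are verified, only 10% of test-negatives are
verified_fraction_neg = 0.10
obs_se, obs_sp = se_sp(tp, fn * verified_fraction_neg,
                       fp, tn * verified_fraction_neg)

print(f"true Se = {true_se:.2f}, observed Se = {obs_se:.2f}")
print(f"true Sp = {true_sp:.2f}, observed Sp = {obs_sp:.2f}")
```

Because the unverified test-negatives (both FNs and TNs) drop out of the table, the observed Se rises and the observed Sp falls, matching the FOBT example above.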

3. Cochrane Collaboration
a. This international group, named for Archie Cochrane, is a unique initiative in the evaluation of healthcare
interventions that prepares, disseminates, and continuously updates systematic reviews of controlled trials for
specific patient problems.
b. This team will gather all of the studies on a subject, disregard the poor studies, and come up with a consensus
on the final outcomes of the good studies.
c. Key points for systematic reviews
i. Grade concealment of allocation
ii. Describe key quality parameters relevant to topic
iii. Report risk of bias table

4. Effectiveness vs. Efficacy

a. Effectiveness: A measurement of benefit resulting from an intervention for a given health problem under
conditions of usual practice. This form of evaluation considers both the efficacy of an intervention and its
acceptance by those to whom it is offered. It helps answer "does the practice do more good than harm to
people to whom it is offered?"
i. Think of as the clinical trials that try to figure out if the treatment works under Usual and Ordinary conditions
b. Efficacy: A measure of benefit resulting from an intervention for a given health problem under conditions of
ideal practice. It helps answer "does the practice do more good than harm to people who fully comply with the
recommendations?" (N.B. It is the job of RCTs to measure efficacy)
i. Think of as the clinical trial checking to see if the treatment can work under Ideal situations (Fletcher

5. Evidence Based Medicine (EBM)

a. The conscientious, explicit, and judicious use of current best evidence in making decisions about the care of
individual patients.
i. Clinical Expertise
ii. Research Evidence
iii. Patient Preference
b. The practice of evidence-based medicine requires integration of individual clinical expertise and patient
preferences with the best available external clinical evidence from systematic research.
c. It is the application of clinical epidemiology to the care of patients and includes the following concepts:
i. Formulate: Converting the information need into an answerable clinical question
1. EBM clinical question = PICO
a. P = Population
b. I = Intervention
c. C = Control
d. O = Outcome
2. Clinical question type
a. RCT = Therapy
b. Cohort = Dx & Px
i. Having exposure and looking for outcome
c. Case Control = Harm (outcome)
i. Have outcome and looking back for exposure/harm
ii. Search: Track down/search for the best evidence because EBM must include the best available
research evidence
iii. Appraise: Critically appraise and judge whether the evidence is strong enough to base clinical
decisions on
1. Systematic approach
a. Validity
b. Results
c. Application
iv. Apply: Integrate the critical appraisal with clinical expertise and the patient's biology, values, and circumstances

6. Hazard Function
a. The probability of an event (such as death or relapse) at a given moment in time (t) (EPI 547 CP).
b. It is a direct measure of prognosis and indicates that, given the patient has survived to a certain point in time,
what is the probability of the patient failing during the next time period?
i. An increase in the hazard function indicates that the prognosis worsens with time
ii. A decrease in the hazard function indicates that the prognosis improves for those patients that survive longer

7. Censoring
a. When the event of interest does not occur in all individuals because
i. Study stopped/ended before outcome occurred
ii. LTFU
iii. Death from other (competing) causes, e.g., road accident

8. Necessary and Sufficient

a. Necessary: The factor MUST ALWAYS BE PRESENT for the disease to occur
i. If factor is NOT present, then disease CANNOT occur
ii. If factor is present, then disease CAN occur, but disease DOES NOT HAVE TO occur.
b. Sufficient: Once the factor IS PRESENT, the disease WILL ALWAYS occur
i. Once factor is present, then disease MUST ALWAYS occur, even though it is not the only important
thing for the disease to occur
c. Examples
i. Rabies Virus and Human Rabies: Rabies virus is BOTH NECESSARY & SUFFICIENT because
the virus must be present for the disease to occur and once it is present, the disease must occur
ii. Mycobacterium tuberculosis and Clinical TB disease: This bacterium is ONLY NECESSARY to cause
tuberculosis because it must be present for TB to occur, but it is not sufficient, because not everyone
that has the bacteria will get the disease
iii. Smoking and Lung Cancer: Smoking is NEITHER necessary nor sufficient. It is not necessary because
lung cancer may occur when smoking is not present. It is not sufficient because when smoking is
present lung cancer does not always occur.
iv. Maternal Alcohol Use and Fetal Alcohol Syndrome: Maternal drinking is ONLY NECESSARY. The mother
must drink for the child to be born with FAS, but drinking is not sufficient, because other factors
play a role too (one being the dose).
v. Prone Sleeping and Sudden Infant Death Syndrome: This is NEITHER necessary nor sufficient. It is not
necessary because prone sleeping is not always present in SIDS cases, and it is not sufficient because
prone sleeping does not mean that SIDS will occur
vi. PhenylPropanolAmine and hemorrhagic stroke: PPA is NEITHER necessary nor sufficient. It is more
a risk factor than the cause in the narrow sense of the word
vii. HIV and AIDS: HIV is BOTH NECESSARY & SUFFICIENT. NECESSARY because HIV must be present for the
disease to occur, and SUFFICIENT because once the factor is present,
the disease will occur
9. Study Designs
a. Case Series: A collection or a report of a series of patients with an outcome of interest. No control group is involved.
b. Case Control Study (CCS): Identifies patients who have a condition or outcome of interest (cases) and
patients who do not have the condition or outcome (controls). The frequency with which subjects were exposed to a risk
factor of interest is then compared between the cases and controls. Because of the design of the CCS, disease
rates cannot be directly measured (contrast this with the cohort study design). Thus the comparison between
cases and controls is actually done by calculating the odds of exposure in cases and controls. The ratio of these
2 odds results in the odds ratio (OR), which is usually a good approximation of the relative risk (RR)
i. Advantages
1. It is relatively quick and inexpensive, requiring fewer subjects than other study designs.
2. It is often the only feasible method for investigating very rare disorders or when a
long lag time exists between an exposure of interest and development of the
outcome/disease of interest.
3. It is also particularly helpful in studies of outbreak investigations where a quick answer
followed by a quick response is required.
ii. Disadvantages: recall bias, unknown confounding variables, and difficulty selecting appropriate
control groups.
c. Crossover Design: A method of comparing 2 or more treatments or interventions in which all subjects are
switched to the alternate treatment after completion of the first treatment. Typically allocation to the first
treatment is by a random process. Since all subjects serve as their own controls, error variance is reduced.
d. Cross-Sectional Survey: The observation of a defined population at a single point in time or during a specific
time interval. Exposure and outcome are determined simultaneously. Also referred to as a prevalence survey
because prevalence is the only epidemiological frequency measure that can be measured (in other words, incidence
rates cannot be generated from this design).
e. Cohort Study: Involves identification of two groups (cohorts) of patients who are defined according to whether
they were exposed to a factor of interest (e.g., smokers and non-smokers). The cohorts are then followed over
time and the incidence rates for the outcome of interest in each group are measured. The ratio of these
incidence rates results in the relative risk (RR), which quantifies the magnitude of association between the factor
and outcome (disease). Note that when the follow-up occurs in a forward direction the study is referred to as a
prospective cohort. When follow-up is done based on historical information it is referred to as a retrospective
cohort.
i. Advantages
1. Can establish clear temporal relationships between exposure and disease onset
2. Able to generate incidence rates.
ii. Disadvantages
1. Control/unexposed groups may be difficult to identify, exposure to a variable may be linked
to a hidden confounding variable, blinding is often not possible, randomization is not present
2. For relatively rare diseases of interest, cohort studies require huge sample sizes and long
f/u (hence they are slow and expensive).
f. N-of-1 Trial: When an individual patient undergoes pairs of treatment periods organized so that one period
involves use of the experimental treatment and the other involves use of a placebo or alternate therapy. Ideally
the patient and physician are both blinded, and outcomes are measured. Treatment periods are replicated until
patient and clinician are convinced that the treatments are definitely different or definitely not different.
g. Randomized Controlled Trial (RCT): A group of patients is randomized into an experimental group and a
control group. These groups are then followed up and various outcomes of interest are documented. RCTs
are the ultimate standard by which new therapeutic maneuvers are judged. Randomization should result in the
equal distribution of both known and unknown confounding variables into each group. An unbiased RCT also
requires concealment and, where feasible, blinding.
i. It is the gold standard of clinical research designs because it REDUCES CONFOUNDING from
known and unknown confounders
ii. Randomization should ensure that there are NO differences between the groups at baseline.
1. This can be described as the groups having an equal prognosis at baseline
2. It can also be described as controlling for the known and unknown confounding variables
3. RCTs CANNOT be applied to ALL clinical questions of interest
4. RCTs do NOT always have to have a placebo group, but they must have a comparison group
(e.g., a different drug)
iii. Disadvantages: often impractical, limited generalizability, volunteer bias, significant expense, and
sometimes ethical difficulties.
h. Systematic Review: A formal review of a focused clinical question based on a comprehensive search
strategy and structured critical appraisal designed to reduce the likelihood of bias.
i. No quantitative summary is generated however.
ii. Any summary of research that attempts to address a focused clinical question in a systematic,
reproducible manner.
iii. These reviews provide a summary of studies which have been searched out comprehensively with
explicit and reproducible search strategy intended to answer a specific clinical question
iv. These reviews incorporate some sort of inclusion criteria, valid quality assessment methods, a
rigorous appraisal of the evidence offered, and some summary conclusions
v. High quality SRs
1. Assess quality of individual studies
2. Report results
i. Narrative Review: This provides a general overview which may not follow rigorous, reproducible scientific methods
i. This may result in biased or erroneous conclusions
ii. It may be that these reviews provide practical information about managing common clinical conditions
iii. One is unlikely to find a detailed and discriminating appraisal of evidence
j. Meta-Analysis: A systematic review which uses quantitative methods to combine the results of several
studies into a pooled summary estimate.
i. The quantitative synthesis that yields a single best estimate of, for instance, treatment effect.
ii. This is a sub-set of systematic reviews where the investigators report a combined summary statistic
for a variable of interest.

10. Survival Function = S(t)

a. This is the survival experience of a population, which is the probability of survival to a given point in time (t).
i. S(3) = 0.60: This indicates that 60% of the population survived 3 years
b. Median Survival Time: Crude, but commonly used measure of survival; the time at which half the patients have
experienced the event (e.g., died)
c. Hazard Function is closely linked

11. Likelihood Ratio

Likelihood Ratio = given that the test result is X, what are the ODDS that this result would occur in (D+) compared to (D-)?

a. Defined as: the probability of observing the test result in the presence of disease divided by the probability of
observing the test result in the absence of disease.
i. It is therefore the ODDS that a given test result (x) would occur in a diseased individual (D+) compared to a
non-diseased individual (D-).
b. Makes the application of Bayes' Theorem more practical/easier
i. An alternative way of describing the performance of a diagnostic test
c. Prevalence
i. It's the prior probability of disease: the clinician's best guess/opinion prior to ordering a test.
ii. Fundamental Fact #1
1. The interpretation of test results depends on the probability of disease before the test was done
d. Odds-LR form of Bayes' Theorem
i. The environment in which the test is applied (indicated by pre-test odds) is as important as the
information provided by the test (indicated by the LR); in other words, each aspect is only half of the story
ii. The LR can be obtained online; the hard part is for the clinician to provide an accurate estimate of
the pre-test odds.
iii. (Pre-test ODDS)(LR) = (Post-test ODDS)
1. Pre-test ODDS = Prev/(1 - Prev)
2. Post-test PROBABILITY = (post-test odds)/(1 + post-test odds)
a. Post-test probability after a (+) test = PVP; after a (-) test = 1 - PVN
iv. Thus, can calculate PVP or PVN from
1. LR+
2. LR-
3. Pre-test odds
v. Positive post-test probability: the likelihood of disease given that the test is (+) = P(D+|T+)
vi. Negative post-test probability: the likelihood of disease given that the test is (-) = P(D+|T-)
1. This one isn't much use to us directly, as it is the complement of PVN, i.e. 1 - P(D-|T-)
a. Therefore, we calculate PVN = 1 - negative post-test probability
e. Can be calculated for a range of test results (ordinal or continuous test), thereby preserving clinical information
f. Test results provide the maximum amount of information (the change b/w prior and post-test probability) when
the prevalence of disease is between 40-60%.
g. Theoretical advantage of LR over Se and Sp
i. Se and Sp remain constant regardless of the prior probability of disease
ii. LR is less susceptible to changes in the underlying prevalence of disease because it is
calculated from smaller slices of the data.
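
The odds-LR form of Bayes' Theorem above can be sketched in a few lines of Python. The formulas LR+ = Se/(1 - Sp) and LR- = (1 - Se)/Sp are the standard definitions for a dichotomous test and are stated here as an assumption, since the notes do not spell them out; the function names and the prevalence, Se, and Sp values are hypothetical.

```python
def likelihood_ratios(se, sp):
    """LR+ and LR- for a dichotomous test (standard definitions)."""
    lr_pos = se / (1 - sp)
    lr_neg = (1 - se) / sp
    return lr_pos, lr_neg


def post_test_probability(prevalence, lr):
    """(Pre-test odds) x (LR) = (post-test odds), converted to a probability."""
    pre_odds = prevalence / (1 - prevalence)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Hypothetical test and setting
prev, se, sp = 0.10, 0.90, 0.80
lr_pos, lr_neg = likelihood_ratios(se, sp)

pvp = post_test_probability(prev, lr_pos)        # P(D+|T+)
pvn = 1 - post_test_probability(prev, lr_neg)    # 1 - P(D+|T-)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"PVP = {pvp:.3f}, PVN = {pvn:.3f}")
```

Re-running with a different `prev` shows how strongly the same LR is tempered by the pre-test odds.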

12. Kaplan-Meier Estimator

a. Widely accepted method of estimating S(t) that is expressed as the product of conditional probabilities
i. S(3) = S(1) × S(2|1) × S(3|2)
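
A minimal Python sketch of the product-of-conditional-probabilities idea follows. It assumes the usual product-limit form, in which each event time contributes a factor (1 - events / number at risk) and censored subjects simply leave the risk set; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times:  follow-up time for each subject
    events: 1 if the event occurred at that time, 0 if censored
    Returns a list of (time, S(t)) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # group all subjects sharing this follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # conditional probability of surviving past t, given at risk at t
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # events and censored subjects leave the risk set
    return curve


# Hypothetical data: 6 subjects, two of them censored (event = 0)
curve = kaplan_meier(times=[1, 2, 2, 3, 4, 5], events=[1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 3))
```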
13. Log-rank test
a. A formal statistical test of the difference in survival distribution
b. It compares the observed number of events in one group to the expected number of events if the two groups
had identical hazard functions.
14. Cox proportional hazard model
a. Very powerful regression modeling technique based on the hazard function
i. Allows for full application and flexibility of regression analysis to be applied to survival data
ii. In its simplest form, it is an extension of the log rank test
iii. The measure of effect is the hazard ratio (HR)
b. Advantages
i. Able to handle a large number of prognostic variables (discrete and continuous)
ii. Can adjust for confounding variables
iii. Evaluate interaction effects
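
For reference, the usual textbook form of the Cox model is sketched below. The notation (baseline hazard h_0, covariates x_1..x_p, coefficients β_1..β_p) is an assumption, as the notes give no formula.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p),
\qquad
\mathrm{HR}_j = e^{\beta_j}
```

Here HR_j is the hazard ratio for a one-unit increase in x_j with the other covariates held fixed, which is how the model adjusts for confounders.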

15. Heterogeneity
a. Test done in systematic reviews and meta-analyses to determine how similar individual study RESULTS are
b. Q statistic
i. Based on the chi-square test (the test has low power to detect heterogeneity)
ii. H0: the results are homogeneous (p > 0.05: homogeneity is not rejected)
iii. Ha: heterogeneity is present (p < 0.05)
1. Unlikely that chance explains the difference in the studies
iv. NOTE: the H0 is OPPOSITE of what we would normally think
c. Inconsistency Index or I²
i. Estimate of the variability in results due to true differences in treatment effect vs. chance
ii. I² < 25% = low heterogeneity
iii. I² 25-75% = moderate heterogeneity
iv. I² > 75% = high heterogeneity
d. Sources of heterogeneity
i. Clinical heterogeneity
1. Population
2. Intervention
3. Outcome
ii. Methodological heterogeneity
1. Design
iii. Chance
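
The Q statistic and I² described above can be computed as in the following Python sketch. It assumes the common inverse-variance formulation (Cochran's Q as the weighted sum of squared deviations from the fixed-effect estimate, and I² = (Q - df)/Q floored at zero); the function name and the study effects and variances are hypothetical.

```python
def cochran_q_and_i2(effects, variances):
    """Cochran's Q and the inconsistency index I^2 (as a percentage).

    effects:   per-study effect estimates (e.g. log odds ratios)
    variances: per-study variances of those estimates
    """
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical log odds ratios from three equally precise studies
q, df, i2 = cochran_q_and_i2(effects=[0.10, 0.30, 0.50],
                             variances=[0.01, 0.01, 0.01])
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.0f}%")
```

Q is then compared against a chi-square distribution with df degrees of freedom to obtain the p-value; that step is omitted here to keep the sketch dependency-free.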

16. Pooling Results

a. Weighted average
i. Larger trials have more weight, as a simple mean may provide an unbalanced estimate of the effect
ii. Types
1. Fixed effect model = used with LOW heterogeneity
a. Inference based on studies at hand
b. So the assumption is that identical studies should produce identical results
i. Thus, any difference is only from within-study random variation
c. Combines all the studies according to weight
2. Random effect model = used with HIGH heterogeneity
a. Studies are treated as a random sample from all the possible studies in the universe
b. Accounts for between- and within-study random variation
c. Relatively more weight given to smaller studies (compared to the fixed effect model)
d. Wider 95% confidence interval
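
The contrast between the two pooling models can be sketched in Python as below. The fixed effect estimate uses inverse-variance weights; for the random effect model this sketch assumes the DerSimonian-Laird estimate of the between-study variance (tau²), which is one common choice and not necessarily the one intended in the notes. Function names and data are hypothetical.

```python
import math


def pool_fixed(effects, variances):
    """Inverse-variance fixed effect estimate and its standard error."""
    w = [1.0 / v for v in variances]
    est = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return est, math.sqrt(1.0 / sum(w))


def pool_random_dl(effects, variances):
    """Random effect estimate using the DerSimonian-Laird tau^2 (assumed)."""
    w = [1.0 / v for v in variances]
    fixed, _ = pool_fixed(effects, variances)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]  # flatter weights
    est = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return est, math.sqrt(1.0 / sum(w_star)), tau2


# Hypothetical effects: one large, one medium, one small study
effects, variances = [0.1, 0.3, 0.8], [0.01, 0.02, 0.09]
fe_est, fe_se = pool_fixed(effects, variances)
re_est, re_se, tau2 = pool_random_dl(effects, variances)
print(f"fixed:  {fe_est:.3f} +/- {1.96 * fe_se:.3f}")
print(f"random: {re_est:.3f} +/- {1.96 * re_se:.3f} (tau2 = {tau2:.4f})")
```

Adding tau² to every study's variance flattens the weights, which is why smaller studies gain relative weight and the 95% confidence interval widens under the random effect model.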
