Observational Studies: Usmle Endpoint Statistics

USMLE ENDPOINT STATISTICS
OBSERVATIONAL STUDIES
DR AHMED SHEBL Page 1

CROSS-SECTIONAL STUDY “SNAPSHOT STUDY”

 Frequency of disease and frequency of risk related factors are assessed in the present.
 Asks, “What is happening?”
 Measures:
 Disease prevalence.
 Can show risk factor association with disease, but does not establish causality.
CASE-CONTROL STUDY
 Compares a group of people with disease to a group without disease. Retrospective study.
 Looks to see if odds of prior exposure or risk factor differs by disease state.
 Asks, “What happened?”
 Measures:
 Odds ratio (OR).
 Patients with COPD had higher odds of a history of smoking than those without
COPD.
 Subjects are not randomly selected but rather explicitly recruited with (cases) and without
(controls) disease. Controls are selected who do not have the disease of interest, regardless
of exposure status to the risk factor.
 Cannot assess incidence or prevalence of disease. Very useful for studying conditions
with very low incidence or prevalence.
 Can help determine causal relationships.

COHORT STUDY
 Compares a group with a given exposure or risk factor to a group without such exposure.
 Looks to see if exposure or risk factor is associated with later development of disease.
 Can be:
 Prospective (asks, “Who will develop disease?”) or
 Retrospective (asks, “Who developed the disease [exposed vs nonexposed]?”).
 Measures:
 Relative risk (RR).
 “Smokers had a higher risk of developing COPD than nonsmokers.”
 Can determine incidence and causal relationships.
TWIN CONCORDANCE STUDY

 Compares the frequency with which both monozygotic twins vs both dizygotic twins
develop the same disease.
 Measures heritability and influence of environmental factors (“nature vs nurture”).
ADOPTION STUDY
 Compares siblings raised by biological vs adoptive parents.
 Measures heritability and influence of environmental factors.
META-ANALYSIS
 Groups the results of several trials (ideally high-quality randomized controlled trials) to
increase statistical power and provide an overall estimate of the effect of an exposure on
an outcome.
ECOLOGICAL STUDY
 The unit of analysis in this study is populations not individuals.
 Studied using population data.

CLINICAL TRIAL
 Experimental study involving humans.
 Compares therapeutic benefits of 2 or more treatments, or of treatment and placebo.
 Study quality improves when study is randomized, controlled, and double-blinded (ie,
neither patient nor doctor knows whether the patient is in the treatment or control group).
 Triple-blind refers to the additional blinding of the researchers analyzing the data.
 Four phases (“Does the drug SWIM?”).
CROSSOVER STUDY DESIGN
 Crossover trials allow the patients to serve as their own controls.

ALGORITHM FOR DETERMINING STUDY DESIGN

EVALUATION OF DIAGNOSTIC TESTS

 Uses 2 × 2 table comparing test results with the actual presence of disease.
 TP = true positive; FP = false positive; TN = true negative; FN = false negative.
 Sensitivity and specificity are fixed properties of a test.
 PPV and NPV vary depending on disease prevalence in population being tested.
SENSITIVITY (TRUE POSITIVE RATE)

 Proportion of all people with disease who test positive, or the probability that when the
disease is present, the test is positive.
 Represents the ability of a test to detect those with the disease. A value approaching
100% is desirable for ruling out disease, which is very important for screening purposes.
 High sensitivity test used for screening in diseases with low prevalence.
 = TP / (TP + FN) = 1 – false negative rate.
SN-N-OUT = highly SeNsitive test, when Negative, rules OUT disease.
 The false negative rate is (1- sensitivity). High sensitivity indicates a low false-negative
rate.
 If sensitivity is 100%, then FN is zero. So, all negatives must be TNs.

SPECIFICITY (TRUE NEGATIVE RATE)

 Ability of the test to correctly identify individuals without the disease.
 Proportion of all people without disease who test negative, or the probability that when
the disease is absent, the test is negative. (The ability of a test to exclude those without
the disease.)
 Value approaching 100% is desirable for ruling in disease and indicates a low false
positive rate.
 High specificity test used for confirmation after a positive screening test.
 = TN / (TN + FP) = 1 – false-positive rate.
SP-P-IN = highly SPecific test, when Positive, rules IN disease.
 If specificity is 100%, then FP is zero. So, all positives must be TPs.
 Specificity should be high in confirmatory tests to decrease the number of false
positives (FP).

Screening test  high sensitivity.
Confirmatory (gold standard) test  high specificity.
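The 2 × 2 table measures above can be sketched in a few lines of Python. The function name and all counts are hypothetical. The PPV and NPV formulas are the standard definitions, PPV = TP / (TP + FP) and NPV = TN / (TN + FN), which these notes mention but do not spell out. The two example populations use the same test (90% sensitive, 90% specific) at different prevalences.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return 2 x 2 table measures as fractions between 0 and 1."""
    return {
        "sensitivity": tp / (tp + fn),  # TP / (TP + FN)
        "specificity": tn / (tn + fp),  # TN / (TN + FP)
        "ppv": tp / (tp + fp),          # standard definition, assumed here
        "npv": tn / (tn + fn),          # standard definition, assumed here
    }

# Hypothetical counts: disease prevalence 50% (100 diseased, 100 healthy)
high_prev = diagnostic_metrics(tp=90, fp=10, tn=90, fn=10)

# Hypothetical counts: disease prevalence 1% (10 diseased, 990 healthy)
low_prev = diagnostic_metrics(tp=9, fp=99, tn=891, fn=1)

for label, result in (("prevalence 50%", high_prev), ("prevalence 1%", low_prev)):
    print(label, {name: round(value, 3) for name, value in result.items()})
```

Sensitivity and specificity are identical in both populations, while PPV falls and NPV rises as prevalence drops.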

LIKELIHOOD RATIO
 LRs can be multiplied with pretest odds of disease to estimate posttest odds.
 LR+ > 10 and/ or LR– < 0.1 are easy-to-remember indicators of a very useful diagnostic
test.
 UW: Positive and negative predictive values are influenced by disease prevalence.
Sensitivity, specificity, and likelihood ratios are not.
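A minimal sketch of how likelihood ratios update the odds of disease. The formulas LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity are the standard definitions and are not written out in these notes. The function names and the example values (90% sensitivity, 95% specificity, 20% pretest probability) are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def posttest_probability(pretest_probability, lr):
    """Convert probability to odds, multiply by the LR, convert back."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)

# Hypothetical test: 90% sensitive, 95% specific
lr_pos, lr_neg = likelihood_ratios(0.90, 0.95)   # LR+ about 18, LR- about 0.105

# Hypothetical patient with a 20% pretest probability and a positive result
after_positive = posttest_probability(0.20, lr_pos)   # about 0.82
print(round(lr_pos, 2), round(lr_neg, 3), round(after_positive, 3))
```

Prevalence does not appear in `likelihood_ratios`, which matches the point that LRs, unlike predictive values, do not depend on it.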

QUANTIFYING RISK
 Definitions and formulas are based on the classic 2 × 2 or contingency table.
ODDS RATIO (OR)

 Typically used in case-control studies.
 Measure of association between an exposure and an
outcome.
 Example:
 The odds of an event (eg, disease) occurring given a certain exposure (a/b) vs the
odds of an event occurring in the absence of that exposure (c/d).
 It represents the odds that an outcome (eg, major arrhythmia) occurred in the
presence of a particular exposure (eg, beta-blocker) compared to the odds that the
outcome occurred in the absence of that exposure.
RELATIVE RISK
 Typically used in cohort studies.
 Risk of developing disease in the exposed group divided by risk in the unexposed group.
 Example:
 If 21% of smokers develop lung cancer vs 1% of nonsmokers, RR = 21/1 = 21.
 For rare diseases (low prevalence), OR approximates RR.
 RR = 1  no association between exposure and disease.
 RR > 1  exposure associated with ↑ disease occurrence.
 RR < 1  exposure associated with ↓ disease occurrence.
 In low-prevalence situations, RR ≈ OR.
 When the disease is rare, a and c represent small quantities. Therefore, a is
negligible compared to b, and c is negligible compared to d, so a + b ≈ b and
c + d ≈ d. RR = [a / (a + b)] / [c / (c + d)] then becomes approximately
(a / b) / (c / d) = ad / bc, which equals OR.
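The rare-disease approximation can be checked numerically. This sketch assumes the usual cell labels for the 2 × 2 table (a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease), which is consistent with the a/b and c/d odds used above. All counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR = (a / b) / (c / d) = ad / bc."""
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical common disease: 21% risk in exposed vs 1% in unexposed
common = (21, 79, 1, 99)
# Hypothetical rare disease: 0.2% risk in exposed vs 0.1% in unexposed
rare = (20, 9980, 10, 9990)

for label, cells in (("common", common), ("rare", rare)):
    print(label, "RR =", round(relative_risk(*cells), 3),
          "OR =", round(odds_ratio(*cells), 3))
```

With the common disease the OR overstates the RR noticeably, while with the rare disease the two are nearly identical.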
ATTRIBUTABLE RISK
 The excess risk in an exposed population that can be explained by exposure to a
particular risk factor.
 The difference in risk between exposed and
unexposed groups.
 The proportion of disease occurrences that are
attributable to the exposure.
 Example:
 If risk of lung cancer in smokers is 21% and risk in nonsmokers is 1%, then
AR = 21% − 1% = 20% (the excess lung cancer risk in smokers that is attributable to smoking).
 AR = risk in exposed − risk in unexposed = a / (a + b) − c / (c + d).
ABSOLUTE RISK REDUCTION

 Percentage indicating the actual difference in event rate between control and treatment
groups.
 ARR = Control Rate - Treatment Rate.
 Example:
 If 8% of people who receive a placebo
vaccine develop the flu vs 2% of people who receive a flu vaccine, then ARR =
8% − 2% = 6% (0.06).

RELATIVE RISK REDUCTION

 Percentage indicating relative reduction in the treatment event rate compared to the
control group.
 RRR = ARR/Control Rate.
 Example:
 If 2% of patients who receive a flu shot develop the flu, while 8% of unvaccinated
patients develop the flu, then RR = 2/8 = 0.25, and RRR = 1 − RR = 0.75
(equivalently, ARR/Control Rate = 6%/8% = 0.75).
NUMBER NEEDED TO TREAT

 Number of patients who need to be treated for 1 patient to
benefit.
 Lower number = better treatment.
NUMBER NEEDED TO HARM

 Number of patients who need to be exposed to a risk factor for 1 patient to be harmed.
 Higher number = safer exposure.
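The risk-difference measures in this section can be tied together with the flu vaccine and smoking examples used above. The formulas NNT = 1/ARR and NNH = 1/AR are the standard definitions and are not written out in these notes. The function names are hypothetical.

```python
def absolute_risk_reduction(control_rate, treatment_rate):
    """ARR = control event rate - treatment event rate."""
    return control_rate - treatment_rate

def relative_risk_reduction(control_rate, treatment_rate):
    """RRR = ARR / control event rate."""
    return absolute_risk_reduction(control_rate, treatment_rate) / control_rate

def number_needed_to_treat(control_rate, treatment_rate):
    """NNT = 1 / ARR (standard definition, assumed here)."""
    return 1 / absolute_risk_reduction(control_rate, treatment_rate)

def number_needed_to_harm(risk_exposed, risk_unexposed):
    """NNH = 1 / AR (standard definition, assumed here)."""
    return 1 / (risk_exposed - risk_unexposed)

# Flu example: 8% of placebo recipients vs 2% of vaccinated develop the flu
arr = absolute_risk_reduction(0.08, 0.02)   # about 0.06
rrr = relative_risk_reduction(0.08, 0.02)   # about 0.75
nnt = number_needed_to_treat(0.08, 0.02)    # about 16.7, usually reported as 17

# Smoking example: 21% risk in smokers vs 1% in nonsmokers
nnh = number_needed_to_harm(0.21, 0.01)     # about 5

print(round(arr, 3), round(rrr, 3), round(nnt, 1), round(nnh, 1))
```

NNT is conventionally rounded up to a whole number of patients when reported.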


INCIDENCE VS PREVALENCE
 UW: An increasing prevalence and stable incidence can be attributed to factors
which prolong the duration of a disease (e.g., improved quality of care).
 UW: The attack rate is the ratio of the number of people who contract an illness
divided by the number of people who are at risk of contracting that illness.
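A small sketch of the two points above. The attack rate follows the definition given. The prevalence calculation relies on the steady-state approximation prevalence ≈ incidence × average disease duration, which is an assumption added here (it is not stated in these notes and holds best when prevalence is low). Function names and numbers are hypothetical.

```python
def attack_rate(cases, population_at_risk):
    """People who contract the illness divided by people at risk."""
    return cases / population_at_risk

def steady_state_prevalence(incidence_rate, mean_duration):
    """Approximate prevalence as incidence x duration (assumed relationship)."""
    return incidence_rate * mean_duration

# Hypothetical outbreak: 30 of 120 exposed guests become ill
outbreak = attack_rate(30, 120)

# Hypothetical chronic disease: incidence stays at 2 per 1000 person-years,
# but better care doubles average survival with the disease from 5 to 10 years
before = steady_state_prevalence(0.002, 5)
after = steady_state_prevalence(0.002, 10)

print(outbreak, before, after)
```

Incidence is unchanged between the two scenarios, yet prevalence doubles because patients live longer with the disease.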


PRECISION VS ACCURACY
PRECISION (RELIABILITY)
 The consistency and reproducibility of a test  gives similar or very close results on
repeat measurements.
 The absence of random variation in a test.
 Random error ↓ precision in a test.
 ↑ Precision  ↓ standard deviation.
 ↑ Precision  ↑ statistical power (1 − β).
ACCURACY (VALIDITY)
 The trueness of test measurements.
 The absence of systematic error or bias in a test.
 Systematic error ↓ accuracy in a test.

BIAS AND STUDY ERRORS

SELECTION BIAS
 Nonrandom sampling or treatment allocation of subjects such that study population is
not representative of target population (eg, study participants included based on
adherence or other criteria related to outcome).
 Most commonly a sampling bias.
 Examples:
 Berkson bias—study population selected from hospital is less healthy than
general population.
 Healthy worker effect—study population is healthier than the general
population.
 Non-response bias—participating subjects differ from nonrespondents in
meaningful ways.
 STRATEGY TO REDUCE BIAS
 Randomization.
 Ensure the choice of the right comparison/reference group.
ATTRITION BIAS
 Loss to follow-up occurs disproportionately between the exposed and unexposed groups.
 Attrition bias is a form of selection bias.
 Attrition bias does not occur when the losses happen randomly between the exposed and
unexposed groups as this simply leads to a smaller study population.

RECALL BIAS
 Awareness of disorder alters recall by subjects.
 Potential problem for retrospective studies such as case-control studies, particularly
when questionnaires are used to inquire about distant past exposure.
 Examples:
 Patients with disease recall exposure after learning of similar cases.
 STRATEGY TO REDUCE BIAS
 Decrease time from exposure to follow-up.

MEASUREMENT BIAS
 Information is gathered in a systematically distorted manner.
 Examples:
 Association between HPV and cervical cancer not observed when using
non-standardized classifications.
 Hawthorne effect—participants change their behavior in response to their
awareness of being observed.
 STRATEGY TO REDUCE BIAS
 Use objective, standardized, and previously tested methods of data collection that
are planned ahead of time.
 Use placebo group.
PROCEDURE BIAS
 Subjects in different groups are not treated the same.
 Examples:
 Patients in treatment group spend more time in highly specialized hospital units.
 STRATEGY TO REDUCE BIAS
 Blinding and use of placebo reduce influence of participants and researchers on
procedures and interpretation of outcomes as neither are aware of group
allocation.
OBSERVER-EXPECTANCY BIAS
 The investigator's decision is affected by prior knowledge of the exposure status.
 Researcher’s belief in the efficacy of a treatment changes the outcome of that treatment
(aka Pygmalion effect; self-fulfilling prophecy).
In the classic classroom experiment that first described the Pygmalion effect, a group of
students were randomly assigned high intelligence quotient (IQ) scores; their teachers
were then told of these artificial results and had higher expectations of this group as a
result. The students with the randomly assigned high IQ scores actually performed better,
likely because their teachers unconsciously behaved in a manner that would facilitate
their success.

 Examples:
 If observer expects treatment group to show signs of recovery, then he is more
likely to document positive outcomes.
CONFOUNDING BIAS
 When a factor is related to both the exposure and outcome, but not on the causal
pathway  factor distorts or confuses effect of exposure on outcome.
 Examples:
 Pulmonary disease is more common in coal workers than the general population;
however, people who work in coal mines also smoke more frequently than the
general population.
 STRATEGY TO REDUCE BIAS
 Multiple/repeated studies.
 Crossover studies (subjects act as their own controls).
 Matching (patients with similar characteristics in both treatment and control
groups).
 Stratified analysis.
 Restriction.
 Randomization.
Effect modification is present when the effect of the main exposure on the outcome is
modified by the presence of another variable. Effect modification is not a bias.

LEAD-TIME BIAS
 Early detection is confused with ↑ survival.
 Examples:
 Apparent prolongation of survival after applying a screening test that detects a
disease earlier than it would have been otherwise detected but without any real
effect on prognosis.
 STRATEGY TO REDUCE BIAS
 Measure “back-end” survival (adjust survival according to the severity of disease
at the time of diagnosis).

STATISTICAL DISTRIBUTION
MEASURES OF CENTRAL TENDENCY
 Mean = (sum of values)/(total number of values).
 Most affected by outliers (extreme values).
 Median = middle value of a list of data sorted from least to greatest.
 If there is an even number of values, the median will be the average of the middle
two values.
 Mode = most common value.
 Least affected by outliers.
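A quick check of how an outlier affects each measure, using Python's standard `statistics` module. The data values and the helper name `summarize` are hypothetical.

```python
import statistics

def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.mode(values))

without_outlier = [2, 3, 3, 4, 5, 6]
with_outlier = [2, 3, 3, 4, 5, 6, 40]   # same data plus one extreme value

mean_a, median_a, mode_a = summarize(without_outlier)   # mean about 3.83, median 3.5, mode 3
mean_b, median_b, mode_b = summarize(with_outlier)      # mean 9, median 4, mode 3

print(mean_a, median_a, mode_a)
print(mean_b, median_b, mode_b)
```

The single extreme value more than doubles the mean, shifts the median only slightly, and leaves the mode unchanged.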
MEASURES OF DISPERSION
 Standard deviation:
 A measure of the degree of dispersion from the
mean.
 When SD is small, data points tend to have
minimal variation and are tightly clustered around the mean.
 In contrast, a large SD implies that the data points are spread out over a large
range.
 For a normal distribution, about 68% of all values lie within 1 SD from the mean.
 The remaining 32% of values lie outside of 1 SD, with 16% above and
16% below 1 SD from the mean.
 In addition, about 95% of all values are within 2 SDs from the mean and 99.7% are
within 3 SDs.

 Standard error:
 Shows how closely sample means are related to population means.
 = SD/ √n.
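A minimal sketch of the standard error formula and of the 68/95/99.7 percentages, using only the Python standard library. It assumes normally distributed data for the percentages and uses the sample SD (n − 1 denominator) for the standard error. The function names and data values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def standard_error(values):
    """SE = SD / sqrt(n), using the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    standard_normal = NormalDist()   # mean 0, SD 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements
se = standard_error(sample)

for k in (1, 2, 3):
    print(k, "SD:", round(fraction_within(k), 4))
print("SE:", round(se, 3))
```

Because n sits in the denominator under a square root, quadrupling the sample size halves the standard error.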
NORMAL DISTRIBUTION
 Gaussian, also called bell-shaped.
 Mean = median = mode.

NON-NORMAL DISTRIBUTIONS
 Bimodal:
 Suggests two different populations (eg, metabolic polymorphism such as fast vs
slow acetylators; age at onset of Hodgkin lymphoma; suicide rate by age).
 Positive skew:
 Typically, mean > median > mode.
 Asymmetry with longer tail on right.
 Negative skew:
 Typically, mean < median < mode.
 Asymmetry with longer tail on left.

STATISTICAL HYPOTHESES
 Null (H0):
 Hypothesis of no difference or
relationship (eg, there is no association
between the disease and the risk factor
in the population).
 Alternative (H1):
 Hypothesis of some difference or
relationship (eg, there is some
association between the disease and the risk factor in the population).
OUTCOMES OF STATISTICAL HYPOTHESIS TESTING
CORRECT RESULT
 Stating that there is an effect or difference when one exists (null hypothesis rejected in
favor of alternative hypothesis).
 Stating that there is not an effect or difference when none exists (null hypothesis not
rejected).
INCORRECT RESULT
 Type I error (α):
 Stating that there is an effect or difference when none exists (null hypothesis
incorrectly rejected in favor of alternative hypothesis).
 α is the probability of making a
type I error.
 p is judged against a preset α
level of significance (usually
0.05).
 If p < 0.05, then there is less than a 5% chance that the data will show something
that is not really there.
 Also known as false-positive error.

 Type II error (β)

 Stating that there is not an effect or difference when one exists (null hypothesis is
not rejected when it is in fact false). Also known as false-negative error.
 β is the probability of making a type II error.
 The power of a study:
 The ability of a study to detect a difference between groups when such a
difference truly exists = the probability of rejecting the null hypothesis
when it is false.
 Power = 1 − β.
 ↑ Power and ↓ β by:
 ↑ Sample size (n).
 ↑ Expected effect size.
 ↑ Precision of measurement.
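The three ways of raising power can be illustrated with a rough calculation. This sketch is not from the notes: it assumes a two-sided comparison of two group means with equal group sizes and a known common SD, uses a normal (z) approximation, and ignores the negligible chance of rejecting in the wrong direction. The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approximate_power(effect_size, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) for a two-sided, two-sample z-test."""
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha 0.05
    se_difference = sd * sqrt(2 / n_per_group)            # SE of the difference in means
    return standard_normal.cdf(abs(effect_size) / se_difference - z_critical)

baseline = approximate_power(effect_size=5, sd=10, n_per_group=64)    # roughly 0.8
larger_n = approximate_power(effect_size=5, sd=10, n_per_group=128)   # bigger sample
larger_effect = approximate_power(effect_size=8, sd=10, n_per_group=64)
more_precise = approximate_power(effect_size=5, sd=5, n_per_group=64) # smaller SD

print(round(baseline, 3), round(larger_n, 3),
      round(larger_effect, 3), round(more_precise, 3))
```

Each change (larger n, larger effect, smaller SD) increases power relative to the baseline, and therefore lowers β.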

CONFIDENCE INTERVAL
 A CI is a range that can be interpreted as follows:
 If the study were repeated 100 times and a 95% CI calculated each time, about
95 of the 100 intervals would be expected to contain the true population value.
 All CIs have a null value. For example, the null value would correspond to 0 mm
Hg (no difference between the SBP in cocoa intake and control groups).
 If the CI crosses the null value, then there is no statistically significant difference
between the groups.
 For example, if the 95% CI is [−2.7, −1.3], it is entirely negative (does not cross
0)  the result is considered statistically significant.
 If the 95% CI for a mean difference between 2 variables includes 0, then there is
no significant difference and H0 is not rejected.
 If the 95% CI for odds ratio or relative risk includes 1, H0 is not rejected.
 If the CIs between 2 groups do not overlap  statistically significant difference exists.
 If the CIs between 2 groups overlap  usually no significant difference exists.
 When the 95% CI does not include the null value, this gives a corresponding p-value
<0.05 and the association between exposure and outcome is considered statistically
significant.

 A p-value <0.05 reflects that there is a very low probability that the result was due to
chance alone; formally, the p-value is the probability of observing a given (or more
extreme) result due to chance alone assuming that the null hypothesis is true.
 A statistically significant 95% CI corresponds to a p-value <0.05.
 A statistically significant 99% CI corresponds to a p-value <0.01.
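A minimal sketch of building a CI and checking it against the null value. It assumes the common large-sample formula CI = mean ± z × SE with SE = SD/√n, which is not written out in these notes (small samples would use a t value instead of z). The function names and the input numbers are hypothetical, chosen so the 95% interval lands near the [−2.7, −1.3] example above.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(mean, sd, n, level=0.95):
    """Return (low, high) for mean +/- z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%, 2.58 for 99%
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

def excludes_null(interval, null_value):
    """True if the null value lies outside the interval (significant result)."""
    low, high = interval
    return not (low <= null_value <= high)

# Hypothetical mean difference in SBP of -2.0 mm Hg, SD 3.57, n = 100
ci_95 = confidence_interval(-2.0, 3.57, 100)            # roughly (-2.7, -1.3)
ci_99 = confidence_interval(-2.0, 3.57, 100, level=0.99)

print(ci_95, excludes_null(ci_95, 0))    # null value for a difference is 0
print(ci_99, excludes_null(ci_99, 0))
print(excludes_null((0.8, 1.4), 1))      # hypothetical OR interval; null value is 1
```

The 99% interval is wider than the 95% interval, which is why a result can be significant at p < 0.05 without being significant at p < 0.01.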
COMMON STATISTICAL TESTS

T-TEST
 Checks differences between means of 2 groups. Tea is meant for 2.
 Example: comparing the mean blood pressure between men and women.
ANOVA
 Checks differences between means of 3 or more groups. 3 words: ANalysis Of VAriance.
 Example:
 Assess for differences in mean blood pressure among 3 sample populations
grouped by exercise status (eg, never exercise, exercise occasionally, or exercise
frequently).
CHI-SQUARE (Χ²)
 Checks differences between 2 or more percentages or proportions of categorical
(nominal) outcomes (not mean values). Pronounce Chi-tegorical.
 Example:
 In order to test the efficacy of a new drug, compare the number of recovered
patients given the drug with those who were not. Chi-square features nominal data
only, and any number of groups (2×2, 2×3, 3×3, etc.).

 Comparing the percentage of members of 3 different ethnic groups who have
essential hypertension.
Which statistical test will most likely be used to analyze the data?
 Comparing the blood sugar levels of husbands and wives  t-test.
 Comparing the number of staff who do or do not call in sick for each of 3 different
nursing shifts  Chi-square test. Staff either call in sick or do not (nominal variable)
over 3 shifts (nominal variable). Two nominal variables with a 2 × 3 design, chi-square.
 Relationship between time spent on studying and test score  Pearson correlation.
 A researcher believes that boys with same-sex siblings are more likely to have higher
testosterone levels  t-test.
 Twenty rats are coated with margarine and 20 with butter as part of a study to explore
the carcinogenic effects of oleo.  Chi-square test. Margarine versus butter (nominal),
cancer versus no cancer (nominal). Therefore, chi-square.
 Comparison of passing and failure rates at each of 3 test sites  Chi-square test.
Passing versus failure (nominal), 3 sites (nominal).
 Comparison of actual measured test scores for students at each of 3 test sites 
ANOVA.
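To make the chi-square case concrete, the sketch below computes the Pearson chi-square statistic, the sum of (observed − expected)² / expected over all cells, for a 2 × 3 table like the nursing shift example. The counts are hypothetical, and the function names are illustrative. The critical value 5.991 is the standard chi-square cutoff for 2 degrees of freedom at α = 0.05.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

def degrees_of_freedom(table):
    """(rows - 1) x (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical counts. Rows: called in sick, did not. Columns: 3 shifts.
shifts = [[10, 15, 25],
          [90, 85, 75]]

statistic = chi_square_statistic(shifts)
df = degrees_of_freedom(shifts)
CRITICAL_DF2_ALPHA_05 = 5.991   # chi-square critical value, df = 2, alpha = 0.05

print(round(statistic, 2), df, statistic > CRITICAL_DF2_ALPHA_05)
```

With these made-up counts the statistic exceeds the critical value, so the sick-call proportions would be judged to differ across shifts.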

PEARSON CORRELATION COEFFICIENT

 r is always between −1 and +1.
 The closer the absolute value of r is to 1, the stronger the linear correlation between the 2
variables.
 Positive r value  positive correlation (as one variable ↑, the other variable ↑).
 Negative r value  negative correlation (as one variable ↑, the other variable ↓).
 Coefficient of determination = r² (amount of variance in one variable that can be
explained by variance in another variable).
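A short sketch of computing r and r² by hand, using the study time vs test score example from the previous section. The data values and the function name are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    squares_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / sqrt(squares_x * squares_y)

hours_studied = [1, 2, 3, 4, 5]        # hypothetical data
test_scores = [52, 60, 63, 71, 79]     # hypothetical data

r = pearson_r(hours_studied, test_scores)   # about 0.99, strong positive correlation
r_squared = r ** 2                          # about 0.98

print(round(r, 3), round(r_squared, 3))
print(pearson_r([1, 2, 3], [6, 4, 2]))      # perfectly inverse linear data
```

Here r² of about 0.98 means that about 98% of the variance in scores can be explained by variance in study time in this made-up sample.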
MEDICINE IN SOCIETY
PRIMARY, SECONDARY & TERTIARY PREVENTION

MORTALITY REDUCTION
 Tobacco use, especially smoking, is the single most preventable cause of death and
disease in the United States.
 The risk of MI-associated mortality begins to decrease immediately upon smoking
cessation.
 It appears that smoking not only worsens the complications of diabetes but also increases
the likelihood of developing diabetes in the first place thus highlighting the importance of
prevention and early cessation.
CANCER FACTS:
 The most common cancers among women living in the United States during 2011 were
breast cancer, lung cancer and colon cancer.
 Breast cancer is the most common cancer among women in the United States, but its
mortality rate is moderately low.
 Despite the fact that the incidence of lung cancer is much lower than that for breast
cancer, mortality from lung cancer is higher. Lung cancer has emerged as the leading
cause of cancer mortality in women. Increased mortality is due to ↑ tobacco use.

