
TABLE OF CONTENTS

Biostatistics
1. Research Study Designs
2. Bias and Study Errors
3. Risk Quantification
4. Diagnostic Tests
5. Statistical Distributions
6. Statistical Testing
REVIEW OUTLINE

Biostatistics: Research Study Designs

1. Intro to Research Design
A. Overview
B. Features of Research Design
C. Types of Research Studies
2. Descriptive Studies - Observational I
A. Overview
B. Case Report/Case Series
C. Cross Sectional Studies
D. Ecological Studies
3. Longitudinal Studies - Observational II
A. Overview
B. Case Control Studies
C. Cohort Studies
4. Heritability Studies - Observational III
A. Overview
B. Twin Concordance Studies
C. Adoption Studies
5. Experimental Studies
A. Clinical Trials
B. Crossover Studies
C. Drug Trial Phases
6. Evaluating Research Studies
A. Overview
B. Bradford Hill Criteria
7. Practice Question
Biostatistics: Research Study Designs Bootcamp.com

Intro to Research Design


Overview
Goal: Determine a relationship between variables
• Examples:
o Does exposure to pesticides increase the risk of Parkinson’s disease?
o Is a new cancer therapy more effective than current treatments?
Features of Research Design
• Hypothesis: Proposed explanation based on limited evidence
o Example: Exposure to pesticides increases the risk of Parkinson’s disease
• Independent Variable: The hypothesized cause
o Example: Exposure to pesticides
• Dependent Variable: The hypothesized effect
o Example: Development of Parkinson’s disease
• Sample: The portion of a population being studied
o Example: Farm workers in California
• Control: Other potential variables are intentionally held constant
o Example: Group participants by age
Types of Research Studies
• Observational
o Participants grouped based on observed independent variable
o Dependent variable measured
o Cannot determine causal relationships
• Experimental
o Independent variable manipulated by researcher
o Dependent variable measured
o Can determine causality
o Often ethically implausible

Descriptive Studies - Observational Studies I


Overview:
• No intervention involved, cannot assess causality
• Conducted at a single time point
• Limited ability for quantitative analysis
• Hypotheses generated from observations
Case Report/Case Series
• Participants: Selected based on diagnosis
o Single participant for case report, Multiple participants for case series
• Data collected: Potential risk factors
• Quantitative analysis: Average exposure for disease group only
• Strengths: Simple, cost effective, generate lots of descriptive data
• Limitations: No control group, little quantitative analysis, does not generalize
• Example: Collecting data on a series of young men with Kaposi’s sarcoma
Cross Sectional Studies
• Participants: Selected from a group of interest
• Data collected: Participants’ diagnosis and/or risk factor history
• Quantitative analysis: Prevalence
• Strengths: Relatively efficient
• Limitations: Cannot evaluate risk or incidence
• Example: Counting how many children in a particular city have asthma
Ecological Studies
• Participants: Selected as whole populations
• Data collected: Frequency of a disease and risk factors
• Quantitative analysis: Comparison of averages between populations
• Strengths: Cost-effective and efficient
• Limitations: Ecological fallacy
• Example: Measuring levels of air pollution and cancer rates in two cities

Longitudinal Studies - Observational II


Overview
• Cannot assess causality
• No intervention involved
• Assign temporality to exposure and outcome
• More robust quantitative analysis possible
Cohort Studies
• Participants: Recruited based on exposure history HIGH YIELD
• Data collected: Disease outcomes
• Quantitative Analysis: Incidence, relative risk
• Time Course: Retrospective or prospective
• Strengths: Assign temporality to exposure and outcome
• Limitations: Can be expensive and time-consuming
• Example:
o Recruit people who take aspirin daily and people who don’t
o Collect data on whether or not each person has a heart attack
Case Control Studies
• Participants: Recruited based on diagnosis HIGH YIELD
• Data collected: Exposure histories
• Quantitative Analysis: Odds ratio
• Time course: Typically retrospective
• Strengths: Often more cost-effective and efficient
• Limitations: Cannot calculate risk or incidence
• Example:
o Recruit people who have and have not had a heart attack
o Collect data on whether or not they took aspirin daily

Heritability Studies - Observational III


Overview
• Cannot assess causality
• No intervention involved
• Specifically useful for assessing heritability of traits

Adoption Studies
• Adopted children compared to biologic family and adopted family
o Biologic family —> Similar genes, different environment
o Adopted family —> Different genes, similar environment

Twin Concordance Studies


• Monozygotic twins compared to dizygotic twins
o Monozygotic twins —> ~100% shared genome + shared environment
o Dizygotic twins —> ~50% shared genome + shared environment

Experimental Studies
Overview
• Researcher manipulates the independent variable
• Can assess causality

Clinical Trials
• Experimental studies on humans
• Compare benefits of 2 or more treatments, or a treatment and a placebo
• Randomized: Participants assigned to experimental groups randomly
• Controlled: Other potentially relevant variables are accounted for in randomization
• Blinded: Participants involved in the study are unaware of group assignment
o Double-Blinded: Participants and researchers conducting the study
o Triple-Blinded: Participants, researchers conducting the study, and data analysts

Crossover Studies
• Patients serve as their own controls
• Can ↓ confounding bias
[Figure: crossover design with a washout period between treatments]

Drug Trial Phases:


• Phase 1: Assess toxicity, pharmacokinetics, and pharmacodynamics
o Small sample of healthy individuals
• Phase 2: Assess treatment efficacy, dosing, and adverse effects
o Larger sample of patients
• Phase 3: Compares new drug to the current standard of care
o Large sample of patients randomly assigned to an experimental group
• Phase 4: Identify long-term or rare adverse effects
o Postmarketing surveillance

Evaluating Research Studies


Overview
• Concepts that help interpret evidence for causality
• Neither prove nor disprove causality

Bradford Hill Criteria


• Temporality: Does the presumed cause occur before the effect?
• Strength: How strong is the association?
o Association ≠ Causation
• Consistency: Can the results be replicated?
• Specificity: Is there a 1:1 relationship between the presumed cause and effect?
• Biological gradient: Does ↓ exposure to the cause ↓ the effect?
o Reversibility: Does the effect go away if the cause is removed?
• Plausibility: Is there a conceivable mechanism for how the presumed cause leads to the effect?
• Coherence: Is this relationship consistent with existing scientific knowledge?
• Experiment: Is the relationship supported by experimental data?
• Analogy: Have similar causes been shown to lead to similar effects?
Test Your Knowledge (Question ID: 1001)

A research group is studying whether or not living close to the highway increases risk of children developing asthma. They
identify 500 children from ages 2 to 8 living in housing located within 1000 feet of a major highway, and 500 children from ages
2 to 8 living in housing located at least 1 mile from a major highway. 5 years later, the researchers check the children’s clinical
records and determine how many in each group have been diagnosed with asthma. What kind of study design best describes
this study?

A. Case-control study - Select 500 children with asthma and 500 without, and see if they grew up next to the highway
B. Cross-sectional study - Select 1000 children living in a specific city and assess asthma diagnosis and proximity to highway for each child
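C. Prospective cohort study (correct answer) - Children are grouped by exposure (proximity to the highway) and followed forward for 5 years to see who develops asthma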
D. Randomized clinical trial - Select 1000 children and force half of them to live near the highway, then assess asthma diagnosis in five years
E. Case series - Write a detailed report about ten children diagnosed with asthma
F. Retrospective cohort study - Select 500 teenagers who lived near the highway as children and 500 who did not, and assess their asthma diagnosis
G. Crossover study - Not feasible since two treatments are not being compared
H. Ecological study - Measure average proximity to highway and rates of asthma diagnosis in an entire population of children in two cities
REVIEW OUTLINE

Biostatistics: Bias and Study Errors

1. Types of Study Errors
A. Overview
B. Accuracy vs. Precision
C. Random vs. Systematic Error
2. Recruitment Bias
A. Overview
B. Sampling Bias
C. Attrition Bias
3. Performing Bias
A. Overview
B. Recall Bias
C. Measurement Bias
D. Procedure Bias
E. Observer Bias
4. Interpretation Bias
A. Overview
B. Confounding Bias
C. Lead Time Bias
D. Length Time Bias
5. Practice Question

Biostatistics: Bias and Study Errors Bootcamp.com

Types of Study Errors


Overview:
• The goal of a research study is to determine the truth
• Error: Difference between the true value and measured result
o Error ≠ mistake
Accuracy vs. Precision:
• Accuracy: Proximity of a measurement to the true value
o Also referred to as validity
• Precision: Variation between the measurement values
o Also referred to as reproducibility or reliability
Random vs. Systematic Error:
• Random Error: ↓ precision leads to measured values that differ from the truth
o Unpredictable
• Systematic Error: ↓ accuracy leads to measured values that differ from the truth
o Commonly referred to as bias
o Predictable
• Measured Value = True Value + Random Error + Systematic Error
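The distinction between accuracy and precision can be made concrete with a small calculation. The following is a minimal sketch, not part of the original slides: the readings are invented for illustration, the function name `describe_error` is hypothetical, and the sign convention (error = measured minus true) is an assumption.

```python
from statistics import mean, stdev

def describe_error(measurements, true_value):
    """Return (systematic_error, random_spread) for repeated measurements.

    systematic_error: mean measurement minus the true value (reflects accuracy).
    random_spread: sample standard deviation of the measurements (reflects precision).
    Sign convention assumed here: error = measured - true.
    """
    systematic_error = mean(measurements) - true_value
    random_spread = stdev(measurements)
    return systematic_error, random_spread

# Hypothetical oxygen saturation readings, invented for illustration only.
true_spo2 = 98.0
biased_but_precise = [95.0, 95.1, 94.9, 95.0]   # low accuracy, high precision
unbiased_but_noisy = [96.0, 100.0, 97.0, 99.0]  # high accuracy, low precision

bias_a, spread_a = describe_error(biased_but_precise, true_spo2)
bias_b, spread_b = describe_error(unbiased_but_noisy, true_spo2)

print(f"biased but precise: bias={bias_a:+.2f}, spread={spread_a:.2f}")
print(f"unbiased but noisy: bias={bias_b:+.2f}, spread={spread_b:.2f}")
```

The first device shows a predictable offset (systematic error), while the second scatters unpredictably around the truth (random error).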

Recruitment Bias


Overview:
• Timeline of running a study: Recruiting Participants → Collecting Data → Interpreting Results
• Bias: Systematic error in study design
• Recruitment Bias: Occurs when recruiting participants
o Study participants not representative of overall population
o Also known as selection bias
Sampling Bias:
• Members of population of interest have different probabilities of being selected
o Example: Advertising a clinical trial on an undergraduate bulletin board
• Berkson’s Bias: Occurs when participants selected from hospitalized patients
o Example: Patients hospitalized for DKA recruited for a study on a new diabetes med
Attrition Bias:
• Participants lost to follow up are not representative of overall population
o Example: Participants who do not own a car drop out of the study

Performing Bias


Overview:
• Occurs while conducting the study
Recall Bias:
• Participants’ memory of exposure history changes with knowledge of outcome
• Affects retrospective studies
• Example: Patients with lung cancer recall asbestos exposure
Measurement Bias:
• Data collected in a systematically incorrect way
o Example: A study center’s pulse oximeter measures 3% too low
• Hawthorne Effect: Behavior changes when being observed
o Example: Hand-washing increases when hospital staff are knowingly observed
Procedure Bias:
• Patients in different groups are treated differently
o Example: In a study on a new surgical technique, only treatment group patients receive surgery
Observer Bias:
• Researcher’s expectations influence data collection
• Pygmalion/Rosenthal Effect: High expectations lead to improved performance
• Example: A researcher who expects a sleep medication to work records better sleep in treatment group
Interpretation Bias


Overview:
• Occurs when interpreting results
Confounding Bias: HIGH YIELD
• A variable other than the independent variable causes a change in the dependent variable
o Example: A study claims drinking coffee is associated with lung cancer without controlling for smoking
• Effect Modification: The relationship between the independent and dependent variables depends on a third variable
o A true effect (not bias!) commonly confused with confounding bias
o Example: Oral contraceptive pills (OCP) increase risk of blood clots only in people who smoke
■ Risk of blood clots: OCP + smoking > smoking alone > OCP alone
■ If this were confounding, risk of blood clots: OCP + smoking = smoking alone > OCP alone
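One common way to check for confounding is to compare the crude association with the association inside each stratum of the suspected confounder. The sketch below uses the coffee, smoking and lung cancer example with entirely hypothetical counts chosen for illustration; the helper names are also assumptions, not taken from the slides.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

# HYPOTHETICAL counts as (lung cancer cases, group size), invented for illustration.
strata = {
    "smokers":     {"coffee": (80, 800), "no_coffee": (20, 200)},
    "non_smokers": {"coffee": (2, 200),  "no_coffee": (8, 800)},
}

# Crude analysis: ignore smoking and pool everyone.
coffee_cases = sum(s["coffee"][0] for s in strata.values())
coffee_total = sum(s["coffee"][1] for s in strata.values())
no_coffee_cases = sum(s["no_coffee"][0] for s in strata.values())
no_coffee_total = sum(s["no_coffee"][1] for s in strata.values())
crude_rr = relative_risk(coffee_cases, coffee_total, no_coffee_cases, no_coffee_total)

# Stratified analysis: compute the RR separately within each smoking stratum.
stratum_rr = {
    name: relative_risk(*s["coffee"], *s["no_coffee"]) for name, s in strata.items()
}

print(f"crude RR = {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"RR among {name} = {rr:.2f}")
```

In this made-up data the crude RR is well above 1, but it is 1 within each smoking stratum, which is the pattern expected from confounding. Under effect modification, the stratum-specific RRs would instead differ from each other.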
Lead Time Bias:
• Earlier detection incorrectly interpreted as longer survival
• Occurs when survival time is chosen as an endpoint
o Example: A new screening test detects colon cancer earlier, but disease course has not changed
Length Time Bias:
• More aggressive diseases more likely to be diagnosed based on symptoms
• Less aggressive diseases more likely to be diagnosed based on screening tests
o Example: A new screening test disproportionately detects less aggressive breast cancer

Test Your Knowledge (Question ID: 1002)

A research group has developed a new treatment for pancreatic cancer. They are interested in seeing if their new treatment is
more effective than currently available options. They recruit 100 participants who were recently diagnosed with pancreatic
cancer at one of five outpatient clinics and randomly assign half to receive their new treatment, and half to receive the current
standard of care. Some of the demographic information of the patient groups is shown in Table 1. Neither the patients nor their
care teams are aware of the group assignments. The researchers then compare 5-year survival rates between the treatment
groups. What kind of bias is most likely affecting this study?

A. Recall bias
B. Observer bias
C. Lead-time bias
D. Confounding bias (correct answer)
E. Berkson’s bias
F. Misclassification bias

                        Experiment Group   Control Group
Mean Age                     60.3              62.5
Tobacco Use                  37                34
Heavy Alcohol Use             7                40
Tumor Location: Head         23                20
Tumor Location: Tail         27                30

Table 1. Demographic information for experimental and control groups. Mean age and number of participants with
tobacco use, alcohol use, and tumor location in the head or tail of the pancreas for each group are described.
REVIEW OUTLINE

Biostatistics: Risk Quantification

1. Morbidity Frequency Measures
A. Overview
B. Prevalence
C. Incidence
2. Interpreting Prevalence and Incidence
A. Overview
B. Interpreting Population Changes
C. Interpreting Disease State
3. Morbidity Frequency Measures - Practice Question
4. Mortality Frequency Measures
A. Overview
B. Mortality Rate
C. Case Fatality Rate
5. Relative Risk
A. Overview
B. Calculating Relative Risk
C. Interpreting Relative Risk
6. Odds Ratio
A. Overview
B. Calculating Odds Ratio
C. Interpreting Odds Ratio
7. Odds Ratio and Relative Risk - Practice Question 1
8. Odds Ratio and Relative Risk - Practice Question 2
9. Additional Calculations with Relative Risk
A. Overview
B. Relative Risk Reduction
C. Absolute Risk Reduction
D. Attributable Risk
E. Number Needed to Treat
F. Number Needed to Harm
10. Additional Calculations with Relative Risk - Practice Question
Biostatistics: Risk Quantification Bootcamp.com

Morbidity Frequency Measures


Overview:
• Morbidity: Any departure from a state of health (except death)
• Important for quantifying public health status
Prevalence:
• Proportion of people currently ill
• Total # cases at a point in time / Total # of people in population HIGH YIELD
o Example: In a town of 1,000 people, 40 have diabetes. Prevalence = 4%.
• Can be calculated from a cross-sectional study
Incidence:
• Cumulative incidence: Proportion of population that is becoming ill
o # New cases in a unit of time / Total # of people at risk in population HIGH YIELD
o Must subtract people who are not at risk out of the denominator
o Example: In a town of 1,000 people, 40 have diabetes in the year 2000.
• In the year 2001, 10 more people are diagnosed with diabetes.
• Incidence of diabetes in 2001 = 10 / (1,000 - 40) = 1.04%
• Attack Rate: # People infected / # People exposed
o Example: Of 100 people at a picnic, 40 ate potato salad and 20 became sick
• Attack rate = 20/100 = 20%
• Food-specific attack rate = 20/40 = 50% (if all 20 sick people ate the potato salad)
• Incidence Rate: Number of people becoming ill per unit of time
o # New cases in a unit of time / Total person-time spent at risk
o More accurate than cumulative incidence
■ Accounts for exact timing of disease development + population changes
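o Example: Four people become ill after 4, 2, 7, and 11 months of follow-up
• Incidence rate = 4 cases per 1,176 person-months (98 person-years)
• This person-time total is consistent with 100 people followed for 12 months: (96 x 12) + 4 + 2 + 7 + 11 = 1,176
• 4.1 cases per 100 people per year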
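The morbidity measures above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the numbers from the examples; the function names are hypothetical, and the cohort size and follow-up length in the person-time example are assumptions chosen because they reproduce the stated 1,176 person-months.

```python
def prevalence(existing_cases, population):
    """Proportion of the population that currently has the disease."""
    return existing_cases / population

def cumulative_incidence(new_cases, population, not_at_risk):
    """New cases divided by the number of people actually at risk."""
    return new_cases / (population - not_at_risk)

def attack_rate(infected, exposed):
    """Proportion of exposed people who became ill."""
    return infected / exposed

def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time at risk."""
    return new_cases / person_time

# Diabetes example: 40 existing cases in a town of 1,000, then 10 new cases.
diabetes_prevalence = prevalence(40, 1_000)
diabetes_incidence = cumulative_incidence(10, 1_000, not_at_risk=40)

# Picnic example. The food-specific figure assumes all 20 sick people
# were among the 40 who ate the potato salad.
overall_attack_rate = attack_rate(20, 100)
food_specific_attack_rate = attack_rate(20, 40)

# Person-time example. ASSUMPTION: 100 people followed for 12 months, with
# the four cases leaving the at-risk pool at the month they became ill.
months_until_illness = [4, 2, 7, 11]
cohort_size, follow_up_months = 100, 12
healthy_people = cohort_size - len(months_until_illness)
person_months = healthy_people * follow_up_months + sum(months_until_illness)
rate_per_100_person_years = (
    incidence_rate(len(months_until_illness), person_months / 12) * 100
)

print(f"prevalence = {diabetes_prevalence:.1%}")
print(f"cumulative incidence = {diabetes_incidence:.2%}")
print(f"attack rate = {overall_attack_rate:.0%}, food-specific = {food_specific_attack_rate:.0%}")
print(f"person-months = {person_months}, rate = {rate_per_100_person_years:.1f} per 100 person-years")
```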

Interpreting Prevalence and Incidence


Overview:
• Prevalence: Proportion of people currently ill
• Incidence: Measure of people who are becoming ill
o Cumulative incidence: Proportion of population becoming ill
o Incidence rate: # Of people becoming ill per unit of time
• Prevalence = Incidence Rate x Avg Duration of Disease

Interpreting Population Changes:


• ↑ Prevalence
o ↑ Incidence: Acquiring disease is becoming more common
■ Example: HIV outbreak (more people getting sick and staying sick)
o =/↓ Incidence: Avg duration of disease is increasing
■ Example: A new cancer treatment extends life expectancy by 10 years
• =/↓ Prevalence
o ↑ Incidence: Rapidly healing or fatal disease (+/- seasonality)
■ Example: Annual flu incidence > flu prevalence measured in summer
o =/↓ Incidence: Effective primary prevention
■ Example: Opening a needle exchange program reduces Hep C cases

Interpreting Disease State:


• Chronic diseases: Prevalence > Incidence
• Acute diseases: Prevalence ≈ Incidence
o Rapidly healing or deadly diseases: Prevalence < Incidence
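The relationship Prevalence = Incidence Rate x Avg Duration of Disease explains why chronic and short-lived diseases behave differently. The sketch below is illustrative only: the incidence rates and durations are hypothetical, and the relationship is treated as an approximation that holds best when the disease is in a steady state and prevalence is low.

```python
def approximate_prevalence(incidence_rate_per_year, avg_duration_years):
    """Steady-state approximation: prevalence = incidence rate x average duration."""
    return incidence_rate_per_year * avg_duration_years

# HYPOTHETICAL values, invented for illustration.
chronic_incidence = 0.01   # 1 new case per 100 person-years
chronic_duration = 10      # average disease duration of 10 years
short_incidence = 0.20     # 20 new cases per 100 person-years
short_duration = 7 / 365   # average disease duration of about one week

chronic_prevalence = approximate_prevalence(chronic_incidence, chronic_duration)
short_prevalence = approximate_prevalence(short_incidence, short_duration)

print(f"chronic disease: prevalence {chronic_prevalence:.1%} vs incidence {chronic_incidence:.1%} per year")
print(f"short-lived disease: prevalence {short_prevalence:.2%} vs incidence {short_incidence:.1%} per year")
```

A long duration pushes prevalence above incidence, while a very short duration (rapid recovery or death) keeps prevalence below incidence.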
Test Your Knowledge (Question ID: 1003, Item 1 of 2)

A local public health agency is studying atherosclerosis in their town’s adult population. They discover that in the year 2020,
8,500 of the 35,000 adults living in the town are currently diagnosed with atherosclerosis. In 2021, they find that 9,100 adults
living in the town now have a diagnosis of atherosclerosis. They decide to run a series of public service announcements on
healthy diet changes to prevent the development of atherosclerosis.

What is the annual cumulative incidence of atherosclerosis?

A. 2.3% (correct answer)
B. 26%
C. 13%
D. 24%
E. 7.1%
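The arithmetic behind the cumulative incidence in this question can be checked with a short calculation. This sketch assumes a stable population of 35,000 and that none of the 2020 cases died, recovered, or moved away, so the rise in diagnoses equals the number of new cases.

```python
population = 35_000
cases_2020 = 8_500
cases_2021 = 9_100

# ASSUMPTION: all 2020 cases are still counted in 2021, so the
# difference in case counts is the number of new diagnoses.
new_cases = cases_2021 - cases_2020

# People who already have the disease are not at risk of developing it.
at_risk = population - cases_2020

annual_cumulative_incidence = new_cases / at_risk
print(f"{new_cases} new cases / {at_risk} at risk = {annual_cumulative_incidence:.1%}")
```

Dividing by the whole population instead (600 / 35,000) would understate the incidence, because the denominator would include people who are no longer at risk.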
Test Your Knowledge (Question ID: 1003, Item 2 of 2)

A local public health agency is studying atherosclerosis in their town’s adult population. They discover that in the year 2020,
8,500 of the 35,000 adults living in the town are currently diagnosed with atherosclerosis. In 2021, they find that 9,100 adults
living in the town now have a diagnosis of atherosclerosis. They decide to run a series of public service announcements on
healthy diet changes to prevent the development of atherosclerosis.

What are the most likely changes to the prevalence and incidence of atherosclerosis in the town in 2022, assuming the public
health agency's interventions are effective?

A. The prevalence decreases, the incidence stays the same
B. The prevalence stays the same, the incidence increases
C. The prevalence increases, the incidence decreases
D. The prevalence stays the same, the incidence stays the same
E. The prevalence stays the same, the incidence decreases (correct answer)

Mortality Frequency Measures


Overview:
• Mortality: State of being dead
• How much of the population has died?

Mortality Rate:
• Cumulative incidence of death in entire population
• Number of deaths in a population in a given time interval / Total number of people in population
• Example: In 2000, 10 people in a town of 1000 die. The mortality rate is 1%.

Case Fatality Rate:


• Cumulative incidence of death in patients with disease
• Number of cases in which patients died in a given time interval / Total number of cases
• Example:
o In 2000, 120 cases of Salmonella infection were reported in a town of 1000 people.
o 10 of those patients died.
o The case fatality rate is 8.3%
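The two mortality measures differ only in their denominators, which a short calculation makes explicit. This is a minimal sketch using the numbers from the examples above; the function names are hypothetical.

```python
def mortality_rate(deaths, population):
    """Deaths in a time interval divided by the whole population."""
    return deaths / population

def case_fatality_rate(deaths_among_cases, total_cases):
    """Deaths among people with the disease divided by all cases of the disease."""
    return deaths_among_cases / total_cases

# Town example: 10 deaths among 1,000 residents in the year 2000.
town_mortality = mortality_rate(10, 1_000)

# Salmonella example: 10 deaths among 120 reported cases.
salmonella_cfr = case_fatality_rate(10, 120)

print(f"mortality rate = {town_mortality:.1%}")
print(f"case fatality rate = {salmonella_cfr:.1%}")
```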

Relative Risk
Overview:
• Longitudinal studies assess association between exposure + outcome
• Need to quantify strength of association
• Risk: Probability of an event occurring (outcome of interest / all possible outcomes in population at risk)
o Example: 5,000 people who take aspirin and 5,000 people who do not are recruited for a study
■ 300 people have a first-time heart attack during the 10 years they are followed
■ Overall risk = 300/10,000 = 3%
• Risk (and RR) only calculated from longitudinal cohort studies (prospective or retrospective) HIGH YIELD
o Need to know the # people at risk of developing the outcome
o In a case-control study, all participants have already developed the outcome
Calculating Relative Risk:
• Compares the risk of an event occurring based on exposure
• RR = Risk_Exposed / Risk_Not Exposed
o Risk_Exposed = (# People w/ exposure and w/ outcome) / (# People w/ exposure)
■ Example: 120 heart attacks among 5,000 people taking aspirin → Risk = 0.024
o Risk_Not Exposed = (# People w/out exposure and w/ outcome) / (# People w/out exposure)
■ Example: 180 heart attacks among 5,000 people not taking aspirin → Risk = 0.036
o Example: RR = 0.024 / 0.036 = 0.667
• RR = [a/(a+b)] / [c/(c+d)] HIGH YIELD
o a = exposed w/ outcome, b = exposed w/out outcome, c = not exposed w/ outcome, d = not exposed w/out outcome
Interpreting Relative Risk:
• RR = 1 → No difference in risk based on exposure
• RR > 1 → Exposure ↑ risk
• RR < 1 → Exposure ↓ risk
o Example: People taking aspirin are 0.667 times as likely to have a heart attack
■ In other words, people not taking aspirin are 1.5x as likely to have a heart attack
• RR ≠ 1 does not imply causation
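The relative risk formula can be applied directly to the aspirin example. The sketch below assumes the usual 2x2 table layout (a = exposed with outcome, b = exposed without outcome, c = unexposed with outcome, d = unexposed without outcome); the function name is hypothetical.

```python
def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)] for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Aspirin example: 120 heart attacks among 5,000 aspirin users,
# 180 heart attacks among 5,000 non-users.
rr_aspirin = relative_risk(a=120, b=5_000 - 120, c=180, d=5_000 - 180)

# Inverting the RR swaps the reference group.
rr_no_aspirin = 1 / rr_aspirin

print(f"RR (aspirin vs no aspirin) = {rr_aspirin:.3f}")
print(f"RR (no aspirin vs aspirin) = {rr_no_aspirin:.2f}")
```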

Odds Ratio


Overview:
• Longitudinal studies assess association between exposure + outcome
• Cannot calculate risk in a case-control study (always retrospective)
• Can calculate odds: Probability of event occurring / Probability of event not occurring
o Example: 5,000 people who have had a heart attack and 5,000 who have not are recruited for a study
■ 4,000 of these people report taking a daily aspirin in the 10 years before their heart attack
■ The overall odds of taking aspirin is 4,000/6,000 = 0.667
Calculating Odds Ratio:
• Compares the odds of having had a prior exposure based on outcome history
• Primarily used for case-control studies
• OR = (Odds_Exposure | Outcome) / (Odds_Exposure | No Outcome)
o Odds_Exposure | Outcome = (% People w/ exposure and w/ outcome) / (% People w/out exposure and w/ outcome)
■ Example: Among people who have had a heart attack, 1,500 took aspirin and 3,500 did not
• Odds = 0.3/0.7 = 0.429
o Odds_Exposure | No Outcome = (% People w/ exposure and w/out outcome) / (% People w/out exposure and w/out outcome)
■ Example: Among people who have not had a heart attack, 2,500 took aspirin and 2,500 did not
• Odds = 0.5/0.5 = 1
o Example: OR = 0.429/1 = 0.429
o Shortcut: (a x d)/(b x c) HIGH YIELD
Interpreting Odds Ratio:
• OR = 1 → Odds of exposure the same regardless of outcome
• OR > 1 → Odds of exposure ↑ if outcome occurred
• OR < 1 → Odds of exposure ↓ if outcome occurred
o Example: People who have had a heart attack have 0.429x the odds of having taken aspirin
■ People who have not had a heart attack have 2.33x the odds of having taken aspirin
• OR ≠ 1 does not imply causation
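The odds ratio for the aspirin example can be computed either from the two odds or from the cross-product shortcut. This sketch assumes the usual 2x2 layout (a = exposed with outcome, b = exposed without outcome, c = unexposed with outcome, d = unexposed without outcome); the function name is hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_exposure_given_outcome = a / c
    odds_exposure_given_no_outcome = b / d
    return odds_exposure_given_outcome / odds_exposure_given_no_outcome

# Aspirin example: among 5,000 people with a heart attack, 1,500 took aspirin;
# among 5,000 people without a heart attack, 2,500 took aspirin.
a, b, c, d = 1_500, 2_500, 3_500, 2_500

or_aspirin = odds_ratio(a, b, c, d)
or_shortcut = (a * d) / (b * c)  # cross-product shortcut
or_inverse = 1 / or_aspirin      # odds of aspirin use in controls relative to cases

print(f"OR = {or_aspirin:.3f}, shortcut = {or_shortcut:.3f}, inverse = {or_inverse:.2f}")
```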
Test Your Knowledge (Question ID: 1004, Item 1 of 2)

A research group recruits 2,000 participants with skin cancer and 2,000 participants without skin cancer. Each participant fills
out a survey about their time spent in the sun, their use of sunscreen, and their use of UV tanning beds. They find an
association between skin cancer and use of UV tanning beds. These results are shown below.

Which of the following measures of association are they most likely to report in their published work?

A. Relative risk
B. Number needed to harm
C. Incidence rate
D. Pearson correlation coefficient
E. Odds ratio (correct answer)

                      Skin Cancer   No Skin Cancer
Tanning Bed Use           576            133
No Tanning Bed Use       1424           1867

Test Your Knowledge (Question ID: 1004, Item 2 of 2)

A research group recruits 2,000 participants with skin cancer and 2,000 participants without skin cancer. Each participant fills
out a survey about their time spent in the sun, their use of sunscreen, and their use of UV tanning beds. They find an
association between skin cancer and use of UV tanning beds. These results are shown below.

Which of the following calculations represents the odds ratio for this study?

A. (576/(576+133))/(1424/(1424+1867))
B. 1/((576/(576+133))/(1424/(1424+1867)))
C. (576+1424)/(576+133+1424+1867)
D. (576/1424)/(133/1867) (correct answer)
E. (576x1424)/(133x1867)

                      Skin Cancer   No Skin Cancer
Tanning Bed Use           576            133
No Tanning Bed Use       1424           1867
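The answer choices for the odds ratio can be compared numerically. The sketch below evaluates option D, the cross-product shortcut, and option E for the tanning bed table; the variable names are hypothetical and follow the usual a, b, c, d cell labels.

```python
# 2x2 table: rows are exposure (tanning bed use), columns are outcome (skin cancer).
a, b = 576, 133     # tanning bed use: skin cancer, no skin cancer
c, d = 1424, 1867   # no tanning bed use: skin cancer, no skin cancer

# Option D: odds of exposure among cases divided by odds of exposure among controls.
option_d = (a / c) / (b / d)

# Cross-product shortcut (a x d)/(b x c), which should agree with option D.
shortcut = (a * d) / (b * c)

# Option E multiplies the wrong cells together, so it gives a different number.
option_e = (a * c) / (b * d)

print(f"option D = {option_d:.2f}, shortcut = {shortcut:.2f}, option E = {option_e:.2f}")
```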


Answer: B
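A minimal Python sketch of the relative risk for the gallstones table is given here. The helper name `relative_risk` and its argument order are assumptions made for illustration.

```python
def relative_risk(a, b, c, d):
    """Relative risk for a 2x2 cohort table.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# High fat diet and gallstones data from the question above
gallstone_rr = relative_risk(a=491, b=3719, c=624, d=3452)
direction = "decreased" if gallstone_rr < 1 else "increased"
print(round(gallstone_rr, 2), direction)
```

A relative risk below 1 means the exposed group had the lower risk, which matches option B.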


Biostatistics: Risk Quantification Bootcamp.com

Additional Calculations with Relative Risk


Overview:
• Help contextualize relative risk
Relative Risk Reduction:
• Proportion of (desired) outcome explained by (beneficial) exposure
• RRR = 1 - RR
• Example: The RR of getting the flu if vaccinated vs not vaccinated is 0.333. The RRR is 0.667.
Absolute Risk Reduction:
• Amount of (desired) outcome explained by (beneficial) exposure
• ARR = Risk(Unexposed) - Risk(Exposed) = [c/(c+d)] - [a/(a+b)]
• Example: The ARR of the flu vaccine is 0.2.
Attributable Risk:
• Amount of (undesired) outcome explained by (harmful) exposure
• AR = Risk(Exposed) - Risk(Unexposed) = [a/(a+b)] - [c/(c+d)]
• %AR = (RR - 1)/RR × 100
• Example: The RR of having cirrhosis if a heavy drinker is 15. The AR is 0.14 and the %AR is 93%.
Number Needed to Treat:
• # Patients that need to be treated for 1 patient to benefit
• NNT = 1/ARR HIGH YIELD
• ↓ NNT = More effective treatment
• Example: # People needed to vaccinate to prevent 1 flu case is 1/0.2 = 5.
o Always rounded up!
Number Needed to Harm:
• # Patients that need to be exposed for 1 patient to be harmed
• NNH = 1/AR
• ↑ NNH = Safer exposure
• Example: # People that need to drink heavily for 1 person to develop cirrhosis is 7.
o Always rounded down!
Test Your Knowledge (Question ID: 1006, Item 1 of 2)

300 patients with rectal cancer undergoing treatment with chemotherapy are recruited into a clinical study assessing the
effectiveness of a new drug designed to reduce chemotherapy-induced side effects. Half of the patients are assigned to
receive the new drug, and half are assigned to receive a placebo. Among the patients in the new drug group, 68 participants
report experiencing side effects following chemotherapy. Among the patients in the placebo group, 103 patients report
experiencing side effects.

What is the relative risk reduction for side effects among patients receiving the new drug?

A. 0.782
B. 0.660
C. 0.204
D. 0.340
Answer: D
Test Your Knowledge (Question ID: 1006, Item 2 of 2)

300 patients with rectal cancer undergoing treatment with chemotherapy are recruited into a clinical study assessing the
effectiveness of a new drug designed to reduce chemotherapy-induced side effects. Half of the patients are assigned to
receive the new drug, and half are assigned to receive a placebo. Among the patients in the new drug group, 68 participants
report experiencing side effects following chemotherapy. Among the patients in the placebo group, 103 patients report
experiencing side effects.

What is the number of patients needed to treat for one patient to have their side effects improved by the new drug?

A. 2
B. 7
C. 5
D. 13
Answer: C
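The two chemotherapy questions use the same chain of calculations (RR, then RRR and ARR, then NNT). A small Python sketch of that chain follows; the function name `risk_summary` and the dictionary keys are hypothetical, and the NNT is rounded up as the notes specify.

```python
import math


def risk_summary(events_treated, n_treated, events_control, n_control):
    """Return RR, RRR, ARR and NNT for a two-arm trial."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    rr = risk_treated / risk_control      # relative risk
    rrr = 1 - rr                          # relative risk reduction
    arr = risk_control - risk_treated     # absolute risk reduction
    nnt = math.ceil(1 / arr)              # number needed to treat, rounded up
    return {"RR": rr, "RRR": rrr, "ARR": arr, "NNT": nnt}


# 150 patients per arm: 68 events on the new drug, 103 on placebo
summary = risk_summary(68, 150, 103, 150)
for name, value in summary.items():
    print(name, round(value, 3))
```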
REVIEW OUTLINE

Biostatistics: Diagnostic Tests
1. Sensitivity and Specificity
A. Overview
B. Sensitivity
C. Specificity
2. Positive and Negative Predictive Values
A. Overview
B. Positive Predictive Values
C. Negative Predictive Values
3. Diagnostic Test Thresholds
A. Overview
B. Widening the Inclusion Criteria
C. Narrowing the Inclusion Criteria
D. Receiver Operating Characteristic Curves
4. Sensitivity, Specificity, PPV, and NPV - Practice Question
5. Likelihood Ratio
A. Overview
B. LR+
C. LR-
6. Likelihood Ratio - Practice Question

Biostatistics: Diagnostic Tests Bootcamp.com

Sensitivity and Specificity


Overview:
• Measures that describe the trustworthiness of diagnostic tests
• Intrinsic properties of the test HIGH YIELD
• Determined using “gold standard” diagnostic tests
• Example: How reliable is the result of a urine hCG test compared to blood?

Sensitivity:
• True-positive rate
• Probability that a person who has the condition will test positive
• Sensitivity = TP/(TP+FN) HIGH YIELD
• 1 - Sensitivity = False-negative rate
• ↑ Sensitivity = ↓ FN → Important for screening
• Mnemonic: Sn-N-Out (a Sensitive test, when Negative, rules Out)
• Example: Of 16 pregnant people, 13 test positive. Sensitivity = 0.81.

Specificity:
• True-negative rate
• Probability that a person who does not have the condition will test negative
• Specificity = TN/(TN+FP) HIGH YIELD
• 1 - Specificity = False-positive rate
• ↑ Specificity = ↓ FP → Important for diagnostic confirmation
• Mnemonic: Sp-P-In (a Specific test, when Positive, rules In)
• Example: Of 16 non-pregnant people, 15 test negative. Specificity = 0.94.

Positive and Negative Predictive Values


Overview:
• Measures that describe the trustworthiness of diagnostic tests
• Useful when test result known, but true condition status unknown
• Vary with disease prevalence HIGH YIELD
o ↑ Prevalence = ↑ Pre-test probability
o Pre-test probability: Estimated probability of patient condition before test result is known
Positive Predictive Values:
• Probability that a person with + test has condition
• PPV = TP/(TP+FP) HIGH YIELD
• ↑ PPV = ↑ Probability person with + test has condition
• ↑ Pre-test probability → ↑ PPV
• Example: Of 14 people with + pregnancy test, 13 are pregnant. PPV = 93%.
Negative Predictive Values:
• Probability that a person with - test does not have condition
• NPV = TN/(TN+FN) HIGH YIELD
• ↑ NPV = ↑ Probability person with - test does not have condition
• ↑ Pre-test probability → ↓ NPV
• Example: Of 18 people with - pregnancy test, 15 are not pregnant. NPV = 83%.
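The pregnancy-test examples on these slides imply one 2x2 table (TP = 13, FN = 3, FP = 1, TN = 15). A short Python sketch, with the hypothetical helper name `diagnostic_metrics`, computes all four measures from those counts.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # P(condition | + test)
        "npv": tn / (tn + fn),          # P(no condition | - test)
    }


# Counts implied by the pregnancy-test examples in the notes
metrics = diagnostic_metrics(tp=13, fn=3, fp=1, tn=15)
for name, value in metrics.items():
    print(name, round(value, 2))
```

Sensitivity and specificity use the column totals (true condition status), while PPV and NPV use the row totals (test result), which is why only the latter pair shifts with prevalence.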

Diagnostic Test Thresholds


Overview:
• Many diagnostic tests are quantitative
• Cut-off value: Defines the inclusion criteria for a + or - result
o Affects sensitivity, specificity, PPV, and NPV values
• High value tests: ↑ Value → + Result
o Example: Cholesterol in hypercholesterolemia
• Low value tests: ↓ Value → + Result
o Example: Hemoglobin in anemia
Widening the Inclusion Criteria:
• ↓ Cut-off value for a high value test
• ↑ Cut-off value for a low value test
• Effects on test characteristics:
o ↓ False - → ↑ Sensitivity & ↑ NPV
o ↑ False + → ↓ Specificity & ↓ PPV
Narrowing the Inclusion Criteria:
• ↑ Cut-off value for a high value test
• ↓ Cut-off value for a low value test
• Effects on test characteristics:
o ↓ False + → ↑ Specificity & ↑ PPV
o ↑ False - → ↓ Sensitivity & ↓ NPV

Receiver Operating Characteristic Curves:
• Used to assess test performance
• Compares sensitivity and specificity at different thresholds
• Plots true-positive rate against false-positive rate
• ↑ Area under the curve = ↑ Performance
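The trade-off between sensitivity and specificity as the cut-off moves can be illustrated with a small Python sketch. The measurements below are made up for illustration only, and the helper name `sens_spec` is an assumption; the test is treated as a high value test, so lowering the cut-off widens the inclusion criteria.

```python
def sens_spec(diseased, healthy, cutoff):
    """High value test: a value >= cutoff counts as a positive result."""
    tp = sum(v >= cutoff for v in diseased)
    fn = len(diseased) - tp
    fp = sum(v >= cutoff for v in healthy)
    tn = len(healthy) - fp
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical measurements (not from the source)
diseased = [3.1, 4.2, 4.8, 5.5, 6.0, 7.3]
healthy = [1.0, 1.8, 2.5, 3.0, 3.6, 4.5]

# Lowering the cut-off raises sensitivity and lowers specificity
for cutoff in (5.0, 4.0, 3.0):
    sensitivity, specificity = sens_spec(diseased, healthy, cutoff)
    print(cutoff, round(sensitivity, 2), round(specificity, 2))
```

Each cut-off gives one (1 - specificity, sensitivity) point; plotting those points across all cut-offs traces the ROC curve described above.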
Test Your Knowledge (Question ID: 1007, Item 1 of 2)

Researchers are studying the use of prostate-specific antigen as a screening test for prostate cancer. Study participants with
and without a prostate cancer diagnosis as confirmed by transrectal ultrasound-guided biopsy undergo prostate-specific
antigen screening. A prostate-specific antigen concentration of 4 ng/mL qualifies as a positive test result. The results are
shown in the table below.

What is the sensitivity of the prostate-specific antigen test?

A. 45.4%
B. 71.5%
C. 23.7%
D. 90.7%
E. 18.1%

            PSA > 4 ng/mL   PSA < 4 ng/mL
Biopsy +         118             379
Biopsy -          47             456
Answer: C
Test Your Knowledge (Question ID: 1007, Item 2 of 2)

Researchers are studying the use of prostate-specific antigen as a screening test for prostate cancer. Study participants with
and without a prostate cancer diagnosis as confirmed by transrectal ultrasound-guided biopsy undergo prostate-specific
antigen screening. A prostate-specific antigen concentration of 4 ng/mL qualifies as a positive test result. The results are
shown in the table below.

Which of the following statements correctly describes the effects of changing the definition of a positive test result to 3 ng/mL?

A. The negative predictive value will increase.
B. The number of false positives will decrease.
C. The sensitivity will decrease.
D. The positive predictive value will increase.
E. The specificity will increase.

            PSA > 4 ng/mL   PSA < 4 ng/mL
Biopsy +         118             379
Biopsy -          47             456
Answer: A
Likelihood Ratio

Overview:
• Pre-test probability: Estimated probability of patient condition before test result is known
o Pre-test odds: Pre-test probability / (1 - Pre-test probability)
• Post-test probability: Estimated probability of patient condition after test result is known
o Post-test odds: Post-test probability / (1 - Post-test probability)
• Likelihood ratio is useful for assessing the clinical utility of a test
• Compares likelihood that a person with a specific test result has or does not have the condition
o Calculated individually for both positive and negative test results
• Intrinsic to the test (not affected by prevalence)
• Pre-test odds × Likelihood ratio = Post-test odds

LR+:
• For a person who tests +, compares likelihood that the person does or does not have the condition
• LR+ = True-positive rate / False-positive rate
o LR+ > 1 → Person who has condition more likely to test + than person who does not
o LR+ > 10 → Test is highly specific
• Example: The TP rate of a pregnancy test is 0.81 and the FP rate is 0.06. LR+ = 13.5
o Person who is pregnant is 13.5x as likely to have a + test compared to person who is not

LR-:
• For a person who tests -, compares likelihood that the person does or does not have the condition
• LR- = False-negative rate / True-negative rate
o LR- < 1 → Person who does not have condition more likely to test - than someone who does
o LR- < 0.1 → Test is highly sensitive
• Example: The FN rate of a pregnancy test is 0.19 and the TN rate is 0.94. LR- = 0.2
o Person who is pregnant is 0.2x as likely to have a - test compared to person who is not
Test Your Knowledge (Question ID: 1008)

A 14-year-old male presents to his primary care physician in January with fevers, myalgias, and rhinorrhea for the past two
days. Several of his teammates on the baseball team have had similar symptoms in the past week. The physician estimates
that this patient’s pre-test odds of influenza are 2.33. He then orders a rapid antigen influenza test known to have a specificity of
0.90 and a sensitivity of 0.78. The test comes back positive for influenza. What are the patient’s post-test odds of having
influenza?

A. 90.00
B. 18.17
C. 10.13
D. 2.02
E. 7.80
Answer: B
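The influenza question applies "pre-test odds × likelihood ratio = post-test odds" with LR+ built from sensitivity and specificity. A minimal Python sketch follows; the function names are hypothetical, and the final conversion from odds to probability is an extra step not asked for in the question.

```python
def lr_positive(sensitivity, specificity):
    """LR+ = true-positive rate / false-positive rate."""
    return sensitivity / (1 - specificity)


def lr_negative(sensitivity, specificity):
    """LR- = false-negative rate / true-negative rate."""
    return (1 - sensitivity) / specificity


# Values from the influenza question above
pre_test_odds = 2.33
post_test_odds = pre_test_odds * lr_positive(sensitivity=0.78, specificity=0.90)

# Odds convert back to probability as odds / (1 + odds)
post_test_probability = post_test_odds / (1 + post_test_odds)
print(round(post_test_odds, 2), round(post_test_probability, 2))
```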
REVIEW OUTLINE

Biostatistics: Statistical Distributions
1. Measures of Central Tendency
A. Overview
B. Mean
C. Median
D. Mode
2. Measures of Dispersion
A. Overview
B. Range
C. Standard Deviation
D. Variance
E. Standard Error of the Mean
3. Normal and Non-normal Distributions
A. Overview
B. Normal Curve
C. Non-normal Distributions
4. Statistical Distributions - Practice Question 1
5. Statistical Distributions - Practice Question 2

Biostatistics: Statistical Distributions Bootcamp.com

Measures of Central Tendency


Overview:
• Describe the typical or average value of a dataset
Mean:
• Sum of all data values / # of values
• Advantages:
o Accounts for all values in a dataset
• Disadvantages:
o Most affected by outliers
Median:
• Middle value of a dataset sorted from least to greatest
o Average of the two middle values if dataset has even # of values
• Advantages:
o Minimally affected by outliers
• Disadvantages:
o Ignores all but the middle value of a dataset
Mode:
• Most common value
• Advantages:
o Least affected by outliers
o Can be used for categorical data
• Disadvantages:
o Multiple or no mode values may occur
o May not be representative
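The effect of an outlier on each measure can be seen with Python's standard `statistics` module. The dataset below is hypothetical and chosen so that one extreme value (40) pulls the mean far from the median and mode.

```python
import statistics

# Hypothetical dataset with one outlier (40); not from the source
values = [2, 3, 3, 4, 5, 40]

mean_value = statistics.mean(values)      # pulled upward by the outlier
median_value = statistics.median(values)  # average of the two middle values (3 and 4)
mode_value = statistics.mode(values)      # most common value

print(mean_value, median_value, mode_value)
```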

Measures of Dispersion
Overview:
• Describe how much values in a dataset differ from the average
Range:
• Range = Largest value - Smallest value
• Sensitive to outliers
• Example: The final exam scores in a class are 83%, 90%, 78%, 94%, and 74%.
o The range is 94 - 74 = 20
Standard Deviation:
• Calculates the average difference between each value and the mean
• Denoted by σ or SD
• σ = √[(Σ[x - μ]²)/n]
• ↑ σ = More variability in a dataset
• For a dataset that follows a normal distribution: HIGH YIELD
o Mean ± 1σ → 68% of the sample
o Mean ± 2σ → 95% of the sample
o Mean ± 3σ → 99.7% of the sample
• Example: The mean is 83.8 and the SD is 7.39
Variance:
• Calculates the average squared difference between each value and the mean
• Denoted by σ²
• σ² = (Σ[x - μ]²) / n
• Example: The variance is 7.39² = 54.6
Standard Error of the Mean:
• Estimates the difference between a sample’s mean and the true population mean
• SEM = σ/√n
• SEM ↑ when σ ↑ or n ↓
• Example: The SEM is 3.305
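The exam-score examples above can be reproduced with Python's standard `statistics` module. This sketch assumes the population formulas (dividing by n), which is what the slide's formula for σ uses; the variable names are illustrative.

```python
import math
import statistics

# Final exam scores from the Range example
scores = [83, 90, 78, 94, 74]

score_range = max(scores) - min(scores)
mean_score = statistics.mean(scores)
sd = statistics.pstdev(scores)           # population SD: divides by n
variance = statistics.pvariance(scores)  # population variance: SD squared
sem = sd / math.sqrt(len(scores))        # standard error of the mean

print(score_range, round(mean_score, 1), round(sd, 2), round(variance, 1), round(sem, 2))
```

The SEM computed from the unrounded SD is about 3.30; the 3.305 quoted in the notes is consistent with dividing the rounded SD of 7.39 by √5.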

Normal and Non-normal Distributions


Overview:
• Describe the probability of each possible value occurring in a population/sample
• Powerful quantitative way of visualizing and analyzing data
Normal Distribution:
• Defined by the mean and the SD
• Mean = Median = Mode
• Symmetrical
• For a dataset that follows a normal distribution: HIGH YIELD
o Mean ± 1σ → 68% of the sample
o Mean ± 2σ → 95% of the sample
o Mean ± 3σ → 99.7% of the sample
• Example: Height
Non-normal Distributions:
• Bimodal: A distribution with two distinct modes
o Suggests presence of two different populations
o Example: Peacock tail lengths
• Positive skew: Asymmetry with longer tail on the right
o Mean > Median > Mode
o Example: Income distribution
• Negative skew: Asymmetry with longer tail on the left
o Mode > Median > Mean
o Example: Average age at death
Test Your Knowledge (Question ID: 1009)

A graduate student is studying how much time adults living in her town spend exercising each week. She recruits a sample of
2,000 adults between the ages of 18 and 65, and asks them to self-report how much time they spent exercising each week for
three months. She then calculates the average amount of time exercising per week for each individual. She finds that the
mean amount of time spent exercising per week for the entire sample is 2.33 hours, and the variance is 0.43 hours squared.

Assuming this sample is normally distributed, approximately how many people in the sample spend fewer than 1 hour per
week exercising?

A. 3
B. 100
C. 320
D. 50
E. 640
Answer: D
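The reasoning behind this answer is that 1 hour lies about 2 standard deviations below the mean, so roughly 2.5% of the sample falls below it. A short Python sketch compares that rule-of-thumb estimate with the value from the standard library's `NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mean_hours, variance, n = 2.33, 0.43, 2000
sd = math.sqrt(variance)        # about 0.66 hours
z = (1 - mean_hours) / sd       # 1 hour is about 2 SD below the mean

# 68-95-99.7 rule: 95% lie within 2 SD, so 2.5% lie in each tail
rule_estimate = n * (1 - 0.95) / 2

# Normal CDF for comparison with the rule of thumb
cdf_estimate = n * NormalDist(mean_hours, sd).cdf(1)

print(round(z, 2), round(rule_estimate), round(cdf_estimate))
```

The CDF-based count is somewhat lower than the rule-of-thumb count because the rule rounds 1.96 SD up to 2 SD; the question asks for the approximate figure.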
Test Your Knowledge (Question ID: 1010)

A hospital administration team is collecting data on inpatient stay durations at a local hospital. The results of their study are
shown in the graph below.

Which of the following statements best describes this dataset?

A. The mean of this dataset is less than the median
B. The mode of this dataset is 6
C. The mode and the median of this dataset are not equal
D. This dataset is negatively skewed
E. The mode of this dataset is greater than the mean

[Graph: Duration of Inpatient Hospital Stay (Days)]


Answer: C


REVIEW OUTLINE

Biostatistics: Statistical Testing
1. Hypothesis Testing
A. Overview
B. Null Hypothesis
C. Alternative Hypothesis
D. Interpreting Hypothesis Testing Results
2. Testing Errors
A. Overview
B. Type I Errors
C. Type II Errors
3. Common Types of Statistical Tests - Nominal Independent Variables
A. Overview
B. t-Test
C. ANOVA
D. Chi-Square Test
4. Common Types of Statistical Tests - Interval Independent Variables
A. Overview
B. Regression Analysis
C. Pearson’s Correlation Coefficient
5. Hypothesis Testing and Errors - Practice Question
6. Confidence Intervals
A. Overview
B. Calculating Confidence Intervals
C. Interpreting Confidence Intervals
D. Meta-Analysis
7. Confidence Intervals - Practice Question
Biostatistics: Statistical Testing Bootcamp.com

Hypothesis Testing
Overview:
• Hypothesis: Proposed explanation based on limited evidence
o Developed prior to start of the study
o Example: ↑ Brain natriuretic peptide (BNP) levels are associated with heart failure (HF)
• Hypothesis Testing: Analysis of whether or not hypothesis is supported
o Conducted after obtaining study results from a sample
o Begins with assumption that hypothesis is wrong
o Calculates the likelihood of obtaining those results in a different sample
Null Hypothesis:
• No relationship/difference exists between variables
• Represented by the symbol H₀
• Example: H₀ = BNP levels of people w/ HF = BNP levels of people w/out HF
Alternative Hypothesis:
• Relationship/difference exists between variables
• Represented by the symbol H₁
• Example: H₁ = BNP levels of people w/ HF > BNP levels of people w/out HF
Interpreting Hypothesis Testing Results:
• p-value: Probability of obtaining results equal to or more extreme than those observed, assuming H₀ is true
o Calculated based on known theoretical distributions
o ↑ p-value → ↑ Likelihood of obtaining these results
• α: Arbitrarily chosen cut-off value
o Often 0.05 or 0.01
• If p-value > α, fail to reject H₀
• If p-value ≤ α, reject H₀ HIGH YIELD
o Provides statistically significant support for H₁
o Statistical significance ≠ Clinical significance

Testing Errors
Overview:
• Results of hypothesis testing from sample may not reflect truth in population
• Null hypothesis (H₀): No relationship exists between the variables
• Alternative hypothesis (H₁): A relationship exists between variables
• Hypothesis testing → Reject H₀ or fail to reject H₀
Type I Errors:
• Incorrectly rejecting H₀ HIGH YIELD
• False positive
• Example: Stating BNP levels are ↑ in people w/ HF when they are not
• Alpha (α): Probability of making a Type I Error
o Pre-set value (often 0.05)
Type II Errors:
• Incorrectly failing to reject H₀ HIGH YIELD
• False negative
• Example: Stating BNP levels are not ↑ in people w/ HF when they are
• Beta (β): Probability of making a Type II Error
• Power: Probability of correctly rejecting H₀
o Power = 1 - β
o ↑ Power by
■ ↑ α
■ ↑ Sample size
■ ↑ Expected effect size
■ ↑ Precision of measurement

The Truth in the Population
                     H₁ true                      H₀ true
Reject H₀            Correct Decision (1 - β)     Type I Error (α)
Fail to Reject H₀    Type II Error (β)            Correct Decision (1 - α)

Common Types of Statistical Tests - Nominal Independent Variables


Overview:
• Different tests are appropriate for different types of data
• Independent Variable: The hypothesized cause
• Dependent Variable: The hypothesized effect
• Interval Variable: Numerical variable where a difference in value is meaningful
• Nominal Variable: Categorical variable describing mutually exclusive, non-ordered groups
t-Test:
• Independent Variable: 1 nominal variable
• Dependent Variable: 1 interval variable
• Goal: Analyze difference between the means of 2 independent groups HIGH YIELD
• Example: Comparing BNP levels in people w/ and w/out HF
• Special Cases:
o Paired t-Test: Analyze difference between the means of 2 paired measurements
■ Example: Compare BNP levels in people w/ HF before and after treatment w/ an ACE inhibitor
ANOVA:
• Independent Variable: 1 nominal variable for a one-way ANOVA
• Dependent Variable: 1 interval variable
• Goal: Analyze difference between the means of 2+ independent groups
• Example: Comparing BNP levels in people w/out HF, people w/ L-sided HF, and people w/ R-sided HF
• Special Cases:
o Two-way ANOVA: 2 nominal independent variables
■ Example: Compare BNP levels in people w/out HF, w/ R-sided HF, and w/ L-sided HF, and people w/ and w/out high cholesterol
Chi-Square Test:
• Independent Variable: 1 nominal variable
• Dependent Variable: 1 nominal variable
• Goal: Analyze difference between percentages or proportions of a nominal variable in any number of groups HIGH YIELD
• Example: Comparing the proportion of people w/out HF, people w/ L-sided HF, and people w/ R-sided HF that have high cholesterol

Common Types of Statistical Tests - Interval Independent Variables


Overview:
• Different tests are appropriate for different types of data
• Independent Variable: The hypothesized cause
• Dependent Variable: The hypothesized effect
• Interval Variable: Numerical variable where a difference in value is meaningful
Regression Analysis:
• Independent Variable: 1 interval variable for a simple regression
• Dependent Variable: 1 interval variable
• Goal: Predict a linear cause-and-effect relationship between the independent & dependent variables
• Example: Predict the effect of BNP values on longevity
• Special Cases:
o Multiple Linear Regression: >1 interval independent variable
■ Example: Predict the effects of BNP and cholesterol values on longevity
Pearson’s Correlation Coefficient:
• No hypothesized cause and effect relationship
• Goal: Measure strength of association between two variables
• Calculates an r value between -1 and 1
o If r < 0, negative correlation HIGH YIELD
■ r values closer to -1 represent ↑ negative association
o If r = 0, no correlation
o If r > 0, positive correlation HIGH YIELD
■ r values closer to 1 represent ↑ positive association
• Example: The r for BNP levels and cholesterol levels is 0.6
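A minimal Python sketch of Pearson's r, written out from its definition (covariance divided by the product of the spreads), is shown here. The function name `pearson_r` and all data values are hypothetical and chosen only to show r at 1, at -1, and in between.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data (not from the source)
x = [1, 2, 3, 4, 5]
r_perfect_positive = pearson_r(x, [2, 4, 6, 8, 10])
r_perfect_negative = pearson_r(x, [10, 8, 6, 4, 2])
r_moderate = pearson_r(x, [2, 1, 4, 3, 5])

print(r_perfect_positive, r_perfect_negative, round(r_moderate, 2))
```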


Test Your Knowledge (Question ID: 1011, Item 1 of 2)

A research team hypothesizes that childhood exposure to pets decreases the chances of developing seasonal allergies later in
life. After conducting a power analysis to determine the necessary sample size for a statistical power of 80%, they recruit 1,300
people with seasonal allergies and 1,300 people without seasonal allergies and determine whether or not the participants had
pets in their home between the ages of 0 and 5. They find that 274 of the people with allergies had childhood pets, and 318 of
the people without allergies had childhood pets.

Which of the following statistical tests would be most appropriate for analyzing this dataset?

A. Regression analysis
B. Chi-square test
C. Paired t-test
D. ANOVA
E. Correlation coefficient
Test Your Knowledge (Question ID: 1011, Item 2 of 2)

A research team hypothesizes that childhood exposure to pets decreases the chances of developing seasonal allergies later in
life. After conducting a power analysis to determine the necessary sample size for a statistical power of 80%, they recruit 1,300
people with seasonal allergies and 1,300 people without seasonal allergies and determine whether or not the participants had
pets in their home between the ages of 0 and 5. They find that 274 of the people with allergies had childhood pets, and 318 of
the people without allergies had childhood pets.

A chi-square analysis of the data using an α level of 0.05 gives a p-value of 0.04. Assuming there truly is an association
between the two variables, which of the following statements is the most accurate interpretation of these results?

A. The probability of making a type II error is 5%


B. Because people without allergies were more likely to have pets, the null hypothesis can be rejected
C. There is a 4% chance of getting these same results due to chance
D. These results are not statistically significant
E. The probability of concluding there is no association was 20%
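The p-value quoted in this question can be approximated with a short Python sketch of the chi-square test for a 2x2 table. This is a sketch under stated assumptions: it uses the shortcut formula for 2x2 tables with no continuity correction, and the 1-degree-of-freedom tail probability via `math.erfc`; the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic and p-value for a 2x2 table (1 degree of freedom).

    a, b = first group with / without the exposure
    c, d = second group with / without the exposure
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi-square > x) equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# 274 of 1,300 people with allergies and 318 of 1,300 without had childhood pets
chi2, p_value = chi_square_2x2(274, 1300 - 274, 318, 1300 - 318)
alpha = 0.05
print(round(chi2, 2), round(p_value, 2), p_value <= alpha)
```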
Confidence Intervals
Overview:
• Research studies can determine a summary value only for one sample
• Confidence Intervals: Range that likely includes the true population mean
Calculating Confidence Intervals: HIGH YIELD
• Confidence Interval = x̄ ± Z(SE)
o x̄ = Summary value for the sample (mean, OR, RR)
o Z = # of standard deviations a value is from the mean
■ Fixed value for a specific confidence level
• For a 90% confidence level, Z = 1.65
• For a 95% confidence level, Z = 1.96 HIGH YIELD
• For a 99% confidence level, Z = 2.58
• Confidence level = 1 - α
o SE = Standard Error = Standard Deviation / √n
• Example: The mean BNP value in a sample of 500 people was 75 pg/mL, and the SD was 12.3.
o The 95% CI is 73.9 to 76.1 pg/mL
Interpreting Confidence Intervals:
• For a 95% confidence interval:
o 95% of independently run experiments will yield a summary statistic w/in the CI
• For difference between means: Fail to reject H₀ if range includes 0
• For an odds ratio or relative risk: Fail to reject H₀ if range includes 1
• For the means of two samples:
o Reject H₀ if the ranges do not overlap
o Usually fail to reject H₀ if the ranges do overlap
Meta-Analysis:
• Analyzes summary statistics from multiple studies
• Improves power and generalizability
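The BNP example above can be reproduced with a few lines of Python. The function name `confidence_interval` and its default Z of 1.96 are illustrative assumptions based on the formula on this slide.

```python
import math


def confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) bounds of mean ± Z × SE, with SE = SD / √n."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


# BNP example: n = 500, mean = 75 pg/mL, SD = 12.3
low_95, high_95 = confidence_interval(75, 12.3, 500)
low_99, high_99 = confidence_interval(75, 12.3, 500, z=2.58)

print(round(low_95, 1), round(high_95, 1))
print(round(low_99, 1), round(high_99, 1))
```

Because Z grows with the confidence level, the 99% interval is wider than the 95% interval for the same sample.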
Test Your Knowledge (Question ID: 1012, Item 1 of 2)

A researcher is conducting a study to determine the effectiveness of a new antihypertensive medication. 36 participants are
recruited for the study and randomly divided into Groups 1 and 2. Group 1 participants take a placebo for one month, then take
no medication for two weeks, then take the new medication for one month. Group 2 participants take the new medication for
one month, then take a placebo for one month following the two week washout period. The patients are unaware of their group
assignments. The researcher then subtracts each participant’s systolic blood pressure while taking the new medication from
their systolic blood pressure while taking the placebo. The average difference in the systolic blood pressure is 15 mmHg, and
the standard deviation is 27.4 mmHg.

Which of the following statements best describes the 95% confidence interval for these results?

A. 95% of the participants had a decrease in systolic blood pressure between 6 mmHg and 24 mmHg
B. Raising the confidence level to 99% would increase the interval width
C. Since the confidence interval does not include 0, the effect of the medication is not statistically significant
D. The average difference is expected to fall between 10.4 and 19.6 mmHg in 95% of independent samples
E. A confidence interval cannot be determined for this type of study
Test Your Knowledge: Item 2 of 2

A researcher is conducting a study to determine the effectiveness of a new antihypertensive medication. 36 participants are
recruited for the study and randomly divided into Groups 1 and 2. Group 1 participants take a placebo for one month, then take
no medication for two weeks, then take the new medication for one month. Group 2 participants take the new medication for
one month, then take no medication for two weeks, then take a placebo for one month. The patients are unaware of their
group assignments. The researcher then calculates the difference between each participant’s systolic blood pressure while
taking the new medication and while taking the placebo. The average difference in the systolic blood pressure is 15 mmHg,
and the standard deviation is 27.4 mmHg.

Similar studies were conducted at 5 other hospitals, and the results were pooled for a meta-analysis. Which of the following
statements is the most appropriate conclusion to draw from the meta-analysis?

A. The sample from hospital 4 has a mean of 25
B. Only one of these studies fails to reject the null hypothesis
C. The sample from hospital 3 has the largest standard error
D. Hospital 5 had the largest sample size
E. The sample from hospital 6 has the smallest effect size

[Figure: results for Hospitals 1 to 6 plotted against SBP placebo - SBP treatment; x-axis from -10 to 45]


