Epidemiology and Biostatistics PDF
BIOSTATISTICS
REVIEW, PART I
Tommy Byrd MSII
http://www.usmle.org/pdfs/step-1/2013midMay2014_Step1.pdf
Know the 4 scales of data measurement
Nominal
Ordinal
Interval
Ratio
Nominal scale data are divided into
qualitative categories or groups
Male Female
Black White
Suburban Rural
Ordinal scale data has an order
Class rankings data (1st / 2nd / 3rd)
Many naturally occurring phenomena are
distributed in the bell-shaped normal or
Gaussian distribution
[Figure: bell-shaped curve over Score, with the mode, median, and mean marked at the center]
Mode is the value that occurs with the
greatest frequency
2 4 5 7 4 2 3 6 8 9 7 5 4 4 2 4 6 7 7 7
Bimodal distribution!
[Figure: histogram of the values 2 through 9, with peaks at 4 and 7]
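As a quick check of the mode example, here is a minimal Python sketch that tallies the twenty values. It assumes Python 3.8 or later for `statistics.multimode`; the variable names are illustrative only.

```python
from collections import Counter
from statistics import multimode

# The twenty values from the slide.
scores = [2, 4, 5, 7, 4, 2, 3, 6, 8, 9, 7, 5, 4, 4, 2, 4, 6, 7, 7, 7]

# Frequency of each value.
counts = Counter(scores)

# multimode returns every value tied for the highest frequency.
modes = multimode(scores)

print(sorted(counts.items()))
print(modes)  # [4, 7]
```

Both 4 and 7 occur five times, which is why the distribution is bimodal.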
Median is the value that divides the
distribution in half
Odd # total elements: the median is the middle one
Even # total elements: the median is the average of the
two middle ones
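The odd and even cases can be sketched in Python as follows. The two score lists are made-up values for illustration, not data from the slides.

```python
from statistics import median

# Hypothetical scores (not from the slides).
odd_scores = [3, 1, 5]       # sorted: 1, 3, 5
even_scores = [7, 1, 5, 3]   # sorted: 1, 3, 5, 7

# Odd number of elements: the middle one.
median_odd = median(odd_scores)

# Even number of elements: the average of the two middle ones, (3 + 5) / 2.
median_even = median(even_scores)

print(median_odd)   # 3
print(median_even)  # 4.0
```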
Normal distributions with identical
measures of central tendency can have
different variabilities
Variability = the extent to which their scores are clustered
together or scattered about
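A small Python sketch of this idea, using two made-up score lists that share the same mean but differ in spread. The numbers are hypothetical, and the standard deviation is used here as one common measure of variability.

```python
from statistics import mean, pstdev

# Hypothetical scores (not from the slides): same mean, different spread.
clustered = [68, 69, 70, 71, 72]
scattered = [50, 60, 70, 80, 90]

mean_clustered = mean(clustered)
mean_scattered = mean(scattered)

# Population standard deviation of each list.
sd_clustered = pstdev(clustered)
sd_scattered = pstdev(scattered)

print(mean_clustered, mean_scattered)  # both 70
print(sd_clustered, sd_scattered)      # the second is ten times larger
```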
Therefore, assuming the [...] scores was [...] points, we can assume the following:
[...] of the test (assume 10 extra credit was possible)
So, out of a class of 100, about how many people got an A?
A) 9-11
B) 2-3
C) 14-16
D) 4-6
E) 19-21
The z score is simply how many standard
deviations the element lies above or
below the mean
A table of z scores
compares the z score to
the Area beyond Z
[Figure: normal curve of Grade (%), with 65 at z = −0.5 and 85 at z = +1.5; about 7 people out of the class of 100 lie in the area beyond z = +1.5]
Therefore the z score can be used to
specify probability
We know that 6.7% of the class
has a grade above 85%, so the
probability of one randomly
selected person from this
population having a grade above
85% is 6.7%, or 0.067
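The 6.7% figure can be reproduced with a short Python sketch. The standard deviation of 10 percentage points is stated elsewhere in these slides; the class mean of 70% is an assumption inferred from 85% corresponding to z = +1.5.

```python
from statistics import NormalDist

MEAN = 70.0  # assumed class mean (%), inferred from 85% <-> z = +1.5
SD = 10.0    # standard deviation in percentage points, as given in the slides


def z_score(x, mean=MEAN, sd=SD):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


grades = NormalDist(mu=MEAN, sigma=SD)

z_85 = z_score(85)                        # 1.5
area_beyond = 1 - grades.cdf(85)          # area beyond z = +1.5
expected_students = 100 * area_beyond     # out of a class of 100

print(z_85)
print(round(area_beyond, 3))
print(round(expected_students))
```

The area beyond z is what a printed z table lists; `NormalDist.cdf` gives the area below, so it is subtracted from 1.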
What if we don't know every single
person's score on the test?
But, through some stealthy looking-over-shoulders while
people check their online test scores, we can get a
sample of random scores
How close to the actual class average will our sample be?
[Figure: distributions of sample means on a 0% to 100% axis, centered at 70%; n = the size of each sample]
The standard error of the mean (SEM)
is the standard deviation over the square
root of the sample size
SEM = σ/√n
Recall that the standard deviation (σ) of this test was 10 percentage points
n = 1: SEM = 10/√1 = 10
n = 4: SEM = 10/√4 = 5
n = 7: SEM = 10/√7 = 3.8
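The three SEM values can be checked with a minimal Python sketch. The function name is illustrative; the standard deviation of 10 and the sample sizes 1, 4, and 7 come from the slide.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of the mean: standard deviation over the square root of n."""
    return sd / sqrt(n)


SD = 10  # standard deviation of the test, in percentage points

# SEM for each sample size shown on the slide.
sems = {n: standard_error(SD, n) for n in (1, 4, 7)}

for n, sem in sems.items():
    print(n, round(sem, 1))
```

Larger samples give a smaller SEM, so the sample mean tends to land closer to the true class average.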
Similar to P values!
For USMLE purposes, consider degrees of freedom (df) to equal n-1
So what do we do with all this?
t = the number of estimated standard errors away from the sample mean
There are 7 steps in hypothesis testing
1) State the null and alternative hypotheses, H0 and HA
H0 = no difference
HA = there is a difference
2) Select the decision criterion (level of significance)
3) Establish the critical values of t
4) Draw a random sample, find its mean
5) Calculate the standard deviation of the sample (S) and
find the estimated standard error of the sample
6) Calculate the value of the test statistic t that
corresponds to the mean of the sample (tcalc)
7) Compare the calculated value of t with the critical
values of t, then reject or fail to reject the null hypothesis
Step 1: State the null and alternative
hypotheses
We want to test Julia Silva's claim: Because of Tommy
and Danielle's amazing biostats presentation, the average
Step 1 score of our class will be 260
Null hypothesis = The mean score is 260
Alternative hypothesis = The mean score is not 260
Sample size (n) = 10 students, so df = 9
So tcrit = 2.262 (two-tailed, α = 0.05)
Step 4: Draw a random sample and
calculate the mean of the sample
284 234 268 254 246 264 266 265 245 244
Average = 257
Step 5: Calculate standard deviation and
estimated standard error of the sample
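Step 6: Calculate the test statistic t
tcalc = |sample mean − hypothesized mean| / estimated standard error = |257 − 260| / 4.747 = 0.632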
Step 7: Compare t-values and be very
concerned that Julia Silva is a psychic
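Our calculated t-value (same thing as t-score) is 0.632
Our critical t-value is 2.262
Since 0.632 < 2.262, the calculated t falls between the critical values (the t distribution is centered at t = 0), so we cannot reject the null hypothesis that the mean score is 260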
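Steps 4 through 7 can be retraced with a short Python sketch using only the standard library. The ten scores, the claimed mean of 260, and the critical value 2.262 are taken from the slides; the variable names are illustrative. Recomputing from the raw scores gives an estimated standard error near 4.73 rather than the 4.747 shown, so |t| comes out near 0.63 either way and the conclusion is the same.

```python
from math import sqrt
from statistics import mean, stdev

scores = [284, 234, 268, 254, 246, 264, 266, 265, 245, 244]
claimed_mean = 260   # null hypothesis: the mean score is 260
t_crit = 2.262       # critical t for df = 9, as given in the slides

n = len(scores)
df = n - 1

# Step 4: mean of the sample.
sample_mean = mean(scores)

# Step 5: sample standard deviation (n - 1 denominator) and estimated standard error.
s = stdev(scores)
est_se = s / sqrt(n)

# Step 6: test statistic.
t_calc = (sample_mean - claimed_mean) / est_se

# Step 7: reject the null only if |t| exceeds the critical value.
reject_null = abs(t_calc) > t_crit

print(sample_mean, df)
print(round(est_se, 2), round(abs(t_calc), 2))
print(reject_null)
```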
Error types indicate that you accepted the
wrong hypothesis
Type I Error
False-positive error
You accept the alternative hypothesis when there is no difference
Also known as alpha (α) error; yes, this is referring to the α we just talked about
The p-value is the probability of getting a result at least this extreme when the null hypothesis is true; rejecting the null only when p < α keeps the probability of a type I error at α
Type II Error
False-negative error
You fail to reject the null hypothesis when there actually is a difference
Also known as beta (β) error; β is the probability of making a type II error
A study with greater power has less
type II (β) error
The power of a statistical test = 1 − β
The power represents the probability of rejecting the null
hypothesis when it is in fact false (vs. accepting it in
error); we want this to happen!
Conventionally, a study is required to have a power of 0.8
(or a β of 0.2) to be acceptable
Power increases as α increases (trade-off)
High-yield point: Increasing the sample size is the
most practical and important way of increasing the
power of a statistical test
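To see how power responds to sample size and to α, here is a rough Python sketch. It assumes a two-sided one-sample z-test with a known standard deviation, which is a simplification of the t-test used in the worked example; the true difference of 3 points and the sample sizes are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided one-sample z-test.

    delta: assumed true difference between the real mean and the null mean
    sigma: known population standard deviation
    n:     sample size
    alpha: significance level
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma  # true difference in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


# Hypothetical scenario: true difference of 3 points, standard deviation of 10.
for n in (10, 50, 100, 200):
    print(n, round(z_test_power(delta=3, sigma=10, n=n), 3))

# Raising alpha also raises power, at the cost of more type I error.
power_alpha_05 = z_test_power(delta=3, sigma=10, n=50, alpha=0.05)
power_alpha_10 = z_test_power(delta=3, sigma=10, n=50, alpha=0.10)
```

With no true difference (delta = 0) the function returns α itself, which is the type I error rate.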
Nonexperimental (descriptive or analytic) study designs: Cohort studies
A group without the disease is selected and followed for an
extended period
Some members may have already been exposed to risk
factor
Exception: Inception Cohorts follow those recently
diagnosed to track progression
Can estimate incidence
Not good for rare diseases
Historical cohort study = retrospective cohort study
Nonexperimental (descriptive or analytic) study designs: Case-control studies
All are retrospective
Compare people who do have the disease (the cases) w/
otherwise similar people who do not have the disease
Start w/ outcome then LOOK BACK into the past for
possible independent variables that may have caused the
disease
Cheap, good for diseases that are rare or that take a long time to develop
Nonexperimental (descriptive or analytic) study designs: Case-series studies
Essentially a series of case reports that may link disease
to exposure, but NOT controlled, as in case-control (no
group w/o the disease compared to)
E.g., Kaposi's sarcoma
Nonexperimental (descriptive or analytic) study designs: Prevalence survey
Survey (snap shot) of a whole population, also asks
about risk factors individually
Prevalence ratio = the prevalence of a disease in people who have been exposed
to a risk factor divided by the prevalence in people who have not been exposed
Likely to overrepresent chronic diseases and
underrepresent acute diseases
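A prevalence ratio can be sketched in Python as below. The survey counts are invented for illustration and the function names are hypothetical; nothing here comes from a real study.

```python
def prevalence(cases, total):
    """Proportion of a group that has the disease at the time of the survey."""
    return cases / total


def prevalence_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Prevalence among the exposed divided by prevalence among the unexposed."""
    return prevalence(exposed_cases, exposed_total) / prevalence(
        unexposed_cases, unexposed_total
    )


# Hypothetical survey: 30 of 200 exposed people and 10 of 200 unexposed people
# have the disease on the day of the survey.
ratio = prevalence_ratio(30, 200, 10, 200)

print(round(ratio, 2))  # about 3: the disease is three times as prevalent in the exposed
```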
Nonexperimental (descriptive or analytic) study designs: Ecological studies
Check non-individual info (e.g., study of the rate of
diabetes in countries with different levels of automobile
ownership)
May be experimental:
Community intervention trials
Experimental group consists of an entire community, while the control
group is an otherwise similar community that is not subject to any kind
of intervention
Bias occurs from systematic (rather than
random) errors when one outcome is
systematically favored over another
What is the difference between selection bias and sampling bias?
(Magazine subscribers in Great Depression)
(Referral bias)
(Putting all whites in drug group and blacks in control group for treating a racially selective disease)
Race = confounding variable