
STATISTICAL TESTS (tool, purpose, assumptions, hypothesis, variables, nature, notes)

One Sample t Test
Purpose: statistical difference between a sample mean and...
  o a known or hypothesized value of the population mean
  o the sample midpoint of the test variable
  o chance (test variable vs. chance)
  o zero (statistical difference between a change score and zero)
Assumptions:
  - Continuous data (interval/ratio)
  - Normal probability distribution
  - Simple random sample
  - DV has no outliers
  - Observations are independent
Variables: m0 = comparison value (the value the sample mean is tested against)
Nature: parametric
Notes:
  - Reported as (APA): t(21) = 1.042, p = .309
    o t: indicates that we are comparing to a t-distribution (t-test)
    o 21: the degrees of freedom, which is N - 1
    o 1.042: the obtained value of the t-statistic (obtained t-value)
    o p = .309: the probability of obtaining a t-value at least as extreme as the observed one if the null hypothesis is correct
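A minimal sketch of how the t-value and degrees of freedom in an APA-style report are obtained. The function name `one_sample_t` and the scores are made up for illustration; the p-value is left out because it needs a t-distribution table or a statistics package.

```python
import math
import statistics


def one_sample_t(sample, m0):
    """Return (t, df) for a one-sample t test of the sample mean against m0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)      # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)             # standard error of the mean
    return (mean - m0) / se, n - 1     # df = N - 1


# Hypothetical scores, invented for illustration only
scores = [4, 6, 5, 7, 8]
t, df = one_sample_t(scores, m0=5)
print(f"t({df}) = {t:.3f}")            # prints: t(4) = 1.414
```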
Paired Samples t Test
Purpose: test of only 2 means from the same or related units
  o 2 time points
  o 2 conditions
  o 2 measurements
  o Matched pair
Assumptions:
  - DV is continuous
  - Related samples & groups
  - Random sample
  - Approx. normal distribution
  - No outliers
Hypothesis:
  H0: µ1 = µ2 ("the paired population means are equal")
  H1: µ1 ≠ µ2 ("the paired population means are not equal")
  or
  H0: µ1 - µ2 = 0 ("the difference between the paired population means is equal to 0")
  H1: µ1 - µ2 ≠ 0 ("the difference between the paired population means is not 0")
Variables:
  - Dependent variable: measured at two different times, or for two related conditions or units
  - The test variable is a representation of the difference between the paired values
Notes:
  - When one or more assumptions are not met, use the Wilcoxon Signed-Ranks Test
  - For the assumptions related to normality & outliers, use the variable that represents the difference between the paired values, not the original values

Independent Samples t Test
Purpose: test means of 2 (& only two) independent groups
Assumptions:
  - Unpaired means
  - DV is continuous
  - IV is categorical
  - Cases have both IV & DV
  - Independent samples & groups
Variables: DV, IV
Notes:
  - Run ANOVA if there are more than 2 groups
  - If assumptions are not met, use the Mann-Whitney U test
  - Use the Welch t Test when equal variances among populations can't be assumed
  - Run Levene's test to check homogeneity of variance
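The note that the paired test works on the difference between paired values can be sketched as follows: the paired t statistic is a one-sample t test of the differences against zero. The function name `paired_t` and the before/after numbers are hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """Paired-samples t statistic computed on the pairwise differences."""
    diffs = [a - b for a, b in zip(after, before)]   # the difference variable
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)      # standard error of mean difference
    return statistics.mean(diffs) / se, n - 1        # H0: mean difference = 0


# Hypothetical measurements on the same four subjects
before = [10, 12, 9, 11]
after = [12, 15, 10, 15]
t_paired, df_paired = paired_t(before, after)
print(f"t({df_paired}) = {t_paired:.3f}")            # prints: t(3) = 3.873
```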
One-way ANOVA
Purpose: test of difference of means of 2 or more groups
  - Groups
  - Interventions
  - Change scores
Assumptions:
  - Random samples
  - Normal distribution
  - Homogeneity of variance
  - No outliers
Hypothesis:
  H0: µ1 = µ2 = µ3 = ... = µk ("all k population means are equal")
  H1: At least one µi different ("at least one of the k population means is not equal to the others")
  where µi is the population mean of the ith group (i = 1, 2, ..., k)
Notes:
  o If the grouping variable has only 2 groups, results are equivalent to the Independent Samples t Test. To confirm: t² = F
  o An omnibus (Latin for "all") test, because the F test indicates whether the model is significant overall
  o If homogeneity of variance is violated:
    - use alternative statistics that don't assume equal variances among populations (e.g. Brown-Forsythe or Welch statistics)

Mikah1280
    - (one-way ANOVA with unequal variances, continued) standard post hoc results are not trustworthy
    - if variances are unequal: use Dunnett's C (or another post hoc test that doesn't assume equal variances)
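The t² = F check for a two-group ANOVA can be sketched in plain Python. This assumes the equal-variance (pooled) form of the independent samples t test; the helper names and the two small groups are invented for illustration.

```python
import statistics


def one_way_f(groups):
    """F statistic for a one-way ANOVA: between-group MS / within-group MS."""
    values = [x for g in groups for x in g]
    grand_mean = statistics.mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pooled_t(g1, g2):
    """Independent samples t statistic, assuming equal population variances."""
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * statistics.variance(g1)
                  + (n2 - 1) * statistics.variance(g2)) / (n1 + n2 - 2)
    return (statistics.mean(g1) - statistics.mean(g2)) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5


# Hypothetical data for two groups
group_a = [1, 2, 3]
group_b = [2, 4, 6]
f_stat = one_way_f([group_a, group_b])
t_stat = pooled_t(group_a, group_b)
print(f"F = {f_stat:.3f}, t^2 = {t_stat ** 2:.3f}")   # both print as 2.400
```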

POST HOC Test Approaches for Pairwise Comparisons with ANOVA Designs

Key terms:
  o Familywise error: the probability that any one of a set of significance tests is a Type I error
  o Bonferroni: controls the familywise error rate
  o Modified Bonferroni approach: greater power

Scheffe test
  - computes a new critical value for an F test conducted when comparing two groups from the larger ANOVA

Fisher LSD Test
  - Least Significant Difference
  - if an omnibus test is conducted and is significant, the null hypothesis is incorrect. (If the omnibus test is nonsignificant, no post hoc tests are conducted.)

Tukey a (Tukey's HSD)
  - calculates a new critical value that can be used to evaluate whether differences between any two pairs of means are significant
  - has greater power than the other tests

Dunnett test
  - similar to the Tukey test
  - only used if a set of comparisons is being made to one particular group

Games-Howell Test
  - used when variances are unequal (see Unequal Variances below) and takes into account unequal group sizes
  - does better than the Tukey HSD if variances are very unequal
Two-way ANOVA
Purpose: evaluate simultaneously the effect of two grouping variables (A and B) on a response variable
Chi-square Test for Independence
Purpose: test for an association between categorical variables
  - Uses a contingency table to analyze the data
  - A contingency table is an arrangement in which data is classified according to two categorical variables. The categories for one variable appear in the rows, and the categories for the other variable appear in columns.
Assumptions:
  - 2 categorical variables
  - 2 or more categories for each variable
  - Independence of observations
  - Relatively large sample size
  - V1 & V2 are at least nominal
  - Simple random sampling
  - If sample data are displayed in a contingency table, the expected frequency count for each cell of the table is at least 5
Hypothesis:
  H0: "[Variable 1] is independent of [Variable 2]"
  H1: "[Variable 1] is not independent of [Variable 2]"
  OR
  H0: "[Variable 1] is not associated with [Variable 2]"
  H1: "[Variable 1] is associated with [Variable 2]"
Nature: nonparametric
Notes:
  - only compares categorical variables
  - only assesses associations between categorical variables and cannot provide any inferences about causation
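A short sketch of the expected-frequency check and the chi-square statistic for a contingency table. The function name `chi_square` and the 2x2 counts are hypothetical; the expected counts use the usual (row total x column total) / N rule, and the p-value lookup is omitted.

```python
def chi_square(table):
    """Return (statistic, df, expected counts) for a rows x columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Hypothetical counts: rows = categories of Variable 1, columns = categories of Variable 2
observed = [[30, 10],
            [20, 40]]
chi2, chi2_df, expected = chi_square(observed)
smallest_expected = min(min(row) for row in expected)
print(f"chi-square({chi2_df}) = {chi2:.3f}; smallest expected count = {smallest_expected}")
```

Here every expected count is at least 5, so the table meets the expected-frequency assumption listed above.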
Wilcoxon Signed-Ranks Test
Purpose: compare paired means for continuous data
Assumptions:
  - Paired means
  - Continuous data
  - Not normally distributed
  - Ranked data
Pearson Correlation
Purpose: test of relationship (bivariate test)
  - measures the strength and direction of linear relationships between:
    o pairs of continuous variables, producing a sample correlation coefficient, r
    o pairs of variables in the population, represented by a population correlation coefficient, ρ ("rho")
Assumptions:
  - 2 or more continuous variables
  - Cases that have values on both variables
  - Linear relationship between the variables
  - Independent cases
  - Bivariate normality
  - Random sampling of data
  - No outliers
Hypothesis:
  Two-tailed significance test:
    H0: ρ = 0 ("the population correlation coefficient is 0; there is no association")
    H1: ρ ≠ 0 ("the population correlation coefficient is not 0; a nonzero correlation could exist")
  One-tailed significance test:
    H0: ρ = 0 ("the population correlation coefficient is 0; there is no association")
    H1: ρ > 0 ("the population correlation coefficient is greater than 0; a positive correlation could exist")
    OR
    H1: ρ < 0 ("the population correlation coefficient is less than 0; a negative correlation could exist")
  where ρ is the population correlation coefficient
Nature: parametric (Pearson's r). The ρ in the hypotheses is the population correlation coefficient; the nonparametric rank correlation (Spearman's rho) is a different statistic.
Notes:
  - Correlation can take on any value in the range [-1, 1]
  - The sign indicates the direction of the relationship
    o -1 : perfectly negative linear relationship
    o 0 : no relationship
    o +1 : perfectly positive linear relationship
  - The magnitude of the correlation (how close it is to -1 or +1) indicates the strength
    o .1 < | r | < .3 : small / weak correlation
    o .3 < | r | < .5 : medium / moderate correlation
    o .5 < | r | : large / strong correlation
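A minimal sketch of computing r and reading its sign and magnitude with the cut-offs listed above. The notes leave the boundary values (.1, .3, .5) and anything below .1 unlabelled, so assigning boundaries to the higher band and calling the lowest band "negligible" are assumptions of this sketch. The data are invented.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient for paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Label |r| using the small/medium/large cut-offs (boundary handling assumed)."""
    size = abs(r)
    if size >= 0.5:
        return "large / strong"
    if size >= 0.3:
        return "medium / moderate"
    if size >= 0.1:
        return "small / weak"
    return "negligible"   # label assumed; the notes do not name this band


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:.3f} ({direction}, {strength(r)})")   # prints: r = 0.775 (positive, large / strong)
```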

Statistical Package for Social Sciences (SPSS)

Use:
1. Statistical analysis
2. Manipulate data
3. Generate tables & graphs

IBM SPSS screens:
- Data View
- Variable View
  o Features or attributes of variables are defined
  o Details of variable names, types & labels are stored
  o Row = variable
  o Column = feature (type like numeric, dot, etc.; measure like scale, ordinal, nominal)
  o Variable names
    - Start with an alphabetic character
    - Up to 64 characters long (letters, numbers, non-punctuation symbols)
    - Shouldn't end in an underscore or full stop
    - No spaces; use an underscore instead
    - %, >, and punctuation marks can't be used
  o Variable type
    - Numeric: number variable
    - String: combination of letters only, or letters with numbers
    - Comma/dot: for large numbers
    - Others: scientific notation, date, dollar, custom currency, and restricted numeric
  o Width & decimals
    - Width of variable: number of characters to be entered for the variable; the default setting is 8 characters
    - Decimals: 2 places by default
      categorical variables: no decimal places are required
      continuous variables: the number of decimal places must match the precision at which the measurement was collected
  o Labels
    - Name, describe, & identify the variable
    - Output from SPSS will list the variable label
    - Keep as short as possible
  o Values
    - Used to assign labels to a variable
    - Categorical or nominal
  o Missing
    (For the remaining items, just read the module.)

V. EPIDEMIOLOGICAL MEASURES

Measures of Disease Frequency

Population
group of people with some common characteristic

Target Population
population for which you would like to make some conclusions

Fixed population
relatively permanent and perhaps defined by some event

Dynamic population
membership is not necessarily permanent

Ratio
- a number that is obtained by dividing one number by another
- doesn't necessarily imply any particular relationship between the numerator and the denominator

Proportion
- type of ratio that relates a part to a whole
- often expressed as a percentage (%)

Rate
- type of ratio in which the denominator also takes into account another dimension, usually time

Counts of Diseased People
essential to detecting trends or the sudden occurrence of a problem, such as an epidemic

Prevalence
- A fundamental measure
- proportion of the population that has disease at a particular time
- a way of assessing the overall burden of disease in the population
- can assess the frequency of behaviors or characteristics that might be risk factors for disease
- Point prevalence: the proportion of the population with disease at a 'point' in time
  o the point in time can be an event rather than calendar time
- Period prevalence: broader time period (a range of years)

$\text{Prevalence} = \frac{\text{No. of people with disease at a point in time}}{\text{No. of people in the study population}}$

Incidence
- A fundamental measure
- measure of occurrence of new cases of disease (or other outcomes) during a span of time
- related measures: incidence proportion (cumulative incidence) & incidence rate
Risks
- probability that an individual with certain characteristics, such as age, race, or sex, will experience a health status change over a specified follow-up period (i.e. risk period)
- often used for prediction at the individual level

Cumulative incidence
- the probability of developing disease over a stated period of time
- most common estimate of risk; must specify a time period
- always a proportion (bound between 0 & 1)
- assumes a fixed or closed cohort (no exits allowed)
- for brief specified time periods (e.g. outbreaks, aka attack rate)
- attrition is a big problem
- doesn't allow following subjects for different time periods
- In real life, one must deal with losses, competing risks, attrition, dynamic cohorts, and differential follow-up time!!

$\text{Cumulative incidence} = \frac{\text{No. of new cases in a specified period}}{\text{No. of people (at risk) in the study population}}$

Incidence rate
- Measure of the number of cases ("incidence") per unit of time ("rate")
- Denominator is a combination of number of people & amount of time: person-time
- Time units can be in days, months, or years, but should be tied to study length and aid interpretation of the results
- "person-years": most common

$\text{Incidence rate} = \frac{\text{No. of new cases of disease}}{\text{Total observation time in a group at risk}}$

Units for Denominators
all three measures of disease frequency (prevalence, cumulative incidence, and incidence rate) are expressed per some power of 10 (e.g. per 1,000 or per 100,000) in order to facilitate comparisons.

Relationship of prevalence, incidence rate, and average duration of disease
- If the population is initially in a "steady state" (meaning that prevalence is fairly constant and incidence and outflow [cure and death] are about equal), then the relationship among these three parameters can be described mathematically as:

$\frac{P}{1 - P} = IR \times \text{Avg. Duration}$

- P = proportion of the population with the disease
- (1 - P) is the proportion without it
- IR is the incidence rate
- Avg. Duration is the average time that people have the disease
- If the disease is rare, then the relationship can be expressed as follows:

Prevalence = (Incidence Rate) x (Average Duration of Disease)

Then

$\text{Avg. Duration} = \frac{\text{Prevalence}}{\text{Incidence rate}}$

Special measures of Incidence
- Morbidity Rate: incidence of non-fatal cases of a disease in a population during a specified time period
  o a cumulative incidence and therefore is really a proportion, not a true rate
- Mortality Rate
- Case-Fatality Rate: number of deaths from a specific disease divided by the total number of cases of that disease
- Attack Rate
- Live Birth Rate
- Infant Mortality Rate

Special Prevalence Measures
often incorrectly referred to as incidences or rates, but they are, in fact, proportions.
- Autopsy Rate
- Birth Defect Rate

Measures of Association
Options for comparing disease frequencies
Fundamental methods for comparing disease frequencies:
1. Calculate the ratio of 2 measures of disease frequency (by division)
2. Calculate the difference between 2 measures (by subtraction)

1. Measures of disease frequency can be compared by calculating their ratio. Common terms to describe these ratios are: ▪ risk ratio ▪ rate ratio ▪ relative risk ▪ relative rate

- Risk Ratio: $RR = \frac{CI_e}{CI_u}$ (exposed group / unexposed group)
  o CI = cumulative incidence
  o If the risk ratio is 1 (or close to 1), it suggests no difference or little difference in risk (incidence in each group is the same).
  o A risk ratio > 1 suggests an increased risk of that outcome in the exposed group.
  o A risk ratio < 1 suggests a reduced risk in the exposed group.
  o If you are interpreting a risk ratio, you will always be correct by saying: "Those who had (name the exposure) had RR 'times the risk' compared to those who (did not have the exposure)." Or "The risk of (name the disease) among those who (name the exposure) was RR 'times as high as' the risk of (name the disease) among those who did not (name the exposure)."

- Rate Ratio: $\frac{IR_e}{IR_u}$ (exposed group / unexposed group)
  o closely related to risk ratios but computed as the incidence rate (IR) in an exposed group divided by the incidence rate in an unexposed (or less exposed) comparison group
  o often interpreted as if they were risk ratios, but it is more precise to refer to the ratio of rates rather than risk

Reference group - a group against which we can compare the other exposure groups. The reference group is usually (not always) the least exposed group.

Measures of Potential Impact

Risk difference: $RD = CI_e - CI_u$
  o focuses on the absolute effect of the risk factor, or the excess risk of disease in those who have the factor compared with those who don't.
Rate difference: $RD = IR_e - IR_u$
  o based on subtraction of incidence rates, so the units are retained.

Perspective of Relative Differences (ratios) vs. Absolute Differences

• Relative risk, i.e., risk ratios, rate ratios, and odds ratios, provide a measure of the strength of the association between a factor and a disease or outcome.
• Risk difference, i.e., absolute risk, provides a measure of the public health impact of the risk factor, and focuses on the number of cases that could potentially be prevented by eliminating the risk factor.

Attributable proportion allows you to calculate the proportion of disease in the exposed group that can be attributed to the exposure. This can also be looked at as the proportion of disease in the exposed group that could be prevented by eliminating the risk factor.
- Deals with an exposure that increases the risk of a disease

$\text{Attributable Proportion} = \frac{RD}{CI_e} = \frac{CI_e - CI_u}{CI_e}$

Can also be expressed as a percentage; just multiply by 100.
An alternative formula if the actual CI or IR is not available:

$\text{Attributable Proportion} = \frac{RR - 1}{RR}$

The "Preventive Fraction"
- deals with an exposure that reduces the risk of disease (so $CI_e < CI_u$ and the fraction is positive)

$\text{Preventive Fraction} = \frac{CI_u - CI_e}{CI_u}$

The Population Attributable Fraction
- One can also compute the proportion or percentage of cases in the entire study population that can be attributed to the exposure.
- The attributable proportion for the entire population is the (incidence) risk in the overall population that can be attributed to the exposure.

Population Attributable Fraction (PAF) = (proportion of cases exposed) x (attributable proportion in the exposed)

This is the proportion (fraction) of all cases in the population that can be attributed to the exposure.
Another equivalent method:

$PAF = \frac{P_{pop} \times (RR - 1)}{P_{pop} \times (RR - 1) + 1}$

where Ppop = the proportion of exposed subjects in the entire study population, and RR = the risk or rate ratio. This formulation can be especially helpful if you have an estimate of the proportion of exposed subjects in a population from an external source, such as a population survey.

The Odds Ratio (OR)
$OR = \frac{a/c}{b/d} = \frac{ad}{bc}$
Compared with the risk ratio: the OR gives estimates that are increasingly more extreme (further from 1) than the risk ratio would have been.
OR > 1 : exposure is associated with higher odds of the outcome (for an adverse outcome, control is better than intervention)
OR < 1 : exposure is associated with lower odds of the outcome (for an adverse outcome, intervention is better than control)
OR = 1 : exposure does not affect the odds of the outcome

Remember that in a cohort study you can calculate either a risk ratio or an odds ratio, but in a case-control study you can only calculate an odds ratio.
Case-control studies:
  o $AP = \frac{RR - 1}{RR}$ becomes $AP = \frac{OR - 1}{OR}$
  o $PAF = P_e \times AP = P_e \times \frac{OR - 1}{OR}$
    - Pe = the proportion of cases that have the exposure.

VI. ASSESSING THE VALIDITY AND RELIABILITY OF DIAGNOSTIC AND SCREENING TESTS

Testing for disease results:
+ POSITIVE (sick) or - NEGATIVE (healthy)
- True Positive (TP): correct diagnosis of sick
- False Positive (FP): healthy wrongly identified as sick
- True Negative (TN): healthy identified as healthy
- False Negative (FN): sick identified as healthy

Diagnostic Tests
- All procedures performed to confirm or determine the presence of a disease.

Gold standard
- most accurate test for determining a disease
- currently preferred method for diagnosing a specific disease
- basis for validation of a newly designed test
- expensive ($$$)
- may be arbitrary and may change

Validity
- capability of a test to point out which people have a disease and which don't
- test accuracy
- 2 objective measures:
  o Sensitivity
    - Ability to correctly identify people with the disease
    - Provides the true positive rate
    - $\text{Sensitivity} = \frac{TP}{TP + FN}$
  o Specificity
    - Ability to correctly identify people without the disease
    - Provides the true negative rate
    - $\text{Specificity} = \frac{TN}{TN + FP}$
- ACCURATE TEST = HIGH SENSITIVITY & SPECIFICITY
- Sensitivity & specificity are inversely related: for a given test, raising one (e.g. by moving the cut-off value) generally lowers the other
  o See module p 82-83 for net specificity & net sensitivity understanding

Cut-off value
  o value which determines the limit between positive and negative test results
  o a screening test must be highly sensitive, qualitative
  o a confirmatory test must be highly specific, quantitative
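The sensitivity and specificity formulas can be sketched with a small tally of test results against the gold standard. The counts below are hypothetical and the function names are chosen only for this illustration.

```python
def sensitivity(tp, fn):
    """True positive rate: share of diseased people the test correctly flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of healthy people the test correctly clears."""
    return tn / (tn + fp)


# Hypothetical screening results for 1,000 people (100 diseased, 900 healthy)
tp, fn = 80, 20     # diseased: test positive / test negative
tn, fp = 870, 30    # healthy: test negative / test positive

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"Sensitivity = {sens:.1%}, Specificity = {spec:.1%}")
```

With these made-up counts the test misses 20 of 100 diseased people (false negatives) and wrongly flags 30 of 900 healthy people (false positives).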
Use of Multiple Tests
- Sequential (Two-stage) Testing
o the less expensive, less invasive, or less uncomfortable test is generally performed first
o See module p 82-83 for net specificity & net sensitivity understanding

Basis for Simultaneous & Sequential Testing


- Objectives
o Screening?
o Diagnostic?
- Considerations on setting for testing
o Length of hospital stay
o Cost
o Degree of invasiveness per test
o Extent of third-party insurance coverage
*Simultaneous – multiple tests are done in one go
- if positive in any one test, subject is considered positive
- gain in net sensitivity
*Sequential - only those who tested positive on the first test are retested
- loss in net sensitivity and gain in net specificity

Predictive Value of a Test


“If the test result is positive/negative in this patient, what is the probability that this patient has / does not have the disease?”

$\text{Positive predictive value (PPV)} = \frac{TP}{TP + FP}$

$\text{Negative predictive value (NPV)} = \frac{TN}{TN + FN}$

- Higher disease prevalence = higher positive predictive value (for a test with the same sensitivity and specificity)
o Screening programs are most productive & efficient if directed to a high-risk population.
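A short sketch of why prevalence matters for predictive value: the same test (fixed sensitivity and specificity) is applied to a low-prevalence and a high-prevalence population. The 90% sensitivity and specificity, the two prevalence values, and the notional population size are all invented for illustration.

```python
def predictive_values(sens, spec, prevalence, n=100_000):
    """Return (PPV, NPV) for a test applied to a notional population of size n."""
    diseased = prevalence * n
    healthy = n - diseased
    tp = sens * diseased          # diseased and test positive
    fn = diseased - tp            # diseased but test negative
    tn = spec * healthy           # healthy and test negative
    fp = healthy - tn             # healthy but test positive
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical test, two hypothetical populations
ppv_low, npv_low = predictive_values(0.90, 0.90, prevalence=0.01)
ppv_high, npv_high = predictive_values(0.90, 0.90, prevalence=0.20)
print(f"Prevalence  1%: PPV = {ppv_low:.3f}, NPV = {npv_low:.3f}")
print(f"Prevalence 20%: PPV = {ppv_high:.3f}, NPV = {npv_high:.3f}")
```

In this sketch the PPV rises sharply with prevalence while the NPV falls slightly, which is why the prevalence statement is best read as being about the positive predictive value.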

Reliability (Repeatability) of Tests

Factors that contribute to variation between tests:
- Intrasubject variation
  o results for the same person can differ with conditions, e.g. postprandially or postexercise, at home or in a physician's office
  o consider the conditions under which the test was performed, including the time of day
- Intraobserver variation
  o Occurs between two or more readings of the same test results made by the same observer
  o Tests and examinations differ in the degree to which subjective factors enter into the observer's conclusions
  o the greater the subjective element in the reading, the greater the intraobserver variation is likely to be
- Interobserver variation
  o Variation between observers, such that two examiners often do not derive the same result
  o The extent to which observers agree or disagree is an important issue; therefore, express the extent of agreement in quantitative terms: percent agreement

Percent agreement

$\text{Percent agreement} = \frac{\text{Sum of the numbers in all of the cells in which both parties agree}}{\text{Total readings}} \times 100$

- The paired observation table shows that d was labeled negative by both observers, so disregard it
- Considering only the pairs in which at least one of the observations was positive, the percent agreement is:

$\text{Percent agreement} = \frac{a}{a + b + c} \times 100$

Kappa Statistic

$\text{Kappa} = \frac{(\text{Percent agreement}) - (\text{Percent agreement expected by chance alone})}{100\% - (\text{Percent agreement expected by chance alone})}$

Understanding the kappa formula:
- Numerator: how much better is the agreement between the observers' readings than would be expected by chance alone?
- Denominator: what is the most that the two observers could have improved their agreement over the agreement that would be expected by chance alone?
- The maximum agreement between them would be 100%
- Kappa quantifies the extent to which the observed agreement that the observers achieved exceeds that which would be expected by chance alone, and expresses it as the proportion of the maximum improvement that could occur beyond the agreement expected by chance alone.

SUMMARY OF FORMULAS (excluding the ones found on this page)

$\text{Prevalence} = \frac{\text{No. of people with disease at a point in time}}{\text{No. of people in the study population}}$

$\text{Cumulative incidence} = \frac{\text{No. of new cases in a specified period}}{\text{No. of people (at risk) in the study population}}$

$\text{Incidence rate} = \frac{\text{No. of new cases of disease}}{\text{Total observation time in a group at risk}}$

Relationship of prevalence, incidence rate, and average duration of disease

$\frac{P}{1 - P} = IR \times \text{Avg. Duration}$
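The disease frequency formulas summarized above can be sketched as small functions. All counts, the incidence rate, and the duration below are hypothetical; `steady_state_prevalence` simply solves the steady-state relationship P / (1 - P) = IR x duration for P.

```python
def prevalence(cases, population):
    """Proportion of the study population with disease at a point in time."""
    return cases / population


def cumulative_incidence(new_cases, at_risk):
    """Proportion of the at-risk population developing disease in the period."""
    return new_cases / at_risk


def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time (e.g. per person-year)."""
    return new_cases / person_time


def steady_state_prevalence(ir, avg_duration):
    """Solve P / (1 - P) = IR * duration for P."""
    odds = ir * avg_duration
    return odds / (1 + odds)


# Hypothetical numbers for illustration only
print(prevalence(50, 1000))                   # 50 existing cases among 1,000 people
print(cumulative_incidence(20, 400))          # 20 new cases among 400 people at risk
print(incidence_rate(10, 2000) * 1000)        # cases per 1,000 person-years

exact = steady_state_prevalence(0.01, 2)      # IR = 0.01 per person-year, duration = 2 years
rare_approx = 0.01 * 2                        # Prevalence ~ IR x duration when disease is rare
print(f"exact = {exact:.4f}, rare-disease approximation = {rare_approx:.4f}")
```

For a rare disease the exact value and the IR x duration approximation are nearly identical, which is the point of the simplified formula.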

$\text{Risk Ratio} = \frac{CI_e}{CI_u}$ (exposed group / unexposed group)

$\text{Rate Ratio} = \frac{IR_e}{IR_u}$ (exposed group / unexposed group)

$\text{Risk difference (RD)} = CI_e - CI_u$

$\text{Rate difference (RD)} = IR_e - IR_u$

$\text{Attributable Proportion (AP)} = \frac{RD}{CI_e} = \frac{CI_e - CI_u}{CI_e}$

$\text{Attributable Proportion} = \frac{RR - 1}{RR}$

$\text{Preventive Fraction} = \frac{CI_u - CI_e}{CI_u}$

Population Attributable Fraction (PAF) = (proportion of cases exposed) x (attributable proportion in the exposed)

$PAF = \frac{P_{pop} \times (RR - 1)}{P_{pop} \times (RR - 1) + 1}$

Case-control studies:

$AP = \frac{OR - 1}{OR}$

$PAF = P_e \times AP = P_e \times \frac{OR - 1}{OR}$

$\text{Sensitivity} = \frac{TP}{TP + FN}$

$\text{Specificity} = \frac{TN}{TN + FP}$

$\text{Positive predictive value (PPV)} = \frac{TP}{TP + FP}$

$\text{Negative predictive value (NPV)} = \frac{TN}{TN + FN}$
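As a worked check of the association and impact formulas in this summary, here is a minimal sketch. The cumulative incidences (30% exposed, 10% unexposed) and the 25% exposure prevalence are invented, and the function name `association_measures` is hypothetical.

```python
def association_measures(ci_exposed, ci_unexposed, p_pop):
    """Risk ratio, risk difference, attributable proportion, and PAF.

    p_pop is the proportion of exposed subjects in the whole study population.
    """
    rr = ci_exposed / ci_unexposed
    rd = ci_exposed - ci_unexposed
    ap = rd / ci_exposed                                # same value as (RR - 1) / RR
    paf = p_pop * (rr - 1) / (p_pop * (rr - 1) + 1)
    return {"RR": rr, "RD": rd, "AP": ap, "PAF": paf}


# Hypothetical cohort: 30% risk in the exposed, 10% in the unexposed, 25% exposed
measures = association_measures(0.30, 0.10, p_pop=0.25)
for name, value in measures.items():
    print(f"{name} = {value:.3f}")
```

Here the exposed have about 3 times the risk of the unexposed, roughly two thirds of disease in the exposed is attributable to the exposure, and about one third of all cases in the population is.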
