Numerical data: independent / unrelated groups
• Normally distributed: 2 groups → independent t-test; ≥ 3 groups → one-factor ANOVA
• Not normally distributed: 2 groups → Mann-Whitney U-test; ≥ 3 groups → Kruskal-Wallis test
Example
• Is the average age of patients who receive best supportive care significantly different from that of patients who receive chemotherapy?
[Figure: histograms of age at diagnosis (0 to 100) for each of the two groups; both panels report an interquartile range of 12.0]
Null hypothesis (H0): the average age is the same in the two groups
Independent t-test: p = 0.02
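The slide reports only the p-value. As a sketch of what the independent t-test computes, the hypothetical helper `pooled_t_statistic` below returns Student's t statistic with a pooled variance (which assumes the two groups have similar variances). The ages are invented for illustration and are not the study data. Turning t into a p-value needs the t distribution, for example via `scipy.stats.ttest_ind(a, b)`; it is left out so the block runs on the standard library alone.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's independent two-sample t statistic (pooled variance).

    Returns (t, degrees_of_freedom). Hypothetical helper for illustration.
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance uses the n - 1 (sample) denominator
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / standard_error
    return t, na + nb - 2


# Hypothetical ages, NOT the data behind p = 0.02
best_supportive_care = [72, 75, 68, 80, 77, 70]
chemotherapy = [65, 62, 70, 60, 67, 64]
t, df = pooled_t_statistic(best_supportive_care, chemotherapy)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

A large |t| relative to the t distribution with `df` degrees of freedom gives a small p-value, i.e. evidence against H0.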
Comparing groups: numerical data
Numerical data: dependent / related groups
• Normally distributed: 2 groups → paired t-test; ≥ 3 groups → repeated measures ANOVA
• Not normally distributed: 2 groups → Wilcoxon signed rank test; ≥ 3 groups → Friedman test
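The two numerical-data tables (independent and related groups) amount to a small decision rule. A minimal sketch, assuming a hypothetical function `choose_numerical_test` and that normality has already been judged separately (e.g. from histograms); it only encodes the table and performs no test itself.

```python
def choose_numerical_test(paired: bool, normal: bool, n_groups: int) -> str:
    """Return the test the slides suggest for comparing numerical data.

    paired   -- True for dependent/related groups (e.g. repeated measurements)
    normal   -- True if the data are judged to be normally distributed
    n_groups -- number of groups being compared
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    two_groups = n_groups == 2
    if not paired:
        if normal:
            return "independent t-test" if two_groups else "one-factor ANOVA"
        return "Mann-Whitney U-test" if two_groups else "Kruskal-Wallis test"
    if normal:
        return "paired t-test" if two_groups else "repeated measures ANOVA"
    return "Wilcoxon signed rank test" if two_groups else "Friedman test"


print(choose_numerical_test(paired=False, normal=True, n_groups=2))  # independent t-test
print(choose_numerical_test(paired=True, normal=False, n_groups=3))  # Friedman test
```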
Comparing 2 groups: paired data
[Figure: time (mins) for manual contour vs modified contour]
Null hypothesis (H0): the average change in time between manual and modified contouring is 0
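The paired t-test works on the within-pair differences, testing whether their mean is 0. A minimal sketch with a hypothetical helper `paired_t_statistic`; the contouring times are invented for illustration. `scipy.stats.ttest_rel(before, after)` would also return the p-value, but is not used here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Paired t statistic on the differences (after - before).

    Returns (t, degrees_of_freedom). Hypothetical helper for illustration.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples of at least two pairs")
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    # mean difference divided by its standard error
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1


# Hypothetical times in minutes for the same five cases
manual = [15, 18, 12, 20, 16]
modified = [10, 12, 9, 14, 11]
t, df = paired_t_statistic(manual, modified)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

Because each case acts as its own control, the pair-to-pair variation drops out, which is why paired data should not be analysed with the independent t-test.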
Categorical data
• Independent / unrelated groups: chi-square test; Fisher’s exact test; linear trend test
• Related groups: McNemar’s test
Comparing 2 groups with a categorical outcome
Group 1 = female; Group 2 = male
Chi-square test
Chi-square statistic:
$$\chi^2 = \frac{(386-375.9)^2}{375.9} + \frac{(500-510.1)^2}{510.1} + \frac{(457-467.1)^2}{467.1} + \frac{(644-633.9)^2}{633.9} = 0.85;\quad p = 0.36$$
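The chi-square statistic compares each observed count with the count expected if group and outcome were unrelated (row total × column total / grand total). A minimal sketch, assuming the hypothetical helper `chi_square_2x2`; the arrangement `[[386, 500], [457, 644]]` is inferred because it is the layout that reproduces the expected counts 375.9, 510.1, 467.1 and 633.9 above. For 1 degree of freedom the p-value is erfc(√(χ²/2)). Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so it would give a slightly different value unless called with `correction=False`.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value, 1 df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    # A chi-square with 1 df is a squared standard normal, so
    # P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


observed = [[386, 500], [457, 644]]  # rows: female, male (inferred layout)
chi2, p = chi_square_2x2(observed)
print(f"chi-square = {chi2:.2f}, p = {p:.2f}")  # chi-square = 0.85, p = 0.36
```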
Non-numerical / categorical data: comparing dependent / related groups
Comparing paired groups with a categorical outcome
i.e. 785 vs 75
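McNemar's test uses only the discordant pairs: pairs that changed category in one direction (b) versus the other (c). The slide does not say what "785 vs 75" counts, so the sketch below uses hypothetical counts; the helper name `mcnemar` is also hypothetical. It computes the uncorrected statistic (b − c)² / (b + c) on 1 degree of freedom.

```python
from math import erfc, sqrt


def mcnemar(b, c):
    """McNemar chi-square (no continuity correction) and p-value, 1 df.

    b -- pairs positive on the first measurement only
    c -- pairs positive on the second measurement only
    Concordant pairs do not enter the statistic.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    chi2 = (b - c) ** 2 / (b + c)
    p = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical discordant counts, for illustration only
chi2, p = mcnemar(15, 5)
print(f"chi-square = {chi2:.1f}, p = {p:.3f}")
```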
1000 cancer patients diagnosed via screening or presenting
with symptoms were classified according to their ER status
(positive vs negative).
What is the most appropriate test to use to see whether those
diagnosed via symptoms are more likely to be ER negative
than those diagnosed via screening?
[Figure: scatter plot of creatinine (y-axis, 0 to 120) against digoxin (x-axis, 0 to 120)]
Correlation
Pearson correlation (r ranges from -1 to +1):
• r = 0: no linear relationship
• r = 1: perfect positive linear relationship
• r = -1: perfect negative linear relationship
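Pearson's r is the sum of cross-products about the means divided by the square root of the product of the two sums of squares. A minimal sketch with a hypothetical `pearson_r` helper and made-up data; Python 3.10+ also provides `statistics.correlation(x, y)`. The third case shows why r = 0 means "no *linear* relationship": y depends exactly on x, but not along a straight line.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))   # 1.0  (perfect positive)
print(pearson_r(x, [-2 * v + 1 for v in x]))  # -1.0 (perfect negative)
print(pearson_r(x, [v * v for v in x]))       # 0.0  (y = x squared: no linear relationship)
```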
Pearson's correlation coefficient
[Figure: eight example scatter plots of y against x]
Pearson's correlation coefficient
Sensitive to outliers
[Figure: two scatter plots of y against x, labelled 0.85 and 0.32]
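A single extreme point can dominate r. The sketch below uses invented data: five points that have no linear relationship (r = 0), then the same points plus one outlier far from the rest. The `pearson_r` helper is hypothetical and repeated here so the block runs on its own.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data with no linear relationship
x = [-2, -1, 0, 1, 2]
y = [4, 1, 0, 1, 4]
print(round(pearson_r(x, y), 2))  # 0.0

# Add one outlying point
x_out = x + [10]
y_out = y + [20]
print(round(pearson_r(x_out, y_out), 2))  # 0.92
```

Plotting the data before quoting r is the practical safeguard; a rank-based measure such as Spearman's correlation is less affected by such points.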
Correlation
Spearman correlation
Example: correlation between pork consumption and cirrhosis mortality over 16 countries, r = 0.40
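Spearman's correlation is Pearson's correlation applied to the ranks of the data, so it measures monotonic (consistently increasing or decreasing) association rather than strictly linear association. A minimal sketch with hypothetical helpers `average_ranks`, `pearson_r` and `spearman_rho`; tied values receive the average of their ranks. With y = x³ the relationship is perfectly monotonic but not linear, so Spearman's ρ is 1 while Pearson's r is below 1.

```python
from math import sqrt


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # hypothetical monotonic, non-linear data
print(spearman_rho(x, y))          # 1.0
print(round(pearson_r(x, y), 2))   # 0.94
```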
Relationships between two continuous variables
• Normally distributed: Pearson’s correlation
• Not normally distributed: Spearman’s correlation
In a study of the relationship between pre-treatment HADS anxiety score and post-treatment satisfaction score (SAT), both scores were obtained for 130 women. The Pearson correlation was -0.65; p < 0.001.
What is the single best statement?
Simple linear regression equation
Used to estimate the straight-line relationship between two variables x and y (usually fitted by least squares)
y = b + mx
b = intercept of the line on the y-axis (value of y when x is zero)
m = slope or gradient of the line
x = independent variable
y = dependent variable
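The least-squares estimates are m = Sxy / Sxx and b = ȳ − m·x̄, so the fitted line always passes through the point of means. A minimal sketch with a hypothetical helper `fit_simple_linear_regression` and invented data lying exactly on y = 1 + 2x, so the recovered intercept and slope can be checked by eye.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares intercept b and slope m for the line y = b + m*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # intercept: line passes through (mean_x, mean_y)
    return b, m


x = [-2, -1, 0, 1, 2]
y = [-3, -1, 1, 3, 5]  # hypothetical data, exactly y = 1 + 2x
b, m = fit_simple_linear_regression(x, y)
print(b, m)  # 1.0 2.0
```

With real data (e.g. creatinine against digoxin) the points scatter around the line, and b and m are estimates rather than exact values.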
Linear Regression
Creatinine vs. digoxin
Creatinine = b + m digoxin
Multiple linear regression
Example: Creatinine = b + m1 digoxin + m2 age
Example: “Is there a relationship between cholesterol and age after adjusting for BMI?”
• Linear regression / multiple linear regression
• Logistic regression / multiple logistic regression
Logistic regression
Eg.
              Abnormal      Normal       Total
Cancer           5,747         963       6,710
Non-Cancer     174,310   1,653,760   1,828,070
Total          180,057   1,654,723   1,834,780
Source: NCI-funded Breast Cancer Surveillance Consortium (HHSN261201100031C). Downloaded 09/01/2015 from the Breast Cancer Surveillance Consortium Web site: http://breastscreening.cancer.gov/statistics/benchmarks/screening/2009/table7.html
Sensitivity:
The percentage of people who really have the disease whose test is positive (abnormal).
= 5,747 / 6,710 = 86%
Specificity:
The percentage of people who really do not have the disease whose test is negative (normal).
= 1,653,760 / 1,828,070 = 90%
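Both measures come straight from the screening table: true positives (TP) and false negatives (FN) in the cancer row, false positives (FP) and true negatives (TN) in the non-cancer row. A minimal sketch using those counts; the helper names are hypothetical.

```python
# Counts from the screening table (rows: cancer status; columns: mammogram result)
TP, FN = 5_747, 963           # cancer: abnormal, normal
FP, TN = 174_310, 1_653_760   # non-cancer: abnormal, normal


def sensitivity(tp, fn):
    """Proportion of people with the disease whose test is positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of people without the disease whose test is negative."""
    return tn / (tn + fp)


print(f"sensitivity = {sensitivity(TP, FN):.0%}")  # sensitivity = 86%
print(f"specificity = {specificity(TN, FP):.0%}")  # specificity = 90%
```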
Prevalence of disease = 6,710 / 1,834,780 = 0.37%
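Relationship between PPV, sensitivity, specificity and prevalence
$$PPV = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence} + (1-\text{specificity}) \times (1-\text{prevalence})}$$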
Relationship between NPV, sensitivity, specificity and prevalence
$$NPV = \frac{\text{specificity} \times (1-\text{prevalence})}{(1-\text{sensitivity}) \times \text{prevalence} + \text{specificity} \times (1-\text{prevalence})}$$
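The two formulas can be checked against the screening table: computing sensitivity, specificity and prevalence from the counts and substituting them should give the same PPV and NPV as reading them directly off the table (TP / all abnormal, TN / all normal). A minimal sketch with hypothetical helper names. It also shows why PPV is low here: with a prevalence under 1%, most abnormal results are false positives even though specificity is 90%.

```python
TP, FN = 5_747, 963           # cancer: abnormal, normal
FP, TN = 174_310, 1_653_760   # non-cancer: abnormal, normal


def ppv_from_rates(sens, spec, prev):
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))


def npv_from_rates(sens, spec, prev):
    return spec * (1 - prev) / ((1 - sens) * prev + spec * (1 - prev))


n = TP + FN + FP + TN
sens = TP / (TP + FN)
spec = TN / (TN + FP)
prev = (TP + FN) / n

ppv_direct = TP / (TP + FP)   # 5,747 / 180,057
npv_direct = TN / (TN + FN)   # 1,653,760 / 1,654,723

print(f"prevalence = {prev:.2%}")                                     # prevalence = 0.37%
print(f"PPV = {ppv_from_rates(sens, spec, prev):.1%} (direct {ppv_direct:.1%})")  # 3.2% both
print(f"NPV = {npv_from_rates(sens, spec, prev):.2%} (direct {npv_direct:.2%})")
```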
To assess the effectiveness of a new ultrasound screening test, 100
women with abnormal mammogram results were screened. 25
women were later found, by pathology, to have cancer. 20 of these
25 women had a positive ultrasound result. 65 of those without
cancer had a negative ultrasound.
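The question wording is not reproduced here, so this sketch only fills in the 2×2 table implied by the stated counts (100 women, 25 with cancer, 20 of them ultrasound-positive, 65 of the 75 without cancer ultrasound-negative) and computes the usual accuracy measures from it. Variable names are illustrative.

```python
n_total = 100
with_cancer = 25
without_cancer = n_total - with_cancer  # 75

tp = 20                     # cancer, ultrasound positive
fn = with_cancer - tp       # cancer, ultrasound negative: 5
tn = 65                     # no cancer, ultrasound negative
fp = without_cancer - tn    # no cancer, ultrasound positive: 10

sens = tp / with_cancer     # 20 / 25
spec = tn / without_cancer  # 65 / 75
ppv = tp / (tp + fp)        # 20 / 30
npv = tn / (tn + fn)        # 65 / 70

print(f"sensitivity = {sens:.1%}")  # sensitivity = 80.0%
print(f"specificity = {spec:.1%}")  # specificity = 86.7%
print(f"PPV = {ppv:.1%}")           # PPV = 66.7%
print(f"NPV = {npv:.1%}")           # NPV = 92.9%
```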
Follow-up results for up to 84 months were reported for a series of 323 men with prostate cancer treated by brachytherapy (BXT). At entry, a Prognostic Index was calculated for each man, and each was classed as low, intermediate or high risk.
The table below gives summary statistics comparing the three risk groups for age and for the outcome measure, which was the proportion with PSA ≤ 0.2 ng/ml at five years.
What is the most appropriate statistical test to compare age by level of risk?
(a) The associated hypothesis test has a null hypothesis that the correlation is equal to one.
(b) It reflects the magnitude of the association for linear and non-linear relationships between two numerical variables.
(c) A Pearson's correlation coefficient of zero indicates there is no linear association between the two variables.
(d) A Pearson's correlation coefficient of positive one indicates there is no linear association between the two variables.
(e) A Pearson's correlation coefficient of negative one indicates there is a non-linear relationship between the two variables.
Question 8