
Comparing variables

Statistical Comparison Tests

What tests should be used?


Numerical data

Comparing independent groups


Comparing groups
Numerical data, independent / unrelated groups

             Normally distributed    Not Normally distributed
2 groups     Independent t-test      Mann-Whitney U-test
≥ 3 groups   One-factor ANOVA        Kruskal-Wallis test
Example
• Is the average age of patients who receive
  best supportive care significantly different
  from chemotherapy patients?

• Or, could any difference seen be explained by chance?
Comparing 2 independent groups

[Figure: histograms of age at diagnosis for chemotherapy patients and for best supportive care patients (x-axis: Age at Diagnosis, 0 to 100; y-axis: Frequency)]

                       Chemotherapy patients    Best supportive care patients
Mean                   66.7                     75.2
Median                 67.0                     76.0
Standard deviation     9.2                      9.4
Interquartile range    12.0                     12.0
Null hypothesis (H0):
Average age is the same in the two groups

Alternative hypothesis (H1):
Average age is different

If p > 0.05 we don’t reject the null hypothesis

If p < 0.05 we reject the null hypothesis: there is a
statistically significant difference between the groups
Example
Is the average age of patients who receive best
supportive care significantly different from
chemotherapy patients?

Chemotherapy patients: mean age = 66.7


Best Supportive care patients: mean age = 75.2

T-test; p=0.02
Numerical data

Comparing dependent groups


Dependent/related groups?
= paired data

• Repeat data on same group of patients
  Eg. before and after some intervention

• Patient groups which have been one-to-one
  matched for characteristics such as age, sex,
  stage of cancer.
Comparing groups
Numerical data, dependent / related groups

             Normally distributed       Not Normally distributed
2 groups     Paired t-test              Wilcoxon signed rank test
≥ 3 groups   Repeated measures ANOVA    Friedman test
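
The decision charts for numerical data can be written down as a simple lookup. This is only a sketch of the charts as given on the slides; the function name `choose_test` and its parameters are illustrative, and deciding whether data are Normally distributed is left to the analyst.

```python
# Lookup of the tests named in the decision charts for numerical data.
# Key: (groups are paired/related?, data Normally distributed?, 3 or more groups?)
TESTS = {
    (False, True, False): "Independent t-test",
    (False, False, False): "Mann-Whitney U-test",
    (False, True, True): "One-factor ANOVA",
    (False, False, True): "Kruskal-Wallis test",
    (True, True, False): "Paired t-test",
    (True, False, False): "Wilcoxon signed rank test",
    (True, True, True): "Repeated measures ANOVA",
    (True, False, True): "Friedman test",
}


def choose_test(paired: bool, normal: bool, n_groups: int) -> str:
    """Return the test the charts suggest for comparing numerical data."""
    if n_groups < 2:
        raise ValueError("need at least 2 groups to compare")
    return TESTS[(paired, normal, n_groups >= 3)]


print(choose_test(paired=False, normal=True, n_groups=2))  # Independent t-test
print(choose_test(paired=True, normal=False, n_groups=3))  # Friedman test
```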
Comparing 2 groups: Paired data

[Figure: time to produce organ at risk contours, in minutes (y-axis: time (mins)), for Manual Contour vs Modified Contour]
Null hypothesis (H0): The average change
between manual and modified is 0

Alternative hypothesis (H1): The average
change between manual and modified is not 0

If p > 0.05 we don’t reject the null hypothesis

If p < 0.05 we reject the null hypothesis: there is a
statistically significant difference between the groups
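
As a rough illustration of how a paired comparison works, the paired t-test reduces each patient to a single difference and asks whether the mean difference is far from 0 relative to its standard error. The contouring times below are made up for illustration (the slides do not give the raw data), and the sketch stops at the t statistic; turning it into a p-value needs the t distribution with n - 1 degrees of freedom from a statistics package.

```python
import math
import statistics

# Hypothetical contouring times (minutes) for the same 6 patients;
# these values are illustrative only and are not taken from the slides.
manual = [20.0, 18.0, 22.0, 19.0, 21.0, 20.0]
modified = [15.0, 14.0, 18.0, 16.0, 15.0, 16.0]

# One difference per patient: this is what makes the analysis "paired".
diffs = [a - b for a, b in zip(manual, modified)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)          # sample standard deviation
standard_error = sd_diff / math.sqrt(n)

# Under H0 (average change is 0) this follows a t distribution with n - 1 df.
t_stat = mean_diff / standard_error
df = n - 1

print(f"mean difference = {mean_diff:.2f} mins, t = {t_stat:.2f}, df = {df}")
```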
HADS anxiety scores were obtained for 30 cancer patients
before and after a session with a clinical psychologist to assess
whether there had been a significant change. The HADS scores
had a significantly skewed distribution.

Which is the most appropriate test?

(a) Two-sample t-test


(b) Wilcoxon signed rank test
(c) One factor ANOVA
(d) Mann-Whitney test
(e) Paired t-test
Non numerical / categorical data

Comparing independent groups


Comparing groups
Non-numeric / categorical data

Independent / unrelated groups    Related groups
Chi-square test                   McNemar’s test
Fisher’s exact test
Linear trend test

Comparing 2 groups with
a categorical outcome

Group 1 = female
Group 2 = male

Outcome = tumour side (left vs right)

Use a chi-square test



Is there a difference in tumour side between
males and females?

Sex       Left         Right        Total
Female    386 (44%)    500 (56%)    886
Male      457 (42%)    644 (58%)    1,101
Total     843          1,144        1,987
Null hypothesis (H0): Tumour side is similar
between sexes (eg. 44% ≈ 42% for left side)

Alternative hypothesis (H1): Tumour side
is different between sexes (or tumour side is
associated with sex)

If p > 0.05 we don’t reject the null hypothesis

If p < 0.05 we reject the null hypothesis: there is a
statistically significant association between sex
and tumour side
Is there a difference in tumour side between
males and females?

Chi-square test; p=0.36

No statistically significant difference in tumour side

No evidence of an association between tumour side
and sex of patient
What is the chi-square test?

Compares observed and expected values
(under the null hypothesis) for each cell in
the table

Expected values are calculated by
multiplying the relevant column and row
totals and dividing by the grand total
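
As a worked illustration of this rule, using the tumour side table (886 females, 843 left-sided tumours, 1,987 patients in total), the expected count for the Female / Left cell is:

$$E_{\text{Female, Left}} = \frac{886 \times 843}{1{,}987} \approx 375.9$$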
What is the chi-square test?

The chi-square statistic = sum of the terms

$$\frac{(O - E)^2}{E}$$

[one term for each cell in the table], where O is the observed value and E is the expected value for that cell

This numerical value is then compared
with the chi-square distribution

Is there a difference in tumour side between
males and females?

Observed counts, with expected values in parentheses:

Sex       Left           Right          Total
Female    386 (375.9)    500 (510.1)    886
Male      457 (467.1)    644 (633.9)    1,101
Total     843            1,144          1,987

Chi-square statistic =

$$\frac{(386-375.9)^2}{375.9} + \frac{(500-510.1)^2}{510.1} + \frac{(457-467.1)^2}{467.1} + \frac{(644-633.9)^2}{633.9} = 0.85$$

p = 0.36
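
The calculation can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; it relies on the fact that for a 2x2 table (1 degree of freedom) the upper tail of the chi-square distribution equals `erfc(sqrt(x / 2))`. For larger tables a statistics package would be needed for the p-value.

```python
import math

# Observed counts from the tumour side table.
observed = [
    [386, 500],  # Female: left, right
    [457, 644],  # Male:   left, right
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count under H0: row total x column total / grand total.
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

# Upper-tail probability of a chi-square distribution with 1 df.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.2f}, p = {p_value:.2f}")
# prints: chi-square = 0.85, p = 0.36
```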
Chi-square test

Can also be used for comparing more than 2
groups, and for outcomes that have more
than 2 categories.

Eg. Comparing patients grouped into 4
distinct categories of treatment and an
outcome measured at 12 months as ‘died’,
‘relapse’, ‘remission’.
Fisher’s Exact test

Use instead of chi-square test if group sizes
are small

Eg. in previous example if only 4 females
and 5 males
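
To make the idea concrete, here is a small sketch of a two-sided Fisher's exact test for a 2x2 table, written from the hypergeometric distribution with the standard library only. The counts (4 females, 5 males) are invented for illustration, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention rather than the only one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is as probable as, or less probable than, the
    # observed one (small tolerance guards against floating point ties).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical small sample: 4 females (3 left, 1 right), 5 males (1 left, 4 right).
p_value = fisher_exact_two_sided(3, 1, 1, 4)
print(f"p = {p_value:.3f}")
```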
Linear trend test

If the grouping factor is an ordered category, eg
stage of tumour, and the outcome is a 2-category
variable, eg death within 12 months, a more
powerful analysis than a simple chi-square test is a
chi-square test for linear trend.

This method effectively fits a straight line to the
death rates corresponding to each stage.
Non numerical / categorical data

Comparing dependent/related
groups
Comparing paired groups with a
categorical outcome

One group with data before and after treatment

Outcome = binary (eg. yes vs no)


Example

McNemar’s test is based on comparing the off-diagonal numbers:

Before ‘yes’ / After ‘no’ vs. Before ‘no’/After ‘yes’

ie. 785 vs 75
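
A sketch of the calculation behind this comparison: McNemar's statistic uses only the two discordant counts, and under the null hypothesis it is compared with a chi-square distribution with 1 degree of freedom. The version below omits the continuity correction that some packages apply by default, so their output may differ slightly.

```python
import math

# Discordant (off-diagonal) counts quoted on the slide.
before_yes_after_no = 785
before_no_after_yes = 75

# McNemar's statistic without continuity correction.
statistic = (before_yes_after_no - before_no_after_yes) ** 2 / (
    before_yes_after_no + before_no_after_yes
)

# Upper-tail probability of a chi-square distribution with 1 df.
p_value = math.erfc(math.sqrt(statistic / 2))

print(f"McNemar statistic = {statistic:.1f}")
print("p < 0.05" if p_value < 0.05 else "p >= 0.05")
```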
1000 cancer patients diagnosed via screening or presenting
with symptoms were classified according to their ER status
(positive vs negative).
What is the most appropriate test to use to see whether those
diagnosed via symptoms are more likely to be ER negative
than those diagnosed via screening?

(a) Fisher’s Exact test


(b) McNemar’s test
(c) Chi-square test for linear trend
(d) Wilcoxon signed rank test
(e) Chi-square test
Scatterplot

[Figure: scattergram of creatinine vs. digoxin (x-axis: Digoxin, 0 to 120; y-axis: Creatinine, 0 to 140)]
Correlation
Pearson correlation:

• Used for Normally distributed data


• Measures linear relation between variables
Correlation

• r = 0 no relationship
• r = 1 perfect +ve relationship
• r = -1 perfect –ve relationship
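
The two extreme cases can be checked numerically. The sketch below computes Pearson's r directly from its definition; the data are made up so that the points lie exactly on a rising line and on a falling line.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data only.
x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]    # points exactly on an upward line
falling = [10 - 3 * v for v in x]  # points exactly on a downward line

print(round(pearson_r(x, rising), 6))   # 1.0
print(round(pearson_r(x, falling), 6))  # -1.0
```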
Pearson's correlation coefficient

[Figure: eight scatterplots of y against x illustrating Pearson correlations of 1, 0.93, 0.75 and -0.2 (top row) and -1, -0.89, -0.8 and -0.08 (bottom row)]
Pearson's correlation coefficient

Sensitive to outliers

[Figure: two scatterplots of y against x, with Pearson correlations of 0.85 and 0.32]
Correlation
Spearman correlation:

• Used for non-Normally distributed data


• Measures monotonic relation between
variables
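
One way to see the difference between a monotonic and a linear relation: Spearman's correlation is Pearson's correlation applied to the ranks of the data. In the sketch below (illustrative data, helper names are our own) y always increases with x but not in a straight line, so Spearman's coefficient is 1 while Pearson's is below 1.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Illustrative data: y rises with x, but along a curve (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print(f"Spearman = {spearman_rho(x, y):.2f}, Pearson = {pearson_r(x, y):.2f}")
```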
Correlation does not imply
causation

Eg.
Correlation between pork consumption and cirrhosis
mortality r = 0.40 (over 16 countries)
Relationships
Continuous data

Normally distributed     Not Normally distributed
Pearson’s correlation    Spearman’s correlation
(2 variables)            (2 variables)
In a study of the relationship between pre-treatment HADS
anxiety score and post-treatment satisfaction score (SAT),
both scores were obtained for 130 women.
The Pearson correlation was -0.65; p<0.001.
What is the single best statement?

(a) HADS score explains 65% of the variability in satisfaction score
(b) We can conclude that an increasing HADS score is a cause of poor
SAT score
(c) A correlation of 1 is interpreted as showing no difference
between the two factors.
(d) There is a significant negative linear relationship between HADS
and SAT
(e) The correlation between HADS and SAT can be different from the
correlation between SAT and HADS
Prediction

• Describe the relationship between two
  variables
• Be able to predict the value of one
  variable for a subject when we only have data
  on the other variable
Regression
“How can we predict creatinine from digoxin?”
[Figure: scattergram of creatinine vs. digoxin (x-axis: Digoxin, 0 to 120; y-axis: Creatinine, 0 to 140)]
Simple linear regression equation
Used to describe the best-fitting straight-line relationship
between two variables x and y

y = b + mx
b = intercept of line on y-axis (value of y when x is zero)
m = slope or gradient of line

x = independent variable
y = dependent variable
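
A minimal sketch of how b and m can be estimated by ordinary least squares, the usual fitting method for simple linear regression (the slides do not name the method, so treat that as an assumption). The digoxin and creatinine values are invented for illustration.

```python
def fit_line(x, y):
    """Least-squares estimates (b, m) for the line y = b + m * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # intercept: the line passes through the means
    return b, m


# Hypothetical measurements, not taken from the slides.
digoxin = [10, 20, 30, 40, 50]
creatinine = [18, 28, 42, 52, 65]

b, m = fit_line(digoxin, creatinine)
print(f"creatinine = {b:.2f} + {m:.2f} * digoxin")

# Predict creatinine for a new subject with digoxin = 60.
predicted = b + m * 60
print(f"predicted creatinine at digoxin 60: {predicted:.1f}")
```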
Linear Regression
Creatinine vs. digoxin

Creatinine = b + m digoxin

Fitted regression equation:

Creatinine = 5 + 1.2 digoxin

For every unit increase in digoxin,
there is on average a 1.2 unit increase in creatinine
Goodness of fit

• How well does the regression line
  fit the data?
• How accurate are the predictions?
R²

• The proportion of the total
  variation explained by the model

• For simple linear regression = the
  square of the correlation between
  x and y.
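
Both statements can be checked on a small example. The sketch below fits a least-squares line to invented data, computes the proportion of variation explained as 1 minus (residual variation / total variation), and compares it with the squared Pearson correlation.

```python
import math

# Hypothetical measurements, for illustration only.
x = [10, 20, 30, 40, 50]
y = [18, 28, 42, 52, 65]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((c - mean_y) ** 2 for c in y)  # total variation in y

# Least-squares line y = b + m * x.
m = sxy / sxx
b = mean_y - m * mean_x

# Variation left unexplained by the line (residual sum of squares).
ss_residual = sum((c - (b + m * a)) ** 2 for a, c in zip(x, y))

r_squared = 1 - ss_residual / syy
r = sxy / math.sqrt(sxx * syy)  # Pearson correlation

print(f"R squared = {r_squared:.4f}, r squared = {r ** 2:.4f}")
```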
Multiple linear regression

More than one ‘x’ variable (independent
variable) in the regression equation

Eg.
Creatinine = b + m1 digoxin + m2 age
Multiple linear regression

Can be used to adjust for confounding variables:

Eg.
“Is there a relationship between cholesterol and
age after adjusting for BMI?”

chol = b + m1 age + m2 BMI


Confounders

• Complicate relationships between two
  variables of interest.

• Related to both the dependent variable and
  the independent variable
The predictive model for the link between tumour size and
age was derived for a cohort of 250 women with breast
cancer:
Tumour size = 2.47 + 0.10 x age ; p=0.13

(a) Age is treated as a confounder in the regression


(b) The model shows that tumour size significantly
increases by 0.10 for every 1 year increase in age
(c) The association between tumour size and age is not
statistically significant
(d) For every unit change in tumour size, there is a 0.10
change in average age.
(e) The correlation between tumour size and age is
0.10
Prediction models

Outcome = continuous          Outcome = binary

Predictors = continuous       Predictors = continuous
and/or categorical            and/or categorical

Linear regression             Logistic regression
Multiple linear regression    Multiple logistic regression
Logistic regression

Dependent variable = binary (two categories)

Eg.

Predict ‘complications (yes/no) following op’


from ‘current systolic BP’
Logistic regression

Predict ‘complications (yes/no) following op’
from ‘pre-op systolic BP’

$$\log_e\left[\frac{p}{1-p}\right] = b + m \times (\text{pre-op BP})$$

Where p = probability of having a complication
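
To use such a model for prediction, the log-odds are converted back to a probability with p = 1 / (1 + exp(-log-odds)). The sketch below shows that conversion; the coefficients b = -6.0 and m = 0.04 are hypothetical values chosen for illustration, not estimates from any data.

```python
import math

# Hypothetical coefficients, for illustration only.
B_HYPOTHETICAL = -6.0   # intercept on the log-odds scale
M_HYPOTHETICAL = 0.04   # change in log-odds per 1 mmHg of pre-op systolic BP


def complication_probability(pre_op_bp: float, b: float, m: float) -> float:
    """Convert the model's log-odds, b + m * BP, into a probability."""
    log_odds = b + m * pre_op_bp
    return 1 / (1 + math.exp(-log_odds))


for bp in (120, 150, 180):
    p = complication_probability(bp, B_HYPOTHETICAL, M_HYPOTHETICAL)
    print(f"pre-op BP {bp}: predicted probability of complication = {p:.2f}")
```

With a positive m, the predicted probability rises with blood pressure but always stays between 0 and 1, which is why the model is written on the log-odds scale.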


Screening Tests – Mammograms

                          Test
              Abnormal    Normal       Total
Cancer        5,747       963          6,710
Non-Cancer    174,310     1,653,760    1,828,070
Total         180,057     1,654,723    1,834,780

NCI-funded Breast Cancer Surveillance Consortium (HHSN261201100031C). Downloaded 09/01/2015 from the Breast Cancer Surveillance
Consortium Web site - http://breastscreening.cancer.gov/statistics/benchmarks/screening/2009/table7.html .
Sensitivity:
The percentage of people who really have the disease
whose test is positive.
= 5,747 / 6,710 = 86%
Specificity:
The percentage of people who really don’t have the disease
whose test is negative.
= 1,653,760 / 1,828,070 = 90%
Positive predictive value (PPV):
The percentage of test positive results that are true
positives
= 5,747 / 180,057 = 3%
Negative predictive value (NPV):
The percentage of test negative results that are true
negatives
= 1,653,760 / 1,654,723 = 99.9%
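
The four measures can be computed together from the cells of the mammogram table. This short sketch uses the counts quoted on the slides; the variable names are our own.

```python
# Cells of the mammogram table.
true_positive = 5_747        # cancer, abnormal test
false_negative = 963         # cancer, normal test
false_positive = 174_310     # non-cancer, abnormal test
true_negative = 1_653_760    # non-cancer, normal test

# Row-based measures: start from true disease status.
sensitivity = true_positive / (true_positive + false_negative)
specificity = true_negative / (true_negative + false_positive)

# Column-based measures: start from the test result.
ppv = true_positive / (true_positive + false_positive)
npv = true_negative / (true_negative + false_negative)

print(f"sensitivity = {sensitivity:.0%}")  # 86%
print(f"specificity = {specificity:.0%}")  # 90%
print(f"PPV = {ppv:.0%}")                  # 3%
print(f"NPV = {npv:.1%}")                  # 99.9%
```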
              Abnormal          Normal
Cancer        True positive     False negative
Non-Cancer    False positive    True negative
Prevalence of disease

• PPV and NPV vary with prevalence

• Sensitivity and specificity do not vary with
  prevalence
Varying prevalence

Prevalence       Sensitivity    Specificity    PPV      NPV
1.0 per 1,000    0.86           0.90           0.85%    99.98%
3.6 per 1,000    0.86           0.90           3%       99.9%
20 per 1,000     0.86           0.90           15%      99.7%


Relationship between PPV, sensitivity,
specificity and prevalence

$$PPV = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence} + (1 - \text{specificity}) \times (1 - \text{prevalence})}$$
Relationship between NPV, sensitivity,
specificity and prevalence

$$NPV = \frac{\text{specificity} \times (1 - \text{prevalence})}{(1 - \text{sensitivity}) \times \text{prevalence} + \text{specificity} \times (1 - \text{prevalence})}$$
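
These two relationships are easy to turn into functions. The sketch below applies them at the sensitivity (0.86) and specificity (0.90) used on the slides and at the three prevalences in the varying-prevalence table; the function names are our own.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from sensitivity, specificity and prevalence."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


SENSITIVITY, SPECIFICITY = 0.86, 0.90

for per_1000 in (1.0, 3.6, 20.0):
    prevalence = per_1000 / 1000
    print(
        f"{per_1000:>4} per 1,000: "
        f"PPV = {ppv(SENSITIVITY, SPECIFICITY, prevalence):.2%}, "
        f"NPV = {npv(SENSITIVITY, SPECIFICITY, prevalence):.2%}"
    )
```

Holding sensitivity and specificity fixed, the PPV rises with prevalence while the NPV falls slightly.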
To assess the effectiveness of a new ultrasound screening test, 100
women with abnormal mammogram results were screened. 25
women were later found, by pathology, to have cancer. 20 of these
25 women had a positive ultrasound result. 65 of those without
cancer had a negative ultrasound.

(a) The PPV is 20/35


(b) The sensitivity of a test depends on the prevalence of cancer
(c) The sensitivity is 20/100
(d) The specificity is 65/75
(e) Specificity = 1 - sensitivity
Question 1

Multiple linear regression analysis (multivariable
analysis) can be used to:
Select the best statement

(a) adjust for confounding by adding the confounding
variable as an independent variable.
(b) determine if a study result is clinically important.
(c) adjust for loss to follow-up bias.
(d) calculate a study's power.
(e) correct the p-value for multiple comparisons.
Question 2

Select the best statement:

(a) The correlation coefficient, r, is dependent on the units of
measurement.
(b) In correlation, it matters which variable is put on the x-axis
and which variable is put on the y-axis.
(c) A correlation coefficient is always positive
(d) The Pearson correlation coefficient, r, is sensitive to
extreme values (outliers).
(e) Correlation is a measure of the relationship between the
population mean and the sample mean.
Question 3

Select the best statement concerning a scatterplot:

(a) It plots the distribution of potential confidence intervals in
a data set.
(b) It plots the distribution of a single variable.
(c) It provides a visual description of the distribution of
potential sample means drawn from a given population.
(d) It plots the standard errors of randomly selected data from
a given population.
(e) It plots the values of two numerical variables in a data set.
Question 4

The following is a linear regression model:

Energy Expenditure = 0.56 x (Caloric Intake) + 502.


Select the correct answer:

(a) The model adjusts for potential confounding by caloric intake.


(b) For every unit change in energy expenditure there is a 0.56 unit
change in mean caloric intake.
(c) For every unit change in caloric intake there is a 0.56 unit change
in mean energy expenditure.
(d) For every unit change in energy expenditure there is a change in
caloric intake of 0.56 +502.
(e) 502 is the value of caloric intake when energy expenditure is zero.
Question 5

Follow-up results for up to 84 months were reported for a series of 323 men with prostate
cancer treated by brachytherapy (BXT). At entry, a Prognostic Index was calculated for each man,
and each was classed as low, intermediate or high risk.
The table below gives summary statistics comparing the three risk groups for age and the
outcome measure, which was the proportion with PSA ≤0.2 ng/ml at five years.

What is the most appropriate statistical test to compare age by the level of risk?

(a) a one way analysis of variance


(b) a log-rank test
(c) a chi-squared test
(d) an unpaired t-test
(e) none of the above
Question 6

A cohort of cancer patients experiencing nausea and
vomiting during chemotherapy were given a new
anti-nausea treatment. A symptom score (1-10) was recorded
before and after treatment. The clinical effect of the
medication could be evaluated using:

(a) Wilcoxon signed rank test
(b) Spearman correlation coefficient
(c) Mann-Whitney U-test.
(d) McNemar’s test
(e) One-factor ANOVA.
Question 7

Select the best statement concerning Pearson's correlation coefficient:

(a) The associated hypothesis test has a null hypothesis that the correlation is
equal to one.
(b) It reflects the magnitude of the association for linear and non-linear
relationships between two numerical variables.
(c) A Pearson's correlation coefficient of zero indicates there is no linear
association between the two variables.
(d) A Pearson's correlation coefficient of positive one indicates there is no linear
association between the two variables.
(e) A Pearson's correlation coefficient of negative one indicates there is a non-
linear relationship between the two variables.
Question 8

Results of a liver scan (positive vs negative) and subsequent
pathology result (abnormal vs normal) were obtained for 344
patients. Abnormal pathology results were found for 258, and
positive liver scans were found for 263 patients. Of these 263
patients, 231 had abnormal pathology.

(a) The sensitivity of the liver scan is 231/263
(b) The prevalence of abnormal pathology in the cohort is 258/344
(c) The specificity of the liver scan cannot be determined
(d) The PPV of the liver scan is 263/344
(e) The NPV of the liver scan cannot be determined
