Abstract
Objectives: In the evaluation of cancer screening tests, cancer-free controls are often matched to cancer cases on factors such as sex and
age. We assessed the potential merits and pitfalls of such matching using an example from colorectal cancer (CRC) screening.
Study Design and Setting: We compared sex and age distribution of CRC cases and cancer-free people undergoing screening colonoscopy
in Germany in 2006 and 2007. We assessed specificity by sex and age of two immunochemical fecal occult blood tests (iFOBTs) in
a study among screening colonoscopy participants conducted in the same years, and we assessed the expected impact of matching by sex
and age on the validity of specificity estimates at various cut points.
Results: In the screening colonoscopy program, the proportion of men and mean age were 59.6% and 68.6 years among 10,324 CRC
patients compared with 45.6% and 64.7 years, respectively, among 997,490 cancer-free participants. The specificity of the iFOBTs was
higher among women than among men and decreased with age. Matching of cancer-free controls by age and sex would have led to the
underestimation of specificity at all cut points assessed.
Conclusion: In the evaluation of cancer screening tests, matching of controls may lead to biased estimates of specificity. © 2013
Elsevier Inc. All rights reserved.
Keywords: Bias; Early detection; Matching; Screening; Specificity; Statistical methods
R-Biopharm AG, Darmstadt, Germany [28]) are used for illustration here. The lower detection limit and the cut point for positivity given by the manufacturer are 0.42 and 2 µg/g stool for the hemoglobin test and 0.38 and 2 µg/g stool for the hemo-/haptoglobin complex test, respectively. Furthermore, patients were asked to fill out a standardized questionnaire. Colonoscopy and histology reports were collected, and relevant data were extracted in a standardized manner. The latter was done independently by two trained investigators who were blinded with respect to test results, and potential discrepancies were resolved by consensus.

2.2. Statistical analyses

The sex and age distribution (categories were 55–59, 60–64, 65–69, 70–74, 75–79, and 80+ years) and mean ages of screening colonoscopy participants with and without CRC were derived from the national registry by descriptive statistics. Specificities of the iFOBTs by sex and age were derived from the BLITZ study. Among 1,785 participants enrolled in BLITZ in 2006 and 2007, the following exclusions were made to ensure conditions of a screening setting and minimize potential misclassification because of imperfect colonoscopy: visible rectal bleeding or previous positive FOBT result (n = 111), inflammatory bowel disease (n = 13), previous colonoscopy in the past 5 years (n = 117), stool sampling after colonoscopy (n = 65), inadequate bowel preparation for colonoscopy (n = 79), and incomplete colonoscopy (n = 22). In addition, we excluded 48 patients with pseudopolyps or histologically undefined polyps. After further exclusion of 1 participant with missing information on age, 6 participants with missing iFOBT results, and 11 cases with CRC, 1,312 participants were retained for estimating specificities by sex and age. To ensure reasonably precise specificity estimates, three rather than six age categories were used (<60, 60–69, and 70+ years). Multiple logistic regression with test positivity as dependent variable and sex and age as independent variables was used to test for associations of both variables with specificity. Furthermore, specificities were also evaluated for a number of higher cut points (6, 10, 14 µg/g stool) besides the one recommended by the manufacturer (2 µg/g stool) because the latter yielded specificities for some subgroups that would typically be regarded as too low for population-based screening.

Finally, specificities were calculated for each cut point that would be expected with the following sampling strategies of controls:

1. The complete inclusion of all CRC-free screening colonoscopy participants meeting the criteria outlined in the preceding paragraph or selection of a random sample of these participants; we will refer to this strategy as "correct sampling" as specificity is derived from a sample that is expected to be representative of the CRC-free screening population.
2. The selection of a sample of CRC-free participants with matching to the sex and age distribution of CRC cases ("matched sampling").
3. "Convenience sampling" of controls without specific attention to their sex and age distribution; here, a range of possible values of specificity was derived from the range of specificity estimates for the various subgroups defined by sex and age. In addition, we provide an example of a specificity estimate expected from a relatively young convenience sample, which was derived by including only men and women aged younger than 60 years in the control group.

3. Results

The sex and age distribution of participants of the German screening colonoscopy program in 2006 and 2007 with and without CRC is shown in Table 1. Forty-four percent of cancer cases were aged 70 years or older compared with 24% of participants without CRC. Conversely, only 12.7% of cancer patients were younger than 60 years of age compared with 28.6% of screening participants without CRC. These differences resulted in a mean age difference of almost 4 years (68.6 vs. 64.7 years, respectively). Also, the proportion of men was much higher among CRC cases (59.6%) than among participants without CRC (45.6%).

Table 1. Age and sex distribution of participants of screening colonoscopy with and without colorectal cancer

                  Colorectal cancer
                        Yes               No
Sex     Age (yr)         n      %         n      %
Men     55–59          767    7.4   119,878   12.0
        60–64        1,047   10.1   100,787   10.1
        65–69        1,694   16.4   120,485   12.1
        70–74        1,377   13.3    69,297    6.9
        75–79          862    8.3    33,203    3.3
        80+            404    3.9    11,481    1.2
        Total        6,151   59.6   455,131   45.6
        Mean (yr)     68.5             65.0
Women   55–59          547    5.3   165,529   16.6
        60–64          665    6.4   118,013   11.8
        65–69        1,071   10.4   132,926   13.3
        70–74          871    8.4    73,760    7.4
        75–79          610    5.9    36,406    3.6
        80+            409    4.0    15,725    1.6
        Total        4,173   40.4   542,359   54.4
        Mean (yr)     68.9             64.5
Total   55–59        1,314   12.7   285,407   28.6
        60–64        1,712   16.6   218,800   21.9
        65–69        2,765   26.8   253,411   25.4
        70–74        2,248   21.8   143,057   14.3
        75–79        1,472   14.3    69,609    7.0
        80+            813    7.9    27,206    2.7
        Total       10,324    100   997,490    100
        Mean (yr)     68.6             64.7

German national screening colonoscopy registry, 2006–07.
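The age shares quoted for Table 1 can be re-derived from the table's counts. The following Python sketch is only an illustration of that arithmetic; the names `share`, `CASES`, and `CONTROLS` are hypothetical and not part of the original analysis.

```python
# Counts from the "Total" block of Table 1 (men and women combined),
# one entry per 5-year age group.
AGE_GROUPS = ["55-59", "60-64", "65-69", "70-74", "75-79", "80+"]
CASES = [1314, 1712, 2765, 2248, 1472, 813]                 # with CRC
CONTROLS = [285407, 218800, 253411, 143057, 69609, 27206]   # without CRC


def share(counts, groups):
    """Return the percentage of participants falling into the given age groups."""
    selected = sum(n for g, n in zip(AGE_GROUPS, counts) if g in groups)
    return 100.0 * selected / sum(counts)


AGED_70_PLUS = {"70-74", "75-79", "80+"}
UNDER_60 = {"55-59"}

pct_old_cases = share(CASES, AGED_70_PLUS)
pct_old_controls = share(CONTROLS, AGED_70_PLUS)
pct_young_cases = share(CASES, UNDER_60)
pct_young_controls = share(CONTROLS, UNDER_60)

# Proportion of men, from the "Men: Total" rows of Table 1.
pct_men_cases = 100.0 * 6151 / sum(CASES)
pct_men_controls = 100.0 * 455131 / sum(CONTROLS)

print(f"aged 70+:  cases {pct_old_cases:.1f}%, controls {pct_old_controls:.1f}%")
print(f"aged <60:  cases {pct_young_cases:.1f}%, controls {pct_young_controls:.1f}%")
print(f"men:       cases {pct_men_cases:.1f}%, controls {pct_men_controls:.1f}%")
```

Working from counts rather than from the rounded percentage columns avoids accumulating rounding error when age groups are combined.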
H. Brenner et al. / Journal of Clinical Epidemiology 66 (2013) 202–208
Table 2. Specificity (%) of two immunologic fecal occult blood tests (RIDASCREEN Haemoglobin and RIDASCREEN Haemo-/Haptoglobin Complex) by sex, age, and cut point of positivity

                                Hemoglobin                 Hemo-/haptoglobin complex
                                cut point (µg/g stool)     cut point (µg/g stool)
Sex     Age (yr)  Mean (yr)    n     2     6    10    14        2     6    10    14
Men     <60            55.8  215  82.3  92.6  91.5  96.3     86.5  94.9  96.7  96.7
        60–69          64.6  316  81.3  86.7  87.7  90.2     83.2  92.4  95.9  97.8
        70+            73.6  130  79.9  80.0  83.3  86.9     81.5  90.0  92.3  92.3
Women   <60            55.8  246  91.5  95.3  97.6  98.4     94.3  97.6  98.8  98.8
        60–69          64.5  309  87.7  93.9  94.5  95.8     88.0  96.8  98.1  98.4
        70+            73.0   96  83.3  93.8  93.8  93.8     84.4  89.6  95.8  96.9

BLITZ study, Germany, 2006–07.
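One way to read the "correct" and "matched" sampling strategies is as two different weightings of the same subgroup specificities. The sketch below assumes that the expected specificity under each strategy is the average of the sex- and age-specific estimates in Table 2, weighted by the sex and age distribution of the respective reference group in Table 1; the source does not spell out the computation, so this is an interpretation, and all names are hypothetical. It uses the hemoglobin test at the manufacturer's cut point (2 µg/g stool).

```python
# Subgroups in the order used by Table 2: sex x three age bands.
SUBGROUPS = [("men", "<60"), ("men", "60-69"), ("men", "70+"),
             ("women", "<60"), ("women", "60-69"), ("women", "70+")]

# Specificity (%) of the hemoglobin test at 2 ug/g stool, per subgroup (Table 2).
SPECIFICITY = [82.3, 81.3, 79.9, 91.5, 87.7, 83.3]

# Table 1 counts, collapsed from six to three age bands to match Table 2.
CANCER_FREE = [119878, 100787 + 120485, 69297 + 33203 + 11481,
               165529, 118013 + 132926, 73760 + 36406 + 15725]
CRC_CASES = [767, 1047 + 1694, 1377 + 862 + 404,
             547, 665 + 1071, 871 + 610 + 409]


def weighted_specificity(specificities, counts):
    """Average the subgroup specificities, weighted by subgroup size."""
    total = sum(counts)
    return sum(s * n / total for s, n in zip(specificities, counts))


# "Correct sampling": controls distributed like the cancer-free participants.
correct = weighted_specificity(SPECIFICITY, CANCER_FREE)
# "Matched sampling": controls forced to the sex and age distribution of cases.
matched = weighted_specificity(SPECIFICITY, CRC_CASES)

print(f"correct sampling: {correct:.1f}%")
print(f"matched sampling: {matched:.1f}%")
print(f"underestimation:  {correct - matched:.1f} percent units")
```

Because cases are older and more often male, matching shifts weight toward the subgroups with the lowest specificity, which is why the matched figure comes out lower. The same weighting, applied in reverse to a control sample with an arbitrary sex and age distribution, is one way to adjust its estimate back to the cancer-free screening population.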
The specificity of the iFOBTs by sex, age, and cut point of positivity among participants of the BLITZ study is shown in Table 2. Overall, 1,312 participants without CRC with a mean age of 63.0 years were included, of whom 50.4% were male. Sex- and age-specific estimates of specificity are based on between 96 and 316 participants per subgroup. As expected, specificity increased with increasing cut points. For both iFOBTs and each cut point, specificity was generally lower in men than in women and decreased with age in both sexes. Furthermore, both sex and age were independent significant predictors of specificity in multiple logistic regression models with test result as dependent variable (P < 0.05 in each case, data not shown).

As a result, matching of controls to the sex and age distribution of cases (matched sampling) would be expected to lead to underestimation of specificity for all cut points assessed. For example, specificities of 83.0, 88.4, and 89.6% of the hemoglobin test would be expected at cut points yielding true specificities of 84.8, 90.8, and 91.8%, respectively (Table 3). This underestimation of specificity in absolute terms by 1.8, 2.4, and 2.2 percent units would correspond to a relative increase of the false-positive rate by 12, 26, and 27%, respectively. Ignoring the sex and age distribution in the selection of controls (convenience sampling) could result in under- or overestimation of specificity. A range of values of specificity that might be obtained by convenience sampling is given by the highest and lowest values of specificity observed in the subgroups defined by sex and age. Ranges were generally rather wide, and specificities at a given cut point varied by up to 15.3 percent units depending on the control group chosen. In the example of a relatively young convenience sample of controls (mean age, 55.8 years), which was obtained by restricting the analysis to men and women younger than 60 years of age, specificity would have been overestimated at each cut point by between 2.4 and 4.5 percent units with the hemoglobin test and between 0.6 and 4.1 percent units with the hemo-/haptoglobin complex test.

4. Discussion

Our empirical example illustrates that matching of controls to the sex and age distribution of cases might lead to
Table 3. Expected specificity (%) of immunologic fecal occult blood test (RIDASCREEN Hemoglobin and RIDASCREEN Haemo-/Haptoglobin Complex) according to the sampling of controls and cut points of positivity

                                                       Cut point (µg/g stool)
Test                        Sampling                          2          6         10         14
Hemoglobin                  Correct sampling (a)           84.8       90.8       91.8       93.8
                            Matched sampling (b)           83.0       88.4       89.6       91.8
                            Convenience sampling
                              Range (c)               79.9–91.5  80.0–95.3  83.3–97.6  86.9–98.4
                              Example (d)                  87.2       94.4       96.3       97.4
Hemo-/haptoglobin complex   Correct sampling (a)           86.6       94.0       96.6       97.2
                            Matched sampling (b)           84.6       92.5       95.5       96.9
                            Convenience sampling
                              Range (c)               81.5–94.3  89.6–97.6  92.3–98.8  92.3–99.0
                              Example (d)                  90.7       96.3       97.8       97.8

German national screening colonoscopy registry and BLITZ study, Germany, 2006–07.
(a) Sex and age distribution corresponds to that in carcinoma-free screening participants.
(b) Matching to sex and age distribution of colorectal cancer cases.
(c) Range of values observed in subgroups defined by sex and age (Table 2).
(d) Example of a relatively young convenience sample (mean age, 55.8 years) that was derived by restricting the analysis to men and women younger than 60 years of age.
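A loss of a few percent units of specificity translates into a much larger relative change in the false-positive rate, because the false-positive rate (100 minus specificity) is small to begin with. The Python sketch below illustrates this with the hemoglobin rows of Table 3; the function and variable names are hypothetical and chosen only for this example.

```python
# Hemoglobin test, Table 3: expected specificity (%) per cut point (ug/g stool).
CUT_POINTS = [2, 6, 10, 14]
CORRECT = [84.8, 90.8, 91.8, 93.8]   # correct sampling
MATCHED = [83.0, 88.4, 89.6, 91.8]   # matched sampling
# Lowest and highest subgroup specificity per cut point (convenience sampling range).
RANGES = [(79.9, 91.5), (80.0, 95.3), (83.3, 97.6), (86.9, 98.4)]


def relative_fpr_increase(true_spec, biased_spec):
    """Relative increase (%) of the false-positive rate implied by a biased specificity."""
    true_fpr = 100.0 - true_spec
    biased_fpr = 100.0 - biased_spec
    return 100.0 * (biased_fpr - true_fpr) / true_fpr


increases = [relative_fpr_increase(t, m) for t, m in zip(CORRECT, MATCHED)]
widths = [high - low for low, high in RANGES]

for cut, t, m, inc in zip(CUT_POINTS, CORRECT, MATCHED, increases):
    print(f"cut point {cut:>2}: specificity {t} -> {m}, "
          f"false-positive rate up by {inc:.0f}% (relative)")
print(f"widest convenience-sampling range: {max(widths):.1f} percent units")
```

The relative figures matter in practice because the number of unnecessary follow-up colonoscopies scales with the false-positive rate, not with specificity itself.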
biased estimates of the specificity of screening tests if the sex and age distributions of cases and controls in the screening population differ and specificity likewise varies according to sex and age. The former condition is expected to commonly hold and has repeatedly been demonstrated in cancer screening because of the strong increase in cancer incidence and prevalence with age for almost all cancers and common variation in cancer incidence and prevalence between men and women [8–12]. The latter condition may also frequently apply and was quite pronounced for the iFOBTs used for illustration in our analysis. Other common examples include measurements of pepsinogen and inflammatory markers in peripheral blood, which have been suggested as screening markers for gastric and a variety of other cancers [29–31] and which are well known to show a major variation by sex and age [32,33]. Therefore, the sex and age distribution of controls should reflect that of cancer-free screening participants rather than that of cancer cases.

Ideally, the evaluation of cancer screening tests should be done in screening populations with direct sampling of cases and controls from the subpopulations with and without cancer. The German screening colonoscopy program provides such a setting for the evaluation of early detection markers of CRC because subpopulations with and without CRC in the screening population can be reliably distinguished by colonoscopy, which is considered the diagnostic gold standard in this context. Most studies aiming to evaluate cancer screening tests are conducted in different settings, however, starting with the identification and sampling of cancer cases for which adequate controls are then sought. Often, cases are identified in clinical settings (e.g., after the diagnosis and before the start of therapy), and convenience samples are used as controls, such as patients undergoing similar diagnostic procedures but found free of the cancer of interest or other healthy volunteers, such as blood donors. Such convenience sampling may sometimes lead to very large discrepancies in sex and age distribution of cases and controls, far beyond those justified by true differences in the screening population. For example, in the aforementioned review of studies on blood-based tests for CRC early detection [3], the majority of studies used healthy volunteers, patients with benign diseases, or a combination of both as controls. Where reported, the mean age of controls was mostly much lower than that of CRC cases, with differences exceeding 20 years in several studies [34–38]. In such situations, matching of controls to the sex and age distribution of cases might have provided less biased estimates of specificity. Nevertheless, direct sampling from the cancer-free screening population, which would avoid additional threats to validity that may result from convenience sampling (a detailed discussion of which is beyond the scope of this article) and which may typically yield some difference in sex and age distribution of cases and controls, should be the preferred strategy to be applied whenever possible.

It should be noted, however, that the arguments outlined in this article refer to studies aiming to describe the sensitivity and specificity of diagnostic tests in the screening setting. In other contexts, preferences may be different. In particular, there might be studies primarily aiming to assess to what extent a test whose results vary by age and sex might have any independent diagnostic value (beyond diagnostic value mediated by its association with age and sex) to distinguish people with and without cancer. In such studies, matching for age and sex might indeed be a method of choice. Vice versa, the use of convenience samples of controls whose age and sex distribution differs strongly from that of cases might inappropriately attribute diagnostic value to tests whose results vary by age and sex independent of the presence of cancer.

Regarding the specific context of our empirical example, a few additional issues require further discussion. In many studies on early detection markers of CRC, cases are recruited in the clinical setting rather than in a screening setting, and the age distribution of those cases is likely to be even further shifted toward higher ages compared with CRC-free controls from the screening population. According to data from population-based cancer registries from Germany [39], the estimated proportion of patients diagnosed with CRC in Germany in 2006 and 2007 who were 70 years or older at the time of diagnosis was 57.6%, that is, substantially higher than the corresponding proportion of 44% observed among CRC patients identified by screening colonoscopy (Table 1). On the other hand, cancer-free participants of screening colonoscopy in our empirical example may be older on average than the typical cancer-free CRC screening population because screening colonoscopy is offered from the age of 55 years only in Germany, whereas screening is recommended from the age of 50 years by expert panels [40–42], and screening by FOBT is offered starting from the age of 50 years in Germany. Therefore, the age gap between CRC cases and cancer-free controls may even be larger, and the potential bias from matching by age may be more severe in other settings than in the one assessed in our empirical example. Further limitations to generalizability might arise from potential differences in self-selection of participants of screening colonoscopy and users of other noninvasive options of primary screening tests [43,44].

In our example, specificity was defined by the proportion of negative tests among those without CRC. Ideally, however, screening tests for CRC should also detect advanced adenomas, the precursors of CRC, and possibly even other (nonadvanced) adenomas. It might therefore be argued that specificity should be determined among those free of advanced neoplasms (or even free of any neoplasms) only. Repeating our analyses with these alternative definitions of specificity yielded substantially higher levels of specificity at all cut points assessed, along with an even slightly larger gap in mean age between CRC cases and
controls and a similar bias by matching of controls to the sex and age distribution of cases.

Specificity decreased with age and was lower among men than among women for the specific tests evaluated in our example, leading to underestimation of specificity by matched sampling and overestimation of specificity in case of a relatively young convenience sample. The decrease in specificity with age might be explained by an increasing risk of bleeding from other gastrointestinal lesions with older age. Likewise, lower specificity among men than among women might result from higher prevalences of ulcer or other gastrointestinal bleeding sources and higher prevalences of aspirin use among men than among women [45]. Similar variation of specificity by sex and age would be expected for other FOBTs. For other types of tests, independence of specificity from sex and age or even reverse patterns might also be conceivable. If sex and age are unrelated to specificity, then matching by these factors does not introduce bias, but it is also unnecessary as it then does not have any impact on specificity estimates.

In our study, performance of colonoscopy, which is considered the gold standard for the detection of CRC, allowed the distinction of screening participants with and without CRC at high reliability. In many other screening settings, delineation of the cancer-free subpopulation may not be as straightforward or may not be possible at all. In such settings, the age and sex distribution of the entire screening population may often be a reasonable proxy for the age and sex distribution of the cancer-free screening population, given the low prevalence of cancer in most screening settings.

Our study has specific strengths and limitations. Strengths include the very large database of the German national screening colonoscopy registry used to assess sex and age distribution of CRC cases and CRC-free controls and the evaluation of the tests in a true screening setting. We presented a detailed empirical illustration of variation of specificity by sex and age for two tests only. However, very similar patterns were seen for both tests and in fact would be expected for FOBTs in general, so far the best established noninvasive tests for CRC screening. In most previous evaluations of other CRC early detection markers, no stratification by sex and age was done, and for many of those markers, it is unknown to what extent their specificity may vary by sex and age.

Despite its limitations, our article illustrates the potential merits and pitfalls of matching of controls in the evaluation of cancer screening tests. Although matching of controls to the sex and age distribution of cases may help to avoid potentially even larger bias by the use of convenience samples whose sex and age distribution may be even further away from that of cancer-free people from the screening population, such matching may still result in biased estimates of specificity. Along the same lines, agreement of sex and age distribution of cases and controls should not be regarded as a quality criterion of studies aiming to assess the sensitivity and specificity of tests in cancer screening settings. Although extreme differences in such distribution may be indicative of the use of inappropriate convenience samples, full agreement enforced by matching is also often undesirable as it typically hinders controls from being representative of the cancer-free screening population.

Ideally, controls should be a random sample of the cancer-free screening population. In studies in which random sampling of controls from the cancer-free study population is not possible, matching of controls to the age and sex distribution of the cancer-free screening population (which can typically be closely approximated by the age and sex distribution of the total screening population) rather than matching to the age and sex distribution of cases might be the method of choice. In studies in which the age and sex distribution of controls differs from that of the cancer-free screening participants, adjustment to the age and sex distribution of the cancer-free screening population by appropriate weighting of age- and sex-specific estimates of specificity might be considered to prevent the type of possible bias outlined in this article.

Acknowledgments

The authors acknowledge excellent contributions in the conduct of the BLITZ study by Isabel Lerch, Sabrina Hundt, Ulrike Haug, and the cooperating gastroenterology practices. The authors are grateful to Labor Limbach (Heidelberg) for laboratory analyses of the iFOBT.

References

[1] Bosch LJ, Carvalho B, Fijneman RJ, Jimenez CR, Pinedo HM, van Engeland M, et al. Molecular tests for colorectal cancer screening. Clin Colorectal Cancer 2011;10:8–23.
[2] Luo X, Burwinkel B, Tao S, Brenner H. MicroRNA signatures: novel biomarker for colorectal cancer? Cancer Epidemiol Biomark Prev 2011;20:1272–86.
[3] Tao S, Hundt S, Haug U, Brenner H. Sensitivity estimates of blood based tests for colorectal cancer detection: impact of overrepresentation of advanced stage disease. Am J Gastroenterol 2011;106:242–53.
[4] Dudouet B, Jacob L, Beuzeboc P, Magdalenat H, Robine S, Chapuis Y, et al. Presence of villin, a tissue-specific cytoskeletal protein, in sera of patients and an initial clinical evaluation of its value for the diagnosis and follow-up of colorectal cancers. Cancer Res 1990;50:438–43.
[5] Schiedeck TH, Wellm C, Roblick UJ, Broll R, Bruch HP. Diagnosis and monitoring of colorectal cancer by L6 blood serum polymerase chain reaction is superior to carcinoembryonic antigen-enzyme-linked immunosorbent assay. Dis Colon Rectum 2003;46:818–25.
[6] Holten-Andersen MN, Christensen IJ, Nielsen HJ, Lilja H, Murphy G, Jensen V, et al. Measurement of the noncomplexed free fraction of tissue inhibitor of metalloproteinases 1 in plasma by immunoassay. Clin Chem 2002;48:1305–13.
[7] Leung WK, To KF, Man EP, Chan MW, Bai AH, Hui AJ, et al. Quantitative detection of promoter hypermethylation in multiple genes in the serum of patients with colorectal cancer. Am J Gastroenterol 2005;100:2274–9.
[8] Lutz JM, Francisci S, Mugno E, Usel M, Pompe-Kirn V, Coebergh J-W, et al. Cancer prevalence in Central Europe: the EUROPREVAL Study. Ann Oncol 2003;14:313–22.