
Biol Trace Elem Res (2011) 144:97–108
DOI 10.1007/s12011-011-9038-5

Screening of Prostate Cancer by Analyzing Trace Elements in Hair and Chemometrics

Chao Tan & Hui Chen

Received: 1 March 2011 / Accepted: 14 March 2011 / Published online: 31 March 2011
© Springer Science+Business Media, LLC 2011

Abstract Prostate cancer is the most common non-cutaneous malignancy and the second leading cause of cancer mortality in men. The principal goal of this study was to explore the feasibility of applying boosting coupled with trace element analysis of hair for accurately distinguishing prostate cancer patients from healthy persons. A total of 113 subjects, comprising 55 healthy men and 58 prostate cancer patients, were enrolled. Based on a special index of variable importance and a forward selection scheme, only nine elements (i.e., Zn, Cr, Mg, Ca, Al, P, Cd, Fe, and Mo) were selected from 20 candidate elements for modeling the relationship. As a result, an ensemble classifier consisting of only eight decision stumps achieved an overall accuracy of 98.2%, a sensitivity of 100%, and a specificity of 96.4% on the independent test set, while all subjects in the training set were classified correctly. Integrating boosting and element analysis of hair may thus serve as a valuable tool for diagnosing prostate cancer in practice.

Keywords Prostate cancer . Element . Boosting . Chemometrics

Introduction

Prostate cancer is a form of cancer that develops in the prostate, a gland in the male reproductive system. Prostate cancer tends to develop in men over the age of 50 and, although it is one of the most prevalent types of cancer in men, many never have symptoms, undergo no therapy, and eventually die of other causes. Also, the risk of developing it increases with age [1–4].

C. Tan (*)
Department of Chemistry and Chemical Engineering, Yibin University, Yibin 644007,
People’s Republic of China
e-mail: chaotan1112@163.com

C. Tan
Computational Physics Key Laboratory of Sichuan Province, Yibin University, Yibin 644007,
People’s Republic of China

H. Chen
Hospital, Yibin University, Yibin 644007, People’s Republic of China
Although the etiology of prostate cancer is unknown, many risk factors, including genetics, age, race, and diet, have been identified in its development [5]. All these factors might be traced to differences in chemical element concentrations, especially of trace elements. Recently, Schöpfer et al. investigated selenium and cadmium levels in the prostates of nonsmokers and smokers and drew some valuable conclusions [6].
Nowadays, the investigation of trace elements in the body fluids and tissues of living persons has been correlated with the diagnosis of various diseases [7–13]. Even though the biochemical mechanism by which these elements contribute to cancer in the human body is not yet clear, from the perspective of cancer prevention and diagnosis, studies of the relationship between the cause of cancer and element levels are of great importance. It is widely recognized that specific element levels in human subjects may be indicators of certain cancers [14–16]. Conversely, some other elements, such as selenium, have protective effects against cancer. Few people actually develop cancer in high-selenium regions, suggesting that selenium, at the levels at which it occurs in foods, is not a carcinogen but in fact a cancer-protecting agent [17]. As a marker of elements in the human body, hair has attracted considerable attention because: (1) the concentrations of most elements are higher in hair than in other human materials; (2) the sampling of hair is easy, painless, and non-invasive, and special storage conditions are not needed; and (3) unlike blood, hair is an inert and chemically homogeneous sample. Some research has been done, most of it focused on the distribution of trace elements and basic statistical analysis. Guo et al. have explored the feasibility of using some elements to predict prostate cancer with a support vector machine [18]. We have reported the feasibility of using chemometrics and element analysis for predicting lung cancer and cardiovascular diseases [19, 20].
Considering the complicated interactions among various elements, it is necessary to use multivariate methods to explore the relationship between a group of elements and cancer, rather than single elements. Up to now, most modeling methods have been based on building a single model with limited accuracy. Recently, the so-called “ensemble” strategy has gained increasing attention in various fields and has produced a fundamental shift in the mindset of the predictive model designer: instead of trying to build a single complex model, one can instead combine a set of simple models [21]. Its main advantage is its ability to increase the accuracy and robustness of a classification/regression model. Boosting is one of the most popular ensemble techniques and has been successfully applied in various fields [22, 23]. In boosting, a so-called base learning algorithm is called repeatedly, each time fed a different subset of the training samples. After many rounds, all base models are combined into a single prediction model. Since subsequent models focus more on the samples misclassified by preceding ones, boosting can produce more reliable predictions than the individual models.
In the present work, the main goal is to explore the feasibility of applying boosting coupled with trace element analysis of hair for accurately distinguishing prostate cancer patients from healthy persons. A dataset containing 20 elements (i.e., Na, Mg, Al, P, K, Ca, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Mo, Cd, Pb, and Sr) measured in 113 hair samples, of which 55 were taken from healthy men and the remaining 58 from patients with prostate cancer, was used for illustration. Based on a special index, only nine elements (i.e., Zn, Cr, Mg, Ca, Al, P, Cd, Fe, and Mo) were selected for modeling the relationship. As a result, an ensemble classifier composed of only eight decision stumps achieved an overall identification accuracy of 98.2%, a sensitivity of 100%, and a specificity of 96.4% on the independent test set, while all cases in the training set were classified correctly. Integrating boosting and element analysis of hair may thus be a valuable tool for diagnosing prostate cancer in practice.

Theory and Method

Algorithm of Boosting

Let Θ = {(x_i, y_i)}, i = 1, …, N, denote a training set of N independent instances consisting of input attributes x_i = (x_i1, x_i2, …, x_ip) ∈ X ⊆ R^p and class labels y_i ∈ Y = {ω_1, ω_2, …, ω_J}. In a classification problem, the goal is to use the information only from Θ to construct classifiers that have good generalization ability, namely, perform well on previously unseen data. Since multi-class classification problems can generally be decomposed into binary ones [24], we assume in this paper that the label space Y consists of just two possible labels, that is, Y = {−1, +1}. One of the main ideas of the boosting algorithm is to maintain a probability distribution, or set of weights, D_t over the training set Θ. In each iteration t, the algorithm takes as input a training set Θ_t = {(x_i^(t), y_i^(t))}, i = 1, …, N, which is randomly drawn from Θ according to D_t, where D_t(i) is the weight assigned to the ith training instance. Initially, all weights are set equal, but in each subsequent iteration the weights of incorrectly classified instances are increased in a suitable way so that the next trained classifier focuses on the “hard” instances of the original training set. It should be noted that in the boosting algorithm, the performance of a weak classifier h_t is evaluated by its misclassification error over all training instances with respect to the distribution D_t; that is, the calculated ε_t is a global measure of the performance of h_t on Θ. A particular h_t may perform well on some training instances while behaving badly on others. In detail, the algorithm consists of the following steps [25]:
- Given: a training set Θ = {(x_i, y_i)}, i = 1, …, N, in which x_i ∈ X ⊆ R^p and y_i ∈ Y = {−1, +1}; a weak learning algorithm h; a number of iterations T
- Initialize: set the probability distribution over Θ as D_1(i) = 1/N (i = 1, 2, …, N)
- Iterate: for t = 1, …, T

  1. According to the distribution D_t, draw N training instances at random from Θ with replacement to compose a new training set Θ_t = {(x_i^(t), y_i^(t))}, i = 1, …, N.
  2. Apply h to Θ_t to train a weak classifier h_t: X → {−1, +1} and compute the error of h_t as

     ε_t = Pr_{i∼D_t}[h_t(x_i) ≠ y_i] = Σ_{i=1}^{N} I(h_t(x_i) ≠ y_i) D_t(i),

     where I(·) is the indicator function, taking the value 1 or 0 according to whether the ith training instance is misclassified by h_t or not.
  3. If ε_t > 0.5, set D_{t+1}(i) = 1/N (i = 1, 2, …, N) and go to step 1; if ε_t = 0, set ε_t = 10^(−10) and continue with the following iterations.
  4. Choose α_t = (1/2) ln((1 − ε_t)/ε_t).
  5. Update the probability distribution over Θ_t as

     D_{t+1}(i) = (D_t(i)/Z_t) · e^(−α_t) if h_t(x_i) = y_i, and D_{t+1}(i) = (D_t(i)/Z_t) · e^(α_t) if h_t(x_i) ≠ y_i,

     where Z_t is a normalization factor chosen so that D_{t+1} is a distribution.

- Output: the ensemble classifier

  H(x) = sign(Σ_{t=1}^{T} α_t h_t(x))

In boosting, the weighted error ε_t of each weak classifier h_t is a key index. Freund has also proved that the training error of the ensemble classifier H(x) is at most Π_t 2√(err^(t)(1 − err^(t))), known as the upper error bound (ErrBound), a useful index [26]. Thus, in this study, we used both WeightedErr and ErrBound for analysis purposes.
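As an illustration of the steps above, the following Python sketch re-implements boosting of decision stumps (the authors worked in Matlab; the function names are ours). It uses the common reweighting variant, training each stump directly on the weighted error instead of resampling Θ_t:

```python
import numpy as np

def best_stump(X, y, w):
    """Exhaustive search for the stump s*sign(x_j - b) with lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)                # (error, feature j, threshold b, sign s)
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])
        for b in (vals[:-1] + vals[1:]) / 2:  # candidate thresholds: mid-points
            for s in (1, -1):
                pred = np.where(X[:, j] > b, s, -s)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, b, s)
    return best

def adaboost_stumps(X, y, T=8):
    """Boosting of decision stumps; labels y must be in {-1, +1}."""
    N = len(y)
    D = np.full(N, 1.0 / N)                    # initialize: uniform weights
    model = []
    for _ in range(T):
        err, j, b, s = best_stump(X, y, D)     # weighted error eps_t
        err = max(err, 1e-10)                  # step 3: guard a perfect stump
        if err > 0.5:                          # step 3: reset the distribution
            D = np.full(N, 1.0 / N)
            continue
        alpha = 0.5 * np.log((1 - err) / err)  # step 4
        pred = np.where(X[:, j] > b, s, -s)
        D *= np.exp(-alpha * y * pred)         # step 5: up-weight the misclassified
        D /= D.sum()                           # normalization factor Z_t
        model.append((alpha, j, b, s))
    return model

def predict(model, X):
    """Weighted vote H(x) = sign(sum_t alpha_t * h_t(x))."""
    agg = sum(a * np.where(X[:, j] > b, s, -s) for a, j, b, s in model)
    return np.sign(agg)
```

On a trivially separable toy set, e.g. `adaboost_stumps(np.array([[0.],[1.],[2.],[3.]]), np.array([-1,-1,1,1]))`, a single stump already suffices and the ensemble reproduces the labels exactly.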

Decision Stump

A decision stump is a machine learning model consisting of a one-level decision tree with a categorical or numerical class label. That is, it is a decision tree with one internal node (the root) immediately connected to the terminal nodes. A decision stump makes a prediction based on the value of just a single input feature. For a two-class problem, it is defined as

f(x; j, b, s) = s · sign(x_j − b),

where s takes values in {−1, 1} and b takes values as defined below. A decision stump is thus specified by the parameters j, b, and s. It is easily seen that for fixed values of s and b, the decision stump is a shifted step function that assigns x a label based only on the jth predictor x_j. There exist many candidate decision stumps (i.e., combinations of s and b) for each predictor. Given a training set {x_i, y_i}, i = 1, 2, …, n, we prepare a collection of decision stumps for each predictor x_j in the following manner:

1. Sort all unique values of the jth predictor x_j as {x_(j)i}, i = 1, 2, …, n_j, where n_j is the number of unique values of the jth predictor and x_(j)i is the jth predictor of the ith sample x_i.
2. Find all mid-points between sequential pairs of points in this sorted collection.
3. For each mid-point (denoted b), prepare two candidate decision stumps, f(x; j, b, 1) and f(x; j, b, −1).

In total, Σ_{j=1}^{K} 2(n_j − 1) classifiers are prepared in step 3 for the K predictors, from which one weak classifier in Adaboost can be obtained.
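Steps 1–3 can be sketched with a short helper (an illustrative function of ours, not the authors' code):

```python
import numpy as np

def candidate_stumps(X):
    """Enumerate all candidate stumps (j, b, s) from mid-points of unique values."""
    candidates = []
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])            # step 1: the n_j sorted unique values
        mids = (vals[:-1] + vals[1:]) / 2    # step 2: the n_j - 1 mid-points
        for b in mids:                       # step 3: two signs per mid-point
            candidates.append((j, b, 1))
            candidates.append((j, b, -1))
    return candidates
```

For K predictors this yields exactly Σ_{j=1}^{K} 2(n_j − 1) stumps; e.g., two predictors with 3 and 2 unique values give 2·2 + 2·1 = 6 candidates.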

Sample Set Partitioning

In order to sort all samples according to their representativeness, the well-known Kennard–Stone (KS) algorithm [27, 28] is adopted. The KS algorithm sequentially selects samples so as to maximize the Euclidean distances between the already selected samples and the remaining ones, so that the selected samples spread throughout the multivariate space they determine. For a response matrix with N samples (rows) and K variables (columns), the Euclidean distance between samples i and j is

D_ij = ||x_i − x_j|| = √(Σ_{p=1}^{K} (x_ip − x_jp)²).

The first step is to find the two most distant samples in the entire set, i.e., those that maximize D_ij. Next, a third sample is chosen by the following criterion: the distances between each remaining sample and the two selected ones are calculated; the shorter of each pair of distances is retained, and the sample with the maximum value in this group of minimal distances is selected. When M samples out of N have been chosen, the next sample, M + 1, is selected by calculating

d_i(M) = min{D_i1, D_i2, …, D_iM}

for the N − M samples not chosen previously. Among these, the sample with index l is selected such that

d_l(M) = max_i{d_i(M)}.

By this means, each class of samples first forms a sequence, which makes it convenient to study the effect of different subset sizes on prediction. Next, for each sequence, an alternate resampling is applied to place every other sample into the training set while the remaining samples constitute the test set. In this way, the whole dataset is split into two parts with approximately the same distribution.
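The selection loop and the alternate split can be sketched as follows (an illustrative Python version; function names and toy data are ours):

```python
import numpy as np

def kennard_stone_order(X):
    """Rank all samples by the Kennard-Stone criterion (most representative first)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # all pairwise D_ij
    i, j = np.unravel_index(np.argmax(D), D.shape)             # two most distant samples
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while remaining:
        # d_i(M): distance from each candidate to its nearest already-selected sample
        d = D[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining.pop(int(np.argmax(d))))      # max of the minima
    return selected

def alternate_split(order):
    """Alternate resampling: every other sample (by KS rank) goes to training."""
    return order[0::2], order[1::2]
```

Applied per class, `kennard_stone_order` produces the sequence and `alternate_split` yields training and test halves with similar coverage of the multivariate space.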

Experimental

The dataset in this study was taken from the work of Zhang Z (Shanghai University); a brief introduction is provided here. A total of 113 hair samples were collected. Group 1 (55 samples) came from healthy subjects; group 2 (58 samples) came from volunteers who had been diagnosed as prostate cancer patients. About 0.3 g of hair was cut from the occipital region of the head (close to the scalp) for each subject. All samples were put into paper envelopes and labeled for storage. Before analysis, all samples were first washed with detergent solution for 15 min and flushed with doubly deionized water until no bubbles formed; they were then dried for 4 h in an oven at 80–85°C and cut into pieces of 0.5–1 cm length. About 0.2 g of each sample was weighed for digestion by the following procedure: the sample was predigested overnight in a Teflon digestion bottle with 5 mL nitric acid and then further digested in a microwave oven (2 min at 5 MPa, 2 min at 10 MPa, 3 min at 12 MPa, and finally 1 min at 15 MPa), after which the digested samples were cooled to room temperature. A total of 20 elements, including Na, Mg, Al, P, K, Ca, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Mo, Cd, Pb, and Sr, were analyzed by an Agilent 7500c ICP-MS with calibration curves. For classification/diagnostic purposes, each healthy sample was assigned the label 1 and each prostate cancer sample the label 2. All calculations were performed with Matlab version 7.1 under Windows XP on a Pentium IV with 256 MB RAM.

Results and Discussion

Sample Set Partitioning and Primary Statistics

First, each class of samples was ranked as a sequence by the KS algorithm, as described above. Then, for each sequence, an alternate resampling was applied to place every other sample into the training set while the remaining samples constituted the test set. The training set and the test set consist of 56 and 57 samples, respectively. By this means, the training set and the test set exhibit approximately the same information distribution, which makes it valid to use the test set for measuring the performance of a model constructed on the training set [29].
Table 1 summarizes a few descriptive statistics, including mean, standard deviation (SD), minimum, and maximum. It is clear from Table 1 that there are some differences in mean elemental concentration between the two groups (e.g., Mg, P, and Pb), but in most cases the concentrations appear very dispersed, as the SDs indicate. On average, the concentrations of 12 elements (i.e., Na, Al, K, V, Fe, Ni, Cu, As, Se, Mo, Cd, and Pb) are higher in the cancer group than in the healthy group, while for the other elements the reverse holds. Notably, the observed cadmium and lead concentrations in the hair of the prostate cancer group are close to four and 13 times those of the healthy group, respectively. Cadmium has been suggested to increase prostate cancer risk owing to its demonstrated ability to stimulate the growth of human prostate epithelial cells even at low levels and to induce their malignant transformation [6]. Lead has been shown to be a contributing cause of many cancers [30]. In contrast, there is increasing evidence that selenium is a cancer-protecting agent and that fewer people develop cancer in high-selenium regions. However, both cadmium and lead belong to the group of selenium-antagonistic elements that interact with Se, abolishing its anti-carcinogenic effect [17, 31]. So, the toxicity of cadmium and lead themselves, together with their counteraction of the anticancer effect of selenium, can explain why the prostate cancer group has
Table 1 Summary of descriptive statistics in both cancer and healthy groups (μg/g)

          Cancer (n=58)                              Healthy (n=55)
Element   Mean±SD         Min–max        R-to-Se    Mean±SD         Min–max
Na        359.83±410.43   12.92–2276.4   0.69       257.82±258.54   0.05–1109
Mg (a)    54.24±54.22     0.02–252.84    0.17       99.30±81.27     33.27–572.8
Al (a)    66.10±89.37     2.89–393.1     0.51       52.77±98.46     0.02–664.2
P (a)     187.05±48.37    116.38–441.5   −0.35      360.43±172.81   142.5–1193
K         147.71±144.86   0.11–725.2     0.29       92.21±53.62     6.5–288.5
Ca (a)    429.99±283.65   100.6–1330     −0.31      1570±998.56     496.2–5701
V         1.05±1.72       0–9.71         0.85       0.06±0.11       0–0.72
Cr (a)    2.67±5.78       0.01–36.54     0.017      4.64±6.50       0.03–35
Mn        1.27±1.50       0.01–5.99      0.40       1.54±1.85       0.01–10.3
Fe (a)    97.59±122.72    0.02–455.42    0.61       38.20±70.54     0.01–499.5
Co        0.11±0.08       0.02–0.5       −0.14      0.76±1.31       0–5.98
Ni        1.38±2.48       0–13.7         0.45       1.02±1.09       0–5.32
Cu        16.15±12.86     7.65–93.92     0.003      14.21±17.03     4.62–122.8
Zn (a)    171.92±75.75    25.66–512.01   −0.013     229.86±78.49    78.49–418
As        3.76±6.87       0.04–35.97     0.35       3.15±8.20       0–47.15
Se        141.85±185.08   0.9–711.06     1.00       0.55±0.53       0.03–2.55
Mo (a)    0.08±0.11       0.02–0.95      0.41       0.006±0.03      0–0.24
Cd (a)    2.70±4.50       0.01–25.43     0.63       0.58±2.95       0–21.82
Pb        160.78±337.94   0.11–2319.2    0.72       12.58±29.33     0.02–214.7
Sr        0.46±0.67       0.03–4.41      −0.095     1.67±2.19       0–10.07

R-to-Se denotes the correlation coefficient to Se
(a) Elements that form the optimal subset for modeling the relationship

Fig. 1 The frequency histogram and corresponding estimated probability distribution of Zn concentration for the cancer group (red line) and the healthy group (blue line)

higher concentrations of lead and cadmium. Furthermore, to get an overview of the data distribution, we checked the frequency histograms of all elements and found that almost all of the distributions overlap considerably. Taking Zn as an example, Fig. 1 gives the frequency histogram and corresponding estimated probability distribution for both groups. It seems that using only one or a few elements may be unable to produce a satisfactory model. In order to obtain preliminary indications of possible natural clustering of samples, principal component analysis (PCA) was used. Figure 2 gives the explained variance vs. the number of PCs and all the loading vectors. Obviously, no single PC dominates the explained variance. Even though the first three PCs explain about 57% of the variance, in the PC1-PC2-PC3 score plot, as well as its projection onto the PC1-PC2 plane, the points corresponding to the two groups still overlap considerably, as shown in Fig. 3, implying that discrimination by extracted PCs alone is not feasible. Considering that some of the variables/elements may be uninformative or have such low signal-to-noise ratios that their inclusion would lead to a poorer model, a procedure of

Fig. 2 Explained variance vs. the number of PCs

Fig. 3 Score plot of the first three PCs and its 2-D projection

variable selection is adopted before modeling. In order to measure the differentiating ability of an element, a special index, var_importance, is defined as

var_importance = |mean(cancer) − mean(healthy)| / |SD(cancer) − SD(healthy)|,

where |·| denotes the absolute value. Figure 4 shows the var_importance values of all elements. On this basis, Zn seems to be the most important element. Even so, it is difficult to use such a single element to distinguish healthy persons from cancer patients owing to the superposition of the distributions. This evidence indicates that the classification task is not easy.
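Computing this index per element is straightforward; a minimal sketch (function name ours), with the denominator as printed, i.e., the absolute difference of the two group SDs:

```python
import numpy as np

def var_importance(cancer, healthy):
    """|mean(cancer) - mean(healthy)| / |SD(cancer) - SD(healthy)| per column."""
    num = np.abs(cancer.mean(axis=0) - healthy.mean(axis=0))
    den = np.abs(cancer.std(axis=0, ddof=1) - healthy.std(axis=0, ddof=1))
    return num / den
```

Ranking the 20 element columns of the hair-concentration matrix by this value gives the order used for the stepwise introduction of elements.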

Fig. 4 The var_importance index values of all elements

Fig. 5 Misclassified rate (MCR) as a function of ensemble size (an ensemble size of 1 denotes a single classifier)

Classification Model of Prostate Cancer

Without variable selection (i.e., using all element concentrations), a series of ensemble classifiers with varied ensemble size (the number of weak classifiers, i.e., decision stumps) were built. Figure 5 shows the misclassified rate (MCR) as a function of ensemble size (an ensemble size of 1 denotes a single classifier). It is clear that once the ensemble size exceeds 3, the MCR on the training set remains 0, i.e., all training samples are correctly classified. When the ensemble size reaches 8, the MCR on the test set reaches its minimum. Thus, the ensemble size was fixed at 8. When dealing with predictive models, often too many variables are measured, and it is tempting to construct a model using all available variables. This, however, creates several data analysis problems, such as overfitting. Some of the variables may be uninformative or have such low signal-to-noise ratios that their inclusion leads to a poorer model. So, a simple variable/element selection procedure was carried out. First, all elements are ranked according to

Fig. 6 Misclassified rate (MCR) as a function of the number of elements

Fig. 7 Comparison of diagnosis and prediction for the training and test sets (note: “1” signifies healthy while “2” denotes patient; only one sample, belonging to a healthy person, is misclassified)

var_importance. Then, the elements used for modeling are introduced stepwise in order of importance. Figure 6 shows the MCR as a function of the number of elements. Obviously, as more elements are added, the MCR values for both the training set and the test set drop quickly; when the number of elements reaches nine, the MCRs of the training set and the test set reach their minima (0% and 1.7%, respectively). Afterward, as more elements are introduced, the MCR remains unchanged. It seems that variable selection is helpful and that using too many elements is inadvisable. Thus, Mg, Al, P, Ca, Cr, Fe, Zn, Mo, and Cd may be the most important elements for diagnosing prostate cancer. Considering that Se is often regarded as a key element in the study of the relationship between cancer and trace elements, we also calculated the correlation coefficients between all elements and Se (the “R-to-Se” column in Table 1). It is clear that Se concentration is closely correlated with those of Al, Fe, and Cd, and Pb concentration is also closely correlated with Se. In fact, through the complex balance among elements, the information of some elements is expressed by others. The classifier consisting of only eight decision stumps and nine elements achieves the optimal performance. As shown in Fig. 7, all training subjects are correctly predicted; among the 57 test samples, all cancer subjects are correctly predicted and only one healthy subject is misclassified as cancer. That is, the accuracy, sensitivity, and specificity are 98.2%, 100%, and 96.4%, respectively.
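These figures are consistent with a test-set confusion matrix of 29 true positives, 0 false negatives, 27 true negatives, and 1 false positive; a quick check (helper name ours):

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # fraction of cancer subjects correctly flagged
    specificity = tn / (tn + fp)   # fraction of healthy subjects correctly cleared
    return accuracy, sensitivity, specificity

acc, sens, spec = diagnostic_metrics(tp=29, tn=27, fp=1, fn=0)
# acc = 56/57 ~ 0.982, sens = 1.0, spec = 27/28 ~ 0.964
```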

Conclusions

Trace elements are increasingly recognized as essential mediators of various diseases, including cancers. The concentrations of some elements in serum, blood, and hair samples can be used to distinguish healthy people from cancer patients. This study shows that boosting of decision stumps coupled with nine trace elements in hair can serve as an aid for diagnosing prostate cancer in clinical practice. However, to gain a better understanding of the relationship, more research remains to be done.

Acknowledgments This work was supported by Sichuan Province Science Foundation for Youths
(09ZQ026-066). The authors are grateful to Zhang Z for releasing the dataset used in this paper.

References

1. Jemal A, Thomas A, Murray T, Thun M (2002) Cancer statistics. CA Cancer J Clin 52:23–47
2. Cohen LA (2002) Nutrition and prostate cancer. Ann NY Acad Sci 963:148–155
3. Shannon J, Tewoderos S, Garzotto M, Beer TM, Derenick R, Palma A, Farris PE (2005) Statins and
prostate cancer risk: a case–control study. Am J Epidemiol 162:318–325
4. Kurahashi N, Inoue M, Iwasaki M, Sasazuki S, Tsugane AS (2008) Dairy product, saturated fatty acid,
and calcium intake and prostate cancer in a prospective cohort of Japanese men. Cancer Epidemiol
Biomark Prev 17:930–937
5. Douglas MT (2003) The importance of trace element speciation in biomedical science. Anal Bioanal
Chem 375:1062–1066
6. Schöpfer J, Drasch G, Schrauzer GN (2010) Selenium and cadmium levels and ratios in prostates, livers,
and kidneys of nonsmokers. Biol Trace Elem Res 134:180–187
7. Forte G, Alimonti A, Violante N, Gregorio M, Senofonte O, Petrucci F, Sancesario G, Bocca B (2005)
Calcium, copper, iron, magnesium, silicon and zinc content of hair in Parkinson’s disease. J Trace Elem
Med Biol 19:195–201
8. Pasha Q, Malik SA, Iqbal J, Shaheen N, Shah MH (2008) Comparative evaluation of trace metal distribution
and correlation in human malignant and benign breast tissues. Biol Trace Elem Res 125:30–40
9. Zhai HL, Chen XG, Hu ZD (2003) Study on the relationship between intake of trace elements and breast
cancer mortality with chemometric methods. Comput Biol Chem 27:581–586
10. Celik HA, Aydin HH, Ozsaran A, Kilincsoy N, Batur Y, Ersoz B (2002) Trace elements analysis of
ascitic fluid in benign and malignant diseases. J Clin Biochem 35:477–481
11. Miura Y, Nakai K, Sera K, Sato M (1999) Trace elements in sera from patients with renal disease. J Nucl
Instrum Methods Phys Res B 150:218–221
12. Patriarca M, Menditto A, Felice GD, Petrucci F, Caroli S, Merli M, Valente C (1998) Recent
developments in trace element analysis in the prevention, diagnosis, and treatment of diseases.
Microchem J 59:194–202
13. Frisk P, Darnerud P, Ola FG, Blomberg J, Ilbäck NG (2007) Sequential trace element changes in serum
and blood during a common viral infection in mice. J Trace Elem Med Biol 21:29–36
14. Ren YL, Zhang ZY, Ren YQ, Li W, Wang MC, Xu G (1997) Diagnosis of lung cancer based on metal
contents in serum and hair using multivariate statistical methods. Talanta 44:1823–1831
15. Zhang ZY, Zhou HL, Liu SD, Harrington P (2006) Application of Takagi–Sugeno fuzzy systems to
classification of cancer patients based on elemental contents in serum samples. Chemom Intell Lab Syst
82:294–299
16. Gray BN, Walker C, Barnard R, Bennett RC (1982) Use of serum copper/zinc ratio in patients with large
bowel cancer. J Surg Oncol 20:230–232
17. Schrauzer GN (2009) Selenium and selenium-antagonistic elements in nutritional cancer prevention. Crit
Rev Biotechnol 29:10–17
18. Guo JK, Deng WH, Zhang LC, Li CH, Wu P, Mao PL (2007) Prediction of prostate cancer using
hair trace element concentration and support vector machine method. Biol Trace Elem Res 116:257–
271
19. Tan C, Chen H, Xia CY (2009) Early prediction of lung cancer based on the combination of trace
element analysis in urine and an Adaboost algorithm. J Pharm Biomed 49:746–752
20. Tan C, Chen H, Xia CY (2009) The prediction of cardiovascular disease based on trace element contents
in hair and a classifier of boosting decision stumps. Biol Trace Elem Res 129:9–19
21. Bermejo S, Cabestany J (2004) Ensemble learning for chemical sensor arrays. Neural Process Lett 79:25–35
22. Zhang MH, Xu QS, Daeyaert F, Lewi PJ, Massart DL (2005) Application of boosting to classification
problems in chemometrics. Anal Chim Acta 544:167–176
23. He P, Fang KT, Liang YZ, Li BY (2005) A generalized boosting algorithm and its application to two-
class chemical classification problem. Anal Chim Acta 543:181–191
24. Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Machine Learning 37:297–336
25. Tan C, Chen H, Zhu WP (2010) Application of boosting classification and regression to modeling the
relationships between trace elements and diseases. Biol Trace Elem Res 134:146–159

26. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application
to boosting. J Comput Syst Sci 55:119–139
27. Daszykowski M, Walczak B, Massart DL (2002) Representative subset selection. Anal Chim Acta
468:91–103
28. Galvão RKH, Araújo MCU, Martins MN, José GE, Pontes MJC, Silva EC, Saldanha TCB (2005) A method for calibration and validation subset partitioning. Talanta 67:736–740
29. Tan C, Li ML, Qin X (2007) Study of the feasibility of distinguishing cigarettes of different brands using
an Adaboost algorithm and near-infrared spectroscopy. Anal Bioanal Chem 389:667–676
30. Alatise OI, Schrauzer GN (2010) Lead exposure: a contributing cause of the current breast cancer
epidemic in Nigerian women. Biol Trace Elem Res 136:127–139
31. Schrauzer GN (2008) Interactive effects of selenium and cadmium on mammary tumor development and growth in MMTV-infected female mice. A model study on the roles of cadmium and selenium in human breast cancer. Biol Trace Elem Res 123:27–34
