
JID:IRBM AID:555 /FLA [m5G; v1.260; Prn:30/05/2019; 16:23] P.

1 (1-6)
IRBM ••• (••••) •••–•••

Contents lists available at ScienceDirect

IRBM

www.elsevier.com/locate/irbm

Original Article

Determination of the Blood, Hormone and Obesity Value Ranges that Indicate the Breast Cancer, Using Data Mining Based Expert System
S.B. Akben
Department of Electrical and Electronics Engineering, Engineering Faculty, Osmaniye Korkut Ata University, Osmaniye, Turkey

Highlights

• Blood analysis and anthropometric value ranges indicating breast cancer were presented visually.
• The decision tree was determined as the method to diagnose breast cancer, with a success rate of 90.5%.
• The importance of each attribute was presented numerically to medical experts for a preliminary diagnosis of breast cancer.

Article info

Article history:
Received 24 January 2019
Received in revised form 28 April 2019
Accepted 24 May 2019
Available online xxxx

Keywords:
Breast cancer
Data mining
Obesity
Hormone
Blood analysis

Abstract

Breast cancer is a dangerous type of cancer that spreads to other organs over time. Medical studies therefore pursue early diagnosis by means of anthropometric data and blood analysis values, in addition to mammographic and histological findings. However, these studies have identified only which values are related to cancer; the value ranges that indicate cancer have not yet been determined. At the same time, biomedical engineering studies are developing automated diagnostic systems to assist medical specialists. In these methods the value ranges or boundaries indicating cancer are determined automatically, but only the diagnostic result is presented, so medical experts have little opportunity to evaluate the relationship between the values and the result. In this study, decision trees, a data mining method, were applied to anthropometric data and blood analysis values to address these deficiencies in studies aiming at breast cancer diagnosis. The determined value ranges were also presented visually so that medical experts can understand them easily. The proposed diagnostic system has an accuracy rate of up to 90.52%, provides the value ranges indicating breast cancer, and presents the relations between the values and cancer mathematically.
© 2019 AGBM. Published by Elsevier Masson SAS. All rights reserved.

E-mail address: batuhanakben@osmaniye.edu.tr.
https://doi.org/10.1016/j.irbm.2019.05.007
1959-0318/© 2019 AGBM. Published by Elsevier Masson SAS. All rights reserved.

1. Introduction

Breast cancer is the most common type of cancer in women. It arises from the uncontrolled proliferation of cells in the milk ducts (ductal carcinoma) or the milk glands (lobular carcinoma) of the breast. Although this proliferation takes a long time, the proliferating cells can spread to other organs through the lymph and blood, and treatment tends to fail in the advanced phases of the disease, whose course is divided into four phases. Breast cancer is therefore a very dangerous and lethal type of cancer that should be diagnosed early [1–4].

Breast cancer is generally diagnosed by manual examination, mammography, ultrasonography and biopsy. These methods yield the diagnostic data of a person by determining the dimensional and other characteristics of the tumour. Expert medical doctors then correlate the past and available data and decide whether the person has cancer [5,6].

To assist expert medical doctors in this decision-making process, there are also engineering studies that determine the correlation between past and available data with mathematical methods and produce the diagnostic results automatically [7–11]. In these studies, past cancer data and the corresponding diagnoses were first uploaded by medical experts to common databases; data mining or machine learning algorithms were then applied to the uploaded data, and successful automated diagnostic methods were proposed. In particular, many successful automated diagnostic methods based on the dimensions and other characteristics of tumours have been reported in the biomedical literature for the “Breast Cancer Wisconsin (Original)” and “Breast Cancer Wisconsin (Diagnostic)” datasets [12–15].

Recent medical studies have claimed that blood and hormone data such as insulin, leptin, resistin, adiponectin, polypeptide-specific antigen (M3 epitope of TPA), breast cancer-specific cancer antigen 15.3 (CA15-3) and insulin-like growth factor binding protein-3 (IGFBP-3) can also be used for the diagnosis of breast cancer [16–20]. These findings allow earlier diagnosis than methods based on the dimensional and other characteristics of the tumour, and they have prompted new research, for example on the relationship between obesity and cancer. However, these studies did not determine exactly which value ranges of the data point to cancer.

Meanwhile, data mining and machine learning methods that automatically diagnose breast cancer at an earlier stage from these new medical findings (leptin, resistin, obesity, etc.) have been proposed in the biomedical engineering literature. These engineering studies have also compared the success of automated diagnosis based on the new medical findings with that based on tumour characteristics [21–27]. In summary, new engineering methods are being developed in parallel with new medical findings.

However, the automated diagnostic methods proposed in the biomedical engineering literature are unintelligible to expert medical doctors and do not allow them to interpret the relationship between results and data. Furthermore, like the medical studies, these studies did not present the value ranges of the data pointing to cancer. Expert medical doctors prefer methods that do not require advanced knowledge of mathematics and that can be interpreted without a computer, so they need to know these value ranges. There is therefore still a need for a successful automated diagnostic method that presents the mathematical relationship between the data and the result to medical experts in a clear and understandable way.

Therefore, this study proposes a new automated breast cancer diagnosis system based on data mining that also allows medical experts to easily understand and interpret the relationship between the data and the diagnostic result. The method feeds the data to the decision tree algorithm and presents the rules determined by the decision tree visually. These visuals, intended to assist expert medical doctors in diagnosis, can be used without any computerized device or mathematical knowledge. The data used to test the proposed method are attributes such as leptin, resistin and body mass index, which have recently been found to be associated with breast cancer and which offer the possibility of diagnosis at an earlier stage. In this way, the value ranges of these data that indicate breast cancer were determined and presented to the medical literature.

2. Materials

The data set used in this study was recorded in the Department of Obstetrics of Coimbra University Hospital between 2009 and 2013. Whether the subjects had breast cancer was determined from mammography results and confirmed histologically by expert medical doctors. Expert medical doctors also confirmed that no infection, acute illness or comorbidity was observed in the control group (the group without cancer). Data were obtained before treatment and surgery. The data set consists of blood, hormone and anthropometric attributes that are associated with cancer according to the medical literature. The main purpose of recording the data was to evaluate hyperresistinemia and metabolic dysregulation in breast cancer.

This data set has been used in previous studies for automatic breast cancer diagnosis and is explained in more detail in those studies. It is also published as open access in the UCI Machine Learning Repository [21,28]. The statistical characteristics of the data set (mean and standard deviation) are given in Table 1, and more detailed medical information about the data set can be obtained from the previous studies.

Table 1
Statistical values (mean ± standard deviation) of the data set containing 64 breast cancer patients and 52 control group members.

Attribute                                            Control Group      Breast Cancer Patients
Age (years)                                          58.08±18.96        56.67±13.49
Body Mass Index, BMI (kg/m²)                         28.32±5.43         26.98±4.62
Glucose (mg/dL)                                      88.23±10.20        105.56±26.56
Insulin (μU/mL)                                      6.93±4.86          12.51±12.32
Homeostasis Model Assessment, HOMA                   1.55±1.22          3.62±4.59
Leptin (ng/mL)                                       26.64±19.34        26.60±19.21
Adiponectin (μg/mL)                                  10.32±7.63         10.06±6.19
Resistin (ng/mL)                                     11.62±11.45        17.25±12.64
Serum Monocyte Chemoattractant Protein-1 (pg/dL)     499.73±292.24      563.02±384.00

Table 2 shows the anatomicopathological features of the breast cancer patients. As can be seen from Table 2, most of the cancer patients are in the first two phases, so a diagnostic method developed from the subjects in this data set will be appropriate for early diagnosis.

3. Method

Decision trees were used to determine the value ranges in this study. The accuracy of the decision tree was determined by cross-validation and ROC analysis.

3.1. Decision trees

The decision tree method is an algorithm that decides which attributes to branch on by determining the relationship between the attributes in the data set and the class labels. The algorithm first splits the most important attribute (the one most associated with the class labels) into two branches and determines the split limit. The same process is then repeated for each branch. This procedure creates a decision tree that determines which attribute value ranges should be selected to obtain the class

Table 2
Anatomicopathological features of breast cancer patients [21]. Every entry is given as (Number, Percent).

Tumour Phase      Tumour Phase      Tumour Phase         Lymph node        The state of estrogen (ER) and progesterone (PR)
                                                         involvement       receptors and protein CerbB2
I (14, 21.3%)     0 (5, 7.8%)       ≤2 cm (54, 84.4%)    Yes (27, 42.2%)   ER+ (53, 82.8%), ER− (5, 7.8%)
II (41, 63.9%)    I (29, 45.3%)     >2 cm (10, 15.6%)    No (37, 57.8%)    PR+ (52, 81.3%), PR− (6, 9.4%)
III (9, 14.8%)    II (30, 46.9%)                                           CerbB2+ (18, 28.1%), CerbB2− (40, 62.5%)

labels. The attribute importance level is determined by the information gain value of the Entropy algorithm or by the Gini index of the Gini algorithm [29].

3.2. Gini algorithm

In the Gini algorithm, the attribute importance level is determined by the Gini coefficient. Before the Gini coefficient is calculated, the Gini-left and Gini-right values of the attribute are calculated as in Equations (1) and (2). In these equations, $k$ is the number of classes, $T_x$ is the number of samples in branch $x$ of a node, $L_i$ is the number of samples of category $i$ in the left branch and $R_i$ is the number of samples of category $i$ in the right branch.

$$\mathrm{Gini}_{\mathrm{left}} = 1 - \sum_{i=1}^{k} \left( \frac{L_i}{|T_{\mathrm{left}}|} \right)^{2} \tag{1}$$

$$\mathrm{Gini}_{\mathrm{right}} = 1 - \sum_{i=1}^{k} \left( \frac{R_i}{|T_{\mathrm{right}}|} \right)^{2} \tag{2}$$

The Gini coefficient of attribute $j$ is determined from the $\mathrm{Gini}_{\mathrm{left}}$ and $\mathrm{Gini}_{\mathrm{right}}$ values as in Equation (3), where $n$ is the total number of samples being split.

$$\mathrm{Gini}_{j} = \frac{1}{n} \left( |T_{\mathrm{left}}| \, \mathrm{Gini}_{\mathrm{left}} + |T_{\mathrm{right}}| \, \mathrm{Gini}_{\mathrm{right}} \right) \tag{3}$$

The first branching in a decision tree is made on the attribute with the smallest Gini index (GI), or equivalently the largest 1−GI [30].

3.3. Entropy algorithm

In the entropy algorithm, information gains are calculated from the entropy values of the attributes. The attribute with the largest information gain is branched first because it is the one most associated with the classes. The entropy values are calculated as in Equation (4) and the information gains of the attributes as in Equation (5) [31].

$$\mathrm{Entropy}(T) = E(T) = - \sum_{i=1}^{n} p_i \log_2 (p_i) \tag{4}$$

$$\mathrm{Information\ Gain}(X, T) = E(T) - \sum_{i=1}^{n} \frac{|T_i|}{|T|} E(T_i) \tag{5}$$

In Equation (4), $p_i$ is the probability of class $i$. In Equation (5), $X$ is the attribute, $T_i$ is the subset of samples that falls into branch $i$ when $T$ is split on $X$, and $|T_i|$ is the number of samples in that subset.

3.4. Cross-validation

Cross-validation is an algorithm used to ensure that the results calculated by a method (decision tree, etc.) are not coincidental. The data set is divided into a training set and a test set. The rules of the method (the decision tree in this study) are determined from the training set and tested on the test set. The training and test sets are then refreshed by creating new ones, and the same procedure is repeated for the refreshed sets [32]. In this study, in each of 10 cycles, a randomly selected 90% of the data set was used as the training set and the remaining 10% as the test set, so different training and test sets were generated in each cycle. The decision tree boundaries formed in each cycle were averaged.

3.5. ROC analysis

ROC analysis is a method used to determine whether the accuracy rate is distributed equally over the classes. In this study, it is used to determine whether the success is distributed equally between healthy subjects and cancer patients. In ROC analysis, the accuracy rate for the healthy subjects is called specificity, while the accuracy rate for the patients is called sensitivity [33]. The overall success of the method (the ratio of correct decisions over both classes) is given by the accuracy value. Specificity, sensitivity and accuracy are calculated in this study as in Equations (6), (7) and (8).

$$\mathrm{Specificity} = \frac{HSCH}{HSCH + HSCP} \tag{6}$$

$$\mathrm{Sensitivity} = \frac{PCP}{PCP + PCH} \tag{7}$$

$$\mathrm{Accuracy} = \frac{HSCH + PCP}{HSCH + HSCP + PCP + PCH} \tag{8}$$

In Equations (6), (7) and (8), HSCH means Healthy Subjects Classified as Healthy, PCP means Patients Classified as Patient, HSCP means Healthy Subjects Classified as Patient and PCH means Patients Classified as Healthy.

4. Results and discussion

The optimum number of branchings (number of nodes) and the division algorithm were determined first. To this end, the cross-validation error was determined for the Entropy and Gini algorithms after each branching. Fig. 1 shows the relationship between the number of branches and the cross-validation error for the Entropy and Gini algorithms.

As can be seen from Fig. 1, the ideal number of nodes for both algorithms is 7, and with 7 nodes the cross-validation error is smaller for the Gini algorithm. Therefore, the 7-node decision tree built with the Gini algorithm was preferred in the study. This tree is shown in Fig. 2, and its accuracy is 90.52%.

The decision tree in Fig. 2 shows that BMI is predictive for only a small number of subjects, so it may be thought that the division due to BMI is coincidental. However, in order to understand whether the division due to the BMI

Fig. 1. Relationship between the number of branches and the cross-validation error for the Entropy and Gini algorithms.
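A comparison like the one in Fig. 1 could be produced along the following lines. This is only a minimal sketch under stated assumptions, not the implementation used in the study: the tree size is limited by depth rather than by node count, the two-attribute data are synthetic, and all function names (`best_split`, `build`, `cv_error`) are hypothetical. It repeats a random 90%/10% split for 10 cycles, as described for the cross-validation step, and reports the mean test error for the Gini and entropy criteria.

```python
import math
import random
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels, impurity):
    """Return (weighted impurity, attribute index, threshold) of the best split, or None."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate threshold: midpoint of neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build(rows, labels, impurity, depth):
    """Grow a tree of at most `depth` levels; a leaf is the majority class label."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, impurity)
    if split is None:  # no attribute has two distinct values left
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build([r for r, _ in left], [y for _, y in left], impurity, depth - 1),
            build([r for r, _ in right], [y for _, y in right], impurity, depth - 1))

def predict(tree, row):
    """Follow the splits (attribute, threshold, left, right) down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def cv_error(rows, labels, impurity, depth, cycles=10, test_fraction=0.1, seed=0):
    """Mean test error over `cycles` random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    n_test = max(1, round(test_fraction * len(rows)))
    total = 0.0
    for _ in range(cycles):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        tree = build([rows[i] for i in train], [labels[i] for i in train], impurity, depth)
        total += sum(predict(tree, rows[i]) != labels[i] for i in test) / n_test
    return total / cycles

# Synthetic stand-in data (NOT the Coimbra data set): two attributes, binary label.
_rng = random.Random(1)
demo_rows = [(_rng.uniform(0, 10), _rng.uniform(0, 10)) for _ in range(60)]
demo_labels = [1 if (a > 5 and b > 3) else 0 for a, b in demo_rows]
for depth in (1, 2, 3):
    print(depth,
          cv_error(demo_rows, demo_labels, gini, depth),
          cv_error(demo_rows, demo_labels, entropy, depth))
```

Plotting the two error columns against tree size gives a curve of the same kind as Fig. 1; the size with the smallest error would then be selected.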

Fig. 2. Visual schematic of decision tree rules.
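A rule schematic such as Fig. 2 amounts to reading each root-to-leaf path of the tree as a set of value ranges. The sketch below illustrates that reading on a toy tree. Everything in it is hypothetical: the attribute names `a` and `b`, the thresholds and the class labels are placeholders and are not the attributes or limits determined in the study.

```python
import math

def leaf_rules(tree, bounds=None):
    """Yield (ranges, label) per leaf; ranges maps attribute -> (low, high), meaning low < value <= high."""
    bounds = dict(bounds or {})
    if not isinstance(tree, dict):  # a leaf holds only the class label
        yield bounds, tree
        return
    attr, t = tree["attr"], tree["threshold"]
    low, high = bounds.get(attr, (-math.inf, math.inf))
    # The "le" branch tightens the upper limit, the "gt" branch the lower limit.
    yield from leaf_rules(tree["le"], {**bounds, attr: (low, min(high, t))})
    yield from leaf_rules(tree["gt"], {**bounds, attr: (max(low, t), high)})

def describe(ranges):
    """Render one rule as readable text."""
    return " and ".join(f"{lo} < {name} <= {hi}" for name, (lo, hi) in ranges.items())

# Toy tree with placeholder attributes and thresholds (illustration only).
toy_tree = {
    "attr": "a", "threshold": 5.0,
    "le": "class 0",
    "gt": {"attr": "b", "threshold": 2.0, "le": "class 0", "gt": "class 1"},
}

for ranges, label in leaf_rules(toy_tree):
    print(describe(ranges), "->", label)
```

Each printed line corresponds to one branch of the schematic, so a reader can check a subject against the ranges without running the classifier.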

is coincidental, the importance of the attributes should be considered. Fig. 3 shows the importance of all the attributes.

Although BMI seems to be important for only a small number of subjects, Fig. 3 shows that it has considerable importance over the whole data set. In fact, BMI is more important than resistin in the data set. Therefore, the division due to BMI cannot be called coincidental. Nevertheless, it should be kept in mind that the accuracy rate is 86.2% when the nodes related to BMI are not used. Another noteworthy point in Fig. 3 is that, although the Gini algorithm does not find HOMA sufficiently meaningful for division, this may change in other data sets containing more subjects.

Considering that 116 subjects were used in the study and that new cancer-related attributes may enter the medical literature, the attribute limits and importance ranking determined in this study may change, and the diagnostic success may then decrease. Furthermore, some patients are in the later phases of cancer, which may have affected the success rate. This should be taken into consideration in future studies.

The overall prediction accuracy of the 7-branch decision tree created by the Gini algorithm is 90.52%. Although this accuracy rate seems adequate, the success of the decision tree proposed for breast cancer diagnosis should be evaluated against other classification methods. Table 3 therefore compares the accuracy of the proposed decision tree with the accuracies of other classification methods.

As seen from Table 3, the decision tree created by the Gini algorithm is more successful than the other classification methods. Therefore, among the methods compared, the decision tree based on the Gini algorithm is the most successful for the diagnosis of breast cancer using blood, hormone and obesity data.

To evaluate the study in terms of methodology and general findings, it must be compared with the main previous study that used the same data set (the study by the scientists who published the data as open access). Glucose, age, resistin and BMI were determined as the most predictive attributes in the previous study. This result coincides

Fig. 3. Importance degree of attributes for branching.
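The text does not state how the importance degrees in Fig. 3 were computed. One simple proxy that is consistent with the Gini index of Section 3.2 is to score each attribute by 1−GI at its best single split. The sketch below does this on a toy table; the attribute names `x1` and `x2`, the values and the function names are hypothetical, and the ranking it prints is not a result of the study.

```python
from collections import Counter

def gini_side(labels):
    """Gini value of one branch: 1 - sum_i (count_i / branch size)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_index(values, labels, threshold):
    """Size-weighted mean of the left and right Gini values for one threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return gini_side(labels)  # degenerate split: nothing is separated
    n = len(labels)
    return (len(left) * gini_side(left) + len(right) * gini_side(right)) / n

def best_gini_index(values, labels):
    """Smallest Gini index over the midpoints between distinct sorted values."""
    distinct = sorted(set(values))
    cuts = [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    if not cuts:
        return gini_side(labels)
    return min(gini_index(values, labels, t) for t in cuts)

def rank_attributes(table, labels):
    """Return [(attribute, 1 - GI)] sorted from most to least important."""
    scores = {name: 1.0 - best_gini_index(col, labels) for name, col in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy table (illustration only): x1 separates the classes perfectly, x2 does not.
toy_table = {"x1": [1, 2, 3, 4, 5, 6], "x2": [3, 1, 4, 1, 5, 9]}
toy_labels = [0, 0, 0, 1, 1, 1]
for name, score in rank_attributes(toy_table, toy_labels):
    print(name, round(score, 3))
```

A full tree would accumulate such impurity decreases over all of its nodes, so a single-split score like this one is only a first approximation of an attribute's importance.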

Table 3
Accuracy rates of classification methods.

Methods                  Specificity   Sensitivity   Accuracy
Naive Bayes              88.5%         43.8%         63.8%
SVM                      82.7%         73.5%         77.6%
KNN                      84.6%         79.7%         81.9%
Decision Tree (Gini)     90.4%         92.2%         90.5%
Discriminant Analysis    78.9%         75.0%         76.7%
Linear Regression        82.7%         67.2%         74.2%
Ada Boost                84.9%         88.9%         87.7%
ANN                      80.87%        79.23%        79.43%

with the result of the current study. So, it can be said that the findings of the current study are supported by the biomedical engineering literature. The accuracy of the previous study using the same data ranged from 87% to 91%, whereas the current study obtained a fixed accuracy rate of 90.52%. That is, the current study is more stable and successful than the method proposed in the previous study. Moreover, like other biomedical cancer studies, the main previous study proposed only an automatic diagnosis method and did not provide visuals that expert medical doctors can easily understand. In this respect, the present study is superior to the previous one. Other previous studies using the same data set tried different machine learning algorithms, but their success was almost the same as or lower than that of the present study. These studies also proposed only automated diagnostic algorithms and presented no findings for the medical literature. In addition, the pre-processing used in some of these studies may make the proposed algorithms specific to the data set used, which is another disadvantage compared to the present study [23,24,26,27].

116 subjects were used in both studies. If more subjects are used, the visuals proposed in this study may change somewhat, for example in the value ranges and the order of importance of the attributes. However, this study showed that the decision tree method predicts successfully and can present its results visually. It can therefore be expected that successful results will be obtained again even if the number of subjects changes, and that the proposed method can produce better validated results with a larger data set. In this respect, the current study can be a pioneer for future studies.

From a medical point of view, the determinants were found to be glucose, age, BMI and resistin. Medical research contains many conclusions that these attributes are closely related to cancer cells, so the results in the medical literature coincide with the findings of the current study. The relationship between breast cancer and obesity is another finding of this study that coincides with the medical literature, because one branch of the decision tree pointing to breast cancer (the fourth branch from left to right) is shaped by the relationship between obesity, high glucose and cancer. It is also noteworthy that in another branch of the decision tree (the third branch from left to right), low glucose and high resistin in a non-obese person indicate breast cancer. This finding also coincides with the medical literature, which contains many findings that cancer feeds on glucose and that the glucose level of cancer patients increases [34]. However, considering that resistin is a hormone that decreases the glucose level, it can be concluded that the body tries to stop the cancer-driven rise in glucose by increasing its resistin [35]. In other words, this rule is not contrary to the medical literature and will be a good source for future studies. Another branch that points to cancer (the right-most branch) likewise conforms to the medical literature because, as noted above, the positive correlation between cancer and glucose is present in the medical literature. In summary, the medical findings of the study are supported by the medical literature and are a good source for future medical studies.

5. Conclusion

In this study, a diagnostic method based on anthropometric and blood analysis data was proposed to assist expert medical doctors in diagnosing breast cancer. The proposed method is based on the decision tree algorithm, and its results are presented as visuals that expert medical doctors can easily understand.

The accuracy rate of the method is 90.52%. In general, the proposed method indicates that breast cancer may occur in humans in the following cases:

• Middle-aged people (over 45 years old) who are not obese and have low blood sugar and high resistin levels
• Elderly people (over 65 years old) with high blood sugar
• Obese and mature people (under 65 years old) with high blood sugar

These findings were obtained for 116 subjects and may vary slightly if more subjects are used. However, the method is suitable for producing results that can be approved in the medical literature when more data are used, and it is open to further development. The method can also be used to assess the relationship between cancer and other cancer-indicating data that may be proposed in the medical literature. In addition, a blood analysis-based pre-diagnosis device could be produced once the attribute limits have been approved by medical authorities after applying the method to more subjects. In this way, people can obtain preliminary information about whether

they have cancer or not, and this can be determined by a simple device used in daily life.

Human and animal rights

The authors declare that the work described has been carried out in accordance with the Declaration of Helsinki of the World Medical Association revised in 2013 for experiments involving humans as well as in accordance with the EU Directive 2010/63/EU for animal experiments.

Informed consent and patient details

The authors declare that this report does not contain any personal information that could lead to the identification of the patient(s).

Funding

This work did not receive any grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author contributions

All authors attest that they meet the current International Committee of Medical Journal Editors (ICMJE) criteria for Authorship.

Declaration of Competing Interest

The authors declare that they have no known competing financial or personal relationships that could be viewed as influencing the work reported in this paper.

References

[1] Olivotto I, Gelmon K, Kuusk U, Draper J, McCready D. Intelligent patient guide to breast cancer. Intelligent Patient Guide Limited; 2016.
[2] Clark A, Fallowfield LP. Breast cancer. CRC Press; 2014.
[3] Stewart BWKP, Wild CP. World cancer report 2014. Health; 2017.
[4] Dossus L, Benusiglio PR. Lobular breast cancer: incidence and genetic and non-genetic risk factors. Breast Cancer Res 2015;17(1):37.
[5] Senkus E, Kyriakides S, Ohno S, Penault-Llorca F, Poortmans P, Rutgers E, et al. Primary breast cancer: ESMO clinical practice guidelines for diagnosis, treatment and follow-up. Ann Oncol 2015;26(5):8–30.
[6] Dashevsky BZ, Goldman DA, Parsons M, Gönen M, Corben AD, Jochelson MS, et al. Appearance of untreated bone metastases from breast cancer on FDG PET/CT: importance of histologic subtype. Eur J Nucl Med Mol Imaging 2015;42(11):1666–73.
[7] Sumbaly R, Vishnusri N, Jeyalatha S. Diagnosis of breast cancer using decision tree data mining technique. Int J Comput Appl 2014;98(10):16–24.
[8] Lavanya D, Rani KU. Ensemble decision tree classifier for breast cancer data. Int J Inf Technol Convergence Serv 2012;2(1):17–24.
[9] Danacı M, Çelik M, Akkaya AE. Veri Madenciliği Yöntemleri Kullanılarak Meme Kanseri Hücrelerinin Tahmin ve Teşhisi. In: Akıllı sistemlerde Yenilikler ve Uygulamaları Sempozyumu (ASYU’2010); 2010. p. 21–4.
[10] Poyraz O. Tıpda veri madenciliği uygulamaları: Meme kanseri veri seti analizi. Master's thesis. Trakya Üniversitesi; 2012.
[11] Aydın EA, Keleş MK. Breast cancer detection using K-nearest neighbors data mining method obtained from the bow-tie antenna dataset. Int J RF Microw Comput-Aided Eng 2017;27(6).
[12] Şentürk A, Şentürk ZK. Yapay Sinir Ağları İle Göğüs Kanseri Tahmini. El-Cezeri Fen ve Mühendislik Dergisi 2016;3(2):345–50.
[13] Shajahaan SS, Shanthi S, ManoChitra V. Application of data mining techniques to model breast cancer data. Int J Emerg Technol Adv Eng 2013;3(11):362–9.
[14] Lavanya D, Rani DKU. Analysis of feature selection with classification: breast cancer datasets. Indian J Comput Sci Eng (IJCSE) 2011;2(5):756–63.
[15] Sarvestani AS, Safavi AA, Parandeh NM, Salehi M. Predicting breast cancer survivability using data mining techniques. In: 2nd international conference on software technology and engineering, vol. 2. October 2010. p. 227–31.
[16] Nicolini A, Ferrari P, Rossi G. Mucins and cytokeratins as serum tumor markers in breast cancer. In: Advances in cancer biomarkers. Dordrecht: Springer; 2015. p. 197–225.
[17] Shao Y, Sun X, He Y, Liu C, Liu H. Elevated levels of serum tumor markers CEA and CA15-3 are prognostic parameters for different molecular subtypes of breast cancer. PLoS ONE 2015;10(7):e0133830.
[18] Scully T, Scott CD, de Silva HC, Firth SM, Twigg SM, Pintar JE, et al. Insulin-like growth factor binding protein-3 (IGFBP-3) enhances obesity-related breast tumorigenesis. Cancer Res 2016;76(14):741.
[19] Şavaş HB, Gültekin F. İnsülin Direnci ve Klinik Önemi. SDÜ Tıp Fakültesi Dergisi 2017;24(3):116–25.
[20] Akin S, Akin S, Gedik E, Haznedaroglu E, Dogan AL, Altundag MK. Serum chemerin level in breast cancer. Int J Hematol Oncol (UHOD: Uluslararasi Hematoloji Onkoloji Dergisi) 2017;27(2):127–32.
[21] Patrício M, Pereira J, Crisóstomo J, Matafome P, Gomes M, Seiça R, et al. Using resistin, glucose, age and BMI to predict the presence of breast cancer. BMC Cancer 2018;18(1):29.
[22] Jain R, Jamal S, Goyal S, Wahi D, Singh A, Grover A. Resisting the resistance in cancer: cheminformatics studies on short-path base excision repair pathway antagonists using supervised learning approaches. Comb Chem High Throughput Screen 2015;18(9):881–91.
[23] Aslan MF, Celik Y, Sabanci K, Durdu A. Breast cancer diagnosis by different machine learning methods using blood analysis data. Int J Intell Syst Appl Eng 2018;6(4):289–93.
[24] Polat K, Sentürk U. A novel ML approach to prediction of breast cancer: combining of mad normalization, KMC based feature weighting and AdaBoostM1 classifier. In: 2nd international symposium on multidisciplinary studies and innovative technologies (ISMSIT-2018); 2018. p. 1–4.
[25] Li Y, Chen Z. Performance evaluation of machine learning methods for breast cancer prediction. Appl Comput Math 2018;7(4):212–6.
[26] Silva Araújo VJ, Guimarães AJ, de Campos Souza PV, Silva Rezende T, Souza Araújo V. Using resistin, glucose, age and BMI and pruning fuzzy neural network for the construction of expert systems in the prediction of breast cancer. Mach Learn Knowl Extr 2019;1(1):466–82.
[27] Fijri AL, Rustam Z. Comparison between fuzzy kernel C-means and sparse learning fuzzy C-means for breast cancer clustering. In: IEEE international conference on applied information technology and innovation (ICAITI). Indonesia: Padang; September 2018. p. 158–61.
[28] Website: UCI machine learning repository, breast cancer Coimbra data set. https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Coimbra.
[29] Silahtaroğlu G. Veri madenciliği. İstanbul: Papatya Yayınları; 2018.
[30] Adak MF, Yurtay N. Gini Algoritmasını Kullanarak Karar Ağacı Oluşturmayı Sağlayan Bir Yazılımın Geliştirilmesi. Int J Inf Technol 2013;6(3):1–6.
[31] Aktas MS, Kalıpsız O. Veri Madenciliğinde Özellik Seçim Tekniklerinin Bankacılık Verisine Uygulanması Üzerine Araştırma ve Karşılaştırmalı Uygulama. In: 9th Turkish national software engineering symposium (UYMS 2015). Izmir, Turkey: Yasar University; September 2015. p. 72–83.
[32] Örekici TG, Erdoğan S, Ankaralı H. Sınıflama Modelinin Performansını Geliştirmede Yeniden Örnekleme Yöntemlerinin Kullanımı. Bilişim Teknolojileri Dergisi 2012;5(3):1–8.
[33] Akçetin E, Çelik U. Karınca Kolonisi Optimizasyonu Sınıflandırma Algoritması Yöntemi İle Telefon Bankacılığında Doğrudan Pazarlama Kampanyası Üzerine Bir Sınıflandırma Analizi. J Internet Appl Manag (İnternet Uygulamaları ve Yönetimi Dergisi) 2015;6(1).
[34] Acar B. Erken evre meme kanserinde glikoz taşıyıcısı 1 ve mast hücre triptazı düzeylerinin prognostik ve prediktif faktörlerle ilişkisi. Medical specialty thesis. Adnan Menderes Üniversitesi, Tıp Fakültesi; 2009.
[35] Demiray AG. İleri evre küçük hücreli dışı akciğer kanserli hastalarda serum leptin, adiponektin, resistin ve ghrelin düzeylerinin yaşam kalitesi ile ilişkisi. Medical specialty thesis. Pamukkale Üniversitesi, Tıp Fakültesi; 2011.
