Article info

Article history:
Received 6 March 2015
Received in revised form 11 June 2015
Accepted 12 June 2015
Available online 10 July 2015

Keywords:
Data mining
M5 rule
Tree model M5P
Concrete
Compressive strength
UPV

Abstract

Compressive strength and UPV parameters are the methods that are used to determine high-volume mineral admixture concrete quality. But experiments for all levels of these parameters are expensive, difficult and time consuming. For determination of output values, classifiers with model extraction features can be used. In this study, classifiers, with the rule-based M5 rule and tree model M5P in the area of data mining are used to predict the compressive strength and UPV of concrete mixtures after 3, 7, 28 and 120 days of curing. The M5 rule and tree model M5P are tested using the available test data of 40 different concrete mix-designs gathered from literature [1]. The input of the model is a variable data set corresponding to concrete mixture proportions. The findings of this study indicated that the M5 rule and tree model M5P models are sufficient tools for estimating the compressive strength and UPV of concrete. 97% and 87% success is obtained in predicting compressive strength and UPV results, respectively.
© 2015 Elsevier Ltd. All rights reserved.

http://dx.doi.org/10.1016/j.conbuildmat.2015.06.029
0950-0618/© 2015 Elsevier Ltd. All rights reserved.
236 Y. Ayaz et al. / Construction and Building Materials 94 (2015) 235–240
The decision tree method is one of the data mining methods, and tree structures are one of the supervised learning methods. Trees that are used for numeric prediction are just like ordinary decision trees, except that at each leaf they store a class value that represents the average value of the instances that reach the leaf, in which case the tree is called a regression tree [12]. Unlike decision trees, which choose the attribute and its splitting value for each node to maximize the information gain, model trees minimize the intra-subset variation in the class values down each branch. In other words, for each node a model tree chooses an attribute and its splitting value to maximize the expected error reduction (standard deviation reduction) [13].

Data mining techniques have been widely used in recent years in many engineering applications, including the behavior of concrete materials and structures [14–18]. The usability of prominent modeling techniques such as the M5 rule and the tree model M5P is not known. The aim of this paper is to construct M5 rule and tree model M5P models to evaluate the effect of fly ash (FA) and blast furnace slag (BFS) on the compressive strength and UPV of concrete. For this purpose, the WEKA workbench was used to evaluate the data. WEKA contains many data mining techniques and machine learning algorithms. It was developed for researchers who want to try out existing methods on new datasets easily. It gives comprehensive support for experimental data mining, including preparing input data, evaluating learning schemes statistically, and showing the outcome of learning [12]. Using WEKA, models were constructed, trained and tested using the available test data of 40 different concrete mix-designs gathered from the literature [1] (see Table 1). In this paper, the M5 rule and tree model M5P were utilized to predict the compressive strength and UPV of concretes containing FA and BFS without performing any experiments. In training the models, cement content, BFS content, FA content and curing period were entered as inputs, while compressive strength and UPV values were used as outputs. 97% success is achieved in the prediction of compressive strength.

2. Materials and methods

2.1. Data mining algorithm

There are many learning techniques in the literature for predicting numeric values, such as neural networks, instance-based learning, regression trees and standard regression, but which technique has the best predictive performance is not determined. Their performance changes according to the application area. In this study, different learning techniques are used to predict the UPV and compressive strength values of the concrete. The best performance is obtained by using the rule-based M5 rule and the tree model M5P.

2.1.1. M5P Tree

Quinlan [19] first presented the concept of the "model tree", which he named M5, as a new method for dealing with learning problems in 1992. Model trees are designed as a combination of a decision tree and linear regression functions at the leaves. Young and Witten [20] developed M5P, which performs better on datasets, and they made some changes to the original M5 algorithm to reduce the tree size.

An M5 tree is built as follows. First, a decision-tree algorithm is used to build a model tree. Second, a tree pruning approach, which was presented by Breiman [21] and Quinlan [22], is applied with some differences. They suggested a new decision tree algorithm, CART (Classification and Regression Tree). The CART algorithm is used for both classification and regression problems. It can use both numerical and nominal data types. CART uses the Gini index as its branching criterion [23].

To build an M5 tree, a splitting criterion must be defined. The splitting criterion is based on the standard deviation of the attribute values. The attribute that most reduces the expected error is chosen as the root of the tree. The standard deviation reduction (SDR) is calculated as [20]

$$\mathrm{SDR} = sd(T) - \sum_i \frac{|T_i|}{|T|}\, sd(T_i)$$

where

T: set of attribute values
T_i: subset of attribute values taken from the divided node according to the selected attribute
T̄: average value of the set T
sd(T): standard deviation of T.

The tree growing process, except for CART's attribute selection, is taken from CART [21]. After the tree is built, it must be pruned to increase classification performance. The pruning procedure deletes the branches that produce errors on the learning data. Once the tree building procedure starts, the CART tree grows by splitting continuously without any stopping rule. When building ends, pruning proceeds from the leaves to the root. After every pruning step, the most successful tree is determined [24].

In the M5 algorithm, the pruning process first averages the absolute difference between the actual class value and the predicted value over the training examples that reach the node. This average is then multiplied by the coefficient [20]

$$\frac{n + v}{n - v} \qquad (4)$$

where

n: number of training examples at the node
v: number of parameters in the model that represents the class value at the node.

2.1.2. M5 rule

The M5 rule is a rule-based learning technique that can predict nominal and numeric values. M5 rule sets are generated from model trees, which are themselves able to predict numeric and nominal values. The rule algorithm works by repeatedly building a model tree and selecting the best rule at every cycle.

The M5 rule generates the rules from an M5 tree based on the partial and regression tree (PART) algorithm presented by Frank and Witten [25] in 1998. To build an M5 rule set, the following steps are applied:

(1) An M5 tree learner is applied to the whole training data.
(2) As in M5P, the tree is pruned.
(3) The best leaf is turned into a rule.
(4) The previous procedure continues until all instances are covered by the rules. An instance can be covered by different rules at the same time.
(5) In contrast to PART, which employs the same strategy for categorical prediction, M5 Rules builds full trees instead of partially explored trees [26].

2.2. Data collection

The main objective of this study is to develop models to predict the compressive strength and UPV values of concrete. To this end, data must first be prepared and databases constructed for training and testing the models.

Admixtures such as fly ash (FA) and blast furnace slag (BFS) are used as replacements for cement to improve the mechanical properties and to decrease the rate of hydration, the alkali aggregate reactivity and the permeability of concrete. FA and BFS are the most common concrete ingredients due to their pozzolanic properties [1]. FA or BFS was used to replace 0%, 50%, 60% and 70% of the cement. In addition, 25% FA + 25% BFS, 30% FA + 30% BFS and 35% FA + 35% BFS also replaced cement. The samples were tested at 3, 7, 28 and 120 days for UPV and compressive strength. Mix proportions are given in Table 1.
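As an illustration of how the 40 instances arise from the mix design described above, the following minimal sketch enumerates ten binder compositions at four test ages. It assumes a total binder content of 350 kg, taken from the control mix in Table 1; the names `BINDER_KG`, `REPLACEMENTS_PCT` and `build_instances` are hypothetical and do not come from the study.

```python
from itertools import product

BINDER_KG = 350            # total binder, assumed from the control mix in Table 1
AGES_DAYS = (3, 7, 28, 120)

# (fly ash %, slag %) of the binder for the ten mixes, in Table 1 order
REPLACEMENTS_PCT = [
    (0, 0),
    (50, 0), (60, 0), (70, 0),
    (0, 50), (0, 60), (0, 70),
    (25, 25), (30, 30), (35, 35),
]


def build_instances():
    """Return one input row per (test age, mix) combination."""
    rows = []
    for day, (fa_pct, bfs_pct) in product(AGES_DAYS, REPLACEMENTS_PCT):
        fa_kg = BINDER_KG * fa_pct / 100
        bfs_kg = BINDER_KG * bfs_pct / 100
        rows.append({
            "day": day,
            "cement_kg": BINDER_KG - fa_kg - bfs_kg,
            "fa_kg": fa_kg,
            "bfs_kg": bfs_kg,
        })
    return rows


instances = build_instances()
print(len(instances), instances[7])
```

Each row corresponds to the input columns (b) to (e) of Table 1; the measured outputs would still have to be attached from the experimental data.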
The compressive strength and UPV values of 40 concrete mixes are given in Table 1. The predicted values were obtained with the WEKA workbench, which contains data mining methods. In these tests, all data were entered numerically and all outputs were also obtained numerically, so the data were analyzed with the methods in WEKA that give regression results. Of these, the M5 rule and tree model M5P gave the best results for both the compressive strength and UPV outputs and were therefore used.
The data mining method is designed to predict only one output value, although there are two outputs shown in Table 1. The compressive strength and UPV values were therefore divided into two data sets. Both data sets use the same input attributes and input values. These values are given in Table 1.
In the analysis, the correlation coefficient was observed to increase when the cross-validation value was set to 15 folds. The aim of data mining is to predict outputs from the independent input values and to compare them with the experimental outputs after the learning process is completed [12].
Fig. 1. Comparison of experimental data with training results obtained from models.

The M5 rule and tree model M5P are used to generate the models on the input data and predict the compressive strength and UPV of the concrete used in the study. The input data is divided into several parts, with each part in turn used to test a model fitted to the remaining parts. For this study, fifteen-fold cross-validation was used. The correlation coefficient and mean absolute error were used to judge the performance of the models in predicting the compressive strength and UPV.
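The partitioning described above can be sketched as follows. This is a minimal illustration that only shows how 40 instances can be divided into 15 folds; it assumes contiguous, unshuffled folds, which is not necessarily how WEKA forms them, and the function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_instances, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    base, extra = divmod(n_instances, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional instance.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_instances) if i < start or i >= stop]
        yield train, test
        start = stop


splits = list(k_fold_splits(40, 15))
print([len(test) for _, test in splits])
```

With 40 instances and 15 folds, each test part holds only two or three mixes, so every model is fitted to nearly the whole data set.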
The success and performance of the methods are measured with the correlation coefficient. The correlation coefficient (r) is a measure of the strength of the straight-line or linear relationship between two variables. It takes values between +1 and −1 [27].
Another performance criterion in the analyses is the mean absolute error. A model's predictive ability is good when the mean absolute error approaches zero [12].
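Both criteria are simple to compute. The sketch below is a plain-Python illustration rather than the WEKA implementation; the function names are hypothetical, and the five value pairs are the first five rows of Table 1, columns (f) and (h), used only as sample input.

```python
import math


def correlation_coefficient(measured, predicted):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


def mean_absolute_error(measured, predicted):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


measured = [27.90, 8.60, 4.50, 2.40, 9.00]     # Table 1(f), samples 1-5
predicted = [30.62, 9.77, 5.60, 1.44, 15.85]   # Table 1(h), samples 1-5
print(correlation_coefficient(measured, predicted))
print(mean_absolute_error(measured, predicted))
```

The figures reported in the study come from cross-validation over all 40 instances, so the values printed for this five-row sample are not expected to match them.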
40 concrete mixes collected from the literature [1] were evaluated. The input variables were the contents of cement, BFS and FA, and the curing period. The two output variables were the compressive strength and UPV values. The input and output variables chosen for this study are listed in Table 1.
3.1. M5 rules model results for compressive strength values

The M5 rules model is one of the rule-based classification methods. For this model, the 40 instances shown in Table 1 were tested. Different cross-validation values were tested to find the best one. For this application, 15-fold cross-validation, which gave the best results, was used. A correlation coefficient of 0.97 and a mean absolute error of 2.71 were obtained.

A correlation coefficient of 0.97 means that a prediction is 97% successful. The results of a rule-based classifier may be expressed with linear equations. In this study two equations were obtained depending on the number of days:

$$\text{Comp. str.} = \begin{cases} 1.188\,\text{day} + 0.0844\,C_{kg} - 0.0347\,FA_{kg} - 2.4882, & \text{day} \le 17.5 \\ 0.1325\,\text{day} + 0.0572\,C_{kg} - 0.0687\,FA_{kg} + 20.7084, & \text{day} > 17.5 \end{cases} \qquad (5)$$

The day number which separates the rules is 17.5. Slag is not used as a parameter in the linear model rules and has no effect on the results of the produced linear model. The results obtained using expression (5) are given in Table 1(h). The experimental results of compressive strength, Table 1(f), and the results of expression (5), Table 1(h), are compared in Fig. 1. As can be seen in Figs. 1 and 2, there is close agreement between the experimental results and the calculated results.

Fig. 2. Scatter plot of the measured versus predicted compressive strength of trained data for M5 rule model.

A correlation coefficient of 0.97 means that a prediction is 97% successful. The tree structure obtained by the M5P decision tree is shown in Fig. 3.

The tree model rules depend on the number of days, and two models are created for day values smaller and greater than 17.5 days, as in the M5 rule. M5P classifier results may be expressed with one or more linear equations. As a result of the analysis, two rules are produced depending on the number of days:

$$\text{Comp. str.} = \begin{cases} 1.188\,\text{day} + 0.0844\,C_{kg} - 0.0347\,FA_{kg} - 2.4882, & \text{day} \le 17.5 \\ 0.1708\,\text{day} + 0.0646\,C_{kg} - 0.0594\,FA_{kg} + 14.6682, & \text{day} > 17.5 \end{cases} \qquad (6)$$

For days ≤ 17.5, Eq. (6) is the same as the M5 rule equation. The equations for days > 17.5 are different. As with the M5 rule, slag is not used in Eq. (6) and does not affect the results. The results obtained from expression (6) using experimental results as data are given in Table 1(i). The experimental results of compressive strength, Table 1(f) and the
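Expressions (5) and (6) can be evaluated directly as small piecewise functions. The sketch below is an illustration, not code from the study: the function names are hypothetical, and it assumes that the fly ash term in both branches and the intercept of the early-age branch are negative, a choice of signs that reproduces the values listed in Table 1(h) and (i).

```python
SPLIT_DAY = 17.5  # curing age that separates the two linear models


def m5_rule_strength(day, cement_kg, fa_kg):
    """Expression (5): M5 rule estimate of compressive strength (MPa)."""
    if day <= SPLIT_DAY:
        return 1.188 * day + 0.0844 * cement_kg - 0.0347 * fa_kg - 2.4882
    return 0.1325 * day + 0.0572 * cement_kg - 0.0687 * fa_kg + 20.7084


def m5p_strength(day, cement_kg, fa_kg):
    """Expression (6): M5P estimate; the early-age branch equals that of (5)."""
    if day <= SPLIT_DAY:
        return 1.188 * day + 0.0844 * cement_kg - 0.0347 * fa_kg - 2.4882
    return 0.1708 * day + 0.0646 * cement_kg - 0.0594 * fa_kg + 14.6682


# Sample 1 (3 days, 350 kg cement) and sample 21 (28 days, 350 kg cement)
print(round(m5_rule_strength(3, 350, 0), 2))   # 30.62, as in Table 1(h)
print(round(m5_rule_strength(28, 350, 0), 2))  # 44.44, as in Table 1(h)
print(round(m5p_strength(28, 350, 0), 2))      # 42.06, as in Table 1(i)
```

Slag content does not appear among the arguments, which mirrors the observation that slag has no effect on either linear model.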
Fig. 4. Comparison of experimental data with training results obtained from M5P tree model.

Fig. 7. Scatter plot of the measured versus predicted UPV of trained data for M5 rule model.

In this section the tree model M5P is applied to the UPV output. For this model, the 40 instances shown in Table 1 were tested. Different cross-validation values were tested to find the best one. For this application, 20-fold cross-validation, which gave the best results, was used. A correlation coefficient of 0.87 and a mean absolute error of 147.69 were obtained.
The tree obtained from the tree model M5P is given in Fig. 8. As in the tree structure obtained by the tree model M5P for compressive strength, the UPV results are separated at 17.5 days. This means that 17.5 days is a critical value for both compressive strength and UPV.
According to these results, the correlation is 0.87 and the mean absolute error is very high, at 147.69. Although the M5P tree and the M5 rule are different kinds of classifiers, their results are similar for UPV values. As a result of the analysis for UPV, two rules are produced depending on the number of days:
Table 1
Properties of the test set mixes.
Columns a–e: input data; f–g: experimental outputs; h–i: predicted outputs for compressive strength; j–k: predicted outputs for UPV.
a: Sample ID; b: Day; c: Cement (kg); d: Fly ash (kg); e: Slag (kg); f: Comp. strength (MPa); g: UPV (m/s); h: M5 rule (MPa); i: Tree model M5P (MPa); j: M5 rule (m/s); k: Tree model M5P (m/s).
a b c d e f g h i j k
1 3 350 – – 27.90 4060 30.62 30.62 4047 4047
2 3 175 175 – 8.60 3460 9.77 9.77 3443 3443
3 3 140 210 – 4.50 2980 5.60 5.60 3322 3322
4 3 105 245 – 2.40 2720 1.44 1.44 3201 3201
5 3 175 – 175 9.00 3460 15.85 15.85 3723 3723
6 3 140 – 210 7.50 3510 12.89 12.89 3658 3658
7 3 105 – 245 4.90 3330 9.94 9.94 3594 3594
8 3 175 87.5 87.5 10.50 3750 12.81 12.81 3583 3583
9 3 140 105 105 7.30 3570 9.25 9.25 3490 3490
10 3 105 122.5 122.5 5.40 3450 5.69 5.69 3397 3397
11 7 350 – – 38.10 4270 35.37 35.37 4260 4260
12 7 175 175 – 13.70 3780 14.53 14.53 3656 3656
13 7 140 210 – 9.10 3580 10.36 10.36 3535 3535
14 7 105 245 – 4.50 3270 6.19 6.19 3414 3414
15 7 175 – 175 20.00 3490 20.60 20.60 3936 3936
16 7 140 – 210 17.80 3950 17.64 17.64 3872 3872
17 7 105 – 245 15.10 3980 14.69 14.69 3807 3807
18 7 175 87.5 87.5 18.40 3990 17.56 17.56 3796 3796
19 7 140 105 105 16.20 3800 14.00 14.00 3703 3703
20 7 105 122.5 122.5 11.60 3760 10.44 10.44 3611 3611
21 28 350 – – 43.60 4310 44.44 42.06 4340 4296
22 28 175 175 – 20.90 3990 22.41 20.36 4001 3884
23 28 140 210 – 13.80 3910 18.00 16.02 3933 3801
24 28 105 245 – 9.70 3640 13.59 11.68 3865 3719
25 28 175 – 175 35.00 4200 34.43 30.76 4215 4116
26 28 140 – 210 32.30 4230 32.43 28.49 4190 4080
27 28 105 – 245 26.40 4110 30.42 26.23 4164 4044
28 28 175 87.5 87.5 33.40 4260 28.42 25.56 4108 4000
29 28 140 105 105 30.90 4160 25.21 22.26 4061 3941
30 28 105 122.5 122.5 25.30 4080 22.01 18.96 4015 3882
31 120 350 – – 54.60 4470 56.63 57.77 4480 4574
32 120 175 175 – 35.00 4140 34.60 36.07 4141 4162
33 120 140 210 – 31.30 4140 30.19 31.73 4073 4079
34 120 105 245 – 27.40 4070 25.78 27.39 4005 3997
35 120 175 – 175 50.20 4280 46.62 46.47 4355 4394
36 120 140 – 210 45.40 4340 44.62 44.21 4330 4358
37 120 105 – 245 37.40 4250 42.61 41.95 4304 4322
38 120 175 87.5 87.5 41.90 4230 40.61 41.27 4248 4278
39 120 140 105 105 35.30 4220 37.40 37.97 4201 4219
40 120 105 122.5 122.5 34.70 4150 34.20 34.67 4155 4159
$$\text{UPV} = \begin{cases} 53.2938\,\text{day} + 1.8496\,C_{kg} - 1.6018\,FA_{kg} + 3239.5785, & \text{day} \le 17.5 \\ 3.0205\,\text{day} + 1.026\,C_{kg} - 1.3283\,FA_{kg} + 3851.9541, & \text{day} > 17.5 \end{cases} \qquad (8)$$
In expression (8), 17.5 days is the critical value separating the rules from each other. Slag is not used in the expressions and has no effect on UPV. When slag is excluded from the analysis, the results do not change.
The results obtained from expression (8) using experimental UPV results as data are given in Table 1(k). The experimental results of UPV, Table 1(g), and the results of expression (8), Table 1(k), are compared by experiment number in Fig. 9. The experimental UPV results and the training outputs are compatible (Figs. 9 and 10).
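The 17.5-day threshold equals the midpoint between the 7-day and 28-day test ages. As a hedged illustration of the standard deviation reduction criterion from Section 2.1.1, the sketch below scores the three possible splits on curing age using the measured UPV values of Table 1(g). It assumes the population standard deviation and midpoints between adjacent ages as candidate thresholds; it is not the WEKA implementation, and all names are hypothetical.

```python
import math

# Measured UPV (m/s) from Table 1(g), grouped by curing age in days.
UPV_BY_DAY = {
    3: [4060, 3460, 2980, 2720, 3460, 3510, 3330, 3750, 3570, 3450],
    7: [4270, 3780, 3580, 3270, 3490, 3950, 3980, 3990, 3800, 3760],
    28: [4310, 3990, 3910, 3640, 4200, 4230, 4110, 4260, 4160, 4080],
    120: [4470, 4140, 4140, 4070, 4280, 4340, 4250, 4230, 4220, 4150],
}


def sd(values):
    """Population standard deviation (an assumption of this sketch)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def sdr(parent, subsets):
    """SDR = sd(T) - sum(|Ti| / |T| * sd(Ti)) over the subsets Ti."""
    return sd(parent) - sum(len(s) / len(parent) * sd(s) for s in subsets)


def candidate_thresholds(days):
    """Midpoints between adjacent distinct curing ages."""
    ordered = sorted(days)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


def sdr_for_day_split(threshold):
    """SDR obtained by splitting all UPV values at the given curing age."""
    low = [v for d, vals in UPV_BY_DAY.items() if d <= threshold for v in vals]
    high = [v for d, vals in UPV_BY_DAY.items() if d > threshold for v in vals]
    return sdr(low + high, [low, high])


scores = {t: sdr_for_day_split(t) for t in candidate_thresholds(UPV_BY_DAY)}
best = max(scores, key=scores.get)
print(scores, best)
```

Under these assumptions the 17.5-day candidate gives the largest reduction of the three age splits, which is consistent with the threshold in expression (8). The sketch considers only the curing age attribute and says nothing about splits on cement, fly ash or slag content.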
Fig. 9. Comparison of experimental UPV data with training results obtained from tree model M5P.

The comparison between measured and predicted values is shown in Figs. 1–10. The results show that both the M5 rule model and the M5P tree model are capable of generalizing between the input and the output variables with reasonably good predictions. The values of R2 are 0.97 and 0.966 for the comparison of measured and predicted compressive strength of concrete for the M5 rule model and the M5P tree model, respectively. High R2 values reflect the strength of the correlation between measured and predicted variables. Moreover, as depicted in Figs. 1–10, the proposed model formulations for compressive strength prediction follow a trend very close to the experimental values. The values of R2 are 0.87 and 0.87 for the comparison of measured and predicted UPV of concrete for the M5 rule model and the M5P tree model, respectively. The high values of R2 demonstrate that the proposed models are suitable for predicting compressive strength and UPV values in close agreement with the experimental values.