6 (2023) 607-613
https://doi.org/10.12989/cac.2023.32.6.607 607
Abstract. The objective of this work is to predict the compressive strength of geopolymer concrete using four distinct
machine learning approaches: gradient boosting machine (GBM), generalized linear model (GLM), extremely randomized
trees (XRT), and deep learning (DL). The data used for training the models are collected experimentally. Compressive
strength is the response variable, whereas curing days, curing temperature, silica fume, and nanosilica concentration
are the input parameters. Several error measures, including root mean square error (RMSE), coefficient of correlation
(CC), variance accounted for (VAF), RMSE to observation’s standard deviation ratio (RSR), and Nash-Sutcliffe efficiency
(NSE), were computed to assess the effectiveness of each algorithm. Among all the models investigated, GBM was found to
be the surrogate model that predicts the compressive strength of the geopolymer concrete with the highest accuracy.
Keywords: compressive strength; GBM; geopolymer concrete; GLM; machine learning; XRT
Comparative studies of different machine learning algorithms in predicting the compressive strength of geopolymer…

(2022) predicted the compressive strength of concrete. Several works are available in the literature regarding the prediction of the compressive strength of different types of concrete.

The compressive strength of geopolymer paste, mortar, and concrete was predicted by Nazari and Sanjayan (2015) using a variety of optimization approaches in conjunction with support vector machine and artificial neural network algorithms. The gene expression programming technique was used by Shahmansouri et al. (2020) for predicting the compressive strength of GGBS-based geopolymer concrete. For predicting the compressive strength of fly ash-based geopolymer concrete, Nguyen et al. (2020) presented a deep residual network with no dropout and no normalization. According to the study of Ahmad et al. (2021), the boosting strategy outperformed ANN and AdaBoost for predicting the compressive strength of high calcium fly ash geopolymer concrete. Cao et al. (2022) demonstrated that XGBoost outperformed support vector machine and multilayer perceptron techniques in predicting the compressive strength of GPC.

The use of machine learning techniques for predicting the strength of geopolymer concrete is a relatively new area to explore. In the literature, most researchers have used a single machine learning technique to predict the compressive strength of geopolymer concrete. The present study aims to predict the compressive strength of geopolymer concrete using four different machine learning techniques, namely, gradient boosting machine (GBM), generalized linear model (GLM), extremely randomized trees (XRT), and deep learning (DL). The data used for training the models are obtained experimentally. Four input parameters (curing days, curing temperature, silica fume, and nanosilica content) are considered, while compressive strength is the response variable. The efficiency of each technique is assessed by computing several error measures: root mean square error (RMSE), coefficient of correlation (CC), variance accounted for (VAF), RMSE to observation’s standard deviation ratio (RSR), and Nash-Sutcliffe efficiency (NSE). Amongst the models studied, GBM was found to be the most accurate surrogate model for predicting the compressive strength of the geopolymer concrete.

2. Mathematical modeling

In this section, the mathematical modeling for the GBM, GLM, XRT, and DL techniques is presented.

2.1 Gradient boosting machine (GBM)

Boosting is a type of ensemble learning. Its fundamental principle revolves around two steps: (1) training the base learners; and (2) combining all the learned models into a single prediction. Gradient boosting is a generic model that can be used with any differentiable loss function; however, just because it can be demonstrated to work with a squared loss does not imply that it can be fully understood throughout the learning process. Additive modelling is the foundation of boosting algorithms. The basic idea is to combine multiple smaller functions to create a more complex one. Likewise, in gradient boosting, a complex model is created by combining multiple simpler models. Gradient boosting employs a weighted sum of a sufficient number of base learners to train a model. The model is initialized as

$$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma) \tag{1}$$

where $\{(x_i, y_i)\}_{i=1}^{n}$ represents the input dataset, $L(y, F(x))$ is the differentiable loss function, and $M$ is the number of iterations.

For $m = 1$ to $M$, the pseudo-residual is computed as

$$r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x)=F_{m-1}(x)} \tag{2}$$

for $i = 1, 2, \ldots, n$.

After that, a base learner $h_m(x)$ is fitted to the pseudo-residuals, i.e., it is trained using the training set. The multiplier $\gamma_m$ is computed by solving the optimization problem of Eq. (3):

$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\big(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\big) \tag{3}$$

The model is then updated as

$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x) \tag{4}$$

2.2 Generalized linear model (GLM)

In a linear regression model, the response variable “y” is described as a function or linear combination of all the predictors “X”. This assumes an underlying linear relationship between the predictors and the response variable. Additionally, the response variable’s error distribution should follow a normal distribution. Consequently, a linear model will be created. There are three main components in the case of GLM: the linear predictor (linear relation), the link function (which links the linear predictor and the parameter of the probability distribution), and the probability distribution. Mathematically, the same can be expressed as

$$\ln \lambda_i = b_0 + b_1 x_i \tag{5}$$

$$y_i \sim \mathrm{Poisson}(\lambda_i) \tag{6}$$

where $\ln$ is the link function and $b_0 + b_1 x_i$ is the linear predictor.

Next, the Poisson regression is applied. The prediction curve looks like an exponential curve.

$$\lambda_i = \exp(b_0 + b_1 x_i) \tag{7}$$

2.3 Extremely randomized trees (XRT)

The XRT algorithm was presented by Geurts et al. (2006), and the same has been employed in the present study.

XRT builds regression trees. The algorithm selects at random the cut-points at which nodes are split in order to grow new trees. First, a node is to be split based on the attributes of the dataset $S$. If the stopping condition for the split is met, the procedure returns nothing; otherwise, it selects $K$ attributes among the nonconstant attributes. Then the splits are drawn as

$$\{s_1, \ldots, s_K\} \tag{8}$$

where $s_i$ represents a randomized split that has to be considered.

Now, the score is calculated as

$$\mathrm{Score}(s_*, S) = \max_{i=1,\ldots,K} \mathrm{Score}(s_i, S) \tag{9}$$

In a similar manner, if the splits from all the nodes return a constant, XRT gives that value as the final predicted output.

taken as the input, whereas the compressive strength of the geopolymer concrete is the response variable.

2.6 Performance evaluation

For evaluating the performance of the four ML algorithms employed in the present work, the errors are determined as reported in Table 1. Here, $x_i$ and $x_{ip}$ are the actual and predicted values of the compressive strength of the geopolymer concrete, and $r$ denotes the total number of data points used. The bar above $x$ represents the mean value.

3. Results and discussion
Table 2 GPC mix proportions of materials for different grades of concrete (kg/m³)
Mix       FA      GGBS   Alccofine  Nano Silica  Silica Fume  Sand   Coarse Aggregate  Curing temperature (°C)
N0C27     149.17  213.1  63.93      0            0            547.8  1276.8            27
N0C60     149.17  213.1  63.93      0            0            547.8  1276.8            60
N0C75     149.17  213.1  63.93      0            0            547.8  1276.8            75
N0C90     149.17  213.1  63.93      0            0            547.8  1276.8            90
N0C120    149.17  213.1  63.93      0            0            547.8  1276.8            120
N0.5C27   149.17  213.1  63.93      2.13         21.13        547.8  1276.8            27
N0.5C60   149.17  213.1  63.93      2.13         21.13        547.8  1276.8            60
N0.5C75   149.17  213.1  63.93      2.13         21.13        547.8  1276.8            75
N0.5C90   149.17  213.1  63.93      2.13         21.13        547.8  1276.8            90
N0.5C120  149.17  213.1  63.93      2.13         21.13        547.8  1276.8            120
N1C27     149.17  213.1  63.93      4.26         42.6         547.8  1276.8            27
N1C60     149.17  213.1  63.93      4.26         42.6         547.8  1276.8            60
N1C75     149.17  213.1  63.93      4.26         42.6         547.8  1276.8            75
N1C90     149.17  213.1  63.93      4.26         42.6         547.8  1276.8            90
N1C120    149.17  213.1  63.93      4.26         42.6         547.8  1276.8            120
N1.5C27   149.17  213.1  63.93      6.39         63.9         547.8  1276.8            27
N1.5C60   149.17  213.1  63.93      6.39         63.9         547.8  1276.8            60
N1.5C75   149.17  213.1  63.93      6.39         63.9         547.8  1276.8            75
N1.5C90   149.17  213.1  63.93      6.39         63.9         547.8  1276.8            90
N1.5C120  149.17  213.1  63.93      6.39         63.9         547.8  1276.8            120
Fig. 1 Variation of % error (×100) for the unseen (testing) dataset, where % error = (predicted value − actual value) × 100 / actual value, for different counts of training data used for training the machine learning algorithm: (a) GBM, (b) GLM, (c) XRT and (d) DL. The range on the Y-axis indicates the error range in the prediction of the compressive strength of the concrete for models obtained using different % of the training dataset.
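The percentage error used in Fig. 1 can be illustrated with a short sketch. It computes the signed error per sample, the resulting error range, and the share of predictions that fall inside a tolerance band such as the ±5% band used when discussing Fig. 3. The function names and the strength values are hypothetical and are not taken from the study.

```python
def percent_error(predicted, actual):
    """Signed % error: (predicted - actual) * 100 / actual."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return (predicted - actual) * 100.0 / actual


def error_range(predicted_values, actual_values):
    """Smallest and largest signed % error over a set of predictions."""
    errors = [percent_error(p, a) for p, a in zip(predicted_values, actual_values)]
    return min(errors), max(errors)


def share_within_band(predicted_values, actual_values, band=5.0):
    """Fraction of predictions whose absolute % error is at most `band`."""
    errors = [percent_error(p, a) for p, a in zip(predicted_values, actual_values)]
    inside = sum(1 for e in errors if abs(e) <= band)
    return inside / len(errors)


# Hypothetical compressive strengths in MPa, for illustration only.
actual = [40.0, 50.0, 60.0]
predicted = [42.0, 49.0, 66.0]

low, high = error_range(predicted, actual)
share = share_within_band(predicted, actual, band=5.0)
print(low, high, share)
```

A narrow error range and a share close to 1 correspond to points that cluster around the perfect agreement line.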
Table 3 Statistical errors in predicting the compressive strength of the geopolymer concrete (ideal value of each measure in parentheses; Tr = training, Te = testing)
Model  RMSE (0)        CC (1)          VAF (100)       RSR (0)         NSE (1)
       Tr      Te      Tr      Te      Tr      Te      Tr      Te      Tr      Te
GBM    0.0204  0.0357  0.9142  0.8870  98.940  96.342  0.0289  0.0401  0.9091  0.8762
DL     0.0791  0.0952  0.8308  0.7771  96.703  94.551  0.0810  0.0993  0.8123  0.7853
XRT    0.1789  0.2150  0.7017  0.6545  93.392  91.187  0.2072  0.3260  0.6004  0.5499
GLM    0.4002  0.4471  0.5893  0.5102  90.568  88.045  0.4291  0.5529  0.5132  0.4885
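The five measures listed in Table 3 can be computed along the following lines. Table 1, which defines them, is not reproduced in this text, so the sketch assumes the commonly used definitions: Pearson correlation for CC, VAF = (1 − var(error)/var(actual)) × 100, RSR = RMSE divided by the population standard deviation of the actual values, and NSE = 1 − SSE/SST. The function name and the sample values are hypothetical, and the definitions used in the study may differ in detail.

```python
import math


def evaluate(actual, predicted):
    """Return RMSE, CC, VAF, RSR and NSE under the assumed standard definitions."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    errors = [a - p for a, p in zip(actual, predicted)]

    sse = sum(e * e for e in errors)                  # sum of squared errors
    sst = sum((a - mean_a) ** 2 for a in actual)      # spread of the actual values
    spp = sum((p - mean_p) ** 2 for p in predicted)   # spread of the predictions
    if sst == 0 or spp == 0:
        raise ValueError("actual and predicted values must not be constant")

    rmse = math.sqrt(sse / n)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    cc = cov / math.sqrt(sst * spp)                   # Pearson correlation
    mean_e = sum(errors) / n
    var_e = sum((e - mean_e) ** 2 for e in errors) / n
    vaf = (1.0 - var_e / (sst / n)) * 100.0           # variance accounted for, in %
    rsr = rmse / math.sqrt(sst / n)                   # RMSE / std of observations
    nse = 1.0 - sse / sst                             # Nash-Sutcliffe efficiency
    return {"RMSE": rmse, "CC": cc, "VAF": vaf, "RSR": rsr, "NSE": nse}


# Hypothetical values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = evaluate(actual, predicted)
perfect = evaluate(actual, actual)
print(scores)
```

Under these definitions a perfect model returns the ideal values shown in the column headers of Table 3, and NSE equals 1 − RSR².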
Fig. 2 Variation of the training and testing deviance for (a) GBM, (b) GLM, (c) XRT and (d) DL for different model
parameters
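To illustrate how a deviance curve of the kind shown for GBM arises, the sketch below implements the boosting loop of Eqs. (1)-(4) for a squared loss, using one-split regression trees (stumps) as base learners, and records the training deviance after every round. The stump learner, the use of mean squared error as the deviance, the function names, and the toy data are all assumptions made for illustration; this is not the implementation or the dataset used in the study.

```python
def fit_stump(X, r):
    """Least-squares regression stump fitted to the pseudo-residuals r."""
    n = len(X)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0                       # candidate threshold
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:                                  # every feature is constant
        mean_r = sum(r) / n
        return lambda row: mean_r
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def gradient_boost(X, y, n_rounds):
    """Boosting with squared loss L = (y - F)^2 / 2; returns (predict, deviance)."""
    n = len(y)
    f0 = sum(y) / n                                   # Eq. (1): best constant
    F = [f0] * n
    learners, deviance = [], []
    for _ in range(n_rounds):
        r = [y[i] - F[i] for i in range(n)]           # Eq. (2): pseudo-residuals
        h = fit_stump(X, r)
        hx = [h(row) for row in X]
        denom = sum(v * v for v in hx)
        # Eq. (3): closed-form line search for the squared loss
        gamma = sum(r[i] * hx[i] for i in range(n)) / denom if denom > 0 else 0.0
        F = [F[i] + gamma * hx[i] for i in range(n)]  # Eq. (4): model update
        learners.append((gamma, h))
        deviance.append(sum((y[i] - F[i]) ** 2 for i in range(n)) / n)

    def predict(row):
        return f0 + sum(g * h(row) for g, h in learners)

    return predict, deviance


# Toy data: [curing days, curing temperature]; strengths are made up.
X = [[1, 27], [3, 27], [7, 60], [28, 60], [7, 90], [28, 90]]
y = [20.0, 28.0, 35.0, 45.0, 40.0, 52.0]
predict, deviance = gradient_boost(X, y, n_rounds=20)
print(deviance[0], deviance[-1])
```

Because each round includes a line search over the multiplier, the training deviance cannot increase from one round to the next; a curve that flattens out indicates that additional trees no longer help on the training data.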
the training and deviance of the compressive strength of the geopolymer concrete for different model parameters. For GBM, as the number of trees reaches 135, the deviance becomes constant. Similar studies are also carried out with respect to the leaves and depth. In the case of XRT, the model trained using 30 trees shows the minimum value of the deviance. Hence, the same number of trees is adopted for training the XRT model. In the case of GLM, the deviance became constant after 20 iterations. Therefore, the same is adopted in further studies. DL trained using 8000 epochs is used for further studies. The details of the GBM and XRT models are presented in Table 4.

Fig. 3 shows the agreement line diagram for the predicted vs. the actual values of the compressive strength of the geopolymer concrete using the trained models. For the GBM model, most of the values lie within the error range of ±5%, with the majority lying near the perfect agreement line, i.e., the line on which the actual and predicted values are the same. In the case of GLM, most of the predicted values lie outside the range of ±5%. Thus, GBM can predict the compressive strength of geopolymer concrete with good accuracy.

Reason for the best performance of GBM compared to the other models studied: The GBM is found to perform best compared to the DL, XRT, and GLM models. The reasons for the same can be summarized as:
• Because the gradient boosting model adheres to ensemble learning, handling and interpreting the data is simpler.
• GBM is one of the best methods for processing bigger datasets and for combining weak learners at minimal loss.
• The gradient boosting algorithm is a robust machine learning technique that can quickly detect overfitting of the training dataset.

Sensitivity analysis: The output variable is always dependent on the several input variables. However, the extent of the dependency of the output is different for each of the input variables adopted. Hereby, the results for the
612 Sagar Paruthi, Ibadur Rahman and Asif Husain
Fig. 3 Actual v/s predicted value for compressive strength of geopolymer concrete obtained using (a) GBM, (b) DL, (c) XRT
and (d) GLM models for unseen dataset
sensitivity analysis are presented in Fig. 4. It can be seen that all the algorithms predicted that the nanosilica content most strongly affects the compressive strength of the GPC. This is true to a large extent, as the nanosilica is the finest material adopted in the present work. The finer the material used in preparing the concrete, the higher the compressive strength obtained, keeping the other parameters constant. After nanosilica content, temperature is observed to be the second most important variable affecting the compressive strength of the concrete, while silica fume is the least important.

Fig. 4 Sensitivity of the output variable, i.e., compressive strength, to the different input variables adopted, with importance scaled to 1

4. Conclusions

The present study aims to predict the compressive strength of geopolymer concrete using four different machine learning techniques, namely, GBM, DL, XRT, and GLM. Four different input parameters were considered for training the models. The dataset used in the present study was obtained experimentally. It was observed that GBM was able to predict the compressive strength of the geopolymer concrete with good accuracy, followed by the DL technique. GLM was not able to predict the compressive strength accurately, as its error was large.

References

Ahmad, A., Ahmad, W., Chaiyasarn, K., Ostrowski, K.A., Aslam, F., Zajdel, P. and Joyklad, P. (2021), “Prediction of geopolymer concrete compressive strength using novel machine learning