
Case Studies in Construction Materials 20 (2024) e02836


Machine learning based prediction models for split tensile strength of fiber reinforced recycled aggregate concrete
Mohammed Alarfaj a, *, Hisham Jahangir Qureshi b, Muhammad Zubair Shahab c,
Muhammad Faisal Javed d, Md Arifuzzaman b, Yaser Gamil e, *
a Department of Electrical Engineering, College of Engineering, King Faisal University, Al-Ahsa 31982, Saudi Arabia
b Department of Civil and Environmental Engineering, College of Engineering, King Faisal University, Al-Ahsa 31982, Saudi Arabia
c Department of Civil Engineering, COMSATS University Islamabad, Abbottabad Campus, Abbottabad 22020, Pakistan
d Department of Civil Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Topi, Swabi 23640, Pakistan
e Department of Civil Engineering, School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500 Bandar Sunway, Selangor, Malaysia

A R T I C L E I N F O A B S T R A C T

Keywords: The demand for concrete production has led to a significant annual requirement for raw mate­
Fiber reinforced Recycled Aggregate Concrete rials, resulting in a substantial amount of waste concrete. In response, recycled aggregate concrete
Machine Learning has emerged as a promising solution. However, it faces challenges due to the vulnerability of the
Sustainability
hardened mortar attached to natural aggregates, leading to susceptibility to cracking and reduced
Eco-friendly Concrete
Spilt Tensile Strength
strength. This study focuses on predicting the split tensile strength of fiber reinforced recycled
Gene expression programming aggregate concrete using five prediction models, including two deep neural network models
Deep neural networks DNN1 and DNN2, one optimizable Gaussian process regression (OGPR), and two genetic pro­
Optimizable gaussian process regression gramming based GEP1 and GEP2 models. The models exhibited high accuracy in predicting spilt
tensile strength with robust R2, RMSE, and MAE values. DNN2 has the highest R2 value of 0.94
and GEP1 has the lowest R2 value of 0.76. DNN2 model R2 was 3.3% and 13.5% higher than
OGPR and GEP2. Similarly, DNN2 and GEP2 model performed 9.3% and 9.21% better than DNN1
and GEP1 respectively in terms of R2. DNN2 model performed 20.32% and 31.5% better than
OGPR and GEP2 in terms of MAE. Similarly, GEP2 and DNN2 MAE were 13.1% and 31.5% better
than GEP1 and DNN1. Sensitivity analysis using the relevance factor and permutation feature
importance revealed that the most significant positive factors are cement, natural coarse aggre­
gates, density of recycle aggregates, and superplasticizer while recycle aggregate concrete, max

Abbreviations: FRAC, Fiber reinforced recycled aggregate concrete; STS, Splitting tensile strength; RA, Recycled aggregates; RAC, Recycled aggregate concrete; ML, Machine learning; MRM, Multiple regression methods; CNN, Convolutional neural networks; ANN, Artificial neural network; GEP, Gene expression programming; OGPR, Optimizable Gaussian process regression; MLR, Multiple linear regression; RF, Random forest; SVM, Support vector machine; XG Boost, eXtreme gradient boosting; GBR, Gradient boosting regressor; DTR, Decision tree regressor; MLP, Multilayer perceptron; HPO, Hyperparameter optimization; R2, Determination coefficient; MAE, Mean absolute error; RMSE, Root mean square error; OBJ, Objective function; PI, Performance indicator; CC, Cement content; WC, Water content; RCA, Recycled concrete aggregates; NCA, Natural coarse aggregates; Dmax_RCA, Maximum aggregate size of RCA; Den_RCA, Density of RCA; WA_RCA, Water absorption of RCA; SP, Superplasticizer; FV, Fiber volume; FT, Fiber type; VIF, Variance inflation factor; KNN, K-Nearest-Neighbor; PFI, Permutation feature importance; r-value, Relevancy factor; ANFIS, Adaptive Neuro Fuzzy Inference System; MARS, Multivariate adaptive regression splines; RNN, Recurrent neural network.
* Corresponding authors.
E-mail addresses: mkalarfaj@kfu.edu.sa (M. Alarfaj), hqureshi@kfu.edu.sa (H.J. Qureshi), zubairshahab@cuiatd.edu.pk (M.Z. Shahab),
arbabfaisal@giki.edu.pk (M.F. Javed), marifuzzaman@kfu.edu.sa (M. Arifuzzaman), yaser.gamil@ltu.se (Y. Gamil).

https://doi.org/10.1016/j.cscm.2023.e02836
Received 21 July 2023; Received in revised form 28 October 2023; Accepted 27 December 2023
Available online 5 January 2024
2214-5095/© 2024 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Concrete is a common building material because of its excellent qualities. According to China’s National Development and Reform
Commission, more than 32 billion cubic meters of commercial concrete was produced in 2021 [1]. As a result, there is a significant
annual demand for cement, sand, natural gravel, and crushed stone. In addition, the demolition of unserviceable structures produces
more than 2 billion tons of waste concrete [1–3]. Reusing construction and demolition waste in construction materials is inherently
logical, as it leads to environmentally beneficial construction practices. RA-based concrete is one of the potential solutions for
decreasing the scale of natural resource consumption. To this end, waste concrete is crushed to make recycled concrete aggregates (RCA). However, RCA typically consists of natural aggregate with hardened mortar attached to it. As a result, RCA exhibits lower strength than natural aggregate and is prone to cracking because of the attached hardened mortar [4–6]. Additionally, as seen in Fig. 1, multiple fissures are produced during the crushing process. The utilization of these recycled aggregates in concrete increases its susceptibility to cracking under external loads, leading to a decrease in strength and service life. The cracking characteristics of recycled aggregate concrete (RAC) therefore warrant considerable attention compared to its other mechanical properties.
The tensile strength of concrete is therefore crucial because a crack appears when tensile stresses exceed the tensile strength of the concrete. Splitting tensile strength (STS) is an indirect measurement of concrete's tensile strength. RAC typically exhibits lower
splitting-tensile strength when compared to natural aggregate concrete. Yang et al.[7] noted that as the replacement ratio of the
recycled concrete aggregates increased, there was a decrease in both compressive and splitting tensile strength. Researchers have used
fibers to enhance mechanical properties and increase fracture resistance of RAC. Fiber reinforced recycled aggregate concrete (FRAC)
has been demonstrated to perform better in terms of fracture resistance [8–10]. Akca et al.[11] reinforced RAC with polypropylene
fiber and found that the inclusion of fiber increased the material’s flexural and splitting tensile strengths. Ali et al.[12] investigated the
mechanical characteristics of plain RAC and glass fiber reinforced RAC and found that the splitting tensile strength of fiber reinforced
RAC was much higher. Furthermore, Gao et al.[13] observed a considerable increase in flexural strength with an increase in steel fiber
volume percentage. Nevertheless, the relationship between the mix ingredients, fibers, and RAC mechanical qualities should be
carefully considered for a successful mix design. A reliable mechanism is required to estimate the properties of fiber reinforced RAC
effectively.
Laboratory tests, computational techniques, and Multiple regression methods (MRM) have been used to forecast the cracking
performance of plain concrete [14–16]. The sequential laboratory procedure, which comprises casting specimens, curing for a certain
amount of time, and testing, raises concerns of cost and time. Multiple regression methods have been used extensively over the past 20
years but the strength increment of cement materials with age is nonlinear, which makes it challenging for MRM to accurately reflect
the relationships between input variables and strength [17–19]. Moreover, fiber reinforced RAC is a complex heterogeneous composite
that poses a challenge for property estimation due to its high nonlinearity. In recent years, there has been an increase in the application
of soft computing techniques in civil engineering. ML models can deliver trustworthy and accurate forecasts on optimized design data
while saving time, money and physical effort [20–22]. These algorithms overcome the drawbacks and complexity of traditional
regression models by using strategies to extract information from input data and generate extremely exact output data [23–25].
Awoyera et al. [26] applied GEP and ANN for modeling the compressive, flexural and split tensile strength of geopolymer self-compacting concrete. Their research concluded that both GEP and ANN methods provide accurate predictions with minimal errors from experimental data. Ahmad et al. [27] used GEP and ANN for compressive strength prediction of concrete containing RCA. The study revealed that GEP performed better than ANN for accurate predictions. Duan et al. [28] used an artificial neural network with a dataset of 168 instances to predict the compressive strength of RA concrete. The authors noted that ANN is a promising algorithm for predicting the compressive strength of concrete made with diverse kinds and sources of recycled aggregates. Deng et al. [29] utilized convolutional neural networks (CNN), which they found to be superior to traditional neural networks for compressive strength prediction. Patil et al. [30] compared ANN with multiple linear regression (MLR) for the prediction of mechanical properties of RA concrete. The authors noted that ANNs demonstrated superior R2 values. Recently, artificial neural networks have been increasingly used by researchers to predict the characteristics of construction materials. With large multivariable datasets, ANNs can solve and explain non-linear problems without the need for explicit equation construction.

Fig. 1. Cracking phenomena of recycled aggregate concrete.


Machine learning has also been previously used to forecast the splitting tensile strength of concrete. The most widely used techniques for splitting tensile strength are artificial neural networks (ANN) [26,31–38], random forest (RF) [37,39–42], support vector machine (SVM) [32,34,39–41,43,44], eXtreme gradient boosting (XG Boost) [34,40,43,45], gradient boosting regressor (GBR) [39,43,45], decision tree regressor (DTR) [37,39,46] and multilayer perceptrons (MLP) [41,43]. Researchers [36–38,42,45] have studied the split tensile strength of RA concrete but did not consider the effect of fibers.
Moreover, developing a successful ML model for a problem is a multi-step and challenging process. Comparing a number of created ML models to determine which is best for deployment is one of the most crucial steps [47]. De-Ccoster et al. [48] mentioned that there are several state-of-the-art algorithms available, including exact and metaheuristic techniques. The outcomes of these algorithms in the literature demonstrate that no single method performs better than the others [48]. Employing various algorithms also increases the robustness of a study's findings. Fernández-Delgado et al. [49] have shown that diversity through algorithm selection contributes to more reliable findings and helps avoid placing too much reliance on a single model. Often, studies propose new models and only compare them to other models from the same family. Researchers are therefore left uncertain about how the proposed model compares with models working on different principles [49]. Moreover, a comparative analysis of different
algorithms allows researchers to identify the most suitable model for their specific problem. Caruana et al. [50] emphasize the importance of comparing multiple algorithms to determine the best approach for a particular problem. Their study suggests that there is no universally best learning algorithm; although some methods outperform others on average, there is substantial variability across different problems and metrics. Even the best models occasionally perform poorly, and poor models sometimes perform exceptionally well. Different models are sensitive to different types of data and may excel in specific situations, so the use of different models increases the effectiveness of a study. Uddin et al. [51] concluded that the same supervised learning algorithm may yield varying results when
applied across different study settings. It’s important to note that comparing the performance of two supervised learning algorithms
might yield inaccurate results if they were independently utilized in distinct studies. Incorporating a variety of algorithms and
comparing their results also helps overcome bias that might be inherent in a single model. Each algorithm may make different assumptions and predictions, leading to a broader exploration of the problem space. This diversity can help identify hidden patterns and
relationships in the data that a single algorithm might miss. Furthermore, recent studies have employed hyperparameter optimization more frequently to develop superior ML models [52]. Wu et al. [53] emphasize the crucial role of hyperparameter tuning in creating efficient and robust ML models. The hyperparameter tuning process exhibits variability, mainly because different machine learning algorithms utilize distinct sets of hyperparameters. To streamline and optimize this process, researchers have created automated hyperparameter optimization (HPO) systems. HPO delivers numerous advantages in the domain of ML research [52–54]. HPO searches the hyperparameter space and selects the best-performing combination of hyperparameters through an iterative process. It proves particularly beneficial because various hyperparameters may have unique optimal values to achieve peak performance across diverse problems. Consequently, the integration of HPO techniques facilitates the identification of the most suitable model for a given problem and significantly reduces the manual effort required to develop robust ML models when handling extensive datasets or complex models characterized by numerous hyperparameters.

Fig. 2. Benefits of recycled aggregate concrete.


2. Research significance

Previous research has mainly focused on the essential elements used in the manufacturing process of concrete. Furthermore, studies on the split tensile strength of RA-based concrete do not consider the influence of fibers. The prediction of the cracking behavior of RAC becomes more complex when considering the influence of recycled aggregate, let alone fiber reinforced RAC. There are very limited studies on the split tensile strength of fiber reinforced RAC; therefore, an extensive study is required to train and test ML-based prediction models, followed by correlation and sensitivity studies. In this study, five machine learning models are developed for the split tensile strength of fiber reinforced RAC. Three distinct machine learning techniques were used: deep neural networks, a Bayesian optimized Gaussian process regressor, and gene expression programming. Two different sets of hyperparameters were used for DNN and GEP to reach the optimum solution. A Bayesian optimizer is used in the OGPR model for hyperparameter tuning and selection of the best-performing combination of hyperparameters. A dataset with 10 input parameters and 257 datapoints was collected from a peer-reviewed published study. Each model's performance was assessed using k-fold cross validation and statistical metrics including the determination coefficient (R2), mean absolute error (MAE), root mean square error (RMSE), and objective function (OBJ). This research also aims to assess the positive or negative impact of raw ingredients and fibers on the splitting tensile strength of FRAC by employing the relevancy factor and permutation feature importance. The key contribution of this study is to reveal the underlying principles of various machine learning techniques and to compare their effectiveness in predicting the splitting tensile strength of FRAC. The use of diverse machine learning models provides a performance comparison for the prediction of the STS of fiber reinforced RAC [36]. Moreover, the application of different machine learning techniques helps researchers assess and compare their performance on the same dataset. Overall, this study contributes to our understanding of machine learning trends and their applicability in diverse real-world domains for predicting construction material properties. Furthermore, the research sheds light on the challenges faced and suggests potential directions for future investigations.

3. Methodology

3.1. Source and feature of literature data

In this study, the ML models were trained and tested using literature data from a peer-reviewed publication [1]. The output variable was split tensile strength, while the input variables were cement content, water content, natural coarse aggregate content, recycled concrete aggregate content, maximum aggregate size of RCA, density of RCA, water absorption of RCA, superplasticizer, fiber volume and fiber type. The dataset had 257 instances, and 80% of the data was used for training and the remaining 20% for testing of the models. Table 1 displays a description of the dataset.

3.2. Data preprocessing

Data preprocessing is required before applying an ML model to remove anomalies and discrepancies from the dataset. A statistical analysis was conducted to view the characteristics of the data (see Table 1). The K-Nearest-Neighbor (KNN) algorithm was used to replace missing values and outliers. Next, min-max normalization is used to make sure that each feature has an equal impact on the training process, as seen in Eq. (1), where Xmin and Xmax stand for the minimum and maximum values of feature X.

$$X_i = \frac{X_i - X_{min}}{X_{max} - X_{min}} \quad (i = 1, 2, \ldots, n) \tag{1}$$
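For illustration, a minimal Python sketch of this preprocessing step is given below, assuming scikit-learn; the study's own implementation is not reported, and the dataset file name and column names are hypothetical.

```python
# Illustrative sketch of the preprocessing described above: KNN imputation of
# missing values followed by min-max normalization (Eq. 1).
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Replace missing values and outliers using the K-Nearest-Neighbor algorithm
    imputer = KNNImputer(n_neighbors=5)
    imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

    # Min-max normalization: X' = (X - Xmin) / (Xmax - Xmin)
    scaler = MinMaxScaler()
    return pd.DataFrame(scaler.fit_transform(imputed), columns=df.columns)

# Example usage with the 10 input features plus the STS target (names assumed):
# data = pd.read_csv("frac_dataset.csv")
# data_scaled = preprocess(data)
```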

3.3. Nature of experimental data for ML models

For machine learning models to be effective and highly accurate in their predictions, it is essential to gather relevant data and input
variables. Conventional machine learning researchers often overlook the physical significance of variables, resulting in an ML model
that lacks the capability to handle scenarios with different data distributions or data beyond the constructed database. To overcome this
limitation, the compiled database is formulated using knowledge from physical tests and existing mechanical models. There are no

Table 1
Characteristics of Dataset.
Items Notations Mean Min Max STD Kurtosis Skewness
Cement (kg/m3) Cement 369.9 158 600 61.968 1.118 0.0686
Water (kg/m3) Water 185.63 98.28 343.5 37.385 6.186 1.413
Natural coarse aggregate (kg/m3) NCA 343.17 0 1143 362.756 -1.174 0.447
Recycled concrete aggregate (kg/m3) RCA 732.82 57 1474 387.36 -0.8887 0.262
Maximum size of RCA (mm) Dmax_RCA 18.105 10 25 4.2691 -0.3645 -0.7554
Density of RCA (kg/m3) Den_RCA 2430.9 2010 2661 156.68 0.4027 -0.838
Water absorption of RCA (kg/m3) WA_RCA 5.3993 1.9 10.9 1.845 1.9426 1.0883
Superplasticizer (kg/m3) SP 1.0148 0 7.8 1.652 3.2258 1.831
Fiber volume (%) FV 0.293 0 5 0.584 31.95 4.65
Split tensile strength (MPa) STS 3.0628 1.38 7.61 1.087 1.658 1.011
Fiber type FT Steel fiber, Carbon fiber, Polypropylene fiber, Basalt fiber, Glass fiber, Woolen fiber.


established mechanical models for FRAC, but it is possible to draw comparisons for RAC from cracking models of plain concrete.
According to these models, such as the fictitious crack model, boundary effect model, and size effect model, the mixture design, including the proportions of cement content, water content, coarse aggregate content, and aggregate size, plays a crucial role in determining cracking and fracture, irrespective of specimen geometries [1,15,55,56]. As evident in the literature, the splitting tensile strength of FRAC
is also influenced by supplementary factors such as the attributes of recycled aggregates (e.g., content, density, water absorption, size),
fiber types, and their volume fractions.
Furthermore, Fig. 3 displays the correlation matrix of the study's input data. Darker hues denote higher correlation values, whereas lighter colors represent lower correlation values. It is clear that the input variables with the strongest positive correlation were cement content and water content (R = 0.42). NCA and RCA have the highest negative correlation (R = −0.62). Higher correlation values (|r| > 0.5) among the input variables may indicate the potential presence of multicollinearity. Multicollinearity has the capacity to influence the outcomes and could introduce bias into the model. In the field of statistical analysis, the variance inflation factor (VIF) plays a crucial role as a diagnostic tool for assessing the existence of collinearity among independent variables [57]. VIF values typically range from 0 to 10, occasionally being restricted to a range of 0 to 5. This numerical range provides a quantifiable measure of the degree of collinearity present within a regression model [58]. Furthermore, the tolerance value, which is the reciprocal of the VIF, is also useful for evaluating multicollinearity. A tolerance value greater than 0.1 but less than 1 is commonly considered an indication of the absence of significant multicollinearity [59]. As shown in Table 2, the independent variables in this study exhibit no significant multicollinearity, as indicated by VIF values below 5 and tolerance values exceeding 0.1 [59].
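A brief sketch of how the VIF and tolerance values in Table 2 can be computed is shown below, assuming the statsmodels package and a pandas DataFrame X holding the ten input features; this is illustrative rather than the authors' actual script.

```python
# Illustrative sketch of the VIF and tolerance check reported in Table 2.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.DataFrame:
    # An intercept column is often appended before computing VIF; omitted here for brevity.
    vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
    return pd.DataFrame({"feature": X.columns,
                         "VIF": vif,
                         "tolerance": [1.0 / v for v in vif]})

# VIF values below 5 and tolerance values above 0.1 indicate no significant multicollinearity.
```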

4. Characteristics of employed techniques

4.1. DNN

The neural structure and functioning characteristics of the biological brain are modelled by the ANN. A neural network is made up
of a large number of interconnected neurons that process information by changing states dynamically in response to sequences of
external inputs [60,61]. The most popular architecture is the feedforward neural network with multilayer perceptron. It generally consists

Fig. 3. Correlation heatmap analysis of variables.


Table 2
VIF and tolerance values for each independent variable.
Inputs

Cement Water NCA RCA Dmax_RCA Den_RCA WA_RCA SP FV FT

Collinearity Statistics VIF 1.71 1.87 1.73 2.74 1.31 2.09 1.72 1.29 1.22 1.55
Tolerance 0.583 0.535 0.576 0.365 0.763 0.479 0.58 0.772 0.822 0.646

of numerous neurons distributed across two or three layers, with at least one hidden layer [62]. The neurons in successive layers are
interconnected while there are no connections between neurons in the same layer. The conventional deep neural network (DNN)
design, which has numerous hidden layers, is shown in Fig. 4(a). Data collection and exploration, problem discovery, architecture
construction, the learning process, data training based on the designed network, and testing of the trained network for efficiency

Fig. 4. The typical ANN’s functional activation method and architecture: (a) ANN with two HL, (b) activation processes in the hidden and output
layer and (c) Activation functions.


assessment are the key processes that make up an ANN model [18,63]. The activation procedures for the hidden and output layers are
shown in Fig. 4(b)-(c), respectively. Typically, Eq. (2) is used to determine the weighted summations of the input parameters [64].
$$(\mathrm{net})_j = f\left(\sum_{i=1}^{n} w_{ij} x_i + b\right) \tag{2}$$

where n is the number of nerve cells in each layer, (net)j is the output from the neuron, wij is the connection's weight, b is the bias weight, xi is the input parameters, and f is the activation function. The tan-sigmoid function is written in Eq. (3) [64,65].

$$O_j = f\big((\mathrm{net})_j\big) = \mathrm{tansig}(n) = \frac{2}{1 + \exp(-2n)} - 1 \tag{3}$$
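As a concrete illustration of Eqs. (2) and (3), the following NumPy sketch computes one hidden-layer pass with the tan-sigmoid activation; the weights here are random placeholders, not values from the trained models.

```python
# Minimal NumPy sketch of Eqs. (2) and (3): a weighted sum passed through the
# tan-sigmoid activation, as used in the hidden layers of the DNN models.
import numpy as np

def tansig(n):
    # Eq. (3): tan-sigmoid activation, 2 / (1 + exp(-2n)) - 1 (equivalent to tanh(n))
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def layer_forward(x, W, b):
    # Eq. (2): (net)_j = f(sum_i w_ij * x_i + b)
    return tansig(W @ x + b)

# Example: the 10 input features feeding a 30-neuron hidden layer (placeholder weights)
rng = np.random.default_rng(0)
x = rng.random(10)
W = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
hidden = layer_forward(x, W, b)
```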

4.2. OGPR

The Gaussian process (GP) is a machine learning algorithm that makes use of the Bayesian learning concept and the Gaussian stochastic approach [66]. The GP is based on a streamlined form of the Gaussian probability distribution, which characterizes random variables as scalars or vectors following multivariate Gaussian distributions [67]. The model hyperparameters have a significant impact on how accurately GP regression models perform. The hyperparameters of the conventional GP technique are optimized using the maximum likelihood approximation approach. Accordingly, the predictive mean and covariance in a GP model may be described as follows for a set of N observations D = (xn, yn), where n = 1 to N, xn ∈ X and yn ∈ R [53,68].

$$\mu(x; \{x_n, y_n\}, \theta) = K(X, x)^{T} K(X, X)^{-1}\left(y - m(X)\right) \tag{4}$$

$$\Sigma(x, x'; \{x_n, y_n\}, \theta) = k(x, x') - K(X, x)^{T} K(X, X)^{-1} K(X, x') \tag{5}$$

where K(X, x) is an N-dimensional column vector representing the cross-covariance between x and the set X. The GP hyperparameters are indicated by θ. The (N × N) matrix K(X, X) serves as the Gram matrix for the set X. Furthermore, expected improvement (EI) is implemented as the acquisition function; EI can be stated as follows [68].

Fig. 5. Mechanism of GEP.


$$a_{EI}(x; D, \theta) = \sigma(x; D, \theta)\left[\gamma(x)\,\Phi(\gamma(x)) + N(\gamma(x); 0, 1)\right] \tag{6}$$

where $\gamma(x) = \dfrac{f(x_{best}) - \mu(x; D, \theta)}{\sigma(x; D, \theta)}$.

The cumulative distribution function is denoted by Φ(·), and the lowest observed value is indicated by f(x_best). The predictive mean function and predictive variance function are represented by μ(x; D, θ) and σ(x; D, θ), respectively. Additionally, the best current value is denoted as [68]:

$$x_{best} = \arg\min_{x_n} f(x_n) \tag{7}$$
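A generic Python sketch of the expected improvement acquisition in Eq. (6) is given below; it assumes SciPy and is only illustrative, not the MATLAB routine used in this study.

```python
# Illustrative sketch of the expected improvement acquisition function (Eq. 6)
# used by the Bayesian optimizer to pick the next hyperparameter combination.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    # Eq. (6): a_EI = sigma * [gamma * Phi(gamma) + N(gamma; 0, 1)]
    sigma = np.maximum(sigma, 1e-12)        # guard against zero predictive std
    gamma = (f_best - mu) / sigma
    return sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma))

# The candidate hyperparameter setting with the largest EI is evaluated next.
```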

4.3. GEP

Koza developed a computational method called genetic programming (GP) that uses concepts of genetics and natural selection [69–71]. It solves problems based on Darwinian reproduction and functions similarly to genetic concepts like mutation, crossover, and reproduction [69,70,72]. GP is a flexible tool that introduces parse trees (nonlinear patterns) as a substitute for fixed-length binary strings. Gene expression programming (GEP), developed by Ferreira, expands the concept of genetic programming (GP) by fusing fixed-length chromosomes from genetic algorithms (GA) with parse trees [71]. A GEP program operates with fixed-length strings. This implies a separation of genotype and phenotype within the GEP algorithm. One significant modification in GEP is that only the genotype is transferred to the next stage. This eliminates the necessity to replicate and modify the entire structure since all mutations take place within a single linear framework [73]. Furthermore, it creates various expression trees by combining function sets, terminal sets, fitness functions, control parameters, and stop states. The single chromosomes of GEP generate several genes divided into head and tail segments [72]. The chromosomes encode the data necessary for establishing a causal correlation, and the Karva language was

Fig. 6. Flowchart for the research’s machine learning process.


developed to interpret this information. The schematic depiction of the gene expression programming (GEP) method can be observed in Fig. 5. The procedure initiates by randomly generating fixed-length chromosomes for the entire population. These chromosomes are subsequently represented as expression trees (ETs), and their fitness is evaluated. The individuals best suited for the reproductive phase are selected. This process is iterated with new individuals over multiple generations until the optimal solution is achieved.

5. Formulations of ML models

In this research, five ML models were designed to find a better ML model for accurate predictions of the split tensile strength of FRAC. In the DNN models, the number of neurons and hidden layers are changed to optimize performance. The DNN1 architecture (10-30-20-1) has 10 input variables, 2 hidden layers (HL), with 30 neurons in the first hidden layer and 20 neurons in the second hidden layer, and 1 output parameter. Additionally, feedforward neural networks were utilized as supervised training algorithms, which have been used progressively in a variety of problems for feature selection and function computation [74,75]. The weights and biases were updated with the help of the Levenberg-Marquardt algorithm [65]. The tan-sigmoid activation function is employed in the hidden layers while the PURELIN linear function is used in the output layer [75]. The DNN2 architecture (10-30-20-10-1) has 3 hidden layers with 30, 20 and 10 neurons, respectively. DNN2 uses the same activation functions as DNN1. The same learning rate of 0.25 and 5000 epochs were used for DNN1 and DNN2. All mentioned DNN models were implemented using MATLAB 2021a. The kernel function, kernel scale and sigma values are three essential components of the OGPR model. The selection of the best set of hyperparameters for OGPR is performed using Bayesian optimization. Metaheuristic algorithms search the hyperparameter space, assess the performance of each hyperparameter combination, and select the best combination. The quadratic, exponential, squared exponential, and Matern 5/2 kernel functions were used. The kernel scale value was set at 0.015–15. Similarly, sigma values ranged from 0.0001 to 129.75. Among these hyperparameters, the final selection is made based on optimal performance in the training and validation stages. In this manner, an optimal solution is suggested for STS using the Bayesian optimized OGPR. The number of genes and chromosomes, head size, and function set were the important factors that affected the prediction performance of the GEP models. The GEP1 and GEP2 models were created by selecting hyperparameters using knowledge and engineering judgement from previous problems. The two GEP models (GEP1 and GEP2) were created by modifying the GEP parameters in an effort to improve prediction accuracy and provide an optimal solution. The function set and linking function were the same for both GEP models. The GEP1 model uses values of 4, 60 and 10 for the number of genes, chromosomes, and head size, respectively. GEP2 uses values of 5, 70 and 12 for the number of genes, chromosomes, and head size. The program GeneXPro 5.0 was used for the GEP models. The whole study workflow is shown as a flowchart in Fig. 6. Table 3 provides a list of hyperparameters for the ML models.
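For readers who prefer Python, a rough Keras equivalent of the DNN2 architecture (10-30-20-10-1) is sketched below. It is only an approximation: the study used MATLAB 2021a with Levenberg-Marquardt training, which Keras does not provide, so the Adam optimizer is substituted here and tanh stands in for the tan-sigmoid activation.

```python
# Approximate Keras sketch of the DNN2 architecture (10-30-20-10-1).
# tanh is mathematically equivalent to MATLAB's tansig; the output layer is
# linear (purelin-style). Adam replaces Levenberg-Marquardt, which Keras lacks.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_dnn2(n_inputs: int = 10) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(n_inputs,)),
        layers.Dense(30, activation="tanh"),   # hidden layer 1
        layers.Dense(20, activation="tanh"),   # hidden layer 2
        layers.Dense(10, activation="tanh"),   # hidden layer 3
        layers.Dense(1, activation="linear"),  # STS output (MPa)
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

# Example usage (X_train, y_train are assumed to be the normalized data):
# model = build_dnn2()
# model.fit(X_train, y_train, epochs=5000, validation_split=0.2)
```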

6. Model Evaluation Criteria

Standard statistical measures such as the determination coefficient (R2), mean absolute error (MAE), root mean square error
(RMSE), objective function (OBJ) and performance index (PI) were used to assess the performance of the GEP, DNN, and OGPR models
[76]. Table 4 lists these performance indicators together with their mathematical formulations.

Table 3
Features of formulated ML models.
Model Feature

DNN1 • Architecture: 10-30-20-1


• Learning rate = 0.25, Momentum rate = 0.9, and epochs = 5000
• Activation function: logsig, tansig, purelin
DNN2 • Architecture: 10-30-20-10-1
• Learning rate = 0.25, Momentum rate = 0.9, and epochs = 5000
• Activation function: logsig, tansig, purelin
OGPR • Cross-validation folds: 5 folds
• Kernel function: exponential, quadratic, squared exponential, matern 5/2.
• Kernel scale: 0.015–15.
• Sigma: 0.0001–129.75
• Basic function: constant
GEP1 • Function set: + , -, x, /, sin, cos
• Linking function: addition
• Gene: 4
• Chromosomes: 60
• Head size: 10
GEP2 • Function set: + , -, x, /, sin, cos
• Linking function: addition
• Gene: 5
• Chromosomes: 70
• Head size: 12


Table 4
Statistical indicators (acceptable range in parentheses).

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(Y_i - X_i)^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \quad \text{(close to 1)}$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - X_i)^2}{n}} \quad \text{(MAE < RMSE)}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|X_i - Y_i\right|$$

$$PI = \frac{RRMSE}{1 + R} \quad \text{(less than 0.2 for a good model)}$$

$$OBJ = \left(\frac{n_T - n_V}{n}\right)\rho_T + 2\left(\frac{n_V}{n}\right)\rho_V; \; T = \text{training data}, V = \text{testing data} \quad \text{(0 for the best model)}$$

n = number of data points, Xi = experimental data, Yi = predicted data, X̄ = average of experimental values, Ȳ = average of predicted values.
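The indicators in Table 4 can be computed with a few NumPy one-liners, as sketched below; ρ_T and ρ_V in the OBJ function are taken here as the per-phase error terms (an assumption, since their exact definition is not restated in this excerpt).

```python
# Sketch of the evaluation metrics in Table 4, assuming numpy arrays of
# experimental values x and predicted values y.
import numpy as np

def r2(x, y):
    return 1.0 - np.sum((y - x) ** 2) / np.sum((x - x.mean()) ** 2)

def rmse(x, y):
    return np.sqrt(np.mean((y - x) ** 2))

def mae(x, y):
    return np.mean(np.abs(x - y))

def perf_index(x, y):
    # PI = RRMSE / (1 + R), with RRMSE the RMSE relative to the mean observed value
    rrmse = rmse(x, y) / x.mean()
    r = np.corrcoef(x, y)[0, 1]
    return rrmse / (1.0 + r)

def obj(n_train, n_test, rho_train, rho_test):
    # OBJ = ((nT - nV)/n) * rho_T + 2 * (nV/n) * rho_V; rho is assumed to be the
    # per-phase error term for the training and testing sets.
    n = n_train + n_test
    return ((n_train - n_test) / n) * rho_train + 2.0 * (n_test / n) * rho_test
```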

7. Sensitivity Analysis

7.1. Relevancy Factor

Sensitivity analysis is a technique for assessing the relative impact of input variables on a simulated model’s performance. The
primary objective of this work was to use suggested ML models to investigate the proportionate effects of input data, such as cement,
water, NCA, and RCA, on the estimated values of STS. The training data points were used to conduct the analysis. The relevance factor
(r) was used to assess the extent of each input parameter’s impact on the projected output values. When employing ML models to
anticipate the estimated values, a higher r-value denotes a parameter’s stronger predictive power [77]. The r-value was determined
using the formula shown in Eq. 8 [77,78].

$$r(I_j, O) = \frac{\sum_{i=1}^{N}\left(I_{j,i} - \bar{I}_j\right)\left(O_i - \bar{O}\right)}{\sqrt{\sum_{i=1}^{N}\left(I_{j,i} - \bar{I}_j\right)^2 \, \sum_{i=1}^{N}\left(O_i - \bar{O}\right)^2}} \tag{8}$$

where Ij,i signifies the ith value of the jth input parameter, Oi is the simulated output of STS obtained from the selected models, N is the number of data points, and Īj is the average value of the jth input parameter.
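In code, the relevancy factor of Eq. (8) is simply the Pearson correlation between each input column and the predicted STS values, as in the short NumPy sketch below (illustrative, not the authors' implementation).

```python
# Sketch of the relevancy factor (Eq. 8): Pearson correlation between an input
# feature column I_j and the model-predicted STS values O.
import numpy as np

def relevancy_factor(I_j: np.ndarray, O: np.ndarray) -> float:
    num = np.sum((I_j - I_j.mean()) * (O - O.mean()))
    den = np.sqrt(np.sum((I_j - I_j.mean()) ** 2) * np.sum((O - O.mean()) ** 2))
    return num / den

# Example (feature_names, X and y_pred are assumed):
# r_values = {name: relevancy_factor(X[:, k], y_pred) for k, name in enumerate(feature_names)}
```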

7.2. Permutation feature importance

The permutation feature importance (PFI) parameter measures the weighted contribution of each individual input to the prediction
of the target variable. Due to its efficiency and simplicity, it has been frequently used. PFI evaluates the correlation between an input
parameter (Xi) and the anticipated result (Yi), while maintaining the status quo for all other input parameters. By evaluating the
variance in error calculation between the accuracy attained with complete input data and the accuracy attained by randomly
permuting the input variable, it is possible to assess the significance of a given input parameter (Eq. 9) [79].

$$PFI = MAE_{permuted} - MAE_{original} \tag{9}$$

The mean absolute error (MAE) was used as the error metric to determine the PFI for significant input parameters. It is crucial to
remember that a PFI value near to zero suggests that the input variable has little impact on the outcome while higher PFI values suggest
a strong influence on the model’s prediction.
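A compact sketch of Eq. (9) is shown below for any fitted regressor exposing a scikit-learn style predict method (an assumption); each feature column is shuffled in turn and the increase in MAE is recorded.

```python
# Sketch of permutation feature importance (Eq. 9) using MAE as the error metric.
import numpy as np

def permutation_importance_mae(model, X: np.ndarray, y: np.ndarray, seed: int = 0):
    rng = np.random.default_rng(seed)
    mae_original = np.mean(np.abs(model.predict(X) - y))
    pfi = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])   # shuffle one input column
        mae_permuted = np.mean(np.abs(model.predict(X_perm) - y))
        pfi.append(mae_permuted - mae_original)        # Eq. (9)
    return np.array(pfi)
```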

8. Results and Discussions

8.1. Performance of ML Models

After the hyperparameter selection and training process, the ML models are tested on new and unseen data to assess their performance and fitting capacity. Fig. 7 shows scatter plots of the predicted and actual values with an error range of ±10%, along with the linear fitting curve. The DNN2 model performance is the most robust, as most predictions fall within the error lines (Fig. 7(b)). The GEP1 model has scattered points on the regression plot, showing lower performance (Fig. 7(d)). The DNN2 model has the highest R2 values of 0.914 and 0.94 in the training and testing phases. Similarly, the DNN1 model has R2 values of 0.84 and 0.86 in training and testing, respectively. The DNN2 model has higher accuracy than DNN1. The OGPR model has R2 values of 0.89 and 0.91 in training and testing, respectively. After the DNN2 model, OGPR is the most robust and efficient model. The GEP2 model has R2 values of 0.86 and 0.83 in the training and testing phases, while GEP1 has R2


Fig. 7. Regression plots of the proposed 5 models: (a) DNN1, (b) DNN2, (c) OGPR, (d) GEP1, (e) GEP2.

values of 0.71 and 0.76 in training and testing, respectively. The DNN2, OGPR and DNN1 models attained excellent accuracy with R2 values above 0.8, demonstrating a good correlation between predicted and actual data points. Fig. 8 shows experimental and predicted data along with the absolute errors in predictions. The absolute error line clearly shows the errors and their magnitude at each sample index. The DNN2 model has the lowest error values (Fig. 8(b)), followed by the OGPR model (Fig. 8(c)). The GEP1 model has the highest magnitude and frequency of errors (Fig. 8(d)). The results show that the DNN2 and OGPR models outperformed both GEP models in prediction accuracy during the training and testing phases. Among the DNN and GEP models, DNN2 and GEP2 had higher prediction accuracy in both phases.

8.2. Comparison of Developed Models

Moreover, statistical indicators including R2, RMSE, MAE, PI, and OBJ, were used to evaluate the performance of the ML models.


Fig. 8. Absolute error plots of the proposed 5 models: (a) DNN1, (b) DNN2, (c) OGPR, (d) GEP1, (e) GEP2.

The results are shown in Fig. 9(a)-(e). DNN2 has the highest R2 value and GEP1 has the lowest R2 value. The DNN2 model R2 was 3.3% and 13.5% higher than that of OGPR and GEP2 in the testing phase. The DNN2 model performed 8.86% and 9.3% better than DNN1 in training and testing, respectively, in terms of R2. Similarly, the GEP2 model R2 is 21.12% and 9.21% better than GEP1 in training and testing, respectively. The DNN2 model performed 20.32% and 31.5% better than OGPR and GEP2 in terms of MAE in the testing phase. Similarly, GEP2 and DNN2 MAE were 13.1% and 31.5% better than GEP1 and DNN1 in the testing phase. The DNN2 model performed 18.53% and 40.14% better than OGPR and GEP2 in terms of RMSE, while GEP2 and DNN2 RMSE were 15.47% and 34.8% better than GEP1 and DNN1 in the testing phase. Better model performance is indicated by PI and OBJ values that are closer to 0 [76]. The PI values for the training phase were DNN2 (0.055), OGPR (0.062), DNN1 (0.079), GEP2 (0.074), and GEP1 (0.115). Similarly, the PI values for the testing phase were DNN2 (0.038), OGPR (0.047), DNN1 (0.061), GEP2 (0.067), and GEP1 (0.083). The DNN2 model has the lowest OBJ value of 0.048. The OBJ values of the other models are OGPR (0.056), DNN1 (0.072), GEP2 (0.071), and GEP1 (0.102). The constructed models, especially DNN2 and OGPR, have given robust performance for split tensile strength prediction of FRAC, comparable to STS prediction of RA concrete in the literature [36,37,42].
Moreover, the hyperparameters were changed to select the optimal model for prediction of the split tensile strength of FRAC. The evaluation of the DNN models considered the number of hidden layers (HL) as a governing factor in optimizing the network's results and reducing deviations. Analysis of the training and testing phases indicated that DNN2 (with 3 HL) exhibited a superior degree of forecasting accuracy compared to DNN1 (with 2 HL). It is evident that certain key parameters significantly impacted the prediction performance of the GEP models. As observed, influential parameters for GEP include the number of genes, head size, number of chromosomes, linking function and function set. Notably, the GEP2 model provided a more accurate adjustment to real-world data compared to GEP1, primarily because of its higher number of genes, head size, and chromosomes (see Table 3). Previous research has also indicated that a greater number of chromosomes is associated with improved fitness but may yield suboptimal results on testing or validation data. Furthermore, it is evident that the Bayesian optimizer and the hyperparameter-tuned Gaussian process regressor (GPR)


Fig. 9. Values of Statistical indicators (a) R2 (b) RMSE (c) MAE (d) PI (e) OBJ.

demonstrated notably superior predictive abilities for STS prediction, with results similar to the DNN2 model. The hyperparameters of
GPR, including the base function, kernel function, kernel scale, and sigma values, were accurately optimized through a Bayesian
optimization process across 30 iterations. The most optimal hyperparameters and those associated with minimal error were


determined after 24 iterations for split tensile strength. Furthermore, Table 5 provides a detailed account of the hyperparameters
optimized for the OGPR model during the training and validation phases for split tensile strength.

8.3. Training accuracy and cross validation

For the deep neural networks, the model training accuracy was plotted to visualize the training progress. Fig. 10 shows the training progress in terms of accuracy and depicts the model improvement during training. The accuracy curve does not show any underfitting or overfitting. Convergence is achieved at 5000 epochs, after which the models show no improvement in accuracy. Moreover, in this study a random state function with value zero is used for all models to make the results reproducible. Therefore, the same 80% of the data is used for training and the remaining 20% is used for testing. However, fixing the database in this manner may cause overfitting, affecting the accuracy of the models. Therefore, cross validation with 5 folds is used to address this issue. The training data was split into five sets using the cross-validation method; one set is used for model validation and four sets for training. As a result, the procedure performs five rounds, validating each time with a fresh set of data. The average of each iteration's individual performance is used to evaluate the final performance [80]. The cross-validation function helps prevent data overfitting and raises the predictive power of the model. Fig. 11 shows the cross-validation accuracy of the machine learning models for the training and validation datasets through box plots, following Rahman et al. [81]. The box shows the interquartile range, with the horizontal line denoting the median value. The upper and lower whiskers extend up to 1.5 times the interquartile range beyond the third and first quartiles, respectively. The cross-validation accuracy for training and validation ranges from 68% to 96%. The highest performance is shown by DNN2, with accuracy ranging from 88% to 94%. For both training and testing data, the GEP1 model shows the highest dispersion. The average accuracy of the machine learning models ranges from 72% to 91% and 72.6% to 92.4% for the training and validation data, respectively.
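The 5-fold scheme described above can be reproduced with a few lines of Python, as sketched below for a generic scikit-learn style regressor; the study's MATLAB workflow is only approximated here.

```python
# Sketch of 5-fold cross validation, reporting the mean and spread of R2 over folds.
import numpy as np
from sklearn.model_selection import KFold

def cross_validate_r2(model, X, y, n_splits=5, seed=0):
    scores = []
    for train_idx, val_idx in KFold(n_splits=n_splits, shuffle=True, random_state=seed).split(X):
        model.fit(X[train_idx], y[train_idx])
        y_pred = model.predict(X[val_idx])
        ss_res = np.sum((y[val_idx] - y_pred) ** 2)
        ss_tot = np.sum((y[val_idx] - y[val_idx].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return np.mean(scores), np.std(scores)   # averaged over the 5 folds
```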

8.4. Sensitivity analysis

8.4.1. Relevancy factor


In this study, the relevancy factor (r) values were calculated for the input parameters considering their impact on the forecasted values of STS using the DNN2 and OGPR models. The r-values provide insight into the influence of each individual input variable on the model outputs [78]. A higher r-value indicates a stronger impact on the forecasting performance of the model. The results are displayed in Fig. 12. Among the inputs, RCA and Dmax_RCA exhibit a particularly strong inverse relationship with strength performance. Higher doses of RCA content result in a significant reduction in STS. On the other hand, the analysis shows that cement, NCA, Den_RCA, SP and fibers have a positive influence on the strength values. Moreover, RCA, Dmax_RCA, WA_RCA and water content have the most negative effect on STS values. The cement content and water content are generally known as the two most significant components determining not only the fresh qualities but also the hardened features of concrete; therefore, this is in excellent accord with the established understanding of concrete. Increased recycled aggregate content in recycled aggregate concrete (RAC) results in less natural aggregate being used, which decreases strength. This is because recycled aggregates often have lower strength qualities than natural aggregates owing to internal voids and fractures. As a result, the splitting tensile strength of RAC is further reduced. Furthermore, using recycled aggregates increases the interfacial transition zones (ITZs), which are weak places inside the concrete. As a result, the recycled aggregate content has a considerable effect on the splitting tensile strength of FRAC, which is consistent with previous machine learning studies [1,82]. Previous research has reported similar results: Pan et al. [42] revealed that cement content, water content, RCA content, and maximum aggregate size of RCA have the highest effect on the outcome; Amin et al. [37] also mention that cement content, NCA content, RCA content, and water content have the highest influence on the output; and Han et al. [1] reveal a lesser impact of fiber volume and type. Similar results for the influence of raw ingredients on the split tensile strength of RA concrete are reported in [36].

8.4.2. Permutation feature importance


The Permutation Feature Importance is a widely recognized indicator used to assess the influence of input variables on model
outputs and the desired properties of matrices. The PFI values obtained for STS using the DNN2 and OGPR models are presented in

Table 5
Optimized hyperparameters of OGPR model.
Phase STS

Training • Basic function: Linear


• Optimizer: Bayesian optimization.
• Kernel function: Nonisotropic exponential
• Kernel scale: 14.6
• Sigma: 15.3
• Standardize: false.
Validation • Basic function: Linear
• Optimizer: Bayesian optimization.
• Kernel function: Nonisotropic rational quadratic
• Kernel scale: 0.0574
• Sigma: 153.5
• Standardize: false.


Fig. 10. Accuracy vs number of epoch for deep neural networks.

Fig. 11. Cross validation accuracy of models (a) Training dataset (b) Testing dataset.

Fig. 12. Relevancy factor graph.

Table 6. A PFI value close to 0 indicates a lower impact, while a higher value indicates a more significant role in determining the output information [77]. The MAE values for the entire dataset, including training and test data points, using the DNN2 and OGPR models were approximately 0.55 and 0.54, respectively. The analysis revealed that the order of the higher PFI values for STS was cement > RCA > SP > Den_RCA and so on, confirming the significance of these factors. The inclusion of RCA in concrete mixtures has a substantial
impact on the strength properties. The findings from the sensitivity analysis can be effectively utilized in practical applications to
design optimal RAC mixtures for achieving desirable strength performance.


Table 6
PFI values of permuted features on DNN2 and OGPR models for all data points.
Strength Properties Input Features DNN2 OGPR

MAE perm PFI MAE perm PFI

Split tensile strength (STS) Water 0.61 0.16 0.59 0.138


Cement 0.64 0.181 0.61 0.153
NCA 0.48 0.026 0.47 0.03
RCA 0.61 0.158 0.58 0.121
Dmax_RCA 0.51 0.057 0.527 0.067
Den_RCA 0.56 0.099 0.56 0.095
WA_RCA 0.56 0.102 0.521 0.061
Fiber_vol 0.51 0.049 0.48 0.016
Fiber_types 0.52 0.068 0.51 0.049
SP 0.55 0.097 0.56 0.098

8.5. Analysis of hex contour map for correlations

Fig. 13 provides the results of the correlation analysis conducted on the combined training and test datasets for STS. The hex contour maps illustrate the influence of each input variable on the outputs, as well as the frequency distribution of the input and output data. The regions with thicker and darker colors indicate the most frequently occurring data points of the inputs. RCA content is distributed from 57 to 1500 kg/m3. It can be seen that at higher RCA content, STS values are lower. Similarly, RCA density is distributed between 2000 and 2600 kg/m3. At higher RCA density, STS values are higher. This validates the results of the sensitivity analysis, namely that RCA content has a negative effect while the density of RCA has a positive effect.

9. Conclusions

This study employed various prediction models to forecast the split tensile strength of fiber reinforced recycled aggregate concrete. The models utilized in the research included two deep neural network (DNN) methods, namely DNN1 and DNN2, one optimizable Gaussian process regression (OGPR) model, and two models based on gene expression programming (GEP) known as GEP1 and GEP2. The study considered ten input variables with a total of 257 data points collected from a published paper, with 80% used for the training phase and the remainder for the test phase. Statistical parameters were employed to compare the performance of the models and evaluate their predictive capabilities.

1. DNN2 achieved the highest R2 value of 0.94, while OGPR and GEP2 have R2 values of 0.91 and 0.83. DNN2 and GEP2 performed 9.3% and 9.21% better than DNN1 and GEP1 in terms of R2.
2. DNN2 has the lowest MAE and RMSE of 0.187 MPa and 0.255 MPa, respectively. DNN2 MAE and RMSE are 20.43% and 18.53% better than OGPR, and 31.5% and 40.14% better than GEP2, respectively. DNN2 and GEP2 MAE performed 31.5% and 13.1% better than DNN1 and GEP1. Similarly, in terms of RMSE, DNN2 and GEP2 performed 34.7% and 13.25% better than DNN1 and GEP1.
3. PI values for the testing phase were DNN2 (0.038), OGPR (0.047), DNN1 (0.061), GEP2 (0.067), and GEP1 (0.083). The DNN2 model has the lowest OBJ value of 0.048. OBJ values of the other models are OGPR (0.056), DNN1 (0.072), GEP2 (0.071), and GEP1 (0.102).
4. Five-fold cross validation was used to check the stability of the models. DNN2 has the highest cross-validation accuracy, ranging from 88% to 94%. The ML models' accuracy ranges between 72% and 91% for training and between 72.6% and 92.4% for validation.
5. Sensitivity analysis considering the relevancy factor and PFI revealed that the most significant positive contribution is from cement and the most significant negative contribution is from RCA. The analysis shows that cement, natural coarse aggregate, density of RCA, SP and fibers have a positive influence on the strength values. Moreover, RCA, Dmax_RCA, WA_RCA and water content have the most negative effect on STS values. The PFI analysis revealed that the order of the higher PFI values for STS was cement > RCA > SP > Den_RCA, confirming the significance of these factors. The inclusion of RCA in concrete mixtures has a substantial impact on the strength properties.

9.1. Limitations and future work

This is a data mining study that involves the analysis of a dataset, the assessment of multiple algorithms, and a thorough sensitivity study. The study also employs the programming-based GEP technique in addition to algorithm-based approaches, demonstrating the diversity of this research, but it is essential to address its limitations and constraints. It is crucial to emphasize that the employed machine learning techniques are optimized for the specific input parameters. Introducing new input parameters will require fresh training, testing, and hyperparameter optimization. The accuracy of the prediction models is intricately tied to the completeness of the data. This study uses 10 input variables with 257 datapoints, but it is also crucial to study more input variables to assess their importance in predicting the split tensile strength of FRAC. These ML models should also be used to predict other mechanical properties of FRAC such as tensile strength, corrosion resistance, toughness, and durability by using a large database that considers an extensive number of explanatory variables. Similarly, the current database should be explored with other cutting-edge ML models such as the Adaptive Neuro Fuzzy


Fig. 13. Hex contour analysis.

Inference System (ANFIS), gradient boosting techniques (GBT), multivariate adaptive regression splines (MARS), multi-layer perceptron (MLP), etc. These two steps will help compare and verify the performance of a variety of ML models on the same dataset and also aid in the exploration of other mechanical properties of FRAC.


Declaration of Competing Interest

We have no conflicts of interest to disclose.

Data Availability

Data will be made available on request.

Acknowledgement

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for
funding this research work through the project number INST024.

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.cscm.2023.e02836.

References

[1] X. Han, K. Cui, Q. Xiao, Physics assisted machine learning methods for predicting the cracking performance of recycled aggregate concrete, (2022).
[2] S. Zhou, C. Lu, X. Zhu, F. Li, Preparation and characterization of high-strength geopolymer based on BH-1 lunar soil simulant with low alkali content,
Engineering 7 (2021) 1631–1645, https://doi.org/10.1016/j.eng.2020.10.016.
[3] H. Huang, Y. Yuan, W. Zhang, L. Zhu, Property assessment of high-performance concrete containing three types of fibers, Int. J. Concr. Struct. Mater. 15 (2021),
https://doi.org/10.1186/s40069-021-00476-7.
[4] K.H. Younis, K. Pilakoutas, Strength prediction model and methods for improving recycled aggregate concrete, Constr. Build. Mater. 49 (2013) 688–701.
[5] L. Sun, C. Wang, C. Zhang, Z. Yang, C. Li, P. Qiao, Experimental investigation on the bond performance of sea sand coral concrete with FRP bar reinforcement for
marine environments, Adv. Struct. Eng. (2022), https://doi.org/10.1177/13694332221131153.
[6] C. Zhou, J. Wang, X. Shao, L. Li, J. Sun, X. Wang, The feasibility of using ultra-high performance concrete (UHPC) to strengthen RC beams in torsion, J. Mater.
Res. Technol. 24 (2023) 9961–9983, https://doi.org/10.1016/j.jmrt.2023.05.185.
[7] K.H. Yang, H.S. Chung, A.F. Ashour, Influence of type and replacement level of recycled aggregates on concrete properties, Acids Mater. J. 105 (2008), https://
doi.org/10.14359/19826.
[8] H. Katkhuda, N. Shatarat, Improving the mechanical properties of recycled concrete aggregate using chopped basalt fibers and acid treatment, Constr. Build.
Mater. 140 (2017) 328–335.
[9] C.R. Meesala, Influence of different types of fiber on the properties of recycled aggregate concrete, Struct. Concr. 20 (2019) 1656–1669.
[10] M. Wang, X. Yang, W. Wang, Establishing a 3D aggregates database from X-ray CT scans of bulk concrete, Constr. Build. Mater. 315 (2022), https://doi.org/
10.1016/j.conbuildmat.2021.125740.
[11] K.R. Akça, Ö. Çakır, M. İpek, Properties of polypropylene fiber reinforced concrete using recycled aggregates, Constr. Build. Mater. 98 (2015) 620–630.
[12] B. Ali, L.A. Qureshi, Influence of glass fibers on mechanical and durability performance of concrete with recycled aggregates, Constr. Build. Mater. 228 (2019)
116783.
[13] D. Gao, L. Zhang, Flexural performance and evaluation method of steel fiber reinforced recycled coarse aggregate concrete, Constr. Build. Mater. 159 (2018)
126–136.
[14] F. Yuan, L. Cheng, X. Shao, Z. Dong, L. Zhang, G. Wu, X. He, Full-field measurement and fracture and fatigue characterizations of asphalt concrete based on the
SCB test and stereo-DIC, Eng. Fract. Mech. 235 (2020) 107127.
[15] X. Hu, Q. Li, Z. Wu, S. Yang, Modelling fracture process zone width and length for quasi-brittle fracture of rock, concrete and ceramics, Eng. Fract. Mech. 259
(2022), https://doi.org/10.1016/j.engfracmech.2021.108158.
[16] L. Chen, Z. Chen, Z. Xie, L. Wei, J. Hua, L. Huang, P.S. Yap, Recent developments on natural fiber concrete: a review of properties, sustainability, applications,
barriers, and opportunities, Dev. Built Environ. 16 (2023), https://doi.org/10.1016/j.dibe.2023.100255.
[17] F. Khademi, M. Akbari, S.M. Jamal, M. Nikoo, Multiple linear regression, artificial neural network, and fuzzy logic prediction of 28 days compressive strength of
concrete, Front. Struct. Civ. Eng. 11 (2017) 90–99.
[18] A. Sadrmomtazi, J. Sobhani, M.A. Mirgozar, Modeling compressive strength of EPS lightweight concrete using regression, neural network and ANFIS, Constr.
Build. Mater. 42 (2013) 205–216, https://doi.org/10.1016/j.conbuildmat.2013.01.016.
[19] H. Zhou, F. Jiangao, L. Huang, Y. Hu, Z. Xie, Z. Zeng, M. Liu, B. Wang, X. Zhou, Early shrinkage modeling of complex internally confined concrete based on
capillary tension theory, Buildings 13 (2023) 2201, https://doi.org/10.3390/buildings13092201.
[20] U. Atici, Prediction of the strength of mineral admixture concrete using multivariable regression analysis and an artificial neural network, Expert Syst. Appl. 38
(2011) 9609–9618.
[21] M.H. Fazel Zarandi, I.B. Türksen, J. Sobhani, A.A. Ramezanianpour, M.H.F. Zarandi, I.B. Türksen, J. Sobhani, A.A. Ramezanianpour, Fuzzy polynomial neural
networks for approximation of the compressive strength of concrete, Appl. Soft Comput. 8 (2008) 488–498, https://doi.org/10.1016/J.ASOC.2007.02.010.
[22] H. Tang, Y. Yang, H. Li, L. Xiao, Y. Ge, Effects of chloride salt erosion and freeze–thaw cycle on interface shear behavior between ordinary concrete and self-
compacting concrete, Structures 56 (2023), https://doi.org/10.1016/j.istruc.2023.104990.
[23] M.A. DeRousseau, E. Laftchiev, J.R. Kasprzyk, B. Rajagopalan, W.V. Srubar, A comparison of machine learning methods for predicting the compressive strength
of field-placed concrete, Constr. Build. Mater. 228 (2019) 116661, https://doi.org/10.1016/j.conbuildmat.2019.08.042.
[24] D.-C. Feng, Z.-T. Liu, X.-D. Wang, Y. Chen, J.-Q. Chang, D.-F. Wei, Z.-M. Jiang, Machine learning-based compressive strength prediction for concrete: an adaptive boosting approach, Constr. Build. Mater. 230 (2020) 117000, https://doi.org/10.1016/j.conbuildmat.2019.117000.
[25] E. Ford, S. Kailas, K. Maneparambil, N. Neithalath, Machine learning approaches to predict the micromechanical properties of cementitious hydration phases
from microstructural chemical maps, Constr. Build. Mater. 265 (2020) 120647, https://doi.org/10.1016/j.conbuildmat.2020.120647.
[26] P.O. Awoyera, M.S. Kirgiz, A. Viloria, D. Ovallos-Gazabon, Estimating strength properties of geopolymer self-compacting concrete using machine learning
techniques, J. Mater. Res. Technol. 9 (2020) 9016–9028, https://doi.org/10.1016/j.jmrt.2020.06.008.
[27] A. Ahmad, K. Chaiyasarn, F. Farooq, W. Ahmad, S. Suparp, F. Aslam, Compressive strength prediction via gene expression programming (Gep) and artificial
neural network (ann) for concrete containing rca, Buildings 11 (2021) 324, https://doi.org/10.3390/buildings11080324.
[28] Z.H. Duan, S.C. Kou, C.S. Poon, Prediction of compressive strength of recycled aggregate concrete using artificial neural networks, Constr. Build. Mater. 40
(2013) 1200–1206, https://doi.org/10.1016/j.conbuildmat.2012.04.063.

[29] F. Deng, Y. He, S. Zhou, Y. Yu, H. Cheng, X. Wu, Compressive strength prediction of recycled concrete based on deep learning, Constr. Build. Mater. 175 (2018)
562–569, https://doi.org/10.1016/j.conbuildmat.2018.04.169.
[30] S.V. Patil, K. Balakrishna Rao, G. Nayak, Prediction of recycled coarse aggregate concrete mechanical properties using multiple linear regression and artificial
neural network, J. Eng. Des. Technol. (2021).
[31] D.K. Bui, T. Nguyen, J.S. Chou, H. Nguyen-Xuan, T.D. Ngo, A modified firefly algorithm-artificial neural network expert system for predicting compressive and
tensile strength of high-performance concrete, Constr. Build. Mater. 180 (2018) 320–333, https://doi.org/10.1016/j.conbuildmat.2018.05.201.
[32] A. Behnood, K.P. Verian, M. Modiri Gharehveran, Evaluation of the splitting tensile strength in plain and steel fiber-reinforced concrete based on the
compressive strength, Constr. Build. Mater. 98 (2015) 519–529, https://doi.org/10.1016/j.conbuildmat.2015.08.124.
[33] D. Nagarajan, T. Rajagopal, N. Meyappan, A comparative study on prediction models for strength properties of LWA concrete using artificial neural network,
Rev. La Constr. 19 (2020), https://doi.org/10.7764/RDLC.19.1.103-111.
[34] P. Guo, W. Meng, M. Xu, V.C. Li, Y. Bao, Predicting mechanical properties of high-performance fiber-reinforced cementitious composites by integrating
micromechanics and machine learning, Materials 14 (2021), https://doi.org/10.3390/ma14123143.
[35] S. Ray, M. Haque, T. Ahmed, T.T. Nahin, Comparison of artificial neural network (ANN) and response surface methodology (RSM) in predicting the compressive and splitting tensile strength of concrete prepared with glass waste and tin (Sn) can fiber, J. King Saud Univ. - Eng. Sci. 35 (2023), https://doi.org/10.1016/j.jksues.2021.03.006.
[36] Y. Zhu, A. Ahmad, W. Ahmad, N.I. Vatin, A.M. Mohamed, D. Fathi, Predicting the splitting tensile strength of recycled aggregate concrete using individual and
ensemble machine learning approaches, Crystals 12 (2022), https://doi.org/10.3390/cryst12050569.
[37] M.N. Amin, A. Ahmad, K. Khan, W. Ahmad, S. Nazar, M.I. Faraz, A.A. Alabdullah, Split tensile strength prediction of recycled aggregate-based sustainable
concrete using artificial intelligence methods, Materials 15 (2022) 4296.
[38] J. de-Prado-Gil, O. Zaid, C. Palencia, R. Martínez-García, Prediction of splitting tensile strength of self-compacting recycled aggregate concrete using novel deep
learning methods, Mathematics 10 (2022), https://doi.org/10.3390/math10132245.
[39] B.P. Koya, S. Aneja, R. Gupta, C. Valeo, Comparative analysis of different machine learning algorithms to predict mechanical properties of concrete, Mech. Adv.
Mater. Struct. 29 (2022) 4032–4043, https://doi.org/10.1080/15376494.2021.1917021.
[40] G.A. Lyngdoh, M. Zaki, N.M.A. Krishnan, S. Das, Prediction of concrete strengths enabled by missing data imputation and interpretable machine learning, Cem.
Concr. Compos. 128 (2022) 104414.
[41] Q. Zhang, H. Habibi, Comparison of data mining methods to predict mechanical properties of concrete with fly ash and alccofine, J. Mater. Res. Technol. 15
(2021) 2188–2201, https://doi.org/10.1016/j.jmrt.2021.09.024.
[42] X. Pan, Y. Xiao, S.A. Suhail, W. Ahmad, G. Murali, A. Salmi, A. Mohamed, Use of artificial intelligence methods for predicting the strength of recycled aggregate
concrete and the influence of raw ingredients, Materials 15 (2022) 4194.
[43] H. Nguyen, T. Vu, T.P. Vo, H.T. Thai, Efficient machine learning models for prediction of concrete strengths, Constr. Build. Mater. 266 (2021) 120950, https://
doi.org/10.1016/j.conbuildmat.2020.120950.
[44] K. Yan, H. Xu, G. Shen, P. Liu, Prediction of splitting tensile strength from cylinder compressive strength of concrete by support vector machine, Adv. Mater. Sci.
Eng. 2013 (2013), https://doi.org/10.1155/2013/597257.
[45] J. de-Prado-Gil, C. Palencia, P. Jagadesh, R. Martínez-García, A comparison of machine learning tools that model the splitting tensile strength of self-compacting
recycled aggregate concrete, Materials 15 (2022), https://doi.org/10.3390/ma15124164.
[46] M. Shang, H. Li, A. Ahmad, W. Ahmad, K.A. Ostrowski, F. Aslam, P. Joyklad, T.M. Majka, Predicting the mechanical properties of RCA-based concrete using
supervised machine learning algorithms, Materials 15 (2022), https://doi.org/10.3390/ma15020647.
[47] J. Zhou, W. Huang, F. Chen, Facilitating machine learning model comparison and explanation through a radial visualisation, Energies 14 (2021), https://doi.
org/10.3390/en14217049.
[48] A. De Coster, N. Musliu, A. Schaerf, J. Schoisswohl, K. Smith-Miles, Algorithm selection and instance space analysis for curriculum-based course timetabling,
J. Sched. 25 (2022), https://doi.org/10.1007/s10951-021-00701-x.
[49] M. Fernández-Delgado, E. Cernadas, S. Barro, D. Amorim, Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15 (2014).
[50] R. Caruana, A. Niculescu-Mizil, An empirical comparison of supervised learning algorithms, ACM Int. Conf. Proc. Ser. (2006), https://doi.org/10.1145/1143844.1143865.
[51] S. Uddin, A. Khan, M.E. Hossain, M.A. Moni, Comparing different supervised machine learning algorithms for disease prediction, BMC Med. Inform. Decis. Mak.
19 (2019), https://doi.org/10.1186/s12911-019-1004-8.
[52] L. Yang, A. Shami, On hyperparameter optimization of machine learning algorithms: Theory and practice, Neurocomputing 415 (2020) 295–316, https://doi.
org/10.1016/j.neucom.2020.07.061.
[53] J. Wu, X.-Y.Y. Chen, H. Zhang, L.-D.D. Xiong, H. Lei, S.-H.H. Deng, Hyperparameter optimization for machine learning models based on Bayesian optimization,
J. Electron. Sci. Technol. 17 (2019) 26–40, https://doi.org/10.11989/JEST.1674-862X.80904120.
[54] H. Huang, M. Guo, W. Zhang, M. Huang, Seismic behavior of strengthened RC columns under combined loadings, J. Bridge Eng. 27 (2022) 05022005, https://doi.org/10.1061/(ASCE)BE.1943-5592.0001871.
[55] A. Hillerborg, M. Modéer, P.-E. Petersson, Analysis of crack formation and crack growth in concrete by means of fracture mechanics and finite elements, Cem.
Concr. Res. 6 (1976) 773–781.
[56] Z.P. Bažant, J. Planas, Fracture and Size Effect in Concrete and Other Quasibrittle Materials, 2019, https://doi.org/10.1201/9780203756799.
[57] J. Fox, Applied Regression Analysis and Generalized Linear Models, Sage, 2008.
[58] W. Black, B.J. Babin, Multivariate data analysis: its approach, evolution, and impact, in: Gt. Facil., 2019, https://doi.org/10.1007/978-3-030-06031-2_16.
[59] A. Alyaseen, A. Poddar, N. Kumar, S. Tajjour, C.V.S.R. Prasad, H. Alahmad, P. Sihag, High-performance self-compacting concrete with recycled coarse
aggregate: Soft-computing analysis of compressive strength, J. Build. Eng. 77 (2023), https://doi.org/10.1016/j.jobe.2023.107527.
[60] S. Chithra, S.R.R. Senthil Kumar, K. Chinnaraju, F. Alfin Ashmita, A comparative study on the compressive strength prediction models for high performance concrete containing nano silica and copper slag using regression analysis and artificial neural networks, Constr. Build. Mater. 114 (2016) 528–535, https://doi.org/10.1016/j.conbuildmat.2016.03.214.
[61] İ.B. Topçu, C. Karakurt, M. Sarıdemir, Predicting the strength development of cements produced with different pozzolans by neural network and fuzzy logic,
Mater. Des. 29 (2008) 1986–1991.
[62] İ.B. Topçu, M. Sarıdemir, Prediction of mechanical properties of recycled aggregate concretes containing silica fume using artificial neural networks and fuzzy
logic, Comput. Mater. Sci. 42 (2008) 74–82.
[63] J. Sobhani, M. Najimi, A.R. Pourkhorshidi, T. Parhizkar, Prediction of the compressive strength of no-slump concrete: a comparative study of regression, neural
network and ANFIS models, Constr. Build. Mater. 24 (2010) 709–718, https://doi.org/10.1016/j.conbuildmat.2009.10.037.
[64] A.A.M. Ahmed, Prediction of dissolved oxygen in Surma River by biochemical oxygen demand and chemical oxygen demand using the artificial neural networks
(ANNs), J. King Saud. Univ. Sci. 29 (2017) 151–158.
[65] M.A. Getahun, S.M. Shitote, Z.C. Abiero Gariy, Artificial neural network based modelling approach for strength prediction of concrete incorporating agricultural
and construction wastes, Constr. Build. Mater. 190 (2018) 517–525, https://doi.org/10.1016/j.conbuildmat.2018.09.097.
[66] M.A. Haque, B. Chen, M.F. Javed, F.E. Jalal, Evaluating the mechanical strength prediction performances of fly ash-based MPC mortar with artificial intelligence
approaches, J. Clean. Prod. 355 (2022) 131815, https://doi.org/10.1016/j.jclepro.2022.131815.
[67] G. Kopsiaftis, E. Protopapadakis, A. Voulodimos, N. Doulamis, A. Mantoglou, Gaussian process regression tuned by bayesian optimization for seawater intrusion
prediction, Comput. Intell. Neurosci. 2019 (2019) 1–12, https://doi.org/10.1155/2019/2859429.
[68] H. Cui, J. Bai, A new hyperparameters optimization method for convolutional neural networks, Pattern Recognit. Lett. 125 (2019) 828–834, https://doi.org/
10.1016/j.patrec.2019.02.009.
[69] J.R. Koza, Genetic programming as a means for programming computers by natural selection, Stat. Comput. 4 (1994) 87–112.

[70] A. Gholampour, A.H. Gandomi, T. Ozbakkaloglu, New formulations for mechanical properties of recycled aggregate concrete using gene expression
programming, Constr. Build. Mater. 130 (2017) 122–145, https://doi.org/10.1016/j.conbuildmat.2016.10.114.
[71] C. Ferreira, Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence, 2006. 〈https://books.google.com/books?
hl=en&lr=&id=NkG7BQAAQBAJ&oi=fnd&pg=PR7&dq=Ferreira,+C.,+Gene+expression+programming:+mathematical+modeling+by+an+artificial+
intelligence.+Vol.+21.+2006:+Springer.&ots=YZpsyC1iAY&sig=NlLz-31qU2f8LhqD3yCMXCRunNk〉 (accessed December 17, 2019).
[72] M. Sarıdemir, Genetic programming approach for prediction of compressive strength of concretes containing rice husk ash, Constr. Build. Mater. 24 (2010)
1911–1919, https://doi.org/10.1016/j.conbuildmat.2010.04.011.
[73] L. Chen, Z. Wang, A.A. Khan, M. Khan, M.F. Javed, A. Alaskar, S.M. Eldin, Development of predictive models for sustainable concrete via genetic programming-
based algorithms, J. Mater. Res. Technol. 24 (2023) 6391–6410, https://doi.org/10.1016/j.jmrt.2023.04.180.
[74] A.A.M. Ahmed, S.M.A. Shah, Application of adaptive neuro-fuzzy inference system (ANFIS) to estimate the biochemical oxygen demand (BOD) of Surma River,
J. King Saud. Univ. - Eng. Sci. 29 (2017) 237–243, https://doi.org/10.1016/j.jksues.2015.02.001.
[75] H. Madani, M. Kooshafar, M. Emadi, Compressive strength prediction of nanosilica-incorporated cement mixtures using adaptive neuro-fuzzy inference system
and artificial neural network models, Pract. Period. Struct. Des. Constr. 25 (2020), https://doi.org/10.1061/(ASCE)SC.1943-5576.0000499.
[76] M.F. Iqbal, Q.F. Liu, I. Azim, X. Zhu, J. Yang, M.F. Javed, M. Rauf, Prediction of mechanical properties of green concrete incorporating waste foundry sand
based on gene expression programming, J. Hazard. Mater. 384 (2020) 121322, https://doi.org/10.1016/j.jhazmat.2019.121322.
[77] C. Benamara, K. Gharbi, M. Nait Amar, B. Hamada, Prediction of wax appearance temperature using artificial intelligent techniques, Arab. J. Sci. Eng. 45 (2020)
1319–1330.
[78] M.R. Ahmad, B. Chen, J.-G. Dai, S.M.S. Kazmi, M.J. Munir, Evolutionary artificial intelligence approach for performance prediction of bio-composites, Constr.
Build. Mater. 290 (2021) 123254, https://doi.org/10.1016/j.conbuildmat.2021.123254.
[79] Y. Peng, C. Unluer, Analyzing the mechanical performance of fly ash-based geopolymer concrete with different machine learning techniques, Constr. Build.
Mater. 316 (2022) 125785, https://doi.org/10.1016/j.conbuildmat.2021.125785.
[80] M.C. Kang, D.Y. Yoo, R. Gupta, Machine learning-based prediction for compressive and flexural strengths of steel fiber-reinforced concrete, Constr. Build. Mater.
266 (2021) 121117, https://doi.org/10.1016/j.conbuildmat.2020.121117.
[81] J. Rahman, K.S. Ahmed, N.I. Khan, K. Islam, S. Mangalathu, Data-driven shear strength prediction of steel fiber reinforced concrete beams using machine
learning approach, Eng. Struct. 233 (2021) 111743, https://doi.org/10.1016/j.engstruct.2020.111743.
[82] J.M.V. Gómez-Soberón, Porosity of recycled concrete with substitution of recycled concrete aggregate: an experimental study, Cem. Concr. Res. 32 (2002)
1301–1311.
