
Engineering Structures 208 (2020) 110331

Contents lists available at ScienceDirect

Engineering Structures
journal homepage: www.elsevier.com/locate/engstruct

Data-driven machine-learning-based seismic failure mode identification of reinforced concrete shear walls

Sujith Mangalathu (a), Hansol Jang (b), Seong-Hoon Hwang (c), Jong-Su Jeon (c)

(a) Equifax Inc., Atlanta, GA 30040, USA
(b) Department of Civil, Architectural and Environmental Engineering, The University of Texas, Austin, TX 78712, USA
(c) Department of Civil and Environmental Engineering, Hanyang University, Seoul 04763, Republic of Korea

ARTICLE INFO

Keywords: Failure mode classification; Machine learning; Reinforced concrete shear wall; Critical input parameters

ABSTRACT

A reinforced concrete shear wall is one of the most critical structural members in buildings, in terms of carrying lateral loads. Despite its importance, post-earthquake reconnaissance and recent experimental studies have highlighted the insufficient safety margins of shear walls. The lack of empirical and mechanics-based models prevents rapid failure mode identification of existing shear walls. This study builds on recent advances in the area of machine learning to determine the failure mode of shear walls as a function of geometric configurations, material properties, and reinforcement details. This study assembles a comprehensive database consisting of 393 experimental results for shear walls with various geometric configurations. Eight machine learning models, including Naïve Bayes, K-Nearest Neighbors, Decision Tree, Random Forest, AdaBoost, XGBoost, LightGBM, and CatBoost, were evaluated in this study, in order to establish the best prediction model. As a result of detailed evaluation, a machine learning model based on the Random Forest method is proposed in this paper. The proposed method has 86% accuracy in identifying the failure mode of shear walls. This study also demonstrates that aspect ratio, boundary element reinforcement indices, and wall length-to-wall thickness ratio are the critical parameters influencing the failure mode of shear walls. Finally, an open-source data-driven classification model that can be used in design offices across the world is provided in this paper. The proposed model has the flexibility to account for additional experimental results yielding new insights.

1. Introduction

Shear walls are a commonly used lateral load resisting system in buildings, and recent earthquakes have highlighted the importance of the performance of shear walls during seismic events. However, evaluations of existing buildings revealed the lack of ductile detailing of shear walls [1].

It has been noted in experimental studies that a shear wall can fail in flexure, shear, flexure-shear, sliding shear, or out-of-plane, depending on the geometric configurations and material properties. Knowing the susceptibility of a shear wall to a specific failure mode can help engineers complete a structural performance analysis and decide on proper retrofitting strategies. Past experiments have shown that the common notion that squat shear walls are susceptible to the shear failure mode is not true [2]. The failure mode of walls is determined by a complex set of design parameters, not by only one or two parameters. The strategy currently employed to identify the failure mode is based on detailed continuum-based finite element models [3]. Although such a detailed evaluation is valuable for the seismic performance evaluation of individual buildings, its application is limited in identifying the vulnerability of building portfolios in a short time, which is a typical scenario in regional risk assessment. The advent of data-driven approaches provides a viable alternative to the computationally intensive numerical models for risk and vulnerability assessment.

Mangalathu and Jeon [4] suggested a machine learning model to identify the failure mode of beam-column joints. The authors assembled an experimental database to construct an easy-to-use machine learning model for rapid assessment of the failure scenarios of beam-column joints. Huang and Burton [5] used machine learning methods to classify the failure mode of reinforced concrete frames with infills. As shown by Mangalathu and Jeon [6], the machine learning model for classifying the failure mode of reinforced concrete columns is about 10% more accurate than the existing finite element-based approach. Based on a limited dataset of 97 reinforced masonry shear walls, Siam et al. [7]

Corresponding author.

E-mail addresses: sujithmss@gatech.edu (S. Mangalathu), let908@utexas.edu (H. Jang), shwang46@hanyang.ac.kr (S.-H. Hwang),
jongsujeon@hanyang.ac.kr (J.-S. Jeon).

https://doi.org/10.1016/j.engstruct.2020.110331
Received 24 October 2019; Received in revised form 2 January 2020; Accepted 31 January 2020
0141-0296/ © 2020 Elsevier Ltd. All rights reserved.
S. Mangalathu, et al. Engineering Structures 208 (2020) 110331

suggested a clustering algorithm to classify the masonry walls. Kiani et al. [8] explored the application of machine learning algorithms for the derivation of fragility curves. Other data-driven approaches have been directed to the generation of seismic vulnerability curves of infrastructural systems intended for regional risk assessment [9,10]. Despite the advancement in the application of machine learning techniques for failure mode and damage assessment of infrastructure systems, no studies have yet been carried out on data-driven approaches for the failure mode identification of shear walls. Such a study is critical, as shear walls are the most common lateral load resisting system, and there is no mechanics- or empirical-based model for the failure mode identification of shear walls. In addition, most of the existing studies in earthquake engineering are limited to traditional machine learning models such as Naïve Bayes, Random Forest, Decision Tree, and Discriminant analysis [4,6–8]. However, gradient boosting methods are popular machine learning methods these days, due to their efficiency, accuracy, and interpretability [11,12]. These methods are backed by solid theoretical results that show how combining weaker models (base predictors) iteratively in a greedy manner leads to strong predictors. Despite their popularity in other fields, no studies in the field of earthquake engineering have been carried out to explore these methods. To fill this gap, and to utilize recently advanced machine learning methods, this paper explores the application of boosting models such as adaptive boosting (AdaBoost) [13], extreme gradient boosting (XGBoost) [14], light gradient boosting machine (LightGBM) [12], and categorical boosting (CatBoost) [15,16]. Note that the current study is the first attempt to explore these boosting methodologies for failure mode identification. Since the performance of a machine learning model depends on the data under consideration, there is a critical need to utilize the current state-of-the-art machine learning models for the failure mode prediction of shear walls. Specifically, this study had the following objectives:

(1) To assemble an experimental database of shear walls. The database includes geometric properties, reinforcement pattern, and material properties. Such an open-source database contributes to the data-driven approach for the seismic performance evaluation of buildings.
(2) To evaluate the performance of various machine learning models in the classification of failure modes of shear walls. Various machine learning models such as Naïve Bayes, K-Nearest Neighbors, Decision Tree, Random Forest, AdaBoost, XGBoost, LightGBM, and CatBoost are used in this study to establish a classification model.
(3) To identify the importance of the input parameters on the failure mode of shear walls. Such an identification helps researchers to plan proper experimental investigations in order to understand the seismic behavior of shear walls.
(4) To create an open-source data-driven classification model that can be used in design offices across the world for the rapid failure mode prediction of shear walls. The open-source model has the flexibility to incorporate more experimental data as it becomes available, thus allowing for continuous improvements in the classification model.

A brief summary of the failure modes, and the existing approaches for predicting the failure modes, are outlined below. Subsequently, the experimental database is explored to ascertain key insights. The following sections summarize the machine learning models, and the application of machine learning methods to obtain the best prediction model. The salient insights obtained from the study are provided in the conclusion section.

2. Failure mode of shear walls

Reinforced concrete shear walls, commonly used in building systems, are subjected to axial loads, bending moments, and shear forces. Shear walls are generally categorized as either slender (tall, high-rise) or squat (low-rise), depending on their aspect ratio (shear span length divided by wall length). Slender walls are more likely to have a ductile failure mechanism dominated by flexural yielding near the base. Due to their geometry, squat walls tend to have shear-controlled failure mechanisms, which are more likely to exhibit an unpredicted shear failure that can be described as a sudden drop of stiffness and strength under seismic loads.

ASCE 41-06 [18] indicates that walls are considered slender (normally controlled by flexure) if their aspect ratio is greater than 3.0, and are considered short or squat (normally controlled by shear) if their aspect ratio is less than 1.5. If walls have an intermediate aspect ratio between 1.5 and 3.0, they exhibit behavior affected by both shear and flexure. FEMA 306 [19] suggests that ductile flexural failure typically occurs in well-designed and relatively tall shear walls (with aspect ratios greater than 3.0), given they have enough shear strength. As shown in Fig. 1(a), flexural failure occurs due to the crushing of concrete or the fracture of longitudinal reinforcement in the plastic hinge zone. Flexural failure is not commonly observed in squat shear walls, especially those with aspect ratios less than 1.0. Nevertheless, flexural failure can happen in such walls, depending on their reinforcement details, and it may be observed together with shear failure.

Paulay and Priestley [17] classified the shear failure mode of squat shear walls into three cases: diagonal tension failure, diagonal compression failure, and sliding shear failure, as shown in Fig. 1(b) through (d). Their description of these failure modes is summarized as follows. Diagonal tension failure is likely to occur when the wall has insufficient horizontal shear reinforcement, and is characterized by one or more corner-to-corner diagonal cracks, as shown in Fig. 1(b). Diagonal compression failure occurs when a wall has large and adequate horizontal shear reinforcement; the concrete may crush under diagonal compression with widespread crack patterns, as shown in Fig. 1(c). Shear walls with boundary elements have a higher potential for diagonal compression failure, compared to their counterparts with rectangular cross sections, because of their ability to develop higher flexural strength, which increases the shear demand on the web [20]. By limiting nominal shear stress and providing sufficient horizontal shear reinforcement, diagonal tension or compression failure can be inhibited. Sliding shear failure occurs due to either (1) large cracks at the wall base or (2) the crushing of concrete and buckling of rebars over a narrow band along the base of the wall, as shown in Fig. 1(d), after significant yielding in the flexural reinforcement. Note that the out-of-plane failure mode is not considered in the current paper, and further studies are needed in that direction for the application of data-driven approaches to predict this failure mode.

3. Experimental database

As the success of the machine learning model depends on a well-constructed database, the detailed description of the database, statistical analysis of input parameters, and key insights are presented in this section.

3.1. Description of experimental database for reinforced concrete shear walls

This study establishes an experimental database consisting of 393 one-story, one-bay reinforced concrete shear walls with rectangular or non-rectangular (barbell or flanged) cross sections. Most of the specimens used in this database were extracted from two existing databases [21,22] (see https://purr.purdue.edu/publications/2434/1 and www.dap.series.upatras.gr for public data and references) and the remaining part of the database is newly collected by the authors, based on existing experimental test results. The collected data is provided in Supplementary Material. The intention of the study is to generate a machine learning model to suggest the failure mode identification for a wide variety of cases, and the specimens with no axial load were also included in the database. Of all the specimens, 238 have a rectangular cross section, 95 have a barbell type cross section, and 60 have a flanged cross section (Fig. 2). Shear walls with boundary elements (barbell and flanged sections) can usually achieve higher peak shear strength than their counterparts with rectangular cross sections [23]. This is associated with the fact that the boundary elements


Fig. 1. Typical failure modes of shear walls: (a) flexural failure, (b) diagonal tension failure, (c) diagonal compression failure, and (d) sliding shear failure.

provide additional reinforcement and web confinement. All the shear walls have symmetric cross sections. Additionally, all the specimens have continuous longitudinal reinforcement without lap splice, and deformed and straight reinforcement. Design parameters of the 393 shear walls in the updated database have the following ranges:

• Aspect ratio: 0.25 ≤ M/Vlw ≤ 4.10
• Wall length to wall thickness ratio: 4.35 ≤ lw/tw ≤ 57.0
• Concrete compressive strength: 13.7 MPa ≤ fc' ≤ 130.8 MPa
• Vertical reinforcement of web: 0.000 ≤ ρvw ≤ 0.037
• Horizontal reinforcement of web: 0.000 ≤ ρhw ≤ 0.037
• Vertical reinforcement of boundary element: 0.000 ≤ ρvc ≤ 0.100
• Horizontal reinforcement of boundary element: 0.000 ≤ ρhc ≤ 0.130
• Yield strength of vertical reinforcement in web: 0 MPa ≤ fy,vw ≤ 1,001 MPa
• Yield strength of horizontal reinforcement in web: 0 MPa ≤ fy,hw ≤ 1,262 MPa
• Yield strength of vertical reinforcement in boundary element: 0 MPa ≤ fy,vc ≤ 776 MPa
• Yield strength of horizontal reinforcement in boundary element: 0 MPa ≤ fy,hc ≤ 1,253 MPa
• Gross cross sectional area of wall: 7,800 mm² ≤ Ag ≤ 825,375 mm²
• Area of boundary element to gross sectional area of wall: 0.0 ≤ Ab/Ag ≤ 0.443
• Axial load ratio: 0.0 ≤ P/fc'Ag ≤ 0.499

where M/Vlw is the aspect ratio of the wall, which is defined as the shear span length (defined as the base moment demand (M) divided by the base shear demand (V)) divided by the wall length (lw); tw is the wall (web) thickness; fc' is the compressive strength of the concrete cylinder; ρvw is the vertical reinforcement ratio of the web (the ratio of the total area of vertical reinforcement in the web to the cross sectional area of the web); ρhw is the horizontal reinforcement ratio of the web (the ratio of the total area of horizontal reinforcement in the web to the side area of the web); ρvc is the vertical reinforcement ratio of the boundary element (the ratio of the total area of vertical reinforcement in the boundary element to the cross sectional area of the boundary element); ρhc is the horizontal reinforcement ratio of the boundary element (the ratio of the total area of horizontal reinforcement in the boundary element to the side area of the boundary element); fy,vw is the yield strength of the vertical reinforcement in the web; fy,hw is the yield strength of the horizontal reinforcement in the web; fy,vc is the yield strength of the vertical reinforcement in the boundary element; fy,hc is the yield strength of the horizontal reinforcement in the boundary element; Ab is the area of the boundary element; Ag is the gross sectional area of the wall; and P is the axial compressive load on the wall. Note that Ab/Ag is used to account for the contribution of the size of the boundary elements to the gross section of the shear walls with barbell and flanges. Also, the design parameters ρvc, ρhc, fy,vc, and fy,hc are considered to account for the effect of the boundary element on the failure mode, whether the cross section of the wall is rectangular or non-rectangular.

Fig. 2. Cross section shape of walls: rectangular (R), barbell (B), and flanged (Fl).
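As an illustration of the aspect ratio M/Vlw and of the ASCE 41-06 thresholds quoted in Section 2, the following is a minimal sketch in Python. The function names, the example wall, and the treatment of the exact boundary values 1.5 and 3.0 are assumptions made for this example and are not part of the authors' code. The paper's argument is that this single-parameter rule is not sufficient on its own, so the sketch shows only the baseline that the data-driven model is meant to improve upon.

```python
def aspect_ratio(moment, shear, wall_length):
    """Return M/(V*lw): the shear span (M/V) divided by the wall length lw."""
    if shear <= 0 or wall_length <= 0:
        raise ValueError("shear and wall_length must be positive")
    return (moment / shear) / wall_length


def asce41_category(ratio):
    """Map an aspect ratio to the ASCE 41-06 category quoted in Section 2.

    Assumption of this sketch: ratios exactly equal to 1.5 or 3.0 are
    treated as intermediate.
    """
    if ratio > 3.0:
        return "slender"       # normally controlled by flexure
    if ratio < 1.5:
        return "squat"         # normally controlled by shear
    return "intermediate"      # affected by both shear and flexure


# Hypothetical wall: M = 900 kN*m, V = 300 kN, lw = 1.5 m
ratio = aspect_ratio(moment=900.0, shear=300.0, wall_length=1.5)
print(ratio, asce41_category(ratio))
```

Any consistent system of units can be used, because the ratio is dimensionless.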


Fig. 3. Distribution of design and response parameters.

In the database, the number of specimens with flexural failure, flexure-shear failure, shear failure, and sliding shear failure is 152, 96, 122, and 23, respectively. Fig. 3 illustrates the distribution of the above design parameters, as well as the distribution of the cross section shape of walls: rectangular (R), barbell (B), and flanged (Fl).

3.2. Selection of input parameters for identification of failure mode

Since studies on the classification of failure mode for reinforced concrete shear walls in a probabilistic manner are very scarce, this study employs existing studies on the failure mode identification of other members (reinforced concrete beam-column connections and columns) recently performed by the authors [4,6] to set up input parameters based on design parameters described in the previous section. Use of dimensionless input parameters results in dimensionless coefficients in a fitted prediction model, which are independent of the system of units. Thus, for the convenience of potential users, all input parameters used in this study are dimensionless quantities derived from the design parameters described above. Some of the design parameters, such as M/Vlw, lw/tw, Ab/Ag, and P/fc'Ag, are regarded as input parameters. The first three parameters are used to account for the configuration of shear walls. Design parameters regarding the reinforcement, such as reinforcement ratio and yield strength, are considered by introducing the reinforcement index, which was used in the


Fig. 4. Distribution of input parameters.

previous studies to classify the failure mode of reinforced concrete members [4,6]. The reinforcement index is the reinforcement ratio times the associated yield strength normalized to the concrete compression strength (fc'). The reinforcement index can be interpreted as the strength ratio, which is defined to incorporate both the quantity and strength of materials in the section. In this study, four reinforcement indices are used: the web vertical reinforcement index (ρvwfy,vw/fc'), the web horizontal reinforcement index (ρhwfy,hw/fc'), the boundary element vertical reinforcement index (ρvcfy,vc/fc'), and the boundary element horizontal reinforcement index (ρhcfy,hc/fc'). In addition, the cross section shape of the shear walls is added as an input parameter. Here, three shapes of cross section are considered: rectangular (R), barbell (B), and flanged (Fl). Fig. 4 shows the distribution of input parameters computed using the design parameters. Note that because the input parameters X1 = M/Vlw, X2 = lw/tw, X7 = P/fc'Ag, and X9 = Ab/Ag were already shown in Fig. 3, they are not shown in Fig. 4. It is clear that there is no regular pattern regarding the distribution of input parameters. Additionally, Fig. 5 shows the correlation between the input parameters used for the identification of failure mode for shear walls. The correlation coefficient indicates the strength of the relationship between the relative movements of two parameters. It is seen from Fig. 5 that some parameters are strongly correlated while others are weakly correlated. For example, the correlation coefficient between the cross section shape and Ab/Ag was –0.871, which indicates a strong negative relationship between the two. The correlation coefficient between M/Vlw and P/fc'Ag was 0.513; the correlation coefficient between lw/tw and section shape (or Ab/Ag) was –0.624 (or 0.571). Weak correlations exist between M/Vlw and section shape (or Ab/Ag). The correlation is an indication of the need for nonlinear classification algorithms. Note that the current study is limited to a specific set of input parameters, and further studies are needed that account for all the input parameters that might have an influence on the failure mode of walls.

3.3. Selection of failure mode types for response parameter

Following the descriptions in Section 2, Grammatikou et al. [22] defined the failure mode of reinforced concrete walls with the following five classes to develop their strength and deformation capacity prediction models: flexure, diagonal tension, diagonal compression, sliding shear, and special case of squat walls failing in shear. In their database, most of the specimens exhibiting shear failure in diagonal tension and compression failed in shear after flexural yielding. However, some of the existing references did not explicitly express a specific type of flexure-shear failure mode (either diagonal tension or compression). For this reason, this research regards the failure mode of reinforced concrete walls as being categorized into the following four cases: flexural failure (F), flexure-shear failure (FS), shear failure (S), and sliding shear failure (SL), which are denoted as 1, 2, 3, and 4, respectively, in the model code (Section 7). In the database, the number of specimens exhibiting flexural failure, flexure-shear failure, shear failure, and sliding shear failure was 152, 96, 122, and 23, respectively, as shown in Fig. 3.

4. Overview of machine learning techniques

Eight machine learning models were used in this study to establish the best failure mode classification algorithm: (1) Naïve Bayes, (2) K-Nearest Neighbors, (3) Decision Tree, (4) Random Forest, (5) AdaBoost, (6) XGBoost, (7) LightGBM, and (8) CatBoost. Although algorithms such as Naïve Bayes, K-Nearest Neighbors, Decision Tree, and Random Forest have been explored in the field of structural engineering [4,5,8,10], advanced algorithms such as AdaBoost, XGBoost, LightGBM, and CatBoost have not been explored. In the following section, f represents the failure mode [flexure (F), flexure-shear (FS), shear (S), or sliding shear (SL)], X represents the input vector with the nine parameters (M/Vlw, lw/tw, ρvwfy,vw/fc', ρhwfy,hw/fc', ρvcfy,vc/fc', ρhcfy,hc/fc', P/fc'Ag, section shape, and Ab/Ag), Y represents the predicted output mode, and x represents a specific observation. Only a brief description of the various machine learning methods is given in this section; interested readers can refer to Friedman et al. [24] for a more detailed description.

4.1. Naïve Bayes

A Naïve Bayes classifier simplifies learning by assuming that the effect of an attribute value on a given class is independent of the values of other attributes. The classifier is based on Bayes' theorem:

$$ \Pr\nolimits_f(x) = \Pr(Y = f \mid X = x) = \frac{\pi_f \, f_f(x)}{\sum_{i=1}^{K} \pi_i \, f_i(x)} \tag{1} $$

where π_f is the prior probability that a randomly chosen observation comes from the fth class, f_f(x) represents the density function of an observation that comes from the fth class, and the sum in the denominator runs over the K classes. π_f is computed in the current study as the fraction of the training observations belonging to the fth class, and a Gaussian distribution is assumed for f_f(x).
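Eq. (1) with a Gaussian class-conditional density can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the variance floor, and the toy data (two features loosely standing for M/Vlw and lw/tw, two classes) are all hypothetical. The "naïve" independence assumption appears as the product of one-dimensional Gaussian densities over the features.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior and the per-feature mean and variance of each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small floor keeps every variance positive (an assumption of this sketch).
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        prior = n / len(X)  # fraction of training observations in this class
        model[label] = (prior, means, variances)
    return model


def posterior(model, x):
    """Return Pr(Y = f | X = x) for every class f, following Eq. (1)."""
    joint = {}
    for label, (prior, means, variances) in model.items():
        density = 1.0
        for value, m, var in zip(x, means, variances):
            density *= math.exp(-((value - m) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
        joint[label] = prior * density      # numerator of Eq. (1)
    total = sum(joint.values())             # denominator of Eq. (1)
    return {label: value / total for label, value in joint.items()}


# Hypothetical toy data: [aspect ratio, lw/tw] with labels S (shear) and F (flexure).
X = [[0.5, 10.0], [0.7, 12.0], [0.6, 9.0], [2.5, 20.0], [3.0, 25.0], [2.8, 22.0]]
y = ["S", "S", "S", "F", "F", "F"]
model = fit_gaussian_nb(X, y)
probabilities = posterior(model, [0.6, 11.0])
print(max(probabilities, key=probabilities.get))
```

For real data, a library implementation such as scikit-learn's `GaussianNB` would normally be preferred, since it works with log-probabilities and avoids numerical underflow when many features are multiplied together.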


               M/Vlw    lw/tw    ρvwfy,vw/fc'  ρhwfy,hw/fc'  ρvcfy,vc/fc'  ρhcfy,hc/fc'  P/fc'Ag  Section  Ab/Ag
M/Vlw          1.000    -0.433   0.071         -0.069        -0.159        0.122         0.513    0.274    -0.229
lw/tw          -0.433   1.000    0.100         0.290         -0.048        -0.217        -0.317   -0.624   0.571
ρvwfy,vw/fc'   0.071    0.100    1.000         0.461         0.133         0.077         0.143    -0.015   0.011
ρhwfy,hw/fc'   -0.069   0.290    0.461         1.000         0.168         0.314         0.112    -0.235   0.188
ρvcfy,vc/fc'   -0.159   -0.048   0.133         0.168         1.000         0.181         -0.123   0.087    -0.127
ρhcfy,hc/fc'   0.122    -0.217   0.077         0.314         0.181         1.000         0.123    0.189    -0.181
P/fc'Ag        0.513    -0.317   0.143         0.112         -0.123        0.123         1.000    0.233    -0.193
Section        0.274    -0.624   -0.015        -0.235        0.087         0.189         0.233    1.000    -0.871
Ab/Ag          -0.229   0.571    0.011         0.188         -0.127        -0.181        -0.193   -0.871   1.000

Fig. 5. Correlation matrix for input parameters.
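The reinforcement index of Section 3.2 and a correlation matrix of the kind shown in Fig. 5 can be sketched as follows. This is a minimal illustration that assumes the Pearson coefficient is the one reported (the paper only refers to "the correlation coefficient"). The helper names and the four specimens are hypothetical and are not taken from the database.

```python
import math


def reinforcement_index(rho, fy, fc):
    """Reinforcement ratio times yield strength, normalized by fc' (Section 3.2)."""
    return rho * fy / fc


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """columns maps a parameter name to its list of values; returns nested dicts."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


# Hypothetical specimens (illustrative values only); strengths in MPa.
specimens = [
    {"rho_vw": 0.005, "fy_vw": 420.0, "fc": 30.0, "lw_tw": 10.0},
    {"rho_vw": 0.010, "fy_vw": 500.0, "fc": 40.0, "lw_tw": 15.0},
    {"rho_vw": 0.003, "fy_vw": 400.0, "fc": 25.0, "lw_tw": 25.0},
    {"rho_vw": 0.020, "fy_vw": 450.0, "fc": 60.0, "lw_tw": 8.0},
]
columns = {
    "lw/tw": [s["lw_tw"] for s in specimens],
    "web vertical index": [
        reinforcement_index(s["rho_vw"], s["fy_vw"], s["fc"]) for s in specimens
    ],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

The same helper applies to the other three indices by passing the corresponding ratio and yield strength. How the categorical section shape is encoded numerically before computing its correlations is not stated in this section, so it is left out of the sketch.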

4.2. K-Nearest Neighbors

K-Nearest Neighbors is a non-parametric machine learning algorithm that does not place any assumption on the decision boundaries between the failure modes. In this method, a sample is assigned to the failure mode to which most of its K most similar samples in the feature space belong. The conditional probability for x in failure mode f is then estimated as:

$$ \Pr\nolimits_f(x) = \Pr(Y = f \mid X = x) = \frac{1}{K} \sum_{i \in N_K} I(y_i = f) \tag{2} $$

where N_K represents the K points in the training data that are closest to the observation x, and I(·) is the indicator function.

4.3. Decision Tree

Decision Tree is a non-parametric classification method where the classification problem is split into a hierarchy of simple decisions, with each decision based on only one or several of the input features. The modeling of Decision Tree consists of two steps: (1) tree building and (2) tree pruning. Tree building consists of splitting the training set space into non-overlapping regions, based on the Gini Index [25]. The tree obtained in the building step may have many branches, and to avoid overfitting, tree pruning is carried out next. Cost complexity pruning is adopted in this study, and the pruning parameter is estimated based on the 10-fold cross-validation technique.

4.4. Random Forest

A Random Forest classifier consists of a combination of tree classifiers, in which each classifier is generated using a random vector sampled independently from the input vector [25,26]. Random Forest includes two important methods: random feature subspace and out-of-bag estimates. The former enables a much faster construction of trees, and the latter, the possibility of evaluating the relative importance of each input feature. A more detailed description of Random Forest can be found in the references [10,26,27].

4.5. Boosting Methods: AdaBoost, XGBoost, LightGBM, CatBoost

AdaBoost, XGBoost, LightGBM, and CatBoost are boosting methods that improve the performance of a model by combining a set of weak classifiers to form a strong classifier. The concept involves choosing the weak classifiers in such a way that, when combined, their performance is improved significantly. In the initial step in AdaBoost, all the observations are weighted equally. In the subsequent steps, observations that were incorrectly classified carry more weight than the observations that were correctly classified, and the model is retrained. Thus, the learners are trained based on the weighted classification accuracy of the previous learners. Friedman et al. [24] proposed an alternative boosting approach called gradient boosting, which involves performing regression on a function of the gradient vector of the loss function evaluated at the previous iteration. Methods such as XGBoost, LightGBM, and CatBoost fall into the gradient boosting category. They all use decision


Rows are observed classes and columns are predicted classes. Each cell gives the number of specimens, with its share of the training set in parentheses.

(a) Naïve Bayes
Observed     F           FS          S           SL          Recall
F            82 (30%)    17 (6%)     5 (2%)      2 (1%)      77%
FS           15 (5%)     39 (14%)    12 (4%)     0 (0%)      59%
S            2 (1%)      20 (7%)     61 (22%)    1 (0%)      73%
SL           4 (1%)      3 (1%)      5 (2%)      7 (3%)      37%
Precision    80%         49%         73%         70%         Accuracy: 69%

(b) K-Nearest Neighbors
Observed     F           FS          S           SL          Recall
F            96 (35%)    5 (2%)      3 (1%)      2 (1%)      91%
FS           15 (5%)     46 (17%)    5 (2%)      0 (0%)      70%
S            4 (1%)      8 (3%)      72 (26%)    0 (0%)      86%
SL           4 (1%)      3 (1%)      6 (2%)      6 (2%)      32%
Precision    81%         74%         84%         75%         Accuracy: 80%

(c) Decision Tree, (f) XGBoost, (g) LightGBM, and (h) CatBoost (identical matrices)
Observed     F           FS          S           SL          Recall
F            106 (39%)   0 (0%)      0 (0%)      0 (0%)      100%
FS           0 (0%)      66 (24%)    0 (0%)      0 (0%)      100%
S            0 (0%)      0 (0%)      84 (31%)    0 (0%)      100%
SL           0 (0%)      0 (0%)      0 (0%)      19 (7%)     100%
Precision    100%        100%        100%        100%        Accuracy: 100%

(d) Random Forest
Observed     F           FS          S           SL          Recall
F            106 (39%)   0 (0%)      0 (0%)      0 (0%)      100%
FS           0 (0%)      65 (24%)    1 (0%)      0 (0%)      98%
S            0 (0%)      0 (0%)      84 (31%)    0 (0%)      100%
SL           0 (0%)      1 (0%)      1 (0%)      17 (6%)     89%
Precision    100%        98%         98%         100%        Accuracy: 99%

(e) AdaBoost
Observed     F           FS          S           SL          Recall
F            85 (31%)    17 (6%)     4 (1%)      0 (0%)      80%
FS           13 (5%)     46 (17%)    6 (2%)      1 (0%)      70%
S            7 (3%)      13 (5%)     62 (23%)    2 (1%)      74%
SL           0 (0%)      2 (1%)      2 (1%)      15 (5%)     79%
Precision    81%         59%         84%         83%         Accuracy: 76%
Fig. 6. Confusion matrix of classification models of various machine learning techniques using the training set: (a) Naïve Bayes, (b) K-Nearest Neighbors, (c) Decision
Tree, (d) Random Forest, (e) AdaBoost, (f) XGBoost, (g) LightGBM, and (h) CatBoost.
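The recall, precision, and accuracy values attached to each confusion matrix follow directly from the counts. The sketch below uses the standard definitions (recall is the diagonal entry over its row sum, precision is the diagonal entry over its column sum, and accuracy is the trace over the total) and applies them to the counts read from Fig. 6(a). The function name is hypothetical, and the sketch assumes every row and column contains at least one observation.

```python
def classification_metrics(cm):
    """cm[i][j] is the number of observations of class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    recall = [cm[i][i] / sum(cm[i]) for i in range(n)]
    precision = [cm[j][j] / sum(cm[i][j] for i in range(n)) for j in range(n)]
    accuracy = sum(cm[i][i] for i in range(n)) / total
    return accuracy, precision, recall


# Counts from Fig. 6(a) (Naïve Bayes, training set); rows and columns: F, FS, S, SL.
naive_bayes_train = [
    [82, 17, 5, 2],
    [15, 39, 12, 0],
    [2, 20, 61, 1],
    [4, 3, 5, 7],
]
accuracy, precision, recall = classification_metrics(naive_bayes_train)
print(round(accuracy, 2))                 # 0.69
print([round(p, 2) for p in precision])   # [0.8, 0.49, 0.73, 0.7]
print([round(r, 2) for r in recall])      # [0.77, 0.59, 0.73, 0.37]
```

The printed values match the 69% accuracy and the per-class precision and recall reported for panel (a).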

trees as the base weak learner, and iteratively fit a sequence of such trees using gradient boosting. XGBoost fits the new model by reducing the misclassification error of the previous model. However, XGBoost is slow in implementation, due to its use of sequential model training. LightGBM is a model based on decision tree algorithms, in which the model is generated leaf-wise rather than depth-wise (as in other decision tree-based methods). Such a leaf-wise generation leads to more complex, but accurate, trees. CatBoost is a methodology that successfully handles categorical features in the input parameters, and takes advantage of dealing with them during training, as opposed to during preprocessing time.

5. Identification of failure modes using machine learning techniques

The machine learning techniques described in the previous section were used to identify the failure mode of reinforced concrete shear walls from the assembled database. To use the machine learning models, the authors converted the material, structural, and geometric properties of the shear wall to the nine input parameters described in Section 3.2. The machine learning codes for the models mentioned in Section 4 are developed using the open-source Python package scikit-learn [28] and the link for the code is given in Section 7. Of the assembled data given in the supplementary material section, 70% is used to establish the prediction model (training set), and the remaining 30% is used to evaluate the performance of the prediction model (test set). The division of the entire dataset into the training set and test set was randomly carried out, and the performance of the model on the test set was an indication of the performance of the model on unknown data. In other words, machine learning models are developed using the methods mentioned in Section 4 with 70% of the assembled data.

The performance of each machine learning model was evaluated in more detail using a confusion matrix (Figs. 6 and 7). The confusion matrix is a table of the observed failure mode versus the predicted failure mode. Each element in the confusion matrix Cij (i = 1:4, j = 1:4) is equal to the number of observations known to be in failure mode i, but predicted to be in failure mode j. Therefore, the diagonal elements in the confusion matrix represent the failure modes that were correctly classified by the machine learning algorithm, and the off-diagonal elements represent the failure modes that were not predicted correctly. Three performance measures were used in this study to evaluate the performance of the model: accuracy, precision, and recall. The accuracy is the fraction of predictions the model got right, i.e., the accuracy is the ratio


[Fig. 7 image: eight confusion matrices for the test set (118 specimens). Rows: observed class; columns: predicted class (F, FS, S, SL); last column: recall; last row: precision. The counts recovered from the image are tabulated below.]

(a) Naïve Bayes, accuracy 76%
Observed     F    FS     S    SL   Recall
F           38     4     3     1     83%
FS           6    21     3     0     70%
S            2     5    30     1     79%
SL           1     1     1     1     25%
Precision  81%   68%   81%   33%

(b) K-Nearest Neighbors, accuracy 85%
Observed     F    FS     S    SL   Recall
F           43     1     0     2     93%
FS           7    20     3     0     67%
S            1     3    34     0     89%
SL           0     0     1     3     75%
Precision  84%   83%   89%   60%

(c) Decision Tree, accuracy 80%
Observed     F    FS     S    SL   Recall
F           37     6     3     0     80%
FS           4    22     1     3     73%
S            2     1    31     4     82%
SL           0     0     0     4    100%
Precision  86%   76%   89%   36%

(d) Random Forest, accuracy 86%
Observed     F    FS     S    SL   Recall
F           43     2     0     1     93%
FS           5    21     3     1     70%
S            2     2    34     0     89%
SL           0     0     0     4    100%
Precision  86%   84%   92%   67%

(e) AdaBoost, accuracy 67%
Observed     F    FS     S    SL   Recall
F           36     6     1     3     78%
FS           9    15     5     1     50%
S            4     8    26     0     68%
SL           0     2     0     2     50%
Precision  73%   48%   81%   33%

(f) XGBoost, accuracy 83%
Observed     F    FS     S    SL   Recall
F           41     4     0     1     89%
FS           7    19     3     1     63%
S            2     2    34     0     89%
SL           0     0     0     4    100%
Precision  82%   76%   92%   67%

(g) LightGBM, accuracy 80%
Observed     F    FS     S    SL   Recall
F           39     7     0     0     85%
FS           7    19     3     1     63%
S            3     3    32     0     84%
SL           0     0     0     4    100%
Precision  80%   66%   91%   80%

(h) CatBoost, accuracy 84%
Observed     F    FS     S    SL   Recall
F           40     5     0     1     87%
FS           5    21     3     1     70%
S            2     2    34     0     89%
SL           0     0     0     4    100%
Precision  85%   75%   92%   67%

Fig. 7. Performance evaluation of various machine learning techniques using the test set: (a) Naïve Bayes, (b) K-Nearest Neighbors, (c) Decision Tree, (d) Random
Forest, (e) AdaBoost, (f) XGBoost, (g) LightGBM, and (h) CatBoost.
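
As an illustration of how the three performance measures follow from a confusion matrix, the short sketch below recomputes them for the Random Forest panel, Fig. 7(d). It is not taken from the authors' notebook; the counts are transcribed from the figure, and the variable names (C, labels, recall, precision, accuracy) are chosen here for illustration only.

```python
# Confusion matrix for Random Forest on the test set, transcribed from Fig. 7(d).
# Rows: observed class; columns: predicted class. Class order: F, FS, S, SL.
labels = ["F", "FS", "S", "SL"]
C = [
    [43, 2, 0, 1],   # observed F
    [5, 21, 3, 1],   # observed FS
    [2, 2, 34, 0],   # observed S
    [0, 0, 0, 4],    # observed SL
]

n = len(labels)
row_totals = [sum(C[i]) for i in range(n)]                       # observed counts per class
col_totals = [sum(C[i][j] for i in range(n)) for j in range(n)]  # predicted counts per class

# Recall: correctly classified specimens of a class / all observed specimens of that class.
recall = [C[i][i] / row_totals[i] for i in range(n)]
# Precision: correctly classified specimens of a class / all specimens predicted as that class.
precision = [C[j][j] / col_totals[j] for j in range(n)]
# Accuracy: all correct predictions (the diagonal) / all predictions.
accuracy = sum(C[i][i] for i in range(n)) / sum(row_totals)

for name, p, r in zip(labels, precision, recall):
    print(f"{name:>2}: precision = {p:.0%}, recall = {r:.0%}")
print(f"accuracy = {accuracy:.0%}")
```

For the flexure-shear class this gives 21/30 = 70% recall and 21/25 = 84% precision, and the overall accuracy is 102/118, or about 86%, in agreement with the values reported for the Random Forest model.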

of the number of correct failure mode predictions to the total number of failure mode predictions, and is given in element (5, 5) of the confusion matrix. The percentage of predictions of a given failure mode that are correct is called the precision, which is given in the fifth row of the confusion matrix. On the other hand, the percentage of observed cases of a given failure mode that are correctly identified by the machine learning algorithm is known as the recall, and is given in the fifth column of the confusion matrix. Note that accuracy is a global measure of the performance of the machine learning method, while precision and recall are particular to each failure mode. High values of accuracy, precision, and recall indicate that a model can identify the failure modes correctly. The following inferences can be drawn from Figs. 6 and 7:

• Random Forest had the highest accuracy, with 86% for the test set, followed by KNN (85%) and CatBoost (84%). The fact that tree-based models had better performance is an indication of the complex non-linear decision boundaries that separate the failure modes.
• Often the identification of flexure-shear failure mode is difficult [6], and the Random Forest model had 70% recall and 84% precision in identifying the flexure-shear failure mode in the test set. Only the Decision Tree (DT) has higher recall than Random Forest (RF) for the flexure-shear mode of failure, but it comes at the cost of lower precision for the flexure-shear mode of failure.
• Boosting methods such as AdaBoost, XGBoost, LightGBM, and CatBoost did not improve the performance of the model, compared to the bagging-based Random Forest model. This underscores the need for a detailed evaluation of both simple and advanced models, before establishing a machine-learning-based failure mode identification model.
• In general, non-parametric tree-based methods had better performance compared to parametric non-tree-based methods such as Naïve Bayes. This is because of the non-linear decision boundaries between the failure modes. Amongst the boosting methods, AdaBoost had the lowest performance.
• Figs. 6 and 7 underscore the importance of splitting the data into training and test sets. Training a model solely based on the entire data set may not yield satisfactory performance for unknown data (e.g., the XGBoost model had 100% accuracy for the training set, but only 83% accuracy for the test set).

It is seen from Figs. 6 and 7 that the performance of the model varies, depending on the failure mode under consideration. Based on its total accuracy and its fair performance in identifying the brittle shear


mode of failure, Random Forest is suggested as the machine learning model for shear wall failure mode identification. To evaluate how the performance of the Random Forest model is affected by the input parameters, further analysis was carried out to identify the importance of the input parameters, as shown in Fig. 8. Note that the summation of all the values above the vertical bars in Fig. 8 is 100%. The optimal parameters of Random Forest were established first through a grid search algorithm (provided in detail in the attached code), and the relative importance of each parameter was established by noting the variation in the out-of-bag (OOB) error. As seen in Fig. 8, the aspect ratio, boundary element reinforcement indices, and wall length-to-wall thickness ratio are the critical factors that govern the failure mode of shear walls. It is noted that the cross-section shape has less influence on the failure mode than other parameters.

[Fig. 8 image: bar chart of the relative importance (%) of the input parameters. Bar values in descending order: 20, 14, 14, 13, 9, 8, 8, 7, 4, 2, 1. Most axis labels were lost in extraction; the three smallest bars are labeled Section R (4), Section Fl (2), and Section B (1).]

Fig. 8. Relative importance of input parameters affecting failure mode in Random Forest model.

6. Conclusions

Shear walls are used in lateral force resisting systems to resist wind and earthquake loads. Depending on the loading, geometric, and material configurations, shear walls can fail in shear, flexure, flexure-shear, or sliding. Although the behavior of walls under different configurations has been extensively studied by experiments and earthquake reconnaissance, there is no empirical or mechanics-based methodology to predict the failure mode of shear walls. An easy-to-use failure mode prediction is critical for rapid damage assessment, seismic risk assessment, and retrofitting decision strategies. This paper explores the capability of machine learning and artificial intelligence in failure mode identification of concrete shear walls.

In the initial part of this study, an extensive database is created by the authors, based on the available experiments. The database consists of 393 one-story, one-bay reinforced concrete shear walls with rectangular or non-rectangular (barbell (I-) or flanged) sections. In the database, the number of specimens with flexural failure, flexure-shear failure, shear failure, and sliding shear failure is 152, 96, 122, and 23, respectively. Based on insights from the past studies, 10 input parameters were generated by the authors that can capture the geometry, material properties, and reinforcement details of shear walls. The entire dataset is divided into a training set and a test set. The training set is used to establish the prediction model, and the performance of the model is evaluated through the test set (unknown data). Eight machine learning models (Naïve Bayes, K-Nearest Neighbors, Decision Tree, Random Forest, AdaBoost, XGBoost, LightGBM, and CatBoost) were evaluated in this study. The performance of the models was evaluated using three metrics: global accuracy, precision, and recall. The results of the study showed that Random Forest had the highest accuracy for the training set, followed by CatBoost and XGBoost. In reality, the prediction of flexure-shear mode is difficult. The proposed Random Forest model had 70% recall and 84% precision in identifying the flexure-shear failure mode for the test set. From further exploration of Random Forest, it was determined that the following input parameters are the critical factors governing the failure mode of a shear wall: the aspect ratio, the boundary element reinforcement indices, and the wall length-to-wall thickness ratio.

This study demonstrates the capability of machine learning models in the prediction of the failure mode of shear walls. The open-source data-driven classification model can be used in design offices across the world for rapid failure mode prediction of shear walls. The model has the flexibility to accommodate new experimental results to yield additional insights. The user can update the open-source database and re-run the model to incorporate new experimental results as they become available. In addition, the proposed classification model can help other researchers in planning their experimental studies. The study demonstrates the capability of machine-learning-based failure mode identification models that can be utilized for other structural components. Note that detailed descriptions of the damage state of reinforced concrete walls with different failure modes help engineers make informed retrofit decisions. For example, the flexural failure mode could involve reinforcement buckling and fracture as well as concrete crushing. The current approach only predicts the flexural mode without identifying the reason for the failure mode. Further studies and data are needed for the identification of damage progress patterns with different failure modes.

7. Open source data driven model

The Supplementary Material of this paper provides the database used in the current study, and the Jupyter Notebook Python code for the machine learning model is provided in GitHub (https://github.com/sujithmangalathu/Shear-Wall-Failure-Mode.git).

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF‐2019R1C1C1007780).

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.engstruct.2020.110331.

References

[1] Paulay T. Design aspects of shear walls for seismic areas. Can J Civ Eng 1975;2(3):321–44.
[2] Lefas ID, Kotsovos MD, Ambraseys NN. Behavior of reinforced concrete structural walls: strength, deformation characteristics, and failure mechanism. ACI Struct J 1990;87(1):23–31.
[3] Dashti F, Dhakal RP, Pampanin S. 2014. Numerical simulation of shear wall failure mechanisms. In: The 2014 New Zealand Society for Earthquake Engineering Conference, Auckland, New Zealand.
[4] Mangalathu S, Jeon J-S. Classification of failure mode and prediction of shear strength for reinforced concrete beam-column joints using machine learning techniques. Eng Struct 2018;160:85–94.
[5] Huang H, Burton HV. Classification of in-plane failure modes for reinforced concrete frames with infills using machine learning. J Build Eng 2019;25:100767.
[6] Mangalathu S, Jeon J-S. Machine learning-based failure mode recognition of circular reinforced concrete bridge columns: comparative study. J Struct Eng 2019;145(10):04019104.
[7] Siam A, Ezzeldin M, El-Dakhakhni W. Machine learning algorithms for structural performance classifications and predictions: application to reinforced masonry shear walls. Struct 2019;22:252–65.
[8] Kiani J, Camp C, Pezeshk S. On the application of machine learning techniques to derive seismic fragility curves. Comput Struct 2019;218:108–22.
[9] Seo J, Linzell DG. Use of response surface metamodels to generate system level fragilities for existing curved steel bridges. Eng Struct 2013;52:642–53.
[10] Mangalathu S, Heo G, Jeon J-S. Artificial neural network based multi-dimensional fragility development of skewed concrete bridge classes. Eng Struct 2018;162:166–76.
[11] Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat 2001;29(5):1189–232.


[12] Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y. 2017. LightGBM: a highly efficient gradient boosting decision tree. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017); 3146-3154, 2017.
[13] Freund Y, Schapire RE. Experiments with a new boosting algorithm. Proceedings of the Thirteenth International Conference on Machine Learning. 1996.
[14] Chen T, Guestrin C. 2016. XGBOOST: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 785-794, 2016.
[15] Dorogush AV, Ershov V, Gulin A. 2018. CatBoost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363.
[16] Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. 2018. CatBoost: unbiased boosting with categorical features. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS 2018); 6638-6648, 2018.
[17] Paulay T, Priestley MJN. Seismic Design of Reinforced Concrete and Masonry Buildings. New York, NY: John Wiley & Sons; 1992.
[18] American Society of Civil Engineers. Seismic Rehabilitation of Existing Buildings (41–06). Reston, VA: ASCE; 2006.
[19] Applied Technology Council. 1998. Evaluation of Earthquake Damaged Concrete and Masonry Buildings - Basic procedures manual (FEMA 306). Federal Emergency Management Agency, Washington, DC.
[20] Gulec CK, Whittaker AS. 2009. Performance-Based Assessment and Design of Squat Reinforced Concrete Shear Walls. Report No. MCEER-09-0010. Buffalo, NY: Multidisciplinary Center for Earthquake Engineering Research, University at Buffalo.
[21] Usta M, Pujol S. ACI Subcommittee 445B, Puranam A, Song C, Wang Y. ACI 445B Shear Wall Database. Purdue University Research Repository 2017. https://doi.org/10.4231/R7HH6H39.
[22] Grammatikou S, Biskinis D, Fardis MN. Strength, deformation capacity and failure modes of RC walls under cyclic loading. Bull Earthq Eng 2015;13(11):3277–300.
[23] American Society of Civil Engineers. Seismic Design Criteria for Structures, Systems, and Components in Nuclear Facilities (ASCE/SEI 43–05). Reston, VA: ASCE; 2005.
[24] Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. Springer Series in Statistics. Berlin: Springer; 2001.
[25] Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Boca Raton, FL: Chapman & Hall/CRC; 1984.
[26] Breiman L. Random forests. Mach Learn 2001;45(1):5–32.
[27] Mangalathu S, Jeon J-S. Stripe-based fragility analysis of multispan concrete bridge classes using machine learning techniques. Earthq Eng Struct Dyn 2019;48(11):1238–55.
[28] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res 2011;12:2825–30.
