
Computers and Structures 241 (2020) 106355


Application of pool-based active learning in reducing the number of required response history analyses

Jalal Kiani a, Charles Camp a, Shahram Pezeshk a,*, Naeem Khoshnevis b
a Department of Civil Engineering, University of Memphis, Memphis, TN 38152, USA
b Department of Computer Science, University of Memphis, Memphis, TN 38152, USA

Article history: Received 17 June 2020; Accepted 2 August 2020; Available online 20 August 2020

Keywords: Machine learning; Active learning; Neural networks; Response history analyses; Fragility curve

Abstract

A step-by-step method is presented for reducing the need for a large number of response history analyses (RHAs) in developing surrogates to predict structural responses. These surrogates, which map ground motion features and characteristics of the structural systems into structural responses, are used in deriving fragility curves and are mostly developed using machine learning algorithms. A machine learning algorithm, depending on the complexity of the model, requires a sufficient amount of training data to predict the outputs accurately. For complicated structural models, generating training data can be computationally demanding. Therefore, there is a need to generate the least amount of training data while preserving the accuracy of the prediction models. Toward this goal, a pool-based query-by-committee active learning (AL) algorithm is applied to choose probably the most informative data samples for surrogate training. A pool of unlabeled ground motion (GM) data samples is generated, and a committee of artificial neural networks (ANNs) is defined to choose the smallest subset of the data that would be the most informative training data. The selected data samples meet informativeness, representativeness, and diversity criteria. The results of the applied case study show that implementing AL can considerably reduce the size of the required training data, and consequently the number of RHAs, while improving the performance of the prediction models. Specifically, the findings demonstrate that AL improves the average F-scores of ANNs by about 10% and significantly reduces their variation, which indicates its stable behavior.

© 2020 Elsevier Ltd. All rights reserved.

Abbreviations: RHA, Response History Analysis; GM, Ground Motion; AL, Active Learning; PL, Passive Learning; ANN, Artificial Neural Network; QBC, Query by Committee.
* Corresponding author. E-mail address: spezeshk@memphis.edu (S. Pezeshk).
https://doi.org/10.1016/j.compstruc.2020.106355

1. Introduction

Seismic risk assessment within the performance-based earthquake engineering framework involves performing response history analyses (RHAs). The results from the RHAs ultimately quantify the distribution of the structural responses and are used to derive fragility curves. A critical issue concerning RHAs is the appropriate number of ground motions (GMs) required for performing seismic demand analysis. Based on statistical theory [1], using a large sample of GMs for RHAs leads to a more accurate estimation of the structural responses. However, it is not practical to use a large number of GMs for RHAs with sophisticated nonlinear numerical models because of the demands on computational resources and the long model run times. A procedure that can accurately and sufficiently estimate the seismic demand in structural systems based on a minimal number of RHAs would therefore be a significant improvement. Here, accuracy measures the difference between the structural response resulting from a sample set of GMs and that from a large number of GMs. The estimated structural response is said to be sufficient and stable if the group-to-group variability among the structural responses estimated using different sample sets of GMs is small.

Numerous studies have been conducted to estimate the optimum number of required RHAs and to address the limitations associated with computational resources and time. Kiani et al. [2] reviewed and discussed a large number of research studies [3–9] focused on minimizing the number of required RHAs and examining the reliability of the structural responses obtained from a limited number of RHAs. The findings of these studies suggest that specifying a minimal number of GMs for performing RHAs is a subjective choice. Overall, despite the prevalent importance of the number of RHAs for the accuracy and stability of the estimated structural responses, there is no holistic methodology in the literature that can be used to find the optimal number of required RHAs for an arbitrary structural system.

An alternative approach is to build surrogate models to accelerate large computations. Recent developments in machine learning techniques provide powerful tools to develop surrogate models for predicting the responses of numerical simulations (e.g., [10]). In this case, based on a sufficient amount of training data, surrogate models are generated and then used to map input parameters to output results without demanding significant computational resources or time. Several research studies [11–20] have used machine learning tools for quantifying the distribution of the structural responses and deriving the parameters of fragility curves.

When developing surrogates for RHAs, the prediction model maps GM features and the characteristics of the structural systems, as input data, into the structural responses, as output data. In general, a set of input data samples is referred to as unlabeled data, whereas its combination with the corresponding outputs is called labeled data. The process of developing surrogates using machine learning techniques is called training. Depending on the complexity of the model, the training process requires a sufficient amount of labeled data. Kiani et al. [16] discussed that the performance of different machine learning models in predicting the structural responses significantly depends on the number of observations used for training (i.e., the number of GMs used for RHAs). Although using a large labeled database of GMs leads to a better and more accurate estimation of the seismic demand, labeling a large set of GMs (e.g., categorizing them into collapse or non-collapse classes using the results of RHAs) is often expensive and even impractical for complicated structural systems.

When labeling the input data is computationally demanding, active learning (AL) techniques can be very efficient and beneficial tools for training machine learning models (e.g., see Settles [21] and the references therein). Starting with the least amount of data to train a model, AL trains some prediction models and searches the pool of unlabeled data for the most informative data samples that will improve the performance of the prediction models. Since labeling data is a computationally demanding task, AL can significantly reduce the amount of required training data and improve computational efficiency.

Among the many different AL algorithms, this study utilizes a batch-mode pool-based query-by-committee (QBC) method (e.g., see Khoshnevis and Taborda [22]) to reduce the computational time for training the prediction models. Initially, in this method, a pool of unlabeled data samples is generated. Using the least amount of labeled data samples, several prediction models, called committee members, are trained. Following the guidelines described in the Methodology section, another batch of unlabeled data samples from the pool is chosen and labeled using RHAs. The new batch of data samples is added to the training data, and the process is repeated until either the predefined performance criteria are reached or the computational budget is exhausted.

The structural responses of a 2-D 8-story structural system subjected to a set of GMs selected based on the generalized conditional intensity measure (GCIM) framework [23] are considered. Artificial neural networks (ANNs) are implemented to predict the structural responses from the seismic inputs. The F-score is used to measure the performance of the trained ANNs on an unseen test dataset. Additionally, the effects of the imbalanced dataset, the method for initializing the weights of the ANNs, and the batch size on the performance of the ANNs are investigated. The results show that using AL improves the performance of the ANNs and reduces the variation in the outcomes.

2. Study dataset

The study dataset is a compilation of different elements, including structural characteristics, GM features, and the responses of the structure subjected to the GMs. Each part of the dataset has been generated in previous studies, and those details are presented in the following sections. Results from analyses of an 8-story building subjected to 2000 GM records are utilized to build the dataset. In addition, 22 different features describing the severity of the input GMs are computed. As a result, the dataset includes 2000 data samples with 22 features. This dataset is called the unlabeled data. RHAs are conducted for all samples in the dataset, and the maximum inter-story drift ratio (MIDR) is computed to label them. Because of the required RHAs, labeling the data samples is considered a computationally demanding task.

The following sections provide further information about the applied structural system, the seismic input GMs, and the seismic demand analysis method.

2.1. Structural model

This study is based on the structural responses of an 8-story building with overall dimensions of 36.56 m by 36.56 m in plan and 30.19 m in elevation. The structural system of the applied building includes perimeter special steel moment-resisting frames with reduced beam section connections and an interior gravity load system. The building has a fundamental period of about 2.3 s and was designed for a site located in Los Angeles, CA. Ghassemieh and Kiani [24] provided the design details of the building, including the sizes and arrangement of the beams and columns and the detailing of the connections and panel zones. The building is modeled in OpenSees as a 2-D centerline model using a lumped plasticity approach. Since the structural system is symmetric, the numerical model consists of a single two-dimensional external frame. The structural model accounts for the destabilizing effects of gravity loads by applying one-half of the story weight to leaning columns (i.e., members with very high axial stiffness and negligible flexural stiffness connected to the main frame using rigid elements). Also, the computational model considers stiffness and strength deterioration using the modified Ibarra–Medina–Krawinkler deterioration model [25].

2.2. GM dataset

This study uses the GCIM framework to select a set of GM records consistent with the seismic hazard of the site. The GCIM framework is based on the assumption that all intensity measures (IMs), which characterize the severity of a GM, follow a multivariate lognormal distribution. A detailed procedure for constructing the GCIM target and selecting GMs based on this framework is provided elsewhere [23,26]. Briefly, constructing the GCIM target requires an IM as the primary GM feature (called the conditioning IM) to build the conditional distributions of the other IMs and to link the results of probabilistic seismic hazard analyses to the structural responses. In this study, spectrum intensity (SI) is used as the conditioning IM based on the recommendations of Kiani and Pezeshk [27].

Additionally, the GCIM framework requires considering all GM characteristics recognized as important for predicting the structural responses. Hence, a large number of IMs identified as important in predicting the structural responses [27] are considered, including acceleration spectrum intensity (ASI), displacement spectrum intensity (DSI), peak ground acceleration (PGA), cumulative absolute velocity (CAV), significant duration (i.e., Ds5–75 and Ds5–95), and spectral acceleration (SA) at periods of 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5, 7.5, and 10 s. Additionally, as listed in Table 1, 20 hazard levels are considered, with exceedance probabilities ranging from 50% to 0.005% in 50 years. After building the target GCIM using OpenSHA [28], 100 GMs are selected to match the target over each hazard level (i.e., in total, 20 hazard levels × 100 GMs per hazard level = 2000 GMs). Kiani et al. [2] provide detailed information about the selected GMs and their characteristics.

Table 1
The considered hazard levels and their equivalent earthquake intensity computed for a site located in Los Angeles, CA.

Level No.   Probability of exceedance in 50 years   Intensity level, SI (cm·s/s)
1           50%                                     44.04
2           30%                                     58.69
3           20%                                     69.97
4           10%                                     90.76
5           5%                                      113.30
6           3%                                      131.93
7           2%                                      146.56
8           1.5%                                    157.91
9           1%                                      175.21
10          0.5%                                    205.45
11          0.25%                                   239.56
12          0.2%                                    250.80
13          0.15%                                   266.07
14          0.1%                                    289.19
15          0.05%                                   330.00
16          0.025%                                  375.14
17          0.015%                                  410.63
18          0.01%                                   439.39
19          0.007%                                  466.35
20          0.005%                                  493.29

2.3. Seismic demand analysis

In this study, multiple stripe analysis [29], which is able to implement hazard-consistent GMs over different earthquake IM levels, is used for performing RHAs to capture the distribution of the structural responses in terms of MIDR. Then, all GM samples are labeled into one of two classes based on the resulting MIDR. Data samples with MIDR < 0.03 rad are labeled as Class 1; otherwise, they are labeled as Class 2. Based on the results of pushover analysis [30], there is a significant drop in the lateral resistance of the structural system at this level of MIDR. Hence, it can be assumed that the structure is in a collapse or near-collapse phase. Because of the importance of this phase of structural behavior, it is used for the purposes of this study. Notably, the suggested process can be implemented to derive the parameters of fragility curves for other damage levels in any structural or geotechnical system. To implement the suggested method for an arbitrary case, it is only required to label the GM samples into two classes based on the considered limit state (i.e., whether or not the GM sample exceeds the considered limit state according to the RHA results), as sketched below.
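A minimal sketch of this labeling rule, assuming an array `midr` of maximum inter-story drift ratios returned by the RHAs (the names are illustrative, not the authors' code):

```python
import numpy as np

def label_samples(midr, threshold=0.03):
    # Class 1: MIDR below the 0.03 rad limit state; Class 2: at or above it.
    return np.where(np.asarray(midr) < threshold, 1, 2)
```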
Fig. 1(a) presents the percentage of GM samples exceeding the considered limit state over the different hazard levels. In addition, Fig. 1(b) shows the distribution of spectral acceleration (SA) at the fundamental period of the structural system, SA(T = 2.3 s), over the different earthquake intensity levels for both Class 1 and Class 2. There are no GM samples causing MIDR > 0.03 rad over the first six earthquake intensity levels. In this dataset, the two classes are not balanced, with the majority of GM samples in Class 1 (1474 out of 2000 GMs).

Fig. 1. (a) The percentage of GMs belonging to Class 2 (MIDR greater than 0.03 rad) over each hazard level. (b) The distribution of SA(T = 2.3 s) over different earthquake intensity levels for Class 1 and Class 2. The thick gray line represents the interquartile range, and the thin gray line represents 1.5 times the interquartile range.

2.4. Fragility curves

The labeled data obtained from the previous steps provide the fraction of GMs exceeding the considered limit state out of the total number of GMs used at each earthquake intensity level. From these data, seismic fragility curves are generated using parameter estimation methods such as maximum likelihood. However, this methodology is computationally demanding because it uses a fixed number of GM samples over a large number of hazard levels. Much of the cost of this methodology comes from the labeling of GMs, which are randomly sampled and selected. In classical approaches for deriving seismic fragility curves, the GMs to be labeled are randomly and probabilistically drawn from a target distribution.

Kiani et al. [16] proposed an alternative method using machine learning tools for predicting the structural responses. Their method involves training a prediction model using the labeled data. The trained model can then be implemented to predict the structural responses for arbitrary GMs, and the resulting data points can be applied to derive the parameters of fragility curves. However, developing such a prediction model initially requires a large number of labeled data points. With this problem as motivation, this study applies an AL technique to reduce the number of labeled data points required for training the machine learning models, which are then used for predicting the structural responses.

3. Methodology

The methodology used in this study is based on AL, which builds on the idea that machine learning tools can achieve better performance while using as few labeled observations as possible for training [21]. In AL, the classifier can query the labels of unlabeled data points to be used for training and learning. For this purpose, the active learner, in a series of steps, tries to choose the most valuable data points, with the highest informativeness, representativeness, and diversity, from the pool of unlabeled data points. AL techniques are very useful where unlabeled data samples come at little cost but labeling them is time-consuming and expensive. The main difference between AL and the commonly used passive learning (PL) is the ability to query sample data points based upon past queries and responses. PL builds prediction models based on a large number of randomly selected labeled data points, while AL judiciously asks for the labeled data points that have the best chance of improving the performance of the machine learning models.

In AL, there are several different methodologies for querying the labels of data samples, such as membership query synthesis, stream-based selective sampling, and pool-based sampling [21]. This study uses a pool-based sampling scenario, which assumes that there is a large pool of unlabeled data points. As will be explained later, this technique draws instances from the pool of unlabeled data based on some informativeness measure. Detailed explanations of the other scenarios can be found in Settles [21] and the references therein.

In addition, AL involves evaluating the informativeness of unlabeled data samples. As discussed by Settles [21], there are several methods of formulating such query strategies, including uncertainty sampling, QBC, expected model change, and estimated error reduction. This study implements the QBC algorithm [31], in which a set of models trained on the current labeled data is maintained as committee members. Each committee member, or prediction model, is allowed to vote on the labeling of query candidates.

3.1. The workflow of the applied method

Fig. 2 illustrates the workflow of the proposed methodology for accurately generating a prediction model while reducing the number of required RHAs. Achieving an accurate prediction model based on this methodology involves the following steps.

Fig. 2. Workflow of the proposed methodology for reducing the number of required RHAs.

Step 1. Data collection: collect unlabeled data in the form of earthquake GM records. It is important to select a large number of GMs and to ensure that they are representative of the seismic hazard of the site. See the Study Dataset section for more details.

Step 2. Set up: choose a set of GMs to initialize the training process. When selecting the initial set of GMs, two essential criteria are considered: representativeness and diversity. Diversity means that the selected data samples should scatter across the full input space rather than concentrating in a small local region [32]. In the present case, diversity guarantees that the selected GMs come from a variety of low- to high-intensity earthquake levels. The selected GMs should be diverse; otherwise, the machine learning models will probably learn less from GMs with similar characteristics. Representativeness is a measure of the number of data samples that are similar or close to the target sample [32]. K-means clustering [33] is a common practice for choosing the initial set of data samples in AL algorithms to ensure representativeness and diversity. For this purpose, the unlabeled GM samples are partitioned into k clusters, and the k GM samples (a batch of data with size k) closest to the cluster centers are identified. These data samples are then removed from the pool of unlabeled data and placed in the query list, as sketched below.
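The initial-batch selection of Step 2 can be sketched as follows; `X_pool` (a standardized feature matrix of the 22 IMs), `k`, and the helper name are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def initial_batch(X_pool, k, seed=0):
    # Partition the unlabeled pool into k clusters of GM feature vectors.
    km = KMeans(n_clusters=k, random_state=seed).fit(X_pool)
    # For each cluster center, take the pool sample nearest to it.
    dists = np.linalg.norm(X_pool[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    # np.unique guards against two centers sharing the same nearest sample.
    return np.unique(dists.argmin(axis=0))  # indices moved from the pool to the query list
```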
Step 3. Labeling: perform RHAs to estimate the responses of the structure subjected to the GMs in the query list and determine their labels (e.g., collapse and non-collapse). The output is binary, indicating whether each GM exceeds the considered limit state.

Step 4. Training: train a set of machine learning models (the committee members in the QBC method) using the existing labeled GMs, where the GM features and the binary results of the RHAs are used as the input and output, respectively. Train N committee members using N different combinations of the labeled data (i.e., randomly splitting it into N different combinations of training and validation sets).

Step 5. Prediction: predict the labels of the remaining unlabeled GMs by passing them to the committee members. Each committee member produces a hypothesis for the unseen GM samples in accordance with the previously labeled GMs. The hypotheses produced by each committee member are labels for the unseen GMs.

Step 6. Disagreement measure: select the GMs with the highest informativeness from the pool of unlabeled GMs. Informativeness means that the selected data samples must contain important information such that labeling them would considerably improve the performance of the models. The QBC method is used to examine the informativeness of the unlabeled GM samples based on the disagreement among the committee members. The more the committee members disagree about a GM sample, the more uncertain the GM sample is and, therefore, the more informative it is. For example, if all committee members predict that a specific GM sample causes collapse, this GM sample is not an informative one and, hence, adding it to the query list does not improve the performance of the models. On the other hand, if some committee members predict that a GM sample leads to collapse while others forecast a non-collapse label, this GM sample is a valuable and informative one.

Select the 2k GM samples about which the committee members have the most disagreement based on the labels predicted in the previous step. Vote entropy (VE) is used to quantify the committee disagreement for an individual GM sample GM* [34]:

VE(GM^*) = -\sum_{l} \frac{V(l, GM^*)}{N} \log \frac{V(l, GM^*)}{N}    (1)

where V(l, GM*) is the number of committee members (out of N members) voting for label l for GM*.

Then, half of these GMs (k GM samples) are selected based on diversity and representativeness. For this purpose, the distances between each pair of the 2k samples are computed. Next, the GM sample with the largest sum of distances from the others (call it sample A) is selected and added to the query list. Then, the GM sample closest to sample A is identified and removed from the selected dataset. This process is repeated until there are no remaining GM samples. Choosing the GM sample with the farthest distance from the others satisfies the diversity criterion, and removing the closest GM sample improves the representativeness of the sampling approach. A sketch of this selection step is given after Step 7.

Step 7. Retraining: repeat Step 4 to Step 6 by labeling the query list, retraining the prediction models, and updating the committee members until the labeling budget is exhausted or the machine learning models provide satisfactory performance.
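A minimal sketch of the Step 6 query selection, combining the vote entropy of Eq. (1) with the greedy diversity/representativeness filter described above; the names `committee_preds` (an N-by-pool array of predicted labels), `X_pool`, and `k` are illustrative assumptions:

```python
import numpy as np

def vote_entropy(committee_preds):
    n_members, n_pool = committee_preds.shape
    ve = np.zeros(n_pool)
    for label in np.unique(committee_preds):
        frac = (committee_preds == label).mean(axis=0)   # V(l, GM*)/N for every pool sample
        safe = np.where(frac > 0, frac, 1.0)             # log(1) = 0 removes the 0*log(0) terms
        ve -= frac * np.log(safe)                        # Eq. (1)
    return ve

def select_batch(committee_preds, X_pool, k):
    # Keep the 2k candidates with the largest committee disagreement ...
    remaining = list(np.argsort(vote_entropy(committee_preds))[-2 * k:])
    query = []
    # ... then alternately add the most distant candidate (diversity) and drop its
    # nearest neighbour (representativeness) until the candidates are exhausted.
    while remaining:
        D = np.linalg.norm(X_pool[remaining][:, None] - X_pool[remaining][None, :], axis=2)
        far = int(np.argmax(D.sum(axis=1)))              # "sample A"
        query.append(remaining.pop(far))
        if remaining:
            d_to_A = np.linalg.norm(X_pool[remaining] - X_pool[query[-1]], axis=1)
            remaining.pop(int(np.argmin(d_to_A)))        # remove the sample closest to A
    return np.array(query)                               # roughly k indices to label with RHAs
```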

3.2. The applied machine learning method

Any classification model can be enhanced with AL. Kiani et al. [16] applied a large number of classification methods for predicting the structural responses and found that random forest and ANN models perform better than the others when the data are imbalanced. Hence, this study implements ANNs as the learners for predicting the structural responses given the seismic input. An ANN is based on a collection of interconnected nodes, called neurons, arranged over different layers and loosely modeling the biological neurons in a brain. These artificial neurons are connected by weighted values over different levels. This means that, given an input value, a neuron initially performs some computation, and the resulting value is then multiplied by a weight. After an activation function is applied to the weighted output, the results of each layer are used as the input for the next layer.

Training an ANN involves optimizing the weights to minimize the error between the actual responses and the predicted ones. This study uses the scaled conjugate gradient backpropagation optimization method for training the ANNs, which updates the weights and biases to reduce the error. This algorithm is used because it is computationally efficient for ANNs and can shorten the iterations without reducing the quality of the training and testing results. Initial ANN results indicate that using more than two hidden layers does not improve the performance. Fig. 3 shows the configuration of the ANN, with 22 and 11 neurons in hidden layers 1 and 2, respectively. More importantly, since the input data (IMs) come from different scales, the data are normalized to improve the performance and stability of the ANNs. Additionally, the IMs (22 IMs) that were used earlier for GM selection are considered as the input features.

Fig. 3. (a) The architecture of the applied ANN and (b) the anatomy of a single node or neuron. The input for the ANN is the set of GM features (i.e., IMs), and the output is the structural response (binary class). The hidden layers use the hyperbolic tangent activation function, and the output layer uses the sigmoid activation function.
set. For this purpose, 20% of the entire data is randomly assigned to the trained model using AL, one more case that removes all those
hold-out test set. Also, 80% and 20% of the remaining data are allocated GM samples over Level 1 to Level 5 (in total 500 GM sample) is
to the training and validation sets, respectively. Then, the training and considered since all of them belong to Class 1. The new dataset
validation sets are implemented to fit ANNs and prevent overfitting. includes 1500 GM samples, which 974 of them belong to Class 1,
Later, the hold-out test dataset is used to obtain an unbiased evalua- and 526 of them belong to Class 2. In this case, the ratio of data
tion of the trained ANNs. Additionally, the process of randomly divid- samples in Class 1 to those belong to Class 2 is 1.85; while in the
ing the dataset and fitting ANNs models is repeated 100 times to original dataset this ratio is 2.8. As the imbalance degree in the
reduce the error that might be induced in the results due to randomly new dataset (with 1500 sample GMs) is less than that for the orig-
dividing the dataset into test, training, and validation subsets. inal data (with 1500 sample GMs), it is called as the balanced data-
F-score is used to measure the performance of ANNs in predict- set henceforth. Also, the original dataset is called the imbalanced
ing the structural responses. F-score is the weighted harmonic dataset.
mean of precision and recall. Precision is the ratio of the number For simulations, k GMs are selected at each iteration according
of positive instances that are predicted correctly to the number to the procedure discussed earlier and the process is stopped once
of instances predicted as positive; while recall is defined as the 600 GMs are processed. One key challenge for AL is to choose an
fraction of the number of positive samples that are predicted cor- efficient value for k or the batch size. For the first part of the study,
rectly to the total number of actual positives in the dataset. a batch size of 10 is used, but later the importance of the batch size
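A minimal sketch of this per-class evaluation, assuming label arrays `y_test` and `y_pred` with classes 1 and 2 (illustrative names only):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

def per_class_scores(y_test, y_pred):
    # Report precision, recall, and F-score with each class treated as the positive class.
    return {c: {'precision': precision_score(y_test, y_pred, pos_label=c),
                'recall':    recall_score(y_test, y_pred, pos_label=c),
                'f_score':   f1_score(y_test, y_pred, pos_label=c)}
            for c in (1, 2)}
```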
4. Results and discussion

In this section, the performance of AL in predicting the structural responses is examined using the dataset described above.

4.1. Considered scenarios

Two different sampling conditions are considered: AL, which selects GM samples using the procedure discussed in Section 3.1, and PL, which randomly samples k GMs at each iteration. As explained in Step 4, AL trains a set of committee members (ANNs herein) at each iteration after updating the query list and labeling the data samples in this list. For training the ANNs in the first iteration of AL, the biases are initialized to zero, while the weights are randomly initialized based on the Nguyen–Widrow initialization algorithm [35]. Once the ANNs are trained, they, as committee members, select the most valuable and informative data samples and add them to the query list for labeling based on the procedure explained in Step 6. For the next iteration of AL, the updated labeled data samples are used for training the new committee members. In these iterations, unlike the first iteration, there are two options for retraining the ANNs: first, randomly setting the initial weights; and second, using the weights of the pre-trained models as the starting points. Instead of randomly setting the initial weights, the second method takes the weights of the models trained in the previous iteration as the initial weights of the new models and updates them using the labeled dataset. Both methods are used to examine their impact on the results.

As discussed earlier, the dataset used in this study is not balanced, with most of the data samples belonging to Class 1. To examine the effect of class imbalance on the performance of the models trained using AL, one more case is considered in which all GM samples over Level 1 to Level 5 (in total, 500 GM samples) are removed, since all of them belong to Class 1. The new dataset includes 1500 GM samples, of which 974 belong to Class 1 and 526 belong to Class 2. In this case, the ratio of data samples in Class 1 to those in Class 2 is 1.85, while in the original dataset this ratio is 2.8. As the degree of imbalance in the new dataset (with 1500 GM samples) is less than that of the original dataset (with 2000 GM samples), it is henceforth called the balanced dataset, and the original dataset is called the imbalanced dataset.

For the simulations, k GMs are selected at each iteration according to the procedure discussed earlier, and the process is stopped once 600 GMs have been processed. One key challenge for AL is choosing an efficient value for k, the batch size. For the first part of the study, a batch size of 10 is used, but later the importance of the batch size is examined by considering three more cases with batch sizes of 5, 15, and 20. Even though all GMs are labeled and categorized into two classes, it is assumed that all of them are unlabeled. At each iteration, ten different ANNs (i.e., ten committee members) are trained using the labeled GMs.

In total, the cases presented in Table 2 are considered for this study.

Table 2
The considered methods for training ANNs.

Scenario name   Learning process   Weights       Dataset       Batch size
AL-IMB-T        AL                 Transferred   Imbalanced    10
AL-IMB-R        AL                 Random        Imbalanced    10
AL-BAL-T        AL                 Transferred   Balanced      10
PL-IMB-T        PL                 Transferred   Imbalanced    10
AL-IMB-T-5      AL                 Transferred   Imbalanced    5
AL-IMB-T-15     AL                 Transferred   Imbalanced    15
AL-IMB-T-20     AL                 Transferred   Imbalanced    20

4.2. Properties of the selected GMs by the active learning algorithm

Fig. 4 shows the distribution of all applied GM samples as a function of the first and second principal components, which are linear combinations of the original GM features. This figure also shows the locations of the queried GM samples at different iterations of the AL process. As seen, the proposed methodology keeps track of the used GMs and queries the labels of new GMs. Additionally, Fig. 5 shows the magnitude–distance distribution of all unlabeled GMs together with the locations of the labeled GMs over three different iterations. The results demonstrate that the algorithm initially selects GM samples over a wide range of intensities, magnitudes, and distances because of the diversity and representativeness criteria. In the following iterations, the algorithm still tends to select GMs over a wide range of magnitudes and distances. However, it mainly chooses GM samples around the minimum values of the first principal component, which correspond to high-intensity GMs (i.e., GMs with high SI values). Hence, the committee members show considerable disagreement about GM samples from the high earthquake intensity levels.

Fig. 4. Distribution of the selected GM samples based on AL-IMB-T for repetition 1. (a) shows the initial set of GM samples selected using k-means clustering. (b) and (c) present the GM samples queried by the algorithm after being trained on 10 and 100 training GM samples, respectively.

Fig. 5. Magnitude–distance distribution of the selected GMs based on AL-IMB-T for repetition 1. (a) shows the initial set of GMs selected using k-means clustering. (b) and (c) present the GMs queried by the algorithm after being trained on 10 and 100 training GM samples, respectively.

Fig. 6 presents the percentage of the selected GM samples over the different hazard levels and classes for the different AL scenarios considered. The AL algorithm tends to use GM samples from hazard level 15 (with an exceedance probability of 0.05% in 50 years and SI = 330 cm·s/s) for all considered AL cases. This hazard level, on average, has a contribution of about 18% (i.e., 8.5% from Class 1 and 9.5% from Class 2). The accumulation of more training GM samples at this hazard level suggests that learning over this level is very challenging. Moreover, Fig. 1(a) shows that about 50% of the applied GM samples over this level exceed the considered limit state, meaning that half of them belong to Class 2; because of that, AL chooses approximately equally from both classes over this level. More importantly, the intensity of this level in terms of SI is 330 cm·s/s, which is interestingly very close to the median of the fragility curve obtained from the observed data (μ = 339 cm·s/s). This demonstrates that AL identifies the GM samples from the hazard level with the same intensity as the median of the fragility curve as informative and valuable data points. On the other hand, the lowest contribution is from the low-intensity levels, in which most GMs belong to Class 1 for both the balanced and imbalanced datasets. Specifically, the total contribution of levels 1 to 10 is about 8%, showing that the GMs selected over these hazard levels are not highly valuable for improving the performance of the ANNs and predicting the structural responses. Finally, Fig. 6 shows that there is no significant difference between the results of the four different AL scenarios.

Fig. 6. Bar plot showing the average percentage of used GMs over different hazard levels based on their class for the 10 initial iterations. Note that this figure is the average of the results for 100 repetitions.

Fig. 7 presents the percentage of the GM samples selected from each hazard level over the ten initial iterations of AL-IMB-T. This figure shows that AL initially chooses GM samples over a wide range of hazard levels to account for diversity and representativeness. Specifically, a considerable number of the selected GM samples come from hazard levels 3, 7, 13, and 18. However, as the algorithm proceeds, the contribution of the low-intensity levels significantly decreases, and the algorithm mostly identifies GM samples over hazard levels 15 and 16 as the informative ones. Interestingly, the contribution of the very rare hazard levels (i.e., 17–20) is less than that of levels 15 and 16, whose intensity is close to the median of the fragility curves. For the present case, which is an imbalanced dataset, the algorithm initially has little information about the minority class (Class 2) and tends to select most GM samples from the majority class (Class 1). The algorithm then becomes less certain about Class 2 and, therefore, queries more GM samples from this class for labeling. As shown, after the initial iteration, the model mostly chooses GM samples from Class 2.

Fig. 7. The percentage of the selected GMs from each hazard level over the ten initial iterations for AL-IMB-T. Note that this figure shows the average results for 100 repetitions.

4.3. Performance of active learning versus passive learning

Fig. 8 compares the performance of AL and PL in terms of the F-score per class as a function of the training population size (ranging from 10 to 600 GMs) using the imbalanced dataset. There are a few findings concerning the results presented in this figure:

Fig. 8. (a) Performance of models trained using the AL-IMB-T and PL-IMB-T methods in terms of F-score as a function of the number of training samples. Pink and green dots, respectively, represent single replications of the AL-IMB-T and PL-IMB-T experiments. The solid black line represents the average performance of AL-IMB-T, and the yellow dashed line shows the average performance of PL-IMB-T. A small amount of horizontal random variation is added to each data point to handle overplotting. (b) Density distributions of F-scores in three different population categories. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

• As expected, the performance of the ANNs is significantly higher when Class 1 is considered as the positive class rather than Class 2, irrespective of the applied sampling method. This is mainly due to the unequal distribution of classes within the applied dataset. The imbalanced dataset biases the performance of the prediction models toward the majority class, Class 1. Additionally, a comparison of the density plots shows that the variation of the results is higher for Class 2 than for Class 1.

• The results demonstrate that AL generally provides larger F-scores in comparison to PL. For both classes, the PL models, which are trained on randomly labeled data, do not achieve the same performance as AL given the same number of data samples used for training. More importantly, the efficiency of AL is more visible for Class 2, which is the minority class. Specifically, for Class 2, the performance of the ANNs improves, on average, by 10% when using AL instead of PL. Overall, the gain in performance and the reduction in cost resulting from the proposed AL-based method are clear.

• At the initial iterations (i.e., up to about 40 GM samples for Class 1 and up to about 80 GM samples for Class 2), both the AL and PL methods provide poor performance in terms of F-score. At these iterations, increasing the number of training samples considerably improves the average performance of the ANNs irrespective of the applied learning process. However, after these thresholds, the average performance of the ANNs does not change with the increase in the training size. This demonstrates that there is an optimal number of GM samples needed for training the prediction models that leads to acceptable performance.

• The variation in the PL results is much higher than that in the AL results. The density plots also confirm that the AL models tend to be more concentrated than the PL models. Furthermore, the uncertainty in the results decreases as the training size increases, particularly for the AL method.

• The results show that the F-scores for some of the models trained based on PL are small and never improve. When the weights of the pre-trained models are used as the starting points for the next iteration, the loss function of the ANNs may end up in a local minimum and never improve. This issue can happen in both AL and PL; however, it is more common in PL, as shown in Fig. 8. As a result, this study uses an ensemble model (i.e., combining the predictions from the multiple models trained in the 100 repetitions) to reduce the effect of this issue.
4.4. Statistical significance of the results

An important question in the performance analysis of AL and PL is whether the results provide statistically convincing evidence that AL outperforms PL in predicting the structural responses. It is necessary to assess whether the difference between the AL and PL results is real and reliable or merely due to statistical chance. The most popular statistical test used for this purpose is the paired Student's t-test combined with random subsamples of the training dataset [36]. However, the key assumption of this statistical test is violated when using random resampling, because the test and training sets overlap in different repetitions and, thus, are not independent [36]. Nadeau and Bengio [37] demonstrated that the overlapping of the test and training sets might lead to an underestimation of the variance and suggested a method to correct the variance estimate. Hence, the paired Student's t-test with the Nadeau and Bengio correction is used to determine the statistical significance of the difference between the results of AL and PL.

The difference between the F-scores of AL and PL, d_j, is

d_j = F_{AL,j} - F_{PL,j}    (2)

where F_{AL,j} and F_{PL,j} are, respectively, the F-scores for the AL and PL models measured on repetition j (1 ≤ j ≤ n).

In addition, assume that n_1 GM samples are used for training the ANNs and the remaining n_2 GM samples are used for testing. The t-statistic of the corrected paired Student's t-test is

t = \frac{\frac{1}{n}\sum_{j=1}^{n} d_j}{\sqrt{\left(\frac{1}{n} + \frac{n_2}{n_1}\right)\hat{\sigma}^2}}    (3)

where \hat{\sigma}^2 is the estimate of the variance of the n differences d_j.

This statistic is used with the Student's t distribution with n − 1 degrees of freedom to compute the significance of the results and the confidence intervals. The null hypothesis is that, for a randomly drawn training set, AL and PL will have the same performance on a randomly drawn test sample. If the null hypothesis is rejected, the difference in F-scores is statistically significant. A sketch of this test is given below, after Fig. 9.

Fig. 9 presents the resulting t-statistic values, P-values, and 95% confidence intervals of the differences as a function of the number of data points used for training. The resulting P-values are smaller than the considered significance level (i.e., 5%), meaning that the null hypothesis can be rejected. Therefore, the results provide convincing evidence that AL and PL perform differently.

Additionally, the confidence intervals presented in Fig. 9 show that AL performs better than PL, particularly for Class 2. For example, the results show that, with 95% confidence, AL results in F-scores between 0.06 and 0.093 greater than those for PL when 100 GM samples are used for training. On the other hand, the improvement in F-scores for Class 1 is not as significant as that for Class 2. The confidence intervals for Class 2 are so close to the average difference that they are not clearly visible in the plot.

Fig. 9. (a) The t-statistic values resulting from the corrected paired Student's t-test, which is used for examining the significance of the results, (b) the P-values resulting from the corrected paired Student's t-test, measuring the difference between the AL and PL results, and (c) 95% confidence intervals for the average difference between the performance of AL-IMB-T and PL-IMB-T.
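A minimal sketch of Eqs. (2) and (3), assuming arrays `f_al` and `f_pl` of F-scores from the n repetitions and the training/test sample counts n_1 and n_2; this illustrates the Nadeau–Bengio correction and is not the authors' code:

```python
import numpy as np
from scipy import stats

def corrected_paired_ttest(f_al, f_pl, n_train, n_test, alpha=0.05):
    d = np.asarray(f_al) - np.asarray(f_pl)          # Eq. (2)
    n = d.size
    var = d.var(ddof=1)                              # estimate of sigma^2 for the n differences
    denom = np.sqrt((1.0 / n + n_test / n_train) * var)
    t_stat = d.mean() / denom                        # Eq. (3)
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=n - 1)
    half_width = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1) * denom
    ci = (d.mean() - half_width, d.mean() + half_width)
    return t_stat, p_value, ci
```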
Fig. 10. (a) Performance of models trained using AL-IMB-T and AL-IMB-R, which measures the effect of using pre-trained weights on the AL results, in terms of F-score as a function of the number of GM samples used for training. Pink and green dots, respectively, represent single replications of the AL-IMB-T and AL-IMB-R experiments. The solid black line represents the average performance of AL-IMB-T, and the yellow dashed line shows the average performance of AL-IMB-R. A small amount of horizontal random variation is added to each data point to handle overplotting. (b) Density distributions of F-scores in three population categories. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

4.5. The effect of using pre-trained weights

Fig. 10 compares the performance of AL in two different cases: AL-IMB-T, which uses the pre-trained weights from the previous iteration, and AL-IMB-R, which randomly initializes the weights at each iteration. This figure shows that the difference between the results is negligible for Class 1, while using pre-trained weights instead of randomly initializing them can very slightly decrease the performance of the models for Class 2. Fig. 11(b) shows that the P-values are approximately greater than the considered significance level when fewer than 100 GM samples are used for training. This demonstrates that there is not enough evidence to reject the null hypothesis stating that AL-IMB-T and AL-IMB-R have the same performance at the initial iterations. However, after the ten initial iterations, the difference in results due to using pre-trained or random weights is statistically significant based on the P-values shown in Fig. 11(b), although the differences are very small and negligible, as shown in Fig. 11(c).

Fig. 11. (a) The t-statistic values resulting from the corrected paired Student's t-test, which is used for examining the significance of the results, (b) the P-values resulting from the corrected paired Student's t-test, measuring the effect of using pre-trained weights on the AL results, and (c) 95% confidence intervals for the average difference between the performance of the AL-IMB-T and AL-IMB-R methods.
4.6. The effect of balancing the dataset

Fig. 12 shows the performance of AL when using the balanced (AL-BAL-T) and imbalanced (AL-IMB-T) datasets. The results indicate that the performance of the ANNs for the minority class (Class 2) is independent of the level of imbalance in the dataset. The statistical results presented in Fig. 13 also confirm that AL-BAL-T and AL-IMB-T perform the same in the case of Class 2. This can be justified by the fact that the total number of data points belonging to Class 2 does not change when balancing the data. However, there is a drop in the F-scores for Class 1 when balancing the dataset. This drop is also evident from the statistical results presented in Fig. 13. The P-values for Class 1 are almost zero for the different training sizes, meaning that the difference between the mean F-scores for AL-BAL-T and AL-IMB-T is statistically considerable. Also, Fig. 13(c) shows that the decrease in the performance of the ANNs for Class 1 is about 0.04 due to balancing the dataset. This decrease in the performance of the ANNs for the majority class is caused by the decrease in the number of data points belonging to Class 1. Interestingly, a comparison between the results presented in Fig. 12 and Fig. 8 shows that the difference between the performance of the AL and PL methods is more visible in the case of the imbalanced dataset than in the balanced one. Therefore, to gain acceptable performance when using AL, it is suggested to initially consider a large pool of unlabeled data points containing GM samples from a wide range of hazard levels, without any concern that this might lead to a high level of imbalance in the dataset.

Fig. 12. (a) Performance of models trained using the AL-IMB-T and AL-BAL-T methods, which measure the effect of class imbalance on the AL results, in terms of F-score as a function of the number of applied GMs. Pink and green dots, respectively, represent single replications of the AL-IMB-T and AL-BAL-T experiments. The solid black line represents the average performance of AL-IMB-T, and the yellow dashed line shows the average performance of AL-BAL-T. A small amount of horizontal random variation is added to each data point to handle overplotting. (b) Density distributions of F-scores in three population categories. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 13. (a) The t-statistic values resulting from the corrected paired Student's t-test, (b) the P-values resulting from the corrected paired Student's t-test, measuring the effect of class imbalance on the AL results, and (c) 95% confidence intervals for the average difference between the performance of the AL-IMB-T and AL-BAL-T methods.

Fig. 14. (a) F-scores for the AL-IMB-T and PL-IMB-T models versus the training size for repetition 1, (b) F-score variation rates for AL-IMB-T and PL-IMB-T versus the iteration number for repetition 1, and (c) the mean and standard deviation of the F-score variation rates for AL-IMB-T and PL-IMB-T versus the iteration number for 100 repetitions.

Fig. 15. (a), (c), and (e) The P-values resulting from the corrected paired Student's t-test comparing the average performance of the AL models with a batch size of 10 against the AL models with batch sizes of 5, 15, and 20, respectively. (b), (d), and (f) 95% confidence intervals for the average difference between the performance of the AL model with a batch size of 10 and the AL models with batch sizes of 5, 15, and 20, respectively.
4.7. Stability analysis

Stability is a desirable and important criterion of machine learning tools that provides a basis for reproducibility. A machine learning algorithm is said to be stable when the trained model produces consistent predictions with respect to small changes in the training samples. This criterion is examined in this section to compare the performance of AL and PL. Fig. 14(a) shows how the F-scores from AL and PL for repetition 1 change as the number of training samples increases. For PL, there are significant fluctuations in the F-scores as the training size changes, while the AL F-scores appear to be stable. The AL F-scores initially increase as the number of training samples increases and then remain nearly constant.

To quantify these changes in the performance of AL and PL, stability is defined as the rate of variation of the F-scores with respect to the change in training size. Fig. 14(b) presents the F-score variation rates, which are the differences between successive F-scores divided by the batch size, for both AL and PL for repetition 1 (a sketch of this measure is given below). The F-score variation rate for AL remains consistently positive over the initial iterations, demonstrating the improvement in the performance of the models as the training size grows; after several iterations, the F-score variation rate is approximately zero. On the other hand, the instability of the PL models is evident from the fluctuations in their F-score variation rates. Fig. 14(c) shows the mean and standard deviation of the F-score variation rates for all 100 repetitions, which confirms that using AL leads to more stable results than PL.
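A minimal sketch of the F-score variation rate defined above; `f_scores` is an illustrative array of per-iteration F-scores for a given repetition:

```python
import numpy as np

def variation_rate(f_scores, batch_size):
    # Difference between successive F-scores divided by the batch size (Section 4.7).
    return np.diff(np.asarray(f_scores)) / batch_size
```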
4.8. The effect of batch size

In the previous sections, a batch size of 10 was used for training the ANNs, meaning that the algorithm selects 10 data points at a time to be labeled. In this section, the effect of the batch size on the performance of AL is examined. Fig. 15 presents the results of the corrected paired Student's t-test comparing the performance of the models trained using three other batch sizes, namely 5, 15, and 20, with the model trained using a batch size of 10. For this case, the null hypothesis is that there is no difference between the average performance of the models when using a batch size other than 10. For batch sizes of 15 and 20, Fig. 15(c) and Fig. 15(e) show that the P-values are approximately greater than 5%, indicating that the difference in the results due to using a batch size greater than 10 is not statistically significant. For these two cases, Fig. 15(d) and Fig. 15(f) also confirm that the differences are small and negligible.

When it comes to the model with a batch size of 5, as shown in Fig. 15(a), the P-values are less than 5% when fewer than 200 data points are used for training. Hence, there is enough evidence to reject the null hypothesis, which assumes that the models with batch sizes of 5 and 10 have the same performance. Additionally, Fig. 15(b) shows that the maximum difference in F-scores is about 0.03 for Class 2, and the negative values indicate that the average results for the models with a batch size of 10 are less than those for the models with a batch size of 5. Additionally, Fig. 16 compares the stability of the models when using batch sizes of 5 and 10. There is no significant difference between the distributions of the F-score variation rates for these batch sizes. Hence, using a batch size of 5 does not lead to unstable results, while it slightly improves the performance of the prediction models, particularly over the initial iterations when the training size is small. However, the computational cost increases when using a small batch size, which is not appealing in practice.

Fig. 16. The mean and standard deviation of the F-score variation rates for models with batch sizes of 5 and 10 versus the iteration number for 100 repetitions.

5. Conclusion

A methodology based on the pool-based query-by-committee active learning (AL) method is presented to reduce the computational cost associated with developing machine learning models for predicting the structural responses. Using a step-by-step method, three criteria are considered, namely informativeness, diversity, and representativeness, to choose the most valuable data points, or ground motion samples, for performing response history analyses. Artificial neural networks (ANNs) are used as the surrogate models for predicting the structural responses. From the results, there are several important findings:

• The results demonstrated that AL could produce high-performance classification models in comparison to the commonly used passive learning (PL) method given the same number of labeled data points. Hence, a smaller number of labeled ground motion samples is required for training the ANNs when using AL, and the number of response history analyses required for labeling the ground motion samples therefore decreases.

• Using AL improves the generalizability of the trained models in comparison to the models trained using PL, which randomly selects the input data points. Specifically, it was found that the variation in the performance of the models trained using AL is much smaller than that of the models trained using PL.

• AL can be used as a remedy to decrease the detrimental effects of class imbalance, which is an inherent property of structural engineering datasets, on the performance of the prediction models.

• The effect of the batch size on the performance of the models trained using AL was examined. The statistical results demonstrated that using a batch size of 5 could lead to better results over the initial iterations, in which the total number of labeled data points used for training is small. However, using a small batch size is computationally demanding. In this trade-off situation, it is suggested to use a batch size of 10.

• The performance of the ANNs does not change significantly when using either pre-trained or random weights as the initial parameters during training.

Although the results of this study are based on a specific dataset, they are promising, and the proposed method can be used to train a surrogate model for predicting the seismic demand in any structural system. More importantly, in this study a dataset with a small number of unlabeled data points is used. However, using a very large unlabeled dataset (considering that collecting a large number of unlabeled data points is not computationally demanding) would improve the results.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Benjamin J, Cornell C. Probability, statistics, and decision for civil engineers. New York: McGraw-Hill; 1970.
[2] Kiani J, Camp C, Pezeshk S. On the number of required response history analyses. Bull Earthq Eng 2018;16:5195–226. https://doi.org/10.1007/s10518-018-0381-1.
[3] Eads L, Miranda E, Krawinkler H, Lignos DG. An efficient method for estimating the collapse risk of structures in seismic regions. Earthq Eng Struct Dyn 2013;42:25–41. https://doi.org/10.1002/eqe.2191.
[4] Buratti N, Stafford PJ, Bommer JJ. Earthquake accelerogram selection and scaling procedures for estimating the distribution of drift response. J Struct Eng 2011;137:345–57. https://doi.org/10.1061/(asce)st.1943-541x.0000217.
[5] Reyes JC, Kalkan E. How many records should be used in an ASCE/SEI-7 ground motion scaling procedure? Earthq Spectra 2012;28:1223–42. https://doi.org/10.1193/1.4000066.
[6] Hancock J, Bommer JJ, Stafford PJ. Numbers of scaled and matched accelerograms required for inelastic dynamic analyses. Earthq Eng Struct Dyn 2008;37:1585–607. https://doi.org/10.1002/eqe.827.
[7] Shome N, Cornell CA, Bazzurro P, Carballo JE. Earthquakes, records, and nonlinear responses. Earthq Spectra 1998;14:469–500. https://doi.org/10.1193/1.1586011.
[8] Baker JW. Efficient analytical fragility function fitting using dynamic structural analysis. Earthq Spectra 2015;31:579–99. https://doi.org/10.1193/021113EQS025M.
[9] Gehl P, Douglas J, Seyedi DM. Influence of the number of dynamic analyses on the accuracy of structural response estimates. Earthq Spectra 2015;31:97–113. https://doi.org/10.1193/102912EQS320M.
[10] Khoshnevis N. Gaining scientific and engineering insight into ground motion simulation through machine learning and approximate modeling approaches. University of Memphis; 2018.
[11] Lagaros ND, Fragiadakis M. Fragility assessment of steel frames using neural networks. Earthq Spectra 2007;23:735–52. https://doi.org/10.1193/1.2798241.
[12] Lagaros ND, Tsompanakis Y, Psarropoulos PN, Georgopoulos EC. Computationally efficient seismic fragility analysis of geostructures. Comput Struct 2009;87:1195–203. https://doi.org/10.1016/j.compstruc.2008.12.001.
[13] Mitropoulou CC, Papadrakakis M. Developing fragility curves based on neural network IDA predictions. Eng Struct 2011;33:3409–21. https://doi.org/10.1016/j.engstruct.2011.07.005.
[14] Mahmoudi SN, Chouinard L. Seismic fragility assessment of highway bridges using support vector machines. Bull Earthq Eng 2016;14:1571–87. https://doi.org/10.1007/s10518-016-9894-7.
[15] Jalayer F, Ebrahimian H, Miano A, Manfredi G, Sezen H. Analytical fragility assessment using unscaled ground motion records. Earthq Eng Struct Dyn 2017;46:2639–63. https://doi.org/10.1002/eqe.2922.
[16] Kiani J, Camp C, Pezeshk S. On the application of machine learning techniques to derive seismic fragility curves. Comput Struct 2019;218:108–22. https://doi.org/10.1016/j.compstruc.2019.03.004.
[17] Mangalathu S, Jeon J-S. Stripe-based fragility analysis of multispan concrete bridge classes using machine learning techniques. Earthq Eng Struct Dyn 2019. https://doi.org/10.1002/eqe.3183.
[18] Zhang R, Chen Z, Chen S, Zheng J, Büyüköztürk O, Sun H. Deep long short-term memory networks for nonlinear structural seismic response prediction. Comput Struct 2019;220:55–68. https://doi.org/10.1016/j.compstruc.2019.05.006.
[19] Mangalathu S, Jeon JS, Padgett JE, DesRoches R. ANCOVA-based grouping of bridge classes for seismic fragility assessment. Eng Struct 2016;123:379–94. https://doi.org/10.1016/j.engstruct.2016.05.054.
[20] Kiani J. Application of machine learning and statistical tools for ground motion selection. University of Memphis; 2020.
[21] Settles B. Active learning literature survey. Computer Sciences Technical Report, University of Wisconsin–Madison; 2009.
[22] Khoshnevis N, Taborda R. Application of pool-based active learning in physics-based earthquake ground-motion simulation. Seismol Res Lett 2019;90:614–22. https://doi.org/10.1785/0220180296.
[23] Bradley BA. A generalized conditional intensity measure approach and holistic ground-motion selection. Earthq Eng Struct Dyn 2010;39:1321–42. https://doi.org/10.1002/eqe.995.
[24] Ghassemieh M, Kiani J. Seismic evaluation of reduced beam section frames considering connection flexibility. Struct Des Tall Spec Build 2013;22:1248–69. https://doi.org/10.1002/tal.1003.
[25] Lignos DG, Krawinkler H. Deterioration modeling of steel components in support of collapse prediction of steel moment frames under earthquake loading. J Struct Eng 2011;137:1291–302. https://doi.org/10.1061/(ASCE)ST.1943-541X.0000376.
[26] Bradley BA. A ground motion selection algorithm based on the generalized conditional intensity measure approach. Soil Dyn Earthq Eng 2012;40:48–61. https://doi.org/10.1016/j.soildyn.2012.04.007.
[27] Kiani J, Pezeshk S. Sensitivity analysis of the seismic demands of RC moment resisting frames to different aspects of ground motions. Earthq Eng Struct Dyn 2017;46:2739–55. https://doi.org/10.1002/eqe.2928.
[28] Field EH, Jordan TH, Cornell CA. OpenSHA: a developing community-modeling environment for seismic hazard analysis. Seismol Res Lett 2003;74:406–19. https://doi.org/10.1785/gssrl.74.4.406.
[29] Jalayer F, Cornell CA. Alternative non-linear demand estimation methods for probability-based seismic assessments. Earthq Eng Struct Dyn 2009;38:951–72. https://doi.org/10.1002/eqe.876.
[30] Kiani J, Camp C, Pezeshk S. The importance of non-spectral intensity measures on the risk-based structural responses. Soil Dyn Earthq Eng 2019;120:97–112. https://doi.org/10.1016/j.soildyn.2019.01.036.
[31] Seung HS, Opper M, Sompolinsky H. Query by committee. In: Proceedings of the fifth annual workshop on computational learning theory; 1992.
[32] Wu D. Pool-based sequential active learning for regression. IEEE Trans Neural Netw Learn Syst 2018:1–12. https://doi.org/10.1109/TNNLS.2018.2868649.
[33] MacQueen J. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability; 1967. p. 281–97.
[34] Dagan I, Engelson SP. Committee-based sampling for training probabilistic classifiers. In: Proceedings of the international conference on machine learning; 1995. p. 150–7.
[35] Nguyen D, Widrow B. Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights. In: IJCNN international joint conference on neural networks. IEEE; 1990. p. 21–6. https://doi.org/10.1109/ijcnn.1990.137819.
[36] Dietterich TG. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 1998;10:1895–923. https://doi.org/10.1162/089976698300017197.
[37] Nadeau C, Bengio Y. Inference for the generalization error. Mach Learn 2003;52:239–81. https://doi.org/10.1023/A:1024068626366.
