
IIE Transactions (1997) 29, 91–101

Comparison of regression and neural network models for prediction of inspection profiles for aging aircraft

JAMES T. LUXHØJ¹, TREFOR P. WILLIAMS² and HUAN-JYH SHYUR¹

¹ Department of Industrial Engineering, Rutgers University, P.O. Box 909, Piscataway, NJ 08855-0909, USA
² Department of Civil Engineering, Rutgers University, P.O. Box 909, Piscataway, NJ 08855-0909, USA

Received May 1994 and accepted May 1996

Currently under phase 2 development by the Federal Aviation Administration (FAA), the Safety Performance Analysis System (SPAS) contains 'alert' indicators of aircraft safety performance that can signal potential problem areas for inspectors. The Service Difficulty Reporting (SDR) system is one component of SPAS and contains data related to the identification of abnormal, potentially unsafe conditions in aircraft and/or aircraft components/equipment.

SPAS contains performance indicators to assist safety inspectors in diagnosing an airline's safety 'profile' compared with others in the same peer class. This paper details the development of SDR prediction models for the DC-9 aircraft by analyzing sample data from the SDR database that have been merged with aircraft utilization data. Both multiple regression and neural networks are used to create prediction models for the overall number of SDRs and for SDR cracking and corrosion cases. These prediction models establish a range for the number of SDRs outside which safety advisory warnings would be issued. It appears that a data 'grouping' strategy to create aircraft 'profiles' is very effective at enhancing the predictive accuracy of the models. The results from each competing modeling approach are compared and managerial implications to improve the SDR performance indicator in SPAS are provided.

1. Introduction

A repairable or 'maintained' system is 'a system which, after failing to perform one or more of its functions satisfactorily, can be restored to fully satisfactory performance by any method, other than replacement of the entire system' [1]. Effective and efficient maintenance management is essential not only for production systems but also for large scale service systems, such as air and surface transport systems. These repairable systems are subject to aging mechanisms such as wear, fatigue, creep, and stress corrosion. Inspection and diagnostic activities are integral components of an effective maintenance strategy in an attempt to ensure system safety, reliability, and availability.

The Federal Aviation Administration (FAA) has established a Center for Computational Modeling of Aircraft Structures (CMAS) at Rutgers University. One CMAS research project has focused on analyzing the contribution of the FAA's Service Difficulty Reporting (SDR) database to aircraft safety. The SDR system contains data related to the identification of abnormal, potentially unsafe conditions in aircraft or aircraft components and equipment.

Estimation of the total number of SDRs in a given time interval that a particular airline would be expected to have, adjusting for age of the aircraft, flight time, and landings, could help to identify situations in need of a heightened level of surveillance by the FAA's safety inspectors, for example if the airline's number of SDRs is far above or below what should be expected. An excessive number of SDRs in a given time period could suggest mechanical, operating, or design problems with certain aircraft. Although too few SDRs reported in a given time may not necessarily be problematic, an expert panel of safety inspectors noted that a very low number of SDRs for an airline in a given time period could possibly suggest organizational or management problems, lack of regulatory compliance, airline maintenance cutbacks, or financial or labor problems. Both situations would merit closer scrutiny by FAA safety inspectors.

The CMAS research is only one initiative of a larger FAA research program, termed the Safety Performance Analysis System (SPAS), that is an analytical tool intended to support FAA inspection activities [2, 3] and that contains numerous indicators of safety performance for signaling potential problem areas for inspector consideration [4–6].

The numerous performance indicators that are currently defined in SPAS assist in diagnosing an airline's 'profile' compared with others in the same peer class. The currently planned SDR performance indicator is the

0740-817X © 1997 'IIE'



number of SDR records for the airline for the defined period [4]. This preliminary SDR performance indicator in SPAS, by not allowing inspectors to differentiate between different types of problem, is too simplistic to be of practical value. The Rutgers University CMAS research examined this planned indicator, and the result of our research effort was the construction of more refined, specific SDR performance indicators. The tracking of performance indicators also facilitates the identification of unfavorable trends. The eventual goal of this research is to develop an intelligent decision system that will be a hybrid of expert system and neural network technologies supported by aviation databases to facilitate organizational coordination and efficient workload scheduling for aircraft safety inspectors under budgetary and staffing constraints.

2. Research methodology

The currently planned SDR performance indicator is S, which is simply the number of SDR records for the airline for the defined period. The count of records is not normalized. If S > 0, the indicator status is set as 'expected'; if S = 0, the indicator status is set as 'advisory' [4]. It is expected that over a six-month period, normal operations by an airline will lead to finding a non-zero number of SDRs due to routine and non-routine maintenance. This 'alert' indicator is too general to be of practical value to safety inspectors; for instance, it fails to differentiate by age of the aircraft, flight time, and number of landings.

Although many prediction methods exist in the literature, this research focuses on only two modeling approaches to develop more refined SDR predictors: multiple regression and neural networks. Multiple regression represents a 'classical' approach to multivariate data analysis whereas the emerging field of neural networks represents a 'new' approach to nonlinear data analysis.

The regression and neural network models presented in this paper may be used to predict the average aggregate number of SDRs in a given time interval for the DC-9 aircraft. In addition to flight hours and number of landings, aging mechanisms such as wear, fatigue, creep, and stress corrosion contribute to reported incidents of cracking and corrosion in an aircraft's fuselage and other major structural components, and models are also developed to predict the average number of SDRs for cracking and corrosion cases. The identification of unfavorable trends will enable the FAA to specify that the airlines take preemptive maintenance measures.

3. Data description

The CMAS research team was provided with a subset of the SDR database that had been merged with the Aircraft Utilization (ARS) database for the same set of planes. This merged database was created by Battelle [7] and consisted of 1308 observations for the DC-9 aircraft for the period April 1974 to March 1990. Table 1 displays sample data. Only the following quantitative data for each plane were available in the merged database:

· age,
· estimated flight hours, and
· estimated number of landings.

Because actual data on flight hours and landings were not reported directly in the SDR database, the estimated flight hours and estimated landings are derived from the original delivery date of the plane to the first airline, the date of the ARS data reference, and the SDR date. The equations developed by Battelle for these derived values are reported in Rice [7] and are presented below:

$$\text{Estimated flight hours} = \left[\frac{\text{SDR date} - \text{service date}}{\text{ARS date} - \text{service date}}\right] \times \text{FHSCUM},$$

$$\text{Estimated number of landings} = \left[\frac{\text{SDR date} - \text{service date}}{\text{ARS date} - \text{service date}}\right] \times \text{LDGSCUM},$$

where

Table 1. Sample of SDR and ARS 'merged' data [7]

Aircraft  Serial   SDR date  Part  Part location  Part       Estimated  Estimated     Estimated
model     number¹            name                 condition  age        flight hours  landings
DC9       333      84-03-22  Skin  E + E Compt    Cracked    17.74      32 619.03     53 999.20
DC9       333      84-03-22  Skin  Aft bag bin    Cracked    17.74      32 619.03     53 999.20
DC9       333      86-07-07  Skin  Fuselage       Cracked    20.03      36 836.23     60 980.56
DC9       444      80-06-20  Skin  Galley door    Cracked    13.24      34 396.44     33 888.77
DC9       444      81-12-01  Skin  FS625          Corroded   14.69      38 160.55     37 597.32
DC9       444      87-05-11  Skin  Rt wheel well  Cracked    20.14      52 299.10     51 527.19
DC9       444      87-05-11  Skin  STA 580-590    Cracked    20.14      52 299.10     51 527.19

¹ Fictitious serial numbers are used owing to confidentiality of data.
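The date-ratio extrapolation behind the estimated flight hours and landings in Table 1 can be illustrated with a minimal Python sketch. The function names and the dates and cumulative totals in the example are hypothetical and are not taken from the merged database; elapsed time is measured in days, which is an assumption, since the source does not state the time unit used by Battelle.

```python
from datetime import date


def extrapolation_multiplier(sdr_date, ars_date, service_date):
    """Ratio (SDR date - service date) / (ARS date - service date)."""
    elapsed_to_sdr = (sdr_date - service_date).days
    elapsed_to_ars = (ars_date - service_date).days
    if elapsed_to_ars <= 0:
        raise ValueError("ARS date must fall after the service date")
    return elapsed_to_sdr / elapsed_to_ars


def estimate_usage(sdr_date, ars_date, service_date, fhscum, ldgscum):
    """Scale the cumulative ARS flight hours and landings to the SDR date."""
    m = extrapolation_multiplier(sdr_date, ars_date, service_date)
    return m * fhscum, m * ldgscum


# Hypothetical aircraft: delivered 1970, ARS snapshot 1990, SDR filed 1980.
service = date(1970, 1, 1)
ars = date(1990, 1, 1)
sdr = date(1980, 1, 1)
hours, landings = estimate_usage(sdr, ars, service, fhscum=40000.0, ldgscum=60000.0)
```

Because the same multiplier scales both quantities, the ratio of estimated landings to estimated flight hours for a given aircraft is constant across its SDR records.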

SDR date = date of the SDR report (SDR database);
service date = original delivery date of the plane to the first airline (ARS database);
ARS date = date of the ARS report (ARS database);
FHSCUM = cumulative fuselage flight hours (ARS database); and
LDGSCUM = cumulative fuselage landings (ARS database).

Because the ARS date time lagged the SDR date, Rice extrapolated the quantitative ARS data on flight hours and landings to the SDR date. He developed a multiplier by calculating the ratio of [(SDR date − service date)/(ARS date − service date)] and then extrapolated the flight hours and landings at the ARS date to the date of the SDR.

4. Multiple regression models

Initially, regression models were created from the 1308 DC-9 observations in their original format, referred to as the 'ungrouped' data. For the ungrouped data, the number of SDRs for each airplane is based on the cumulative number of data records (each record represents only one SDR). When cases with missing data were eliminated, there were a total of 1229 usable data cases. The coefficients of multiple determination, or $R^2$ values, for these models were very low, with the 'best' model having an $R^2$ of 0.2448 and a coefficient of variation (C.V.) of 69.85. The C.V. reported here is the ratio of the root-mean-square error of the model to the sample mean of the dependent variable multiplied by 100 and indicates how well the model fits the data. If the model does not fit the data well, then the C.V. becomes large. It appeared that there was much noise in the data because a plot of the ungrouped data revealed extensive fluctuations.

5. Data grouping strategies

In an attempt to create robust SDR prediction models that will provide SDR profiles for a representative DC-9, different data grouping strategies are used. Such an approach was used in Brammer [8], Fabrycky [9], Fabrycky et al. [10], Frisch [11], Luxhøj and Jones [12], Luxhøj and Rizzo [13], and Luxhøj [14–16], to create large-scale logistics models for the U.S. Navy. These 'population' models were developed to determine both maintenance and system repair/replacement strategies for large groupings of similar equipment on the basis of operating hours, operating environment, failure mode, etc.

By using multiple regression models, data grouping strategies for age, estimated flight hours, and estimated landings are developed on the basis of a smaller set of averaged data to predict the total expected number of SDRs per year, the number of SDRs per year for cracked cases, and the number of SDRs per year for corrosion cases for the DC-9 aircraft. The 'best' grouping strategy for each model case is then selected based on the highest $R^2$ value.

To provide a means for checking the SDR predictions against existing data, the data were partitioned into two different sets on the basis of aircraft serial numbers. The first set was used to build the prediction model and the second was used to evaluate the prediction model's performance on unfit data. Such an approach is useful for testing prediction model generality [17]. This approach is also used in neural network modeling and is analogous to creating a 'training' set of data to build the model and a 'production' set of data to evaluate model performance on new data. These terms are used in this paper to distinguish between the two data sets. The original data were partitioned into mutually exclusive training and production sets by using serial numbers for the different aircraft. Two-thirds of the data were placed into the training set, and one-third into the production set.

After the data have been partitioned into training and production sets, a grouping strategy is similarly applied to each data set. For example, an 'age' grouping strategy is outlined below:

1. Group the data to create age 'cohorts' (i.e., groups of 1, 2, 3, ... -year old planes).
2. Calculate the 'average' flight hours and number of landings for each 'age cohort'.
3. Calculate the average number of SDRs per number of aircraft in each 'age cohort'.

Forward stepwise regression is used where variables are added one at a time. Partial correlation coefficients are examined to identify an additional predictor variable that explains both a significant portion and the largest portion of the error remaining from the first regression equation. The forward stepwise procedure selects the 'best' regression model based on the highest $R^2$ from the following list of possible explanatory variables: age, flight hours, number of landings, (age)², (flight hours)², (number of landings)², age × flight hours, age × number of landings, flight hours × number of landings, flight hours/age, and number of landings/age. The default stopping criterion for the F test to determine which variable enters the model uses a significance level of 0.15. In the second stage of our analysis, the best prediction model was chosen on the basis of lowest Mean Square Error (MSE) on the training and production data, because MSE is a better indicator of predictive accuracy. The quadratic terms were considered in an inherently linear model to evaluate any nonlinear relationships and the impact of interaction terms was evaluated. The forward stepwise procedure was used to find a prediction equation with an $R^2$ close to 1
Table 2. Results of data grouping procedure

                          No. of data records     'Grouped' no. of data records
Model                     Training   Production   Training   Production
Overall no. of SDRs       805        424          16         14
No. of SDRs (cracking)    572        306          16         16
No. of SDRs (corrosion)   242        127          10         9
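The reduction from individual SDR records to the grouped records counted in Table 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the rule that sends every third serial number to the production set, the truncation of age to whole years to form cohorts, and the dictionary field names are all hypothetical. The sample rows are the fictitious-serial records shown in Table 1.

```python
from collections import defaultdict


def split_by_serial(records, every=3):
    """Send every `every`-th aircraft (by sorted serial number) to the production set.

    Whole aircraft are held out, so the two sets are mutually exclusive.
    """
    serials = sorted({r["serial"] for r in records})
    held_out = {s for i, s in enumerate(serials) if i % every == every - 1}
    training = [r for r in records if r["serial"] not in held_out]
    production = [r for r in records if r["serial"] in held_out]
    return training, production


def group_by_age(records):
    """Collapse SDR records into one averaged record per one-year age cohort."""
    cohorts = defaultdict(list)
    for r in records:
        cohorts[int(r["age"])].append(r)  # step 1: assumed cohort = whole years of age
    grouped = []
    for age in sorted(cohorts):
        rows = cohorts[age]
        n_aircraft = len({r["serial"] for r in rows})
        grouped.append({
            "age": age,
            # step 2: average usage within the cohort
            "avg_flight_hours": sum(r["flight_hours"] for r in rows) / len(rows),
            "avg_landings": sum(r["landings"] for r in rows) / len(rows),
            # step 3: each record is one SDR, so SDRs per aircraft in the cohort
            "avg_sdrs": len(rows) / n_aircraft,
        })
    return grouped


sample = [
    {"serial": 333, "age": 17.74, "flight_hours": 32619.03, "landings": 53999.20},
    {"serial": 333, "age": 17.74, "flight_hours": 32619.03, "landings": 53999.20},
    {"serial": 333, "age": 20.03, "flight_hours": 36836.23, "landings": 60980.56},
    {"serial": 444, "age": 13.24, "flight_hours": 34396.44, "landings": 33888.77},
    {"serial": 444, "age": 14.69, "flight_hours": 38160.55, "landings": 37597.32},
    {"serial": 444, "age": 20.14, "flight_hours": 52299.10, "landings": 51527.19},
    {"serial": 444, "age": 20.14, "flight_hours": 52299.10, "landings": 51527.19},
]
grouped = group_by_age(sample)
```

Splitting on aircraft rather than on records keeps all SDRs from one airplane in the same set, but it does not guarantee an exact two-thirds share of records in the training set.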

and to provide an equation that was economical: one that used only a few independent variables.

As a result of the grouping strategy, all interpretations are now with respect to the average number of SDRs per year. In the example above, the dependent variable becomes the average number of SDRs for a representative DC-9 with a 'profile' of estimated flight hours and estimated landings as defined by its associated age cohort. For the grouped data, we now have the number of SDRs for each airplane with respect to an interval (i.e., age, flight hours, or landings). The different structure of the data between grouped and ungrouped records led to structural differences between the regression models and to the use of different explanatory variables.

The grouping procedure gave the results shown in Table 2.

A prediction model for the overall number of SDRs per year for a representative DC-9 that uses the 'age' data grouping strategy is as follows:

$$\text{Overall no. of SDRs} = (0.00256264 \times \text{agesq}) - (4.038133 \times 10^{-9} \times \text{fhrsq}) + (0.002347 \times \text{fhr/age}) - 4.173934.$$

Note that this prediction model makes use of only three independent variables: the age squared (agesq), the flight hours squared (fhrsq), and flight hours/age (fhr/age). The $R^2$ is 0.9297, which indicates that this model can explain 92.97% of variability of the expected number of overall SDRs per year about its mean. This model was developed from 16 grouped data records that corresponded to aircraft ranging from approximately 8 to 24 years old.

An important point to remember when using this model is that one must have a sufficiently large data sample of DC-9 aircraft in order to compute 'averages' of estimated landings and flight hours for a specified aircraft age. The more data that one has, the better one can model a representative aircraft with the data grouping strategy as previously discussed.

6. Regression-modeling adequacy issues

The regression models were examined for multicollinearity, because a high degree of multicollinearity makes the results not generalizable as the parameter estimates in the model may not be stable owing to the high variance of the estimated coefficients. Because flying hours, number of landings, and the age of an aircraft are interrelated, multicollinearity is inherent in the independent variables.

Two statistical measures of multicollinearity are the tolerance (TOL) value and the variance inflation factor (VIF) [18]. The tolerance value is equal to one minus the proportion of a variable's variance that is explained by the other predictors. A low tolerance value indicates a high degree of collinearity. The variance inflation factor is the reciprocal of the tolerance value, so a high variance inflation factor suggests a high degree of collinearity present in the model. The VIF and TOL measures assume normality and are typically relative measures. A high TOL (above 0.10) and a low VIF (below 10) usually suggest a relatively small degree of multicollinearity [18].

While parsimonious regression models were developed by observing the VIF and TOL measures during model building and selection, an attempt was also made to remove multicollinearity by removing the linear trend from the observed variables. Both the dependent and independent variables were transformed by replacing their observed values with their natural logarithms. Although this approach was successful in reducing multicollinearity, the resulting regression models all had higher coefficients of variation and lower $R^2$ values than models without such variable transformations.

There are times in regression modeling when the assumption of constant error variance (i.e., homoscedasticity) may be unreasonable and heteroscedastic error disturbances will occur. When heteroscedasticity is present, ordinary least-squares estimation places more weight on the observations with large error variances than on those with small error variances. The White Test is used in this study to test for heteroscedasticity [17]. In the White Test, the null hypothesis of homoscedasticity is evaluated and the test does not depend critically on normality. We report the results of the White Test on our data later in this section.

Alternative grouping strategies to 'age' were also examined. Graphical analysis was used to examine the tradeoff of the number of observations versus adjusted $R^2$ values to determine interval grouping sizes for estimated landings and estimated flight hours. When using the data grouping strategy of estimated landings, the suggested

interval size is 4000 landings for the SDR cracking and corrosion cases and 5500 landings for the total number of SDRs. When using the data grouping strategy of estimated flight hours, the suggested interval grouping size is 4000 hours. When analyzing the graphs, the goal is to find an interval grouping size that maximizes the adjusted $R^2$ yet results in the use of a reasonable number of observations (i.e., n ≥ 16, which corresponds to aircraft ranging from 8 to 24 years old) to facilitate model development. Also, there were upper limits to the interval sizes for landings and flight hours beyond which too few groups resulted. The adjusted $R^2$ is used because the number of predictors is changing for each alternative interval size.

As discussed earlier, prediction models were developed from training data and evaluated on production data. Because the goal was to maximize the accuracy of the SDR predictions, the MSE was used for comparative purposes. Although the MSE has some bias, it is an estimator with very low variance.

Table 3 presents the 'best' SDR regression models for the DC-9 aircraft comparing across grouping strategies, predictor variables, and outcome variable. The table also displays the squared partial correlation coefficients that can be used to assess the relative importance of the different independent variables used in the regression models. The VIFs for the overall SDR and corrosion models are acceptable and suggest a relatively small degree of multicollinearity. However, the VIF for the cracking model suggests a moderate degree of collinearity, and this model should be used with caution as the parameter estimates might not be stable. The application of the White Test resulted in the acceptance of the null hypothesis of homoscedasticity at the 5% significance level for all three models and suggests that the assumption of constant error variances is reasonable. For the overall SDR prediction model, the White Test resulted in the following regression equation for the regression residuals:

$$e^2 = (-0.0007 \times \text{agesq}) + (1.89304 \times 10^{-10} \times \text{fhrsq}) + (0.00001052 \times \text{fhr/age}) + 0.228357,$$

with an $R^2$ of 0.1038. The test statistic $NR^2$ equalled (16)(0.1038) = 1.6608, which follows a $\chi^2$ distribution with 3 degrees of freedom. The critical value of the $\chi^2$ with 3 degrees of freedom at the 5% significance level is 7.81. Because 1.6608 < 7.81, we do not reject the null hypothesis of homoscedasticity. The White Test was similarly applied to the SDR cracking and corrosion cases.

On the basis of an analysis of the 1229 data observations for merged SDR and ARS data, it appears that the data grouping strategy results in SDR prediction models that can be used to predict expected reporting profiles for a representative DC-9. Confidence intervals can be calculated for the expected number of SDRs per year so that a range of values can be reported along with a point estimate. For example, Fig. 1 displays the residuals and confidence limits for the interval (CLI) that includes the variation for both the mean and the error term. In essence, this figure graphically displays the prediction interval for the overall number of SDRs across all airlines for 95% confidence. Such an approach establishes control limits or threshold levels outside which national SDR advisory warnings would be posted. To construct confidence intervals for a particular age group for a given airline, it is necessary to consider the number of aircraft in that age group owned by that airline. SDR prediction models for each airline could be developed by following the same grouping methodology as outlined above, but with the data partitioned by age and airline. Such models were not developed in this study, as only 2 of 22 airlines had a sufficient number of data observations by airline.

It appears that ungrouped SDR and ARS data are not useful for prediction purposes. Grouped data strategies show promise in predicting SDR profiles based on the

Table 3. SDR multiple regression models

Dependent variable             Grouped no. of observations   $R^2$    C.V.    MSE                     Independent   Squared partial   VIF
                               Training    Production                         Training   Production   variables     correlation
                               data        data                               data       data                       coefficient
Overall no. of SDRs            16          14                0.9297   19.40   0.1953     0.9219       Agesq         0.9137            11.39
  Grouping strategy: age                                                                              Fhrsq         0.5202            11.19
  Increment: 1 year                                                                                   Fhr/age       0.6070            3.52

No. of SDRs (cracking)         16          15                0.7899   6.69    0.0061     0.0265       Fhr/age       0.7593            124.81
  Grouping strategy:                                                                                  Age × ldg     0.3764            124.81
  flight hours
  Increment: 4000 hours

No. of SDRs (corrosion)        10          9                 0.9780   12.29   0.0321     1.6918       Agesq         0.9661            4.76
  Grouping strategy: age                                                                              Fhrsq         0.9333            9.87
  Increment: 1 year                                                                                   Fhr/age       0.6962            4.32
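Two of the adequacy checks discussed in this section reduce to simple arithmetic, sketched below in Python. The function names are hypothetical; the inputs are the values quoted in the text and in Table 3 for the overall SDR model (16 grouped training observations, an auxiliary-regression $R^2$ of 0.1038, the quoted critical value of 7.81, and sample means of 2.63 and 2.92 SDRs). The sketch assumes the root-mean-square error is simply the square root of the tabulated MSE.

```python
import math

# Critical chi-square value (3 degrees of freedom, 5% level) as quoted in the text.
CHI2_CRIT_3DF_5PCT = 7.81


def white_test_statistic(n_obs, r2_aux):
    """N * R^2 from the auxiliary regression of squared residuals on the predictors."""
    return n_obs * r2_aux


def rmse_to_mean_ratio(mse, sample_mean):
    """Ratio of sqrt(MSE) to the sample mean of the dependent variable."""
    return math.sqrt(mse) / sample_mean


stat = white_test_statistic(16, 0.1038)
reject_homoscedasticity = stat > CHI2_CRIT_3DF_5PCT

train_ratio = rmse_to_mean_ratio(0.1953, 2.63)
prod_ratio = rmse_to_mean_ratio(0.9219, 2.92)
```

A statistic below the critical value means the null hypothesis of homoscedasticity is not rejected, and the two ratios correspond to the values of roughly 0.168 and 0.329 reported for the training and production data.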

DC-9 analysis. These data grouping strategies generally result in robust models that are useful in developing aircraft population profiles. A plausible reason for the apparent success of the grouping strategy is that computing the average number of SDRs for an interval (i.e., age, flight hours, number of landings) results in the dependent variable becoming approximately normal owing to the Central Limit Theorem.

Fig. 1. Residual analysis and 95% confidence limits for overall SDR prediction model. Data sorted and grouped by age (increment = 1 year).

Of the three prediction models, the model to predict the overall expected number of SDRs appears the 'best'. It has the second highest $R^2$ value (0.9297), a low degree of multicollinearity, and low MSEs on both the training (0.1953) and production (0.9219) data. For the training data, the magnitude of the $\sqrt{\text{MSE}}$ is low relative to the sample mean of 2.63 SDRs (i.e., ratio = 0.168). For the production data, the ratio of the $\sqrt{\text{MSE}}$ to the sample mean of 2.92 SDRs is higher (i.e., ratio = 0.329). If the $\sqrt{\text{MSE}}$ > 0.33 × sample mean, then normality is not a reasonable assumption and additional distributional information is needed to construct a useful confidence interval. When compared with the overall SDR prediction model, the model for corrosion cases has a larger $R^2$, but there is a degradation of performance on the production data based on MSE. The ratios of the $\sqrt{\text{MSE}}$ values to the sample means for the training and production data are 0.095 and 0.567, respectively. The model for corrosion was built on the smallest number of averaged data, and this could account for its degraded performance on the production data. The SDR prediction model for cracking cases has the smallest $R^2$; however, it has the best performance on the production data. The ratios of the $\sqrt{\text{MSE}}$ values to the sample means are 0.064 and 0.092; thus, dispersion of the residuals around the mean is small. When compared with the overall SDR prediction model, the smaller number of averaged observations used in building the model for cracking could account for the smaller $R^2$. Stem-and-leaf displays [19] for all models indicate that the shapes of the distributions for the residuals are unimodal and bell-shaped. Box plot diagrams of the residuals for all models imply symmetrical data sets and that the medians are nearly zero.

A limitation on these prediction models is that the results presented in this paper are based on a relatively small sample of merged DC-9 SDR and ARS data (i.e., 1229 observations) for the period 1974–90. Generalizing the results to other aircraft types should be done with caution. The value or contribution of this study's findings exists in the methods and techniques used to identify the factors influencing the expected number of SDRs.

7. Neural network models

A parallel CMAS research effort focused on the development of neural networks to determine patterns in SDR reporting. Emanating from research in Artificial Intelligence (AI) [20–22], neural networks attempt to simulate the functioning of human biological neurons. Neural networks have been particularly useful in pattern recognition problems that involve capturing and learning complex underlying (but consistent) trends in data. Neural networks are highly nonlinear and, in some cases, are capable of producing better approximations than multiple regression, which produces a linear approximation [23, 24]. Neural network learning supports incremental updating and is easier to embed in an intelligent decision system because batch processing is not required. Although neural networks offer an alternative to regression that will determine functional relationships between variables to predict an outcome measure, neural network outcomes lack a simple interpretation of results. For instance, the modeling technique does not provide objective criteria to decide what set of predictors is more important for the prediction. Neural networks can also suffer from overfitting of the data and lack of prediction generality. The limitations of neural networks with respect to outliers, multicollinearity, and other problems inherent in real world data have received scant attention.

Backpropagation neural networks are the most commonly used neural network architectures. These neural networks are especially good for pattern recognition. The initial program employs an analog, three-layer, backpropagation network. To develop a backpropagation model, a training set of data patterns that consist of both inputs and the actual outputs observed must be developed. During training the neural network processes patterns in a two-step procedure. In the first or forward phase of backpropagation learning, an input pattern is applied to the network, and the resulting activity is allowed to spread through the network to the output layer. The program compares the actual output pattern generated for the given input with the corresponding training set output. This comparison results in an error for each neurode in the output layer. In the second, or backward,

phase, the error from the output layer is propagated back through the network to adjust the interconnection weights between layers. This learning process is repeated until the error between the actual and desired output converges to a predefined threshold [25].

In neural network modeling, $R^2$ compares the accuracy of the model with the accuracy of a trivial benchmark model where the prediction is simply the mean of all the sample patterns. A perfect fit would result in an $R^2$ of 1, a very good fit near 1, and a poor fit near 0. If the neural network model predictions are worse than one could predict by just using the mean of the sample case outputs, $R^2$ will be 0. Although not precisely interpreted in the same manner as the $R^2$ in regression modeling, nevertheless, the $R^2$ from neural network modeling can be used as an approximation when comparing model adequacy with a multiple regression model.

General Regression Neural Networks (GRNN) are known for their ability to train on sparse data sets. It has been found that GRNN gives better performance than backpropagation for some problems [26]. It is particularly useful for problems involving continuous function approximation [26]. A GRNN network is a three-layer network that contains one hidden neuron for each training pattern. GRNN training differs from backpropagation networks because training occurs in only one pass. A GRNN is capable of functioning after only a few training patterns have been entered.

8. Initial results with neural networks

The neural network models were developed with the 'NeuroShell 2' [27] computer program. The program implements several different types of neural network models. Initial model development focused on the use of backpropagation and general regression neural networks with the ungrouped data.

In the initial stage of neural network development it was concluded that neural networks created with ungrouped data do not provide acceptable results. $R^2$ values across seven alternative network architectures ranged from 0.13 to 0.45 with MSE values ranging from 3.92 to 6.2. It became necessary to transform the input data to obtain neural nets that compare favorably with the regression models. To summarize the early attempts with neural network modeling for SDR prediction, the results with ungrouped data are not good across a variety of architectures and different learning parameters. This led to a search for more refined modeling strategies.

9. Creation of neural network models with 'grouped' data

Neural network models for SDR prediction were also created with alternative data grouping strategies as previously outlined and the same training and production data sets as those used in the regression analysis. The SDR neural network models are presented in Table 4. Training times for the backpropagation models were insignificant. Since model 'fit' and prediction accuracy were deemed to be most important, $R^2$ and MSE were used to select the 'best' neural network configuration. The best data grouping strategies as determined from the regression analysis were similarly applied in neural network modeling. These neural network models can be used to predict the average number of SDRs by using a data grouping strategy of one-year time increments for the overall number of SDRs and for the number of corrosion cases. To predict the average number of SDRs for cracking cases, the data grouping strategy was based on increments of 4000 flight hours. In all cases, the MSE was lower on the training data than on the production data. Especially note that although the neural network for the corrosion case performed well on the training data ($R^2$ = 0.9411, MSE = 0.086), the MSE on the production data increased significantly (MSE = 3.125). It should also be observed that the model for corrosion cases had the smallest number of training and production patterns derived from data groupings with the smallest number of observations of the three models constructed. Thus this model should be used with caution on unfit data as it does not appear to generalize well.

As in regression modeling, 90% or 95% 'confidence intervals' could be developed for the overall number of

Table 4. SDR neural network models

                           No. of patterns         Backpropagation (BP) model (a)       Hybrid model (b)
Output                     Training   Production   R²       MSE          MSE            R²       MSE          MSE            n
                           data       data                  (training)   (production)            (training)   (production)
Overall no. of SDRs        16         14           0.9452   0.152        0.541          0.9603   0.110        2.626          4
No. of SDRs (cracking)     16         15           0.6899   0.009        0.409          0.8404   0.005        0.019          2
No. of SDRs (corrosion)    10         9            0.9411   0.086        3.125          0.9727   0.040        3.502          3

(a) For all BP models, inputs are Age, Fhr, and Ldg.
(b) For all hybrid models, inputs are Age, Fhr, Ldg, Class 1, …, Class n, where n is the number of class intervals.
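The hybrid-model inputs listed in footnote (b) can be illustrated with a short sketch. It uses the four class intervals that the text gives for the overall SDR model (class 1 for 0 ≤ S ≤ 2 up to class 4 for 6 < S ≤ 8). The function names, the example input values, and the one-hot (0/1 indicator) encoding of Class 1, …, Class n are assumptions made for illustration; the paper does not state how the class inputs are encoded, and in the actual hybrid model the class is produced by the first-stage PNN rather than computed from a known S.

```python
from bisect import bisect_left

# Upper bounds of the class intervals for the overall SDR model:
# class 1: 0 <= S <= 2, class 2: 2 < S <= 4, class 3: 4 < S <= 6, class 4: 6 < S <= 8.
OVERALL_UPPER_BOUNDS = [2.0, 4.0, 6.0, 8.0]


def sdr_class(s, upper_bounds=OVERALL_UPPER_BOUNDS):
    """Return the 1-based class index of an (average) SDR count s."""
    if s < 0 or s > upper_bounds[-1]:
        raise ValueError("SDR count outside the defined class intervals")
    # bisect_left places a value equal to a bound in the lower class,
    # which matches the closed upper ends of the intervals.
    return bisect_left(upper_bounds, s) + 1


def hybrid_input(age, fhr, ldg, class_index, n_classes=4):
    """Build [Age, Fhr, Ldg, Class 1, ..., Class n] for the second stage.

    The 0/1 indicator encoding of the classes is an assumption.
    """
    indicators = [1.0 if k == class_index else 0.0
                  for k in range(1, n_classes + 1)]
    return [float(age), float(fhr), float(ldg)] + indicators


# Hypothetical aircraft profile: 20 years old, 50000 flight hours, 60000 landings.
example = hybrid_input(20, 50000, 60000, sdr_class(4.7))
```

With n = 2 or n = 3 (the cracking and corrosion cases), only the list of upper bounds and `n_classes` would change; the bounds for those cases are not given in the text.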

SDRs and the number of SDRs for cracking and corrosion cases. These confidence intervals could be displayed in a fashion analogous to quality control charts serving as more refined 'alert' indicators for inspectors that specify upper and lower safety control limits by aircraft type.

Burke et al. [28] report on a two-stage neural network that models the relationship between the frequency of vibration of a beam and the correct control action for minimizing that vibration. Initially, backpropagation was tried as a modeling technique; however, the modeling results were poor. It has been documented [29] that neural networks often try to 'overfit' the training data. Burke et al. discovered that the opposite sometimes occurs: occasionally a way is needed to 'wrinkle' the data surface to facilitate a mapping from the independent variable(s) to the dependent variable.

In the beam deflection problem, Burke et al. develop a 'granular decomposition' of the problem by transforming it into a coarse or granular problem for the first-stage neural network and a refined problem for the second-stage network. To coarsen the problem, it is transformed by having a first-stage neural network classify input frequency to the beam so that it corresponds to one of seven output classes of voltage. A backpropagation neural network is then used to relate the vector of frequency and classes to the optimal voltage. Using the granular decomposition approach, Burke et al. report that an R² value of 0.99 is obtained for both the 160-pattern training set and the full 200-pattern training set relating frequency and class to the optimal voltage.

The concept of a two-stage hybrid neural network is tested in this research to develop SDR prediction models to determine whether any incremental improvements could be obtained in prediction accuracy. Table 4 also summarizes the results from these hybrid neural networks. The first stage uses a Probabilistic Neural Network (PNN) to classify the age of a DC-9 aircraft into its corresponding 'class' for the expected number of SDRs. A PNN is a supervised neural network that is used to train quickly on sparse data sets [30]. This neural network separates input patterns into some defined output categories. In the process of training, the PNN clusters patterns by producing activations in the output layer. The values of the activations correspond to the probability mass function estimate for that category. It was thought that the use of a PNN in this study could be helpful in 'wrinkling' the SDR data and facilitating the classification of SDRs based upon an input profile of aircraft data.

For the overall SDR prediction model, the PNN is used in this study to classify the number of SDRs into one of four classes: class 1 for 0 ≤ S ≤ 2, class 2 for 2 < S ≤ 4, class 3 for 4 < S ≤ 6, and class 4 for 6 < S ≤ 8, where S represents the number of SDRs. The PNN is used in the first stage to classify the age of a DC-9 aircraft into its corresponding class for expected number of SDRs. This vector of age and class then is fed into a backpropagation neural network to predict the number of SDRs. The second stage then feeds the classified output along with the above quantitative data to a backpropagation neural network to predict the number of SDRs. As with multiple regression, models were developed to predict the overall number of SDRs and the number of SDRs for cracking and corrosion cases. For the SDR cracking and corrosion cases, only two and three 'classes' were required, respectively, given the range for the number of SDRs in each case.

In all SDR cases, prediction results with the hybrid models were better on the training data than from solely using a three-layer backpropagation architecture. However, the MSEs from the production data improved only in the cracking case. Further investigations are required with larger data sets to determine the extent of the benefits of a two-stage approach, as the training time significantly increases with the hybrid model.

10. Conclusions and recommendations

As a result of this research, new SDR alert indicators that can have confidence intervals have been developed to predict the number of SDRs in the overall, cracking, and corrosion cases for the DC-9 aircraft. These refined indicators offer significant improvements over the planned SDR indicator, which simply represents a count of the total number of SDRs for any given period. For the multiple regression analysis, the modeling strategy of grouping the data on the basis of age, flight hours, or number of landings to predict the average number of SDRs yielded better results than using the ungrouped data. Initial results from ungrouped data with backpropagation neural network models across a wide range of hidden nodes, learning rates, momentum parameters, and training strategy choices were poor. However, neural network models with the data groupings from the regression analysis achieved a marked improvement in SDR prediction results.

When a hybrid neural network model was created that used two stages, the SDR prediction accuracy on the training data improved slightly for the models to predict the overall number of SDRs and for cracking cases when compared with a three-layer backpropagation architecture. However, results with the production data were mixed. Three-layer backpropagation models were also developed with the independent variables determined from the regression analysis for each SDR case. However, the prediction accuracy improved on the corrosion model only for the training data, and improved with the production data for the cracking model only when compared with the backpropagation models using the original inputs of age, flight hours, and number of landings.

The 'best' models compared across modeling methods are identified in Table 5. The 'best' models for each case
Table 5. Summary of 'best' SDR prediction models across methods

                                   SDRs overall                       SDRs cracking                      SDRs corrosion
Modeling method                    R²         MSE        MSE          R²         MSE        MSE          R²         MSE        MSE
                                   Training   Training   Production   Training   Training   Production   Training   Training   Production
                                   data       data       data         data       data       data         data       data       data
Regression                         0.9297     0.1953     0.9219 (b)   0.7899     0.0061     0.0265 (b)   0.9780     0.0321     1.6918 (a)
Backpropagation neural             0.9452     0.1520     0.5410 (a)   0.6899     0.0090     0.4090       0.9411     0.0860     3.1250 (b)
  network (BPNN)
BPNN (with predictor               0.9318     0.1890     1.0170       0.6918     0.0090     0.0290       0.9806     0.0290     4.7030
  variables from regression)
Hybrid neural network              0.9603     0.1100     2.6260       0.8404     0.0050     0.0190 (a)   0.9727     0.0400     3.5020

(a) First choice.
(b) Second choice.
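As a rough check on how the first and second choices in Table 5 follow from the selection criterion (prediction accuracy on the production data), the production MSE values can be ranked for each case. The sketch below uses only the values from Table 5; the dictionary layout, the shortened method labels, and the function name are illustrative assumptions.

```python
# Production-data MSE values copied from Table 5.
PRODUCTION_MSE = {
    "overall": {
        "Regression": 0.9219,
        "BPNN": 0.5410,
        "BPNN (regression predictors)": 1.0170,
        "Hybrid": 2.6260,
    },
    "cracking": {
        "Regression": 0.0265,
        "BPNN": 0.4090,
        "BPNN (regression predictors)": 0.0290,
        "Hybrid": 0.0190,
    },
    "corrosion": {
        "Regression": 1.6918,
        "BPNN": 3.1250,
        "BPNN (regression predictors)": 4.7030,
        "Hybrid": 3.5020,
    },
}


def rank_methods(case):
    """Return the modeling methods for one SDR case, lowest production MSE first."""
    scores = PRODUCTION_MSE[case]
    return sorted(scores, key=scores.get)


for case in PRODUCTION_MSE:
    first, second = rank_methods(case)[:2]
    print(f"{case}: first choice = {first}, second choice = {second}")
```

Ranking by production MSE alone reproduces the choices marked in the table, with regression ranked first or second in every case.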

were selected on the basis of prediction accuracy with the production data. For predicting the overall number of SDRs, the three-layer backpropagation model performs the best. To predict the number of SDR cracking cases, the two-stage hybrid neural network is selected, and a regression model is selected as the 'best' method to predict the number of SDR corrosion cases. However, an analysis of Table 5 reveals that the regression models are strong second choices with respect to prediction accuracy. Moreover, the regression models typically take less time to develop than neural network models and there is a rich theory for testing regression model adequacy. The modest improvements in predictive accuracy from using a neural network in this SDR study do not seem to support the extra 'costs' of computational time and modeling effort required to find a neural network that can outperform a regression model.

In this study, the information gained from regression analysis regarding the 'best' data grouping strategies helped to improve the performance of a neural network model. However, the use of regression analysis to identify the 'best' set of explanatory variables to use as inputs to a neural network needs further investigation.

In this study to develop national SDR prediction models, the original ungrouped data set appeared to be noisy. A 'population concept' proved to be a very effective modeling technique both for regression analysis and in the construction of neural networks for determining strategic safety inspection indicators. An important technical issue in using population modeling techniques has to do with failure prediction for parts that have been repaired or replaced. The failure rates of a new part in an old aircraft and new versus repaired parts will affect the inherent characteristics of the aircraft population. One modeling approach is to assume that a repair returns the part to the condition that it was in just before failure, so that the part can remain in the same original population. However, this assumption might not be realistic, and repaired parts might need to be modeled with a separate grouping with new population characteristics. For a detailed discussion of preventive maintenance in population models, see Agee and Gallion [31].

While the population concept is constructive for developing models to predict national norms for SDR reporting, there is a loss of information in grouping the data. It is recognized that SDR reporting profiles will vary by differences in flying patterns, airlines, location, fleet size, etc. Research is continuing with alternative FAA data sources to identify the variability inherent in these underlying factors and its contribution to understanding national SDR reporting profiles. The current methodology could be extended, for example, to develop SDR prediction models for the DC-9 aircraft for peer classes of airlines based on fleet size. The CMAS research team is gaining access to larger FAA data sources that will enable the construction of SDR prediction models by aircraft type and by airline. The definition of the classes could be refined to depend on the size of the population of aircraft of a given age owned by an individual airline.

This CMAS research has already been expanded to include the use of neural networks to predict maintenance requirements for individual components of aging aircraft [32] and to develop reliability curve fitting techniques for constant and monotonically increasing hazard rates [33]. The use of an expert system that uses Bayesian probability theory to handle uncertainty in safety diagnostic procedures is also being investigated. The research findings for the SDR prediction models coupled with the design of an expert system for inspection diagnostics lead to the development of an integrated decision support system for FAA safety inspectors. Such a decision support system for aircraft safety could be used for inspection workload planning and scheduling, for monitoring specific part locations and for forecasting part removal dates for repair, overhaul, etc., based upon an aircraft's 'inspection profile' as characterized by its associated population characteristics.

Acknowledgements

We acknowledge the support of the Federal Aviation Administration's Safety Performance Analysis System (SPAS) program and Mr John Lapointe and Mr Michael Vu. In addition, we are grateful for the comments and suggestions from Professor Candace Yano.

This article is based on research performed at Rutgers University. The contents of this paper reflect the view of the authors who are solely responsible for the accuracy of the facts, analyses, conclusions, and recommendations presented herein, and do not necessarily reflect the official view or policy of the Federal Aviation Administration.

References

[1] Ascher, H.E. and Feingold, H. (1984) Repairable Systems Reliability (Lecture Notes in Statistics, vol. 7), Marcel Dekker, New York.
[2] Safety Performance Analysis Subsystem (1992) Functional Description Document, U.S. Department of Transportation, Volpe National Transportation Systems Center, Cambridge, MA, March.
[3] Safety Performance Analysis Subsystem (1992) Prototype Concept Document, U.S. Department of Transportation, Volpe National Transportation Systems Center, Cambridge, MA, April.
[4] Safety Performance Analysis Subsystem (1992) Indicators Suggested for SPAS Prototype (DRAFT), U.S. Department of Transportation, Volpe National Transportation Systems Center, Cambridge, MA, August.
[5] Safety Performance Analysis Subsystem (1992) Continuing Analysis: Indicator Graphs and Tables, U.S. Department of Transportation, Volpe National Transportation Systems Center, Cambridge, MA, October.
[6] Safety Performance Analysis Subsystem (1992) Continuing Analysis: Additional Indicator Definitions (DRAFT), U.S. Department of Transportation, Volpe National Transportation Systems Center, Cambridge, MA, October.
[7] Rice, R.C. (1991) Repair database assessment. Battelle Summary Report (Contract no. DTRS-57-89-C-00006).
[8] Brammer, K.W. (1985) A transient state maintenance requirements planning model. M.S. thesis, Virginia Polytechnic Institute and State University.
[9] Fabrycky, W.J. (1981) Logistics systems design using finite queueing analysis, in Proceedings of the International Logistics Congress, pp. II-65–II-71.
[10] Fabrycky, W.J., Malmborg, C.J., Moore, T.P. and Brammer, K.W. (1984) Repairable equipment population systems (REPS) demonstrator user's guide (IBM-PC). Virginia Polytechnic Institute and State University.
[11] Frisch, F. (1993) Mortality and spareparts: a conceptual analysis, in Proceedings of the 1983 Federal Acquisition Research Symposium, pp. 467–480.
[12] Luxhøj, J.T. and Jones, M.S. (1988) A computerized population model for system repair/replacement. Computers and Industrial Engineering, 14 (3), 345–359.
[13] Luxhøj, J.T. and Rizzo, T.P. (1988) Probabilistic spares provisioning for population models. Journal of Business Logistics, 9 (1), 95–117.
[14] Luxhøj, J.T. (1991) Sensitivity analysis of maintained systems using a population model: a case study. International Journal of Quality and Reliability Management, 8 (1), 56–70.
[15] Luxhøj, J.T. (1991) Importance measures for system components in population models. International Journal of Quality and Reliability Management, 8 (2), 58–69.
[16] Luxhøj, J.T. (1992) Replacement analysis for components of large scale production systems. International Journal of Production Economics, 27, 97–110.
[17] Pindyck, R.S. and Rubinfeld, D.L. (1991) Econometric Models and Economic Forecasts, McGraw-Hill, New York.
[18] Hair, J.F., Anderson, R.E., Tatham, R.L. and Black, W.C. (1992) Multivariate Data Analysis, 3rd edn, Macmillan, New York.
[19] Hoaglin, D.C., Mosteller, F. and Tukey, J.W. (1983) Understanding Robust and Exploratory Data Analysis, John Wiley and Sons, New York.
[20] McCulloch, W.S. and Pitts, W. (1943) A logical calculus of ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133.
[21] Hopfield, J.J. (1982) Neural networks and physical systems with emergent collective abilities. Proceedings of the National Academy of Science, 79, 2554–2558.
[22] Hopfield, J.J. (1984) Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Science, 81, 3088–3092.
[23] Simpson, P. (1990) Artificial Neural Systems, Pergamon Press, New York.
[24] Wasserman, P. (1989) Neural Computing: Theory and Practice, Van Nostrand Reinhold, New York.
[25] Caudil, M. (1991) Neural network training tips and techniques. AI Expert, 6 (1), 56–61.
[26] Specht, D. (1991) A general regression neural network. IEEE Transactions on Neural Networks, 2 (6), 568–576.
[27] 'NeuroShell 2' (1993) Ward Systems Group, Frederick, MD.
[28] Burke, L.I., Vaithyanathan, S. and Flanders, S.W. (1993) A hybrid neural network approach to beam vibration minimization. Technical Paper, Department of Industrial Engineering, Lehigh University.
[29] Weigend, A., Huberman, B.A. and Rumelhart, D.E. (1990) Predicting the future: a connectionist approach. International Journal of Neural Systems, 1 (3), 193–210.
[30] Specht, D. (1990) Probabilistic neural networks. Neural Networks, 3, 109–118.
[31] Agee, M.H. and Gallion, M.S. (1986) Simulation of population maintenance requirements. Volume II of a Final Report, Research Contract No. N00039-84-C-0346 entitled Algorithmic Development and Testing of Spare Parts Mortality/Support Systems Software, Naval Electronics System Command.
[32] Shyur, H.-J., Luxhøj, J.T. and Williams, T.P. (1996) Using neural networks to predict component inspection requirements for aging aircraft. Computers and Industrial Engineering, 30 (2), 257–267.
[33] Luxhøj, J.T. and Shyur, H.-J. (1995) Reliability curve fitting for aging helicopter components. Reliability Engineering and System Safety, 46, 229–234.

Biographies

James T. Luxhøj is Associate Professor of Industrial Engineering at Rutgers University. He completed his Ph.D. in Industrial Engineering and Operations Research from Virginia Polytechnic Institute and State University. Dr Luxhøj serves as a Department Editor for IIE Transactions on Operations Engineering, as an IIE Faculty Advisor, and is a senior member of IIE. He is a past Director of the Engineering Economy Division of IIE. He is a recipient of the SAE Teetor Award for Engineering Education Excellence. His research interests include systems maintenance and reliability, production economics, and intelligent decision systems. He is a member of Tau Beta Pi and Alpha Pi Mu.
Trefor P. Williams is Associate Professor of Civil Engineering at Rutgers University. He received his Ph.D. in Civil Engineering from Georgia Institute of Technology. Dr Williams's research interests include the application of artificial intelligence to transportation and highway engineering problems, construction management, and traffic engineering. He is a member of the American Society of Civil Engineers and is a registered professional engineer.

Huan-Jyh Shyur received his Ph.D. in Industrial Engineering from Rutgers University. He has a B.S. degree in Industrial Engineering from Tunghai University, Taiwan. He received his M.S. degree in Industrial and Systems Engineering from National Chiao-Tung University, Taiwan. His research interests include accelerated life testing, quality and reliability, nonparametric methods, neural network models, and aircraft safety. He is employed with Crown Communications, an FAA contractor.
