Extreme Learning Machine as Maintainability Prediction model for Object-Oriented Software Systems


JOURNAL OF COMPUTING, VOLUME 2, ISSUE 8, AUGUST 2010, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG

S. O. Olatunji*, Z. Rasheed*, K. A. Sattar*, A. M. Al-Mana*, M. Alshayeb*, E. A. El-Sebakhy#

* Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
# Senior Scientist of Artificial Intelligence & Data Mining in Business and Health Science, MEDai Inc., an Elsevier Company, Millenia Park One, 4901 Vineland Road, Suite 450, Orlando, Florida 32811, USA

Abstract— As the number of object-oriented software systems increases, it becomes more important for organizations to maintain those systems effectively. However, only a small number of maintainability prediction models are currently available for object-oriented systems. In this paper, we develop an extreme learning machine (ELM) maintainability prediction model for object-oriented software systems. The model is based on the extreme learning machine algorithm for single-hidden layer feed-forward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. The model is constructed using popular object-oriented metric datasets collected from different object-oriented systems. The prediction accuracy of the model is evaluated and compared with commonly used regression-based models, and also with a Bayesian-network-based model that was earlier developed using the same datasets. Empirical results from the simulation show that our ELM-based model produces promising results, with prediction accuracy measures better than most of the other models earlier implemented on the same datasets.

Index Terms— Software maintainability, Extreme Learning Machines, Bayesian Network, Regression

I. INTRODUCTION

Software maintainability is defined as the ease of finding and correcting errors in the software [1]. It is analogous to the hardware quality of Mean-Time-To-Repair, or MTTR. While there is as yet no way to directly measure or predict software maintainability, there is a significant body of knowledge about software attributes that make software easier to maintain. These include modularity, self (internal) documentation, code readability, and structured coding techniques. These attributes also improve sustainability, the ability to make improvements to the software [1].

It is arguable that many object-oriented (OO) software systems are currently in use. It is also arguable that the growing popularity of OO programming languages, such as Java, as well as the increasing number of software development tools supporting the Unified Modeling Language (UML), encourages more OO systems to be developed at present and in the future. Hence it is important that those systems are maintained effectively and efficiently. A software maintainability prediction model enables organizations to predict the maintainability of a software system and assists them in managing and planning their maintenance resources.

In addition, if an accurate maintainability prediction model is available for a software system, a defensive design can be adopted, which would reduce the future maintenance effort of the system. Maintainability of a software system can be measured in different ways: as the number of changes made to the code during a maintenance period, or as the effort needed to make those changes [1][5]. The predictive model is called a maintenance effort prediction model if maintainability is measured as effort.

In this research we developed a maintainability prediction model for an object-oriented software system based on the recently introduced learning algorithm called extreme learning machine (ELM) for single-hidden layer feed-forward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide good generalization performance at extremely fast learning speed. The experimental results found in the literature, based on a few artificial and real benchmark function approximation and classification problems including very large complex applications, and particularly the empirical results from this study, demonstrate that the ELM can produce good generalization performance in most cases and can learn thousands of times faster than conventional popular learning algorithms for feed-forward neural networks.

Despite the importance of software maintenance, it is unfortunate that little work has been done on developing predictive models for software maintainability, particularly for object-oriented software systems, which is evident in the small number of software maintainability prediction models currently available in the literature.

We have developed a maintainability prediction model for an object-oriented software system based on the recently introduced learning algorithm called extreme learning machine (ELM). Implementation was carried out on representative datasets related to the target systems. Furthermore, we performed a comparative analysis between our model and the models presented by Koten and Gray [5], which include regression-based and Bayesian-network-based models, in terms of their performance measure values, as recommended in the literature.

Furthermore, the usefulness of Extreme Learning Machines in the area of software engineering and, in particular, maintainability prediction for an object-oriented software


system, has been made clearer by describing both the steps and the use of Extreme Learning Machines as an artificial intelligence modeling approach for predicting the maintainability of object-oriented software systems.

The rest of this paper is organized as follows. Section II contains a review of related earlier works. Section III discusses background information that includes software maintainability, some prediction modeling techniques, and also describes the main modeling technique used: extreme learning machine (ELM). Section IV presents the OO software datasets used in our study and their description; it also contains data analysis based on skewness and the types of skewness. Section V presents the research approach. Section VI contains the model evaluation, prediction accuracy measures, empirical results, comparison with other models, and discussion. Section VII concludes the paper and outlines directions for future work.

II. RELATED WORK

Many object-oriented software maintainability prediction models were developed in the last decade; however, most of them suffer from low prediction accuracies [9]. In fact, most of the research work found in the literature related to software maintainability prediction of object-oriented software products makes use of either statistical models or artificial neural networks, though some recent work has been done using other artificial intelligence techniques such as Bayesian Belief Networks (BBN) [5] and Multivariate Adaptive Regression Splines (MARS) [9].

Regression techniques have been thoroughly utilized to predict the maintainability of object-oriented software systems [7][17].

In addition, several types of artificial neural networks have also been employed in predicting the maintainability effort of object-oriented software systems. The Ward neural network and the General Regression neural network (GRNN) were used to predict the maintainability effort of object-oriented software systems using object-oriented metrics [13]. On the other hand, the Back-Propagation Multi-Layer Perceptron (BP-MLP) has been used to predict faulty classes in object-oriented software [11]. In the same research work, radial basis function networks (RBF) were used to predict the type of fault a faulty class has.

Bayesian Belief Networks (BBN) were suggested as a novel approach for software quality prediction by Fenton et al. [12] and Pearson [20]. They built their conjecture on Bayesian Belief Networks' ability to handle uncertainties, incorporate expert knowledge, and model the complex relationships among variables. However, a number of researchers [2][10] have identified several limitations of Bayesian Belief Networks when they are applied as a model for object-oriented software quality and maintainability prediction. Recently, Koten and Gray used a special type of Bayesian Belief Network called the Naive-Bayes classifier [5] to implement a Bayesian-Belief-Networks-based software maintainability prediction model. Although their results showed that their model gives better results than regression-based techniques for some datasets, the model is still inferior to regression-based techniques for some other datasets.

Zhou and Leung used Multivariate Adaptive Regression Splines (MARS) to predict object-oriented software maintainability [9]. In their research work, they compared MARS with multivariate linear regression (MLR), artificial neural networks (ANN), regression trees (RT), and support vector machines (SVM). However, their MARS-based technique does not outperform the compared techniques for all the datasets they used.

III. BACKGROUND

This section discusses software maintenance and its types, models previously used to predict software maintainability, and Extreme Learning Machine, the novel approach used in this context.

A. Software Maintenance

Software maintenance is the process of modifying a software product after delivery to correct faults, to improve performance or other attributes, or to adapt the product to a changed environment [6]. Maintaining and enhancing the reliability of software during maintenance requires that software engineers understand how the various components of a design interact. People usually think of software maintenance as beginning when the product is delivered to the client. While this is formally true, in fact decisions that affect the maintenance of the product are made from the earliest stage of design.

Software maintenance is classified into four types: corrective, adaptive, perfective and preventive [1]. Corrective maintenance refers to fixing a program. Adaptive maintenance refers to modifications that adapt to changes in the data environment, such as new product codes, new file organization, or changes in the hardware or software environments. Perfective maintenance refers to enhancements: making the product better, faster, smaller, better documented, cleaner structured, or with more functions or reports. Preventive maintenance is the work that is done in order to try to prevent malfunctions or improve maintainability.

When a software system is not designed for maintenance, it exhibits a lack of stability under change: a modification in one part of the system has side effects that ripple throughout the system. Thus, the main challenges in software maintenance are to understand existing software and to make changes without introducing new bugs.

B. Regression Based Models

Regression models are used to predict one variable from one or more other variables. They provide the scientist with a powerful tool, allowing predictions about past, present, or future events to be made using information about past or present events.

1) Multiple Linear Regression Model

Multiple linear regression attempts to model the relationship between two or more explanatory variables and a


response variable by fitting a linear equation to observed data. Every value of the independent variable x is associated with a value of the dependent variable y. The regression line for p explanatory variables x_1, x_2, …, x_p is defined to be

y = β_0 + β_1 x_1 + β_2 x_2 + … + β_p x_p.

This line describes how the mean response μ_y changes with the explanatory variables. The observed values of y vary about their means μ_y and are assumed to have the same standard deviation σ. Formally, the model for multiple linear regression given n observations is

y_i = β_0 + β_1 x_i1 + β_2 x_i2 + … + β_p x_ip + ε_i,  for i = 1, 2, …, n,

where ε_i denotes the model deviation.

One approach to simplifying multiple regression equations is the stepwise procedures. These include forward selection, backward elimination, and stepwise regression. They add or remove variables one at a time until some stopping rule is satisfied.

2) Forward selection

Forward selection starts with an empty model. The variable that has the smallest P value when it is the only predictor in the regression equation is placed in the model. Each subsequent step adds the variable that has the smallest P value in the presence of the predictors already in the equation. Variables are added one at a time as long as their P values are small enough, typically less than 0.05 or 0.10.

3) Backward elimination

Backward elimination starts with all of the predictors in the model. The least significant variable, that is, the one with the largest P value, is removed and the model is refitted. Each subsequent step removes the least significant variable in the model until all remaining variables have individual P values smaller than some value, such as 0.05 or 0.10.

4) Stepwise regression

This approach is similar to forward selection except that variables are removed from the model if they become non-significant as other predictors are added.

Backward elimination has an advantage over forward selection and stepwise regression because it is possible for a set of variables to have considerable predictive capability jointly even when no individual variable does. Forward selection and stepwise regression will fail to identify such sets because the variables do not predict well individually, whereas backward elimination starts with everything in the model, so their joint predictive capability will be seen.

C. Bayesian Networks

A Bayesian network consists of nodes interconnected by directed links forming a directed acyclic graph. In this graph, nodes represent random variables (RVs) and links correspond to direct probabilistic influences. The RVs correspond to important attributes of the modeled system, exemplifying the system's behavior. A directed connection between two nodes indicates a causal effect between the RVs associated with those nodes.

The structure of the directed acyclic graph states that each node is independent of all its non-descendants conditioned on its parent nodes. In other words, the Bayesian network represents the conditional probability distribution P(Y | X_1, …, X_n), which is used to quantify the strength of the influence of the variables X_i on the variable Y. The nodes X_i are called the parents of Y, and Y is called a child of each X_i. It should be noted that the outcomes of the events for the variables X_i have an influence on the outcome of the event Y.

D. Extreme Learning Machine

In general, the learning of feed-forward neural networks (FFNN) is more time-consuming than required. Due to this property, FFNN have become a bottleneck in their applications, limiting their scalability. According to [4], there are two main reasons for this behavior: one is the slow gradient-based learning algorithms used to train neural networks (NN), and the other is the iterative tuning of the parameters of the networks by these learning algorithms. To overcome these problems, [2][4] propose a learning algorithm called extreme learning machine (ELM) for single-hidden layer feed-forward neural networks (SLFNs), which randomly selects the input weights and analytically determines the output weights of SLFNs. It is stated that "In theory, this algorithm tends to provide the best generalization performance at extremely fast learning speed" [4].

This is remarkable because, in the past, there seemed to exist an unbreakable virtual speed barrier which classic learning algorithms could not break through, so feed-forward networks implementing them take a very long time to train, whether the application is simple or complex. The ELM also tends to reach the minimum training error while considering the magnitude of the weights, in contrast to classic gradient-based learning algorithms, which only intend to reach the minimum training error and do not consider the magnitude of the weights. Also, unlike classic gradient-based learning algorithms, which only work for differentiable activation functions, the ELM learning algorithm can be used to train SLFNs with non-differentiable activation functions. According to [4], "Unlike the traditional classic gradient-based learning algorithms facing several issues like local minimum, improper learning rate and over-fitting, etc, the ELM tends to reach the solutions straightforward without such trivial issues".

The ELM has several interesting and significant features that distinguish it from traditional popular gradient-based learning algorithms for feed-forward neural networks. These include the following.

The learning speed of ELM is extremely fast.


In simulations reported by Huang et al. [2], the learning phase of ELM can be completed in seconds, or less than a second, for many applications. Previously, it seemed that there existed a virtual speed barrier which most (if not all) classic learning algorithms could not break through, and it was not unusual to take a very long time to train a feed-forward network using classic learning algorithms even for simple applications.

The ELM has better generalization performance than gradient-based learning, such as back-propagation, in most cases.

The traditional classic gradient-based learning algorithms and some other learning algorithms may face several issues like local minima, improper learning rate, over-fitting, etc. In order to avoid these issues, methods such as weight decay and early stopping may often need to be used with these classical learning algorithms. The ELM tends to reach the solutions straightforwardly without such trivial issues. The ELM learning algorithm also looks much simpler than most learning algorithms for feed-forward neural networks.

Unlike the traditional classic gradient-based learning algorithms, which only work for differentiable activation functions, the ELM learning algorithm can be used to train SLFNs with many non-differentiable activation functions [3].

1) How the Extreme Learning Machine Algorithm Works

Let us first define the standard SLFN (single-hidden layer feed-forward neural network). If we have N samples (x_j, t_j), where x_j = [x_j1, x_j2, …, x_jn]^T ∈ R^n and t_j = [t_j1, t_j2, …, t_jm]^T ∈ R^m, then the standard SLFN with Ñ hidden neurons and activation function g(x) is defined as

Σ_{i=1..Ñ} β_i g(w_i · x_j + b_i) = o_j,  j = 1, …, N,

where w_i = [w_i1, w_i2, …, w_in]^T is the weight vector that connects the ith hidden neuron and the input neurons, β_i = [β_i1, β_i2, …, β_im]^T is the weight vector that connects the ith hidden neuron and the output neurons, and b_i is the threshold of the ith hidden neuron. The "·" in w_i · x_j denotes the inner product of w_i and x_j.

The SLFN aims to minimize the difference between o_j and t_j. This can be expressed mathematically as

Σ_{i=1..Ñ} β_i g(w_i · x_j + b_i) = t_j,  j = 1, …, N,

or, more compactly, as

Hβ = T,

where

H(w_1, …, w_Ñ, b_1, …, b_Ñ, x_1, …, x_N) =
[ g(w_1 · x_1 + b_1)  …  g(w_Ñ · x_1 + b_Ñ) ]
[         ⋮          ⋱          ⋮           ]
[ g(w_1 · x_N + b_1)  …  g(w_Ñ · x_N + b_Ñ) ]  (N × Ñ),

β = [β_1^T, …, β_Ñ^T]^T (Ñ × m) and T = [t_1^T, …, t_N^T]^T (N × m).

As proposed by Huang and Babri (1998), H is called the hidden layer output matrix of the neural network.

According to Huang et al. (2004), the ELM algorithm works as follows. Given a training set {(x_j, t_j) | x_j ∈ R^n, t_j ∈ R^m, j = 1, …, N}, an activation function g(x), and a hidden neuron number Ñ, do the following:

1. Assign random values to the input weights w_i and the biases b_i, i = 1, …, Ñ.
2. Compute the hidden layer output matrix H.
3. Compute the output weight β as β = H†T,

where H† is the Moore-Penrose generalized inverse of H, and β, H and T are defined as in the SLFN specification above.
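The three steps above can be sketched in a few lines of numpy. This is an illustrative sketch, not the authors' code: the sigmoid activation, the Gaussian initialization and the fixed seed are assumptions, and numpy's pinv supplies the Moore-Penrose inverse H†.

```python
# Illustrative ELM sketch (assumptions: sigmoid activation, Gaussian random weights).
import numpy as np

def elm_train(X, T, n_hidden, seed=0):
    """Steps 1-3 of the ELM algorithm: random hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # step 1: random input weights w_i
    b = rng.standard_normal(n_hidden)                # step 1: random biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # step 2: hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T                     # step 3: beta = H†T (Moore-Penrose)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Forward pass of the trained SLFN: H(X) @ beta."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because H†T is the least-squares solution of Hβ = T, the training targets are reproduced closely once enough hidden neurons are used, with no iterative tuning at all.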


IV. DATASETS DESCRIPTION

In this work, we made use of the OO software datasets published by Li and Henry [7]. The datasets consist of five C&K metrics: DIT, NOC, RFC, LCOM and WMC; four L&H metrics: MPC, DAC, NOM and SIZE2; as well as SIZE1, which is a traditional lines-of-code size metric. The metric data were collected from a total of 110 classes in two OO software systems: the User Interface Management System (UIMS) and the Quality Evaluation System (QUES). The code was written in Classical-Ada. The UIMS and QUES datasets contain 39 classes and 71 classes, respectively. Li and Henry measure maintainability with the CHANGE metric, by counting the number of lines in the code that were changed during a three-year maintenance period. Neither the UIMS nor the QUES dataset contains actual maintenance effort data. The same datasets are also used by other researchers [5][6][7]. The description of each metric is shown in Table I.

TABLE I
DATASET METRICS DESCRIPTION

Name   Description
DIT    Depth of the inheritance tree (= inheritance level number of the class, 0 for the root class)
NOC    Number of children (= number of direct sub-classes that the class has)
MPC    Message-passing coupling (= number of send statements defined in the class)
RFC    Response for a class (= total of the number of local methods and the number of methods called by local methods in the class)
LCOM   Lack of cohesion of methods (= number of disjoint sets of local methods, i.e. number of sets of local methods that do not interact with each other, in the class)
DAC    Data abstraction coupling (= number of abstract data types defined in the class)
WMC    Weighted method per class (= sum of McCabe's cyclomatic complexity of all local methods in the class)
NOM    Number of methods (= number of local methods in the class)
SIZE1  Lines of code (= number of semicolons in the class)
SIZE2  Number of properties (= total of the number of attributes and the number of local methods in the class)

A. Data Analysis based on Skewness

Definition: Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Skewness tells us about the direction of variation of the data set.

Mathematical Expression: The skewness of a random variable X is defined as

Skewness = (1/N) Σ_{i=1..N} (X_i − μ)³ / σ³

where X_i are the observations of the random variable, μ is the mean, σ is the standard deviation and N is the total number of observations.

B. Types of Skewness

There are usually three types of skewness: right, normal and left [21]. A distribution is said to be right-skewed if the right tail is longer and the mass of the distribution is concentrated on the left. Similarly, a distribution is said to be left-skewed if the left tail is longer and the mass of the distribution is concentrated on the right. Finally, if the skewness is nearly equal to zero, the data are normally distributed throughout the range. The acceptable range for normality is a skewness lying between -1 and 1. Figure 1 shows the three types of skewness (right: skew > 0, normal: skew ≈ 0, left: skew < 0).

In our experiment, we generated the skew graphs for some of our dataset fields. Two of the observations are shown in Figure 2 and Figure 3.

V. RESEARCH APPROACH

For each dataset, the available data was divided into two parts. One part was used as a training set for constructing a maintainability prediction model. The other part was used


for testing to determine the prediction ability of the devel- mean magnitude of relative error (MMRE):

oped model. Although there are many different ways to split

a given dataset, we have chosen to use the stratify sampling 1 n

approach in breaking the datasets due to its ability to break MMRE MREi

n i 1

data randomly with a resultant balanced division based on

the supplied percentage. The division, for instance could be

70% for training set and 30% for testing set. In this work, According to [14], Pred is a measure of what proportion

we selected 70% of the data for building the model (internal of the predicted values have MRE less than or equal to a

validation) and 30% of the data for testing/ validation (ex- specified value, given by:

ternal validation or cross-validation criterion). We repeated

both internal and external validation processes for 1000 Pred (q) = k / n

times to have a fair partition through the entire process oper- where q is the specified value, k is the number of cases

ations. whose MRE is less than or equal to q and n is the total num-

It was ensured that the same percentage of division are ber of cases in the dataset.

used, for each of the two dataset, in order to maintain the According to [15] and [8], in order for an effort prediction

comparability of prediction accuracy of the two datasets by model to be considered accurate, MMRE < 0.25 and/or ei-

using the same proportion of the sample cases for learning ther pred(0.25) > 0.75 or pred(0.30) > 0.70. These are the

and testing. suggested criteria in literature as far as effort prediction is

We also evaluated and compared our developed model concerned.

with other OO software maintainability prediction models,

cited earlier, quantitatively, using the prediction accuracy B. Empirical Results, Comparison and Discus-

measures recommended in the literatures: absolute residual sion

(Ab.Res.), the magnitude of relative error (MRE) and the

As stated earlier, In order to conduct a valid comparison,

proportion of the predicted values that have MRE less than

our model, ELM, was obtained by training it on exactly the same training set and evaluated on the same testing set samples as used in the previous works cited earlier, particularly as contained in [5]. The models were compared using the absolute residuals, the magnitude of relative error and the percentage of predictions with MRE less than or equal to a specified value suggested in the literature (pred measures). Details of all these measures of performance are provided in the next section. The tables and figures below show the results of our newly developed model in comparison with the other, earlier models used on the same datasets.

VI. MODEL EVALUATION

A. Prediction accuracy measures

We compared the software maintainability prediction models using the following prediction accuracy measures: the absolute residual (Ab.Res.), the magnitude of relative error (MRE) and the pred measures.

The Ab.Res. is the absolute value of the residual, evaluated by:

Ab.Res. = abs(actual value − predicted value)

We used the sum of the absolute residuals (Sum Ab.Res.), the median of the absolute residuals (Med.Ab.Res.) and the standard deviation of the absolute residuals (SD Ab.Res.). The Sum Ab.Res. measures the total residuals over the dataset. The Med.Ab.Res. measures the central tendency of the residual distribution; it is chosen as the measure of central tendency because the residual distribution is usually skewed in software datasets. The SD Ab.Res. measures the dispersion of the residual distribution.

MRE is a normalized measure of the discrepancy between actual values and predicted values, given by:

MRE = abs(actual value − predicted value) / actual value

The Max.MRE measures the maximum relative discrepancy, which is equivalent to the maximum error relative to the actual effort in the prediction. The mean of MRE, the MMRE, measures the average relative discrepancy over the dataset, and pred(q) is the fraction of predictions whose MRE is at most q.

1) Results from QUES dataset

Table II shows the values of the prediction accuracy measures achieved by each of the maintainability prediction models for the QUES dataset. Recall, as quoted earlier, that for an effort prediction model to be considered accurate, MMRE < 0.25 and/or either pred(0.25) > 0.75 or pred(0.30) > 0.70 [15][8]. Hence, the closer a model's values are to these baselines, the better. Table II shows that the extreme learning machine model achieved an MMRE of 0.3502, a pred(0.25) of 0.368 and a pred(0.30) of 0.380. Thus, the ELM is the only model that comes very close to the required MMRE value of 0.25; hence it is the best in terms of MMRE and also in terms of the sum of absolute residuals. It comes very close to satisfying the MMRE criterion of an accurate prediction, and in terms of the other measures it competes favorably with the other models.

In comparison with the UIMS dataset, the MMRE value of 0.3502 is better, while the pred(0.25) and pred(0.30) values are poorer. This indicates that the performance of extreme learning machine models may vary depending on the characteristics of the dataset and/or on which prediction accuracy measure is used.
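The measures defined above are straightforward to compute. A minimal sketch in Python; the function name and the use of the standard statistics module are our choices, not from the paper:

```python
# Sketch of the accuracy measures defined above: absolute-residual statistics
# (Sum/Med./SD Ab.Res.), MRE, Max.MRE, MMRE and the pred(q) measures.
from statistics import mean, median, pstdev

def accuracy_measures(actual, predicted, q_levels=(0.25, 0.30)):
    abs_res = [abs(a - p) for a, p in zip(actual, predicted)]
    mre = [abs(a - p) / a for a, p in zip(actual, predicted)]  # assumes actual != 0
    measures = {
        "Sum Ab.Res.": sum(abs_res),
        "Med.Ab.Res.": median(abs_res),
        "SD Ab.Res.": pstdev(abs_res),
        "Max.MRE": max(mre),
        "MMRE": mean(mre),
    }
    for q in q_levels:
        # pred(q): fraction of predictions whose MRE is at most q
        measures[f"pred({q:.2f})"] = sum(m <= q for m in mre) / len(mre)
    return measures
```

These are exactly the quantities reported, per model and per dataset, in Tables II and III.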

JOURNAL OF COMPUTING, VOLUME 2, ISSUE 8, AUGUST 2010, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG

TABLE II
PREDICTION ACCURACY FOR THE QUES DATASET

Model                     Max.MRE  MMRE    Pred(0.25)  Pred(0.30)  Sum Ab.Res.  Med.Ab.Res.  SD Ab.Res.
Bayesian Network [5]      1.592    0.452   0.391       0.430       686.610      17.560       31.506
Regression Tree [5]       2.104    0.493   0.352       0.383       615.543      19.809       25.400
Backward Elimination [5]  1.418    0.403   0.396       0.461       507.984      17.396       19.696
Stepwise Selection [5]    1.471    0.392   0.422       0.500       498.675      16.726       20.267
Extreme Learning Machine  1.803    0.3502  0.368       0.380       56.122       28.06        22.405
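As a worked check, the accuracy criteria quoted earlier (MMRE < 0.25 and/or pred(0.25) > 0.75 or pred(0.30) > 0.70 [15][8]) can be applied to the ELM row of Table II; the helper function is ours, and we read the paper's "and/or" permissively, so that any one criterion suffices:

```python
# Apply the accuracy criteria from [15][8] to one row of Table II or III.
# "and/or" is read permissively here: meeting any single criterion suffices.
def is_accurate(mmre, pred25, pred30):
    return mmre < 0.25 or pred25 > 0.75 or pred30 > 0.70

# ELM row of Table II (QUES): MMRE 0.3502, pred(0.25) 0.368, pred(0.30) 0.380.
elm_meets = is_accurate(0.3502, 0.368, 0.380)  # False: no criterion is met
```

No row of Table II (or of Table III below) passes, which is the point made in the discussion: the ELM only comes closest, on MMRE.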

TABLE III
PREDICTION ACCURACY FOR THE UIMS DATASET

Model                     Max.MRE  MMRE    Pred(0.25)  Pred(0.30)  Sum Ab.Res.  Med.Ab.Res.  SD Ab.Res.
Bayesian Network [5]      7.039    0.972   0.446       0.469       362.300      10.550       46.652
Regression Tree [5]       9.056    1.538   0.200       0.208       532.191      10.988       63.472
Backward Elimination [5]  11.890   2.586   0.215       0.223       538.702      20.867       53.298
Stepwise Selection [5]    12.631   2.473   0.177       0.215       500.762      15.749       54.114
Extreme Learning Machine  4.918    0.968   0.392       0.450       39.625       18.768       16.066
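The ELM rows in Tables II and III come from a single-hidden-layer feed-forward network (SLFN) trained with the extreme learning machine procedure: the hidden-layer parameters are chosen at random and the output weights are determined analytically by a least-squares fit. A minimal regression sketch assuming NumPy; the hidden-node count, the tanh activation and the toy data are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def elm_train(X, y, n_hidden=20, seed=0):
    """ELM training for an SLFN: random hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (never tuned)
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                 # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage: fit a noiseless linear target on 200 two-feature samples.
X = np.random.default_rng(1).uniform(-1.0, 1.0, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1]
W, b, beta = elm_train(X, y)
max_err = np.max(np.abs(elm_predict(X, W, b, beta) - y))
```

Because there is no iterative tuning of the hidden layer, training cost is dominated by a single pseudo-inverse, which is what makes ELM attractive for repeated model construction over metric datasets.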

Going by the sum of absolute residuals, there is strong evidence that the extreme learning machine model's value is significantly lower, and thus better, than those of the other models.

2) Results from UIMS dataset

Table III shows the values of the prediction accuracy measures achieved by each of the maintainability prediction models for the UIMS dataset. The extreme learning machine model achieved an MMRE of 0.968, a pred(0.25) of 0.392 and a pred(0.30) of 0.450. These values are among the best of the five models presented in Table III. Specifically, in terms of MMRE it is the best among all the models, and there is strong evidence that the extreme learning machine model's value is significantly lower, and thus better, than those of the other models. In terms of pred(0.25) and pred(0.30), it is the second-best model after the Bayesian network. In addition, it is also the best on the absolute residual measures; the values of the absolute residuals again provide strong evidence that the differences between the extreme learning machine model and the other models are significant.

Thus, it is concluded that the extreme learning machine model is able to predict maintainability on the UIMS dataset better than the other models presented.

3) Discussion

With the exception of the extreme learning machine, which has values close to satisfying one of the criteria, none of the maintainability prediction models presented comes close to satisfying any of the criteria of an accurate prediction model cited earlier. However, it has been reported that the prediction accuracy of software maintenance effort prediction models is often low, and thus the criteria are very difficult to satisfy [16].

We therefore conclude that the extreme learning machine model presented in this paper can predict the maintainability of OO software systems reasonably well, to an acceptable degree. This work shows that only the extreme learning machine model has been able to perform consistently better, having values close to satisfying one of the criteria laid down in the literature (MMRE) for both datasets. For both QUES and UIMS, whenever the extreme learning machine model's prediction accuracy has not been as good as the other models', it has been reasonably close. In terms of absolute residuals, ELM is better than the other models on both datasets.

VII. CONCLUSION

An extreme learning machine OO software maintainability prediction model has been constructed using the OO software metric data used by Li and Henry [7]. The prediction accuracy of the model is evaluated and compared with


the Bayesian network model, the regression tree model and the multiple linear regression models, using the prediction accuracy measures: the absolute residuals, MRE and the pred measures. The results indicate that the extreme learning machine model can predict the maintainability of OO software systems. For both datasets, the extreme learning machine model achieved significantly better prediction accuracy, in terms of MMRE, than the other models, as it was closer to satisfying one of the criteria of an accurate fit, which none of the other models was able to achieve. Also, for both QUES and UIMS, whenever the extreme learning machine model's prediction accuracy has not been as good as the best of the models, it has been reasonably competitive with the best models.

Therefore, we conclude that the prediction accuracy of the extreme learning machine model is better than, or at least competitive with, the Bayesian network model and the regression-based models. These outcomes confirm that extreme learning machine is indeed a useful modeling technique for software maintainability prediction, although further studies are required to realize its full potential and to reduce, if not totally eliminate, its shortcomings. The results in this paper also suggest that the prediction accuracy of the extreme learning machine model may vary depending on the characteristics of the dataset and/or the prediction accuracy measure used. This provides an interesting direction for future studies. Another interesting direction would be using other variants of extreme learning machine, such as the evolutionary extreme learning machine [18] and the fully complex extreme learning machine [19], for software effort prediction.

Acknowledgment

The authors would like to thank the anonymous reviewers for their constructive comments. The authors also acknowledge the support of King Fahd University of Petroleum and Minerals in the development of this work.

REFERENCES

[1] Liguo Yu, S.R. Schach and Kai Chen, "Measuring the maintainability of Open Source Software", IEEE, 2005.
[2] Guang-Bin Huang, Qin-Yu Zhu and Chee-Kheong Siew, "Extreme learning machine: Theory and applications", Neurocomputing, vol. 70, 2006, pp. 489–501.
[3] G.-B. Huang, Q.-Y. Zhu, K.Z. Mao, C.-K. Siew, P. Saratchandran and N. Sundararajan, "Can threshold networks be trained directly?", IEEE Trans. Circuits Syst. II, vol. 53, no. 3, 2006, pp. 187–191.
[4] G.-B. Huang, Q.-Y. Zhu and C.-K. Siew, "Extreme learning machine: a new learning scheme of feedforward neural networks", Proceedings of the International Joint Conference on Neural Networks (IJCNN 2004), Budapest, Hungary, 25–29 July 2004.
[5] Chikako Van Koten and Andrew Gray, "An Application of Bayesian Network for Predicting Object-Oriented Software Maintainability", The Information Science Discussion Paper Series, 2006.
[6] Rikard Land, "Measurements of Software Maintainability"; IEEE Standard Glossary of Software Engineering Terminology, IEEE Std 610.12-1990, IEEE, 1990.
[7] W. Li and S. Henry, "Object-oriented metrics that predict maintainability", Journal of Systems and Software, vol. 23, 1993, pp. 111–122.
[8] S.G. MacDonell, "Establishing relationships between specification size and software process effort in a CASE environment", Information and Software Technology, vol. 39, 1997, pp. 35–45.
[9] Y. Zhou and H. Leung, "Predicting object-oriented software maintainability using multivariate adaptive regression splines", The Journal of Systems and Software, vol. 80, 2007, pp. 1349–1361.
[10] Kamaldeep Kaur, Arvinder Kaur and Ruchika Malhotra, "Alternative Methods to Rank the Impact of Object Oriented Metrics in Fault Prediction Modeling using Neural Networks", Proceedings of World Academy of Science, Engineering and Technology, vol. 13, 2006, pp. 207–212.
[11] S. Kanmani, V. Rhymend Uthariaraj, V. Sankaranarayanan and P. Thambidurai, "Object Oriented Software Quality Prediction Using General Regression Neural Networks", ACM SIGSOFT Software Engineering Notes, vol. 29, 2004, pp. 1–5.
[12] N.E. Fenton, P. Krause and M. Neil, "Software Measurement: Uncertainty and Causal Modeling", IEEE Software, vol. 10, no. 4, 2002, pp. 116–122.
[13] M.M.T. Thwin and T.-S. Quah, "Application of Neural Networks for predicting Software Development faults using Object Oriented Design Metrics", Proceedings of the 9th International Conference on Neural Information Processing, November 2002, pp. 2312–2316.
[14] N.E. Fenton and S.L. Pfleeger, Software Metrics: A Rigorous & Practical Approach, second edition, PWS Publishing Company, 1997.
[15] S.D. Conte, H.E. Dunsmore and V.Y. Shen, Software Engineering Metrics and Models, Benjamin/Cummings Publishing Company, 1986.
[16] A. De Lucia, E. Pompella and S. Stefanucci, "Assessing effort estimation models for corrective maintenance through empirical studies", Information and Software Technology, vol. 47, 2005, pp. 3–15.
[17] F. Fioravanti and P. Nesi, "Estimation and Prediction Metrics for Adaptive Maintenance Effort of Object-Oriented Systems", IEEE Transactions on Software Engineering, vol. 27, no. 12, 2001, pp. 1062–1084.
[18] Qin-Yu Zhu, A.K. Qin, P.N. Suganthan and Guang-Bin Huang, "Evolutionary extreme learning machine", Pattern Recognition, vol. 38, 2005, pp. 1759–1763.
[19] Ming-Bin Li, Guang-Bin Huang, P. Saratchandran and N. Sundararajan, "Fully complex extreme learning machine", Neurocomputing, vol. 68, 2005, pp. 306–314.
[20] N.E. Fenton and M. Neil, "A Critique of Software Defect Prediction Research", IEEE Transactions on Software Engineering, vol. 25, no. 5, 1999, pp. 675–689.
[21] K. Pearson, "Contributions to the Mathematical Theory of Evolution, II. Skew Variation in Homogeneous Material", Phil. Trans. Roy. Soc. London (A), vol. 186, 1895, pp. 343–414.
