
TABLE OF CONTENTS

1. List of Tables
2. List of Figures
3. Abstract
4. Introduction
5. Literature Review
6. Research Objective
7. Methodology
   7.1 Support Vector Machine (SVM)
   7.2 Working of Support Vector Machine (SVM)
   7.3 Metrics and Datasets
8. Conclusion
9. References

LIST OF TABLES

1. Table 1: Metrics Definition

LIST OF FIGURES

1. Figure 1: Graphical representation of Support Vector Machine (SVM)
2. Figure 2: Identifying the right hyper-plane
3. Figure 3: Choosing the right hyper-plane from A, B and C
4. Figure 4: The margin for hyper-plane C is higher compared to both A and B
5. Figure 5: Hyper-plane A has no classification error
6. Figure 6: Difficult to segregate the two classes using a straight line
7. Figure 7: SVM is robust to outliers
8. Figure 8: We can't have a linear hyper-plane between the two classes
9. Figure 9: Solving the problem with the kernel trick

ABSTRACT

Software maintainability is one of the most important aspects in evaluating the quality of a software product. It is defined as the ease with which a software system or component can be modified to correct faults, improve performance or other attributes, or adapt to a changing environment. Tracking the maintenance behaviour of a software product is very complex. This is precisely why predicting the cost and risk associated with maintenance after delivery is extremely difficult, a point widely acknowledged by researchers and practitioners. In an attempt to address this issue quantitatively, the main purpose of this report is to propose the use of a few machine learning algorithms to predict software maintainability and to evaluate them.
There are several issues related to software maintenance, but the most critical one is tracking the behaviour of software maintenance, because inferring knowledge about the maintenance of software products in advance is a genuinely difficult process. In this report, an attempt has been made to use a subset of class-level object-oriented metrics to predict software maintainability. A subset of object-oriented software metrics is used to provide the requisite input data for models that predict maintainability using a Support Vector Machine (SVM). The technique is applied to estimate maintainability on datasets collected from two case studies: the Quality Evaluation System (QUES) and the User Interface Management System (UIMS).

INTRODUCTION

Software maintainability is a design quality attribute described as the ability of a system to undergo changes with a degree of ease; such changes could impact components, features and interfaces when functionality is changed or errors are fixed. Software maintainability is a key quality attribute of software systems and one that can reduce the cost of the software. There is a belief that high-quality software is easy to maintain, so that minimal time and effort are needed to fix faults, which leads to decent maintainability and easily maintainable software. Developing software that needs no changes in the future is not really possible, and change is very sensitive to cost. A significant quality of well-developed software is that it can be maintained easily, which leads to less resource usage in terms of both money and effort. During the development phase, software metrics play an important role in controlling the cost of software maintenance. Earlier research work on software maintenance agreed that metrics can be used for predicting software maintainability, for the following reasons:
● It assists the project manager in evaluating cost and productivity across projects.
● The usage of valuable resources can be well planned using the information provided by the prediction system.
● The allocation of staff can also be handled efficiently by the manager with its aid.
● The effort of further maintenance can be kept under control to reach the lowest maintenance cost.
● It helps software developers recognize the quality of the software and thereby achieve better design and coding.
● It helps practitioners improve the quality of software systems and thus minimize maintenance costs.

LITERATURE REVIEW

Various methods have been proposed in the past to reduce the cost of software maintenance using object-oriented metrics. A few of those studies are discussed here.

John et al. [1] proposed a set of metrics to quantify and measure maintenance-related attributes. The proposed complexity metrics are used to determine the difficulty of implementing changes through the measurement of method complexity, method diversity, and complexity density.

The most commonly employed approach for estimating software effort is multivariate linear regression, a technique with numerous shortcomings, which motivates the exploration of machine learning techniques. Marcio et al. [2] proposed a technique that employs an evolutionary algorithm to generate a decision tree tailored to a software effort data set provided by a large worldwide IT company. Their findings show that evolutionarily induced decision trees statistically outperform greedily induced ones, as well as traditional logistic regression.

Software complexity metrics are measurements that use some of the internal attributes or characteristics of software to determine how they affect software quality. Yahya et al. [3] covered some of the more effective software complexity metrics, such as cyclomatic complexity, lines of code and the Halstead complexity metrics, and presented their impact on software quality.

In 2005, Thwin and Quah [4] used neural networks to predict software quality using
object-oriented metrics.

In 2005, Misra [5] used linear regression to predict software maintenance effort.

In 2010, A. Kaur, K. Kaur and R. Malhotra [6] used artificial neural network, fuzzy inference system (FIS) and adaptive neuro-fuzzy inference system (ANFIS) approaches to predict software maintainability.

In 2010, M. O. Elish and K. O. Elish [7] used the TreeNet technique to predict software maintainability.

In the study performed by Ruchika and Anuradha [8], an attempt was made to evaluate and examine the effectiveness of prediction models for software maintainability using real-life web-based projects. Three models using a Feed Forward 3-Layer Back Propagation Network (FF3LBPN), a General Regression Neural Network (GRNN) and the Group Method of Data Handling (GMDH) were developed, and the performance of GMDH was compared against the other two, i.e. FF3LBPN and GRNN. Based on this empirical analysis, they suggest that software professionals can safely use an OO metric suite to predict the maintainability of software using the GMDH technique, with the least error and best precision, in an object-oriented paradigm.

The Probabilistic Neural Network (PNN) is a feed-forward neural network, as described by Ibrahim et al. [9]. It is based on the Bayesian approach and kernel Fisher discriminant analysis. In a PNN, the operations are organized into a multilayered feed-forward network. The first layer is the input layer, where one neuron is present for each independent variable. The next layer is the hidden layer, which contains one neuron for each training example; each of these neurons stores the values of the predictor variables along with the corresponding target value. Next is the pattern layer, in which one neuron is present for each category of the output variable. The last layer is the output layer, where the weighted votes for each target category are compared and the winning category is selected. PNN is known for its ability to train quickly on sparse datasets, as it separates data into a specified number of output categories, and the network produces activations in the output layer corresponding to the probability density function estimate for each category.

The highest output represents the most probable category. The General Regression Neural Network (GRNN) is a modification of the Probabilistic Neural Network (PNN) for regression problems. It is a one-pass learning algorithm with a highly parallel structure and provides a smooth transition from one observed value to another, even with sparse data in a multidimensional measurement space [10]. When the relationship between the independent and dependent variables is complex and non-linear, this modified form of regression is a suitable solution. It learns very quickly, converges to the optimal regression surface as the number of samples becomes very large, and allows learning from previous outcomes.
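To make the GRNN computation concrete, here is a minimal sketch (not taken from [10]; the toy data and the smoothing parameter sigma are assumptions) of the one-pass, kernel-weighted average that a GRNN produces for a new input:

```python
import numpy as np

def grnn_predict(X_train, y_train, x_new, sigma=0.5):
    """One-pass GRNN estimate: a Gaussian-kernel weighted average of the
    training targets, where closer training points receive larger weights."""
    d2 = np.sum((X_train - x_new) ** 2, axis=1)   # squared distances to each training point
    w = np.exp(-d2 / (2.0 * sigma ** 2))          # pattern-layer activations
    return np.dot(w, y_train) / np.sum(w)         # summation and output layers

# Toy 1-D regression data (illustrative only).
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.1, 0.9, 2.1, 2.9])

print(grnn_predict(X_train, y_train, np.array([1.5])))  # smooth interpolation, roughly 1.5
```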

RESEARCH OBJECTIVE

The biggest irony of the software industry is that the largest cost associated with any software product over its lifetime is its maintenance cost. The approach most often suggested by researchers for controlling maintenance costs is to utilize software metrics during the development phase. Almost all studies show that the prediction accuracy of a given model is high on one dataset but lower on another. Although a number of maintainability prediction models have been developed in the last two decades, they have low prediction accuracies according to the criteria suggested by Conte et al. Therefore, it is necessary to explore new techniques that are not only easy to use but also provide high prediction accuracy for maintainability prediction.
The maintenance effort data is obtained from the object-oriented software datasets published by Li and Henry. Two software systems, the User Interface Management System (UIMS) and the Quality Evaluation System (QUES), are chosen for computing the maintenance effort; UIMS has 39 classes and QUES has 71 classes. The data was collected by the original authors over a three-year period, and the maintainability of the software is measured by the number of lines changed per class.
Neural network models act as efficient predictors of the relationship between dependent and independent variables because of their ability to model complex functions. In this approach, the Support Vector Machine algorithm is used to learn the prediction model during the training phase.

METHODOLOGY

Support Vector Machine (SVM)


A “Support Vector Machine” (SVM) is a supervised machine learning algorithm which can be used for both classification and regression problems; however, it is mostly used for classification. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Classification is then performed by finding the hyper-plane that best differentiates the two classes, as shown below:

Figure 1: Graphical representation of Support Vector Machine (SVM)

Support vectors are simply the coordinates of the individual observations that lie closest to the decision boundary, and the Support Vector Machine finds the frontier (a hyper-plane or line) which best segregates the two classes.
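As a concrete, hedged illustration (the toy data, labels and parameters below are invented for this report and are not taken from the study), a linear SVM can be fitted with scikit-learn, which exposes the separating hyper-plane and the support vectors directly:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two well-separated clusters, one per class (illustrative only).
X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.5],    # class 0 ("circles")
              [6.0, 7.0], [7.0, 8.0], [7.5, 6.5]])   # class 1 ("stars")
y = np.array([0, 0, 0, 1, 1, 1])

# A linear-kernel SVM finds the hyper-plane w.x + b = 0 with the largest margin.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("Support vectors:\n", clf.support_vectors_)            # the points that define the margin
print("Hyper-plane: w =", clf.coef_[0], ", b =", clf.intercept_[0])
print("Prediction for (3, 3):", clf.predict([[3.0, 3.0]])[0])
```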

Working of Support Vector Machine (SVM):

Identifying the right hyper-plane:

● Identify the right hyper-plane (Scenario 1): Here, we have three hyper-planes (A, B and C). Now, identify the right hyper-plane to classify the stars and circles.


Figure 2: Identifying the right hyper-plane

Thumb rule to identify the right hyper-plane: “Select the hyper-plane which segregates the two classes better.” In this scenario, hyper-plane B has performed this job excellently.

● Identify the right hyper-plane (Scenario 2): Here, we have three hyper-planes (A, B and C), and all of them segregate the classes well. Now, how can we identify the right hyper-plane?

Figure 3: Choosing the right hyper-plane from A, B and C

Here, maximizing the distance between the nearest data points (of either class) and the hyper-plane will help us decide the right hyper-plane. This distance is called the margin. The margin for hyper-plane C is higher compared to both A and B; hence, we select C as the right hyper-plane. Another strong reason for selecting the hyper-plane with the higher margin is robustness: if we select a hyper-plane with a low margin, there is a high chance of misclassification. A short sketch after the figure shows how the margin can be read from a fitted linear SVM.

Figure 4: The margin for hyper-plane C is higher compared to both A and B
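A minimal sketch of this margin calculation (again with invented, linearly separable toy data and assuming scikit-learn): for a fitted linear SVM, the distance from the hyper-plane to the nearest points of either class is 1/||w||, so the full margin width is 2/||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable toy classes (illustrative only).
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.2],
              [4.0, 4.0], [5.0, 5.0], [4.5, 4.8]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates the hard-margin solution on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)   # width of the strip between the two classes
print("Margin width:", margin)
```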

● Identify the right hyper-plane (Scenario 3): Use the rules discussed in the previous sections to identify the right hyper-plane.

Figure 5: Hyper-plane A has no classification error

Some of us might select hyper-plane B, as it has a higher margin compared to A. But here is the catch: SVM selects the hyper-plane which classifies the classes accurately before maximizing the margin. Here, hyper-plane B has a classification error and A has classified everything correctly. Therefore, the right hyper-plane is A.

● Can we classify two classes (Scenario 4)? Below, it is difficult to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.

Figure 6: Difficult to segregate the two classes using a straight line

SVM has a feature to ignore outliers and find the hyper-plane with the maximum margin. Hence, we can say that SVM is robust to outliers; a brief sketch of the soft margin that provides this tolerance follows the figure below.

Figure 7: SVM is robust to outliers
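The following is a hedged sketch of how that tolerance is exposed in practice (the report does not name a library; scikit-learn and the toy data are assumptions): the soft-margin parameter C trades margin width against training errors, so the lone star inside the circle territory is simply treated as an outlier.

```python
import numpy as np
from sklearn.svm import SVC

# Circles (class 0) and stars (class 1), plus one star sitting among the circles.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],   # circles
              [6.0, 6.0], [7.0, 7.5], [6.5, 5.5],   # stars
              [1.5, 1.3]])                           # outlying star
y = np.array([0, 0, 0, 1, 1, 1, 1])

# The soft margin (controlled by C) lets the SVM treat the lone star as an outlier:
# the hyper-plane still sits between the two main clusters instead of bending towards it.
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("Outlier predicted as:", clf.predict([[1.5, 1.3]])[0])    # typically 0: the outlier is ignored
print("New star (6.5, 7.0) ->", clf.predict([[6.5, 7.0]])[0])   # 1
print("New circle (1.2, 1.5) ->", clf.predict([[1.2, 1.5]])[0]) # 0
```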

● Find the hyper-plane to segregate the classes (Scenario 5): In the scenario below, we can't have a linear hyper-plane between the two classes, so how does SVM classify these two classes? Until now, we have only looked at linear hyper-planes.

Figure 8: We can't have a linear hyper-plane between the two classes

SVM can solve this problem easily, by introducing an additional feature. Here, we add a new feature z = x^2 + y^2 and plot the data points on the x and z axes:

Figure 9: Solving the problem with the kernel trick

In the above plot, the points to consider are:

● All values of z will always be positive, because z is the sum of the squares of x and y.
● In the original plot, the red circles appear close to the origin of the x and y axes, leading to lower values of z, while the stars lie relatively far from the origin, resulting in higher values of z.

With this feature, it is easy for the SVM to place a linear hyper-plane between these two classes. But another burning question arises: do we need to add this feature manually to obtain such a hyper-plane? No. SVM has a technique called the kernel trick. Kernels are functions which take a low-dimensional input space and transform it into a higher-dimensional space, i.e. they convert a non-separable problem into a separable problem, and they are mostly useful in non-linear separation problems. Simply put, the kernel performs some extremely complex data transformations and then finds out how to separate the data based on the labels or outputs you have defined; a sketch comparing the manual feature with an RBF kernel is given below.
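The sketch below (assuming scikit-learn; the ring-shaped toy data and the hand-made z = x^2 + y^2 feature are invented for illustration) contrasts adding the feature manually with letting an RBF kernel perform the mapping implicitly:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Circles (class 0) near the origin, stars (class 1) on a ring around them.
angles = rng.uniform(0.0, 2.0 * np.pi, size=40)
inner = np.c_[0.5 * np.cos(angles[:20]), 0.5 * np.sin(angles[:20])]   # class 0
outer = np.c_[2.0 * np.cos(angles[20:]), 2.0 * np.sin(angles[20:])]   # class 1
X = np.vstack([inner, outer])
y = np.array([0] * 20 + [1] * 20)

# Option 1: add the feature z = x^2 + y^2 manually and use a linear SVM.
z = (X ** 2).sum(axis=1, keepdims=True)
manual = SVC(kernel="linear").fit(np.hstack([X, z]), y)

# Option 2: let the RBF kernel perform the implicit mapping -- the kernel trick.
rbf = SVC(kernel="rbf", gamma="scale").fit(X, y)

print("Manual-feature accuracy:", manual.score(np.hstack([X, z]), y))
print("RBF-kernel accuracy:    ", rbf.score(X, y))
```

Both variants separate the ring data cleanly; the kernel version simply spares us from having to guess the right feature by hand.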

Metrics and Datasets

We work with two commercially available datasets, the Quality Evaluation System (QUES) and the User Interface Management System (UIMS), for Software Maintainability Prediction (SMP) using an artificial intelligence technique, the Support Vector Machine (SVM).

Datasets:

In our study, we use the two most popular object-oriented maintainability datasets, published by Li and Henry [11]: the UIMS and QUES datasets. The metrics they contain are defined in Table 1 below. These datasets were chosen mainly because they have been used recently by many researchers to evaluate the performance of their proposed models in predicting object-oriented software maintainability. The UIMS dataset contains class-level metrics data collected from 39 classes of a user interface management system, whereas the QUES dataset contains the same metrics collected from 71 classes of a quality evaluation system. Both systems were implemented in Ada. Both datasets consist of eleven class-level metrics: ten independent variables and one dependent variable.

The independent variables are as follows:

(i) Five variables are taken from Chidamber and Kemerer [12]: WMC, DIT, NOC, RFC, and LCOM;

(ii) Four variables are taken from Li and Henry [11, 13]: MPC, DAC, NOM, and SIZE2;

(iii) One variable is the traditional lines-of-code metric (SIZE1).

The dependent variable is a maintenance effort surrogate measure (CHANGE), which is the number of lines of code that were changed per class during a three-year maintenance period. A line change can be an addition or a deletion; a change in the content of a line is counted as a deletion plus an addition. A brief modelling sketch using these variables follows Table 1 below.

Table 1: Metrics Definition

WMC (Weighted Methods per Class): The sum of McCabe's cyclomatic complexities of all local methods in a class.
DIT (Depth of Inheritance Tree): The depth of a class in the inheritance tree, where the root class is zero.
NOC (Number of Children): The number of child classes of a class, i.e. the number of immediate subclasses of the class in the hierarchy.
RFC (Response For a Class): The number of local methods plus the number of non-local methods called by local methods.
LCOM (Lack of Cohesion of Methods): The number of disjoint sets of local methods; each method in a set shares at least one instance variable with at least one other member of the same set.
MPC (Message Passing Coupling): The number of messages sent out from a class.
DAC (Data Abstraction Coupling): The number of instances of other classes declared within a class.
NOM (Number of Methods): The number of methods in a class.
SIZE1 (Lines of Code): The number of lines of code, excluding comments.
SIZE2 (Number of Properties): The total count of data attributes and local methods in a class.
CHANGE (Number of Lines Changed): The number of lines added and deleted in a class; a change of content is counted as two.
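To illustrate how a maintainability model could be built from these variables, the following is a hedged sketch (the CSV file name ques_metrics.csv, its column layout and the hyper-parameter values are assumptions, not part of the Li and Henry distribution): it trains a support vector regressor to predict CHANGE from the ten independent metrics.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Hypothetical CSV with one row per class and the eleven Li-Henry metrics as columns.
FEATURES = ["WMC", "DIT", "NOC", "RFC", "LCOM", "MPC", "DAC", "NOM", "SIZE1", "SIZE2"]
data = pd.read_csv("ques_metrics.csv")          # assumed file and layout
X, y = data[FEATURES].values, data["CHANGE"].values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling matters for SVMs; the RBF kernel handles non-linear metric/effort relations.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X_tr, y_tr)

print("MAE on held-out classes:", mean_absolute_error(y_te, model.predict(X_te)))
```

In practice the same pipeline would be evaluated with cross-validation on both UIMS and QUES rather than on a single split.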


CONCLUSION

An artificial intelligence technique, the Support Vector Machine algorithm, is used for the prediction of software maintainability. The goal of our study is to construct a suitable SVM model for the prediction of object-oriented software maintainability that is not only easy to apply but also reduces prediction errors to a minimum. Thus, it is concluded that the SVM model is indeed a very useful modelling technique and could be used as a sound alternative for the prediction of software maintainability.

As the proposed model was found suitable for estimating software reliability in earlier work, it can be safely presumed from the current findings that both reliability and maintainability, which remain indispensable components of software quality, can be predicted with the same model, i.e. SVM. This will reduce the challenges involved in predicting maintainability and assist software developers in strategically utilizing their resources, enhancing process efficiency and optimizing the associated maintenance costs.

As the current study was based on two commercial software datasets, UIMS and QUES, the authors are actively carrying out further work that applies the proposed SVM model to additional datasets in order to confirm its validity for wider software paradigms. Such studies would allow us to investigate the capability of the SVM model and, eventually, to establish a generalized model in the field of software quality. Further research is also planned to combine the SVM model with other data mining techniques, so as to develop prediction models that can estimate the maintainability of software more accurately and with smaller errors.

REFERENCES

[1] J. Michura, M. A. M. Capretz, and S. Wang, "Extension of Object-Oriented Metrics Suite for Software Maintenance," Software Engineering, vol. 2013, 2013.
[2] M. P. Basgalupp, R. C. Barros, and D. D. Ruiz, "Predicting software maintenance effort through evolutionary-based decision trees," in Proceedings of the 27th Annual ACM Symposium on Applied Computing (SAC '12), pp. 1209-1214, March 2012.
[3] Y. Tashtoush, M. Al Maolegi, and B. Arkok, "The correlation among software complexity metrics with case study," International Journal of Advanced Computer Research (ISSN print: 2249-7277, ISSN online: 2277-7970), vol. 4, no. 2, issue 15, p. 414, June 2014.
[4] M. Thwin and T. Quah, "Application of neural networks for software quality prediction using object-oriented metrics," Journal of Systems and Software, vol. 76, no. 2, pp. 147-156, 2005.
[5] S. Misra, "Modeling design/coding factors that drive maintainability of software systems," Software Quality Journal, vol. 13, no. 3, pp. 297-320, 2005.
[6] A. Kaur, K. Kaur, and R. Malhotra, "Soft computing approaches for prediction of software maintenance effort," International Journal of Computer Applications (ISSN 0975-8887), vol. 1, no. 16, 2010.
[7] M. O. Elish and K. O. Elish, "Application of TreeNet in predicting object-oriented software maintainability: a comparative study," in European Conference on Software Maintenance and Reengineering (ISSN 1534-5351), DOI 10.1109/CSMR.
[8] R. Malhotra and A. Chug, "Application of Group Method of Data Handling model for software maintainability prediction using object oriented systems," International Journal of System Assurance Engineering and Management (ISSN 0975-6809), vol. 5, no. 2, 2014.
[9] M. M. Ibrahiem El Emary and S. Ramakrishnan, "On the application of various probabilistic neural networks in solving different pattern classification problems," World Applied Sciences Journal, vol. 4, no. 6, pp. 772-780, 2008 (ISSN 1818-4952).
[10] C. L. Martin, "Applying a general regression neural network for predicting development effort of short-scale programs," Neural Computing and Applications, vol. 20, pp. 389-401, 2011.
[11] W. Li and S. Henry, "Object-oriented metrics that predict maintainability," Journal of Systems and Software, vol. 23, no. 2, pp. 111-122, 1993.
[12] S. Chidamber and C. Kemerer, "A metrics suite for object oriented design," IEEE Transactions on Software Engineering, vol. 20, no. 6, pp. 476-493, June 1994.
[13] W. Li, "Another metric suite for object-oriented programming," Journal of Systems and Software, vol. 44, no. 2, pp. 155-162, December 1998.
