IJITMR, Vol. 5, No. 2, July-December 2013, pp. 54-63. © Serials Publications (India)
A Hybrid Classifier Approach for Software Fault Prediction

C. Akalya Devi¹ and B. Surendiran²

¹ Assistant Professor, Department of Information Technology, PSG College of Technology, Coimbatore, India.
² Associate Professor (Sr. Gr.), Department of Information Technology, Info Institute of Engineering, Coimbatore, India.
Email: akalya.jk@gmail.com, surendir@gmail.com
Abstract: Software fault prediction is the process of identifying the faulty modules in software, and it plays a major role in software quality. In this paper a hybrid classifier is proposed for software fault prediction. The proposed hybrid classifier is a combination of Linear Discriminant Analysis (LDA) and a Neural Network (NN). NASA's public datasets KC1 and PC1, available in the PROMISE software engineering repository, are used to test the proposed model. Apart from the existing features of the datasets, the LDA score is introduced as an additional feature for the neural network classifier. The proposed hybrid classifier LDA-NN shows improved prediction accuracy, which is vital for improving software quality.

Keywords: Fault prediction, LDA, NN, software fault, software quality
1. INTRODUCTION
Software quality is an important concern, and software fault prediction helps developers concentrate on the modules most likely to be faulty. A fault is a mistake in the code that makes the software behave abnormally, rather than in the way it is expected to behave; software fault prediction identifies the modules most likely to contain such faults. Software failures result not only in heavy financial loss but, at worst, in loss of human life in critical systems. In the modern world software is everywhere, from bus ticketing to intensive care units. Software is developed by humans, who are error prone, whether through coding mistakes or unnoticed basic requirements, and for complex systems software reliability is hard to achieve. To predict faults, techniques such as statistical methods, machine learning methods, parametric models and mixed algorithms are used. Supervised machine learning trains a model on inputs whose outputs are already known, so that it can predict unseen cases. The metrics datasets KC1 and PC1, available in the PROMISE repository, are used for the experiments [1]. In the proposed model an additional feature (the LDA score) is used; this model predicts faults better than using the given features alone.
2. LITERATURE REVIEW
The software crisis is the difficulty of delivering software within budget, on time, and with good quality. These three aspects are very important for a software company's survival. At the same time, maintenance cost [4] rose from 75 percent in the late 1980s to 90 percent in the early 1990s, and over 50 percent of programmer effort is spent on maintenance alone [5]. Developing quality software is therefore an important factor, and an attractive area for many researchers. Predicting whether software contains defective modules is useful not only for future versions of large software but also when developing similar projects [6]. [7] introduced maintenance severity, which helps locate the modules that need attention; their precision model showed 70% accuracy on the KC1 dataset.

Statistical, machine learning, and mixed techniques are widely used in the literature to predict software defects. [8, 9] show that reducing the number of independent factors (the attribute set) does not significantly affect the accuracy of software quality prediction. [10] compared decision trees, naive Bayes, and a 1-rule classifier on the NASA software defect data; no clear trend was observed, and different predictors scored better on different data sets, but their proposed ROCKY classifier outscored all the above predictor models. [11] compared different case-based reasoning classifiers and concluded that there is no added advantage in varying the classifier's parameter combinations (including varying the nearest neighbour and using different weight functions) to improve prediction accuracy.

Bayesian Belief Networks, Causal Probabilistic Networks, Causal Nets, Graphical Probability Networks, Probabilistic Cause-Effect Models, and Probabilistic Influence Diagrams [12] have attracted much recent
 
attention as a possible solution to the problems of decision support under uncertainty, with the help of available data mining tools.

Many modelling techniques have been developed and applied for software quality prediction. These include logistic regression, discriminant analysis [13, 14], discriminative power techniques, Optimized Set Reduction, artificial neural networks [15, 16], fuzzy classification, Bayesian Belief Networks (Fenton & Neil, 1999), and, recently, Dempster-Shafer Belief Networks. It is clear that no single model is sufficient for better prediction.
3. DATASET
The data sets used here to evaluate the hybrid classifier are the KC1 and PC1 NASA data sets from the PROMISE software engineering repository [1]. KC1 is a C++ system implementing storage management for receiving and processing ground data; it has a total of 2109 modules with 326 defective instances. PC1 (written in C) is an earth-orbiting satellite system containing 1109 modules with 79 defective instances. Table 1 lists the source code metrics of the KC1 and PC1 datasets. Of the 22 attributes, defects is the boolean variable that indicates whether the module contains defects or not.

Table 1. List of KC1 Dataset Attributes

1. loc: McCabe's line count of code
2. v(g): McCabe "cyclomatic complexity"
3. ev(g): McCabe "essential complexity"
4. iv(g): McCabe "design complexity"
5. N: Halstead total operators + operands
6. V: Halstead "volume"
7. L: Halstead "program length"
8. T: Halstead's time estimator
9. d: Halstead "difficulty"
10. i: Halstead "intelligence"
11. e: Halstead "effort"
12. B: Halstead's "error estimate"
13. lOCode: Halstead's line count
14. lOComment: Halstead's count of lines of comments
15. lOBlank: Halstead's count of blank lines
16. locCodeAndComment: total lines of code and comment
17. uniq_Op: unique operators
18. uniq_Opnd: unique operands
19. total_Op: total operators
20. total_Opnd: total operands
21. branchCount: branch count of the flow graph
22. defects: {false, true}; whether the module has one or more reported defects

loc: This metric describes the total number of lines for a given module. It is the sum of the executable lines, the commented lines of code, and the blank lines: a pure, simple count from open bracket to close bracket, including every line in between regardless of character content.

v(g): Cyclomatic complexity, v(g), measures the number of "linearly independent paths". A set of paths is linearly independent if no path in the set is a linear combination of any other paths in the set through a program's "flow graph". A flow graph is a directed graph where each node corresponds to a program statement and each arc indicates the flow of control from one statement to another. v(g) is calculated by

v(g) = e - n + 2    (1)

where "g" is a program's flow graph, "e" is the number of arcs in the flow graph, and "n" is the number of nodes in the flow graph. The standard McCabe rule (v(g) > 10) is used to identify fault-prone modules.
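Equation (1) can be illustrated with a short sketch. The graph below is a hypothetical if/else flow graph invented for illustration; the function simply counts arcs and nodes and applies v(g) = e - n + 2, together with McCabe's v(g) > 10 fault-proneness rule mentioned above.

```python
# Sketch: McCabe's cyclomatic complexity from a control-flow graph given
# as a list of directed edges. Node names are illustrative only.

def cyclomatic_complexity(edges):
    """v(g) = e - n + 2, where e = number of arcs and n = number of nodes."""
    nodes = set()
    for src, dst in edges:
        nodes.add(src)
        nodes.add(dst)
    return len(edges) - len(nodes) + 2

def is_fault_prone(vg, threshold=10):
    """McCabe's standard rule: modules with v(g) > 10 are flagged fault-prone."""
    return vg > threshold

# A single if/else: 4 nodes (entry, condition, then-branch, exit), 4 arcs.
edges = [("entry", "cond"), ("cond", "then"), ("cond", "end"), ("then", "end")]
print(cyclomatic_complexity(edges))  # 4 - 4 + 2 = 2
```

With one decision point the module has two linearly independent paths, matching the intuition that v(g) counts branching structure rather than raw size.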
 
ev(g): Essential complexity, ev(g), is the extent to which a flow graph can be reduced by decomposing all the sub-flow graphs of "g" that are "D-structured primes". Such "D-structured primes" are also sometimes referred to as "proper one-entry one-exit sub-flow graphs" [1]. ev(g) is calculated using

ev(g) = v(g) - m    (2)

where "m" is the number of sub-flow graphs of "g" that are D-structured primes.

iv(g): Design complexity, iv(g), is the cyclomatic complexity of a module's reduced flow graph. The flow graph, "g", of a module is reduced to eliminate any complexity that does not influence the interrelationship between design modules. According to McCabe, this complexity measurement reflects the module's calling patterns to its immediate subordinate modules.

N: The Halstead total of operators plus operands.

V: The Halstead volume (V) of a module: the minimum number of bits required for coding the program.

L: The Halstead level (L) of a module, i.e. the level at which the program can be understood.

T: The Halstead programming time of a module: the estimated amount of time to implement the algorithm.

d: The difficulty level or error proneness (d) of the program, proportional to the number of unique operators in the program.

i: Intelligence (i) shows the complexity of a given algorithm independent of the language used to express it. The intelligence content determines how much is said in a program.

e: The Halstead effort (e) of a module. Effort is the number of mental discriminations required to implement the program, and also the effort required to read and understand it.

B: The Halstead error estimate of a module: an estimate of the number of errors in the implementation.

lOCode: The number of lines of executable code for a module. This includes all lines of code that are not fully commented.

lOComment: The number of lines of comments in a module.

lOBlank: Halstead's count of blank lines.

locCodeAndComment: The number of lines which contain both code and comment in a module.

uniq_Op: The number of unique (distinct) operators in a module.

uniq_Opnd: The number of unique operands in a module: a count of the unique variables and constants.

total_Op: The total usage of all operators.

total_Opnd: The total usage of all operands.

branchCount: The number of decision points in a given module. Decisions are caused by conditional statements.

defects: Whether the particular module is defective or not. This attribute is used for prediction.
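The derived Halstead metrics above all follow from the four base counts (uniq_Op, uniq_Opnd, total_Op, total_Opnd). The paper lists the metrics without their formulas, so the sketch below uses the standard textbook Halstead definitions as an assumption; the input values are made up for illustration.

```python
import math

def halstead(uniq_op, uniq_opnd, total_op, total_opnd):
    """Standard Halstead metrics from the four base counts (illustrative)."""
    n = uniq_op + uniq_opnd                        # vocabulary
    N = total_op + total_opnd                      # length (metric N)
    V = N * math.log2(n)                           # volume (metric V)
    d = (uniq_op / 2) * (total_opnd / uniq_opnd)   # difficulty (metric d)
    e = d * V                                      # effort (metric e)
    T = e / 18                                     # time estimate, seconds (metric T)
    B = V / 3000                                   # error estimate (metric B)
    return {"N": N, "V": V, "d": d, "e": e, "T": T, "B": B}

# Hypothetical module: 10 unique / 25 total operators, 7 unique / 20 total operands.
m = halstead(uniq_op=10, uniq_opnd=7, total_op=25, total_opnd=20)
print(m)
```

Everything downstream of the base counts is deterministic arithmetic, which is why only the four raw counts need to be extracted from source code.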
4. HYBRID CLASSIFIER MODEL
The hybrid classifier [17] (LDA-NN) approach is the combination of Linear Discriminant Analysis (LDA) and neural network (NN) classifiers. The discriminant score plays the role of an additional feature for classifying defective modules. The proposed hybrid classifier for software fault prediction is shown in Fig. 1.
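The two-stage idea can be sketched with scikit-learn. This is a minimal illustration, not the paper's implementation: the synthetic data, network architecture, and train/test split are all assumptions; only the core step (appending the LDA discriminant score as a 22nd feature before training the NN) follows the description above.

```python
# Sketch of the LDA-NN hybrid: fit LDA, use its discriminant score as an
# extra feature, then train a neural network on the augmented feature set.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 21 KC1/PC1 source-code metrics and defect label.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 21))
y = (X[:, 0] + X[:, 1] + rng.normal(size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: LDA discriminant score, one value per module.
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
tr_score = lda.decision_function(X_tr).reshape(-1, 1)
te_score = lda.decision_function(X_te).reshape(-1, 1)

# Stage 2: NN trained on original features + LDA score (21 + 1 = 22 inputs).
X_tr_aug = np.hstack([X_tr, tr_score])
X_te_aug = np.hstack([X_te, te_score])
nn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
nn.fit(X_tr_aug, y_tr)
acc = nn.score(X_te_aug, y_te)
print(f"accuracy with LDA score feature: {acc:.3f}")
```

The design rationale is that the LDA score condenses a linear separating direction into a single strong feature, which the nonlinear NN can then combine with the raw metrics.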
