



Impact of Software Project Uncertainties over Effort Estimation and their Removal by Validating Modified General Regression Neural Network Model
Brajesh Kumar Singh, A. K. Misra
Abstract: Software cost estimation accuracy is one of the greatest challenges for software developers and customers. In general, algorithmic models such as the Constructive Cost Model (COCOMO) are used, but these are unable to deal with uncertainties arising from the software development environment and other factors. The soft computing approach provides a solution for estimating effort while handling these uncertainties. In this paper, COCOMO is used as the algorithmic model, and an attempt is made to validate the soundness of the modified general regression neural network (MGRNN) technique using NASA project data. The main objective of this research is to analyze the accuracy of the system's output when the MGRNN model is applied to the NASA dataset to derive software effort estimates. The MGRNN model is validated using 93 NASA project datasets. Empirical results show that applying the MGRNN model to software effort estimation yields a smaller mean magnitude of relative error (MMRE), and the probability of a project having a relative error of less than or equal to 0.25 is improved by approximately 28.21% compared with the results obtained with COCOMO.

Index Terms: Modified General Regression Neural Network, COCOMO, Soft Computing, Effort Estimation, Mean Magnitude of Relative Error.


1 Introduction

The continuous development of hardware and software, together with the world economic perspective, has increased competitiveness among companies in producing and delivering software products and services [1]. Software development is becoming increasingly expensive and is a major cost factor in the budget of any information system, so there is a growing need to produce low-cost, high-quality software in a short time span. The accuracy of software project cost estimation has a direct and significant impact on the quality of a company's software investment decision making [2]. Quality and internationally competitive productivity can be achieved through an effective software management paradigm [1]. The software management process carefully weighs costs and benefits before committing the required resources to a project or bidding for a contract [2]. Accurate estimation of a new software project remains a critical task for every software developer and customer. Unfortunately, a reliable preliminary estimate is difficult to obtain because little information about the project is available at an early stage.

Underestimating project costs may result in an increased total budget and development of poor quality, which may in turn delay project completion. On the other hand, overestimating costs may tie up too many additional resources in the project or, because of the inflated price, lose the contract during bidding, which can ultimately cost jobs. Software development effort estimation deals with predicting the effort, the probable amount of time, and the cost and quality required to complete a specific task. In many cases, effort estimates based on previous experience are the only guide. However, most projects differ from one another, so past experience alone may not be enough to handle them, because they do not match previous projects partly or completely. Nowadays, many quantitative models of software cost estimation have been developed. Most of these models are based on a size measure, such as Lines of Code (LOC) or Function Points (FP), obtained from size estimation. Since the accuracy of size estimation directly impacts the accuracy of cost estimation, new alternative approaches from soft computing, such as artificial neural networks (ANN), can be a good choice for the software development effort estimation task. A review of the literature [4, 5, 6, 7, 8, 9, 10, 11, 16] reveals that there are two major types of cost estimation methods, i.e., algorithmic and non-algorithmic models.

Brajesh Kumar Singh is a research scholar in the Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology, Allahabad, India.
Dr. A. K. Misra is working as Professor in the Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology, Allahabad, India.

2013 Journal of Computing Press, NY, USA, ISSN 2151-9617




The late 1970s produced a flowering of more robust models. Understanding and calibrating algorithmic techniques based on historical data is difficult because of the inherently complex relationships between the related attributes. The attributes and relationships used to estimate software development effort can change over time and differ across software development environments, and hence may create problems for software managers in committing resources and controlling costs. To address and overcome these problems, models aiming at accurate estimation were developed, such as SLIM [12], Checkpoint [13], PRICE-S [14], SEER [15], Albrecht's Function Points [17, 18] and COCOMO [16]. Although most of these researchers started working on software cost estimation models at about the same time, all of them faced the same dilemma: as software grows in size and importance, it also grows in complexity, which makes it very difficult to accurately predict the cost of development. This dynamic field of software cost estimation sustained the interest of researchers, who succeeded in setting the stepping-stones of software engineering cost models.

2.1 Algorithmic models
These traditional algorithmic techniques require a long-term estimation process. Algorithmic models are based on the statistical analysis of historical data (past projects) [19, 20]; among them are the Software Life Cycle Management model (SLIM) [21] and the Constructive Cost Model (COCOMO) [16, 22]. All of them require inputs that are accurate estimates of specific attributes, such as Lines of Code (LOC), number of user screens, interfaces and complexity, which are not easy to acquire at an early stage of software development. Besides, the attributes and relationships used to predict software development effort can change over time and/or vary across software development environments [23].

2.2 Constructive Cost Model (COCOMO)
Many software cost estimation models have been proposed to help provide high-quality estimates that assist software project managers in making accurate decisions about their projects [16, 18, 24]. A well-known mathematical model for software cost estimation is COCOMO, first proposed by Boehm [16, 18] and built from 63 software projects. The model defines mathematical equations that identify the development time, the effort, and the maintenance effort. COCOMO makes estimates at three levels of detail and increasing accuracy: the basic, intermediate and detailed models. These three models are defined by increasing the detail in the mathematical relationship between development time, effort and maintenance effort [25]. Estimation accuracy is significantly improved when models such as the Intermediate and Detailed COCOMO are adopted [18]. COCOMO for effort estimation has the form given in Equation 1:

Effort = a (KLOC)^b * EAF    (1)

The software effort is computed in person-months. The values of the parameters a and b depend mainly on the class of software project; projects are classified by complexity into three categories: organic, semidetached and embedded. EAF is the Effort Adjustment Factor, which depends on the values of 15 cost drivers. These models exhibit some nonlinear characteristics. Extensions of COCOMO, such as COCOMO II, can be found in [3]; however, for the research reported in this paper, the intermediate COCOMO model is used. The three modes are expected to give different results according to the type of software project [26] (i.e., organic, semidetached and embedded) [16], [18]; the corresponding values of a and b are given in Table I.

Table I: Values of a and b for intermediate COCOMO
Project mode     a      b
Organic          3.2    1.05
Semidetached     3.0    1.12
Embedded         2.8    1.20

The limitations of algorithmic models led to the exploration of non-algorithmic techniques, which are soft computing based. In this context, new alternative approaches such as soft computing techniques are required for better solutions.
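Equation 1 with the Table I coefficients can be sketched in a few lines; the function name and the example project are illustrative assumptions, and EAF defaults to 1.0 (all 15 cost drivers nominal):

```python
# Sketch of the intermediate COCOMO effort formula (Equation 1):
# Effort = a * KLOC^b * EAF, with (a, b) per project mode from Table I.

COCOMO_COEFFS = {  # mode -> (a, b)
    "organic": (3.2, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}

def cocomo_effort(kloc: float, mode: str, eaf: float = 1.0) -> float:
    """Effort in person-months; eaf is the product of the 15 cost-driver
    multipliers (1.0 means all drivers are nominal)."""
    a, b = COCOMO_COEFFS[mode]
    return a * kloc ** b * eaf

# Illustrative example: a 32 KLOC organic project with nominal cost drivers.
print(f"{cocomo_effort(32, 'organic'):.1f}")  # ~121.8 person-months
```

Note how the exponent b > 1 captures the diseconomy of scale: doubling KLOC more than doubles the estimated effort.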


3.1 Non-algorithmic models
Newer, non-algorithmic computation techniques for estimating software effort, i.e., soft-computing-based approaches, came up in the 1990s and drew the attention of researchers. This section discusses some of the non-algorithmic models for software development effort estimation. Soft computing comprises techniques such as fuzzy logic (FL), artificial neural networks (ANN) and evolutionary computation (EC). These methodologies manage ambiguous, real-life situations through their capability for flexible information processing. Owing to this inherent nature, soft computing techniques have been used by many researchers for software development effort prediction, to handle imprecision and uncertainty in data and to learn from it. The first recognition of the fuzziness of several aspects of COCOMO, one of the best known [27], most successful and widely used cost estimation models, was that of Fei and Liu [28]. They observed that an accurate estimate of delivered source instructions (KDSI) is not possible before starting the project; it is therefore unreasonable to assign a finite




number for it. Boetticher described a neural network approach for characterizing programming effort based on internal product measures [29]. For one large commercial data set, a neural-network-based model used to estimate software development effort achieved an accuracy within 25% of actual effort more than 75% of the time [30]. In summary, previous research reveals that the soft-computing-based software effort prediction models developed so far all lack in one aspect or another. Selecting a suitable technique is a difficult decision, which in turn requires some ranking of each prediction technique as and when it is applied to a prediction problem. In the present study, an effective model based on an artificial neural network (ANN) is proposed to overcome the uncertainty problem and to obtain more accurate software cost estimates.

Rather, it allows the appropriate form to be expressed as a probability density function (pdf) that is empirically determined from the observed data using Parzen window estimation [33]. Thus, the approach is not limited to any particular form and requires no prior knowledge of it. From Fig. 1 it may be inferred that R is the number of elements in the input vectors provided to the network for predicting the results, and Q is the total number of input sets on which the network is trained or tested. The key layers of the modified MGRNN are the radial basis function layer and the linear transfer function layer.


4.1 Dataset Description
We consider source data from 93 NASA projects from different centers, carried out in the years 1971-1987 and collected by Jairus Hihn, JPL, NASA, Manager of the SQIP Measurement & Benchmarking Element. The dataset consists of 15 cost drivers, 1 attribute for the 3 development modes, the project size (in KLOC), and the actual effort used to evaluate the predictions made by the different approaches [31].

4.2 Proposed Approach
A brief overview of the Modified General Regression Neural Network (MGRNN) model follows.

4.2.1 Modified General Regression Neural Network (MGRNN) Model
The general regression neural network is a one-pass learning algorithm with a highly parallel structure. It provides very smooth transitions between observed values, and it requires much less training data than a back-propagation neural network, which makes it very useful for designing systems from very small datasets.

4.2.2 Radial Basis Function Layer
The inputs obtained from the input layer are analyzed here; R is the number of elements in the input vector, and an individual input vector is denoted by p. The first-layer weights are stored in the input weight matrix IW1,1. When an input is presented, the || dist || box produces a vector whose elements indicate how close the input is to each vector of the training set: it takes the input p and the rows of the input weight matrix and computes the distance between them.

weighted input = || IW1,1 − p ||    (2)

The net input n1 to the radial basis transfer function is this vector distance between the weight matrix IW1,1 and the input vector p, multiplied element by element by the bias b1:

ni1 = (|| IW1,1 − p ||) * bi1    (3)

The radial basis transfer function used is

a = e^(−n²)    (4)


The plot of the radial basis function is shown in Fig. 2. The radial basis function peaks at 1 when its input is 0: as the distance between w and p decreases, the output increases. Thus, a radial basis neuron acts as a detector that produces 1 whenever the input p is identical to its weight vector w.

Fig. 2. Plot of radial basis function

Fig. 1. Describing the layers of MGRNN

The approach presented here uses a method that frees it from the necessity of assuming a specific functional form.

The bias b allows the sensitivity of the radial basis neuron to be adjusted. For example, a neuron with a bias of 0.1 would output 0.5 for any input vector p at a vector distance of 8.326 (= 0.8326/b) from its weight vector w.
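The bias arithmetic above can be checked with a short sketch of the radial basis transfer function (plain Python, no toolbox assumed):

```python
import math

def radbas(n: float) -> float:
    """Radial basis transfer function a = exp(-n^2); peaks at 1 when n = 0."""
    return math.exp(-n * n)

# Net input is the weighted distance scaled by the bias: n = ||w - p|| * b.
b = 0.1
dist = 8.326                         # = 0.8326 / b, the distance quoted above
print(round(radbas(dist * b), 3))    # 0.5, as stated in the text
```

The sketch confirms why a smaller bias widens the neuron's receptive field: the same output of 0.5 is reached at a proportionally larger distance.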




This transfer function output forms the input vector a1 to the next layer:

ai1 = radbas(|| IW1,1 − p || * bi1)    (5)

4.2.3 Linear Transfer Function Layer
nprod, the net input function of the second layer, produces S2 elements in vector n2. Each element is the dot product of a row of LW2,1 and the input vector a1, normalized by the sum of the elements of a1. The second-layer weights LW2,1 are set to the matrix T of target vectors.

Fig. 3. Plot of linear transfer function

The linear transfer function is depicted in Fig. 3. Each target vector has the value 1 only in the row associated with that particular class of input, and 0's elsewhere. Multiplying T by a1 sums the elements of a1 belonging to each of the K input classes. Finally, the linear transfer function a2 calculates the layer's output from its net input. The second-layer transfer function, compete, produces a 1 corresponding to the largest element of n2 and 0's elsewhere; thus the network assigns the input vector to the specific class K that has the maximum probability of being correct [32].

4.2.4 Proposed Algorithm for MGRNN
Step 1: Store the linguistic terms in vector S.
Step 2: Convert the linguistic terms into numerical values by the rules defined in IR and save them in vector R, as given in Table II.
Step 3: Initialize Y with the targets, i.e., the effort in our case.
Step 4: Let f(x, y) represent the joint continuous probability density function of a vector random variable x and a scalar random variable y, and let X be a particular measured value of x. The conditional mean of y given X (also called the regression of y on X) is given by

E[y | X] = ( ∫ y f(X, y) dy ) / ( ∫ f(X, y) dy )    (6)

where f(x, y) is the joint density. When f(x, y) is not known, it must usually be estimated from a sample of observations of x and y; for a nonparametric estimate of f(x, y) we use the class of consistent estimators proposed by Parzen [33]. Substituting Parzen's nonparametric estimator for f(x, y) and performing the integrations leads to the fundamental equation of GRNN:

Ŷ(X) = ( Σ_{i=1..m} y_i exp(−D(X, x_i)) ) / ( Σ_{i=1..m} exp(−D(X, x_i)) )    (7)

D(X, x_i) = Σ_{j=1..p} | X_j − x_{j,i} |    (8)

The resulting regression (Equation 7), which involves summations over the observations, is directly applicable to problems involving numerical data.

The value of an effort predictor can be reported in many ways, such as the Mean Magnitude of Relative Error (MMRE) and the probability of a project having a relative error of less than or equal to L, PRED(L). MMRE and PRED(L) are the most commonly accepted criteria for evaluating software effort estimation models. Both are computed from the magnitude of relative error (MRE), the relative size of the difference between the actual and estimated effort of project i:

MRE_i = | Estimated_effort_i − Actual_effort_i | / Actual_effort_i

The MRE value is calculated for each observation i of actual and predicted effort. The aggregation of MRE over N observations is the mean MRE:

MMRE = (1/N) Σ_{i=1..N} MRE_i

Another criterion is the prediction at level L, Pred(L) = k/N, where k is the number of observations whose MRE is less than or equal to L and N is the total number of observations. Thus Pred(25) gives the percentage of projects predicted with an MRE of at most 0.25.

For training, the 93 NASA project records were used, with 17 inputs and 1 output available for the purpose. Test data taken from the NASA dataset serve as input for testing the performance of the trained network. Since the NASA dataset contains linguistic terms, these are replaced according to Table II. The spread constant is set to 1.0.
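The fundamental GRNN regression can be sketched in a few lines. The training matrix, targets and spread below are invented for illustration, and the sketch uses Specht's Gaussian weighting with squared Euclidean distance rather than the city-block distance sometimes used for D:

```python
import numpy as np

def grnn_predict(X_train, y_train, x, sigma=1.0):
    """GRNN fundamental equation: a normalized, Gaussian-weighted average
    of the training targets, where closer stored patterns get larger
    weights. One-pass: no iterative training is needed."""
    d2 = np.sum((X_train - x) ** 2, axis=1)   # squared distance to each pattern
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # radial-basis weights
    return np.dot(w, y_train) / np.sum(w)     # weighted average of targets

# Toy (size, complexity) -> effort pairs, invented for illustration.
X = np.array([[10.0, 1.0], [50.0, 2.0], [100.0, 3.0]])
y = np.array([24.0, 120.0, 300.0])
print(grnn_predict(X, y, np.array([50.0, 2.0]), sigma=5.0))  # ~120.0
```

Querying exactly at a stored pattern returns (almost exactly) its target, since that pattern's weight dominates; larger sigma values smooth the prediction across neighbors.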




Table II: Linguistic terms and their values

Modes and parameters    Value used
Organic                 1
Semidetached            2
Embedded                3
Nominal (N)             0
Very low (VL)           1
Low (L)                 2
High (H)                3
Very high (VH)          4
Extra high (XH)         5
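Step 2 of the proposed algorithm (replacing linguistic terms with numbers) can be sketched as follows; the rating-to-value mapping mirrors Table II as reconstructed here, and the sample project record is invented for illustration:

```python
# Mapping of linguistic terms to numeric values, following Table II
# (as reconstructed here; an assumption, not a verified NASA encoding).
MODE_VALUES = {"organic": 1, "semidetached": 2, "embedded": 3}
RATING_VALUES = {"N": 0, "VL": 1, "L": 2, "H": 3, "VH": 4, "XH": 5}

def encode_project(mode, ratings):
    """Convert a project's mode and cost-driver ratings into the numeric
    input vector R used to train/test the network."""
    return [MODE_VALUES[mode]] + [RATING_VALUES[r] for r in ratings]

# Illustrative project: embedded mode, with three of its cost drivers shown.
print(encode_project("embedded", ["H", "N", "VH"]))  # [3, 3, 0, 4]
```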
Graph 2: Effort Comparison Graph

Table III compares the performance of the MGRNN model with the COCOMO-based results.

Table III: Performance of the MGRNN model and COCOMO

Data set          Model             MMRE     Pred(25%)
NASA 93 Dataset   MGRNN Model       0.18     0.91
                  COCOMO            0.46     0.38
                  Improvement (%)   28.21%

Graph-1 shows the comparison between the 93 results produced by the test data for the MGRNN model and the corresponding results for COCOMO.

Graph 1: MRE of MGRNN and COCOMO

Graph-1 shows that the MRE of the proposed MGRNN model always stays close to the mean MRE, which indicates the accuracy of the model. In the case of COCOMO, by contrast, there are relatively more spikes of high MRE, which show the inconsistency of COCOMO in evaluating efforts. Graph-2 shows the effort associated with each project during its development. In our case, we considered three different sets of efforts: the actual efforts obtained from the NASA datasets, the COCOMO efforts calculated from the NASA input data, and the MGRNN efforts calculated from the NASA input data. It can be seen that the MGRNN plot is almost identical to the actual effort plot.

Conclusion
This paper presented a new model for handling imprecision and uncertainty using the MGRNN. The work has further shown that accurate effort estimation is possible by evaluating algorithmic and non-algorithmic software effort estimation models. The proposed model produced better software effort estimates under the MMRE and Pred(0.25) evaluation criteria than the traditional COCOMO. Graph-2 depicts that most of the efforts calculated by the proposed model overlap with the actual efforts and the COCOMO-estimated efforts, and demonstrates that applying the proposed model to software effort estimation is a feasible approach to the problem of uncertainty and vagueness in software effort drivers. Empirical results show that applying the MGRNN model to software effort estimation yields a smaller mean magnitude of relative error (MMRE), and the probability of a project having a relative error of less than or equal to 0.25 is improved by approximately 28.21% compared with the results obtained with COCOMO. The use of MGRNN-based approaches for other applications in the software engineering field can be explored in the future.
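The MMRE and Pred(25) figures of the kind reported in Table III can be computed mechanically once actual and estimated efforts are available; the effort values below are invented for illustration, not the NASA figures:

```python
def mre(actual, estimated):
    """Magnitude of relative error for one project."""
    return abs(estimated - actual) / actual

def mmre(actuals, estimates):
    """Mean MRE over all observations."""
    return sum(mre(a, e) for a, e in zip(actuals, estimates)) / len(actuals)

def pred(actuals, estimates, level=0.25):
    """Fraction of observations with MRE <= level, i.e. Pred(L) = k/N."""
    k = sum(1 for a, e in zip(actuals, estimates) if mre(a, e) <= level)
    return k / len(actuals)

# Illustrative efforts (person-months), invented for demonstration.
actual    = [100.0, 240.0, 33.0, 410.0]
estimated = [ 90.0, 250.0, 40.0, 400.0]
print(round(mmre(actual, estimated), 3), pred(actual, estimated))
```

A lower MMRE and a higher Pred(25) together indicate a more accurate and more consistent estimator, which is exactly the comparison made between MGRNN and COCOMO.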


References

[1] I. F. Barcelos Tronto, J. D. Simões da Silva, N. Sant'Anna, "Comparison of artificial neural network and regression models in software effort estimation," INPE ePrint, v1, 2006.
[2] H. Al-Sakran, "Software cost estimation model based on integration of multi-agent and case-based reasoning," Journal of Computer Science, vol. 2, no. 3, pp. 276-282, ISSN 1549-3636, 2006.
[3] B. Boehm et al., Software Cost Estimation with COCOMO II, Prentice Hall PTR, 2000.
[4] C. E. Walston and C. P. Felix, "A method of programming measurement and estimation," IBM Systems Journal, vol. 16, no. 1, pp. 54-73, 1977.
[5] G. N. Parkinson, Parkinson's Law and Other Studies in Administration, Houghton-Mifflin, Boston, 1957.
[6] L. H. Putnam, "A general empirical solution to the macro software sizing and estimating problem," IEEE Trans. Software Eng., pp. 345-361, July 1978.
[7] J. R. Herd, J. N. Postak, W. E. Russell, K. R. Steward, "Software cost estimation study: Study results," Final Technical Report RADC-TR-77-220, vol. I, Doty Associates, Inc., Rockville, MD, pp. 1-10, 1977.
[8] R. E. Park, "PRICE S: The calculation within and why," Proc. ISPA Tenth Annual Conference, Brighton, England, pp. 231-240, July 1988.
[9] R. K. D. Black, R. P. Curnow, R. Katz, M. D. Gray, "BCS software production data," Final Technical Report RADC-TR-77-116, Boeing Computer Services, Inc., pp. 5-8, March 1977.
[10] R. Tausworthe, "Deep Space Network software cost estimation model," Jet Propulsion Laboratory Publication 81-7, pp. 67-78, 1981.
[11] W. S. Donelson, "Project planning and control," Datamation, pp. 73-80, June 1976.
[12] L. Putnam and W. Myers, Measures for Excellence, Yourdon Press Computing Series, 1992.
[13] C. Jones, Applied Software Measurement, McGraw-Hill, 1997.
[14] R. Park, "The central equations of the PRICE software cost model," 4th COCOMO Users Group Meeting, November 1988.
[15] R. Jensen, "An improved macro-level software development resource estimation model," Proc. 5th ISPA Conference, pp. 88-92, April 1983.
[16] B. W. Boehm, Software Engineering Economics, Prentice-Hall, Englewood Cliffs, NJ, ISBN 0138221227, 1981.
[17] B. Boehm, C. Abts and S. Chulani, "Software development cost estimation approaches: A survey," Annals of Software Engineering, vol. 10, pp. 177-205, DOI: 10.1023/A:1018991717352, 2000.
[18] B. Boehm, "Cost models for future software life cycle processes: COCOMO 2.0," Annals of Software Engineering, vol. 1, pp. 45-60, 1995.
[19] K. Strike, K. El-Emam and N. Madhavji, "Software cost estimation with incomplete data," IEEE Trans. Software Eng., vol. 27, pp. 890-908, DOI: 10.1109/32.962560, 2001.
[20] A. C. Hodgkinson and P. W. Garratt, "A neurofuzzy cost estimator," Proc. 3rd International Conference on Software Engineering and Applications (SEA '99), pp. 401-406, 1999.
[21] C. Schofield, "Non-algorithmic effort estimation techniques," Technical Report, Department of Computing, Bournemouth University, England, SERG/Technical_Reports/TR98-01/, 1998.
[22] L. H. Putnam, "A general empirical solution to the macro software sizing and estimating problem," IEEE Trans. Software Eng., vol. 4, pp. 345-361, 1978.
[23] K. Srinivasan and D. Fisher, "Machine learning approaches to estimating software development effort," IEEE Trans. Software Eng., vol. 21, pp. 126-137, DOI: 10.1109/32.345828, 1995.
[24] S. Aljahdali and A. F. Sheta, "Software effort estimation by tuning COCOMO model parameters using differential evolution," ACS/IEEE International Conference on Computer Systems and Applications (AICCSA), 2010.
[25] C. F. Kemerer, "An empirical validation of software cost estimation models," Communications of the ACM, vol. 30, pp. 416-429, 1987.
[26] O. Benediktsson, D. Dalcher, K. Reed and M. Woodman, "COCOMO-based effort estimation for iterative and incremental software development," Software Quality Journal, vol. 11, pp. 265-281, 2003.
[27] C. Kirsopp and M. J. Shepperd, "Making inferences with small numbers of training sets," Sixth International Conference on Empirical Assessment & Evaluation in Software Engineering, Keele University, Staffordshire, UK, 2002.
[28] Z. Fei and X. Liu, "f-COCOMO: Fuzzy constructive cost model in software engineering," Proc. IEEE International Conference on Fuzzy Systems, IEEE Press, New York, pp. 331-337, 1992.
[29] G. D. Boetticher, "An assessment of metric contribution in the construction of a neural network-based effort estimator," Proc. Second International Workshop on Soft Computing Applied to Software Engineering, 2001.
[30] G. Wittig and G. Finnie, "Estimating software development effort with connectionist models," Information and Software Technology, vol. 39, pp. 469-476, 1997.
[31] sa_2.arff.
[32] D. F. Specht, "A general regression neural network," IEEE Transactions on Neural Networks, vol. 2, no. 6, pp. 568-576, 1991.
[33] E. Parzen, "On estimation of a probability density function and mode," Annals of Mathematical Statistics, vol. 33, pp. 1065-1076, 1962.

Brajesh Kumar Singh is presently pursuing a Ph.D. at MNNIT, Allahabad, India, under the guidance of Prof. A. K. Misra, Department of Computer Science & Engineering. He is working as a Reader in Computer Science & Engineering at FET, RBS College, Agra, India. He has published several national and international research papers, including IEEE publications. He is a member of international associations in the fields of soft computing and software engineering, and a member of reviewer boards, conference committees and editorial boards of many national and international conferences and journals in India and abroad.
