Software reliability and production bugs are major problems for software developers, so there is a
need for bug and reliability prediction in the field. Around 18 NHPP software reliability growth
models (SRGMs) have been proposed so far. Reliability determines whether development work on a
product should continue or be abandoned, so reliability predictions should be as accurate as possible.
This investigation shows that the support vector machine (SVM) can be a better alternative for
forecasting reliability. We tested the NHPP models and SVM on their future predictive power and ranked
them on the basis of relative predictability error (RPE), mean square error (MSE) and $R^2$.
Three different data sets are used to test the proposed model.
1. Introduction
Reliability is a key software quality attribute, and achieving it is one of the major problems facing
the software industry. Reliable software is scarce in the market while demand for it is increasing
rapidly. Developing reliable software is not easy: resources are limited and requirements are often
unrealistic, so they take time to process. It is also hard to determine whether delivered software is
reliable. Nearly 30 software reliability models have been proposed over time to provide a quantitative
measure of software reliability during development.
Software reliability growth models (SRGMs) have proven successful in estimating software reliability
and the number of errors remaining in the software. Using SRGMs, one can predict the future reliability
of software that is still under development and guide the testing and debugging process. The existing
models, whether the yam model, the S-shaped model or the geo model, assume that the fault process
follows a specific kind of curve, which is not always true and is somewhat unrealistic. As a result,
these growth models can be inaccurate and insufficient for fitting actual software failure data in
reliability assessment.
SVM is used for classification, pattern recognition and prediction. It has proven very useful in
real-world applications, is known to generalise well in most cases, and is adept at modelling
non-linear functional relationships that are difficult to capture with SRGMs. We propose SVM as a
growth model and use it to predict the bugs that will be encountered and the software reliability at
any time during development.
Here we present the real projects to which SVM is applied for software reliability growth
generalisation and prediction. Three software failure data sets, data1, data2 and data3, are used for
software reliability growth modelling: data1 has 81 data pairs, data2 has 111 data pairs and data3 has
81 data pairs. All three data sets are first normalised to the range [0,1]. The data are given in
Tables 2 to 4. The input of the SVR function is the normalised successive failure occurrence time, and
its output is the normalised accumulated failure number. We use SVRSRG to refer to the SVR-based
software reliability growth model.
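The normalisation step can be illustrated with a minimal sketch, assuming plain min-max scaling (the text only says the data are mapped to [0,1], so the exact scaling used is an assumption). The helper name `min_max_normalise` is hypothetical, and the five pairs are the first rows of the dataset captioned Table 3.

```python
def min_max_normalise(values):
    """Scale a sequence of numbers linearly onto [0, 1] (assumed scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalise a constant sequence")
    return [(v - lo) / (hi - lo) for v in values]


# First five (time, cumulative bugs) pairs of the dataset captioned Table 3.
times = [1, 2, 3, 4, 5]
bugs = [9, 12, 16, 25, 27]

x = min_max_normalise(times)  # would serve as the SVR input
y = min_max_normalise(bugs)   # would serve as the SVR target
```

In practice the minimum and maximum would be taken over the whole data set rather than a five-row excerpt.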
Table 1
Model name    Formula
S-shaped      $m(t) = \dfrac{a\,(1 - e^{-bt})}{1 + B\,e^{-bt}}$
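As a small illustration, the S-shaped formula in Table 1 can be evaluated directly. The function name and the parameter values below are hypothetical, chosen only to show the shape of the curve; `beta` stands for the symbol written B in the table.

```python
import math


def s_shaped_mean_value(t, a, b, beta):
    """m(t) = a * (1 - exp(-b t)) / (1 + beta * exp(-b t))."""
    decay = math.exp(-b * t)
    return a * (1.0 - decay) / (1.0 + beta * decay)


# Hypothetical parameter values, for illustration only (not fitted to any data).
a, b, beta = 100.0, 0.1, 5.0
curve = [s_shaped_mean_value(t, a, b, beta) for t in range(0, 101, 20)]
```

For positive b the curve starts at 0 and approaches a as t grows, which is why a fixed curve of this kind cannot follow data that fluctuate away from that shape.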
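$$f(x) = w^{T}\phi(x) + b \qquad (1)$$
where $\phi(x)$ is the feature vector for one data point $x$, and $w$ and $b$ are the weight vector and
bias. The weights and bias are estimated by minimizing the cost function: the best line is given by
those $w$ and $b$ for which the cost function is minimum. The cost function is given as follows:
$$R(w, b) = C\sum_{i=1}^{N} L_{\varepsilon}\big(y_i, f(x_i)\big) + \frac{1}{2}\lVert w\rVert^{2} \qquad (2)$$
where
$$L_{\varepsilon}\big(y, f(x)\big) = \begin{cases} |y - f(x)| - \varepsilon, & |y - f(x)| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
where $C$ and $\varepsilon$ are predefined constants. $C$ sets the trade-off between training error
and model complexity, and $L_{\varepsilon}$ is the $\varepsilon$-insensitive loss function. The term
$\frac{1}{2}\lVert w\rVert^{2}$ is the measure of complexity of the function. By introducing the slack
variables $\xi_i$ and $\xi_i^{*}$, which represent the distance from an observed value to the
corresponding boundary of the $\varepsilon$-tube, the cost function is transformed into a simpler
function. When the observed point lies above the tube, $\xi_i$ is the positive difference between the
observed value and the upper boundary $f(x_i) + \varepsilon$. When the point lies below the tube,
$\xi_i^{*}$ is the difference between the lower boundary $f(x_i) - \varepsilon$ and the observed value.
The new optimization problem can now be defined as:
Minimize:
$$\frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\big(\xi_i + \xi_i^{*}\big) \qquad (4)$$
subject to:
$$y_i - w^{T}\phi(x_i) - b \le \varepsilon + \xi_i, \quad w^{T}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \quad \xi_i,\ \xi_i^{*} \ge 0, \quad i = 1, \dots, N$$
Finally, the result of the regression function at any sample $x$ is given by:
$$f(x) = \sum_{i=1}^{N}\big(\alpha_i - \alpha_i^{*}\big)K(x_i, x) + b \qquad (5)$$
where $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers of the dual problem. Here
$K(x_i, x_j)$ is the kernel function. It is the inner product of two vectors $x_i$ and $x_j$ in the
feature space $\phi(x_i)$ and $\phi(x_j)$, i.e., $K(x_i, x_j) = \phi(x_i)\cdot\phi(x_j)$. Kernel
functions must satisfy Mercer's condition. In this study we have used ### and ### kernels.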
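The following is a minimal sketch of the quantities this passage describes, assuming the usual form of support vector regression: an epsilon-insensitive loss, a cost that adds C times the total loss to half the squared weight norm, and a prediction written as a kernel expansion. All function names are hypothetical. The linear kernel is used purely as a stand-in, since the kernels actually used in the study are not named in the text.

```python
def epsilon_insensitive_loss(y, f_x, epsilon):
    """Zero inside the epsilon-tube, linear in the excess outside it."""
    return max(0.0, abs(y - f_x) - epsilon)


def regularised_cost(w, b, xs, ys, C, epsilon):
    """C * total epsilon-insensitive loss + 0.5 * ||w||^2 for a linear model."""
    preds = [sum(wj * xj for wj, xj in zip(w, x)) + b for x in xs]
    total_loss = sum(epsilon_insensitive_loss(y, p, epsilon)
                     for y, p in zip(ys, preds))
    return C * total_loss + 0.5 * sum(wj * wj for wj in w)


def linear_kernel(u, v):
    """Stand-in kernel (an assumption): plain inner product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def kernel_predict(x, support, coeffs, b, kernel):
    """Kernel expansion: sum_i coeff_i * K(x_i, x) + b."""
    return sum(c * kernel(xi, x) for xi, c in zip(support, coeffs)) + b


# Toy data: only the middle point falls outside the tube of half-width 0.05.
xs, ys = [[0.0], [0.5], [1.0]], [0.0, 0.6, 1.0]
cost = regularised_cost(w=[1.0], b=0.0, xs=xs, ys=ys, C=10.0, epsilon=0.05)
pred = kernel_predict([0.4], support=[[0.0], [1.0]], coeffs=[0.2, 0.8],
                      b=0.1, kernel=linear_kernel)
```

Here `coeffs` plays the role of the difference of the two dual multipliers for each support vector. Points inside the tube contribute nothing to the cost, which is what keeps the fitted function from chasing every small fluctuation in the failure data.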
Goodness of fit
TIME  BUGS    TIME  BUGS    TIME  BUGS    TIME  BUGS    TIME  BUGS    TIME  BUGS
1     5       21    217     41    367     61    463     81    473     101   477
2     10      22    226     42    375     62    463     82    473     102   477
3     15      23    230     43    381     63    464     83    473     103   478
4     20      24    234     44    401     64    464     84    473     104   478
5     26      25    236     45    411     65    465     85    473     105   478
6     34      26    240     46    414     66    465     86    473     106   479
7     36      27    243     47    417     67    465     87    475     107   479
8     43      28    252     48    425     68    466     88    475     108   479
9     47      29    254     49    430     69    467     89    475     109   480
10    49      30    259     50    431     70    467     90    475     110   480
11    80      31    263     51    433     71    467     91    475     111   481
12    84      32    264     52    435     72    468     92    475
13    108     33    268     53    437     73    469     93    475
14    157     34    271     54    444     74    469     94    475
15    171     35    277     55    446     75    469     95    475
16    183     36    293     56    446     76    469     96    476
17    191     37    309     57    448     77    470     97    476
18    200     38    324     58    451     78    472     98    476
19    204     39    331     59    453     79    472     99    476
20    211     40    346     60    460     80    473     100   477
Table 3 Dataset 3 (Failure data from online bug tracking system [30])
TIME  BUGS    TIME  BUGS    TIME  BUGS    TIME  BUGS
1     9       21    43      41    56      61    92
2     12      22    44      42    59      62    94
3     16      23    45      43    60      63    94
4     25      24    45      44    60      64    94
5     27      25    46      45    60      65    99
6     29      26    47      46    61      66    102
7     29      27    47      47    61      67    104
8     32      28    49      48    62      68    105
9     34      29    50      49    62      69    105
10    35      30    50      50    62      70    106
11    36      31    50      51    62      71    106
12    36      32    50      52    64      72    107
13    39      33    51      53    65      73    108
14    39      34    52      54    66      74    108
15    40      35    53      55    73      75    109
16    40      36    54      56    76      76    112
17    40      37    55      57    81      77    113
18    41      38    55      58    83      78    113
19    42      39    55      59    87      79    115
20    43      40    55      60    88
4. Results and Discussion
Software reliability growth models assume that a dataset follows one specific pattern of behaviour.
They are therefore less flexible when the pattern of bug occurrence fluctuates, which leads to less
accurate reliability estimates. To overcome this problem, we need a model or technique that maps the
data in a more flexible way and so produces more accurate results. Machine learning is a field that
uses existing data to predict future patterns. We used two of its techniques, gradient descent and the
support vector machine, to address this problem. SVM maps the data into a feature space in which a
non-linear problem becomes a linear one that can be solved with a hyperplane. We built our own software
reliability growth model and optimised it using SVM. We used the relative predictability error
criterion to compare the reliability of the existing models with that of the proposed model. The
existing models were trained manually using the gradient descent algorithm, as this gave more accurate
results than SPSS. We selected 7 well-known models (described in Table 1) from the software engineering
literature to rank the models. The selected models were trained on 3 different datasets, given in
Table 2, Table 3 and Table 4. The models were first trained on 70% of the data, predictions were made
for the whole dataset, and the Mean Square Error (MSE) was calculated. This process was repeated at
80%, 90% and 100% for all of the datasets. After the MSE was found, the Relative Predictability Error
(RPE) was calculated, and finally these results were compared with the results of SVM.
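The RPE and MSE values are shown in the tables. RPE is evaluated as
$$\mathrm{RPE} = \frac{\hat{y}_{n} - y_{n}}{y_{n}}$$
where $y_n$ and $\hat{y}_n$ are the actual and predicted values at the final data point. MSE is
determined from the formula
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\big(y_i - \hat{y}_i\big)^{2}$$
where $n$ is the number of terms.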
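The two criteria can be written as short functions. This is a sketch, assuming that MSE means the usual mean of squared residuals and that RPE is taken at the final data point. The function names and the predicted values are hypothetical; the actual values are the first five and the last bug counts of the dataset captioned Table 3.

```python
def relative_predictability_error(predicted_final, actual_final):
    """(predicted - actual) / actual, evaluated at the final data point."""
    return (predicted_final - actual_final) / actual_final


def mean_square_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [9, 12, 16, 25, 27]      # first five bug counts of Table 3
predicted = [10, 12, 17, 23, 27]  # hypothetical model output
mse = mean_square_error(actual, predicted)

# Hypothetical final prediction of 120 against the final observed count of 115.
rpe = relative_predictability_error(120, 115)
```

A positive RPE means the model overestimates the final cumulative bug count and a negative one means it underestimates it, so the sign carries information that MSE does not.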
The parameters ν and C of the SVR used in the experiment, which is actually ν-SVR, are optimised by
cross-validation. In our evaluation process, for each data set we trained the model on 100%, 90%, 80%
and 70% of the data individually, predicted the complete data set each time, and then evaluated the
relative predictability error (RPE), which indicates the future predictive power of the model. We used
RPE and MSE to rank the models, so each model has 4 fitness scores (trained on 100%, 90%, 80% and 70%
of the data). MSE is the mean square error, calculated as the sum of (actual - predicted)^2 divided by
the number of data points. SVRSRG achieves a higher fitness score on the data sets than the older
models: in the experiment, the relative predictability error and the mean square error of SVRSRG are
smaller than those of the older SRGMs.
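The procedure of training on a leading fraction of the data and predicting the whole data set could be sketched as follows. The straight-line least-squares fit is only a stand-in so that the example runs without extra libraries; it is not the SVR or any of the NHPP models from the study. The names `fit_line` and `evaluate` are hypothetical, and the data are the first ten rows of the dataset captioned Table 3.

```python
def fit_line(xs, ys):
    """Least-squares straight line; a stand-in for any growth model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def evaluate(times, bugs, fit, fractions=(0.7, 0.8, 0.9, 1.0)):
    """Train on a leading fraction, predict every point, report MSE and RPE."""
    results = {}
    for frac in fractions:
        n_train = max(2, round(frac * len(times)))
        model = fit(times[:n_train], bugs[:n_train])
        preds = [model(t) for t in times]
        mse = sum((a - p) ** 2 for a, p in zip(bugs, preds)) / len(bugs)
        rpe = (preds[-1] - bugs[-1]) / bugs[-1]
        results[frac] = {"n_train": n_train, "mse": mse, "rpe": rpe}
    return results


times = list(range(1, 11))
bugs = [9, 12, 16, 25, 27, 29, 29, 32, 34, 35]  # first ten rows of Table 3
scores = evaluate(times, bugs, fit_line)
```

Because the model only sees the leading part of the series, the RPE at the final point measures how well it extrapolates, which is the property the ranking is meant to capture.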
Conclusion
We first applied the NHPP models to estimate software reliability and recorded the results. We then
applied the proposed model, i.e. SVM, to the same data sets. The results obtained by SVM had smaller
forecasting errors than the NHPP software reliability growth models. SVM uses a non-linear mapping,
which lets it capture the pattern of a dataset and produce better results with lower RPE. SVM's
parameters play a major role in prediction: improper selection of the parameters can lead to
overfitting or underfitting of the model.
[30] Yang, J., et al. (2016). Modeling and analysis of reliability of multi-release open source
software incorporating both fault detection and correction processes. Journal of Systems and Software,
115, 102-110.
[23] Hwang, S., & Pham, H. (2009). Quasi-renewal time-delay fault-removal consideration in software
reliability modeling. IEEE Trans. on Systems, Man and Cybernetics, Part A: Systems and Humans,
39(1), 200-209.