Abstract: Test suite code coverage is widely used as an indicator of a test suite's ability to detect faults. However, earlier studies that explored the correlation between code coverage and test suite effectiveness have not addressed this correlation from an evolutionary perspective. Moreover, some of these works studied small or similar-domain systems, which makes it unclear how their results generalize to other systems. Software refactoring has positive consequences for software maintainability and understandability. It aims to enhance software quality by modifying the internal structure of systems without affecting their external behavior. However, identifying what needs refactoring, and at which level it should be executed, is still a big challenge for software developers. In this paper, we explore the effectiveness of employing the Support Vector Machine (SVM) along with three optimization algorithms in predicting software refactoring at the class level. In particular, SVM is trained with the Genetic, Particle Swarm (PSO), and Whale algorithms. A well-known dataset is used in this study that belongs to open-source software systems (i.e., ANTLR4, JUnit, MapDB, and McMMO). The experimental results show that there is no significant difference in accuracy between the developed approaches, since all of them achieved a high accuracy rate ranging between 94.75 and 99.09, with an SD ranging between 1.07 and 3.09.
1. Introduction
In any business sector, the quality of a particular product or service matters, and this quality often depends on the process followed to build that product or service [35]. Today's world massively depends on software technology, and high quality in these software systems has been in great demand for the past few decades. The main expectation of high-quality software is reliability, which is achieved by reducing the bugs or failures in the software. These bugs tend to slow down the software's response, which harms performance and the user experience. Such errors cause faults, and faults in turn cause system failures [17]. Altering a software system in a way that does not affect its external behavior but improves its internal structure is known as refactoring [9]. Refactoring can also indirectly improve external qualities such as responsiveness and the user experience.
During a software application's life cycle, the software keeps changing to add new features or modify existing ones to cope with new requirements. In order to continue satisfying stakeholders' needs, developers must reflect those intended needs in the software. It is well known that software maintenance is the most expensive phase in the software development lifecycle [1]. These maintenance activities usually happen incrementally; they may add or modify functionality, or restructure the design for a better user experience. If the system does not go through several design-correction activities, its quality will degrade [2].
Once software developers receive new demands or requests, they modify the software to accommodate them (software refactoring) [3]. Software refactoring modifies the internal structure of the software without altering its external functionality [2, 3]. Moreover, software refactoring is employed to enhance understandability, reduce complexity, and increase the maintainability of the targeted software [5].
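To make the idea concrete, the sketch below shows a small, hypothetical "Extract Method" refactoring in Python. The function names and logic are illustrative only (not taken from the studied systems); the point is that the internal structure changes while the external behavior stays identical.

```python
# Before: one function mixes validation and report formatting.
def report_before(scores):
    valid = [s for s in scores if 0 <= s <= 100]
    return f"avg={sum(valid) / len(valid):.1f}" if valid else "no data"

# After "Extract Method": validation moves to its own helper.
# The internal structure changes; the external behavior does not.
def _valid_scores(scores):
    return [s for s in scores if 0 <= s <= 100]

def report_after(scores):
    valid = _valid_scores(scores)
    return f"avg={sum(valid) / len(valid):.1f}" if valid else "no data"
```

Both versions produce the same output for any input, which is exactly the invariant refactoring must preserve.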
Refactoring may change the software at three levels, from lower to higher: variables, functions, and classes. These changes pose a big technical challenge to software developers, especially when they need to identify both the level and all the code pieces that need refactoring. The primary aim of refactoring is to make the code more maintainable without changing its semantics [32]. Refactoring is highly challenging in the sense of identifying which parts of the software have to be refactored and which methods should be used. These challenges arise from the significant functionality limitations of software repositories and the type of data they contain [35]. Hence, much research has raised the need for building refactoring prediction/recommendation systems to assist in evolution tasks [5, 6, 7, 8, 9, 10].
Although the refactoring task depends largely on software developers' skills and insights, the process can still be supported by refactoring prediction/recommendation systems, which facilitate the detection of the classes or methods that need refactoring.
Several techniques have been studied to predict refactoring, for example, code smells [11], pattern mining [12], invariant mining [13], and search-based approaches [14]. Machine learning algorithms have shown encouraging results when utilized in various fields of software engineering, such as defect prediction [15, 16, 17, 18, 19], code smells, and code comprehension [20, 21].
To the best of our knowledge, the work presented in this article introduces a new research contribution. It presents class-level refactoring prediction on four open-source Java-based systems (ANTLR4, JUnit, MapDB, and McMMO) using a Support Vector Machine (SVM) and three optimization algorithms: the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and the Whale Algorithm (WA). This paper uses the studied algorithms to predict refactoring needs at the class level, both standalone and with the algorithms integrated. The main problem software practitioners encounter is recognizing which code segment has to be refactored; therefore, this paper focuses on the use of SVM and optimization methods in this regard.
The primary aim of this study is to understand whether these prediction approaches differ when paired with the three optimization algorithms. By repeating the experiments several times, we can develop a better understanding of which technique responds better, leading to optimized results in terms of software quality. Thus, by conducting these experiments, suggestions and conclusions are made about the aforementioned refactoring methods and algorithms.
2. Related Work
In order to make predictions about the defects in particular software, researchers and developers also apply machine learning approaches to software systems in real time. Famous examples of such systems include telecontrol/telepresence, robotics, and mission-planning systems. Many studies have been conducted in the field of predicting software faults, and these works differ in their optimization
techniques, machine learning techniques, and classification techniques [17].
Processes 2022, 10, x FOR PEER REVIEW 3 of 10
There are several techniques to examine the defects present in software, but no such technique to date can produce highly accurate results.
As is known by now, various refactoring implication types exist. The main process of refactoring involves the modification of classes, methods, and variables. For developers, it is also important to identify all the code elements or code segments that require refactoring in a large, complex system.
In this regard, the Support Vector Machine (SVM) enjoys high popularity among software developers and testers. SVM classifies data into predefined classes by computing a hyperplane in a high-dimensional space; in other words, it is a machine learning technique that can be used for classification. The advantage of using it for feature selection is that it tends to reduce computation time while improving prediction performance. Since it improves prediction accuracy and helps to observe different values and crucial factors for performance, many researchers use SVM for feature selection in their research.
Figure 1. SVM system.
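As a rough illustration of the hyperplane idea (not the exact setup or data used in this study), the following minimal sketch trains a linear SVM with hinge loss via stochastic sub-gradient descent (Pegasos-style) on hypothetical two-dimensional data; all parameter values are illustrative.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM (hinge loss + L2 regularization) with
    stochastic sub-gradient descent. Labels must be -1 or +1."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    for _ in range(epochs):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)  # decaying learning rate
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1:         # inside margin: hinge-loss gradient step
                w = [wj - eta * (lam * wj - y[i] * xj) for wj, xj in zip(w, X[i])]
                b += eta * y[i]
            else:                  # outside margin: only regularization shrinks w
                w = [wj - eta * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy separable clusters (e.g., "no refactoring needed" vs "needed")
X = [[1.0, 1.2], [1.5, 0.8], [0.9, 1.0], [3.0, 3.2], [3.5, 2.8], [2.9, 3.1]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
```

In practice, the study's setting would use class-level software metrics as features and a library SVM implementation; this sketch only shows the separating-hyperplane mechanism.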
Refactoring has been studied extensively in the literature. Fowler initiated the effort with the first catalog of 72 refactorings and an accompanying guide [3]. Simon et al. [22] proposed an approach to generate visualizations that help developers identify bad smells.
Many studies have examined fault prediction in software using object-oriented metrics. Their results showed that object-oriented metrics produce significantly better outcomes than static code metrics, because object-oriented metrics represent different structural characteristics such as coupling, cohesion, inheritance, encapsulation, complexity, and size [17].
An early survey [2] shed light on refactoring, discussing refactoring activities, techniques, and tools, as well as the authors' views on how refactoring can improve software quality in the long run. Most existing research studies are rule-based, machine-learning-based, or search-based. A systematic literature review (SLR) [23] discusses how researchers are increasingly interested in automatic refactoring techniques. Its results suggest that source-code approaches are by far more studied than model-based ones, that search-based approaches are more popular, and that recently more machine learning approaches have been explored to help experts discover refactoring needs.
Mariani and Vergilio [24] conducted an SLR of search-based refactoring approaches. They observed that evolutionary algorithms, specifically genetic algorithms, were the most used. Mohan and Greer [25] investigated search-based refactoring in more depth, covering tools, metrics, and evolution, since their focus was software maintenance; they also found that evolutionary algorithms were the most used. Moreover, Shepperd and Kadoda [36] used simulation methods to compare software prediction techniques: stepwise regression, rule induction (RI), case-based reasoning (CBR), and artificial neural networks (ANN). They compared these prediction models on real software in terms of accuracy, explanatory value, and configurability, and found that CBR and RI have an advantage over ANN, with CBR favored overall.
Azeem et al. [21] conducted a systematic literature review summarizing the research on machine learning (ML) algorithms for code smell prediction. Their review included 15 research studies involving code smell prediction models. According to the results, Decision Trees and SVM are the most widely used ML algorithms for code smell detection, while JRip and Random Forest are the most effective in terms of performance.
In addition, Liu et al. [20] describe a tool that uses conceptual relationships, implementation similarity, structural correspondence, and inheritance hierarchies to identify potential refactoring opportunities in the source code of open-source software systems. They also showed that machine learning models that predict highly defective classes can be built using static measures and defect data collected at the class level.
Tsantalis and Chatzigeorgiou [37] report a way to recognize refactoring suggestions with the help of polymorphism. Their main focus was the detection and elimination of state-checking problems in Java programs, deployed as an Eclipse plug-in. In 2007, Ng and Levitin proposed correcting faults in addition to predicting faulty parts of software. To achieve this, they applied a genetic algorithm and a number of neural networks iteratively; the genetic algorithm was used to increase the performance of the prediction model.
Erturk and Sezer [34] analyzed the evolution of object-oriented source code at the class level, focusing on refactoring events that depend on a vector space model. Applying this approach to an open-source domain produced a list of class refactoring operations.
Another study, by Caldeira et al. [35], investigates the effects of aspects such as dataset size, metric sets, and feature selection techniques on software fault prediction, aspects that had not been researched before. Random Forest and Artificial Immune Systems were used as machine learning methods, with a dataset collected from the PROMISE repository. According to this study, the selected algorithm matters much more than the selected metrics [35].
3. Methodology
In this section, we present the developed technique for predicting software refactoring using a Support Vector Machine classifier and three optimization algorithms. The developed approach is composed of four main phases. In the first phase, a pre-processing procedure is applied to the collected datasets. In the second phase, the GA, PSO, WA, and SVM classifiers are applied to the processed datasets to predict refactoring opportunities. In the third phase, the results are evaluated using the Wilcoxon signed-rank test [26]. In the last phase, we compare the results to find the best approach overall. Figure 2 depicts the main phases of the proposed technique.
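As background for the third phase, the Wilcoxon signed-rank statistic can be computed as sketched below. This is a simplified illustration that returns only the W statistic (with average ranks for ties and zero differences dropped); in practice a statistics library is used to obtain the p-value as well.

```python
def wilcoxon_w(a, b):
    """Wilcoxon signed-rank statistic for paired samples a, b.

    Rank the absolute differences (zeros dropped, ties get the average
    rank), then return W = min(sum of ranks of positive differences,
    sum of ranks of negative differences)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1                     # extend over a run of tied |diffs|
        avg = (i + j) / 2 + 1          # average 1-based rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)
```

A small W indicates that the differences between the two paired samples lean strongly in one direction.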
Figure 2. Developed methodology.
In data pre-processing, all unnecessary attributes, such as LongName, Parent, Path, and Component, are deleted. Moreover, the class labels are replaced with 0 and 1, where false becomes 0 and true becomes 1.
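The pre-processing step above can be sketched as follows. The record shape, the illustrative metric columns (CBO, LOC), and the "label" key are assumptions for illustration, not the dataset's exact schema; only the dropped column names come from the text.

```python
# Non-predictive attributes named in the text.
DROP_COLS = {"LongName", "Parent", "Path", "Component"}

def preprocess(records):
    """Drop non-predictive attributes and encode the class label:
    false -> 0, true -> 1."""
    cleaned = []
    for row in records:
        out = {k: v for k, v in row.items() if k not in DROP_COLS}
        out["label"] = 1 if str(row["label"]).lower() == "true" else 0
        cleaned.append(out)
    return cleaned

# Hypothetical raw record from one of the studied systems.
sample = [{"LongName": "org.example.Tool", "Path": "src/Tool.java",
           "CBO": 12, "LOC": 340, "label": "true"}]
```

After this step, only numeric metric columns and the binary label remain, which is the shape the SVM classifier expects.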
The Particle Swarm Optimization (PSO) algorithm is another optimization algorithm that is widely used in software fault prediction. In PSO, particles search for a food source (the optimal solution); when a particle gets close to the food, it informs the other particles to move closer to its route. If other particles get even closer to the source (the target food) than the previous particle, they in turn signal all other particles to move in their direction. This technique is repeated until one particle reaches the food source, i.e., the best/optimal solution [29]. In this stage, we integrated PSO with the SVM classifier and applied it to the four datasets; the experiment was repeated 50 times.
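A minimal sketch of the PSO update rule described above follows. Here it minimizes a simple stand-in objective (the sphere function) rather than the SVM validation error actually optimized in the study, and all parameter values (swarm size, inertia, acceleration coefficients) are illustrative.

```python
import random

def pso(fitness, dim, bounds, n_particles=30, iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=42):
    """Minimal particle swarm optimizer (minimization)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # each particle's best position
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # swarm's best ("food source")
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # velocity: inertia + pull toward personal and global bests
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Stand-in objective: minimize x1^2 + x2^2 (optimum at the origin).
best, best_val = pso(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))
```

When integrated with a classifier, `fitness` would instead evaluate the SVM (e.g., cross-validated error for a candidate parameter vector), so the swarm converges toward well-performing settings.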
Table 1. Accuracy and standard deviation (SD) of the developed approaches on the four datasets.

Accuracy
Dataset   GA+PSO+Whale+SVM  GA+Whale+SVM  GA+PSO+SVM  Whale+SVM  PSO+SVM  GA+SVM
Antlr4    94.75             94.72         94.29       94.73      94.73    94.72
Junit     98.63             98.62         98.48       98.63      98.63    98.63
MapDB     99.09             99.08         99.09       99.09      99.09    99.09
McMMO     98.67             98.68         98.67       98.68      98.67    98.67

SD
Dataset   GA+PSO+Whale+SVM  GA+Whale+SVM  GA+PSO+SVM  Whale+SVM  PSO+SVM  GA+SVM
Antlr4    2.48              2.32          3.09        3.07       3.25     2.32
Junit     1.43              1.45          1.18        1.27       1.44     1.07
MapDB     1.51              2.13          1.11        1.11       1.51     1.11
McMMO     1.63              1.62          1.63        1.62       2.21     2.21
In this paper, the effectiveness of merging optimization algorithms with the machine learning (SVM) classifier is evaluated in terms of refactoring prediction performance. Three optimization algorithms and four prediction datasets are studied in this work. To evaluate the developed approach, we used the Wilcoxon signed-rank test to calculate the p-value and check for significant differences across the repeated experiments. Prediction effectiveness is mainly measured by accuracy. After conducting several experiments, the results show no significant difference between the addressed approaches: all experiments achieved a high accuracy rate ranging between 94.75 and 99.09, with an SD ranging between 1.07 and 3.09.
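For reference, the reported accuracy and SD figures are summaries over the repeated runs; a minimal sketch, assuming the sample standard deviation over per-run accuracies:

```python
import math

def mean_sd(values):
    """Mean and sample standard deviation of repeated-run accuracies."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)

# e.g., accuracies from three hypothetical repetitions of one approach
m, sd = mean_sd([94.0, 95.0, 96.0])
```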
Funding: This research was funded by Prince Sultan University.
Acknowledgment: The authors appreciate Prince Sultan University, Saudi Arabia, for its endless support and for providing the funding to carry out this work.
References
1. A. Ghannem, G. El Boussaidi and M. Kessentini, "Model refactoring using interactive genetic algorithm," in International Symposium on Search Based Software Engineering, Springer, Berlin, Heidelberg, pp. 96-110, 2013.
2. T. Mens and T. Tourwé, "A survey of software refactoring," IEEE Transactions on Software Engineering, vol. 30, no. 2, pp. 126-139, 2004.
3. M. Fowler, K. Beck, J. Brant, W. Opdyke and D. Roberts, "Refactoring: Improving the Design of Existing Code," Addison-Wesley Professional, 1st ed., Berkeley, CA, USA, 1999.
4. L. Kumar, S. Satapathy and A. Krishna, "Application of SMOTE and LSSVM with various kernels for predicting refactoring at method level," in International Conference on Neural Information Processing, Springer, Cham, pp. 150-161, 2018.
5. S. Nyamawe, H. Liu, Z. Niu, W. Wang and N. Niu, "Recommending refactoring solutions based on traceability and code metrics," IEEE Access, vol. 6, pp. 49460-49475, 2018.
6. M. D'Ambros, M. Lanza and R. Robbes, "Evaluating defect prediction approaches: a benchmark and an extensive comparison," Empirical Software Engineering, vol. 17, no. 4, pp. 531-577, 2012.
7. P. Deep Singh and A. Chug, "Software defect prediction analysis using machine learning algorithms," in 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, pp. 775-781, 2017.
8. D. Silva, N. Tsantalis and M. T. Valente, "Why we refactor? Confessions of GitHub contributors," in Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), Seattle, WA, USA, pp. 858-870, 2016.
9. M. Aniche, E. Maziero, R. Durelli and V. Durelli, "The effectiveness of supervised machine learning algorithms in predicting software refactoring," IEEE Transactions on Software Engineering, Early Access, 2020.
10. M. Alenezi, M. Akour and O. Al Qasem, "Harnessing deep learning algorithms to predict software refactoring," Telkomnika, vol. 18, no. 6, pp. 2977-2982, 2020.
11. R. Marinescu, "Detection strategies: Metrics-based rules for detecting design flaws," in 20th IEEE International Conference on Software Maintenance Proceedings, pp. 350-359, 2004.
12. G. Bavota, S. Panichella, N. Tsantalis, M. D. Penta, R. Oliveto et al., "Recommending refactoring based on team co-maintenance patterns," in Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, pp. 337-342, 2014.
13. Y. Kataoka, T. Imai, H. Andou and T. Fukaya, "A quantitative evaluation of maintainability enhancement by refactoring," in International Conference on Software Maintenance Proceedings, pp. 576-585, 2002.
14. M. O'Keeffe and M. Ó Cinnéide, "Search-based refactoring for software maintenance," Journal of Systems and Software, vol. 81, no. 4, pp. 502-516, 2008.
15. O. Al Qasem, M. Akour and M. Alenezi, "The influence of deep learning algorithms factors in software fault prediction," IEEE Access, vol. 8, pp. 63945-63960, 2020.
16. M. Akour, I. Alsmadi and I. Alazzam, "Software fault proneness prediction: a comparative study between bagging, boosting, and stacking ensemble and base learner methods," International Journal of Data Analysis Techniques and Strategies, vol. 9, no. 1, pp. 1-16, 2017.
17. H. Alsghaier and M. Akour, "Software fault prediction using particle swarm algorithm with genetic algorithm and support vector machine classifier," Software: Practice and Experience, vol. 50, no. 4, pp. 407-427, 2020.
18. O. Al Qasem and M. Akour, "Software fault prediction using deep learning algorithms," International Journal of Open Source Software and Processes (IJOSSP), vol. 10, no. 4, pp. 1-19, 2019.
19. M. Akour and W. Melhem, "Software defect prediction using genetic programming and neural networks," International Journal of Open Source Software and Processes (IJOSSP), vol. 8, no. 4, pp. 32-51, 2017.
20. K. Liu, D. Kim, T. F. Bissyandé, T. Kim, K. Kim et al., "Learning to spot and refactor inconsistent method names," in Proceedings of the 41st International Conference on Software Engineering, pp. 1-12, 2019.
21. M. I. Azeem, F. Palomba, L. Shi and Q. Wang, "Machine learning techniques for code smell detection: A systematic literature review and meta-analysis," Information and Software Technology, vol. 108, pp. 115-138, 2019.
22. F. Simon, F. Steinbruckner and C. Lewerentz, "Metrics based refactoring," in Proceedings Fifth European Conference on Software Maintenance and Reengineering, pp. 30-38, 2001.
23. A. Baqais and M. Alshayeb, "Automatic software refactoring: a systematic literature review," Software Quality Journal, vol. 28, no. 2, pp. 459-502, 2020.
24. T. Mariani and S. R. Vergilio, "A systematic review on search-based refactoring," Information and Software Technology, vol. 83, pp. 14-34, 2017.
25. M. Mohan and D. Greer, "A survey of search-based refactoring for software maintenance," Journal of Software Engineering Research and Development, vol. 6, no. 1, pp. 3-55, 2018.
26. R. F. Woolson, "Wilcoxon signed-rank test," Wiley Encyclopedia of Clinical Trials, 2007. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/9780471462422.eoct979.
27. A. Baqais and M. Alshayeb, "Automatic software refactoring: a systematic literature review," Software Quality Journal, vol. 28, no. 2, pp. 459-502, 2020.
28. M. Rosli, N. H. I. Teo, N. S. M. Yusop and N. S. Mohamad, "Fault prediction model for web application using genetic algorithm," in International Conference on Computer and Software Modeling (IPCSIT), vol. 14, pp. 71-77, 2011.
29. G. C. Chen and J. S. Yu, "Particle swarm optimization algorithm," Information and Control-Shenyang, vol. 34, no. 3, pp. 129-318, 2005.
30. A. Ebrahimi and E. Khamehchi, "Sperm whale algorithm: an effective metaheuristic algorithm for production optimization problems," Journal of Natural Gas Science and Engineering, vol. 29, pp. 211-222, 2016.
31. N. Rana, M. Latiff, S. Abdulhamid and H. Chiroma, "Whale optimization algorithm: a systematic review of contemporary applications, modifications and developments," Neural Computing and Applications, vol. 32, no. 20, pp. 16245-16277, 2020. doi: 10.1007/s00521-020-04849-z.
32. L. Kumar and A. Sureka, "Application of LSSVM and SMOTE on seven open source projects for predicting refactoring at class level," in 2017 24th Asia-Pacific Software Engineering Conference (APSEC), 2017. doi: 10.1109/apsec.2017.15.
33. Analytics India Magazine, "7 Types of Classification Algorithms," 19 January 2018. [Online]. Available: https://analyticsindiamag.com/7-types-classification-algorithms/.
34. E. Erturk and E. Sezer, "A comparison of some soft computing methods for software fault prediction," Expert Systems with Applications, vol. 42, no. 4, pp. 1872-1879, 2015. doi: 10.1016/j.eswa.2014.10.025.
35. J. Caldeira, F. Brito e Abreu, J. Cardoso and J. dos Reis, "Unveiling process insights from refactoring practices," Computer Standards & Interfaces, vol. 81, p. 103587, 2022. doi: 10.1016/j.csi.2021.103587.
36. M. Shepperd and G. Kadoda, "Comparing software prediction techniques using simulation," IEEE Transactions on Software Engineering, vol. 27, no. 11, 2001.
37. N. Tsantalis and A. Chatzigeorgiou, "Identification of refactoring opportunities introducing polymorphism," Journal of Systems and Software, vol. 83, no. 3, 2010.