You are on page 1of 7

IJIRST –International Journal for Innovative Research in Science & Technology| Volume 4 | Issue 1 | June 2017

ISSN (online): 2349-6010

Effective Bug Triage using Software Data

Reduction Techniques
Rinku A. Chaudhari Prof. Sarika V. Bodake
ME Student Assistant Professor
Department of Computer Engineering Department of Computer Engineering
P.V.P.I.T Bavdhan, Pune, Maharashtra, India P.V.P.I.T Bavdhan, Pune, Maharashtra, India

Bug triage is a very much important step during bug fixing. Bug triage is the way toward fixing bug whose primary target is to
accurately apportion a designer to another bug additionally taking care of. Many software organizations spend their too much cost
in managing these bugs. To reduce the time cost in manual work and to improve the working of programmed bug triage, two
procedures are connected in particular content characterization and double arrangement. In writing different papers address the
issue of information diminishment for bug triage, i.e., how to lessen the scale and enhance the nature of bug information. By joining
the example determination and the component choice calculations to at the same time reduce the information scale and upgrade
the correctness of the bug reports in the bug triage. According to writing, need to build up a powerful model for doing information
lessening on bug information set which will decrease the size of the information and increment the nature of the information, by
minimizing the time and cost to get accurate result. System produces bug report and assigns that bug to appropriate developer
according to his domain.
Keywords: Bug Triage, Data Reduction, Instance Selection, Feature Selection, Data Mining


Various product organizations spend a large portion of the cash in settling the bugs. Vast programming ventures have bug
storehouse that gathers together all the data identified with bugs. In bug store, every product bug has a bug report. The bug report
comprises of literary data with respect to the bug and overhauls identified with status of bug settling [1]. Once a bug report is
shaped, a human triager doles out this bug to a designer, who will attempt to settle this bug. This engineer is recorded in a thing
allocated to the appointed to will change to another designer if the already relegated engineer can't settle this bug [2]. The way
toward allotting a right developer for settling the bug is called bug triage. Bug triage is a standout amongst the most tedious stride
in treatment of bugs in programming ventures for solving it [3]. Manual bug triage by a human triager is tedious and blunder
inclined since the quantity of day by day bugs is substantial and absence of learning in engineers about all bugs. Due to every one
of these things, bug triage brings 5678 about costly time misfortune, high cost and low precision. The data put away in bug reports
has two main primary difficulties [4]. Firstly the extensive scale information and also low nature of information. Because of vast
number of daily reported bugs, the quantity of bug reports is scaling up in the store [5]. Uproarious and repetitive bugs are debasing
the nature of bug reports. In this paper a powerful bug triage framework is proposed which will lessen the bug information to spare
the 745-63.


Present a system for discovering flaws in PHP Web applications that depends on joined concrete and typical execution [1]. The
work is novel in a few regards. To start with, the strategy distinguishes runtime mistakes as well as utilizations a HTML validator
as a prophet to decide circumstances where deformed HTML is made. Second, address various PHP-particular issues, for example,
the recreation of intelligent client input that happens when UI components on created HTML pages are enacted, bringing about the
execution of extra PHP scripts. Third, we play out a robotized examination to minimize the span of disappointment actuating
Audits the issues with utilizing basic K-Means as a part of the order of datasets [2]. The viability of Quad Tree based EM
grouping calculation in foreseeing deficiencies while characterizing a dataset, when contrasted with other existing calculations, for
example, K-Means has been assessed. The Quad Tree approach appoints proper introductory group focuses and wipes out the
exceptions. K-Means is thought to be one of the most straightforward strategies to group information. Nonetheless, the proposed
EM calculation is utilized to group information adequately. Consolidating the Quad Tree approach and the EM calculation gives a
bunching strategy that not just fits the information better in the groups additionally tries to make them minimal and more significant.
Utilizing EM alongside Quad Tree makes the grouping procedure quicker. With K-implies, merging is not ensured but rather EM
ensures rich union. It conducts a contextual analysis on high effect bugs, which characterized bugs answered to four open source
ventures into six sorts of high effect bugs [3].

All rights reserved by 214

Effective Bug Triage using Software Data Reduction Techniques
(IJIRST/ Volume 4 / Issue 1/ 036)

For the situation study, one hundred bug reports were physically reviewed for every venture and are arranged into six sorts of
high effect bugs in light of past reviews which concentrate on high effect bugs. Our contextual investigation expected to uncover
dispersions of high effect bugs in reported bugs and covered connections among high effect bugs. It has actualized cost-triage
calculation which abuses the cost and exactness of bug settling or bug forecast [4]. This paper has likewise actualized the element
determination procedure which lessens the quantity of components utilized by a machine learning classifier for bug forecast. It is
investigated the utilization of highlight determination strategies to foresee programming bugs. A critical sober minded outcome is
that component determination can be performed in augmentations of half of every single outstanding element, permitting it to
continue rapidly. Somewhere around 3.12 and 25 percent of the aggregate list of capabilities yielded ideal characterization comes
about. The lessened list of capabilities allows better and speedier bug expectations. The procedure is quick performing and can be
connected to anticipating bugs on different activities [5]. The most imperative outcomes originating from utilizing the element
determination process are found in Table, which presents F-measure improved outcomes for the Naive Bayes classifier. A helpful
even minded outcome is that component determination can be performed in augmentations of half of every single residual element,
permitting it to continue rapidly. The normal carriage is accuracy is very high at 0.97, with a sensible review of 0.70. This output
beats comparative classifier-based bug forecast strategies and the outcomes prepare for reasonable reception of classifier-based
bug expectation. From the perspective of an engineer accepting bug expectations on their work, these figures imply that if the
classifier says a code change has a bug, it is quite often cost of engineers. It additionally expects to manufacture a
brilliant arrangement of bug information by expelling the excess and non-useful bug reports [7].
Drawback of Existing System
 Bugs are manually triaged by an expert developer.
 Manual bug triage is expensive in time cost and low in accuracy.
 Lack of data reduction technique.
 Developers needs to study whole bug repository to solve bug.


Manual Bug fixing is time consuming task and didn’t get accurate result. So that proposed system is provided. There is problem
of getting accurate bug solution according to domain. In existing approach, get reduced bug dataset and high quality bug dataset.
For that point, proposed system is provided. We used existing system instance selection and feature selection for reducing bug
dataset. And additionally use Top-K pruning algorithm for improving results of data reduction quality as compared to existing
system and get domain wise bug solution.

Fig. 1: Proposed System Architecture

Bug triage is one of the most time consuming step in handling of bugs in software projects. Manual bug triage by a human
triager is time consuming and error-prone since the number of daily bugs is large and lack of knowledge in developers about all
bugs. Because of all these things, bug triage results in expensive time loss, high cost and low accuracy [1]. The information stored
in bug reports has two main challenges. Firstly the large scale data and secondly low quality of data. Due to large number of daily
reported bugs, the number of bug reports are added up in the repository. Noisy and redundant bugs are degrading the quality of
bug reports. The effective bug triage system is proposed which will reduce the bug data to save the labour cost of developers. It
also aims to build a high quality set of bug data by removing the redundant and non-informative bug reports .The proposed system
consist of following modules:

All rights reserved by 215

Effective Bug Triage using Software Data Reduction Techniques
(IJIRST/ Volume 4 / Issue 1/ 036)

Instance Selection
Instance selection and feature selection are widely used techniques in data processing. For a given data set in a certain application,
instance selection is to obtain a subset of relevant instances (i.e., bug reports in bug data) while feature selection aims to obtain a
subset of relevant features (i.e., words in bug data). In our work, we employ the combination of instance selection and feature
Data Reduction
In our work, to save the labor cost of developers, the data reduction for bug triage has two goals.
 Reducing the data scale.
 Improving the accuracy of bug triage.
In contrast to modelling the textual content of bug reports in existing work, we aim to augment the data set to build a pre-
processing approach, which can be applied before an existing bug triage approach. We explain the two goals of data reduction as
Create Bug Report
According to error it will create the bug report.
Assign Bug to Developer
Assign bug report to appropriate developer.


Reducing Bug Data for Bug TRIAG

Bug data reduction in our work, which is applied as a phase in data preparation of bug triage. We combine existing techniques of
instance selection and feature selection to remove certain bug reports and words, i.e., in Fig. 2. A problem for reducing the bug data is to
determine the order of applying instance selection and feature selection, which is denoted as the prediction of reduction orders.

Fig. 2: Illustration of reducing bug data for bug triage

All rights reserved by 216

Effective Bug Triage using Software Data Reduction Techniques
(IJIRST/ Volume 4 / Issue 1/ 036)

1) Bug Summery is generalized in pdf file. Pdf is very nice contribution for this project. Pdf contains details as follows.
 Bug Summary: We have use reduction algorithm here to reduce the result and show that in pdf.
 Bug Deadline: It contains date by which the bug should be solved.
 Bug Description: Here the details of bug are reported. This is nothing but the bug reported by the manager.
 Suggestions: Developer who has already worked on some or have some knowledge have a bug can be solved then he or she
can give suggestion to the user.
2) Graphical representation for bug assigning and completion by developer.

Fig. 3: Project work flow

Algorithm Used
Data reduction based on FS → IS
 FS- Feature Selection
 IS- Instance Selection
 Input: training set T with n words and m bug reports, reduction order FS→ IS
 Final number nF of words,
 Final number m1 of bug reports,
 Output: reduced data set TFT for bug triage
 Apply FS to n words of T and calculate objective values for all the words;
 Select the top nF words of T and generate a training set TF;
 Apply IS to m1 bug reports of TF;
 Terminate IS when the number of bug reports is equal to or less than ml and generate the final training set TFI.
In Algorithm 1, we briefly describe how to reduce the bug Data based on FS ! IS. Given a bug data set, the output of Bug data
reduction is a emerging new trend and reduced data set. Two algorithms FS and IS are applied sequentially. Note that in Step2,
some of bug reports may be blank during feature Xuan et al.: towards effective bug triage with software data all the words in a bug
report are removed. Such Blank bug reports are also removed in the feature selection. In our work, FS ! IS and IS ! FS are viewed
as two Orders of bug data reduction. To avoid the bias from a single Algorithm, we examine results of four typical algorithms of
Instance selection and feature selection.
Naive Bayes
Naive Bayes model is easy to build and particularly useful for very large data sets. Along with simplicity, Naive Bayes is also
known to outperform even highly sophisticated classification methods. Bayes theorem provides a way of calculating posterior
probability P(c|x) from P(c), P(x) and P(x|c). Look at the equation below:

All rights reserved by 217

Effective Bug Triage using Software Data Reduction Techniques
(IJIRST/ Volume 4 / Issue 1/ 036)

 P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
 P(c) is the prior probability of class.
 P(x|c) is the likelihood which is the probability of predictor given class.
 P(x) is the prior probability of predictor.
1) Step 1: Convert the data set into a frequency table
2) Step 2: Create Likelihood table by finding the probabilities
3) Step 3: Now, use Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest
posterior probability is the outcome of prediction.
Mathematical Model
Let, S = {I, P, IO, O}
Where, S = Set of bug report (text file)
I = Set of inputs
P = Set of processes
IO = Intermediate output
O = Set of outputs/ final outputs
1) I = {i}
Where, i = Folder containing bug file in text format
2) P = {p1, p2, p3, p4}
Where, p1 = Load data set,
p2=Vector quantization and applying LVQ algorithm (instance selection),
p3 = Feature selection,
p4 = Result in the cluster file (new report).
3) IO = {io1, io2, io3}
io1 = Check the file format,
io2 = Data reduction,
io3 = Checking the path for output file.
4) O= {o} Where, o=cluster of document (new bug report).


Following table shows result comes from the system performance, it shows that the required to assign bug to appropriate developer
with respect to file size of bug.
Table - 1
Time required to triage bug
Sr. No Time(Sec) Image Size(KB)
1 100 7
2 160 8
3 200 11
4 310 12
5 400 13

Fig. 4: Developers / Solved Bugs

All rights reserved by 218

Effective Bug Triage using Software Data Reduction Techniques
(IJIRST/ Volume 4 / Issue 1/ 036)

Fig. 5: Result graph

As shown in figure 5 the graph represents time require to assign bug to developer. If bug size is increasing rapidly then the time
is also increased.
Outcome of System
Getting result from this dissertation, it’s just an actual result. It’s part of knowledge about Developer and Manager how they can
work. The actual output of modules is as follows:
 Developer can register the details and self-login into the system.
 Developers views the assign bug task and solves it and update the status.
 Manager login to the system and upload the bug data.
 After uploading the bug data system do the data reduction and create appropriate report.
 Manager can view the developer’s task status.


Bug triage is an expensive walk of programming upkeep in both work cost and time cost. In this paper, we combine highlight
assurance with case decision to diminish the span of bug data sets and also improve the data quality. To choose the demand of
applying event decision and highlight decision for another bug data set, we remove properties of each bug data set and set up a
perceptive model in perspective of bona fide data sets. We tentatively investigate the data diminishment for bug triage in bug
storage facilities of two immense open source endeavours, to be particular Eclipse and Mozilla. Our work gives an approach to
manage using procedures on data get ready to outline diminished and first rate bug data in programming headway and support. We
have included many other, modules which are helpful in many ways. Those are Bug summery that will summarize the bug
generated report in the form of pdf file. It will contain bug summery, bug deadline, bug description and suggestions by develop if
any. Graphical representation for bug assigning and completion by developer is also provided. This makes the analysis easier for
the higher authority to decide the developer to assign further bug to repair.
In future work, we expect improving the eventual outcomes of data diminishment in bug triage to explore how to set up a first
rate bug data set and handle a space specific programming task. For predicting reducing orders, we plan to pay attempts to find the
potential relationship between the characteristics of bug data sets and the diminishment orders. In Future System used in as:
 For IT industry to manage bug solution process task.
 Used automatic bug triage in industry.

[1] Shay Artzi, Adam Kie zun, Julian Dolby, Frank Tip, Danny Dig, Amit Paradkar, Senior Member, IEEE, and Michael D. Ernst,” Finding Bugs in Web
Applications Using Dynamic Test Generation and Explicit-State Model Checking” 2010.
[2] Partha Sarathi Bishnu and Bandana Bhattacherjee, Member, IEEE,” Software Fault Prediction Using Quad Tree-Based K-Means Clustering Algorithm”,
[3] Yutaro Kashiwa Hayato Yoshiyuki “A Pilot Study of Diversity in High Impact Bugs” 2014.
[4] Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella “Reducing Features to Improve Bug Prediction.” 2009.
[5] Ram Akella, Sunghun Kim” Reducing Features to Improve Code Change-Based Bug Prediction” 2013.
[6] K. Balog, L. Azzopardi, and M. de Rijke, “Formal models for expert finding in enterprise corpora,” in Proc. 29th Annu. Int. ACM SIGIR Conf. Res. Develop.
Inform. Retrieval, Aug. 2006, pp. 43–50.

All rights reserved by 219

Effective Bug Triage using Software Data Reduction Techniques
(IJIRST/ Volume 4 / Issue 1/ 036)

[7] P. S. Bishnu and V. Bhattacherjee, “Software fault prediction using quad tree-based k-means clustering algorithm,” IEEE Trans. Knowl. Data Eng., vol. 24,
no. 6, pp. 1146–1150, Jun. 2012.
[8] H. Brighton and C. Mellish, “Advances in instance selection for instance-based learning algorithms,” Data Mining Knowl. Discovery, vol. 6, no. 2, pp. 153–
172, Apr. 2002.
[9] S. Breu, R. Premraj, J. Sillito, and T. Zimmermann, “Information needs in bug reports: Improving cooperation between developers and users,” in Proc. ACM
Conf. Comput. Supported Cooperative Work, Feb. 2010, pp. 301–310.
[10] V. Bol_on-Canedo, N. S anchez-Maro~no, and A. Alonso-Betanzos, “A review of feature selection methods on synthetic data,” Knowl. Inform. Syst., vol.
34, no. 3, pp. 483–519, 2013.

All rights reserved by 220