domains where there is a lack of the knowledge needed to develop effective algorithms, or domains where programs must dynamically adapt to changing conditions [105]. The following list of publications and websites offers a good starting point for the interested reader to become acquainted with the state-of-the-practice in ML applications [2, 3, 9, 15, 32, 37-39, 87, 99, 100, 103, 105-107, 117-119, 127, 137].
ML is not a panacea for all SE problems. To better use ML methods as tools for solving real-world SE problems, we need a clear understanding of both the problems and the tools and methodologies utilized. It is imperative that we know (1) the available ML methods at our disposal, (2) the characteristics of those methods, (3) the circumstances under which the methods can be most effectively applied, and (4) their theoretical underpinnings.
Since many SE development or maintenance tasks rely on some function (or functions,
mappings, or models) to predict, estimate, classify, diagnose, discover, acquire, understand,
generate, or transform certain qualitative or quantitative aspects of a software artifact or a
software process, application of ML to SE boils down to how to find, through the learning
process, such a target function (or functions, mappings, or models) that can be utilized to carry
out the SE tasks. Learning involves a number of components: (1) How is the unknown (target or
true) function represented and specified? (2) Where can the function be found (the search space)?
(3) How can we find the function (heuristics in search, learning algorithms)? (4) Is there any
prior knowledge (background knowledge, domain theory) available for the learning process? (5)
What properties do the training data have? And (6) What are the theoretical underpinnings and
practical issues in the learning process?
1.2.1. Target functions
Depending on the learning methods utilized, a target function can be represented in different
hypothesis language formalisms (e.g., decision trees, conjunction of attribute constraints, bit
strings, or rules). When a target function is not explicitly defined, but the learner can generate its values for given input queries (as is the case in instance-based learning), the function is said to be implicitly defined. A learned target function may be easy for the human expert to
understand and interpret (e.g., first order rules), or it may be hard or impossible for people to
comprehend (e.g., weights for a neural network).
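To make the contrast concrete, the following sketch (not from the book) trains an explicitly represented, human-readable target function alongside an implicitly defined one. It assumes scikit-learn is available, and the "module metrics" data are hypothetical.

# Sketch, not from the book: explicit vs. implicit target functions.
# Assumes scikit-learn; the module-metric data below are made up.
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: (module size, number of changes) -> fault-prone (1) or not (0)
X = [[10, 2], [15, 3], [40, 9], [55, 12], [12, 1], [60, 15]]
y = [0, 0, 1, 1, 0, 1]

# Explicit, interpretable representation: the learned tree can be printed as rules
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["size", "changes"]))

# Implicit definition: instance-based (lazy) learning keeps the examples and only
# produces a value when queried; no global formula is ever written down
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[45, 10]]))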
Based on its output, a target function can be utilized for SE tasks that fall into the categories of
binary classification, multi-value classification and regression. When learning a target function
from a given set of training data, its generalization can be either eager (at learning stage) or lazy
(at classification stage). Eager learning may produce a single target function from the entire
training data, while lazy learning may adopt a different (implicit) function for each query.
Evaluating a target function hinges on many considerations: predictive accuracy, interpretability,
statistical significance, information content, and the tradeoff between its complexity and degree of
fit to data. Quinlan in [119] states:
Learned models should not blindly maximize accuracy on the training data, but
should balance resubstitution accuracy against generality, simplicity,
interpretability, and search parsimony.
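As a rough illustration of Quinlan's point (a sketch with synthetic data, assuming scikit-learn; it is not taken from [119]), an unconstrained decision tree can achieve near-perfect resubstitution accuracy yet generalize no better, and often worse, than a deliberately simpler one:

# Sketch, not from [119]: resubstitution accuracy vs. generality.
# Assumes scikit-learn; the data are synthetic and purely illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # unconstrained vs. deliberately simple hypothesis
    h = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print("max_depth =", depth,
          "resubstitution accuracy:", h.score(X_tr, y_tr),
          "held-out accuracy:", h.score(X_te, y_te))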
[Figure: summary of considerations for the hypothesis space H: structures (e.g., lattice, no structure), properties (realizable, f ∈ H, vs. unrealizable, f ∉ H), domain theory, constraints, and the tradeoff with the computational complexity of finding a simple and consistent h.]
1.2.3. Search and bias
How can we find a simple and consistent h ∈ H that approximates f? This question essentially boils down to a search problem. Heuristics (inductive or declarative bias) will play a pivotal role
in the search process. Depending on how examples in D are defined, learning can be supervised
or unsupervised. Different learning methods may adopt different types of bias, different search
strategies, and different guiding factors in search. For an f, its approximation can be obtained
either locally with regard to a subset of examples in D, or globally with regard to all examples in
D. Learning can result in either knowledge augmentation or knowledge (re)compilation.
Depending on the interaction between a learner and its environment, there are query learning and
reinforcement learning.
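The supervised/unsupervised distinction can be shown in a few lines. The sketch below is not from the book and assumes scikit-learn with illustrative data: the same feature matrix is used once with labels (a supervised classifier learns the mapping to them) and once without (a clustering algorithm must find structure on its own).

# Sketch, not from the book: supervised vs. unsupervised learning.
# Assumes scikit-learn; the data are illustrative only.
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X = [[1.0, 0.2], [0.9, 0.1], [0.2, 0.9], [0.1, 1.0]]
y = [0, 0, 1, 1]                             # labels available -> supervised

clf = LogisticRegression().fit(X, y)         # learns the mapping X -> y
km = KMeans(n_clusters=2, n_init=10).fit(X)  # no labels -> finds structure in X
print(clf.predict([[0.15, 0.95]]), km.labels_)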
There are stable and unstable learning algorithms, depending on their sensitivity to changes in the training data [37]. For unstable algorithms (e.g., decision trees, neural networks, or rule learning), small changes in the training data can result in significantly different output functions. Stable algorithms (e.g., linear regression and nearest neighbor), on the other hand, are largely insensitive to small changes in the data [37].
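The following sketch (not from [37]; it assumes scikit-learn and synthetic data) makes the notion of stability operational: after a handful of training labels are flipped, a decision tree typically changes more of its predictions than a nearest-neighbor learner does.

# Sketch, not from [37]: stable vs. unstable learners under a small data change.
# Assumes scikit-learn; the data are synthetic.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)
y_perturbed = y.copy()
y_perturbed[:5] = 1 - y_perturbed[:5]        # small change: flip five labels
X_query = rng.normal(size=(500, 4))

learners = {"decision tree (unstable)": lambda: DecisionTreeClassifier(random_state=0),
            "nearest neighbor (stable)": lambda: KNeighborsClassifier(n_neighbors=5)}
for name, make in learners.items():
    before = make().fit(X, y).predict(X_query)
    after = make().fit(X, y_perturbed).predict(X_query)
    print(name, "fraction of predictions that changed:", np.mean(before != after))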
Instead of using just a single learned hypothesis for classification of unseen cases, an ensemble
of hypotheses can be deployed whose individual decisions are combined to accomplish the task
of new case classification. There are two major issues in ensemble learning: how to construct
ensembles, and how to combine individual decisions of hypotheses in an ensemble [37].
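A minimal sketch of both issues, assuming scikit-learn (not from [37]): bagging constructs the ensemble by fitting the default decision-tree base learner on bootstrap resamples of the training data, and combines the members' decisions by voting.

# Sketch, not from [37]: constructing an ensemble (bagging) and combining
# its members' decisions (voting). Assumes scikit-learn; toy data.
from sklearn.ensemble import BaggingClassifier

X = [[0, 0], [1, 1], [1, 0], [0, 1], [2, 2], [2, 1], [0, 2], [1, 2]]
y = [0, 1, 1, 0, 1, 1, 0, 0]

# Construction: 25 decision trees (the default base learner), each fit on a
# bootstrap sample of the training data.
ensemble = BaggingClassifier(n_estimators=25, random_state=0).fit(X, y)

# Combination: the ensemble's prediction is a vote over its 25 members.
print(ensemble.predict([[1.5, 1.5]]))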
[Figure: summary of search considerations: interaction with the environment (query learning, reinforcement learning), supervised vs. unsupervised learning, unstable vs. stable algorithms, ensemble issues (how to construct ensembles, how to combine classifiers), and local (subset of the training data) vs. global (all training data) approximation.]
Another issue during the search process is the need for interaction with an oracle. If a learner
needs an oracle to ascertain the validity of the target function generalization, it is interactive;
otherwise, the search is non-interactive [89]. The search is flexible if it can start either from
scratch or from an initial hypothesis.
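A pool-based active-learning loop is one familiar instance of such interactive search. The sketch below is not from [89]; it assumes scikit-learn, and the oracle here is a hypothetical stand-in for a human or other source of ground truth that the learner queries about its most uncertain example.

# Sketch, not from [89]: interactive search -- the learner repeatedly asks an
# oracle to label the pool example it is least certain about.
# Assumes scikit-learn; the oracle and the pool are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pool = rng.normal(size=(100, 2))
oracle = lambda x: int(x[0] + x[1] > 0)          # stand-in for ground truth

X, y = [[-1.0, -1.0], [1.0, 1.0]], [0, 1]        # tiny labelled seed set
queried = set()
for _ in range(10):
    h = LogisticRegression().fit(X, y)
    uncertainty = np.abs(h.predict_proba(pool)[:, 1] - 0.5)
    i = next(int(j) for j in np.argsort(uncertainty) if j not in queried)
    queried.add(i)
    X.append(list(pool[i]))                      # ask the oracle, then refit
    y.append(oracle(pool[i]))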
1.2.4. Prior knowledge
Prior (or background) knowledge about the problem domain where an f is to be learned plays a
key role in many learning methods. Prior knowledge can be represented differently. It helps
learning by eliminating otherwise consistent h and by filling in the explanation of examples,
which results in faster learning from fewer examples. It also helps define different learning
techniques based on its logical roles and identify relevant attributes, thus yielding a reduced H
and speeding up the learning process [124].
There are two issues here. First, for some problem domains, the prior knowledge may be sketchy, inaccurate, or unavailable. Second, not all learning approaches are able to accommodate such prior knowledge or domain theories. A common drawback of some general learning algorithms, such as decision trees or neural networks, is that it is difficult to incorporate prior knowledge from the problem domain into the learning algorithm [37]. A major motivation and advantage of stochastic learning (e.g., naive Bayesian learning) and inductive logic programming is their ability to utilize background knowledge from the problem domain in the learning algorithm.
For those learning methods for which prior knowledge or domain theory is indispensable, one
issue to keep in mind is that the quality of the knowledge (correctness, completeness) will have a
direct impact on the outcome of the learning.
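As a very small illustration (a sketch assuming scikit-learn; the priors and data are hypothetical and not from the book), naive Bayesian learning can take class priors directly from a domain theory instead of estimating them from a possibly unrepresentative training sample, and the quality of that supplied knowledge directly shifts the resulting posteriors.

# Sketch, not from the book: prior knowledge entering a stochastic learner as
# class priors for naive Bayes. Assumes scikit-learn; priors and data are hypothetical.
from sklearn.naive_bayes import GaussianNB

X = [[2.0], [2.5], [3.0], [7.0], [7.5], [8.0]]
y = [0, 0, 0, 1, 1, 1]                 # the training sample happens to be balanced
domain_prior = [0.9, 0.1]              # but domain theory says class 1 is rare

nb_data_only = GaussianNB().fit(X, y)              # priors estimated from the data
nb_with_prior = GaussianNB(priors=domain_prior).fit(X, y)

print(nb_data_only.predict_proba([[5.0]]))
print(nb_with_prior.predict_proba([[5.0]]))        # the prior shifts the posterior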
[Figure: summary of prior knowledge considerations: representation (e.g., probabilities), quality, properties, roles (expedite learning from fewer data, define different learning methods, identify relevant attributes), and accommodation (easy vs. hard to accommodate).]
1.2.5. Training data
Training data gathered for the learning process can vary in terms of (1) the number of examples,
(2) the number of features (attributes), and (3) the number of output classes. Data can be noisy or
accurate in terms of random errors, can be redundant, can be of different types, and can have different valuations. The quality and quantity of training data have a direct impact on the learning process, as different learning methods have different criteria regarding training data, with some methods requiring a large amount of data, others being very sensitive to the quality of the data, and still others needing both training data and a domain theory. Training data may be used just once, or multiple times, by the learner.
Scaling up is another issue. Real-world problems can have millions of training cases, thousands of features, and hundreds of classes [37]. Not all learning algorithms are known to scale up well along those three dimensions.
When a target function is not easily learned from the data in the input space, it may be necessary to transform the data into a possibly high-dimensional feature space F and learn the target function in F. Feature selection in F then becomes an important issue, as both the computational cost and the generalization performance of the target function can degrade as the number of features grows [32].
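The sketch below (not from [32]; it assumes scikit-learn and a toy XOR-like data set) shows both steps: a linear learner fails in the original input space, succeeds after a simple polynomial map into F, and a univariate feature-selection step keeps only a few of the features of F.

# Sketch, not from [32]: learning in a transformed feature space F, with
# feature selection in F. Assumes scikit-learn; XOR-like toy data.
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 10
y = [0, 1, 1, 0] * 10                          # not linearly separable in the input space

raw = LogisticRegression().fit(X, y)           # learns in the input space
lifted = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # map into F (adds x1*x2, x1^2, ...)
    SelectKBest(f_classif, k=3),               # keep only a few features of F
    LogisticRegression(),
).fit(X, y)

print("input space accuracy:", raw.score(X, y))
print("feature space accuracy:", lifted.score(X, y))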
Finally, based on the way in which training data are generated and provided to a learner, there
are batch learning (all data are available at the outset of learning) and on-line learning (data are
available to the learner one example at a time).
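A small sketch of the difference, assuming scikit-learn (not from the book; the "stream" is just an in-memory array iterated one example at a time): the batch learner is fit once on all the data, while the on-line learner is updated incrementally with partial_fit as each example arrives.

# Sketch, not from the book: batch vs. on-line learning. Assumes scikit-learn;
# the synthetic stream is simply iterated one example at a time.
import numpy as np
from sklearn.linear_model import LogisticRegression, SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(int)

batch = LogisticRegression().fit(X, y)      # all data available at the outset

online = SGDClassifier(random_state=0)
for xi, yi in zip(X, y):                    # examples arrive one at a time
    online.partial_fit([xi], [yi], classes=[0, 1])

print("batch accuracy:", batch.score(X, y))
print("on-line accuracy:", online.score(X, y))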
[Figure: summary of training data considerations: sequences, data used once vs. multiple times, batch vs. on-line learning, and large data sets.]
1.2.6. Theoretical underpinnings and practical considerations
Learning methods rest on different justifications: statistical, probabilistic, or logical. What are the frameworks for analyzing learning algorithms? How can we evaluate the performance of a learned function and determine whether the learning process converges? What types of practical problems do we have to come to grips with? These questions must be answered if we are to succeed in real-world SE applications.