
> How do we determine which learning method is appropriate for what type of software development or maintenance task?
> Which learning methods can be used to make headway in what aspect of the essential difficulties in software development?
> When we attempt to use some learning method to help with an SE task, what are the general guidelines and how can we avoid some pitfalls?
> What is the state-of-the-practice in ML&SE?
> Where is further effort needed to produce fruitful results?

Figure 1. Scope of this book.

1.2. Overview of Machine Learning


The field of ML includes supervised learning, unsupervised learning, and reinforcement
learning. Supervised learning deals with learning a target function from training examples of its
inputs and outputs. Unsupervised learning attempts to learn patterns in the input for which no
output values are available. Reinforcement learning is concerned with learning a control policy
through reinforcement from an environment. ML algorithms have been utilized in many different
problem domains. Some typical applications are: data mining problems where large databases
contain valuable implicit regularities that can be discovered automatically, poorly understood
domains where there is a lack of knowledge needed to develop effective algorithms, or domains
where programs must dynamically adapt to changing conditions [105]. The following list of
publications and web sites offers a good starting point for the interested reader to be acquainted
with the state-of-the-practice in ML applications [2, 3, 9, 15, 32, 37-39, 87, 99, 100, 103, 105-
107,117-119,127,137].
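The three paradigms can be contrasted on toy data. The sketch below uses invented numbers and hypothetical function names (`supervised_fit`, `unsupervised_split`); reinforcement learning is omitted for brevity:

```python
# A toy contrast of supervised vs. unsupervised learning (all numbers
# are hypothetical; reinforcement learning is omitted for brevity).

def supervised_fit(examples):
    """Fit f(x) = w * x from (input, output) pairs by least squares."""
    sxx = sum(x * x for x, _ in examples)
    sxy = sum(x * y for x, y in examples)
    return sxy / sxx  # the learned slope w

def unsupervised_split(inputs):
    """No output values: find structure by splitting around the mean."""
    mean = sum(inputs) / len(inputs)
    return ([x for x in inputs if x < mean],
            [x for x in inputs if x >= mean])

w = supervised_fit([(1, 2), (2, 4), (3, 6)])   # learns w = 2.0
groups = unsupervised_split([1, 2, 10, 11])    # ([1, 2], [10, 11])
```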
ML is not a panacea for all SE problems. To better use ML methods as tools to solve real-
world SE problems, we need a clear understanding of both the problems and the tools
and methodologies utilized. It is imperative that we know (1) the available ML methods at our
disposal, (2) characteristics of those methods, (3) circumstances under which the methods can be
most effectively applied, and (4) their theoretical underpinnings.
Since many SE development or maintenance tasks rely on some function (or functions,
mappings, or models) to predict, estimate, classify, diagnose, discover, acquire, understand,

Zhang, Du, and Jeffrey J.P Tsai. <i>Machine Learning Applications in Software Engineering</i>, World Scientific Publishing Co Pte Ltd, 2005. ProQuest Ebook Central,
http://ebookcentral.proquest.com/lib/upc-ebooks/detail.action?docID=259264.
Created from upc-ebooks on 2019-08-17 17:47:40.
generate, or transform certain qualitative or quantitative aspect of a software artifact or a
software process, application of ML to SE boils down to how to find, through the learning
process, such a target function (or functions, mappings, or models) that can be utilized to carry
out the SE tasks. Learning involves a number of components: (1) How is the unknown (target or
true) function represented and specified? (2) Where can the function be found (the search space)?
(3) How can we find the function (heuristics in search, learning algorithms)? (4) Is there any
prior knowledge (background knowledge, domain theory) available for the learning process? (5)
What properties do the training data have? And (6) What are the theoretical underpinnings and
practical issues in the learning process?
1.2.1. Target functions
Depending on the learning methods utilized, a target function can be represented in different
hypothesis language formalisms (e.g., decision trees, conjunction of attribute constraints, bit
strings, or rules). When a target function is not explicitly defined but the learner can generate its
values for given input queries (as is the case in instance-based learning), the function is
said to be implicitly defined. A learned target function may be easy for the human expert to
understand and interpret (e.g., first order rules), or it may be hard or impossible for people to
comprehend (e.g., weights for a neural network).
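The explicit/implicit distinction can be made concrete in a few lines; the data, the threshold, and the function names below are all hypothetical:

```python
# Hypothetical training data mapping a size measure to a label.
train = [(1.0, 'small'), (2.0, 'small'), (8.0, 'large'), (9.0, 'large')]

# Explicit target function: the learner commits to a stated, human-readable
# hypothesis (here a single threshold rule, assumed learned from the data).
threshold = 5.0
def explicit_f(x):
    return 'small' if x < threshold else 'large'

# Implicit target function (instance-based): no stated formula; a value is
# generated per query from the stored examples (1-nearest-neighbor).
def implicit_f(x):
    return min(train, key=lambda ex: abs(ex[0] - x))[1]
```

Both return `'small'` for a query of 3.0, but only the threshold rule can be read and interpreted directly by a human expert.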

[Figure 2, a diagram not reproduced here, maps the issues in target functions along several dimensions: representation formalism (explicit forms such as attribute constraints, bit strings/ANN weights, decision trees, Bayesian networks, linear functions, propositions, and Horn clauses, vs. implicit forms), interpretability (easy vs. hard to understand) and length, evaluation properties (predictive accuracy, statistical significance, information content, tradeoff between complexity and degree of fit to data), generalization (eager vs. lazy), and output (binary classification, multi-value classification, regression).]

Figure 2. Issues in target functions.

Based on its output, a target function can be utilized for SE tasks that fall into the categories of
binary classification, multi-value classification and regression. When learning a target function
from a given set of training data, its generalization can be either eager (at learning stage) or lazy
(at classification stage). Eager learning may produce a single target function from the entire
training data, while lazy learning may adopt a different (implicit) function for each query.
Evaluating a target function hinges on many considerations: predictive accuracy, interpretability,
statistical significance, information content, and tradeoff between its complexity and degree of
fit to data. Quinlan in [119] states:
Learned models should not blindly maximize accuracy on the training data, but
should balance resubstitution accuracy against generality, simplicity,
interpretability, and search parsimony.
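The eager/lazy contrast can be sketched on invented data: the eager learner commits to one global least-squares line at learning time, while the lazy learner builds a local answer per query from its nearest neighbors. This is a minimal sketch, not a method from the book:

```python
# Hypothetical training data (y = x ** 2 at a few points).
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0)]

# Eager: commit to one global function at learning time (least-squares line).
n = len(train)
mx = sum(x for x, _ in train) / n
my = sum(y for _, y in train) / n
w = (sum((x - mx) * (y - my) for x, y in train)
     / sum((x - mx) ** 2 for x, _ in train))
b = my - w * mx
def eager_predict(q):
    return w * q + b  # the same line answers every query

# Lazy: defer generalization; answer each query from its 2 nearest examples.
def lazy_predict(q):
    nearest = sorted(train, key=lambda ex: abs(ex[0] - q))[:2]
    return sum(y for _, y in nearest) / 2

# eager_predict(3.0) -> 8.0 (global line), lazy_predict(3.0) -> 6.5 (local)
```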

1.2.2. Hypothesis space


Candidates to be considered for a target function belong to a set called the hypothesis space H.
Let f be the true function to be learned. Given a set D of examples (training data) of f, inductive
learning amounts to finding a function h ∈ H that is consistent with D; h is said to approximate f.
How an H is specified and what structure H has ultimately determine the outcome and
efficiency of the learning. The learning becomes unrealizable [124] when f ∉ H. Since f is
unknown, we may have to resort to background or prior knowledge to generate an H in which f
must exist. How prior knowledge is utilized to specify an appropriate H where the learning
problem is realizable (f ∈ H) is a very important issue. There is also a tradeoff between the
expressiveness of an H and the computational complexity of finding a simple and consistent h that
approximates f [124]. Through some strong restrictions, the expressiveness of the hypothesis
language can be reduced, thus yielding a smaller H. This in turn may lead to a more efficient
learning process, but at the risk of being unrealizable.
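Realizability and the expressiveness tradeoff can be seen in a deliberately tiny example; the hypothesis space of threshold rules and the training data below are invented:

```python
# H: threshold rules h_t(x) = (x >= t), one hypothesis per threshold t.
H = {t: (lambda x, t=t: x >= t) for t in (1, 2, 3, 4)}

# D: hypothetical training data generated by f(x) = (x >= 3), so f is in H
# and the learning problem is realizable.
D = [(1, False), (2, False), (3, True), (4, True)]

# Inductive learning: search H for hypotheses consistent with every example.
consistent = [t for t, h in H.items() if all(h(x) == y for x, y in D)]
```

Only t = 3 survives. Removing that threshold from H would make this same problem unrealizable: no remaining hypothesis is consistent with D, illustrating the risk of restricting the hypothesis language too far.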

[Figure 3, a diagram not reproduced here, maps the issues in hypothesis space H: structures (lattice, no structure), properties (realizable, f ∈ H, vs. unrealizable, f ∉ H), constraints (prior knowledge, domain theory), and the tradeoff between the expressiveness of H and the computational complexity of finding a simple and consistent h.]

Figure 3. Issues in hypothesis space H.

1.2.3. Search and bias
How can we find a simple and consistent h ∈ H that approximates f? This question essentially
boils down to a search problem. Heuristics (inductive or declarative bias) play a pivotal role
in the search process. Depending on how examples in D are defined, learning can be supervised
or unsupervised. Different learning methods may adopt different types of bias, different search
strategies, and different guiding factors in search. For an f, its approximation can be obtained
either locally with regard to a subset of examples in D, or globally with regard to all examples in
D. Learning can result in either knowledge augmentation or knowledge (re)compilation.
Depending on the interaction between a learner and its environment, there are query learning and
reinforcement learning.
There are stable and unstable learning algorithms, depending on their sensitivity to changes in the
training data [37]. For unstable algorithms (e.g., decision trees, neural networks, or rule
learning), small changes in the training data can result in significantly different output
functions. Stable algorithms (e.g., linear regression and nearest neighbor), on the other hand,
are robust to small changes in the data [37].
Instead of using just a single learned hypothesis for classification of unseen cases, an ensemble
of hypotheses can be deployed whose individual decisions are combined to accomplish the task
of new case classification. There are two major issues in ensemble learning: how to construct
ensembles, and how to combine individual decisions of hypotheses in an ensemble [37].
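A minimal sketch of the second issue, combining individual decisions by majority vote over three invented threshold classifiers (not a full bagging or boosting implementation):

```python
from collections import Counter

def majority_vote(ensemble, x):
    """Combine the individual decisions of the hypotheses in an ensemble."""
    votes = Counter(h(x) for h in ensemble)
    return votes.most_common(1)[0][0]

# Three hypothetical base classifiers that disagree near the boundary:
ensemble = [lambda x: x > 2, lambda x: x > 4, lambda x: x > 3]
```

For a query of 3.5 the members disagree, but the vote (two of three) still yields a single decision, `True`.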

[Figure 4, a diagram not reproduced here, maps the issues in the search for a hypothesis: outcome (knowledge augmentation vs. knowledge recompilation), guiding factors (information gain, distance metric, gradient descent, fitness, cumulative reward, relative frequency, m-estimate accuracy), search style (general-to-specific, simple-to-complex, greedy/hill climbing, deductive, randomized beam, no explicit search), use of domain theory (training data alone vs. training data plus domain theory), bias (search bias, language bias, declarative bias; changeable vs. unchangeable; implicit vs. explicit), learner-environment interaction (query learning, reinforcement learning), supervision (supervised vs. unsupervised learning), stability (stable vs. unstable algorithms), ensembles (how to construct ensembles, how to combine classifiers), and approximation (local, over a subset of the training data, vs. global, over all training data).]

Figure 4. Issues in search of hypothesis.

Another issue during the search process is the need for interaction with an oracle. If a learner
needs an oracle to ascertain the validity of the target function generalization, it is interactive;
otherwise, the search is non-interactive [89]. The search is flexible if it can start either from
scratch or from an initial hypothesis.
1.2.4. Prior knowledge
Prior (or background) knowledge about the problem domain where an / is to be learned plays a
key role in many learning methods. Prior knowledge can be represented differently. It helps
learning by eliminating otherwise consistent h and by filling in the explanation of examples,
which results in faster learning from fewer examples. It also helps define different learning
techniques based on its logical roles and identify relevant attributes, thus yielding a reduced H
and speeding up the learning process [124].
There are two issues here. First of all, for some problem domains, the prior knowledge may be
sketchy, inaccurate or not available. Secondly, not all learning approaches are able to
accommodate such prior knowledge or domain theories. A common drawback of some general
learning algorithms, such as decision trees or neural networks, is that it is difficult to incorporate
prior knowledge from problem domains into the learning algorithm [37]. A major motivation and
advantage of stochastic learning (e.g., naive Bayesian learning) and inductive logic programming
is their ability to utilize background knowledge from problem domains in the learning algorithm.
For those learning methods for which prior knowledge or domain theory is indispensable, one
issue to keep in mind is that the quality of the knowledge (correctness, completeness) will have a
direct impact on the outcome of the learning.
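As one simplified illustration of the point about stochastic learning, a Beta prior can encode background knowledge about a failure rate before any data arrive; the scenario and all numbers below are invented:

```python
def posterior_mean(prior_a, prior_b, failures, successes):
    """Beta(a, b) prior + observed counts -> posterior mean failure rate."""
    return (prior_a + failures) / (prior_a + prior_b + failures + successes)

# Raw frequency after observing 2 failures in 2 trials says the failure
# rate is 100%; a prior encoding "failures are rare" (Beta(1, 9), mean 0.1)
# tempers the estimate toward something more plausible.
estimate = posterior_mean(1, 9, failures=2, successes=0)  # 3/12 = 0.25
```

Note that the quality of the encoded knowledge matters here exactly as the text warns: a badly chosen prior pulls the estimate in the wrong direction.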
[Figure 5, a diagram not reproduced here, maps the issues in prior knowledge: representation (first-order theories, constraints, probabilities), quality (correctness, completeness), properties (sketchy, inaccurate, not available), roles (expediting learning from fewer data, defining different learning methods, identifying relevant attributes), and accommodation (easy vs. hard to accommodate).]

Figure 5. Issues in prior knowledge.

1.2.5. Training data
Training data gathered for the learning process can vary in terms of (1) the number of examples,
(2) the number of features (attributes), and (3) the number of output classes. Data can be noisy or
accurate in terms of random errors, can be redundant, can be of different types, and have
different valuations. The quality and quantity of training data have a direct impact on the learning
process, as different learning methods have different criteria regarding training data: some
methods require large amounts of data, others are very sensitive to the quality of data, and
still others need both training data and a domain theory. Training data may be used just once,
or multiple times, by the learner.
Scaling-up is another issue. Real-world problems can have millions of training cases, thousands
of features, and hundreds of classes [37]. Not all learning algorithms are known to scale up well
along those three dimensions.
When a target function cannot easily be learned from the data in the input space, a need arises to
transform the data into a possibly high-dimensional feature space F and learn the target function
in F. Feature selection in F then becomes an important issue, as both computational cost and target
function generalization performance can degrade as the number of features grows [32].
Finally, based on the way in which training data are generated and provided to a learner, there
are batch learning (all data are available at the outset of learning) and on-line learning (data are
available to the learner one example at a time).
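The batch/on-line distinction can be sketched with an incremental mean estimate on invented data (both routes reach the same value for this simple statistic, which is not true of learning methods in general):

```python
data = [2.0, 4.0, 6.0, 8.0]  # hypothetical training stream

# Batch learning: all data are available at the outset.
batch_mean = sum(data) / len(data)

# On-line learning: examples arrive one at a time; the estimate is
# updated incrementally and each example can then be discarded.
online_mean, n = 0.0, 0
for x in data:
    n += 1
    online_mean += (x - online_mean) / n
```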

[Figure 6, a diagram not reproduced here, maps the issues in training data: feature selection (high dimensionality, irrelevant feature detection and elimination, filters, wrappers), valuation (discrete, real, vector values), properties (noisy vs. accurate, random errors, redundancy), type (sequences, time series, spatial), frequency (used once vs. used multiple times), availability (batch vs. on-line learning), and scale-up (large data sets, large numbers of features, large numbers of classes).]

Figure 6. Issues in training data.

1.2.6. Theoretical underpinnings and practical considerations
Underpinning learning methods are different justifications: statistical, probabilistic, or logical.
What are the frameworks for analyzing learning algorithms? How can we evaluate the
performance of a learned function, and determine whether the learning converges? What types of
practical problems do we have to come to grips with? These questions must be answered if we are to
succeed in real-world SE applications.
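One concrete instance of such a framework is the standard PAC sample-complexity bound for a consistent learner over a finite hypothesis space, m ≥ (1/ε)(ln |H| + ln(1/δ)). The sketch below uses purely illustrative numbers:

```python
import math

def pac_sample_bound(h_size, eps, delta):
    """Examples sufficient for a consistent learner over a finite H of size
    h_size to reach true error <= eps with probability >= 1 - delta."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

# With |H| = 1000, target error 0.1, and 95% confidence, about 100
# training examples suffice for any consistent learner.
m = pac_sample_bound(h_size=1000, eps=0.1, delta=0.05)
```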

[Figure 7, a diagram not reproduced here, maps the theoretical and practical issues: application types (data mining, poorly understood domains, changing conditions in domains), convergence (feasibility, settings, conditions), analysis frameworks (PAC, stationarity assumption, sample complexity of H, mistake bound), practical problems (overfitting, underfitting, local minima, crowding, curse of dimensionality), evaluation of h (accuracy in terms of sample/true error, confidence intervals, comparison), and justification (statistical, logical, probabilistic).]

Figure 7. Theoretical and practical issues.



1.3. Learning Approaches


There are many different types of learning methods, each having its own characteristics and
lending itself to certain learning problems. In this book, we organize major types of supervised
and reinforcement learning methods into the following groups: concept learning (CL), decision
tree learning (DT), neural networks (NN), Bayesian learning (BL), reinforcement learning (RL),
genetic algorithms (GA) and genetic programming (GP), instance-based learning (IBL, of which
case-based reasoning, or CBR, is a popular method), inductive logic programming (ILP),
analytical learning (AL, of which explanation-based learning, or EBL, is a method), combined
inductive and analytical learning (IAL), ensemble learning (EL) and support vector machines
(SVM). The organization of different learning methods is largely influenced by [105]. In some
literature [37, 124], stochastic (statistical) learning is used to refer to learning methods such as
BL.
