
Applied Intelligence 14, 95–114, 2001

© 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.

An Interactive Visualisation Tool for Case-Based Reasoners

ELIZABETH MCKENNA AND BARRY SMYTH


Department of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
Elizabeth.McKenna@ucd.ie
Barry.Smyth@ucd.ie

Abstract. Case-based reasoning (CBR) offers many opportunities for human interaction as part of its reasoning
cycle. In particular, one of the main advantages of case-based methods is their use of real case data, the sort of
data that humans are intrinsically comfortable with—this is typically in contrast to the rule-based and model-based
knowledge of more traditional first-principles reasoning systems. As a result, human participation has been a key
factor in a number of case-based systems, particularly when it comes to assisting in the retrieval and adaptation
processes. In this article we consider the case authoring process and note that, although the authoring process has
always been driven by human involvement, it is probably the least well developed CBR process when it comes
to offering real-time assistance to the human author. Many conventional CBR authoring tools provide editing and
auditing facilities only. In this article we describe the innovative approach behind the CASCADE authoring system,
which allows case authors to interact with, and be guided by, a model of case competence through a variety of novel
visualisation tools. We argue that this mode of interaction facilitates the more rapid development of high quality
case bases.

Keywords: interactive case-based reasoning, competence, interactive tools, authoring support, visualisation

1. Introduction

Case-based reasoning (CBR) systems solve problems by reusing the solutions to similar problems stored as
cases in a case base [1–4]. CBR is a rapidly emerging technology that has been generally recognised for
its wide range of fielded applications and comprehensive commercial toolkits. The success of CBR can be
attributed to a number of factors. First of all, the performance of a CBR system depends critically on its
primary source of knowledge, the case base. Case knowledge is typically more readily available, and easier
to encode, than more traditional first-principles knowledge, such as if-then rules or formal models. This can
lead to the more rapid development of robust reasoning systems. Secondly, CBR is an excellent framework
for developing an interactive reasoning system. Case-like knowledge is relatively easy for humans to
understand and manipulate, more so than traditional first-principles knowledge, and for this reason human
participation during various CBR stages has proven to be very successful. In fact human participation has
been explored in most stages of the CBR problem solving cycle, especially with respect to domain modeling,
retrieval, adaptation, and the learning of cases ([5–10]).

Interestingly, research is only now beginning to be carried out on how best to assist the case authoring
process [11]. Of course the user is involved in the authoring process, more so perhaps than in any other
stage of the CBR cycle, and of course there are many tools available to facilitate authoring. However, these
tools offer only very rudimentary support facilities. They focus on supporting the representation and storage
of cases, but there is little in the way of real intelligent support. For example, there are few tools capable
of directing the author’s attention towards poorly represented regions of the problem space, so that the
author may concentrate on adding cases where they are most needed. Similarly, case authors need to be
aware of over-populated regions of the problem space, i.e.,
regions with high case redundancy, but this too is beyond the scope of existing authoring tools. In the
final analysis, the tools that we currently use provide valuable housekeeping functions, but the
knowledge-intensive decision-making, about which cases to add or leave out, is left entirely up to the
human author.

Our primary objective in this article is to explore the potential for improved dialogue and interaction
between case author and authoring system. We describe a system called CASCADE (Case Authoring Support &
Development Environment) that keeps the knowledge-engineer informed about how case authoring is
progressing, and in particular, how case base competence is evolving. The key to this, we believe, is a
clearer understanding of the relationship between cases and system performance (particularly from a
competence viewpoint). We propose that, by explicitly modelling this relationship, it is possible to develop
an innovative set of tools to assist the knowledge-engineer during authoring. In particular, we propose
visualisation tools to allow the author to perceive the competence of a case base as it develops. Moreover,
this visualisation system will allow the user to interact directly with the growing case base and will assist
the user in making informed decisions about how to modify the existing case base or about what new cases
to add. We believe that such tools are generally applicable and that they will provide a much needed, and
valuable, layer of additional functionality for traditional authoring toolkits.

The remainder of this article is organised into four basic sections. In the next section we look at related
work in the area of case authoring support and case base maintenance. Section 3 describes an innovative
model of competence for CBR systems, which allows the evolving competence of a case base to be measured
and predicted. Using this model, we demonstrate how it is possible to measure a number of important
competence-related properties of a case base, and it is these properties that allow us to visualise and
support the authoring process in CASCADE. Section 4 describes and demonstrates CASCADE’s interactive
visualisation support tools and shows how these tools can be of help to the knowledge engineer throughout
the authoring process. Finally, before concluding, Section 5 describes some recent results from an initial
evaluation of the CASCADE visualisation tool. These results show that there are significant benefits to be
derived from CASCADE when it comes to guiding the author during the addition and deletion of cases.

2. Related Work

The related work for this article falls into two categories. On the one hand there are many existing
case-based reasoning development environments, all of which provide some level of authoring support, even
if it only extends to basic case editing facilities [12, 13]. On a different track, given that we are
interested in helping the case author to distinguish between good and bad cases, recent work on case base
maintenance, and specifically on recognising high quality and redundant cases, becomes especially relevant.

2.1. Case Authoring Tools

In a relatively short space of time (roughly 1985 to the present day) case-based reasoning technology has
met an unprecedented level of commercial success with a variety of successfully fielded applications and a
growing number of commercial development tools. These development tools generally share certain core
features: an environment for authoring cases and organising a case base; retrieval functionality (and
possibly some limited adaptation functionality); client-server and interface functionality; and, possibly,
some simple maintenance functionality. For example the successful commercial systems, such as CBRWorks
from TecInno, k-Commerce from the Inference Corporation, and ReMind (previously by Cognitive Systems but
no longer supported) provide sophisticated tools with all of the above functionality. Other tools such as
CaseAdvisor by ICue Solutions or KATE by AcknoSoft focus on the representation and retrieval functionality
([12, 13]).

In terms of case authoring support, these tools all offer basic authoring functionality, providing
user-friendly editing facilities for defining case structures and for editing individual cases ([12, 13]).
However, when it comes to guiding the development of a case base many commercial tools are found to be
lacking. Case base authoring can be a long, difficult, and tedious process, and the only advice given to the
author is often of the “choose representative cases” variety. This can ultimately lead to the development of
poor case bases, which offer limited coverage of the target problem space, and which include significant
redundancy. Fortunately some commercial tools do try to address this problem by providing the case author
with some testing functionality or valuable statistics relating to the cases in the case base.

For example, the k-Commerce CBR engine provides a test function to allow the author to use a newly added
case as a target query in order to determine whether a very similar case already exists in the case base.
This will at least allow the author to recognise potentially redundant cases, and eliminate them if
necessary. ReMind took a different approach by generating statistical data on individual features in the
case base. This allowed the author to look at the range and distribution of feature values in the case base,
which is useful for identifying anomalous cases. For instance, in a travel application a case base might
contain cases describing different vacation packages. One important feature might be the price of the
vacation, and this feature may normally contain values in the range of $500–$2000. However, if a case is
added to represent an “all the frills, once in a lifetime” vacation with a cost of $10,000 this may skew the
distribution of cases in the case base. This will be recognised by the author as an increase in the range
and standard deviation of the price feature, and the appropriate action may then be taken. For example, such
an abnormal case may be deleted or the similarity function over the price feature may be adjusted.

While commercial tool vendors have now begun to recognise the importance of providing authoring support
facilities the current state-of-the-art falls short of the desired goal. The onus is very much on the author
to analyse and evaluate the developing case base, and while some localised support facilities are available
(viewing feature-value statistics, for example) the author is kept in the dark when it comes to
understanding the global evolution of the case base. Research is only now beginning to be carried out on how
best to assist the case authoring process [11] and we believe that CASCADE offers a number of new benefits
to authors by providing interactive, visual feedback during the authoring process.

2.2. Maintaining & Optimising Case Bases

Recently, maintenance and optimisation have become important topics in CBR research, particularly as
systems are scaled to handle real-world problems [14–20]. One important aspect of maintenance focuses on
the case base itself and looks for ways of optimising the case base by adding or deleting cases.
Alternatively, instead of adding or deleting entire cases, maintenance may involve the modification of
existing cases, typically by adjusting feature weights or by using feature selection techniques to remove or
add certain features.

The research on techniques for building optimal case bases has its origin in the classification literature
where an important goal was to edit training data (cases) to produce a condensed training set for lazy
learning techniques such as nearest-neighbour methods ([21–24]). More recently, the work of Aha et al. [25]
has built on these techniques to produce algorithms to edit training data for instance-based learning
algorithms. The basic technique used to edit training data is often the same. A number of passes are made
over the training data to build up a reduced set of examples. An example is only added to the reduced set if
it cannot be properly classified by the reduced set built so far; this basic approach is implemented as the
condensed nearest-neighbour (CNN) rule [23]. Basic improvements to CNN include methods for sorting the
training data to improve the quality of the final edited sets. A range of sorting methods are proposed that
attempt to measure the classification power of the training examples, so that the best examples are added to
the edited set early on ([26–28]).

An alternative method focuses on deleting cases from a complete case base. Again variations on this theme
are found in the classification literature. In particular, Gates [22] proposes a version of CNN that makes
multiple passes over an edited training set in order to delete unnecessary examples. Minton [29] proposes a
similar approach in a speed-up learner by deleting rules if they are found to have a detrimental effect on
problem solving efficiency. Minton proposes a model of problem solving efficiency for speed-up learners and
this model is used explicitly to guide the deletion process. Smyth and Keane [20] propose a similar approach
for CBR systems. This time the aim is to delete cases from a case base while preserving its competence
characteristics. Since the cases in a CBR system account for the primary source of problem solving
knowledge, the preservation of case base competence is of the utmost importance. Smyth and Keane address
this by using an explicit model of case competence in order to guide the deletion process, and show that
case base competence is preserved as cases are removed from the case base.

Even when a good set of cases has been selected (whether by addition or deletion processes) further
optimisations may be desirable. In Robbie [30], introspective reasoning methods are used to determine the
features that should be used to index a case in order to guarantee retrieval success. By monitoring retrieval
results, Robbie attempts to refine case indices after a retrieval failure has occurred. Alternatively, the
work of Wettschereck et al. [31] and Muñoz-Avila and Hüllen [32] concentrates on automatically learning
appropriate feature weights by analysing problem solving performance over test cases; a similar approach is
advocated by others ([33, 34]).

Wilson and Martinez [28] make the useful distinction between decremental versus incremental approaches to
case base (or instance-base) optimisation. Decremental approaches correspond to the techniques that benefit
from the availability of a complete set of cases and optimisation works by reducing this set. In contrast,
incremental methods do not benefit from the availability of a complete case set, and instead must make local
optimisation decisions on the basis of the cases that have been seen so far. CASCADE uses a competence model
to support both strategies.

3. A Model of Case Competence

Competence is all about the number and type of target problems that a given system can solve (we are
ignoring the related issue of solution quality). This will depend on a number of factors including
statistical properties of the case base and problem-space, and the proficiency of the retrieval, adaptation
and solution evaluation components of the CBR system in question. The competence model described in this
section is based on a similar model first introduced by Smyth and McKenna [35]. The present model introduces
a number of important modifications to increase the effectiveness of the competence estimates and to improve
the general applicability of the model.

Our objective in this article is to describe an interactive authoring and visualisation tool for case base
designers that is based on this explicit model of case competence. Later, in Section 4, we will describe
exactly how this model is used to provide interactive authoring support through competence visualisation.

3.1. Coverage & Reachability

When we talk about the competence of a case we are referring to its ability to solve certain target
problems. Consider a set of cases, C, such that each case consists of a problem description part and a
solution part. Further, consider a space of target problems, T, such that each target problem consists of a
problem description only. A case, c ∈ C, can be used to solve a target, t ∈ T, if and only if two conditions
hold. First, the case must be retrieved for the target, and second it must be possible to adapt its solution
so that it solves the target problem. Competence is therefore reduced if adaptable cases fail to be
retrieved or if non-adaptable cases are retrieved [20, 36]. We can model these relationships according to
Definitions 1–3.

Definition 1. RetrievalSpace(t ∈ T) = {c ∈ C: c is retrieved given t}

Definition 2. AdaptationSpace(t ∈ T) = {c ∈ C: c can be adapted to solve t}

Definition 3. Solves(c, t) iff c ∈ [RetrievalSpace(t) ∩ AdaptationSpace(t)]

Two important competence properties are the coverage set and the reachability set. The coverage set of a
case is the set of all target problems that this case can be used to solve. Conversely, the reachability set
of a target problem is the set of all cases that can be used to solve it.

Definition 4. CoverageSet(c ∈ C) = {t ∈ T: Solves(c, t)}

Definition 5. ReachabilitySet(t ∈ T) = {c ∈ C: Solves(c, t)}

If we could specify these two sets for every case in the case-base, and all possible target problems, then
we would have a complete picture of the competence of a CBR system. Unfortunately, this is not feasible.
First, due to the sheer size of the target problem space, computing these sets for every case and target
problem is intractable. Second, even if we could enumerate every possible problem that the system might be
used to solve, it is next to impossible to identify the subset of problems the system would actually
encounter. Clearly, the best we can do is to find some approximation to these sets by making some
reasonable, simplifying assumption. So, to characterise the competence of a case-base in a tractable fashion
we make the so-called Representativeness Assumption, namely that the case-base is a representative sample of
the target problem space.

To put it another way, this assumption proposes that we use the cases in the case-base as proxies for the
target problems the system is expected to solve. This assumption may seem like a large step, as it proposes
that the case-base is representative of all future problems encountered by the system. It could be argued
that we are assuming that all the problems faced by
the system are already solved and in the case-base. We think that this greatly overstates the reality of the
situation and underestimates the contribution that adaptation knowledge can play in modifying cases to meet
target problems. Furthermore, we would argue that the representativeness assumption is one currently made,
albeit implicitly, by CBR researchers; for if a case-base were not representative of the target problems to
be solved then the system could not be forwarded as a valid solution to the task requirements. In short, if
CBR system builders are not making these assumptions then they are constructing case-bases designed not to
solve problems in the task domain. Of course implicitly this assumption is made by all inductive learners,
which rely on a representative set of example instances to guide their particular problem solving task.
Armed with the representativeness assumption, we can now provide tractable definitions for coverage
(Definition 6) and reachability (Definition 7):

Definition 6. CoverageSet(c ∈ C) = {c′ ∈ C: Solves(c, c′)}

Definition 7. ReachabilitySet(c ∈ C) = {c′ ∈ C: Solves(c′, c)}

3.2. Relative Coverage

The size of the coverage set of a case is only a measure of its local competence. For instance, case
coverage sets can overlap to limit the competence contributions of individual cases, or they may be isolated
and exaggerate individual contributions [20]. It is actually possible to have a case with a large coverage
set that makes little or no contribution to global competence simply because its contribution is subsumed by
the local competences of other cases. At the other extreme, there may be cases with relatively small
contributions to make, but these contributions may nonetheless be crucial if there are no competing cases.
For a true picture of competence, a measure of the coverage of a case, relative to other nearby cases, is
needed. For this reason we define a measure called relative coverage (RC), which estimates the unique
competence contribution of an individual case, c, as a function of the size of the case’s coverage set (see
Definition 8).

Definition 8.

$$\mathrm{RelativeCoverage}(c) = \sum_{c' \in \mathrm{CoverageSet}(c)} \frac{1}{|\mathrm{ReachabilitySet}(c')|}$$

Figure 1. Relative coverage values for cases. Each ellipse denotes the coverage set of its corresponding
case and each RC value is shown in brackets.

Relative coverage weights the contribution of each covered case by the degree to which these cases are
themselves covered. It is based on the idea that if a case c′ is covered by n other cases then each of the n
cases will receive a contribution of 1/n from c′ to their relative coverage measures.

Fig. 1 displays a number of cases and their relative coverage values. Case A makes an isolated competence
contribution that is not duplicated by any other cases. Its coverage and reachability sets contain just a
single case (case A itself) and so its relative coverage value is 1; case A is a pivotal case according to
the competence categories of Smyth and Keane [20]. Case B makes the largest local competence contribution
(its coverage set contains 3 cases, B, C and D) but this contribution is diluted because other cases also
cover C and D. The relative coverage of B is 11/6 (that is 1 + 1/2 + 1/3). B is also a pivotal case but
using relative coverage we can see that it makes a larger competence contribution than A; such fine-grained
competence distinctions were not previously possible. Cases C and D make no unique competence contribution
as they only duplicate part of the existing coverage offered by B. Consequently, C and D have relative
coverage values of 5/6 and 1/3 respectively; they are both auxiliary cases according to the competence
categories of Smyth and Keane [20]. A case is an auxiliary case if the coverage it provides is completely
subsumed by the coverage of one of its reachable cases.

As it stands, the relative coverage metric ignores the distance (similarity) between cases and the
adaptation effort required to transform case solutions. Our objective is to design a metric that makes no
representational or computational assumptions about the retrieval and adaptation of cases. Clearly, such
additional factors could be taken into account when computing relative coverage by extending the existing RC
metric. However, such extensions would inevitably lead to the
coding of domain specific preferences into the relative coverage metric, and limit its immediate
applicability in other domains.

3.3. Competence Groups

As cases are added to a case base, clusters begin to form to produce well-defined regions of competence.
Common problem types are typically represented by large, densely packed clusters, while smaller clusters, or
even lone cases, generally represent more unusual problem types. Importantly, these clusters do not overlap
(interact) with other clusters, and as such their competence contributions can be treated independently of
one another. This is critical because it means that we can calculate global competence directly from the
competence contribution of each cluster. We call these clusters competence groups.

A competence group is a collection of related cases, which together make a collectively independent
contribution to overall case base competence. The key idea underlying the definition of a competence group
is that of shared coverage (see Definition 9). Two cases exhibit shared coverage if their coverage or
reachability sets overlap. This is seen as an indication that the cases in question make a shared competence
contribution, and as such belong to a given competence group.

Definition 9. For c1, c2 ∈ C, SharedCoverage(c1, c2) iff
[CoverageSet(c1) ∪ ReachabilitySet(c1)] ∩ [CoverageSet(c2) ∪ ReachabilitySet(c2)] ≠ { }

This shared coverage relationship provides a way of linking related cases together. Formally, a competence
group is a maximal collection of cases exhibiting shared coverage (see Definition 10). Thus, each case in a
competence group must share coverage with some other case in the group (this is the first half of the
equation). In addition, the group must be maximal in the sense that there are no other cases in the case
base that share coverage with any group member (this is the second half of the equation).

Definition 10. For G = {c1, . . . , cn} ⊆ C,
CompetenceGroup(G, c) iff ∀ci ∈ G, ∃cj ∈ G − {ci}: SharedCoverage(ci, cj)
∧ ∀ck ∈ C − G, ¬∃cl ∈ G: SharedCoverage(ck, cl)

A simple example of two groups in a simple case base is illustrated in Fig. 2. There are four cases, c1, c2,
c3 and c4, and their coverage sets are as shown. The cases c1, c2 and c3 form a group because they share
regions of coverage; that is, c2 is covered by c1 and c3 is covered by c2. In contrast, c4 is sufficiently
different from the others so that it does not share any coverage. Thus, there are two competence groups, one
with three cases (c1, c2 and c3) and a second with just one case (c4).

Figure 2. A set of 4 cases is shown to form two separate competence groups, one containing the 3 cases c1,
c2, and c3, the other containing the pivotal case, c4.

3.4. From Competence Group to Competence Footprint

The essential thing to notice about competence groups in general is that they are independent of one another
in the sense that there is no interaction between their competence contributions because, by definition,
there is no shared coverage between different groups. This means that each competence group makes a unique
contribution to the competence of the case base.

Within a given group, not every case will make a contribution to competence; for example, Smyth and Keane
[20] define the concept of an auxiliary case, which, by definition, makes no competence contribution. For
this reason, in our competence model, we base group competence on the group footprint, a subset of
the competence group cases that cover all cases in the group.

The algorithm in Fig. 3 is used for identifying the group footprint cases; it is a simple modification of
the CNN/IB2 algorithms ([23, 25]) in the sense that the first step is to sort the cases in descending order
of their relative coverage values. This causes cases with large competence contributions to be added before
cases with smaller contributions, and thus helps to keep the footprint size to a local minimum. The footprint
is then constructed according to the normal CNN algorithm, i.e., by considering each case in turn, and
adding it to the footprint only if it is not already covered by the footprint built so far.

Figure 3. The footprint creation algorithm is a variant of the condensed nearest neighbour method ([23]).

3.5. From Group Coverage to Case Base Coverage

One of the problems with our original competence model [35] was that its density-based view of competence
was tied directly to the distance between cases, making certain assumptions about the distribution of cases
in the problem space, and introducing domain-specific distance calculations during competence assessment.
For example, cases may be densely packed near concept boundaries without providing more coverage than cases
from more regular regions of the problem space, but the model treats both as equal.

Definition 11. For G = {c1, . . . , cn} ⊆ C,

$$\mathrm{Coverage}(G) = \sum_{c \in \mathrm{Footprint}(G)} \mathrm{RelativeCoverage}(c)$$

In our current version we measure the coverage of a group by summing the relative coverage values of its
footprint cases (see Definition 11). Relative coverage makes no assumptions about case distributions and
does not include an explicit distance calculation, and thus, provides a more general method for computing
group coverage.

Once the coverage of individual groups has been calculated, the coverage of the case base as a whole is
computed as the sum of its group coverage values; again, this is valid because each group makes an
independent coverage contribution.

3.6. The Effectiveness of the Competence Model

The true test of our model is whether it accurately predicts real system competence. If our model is
effective then there should be a high correlation between case base coverage and true system competence. In
this section we examine this issue and provide experimental evidence in support of our model.

Materials. This experiment uses two publicly available case bases: 1400 cases from the Travel domain
(available from the Case base Archive at http://www.ai-cbr.org [37]) and a 500 case case base from the
Residential Property domain (available from the UCI Machine Learning Repository [38]). We also produced
extra cases for each domain. For the Travel domain 400 duplicate cases and 400 near-miss (redundant) cases
are added to generate a total of 2200 cases. For the Property domain 200 duplicates and 200 near-misses are
added to give a total of 900 cases. Near-misses are generated by using a set of rules to produce new cases
that are slight perturbations of existing cases by modifying feature values with specific ranges. The extra
cases were produced to investigate how the model copes with redundant cases.
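As a concrete reference for the quantity being evaluated, Definitions 6 to 11 can be sketched in a few lines of Python. This is only an illustrative sketch, not the CASCADE implementation: the function names are hypothetical, the Solves relation is assumed to be supplied as precomputed coverage sets, and the coverage sets used for the Fig. 1 example are an assumption chosen to reproduce the relative coverage values quoted in Section 3.2. The footprint step is written as a single sorted pass, which is a simplification of the Fig. 3 procedure, and lone cases are treated as groups of their own, as in the Fig. 2 example.

```python
from fractions import Fraction


def reachability_sets(coverage):
    """Definition 7: invert the coverage sets of Definition 6."""
    reach = {c: set() for c in coverage}
    for c, covered in coverage.items():
        for other in covered:
            reach[other].add(c)
    return reach


def relative_coverage(coverage):
    """Definition 8: each covered case contributes 1/|ReachabilitySet|."""
    reach = reachability_sets(coverage)
    return {c: sum(Fraction(1, len(reach[other])) for other in covered)
            for c, covered in coverage.items()}


def competence_groups(coverage):
    """Definitions 9-10: maximal sets of cases linked by shared coverage."""
    reach = reachability_sets(coverage)
    related = {c: coverage[c] | reach[c] for c in coverage}
    groups, seen = [], set()
    for start in coverage:
        if start in seen:
            continue
        group, frontier = set(), [start]
        while frontier:
            c = frontier.pop()
            if c in group:
                continue
            group.add(c)
            # Follow every case that shares coverage with c.
            frontier.extend(o for o in coverage
                            if o not in group and related[c] & related[o])
        seen |= group
        groups.append(group)
    return groups


def footprint(group, coverage, rc):
    """Add cases in descending RC order unless the footprint covers them."""
    selected = []
    for c in sorted(group, key=lambda case: (-rc[case], case)):
        if not any(c in coverage[f] for f in selected):
            selected.append(c)
    return selected


def case_base_coverage(coverage):
    """Definition 11 summed over all (independent) competence groups."""
    rc = relative_coverage(coverage)
    return sum(rc[c]
               for group in competence_groups(coverage)
               for c in footprint(group, coverage, rc))


# Assumed coverage sets for the four cases of Fig. 1 (every case covers itself).
FIG1_COVERAGE = {
    "A": {"A"},
    "B": {"B", "C", "D"},
    "C": {"C", "D"},
    "D": {"D"},
}
```

Under these assumed coverage sets, `relative_coverage(FIG1_COVERAGE)` gives 1, 11/6, 5/6 and 1/3 for A, B, C and D, matching the values discussed for Fig. 1, and the footprint of the group {B, C, D} contains B alone, so the two auxiliary cases add nothing to the case base coverage.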

Method. For each data set we build case bases by adding cases incrementally. This produces a 2200-case case
base for Travel and a 900-case case base for Property. At regular intervals the true competence of the case
base, measured as a percentage of the domain problems that can be solved by the case base so far, is noted,
along with the coverage of the case base according to our model. For completeness we also computed the
coverage values using our original competence model [35].

Results. The results are shown in Fig. 4(a) and (b) as true competence and case base coverage plotted
against case base size for Travel and Property respectively. They provide excellent support in favour of
both competence models (original and new) as there is clearly a very close relationship between the coverage
and competence curves and hence a strong correlation between predicted competence (that is, case base
coverage according to the new and original competence models) and true competence. In fact, the correlation
coefficient between the true competence and predicted coverage curve for the new competence model is 0.856
for the Travel domain and 0.993 for the Property domain, both of which are statistically significant at the
0.001 level, using a paired 2-sample for means t-test. This is a significantly better correlation than the
standard base-line predictor of competence, case base size, which has a correlation coefficient of only
0.574 for the Travel domain and 0.90 for the Property domain. In this experiment the original competence
model [35] also correlated well with the true competence (0.846 for the Travel domain and 0.991 for the
Property domain). Our new competence model correlates slightly better with true competence in addition to
benefiting from a domain independent measure of competence, through the use of the relative coverage metric
(see Section 3.2). Therefore, we prefer to use this model rather than the original density-based model.

Figure 4. Competence and coverage vs. Case Base Size for the Travel and Property domains.

Discussion. Clearly the results are extremely positive, with a very close match between predicted and true
competence in both domains. In particular, the coverage model can accurately predict competence even when
duplicate or redundant cases are added to the case base. Competence does not increase when duplicate cases
are added and generally only increases minimally with near-miss cases, as exemplified in Fig. 4(a) after the
1400 case mark and after the 500 case mark in Fig. 4(b). The important point to note is that the competence
model predictions continue to track true competence under these conditions. The same is not true of case
base size, which continues to rise throughout the experiment, hence its low correlation coefficient value.

Finally, the results presented in Fig. 4 correspond to just one case base building run, and it is not clear
how the proposed model performs over multiple runs for different initial case orderings. For this reason we
carried out 30 runs of the above experiment each with a different initial ordering of cases, and for each
run we noted the correlation values for case base size and the competence model predictions. Table 1 reports
the average and standard deviations of these correlation values and it is clear that both competence models
(original

Table 1. Mean and standard deviation of the correlation values for the Travel and Property domains.

            Travel domain                                            Property domain
            Size correl.  Original model correl.  New model correl.  Size correl.  Original model correl.  New model correl.
Mean        0.648         0.897                   0.898              0.89          0.984                   0.986
Std. dev.   0.038         0.026                   0.025              0.026         0.012                   0.01
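
The correlation values reported in Table 1 compare a predictor curve (case base size, or model coverage) against the true competence curve. A minimal sketch of that comparison is given here, assuming a plain Pearson correlation coefficient; the measurement lists are hypothetical illustrations and are not the data behind Fig. 4 or Table 1.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical measurements taken at regular intervals during one building run.
# True competence flattens once redundant cases start to arrive, while the
# case base size keeps rising.
case_base_size = [100, 200, 300, 400, 500, 600]
true_competence = [40.0, 60.0, 70.0, 72.0, 72.0, 72.0]
predicted_coverage = [38.0, 58.0, 71.0, 73.0, 73.0, 73.0]

size_correl = pearson(case_base_size, true_competence)
model_correl = pearson(predicted_coverage, true_competence)
```

In this toy run the coverage curve tracks true competence more closely than case base size does, which is the qualitative pattern the table reports. Repeating the computation over several case orderings and taking the mean and standard deviation of the resulting coefficients would give rows of the same shape as Table 1.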

and new) provide a far more accurate prediction of true competence than that provided by case base size. In addition, the new model proposed in this article provides this prediction in a domain independent manner, and thus, we argue, represents an advance on the previous model reported by Smyth and McKenna [35].

3.7. The Evolution of Competence

A case base can evolve along a number of different competence trajectories during the authoring process. New cases can be added to form new competence groups, or to extend existing groups, and either option can significantly improve competence. Alternatively, new cases may be similar to existing cases and so serve only to increase the redundancy in a case base. In general, as cases are added one of four things can happen: (1) new groups are created, (2) existing competence groups grow in size and coverage, (3) a number of existing groups merge to form a new ‘super’ group, or (4) existing groups grow in size but without increasing coverage (conversely, as cases are deleted, groups may disappear altogether, or they may split into smaller ‘sub’ groups). In general, which of these will happen depends on the size of the case base, or more accurately the distribution of cases in the case base. We believe that during the authoring process, case bases will move through four distinct developmental phases corresponding to infancy, adolescence, adulthood, and old-age.

Infancy. Early on during the authoring process the tendency is for new cases to cause the creation of new competence groups. This is a result of there being relatively few, sparsely distributed cases in the case base and so the chances are that any new cases will not fall within reach of an existing group. According to Smyth and Keane [20] these new cases are pivotal cases and the new competence groups contain only a single case. During this phase of development case base coverage (and competence) rises dramatically.

Adolescence. As the case base grows, and the problem space becomes more densely populated, new cases have a higher chance of falling within reach of an existing competence group, thereby preventing the creation of a new group. Thus, the incidence of new competence groups drops significantly during this phase and the number of groups tends to remain fairly constant. Importantly, although cases fall within existing groups this does not mean that these cases are redundant. During this phase case base coverage is still rising because the new cases tend to extend the coverage of their associated groups. As groups extend their reach, the possibility of group merger tends to increase, which leads to the next stage of development.

Adulthood. The penultimate phase occurs when a case base nears its saturation limit. New cases are likely to fall within reach of not just one group, but possibly many existing groups, and the result then sees the merging of these multiple groups into a single super-group. This causes the number of groups in the case base to begin to fall.

Old-Age. Finally, when a case base reaches its saturation level, new cases tend to fall within the boundaries of existing groups but do not result in further growth for these groups; as a result overall case base coverage tends to remain stable. This means that future group mergers are unlikely and at this stage the principal competence groups represent the basic clusters of cases in the problem space.

These 4 developmental phases can be studied empirically by graphing the number of competence groups against the size of a developing case base. Figure 5(a) and (b) shows this experiment for the Travel and the Property case bases respectively. For each data set a case base is incrementally constructed to yield a Travel case base of 1400 cases and a Property case base of 500 cases. The number of competence groups is measured after each case addition. This case base construction process is repeated 30 times for different random initial orderings of cases to produce 30 different case

Figure 5. Graphs of how the number of competence groups changes with case base size, averaged over 30 runs. Four distinctive developmental
phases are seen as the number of competence groups increases and decreases during the evolution of a case base.
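
The rise and fall in the number of competence groups shown in Figure 5 can be reproduced in miniature. The sketch below is a toy model under loud assumptions: cases are points on a line, each case is assumed to cover an interval of half-width `reach` around itself, and a competence group is assumed to be a connected set of cases whose coverage intervals overlap. The function name `group_counts` and the parameter `reach` are hypothetical and are not part of CASCADE.

```python
def group_counts(positions, reach):
    """Number of groups after each case addition (toy 1-D model).

    Two cases are assumed to share coverage when their intervals
    [p - reach, p + reach] overlap, i.e. when they lie within 2 * reach.
    Groups are tracked with a simple union-find structure.
    """
    parent = {}

    def find(i):
        # Follow parent links to the group representative, compressing paths.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    counts = []
    n_groups = 0
    for i, p in enumerate(positions):
        parent[i] = i          # a new case starts as its own (pivotal) group
        n_groups += 1
        for j in range(i):
            if abs(p - positions[j]) <= 2 * reach:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j   # merge the two groups
                    n_groups -= 1
        counts.append(n_groups)
    return counts


# Three isolated cases create three groups; two later cases bridge them.
counts = group_counts([0, 10, 20, 5, 15], reach=3)
```

For this ordering the group count first grows as isolated cases arrive and then shrinks as bridging cases merge neighbouring groups, mirroring the infancy and adulthood phases in a very small setting.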

bases for each data set, and a mean number of competence groups at each case base size.

Each of the 4 developmental phases can be seen in the graphs. For example, in Fig. 5(a), for the Travel domain, the infancy phase stretches from 0 cases to about 100 cases. During this time about 50% of the case additions translate into new groups. The adolescence phase goes from 100 cases to about 375 cases, and it can be seen that, for this range, the number of groups remains relatively stable as new cases tend to fall within existing groups, extending their boundaries. Phase 3 (adulthood) extends from 375 cases to about 850 cases, during which time the number of groups falls off dramatically because new cases tend to fall within reach of multiple groups and result in the merging of existing groups. Finally, after about 850 cases, the number of groups remains relatively stable as most new cases redundantly fall into existing groups. Similar results can be seen in Fig. 5(b) for the Property domain.

3.8. Discussion & Limitations

We have presented and evaluated our current competence model for case-based reasoning systems. In the next section we will describe how this model acts as the foundation for the CASCADE authoring support environment. In particular, we will discuss how the model can be used to visualise the evolving case base in a meaningful way, and how this visualisation facility supports a new level of interaction between user and authoring system.

However, before closing this section on the mechanics of the competence model it is worthwhile mentioning a number of potential limitations with the model as it stands. Perhaps most importantly, the tractability of the proposed model depends greatly on the validity of the so-called representativeness assumption discussed in Section 3.1, which saw the case base as a suitable proxy for the target problem space. We would argue that this assumption is one already made, albeit implicitly, by CBR researchers; for if a case base were not representative of the target problems to be solved then the system could not be put forward as a valid solution to the task requirements. In short, a CBR system that did not meet this assumption would be one that had been constructed not to solve problems in the task domain. Of course, this assumption is implicitly made by all inductive learners, which rely on a representative set of example instances to guide their particular problem solving task. However, if the case base is not a good representation of the target problem space then the effectiveness of our proposed competence model will inevitably degrade, albeit gracefully.

In this section we have referred to an earlier model of competence [35] which has led to the current model. A criticism of this earlier model was that it relied on the use of domain specific similarity (or distance) metrics to produce its predicted coverage values, and that this potentially introduces domain sensitivity into the model. We agree with this criticism and for this reason we have developed the current model, which is at least as effective in predicting true competence, but uses no domain-specific components.

Finally, in Section 3.2 we noted that the current competence model does not account for potentially important factors such as adaptation cost in its assessment of

a given case’s competence contribution. While this is true, adaptation cost was not a competence factor in the case-based reasoning systems that we have examined to date. However, we do accept that it may be a factor in other CBR domains, but we also believe that our model could be extended to cope with such domains by adding an adaptation-cost factor to the relative coverage metric.

4. CASCADE: An Authoring Support Environment

Deciding on which cases should be added to a case base (or conversely, which cases should be left out) is one of the most important tasks facing the case author. Depending on the application domain this can be relatively straightforward or it can be extremely difficult. For example, there may be a small set of well understood and well accepted cases for initialising the case-base. Alternatively, there may be a huge number of possible cases with varying degrees of quality. In fact, since the latter is usually the case with real-world problem domains, the case author is faced with an extremely difficult, and poorly supported, knowledge acquisition task. The CASCADE project aims to fill this gap by providing much needed practical support to the case author.

This support is achieved in a number of ways. Most importantly, CASCADE provides an innovative interactive visualisation tool that allows the competence of the case base to be accurately monitored, and, in so doing, allows the case author to directly perceive the evolving competence of the case base in real-time. Moreover, by understanding how case base competence can evolve during the authoring process, the case author can identify and track regions of high or low coverage within the case base so that future authoring effort may be concentrated where it is needed the most.

4.1. The CASCADE Architecture

CASCADE is a development environment for case-based reasoning systems that emphasises the importance of case authoring. Like all other authoring systems, CASCADE provides the usual functionality to allow cases to be added, deleted, and edited. However, the main innovation is that, as case authoring proceeds, a model of case competence is maintained, and in turn, this model forms the basis for a suite of visualisation tools. These tools provide the much needed support for case authors by allowing them to visualise the evolving case base in real-time, and, on the basis of this, help them make better decisions about which cases should be added (or deleted) in the future. In other words, the real innovation of CASCADE is to provide a mechanism for supporting a competence-guided interaction between author and case-based reasoner during the development process.

The CASCADE architecture (shown in Fig. 6) is composed of the following:

• A case base and competence model.
• An access process that provides the author with direct access to the cases in the case base and to the details of the current competence model such as group and coverage information.
• An update process that allows the author to change the case base (by adding, deleting or editing individual cases), and ensures that the competence model is changed accordingly.
• A visualiser process that allows the author to request certain types of visualisations and statistics from the case base, specifically a visualisation of case base competence (to be discussed in the next section).

4.2. The Competence Visualisation Tool

CASCADE’s basic visualisation tool enables the case base to be viewed in terms of its constituent competence groups. Each group is displayed as a point on a two dimensional graph of the logarithm of group size (y-axis) versus the logarithm of group coverage (x-axis). Figure 7 shows a snapshot of the visualisation for the Travel case base that corresponds to a case base of 1200 cases and 38 groups. Each group is represented by a coloured square and the colour of the square corresponds to the age of the group as per the ‘temperature’ scale shown on the left side of the tool (see Section 4.7 for details on group age).

In addition, groups are connected to their nearest neighbouring group by an arc to allow the author to judge the relationship between groups, where the distance between two groups, g1 and g2, is the minimal distance between a case in g1 and a case in g2. Furthermore, g1’s nearest-neighbour group is that group that has the smallest distance to g1.

The tool maintains a real-time visualisation of a case base. If new cases are added (or deleted) by the author the visualisation changes accordingly. For example, as new cases are added newly created groups may appear, existing groups may grow, or collections of

Figure 6. The CASCADE architecture.

Figure 7. A snapshot from the visualisation tool for a case base of 1200 cases (and 38 competence groups) from the Travel domain. The
visualisation is a logarithmic graph of group size (y-axis) versus group coverage (x-axis). Each group is plotted as a coloured square, and nearby
groups are connected with arcs. The colour of the group corresponds to its age. The insert displays a CASCADE frame containing group details.

groups may merge into a single ‘super’ group. Similarly, as cases are deleted groups may disappear, shrink or divide. In this way the tool provides an animated record of the evolving case base organisation and competence. Although, for obvious reasons, the full effect of this animated facility cannot be effectively communicated here, the way that groups move in the visualisation tool can be seen by examining successive snapshots in detail (Fig. 8). In general, as the case base grows the groups spread farther and farther from the origin as they grow in size and competence. By understanding how a case base is likely to evolve under various conditions the case author can evaluate the progress that is being made during the authoring process and then act accordingly. Sections 4.3–4.7 explain how the author can use CASCADE to decide what cases should be added or deleted next, on the basis of how competence is evolving at the group level.
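
The plotting coordinates and the nearest-neighbour arcs described in Section 4.2 can be sketched in a few lines. This is an illustrative sketch, not CASCADE's implementation: the source does not state the base of the logarithm (base 10 is assumed here), the distance function `dist` is a hypothetical stand-in for whatever case distance the system uses, and the toy cases are plain numbers.

```python
import math


def plot_point(size, coverage):
    """(x, y) position of a group: log coverage on x, log size on y.

    Base-10 logarithms are an assumption; coverage must be positive.
    """
    return (math.log10(coverage), math.log10(size))


def group_distance(g1, g2, dist):
    """Minimal distance between any case in g1 and any case in g2."""
    return min(dist(a, b) for a in g1 for b in g2)


def nearest_group(name, groups, dist):
    """Name of the group with the smallest distance to the named group."""
    others = (other for other in groups if other != name)
    return min(
        others,
        key=lambda other: group_distance(groups[name], groups[other], dist),
    )


# Toy example: cases are numbers and distance is the absolute difference.
groups = {"g1": [0.0, 1.0], "g2": [3.0], "g3": [10.0, 11.0]}
dist = lambda a, b: abs(a - b)

arc_from_g1 = nearest_group("g1", groups, dist)
arc_from_g3 = nearest_group("g3", groups, dist)
```

Note that the nearest-neighbour relation is not symmetric in general: here g3 is linked to g2, while g2 is linked to g1, so a display would draw one arc per group rather than one arc per pair.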

Figure 8. Competence snapshots of the Travel case base for various case base sizes. Each snapshot shows the competence visualisation for
one particular case base size (as indicated in the display title-bar).

CASCADE permits the user to select and query individual groups by clicking on individual group nodes. Once a group has been selected a range of useful group information can be displayed. In particular, CASCADE will show a list of cases within the group and also display summary information concerning the features of these cases. This is an important feature because it provides intuition on what sort of cases belong to a particular group, and what sort of cases are likely to fall into the group in the future. (Unfortunately, for space reasons, this feature will not be discussed further here). Finally, CASCADE provides information on the footprint and non-footprint cases in a particular group and this, as we shall see, can provide the user with important insight into the way that the overall competence of a particular group is evolving.

4.3. Tracking the Evolution of Competence

A case base can evolve along a number of different competence trajectories during the authoring process. New cases can be added to form new competence groups or to extend existing groups and this will tend to significantly improve competence. Alternatively the new cases may be redundant and so serve only to increase the redundancy in a case base. Section 3 described how new groups are born, and how existing groups grow and merge with others during the four-phase developmental cycle of a case base.

This developmental behaviour can also be seen by using the competence visualisation tool. For example, Fig. 8 shows 12 snapshots of the visualisation tool at various stages during the authoring of the Travel case base. The first 3 snapshots, Fig. 8(a–c), show that new competence groups are regularly created during the infancy stage of the case base. Most of these new cases form pivotal groups and so they remain clustered at the origin of the graph. Fig. 8(d–f) shows the case base in its adolescence. Here we see that instead of new groups being created, the number of groups remains fairly static, but the position of the groups on the graph is continually changing as group size and coverage change. During adulthood (Fig. 8(g–i)), the case base becomes more densely packed, and new cases often result in the merging of groups. Finally, during old-age the case base becomes saturated. New cases are added to existing groups, but they do not increase the competence of these groups; group movement is in the vertical axis (group size dimension) only (Fig. 8(j–l)).

4.4. Competence Regions

The competence display can be divided into quadrants as shown in Fig. 9. Quadrants A and B correspond to large and small groups respectively with relatively low competence contributions, while C and D correspond to groups with much higher competence values. Normally, we would expect competence groups to fall into B and C. The former represent small competence groups with low coverage, while the latter represent large groups with high coverage. Both quadrants correspond to groups with low degrees of redundancy. For the author, groups from B (small size, low coverage) are due further attention as they may correspond to regions of the problem space that are under-represented by the case base.

Competence groups can also fall into quadrants A and D, and these can be interpreted in a number of ways. For example, the low coverage value of a group from quadrant A may be the result of a high degree of redundancy among the cases in the group. Alternatively, the group may contain cases from an unusually complex region of the problem space. Each case may offer little new coverage (relative to other cases from groups in more regular regions of the problem space) but may display no redundancy of coverage. In contrast, groups from quadrant D (small size, high coverage) are likely to come from very regular regions of the problem space that are easy to cover.

4.5. Improving Case Base Competence

An important goal of the case author is to select cases that will improve the competence of the developing system (i.e., the author should try to add cases that are likely to improve the coverage of the case base). Of course, identifying such cases can be extremely difficult. The competence visualisation tool can provide valuable guidance and visual feedback because it can immediately identify low-competence groups that may benefit from adding cases.

By way of an example, consider the competence visualisations shown in Fig. 10, which shows three snapshots for the Travel domain. Figure 10(a) shows one particular group from quadrant B that has been selected. The group contains 13 cases and so has very little coverage. By accessing summary information about the cases in the group (not shown here) the author can identify the type of cases that fall into this group. By adding these new cases the coverage of the group

Figure 9. The main competence regions.

should increase. This is seen in Fig. 10(b) and (c). As the highlighted group grows in size we see that its coverage also increases as the group moves from quadrant B to quadrant C. This occurs because the chosen group is not yet saturated with cases, and because the cases that have been added are sufficiently different from the other group cases so that they make a significant contribution to group coverage. In addition, as cases are added to this group, merging occurs with other nearby groups to increase coverage further. In this particular example, the highlighted group merges with other groups on a number of occasions as authoring proceeds.

According to this strategy, the author can efficiently increase the competence of the case base by adding cases that move groups from quadrant B into quadrant C. The groups from quadrant B are considered to be “a good bet” because they are small and so have room to grow, both in size and competence. In contrast, groups from quadrant C are likely to be nearing saturation, and adding new cases may only serve to increase their redundancy.

The author is supported in this task not only by the visualisations produced but also by the statistical information that is available concerning group details (e.g., group size, footprint size). For example, adding a new case to a large group that increases the size of the group’s footprint is, by definition, guaranteed to improve the coverage of the group significantly, and therefore increase the group’s true competence. Conversely, adding a non-footprint case is unlikely to improve competence.
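
The quadrant reading of the competence display used in Sections 4.4 and 4.5 can be written down as a small classifier. The sketch below is illustrative only: the source gives no numeric quadrant boundaries, so `size_cut` and `coverage_cut` are hypothetical thresholds that an author (or tool) would have to choose, and the hint strings merely paraphrase the interpretations given in the text.

```python
def quadrant(size, coverage, size_cut, coverage_cut):
    """Classify a group into quadrant A, B, C or D.

    A: large size, low coverage     B: small size, low coverage
    C: large size, high coverage    D: small size, high coverage
    The two cut-off values are assumptions, not values from the paper.
    """
    large = size >= size_cut
    high = coverage >= coverage_cut
    if high:
        return "C" if large else "D"
    return "A" if large else "B"


# Paraphrased interpretations of each quadrant, as discussed in the text.
HINTS = {
    "A": "possibly redundant, or from an unusually complex region",
    "B": "possibly under-represented; a good candidate for new cases",
    "C": "likely nearing saturation; new cases may only add redundancy",
    "D": "likely from a very regular region that is easy to cover",
}

# A small group with little coverage, such as the 13-case group in Fig. 10(a),
# lands in quadrant B under these hypothetical cut-offs.
q = quadrant(size=13, coverage=2.0, size_cut=50, coverage_cut=10.0)
hint = HINTS[q]
```

Re-evaluating the same function after each batch of additions would show whether a group is moving from B towards C (coverage and size both growing) or from B towards A (size growing without coverage).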

Figure 10. Visualisation snapshots from the Travel case base showing one particular group that grows significantly in coverage as cases are
added.

Figure 11. The visualisation snapshots show how, in a short space of time, one particular group has followed a trajectory that suggests it
contains a significant number of redundant cases. The group has grown in size but its coverage has not increased appreciably.
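
The trajectory described in the caption above (size grows while coverage stays flat) lends itself to a simple automatic check. The following is a hedged sketch rather than a description of CASCADE itself: the snapshot format, the function name `looks_redundant`, the `window` of snapshots and the `min_coverage_gain` tolerance are all hypothetical.

```python
def looks_redundant(history, window, min_coverage_gain):
    """Flag a group whose size grew but whose coverage did not.

    history: list of (size, coverage) snapshots for one group, oldest first.
    window: how many snapshots back to compare against (assumed parameter).
    min_coverage_gain: coverage increase below which growth counts as flat.
    """
    if len(history) <= window:
        return False  # not enough history to judge
    old_size, old_coverage = history[-window - 1]
    new_size, new_coverage = history[-1]
    grew = new_size > old_size
    flat = (new_coverage - old_coverage) < min_coverage_gain
    return grew and flat


# Hypothetical trajectories: one moving vertically only, one moving diagonally.
vertical = [(10, 4.0), (20, 4.1), (30, 4.1), (40, 4.2)]
diagonal = [(10, 4.0), (20, 7.0), (30, 9.5), (40, 12.0)]

flag_vertical = looks_redundant(vertical, window=3, min_coverage_gain=0.5)
flag_diagonal = looks_redundant(diagonal, window=3, min_coverage_gain=0.5)
```

A flagged group would then be a natural candidate for the author to inspect, for example by listing its non-footprint cases.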

4.6. Identifying Redundant Regions

During the authoring process it is possible (likely) that redundant cases will be added to the case base. Redundancy will manifest itself in the visualisation tool in an obvious way: as redundant cases are added to groups, the groups will increase in size but not in coverage. The author can easily identify these problematic groups by looking for large groups with small coverage values that, over time, move only in a vertical direction. Again, the visualisation tool supports the user by providing a mechanism for tracking redundant cases.

Figure 11(a–c) shows an example of this from the Property domain. One particular group is highlighted in all three snapshots. As cases are added, the group is seen to move from quadrant B to quadrant A. Group size is increasing but coverage is not, indicating that the new cases are not contributing to competence, i.e., they are falling outside the footprint of the group. Consequently, the cases are increasing redundancy in the case base. At this point the author can request further information about the group and act to reduce its redundancy by removing certain cases, or by replacing redundant cases with ones that do improve group coverage. Again, CASCADE supports the user during deletion by ranking cases according to their predicted competence values.

4.7. Identifying Outliers & Anomalous Cases

In Section 4.2 we mentioned that the groups are colour coded to reflect their age. In CASCADE, time is measured in terms of the number of cases that are added to the case base. Group age is a measure of how long a group has been present in the case base. When a group is created it is time-stamped with the current time, and its future age can be calculated as the difference between the current time and the creation time of the group.

Visualising the age of a group can be very useful for the case author, especially when it comes to identifying outlier or anomalous cases that may not be important enough to keep in the case base. By its very nature an outlier (or anomalous) case will be pivotal [17, 20], and it will be represented by a pivotal competence group at the origin of the competence visualisation graph. However, this is not a necessary and sufficient condition for recognising such unwanted cases. The problem is that many important cases will start out as pivotal groups and as such may be mistaken for outliers. This is where group age comes into play. The important thing to understand is that true outliers are unlikely to evolve into larger groups of cases; they will grow old as pivotal groups. Therefore it is possible to highlight outliers by directing the author’s attention to old pivotal groups. The author can examine these groups in the usual way and then decide whether or not to delete them from the case base. Of course, if the author decides that the groups are not outliers, then they can focus on developing them further by adding the appropriate cases to the case base.

4.8. Discussion & Limitations

To summarise, CASCADE introduces a new type of visualisation tool to assist knowledge engineers during

case authoring. The tool goes one step farther than traditional CBR authoring tools by providing the case author with a global picture of evolving case competence. In the preceding sections we have explained how the case author can interpret these visualisations, and how this interpretation can guide future authoring effort. In particular, there are a number of specific authoring problems that this competence visualisation can help to address, including:

1. Identify regions of the case base that may warrant additional cases in order to improve competence.
2. Identify regions of the case base that may be overpopulated with redundant cases that may need to be removed.
3. Identify outlier or anomalous cases.

At the present time the visualisation tool provides the case author with an animated window on the case base that facilitates increased interaction during the authoring process. This ability to visualise the evolving case base is the crucial step forward; it provides the user with potentially vital feedback during authoring.

Of course the present system does suffer from a number of shortcomings. Most significantly, the interpretation of the competence visualisation, and the actions taken on the basis of this interpretation, are ultimately left up to the case author. However, this too will change as the degree of automation increases in CASCADE. We are currently investigating techniques for automatically notifying the author when an anomaly is noticed. For example, if a given group has grown in size but not in competence for a certain period of time (or over a range of case additions) then the group can be highlighted and cases with low competence values can be recommended for deletion (see Section 5 for some possible ways forward).

5. Experimental Analysis

In this section we describe an empirical evaluation of the CASCADE visualisation and authoring support services. Resource limitations prevented us from conducting this experiment with real users, asking them to manipulate a case base with and without the aid of CASCADE, and then noting the competence of the resulting case base. Instead, we simulated the actions of virtual users as they added or deleted cases from a case base with (a) the aid of the CASCADE tool and (b) without any visualisation or authoring aid. To simulate the use of the CASCADE tool we allowed the virtual users to follow a set of instructions, when adding or deleting cases, which reflects the sort of decision support that CASCADE provides. For example, with the aid of the visualisation tool, an author can see whether adding a particular case increases the coverage of an existing competence group or causes existing competence groups to merge. Conversely, if the added case were a pivotal case the visualisation tool would show the formation of a new competence group. Therefore a virtual user was instructed to add cases that (a) become footprint cases of an existing competence group and hence increase the group’s coverage value, (b) cause group mergers, or (c) result in a new group being formed.

The CASCADE tool can also inform the author of redundant cases. This information is vital when deleting cases from a case base so as not to delete competence rich cases. We hypothesise that without the use of CASCADE, users are left in the dark as to which are the good and bad cases and hence follow an essentially random manipulation strategy.

Materials. Both experiments use the Travel and Residential Property data sets. As before we produced extra cases for each domain (near misses and duplicates) to give 2200 cases in the Travel domain and 900 cases in the Property domain.

5.1. Case Addition Experiment

Method. For each data set, two case bases were constructed using two case addition strategies. At regular intervals the competence of the developing case base was measured with respect to unseen target problems and case addition continued until 100% consistency with respect to the original training set was reached. The first case base was produced by adding cases at random, a strategy which we believe is a base-line indicator for a case author working without using CASCADE (realistically this probably represents the action of a novice case author). The second case base was built using a strategy that is supported by CASCADE. The competence groups of the seed case base were examined in increasing order of their coverage to size ratio. Each of the remaining cases was added to the case base if (a) it became a footprint case of the chosen competence group, (b) it caused the chosen competence group to merge with one or more neighbouring groups, or (c) it could not be solved by the existing case base. Each of these conditions is readily visualised using the competence visualisation tool

Figure 12. Competence vs. Case Base Size for the Travel (a) & (c) and Property (b) & (d) domains respectively. Graphs (a) & (b) show the
results of the ‘Addition Experiments’ (on Travel and Property domains respectively) and Graphs (c) & (d) show the results of the ‘Deletion Experiments’.
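
The CASCADE-guided deletion strategy evaluated in the deletion experiments (Section 5.2) removes the non-footprint cases of each competence group, starting with the largest group. A minimal sketch of that ordering is given below, assuming each group is already split into footprint and non-footprint cases; the dictionary layout, the case identifiers and the function name `deletion_order` are hypothetical and not taken from CASCADE.

```python
def deletion_order(groups):
    """Cases to delete, in order: non-footprint cases, largest group first.

    groups: dict mapping a group name to a dict with two lists,
    "footprint" and "non_footprint" (an assumed representation).
    Footprint cases are never proposed for deletion.
    """
    by_size = sorted(
        groups.values(),
        key=lambda g: len(g["footprint"]) + len(g["non_footprint"]),
        reverse=True,
    )
    order = []
    for group in by_size:
        order.extend(group["non_footprint"])
    return order


# Hypothetical competence groups with made-up case identifiers.
groups = {
    "small": {"footprint": ["s1"], "non_footprint": ["s2"]},
    "large": {"footprint": ["l1", "l2"], "non_footprint": ["l3", "l4", "l5"]},
    "pivot": {"footprint": ["p1"], "non_footprint": []},
}

to_delete = deletion_order(groups)
```

Deletion in this sketch stops once the list is exhausted, which corresponds to the stopping condition of removing all non-footprint cases; the random base-line would instead sample cases without regard to footprint membership.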

(see Section 4.5). The results were averaged over 20 runs for each of the domains.

Discussion. The results are shown in Fig. 12(a) and (b) as competence against case base size. The competence of the case bases grown at random grows quite slowly compared to the case bases grown using CASCADE’s addition strategy. In fact, the case bases grown using CASCADE’s addition strategy reach 95.5% and 94% competence for the Travel and Property domains respectively when the case bases grown at random are only at 78.5% and 63.6% competence respectively. This shows that CASCADE supports an addition strategy that improves case base competence with respect to size.

5.2. Case Deletion Experiment

Method. In this experiment we assess how the visualisation tool can help support the author during case deletion, and specifically, the removal of redundant cases from a case base. For each data set a case base was produced: 1000 cases for the Travel domain (from a total of 2200) and 700 cases for the Property domain (from a total of 900). A competence model was produced for each case base. We used two deletion strategies. The first simulated an author deleting cases without any advice (essentially at random). To simulate deletion with the visualisation tool we proposed a strategy that selected the low competence cases (i.e., the non-footprint cases) in each competence group for deletion, beginning with the largest group; these are the cases that CASCADE highlights as potentially redundant as part of its normal operation (see Section 4.6). The competence of the case base with respect to unseen targets was recorded at regular intervals and deletion stopped when all the non-footprint cases were removed. The results were then averaged over 20 runs.

Discussion. The results can be seen in Figure 12(c) and (d) as competence against case base size. When all the non-footprint cases have been removed, competence of the case base remains at near optimal levels. However, when removing cases at random to produce the same size case base, competence drops to 87.3% in the Travel domain and 82.9% in the Property domain. Clearly, when deleting cases from a case base the CASCADE system provides valuable guidance to an author as it can supply a list of the non-footprint cases in each
An Interactive Visualisation Tool 113

competence group, hence preventing the removal of of case competence that may form the basis for addi-
important cases from the case base. This shows that tional support facilities (e.g., identifying competence
CASCADE supports a deletion strategy that actively “holes” in the case base).
preserves case base competence.
Acknowledgments
6. Conclusions
The author’s gratefully acknowledge the support of
Case-based reasoning has many advantages to offer Enterprise Ireland through grant number SP/97/08. We
for building interactive reasoning and problem solv- would also like to thank our anonymous reviewers for
ing systems. In many CBR systems (industrial applica- the many constructive comments that have helped to
tions and research alike) the user is involved in differ- shape this paper.
ent stages of the CBR problem-solving cycle, usually
during the retrieval, adaptation or learning stages. Of References
course there is also a long history of user involvement
in the case authoring process, but, surprisingly perhaps, 1. A. Aamodt and E. Plaza, “Case-based reasoning: Foundational
little progress has been made in developing new ways issues, methodological variations, and system approaches,” AI
Communications, vol. 7, pp. 39–52, 1994.
of supporting the case author beyond the usual editing
2. J. Kolodner, Case-Based Reasoning, Morgan Kaufmann: San
and auditing facilities. Mateo, CA 1993.
In this article we have looked at the issue of case 3. D.B. Leake, Case-Based Reasoning: Experiences, Lessons, and
authoring support and we have described an innovative Future Directions, MIT Press: Cambridge, MA, 1996.
authoring environment called CASCADE. As well as 4. I. Watson and F. Marir, “Case-based reasoning: A review,” The
Knowledge Engineering Review, vol. 9, no. 4, pp. 355–381,
providing all of the usual interactive case editing fa-
1994.
cilities, CASCADE maintains a predictive model of 5. S. Craw, N. Wiratunga, N., and R. Rowe, “Case-based design
evolving case base competence, and uses this model as for tablet Formulation,” in Proc. Of the 4th European Workshop
the foundation for a powerful case base visualisation on Case-Based Reasoning, Dublin, Ireland, 1998, pp. 358–369.
tool that allows the author to monitor and interact with 6. M. Fagan and S. Corley, “CBR for the reuse of corporate SQL
knowledge,” in Proc. Of the 4th European Workshop on Case-
the developing case base.
Based Reasoning, Dublin, Ireland, 1998, pp. 382–391.
In the course of this article we have described how 7. D. Hinkle and C. Toomey, “Clavier: Applying case-based rea-
the competence model is computed, and we have shown soning to composite part fabrication,” in Proc. of the 6th Inno-
empirically that it correlates well with true competence vative Applications of Artificial Intelligence Conference, AAAI
under a range of conditions in two different publicly Press (MIT Press), Cambridge, MA, 1994, pp. 54–62.
8. P. Klahr, “Global case-based development and deployment,” in
available data sets. In addition, we have shown how this
Proc. Of the 3rd European Workshop on Case-Based Reasoning,
model allows the case base to be visualised in an in- Lausanne, Springer Verleg: Switzerland, 1996, pp. 519–530.
formative way. This visualisation facility provides the 9. E. Simoudis, “Using case-based reasoning for customer techni-
author with a dynamic visual record of the case base. It cal support,” IEEE Expert, vol. 7, no. 5, pp. 7–13, 1992.
serves to focus the author’s attention on certain regions 10. D.W. Aha, T. Maney, and L.A. Breslow, “Supporting dialogue
inferencing in conversational case-based reasoning,” in Proceed-
of the problem space so that competence may be max-
ings of the 4th European Workshop on Case-Based Reasoning,
imised by adding cases where they are most needed, Dublin, Ireland, 1998, pp. 262–273.
and removing cases when they are redundant. In short, 11. D.W. Aha and L.A. Breslow, “Refining conversational case li-
the competence model and visualisation tools facilitate braries,” in Proc. Int. Conf. Case-Based Reasoning, Providence,
directed interaction between author and case base. RI, Springer Verleg: USA, 1997, pp. 267–278.
12. K.D. Althoff, E. Auriol, R. Barletta, and M. Manag. A Review of
Future work will concentrate on developing addi-
Indstrial Case-Based Reasoning Tools. AI Intelligence, Oxford
tional authoring support facilities. Currently, the case University Press: Oxford, UK, 1995.
author is responsible for identifying which groups to 13. I. Watson, Applying Case-Based Reasoning: Techniques for En-
focus on during authoring. In the future we will in- terprise Systems, Morgan Kaufmann: Los Altos, 1996.
vestigate ways of automatically recognising so-called 14. A.G. Francis and A. Ram, “A comparative utility analysis of
Case-based reasoning and control-rule learning systems,” in
‘interesting’ groups, either because they may benefit
Proc. of AAAI Case-Based Reasoning Workshop, Seattle, USA,
from extra cases, or because they contain redundant 1994, pp. 36–41.
cases. We will also continue to develop our model 15. F. Heister and W. Wilke, “An architecture for maintaining
of case competence and to investigate new properties case-based reasoning systems,” in Proc. Of the 4th European

Dr. Barry Smyth is a lecturer in Artificial Intelligence in the Department of Computer Science at University College Dublin. Dr. Smyth specialises in the areas of Case-Based Reasoning and intelligent Internet Systems. He has published over 70 research papers, has won numerous international awards for his research, and has recently co-founded ChangingWorlds.com.

Elizabeth McKenna received her BSc from the Department of Computer Science at University College Dublin in 1998 and is currently completing her PhD in the area of Case-Based Reasoning and competence modeling. She has published her work in numerous international conferences and journals.