

Proteus: A Software Reuse Library System that Supports Multiple Representation Methods
W. B. Frakes and T. P. Pole

Software Productivity Consortium

Herndon, VA 22070

ABSTRACT

Many methods for representing software components for reuse have been proposed.
These include traditional library and information science methods, knowledge based
methods, and hypertext. There has been no empirical evaluation of these methods,
and consequently there is no data about their relative costs and effectiveness. Proteus
is an experimental reuse library system that will be used to help gather such data.
Reuse library systems are usually tied to one method; Proteus supports multiple
methods.

1. INTRODUCTION

One of the many problem areas in software reuse concerns the representation of
reusable components. Specifically, how should one represent reusable software
components so that they can be found, understood, and perhaps automatically
synthesized? These components will be not only code, but also other software
lifecycle objects such as requirements, designs, test cases, and perhaps general
knowledge about software products and processes. The last few years have seen a
proliferation of methods proposed for representing reusable software, and the
concomitant development of systems to support those methods. These methods
are drawn from three major areas: library and information science, artificial
intelligence, and hypertext technology [FRA89, FRA90].

Proteus is a tool to support the empirical investigation of software reuse
representation methods. Specifically, it will support the use of multiple
representation methods for a single set of reusable components. It is thus different from
most reuse library tools, which are tied to one representation method. We plan to
use Proteus to represent collections of reusable components in different ways, and
then evaluate the relative costs and performance of the representations. Our plans
for these experiments are described in greater detail below.

2. EXPERIMENTAL EVALUATION OF REPRESENTATION METHODS USING PROTEUS

While many representations have been proposed for reuse, little empirical
evaluation of them has been done. Currently, the relative costs and effectiveness
of these representation methods for helping a user to find, understand, and
synthesize reusable components are unknown. Given this, there is a clear need for
empirical research to provide this information to software practitioners.

We are considering several evaluation methods. The first is simply to make
Proteus available to users and observe how frequently each representation
method is used. While unsatisfying from a scientific point of view, this method is
the de facto standard for evaluating software systems.

To experimentally evaluate these methods, we plan to select some of them from
the major categories of techniques: library and information science, artificial
intelligence, and hypertext. A set of reusable components will be represented
using the selected methods. Our current thinking is to use either the UNIX
toolset [EAR86] or the Booch Parts [BOO86], since these components are thought
to be reusable.

The first experimental method we are considering is 'known item' search. For this
approach, each subject will be given a list of descriptions for items in the database,
and be randomly assigned to one of the representation methods for searching. The
dependent variables will be the number of items correctly located, and the time it
takes to find the items.
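The known-item design reduces to a random assignment of subjects to methods plus two dependent variables per subject. The sketch below is hypothetical: the function names, the four method labels, and the use of balanced (round-robin) randomization are assumptions made for illustration, not part of the planned protocol.

```python
import random
from dataclasses import dataclass

# Assumed labels for the four representation methods currently in Proteus.
METHODS = ["keyword", "faceted", "enumerated", "attribute-value"]


def assign_subjects(subjects, methods=METHODS, seed=None):
    """Randomly assign each subject to one method, keeping group sizes balanced.

    Subjects are shuffled, then dealt round-robin, so group sizes differ by
    at most one (balancing is an assumption of this sketch).
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {s: methods[i % len(methods)] for i, s in enumerate(shuffled)}


@dataclass
class KnownItemResult:
    found: int       # number of target items correctly located
    seconds: float   # time taken to find the items


def score_session(targets, located, seconds):
    """Dependent variables for one subject: correct finds and elapsed time."""
    correct = len(set(targets) & set(located))
    return KnownItemResult(found=correct, seconds=seconds)
```

For example, a subject who was asked for three items, located two of them plus one wrong item, and took 312.5 seconds scores `found=2`.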

In another experimental method, subjects will be assigned the task of building a
small software system from a set of reusable components, and will be randomly
assigned one of the representation methods for finding and understanding the
components. The dependent variables for the experiment will be successful
completion of the task, time to complete, and perhaps some variables related to
attributes of the systems produced, such as efficiency or complexity. Information
about the subjects' perceptions of the methods will be collected via a survey.

2.1 PROBLEMS WITH THE EXPERIMENTS

The general difficulties of experimentation in both IR [SPA81] and software
engineering [CON86] are well known, and we will not discuss them here. The
experiments described above, however, have certain specific problems worth
mentioning, since they bear directly on the design of Proteus. The first is
simply the need for a tool powerful enough to support the representation methods
needed in the experiments. We have constructed Proteus so that it can support all
known representation methods and so that new representation methods can easily be
added. The internal design of Proteus, which supports this flexibility, is described
below.

Another problem with the experiments is the need to control extraneous
differences between the representation methods. One method may appear to be
better than another, for example, simply because it has a better user interface. We
have attempted to maintain a consistent interface look and feel across the
representations to minimize this problem. A similar problem might arise from the
different computational requirements of the methods. That is, one method might
appear to be better than another simply because it is better suited to the
underlying data manipulation methods in the system, which would make it faster or
appear somehow more natural. Proteus is designed to minimize this problem as
well.

3. PROTEUS OVERVIEW

Proteus currently supports four representation methods: keyword, faceted,
enumerated graph, and attribute-value. These were implemented first because
they are the most common methods used for software reuse. We now discuss how
they are implemented in Proteus.

Figure 1 is the first user screen in Proteus. The user selects one or more available
databases from the Databases window, and a representation/search method from
the Methods window. Figure 2 shows the interface for keyword search. The top
window displays messages to the user. The Keywords-in-database window displays
all of the keywords in the database. Below it is a menu of operators that a user can
select with a mouse to construct queries. The List-of-previous-queries window
displays the search history. The window below it displays the items retrieved by the
currently selected query. The window at the bottom is where the user enters
queries. Queries may use standard Boolean operators, in either prefix or infix
notation, and truncation.
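A Boolean keyword query with truncation can be evaluated as set operations over an index from part names to keywords. The following is a minimal sketch, not the Proteus implementation: it handles infix notation only (prefix notation is omitted), and the grammar, the precedence NOT > AND > OR, the trailing `*` truncation marker, and all names are assumptions.

```python
import re


def tokenize(query):
    # Parentheses are separate tokens; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", query)


class QueryParser:
    """Recursive-descent evaluator for infix Boolean keyword queries.

    Assumed grammar:  or  := and ("OR" and)*
                      and := not ("AND" not)*
                      not := "NOT" not | atom
                      atom := "(" or ")" | TERM
    """

    def __init__(self, index):
        self.index = index          # part name -> set of upper-case keywords
        self.universe = set(index)  # every part, used to evaluate NOT

    def parse(self, query):
        self.tokens = tokenize(query.upper())
        self.pos = 0
        result = self._or()
        if self.pos != len(self.tokens):
            raise ValueError(f"unexpected token {self.tokens[self.pos]!r}")
        return result

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _next(self):
        tok = self._peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return tok

    def _or(self):
        parts = self._and()
        while self._peek() == "OR":
            self._next()
            parts = parts | self._and()
        return parts

    def _and(self):
        parts = self._not()
        while self._peek() == "AND":
            self._next()
            parts = parts & self._not()
        return parts

    def _not(self):
        if self._peek() == "NOT":
            self._next()
            return self.universe - self._not()
        return self._atom()

    def _atom(self):
        tok = self._next()
        if tok == "(":
            parts = self._or()
            if self._next() != ")":
                raise ValueError("missing ')'")
            return parts
        return self._term(tok)

    def _term(self, term):
        # A trailing '*' means truncation: match any keyword with that prefix.
        if term.endswith("*"):
            stem = term[:-1]
            return {p for p, kws in self.index.items()
                    if any(k.startswith(stem) for k in kws)}
        return {p for p, kws in self.index.items() if term in kws}
```

With a small hypothetical index, `QueryParser(index).parse("ord* OR nth")` returns every part that has a keyword beginning with ORD, plus every part indexed under NTH.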

Figure 3 shows the interface for enumerated graph search. The top window
displays user messages. The Current-root-class window displays the node where
the user is currently positioned. Below that, the subclasses are displayed. To the
right are displayed the superclasses of the current class and the parts in the
current node. We plan to replace this interface with a purely graphical one.
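The navigation just described needs three lookups per node: subclasses, superclasses, and parts. Below is a hypothetical sketch of such a graph, assuming that a class may have more than one superclass (hence a graph rather than a tree). The `all_parts` method also shows one way to gather the parts of a class and all its descendant classes. The class names used with it are invented for illustration.

```python
from collections import defaultdict


class EnumeratedGraph:
    """Classes linked by subclass edges; a class may have several superclasses."""

    def __init__(self):
        self.subclasses = defaultdict(list)    # class -> direct subclasses
        self.superclasses = defaultdict(list)  # class -> direct superclasses
        self.parts = defaultdict(list)         # class -> parts classified directly under it

    def add_subclass(self, parent, child):
        self.subclasses[parent].append(child)
        self.superclasses[child].append(parent)

    def add_part(self, cls, part):
        self.parts[cls].append(part)

    def all_parts(self, cls):
        """Parts in cls and in every descendant class, each listed once (depth-first)."""
        seen_classes, seen_parts, result = set(), set(), []
        stack = [cls]
        while stack:
            current = stack.pop()
            if current in seen_classes:  # a class reachable by two paths is visited once
                continue
            seen_classes.add(current)
            for part in self.parts[current]:
                if part not in seen_parts:
                    seen_parts.add(part)
                    result.append(part)
            # Push subclasses in reverse so they are visited in insertion order.
            stack.extend(reversed(self.subclasses[current]))
        return result
```

Moving "down" the graph means looking up `subclasses[current]`, and moving "up" means looking up `superclasses[current]`. Each move is a single dictionary lookup.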

Figure 4 shows the interface to the faceted classification method. A spreadsheet
is a good way to represent and search a faceted classification, and this is the
approach we have used. The top window is for user messages, as in the other
methods. The next window contains the facet names, one per column. Below this are
the terms in each of the facets. The windows below the facets window display the
current facet name, facet term, and current part. Other windows provide the ability
to sort the terms within a facet, to center the spreadsheet on a given term, and to
retrieve information about the current part.
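In spreadsheet form, a faceted classification is a table with one row per part and one column per facet. The sketch below is a hypothetical illustration of the sort and retrieve operations just described. The table rows are a few that remain legible in the Figure 4 screenshot; the function names and the rule that an unselected facet acts as a wildcard are assumptions.

```python
# Facet columns and a few rows legible in the Figure 4 screenshot.
FACETS = ("ACTION", "FUNCTIONAL-AREA", "OBJECT")
TABLE = {
    "LORDER-RPD": ("SEARCH", "COMPILER", "LIBRARY"),
    "YACC-RPD": ("CONVERT", "COMPILER", "PROGRAM"),
    "JOIN-RPD": ("JOIN", "DATABASE-SYSTEM", "FILE"),
    "VI-RPD": ("UPDATE", "EDITOR", "FILE"),
    "SPELL-RPD": ("VERIFY", "EDITOR", "TEXT"),
    "TOUCH-RPD": ("UPDATE", "FILE-HANDLER", "TIME"),
}


def facet_terms(table, facet):
    """Sorted, de-duplicated terms used in one facet column."""
    col = FACETS.index(facet)
    return sorted({terms[col] for terms in table.values()})


def retrieve(table, selection):
    """Parts matching every selected facet term; unselected facets match anything."""
    wanted = {FACETS.index(f): t for f, t in selection.items()}
    return sorted(p for p, terms in table.items()
                  if all(terms[c] == t for c, t in wanted.items()))
```

For example, `retrieve(TABLE, {"ACTION": "UPDATE", "OBJECT": "FILE"})` narrows the two UPDATE parts down to `["VI-RPD"]`.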

Figure 5 shows the interface for the attribute-value method. The top window, as
before, is for user messages. The Attributes-available window lists all attributes
available for the selected database(s). The Values window lists all values for the
current attribute. A search is specified by selecting a set of attribute-value pairs.
In Figure 5, the pairs AUTHOR = W WOODFORD and LANGUAGE = ADA
have been selected. The Parts-that-match-pattern window shows that two parts
match the search.

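Pattern matching of this kind reduces to a conjunction: a part matches when it has every selected attribute-value pair. The sketch below is hypothetical. The part names and most values are invented, although the AUTHOR and LANGUAGE pairs echo the Figure 5 example. It also assumes each attribute has a single value per part.

```python
# Hypothetical parts; each maps attribute names to a single value.
PARTS = {
    "PART-A": {"AUTHOR": "W WOODFORD", "LANGUAGE": "ADA", "SIZE": "1200"},
    "PART-B": {"AUTHOR": "W WOODFORD", "LANGUAGE": "ADA"},
    "PART-C": {"AUTHOR": "W WOODFORD", "LANGUAGE": "FORTRAN"},
    "PART-D": {"AUTHOR": "RH SZUCH", "LANGUAGE": "ADA"},
}


def values_of(parts, attribute):
    """All values the database uses for one attribute (what a Values window lists)."""
    return sorted({attrs[attribute] for attrs in parts.values() if attribute in attrs})


def match_pattern(parts, pattern):
    """Names of parts whose attributes include every selected attribute-value pair."""
    return sorted(name for name, attrs in parts.items()
                  if all(attrs.get(a) == v for a, v in pattern.items()))
```

Selecting both pairs, as in Figure 5, returns the two parts that satisfy both: `match_pattern(PARTS, {"AUTHOR": "W WOODFORD", "LANGUAGE": "ADA"})`.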
4. DESIGN AND IMPLEMENTATION OF PROTEUS

To support these various methods, Proteus has been designed with a three-layer
architecture (see Figure 6). Software component data resides in the
Information Retrieval (IR) Layer, the lowest level of the Proteus
library system. The next layer up, the Reusable Functions (RF) Layer, consists of
methods to search, organize, change, and perform any other tasks required to
manipulate information stored in the IR layer. The third and highest layer is the
Library Representation (LR) Layer. The LR layer consists of application
programs that are independent implementations of library representations. Each
can access the same information about the components, but each represents
those components in a different way.
4.1 THE INFORMATION RETRIEVAL (IR) LAYER
The IR layer does not relate the components to each other; it only makes data
about each object available. Retrieving data about a part consists of asking that
part to provide data about itself. No data item of any object is easier to retrieve
than any other datum about any other object. Data about multiple components,
relationships between components, components with similar attributes, or
knowledge about anything beyond the part under examination is not available
to the part object. This encapsulates the task of retrieving component
data in the IR layer, and the tasks of relating components and data about them
in the RF layer.
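This division of labor can be illustrated with a part object that answers questions only about itself. Any relating of parts is then done outside the part, by asking each part in turn. The sketch is hypothetical: the `Part` class, its `get` and `attributes` methods, the `parts_sharing` helper, and the sample data are all invented for illustration.

```python
class Part:
    """IR-layer object: answers questions about itself and nothing else."""

    def __init__(self, name, **data):
        self._name = name
        self._data = dict(data)  # this part's own data items only

    def name(self):
        return self._name

    def get(self, attribute, default=None):
        # The only query a part supports: report one of its own data items.
        return self._data.get(attribute, default)

    def attributes(self):
        return sorted(self._data)


def parts_sharing(parts, attribute, value):
    """RF-style function: relating parts happens here, outside the IR objects."""
    return [p.name() for p in parts if p.get(attribute) == value]
```

Because no part holds references to other parts, a representation cannot get a built-in shortcut through the data. Each cross-part query costs the same sequence of individual interrogations.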
4.2 THE REUSABLE FUNCTIONS (RF) LAYER
All representations have certain requirements, and need a tool set of functions
that can be used to implement those requirements. All representations must, for
example, have functions that implement a user interface. The different tool sets must
all be able to add descriptions of components to the IR layer. Once a list of
candidate components has been found, it will need to be sorted and/or examined.
Proteus has a set of generic functions for these tasks.
By making these functions generally available, a tool set for a new representation
can be built in less time. By isolating them in a separate layer, and building them
from another set of reused lower-level functions (also in the RF layer), we can
minimize differences in the performance of different representations that are caused
by factors extrinsic to the representations.

Library Representation Layer: keyword search; search by class; attribute-value pattern-matching search; faceted classification
Reusable Functions Layer: sort-routines; user-menus; control-strategies; classification hierarchies; population routines; pattern-matchers
Information Retrieval Layer: descriptions of components, as instantiations of objects; methods are supplied only to interrogate a part about itself

[Figure 6] Three-Layered Architecture of Proteus

The LR implementations in the top layer are only some
examples of those possible, as are the functions in the RF layer. RF
components developed for one LR implementation can be reused by
another. LR implementations can be combined to allow representation
of components by multiple methods (e.g., a search that names
the keywords to search for in a specific class of components).
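The reuse pattern in Figure 6 can be sketched with ordinary functions. Two representations are built from the same generic RF filter and sort routines, and a combined representation runs a keyword search within one class of components, as in the figure's example. Everything here is hypothetical: the record layout (`name`, `class`, `keywords`), the function names, and the flat (non-hierarchical) class match.

```python
# Hypothetical component records: {"name": str, "class": str, "keywords": set}

# --- RF layer: generic functions shared by every representation ---
def select(parts, predicate):
    """Generic filter: the parts for which predicate(part) is true."""
    return [p for p in parts if predicate(p)]


def sort_names(names):
    """Generic sort routine used by every result window (case-insensitive)."""
    return sorted(names, key=str.upper)


# --- LR layer: representations assembled from the same RF functions ---
def keyword_search(parts, keyword):
    return sort_names(p["name"] for p in select(parts, lambda c: keyword in c["keywords"]))


def class_search(parts, cls):
    # Flat class match only; a real enumerated graph would include descendants.
    return sort_names(p["name"] for p in select(parts, lambda c: c["class"] == cls))


def keyword_in_class(parts, keyword, cls):
    """Combined representation: keyword search restricted to one class."""
    return sort_names(set(keyword_search(parts, keyword)) & set(class_search(parts, cls)))
```

Because every representation sorts and filters through the same two routines, a speed-up or slowdown in either one affects all representations equally.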
4.3 LIBRARY REPRESENTATION (LR) LAYER

Different representations are implemented in the top (LR) layer. The LR
implementations are built primarily from reusable components from the RF layer,
and they all use the IR layer for data storage and retrieval. Each representation
of the software components in the databases may also require services not
provided for previous representations. These new services are supplied by
functions that are added to the RF layer. These functions may be reused in new
representations, or by existing representations that would gain some
advantage from the service supplied by the new function.

4.4 ALLOW QUICK CONSTRUCTION OF NEW REPRESENTATIONS

Proteus is more than a collection of software component libraries with different
representations. Proteus is the set of tools and methods that allow those
representations to be implemented. The IR layer of Proteus supplies the
low-level functionality required to store information about the components and
the relations between them. The RF layer is a collection of software
components that supply the functionality to construct data structures, to sort
and search these structures, to alter them, and to display their status. After
the first representation was implemented, approximately two-thirds of the
functionality needed for the second was available for reuse in the RF layer.
When the second was implemented, an even larger share of the
requirements for the third was available in the RF layer. Each new representation
has required some new construction in the RF layer, and some future representation
implementations will require additions to both the RF and IR layers.
Representing components as they are used in existing systems, such as those
used by a domain-specific code generator, requires that the relationships among
components be treated as atomic data. This can be implemented in the IR layer
by defining components that represent the relationships among other
components. However, the larger the supply of RF components, the fewer
entirely new components (as opposed to old components adjusted to a new
implementation) will have to be produced for new representations.
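Treating a relationship as atomic data means that the relationship becomes a component in its own right. It answers questions about itself, just as any part does. The following hypothetical sketch shows one way to do this. The `Relationship` class, the relationship kinds, and the component names are invented, and the related parts are stored by name rather than by reference so that no component holds knowledge of other objects.

```python
class Relationship:
    """Hypothetical IR-layer component whose own data describes a relationship.

    Like any other part, it reports only its own data; it stores the *names*
    of the related parts, not references to the part objects themselves.
    """

    def __init__(self, kind, source, target):
        self._data = {"KIND": kind, "SOURCE": source, "TARGET": target}

    def get(self, attribute, default=None):
        return self._data.get(attribute, default)


def related(components, kind, source):
    """RF-style query: names of all parts that `source` relates to by `kind`."""
    return sorted(r.get("TARGET") for r in components
                  if r.get("KIND") == kind and r.get("SOURCE") == source)
```

A representation that needs "what does STACK use?" asks each relationship component about itself, the same way it would interrogate any part.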

4.5 MINIMIZE EXTRANEOUS DIFFERENCES AMONG METHODS

For the experiments to be valid, Proteus must not favor some representations over
others. The underlying services supplied to the different representations
must fulfill all of the requirements of each representation without favoring
any of them. For example, an enumerated class structure might be favored
by a database that stores the components in a tree structure.
To minimize the effects of the design and implementation of these services, care
has been taken to make all component information equally retrievable by all
representations. The subclasses of a class of components can be returned via a
function call with approximately the same delay as a function that returns all the
keywords of a group of components. The list of all components that are members
of a class, along with all members of that class's descendant classes, is available
with a delay equivalent to that of returning a list of all components that match a
two- or three-word Boolean keyword query. Under the object-oriented design
paradigm used in Proteus, information about an object is retrieved via a function
call, whether it is static data or the return value of a function that computes the
data. This helps to enforce the requirement that all types of information
concerning the databases be equally available. In the source code
that the representations use to access services, all data retrieval uses the
same syntax, no matter how the underlying services layer implements that service.
It is probably impossible to equalize all such services among all representations,
or even to devise a method of exhaustively testing for all such
service/performance/representation relationships. We have, however, attempted
to minimize the impact of such relationships between the services supplied and the
representations implemented.
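Uniform access means that the caller cannot tell stored data from computed data. In CLOS, a slot reader and a computed method are both invoked as generic functions, so the syntax is identical. In the Python sketch below, properties play the same role. The `Component` class, its attributes, and the non-blank-line definition of `size` are all hypothetical.

```python
class Component:
    """Hypothetical part: some data is stored, some is computed on request."""

    def __init__(self, name, keywords, source_lines):
        self._name = name
        self._keywords = tuple(keywords)
        self._source_lines = list(source_lines)

    @property
    def name(self):       # stored datum
        return self._name

    @property
    def keywords(self):   # stored datum
        return self._keywords

    @property
    def size(self):       # computed each time: number of non-blank source lines
        return sum(1 for line in self._source_lines if line.strip())
```

Code in a representation writes `c.size` exactly as it writes `c.name`. Changing how a service is implemented, from stored to computed or back, therefore does not change the representations that use it.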
We did some performance testing of the services supplied to representations,
especially services that showed up as performance bottlenecks. In some
cases where an underlying service could have been improved (e.g., in response
speed), but the improvement would have given one representation an advantage over
another, the "better" service implementation was not used. This project's
purpose is to compare representations, not to produce an optimized library search
and retrieval system.
4.6 COMMON LOOK AND FEEL

The success of Proteus as a vehicle for the evaluation experiments depends on
the quality and consistency of the user interface across representations. We used
joint application design (JAD) [WOO89] to design the major features of the
interface. In a JAD session, users and developers meet and design an interface using
some rapid prototyping method. We used SuperCard on a Macintosh, since it let us
create and change a window-based interface rapidly. Our user
group was drawn from several member companies of the Software Productivity
Consortium. JAD was very successful for this activity: we arrived at an
acceptable interface after one day.

The graphical user interfaces (GUIs) of the different representations are all built
from the same window-building toolkit, which was produced in support of the Proteus
project. It is an object-oriented set of tools that allows GUI windows to be built
quickly, and makes it easy to extend the services supplied by the classes of objects
available from the toolkit. This inherently produces interfaces with a common
"look and feel", as most interface objects in each representation belong to the same
common classes. For example, each representation must have an
interface that allows the user to examine the data available on a single component
once a likely component is found. All the representations use an identical window
for this purpose. Also, each interface supplies some information that is
appropriate to the database of components that the user has chosen (the keywords
defined for the database in a keyword-based representation, or the list of all sub-
or superclasses of a class in the enumerated class representation), and the window
objects that display these data in each representation are of the same class of GUI
object. The help windows attached to each representation's GUI objects
are also instances of the same class.
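The consistency argument depends on class reuse: if every representation builds its windows from the same toolkit classes, they share behavior (such as scrolling and layout) automatically. The sketch below illustrates this with text rendering in place of a real GUI. The `ListWindow` and `PartDetailWindow` classes and their methods are hypothetical. They do not model the actual Proteus toolkit.

```python
class ListWindow:
    """Hypothetical toolkit class: a titled, scrollable list used by every representation."""

    def __init__(self, title, height=5):
        self.title = title
        self.height = height  # number of visible rows
        self.items = []
        self.top = 0          # index of the first visible item

    def show(self, items):
        self.items, self.top = list(items), 0

    def scroll(self, lines):
        # Clamp so the window never scrolls past either end of the list.
        last_top = max(0, len(self.items) - self.height)
        self.top = max(0, min(self.top + lines, last_top))

    def render(self):
        visible = self.items[self.top:self.top + self.height]
        return "\n".join([f"[{self.title}]"] + visible)


class PartDetailWindow(ListWindow):
    """The one window every representation uses to examine a single part."""

    def __init__(self):
        super().__init__("Part description", height=10)

    def show_part(self, name, data):
        self.show([f"NAME: {name}"] + [f"{k}: {v}" for k, v in sorted(data.items())])
```

A keyword list, a subclass list, and a facet-term list would all be `ListWindow` instances, so they scroll and display identically whichever representation the subject is using.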

4.7 COMMON LOWER-LEVEL DATA STRUCTURES MINIMIZE DIFFERENCES

The lower-level data structures and the tools that manipulate these structures are
shared by all the representations. These services include retrieving data on
a component by that component's name, ordering textual or numerical
data for sorting, searching a data structure for a datum by matching some
search criteria to an aspect or attribute of the data, and presenting this
information to a user through GUI objects, as well as defining and customizing
these objects. As stated above, Proteus has been designed and implemented to
minimize the effect of the relationships between representations and underlying
services. Whenever feasible, underlying services have been implemented so that
no representation will gain an apparent advantage in functionality or usability
through its commonality or similarity with the implementation of the underlying services.
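One of the shared services listed above orders textual or numerical data for sorting. A single shared ordering key is one way to make every representation sort the same way. The following hypothetical sketch assumes that numbers sort before text and that text sorts case-insensitively. These rules are illustrative choices, not documented Proteus behavior.

```python
def order_key(value):
    """Shared ordering key: numbers first (numerically), then text (case-insensitively).

    The leading 0/1 keeps numbers and strings from ever being compared directly.
    """
    if isinstance(value, (int, float)):
        return (0, value, "")
    return (1, 0, str(value).upper())


def shared_sort(values):
    """The one sort routine every representation would call."""
    return sorted(values, key=order_key)
```

Routing every result list through `shared_sort` means no representation receives a differently ordered (and possibly easier to scan) list than another.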
5. CURRENT STATUS OF PROTEUS

Proteus currently consists of 9951 lines of Lisp (Lucid's implementation of the
proposed CLOS Lisp standard, version 4.0). It was developed on a TI Explorer and
an Apollo workstation. Approximately 1.5 person-years of effort have been
expended so far to build Proteus and populate its libraries. The development time
for Proteus has far exceeded our original estimate of 0.33 person-years. Much of
the delay is due to problems with the implementations of Lisp we used.

Proteus was developed on two platforms with major differences in operating
system, editors, and Lisp development environment. This was expected, but the
degree to which it would affect the development time was not. The Texas
Instruments Explorer I, used for the initial rapid prototyping of the interface and
the development of the underlying data structures of Proteus, uses a proprietary
operating system and a ZMACS editor (among others available), while the
Lucid/Domain Lisp on the Apollo was used with a Unix operating system and an
editor that combined features of the ZMACS and EMACS editors as well as the
Apollo Pad editor.
A Lisp environment is quite different from those for most other languages (the
Rational environment for Ada being one exception). Many tools, including the
editor, compiler, incremental compilers and interpreters, debuggers, and
directory editor, are all integrated. The ways that Apollo/Lucid have implemented
these are quite different from the TI version, with some functionality available in
one but not the other, and those that are commonly available using different
(sometimes contradictory) keystroke combinations.

The window toolkit in the Apollo/Lucid Lisp was also more difficult to work with,
for two reasons. First, the Apollo toolkit is defined functionally rather than with
an object-oriented design (OOD), as it is on the TI. To follow the design of Proteus,
and to reuse components between the representations, an OOD library of user
interface objects had to be defined and implemented. This brought about the
second problem: the Domain window toolkit environment was not robust. There were
many difficult-to-explain errors (some of which were fixed by patches from Apollo
once Apollo was made aware of the problems) and Lisp system crashes that were
difficult to reproduce, as they did not give any error messages. These crashes would
often crash the Unix shell process running the Lisp process as well, making the error
even more difficult to trace. Some of the errors that were found to cause this
behavior were as simple as a misspelled symbol name being evaluated during the
initialization of graphical objects. About 40% of the time spent building the
graphical user interface for Proteus was spent diagnosing errors in the Lisp
window toolkit, or repeatedly rebuilding images and rebooting systems
that exhibited errors that could not be traced because they were not reliably
repeatable.

The current Proteus libraries are as follows.

Library           Booch   Cosmic   Lisp   Unix   Total
Number of parts   434     75       58     120    687

The following table gives timings (in seconds) for Proteus operations. Empty cells
in the table indicate that no time is required for the operation. For example,
faceted and enumerated searches have essentially no retrieval time associated
with them because of the way they are structured. We feel that time differences
between operations for different representations are not so large as to bias the
experiments.

Operation   Keyword   Faceted   Enumerated   Attr-Value
Load        26        48        10           24
Search      11                               < 10
Display 29 24 24
6. FUTURE WORK

Future enhancements to Proteus will include other representations, including a
rule-based representation and perhaps other AI-based representations as well. A
hypertext representation would also be desirable. We also plan to add other
databases. One challenge has been to make Proteus efficient, a common
problem with Lisp-based systems. To create a production version of Proteus,
capable of handling databases in the megabyte range, a rewrite in a
production-oriented language would probably be required. C++ seems a likely
candidate, since it offers many of the object-oriented constructs used in
constructing Proteus.

7. CONCLUSIONS

In this paper, we described Proteus, a reuse library tool that allows multiple
representation methods to be used. Proteus will be used to empirically evaluate reuse
representation methods, so that cost/performance data about them can be used to
guide practitioners. Planned experiments were described. We have discussed how
the design of Proteus attempts to handle experimental problems, and have given
descriptive data about Proteus' execution speed and databases.

References

[BOO86] Booch, G. (1986). Software Components with Ada: Structures, Tools, and
Subsystems. Menlo Park, CA: Benjamin/Cummings.

[CON86] Conte, S., Dunsmore, H., and Shen, V. (1986). Software Engineering Metrics
and Models. Menlo Park, CA: Benjamin/Cummings.

[EAR86] Earhart, S. (Ed.) (1986). UNIX Programmer's Manual, Vol. 1: Commands and
Utilities. New York: Holt, Rinehart, and Winston.

[FRA89] Frakes, W., and Gandel, P. (1989). "Representation Methods for
Software Reuse." Proceedings of TRI-Ada '89. New York: ACM Press.

[FRA90] Frakes, W., and Gandel, P. (1990). "Representing Reusable Software."
Information and Software Technology, November 1990.

[KAT83] Katzer, J., McGill, M., Tessier, J., Frakes, W., and Das Gupta, P. (1983).
"A Study of the Overlaps among Document Representations." Information
Technology: Research and Development, January 1983.

[SPA81] Sparck Jones, K. (1981). Information Retrieval Experiment. London:
Butterworths.

[TRA88] Tracz, W. (1988). "Software Reuse Myths." In Tracz, W. (Ed.), Software
Reuse: Emerging Technology. Washington, DC: IEEE Computer Society Press.

[WOO89] Wood, J. (1989). Joint Application Design: How to Design Quality Systems
in 40% Less Time. New York: Wiley.
Figure 1. [Screenshot: Proteus opening screen. The user selects one or more databases (BOOCH-PART, COSMIC-PART, LISP-PART, UNIX-RPD-TOOL, UNIX-TOOL) and a search method (ATTRIBUTE-VALUE, ENUMERATED-GRAPH, FACETED-CLASSIFICATION, KEYWORD-2).]

Figure 2. [Screenshot: keyword search interface]

Figure 3. [Screenshot: enumerated graph search interface, with current root class COSMIC-PART and its subclasses listed]

Figure 4. [Screenshot: faceted classification spreadsheet with columns PARTNAME, ACTION, FUNCTIONAL-AREA, OBJECT]

Figure 5. [Screenshot: attribute-value pattern matching interface; pattern AUTHOR = W WOODFORD, LANGUAGE = ADA, with two matching parts]
