
PhD Thesis Proposal: Linking Design to Source Code using Design Rationale Graphs

Elisa Baniassad
Supervisor: Gail Murphy
Committee Members: Cristina Conati, Kris DeVolder, Norm Hutchinson

Abstract
As source code travels the software lifecycle, the reasoning behind certain design decisions is lost, often leading to violations of high-level constraints and requirements during program maintenance.

We present a model called a Design Rationale to Source Graph (DR-SG) that is a graph formed from Design Pattern documentation and linked to a source code base. The DR-SG allows developers to trace design concepts through design documentation down to code.

The thesis of this proposed research is that software engineers could more completely and confidently satisfy high-level design goals when performing software change tasks if provided with a semi-automatic technique for tracing from the high-level design goals, through design documentation, to source.

We will test software engineers' confidence and completeness in satisfying high-level design goals while performing software change tasks by conducting two studies: a controlled experiment and a set of industrial case studies.

Elisa Baniassad

Page 1

Thesis Proposal

Table of Contents
1 Introduction
  1.1 Example 1: Finding the Rationale behind Code
  1.2 Example 2: Learning How Responsiveness is Implemented
2 Design Rationale to Source Graphs
  2.1 Creating a DR-SG
    2.1.1 Step 1: Provide the Design Pattern Text
    2.1.2 Step 2. Provide a dictionary
    2.1.3 Step 3. Create the Source Model
    2.1.4 Step 4. Provide the Match File
    2.1.5 Step 5. Run the DR-SG Tool
  2.2 Using the DR-SG Tool
    2.2.1 Reading a DR-SG
    2.2.2 Creating a Sub-DR-SG by Querying
3 Issues to Resolve
  3.1 Making Use of More Formal Natural Language Mechanisms
  3.2 Pattern Mining to Help Infer Matches
  3.3 Accounting for Synonyms and Pronouns when Forming DRG Relationships
  3.4 Enhancing Readability of Sequences
  3.5 Showing the Context of Examples
  3.6 Reducing redundancy in the Graph by Forming Clusters
4 Studies and Experiments
  4.1 Pre-study: Determining Readability
  4.2 Controlled Experiment: Comparing Completeness
  4.3 Industrial Case Studies: Recording Confidence
5 Related Work
  5.1 Examining Code
  5.2 Finding Structure in Source
  5.3 Linking High-level Design Information to Source
    5.3.1 Using Pre-linked information
    5.3.2 Checking for Structure in Source
  5.4 Tracing Requirements to Source
  5.5 Linking from Within Documentation to Source
  5.6 Alternative Representations of Design Patterns
6 Contributions and Summary
7 References

Elisa Baniassad

Page 2

Thesis Proposal

1 Introduction
Changing software is a difficult task. While changing code, developers must try to respect the system's functional and non-functional goals. In his paper "Software Aging" [Parnas], Parnas describes a phenomenon he calls "ignorant surgery". Ignorant surgery occurs when software developers change code without adequate understanding of the design. Such changes lead to design inconsistency, and to code that is complex and difficult to maintain. Lehman and Belady [Lehman] noted that when changing software, developers inject hidden underlying design assumptions into it, thus increasing software complexity. In their study of code decay [Eick], Eick et al. noted several reasons why code becomes harder to change as its life progresses. One of the key reasons is the introduction of changes that violate the system's original design principles. To change a system without contributing to its decay, developers must understand the design goals that affect or are affected by each changed piece. If code is changed without accounting for all the effects, the system's design may be violated.

Software developers told us this during a study we conducted [concerns-study] about design concerns faced when changing software systems. The study reinforced what Parnas expressed so well: source code is a tangled mess of interacting design goals. In our study, we asked developers about their need to understand the design rationale context of the code they were changing. They expressed the desire to see which design goals affected a piece of code, and which design goals depended on a particular piece of code. Knowing how a piece of code fit structurally into a system was insufficient. Structural information, such as call information or class hierarchy information, shows programmatic dependencies, but does not reflect the rationale for a particular piece of code. Developers claimed that with design rationale context present, they would have been better able to predict the ramifications of changing a portion of source code.

[Parnas] and [Lehman] state that documentation of design structure, goals, and assumptions is necessary to ease the pain of software evolution. To better understand the reasoning behind the structure of a system, developers can examine the design documentation to look for rationale context. However, in another study we performed [DRG1], we noted that when reading design documentation, developers often lose track of the relevant design rationale. For instance, we asked developers to read the Visitor pattern [GOF] for the first time, and then answer a question central to the design: how does the Visitor pattern determine which operation is executed? After reading the documentation, none of the developers were able to answer the question correctly, or to link it back to the concept of double dispatch, a seemingly obvious connection. In examining the responses to this and other questions, we observed that to answer correctly and completely, the participants needed a way to follow a high-level design ideal through to the design details that carry it out. Tracing the rationale manually and mentally did not yield correct or complete results.

In short, when changing source code, developers must keep in mind all relevant design rationale. However, even with the appropriate documentation, this is difficult to do mentally. Software engineers could more confidently and completely satisfy high-level design goals when performing software change tasks if provided with a semiautomatic technique for tracing from the high-level design goals, through design documentation, to source.

To test this thesis, we propose the Design Rationale Graph (DRG), and the Design Rationale to Source Graph (DR-SG) tool. A DRG is a graphical representation of the text of design documentation. A DRG encodes and depicts all the relationships between design entities described in the design text, and allows developers to explore how design concepts are carried out. The DRG is semi-automatically linked to source code, creating a DR-SG. In the DR-SG, design entities from the DRG are shown pointing to portions of code that implement them. The DR-SG tool allows the developer to trace rationale from design goals to implementation details.

As the textual basis for the DRG we propose to use Design Pattern [GOF] documentation. Design Patterns are high-level abstractions of commonly used solutions to commonly encountered software design problems. The solutions they describe are general: they are not tied to one specific implementation. Design Patterns also describe the rationale for design decisions.

In the next chapter we discuss the proposed implementation of the DR-SG tool. Chapter 3 covers unresolved design and implementation issues. Chapter 4 describes the planned experiment and case studies using the DR-SG tool to test the thesis statement. Chapter 5 outlines related work. Chapter 6 concludes and summarizes the proposed approach. The timeline for the work is shown in Appendix A.

Before going into details, however, we provide two examples to illustrate how the DR-SG helps developers respect design goals when changing source code. In the first example, a developer wishes to alter a method in an implementation of the Visitor pattern. The developer uses the DR-SG tool to better understand the design context of the method. In the second example, a developer is interested in identifying portions of code related to a design solution outlined in the Reactor pattern [Schmidt et al]. In this case the DR-SG is used to trace the design goal from the description in the pattern through to examples in the source code.

1.1 Example 1: Finding the Rationale behind Code


In this example, a developer wishes to change a portion of code that implements the Visitor pattern. The code implements a simulation of patrons attending a theatre and an orchestra. The developer is interested in altering the behaviour of the admission method, and wonders why it calls the visitTheatre and visitOrchestra methods. The developer knows that this code implements the Visitor pattern, and so uses a DR-SG of the Visitor pattern to trace the implementation back to the general design context. This elucidates how, in the context of the Visitor pattern, the admission method could be altered.

The description of the Visitor pattern covers 14 pages of text. The pattern supports the selection of a method to execute based on both the type of the initial recipient of a message and the type of the sender of that message (the caller). It is a simple yet subtle design, which prescribes double dispatch as a solution.

To begin, the user creates a DR-SG of the Visitor pattern. The user supplies four things to the DR-SG tool to create the DR-SG: a dictionary of design elements, the Visitor pattern text, a list of mappings from the dictionary entities to source code entities, and a source model of the source code. The pattern text is extracted from an electronic copy of the Visitor pattern. The developer spends about 10 minutes compiling the dictionary of design elements. Design elements are typically operation and class names, but can also be design constructs, such as double dispatch. Obtaining a source model from the code takes about an hour. The developer has limited understanding of how the code corresponds to the Visitor pattern, but has noticed that one of the classes in the code is called visitor. The developer inserts the entry visitor < first line number of the visitor class > into the match file.


The full DR-SG representation of the Visitor pattern is large, comprising over 250 nodes and 400 edges. Examining the entire DR-SG is intractable. Instead, developers use a set of query operations to select relevant portions of the DR-SG. Queries are expressed as collections of regular expressions. The user can iteratively expand or delete parts of the graph through a series of queries.

In this example, the developer is interested in seeing how the admission, visitOrchestra and visitTheatre methods fit into the overall design of the Visitor pattern. First, the developer expands nodes corresponding to lines of code of interest, and produces an initial graph. Then, the developer expands the initial graph with the regular expression depend, as a means of querying high-level dependence of the methods. The results of these operations are shown in Figure 1.

Design elements and other nouns found in the pattern are represented in the DR-SG by rectangles, and verbs by ovals. The edges are labelled with phrases linking the nouns and verbs in pattern text sentences. The subject of a verb points into the verb; the object of the verb is pointed to by the verb. Table-like nodes display implementation-level information, in this case the implementation code.

Reading the DR-SG helps the developer understand how the methods of interest fit into the Visitor pattern: the visitTheatre and visitOrchestra methods both implement the visit operation and the admission method implements the accept operation. The DR-SG also exposes double dispatch as the design concept that links the visit and accept operations. The accept operation has an explicit link to the double dispatch node, and the type of both the visitor and concrete element are shown to influence which visit operation is called. The developer also sees that in the code there are two concrete elements: the orchestra and the theatre.

Now that the developer has a better idea of how the methods of interest fit into the overall design of the Visitor pattern, they begin to understand what they can and cannot change if they are to keep the double dispatch design goal intact.


Figure 1: DR-SG: Visitor implementation linked to design concepts


1.2 Example 2: Learning How Responsiveness is Implemented


In this example, a developer is interested in implementing a responsive client-server application. A good way to learn about such a task is to examine code, reusing design techniques that have worked in the past. In this case, the developer has a copy of code that implements the Reactor pattern [Schmidt et al]. The Reactor architectural pattern allows event-driven applications to handle service requests from one or more clients. One of the main goals of the Reactor pattern is to ensure server responsiveness. The developer wishes to explore the implementation, and reuse portions that pertain to server responsiveness.

By reading the first few paragraphs of the pattern, the developer learns that for a server to be responsive, it must block only when necessary. The developer tries to read the pattern to find more information about designing for minimal blocking. However, the Reactor pattern is longer and more complex than the Visitor pattern. It is 22 pages long, and contains many options for how to design and implement the solution. Wading through the details of the pattern is difficult, and the developer has trouble extracting design information about blocking.

To trace the design goal of minimal blocking, through the relevant portions of the Reactor pattern, and into the code that implements it, the developer creates a DR-SG of the Reactor pattern. As with the Visitor pattern example, the developer provides the DR-SG tool with a massaged text of the Reactor pattern, a dictionary of design elements, a source model of the relevant source files, and a mapping file of design elements to concrete code elements. In this case, however, the developer knows very little about the code, so leaves the mapping file empty. In such situations the DR-SG tool uses lexical matching mechanisms to link the DR-SG to the source.

To explore how blocking is implemented in the Reactor pattern, the developer queries using the regular expression block*. The DR-SG in Figure 2 is produced.

Figure 2: DR-SG of Blocking


This DR-SG shows three things. First, it shows that the Synchronous Event Demultiplexer (SED) is the only design entity mentioned in relation to blocking, and that it is commonly implemented by a select call. The DR-SG points to several such calls in the code. As in the Visitor example, portions of code that are linked to nodes in the DR-SG are shown in gray. The developer can choose to expand any of these items to see more of the code around them. Second, it shows that the SED blocks only when no events are queued at the handles. Third, it shows some sequential information. Diamond-shaped nodes in the graph denote sequences of events. The unexpanded 2 diamond on the right-hand side of the DR-SG indicates that there is more information about what happens after the SED blocks while waiting for events to occur.

To continue exploring the design, the developer can perform further expansions and queries. The developer may choose to expand all the design entities shown in this graph down to the code level, or may investigate each one separately. The developer might also choose to expand the sequence diamond in the graph to find out more about what happens after the SED has blocked. The developer can perform queries to determine the relationship of the SED to the server. The iterative exploration can continue until the developer is satisfied with their understanding of how to implement server responsiveness. This knowledge can be reused in other implementations.


2 Design Rationale to Source Graphs


Design Rationale to Source Graphs (DR-SGs) are intended to do two things. First, DR-SGs are intended to help developers understand informal design text, such as that found in Design Patterns, by semi-automatically structuring, amalgamating and graphically displaying the text. Second, DR-SGs provide a link between this structured design information and the concrete implementation at code level. In this chapter, we describe in more detail the process of creating and manipulating a DR-SG. The goal of this chapter is to show that the proposed approach is tractable, from both an implementation and a usability perspective.

2.1 Creating a DR-SG


Creating a DR-SG for a Design Pattern requires four inputs:

- The text comprising the pattern.
- A dictionary of design elements specific to the pattern. We define a design element as an entity, participant, or concept. Some examples of design elements are names used in the implementation, such as method names or class names, and concepts, such as double dispatch.
- A source model of the relevant code files, created by using the output of source code analysis tools such as Field [Reiss] or CIA [Chen et al].
- A match file, in which some of the design elements are mapped to concrete code elements.

2.1.1 Step 1: Provide the Design Pattern Text

Design Patterns were chosen as the textual basis for the DRG for several reasons. First, because Design Patterns describe a general solution rather than a specific implementation, their application and relevance is broader than most design documentation. Second, Design Patterns are based on observations of typical solutions to common problems. This means that developers often use them even when not designing with Design Patterns specifically in mind. Third, the documentation of Design Patterns in source code has been observed to have a positive effect on program comprehension [Prechelt and Unger]. Finally, and most practically, Design Patterns provide several important things for facilitating a link between the design and the source:

- Structural relationship information (containment, class hierarchies, and others)
- Temporal relationships ("First this, then that")
- Non structural ("A allows B to happen")
- Rationale: descriptions of reasoning behind design decisions

Providing the Design Pattern text is easy as it can be extracted from an electronic copy of the pattern. This extracted text requires one step of pre-processing by the DR-SG user before it can be input to the DR-SG tool: annotation of the text to include sequential information. The annotation involves adding the word "First" to the beginning of the first sentence in a set of steps, and the word "then" to the beginning of each subsequent sentence. Although this might appear onerous, it took less than 10 minutes to annotate the text of the Visitor pattern, and approximately 30 minutes for the Reactor pattern.
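The annotation convention can be illustrated with a short sketch. In the proposal this step is performed by hand by the DR-SG user; the function name `annotate_steps` is hypothetical and is not part of the DR-SG tool. The sketch assumes the user has already identified which sentences form one set of steps.

```python
def annotate_steps(sentences):
    """Prefix the first sentence of a set of steps with 'First'
    and every subsequent sentence with 'Then'.

    The sentences themselves are left unchanged.
    """
    annotated = []
    for index, sentence in enumerate(sentences):
        keyword = "First" if index == 0 else "Then"
        annotated.append(f"{keyword} {sentence}")
    return annotated


# Illustrative input only; these sentences are not quoted from a pattern.
steps = ["the client calls the accept operation.",
         "the accept operation calls the visit operation."]
for line in annotate_steps(steps):
    print(line)
```

Keeping the keywords at the very start of each sentence matters because the tool, as described later, detects them only at the beginning of a sentence.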


2.1.2 Step 2. Provide a dictionary

The user must next provide a dictionary of design elements. To help the user with this step, our tool presents the user with a list of nouns found in the pattern text. The user then peruses the list and selects the design elements. Among the noun phrases that were selected as design elements for Visitor were double dispatch, visitor, the accept operation, and concrete element. Nouns not chosen included key, meaning, class, and call. These were omitted because they should be allowed to appear more than once in the graph. Not all references to class, for example, should point to the same class node. The process of creating the dictionary took about five minutes for the Visitor pattern and about 20 minutes for the Reactor pattern.

2.1.3 Step 3. Create the Source Model

The source model provides the DR-SG tool with information about code, and acts as a basis for matching design entities to the corresponding portions of code. The source model is created by running source code analysis tools on the relevant files. The user of the DR-SG tool can choose any source model extraction mechanism they wish. The xrefdb tool, for instance, created by [Reiss], creates a database that provides adequate information for the source model construction. Xrefdb analyzes C and C++ code, and provides access to all the information the DR-SG tool uses to link the design to source. Source models for Java can also be created by accessing information provided by tools such as JikesBT [Charles99].

The source model should include structural information, and control transfer information. The format for the source model includes four relations:

    Inherits <subclass> <superclass>
    Member <class name> <member name> <file name> <line number>
    Subroutine <name> <file name> <line number> <argument list>
    Control transfer <call site line number> <call to line number>
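As a minimal sketch, the four relations above could be read into memory as follows. The proposal does not specify the concrete file syntax, so this sketch assumes one relation per line with whitespace-separated fields; the function name `parse_source_model` and the sample class, file, and line values are hypothetical.

```python
from collections import defaultdict


def parse_source_model(lines):
    """Group relation records by relation name.

    Assumes one record per line with whitespace-separated fields
    (an assumption; the proposal gives only the field order).
    """
    model = defaultdict(list)
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if fields[:2] == ["Control", "transfer"]:
            call_site, target = fields[2:4]
            model["control_transfer"].append((int(call_site), int(target)))
        elif fields[0] == "Inherits":
            subclass, superclass = fields[1:3]
            model["inherits"].append((subclass, superclass))
        elif fields[0] == "Member":
            cls, member, file_name, line = fields[1:5]
            model["member"].append((cls, member, file_name, int(line)))
        elif fields[0] == "Subroutine":
            name, file_name, line = fields[1:4]
            # Any remaining fields are treated as the argument list.
            model["subroutine"].append((name, file_name, int(line), fields[4:]))
        else:
            raise ValueError(f"unknown relation: {raw!r}")
    return model


# Hypothetical records for illustration.
sample = [
    "Inherits Theatre Venue",
    "Member Theatre admission theatre.cc 42",
    "Subroutine visitTheatre visitor.cc 17 Theatre*",
    "Control transfer 44 17",
]
model = parse_source_model(sample)
print(model["inherits"])
```

Because the result is a `defaultdict`, a relation kind that is missing from the input simply yields an empty list when looked up.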

Not all of these must be present for the DR-SG tool to perform the linkage between the DRG and the source. However, more information will lead to more complete analysis. For instance, if the source model contains only inherits relations, then no subroutines will be matched.

2.1.4 Step 4. Provide the Match File

To match the DRG to the source, the user provides a file that contains two columns: the first, containing design entities from the dictionary, the second containing their corresponding code element. The relationship between design entities and code entities can be many to many. If more than one portion of code relates to a design entity, then one entry in the matching file would be made for each portion. The user makes as many matches as possible, but not every design entity must be linked to a code entity. If the user is unable to make any matches, the DR-SG tool will search for lexical matches between the design entities and portions of source code. Matches between entities with similar names will be added to the initial match file.
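The two-column match file and the lexical fallback can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the proposal does not define what counts as "similar names", so the sketch uses case-insensitive substring containment after removing underscores. The function names and the sample code elements are hypothetical, and design entities are assumed to be written as single underscore-joined tokens.

```python
def parse_match_file(lines):
    """Read two-column entries: a design entity, then a code element."""
    matches = []
    for raw in lines:
        parts = raw.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            matches.append((parts[0], parts[1].strip()))
    return matches


def normalize(name):
    """Lower-case a name and drop underscores before comparison."""
    return name.replace("_", "").lower()


def lexical_matches(design_entities, code_elements):
    """Pair design entities with code elements whose names contain them.

    code_elements is a list of (name, line number) pairs. The result may
    be many to many: one entry is produced per matching pair.
    """
    found = []
    for entity in design_entities:
        key = normalize(entity)
        for code_name, line in code_elements:
            if key and key in normalize(code_name):
                found.append((entity, code_name, line))
    return found


# A user-supplied entry, then the fallback on hypothetical code elements.
print(parse_match_file(["visitor 12"]))
code = [("Visitor", 12), ("visitTheatre", 30), ("ConcreteElement", 55)]
print(lexical_matches(["visitor", "concrete_element"], code))
```

Substring containment is deliberately crude; a stricter or fuzzier notion of similarity could be substituted without changing the shape of the match list.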


The DR-SG tool uses this initial list to infer other matches between portions of the DRG and the code. The DR-SG tool examines the list of matches, and then checks the source model for further relationships that can be inferred from each match file entry. The DR-SG tool will track what it knows about each design entity in terms of several factors:

- Methods belonging to classes
- Inheritance relationships between classes
- Calling relationships between functions or methods
- Membership relations for classes

Then, it will compare what it has tracked from the text and the match file to what can be extracted from the source model. As other matches are found, they are inserted into the match file.

2.1.5 Step 5. Run the DR-SG Tool

Once the pattern text, the dictionary, the match file, and the source model are provided to the tool, a DR-SG can be created. In order to create the DR-SG, the tool analyzes the pattern text using a part-of-speech tagger called LTCHUNK [ltchk]. LTCHUNK differentiates between noun and verb phrases in the text. Each noun that has been identified as a design element in the dictionary is represented by one node in the DRG. Each instance of a noun that is not a design element introduces a new node into the graph, as does each instance of a verb phrase.

Sentences are processed one at a time. The process involves the setting of the source and destination node for each edge. The first node of each sentence automatically becomes the source node. Each subsequent node encountered is set as the destination node. The source node only changes if a new verb node is encountered. Phrases linking the noun and verb phrases are placed on the edges between them.
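The node and edge rules described above can be sketched in a few lines. This is a simplified illustration, not the tool's code: it assumes the chunker output marks noun phrases as `[[...]]` and verb phrases as `((...))`, and the function name `build_edges` and the `#n` suffix used to keep non-shared nodes distinct are hypothetical.

```python
import re

# Matches [[noun phrase]] (group 1) or ((verb phrase)) (group 2).
CHUNK = re.compile(r"\[\[(.+?)\]\]|\(\((.+?)\)\)")


def build_edges(sentences, design_elements):
    """Return (source, destination, label) edges for chunked sentences."""
    edges = []
    fresh = 0  # counter that keeps non-shared nodes distinct
    for sentence in sentences:
        source = None
        last_end = 0
        for match in CHUNK.finditer(sentence):
            noun, verb = match.group(1), match.group(2)
            # Text between two chunks becomes the label of the edge.
            link = sentence[last_end:match.start()].strip()
            last_end = match.end()
            if noun is not None and noun in design_elements:
                node = ("design", noun)  # one shared node per design element
            else:
                fresh += 1
                kind = "noun" if noun is not None else "verb"
                node = (kind, f"{noun or verb}#{fresh}")
            if source is None:
                source = node  # first node of the sentence is the source
            else:
                edges.append((source, node, link))
                if node[0] == "verb":
                    source = node  # source changes only at a new verb node
    return edges


sentence = "[[double_dispatch]] ((is)) the [[key]] to the [[visitor_pattern]]"
for edge in build_edges([sentence], {"double_dispatch", "visitor_pattern"}):
    print(edge)
```

With this rule the subject ends up pointing into the verb, and the verb points to everything that follows it until the next verb takes over as the source.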

The following list of sentences appears in the DRG shown in Figure 1. The first sentence shows the bracketed format output by LTCHUNK. The underscores were introduced during the dictionary pre-processing.

1. Paragraph 2, page 339: [[double_dispatch]] ((is)) the [[key]] to the [[visitor_pattern]] because [[the operation_]] ((executed)) ((depends)) on the type of the [[visitor_]] and the type of the [[concrete_element]].

2. End of page 338: Double dispatch means the operation that gets executed depends on the kind of visit request and the types of two receivers.

3. End of page 338: The accept operation is double dispatch because its meaning depends on both the type of the visitor and the type of the concrete element.

4. Middle of page 337: The visit operation that ends up getting called depends on both the type of the concrete element and the type of the visitor.

Sequential paths are also inserted at this time. When the keyword First is detected at the beginning of a sentence, a diamond-shaped node labelled FIRST is inserted into the DRG, and an edge is inserted from that node to the first verb node extracted from that sentence. Subsequently, when the keyword Then is encountered at the beginning of the next sentence, another diamond-shaped node is inserted, this time with the number 2 as its label. An edge is drawn from the FIRST node to the 2 node, and an edge is drawn from the 2 node to the first verb node extracted from the next sentence.

Our tool outputs the DR-SG in the AT&T graphviz format [dotty]. The graphviz (dotty) package can then be used to view the DR-SG.
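A minimal sketch of emitting such a graph as Graphviz DOT text is given below. The node shapes follow the conventions described in this proposal (rectangles for nouns and design elements, ovals for verbs, diamonds for sequence nodes); mapping the table-like code nodes to Graphviz's `record` shape is an assumption, as are the function names and node identifiers.

```python
# Shape per node kind; "record" for code nodes is an assumption.
SHAPES = {"noun": "box", "verb": "ellipse", "sequence": "diamond", "code": "record"}


def quote(text):
    """Wrap text in double quotes, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_dot(nodes, edges):
    """Render nodes {id: (kind, label)} and edges [(src, dst, label)] as DOT."""
    lines = ["digraph drsg {"]
    for node_id, (kind, label) in nodes.items():
        lines.append(f"  {quote(node_id)} [shape={SHAPES[kind]}, label={quote(label)}];")
    for source, target, label in edges:
        lines.append(f"  {quote(source)} -> {quote(target)} [label={quote(label)}];")
    lines.append("}")
    return "\n".join(lines)


# The FIRST / 2 sequence structure described above, with hypothetical ids.
nodes = {
    "seq1": ("sequence", "FIRST"),
    "seq2": ("sequence", "2"),
    "v1": ("verb", "calls"),
    "v2": ("verb", "calls"),
}
edges = [("seq1", "v1", ""), ("seq1", "seq2", ""), ("seq2", "v2", "")]
print(to_dot(nodes, edges))
```

The resulting text can be saved to a file and opened with a Graphviz viewer. Note that `record` labels give special meaning to characters such as `|`, `{` and `}`, so code labels containing them would need extra escaping.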

2.2 Using the DR-SG Tool


Once the DRG is created and linked to the source model it becomes a DR-SG, and the user will examine it to explore design rationale.

2.2.1 Reading a DR-SG

When viewing a DR-SG, the user can trace sentences from the original pattern text. Sentences generally start at a design element or noun node, then pass through a chain of verbs. These verbs may be attached to noun or design element nodes. When reading a verb node, all the edges leading to noun or design element nodes should be read before the edge to the next verb node. Sentences do not pass through noun or design element nodes. After reading such nodes, the reader returns to the previous verb node, and continues along the chain. When the user has reached a verb node with no verb nodes following it, they have reached the end of the sentence.

The subject of a verb has an edge pointing to the verb. To locate objects of a verb, the user follows the outgoing edges. For example, in Figure 1, the accept_operation node points to a calls verb node, hence it is the subject of the verb. The same calls node points to the visit_operation node, which is the object of that verb.
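The reading convention for subjects and objects can be expressed as a small lookup over the edge list. The function name `subjects_and_objects` is hypothetical, and the sketch treats node labels as identifiers for brevity. It also does not separate the next verb in a chain from the objects, which a fuller reader would need to do.

```python
def subjects_and_objects(edges, verb):
    """Return (subjects, objects) of a verb node.

    Subjects are the sources of edges pointing into the verb;
    objects are the targets of edges leaving the verb.
    """
    subjects = [source for source, target in edges if target == verb]
    objects = [target for source, target in edges if source == verb]
    return subjects, objects


# The accept/visit example from Figure 1, reduced to two edges.
edges = [("accept_operation", "calls"), ("calls", "visit_operation")]
print(subjects_and_objects(edges, "calls"))
```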

Often, it is helpful for the user to understand the ordering of sentences from the Design Pattern text. Sometimes, in Design Patterns, a set of sentences will describe a chain of events, or the concept of one thing allowing another to happen. For example, in Figure 5, the top diamond node labelled FIRST points to an unexpanded calls verb. The 2 node is read next, and points to another calls verb that links the accept operation to the visit operation. A third calls verb follows, which is also unexpanded.

Table-like nodes in the DR-SG contain location information for portions of the source code.

2.2.2 Creating a Sub-DR-SG by Querying

Even small patterns produce large graphs, so the user needs support in manipulating a DR-SG to produce a useful view. To help the user generate views pertinent to a concept of interest, we have provided regular expression based operations to expand or subtract portions of the graph. The expansion or subtraction can be with relation to the entire graph or to another sub-DR-SG.

For example, the user who expanded the DR-SG for the Visitor pattern based on the accept and visit operations would have asked for an expansion based on the regular expressions visit.*, accept.* and impl.* in order to get all related implementation nodes for the operations of interest.


Figure 3: Proposed User Interface for DRG tool

DRGs are created and manipulated through a user interface (shown in Figure 3) intended to make specification of regular expressions easier. First, the user specifies the search space for the regular expression. In Figure 3, the class traits DR-SG is selected. This selection means that any regular expression(s) specified will only be applied to the nodes and edges in the class traits DR-SG. In the bottom of the Select Context column, the list of regular expressions and operations used to create the selected DR-SG is shown.

Next, the user specifies the DR-SG to be modified. They can choose to edit an existing DR-SG, or create a new one. Once again, the list of operations and regular expressions used to create the selected DR-SG is shown in the bottom box.

The user now specifies the regular expression used to modify the selected DR-SG. To help with this, the user is presented a list of design entity and sequence nodes, and a list of categories of regular expressions. The user can select items from the list of nodes to be used in the regular expression. Similarly, the user can insert regular expression categories. The categories are stored lists of regular expressions. For instance, a category called concurrency may contain several regular expressions including concur*, synch*, and mutual*. The user also selects the types of nodes to which the regular expression should be applied. In Figure 3, the user has selected verbs only.

Finally, the user specifies which operation they would like to perform: expansion, subtraction or intersection. Intersection takes no regular expressions into account.
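The stored categories and the node-type filter could be modelled along the following lines. This is a sketch under stated assumptions: the `CATEGORIES` table, the node labels and types, and the `select` and `intersection` helpers are hypothetical, and patterns are assumed to be matched from the start of a label. Read as a regular expression, `concur*` means `concu` followed by any number of `r` characters, which still selects a label such as `concurrency_control` under that assumption.

```python
import re

# Hypothetical stored category: a name mapped to a list of regular expressions.
CATEGORIES = {"concurrency": ["concur*", "synch*", "mutual*"]}

# Hypothetical nodes: label -> node type.
nodes = {
    "synchronizes": "verb",
    "calls": "verb",
    "concurrency_control": "entity",
    "mutual_exclusion": "entity",
}


def select(nodes, category, node_types):
    """Labels of the chosen node types that match any expression in the category."""
    patterns = [re.compile(p) for p in CATEGORIES[category]]
    return {
        label
        for label, kind in nodes.items()
        if kind in node_types and any(p.match(label) for p in patterns)
    }


def intersection(first, second):
    """Intersection of two sub-DR-SGs; no regular expression is involved."""
    return set(first) & set(second)
```

Restricting the selection to verbs, as in Figure 3, would be written `select(nodes, "concurrency", {"verb"})`; widening `node_types` to include `"entity"` also picks up the entity nodes.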


3 Issues to Resolve
Previous chapters have described the proposed DR-SG tool. This chapter covers unresolved design and implementation issues. Some of the issues discussed are based on the results of a small exploratory study that was conducted to help us understand how using a DRG would help readers of patterns report design details.

3.1 Making Use of More Formal Natural Language Mechanisms


The intent of the DR-SG tool is not to provide the state of the art in natural-language-to-graph technology. Instead, we intend to apply the simplest natural language processing possible to associate design entities through verbs. However, we will monitor the correctness and usability of the DR-SG in terms of the need for more sophisticated mechanisms than those present. For instance, if it is determined that users need to be able to express queries in terms of precise parts of speech, more sophisticated NLP techniques may be necessary. Tools supporting the creation and querying of conceptual graphs [Sowa] may be useful for providing this functionality.

3.2 Pattern Mining to Help Infer Matches


Currently, there is no match analysis implemented in the DR-SG tool. We will investigate examining the source model to infer matches other than those supplied by the user by making use of pattern mining techniques. Pattern mining techniques perform structural [Prechelt and Kramer] and behavioural [Lange95] analysis of source code, and indicate where in the source Design Patterns are implemented. Applying such techniques to users' source code could help indicate matches between elements of the Design Pattern and highlighted portions of source.

3.3 Accounting for Synonyms and Pronouns when Forming DRG Relationships
Currently, synonyms and pronouns in the text must be massaged for the graph to be formed correctly. Pronouns in the DR-SG are interpreted as new noun nodes, and thus inserted into the graph without adequate linkage to the concept or entity to which they refer. Replacing the pronouns with appropriate concrete nouns can provide the proper link. Typically, this massaging is only necessary for pronouns at the beginning of sentences. If a pronoun appears in the middle of a sentence, it usually refers to a noun within that sentence. Since queries are generally returned with collections of complete sentences, we are assured that the pronoun will be attached to the concept or entity to which it refers.

The tool also lacks automatic support for synonyms. To ensure that all synonyms of a noun or concept are linked, it is necessary to replace all mentions with one common label. This ensures that all of the related paths will converge into one node. More sophisticated text analysis support and the inclusion of a synonym dictionary could help address these problems.
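The manual replacement step described above could be partly automated with a synonym dictionary. The following is a minimal sketch, assuming a hypothetical `SYNONYMS` mapping from variant phrases to a common label; it is not a feature of the current tool.

```python
import re

# Hypothetical synonym dictionary: variant phrase -> common label.
SYNONYMS = {
    "accept method": "accept operation",
    "accept function": "accept operation",
}


def normalise(sentence, synonyms):
    """Replace every variant with its common label before the graph is built."""
    # Longer variants are replaced first so overlapping phrases behave predictably.
    for variant in sorted(synonyms, key=len, reverse=True):
        pattern = r"\b" + re.escape(variant) + r"\b"
        sentence = re.sub(pattern, synonyms[variant], sentence, flags=re.IGNORECASE)
    return sentence
```

After normalisation, every mention resolves to the same label, so the related paths converge on a single node when the graph is formed.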


3.4 Enhancing Readability of Sequences


Through preliminary testing, we have determined that sequences in the DRG format are difficult to follow. The reason for the difficulty is that sequence nodes point to the first verb in a sentence, but not to all of the subsequent verbs. The DRG reader is supposed to read the entire sentence pointed to by the FIRST node before reading the sentence attached to the 2 node. For instance, in Figure 5, the second step in the uppermost sequence is not merely the call from the accept operation to the visit operation, but also that the call belongs to the visitor, and passes the concrete element as an argument.

A remedy is to eliminate diamond shaped nodes and instead, insert special edges to link each verb together. One drawback of this approach is that it would be difficult to indicate unexpanded sequence information. For example, in the very top of Figure 4, we can see that the user has not expanded the verb corresponding to the 2 node. This diamond would not be visible if sequence nodes were eliminated.

3.5 Showing the Context of Examples


Currently in a DRG, all information from the pattern text is represented in the same way, regardless of where it appears in the text. This approach allows the user to focus on all information relevant to a topic or entity. Although useful for most text, this approach can be problematic for text associated with examples of design entities.

At the beginning of the Visitor pattern, there is an example of a type checking system. The fact that the type checking related nodes occur in an example in the Design Pattern text is encoded in the DRG. However, if the user is looking at these nodes in a sub-DR-SG, the portion of text that states that these are example nodes is not necessarily going to be showing. Looking at type checking related nodes out of context may cause the user to assume that details about the type checking system are part of the general Visitor solution, when in fact they are specific to the example implementation.

One way to address this problem would be to show the nodes that stem from an example node differently, for instance, in a different colour. This visual cue would show the DR-SG user which nodes are associated with examples; the user could then perform further queries to draw in any larger design context desired.

3.6 Reducing redundancy in the Graph by Forming Clusters


Often, in pattern text, concepts are repeated. Translated into a DRG, this repetition results in duplicate edges and nodes. For example, as is shown in Figure 1, the central concept of the Visitor pattern (the double dispatching) is expressed at least three ways in the text.

When information appears more than once, users of the DR-SG sometimes assume that the nodes refer to different concepts. For example, DR-SG users, when seeing two references to the accept operation calling the visit operation (Figure 4), assumed that the two nodes referred to different calls, when, in fact, they refer to the same call. The DRG representation may be easier to read if equivalent nodes were merged into one, or if they were explicitly grouped together into a visual box, as shown in Figure 5.
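One way to picture the merging of equivalent nodes is to collapse repeated subject-verb-object statements while recording how often each occurred. In this sketch, equivalence is assumed to mean exact equality of the three labels; how the tool would actually decide that two nodes are equivalent is an open issue, and the triples below are hypothetical.

```python
from collections import Counter

# Hypothetical statements extracted from the pattern text.
triples = [
    ("accept_operation", "calls", "visit_operation"),
    ("accept_operation", "calls", "visit_operation"),
    ("visitor", "accesses", "concrete_element"),
]


def merge_duplicates(triples):
    """Collapse identical statements, keeping a count of how often each appeared."""
    return Counter(triples)


merged = merge_duplicates(triples)
```

The retained count could drive a visual cue, such as the gray clustering in Figure 5, so that the reader sees one call that is stated twice rather than two separate calls.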


Figure 4: Gray nodes indicate duplication

Figure 5: Gray nodes clustered to explicitly show duplication


4 Studies and Experiments


This chapter outlines the proposal for testing the thesis: software engineers could more confidently and completely satisfy high-level design goals, while performing software change tasks, if they are provided with a semi-automated technique for tracing design documentation to source. Two studies are planned to support or disprove the thesis statement. These studies involve determining whether programmers given the DR-SG tool are able to more completely and confidently satisfy high-level design goals when changing software.

The first study will be a controlled experiment [Basili1]. This study will be set up such that pairs of programmers will be given specific change tasks to perform, and their abilities evaluated in terms of completeness with respect to the number and relevance of high-level design goals considered. In this study, one group of programmers will be given the DR-SG tool to help with their tasks. Other programmers in the study will be given other tools to use, and others given no tool support. The results of the DR-SG group will be compared to the other programmers.

The second study will comprise a set of case studies [Yin] in an industrial setting, where software engineers will be given the DR-SG tool to use while performing software change tasks. Their feedback will be collected, and evaluated in terms of the effect of the DR-SG tool on their confidence in their ability to perform considerate change tasks.

4.1 Pre-study: Determining Readability


Before embarking on proving or disproving the thesis, we wished to ensure that the DRG tool¹ was effective in terms of allowing users to explore design rationale. To do this, we conducted a small exploratory study.

This study involved eight participants broken into two groups. The first group consisted of four software developers from Siemens AG: this group worked with the Reactor pattern. The second group consisted of four graduate students from the University of British Columbia (UBC): this group worked with the Visitor pattern.

We chose to use two different patterns to help reduce the likelihood that a problem in understanding was related to the way in which a particular pattern was written, or to the questions we chose to ask. The two patterns have different authors and are of differing size: the Visitor pattern is short yet subtle; the Reactor pattern is longer and more detailed.

In each of the two groups, two participants took part in control trials that involved only the pattern, and two participants took part in test trials that involved both the pattern and the DRG representation of the pattern. None of the participants were told the goal of the study.

In the UBC trials, all participants were given 20 minutes to read a hard-copy of the Visitor pattern. They were permitted to take notes as they read. Participants in the DRG trials were then given a tutorial on the DRG tool.

¹ The DRG tool, not the DR-SG tool, was used in this study. We wished only to test the users' ability to read and interpret the DRG format, not to perform any validation of the design to source linkage.


Then, the participants were asked to answer, as fully as possible, three questions:

1. What allows the Visitor to directly access the concrete element?
2. How is it determined which operation is executed?
3. What is the sequence of events that occur in the Visitor pattern?

In the DRG trials, participants were able to ask the experimenter to perform operations on a DRG; the participants were then presented with the resultant graphs. This approach was used due to limitations in the current DRG tool interface. DRG trial participants did not have access to the pattern text when answering the questions.

In the trials at Siemens, participants were given one hour to read a hard-copy of the Reactor pattern and to take notes as needed. As in the UBC trials, the DRG participants were then given a tutorial on the tool. The participants were then asked three questions:

1. What does the logging handler register with, and what does it register for?
2. About what does the synchronous event demultiplexer notify the initiation dispatcher?
3. What happens after a connection request arrives?

As before, DRG participants were able to request operations to be performed on DRGs and were able to view results. These participants did not have access to the pattern text when answering the three questions.

After the participants in both the Visitor and Reactor trials had responded to the three questions, they were asked follow-up questions. Participants in control trials were asked about their level of confidence in their answers to the questions, how they used the pattern text to reach their answers, and where they drew their answers from in the pattern text. Participants in DRG trials were asked four questions:

1. Did the graphs help you visualize design entities?
2. Did the graphs help you visualize relationships between entities?
3. Did the graphs help you feel more confident about your answers?
4. Would you choose to use this tool again?

We found the results of the study encouraging from three perspectives: DRG readability, support for detailed understanding of design concepts, and support for linking design context to design elements.

All participants in the DRG trials were able to read the graphs with relative ease, and were able to collect the information displayed in the graphs to fully answer the questions posed.

The DRG participants answered the questions in a more detailed way than those using the pattern because the DRG participants all examined each portion of the relevant graph containing the details before answering. The pattern participants, in contrast, referred specifically to only one portion of the text per question, and even then, they did not delve deeply enough in the text to draw out every relevant detail.


Finally, the DRG participants noted design concepts that provided context for the design elements involved in answering the questions. For instance, in the Reactor trials, only the DRG participants noted information about how a process blocks while awaiting arrival of events. This information helps ensure the concept of the responsiveness of servers to clients. In the case of Visitor, only the participants in the DRG trials connected the double dispatch concept to how the method to be executed is determined. In each case, the participants noted the relevant concept information only after seeing it connected to parts of the graph they were viewing.

The results of this study indicate that the DRG tool does help readers of Design Patterns trace design rationale through Design Pattern documentation. Thus, the DRG tool shows promise in helping us address the question of whether such a link helps developers performing software change tasks respect high-level design goals more confidently and completely.

4.2 Controlled Experiment: Comparing Completeness


We intend to conduct a controlled experiment [Basili2] of nested design [Pfleeger3] to determine whether programmers given a DR-SG are able to more completely satisfy high-level design goals when changing source code. The focus of this experiment is to measure how completely programmers report which high-level design goals are affected when performing change tasks.

The experimental design is shown in Table 1. This study compares the performance of users of the DR-SG tool to programmers using another tool that gives a design abstraction of source code, and programmers with no tool support. Comparison tools will be selected so as to represent reverse engineering tools, pattern finding/mining tools, and source code visualization tools. There will be one group of participants using the DR-SG tool, one group for each of the comparison tools, and one group given no tool support. To control for varying levels of programmer experience, we will use blocking [Pfleeger2]. In blocking, the pool of participants will be divided into two groups based on the assessment of their programming experience as either comfortable or expert. Participants in these groups will then be evenly and randomly assigned to a design understanding tool group. Participants will work in pairs, and will be asked to talk out their reasoning while performing the assigned tasks [Wildman]. There will be three pairs of participants in each of the groups. Subjects will not participate in more than one trial, as in unrelated between-subject experimental design [Pfleeger2]. The sessions will be videotaped.

Design Analysis Tool              Participants' Programming Experience
                                  Comfortable     Expert
DR-SG tool                        3 pairs         3 pairs
Reverse engineering tool          3 pairs         3 pairs
Pattern finding/mining tool       3 pairs         3 pairs
Source code visualization tool    3 pairs         3 pairs
No tool support                   3 pairs         3 pairs

Table 1: Experimental Design
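The blocked assignment summarised in Table 1 can be sketched as follows. The sketch assumes that pairs have already been formed within each experience block and that each block supplies exactly enough pairs to fill every tool group; the function name, the pair identifiers and the fixed seed are hypothetical and serve only to keep the example reproducible.

```python
import random

TOOLS = [
    "DR-SG tool",
    "Reverse engineering tool",
    "Pattern finding/mining tool",
    "Source code visualization tool",
    "No tool support",
]


def assign_pairs(pairs_by_block, tools, pairs_per_cell=3, seed=0):
    """Randomly and evenly assign the pairs of each experience block to tool groups."""
    rng = random.Random(seed)
    assignment = {}
    for block, pairs in pairs_by_block.items():
        if len(pairs) != len(tools) * pairs_per_cell:
            raise ValueError(f"block {block!r} needs {len(tools) * pairs_per_cell} pairs")
        shuffled = list(pairs)
        rng.shuffle(shuffled)  # randomise within the block only
        for i, tool in enumerate(tools):
            start = i * pairs_per_cell
            assignment[(block, tool)] = shuffled[start:start + pairs_per_cell]
    return assignment


# Hypothetical pair identifiers: 15 pairs per experience block.
pairs_by_block = {
    "comfortable": [f"C{i}" for i in range(15)],
    "expert": [f"E{i}" for i in range(15)],
}
assignment = assign_pairs(pairs_by_block, TOOLS)
```

Shuffling within each block, rather than across the whole pool, is what keeps experience level balanced across the five tool groups.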

The control variable in this experiment will be the tool available for supporting the software change task.

Participants will be asked to work with a code base that makes use of Design Patterns. All participants will be given a copy of the relevant Design Pattern documentation, and will be given time to read the documentation before beginning their tasks. Participants will be asked to perform two change tasks in which they are asked to alter the behaviour of a certain feature of the code base. The tasks will be set up such that changing certain portions of the feature will affect other high-level design goals that are introduced by the Design Pattern. These design goals may be functional or nonfunctional.

The results of the test group will be compared to the other groups in terms of their abilities to satisfy high-level design goals while performing code change tasks. This will involve determining:

- Whether they violated high-level design goals knowingly or unknowingly,
- Which high-level design goals they reported taking into consideration,
- Whether the design goals noted by the participants were relevant to the task².

Usability information will also be collected during this experiment. Problems the DR-SG users encounter will be carefully recorded, and may result in changes to the DR-SG tool or format before the industrial case studies begin.

4.3 Industrial Case Studies: Recording Confidence


We will conduct a series of case studies [Yin] to measure whether the presence of the DR-SG tool increases programmers' confidence in satisfying high-level design goals when performing change tasks.

Working with Siemens AG, we will ask software engineers to make use of the DR-SG tool while analyzing their own code. The participants in this study will be selected based on the kinds of tasks they are performing. Software engineers changing systems designed using Design Patterns will be recruited. If software engineers performing reuse tasks are found, then they will also be recruited as long as the systems they are working with have been designed with Design Patterns. In this case the thesis statement will be broadened to include reuse tasks.

Participants will be asked to report the design goals they are attempting to maintain while making changes to the code. Those performing software reuse tasks will be asked which portions of code they intend to reuse, and which high-level design goals they anticipate maintaining during this task.

The participants will be asked to use the DR-SG tool while performing their task. They will be allowed to use the tool for up to three weeks. This helps constrain the size of tasks being compared. The DR-SG tool will be instrumented to keep a log of all queries and DR-SGs created so as to enable us to trace the use of the tool. The investigator will keep copies of the participants' code for tracking and analysis. During the three weeks, the programmers will be interviewed every two days.

² This assessment would catch programmers who follow a formulaic way of attacking problems: always checking for certain design ideals regardless of the specific situation.

At each interview, participants will be asked to walk the investigator through the portions of code they have changed since their last interview. The participants will be asked questions specific to the changes they have made, such as "how did you know you could change this line of code?" and "what did you need to keep in mind when changing this method?". They will also be asked if the DR-SG revealed any design dependences they did not expect, or had not known beforehand. Participants will also be asked general questions about the DR-SG tool, including how they have used it, why they chose to use it in the way they did, and how it has affected their task. The participants will not be directly asked about their levels of confidence since this could easily taint their responses.

At the end of the three weeks, their responses will be analyzed in terms of their levels of confidence about maintaining design goals while performing their tasks. Confidence will be assessed by the level of certainty reported by the participants when deciding to make a particular change to their code.

It is possible that we will also be able to conduct a case study with the DOC group at Washington University. This group is currently porting the implementation of a CORBA Object Request Broker called ACE/TAO from C++ to Java and AspectJ. ACE/TAO was designed with heavy emphasis on the use of Design Patterns. We see the port from object-oriented code to aspect-oriented code as potentially involving many change and reuse tasks that involve Design Patterns.


5 Related Work
Developers can investigate design rationale by examining source code and design in many ways. In this chapter, we compare the Design Rationale to Source Graph approach to these approaches. Additionally, we compare mechanisms that seek to clarify nuances of Design Patterns with the approach taken by the DRG tool.

5.1 Examining Code


Developers can look for design intent in the code itself through information transparency. In information transparency, programmers encode design information into their code by means of naming conventions. Mechanisms, such as [Griswold], can then be used to search through the code base and expose portions pertaining to a particular concept. Concepts are seeded into the code one by one, and are best integrated at the time of program creation, which limits such a technique in terms of legacy system use. While retrieving the concepts can be lightweight, information transparency relies heavily on the willingness of the original designer to encode as many concerns as possible. Also, there is a risk that the conventions used to expose certain concepts will be violated when the system undergoes change. In the DR-SG approach, because the linking of concept to code is based on structural and behavioural information, adapting a DR-SG to a changed system means re-building the source model and re-seeding the match file. Since the DR-SG approach is designed to work with existing systems, it more automatically deals with changes to the underlying source code.

A widely known technique in which high-level information is integrated into the code itself is known as literate programming [Knuth]. In addition to the original, WEB, many literate programming tools are available. An example is CWEB [Knuth2], a pre-processor that supports literate programming in C and C++. In CWEB, programmers use mark-up language features to distinguish between different sections in their program. Each section contains a text part and a code part, or chunk. The code parts can be extracted so that the program can be compiled. The entire program, or web, can be typeset for easier reading. As with information transparency, literate programming presents a bottom-up approach. Because the description portions are separate from the code itself, it would be straightforward to document an existing code base; however, this would require significant knowledge, and should be left up to someone who understands the code. The DR-SG approach is intended to aid developers who do not have intimate knowledge of the code they are examining, while requiring nothing of the original developers. In this sense, applying the DR-SG approach to existing code is less involved. However, literate programming can be applied to any code base, whereas the DR-SG is limited to systems that apply Design Patterns.

5.2 Finding Structure in Source


Another way to investigate design intent is to extract structural information from source code. Cliché mining techniques such as GRASPR [Wills] look for algorithmic constructs in code. Clichés are common stereotypical structures resident in code. They are intended to describe algorithms and data structures, and are characterized by data and control flow information. GRASPR is able to recognize algorithmic computation such as list enumeration, event driven systems and binary searches, as well as data structures. A general problem with cliché mining techniques is that because they locate such basic constructs, extrapolating design intent and context is difficult.


Design Patterns are related to clichés but express higher-level concepts. Pattern mining techniques such as Pat [Prechelt and Kramer], SPOOL [Keller] and Program Explorer [Lange95] bridge the gap between source and design. They allow developers to automatically identify portions of code as belonging to a particular pattern. These mechanisms all function by locating entire patterns in source. The DR-SG offers more flexibility in that portions of the system's source can be mapped to a pattern description without the entire pattern implementation being present. The DR-SG also provides the added benefit of linking the portions of source code directly to the design description, thereby tying in the design intent. It is conceivable that where possible, the DR-SG could make use of pattern mining tools as mechanisms for matching the pattern documentation to portions of the code.

Developers can also extract structural and behavioural information from source without making use of mining techniques. Program visualization [Ball et al], [Price et al] involves providing the developer with a visual abstraction of certain features of source code based on the source itself. While some such tools extract and provide visualizations of performance related information [Walker et al], and some look at interaction patterns to help elucidate system behaviour [ISVIS], those most similar in motivation to the DR-SG approach, such as [Richner et al, Stasko et al], provide graphical models to help the developer understand design intent. Richner et al provide a mechanism for customized generation of views of the source code extracted from static and dynamic information about the source. Developers can query using set operations on programmatic constructs such as methods and classes, and in terms of relationships such as contains and invokes. We see this and similar techniques as complementary to the DR-SG approach. These mechanisms provide views of code that do not link to documentation. In terms of structural and behavioural analysis of code, these techniques are preferable to the DR-SG. In the DR-SG, code is modelled only in terms of how it fits into the context of the documentation. Developers cannot query on the structure of the code itself.

In automatic clustering, rather than providing mechanisms for querying existing software structure or behaviour, tools offer restructurings of the source code based upon the optimal grouping of similar features. In general, an assignment of code portions to clusters is based on call and variable dependence relationships. In some tools, such as BUNCH [Mancoridis], the clustering is assigned based on minimal call and variable dependence relationships between clusters, and maximal relationships within. In others, such as the Automatic Query Language (AQL) [Sartipi], clustering is assigned based on the best match to a target structure set by the developer. The model is described in terms of relationships such as "file includes library", "function calls function" and "function uses identifier". In general in automatic clustering, results are reported both in terms of module-to-module relationship views of the proposed structure, and also code level views. The DR-SG approach differs from these mechanisms in that it is not intended to directly help with a re-modularization of source code. By the same token, these techniques are not intended to support linkage to high-level design intent.

5.3 Linking High-level Design Information to Source


There are various techniques available whereby developers can seek out particular high-level design information in source. In general, developers can use higher-level information that has been linked into the source, or they can check for higher-level properties in the source.


5.3.1 Using Pre-linked Information

One mechanism directly intended to include design rationale information in code is the Synchronized refinement technique [Rubager]. This is a manual technique for refining source code until it is expressed as a collection of clichés. This is a laborious process, and still suffers from the same problem as the cliché mining techniques in that cliché information is too low level to be able to extrapolate high-level design intent.

Knowledge based techniques provide the developer with the ability to search for higher-level concepts involved in a software system. The techniques that are closest in motivation to the DR-SG are LaSSIE [Devanbu et al], where the knowledge base is built before the tool is used, and DESIRE [Biggerstaff], where knowledge is incrementally included in a domain model as developers use the tool.

LaSSIE attempts to combat design invisibility and complexity by allowing developers to query about actions and actors in the source code. For instance, in examining a private branch exchange phone system, a developer could ask "what actions by a bus controller are caused by an attendant". Such querying ability has been shown useful to developers trying to understand the design of a large software system; however, LaSSIE has some serious drawbacks. First, because the knowledge base only contains actors and actions, there is no way to encode context information such as "why is this action performed here" or "is this operation involved in more than one feature". The DR-SG approach has no such inherent limitation. However, whether such information will be present for a given portion of code depends on the presence of the information in the pattern text. Second, the knowledge base of actors and actions is built manually, which makes start-up time for use of the tool an issue. One of the key features of the DR-SG is that it takes relatively little time to set up and begin to use. This short ramp-up time offsets the fact that the information linked will not account for all the functionality of the code base.

A different take on elucidating high-level design information is that provided by DESIRE. DESIRE allows developers to query a code base for "human level concepts". For example, in a debugging system, a developer could ask to see all the portions of code related to the "set breakpoint command". In DESIRE, identification of a concept involves examining the typical features that characterize the concept, its relationship with other concepts in the domain, relevant domain knowledge (such as synonyms or nicknames for terms) and the syntactic or conceptual context likely to occur. Concepts are stored in and retrieved from a domain model. The knowledge in the domain model is built up incrementally as developers use the tool. This means that, for conceptual information to be readily available to a developer, it must have been recorded in the past. If it has not been, then the developer can use the code analysis facilities available (a program slicer, lexical searching, structural visualization) to help build up the concept anew. If a developer is not interested in taking the time to build up a concept, and the concept does not already exist in the domain model, then DESIRE is of little help. With the DR-SG, developers can browse from any high-level information to follow it through the design and, if matched to source, then also get pointers to relevant portions of code. Granted, if the concept is not included in the Design Pattern that was the basis for the DR-SG, then that conceptual information is not available. However, if the concept of interest is described in the pattern, then it is available "for free".

A new approach to including high-level design and rationale information is the Variorum tool [Chiueh et al]. This tool facilitates a multi-media walk-through of a code base to give the designer the opportunity to attach design rationale to the code. This provides the user with the ability to hear and see exactly what the designer was thinking when they created a portion of source code. Of course, the effectiveness of the tool depends on whether the original designer has taken the time, or had the forethought needed, to include such information. There is no guarantee that what is recorded will be of use in later explorations of the code. Because a DR-SG is linked to existing code bases, developers will only make use of DR-SGs that are useful to them. The DR-SG is obtained quickly and easily from the Design Pattern, and requires no additional work of the original designer. Additionally, querying in the Variorum tool is limited to the titles of video and audio entries, and to the textual annotations. The contents of the video and audio are not searchable, which means that the developer using the tool must actually watch or listen to an entire video or audio entry to access its contents. This means that, apart from being work for the original designer to set up, the tool also demands serious effort from the developer who uses it.

5.3.2 Checking for Structure in Source

When checking for the conformance to design in source code, developers can look for posited structure, or pattern compliance. As an example of checking for posited structure, we discuss the RM tool [Murphy et al], because it is both flexible in terms of the kinds of structures that can be addressed, and lightweight. We use PatternLint [Sefika et al] as a representative example of pattern checking tools and techniques.

Software Reflexion Models (RM tool) allow the developer to check the conformance of system code to intended structural properties. This mechanism is intended to help programmers understand high-level structural design information as opposed to functional design information. In this way, we see the structural checkers as complementary to the DR-SG approach, which does not provide functionality intended to help with structural understanding of the source, or conformance to a particular design model.
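
To make the kind of structural check concrete, the following is a minimal Python sketch of a reflexion-style comparison. It is an illustration under stated assumptions, not the RM tool itself: the function name `reflexion`, the entity names and the file names are all hypothetical, and the result categories (convergences, divergences, absences) follow the terminology of the reflexion model work [Murphy et al].

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def reflexion(
    high_level: Set[Edge],
    source_calls: Set[Edge],
    mapping: Dict[str, str],
) -> Dict[str, Set[Edge]]:
    """Compare a posited high-level model against edges lifted from source."""
    # Lift each source-level edge to the high-level entities it maps to.
    # Unmapped source entities and edges within one entity are ignored here.
    lifted = {
        (mapping[src], mapping[dst])
        for src, dst in source_calls
        if src in mapping and dst in mapping and mapping[src] != mapping[dst]
    }
    return {
        "convergences": high_level & lifted,  # posited and found in source
        "divergences": lifted - high_level,   # found in source, not posited
        "absences": high_level - lifted,      # posited, not found in source
    }


# Hypothetical example data (not taken from any real system).
model = {("UI", "Logic"), ("Logic", "Storage")}
calls = {("button.py", "rules.py"), ("button.py", "db.py")}
file_to_entity = {"button.py": "UI", "rules.py": "Logic", "db.py": "Storage"}

result = reflexion(model, calls, file_to_entity)
```

The sketch reports only where structure agrees or disagrees with the posited model; it carries no information about why an interaction exists, which is the kind of context the DR-SG approach is meant to supply.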

PatternLint is intended to facilitate checking whether a pattern is implemented correctly. Code is analyzed for information like calls and variable dependence between classes, and stored in a fact base. Facts describing structural features of patterns are also introduced. A developer may then compare the system against a pattern description using a series of conformance and non-conformance rules. The DR-SG is a less formal approach than this for linking pattern design information to source. For PatternLint to locate a pattern, all the conformance and non-conformance rules must be satisfied. Partial checking is not provided, whereas with the DR-SG approach, if portions of a source system relate to portions of a pattern, they can still be linked to the pattern description. Also, the use of Prolog makes PatternLint a heavier-weight technique in terms of preparation for use.
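
The difference between all-or-nothing conformance checking and partial linking can be sketched as follows. This is an illustrative Python sketch only: PatternLint itself expresses its rules in Prolog, the DR-SG tool's matching procedure is not reproduced here, and the fact tuples, class names and pattern sentences are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (relation, from_class, to_class)


def conforms(facts: Set[Fact], required: Set[Fact], forbidden: Set[Fact]) -> bool:
    """All-or-nothing check: every required fact holds and no forbidden fact holds."""
    return required <= facts and not (forbidden & facts)


def partial_links(facts: Set[Fact], described: Dict[Fact, str]) -> List[Tuple[Fact, str]]:
    """Looser linking: pair each fact found in the code with the text describing it."""
    return sorted((fact, text) for fact, text in described.items() if fact in facts)


# Hypothetical fact base extracted from a system that only partly follows a pattern.
code_facts = {("calls", "Subject", "Observer")}

required = {
    ("calls", "Subject", "Observer"),
    ("inherits", "ConcreteObserver", "Observer"),
}
forbidden = {("calls", "Observer", "ConcreteSubject")}

# Hypothetical mapping from structural facts to sentences of a pattern description.
described = {
    ("calls", "Subject", "Observer"): "The subject notifies its observers.",
    ("inherits", "ConcreteObserver", "Observer"): "Concrete observers implement the update interface.",
}

strict_result = conforms(code_facts, required, forbidden)
linked = partial_links(code_facts, described)
```

In this example the strict check fails because one required fact is missing, while the partial linking still connects the portion of the code that does relate to the pattern description.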

5.4 Tracing Requirements to Source


To link rationale for design choices into the source itself, developers may look to requirements traceability mechanisms. Such tools help developers relate various artefacts, including documentation about system requirements [Lindvall]. Only certain tools allow traceability through to source code. An example is the PegaSys system [Moriconi et al]. In PegaSys, a system's design is described by a hierarchy of pictures that show important design concepts such as dataflow, structure and module interconnection. These pictures are refined according to precise refinement rules, such as the requirement that refinements must add detail to an existing concept. The pictures are then connected to programs written in Ada. Text can also be associated with icons to add detail and explanation to a portion of the design. Icons denote predefined or user-defined concepts about dependencies in programs. Although the graphical refinement characteristics of PegaSys make it seem attractive as a way to encode design rationale, the technique requires significant up-front effort. As with the knowledge-based techniques, the usefulness of the tool for a particular source base is limited by how well the refinements have been constructed and annotated. The technique is geared towards use at the time the source base itself is constructed. The DR-SG approach is intended to be used on-demand with existing source code, and to demand limited effort.

5.5 Linking from Within Documentation to Source


Hypertext allows non-linear navigation of documentation. Some hypertext systems, such as CHIME [Devanbu99], allow linking within source code to support browsing from variable definitions to uses, and from a caller to a callee. Some, such as SODOS [Horowitz86], offer linkage between documents without linkage to source. The techniques most closely related to the DR-SG approach are those, such as SLEUTH [French et al], that link design documentation to source. In SLEUTH, links within the documents are generated when the author specifies filters, indicated by regular expressions. The links can point to files, or to anchors within documents. The within-document anchors must be inserted manually. There are several main differences between SLEUTH and the DR-SG technique. One of the main assumptions of SLEUTH is that documentation will be designed for hyper-linkage. In contrast, the DR-SG technique makes no such assumption. Another difference is the granularity of linkage to code. In SLEUTH, links to code are at the file level, whereas in the DR-SG, portions of the code are pointed to. This difference limits the degree of precision with which documents in SLEUTH can describe portions of the code. This is an inherent problem in SLEUTH, because it operates directly on source files, into which hypertext anchors must be inserted. In the DR-SG there is no inherent limitation on the granularity of links. This allows developers to query on very specific portions of the source code and, where possible, retrieve appropriate design context.
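
As a rough illustration of filter-driven link generation with file-level targets, consider the Python sketch below. It does not reproduce SLEUTH's actual filter syntax or implementation; the filter list, the file paths and the function name `generate_links` are assumptions made for the example.

```python
import re
from typing import List, Tuple

# Hypothetical filters: a regular expression paired with the file it links to.
FILTERS: List[Tuple[str, str]] = [
    (r"\bbus controller\b", "src/bus_controller.c"),
    (r"\battendant\b", "src/attendant.c"),
]


def generate_links(text: str, filters: List[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Return (offset, matched text, target file) for every filter match."""
    links = []
    for pattern, target in filters:
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            links.append((match.start(), match.group(0), target))
    return sorted(links)


doc = "The attendant places a call; the bus controller routes it."
links = generate_links(doc, FILTERS)
```

Every link produced this way targets a whole file, which is the granularity limitation discussed above: the documentation can point at a file, but not at the specific portion of code that realizes a design concept.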

5.6 Alternative Representations of Design Patterns


While the DR-SG is meant to link Design Pattern documentation to relevant portions of source, even when not linked to source, the DRG presents an alternative decomposition of the pattern text for the sake of clarification and exposure of design intent. Pattern formalization approaches such as LePUS [Eden et al], DisCo [Mikkonen] and the graphical approach by Lauder and Kent are similar in motivation to this. For instance, Lauder and Kent [Lauder et al] present a three-model approach to formalizing Design Pattern specification. The three models are: a role model, which is the most abstract representation of the pattern, a type model, which refines the role model, and the class model, which forms the concrete implementation. Through these models, they mean to describe patterns more precisely, and unambiguously. The three models represent layers of abstraction that do not restrict expressiveness. The visual nature of these models means that, though formal, these descriptions are easy to read. The formality of these specifications facilitates model checking and comparison in a way that is not intended by the DRG deconstruction. We see pattern formalization techniques as complementary to the DRG for two reasons. First, the DRG contains information that formal pattern descriptions do not: the reason for design decisions. The DRG allows developers to examine the Design Pattern from a non-functional point of view. Second, exploring a Design Pattern using a DRG can help a developer understand an existing pattern sufficiently to formalize it.

6 Contributions and Summary


This proposal has introduced the Design Rationale Graph (DRG) model and the Design Rationale to Source Graph (DR-SG) approach and tool, which provide developers with a mechanism to explore the relationships between design elements found in a Design Pattern and code. Existing tools do not appropriately address the need for the connection between design rationale and relevant portions of source code.

The DRG is a graphical representation tailored to the space of Design Patterns. It provides the readers of patterns a new way to understand and explore what they are reading. Readers are presented with a visualization of the design entities found in the text of the pattern, and can perform queries to shift their view to better explore concepts of interest.

The DR-SG links the Design Pattern to the code, thus enhancing the DRG by providing design context for portions of the implementation. Such context is important for understanding how design concepts manifest themselves in a software implementation, and for knowing the design reasoning behind portions of code.

This document has presented a proposal for the functionality of the DR-SG approach and tool, and a plan for proving or disproving the thesis statement.

7 References
[Ball et al]

Thomas Ball and Stephen G. Eick, "Software Visualization in the Large," IEEE Computer, Vol. 29(4), April 1996.

[Basili1]

Basili, V. (1993). The Experimental Paradigm in Software Engineering. In H. Dieter Rombach, V. R. Basili, & R. Selby (eds.), Experimental Software Engineering Issues: Critical Assessment and Future Directions. Proceedings of Dagstuhl-Workshop, September 1992, published by Springer-Verlag, Lecture Notes in Computer Science #706.

[Basili2]

The role of experimentation in software engineering: past, current, and future; Victor R. Basili; Proceedings of the 18th international conference on Software engineering, 1996, Pages 442-449

[Biggerstaff]

Biggerstaff, T.J., B.G. Mitbander, and D.E. Webster, Program Understanding and the Concept Assignment Problem. CACM, 1994. 37(5): p. 72-83

[Biggerstaff1]

T. J. Biggerstaff, J. C. Hoskins, and D. Webster. DESIRE: A System for Design Recovery. Technical Report STP-081-89, MCC Software Technology Program, May 1989.

[Biggerstaff2]

The concept assignment problem in program understanding; Ted J. Biggerstaff, Bharat G. Mitbander and Dallas Webster; Proceedings of the 15th international conference on Software Engineering, 1993, Pages 482-498

[Brusilovsky et al]

Brusilovsky, P., Pesin, L., & Zyryanov, M.: "Towards an adaptive hypermedia component for an intelligent learning environment"; In Bass, L.J., Gornostaev, J., & Unger, C. (Eds.), Human-Computer Interaction, Springer-Verlag, Berlin (1993) 348-358.

[Charles99]

Charles, P., and Shields, D. The Jikes project, 1999. Available at oss.software.ibm.com/developerworks/opensource/jikes/project

[Chaumun et al]

M. A. Chaumun, H. Kabaili, R. K. Keller and F. Lustman. A Change Impact Model for Changeability Assessment in Object-Oriented Software Systems. In Proceedings of the Third Euromicro Working Conference on Software Maintenance and Reengineering, pages 130-138, Amsterdam, The Netherlands, March 1999

[Chen et al]

Chen, Y.-F., Nishimoto, M. Y., and Ramamoorthy, C. V. 1990. The C information abstraction system. IEEE Trans. Softw. Eng. 16, 3 (Mar.), 325-334.

[Chiueh et al]

T. Chiueh; W. Wu, "Variorum: A Multimedia-Based Program Documentation System," TR-49, Experimental Computer Systems Laboratory, Computer Science Department, SUNY at Stony Brook, April 1998.

[concerns-study]

E. Baniassad, G. C. Murphy, C. Schwanninger, M. Kircher, Where are Programmers Faced with Concerns? Workshop on Advanced Separation of Concerns, OOPSLA, Minneapolis, Minnesota, USA, October 2000.

[Devanbu et al]

P. Devanbu, R. Brachman, P. Selfridge, and B. Ballard. Lassie: A knowledge-based software information system. In Proceedings of the 12th International Conference on Software Engineering, pages 249--261, Los Alamitos, CA, 1990. IEEE Computer Society Press

[Devanbu99]

Premkumar Devanbu, Yih-Farn Chen, E. Gansner, Hausi Muller, and J. Martin. CHIME: Customizable hyperlink insertion and maintenance engine for software engineering environments. In 21st International Conference on Software Engineering, May 1999

[dotty]

AT&T Inc., dotty: Graphviz, version 1.3, 1998. See http://www.research.att.com/sw/tools/graphviz

[DRG1]

E. Baniassad, G. C. Murphy, C. Schwanninger, Understanding Design Patterns Using Design Rationale Graphs, March 2000.

[Dumais et al]

Dumais, S., Furnas, G., and Landauer, T. Using latent semantic analysis to improve access to textual information. In Proceedings of Computer Human Interaction '88 (1988).

[Eden et al]

Eden A., Hirshfeld Y., Lundqvist K., LePUS --- Symbolic Logic Modeling of Object Oriented Architectures: A Case Study. NOSA '99 Second Nordic Workshop on Software Architecture, University of Karlskrona/Ronneby, Ronneby, Sweden, 1999

[Eick]

S. G. Eick, T. L. Graves, A. F. Karr, J. S. Marron, and A. Mockus, "Does code decay? Assessing the evidence from change management data," IEEE Transactions on Software Engineering, 1999

[French et al]

French, J.C.; Knight, J.C.; Powell, A.L., "Applying hypertext structures to software documentation," Information Processing & Management, vol.33, no.2, p. 219-31, March 1997.

[Girardi et al]

M. R. Girardi and B. Ibrahim. Automatic indexing of software artefacts. In Proceedings of 3rd. International Conference on Software Reuse, pages 24--32, Rio de Janeiro, Brazil, November 1994.

[GOF]

Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1995). Design Patterns : Elements of Reusable Object-Oriented Software. Reading, Mass.: Addison-Wesley.

[Griswold]

W.G. Griswold, Coping with Software Change Using Information Transparency, Tech. Report CS98-585, Dept. of Computer Science and Eng., Univ. of California, San Diego, 1998

[Horowitz86]

E. Horowitz and R.C. Williamson. SODOS: a software documentation support environment - its definition. IEEE Transactions on Software Engineering, 12(8):849-859, 1986

[ISVIS]

D. F. Jerding, ISVis, http://www.cc.gatech.edu/morale/tools/isvis/isvis.html

[Keller]

Keller, R. K., Knapen, G., Laguë, B., Robitaille, S., Saint-Denis, G., and Schauer, R., The SPOOL design repository: Architecture, schema, and mechanisms. In Hakan Erdogmus and Oryal Tanir, editors, Advances in Software Engineering. Topics in Evolution, Comprehension, and Evaluation. Springer-Verlag, 2000

[Knuth]

D. E. Knuth, "Literate programming," The Computer Journal, vol. 27, pp. 97--111, May 1984

[Knuth2]

Donald E. Knuth and Silvio Levy. The CWEB System of Structured Documentation (Addison-Wesley, 1994). ISBN 0-201-57569-8.

[Kraemer et al]

Kramer, C. and Prechelt, L. Design Recovery by Automated Search for Structural Design Patterns in Object-Oriented Software, Proceedings of WCRE 96, pp. 208-215

[Lange95]

D.B. Lange and Y. Nakamura. Interactive Visualization of Design Patterns Can Help in Framework Understanding. In Proceedings of the OOPSLA'95, ACM SIGPLAN Notices vol. 30, no. 10, October 1995, pages 342-356

[Lauder et al]

Lauder, A., Kent, S.: Precise Visual Specification of Design Patterns. In: Jul, E. (ed.): ECOOP'98 - Object-Oriented Programming, Lecture Notes in Computer Science, Vol. 1445. Springer-Verlag (1998) 114-134

[Lehman]

Belady, L. A. and Lehman, M. M.; A model of large program development. IBM Systems Journal 15, 3 (1976), 225-252.

[Lindig]

Lindig, C., and Snelting, G. Assessing modular structure of legacy code based on mathematical concept analysis. In 19th International Conference on Software Engineering, ICSE-19 (1997), ACM Press, pp. 349-359

[Lindvall]

Lindvall, M., A Study of Traceability in Object-Oriented Systems Development, Licentiate Thesis 462, Dep. of Computer and Information Science, Linköping University, Sweden 1994

[ltchk]

LTCHUNK: The Language Technology Group, http://www.ltg.ed.ac.uk/index.html.

[Mancoridis]

S. Mancoridis, B. Mitchell, C. Rorres, Y. Chen, and E. Gansner. Using automatic clustering to produce high-level system organizations of source code. In Proceedings of IWPC'98, pages 45--53, Ischia, Italy, 1998.

[Mikkonen]

Mikkonen, T., Formalizing Design Patterns. Proc. 20th Int. Conf. on Software Eng., IEEE Computer Society 1998, 115-124

[Moriconi et al]

PegaSys: a system for graphical explanation of program designs; Mark Moriconi and Dwight F Hare; Proceedings of the ACM SIGPLAN 85 symposium on Language issues in programming environments, 1985, Pages 148-160

[Murphy et al]

G. C. Murphy, D. Notkin, and K. Sullivan. Software reflexion models: Bridging the gap between source and high-level models. In Proceedings of the 3rd ACM SIGSOFT SFSE, pages 18--28, October 1995

[Parnas]

Software aging; David Lorge Parnas; Proceedings of the 16th international conference on Software engineering, 1994, Pages 279-287

[Passardiere et al]

de La Passardiere, B., & Dufresne, A.: "Adaptive navigational tools for educational hypermedia"; In Tomek, I. (Ed.) Computer Assisted Learning, Springer-Verlag, Berlin (1992) 555-567

[Pfleeger2]

S. Pfleeger. Design and analysis in software engineering. Part 2: How to Set Up an Experiment. ACM SIGSOFT Software Engineering Notes, 20(1):22--26, January 1995

[Pfleeger3]

S. Pfleeger. Design and analysis in software engineering. Part 4: Choosing an Experimental Design. ACM SIGSOFT Software Engineering Notes, 20(3):13--15, July 1995

[Prechelt and Kramer]

Lutz Prechelt, Christian Krämer. Functionality versus Practicality: Employing Existing Tools for Recovering Structural Design Patterns. Journal of Universal Computer Science (J.UCS), 4(12):866-882, December 1998

[Prechelt and Unger]

Lutz Prechelt, Barbara Unger and Michael Philippsen. Documenting Design Patterns in code eases program maintenance. In Proc. ICSE Workshop on Process Modeling and Empirical Studies of SW Evolution, pages 72--76, 1997

[Price et al]

B. A. Price, R. M. Baecker, and I. S. Small. A principled taxonomy of software visualisation. Journal of Visual Languages and Computing, 4(3):211--266, September 1993

[Reiss]

Reiss, S. 1995. The Field Programming Environment: A Friendly Integrated Environment for Learning and Development. Kluwer Academic Publishers, Hingham, MA.

[Richner et al]

T. Richner and S. Ducasse. Recovering high-level views of object-oriented applications from static and dynamic information. In Proceedings of the International Conference on Software Maintenance, pages 13--22, Oxford, England, 1999.

[Rubager]

S. Rugaber, "MORALE METHODOLOGY GUIDEBOOK: Methodology Guidebook for Synchronized Refinement," Georgia Institute of Technology, 1998

[Sartipi]

K. Sartipi, K. Kontogiannis, and F. Mavaddat. A pattern matching framework for software architecture recovery and restructuring. In 8th International Workshop on Program Comprehension (IWPC 2000), pages 37--47, Limerick, Ireland, June 10-11 2000. IEEE Computer Society.

[Schmidt et al]

D. Schmidt, M. Stal, H. Rohnert, F. Buschmann, "Pattern-Oriented Software Architecture - Patterns for Concurrent and Network Objects", John Wiley & Sons, 2000.

[Sefika et al]

M. Sefika, A. Sane, and R. Campbell. Monitoring compliance of a software system with its high-level design models. In Proceedings of ICSE-18, pages 387--396, 1996

[Siff]

Siff, M., and Reps, T. Identifying modules via concept analysis. In International Conference on Software Maintenance, ICSM97 (1997), IEEE Computer Society.

[SniFF]

SNiFF+. User's Guide and Reference, TakeFive Software, version 2.3. http://www.takefive.com, December 1996

[Sowa]

J. F. Sowa. Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, 1988

[Stasko et al]

Stasko, J.; Jerding, D. Using Visualization to Foster Object-Oriented Program Understanding. Atlanta, Georgia Institute of Technology, July 1994.

[Walker et al]

R. J. Walker, G. C. Murphy, B. Freeman-Benson, D. Wright, D. Swanson, and J. Isaak. Visualizing dynamic software system information through high-level models. In Proc. OOPSLA '98, pages 271--283, 1998

[Wildman]

Getting the most from paired-user testing; Daniel Wildman; interactions 2, 3 (Jul. 1995), Pages 21-27

[Wills]

L.M. Wills. Using attributed flow graph parsing to recognize clichés in programs. In Proceedings of the International Workshop on Graph Grammars and Their Application to Computer Science, pages 101--106, 1994.

[Yin]

Yin, R.K. (1989). Case Study Research, Design, and Methods. London: Sage Publications Ltd. Yukl, GA (1989). Leadership in Organizations. Englewood Cliffs, NJ: Prentice-Hall Inc

Appendix A.
Timeline to finishing thesis:

Months (chart columns): 05/01, 06/01, 07/01, 08/01, 09/01, 10/01, 11/01, 12/01, 01/02, 02/02, 03/02, 04/02, 05/02, 06/02, 07/02

Tasks (chart rows):
Defend thesis proposal
Link DRG to Source (DR-SG)
Improve the DRG Format
Conduct Controlled Experiment
Re-work DR-SG tool
Conduct Industrial Case Studies
Write Thesis
Defend Thesis
