
Variorum: A Multimedia-Based Program Documentation System

Tzi-cker Chiueh Wei Wu

Computer Science Department State University of New York at Stony Brook Stony Brook, NY 11794-4400

chiueh@cs.sunysb.edu 516-632-8449(Tel)/8334(Fax)

Abstract
Conventional software documentation systems are mostly based on textual descriptions that explain or annotate the program's source code. Typically they also support interactive browsing of high-level control flows, and name-based searching of program primitives such as variable declarations and function definitions. Because these systems rely solely on text, it is difficult for program authors to describe overall algorithm structures and detailed implementation considerations of the programs in an interactive and flexible fashion. Variorum is a novel software documentation system that allows program authors to record the process of "walking through" their own code using multimedia technology, specifically, text, audio, and digital pen drawing. This approach greatly improves the interactivity and flexibility of the software documentation process. In addition, to broaden its applicability and to reduce the implementation complexity, Variorum is designed to inter-operate with WWW technology, in that the program source code files and their annotations are stored on web servers and directly accessible via commercial web browsers such as Microsoft's Internet Explorer. This paper describes the design and implementation of Variorum, as well as preliminary usage and performance experiences with the current prototype.

Category: Research Paper

Keywords: Multimedia, Program Understanding, Software Documentation, Hypertext, WWW, Digital Pen

1 Introduction

It is well known in the software industry that the maintenance cost of software products typically accounts for at least 50% of the total cost in their entire life cycle [1]. Field evidence also indicates that most of a programmer's time is actually spent reading programs rather than writing code [1]. The implication is that research on software development systems should shift from conventional edit-compile-link tools to what we call Program Understanding Support SYstems (PUSSY).

From a cognitive science point of view, understanding a program is essentially a reverse engineering process: recovering the logical design decisions from the source code implementation. Several research efforts [3] [4] [5] [2] on theorizing the program understanding process have been performed previously. One of the most important conclusions from these studies is that the program understanding process actually interleaves two subprocesses: module-level comprehension and structural understanding. Module-level comprehension concerns identification of the program's execution logic on a module by module basis. Typically a module is either a function or a procedure. Structural understanding, on the other hand, is about the discovery of inter-module relationships and the global control flow of software products. However, the distinction between these two subprocesses is by no means well-defined and indeed is often blurry in practice. Conventional inline comments or annotations work at a level even lower than the module level, and thus in themselves are not adequate to support program understanding effectively. Therefore the central component of a PUSSY is a software documentation system (SDS), which must provide, on the one hand, proper authoring support for software writers to organize and explain the mapping between the conceptual design and the physical implementation, and on the other, an easy-to-use interface for software readers to retrieve and assimilate various annotated information. In other words, an SDS should allow software writers, i.e., the programmers, to pass the software's underlying logic on to program readers, i.e., the maintainers, so that the latter can use this information to add new functionality to the programs or refine existing functionality.

One major challenge in developing an SDS is the reluctance that program writers have in documenting programs, especially in an industry where they are constantly pressured for "the next product shipping deadline." Unless documenting programs becomes at least as effortless as entering source code, it is difficult to convince software developers to document their programs at the level of detail useful for future software maintainers. The implication is that elaborate SDSs may be useful for composing richer software documents that are informative to program readers, but they could at the same time intimidate program writers to the extent that they choose not to write the documents at all! A good SDS must thus satisfy both the programmers' and maintainers' needs, by striking a well-engineered balance between these two seemingly conflicting requirements.

Variorum is a novel program documentation system that provides a simple and yet powerful paradigm for program annotation and comprehension, based on hypertext and multimedia technologies. The following design principles guide the development and implementation of Variorum:

- Integrate the documentation with the source code it annotates sufficiently tightly to ensure the proximity and consistency between annotation and code.
- Provide interactive digital pen drawing and speech input to annotate programs in the form of a detailed code walk-through as practiced in the software industry.
- Exploit commercial WWW browsers as browsing/navigation tools for the source code and its annotations, without changing the compilability of the programs. That is, modifications to the programs for annotation purposes should be transparent to the compiler.

The decision to maintain WWW inter-operability is especially important because it greatly simplifies Variorum's development effort by reusing the powerful capabilities built into existing web browsers, such as text search, plug-in extension, and so on.

The focus of this paper is on the system architecture and implementation details of Variorum, and the initial usage and performance experiences with the current prototype. Section 2 reviews previous work on software documentation systems to set the contributions of this work in perspective. Section 3 presents the authoring/playback interfaces and the system architecture of Variorum. Section 4 describes the implementation details of Variorum. Section 5 discusses the preliminary experiences we had with Variorum in terms of effectiveness and performance. Section 6 concludes this paper with a summary of the main features of Variorum and a brief outline of on-going work to improve the prototype.

2 Related Work

There were several attempts to determine the underlying mechanics of program understanding from empirical studies. Von Mayrhauser et al. [3] reported on the results of a software understanding field study and found that programmers work predominantly at the code and algorithmic levels, and frequently switch between levels of abstraction. Storey et al. [4] explore the question of how existing program understanding tools help or hinder real-world program understanding tasks. Klosch and Mittermeir [5] found that the high-level design patterns that software developers use or have to use when developing systems can serve to support the program understanding part of the reverse engineering process.

Several research projects proposed to use hypertext technology to organize the source code and other software documents, and several such systems have been built. Examples include SODOS [7, 8], LIGHT [12], HyperCode [13], and HyperCASE [9]. LaSSIE [10] used knowledge representation and reasoning technology to address the invisibility problem in developing large software systems. More recently, Antoniol et al. [6] described a comprehensive program understanding and maintenance environment called CANTO, which integrates fine-grained information with architectural views extracted from source code. SLEUTH [11] maintains software documentation as a hypertext with typed links for the purpose of browsing by users with varying needs.

These links are generated mechanically by the system and kept accurate under updates. Younger and Bennett [14] presented tools to record the knowledge gained by maintainers engaged in understanding an existing program. Soga et al. [15] is one of the few works that attempted to integrate multimedia with program understanding, although there is no system implementation described in the paper. None of the previous works on software documentation support systems attempts to record the authors' code walk-through sessions using multimedia technology, as in Variorum.

3 The Variorum Architecture

3.1 Rationale

The easiest way to understand a program is to sit side by side with the program's original author, to go over the program line by line, and explain both what the program does and why it is designed and implemented the way it is. In practice, this is rarely possible for various reasons. First of all, the author(s) may not be in the same geographical location or may even already have left the organization. Secondly, it is typically quite expensive to ask original authors to deviate from their current responsibilities whenever programs they previously wrote need to be modified. And last but not least, even if the author is present, chances are that s/he may not recall all the design and implementation details of the program, which sometimes are exactly what is needed to modify the software. This is especially true when a software product involves multiple authors.

The design goal of Variorum is to duplicate the ideal scenario for software maintainers without all the difficulties, by providing a multimedia recording and editing tool for program authors to explain how the programs work. Specifically, program authors can verbally describe the program's design and implementation, and graphically illustrate the algorithms or program structures by drawing figures on digital tablets, exactly the same way as they would have done if they were asked to work with software maintainers. Variorum transparently captures these annotation sessions and synchronizes them with the corresponding program segments. These multimedia recordings are logically embedded within the programs based on the hypertext model, and are shipped with the source code files as standard complements. Subsequently, software maintainers can access these annotations by clicking on the code segments of interest.

Code walk-through is a standard practice in the software industry to ensure a minimum level of code quality and correctness. The program author typically goes through his/her code in rather low-level detail to convince his/her peers that it is logically correct and can interface with other software modules successfully. Essentially the Variorum approach is to record the code walk-through session for program understanding, without using video cameras. Variorum chooses not to use video for the following reasons. First, the elimination of video cameras not only saves cost but also significantly reduces recording effort. Video recording requires users to constantly struggle with issues such as focusing or view control.

Second, most people still do not feel comfortable talking to cameras. By avoiding cameras altogether, users feel less intimidated and therefore may be able to make more effective use of Variorum's documentation facility. Third, the storage space requirement is greatly reduced without video, making the technology more affordable and thus more likely to succeed.

Therefore, to facilitate the document authoring process, Variorum chooses to support speech and digital pen drawing as the main media types. Speech input is a much more flexible interface than textual entry for program authors to convey sophisticated and subtle implementation details. Digital pen drawing is incorporated because, from our experience, using diagrams to illustrate complex algorithms and data structures is much more effective than lengthy textual exposition. Except for limited use of flowcharts or data-flow diagrams, most existing software documentation is in the form of either in-line textual comments or separate detailed textual descriptions. The most serious drawback associated with text-based program documentation is that it is difficult, if not impossible, to explain complicated algorithms or program behavior using only textual description. For example, try explaining the insertion and deletion operations of a simple doubly-linked cyclic list without drawing pictures. Consider how one explains the internal working of a 3D rendering algorithm without drawing pictures to illustrate the idea of shooting rays across the data set. Of course, one can always incorporate static figures to complement the textual description. But even this is not good enough, because static pictures usually do not convey the dynamics of an algorithm very well.

3.2 Authoring and Playback Interface

Although Variorum is designed for program documentation, it is applicable to all textual documents to which annotation is required. That is, users can use Variorum to annotate source code files, design specifications, test plans, etc. The following description, however, focuses only on source code files.

To annotate a program segment, users first define the scope of the annotation by highlighting the code segment. When a code segment is annotated, it is shown in a different color, red in the current implementation, to distinguish it from other un-annotated segments, which are shown in black. Variorum allows nested scoping: one can further annotate a code segment within another code segment that has already been annotated. Doubly and triply annotated code segments are shown in a third and a fourth color, blue and green, respectively. Nested scoping is useful, for example, to give an overview of what a procedure does and then to detail the implementation tricks used in each block of the procedure. To avoid confusion, Variorum does not allow crossed scoping. That is, one cannot annotate a code segment whose scope only partially overlaps with that of another annotated segment.

After the scope is defined, Variorum presents a separate window initialized to the program segment highlighted. On the same window, users can verbally explain, with the help of digital pens, the algorithmic and implementation details of the program segment. Variorum transparently records the speech inputs and the digital pen strokes, and stores them in a way that guarantees synchronized playback of these two streams.
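
The scoping rule described above (nested scopes allowed, crossed scopes rejected, one highlight color per nesting level) can be illustrated with a minimal Python sketch. This is not Variorum's code: the (first_line, last_line) scope representation, the function names, and the clamping of levels deeper than three to the last color are all assumptions made for illustration. Only the color names come from the text.

```python
from typing import List, Tuple

# Hypothetical representation: an annotation scope is an inclusive line range.
Scope = Tuple[int, int]

# Colors named in the text: black = un-annotated, then red, blue, green by depth.
LEVEL_COLORS = ["black", "red", "blue", "green"]


def crosses(a: Scope, b: Scope) -> bool:
    """True when two scopes overlap only partially (neither contains the other)."""
    overlap = a[0] <= b[1] and b[0] <= a[1]
    a_in_b = b[0] <= a[0] and a[1] <= b[1]
    b_in_a = a[0] <= b[0] and b[1] <= a[1]
    return overlap and not (a_in_b or b_in_a)


def can_annotate(existing: List[Scope], new: Scope) -> bool:
    """Nested and disjoint scopes are accepted; crossed scopes are rejected."""
    return not any(crosses(new, old) for old in existing)


def nesting_level(scopes: List[Scope], line: int) -> int:
    """Number of annotation scopes that enclose the given line."""
    return sum(1 for first, last in scopes if first <= line <= last)


def color_for(scopes: List[Scope], line: int) -> str:
    """Highlight color for a line; levels beyond the table reuse the last color
    (an assumption, since the text only names the first three levels)."""
    level = nesting_level(scopes, line)
    return LEVEL_COLORS[min(level, len(LEVEL_COLORS) - 1)]
```

For instance, with existing scopes (10, 40) and (15, 20), a new scope (16, 18) nests cleanly, whereas (18, 30) would be refused because it straddles the end of (15, 20).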

In addition, users can also type in textual annotations. Figure 1 shows an example screen snapshot of an annotation session for the "bubble sort" program.

Figure 1: An example snapshot of how programmers annotate a program segment that implements bubble sort, using the pen drawing to illustrate the dynamics of the sorting process, and which portions of the code are performing what specific functions, etc. The drawings can be in multiple colors, in this case, shown as different shades of grey.

When programs change, the corresponding annotations should be modified accordingly to reflect the changes. Also, the annotation process is iterative in nature, and multiple passes are typical to complete an annotation session. Therefore, the ability to edit annotations is essential. Variorum supports deletion and insertion of entire annotation sessions, as well as addition to existing annotations. Currently, Variorum does not support arbitrary deletion and insertion within recorded annotation sessions. It only allows users to erase trailing portions. Users can use fast forward/rewind to identify the cutoff point, beyond which everything in the annotation session is to be discarded, and start recording. The effect is that the previous contents of the portion of the annotation after the cutoff point are over-written.
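
The "erase trailing portion, then re-record" editing rule can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that a session is held as a list of fixed-length recording units and that the cutoff is rounded down to a unit boundary; the paper does not say at what granularity the cutoff is applied, and the function names are hypothetical.

```python
from typing import List

UNIT_SECONDS = 3  # assumed unit length; the paper's interleaving unit is 3 seconds


def erase_trailing(units: List[bytes], cutoff_seconds: float) -> List[bytes]:
    """Keep only the units that end at or before the cutoff point."""
    keep = int(cutoff_seconds // UNIT_SECONDS)
    return units[:keep]


def rerecord(units: List[bytes], cutoff_seconds: float,
             new_units: List[bytes]) -> List[bytes]:
    """Discard everything after the cutoff, then append the new recording."""
    return erase_trailing(units, cutoff_seconds) + list(new_units)
```

Because edits only ever touch the tail of a session, no unit before the cutoff needs to be rewritten or re-synchronized.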

Software maintainers access the source code files and annotations using commercial Web browsers such as Microsoft's Internet Explorer. An example snapshot is shown in Figure 2, where code segments at different nesting levels are displayed in different colors. By clicking on the underlined annotation links on the program text, users can access the corresponding annotations. Fast forward/rewind and pause/resume are provided to help navigate through the annotation sessions.

Figure 2: An example snapshot of the layout of an annotated source code section. Different colors are used to denote different nesting levels, in this case, different shades of grey.

4 Implementation

There are two ways to associate annotations with program segments. One is the overlay model, in which the scope of an annotation is defined in terms of the bounding box that encloses the target program segment. The overlaid contours and their associated annotations are stored as separate files from the source code.

The advantage of this approach is that no modifications to the source code are required. The disadvantage is that even small changes to the source code, e.g., inserting a blank line, may change the bounding box definitions and thus require modifications to the annotations. The other approach is the embedding model, as used in Variorum. In this model, links to annotations are inserted in the source code files, but source code changes do not necessarily lead to updates to annotations, unlike the overlay model. Even when annotation updates are required, the modifications are strictly local. To ensure that modified source code can go through the compiler successfully, the embedded annotation links should be commented out.

After users define the scope of an annotation by highlighting the code segment, Variorum first locates where the annotated code segment is in the source code file, and then inserts the following HTML statements to enclose the segment:

    //</xmp> <font color=red><a href="annotation-link.ann">Annotation</a> <xmp>
    Code segment being annotated
    // </xmp> </font> <xmp>

In this case, the annotation is stored in the file annotation-link.ann. Here // is C++'s comment-starting statement. Programs written in different programming languages may need different comment statements. As HTML ignores spaces and tabs, the <xmp> and </xmp> tags are used to preserve the original layout of the programs, such as line breaks and indentation, and <font> and <a> are HTML's primitives for choosing fonts and for hyperlinking, respectively. Because the names of source code files do not end with ".html" or ".htm", the statements //<html><body bgcolor=white> <xmp> and //</xmp> are put at the beginning and end of each source code file to ensure that the web browser interprets it as an HTML file.

Variorum also needs to determine the nesting level of the annotation, and use the proper color to highlight the annotated segment: red for the first level, blue for the second level, green for the third level, etc. Note that subsequent annotations may change the nesting level of previous annotations, because the former enclose the latter. In this case, the highlighting colors need to be modified accordingly.

Variorum supports textual, speech, and picture drawing annotations. Textual and speech annotations are relatively straightforward to capture. Digital pen drawings, on the other hand, are more challenging, because not only the final drawing but also the entire drawing process needs to be recorded. One possible approach is to record the drawing process at the image pixel level by dumping the contents of the entire frame buffer to the disk periodically. Although conceptually simple, Variorum does not take this approach because it does not allow recording of only selected window contents and it consumes too much storage space. Instead, Variorum records directly the coordinates of the pen strokes by sampling the serial interface periodically. This approach significantly reduces the storage requirement, but exhibits the disadvantage that the annotation window is unscrollable and thus limited in size, since window scrolling is not captured.
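
As a rough illustration of the embedding model, the following Python sketch wraps a range of source lines in the commented-out HTML statements quoted in this section, and adds the file-level header and trailer. It is a sketch under stated assumptions rather than Variorum's implementation: the function names, the 0-based inclusive line range, and the `comment` parameter (for languages whose comment marker is not `//`) are hypothetical.

```python
from typing import List


def embed_annotation(lines: List[str], first: int, last: int, link: str,
                     color: str = "red", comment: str = "//") -> List[str]:
    """Wrap lines[first..last] (0-based, inclusive) in commented-out HTML.

    Every inserted line starts with the comment marker, so the compiler
    ignores it, while a browser sees a colored hyperlink around the segment.
    """
    opening = (f'{comment}</xmp> <font color={color}>'
               f'<a href="{link}">Annotation</a> <xmp>')
    closing = f"{comment} </xmp> </font> <xmp>"
    return (lines[:first] + [opening] + lines[first:last + 1]
            + [closing] + lines[last + 1:])


def wrap_file(lines: List[str], comment: str = "//") -> List[str]:
    """Add the header and trailer that make a browser treat the file as HTML."""
    header = f"{comment}<html><body bgcolor=white> <xmp>"
    trailer = f"{comment}</xmp>"
    return [header] + lines + [trailer]
```

Since only comment lines are added, a later annotation update touches just the two marker lines around the affected segment, which is the locality property the embedding model relies on.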

The current implementation of Variorum samples and records the coordinates of pen strokes every 10 msec. It avoids storing duplicated coordinate values when the pen stands still by checking whether the current sampled coordinates are the same as the previous sample. For each pen movement, the X/Y coordinates, the current color, and the timestamp of pressing are recorded. Speech signals are digitized and compressed in real time using ADPCM without silence detection, at the rate of 32 Kbits/sec.

To ensure that the playbacks of the pen stroke stream and the speech stream of an annotation session are synchronized, the two streams are stored in the same file in an interleaved fashion. The current implementation uses 3 seconds as the interleaving unit. Each compressed voice data unit contains 3 seconds' worth of ADPCM data, or 12 KBytes. Each digital pen data unit contains the coordinates of wherever the pen traverses during the 3-second period. If the interleaving unit is too small, the disk efficiency of writing speech samples and pen coordinates to files may be low. If the interleaving unit is too large, the pen drawing playback may seem jumpy or discontinuous during fast forward. During playback, the speech part of an interleaving unit is sent to the sound card and played continuously, while the pen stroke part is drawn on the screen, according to the recorded timestamps.

The annotation file consists of three parts, i.e., a code section that contains the annotated program segment, a textual annotation section, and an array of interleaved compressed voice data and digital pen data units.

To facilitate the access to annotation files, Variorum supports fast forward/rewind of the pen drawing process. This feature is useful for both annotation authors and readers. When an annotation file is read in, the digital pen data units are stored in an in-memory array of lists, where each array element points to a list of pen movements within a particular 3-second period. Given this array, Variorum supports fast forward by drawing all the pen coordinates on the list pointed to by the array element corresponding to the current time, without regard to their timestamps, waiting for one second, drawing the list of pen coordinates pointed to by the next array element, waiting for another second, and so on, until the end. The speed of fast forward in this case is thus three times as fast as the normal playback speed.

Fast rewind is implemented slightly differently. Assume the current time corresponds to the N-th array element. Variorum supports fast rewind by drawing all the pen coordinate lists from the first array element to the N-th array element and waiting for one second, then drawing the pen coordinate lists from the first array element to the (N - 1)-th array element and waiting for another second, then drawing the pen coordinate lists from the first array element to the (N - 2)-th array element and waiting for yet another second, and so on, until the beginning of the array is reached. All the drawings run at full speed. The major advantage of this fast rewind implementation is its simplicity, compared to other implementations, in that it is not necessary to define the compensation operation for each pen-related operation in the list.
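
The array-of-lists structure and the two scan strategies can be sketched as follows. This Python sketch is illustrative only: it models a pen sample as a (timestamp, x, y) tuple (color is omitted), uses 0-based unit indices where the text counts from the first element, and returns the sequence of "frames" to draw instead of actually drawing and sleeping for one second between frames. All names are hypothetical.

```python
from typing import List, Tuple

UNIT_SECONDS = 3                 # interleaving unit stated in the text
Sample = Tuple[float, int, int]  # (timestamp in seconds, x, y)


def bucket_by_unit(samples: List[Sample],
                   unit: float = UNIT_SECONDS) -> List[List[Sample]]:
    """Build the in-memory array of lists: element i holds the pen
    movements recorded during the i-th 3-second period."""
    if not samples:
        return []
    n_units = int(max(t for t, _, _ in samples) // unit) + 1
    units: List[List[Sample]] = [[] for _ in range(n_units)]
    for s in samples:
        units[int(s[0] // unit)].append(s)
    return units


def fast_forward_frames(units: List[List[Sample]],
                        current: int) -> List[List[Sample]]:
    """One frame per second of wall-clock time: each frame is one whole
    unit, drawn at full speed and ignoring per-sample timestamps."""
    return [units[i] for i in range(current, len(units))]


def fast_rewind_frames(units: List[List[Sample]],
                       current: int) -> List[List[Sample]]:
    """Each frame redraws everything from the first unit up to one unit
    earlier than the previous frame, so no per-operation undo is needed."""
    frames = []
    for last in range(current, -1, -1):
        frames.append([s for unit in units[:last + 1] for s in unit])
    return frames
```

Showing one 3-second unit per second is what yields the threefold fast-forward speed; rewind trades redundant redrawing for not having to invert operations such as color changes.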

Consider, for example, how to rewind a "change pen color" or "erase" operation. One potential disadvantage is that the fast rewind speed may be slow when the entire drawing requires significant time to re-display. Fortunately, our measurement results show that this is not the case: it takes less than 0.5 second to draw up to 3,000 pen coordinates.

The current Variorum prototype is implemented on the Windows NT 4.0 platform using Microsoft's Visual C++ 5.0. It includes a separate authoring and playback tool. The digital pen and tablet is from Wacom Inc., and supports the Wintab API for device and pen control. In the authoring tool, voice data is captured through a microphone and sound card, and stored in the Waveform audio format. ADPCM is the standard compression method supported by Microsoft's voice API. Annotation files are stored as files on a Web server with a ".ann" file name extension. Web browsers are augmented with the play-back tool as a plug-in extension application, which can display all .ann files, and includes support for stop/resume and fast forward/rewind. Modifications to source code files are transparent to the compiler, but at the same time preserve the original code layout when displayed via commercial Web browsers.

5 Experience

Our initial usage experience with Variorum is quite positive in that its interactive multimedia presentation indeed significantly improves the comprehensibility of several short programs that we tested, such as bubble sort. In particular, the fact that with the help of digital pen drawing, users can easily visualize complex data structures, such as multi-dimensional B trees, and the dynamics of the algorithm, greatly facilitates the understanding of the overall structure and the detailed mechanics of the programs under study.

The prototype's performance is generally acceptable. However, the storage requirement for digital voice, as it is implemented now, is excessive, and has adversely affected the startup latency for playback. At 32 Kbits/sec, it takes 7.2 MBytes for a 30-minute recording. It is possible to reduce this requirement by an order of magnitude, through a combination of silence detection and advanced speech compression algorithms such as CELP. The storage cost for the digital pen stroke stream, on the other hand, is insignificant.

The effectiveness of Variorum critically depends on individual authors' annotation style. Just as different people may give qualitatively different presentations when using the same presentation authoring tool, Variorum alone provides no guarantee on the quality of the software document, as far as program maintenance is concerned. That is, if program authors fail to exploit the multimedia annotation mechanisms provided by Variorum to convey the internal working of a software product, the value of the resulting software documentation in helping software maintainers to understand and modify the programs is limited. One of the key future research issues in the Variorum project is to devise a set of concrete guidelines to help programmers better document their code using Variorum.
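
The voice storage figures follow directly from the stated ADPCM rate, and the arithmetic is easy to check. The short Python sketch below assumes K = 1000 (so 32 Kbits/sec is 32,000 bits per second) and takes the "order of magnitude" reduction literally as a factor of ten; the constant and function names are illustrative.

```python
ADPCM_BITS_PER_SECOND = 32_000  # 32 Kbits/sec as stated, assuming K = 1000
UNIT_SECONDS = 3                # interleaving unit


def voice_bytes(seconds: int) -> int:
    """Bytes of compressed speech produced in the given number of seconds."""
    return ADPCM_BITS_PER_SECOND // 8 * seconds


unit_bytes = voice_bytes(UNIT_SECONDS)    # 12,000 bytes, i.e. 12 KBytes per unit
half_hour_bytes = voice_bytes(30 * 60)    # 7,200,000 bytes for 30 minutes
tenfold_reduced = half_hour_bytes // 10   # 720,000 bytes after a 10x reduction
```

At 4,000 bytes per second, speech dominates the size of an annotation file, which is consistent with the observation that startup latency suffers when whole files must be fetched before playback.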

The current Variorum prototype exhibits several weaknesses that can be further improved. First, annotation files are played back not in a streaming format, but in a store and forward fashion. When the size of annotation files is large, the store and forward approach may not be acceptable. Second, the annotation window is unscrollable and thus restricts the spatial extent of the annotations. For complicated annotations, this restriction may pose a serious limitation. Third, no timestamp check of the flavor of UNIX Make is built into Variorum to ensure the consistency between the source code files and their annotations. Such an automatic checking mechanism goes a long way towards addressing the long-standing problem of documentation outdatedness in the software engineering community. Lastly, the current digital pen setup suffers from a well-known problem: it takes some practice for the authors to coordinate the hands and eyes when they draw on the tablet and the resulting figure is shown separately on the screen. We found that this coordination is by no means a serious hindrance. One possible solution to address this problem is to use a see-through touch-sensitive tablet that can mount on a standard monitor, so that what is drawn on the tablet is shown on the screen in situ.

6 Conclusion

This paper describes a novel program documentation system called Variorum, which exploits interactive multimedia technology to enable program authors to explain intricate design logic and subtle implementation tricks of their programs in a flexible and easy-to-understand way. The basic philosophy of Variorum is that the best program documentation is the recording of a detailed walk-through of the source code by the original author, and that speech and pen drawing collectively make a powerful presentation medium for explaining how programs actually work. Variorum embodies this philosophy by allowing program authors to attach multimedia annotations to specific code segments at different nesting levels, and allowing program maintainers to access these annotations interactively through existing Web browsers. To the best of our knowledge, Variorum is the first program documentation system that features such interactive multimedia technology.

Although the initial limited experience with Variorum appears positive, more extensive field testing is required to fully validate the promise of Variorum. We are planning to test Variorum in an undergraduate software engineering course to collect subjective evaluations of the understandability of the program documents composed using Variorum.

Acknowledgement

This research is supported by an NSF Career Award MIP-9502067, NSF MIP-9710622, NSF IRI9711635, a contract 95F138600000 from Community Management Staff's Massive Digital Data System Program, as well as funding from Sandia National Laboratory, Reuters Information Technology Inc., and Computer Associates/Cheyenne Inc.

References

[1] Corbi, T.A., "Program understanding: challenge for the 1990s," IBM Systems Journal, vol.28, no.2, p. 294-306, 1989.
[2] Biggerstaff, T.J., Mitbander, B.G., Webster, D.E., "Program understanding and the concept assignment problem," Communications of the ACM, vol.37, no.5, p. 72-82, May 1994.
[3] Von Mayrhauser, A., Vans, A.M., Howe, A.E., "Program understanding behaviour during enhancement of large-scale software," Journal of Software Maintenance: Research and Practice, vol.9, no.5, p. 299-327, Sept.-Oct. 1997.
[4] Storey, M.-A.D., Wong, K., Muller, H.A., "How do program understanding tools affect how programmers understand programs?," Proceedings of the Fourth Working Conference on Reverse Engineering, Amsterdam, Netherlands, p. 12-21, 6-8 Oct. 1997.
[5] Klosch, R.R., Mittermeir, R.T., "Improving program understanding by unfolding layers of interacting patterns," WPC 96, 4th Workshop on Program Comprehension, Berlin, Germany, p. 208-17, 29-31 March 1996.
[6] Antoniol, G., Fiutem, R., Lutteri, G., Tonella, P., Zanfei, S., Merlo, E., "Program understanding and maintenance with the CANTO environment," International Conference on Software Maintenance, Bari, Italy, p. 72-81, 1-3 Oct. 1997.
[7] Horowitz, E., Williamson, R.C., "SODOS: a software documentation support environment-its definition," IEEE Transactions on Software Engineering, vol.SE-12, no.8, p. 849-59, Aug. 1986.
[8] Horowitz, E., Williamson, R.C., "SODOS: a software documentation support environment-its use," IEEE Transactions on Software Engineering, vol.SE-12, no.11, p. 1076-87, Nov. 1986.
[9] Cybulski, J.L., Reed, K., "A hypertext based software-engineering environment," IEEE Software, vol.9, no.2, p. 62-8, March 1992.
[10] Devanbu, P., Brachman, R.J., Selfridge, P.G., Ballard, B.W., "LaSSIE: a knowledge-based software information system," Communications of the ACM, vol.34, no.5, p. 34-49, May 1991.
[11] French, J.C., Knight, J.C., Powell, A.L., "Applying hypertext structures to software documentation," Information Processing & Management, vol.33, no.2, p. 219-31, March 1997.
[12] Barone, L., Rousseau, B., "Software documentation on the Web: the LIGHT project (Life Cycle Global Hypertext)," 9th International Conference on Software Engineering and its Applications, Paris, France, p. 119-20, 19-21 Nov. 1996.

[13] Petry, B.L., "Getting the most out of legacy code: the uses of HyperCode within a typical IS organization," Proceedings of the IEEE 1996 National Aerospace and Electronics Conference, Dayton, OH, USA, vol.2, p. 852-7, 20-23 May 1996.
[14] Younger, E.J., Bennett, K.H., "Model-based tools to record program understanding," IEEE Second Workshop on Program Comprehension, Capri, Italy, p. 87-95, 8-9 July 1993.
[15] Soga, M., Kashihari, A., Toyoda, J.-I., "Designing self-explanation environment for multilayer understanding. In case of program understanding," 1996 IEEE International Conference on Multi Media Engineering Education, Melbourne, Vic., Australia, p. 49-57, July 1996.

Authors: Tzi-cker Chiueh, Wei Wu
Address: Computer Science Department, SUNY at Stony Brook, Stony Brook NY 11794-4400
Email Address: chiueh@cs.sunysb.edu
Affiliation: Computer Science Department, SUNY at Stony Brook
Lead Author and Presenter: Tzi-cker Chiueh
Permission: The authors have the permission of SUNY at Stony Brook to publish the results contained in this paper.