You are on page 1of 13

1

Code Migration Through Transformations: An Experience Report


Kontogiannis, K.1, Martin, J.2, Wong, K.2, Gregory, R.3,
Mueller, H.2 and Mylopoulos, J.3
University of Waterloo1 University of Victoria2 University of Toronto3
Dept. of Electrical Eng. Dept. of Computer Science Dept. of Computer Science

Abstract surprisingly, management is looking for alternatives,


which sometimes take the route of totally replacing
One approach to dealing with spiraling maintenance the legacy system in question with a new one, or re-
costs, manpower shortages and frequent breakdowns engineering it.
for legacy code is to "migrate" the code into a new Unfortunately, re-engineering a large system not
platform and/or programming language. The objec- only requires a very high commitment of human re-
tive of this paper is to explore the feasibility of semi- sources, but also introduces a number of risk factors
automating such a migration process in the presence such as integration errors, introduction of faulty code
of performance and other constraints for the migrant and non-compliance with global constraints on perfor-
code. In particular, the paper reports on an experi- mance, maintainability, etc.
ment involving a medium-size software system writ- One pragmatic approach to the re-engineering of
ten in PL/IX. Several modules of the system were mi- legacy code is to "migrate" the code into a new plat-
grated to C++, rst by hand and then through a semi- form and/or programming language. Migration can
automatic tool. After discovering that the migrant be done at di erent levels which are increasing more
code was performing up to 50% slower than the orig- ambitious and time-consuming to implement. At low-
inal, a second migration e ort was conducted which est levels, migration takes the form of transforming
improved the performance of the migrant code sub- (or, "transliterating") the code from one language into
stantially. The paper reports on the transformation another. At higher levels, the structure of the system
techniques used by the transformation process and the may be changed as well to make it, for instance, more
e ectiveness of the prototype tools that were developed. object-oriented. At still higher levels, the global ar-
In addition, the paper presents preliminary results on chitecture of the system may be changed as part of
the evaluation of the experiment. the migration process. For this paper, we adopt a
lower level migration strategy, because it is practical,
it does not require that the software re-engineer is fa-
1 Introduction miliar with the legacy code, and is most amenable to
Legacy software systems are software systems that automation.
have been in operation for many years, have evolved to Legacy code migration is rarely done in a vacuum.
meet changing organizational demands and computing Instead, for each migration project there are require-
platforms, and are often mission critical for the organi- ments such as "the migrant code must run at least
zation that owns and operates them. Managing such as fast as the original code", or "the migrant system
systems is dicult because of frequent breakdowns, must be easier to maintain". These non-functional
spiraling maintenance costs and shortages of quali ed requirements introduce a risk factor into the migra-
personnel who are willing to work with obsolete pro- tion process, since they can usually only be evaluated
gramming languages and operating platforms. Not after the migration is complete. Another constraint
often adopted in order to reduce the risk of migra-
 This work was funded by IBM Canada Ltd. through the
tion projects is that the process must be incremen-
Center for Advanced Studies (Toronto), the Natural Sciences tal [Brodie95], i.e., can be conducted so that certain
and Engineering Research Council (NSERC), and the National
Research Council of Canada. The IBM contact for this paper components of the legacy system are selected and mi-
is Bill O'Farrell billo@ca.ibm.com grated, resulting in an operational system. Such an
incremental process ensures that migration can pro- are substantially di erent in the two languages. Sec-
ceed in a piecemeal fashion, thereby lowering the risk tions 5 and 6 describe the the transformation process,
of overall failure, cost and time overruns, and the like. while section 7 discusses integration and porting issues
This paper reports on an experiment intended to for migrated modules, also some of the limitations of
develop a methodology for building code migration the tools that have been developed. Finally, section 8
tools which meet non-functional requirements and presents an overall evaluation of the experiment and
support incremental migration. In particular, the section 9 o ers conclusions and describes directions for
paper reports on the migration of components of a further research.
medium-size software system from PL/IX to C++.
PL/IX is a dialect of PL/I that has been used widely
for system development. As part of the experiment, 2 Related Work
several modules of the system were migrated to C++, A number of research teams have addressed the is-
rst by hand and then through a semi-automatic tool.
After discovering that the migrant code was perform- sue of source code migration. In [Gillesp98] a tool
ing up to 50migration e ort was conducted which im- that is used to transform Pascal programs to C ,
proved the performance of the migrant code substan- while in [Feldman93] a Fortran-77 to C and C++ con-
tially. In both phases, migration was rst carried out verter is proposed. Both migration tools are based
by human experts, followed by identi cation of the on parse trees that are generated from source code,
heuristics used by the experts, and then coding of are consequently transformed and then fed to a com-
these into the tools that semi-automate the migra- piler of the target language. In [Yasuma95] a system
tion process. The transformation tool has been built for translating Smalltalk programs into C is presented
within the framework of the Consortium for Software which creates runtime replacement classes that imple-
Engineering Research and it took roughly 8 person ment the same functionality as Smalltalk classes in the
months to develop. source code. A semi-conservative, real-time garbage
collection mechanism is also provided in this environ-
The case study used for the experiment involves a ment. On the commercial front, a Fortran to Fortran-
compiler optimizer written in PL/IX and consisting Windows converter has been developed by [Sigma98]
of approximately 300KLOC. The "owners" of the sys- and allows for HP UX, Sun, VAX, and IBM For-
tem have variable degrees of familiaritywith the global tran. to be converted to Fortran that can run on
architecture of the system and extensive familiarity Windows/98/NT environments. Similarly, a number
with particular components they are responsible for. of translators for Ada 83 to Ada 95, CMS-2 to Ada,
The coding standards and the informal information Jovial to Ada, Fortran to Ada have been developed by
(comments, variable names) are key guiding elements Xinotech [Xino98].
for the understanding and maintenance of the system. In many respects, the translation problem of code
The system implements sophisticated code optimiza- from one programming language to another can be
tion techniques and has been highly optimized itself thought as a problem of as mapping syntactic and se-
during its 15-year lifetime. Some components of the mantic patterns of the source language to patterns
system are not owned by any member of the devel- in the target language. Within the framework of
opment team and are therefore very dicult to main- pattern identi cation, Baker ([Baker94]) represents
tain. Not surprisingly, the team is reluctant to per- source code as a stream of strings. The approach uses
form radical changes to its structure since this may parameterized pattern matching techniques based on
a ect negatively its overall performance. For such a a variation of the Boyer-Moore algorithm to identify
case, code transformation is an attractive alternative, duplication within a string. Paul ([Paul94]) proposes a
promising to move parts of the system, for instance, system (SCRUPLE) in which regular-expressions are
orphaned components, into a modern programming used to locate programming patterns in a large soft-
language (C++), without any deterioration in perfor- ware system. Likewise, Johnson ([Johnson94]) iden-
mance, nor a high risk of failure . ti es text-level patterns in source code by comput-
Section 2 of the paper reviews the literature for re- ing ngerprints using a hashing mechanism. These
lated research, while section 3 describes the transfor- are then compared to identify similarities between two
mation process adopted for the project. In section 4 texts corresponding to code fragments.
we discuss some of the migration tactics used for data Finally, other approaches, originally developed
types, which in some cases (e.g., for aggregate types) for the area of syntax based editing, include CIA
([Chen90]), CSCOPE ([Ste en85]), the Pan system : ArithmeticSubtract
([Ballance92]), the CENTAUR system ([Borras88]), : IdentifierReference
the Cornell Synthesizer Generator ([Reps84]) and, A : LogicalAnd
([Ladd95]). : LogicalNot
: LogicalOr
: ReferenceComponent

3 The Transformation Process Each language construct is represented in the do-


main model as an object with a list of attributes. For
In order to perform software transformations, it example a PL/IX If Statement is represented as:
is useful to represent source code at a higher level
of abstraction than, say, text or even parse trees StatementIf object
[Yasuma95]. Like many other research e orts, we have Subclass-of Statement
chosen to represent the source code in the form of Ab- Attributes
stract Syntax Trees (AST) which has been emitted condition : Expression
from a custom-built PL/IX parser and linker. Ab- then-clause : Statement
stract Syntax Trees provide a ne-granularity rep- else-clause : Statement
resentation of source code details (type information, End StatementIf
scoping, resource usage) which is exploited throughout
the transformation process. A custom-built PL/IX The transformation of a given code fragment con-
language domain model provides an hierarchical view sists of the following steps:
of the PL/IX language constructs and "drives" the Transformation of its data structures: PL/IX
set of transformation routines which maps PL/IX lan- data structures of the subject system were analyzed
guage constructs to corresponding C and C++ con- and migrated based on a set of transformation rules
structs. Once a PL/IX construct is identi ed, a trans- applied both to the basic and to aggregate data types
formation routine is selected and invoked. Each en- of the source language. Overlays, unnamed elds,
tity of the custom-made PL/IX domain model has memory areas, aliases, controlled variables, and many
a corresponding transformation routine. For exam- other complex features of the source language were
ple a StatementIf in PL/IX will be transformed mapped to functionally similar structures in the tar-
to an equivalent statement in C by transforming its get language.
condition, its then-clause, and its else-clause,
which the domain model indicates that are of type Generation of supporting utilities and other
Expression, Statement, and Statement respectively target language speci c artifacts: In many cases con-
(see example in Section 6). structs of the source language do not have obvious cor-
A subset of the PL/IX domain model is illustrated responding constructs in the target language. In this
below, with indentation representing sub-classing: case, a set of utilities and libraries need be constructed
to support execution of the migrant code. Examples
PlixObject of such constructs include array classes (one dimen-
PlixApplication sional or multidimensional), memory allocation and
PlixFile deallocation mechanisms, and built-in functions that
: PlixProgramFile
: PlixIncludeFile
are speci c to the source language.
Statement Generation of the new system: Programs are
: StatementIf mapped from the source language to the target lan-
: StatementAssignment guage in a way that preserves structure, algorithmic
: StatementCall patterns, informal information (i.e variable names),
: StatementDeclare
comments, and indentation. In particular comments
: :
: :
DeclareObject
StatementInclude
are handled as annotations in the AST and are ex-
: StatementDoEnd tracted by the custom-made linker. In most cases map
Expression very well to the correct position in the new generated
: ArithmeticAdd code as most comments occupy a single line and do not
: ArithmeticDivide interleave with expressions inside a source code state-
: ArithmeticMultiply ment. For cases where there is a position discrepancy
this usually is in the order of one or two lines and can Aggregate Type Conversions: Mappings for ag-
be easily identi ed at the code inspection phase. gregate data types were developed by the code trans-
The process of selecting which components of the formation program, using information on basic data
legacy system to migrate begins with program under- type conversion. As a general rule, PL/IX records
standing and re-documentation techniques which lead were transformed to C/C++ structures (struct).
to a detailed analysis of the legacy system in terms Overlays were handled as C/C++ unions. The type
of its major components, les, subsystems, and corre- conversion table can be modi ed with ease during the
migration process and can be tailored to particular
sponding interfaces [Finni97]. Each subsystem is thor- requirements for a speci c migration project. A de-
oughly analyzed in terms of its control and data ow tailed discussion on the issues involved for data type
properties, its interface with the rest of the system, transformations can be found in [CserRep]. An exam-
and the major algorithms and data structures used. ple that illustrates the transformation of an aggregate
Once a global decomposition of the legacy system data type is shown below:
has been achieved in terms of subsystems and inter-
faces, a number of subsystems can be selected as the dcl 1 dsstate external,
primary candidates for migration. The selection cri- 2 prg integer,
teria focus on: 2 externs integer,
2 statmem integer,
Code size: Subsystems between 5-10KLOC are 2 curr_pdx_statmem integer,
ideal candidates since they can be manually examined 2 aregcnt integer,
and inspected. 2 aregused integer,
2 flags bit(32),
Interface patterns: Self contained, systems with .2 *,
simple interface patterns, and as few as possible side 3 dbg bit,
e ects, are also primary candidates. 3 detail bit,

Maintainability: Subsystems which degrade in 3 warn bit,


3 detail2 bit,
quality and performance due to the lack of resources 2 memused bit(32),
are also good candidates for migration. Maintenance 2 maxmuse bit(32);
for these systems is performed on as needed basis and
usually is of a corrective variety. The incorporation
of substantial domain knowledge in the code (i.e. do- The data structure above is automatically trans-
main speci c algorithms) results in increased diculty formed into:
on understanding and maintaining the particular sub-
system.
Performance: Subsystems that have been iden- struct any_DSSTATE {
ti ed as a performance bottleneck - time or space - int prg;
during testing due to, algorithmic complexity or poor int externs;
utilization of language constructs are also high on the int statmem;
candidate list. int curr_pdx_statmem;
int aregcnt;
The following subsections discuss in detail the is- int aregused;
sues involved for the migration of any given subsys- union FLAGS {
tem. int flags;
struct any_DSSTATE_8 {
int dbg:1;

4 Data Type Migration int


int
detail:1;
warn:1;
int detail2:1;
The rst step towards code migration is the trans- };
formation of data structures. };
Basic Type Conversions: A global table that int memused;
maps PL/IX primitive data types to their correspond- int maxmuse;
ing C/C++ data types was constructed. The type };
mappings are illustrated in Table.1.
code. The rst step in dealing with this problem
PL/IX Type C Type is nding eld names that are used more than once
integer int within the same compiler option. A dis-ambiguation
integer half short int process then locates all such names and builds a path
integer long long int of elds that is required to access each such eld. Once
oat double those paths are available, a subset of those chains
oat short oat is selected which completely distinguishes among the
oat long double ambiguous elds. For example, a record rec1 may
character char consist of eld fielda which consists of nested elds
bool bool fieldb and fieldc . One can then write an assignment
bit bool statement in PL/IX as fielda = val (as opposed to
bit(1) :1 fielda :fieldb :fieldc = x). The migration tool builds
bit(2) :2 a macro for each previously ambiguous identi er and
bit(8) unsigned char this macro de nition is consequently stored in a global
bit(9) :9 macro table. Examples of macro de nitions for trans-
bit(16) unsigned short forming reference components are illustrated below:
bit(17) : 17 Uses of eld access for the bb begin, bb end,
bit(32) unsigned bb count, and irreducible PL/IX elds (sample
from the original PP/IX code):
Table 1: Basic PL/IX to C type conversion.
do plx= bc(bb_end(bb))
repeat(bc(plx))

Macros to access elds: while(plx^=bb_begin(bb));


Statements
In order to replicate the access of a eld that is -------------
deeply nested within a record, a macro is generated do bb= 2 to bb_count;
for every eld access pattern found in the code. These Statements
macros allow the migrant C++ code to closely resem- -------------
ble its PL/IX ancestor. The syntax and the de nitions irred= irreducible(rgn);
for calling these macros are stored in tables during a
preprocessing phase. Another transformation issue re-
lates to PL/IX array references for which indexes do the C generated code for the PL/IX code segments
not necessarily appear next to the element in the ref- above is:
erence component to which they belong. For example,
in eld access a.b(i), it may be the case that a is an
array, not b as it may be thought initially. An elab- for (plx = bc[C_BB_END(bb)];
orate analysis of the PL/IX data structures, and the plx != C_BB_BEGIN(bb);
generated corresponding C++ data structures is re- plx = bc[plx]) {

quired in order to construct the correct transformation Statements


-------------
for such a eld access. This analysis automatically for (bb = 2; bb <= C_BB_COUNT; bb++) {
reconstructs the full path of the corresponding C++ Statements
structure, starting from the last, inner-most, eld on -------------
the reference component and proceeding towards the irred = C_IRREDUCIBLE(rgn);
outer-most eld of the structure. Array indices are
substituted, from left to right for each array related
eld found in the reconstructed path. "Liked" elds while the C corresponding macro de nitions for the
are also handled appropriately. eld accesses are:
Ambiguous eld names: The problem here is
that PL/IX allows paths of eld accesses to be abbre- #define C_BB_BEGIN(i)
viated (for instance, a:b:c is abbreviated as c). Such (bb_tab.bb_t[(i)].BB_BEGIN.bb_begin)
paths need to be reconstructed for the migrant C++
#define C_BB_END(i) Constructing the Foundation Classes: This
(bb_tab.bb_t[(i)].bb_end) rst step deals with the construction of a set of "foun-
#define C_BB_COUNT dation" classes used in the header les. Each of these
(bb_tab.bb_count) classes is de ned to have only static methods in order
#define C_IRREDUCIBLE(i)
to resemble the PL/IX semantics. For every public
(bb_tab.bb_t[(i)].flags.irreducible)
procedure in PL/IX, one class with the procedure as
its public method is created.
Literals: In this phase of the migration process, A PL/IX procedure group is transformed into a
we look for all PL/IX literal declarations and attempt class with multiple public methods. Every nested pro-
to classify them into one of six types. These types are cedure, or unexposed procedure in a group, is made
as follows: into a private method of the class which holds its outer
procedure (or group). This is not exactly orthodox
1. The literal is an integer. object-oriented programming (O-O-OP), but it allows
us to encapsulate a PL/IX procedure within a C++
2. The literal is an ambiguation class so that new functionality can be added to the
3. The literal is a string of PL/IX variable attributes procedure (for example, a nested PL/IX procedure)
in a transparent with respect to the rest of the legacy
4. The literal is a string that is an Identi er system way. An example of this type of transforma-
tion is illustrated below:
5. The literal is a string that is a PL/IX type
Given a the PL/IX procedure declaration, which
6. The literal is a bit mask de nes a nested procedure:
All of these cases are handled either by declaring
appropriate macros evaluating on the correct value of dslvbb: proc(bb);
the literal de nition, or by using the C preprocessor ..........
(i.e. #de ne) to de ne literal constant values. Exam- proc_ensc: proc(memref);
ples of literal de nitions include: ..........
The PL/IX code for literal de nitions is: end proc_ensc
end dslvbb

dcl mode32 integer constant init(0);


dcl tags_full lit('3'); the corresponding C++ foundation class are gener-
dcl tags_small lit('2'); ated as follows:

and the corresponding generated C code for the class Dslvbb {


PL/IX literal de nitions above is: public:
static void dslvbb(int bb);
private:
const int mode32 = 0; static void proc_ensc(any_L_BAG memref);
#define C_TAGS_FULL 3; };
#define C_TAGS_SMALL 2;
Merging Classes: The next step in the process
of creating class header les is merging classes that
5 Generating Class Header Files correspond to identical types for parameters passed
in several procedures. For example, if we have proce-
The second step for transforming a PL/IX applica- dures P1 and P2 with parameters p1 of type T1, and
tion to C/C++ was to generate the C/C++ header p2 of type T2, then we want to map P1 and P2 into
les using the data type information obtained from methods of the class C T1 corresponding to the type
the PL/IX Abstract Syntax Tree, the data type ta- T1. A Java based interface has been built for to make
bles, and the macro de nition tables. Generating the this process interactive and allow the user to specify
header les required the following steps: how the classes corresponding to procedures are to be
merged. As the reader might infer, a data type that modeled after the custom-built PL/IX domain model
is heavily used as a parameter to several procedures is (referred to as the "skeleton") is the driver for all
a primary candidate to be transformed to a class and transformations in the system. A tree traversal rou-
the procedures to be member functions with the par- tine, traverses a PL/IX Abstract Syntax Tree and for
ticular parameter removed. As an example consider each AST node corresponding to a PL/IX element
the scanner PL/IX data type which is transformed that needs to be transformed, a transformation and
to a class as follows: formatting function is invoked. For example when
transforming the body of a procedure the routine that
class Scanner {
handles the transformation of Statement is invoked.
private:
static alloc(int lim);
A selection process identi es what type of Statement
static new_item(int elem, P_BAG flags);
is to be transformed and the appropriate transforma-
void join(int lo, int hi, tion routine is invoked.
boolean ds_used_bit); An example on how the transformation process is
public: applied to the PL/IX Abstract Syntax Tree that corre-
void init_scan(int lst); sponds to an StatementIf is shown below. The trans-
int advance(); formation routine proceeds as follows:
int pick();
boolean ds_is_used(); PL/IX Code:
static int make_empty_list();
static void kill_list(int list); if ^kill then cannot_kill = true;
int find_item(int elem);
void delete_item();
void delete_lfrag(); Corresponding AST Node:
void insert_mem_dsc(Scanner sc,
boolean ds_used_bit); #1445 <a statementif>
boolean join_item(int m, class: STATEMENTIF
boolean ds_used_bit); parent-expr: #1456<a statementdoassigment>
void insert_live(int item, condition: #1457<a logicalnot>
P_BAG attrs); then-clause: #1458<a statementassignment>
void insert_after(int elem,
P_BAG p_flags);
void insert_item(int elem, Corresponding Domain Model:
P_BAG p_flags);
boolean join_live(Scanner join_me); StatementIf
void split_item(int m); condition : Expression
void constrict_item(int m); then-clause: Statement
boolean defrag_live(boolean ds_u);
};

Handling External Variables: After processing Transformation Process Trace:

all of the declarations in the original PL/IX code, all Transform Statement
of the external variables are stored in a global table. .. Transform StatementIf
These declarations are printed to a le that is not in- --> "if ("
cluded anywhere in the new C++ code, but references .... Transform Expression
to the variables in it are made via extern statements. ...... Transform LogicalNot
An alternative solution is to include this le in all oth- --> "(!"
ers and remove the corresponding extern declarations. ........ Transform IdentifierReference
--> "kill"
--> ")"

6 Source Code Transformations --> ")"


.... Transform Statement
...... Transform StatementAssignment
The main objective of this phase is to write out --> ...................
C/C++ source code that can be compiled with min- --> ...................
imal corrections. A set of transformation routines --> "cannot_kill = true;
be carried out concurrently, by compiling the modules
le-by- le, and then adding interfaces for routines and
Code Translates To: functions where necessary. Since the modules share
most of the interface code, less interface code needs
if ( (!kill) ) {
to be written for each new module so that the last
cannot_kill = true; }
step requires minimal e ort after the migration of sev-
eral modules. We have identi ed a number of manual
Formatting for nested functions is generated changes required for the new code to compile and link
through the use of a two way pass. A rst pass on properly. These relate to handling multi-dimensional
the AST is used to collect all necessary information, arrays, and eld accesses and to limitations on map-
while second pass generates the migrant code. Note ping the PL/IX semantics to C/C++ semantics (i.e.
that in order to conform with PL/IX semantics, nested in the case of LEAVE statement in PL/IX which has to
procedures have been de ned as private methods on be mapped to some form of GOTO in C/C++). Some
the "basic" class encapsulating the top level procedure of these manual changes have either been incorporated
(see data type transformations). in the tool or been agged so that the developers can
Syntactically correct C++ code is output to a .h or identify easily the points that manual intervention is
to a .cpp le. As expected, this process does not al- required.
ways guarantee that the resulting code will be seman- The migration process has limitations, including:
tically correct, with respect to the original application
code. However, the process provides a fast and rela-  Local variables of nested procedures: C++ does
tively reliable way to handle massive volumes of code not have nested procedures, so local variables that
and thus makes code migration a feasible alternative. are used in sub-procedures, need to be passed as
The nal result of this step is checked manually, and parameters to the sub-procedures (or be made
the generated code is passed through all test buckets class variables)
to verify the correctness of the migrant code.
 Insertion of gotos and labels: multi-level
A number of helper libraries have also been de ned leave/iterate statements have to be done auto-
for built-in PL/IX functions, such as those for mem- matically
ory allocation. For example, an ArrayALLOCATION(x)
function is de ned as a macro and corresponds to the  Grouping sub-procedures: static procedures of a
PL/IX statement allocation(x). Helper data types class will not work with the current treatment
are also used to simplify the appearance of the gen- for recursive procedures. A design has been pro-
erated code, and to make it look as much as possible posed on how to x the problem and will be im-
to the original PL/IX code 1 . These types inherit plemented with the next release of the transfor-
from basic library elements such as set, bag, list, mation tool.
sequence and array. For example a template class
PlixArray< type > was used to model one dimen-
sional arrays. 8 Tool Evaluation
Evaluation of the migration tool focused on four
7 Integration, Porting, Limitations major areas. The rst area examines the generality
of the tool in terms of the lessons learned on how
Once one or more subsystems of the legacy system the same transformation process can adopted in or-
have been migrated, they need to be integrated with der to migrate and transform applications written in
other existing PL/IX subsystems. This integration re- similar to PL/IX languages (i.e. PL/I). The second
quires (i) writing of emulation classes for PL/IX built- area, examines the performance of the new compiler
in data types and functions, (ii) manual xing of the with respect to memory usage, the third investigates
generated code where necessary, and (iii) writing of in- the performance of the migrant code compared to the
terface code, so the C++ code can access PL/IX func- original, while the fourth examines reductions in the
tions and data types. The last two steps can actually human e ort required to migrate legacy code.
1 This was another requirements for the migration e ort, in The main characteristics of the transformation
order to facilitate future maintenance of the migrant code process as compared to other transformation tools
Subsystem PL/IX Time C++ Time # of Runs Subsystem Code Size Transformation Time
(in minutes) (in minutes) (KLOC) (min:sec)
Test13k 11.96 12.19 956 BI 6.2KLOC 18:01
gcc 70.52 70.30 121 (9:04)
lib g++ 0.916 0.8626 1065 DS 7.8KLOC 28:12
JPEG 9.4257 9.106 764 (13:10)
SM 7.4KLOC 26:45
Table 2: Compilation time statistics with the old (14:00)
(PL/IX) and the new (C++) compiler. Results were
averaged on the number of runs. The experiments Table 3: The time to transform the selected subsys-
were conducted on four di erent models of RISC/6000 tems after parsing. Code size refers to actual code
machines. excluding comments and blank lines.

[Verhoef97] is the use of a repository that holds all to evaluate the performance of the new version of the
the necessary information for the system to be trans- optimizer (which includes the three migrated subsys-
formed, as opposed to applying transformations at the tems), the optimizer was run on four sets of source
syntactic/semantic level using grammar and semantic code, namely:
actions re-write rules. This repository is populated at i) A single 13,000 line pre-processed C source
parse time and is annotated appropriately by a cus- le, about 7,000 lines of which were actual code
tom made for PL/IX linker. All code transformations (rather than declarations); ii) The Independent JPEG
have been built in terms of modular application pro- Group's free JPEG software, sixth public release (C
grams at the repository level. It has become apparent sources); iii) Stage 1 of the GNU C compiler, version
from this project, that the transformation logic should 2.7.2.2 (C sources) and; iv) The GNU C++ library,
be as modular and localized as possible (i.e. di erent version 2.7.1 (C++ sources)
transformation programs for each language construct), The performance results obtained when running the
and focus on three main points; a) transformation of optimizer against these four test cases with the PL/IX
data types, b) development of support utilities (i.e. and C++ versions of the subsystems respectively are
macros, interface classes) and, c) transformation of reported in Table 2. These are preliminary results
the source code entities (i.e. the actual application). obtained after the optimization heuristics were added
The schema in the repository has been built in such a in the transformation tool. More results are being
way as to be compatible with the PL/AS and PL/X collected as the migration tool is undergoing further
domain model. Therefore, the transformation process enhancement related to heuristics that can be incor-
is also applicable, with minor schema and transforma- porated to the transformation logic and used for en-
tion logic changes, to applications written in PL/AS hancing the performance of the new code.
and PL/X. We currently investigate the possibility of The time it takes the tool to generate migrant code
applying the same transformation logic to applications as a function of the size of the input code (time for
written in PL/I. parsing is shown in parentheses) as well as the e ort in
As indicated in the introduction, the migration tool person days for adapting and integrating the migrated
was built in two phases. The rst phase aimed at de- code with the rest of the original legacy system are
veloping a tool which would produce C and C++ code illustrated in Tables 3 and 4 respectively.
that resembled substantially the hand-transformed These results indicate a signi cant enhancement in
code produced for a legacy subsystem. The second productivity for the migration process, with no de-
phase aimed on incorporating into the transforma- terioration in the performance of the migrated code.
tion routines heuristics that have been used to hand- In particular, manual transformation of one subsystem
optimize the migrant code. The migration tool has (DS) required approximately 50 person-days. By com-
been tested with three subsystems of the legacy sys- parison, it took only one person day to adapt the auto-
tem. The performance results obtained for these sub- matically generated migrant code for the same subsys-
systems are reported in Table 2. tem so that it can be compiled, linked, and integrated
As indicated earlier, the legacy system that has with the rest of the legacy system. Statistics gathered
been partly migrated is a compiler optimizer. In order for the other two subsystems (BI and SM) follow the
guage (as opposed to SmallTalk), and with a highly
Subsystem Code Size Automatic optimized application which, in our case, is the IBM
Transliteration compiler back-end. Therefore our margins of perfor-
(KLOC) (person days) mance improvement are limited compared to the ones
BI 6.2KLOC 0.3pd that could be obtained in SmallTalk to C conversion.
SM 7.4KLOC 2pd In [Feldman93] and [Gillesp98] the authors report
DS 7.8KLOC 1pd that the performance of the converted code using the
f2c and p2c utilities respectively, is the same as the
Subsystem Code Size Manual Transliteration
DS 6.2KLOC 50pd performance of the original code, indicating thus a
Subsystem Code Size Manual Optimization performance ratio of 1.0. However, the code gener-
ated by these tools is reported to be non-maintainable
DS 6.2KLOC 10pd due to the structure and the characteristics of the gen-
erated code (no informal information and code struc-
Table 4: The e ort to adapt and integrate the new ture preserved). This is to be expected since these
components to the rest of the system compared to the tools were meant to produce code, which after compi-
e ort to manually transform the same subsystem. lation is binary-equivalent with respect to the original
code. We took a di erent approach, and we generate
code that is structurally "similar" with respect to the
Subsystem Subsystem Original New original code so that we could still produce portable
Size System Size System Size and maintainable code without of course a ecting the
(KLOC) (MB) (MB) performance.
BI 6.2KLOC 27.150 27.265 On these aspects, our tool performs well, as it both
DS 7.8KLOC 26.738 27.976 maintains an acceptable performance ratio, and allows
SM 7.4KLOC 26.687 28.172 for the new code to be more maintainable and portable
due to resources, analysis tools, and the range of exist-
Table 5: The total size of the compiled system (binary) ing compilers available for C-based systems, as com-
before and after migration. pared to limited support that is available for PL/IX-
based systems. In this respect, we can say that our
tool could be placed in the middle of the spectrum rep-
same trend and indicate an e ort of approximately resented in its extremes by: i) the SmallTalk to C con-
half a person-day and two person-days respectively. verter which deals with low optimized SmallTalk con-
The diculty of the integration task depends largely structs and not compatible programming paradigms
on the usage patterns of the data structures in the between the source and the target language and on
legacy system which have not been migrated yet and the other end by ii) the Fortran to C converter which
therefore need be shadowed, so that the new version of deals with easier mappings between the source and the
the system can be compiled and linked as a whole. Fi- target language, but with no further maintainability
nally, in Table 5 the size of the old and the new version and portability requirements for the target system.
of the overall system is compared. As it is shown, the
new system is approximately 5% larger in size than
the original one. This can be explained by the added
C++ libraries to handle and simulate the behavior of
9 Conclusion
several PL/IX language-speci c constructs. We have described an experiment in code migra-
The comparison of the PL/IX to C converter dis- tion in the presence of global constraints, namely non-
cussed in this paper, with the other tools found in the deterioration in the performance of the maintainable
literature is summarized in the following points. In migrant code, and an incremental migration process.
[Yasuma95] the authors present a higher performance Our experiment suggests that it is possible to develop
ratio (that is the performance of the old system di- tools which reduce the human e ort required for the
vided by the performance of the new system) for the migration process by one to two orders of magnitude,
migrant code (ratio = 1.22) than the performance ra- while respecting such constraints.
tio we could obtain on our initial experiments (ratio = We are currently completing a thorough evaluation
1.02). The reason for this discrepancy is that we deal of the migrant code in comparison to its legacy an-
both with PL/IX, which is a highly optimized lan- cestor. We are also in the process of enhancing our
migration tools to meet maintainabilityand other non- [Borras88] Borras, P., Clement, D., Despeyroux, Th.,
functional requirements for the migrant code. Khan , J., Lang, G., Pasual, V., \CEN-
TAUR: The System", Proceedings of the
Acknowlgement SIGSOFT/SIGPLAN Software Engineering
Symposium on Practical Software Develop-
The authors would like to thank Bill O'Farrell and ment Environments, Boston, Mass, 1988.
Stephen Perelgut of IBM Center for Advanced Studies,
for their technical and management support without [Brodie95] Brodie, M., Stonebraker, M., \Migrating
which this e ort would not be possible. We would Legacy Systems", Morgan Kaufman Publish-
also like to thank Greg Mori for his e orts on the rst ers, 1995.
phase of the tool implementation. [Chen90] Chen, Y., Nishimoto, M., Ramamoorthy,
About the authors C., \The C Information Abstraction Sys-
tem.", IEEE Transactions on Software En-
Kostas Kontogiannis is an Assistant Professor at gineering, vol.16, No.3, 1990, pp.32 5-334.
the Electrical & Computer Engineering department, [CserRep] Consortium for Software Engineering Re-
University of Waterloo. His interests include software search: http://www.cser.ca
re-engineering, software migration and, distributed
object technology. Johannes Martin is a Ph.D can- [Feldman93] Feldman, S., Gay, D., Maimone, M.,
didate at the Computer Science department, Univer- Schryer, N., \A Fortran to C Converter",
sity of Victoria. His research interests include soft- AT&T Technical Report No. 149, 1993.
ware architectures, software re-engineering and, tools
for source code visualization. Kenny Wong is a post- [Finni97] Finnigan , P. et.al \The Software Book-
doctoral Research Associate at the department of shelf", IBM Systems Journal, vol.36, No.4,
Computer Science, University of Victoria. His inter- 1997.
ests include software migration, tools for visualization [Gillesp98] Gillespie, D., \A Pascal To C Converter",
of software architectures, and distributed systems. The HP-UX Porting and Archive Center,
Richard Gregory is a fourth year student at the Com- http://hpux.u-
puter Science department, University of Toronto. His aizu.ac.jp/hppd/hpux/Languages/p2c-
interests include software re-engineering and network- 1.20/readme.html
centric computing. Hausi Muller is a Professor at the
University of Victoria, department of Computer Sci- [Johnson94] Johnson, H., \Substring Matching for
ence. His interests include software visualization, dis- Clon e Detection and Change Tracking", In-
tributed object technology and network-centric com- ternational Conference on Software Mainte-
puting. John Mylopoulos is a Professor at the Uni- nance 1994, Victoria BC, 21-23 September,
versity of Toronto, department of Computer Science. 1994, pp.120-126.
His interests include software requirements, concep- [Ladd95] Ladd, D., Ramming, J., \A : a Lan-
tual modeling, software repositories, and software re- guage for Implementing Language Proces-
engineering. sors", IEEE Transactions on Software En-
gineering, vol.21, no.11, Nov. 1995, pp.894-
901.
References [Paul94] Paul, S., Prakash, A., \A Framework for
[Baker94] Baker S. B, \Parameterized Pattern Match- Source Code Search Using Program Pat-
ing: Algorithms and Applications", Journal terns", IEEE Transactions on Software En-
Computer and System Sciences, 1994. gineering, June 1994, Vol. 20, No.6, pp. 463-
475.
[Ballance92] Ballance, R., Graham, S., Van De Van- [Reps84] Reps, T., Teitelbaum, T., \The Synthe-
ter, M., \The Pan Language-Based Edit- sizer Generator", In Proc. of the SIG-
ing System", ACM Transactions on Software SOFT/SIGPLAN Symposium on Practical
Engineering and Methodology, Jan. 1992, Software Development Environments, Pitts-
Vol. 1, No.1, pp.95-127. burgh PA, 1984, pp.42-48.
[Sigma98] Sigma Research Inc. : http://www.sigma-
research.com/for2win
[Ste en85] Ste en, J., \Interactive examination of a C
program with Csope", Proceedings USENIX
Assoc., Winter Conference, Jan. 1985.
[Yasuma95] Yasumatsu, K., Doi, N., \Spice: A Sys-
tem for Translating SmallTalkPrograms Into
a C Environment" IEEE Transactions on
Software Engineering, vol. 21. no.11, Novem-
ber 1995.
[Verhoef97] van den Brand, M., Sellink, A., Verhoef,
C., \Generation of Components for Soft-
ware Renovation Factories from Context-
free Grammars", Proceedings Working Con-
ference on Reverse Engineering, WCRE'97,
Amsterdam, The Netherlands, October
1997.
[Xino98] Xinotech Inc. http://www.xinotech.com

You might also like