
Project Report

C to C++ Migration

CS 846: Design Recovery and Transformation


Prof. Andrew Malton

Submitted by:
Raihan Al-Ekram 20082523 rekram@swen.uwaterloo.ca
Gerald Tarcisius 96166393 gtarcisi@swen.uwaterloo.ca

April 27, 2002


1 Introduction
The idea behind this project was to perform dialect conversion between two dialects of C.
We have developed a process that takes as input C code written in ANSI C or in older
dialects such as K&R C, and outputs C code that compiles with a C++ compiler.
Ideally, C++ would be a strict superset of C, but the two languages differ in a number of
ways, and code that is valid C is not always valid C++.
This stage of the process, the Migration stage, takes as input a post-factored, UID
marked-up program with a fact base at the beginning. It outputs a UID marked-up program
without the fact base, containing C code that should compile with a C++ compiler. The
transformation is not perfect, however, and there are cases it does not handle.
Furthermore, the sample transformations in Section 3 are deliberately simple and do not
necessarily show the full extent of the actual TXL transforms.

2 Proposed modifications
2.1 Changes made in the grammar
New grammar rules and modifications were added in Overrides.Grammar, on top of the
miracle overrides. The main modification redefines the non-terminal reference_id to
accommodate multi-level UID markups of pointers and functions, as discussed in class.

2.2 Introducing new facts


New facts were also added to the fact base to facilitate migration:
- Function definitions: the UIDs of the functions that are defined in the program (as
  opposed to merely referenced).
- Variable types: the type of a variable, expressed as a [uid]. For instance, the type
  of a void pointer might be “void *”. In the future, a [type_name] or an equivalent
  non-terminal should be used for this instead.

3 Transformations
3.1 K&R type function definition
In K&R-style C, a function definition lists its formal parameters by name only and
declares their types separately, between the header and the body; C++ does not allow
this. The parameter types must therefore be moved into the parameter list.

Before:
    int kAndR (s)
    char * s;
    {}
After:
    int kAndR (char * s)
    {}
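The merge in 3.1 can be sketched as a small C++ function. This is a hypothetical illustration, not the TXL rule: mergeKAndRParams is an invented name, it works on plain strings rather than on UID-marked trees, and it assumes each parameter type can be written as a prefix of the parameter name (so array and function-pointer declarators are not covered). It also applies the K&R rule that a parameter with no declaration has type int.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of the TXL transform): builds a
// prototype-style parameter list from the pieces of a K&R definition.
//   params: parameter names from the K&R header, in order, e.g. {"s"}
//   decls:  name -> type from the declarations between header and body,
//           e.g. {"s" -> "char *"}
// Assumes every type can be written as a prefix of the name.
std::string mergeKAndRParams(const std::vector<std::string>& params,
                             const std::map<std::string, std::string>& decls) {
    std::string out;
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (i > 0) out += ", ";
        auto it = decls.find(params[i]);
        // In K&R C an undeclared parameter has type int.
        const std::string type = (it != decls.end()) ? it->second : "int";
        out += type + " " + params[i];
    }
    return out;
}
```

For the example above, mergeKAndRParams({"s"}, {{"s", "char *"}}) yields "char * s", giving the header int kAndR (char * s).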

3.2 Missing prototypes
Every function used in a C++ program must be declared before it is called. In C, the
compiler implicitly declares an undeclared function at its first call. Functions that are
referenced (ref-ed) but not defined (def-ed) in the program must therefore be explicitly
declared by adding prototypes for them.

Before:
    int caller () {
        protolessSubr ();
    }
After:
    protolessSubr ();
    int caller () {
        protolessSubr ();
    }
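A minimal sketch of how the functions needing prototypes could be found, assuming the fact base can be read as plain function names (the real facts are keyed by UID). needPrototypes is a hypothetical helper that subtracts the defined functions (the "function definitions" fact) from the referenced ones:

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: which functions need a prototype?
//   defined:    names from the "function definitions" fact
//   referenced: names of called functions, in source order
// Returns each referenced-but-undefined name once, in first-use order.
std::vector<std::string> needPrototypes(const std::set<std::string>& defined,
                                        const std::vector<std::string>& referenced) {
    std::vector<std::string> result;
    std::set<std::string> seen;
    for (const auto& name : referenced) {
        if (defined.count(name) == 0 && seen.insert(name).second) {
            result.push_back(name);
        }
    }
    return result;
}
```

For the example, defined = {"caller"} and referenced = {"protolessSubr"}, so one prototype, for protolessSubr, is emitted.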

3.3 Omitted arguments


If a function declaration has an empty parameter list, C++ interprets it as taking no
arguments, whereas in C it means the parameters are unspecified, so the function may be
called with any number of arguments. Declarations with empty parameter lists are
therefore rewritten to accept any number of arguments by using an ellipsis.

Before:
    protolessSubr ();
After:
    protolessSubr (...);
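As a hedged illustration of the difference (the function names are invented for the example), an ellipsis-only parameter list compiles as C++ and accepts calls with any arguments. Note that this ellipsis-only form is C++ syntax; it is not valid C before C23.

```cpp
// Hypothetical example names. With "int noArgs();" a call such as noArgs(1)
// is a compile error in C++, because () means "no parameters".
// An ellipsis-only parameter list accepts any arguments, which is the
// closest C++ equivalent of C's unspecified "()".
int anyArgs(...) { return 0; }

int callWithVariousArgs() {
    return anyArgs() + anyArgs(1) + anyArgs(1, "two", 3.0);
}
```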

3.4 Implicit type declarations


In C++, every declaration must specify a type. In C (before C99), an identifier declared
without a type is implicitly int. Every declaration that lacks a type is therefore given an
explicit int.

Before:
    protolessSubr (...);
After:
    int protolessSubr (...);
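Applying 3.2, 3.3 and 3.4 in turn to the example yields code that compiles as C++. The sketch below assumes a stand-in definition of protolessSubr so that it links, and changes caller() to return the call's result so that it can be checked; neither change is part of the transformation.

```cpp
// Result of applying 3.2, 3.3 and 3.4, in that order, to the example.
int protolessSubr(...);      // added prototype: explicit int and an ellipsis

int caller() {
    return protolessSubr();  // the original caller() returned nothing
}

// Stand-in definition so that this sketch links; in a real program the
// function is defined elsewhere (that is why it had no prototype).
int protolessSubr(...) { return 7; }
```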

3.5 Resolving name space conflict


In C, structure, union and enumeration tags live in a tag name space that is separate from
ordinary identifiers such as typedef names. In C++, a tag also names the type, so tags and
typedef names share one name space. Such conflicts are resolved by adding suffixes to the
tag identifiers to make them unique.

Before:
    typedef int foo;
    enum foo {FOO_A};
After:
    typedef int foo;
    enum foo_enum {FOO_A};
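The renamed form can be checked with a C++ compiler. In the sketch below, plainInt and enumValue are hypothetical variables; it also assumes, though the example above does not show it, that any later use of enum foo is rewritten to enum foo_enum as well.

```cpp
// After 3.5: the typedef keeps "foo", the enum tag is renamed "foo_enum".
// Before the rename, C++ rejects the pair because the tag foo would also
// be a type name, clashing with the typedef.
typedef int foo;
enum foo_enum { FOO_A };

foo plainInt = 3;                 // uses the typedef (int)
enum foo_enum enumValue = FOO_A;  // uses the renamed enumeration
```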

3.6 Control transfers bypassing initializers


In C++, a jump within a compound statement must not bypass the initialization of a
variable declared in it. The most common case is an initialized declaration inside a switch
block, where jumping to a case label skips the initializer. Such declarations are moved out
of the switch into an enclosing block.

Before:
    switch(y){
        int x=5;
        case 1: x=3;
    }
After:
    {
        int x=5;
        switch(y) {
            case 1: x=3;
        }
    }
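Wrapping the "after" code in a function (hoisted is a hypothetical name) shows that it compiles and behaves as expected. The rewrite preserves behaviour as long as the initializer has no side effects: statements in a switch body that precede the first case label are never executed, so int x=5; was always jumped over in the original.

```cpp
// The "after" form from 3.6, wrapped in a function so it can be called.
// The "before" form does not compile as C++: jumping to "case 1:" would
// bypass the initialization of x.
int hoisted(int y) {
    int result;
    {                        // block introduced by the transformation
        int x = 5;           // hoisted out of the switch body
        switch (y) {
        case 1: x = 3;
        }
        result = x;
    }
    return result;
}
```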

3.7 Changing C++ reserved words

Before:
    int new;
    new = 5;
After:
    int new1;
    new1 = 5;
All identifiers in the C program that are C++ reserved words must be renamed to some other
unique identifier. This was done by first locating the <def> of every identifier and storing
its UID and its actual name in two separate global lists. Every <ref> associated with each
of these <def>s was then found, looked up in the two lists, and renamed accordingly. For
instance, “int new;” becomes “int new1;”. The unique names were generated with TXL’s [!]
built-in function.
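The def/ref renaming can be sketched with a map from UID to new name: the first occurrence of a UID picks a fresh name, and every later occurrence reuses it. This is a hypothetical C++ model of the two global lists, not the TXL code; the keyword list is deliberately partial, and the numbering only imitates the effect of TXL's [!].

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Partial list of C++ keywords that are ordinary identifiers in C89.
const std::set<std::string> kCxxOnlyKeywords = {
    "new", "delete", "class", "this", "template", "typename", "namespace",
    "using", "operator", "private", "protected", "public", "virtual",
    "friend", "try", "catch", "throw", "bool", "true", "false"};

// Hypothetical renamer: one new name per UID, so a <def> and all of its
// <ref>s end up with the same identifier.
class KeywordRenamer {
public:
    explicit KeywordRenamer(std::set<std::string> namesInUse)
        : inUse_(std::move(namesInUse)) {}

    std::string nameFor(const std::string& uid, const std::string& id) {
        auto it = byUid_.find(uid);
        if (it != byUid_.end()) return it->second;  // UID already renamed
        std::string newName = id;
        if (kCxxOnlyKeywords.count(id) != 0) {
            // Append the smallest number that gives an unused name.
            for (int n = 1;; ++n) {
                newName = id + std::to_string(n);
                if (inUse_.insert(newName).second) break;
            }
        }
        byUid_.emplace(uid, newName);
        return newName;
    }

private:
    std::set<std::string> inUse_;                // every identifier taken
    std::map<std::string, std::string> byUid_;   // UID -> chosen name
};
```

Two different variables both named new (different UIDs) receive new1 and new2, while every reference to the first one keeps receiving new1.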

3.8 Mark-up enum declarations with <type>…</type>


Every function definition that returns an enum type, and every declaration of a variable of
enum type, was marked up as follows:

Before:
    enum foo enumReturner ()
    {
        enum foo x;
        return 1;
    }
After:
    <type>foo</type> enumReturner ()
    {
        <type>foo</type> x;
        return 1;
    }
The enum fact was used to determine which ids were in fact of enum type.
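A minimal sketch of the 3.8 markup on the type text of a single declaration, assuming the enum fact can be read as a set of tag names; markUpEnumType is a hypothetical helper that works on strings rather than on the marked-up parse tree:

```cpp
#include <set>
#include <string>

// Hypothetical sketch of the 3.8 markup applied to the type part of a
// declaration, given the tag names recorded in the enum fact.
// "enum foo" becomes "<type>foo</type>" when foo is a known enum tag;
// any other type text is returned unchanged.
std::string markUpEnumType(const std::string& typeText,
                           const std::set<std::string>& enumFact) {
    const std::string prefix = "enum ";
    if (typeText.compare(0, prefix.size(), prefix) != 0) return typeText;
    const std::string tag = typeText.substr(prefix.size());
    return enumFact.count(tag) != 0 ? "<type>" + tag + "</type>" : typeText;
}
```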

3.9 Adding explicit cast to enumerated types from integer type


Return values and assignments were cast as follows:

Before:
    enum foo enumReturner ()
    {
        enum foo x;
        x = 1;
        x = y;
        return 1;
    }
After:
    enum foo enumReturner ()
    {
        enum foo x;
        x = (foo)1;
        x = (foo)y;
        return (foo)1;
    }
In fact, the return value or right-hand side was always cast, whether or not it was an int,
to be on the safe side. The “typeof” fact was introduced here to determine whether a
variable such as “x” above is of an enum type. Ideally, the cast would be added only when
the rvalue is neither of the target enum type nor one of its enumerators.
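The "after" code compiles as C++, which the sketch below checks. The second enumerator FOO_B and the int parameter y are assumptions added so that the function is complete. Writing (foo) rather than (enum foo) in the casts also depends on C++, where the tag foo is a type name.

```cpp
// The "after" code from 3.9 as C++. The casts are required because C++ has
// no implicit conversion from int to an enumeration type.
enum foo { FOO_A, FOO_B };

enum foo enumReturner(int y) {   // y is a parameter here (an assumption)
    enum foo x;
    x = (foo)1;
    x = (foo)y;
    (void)x;                     // x is otherwise unused in this example
    return (foo)1;
}
```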

3.10 Adding explicit cast from void* to T* for other T

Before:
    void* pvoid();
    int * enumReturner ()
    {
        int* i = pvoid();
        void* x;
        return x;
    }
After:
    void* pvoid();
    int * enumReturner ()
    {
        int* i = (int*)pvoid();
        void* x;
        return (int*)x;   // case 1
    }
We assume that there is only one declaration per line, which makes it easier to determine
the type of a variable. The cast on the return statement, marked “case 1” in the table
above, has not been implemented yet.
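The "after" code, including the "case 1" cast, as compilable C++. The body of pvoid is a stand-in (the report only declares it), the function is renamed to the hypothetical intReturner, and x is initialized because reading the uninitialized x of the original would be undefined behaviour:

```cpp
// Stand-in body: the report only declares pvoid().
void* pvoid() {
    static int storage = 42;
    return &storage;
}

int* intReturner() {           // hypothetical name for the example function
    int* i = (int*)pvoid();    // void* -> int* needs a cast in C++
    void* x = i;               // int* -> void* remains implicit
    return (int*)x;            // "case 1": cast on return
}
```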

3.11 Allow only one definition of external symbols in a compilation unit

Before:
    int* gm = (int*)0;
    const int gl;
    const int* gm, gn;
    const int* gm;
    const int gl;
After:
    int* gm = (int*)0;
    const int gl;
    const int gn;

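This transformation ensures that each external variable is defined only once in a
compilation unit: C accepts repeated file-scope declarations of the same variable without
an initializer (tentative definitions), but C++ treats each of them as a definition. At
present the first declaration is always the one kept; the transformation should instead
keep the declaration that has an initializer, if there is one.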
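The behaviour the text asks for, keeping the initialized declaration when there is one and otherwise the first, can be sketched as follows. ExternDecl and keepOnePerName are hypothetical; each entry models a single declarator that has already been split out of its declaration:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one file-scope declarator.
struct ExternDecl {
    std::string name;
    std::string text;      // e.g. "int* gm = (int*)0;"
    bool hasInitializer;
};

// Keep one declaration per name, preferring one with an initializer;
// otherwise keep the first. Output order follows first appearance.
std::vector<ExternDecl> keepOnePerName(const std::vector<ExternDecl>& decls) {
    std::vector<ExternDecl> kept;
    std::map<std::string, std::size_t> indexOf;  // name -> position in kept
    for (const auto& d : decls) {
        auto it = indexOf.find(d.name);
        if (it == indexOf.end()) {
            indexOf[d.name] = kept.size();
            kept.push_back(d);
        } else if (d.hasInitializer && !kept[it->second].hasInitializer) {
            kept[it->second] = d;  // replace an uninitialized earlier copy
        }
    }
    return kept;
}
```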

3.12 Initialize global consts

Before:
    const int gl;
    const int* gm, gn;
    int* const gi, gj, gk;
After:
    extern const int gl;
    extern const int* gm;
    extern const int gn;
    extern int* const gi;
    int gj, gk;
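In C++, a const object at file scope must be initialized, and it has internal linkage
unless it is declared extern. In C, a global const has external linkage by default and need
not be initialized. Each uninitialized global const is therefore declared extern, which
keeps C's linkage and makes it a declaration that needs no initializer. Declarations that
mix const and non-const declarators are split first, as with gj and gk above.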
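The "after" declarations compile as C++, as the sketch below shows. The definitions of gl and gi are stand-ins for ones that would normally live in another translation unit; the values are arbitrary:

```cpp
// After 3.12: uninitialized global consts become extern declarations
// (no initializer needed), and the non-const declarators are split off.
extern const int gl;
extern int* const gi;
int gj, gk;

// Stand-in definitions, normally found in another translation unit.
// The earlier extern declarations give them external linkage.
const int gl = 10;
int* const gi = &gj;
```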

3.13 Remove superfluous ampersands

Before:
    int array [10];
    int *ptr;
    ptr = & array;
    int* someFunc (void* x)
    {}
    int main (){
        int (*function)(void*)
            = (int(*)(void*)) &someFunc;
    }
After:
    int array [10];
    int *ptr;
    ptr = & array;
    int* someFunc (void* x) {}
    int main (){
        int (*function)(void*) =
            (int(*)(void*)) someFunc;
    }
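An “&” in front of an array name is superfluous: the array name already converts to a
pointer to its first element, whereas &array yields a pointer to the whole array (here of
type int (*)[10]), which C++ does not implicitly convert to int*. Likewise, an “&” in front
of a function name is redundant, since a function name already converts to a pointer to
the function, so these ampersands are removed as well.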
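The sketch below checks both claims with a C++ compiler. The array is named arr and someFunc is given a body, both assumptions made for the example:

```cpp
#include <type_traits>

int arr[10];   // plays the role of "array" in the table above

// &arr is a pointer to the whole array, not to its first element.
static_assert(std::is_same<decltype(&arr), int (*)[10]>::value,
              "&arr has type int (*)[10]");

int* ptrToFirst() {
    int* ptr = arr;   // "ptr = &arr;" would not compile as C++
    return ptr;
}

int* someFunc(void* x) { return static_cast<int*>(x); }

// For a function, f and &f give the same pointer, so the & is redundant.
bool sameFunctionPointer() {
    int* (*p1)(void*) = someFunc;
    int* (*p2)(void*) = &someFunc;
    return p1 == p2;
}
```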

4 Conclusion
We have attempted to provide enough functionality to cover all the major migration issues
discussed in the course and in the project specification. Significant work remains to
cover the other cases. As the old adage goes: “For every rule (or function), there is an
exception!”
