
Migrating from Legacy to SoA (Invited Talk)

Harry M. Sneed
SoRing Kft.
Budapest, Hungary
Harry.Sneed@T-Online.de

978-1-4673-7935-9/15/$31.00 © 2015 IEEE, MESOCA 2015, Bremen, Germany

Abstract—This presentation discusses a strategy for migrating to a service-oriented architecture. The starting point is legacy code in a procedural or object-oriented language. The result is a set of web services that can be accessed in a private or public cloud. The technique used is to cut out selected portions of code and to wrap them behind a service interface. The code itself can be left in the original language. The service interface is in WSDL. The speaker describes how to go about selecting code for reuse and how to extract that code from its current environment. Case studies are given for the languages COBOL and Java. The presentation then goes on to describe how to test the services using a web service testing tool, which generates artificial requests from the service interface definition and validates the responses against the assertions provided by the tester.

Index Terms—SoA Migration, Legacy Code, Service extraction, Service wrapping, Interface generation, WSDL, Web-Service testing, Test automation, Test coverage measurement.

I. REUSING LEGACY CODE AS SERVICES

One way to get to a service-oriented architecture is by reusing the existing code base to create services. The idea is not new. It goes back to the very beginning of the object-oriented movement. At that time developers were faced with the problem of what to do with the existing procedural code. It was Wally Dietrich who, at an OOPSLA conference in 1988, suggested wrapping it [1]. By wrapping it behind an object interface, a legacy program could be reused as an object among other objects in an object-oriented architecture. Of course, this was not so simple. The legacy source code had to be altered to fit the interface. In a paper entitled “Encapsulating Legacy Software for Reuse in Client/Server Systems”, published in 1996, H. Sneed described a source reengineering approach that replaces file and screen interfaces with call interfaces in order to access the code from another program [2]. This approach was refined and used in later projects to reuse COBOL programs as web services [3].

Now, 20 years after the object-oriented transition, the software world is faced with another transition, from object orientation to service orientation. Objects should now be converted to services which can be accessed from anywhere within an enterprise architecture. Besides being universally accessible, services are also coarser-grained than objects, which encapsulate single data elements or small groups of data such as an address. Services can contain whole applications, such as a travel booking with many points of entry. Each entry is defined as an operation. Booking a flight is only one of many travel operations that can be invoked. Each operation has its own parameters, rules and return values.

A service-oriented architecture is intended to support the enterprise business process. It is the task of the IT department to provide the services. It is up to the user business departments to model their processes. In the end the two worlds should meet. There are two ways of reaching this end:
• Top-down from the process model to the services
• Bottom-up from the services to the process model.

The top-down approach implies that first the business processes are modeled and then the services are implemented to fit that model. The service developers customize their software to satisfy the requirements of the model. Any time the model is changed, the underlying services are changed to follow it. This can also be referred to as the model-driven approach [4].

The bottom-up approach assumes that first the services exist and then the business processes are modeled in a way that fits the existing services. In this way the same common services can be reused to satisfy the requirements of different business processes. Not only standard services but also existing legacy services can be used. When the standard or legacy services change, the business process model has to be adapted to reflect that change. This is called the service-driven approach. For this approach, the services must first be collected and made available. As has been pointed out in previous papers, a prime source for web services is the existing code base [5]. Web services can be identified in and extracted from existing source code, whether it is procedural or object-oriented.

Business analysts prefer the top-down model as long as they are not paying for the development of the services, since this approach gives them the maximum freedom to shape the business processes any way they want. The bottom-up approach is preferred by the IT departments, since it allows them to optimize their service offer. In both cases the process model and the services have to be joined together, both physically and logically. Physically they are joined by a message sending mechanism or a service invocation: the business process sends a request to the desired service and receives back a response. This communication is normally handled by an enterprise service bus. Logically they are joined by connecting the business process model with the model of
the underlying services. By linking the two model descriptions together it becomes possible to trace changes from one model to the other and to analyze the impact of a change in one model on the other model. The linkage of the two models is an essential prerequisite to the maintenance and evolution of the service architecture as a whole.

The language of the services is the programming language in which they are implemented. It could be Java, C#, COBOL, PL/I or any other programming language. It could also be a modeling language such as UML. Newer applications may be depicted in UML, but that is seldom the case. Older applications are described only by the programming language they are written in. That means, in order to obtain a description of the existing software, it is necessary to extract it from the code by means of reverse engineering. As Alan Perlis once noted, the only true description of a program is the code itself [6].

II. REUSING PROCEDURAL CODE

Legacy procedural languages such as COBOL, PL/I and RPG have separately compilable source code members. Their compiled objects are linked together to create a load module or run-time unit. One of the modules acts as the main module and calls the others. The linked run-time unit is referred to as a program; its parts are referred to as modules. At program level we have data objects in the form of structures. Structures can be nested, so there are objects within objects, some of which may have multiple occurrences. Some of the objects, such as file records, database records, maps and reports, are imported via an input interface. The same or other records of the same type are exported via an output interface. Objects may also be received and returned via a call interface. Objects are as a rule singletons, but with an occurrence attribute they can have many instances. Some structures correspond to business objects, but most are strictly of a technical nature.

In COBOL and RPG the procedural code is separated from the data descriptions. The procedural part is broken up into procedural blocks, each with its own unique label. Procedural blocks, i.e. paragraphs in COBOL or Begin blocks in PL/I, are executed in sequence, but they can also be performed with a return to the next statement or be branched to via a GOTO with no return. In PL/I the procedural blocks may also contain local data, which is allocated when the block is invoked and released when the block is terminated. The code within the procedural blocks contains the conditions for executing statements. Some, but not all, of these conditions have to do with business logic and can be considered to be business rules. The statements for sending and receiving data, for accessing databases and files and for calling other programs are nested inside the procedural blocks, where they can be conditional or unconditional. Before commencing with the analysis of the code, the access operations to the controlling files and maps are wrapped as depicted below.

    WRAP* READ DISPATCH-FILE
    WRAP* AT END MOVE '09' TO DISPATCH-STATUS
    WRAP  MOVE 'RD' TO XML-FUNCODE
    WRAP  MOVE ZEROES TO XML-RETCODE
    WRAP  MOVE 'DISPATCH' TO XML-FILNAME
    WRAP  MOVE 2048 TO X-REC-LNG
    WRAP  ENTRY 'DISPATCH' USING XML-FUNCODE,
          XML-RETCODE, XML-FILNAME, LIEFERPOSTEN,
          DISPATCH-STATUS

Here the original READ statement (the lines marked WRAP*) is converted to an ENTRY statement so that this procedure can be called with a stream of data from outside. The type of wrapping differs between online programs, batch programs and subprograms [7].

The statements within the procedural blocks can be considered to be business dependent if they refer to a business object; otherwise they are implementation dependent. Next to the naming of data, the intertwining of business and technical statements within the same procedural blocks is the greatest obstacle to mapping procedural code onto web services. For this reason the code needs to be refactored before wrapping. Technical code blocks should be factored out and placed in separate service routines. It may even be necessary to retest the re-engineered module before going on [8].

III. REUSING OBJECT-ORIENTED CODE

The approach pursued here comprises three language transformations. In the first transformation the method interfaces are transformed into a relational table. In the second transformation a WSDL interface definition is generated from that relational table. In the third transformation, both a BPEL business process to use the service and a test script to test the service are produced from the WSDL service interface definition (see Figure 1).

Figure 1: Interface Code Transformations

To generate web-service interfaces automatically from existing object-oriented code, it is necessary to parse that code to identify all publicly declared methods. It is relatively easy to convert Java, C++ or C# method interfaces into a WSDL operation interface. The name of the method becomes the name of the operation, and the names and types of the arguments become the input parameters in WSDL.
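The mapping just described, in which a public method's name becomes the operation name and its arguments become the input parameters, can be illustrated with a short Java sketch (Java 17). This is not the SofReuse implementation. It is a minimal example under stated assumptions: a regular expression stands in for a real parser, so only simple signatures are handled (no annotations, no generic types containing commas, no varargs). The mapping of Java types to XML Schema types (`XSD_TYPES`, and the `tns:` prefix for every other type) is a hypothetical convention, and all class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: maps public Java method declarations to WSDL input parameter types.
// Assumes simple signatures (no annotations, no generics with commas, no varargs).
public class SignatureToWsdl {

    record Param(String type, String name) {}

    record Operation(String name, String resultType, List<Param> inputs) {}

    // public [static|final|synchronized ...] <resultType> <methodName> (<arguments>)
    private static final Pattern PUBLIC_METHOD = Pattern.compile(
            "\\bpublic\\s+(?:(?:static|final|synchronized)\\s+)*"
            + "([\\w.\\[\\]]+)\\s+(\\w+)\\s*\\(([^)]*)\\)");

    // Hypothetical mapping of a few Java types to XML Schema built-ins; every other
    // type is assumed to be defined in the generated schema under the tns: prefix.
    private static final Map<String, String> XSD_TYPES = Map.of(
            "int", "xsd:int", "long", "xsd:long", "boolean", "xsd:boolean",
            "double", "xsd:double", "String", "xsd:string");

    static List<Operation> extractOperations(String source) {
        List<Operation> operations = new ArrayList<>();
        Matcher m = PUBLIC_METHOD.matcher(source);
        while (m.find()) {
            List<Param> inputs = new ArrayList<>();
            String args = m.group(3).trim();
            if (!args.isEmpty()) {
                for (String arg : args.split(",")) {
                    String[] tokens = arg.trim().split("\\s+");
                    // The last token is the argument name; the one before it is its type.
                    inputs.add(new Param(tokens[tokens.length - 2], tokens[tokens.length - 1]));
                }
            }
            // Method name -> operation name; arguments -> input parameters.
            operations.add(new Operation(m.group(2), m.group(1), inputs));
        }
        return operations;
    }

    // Emits the input parameter type in the style of Person_Input_Params.
    static String inputSchema(Operation op) {
        StringBuilder sb = new StringBuilder();
        sb.append("<complexType name=\"").append(op.name()).append("_Input_Params\">\n");
        sb.append("  <sequence>\n");
        for (Param p : op.inputs()) {
            String xsdType = XSD_TYPES.getOrDefault(p.type(), "tns:" + p.type());
            sb.append("    <element name=\"").append(p.name())
              .append("\" type=\"").append(xsdType).append("\"/>\n");
        }
        sb.append("  </sequence>\n");
        sb.append("</complexType>");
        return sb.toString();
    }

    public static void main(String[] args) {
        String source = """
                public class Bausparer {
                    public void Bausparer(VarChar cRole, Nummer nVersion, int iSeqNumber) { }
                    private void helper(int x) { }
                }
                """;
        for (Operation op : extractOperations(source)) {
            System.out.println(inputSchema(op));
        }
    }
}
```

The private `helper` method is skipped because only public methods qualify as operations. A production tool would parse the source into a syntax tree, as the code repository described in Section IV does, rather than rely on a regular expression. The regex only serves to keep the sketch self-contained.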
The conversion of the return results is more problematic. Since there may be several return statements within a single method, each returning a different result of the same type, these have to be collected together and converted into a set of alternate output parameters in WSDL. The caller will recognize which one is the actual result, since the others will be empty. The data type declarations in the WSDL schema are taken from the argument and result types defined in the method declaration:

    <resultType> <methodName> (<arguments>)

Another problem comes up with polymorphic methods. Methods with the same name can appear in many classes. Therefore it is necessary to qualify the operation names in WSDL with the names of the classes to which the methods belong. Thus, from the viewpoint of the service user, they are different operations. The service user only needs to assign the arguments and to invoke the operation, as is done in BPEL. Of course, the service user must first invoke the constructor operation before starting to invoke the others. For this, a special constructor operation for each reused object is inserted into the service interface. This is another difference from procedural wrapping, where the program state is singular and static. In the case of objects there may be multiple instances, and the service user must know which instance is currently active. For that, an object identifier is required as a prefix to the operation name. If the object is derived from a higher-order object, i.e. it is inherited, the qualifying name must include the full path to that operation, e.g.

    <class_A.class_A1.class_A11.Operationname>

The names of the superordinate classes are collected by walking up the class hierarchy tree contained within the system. This may result in very long operation names, but it is necessary in order to uniquely identify each and every operation.

Identifying all of the outputs of an operation presents another challenge. As already mentioned, the return statements may appear anywhere within the method being reused. They must be collected together with their types to define a union of alternate output variables. Redefined (alternative) data types can be declared in XML Schema with the choice element, meaning that the data element is either one or the other. The data type for the return result (here named "returnvalue") may be defined in the XSD schema as:

    <xsd:complexType name="returnvalue">
      <xsd:choice>
        <xsd:element name="resulta"/>
        <xsd:element name="resultb"/>
        <xsd:element name="resultc"/>
      </xsd:choice>
    </xsd:complexType>

The invoking business process must know which of the return value types to expect.

IV. STEPS TO WRAPPING WEB SERVICES

There are seven steps in all to wrapping existing code for reuse as web services:
a) Parsing the code to populate a repository
b) Processing the repository to collect all entries to and exits from the code
c) Processing the repository to collect all data types referred to
d) Generating the type definitions of the message data
e) Generating the WSDL operations and messages
f) Documenting the interfaces to the code
g) Generating a test script.

A. Populate the Code Repository

The first step is to parse the code to populate a repository. Every statement is analyzed to determine whether it is relevant to the structure of the code. If so, it becomes an entry in the repository table. In the end the repository is a mirror of the code structure: an abstract syntax tree without all of the details.

    public void Bausparer (VarChar cRole,
                           Nummer nVersion,
                           TimeStamp tsReadTime,
                           int iSeqNumber,
                           Nummer nGrdNr,
                           Nummer nBBZS,
                           VarChar cNachname,
                           VarChar cVorname,
                           VarChar cKzTitel,
                           Datum dGeburtsdatum,
                           boolean bKzArchiv,
                           Adresse oAdresse
                          ) {

B. Process the Repository in the First Pass

The second step is to go through the repository and collect all of the entries (public interfaces) and exits (returns). These are marshaled into an internal operation table for further processing.

    CLAS;Bausparer ;OWNS;FUNC;Bausparer
    FUNC;Bausparer ;USES;INTR;Bausparer
    INTR;Bausparer ;RECV;PARM;cRole
    PARM;cRole ;USES;TYPE;VarChar
    INTR;Bausparer ;RECV;PARM;nVersion
    PARM;nVersion ;USES;TYPE;Nummer
    INTR;Bausparer ;RECV;PARM;tsReadTime
    PARM;tsReadTime ;USES;TYPE;TimeStamp
    INTR;Bausparer ;RECV;PARM;iSeqNumber
    PARM;iSeqNumber ;USES;TYPE;int
    INTR;Bausparer ;RECV;PARM;nGrdNr
    PARM;nGrdNr ;USES;TYPE;Nummer
    INTR;Bausparer ;RECV;PARM;nBBZS
    PARM;nBBZS ;USES;TYPE;Nummer
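The first pass can be illustrated with a small Java sketch (Java 17) that reads entries in the five-field format shown above (kind; name; relation; target kind; target name) and builds an operation table. The reading of the relations is inferred from the excerpt: CLAS OWNS FUNC, FUNC USES INTR, INTR RECV PARM and PARM USES TYPE. Exits (return statements) are not handled because the excerpt shows none, and parameter names are assumed to be unique. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the first repository pass for entries of the form
// <kind>;<name>;<relation>;<targetKind>;<targetName>. Only entries are handled.
public class RepositoryFirstPass {

    record Entry(String kind, String name, String relation, String targetKind, String targetName) {
        static Entry parse(String line) {
            String[] f = line.split(";");
            if (f.length != 5) {
                throw new IllegalArgumentException("unexpected repository entry: " + line);
            }
            // Names are padded with blanks in the repository, so every field is trimmed.
            return new Entry(f[0].trim(), f[1].trim(), f[2].trim(), f[3].trim(), f[4].trim());
        }

        String key() { return kind + "/" + relation + "/" + targetKind; }
    }

    // Returns qualified operation name -> ordered "parameter:type" inputs.
    static Map<String, List<String>> buildOperationTable(List<String> lines) {
        List<Entry> entries = lines.stream().map(Entry::parse).toList();
        Map<String, String> classOfFunc = new HashMap<>();     // FUNC -> owning CLAS
        Map<String, String> funcOfInterface = new HashMap<>(); // INTR -> FUNC using it
        Map<String, String> typeOfParam = new HashMap<>();     // PARM -> TYPE (names assumed unique)
        for (Entry e : entries) {
            switch (e.key()) {
                case "CLAS/OWNS/FUNC" -> classOfFunc.put(e.targetName(), e.name());
                case "FUNC/USES/INTR" -> funcOfInterface.put(e.targetName(), e.name());
                case "PARM/USES/TYPE" -> typeOfParam.put(e.name(), e.targetName());
                default -> { }
            }
        }
        Map<String, List<String>> table = new LinkedHashMap<>();
        for (Entry e : entries) {
            if (!e.key().equals("INTR/RECV/PARM")) {
                continue;
            }
            String func = funcOfInterface.getOrDefault(e.name(), e.name());
            String owner = classOfFunc.get(func);
            // Qualify the operation with its class, as required for polymorphic methods.
            String operation = owner == null ? func : owner + "." + func;
            String type = typeOfParam.getOrDefault(e.targetName(), "UNKNOWN");
            table.computeIfAbsent(operation, k -> new ArrayList<>()).add(e.targetName() + ":" + type);
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> repository = List.of(
                "CLAS;Bausparer ;OWNS;FUNC;Bausparer",
                "FUNC;Bausparer ;USES;INTR;Bausparer",
                "INTR;Bausparer ;RECV;PARM;cRole",
                "PARM;cRole ;USES;TYPE;VarChar",
                "INTR;Bausparer ;RECV;PARM;nVersion",
                "PARM;nVersion ;USES;TYPE;Nummer");
        // Prints: Bausparer.Bausparer [cRole:VarChar, nVersion:Nummer]
        buildOperationTable(repository).forEach((op, params) -> System.out.println(op + " " + params));
    }
}
```

Two passes over the same entry list are used because a parameter's type entry may follow the entry that declares it. A `LinkedHashMap` keeps the operations, and a list keeps the parameters, in repository order, which preserves the argument order of the original method.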
C. Process the Repository in the Second Pass

The third step is to go through the repository and collect all of the data types referred to in the interface and return statements, i.e. the elementary variables, structures, arrays and unions. These definitions are needed to create the WSDL data schema.

D. Generate the XSD Type Definitions

The fourth step is to generate the XSD data schema from the data types in the internal data table. If references are made to data types in external schemas, these schemas are copied into the interface definition source, so that in the end the source includes all data definitions. If the external schemas are changed, they will not affect this interface definition.

    <complexType name="Person_Input_Params">
      <sequence>
        <element name="cRole" type="ns:string"/>
        <element name="nVersion" type="ns:Nummer"/>
        <element name="tsReadTime" type="TimeStamp"/>
        <element name="iSeqNumber" type="ns:int"/>
        <element name="nBBZS" type="ns:string"/>
        <element name="dGeburtsdatum" type="Datum"/>
        <element name="nGrdNr" type="ns:Nummer"/>
        <element name="cNachname" type="ns:string"/>
        <element name="cVorname" type="ns:string"/>
        <element name="cKzSparer" type="ns:string"/>
        <element name="bKzArchiv" type="ns:boolean"/>
        <element name="cOrt" type="ns:string"/>
        <element name="cPlz" type="ns:Nummer"/>
        <element name="cStrasse" type="ns:string"/>
      </sequence>
    </complexType>

E. Generate the WSDL Operations and Messages

The fifth step is to generate the messages and operations. First the messages are created with parts referring to the data types defined in the data schema or to elementary XML types. Then the operations are created with inputs and outputs. The inputs refer to the messages taken from the arguments of the function called. The outputs refer to the predefined return results. The operation itself bears the name of the qualified method, i.e. prefixed with the concatenated names of the classes to which the method belongs. Finally, the SOAP binding is generated to complete the interface definition, and a service name is assigned; the service name is the name of the component or package in which the operations are located.

    <portType name="BAUSPAR_DBBSAG">
      <operation name="Bausparer">
        <input message="tns:Bausparer_Input"/>
        <output message="tns:Bausparer_Output"/>
      </operation>
      <operation name="BausparerMarshalling">
        <input message="tns:BausparerMarshalling_Input"/>
        <output message="tns:BausparerMarshalling_Output"/>
      </operation>
      <operation name="Person">
        <input message="tns:Person_Input"/>
        <output message="tns:Person_Output"/>
      </operation>
    </portType>

F. Document the Interface Definitions

The sixth step is to generate documentation of the generated web service interface. The interface is displayed in tabular form with four columns. The first column contains the service names, which are equivalent to the interface names. To view the contents of an interface, the user selects it. The second column contains the names of the operations belonging to the selected service. Here too, the user can select one. The third column contains the input and output parameters of the selected operation. When the user selects one, the data elements of that input or output are displayed in the fourth column with level, name and type.

In each column the user can view all entries in that column, e.g. all operations or all data elements, and select one of them. The user can also change the names here. If he alters a name, that name is replaced in all instances of that column. This is particularly important for renaming the data elements from old procedural programs. To be useful in a business process model, the names should be meaningful, i.e. self-explanatory.

This hierarchical view of the service interfaces is very important to the reuse of the services, since it gives the potential user insight into the contents of the service. He can readily see what goes in and what comes out. This makes it easier to select which existing services he wants to use. Other WSDL interfaces not taken from the existing code may also be documented in this form, thus giving the user an overall view of all services available to him in a service-oriented architecture (see Figure 4).
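The message and portType generation of step E can be illustrated with a short Java sketch. This is a hypothetical example, not the tool's code. It assumes WSDL 1.1 messages whose parts reference schema types through the `type` attribute. It reuses the `_Input`, `_Output` and `_Input_Params` naming from the excerpts above and adds a `_Result` suffix for the output type purely for illustration. The `qualify` helper builds the class-path-qualified operation names described in Section III.

```java
import java.util.List;

// Illustrative sketch of step E: WSDL 1.1 messages plus a portType for a list of operations.
// The _Result suffix for the output part type is a hypothetical convention.
public class PortTypeGenerator {

    // Builds the qualified operation name from the class path,
    // e.g. class_A.class_A1.class_A11.Operationname
    static String qualify(List<String> classPath, String method) {
        return classPath.isEmpty() ? method : String.join(".", classPath) + "." + method;
    }

    static String generate(String portTypeName, List<String> operations) {
        StringBuilder sb = new StringBuilder();
        for (String op : operations) {
            // Input message: its part refers to the generated parameter type.
            sb.append("<message name=\"").append(op).append("_Input\">\n")
              .append("  <part name=\"parameters\" type=\"tns:").append(op).append("_Input_Params\"/>\n")
              .append("</message>\n");
            // Output message: its part refers to the collected return results.
            sb.append("<message name=\"").append(op).append("_Output\">\n")
              .append("  <part name=\"result\" type=\"tns:").append(op).append("_Result\"/>\n")
              .append("</message>\n");
        }
        // One portType operation per method, pointing at the two messages above.
        sb.append("<portType name=\"").append(portTypeName).append("\">\n");
        for (String op : operations) {
            sb.append("  <operation name=\"").append(op).append("\">\n")
              .append("    <input message=\"tns:").append(op).append("_Input\"/>\n")
              .append("    <output message=\"tns:").append(op).append("_Output\"/>\n")
              .append("  </operation>\n");
        }
        sb.append("</portType>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> ops = List.of(
                "Bausparer",
                "Person",
                qualify(List.of("class_A", "class_A1", "class_A11"), "Operationname"));
        System.out.print(generate("BAUSPAR_DBBSAG", ops));
    }
}
```

Dots in the qualified names are acceptable here because WSDL names are XML NCNames, which may contain "." after the first character. The SOAP binding and service elements that complete the definition are left out of this sketch.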
Figure 4: View of Interface Definition

In March 2012 the Object Management Group released version 1.0 of a new Service-oriented Modelling Language, SoaML, to model the semantics of software services [9]. This language has since been built into several UML modeling tools, including IBM’s Rational Software Architect. For users of these tools it is very important that their service architecture can be modeled with this new standard notation. The tool SofReuse described here has much of the information required to produce that model. It knows the services, the operations, the messages and the parameters. These can be passed over to the SoA modeling tool in the form of an XMI interface file. There they can be represented as a Service Interface prototype, as depicted in Figure 5.

What is not known is the client side, i.e. what requests will be sent in what order and what callback operations are required. These modeling elements must be filled in by the system architect, but at least the server side of the model is there and can be expanded. This is one of the issues to be dealt with in future work.

Figure 5: Service Interface Prototype

G. Creating a Test Script

The final step of the process is to generate a script for testing the encapsulated web service. The script contains assertions for both requests and responses to each of the operations made available. The assertions for the requests assign data to the input parameters. The assertions for the responses validate the output results against the expected values. It is left to the tester to assign specific data values such as strings and numbers. They can of course be randomly generated from the parameter types, but it is better to take the test values from previous tests of the original code. If a test driver such as JUnit was used, the assertions of that test can be taken over. In any case, whether the data is automatically generated, copied from previous tests or assigned manually, it is important to test each and every wrapped operation before making it available in the SOA. There are several tools for testing web services; the one used here is WebsTest [10].

    service: BausparerSOAPServerservice;
    if (testcase = "Bausparer001");
    // It should be possible to select a Bausparer by GrdNr
    // and BBZS
    if ( operation = "Bausparer");
    if ( request = "Bausparer1Request");
    if ( object = "Person" );
    assert inp.cRole = "Besitzer";
    assert inp.nGrdNr = "4711";
    assert inp.nBBZS = "120036";
    assert inp.tsReadTime = "201303130700";
    assert inp.iSeqNumber = "100";
    endObject;
    endRequest;
    if ( response = "Bausparer1Response");
    assert out.$ResponseTime < "1200";
    if ( object = "return" );
    assert out.cNachname = "Schmidt";
    assert out.cVorname = "Karl";
    assert out.dGeburtsdatum = "19220420";
    endObject;
    endResponse;
    endOperation;
    endCase;

V. TESTING THE NEW SERVICES

A recommended method of testing web services is to use a test script to simulate all potential clients [11]. This approach also allows a more fine-grained measurement of test coverage. The frame of the test script is generated automatically when the service interface is generated. In the test script, values are assigned to the input parameters of the service. Some values are singular; others are ranges of values or sets of representative values. There is an n:1 relationship between data values and data types. The lowest level of data coverage is to assign each input parameter any value at all. This is referred to as parameter coverage. The next higher level is to assign each specified input value at least once. This is referred to as value coverage. The highest level of data coverage is to assign each possible combination of values within a message. This is
referred to as state coverage. In the case of value coverage the number of test requests generated is additive; in the case of state coverage it is exponential [12].

When preparing the test procedures to test a service, the testers should be aware of which coverage level they want to achieve:
• parameter coverage,
• value coverage or
• state coverage.
This will influence how they write the data assignments: either with single values or with arrays of values.

The same rules apply to the output of a service. In the test procedure the testers check whether a returned value is a member of the set of expected values. This validates that the result is correct. The lowest coverage level for output is to check that the response parameters have been set at all, with any value. The next higher coverage level is reached when the return values are checked against a specified combination. The testers must formulate postconditions which cover all return values and, if necessary, all combinations of return values.

Thus, on the output side the tester has a choice between
• parameter coverage,
• value coverage and
• state coverage.
This may vary from operation to operation, depending on the criticality of the operation.

VI. CONCLUSIONS AND FURTHER RESEARCH

This presentation has described a tool-supported method for creating a set of SoA services from existing code. The legacy code can be either procedural or object-oriented. Procedural code must first be re-engineered to remove unwanted dependencies. It is especially important to separate the business logic from the technical processing logic. In the case of object-oriented code, the public methods can be selected for reuse, but here too dependencies must be removed. Then the code can be wrapped, and the wrapping interfaces are stored in a SoA service repository. From there the final WSDL interfaces and the test scripts to test them are generated. In addition, documentation of the services and their interfaces is provided. In the end, the user has a good starting point for building up an enterprise-wide service architecture.

Future work will refine the wrapping algorithms. It has proved to be very difficult to isolate units of code, especially in object-oriented languages, due to the many foreign method invocations. More research is required on how to reduce code dependencies and to isolate legacy code functions from their environment. That would facilitate the reuse of existing code and reduce the costs of migrating to SoA.

Finally, more work needs to be done in validating and evaluating the imported services. It must be demonstrated that the new services perform exactly the same as the old code procedures and methods; they must be functionally equivalent. Once these problems are solved, the costs of migrating to SoA will be much lower.

REFERENCES

[1] Dietrich, W.: “Saving a Legacy System with Objects”, Proc. of OOPSLA-88, ACM Press, New York, 1989, p. 54
[2] Sneed, H.: “Encapsulating Legacy Software for Reuse in Client/Server Systems”, Proc. of 3rd WCRE, IEEE Computer Society Press, Monterey, CA, Nov. 1996, p. 104
[3] Sneed, H.: “Integrating legacy Software into a Service oriented Architecture”, Proc. of CSMR-2006, IEEE Computer Society Press, Bari, March 2006, p. 3
[4] Winter, A., Ziemann, J.: “Model-based Migration to Service-oriented Architectures”, Proc. of SOAM Workshop, CSMR-2007, Amsterdam, p. 107
[5] Hasselbring, W., Conrad, S., Koschel, A.: Enterprise Application Integration, Elsevier Akademischer Verlag, Heidelberg, 2006
[6] DeMillo, R., Lipton, R., Perlis, A.: “Social Processes and Proofs of Theorems and Programs”, Comm. of ACM, Vol. 22, No. 5, May 1979, p. 22
[7] Sneed, H.: “Wrapping Legacy COBOL programs behind an XML Interface”, Proc. of Working Conference on Reverse Engineering, IEEE Computer Society Press, Stuttgart, Oct. 2001, p. 189
[8] Canfora, G., Cimitile, A., Munro, M.: “Reverse-engineering and Reuse re-engineering”, Journal of Software Maintenance, Vol. 6, No. 2, March 1994
[9] Fischbach, M., Puschmann, T., Alt, R.: “Service LifeCycle Management”, Wirtschaftsinformatik, No. 1, Feb. 2013, p. 51
[10] Sneed, H., Huang, S.: “WSDLTest – A tool for testing Web Services”, Proc. of WSE-2006, IEEE Computer Society Press, Philadelphia, Sept. 2006, p. 14
[11] Tsai, W.T., Zhou, X., Chen, Y.: “On testing and evaluating service-oriented software”, IEEE Computer Magazine, August 2008, p. 40
[12] Sneed, H.: “Testing Web Services in the Cloud”, in Software Testing in the Cloud, Ed.: S. Tilley & T. Parveen, IGI Global, 2013, pp. 136-172
