Migrating Enterprise Legacy Source Code to Microservices: On Multi-Tenancy, Statefulness and Data Consistency

Article in IEEE Software · June 2018
DOI: 10.1109/MS.2017.440134612


Andrei Furda, Colin Fidge, Olaf Zimmermann, Wayne Kelly and Alistair Barros

Abstract: Microservice migration is a promising technique to incrementally modernise monolithic legacy enterprise applications and enable them to exploit the benefits of Cloud computing environments. In this article we elaborate on three challenges of microservice migration: multi-tenancy, statefulness and data consistency. We show how to identify each of these challenges in legacy code and explain refactoring and architectural pattern-based migration techniques relevant to microservice architectures. We explain how multi-tenancy enables microservices to be utilised by different organisations with distinctive requirements, why statefulness affects both availability and reliability of a microservice system and why data consistency challenges are encountered when migrating legacy code that operates on a centralised data repository to microservices operating on decentralised data repositories. We also explain the interdependencies between multi-tenancy, statefulness and data consistency.

Index Terms: Microservices, Multi-Tenancy, Refactoring, Architectural Patterns, Statefulness, Data Consistency

Fig. 1. Legacy (e.g., monolithic) architecture compared to a Microservices architecture.
1 INTRODUCTION

Modernisation of legacy enterprise systems is a challenge faced by many organisations. It is needed in order to meet high scalability and high availability needs by exploiting the new Cloud computing technologies. Assuming that the legacy system’s source code is available and maintainable, Microservices [19], [14] are a promising solution, in which centralized services are reimplemented as multiple independent services (Figure 1). Microservices support incremental modernisation, leading to highly scalable systems [10] with high availability through redundancy of service instances and reduced costs. Microservices also facilitate the low-risk, small-scale incremental modernisation that is often preferred to large-scale approaches [11].

In this article we explain three closely related challenges of microservice migration: (i) multi-tenancy, (ii) statefulness and (iii) data consistency.

Challenges of Microservice Migration

Multi-Tenancy is a system’s ability to fulfil the requirements of multiple groups of service consumers, organisations, and even competitors in an industry. It enables organisations that demand complete autonomy in the administration of their users and associated data to share access to the same physical instances of the system and to the application instances, while keeping their data strictly separated and ruling out any superuser having control and access over it. Multi-tenant applications must be highly configurable with tenant-specific settings. Legacy enterprise applications have often been designed for single organisations and operate in single-tenant mode.

Statefulness in the context of microservices is the ability to retain state information that was generated previously. The response of a stateful microservice may depend not only on the most recent service request, but also on the retained state from previous interactions. Ideal microservices are stateless, but monolithic legacy code is often stateful.

Data Consistency is enforced by a system’s ability to affect data stored in a shared repository only in allowed ways. Data consistency defects can be introduced by mistake when migrating sequential legacy code that accesses a centralised data repository to microservices that access decentralised data repositories.
To improve performance and scalability, microservices rely on decentralised data repositories, which leads to data consistency challenges when synchronising the microservice data to a centralised database. Concurrent database access is rarely supported in sequential legacy code, while microservices are designed to be scaled-out, allowing multiple microservice instances to operate in parallel.

2 MULTI-TENANCY

A multi-tenant software-as-a-service application fulfils the needs of multiple groups of users, organisations or departments. The application-level multi-tenancy model allows application instances to be shared by multiple tenants. These instances can be configured to meet the tenants’ requirements [12], [8]. While the application instances are shared, the tenant data must be separated and must be only accessible by the tenant that owns it [2] (Figure 2). In addition to the separation of tenant data, a multi-tenant environment should ensure that computing resources are equally distributed between the tenants. Performance separation ensures that a demand increase of resources by one tenant does not negatively affect other tenants [13], [15].

Fig. 2. Multi-tenant enterprise SaaS application architecture [2] [8].

2.1 Single/Multi-Tenancy challenges in legacy source code

Legacy enterprise applications often operate in single-tenant mode. They were not developed for modern multi-tenant Cloud environments and they usually do not support multi-tenancy. Single-tenant legacy applications access the data repository of a single organisation.

It is essential that the migration of single-tenant legacy code to microservices also considers multi-tenancy. Multi-tenant microservice instances can be shared by multiple tenants, configured to meet individual requirements, while strictly separating the tenants’ data.

To ensure tenant data isolation, the source code needs to ensure that the data associated with tenant-aware input channels is tagged with the correct tenant ID, and that this tenant tag is maintained unmodified until the data reaches a tenant-aware output channel. At the source code level, tenant-aware business objects need to be tagged with the correct tenant context as follows (Listing 1).

Listing 1. Multi-tenancy through tenant-aware I/O channels.
    processInputChannel(TenantAwareBO input) {
        // 1. retrieve the tenant context
        tenantID = getTenantID();
        // 2. set the tenant context
        input.setTenantID(tenantID);
        ...
    }
    processOutputChannel(TenantAwareBO output) {
        ...
        // 1. retrieve the tenant context
        tenantID = getTenantID();
        // 2. set the tenant context
        output.setTenantID(tenantID);
        sendResponse(output);
    }

Procedures that process tenant-aware input data (e.g., service requests, user interface inputs, database read operations) need to retrieve the tenant ID from the development framework and assign it to the input channel. For example, if the multi-tenant microservice is hosted by the Google App Engine, the tenant ID is retrieved from the Google Apps domain [17].

Similarly, procedures that process tenant-aware data outputs (e.g., service responses, user interface outputs, database write operations) retrieve the tenant ID and assign it to the output channel. For example, Windows Azure requires setting the tenant context before calling the storage API [17].

In summary, the multi-tenant source code first retrieves the tenant ID (tenant context) before receiving data from input channels, sets the tenant context for all required tenant-aware business objects, and sets the tenant context for all data that is sent to output channels.

2.2 Pattern-based microservice migration of single-tenant legacy source code

Multi-tenancy challenges can be decomposed into sub-problems that can be solved using a combination of existing architectural patterns. In previous work [8] we have shown how to enable multi-tenancy by applying architectural patterns for enterprise applications and architectural refactorings. We focused on enabling multi-tenancy in components; however, the techniques and architectural patterns can also be applied to multi-tenant microservices.

The microservice’s data access component to a multi-tenant database can be implemented using the “Two-Level Data Mapping Gateway” pattern [8], a combination of the “Data Mapper” pattern and the “Table Data Gateway” pattern [5] (Figure 3). The first (Data Mapper) stage maps domain objects to the database gateway and implements basic ‘create’, ‘read’, ‘update’, ‘delete’ (CRUD) operations, while the second (Gateway) stage loosely couples the access to a multi-tenant data repository and ensures the separation of tenant data.

The operational logic component of a multi-tenant microservice can be implemented using the “Strategy” pattern [5]. This pattern allows tenant-specific logic implementations to be modified or extended without affecting other parts of the implementation [8] (Figure 4).

Fig. 3. Multi-Tenant MVC Model.

Fig. 4. Multi-Tenant MVC Controller.

Fig. 5. Multi-Tenant MVC View.

Fig. 6. Availability and reliability through redundant (stateless) microservices.

The user interface component of a multi-tenant microservice can be implemented using the “Two-Step-View” pattern [8]. This pattern facilitates the decoupling of user interface data from the layout. It therefore allows the flexible implementation of tenant-specific user interface layouts using the same user interface data (Figure 5).
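
The two stages of the “Two-Level Data Mapping Gateway” pattern described in Section 2.2 can be illustrated with a minimal sketch. It is written in Python for brevity and is not the implementation from [8]: the names Customer, TenantTableGateway and CustomerMapper are hypothetical, and an in-memory dictionary stands in for the multi-tenant database. The sketch only shows the division of responsibilities: the mapper handles CRUD on domain objects, while the gateway is the single place where the tenant ID is attached to every data access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical tenant-aware domain object."""
    customer_id: int
    name: str


class TenantTableGateway:
    """Second stage (Gateway): the only code that touches the shared store.

    Rows are keyed by (tenant_id, row_id), so a caller cannot address
    another tenant's rows without passing that tenant's ID.
    """

    def __init__(self):
        self._rows = {}  # (tenant_id, row_id) -> column dict (assumed storage)

    def upsert(self, tenant_id, row_id, columns):
        self._rows[(tenant_id, row_id)] = dict(columns)

    def find(self, tenant_id, row_id):
        row = self._rows.get((tenant_id, row_id))
        return None if row is None else dict(row)

    def remove(self, tenant_id, row_id):
        self._rows.pop((tenant_id, row_id), None)


class CustomerMapper:
    """First stage (Data Mapper): CRUD on domain objects for one tenant."""

    def __init__(self, gateway, tenant_id):
        self._gateway = gateway
        self._tenant_id = tenant_id  # tenant context fixed at construction

    def create(self, customer):
        self._gateway.upsert(self._tenant_id, customer.customer_id,
                             {"name": customer.name})

    def read(self, customer_id):
        row = self._gateway.find(self._tenant_id, customer_id)
        return None if row is None else Customer(customer_id, row["name"])

    def update(self, customer):
        self.create(customer)  # overwrite the existing row

    def delete(self, customer_id):
        self._gateway.remove(self._tenant_id, customer_id)


# Two tenants share one gateway instance but never see each other's rows.
gateway = TenantTableGateway()
acme = CustomerMapper(gateway, tenant_id="acme")
globex = CustomerMapper(gateway, tenant_id="globex")
acme.create(Customer(1, "Alice"))
globex.create(Customer(1, "Bob"))
```

Both tenants use the same customer ID without collision because the gateway, not the calling code, combines it with the tenant ID. In a real migration the gateway would wrap the shared database rather than a dictionary.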
3 STATEFULNESS

A stateful system produces outputs that depend on the state generated in previous interactions and conversations. For example, an e-commerce application is stateful if it ‘remembers’ previous shopping activities and past visits to the online shop.

3.1 Why stateless microservices are ideal

In many cases stateless microservices are ideal because they better exploit the benefits of Cloud computing, such as on-demand elasticity, load balancing, high availability and high reliability through redundancy. Stateful microservices, on the other hand, require more complex logic for managing the state, are not scalable and do not facilitate high availability and high reliability [9]. For example, state information has to be made persistent and synchronized in hot standby configurations and recreated in failover scenarios.

Even without failure, statefulness leads to “session affinity”, which can decrease throughput and increase latency due to the need to wait for specific stateful instances. On the other hand, stateless microservices can decrease the average response time by distributing the load among the available microservices, without the need for complex state management and state synchronisation techniques.

In a typical microservice architecture (Figure 6), service consumer requests are directed to a load balancer that routes them to available microservice instances. Assuming that all microservice instances have a similar mean time between failures (MTBF), the system availability (Equation 1) [1] can be increased by reducing the mean time to repair (MTTR). This can be achieved by allowing the load balancer to stop routing incoming consumer requests to failed microservice instances and to re-route them to operating instances instead. The load balancer is only able to immediately route requests to already deployed redundant microservices if these are stateless.

\[ \text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \qquad (1) \]

\[ \text{Reliability} = \frac{\text{Successful Responses}}{\text{Total Requests}} \times 100\% \qquad (2) \]

Stateless microservices also achieve a higher system reliability through redundancy. The reliability of a system is defined as the ratio of the number of successful responses to the total number of requests (Equation 2) [1]. If the services are stateless, the load balancer can increase the probability of receiving a successful response from one of the available microservices by sending the same request to multiple available microservices, without the need to synchronize or replicate session state in a session database.

3.2 Statefulness in legacy code

In the context of extracting microservices, legacy source code is stateful if it is capable of retaining values (i.e., its state) between invocations and is able to generate outputs that depend not only on the current input parameters, but also on the previously retained state (Figure 7).

3.2.1 Statefulness in the context of microservices

When analysing the statefulness of legacy code for the purpose of microservice extraction, the inclusion or exclusion of state variable definitions (i.e., in-memory session state) determines whether or not the analysed code is stateful with respect to a specific state variable. Figure 7 depicts this in an example. Session state variable s1 is defined within the analysed code, a second state variable s2 is defined outside the analysed code (this could be either session or domain state) and a third domain state value s3 is stored in a database. Microservice option A includes only session state variable s1 and therefore microservice A is stateful only with respect to s1 (writing to s2 or s3 can be considered an output). Microservice option B is stateful with respect to s1 and s2, and microservice option C (which includes the database) is stateful with respect to s1, s2 and s3.

Fig. 7. Statefulness in the context of microservice extraction.

3.2.2 Stateful procedures and objects

Legacy source code can be stateful at the procedure level, object level, or component level. A stateful procedure returns values that depend on variables whose lifetime exceeds the procedure execution, i.e., variables that are not reset between the invocations of the procedure. In PHP, a stateful procedure can be implemented by declaring a static variable in a procedure (Listing 2, function p1), or as in most languages by declaring a global variable outside the procedure (Listing 2, function p2). A procedure can itself become stateful by invoking and returning values of other stateful procedures (Listing 2, function p3).

Listing 2. Examples of stateful procedures.
    function p1() {
        // static variable: keeps its value between invocations
        static $static_state = 0;
        $static_state++;
        return $static_state;
    }

    $glbl_state = 0;
    function p2() {
        // global variable: declared outside the procedure
        global $glbl_state;
        $glbl_state++;
        return $glbl_state;
    }

    function p3() {
        // stateful because it returns the value of the stateful p1
        return p1();
    }

The most common form of legacy code statefulness is found in objects (class instances). An object can be stateful by instantiating a class that implements stateful procedures (Section 3.2.2) or one with (state) properties (Listing 3).

State properties can be static or non-static. Static properties lead to class-level statefulness that affects all instances of a class when modified (Listing 3, class StatefulStatic). In this case, the state can only be changed for all class instances at once, not for individual ones. Non-static properties on the other hand affect individual class instances (Listing 3, class StatefulNonStatic).

Listing 3. Stateful classes using static and non-static state variables.
    class StatefulStatic {
        private static $state;
        public static function setState($s) {
            self::$state = $s;
        }
        public function getState() {
            return self::$state;
        }
    }

    class StatefulNonStatic {
        private $state;
        public function setState($s) {
            $this->state = $s;
        }
        public function getState() {
            return $this->state;
        }
    }
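
To illustrate why the kind of statefulness shown in Listings 2 and 3 interferes with the redundancy discussed in Section 3.1, the following minimal sketch spreads three consecutive calls from one consumer over two service instances. It is written in Python rather than PHP for brevity, and all names (StatefulCounter, StatelessCounter, round_robin) are hypothetical. The round-robin function is a deliberately simplified stand-in for a load balancer, not a model of any particular product.

```python
from itertools import cycle


class StatefulCounter:
    """Mirrors p1 in Listing 2: the count lives inside the instance."""

    def __init__(self):
        self._count = 0

    def handle(self, request):
        self._count += 1
        return self._count


class StatelessCounter:
    """The count travels with the request, so any instance can answer."""

    def handle(self, request):
        return request["count"] + 1


def round_robin(instances, calls, stateless):
    """Assumed load balancer: call i goes to instance i mod n."""
    targets = cycle(instances)
    replies, count = [], 0
    for _ in range(calls):
        request = {"count": count} if stateless else {}
        count = next(targets).handle(request)
        replies.append(count)
    return replies


# Three calls from one consumer, routed across two redundant instances.
stateful_replies = round_robin(
    [StatefulCounter(), StatefulCounter()], calls=3, stateless=False)
stateless_replies = round_robin(
    [StatelessCounter(), StatelessCounter()], calls=3, stateless=True)
```

With the stateful variant the consumer sees the sequence 1, 1, 2, because each instance keeps its own counter. The stateless variant yields 1, 2, 3 regardless of which instance serves each call, which is what allows a load balancer to route freely between redundant instances.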
Fig. 8. The Eventual Consistency pattern.

3.2.3 Stateful components

Stateful legacy components may include stateful procedures or stateful objects. For migrating such stateful components to microservices, it is important to distinguish between the microservice’s public API and its configuration API. The public API is accessible by service consumers, and therefore the ideal microservice is stateless with respect to the public API.

The microservice configuration API on the other hand allows the automatic deployment and automatic reconfiguration of microservice instances. For example, the configuration API may expose procedures for setting specific multi-tenancy related properties, such as a database connection setting. Therefore, the configuration API does not necessarily need to be stateless.
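
The distinction between the two APIs can be sketched as follows. This is an illustrative Python example under stated assumptions, not code from the article: OrderService, set_tenant_database, describe_order, deploy and the connection strings are all hypothetical names. The configuration API retains the tenant-specific database setting, while the public API reads that setting but stores nothing between calls.

```python
class OrderService:
    """Hypothetical microservice instance."""

    def __init__(self):
        # Configuration state: tenant ID -> connection string.
        self._tenant_db = {}

    # Configuration API: called by deployment tooling; retains state by design.
    def set_tenant_database(self, tenant_id, connection_string):
        self._tenant_db[tenant_id] = connection_string

    # Public API: called by service consumers; reads the configuration but
    # stores nothing, so repeated calls with the same request agree.
    def describe_order(self, request):
        database = self._tenant_db[request["tenant_id"]]
        return f"order {request['order_id']} via {database}"


def deploy(config):
    """Create an instance and apply the given configuration to it."""
    service = OrderService()
    for tenant_id, connection_string in config.items():
        service.set_tenant_database(tenant_id, connection_string)
    return service


# Two identically configured instances are interchangeable for consumers.
config = {"acme": "db://acme", "globex": "db://globex"}
instance_a, instance_b = deploy(config), deploy(config)
request = {"tenant_id": "acme", "order_id": 7}
```

Because both instances receive the same configuration at deployment time, either one can answer the request, even though each holds configuration state internally.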
3.3 Microservice migration of stateful legacy source code

We observe that migrating legacy code to microservices is either a top-down, bottom-up or meet-in-the-middle approach between code-level refactoring decisions [7] of the legacy code and architectural decisions [18] of the desired microservice solution. The following architectural SOA patterns address statefulness and are applicable in microservice architectures: (i) stateful messaging, (ii) partial state deferral, (iii) state repository and (iv) stateful service [3].

The stateful messaging pattern [3] delegates internal state data to microservice messages, i.e., the service request and service response. Stateful legacy procedures (Section 3.2.2) can be made stateless by replacing the local state variables with additional parameters and return values that allow setting and retrieving the state. These additional state parameters and return values are then linked to the microservice request and response, respectively. Stateful objects and components (Section 3.2.3) can also be refactored in the same way.

The partial state deferral pattern [3] is an option if the microservice can remain partially stateful, and the goal is to reduce its memory consumption.

The state repository pattern [3] defers the state of a microservice into a dedicated state repository. By sharing the state repository among the microservice instances, these can be refactored to be entirely stateless for the purpose of increased availability and reliability. This pattern corresponds to option “B” in Figure 7.

The stateful service pattern [3] defers the state of a microservice into a set of stateful utility services whose only purpose is to manage state information. This pattern isolates the problem of statefulness and defers it to other stateful services that have the same negative impact on scalability, availability and reliability.

4 DATA CONSISTENCY

Data consistency is a property of a distributed shared data storage system. It specifies the allowed behaviour with respect to data access operations. Sequential (single-threaded) legacy source code does not encounter data consistency issues when accessing data in a repository, database, or shared memory. However, when migrating such sequential code to microservices, multiple instances of a microservice interact in parallel with a data repository, creating data consistency challenges.

4.1 Identifying data consistency issues in legacy code

Legacy source code often contains a mix of read/write operations to a data repository such as a database, file system, or other persistent storage. We observe, however, that data consistency issues occur when multiple instances of the same code (i.e., microservices) interact with a shared data repository. Therefore, before migrating such code to microservices, data access operations should be grouped into read-only or read/write operations.

Read-only operations only retrieve data from the data repository, but do not update, delete, or create new data. Read-only data is, for example, a multi-tenancy configuration setting that is read at start-up and whose local copy is never updated during the runtime of the microservice. Read/write operations update, delete, or create new data in a repository.

4.2 Restructuring legacy code into data consistent microservices

Microservices include their own data repository, such as a dedicated database instance, dedicated database schema in a shared database, or dedicated database tables. This leads to consistency challenges when the microservice data is synchronised with a centralised database.

The SOA pattern “Service Data Replication” [3] replicates data in a service database. This pattern can be safely applied for read-only operations. However, read/write operations need additional data consistency checks.

The Cloud computing patterns “Strict Consistency” and “Eventual Consistency” [4] address the consistency problem in Cloud storage solutions. The “Strict Consistency” pattern allows a variable number of data replicas to be read from and written to. For example, in a system consisting of n data replicas, a write operation might access w replicas, while a read operation accesses r. For each operation it is ensured that n < w + r, and strict consistency is guaranteed through the number of read or written replicas, by making sure that each operation accesses at least one most current data version [4].

The “Eventual Consistency” pattern [4], [6], [16] is applied for unreliable or limited bandwidth networks, or high data volume, where simultaneously accessing multiple
replicas is not feasible (Figure 8). Instead, only one replica is accessed for read/write operations, resulting in temporary data inconsistencies for the benefit of increased availability and performance. The inconsistencies are eventually corrected through synchronisation of the data replicas. Data inconsistencies can persist if data entries in different replicas have been modified at the same time. In such a case, rules specify how such issues are resolved, for example by simply dropping one of the modified versions [4]. We are of the view that this rule only works in a non data-critical system, while a critical enterprise system would require a roll-back.

5 CONCLUSION

We have described three basic challenges for migrating legacy code to microservices. A best practice solution is to develop a microservice iteratively, focussing on:
(i) eliminating statefulness from the extracted legacy code,
(ii) implementing multi-tenancy functionalities, and
(iii) solving potential newly introduced data consistency challenges.

It is important to note that while these three challenges are interrelated, they are created by different types of requirements. Multi-tenancy is driven by the economic advantages resulting from sharing resources and infrastructure, while taking into consideration the tenants’ data privacy needs. Statefulness is a characteristic that influences non-functional requirements with respect to scalability, reliability and availability of the modernized system. Finally, data consistency is a functional requirement that defines the correct operation of the system and ranges from relaxed, eventually consistent systems to strictly consistent ones.

6 ACKNOWLEDGMENTS

This research was supported in part by ARC-DP grant DP140103788.

REFERENCES

[1] E. Bauer and R. Adams. Reliability and Availability of Cloud Computing. Wiley-IEEE Press, 2012.
[2] F. Chong, G. Carraro, and R. Wolter. Multi-tenant data architecture. http://msdn.microsoft.com/en-us/library/aa479086.aspx, 2006. Last accessed Jan 2017.
[3] T. Erl. SOA design patterns. Prentice Hall, Upper Saddle River, NJ, 1st edition, 2009.
[4] C. Fehling, F. Leymann, R. Retter, W. Schupeck, and P. Arbitter. Cloud Computing Patterns. Springer Verlag, 2013 edition, 2014.
[5] M. Fowler. Patterns of enterprise application architecture. Addison-Wesley, Boston, 2003.
[6] M. Fowler. Eventual consistency. http://martinfowler.com/articles/microservice-trade-offs.html#consistency, 2015. Last accessed Dec. 2016.
[7] M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts. Refactoring: improving the design of existing code. Addison-Wesley, Reading, MA, 1999.
[8] A. Furda, C. Fidge, A. Barros, and O. Zimmermann. Software Architecture for Big Data and the Cloud, chapter Re-Engineering Data-Centric Information Systems for the Cloud - A Method and Architectural Patterns Promoting Multi-Tenancy. Elsevier, 2017.
[9] M. F. Gholami, F. Daneshgar, G. Low, and G. Beydoun. Cloud migration process-a survey, evaluation framework, and open challenges. J. Syst. Softw., 120(C):31–69, 2016.
[10] I. Gorton. Software Architecture for Big Data and the Cloud, chapter Hyper Scalability - the Changing Face of Software Architecture. Elsevier, 2017.
[11] S. Johann. Dave Thomas on innovating legacy systems. IEEE Softw., 33(2):105–108, 2016.
[12] J. Kabbedijk, C.-P. Bezemer, S. Jansen, and A. Zaidman. Defining multi-tenancy: A systematic mapping study on the academic and the industrial perspective. Journal of Systems and Software, 2014.
[13] V. Narasayya, S. Das, M. Syamala, B. Chandramouli, and S. Chaudhuri. Sqlvm: Performance isolation in multi-tenant relational database-as-a-service. 2013.
[14] C. Pautasso, O. Zimmermann, M. Amundsen, J. Lewis, and N. Josuttis. Microservices in practice, part 1: Reality check and service design. IEEE Software, 34(1):91–98, Jan 2017.
[15] R. Taft, W. Lang, J. Duggan, A. J. Elmore, M. Stonebraker, and D. DeWitt. Step: Scalable tenant placement for managing database-as-a-service deployments. In Proceedings of the Seventh ACM Symposium on Cloud Computing, pages 388–400. ACM, 2016.
[16] W. Vogels. Eventually consistent. Communications of the ACM, 52(1):40–44, 2009.
[17] S. Walraven, E. Truyen, and W. Joosen. Comparing PaaS offerings in light of SaaS development. Computing, 96(8):669–724, 2014.
[18] O. Zimmermann. Architectural refactoring: A task-centric view on software evolution. IEEE Software, 32(2):26–29, 2015.
[19] O. Zimmermann. Microservices tenets. Computer Science - Research and Development, pages 1–10, 2016.
