You are on page 1of 4

The Complete Data Migration Methodology

Vivek Salunke
IT Elite Systems Pvt. Ltd.
What is data migration?
Simply put data migration is the process of moving of data from one data source to
another data source. Generally the term is used to refer to migration of data from legacy
database application to an RDBMS. It primarily involves rearrangement of database
entities such as physical data, schemas, view, stored procedure and triggers.
Why migrate data?
The trend is more and more companies are moving away from their legacy database to
commonly available and popular RDBMS system. In many of the re-engineering
projects, it becomes necessary to move away from the existing database system to the
database systems, which provide for technological advances in the software industry.
This paper attempts to
Identify the complexities involved in data migration.
Identify the phases in the data migration project
Define a conceptual road map for efficient data migration
Complexities
Data migration is done between two disparate database systems and is the reason
involves incompatible data structures, inconsistent and inaccurate data. Insufficient
documentation of the existing database system complicates the issues further.
Data Structure incompatibilities:

Incompatible data
structure
Inconsistent data
Inadequate
documentation
Large Size database
24x7 application

The major need for any re-engineering project induces data structure changes in the target
database. Moreover proprietary data structure between the source and the target RDBMS
system introduces additional complexity. For instance a data type blob in Informix is
handled differently in Oracle.

In accurate / In-consistent data:


If the source database is defined without foreign key constraints then making the data
clean in the target database becomes a big task in itself. At the same time we have to
make sure that live data is not lost because of in-accuracies in the source database.
In-adequate documentation:
This is mostly true in old legacy systems where the systems are not documented enough
for the database migration team to do justice to the data migration task. The lack of
documentation induces re-work after the proto-type stage of migration is over. Before we
can even think about moving the first element from one system to another, we need to
completely understand both environments. This understanding cannot stop at the data
level. Full understanding of the entire architecture, in the As-Is and To-Be worlds, is
required for a complete and accurate migration
Size of the source Database:
The size of the source database also plays a major in the data migration plan. The
resultant road map for data migration should take into consideration the size to calculate
the time and temporary space required for migration.
Database downtime constraints:
Many of the mission critical applications of the enterprises require that the application is
up and running 24x7. In such scenario it is very important that the migration effort is time
bound.
Data Migration Phases
A logical process flow comprising distinct phases for key activities such as strategy,
analysis, design, development, testing and implementation, is essential to efficient and
trouble-free data migration. The ideal data migration project plan should be broken down
into phases that mirror the overall project development phases. These phases are: Phase
Strategy
Analysis
Design
Development
Test/Implementation

Objectives
Determine whether or not the objectives documented in
the pre-strategy phase are achievable.
Perform a complete examination in order to make a
checklist of the legacy data elements that we know need
migrating.
Map the legacy data elements to the columns.
Develop data migration code or write scripts using the
specific data migration tool chosen during the analysis
phase.
Deals with the semantics of the scripting language used
in the transformation effort, specifically the logical and
physical errors. Identify and resolve logical errors.

Strategy Phase: Data migration should be treated as a project in itself and not a small
part of the application re-engineering project. The project milestones and the timelines
for the migration projects should be outlined in the plan. The
Strategy
activities should include tasks such as database layout,
Analysis
design, development, testing, piloting the application, bench
Design
marking, documentation, and actual migration.
Develop
Test
Implementation

Following activities should be carried out during strategy phase:

Determine the source and the target system by identifying all sources of the
incoming data.
Determine the timeframe we can expect for the duration of the data migration.
This is especially important if you are migrating a large 24x7 system that cannot
afford any significant downtime.

Analysis Phase:
After the analysis phase you must resolve the following:
Determine the required schema changes including which elements are missing in
the current schema that will be needed to support the new application
functionality.
Define the required data scrubbing to be performed.
Perform topology analysis and obtain performance estimates. You will need to
know precisely how long will the data migration effort need to run. This will add
to the required downtime.
Decide what tool/language the data migration utilities will be performed in
Design Phase:
At this point we are in a position to generate the physical database design
Development Phase:
During the development phase, the actual data migration code is developed, or the scripts
are written if a specific data migration tool was chosen during the analysis phase.
Generally the following issues are addressed:
1. Handling data irregularities and exceptions during transformations (such as
incompatible data types). Ideally, you should have your tool create a data
irregularities and integrity violations report since the source data will need to be
corrected by the end users or the legacy system administrators.

2. Maximizing code efficiency through increasing the degree of parallelism and


performing tuning. Again remember that the timeframe for the migration can be
quite tight.

Test/Implementation Phase:
The Test phase is where we identify and resolve logical errors. The first step is to execute
the mapping. Even if the mapping is completed successfully, we must still identify:
The number records we expect this script to create.
Whether or not the correct number of records got created, and if not, why?
Was the data loaded into the correct fields?
Was the data correctly formatted?
The ultimate deliverable of the implementation phase is the completion of the data
migration effort. However, you must remember that the data migration might not be a
one-time effort: you may end up doing cyclical iterations that can spread over a period
when the legacy data would be continually fed into the new system.

Conclusion
Data Migration is a necessary evil, but not an impossible one to conquer. Data migration can
make or break a project. The key is to prepare for it very early on, and monitor it carefully
throughout the process. Project timelines tend to become more rigid as time passes, so it really
makes sense to meet migration head on. Devoted teams with a clearly defined project plan, from
project inception, and armed with automated tools where applicable, is indeed the formula to
success.