
CHECKLIST 2023

Migrating Your Data Warehouse to the Cloud: Five Best Practices

By James Kobielus

Five best practices for migrating your data warehouse to the cloud:

1. Modernize enterprise data warehousing environments around cloud-native principles
2. Modularize enterprise data warehousing assets for maximum reuse
3. Convert legacy enterprise data warehousing environments
4. Automate enterprise migration to modern environments
5. Augment enterprise teamwork of data warehousing professionals to operationalize migrated assets

Modern data runs in cloud-native environments. Migration of legacy on-premises data to the cloud is becoming necessary because enterprises are facing several critical pain points with their on-premises deployments, including siloed, on-premises data stores that prevent unified rollup of data and analytics, limited technical support staff to manage on-premises data, and scalability constraints resulting from inflexible architectures of on-premises data platforms.

Legacy on-premises data warehouses (DWs) are increasingly showing their age. Over the past decade, enterprises everywhere have been migrating to cloud-based DWs that offer greater scalability, performance, flexibility, and cost-effectiveness. Typically accessed on a fully managed, pay-as-you-go basis, the modern cloud DW can support both structured and unstructured data assets.


In addition to supporting conventional ETL-based data integration, the modern cloud DW can also provide real-time and low-latency analytics, automated data onboarding and integration, orchestrated data pipelines, data engineering and preparation, self-service data delivery, embedded analytics models, and other sophisticated features.

This TDWI Checklist shares five best practices for migrating your enterprise DW to the cloud and presents recommendations for aligning your DW migrations with your overall cloud modernization strategies and business imperatives.

1. Modernize enterprise data warehousing environments around cloud-native principles

Migrating to a modern cloud-based data platform can accelerate business transformation. It can expand data accessibility; scale data ingestion, storage, processing, and delivery; and help your organization harness new data types for innovative applications.

Migrations involve transferring the capabilities, services, and assets of legacy data platforms partially or entirely to newer, more modern platforms. Usually, the target platform will replace the legacy platform, with the latter being turned off or repurposed after the migration is complete.

When the data integration and data warehousing environments are being transferred to a new cloud platform, the migration project may have any or all of the following modernization objectives:

• Convergence. Migrate one or more DWs to support unified data governance and deliver a "single version of truth" to all downstream business intelligence, reporting, and other analytics applications.

• Enhancement. Migrate one or more DWs to a fully managed cloud computing environment to reduce infrastructure and operating costs, boost performance and scalability, enable usage-based pricing, and improve reliability and availability.

• Evolution. Migrate one or more DWs to a cloud-based platform with the flexibility to evolve into a larger multicloud, virtualized, mesh, or cloud-to-edge data architecture. The target cloud DW should be able to ingest and store new types of structured, semistructured, and unstructured data; support streaming, low-latency, and real-time data applications; and enable unification of data integration and machine learning pipelines.

Migration is often far more than "lifting and shifting" existing DW system footprints to the new environment. Beyond such essential tasks as copying data from legacy databases and loading it onto the target platform, there are numerous details to iron out before switching over to the target cloud platform. These necessary migration chores often include:

• Revising data schemas (see the sketch after this list)
• Rewriting stored procedures and user-defined functions
• Tweaking query plans to boost performance
• Changing database drivers and job-scheduling tools
• Implementing new connectors to synchronize data reliably between the original sources and the new target cloud platform

The architecture of the legacy on-premises DW platform might not be suited to take the best advantage of cloud resources and services. To ensure a successful DW modernization, the first steps in any migration project should be:

• Migrate and tweak ETL code, data engineering workflows, analytics applications, and machine-learning model libraries to work with the new cloud DW platform
• Identify business requirements for the DW as critical application infrastructure
• Assess the degree to which legacy DW systems can continue to meet these requirements
• Highlight the technology constraints, limitations, and risk factors that can be eliminated using modern cloud DW architectures
• Refactor reporting, analytics, data governance, and other application requirements to align with emerging requirements and with the capabilities of the target cloud DW platform

Migration of a DW may also involve migration of the associated data engineering pipeline functions to a more modern, cloud-based data integration platform. Here are a few key reasons enterprises might want to undertake migration/modernization of the data integration pipeline to the cloud in parallel with DW migration/modernization:

• Interoperability. Cloud data integration pipelines optimize connectivity for API-based cloud DWs. They enable mass ingestion from databases, files, applications, and streaming sources. They support data storage in cloud platforms and broad metadata-aware connectivity through APIs into PaaS, SaaS, and other cloud services.

• Performance. Cloud data integration pipelines are better suited than on-premises ETL/data integration platforms to some use cases, such as fast loading, staging, and processing of data through mass ingestion and pushdown optimization. They support elastic serverless processing and they eliminate the need to implement time-consuming, expensive, and complex upgrades to the ETL, data integration, and other capabilities. They also support advanced pushdown optimization for loading data from sources to cloud DWs with transformations (see the sketch after this list).

• Agility. Cloud data integration pipelines make it easier for users to explore and try new capabilities and services as the cloud provider introduces them rather than requiring users to install new software versions in their own environments, as with on-premises ETL and data integration.

• Accessibility. Cloud data integration pipelines make it easier to democratize data integration tasks to a wider range of users in a cloud-native platform compared to on-premises ETL and data integration tools.
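To illustrate what pushdown optimization means in practice, the sketch below transforms data in place by generating SQL that the cloud DW executes itself, rather than extracting rows into the integration tool. It is a rough sketch under stated assumptions: the table names are placeholders and the connection is any generic DB-API 2.0 driver for the target DW, not a specific vendor's API.

# Minimal pushdown-style ELT sketch: generate a transform as SQL and let
# the cloud DW run it where the data lives. Table names and the connection
# object are hypothetical placeholders.

def build_pushdown_sql(source_table, target_table):
    """Generate an aggregate transform that runs entirely inside the DW."""
    return f"""
        CREATE TABLE {target_table} AS
        SELECT
            customer_id,
            DATE_TRUNC('month', order_date) AS order_month,
            SUM(order_total)                AS monthly_total
        FROM {source_table}
        GROUP BY customer_id, DATE_TRUNC('month', order_date)
    """

def run_pushdown(connection, source_table, target_table):
    """Execute the generated SQL through a DB-API style connection."""
    sql = build_pushdown_sql(source_table, target_table)
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        connection.commit()
    finally:
        cursor.close()

# Illustrative usage, assuming dw_connect() returns a DB-API connection:
#   run_pushdown(dw_connect(), "raw_orders", "monthly_order_totals")

The design point is that only SQL travels over the network; the heavy lifting stays on the elastic compute of the cloud platform.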


2. Modularize enterprise data warehousing assets for maximum reuse

Migration projects are smoother and less disruptive if the legacy DW platform's assets are transferred as modular capabilities to the new modern cloud architecture.

Modularity enables enterprises to start small with well-defined use cases for the cloud DW migration. Organizations can selectively and iteratively migrate specific business domains (along with associated data objects, integration code, ETL mappings, stored procedures, sessions, and other DW and data integration assets) in a fashion that minimizes disruption to the business.
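One lightweight way to organize such domain-by-domain migration is to describe each business domain as a self-contained migration unit. The sketch below is a hypothetical manifest structure, not any tool's actual format; the domain names and asset lists are placeholders.

# Sketch of a per-domain "migration unit": everything needed to move one
# business domain as a bundle, so domains can be migrated (and rolled back)
# independently. The structure and field names are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class MigrationUnit:
    domain: str                                       # business domain being migrated
    tables: list = field(default_factory=list)
    etl_mappings: list = field(default_factory=list)
    stored_procedures: list = field(default_factory=list)
    depends_on: list = field(default_factory=list)    # domains that must move first

    def is_ready(self, completed_domains):
        """A unit can migrate once all of its upstream domains are done."""
        return all(dep in completed_domains for dep in self.depends_on)

# Illustrative backlog: migrate "finance" only after "customer" is complete.
backlog = [
    MigrationUnit("customer", tables=["customers", "addresses"],
                  etl_mappings=["m_load_customers"]),
    MigrationUnit("finance", tables=["invoices", "payments"],
                  etl_mappings=["m_load_invoices"], depends_on=["customer"]),
]

completed = {"customer"}
print([u.domain for u in backlog
       if u.domain not in completed and u.is_ready(completed)])

Bundling assets this way keeps each iteration small and makes the blast radius of any rollback obvious.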

At the very least, you should migrate DW data sets only after they've been made correct, consistent, and compliant. To ensure a smoothly modular migration of legacy DW assets, the target cloud DW should be architected in conformance with these tenets:

• Visibility. The target cloud data environment should support discovery, search, and management of all assets through a searchable data catalog. This catalog should provide a central point of management for all DW and data integration assets.

• Openness. The target DW should present open computing APIs for access, programming, manipulation, monitoring, and management of all DW and data integration functions.

• Programmability. The target environment should support self-service, visual, low-code development and operationalization of DW capabilities as composable containerized microservices.

• Containerization. The target environment should enable orchestration of DW capabilities as microservices within a containerized cloud computing fabric.

• Elasticity. The target environment should support elastic provisioning of computing, storage, memory, and network resources to support dynamic changes to DW workloads and usage patterns.

3. Convert legacy enterprise data warehousing environments

Migration may involve conversion of data, integration code, and other artifacts to new formats that can be more efficiently processed by the target platform.

Conversion is one of the central requirements for any enterprise trying to leverage its substantial investment in data, mappings, ETL code, and other assets from legacy DWs. Enterprises may choose to move all or only a portion of legacy data over to their new modern cloud DW. Whatever their overarching business requirements may be, this presents an opportunity to ensure that whatever data is migrated remains trustworthy, clean, consistent, and compliant with all relevant mandates.
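A simple way to act on that opportunity is to gate each data set behind basic quality checks before it is loaded into the target DW. The sketch below is deliberately minimal and assumes no particular tool: it checks null rates in required columns and duplicate primary keys on plain Python rows; the thresholds, column names, and sample rows are illustrative.

# Minimal pre-migration data quality gate: flag a data set if required
# columns contain nulls or the primary key is duplicated. Thresholds,
# column names, and the sample rows are illustrative assumptions.

from collections import Counter

def quality_report(rows, key_column, required_columns, max_null_rate=0.0):
    """Return a dict of issues found in a list of row dicts."""
    issues = {}
    total = len(rows) or 1
    for column in required_columns:
        nulls = sum(1 for row in rows if row.get(column) in (None, ""))
        if nulls / total > max_null_rate:
            issues[f"null_rate:{column}"] = nulls / total
    key_counts = Counter(row.get(key_column) for row in rows)
    duplicates = [key for key, count in key_counts.items() if count > 1]
    if duplicates:
        issues["duplicate_keys"] = duplicates
    return issues

rows = [
    {"order_id": 1, "customer_id": "C01", "order_total": 120.0},
    {"order_id": 1, "customer_id": "C02", "order_total": 75.5},   # duplicate key
    {"order_id": 2, "customer_id": None, "order_total": 30.0},    # missing value
]
print(quality_report(rows, "order_id", ["customer_id", "order_total"]))

A data set with an empty report is a candidate for migration; anything else goes back to stewardship before it moves.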


This critical task requires a keen focus on whether the corresponding ETL logic is sufficient as-is or needs to be converted to make it fit for both the migration and operationalization in the target cloud DW. In many enterprises, ETL mappings have been built over many years, may number in the thousands, and certainly represent a substantial investment of time and money. Ideally, enterprise data professionals should be able to reuse as much of that logic as possible instead of starting from scratch.
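Much of that reuse comes down to mechanical rewrites of legacy expressions into equivalents the target cloud DW understands. The sketch below shows the flavor of such a rewrite pass; the function mappings are common examples chosen for illustration, not a complete or vendor-specific rule set.

# Tiny illustration of ETL/SQL expression conversion: rewrite a handful of
# legacy function spellings into target-dialect equivalents. Real conversion
# tools use parsers and far larger rule sets; this regex pass is a sketch only.

import re

REWRITE_RULES = [
    (re.compile(r"\bNVL\s*\(", re.IGNORECASE), "COALESCE("),
    (re.compile(r"\bSYSDATE\b", re.IGNORECASE), "CURRENT_TIMESTAMP"),
    (re.compile(r"\bSUBSTR\s*\(", re.IGNORECASE), "SUBSTRING("),
]

def convert_expression(legacy_sql):
    """Apply each rewrite rule in order and return the converted text."""
    converted = legacy_sql
    for pattern, replacement in REWRITE_RULES:
        converted = pattern.sub(replacement, converted)
    return converted

legacy = "SELECT NVL(region, 'UNKNOWN'), SUBSTR(order_code, 1, 4), SYSDATE FROM orders"
print(convert_expression(legacy))

Expressions the rules cannot handle are exactly the ones worth flagging for human review rather than silently passing through.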
When undertaking a migration of DW assets to a cloud-computing environment, enterprises should make sure that they also have ETL and other data integration tools that are cloud-native and can continue to be of service after the DW has been switched over to the target cloud environment. Using a data integration tool that enables reuse of ETL code and other integration logic can significantly accelerate successful migration of an on-premises DW to the cloud.

Enterprises should explore tools for converting and migrating their ETL and data integration assets with full fidelity and interoperability to a cloud-native environment. The assets to be migrated and possibly converted generally include data objects, metadata, stored procedures, ETL mappings, data-engineering workflows, integration code, analytics libraries, parameter files, custom scripts, schedulers, and access controls.

4. Automate enterprise migration to modern environments

Migrations are faster and less prone to human error if all necessary activities are automated to the maximum extent possible and if management of the target DW and data integration environments is also automated.

Migrating a DW to the cloud is a complex project, but it is far more likely to be delivered on time, within budget, and with few unforeseen glitches if the appropriate level of automation is used in its execution. Automated conversion can be a significant cost-saving tactic in DW migrations, eliminating much of the need for manual conversion or wholesale rewriting of ETL mappings and other assets. It can also greatly accelerate the migrations while reducing the incidence of errors being introduced into the migrated data integration and DW assets.

At the start of the migration, DW professionals should estimate their likely level of effort, identify relevant sources and targets, characterize the task breakdowns and dependencies, and prepare a road map under which the project will be executed in phased sprints. This analysis should also identify the tasks that can be automated and precisely how that automation will be carried out and supervised.
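That task breakdown is easy to capture as data, which in turn makes it easy to see which parts of each sprint are automated. The structure below is a hypothetical planning sketch, not a feature of any migration tool; the task names and estimates are placeholders.

# Sketch of a migration road map as data: tasks with dependencies, effort
# estimates, and a flag marking which steps are automated. Everything here
# is an illustrative assumption.

tasks = {
    "inventory_assets": {"days": 3, "automated": True,  "needs": []},
    "convert_schemas":  {"days": 5, "automated": True,  "needs": ["inventory_assets"]},
    "convert_etl":      {"days": 8, "automated": True,  "needs": ["inventory_assets"]},
    "validate_data":    {"days": 4, "automated": True,  "needs": ["convert_schemas"]},
    "user_acceptance":  {"days": 5, "automated": False, "needs": ["validate_data", "convert_etl"]},
}

def next_sprint(tasks, done):
    """Tasks whose dependencies are all complete and can start now."""
    return [name for name, task in tasks.items()
            if name not in done and all(dep in done for dep in task["needs"])]

done = {"inventory_assets"}
print("Ready:", next_sprint(tasks, done))
manual_days = sum(task["days"] for task in tasks.values() if not task["automated"])
print("Manual effort remaining (days):", manual_days)

Even a toy model like this makes the supervision question concrete: the manual tasks are where review time has to be budgeted.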


Where migration automation is concerned, data professionals should scope several tasks. Prior to the conversion, they should automate scanning and assessment of the scope of the assets to be migrated from the legacy on-premises ETL and DW platforms to that of the target platform, and preparation of the target cloud environment to ingest, store, and manage the assets being migrated. When automating conversion of assets, they should validate and test that the assets are being converted and/or rewritten with full fidelity for migration and processing within the target cloud environment.

Finally, data professionals should automate the productionization of the assets that have been migrated. This involves loading, organizing, and listing converted assets in the target cloud environment. It also entails optimization of the performance of the target cloud DW functions that rely on the migrated assets, and it requires use of AI/ML-, metadata-, and rule-driven approaches to automate ETL and other data pipeline workloads for consistency and repeatability.

At the very least, automate the following functions in the target cloud platform (a minimal ingestion-validation sketch follows the list):

• Validate data as it's ingested into the data integration pipeline
• Document data integration pipeline traffic patterns
• Correlate data integration pipeline events
• Identify root causes of data integration issues
• Remediate data integration pipeline issues
• Optimize ETL and other data integration pipeline processing
• Map business glossary terms to underlying DW data
• Tag and classify data stored in the DW
• Predict source-to-target ETL mappings
• Select appropriate ETL transformations
• Anticipate and proactively resolve data quality and noncompliance issues
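As a concrete example of the first item, the sketch below validates records as they enter a pipeline and routes failures to a quarantine list instead of the warehouse. The rules, field names, and records are hypothetical; a production pipeline would express the same idea in whatever validation framework it already uses.

# Minimal validate-on-ingest sketch: apply simple rules to each incoming
# record and separate passing rows from quarantined ones. Rules and records
# are illustrative assumptions.

RULES = {
    "order_id": lambda value: isinstance(value, int) and value > 0,
    "currency": lambda value: value in {"USD", "EUR", "GBP"},
    "order_total": lambda value: isinstance(value, (int, float)) and value >= 0,
}

def validate_record(record):
    """Return the fields whose rules the record violates (empty = valid)."""
    return [fld for fld, check in RULES.items() if not check(record.get(fld))]

def ingest(records):
    """Split an incoming batch into loadable rows and quarantined rows."""
    loadable, quarantined = [], []
    for record in records:
        failures = validate_record(record)
        (quarantined if failures else loadable).append((record, failures))
    return loadable, quarantined

batch = [
    {"order_id": 1, "currency": "USD", "order_total": 120.0},
    {"order_id": -5, "currency": "XXX", "order_total": 30.0},
]
good, bad = ingest(batch)
print(len(good), "loadable;", len(bad), "quarantined:", bad[0][1])

Quarantined rows feed the remediation and root-cause items later in the list rather than silently landing in the DW.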


5. Augment enterprise teamwork of data warehousing professionals to operationalize migrated assets

Migrations require tight coordination across complex dependencies among diverse data, IT, and other stakeholders pulling together toward common objectives.

Migration of DWs from legacy on-premises platforms to modern clouds requires close collaboration among DW and data integration professionals. To be successful, all these technical stakeholders require training and skills enhancement to understand and work with the target cloud DW, the enabling ETL and data engineering tools, and the migration tools needed to manage conversion, movement, and optimization of the migrated assets.

One of the key staff augmentation strategies in DW and data integration migrations is to enlist consultants, system integrators, and other professional services partners. They may have the expertise needed to drive these projects while also training enterprise technical staff, identifying the best tools and platforms, and developing standardized processes for managing it all.

Augmentation of the technical staff's collaborative productivity is a key benefit of the automation features built into DW and data integration migration tools. When used effectively, these tools can deliver such benefits as:

• Accelerated data integration and DW migrations with fewer scheduling challenges
• Greater data integration and DW asset reusability with less conversion effort
• Improved data integration and DW developer and administrator productivity
• Faster data integration and DW code generation with fewer errors
• Higher-quality data integration and DW code generation, testing, and validation with less rework
• Contextual recommendations on monitoring, management, and troubleshooting pipeline issues, as well as on profiling, matching, merging, cleansing, exception handling, issue resolution, and other stewardship functions

Concluding thoughts

Migrating data from an on-premises DW to a cloud-based DW can be a complex project with many moving parts. When identifying the optimal target architecture for the DW migration, your enterprise should:

• Architect the target cloud DW so that it has the flexibility to evolve into a more comprehensive cloud data platform capable of ingesting and storing new types of structured, semistructured, and unstructured data, and providing a scalable platform for unification of data integration and MLOps pipelines

• Adopt data engineering, validation, security, protection, and governance tools optimized for the target cloud DW

• Make necessary investments in enabling cloud infrastructure for DW modernization, especially AI/ML-automated data engineering pipelines and searchable data catalogs

• Deploy a cloud DW that boosts the density and capacity utilization of CPU, memory, storage, and other hardware on the back-end cloud platforms

• Deliver consumable cloud DW functions at low latency via a consistent, secure, web-native API that can be called from any client application on a pay-as-you-go basis

When undertaking the migration of a legacy on-premises DW to this target cloud DW, your enterprise should:

• Migrate ETL, data integration, and other data engineering capabilities to a more modern cloud-based data integration pipeline platform in parallel with migration of the on-premises legacy DW to a modern cloud-based architecture

• Selectively and iteratively migrate specific business domains (along with associated data objects, integration code, ETL mappings, stored procedures, sessions, and other DW and data integration assets) in a fashion that minimizes disruption to the business


• Adopt tools for converting data objects, metadata, stored procedures, ETL mappings, data-engineering workflows, integration code, analytics libraries, parameter files, custom scripts, schedulers, access controls, and other data integration assets with full fidelity and interoperability to a cloud-native environment

• Use AI/ML-, metadata-, and rule-driven approaches to automate ETL and other data pipeline workloads for consistency and repeatability

• Provide DW and data integration technical stakeholders with the necessary training and skills enhancement to understand and work with your target cloud DW, the enabling ETL and data engineering tools, and the migration tools needed to manage conversion, movement, and optimization of the migrated assets

About our sponsor

snowflake.com

Snowflake enables every organization to mobilize their data with Snowflake's Data Cloud. Customers use the Data Cloud to unite siloed data, discover and securely share data, and execute diverse analytic workloads. Wherever data or users live, Snowflake delivers a single data experience that spans multiple clouds and geographies. Thousands of customers across many industries, including 543 of the 2022 Forbes Global 2000 (G2K) as of October 31, 2022, use the Snowflake Data Cloud to power their businesses.

Learn more at snowflake.com.


About the author

James Kobielus is senior director of research for data management at TDWI. He is a veteran industry analyst, consultant, author, speaker, and blogger in analytics and data management. He focuses on advanced analytics, artificial intelligence, and cloud computing. Kobielus has held positions at Futurum Research, SiliconANGLE Wikibon, Forrester Research, Current Analysis, and the Burton Group and also served as senior program director, product marketing for big data analytics, for IBM, where he was both a subject matter expert and a strategist on thought leadership and content marketing programs targeted at the data science community. You can reach him by email (jkobielus@tdwi.org), on Twitter (@jameskobielus), and on LinkedIn (https://www.linkedin.com/in/jameskobielus/).

About TDWI Research

TDWI Research provides industry-leading research and advice for data and analytics professionals worldwide. TDWI Research focuses on modern data management, analytics, and data science approaches and teams up with industry thought leaders and practitioners to deliver both broad and deep understanding of business and technical challenges surrounding the deployment and use of data and analytics. TDWI Research offers in-depth research reports, commentary, assessment, inquiry services, and topical conferences as well as strategic planning services to user and vendor organizations.

About TDWI Checklist Reports

TDWI Checklist Reports provide an overview of success factors for a specific project in business intelligence, data warehousing, analytics, or a related data management discipline. Companies may use this overview to get organized before beginning a project or to identify goals and areas of improvement for current projects.

A Division of 1105 Media
6300 Canoga Avenue, Suite 1150
Woodland Hills, CA 91367
E info@tdwi.org
tdwi.org

© 2023 by TDWI, a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or part are prohibited except by written permission. Email requests or feedback to info@tdwi.org.

Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies. Inclusion of a vendor, product, or service in TDWI research does not constitute an endorsement by TDWI or its management. Sponsorship of a publication should not be construed as an endorsement of the sponsor organization or validation of its claims.
