
7 CONSIDERATIONS FOR DATA LAKE MIGRATION


EBOOK
Why large enterprises are investing in data lakes
Define your company’s vision
Assess your current state
Set your goals
Understand costs
Focus on key capabilities
Be prepared
Think ahead
Conclusion
About Snowflake
WHY LARGE ENTERPRISES ARE INVESTING IN DATA LAKES

Heading into 2020, many companies were already generating or accumulating more data than they could use. The pandemic exacerbated this problem by forcing some industries to accelerate their digitalization efforts and shift more of their operations to the cloud, which resulted in members of the C-suite having more data than they knew what to do with.

It’s clear that the ability to harness data and turn it into actionable insights will determine which companies are competitive over the long term and which are likely to fall behind. According to a prediction from Gartner®, “by 2023, organizations that promote data sharing will outperform their peers on most business value metrics.” [1]

The need for data organization and accessibility became more acute during the COVID-19 pandemic, leading many C-suite executives to explore comprehensive data migration efforts in order to bring data together in a single platform where it’s accessible and usable by a variety of users.

Data lakes are an increasingly popular way of accomplishing this. Indeed, the global data lake market (estimated at $3.74 billion in 2020) is expected to expand at a compound annual growth rate of 29.9% over the next five years, landing at $17.6 billion by 2026, according to Research and Markets. [2]

That said, the decision to fully migrate a company’s data is just a start, and the work of corralling all of its disparate data sets can be daunting. At its best, the process can be complex; at its worst, it can be labor-intensive and messy.

Before diving into a data lake migration, it’s crucial for business and technology leaders, whether they be CEOs, CFOs, or CIOs, to ask the right questions up front and make the appropriate preparations. Given the importance and sensitivity of such an endeavor, it’s also vital to partner with the right experts.

In the pages that follow, we’ll outline seven parameters every company should consider when approaching a data lake migration.
CHAMPION GUIDES
DEFINE YOUR COMPANY’S VISION

WHAT DOES YOUR DATA-DRIVEN IDEAL LOOK LIKE?

Before getting into the details of how your company would execute a data lake migration, step back and explore some of the more fundamental questions regarding your data use and its potential. Here are a few to start with:

• Why are you considering this move?
• What is your dream data usage scenario?
• If neither bandwidth nor legacy systems were an issue, how would your company best employ data to empower decision-making across the organization?

Asking these questions ahead of time will help you frame the goal and better outline the right solution for your business. The answers should also help steer decision-making throughout the process or show that your data problems are due to issues unrelated to your data lake.

Having a data role model to emulate can also be helpful. For example, after a successful data lake partnership, 95% of the employees at the energy firm Devon have access to the same complete database and can make decisions quickly and autonomously. Is this type of data democratization important to your company? If so, what do you need to do to get there?

The more clearly leaders can identify their ultimate rationale for considering a data migration, the closer they will be to achieving their goals.

ASSESS YOUR CURRENT STATE

PERFORM A COMPLETE ANALYSIS OF DATA STORAGE, USAGE, AND SYSTEMS

Once you know where you want to go, take stock of where you are. For companies looking to break down silos and pull together many forms of data via multiple systems and protocols, this requires some honest scrutiny. Among the questions leaders need to ask are:

• Where is all of our data now?
• How do we use it?
• What is holding us back from achieving our data ideal?

In addition to documenting their company’s current data storage and usage, leaders should have a full understanding of who in the company has access to which information, where gaps exist, and who makes decisions about these hierarchies and rules. Only then can leaders determine where bottlenecks may form and how decision-making should be managed in order to better organize and streamline adoption after a migration.
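
One lightweight way to capture the answers to these questions is a small inventory that records where each data set lives, who owns it, and which teams can reach it. The Python sketch below is illustrative only: the `DataSource` record, the `find_gaps` helper, and the sample systems and team names are all hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataSource:
    """One entry in a hypothetical pre-migration data inventory."""
    name: str
    location: str                 # where the data lives today
    owner: Optional[str] = None   # who decides on access rules, if anyone
    teams_with_access: set = field(default_factory=set)


def find_gaps(sources, teams):
    """Return sources with no owner, and for each team the sources it cannot reach."""
    unowned = [s.name for s in sources if s.owner is None]
    unreachable = {
        team: [s.name for s in sources if team not in s.teams_with_access]
        for team in teams
    }
    return unowned, unreachable


# Made-up example inventory.
inventory = [
    DataSource("crm_contacts", "on-prem SQL server", "sales-ops", {"sales", "marketing"}),
    DataSource("web_clickstream", "cloud object storage", None, {"marketing"}),
    DataSource("billing_ledger", "ERP system", "finance", {"finance"}),
]

unowned, unreachable = find_gaps(inventory, ["sales", "marketing", "finance"])
print("No clear owner:", unowned)
for team, missing in unreachable.items():
    print(f"{team} cannot reach:", missing)
```

Even a list this simple can surface the two things the assessment is looking for: data sets that nobody is accountable for, and teams that are cut off from data they may need.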

SET YOUR GOALS

IDENTIFY AND QUANTIFY PAIN POINTS YOU AIM TO SOLVE

Now is the time to get more specific and map out issues you are trying to fix and how you plan to address them. It helps to put this in writing in the form of measurable, achievable results.

One of the main reasons companies invest in data lakes is to battle “data sprawl.” This refers to data that is housed all over the place, often inconsistently, with no single group or leader having full access to or control over its usage.

Data sprawl results in a limited view of the customer and business

The more scattered a company’s data is, the harder it is to act swiftly and with the best intelligence. A 2021 IDC Technology Spotlight found that typical data pipelines are processing up to 10 different types of data coming from up to nine distinct sources. [3] For example, marketers’ efforts to deliver personalized content, ads, and offers can be stymied by disjointed data sets that yield incomplete views of consumers. More broadly, businesses can end up with incomplete views of their true revenue drivers, leading them to misinterpret opportunities or focus on the wrong markets.

Inefficient data pipelines slow access and querying

The movement of data via temporary or imperfect pipelines can also contribute to data usage breakdowns, in part because these makeshift processes tend to be slow.

Take Scripps Health, for example. By moving to Snowflake’s platform, Scripps Health achieved a 50% reduction in full-time equivalent (FTE) staff dedicated to database administration, and it saved 60% on software licensing. Previously, users retrieved data and analyzed it on their own systems, creating data silos. Now, users stay entirely in Snowflake, and silos are eliminated.

Decentralized data collection and housing can create governance challenges

A lack of centralized data control can lead to disjointed decision-making and security vulnerabilities since there is no holistic view or rule for managing access. This can compromise the security of sensitive customer or business data while also making it difficult for individuals to do their jobs. IT leaders require the ability to oversee and make decisions on data centrally, which siloing naturally inhibits.

Companies should draft a set of quantifiable, achievable KPIs related to the problems associated with data sprawl. These KPIs can be used to gauge the payoff of a complete data migration. For example:

• Project duration: How long will the actual migration take? Which processes are likely to take longer, and why?
• Disruption: How much will this migration temporarily cut off critical systems or slow down vital processes? What workloads will be increased or put on hold? Will IT be strained during this transition?
• Cost: Various parties, including current employees and contractors, will need to be compensated both to complete the migration itself and to steer operations during any disruptions. Companies may also need to pay for additional monitoring or tools to provide visibility into the migration’s progress.
• Usage: Companies can set usage estimates based on expected workloads, but the time spent with the data lake and/or the number of users involved can fluctuate. Companies should take this into account ahead of time since many may benefit from a consumption-based pricing model via which they pay only for what they use. Read more about the importance of pricing decisions and why consumption pricing is a good choice on the Snowflake blog.
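
Putting the migration KPIs in writing can be as simple as recording a target and a measured value for each one. The Python sketch below shows one possible shape for that record, assuming each KPI is expressed as an upper bound. The `MigrationKpi` class, the `over_target` helper, and every number are placeholders invented for illustration, not benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationKpi:
    """A quantifiable target for one aspect of the migration (hypothetical structure)."""
    name: str
    unit: str
    target: float   # upper bound the team has agreed is acceptable
    actual: float   # measured or latest estimated value

    @property
    def within_target(self) -> bool:
        return self.actual <= self.target


def over_target(kpis):
    """Names of the KPIs whose actual value exceeds the agreed target."""
    return [k.name for k in kpis if not k.within_target]


# Placeholder numbers for illustration only.
kpis = [
    MigrationKpi("project duration", "weeks", target=26, actual=24),
    MigrationKpi("disruption", "hours of downtime", target=8, actual=11),
    MigrationKpi("cost", "USD thousands", target=500, actual=480),
    MigrationKpi("usage", "compute hours per month", target=2000, actual=1750),
]

for k in kpis:
    status = "ok" if k.within_target else "over target"
    print(f"{k.name}: {k.actual} of {k.target} {k.unit} ({status})")
```

Treating every KPI as "lower is better" keeps the comparison uniform. A team that also tracks goals such as adoption, where higher is better, would need a direction flag on each record.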

UNDERSTAND COSTS

ASSESS YOUR POTENTIAL FOR SAVINGS

As part of their assessment, companies should fully account for current costs of their data platform. Their list should include software and server bandwidth the firm is paying for currently, what siloing or gaps are costing in terms of lost opportunities, and, of course, what price a migration will ultimately incur.

Here are five less-obvious costs to include in any evaluation. These can help clarify the value of a data lake:

• Excess capacity: All systems have excess capacity for load spikes and growth, which can be amortized if merged in a data lake. For a company that maintains an excess capacity of 20% per system, this could amount to considerable savings.
• IT, accounting, and engineering time: Resources that are currently being allocated to support a variety of silos can be streamlined and merged to greatly reduce human costs and complexity.
• Training and user productivity: When you have a diversity of systems, users need time and training to get up to speed and remain productive on each separate data silo. This cost is often distributed across the entire organization.
• Secondary system costs: Integrations with other technologies can easily be overlooked but are often quite expensive. Consolidating the workloads these many systems support into a single platform can significantly reduce costs. Quantifying these costs will depend greatly on the business in question.
• Opportunity costs: Many companies have a “wish list” of things they’d like to do if they had more complete data sources or potential revenue streams they could tap into if they had greater data access. Gauging the value of what you might be missing out on is challenging yet crucial, as calculating opportunities is complex. It often requires not just estimating missed revenue but also accounting for potential delays and bottlenecks.

Remember that moving to a new platform may present some up-front costs while you are running multiple systems at once. The benefits of a migration will increase with each subsequent silo that you merge. It often helps to start small and expand over time so that it becomes easier to showcase the benefits of migration to decision-makers.

Finally, thinking through the worst-case scenarios, such as a data security breach, is a difficult but worthwhile exercise. When data is distributed throughout an organization, the cost of attempting to secure multiple silos, along with governance management, logging, and access control, can seem prohibitive. This leads some companies to adopt a more lax approach, potentially exposing them to massive harm from a breach, leak, or lawsuit.

In a data lake, management can be much simpler and more centralized. Teams can more easily run audits to ensure that data control is being tightly maintained. They can also identify a breach much more quickly if it occurs; centralizing everything in a data lake provides a more comprehensive view of their data, and there are fewer places to search when performing audits and investigations.

As an example, global investment firm EQT Group uses Snowflake’s data lake product to “mask” certain data that includes personal information from consumers. This has enabled far better governance and compliance, helping EQT Group move quickly to a privacy-first way of operating and avoid any potential fines or interest from regulators.
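
To make the excess-capacity item in the list of less-obvious costs concrete, the Python sketch below compares the spare capacity held when each system keeps its own 20% buffer with the spare capacity a shared pool might need. It rests on a strong simplifying assumption, labeled in the code: only one workload spikes at a time. The function names and load figures are hypothetical, and growth headroom is not modeled.

```python
def separate_headroom(loads, excess_pct=20):
    """Total spare capacity when every system keeps its own buffer."""
    return sum(load * excess_pct / 100 for load in loads)


def pooled_headroom(loads, excess_pct=20):
    """Spare capacity for a shared pool.

    Assumption: only one workload spikes at a time, so the pool only has to
    cover the largest individual buffer. If spikes coincide, the pool needs
    more, up to the full separate_headroom() figure.
    """
    return max(load * excess_pct / 100 for load in loads)


# Made-up steady-state loads for four silos, in arbitrary capacity units.
loads = [100, 250, 400, 250]

separate = separate_headroom(loads)
pooled = pooled_headroom(loads)
print(f"Separate buffers: {separate:.0f} units")
print(f"Pooled buffer:    {pooled:.0f} units")
print(f"Reduction:        {separate - pooled:.0f} units")
```

Real savings depend on how often peaks overlap, so a figure like this is best treated as an upper bound to test against actual usage history.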

FOCUS ON KEY CAPABILITIES

WHAT DOES YOUR COMPANY NEED IN A DATA LAKE SOLUTION?

There are numerous providers in the market touting a range of capabilities under the banner of a data lake. When vetting potential partners, it’s important to be aware of the capabilities that will do the most to address the pain points you outlined in Section 3 and drive the most impact for your business.

Here are a few areas to look for:

• Know and protect your data: A common problem with data lakes is that they are cluttered with so much data that they become unusable. If you don’t know what’s in your data lake, how can you confidently control who has access to that data? A good solution should help you understand what’s in your data lake, easily enforce access policies, and monitor usage to ensure only the right people are accessing the right data. It should also help you generate easy-to-run reports that allow for a better understanding of usage and currency, enabling smarter decision-making.
• Scale: For most companies, it’s reasonable to expect that scale requirements will increase over time. This makes it important to ensure that the data lake’s capabilities can grow with your business but remain cost-efficient when operating at a smaller scale. In addition, the scalability of storage and computing resources needs to be independent so companies don’t end up paying for more than they need.
• Productivity: It’s important to consider which types of users in your business will need to interact with the data lake today or in the future. Are real-time or batch interfaces required? Which languages (for example, SQL, Python, Java) do analysts and developers use and prefer?
• Concurrency: How many users and divisions will need to access the data lake? (Bear in mind that once data silos are broken down and access is simplified, a surprising number of people in your organization can potentially make use of a data lake.) Pick a provider that offers the sign-on capabilities, auditing, and, most importantly, elasticity to ensure the data lake doesn’t degrade as more users are onboarded.
• Interoperability: Your business will grow and evolve, which means that new partners, data systems, and tools are inevitable. The platform supporting your data lake should be flexible and adaptable enough to satisfy your needs as they change.
• Continuity: Ideally, companies should not notice any differences in access or performance if there is a need to fail over across regions or clouds. In the event of a disruption, the right platform makes failover and failback seamless.

BE PREPARED

GET READY FOR THE MIGRATION JOURNEY BY THINKING EVERYTHING THROUGH

By now, you’ve conducted a deep, far-reaching examination. You have your data north star, and you’ve calculated all the pertinent costs and tradeoffs. You’re convinced that a data lake will be greatly beneficial to the business. It’s time to prepare for the actual migration.

Process mapping is a great place to start. The best way to proceed is by asking the right pre-migration questions:

• How is your data structured today?
• Where does it live?
• What are your company’s language requirements?
• What is the most crucial workload currently? What elements do you need to have working on Day One?
• Which processes are currently being replicated and can be streamlined?
• Which processes are no longer necessary and can be eliminated?
• How will your data scientists’ and analysts’ upcoming projects be impacted by this change? Are they still supported?

When going through these questions, it’s helpful to think about today, tomorrow, and next year. Initially, your company’s focus will be on the immediate impact of centralized storage and accessibility, but you’ll eventually need to think about interoperability. It’s a good bet that your storage needs will evolve over time, so anticipate and prepare for such scenarios.

Whichever direction your company decides to take, mapping the order of migration steps is crucial. The exact path a migration takes can shape the data lake’s future usability. For example, establishing data security protocols for a migration is crucial for most companies, and thinking through initial use cases and requirements should usually happen first.

The way a migration unfolds, and the order in which different functionalities of a data lake are implemented, will have a direct impact on employees, some of whom will require new skills and intensive training. Ideally, such training should be provided in parallel with the rollout of relevant phases of a data lake adoption.
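
Mapping the order of migration steps amounts to listing which steps depend on which and then deriving a sequence that respects those dependencies. The Python sketch below does this with the standard library’s `graphlib.TopologicalSorter` (Python 3.9 or later). The step names and the dependencies between them are illustrative assumptions loosely based on the points in this section, not a prescribed plan.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
# The step names and dependencies are illustrative assumptions.
dependencies = {
    "define use cases and requirements": set(),
    "establish security protocols": {"define use cases and requirements"},
    "migrate Day One workload": {"establish security protocols"},
    "train Day One users": {"establish security protocols"},
    "migrate remaining workloads": {"migrate Day One workload"},
    "retire redundant processes": {"migrate remaining workloads"},
}

# static_order() yields one valid sequence; steps with no dependency
# between them (such as training and the Day One migration) may appear
# in either order and can run in parallel.
order = list(TopologicalSorter(dependencies).static_order())
for position, step in enumerate(order, start=1):
    print(f"{position}. {step}")
```

If two steps are accidentally made to depend on each other, `static_order()` raises `graphlib.CycleError`, which is a useful early signal that the plan contains a contradiction.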

THINK AHEAD

CONSIDER YOUR POST-MIGRATION ARCHITECTURE AND WHERE YOUR INDUSTRY IS HEADED

Just as a company is never “done” transforming into a data-driven organization, data lake adoption doesn’t end once a migration is complete. As the past few years have taught us, industries can change abruptly, consumer preferences and habits are forever fluid, and the march of technological innovation never ceases.

While business leaders look to complete a data lake migration in a way that best unlocks efficiency and effectiveness today, it’s likely they’ll find themselves in different or entirely new categories in a few short years. This means that leaders should plan for different eventualities while considering how their data needs might change over time. For some helpful direction, take a look at the TDWI Checklist for companies looking to future-proof their data lake strategies.

Given the pace of change and innovation, some companies’ organizational structures, or even their business models, could be unrecognizable in a few years. The most successful enterprises will have a flexible data platform that can adapt to meet their business needs wherever they land.

CONCLUSION

There’s no doubt that executing a data migration is a major decision and that choosing the right data lake partner and strategy can be daunting. But as with most major operational transitions, the challenges can be mitigated with the right preparation. These seven considerations are designed to help steer leaders toward data lake success.

It starts with making sure you and your organization’s leaders are aligned on why you want to invest in a migration in the first place. Mission clarity will help guide the rest of the process. With this in place, leaders must conduct a comprehensive analysis of their current data practices, determine where they’d like to make changes and why, and put lofty but achievable goals down on paper. Of course, a full examination of current and future costs is essential.

Once the decision to move forward has been made, technology leaders and their migration teams/partners must think through what kind of cloud partner they need, which criteria and tasks are most important, and how their needs are likely to evolve over time. The data centralization payoff should be measurable and immediate. Over time, the companies that are most diligent about this process will be the ones best prepared for the inevitable next wave of (data) change.

Learn more about Snowflake data lake solutions and how you can streamline a migration project at snowflake.com/workloads/data-lake

ABOUT SNOWFLAKE
Snowflake delivers the Data Cloud—a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency,
and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute
diverse analytic workloads. Wherever data or users live, Snowflake delivers a single and seamless experience across multiple public clouds.
Snowflake’s platform is the engine that powers and provides access to the Data Cloud, creating a solution for data warehousing, data lakes,
data engineering, data science, data application development, and data sharing. Join Snowflake customers, partners, and data providers already
taking their businesses to new frontiers in the Data Cloud. snowflake.com

© 2022 Snowflake Inc. All rights reserved. Snowflake, the Snowflake logo, and all other Snowflake product, feature and service names mentioned herein
are registered trademarks or trademarks of Snowflake Inc. in the United States and other countries. All other brand names or logos mentioned or used
herein are for identification purposes only and may be the trademarks of their respective holder(s). Snowflake may not be associated with, or be
sponsored or endorsed by, any such holder(s).

CITATIONS
[1] Gartner, “Data Sharing Is a Business Necessity to Accelerate Digital Business,” Laurence Goasduff, May 20, 2021, www.gartner.com/smarterwithgartner/data-sharing-is-a-business-necessity-to-accelerate-digital-business. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.
[2] Research and Markets, “Data Lakes Market - Growth, Trends, COVID-19 Impact, and Forecasts (2021 - 2026),” https://bit.ly/3roSSRW, October 2021.
[3] IDC Technology Spotlight, sponsored by Matillion, “Calming the Storm: Cloud-Native Data Integration,” doc #US47518521, March 2021.
