
Incorta | eBook

4 COMMON MISTAKES TO
AVOID WHEN OPTIMIZING
YOUR DATA WAREHOUSE
WORK WITHIN THE DATA WAREHOUSE’S INHERENT
LIMITATIONS—NOT AGAINST THEM.
DATA WAREHOUSES:
A COMPLEXITY OF LAYERS
In some of the beautiful, ancient cities of the world, there are two cities: the modern, bustling city above ground and the city of archaic, layered architecture underneath. That’s because when buildings were rebuilt in ancient times, workers tore down the old structures and used the original material to form the foundations of the new buildings. Often, old buildings were stripped of their roofs then filled with debris to make solid foundations for new buildings or even entire neighborhoods. As a result, for example, there are places in Rome where the layers and remnants of the original Roman civilization stretch as deep as 60 feet below ground.

Many companies today have a corporate data infrastructure that resembles these ancient cities, built layer upon layer. Thousands of users rely upon often hard-to-access data stored in complex systems to get the information they need to run their company, their department, their research efforts, their market research, their financial closings, and on and on. Yet, with each layer, getting to trusted information easily and effectively becomes more and more difficult.

At the heart of this layered maze of hardware, software, data repositories, and thousands of stored reports sits your legacy data warehouse. Internal and external data volumes have skyrocketed, and—like the layered architecture of yesteryear—the more things you add to your data architecture, the more complex it and your data warehouse can become. But what was state-of-the-art technology 20 years ago is now in trouble. The signs are familiar:

• The data warehouse team is besieged with new requests they can’t meet on a timely basis.
• Users frequently want disparate data added to their analyses.
• Many users—out of sheer desperation—load data extracts into Excel, so they can try to figure out on their own, using whatever means at their disposal, what’s going on.
• Ongoing support costs are staggering in terms of staff, licensing, and maintenance.
• Rapidly changing user complexity and specific user needs simply couldn’t be predicted when data warehousing foundations were built.

A lot of IT teams would like to walk away from all of the complexity and start over with a new approach. But for those companies with data warehouse strategies too deeply embedded to abandon altogether, there is a way to squeeze more performance from existing data warehouses. First, here are four common mistakes you definitely want to avoid.



SQUEEZING MORE PERFORMANCE FROM A DATA WAREHOUSE—
DON’T MAKE THESE FOUR COMMON MISTAKES

Mistake #1: Assuming a data warehouse appliance is the answer.

Mistake #2: Believing the cloud’s low price and “elastically scaling data stores” will solve everything.

Mistake #3: Thinking open-source, Big Data technologies will get you where you need to be.

Mistake #4: Believing continual data model and Extract Transform Load (ETL) tuning will give you the performance gains you need.

Let’s discuss each mistake in more detail.



MISTAKE #1
ASSUMING A DATA WAREHOUSE APPLIANCE IS THE ANSWER.
Many companies concerned about rapidly growing data stores and complex performance issues turn to a data warehouse appliance. These finely tuned and very expensive systems can be tempting, but the problem is this: appliances attempt to solve your problems with brute force—massive hardware resources.

An appliance might reduce the time required to run queries against billions
of rows, but it does so only through extensive hardware and memory
configurations—it doesn’t get to the heart of performance problems caused by the
overall data warehouse architecture.

The costs for these systems, the needed data migrations, and the corresponding managed services are also staggering, even for vendors who have built sophisticated systems running on commodity hardware. Many IT leaders who ventured down this path still find their data warehouse performance and data access needs unfulfilled.



MISTAKE #2
BELIEVING THE CLOUD’S LOW PRICE AND “ELASTICALLY SCALING DATA STORES” WILL SOLVE EVERYTHING.
In many cases, the cloud solves real problems, especially regarding data volumes and scaling out computational power. And no one can dispute that the cloud is where almost everything is headed.

But moving your struggling data warehouse to the cloud merely results in a cloud-based, struggling data warehouse. Modernizing your enterprise data warehouse requires many components—including architecture, management processes, and user-vetted requirements—and the cloud really addresses only “the platform” side of the equation.

Plus, migrating to the cloud is no easy feat. It involves many steps for many components, such as: ETL; migrating the data itself; rebuilding data pipelines and connections; and migrating metadata, users, and applications.

If, despite the above, you’re considering migrating your data warehouse to the cloud for cost reduction reasons, you should know that many companies find cloud storage and processing costs add up quickly, resulting in amounts much, much higher than anticipated.



MISTAKE #3
THINKING OPEN-SOURCE, BIG DATA TECHNOLOGIES WILL GET YOU WHERE YOU NEED TO BE.
Most data warehouses fail altogether or fail to deliver all of their promised benefits when it comes to performance and analytics. And Big Data project results are even worse—Gartner estimates 85 percent of Big Data projects fail to move past preliminary stages.¹

For instance, the Big Data technology platform Hadoop was never designed to be a data warehouse, yet eager data architects and scientists experiment with Hadoop on projects such as data lakes. They soon run into problems, however, because these architectures lack discipline in critical areas like integration of outside sources of data, reduction of reporting stress on production systems, data security, historical analysis, data governance, user-friendly data structures and schemas, and—ultimately—delivering a single version of the truth.

There’s also a misconception that a Big Data approach like Hadoop is less expensive. While some data actions using Hadoop indeed can cost less, building an entire data warehouse and analytics solution ultimately can be massively more expensive due to the cost of writing complex queries and analysis. And remember, Hadoop:

• Requires new headcount possessing specialized, scarce, and expensive skills;
• Often requires the implementation of new reporting tools that end users might not welcome;
• Is not a database management system, so you will need to implement a whole new set of tools to get it to do what you want it to do;
• Is very complex, requiring many external technologies to make it work; and
• Performs poorly on complex queries for many reasons, so improving performance will require many additional commercial and open source systems.

The lure of a Big Data approach might be enticing, but beware the risky, expensive, and complex architecture you’d need to embrace, with no guarantee your data warehouse performance and analytics objectives will ever be achieved.



MISTAKE #4
BELIEVING CONTINUAL DATA MODEL AND ETL TUNING WILL GIVE YOU THE PERFORMANCE GAINS YOU NEED.
While brilliant, dedicated analytics data modeling gurus have built and maintained many thousands of star or snowflake schemas, some of them also describe those beautifully engineered models as “the curse of data analytics” because they often limit analytics results: users can’t see what they want to see and can’t get at the level of detail they need. And, when limitations become an insurmountable barrier to obtaining much-needed business insights, or new, disparate data needs to be loaded into the analytics environment, the data models must be taken back to the drawing board and tuned once again.

Since analytics is a process, not an IT project, data modeling and database tuning needs just keep growing—costing companies millions of dollars, and anxious users awaiting their results millions of minutes along the way. It’s a lose-lose situation.

Then, of course, there’s ETL. ETL is a complex, ever-changing process that needs to be constantly tuned by specially trained ETL teams. But this constant tuning is merely required maintenance—it’s not a viable strategy for boosting data warehouse performance. That’s because ETL tuning is typically a very manual process where a lot of time and effort yields only incrementally small results.

With these data model and ETL challenges, performance issues can quickly emerge when companies need to scale up platforms due to massively increasing data volumes, data complexities, and disparate data. Often, improved modeling and ETL processes just don’t scale. As a result, this type of tuning will consume all of your resources—both people and money.
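To make the limitation concrete, here is a minimal, hypothetical sketch of why a fixed star schema only answers the questions it was modeled for. All table and column names below are invented for illustration; this is not any particular vendor’s schema.

```python
# A toy star schema: a sales fact table joined to a product dimension.
# Only attributes the modelers included up front are queryable; anything
# dropped during ETL is simply not there to ask about.

product_dim = {
    1: {"name": "Widget", "category": "Tools"},
    2: {"name": "Gadget", "category": "Toys"},
}

# (product_key, amount) -- other source columns were dropped in ETL
sales_fact = [
    (1, 100.0),
    (2, 40.0),
    (1, 60.0),
]

# Revenue by category: easy, because "category" made it into the model.
revenue = {}
for product_key, amount in sales_fact:
    cat = product_dim[product_key]["category"]
    revenue[cat] = revenue.get(cat, 0.0) + amount

# revenue == {"Tools": 160.0, "Toys": 40.0}
```

A question about any attribute the modelers left out, say, the sales rep on each order, cannot be answered from this fact table at all; the model must go back to the drawing board first, which is the rework cycle described above.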



A POWERFUL ALTERNATIVE—BOOST PERFORMANCE
NOW BY PAIRING YOUR DATA WAREHOUSE WITH A
MODERN ANALYTICS PLATFORM
Ripping and replacing your data warehouse can be a scary undertaking if you’re not ready for it. Many organizations prefer to first find ways to deliver analytics projects more quickly using their existing data warehouse—a business user-focused approach to boosting performance. But how do you do this if you need to avoid the four common mistakes outlined above?

Modern analytics platforms such as Incorta enable you to access and analyze data directly from source data models—including your existing data warehouse—which immensely speeds the development and updating of analytics projects. Since the analytics platform mirrors data, in its original data model, directly from the source applications, you no longer need to change the data model via star or snowflake data modeling whenever new reports or new insights are needed. You also can easily extend beyond existing star schemas to add other data sources whenever needed.

Executing queries on the source data model using this type of approach enables analysts to access all attributes of the data without any predetermined assumptions. They’re free to explore and uncover trends and details inaccessible to them within a traditional data modeling approach, while continuing to benefit from snapshots, Type 2 Slowly Changing Dimensions, or semi-additive or non-additive facts in a dimensionalized model. And all of this progress can be achieved in mere hours, rather than wasting scarce resources on countless design meetings and business requirements analysis.

This approach is much simpler than any of the four common, mistaken approaches discussed above: it speeds the development process and removes unnecessary complexity from it, while giving you a phenomenal boost in performance.

And—when it’s time to re-implement your aged data warehouse in 3-5 years—you might choose to migrate off of it altogether, instead opting for a new, no-data-warehouse strategy using a modern analytics platform like Incorta.




HOW INCORTA WORKS IN CONJUNCTION WITH AN EXISTING DATA WAREHOUSE.

[Diagram: Sample data sources can be input through a data warehouse (ETL processes → data warehouse → BI tools) or input directly to Incorta.]

Find out how a Fortune 5 company uses Incorta to bypass data modeling to deliver complex operational reports in only seconds—read the case study.

¹ Designing for Analytics, “Failure Rates for Analytics, BI, and Big Data Projects = 75%—Yikes!” Feb. 21, 2018.
THE DIRECT DATA PLATFORM™

ABOUT INCORTA
Incorta is the data analytics company on a mission to help data-driven enterprises be more agile and competitive by resolving
their most complex data analytics challenges. Incorta’s Direct Data Platform gives enterprises the means to acquire, enrich,
analyze and act on their business data with unmatched speed, simplicity and insight. Backed by GV (formerly Google Ventures),
Kleiner Perkins, M12 (formerly Microsoft Ventures), Telstra Ventures, and Sorenson Capital, Incorta powers analytics for some
of the most valuable brands and organizations in the world. For today’s most complex data and analytics challenges, Incorta
partners with Fortune 5 to Global 2000 customers such as Broadcom, Vitamix, Equinix, and Credit Suisse. For more information,
visit https://www.incorta.com
