4 COMMON MISTAKES TO
AVOID WHEN OPTIMIZING
YOUR DATA WAREHOUSE
WORK WITHIN THE DATA WAREHOUSE’S INHERENT
LIMITATIONS—NOT AGAINST THEM.
DATA WAREHOUSES:
A COMPLEXITY OF LAYERS
In some of the beautiful, ancient cities of the world, there are two cities: the modern, bustling city above ground and the city of archaic, layered architecture underneath. That’s because when buildings were rebuilt in ancient times, workers tore down the old structures and used the original material to form the foundations of the new buildings. Often, old buildings were stripped of their roofs then filled with debris to make solid foundations for new buildings or even entire neighborhoods. As a result, there are places in Rome, for example, where the layers and remnants of the original Roman civilization stretch as deep as 60 feet below ground.

Many companies today have a corporate data infrastructure that resembles this kind of ancient city, built layer upon layer. Thousands of users rely upon often hard-to-access data stored in complex systems to get the information they need to run their company, their department, their research efforts, their market research, their financial closings, and on and on. Yet, with each layer, getting to trusted information easily and effectively becomes more and more difficult. The signs of strain are familiar:

• The data warehouse team is besieged with new requests they can’t meet on a timely basis.
• Users frequently want disparate data added to their analyses.
• Many users—out of sheer desperation—load data extracts into Excel, so they can try to figure out on their own, using whatever means at their disposal, what’s going on.
• Ongoing support costs are staggering in terms of staff, licensing, and maintenance.
• Rapidly changing user complexity and specific user needs simply couldn’t be predicted when data warehousing foundations were built.
At the heart of this layered maze of hardware, software, data repositories, and thousands of stored reports sits your legacy data warehouse. Internal and external data volumes have skyrocketed, and—like the layered architecture of yesteryear—the more things you add to your data architecture, the more complex it and your data warehouse can become. But what was state-of-the-art technology 20 years ago is now in trouble.

A lot of IT teams would like to walk away from all of the complexity and start over with a new approach. But for those companies with data warehouse strategies too deeply embedded to abandon altogether, there is a way to squeeze more performance from existing data warehouses. First, here are four common mistakes you definitely want to avoid.
1. Assuming a data warehouse appliance is the answer.
2. Believing the cloud’s low price and “elastically scaling data stores” will solve everything.
3. Thinking open-source, Big Data technologies will get you where you need to be.
4. Believing continual data model and Extract Transform Load (ETL) tuning will give you the performance gains you need.
An appliance might reduce the time required to run queries against billions
of rows, but it does so only through extensive hardware and memory
configurations—it doesn’t get to the heart of performance problems caused by the
overall data warehouse architecture.
The costs for these systems, the needed data migrations, and the corresponding managed services are also staggering, even for vendors who have built
sophisticated systems running on commodity hardware. Many IT leaders who
ventured down this path still find their data warehouse performance and data
access needs left unfulfilled.
For instance, the Big Data technology platform Hadoop was never designed to be a data warehouse, yet eager data architects and scientists experiment with Hadoop on projects such as data lakes. They soon run into problems, however, because these architectures lack discipline in critical areas like integration of outside sources of data, reduction of reporting stress on production systems, data security, historical analysis, data governance, user-friendly data structures and schemas, and—ultimately—delivering a single version of the truth.

There’s also a misconception that a Big Data approach like Hadoop is less expensive. While some data actions using Hadoop indeed can cost less, building an entire data warehouse and analytics solution ultimately can be massively more expensive due to the cost of writing complex queries and analysis. And remember, Hadoop:

• Often requires the implementation of new reporting tools that end users might not welcome;
• Is not a database management system, so you will need to implement a whole new set of tools to get it to do what you want it to do;
• Is very complex, requiring many external technologies to make it work; and
• Performs poorly on complex queries for many reasons, so improving performance will require many additional commercial and open source systems.
The lure of a Big Data approach might be enticing, but beware the
risky, expensive, and complex architecture you’d need to embrace,
with no guarantee your data warehouse performance and analytics
objectives will ever be achieved.
Modern analytics platforms such as Incorta enable you to access and analyze data directly from source data models—including your existing data warehouse—which immensely speeds the development and updating of analytics projects. Since the analytics platform mirrors data, in its original data model, directly from the source applications, you no longer need to change the data model via star or snowflake data modeling whenever new reports or new insights are needed. You also can easily extend beyond existing star schemas to add other data sources whenever needed.

Executing queries on the source data model using this type of approach enables analysts to access all attributes of the data without any predetermined assumptions. They’re free to explore and uncover trends.

This approach is much simpler than any of the four common, mistaken approaches discussed above: it speeds the development process and removes unnecessary complexity from it, while giving you a phenomenal boost in performance.

And—when it’s time to re-implement your aged data warehouse in 3-5 years—you might choose to migrate off of it altogether, instead opting for a new, no-data-warehouse strategy using a modern analytics platform like Incorta.
[Diagram: Sample data sources can be input through a data warehouse or input directly to Incorta.]
Find out how a Fortune 5 company uses Incorta to bypass data modeling to deliver complex operational reports in only seconds—read the case study.
THE DIRECT DATA PLATFORM™

ABOUT INCORTA
Incorta is the data analytics company on a mission to help data-driven enterprises be more agile and competitive by resolving
their most complex data analytics challenges. Incorta’s Direct Data Platform gives enterprises the means to acquire, enrich,
analyze and act on their business data with unmatched speed, simplicity and insight. Backed by GV (formerly Google Ventures),
Kleiner Perkins, M12 (formerly Microsoft Ventures), Telstra Ventures, and Sorenson Capital, Incorta powers analytics for some
of the most valuable brands and organizations in the world. For today’s most complex data and analytics challenges, Incorta
partners with Fortune 5 to Global 2000 customers such as Broadcom, Vitamix, Equinix, and Credit Suisse. For more information,
visit https://www.incorta.com.