You are on page 1of 7

10 ways to begin a data warehouse project Página 1 de 7

TechRepublic : A ZDNet Tech Community

10 ways to begin a data warehouse


project
by Guest Contributor | Oct 12, 2006 7:00:00 AM

Tags: Guest Contributor

Takeaway: IT departments typically launch data warehouse projects without input from
business partners, explaining DW concepts to the uninitiated and building the business case
themselves. But what steps should you take once the project is greenlighted? CIO Jeff Relkin
examines key aspects of the process, including determining data access controls, deciding on
an implementation approach, identifying project scope, developing the spec document,
assessing data quality, and building a metadata repository.

By Jeff Relkin

This article is also available as a PDF download.

High functioning value-added IT departments operate in a consultative mode, using the


enterprise business model and strategic plan to work effectively with business partners to
identify technology-based solutions in response to requirements as articulated by the business.
Projects are launched based on collaboratively deciding what's needed.

A data warehouse, however, is one of the few examples of a project that's typically initiated
independently by IT without input from the business. IT has to build the initial business case
for the data warehouse, since few people outside the technology discipline understand what a
data warehouse really is or what kind of value it can provide. What's the best way to get
started? Here are some suggestions.

#1: Determine your organization's appetite for change

http://articles.techrepublic.com.com/5100-22_11-6125211.html 27-05-2010
10 ways to begin a data warehouse project Página 2 de 7

What makes a data warehouse different from other traditional applications? One obvious
answer is the rather profound difference between transactional (OLTP) and analytic (OLAP)
processing. Another is the fact that data warehouses are additive in nature, which means they
don't conform to standard accounting rules for financial data and are tolerant of redundancy, a
construct normal relational databases avoid at all costs.

Legacy applications typically have standard interfaces and proscribed reporting packages, but
data warehouses are mostly accessed via ad-hoc single-use queries. Dimensional cubes don't
look anything like relational tables. All in all, the use of a data warehouse requires a completely
different user mindset. Is your organization ready for that? If not, you have two choices: Save
yourself a lot of grief and don't start the project to begin with or figure out how to use your IT
professional change agent skills to move the enterprise culture into one that embraces high-
level analytics and the associated set of advanced technology tools.

#2: Identify the most likely business unit to benefit


from a data warehouse and approach it proactively

Every time I've gone to my business customers and said, "I'm building a data warehouse... what
would you like to do with it?", the response I've gotten is, "What's a data warehouse?" Well
begun is well done, so a key initial activity is to determine which of your business customers
has the greatest potential need for analytical data and tools, along with the capacity and
interest to put those tools to effective use.

If you work for a small organization, you'll probably want to go straight to the top. At this
point, you're just trying to generate interest and excitement (and funding), so build a short but
effective business case before you have any meetings. Cover the high-level basics of data
warehouse functionality in business value terms. Be scrupulously careful to avoid anything
remotely resembling geek-speak. Do not make the fatal mistake of attempting to quantify data
warehouse ROI. Many people have heard that data warehouses are "sinkholes of money." Don't
deny this, because it's usually true, and the worst thing you can do now is kill your credibility.
The initial sale you're trying to make is based on improved decision making through advanced
analytics. Financial benefits will be longer term and difficult to measure, so be honest about
that. Once you get approval to proceed, you'll want to quickly get business management
sponsorship. IT should be the project owners only for the shortest time possible.

#3: Determine data access controls

http://articles.techrepublic.com.com/5100-22_11-6125211.html 27-05-2010
10 ways to begin a data warehouse project Página 3 de 7

Your proposal has been greenlighted--now what do you do? All kinds of questions need to be
answered, and one good place to start is by determining inclusion parameters and security
requirements. You have two main areas to think about, sources and targets. Your instinct will
be to use every piece of data you can get your hands on, but not every system, database, legacy
application, or information enrichment file is a potential source for your data warehouse. A lot
of data derived from those sources probably isn't relevant to your identified customer group
and may contain qualitative data that doesn't lend itself to analysis.

You also have to consider whether your downstream data warehouse customers are even
allowed to process raw data. Users who are authorized at the application level shouldn't
necessarily be granted access to granular information. Likewise, the target data warehouse will
need some security controls imposed on it. One of the great dangers of data warehouses is that
in the hands of unskilled practitioners, they can be used to make and justify spectacularly bad
business decisions. Sometimes users have to be protected from themselves.

#4: Assemble the team

Who should work on the data warehouse effort? This is a project, so you'll need a project
manager, someone comfortable with nontraditional project management methods (highly
iterative process, multiple short-duration interim deliverables, toleration for change and
uncertainty, etc.). You'll also need a database designer and a dedicated DBA, people who are
well versed in the differences between relational tables and multidimensional cube constructs.

Your DBA needs to be dedicated because a good data warehouse project involves constant
revision and tuning, and in addition to unique architectural skills, your DBA can't be distracted
by conflicts with other responsibilities. A development team will be necessary to build the
various EAI (enterprise application interface) and ETL (extract/transform/load) interfaces,
and you'll need dedicated report and query specialists. You'll want to make sure your query
developers have great customer focus skills, because they'll also be the ones who'll support the
data warehouse to the customer community when it goes live. Don't forget the most important
members of your team--a customer council.

#5: Decide on the implementation approach

http://articles.techrepublic.com.com/5100-22_11-6125211.html 27-05-2010
10 ways to begin a data warehouse project Página 4 de 7

How should you build your data warehouse? There are two traditional approaches: the galactic
data warehouse and the architected datamart. Classic data warehouse topology consists of a
source layer, which feeds into an ODS (operational data store), from there into the enterprise
data warehouse, and from there into a series of datamarts. All data flows unidirectionally
downstream, and the reporting layer at the bottom connects to the datamarts and the data
warehouse. The entire environment is connected to a metadata repository.

In the galactic data warehouse approach, you build the complete backend infrastructure from
source to data warehouse, encompassing as much of the source data as you might possibly
someday want, and implement datamarts fed from the data warehouse as needed. The
architected datamart approach bypasses the central data warehouse altogether, sending
focused subsets of data directly from the sources to the datamarts. Both approaches have
merit, but typically you can get to an interim deliverable far faster with architected datamarts.
The danger is that you'll never have, or take, the opportunity to go back and insert the middle,
the data warehouse itself. Your decision as to which approach to use depends on your own
technology and business environment and should be based on a strategic risk assessment.

#6: Identify the project scope

How broad should you make your data warehouse environment? As is the case with any
project, before you start you need to define the three most fundamental components of the
effort: what are the inputs, what kind of processing will be performed on them, and what
comes out the other side?

One of the key differences between a data warehouse project and nearly every other technology
project is that it's almost always impossible to define what the data warehouse and all its
various components will look like by the time you're done, so you actually want to avoid too
much scope definition up front or you'll lock yourself into a rigid design and end up spending
even more money to overcome limitations. You're basically trying to put up a building without
working from a blueprint. Nevertheless, you must come to some agreement with your
customers as to at least the first few deliverables. This is why the project manager has to be
comfortable with uncertainty. You'll be going to your customer intially with a very raw product
and saying, "Try this, let me know how you like it," and then moving forward iteratively from
there. Definition will become clearer the more you move into implementation.

#7: Establish the success criteria

http://articles.techrepublic.com.com/5100-22_11-6125211.html 27-05-2010
10 ways to begin a data warehouse project Página 5 de 7

What's your customer trying to achieve? If you ask your customer at the start of the project,
you'll get a blank stare and a shoulder shrug, so here's where your consultative and business
skills get exercised. There's no way to make a definitive statement about the financial value of
the data warehouse up front--sometimes not even after it's finished. You must understand
enough about your customer's business that you can help define, in qualitative and not
quantitative terms, how the project will be judged successful. What results will better decision
making generate? How will having advanced analytics help to understand the business? What
will some of the new techniques, like segmentation analysis, provide in terms of business
intelligence? How will the data warehouse help drive strategy at the enterprise and
departmental level?

#8: Conduct the "25 question" analysis

Now that you have the beginning of an overall strategy, where do you go from here? I've found
the most effective way to proceed is to assemble a group of 15 users and ask them to write down
the 25 questions about the business for which they'd most like to know the answers. Don't
allow them to collaborate. It's critically important for each one to develop a question list
independently. Make sure they don't view this as a technical assignment but phrase each
question in English business terms as if they were addressing their boss or the CEO (or as if
those folks were addressing them).

Collect everyone's lists, which will give you 375 total questions. Now normalize the list.
Eliminate all the redundancies and combine all the similar questions. You'll find you'll end up
with somewhere around 50 discrete questions. Convene all 15 users and review the list with
them, getting them to put the questions in priority order. Know what you have in your hand?
Your spec document, not only for the data warehouse but also for the first iteration of your
standard query repository. If you can build a data warehouse and the queries that answer those
questions in the order the users specified, you'll be a hero.

#9: Assess current data quality and pre-cleansing


efforts

http://articles.techrepublic.com.com/5100-22_11-6125211.html 27-05-2010
10 ways to begin a data warehouse project Página 6 de 7

What's the surest way to doom a data warehouse project? Forget about quality. No matter how
good your organization thinks the data in your source systems really is, you know better. It's
highly suspect. Since a data warehouse is additive in nature, you'll never do any update
transactions to it. That means garbage in is not just garbage out, it's permanent garbage. Build
the effort and cost of data cleansing into your data warehouse project. Not only will you have a
data warehouse that produces credible results, you'll drive an overall push that's probably long
overdue to improve the quality of the data in the backend systems themselves. At the same
time, you'll probably uncover a host of errors in the legacy systems that should have long since
been corrected.

#10: Don't forget the metadata

Remember last Christmas when you tried to put together your daughter's bicycle without the
instructions? That's what using a data warehouse without metadata is like. Metadata, or data
about your data, is the glossary that documents all the important information users need so
they can understand how to correctly utilize the data. The metadata repository is generally set
up as a separate database, connected passively to all components of the data warehouse--the
sources, the ODS, the warehouse, and the datamarts. It also drives the interfaces, providing
technical information that documents the processes and transformations that operate on each
data entity. Whether you construct it yourself or purchase a product, remember to build this
critical component into your project plan and expect the maintenance of it to be as iterative as
the data warehouse itself. As one changes, so does the other.

Jeff Relkin has 30+ years of technology-based experience at several Fortune 500 corporations
as a developer, consultant, and manager. He has also been an adjunct professor in the master's
program at Manhattanville College. At present, he's the CIO of the Millennium Challenge
Corporation (MCC), a federal government agency located in Washington, DC. The views
expressed in this article do not necessarily represent the views of MCC or the United States of
America.

People who read this, also read...


Resources for designing, planning, and implementing a data warehouse strategy
Hands-on tasks in building a data warehouse
Oracle 9i makes data warehousing easy to implement
Making the operational case for data warehousing
Beware of data warehousing death knells

Print/View all Posts


Comments on this article
10 ways to begin a data warehouse project JodyGilbert  | 10/12/06

http://articles.techrepublic.com.com/5100-22_11-6125211.html 27-05-2010
10 ways to begin a data warehouse project Página 7 de 7

My Updates
My Contacts

Would you like your own dynamic Workspace on TechRepublic?

Take two minutes and set up a TechRepublic member profile.

Would you like your own dynamic Workspace on TechRepublic?

Take two minutes and set up a TechRepublic member profile.

Popular on CBS sites: Fantasy Baseball | iPad | Video Game Reviews | Cell Phones | NFL Draft

About CBS Interactive | Jobs | Advertise | Mobile | Site Map

© 2010 CBS Interactive Inc. All rights reserved. | Privacy Policy (updated) | Terms of Use

http://articles.techrepublic.com.com/5100-22_11-6125211.html 27-05-2010

You might also like