You are on page 1of 7

Training Program for Data Warehousing

DATA WAREHOUSING

LESSONS LEARNT AND BEST


PRACTICES

December 2000

TCS Confidential
DATA WAREHOUSING: LESSONS LEARNT AND BEST PRACTICES

Table of Contents
1. OVERVIEW.........................................................................................................................3
...............................................................................................................................................3
1.1 SIGNIFICANT ISSUES IN DATA WAREHOUSING....................................................3
1.2 WHY DATA WAREHOUSE PROJECTS FAIL..............................................................4
1.3 STEPS FOR A SUCCESSFUL DATA WAREHOUSE....................................................4
1.4 PILOTING THE DATA WAREHOUSE..........................................................................5
1.5 IMPLEMENTING BEST OF BREED SOLUTION........................................................6
1.6 BUDGETING FOR A DATA WAREHOUSE..................................................................6
1.7 DATA WAREHOUSING STAFFING REQUIREMENTS..............................................7
1.8 TESTING.........................................................................................................................7
1.9 PERFORMANCE ISSUES AND TUNING....................................................................7

2
DATA WAREHOUSING: LESSONS LEARNT AND BEST PRACTICES

1. OVERVIEW
The key to a successful data warehouse that truly improves business intelligence is a rock-solid
design foundation. Only when such a structure is in place can a company leverage their data to
empower knowledge workers to make more effective business decisions and be more productive.
Many Data Warehousing projects fail due to lack of management sponsorship and wrongly identifying
the business problem, amongst other things. This document identifies the common mistakes
undertaken in Data Warehouse projects and lists steps for a successful data warehouse
implementation.

1.1 SIGNIFICANT ISSUES IN DATA WAREHOUSING


In an in-depth analysis report, the META Group described the following emerging issues as most
significantly altering the data warehousing approach taken by many organizations:

• Inadequate Planning Assumptions for Large Data Warehouse Initiatives. Once into the
implementation stage, too many designers and project leaders for corporate data warehouse
initiatives discover that their assumptions about the impact of organizational, political, and
cultural dynamics — as well as their assessment of technology and skills shortfalls — were
naïve and, in fact, major obstacles.

• Technology Limitations. A common stumbling block for corporate data warehousing is that
even after the data is scrubbed, rationalized, and loaded, the final database is too large to fit
on any known server. In addition, even the most focused data mart solution can grind to a halt
as new sources of internal and external data identified as critical to its business intelligence
analysis deluge the system. Moreover, the processing power required to support access by
the expanding user constituencies (encompassing employees, business partners, and
ultimately customers) is beyond what is currently available. Massively parallel processing
(MPP) systems and even clusters of inexpensive symmetric multiprocessing (SMP) systems
often turn out to be white elephants, because these massive, centralized databases are too
difficult to design, manage, and tune.

• Data Mart Proliferation. Business users are unable to tolerate the long lead times typical of
even the best-planned and executed data warehouse. As a result, many organizations focus
efforts on data marts rather than corporate/enterprise-wide data warehousing. The
uncoordinated architecture(s) that result often compromise the corporate IT infrastructure
plans and result in future major diseconomies.

• Exploding User Constituencies. Organizations continue to state ambitious plans to increase


the number of online analytical users for their data warehouses. Yet, year-to-year
measurements show that these same organizations consistently fail to deploy at such
magnitudes. Meanwhile, the projected number of users is again restated each year for
presumed emerging constituencies — including direct consumer access to certain data
warehouses.

The META Group report further describes the following obstacles to Data Warehouse success:

• Data Quality (41%)


• Managing End-User Expectations (31%),
• Legacy Data Transformation (28%),
• Business Rule Analysis (28%),
• Business Data Modeling (25%), and
• Managing Management Expectations (23%).

3
DATA WAREHOUSING: LESSONS LEARNT AND BEST PRACTICES

1.2 WHY DATA WAREHOUSE PROJECTS FAIL


The overriding reasons for many decision support projects failure is not that the projects were
technically unfeasible. On the contrary, many of the technological challenges of data warehousing
have proven answers. The most common cause of failure is that the warehouse did not meet the
business objectives of the organization.

Measurement of clear business objectives is critical. Because once the data warehouse project is
completed, the management team will have to justify the expenditure. The users always dictate the
success or failure of the warehouse. They therefore need to be heavily involved throughout the data
warehousing project.

Any substantial project lacking executive management participation has a high probability of failure.

Like any other large information systems project, data warehouse development can get bogged down
if the scope is too broad and the number of people involved is too large. A clear purpose and scope
are necessary to manage the application of information systems resources, as well as the
expectations of potential data warehouse users.

Many large warehouse projects have failed because of an inability of the organization to handle the
size and scope of the project. It is tempting to think of a single repository where all of the enterprise's
data problems can be solved in one fell swoop.

For the initial data mart, which usually provides the data warehouse proof-of-concept, the scope must
be sufficient to provide real, immediate, and high profile benefits.

Blindly copying dirty source data into the data warehouse is extremely dangerous. Dirty data can lead
business analysts to make erroneous business decisions and rapidly causes end users to lose
confidence in the quality and integrity of the data warehouse. Lack of data integrity in the data
warehouse is a common source of failure of many data warehouses and analytical applications.

Disaster recovery is a particular challenge, given the large volumes of data found in today's
warehouses. Many organizations just do not take disaster recovery seriously until they had some
catastrophic experience.

Unpredictable or unreliable systems management functions will raise user questions over the quality
of the information and the usefulness of the data warehouse. In a worst-case scenario, poorly planned
and executed system management functions can easily lead users to discontinue using their existing
data warehouse applications and halt the funding of new applications.

1.3 STEPS FOR A SUCCESSFUL DATA WAREHOUSE


A successful DSS project will always be driven by sound ‘business’ reasons for its use. Projects can
fail due to lack of sufficient business reasons for its use, or due to pursuit of purely ‘technology’
reasons. This is understandable because of the very nature of DSS projects, which are delivered to
‘business’ users. Unless these users play an active role during the project lifecycle, there is a high risk
that key requirements will be missed out, leading to user dissatisfaction.

Clearly define the set of business issues to be addressed by the data warehouse. Different types of
business problems yield different types of technical challenges. For example, customer churn analysis
frequently involves overcoming the technical challenge of working with large dimensions. And product
profitability analysis often requires the difficult process of allocating costs across multiple dimensions
and creating analyses across time for changing dimensions.

The technology dimension will of course play a major role in the enterprise strategy. Different
strategies will require different technological characteristics and features. Just as the technology must

4
DATA WAREHOUSING: LESSONS LEARNT AND BEST PRACTICES

align itself with the business mission, the strategy must also consider the technology. Therefore the
introduction of any new technology must be aligned with the business units and processes which it is
intended to support.

Senior management sponsorship is crucial for data warehousing projects and the level of support
must be commensurate with the scope of the project.

Scalability refers to the ability of a system to increase in capacity as users demand more, as data
stores grow, as more users are added to the system and as more applications are developed against
the Warehouse. When considering a system to house a critical warehouse environment, it is important
to select one that scales in a number of aspects. As the data on a system doubles, the query
response time of that system should scale linearly, so that queries take no longer than twice the
original response time to complete. Additionally, when the capacity of a system doubles, the platform
should to be able to handle twice the query volume

Taking it a step further, users need to have a personal stake in the success of the project. Make it a
point to educate users on the fundamentals and processes of data warehousing. Teach them its
benefits, as well as its limitations. This will aid significantly in managing their expectations.

Data warehouses are best built in an iterative fashion. Do not misunderstand; this is not to
recommend that a company should not build a fully functional, multiple Terabyte, multiple subject
area, Web-enabled, end-to-end, enterprise data warehouse with a complete metadata interface. It
simply means that the highest probability for success comes from implementing a decision support
system in a phased approach. Organizations should try to isolate - if appropriate - the first phase of a
data warehouse implementation to one or two important business areas. By using the first iteration as
an opportunity to train the corporation, it will set the stage for bigger and better future
implementations.

Data warehousing projects are fundamentally different from operational projects. These fundamental
challenges require highly experienced, senior-level individuals. These positions can be filled via in-
house resources or by consultants. If consultants are used to fill these roles, it is imperative that they
are highly skilled at knowledge transfer; in-house employees should be assigned to shadow
consultants for both of these roles.

To identify the obstacles and objections to a data warehouse that must be addressed to ensure
successful deployment

1.4 PILOTING THE DATA WAREHOUSE


Piloting the system is a crucial stage of a data warehouse and many organizations miss taking
advantages of the opportunities that a well run pilot can provide to the implementation of the data
warehouse. The pilot demonstration may be scheduled at the beginning of the data warehousing
project, often before any commitments to vendors of data warehousing components have been made.
The function of the pilot/Proof of Concept is to demonstrate a working subset of the initial data mart,
including the system infrastructure that can be subsequently expanded to become the initial
production data mart.

There are two basic reasons why an organization embarks on a pilot:


Technology Prototype: The technology is new to the organization and there is a need to understand
the benefits of the technology to see how it performs. In this case the organization should be able to
measure the cost/benefit that the technology may provide.
Business Prototype: The organization wants to know if a particular application of this technology to a
business problem is the right fit. In other words will this technology solve the business problem.

Objectives of piloting/POC include the following:

5
DATA WAREHOUSING: LESSONS LEARNT AND BEST PRACTICES

• Provide a test of the ability of an ETL tool vendor to support the specific requirements of the data
warehouse
• Provide a test of the ability of one or more end-user tool vendors to support the specific
requirements of the data warehouse
• Provide sufficient information to made a final selection of the ETL and end-user tool vendors
• Prove that the architecture and infrastructure recommended for the data warehouse meet the
functional requirements of the organization
• Demonstrate to managers and end users a working subset of the functionality to be delivered in
the first data mart

1.5 IMPLEMENTING BEST OF BREED SOLUTION


Due to the availability of so many products from different vendors, each promising better functionality,
organizations seeking Data Warehousing solutions typically go for best-of-the-breed solutions to
derive maximum benefit. Therefore, integrating a best of breed set of technologies into your data
warehousing infrastructure that can flexibly accommodate changing requirements can be extremely
challenging. The objective is to develop a workable set of technologies, processes, and standards that
help guide your decisions for both the initial building and deployment of the warehouse and its
subsequent set of BI applications.

However, while implementing a best-of-breed solution the following considerations should be taken:

• Pull together a multi-disciplinary team.

• Confirm the enterprise business vision.

• Establish enterprise data model.

• Select vendor(s) for technology and integration.

• Identify the scope of any interoperability standards, as well as the proprietary vendor extensions
to the standard.

• Design and develop extract, transform and load routines able to accommodate a wide range of
disparate source and target platforms.

• Design and develop the enterprise metadata management system.

• Design and develop the warehouse management and scheduling approach test to prove and
ensure that data that is moved through the architecture arrives at target systems in a timely and
predictable fashion while meeting integrity, consistency, completeness, and granularity
requirements.

• Use performance modeling and testing to ensure that load times meet availability requirements.

• Roll out the solution to users.

• Put in place the on-going procedures to keep the system up and running.

1.6 BUDGETING FOR A DATA WAREHOUSE

A data warehouse must be compiled using a variety of components glued together and is not
something that can be bought off the shelf as a unitized whole. This effort reduces product
cohesiveness and makes quantification of costs more challenging. Moreover, new products are

6
DATA WAREHOUSING: LESSONS LEARNT AND BEST PRACTICES

entering the market at an ever-faster pace and major vendors are putting new emphasis on data
warehousing. Data Warehousing budgets should thus include plans for the addition or substitution of
new products in the latter stages of implementation. Additionally, hardware upgrades must be planned
to account for the data warehouse's broadening deployment throughout the business units. Other
events affecting budgeting will occur, including:

• Historical data will quickly accumulate in the warehouse

• Subject areas will be added

• To maintain performance levels achieved during the initial rollout, memory, disk, and processors
will have to be added to the hardware platform that supports the data warehouse's database
management system (DBMS)

1.7 DATA WAREHOUSING STAFFING REQUIREMENTS

Including business analysts from the business units as key members of the data warehouse
implementation is essential. Most organizations would struggle if they tried to undertake the task with
IS staff resources alone. Business analysts understand what data the warehouse needs, and they can
assist with the deployment tasks that face the IS staff and help train end users in using the data
warehouse.

1.8 TESTING

Testing is an important process in the project and it spans through all the phases. The purpose of
performing testing is to assure that the work product is defect free. For this, it is required to build the
test cases that will satisfy all user requirements. These test cases need to be tracked/traced
throughout the development phases (requirements, design, and construction).

1.9 PERFORMANCE ISSUES AND TUNING

From end-user perspective, the main objective of the DW program is to ensure end-user requirements
in terms of publishing functional reports and providing the ad hoc query facility with accepted time
period for publishing. To successfully achieve this and ensuring that the user expectations are
realistic, it is necessary to define "performance". Once "performance" is defined for the project,
performance issues have to be identified.

Upon successful identification performance issues, benchmarking and projection of key performance
indicators need to be performed. Throughout the development process, performance issues have to
be monitored and tuned to ensure the desired performance.

You might also like