
Data Warehousing and Data Mining Unit 2

Unit 2 Planning and Requirements


Structure:
2.1 Introduction
Objectives
2.2 Key Issues in Planning a Data Warehouse
2.3 Planning and Project Management in Data Warehouse Construction
2.4 Data Warehouse Project
Data Warehouse Development Life Cycle
Kimball Lifecycle Diagram
Requirement Gathering Approaches
2.5 Summary
2.6 Terminal Questions
2.7 Answers

2.1 Introduction
A Data Warehouse is a technology that abstracts and analyzes useful data
to help companies make better business decisions. It is
becoming the core technology of the Business Intelligence field. Requirements
are essential ingredients for developing Data Warehouse systems.
Usually, Project Managers or Leads pay the most attention to requirements. This
unit is designed for all IT professionals irrespective of their roles in Data
Warehousing projects. It will show you how best you can fit into your specific
role in a project. If you want to be part of a team that is passionate about
building a successful Data Warehouse, you need the details presented in
this unit.
Note: Developers have to gather requirements with analysis in
mind.
Objectives:
After studying this unit, you should be able to:
describe the importance of Project Planning and Requirements
Gathering
discuss Data Warehouse development strategies and development Life
cycle approaches
highlight the importance of both the generalized lifecycle and the Kimball
lifecycle and their sequences

Sikkim Manipal University Page No.: 11



2.2 Key Issues in Planning a Data Warehouse


More than any other factor, improper planning and inadequate project
management tend to result in failures. First and foremost, determine if your
company really needs a Data Warehouse. Is it really ready for one? You
need to develop criteria for assessing the value expected from your Data
Warehouse. Your company has to decide on the type of Data Warehouse to
be built and where to keep it. You have to ascertain where the data is going
to come from and even whether you have all the needed data. You have to
establish who will be using the Data Warehouse, how they will use it, and at
what times.
We will discuss the various issues related to the proper planning of a Data
Warehouse.
You will learn how a Data Warehouse project differs from the types of
projects in which you were involved in the past. We will study the guidelines for
making your Data Warehouse projects a success.
Key Issues during data warehouse construction
Planning for your Data Warehouse begins with a thorough consideration of
the key issues. Answers to the key questions are vital for the proper
planning and the successful completion of the project. Therefore, let us
consider the pertinent issues, one by one.
Values and Expectations. Some companies jump into Data
Warehousing without assessing the value to be derived from their
proposed Data Warehouse. Of course, first you have to be sure that,
given the culture and the current requirements of your company, a Data
Warehouse is the most viable solution. Only after you have established the
suitability of this solution can you begin to enumerate the
benefits and value propositions.
Risk Assessment. Planners generally associate project risks with the
cost of the project. If the project fails, how much money will go down the
drain? But the assessment of risks is more than calculating the loss from
the project costs. What are the risks faced by the company without the
benefits derivable from a Data Warehouse? What losses are likely to be
incurred? What opportunities are likely to be missed?


Differences between OLTP and Data Warehouse projects


The Data Warehouse and the OLTP database are both relational
databases. However, the objectives of these two databases are different.
The OLTP database records transactions in real time and aims to automate
the clerical data entry processes of a business entity. Addition, modification
and deletion of data in the OLTP database are essential, and the semantics of
the application used in the front end affect the organization of
the data in the database.
The Data Warehouse, on the other hand, does not cater to the real time
operational requirements of the enterprise. It is more a storehouse of current
and historical data and may also contain data extracted from external data
sources.
However, the Data Warehouse supports OLTP systems by providing a place
for the latter to offload data as it accumulates and by providing services that
would otherwise degrade the performance of the OLTP database.
The primary differences between the Data Warehouse database and the OLTP
database are given in Table 2.1.
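Table 2.1: Data Warehouse vs. OLTP Databases

Data Warehouse Database                | OLTP Database
---------------------------------------+---------------------------------------
Designed for analysis of business      | Designed for real time business
measures by categories and attributes  | operations
---------------------------------------+---------------------------------------
Optimized for bulk loads and large,    | Optimized for a common set of
complex, unpredictable queries that    | transactions, usually adding or
access many rows per table             | retrieving a small set of rows at a
                                       | time per table
---------------------------------------+---------------------------------------
Loaded with consistent, valid data     | Optimized for validation of incoming
and requires no real time validation   | data during transactions and uses
                                       | validation data tables
---------------------------------------+---------------------------------------
Supports limited users, particularly   | Supports thousands of concurrent users
data analyzers (decision makers)       |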
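
The contrast in access patterns between the two kinds of database can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module and a hypothetical sales table; the table name, columns and values are assumptions made for illustration only and do not represent a real warehouse design.

```python
import sqlite3

# Hypothetical in-memory table of sales transactions (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)"
)
rows = [
    ("North", "Pen", 10.0),
    ("North", "Book", 25.0),
    ("South", "Pen", 12.0),
    ("South", "Book", 30.0),
]

# OLTP-style work: small transactions that add or fetch a few rows at a time.
with conn:
    conn.executemany(
        "INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)", rows
    )
one_sale = conn.execute(
    "SELECT region, amount FROM sales WHERE id = ?", (2,)
).fetchone()

# Warehouse-style work: one query reads many rows and summarizes a
# business measure (amount) by a category (region).
totals_by_region = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
)

print(one_sale)          # ('North', 25.0)
print(totals_by_region)  # {'North': 35.0, 'South': 42.0}
```

The first query touches a single row identified by its key, while the second scans the whole table, which is the kind of workload the left column of the table describes.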

Data Warehouse Implementation Strategy


Top-Down and Bottom-Up
In Unit 1, we discussed the top-down and bottom-up approaches for
building a data warehouse. The top-down approach starts with the
enterprise-wide data warehouse, although it may be built iteratively.
Data from this large enterprise-wide data warehouse then flows
into departmental and subject data marts. The
bottom-up approach, on the other hand, starts by building individual data marts, one by
one. The integration of these data marts makes up the Enterprise
Data Warehouse. We looked at the pros and cons of the two methods.
We also discussed a practical approach of going bottom-up while making
sure that the individual data marts conform to one another so that
they can be viewed as a whole. For this practical approach to be
successful, you first have to plan and define requirements at the overall
corporate level.
Build or Buy. This is a major issue for all organizations. No one builds a
Data Warehouse totally from scratch by in-house programming. There is
no need to reinvent the wheel every time. A wide and rich range of
third-party tools and solutions is available.
If you want to build the Data Warehouse using in-house development, a
lot of coding and maintenance is required. In particular, metadata
maintenance (DWH schema) becomes difficult. In addition, you
have to write in-house programs for data extraction, data transformation,
and loading the Data Warehouse storage.
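
To give a sense of what such in-house programs involve, here is a minimal sketch of the three steps (extraction, transformation, loading). The source data, the fact_orders table and the cleaning rules are all hypothetical, chosen only to keep the example self-contained; a real project would need far more code, which is the point of the paragraph above.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from an
# operational system rather than an inline string.
SOURCE_CSV = """order_id,customer,amount
1, alice ,100.50
2,BOB,
3,carol,75.25
"""


def extract(text):
    """Read raw source records as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Standardize customer names and drop records with a missing amount."""
    cleaned = []
    for rec in records:
        amount = rec["amount"].strip()
        if not amount:
            continue  # reject incomplete rows before they reach the warehouse
        cleaned.append(
            (int(rec["order_id"]), rec["customer"].strip().title(), float(amount))
        )
    return cleaned


def load(conn, rows):
    """Bulk-load the cleaned rows into the (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
loaded = warehouse.execute(
    "SELECT * FROM fact_orders ORDER BY order_id"
).fetchall()
print(loaded)  # [(1, 'Alice', 100.5), (3, 'Carol', 75.25)]
```

Every rule in transform() and every column in the target table has to be written and maintained by the in-house team, which is why the text points to third-party tools.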
Single Vendor or Best-of-Breed. Vendors come in a variety of
categories. There are multiple vendors and products catering to the
many functions of the Data Warehouse.
So what are the options? How should you decide?
The two major options are:
1) Use the products of a single vendor
2) Use products from more than one vendor, selecting appropriate tools
Planning your Data Warehouse using the single-vendor approach provides:
High level of integration among the tools
Consistent look and feel
Seamless cooperation among components
Centrally managed information exchange
Overall price negotiable (non technical)


This approach will naturally enable your Data Warehouse to be well
integrated and to function coherently. However, only a few vendors, such as
IBM, SAS and NCR, offer fully integrated solutions. The alternative is the
best-of-breed solution, which combines products from multiple vendors so that
you can select the most appropriate tool for each function.
With the best-of-breed approach, compatibility among the tools from
different vendors could become a serious problem. If you are taking this
route, make sure the selected tools are proven to be compatible. In this
case, the staying power of individual vendors is crucial. Also, you will have less
bargaining power with regard to individual products and may incur higher
overall expense. The multi-vendor approach is not advisable if
your environment is not heavily technical.
Business Requirements, Not Technology
Let business requirements drive your Data Warehouse, not technology.
Although this seems obvious, you would not believe how many Data
Warehouse projects grossly violate this maxim. Many Data Warehouse
developers are interested in putting pretty pictures on the users' screens and
pay little attention to the real requirements. They like to build snappy
systems exploiting the depths of technology and demonstrate their prowess
in harnessing the power of technology.
Note:
Data warehousing is not about technology; it is about solving the users'
need for strategic information.
Do not plan to build the Data Warehouse before understanding the
requirements. Start by focusing on what information is needed and not
on how to provide the information. Do not emphasize the tools.
The basic structure and the architecture to support the user requirements
are more important. So before making the overall plan, conduct a
preliminary survey of requirements. What types of information must you
gather in the preliminary survey? At a minimum, obtain general information
on the following from each group of users:
Mission and functions of each user group
Computer systems used by the group
Key performance indicators
Factors affecting success of the user group
Who the customers are and how they are classified
Types of data tracked for the customers, individually and in groups
Products manufactured or sold
Categorization of products and services
Locations where business is conducted
Levels at which profits are measured: per customer, per product, per
district
Levels of cost details and revenue
Current queries and reports for strategic information
As part of the preliminary survey, include a source system audit. Even at
this stage, you must have a fairly good idea from where the data is going to
be extracted for the Data Warehouse. Review the architecture of the source
systems. Find out about the relationships among the data structures. What
is the quality of the data? What documentation is available? What are the
possible mechanisms for extracting the data from the source systems? Your
overall plan must contain information about the source systems.
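
One part of a source system audit, checking the quality of the data, can be sketched in a few lines. The sample rows, the column names and the choice of checks (missing values and duplicate keys) are assumptions for illustration; a real audit would cover many more aspects, such as documentation and extraction mechanisms.

```python
from collections import Counter

# Hypothetical sample pulled from a source system's customer table.
source_rows = [
    {"customer_id": 1, "district": "East", "email": "a@example.com"},
    {"customer_id": 2, "district": None, "email": "b@example.com"},
    {"customer_id": 2, "district": "West", "email": None},
    {"customer_id": 4, "district": "East", "email": None},
]


def profile(rows, key_column):
    """Summarize missing values per column and list duplicated key values."""
    columns = rows[0].keys()
    missing = {col: sum(1 for r in rows if r[col] is None) for col in columns}
    key_counts = Counter(r[key_column] for r in rows)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1)
    return {
        "row_count": len(rows),
        "missing": missing,
        "duplicate_keys": duplicates,
    }


report = profile(source_rows, "customer_id")
print(report)
```

A report of this kind gives the planning team an early, rough answer to the question "What is the quality of the data?" before any detailed design begins.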
Self Assessment Questions
1. Data Warehouse contains data for ______________ purpose.
2. Data Warehouse is a store house of _______________ data.
3. In most organizations, two groups of people are key to the success of
the project, ______________________ and _________________.
4. OLTP systems are designed for __________________.
5. Data Warehouses do not require real-time validation (True / False)

2.3 Planning and Project Management in Data Warehouse Construction
The overall plan
The seed for a data warehousing initiative gets sown in many ways. The
initiative may get ignited simply because the competition has a Data
Warehouse. Different stakeholders may have different opinions about Data
Warehouse construction, so reaching a clear decision is crucial
here. The Data Warehouse plan discusses the type of Data Warehouse and
enumerates the expectations. This is not a detailed project plan. It is an
overall plan to lay the foundation, to recognize the need, and to authorize a
formal project.


2.4 Data Warehouse Project


As an IT professional, you have worked on application projects before. You
know what goes on in these projects and are aware of the methods needed
to build the applications from planning through implementation. You have
been part of the analysis, the design, the programming, or the testing
phases. If you have functioned as a project manager or a team leader, you
know how projects are monitored and controlled. A project is a project. If
you have seen one IT project, have you not seen them all?
The answer is not a simple yes or no. Data Warehouse projects are
different from projects that build transaction processing systems. If you
are new to Data Warehousing, your first Data Warehouse project will reveal
the major differences. We will discuss these differences and also consider
ways to react to them. We will also ask a basic question about the readiness
of the IT and user departments to launch a Data Warehouse project.
What about the traditional system development life cycle (SDLC) approach?
Can we apply this approach to Data Warehouse projects as well? If so, what
are the development phases in the life cycle?
2.4.1 Data Warehouse Development Life Cycle
The Data Warehouse development life cycle covers two vital areas, as
depicted in Fig. 2.1. One is warehouse management and the other
is data management. The former deals with defining the project activities
and gathering requirements, whereas the latter deals with modeling and
designing the Warehouse (see Fig. 2.2).


Figure 2.1: Life Cycle steps of a DWH (SDLC)
[Flowchart: Life Cycle of Data Warehouse Development. Steps shown: Define the Project, Gather Requirements, Model the Warehouse, Validate the Model, Design the Warehouse, Validate the Design, Implementation.]

Managing the Project


Managing the Data Warehouse project is an ongoing activity. It is not like
a traditional systems project. The Data Warehouse is concerned with the
execution of the warehousing process and the data.
Defining the Project
The process of defining the project typically involves the following questions:
What do I want to analyze?
Why do I want to analyze it?
What if I do not do this?
How do I get this?
Software personnel should get answers to these questions; only then can
the requirements that must be addressed be understood.
Requirements Gathering
Transaction Processing Systems focus on automating the process, making
it faster and more efficient. This, in turn, means that the requirements for
transactional systems are specific and more directed towards business
process automation.
In contrast, the Data Warehousing environment focuses on facilitating the
analysis that will change the process to make it more effective.
Common questions and information required during requirements gathering:
Who is of interest to the user?
What is the user trying to analyze?
Why does the user need data?
When does the data need to be recovered?
Where do relevant processes occur?
How do we measure the performance?
2.4.2 Kimball Lifecycle Diagram

Figure 2.2: Kimball Lifecycle Diagram

Ralph Kimball is known worldwide as an innovator, writer, educator, speaker
and consultant in the field of Data Warehousing. Kimball proposed a life
cycle approach for the development of a Data Warehouse, and his lifecycle
strategy has since become an industry standard. The Kimball life
cycle describes the general flow of a DWH implementation, identifies task
sequencing and highlights activities that should happen concurrently.
The Dimensional Modeling and ETL tasks shown in the above diagram (Fig. 2.2) will be
discussed in the subsequent chapters.


Project Planning
o Scope, definition and understanding the business requirements
o Task Identification
o Scheduling
o Resource Planning
o Workload Assignment
o The end document represents a blueprint of the project.
Program/Project Management
o Enforces the project plan
o Status monitoring
o Issue tracking
o Development of a comprehensive communication plan that
addresses both the business and IT units
Business Requirements Definition
o Success of the project depends on a solid understanding of the
business requirements.
o Understanding the key factors driving the business is crucial for
successful translation of the business requirements into design
considerations
What follows the business requirements definition?
Three concurrent tracks follow, focusing on:
Technology (Technical Architecture)
Data (Dimensional Modeling, Physical Design and ETL)
Business Intelligence Applications
Arrows in the diagram indicate the activity workflow along each of the
parallel tracks. Dependencies between the tasks are illustrated by the
vertical alignment of the task boxes.
Deployment
It is crucial that adequate planning is performed to make sure that the
results of the technology, data, and BI application tracks are tested and fit
together properly. Deployment should be deferred if any of the pieces, such
as training, documentation, and validated data, is not ready for
production release.


Maintenance
This occurs when the system is in production. It includes technical
operational tasks that are necessary to keep the system performing
optimally. Some of the technical tasks are listed below:
Usage Monitoring
Performance Tuning
Index Maintenance
System Backup
Ongoing support, education, and communication with business users
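
The task sequencing and concurrent tracks of the Kimball lifecycle described above can be sketched as a dependency graph. The task names below follow the text, but the exact dependency edges are an assumption made for this example, since the diagram itself is not reproduced here. The sketch uses the graphlib module from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
# Task names follow the text; the dependency edges are assumed for illustration.
dependencies = {
    "Project Planning": set(),
    "Business Requirements Definition": {"Project Planning"},
    "Technical Architecture": {"Business Requirements Definition"},
    "Dimensional Modeling": {"Business Requirements Definition"},
    "Physical Design": {"Dimensional Modeling"},
    "ETL": {"Physical Design"},
    "Business Intelligence Applications": {"Business Requirements Definition"},
    "Deployment": {
        "Technical Architecture",
        "ETL",
        "Business Intelligence Applications",
    },
    "Maintenance": {"Deployment"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

# Group tasks into stages; tasks in the same stage can proceed concurrently.
stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    stages.append(ready)
    sorter.done(*ready)

for number, tasks in enumerate(stages, start=1):
    print(number, tasks)
```

Under these assumed edges, the third stage contains one task from each of the three tracks, which mirrors the statement that the technology, data and BI application tracks run concurrently after the business requirements are defined.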
2.4.3 Requirement Gathering Approaches
There are two widely used methods for deriving business requirements:
Source Driven Requirements Gathering
User Driven Requirements Gathering
Source Driven Requirements Gathering
This process defines the requirements by using the
source data in production transactional systems. This is done by analyzing the E-R
model of the source data or the actual physical record layout
and selecting the data elements deemed to be of interest.
User Driven Requirements Gathering
This process defines the requirements by conducting
interviews and discussions with users about their business needs and
by investigating the functions they perform.
It is recommended to follow the user-driven approach and to
break down the project into manageable pieces. Here, each
piece is a subject area. The requirements are gathered for each
subject area.
Note: Details about the subject area mentioned in the above paragraph will
be given in subsequent chapters.
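
As a rough illustration of gathering requirements per subject area, the six common questions listed in section 2.4.1 can be recorded in a simple structure. The class name, field names and the sample "Sales" answers are hypothetical and serve only to show the idea of tracking which questions remain unanswered for each subject area.

```python
from dataclasses import dataclass


@dataclass
class SubjectAreaRequirements:
    """Answers collected from user interviews for one subject area."""

    subject_area: str
    who: str = ""    # who is of interest to the user
    what: str = ""   # what the user is trying to analyze
    why: str = ""    # why the user needs the data
    when: str = ""   # the "when" question from the list in section 2.4.1
    where: str = ""  # where the relevant processes occur
    how: str = ""    # how performance is measured

    def open_questions(self):
        """Return the questions that still have no answer."""
        names = ("who", "what", "why", "when", "where", "how")
        return [name for name in names if not getattr(self, name)]


# Hypothetical interview notes for a "Sales" subject area.
sales = SubjectAreaRequirements(
    subject_area="Sales",
    who="Customers, grouped by district",
    what="Revenue per product",
    how="Monthly profit per customer",
)
print(sales.open_questions())  # ['why', 'when', 'where']
```

Keeping one such record per subject area makes it easy to see which interviews are still incomplete before modeling starts.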
Self Assessment Questions
6. In most organizations, two groups of people are key to the success of
the project, ______________________ and _________________.
7. In Data Warehouse, the requirements are gathered subject area wise.
(True / False)


2.5 Summary
Requirements gathering for a Data Warehouse follows a different strategy
from that of an OLTP system.
An OLTP system collects data for transaction recording purposes,
whereas a Data Warehouse collects data for analysis purposes.
Analysis can be sales analysis, mortality analysis, trend analysis,
etc.
OLTP systems support predefined reports, whereas a Data Warehouse
supports ad-hoc reports.
There are two widely used methods for deriving business requirements:
source-driven requirements gathering and user-driven requirements
gathering.
A Data Warehouse can be implemented using either the top-down or the
bottom-up development methodology. This decision always depends upon the
business requirements.
Like conventional (OLTP) projects, Data Warehouse projects also
follow the SDLC approach.
As in conventional projects, there are certain roles and responsibilities in
Data Warehouse development. Roles can be Executive Sponsor,
Business Analyst, Testing, and Infrastructure Specialist Coordinator, etc.

2.6 Terminal Questions


1. What are Data Warehouse requirements? How do you gather the
requirements?
2. Explain the Data Warehouse Kimball life cycle.
3. Differentiate between Data Warehouse requirements approach and
OLTP systems approach.
4. Explain any five responsibilities and roles in the development of Data
Warehouse.
5. What are the maintenance issues in Data Warehouse?

2.7 Answers
Self Assessment Questions
1. Analysis
2. Historical
3. Senior Management and Working Management
4. Real-time business operations
5. True
6. Senior Management and Working Management
7. True
Terminal Questions
1. The Data Warehouse is a relational database required for business
analysis. Refer to Section 2.2.
2. The Kimball life cycle describes the general flow of a DWH implementation.
Refer to Section 2.4.
3. OLTP systems are designed for real time business operations. Refer to Table 2.1.
4. Refer to Section 2.5 (Roles).
5. Refer to Section 2.4 (Maintenance).
