Fair Use Notice

The material used in this presentation (i.e., pictures, graphs, text, etc.) is solely intended for educational/teaching purposes. It is offered free of cost to the students for use under the special circumstances of online education due to the COVID-19 lockdown situation, and it may include copyrighted material, the use of which may not have been specifically authorised by the copyright owners. Its application constitutes fair use of any such copyrighted material as provided in the laws of many countries. The contents of the presentation are intended only for the attendees of the class being conducted by the presenter.
Data Warehousing (DWH)
Chapter # 1

Disclaimer: The contents of this presentation have been taken from multiple resources available on the internet, including books, notes, reports, websites and presentations.
Recommended Books
1. Paulraj Ponniah, “Data Warehousing Fundamentals”, John Wiley & Sons, latest edition.
2. Thomas Connolly, “Database Systems”, latest edition.
3. Jiawei Han & Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, latest edition.
4. “NCR Teradata University Program Guide”, latest edition.

Additional Resources:
- Research papers and articles
Credit hours
(3+0)

Pre-requisite
Database Systems

Assessment
Theory: 20% Sessional, 80% Written Semester Examination
(20% Mid, 60% Final)
Course Outline: Major Topics/Chapters
1. Introduction to Data Warehousing
2. Planning and Requirements
3. Logical & Physical Data Modeling
4. Denormalization
5. Dimensional Modeling
6. Data Extraction, Transformation and Loading
7. Online Analytical Processing (OLAP) Implementation Techniques
CLO | Description | Taxonomy level | PLO
1 | Explain the concepts, architectural design and implementation of a data warehouse | C3 | 1
2 | Interpret business requirements and provide their solutions | C6 | 2, 4
3 | Design and build a data warehouse to match the business requirements | C5 | 3

The course is designed so that students will achieve the following PLOs:

1 Engineering Knowledge

2 Problem Analysis

3 Design/Development of Solutions

4 Investigation
What is a Data Warehouse?
• The term data warehouse was introduced by William Inmon, known as the father of the data warehouse.
• The original concept was essentially a historical database containing tables derived from an active operational database.
What is a Data Warehouse?
• A large store of data accumulated from a wide range of sources within a company and used to guide management decisions.
• A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way that they can understand and use in a business context. [Barry Devlin]
• Data warehousing (DW) is the process of collecting and managing data from varied sources to provide meaningful business insights.
• A copy of transaction data specifically structured for query and analysis. [Ralph Kimball]
Problem: Heterogeneous Information Sources
“Heterogeneities are everywhere”
[Figure: heterogeneous sources, including personal databases, the World Wide Web, scientific databases and digital libraries]
• Different interfaces
• Different data representations
• Duplicate and inconsistent information
Goal: Unified Access to Data
[Figure: an integration system layered over the World Wide Web, personal databases, digital libraries and scientific databases]
• Collects and combines information
• Provides an integrated view and a uniform user interface
• Supports sharing
The Warehousing Approach
• Information is integrated in advance.
• It is stored in the warehouse (WH) for direct querying and analysis.
[Figure: clients query the data warehouse; the warehouse is fed by an integration system that maintains metadata; the integration system receives data from an extractor/monitor attached to each source]
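The warehousing approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the two source formats, the `Customer` record and the functions `extract_a`, `extract_b` and `integrate` are hypothetical. Each extractor converts its own source's representation into one common format, and the integration step merges the streams, drops duplicate customers and keeps simple metadata, so that clients can query the integrated store directly instead of the sources.

```python
from dataclasses import dataclass

# Hypothetical source records: each source uses its own representation.
SOURCE_A = [{"cust_id": "17", "name": "ALI KHAN", "city": "Lahore"}]
SOURCE_B = [("017", "Ali Khan", "LAHORE"), ("023", "Sara Ahmed", "Karachi")]


@dataclass(frozen=True)
class Customer:
    """Hypothetical common warehouse format for a customer."""
    customer_id: int
    name: str
    city: str


def extract_a(rows):
    # Extractor for source A: dict rows with upper-case names.
    for r in rows:
        yield Customer(int(r["cust_id"]), r["name"].title(), r["city"].title())


def extract_b(rows):
    # Extractor for source B: tuples with zero-padded ids and upper-case cities.
    for cid, name, city in rows:
        yield Customer(int(cid), name.title(), city.title())


def integrate(*streams):
    """Integration system: merge all sources and drop duplicate customers."""
    warehouse = {}
    metadata = {"sources": len(streams), "rows_seen": 0}
    for stream in streams:
        for cust in stream:
            metadata["rows_seen"] += 1
            warehouse.setdefault(cust.customer_id, cust)  # first copy wins
    return warehouse, metadata


warehouse, metadata = integrate(extract_a(SOURCE_A), extract_b(SOURCE_B))
# Clients query the integrated store directly, without touching the sources.
print(sorted(c.name for c in warehouse.values()))
print(metadata)
```

Both sources describe customer 17, once as `"ALI KHAN"` and once as `"017"`/`"LAHORE"`; after extraction the two records are identical, so the warehouse keeps a single copy.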
THE COMPELLING NEED FOR DATA WAREHOUSING
• As an information technology professional, you have worked on computer applications as an analyst, programmer, designer, or project manager.
• You have been involved in the design, implementation, and maintenance of systems, such as order processing, general inventory, and in-patient billing, that support day-to-day business operations.
• These systems gather, store, and process all the data needed to successfully perform the daily operations. They provide online information and produce a variety of reports to monitor and run the business.
• The operational computer systems did provide information to run the day-to-day operations, but what the executives needed was a different kind of information for making strategic decisions.
• Data warehousing is an increasingly important business intelligence tool.
• The operational systems, important as they were, could not provide strategic information.
• Data warehousing is a new paradigm specifically intended to provide vital strategic information.
• In the 1990s, organizations began to achieve competitive advantage by building data warehouse systems.
ESCALATING NEED FOR STRATEGIC INFORMATION
Who needs strategic information in an enterprise? What exactly do we mean by strategic information?
• The executives and managers who are responsible for keeping the enterprise competitive need information to make proper decisions.
• They need information to formulate business strategies, establish goals, set objectives, and monitor results. This type of information is strategic.
Examples
• Retain the present customer base
• Increase the customer base by 15% over the next 5 years
• Gain market share by 10% in the next 3 years
• Improve product quality levels in the top five product groups
• Enhance customer service level in shipments
• Bring three new products to market in 2 years
• Increase sales by 15% in the North East Division
So the ultimate need is:
“Drowning in data and starving for information”
“Knowledge is power and intelligence is absolute power”
The Need?
[Figure: pyramid rising from DATA through INFORMATION, KNOWLEDGE and INTELLIGENCE to POWER ($) at the top]
Strategic information characteristics
The Information Crisis
• Just think of all the various computer applications in a large-scale company.
• Think of all the databases and the quantities of data that support the operations of a company.
• How many years’ worth of customer data is saved and available? How many years’ worth of financial data is kept in storage? Ten years? Fifteen years?
• Where is all this data? On one platform? In legacy systems? In
client/server applications?
Two startling facts:
(1) organizations have lots of data;
(2) information technology resources and systems are not effective
at turning all that data into useful strategic information.

• Lots and lots of information exists. Why then do we talk about an information crisis?
• Most companies face an information crisis not because of a lack of data, but because the available data is not readily usable for strategic decision making.
• Data needed for strategic decision making must be in a format suitable for analyzing trends.
• Executives and managers need to look at trends over time and steer their companies in the proper direction.
• The tons of available operational data cannot be readily used to spot trends. Operational data is event-driven: you get snapshots of transactions that happen at specific times.
• In the operational systems, you do not readily have the trends of a single product over the period of a month, a quarter, or a year.
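The gap between event-driven operational data and the trends executives need can be shown with a short Python sketch. The sale events and the `monthly_trend` helper below are hypothetical; the point is only that individual transactions must be rolled up by period before the trend of a single product becomes visible.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational snapshots: one row per sale event.
transactions = [
    (date(2023, 1, 5), "Widget", 120.0),
    (date(2023, 1, 20), "Widget", 80.0),
    (date(2023, 2, 3), "Widget", 150.0),
    (date(2023, 2, 14), "Gadget", 60.0),
    (date(2023, 3, 9), "Widget", 90.0),
]


def monthly_trend(rows, product):
    """Roll individual sale events up into (year, month) totals for one product."""
    totals = defaultdict(float)
    for day, prod, amount in rows:
        if prod == product:
            totals[(day.year, day.month)] += amount
    # Sort by period so the result reads as a time series.
    return dict(sorted(totals.items()))


print(monthly_trend(transactions, "Widget"))
# → {(2023, 1): 200.0, (2023, 2): 150.0, (2023, 3): 90.0}
```

A warehouse typically stores such period-level summaries (or the detail needed to compute them quickly) for many years, which is what makes trend analysis practical.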
• Data warehousing is a solution to the ‘data glut, knowledge scarcity’ problem; it is essentially a kind of decision-support system.
• The historical data in the warehouse plays an important role in providing business intelligence.
Why a Data Warehouse?

• Because businesses want much more:

• What happened?
• Why it happened?
• What will happen?
• What is happening?
• What do you want to happen?
Technology Trends
• The entire spectrum of computing has undergone tremendous
changes.
• There is explosive growth in technology day by day.
• Three critical areas:
• Computing technology
• Human/Machine interface
• Processing options
FAILURES OF PAST DECISION-SUPPORT SYSTEMS
• Consider the following scenario:
• The marketing department in your company has been concerned about the performance of the West Coast Region, and the sales numbers in this month's report are drastically low.
• The marketing vice president is agitated and wants reports from the IT department to analyze the performance over the past two years, product by product, compared to monthly targets.
• He wants to make quick strategic decisions to rectify the situation.
• No regular report from any system gives the marketing department what it wants. You have to gather the data from multiple applications and start from scratch.
• You may have to go to several applications, perhaps running on different platforms in your company environment, to get the information (ad hoc reports).
• What happens next?
• The marketing department likes the ad hoc reports you have produced. But now they would like reports in a different form, containing more information.
• After the second round, they find that the contents of the reports are still not exactly what they wanted. They may also find inconsistencies among the data obtained from different applications.
• This chain continues!
• Most of these attempts by IT in the past ended in failure.
History of Decision-Support Systems
• Ad Hoc Reports
• IT would write special programs, typically one for each request.
• Special Extract Programs
• IT anticipated, to some extent, the types of reports that would be requested from time to time.
• Small Applications
• The users could stipulate the parameters for each special report.
History of Decision-Support Systems
• Information Centers
• The information center typically was a place where users could go to request
ad hoc reports.
• Decision-Support Systems
• The systems were menu-driven and provided online information and also the
ability to print special reports.
• Executive Information Systems
• This was an attempt to bring strategic information to the executive desktop.
FAILURES OF PAST DECISION-SUPPORT SYSTEMS
• IT receives too many ad hoc requests, resulting in a large overload.
• Requests are not only too numerous, they also keep changing all the time.
• The users have to depend on IT to provide the information.
• The information environment ideally suited for strategic decision making has to be very flexible and conducive to analysis.
Operational versus Decision-Support Systems
• What is a basic reason for the failure of all the previous attempts by IT?
• The reason for the inability to provide strategic information is that we have been trying to provide strategic information from the operational systems.
Operational versus Decision-Support Systems
How are they Different?
• Making the wheels of Business Turn
• Take an order
• Process a claim
• Make a shipment
• Generate an invoice
• Receive cash
• Reserve an airline seat.
How are they Different?
• Watching the wheels of Business Turn
• Show me the top-selling products.
• Show me the problem regions.
• Tell me why (drill down)
• Show me the highest margins
• Alert me when a district sells below target
Different scope, different purposes!
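The contrast between making the wheels of business turn and watching them turn can be sketched with Python's built-in `sqlite3` module. The tables, districts and targets are hypothetical, and a single in-memory database stands in for what would normally be separate operational and warehouse systems: `take_order` performs one small write per business event, while the two analytical queries read and aggregate many rows to answer "Show me the top-selling products" and "Alert me when a district sells below target".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: raw sales plus a per-district target table.
conn.executescript("""
CREATE TABLE sales (district TEXT, product TEXT, amount REAL);
CREATE TABLE targets (district TEXT PRIMARY KEY, target REAL);
INSERT INTO targets VALUES ('North', 500), ('South', 300);
""")


def take_order(district, product, amount):
    """Operational work: one small write per business event."""
    with conn:  # commits this order as its own transaction
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", (district, product, amount))


for order in [("North", "Widget", 250), ("North", "Gadget", 100),
              ("South", "Widget", 400), ("South", "Gizmo", 50)]:
    take_order(*order)

# Decision support: read-only queries that scan and aggregate many rows.
top_products = conn.execute(
    "SELECT product, SUM(amount) AS total FROM sales "
    "GROUP BY product ORDER BY total DESC"
).fetchall()

below_target = conn.execute(
    "SELECT s.district, SUM(s.amount) AS total, t.target "
    "FROM sales s JOIN targets t ON s.district = t.district "
    "GROUP BY s.district, t.target HAVING SUM(s.amount) < t.target"
).fetchall()

print(top_products)  # → [('Widget', 650.0), ('Gadget', 100.0), ('Gizmo', 50.0)]
print(below_target)  # → [('North', 350.0, 500.0)]
```

In practice the analytical queries would run against the warehouse rather than the operational database, so that heavy reads do not slow down order taking.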
How are they Different?
• Different purposes
• Different scope
• Data content is different
• Data usage patterns are different
• Data access types are different.
How are they Different?
Data Warehousing: The Only Viable Solution
• A new type of system environment:
• Database designed for analytical tasks
• Data from multiple applications
• Easy to use
• Read-intensive data usage
• Content updated periodically and stable
• Includes current and historical data
Data Warehousing: The Only Viable Solution
• Ability for users to run queries
• Ability for users to initiate reports
Business Intelligence at the Data Warehouse
Data Warehouse Defined
• It is an informational environment that:
• Provides an integrated and total view of the enterprise.
• Makes current and historical information easily available.
• Makes decision-support transactions possible without hindering operational systems.
• Renders the organization’s information consistent.
• Presents an interactive source of strategic information.
An Environment, not a Product
• A data warehouse is a computing environment where users can find strategic information.
• Flexible and interactive.
• It is 100% user-driven.
• Responsive to the ask-answer-ask-again pattern.
• Provides the ability to discover answers to complex and unpredictable questions.
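The ask-answer-ask-again pattern, together with the "Tell me why (drill down)" request listed earlier, can be illustrated with a small Python sketch. The fact rows, the region > district > product hierarchy and the `rollup` helper are hypothetical; each answer suggests the next, narrower question.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, district, product, sales).
FACTS = [
    ("West", "Seattle", "Widget", 40), ("West", "Seattle", "Gadget", 10),
    ("West", "Portland", "Widget", 15), ("East", "Boston", "Widget", 90),
    ("East", "Boston", "Gadget", 70),
]
LEVELS = ("region", "district", "product")


def rollup(facts, level, **filters):
    """Total sales grouped by one hierarchy level, restricted to the given filters."""
    idx = LEVELS.index(level)
    totals = defaultdict(int)
    for row in facts:
        if all(row[LEVELS.index(k)] == v for k, v in filters.items()):
            totals[row[idx]] += row[-1]  # last column is the sales amount
    return dict(totals)


# Ask: which region is weak?
print(rollup(FACTS, "region"))  # → {'West': 65, 'East': 160}
# The answer prompts the next question: drill down into West by district...
print(rollup(FACTS, "district", region="West"))  # → {'Seattle': 50, 'Portland': 15}
# ...and again by product within the weakest district.
print(rollup(FACTS, "product", region="West", district="Portland"))  # → {'Widget': 15}
```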
A Blend of Many Technologies
Data Warehousing Tools: A Practical Approach
• The following are the functions of data warehouse tools and utilities:
• Data Extraction - Involves gathering data from multiple heterogeneous sources.
• Data Cleaning - Involves finding and correcting errors in the data.
• Data Transformation - Involves converting data from legacy format to warehouse format.
• Data Loading - Involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.
• Refreshing - Involves propagating updates from the data sources to the warehouse.
• Note: Data cleaning and data transformation are important steps in improving the quality of data and of data mining results.
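The tool functions listed above can be chained into a toy pipeline using only Python's standard library. This is a minimal sketch under stated assumptions rather than how SSIS, Teradata or any other product implements ETL: the legacy CSV layout, the `orders` table and the helper names are hypothetical; cleaning is reduced to trimming whitespace and dropping rows without an amount; loading only builds one index (no sorting, summarizing or partitioning); and refreshing is modelled as re-running the load with SQLite's `INSERT OR IGNORE`, so orders that are already loaded are skipped.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical legacy extract: DD/MM/YYYY dates, stray spaces, a missing amount.
LEGACY_CSV = """order_id,order_date,amount
1001,05/01/2023, 120.50
1002,07/01/2023,
1003,09/01/2023,80
"""


def extract(text):
    """Data extraction: read raw rows from a source."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Data cleaning: strip whitespace and drop rows with a missing amount."""
    cleaned = []
    for r in rows:
        r = {k: v.strip() for k, v in r.items()}
        if r["amount"]:
            cleaned.append(r)
    return cleaned


def transform(rows):
    """Data transformation: legacy DD/MM/YYYY dates to ISO dates, text to numbers."""
    return [
        (int(r["order_id"]),
         datetime.strptime(r["order_date"], "%d/%m/%Y").date().isoformat(),
         float(r["amount"]))
        for r in rows
    ]


def load(conn, facts):
    """Data loading: create the table and an index, then insert the rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_orders_date ON orders(order_date)")
    # INSERT OR IGNORE skips rows whose primary key is already present.
    conn.executemany("INSERT OR IGNORE INTO orders VALUES (?, ?, ?)", facts)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract(LEGACY_CSV))))

# Refreshing: a later extract with one repeated order and one new order.
NEW_CSV = """order_id,order_date,amount
1003,09/01/2023,80
1004,12/01/2023,45
"""
load(conn, transform(clean(extract(NEW_CSV))))

print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# → [(1001, '2023-01-05', 120.5), (1003, '2023-01-09', 80.0), (1004, '2023-01-12', 45.0)]
```

Order 1002 never reaches the warehouse because cleaning drops it, and order 1003 is not duplicated by the refresh.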
Teradata
• Teradata offers a full suite of services focused on data warehousing.
• Teradata is a massively parallel open processing system for developing large-scale data warehousing applications.
• Teradata is an open system. It can run on Unix, Linux and Windows server platforms.
• It supports multiple data warehouse operations for different clients at the same time.
• The system is built on an open architecture, so whenever faster devices become available, they can be incorporated into the already-built architecture.
• Teradata supports 50+ petabytes of data.
Microsoft SQL Server Integration Services
• SQL Server Integration Services (SSIS) is a data warehousing tool used to perform ETL operations, i.e. extracting, transforming and loading data. It also includes a rich set of built-in tasks.
• Organizations that already use Microsoft SQL Server for their database needs will likely find SSIS adequate to meet their needs. It integrates easily with other Microsoft products and offers data quality and master data management features as well as data integration capabilities.
• SSIS is a platform for building enterprise-level data integration and data transformation solutions.
• It can extract data from relational databases and data warehouses, XML files, flat files and other sources, transform it, and load it into other applications.
• Design and Development Environment: Graphical environment
• Download link: https://www.microsoft.com/en-us/download/details.aspx?id=39931
Amazon Redshift
• Amazon Redshift is a data warehouse product that is a key part of Amazon Web Services (AWS), a widely used cloud computing platform.
• Redshift is a fast, fully managed data warehouse that analyzes data using standard SQL and existing BI tools. It is a simple and cost-effective tool that runs complex analytical queries using query optimization features.

IBM InfoSphere
• IBM InfoSphere is an ETL tool that uses graphical notations to design and execute data integration activities.
• It provides all the major building blocks of data integration and data warehousing, along with data management and governance.
• Other commonly used tools are Panoply, Talend, Numetic, Hyperion (Oracle), Oracle 12c, etc.

• Companies have many data warehouse tools to choose from. This, in turn, stresses the importance of properly analyzing organizational requirements and needs before picking a tool.
CHAPTER SUMMARY
• Companies are desperate for strategic information to face competition, extend market share and improve profitability.
• In spite of tons of accumulated data, enterprises were facing an information crisis.
• All the past attempts by IT were simply failures.
CHAPTER SUMMARY
• Informational systems are different from OLTP systems.
• We need a new type of computing environment to provide strategic information.
• Data warehousing is the only viable solution.


Chapter Review
Terms:
1. Information crisis
2. Strategic information
3. Operational systems
4. Information center
5. Data warehouse
6. Order processing
7. Executive information system
8. Data staging area
9. Extract programs
10. Information technology
Descriptions:
A. OLTP application
B. Produce ad hoc reports
C. Explosive growth
D. Despite lots of data
E. Data cleaned and transformed
F. Users go to get information
G. Used for decision making
H. Environment, not product
I. For day-to-day operations
J. Simple, easy to use
