Introduction To Data Warehouse Using Cognos 8 BI

Created By : Gourav Atalkar Reviewed By: Amit Sharma Contact Point : bisp.consulting@gmail.com

Course Roadmap
• Data Warehousing - An Overview
• Data Warehouse Architecture
• Data Modeling for Data Warehousing
• Overview (OLAP)
• Multidimensional Analysis
  – Multidimensional Analysis Introduction
  – Operations in Multidimensional Analysis
  – Multidimensional Data Model
  – Multi-Dimensional vs. Relational

Objectives
• At the end of this lesson, you will know:
  – Why data warehousing is needed (scenarios)
  – What data warehousing is
  – The evolution of data warehousing
  – The need for data warehousing
  – OLTP vs. warehouse applications
  – Data marts vs. data warehouses
  – Data warehouse schemas
  – Reporting fundamentals

Business Scenario –I
You are a database administrator for a company called TBC: The FMCG Company. The company manufactures daily-needs products for sale to other businesses. The financial department wants to track, analyze, and forecast the sales revenue across geographic regions on a periodic basis for all products sold.
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Who are my customers and what products are they buying?
• Which are our lowest/highest margin customers?
• What impact will new products/services have on revenue and margins?
• Which customers are most likely to go to the competition?

(Diagram: the Delhi, Mumbai, Kolkata and Bhopal branches each supply data input to an OLAP server, from which the Sales Manager requests sales per product type per branch for the first quarter.)

Solution: I
Extract sales information from each database. Store the information in a common repository at a single site.
(Diagram: data input from the Delhi, Mumbai, Kolkata and Bhopal branches is loaded into the data warehouse. Query and analysis tools produce reports, and the data output reaches the Sales Manager via a business intelligence tool, e.g. Cognos, MSBI or Hyperion.)
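The consolidation step in Solution I can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the branch extracts, product types and amounts below are invented sample data, and the function names `consolidate` and `first_quarter_sales` are hypothetical, not part of any Cognos tooling.

```python
from collections import defaultdict

# Hypothetical extracts from each branch database: (product_type, month, amount).
branch_extracts = {
    "Delhi":   [("Soap", 1, 1200.0), ("Snacks", 2, 800.0), ("Soap", 4, 500.0)],
    "Mumbai":  [("Soap", 3, 950.0), ("Snacks", 1, 400.0)],
    "Kolkata": [("Snacks", 2, 300.0)],
    "Bhopal":  [("Soap", 1, 150.0), ("Soap", 2, 250.0)],
}

def consolidate(extracts):
    """Merge every branch extract into one common repository (a list of rows)."""
    repository = []
    for branch, rows in extracts.items():
        for product_type, month, amount in rows:
            repository.append({"branch": branch, "product_type": product_type,
                               "month": month, "amount": amount})
    return repository

def first_quarter_sales(repository):
    """Total sales per (branch, product type) for months 1 to 3."""
    totals = defaultdict(float)
    for row in repository:
        if 1 <= row["month"] <= 3:
            totals[(row["branch"], row["product_type"])] += row["amount"]
    return dict(totals)

warehouse = consolidate(branch_extracts)
q1 = first_quarter_sales(warehouse)
```

Once all rows sit in one repository, the Sales Manager's question becomes a single aggregation instead of four separate branch queries.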

Business Scenario –II

One Stop Shopping Super Market has a huge operational database. Whenever executives want a report, the OLTP system becomes slow and the data entry operators have to wait for some time.

(Diagram: two data entry operators and management share one operational database. While management runs a report, the operators wait.)

Solution: II
Extract the data needed for analysis from the operational database and store it in a warehouse. Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis. The warehouse will hold data with a historical perspective.

(Diagram: the data entry operators run transactions against the operational database. Data is extracted from it into the data warehouse, and management runs reports against the warehouse.)

Business Scenario –III
Cakes & Cookies is a small, new company. The president wants the company to grow and needs information in order to make correct decisions.

Solution: III
Improve the quality of data before loading it into the warehouse. Perform data cleaning and transformation before loading the data. Use query analysis tools to support ad-hoc queries.
(Diagram: data is improved before it enters the data warehouse. Query and analysis tools deliver the data output to the President via a business intelligence tool, e.g. Cognos, MSBI or Hyperion.)
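As a rough illustration of cleaning and transformation before loading, the sketch below normalizes product names and amounts and rejects unusable rows. The record layout, the sample rows and the function `clean_record` are assumptions made for this example only.

```python
def clean_record(raw):
    """Return a cleaned copy of one raw sales record, or None if it is unusable."""
    name = raw.get("product", "").strip()
    if not name:
        return None                      # reject rows with no product name
    try:
        # Accept numbers or strings such as "1,250.50".
        amount = float(str(raw.get("amount", "")).replace(",", ""))
    except ValueError:
        return None                      # reject rows whose amount is not numeric
    return {"product": name.title(), "amount": round(amount, 2)}

# Hypothetical raw rows as they might arrive from a source system.
raw_rows = [
    {"product": "  chocolate cake ", "amount": "1,250.50"},
    {"product": "COOKIES", "amount": 300},
    {"product": "", "amount": "75"},
    {"product": "Muffin", "amount": "n/a"},
]
clean_rows = [r for r in (clean_record(row) for row in raw_rows) if r is not None]
```

Only the first two rows survive. A real load process would usually log the rejected rows for review, which this sketch does not do.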

What is a Data Warehouse?
A single, complete and consistent store of data obtained from a variety of different sources and made available to end users in a way they can understand and use in a business context. It is also a process of transforming data into information and making it available to users in a timely enough manner to make a difference.

Characteristics of Data Warehouse
• A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.

Subject-oriented Characteristics of a Data Warehouse
(Diagram contrasting operational data with data warehouse subject areas. Labels shown: Leads, Inventory, Customers, Products, Quotes, Orders, Regions, Time.)

Integrated Characteristics of a Data Warehouse
• A data warehouse is constructed by integrating multiple heterogeneous sources.
• Data preprocessing is applied to ensure consistency.
(Diagram: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the data warehouse.)

Time Variant Characteristics of a Data Warehouse
Operational: current value data
• time horizon: 60-90 days
• key may not have an element of time
Data Warehouse: snapshot data
• time horizon: 5-10 years
• key has an element of time
• data warehouse stores historical data
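The difference between a key without a time element and a key with one can be sketched as follows. The product id, prices and dates are invented for illustration, and the two dictionaries merely stand in for an operational table and a warehouse table.

```python
from datetime import date

operational = {}   # key: product id -> current price only
warehouse = {}     # key: (product id, snapshot date) -> price on that date

def record_price(product_id, price, snapshot_date):
    operational[product_id] = price                   # overwrites the old value
    warehouse[(product_id, snapshot_date)] = price    # keeps every snapshot

record_price("P1", 10.0, date(2023, 1, 1))
record_price("P1", 12.0, date(2023, 6, 1))

# The warehouse can still answer "what was the price over time?"
history = sorted((d, p) for (pid, d), p in warehouse.items() if pid == "P1")
```

After both calls the operational store knows only the latest price, while the warehouse holds one entry per snapshot date.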

Non Volatile Characteristics of a Data Warehouse
(Diagram: the operational database accepts insert, change, delete and replace operations. The data warehouse is loaded and then accessed by select only.)

Data Warehouse Architecture
(Diagram: relational databases, ERP systems, purchased data and legacy data feed extraction and cleansing. An optimized loader populates the data warehouse engine, which is accompanied by a metadata repository and serves analysis and query tools.)

OLTP vs Data Warehouse
OLTP                    Warehouse (DSS)
Application oriented    Subject oriented
Used to run business    Used to analyze business
Detailed data           Summarized and refined
Current, up to date     Snapshot data
Isolated data           Integrated data
Repetitive access       Ad-hoc access
Clerical user           Knowledge user (manager)

Online Analytical Processing (OLAP)
OLAP is a category of software tools that provides analysis of data stored in a database. With OLAP, analysts, managers, and executives can gain insight into data through fast, consistent, interactive access to a wide variety of possible views.


What is an OLAP Cube?
The key requirement of OLAP is multidimensional analysis. OLAP achieves this multidimensional functionality by using a structure called a cube, which provides a multidimensional way to look at the data. The cube is comparable to a table in a relational database.
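One way to picture the relationship between a relational table and a cube is to aggregate flat rows into cells addressed by one member per dimension. The sketch below assumes three dimensions (product, city, quarter) and invented sales rows. Real OLAP servers store cubes very differently, so this is a mental model only.

```python
from collections import defaultdict

# Hypothetical relational rows: (product, city, quarter, sales amount).
rows = [
    ("Soap", "Delhi", "Q1", 100), ("Soap", "Delhi", "Q1", 50),
    ("Soap", "Mumbai", "Q1", 70), ("Snacks", "Delhi", "Q2", 30),
]

def build_cube(rows):
    """Aggregate rows into cells addressed by one member per dimension."""
    cube = defaultdict(int)
    for product, city, quarter, amount in rows:
        cube[(product, city, quarter)] += amount
    return dict(cube)

cube = build_cube(rows)
```

Each key of `cube` is a coordinate in the three-dimensional space and each value is the measure stored in that cell.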

Features of Cube Representation
Slicing: A slice is a subset of a multidimensional array corresponding to a single value for one or more members of the dimensions not in the subset.

Dicing: A related operation to slicing is dicing. Dicing defines a sub-cube of the original space by restricting two or more dimensions to chosen members. When every dimension is restricted to a single member, the data you see is that of one cell from the cube, which is the smallest available slice.
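Slicing and dicing can be sketched on a small cube held as a Python dictionary. The cube contents, the dimension names in `DIMS` and the helper functions `slice_cube` and `dice_cube` are assumptions for illustration, not an OLAP product API.

```python
# Hypothetical cube: (product, city, quarter) -> sales amount.
cube = {
    ("Soap", "Delhi", "Q1"): 150, ("Soap", "Mumbai", "Q1"): 70,
    ("Soap", "Delhi", "Q2"): 90, ("Snacks", "Delhi", "Q1"): 40,
    ("Snacks", "Mumbai", "Q2"): 60,
}
DIMS = ("product", "city", "quarter")

def slice_cube(cube, dim, member):
    """Fix one dimension to a single member and drop it from the cell keys."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == member}

def dice_cube(cube, **allowed):
    """Keep cells whose members fall inside the allowed set for each named dimension."""
    def keep(key):
        return all(key[DIMS.index(d)] in members for d, members in allowed.items())
    return {k: v for k, v in cube.items() if keep(k)}

q1_slice = slice_cube(cube, "quarter", "Q1")
sub_cube = dice_cube(cube, product={"Soap"}, city={"Delhi"})
one_cell = dice_cube(cube, product={"Soap"}, city={"Delhi"}, quarter={"Q1"})
```

The slice has one dimension fewer than the cube, while the dice keeps all three dimensions but fewer members. Restricting every dimension to one member leaves a single cell.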

Rotating: Rotating changes the dimensional orientation of the report from the cube data. For example, rotating may consist of swapping the rows and columns, or moving one of the row dimensions into the column dimension.
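Swapping rows and columns, the simplest form of rotation, might look like this on a two-dimensional report. The report values and the function name `rotate` are illustrative assumptions.

```python
# Hypothetical report: rows are products, columns are quarters.
report = {
    "Soap":   {"Q1": 220, "Q2": 90},
    "Snacks": {"Q1": 40, "Q2": 60},
}

def rotate(report):
    """Swap the row and column dimensions of a two-dimensional report."""
    rotated = {}
    for row_member, columns in report.items():
        for col_member, value in columns.items():
            rotated.setdefault(col_member, {})[row_member] = value
    return rotated

by_quarter = rotate(report)
```

The values are unchanged. Only the orientation differs, so rotating twice returns the original report.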

Dimension: A dimension represents descriptive categories of data such as time or location. In other words, dimensions are broad groupings of descriptive data about a major aspect of a business, such as dates, markets, or products.

Measure: The measures are the actual data values that occupy the cells as defined by the dimensions selected. Measures include facts or variables typically stored as numerical fields, which provide the focal point of investigation using OLAP. For instance, suppose you are a manufacturer of cellular phones. The question you want answered is how many xyz model cell phones (product dimension) a particular plant (location dimension) produced during the month of January 2003 (time dimension). The measure is the number of phones produced.

Data Warehouse Schema

• Star Schema
• Fact Constellation Schema
• Snowflake Schema

Fact:
Definition: Facts are numeric measurements (values) that represent a specific business activity. Facts are stored in a fact table, i.e. the center of the star schema. Facts commonly used in business data analysis are units, costs, prices and revenues.
Example: Sales figures are numeric measurements that represent product and/or service sales.

The fact table holds the measures, or facts. The measures are numeric and additive across some or all of the dimensions. For example, sales are numeric, and users can look at total sales for a product, category, or subcategory, and for any time period. The sales figures are valid no matter how the data is sliced. The central table in a star schema is called the fact table. It contains the facts and is connected to the dimensions.

A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key made up of all of its foreign keys. A fact table might contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often called summary tables instead). A fact table usually contains facts at the same level of aggregation.

Dimension
Definition: Dimensions are qualifying characteristics that provide additional perspective on a given fact.
Example: Sales might be compared by product, from region to region, and from one time period to the next. Here sales have product, location and time dimensions. Such dimensions are stored in dimension tables.

Dimension Table
Definition: The dimensions of the fact table are further described with dimension tables.
Fact table:
Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
Dimension tables:
Market (Market_Id, City, State, Region)
Product (Product_Id, Name, Category, Price)
Time (Time_Id, Week, Month, Quarter)
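The fact and dimension tables listed above can be tried out with Python's built-in `sqlite3` module. The table and column names follow the slide, while the column types, the sample rows and the choice of SQLite are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Market  (Market_Id INTEGER PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Product (Product_Id INTEGER PRIMARY KEY, Name TEXT, Category TEXT, Price REAL);
CREATE TABLE "Time"  (Time_Id INTEGER PRIMARY KEY, Week INTEGER, Month INTEGER, Quarter INTEGER);
CREATE TABLE Sales (
    Market_Id  INTEGER REFERENCES Market(Market_Id),
    Product_Id INTEGER REFERENCES Product(Product_Id),
    Time_Id    INTEGER REFERENCES "Time"(Time_Id),
    Sales_Amt  REAL,
    PRIMARY KEY (Market_Id, Product_Id, Time_Id)   -- composite key of foreign keys
);
-- Invented sample rows.
INSERT INTO Market  VALUES (1, 'Delhi', 'Delhi', 'North'), (2, 'Mumbai', 'Maharashtra', 'West');
INSERT INTO Product VALUES (1, 'Soap', 'Personal Care', 20.0), (2, 'Chips', 'Snacks', 10.0);
INSERT INTO "Time"  VALUES (1, 1, 1, 1), (2, 14, 4, 2);
INSERT INTO Sales   VALUES (1, 1, 1, 500.0), (1, 2, 1, 200.0), (2, 1, 1, 300.0), (2, 1, 2, 100.0);
""")

# One join per dimension used: total sales by region and quarter.
rows = conn.execute("""
    SELECT m.Region, t.Quarter, SUM(s.Sales_Amt)
    FROM Sales s
    JOIN Market m ON m.Market_Id = s.Market_Id
    JOIN "Time" t ON t.Time_Id = s.Time_Id
    GROUP BY m.Region, t.Quarter
    ORDER BY m.Region, t.Quarter
""").fetchall()
```

The query touches the fact table once and joins directly to each dimension it needs, which is the access pattern a star schema is designed for.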

What is Star Schema?
• Definition: Star schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema and contains one or more dimension tables and fact tables.
• It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star, with one fact table connected to multiple dimensions.
• The center of the star schema consists of a large fact table, which points towards the dimension tables.
• The advantages of a star schema are easy slicing, better query performance and easy understanding of the data.

Steps in designing Star Schema
• Identify a business process for analysis (like sales).
• Identify measures or facts (sales dollar).
• Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
• List the columns that describe each dimension (region name, branch name).
• Determine the lowest level of summary in a fact table (sales dollar).
In a star schema every dimension will have a primary key.

In a star schema, a dimension table does not have any parent table, whereas in a snowflake schema a dimension table has one or more parent tables. In a star schema the hierarchies for a dimension are stored in the dimension table itself, whereas in a snowflake schema the hierarchies are broken into separate tables. These hierarchies help to drill down the data from the topmost level to the lowermost level.

Star Schema Examples
The fact table provides sales statistics broken down by the product, period and store dimensions.
The dimension tables contain descriptions of the subjects of the business.

Each dimension table has a 1:N relationship with the fact table.

Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins.

Snowflake Schema
• Represents the dimensional hierarchy directly by normalizing the dimension tables.
• Easy to maintain.
• Saves storage, but it is said to reduce the effectiveness of browsing.
• A single, large, central fact table and one or more tables for each dimension.
• Dimension tables are normalized, i.e. dimension table data is split into additional tables.

Snowflake Schema Example
Sales Fact (Store_id, Product_id, Time_id, measure)
Store Dim. (Store_id, Store Name, Store Add., Region id)
Region Dim. (Region_id, City, State, Country)
Product Dim. (Product_id, Product Desc, Product Name, Product Line, Product Type)
Time Dim. (Time_id, Year, Quarter, Month)

Drawbacks: Time-consuming joins, slow report generation.
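The extra join that a snowflake schema introduces can be seen in a small `sqlite3` sketch. The table names follow the example above with underscores added, `Amount` is a stand-in for the unnamed measure, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Region_Dim (Region_id INTEGER PRIMARY KEY, City TEXT, State TEXT, Country TEXT);
CREATE TABLE Store_Dim  (Store_id INTEGER PRIMARY KEY, Store_Name TEXT,
                         Region_id INTEGER REFERENCES Region_Dim(Region_id));
CREATE TABLE Sales_Fact (Store_id INTEGER REFERENCES Store_Dim(Store_id),
                         Product_id INTEGER, Time_id INTEGER, Amount REAL);
-- Invented sample rows.
INSERT INTO Region_Dim VALUES (1, 'Pune', 'Maharashtra', 'India'), (2, 'Delhi', 'Delhi', 'India');
INSERT INTO Store_Dim  VALUES (10, 'Store A', 1), (11, 'Store B', 1), (12, 'Store C', 2);
INSERT INTO Sales_Fact VALUES (10, 1, 1, 100.0), (11, 1, 1, 50.0), (12, 2, 1, 75.0);
""")

# Reaching City takes two joins: fact -> store -> region.
rows = conn.execute("""
    SELECT r.City, SUM(f.Amount)
    FROM Sales_Fact f
    JOIN Store_Dim  s ON s.Store_id  = f.Store_id
    JOIN Region_Dim r ON r.Region_id = s.Region_id
    GROUP BY r.City ORDER BY r.City
""").fetchall()
```

In a star schema the city would be a column of the store dimension and the second join would disappear, which is the trade-off behind the drawback noted above.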

Fact Constellation
• Multiple fact tables share many dimension tables.
• Example: in the hotel industry, Booking and Checkout may share many dimension tables.
• This schema is viewed as a collection of stars, hence it is called a galaxy schema or fact constellation.
• Sophisticated applications require such a schema.

Fact Constellation Example
Sales Fact (Store Key, Product Key, Period Key, measure)
Shipping Fact (Shipper Key, Store Key, Product Key, Period Key, Price)
Store Dim. (Store Key, Store Name, Store Add., City)
Product Dim. (Product Desc, Product Name, Product Line, Product Type)
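A minimal sketch of two fact tables sharing one dimension, again using `sqlite3`. The column placement is an assumption based on the example, `Amount` stands in for the unnamed sales measure, the helper `total_by_city` is hypothetical, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store_Dim     (Store_Key INTEGER PRIMARY KEY, Store_Name TEXT, City TEXT);
CREATE TABLE Sales_Fact    (Store_Key INTEGER, Product_Key INTEGER, Period_Key INTEGER, Amount REAL);
CREATE TABLE Shipping_Fact (Shipper_Key INTEGER, Store_Key INTEGER, Product_Key INTEGER,
                            Period_Key INTEGER, Price REAL);
-- Invented sample rows.
INSERT INTO Store_Dim     VALUES (1, 'Store A', 'Pune'), (2, 'Store B', 'Delhi');
INSERT INTO Sales_Fact    VALUES (1, 1, 1, 400.0), (1, 2, 1, 100.0), (2, 1, 1, 250.0);
INSERT INTO Shipping_Fact VALUES (7, 1, 1, 1, 30.0), (7, 2, 1, 1, 20.0), (8, 2, 2, 1, 5.0);
""")

def total_by_city(fact_table, value_column):
    """Sum one fact table's measure per city through the shared store dimension."""
    # Table and column names come from the fixed calls below, not from user input.
    sql = (f"SELECT d.City, SUM(f.{value_column}) FROM {fact_table} f "
           "JOIN Store_Dim d ON d.Store_Key = f.Store_Key GROUP BY d.City ORDER BY d.City")
    return conn.execute(sql).fetchall()

sales_by_city = total_by_city("Sales_Fact", "Amount")
shipping_by_city = total_by_city("Shipping_Fact", "Price")
```

Because both fact tables reference the same store dimension, the two results line up city by city and can be compared directly.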

From the Data Warehouse to Data Marts
(Diagram: a progression from the data warehouse, which holds organizationally structured data with more history, normalized and detailed, through departmentally structured data, to individually structured information with less history.)

Reporting Fundamental Case Study
• DSS Books & Music is a new company that sells books, music and video items.
• Their products are sold in different regions of the world.
• They have sales units at Mumbai, Pune, Ahmedabad, Delhi and Baroda.
• The President of the company wants sales information.

Sales Measures & Dimensions
• Measures – Units sold, Amount.
• Dimensions – Product, Time, Region.

Sales Data Warehouse Tables
• Store Dimension Table
• Region Dimension Table
• Product Dimension Table
• Time Dimension Table
• Sales Fact Table

Sales Data Warehouse Model

Sales Information
• Details of the products whose sales amount is less than 50,000 rupees.
• Details of the top N stores by sales amount.
• Sales by store type, to determine which stores generate the most revenue and the highest sales volume.
• The contribution that each country makes to revenue.
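The four reports above can be sketched over a handful of fact rows. Everything in this example is assumed for illustration: the rows, the store types, the countries and the helper names `total_by` and `top_n_stores`. In the course itself such reports would be built with a reporting tool rather than hand-written code.

```python
from collections import defaultdict

# Hypothetical fact rows already joined to their dimensions.
sales = [
    {"product": "Novel", "store": "S1", "store_type": "Mall",
     "country": "India", "units": 40, "amount": 60000},
    {"product": "Album", "store": "S2", "store_type": "Street",
     "country": "India", "units": 10, "amount": 20000},
    {"product": "Movie", "store": "S3", "store_type": "Mall",
     "country": "Nepal", "units": 25, "amount": 45000},
    {"product": "Album", "store": "S1", "store_type": "Mall",
     "country": "India", "units": 5, "amount": 25000},
]

def total_by(rows, key, measure="amount"):
    """Sum one measure grouped by one dimension attribute."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)

# Report 1: products whose total sales amount is below 50,000 rupees.
by_product = total_by(sales, "product")
low_sellers = sorted(p for p, amt in by_product.items() if amt < 50000)

# Report 2: top N stores by sales amount.
def top_n_stores(rows, n):
    by_store = total_by(rows, "store")
    return sorted(by_store.items(), key=lambda item: item[1], reverse=True)[:n]

# Report 3: revenue and sales volume by store type.
revenue_by_type = total_by(sales, "store_type")
units_by_type = total_by(sales, "store_type", "units")

# Report 4: each country's percentage contribution to revenue.
by_country = total_by(sales, "country")
grand_total = sum(by_country.values())
contribution = {c: round(100 * amt / grand_total, 1) for c, amt in by_country.items()}
```

All four reports reuse the same grouping helper with a different dimension or measure, which mirrors how the reports all draw on one fact table.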

Questions

Thank You

Contact Us: bisp.consulting@gmail.com bispsolutions.wordpress.com learnhyperion.wordpress.com
