You are on page 1of 35

Sales Data Warehouse

Advisor : Dr. Irwin Levinstein


Presented By : Kalyan Yadavalli
Contents

 Mission of the project


 Need For a Data Warehouse
 Data Warehouse - Overview
 Sales Data Warehouse
 Conclusion
Mission of the project

 The mission of this project is to provide strategic and


tactical support to the Marketing-Sales and Advertising
departments of a media company through the acquisition
and analysis of data pertaining to their customers and
markets.

 This project helps to identify areas of readership and


marketing through creation of a Data Warehouse that will
provide a media company with a better understanding of
its customers and markets.
Need For a Data Warehouse
 To provide an environment where a relatively small amount of knowledge of
the technical aspects of database technology is required to write and
maintain queries and reports .
 To provide a means to speed up the writing and maintaining of queries and
reports by technical personnel.

For example, a query that requests the total sales income and quantity sold
for a range of products in a specific geographical region for a specific time
period can typically be answered in a few seconds or less regardless of how
many hundreds of millions of rows of data are stored in the data warehouse
database.

 To make it easier, on a regular basis, to query and report data from multiple
transaction processing systems ,external data sources for querying or
reporting purposes.
 To prevent persons who only need to query and report transaction
processing system data from having any access whatsoever to transaction
processing system databases and logic used to maintain those databases.
Data Warehouse – Overview

 A data warehouse is a copy of data combined from different data sources


specifically structured for querying and reporting.
 Data warehouses support business decisions by collecting, consolidating,
and organizing data for reporting and analysis with tools such as online
analytical processing (OLAP) and data mining.

 Dimensional Modeling VS Entity-Relationship Modeling

An OLTP system requires a normalized structure to minimize redundancy,


provide validation of input data, and support a high volume of fast
transactions. A transaction usually involves a single business event, such as
placing an order or posting an invoice payment. An OLTP model often looks
like a spider web of hundreds or even thousands of related tables.
In contrast, a typical dimensional model uses a star design that is easy to
understand and relate to business needs, supports simplified business
queries, and provides superior query performance by minimizing table joins.
Dimensional Modeling VS Entity-Relationship Modeling

 This project uses Dimensional modeling, which is the name of the


logical design technique often used for data warehouses. It is
different from entity-relationship modeling.

 Entity relationship modeling is a logical design technique that seeks


to eliminate data redundancy while Dimensional modeling seeks to
present data in a standard framework that is intuitive and allows for
high-performance access.

 For example, a query that requests the total sales income and
quantity sold for a range of products in a specific geographical
region for a specific time period can typically be answered in a few
seconds or less regardless of how many hundreds of millions of
rows of data are stored in the data warehouse database.
Entity–Relationship Modeling

Customer Demographics

CustomerSubscriptions Salesperson

Payment SalesConditions Channel

Campaign Offer Carrier Master Carrier History


Campaign History
District

Zones
City
Dimensional Modeling

Dimensions Fact Table Dimensions

Subscription Sales
Customer Date
EffectiveDateKey
CustomerKey
SubscriptionsKey
Payment PaymentKey Subscriptions
CampaignKey
SalesPersonKey
RouteKey
Demographics Key
Campaign UnitsSold Salesperson
DollarsSold
DiscountCost
PremiumCost
Route Demographics
Kimball- Dimensional life cycle diagram

Technical Product
Architecture Selection &
Design Installation

Business
Data Staging Maintenance
Project Dimensional Physical
Requirement Design & Deployment and
Planning Modeling Design
Development Growth
Definition

End-User End-User
Application Application
Specification Development

Project Management
Sales Data Warehouse
 Business Users Requirements
 Technical Architecture
 Product [ Software] Selection
 Dimensional Modeling
 Logical Design
 Data Staging Design & Development
 Building Data Cube using SQL Analysis Services
 End User Application Specification & Development
 Deployment
 Maintenance & Growth
Requirements Gathering
This phase involves the following steps:
 Collect some business questions the users want an answer for.
 Gather details/requirements from the business users
 Get user sign off on the business questions.

Business Questions:
 Can we profile our "best subscribers" to pull lists of "like" non-subscribers
that we could touch in some way?
 Who exists in the marketplace and have we touched them?
 Can we build a loyalty model based on a subscriber's payment history?
Sales Data Warehouse High Level Technical Architecture

Source Systems Data Staging Area Presentation Area Data Access Tools

Services Subscription
Transform from Sales
source to Dimensional. SQL
Marketing/Sales Atomic and Reporting
Data Extract target.
Maintain summary Services
conformed data.
dimensions. Business Excel
process. Access
Load
Data Storage Design Access
Demographics Goals
Extract Flat files or
relational tables Ease-of-use.
Query
Design Goals performance.
Staging
Name Phone throughput. Dimensional Bus:
Data Extract Integrity and Conformed facts
consistency. and dimensions
Product Selection

Hardware Specs:
 AMD Opteron Processor 252
 2.6 GHz, 3.83 GB RAM
 Operating System: Windows Server 2003

Software Specs:
 Kimball Data Warehouse Tool [Create staging and production databases]
 Microsoft ® SQL Server™ 2000 [ETL { Extract Transform Load} ]
 Microsoft ® SQL Server™ 2005 Integration Services [ Nightly Automation]
 Microsoft ® SQL Server™ 2005 Analysis Services [ Create OLAP Data
Cube]
 Microsoft ® SQL Server™ 2005 Reporting Services[ End User Reports]
 Internet Information Services [ IIS 6.0] [ Web Server to Host the Reports]
Dimensional Modeling

 Design Dimensions
 Attributes of the dimension
 Hierarchy in the dimension
 Dimensional Bus Matrix
 Design Fact Tables
Dimension Hierarchy-Subscriptions

Subscription
Name

Rate Service Term Publication

Frequency Term Length Business


Rate Year Rate Area Rate Type
Groups Groups Group

Discount Short or Long


Category Term
Dimensional Bus Matrix

Dimensions Date Demographics Sales Salesperson Campaign


Conditio ……
ns
Business
Processes Subscription X X X X X
Sales
( starts)
Subscription X X X X
Tracking
Complaints X X X
Stops X X X
Upgrades X X X X X
And
Downgrades
SubscriptionSales Fact Table
Design Fact Tables
•Choose the Business Process as the fact EffectiveDateKey
table CampaignKey
•Declare the grain SalespersonKey
•Choose the dimensions CustomerKey
•Choose the facts DempgraphicsKey
SubscriptionKey
………………..
Grain: Each subscription sold
Facts
Units Sold
Number of Sales (=1)
Dollars Sold
Discount Cost
Premium Cost
Dimension Model: SubscriptionSales

Date
Sales
Customer
Conditions

Subscription Sales

Grain: Each Loyalty-


Campaign subscription sold Payment
Measures
Subscr.
Units Sold
Sales
Number of Sales (=1)
(starts)
Dollars Sold
Discount Cost
Premium Cost
Demographics Subscription

Address Salesperson

Route
Logical Design
 Fact Table Design

 Dimension Table Design

 Slowly Changing Dimensions


 Type 1: The new record replaces the original record. No trace of the old
record exists.
 Type 2: A new record is added into the customer dimension table.
Therefore, the customer is treated essentially as two people.
Fact Table -SubscriptionSales
Column Names Data N
Type U
L
L Ke
? y? FK TO Dimension Description
EffectiveDateKe
y int N FK DimDate Key of effective date
EnteredDateKey Key of date entered in
int N FK DimDate the system
CustomerKey int N FK DimCustomer Key of customer
LoyaltyKey int N FK DimLoyalty Key of loyalty score
PaymentKey int N FK DimPaymentHistory Key of payment behavior
SalesPersonKey Key of sales person for
int N FK DimSalesPerson the change
CampaignKey int N FK DimCampaign Key of campaign
SalesConditions
Key int N FK DimSalesConditions Key of sales conditions
SubscriptionKey int N FK DimSubscription Key of subscription
PersonKey Key of person on the
int N FK dimPerson subscription
Dimension Table - CustomerSubscriptions
Column Names Data N Slowly
Type U changin
L g
L FK TO dimensio
? Key? Dimension Description n type

CustomerSubscriptionK
ey int N PK ID Surrogate Primary Key

Business key of the subscription summary


AddressNum int Y BK record address

Business key of the numbered subscription at


SubscriptionNum int Y BK ` the address

BusinessKey int Y BK Concatenated business key

CustomerID int N Unique identifier for this customer 1

BillingMethod int The method of delivery for the customers bill 2

The earliest start date on record for this


OriginalStartDateKey int FK DimDate customer 1

The start date of this customer’s current


StartDateKey int FK DimDate subscription 2

StopDateKey int FK DimDate The most recent stop date for this customer 2

The expiration date of this customer’s current


ExpireDateKey int FK DimDate or most recent subscription 2
Data Staging
The following are the sub processes of Data Staging process.

Extracting : Reading and understanding the source data, copying the parts
that are needed to the staging area.
Transforming: Possible transformation steps in the data staging area
 Cleaning the data – correct misspellings, deal with missing data elements,
parsing into standard formats.
 Purging selected data which is not required
 Combining data sources, by matching exactly on key values or performing fuzzy
matches on non-key attributes.
 Creating surrogate keys for each dimensional record.
 Building aggregates to boost performance of common queries.
Loading – Loading the transformed data into the production database.
Data Staging -ETL Architecture
Kimball Data Modeling
Source Schema Source Queries tool
Creates the tables for Creates views (queries) Creates the staging and
the common source that feed data to production database tables
database production and metadata.

Common Source Staging Database Production Data


Database Warehouse Database

Setup files on SQL server


that is running the DTS
packages.

DTS
Data Transformation
Services Package
Database Dates Configuration
Configuration File File
Database connection Important date info for
information for the ETL the ETL process
process
Data Staging –Development for Sales Data Warehouse

The ETL packages perform the following work.


 Extract the full sets of dimension rows
 Most transformation logic occurs in the extract query, using SQL
 Extracted rows are stored in a staging table until the ETL package is run.
 There are steps for the staged rows to be fixed up, via SQL statements.
There’s a statement for deleting bogus rows, and a separate statement for
updating rows.
 Find rows that are new; insert them into the target table.
 Use a checksum to find rows that have seen a Type 1 change. Update the
appropriate columns in the target table.
 Use a checksum to find rows that have seen a Type 2 change. Propagate a
new dimension row.
 Log the number of rows extracted, staged, deleted and updated from the
staging tables, inserted into target, Type 1 and Type 2 rows updated in
target.
Dimension-Customer Subscription
Building Data Cube using SQL
Analysis Services

 SQL Server Analysis Services 2005 provides tools for developing OLAP applications

 OLAP [ Online Anlytical Processing ] organizes data warehouse data into


multidimensional cubes based on the dimensional model, and then preprocesses
these cubes to provide maximum performance for queries that summarize data in
various ways.

 Build the cube using SQL Analysis Services and deploy it to SQL Analysis Services
Server.
End User Application Development

 Reporting Services - Uses the Analysis


Services Data Cube as Data Source.
 Sample Report Screenshots
Solicitor Sales
Sales Type Sales Sales Agent Number Of Cost Per Retention
Channel Sales Unit
Carrier
Sales
Carrier
Sales
Carrier Sales 2 $0.00 50.0%

DM Sales

DM Sales

DM Sales 124 $0.13 13.7%

Non-
Solicited
Other

2005 THISISHAMPRDS 8 $0.00 12.5%


FREE2WKSAMP
AD CONTRACT START 8 $0.17 62.5%

ALLCONNECT 1,914 $0.04 43.5%

CAN'T AFFORD DM "2 WKS 7 $0.10 71.4%


FREE"
COLLECTIONS TEAM 15 $0.14 73.3%

COOLSAVINGS.COM 9,415 $21.55 7.1%

CUSTOMER SERVICE 1,934 $0.15 73.7%


PROMO STARTS
DATA ENTRY STARTS 49 $0.14 59.2%
Sales Agent Details
Sales Source Phone Address St City ZI # Units $ Discou Premiu
Agent Name P Sale Sold Sold nt Cost m Cost
s
AD ULTIMATE (757) 13412 BENNS VA SMITHF 234 1 260 $30.9 $37.91 $5.00
CONTRACT TAN OF 365-9400 CHURCH BLVD IELD 30 9
START SMITHFIELD
AD CHOREY & (757) 330 W VA SUFFOL 234 1 260 $30.9 $37.91 $5.00
CONTRACT ASSOCIATE 539-7451 CONSTANCE RD # K 34 9
START 100
AD CHOREY AND (757) 804 W VA SUFFOL 234 1 260 $30.9 $37.91 $5.00
CONTRACT ASSOCIATE 539-7454 WASHINGTON ST K 34 9
START
AD VIRGINIA 0 254 GRANBY ST VA NORFO 235 1 260 $30.9 $37.91 $5.00
CONTRACT STAGE CO LK 10 9
START
AD SPINE & 0 6160 KEMPSVILLE VA NORFO 235 1 260 $30.9 $37.91 $5.00
CONTRACT ORTHAPEDIC CIR # 303A LK 02 9
START CTR, PC
AD COUNTRYWID 0 3000 WOODLAWN VA SUFFOL 234 1 260 $30.9 $37.91 $5.00
CONTRACT E HOME LOAN DR K 34 9
START
AD WHITE, E.D. 0 730 10TH ST VA VIRGINI 234 1 260 $30.9 $37.91 $5.00
CONTRACT A 51 9
START BEACH
AD PERMANENT (757) 434 N MAIN ST # VA SUFFOL 234 1 260 $30.9 $37.91 $5.00
CONTRACT COATING 539-4366 D K 34 9
START SOLUTIONS
IN
8 2,080 $247. $303.26 $40.00
94
Deployment
 Deploy the Reports to SQL Reporting
Services 2005 server.
 Give Access to the users to view the
reports.
 Desktop Installation – Dot Net Framework
2.0 For access to Report Builder.
Maintenance & Growth
 Training the End Users.
 Automated Nightly Updates to Data
Warehouse.
Conclusion
The reports generated from the data warehouse answered the following questions
collected form the business users during the requirement gathering phase of the
project.

 Identify their best customers/loyal customers [ customer subscriptions


/subscription sales]
 Non-subscribers who can be reached
 Contact history of customers in market place [ Demographic data]

Benefits to Marketing
 Increased telemarketing close rates and increased direct mail response rates
 Reduced cost and use of outside telemarketing services and reduced print and
mailing costs
 Identification of new product bundling and distribution opportunities
 Increased acquisition and retention rates, and reduced cost of acquisitions
Benefits to Advertising
 An increase in the annual rate of revenue growth.
 Increase in new advertisers
 Improved targeting capabilities
.
QUESTIONS ?