
Database, Data Warehouse, Data Mining - Competing on Analytics

Dr. Parijat Upadhyay

Competing on Analytics

Maintaining competitive advantage has become difficult

Competitive strategies that are employed today involve optimization of key business processes

Serving the most profitable customers

Optimize supply chains to minimize inventory


Proactive management through accurate predictions

Process optimization vs. "take it as it comes"

However, process optimization requires data and extensive analysis of that data

Business Value of Analytics


Customers or consumers
Supply chain
    Wal-Mart
Financial performance and cost management
Research and new product/service development
Strategic planning
Human resources

Database - What is it?

A structured collection of data
A data-centered mirror of an organization's business processes

Structure of data reflects organizational processes

Content of data reflects the organization's history

A database is designed to store data useful for daily operation (operational data)
Systems are designed to handle transactions (insertion, update, and deletion of data): OLTP systems
Systems maintain integrity and consistency of data

Database - An Example

SONY World, selling electronic items

Various items are sold through various branches
Customers buy different items in different quantities
Employees work in various branches
Employees earn commissions for purchases

Represent the Real World as Data

Entity

A person, place, or thing on which we maintain information
Examples: Employees, Customers, Products, Warehouses

Attribute

A characteristic or quality of a particular entity
Examples: Employee's PAN Card No., Customer's phone number, Product's unit price, Warehouse's address

Relationships among Entities

Examples: Customer orders Product; Order serviced by Employee

SONY World Example

Various Items sold through various Branches

Customers Buy different items in different Quantities


Employees work in various branches

Employees earn commissions for Purchases

Objects of Interest
Item

Branch

Customer

Employee

Limitations of a Spreadsheet

Things become complicated when we want to keep track of several related entities, for example:

Customers
Products
Orders

An Order is essentially a relationship between one Customer and one or more Products

A database is for easy retrieval of information that aids decision making


Who are the top 10 customers in 2002 based on total order value? Can you do it in MS Excel?
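
In a database, the question above becomes a single query. The sketch below uses Python's built-in sqlite3 module. The table and column names (customers, products, orders, order_lines) and all sample rows are hypothetical, chosen only to illustrate the idea; they are not part of the SONY World example.

```python
import sqlite3

def top_customers(conn, year, limit=10):
    """Return (customer name, total order value) pairs, highest value first."""
    sql = """
        SELECT c.name, SUM(l.qty * p.unit_price) AS total_value
        FROM customers c
        JOIN orders o      ON o.cust_id = c.cust_id
        JOIN order_lines l ON l.order_id = o.order_id
        JOIN products p    ON p.product_id = l.product_id
        WHERE strftime('%Y', o.order_date) = ?
        GROUP BY c.cust_id, c.name
        ORDER BY total_value DESC
        LIMIT ?
    """
    return conn.execute(sql, (str(year), limit)).fetchall()

def build_demo_db():
    """Create a tiny in-memory database with made-up rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, unit_price INTEGER);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, order_date TEXT);
        CREATE TABLE order_lines (order_id INTEGER, product_id INTEGER, qty INTEGER);

        INSERT INTO customers VALUES (1, 'Customer A'), (2, 'Customer B');
        INSERT INTO products VALUES (1, 'Product X', 100), (2, 'Product Y', 250);
        INSERT INTO orders VALUES
            (10, 1, '2002-03-14'), (11, 2, '2002-07-01'), (12, 2, '2003-01-05');
        INSERT INTO order_lines VALUES (10, 1, 2), (10, 2, 1), (11, 1, 1), (12, 2, 4);
    """)
    return conn

if __name__ == "__main__":
    for name, total in top_customers(build_demo_db(), 2002):
        print(name, total)
```

An Order here is exactly the relationship described above: one row in orders points to one customer, and its rows in order_lines point to one or more products. A spreadsheet has no direct way to express that link.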

SONY World Example


Employee Data

EmpId  EmpName  Category      Designation
1      Rahul    Computer      Manager
2      Sachin   Home Theatre  Sales Rep

Customer Data

CustId  CustName     CustAddress
1       M.K. Saxena  Bistupur
2       K Gupta      C.H. Area (East)

Item Data

ItemId  ItemName      ItemPrice
1       LCD TV        25000
2       Home Theatre  19000

Entities have attributes: key attributes and other attributes

SONY World Example


Data must be stored about relationships between entities


M.K. Saxena purchased two units of a high-res TV

Sachin sold 1 unit of a multi-disc CD player and 3 units of 50-CD racks

Rahul is the manager of the branch at Bistupur

Relationships also have attributes and key attributes

Entity Relationship Diagram

[Diagram: Customer -- Purchases -- Item]

Entities and Relationships Are Represented As Tables

CUSTOMER(cust_id, cust_name, cust_address)
ITEM(item_id, item_name, item_type, item_price)
Purchases(cust_id, cust_name, cust_address, item_id, item_name, item_type, item_price, qty, date, time, trans_id)
Transaction(cust_id, cust_name, cust_address, item_id, item_name, item_type, item_price, qty, date, time, trans_id)
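
One possible way to declare these tables, sketched with Python's built-in sqlite3 module. Several choices here are assumptions rather than anything the slides state: the column types, the primary and foreign keys, and the normalized layout in which the purchases and transactions tables carry only identifying columns plus qty, date, and time, so that customer and item details are stored once. The names transactions, trans_date, and trans_time are also renamings, used to stay clear of SQL keywords.

```python
import sqlite3

# Assumed column types and keys; the slides give only the column names.
SCHEMA = """
CREATE TABLE customer (
    cust_id      TEXT PRIMARY KEY,
    cust_name    TEXT NOT NULL,
    cust_address TEXT
);
CREATE TABLE item (
    item_id    TEXT PRIMARY KEY,
    item_name  TEXT NOT NULL,
    item_type  TEXT,
    item_price INTEGER
);
CREATE TABLE transactions (
    trans_id   TEXT PRIMARY KEY,
    cust_id    TEXT NOT NULL REFERENCES customer(cust_id),
    trans_date TEXT,
    trans_time TEXT
);
CREATE TABLE purchases (
    trans_id TEXT NOT NULL REFERENCES transactions(trans_id),
    item_id  TEXT NOT NULL REFERENCES item(item_id),
    qty      INTEGER NOT NULL,
    PRIMARY KEY (trans_id, item_id)
);
"""

def create_database(path=":memory:"):
    """Create the four tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn

def table_columns(conn, table):
    """List the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

if __name__ == "__main__":
    conn = create_database()
    for table in ("customer", "item", "transactions", "purchases"):
        print(table, table_columns(conn, table))
```

Entities (customer, item) become tables keyed by their key attribute; the relationship (purchases) becomes a table keyed by the combination of the entities it links, with qty as the relationship's own attribute.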

SONY WORLD Database


Customer

CustId  CustName     CustAddress
C01     M.K. Saxena  Bistupur
C02     K Gupta      C.H. Area (East)

Item

ItemId  ItemName      ItemPrice
I01     LCD TV        25000
I02     Home Theatre  19000

Purchases

TransId  ItemId  Qty
T100     I01     1
T100     I02     1
T100     I03     2
T200     I01     1
T300     I03     4

Transaction

TransId  CustId  Date    Time
T100     C01     1/8/09  19:00
T200     C02     5/9/09  13:00
T300     C02     7/9/09  18:00

SQL (Structured Query Language)


The database must be created
Data must be inserted, modified and deleted
Data must be retrieved
SQL makes a database management system successful/popular

Queries
1. List all customers

2. List all items

3. List all items that have been purchased in a quantity greater than 1 in any transaction.

4. List all transactions made after 5/9/2009, arranged in reverse order of date

5. How many customers are there?
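
Queries 1 to 5 could be written as below; this is a sketch run through Python's built-in sqlite3 module against the SONY WORLD sample rows. Assumptions: the dates are read as day/month/year and stored as ISO text so that they compare correctly, and the table name transactions and the column names trans_date and trans_time are renamings. Item I03 appears in the purchase rows but has no row in the item table shown, so query 3 returns item ids only.

```python
import sqlite3

def build_sony_world():
    """Load the SONY WORLD sample rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (cust_id TEXT PRIMARY KEY, cust_name TEXT, cust_address TEXT);
        CREATE TABLE item (item_id TEXT PRIMARY KEY, item_name TEXT, item_price INTEGER);
        CREATE TABLE transactions (trans_id TEXT PRIMARY KEY, cust_id TEXT,
                                   trans_date TEXT, trans_time TEXT);
        CREATE TABLE purchases (trans_id TEXT, item_id TEXT, qty INTEGER);

        INSERT INTO customer VALUES
            ('C01', 'M.K. Saxena', 'Bistupur'),
            ('C02', 'K Gupta', 'C.H. Area (East)');
        INSERT INTO item VALUES
            ('I01', 'LCD TV', 25000),
            ('I02', 'Home Theatre', 19000);
        -- Dates assumed to be day/month/year, stored as ISO text.
        INSERT INTO transactions VALUES
            ('T100', 'C01', '2009-08-01', '19:00'),
            ('T200', 'C02', '2009-09-05', '13:00'),
            ('T300', 'C02', '2009-09-07', '18:00');
        INSERT INTO purchases VALUES
            ('T100', 'I01', 1), ('T100', 'I02', 1), ('T100', 'I03', 2),
            ('T200', 'I01', 1), ('T300', 'I03', 4);
    """)
    return conn

QUERIES = {
    # 1. List all customers
    1: "SELECT cust_id, cust_name, cust_address FROM customer ORDER BY cust_id",
    # 2. List all items
    2: "SELECT item_id, item_name, item_price FROM item ORDER BY item_id",
    # 3. Items purchased in a quantity greater than 1 in any transaction
    3: "SELECT DISTINCT item_id FROM purchases WHERE qty > 1",
    # 4. Transactions made after 5/9/2009, latest first
    4: """SELECT trans_id, cust_id, trans_date FROM transactions
          WHERE trans_date > '2009-09-05' ORDER BY trans_date DESC""",
    # 5. How many customers are there?
    5: "SELECT COUNT(*) FROM customer",
}

def run(conn, number):
    """Run one of the numbered queries and return all result rows."""
    return conn.execute(QUERIES[number]).fetchall()

if __name__ == "__main__":
    conn = build_sony_world()
    for number in sorted(QUERIES):
        print(number, run(conn, number))
```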

Queries
6. For every transaction, find out the number of items purchased

7. How many times is every item purchased?

8. Find out the total quantity of item I01 purchased

9. List the items and the corresponding quantity purchased by customer "C01".

10. List the custids who have purchased items in quantity > 1.
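
Queries 6 to 10 need grouping and joins. The sketch below runs them through Python's built-in sqlite3 module against the SONY WORLD purchase and transaction rows. The table name transactions is a renaming, and query 6 is read two ways (distinct items per transaction and total units), since the wording allows both.

```python
import sqlite3

def build_sales_tables():
    """Load the SONY WORLD transaction and purchase rows into memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE transactions (trans_id TEXT PRIMARY KEY, cust_id TEXT);
        CREATE TABLE purchases (trans_id TEXT, item_id TEXT, qty INTEGER);
        INSERT INTO transactions VALUES ('T100', 'C01'), ('T200', 'C02'), ('T300', 'C02');
        INSERT INTO purchases VALUES
            ('T100', 'I01', 1), ('T100', 'I02', 1), ('T100', 'I03', 2),
            ('T200', 'I01', 1), ('T300', 'I03', 4);
    """)
    return conn

QUERIES = {
    # 6. Per transaction: number of distinct items, and total units
    6: """SELECT trans_id, COUNT(*) AS distinct_items, SUM(qty) AS units
          FROM purchases GROUP BY trans_id ORDER BY trans_id""",
    # 7. How many times is every item purchased?
    7: """SELECT item_id, COUNT(*) AS times_purchased
          FROM purchases GROUP BY item_id ORDER BY item_id""",
    # 8. Total quantity of item I01 purchased
    8: "SELECT SUM(qty) FROM purchases WHERE item_id = 'I01'",
    # 9. Items and quantities purchased by customer C01
    9: """SELECT p.item_id, SUM(p.qty) AS units
          FROM purchases p JOIN transactions t ON t.trans_id = p.trans_id
          WHERE t.cust_id = 'C01'
          GROUP BY p.item_id ORDER BY p.item_id""",
    # 10. Customers who purchased any item in a quantity greater than 1
    10: """SELECT DISTINCT t.cust_id
           FROM transactions t JOIN purchases p ON p.trans_id = t.trans_id
           WHERE p.qty > 1 ORDER BY t.cust_id""",
}

def run(conn, number):
    """Run one of the numbered queries and return all result rows."""
    return conn.execute(QUERIES[number]).fetchall()

if __name__ == "__main__":
    conn = build_sales_tables()
    for number in sorted(QUERIES):
        print(number, run(conn, number))
```

Queries 9 and 10 show why the relationship tables matter: the customer is recorded on the transaction, the quantity on the purchase, and the join on trans_id brings them together.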

SONY WORLD
Sony World has branches all around the country
Each branch has its own database

Company's sales per item per branch in the third quarter

Can you do this?

Data Warehouse
Data collected from multiple sources

Summarized, integrated, ...

Residing in a single site

Used primarily for analytical reporting (OLAP)

Data Warehouse
[Figure: branch databases (Delhi, Mumbai, Kolkata, Chennai) -> Clean, Transform, Integrate, Load, Refresh -> Data Warehouse -> Query and Analysis Tool -> Terminals]
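
The Clean, Transform, Integrate, Load, Refresh steps can be pictured with a small Python sketch. Everything about the raw branch extracts here is hypothetical: the tuple layout, the inconsistent labels, and the stray whitespace are invented to give the steps something to do. Only the four Q3 sales figures are taken from the Sony World cube.

```python
# Hypothetical raw extracts, one per branch database: (item label, quarter, amount).
BRANCH_EXTRACTS = {
    "Delhi": [(" Home Ent", "q3", "1032"), ("Comp", "Q3", 924)],
    "Mumbai": [("home entertainment", "Q3", 1034), ("computer ", "Q3", "1048")],
}

# Assumed mapping from branch-specific labels to one shared item type name.
ITEM_ALIASES = {
    "home ent": "home entertainment",
    "home entertainment": "home entertainment",
    "comp": "computer",
    "computer": "computer",
}

def clean(rows):
    """Clean: trim whitespace, normalize case, and cast amounts to integers."""
    for item, quarter, amount in rows:
        yield item.strip().lower(), quarter.strip().upper(), int(amount)

def transform(rows):
    """Transform: replace branch-specific labels with the shared item type."""
    for item, quarter, amount in rows:
        yield ITEM_ALIASES[item], quarter, amount

def integrate(extracts):
    """Integrate: merge all branches into one stream, tagged with location."""
    for location, rows in extracts.items():
        for item, quarter, amount in transform(clean(rows)):
            yield location, quarter, item, amount

def load(warehouse, facts):
    """Load: store each fact under its (location, quarter, item) key."""
    for location, quarter, item, amount in facts:
        warehouse[(location, quarter, item)] = amount
    return warehouse

def refresh(warehouse, extracts):
    """Refresh: rerun the pipeline so the warehouse reflects new extracts."""
    return load(warehouse, integrate(extracts))

if __name__ == "__main__":
    warehouse = refresh({}, BRANCH_EXTRACTS)
    for key in sorted(warehouse):
        print(key, warehouse[key])
```

Once the facts sit in one place under a common key, a question such as "sales per item per branch in the third quarter" is a simple filter on the quarter.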

Data Warehouse
Organized around major subjects
- Customer, Item, Supplier

Historical perspective (5 to 10 years)

Summarized, integrated

Loading and access of data (nonvolatile)

Data Warehouse and Database


Feature                  Database (OLTP)                                Data Warehouse (OLAP)
Characteristic           Day-to-day operations, transaction processing  Data analysis, decision support
User                     Clerks, DB professionals, DBA                  Knowledge workers: managers, analysts
Data                     Current, up-to-date, read/write, 100 MB to GB  Historical, mostly read, 100 GB to TB
Summarization            Highly detailed                                Summarized
No. of records accessed  Tens                                           Millions
No. of users             Thousands                                      Hundreds

Data Warehouse
Dimensions
Sony World: time, item, branch location

Keeps track of monthly sales (the facts)

Viewed as spreadsheets, extending to data cubes

Data Warehouse

location = Chennai

                 item type
time (quarter)   home entertainment  computer  phone  security
Q1               605                 825       14     400
Q2               680                 952       31     512
Q3               812                 1023      30     501
Q4               927                 1038      38     580

Data Warehouse

location = Delhi

time  home ent  comp  phone  sec
Q1    854       882   89     623
Q2    943       890   64     698
Q3    1032      924   59     789
Q4    1129      992   63     870

location = Mumbai

time  home ent  comp  phone  sec
Q1    1087      968   38     872
Q2    1130      1024  41     925
Q3    1034      1048  45     998
Q4    1142      1091  54     984

location = Kolkata

time  home ent  comp  phone  sec
Q1    818       746   43     591
Q2    894       769   52     682
Q3    940       795   58     728
Q4    978       864   59     784

location = Chennai

time  home ent  comp  phone  sec
Q1    605       825   14     400
Q2    680       952   31     512
Q3    812       1023  30     501
Q4    927       1038  38     580
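
The four per-location tables are one three-dimensional data set. A minimal Python sketch of that idea follows, holding each sales figure under a (location, quarter, item) key. The dictionary layout and the function names are illustrative assumptions; the figures are the ones tabulated for Sony World.

```python
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]
ITEMS = ["home ent", "comp", "phone", "sec"]

# One table per location: rows are quarters Q1..Q4, columns follow ITEMS.
TABLES = {
    "Delhi":   [[854, 882, 89, 623], [943, 890, 64, 698],
                [1032, 924, 59, 789], [1129, 992, 63, 870]],
    "Mumbai":  [[1087, 968, 38, 872], [1130, 1024, 41, 925],
                [1034, 1048, 45, 998], [1142, 1091, 54, 984]],
    "Kolkata": [[818, 746, 43, 591], [894, 769, 52, 682],
                [940, 795, 58, 728], [978, 864, 59, 784]],
    "Chennai": [[605, 825, 14, 400], [680, 952, 31, 512],
                [812, 1023, 30, 501], [927, 1038, 38, 580]],
}

def build_cube(tables):
    """Flatten the per-location tables into one (location, quarter, item) cube."""
    cube = {}
    for location, rows in tables.items():
        for quarter, row in zip(QUARTERS, rows):
            for item, value in zip(ITEMS, row):
                cube[(location, quarter, item)] = value
    return cube

def view_for_location(cube, location):
    """Recover the 2-D quarter-by-item table for a single location."""
    return [[cube[(location, q, i)] for i in ITEMS] for q in QUARTERS]

if __name__ == "__main__":
    cube = build_cube(TABLES)
    print(len(cube), "cells")
    for quarter, row in zip(QUARTERS, view_for_location(cube, "Chennai")):
        print(quarter, row)
```

Each 2-D table is just the cube with the location dimension fixed to one value.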

Data Warehouse
[Figure: 3-D data cube of sales with dimensions location (Delhi, Mumbai, Kolkata, Chennai), time in quarters (Q1 to Q4), and item type (Home ent, Comp, Phone, Security)]

Data Warehouse
Roll Up (on location, from cities to zones)

Drill Down (on time, from quarters to months)

Dice for (location = Kolkata or Chennai) and (time = Q1 or Q2) and (item = home entertainment or computer)

Slice (for time = Q1)
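
Three of these operations can be sketched in Python on the Sony World cube. The city-to-zone mapping below is a hypothetical assignment made only for illustration; the slides name the zones North and South but do not say which city belongs to which. Drill down is left out because it needs monthly figures, which the quarterly data does not contain.

```python
from collections import defaultdict

QUARTERS = ["Q1", "Q2", "Q3", "Q4"]
ITEMS = ["home ent", "comp", "phone", "sec"]
TABLES = {  # rows are quarters Q1..Q4, columns follow ITEMS
    "Delhi":   [[854, 882, 89, 623], [943, 890, 64, 698],
                [1032, 924, 59, 789], [1129, 992, 63, 870]],
    "Mumbai":  [[1087, 968, 38, 872], [1130, 1024, 41, 925],
                [1034, 1048, 45, 998], [1142, 1091, 54, 984]],
    "Kolkata": [[818, 746, 43, 591], [894, 769, 52, 682],
                [940, 795, 58, 728], [978, 864, 59, 784]],
    "Chennai": [[605, 825, 14, 400], [680, 952, 31, 512],
                [812, 1023, 30, 501], [927, 1038, 38, 580]],
}
CUBE = {
    (loc, q, item): value
    for loc, rows in TABLES.items()
    for q, row in zip(QUARTERS, rows)
    for item, value in zip(ITEMS, row)
}

# Hypothetical mapping: the zone membership of each city is an assumption.
ZONE_OF = {"Delhi": "North", "Kolkata": "North", "Mumbai": "South", "Chennai": "South"}

def roll_up(cube, zone_of):
    """Roll up on location: sum city figures into their zone."""
    totals = defaultdict(int)
    for (loc, q, item), value in cube.items():
        totals[(zone_of[loc], q, item)] += value
    return dict(totals)

def slice_time(cube, quarter):
    """Slice: fix time to one quarter, leaving a location-by-item table."""
    return {(loc, item): v for (loc, q, item), v in cube.items() if q == quarter}

def dice(cube, locations, quarters, items):
    """Dice: keep only cells whose coordinates fall in the given value sets."""
    return {
        key: v for key, v in cube.items()
        if key[0] in locations and key[1] in quarters and key[2] in items
    }

if __name__ == "__main__":
    print(roll_up(CUBE, ZONE_OF)[("North", "Q1", "home ent")])
    print(slice_time(CUBE, "Q1")[("Delhi", "phone")])
    print(dice(CUBE, {"Kolkata", "Chennai"}, {"Q1", "Q2"}, {"home ent", "comp"}))
```

Roll up reduces detail by aggregating along a dimension, while slice and dice only select cells: slice fixes one dimension to a single value, dice restricts several dimensions to subsets.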

OLAP Operations - Roll Up

[Figure: cube rolled up on location to zones (North, South), with time in quarters (Q1 to Q4) and item type (Home ent, Comp, Phone, Security)]

OLAP Operations - Drill Down

[Figure: cube drilled down on time to months (Jan to Dec), with location (Delhi, Mumbai, Kolkata, Chennai) and item type (Home ent, Comp, Phone, Security)]

OLAP Operations - Dice

[Figure: diced sub-cube with location (Kolkata, Chennai), time in quarters (Q1, Q2), and item type (Home ent, Computer)]

OLAP Operations - Slice

[Figure: 2-D slice with location (Delhi, Mumbai, Kolkata, Chennai) against item type (Home ent, Comp, Phone, Security)]

Data Mining

The process of analyzing data

Extracts information not offered by the raw data alone

Designed to uncover non-obvious patterns in the data

Data Mining - Some Interesting Questions

What items are typically purchased on the same shopping trip?

How long after someone buys a computer do they buy certain software?

What unusual credit card transactions have occurred?

Are certain promotions more effective than others?

Do certain customers form segments that were not obvious before?

Can lifetime value be calculated?

Can propensity to switch be forecasted?

Market Basket Analysis

How should I place products in a supermarket?

Gut feeling or hard analysis of data?


Transactional (Market) Basket Analysis

Placing associated products close together (or far apart?)

Encouraging customers to buy associated products by providing discount vouchers

Ensuring minimum level of inventory at a location

Providing real-time cues to CSRs to advise what products should be offered during order placement
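
A first step towards finding associated products is to count how often two items appear in the same transaction. The Python sketch below does only that simple co-occurrence count, using the SONY WORLD purchase rows as baskets. It is not a full market basket method: practical tools go further and compute association rule measures such as support and confidence, which are not shown here. The function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations

# (TransId, ItemId) pairs from the SONY WORLD purchase rows.
PURCHASES = [
    ("T100", "I01"), ("T100", "I02"), ("T100", "I03"),
    ("T200", "I01"),
    ("T300", "I03"),
]

def baskets(purchases):
    """Group purchased item ids by transaction: one basket per transaction."""
    grouped = defaultdict(set)
    for trans_id, item_id in purchases:
        grouped[trans_id].add(item_id)
    return dict(grouped)

def pair_counts(purchases):
    """Count, for every pair of items, the transactions containing both."""
    counts = defaultdict(int)
    for items in baskets(purchases).values():
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return dict(counts)

if __name__ == "__main__":
    ranked = sorted(pair_counts(PURCHASES).items(), key=lambda kv: -kv[1])
    for pair, count in ranked:
        print(pair, count)
```

With only three transactions every pair occurs once, so nothing stands out; on real sales volumes, pairs with high counts are the candidates for shelf placement, vouchers, and cues to CSRs.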