
Case Study: Banks

Prof. Navneet Goyal
Computer Science Department
BITS, Pilani

10/25/2014 Prof. Navneet Goyal, BITS, Pilani 2
Banks
Role of DW in Financial Service Industry
We will concentrate on Retail Banks
Most of us understand the basic functioning
of a bank

Banks: Services Offered
Checking accounts
Saving accounts
Mortgage loans
Investment loans
Personal loans
Credit card
Safe deposit boxes

A “heterogeneous” set of “products” “sold” by the bank.
Banks: Households
Each account belongs to a household
Major goal of the bank is to market more
effectively to households that already have one
or more accounts with the bank
Household DW
Track all the accounts owned by the bank
See all the individual holders
See the residential and commercial
household groupings to which they belong

Banks: Requirements
Users want to see five years of historical monthly
snapshots for every account.
Every type of account has a primary balance. There
is a significant need to group different kinds of
accounts in the same analyses and compare primary
balances
Every type of account (known as a product within the
bank) has a set of custom dimension attributes and
numeric facts that tend to differ from product
to product

Banks: Requirements
Every account is deemed to belong to a household.
Upon studying the historical production data, we
conclude that accounts come and go from
households as often as several times per year for
each household, due to changes in marital status and
other life-stage factors
In addition to the household identification, we are
very interested in demographic information as it
pertains to both the individuals and the households.


Crucial Observations
A large bank can have as many as 10 million
accounts and 3 million households. Accounts
can be more volatile than households
The bank provides the different types of products
discussed earlier. In addition, it can also provide
many customized products for a specific customer
We also keep track of the status of each account,
which can be alive or dead, and would like to store
the reason behind the closing of any account.
A very large number of new accounts is also created
each day and must be stored


Issues in Designing DW
Heterogeneous Products. How to model?
Grain. Finest grain data?
Highly volatile demographic profile of customers
Type 2 change?
New DM Features
Core and Custom Fact Tables
Rapidly Changing Monster Dimensions
Outriggers
Mini-dimensions
Multi-valued dimensions & Helper Tables
Monster Dimension
Customer dimensions can be very wide
- Dozens or hundreds of attributes
Customer dimensions can be very large
- Tens of millions of rows in some warehouses
- Sometimes includes prospects as well as actual
customers
Size can lead to performance challenges
- One case when performance concerns can trump
simplicity
- Can we reduce width of dimension table?
- Can we reduce number of rows caused by
preserving history for slowly changing
dimension?
Snowflaking & Outriggers
Snowflaking is the removal of low-cardinality
columns from dimension tables into separate
normalized tables.
Snowflaking not recommended
User presentation becomes difficult
Negative impact on browsing performance
Query response time suffers
Prohibits the use of Bitmap Indexes
Some situations permit the use of dimension
outriggers
Outriggers have special characteristics that
make them permissible snowflakes
Outrigger Tables
Limited normalization of large dimension table
to save space
Identify attribute sets with these properties:
- Highly correlated
- Different grain than the dimension (# of customers)
- Change in unison


Outriggers
Example:
A set of data from an external data provider
consisting of 150 demographic &
socioeconomic attributes regarding the
customer’s district of residence
Data for all customers residing in a particular
district is identical
Instead of repeating this large block of data
for all customers, we model it as an outrigger
Outriggers
How To:
Follow these steps for each attribute set:
1. Create a separate “outrigger dimension”
for each attribute set
2. Remove the attributes from the customer
dimension
3. Replace with a foreign key to the outrigger
table
4. No foreign key from fact row to outrigger
- Outrigger attributes indirectly associated with facts via
customer dimension
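The steps above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the schema from the lecture: the table names (district_outrigger, customer_dim, balance_fact), the columns, and the sample rows are all assumptions made for the example. The fact table carries no key to the outrigger, so district attributes are reached only through the customer dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Step 1: a separate outrigger table, one row per district (hypothetical columns)
CREATE TABLE district_outrigger (
    district_key   INTEGER PRIMARY KEY,
    district_name  TEXT,
    median_income  INTEGER,
    pct_homeowners REAL
);
-- Steps 2 and 3: the customer dimension keeps only a foreign key to the outrigger
CREATE TABLE customer_dim (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT,
    district_key  INTEGER REFERENCES district_outrigger(district_key)
);
-- Step 4: the fact table references the customer dimension, not the outrigger
CREATE TABLE balance_fact (
    month_key       INTEGER,
    customer_key    INTEGER REFERENCES customer_dim(customer_key),
    primary_balance REAL
);
""")

conn.executemany("INSERT INTO district_outrigger VALUES (?, ?, ?, ?)",
                 [(1, "North", 52000, 0.61), (2, "South", 38000, 0.44)])
conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?)",
                 [(10, "A. Rao", 1), (11, "B. Shah", 1), (12, "C. Iyer", 2)])
conn.executemany("INSERT INTO balance_fact VALUES (?, ?, ?)",
                 [(201410, 10, 1000.0), (201410, 11, 2500.0), (201410, 12, 400.0)])

# District attributes are reached indirectly: fact -> customer -> outrigger
rows = conn.execute("""
    SELECT o.district_name, SUM(f.primary_balance)
    FROM balance_fact f
    JOIN customer_dim c       ON c.customer_key = f.customer_key
    JOIN district_outrigger o ON o.district_key = c.district_key
    GROUP BY o.district_name
    ORDER BY o.district_name
""").fetchall()
print(rows)
```

The query needs two joins to reach a district attribute, which is the extra cost the outrigger introduces.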
Outriggers
Reasons:
Demographic data is available at a significantly
different grain than the primary dimension data
(district vs. individual customer)

The data is administered & loaded at different
times than the rest of the data in the customer
dimension
Outriggers
Advantages:
Space savings
- Customer dimension table becomes narrower
- Outrigger table has relatively few rows
- One copy per district vs. one copy per customer
Disadvantages:
Additional tables introduced
- Accessing outrigger attributes requires an extra join
- Users must remember which attributes are in outrigger
vs. main customer dimension
- Creating a view can solve this problem
Outriggers
Dimension outriggers are permissible, but they
should be exceptions rather than the rule
Avoid having too many outriggers in your schema
If query tool insists on a classic star schema,
we can hide the outrigger under a view
declaration
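One way to hide an outrigger under a view is sketched below using sqlite3. The view name customer_dim_flat and all table and column names are hypothetical, chosen only for this illustration. The query tool then sees a single flat customer dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE district_outrigger (
    district_key  INTEGER PRIMARY KEY,
    median_income INTEGER
);
CREATE TABLE customer_dim (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT,
    district_key  INTEGER REFERENCES district_outrigger(district_key)
);
INSERT INTO district_outrigger VALUES (1, 52000), (2, 38000);
INSERT INTO customer_dim VALUES (10, 'A. Rao', 1), (11, 'B. Shah', 2);

-- The view presents customer and district attributes as one flat dimension
CREATE VIEW customer_dim_flat AS
SELECT c.customer_key,
       c.customer_name,
       o.median_income AS district_median_income
FROM customer_dim c
LEFT JOIN district_outrigger o ON o.district_key = c.district_key;
""")

flat_rows = conn.execute("""
    SELECT customer_key, customer_name, district_median_income
    FROM customer_dim_flat
    ORDER BY customer_key
""").fetchall()
print(flat_rows)
```

The join still happens at query time, so the view addresses usability rather than the cost of the extra join.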
Mini-Dimensions
Rapidly changing Monster Dimension
Need special treatment
Type-2 SCD not recommended
Business users often want to track the myriad of
changes to customer attributes
Insurance companies must update information
about their customers and their specific insured
automobiles & homes
Poses browsing-performance and change-tracking
challenges
SOLUTION!!!
Mini-Dimensions
Some attributes change relatively frequently
- Behavior-based scores
- Certain demographic attributes
- Age, Income, Marital Status, # of children
How to preserve history without row
explosion?
Some attributes are queried relatively
frequently
- Queries using huge customer dim. are slowed
How to improve query performance?
Create a mini-dimension
Creating Mini-Dimensions
Remove frequently changing or frequently queried
attributes from the customer dimension
Add them to a separate mini-dimension table
instead
Discretize mini-dimension attributes to reduce
cardinality
- Group continuously-valued attributes into
buckets or bands
- Example: Age < 20, Age 20-29, Age 30-39,
Age 40-49, Age 50+
Include foreign keys to both customer
dimension & mini-dimension in fact table
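The recipe above can be sketched in plain Python. The age bands are the ones listed on the slide; the income bands, marital-status values, and all function and variable names are assumptions made for illustration. Every combination of banded values is pre-built once as a mini-dimension row, and each fact row carries both the customer key and the mini-dimension key.

```python
from itertools import product

AGE_BANDS = ["<20", "20-29", "30-39", "40-49", "50+"]   # bands from the slide
INCOME_BANDS = ["<25k", "25k-50k", "50k+"]              # hypothetical bands
MARITAL_STATUS = ["single", "married"]                  # hypothetical values


def age_band(age: int) -> str:
    """Discretize a continuous age into one of the slide's age bands."""
    if age < 20:
        return "<20"
    if age < 30:
        return "20-29"
    if age < 40:
        return "30-39"
    if age < 50:
        return "40-49"
    return "50+"


# Mini-dimension: one row per combination of banded values, keyed by a surrogate key
mini_dim = {
    combo: key
    for key, combo in enumerate(product(AGE_BANDS, INCOME_BANDS, MARITAL_STATUS), start=1)
}


def demographics_key(age: int, income_band: str, marital_status: str) -> int:
    """Look up the mini-dimension row that matches a customer's current profile."""
    return mini_dim[(age_band(age), income_band, marital_status)]


# Each fact row holds foreign keys to BOTH the customer dimension and the mini-dimension.
# The customer row is untouched when the profile changes; only the fact's mini-dimension key moves.
fact_rows = [
    {"month": "2014-09", "customer_key": 10,
     "demographics_key": demographics_key(29, "25k-50k", "single")},
    {"month": "2014-10", "customer_key": 10,
     "demographics_key": demographics_key(30, "25k-50k", "married")},
]
print(len(mini_dim), fact_rows)
```

Here the customer's birthday and marriage change the key stored in the October fact row, while no new customer dimension row is created.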
Mini-Dimensions
Advantages
- History preserved without space blow-up
- FT captures historical record of attribute values
- Mini-dimension has small no. of rows
- # of unique combinations of MD attributes is small
- Consequence of discretization
- 5 attributes with 10 possible values each can yield up to 10^5 = 100,000 rows
- Limit no. of attributes in a single MD
- Improved performance for queries that use MD
- At least for those queries that don’t use main customer dim.
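The row-count argument can be checked with a short calculation. Assuming every combination of attribute values is pre-built as a mini-dimension row, the upper bound on rows is the product of the number of values per attribute. The function name below is hypothetical.

```python
import math


def max_mini_dimension_rows(values_per_attribute):
    """Upper bound on mini-dimension rows: product of the per-attribute value counts."""
    return math.prod(values_per_attribute)


five_attrs = max_mini_dimension_rows([10] * 5)    # 10 * 10 * 10 * 10 * 10
three_attrs = max_mini_dimension_rows([10] * 3)   # 10 * 10 * 10
print(five_attrs, three_attrs)
```

Because the bound grows multiplicatively with each added attribute, keeping the number of attributes in a single mini-dimension small keeps the table small.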


Mini-Dimensions
Disadvantages
- Fact table width increases
- Due to increase in no. of dimension foreign keys
- Information lost due to discretization
- Less detail is available
- Impractical to change bucket/band boundaries
- Additional tables introduced
- Users must remember which attributes are in the
mini-dimension vs. main customer dimension



Outriggers vs. Mini-Dimensions
[Figure: two schemas side by side. Outrigger: Fact -> Customer Dimension -> Outrigger. Mini-dimension: Fact -> Customer Dimension, and Fact -> Mini-dimension.]
Dimension Triage

Most dimensional models have between 5
and 15 dimensions
Core fact table containing only the primary
balance of every account at the end of each
month
Only two dimensions – Month & Account
Dimension Triage
Month Dimension
- Month End Date key
- Month Attributes..

Primary Balance Core Fact
- Month End Date key (FK)
- Account key (FK)
- Primary Balance

Account Dimension
- Account key
- Account Attributes..
- Product Attributes..
- Household Attributes..
- Status Attributes..
- Branch Attributes..
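A minimal sqlite3 sketch of this two-dimension design follows. The table names, the two attributes folded into the account row, and the sample balances are assumptions for illustration. Because every account type has a primary balance, different products can be compared in a single query on the core fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE month_dim (
    month_end_date_key INTEGER PRIMARY KEY,
    month_name         TEXT
);
CREATE TABLE account_dim (
    account_key  INTEGER PRIMARY KEY,
    product_type TEXT,   -- product attribute folded into the account row
    branch_name  TEXT    -- branch attribute folded into the account row
);
-- Grain: one row per account per month-end
CREATE TABLE primary_balance_fact (
    month_end_date_key INTEGER REFERENCES month_dim(month_end_date_key),
    account_key        INTEGER REFERENCES account_dim(account_key),
    primary_balance    REAL,
    PRIMARY KEY (month_end_date_key, account_key)
);
INSERT INTO month_dim VALUES (20140930, 'September 2014'), (20141031, 'October 2014');
INSERT INTO account_dim VALUES
    (1, 'checking', 'Main St'), (2, 'savings', 'Main St'), (3, 'mortgage', 'Park Rd');
INSERT INTO primary_balance_fact VALUES
    (20140930, 1, 900.0),  (20140930, 2, 4000.0), (20140930, 3, 150000.0),
    (20141031, 1, 1100.0), (20141031, 2, 4200.0), (20141031, 3, 149500.0);
""")

# Compare primary balances across heterogeneous products for one month
october_by_product = conn.execute("""
    SELECT a.product_type, SUM(f.primary_balance)
    FROM primary_balance_fact f
    JOIN account_dim a ON a.account_key = f.account_key
    WHERE f.month_end_date_key = 20141031
    GROUP BY a.product_type
    ORDER BY a.product_type
""").fetchall()
print(october_by_product)
```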
Too Few Dimensions

The Account dimension is a HUGE entry table to
the fact table, thereby slowing queries
For a large bank, the # of customers could reach
10 million, and using Type 2 SCD could render it
unworkable
Products and branches could be thought of
as two separate dimensions as there is a M:N
relationship between the two
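To see why Type 2 SCD handling can become unworkable on a dimension this large, the sketch below shows the mechanics on a single account and then projects the row count. The class, function names, and identifiers are hypothetical, and the rate of 2 changes per row per year is an assumed figure chosen only to make the arithmetic concrete.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccountRow:
    surrogate_key: int
    account_id: str
    household_id: str
    valid_from: str
    valid_to: Optional[str] = None   # None marks the current version


def apply_type2_change(dim: List[AccountRow], account_id: str,
                       new_household_id: str, change_date: str) -> None:
    """Type 2 change: close the current row and append a new version."""
    current = next(r for r in dim if r.account_id == account_id and r.valid_to is None)
    current.valid_to = change_date
    dim.append(AccountRow(len(dim) + 1, account_id, new_household_id, change_date))


dim = [AccountRow(1, "ACC-1", "HH-1", "2014-01-01")]
apply_type2_change(dim, "ACC-1", "HH-2", "2014-06-01")
apply_type2_change(dim, "ACC-1", "HH-3", "2014-10-01")
# One account, two changes: the dimension now holds three rows for it


def projected_type2_rows(initial_rows: int, changes_per_row_per_year: int, years: int) -> int:
    """Each change adds one row, so growth is linear in the number of changes."""
    return initial_rows * (1 + changes_per_row_per_year * years)


# Assumed rate: 2 changes per row per year, over the 5 years of history users want
rows_after_5_years = projected_type2_rows(10_000_000, 2, 5)
print(len(dim), rows_after_5_years)
```

Under these assumptions a 10 million row dimension grows to 110 million rows, which is the row explosion that motivates splitting out separate dimensions.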


Dimensions

Account
Time (month in this case)
Branch
Household
Product
Status
Status Dimension

Records the status of the account at
the end of each month
Status could be active or inactive
Status change, such as new account
opening or closure occurring during
the month, is also recorded
Reasons for status change are also
stored
A mini-dimension??
Household Dimension

Household as a separate dimension: the
designer’s prerogative
Account & household dimensions are
closely related
Still, it is a good idea to treat HH as a
separate dimension
- Size of the account dimension (~10m)
- Smaller entry point to the fact table (~3m HHs)
- Account can change HH many times
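The last point can be illustrated with a small sketch in which each monthly fact row carries both a household key and an account key. The keys and balances are made-up sample values. When an account moves to another household, only the household key on later fact rows changes, and neither dimension needs a new row.

```python
from collections import defaultdict

# (month, household key, account key, primary balance): hypothetical sample facts
fact_rows = [
    ("2014-09", "HH-1", "ACC-1", 1000.0),
    ("2014-09", "HH-1", "ACC-2", 500.0),
    ("2014-10", "HH-2", "ACC-1", 1100.0),   # ACC-1 moved from HH-1 to HH-2
    ("2014-10", "HH-1", "ACC-2", 450.0),
]

# Household totals come straight from the fact table's household key
balance_by_household = defaultdict(float)
for month, hh_key, account_key, balance in fact_rows:
    balance_by_household[(month, hh_key)] += balance

# The history of which household an account belonged to is readable from the facts
acc1_history = [(month, hh_key) for month, hh_key, acc, _ in fact_rows if acc == "ACC-1"]
print(dict(balance_by_household), acc1_history)
```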


Star Schema

Month Dimension
- Month End Date key
- Year
- Fiscal_quarter

Primary Balance Core Fact
- Month End Date key (FK)
- HH_key
- Account key (FK)
- Status_key
- Product_key
- Primary_Balance
- Tx_Count

Account Dimension
- Account key
- Primary_name
- Secondary_name
- Account_address
- Account_city
- Account_state
- Account_zip
- Date_opened

HH Dimension
- HH_key
- HH_head_name
- HH_address
- HH_city
- HH_state
- HH_zip
- HH_type
- HH_income
- Presence of Children

Status Dimension
- Account_status_key
- Account_status_desc
- Account_status_reason
- New_acct_flag
- Closed_acct_flag

Product Dimension
- Product_key
- Product_desc
- Product_type
- Product_category

Branch Dimension
- Branch_key
- Branch_address
- Branch_Region
- Branch_type
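A cut-down version of this star schema can be tried out with sqlite3. The sketch keeps only a few columns per dimension, follows the fact table as listed (so it has no branch key), and uses invented sample rows. The lower-case table names and the 'Q3' fiscal quarter value are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE month_dim   (month_end_date_key INTEGER PRIMARY KEY, year INTEGER, fiscal_quarter TEXT);
CREATE TABLE hh_dim      (hh_key INTEGER PRIMARY KEY, hh_head_name TEXT, hh_type TEXT);
CREATE TABLE account_dim (account_key INTEGER PRIMARY KEY, primary_name TEXT, date_opened TEXT);
CREATE TABLE status_dim  (account_status_key INTEGER PRIMARY KEY, account_status_desc TEXT,
                          new_acct_flag TEXT, closed_acct_flag TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT, product_type TEXT);

-- Grain: one row per account per month-end, with a key to each dimension
CREATE TABLE primary_balance_fact (
    month_end_date_key INTEGER REFERENCES month_dim(month_end_date_key),
    hh_key             INTEGER REFERENCES hh_dim(hh_key),
    account_key        INTEGER REFERENCES account_dim(account_key),
    status_key         INTEGER REFERENCES status_dim(account_status_key),
    product_key        INTEGER REFERENCES product_dim(product_key),
    primary_balance    REAL,
    tx_count           INTEGER
);

INSERT INTO month_dim   VALUES (20141031, 2014, 'Q3');
INSERT INTO hh_dim      VALUES (1, 'R. Mehta', 'residential'), (2, 'Acme Traders', 'commercial');
INSERT INTO account_dim VALUES (100, 'R. Mehta', '2010-04-01'), (101, 'R. Mehta', '2012-01-15'),
                               (102, 'Acme Traders', '2013-07-01');
INSERT INTO status_dim  VALUES (1, 'active', 'N', 'N');
INSERT INTO product_dim VALUES (1, 'Basic checking', 'checking'), (2, 'Regular savings', 'savings');
INSERT INTO primary_balance_fact VALUES
    (20141031, 1, 100, 1, 1, 1200.0, 14),
    (20141031, 1, 101, 1, 2, 5000.0, 2),
    (20141031, 2, 102, 1, 1, 30000.0, 85);
""")

# Household type and product type constrain the fact table through their own small dimensions
balance_by_hh_and_product = conn.execute("""
    SELECT h.hh_type, p.product_type, SUM(f.primary_balance)
    FROM primary_balance_fact f
    JOIN hh_dim h      ON h.hh_key = f.hh_key
    JOIN product_dim p ON p.product_key = f.product_key
    WHERE f.month_end_date_key = 20141031
    GROUP BY h.hh_type, p.product_type
    ORDER BY h.hh_type, p.product_type
""").fetchall()
print(balance_by_hh_and_product)
```

The query never touches the large account dimension, which is the benefit of giving household and product their own entry points into the fact table.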
Q & A
Thank You