Data Warehousing In Retail Sales



• Goals of Data Warehouse
• Components of Data Warehouse
• Dimensional Modeling
• Case Study: Retail Business
• Designing the Dimensional Model
• Dimensional Table Attributes
o Date Dimension
o Product Dimension
o Store Dimension

Goals of Data Warehouse
• The data warehouse must make an organization’s information easily accessible
• The data warehouse must present the organization’s information consistently
• The data warehouse must be adaptive and resilient to change
• The data warehouse must be a secure bastion that protects our information assets
• The data warehouse must serve as the foundation for improved decision making
3/15/12

Components of Data Warehouse

Dimensional Modeling
Fact Table
• Stores business performance measurements
• Mostly numeric & additive
• It expresses a many-to-many relationship between dimensions
Daily Sales Fact Table: Date Key (FK), Product Key (FK), Store Key (FK), Quantity Sold, Dollar Sales Amount
Dimension Table
• Discretely valued description that is more or less constant and participates in constraints
• It implements the user interface to the Data Warehouse
Product Dimension Table: Product Key (PK), Product Description, SKU Number (Natural Key), Brand Description, Category Description, Department Description, Package Type Description, Package Size ... and many more
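The two tables listed on this slide can be sketched as a small schema. This is a minimal illustration using Python's built-in sqlite3 module; the snake_case column names, the column types, and the sample rows are assumptions made for the example, not part of the case study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key              INTEGER PRIMARY KEY,
    product_description      TEXT,
    sku_number               TEXT,      -- natural key
    brand_description        TEXT,
    category_description     TEXT,
    department_description   TEXT,
    package_type_description TEXT,
    package_size             TEXT
);
CREATE TABLE daily_sales_fact (
    date_key            INTEGER,        -- FK to a date dimension
    product_key         INTEGER REFERENCES product_dim(product_key),
    store_key           INTEGER,        -- FK to a store dimension
    quantity_sold       INTEGER,
    dollar_sales_amount REAL
);
""")

# Illustrative rows only; the values are made up for this sketch.
conn.execute(
    "INSERT INTO product_dim VALUES "
    "(1, 'Sample loaf', 'SKU-001', 'Baked Well', 'Bread', 'Bakery', 'Bag', '1 lb')"
)
conn.executemany(
    "INSERT INTO daily_sales_fact VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 10, 3, 7.50), (1, 1, 11, 2, 5.00)],
)

# Additive facts can be summed across any combination of dimensions.
total_qty, total_dollars = conn.execute(
    "SELECT SUM(quantity_sold), SUM(dollar_sales_amount) FROM daily_sales_fact"
).fetchone()
```

Each fact row carries only keys and numeric measures, while the descriptive text lives in the dimension table.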

Bringing together facts and dimensions
• Star Join Schema
• Simplicity and symmetry
• Highly recognizable to business users
• High performance benefits

Example: A simple report

Case Study: Retail Business
A large grocery chain

Retail Business 100 grocery stores spread over five-state area Each store has a full complement of departments. dairy.000 SKUs come from 3/15/12 .000 SKUs on its shelves About 55. frozen foods.000 of the SKUs come from outside manufacturers and have Universal Product Codes (UPCs) imprinted on the product package. meat and health/beauty aids Each store has roughly 60. including grocery. The remaining 5.

Retail Business Data collection happens at  Cash Registers (POS systems) Back door where vendors make deliveries Key inputs to the dimensional modeling 3/15/12 .

Designing the Dimensional Model
• Step 1. Select the Business Process
o Aim: Management wants to better understand customer purchases as captured by the POS system
o The business process is POS retail sales
• Step 2. Declare the Grain
o Granularity, atomic data
o Provides maximum flexibility
o Can support all possibilities of user requests
o The most granular data is an individual line item on a POS transaction
• Step 3. Choose the Dimensions
o Date, product, and store dimensions
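To see why the line-item grain in Step 2 gives maximum flexibility, here is a minimal sketch in which each record is one line item on a POS transaction and the same rows are rolled up along different dimensions. The `LineItem` class, the `roll_up` helper, and the sample values are hypothetical names and data chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    """One fact row at the declared grain: a line item on a POS transaction."""
    date_key: int
    product_key: int
    store_key: int
    transaction_number: str
    quantity: int


# Illustrative data only.
line_items = [
    LineItem(1, 1, 10, "T100", 2),
    LineItem(1, 2, 10, "T100", 1),
    LineItem(1, 1, 10, "T101", 1),
    LineItem(1, 1, 11, "T200", 4),
]


def roll_up(items, *dims):
    """Sum quantity over any chosen combination of dimension keys."""
    totals = defaultdict(int)
    for item in items:
        key = tuple(getattr(item, d) for d in dims)
        totals[key] += item.quantity
    return dict(totals)


by_product = roll_up(line_items, "product_key")
by_store_product = roll_up(line_items, "store_key", "product_key")
```

Because nothing was pre-aggregated, any rollup a user asks for can still be computed from the atomic rows.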

Designing the Dimensional Model Measured facts in the retail sales schema 3/15/12 .

Dimensional Table Attributes Date Dimension Product Dimension Store Dimension Promotion Dimension Promotion Coverage Factless Fact Table Degenerate Transaction Number 3/15/12 .

Date Dimension It is present in every data mart as a data mart is a time series Date Dimension Unlike other dimension table date Date Attributes of date dimension: Description Full Date Day of Week o Day Number Day Number in Epoch o Month Number Week Number in Epoch Month Number in Epoch o Holiday Indicator Day Number in o Weekday Indicator Calendar Month Day Number in o Selling Season 3/15/12 Calendar Year Date Key (PK) dimension can be build in advance .

Date Dimension contd.
Why is an explicit date dimension table needed?
• An SQL query can constrain directly on the fact table date key if that key is a date-type field, which makes the dimension look unnecessary. It is still needed for these reasons:
• Usability: the business user is not versed in SQL date semantics, so he or she would be unable to directly leverage the inherent capabilities associated with a date data type
• SQL date functions do not support filtering by attributes such as weekdays versus weekends, holidays, fiscal periods, seasons, or major events
• Presuming that the business needs to slice data by ...
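A short sketch of the kind of filter an explicit date dimension makes possible: the query below constrains on weekday and holiday attributes through a join, instead of using SQL date functions. The table layout, column names, and the holiday flag on the sample date are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY, full_date TEXT,
    weekday_indicator TEXT, holiday_indicator TEXT
);
CREATE TABLE sales_fact (
    date_key INTEGER, product_key INTEGER,
    store_key INTEGER, quantity_sold INTEGER
);
""")
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?, ?)", [
    (1, "2012-03-16", "Weekday", "Non-Holiday"),
    (2, "2012-03-17", "Weekend", "Holiday"),  # hypothetical holiday flag
    (3, "2012-03-18", "Weekend", "Non-Holiday"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    (1, 1, 10, 5), (2, 1, 10, 8), (3, 1, 10, 4),
])

# Weekday versus weekend totals, read straight from a dimension attribute.
qty_by_day_type = dict(conn.execute("""
    SELECT d.weekday_indicator, SUM(f.quantity_sold)
    FROM sales_fact f JOIN date_dim d ON d.date_key = f.date_key
    GROUP BY d.weekday_indicator
""").fetchall())

# Holiday sales only: a plain equality constraint on the dimension.
holiday_qty = conn.execute("""
    SELECT SUM(f.quantity_sold)
    FROM sales_fact f JOIN date_dim d ON d.date_key = f.date_key
    WHERE d.holiday_indicator = 'Holiday'
""").fetchone()[0]
```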

Product Dimension The product dimension describes every SKU in the grocery store.  The product dimension is almost always sourced from the operational product master file Most retailers administer their product master files at headquarters and download a subset of the file to each store’s POS system at frequent intervals attributes of each SKU The product master holds many descriptive The merchandise hierarchy is an important 3/15/12 .

Product Dimension contd.
• There are attributes (Bottle, Bag, Box) in the product dimension table which are not part of the merchandise hierarchy; constraints on them can be combined with a constraint on a merchandise hierarchy attribute
• The product dimension is one of the two or three primary dimensions in nearly every data mart
• A robust and complete set of dimension attributes translates into user capabilities for robust and complete analysis
Example report: Sales Dollar Amount and Sales Quantity by Department Description (Bakery, Frozen Foods) and Brand Description (Baked Well, Fluffy, Light, QuickFreeze, Freshlike, Frigid, Icy)
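As a rough sketch of combining a package-type constraint with a merchandise hierarchy constraint, the example below totals sales by one product attribute while filtering on others. The `sales_by` helper, the shortened attribute names, and all numbers are made up for illustration; only the department and brand labels come from the slide.

```python
from collections import defaultdict

# Hypothetical product dimension rows keyed by product key.
product_dim = {
    1: {"department": "Bakery", "brand": "Baked Well", "package_type": "Bag"},
    2: {"department": "Bakery", "brand": "Fluffy", "package_type": "Box"},
    3: {"department": "Frozen Foods", "brand": "Icy", "package_type": "Bag"},
}

# Hypothetical fact rows: (product_key, sales_quantity, sales_dollar_amount).
sales_fact = [(1, 10, 20.0), (2, 4, 12.0), (3, 6, 18.0), (1, 5, 10.0)]


def sales_by(attribute, **constraints):
    """Total (dollars, quantity) per attribute value, filtered by constraints."""
    totals = defaultdict(lambda: [0.0, 0])
    for product_key, quantity, dollars in sales_fact:
        product = product_dim[product_key]
        if all(product[name] == value for name, value in constraints.items()):
            totals[product[attribute]][0] += dollars
            totals[product[attribute]][1] += quantity
    return {value: tuple(pair) for value, pair in totals.items()}


by_department = sales_by("department")
bagged_bakery_by_brand = sales_by("brand", department="Bakery", package_type="Bag")
```

The second call mixes a hierarchy attribute (department) with a non-hierarchy attribute (package type), which is the combination the first bullet describes.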

Store Dimension
• The store dimension describes every store in our grocery chain
• The store dimension is the primary geographic dimension in our case study
• Each store can be thought of as a location. As a result, we can roll stores up to any geographic attribute, such as ZIP code, county ...
Store Dimension: Store Key (PK), Store Name, Store Number (Natural Key), Store Street Address, Store City, Store County, Store State, Store Zip Code, Store Manager, Store District, Store Region, Floor Plan Type, Photo Processing Type, Financial Service Type, Selling Square Footage, Total Square Footage, First Open Date, Last Remodel Date … and more

Promotion Dimension
• It describes the promotion conditions under which a product was sold
• Causal dimension: temporary price reductions, end-aisle displays, newspaper ads, and coupons
Factors on which promotions are judged:
o Lift: measured on the agreed baseline sales
o Whether sales were transferred from regularly priced products to temporarily reduced-priced products
o Cannibalization: gain in sales of one product but sales decrease in nearby products

The tradeoffs in favor of keeping the four dimensions together include the following:
o Since the four causal mechanisms are highly correlated, the combined single dimension is not much larger than any one of the separated dimensions would be
o The combined single dimension can be browsed efficiently, but it only shows the possible combinations. Browsing in the dimension table does not reveal which stores or products were affected by the promotion. This information is found in the fact table

Promotion Coverage Factless Fact Table
• It is needed to find the products that were on promotion but did not sell
• We’d load one row in the fact table for each product on promotion in a store each day, regardless of whether the product sold or not
• It is a factless fact table as it has no measurement metrics; it merely captures the relationship between the involved keys
• Determining which products were on promotion but didn’t sell requires a two-step process
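One plausible reading of the two-step process, sketched with plain Python sets: first collect the products on promotion from the coverage table, then collect the products that sold from the sales fact table, and take the difference. The tuple layouts, the function name, and the sample rows are assumptions for this example.

```python
# Hypothetical coverage rows: (date_key, product_key, store_key, promotion_key).
promotion_coverage = [
    (1, 101, 10, 7),
    (1, 102, 10, 7),
    (1, 103, 10, 8),
    (1, 101, 11, 7),
]

# Hypothetical sales rows: (date_key, product_key, store_key, quantity_sold).
sales_fact = [
    (1, 101, 10, 3),
    (1, 104, 10, 1),
    (1, 101, 11, 2),
]


def promoted_but_unsold(date_key, store_key):
    """Products on promotion in a store on a day that have no sales rows."""
    # Step 1: the universe of products on promotion, from the coverage table.
    on_promotion = {
        product for day, product, store, _promo in promotion_coverage
        if day == date_key and store == store_key
    }
    # Step 2: the products that actually sold, from the sales fact table.
    sold = {
        product for day, product, store, _qty in sales_fact
        if day == date_key and store == store_key
    }
    return on_promotion - sold


unsold_store_10 = promoted_but_unsold(1, 10)
unsold_store_11 = promoted_but_unsold(1, 11)
```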

Degenerate Transaction Number The POS transaction number is the key to the transaction header record. containing all the information valid for the transaction as a whole. such as the transaction date and store identifier header information is already extracted into other dimensions as it serves as the grouping key for pulling together all the products purchased in a single transaction 3/15/12 In dimensional model this interesting  The POS transaction number is still useful .

Retail Schema A frequent shopper dimension table and add another foreign key in the fact table is created to see exact purchase of frequent shopper on a weekly basis 3/15/12 A frequent .

Retail Schema Original schema gracefully extends to accommodate these new dimensions largely because we chose to model the POS transaction 3/15/12 data at its .

Dimension Normalization
• Snowflaking: redundant attributes are removed from the flat, denormalized dimension table and placed in normalized secondary dimension tables
Perceived benefits of Dimension Normalization
o This design saves space as we’re only storing cryptic codes
o The normalized design for the dimension tables is easier to maintain
Reasons for not adopting this modelling:
o The multitude of snowflaked tables makes for ...

Surrogate Key Surrogate keys are integers that are assigned sequentially as needed to populate a dimension It is encouraged to use surrogate keys in dimensional models rather than relying on operational production codes operational code: Reason to avoid natural keys based on the o To avoid embedding intelligence in the data warehouse keys because any assumptions that we make eventually may be invalidated o Queries and data access applications should 3/15/12 .

Market Basket Analysis
• Market Basket Analysis (MBA) is the notion of analyzing the combination of products that sell together
• It gives the retailer insights about how to merchandise various combinations of items
• The retail sales fact table cannot be used easily to perform MBA, as SQL was never designed to constrain and group across line item fact rows
• Data mining tools and some OLAP products can assist with market basket analysis. However, in the absence of these tools, a more direct approach is used
o The market basket fact table is a periodic snapshot representing the pairs of products purchased together during a specified time period
o The basket count is a semiadditive fact
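The direct approach can be illustrated by counting, for each pair of products, the number of baskets that contain both during the period. This is a sketch under assumptions: the `(transaction_number, product)` input shape, the `basket_counts` name, and the sample items are invented for the example.

```python
from collections import Counter
from itertools import combinations


def basket_counts(line_items):
    """Count the baskets containing each unordered pair of products."""
    # Rebuild each basket from its line items.
    baskets = {}
    for transaction_number, product in line_items:
        baskets.setdefault(transaction_number, set()).add(product)

    # Count every product pair once per basket it appears in.
    counts = Counter()
    for products in baskets.values():
        for pair in combinations(sorted(products), 2):
            counts[pair] += 1
    return counts


# Illustrative line items for one period: (transaction_number, product).
line_items = [
    ("T100", "bread"), ("T100", "milk"), ("T100", "eggs"),
    ("T101", "bread"), ("T101", "milk"),
    ("T102", "milk"),
]
pair_counts = basket_counts(line_items)
```

Each entry of `pair_counts` corresponds to one row of a market basket fact table for the period, with the count as its basket count.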

Thank You

References The Data Warehouse ToolKit –Ralph Kimbal & Margy Ross  3/15/12 .
