
Prepared by Kesavan Krishnan

Lab Topic 2: Designing and Implementing a Data Warehouse
Tasks:
Designing Dimension Tables, Designing Fact
Tables, Implement a Star Schema, Implement a
Snowflake Schema, and Implement a Time
Dimension
Understanding Normalization
• What Is Normalization?
▪ The process of organizing the tables in a relational
database
▪ Eliminates data redundancy
▪ Lowers record locking
▪ Increases efficiency in concurrency
• Accomplished By Dividing Large Tables Into Smaller Tables
▪ Tables have relationships defined
• Before Normalization…
• After Normalization…
▪ No redundant data
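A minimal sketch of the before/after idea, using Python's built-in sqlite3 module as a stand-in for SQL Server. The table names (orders_flat, customers, orders) and the sample rows are invented for illustration and do not come from the lab files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Before normalization: customer details are repeated on every order row.
conn.execute(
    "CREATE TABLE orders_flat (order_id INTEGER, customer_name TEXT, city TEXT, product TEXT)"
)
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [(1, "Ann", "Atlanta", "Bike"), (2, "Ann", "Atlanta", "Helmet"), (3, "Bob", "Boston", "Bike")],
)

# After normalization: each customer is stored once and referenced by key.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT, city TEXT)"
)
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id), product TEXT)"
)
conn.execute(
    "INSERT INTO customers (customer_name, city) "
    "SELECT DISTINCT customer_name, city FROM orders_flat"
)
conn.execute(
    "INSERT INTO orders SELECT f.order_id, c.customer_id, f.product "
    "FROM orders_flat f JOIN customers c "
    "ON c.customer_name = f.customer_name AND c.city = f.city"
)

# How many times is Ann's city stored in each design?
city_copies_before = conn.execute(
    "SELECT COUNT(*) FROM orders_flat WHERE customer_name = 'Ann'"
).fetchone()[0]
city_copies_after = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE customer_name = 'Ann'"
).fetchone()[0]
print(city_copies_before, city_copies_after)
```

The normalized design stores the city once per customer, at the cost of a join whenever orders and customer details are needed together.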
Normalized Structure Challenges
• A Normalized Database Structure Has A Few Basic Characteristics
▪ It is designed to store detailed data
▪ It is designed to store the data as efficiently as possible
▪ It is designed to provide data integrity
▪ It results in many tables
▪ It is an excellent solution for adding and managing daily
activities
• A Normalized Database Structure Has A Few Basic Challenges
▪ It is usually very inefficient for data extraction (queries)
▪ Usually requires multiple table joins to reach all the data
▪ It doesn’t store data in the form needed for data analysis
▪ Data is stored in most detailed form
▪ No aggregated data
▪ No discretized data
▪ Data may be stored in multiple, normalized databases
▪ As data storage increases, query efficiency decreases
▪ Needed historical data may not exist in the database
▪ Required reporting aggregations can create serious
inefficiencies in data extraction
▪ Line-of-business (LOB) database functions can be impacted
Star Schema Basics
• What Is A Star Schema?
▪ The simplest form of database structure used in a DW
▪ Answers the basic question:
▪ What happened, who did it, when did they do it…etc.
▪ Focuses on one, single business area
• What Advantages Does A Star Schema Offer?
▪ Separates data into two main categories
▪ Facts (measurements)
▪ Dimensions (descriptive information)
• What Is A Fact?
▪ Fact (What happened)
▪ Product sold
▪ Customer who bought
▪ Etc.
▪ Dimension (Attributes that describe what happened)
▪ When the product was sold
▪ Day, date, year, quarter, day of the week, etc.
▪ Where the product was sold
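As a rough illustration of the fact/dimension split, here is a tiny star schema built with Python's sqlite3 module (standing in for SQL Server). The table names (fact_sales, dim_date, dim_product) and all values are assumptions made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, quarter INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales  (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    order_qty    INTEGER,
    sales_amount REAL
);
INSERT INTO dim_date VALUES (20111117, '2011-11-17', 4), (20110215, '2011-02-15', 1);
INSERT INTO dim_product VALUES (1, 'Bike'), (2, 'Helmet');
INSERT INTO fact_sales VALUES
    (20111117, 1, 2, 1000.0),
    (20111117, 2, 1, 35.0),
    (20110215, 1, 1, 500.0);
""")

# The fact table holds the measurements; the date dimension says when they happened.
sales_by_quarter = conn.execute("""
    SELECT d.quarter, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.quarter
    ORDER BY d.quarter
""").fetchall()
print(sales_by_quarter)  # [(1, 500.0), (4, 1035.0)]
```

Every dimension joins directly to the fact table, which is what gives the schema its star shape.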
Snowflake Schema Basics
• What Is A Snowflake Schema?
▪ A star schema with a little normalization added in
▪ Dimension tables are normalized somewhat
• Why Use A Snowflake Schema?
▪ To satisfy data gathering functionality of more advanced
data warehousing/mining tools
▪ To logically separate large dimension tables
▪ To more naturally separate dimensional data
▪ Known customers vs. anonymous customers
• Two Main Rules Concerning Snowflake Schema
▪ Don’t use it
▪ Unless you want to or need to
▪ Test your design and make your own determination
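To make the trade-off concrete, the sketch below normalizes a product dimension into a separate category table, again using Python's sqlite3 module in place of SQL Server. The names dim_category, dim_product, and fact_sales and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category attribute is split out of the product dimension (the "snowflake" step).
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product  (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);
INSERT INTO dim_category VALUES (1, 'Bikes'), (2, 'Accessories');
INSERT INTO dim_product VALUES (10, 'Road Bike', 1), (11, 'Helmet', 2), (12, 'Gloves', 2);
INSERT INTO fact_sales VALUES (10, 1000.0), (11, 35.0), (12, 20.0);
""")

# Reaching the category now takes two joins instead of one.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p  ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(sales_by_category)  # [('Accessories', 55.0), ('Bikes', 1000.0)]
```

The extra join is the cost of the added normalization, which is one reason to test the design before committing to it.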
Understanding Granularity
• What Is Meant By The Term Granularity In A DW?
▪ The level of detail available
• What Determines Granularity?
▪ The level of data loaded into the fact table
▪ Per order numbers
▪ Daily numbers
▪ Weekly numbers
▪ The number and detail level of dimensions
▪ Quarter, Year, etc.
• Granularity Should Be Determined During Database Design
▪ Changes can be challenging later on
▪ Changes involve a few steps
▪ Changes structure of fact table
▪ Possible changes in dimension tables
▪ Changes in data loading
▪ Changes in data queries
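A small Python sketch of why the grain is hard to change later: once per-order rows are loaded at a daily grain, the individual orders can no longer be recovered from the fact rows. The order data and the function name load_daily_grain are invented for illustration.

```python
from collections import defaultdict

# Hypothetical per-order source rows: (order_id, order_date, sales_amount).
orders = [
    (1, "2011-11-17", 100.0),
    (2, "2011-11-17", 250.0),
    (3, "2011-11-18", 75.0),
]

def load_daily_grain(rows):
    """Aggregate per-order rows into one fact row per day."""
    daily = defaultdict(float)
    for _order_id, order_date, amount in rows:
        daily[order_date] += amount
    return dict(daily)

daily_facts = load_daily_grain(orders)
print(daily_facts)  # {'2011-11-17': 350.0, '2011-11-18': 75.0}
```

The daily fact rows answer "how much was sold on a given day" but not "how large was order 2", so moving to a finer grain means reloading from the source.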
Auditing and Lineage
• Data Warehouses Do Not Store Data As It Is Created
▪ The level of detail available?
• Data Warehouses Are Populated From OLTP Data
▪ Based on various conditions
▪ At various times (weekly, monthly, etc.)
▪ From various sources
• Data Can Be Informative Based On Different Aspects
▪ The level of detail available?
• These Characteristics Usually Change Over Time
• Auditing And Lineage Identify These Aspects
▪ Usually stored in tables
▪ Describe source, duration of load, who performed the load, etc.
• SQL Server Integration Services (SSIS)
▪ Provides SSIS logging
Simple Data Warehouse Example
▪ This Data Is Loaded Into The Fact Table
▪ This Data Is Loaded Into The Sales Dimension Table
▪ This Data Is Loaded Into The Sales Geography Dimension
Understanding Fact Tables
• A Fact Table Is A Collection Of Measurements
▪ Note the word ‘Measurements’
▪ About a specific business process
▪ A single, identified fact about a specific process
▪ Usually numeric
▪ Sales amount, order quantity
▪ Tax amount
▪ Discount amount
• Fact Tables May Contain Multiple Measurements
▪ If they are closely related
• A Data Warehouse Will Have Many Fact Tables
▪ Each one stores data (measures) for one specific business area
Understanding Dimensions
• Dimensions Give Context To Measures
▪ Measures are the ‘facts’ or measurable numbers in the Fact table
▪ Dimensions give context, or specific meaning, to facts
▪ The term ‘Dimension’ usually refers to a table of related
dimensions
• Example:
▪ A Fact table contains numbers of products sold
▪ A DateDimension table contains the following ‘dimensions’
of dates pertaining to the number of products sold
▪ Date and time (11/17/2011 10:15:32)
▪ Quarter (4)
▪ DayOfYear (321)
▪ WeekDay (Thursday)
▪ Week (46, using ISO 8601 week numbering)
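The attributes of a single date can be derived rather than typed by hand. The sketch below uses Python's standard datetime module; the function name date_attributes is hypothetical. Note that the week number depends on the convention: this sketch uses ISO 8601 weeks, and other conventions (for example SQL Server's default DATEPART(week, ...) setting) can give a different number for the same date.

```python
from datetime import datetime

WEEKDAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")

def date_attributes(moment):
    """Derive date-dimension attributes from a single date/time value."""
    _iso_year, iso_week, _iso_weekday = moment.isocalendar()
    return {
        "Quarter": (moment.month - 1) // 3 + 1,
        "DayOfYear": moment.timetuple().tm_yday,
        "WeekDay": WEEKDAY_NAMES[moment.weekday()],  # weekday(): Monday is 0
        "IsoWeek": iso_week,
    }

attrs = date_attributes(datetime(2011, 11, 17, 10, 15, 32))
print(attrs)
# {'Quarter': 4, 'DayOfYear': 321, 'WeekDay': 'Thursday', 'IsoWeek': 46}
```

Running the same function over every day in a range is one simple way to populate a date (time) dimension table.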
• Each Individual Column In A Dimension Table Is An Attribute
▪ Attributes usually compress or expand data detail
▪ Data can be ‘discretized’ into smaller, summarized groups
▪ Day (365 values)
▪ Weeks (52 values)
▪ Months (12 values)
▪ Quarters (4 values)
▪ Data can also be ‘drilled into’ for more detailed
information
▪ Hour of the day
▪ Minutes of the hour
▪ Seconds
▪ Milliseconds
▪ Etc.
Dimension Column Types
• A Dimension Table Usually Stores More Than Attributes
▪ It stores data that is not in the fact table
• A Dimension Table Can Have 5 Column Types
▪ Attributes
▪ Name
▪ Key
▪ Member Properties
▪ Lineage
• Attributes Column
▪ Give context to measures
▪ Used by tools to create pivot tables, drill downs, etc.
• Name Column
▪ Used to make the reported data easier to read
▪ Provides human-readable names to entities (Customers,
orders, products, etc.)
• Key Column
▪ Used to uniquely identify entities and establish relationships
• Member Property Column
▪ Data included for descriptive use on reports, etc.
▪ Addresses, phone numbers, descriptions, etc.
• Lineage Column
▪ Used to store auditing, source info
Understanding Slowly Changing Dimensions
• Dimensions Provide Description Or Meaning For Fact Table Data
• Some Dimension Data May Change Over Time
▪ Customer Last Name
▪ Customer Address
▪ Could affect Region, Country, State, City, Zip, etc.
• What Happens When Dimension Data Changes?
▪ Historical accuracy is changed
• Example:
▪ OLTP Data
▪ Customer’s address is Atlanta, GA
▪ The customer orders 12,768 products over 12 months
▪ The customer moves to Pittsburgh, PA
▪ If Customer Dimension Data Is Changed From Atlanta, GA to Pittsburgh, PA
▪ Historical reports now show those 12,768 products as being purchased from Pittsburgh, PA
▪ Wait a minute…
• Two Main Solutions For SCDs
▪ Type 1 SCD
▪ Type 2 SCD
• Type 1 SCD
▪ OLTP updates are moved into the DW
▪ Any changes overwrite the current DW data
▪ Past actual data history is lost
• Type 2 SCD
▪ Data is not overwritten in the DW
▪ A new row for the customer must be inserted
▪ Usually creates primary key problems
▪ You must now add a Surrogate Key (Data Warehouse Key)
▪ Uniquely identifies every row in the dimension table
▪ You must also add another column or two
▪ To flag the current value
▪ To provide date/time perspective
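The Type 2 steps above can be sketched as follows, using Python's sqlite3 module in place of SQL Server. The column names (customer_key, customer_id, valid_from, valid_to, is_current), the customer number 42, and the dates are assumptions chosen for illustration; the lab's own dimension tables may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate (data warehouse) key
    customer_id  INTEGER,                            -- business key from the OLTP source
    city         TEXT,
    valid_from   TEXT,
    valid_to     TEXT,                               -- NULL while the row is current
    is_current   INTEGER
);
INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current)
VALUES (42, 'Atlanta, GA', '2011-01-01', NULL, 1);
""")

def apply_type2_change(conn, customer_id, new_city, change_date):
    """Type 2: expire the current row, then insert a new current row."""
    conn.execute(
        "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_city, change_date),
    )

apply_type2_change(conn, 42, "Pittsburgh, PA", "2012-01-01")
history = conn.execute(
    "SELECT customer_key, city, is_current FROM dim_customer ORDER BY customer_key"
).fetchall()
print(history)  # [(1, 'Atlanta, GA', 0), (2, 'Pittsburgh, PA', 1)]
```

Fact rows loaded before the move keep pointing at surrogate key 1, so historical reports still show Atlanta. A Type 1 change, by contrast, would be a single UPDATE of the city column on the existing row.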
Creating Our Data Warehouse Database
• Right-click Databases and select New Database.
• Select General, enter a database name, and click Owner.
• Click Browse and add the Administrator user.
• Change the Initial Size (MB) from 3 to 100.
• Click Autogrowth / Maxsize and set In Megabytes to 10.
• Click Options and change the Recovery model to Simple.
Identifying Our Dimensions
• Three Dimensions
▪ Customer Dimension
▪ Products Dimension
▪ Date Dimension
• We’ll Load Them Using SQL Server Data Tools (SSIS)
▪ Familiarize you with various aspects of SSIS
Identifying Our Fact Table
• Our Fact Table Will Include:
▪ Data loaded directly from the source
▪ Data calculated during the data load
Understanding Indexing
• Indexing Affects How Data Is Stored And Managed In SQL Server
• There Are Four Main Indexing Options In SQL Server
▪ Clustered Index
▪ Non-Clustered Index
▪ Filtered Non-Clustered Index
▪ Columnstore Index
• Clustered Index
▪ Determines the physical storage order of the data
▪ There can be only one clustered index on a table
• Non-Clustered Index
▪ Sorts data in a column or columns and stores pointers to the actual data rows
▪ You can have up to 999 non-clustered indexes on a table
▪ Non-clustered indexes slow down data modification (inserts, updates, deletes)
• Filtered Non-Clustered Index
▪ Creates a non-clustered index on a subset of values in a column
• Columnstore Index
▪ A non-clustered index that stores data by column instead of by row
▪ Each column is stored and searched separately from the data row
▪ In SQL Server 2012, adding a columnstore index makes the table read-only
• SQL Server Stores Data In Tables In Two Forms
▪ Heap
▪ Data is stored in the order in which it is added to the table
▪ New rows are added to the bottom of the data list
▪ Balanced tree (B-tree)
▪ Data is ordered based on the clustered index key
Indexing The Data Warehouse
• Indexing In The Data Warehouse Can Be Tricky
▪ Too few indexes will allow data loads to be quick
▪ But query response times will be slow
▪ Too many indexes and data loads slow down and storage
requirements go up
▪ But query response is good
• General Rule Of Thumb
▪ Dimension tables
▪ Place clustered index on the surrogate key
▪ If the table has a lot of columns, create non-clustered indexes on
the most popular columns
▪ Popular=most often used in queries
▪ Fact tables
▪ Place a non-clustered index on the single-column foreign keys to
the dimension tables
▪ If the primary key is a composite of all the dimension foreign keys,
make it a non-unique clustered index
Understanding Indexed Views
• What Is A View?
▪ A result set of a query that is a virtual table
▪ The virtual table is not stored permanently in the database
▪ The view can be referenced like a table in Transact-SQL
• Indexing A View
▪ You can create a unique clustered index on a view
▪ The view’s result set is now stored in the database, just like
a regular table with a clustered index
• Advantages Of Indexed Views
▪ Improve the performance of joins and aggregations that
process many rows
Understanding Data Compression
• SQL Server 2012 Supports Data Compression
▪ Data compression reduces the size of the database
▪ Packs more data onto fewer data pages
▪ Fewer data page reads required to satisfy queries
▪ Lower IO means faster response; lower processing load on
server
▪ Minor issue: extra CPU resources are required for data
updates
▪ Not a problem in data warehousing
• SQL Server 2012 Supports Three Compression Types
▪ Page compression
▪ Focuses on duplicate values within the data page
▪ Stores one value; places a pointer at all other locations
▪ Row compression
▪ Removes any unused bytes in a fixed data type
▪ CHAR(25)
▪ Unicode compression
▪ Reduces storage space for Unicode data that doesn’t require the space
• Which Compression Should You Use?
▪ Page compression
▪ It automatically uses row compression when page compression is
used
▪ Fact Tables Usually Benefit The Most From Compression
▪ Note!
▪ In SQL Server 2012, compression is only available in Enterprise Edition
▪ See SQL Books Online For Details And Implementation
Using Partitions
• Fact Tables Become Very Large Tables Over Time
• Very Large Database Tables Present Serious Challenges
▪ What if you need to delete a large portion of the data?
▪ The TRUNCATE TABLE command performs deletions with minimal logging…
▪ But it deletes the entire table data
▪ Large Data Inserts Can Become Time Consuming
▪ Index maintenance and storage can become problematic
▪ Table Partitions Deal With All These Issues
• What Is A Table Partition?
▪ A large table is stored in multiple files
▪ Divided horizontally (rows) based on a condition
▪ Usually date/time
▪ SQL Server 2012 allows up to 15,000 partitions on a single
table
▪ Partitions and data are managed in the background
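The idea of horizontal partitioning can be modeled in a few lines of plain Python. This is a toy model only, not how SQL Server implements partition functions and schemes; the class PartitionedFactTable and its methods are invented for this sketch.

```python
from collections import defaultdict

class PartitionedFactTable:
    """Toy model of a fact table split horizontally by order year."""

    def __init__(self):
        self.partitions = defaultdict(list)  # year -> list of rows

    def insert(self, order_date, amount):
        year = int(order_date[:4])  # the partitioning condition (date based)
        self.partitions[year].append((order_date, amount))

    def drop_partition(self, year):
        """Remove one year's rows without touching the other partitions."""
        return len(self.partitions.pop(year, []))

    def row_count(self):
        return sum(len(rows) for rows in self.partitions.values())

table = PartitionedFactTable()
for order_date, amount in [("2010-03-01", 10.0), ("2010-07-09", 20.0), ("2011-11-17", 30.0)]:
    table.insert(order_date, amount)

removed = table.drop_partition(2010)
print(removed, table.row_count())  # 2 1
```

Removing a whole partition addresses the "delete a large portion of the data" problem: the old year disappears in one step while the rest of the table stays in place.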
• Partitioning Offers Many Advantages
Data Lineage
• What Is Data Lineage?
▪ It depends on who you ask
▪ Best definition…
▪ Data origination and flow details
▪ Where it is from, where it is going, how it is transformed in the
process
▪ Same concept as comments in programming
▪ A note to self
• Why Do We Need Data Lineage?
▪ To provide meta-data context in the data warehouse
▪ Data can come from many locations at various times
▪ Future business rules may change, affecting some data
▪ Making it invalid
▪ Making it suspect
▪ Making it more important
▪ Data lineage allows us to identify this data
• Two Main Options For Adding Data Lineage
▪ SSIS system variables
▪ If you are using SSIS
▪ T-SQL system functions
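Whichever option is used, the result is usually a few extra columns stamped on each row at load time. The sketch below shows the idea with Python's sqlite3 module rather than SSIS variables or T-SQL functions; the lineage column names (source_system, loaded_by, loaded_at_utc) and the values passed in are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,
    product_name  TEXT,
    source_system TEXT,  -- lineage: where the row came from
    loaded_by     TEXT,  -- lineage: who performed the load
    loaded_at_utc TEXT   -- lineage: when the row was loaded
)""")

def load_products(conn, names, source_system, loaded_by):
    """Insert rows and stamp each one with the same lineage values."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO dim_product (product_name, source_system, loaded_by, loaded_at_utc) "
        "VALUES (?, ?, ?, ?)",
        [(name, source_system, loaded_by, loaded_at) for name in names],
    )

load_products(conn, ["Bike", "Helmet"], source_system="OLTP_Sales", loaded_by="etl_user")
lineage = conn.execute(
    "SELECT DISTINCT source_system, loaded_by FROM dim_product"
).fetchall()
print(lineage)  # [('OLTP_Sales', 'etl_user')]
```

If a business rule later makes one source suspect, the lineage columns make it possible to find exactly the rows that came from it.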
• Right-click AdventureWorksDW2012, select SQL Query, and run the script as above.
Creating Our Dimensions
• In the VTCDW database, go to Tables, right-click Files Tables, and create the new tables by SQL query.
• Create the fact table and dimension tables under Tables by using the SQL scripts (Create Fact Table.txt; CreateDimensionTables.txt).
• Click Execute to run the scripts.


Creating Our Fact Table
• The second method is to execute the script from a file. Go to File, select Open, click File, and open the script file from your folder. Click Execute once the script is loaded.
Creating Our Relationships
• Right-click Database Diagrams and click New Database Diagram.
• Click Yes to create the database diagram.
• Click Add to add all the tables.
• Click Close in the Add Table dialog.
• Create a relationship by selecting and dragging from the dimension table to the fact table as a foreign key (ProductKey to ProductKey).
• Make sure the primary key and foreign key values are correct. Click OK.
• Create a relationship by selecting and dragging from the dimension table to the fact table as a foreign key (DateKey to OrderDate).
• Make sure the primary key and foreign key values are correct. Click OK.
• Create a relationship by selecting and dragging from the dimension table to the fact table as a foreign key (CustomerKey to CustomerKey).
• Make sure the primary key and foreign key values are correct. Click OK.
• Click Yes to make changes to the selected diagram.
• Enter a name for the diagram. Click OK.
• Click Yes to save all the diagrams.
• Under Tables, select dbo.FactInternetSales, then select Keys to see all the foreign keys.
Q&A

Best of luck !!!!
