
Introduction to

Business Intelligence

Prem Shanker
Sr. Software Engineer
Credit Suisse
Goals
• Learn about the concept of Data Warehousing and what
BIDS offers.
• Learn how to design and implement a Data
Warehouse dimensional database.
• Learn what a cube is.
• Learn about the SQL Server Analysis Services
architecture.
• Learn what is new in Analysis Services 2008.
• Learn what the MDX language is.
What can BIDS do?
[Diagram: Source Systems/OLTP → Data Warehouse → SQL Server Analysis Services (Cubes) → Clients (Query Tools, Reporting, Analysis)]
1. Design the Data Warehouse
2. Populate the Data Warehouse
3. Create OLAP Cubes
4. Query Data
Data Warehouse
• Table and Cube
• Star Schema and Snowflake Schema
• Fact Table and Dimension Table
Table vs Cube
 A simplified example:
A typical relational table (Sales table), where data are organized by rows:

Product   Region   Sales $
Donut     East     1
Donut     West     2
Milk      East     3
Milk      West     4

Made into a cube, where data are organized by intersections of the Product dim (rows) and the Region dim (columns):

          East   West   Total
Donut     1      2      3
Milk      3      4      7
Total     4      6      10
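As a rough illustration of how the rows of the Sales table turn into cube cells, here is a minimal Python sketch. It is not how Analysis Services stores a cube; the names sales_rows and build_cube are made up for this example.

```python
from collections import defaultdict

# Fact rows as they sit in the relational Sales table: (product, region, sales).
sales_rows = [
    ("Donut", "East", 1),
    ("Donut", "West", 2),
    ("Milk", "East", 3),
    ("Milk", "West", 4),
]


def build_cube(rows):
    """Aggregate rows into cells keyed by (product, region), including totals."""
    cube = defaultdict(int)
    for product, region, amount in rows:
        cube[(product, region)] += amount    # detail cell
        cube[(product, "Total")] += amount   # roll up over all regions
        cube[("Total", region)] += amount    # roll up over all products
        cube[("Total", "Total")] += amount   # grand total
    return dict(cube)


cube = build_cube(sales_rows)
print(cube[("Milk", "West")])    # one intersection of Product and Region
print(cube[("Total", "Total")])  # grand total
```

Each cell is addressed by one member from each dimension, which is the idea the MDX slides later build on.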
The basic ingredients
to make a cube
• Two kinds of tables in a data warehouse database:
1. Fact table
2. Dimension tables

• Question:
1. Which one is a fact table and which one is a
dimension table?
Star Schema
• A Star Schema contains a fact table and one or
more dimension tables.
1. A Fact Table: The central fact table stores the
numeric facts (measures) such as Sales dollars,
Costs, Unit Sales etc.
2. Dimension Tables: They surround the central
fact table, and they store descriptive
information about the measures.
• The shape looks like a Star
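A small star schema can be sketched with Python's built-in sqlite3 module. The table and column names below (fact_sales, dim_product, dim_region) are assumptions chosen for illustration and do not come from any real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive information about the measures.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_region  (region_key  INTEGER PRIMARY KEY, region_name  TEXT);

-- Fact table: foreign keys to each dimension plus a numeric measure.
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    region_key   INTEGER REFERENCES dim_region(region_key),
    sales_amount REAL
);

INSERT INTO dim_product VALUES (1, 'Donut'), (2, 'Milk');
INSERT INTO dim_region  VALUES (1, 'East'), (2, 'West');
INSERT INTO fact_sales  VALUES (1, 1, 1), (1, 2, 2), (2, 1, 3), (2, 2, 4);
""")

# Join the fact table to one dimension and aggregate the measure.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)
```

The fact table holds only keys and numbers; everything a report would display as a label lives in a dimension table.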
Star schema
Snowflake Schema
Review: Data
Warehouse Schemas
– The Data Warehouse is either a Star Schema or a
Snowflake Schema:
• Fact tables contain foreign keys and numeric measures.
• Dimension tables contain the data that describes the measures.

• The schema is ready for Analysis Services to build a cube.


Client Server Architecture

[Diagram: client tools (Excel, client apps, MOSS, BIDS, SSMS, SSRS) connect through OLEDB, ADOMD, ADOMD.NET and AMO to the Analysis Server using XMLA, either over TCP or over HTTP through IIS.]
A Logical Cube -
Example

[Diagram: a cube with three dimensions: Product (Donut, Sandwich, Milk, Soda, Beer), Region (North, South, East, West) and Time (1999, 2000, 2001, 2002). The highlighted cell is the Sales $ of Soda in the West in the year 2001.]
Tools to connect to
Cubes
• SQL Server Management Studio (SSMS)
• Business Intelligence Development Studio (BIDS)
• Query Analyzer (SSMS) – To write MDX
• Excel 2007 – Uses MDX
Physical Cube- BIDS
• Analysis Services Database
• Unified Dimensional Model
• Data Source connection
• Data Source View
• Dimensions
• Cube Creation Wizard
Analysis Services Database
• An Analysis Services database is the top level
container for other dependent objects:
• A database includes
– Data Source
– Data Source View
– Cube
– Dimension
– Security Role
Creating an Analysis
Services Database
• You can use one of the following to create a new
empty database on an instance of SQL Server
2005 Analysis Services.
– SQL Server Management Studio
– Business Intelligence Development Studio.
Unified Dimensional
Modeling
• Common Name: UDM
• New feature since AS 2005
• Combines all relational sources in one
single environment.
• A single data model, called the Unified
Dimensional Model (UDM), over one or
more physical data sources
Unified Dimensional
Model - Concept
• Without a UDM, the user needs to understand the particulars of
each technology (e.g. the dialect of SQL used) to
generate reports.
• Within one single Analysis Services database, you can have
more than one data source to pull the data from.
Data Source
Connection
• The data sources of your AS database are your Data
Warehouse databases (SQL).
• It defines the connection string and authentication
information for a database on an OLE DB data provider.
• You can use the Data Source Wizard to specify one or
more data sources (SQLDB) for Analysis Services
databases.
The Functions of the
Data Sources
• Integrate your Analysis Services databases with
the data warehouses
• They are used for the following:
– Processing the Cubes and dimensions
– Data Retrieval if ROLAP or HOLAP is used as the
storage.
– Write Back
Different Storage
types of Cube
Data Sources
connection to SQL
Server
• For SQL Server, you can pick from the following providers:
– OLE DB provider for SQL Server
– SQL Native Client
– .NET Provider/SqlClient Data Provider

– (Avoid using .NET data sources – OLEDB is faster for
processing in practice)
Data Source Views
• New feature Since AS 2005
• A single unified view of the metadata (UDM) from specified
tables and views that the data source defines in the project.
• It hides the physical implementation of the underlying data sources from
the reporting users.
• Basic Data Layout for Cubes
• Define Data Relationships
• Can Leverage Multiple Data Sources
• The key to effective cube design
• Named Query As Objects – Not only Tables or Views
Demo
Dimension
• All dimensions are based on tables or views in a data source view.
• All dimensions are shared since AS 2005
• The structure of a dimension is largely driven by the structure of the
underlying dimension table or tables.
• The simplest structure is called a star schema, which is where each
dimension is based on a single dimension table that is directly linked
to the fact table by a primary key - foreign key relationship.
Dimension Consists of
• A dimension consists of:
– Attributes that describe the entity
– User-Defined Hierarchies that organize
dimension members in meaningful ways
• such as
Store Name → Store City → Store State → Store Country
Attributes
• New feature since AS 2005
• Containers of dimension members
• Typically have one-many relationships between
attributes in the same dimension:
– City State,
– State Country, etc.
– All attributes implicitly related to the key
User Defined
Hierarchies
• User Defined Hierarchies are created
from Attributes
• Tree-like structure
City → State → Country → All
• Provide navigation paths in a cube
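To make the City → State → Country → All path concrete, the sketch below rolls leaf-level values up one level at a time. The cities, figures and function name roll_up are invented for illustration; Analysis Services does this from the attribute relationships defined in the dimension.

```python
from collections import defaultdict

# Attribute relationships (one-to-many): each city belongs to one state,
# and each state belongs to one country. Sample members are made up.
city_to_state = {"Seattle": "Washington", "Spokane": "Washington", "Portland": "Oregon"}
state_to_country = {"Washington": "USA", "Oregon": "USA"}

# Values at the lowest level of the hierarchy (City).
sales_by_city = {"Seattle": 10, "Spokane": 5, "Portland": 7}


def roll_up(values, parent_of):
    """Sum child values into their parent member, one hierarchy level up."""
    totals = defaultdict(int)
    for child, amount in values.items():
        totals[parent_of[child]] += amount
    return dict(totals)


sales_by_state = roll_up(sales_by_city, city_to_state)
sales_by_country = roll_up(sales_by_state, state_to_country)
sales_all = sum(sales_by_country.values())  # the "All" level

print(sales_by_state)
print(sales_by_country)
print(sales_all)
```

Drilling down in a client tool is the same walk in the opposite direction, from All towards City.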
Typical Example – Calendar
Hierarchy
• The Year, Quarter, and Month attributes are
used to construct a hierarchy, named Calendar,
in the time dim.
• The relationship between the levels and
members of the Calendar dimension (a regular
dimension) is shown in the following diagram.
Measure Group
• In a cube, a measure is the set of values, usually numeric, that
are based on a column in the fact table in the cube.
• A measure group contains one or more (or all) of the measures
from a single fact table. It cannot contain measures from
different fact tables.
Measure Group
Advantages
• Measure groups provide the following advantages:
– They can be partitioned and processed separately
– They allow a cube to include measures from different fact tables
(one measure group per fact table).
– They are grouped by granularity: Same measure group
same granularity.
– Security can be applied to specific measure groups
Cube
• A cube is
defined by its
measures and
dimensions.
Inside a Cube
• Measures and Measure Groups
• Dimensions Relationships
• Calculations
• Actions
• Partitions
• Perspectives
Demo
Dimension Design

• Different Dimension Relationships


– Regular Dimension Relationship
– Reference Dimension Relationship
– Fact Dimension Relationship
– Role Playing Dimension
– Parent-Child Hierarchy
Regular Dimension
Relationships
• A traditional star schema design
• The Primary Key in the dimension table joins
directly to Foreign Key in the fact table.
Reference Dimension
Relationships
• Snowflake schema
• A reference dimension uses columns from
multiple tables, or its dimension table links to the fact table
indirectly, through another dimension that is directly linked to the fact table.
Role Playing
Dimension
• A role-playing dimension is used in a cube more than once, each time
for a different purpose.
• Each role-playing dimension is joined to a fact
table on a different foreign key.
• For example, you might add a Time dimension to a cube three times to
track the times that
– products are ordered,
– products are shipped,
– orders are due.
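The idea of one physical date table playing several roles can be sketched with sqlite3. The schema here (dim_date, fact_orders, order_date_key, ship_date_key) is hypothetical, and the helper sales_by_year is only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical date dimension table.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);

-- The fact table references it twice, through two different foreign keys.
CREATE TABLE fact_orders (
    order_id       INTEGER PRIMARY KEY,
    order_date_key INTEGER REFERENCES dim_date(date_key),
    ship_date_key  INTEGER REFERENCES dim_date(date_key),
    amount         REAL
);

INSERT INTO dim_date VALUES (20031230, 2003), (20040105, 2004), (20040110, 2004);
INSERT INTO fact_orders VALUES
    (1, 20031230, 20040105, 100.0),
    (2, 20040105, 20040110, 50.0);
""")


def sales_by_year(role_key_column):
    """Aggregate amount by year, joining dim_date on the chosen role's key."""
    if role_key_column not in ("order_date_key", "ship_date_key"):
        raise ValueError("unknown role")
    return conn.execute(f"""
        SELECT d.calendar_year, SUM(f.amount)
        FROM fact_orders AS f
        JOIN dim_date AS d ON d.date_key = f.{role_key_column}
        GROUP BY d.calendar_year
        ORDER BY d.calendar_year
    """).fetchall()


by_order_year = sales_by_year("order_date_key")  # the "Order Date" role
by_ship_year = sales_by_year("ship_date_key")    # the "Ship Date" role
print(by_order_year)
print(by_ship_year)
```

The same two orders land in different years depending on which role is used, even though only one date table exists.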
Parent-Child Hierarchy
• A parent-child hierarchy is a hierarchy in a
standard dimension that contains a parent
attribute. A parent attribute describes a self-join,
within the same dimension table.
• Example: Employee Hierarchy
– An employee reports to his/her manager.
– The manager is an employee as well.
– Employee Key self-joins to ParentEmployeeKey.
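A self-join of this kind can be sketched in a few lines of Python. The employee names and keys are invented, and the dictionary stands in for a dimension table with an employee key and a parent employee key column.

```python
# Each entry: employee_key -> (name, parent_employee_key).
# None marks the top of the hierarchy. Sample data is made up.
employees = {
    1: ("Ann", None),
    2: ("Bob", 1),
    3: ("Cid", 1),
    4: ("Dee", 2),
}


def reporting_chain(employee_key):
    """Follow the parent key upward from an employee to the top manager."""
    chain = []
    key = employee_key
    while key is not None:
        name, parent_key = employees[key]
        chain.append(name)
        key = parent_key
    return chain


def direct_reports(manager_key):
    """Return the names of employees whose parent key is the given manager."""
    return sorted(name for name, parent in employees.values() if parent == manager_key)


print(reporting_chain(4))
print(direct_reports(1))
```

Because the depth of the tree depends on the data rather than on a fixed list of levels, this kind of hierarchy is handled differently from a regular user-defined hierarchy.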
Slowly Changing
Dimension
• Some attribute values may change over time.
• Two basic techniques:
– Type 1 change
– Type 2 change
Slowly Changing
Dimension – Type 1
• A Type 1 change simply overwrites the old value
with the new one.
Slowly Changing
Dimension – Type 2
• You create a new dimension row with the new value and a
new surrogate key, and mark the old row (or timestamp it) as no
longer in effect. The fact table will use the new surrogate key
to link new fact measurements.
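The difference between the two techniques can be sketched as follows. The row layout (customer_key as surrogate key, customer_id as business key, start_date and end_date) is an assumption for illustration; real warehouses vary in how they mark a row as expired.

```python
from datetime import date


def apply_type1(rows, customer_id, new_city):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city


def apply_type2(rows, customer_id, new_city, change_date):
    """Type 2: expire the current row and add a row with a new surrogate key."""
    for row in rows:
        if row["customer_id"] == customer_id and row["end_date"] is None:
            row["end_date"] = change_date  # mark the old row as no longer in effect
    new_key = max(row["customer_key"] for row in rows) + 1
    rows.append({
        "customer_key": new_key,       # new surrogate key
        "customer_id": customer_id,    # business key stays the same
        "city": new_city,
        "start_date": change_date,
        "end_date": None,
    })
    return new_key


def sample_dimension():
    return [{"customer_key": 1, "customer_id": "C1", "city": "Boston",
             "start_date": date(2007, 1, 1), "end_date": None}]


dim_type1 = sample_dimension()
apply_type1(dim_type1, "C1", "Denver")

dim_type2 = sample_dimension()
new_key = apply_type2(dim_type2, "C1", "Denver", date(2008, 6, 1))

print(len(dim_type1), len(dim_type2), new_key)
```

After the Type 2 change, old fact rows still point at surrogate key 1 (Boston) while new fact rows use the returned key, so history is preserved.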
Calculated Member
• A calculated member is a member of a dimension or a
measure group that is defined by an MDX expression.
• The value for the member is calculated at runtime. The
result values are not stored on disk.
Calculated Member
Properties
Named Set

• A named set is an MDX expression that returns a
set of dimension members.
• You can define named sets and save them as
part of the cube definition.
• It allows you to reuse the same named set
throughout the cube.
• Typical example:
– Create a list of the Top 10 customers based on Sales
– You can reuse the same Top 10 customers in different
queries.
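The reuse idea behind a named set can be mimicked in Python: define the set once, then use it in more than one query. This is only an analogy; in a cube the set would be defined with an MDX expression. The customer names, figures and helper top_count are made up.

```python
# Two measures keyed by customer. Sample data is invented.
sales_by_customer = {"Ann": 120, "Bob": 300, "Cid": 80, "Dee": 210}
units_by_customer = {"Ann": 12, "Bob": 25, "Cid": 9, "Dee": 30}


def top_count(values, n):
    """Return the n keys with the largest values, largest first."""
    return sorted(values, key=values.get, reverse=True)[:n]


# Define the "named set" once, based on Sales...
top_customers = top_count(sales_by_customer, 2)

# ...and reuse the same set in two different "queries".
top_sales = {c: sales_by_customer[c] for c in top_customers}
top_units = {c: units_by_customer[c] for c in top_customers}

print(top_customers)
print(top_sales)
print(top_units)
```

Note that the second query reports units for the customers ranked by sales, which is exactly the point of sharing one set definition.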
Best practices for Cube
Design
• Use integer or numeric data types for key columns.
• Avoid ROLAP storage mode, particularly with
custom rollups or unary operators. MOLAP is the
fastest storage structure in SSAS.
• Use parent-child dimensions prudently,
especially those containing custom rollups and
unary operators. Parent-child (PC) dimensions have
no aggregation support.
Best practices for Cube
Design (Contd..)
• Use role-playing dimensions (e.g. OrderDate,
BillDate, ShipDate) to avoid multiple physical
copies. If the dimensions are based on the
same physical table(s), use role-playing
dimensions.
What's New (Analysis
Services -
Multidimensional
Database)
• New Attribute Relationship designer. The dimension
editor has a new Attribute Relationship designer that
makes it easier to browse and modify attribute
relationships.

• New AMO Warnings. These new warning messages alert
users when they depart from design best practices or
make logical errors in database design.
What's New (Analysis
Services -
Multidimensional
Database)
• Backup and Restore Improvements
• The backup and restore functionality in Analysis Services has a new
storage structure and enhanced performance in all backup and
restore scenarios.
• Improved Storage Structure
• The new storage structure provides a more robust repository for the
archived database. By using the new storage structure, there is no
practical limit to the size of the database file, nor is there a limit to
the number of files that a database can have.
• Improved Performance
• The new backup and restore functionality achieves increased
performance. Tests on different sized databases and with various
numbers of files have shown significant performance improvements.
What's New (Analysis
Services -
Multidimensional
Database)
• Dynamic Management Views
• Monitoring Connections, Sessions, and Commands
Discover_Connections, Discover_Sessions, and
Discover_Commands.
• select * from $system.discover_connections
Fetching Data from
Cube
• What Is MDX
• Testing MDX with the Query Tool in SQL Server
Management Studio
• The Basic Elements of an MDX Query
What Is MDX
• An Extension of SQL Syntax That:
– Queries and manipulates multidimensional data
in OLAP cubes
– Defines calculations based on information in the cube
– Defines and populates local cubes

• Not a True Extension –
– Syntax Deviates Significantly from SQL
Testing MDX with
Management Studio
Background

Select
on axis (x),
on axis (y),
on axis (z)
From [cubeName]
Every cell has a
name...

[Diagram: a cube with three axes: Products (Components, Clothing, Bikes), Time (1997, 1998, 1999, 2000, 2001) and Measures (Sales, Cost, Units).]
(Products.Bikes, Measures.Units, Time.[2000])
(Products.Bikes, Measures.Sales, Time.[1999])
A Cell is referenced by all the
dimensions

What if I only specify this?


(Products.Bikes, Measures.Units)

Default Member
What if I only specify this?
(Products.Bikes, Measures.Units)
If Time’s default member is [1997]
Ans: (Products.Bikes, Measures.Units, Time.[1997])

[Diagram: a cube with three axes: Products (Computer, Monitor, Printer), Time (1997, 1998, 1999, 2000, 2001) and Measures (Sales, Cost, Units).]
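The way a partial tuple is completed with default members can be sketched like this. The DEFAULT_MEMBERS table and the resolve_tuple helper are hypothetical; only the Time default of 1997 comes from the slide, and the other defaults are placeholders.

```python
# Default member of each dimension. Time's default follows the slide's example;
# the Products and Measures defaults are assumptions for this sketch.
DEFAULT_MEMBERS = {"Products": "All", "Measures": "Sales", "Time": "1997"}


def resolve_tuple(partial):
    """Fill in every dimension missing from a partial tuple with its default member."""
    full = dict(DEFAULT_MEMBERS)  # start from the defaults
    full.update(partial)          # explicitly named members win
    return full


cell = resolve_tuple({"Products": "Bikes", "Measures": "Units"})
print(cell)
```

Here the Time coordinate is filled in as 1997, matching the answer on the slide: (Products.Bikes, Measures.Units, Time.[1997]).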
The Basic Elements of
an MDX Query
Select
{[Ship Date].[Calendar]} on columns,
{[Product].[Product Categories]} on rows
from [Adventure Works]
Using Braces { }

• Braces Denote a Set
• Braces Can be Omitted when the Set is Unambiguous.
• In SSAS 2005 / 2008:
SELECT
[Ship Date].[Calendar] ON COLUMNS,
[Product].[Product Categories] ON ROWS
FROM [Adventure Works]

• In AS 2000:
SELECT
{[Ship Date].[Calendar]} ON COLUMNS,
{[Product].[Product Categories]} ON ROWS
FROM [Adventure Works]
Using Brackets [ ]

• Brackets Enclose a String Value
• Necessary for:
– Field names with spaces: [New York], [Mary Lo]
– Numbers as field names: [2007], [2008]
• Otherwise, SSAS will treat them as numeric constants
Default Members

• Every Dimension has a Default Member
– Usually the “All” member is the default member.
• Default Measures
– The Measures dimension also has a default member, the default measure
– In our sample cube [Adventure Works], the default
measure for the cube is [Reseller Sales Amount]
Members
You want to query more than a single cell.
Use the Members function.
The Members function returns the set of members in a dimension, level, or
hierarchy.
select
[Ship Date].[Calendar] on columns,
[Product].[Product Categories].members on rows
from [Adventure Works]
Test Yourself: Number 1
[Ship Date].[Calendar] also has a membership; that is, it is made up of
more granular information. Modify the query to return the
membership of the [Ship Date].[Calendar] dimension.
select
[Ship Date].[Calendar] on columns,
[Product].[Product Categories].members on rows
from [Adventure Works]

Desired result:
Naming Additional
Dimensions
Number    Name
AXIS(0)   COLUMNS
AXIS(1)   ROWS
AXIS(2)   PAGES
AXIS(3)   SECTIONS
AXIS(4)   CHAPTERS
Retrieving Data from a
Cube
select
[Ship Date].[Calendar].[Calendar Year].[CY 2004] on axis(0),
[Promotion].[Promotions].[reseller] on axis(1)
from [Adventure Works]
[Diagram: a grid with [Promotion].[Promotions] members (No Discount, Reseller) on one axis and [Ship Date].[Calendar] years (2001, 2002, 2003, 2004) on the other.]
Test Yourself: Number 2

• Modify the query to return the sales of Bikes with
No Discount

select
[Ship Date].[Calendar].[Calendar Year].[CY 2004] on axis(0),
[Promotion].[Promotions].[reseller] on axis(1)
from [Adventure Works]

Expected result:
Fully Qualified Names
• [CY 2001] below could be
– [Delivery Date].[Calendar].[CY 2001] or
– [Ship Date].[Calendar].[CY 2001]
select
[CY 2001] on axis(0)
from [Adventure Works]

• [Product].[Product Categories].[bikes] is the same as
[Product].[Product Categories].[All Products].[bikes]
Two Dimensions with
Where Clause
select
[Ship Date].[Calendar].[Calendar Year].members on axis(0),
[Promotion].[Promotions].[reseller] on axis(1)
from [Adventure Works]
where [Product].[Product Categories].[bikes]

[Diagram: a cube with three dimensions: [Promotion].[Promotions] (No Discount, Reseller), [Ship Date].[Calendar] (2001, 2002, 2003, 2004) and [Product].[Product Categories] (Components, Clothing, Bikes).]
Demo

• Lab MDX Query


Few Useful References

• www.microsoft.com/sqlserver/2008/en/us/analysis-services.aspx
• All BI WebCasts - http://www.microsoft.com/events/series/bi.aspx?tab=webcasts&id=all
• MDX References – msdn.microsoft.com/en-us/library/ms145506.aspx
Thank You
pundit.prem@gmail.com