MSBI, Data Warehousing and Data Integration Techniques


By

Quontra Solutions
Email : info@quontrasolutions.com
Contact : 404-900-9988
WebSite : www.quontrasolutions.com

Agenda
What is BI?
What is Data Warehousing?
Microsoft platform for BI applications
Data integration methods
T-SQL examples on data integration

What is BI?
Business Intelligence (BI) is a collection of theories,
algorithms, architectures, and technologies that
transform raw data into meaningful information,
helping users make strategic decisions in the
interest of their business.

BI Case
For example, the senior management of a company
can inspect sales revenue by product and/or
department, or by associated costs and income.
BI technologies provide historical, current, and
predictive views of business operations, so
management can make strategic or operational
decisions more easily.

Typical BI Flow
Data Sources -> Extraction -> Data Warehouse -> Data Tools -> Users

Why BI?
By using BI, management can monitor objectives
from a high level, understand what is happening
and why it is happening, and take the necessary
steps when objectives are not fulfilled.
Objectives:
1) Business Operations Reporting
2) Forecasting
3) Dashboard
4) Multidimensional Analysis
5) Finding correlation among different factors

What is Data Warehousing?

A data warehouse is a subject-oriented,
integrated, time-variant and non-volatile
collection of data in support of management's
decision making process.
- Bill Inmon

A data warehouse is a copy of transaction data
specifically structured for query and analysis.
- Ralph Kimball

Dimensional Data Model

Although the dimensional data model is still a relational
model, it stores data differently from a model in 3rd
normal form.
Dimension: A category of information. Ex. the Time
dimension.
Attribute: A unique level within a dimension. Ex. Month is
an attribute in the Time dimension.
Hierarchy: The specification of levels that represents the
relationship between different attributes within a
dimension. Ex. one possible hierarchy in the Time
dimension is Year → Quarter → Month → Day.
Fact Table: A table that contains the measures of
interest. Ex. Sales Amount is a measure.

Star Schema: A single object (the fact table) sits in
the middle and is radially connected to the
surrounding objects (dimension lookup tables), like a star.
Each dimension is represented as a single table. The
primary key in each dimension table is related to a
foreign key in the fact table.
Snowflake Schema: An extension of the star
schema, where each point of the star explodes into more
points. In a star schema, each dimension is represented
by a single dimension table, whereas in a snowflake
schema, that dimension table is normalized into
multiple lookup tables, each representing a level in the
dimensional hierarchy.
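
As a rough illustration of the star schema described above, the sketch below defines one fact table and two dimension tables in T-SQL. All table and column names (dbo.DimDate, dbo.DimProduct, dbo.FactSales and their columns) are hypothetical and chosen only for this example.

```sql
-- Hypothetical star schema: one fact table surrounded by two dimensions.
CREATE TABLE dbo.DimDate
(
    DateKey         int      NOT NULL PRIMARY KEY,  -- surrogate key, e.g. 20240131
    FullDate        date     NOT NULL,
    CalendarYear    smallint NOT NULL,
    CalendarQuarter tinyint  NOT NULL,
    CalendarMonth   tinyint  NOT NULL
);

CREATE TABLE dbo.DimProduct
(
    ProductKey  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ProductName varchar(100) NOT NULL,
    Category    varchar(50)  NOT NULL  -- kept inline: star, not snowflake
);

-- Each foreign key in the fact table points at a dimension's primary key.
CREATE TABLE dbo.FactSales
(
    DateKey     int NOT NULL REFERENCES dbo.DimDate (DateKey),
    ProductKey  int NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    SalesAmount decimal(18,2) NOT NULL  -- the measure
);

-- A typical analytical query: join the fact to its dimensions and aggregate.
SELECT d.CalendarYear, p.Category, SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimDate    AS d ON d.DateKey    = f.DateKey
JOIN dbo.DimProduct AS p ON p.ProductKey = f.ProductKey
GROUP BY d.CalendarYear, p.Category;
```

In a snowflake variant of this sketch, Category would move out of dbo.DimProduct into its own lookup table, with dbo.DimProduct holding only a key to it.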

After the team and tools are finalized, a data warehouse
project follows the steps below in waterfall order:
a) Requirement Gathering
b) Physical Environment Setup
c) Data Modeling
d) ETL
e) OLAP Cube Design
f) Front End Development
g) Report Development
h) Performance Tuning and Query Optimization
i) Data Quality Assurance
j) Rolling out to Production
k) Production Maintenance
l) Incremental Enhancements

Microsoft BI Tools
SSIS (SQL Server Integration Services): This tool in the
MSBI suite performs many kinds of data transfer, with the
flexibility of a customized data flow. It is typically used
to implement ETL processes in data warehouses.
SSRS (SQL Server Reporting Services): Provides a variety
of reports and can deliver them in multiple formats. It can
work with different kinds of data sources.
SSAS (SQL Server Analysis Services): The MSBI tool for
creating cubes and data mining models from a DW. A typical
cube uses the DW as its data source and builds a
multidimensional database on top of it.

Power View and Power Pivot: These are self-service BI
tools provided by Microsoft. They have a low maintenance
cost and are tightly coupled with Microsoft Excel
reporting, which makes them easier to work with.
PerformancePoint Services (PPS): Supports rapid creation
of PPS reports in a variety of forms, and the form of a
report can be changed with a right-click.
Microsoft also provides scorecards, dashboards, data
mining extensions, SharePoint portals, etc. to serve BI
applications.

Data Integration Methods

RDBMS
Copying data from one table to another table(s)
Bulk / Raw Insert operations
Command line utilities for data manipulation
Partitioning data
File System
Copying file(s) from one location to another
Creating flat files, CSVs, XMLs, Excel spreadsheets
Creating directories / sub-directories

Web
Calling a web service to fetch / trigger data
Accessing an FTP file system
Submitting feedback over the internet
Sending an email / SMS message
Other
Generate Auditing / Logging data
Utilizing / maintaining configuration data (static)
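
Two of the items above, copying data from one table to another and generating auditing / logging data, can be sketched together in T-SQL. This is only a minimal illustration using table variables; the names @Source, @Target, and @AuditLog and their columns are made up for the example.

```sql
-- Hypothetical source, target, and audit tables (table variables for brevity).
DECLARE @Source   TABLE (id int, name varchar(50));
DECLARE @Target   TABLE (id int, name varchar(50));
DECLARE @AuditLog TABLE (LoggedAt datetime2, StepName varchar(50), RowsCopied int);
DECLARE @Rows int;

INSERT INTO @Source (id, name)
VALUES (1, 'Alpha'), (2, 'Beta'), (3, NULL);

-- Copy only the rows that pass a simple quality rule.
INSERT INTO @Target (id, name)
SELECT id, name
FROM @Source
WHERE name IS NOT NULL;

-- Capture the row count right away; later statements overwrite @@ROWCOUNT.
SET @Rows = @@ROWCOUNT;

-- Record what the step did so the load can be audited afterwards.
INSERT INTO @AuditLog (LoggedAt, StepName, RowsCopied)
VALUES (SYSDATETIME(), 'Copy Source to Target', @Rows);

SELECT * FROM @AuditLog;
```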

T-SQL Best Practices

Query to merge data into a table

MERGE dbo.myDestinationTable AS dest
USING
(
    -- One row per product with its first and last purchase date
    SELECT ProductID
         , MIN(PurchaseDate) AS MinTrxDate
         , MAX(PurchaseDate) AS MaxTrxDate
    FROM dbo.mySourceTable
    WHERE ProductID IS NOT NULL
    GROUP BY ProductID
) AS src
ON dest.ProductID = src.ProductID
WHEN MATCHED THEN
    UPDATE SET MaxTrxDate = src.MaxTrxDate
             , MinTrxDate = ISNULL(dest.MinTrxDate, src.MinTrxDate)
WHEN NOT MATCHED BY SOURCE THEN DELETE
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, MinTrxDate, MaxTrxDate)
    VALUES (src.ProductID, src.MinTrxDate, src.MaxTrxDate);

The MERGE statement is a favorite of T-SQL programmers because it covers
3 operations (INSERT, UPDATE and DELETE) in one statement.
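
To see which of the three operations MERGE applied to each row, an OUTPUT clause with $action can be added. The sketch below is self-contained and uses hypothetical table variables (@Dest, @Src) instead of the tables from the slide.

```sql
-- Hypothetical destination and source data.
DECLARE @Dest TABLE (ProductID int PRIMARY KEY, Qty int);
DECLARE @Src  TABLE (ProductID int PRIMARY KEY, Qty int);

INSERT INTO @Dest (ProductID, Qty) VALUES (1, 10), (2, 20);
INSERT INTO @Src  (ProductID, Qty) VALUES (2, 25), (3, 30);

MERGE @Dest AS dest
USING @Src AS src
    ON dest.ProductID = src.ProductID
WHEN MATCHED THEN
    UPDATE SET Qty = src.Qty                      -- product 2 exists on both sides
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, Qty)
    VALUES (src.ProductID, src.Qty)               -- product 3 is new
WHEN NOT MATCHED BY SOURCE THEN
    DELETE                                        -- product 1 is gone from the source
OUTPUT $action            AS MergeAction,         -- 'INSERT', 'UPDATE' or 'DELETE'
       inserted.ProductID AS NewProductID,
       deleted.ProductID  AS OldProductID;
```

Note that WHEN NOT MATCHED BY SOURCE THEN DELETE removes every destination row that is missing from the source, so it is only appropriate when the source holds the complete data set.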

Query to get a sequence using CTE

;WITH myTable (id) AS
(
    SELECT 1 AS id
    UNION ALL
    SELECT id + 1 FROM myTable
    WHERE id < 10
)
SELECT * FROM myTable;

COMMON TABLE EXPRESSIONS (CTEs) are the most popular
recursive constructs in T-SQL
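
The same recursive pattern can generate rows for a Time dimension. The sketch below is one possible way to do it; the date range, the CTE name Calendar, and the output column names are assumptions made for illustration.

```sql
-- Generate one row per day for an example date range.
DECLARE @StartDate date = '20240101';
DECLARE @EndDate   date = '20240131';

;WITH Calendar (CalendarDate) AS
(
    SELECT @StartDate                         -- anchor member: first day
    UNION ALL
    SELECT DATEADD(DAY, 1, CalendarDate)      -- recursive member: next day
    FROM Calendar
    WHERE CalendarDate < @EndDate
)
SELECT CalendarDate
     , YEAR(CalendarDate)              AS CalendarYear
     , DATEPART(QUARTER, CalendarDate) AS CalendarQuarter
     , MONTH(CalendarDate)             AS CalendarMonth
FROM Calendar
OPTION (MAXRECURSION 366);                    -- the default limit is 100 levels
```

The MAXRECURSION hint matters for longer ranges: without it, a recursive CTE that needs more than 100 levels fails with an error.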

Move Rows in a single Query

DECLARE @Table1 TABLE (id int, name varchar(50));
INSERT @Table1 VALUES (1, 'Maxwell'), (2, 'Miller'), (3, 'Dhoni');
DECLARE @Table2 TABLE (id int, name varchar(50));
-- Delete from @Table1 and write the deleted rows into @Table2
DELETE FROM @Table1 OUTPUT deleted.* INTO @Table2;
SELECT * FROM @Table1;
SELECT * FROM @Table2;

The OUTPUT clause redirects the rows affected by an
UPDATE, DELETE or INSERT into a specified table
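
OUTPUT also works with UPDATE, where the deleted and inserted prefixes expose the values before and after the change. The sketch below uses hypothetical table variables (@Product, @PriceAudit) to record a price change for auditing.

```sql
-- Hypothetical product and audit tables.
DECLARE @Product    TABLE (id int, price decimal(10,2));
DECLARE @PriceAudit TABLE (id int, old_price decimal(10,2), new_price decimal(10,2));

INSERT INTO @Product (id, price) VALUES (1, 10.00), (2, 20.00);

-- Raise one price and capture both the old and the new value.
UPDATE @Product
SET price = price * 1.10
OUTPUT deleted.id, deleted.price, inserted.price   -- before / after images
INTO @PriceAudit (id, old_price, new_price)
WHERE id = 2;

SELECT * FROM @PriceAudit;
```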

Query to generate random password

-- Each term picks one character from the printable ASCII range
SELECT CHAR(32 + (RAND() * 94))
     + CHAR(32 + (RAND() * 94))
     + CHAR(32 + (RAND() * 94))
     + CHAR(32 + (RAND() * 94))
     + CHAR(32 + (RAND() * 94))
     + CHAR(32 + (RAND() * 94));

Non-deterministic functions like RAND() give a
different result for each evaluation
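
One detail worth trying alongside this: an unseeded RAND() reference is evaluated once per query, not once per row, so it repeats the same value on every row of a result set. A common workaround is to derive a per-row value from NEWID(). The sketch below uses a small inline row set; the column names are made up for illustration.

```sql
-- Compare a per-query random value with a per-row random value.
SELECT n.id
     , RAND()                          AS same_for_every_row  -- evaluated once for the query
     , ABS(CHECKSUM(NEWID()) % 100)    AS varies_per_row      -- NEWID() is evaluated per row
FROM (VALUES (1), (2), (3)) AS n (id);
```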

Funny T-SQL: Try it yourself

Alias behavior is not consistent
SELECT 1id, 1.eMail, 1.0eMail, 1eMail
Ever seen a WHERE clause in a SELECT without a FROM clause?
SELECT 1 AS id WHERE 1 = 1
IN clause expects a column name on its left? Well, not really!
SELECT * FROM myTable WHERE 'searchtext' IN (Col1, Col2, Col3)
Two = operators in a single assignment in UPDATE? Possible!
DECLARE @ID INT = 0
UPDATE mySequenceTable SET @ID = ID = @ID + 1
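
The last trick can be tried without an existing table. The sketch below is a self-contained version using a hypothetical table variable @Seq; note that the order in which rows receive their numbers is not guaranteed.

```sql
-- Hypothetical table whose ID column starts out empty.
DECLARE @Seq TABLE (ID int NULL, name varchar(20));
INSERT INTO @Seq (name) VALUES ('first'), ('second'), ('third');

DECLARE @ID int = 0;

-- For each row: compute @ID + 1, store it in the ID column and in @ID.
UPDATE @Seq SET @ID = ID = @ID + 1;

SELECT ID, name FROM @Seq;   -- every row now has a distinct ID
SELECT @ID AS LastID;        -- the variable holds the last value assigned
```

When the numbering must follow a specific order, ROW_NUMBER() with an ORDER BY is the more predictable choice.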

Thank you!!
