An Introduction to Data Virtualization
in Business Intelligence
David M Walker
Data Management & Warehousing
http://datamgmt.com
18 October 2013
What Is Data Virtualization?
Wikipedia:
Data virtualization is [..] an application to retrieve
and manipulate data without requiring technical
details about the data, such as how it is formatted
or where it is physically located.
Or more simply:
A solution that sits in front of multiple data
sources and allows them to be treated as a single
SQL database
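As a rough illustration of the "single SQL database" idea, the sketch below uses Python's built-in sqlite3 module and SQLite's ATTACH DATABASE to query two separate databases in one statement. This is only an analogy for what a DV platform does across heterogeneous sources; the table names, column names, and data are hypothetical.

```python
import sqlite3

# Two independent in-memory databases stand in for separate source systems.
# All table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")                      # "main" plays a CRM system
conn.execute("ATTACH DATABASE ':memory:' AS billing")   # a second, separate source

conn.execute("CREATE TABLE main.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO main.customers VALUES (?, ?)",
                 [(1, "Joe Bloggs"), (2, "Jane Smith")])
conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# One SQL statement spans both sources, as if they were a single database.
rows = conn.execute("""
    SELECT c.name, SUM(i.amount) AS total
    FROM main.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)
```

The reporting tool only ever sees the joined result; where each table physically lives is hidden behind the connection.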
Basic Model
End users access via Reporting Tools
Traditional Databases
IBM (DB2 & Netezza)
Microsoft (SQL Server)
Oracle (Oracle & MySQL)
Postgres
Sybase (ASE & IQ)
Etc.
NoSQL / NewSQL
Apache Hadoop
Cassandra
Mongo
Neo4j
etc.
Other Formats
Microsoft Office
Messaging
Flat Files
XML
Web
Cloud
Application APIs
etc.
Data Virtualization Platform
Defines a 'model' of the source systems (similar in concept to a BO universe)
Models can generally be layered on top of other models
ETL treats DV platform as a source
Data Publishing
Batch/RESTful
Message Based
SOA/Publication
Advanced Features:
Role Based Access Control & Data Masking
First Name  Last Name  DoB          Salary
Joe         Bloggs     30-Jan-1983  £60,100
Jane        Smith      17-Jun-1978  £73,400
Data Virtualization Platform:
Manages sensitive information based on a user's role
Role Based Authentication
User 1:
First Name  Last Name  DoB          Salary
Joe         Bloggs     30-Jan-1983  NULL
Jane        Smith      17-Jun-1978  NULL
User 2:
First Name  Last Name  Age
Joe         Bloggs     30
Jane        Smith      33
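The role-based views above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular DV product implements masking: the role names (hr_admin, manager, analyst), the field names, and the rule that turns a date of birth into an age are all assumptions made for the example.

```python
from datetime import date

# Hypothetical source rows, loosely modelled on the slide's example table.
EMPLOYEES = [
    {"first_name": "Joe", "last_name": "Bloggs",
     "dob": date(1983, 1, 30), "salary": 60100},
    {"first_name": "Jane", "last_name": "Smith",
     "dob": date(1978, 6, 17), "salary": 73400},
]

def age_on(dob, today):
    """Whole years between dob and today."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

def view_for(role, rows, today):
    """Return the rows as a given (hypothetical) role is allowed to see them."""
    out = []
    for r in rows:
        if role == "hr_admin":      # full access to every column
            out.append(dict(r))
        elif role == "manager":     # salary column masked with NULL (None)
            out.append({**r, "salary": None})
        elif role == "analyst":     # DoB generalised to an age, salary dropped
            out.append({"first_name": r["first_name"],
                        "last_name": r["last_name"],
                        "age": age_on(r["dob"], today)})
        else:
            raise PermissionError(f"role {role!r} has no access")
    return out
```

Calling view_for("manager", EMPLOYEES, date.today()) returns every column with the salary set to None, while the analyst role never receives the salary or the exact date of birth at all.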
Advanced Features:
Caching
Data Virtualization Platform
Local Database Table with good connectivity
Remote Database Table with poor connectivity
Cached Copy of Remote Database Table
User sees performance as if all the data was local
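A minimal sketch of the caching idea, assuming a simple time-to-live (TTL) refresh policy; real platforms offer other policies (scheduled refresh, invalidation on change) that are not shown here. The class name CachedTable and its parameters are hypothetical.

```python
import time

class CachedTable:
    """Serve a remote table from a local copy, refreshing it after ttl seconds."""

    def __init__(self, fetch_remote, ttl_seconds, clock=time.monotonic):
        self._fetch_remote = fetch_remote   # slow call to the remote source
        self._ttl = ttl_seconds
        self._clock = clock                 # injectable so the TTL can be tested
        self._rows = None
        self._loaded_at = None
        self.remote_calls = 0               # counter kept for illustration only

    def rows(self):
        now = self._clock()
        stale = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if stale:
            # Only a stale (or empty) cache pays the cost of the remote fetch.
            self._rows = list(self._fetch_remote())
            self._loaded_at = now
            self.remote_calls += 1
        return self._rows
```

Every call to rows() within the TTL is answered from the local copy, so the poorly connected source is hit at most once per refresh interval.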
Advanced Features:
Creating a Canonical Data Model
Data Virtualization Platform
Finance System
Billing System
CRM System
Website
Other Systems
Data mapped to conform to a Canonical Model
User sees system as a single CDM and not multiple sources
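Mapping sources onto a canonical model can be pictured as a per-source field mapping applied at query time. The sketch below assumes two hypothetical sources ("finance" and "crm") whose field names differ; none of these names come from the slides.

```python
# Hypothetical field mappings: canonical name -> name used by each source system.
FIELD_MAPS = {
    "finance": {"customer_id": "cust_no", "name": "cust_name", "country": "ctry"},
    "crm": {"customer_id": "id", "name": "full_name", "country": "country_code"},
}

def to_canonical(source, record):
    """Rename one source record's fields to the canonical model's names."""
    mapping = FIELD_MAPS[source]
    return {canonical: record[local] for canonical, local in mapping.items()}

def canonical_view(records_by_source):
    """Present records from several sources as one list in canonical form."""
    return [to_canonical(source, rec)
            for source, recs in records_by_source.items()
            for rec in recs]
```

A consumer of canonical_view only ever sees customer_id, name, and country, regardless of which system a record came from.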
But it's not a Silver Bullet
Can be slow
Depending on how much data has to be fetched from remote
systems to the DV platform; platforms try to be smart to
reduce this
Can impact performance on underlying systems
Lots of BI users making queries on resource sensitive OLTP
systems is not a good idea
Requires Resources
Another set of servers, technologies, etc. to manage, but this
cost is often offset against the reduction in complexity
elsewhere.
Not a replacement; it is an additional tool
You will still need ETL and Messaging
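On the "can be slow" point: one common way a query layer reduces the data fetched from a remote system is to push filters down to the source rather than filtering after the fetch. The slides do not name a specific technique, so treat this as an assumed example. An in-memory SQLite database stands in for the remote source, and the table and data are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a remote source (hypothetical data).
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
remote.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "EU" if i % 4 == 0 else "US", float(i)) for i in range(1000)],
)

def fetch_then_filter(region):
    """Naive plan: pull every row to the platform, then filter locally."""
    fetched = remote.execute("SELECT id, region, amount FROM orders").fetchall()
    return [r for r in fetched if r[1] == region], len(fetched)

def filter_at_source(region):
    """Pushdown plan: send the filter to the source so fewer rows travel."""
    fetched = remote.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?", (region,)
    ).fetchall()
    return fetched, len(fetched)

naive_rows, naive_transferred = fetch_then_filter("EU")
pushed_rows, pushed_transferred = filter_at_source("EU")
```

Both plans return the same rows; the difference is how many rows cross the network, which is what dominates when the source is remote.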
BI Use Cases:
Agile Data Mart Design
Access data
warehouse data
quickly and easily
Design the data mart
you think you want
Test it with real data
and your actual
reporting tool
Also possible with data
warehouse design
Data Warehouse
Data Virtualization Platform
A OR B
BI Use Case:
Virtual Data Marts
Big Tin Appliance with
lots of horse power?
Don't want to duplicate
data in the appliance
and consume disk
space for a data mart
but want the star
schema for ease of
use?
Data Warehouse
Data Virtualization Platform
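A virtual data mart can be approximated with plain SQL views: the star schema exists as view definitions over the warehouse tables, so no rows are duplicated. The sketch below uses Python's sqlite3 module as a stand-in for the warehouse; the tables, columns, and data are hypothetical.

```python
import sqlite3

# Hypothetical, normalised warehouse tables held in an in-memory SQLite database.
wh = sqlite3.connect(":memory:")
wh.executescript("""
    CREATE TABLE customer (customer_id INTEGER, name TEXT, country TEXT);
    CREATE TABLE sale (sale_id INTEGER, customer_id INTEGER,
                       sold_on TEXT, amount REAL);
    INSERT INTO customer VALUES (1, 'Joe Bloggs', 'GB'), (2, 'Jane Smith', 'LV');
    INSERT INTO sale VALUES (10, 1, '2013-10-01', 20.0),
                            (11, 1, '2013-10-02', 30.0),
                            (12, 2, '2013-10-02', 45.0);

    -- The star schema exists only as views: no rows are copied.
    CREATE VIEW dim_customer AS
        SELECT customer_id AS customer_key, name, country FROM customer;
    CREATE VIEW fact_sales AS
        SELECT sale_id, customer_id AS customer_key, sold_on, amount FROM sale;
""")

# A reporting tool would query the views like any physical data mart.
by_country = wh.execute("""
    SELECT d.country, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS d USING (customer_key)
    GROUP BY d.country
    ORDER BY d.country
""").fetchall()
print(by_country)
```

The trade-off is that every report query is executed against the underlying warehouse, which is why this pattern suits an appliance with spare processing power.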
BI Use Case:
Data Mart Extensions
Existing (physical) data
mart
New Data source that
needs to be
incorporated quickly
Create virtual copy of
existing data mart and
data source
Integrate into updated
data mart design
Data Virtualization Platform
New Data Source
Data Mart
BI Use Case:
Agile Set Based ELT Design
If your normal ETL style
is a series of set SQL
queries built on top of
each other then you
can quickly prototype
ETL before moving it
into your normal ETL
engine to persist and
execute (normally for
performance)
Source Source Source
Data Virtualization Platform
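The "set SQL queries built on top of each other" style maps naturally onto layered views. In the sketch below each transformation step is a view over the previous one, and the final CREATE TABLE AS stands in for handing the agreed logic to the normal ETL engine to persist. SQLite is used only for illustration, and all table, view, and column names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")   # stands in for the DV platform's SQL layer
db.executescript("""
    -- Hypothetical raw source table with untidy values.
    CREATE TABLE src_orders (order_id INTEGER, customer TEXT, amount TEXT);
    INSERT INTO src_orders VALUES (1, ' joe ', '10.50'),
                                  (2, 'JANE', '4.00'),
                                  (3, 'Jane', 'n/a');

    -- Step 1: clean. Each step is a view built on the previous one.
    CREATE VIEW stg_clean AS
        SELECT order_id, LOWER(TRIM(customer)) AS customer, amount
        FROM src_orders;
    -- Step 2: keep only rows whose amount starts with a digit, then cast it.
    CREATE VIEW stg_valid AS
        SELECT order_id, customer, CAST(amount AS REAL) AS amount_num
        FROM stg_clean
        WHERE amount GLOB '[0-9]*';
    -- Step 3: aggregate per customer.
    CREATE VIEW out_customer_totals AS
        SELECT customer, SUM(amount_num) AS total
        FROM stg_valid
        GROUP BY customer;
""")

prototype = db.execute(
    "SELECT customer, total FROM out_customer_totals ORDER BY customer"
).fetchall()

# Once the logic is agreed, the same query can be persisted as a table.
db.execute("CREATE TABLE customer_totals AS SELECT * FROM out_customer_totals")
persisted = db.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
```

Because each step is only a view, a change to an early step is visible in the final output immediately, which is what makes the prototype quick to iterate on.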
BI Use Case:
Big Data Integration
DV Platform
connects to Big Data
Sources
Data Sources are
mapped into DV
User accesses them
via standard tools
(SQL, RESTful
interfaces, etc.)
Data Virtualization Platform
SQL Interface
Map Reduce, etc. Interface
SQL based Tools
BI Use Case:
Source System Analysis
Apply your data quality
and data profiling tools
to all your data sources
Look for relationships
across systems
Remove limitations of
accessibility by
enabling caching so
that you are not hitting
the source system but
have fresh data
Source Source Source
Data Virtualization Platform
Data Quality & Profiling Tools
BI Use Case:
Data Masking
Currently building two
versions of a data
mart, one with
sensitive data in and
one without
Instead build one and
use Role Based Access
Control (RBAC) to
restrict what an
individual can see
Physical Data Mart
Data Virtualization Platform
AND
BI Use Cases
Some examples
Usefulness of each example depends on the
organization
Generally an enabler for more agility
Quicker prototyping and integration
Will not solve all your problems
And has a cost associated with it (license &
hardware)
Vendors: What The Analysts Say
Forrester Wave Data
Virtualization Q1 2012
Forrester Wave Q1/12
Informatica
IBM
Denodo
EU (Spanish) Origins
Composite
Now part of Cisco
Was OEM'd by Informatica
Microsoft
SAP
And others
Gartner
No Magic Quadrant, instead
includes Data Virtualization
in Data Integration
Vendors: Product Positioning
Stand Alone
Players
Cisco (Composite)
Denodo
Selection
Popular where IBM/
Informatica are not already
embedded
Integrated
Players
IBM
Informatica
Selection
Popular with organisations
that already have the vendor
ETL tool
THANK YOU