An Introduction to Data Virtualization in Business Intelligence
David M Walker
Data Management & Warehousing
http://datamgmt.com
18 October 2013
What Is Data Virtualization?
• Wikipedia:
“Data virtualization is [..] an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located.”
• Or more simply:
A solution that sits in front of multiple data sources and allows them to be treated as a single SQL database
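The "single SQL database" idea can be illustrated with a small sketch. This is not a DV platform. SQLite's `ATTACH DATABASE` is only a stand-in for the way one connection can expose tables that live in separate stores. The schema names `crm` and `finance`, the tables and the data are all hypothetical.

```python
import sqlite3

# One connection acts as the single SQL entry point (the "virtual" layer).
conn = sqlite3.connect(":memory:")
# Two separate in-memory databases stand in for independent source systems.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS finance")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE finance.invoices (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Joe Bloggs"), (2, "Jane Smith")])
conn.executemany("INSERT INTO finance.invoices VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# SQLite does not let a non-TEMP view reference tables in another attached
# database, so a TEMP view is used to span both "systems".
conn.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name, SUM(i.amount) AS revenue
    FROM crm.customers AS c
    JOIN finance.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
""")

# The consumer issues one query and never sees which system holds which table.
rows = conn.execute("SELECT name, revenue FROM customer_revenue ORDER BY name").fetchall()
print(rows)  # [('Jane Smith', 75.0), ('Joe Bloggs', 150.0)]
```

A real platform would reach different products over the network rather than schemas in one process. The consumer-facing shape is still the same: one query against one logical database.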
Basic Model
• End users access via Reporting Tools
• Traditional Databases
– IBM (DB2 & Netezza)
– Microsoft (SQL Server)
– Oracle (Oracle & MySQL)
– Postgres
– Sybase (ASE & IQ)
– etc.
• NoSQL / NewSQL
– Apache Hadoop
– Cassandra
– Mongo
– Neo4j
– etc.
• Other Formats
– Microsoft Office
– Messaging
– Flat Files
– XML
– Web
– Cloud
– Application APIs
– etc.
• Data Virtualization Platform
– Defines a 'model' of the source systems (similar in concept to a BO universe)
– Models can generally be layered on top of other models
• ETL treats the DV platform as a source
• Data Publishing
– Batch/RESTful
– Message Based
– SOA/Publication
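Layering models on top of other models can be sketched with plain SQL views. The lower view renames columns and decodes values from a raw source table. The upper view applies a business rule and is the only thing a reporting tool or ETL job would read. All table names, column names and status codes below are hypothetical, and SQLite stands in for the platform's modelling layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Raw table as it might exist in a (hypothetical) source system.
conn.execute("CREATE TABLE src_orders (ord_no INTEGER, cust TEXT, amt_pence INTEGER, status TEXT)")
conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?, ?)", [
    (1, "Joe Bloggs", 12000, "C"),
    (2, "Jane Smith", 4550, "X"),
    (3, "Jane Smith", 9900, "C"),
])

# Layer 1: source model with friendly names and units, no business rules.
conn.execute("""
    CREATE VIEW orders AS
    SELECT ord_no AS order_id,
           cust AS customer,
           amt_pence / 100.0 AS amount_gbp,
           CASE status WHEN 'C' THEN 'complete' WHEN 'X' THEN 'cancelled' END AS status
    FROM src_orders
""")

# Layer 2: business model built on top of layer 1.
conn.execute("""
    CREATE VIEW customer_sales AS
    SELECT customer, SUM(amount_gbp) AS total_gbp
    FROM orders
    WHERE status = 'complete'
    GROUP BY customer
""")

# A reporting tool or ETL job treats the top layer as an ordinary source.
rows = conn.execute("SELECT customer, total_gbp FROM customer_sales ORDER BY customer").fetchall()
print(rows)  # [('Jane Smith', 99.0), ('Joe Bloggs', 120.0)]
```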
Advanced Features:
Role Based Access Control & Data Masking
Source data:
First Name  Last Name  DOB          Salary
Joe         Bloggs     30-Jan-1983  £60,100
Jane        Smith      17-Jun-1978  £73,400
Data Virtualization Platform:
Manages sensitive information based on a user's role
Role Based Authentication
User 1 sees:
First Name  Last Name  DOB          Salary
Joe         Bloggs     30-Jan-1983  NULL
Jane        Smith      17-Jun-1978  NULL
User 2 sees:
First Name  Last Name  Age
Joe         Bloggs     30
Jane        Smith      33
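The masking behaviour above can be sketched as a per-role policy applied to each row before it is returned. One role sees everything. One sees the salary as NULL (`None`). One sees an age in place of the date of birth. The role names `hr`, `manager` and `analyst` and the policy itself are hypothetical and are not taken from any vendor's product. Age is computed from a fixed `as_of` date so that the output is repeatable.

```python
from datetime import date

# Hypothetical rows from a source system.
EMPLOYEES = [
    {"first_name": "Joe", "last_name": "Bloggs", "dob": date(1983, 1, 30), "salary": 60100},
    {"first_name": "Jane", "last_name": "Smith", "dob": date(1978, 6, 17), "salary": 73400},
]

def age_on(dob: date, as_of: date) -> int:
    """Whole years between dob and as_of."""
    before_birthday = (as_of.month, as_of.day) < (dob.month, dob.day)
    return as_of.year - dob.year - int(before_birthday)

def view_for(role: str, rows: list, as_of: date) -> list:
    """Apply a hypothetical per-role masking policy; source rows are not modified."""
    if role == "hr":  # full access
        return [dict(r) for r in rows]
    if role == "manager":  # may see who, but not what they earn
        return [{**r, "salary": None} for r in rows]
    if role == "analyst":  # date of birth generalised to age, no salary
        return [
            {"first_name": r["first_name"], "last_name": r["last_name"],
             "age": age_on(r["dob"], as_of)}
            for r in rows
        ]
    raise PermissionError(f"no policy defined for role {role!r}")  # deny by default

as_of = date(2013, 10, 18)
for role in ("hr", "manager", "analyst"):
    print(role, view_for(role, EMPLOYEES, as_of))
```

Keeping the policy in one place, rather than in each copy of the data, is what lets a single source serve every role.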
Advanced Features:
Caching
Data Virtualization Platform
• Local database table with good connectivity
• Remote database table with poor connectivity
• Cached copy of the remote database table
• User sees performance as if all the data were local
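A cached copy of a remote table is commonly implemented as a read-through cache with a time-to-live. Reads are served locally until the copy is older than the TTL, and only then refreshed over the slow link. This is a minimal sketch under that assumption. The class name `TTLCache`, the fake remote function and the 300-second TTL are all hypothetical. Real platforms offer richer refresh policies.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

class TTLCache:
    """Read-through cache: serve a local copy until it is older than `ttl` seconds."""

    def __init__(self, fetch: Callable[[Hashable], Any], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch   # slow call to the remote source
        self._ttl = ttl
        self._clock = clock   # injectable so the sketch can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]   # hit: no round trip over the poor link
        value = self._fetch(key)  # miss or stale: refresh from the remote table
        self._entries[key] = (now, value)
        return value

# Usage with a fake remote source and a fake clock.
remote_calls = []

def query_remote_table(name):
    remote_calls.append(name)
    return [("2013-10-18", 42)]

fake_now = [0.0]
cache = TTLCache(query_remote_table, ttl=300, clock=lambda: fake_now[0])
cache.get("remote_sales")  # first read goes to the remote system
cache.get("remote_sales")  # served from the cached copy
fake_now[0] = 301.0
cache.get("remote_sales")  # copy is stale, so it is refreshed
print(len(remote_calls))   # 2
```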
Advanced Features:
Creating a Canonical Data Model
Data Virtualization Platform
• Source systems: Finance System, Billing System, CRM System, Website, Other Systems
• Data mapped to conform to a Canonical Model
• User sees the system as a single CDM and not multiple sources
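Mapping several systems onto one canonical model amounts to one mapping function per source, each producing the same target type. This sketch assumes a hypothetical canonical `Customer` entity and hypothetical field names in a billing system and a CRM system. In a DV platform, these mappings would live in the platform's model rather than in application code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    """Hypothetical canonical model entity."""
    customer_id: str
    name: str
    country: str
    source: str

def from_billing(rec: dict) -> Customer:
    # The billing system uses its own (hypothetical) field names and formats.
    return Customer(f"BIL-{rec['acct']}", rec["acct_name"].strip(), rec["ctry"].upper(), "billing")

def from_crm(rec: dict) -> Customer:
    return Customer(f"CRM-{rec['id']}", f"{rec['first']} {rec['last']}",
                    rec["country_code"].upper(), "crm")

MAPPERS = {"billing": from_billing, "crm": from_crm}

def canonical_customers(sources: dict):
    """Yield every source record conformed to the canonical model."""
    for system, records in sources.items():
        mapper = MAPPERS[system]
        for rec in records:
            yield mapper(rec)

sources = {
    "billing": [{"acct": 1001, "acct_name": " Joe Bloggs ", "ctry": "gb"}],
    "crm": [{"id": 7, "first": "Jane", "last": "Smith", "country_code": "lv"}],
}
customers = list(canonical_customers(sources))
for c in customers:
    print(c)
```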
But it’s not a Silver Bullet
• Can be slow
– Depending on how much data has to be fetched from remote systems to the DV platform; platforms try to be smart about reducing this
• Can impact performance on underlying systems
– Lots of BI users querying resource-sensitive OLTP systems is not a good idea
• Requires resources
– Another set of servers, technologies, etc. to manage, but this cost is often offset by the reduction in complexity elsewhere
• Not a replacement; it is an additional tool
– You will still need ETL and messaging
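One common way of fetching less remote data is predicate pushdown. The filter is sent to the source, so only matching rows travel to the DV layer. This sketch compares a naive plan with a pushed-down plan against a stand-in SQLite "remote" table and counts the rows each plan ships. The table and data are hypothetical, and real platforms decide this in their query optimiser.

```python
import sqlite3

# Stand-in for a remote source system holding 1,000 sales rows.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
remote.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north" if i % 10 == 0 else "south", i) for i in range(1000)],
)

def fetch_then_filter(region):
    """Naive plan: pull every row to the DV layer, then filter locally."""
    shipped = remote.execute("SELECT region, amount FROM sales").fetchall()
    return len(shipped), [row for row in shipped if row[0] == region]

def push_down(region):
    """Pushdown plan: send the WHERE clause to the source, ship only matches."""
    shipped = remote.execute(
        "SELECT region, amount FROM sales WHERE region = ?", (region,)
    ).fetchall()
    return len(shipped), shipped

naive_count, naive_rows = fetch_then_filter("north")
pushed_count, pushed_rows = push_down("north")
print(naive_count, pushed_count)  # 1000 100
```

Both plans return the same answer. Only the volume moved across the network differs, and that is what drives the "can be slow" risk.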
BI Use Case:
Agile Data Mart Design
• Access data warehouse data quickly and easily
• Design the data mart you think you want
• Test it with real data and your actual reporting tool
• Also possible with data warehouse design
Diagram: Data Warehouse, Data Virtualization Platform, candidate data mart design A or B
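Trying two candidate designs against real data can be sketched as two sets of views over the same warehouse table. The same business question is then asked of each. In this sketch, design A is one flat table and design B is a small star schema. All table names, view names and data are hypothetical, with SQLite standing in for the warehouse and the DV layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_sales (sale_date TEXT, product TEXT, category TEXT, store TEXT, amount REAL)")
conn.executemany("INSERT INTO dw_sales VALUES (?, ?, ?, ?, ?)", [
    ("2013-10-01", "Tea", "Drinks", "Riga", 3.5),
    ("2013-10-01", "Bread", "Bakery", "Riga", 1.2),
    ("2013-10-02", "Coffee", "Drinks", "London", 4.0),
])

# Design A: a single flat table.
conn.execute("CREATE VIEW a_sales AS SELECT sale_date, product, category, store, amount FROM dw_sales")

# Design B: a small star schema (one dimension, one fact).
conn.execute("CREATE VIEW b_dim_product AS SELECT DISTINCT product, category FROM dw_sales")
conn.execute("CREATE VIEW b_fact_sales AS SELECT sale_date, product, store, amount FROM dw_sales")

# The same business question, phrased for each design.
QUESTION_A = "SELECT category, SUM(amount) FROM a_sales GROUP BY category ORDER BY category"
QUESTION_B = """
    SELECT d.category, SUM(f.amount)
    FROM b_fact_sales AS f
    JOIN b_dim_product AS d ON d.product = f.product
    GROUP BY d.category ORDER BY d.category
"""
answer_a = conn.execute(QUESTION_A).fetchall()
answer_b = conn.execute(QUESTION_B).fetchall()
print(answer_a == answer_b, answer_a)  # True [('Bakery', 1.2), ('Drinks', 7.5)]
```

Because nothing is loaded, dropping design A or B is just dropping views. This is what makes the iteration cheap.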
BI Use Case:
Virtual Data Marts
• Big tin appliance with lots of horsepower?
• Don’t want to duplicate data in the appliance and consume disk space for a data mart, but want the star schema for ease of use?
Diagram: Data Warehouse, Data Virtualization Platform
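A virtual star schema can be sketched as dimension and fact views defined directly over the warehouse table. In this sketch no rows are copied, so the catalogue holds one table and two views. New warehouse rows appear in the mart immediately, without a load step. The names are hypothetical and SQLite stands in for the appliance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_orders (order_id INTEGER, customer TEXT, city TEXT, amount REAL)")
conn.execute("INSERT INTO dw_orders VALUES (1, 'Joe Bloggs', 'London', 120.0)")

# Star schema defined purely as views: no data is duplicated.
conn.execute("CREATE VIEW dim_customer AS SELECT DISTINCT customer, city FROM dw_orders")
conn.execute("CREATE VIEW fact_orders AS SELECT order_id, customer, amount FROM dw_orders")

def object_counts():
    """Count catalogue objects by type: views add definitions, not stored rows."""
    return dict(conn.execute("SELECT type, COUNT(*) FROM sqlite_master GROUP BY type").fetchall())

# A new warehouse row is visible through the virtual mart straight away.
conn.execute("INSERT INTO dw_orders VALUES (2, 'Jane Smith', 'Riga', 80.0)")
fact_rows = conn.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
print(object_counts(), fact_rows)  # {'table': 1, 'view': 2} 2
```

The trade-off is the one listed under "not a silver bullet": every query against the views is executed on the appliance itself.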
BI Use Case:
Data Mart Extensions
• Existing (physical) data mart
• New data source that needs to be incorporated quickly
• Create a virtual copy of the existing data mart and the data source
• Integrate into an updated data mart design
Diagram: Data Virtualization Platform, New Data Source, Data Mart
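Extending an existing mart virtually can be sketched as a view that joins the untouched mart table to the new source. The `LEFT JOIN` keeps every existing mart row even where the new source has no match. The tables `mart_customer` and `new_web_signups` and their data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Existing physical data mart table (left unchanged).
conn.execute("CREATE TABLE mart_customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO mart_customer VALUES (?, ?)",
                 [(1, "Joe Bloggs"), (2, "Jane Smith")])

# New source that needs to be incorporated quickly (hypothetical web sign-up feed).
conn.execute("CREATE TABLE new_web_signups (customer_id INTEGER, signup_date TEXT)")
conn.execute("INSERT INTO new_web_signups VALUES (2, '2013-09-30')")

# Virtual copy of the mart, extended with the new source.
conn.execute("""
    CREATE VIEW v_customer_ext AS
    SELECT c.customer_id, c.name, w.signup_date
    FROM mart_customer AS c
    LEFT JOIN new_web_signups AS w ON w.customer_id = c.customer_id
""")
rows = conn.execute("SELECT * FROM v_customer_ext ORDER BY customer_id").fetchall()
print(rows)  # [(1, 'Joe Bloggs', None), (2, 'Jane Smith', '2013-09-30')]
```

Once the extended design is agreed, it can be built physically in the mart. Until then, users can work against the view.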
BI Use Case:
Agile Set Based ELT Design
• If your normal ETL style is a series of set-based SQL queries built on top of each other, then you can quickly prototype the ETL before moving it into your normal ETL engine to execute and persist the results (normally for performance)
Diagram: Source, Source, Source, Data Virtualization Platform
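Set-based ELT prototyping can be sketched as a chain of views, each step selecting from the previous one. The result can be inspected at any step. Once the logic is settled, the final step is persisted. In this sketch `CREATE TABLE ... AS SELECT` stands in for the persisted version that a real ETL engine would run. The staging table, the cleansing rules and the view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_payments (payment_id INTEGER, customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO stg_payments VALUES (?, ?, ?)", [
    (1, " joe bloggs ", "10.50"),
    (2, "JANE SMITH", "abc"),    # bad amount: rejected by step 1
    (3, "Jane Smith", "4.50"),
])

# Step 1: cleanse (standardise names, keep numeric amounts only).
conn.execute("""
    CREATE VIEW step1_clean AS
    SELECT payment_id, UPPER(TRIM(customer)) AS customer, CAST(amount AS REAL) AS amount
    FROM stg_payments
    WHERE amount GLOB '[0-9]*'
""")

# Step 2: aggregate, built on top of step 1.
conn.execute("""
    CREATE VIEW step2_totals AS
    SELECT customer, SUM(amount) AS total FROM step1_clean GROUP BY customer
""")

# Prototype: inspect the virtual result directly.
prototype = conn.execute("SELECT * FROM step2_totals ORDER BY customer").fetchall()

# Persist: materialise the final step once the logic is agreed.
conn.execute("CREATE TABLE tgt_customer_totals AS SELECT * FROM step2_totals")
persisted = conn.execute("SELECT * FROM tgt_customer_totals ORDER BY customer").fetchall()
print(prototype)  # [('JANE SMITH', 4.5), ('JOE BLOGGS', 10.5)]
```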
BI Use Case:
Big Data Integration
• DV Platform connects to Big Data sources
• Data sources are mapped into DV
• User accesses them via standard tools (SQL, RESTful interfaces, etc.)
Diagram: Data Virtualization Platform, SQL Interface, Map Reduce etc. Interface, SQL-based Tools
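Mapping a non-relational source into a form that SQL tools can query usually means flattening nested documents into rows. This sketch assumes hypothetical JSON documents like those a document store might return, and produces one row per nested event. For simplicity it copies the rows into an in-memory SQLite table. A DV platform would normally expose the mapping without copying the data.

```python
import json
import sqlite3

# Hypothetical documents, one JSON object per line.
RAW = """{"_id": "a1", "user": {"name": "Joe Bloggs"}, "events": ["login", "purchase"]}
{"_id": "b2", "user": {"name": "Jane Smith"}, "events": ["login"]}"""

def to_rows(doc):
    """Flatten one document: one relational row per nested event."""
    for event in doc.get("events", []):
        yield (doc["_id"], doc["user"]["name"], event)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_events (doc_id TEXT, user_name TEXT, event TEXT)")
for line in RAW.splitlines():
    conn.executemany("INSERT INTO user_events VALUES (?, ?, ?)", to_rows(json.loads(line)))

# Standard SQL now works against data that started out as nested documents.
result = conn.execute(
    "SELECT user_name, COUNT(*) FROM user_events GROUP BY user_name ORDER BY user_name"
).fetchall()
print(result)  # [('Jane Smith', 1), ('Joe Bloggs', 2)]
```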
BI Use Case:
Source System Analysis
• Apply your data quality and data profiling tools to all your data sources
• Look for relationships across systems
• Remove accessibility limitations by enabling caching, so that you are not hitting the source system but still have fresh data
Diagram: Source, Source, Source, Data Virtualization Platform, Data Quality & Profiling Tools
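Two basic profiling checks are easy to run once every source is reachable through one layer. The first counts nulls and distinct values per column. The second measures how many keys in one system also appear in another, which is a hint that the columns are related. This is a minimal sketch. The functions `profile` and `key_overlap`, the column names and the data are hypothetical. Dedicated profiling tools do far more.

```python
def profile(rows, column):
    """Row, null and distinct-value counts for one column."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    return {"rows": len(values), "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null))}

def key_overlap(left, left_col, right, right_col):
    """Fraction of distinct left keys that also appear on the right."""
    left_keys = {r[left_col] for r in left if r.get(left_col) is not None}
    right_keys = {r[right_col] for r in right if r.get(right_col) is not None}
    return len(left_keys & right_keys) / len(left_keys) if left_keys else 0.0

# Hypothetical extracts from two systems, read through the DV platform.
crm = [{"cust_id": 1, "email": "joe@example.com"},
       {"cust_id": 2, "email": None},
       {"cust_id": 3, "email": "jane@example.com"}]
billing = [{"account": 1}, {"account": 3}, {"account": 3}, {"account": 9}]

print(profile(crm, "email"))  # {'rows': 3, 'nulls': 1, 'distinct': 2}
print(round(key_overlap(crm, "cust_id", billing, "account"), 3))  # 0.667
```

A high overlap suggests that `cust_id` and `account` may be the same key. That is only a hint, and it should be confirmed with the system owners.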
BI Use Case:
Data Masking
• Currently building two versions of a data mart, one with sensitive data and one without
• Instead, build one and use Role Based Access Control (RBAC) to restrict what an individual can see
Diagram: Physical Data Mart, Data Virtualization Platform
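"One mart plus RBAC" can be sketched in SQL as a single physical table with one view per role. Roles are mapped to views, and an unknown role is refused. In this sketch the table, the two role names and the masking rules are hypothetical, and "QQ123456C" is a dummy identifier. The partial mask uses SQLite's `substr` with a negative start, which counts from the right.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mart_employee (name TEXT, ni_number TEXT, salary INTEGER)")
conn.execute("INSERT INTO mart_employee VALUES ('Joe Bloggs', 'QQ123456C', 60100)")

# One physical mart; each role gets a view instead of a separate copy of the mart.
conn.execute("CREATE VIEW v_employee_full AS SELECT name, ni_number, salary FROM mart_employee")
conn.execute("""
    CREATE VIEW v_employee_masked AS
    SELECT name, '*****' || substr(ni_number, -4) AS ni_number, NULL AS salary
    FROM mart_employee
""")

# Hypothetical role-to-view mapping; anything not listed is denied.
ROLE_VIEWS = {"payroll": "v_employee_full", "analyst": "v_employee_masked"}

def query_as(role):
    view = ROLE_VIEWS[role]  # KeyError for unknown roles: deny by default
    # The view name comes from the fixed mapping above, never from user input.
    return conn.execute(f"SELECT name, ni_number, salary FROM {view}").fetchall()

print(query_as("payroll"))  # [('Joe Bloggs', 'QQ123456C', 60100)]
print(query_as("analyst"))  # [('Joe Bloggs', '*****456C', None)]
```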
BI Use Cases
• Some examples
– Usefulness of each example depends on the organization
• Generally an enabler for more agility
– Quicker prototyping and integration
• Will not solve all your problems
– And has a cost associated with it (license & hardware)
Vendors: What The Analysts Say
• Forrester Wave Data Virtualization Q1 2012
– Informatica
– IBM
– Denodo
  • EU (Spanish) origins
– Composite
  • Now part of Cisco
  • Was OEM’d by Informatica
– Microsoft
– SAP
– And others
• Gartner
– No Magic Quadrant; instead includes Data Virtualization in Data Integration
Vendors: Product Positioning
Stand Alone
• Players
– Cisco (Composite)
– Denodo
• Selection
– Popular where IBM/Informatica are not already embedded
Integrated
• Players
– IBM
– Informatica
• Selection
– Popular with organisations that already have the vendor's ETL tool
THANK YOU