
INTRODUCTION TO ANALYTICS
2021 - 2022
LESSON 8.
BUSINESS INTELLIGENCE ARCHITECTURE
(PART I)
Learning Objectives

• Describe data warehouse & its characteristics
• Describe data mart & its characteristics
• Distinguish between data warehouses & data marts
• Recognize data architecture components
• Interpret BI architecture diagrams
• Understand hub-and-spoke design
• Describe origins and types of data latency
• Understand the differences between a data warehouse and a data lake
Agenda

1. Data and BI Architecture
2. Data Warehouse
3. Data Marts
4. Other data architecture components
5. Hub-and-spoke
6. Data latency
Data Architecture:
The overall structure of data and data-related resources as an integral part of the enterprise architecture

Data Warehousing & Business Intelligence:
Managing analytical data processing and enabling access to decision support data for reporting and analysis

https://dama.org/sites/default/files/download/DAMA-DMBOK2-Framework-V2-20140317-FINAL.pdf
Data
Wrangling

Textbook Chapter 5 Figure 5.1


DATA WAREHOUSE
Data Warehouse

Data warehouse:
Any database, file, or collection of files used to store integrated data that is then consumed by BI and analytics

Textbook Chapter 6 Figure 6.1


Data Warehouse

Integrated: Data is gathered and made consistent from one or more source systems

Separated: Separated from operational (source) systems

Subject-oriented: Organized by data subject rather than by application

Enterprise scope: Sources data across the enterprise

Time-variant: Stores historical data

Non-volatile: Data is not modified; once it is stored it becomes read-only (historical record)
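
The "integrated", "time-variant", and "non-volatile" characteristics can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two source systems (`crm_rows`, `billing_rows`), their field names, and the in-memory list standing in for a warehouse table are hypothetical and do not come from the textbook.

```python
from datetime import date

# Hypothetical extracts from two source systems with inconsistent formats.
crm_rows = [{"cust_id": "C-001", "name": "ACME ltd ", "country": "uk"}]
billing_rows = [{"customer": "c-002", "full_name": "Globex", "ctry": "US"}]

def integrate(crm, billing):
    """Map both sources onto one consistent customer layout (integrated)."""
    unified = []
    for r in crm:
        unified.append({"customer_id": r["cust_id"].upper(),
                        "name": r["name"].strip(),
                        "country": r["country"].upper(),
                        "source": "crm"})
    for r in billing:
        unified.append({"customer_id": r["customer"].upper(),
                        "name": r["full_name"].strip(),
                        "country": r["ctry"].upper(),
                        "source": "billing"})
    return unified

warehouse = []  # append-only list standing in for a warehouse table

def load(rows, as_of):
    """Append a dated snapshot; rows already stored are never modified."""
    for r in rows:
        warehouse.append({**r, "load_date": as_of})

# First load, then the source changes and a second snapshot is appended.
load(integrate(crm_rows, billing_rows), date(2022, 1, 1))
crm_rows[0]["country"] = "ie"
load(integrate(crm_rows, billing_rows), date(2022, 2, 1))

# The warehouse keeps both versions of customer C-001 (time-variant history).
history = [(r["load_date"], r["country"])
           for r in warehouse if r["customer_id"] == "C-001"]
```

The change in the source system produces a new dated row rather than an update, so earlier states remain available for analysis.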
Data Warehouse – the problem

What do you think are the biggest challenges with building an Enterprise Data Warehouse (EDW)?

Come up with 2-3 challenges.


https://www.slideshare.net/MapRTechnologies/data-warehouse-modernization-accelerating-timetoaction
DATA MART
Data Mart

Data mart:
A subset of a data
warehouse that is
usually oriented to a
business group or
process rather than
enterprise-wide views

Textbook Chapter 6 Figure 6.2


Data Warehouse vs. Data Mart
Parameter | Data Warehouse | Data Mart
Scope | Data about the whole enterprise | Data about a specific area (by business process or department)
Sources | Multiple source systems | One or a small number of sources
Objective | Provide integrated environment with a full picture of the enterprise | Provide information for specific purpose or project
Size | Very large; from 100GB to many TB | Smaller than DW, usually below 100GB
Design | Complex; centralized | Relatively simple; each data mart designed separately for a purpose
Implementation | Complex; long duration (years) | Easier and faster, e.g. a few months
Granularity | Time-variant; non-volatile data; detail level | May be consolidated and aggregated for specific purpose; less detail
Cost | Very expensive | More affordable
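
The scope and granularity rows can be sketched by deriving a small, aggregated data mart from detailed warehouse data. This is only an illustration using Python's built-in `sqlite3` module; the table names (`dw_sales`, `mart_retail_daily`), columns, and values are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Warehouse table: detail-level rows covering the whole (toy) enterprise.
con.execute(
    "CREATE TABLE dw_sales "
    "(sale_date TEXT, department TEXT, product TEXT, amount REAL)"
)
con.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?, ?)",
    [
        ("2022-01-03", "retail", "A", 10.0),
        ("2022-01-03", "retail", "B", 5.0),
        ("2022-01-04", "retail", "A", 7.5),
        ("2022-01-03", "wholesale", "A", 100.0),
    ],
)

# Data mart: one department only, aggregated to daily totals (less detail).
con.execute(
    """
    CREATE TABLE mart_retail_daily AS
    SELECT sale_date, SUM(amount) AS total_amount
    FROM dw_sales
    WHERE department = 'retail'
    GROUP BY sale_date
    """
)

mart_rows = con.execute(
    "SELECT sale_date, total_amount FROM mart_retail_daily ORDER BY sale_date"
).fetchall()
```

The mart holds fewer rows than the warehouse because it is restricted to one business area and consolidated for one purpose.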
Independent Data Marts

Independent data mart:
A data mart that pulls the data directly from the source systems, rather than being dependent on pulling data from a Data Warehouse

Textbook Chapter 6 Figure 6.3


Independent Data Mart – the problem

What can go wrong when multiple independent data marts are built?

Come up with 2-3 ideas.


Independent Data Mart – the problem

https://www.ewsolutions.com/migrating-independent-data-marts/
Independent Data Mart Challenges

Redundant Data: Each independent data mart requires its own, typically duplicated, copy of the detailed corporate data.

Redundant Processing: Integration and cleansing processes need to be duplicated for all of the independent data marts, often using different tools.

Scalability: Each data mart reads directly from the operational systems, so a change in a source system requires changes to multiple data marts and processes.

Non-Integrated: Lack of agreement on data definitions and metrics across the enterprise. Independent data marts do not provide enterprise data integration: no “enterprise view” exists, and each data mart may provide a different “version of the truth”.

DMU Library of free articles
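
The “version of the truth” problem can be shown with a small sketch. The order data and the two revenue definitions are hypothetical, chosen only to show how two marts reading the same source can disagree when each applies its own metric definition.

```python
# Hypothetical rows read directly from one operational system.
orders = [
    {"id": 1, "amount": 100.0, "status": "paid"},
    {"id": 2, "amount": 40.0, "status": "refunded"},
    {"id": 3, "amount": 60.0, "status": "paid"},
]

def sales_mart_revenue(rows):
    """Sales mart definition (assumed): every booked order counts as revenue."""
    return sum(r["amount"] for r in rows)

def finance_mart_revenue(rows):
    """Finance mart definition (assumed): only paid orders count as revenue."""
    return sum(r["amount"] for r in rows if r["status"] == "paid")

sales_view = sales_mart_revenue(orders)
finance_view = finance_mart_revenue(orders)

# Same source, same metric name, two different answers.
discrepancy = sales_view - finance_view
```

With a shared warehouse, the definition of revenue would be agreed once during integration rather than re-implemented in each mart.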


OTHER DATA ARCHITECTURE COMPONENTS
Operational Data Store

Operational Data Store:
A database that collects data from multiple source systems as close to real time as possible to enable specific business processing or operational reporting

Textbook Chapter 6 Figure 6.4


Operational Data Store (ODS)

One location: All the data is loaded into one location for ease of access

Separated: Separated from operational (source) systems

Minimum integration: Data is loaded without extensive data integration or transformation (“as is”)

Application scope: Data is loaded per application rather than organized by subject

Current data: Usually contains current data (for a defined period), not historical data
Federated Data Warehouse

Federated Data Warehouse:
A collection of Data Warehouses that conform to the same logical model but may be separated physically for better performance or business needs, e.g.
• By region or division
• By business function
Textbook Chapter 6 Figure 6.5
Accidental Data Architecture

Accidental Data Architecture:
A collection of data storage, warehousing and analytics solutions developed with little consideration for enterprise architecture and data integration.

Textbook Chapter 6 Figure 6.6


HUB-AND-SPOKE
Hub-and-Spoke Single BI platform

Hub-and-Spoke Architecture:
A central data hub or data warehouse that feeds multiple data marts for BI and analytical purposes

Textbook Chapter 6 Figure 6.7


Hub-and-Spoke Multiple BI tools

Textbook Chapter 6 Figure 6.8


Hub-and-Spoke Data Integration

Textbook Chapter 6 Figure 6.16


System of Integration (SOI) - Workflow

Function | Data Store
Stage: gather data from systems of record | Staging area (a.k.a. landing area)
Integrate: perform data preparation | EDW integration schema
Distribute: serve as a data hub for the Systems of Analytics (SOA) | EDW distribution schema
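
The stage, integrate, and distribute functions can be sketched as three small steps. This is a toy illustration under stated assumptions: the function names, the `customer_id` field, and the two systems of record (`crm`, `billing`) are hypothetical, and real data preparation involves far more than key standardisation.

```python
def stage(systems_of_record):
    """Stage: copy raw rows into a landing area, tagging each with its source."""
    return [
        {**row, "_source": name}
        for name, rows in systems_of_record.items()
        for row in rows
    ]

def integrate(staged):
    """Integrate: standardise the key and merge rows describing one customer."""
    merged = {}
    for row in staged:
        key = row["customer_id"].strip().upper()
        merged.setdefault(key, {"customer_id": key, "sources": []})
        merged[key]["sources"].append(row["_source"])
    return list(merged.values())

def distribute(integrated):
    """Distribute: publish a stable, ordered data set for downstream marts."""
    return sorted(integrated, key=lambda r: r["customer_id"])

# Hypothetical systems of record with inconsistent key formatting.
systems = {
    "crm": [{"customer_id": " c1 "}, {"customer_id": "C2"}],
    "billing": [{"customer_id": "c1"}],
}

hub = distribute(integrate(stage(systems)))
```

Each function corresponds to one data store in the table above: the staged list plays the role of the landing area, the merged records the integration schema, and `hub` the distribution schema.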
System of Analytics (SOA) - Workflow

Function | Data Store
Distribute: serve as a data hub for the Systems of Analytics (SOA) | EDW distribution schema
Wrangle data: transform data for specific business processes/functions | First tier of data marts
Refine data for analytical use: further wrangle subsets of data for particular BI | Second and subsequent tiers of sub-data marts
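
A minimal sketch of the tiering idea, assuming a hypothetical distribution data set of sales rows: the first tier narrows the data to one business area, and the second tier refines that subset for one particular report. The field names and figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows published by the EDW distribution schema.
distribution = [
    {"region": "EU", "product": "A", "amount": 10.0},
    {"region": "EU", "product": "A", "amount": 5.0},
    {"region": "EU", "product": "B", "amount": 2.0},
    {"region": "US", "product": "A", "amount": 7.0},
]

def first_tier_mart(rows, region):
    """Wrangle: keep only the rows relevant to one business area."""
    return [r for r in rows if r["region"] == region]

def second_tier_mart(mart_rows):
    """Refine: aggregate the first-tier subset for one analytical use."""
    totals = defaultdict(float)
    for r in mart_rows:
        totals[r["product"]] += r["amount"]
    return dict(totals)

eu_mart = first_tier_mart(distribution, "EU")
eu_by_product = second_tier_mart(eu_mart)
```

The second tier is built from the first tier rather than from the hub, which is what makes it a sub-data mart.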
DATA LATENCY
Real-time analytics

When is real-time analytics necessary?

How can we achieve it?

What is latency?
Latency
Latency: delay; the time it takes for a message or a packet of information to move from one point to another.

Data latency: the time taken to collect and store the data

Analysis latency: the time taken to analyze the data and turn it into actionable information

Action latency: the time taken to react to the information and take action
Wikipedia

https://www.researchgate.net/figure/Response-time-latency-30_fig3_312145765
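
The three latency components can be computed from four timestamps. The timestamps below are made-up values for a single hypothetical business event; the sketch only shows how the components add up to the total response time.

```python
from datetime import datetime

# Hypothetical timeline for one business event.
event_time = datetime(2022, 3, 1, 9, 0)     # event occurs in the source system
stored_time = datetime(2022, 3, 1, 9, 5)    # data is collected and stored
insight_time = datetime(2022, 3, 1, 9, 35)  # analysis yields actionable information
action_time = datetime(2022, 3, 1, 11, 35)  # the business acts on it

data_latency = stored_time - event_time        # time to collect and store
analysis_latency = insight_time - stored_time  # time to analyze
action_latency = action_time - insight_time    # time to react

# Total time from event to action is the sum of the three components.
total_latency = action_time - event_time
```

In this made-up timeline the action latency dominates, so faster data loading alone would shorten the total response time only slightly.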
Data Latency Levels
Category | Description | Examples
Real-time | Information is placed in the database as soon as it occurs and is available for analysis immediately | Banking; payments; telecommunication
Near-time (a.k.a. near real-time) | Information is uploaded at set intervals rather than instantaneously, aiming to be as close to real-time as needed (“good enough” for business needs) | Stock market price updates; social media monitoring; email delivery
Batch | Information is uploaded in large batches at predefined times, e.g. daily or every few hours. Usually requires significant time to process and reach its destination | Processing historical data; trend analysis; payroll & billing
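
A rough sketch of how the load schedule drives data latency. The helper `load_time`, the event times, and the chosen intervals (5 minutes for near-time, 1440 minutes for a daily batch) are assumptions for illustration; processing time is ignored.

```python
import math

def load_time(event_minute, interval_minutes):
    """Minute at which an event becomes available for analysis.

    interval_minutes == 0 models real-time loading (available immediately);
    otherwise the event waits for the next scheduled load.
    """
    if interval_minutes == 0:
        return event_minute
    return math.ceil(event_minute / interval_minutes) * interval_minutes

events = [1, 7, 62]  # hypothetical event times, in minutes after midnight

# Waiting time (in minutes) for each event under each schedule.
real_time_wait = [load_time(e, 0) - e for e in events]
near_time_wait = [load_time(e, 5) - e for e in events]
batch_wait = [load_time(e, 1440) - e for e in events]
```

Whether a wait of a few minutes or of most of a day is acceptable depends on the business need, which is the point of the “good enough” wording in the table.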
DATA LAKE
What’s a Data Lake?

“If you think of a Data Mart as a store of bottled water, cleansed and packaged and structured for easy consumption, the Data Lake is a large body of water in a more natural state. The contents of the Data Lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”
James Dixon (Pentaho)
What’s a Data Lake?

Data Lake:
A storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed.
Cloudtp
https://medium.com/data-ops/the-data-lake-is-a-design-pattern-888323323c66
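
The idea that structure is not defined until the data is needed is often called schema-on-read. A minimal sketch, assuming a hypothetical lake that is just a list of raw strings and a hypothetical order schema applied only at read time:

```python
import json

# Raw records kept in their native format; nothing is validated on write.
lake = [
    '{"type": "order", "id": 1, "amount": 25.5}',
    '{"type": "click", "page": "/home"}',
    "free-text support note: customer asked about delivery",
]

def read_orders(raw_records):
    """Apply an order schema at read time; skip records that do not fit it."""
    orders = []
    for raw in raw_records:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unstructured text stays in the lake, unused by this reader
        if isinstance(record, dict) and record.get("type") == "order":
            orders.append({"id": int(record["id"]),
                           "amount": float(record["amount"])})
    return orders

orders = read_orders(lake)
```

A different reader could apply a different schema to the same raw records later, whereas a data warehouse would fix the structure before loading.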
Data Warehouse vs. Data Lake
Parameter | Data Warehouse | Data Lake
Types of data | Structured | Structured, semi-structured, and unstructured
Data structure | Processed | Raw
Purpose of data | Currently in use | Not yet determined
Users | Business professionals | Data scientists
Accessibility | More complicated and costly to make changes | Highly accessible and quick to update

https://www.talend.com/resources/data-lake-vs-data-warehouse/
