DATA WAREHOUSE CONSTRUCTS AND COMPONENTS
Group 2
A data warehouse is a centralized repository that contains historical and cumulative data from single or multiple sources that employees of an organization can use for analysis, drawing insights, and data-driven decision-making. A data warehouse architecture is the logical and physical design of a data warehouse.

DEFINITION
• Data Warehouse Architecture (Constructs) defines the arrangement of data in different databases and identifies the most effective technique for extracting information from raw data.

• When a company grows, a multi-cloud architecture can be used to relocate data and workloads. A data warehouse may be built in one of three ways: single-tier, two-tier, or three-tier.
• Single-tier architecture

It is known as the simplest DBMS architecture. Architectures are classified into tiers according to how the database is implemented; here, a single database server houses all of the data in a single database.
The objective of a single layer is to minimize the amount of data stored by removing data redundancy. Small businesses will benefit from this type of architecture because it is simple to manage and inexpensive to construct. It is not suitable for businesses with complex data requirements and numerous data streams. This architecture is not frequently used in practice.
• Two-tier architecture

Two-layer architecture physically separates the available sources from the data warehouse. A two-tier database architecture has two parts. The first part is the data warehouse itself, a database used to store the company's data. The second part holds the client database, which serves as the database for the users.
However, this architecture is not expandable and does not support a large number of end-users. It also has connectivity problems because of network limitations. Unlike single-tier, the two-tier design uses a system and a database server.
• Three-Tier Data Warehouse Architecture

It is the most common type of modern DWH design as it produces a well-organized data flow from raw information to valuable insights. It consists of the Top, Middle and Bottom Tier.

• Bottom Tier:

The database of the Data Warehouse serves as the bottom tier. It is usually a relational database system. Data is extracted from your sources and then transformed and loaded into the bottom tier using ETL tools.
The bottom tier consists of your database server, data marts, and data lakes. Metadata is created in this tier, and data integration tools, like data virtualization, are used to seamlessly combine and aggregate data.
• Middle Tier:

The middle tier in a data warehouse is an OLAP server, which is implemented using either the ROLAP or the MOLAP model. For a user, this application tier presents an abstracted view of the database. This layer also acts as a mediator between the end-user and the database.
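As a rough illustration of the abstracted, multi-dimensional view an OLAP server gives the user, the sketch below rolls a small fact table up along chosen dimensions. The fact rows, the dimension names, and the `roll_up` helper are all hypothetical; a real ROLAP or MOLAP server would do this inside the database or a cube engine rather than in application code.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, year, sales). Values are made up.
FACTS = [
    ("North", "Laptop", 2023, 100),
    ("North", "Phone", 2023, 50),
    ("South", "Laptop", 2023, 70),
    ("South", "Laptop", 2024, 30),
]
DIMENSIONS = ("region", "product", "year")


def roll_up(facts, by):
    """Sum the sales measure grouped by the chosen dimensions."""
    idx = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # last field is the sales measure
    return dict(totals)


# The same data viewed from two different viewpoints.
by_region = roll_up(FACTS, by=("region",))
by_product_year = roll_up(FACTS, by=("product", "year"))
print(by_region)
print(by_product_year)
```

The user only picks the dimensions; how the rows are stored stays hidden behind the helper, which is the mediating role the middle tier plays.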
• Top Tier:

The top tier is a front-end client layer. It holds the data warehouse access tools that let users interact with data, create dashboards and reports, monitor KPIs, mine and analyze data, build apps, and more. This tier often includes a workbench or sandbox area for data exploration and new data model development. The tools can be query tools, reporting tools, managed query tools, analysis tools and data mining tools.
These characteristics are required when building cloud-
optimized data warehouses. Everything is stored in a
central location, which is ideal for a data center. The
physical scaling of computing and storage resources is
handled by the individual. There is almost no concurrency
in a program that competes with other resources. By using
storage-as-a-service, the user can take advantage of data
storage growth and contraction as they see fit.
APPROACHES
Top-down approach: External Sources

An external source is a source from which data is collected, irrespective of the type of data. Data can be structured, semi-structured or unstructured.
Top-down approach: Staging Area

Since the data extracted from the external sources does not follow a particular format, there is a need to validate this data before loading it into the data warehouse. For this purpose, it is recommended to use an ETL tool.

• E (Extract): Data is extracted from the external data source.

• T (Transform): Data is transformed into the standard format.

• L (Load): Data is loaded into the data warehouse after transforming it into the standard format.
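The three ETL steps can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the CSV text, the `sales` table, and the `extract`, `transform` and `load` function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an external source; the column names and
# values are illustrative only.
RAW_CSV = """customer,amount,date
 alice ,10.50,2024-01-05
BOB,7,2024-01-06
"""


def extract(text):
    """E: read rows from the external source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """T: bring every row into one standard format."""
    return [
        (row["customer"].strip().title(), float(row["amount"]), row["date"])
        for row in rows
    ]


def load(conn, rows):
    """L: write the standardized rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse database
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the staging area idea: data is validated and standardized before anything reaches the warehouse table.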
Top-down approach: Data Warehouse

After cleansing, the data is stored in the data warehouse as a central repository. It actually stores the metadata, and the actual data gets stored in the data marts. Note that the data warehouse stores the data in its purest form in this top-down approach.
Top-down approach: Data Marts

A data mart is also a part of the storage component. It stores the information of a particular function of an organization which is handled by a single authority. There can be as many data marts in an organization as there are functions. We can also say that a data mart contains subsets of the data stored in the data warehouse.
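One simple way to picture a data mart as a subset of the warehouse is a database view restricted to one function. The sketch below assumes a hypothetical `warehouse_facts` table and a `sales_mart` view in an in-memory SQLite database; real data marts are often separate physical stores rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the central warehouse
conn.execute(
    "CREATE TABLE warehouse_facts (department TEXT, item TEXT, amount REAL)"
)
# Hypothetical rows covering two functions of the organization.
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "order-1", 120.0),
        ("hr", "payroll-1", 900.0),
        ("sales", "order-2", 80.0),
    ],
)

# The data mart exposes only the subset that belongs to one function.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT item, amount FROM warehouse_facts WHERE department = 'sales'"
)

sales_rows = conn.execute(
    "SELECT item, amount FROM sales_mart ORDER BY item"
).fetchall()
print(sales_rows)
```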
Bottom-up approach:

• First, the data is extracted from external sources (the same as in the top-down approach).
• Then, the data goes through the staging area (as explained above) and is loaded into data marts instead of the data warehouse. The data marts are created first and provide reporting capability. Each data mart addresses a single business area.
• These data marts are then integrated into the data warehouse.

This approach is given by Kimball: data marts are created first and provide a thin view for analyses, and the data warehouse is created after the complete data marts have been created.
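The order of work in the bottom-up approach can be sketched as follows. The `data_marts` dictionary and the `integrate` helper are hypothetical, and plain Python lists stand in for real storage; the point is only that the marts exist first and the warehouse is assembled from them afterwards.

```python
# Hypothetical department-level data marts, created first and each
# addressing a single business area. All rows are made up.
data_marts = {
    "sales": [{"id": 1, "amount": 120.0}, {"id": 2, "amount": 80.0}],
    "finance": [{"id": 1, "amount": 900.0}],
}


def integrate(marts):
    """Combine the existing marts into one warehouse table,
    tagging each row with the business area it came from."""
    warehouse = []
    for area, rows in marts.items():
        for row in rows:
            warehouse.append({"business_area": area, **row})
    return warehouse


warehouse = integrate(data_marts)
print(len(warehouse), warehouse[0])
```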
DATA WAREHOUSE COMPONENTS

Now, let's learn about the major components of a data warehouse (DWH) and how they help build and scale a data warehouse in detail.

The different layers of a data warehouse or the components in a DWH architecture are:
Extraction, Transformation, and Loading Tools (ETL)
ETL tools are central components of an enterprise data warehouse design. These tools help extract data from different data sources, transform it into a suitable arrangement, and load it into a data warehouse.

The ETL tool you choose will determine the following:

• The time expended in data extraction
• Approaches to extracting data
• Kind of transformations applied and the simplicity of doing so
• Business rule definition for data validation and cleansing to improve end-product analytics
• Filling in missing data
• Outlining information distribution from the central repository to your BI applications
Their functionality includes:
• Anonymizing data as per regulatory stipulations.
• Preventing unwanted data in operational databases from being loaded into the data warehouse.
• Searching for and replacing common names and definitions for data arriving from different sources.
• Calculating summaries and derived data.
• Populating missing data with defaults.
• De-duplicating repeated data arriving from multiple data sources.

These Extract, Transform, and Load tools may generate cron jobs, background jobs, COBOL programs, shell scripts, etc. that regularly update data in data warehouses. These tools are also helpful for maintaining the metadata.

These ETL tools have to deal with the challenges of database and data heterogeneity.
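A few of the functions listed above (replacing common names, populating defaults, de-duplicating, and calculating a summary) can be sketched in plain Python. The rows, the `COUNTRY_NAMES` mapping, the default value, and the choice to treat rows with the same `id` as duplicates are all assumptions made for illustration, not the behavior of any particular ETL tool.

```python
# Hypothetical rows arriving from two sources; names and values are illustrative.
ROWS = [
    {"id": 1, "country": "USA", "amount": 10.0},
    {"id": 1, "country": "USA", "amount": 10.0},            # repeated row
    {"id": 2, "country": "United States", "amount": None},  # missing value
    {"id": 3, "country": "U.S.", "amount": 5.0},
]
COUNTRY_NAMES = {"USA": "US", "United States": "US", "U.S.": "US"}
DEFAULT_AMOUNT = 0.0


def cleanse(rows):
    """De-duplicate by id, standardize names, and fill in defaults."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["id"] in seen:  # assumption: same id means same record
            continue
        seen.add(row["id"])
        cleaned.append({
            "id": row["id"],
            "country": COUNTRY_NAMES.get(row["country"], row["country"]),
            "amount": DEFAULT_AMOUNT if row["amount"] is None else row["amount"],
        })
    return cleaned


cleaned = cleanse(ROWS)
total_amount = sum(r["amount"] for r in cleaned)  # a derived summary value
print(cleaned, total_amount)
```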
• Metadata:
In a typical data warehouse architecture, metadata describes the data warehouse database and offers a framework for data. It helps in building, maintaining and managing the data warehouse.

There are two types of metadata in data warehousing:

Technical Metadata comprises information that can be used by developers and managers when executing warehouse development and administration tasks.

Business Metadata includes information that offers an easily understandable standpoint of the data stored in the warehouse.
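To make the two kinds of metadata concrete, the sketch below models one catalog entry that carries both. The class names, field names and sample values are hypothetical and only illustrate the split between what developers need and what business users need.

```python
from dataclasses import dataclass


@dataclass
class TechnicalMetadata:
    """Details used for development and administration tasks."""
    table: str
    column: str
    data_type: str
    source_system: str


@dataclass
class BusinessMetadata:
    """An easily understandable description for business users."""
    business_name: str
    description: str


@dataclass
class CatalogEntry:
    technical: TechnicalMetadata
    business: BusinessMetadata


# Hypothetical entry for one warehouse column.
entry = CatalogEntry(
    technical=TechnicalMetadata("sales", "amt_usd", "REAL", "orders_db"),
    business=BusinessMetadata("Order amount", "Value of one order in US dollars"),
)


def business_label(e):
    """Return the text a report or BI tool might show to end users."""
    return f"{e.business.business_name}: {e.business.description}"


label = business_label(entry)
print(label)
```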
• Data Warehouse Access Tools:
A data warehouse uses a database or group of databases as a foundation. Organizations with a data warehouse generally cannot work with databases without the use of tools unless they have database administrators available. However, that is not the case with all business units. This is why they rely on several no-code data warehousing tools.

The following are the four types of tools that you can use:

• Query and reporting tools help users produce corporate reports for analysis that can be in the form of spreadsheets, calculations, or interactive visuals.
• Application development tools help create tailored reports and present them in interpretations intended for reporting purposes.
• Data mining tools for data warehousing systematize the procedure of identifying patterns and relationships in huge quantities of data using cutting-edge statistical modeling methods.
• OLAP tools help construct a multi-dimensional data warehouse and allow the analysis of enterprise data from numerous viewpoints.
Data Warehouse Reporting Layer

The reporting layer in the data warehouse allows the end-users to access the BI interface or BI database architecture. The purpose of the reporting layer is to act as a dashboard for data visualization, to create reports, and to extract any required information.
We will learn about the Data Warehouse
Components and Architecture of Data Warehouse
with Diagram as shown below:
The Data Warehouse is based on an RDBMS server
which is a central information repository that is
surrounded by some key Data Warehousing
components to make the entire environment
functional, manageable and accessible.
Summary

- A data warehouse is an information system that contains historical and cumulative data from single or multiple sources. These sources can be a traditional Data Warehouse, Cloud Data Warehouse or Virtual Data Warehouse.
- A data warehouse is subject-oriented as it offers information regarding the subject instead of the organization's ongoing operations.
- In a Data Warehouse, integration means the establishment of a common unit of measure for all similar data from the different databases.
- A data warehouse is also non-volatile, meaning the previous data is not erased when new data is entered in it.
- A Data Warehouse is time-variant as the data in a DW has a high shelf life.
- There are mainly 5 components of Data Warehouse Architecture:
1) Database
2) ETL Tools
3) Metadata
4) Query Tools
5) Data Marts
- There are four main categories of query tools:
1. Query and reporting tools
2. Application development tools
3. Data mining tools
4. OLAP tools
- The data sourcing, transformation, and migration tools are used for performing all the conversions and summarizations.
- In the Data Warehouse Architecture, metadata plays an important role as it specifies the source, usage, values, and features of data warehouse data.
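Three of the properties from the summary (integration, non-volatility and time variance) can be shown together in a small sketch. The unit table, the `record` helper and the sample loads are hypothetical; the idea is that values from different sources are converted to one common unit and appended with a load date, and earlier rows are never overwritten.

```python
# Conversion factors to a common unit of measure (kilograms).
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def to_kg(value, unit):
    """Integration: express every weight in the same unit."""
    return value * TO_KG[unit]


history = []  # append-only store: old rows are kept, never erased


def record(load_date, item, value, unit):
    """Append a dated row, so the data stays non-volatile and time-variant."""
    history.append((load_date, item, to_kg(value, unit)))


# Hypothetical loads from two sources that use different units.
record("2024-01-01", "crate", 2000, "g")
record("2024-02-01", "crate", 2.5, "kg")

print(history)
```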
THANK YOU!
Group 2
