
Data Warehouse Architectures with Layers
Introduction to Data Warehouse Architecture
• The Data Warehouse Architecture can be defined as the structural representation of the functional arrangement on which a Data Warehouse is built. It should include all the major practical components, which are typically organized into four layers: the Source layer, where data from different sources resides; the Staging layer, where the data undergoes ETL processing; the Storage layer, where the processed data is stored for later use; and the Presentation layer, where front-end tools let users work with the data conveniently.
Data Warehouse Architecture
• The Data Warehouse Architecture generally comprises three tiers.
• Top Tier
• Middle Tier
• Bottom Tier
Layers of Data Warehouse Architecture
• Top Tier
• The Top Tier consists of the Client-side front end of the architecture.
• The transformed information stored in the Data Warehouse, with business logic already applied, is accessed and used for business purposes in this tier.
• Several tools for report generation and analysis are available here to produce the desired information.
• Data mining, which has become very popular, is also performed in this tier.
• Requirement analysis documents, cost estimates, and the other factors that determine a profitable business deal are all prepared with these tools, which use the Data Warehouse information.
• Middle Tier
• The Middle Tier consists of the OLAP servers.
• OLAP stands for Online Analytical Processing.
• OLAP is used to provide information to business analysts and managers.
• Because it sits in the Middle Tier, the OLAP server interacts with the information in the Bottom Tier and passes insights on to the Top Tier tools, which process the available information.
• Relational OLAP (ROLAP) or Multidimensional OLAP (MOLAP) is most commonly used in Data Warehouse architecture.
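To make the Middle Tier's job more concrete, here is a minimal sketch that imitates a relational-OLAP style query with Python's built-in sqlite3 module: the same fact rows are summarized at a coarse level (roll-up by year) and then at a finer level (drill-down by region for one year). The `sales` table and its columns are hypothetical, and an in-memory database stands in for the Bottom Tier warehouse; a real OLAP server works on far larger data with its own engine.

```python
import sqlite3

# Hypothetical fact table standing in for data held in the Bottom Tier warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (2022, "North", 100.0),
        (2022, "South", 150.0),
        (2023, "North", 120.0),
        (2023, "South", 80.0),
        (2023, "South", 40.0),
    ],
)

def roll_up_by_year(conn):
    """Coarse view: total sales per year, as a manager might see first."""
    return conn.execute(
        "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
    ).fetchall()

def drill_down_by_region(conn, year):
    """Finer view: break one year's total down by region."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales WHERE year = ? "
        "GROUP BY region ORDER BY region",
        (year,),
    ).fetchall()

print(roll_up_by_year(conn))             # [(2022, 250.0), (2023, 240.0)]
print(drill_down_by_region(conn, 2023))  # [('North', 120.0), ('South', 120.0)]
```

The Top Tier tools would call queries like these and present the results as reports or charts.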
• Bottom Tier
• The Bottom Tier mainly consists of the Data Sources, ETL Tool, and Data Warehouse.
• 1. Data Sources
• The Data Sources consist of the source data that is acquired and passed to the staging and ETL tools for further processing.
• 2. ETL Tools
• ETL tools are very important because they combine logic, raw data, and schema into one and load the information into the Data Warehouse or Data Marts.
• Sometimes, ETL loads the data into the Data Marts and then information is stored in Data Warehouse. This approach is known as the
Bottom-Up approach.
• The approach where ETL loads information to the Data Warehouse directly is known as the Top-down Approach.
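The two approaches differ mainly in which store ETL fills first. A minimal sketch, assuming records are plain dictionaries with a hypothetical `dept` key and that a data mart is simply the group of records for one department:

```python
from collections import defaultdict

def split_into_marts(rows):
    """Group rows into one data mart per department (hypothetical 'dept' key)."""
    marts = defaultdict(list)
    for row in rows:
        marts[row["dept"]].append(row)
    return dict(marts)

def top_down(extracted):
    """Top-down: ETL loads the warehouse first; data marts are then derived from it."""
    warehouse = list(extracted)
    marts = split_into_marts(warehouse)
    return warehouse, marts

def bottom_up(extracted):
    """Bottom-up: ETL loads the data marts first; the warehouse is then built from them."""
    marts = split_into_marts(extracted)
    warehouse = [row for mart in marts.values() for row in mart]
    return warehouse, marts

extracted = [
    {"dept": "sales", "amount": 10},
    {"dept": "hr", "amount": 5},
    {"dept": "sales", "amount": 7},
]
wh_td, marts_td = top_down(extracted)
wh_bu, marts_bu = bottom_up(extracted)
```

Both functions end with the same data; what changes is which store is populated first and therefore which one acts as the source for the other.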
Difference Between Top-down Approach and Bottom-up Approach

| Top-Down Approach | Bottom-Up Approach |
|---|---|
| Provides a definite and consistent view of information, as information from the data warehouse is used to create data marts. | Reports can be generated easily, as data marts are created first and it is relatively easy to interact with data marts. |
| Strong model, hence preferred by big companies. | Not as strong, but the data warehouse can be extended and more data marts can be created. |
| Time, cost, and maintenance are high. | Time, cost, and maintenance are low. |
• Data Marts
• A Data Mart is also a storage component, used to store the data of a specific function or department of a company, usually owned by a single authority.
• A data mart gathers its information from the Data Warehouse; hence, a data mart stores a subset of the information in the Data Warehouse.
• Data Marts are flexible and small in size.
• Data Warehouse
• The Data Warehouse is the central component of the whole Data Warehouse Architecture.
• It acts as a repository to store information.
• Large amounts of data are stored in the Data Warehouse.
• This information is used by technologies such as Big Data, which require analyzing large subsets of information.
• A Data Mart is also a model of a Data Warehouse.
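Since a data mart holds a subset of the warehouse for one function, one simple way to sketch a dependent data mart is a database view that keeps only the rows and columns one team needs. This uses sqlite3; the `warehouse` table, its columns, and the `sales_mart` view are hypothetical, and a physical data mart would usually copy the data into its own store rather than use a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table holding data for every department.
conn.execute(
    "CREATE TABLE warehouse (id INTEGER, dept TEXT, product TEXT, amount REAL, salary REAL)"
)
conn.executemany(
    "INSERT INTO warehouse VALUES (?, ?, ?, ?, ?)",
    [
        (1, "sales", "laptop", 900.0, None),
        (2, "sales", "phone", 400.0, None),
        (3, "hr", None, None, 3000.0),
    ],
)

# The data mart exposes only the sales rows and the columns the sales team needs.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT id, product, amount FROM warehouse WHERE dept = 'sales'"
)

mart_rows = conn.execute("SELECT * FROM sales_mart ORDER BY id").fetchall()
print(mart_rows)  # [(1, 'laptop', 900.0), (2, 'phone', 400.0)]
```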
• The following four layers are always present in a Data Warehouse Architecture.
• 1. Data Source Layer
• The Data Source Layer is where data from the sources is received before being sent to the other layers for the desired operations.
• The data can be of any type.
• The source data can be a database, a spreadsheet, or any other kind of text file.
• The source data can be in any format. Because the sources are so different, we cannot expect all data to arrive in the same format.
• Real-life examples of source data include:
• Log files of specific applications or jobs, or entry records of employees in a company.
• Survey data, stock exchange data, etc.
• Web browser data and many more.
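Because each source arrives in its own format, each one needs its own reader before the data can move on. A minimal sketch, assuming one source exports CSV and another serves JSON; the sample text and the field names (`emp_id`, `fullName`) are hypothetical. Both readers produce records of the same shape so later layers can treat them alike.

```python
import csv
import io
import json

# Hypothetical raw inputs from two different source systems.
CSV_SOURCE = "emp_id,name\n1,Alice\n2,Bob\n"
JSON_SOURCE = '[{"id": 3, "fullName": "Carol"}]'

def read_csv_source(text):
    """Parse a CSV export into common records."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"id": int(row["emp_id"]), "name": row["name"], "source": "csv"}
        for row in reader
    ]

def read_json_source(text):
    """Parse a JSON feed into the same record shape."""
    return [
        {"id": item["id"], "name": item["fullName"], "source": "json"}
        for item in json.loads(text)
    ]

records = read_csv_source(CSV_SOURCE) + read_json_source(JSON_SOURCE)
```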
• 2. Data Staging Layer
• The following steps take place in Data Staging Layer.
• Step #1: Data Extraction
• The data received from the Source Layer is fed into the Staging Layer, where the first process applied to the acquired data is extraction.
• Step #2: Landing Database
• The extracted data is temporarily stored in a landing database.
• The landing database receives the data as soon as it is extracted.
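A landing database can be sketched as a table that stores extracted rows exactly as received, with no cleaning, plus a note of where and when each row arrived. This uses an in-memory sqlite3 database; the `landing_orders` table, its columns, and the `land` helper are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Landing table: every column is TEXT so values are kept exactly as extracted.
conn.execute(
    "CREATE TABLE landing_orders (order_id TEXT, amount TEXT, source TEXT, loaded_at TEXT)"
)

def land(conn, rows, source):
    """Store extracted (order_id, amount) rows as-is, tagged with source and load time."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO landing_orders VALUES (?, ?, ?, ?)",
        [(order_id, amount, source, loaded_at) for order_id, amount in rows],
    )
    conn.commit()

# Messy values are deliberately kept untouched; cleaning happens in the staging area.
land(conn, [("A1", "10.50"), ("A2", " 7 "), ("A3", "n/a")], source="shop_db")
landed = conn.execute(
    "SELECT order_id, amount FROM landing_orders ORDER BY order_id"
).fetchall()
```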
• Step #3: Staging Area
• The data in the landing database is taken to the staging area, where several quality checks and staging operations are performed.
• The structure and schema are identified, and unordered data is adjusted to bring commonality to the acquired data.
• Having a dedicated place for the data just before transformation is an added advantage that makes the staging process very important.
• It makes data processing easier.
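The quality checks in the staging area can be sketched as a function that takes landed string values, normalizes them, and separates clean rows from rejected ones with a reason. The rules here (trim whitespace, require a numeric non-negative amount, drop duplicate IDs) are illustrative assumptions, not a standard rule set.

```python
def stage(rows):
    """Apply simple quality checks to landed (order_id, amount) string pairs.

    Returns (clean_rows, rejected_rows). Illustrative rules only:
    trim whitespace, require a numeric non-negative amount, drop duplicate IDs.
    """
    clean, rejected, seen = [], [], set()
    for order_id, amount in rows:
        order_id = order_id.strip()
        try:
            value = float(amount.strip())
        except ValueError:
            rejected.append((order_id, amount, "amount is not numeric"))
            continue
        if value < 0:
            rejected.append((order_id, amount, "negative amount"))
        elif order_id in seen:
            rejected.append((order_id, amount, "duplicate order_id"))
        else:
            seen.add(order_id)
            clean.append((order_id, value))
    return clean, rejected

clean, rejected = stage([("A1", "10.50"), ("A2", " 7 "), ("A3", "n/a"), ("A1", "3")])
```

Keeping the rejected rows with a reason, instead of silently dropping them, lets someone fix the source data later.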
• Step #4: ETL
• ETL stands for Extraction, Transformation, and Loading.
• ETL tools are used to integrate and process data, applying logic to data that is still raw but already somewhat ordered.
• The data is extracted according to the analytical needs and transformed into a form that is fit to be stored in the Data Warehouse.
• After transformation, the data (now information) is finally loaded into the Data Warehouse.
• Some examples of ETL tools are Informatica, SSIS, etc.
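The three ETL steps can be sketched as three small functions chained together. This is only an illustration of the idea in plain Python with sqlite3 as a stand-in warehouse, not how Informatica or SSIS work internally; the source rows, the `fact_sales` table, and the transformation rules are hypothetical.

```python
import sqlite3

def extract():
    """Extract: pull raw rows from a (hypothetical) source system."""
    return [
        {"customer": " alice ", "country": "us", "amount": "120.0"},
        {"customer": "Bob", "country": "DE", "amount": "80.5"},
    ]

def transform(rows):
    """Transform: apply business logic so rows fit the warehouse schema."""
    return [
        (r["customer"].strip().title(), r["country"].upper(), float(r["amount"]))
        for r in rows
    ]

def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales (customer TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
result = warehouse.execute("SELECT * FROM fact_sales ORDER BY customer").fetchall()
```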
• 3. Data Storage Layer
• The processed data is stored in the Data Warehouse.
• This data is cleansed, transformed, and given a definite structure, which lets business users work with the data as the business requires.
• Depending on the architecture's approach, the data is stored in the Data Warehouse as well as in Data Marts. Data Marts will be discussed in the later stages.
• Some architectures also include an Operational Data Store (ODS).
• 4. Data Presentation Layer
• This is the layer where users interact with the data stored in the Data Warehouse.
• Queries and various tools are used to obtain different types of information from the data.
• The information reaches the user through graphical representations of the data.
• Reporting tools are used to retrieve business data, and business logic is applied to gather several kinds of information.
• Metadata, system operations, and performance information are also maintained and viewed in this layer.
• Conclusion
• An important property of a Data Warehouse is its efficiency. To create an efficient Data Warehouse, we construct a framework known as the Business Analysis Framework.
Data warehouse main layers
• Data Sources layer 
• Data Acquisition & Integration Layer – Staging Area
• Enterprise Data Warehouse (EDW)
• Business Intelligence layer.
Data Sources layer 
• This layer contains the defined data sources from which analytical information is extracted and loaded into our data warehouse. The information can exist inside the organization (internal source systems) or outside it (external source systems). Data can also exist in many different formats, such as:
• Databases
• Web services
• Files (Excel, CSV, PDF, TXT,…etc.)
• Web forms
• New data formats started to appear on the horizon when Big Data concepts were introduced. Social media, or in technical terms unstructured data, is another source of information to consider when designing your data warehouse architecture. Big Data can handle different types of information, such as recorded voice, scanned images and documents, and unstructured text, allowing us to analyze information that we have not been able to analyze before.
Data Acquisition & Integration Layer – Staging Area
• This part is the intermediate layer between the data sources and the Enterprise Data Warehouse. The layer is responsible for acquiring data from different internal and external data sources. Because data is stored in many different formats, the data acquisition layer uses multiple tools and technologies to extract the required information. The extracted data is loaded into a landing & staging area, where it is pre-processed by applying high-level data quality checks. The final output of this layer is clean data, which is loaded into the Enterprise Data Warehouse (EDW). This layer contains the following components:
• Landing & staging area
• Data Integration Tool (ETL)
• Data Quality
• Here we will describe each component:
• Landing & Staging Area: The landing database is used as a landing area that stores the data retrieved from the source systems. The staging area is used to pre-process information by applying data quality checks before moving it to the enterprise data warehouse database. Data is moved to the landing and staging area as it is in the source (without any transformation). This layer is a database schema, with the addition of a Big Data ecosystem in case we want to land unstructured data as well.
• Data integration tool: Data integration tools, also known as Extract, Transform and Load (ETL) tools, extract information from source systems, perform the required transformation and preparation of the data, and finally load it into the target. Informatica PowerCenter and IBM DataStage are among the most recognized ETL tools in the market.
Enterprise Data Warehouse (EDW)
• A data warehouse is constructed by integrating data from multiple heterogeneous sources to support analytical reporting, structured and/or ad hoc queries, and decision making. This layer is the core, mandatory layer of any data warehouse implementation. It is a database repository that stores analytical and historical information for use by business intelligence solutions.
• The main features and benefits of EDW are:
– A single, complete, and consistent store of data obtained from a variety of sources and made available to end users in many ways.
– An infrastructure for reusing the data in numerous ways and for serving as the source of many systems.
– An architecture designed around the business model rather than a simple technical one, so that enterprise data can be maintained for a very long time and current and historical data can be made available to various users and systems.
– Storing aggregated and summarized data shortens information retrieval time and improves overall performance.
– Storing historical data shows trends and patterns.
– Business users can display data gathered from different source systems in the same report or dashboard.
– Information is secured by allowing only the right people to see the information they are authorized to see.
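The benefit of storing aggregated and summarized data, and of keeping history for trends, can be sketched as a summary table computed once from the detail rows, so reports read a few pre-computed rows instead of re-aggregating every sale. This uses sqlite3; the `fact_sales` and `agg_monthly_sales` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Detailed historical fact rows (hypothetical).
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [
        ("2023-01-05", 10.0),
        ("2023-01-20", 15.0),
        ("2023-02-03", 20.0),
        ("2024-01-11", 30.0),
    ],
)

# Pre-computed monthly summary: reports query this small table instead of
# scanning and re-aggregating every detail row on each request.
conn.execute(
    "CREATE TABLE agg_monthly_sales AS "
    "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS total "
    "FROM fact_sales GROUP BY month"
)

trend = conn.execute(
    "SELECT month, total FROM agg_monthly_sales ORDER BY month"
).fetchall()
```

The trade-off is that such a summary table must be refreshed after each warehouse load, or it will drift from the detail data.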
Business Intelligence layer.
• The BI tool is the interface (front end) for the end business users. The BI layer provides end users with the required information in many formats, depending on their needs. Top management and executives need high-level information from dashboards to see highlights of the current situation. Dashboards contain KPIs, KQIs, KRIs, and general metrics (analytics). Business and data analysts, on the other hand, need access to more information and the ability to drill up and down. This information is usually provided by analytical reports through data discovery and modern BI tools. Finally, normal users need access to other raw information to do their daily jobs. This is made available through traditional reports.
• The following items are the main BI components:
– Presentation (semantic) layer.
– Static standard reports.
– Dashboards
– Data Discovery & exploration.
– Geo-Statistics.
– Security
• Semantic layer
• Semantic layer is a business representation of organization’s data that helps end users access data autonomously using common business terms. The
aim is to insulate users from the technical details of the data store and allow them to create queries in terms that are familiar and meaningful.
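A semantic layer can be sketched as a mapping from business terms to physical SQL expressions, plus a function that builds queries from those terms so users never see table or column names. The cryptic physical names (`fct_ord`, `dim_rgn`, `amt_net`) and the business terms are hypothetical; commercial BI tools implement semantic layers in their own, much richer ways.

```python
import sqlite3

# Hypothetical semantic model: business term -> physical SQL expression.
MEASURES = {"Revenue": "SUM(f.amt_net)", "Order Count": "COUNT(*)"}
DIMENSIONS = {"Region": "d.rgn_nm"}
FROM_CLAUSE = "fct_ord f JOIN dim_rgn d ON f.rgn_id = d.rgn_id"

def build_query(dimension, measure):
    """Translate business terms into SQL; unknown terms raise KeyError."""
    dim_sql, measure_sql = DIMENSIONS[dimension], MEASURES[measure]
    return (
        f'SELECT {dim_sql} AS "{dimension}", {measure_sql} AS "{measure}" '
        f"FROM {FROM_CLAUSE} GROUP BY {dim_sql} ORDER BY {dim_sql}"
    )

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_rgn (rgn_id INTEGER, rgn_nm TEXT);
    CREATE TABLE fct_ord (rgn_id INTEGER, amt_net REAL);
    INSERT INTO dim_rgn VALUES (1, 'East'), (2, 'West');
    INSERT INTO fct_ord VALUES (1, 50.0), (1, 25.0), (2, 40.0);
    """
)
rows = conn.execute(build_query("Region", "Revenue")).fetchall()
```

Only terms defined in the model can reach the SQL string, so a user asking for "Revenue by Region" gets a correct join without knowing the schema.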
• Static Report
• A static report is a report that is run immediately upon request and then stored, with the data, in the Data Warehouse. Static reports let you view reports based on very large data sets, although such reports can take a long time to complete. Static reports are pre-built reports that serve business users' daily needs, and we usually don't expect many modifications to them.
• Dashboards
• Dashboards often provide at-a-glance views of KPIs (key performance indicators) relevant to a particular objective or business process (e.g. sales,
marketing, human resources, or production). The term dashboard originates from the automobile dashboard where drivers monitor the major
functions at a glance via the instrument cluster. Dashboards give signs about a business letting the user know something is wrong or something is
right. The corporate world has tried for years to come up with a solution that would tell them if their business needed maintenance or if the
temperature of their business was running above normal.
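The "something is wrong or something is right" signal on a dashboard can be sketched as comparing each KPI with its target and mapping the result to a traffic-light status. The KPI names, targets, and the 10% warning band are hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class Kpi:
    name: str
    actual: float
    target: float
    higher_is_better: bool = True

def status(kpi, warning_band=0.10):
    """Return 'green', 'amber' or 'red' depending on distance from target.

    Meeting the target -> green; missing it by up to warning_band
    (10% by default) -> amber; otherwise red.
    """
    if kpi.higher_is_better:
        ratio = kpi.actual / kpi.target
    else:
        # For "lower is better" KPIs, an actual of 0 always meets the target.
        ratio = kpi.target / kpi.actual if kpi.actual else float("inf")
    if ratio >= 1.0:
        return "green"
    if ratio >= 1.0 - warning_band:
        return "amber"
    return "red"

dashboard = [
    Kpi("Monthly sales", actual=105_000, target=100_000),
    Kpi("Customer churn %", actual=5.3, target=5.0, higher_is_better=False),
    Kpi("Orders shipped on time %", actual=80.0, target=95.0),
]
statuses = {k.name: status(k) for k in dashboard}
```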
• Data discovery
• Data discovery is learning something new as a result of actively interacting with data. The tools that help us perform the data exploration that results in discovery are the real innovation behind data discovery. These data exploration tools focus more on the interaction between people and data, rather than on the production of static reports, imitation car dashboards, and flashing stoplights.
• Data exploration and discovery is not new. How many of us have used SQL queries or Excel to look for answers to our questions? I'm not referring to SQL stored procedures or monthly Excel reports, but rather to using these tools to answer new questions we have never answered before. Sometimes this search for an answer lasts days, but we now have a set of data discovery tools that make data exploration and discovery easier.
• Geo-statistics
• Geo-statistics is a branch of statistics focusing on spatial or spatiotemporal datasets. Geo-statistics is applied in varied branches of geography,
particularly those involving the spread of diseases (epidemiology), the practice of commerce and military planning (logistics), and the development of
efficient spatial networks. Geostatistical algorithms are incorporated in many places, including geographic information systems (GIS) and the R
statistical environment.
• Security
• The centralized architecture of many BI solutions means that lots of potentially sensitive data is aggregated in one place and used by many people. If an attacker gains access, they can steal vast quantities of data or alter information used by many different business units.
• The Security component provides a complete security system for your BI solution. Furthermore, the component provides ways to authorize authenticated users based on their roles.
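Authorizing authenticated users by role can be sketched as a lookup from each role to the subject areas it may read, checked before any report runs. The role names, subject areas, and the `authorize` function are hypothetical; real BI platforms provide their own role and permission configuration.

```python
# Hypothetical role -> permitted subject areas mapping.
ROLE_PERMISSIONS = {
    "executive": {"sales", "finance", "hr"},
    "sales_analyst": {"sales"},
    "hr_officer": {"hr"},
}

class AccessDenied(Exception):
    """Raised when no role of the user grants access to a subject area."""

def authorize(user, subject_area):
    """Allow an already-authenticated user to read a subject area only if a role permits it."""
    allowed = set()
    for role in user["roles"]:
        allowed |= ROLE_PERMISSIONS.get(role, set())
    if subject_area not in allowed:
        raise AccessDenied(f"{user['name']} may not read {subject_area!r}")
    return True

analyst = {"name": "dana", "roles": ["sales_analyst"]}
```

Unknown roles grant nothing, so the check fails closed rather than open.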
• The scope of our project is initially to provide a comprehensive semantic layer for the currently available information, and then to create reports and dashboards that satisfy the organization's BI needs. The Ejada team will train the organization's team to do their own reporting, to enable the self-service capabilities of the BI tool, and to create new customized reports based on the delivered semantic layer.
• In the future, the organization can adopt a vision to raise its BI maturity level by applying new modules and adopting new features. In the following section I will talk about data warehouse supporting layers.
