Data Warehouse Architecture and Its Components:
[ figure in note copy ]

1. **Source Data**: The source data component includes different data sources
like databases, spreadsheets, and systems used by an organization. These
sources store raw data collected from various parts of the company, such as
sales transactions, customer records, and inventory information.

Operational databases + transactional systems + external sources

Ex. SAP, SQL databases, Oracle, flat files, Excel files, spreadsheets, etc.

2. **Data Staging**: The data staging component is like a temporary holding area
for the data. Before entering the main data warehouse, the raw data from the
source systems is taken to the staging area. Here, it is checked, cleaned, and
organized to ensure it's accurate and reliable for analysis.

- E = extraction of data from source systems
- T = cleansing + filtration + standardization of raw data + removal of useless raw data
- L = loading of transformed data into the DW
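
The E, T, and L steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the `RAW_SALES` rows, the `sales` table, and the function names are all made up for the example, and an in-memory SQLite database stands in for the DW.

```python
import sqlite3

# Hypothetical raw rows, as they might arrive from a source system.
RAW_SALES = [
    {"order_id": "1", "region": " north ", "amount": "120.50"},
    {"order_id": "2", "region": "SOUTH", "amount": "80"},
    {"order_id": "2", "region": "SOUTH", "amount": "80"},  # duplicate row
    {"order_id": "3", "region": "East", "amount": ""},     # missing amount
]

def extract():
    """E: read raw rows from the source (here, an in-memory list)."""
    return list(RAW_SALES)

def transform(rows):
    """T: cleanse, filter, and standardize the raw rows."""
    seen, clean = set(), []
    for row in rows:
        if not row["amount"]:        # filtration: drop unusable rows
            continue
        if row["order_id"] in seen:  # cleansing: drop duplicates
            continue
        seen.add(row["order_id"])
        clean.append((
            int(row["order_id"]),
            row["region"].strip().title(),  # standardization of region names
            float(row["amount"]),
        ))
    return clean

def load(rows, conn):
    """L: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for the DW
load(transform(extract()), warehouse)
loaded = warehouse.execute("SELECT * FROM sales ORDER BY order_id").fetchall()
print(loaded)
```

Only the two usable, de-duplicated rows reach the `sales` table; the duplicate and the row with a missing amount are dropped in the staging step.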

3. **Data Warehouse**: The data warehouse is like a big storage space where all
the organized and cleaned data is kept. It's designed in a way that makes it easy
for people to ask questions and get meaningful answers. The data in the
warehouse is structured and arranged using a special method called dimensional
modeling, which simplifies data analysis.
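
To make dimensional modeling a little more concrete, here is a small sketch of a star schema: one fact table holding the measurements, surrounded by dimension tables that describe them. The table names (`fact_sales`, `dim_product`, `dim_region`) and the sample rows are assumptions made for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used to slice the facts.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);

-- Fact table: numeric measures plus a key into each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    amount     REAL
);

INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO dim_region  VALUES (1, 'North'), (2, 'South');
INSERT INTO fact_sales  VALUES (1, 1, 1000.0), (1, 2, 500.0), (2, 1, 300.0);
""")

# A typical analytical question: total sales per product.
sales_by_product = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(sales_by_product)
```

The same fact table can be joined to `dim_region` instead to answer the question by region, which is the kind of simplification the dimensional layout is meant to give.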

4. **Management and Control**: The management and control component is like
the behind-the-scenes work that keeps the data warehouse running smoothly. It
involves activities such as:
- making sure only authorized people can access the data,
- monitoring performance to ensure it's working fast,
- and keeping track of all the data in the warehouse.

5. **Data Mart**: The data mart is like a smaller, specialized section of the data
warehouse. It's designed to cater to specific needs of different groups within the
organization. For example, the sales team might have its own data mart with
sales-related data, and the marketing team might have another one with
marketing-related data.

Subset of DW + improves response time for user requests

Ex. sales, marketing, purchases, accounting, etc.
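
The "subset of DW" idea can be sketched as follows. The `warehouse_facts` table, the `sales_mart` view, and the sample rows are hypothetical. A view is used here only to show the subset relationship; a real data mart is often a separate physical store, which is where the faster response time comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical warehouse table mixing data from several departments.
CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL);
INSERT INTO warehouse_facts VALUES
    ('sales', 'revenue', 1200.0),
    ('sales', 'orders', 40.0),
    ('marketing', 'campaign_cost', 300.0);

-- The sales data mart exposes only the sales-related subset.
CREATE VIEW sales_mart AS
    SELECT metric, value FROM warehouse_facts WHERE department = 'sales';
""")

sales_rows = conn.execute(
    "SELECT metric, value FROM sales_mart ORDER BY metric"
).fetchall()
print(sales_rows)
```

The sales team queries `sales_mart` and never sees the marketing rows.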

6. **Multidimensional DBMS**: The Multidimensional DBMS is a specialized type of
database management system used in the data warehouse. It's like a super-smart
assistant that understands the unique way data is organized in the warehouse.
It allows users to analyze data from different perspectives, viewing it from
various angles, or "dimensions." For example, users can look at sales data by
product, by region, or by time, all in one view.
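
Viewing the same data along different dimensions can be sketched with a tiny in-memory "cube". This is not how a real multidimensional DBMS stores data; the `CUBE` dictionary, the `DIMENSIONS` tuple, and the `roll_up` helper are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical sales cells keyed by (product, region, quarter).
CUBE = {
    ("Laptop", "North", "Q1"): 100,
    ("Laptop", "South", "Q1"): 50,
    ("Phone", "North", "Q1"): 30,
    ("Phone", "North", "Q2"): 70,
}
DIMENSIONS = ("product", "region", "quarter")

def roll_up(cube, dimension):
    """Sum all cells, keeping only the chosen dimension."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)

by_product = roll_up(CUBE, "product")
by_region = roll_up(CUBE, "region")
by_quarter = roll_up(CUBE, "quarter")
print(by_product, by_region, by_quarter)
```

The three results come from the same cells, each summarized along a different dimension: by product, by region, and by time.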

7. **Information Delivery**: The information delivery component is how the data
and insights from the data warehouse are presented to users. It involves using
various Business Intelligence (BI) tools, dashboards, and reports to communicate
the findings in a user-friendly and visually appealing manner. This lets
decision-makers access and interpret the data easily, which supports data-driven
decision-making across the organization.

By putting all these components together, the data warehouse architecture
makes it possible for organizations to store, manage, and analyze their data in a
way that supports better decision-making and helps them understand their
business better. It's like having a well-organized library with all the books neatly
arranged, making it easier for people to find the information they need.

Key Considerations/Design Goals:

Separation: Separating analytical and transactional processing so that each type of
workload gets optimized performance and resource allocation.

Scalability: Designing the data warehouse to handle growing data volumes and user
demands efficiently by enabling independent scaling of different components
(it should be upgradable).

Extensibility: Planning the data warehouse architecture to accommodate future
changes, additions, or enhancements seamlessly, ensuring it can adapt to evolving
business needs.

Security: Implementing robust security measures to safeguard data, control access, and
protect against unauthorized use or breaches.

Administrability: Designing the data warehouse with ease of administration and
maintenance in mind, making it simpler to manage, monitor, and troubleshoot.

These key considerations guide the decision-making process and shape the data
warehouse architecture to meet the specific objectives and requirements of the
organization. By addressing these considerations, data warehouse architects aim to
create a well-structured, flexible, and high-performing data warehouse that supports
data-driven decision-making and provides valuable insights to users.

Architectural Model

- 2-tier Architecture
The two-tier architecture in data warehousing is often referred to as the client-server
architecture. It is characterized by the physical separation of the data sources and the
data warehouse.

In the client-server architecture for data warehousing:

Client Tier: The client tier represents the front-end or user-facing part of the data
warehouse. It includes analytical tools, query and reporting interfaces, and other
user applications that allow users to interact with and access data.

Server Tier: The server tier serves as the backend or server side of the data
warehouse. It includes the data warehouse database and the ETL (Extract,
Transform, Load) process that populates it.

The client-server architecture physically separates the data sources (where raw data is
collected) from the data warehouse (where integrated and organized data is stored).
The data warehouse resides on the server-side, and users interact with it through
client-side applications.

- 3-tier Architecture
The three-tier approach is the most widely used architecture for data warehouse
systems.

Essentially, it consists of three tiers:

1. The bottom tier is the database of the warehouse, where the cleansed and
transformed data is loaded.

2. The middle tier is the application layer giving an abstracted view of the database.
It arranges the data to make it more suitable for analysis. This is done with an
OLAP server, implemented using the ROLAP or MOLAP model.

3. The top tier is where the user accesses and interacts with the data. It represents
the front-end client layer, where users work with reporting, query, analysis, or
data mining tools.
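
The three tiers can be sketched as three small pieces of Python. This is a toy illustration under stated assumptions, not a real OLAP server: the `sales` table, the `OlapServer` class, and the `render_report` function are hypothetical names, and the middle tier loosely follows the ROLAP idea of translating an analytical request into SQL against a relational store.

```python
import sqlite3

# Bottom tier: the warehouse database holding cleansed, transformed data.
def build_warehouse():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("North", 100.0), ("North", 50.0), ("South", 70.0)],
    )
    return conn

# Middle tier: gives an abstracted, analysis-friendly view of the database.
class OlapServer:
    ALLOWED_DIMENSIONS = {"region"}

    def __init__(self, conn):
        self._conn = conn

    def total_by(self, dimension):
        # Only whitelisted column names are placed into the SQL text.
        if dimension not in self.ALLOWED_DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        query = (
            f"SELECT {dimension}, SUM(amount) FROM sales "
            f"GROUP BY {dimension} ORDER BY {dimension}"
        )
        return self._conn.execute(query).fetchall()

# Top tier: the front-end client that presents results to the user.
def render_report(server):
    return [f"{region}: {total:.2f}" for region, total in server.total_by("region")]

report = render_report(OlapServer(build_warehouse()))
print("\n".join(report))
```

The top tier never touches the `sales` table directly; it only talks to the middle tier, which is the separation the three-tier model describes.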
