Understanding Data Warehousing Concepts

Uploaded by

xmvwjuf

Data Warehouse:

Large organizations often have complex structures, with data stored in different locations, systems, or
formats. For example, data about manufacturing issues might be on one system, while customer
complaints are stored on another. Additionally, companies may buy external data, like mailing lists for
marketing or customer credit scores from credit bureaus, to help make decisions.

Decision-makers need information from all these sources, but setting up separate queries for each
source is difficult and inefficient. Also, these systems often store only current data, while
decision-makers may need historical data to understand trends, like how customer purchase patterns
have changed over the years.

Data warehouses solve these problems by consolidating data from multiple sources and providing
access to both current and historical data for better decision-making.

Definition: A data warehouse is a centralized system designed to store, manage, and analyze large
volumes of data collected from multiple sources. It focuses on providing meaningful insights for
decision-making by organizing data in a way that supports querying and reporting.

According to William H. Inmon (often called the father of data warehousing), “A data warehouse is an
integrated, subject-oriented, non-volatile, time-variant collection of data in support of management’s
decisions.”

Characteristics of a Data Warehouse:

Subject-Oriented: Organized around specific subjects or domains, like sales, marketing, or finance.

Integrated: Integration means establishing a connection between large amounts of data from
multiple databases or sources, so that the data can be used together.

Time-Variant: Maintains historical data for trend analysis and forecasting. Data stored in a data
warehouse is associated with a specific time period and provides information from a historical
perspective.

Non-volatile: In a non-volatile data warehouse, data is permanent, i.e., when new data is inserted,
previous data is not replaced, omitted, or deleted.
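The time-variant and non-volatile properties can be illustrated with a minimal Python sketch. The `PriceRecord` and `History` names and the sample prices are hypothetical, chosen only for illustration; the point is that new rows are appended with the period they refer to, and earlier rows stay untouched.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PriceRecord:
    product: str
    price: float
    valid_from: date  # the time period this value refers to (time-variant)


class History:
    """Append-only store: rows are added, never replaced or deleted."""

    def __init__(self):
        self._rows = []

    def insert(self, record):
        # Non-volatile: inserting new data leaves earlier rows in place.
        self._rows.append(record)

    def as_of(self, product, when):
        """Return the latest record for `product` valid on or before `when`."""
        candidates = [
            r for r in self._rows
            if r.product == product and r.valid_from <= when
        ]
        return max(candidates, key=lambda r: r.valid_from, default=None)

    def __len__(self):
        return len(self._rows)


history = History()
history.insert(PriceRecord("widget", 9.99, date(2022, 1, 1)))
history.insert(PriceRecord("widget", 12.49, date(2023, 1, 1)))

old = history.as_of("widget", date(2022, 6, 30))
new = history.as_of("widget", date(2023, 6, 30))
```

Because the 2022 row is still present after the 2023 row is inserted, a query for mid-2022 still returns the old price, which is what makes trend analysis possible.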

Architecture of Data Warehouse:

The architecture of a data warehouse is designed to facilitate the efficient storage, retrieval, and
analysis of large volumes of data. It integrates data from various sources, organizes it for analysis,
and provides tools for querying and reporting. The architecture typically consists of the following
components.

Data Source: Data is collected in the data warehouse from multiple data sources. There are four
types of data sources:

1. Production Data: Data coming from operational systems is called production data. It
is the major source of data.
2. Archived Data: Data stored in the backup files of operational systems is known as
archived data.
3. Internal Data: Users often store their private spreadsheets, documents, and other files
in a data warehouse in some form. This type of data is referred to as internal data.
4. External Data: External data refers to information that originates outside the
organization and is brought into the data warehouse to complement internal data for
better decision-making. This data is often acquired from third-party sources or public
repositories and provides insights that internal data alone cannot deliver.

Staging Area:

Since the data extracted from the external sources does not follow a particular format, it must be
validated before it is loaded into the data warehouse. For this purpose, it is recommended to
use an ETL tool.

• E (Extract): Data is extracted from the external data source.

• T (Transform): Data is transformed into the standard format.

• L (Load): Data is loaded into the data warehouse after transforming it into the standard format.
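The three ETL steps can be sketched in Python as follows. This is a minimal illustration, not a description of any particular ETL tool: the sample CSV text, the `orders` table, and the function names are all assumptions made for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw export with inconsistent formatting and one invalid row.
RAW = """customer,amount,order_date
 Alice ,10.50,2024-01-05
BOB,7,05/01/2024
,3.00,2024-01-06
"""


def extract(text):
    """E: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_date(value):
    """Accept two assumed source formats and return an ISO date string."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def transform(rows):
    """T: validate rows and convert them to one standard format."""
    clean = []
    for row in rows:
        name = row["customer"].strip().title()
        if not name:
            continue  # validation rule: skip rows without a customer
        clean.append(
            (name, float(row["amount"]), parse_date(row["order_date"].strip()))
        )
    return clean


def load(conn, rows):
    """L: write the standardized rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
loaded = conn.execute(
    "SELECT customer, amount, order_date FROM orders ORDER BY customer"
).fetchall()
```

Only the two valid rows reach the `orders` table, with names and dates normalized to a single format.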

Data Storage:

After cleansing, the data is stored in the data warehouse, which acts as the central repository. It
stores the metadata, while the actual data is stored in the data marts. It stores integrated and
historical data in a centralized location. It organizes data into schemas such as star or snowflake
schemas for optimized querying. A multidimensional model is used here for efficient analysis of data.
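A star schema places one fact table at the center and joins it to surrounding dimension tables. The following Python sketch uses the standard `sqlite3` module; the table names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are hypothetical and serve only to show the shape of the schema and a typical aggregate query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/what/when" of each fact.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, name TEXT, category TEXT
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER
);
-- The fact table holds measures plus a key into each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units      INTEGER,
    revenue    REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_date VALUES (1, 2023, 12), (2, 2024, 1);
INSERT INTO fact_sales VALUES
    (1, 1, 10, 15.0), (1, 2, 20, 30.0), (2, 2, 1, 40.0);
""")

# Analytical query: total revenue sliced by two dimensions.
revenue_by_category_year = conn.execute("""
    SELECT p.category, d.year, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
```

A snowflake schema would differ by further normalizing the dimensions, for example splitting `category` out of `dim_product` into its own table.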

The data stored in this component provides a view of the entire organization, which is why it is
called an Enterprise-wide Data Warehouse (EDW).

Metadata contains information about the data, such as:

• Source and transformation details.

• Schema definitions and relationships.
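As a rough illustration of what such metadata might look like, the Python sketch below records a column's source and transformation alongside its schema definition. The `ColumnMeta` and `TableMeta` classes and the `crm.orders.amount` source name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMeta:
    name: str
    dtype: str            # schema definition
    source: str           # where the column came from
    transformation: str   # how it was derived from the source


@dataclass
class TableMeta:
    table: str
    columns: list = field(default_factory=list)

    def lineage(self, column_name):
        """Describe how a warehouse column was produced."""
        for col in self.columns:
            if col.name == column_name:
                return (
                    f"{col.source} -> {col.transformation} "
                    f"-> {self.table}.{col.name}"
                )
        raise KeyError(column_name)


orders_meta = TableMeta(
    "orders",
    [ColumnMeta("amount_usd", "REAL", "crm.orders.amount",
                "convert currency to USD")],
)
trace = orders_meta.lineage("amount_usd")
```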

Information Delivery:

It delivers data from the EDW or data marts to users in the required form. Multiple information
delivery methods are available, e.g., pie charts and graphs. This information may also be passed
to tools that analyze data, such as data mining and OLAP tools.

Control and Management:

The control and management component is a critical part of a data warehouse architecture. It
oversees the operations, ensures data integrity, and optimizes the overall performance of the data
warehouse system.

Key Functions of the Control and Management Component

1. Data Flow Control:
• Manages the flow of data from source systems to the warehouse and between
internal components.
• Ensures that data extraction, transformation, and loading (ETL) processes are
executed efficiently and in the correct sequence.
2. Job Scheduling:
• Automates ETL tasks, such as data extraction, cleaning, and loading, to ensure timely
updates to the warehouse.
• Schedules reports, dashboards, and queries based on organizational requirements.
3. Data Consistency and Integrity:
• Monitors data quality to ensure accuracy and consistency across all layers of the
warehouse.
• Implements data validation rules and handles error correction during the ETL
process.
4. Resource Management:
• Allocates system resources (CPU, memory, storage) to optimize the performance of
data warehouse operations.
• Balances workloads across the system to prevent bottlenecks.
5. Metadata Management:
• Maintains and utilizes metadata for tracking data lineage, schema definitions, and
query optimization.
• Ensures that users and systems can easily access relevant metadata for better data
understanding.
6. Security Management:
• Enforces access controls to protect sensitive data from unauthorized access.
• Manages user roles, permissions, and authentication mechanisms.
7. Query Optimization:
• Analyzes and improves query execution plans to enhance performance.
• Uses indexing and caching techniques to speed up data retrieval.
8. Monitoring and Reporting:
• Tracks the health and performance of the data warehouse system.
• Provides logs and reports on system usage, failures, and ETL operations for auditing
and troubleshooting.
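Running ETL jobs "in the correct sequence" can be sketched as ordering jobs by their dependencies. The example below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the job names and the no-op actions are hypothetical placeholders, and a real scheduler would also handle timing, retries, and failures.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each job maps to the set of jobs that must finish first.
dependencies = {
    "extract_sales": set(),
    "extract_customers": set(),
    "transform": {"extract_sales", "extract_customers"},
    "load": {"transform"},
    "refresh_reports": {"load"},
}


def run_pipeline(deps, actions):
    """Run every job after its prerequisites and return the execution log."""
    log = []
    for job in TopologicalSorter(deps).static_order():
        actions[job]()    # a real system would do the actual work here
        log.append(job)   # monitoring: record what ran, in order
    return log


# Placeholder actions that do nothing.
actions = {name: (lambda: None) for name in dependencies}
execution_log = run_pipeline(dependencies, actions)
```

The relative order of the two extract jobs is not fixed, since neither depends on the other, but `transform` always runs after both and `refresh_reports` always runs last.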

Initial Loading:
Initial loading is the process of populating a data warehouse with the complete dataset for the first
time. This involves extracting all the relevant data from source systems, transforming it as necessary,
and loading it into the warehouse.

Characteristics:

• Performed when the data warehouse is being set up for the first time.

• Typically involves large volumes of data.

• Takes more time compared to incremental loading due to the size of the dataset.

• Usually done in batch mode, as it is a one-time operation.

Incremental Loading:
Incremental loading is the process of updating the data warehouse by adding only new or modified
data since the last load. It keeps the warehouse current without reloading the entire dataset.

Characteristics:

• Performed regularly (e.g., daily, weekly, or in real time).

• Focuses only on data that has changed (inserted, updated, or deleted).

• Faster and more efficient compared to initial loading.
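The difference between the two loading modes can be sketched in Python. One common way to find changed rows is a "watermark", i.e., the highest modification stamp seen in the previous load; that technique, the `updated_at` field, and the sample rows are assumptions made for this example, not something the text above prescribes. Deleted source rows are not handled in this sketch.

```python
# Hypothetical source rows; `updated_at` is an assumed change counter.
source = [
    {"id": 1, "name": "Alice", "updated_at": 1},
    {"id": 2, "name": "Bob", "updated_at": 2},
]
warehouse = {}


def initial_load(source_rows, target):
    """Copy the complete dataset; return the highest stamp seen."""
    target.clear()
    for row in source_rows:
        target[row["id"]] = dict(row)
    return max((r["updated_at"] for r in source_rows), default=0)


def incremental_load(source_rows, target, last_watermark):
    """Copy only rows changed since the last load."""
    changed = [r for r in source_rows if r["updated_at"] > last_watermark]
    for row in changed:
        target[row["id"]] = dict(row)  # insert new rows, update modified ones
    new_watermark = max(
        (r["updated_at"] for r in changed), default=last_watermark
    )
    return new_watermark, len(changed)


watermark = initial_load(source, warehouse)

# Later, the source changes: one row is modified and one is added.
source[1] = {"id": 2, "name": "Robert", "updated_at": 3}
source.append({"id": 3, "name": "Cara", "updated_at": 4})

watermark, rows_touched = incremental_load(source, warehouse, watermark)
```

Only two rows are processed in the second run, even though the source holds three. For simplicity the sketch overwrites the current copy of a modified row; a warehouse that keeps full history would append a new version instead.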

Data Mart:

A data mart is a subset of a data warehouse that focuses on a specific business area, department, or
domain. It is a smaller, more targeted repository of data designed to address the analytical needs of a
particular group of users.

Key Features of a Data Mart

1. Subject-Specific:

• Focuses on a single subject area, such as sales, marketing, finance, or inventory.

• Example: A marketing data mart contains data like campaign performance and
customer demographics.

2. Smaller in Scale:

• Contains a subset of the data warehouse, making it easier and faster to access.

3. Optimized for Specific Users:

• Tailored for specific business units, ensuring relevance and usability.

4. Independence:

• Can operate independently of the main data warehouse or as part of it.
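A data mart that operates as part of the warehouse can be sketched as a filtered view over warehouse data. The example below uses Python's `sqlite3` module; the `warehouse_facts` table, the `marketing_mart` view, and the sample values are hypothetical. An independent data mart would instead be a separate database loaded directly from source systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-in for the enterprise-wide warehouse.
CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL);
INSERT INTO warehouse_facts VALUES
    ('marketing', 'campaign_clicks', 1200),
    ('marketing', 'campaign_cost', 300),
    ('finance', 'quarterly_revenue', 50000),
    ('sales', 'units_sold', 870);

-- The data mart exposes only the marketing subject area.
CREATE VIEW marketing_mart AS
    SELECT metric, value
    FROM warehouse_facts
    WHERE department = 'marketing';
""")

mart_rows = conn.execute(
    "SELECT metric, value FROM marketing_mart ORDER BY metric"
).fetchall()
```

Marketing users querying `marketing_mart` see two rows instead of four, which reflects the "smaller in scale" and "subject-specific" features listed above.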
