You are on page 1of 10

What is Data Warehousing?

A Data Warehousing (DW) is process for collecting and managing data from varied sources to provide
meaningful business insights. A Data warehouse is typically used to connect and analyze business data
from heterogeneous sources. The data warehouse is the core of the BI system which is built for data
analysis and reporting.

How Data warehouse works?


A Data Warehouse works as a central repository where information arrives from one or more data
sources. Data flows into a data warehouse from the transactional system and other relational databases.

Data may be:

1. Structured
2. Semi-structured
3. Unstructured data

The data is processed, transformed, and ingested so that users can access the processed data in the
Data Warehouse through Business Intelligence tools, SQL clients, and spreadsheets. A data warehouse
merges information coming from different sources into one comprehensive database.

By merging all of this information in one place, an organization can analyze its customers more
holistically. This helps to ensure that it has considered all the information available. Data warehousing
makes data mining possible. Data mining is looking for patterns in the data that may lead to higher sales
and profits.

Types of Data Warehouse


Three main types of Data Warehouses (DWH) are:

1. Enterprise Data Warehouse (EDW):

Enterprise Data Warehouse (EDW) is a centralized warehouse. It provides decision support service
across the enterprise. It offers a unified approach for organizing and representing data. It also provide the
ability to classify data according to the subject and give access according to those divisions.

2. Operational Data Store:

Operational Data Store, which is also called ODS, are nothing but data store required when neither Data
warehouse nor OLTP systems support organizations reporting needs. In ODS, Data warehouse is
refreshed in real time. Hence, it is widely preferred for routine activities like storing records of the
Employees.

3. Data Mart:

A data mart is a subset of the data warehouse. It specially designed for a particular line of business, such
as sales, finance, sales or finance. In an independent data mart, data can collect directly from sources.

General stages of Data Warehouse


Earlier, organizations started relatively simple use of data warehousing. However, over time, more
sophisticated use of data warehousing begun.

The following are general stages of use of the data warehouse (DWH):

Offline Operational Database:

In this stage, data is just copied from an operational system to another server. In this way, loading,
processing, and reporting of the copied data do not impact the operational system’s performance.

Offline Data Warehouse:

Data in the Datawarehouse is regularly updated from the Operational Database. The data in
Datawarehouse is mapped and transformed to meet the Datawarehouse objectives.

Real time Data Warehouse:

In this stage, Data warehouses are updated whenever any transaction takes place in operational
database. For example, Airline or railway booking system.

Integrated Data Warehouse:

In this stage, Data Warehouses are updated continuously when the operational system performs a
transaction. The Data warehouse then generates transactions which are passed back to the operational
system.

Components of Data warehouse


Four components of Data Warehouses are:

Load manager: Load manager is also called the front component. It performs with all the operations
associated with the extraction and load of data into the warehouse. These operations include
transformations to prepare the data for entering into the Data warehouse.

Warehouse Manager: Warehouse manager performs operations associated with the management of the
data in the warehouse. It performs operations like analysis of data to ensure consistency, creation of
indexes and views, generation of demoralization and aggregations, transformation and merging of source
data and archiving and baking-up data.

Query Manager: Query manager is also known as backend component. It performs all the operation
operations related to the management of user queries. The operations of this Data warehouse component
are direct queries to the appropriate tables for scheduling the execution of queries.

End-user access tools:

This is categorized into five different groups like 1. Data Reporting 2. Query Tools 3. Application
development tools 4. EIS tools, 5. OLAP tools and data mining tools.
Building a Data Warehouse

Business Intelligence has advanced quickly and dramatically in recent years, and
many people are taking advantage of it. To be the most successful and efficient
with this newfound Business Intelligence (BI) power, it’s essential to be able to
analyze and harness ALL of your data. Enter the data warehouse.

Simply put, a data warehouse is a large store of data that’s collected from multiple
different sources within a business. A data warehouse is used as storage for data
analytic work (OLAP systems), leaving the transactional database (OLTP systems)
free to focus on transactions. With a significant amount of data kept in one place,
it’s now easier for businesses to analyze and make better-informed decisions.

There are many ways to go about data warehousing. Our focus in this tutorial,
however, is the benefits of building one and the basic foundation required.

Why Should I Build a Data Warehouse?

While having all of your data gathered in one place is arguably the biggest benefit
of having a data warehouse, it is certainly not the only one. Here, we’ve listed
some of the other benefits of having a data warehouse:

 Save Time – Business users can quickly access data from multiple sources
within a data warehouse, meaning that time won’t be wasted on retrieving
data from multiple sources.
 Boost Confidence – Having data transferred automatically to your data
warehouse by a structured system, as opposed to being transferred by human
labor, gives you more confidence that your data is clean, current and
complete.
 Increase Insight – Data warehouses structure your data so it’s easily
analyzable.
 Improve Security – Managing who has access to your data is much easier
when there’s a centralized connection point. Data warehouses make security
completely customizable, so you’re able to give access to whoever you’d
like and lock down all of your other systems.
When using a data warehouse to its full potential, analyzing data becomes
convenient and answering important questions about your business becomes
simple. Your data is organized and available so you can get your answers quickly
and securely.

Now that you know why it is beneficial to have a data warehouse for your
business, let’s talk about what it takes to build one.

Structure of a Data Warehouse

Regardless of the specific approach, you take to building a data warehouse, there
are three components that should make up your basic structure: A storage
mechanism, operational software, and human resources.

 Storage – This part of the structure is the main foundation — it’s where
your warehouse will live. There are two main options when it comes to
storage, an in-house server (Oracle, Microsoft SQL Server) or on the cloud
(Amazon S3, Microsoft Azure). An in-house server is internal hardware
that’s set up within your office, and the cloud is a digital storage solution
based on external servers. Either is a feasible option when it comes to
storage and all depends on your needs.
 Software – This is the operational part of the data warehouse structure. It’s
often broken down into two categories — centralization software and
visualization software. Centralization software is needed to collect and
maintain the data that comes from all of your separate databases.
Visualization software is needed to take the data and present it in a visual
form to aid in analyzation. Some centralization software includes
visualization software as part of its package, but it is highly recommended
that you have both types of software regardless.
 Labor – This is the management aspect of the data warehouse, something
that’s absolutely essential in having a working solution. To keep your
warehouse functional, it might be necessary to hire new positions within
your business. Hiring well-skilled professionals is crucial, as running a data
warehouse requires a lot of knowledge. However, if you choose to have a
cloud-based warehouse, it might not be necessary to have as many human
resources. The cloud is managed by third-party vendors, so it’s their
responsibility to do routine maintenance on hardware and servers.

Mapping the data warehouse architecture to Multiprocessor


architecture

Data base architectures of parallel processing


 There are three DBMS software architecture styles for parallel processing:
 Shared memory or shared everything Architecture
 Shared disk architecture
 Shred nothing architecture
Shared Memory Architecture:
 Tightly coupled shared memory systems, illustrated in following figure have
the following characteristics:

 Multiple PUs share memory.


 Each PU has full access to all shared memory through a common bus.
 Communication between nodes occurs via shared memory.
 Performance is limited by the bandwidth of the memory bus.
 It is simple to implement and provide a single system image, implementing
an RDBMS on
 SMP(symmetric multiprocessor)
 A disadvantage of shared memory systems for parallel processing is as
follows:
 Scalability is limited by bus bandwidth and latency, and by available
memory.
Shared Disk Architecture
 Shared disk systems are typically loosely coupled. Such systems, illustrated
in following figure, have the following characteristics:

 Each node consists of one or more PUs and associated memory.


 Memory is not shared between nodes.
 Communication occurs over a common high-speed bus.
 Each node has access to the same disks and other resources.
 A node can be an SMP if the hardware supports it.
 Bandwidth of the high-speed bus limits the number of nodes (scalability) of
the system.
 . The Distributed Lock Manager (DLM ) is required.
 Parallel processing advantages of shared disk systems are as follows:
 Shared disk systems permit high availability. All data is accessible even if
one node dies.
 These systems have the concept of one database, which is an advantage over
shared nothing systems.
 Shared disk systems provide for incremental growth.
 Parallel processing disadvantages of shared disk systems are these:

 Inter-node synchronization is required, involving DLM overhead and greater


dependency on high-speed interconnect.
 If the workload is not partitioned well, there may be high synchronization
overhead.
Shared Nothing Architecture
 Shared nothing systems are typically loosely coupled. In shared nothing
systems only one CPU is connected to a given disk. If a table or database is
located on that disk
 Shared nothing systems are concerned with access to disks, not access to
memory.
 Adding more PUs and disks can improve scale up.
 Shared nothing systems have advantages and disadvantages for parallel
processing:
Advantages
 Shared nothing systems provide for incremental growth.
 System growth is practically unlimited.

 MPPs are good for read-only databases and decision support applications.
 Failure is local: if one node fails, the others stay up.

Disadvantages
 More coordination is required.
 More overhead is required for a process working on a disk belonging to
another node.
 If there is a heavy workload of updates or inserts, as in an online transaction
processing system, it may be worthwhile to consider data-dependent routing
to alleviate contention
Data Warehouse and Database: Differences

Data Warehouse Database


Data warehouse stores historic data. The database stores real-time data.
Data are stored based on Extract, Data are collected based on Application usage.
Transform and Load processes.
Data warehouse model types are: Database model types are: Hierarchical database model
Virtual warehouse Data Mart Enterprise Relational model Network model Entity-relationship
data warehouse model Document model Star Schema Object-oriented
database model Entity attribute value model
The data warehouse is a central platform The database is a traditional method of storing data in
for data storage that helps businesses to tables, columns, and rows, which allows data queries
collect and integrate data from various and processing easily.
operational sources.
Data warehouses are used for analyzing Databases are used for capturing data, storing data, and
data and performance reporting. supporting operational processes.
Data must be integrated and balanced Data is balanced within the scope of the process.
from multiple processes.
Data is updated in a scheduled manner. Data is updated when a transaction occurs.
The data warehouse consumes data from The database is designed to be transactional and they
all the databases and creates an are not designed to perform data analytics.
optimized layer to perform data
analytics.
Schema is done or extracted from Databases are structured with a defined schema.
import.

Multi Dimensional Data Model

The multi-Dimensional Data Model is a method which is used for ordering data in the database
along with good arrangement and assembling of the contents in the database.
The Multi Dimensional Data Model allows customers to interrogate analytical questions
associated with market or business trends, unlike relational databases which allow customers to
access data in the form of queries. They allow users to rapidly receive answers to the requests
which they made by creating and examining the data comparatively fast.
OLAP (online analytical processing) and data warehousing uses multi dimensional databases. It
is used to show multiple dimensions of the data to users.
It represents data in the form of data cubes. Data cubes allow to model and view the data from
many dimensions and perspectives. It is defined by dimensions and facts and is represented by a
fact table. Facts are numerical measures and fact tables contain measures of the related
dimensional tables or names of the facts.

Features of multidimensional data models:

Measures: Measures are numerical data that can be analyzed and compared, such as sales or
revenue. They are typically stored in fact tables in a multidimensional data model.
Dimensions: Dimensions are attributes that describe the measures, such as time, location, or
product. They are typically stored in dimension tables in a multidimensional data model.
Cubes: Cubes are structures that represent the multidimensional relationships between
measures and dimensions in a data model. They provide a fast and efficient way to retrieve and
analyze data.
Aggregation: Aggregation is the process of summarizing data across dimensions and levels of
detail. This is a key feature of multidimensional data models, as it enables users to quickly
analyze data at different levels of granularity.
Drill-down and roll-up: Drill-down is the process of moving from a higher-level summary of
data to a lower level of detail, while roll-up is the opposite process of moving from a lower-
level detail to a higher-level summary. These features enable users to explore data in greater
detail and gain insights into the underlying patterns.
Hierarchies: Hierarchies are a way of organizing dimensions into levels of detail. For
example, a time dimension might be organized into years, quarters, months, and days.
Hierarchies provide a way to navigate the data and perform drill-down and roll-up operations.
OLAP (Online Analytical Processing): OLAP is a type of multidimensional data model that
supports fast and efficient querying of large datasets. OLAP systems are designed to handle
complex queries and provide fast response times.

Advantages of Multi Dimensional Data Model

The following are the advantages of a multi-dimensional data model :


 A multi-dimensional data model is easy to handle.
 It is easy to maintain.
 Its performance is better than that of normal databases (e.g. relational databases).
 The representation of data is better than traditional databases. That is because the multi -
dimensional databases are multi-viewed and carry different types of factors.
 It is workable on complex systems and applications, contrary to the simple one-dimensional
database systems.
 The compatibility in this type of database is an upliftment for projects having lower
bandwidth for maintenance staff.
Disadvantages of Multi Dimensional Data Model

The following are the disadvantages of a Multi Dimensional Data Model :


 The multi-dimensional Data Model is slightly complicated in nature and it requires
professionals to recognize and examine the data in the database.
 During the work of a Multi-Dimensional Data Model, when the system caches, there is a
great effect on the working of the system.
 It is complicated in nature due to which the databases are generally dynamic in design.
 The path to achieving the end product is complicated most of the time.
 As the Multi Dimensional Data Model has complicated systems, databases have a large
number of databases due to which the system is very insecure when there is a security break.

You might also like