Data warehousing (DW) is the process of collecting and managing data from varied sources to provide
meaningful business insights. A data warehouse is typically used to connect and analyze business data
from heterogeneous sources. It is the core of a BI system built for data
analysis and reporting.
1. Structured
2. Semi-structured
3. Unstructured data
The data is processed, transformed, and ingested so that users can access the processed data in the
Data Warehouse through Business Intelligence tools, SQL clients, and spreadsheets. A data warehouse
merges information coming from different sources into one comprehensive database.
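The merge step described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the two sources (a CRM export in CSV and web-shop orders in JSON), the table names, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source 1: a CSV export from a CRM system.
CRM_CSV = "customer_id,name\n1,Alice\n2,Bob\n"

# Hypothetical source 2: JSON order records from a web shop.
ORDERS_JSON = (
    '[{"customer": 1, "total": "19.50"},'
    ' {"customer": 2, "total": "5.00"},'
    ' {"customer": 1, "total": "10.50"}]'
)


def build_warehouse(crm_csv: str, orders_json: str) -> sqlite3.Connection:
    """Extract both sources, transform them, and load them into one database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")

    # Extract and transform: parse CSV rows and cast the id to an integer.
    customers = [
        (int(row["customer_id"]), row["name"])
        for row in csv.DictReader(io.StringIO(crm_csv))
    ]
    # Extract and transform: parse JSON and cast the string amount to a float.
    orders = [(o["customer"], float(o["total"])) for o in json.loads(orders_json)]

    # Load: write the cleaned rows into the shared warehouse tables.
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    conn.commit()
    return conn


def spend_per_customer(conn: sqlite3.Connection) -> list:
    """Answer a question that needs data from both original sources."""
    query = """
        SELECT c.name, SUM(o.total)
        FROM customers AS c JOIN orders AS o USING (customer_id)
        GROUP BY c.name ORDER BY c.name
    """
    return conn.execute(query).fetchall()
```

Once both sources sit in one database, a single query such as spend_per_customer can combine them, which neither source could answer on its own.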
By merging all of this information in one place, an organization can analyze its customers more
holistically. This helps to ensure that it has considered all the information available. Data warehousing
makes data mining possible. Data mining is looking for patterns in the data that may lead to higher sales
and profits.
An Enterprise Data Warehouse (EDW) is a centralized warehouse. It provides decision support services
across the enterprise. It offers a unified approach for organizing and representing data. It also provides the
ability to classify data according to subject and to grant access according to those divisions.
An Operational Data Store (ODS) is a data store required when neither the data
warehouse nor the OLTP systems support an organization's reporting needs. In an ODS, the data warehouse is
refreshed in real time. Hence, it is widely preferred for routine activities such as storing
employee records.
3. Data Mart:
A data mart is a subset of the data warehouse. It is specially designed for a particular line of business, such
as sales or finance. In an independent data mart, data can be collected directly from sources.
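One simple way to picture a data mart that is fed from the warehouse is as a filtered view over a warehouse table. The sketch below assumes a hypothetical warehouse_facts table and a sales_mart view; real data marts are often separate physical stores, so treat this only as an illustration of the "subset" idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding facts for every department.
conn.execute("CREATE TABLE warehouse_facts (department TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "north", 120.0),
        ("sales", "south", 80.0),
        ("finance", "north", 300.0),
    ],
)

# The data mart: only the rows and columns the sales team needs.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT region, amount FROM warehouse_facts WHERE department = 'sales'"
)

sales_rows = conn.execute(
    "SELECT region, amount FROM sales_mart ORDER BY region"
).fetchall()
```

Here sales_rows contains only the two sales records; the finance row stays in the warehouse but is invisible to users of the mart.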
The following are general stages of use of the data warehouse (DWH):
In this stage, data is just copied from an operational system to another server. In this way, loading,
processing, and reporting of the copied data do not impact the operational system’s performance.
Data in the data warehouse is regularly updated from the operational database. The data in the
data warehouse is mapped and transformed to meet the data warehouse objectives.
In this stage, data warehouses are updated whenever a transaction takes place in the operational
database, for example in an airline or railway booking system.
In this stage, Data Warehouses are updated continuously when the operational system performs a
transaction. The Data warehouse then generates transactions which are passed back to the operational
system.
Load manager: The load manager is also called the front component. It performs all the operations
associated with the extraction and loading of data into the warehouse. These operations include
transformations to prepare the data for entering into the data warehouse.
Warehouse Manager: The warehouse manager performs operations associated with the management of the
data in the warehouse. These include analyzing data to ensure consistency, creating
indexes and views, generating denormalizations and aggregations, transforming and merging source
data, and archiving and backing up data.
Query Manager: The query manager is also known as the backend component. It performs all the
operations related to the management of user queries. Its operations include
directing queries to the appropriate tables and scheduling the execution of queries.
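Several of the warehouse manager tasks listed above (consistency analysis, index and view creation, denormalization, and aggregation) can be illustrated with SQLite. The table names, sample rows, and the functions demo_connection and run_warehouse_manager are assumptions made for this sketch; a real warehouse manager would be a vendor component rather than a hand-written script.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Create a tiny, hypothetical warehouse with one fact and one dimension table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
        INSERT INTO products VALUES (10, 'widget'), (20, 'gadget');
        INSERT INTO sales VALUES (1, 10, 5.0), (2, 10, 7.0), (3, 20, 1.5), (4, 99, 2.0);
    """)
    return conn


def run_warehouse_manager(conn: sqlite3.Connection) -> int:
    """Run the management steps and return the number of inconsistent sales rows."""
    # Consistency analysis: count sales that reference a product that does not exist.
    orphans = conn.execute("""
        SELECT COUNT(*) FROM sales AS s
        LEFT JOIN products AS p ON p.product_id = s.product_id
        WHERE p.product_id IS NULL
    """).fetchone()[0]

    # Index creation: speed up lookups and joins on the product key.
    conn.execute("CREATE INDEX idx_sales_product ON sales (product_id)")

    # Denormalization: a view that flattens the join into one wide row per sale.
    conn.execute("""
        CREATE VIEW sales_wide AS
        SELECT s.sale_id, p.name AS product_name, s.amount
        FROM sales AS s JOIN products AS p ON p.product_id = s.product_id
    """)

    # Aggregation: materialize a summary table from the denormalized view.
    conn.execute("""
        CREATE TABLE sales_by_product AS
        SELECT product_name, SUM(amount) AS total
        FROM sales_wide GROUP BY product_name
    """)
    return orphans
```

In the sample data, sale 4 points at product 99, which has no row in products, so the consistency check reports one orphan and that sale is excluded from the summary table.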
End-user access tools: These are categorized into five groups: 1. Data reporting tools, 2. Query tools,
3. Application development tools, 4. EIS tools, 5. OLAP tools and data mining tools.
Building a Data Warehouse
Business Intelligence has advanced quickly and dramatically in recent years, and
many people are taking advantage of it. To be the most successful and efficient
with this newfound Business Intelligence (BI) power, it’s essential to be able to
analyze and harness ALL of your data. Enter the data warehouse.
Simply put, a data warehouse is a large store of data that’s collected from multiple
different sources within a business. A data warehouse is used as storage for data
analytic work (OLAP systems), leaving the transactional database (OLTP systems)
free to focus on transactions. With a significant amount of data kept in one place,
it’s now easier for businesses to analyze and make better-informed decisions.
There are many ways to go about data warehousing. Our focus in this tutorial,
however, is the benefits of building one and the basic foundation required.
While having all of your data gathered in one place is arguably the biggest benefit
of having a data warehouse, it is certainly not the only one. Here, we’ve listed
some of the other benefits of having a data warehouse:
Save Time – Business users can quickly access data from multiple sources
within a data warehouse, meaning that time won’t be wasted on retrieving
data from multiple sources.
Boost Confidence – Having data transferred automatically to your data
warehouse by a structured system, as opposed to being transferred by human
labor, gives you more confidence that your data is clean, current and
complete.
Increase Insight – Data warehouses structure your data so it’s easily
analyzable.
Improve Security – Managing who has access to your data is much easier
when there’s a centralized connection point. Data warehouses make security
completely customizable, so you’re able to give access to whoever you’d
like and lock down all of your other systems.
When using a data warehouse to its full potential, analyzing data becomes
convenient and answering important questions about your business becomes
simple. Your data is organized and available so you can get your answers quickly
and securely.
Now that you know why it is beneficial to have a data warehouse for your
business, let’s talk about what it takes to build one.
Regardless of the specific approach you take to building a data warehouse, there
are three components that should make up your basic structure: a storage
mechanism, operational software, and human resources.
Storage – This part of the structure is the main foundation — it’s where
your warehouse will live. There are two main options when it comes to
storage: an in-house server (Oracle, Microsoft SQL Server) or the cloud
(Amazon S3, Microsoft Azure). An in-house server is internal hardware
that's set up within your office, and the cloud is a digital storage solution
based on external servers. Either is a feasible option, and the choice
depends on your needs.
Software – This is the operational part of the data warehouse structure. It’s
often broken down into two categories — centralization software and
visualization software. Centralization software is needed to collect and
maintain the data that comes from all of your separate databases.
Visualization software is needed to take the data and present it in a visual
form to aid in analysis. Some centralization software includes
visualization software as part of its package, but it is highly recommended
that you have both types of software regardless.
Labor – This is the management aspect of the data warehouse, something
that’s absolutely essential in having a working solution. To keep your
warehouse functional, it might be necessary to hire for new positions within
your business. Hiring well-skilled professionals is crucial, as running a data
warehouse requires a lot of knowledge. However, if you choose to have a
cloud-based warehouse, it might not be necessary to have as many human
resources. The cloud is managed by third-party vendors, so it’s their
responsibility to do routine maintenance on hardware and servers.
MPPs are good for read-only databases and decision support applications.
Failure is local: if one node fails, the others stay up.
Disadvantages
More coordination is required.
More overhead is required for a process working on a disk belonging to
another node.
If there is a heavy workload of updates or inserts, as in an online transaction
processing system, it may be worthwhile to consider data-dependent routing
to alleviate contention.
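Data-dependent routing means sending each request to the node that owns the data it touches, so that nodes do not compete for the same disk. The following is a minimal sketch of one common way to do this, hash partitioning on a key. The function names, the key format, and the choice of CRC32 are assumptions for illustration and are not taken from any particular MPP product.

```python
import zlib
from collections import defaultdict


def owner_node(key: str, node_count: int) -> int:
    """Map a record key to the node that owns it, using a stable hash."""
    # zlib.crc32 is used instead of hash() because hash() of a str
    # changes between Python processes.
    return zlib.crc32(key.encode("utf-8")) % node_count


def route_updates(updates, node_count: int) -> dict:
    """Group (key, payload) updates into one queue per owning node."""
    queues = defaultdict(list)
    for key, payload in updates:
        queues[owner_node(key, node_count)].append((key, payload))
    return dict(queues)
```

Because the mapping depends only on the key, every update for the same account lands on the same node and is applied there in arrival order, without touching another node's disk.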
Data Warehouse and Database: Differences
The multidimensional data model is a method for organizing data in the database
so that its contents are well arranged and assembled.
The multidimensional data model allows users to pose analytical questions
about market or business trends, unlike relational databases, which let users
access data through queries. It lets users receive answers to their requests
rapidly because the data can be created and examined comparatively quickly.
OLAP (online analytical processing) and data warehousing use multidimensional databases,
which show multiple dimensions of the data to users.
The model represents data in the form of data cubes. Data cubes allow data to be modeled and viewed from
many dimensions and perspectives. A cube is defined by dimensions and facts and is represented by a
fact table. Facts are numerical measures, and the fact table contains the measures (or the names of the
facts) together with keys to the related dimension tables.
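The relationship between a fact table and its dimension tables can be sketched as a small star schema. The tables dim_product, dim_location, and fact_sales, their columns, and the sample rows are hypothetical and exist only to make the structure concrete.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a hypothetical star schema: one fact table and two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);

        -- Each fact row holds keys to the dimensions plus numerical measures.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product (product_id),
            location_id INTEGER REFERENCES dim_location (location_id),
            units_sold INTEGER,
            revenue REAL
        );

        INSERT INTO dim_product VALUES (1, 'laptop'), (2, 'phone');
        INSERT INTO dim_location VALUES (1, 'Delhi'), (2, 'Mumbai');
        INSERT INTO fact_sales VALUES
            (1, 1, 2, 2000.0), (2, 1, 5, 1500.0), (1, 2, 1, 1000.0);
    """)
    return conn


def revenue_by_city(conn: sqlite3.Connection) -> list:
    """View the revenue measure along the location dimension."""
    return conn.execute("""
        SELECT l.city, SUM(f.revenue)
        FROM fact_sales AS f JOIN dim_location AS l ON l.location_id = f.location_id
        GROUP BY l.city ORDER BY l.city
    """).fetchall()


def revenue_by_product(conn: sqlite3.Connection) -> list:
    """View the same measure along the product dimension."""
    return conn.execute("""
        SELECT p.name, SUM(f.revenue)
        FROM fact_sales AS f JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.name ORDER BY p.name
    """).fetchall()
```

The same fact rows answer both questions; only the dimension used for grouping changes, which is what viewing the data from different perspectives amounts to.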
Measures: Measures are numerical data that can be analyzed and compared, such as sales or
revenue. They are typically stored in fact tables in a multidimensional data model.
Dimensions: Dimensions are attributes that describe the measures, such as time, location, or
product. They are typically stored in dimension tables in a multidimensional data model.
Cubes: Cubes are structures that represent the multidimensional relationships between
measures and dimensions in a data model. They provide a fast and efficient way to retrieve and
analyze data.
Aggregation: Aggregation is the process of summarizing data across dimensions and levels of
detail. This is a key feature of multidimensional data models, as it enables users to quickly
analyze data at different levels of granularity.
Drill-down and roll-up: Drill-down is the process of moving from a higher-level summary of
data to a lower level of detail, while roll-up is the opposite process of moving from
lower-level detail to a higher-level summary. These features enable users to explore data in greater
detail and gain insights into the underlying patterns.
Hierarchies: Hierarchies are a way of organizing dimensions into levels of detail. For
example, a time dimension might be organized into years, quarters, months, and days.
Hierarchies provide a way to navigate the data and perform drill-down and roll-up operations.
OLAP (Online Analytical Processing): OLAP is a category of analytical processing built on
multidimensional data models that supports fast and efficient querying of large datasets. OLAP
systems are designed to handle complex queries and provide fast response times.
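Aggregation, hierarchies, roll-up, and drill-down can be shown together on a small time hierarchy. This is a plain-Python sketch under stated assumptions: the SALES rows, the three-level HIERARCHY, and the functions aggregate and drill_down are invented for illustration, and a real OLAP engine would precompute and index such summaries.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, month, amount).
SALES = [
    (2023, "Q1", "Jan", 100),
    (2023, "Q1", "Feb", 150),
    (2023, "Q2", "Apr", 200),
    (2024, "Q1", "Jan", 50),
]

# The time hierarchy, from the coarsest level to the finest.
HIERARCHY = ("year", "quarter", "month")


def aggregate(rows, level: str) -> dict:
    """Sum the amount measure at the given level of the time hierarchy."""
    depth = HIERARCHY.index(level) + 1
    totals = defaultdict(int)
    for row in rows:
        # The key is the path from the top of the hierarchy down to `level`.
        totals[row[:depth]] += row[-1]
    return dict(totals)


def drill_down(rows, parent: tuple, level: str) -> dict:
    """Keep only rows under `parent`, then aggregate at the finer `level`."""
    subset = [row for row in rows if row[:len(parent)] == parent]
    return aggregate(subset, level)
```

Calling aggregate(SALES, "year") rolls the data up to yearly totals, while drill_down(SALES, (2023,), "quarter") breaks the 2023 total into its quarters.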