
Data warehousing Components

Lecture-3,4,5

Dr. Shweta Sharma


School of Computing and Information Technology
Manipal University Jaipur
India
Data warehouse components
Data warehouse architecture and its seven components
1. Data sourcing, clean-up, transformation, and migration tools
2. Metadata repository
3. Warehouse/database technology
4. Data marts
5. Application & tools
6. Data warehouse administration and management
7. Information delivery system
A data warehouse is an environment, not a product. It is based on a relational database
management system that functions as the central repository for informational data.
This central repository is surrounded by a number of key components designed to make
the environment functional, manageable and accessible.
1. Operational and external data (Data sourcing, clean-up, transformation, and migration tools)
The data for the data warehouse comes from operational applications. Data entered into the
data warehouse is transformed into an integrated structure and format. The transformation
process involves conversion, summarization, filtering, and condensation. The data warehouse
must be capable of holding and managing large volumes of data as well as different data
structures over time.
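The four transformation steps named above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not a real ETL tool: the record layout (`region`, `amount_text`, `status`) and the function names `convert`, `keep`, `summarize` and `condense` are hypothetical, chosen only to show conversion, filtering, summarization and condensation in sequence.

```python
from collections import defaultdict

# Hypothetical operational records: amounts arrive as text, some rows are cancelled.
operational_rows = [
    {"region": "north", "amount_text": "120.50", "status": "ok"},
    {"region": "north", "amount_text": "79.50", "status": "ok"},
    {"region": "south", "amount_text": "200.00", "status": "cancelled"},
    {"region": "south", "amount_text": "50.25", "status": "ok"},
]

def convert(row):
    """Conversion: turn the text amount into a number and normalise the region code."""
    return {"region": row["region"].upper(), "amount": float(row["amount_text"]), "status": row["status"]}

def keep(row):
    """Filtering: drop rows that should not reach the warehouse."""
    return row["status"] == "ok"

def summarize(rows):
    """Summarization: total amount and row count per region."""
    totals = defaultdict(lambda: {"total": 0.0, "count": 0})
    for r in rows:
        totals[r["region"]]["total"] += r["amount"]
        totals[r["region"]]["count"] += 1
    return dict(totals)

def condense(summary):
    """Condensation: keep only the fields the warehouse table needs, rounded to cents."""
    return [(region, round(v["total"], 2), v["count"]) for region, v in sorted(summary.items())]

warehouse_rows = condense(summarize(filter(keep, map(convert, operational_rows))))
print(warehouse_rows)  # [('NORTH', 200.0, 2), ('SOUTH', 50.25, 1)]
```

The cancelled row never reaches the summary, and four operational rows are condensed into two warehouse rows.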
2. Metadata repository
• The metadata repository is an integral part of a data warehouse system. It contains the following metadata:

• Definition of data warehouse − It includes the description of the structure of the data warehouse. The
description is defined by schema, views, hierarchies, derived data definitions, and data mart
locations and contents.
• Business metadata − It contains data ownership information, business definitions, and
changing policies.
• Operational metadata − It includes the currency of data and data lineage. Currency of data means
whether the data is active, archived, or purged. Lineage of data means the history of the data's
migration and the transformations applied to it.
• Data for mapping from operational environment to data warehouse − It includes the source
databases and their contents, data extraction, data partitioning, cleaning, transformation rules, data
refresh and purging rules.
• Algorithms for summarization − It includes dimension algorithms, data on granularity,
aggregation, summarizing, etc.
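One way to picture a single repository entry is as a record that holds several of the metadata kinds listed above. The sketch below is hypothetical: the class and field names (`MetadataEntry`, `Currency`, `LineageStep`) are illustrative only, and it models just the structure definition, ownership (business metadata), and currency and lineage (operational metadata).

```python
from dataclasses import dataclass, field
from enum import Enum

class Currency(Enum):
    """Operational metadata: whether the data is active, archived, or purged."""
    ACTIVE = "active"
    ARCHIVED = "archived"
    PURGED = "purged"

@dataclass
class LineageStep:
    """One step in the data's history: where it came from and what was applied."""
    source: str
    transformation: str

@dataclass
class MetadataEntry:
    table: str                      # definition of the warehouse structure
    columns: list[str]
    owner: str                      # business metadata: data ownership
    currency: Currency = Currency.ACTIVE
    lineage: list[LineageStep] = field(default_factory=list)

    def record_step(self, source: str, transformation: str) -> None:
        """Append a lineage step each time the data is migrated or transformed."""
        self.lineage.append(LineageStep(source, transformation))

    def history(self) -> list[str]:
        """Readable lineage, oldest step first."""
        return [f"{s.source} -> {s.transformation}" for s in self.lineage]

entry = MetadataEntry("sales_fact", ["region", "total", "count"], owner="Sales department")
entry.record_step("orders_db.orders", "convert amount text to decimal")
entry.record_step("staging.orders_clean", "summarize by region")
entry.currency = Currency.ARCHIVED
print(entry.currency.value, entry.history())
```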
3. Data warehouse DBMS
Metadata is used for maintaining, managing, and using the data
warehouse. It is classified into two types:
1. Technical Metadata: It contains information about warehouse data that is used by
warehouse designers and administrators to carry out development and management
tasks. It includes:
• Information about data stores.
• Transformation descriptions, that is, mapping methods from the operational
database to the warehouse database.
• Warehouse object and data structure definitions for target data.
• The rules used to perform clean-up and data enhancement.
• Data mapping operations.
• Access authorization, backup history, archive history, information delivery
history, data acquisition history, data access, etc.
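Technical metadata such as mapping methods and clean-up rules is most useful when the load process reads it as data rather than having it hard-coded. The following sketch assumes a hypothetical mapping table (`MAPPING`) from operational column names to warehouse column names, each paired with a clean-up rule; none of these names come from a specific tool.

```python
# Hypothetical technical metadata: operational column -> (warehouse column, clean-up rule).
MAPPING = {
    "cust_nm": ("customer_name", str.strip),
    "cust_city": ("city", lambda v: v.strip().title()),
    "ord_amt": ("order_amount", lambda v: round(float(v), 2)),
}

def apply_mapping(operational_row: dict) -> dict:
    """Map one operational row to the warehouse layout using only the metadata above."""
    target = {}
    for src_col, (dst_col, clean) in MAPPING.items():
        if src_col in operational_row:
            target[dst_col] = clean(operational_row[src_col])
    return target  # columns not described in MAPPING are simply not loaded

row = {"cust_nm": "  Asha Rao ", "cust_city": " jaipur", "ord_amt": "1499.5", "unused": "x"}
print(apply_mapping(row))
```

Because the mapping lives in one table, a change in the operational schema becomes a metadata change rather than a code change.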
2. Business Metadata: It contains information that describes the data stored in the
data warehouse to users. It includes:
• Subject areas and information object types, including queries, reports, images,
video, audio clips, etc.
• Internet home pages.
• Information related to the information delivery system.
• Data warehouse operational information such as ownership, audit trails, etc.
• Metadata helps users understand the content and find the data.
• Metadata is stored in a separate data store, known as the information directory or
metadata repository, which helps to integrate, maintain, and view the contents of the
data warehouse.
The following are the characteristics of the information directory/metadata:
• It is the gateway to the data warehouse environment.
• It supports easy distribution and replication of content for high performance and
availability.
• It should be searchable by business-oriented keywords.
• It should act as a launch platform for end users to access data and analysis tools.
• It should support the sharing of information.
• It should support scheduling options for requests.
• It should support and provide an interface to other applications.
• It should support end-user monitoring of the status of the data warehouse environment.
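The requirement that the directory be searchable by business-oriented keywords can be illustrated with a tiny in-memory catalogue. This is a hypothetical sketch: the entries, object names and the `search` function are invented for illustration, and a real repository would store this in a database.

```python
# Hypothetical information directory: each entry describes one warehouse object
# in business terms so end users can find it without knowing table names.
DIRECTORY = [
    {"object": "sales_fact", "kind": "table", "keywords": {"revenue", "sales", "region"}},
    {"object": "monthly_sales_report", "kind": "report", "keywords": {"sales", "monthly", "trend"}},
    {"object": "hr_headcount", "kind": "table", "keywords": {"employees", "headcount", "hr"}},
]

def search(*terms: str) -> list[str]:
    """Return objects whose business keywords contain every search term (case-insensitive)."""
    wanted = {t.lower() for t in terms}
    return [e["object"] for e in DIRECTORY if wanted <= e["keywords"]]

print(search("Sales"))             # ['sales_fact', 'monthly_sales_report']
print(search("sales", "monthly"))  # ['monthly_sales_report']
```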
4. Data marts
A data mart is focused on a single functional area of an organization and contains a subset
of the data stored in a data warehouse. A data mart is a condensed version of a data warehouse
and is designed for use by a specific department, unit, or set of users in an organization,
e.g., Marketing, Sales, HR, or Finance. It is often controlled by a single department in an
organization. The factors that have led to data marts include:
• Extremely urgent user requirements.
• The absence of a budget for a full-scale data warehouse strategy.
• The decentralization of business needs.
• The attraction of easy-to-use tools and mind-sized projects.
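The idea that a data mart is a department-specific subset of the warehouse can be sketched as a row filter plus a column projection. The warehouse rows, the `department` column and the `build_data_mart` function are hypothetical; a real implementation would typically run the equivalent SQL against the warehouse.

```python
# Hypothetical enterprise warehouse rows covering several departments.
WAREHOUSE = [
    {"department": "Marketing", "campaign": "Spring", "spend": 1200, "employee_id": 7},
    {"department": "Sales", "campaign": None, "spend": 0, "employee_id": 3},
    {"department": "Marketing", "campaign": "Autumn", "spend": 3400, "employee_id": 9},
    {"department": "HR", "campaign": None, "spend": 0, "employee_id": 5},
]

def build_data_mart(rows, department, columns):
    """Keep only one department's rows and only the columns that department uses."""
    return [{c: r[c] for c in columns} for r in rows if r["department"] == department]

marketing_mart = build_data_mart(WAREHOUSE, "Marketing", ["campaign", "spend"])
print(marketing_mart)
```

The smaller result set is what gives a data mart its faster response time for that department's queries.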
Why do we need a Data Mart?
• A data mart improves user response time because it holds a smaller volume of data.
• It provides easy access to frequently requested data.
• Data marts are simpler to implement than a corporate data warehouse. At the same time,
the cost of implementing a data mart is generally lower than that of implementing a full
data warehouse.
• Compared to a data warehouse, a data mart is agile. If the model changes, a data mart
can be rebuilt more quickly because of its smaller size.
• A data mart is defined by a single subject matter expert (SME). In contrast, a data
warehouse is defined by interdisciplinary SMEs from a variety of domains. Hence, a data
mart is more open to change than a data warehouse.
• Data is partitioned, which allows very granular access control privileges.
• Data can be segmented and stored on different hardware/software platforms.
Types of Data Mart
• There are three main types of data mart:
1. Dependent: Dependent data marts are created by drawing data from an existing central
data warehouse.
2. Independent: An independent data mart is created without the use of a central data
warehouse; it draws data directly from operational sources, external sources, or both.
3. Hybrid: This type of data mart can take data from data warehouses as well as from
operational systems.
1. Dependent Data Mart
• A dependent data mart sources an organization's data from a single data warehouse.
It offers the benefit of centralization. If you need to develop one or more physical
data marts, you need to configure them as dependent data marts.
2. Independent Data Mart
• An independent data mart is created without the use of a central data warehouse.
This kind of data mart is an ideal option for smaller groups within an organization.
• An independent data mart has no relationship with the enterprise data warehouse
or with any other data mart. In an independent data mart, the data is input
separately, and its analyses are also performed autonomously.
3. Hybrid Data Mart
• A hybrid data mart combines input from the data warehouse with input from other
sources. This can be helpful when you want ad-hoc integration, for example after a
new group or product is added to the organization.
• It is well suited for multiple database environments and offers a fast implementation
turnaround for any organization. It also requires the least data cleansing effort.
A hybrid data mart also supports large storage.
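The three types differ mainly in where their data comes from. The sketch below makes that difference explicit with two hypothetical in-memory sources; `CENTRAL_WAREHOUSE`, `OPERATIONAL_SYSTEM` and the three builder functions are illustrative names, not part of any product.

```python
# Hypothetical sources: a central warehouse and an operational system.
CENTRAL_WAREHOUSE = [{"product": "A", "units": 10}, {"product": "B", "units": 4}]
OPERATIONAL_SYSTEM = [{"product": "C", "units": 7}]  # e.g. a newly added product line

def dependent_mart():
    """Dependent: sourced only from the central data warehouse."""
    return list(CENTRAL_WAREHOUSE)

def independent_mart():
    """Independent: sourced directly from operational/external systems, no warehouse."""
    return list(OPERATIONAL_SYSTEM)

def hybrid_mart():
    """Hybrid: warehouse data plus data from other sources, e.g. for ad-hoc integration."""
    return list(CENTRAL_WAREHOUSE) + list(OPERATIONAL_SYSTEM)

for build in (dependent_mart, independent_mart, hybrid_mart):
    print(build.__name__, [r["product"] for r in build()])
```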
Steps in Implementing a Datamart
5. Application & Tools
• Data mining queries are useful for many purposes. You can:
• Apply the model to new data, to make single or multiple predictions. You can
provide input values as parameters, or in a batch.
• Get a statistical summary of the data used for training.
• Extract patterns and rules, or generate a profile of the typical case representing
a pattern in the model.
• Extract regression formulas and other calculations that explain patterns.
• Get the cases that fit a particular pattern.
• Retrieve details about individual cases used in the model, including data not
used in the analysis.
• Retrain a model by adding new data, or performing cross-prediction.
The purpose of this component is to provide information to business users for decision making. There are five
categories:
• Data query and reporting tools.
• Application development tools.
• Executive information system (EIS) tools.
• OLAP tools.
• Data mining tools.
Query and reporting tools are used to generate queries and reports.
• There are two types of reporting tools. They are:
• Production reporting tools, used to generate regular operational reports.
• Desktop report writers, which are inexpensive desktop tools designed for end users.
• Managed query tools: used to generate SQL queries. They use metalayer software between
users and databases, which offers point-and-click creation of SQL statements. These tools
are a preferred choice for segment identification, demographic analysis, territory
management, preparation of customer mailing lists, etc.
• Application development tools: a graphical data access environment that integrates
OLAP tools with the data warehouse and can be used to access all database systems.
• OLAP tools: used to analyze data in multidimensional and complex views. To enable
multidimensional properties they use MDDB and MRDB, where MDDB refers to a
multidimensional database and MRDB refers to a multirelational database.
• Data mining tools: used to discover knowledge from data warehouse data; they can
also be used for data visualization and data correction purposes.
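What "multidimensional views" means for OLAP tools can be illustrated with two basic operations, roll-up and slice, on a tiny hypothetical cube. This in-memory sketch is neither an MDDB nor an MRDB; the fact rows, `DIMENSIONS`, `roll_up` and `slice_` are illustrative names only.

```python
from collections import defaultdict

# Hypothetical fact rows with three dimensions (year, region, product) and one measure.
FACTS = [
    ("2023", "North", "Laptop", 120),
    ("2023", "North", "Phone", 80),
    ("2023", "South", "Laptop", 60),
    ("2024", "North", "Laptop", 150),
    ("2024", "South", "Phone", 90),
]
DIMENSIONS = ("year", "region", "product")

def roll_up(facts, *keep):
    """Aggregate the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    cube = defaultdict(int)
    for row in facts:
        cube[tuple(row[i] for i in idx)] += row[-1]
    return dict(cube)

def slice_(facts, dimension, value):
    """Fix one dimension to a single value, giving a sub-cube."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]

print(roll_up(FACTS, "year"))                                # totals per year
print(roll_up(slice_(FACTS, "region", "North"), "product"))  # North only, per product
```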
6. Data warehouse administration and management
The management of a data warehouse includes:
• Security and priority management.
• Monitoring updates from multiple sources.
• Data quality checks.
• Managing and updating metadata.
• Auditing and reporting data warehouse usage and status.
• Purging data.
• Replicating, subsetting and distributing data.
• Backup and recovery.
• Data warehouse storage management, which includes capacity planning, hierarchical
storage management and purging of aged data, etc.
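Purging aged data is usually driven by a retention policy. The sketch below assumes a hypothetical two-year policy (`RETENTION`) and a `loaded_on` date on each row; in practice the purged rows would normally be archived first, as part of backup and recovery, rather than simply discarded.

```python
from datetime import date, timedelta

# Hypothetical retention policy: rows older than this are purged from online storage.
RETENTION = timedelta(days=365 * 2)

rows = [
    {"id": 1, "loaded_on": date(2021, 1, 15)},
    {"id": 2, "loaded_on": date(2023, 6, 1)},
    {"id": 3, "loaded_on": date(2024, 3, 10)},
]

def purge_aged(rows, today, retention=RETENTION):
    """Split rows into (kept, purged) according to their load date."""
    cutoff = today - retention
    kept = [r for r in rows if r["loaded_on"] >= cutoff]
    purged = [r for r in rows if r["loaded_on"] < cutoff]
    return kept, purged

kept, purged = purge_aged(rows, today=date(2024, 6, 30))
print([r["id"] for r in kept], [r["id"] for r in purged])
```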
7. Information delivery system
• It enables users to subscribe to data warehouse information.
• It delivers information to one or more destinations according to a specified
scheduling algorithm.
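Subscription-based delivery can be sketched with the simplest possible scheduling rule, a fixed interval per subscription. The `Subscription` class, the report names and the destinations are hypothetical; a production system would use a real scheduler and delivery channels.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Subscription:
    """A user's subscription to one warehouse report (hypothetical model)."""
    report: str
    destinations: list[str]  # e.g. e-mail addresses or shared folders
    every: timedelta         # simple fixed-interval schedule
    last_sent: datetime

def due_deliveries(subs, now):
    """Return (report, destination) pairs whose interval has elapsed, and mark them sent."""
    out = []
    for s in subs:
        if now - s.last_sent >= s.every:
            out.extend((s.report, d) for d in s.destinations)
            s.last_sent = now
    return out

subs = [
    Subscription("daily_sales", ["ops@example.com", "/shared/sales"], timedelta(days=1), datetime(2024, 5, 1, 8)),
    Subscription("weekly_hr", ["hr@example.com"], timedelta(weeks=1), datetime(2024, 4, 29, 8)),
]
sent_now = due_deliveries(subs, now=datetime(2024, 5, 2, 8))
print(sent_now)  # only the daily report is due; the weekly one waits
```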
Data extraction, clean-up, transformation and migration
• Proper attention must be paid to data extraction, which is a success factor for a
data warehouse architecture. When implementing a data warehouse, the following
selection criteria, which affect the ability to transform, consolidate, integrate and
repair the data, should be considered:
• Timeliness of data delivery to the warehouse.
• The tool must have the ability to identify the particular data that can be read by the conversion
tool.
• The tool must support flat files and indexed files, since corporate data is still stored in these formats.
• The tool must have the capability to merge data from multiple data stores.
• The tool should have a specification interface to indicate the data to be extracted.
• The tool should have the ability to read data from data dictionary.
• The code generated by the tool should be completely maintainable.
• The tool should permit the user to extract the required data.
• The tool must have the facility to perform data type and character set translation.
• The tool must have the capability to create summarization, aggregation and derivation of records.
• The data warehouse database system must be able to load data directly from these
tools.
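Several of these criteria (reading flat files, character set translation, data type translation, merging data from multiple stores, and deriving a summary) can be seen together in a short sketch. Both sources are hypothetical: a Latin-1 encoded, semicolon-separated legacy export with decimal commas, and a UTF-8 CSV with a different column layout.

```python
import csv
import io

# Hypothetical legacy flat file exported in Latin-1 (a common non-UTF-8 character set).
legacy_bytes = "id;name;amount\n1;José;10,50\n2;Zoë;4,25\n".encode("latin-1")
# Hypothetical second data store, already UTF-8 text with a different layout.
modern_text = "customer_id,customer_name,amount\n3,Ravi,7.00\n"

def read_legacy(raw: bytes) -> list[dict]:
    """Character set translation (Latin-1 -> str) and data type translation (decimal comma -> float)."""
    text = raw.decode("latin-1")
    rows = csv.DictReader(io.StringIO(text), delimiter=";")
    return [{"id": int(r["id"]), "name": r["name"], "amount": float(r["amount"].replace(",", "."))} for r in rows]

def read_modern(text: str) -> list[dict]:
    """Map the second store's column names onto the same target layout."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"id": int(r["customer_id"]), "name": r["customer_name"], "amount": float(r["amount"])} for r in rows]

# Merge data from multiple stores into one layout, then derive a summary value.
merged = sorted(read_legacy(legacy_bytes) + read_modern(modern_text), key=lambda r: r["id"])
total = round(sum(r["amount"] for r in merged), 2)
print(merged)
print("total amount:", total)
```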
Benefits of data warehousing
• Data warehouse usage includes:
– Locating the right information
– Presentation of information
– Testing of hypotheses
– Discovery of information
– Sharing the analysis
The benefits can be classified into two types:
Tangible benefits (quantifiable/measurable). These include:
– Improvement in product inventory
– Reduction in production cost
– Improvement in selection of target markets
– Enhancement in asset and liability management
Intangible benefits (not easily quantified). These include:
– Improvement in productivity by keeping all data in a single location and
eliminating rekeying of data
– Reduced redundant processing
– Enhanced customer relations
