
Data warehousing Components

Lecture-3,4,5

Dr. Sumit Dhariwal


School of Computing and Information Technology
Manipal University Jaipur
India
Data warehouse components
Data warehouse architecture and its seven components
1. Data sourcing, clean-up, transformation, and migration tools
2. Metadata repository
3. Warehouse/database technology
4. Data marts
5. Application & tools
6. Data warehouse administration and management
7. Information delivery system
A data warehouse is an environment, not a product. It is based on a
relational database management system that functions as the central
repository for informational data. This central repository is
surrounded by a number of key components designed to make the
environment functional, manageable, and accessible.
1. Operational and external data (Data sourcing,
clean-up, transformation, and migration tools)
The data for the data warehouse comes from operational applications.
As data enters the data warehouse, it is transformed into an integrated
structure and format. The transformation process involves conversion,
summarization, filtering, and condensation. The data warehouse must be
capable of holding and managing large volumes of data, as well as
different data structures, over time.
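The four transformation steps named above can be illustrated with a minimal sketch. It assumes hypothetical operational order records (field names such as `amount_cents` and `status` are invented for illustration) and one possible reading of each step; a real warehouse load would be driven by the mapping rules kept in the metadata repository.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical operational records: dates as strings, amounts in cents.
OPERATIONAL_ORDERS = [
    {"order_id": 1, "date": "2024-01-15", "region": "north", "amount_cents": 12500, "status": "complete"},
    {"order_id": 2, "date": "2024-01-20", "region": "north", "amount_cents": 7500, "status": "cancelled"},
    {"order_id": 3, "date": "2024-02-03", "region": "south", "amount_cents": 20000, "status": "complete"},
    {"order_id": 4, "date": "2024-02-10", "region": "north", "amount_cents": 5000, "status": "complete"},
]


def convert(record):
    """Conversion: map source formats onto the warehouse's integrated format."""
    return {
        "month": datetime.strptime(record["date"], "%Y-%m-%d").strftime("%Y-%m"),
        "region": record["region"].upper(),
        "amount": record["amount_cents"] / 100,  # cents -> currency units
        "status": record["status"],
    }


def is_relevant(record):
    """Filtering: keep only completed orders."""
    return record["status"] == "complete"


def summarize(records):
    """Summarization: total amount per (month, region)."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["month"], r["region"])] += r["amount"]
    return dict(totals)


def condense(totals):
    """Condensation: compact (month, region, total) rows, sorted, empty totals dropped."""
    return [(month, region, round(amount, 2))
            for (month, region), amount in sorted(totals.items()) if amount > 0]


def transform(orders):
    converted = [convert(o) for o in orders]
    filtered = [r for r in converted if is_relevant(r)]
    return condense(summarize(filtered))


if __name__ == "__main__":
    for row in transform(OPERATIONAL_ORDERS):
        print(row)
```

The cancelled order is dropped before summarizing, so it never reaches the warehouse totals.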
2. Metadata repository
• Metadata repository is an integral part of a data warehouse system. It has the
following metadata −
• Definition of data warehouse − It includes the description of the structure of the data
warehouse. The description is defined by schemas, views, hierarchies, derived data
definitions, and data mart locations and contents.
• Business metadata − It contains data ownership information, business
definitions, and changing policies.
• Operational metadata − It includes the currency of data and data lineage. Currency
of data means whether the data is active, archived, or purged. Lineage of data
means the history of the data's migration and the transformations applied to it.
• Data for mapping from operational environment to data warehouse − It
includes the source databases and their contents, data extraction, data partitioning,
cleaning, transformation rules, and data refresh and purging rules.
• Algorithms for summarization − It includes dimension algorithms, data on
granularity, aggregation, summarizing, etc.
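To make the operational metadata above concrete, here is a minimal in-memory sketch of a repository entry that records ownership, currency (active, archived, or purged), and lineage. All class and field names are hypothetical; real repositories are usually tables in a database or a dedicated metadata product.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Currency(Enum):
    """Operational metadata: currency of data."""
    ACTIVE = "active"
    ARCHIVED = "archived"
    PURGED = "purged"


@dataclass
class LineageStep:
    """One migration/transformation step in a table's history."""
    source: str
    transformation: str


@dataclass
class TableMetadata:
    name: str                  # structure: warehouse table name
    owner: str                 # business metadata: data ownership
    business_definition: str   # business metadata: meaning of the data
    currency: Currency = Currency.ACTIVE                       # operational metadata
    lineage: List[LineageStep] = field(default_factory=list)  # operational metadata

    def record_step(self, source: str, transformation: str) -> None:
        self.lineage.append(LineageStep(source, transformation))

    def lineage_summary(self) -> str:
        return " -> ".join(f"{s.source} [{s.transformation}]" for s in self.lineage)


class MetadataRepository:
    """In-memory stand-in for a metadata repository (hypothetical)."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableMetadata] = {}

    def register(self, meta: TableMetadata) -> None:
        self._tables[meta.name] = meta

    def get(self, name: str) -> TableMetadata:
        return self._tables[name]

    def tables_with_currency(self, currency: Currency) -> List[str]:
        return sorted(n for n, m in self._tables.items() if m.currency is currency)
```

A lineage query such as `repo.get("sales_fact").lineage_summary()` then answers "where did this data come from, and what was done to it?"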
3. Data warehouse DBMS
Metadata (data about data) is used for maintaining, managing, and using the data
warehouse. It is classified into two types:
1. Technical Metadata: It contains information about
warehouse data that is used by warehouse designers and
administrators to carry out development and
management tasks. It includes:
• Info about data stores.
• Transformation descriptions, that is, mapping
methods from the operational DB to the warehouse DB.
• Warehouse object and data structure definitions for
target data.
• The rules used to perform clean-up and data
enhancement.
• Data mapping operations.
• Access authorization, backup history, archive history,
info delivery history, data acquisition history, data
access, etc.
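A mapping method from the operational DB to the warehouse DB can be stored as plain technical metadata and interpreted by the load process. The sketch below assumes hypothetical column names (`cust_id`, `cust_nm`, `ctry`) and a small, made-up set of named transformation rules.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical technical metadata: (target column, source column, rule name).
MAPPING: List[Tuple[str, str, str]] = [
    ("customer_key", "cust_id", "to_int"),
    ("customer_name", "cust_nm", "title_case"),
    ("country_code", "ctry", "upper"),
]

# Named transformation rules the mapping may refer to (illustrative only).
TRANSFORMS: Dict[str, Callable[[str], object]] = {
    "to_int": lambda s: int(s.strip()),
    "title_case": lambda s: s.strip().title(),
    "upper": lambda s: s.strip().upper(),
}


def apply_mapping(source_row: Dict[str, str], mapping=MAPPING) -> Dict[str, object]:
    """Build a warehouse row by applying each mapping rule to its source column."""
    return {target: TRANSFORMS[rule](source_row[source]) for target, source, rule in mapping}


def describe_mapping(mapping=MAPPING) -> List[str]:
    """Human-readable form of the same metadata, e.g. for administrators."""
    return [f"{source} -> {target} via {rule}" for target, source, rule in mapping]
```

Keeping the mapping as data rather than hard-coded logic means the same entries can both drive the load and document it.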
2. Business Metadata: It contains info that describes the data
stored in the data warehouse to users. It includes:
• Subject areas and info object types, including
queries, reports, images, video, audio clips, etc.
• Internet home pages.
• Info related to the info delivery system.
• Data warehouse operational info such as
ownership, audit trails, etc.
• Metadata helps users to understand the
content and find the data.
• Metadata is stored in a separate data store,
known as the information directory or metadata
repository, which helps to integrate, maintain,
and view the contents of the data warehouse.
The following are the characteristics of the info directory/metadata:
• It is the gateway to the data warehouse environment.
• It supports easy distribution and replication of content for high performance and
availability.
• It should be searchable by business-oriented keywords.
• It should act as a launch platform for end users to access data and analysis tools.
• It should support the sharing of info.
• It should support scheduling options for requests.
• It should support and provide an interface to other applications.
• It should support end-user monitoring of the status of the data warehouse
environment.
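Searching the info directory by business-oriented keywords can be sketched with a simple inverted index over business metadata descriptions. The entries below are hypothetical, and the AND-style matching is one design choice among several (ranking or synonym expansion are common alternatives).

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical business metadata entries in an information directory.
ENTRIES: Dict[str, str] = {
    "sales_by_region": "Monthly sales revenue by region for the marketing team",
    "customer_churn": "Customer churn report with retention metrics",
    "inventory_levels": "Daily product inventory levels per warehouse",
}


def tokenize(text: str) -> List[str]:
    """Lower-case words; underscores split names like 'sales_by_region'."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entries: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each keyword to the directory entries whose name or description contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, description in entries.items():
        for word in tokenize(description) + tokenize(name):
            index[word].add(name)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return entries matching every keyword in the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))
```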
4. Data marts
A data mart is focused on a single functional area of an organization and
contains a subset of the data stored in a data warehouse. A data mart is
a condensed version of a data warehouse and is designed for use by a
specific department, unit, or set of users in an organization, e.g.,
marketing, sales, HR, or finance. It is often controlled by a single
department in an organization.
Reasons for creating a data mart:
• Extremely urgent user requirements.
• The absence of a budget for a full-scale data
warehouse strategy.
• The decentralization of business needs.
• The attraction of easy-to-use tools and mind-sized
projects.
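The idea of a data mart as a department-specific subset of the warehouse can be shown in a few lines. The rows and column names are hypothetical; in practice the subset would usually be built with SQL views or a separate schema.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical warehouse rows spanning several departments.
WAREHOUSE_FACTS: List[Row] = [
    {"department": "marketing", "campaign": "spring", "cost": 1200.0, "leads": 340},
    {"department": "sales", "region": "north", "revenue": 56000.0},
    {"department": "marketing", "campaign": "summer", "cost": 800.0, "leads": 150},
    {"department": "hr", "headcount": 42},
]


def build_data_mart(facts: List[Row], department: str, columns: List[str]) -> List[Row]:
    """Keep only one department's rows, and only the columns that department needs."""
    return [{c: row[c] for c in columns} for row in facts if row["department"] == department]
```

The result is smaller than the warehouse in both rows and columns, which is where the response-time benefit of a data mart comes from.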
Why do we need a data mart?
• A data mart improves users' response time because of the reduced volume of
data.
• It provides easy access to frequently requested data.
• Data marts are simpler to implement than a corporate data warehouse. At the
same time, the cost of implementing a data mart is typically lower than that of
implementing a full data warehouse.
• Compared to a data warehouse, a data mart is agile: if the model changes, a
data mart can be rebuilt more quickly because of its smaller size.
• A data mart is defined by a single subject matter expert (SME). In contrast, a data
warehouse is defined by interdisciplinary SMEs from a variety of domains.
Hence, a data mart is more open to change than a data warehouse.
• Data is partitioned, which allows very granular access control privileges.
• Data can be segmented and stored on different hardware/software platforms.
Types of Data Mart
• There are three main types of data mart:
1. Dependent: Dependent data marts are created by drawing data
from a central data warehouse.
2. Independent: An independent data mart is created without the use of
a central data warehouse, drawing data directly from operational, external,
or both sources.
3. Hybrid: This type of data mart can take data from data
warehouses or operational systems.
1. Dependent Data Mart
• A dependent data mart sources an organization's data from a
single data warehouse. This type of data mart offers the benefit
of centralization. If you need to develop one or more physical
data marts, you need to configure them as dependent data marts.
2. Independent Data Mart
• An independent data mart is created without the use of a central
data warehouse. This kind of data mart is an ideal option for smaller
groups within an organization.
• An independent data mart has no relationship with the enterprise
data warehouse or with any other data mart. In an independent data
mart, the data is input separately, and its analyses are also performed
autonomously.
3. Hybrid Data Mart
• A hybrid data mart combines input from the data warehouse with
input from other sources. This can be helpful when you want ad hoc
integration, for example after a new group or product is added to
the organization.
• It is well suited to multiple-database environments and offers a
fast implementation turnaround for any organization. It also requires
the least data cleansing effort. Hybrid data marts also support large
storage.
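The difference between the three types comes down to where a data mart's rows come from. This hypothetical sketch tags each row with its origin so the effect of each type is visible; the names `MartType` and `load_mart` are invented for illustration.

```python
from enum import Enum
from typing import Dict, Iterable, List

Row = Dict[str, object]


class MartType(Enum):
    DEPENDENT = "dependent"      # sourced only from the central data warehouse
    INDEPENDENT = "independent"  # sourced directly from operational/external systems
    HYBRID = "hybrid"            # warehouse data plus other sources


def load_mart(mart_type: MartType,
              warehouse_rows: Iterable[Row],
              operational_rows: Iterable[Row]) -> List[Row]:
    """Select source rows according to the data mart type, tagging each with its origin."""
    from_warehouse = [{**r, "origin": "warehouse"} for r in warehouse_rows]
    from_operational = [{**r, "origin": "operational"} for r in operational_rows]
    if mart_type is MartType.DEPENDENT:
        return from_warehouse
    if mart_type is MartType.INDEPENDENT:
        return from_operational
    return from_warehouse + from_operational  # MartType.HYBRID
```

A real hybrid load would also reconcile overlapping keys between the two sources; this sketch simply concatenates them.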
Steps in Implementing a Datamart
5. Application & Tools
• Data mining queries are useful for many purposes. You can:
• Apply the model to new data, to make single or multiple predictions.
You can provide input values as parameters, or in a batch.
• Get a statistical summary of the data used for training.
• Extract patterns and rules, or generate a profile of the typical case
representing a pattern in the model.
• Extract regression formulas and other calculations that explain
patterns.
• Get the cases that fit a particular pattern.
• Retrieve details about individual cases used in the model, including
data not used in the analysis.
• Retrain a model by adding new data, or performing cross-prediction.
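Several of the query purposes listed above (single and batch prediction, a statistical summary of training data, extracting a regression formula, retraining with new data) can be illustrated on a toy model. This is a sketch only: a plain least-squares line stands in for a data mining model, and it does not represent any particular product's query language.

```python
from statistics import mean
from typing import Dict, List


class LinearModel:
    """Least-squares line y = a + b*x, used here as a stand-in mining model."""

    def __init__(self) -> None:
        self.xs: List[float] = []
        self.ys: List[float] = []
        self.intercept = 0.0
        self.slope = 0.0

    def train(self, xs: List[float], ys: List[float]) -> "LinearModel":
        """Train, or retrain by adding new cases to the existing training data."""
        self.xs.extend(xs)
        self.ys.extend(ys)
        mx, my = mean(self.xs), mean(self.ys)
        sxx = sum((x - mx) ** 2 for x in self.xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(self.xs, self.ys))
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx
        return self

    def predict(self, x: float) -> float:
        """Single prediction from an input parameter."""
        return self.intercept + self.slope * x

    def predict_batch(self, xs: List[float]) -> List[float]:
        """Batch prediction."""
        return [self.predict(x) for x in xs]

    def formula(self) -> str:
        """Extract the regression formula that explains the pattern."""
        return f"y = {self.intercept:.2f} + {self.slope:.2f} * x"

    def training_summary(self) -> Dict[str, float]:
        """Statistical summary of the data used for training."""
        return {"cases": len(self.xs), "mean_x": mean(self.xs), "mean_y": mean(self.ys)}
```

For example, training on `xs=[1, 2, 3, 4]`, `ys=[3, 5, 7, 9]` recovers the formula `y = 1.00 + 2.00 * x`.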
Its purpose is to provide info to business users for decision making. There are
five categories:
• Data query and reporting tools.
• Application development tools.
• Executive info system tools (EIS).
• OLAP tools.
• Data mining tools.
Query and reporting tools are used to generate queries and reports.
• There are two types of reporting tools. They are:
• Production reporting tools, used to generate regular operational reports.
• Desktop report writers, inexpensive desktop tools designed for end users.
• Managed query tools: used to generate SQL queries. They use metalayer software
between users and databases, which offers point-and-click creation of SQL
statements. These tools are a preferred choice for performing segment
identification, demographic analysis, territory management, preparation of
customer mailing lists, etc.
• Application development tools: a graphical data access environment that
integrates OLAP tools with the data warehouse and can be used to access all
DB systems.
• OLAP tools: used to analyze data in multidimensional and complex views.
To enable multidimensional properties, they use MDDB and MRDB, where MDDB
refers to a multidimensional database and MRDB refers to a multirelational database.
• Data mining tools: used to discover knowledge from data warehouse data; they
can also be used for data visualization and data correction purposes.
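The metalayer in a managed query tool maps business terms to physical tables and joins, so that a point-and-click selection becomes a SQL statement. The sketch below assumes a hypothetical star schema (`f_sales`, `d_region`, `d_date`) and supports filters on dimension terms only; filter values are bound as parameters rather than pasted into the SQL text.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical metalayer: business terms -> physical SQL expressions.
META_LAYER: Dict[str, str] = {
    "Region": "d_region.region_name",
    "Year": "d_date.year",
    "Revenue": "SUM(f_sales.amount)",
}
MEASURES = {"Revenue"}  # terms that are aggregates rather than dimensions

# Hypothetical star-schema joins that the metalayer hides from the user.
FROM_CLAUSE = (
    "f_sales "
    "JOIN d_region ON f_sales.region_id = d_region.region_id "
    "JOIN d_date ON f_sales.date_id = d_date.date_id"
)


def build_sql(selected: List[str],
              filters: Optional[Dict[str, object]] = None) -> Tuple[str, list]:
    """Turn selected business terms (and dimension filters) into parameterized SQL."""
    filters = filters or {}
    columns = [f"{META_LAYER[t]} AS {t.lower()}" for t in selected]
    sql = f"SELECT {', '.join(columns)} FROM {FROM_CLAUSE}"
    if filters:
        # Assumes filters reference dimension terms only; aggregates cannot go in WHERE.
        sql += " WHERE " + " AND ".join(f"{META_LAYER[t]} = ?" for t in filters)
    dimensions = [t for t in selected if t not in MEASURES]
    if dimensions and len(dimensions) < len(selected):
        # Mixing dimensions with aggregates requires grouping by the dimensions.
        sql += " GROUP BY " + ", ".join(META_LAYER[t] for t in dimensions)
    return sql, list(filters.values())
```

Usage: `build_sql(["Region", "Revenue"], {"Year": 2024})` yields a grouped revenue-by-region query with `[2024]` as its parameter list, ready for a DB-API `cursor.execute(sql, params)` call.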
6. Data warehouse administration and management
Management of the data warehouse includes:
• Security and priority management.
• Monitoring updates from multiple sources.
• Data quality checks.
• Managing and updating metadata.
• Auditing and reporting data warehouse usage and status.
• Purging data.
• Replicating, subsetting, and distributing data.
• Backup and recovery.
• Data warehouse storage management, which includes capacity planning,
hierarchical storage management, purging of aged data, etc.
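Two of the administration tasks above, data quality checks and purging of aged data, can be sketched as follows. The `loaded_on` field and the retention period are hypothetical policy choices; real warehouses often purge by dropping whole date partitions instead of individual rows.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

Row = Dict[str, object]


def purge_aged(rows: List[Row], today: date, retention_days: int) -> Tuple[List[Row], List[Row]]:
    """Split rows into (kept, purged) using a retention cutoff on the load date."""
    cutoff = today - timedelta(days=retention_days)
    kept = [r for r in rows if r["loaded_on"] >= cutoff]
    purged = [r for r in rows if r["loaded_on"] < cutoff]
    return kept, purged


def quality_issues(rows: List[Row], required: List[str]) -> List[Tuple[int, str]]:
    """Data quality check: report (row index, column) for every missing required value."""
    return [(i, col) for i, r in enumerate(rows) for col in required if r.get(col) is None]
```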
7. Information delivery system
• It enables users to subscribe to data warehouse
info.
• It delivers that info to one or more destinations according to a specified
scheduling algorithm.
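Subscription-based delivery can be sketched with a fixed-interval schedule: each subscription lists destinations and a delivery interval, and a scheduler run delivers every subscription whose interval has elapsed. The `Subscription` model and the fixed-interval policy are assumptions made for illustration; the source does not specify the scheduling algorithm.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Subscription:
    """A user's standing request for a warehouse report (hypothetical model)."""
    report: str
    destinations: List[str]   # e.g. mail addresses or shared folders
    every: timedelta          # fixed delivery interval
    last_sent: datetime


def due_deliveries(subs: List[Subscription], now: datetime) -> List[Tuple[str, str]]:
    """Fixed-interval scheduling: return (report, destination) pairs that are due now."""
    deliveries: List[Tuple[str, str]] = []
    for sub in subs:
        if now - sub.last_sent >= sub.every:
            deliveries.extend((sub.report, dest) for dest in sub.destinations)
            sub.last_sent = now  # mark as delivered so it is not resent until the next interval
    return deliveries
```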
Data extraction, clean-up, transformation, and migration
• Proper attention must be paid to data extraction, which is a key
success factor for a data warehouse architecture. When
implementing a data warehouse, the following selection criteria, which
affect the ability to transform, consolidate, integrate, and repair
the data, should be considered:
• Timeliness of data delivery to the warehouse.
• The tool must be able to identify the particular data so that it can be read by the
conversion tool.
• The tool must support flat files and indexed files, since much corporate data is still stored in these formats.
• The tool must have the capability to merge data from multiple data stores.
• The tool should have a specification interface to indicate the data to be extracted.
• The tool should have the ability to read data from the data dictionary.
• The code generated by the tool should be completely maintainable.
• The tool should permit the user to extract the required data.
• The tool must have the facility to perform data type and character set translation.
• The tool must have the capability to create summarization, aggregation, and derivation of
records.
• The data warehouse database system must be able to load data directly from
these tools.
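Three of these criteria (character set translation, data type translation, and merging data from multiple stores) can be illustrated together. The sketch assumes hypothetical semicolon-delimited legacy records and uses Python's built-in `latin-1` and `cp037` (EBCDIC) codecs as examples of source encodings; merge conflicts are resolved by letting later stores override earlier ones, which is only one possible policy.

```python
from decimal import Decimal
from typing import Dict, Iterable, List

Record = Dict[str, object]


def translate_record(raw: bytes, encoding: str, delimiter: str = ";") -> Record:
    """Character set translation (legacy bytes -> str) and data type translation (text -> int/Decimal)."""
    text = raw.decode(encoding)
    cust_id, name, balance = text.strip().split(delimiter)
    return {"cust_id": int(cust_id), "name": name, "balance": Decimal(balance)}


def merge_stores(*stores: Iterable[Record]) -> List[Record]:
    """Merge records from several stores by cust_id; later stores override earlier fields."""
    merged: Dict[int, Record] = {}
    for store in stores:
        for rec in store:
            merged.setdefault(rec["cust_id"], {}).update(rec)
    return [merged[k] for k in sorted(merged)]
```

Using `Decimal` rather than `float` for monetary amounts avoids binary rounding surprises during aggregation.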
Benefits of data warehousing
• Data warehouse usage includes,
– Locating the right info
– Presentation of info
– Testing of hypothesis
– Discovery of info
– Sharing the analysis
The benefits can be classified into two categories:
• Tangible benefits (quantifiable/measurable) include:
– Improvement in product inventory
– Decrease in production cost
– Improvement in selection of target markets
– Enhancement in asset and liability management
• Intangible benefits (not easy to quantify) include:
– Improvement in productivity by keeping all data in a single location and
eliminating the rekeying of data
– Reduced redundant processing
– Enhanced customer relations
