Chapter five
o First, it acts as the glue that links all parts of the data warehouse.
o Next, it provides information about the contents and structure to the
developers.
o Finally, it opens the door to the end-users and makes the contents
recognizable in their terms.
Operational Metadata
As we know, data for the data warehouse comes from
various operational systems of the enterprise.
These source systems contain different data structures.
The data elements selected for the data warehouse have various
field lengths and data types. In selecting data from the source
systems for the data warehouse, we split records, combine parts of
records from different source files, and deal with multiple coding
schemes and field lengths.
When we deliver information to the end-users, we must be able to tie it
back to the source data sets.
Operational metadata contains all of this information about the operational
data sources.
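The idea of tying a warehouse field back to its operational sources can be sketched in a few lines of Python. The field names, source systems, and formats below are illustrative assumptions, not taken from any real catalog:

```python
# Hypothetical sketch: operational metadata for one warehouse field,
# recording every source system that feeds it, with its local field
# name, data type, and coding/format.
operational_metadata = {
    "warehouse_field": "order_date",
    "sources": [
        {"system": "billing", "field": "ORD_DT", "type": "CHAR(8)", "format": "YYYYMMDD"},
        {"system": "crm",     "field": "odate",  "type": "DATE",    "format": "ISO-8601"},
    ],
}

def trace_to_sources(field_metadata):
    """Tie a warehouse field back to every operational source that feeds it."""
    return [(s["system"], s["field"]) for s in field_metadata["sources"]]

print(trace_to_sources(operational_metadata))
# [('billing', 'ORD_DT'), ('crm', 'odate')]
```

A real repository would hold one such entry per warehouse field, which is what lets end-user deliverables be traced back to source data sets.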
Extraction and Transformation Metadata
The metadata interchange standard model assumes that the metadata itself
may be stored in any storage format: ASCII files, relational tables, fixed or
customized formats, and so on.
It provides a framework that translates an access request into the
standard interchange format.
o Procedural Approach
o ASCII Batch Approach
o Hybrid Approach
In the procedural approach,
communication with the API is built into the tool, which enables
the highest degree of flexibility.
The user configuration is a file describing the legal interchange paths for metadata in
the user's environment.
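The ASCII batch approach can be illustrated with a small sketch: metadata is exported to a plain-text interchange file that any other tool can parse, with no shared API required. The field layout here is an invented example, not the actual interchange standard:

```python
# Minimal sketch of the ASCII batch approach: tool A writes its metadata
# to an agreed plain-text format; tool B reads the same file back.
import csv
import io

metadata_rows = [
    {"object": "customer", "attribute": "cust_id", "type": "INTEGER"},
    {"object": "customer", "attribute": "name",    "type": "VARCHAR(60)"},
]

# Export: serialize the metadata in the agreed interchange layout.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["object", "attribute", "type"])
writer.writeheader()
writer.writerows(metadata_rows)
interchange_file = buf.getvalue()

# Import: a second tool parses the file with no API coupling to the first.
imported = list(csv.DictReader(io.StringIO(interchange_file)))
print(imported[0]["attribute"])  # cust_id
```

The procedural approach would replace the file with direct API calls built into each tool; the hybrid approach combines both.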
Metadata Repository
The metadata itself is housed in and controlled by the metadata repository.
Metadata repository management software can be used to map the source data to the
target database, integrate and transform the data, generate code for data
transformation, and move data to the warehouse.
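A repository entry that maps a source field to a target warehouse column, and a trivial "code generator" built from it, might look like the following. The mapping rules and names are assumptions for illustration only:

```python
# Hedged sketch: one metadata repository entry mapping a source field to
# a target column, including a coding-scheme translation rule.
mapping = {
    "target": "customer_dim.gender",
    "source": "crm.cust.sex_code",
    "transform": {"M": "Male", "F": "Female"},  # coding-scheme translation
}

def generate_transform(mapping):
    """Generate a transformation function directly from the repository entry."""
    rules = mapping["transform"]
    return lambda value: rules.get(value, "Unknown")

to_target = generate_transform(mapping)
print(to_target("M"), to_target("X"))  # Male Unknown
```

This is the sense in which the repository can "generate code for data transformation": the transformation logic is derived from metadata rather than hand-written per field.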
Benefits of Metadata Repository
1. It provides a set of tools for enterprise-wide metadata management.
2. It reduces or eliminates inconsistency, redundancy, and underutilization.
There are mainly two approaches to designing data marts.
Dependent Data Marts
A dependent data mart is a logical subset or a physical subset of a larger data warehouse.
In this technique, the data marts are treated as subsets of a data warehouse: first a data
warehouse is created, and then various data marts are created from it. These data marts
depend on the data warehouse and extract the essential records from it. Because the data
warehouse creates the data marts, there is no need for data mart integration. This is also
known as the top-down approach.
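The top-down idea can be sketched in a few lines: a dependent data mart is simply carved out of an already-integrated warehouse table. The sample rows and department names are invented for illustration:

```python
# Top-down sketch: a dependent data mart is a subset of the warehouse.
warehouse_sales = [
    {"region": "EU", "dept": "Finance", "amount": 120},
    {"region": "US", "dept": "Sales",   "amount": 300},
    {"region": "EU", "dept": "Sales",   "amount": 150},
]

def build_dependent_mart(warehouse, dept):
    """Extract only the records one user group needs. No integration step
    is required, because the warehouse already reconciled the sources."""
    return [row for row in warehouse if row["dept"] == dept]

sales_mart = build_dependent_mart(warehouse_sales, "Sales")
print(len(sales_mart))  # 2
```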
Independent Data Marts
The second approach is independent data marts (IDM). Here, the independent data marts
are created first, and then a data warehouse is designed by integrating these multiple data
marts. In this approach, because all the data marts are designed independently, the
integration of data marts is required. This is also termed the bottom-up approach, as the
data marts are integrated to develop a data warehouse.
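Why integration is unavoidable in the bottom-up approach can be shown with a tiny sketch: two independently built marts use different coding schemes for the same attribute, so they must be reconciled before forming a warehouse. Field names and codes are illustrative assumptions:

```python
# Bottom-up sketch: two independent marts encode country differently.
finance_mart = [{"cust": "C1", "country": "DEU"}]  # ISO alpha-3 codes
sales_mart   = [{"cust": "C1", "country": "DE"}]   # ISO alpha-2 codes

ALPHA2_TO_ALPHA3 = {"DE": "DEU", "US": "USA"}

def integrate(*marts):
    """Reconcile coding schemes, then union the marts into one store."""
    warehouse = []
    for mart in marts:
        for row in mart:
            code = row["country"]
            warehouse.append({**row, "country": ALPHA2_TO_ALPHA3.get(code, code)})
    return warehouse

warehouse = integrate(finance_mart, sales_mart)
print({row["country"] for row in warehouse})  # {'DEU'}
```

In the dependent (top-down) approach this reconciliation happens once, in the warehouse; here every pair of marts must agree on it during integration.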
Other than these two categories, one more type exists that is called "Hybrid Data Marts."
The significant steps in implementing a data mart are to design the schema, construct
the physical storage, populate the data mart with data from source systems, access it
to make informed decisions, and manage it over time. So, the steps are:
Designing
The design step is the first in the data mart process. This phase covers all of the functions
from initiating the request for a data mart through gathering data about the
requirements and developing the logical and physical design of the data mart.
Constructing
This step involves creating the physical database and the logical structures associated with
the data mart to provide fast and efficient access to the data. It includes:
1. Creating the physical database and logical structures, such as tablespaces, associated
with the data mart.
2. Creating the schema objects, such as tables and indexes, described in the design step.
3. Determining how best to set up the tables and access structures.
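The constructing step can be sketched with SQLite from the Python standard library: create the physical storage, the table described in the design step, and an index as an access structure. The schema names are illustrative:

```python
# Sketch of the constructing step: physical database, schema object,
# and access structure, using SQLite as a stand-in for the real DBMS.
import sqlite3

conn = sqlite3.connect(":memory:")  # the physical database
conn.execute("""
    CREATE TABLE sales_fact (
        sale_id  INTEGER PRIMARY KEY,
        store_id INTEGER,
        amount   REAL
    )
""")  # schema object from the design step
conn.execute("CREATE INDEX idx_store ON sales_fact(store_id)")  # access structure

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # ['sales_fact']
```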
Populating
This step includes all of the tasks related to getting data from the source, cleaning it up,
modifying it to the right format and level of detail, and moving it into the data mart.
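The populating step, end to end, fits in one small function: get the data, clean it, reshape it to the mart's format, and load it. The source record layout is an assumption for illustration:

```python
# Minimal sketch of populating: extract, clean, transform, load.
source_rows = [
    {"store": " 01 ", "amt": "100.5"},
    {"store": "02",   "amt": None},  # dirty record from the source system
]

def populate(rows):
    mart = []
    for row in rows:
        if row["amt"] is None:  # cleaning: drop incomplete records
            continue
        mart.append({
            "store_id": int(row["store"].strip()),  # modify to the right format
            "amount": float(row["amt"]),
        })
    return mart

print(populate(source_rows))  # [{'store_id': 1, 'amount': 100.5}]
```

Real populating tools also handle level-of-detail changes (e.g. daily rows summarized to monthly), which follow the same pattern with an aggregation pass.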
Accessing
This step involves putting the data to use: querying the data, analyzing it, creating reports,
charts and graphs and publishing them.
1. Set up an intermediate layer (meta layer) for the front-end tool to use. This layer
translates database operations and object names into business terms so that the
end-clients can interact with the data mart using words that relate to the business
functions.
2. Set up and manage database structures, such as summarized tables, which help queries
submitted through the front-end tools execute rapidly and efficiently.
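The meta layer described in point 1 can be sketched as a simple lookup that resolves business terms to database object names. The term-to-object mappings are invented for illustration:

```python
# Sketch of a meta layer: business terms -> (table, column), so end users
# never see raw database object names.
META_LAYER = {
    "Monthly Revenue": ("sales_fact", "amount"),
    "Store":           ("store_dim",  "store_name"),
}

def to_sql(business_term):
    """Resolve a business term to the underlying table and column."""
    table, column = META_LAYER[business_term]
    return f"SELECT {column} FROM {table}"

print(to_sql("Monthly Revenue"))  # SELECT amount FROM sales_fact
```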
Managing
This step covers managing the data mart over its lifetime; the ongoing
management functions are performed in this step.
Data Warehouse: A vast repository of information collected from various organizations or
departments within a corporation.
Data Mart: Only a subtype of a data warehouse, architected to meet the requirements of a
specific user group.

Data Warehouse: May hold multiple subject areas.
Data Mart: Holds only one subject area, for example Finance or Sales.

Data Warehouse: Works to integrate all data sources.
Data Mart: Concentrates on integrating data from a given subject area or set of source
systems.

Data Warehouse: Uses the fact constellation schema.
Data Mart: Uses the star schema and snowflake schema.
Now we discuss the delivery process of the data warehouse. The main steps used in the data
warehouse delivery process are as follows:
IT Strategy: A data warehouse project must have an IT strategy for procuring and retaining funding.
Business Case Analysis: After the IT strategy has been designed, the next step is the
business case. It is essential to understand the level of investment that can be justified and to
recognize the projected business benefits which should be derived from using the data
warehouse.
Education & Prototyping: The company will experiment with the ideas of data analysis
and educate itself on the value of the data warehouse. This is valuable, and should be
required, if this is the company's first exposure to the benefits of decision-support data.
Prototyping can advance this education, and it works better than abstract models alone.
Prototyping requires the business requirements, the technical blueprint, and structures.
Business Requirements: These include the logical model for data within the data warehouse
and the source systems that provide this data.
Technical Blueprint: It establishes the architecture of the warehouse. The technical blueprint
stage of the delivery process produces an architecture plan that satisfies long-term
requirements. It lays out the server and data mart architecture and the essential components
of the database design.
Building the Vision: This is the phase where the first production deliverable is produced. This
stage will probably create significant infrastructure elements for extracting and loading
information, but limit them to the extraction and loading of the chosen information sources.
History Load: The next step is the one where the remainder of the required history is loaded
into the data warehouse. This means that no new entities would be added to the data
warehouse, but additional physical tables would probably be created to store the increased
data volumes.
AD-Hoc Query: In this step, we configure an ad-hoc query tool to operate against the data
warehouse.
These end-user access tools are capable of automatically generating the database query
that answers any question posed by the user.
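How an ad-hoc tool can generate the database query automatically can be sketched as follows; the user picks a measure, a grouping dimension, and a filter, and the tool builds the SQL. The schema names are assumptions:

```python
# Hedged sketch of an ad-hoc query tool: the user's selections are turned
# into a SQL string; no hand-written query is needed.
def generate_query(measure, group_by, where=None):
    """Build the SQL that answers the user's question automatically."""
    sql = f"SELECT {group_by}, SUM({measure}) FROM sales_fact"
    if where:
        sql += f" WHERE {where}"
    return sql + f" GROUP BY {group_by}"

print(generate_query("amount", "region", "year = 2024"))
# SELECT region, SUM(amount) FROM sales_fact WHERE year = 2024 GROUP BY region
```

Real tools layer this on top of the meta layer described earlier, so the user selects "Monthly Revenue" rather than `SUM(amount)`.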
Automation: The automation phase is where many of the operational management processes
are fully automated within the data warehouse. These would include:
extracting and loading the data from a variety of source systems, and transforming the data.
Requirement Evolution: This is the last step of the delivery process of a data warehouse. As
we all know, requirements are not static; they evolve continuously. As the business
requirements change, the changes must be reflected in the system.
Concept Hierarchy
A concept hierarchy is a directed acyclic graph of concepts, where each concept is identified
by a unique name.
An arc from concept a to concept b denotes that a is a more general concept than b. We can
tag text with concepts.
Each text report is tagged with a set of concepts that corresponds to its content.
Tagging a report with a concept implicitly entails tagging it with all the ancestors of
that concept in the hierarchy. It is therefore desirable that a report be tagged with the lowest
concept possible.
The method to automatically tag a report to the hierarchy is a top-down approach.
An evaluation function determines whether a record currently tagged to a node can also be
tagged to any of its child nodes.
If so, the tag moves down the hierarchy until it cannot be pushed any further.
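The top-down tagging procedure can be sketched as follows. The concept hierarchy and the keyword-based evaluation function are illustrative assumptions; a real system would use a learned or statistical evaluation function:

```python
# Sketch of top-down tagging: the tag starts at the root and is pushed
# down to a child whenever the evaluation function accepts the child,
# stopping at the lowest concept possible.
HIERARCHY = {"science": ["physics", "biology"], "physics": [], "biology": []}
KEYWORDS = {"physics": {"quantum", "energy"}, "biology": {"cell", "gene"}}

def evaluate(report_words, concept):
    """Accept a child concept if the report mentions any of its keywords."""
    return bool(report_words & KEYWORDS.get(concept, set()))

def tag(report_words, node="science"):
    for child in HIERARCHY.get(node, []):
        if evaluate(report_words, child):
            return tag(report_words, child)  # push the tag down
    return node  # cannot be pushed any further

print(tag({"quantum", "energy"}))  # physics
print(tag({"weather"}))            # science (no child accepts it)
```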
The outcome of this step is a hierarchy of reports; at each node there is a set of reports
that share the concept associated with that node.
The hierarchy of reports resulting from the tagging step is useful for many text mining
processes.
It is assumed that the hierarchy of concepts is known a priori. We can even obtain such a
hierarchy of documents without a concept hierarchy, by using any hierarchical
clustering algorithm, which produces such a hierarchy.
Concept hierarchy defines a sequence of mapping from a set of particular, low-level concepts
to more general, higher-level concepts.
Concept hierarchies are crucial for the formulation of useful OLAP queries. The hierarchies
allow the user to summarize the data at various levels.
For example, using the location hierarchy, the user can retrieve data that summarizes sales
for each location, for all the areas in a given state, or even for a given country, without the
need to reorganize the data.
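The location example above can be sketched directly: the same sales data is summarized at different levels by mapping each low-level location up the hierarchy. The cities, states, and amounts are illustrative:

```python
# Sketch of an OLAP roll-up along a location concept hierarchy:
# city -> state -> country, over unchanged base data.
CITY_TO_STATE = {"Chicago": "Illinois", "Springfield": "Illinois", "Austin": "Texas"}
STATE_TO_COUNTRY = {"Illinois": "USA", "Texas": "USA"}

sales = [("Chicago", 100), ("Springfield", 50), ("Austin", 70)]

def roll_up(sales, level):
    """Map each location up the hierarchy to the requested level, then summarize."""
    totals = {}
    for city, amount in sales:
        key = city
        if level in ("state", "country"):
            key = CITY_TO_STATE[key]
        if level == "country":
            key = STATE_TO_COUNTRY[key]
        totals[key] = totals.get(key, 0) + amount
    return totals

print(roll_up(sales, "state"))    # {'Illinois': 150, 'Texas': 70}
print(roll_up(sales, "country"))  # {'USA': 220}
```

The base data never changes; only the mapping level does, which is exactly why concept hierarchies make OLAP summarization cheap.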
Thanks for listening