


A Paper Presentation on

- Information repository with knowledge discovery

Mobile: 9640035030

Content Overview

Introduction
Warehouse with a database
What is Data-Warehousing?
Warehousing Functions
Data Warehouse Architecture
What is Data Mining?
Warehousing and Mining
Data Mining as a part of Knowledge Discovery
Goals of Data Mining and Knowledge Discovery
Compendium
Bibliography
Abstract

Organisations today suffer from a malaise of data overflow. Developments in transaction processing technology have given rise to a situation where the amount and rate of data capture are very high, but the processing of this data into information that can be utilised for decision making is not developing at the same pace.

Data warehousing and data mining (of both data and text) provide a technology that enables decision-makers in the corporate sector and government to process this huge amount of data in a reasonable amount of time and to extract intelligence and knowledge in near real time. The data warehouse stores data in a format that facilitates its access, but if the tools for deriving information and knowledge, and for presenting them in a format that is useful for decision making, are not provided, the whole rationale for the existence of the warehouse disappears.

Various technologies for extracting new insight from the data warehouse have come up, which we classify loosely as "Data Mining Techniques". Our paper focuses on the need for information repositories and the discovery of knowledge, and then gives an overview of the much-hyped Data Warehousing and Data Mining.

• Introduction

“Knowledge [no more Information] is not only power, but also has significant competitive advantage”

Organizations have lately realized that just processing transactions and/or information faster and more efficiently no longer provides them with a competitive advantage vis-à-vis their competitors for achieving business excellence. Information technology (IT) tools that are oriented towards knowledge processing can provide the edge that organizations need to survive and thrive in the current era of fierce competition. Increasing competitive pressures and the desire to leverage information technology techniques have led many organizations to explore the benefits of newly emerging technologies, viz. "Data Warehousing and Data Mining". What is needed today is not just information that is updated to the nanosecond, but cross-functional information that can support decision making as an "on-line" process.

Evolution of Information Technology Tools
The evolution of information systems has been an evolution from data maintenance systems to systems that transform data into "information" for use in the decision-making process. These systems supported information acquisition from the database of transactional data. The managerial knowledge acquisition function was not directly supported by these systems. Nor could they directly reveal new patterns emerging in a changing scenario; the planner was expected to infer these from experience.

• Warehouse with a database

One thing that remains constant, especially in the corporate world, is “Change”. And, these days, change is occurring at an ever-increasing rate. A key challenge is implementing an information infrastructure that allows your company to respond rapidly to change. One solution to this challenge is the data warehouse.

• What is Data-Warehousing?

Data warehousing is an information infrastructure based on detail data that supports the decision-making process and gives businesses the ability to access and analyze data to increase an organization's competitive advantage.

Data warehousing is a process, not an off-the-shelf solution you buy: hardware, database and tools are integrated into an evolving information infrastructure that changes with the dynamics of the business.
The data warehouse makes an attempt to figure out "what we need" before we know we need it.

What is it actually?

* A data warehouse stores current and historical data.
* This data is taken from various, perhaps incompatible, sources and stored in a uniform format.
* Several tools transform this data into meaningful business information for purposes such as comparisons and trends.
* Data in a warehouse is not updated or changed in any way, but is only loaded and accessed later on.
* Data is organized according to subject instead of application.

In general, a database is not a data warehouse unless it has the following two features:

• It collects information from a number of disparate sources and is the place where this disparity is reconciled, and
• It allows several different applications to make use of the same information.

Conceptually, a Data Warehouse looks like this:

Information Sources always include the core operational systems which form the backbone of day-to-day activities. It is these systems which have traditionally provided management information to support decision making.

Decision Support Tools are used to analyze the information stored in the warehouse, typically to identify trends and new business opportunities.

The Data Warehouse itself is the bridge between the operational systems and the decision support tools. It holds a copy of much of the operational system data in a logical structure which is more conducive to analysis. The Data Warehouse, which is refreshed in scheduled bursts from operational systems and from relevant external data sources, provides a single, consistent view of corporate data, leaving operational systems unaffected.

Data-Warehouse Functions

The main function of a data warehouse is to get enterprise-wide data into a format that is most useful to end-users, regardless of their location. Data warehousing is used for:

• Increasing the speed and flexibility of analysis.
• Providing a foundation for enterprise-wide integration and access.
• Improving or re-inventing business processes.
• Gaining a clear understanding of customer behavior.

Data Warehouse Architecture

Each implementation of a data warehouse is different in its detailed design (a high-level schematic of the architecture and its components is given in the figure below), but all are characterised by a handful of key components:

• A data model to define the warehouse contents.
• A carefully designed warehouse database, whether hierarchical, relational, or multidimensional. When choosing a DBMS, it must be kept in view that the database management system should be powerful enough to handle huge amounts of data running up to terabytes.
• A front end for the Decision Support System (DSS), for reporting and for structured and unstructured analysis.
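The key components named in this section (a data model, a warehouse database, and a reporting front end) can be illustrated with a very small sketch. It is only an illustration under stated assumptions: the two source systems, their field names, and the `sales_fact` table are hypothetical, and Python's built-in `sqlite3` module stands in for a real warehouse DBMS, which would need to handle far larger volumes.

```python
import sqlite3

# Hypothetical records from two operational systems that describe sales in
# incompatible formats (field names, date style, and currency units differ).
BILLING_ROWS = [
    {"cust": "C001", "date": "2024-01-15", "amount_cents": 125000},
    {"cust": "C002", "date": "2024-01-16", "amount_cents": 40000},
]
WEB_SHOP_ROWS = [
    {"customer_id": "c001", "day": "17/01/2024", "total": 99.50},
]


def from_billing(row):
    """Map a billing record to the uniform (customer, ISO date, amount) format."""
    return (row["cust"].upper(), row["date"], row["amount_cents"] / 100.0)


def from_web_shop(row):
    """Map a web-shop record to the same uniform format."""
    day, month, year = row["day"].split("/")
    return (row["customer_id"].upper(), f"{year}-{month}-{day}", float(row["total"]))


def build_warehouse(conn):
    """Data model: one subject-oriented table for sales (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE sales_fact ("
        " customer_id TEXT NOT NULL,"
        " sale_date TEXT NOT NULL,"
        " amount REAL NOT NULL,"
        " source TEXT NOT NULL)"
    )


def load(conn):
    """Load step: reconciled rows are only appended, never updated in place."""
    rows = [from_billing(r) + ("billing",) for r in BILLING_ROWS]
    rows += [from_web_shop(r) + ("web_shop",) for r in WEB_SHOP_ROWS]
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def revenue_by_customer(conn):
    """Front-end style report: total revenue per customer across all sources."""
    cursor = conn.execute(
        "SELECT customer_id, SUM(amount) FROM sales_fact"
        " GROUP BY customer_id ORDER BY customer_id"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
load(conn)
report = revenue_by_customer(conn)
print(report)
```

The disparity between the two sources is reconciled once, in the mapping functions, so that any number of reports can then query the same uniform table.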
Data Mining

Database mining or data mining (DM) (also referred to as Knowledge Discovery in Databases, KDD) is a process that aims to use existing data to discover new facts and to uncover new relationships previously unknown even to experts thoroughly familiar with the data. It is like extracting precious metals (say, gold) or gems, hence the term “mining”. It is based on filtering and assaying a mountain of data “ore” in order to get “nuggets” of knowledge. The data mining process is illustrated diagrammatically in the figure below.

Data Mining and Data Warehousing

• The goal of a data warehouse is to support decision making with data.
• Data mining can be used in conjunction with a data warehouse to help with certain types of decisions.
• Data mining can be applied to operational databases with individual transactions.
• To make data mining more efficient, the data warehouse should have an aggregated or summarized collection of data.
• Data mining helps in extracting meaningful new patterns that cannot necessarily be found by merely querying or processing data or metadata in the data warehouse.
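The point about aggregated or summarized data can be made concrete with a minimal sketch. The transaction records and the choice of a monthly grain below are assumptions made purely for illustration; the idea is only that a mining step can work on a few summary rows instead of every individual transaction.

```python
from collections import defaultdict

# Hypothetical transaction records: (customer_id, ISO date, amount).
TRANSACTIONS = [
    ("C001", "2024-01-03", 20.0),
    ("C001", "2024-01-19", 35.0),
    ("C001", "2024-02-02", 10.0),
    ("C002", "2024-01-07", 50.0),
]


def summarize_by_month(transactions):
    """Roll individual transactions up to (customer, month) totals and counts."""
    summary = defaultdict(lambda: {"total": 0.0, "count": 0})
    for customer_id, date, amount in transactions:
        month = date[:7]  # "YYYY-MM" prefix of an ISO date
        cell = summary[(customer_id, month)]
        cell["total"] += amount
        cell["count"] += 1
    return dict(summary)


monthly = summarize_by_month(TRANSACTIONS)
for key in sorted(monthly):
    print(key, monthly[key])
```

Here four transactions collapse into three summary rows; on realistic volumes the reduction would be far larger, which is what makes the summarized collection cheaper to mine.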
Data Mining as a Part of the Knowledge Discovery Process

• Knowledge Discovery in Databases, frequently abbreviated as KDD, typically encompasses more than data mining.
• The knowledge discovery process comprises six phases, including:

Data selection: data about specific items or categories of items, or from stores in a specific region or area of the country, may be selected.

Data cleansing: the cleansing process may then correct invalid zip codes or eliminate records with incorrect phone prefixes.

Enrichment: typically enhances the data with additional sources of information.

Data transformation and encoding: may be done to reduce the amount of data.

Goals of Data Mining and Knowledge Discovery

The goals of data mining fall into the following classes:

Prediction: Data mining can show how certain attributes within the data will behave in the future.

Identification: Data patterns can be used to identify the existence of an item, an event, or an activity.

Classification: Data mining can partition the data so that different classes or categories can be identified based on combinations of parameters.

Optimization: One eventual goal of data mining may be to optimize the use of limited resources such as time, space, money, or materials and to maximize output variables such as sales or profits under a given set of constraints.
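The selection, cleansing, enrichment and transformation phases described in this section can be sketched as a small pipeline that prepares records before any mining is done. Everything specific here is an assumption for illustration: the record fields, the cleansing rule (a letter "O" typed in place of a zero), the income-band lookup used for enrichment, and the three-digit zip prefix used as the encoding.

```python
import re

# Hypothetical raw records; field names and values are illustrative only.
RAW_RECORDS = [
    {"item": "tv", "region": "south", "zip": "50001", "amount": 300.0},
    {"item": "tv", "region": "south", "zip": "5O0O2", "amount": 280.0},  # letter O typed for 0
    {"item": "radio", "region": "south", "zip": "50003", "amount": 40.0},
    {"item": "tv", "region": "north", "zip": "11001", "amount": 310.0},
    {"item": "tv", "region": "south", "zip": "bad", "amount": 290.0},
]

# Hypothetical extra source used for enrichment: zip code -> income band.
INCOME_BAND_BY_ZIP = {"50001": "high", "50002": "medium"}


def select(records, item, region):
    """Data selection: keep one item category from one region."""
    return [r for r in records if r["item"] == item and r["region"] == region]


def cleanse(records):
    """Data cleansing: repair a known typo, drop zip codes that stay invalid."""
    cleaned = []
    for r in records:
        zip_code = r["zip"].replace("O", "0")
        if re.fullmatch(r"\d{5}", zip_code):
            cleaned.append({**r, "zip": zip_code})
    return cleaned


def enrich(records):
    """Enrichment: attach information from an additional source."""
    return [
        {**r, "income_band": INCOME_BAND_BY_ZIP.get(r["zip"], "unknown")}
        for r in records
    ]


def transform(records):
    """Transformation and encoding: keep a coarse zip prefix and fewer fields."""
    return [(r["zip"][:3], r["income_band"], r["amount"]) for r in records]


prepared = transform(enrich(cleanse(select(RAW_RECORDS, item="tv", region="south"))))
print(prepared)
```

Of the five raw records, three pass selection, two survive cleansing, and the final tuples carry only the fields a later mining step would need.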
Compendium

A data warehouse takes the organisation's operational data, historical data and external data and:

a) consolidates it into a separately designed database (which can be either relational or multi-dimensional), and
b) manages it into a format that is optimised for end users to access and analyse.

When a data warehouse has been constructed, it provides a complete picture of the enterprise. It gives management an unparalleled opportunity to learn about their customers.

Data warehouse technology, together with online transaction processing and data mining, allows management to provide better customer service, create greater customer loyalty and activity, focus customer acquisition and retention on the most profitable customers, increase revenue and reduce operating costs. It provides tools that facilitate sounder decision making; improves worker and management knowledge and productivity; spares the operational database from ad-hoc queries and the resulting performance degradation; and clears the legacy database system, while moving the corporate system architecture forward.

With the incorporation of new data delivery and presentation techniques, such as Hypertext Markup Language (HTML) and Open Database Connectivity (ODBC), database mining (of data and text) has gained widespread recognition as a viable tool for business intelligence gathering. Advances in document mining technology (database mining of free-form text and data, in contrast to the “classical” approach of mining fixed-length records) are making data mining technology more powerful.

Last but not least, the Internet has emerged as the largest data warehouse of unstructured and free-form data. New technologies are geared towards mining this great data warehouse.