
Data Mining

Knowledge Discovery in Databases
Information as a Production Factor:
Most international organizations produce more
information in a week than many people could read in a
lifetime. The situation is even more alarming in
worldwide networks like the Internet. Every day, hundreds of
megabytes of data are distributed around the world, but
it is no longer possible to monitor this increasingly
rapid development: the growth is exponential. We are
confronted with the new paradox of the growth of data,
that more data means less information. In the future, the
ability to read and interpret alone will not be enough to
survive as a professional or a professional organization.
The mechanical production and reproduction of data
forces us to adapt our strategies and develop mechanical
methods for filtering, selecting, and interpreting
data. Organizations that excel at this will have a
better chance of surviving, and because of this,
information itself has become a production factor of
importance. This tendency is perhaps most obvious in
the stock exchange, where it is not only the availability
of data that is vital but also the ability to interpret the
data and to act on the basis of these interpretations.
Most organizations have large databases that contain a
wealth of potentially accessible information. However, it
is usually very difficult to access this information. The
growth of data will lead to a situation in which it is
increasingly difficult to access the desired information:
it will always be like looking for a needle in a haystack,
only the amount of hay will be growing all the time.
Knowledge Discovery in Databases (KDD) is a new field of
data mining. In data mining, enormous quantities of
debris have to be removed before diamonds or gold can
be found. With computers, one can automatically find
‘information diamonds’ among the tons of debris in the
database.
[Figure: KDD at the center, surrounded by machine learning, expert systems, database, statistics, and visualization]

Data Mining is a multi-disciplinary field


A data mining tool does not replace a query tool,
but it does give the user a lot of additional
possibilities.
Data Mining in Marketing:
The standard success stories of KDD come
primarily from marketing.
If a company selling a range of products has a
database of clients who have purchased various
products manufactured by the company, this
data will help the company to introduce
a new product or upgrade an old one. Many queries
are possible for getting the right information, e.g.
classifying the customers according to the products
they have purchased, according to region,
according to age group, etc.
It is wisest to use a different classification for
each marketing action.
It is obvious that knowing and applying these
kinds of rules creates great commercial
opportunities. Whenever large data sets exist,
there is the possibility of discovering interesting
new applications.
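
The classifications mentioned above (by product purchased, by region, by age group) amount to grouping the same client records under different keys. A minimal sketch in Python, assuming a hypothetical client table; the field names, the records, and the age-band limits are all made up for illustration:

```python
from collections import defaultdict

# Hypothetical client records; field names and values are illustrative only.
CLIENTS = [
    {"name": "A", "region": "North", "age": 24, "product": "P1"},
    {"name": "B", "region": "South", "age": 37, "product": "P2"},
    {"name": "C", "region": "North", "age": 52, "product": "P1"},
    {"name": "D", "region": "South", "age": 29, "product": "P1"},
]


def age_group(age):
    """Map an age to a coarse band (the band limits are an assumption)."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50 and over"


def classify(clients, key):
    """Group client names by the value that key(client) returns."""
    groups = defaultdict(list)
    for client in clients:
        groups[key(client)].append(client["name"])
    return dict(groups)


# One classification per marketing question.
by_product = classify(CLIENTS, lambda c: c["product"])
by_region = classify(CLIENTS, lambda c: c["region"])
by_age = classify(CLIENTS, lambda c: age_group(c["age"]))
```

Because the grouping key is passed in as a function, a different classification can be chosen for each marketing action without changing the data.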
Practical Applications of Data Mining:

Data mining is applied by many organizations
worldwide.
Organizations like American Express and AT&T use KDD
to analyze their client files.
In the UK, the BBC has applied data mining techniques to analyze
viewing figures.
Many banks, especially those providing facilities like
credit cards, debit cards, and ATMs, use data mining.
Tools Available for Data Mining:

• Clementine from Integral Solutions
• Intelligent Miner from IBM
• 4Thought from Livingstones

New tools for data mining are introduced almost every
week.
A data warehouse is designed especially for decision
support queries; therefore, only the data needed for
decision support is extracted from the operational data
and stored in the warehouse.
Designing a data warehouse requires specialist
knowledge of data design because the data model
consists of the data needed by users who want access at
high speed, and so the data design for the warehouse
can be completely different from that of the operational
database. After a corporate data model has been created for the
data warehouse, the data management environment
has to be designed.
Data Warehousing
What is a data warehouse and why do we need it?
Modern organizations are under enormous
pressure to respond quickly to changes in the
market. To do this, they need
rapid access to all kinds of information before they
can make any logical decisions. To make the
right choices for the organization, it is essential to
be able to research the past and identify the
relevant trends. To perform any trend
analysis, one must have access to all the
information needed to support it, and this
information is mainly stored in very large
databases. The easiest way to gain access to this
data and facilitate effective decision making is to
set up a data warehouse.
In many organizations, there are very large
databases in operation for normal daily
transactions, and some of the applications use
transaction monitors. These types of databases are
known as operational databases. They are
designed to support all applications for day-to-day
transactions.
The other type of database found in
organizations is the data warehouse. It is
designed for strategic decision support and is
largely built up from the databases that make up the
operational database. A characteristic of a data
warehouse is that it contains vast amounts of data,
which can mean billions of records. Smaller, local
data warehouses are called data marts. Some
specific rules govern the basic structure
of a data warehouse:
1. Time dependent: i.e., containing information
collected over time, which implies there must
always be a connection between the information
in the warehouse and the time when it was entered.
2. Non-volatile: i.e., data in a data warehouse
is never updated but used only for queries. Thus
data can only be loaded from other databases
such as the operational database. This means
that the warehouse will always be filled with historical
data.
3. Subject oriented: i.e., built around the existing
applications of the operational data. Not all the
information in the operational database is useful for
a data warehouse, since the data warehouse is
designed specifically for decision support while the
operational database contains information for day-to-day
use.
4. Integrated: i.e., it reflects the business
information of the organization. Hence, in a data
warehouse it is essential to integrate the
information and make it consistent.
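
The first two rules can be illustrated with a small sketch. The class below is a hypothetical in-memory stand-in for a warehouse table, not the API of any real product: every loaded row is stamped with its load time (time dependent), and rows can be appended and queried but never updated (non-volatile). The subject name, records, and dates are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)  # frozen: a stored row cannot be modified afterwards
class WarehouseRow:
    loaded_at: datetime  # time at which the row entered the warehouse
    subject: str         # subject area, e.g. "sales" (illustrative)
    data: tuple          # snapshot of the source record as (field, value) pairs


class Warehouse:
    """Append-only store: it can be loaded and queried, never updated."""

    def __init__(self) -> None:
        self._rows: List[WarehouseRow] = []

    def load(self, subject, records, loaded_at):
        """Copy records from an operational source, stamping the load time."""
        for record in records:
            snapshot = tuple(sorted(record.items()))
            self._rows.append(WarehouseRow(loaded_at, subject, snapshot))

    def query(self, subject):
        """Return every stored version of the rows for one subject."""
        return [
            {**dict(row.data), "loaded_at": row.loaded_at}
            for row in self._rows
            if row.subject == subject
        ]


wh = Warehouse()
wh.load("sales", [{"client": "A", "amount": 100}], datetime(2024, 1, 1))
wh.load("sales", [{"client": "A", "amount": 120}], datetime(2024, 2, 1))
history = wh.query("sales")  # both versions are kept, each with its load time
```

A later load does not overwrite the earlier row for the same client, which is why the store fills up with historical data.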
In setting up a data warehouse, the end user and the
administrator must have access to all the information in
the tables and the attributes. They will want to know a
number of things, such as:
• Where the data is located
• What data exists
• What data type or format it is in
• How this data is related to other data in other
databases
• Where the data is from and to whom the data belongs
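
One common way to answer such questions is to keep a catalogue of descriptive records (metadata) alongside the data. A minimal sketch of one catalogue entry, assuming hypothetical field names and example values that are not taken from any particular product:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttributeMetadata:
    name: str        # what data exists
    location: str    # where the data is located
    data_type: str   # what data type or format it is in
    source: str      # where the data is from
    owner: str       # to whom the data belongs
    related_to: List[str] = field(default_factory=list)  # related data elsewhere


# Illustrative catalogue with a single, made-up entry.
catalogue = {
    "client.region": AttributeMetadata(
        name="client.region",
        location="warehouse.client",
        data_type="text",
        source="operational order-entry database",
        owner="marketing",
        related_to=["sales.region"],
    ),
}


def describe(attribute):
    """Summarise what the catalogue knows about one attribute."""
    entry = catalogue[attribute]
    return (
        f"{entry.name} ({entry.data_type}) in {entry.location}, "
        f"from {entry.source}, owned by {entry.owner}"
    )
```

Each field of the record corresponds to one of the questions in the list above.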
Within a data warehouse, specific hardware and
software requirements must be satisfied
for decision support to be successfully
accomplished. Working in a client/server environment
allows greater flexibility in choosing the appropriate
software for end users because each individual need
can be catered for on a local workstation. The only
common element is the database, which must be
completely optimized and provided with a number of
functionalities to help speed up performance.
The hardware requirements depend on the type of data
warehouse and the techniques with which one wants to
work. A large data warehouse can contain hundreds of
thousands of gigabytes.
Integration with data mining: The application of data
mining techniques can be carried out in two ways:
• From the existing data warehouse
• By extracting from the existing data warehouse the part
of the information that is of interest to the end user and
copying it to a specific computer.
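
The second approach can be sketched with Python's standard `sqlite3` module. This is only an illustration under stated assumptions: two in-memory databases stand in for the warehouse and for the separate computer, and the `sales` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# In-memory database standing in for the existing data warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (client TEXT, region TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", "North", 100.0), ("B", "South", 250.0), ("C", "North", 75.0)],
)


def extract_subset(source, region):
    """Copy the rows for one region into a new, separate database."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE sales (client TEXT, region TEXT, amount REAL)")
    rows = source.execute(
        "SELECT client, region, amount FROM sales WHERE region = ?", (region,)
    ).fetchall()
    target.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    target.commit()
    return target


# The end user now works on the copy without touching the warehouse.
north = extract_subset(warehouse, "North")
row_count = north.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total = north.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Mining the extracted copy keeps heavy analysis off the shared warehouse, at the cost of working on a snapshot that must be refreshed when the warehouse is reloaded.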

With a data warehouse, all the information is transferred
from the operational database to the data warehouse.
[Figure: operational data, extracts from several databases, data warehouse, data marts]

The relationship between operational data, a data
warehouse, and data marts
