
Visit: www.geocities.com/chinna_chetan05/forfriends.html

A Paper Presentation on
Data warehousing and
data mining in computing and
scientific environment
- Information repository with knowledge discovery

Email: chinna_chetan05@yahoo.com

CONTENTS
• INTRODUCTION
• EVOLUTION OF INFORMATION TECHNOLOGY TOOLS
• DEFINITION OF DATA WAREHOUSE
• BENEFITS OF DATA WAREHOUSING
• CONCEPTUAL MODEL OF DATA WAREHOUSING
• DEFINITION OF DATA MINING
• GOALS OF DATA MINING & DATA WAREHOUSING
• ARCHITECTURE OF DATA MINING
• BENEFITS OF DATA MINING
• APPLICATIONS OF DATA MINING
• ISSUES & CHALLENGES IN DATA MINING
• CONCLUSION

Organisations today are suffering from a malaise of data overflow. Developments in transaction processing technology have given rise to a situation where the amount and rate of data capture is very high, but the processing of this data into information that can be utilised for decision making is not developing at the same pace. Data warehousing and data mining (of both data and text) provide a technology that enables decision-makers in the corporate sector or government to process this huge amount of data in a reasonable amount of time and to extract intelligence/knowledge in near real time.

The data warehouse allows the storage of data in a format that facilitates its access, but if the tools for deriving information and/or knowledge and presenting them in a format that is useful for decision making are not provided, the whole rationale for the existence of the warehouse disappears. Various technologies for extracting new insight from the data warehouse have come up, which we classify loosely as "Data Mining Techniques".

INTRODUCTION

The advent of computing technology has significantly influenced our lives, and two major impacts of this effect are business data processing and scientific computing. During the initial years of the development of computer techniques for business, computer professionals were concerned with designing files to store the data so that information could be efficiently retrieved. There were restrictions on the storage size available for data and on the speed of accessing it. Needless to say, the activity was restricted to a very few, highly qualified professionals. Then came an era when the task was simplified by database management systems (DBMS).

Business is inherently competitive, and in this competitive world one is constantly on the lookout for ways to beat the competition. The question that naturally arose is whether the enormous data that is generated and stored as archives can be used to improve the efficiency of business performance.

A new discipline in computer science, data mining, gradually evolved. Data mining is the exploration and analysis of large data sets in order to discover meaningful patterns and rules. The key idea is to find effective ways to combine the computer's power to process data with the human eye's ability to detect patterns. The techniques of data mining are designed for, and work best with, large data sets.

Data mining is of interest to three major streams:
• statistics
• computer science
• business management

Evolution of Information Technology Tools

The evolution of information systems characterizes the evolution from data maintenance systems to systems that transform data into "information" for use in the decision-making process. These systems supported information acquisition from the database of transactional data. The managerial knowledge acquisition function was not directly supported by these systems: they could not directly reveal new patterns in a changing scenario, so the planner was expected to find them from experience.

[Figure: Data → (processing) → Information → (processing) → Knowledge. Tools shown: Transaction Processing Systems, Management Information Systems, On-Line Analytical Processing Tools, Data Mining Tools]

DEFINITION OF DATA WAREHOUSE

A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data in support of management's decision-making process.

Subject-Oriented
A data warehouse is organized around major subjects such as customers, products, and sales. Data are organized according to subject instead of application. For example, an insurance company using a data warehouse would organize its data by customer, premium, and claim instead of by different products.

Non-Volatile
A data warehouse is always a physically separate store of data, which is transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require the transaction processing, recovery, and concurrency control mechanisms used in an operational DBMS.

Time-Varying
Data are stored in a data warehouse to provide a historical perspective. Every key structure in the data warehouse contains, implicitly or explicitly, an element of time.

Integrated
A data warehouse is usually constructed by integrating multiple heterogeneous sources such as relational databases, flat files, and OLTP files. When data resides in many separate applications in the operational environment, the encoding of data is often inconsistent.
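The Integrated, Time-Varying and Non-Volatile properties can be illustrated with a small Python sketch. It assumes two hypothetical operational sources ("billing" and "claims") that encode the same customer attribute differently; the load step maps both onto one warehouse encoding, stamps every row with its load date, and only ever appends. All source names, field names and code tables are invented for illustration.

```python
from datetime import date

# Hypothetical code tables: each operational source encodes gender differently.
GENDER_CODES = {
    "billing": {"M": "male", "F": "female"},
    "claims": {1: "male", 0: "female"},
}

def integrate(source_name, rows, load_date):
    """Map one source's rows onto the warehouse's single, consistent encoding."""
    codes = GENDER_CODES[source_name]
    return [
        {
            "customer_id": row["cust"],
            "gender": codes[row["sex"]],
            "source": source_name,
            "load_date": load_date,  # time element: every row records when it was loaded
        }
        for row in rows
    ]

warehouse = []  # non-volatile: rows are only appended, never updated in place

billing_rows = [{"cust": 101, "sex": "F"}, {"cust": 102, "sex": "M"}]
claims_rows = [{"cust": 101, "sex": 0}]

load_day = date(2024, 1, 31)
warehouse.extend(integrate("billing", billing_rows, load_day))
warehouse.extend(integrate("claims", claims_rows, load_day))

for row in warehouse:
    print(row["customer_id"], row["gender"], row["source"])
```

After loading, customer 101 has the same "female" value whether the row came from billing or claims, which is what lets later analysis treat the two sources as one subject-oriented view.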

[Figure: Conceptual view of a data warehouse]

BENEFITS OF DATA WAREHOUSING

The purpose of a data warehouse is to deliver enterprise-wide data in the format that is most useful to end users, regardless of their locations. Data warehousing is used for:
• Increasing the speed and flexibility of analysis.
• Providing a foundation for enterprise-wide integration and access.
• Improving or re-inventing business processes.
• Gaining a clear understanding of customer behavior.

CONCEPTUAL MODEL OF DATA WAREHOUSING

Information Sources always include the core operational systems, which form the backbone of day-to-day activities. It is these systems which have traditionally provided management information to support decision making.

The Data Warehouse itself is the bridge between the operational systems and the decision support tools. It holds a copy of much of the operational system data in a logical structure which is more conducive to analysis. The data warehouse, which is refreshed in scheduled bursts from operational systems and from relevant external data sources, provides a single, consistent view of corporate data, leaving operational systems unaffected.

Decision Support Tools are used to analyze the information stored in the warehouse, typically to identify trends and new business opportunities.
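A refresh in scheduled bursts can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the hypothetical operational table "policy" carries an updated_at timestamp, and each refresh copies only rows changed since the previous one into a physically separate warehouse database, appending them as history. The operational table is only read, so it is left unaffected.

```python
import sqlite3

# Two physically separate stores: an operational (OLTP) database and a warehouse.
oltp = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")

oltp.execute("CREATE TABLE policy (id INTEGER PRIMARY KEY, premium REAL, updated_at TEXT)")
dw.execute("CREATE TABLE policy_history (id INTEGER, premium REAL, updated_at TEXT, refresh_no INTEGER)")

def refresh(last_refresh_time, refresh_no):
    """Copy rows changed since the last refresh; the operational table is only read."""
    rows = oltp.execute(
        "SELECT id, premium, updated_at FROM policy WHERE updated_at > ?",
        (last_refresh_time,),
    ).fetchall()
    dw.executemany(
        "INSERT INTO policy_history VALUES (?, ?, ?, ?)",
        [row + (refresh_no,) for row in rows],
    )
    dw.commit()
    return len(rows)

# Day 1: two policies exist; the first refresh (watermark before any date) copies both.
oltp.executemany("INSERT INTO policy VALUES (?, ?, ?)",
                 [(1, 500.0, "2024-01-01"), (2, 750.0, "2024-01-01")])
print(refresh("0000-00-00", 1))  # prints 2

# Day 2: policy 1 is updated in place in the operational system.
oltp.execute("UPDATE policy SET premium = 550.0, updated_at = '2024-01-02' WHERE id = 1")
print(refresh("2024-01-01", 2))  # prints 1

# The warehouse keeps both the old and the new premium for policy 1.
print(dw.execute("SELECT id, premium, refresh_no FROM policy_history ORDER BY refresh_no, id").fetchall())
```

The ISO-formatted date strings compare correctly as text, which is why a plain string comparison works as the "changed since" test here.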

DEFINITION OF DATA MINING
The term ‘data mining’ refers to finding relevant and useful information in databases. Data mining and knowledge discovery in databases is a new interdisciplinary field, merging ideas from statistics, machine learning, databases and parallel computing.

The Data Mining Process

[Figure: Data Sources 1, 2, ..., N and a Data Warehouse; Select → Selected Data, Transform → Transformed Data, Mine → Extracted Information, Assimilate → Assimilated Information]
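The four steps of the process (Select, Transform, Mine, Assimilate) can be sketched as a small Python pipeline. This is a toy illustration under stated assumptions: the records and store filter are hypothetical, and counting item pairs that are frequently bought together is used as the "Mine" step only as one example technique; the paper does not prescribe a particular mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction records drawn from the warehouse.
records = [
    {"store": "north", "items": ["Bread", "milk", "butter"]},
    {"store": "north", "items": ["bread", "Milk"]},
    {"store": "north", "items": ["bread", "butter"]},
    {"store": "south", "items": ["tea", "sugar"]},
    {"store": "north", "items": ["milk", "bread", "jam"]},
]

def select(rows, store):
    """Select: keep only the data relevant to the question being asked."""
    return [r for r in rows if r["store"] == store]

def transform(rows):
    """Transform: put the data into a uniform shape (lower-case item sets)."""
    return [frozenset(item.lower() for item in r["items"]) for r in rows]

def mine(baskets, min_support):
    """Mine: keep item pairs that occur together in at least min_support of baskets."""
    counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def assimilate(patterns):
    """Assimilate: turn raw patterns into statements a decision-maker can act on."""
    return [f"{a} and {b} are bought together in {s:.0%} of baskets"
            for (a, b), s in sorted(patterns.items(), key=lambda kv: -kv[1])]

for line in assimilate(mine(transform(select(records, "north")), min_support=0.5)):
    print(line)
```

With these records the pipeline reports that bread and milk appear together in 75% of the northern baskets and bread and butter in 50%. Note that the Transform step matters: without lower-casing, "Bread" and "bread" would be counted as different items.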

Data Mining and Data Warehousing

o The goal of a data warehouse is to support decision making with data.
o Data mining can be used in conjunction with a data warehouse to help with certain types of decisions.
o Data mining can be applied to operational databases with individual transactions.
o To make data mining more efficient, the data warehouse should have an aggregated or summarized collection of data.
o Data mining helps in extracting meaningful new patterns that cannot necessarily be found by merely querying or processing data or metadata in the data warehouse.
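Keeping an aggregated collection alongside the detailed rows can be sketched with Python's built-in sqlite3 module. The table and column names, and the choice of month-by-region as the summary level, are assumptions for illustration; a mining step can then scan one row per month and region instead of every individual sale.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2024-01-05", "east", 120.0),
    ("2024-01-20", "east", 80.0),
    ("2024-01-11", "west", 200.0),
    ("2024-02-03", "east", 150.0),
    ("2024-02-14", "west", 90.0),
    ("2024-02-28", "west", 60.0),
])

# Pre-aggregate the detailed rows into a summary table: one row per month and region.
con.execute("""
    CREATE TABLE sales_monthly AS
    SELECT substr(sale_date, 1, 7) AS month, region,
           COUNT(*) AS n_sales, SUM(amount) AS total
    FROM sales
    GROUP BY substr(sale_date, 1, 7), region
""")

for row in con.execute("SELECT * FROM sales_monthly ORDER BY month, region"):
    print(row)
```

The summary keeps both the count and the total, so averages per month and region can still be derived from it without returning to the detailed table.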

Architecture Of Data Mining:

To best apply these advanced techniques, they must be fully integrated with a data warehouse as well as with flexible interactive business analysis tools. Many data mining tools currently operate outside of the warehouse, requiring extra steps for extracting, importing, and analyzing the data. Furthermore, when new insights require operational implementation, integration with the warehouse simplifies the application of results from data mining. The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, new product rollout, and so on. The figure below illustrates an architecture for advanced analysis in a large data warehouse.

[Figure: Integrated Data Mining Architecture]

Benefits of data mining

The primary benefit of data mining is the ability to turn feelings into facts. It also protects you from your gut feelings, because we all realize that many times they are not right. The fundamental benefit of data mining is therefore twofold.

Let's look at a number of tangible benefits the data mining process can bring to companies.

1. Fraud detection
All too often businesses are so caught up in their daily operations that they don't have time to dedicate to uncovering out-of-the-ordinary business events. These events include fraud, employee theft, and illegal redirection of company goods. Fraud detection is seen primarily as out-of-the-blue data mining.

2. Return on investment
A significant segment of the companies looking at, or already adopting, data warehouse technology spend millions of dollars on new business initiatives, and the research and development costs are astronomical. Everyone struggles with limited time, and only a finite amount of money and people is available, so data mining is used to find the initiatives that will give the best return on investment. This is a form of targeted data mining.

3. Scalability of electronic solutions
The major players in the data mining arena provide solutions that are robust and scalable. A robust data mining solution is one that performs well and can display results in an acceptable time. The ability to work with a wide range of input datasets is part of what is called scalability.
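Uncovering out-of-the-ordinary events such as the illegal redirection of goods is often approached as outlier detection. The following is a minimal sketch under assumptions: one numeric series per entity (here hypothetical daily shipment counts from one location) and the modified z-score, which uses the median and the median absolute deviation (MAD) with the commonly used 3.5 cut-off. This is one simple technique among many; practical fraud detection would combine many attributes.

```python
from statistics import median

def flag_unusual(amounts, threshold=3.5):
    """Return indexes whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread to measure against, so nothing can be called unusual
    flagged = []
    for i, a in enumerate(amounts):
        score = 0.6745 * (a - med) / mad
        if abs(score) > threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily totals of goods shipped from one warehouse location.
daily_shipments = [102, 98, 105, 97, 101, 99, 240, 103, 100, 96]
print(flag_unusual(daily_shipments))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself; the median-based score flags only day 6 (240 units) here.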

Applications of data mining
A wide range of companies have deployed successful applications of data mining. Successful application areas include:
- A pharmaceutical company can analyze its recent sales force activity and its results to improve the targeting of high-value physicians and determine which marketing activities will have the greatest impact in the next few months. The data needs to include competitor market activity as well as information about the local health care systems. The results can be distributed to the sales force via a wide-area network that enables the representatives to review the recommendations from the perspective of the key attributes in the decision process.
- A credit card company can leverage its vast warehouse of customer transaction data to identify the customers most likely to be interested in a new credit product. Using a small test mailing, the attributes of customers with an affinity for the product can be identified.
- A diversified transportation company with a large direct sales force can apply data mining to identify the best prospects for its services. Using data mining to analyze its own customer experience, this company can build a unique segmentation identifying the attributes of high-value prospects.
- A large consumer packaged goods company can apply data mining to improve its sales process to retailers.
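Identifying the attributes of customers with an affinity for a product from a small test mailing can be sketched as a comparison of response rates. The attribute names and results below are hypothetical; "lift" here means an attribute value's response rate divided by the overall response rate, so values above 1 mark groups worth targeting in the full mailing.

```python
from collections import defaultdict

# Hypothetical results of a small test mailing: customer attributes and response.
test_mailing = [
    {"age_band": "18-30", "card_type": "gold", "responded": True},
    {"age_band": "18-30", "card_type": "standard", "responded": False},
    {"age_band": "31-50", "card_type": "gold", "responded": True},
    {"age_band": "31-50", "card_type": "standard", "responded": False},
    {"age_band": "51+", "card_type": "gold", "responded": False},
    {"age_band": "51+", "card_type": "standard", "responded": False},
    {"age_band": "18-30", "card_type": "gold", "responded": True},
    {"age_band": "31-50", "card_type": "standard", "responded": True},
]

def response_lift(rows, attribute):
    """Response rate of each attribute value divided by the overall response rate.

    Assumes at least one customer in the test mailing responded.
    """
    overall = sum(r["responded"] for r in rows) / len(rows)
    sent = defaultdict(int)
    replied = defaultdict(int)
    for r in rows:
        sent[r[attribute]] += 1
        replied[r[attribute]] += r["responded"]  # True counts as 1, False as 0
    return {value: (replied[value] / sent[value]) / overall for value in sent}

for attr in ("age_band", "card_type"):
    print(attr, response_lift(test_mailing, attr))
```

In this toy data gold card holders respond at 1.5 times the overall rate, while customers aged 51+ did not respond at all. With a test mailing this small, such differences would need to be checked for statistical significance before the full mailing.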

ISSUES AND CHALLENGES IN DATA MINING
Data mining systems depend on databases to supply the raw input, and this raises problems because databases tend to be dynamic, incomplete, noisy and large. Other problems arise as a result of the inadequacy and irrelevance of the information stored. The difficulties in data mining can be categorized as:
• Limited information
• Noise or missing data
• User interaction and prior knowledge
• Uncertainty
• Size, updates and irrelevant fields
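For the "noise or missing data" difficulty, one simple and commonly used remedy is to treat out-of-range values as noise and replace them, together with missing values, by the median of the valid values. This is a minimal sketch; the valid range and the data are assumptions, and median imputation is only one of several techniques. It can also hide real structure in the data.

```python
from statistics import median

def clean(values, low, high):
    """Replace missing (None) and out-of-range (noisy) values with the median of valid ones."""
    valid = [v for v in values if v is not None and low <= v <= high]
    if not valid:
        raise ValueError("no valid values to impute from")
    fill = median(valid)
    return [v if v is not None and low <= v <= high else fill for v in values]

# Hypothetical customer ages: None is missing; 999 and -4 are data-entry noise.
ages = [34, None, 29, 999, 41, -4, 38]
print(clean(ages, low=0, high=120))
```

The median of the four valid ages (29, 34, 38, 41) is 36.0, which replaces the missing value and the two impossible ages.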

CONCLUSION:

We conclude that all of these problems are areas of current research and are not yet fully solved. Nevertheless, data mining offers an important approach to extracting value from the data warehouse for use in decision support.

