PAPER PRESENTATION ON “DATAMINING AND DATAWAREHOUSING”

AUTHORS
SUPRAJA K, CSE (2/4), Ph: 040_23818232, e-mail: koneru_supu@yahoo.co.in
SWETHA P, CSE (2/4), Ph: 9985389725, e-mail: swe_pinky@yahoo.com

INDEX
ABSTRACT
INTRODUCTION
WHAT IS DATA MINING?
WHAT IS DATA WAREHOUSING?
HOW DO DATAMINING AND DATAWAREHOUSING WORK TOGETHER?
APPLICATIONS
ADVANTAGES
DISADVANTAGES
CONCLUSION
REFERENCES

ABSTRACT
We live in the age of information. In today’s competitive global business environment, understanding and managing enterprise-wide information is crucial for making timely decisions and responding to changing business conditions. Data is the most valuable resource of an enterprise, and many companies are realizing a business advantage by leveraging one of their key assets: business data.

There is a tremendous amount of data generated by day-to-day business operational applications. In addition, there is valuable data available from external sources such as market research organizations, independent surveys and quality testing labs. Studies indicate that the amount of data in a given organization doubles every 5 years. Data warehousing has emerged as an increasingly popular and powerful concept of applying information technology to turn these huge islands of data into meaningful information for better business.

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. This paper describes the practicalities and the constraints of data mining and data warehousing and their advancements over earlier technologies.

INTRODUCTION
Data Warehousing
• A data warehouse can be defined as any centralized data repository which can be queried for business benefit
• Warehousing makes it possible to
  o Extract archived operational data
  o Overcome inconsistencies between different legacy data formats
  o Integrate data throughout an enterprise, regardless of location, format, or communication requirements
  o Incorporate additional or expert information

Data Mining
Data mining is not an “intelligence” tool or framework. Business intelligence, typically drawn from an enterprise data warehouse, is used to analyze and uncover information about past performance on an aggregate level. Data warehousing and business intelligence provide a method for users to anticipate future trends from analyzing past patterns in organizational data. Data mining is more intuitive, allowing for increased insight beyond data warehousing. An implementation of data mining in an organization will serve as a guide to uncover inherent trends and tendencies in historical information, as well as allow for statistical predictions, groupings and classification of data.

Typical data warehousing implementations in organizations will allow users to ask and answer questions such as “How many sales were made, by territory, by sales person between the months of May and June in 1999?” Data mining will allow business decision makers to ask and answer questions such as “Who is my core customer that purchases a particular product we sell?” or “Geographically, how well would a line of products sell in a particular region and who would purchase them, given the sale of similar products in that region?”

WHAT IS DATA MINING?
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

WHAT IS DATA WAREHOUSING?
Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies, and equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.
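
The May to June 1999 sales question quoted in the introduction is the kind of query a centralized repository answers directly. The sketch below is only an illustration under stated assumptions: the table layout, the names and the figures are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical sales records: (territory, salesperson, sale_date, amount).
# None of these values come from the paper; they are illustrative only.
SALES = [
    ("North", "Asha", "1999-05-03", 1200.0),
    ("North", "Asha", "1999-06-17", 800.0),
    ("North", "Ravi", "1999-05-21", 500.0),
    ("South", "Meena", "1999-06-02", 950.0),
    ("South", "Meena", "1999-07-09", 400.0),  # outside the period
    ("South", "Kiran", "1999-04-28", 300.0),  # outside the period
]


def sales_by_territory_and_person(rows, start, end):
    """Return (territory, salesperson, sale count, total amount) rows
    for sales dated between start and end, inclusive."""
    conn = sqlite3.connect(":memory:")  # stand-in for the central repository
    try:
        conn.execute(
            "CREATE TABLE sales (territory TEXT, salesperson TEXT, "
            "sale_date TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
        # ISO-formatted date strings sort chronologically, so BETWEEN works.
        cursor = conn.execute(
            "SELECT territory, salesperson, COUNT(*), SUM(amount) "
            "FROM sales WHERE sale_date BETWEEN ? AND ? "
            "GROUP BY territory, salesperson "
            "ORDER BY territory, salesperson",
            (start, end),
        )
        return cursor.fetchall()
    finally:
        conn.close()


report = sales_by_territory_and_person(SALES, "1999-05-01", "1999-06-30")
for territory, person, count, total in report:
    print(territory, person, count, total)
```

A query like this only reports what happened. The data mining questions in the introduction (who the core customer is, how a product line would sell in a region) need a model built on top of such data.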

According to Bill Inmon, author of Building the Data Warehouse and the guru who is widely considered to be the originator of the data warehousing concept, there are generally four characteristics that describe a data warehouse:
• Subject-oriented: Data are organized according to subject instead of application, e.g. an insurance company using a data warehouse would organize their data by customer, premium, and claim, instead of by different products (auto, life, etc.). The data organized by subject contain only the information necessary for decision support processing.
• Integrated: When data resides in many separate applications in the operational environment, encoding of data is often inconsistent. For instance, in one application gender might be coded as "m" and "f", in another by 0 and 1. When data are moved from the operational environment into the data warehouse, they assume a consistent coding convention, e.g. gender data is transformed to "m" and "f".
• Time-variant: The data warehouse contains a place for storing data that are five to 10 years old, or older, to be used for comparisons, trends, and forecasting. These data are not updated.
• Non-volatile: Once data enter the data warehouse they are loaded and accessed, but not updated or changed.

An Overview of Data Mining Techniques:
This overview provides a description of some of the most common data mining algorithms in use today. We have broken the discussion into two sections, each with a specific theme: 1) Classical Techniques such as statistics, neighborhoods and clustering, and 2) Next Generation Techniques such as trees, networks and rules. Each section will describe a number of data mining algorithms at a high level, focusing on the "big picture" so that the reader will be able to understand how each algorithm fits into the landscape of data mining techniques.

HOW DO DATAMINING AND DATAWAREHOUSING WORK TOGETHER?
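
A minimal sketch of the hand-off between the two, assuming two hypothetical operational sources that code gender differently ("m"/"f" in one, 0/1 in the other, as in the Integrated characteristic above). The records are first brought to one coding convention, as a warehouse would do, and only then analyzed. The source names, the records and the choice that 0 means "m" are assumptions made for illustration; the text does not say which numeric code maps to which gender.

```python
from collections import Counter

# Two hypothetical operational sources with inconsistent gender coding.
BILLING = [
    {"customer": "C1", "gender": "m", "product": "auto"},
    {"customer": "C2", "gender": "f", "product": "life"},
    {"customer": "C3", "gender": "m", "product": "auto"},
]
CLAIMS = [
    {"customer": "C4", "gender": 0, "product": "auto"},
    {"customer": "C5", "gender": 1, "product": "life"},
    {"customer": "C6", "gender": 1, "product": "auto"},
]

# Assumed mapping onto the warehouse convention "m"/"f".
# Which of 0/1 stands for which gender is an assumption, not from the text.
GENDER_CODES = {"m": "m", "f": "f", 0: "m", 1: "f"}


def to_warehouse(*sources):
    """Warehousing step: merge all sources into one list of records
    that share a single gender coding convention."""
    warehouse = []
    for source in sources:
        for record in source:
            cleaned = dict(record)  # copy, so the source stays untouched
            cleaned["gender"] = GENDER_CODES[record["gender"]]
            warehouse.append(cleaned)
    return warehouse


def product_counts_by_gender(warehouse):
    """A deliberately simple 'mining' step: count how often each
    (gender, product) pair occurs in the integrated data."""
    return Counter((r["gender"], r["product"]) for r in warehouse)


warehouse = to_warehouse(BILLING, CLAIMS)
counts = product_counts_by_gender(warehouse)
for (gender, product), n in sorted(counts.items()):
    print(gender, product, n)
```

Without the integration step, the same count would split one group across the keys "m" and 0, which is why the consistent coding convention has to come before any pattern finding.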

Extracting meaningful information from numerous databases and cross-referencing it to find patterns, trends and correlations that might otherwise be overlooked is called "data mining." Assembling the information in one place is called "data warehousing."
• All the information is stored in information repositories.
• The data warehouse takes the cleaned and integrated data.
• The data taken by the data warehouse is selected and transformed, and the useful data is sent through data mining.
• The data which is sent through data mining is evaluated and presented.

APPLICATIONS
Data Warehousing
• Insulate data - i.e. the current operational information
  o Preserves the security and integrity of mission-critical OLTP applications
  o Gives access to the broadest possible base of data
• Retrieve data - from a variety of heterogeneous operational databases
  o Data is transformed and delivered to the data warehouse/store based on a selected model (or mapping definition)
  o Metadata - information describing the model and definition of the source data elements

• Data cleansing - removal of certain aspects of operational data, such as low-level transaction information, which slow down the query times.
• Transfer - processed data transferred to the data warehouse, a large database on a high performance box.

Data Mining
• Medicine - drug side effects, hospital cost analysis, genetic sequence analysis, prediction etc.
• Finance - stock market prediction, credit assessment, fraud detection etc.
• Marketing/sales - product analysis, buying patterns, sales prediction, target mailing, identifying `unusual behavior' etc.
• Knowledge Acquisition
• Scientific discovery - superconductivity research, etc.
• Engineering - automotive diagnostic expert systems, fault detection etc.

ADVANTAGES:
• Enhances end-user access to a wide variety of data.
• Business decision makers can obtain various kinds of trend reports, e.g. the item with the most sales in a particular area / country for the last two years.
• A data warehouse can be a significant enabler of commercial business applications, most notably Customer Relationship Management (CRM).

DISADVANTAGES:
Data mining systems rely on databases to supply the raw data for input, and this raises problems in that databases tend to be dynamic, incomplete, noisy, and large. Other problems arise as a result of the adequacy and relevance of the information stored.

Limited Information

A database is often designed for purposes different from data mining, and sometimes the properties or attributes that would simplify the learning task are not present, nor can they be requested from the real world. Inconclusive data causes problems because if some attributes essential to knowledge about the application domain are not present in the data, it may be impossible to discover significant knowledge about a given domain. For example, one cannot diagnose malaria from a patient database if that database does not contain the red blood cell count of the patients.

Missing data can be treated by discovery systems in a number of ways, such as:
• Simply disregard missing values
• Omit the corresponding records
• Infer missing values from known values
• Treat missing data as a special value to be included additionally in the attribute domain
• Or average over the missing values using Bayesian techniques.

FUTURE VIEWS
The future of data mining lies in predictive analytics. The technology innovations in data mining since 2000 have been truly Darwinian and show promise of consolidating and stabilizing around predictive analytics. Variations, novelties and new candidate features have been expressed in a proliferation of small start-ups that have been ruthlessly culled from the herd by a perfect storm of bad economic news. Nevertheless, the emerging market for predictive analytics has been sustained by professional services, service bureaus (rent a recommendation) and profitable applications in verticals such as retail, consumer finance, telecommunications, travel and leisure, and related analytic applications.

Predictive analytics have successfully proliferated into applications to support customer recommendations, customer value and churn management, campaign optimization, and fraud detection. On the product side, success stories in demand planning, just in time inventory and market basket optimization are a staple of predictive analytics. Predictive analytics should be used to get to know the customer, segment and predict customer behavior, and forecast product demand and related market dynamics. Be realistic about the required complex mixture of business acumen, statistical processing and information technology support, as well as the fragility of the resulting predictive model, but make no assumptions about the limits of predictive analytics. Breakthroughs often occur in the application of the tools and methods to new commercial opportunities.

CONCLUSION:
Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and the users’ ability to effectively analyze and act on the information they contain. Both relational and OLAP technologies have tremendous capabilities for navigating massive data warehouses, but brute force navigation of data is not enough. A new technological leap is needed to structure and prioritize information for specific end-user problems. The data mining tools can make this leap. Quantifiable business benefits have been proven through the integration of data mining with current information systems, and new products are on the horizon that will bring this integration to an even wider audience of users.

• Data mining has a lot of potential
• Diversity in the field of application
• Estimated market for data mining is $500 million
