Data-Mining

Data-Mining is emerging as one of the new database-industry buzzwords.

So what exactly is this concept all about? To present a simple comparison, data mining can be considered analogous to excavating a diamond from its mine. The diamond, in this case, is the snippet of business intelligence that is sought, and the mine is the large sea of data, or data warehouse, that has accumulated over a period of time.

Data-Mining is essentially a knowledge discovery process. With the increase in storage capacity and the ability of DBMSs to handle very large databases, more data is stored and is accessible. Data Mining entails automated extraction of hidden, predictive information from large databases. It employs statistical and analytical techniques to ferret out consistent patterns and behaviors in the data that had so far gone unnoticed. This uncovered "intelligence" can be fed into automated decision support systems or assessed by a human analyst.

Data and business analysts cannot, unaided, make sense of the enormous amount of data available when making informed business decisions. The human brain's limitations in discerning complex multi-dimensional dependencies in data, and the need for a more objective approach to such analysis, have given an impetus to data-mining.

Data-mining allows you to adopt a proactive approach: to extract knowledge from your legacy data and anticipate future situations. It helps you to streamline your business decisions and make predictions based on sound business intelligence. For example, data mining allows banks to predict how customers will react to interest rate changes, which customers will be most receptive to a new product, which customers pose the greatest risk of defaulting on a loan, and how to make each customer relationship more profitable.

Based on the nature of the problem, various data-mining approaches can be followed.
The most general sequence of steps that is part of any data-mining project is as follows:

1. Data Preparation
2. Data Modeling
3. Deployment

The various data-mining algorithms used for "modeling" the data are as follows:

Regression: This method takes a numerical data-set and searches for the dependence of the outcome (i.e. the target variable) on other variables, expressed as a function of some predetermined form. It aims at developing a mathematical formula that fits the data in the data-set and then applying that formula to new data to obtain predictions. Regression is used when the data is continuous and quantitative in nature.

K-Nearest Neighbor: This technique classifies each record in the data-set based on a combination of the classes of the "k" records most similar to it in the data-set.

Neural Networks: Neural networks are analytical techniques modeled after the process of learning in the cognitive system and the neurological functions of the brain. A neural network is a system composed of many simple processing elements operating in parallel, whose function is determined by the network structure, the connection strengths, and the processing performed at the computing elements or nodes. It is capable of predicting new observations (on certain variables) from other observations (on the same or other variables) after "learning" from existing data.

Rule Induction using Decision Trees: This method involves deriving useful if-then rules from the data, based on statistical significance, and employing them to obtain predictions. A typical decision tree is a hierarchical structure of nested if-then decisions. These decisions generate rules for the classification of a data-set. Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID) are decision tree techniques used for classification of a data-set. The decision trees generate rules that can be applied to a new (unclassified) data-set to predict which records will have a given outcome.
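To make the idea of an induced if-then rule concrete, here is a minimal sketch of a single two-way split on one numeric field. It assumes Gini impurity as the splitting criterion, which is the measure commonly associated with CART; the text above does not name a criterion. The loan data, the "income" field and the function names are all hypothetical and exist only for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(records, labels, feature):
    """Return (weighted impurity, threshold) of the best 2-way split on one feature."""
    values = sorted({r[feature] for r in records})
    best = None
    # Candidate thresholds are the midpoints between consecutive distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for r, y in zip(records, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(records, labels) if r[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold)
    if best is None:
        raise ValueError("feature has a single distinct value; nothing to split on")
    return best

def make_rule(records, labels, feature):
    """Turn the best split into an if-then rule that predicts the majority class per side."""
    _, threshold = best_split(records, labels, feature)
    left = [y for r, y in zip(records, labels) if r[feature] <= threshold]
    right = [y for r, y in zip(records, labels) if r[feature] > threshold]
    left_class = Counter(left).most_common(1)[0][0]
    right_class = Counter(right).most_common(1)[0][0]

    def rule(record):
        return left_class if record[feature] <= threshold else right_class

    return threshold, rule

# Hypothetical, hand-made loan records (income in thousands); not real data.
records = [{"income": 20}, {"income": 25}, {"income": 30},
           {"income": 60}, {"income": 75}, {"income": 90}]
labels = ["default", "default", "default", "repaid", "repaid", "repaid"]

threshold, rule = make_rule(records, labels, "income")
prediction = rule({"income": 40})  # apply the learned rule to a new, unclassified record
```

A full decision tree repeats this search inside each resulting group, producing the nested if-then structure described above. This sketch stops after one split to keep the rule easy to read.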
CART partitions a data-set by creating 2-way splits, while CHAID segments a data-set using chi square tests to create multi-way splits. CART usually requires less data preparation than CHAID.

Clustering: This is another classification technique for data mining. In this technique, data items are organized into groups or clusters according to logical relationships or consumer preferences. The higher the level of aggregation, the less similar are the members in a cluster. For example, data can be mined to identify market segments or consumer affinities.

Data-Mining is becoming increasingly popular as a business information management tool, where it is required to reveal knowledge structures that can guide decisions in conditions of limited certainty. There has been increased interest in developing new analytical techniques specifically designed to address the issues pertinent to business data-mining.

Major database vendors have taken steps to ensure that their platforms incorporate data mining techniques. Oracle's Data Mining Suite (Darwin) implements classification and regression trees, neural networks, k-nearest neighbors, regression analysis and clustering algorithms. Microsoft's SQL Server 2000 also offers data-mining functionality through the use of classification trees and clustering algorithms.

– Kapil Talreja