
Submitted by:

Abhishek Chowdhury (10030141006)
Samradni Ghone (10030141009)
Puneet Gupta (10030141026)

Contents:
Data Mining
Next Generation Data Mining Techniques
Decision Trees
Neural Network
Association Rules
Implementation of Techniques in Business

What is Data Mining?
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses.

Data Mining Techniques:

Decision Tree
Neural Network
Association Rules

What is a Decision Tree?
A decision tree is a predictive model that, as its name implies, can be viewed as a tree. Specifically, each branch of the tree is a classification question and the leaves of the tree are partitions of the dataset with their classification.
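As a rough illustration of this structure, the sketch below represents each branch as a yes/no classification question and each leaf as a class label. The field names (years_as_customer, monthly_spend), the thresholds, and the labels are all hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str  # classification given to every record that reaches this leaf


@dataclass
class Branch:
    question: Callable[[dict], bool]  # the classification question asked here
    if_yes: "Node"
    if_no: "Node"


Node = Union[Leaf, Branch]


def classify(node: Node, record: dict) -> str:
    """Follow the answers to each question until a leaf is reached."""
    while isinstance(node, Branch):
        node = node.if_yes if node.question(record) else node.if_no
    return node.label


# Hypothetical tree: every question, threshold, and label is made up.
tree = Branch(
    question=lambda r: r["years_as_customer"] < 2,
    if_yes=Branch(
        question=lambda r: r["monthly_spend"] > 100,
        if_yes=Leaf("likely to stay"),
        if_no=Leaf("likely to leave"),
    ),
    if_no=Leaf("likely to stay"),
)

print(classify(tree, {"years_as_customer": 1, "monthly_spend": 50}))
```

Each record follows exactly one path from the root to a leaf, which is why the leaves partition the dataset.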

A decision tree consists of 3 types of nodes:
1. Decision nodes - commonly represented by squares
2. Chance nodes - represented by circles
3. End nodes - represented by triangles

Example :

Where can decision trees be used?


Using Decision Trees for Exploration
Using Decision Trees for Preprocessing
Using Decision Trees for Prediction

Growing a Decision Tree

When to start growing the decision tree?

When does the tree stop growing?

Why would a decision tree algorithm stop growing the tree if there wasn't enough data?

Rule Induction
Rule induction is one of the major forms of data mining and is perhaps the most common form of knowledge discovery in unsupervised learning systems. In rule induction, all possible patterns are systematically pulled out of the data, and an accuracy and a significance are then attached to each one to tell the user how strong the pattern is and how likely it is to occur again.

What is a Rule?
In rule induction systems the rule itself has a simple form: "if this and this and this, then this". For example, rules that a supermarket might find in the data collected from its scanners would be:

If pickles are purchased then ketchup is purchased.
If paper plates are purchased then plastic forks are purchased.

For the rules to be useful, two pieces of information must be supplied along with the actual rule:

Accuracy - How often is the rule correct?
Coverage - How often does the rule apply?
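The two measures can be computed directly from a list of shopping baskets. The sketch below is only an illustration: the baskets are made up, and it assumes that coverage means the share of baskets in which the antecedent appears and accuracy means the share of those baskets in which the consequent also appears. Some texts instead define support as the share of baskets containing both sides of the rule.

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (accuracy, coverage) for the rule 'if antecedent then consequent'.

    baskets, antecedent and consequent are sets of item names.
    """
    # Baskets in which the rule applies (the antecedent is present).
    applies = [b for b in baskets if antecedent <= b]
    coverage = len(applies) / len(baskets)
    # Of those, the baskets in which the rule is correct.
    correct = sum(1 for b in applies if consequent <= b)
    accuracy = correct / len(applies) if applies else 0.0
    return accuracy, coverage


# Made-up scanner data, for illustration only.
baskets = [
    {"pickles", "ketchup", "bread"},
    {"pickles", "ketchup"},
    {"pickles", "milk"},
    {"paper plates", "plastic forks"},
    {"bread", "milk"},
]

accuracy, coverage = rule_stats(baskets, {"pickles"}, {"ketchup"})
print(f"accuracy={accuracy:.2f} coverage={coverage:.2f}")
```

Here the rule applies to three of the five baskets and is correct in two of those three.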

In some cases accuracy is called the confidence of the rule and coverage is called the support. Accuracy and coverage appear to be the preferred ways of naming these two measurements.
Rule                                                  Accuracy   Coverage
If breakfast cereal purchased then milk purchased.    85%        20%
If bread purchased then Swiss cheese purchased.       15%        6%

The left-hand side of a rule is called the antecedent and the right-hand side is called the consequent.

What to do with a Rule?


When the rules are mined out of the database, they can be used either for better understanding the business problems that the data reflects or for performing actual predictions against some predefined prediction target.

Target the antecedent.
Target the consequent.
Target based on accuracy.
Target based on coverage.
Target based on interestingness.
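The first four targeting options amount to filtering or sorting the mined rules. The sketch below is a minimal illustration, assuming each rule is stored with its antecedent, consequent, accuracy, and coverage; the Rule type and the helper names are hypothetical, and the figures are the ones quoted in the rule tables of this presentation. Interestingness is left out because the slides do not define a measure for it.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: frozenset
    consequent: str
    accuracy: float
    coverage: float


rules = [
    Rule(frozenset({"breakfast cereal"}), "milk", 0.85, 0.20),
    Rule(frozenset({"bread"}), "swiss cheese", 0.15, 0.06),
    Rule(frozenset({"butter"}), "milk", 0.65, 0.20),
]


def with_antecedent(rules, item):
    """Target the antecedent: rules triggered by a given item."""
    return [r for r in rules if item in r.antecedent]


def with_consequent(rules, item):
    """Target the consequent: rules that predict a given item."""
    return [r for r in rules if r.consequent == item]


def by_accuracy(rules):
    """Target based on accuracy: most accurate rules first."""
    return sorted(rules, key=lambda r: r.accuracy, reverse=True)


def by_coverage(rules):
    """Target based on coverage: most widely applicable rules first."""
    return sorted(rules, key=lambda r: r.coverage, reverse=True)


for rule in by_accuracy(with_consequent(rules, "milk")):
    print(sorted(rule.antecedent), "->", rule.consequent, rule.accuracy)
```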

Discovery
These systems provide both a very detailed view of the data, where significant patterns may cover only a few cases, and a broad overview of the patterns in the database. They thus display a useful combination of micro and macro views:

Macro Level - Patterns that cover many situations are provided to the user; they can be used very often and with great confidence, and can also be used to summarize the database.
Micro Level - Strong rules that cover only a very few situations can still be retrieved by the system and proposed to the end user.


Prediction
Each rule by itself can perform prediction: the consequent is the target and the accuracy of the rule is the accuracy of the prediction. But because rule induction systems produce many rules for a given antecedent or consequent, there can be conflicting predictions with different accuracies. The rules then have to be combined into a single prediction, which can be done in a variety of ways, for example by summing the accuracies as if they were weights or by taking the prediction of the rule with the maximum accuracy.

Antecedent   Consequent   Accuracy   Coverage
bread        milk         35%        30%
butter       milk         65%        20%
eggs         milk         35%        15%
cheese       milk         40%        8%
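The two ways of resolving conflicting predictions, taking the rule with the maximum accuracy or summing accuracies as if they were weights, can be sketched as follows. The rules that fire here are hypothetical (the "juice" rule in particular is invented so that the two strategies disagree), and the function names are illustrative only.

```python
from collections import defaultdict


def predict_max_accuracy(fired):
    """Return the consequent of the single most accurate rule that fired."""
    consequent, _ = max(fired, key=lambda rule: rule[1])
    return consequent


def predict_summed_accuracy(fired):
    """Treat accuracies as weights, sum them per consequent, pick the largest."""
    totals = defaultdict(float)
    for consequent, accuracy in fired:
        totals[consequent] += accuracy
    return max(totals, key=totals.get)


# (consequent, accuracy) of every rule whose antecedent matches one basket.
# Hypothetical values, chosen so that the two strategies give different answers.
fired = [("milk", 0.35), ("milk", 0.40), ("juice", 0.65)]

print(predict_max_accuracy(fired))     # the single strongest rule decides
print(predict_summed_accuracy(fired))  # several weaker rules can outvote it
```

With these values the most accurate single rule predicts juice, while the two milk rules together carry more weight (0.75 against 0.65), so the choice of combination method changes the prediction.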

The business importance of accuracy and coverage


From a business perspective, accurate rules are important because they imply that there is useful predictive information in the database that can be exploited, namely that the antecedent and the consequent are far from independent.
                Accuracy Low                                          Accuracy High
Coverage High   Rule is rarely correct but can be used often.         Rule is often correct and can be used often.
Coverage Low    Rule is rarely correct and can be only rarely used.   Rule is often correct but can be only rarely used.

For instance, you may have a rule that is 100% accurate but is only applicable in 1 out of every 100,000 shopping baskets. You can rearrange your shelf space to take advantage of this, but the gain will be small because the rule so rarely applies.
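To put the 1-in-100,000 example in numbers, the sketch below estimates how many baskets a rule would correctly predict as baskets × coverage × accuracy. The volume of one million baskets is an assumption made for illustration; the second rule uses the 85% accuracy and 20% coverage quoted earlier for the cereal and milk rule.

```python
def expected_correct_uses(n_baskets, accuracy, coverage):
    """Expected number of baskets in which the rule applies and is correct."""
    return n_baskets * coverage * accuracy


N_BASKETS = 1_000_000  # assumed volume, for illustration only

# 100% accurate, but applies to 1 basket in 100,000.
rare_rule = expected_correct_uses(N_BASKETS, accuracy=1.00, coverage=1 / 100_000)

# 85% accurate and applies to 20% of baskets.
common_rule = expected_correct_uses(N_BASKETS, accuracy=0.85, coverage=0.20)

print(f"rare but perfect rule: about {rare_rule:.0f} baskets")
print(f"common, less accurate rule: about {common_rule:.0f} baskets")
```

Under these assumptions the perfectly accurate rule pays off in only about ten baskets, against roughly 170,000 for the less accurate but far more widely applicable rule.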