- Prof. Sachin Lamkane
Itemset:-
A set of items that appear frequently together in a transaction data set is a frequent itemset,
for example, purchasing milk and bread.
Subsequence:-
A subsequence, such as buying first a PC, then a digital camera, and then a memory card, is one that
occurs frequently in a shopping history database.
Substructures:-
A substructure can refer to different structural forms, such as subgraphs, subtrees, or
sublattices, which may be combined with itemsets or subsequences.
Association Rules:
This data mining technique helps to discover links between two or more items. It finds hidden
patterns in the data set.
Association rules are if-then statements that help to show the probability of interactions between
data items within large data sets in different types of databases. Association rule mining has several
applications and is commonly used to find sales correlations in transactional data or medical data sets.
The way the algorithm works is that you have various data, for example, a list of grocery items that you
have been buying for the last six months. It calculates the percentage of items being purchased together.
o Support:
This measure computes how often items A and B are purchased together, relative to the
overall dataset:
Support(A -> B) = (transactions containing both A and B) / (all transactions)
o Confidence:
This measure computes how often item B is purchased when item A is purchased as well:
Confidence(A -> B) = (transactions containing both A and B) / (transactions containing A)
o Lift:
This measure assesses the confidence of the rule against how often item B is purchased
overall:
Lift(A -> B) = Confidence(A -> B) / Support(B)
A typical example of frequent itemset mining is Market Basket Analysis. This process
analyzes customers' buying habits by finding associations between the different items that
customers place in their shopping baskets. This discovery of associations can help retailers
develop marketing strategies by gaining insight into which items are frequently purchased
together by customers. For instance, if customers are buying milk, how likely are they to
also buy bread?
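As a sketch of how these three measures work in practice, the following Python fragment computes support, confidence, and lift for the rule milk -> bread over a small, made-up list of transactions (the data is purely illustrative):

# Illustrative, made-up transaction data (not from these notes).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs"},
]
n = len(transactions)

def support(*items):
    # Fraction of all transactions that contain every given item.
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= t) / n

# Support: how often milk and bread appear together, over the entire dataset.
sup = support("milk", "bread")                      # 3/5 = 0.60
# Confidence: how often bread is bought when milk is bought.
conf = support("milk", "bread") / support("milk")   # 0.60 / 0.80 = 0.75
# Lift: the confidence compared with how often bread is bought overall.
lift = conf / support("bread")                      # 0.75 / 0.80 = 0.9375
print(sup, conf, lift)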
Apriori Algorithm:-
Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining
frequent itemsets for Boolean association rules. The name of the algorithm is based on the fact
that the algorithm uses prior knowledge of frequent itemset properties. Apriori employs an
iterative approach known as level-wise search, where k-itemsets are used to explore
(k+1)-itemsets.
To improve the efficiency of the level-wise generation of frequent
itemsets, an important property called the Apriori property is used to reduce the search
space.
Apriori Property:-
All nonempty subsets of a frequent itemset must also be frequent. The Apriori algorithm uses
a two-step process called join and prune, sketched in code below.
1. The Join Step: To find Lk, a set of candidate k-itemsets is generated by joining
Lk-1 with itself. This candidate set is denoted Ck.
2. The Prune Step: In this step the size of Ck is reduced: by the Apriori property, any
candidate with an infrequent (k-1)-subset is discarded, and the candidates that also meet
minimum support form Lk.
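The following Python sketch is one way to express this level-wise search; it is a simplified illustration of the join and prune steps, not an optimized implementation:

from itertools import combinations

def apriori(transactions, min_support):
    # Returns all frequent itemsets using the level-wise (Apriori) search.
    n = len(transactions)

    def is_frequent(itemset):
        return sum(1 for t in transactions if itemset <= t) / n >= min_support

    items = {item for t in transactions for item in t}
    level = [s for s in (frozenset([i]) for i in items) if is_frequent(s)]  # L1
    frequent = list(level)
    k = 2
    while level:
        level_set = set(level)
        # Join step: Lk-1 joined with itself gives the candidate k-itemsets Ck.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: by the Apriori property, a candidate with any
        # infrequent (k-1)-subset cannot itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level_set for s in combinations(c, k - 1))}
        level = [c for c in candidates if is_frequent(c)]   # Lk
        frequent.extend(level)
        k += 1
    return frequent

# Example: frequent itemsets with minimum support 0.5.
print(apriori([{"milk", "bread"}, {"milk", "bread", "eggs"}, {"bread", "eggs"}], 0.5))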
Classification and Prediction:-
What is Classification:-
Classification is the process of finding a model that describes and distinguishes data classes
or concepts, so that the model can be used to predict the class label of tuples whose class is
unknown. For example, a bank officer may want to classify loan applications as safe or risky.
Here the class label (i.e., the loan decision) of each training tuple is provided; hence classification is
also known as supervised learning.
In the first step, a classifier is built describing a predetermined set of data classes or
concepts. This is known as the model construction, learning, or training step. Construction
of a classification model is based on training data. Training data consist of a set of tuples.
A tuple X is represented by an n-dimensional attribute vector, X = (x1, x2, ..., xn). Each tuple X is
assumed to belong to a predefined class, as determined by another database attribute called the class
label attribute.
How Are Decision Trees Used For Classification?
Given a tuple, X, for which the associated class label is unknown, the attribute values of the
tuple are tested against the decision tree. A path is traced from the root to a leaf node, which
holds the class prediction for the tuple. Decision trees can easily be converted to
classification rules.
For example:
RID | Age   | Income | Student | Credit rating | Class
1   | youth | high   | no      | fair          | ?
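To illustrate, here is a hypothetical Python sketch of this example's buys_computer tree (assuming its standard form: age at the root, then student on the youth branch and credit rating on the senior branch, as derived later in these notes). Each root-to-leaf path is one IF-THEN rule, and the tuple above follows the path age = youth, student = no:

def classify(x):
    # Each root-to-leaf path corresponds to one classification rule.
    if x["age"] == "youth":
        # IF age = youth AND student = no   THEN buys_computer = no
        # IF age = youth AND student = yes  THEN buys_computer = yes
        return "yes" if x["student"] == "yes" else "no"
    if x["age"] == "middle_aged":
        # IF age = middle_aged THEN buys_computer = yes
        return "yes"
    # IF age = senior AND credit_rating = fair      THEN buys_computer = yes
    # IF age = senior AND credit_rating = excellent THEN buys_computer = no
    return "yes" if x["credit_rating"] == "fair" else "no"

# The unknown tuple (RID 1) from the table above:
x = {"age": "youth", "income": "high", "student": "no", "credit_rating": "fair"}
print(classify(x))   # -> "no"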
Performance:-
The construction of decision tree classifiers does not require any domain knowledge or
parameter setting. They can handle multidimensional data. Their representation of acquired
knowledge in tree form is intuitive and easily understood by humans. The learning and
classification steps of decision tree induction are simple and fast. Decision tree classifiers have
good accuracy. Decision trees are used for classification in many application areas, such as medicine,
manufacturing and production, financial analysis, astronomy, and molecular biology.
RID | Age         | Income | Student | Credit rating | Class: buys_computer
1   | youth       | high   | no      | fair          | no
2   | youth       | high   | no      | excellent     | no
3   | middle_aged | high   | no      | fair          | yes
4   | senior      | medium | no      | fair          | yes
5   | senior      | low    | yes     | fair          | yes
6   | senior      | low    | yes     | excellent     | no
7   | middle_aged | low    | yes     | excellent     | yes
8   | youth       | medium | no      | fair          | no
9   | youth       | low    | yes     | fair          | yes
10  | senior      | medium | yes     | fair          | yes
11  | youth       | medium | yes     | excellent     | yes
12  | middle_aged | medium | no      | excellent     | yes
13  | middle_aged | high   | yes     | fair          | yes
14  | senior      | medium | no      | excellent     | no
In the above table:
To find the splitting criterion for these tuples, we must compute the information gain of each
attribute.
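As a concrete sketch, the following Python code computes the information gain, Gain(A) = Info(D) - Info_A(D), for each attribute over the tuples above (entropy measured in bits):

from math import log2
from collections import Counter

# (age, income, student, credit_rating, buys_computer) tuples from the table above.
D = [
    ("youth", "high", "no", "fair", "no"), ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"), ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"), ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"), ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"), ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"), ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"), ("senior", "medium", "no", "excellent", "no"),
]
ATTRS = ["age", "income", "student", "credit_rating"]

def info(tuples):
    # Expected information (entropy) needed to classify a tuple.
    counts = Counter(t[-1] for t in tuples)
    return -sum(c / len(tuples) * log2(c / len(tuples)) for c in counts.values())

def gain(attr):
    # Gain(A) = Info(D) - Info_A(D): the reduction in entropy after splitting on A.
    i = ATTRS.index(attr)
    info_a = 0.0
    for v in {t[i] for t in D}:
        part = [t for t in D if t[i] == v]
        info_a += len(part) / len(D) * info(part)
    return info(D) - info_a

for a in ATTRS:
    print(a, round(gain(a), 3))   # age gives the largest gain, about 0.246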
According to these computations, age has the highest information gain among the attributes, so it is selected as the splitting
attribute.
The tuples falling into the partition for age = middle_aged all belong to the same class. Because they
all belong to class "yes", a leaf should therefore be created at the end of this branch and labeled
"yes".
Tree Pruning:-
When a decision tree is built, many of its branches may reflect anomalies in the training data
due to noise or outliers. Tree pruning addresses this problem of overfitting.
There are two techniques for pruning:
1) Pre pruning
2) Post pruning
1) Pre Pruning:-
In the pre pruning approach, a tree is “pruned” by halting its construction early. Upon halting,
the node becomes a leaf. The leaf may hold the most frequent class among the subset tuples or
the probability distribution of those tuples.
When constructing a tree, measures such as statistical significance, information gain,
Gini index, and so on can be used to assess the goodness of a split. If partitioning the tuples at a
node would result in a split that falls below a prespecified threshold, then further partitioning of
the given subset is halted; otherwise, it is expanded. High thresholds could result in oversimplified
trees, whereas low thresholds could result in very little simplification.
2) Post pruning:-
In the postpruning approach, subtrees are removed from a "fully grown" tree. A subtree at a given
node is pruned by removing its branches and replacing it with a leaf. The leaf is labeled with
the most frequent class among the subtree being replaced. For example, consider the subtree at node A3 in
the unpruned tree: the most common class within that subtree is "class B", so in the pruned version
of the tree the subtree is replaced with the leaf "class B".
One measure of the goodness of a split is
Phi(s|t) = 2 * P_L * P_R * sum over j of | P(Cj|t_L) - P(Cj|t_R) |
This formula is evaluated at the current node, t, and for each possible splitting attribute and
criterion, s. Here L and R are used to indicate the left and right subtrees of the current node in
the tree. P_L and P_R are the probabilities that a tuple in the training set will be on the left or right
side of the tree. P(Cj|t_L) and P(Cj|t_R) are the probabilities that a tuple is in class Cj and in the
left or right subtree.
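A direct Python translation of this measure (the probabilities below are made up for illustration, not taken from these notes):

def goodness_of_split(p_left, p_right, left_class_probs, right_class_probs):
    # Phi(s|t) = 2 * PL * PR * sum over classes of |P(Cj|tL) - P(Cj|tR)|
    return 2 * p_left * p_right * sum(
        abs(pl - pr) for pl, pr in zip(left_class_probs, right_class_probs)
    )

# Made-up split: 60% of tuples go left; the two sides have very different
# class distributions, so the split scores well.
print(goodness_of_split(0.6, 0.4, [0.9, 0.1], [0.2, 0.8]))   # 2*0.6*0.4*1.4 = 0.672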
Bayesian Classification:-
Bayesian classification is based on Bayes' theorem. Bayesian classifiers are statistical
classifiers. They can predict class membership probabilities, such as the probability that a given
tuple belongs to a particular class.
Bayes' Theorem:-
Bayes' theorem is named after Thomas Bayes. Let X be a data tuple. In
Bayesian terms, X is considered "evidence". As usual, it is described by a set of n attributes. Let H
be some hypothesis, such as that the data tuple X belongs to a specified class C. For classification
problems, we want to determine P(H|X), the probability that the hypothesis H holds given the
"evidence" X.
There are two types of probabilities:
1. Posterior probability [ P(H|X) ]
2. Prior probability [ P(H) ]
1) Posterior Probability - P(H|X):-
P(H|X) is the posterior probability, or a posteriori probability, of H conditioned on X.
For example: suppose the data tuple X describes a customer by the attributes age = 35 years and
income = $40,000. Suppose that H is the hypothesis that our customer will buy a computer.
Then P(H|X) reflects the probability that customer X will buy a computer given that we know
the customer's age and income.
2) Prior Probability - P(H):-
P(H) is the prior probability, or a priori probability, of H.
For example, this is the probability that any given customer will buy a computer, regardless of
age, income, or any other information.
The posterior probability, P(H|X), is based on more information than the prior probability, P(H),
which is independent of X.
Bayes' theorem is:
P(H|X) = P(X|H) P(H) / P(X)
Naïve Bayesian Classification:-
(For explanation, consider the training table above.)
1. Let D be a training set of tuples and their associated class labels.
2. Suppose that there are m classes, C1, C2, ..., Cm. Given a tuple X, the classifier will predict
that X belongs to the class having the highest posterior probability, conditioned on X. That is, the
naïve Bayesian classifier predicts that tuple X belongs to class Ci if and only if
P(Ci|X) > P(Cj|X) for 1 <= j <= m, j != i.
Thus, we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the
maximum posteriori hypothesis. By Bayes' theorem,
P(Ci|X) = P(X|Ci) P(Ci) / P(X).
3. The class prior probabilities may be estimated by P(Ci) = |Ci,D| / |D|, where |Ci,D| is
the number of training tuples of class Ci in D.
For example, the tuple we wish to classify is
X = (age = youth, income = medium, student = yes, credit_rating = fair)
We need to maximize P(X|Ci) P(Ci), for i = 1, 2. The prior probability of each class can be
computed from the training tuples:
P(buys_computer = yes) = 9/14 = 0.643
P(buys_computer = no) = 5/14 = 0.357
To reduce computation in evaluating P(X|Ci), the naïve assumption of class-conditional
independence is made:
P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)
For example, using the above training data we find:
P(X | buys_computer = yes) = P(age = youth | buys_computer = yes)
× P(income = medium | buys_computer = yes)
× P(student = yes | buys_computer = yes)
× P(credit_rating = fair | buys_computer = yes)
= 0.222 × 0.444 × 0.667 × 0.667
= 0.044
Similarly,
P(X | buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400
= 0.019
To find the class Ci that maximizes P(X|Ci) P(Ci), we compute
P(X | buys_computer = yes) P(buys_computer = yes) = 0.044 × 0.643 = 0.028
P(X | buys_computer = no) P(buys_computer = no) = 0.019 × 0.357 = 0.007
Therefore, the naïve Bayesian classifier predicts buys_computer = yes for tuple X.
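The whole worked example can be reproduced in a few lines of Python (a sketch over the same 14 training tuples used above):

# (age, income, student, credit_rating, buys_computer) training tuples.
D = [
    ("youth", "high", "no", "fair", "no"), ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"), ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"), ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"), ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"), ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"), ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"), ("senior", "medium", "no", "excellent", "no"),
]

def naive_bayes_scores(x):
    # Returns P(X|Ci) * P(Ci) for each class Ci, estimated from D.
    scores = {}
    for c in {t[-1] for t in D}:
        d_c = [t for t in D if t[-1] == c]
        score = len(d_c) / len(D)       # prior P(Ci), e.g. 9/14 = 0.643 for "yes"
        for i, value in enumerate(x):
            # Naive assumption: multiply the class-conditional P(xk|Ci).
            score *= sum(1 for t in d_c if t[i] == value) / len(d_c)
        scores[c] = score
    return scores

x = ("youth", "medium", "yes", "fair")
print(naive_bayes_scores(x))   # {'yes': ~0.028, 'no': ~0.007} -> predict "yes"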
Bayesian Networks:-
Bayesian networks specify joint conditional probability distributions. They are also known as
Bayesian belief networks, belief networks, or probabilistic networks.
(Figure: a simple belief network in which LungCancer is influenced by FamilyHistory and Smoker.)
The arcs in the diagram allow representation of causal knowledge. For example, lung cancer is
influenced by a person's family history of lung cancer, as well as whether or not the person is a
smoker. Note that the variable PositiveXRay is independent of whether the patient has a family
history of lung cancer or is a smoker, given that we know the patient has lung cancer.
Linear Classification:-
A large number of algorithms for classification can be phrased in terms of a linear function.
1. Logistic Regression:-
In statistics, logistic regression is a regression model where the dependent variable (DV) is
categorical. A binary dependent variable takes two values, such as pass/fail,
win/lose, alive/dead, or healthy/diseased. Cases with more than two categories are referred to as
multinomial logistic regression.
Logistic regression measures the relationship between the
categorical dependent variable and one or more independent variables by estimating
probabilities using the logistic function, which is the cumulative logistic distribution.
Logistic regression predicts the probability of a particular outcome, for example, the probability of
passing an exam versus hours of study.
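A minimal sketch of the logistic function at work; the intercept and slope below are invented for illustration, not fitted values:

from math import exp

def logistic(z):
    # The cumulative logistic distribution maps any real z into (0, 1).
    return 1 / (1 + exp(-z))

# Hypothetical model: probability of passing an exam given hours of study,
# with invented coefficients alpha (intercept) and beta (slope).
alpha, beta = -4.0, 1.5
for hours in (1, 2, 3, 4):
    print(hours, round(logistic(alpha + beta * hours), 2))
# 1 -> 0.08, 2 -> 0.27, 3 -> 0.62, 4 -> 0.88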
2. Perceptron:-
In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers:
functions that can decide whether an input belongs to one class or another. It is a type of linear
classifier, i.e., a classification algorithm that makes its predictions based on a linear predictor
function combining a set of weights with the feature vector. The algorithm processes the
elements in the training set one at a time.
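A short Python sketch of the perceptron learning rule on a toy, linearly separable dataset (the data and learning rate are illustrative):

def train_perceptron(data, epochs=10, lr=0.1):
    # data: list of (feature_vector, label) pairs with label in {-1, +1}.
    n = len(data[0][0])
    w = [0.0] * n           # weights
    b = 0.0                 # bias
    for _ in range(epochs):
        for x, y in data:   # process one training element at a time
            # Linear predictor: sign(w . x + b)
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:   # misclassified: update weights and bias
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

# Toy data: class +1 when both features are large, -1 otherwise (illustrative).
data = [([2.0, 2.0], 1), ([1.5, 2.5], 1), ([0.0, 0.5], -1), ([0.5, 0.0], -1)]
w, b = train_perceptron(data)
print(w, b)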
3. Support Vector Machines (SVM):-
There are an infinite number of separating lines that could be drawn. We want to find the best one,
that is, one that will have the minimum classification error on previously unseen tuples. How can
we find this best line?
An SVM approaches this problem by searching for the maximum
marginal hyperplane.
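As a brief illustration (assuming the scikit-learn library is available, which these notes do not otherwise use), a linear SVM can be asked to find such a maximum-margin separator:

# Assumes scikit-learn is installed: pip install scikit-learn
from sklearn.svm import SVC

# Toy 2-D points, linearly separable (illustrative data).
X = [[2.0, 2.0], [1.5, 2.5], [0.0, 0.5], [0.5, 0.0]]
y = [1, 1, -1, -1]

clf = SVC(kernel="linear")   # searches for the maximum marginal hyperplane
clf.fit(X, y)
print(clf.coef_, clf.intercept_)   # the separating line w . x + b = 0
print(clf.predict([[2.0, 1.5]]))   # -> [1]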
Prediction:-
Prediction models continuous-valued functions, i.e., it predicts unknown or missing values.
For example, a marketing manager would like to predict how much a given customer will spend
during a sale:
Customer profile -> Prediction -> Rs. 50,000
Regression analysis is used for prediction.
1. Linear Regression:-
Linear regression is a statistical procedure for predicting the value of a dependent variable
from an independent variable when the relationship between the variables can be described with
a linear model.
A linear regression model is typically stated in the form
Y = α + βX + ε
Here α (the intercept) and β (the slope) are the parameters, and the model is a linear
combination of these parameters. The term ε
represents the unpredicted or unexplained variation in the dependent variable; it is
conventionally called the error term.
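A minimal sketch of fitting α and β by ordinary least squares; the (x, y) observations are invented for illustration:

# Invented (x, y) observations for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares estimates: beta = cov(x, y) / var(x), alpha = mean_y - beta * mean_x
beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
       sum((x - mean_x) ** 2 for x in xs)
alpha = mean_y - beta * mean_x

print(alpha, beta)            # approximately Y = 0.15 + 1.95x
# Predict the dependent variable for a new independent value:
print(alpha + beta * 6.0)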
Questions:
1. What is meant by a frequent itemset? Explain association rules.
2. Explain the Apriori algorithm.
3. What is classification? Explain decision tree induction with an example.
4. Explain overfitting and tree pruning methods in detail.
5. Explain the naïve Bayes algorithm in detail.
6. Explain Bayesian belief networks.
7. What is linear classification? Explain the logistic method of classification.
8. How is regression used for classification and prediction?
9. What is the difference between linear and nonlinear regression?