
# Experiment No. 08
PART B
(PART B: TO BE COMPLETED BY STUDENTS)
(Students must submit the soft copy as per the following segments within two hours of the
practical. The soft copy must be uploaded on Blackboard or emailed to the concerned
lab in-charge faculty at the end of the practical in case there is no Blackboard access
available.)
Roll No. E059
Program : BTech Computer
Batch: E3
Date of Submission:

Name: Shubham Gupta

Division: E
Date of Experiment:

[Screenshots: Classification, Clustering, Association]

## B.2 Observations and learning:

We loaded a data set into Excel and used the SQL Server Data Mining Add-in to carry out
classification, clustering and association on it.

## B.3 Conclusion:

Hence we have studied and implemented association, clustering and
classification using the MS SQL Server Data Mining Add-in for Excel.

## B.4 Questions of Curiosity

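Q.1) Draw your own observations on the rules generated by the
SQL Data Mining Add-in for the Apriori algorithm, based on the
data set used and the support and confidence used.
The rules generated by the Apriori algorithm were observed to be
correct for the support and confidence thresholds used. The minimum support
used for this data set is 3; a whole-number value like this is read by the
add-in as an absolute count, so an itemset must appear in at least 3 cases
before a rule is generated from it.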
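To make the support and confidence thresholds concrete, here is a minimal Apriori sketch in plain Python. It is an illustration, not the add-in's internal implementation: the function names `apriori` and `association_rules`, the default confidence of 0.6 and the basket data are all hypothetical. Support is treated as an absolute transaction count (3, as above), and confidence is support(X ∪ Y) / support(X).

```python
from itertools import combinations


def apriori(transactions, min_support=3):
    """Return {frequent itemset: support count}.

    min_support is an absolute number of transactions (assumed to match the
    "support = 3" setting described above).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct single item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while current:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: drop candidates that have an infrequent (size - 1)-subset.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


def association_rules(frequent, min_confidence=0.6):
    """Build rules X -> Y with confidence = support(X ∪ Y) / support(X)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent
                # (the Apriori property), so frequent[lhs] always exists.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), support, confidence))
    return rules


# Hypothetical baskets, not the experiment's data set.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk", "eggs"},
]
frequent = apriori(baskets, min_support=3)
for lhs, rhs, support, confidence in association_rules(frequent):
    print(sorted(lhs), "->", sorted(rhs), f"support={support} confidence={confidence:.2f}")
```

With these baskets, {milk, butter} occurs only twice and is discarded, so no rule links milk and butter; {butter} -> {bread} reaches confidence 1.0 because every basket containing butter also contains bread.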

Q.2) Draw your conclusion on the confusion matrix given by the
classification algorithm in the SQL Data Mining Add-in.

In machine learning, a confusion matrix (also known as a contingency table or an
error matrix) is a table layout that visualizes the performance of an algorithm,
typically a supervised learning one; in unsupervised learning it is usually called
a matching matrix. Each column of the matrix represents the instances in a predicted
class, while each row represents the instances in an actual class. The name stems
from the fact that it makes it easy to see whether the system is confusing two
classes (i.e. commonly mislabeling one as another). Correct predictions lie on the
main diagonal, so the diagonal total divided by the total number of cases gives the
accuracy of the classifier.
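A small sketch of how such a matrix is built and read, following the row = actual, column = predicted convention described above. The helper names `confusion_matrix` and `accuracy`, and the six labelled cases, are hypothetical and only illustrate the idea; they are not the add-in's output.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of cases on the main diagonal, i.e. predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels for six test cases, not the experiment's results.
labels = ["Yes", "No"]
actual = ["Yes", "Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No", "No", "Yes", "Yes", "No"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"actual {label:>3}: {row}")
print(f"accuracy = {accuracy(matrix):.2f}")
```

Here the matrix is [[2, 1], [1, 2]]: the off-diagonal 1s are the "confusions" (one Yes predicted as No and one No predicted as Yes), and the accuracy is 4/6, about 0.67.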
Q.3) Draw your analysis for the clusters generated in the SQL Data
Mining Add-in for a suitable dataset.
A cluster is a group of similar objects that are treated as belonging to the same
class. In other words, similar objects are grouped into one cluster and dissimilar
objects are placed in other clusters. Clustering is the process of grouping a set of
abstract objects into classes of similar objects. For the data set used in the
experiment, only one cluster was generated.
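The add-in's Microsoft Clustering algorithm offers both EM and K-means methods (chosen through its CLUSTERING_METHOD parameter, as we understand the documentation). The sketch below uses plain k-means as a simpler stand-in to show the "group similar objects" idea; the `kmeans` function and the 2-D points are hypothetical and are not taken from the experiment.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means: returns (centroids, clusters) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct starting points
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids stopped moving: converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D records forming two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

With k=2 the three points near (1, 1) and the three near (8, 8) end up in separate clusters. Calling `kmeans(points, k=1)` instead places all six points in a single cluster whose centroid is the overall mean, which is what a one-cluster result looks like.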
Q.4) Compare Weka and SQL Data Mining Add-in.
1. Power and flexibility: Weka's Experimenter is easy to use, but it is not
flexible enough to meet real-world process requirements. The SQL Data Mining
Add-in provides tools for association, clustering and classification, and the
algorithms used are quite efficient.
2. Scalability: The add-in's algorithms are also optimized for speed. Our database
contains 1.6 billion transactions, and our data mining processes work quite
well on that amount of data. With Weka we always had to use rather small
samples and were never able to work directly on the database. Since sample
data is not available in the SQL Server Add-in, we have to provide either a
database or an Excel sheet containing the records. The SQL Add-in scales well
to large data if the models are trained properly.
3. Visualization: Things look much better in Weka. In the SQL Server Add-in,
various graphs are provided to the user after the
classification/clustering/association algorithms have been executed. This
gives the user a better feel for the outputs obtained and the clusters that are
formed.
4. Preprocessing: It is really amazing how many methods for preprocessing
and data extraction/transformation are available directly within the SQL Add-in.
There are many more methods for these important aspects of data
analysis than in Weka. This integrates all phases of analysis into one
process/tool, and my work became really smooth.