
EXPERIMENT NO : 07

AIM : Classification and clustering using WEKA

THEORY :

Classification identifies the category, or class label, of a new observation. First, a set of data is
used as training data: the input records and their corresponding class labels are given to the
algorithm. Using this training data set, the algorithm derives a model, called the classifier. The
derived model can be a decision tree, a mathematical formula, or a neural network. When unlabeled
data is given to the model, it should predict the class to which each record belongs. The new data
provided to the model to check its predictions is the test data set.
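
The train-then-test workflow described above can be sketched in a few lines of Python. This is only an illustration, not the procedure WEKA runs internally: the nearest-neighbour rule stands in for whatever classifier is chosen, and the toy data, the function names and the 0.34 test fraction are assumptions made for the example.

```python
import random

def holdout_split(rows, test_fraction=0.34, seed=1):
    """Shuffle the labeled rows and hold out a fraction of them as test data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

def predict_1nn(training, features):
    """Return the class label of the training row closest to `features`."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], features))
    return min(training, key=squared_distance)[1]

def accuracy(training, test):
    """Fraction of test rows whose predicted label matches the true label."""
    correct = sum(predict_1nn(training, x) == label for x, label in test)
    return correct / len(test)

# Hypothetical toy data: (features, class label)
data = [((1.0, 1.1), "A"), ((0.9, 1.0), "A"), ((1.2, 0.8), "A"),
        ((5.0, 5.2), "B"), ((5.1, 4.9), "B"), ((4.8, 5.0), "B")]

training, test = holdout_split(data)
print(len(training), len(test))   # 4 training rows, 2 test rows
print(accuracy(training, test))   # the two groups are well separated
```

The test rows are never seen while the model is built, so the accuracy measured on them estimates how the classifier behaves on new observations.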

Clustering is an unsupervised machine learning technique that organizes data points into groups so
that the objects in the same group are similar to each other. Clustering splits the data into
several subsets, and these subsets are called clusters. The grouping is based on the characteristics
of the objects and their similarities. In data mining, a specific clustering algorithm partitions
the data in the way best suited to the desired analysis. When each object either belongs to a
cluster or does not, the result is called hard partitioning.
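
As an illustration of hard partitioning, the following is a minimal k-means sketch in Python. K-means is used here only as a familiar partitioning method; the source does not state which clusterer was run in WEKA, and the toy points and starting centroids are assumptions made for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: returns (assignments, centroids)."""
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point goes to exactly one cluster (hard partition).
        assignments = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # nothing moved, so the partition is stable
        centroids = new_centroids
    return assignments, centroids

# Hypothetical toy data: two visibly separate groups of 2-D points
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
assignments, centroids = kmeans(points, [points[0], points[3]])
print(assignments)  # one cluster index per point
print(centroids)
```

Every point receives exactly one cluster index, which is what makes the partition hard.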

In soft partitioning, by contrast, each object belongs to each cluster to a certain degree. More
specific divisions are also possible: objects may belong to multiple clusters, each object may be
forced to belong to exactly one cluster, or hierarchical trees can be built from the group
relations. Clusters can be formed in different ways based on various models. Each model has its own
algorithms, which differ in their properties as well as in their results. A good clustering
algorithm is able to identify clusters independent of their shape.
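
The idea that an object can belong to several clusters to different degrees can be sketched with the membership formula used in fuzzy c-means. This is one possible way to compute such degrees, chosen here as an assumption; the fuzziness parameter m = 2 and the example coordinates are hypothetical.

```python
import math

def soft_memberships(point, centroids, m=2.0):
    """Fuzzy c-means style membership degrees of `point` in each cluster (they sum to 1)."""
    distances = [math.dist(point, c) for c in centroids]
    # A point sitting exactly on a centroid belongs fully to that cluster.
    if 0.0 in distances:
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_k / d_j) ** exponent for d_j in distances)
            for d_k in distances]

# Hypothetical example: the point is twice as far from the second centroid
degrees = soft_memberships((2.0, 0.0), [(0.0, 0.0), (6.0, 0.0)])
print(degrees)  # roughly 0.8 for the first cluster and 0.2 for the second

# Taking the cluster with the largest degree turns the soft result into a hard one
hard_choice = max(range(len(degrees)), key=lambda k: degrees[k])
print(hard_choice)
```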

Clustering methods can be classified into the following categories:

• Partitioning Method
• Hierarchical Method
• Density-based Method
• Grid-based Method
• Model-based Method
• Constraint-based Method

OUTPUT :

Finding recurrence and non-recurrence
Missing values
Filling missing value of random forest
Use of holdout method
Identify the predicted class using holdout method
Clustering using WEKA
Classes to cluster evaluation
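
Two of the steps listed above, filling missing values and classes to clusters evaluation, can be sketched from scratch in Python. Both functions are simplified illustrations under stated assumptions: the first fills a numeric column with its mean, and the second labels each cluster with its majority class, which may differ from the mapping WEKA reports. The class labels and values in the example are hypothetical.

```python
from collections import Counter

def fill_missing_with_mean(column):
    """Replace None entries in a numeric column with the mean of the known values."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]

def classes_to_clusters(cluster_ids, class_labels):
    """Label each cluster with its majority class and return (mapping, error rate)."""
    by_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    mapping = {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}
    errors = sum(mapping[c] != label for c, label in zip(cluster_ids, class_labels))
    return mapping, errors / len(class_labels)

# Hypothetical numeric attribute with one missing value
filled = fill_missing_with_mean([1.0, None, 3.0])
print(filled)

# Hypothetical clustering result compared against the true class labels
cluster_ids = [0, 0, 0, 1, 1, 1]
labels = ["recurrence", "recurrence", "no-recurrence",
          "no-recurrence", "no-recurrence", "no-recurrence"]
mapping, error_rate = classes_to_clusters(cluster_ids, labels)
print(mapping)
print(error_rate)  # share of instances whose class differs from their cluster's label
```

The class attribute is ignored while the clusters are formed and is used only afterwards, to check how well the clusters line up with the known classes.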

CONCLUSION : Thus, we have successfully studied and implemented classification and
clustering using WEKA.

LAB OUTCOME : We have achieved LO3 and LO4 successfully.

LO3 : Implement appropriate data mining methods such as classification, clustering, or association
mining on large datasets using an open-source tool such as WEKA.

LO4 : Implement various data mining algorithms from scratch using a language such as Python or Java.
