
Lokmanya Tilak Jankalyan Shikshan Sanstha’s

PRIYADARSHINI J. L. COLLEGE OF ENGINEERING, NAGPUR
An Autonomous Institution Affiliated to R.T.M. Nagpur University
Accredited with Grade “A” by NAAC

SESSION 2023-24 SEMESTER-V


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

ACTIVITY REPORT
SUBJECT:- DATA WAREHOUSING AND MINING

ACTIVITY 1
“DATA MINING TOOL :- WEKA”

SUBMITTED BY :- RIYA PRAKASH PATIL


ROLL NO:- 15

Under The Guidance of


Prof. Manisha Vaidya

Sign of Subject Teacher Sign of HOD


Information

Weka contains a collection of visualization tools and algorithms for data analysis and
predictive modelling, together with graphical user interfaces for easy access to these
functions. The original non-Java version of Weka was a Tcl/Tk front-end to (mostly
third-party) modelling algorithms implemented in other programming languages, plus
data preprocessing utilities in C and a Makefile-based system for running machine
learning experiments.

This original version was primarily designed as a tool for analyzing data from
agricultural domains. However, the more recent fully Java-based version (Weka 3), whose
development began in 1997, is now used in many different application areas, particularly
for education and research.

The foundation of any machine learning application is data: not just a little data, but a
huge amount of it, termed Big Data in current terminology.
To train a machine to analyze big data, you need to keep several considerations about
the data in mind:
• The data must be clean.
• It should not contain null values.
Besides, not all the columns in the data table will be useful for the type of analytics
you are trying to achieve. Irrelevant data columns, or ‘features’ as they are termed in
machine learning, must be removed before the data is fed into a machine learning
algorithm.
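As a rough illustration of these two rules, the plain-Python sketch below drops every row that contains a null value and then removes columns listed as irrelevant. It does not use Weka; the function name `clean`, the column names, and the values are hypothetical.

```python
# Minimal sketch (not Weka code): rows are dicts, None stands for a null value.
# Column names and values are hypothetical examples.

def clean(rows, irrelevant):
    """Drop rows containing a null (None), then drop the irrelevant columns."""
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # skip rows with missing values
        cleaned.append({k: v for k, v in row.items() if k not in irrelevant})
    return cleaned

rows = [
    {"patient_id": 1, "glucose": 148, "age": 50, "class": "tested_positive"},
    {"patient_id": 2, "glucose": None, "age": 31, "class": "tested_negative"},
    {"patient_id": 3, "glucose": 85, "age": 31, "class": "tested_negative"},
]
print(clean(rows, irrelevant={"patient_id"}))
```

Dropping incomplete rows is the simplest option; when many rows have gaps, replacing missing values is usually preferred over discarding data.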

Features of Weka
1. Preprocess

The preprocessing of data is a crucial task in data mining. Because most data is raw, it
may contain empty or duplicate values, garbage values, outliers, or extra columns, or it
may follow a different naming convention. All of these degrade the results.
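Two of the preprocessing steps mentioned above, removing duplicates and filling in empty values, can be sketched in a few lines of plain Python. Replacing a missing numeric value with the column mean is similar in spirit to Weka's ReplaceMissingValues filter, but this is not Weka's code; the function names and data are hypothetical.

```python
# Sketch: remove exact duplicate rows, then replace missing numeric values
# (None) with the mean of the known values in the same column.
from statistics import mean

def dedupe(rows):
    """Keep the first occurrence of each identical row (rows are tuples)."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def impute_mean(rows):
    """Replace None in each column with that column's mean of known values."""
    columns = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [tuple(m if v is None else v for v, m in zip(row, means)) for row in rows]

data = [(1.0, 2.0), (1.0, 2.0), (3.0, None), (5.0, 6.0)]
print(impute_mean(dedupe(data)))  # [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
```

Deduplicating before imputing matters here: otherwise the duplicated row would pull the column mean towards its own value.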

2. Classify

Classification is one of the essential functions in machine learning, where we assign
classes or categories to items. Classic examples of classification are declaring a brain
tumour "malignant" or "benign", or assigning an email to a "spam" or "not_spam" class.
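To make the idea concrete, the sketch below assigns a class with a 1-nearest-neighbour rule: a new item receives the label of the closest labelled example. Weka's own nearest-neighbour classifier is called IBk, but this is plain Python, not Weka's API; the two numeric features and the labels are hypothetical toy values, not real medical data.

```python
# Sketch of classification with a 1-nearest-neighbour rule.
import math

def nearest_neighbour(train, query):
    """train: list of (feature_tuple, label); return the label of the closest example."""
    _, label = min(train, key=lambda example: math.dist(example[0], query))
    return label

train = [((1.0, 1.0), "benign"), ((1.2, 0.8), "benign"),
         ((4.0, 4.5), "malignant"), ((4.2, 3.9), "malignant")]
print(nearest_neighbour(train, (3.8, 4.1)))  # malignant
```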

3. Cluster

In clustering, a dataset is arranged into different groups (clusters) based on similarity.
Items within the same cluster are similar to one another but different from items in
other clusters. Examples of clustering include identifying customers with similar
behaviours and organizing regions according to homogeneous land use.

4. Associate

Association rules highlight associations and correlations between items of a dataset. In
short, an association rule is an if-then statement that expresses how likely data items
are to occur together. A classic example of association is the connection between the
sale of milk and the sale of bread.
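The strength of a rule such as "if milk, then bread" is usually measured by its support (how often the items appear together) and its confidence (how often bread appears in baskets that contain milk). The plain-Python sketch below computes both; the baskets are hypothetical and the function names are illustrative, not Weka's.

```python
# Sketch: support and confidence of the rule {milk} => {bread}.

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = support(transactions, antecedent | consequent)
    return both / support(transactions, antecedent)

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk"}, {"bread", "butter"}]
print(support(baskets, {"milk", "bread"}))       # 0.5
print(confidence(baskets, {"milk"}, {"bread"}))  # about 0.667
```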

5. Select Attributes

Every dataset contains many attributes, but several of them may not be significantly
valuable. Therefore, removing the unnecessary attributes and keeping the relevant ones is
very important for building a good model.
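One very simple way to remove an unhelpful attribute is to drop any column whose values never vary, since a constant attribute cannot help distinguish items. The sketch below is a plain-Python variance filter, not one of Weka's attribute evaluators; the threshold, function name, and attribute names are hypothetical.

```python
# Sketch: keep only attributes whose population variance exceeds a threshold.
from statistics import pvariance

def drop_low_variance(columns, threshold=1e-9):
    """columns: dict mapping attribute name to a list of numbers."""
    return {name: values for name, values in columns.items()
            if pvariance(values) > threshold}

columns = {
    "glucose": [148, 85, 183, 89],
    "constant_flag": [1, 1, 1, 1],
    "age": [50, 31, 32, 21],
}
print(list(drop_low_variance(columns)))  # ['glucose', 'age']
```

Variance alone ignores the class, so it only catches the most obviously useless attributes; evaluators that look at the class, such as the subset evaluator used later in this report, go further.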

6. Visualize

In the Visualize tab, different plot matrices and graphs are available to show trends in
the data and errors identified by the model.

How to Install the Weka Tool and Load a Dataset

1) Download the software from https://sourceforge.net/projects/weka/

2) After the download completes, open the file location and double-click the downloaded
file. The Setup wizard will appear. Click Next.
3) The License Agreement terms will open. Read them thoroughly and click “I Agree”.

4) Select the components to install according to your requirements. A full installation
is recommended. Click Next.

5) Select the destination folder and click Next.

6) Choose the Start Menu folder and click Install.

7) The installation will start.

8) After the installation is complete, a completion window will appear. Click Next.
9) Click Finish.

10) Open the WEKA tool and click Explorer.

11) The Explorer window opens.


The WEKA Explorer window shows several tabs, starting with Preprocess. Initially, the
Preprocess tab is active, because the dataset is preprocessed first, before algorithms are
applied to it and the dataset is explored.

16) Click “Open file…”, choose Local Disk (C:), go to Program Files, open the
Weka-3-8-6 folder, and then open the data folder. Several default datasets are available
there. Click on “diabetes”.
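The bundled datasets in that folder are stored in Weka's ARFF text format: `@relation` and `@attribute` header lines, then comma-separated rows after `@data`, with `?` marking a missing value and `%` starting a comment. A minimal plain-Python reader for simple ARFF files is sketched below. It is an assumption-laden simplification (no quoted names containing spaces, no sparse rows, no string or date attributes; all values are returned as strings), and the sample text is a hypothetical excerpt, not the real diabetes file.

```python
# Sketch of a minimal ARFF reader for simple, dense files.

def read_arff(text):
    """Return (attribute_names, rows); values stay strings, '?' becomes None."""
    attributes, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()  # ARFF keywords are case-insensitive
        if lower.startswith("@attribute"):
            _, name, _type = line.split(None, 2)
            attributes.append(name.strip("'\""))
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            values = [v.strip() for v in line.split(",")]
            rows.append([None if v == "?" else v for v in values])
    return attributes, rows

sample = """% hypothetical toy excerpt
@relation toy
@attribute plas numeric
@attribute class {tested_negative,tested_positive}
@data
148,tested_positive
?,tested_negative
"""
print(read_arff(sample))
```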

19) Weka now shows a graph of the values of the selected attribute.

20) We can see that this dataset has two distinct classes.

21) After this preprocessing, click the “Classify” tab. This is the area for running
algorithms against a loaded dataset in Weka. Note that the “ZeroR” algorithm is
selected by default. Click the “Start” button to run this algorithm.
ZeroR is the simplest classification method: it relies only on the target (class) and
ignores all predictors.
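ZeroR's behaviour is small enough to sketch directly. The plain-Python version below (not Weka's code) predicts the majority class for a nominal target and reports the accuracy of that constant prediction, which is why ZeroR is commonly used as a baseline that other classifiers should beat. For a numeric target, ZeroR predicts the mean instead. The labels are hypothetical.

```python
# Sketch of the ZeroR idea: ignore every predictor, always predict the majority class.
from collections import Counter

def zero_r(train_labels):
    """Return the most frequent class; this is ZeroR's only 'model'."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(predicted, actual):
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

labels = ["tested_negative"] * 6 + ["tested_positive"] * 4
majority = zero_r(labels)
print(majority, accuracy([majority] * len(labels), labels))  # tested_negative 0.6
```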
25) Next, choose the “Logistic” classifier and click Start.
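To show what a logistic model computes, here is a minimal plain-Python sketch of binary logistic regression trained by batch gradient descent. It is only an illustration under stated assumptions: Weka's Logistic builds a multinomial logistic regression model with a ridge estimator and its own optimiser, whereas the learning rate, epoch count, function names, and the single standardized toy feature below are all hypothetical.

```python
# Sketch: binary logistic regression fitted with plain batch gradient descent.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """xs: list of feature lists, ys: 0/1 labels. Returns (weights, bias)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log-loss is (predicted probability - label) * input.
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability of class 1 for one example."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical standardized feature: larger values tend to be class 1.
xs = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print(predict_proba(w, b, [-2.5]), predict_proba(w, b, [2.5]))
```

The model outputs a probability rather than a hard label, which is what makes threshold curves possible.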
27) Next, we can visualize the threshold curve for the “tested_negative” class.
29) On this curve we can see how recall and precision change as the threshold varies.
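A threshold curve is built by sweeping a decision threshold over the predicted probability of one class. The sketch below computes precision and recall at a single threshold, and the area under the ROC curve with the rank-based (Mann-Whitney) formula. It is a plain-Python illustration rather than Weka's implementation; the scores, labels, and function names are hypothetical.

```python
# Sketch: precision/recall at one threshold and ROC area for one class of interest.

def precision_recall(scores, labels, threshold):
    """labels: 1 for the class of interest, 0 otherwise."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_area(scores, labels):
    """Probability a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(precision_recall(scores, labels, 0.5))  # both about 0.667
print(roc_area(scores, labels))               # about 0.889
```

Raising the threshold generally trades recall for precision; the curve shows that trade-off at every threshold at once.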
30) Next, choose the “J48” classifier, click Start, and then use “Visualize tree” to view
the decision tree it builds.
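J48 is Weka's implementation of the C4.5 decision-tree algorithm, which chooses splits using information gain normalised into a gain ratio. The sketch below computes entropy, information gain, and gain ratio for one nominal attribute. It covers only this split criterion (no numeric thresholds, pruning, or tree building), and the attribute and class values are hypothetical.

```python
# Sketch of the C4.5-style split criterion for a single nominal attribute.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_and_ratio(values, labels):
    """Information gain of splitting labels by values, and the gain ratio."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the split itself
    return gain, (gain / split_info if split_info else 0.0)

glucose_level = ["high", "high", "low", "low", "high", "low"]
diagnosis = ["pos", "pos", "neg", "neg", "neg", "neg"]
print(gain_and_ratio(glucose_level, diagnosis))  # gain and ratio both about 0.459
```

Dividing by the split information penalises attributes with many distinct values, which plain information gain tends to favour.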

31) Next, go to the “Cluster” tab, choose “SimpleKMeans”, and click Start.
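SimpleKMeans is based on the k-means procedure: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The plain-Python sketch below follows that loop. Using the first k points as initial centroids is an assumption for simplicity (Weka has its own initialisation options), and the points are hypothetical.

```python
# Sketch of k-means clustering (not Weka's SimpleKMeans code).
import math

def k_means(points, k, iterations=100):
    """Return (centroids, assignment) after repeated assign/update steps."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assignment

points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.0, 0.5), (9.0, 8.0)]
centroids, labels = k_means(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```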
35) Next, go to the “Associate” tab, choose “FilteredAssociator”, and click Start.
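FilteredAssociator runs an association-rule learner on data that has first been passed through a filter; a common choice of underlying learner is Apriori. The plain-Python sketch below shows the core of the Apriori idea for finding frequent itemsets: grow candidate itemsets one item at a time, prune any candidate with an infrequent subset, and keep those meeting a minimum support. It is not Weka's implementation, and the baskets and `min_support` value are hypothetical.

```python
# Sketch of Apriori frequent-itemset mining.
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items if support(s) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk", "eggs"}, {"bread", "butter"}]
result = apriori(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Rules such as "if milk, then bread" are then generated from these frequent itemsets and filtered by confidence.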

38) Go to the “Select attributes” tab, choose “CfsSubsetEval” as the attribute evaluator
and “BestFirst” as the search method, and then click Start.
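CfsSubsetEval scores a whole subset of attributes, favouring attributes that correlate strongly with the class but weakly with each other. A common statement of the CFS merit for a subset of k attributes is k times the mean attribute-class correlation, divided by the square root of k + k(k-1) times the mean attribute-attribute correlation. The sketch below computes that merit using absolute Pearson correlation (an assumption; Weka's evaluator uses its own correlation measure) and uses a simple greedy forward search as a stand-in for BestFirst, which can additionally backtrack. The attribute names and values are hypothetical.

```python
# Sketch of CFS-style subset scoring with a greedy forward search.
import math

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

def cfs_merit(subset, features, target):
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = (sum(abs(pearson(features[f], features[g])) for f, g in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(features, target):
    """Add the attribute that most improves the merit until nothing improves it."""
    selected, best = [], 0.0
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            return selected
        merit, f = max((cfs_merit(selected + [f], features, target), f) for f in candidates)
        if merit <= best:
            return selected  # no candidate improves the merit
        selected.append(f)
        best = merit

target = [0, 0, 0, 1, 1, 1]
features = {"a": [1, 2, 3, 7, 8, 9], "b": [1, 2, 6, 4, 8, 9], "c": [5, 1, 4, 2, 6, 3]}
print(greedy_cfs(features, target))  # ['a']
```

Here "b" is fairly predictive on its own but largely redundant with "a", so adding it lowers the merit and the search stops after one attribute.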
40) We can also use the “Visualize” tab to view the plot matrix and inspect each
individual plot.

Conclusion
In conclusion, we have learned about the Weka tool, an open-source software suite
designed for data mining and machine learning. We discovered how to install Weka and
explored its various components. WEKA is a powerful tool for developing machine
learning models: it provides implementations of many of the most widely used ML
algorithms, and it also lets you preprocess your data before these algorithms are applied
to it. We learned how datasets are used in Weka for preprocessing, classification,
clustering, and more, and we learned about the algorithms that Weka uses for these tasks.
