tool. Weka also provides the graphical user interface of the user and provides many facilities [4, 7].
We have explained how to invoke filtering and learning schemes with the Explorer and connect them together with the Knowledge Flow interface. To go further, it is necessary to learn something about how Weka is put together. Detailed, up-to-date information can be found in the online documentation included in the distribution. This is more technical than the descriptions of the learning and filtering schemes given by the more buttons in the Explorer and Knowledge Flow’s object editors. It is generated directly from comments in the source code using Sun’s Javadoc utility. To understand its structure, you need to know how Java programs are organized.
Classes, instances, and packages-
Every Java program is implemented as a class. In object-oriented programming, a class is a collection of variables along with some methods that operate on them. Together, they define the behaviour of an object belonging to the class. An object is simply an instantiation of the class that has values assigned to all the class’s variables. In Java, an object is also called an instance of the class.
The weka.classifiers package-
The classifiers package contains implementations of most of the algorithms for classification and numeric prediction described in this book. The most important class in this package is Classifier, which defines the general structure of any scheme for classification or numeric prediction. Classifier contains three methods, buildClassifier(), classifyInstance(), and distributionForInstance().
EMAIL MINING THROUGH WEKA
Clustering of spam messages means automatic grouping of thematically close spam messages. In case of information streams as E-mails, this problem becomes complicated necessity to carry out this process in real-time mode. There are some complications connected with plurality of a choice of algorithms for clustering of spam messages. Different methodologies use different similarity algorithms for electronic documents in case of a considerable quantity of signs. As soon as classes are defined by clustering method, there is a necessity of their support as spam constantly changes, and spam messages collection replenishes. In considered work, the new algorithm for definition of criterion function of spam messages clustering problem is offered. The clustering problem itself is solved by genetic algorithm . Weka tools are the subjects of many scientific works.
In email mining, Weka is a program that evaluates the data analysis to develop models to support on decision making on business warehouse management. There are many data mining techniques for model developing. Among the most popular ones are Classification, Clustering and Association Rule Discovery which are applied in model developing (Weka Machine Learning Project, 2010; Wass, 2007). For the models supporting decision making in business warehouse management in this essay, Classification is used in analyzing nominal data and prediction analysis for numeric data.
IMPLEMENTATION OF WEKA TOOL
My methodology is very simple. I am taking the past project data from the repositories and apply it on the weka. Partitioning a large set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means algorithm is best suited for implementing this operation because of its efficiency in clustering large data sets.
51http://sites.google.com/site/ijcsis/ ISSN 1947-5500