
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/280612446

Using WEKA in your java code (Clustering)

Research · August 2015


DOI: 10.13140/RG.2.1.4865.5207


1 author:

Oussama Ahmia
Université Bretagne Sud

All content following this page was uploaded by Oussama Ahmia on 03 August 2015.



Using WEKA in your java code (Clustering) Oussama Ahmia
Email: ahmia@labged.net

BUILDING A CLUSTERER
BATCH:
A clusterer is built in much the same way as a classifier, but the
method buildClassifier(Instances) is replaced by buildClusterer(Instances).

The following code snippet shows how to build a SimpleKMeans clusterer (how to generate the
option string using the GUI is explained at the end of this document):

import weka.clusterers.SimpleKMeans;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// The DataSource class is not limited to ARFF files (3.5.5 and newer).
// It can also read CSV files and other formats (basically all file formats
// that Weka can import via its converters using the GUI).
// The statements below belong inside a method that declares "throws Exception".

DataSource source = new DataSource("/some/where/data.csv");

// Load the data as an "Instances" object
Instances data = source.getDataSet();

// The option string is generated using the WEKA GUI
String options = "-init 0 -max-candidates 100 -periodic-pruning 10000 "
        + "-min-density 2.0 -t1 -1.25 -t2 -1.0 -N 10 "
        + "-A \"weka.core.EuclideanDistance -R first-last\" "
        + "-I 500 -num-slots 1 -S 10";

SimpleKMeans kmean = new SimpleKMeans();
kmean.setOptions(weka.core.Utils.splitOptions(options));
kmean.buildClusterer(data);
System.out.println(kmean.toString());

// Get the 6th instance (line); indices start at 0
Instance test = data.get(5);
// Assign the instance to a cluster and print the cluster index
System.out.println(kmean.clusterInstance(test));

The method clusterInstance(Instance) returns the index of the cluster to which the given instance is assigned.
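
For a k-means model, the returned index identifies the centroid closest to the instance. The sketch below
shows that idea in plain Java. It is not Weka's implementation: the class name NearestCentroid, the centroid
values and the use of a plain (unnormalized) squared Euclidean distance are assumptions made for illustration,
whereas weka.core.EuclideanDistance normalizes attribute values by default.

```java
final class NearestCentroid {

    /** Returns the index of the centroid closest to the point (squared Euclidean distance). */
    static int nearest(double[][] centroids, double[] point) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int a = 0; a < point.length; a++) {
                double diff = centroids[c][a] - point[a];
                dist += diff * diff;
            }
            // Keep the smallest distance seen so far
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical centroids of a 3-cluster model with 2 numeric attributes
        double[][] centroids = { {0.0, 0.0}, {5.0, 5.0}, {10.0, 0.0} };
        double[] point = {4.0, 6.0};
        System.out.println(nearest(centroids, point)); // prints 1
    }
}
```

The index is only a label: it says which cluster the instance belongs to, not how the clusters are ordered.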

INCREMENTAL:
Clusterers implementing the weka.clusterers.UpdateableClusterer interface can be trained
incrementally (available since version 3.5.4). This conserves memory, since the data does not have to be
loaded into memory all at once, which is useful when the data set is too large to fit in memory.

NB: not all clusterers are incremental.



An incremental clusterer is trained by following these steps:

1. Call buildClusterer(Instances) with the structure of the dataset (it may or may not contain any
actual data rows; only the structure is important).
2. Subsequently call the updateClusterer(Instance) method to feed the clusterer
new weka.core.Instance objects, one by one.
3. Call updateFinished() after all Instance objects have been processed, for the clusterer to
perform additional computations.

Here is an example that trains weka.clusterers.Cobweb incrementally with data read from an ARFF file:

import java.io.File;
import weka.clusterers.Cobweb;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader;

// Load only the structure (header) of the training set from the ARFF file
ArffLoader loader = new ArffLoader();
loader.setFile(new File("/some/where/data.arff"));
Instances structure = loader.getStructure();

String options = "-A 1.0 -C 0.0028 -S 42";

// Create the Cobweb clusterer and initialize it with the structure
Cobweb cw = new Cobweb();
cw.setOptions(weka.core.Utils.splitOptions(options));
cw.buildClusterer(structure);

// Feed the instances one by one (we suppose the ARFF file contains data)
Instance current;
while ((current = loader.getNextInstance(structure)) != null) {
    cw.updateClusterer(current);
}
cw.updateFinished();

CLUSTERER OPTIONS:
The easiest way (in my personal opinion) to set clusterer options (the method also works for classifiers) is to
follow these steps:

1. Configure the clusterer/classifier using the WEKA GUI.

2. Copy the option string by right-clicking the clusterer/classifier field and selecting “Copy
configuration to clipboard”.

3. Delete the class name at the beginning of the option string obtained in step 2 (for example
“weka.clusterers.SimpleKMeans”), and put a backslash (\) before every double quote ( " ) that the option
string contains, so that it is a valid Java string literal.
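
Step 3 can also be done in code. The following is a minimal sketch in plain Java, with no Weka classes. The
class OptionStrings and its two methods are hypothetical helpers written for this illustration, and the sample
configuration is a shortened version of the SimpleKMeans string used earlier. It assumes the class name is the
first whitespace-separated token of the copied configuration.

```java
final class OptionStrings {

    /** Drops the leading class name (the first whitespace-separated token) of a copied configuration. */
    static String stripClassName(String config) {
        String[] parts = config.trim().split("\\s+", 2);
        // A single token means the configuration holds a class name and no options
        return parts.length < 2 ? "" : parts[1];
    }

    /** Renders the options as a Java string literal, escaping backslashes and double quotes. */
    static String toJavaLiteral(String options) {
        String escaped = options.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // Text as it would be pasted from the clipboard (assumed sample)
        String copied = "weka.clusterers.SimpleKMeans -N 10 "
                + "-A \"weka.core.EuclideanDistance -R first-last\" -I 500 -S 10";
        String options = stripClassName(copied);
        System.out.println(options);                // options without the class name
        System.out.println(toJavaLiteral(options)); // same options, ready to paste into Java source
    }
}
```

Escaping is only needed when the options are pasted into Java source code. A string read at run time, for
example from a file, can be passed to weka.core.Utils.splitOptions as it is once the class name is removed.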
