Using WEKA in Your Java Code (Clustering) : August 2015

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/280612446
Using WEKA in your java code (Clustering)
Research · August 2015

DOI: 10.13140/RG.2.1.4865.5207
Citations: 0, Reads: 7,758
1 author:
Oussama Ahmia
Université Bretagne Sud
10 PUBLICATIONS 11 CITATIONS
All content following this page was uploaded by Oussama Ahmia on 03 August 2015.

Using WEKA in your java code (Clustering)
Oussama Ahmia
Email: ahmia@labged.net
BUILDING A CLUSTERER
BATCH:
A clusterer is built in much the same way as a classifier, but the
method buildClassifier(Instances) is replaced by buildClusterer(Instances).
The following code snippet shows how to build a SimpleKMeans clusterer (I will explain how to generate
the option string using the GUI at the end of this document):
import weka.clusterers.SimpleKMeans;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class BatchClustering {
    public static void main(String[] args) throws Exception {
        // The DataSource class is not limited to ARFF files (Weka 3.5.5 and newer).
        // It can also read CSV files and other formats (basically all file formats
        // that Weka can import via its converters in the GUI).
        DataSource source = new DataSource("/some/where/data.csv");
        // Convert the data to an Instances object
        Instances data = source.getDataSet();

        // The option string is generated using the WEKA GUI
        String options = "-init 0 -max-candidates 100 -periodic-pruning 10000 "
                + "-min-density 2.0 -t1 -1.25 -t2 -1.0 -N 10 "
                + "-A \"weka.core.EuclideanDistance -R first-last\" "
                + "-I 500 -num-slots 1 -S 10";
        SimpleKMeans kmean = new SimpleKMeans();
        kmean.setOptions(Utils.splitOptions(options));
        kmean.buildClusterer(data);
        System.out.println(kmean.toString());

        // Get the 6th instance (row); indices start at 0
        Instance test = data.get(5);
        // Assign the instance to a cluster (prints the cluster index)
        System.out.println(kmean.clusterInstance(test));
    }
}
The method clusterInstance(test) assigns an instance to a cluster and returns the index of that cluster.
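For SimpleKMeans with Euclidean distance, clusterInstance essentially returns the index of the closest cluster centroid. The plain-Java sketch below illustrates only that idea; the class name NearestCentroid is hypothetical, and it is not Weka's implementation, which by default also normalizes numeric attributes and handles nominal attributes and missing values.

```java
/** Hypothetical helper (not part of Weka): nearest-centroid assignment. */
final class NearestCentroid {
    private NearestCentroid() {
    }

    /**
     * Returns the index of the centroid closest to {@code point} using squared
     * Euclidean distance (same argmin as plain Euclidean, without the sqrt).
     * Ties go to the lower index; returns -1 if there are no centroids.
     */
    static int assign(double[][] centroids, double[] point) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = centroids[c][j] - point[j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = { {0.0, 0.0}, {5.0, 5.0}, {10.0, 0.0} };
        System.out.println(assign(centroids, new double[] {4.0, 4.5})); // prints 1
        System.out.println(assign(centroids, new double[] {9.0, 1.0})); // prints 2
    }
}
```

Squared distances are compared instead of true distances because the square root does not change which centroid is closest.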
INCREMENTAL:
Clusterers implementing the weka.clusterers.UpdateableClusterer interface can be trained
incrementally (available since version 3.5.4). This conserves memory, since the data doesn't have to be
loaded into memory all at once, which is useful when the dataset is too large to fit in memory.
NB: not all clusterers are incremental.

Training an incremental clusterer follows these steps:
1. Call buildClusterer(Instances) with the structure of the dataset (it need not contain any
actual data rows; only the structure matters).
2. Then call the updateClusterer(Instance) method repeatedly to feed the clusterer
new weka.core.Instance objects, one by one.
3. Call updateFinished() after all Instance objects have been processed, so that the clusterer can
perform any remaining computations.
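To make the three steps concrete without depending on Weka, here is a plain-Java sketch of a toy incremental clusterer that follows the same protocol. It uses sequential (MacQueen-style) k-means, not Cobweb, and the names SequentialKMeans, build, update and updateFinished are hypothetical stand-ins that only mirror the shape of Weka's buildClusterer/updateClusterer/updateFinished. Only the k centroids and their counts are kept between updates; in the usage example a StringReader stands in for a reader over a large file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;

/** Hypothetical toy incremental clusterer (sequential k-means); not part of Weka. */
final class SequentialKMeans {
    private final int k;
    private double[][] centroids;
    private int[] counts;
    private int seen;

    SequentialKMeans(int k) {
        this.k = k;
    }

    /** Step 1: initialise from the structure only (here, the number of attributes). */
    void build(int numAttributes) {
        centroids = new double[k][numAttributes];
        counts = new int[k];
        seen = 0;
    }

    /** Step 2: absorb one instance; the first k instances seed the k centroids. */
    void update(double[] x) {
        int c = seen < k ? seen : nearest(x);
        counts[c]++;
        // Running mean: move the centroid 1/n of the way towards x
        for (int j = 0; j < x.length; j++) {
            centroids[c][j] += (x[j] - centroids[c][j]) / counts[c];
        }
        seen++;
    }

    /** Step 3: this toy model has no pending work; a real clusterer may finalise state here. */
    void updateFinished() {
    }

    /** Index of the closest seeded centroid by squared Euclidean distance. */
    int nearest(double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < Math.min(seen, k); c++) {
            double d = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = centroids[c][j] - x[j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    double[][] centroids() {
        return centroids;
    }

    public static void main(String[] args) throws IOException {
        String csv = "0,0\n10,10\n1,1\n9,9\n";
        SequentialKMeans km = new SequentialKMeans(2);
        km.build(2);
        try (BufferedReader in = new BufferedReader(new StringReader(csv))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split(",");
                double[] x = new double[parts.length];
                for (int j = 0; j < parts.length; j++) {
                    x[j] = Double.parseDouble(parts[j]);
                }
                km.update(x); // each parsed row is discarded after the update
            }
        }
        km.updateFinished();
        for (double[] centroid : km.centroids()) {
            System.out.println(Arrays.toString(centroid));
        }
        // prints [0.5, 0.5] then [9.5, 9.5]
    }
}
```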
Here is an example that trains weka.clusterers.Cobweb incrementally:
import java.io.File;
import weka.clusterers.Cobweb;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ArffLoader;

public class IncrementalClustering {
    public static void main(String[] args) throws Exception {
        // Load only the structure (header) of the training set from the ARFF file
        ArffLoader loader = new ArffLoader();
        loader.setFile(new File("/some/where/data.arff"));
        Instances structure = loader.getStructure();

        String options = "-A 1.0 -C 0.0028 -S 42";
        // Create the Cobweb clusterer
        Cobweb cw = new Cobweb();
        cw.setOptions(Utils.splitOptions(options));
        // Step 1: initialise the clusterer with the structure only
        cw.buildClusterer(structure);

        // Step 2: feed the instances one by one;
        // getNextInstance returns null at the end of the file
        Instance current;
        while ((current = loader.getNextInstance(structure)) != null) {
            cw.updateClusterer(current);
        }
        // Step 3: let the clusterer perform its remaining computations
        cw.updateFinished();
    }
}
CLUSTERER OPTIONS:
In my opinion, the easiest way to set clusterer options (the method also works for classifiers) is to
follow these steps:
1. Configure the clusterer/classifier in the WEKA GUI.
2. Copy the option string by right-clicking the clusterer/classifier field and selecting "Copy
configuration to clipboard".
3. Delete the class name at the beginning of the option string obtained in step 2 (for example
"weka.clusterers.SimpleKMeans"). If the option string contains double quotes ("), escape each one with a backslash (\") when pasting it into a Java string literal.
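Step 3 can also be done programmatically. The sketch below is a hypothetical helper named OptionLiteral (not part of Weka): it drops the leading class name from a configuration copied from the GUI and escapes backslashes and double quotes so the result can be pasted into Java source as a string literal. It assumes the class name is the first space-separated token, as in the "weka.clusterers.SimpleKMeans" example.

```java
/** Hypothetical helper (not part of Weka): turns a copied GUI configuration into a Java string literal. */
final class OptionLiteral {
    private OptionLiteral() {
    }

    /** Removes the leading class name, e.g. "weka.clusterers.SimpleKMeans". */
    static String stripClassName(String config) {
        String trimmed = config.trim();
        int space = trimmed.indexOf(' ');
        // A configuration without options consists of the class name only
        return space < 0 ? "" : trimmed.substring(space + 1).trim();
    }

    /** Escapes backslashes first, then double quotes, and wraps the result in quotes. */
    static String toJavaLiteral(String options) {
        String escaped = options.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        String copied = "weka.clusterers.SimpleKMeans -N 10 "
                + "-A \"weka.core.EuclideanDistance -R first-last\" -S 10";
        String options = stripClassName(copied);
        System.out.println(options);
        // prints: -N 10 -A "weka.core.EuclideanDistance -R first-last" -S 10
        System.out.println(toJavaLiteral(options));
        // prints: "-N 10 -A \"weka.core.EuclideanDistance -R first-last\" -S 10"
    }
}
```

Backslashes are escaped before quotes so that any backslash already present in the copied string (for instance from more deeply nested options) survives intact, and the quotes added in the second step are not escaped twice.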