
Using a filter in Weka

Use a filter to remove an attribute
• Open the weather.nominal.arff file
• Click on the attribute filters
• Click on Remove
• Click on Apply
• With attributeIndices set to 3, this removes the 3rd attribute (humidity)
• Click on Undo
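The effect of the steps above can be sketched outside the GUI. The snippet below is plain Python, not Weka code: the function name remove_attribute is hypothetical, and the 14 rows are the standard weather.nominal data typed in by hand so that the example runs on its own. It only illustrates what dropping attribute 3 does to the data.

```python
# Plain-Python sketch; it mimics the effect of Weka's Remove filter, not its API.
WEATHER_CSV = """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy", "play"]
ROWS = [tuple(line.split(",")) for line in WEATHER_CSV.splitlines()]


def remove_attribute(names, rows, index):
    """Drop the attribute at a 1-based index (Weka numbers attributes from 1)."""
    pos = index - 1
    new_names = names[:pos] + names[pos + 1:]
    new_rows = [row[:pos] + row[pos + 1:] for row in rows]
    return new_names, new_rows


names, rows = remove_attribute(ATTRIBUTES, ROWS, 3)
print(names)
print(len(rows), "instances,", len(names), "attributes")
```

The number of instances stays at 14; only the humidity column disappears, which is what the Preprocess panel shows after Apply.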
How to remove instances from a data set
Problem
• Load the weather.nominal dataset.
• Use the filter weka.filters.unsupervised.instance.RemoveWithValues
• Remove all instances in which the humidity attribute has the value high.
• Click on the filter name to open its options
• Click on More
• nominalIndices gives the label numbers of the values to remove
• Select the humidity attribute
• The count for the high label is now 0
• The number of instances is reduced to 7
• Load the weather.nominal dataset.
• Use the filter weka.filters.unsupervised.instance.RemoveWithValues
• Remove all instances in which the humidity attribute has the value normal.
• Undo the changes
• Change the nominalIndices value to 2
• Click on Apply
• The number of instances is again reduced to 7
• The count for the normal label is now 0
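Both problems above can be checked with a small stand-alone sketch. It is plain Python rather than Weka code: remove_with_values is a hypothetical helper, the rows are the standard weather.nominal data typed in by hand, and the label order (high, then normal) is assumed to match the order in the ARFF header, which is why label number 1 means high and 2 means normal.

```python
# Plain-Python sketch of the RemoveWithValues idea; not Weka's API.
WEATHER_CSV = """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""

ROWS = [tuple(line.split(",")) for line in WEATHER_CSV.splitlines()]
HUMIDITY_LABELS = ["high", "normal"]  # assumed header order: label 1, label 2


def remove_with_values(rows, attribute_index, nominal_indices, labels):
    """Drop instances whose value for the attribute is one of the indexed labels.

    Both attribute_index and nominal_indices are 1-based, as in the GUI.
    """
    unwanted = {labels[i - 1] for i in nominal_indices}
    pos = attribute_index - 1
    return [row for row in rows if row[pos] not in unwanted]


without_high = remove_with_values(ROWS, 3, [1], HUMIDITY_LABELS)
without_normal = remove_with_values(ROWS, 3, [2], HUMIDITY_LABELS)
print(len(without_high), len(without_normal))
```

Each call leaves 7 of the 14 instances, matching the counts seen in the Preprocess panel.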
How filtering data can give better results
• Select the glass data set
• Select Classify
• Select J48
• Accuracy = 66.8%
• Remove the Fe attribute
• Select J48 classification
• Accuracy = 67%
• Remove a few more attributes as shown
• Accuracy = 68.6%
• Remove Fe and Ba
• Check the correctly classified instances
• Visualize the tree
Activity
• Download and open the anneal dataset.
• 1. How many attributes does it have?
• Answer: 39
• 2. Apply the unsupervised attribute filter RemoveUseless. How many attributes does the dataset have now?
• Answer: 32
• 3. Identify one of the attributes that was removed by clicking Undo and then Apply. Now figure out why it was removed.
○ The attribute name was too short
○ Only one of the attribute's values actually appears in the dataset
○ The attribute only had two possible values
• Answer: Only one of the attribute's values actually appears in the dataset
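The reason given in the answer can be illustrated with a short sketch. This is plain Python, not Weka code: constant_attributes is a hypothetical helper and the three-row table is invented toy data, not the anneal dataset. It covers only the "never varies" case; RemoveUseless also drops nominal attributes that vary too much, which is not shown here.

```python
# Plain-Python sketch: find attributes that take a single value in the data.
def constant_attributes(names, rows):
    """Return the names of attributes whose column holds only one distinct value."""
    return [
        name
        for i, name in enumerate(names)
        if len({row[i] for row in rows}) <= 1
    ]


# Hypothetical toy data: "colour" never varies, so it carries no information.
names = ["colour", "batch", "size"]
rows = [
    ("grey", "A", 60),
    ("grey", "B", 85),
    ("grey", "A", 70),
]

useless = constant_attributes(names, rows)
print(useless)  # ['colour']
```

An attribute like this cannot help a classifier separate the instances, so removing it loses nothing.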
Activity
• Open the glass dataset.
• 1. Apply the unsupervised attribute filter Normalize. What is the new range (i.e. minimum and maximum) of the Na attribute?
○ [-1, 1]
○ [0, 1]
○ [-∞, ∞]
• Answer: [0, 1]
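Why the range comes out as [0, 1] can be seen from a minimal min-max rescaling sketch. This is plain Python, not Weka code: the function normalize and the sample values are hypothetical (they are not the real Na column), and the scale and translation parameters are assumed to mirror the filter's options of the same names, with defaults of 1.0 and 0.0.

```python
# Plain-Python sketch of min-max normalization.
def normalize(values, scale=1.0, translation=0.0):
    """Rescale values linearly so the minimum maps to `translation`
    and the maximum maps to `translation + scale`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: map everything to 0.0 (a choice made for this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) * scale + translation for v in values]


sample = [11.0, 13.5, 16.0]  # hypothetical values, not the glass data
scaled = normalize(sample)
print(scaled)  # [0.0, 0.5, 1.0]

# A [-1, 1] range would need scale 2 and translation -1 instead of the defaults.
centred = normalize(sample, scale=2.0, translation=-1.0)
print(centred)  # [-1.0, 0.0, 1.0]
```

With the default settings the smallest value always maps to 0 and the largest to 1, whatever the original units were.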
• 2. Undo all changes to the glass dataset again. Now determine which attribute set gives the highest classification accuracy using J48, with default options.
○ removing Fe, Si, Al, K
○ removing Fe, Mg, RI
○ removing Fe, Si, Mg, K
• Answer: removing Fe, Si, Al, K
