This document explains how to use filters in Weka to preprocess data:
1. It describes how to remove attributes with the unsupervised attribute filter Remove. As an example, it removes the humidity attribute from the weather.nominal dataset.
2. It then shows how to remove instances with the RemoveWithValues filter. It first removes all weather.nominal instances whose humidity is "high", then, in a separate run, all instances whose humidity is "normal".
3. Finally, it shows how filtering data can improve classification results: removing attributes such as Fe from the glass dataset before running the J48 classifier gives higher accuracy.
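As a rough illustration of what the Remove and RemoveWithValues filters do to the data, here is a minimal plain-Python sketch. It does not use Weka's API. It assumes the 14 instances of the standard weather.nominal.arff file and the label order {high, normal} declared there for humidity; the helpers remove_attribute and remove_with_values are hypothetical. Indices are 1-based, as in the filters' attributeIndices, attributeIndex and nominalIndices options.

```python
# Plain-Python sketch of what Weka's Remove and RemoveWithValues filters do.
# These helpers are hypothetical illustrations, not Weka's API.

# Attributes and rows as in the standard weather.nominal.arff (assumed).
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy", "play"]
HUMIDITY_LABELS = ["high", "normal"]  # label order declared in the ARFF header

DATA = [
    ["sunny", "hot", "high", "FALSE", "no"],
    ["sunny", "hot", "high", "TRUE", "no"],
    ["overcast", "hot", "high", "FALSE", "yes"],
    ["rainy", "mild", "high", "FALSE", "yes"],
    ["rainy", "cool", "normal", "FALSE", "yes"],
    ["rainy", "cool", "normal", "TRUE", "no"],
    ["overcast", "cool", "normal", "TRUE", "yes"],
    ["sunny", "mild", "high", "FALSE", "no"],
    ["sunny", "cool", "normal", "FALSE", "yes"],
    ["rainy", "mild", "normal", "FALSE", "yes"],
    ["sunny", "mild", "normal", "TRUE", "yes"],
    ["overcast", "mild", "high", "TRUE", "yes"],
    ["overcast", "hot", "normal", "FALSE", "yes"],
    ["rainy", "mild", "high", "TRUE", "no"],
]


def remove_attribute(attributes, rows, index):
    """Drop the attribute at a 1-based index (like Remove with attributeIndices=index)."""
    i = index - 1
    kept_attributes = attributes[:i] + attributes[i + 1:]
    kept_rows = [row[:i] + row[i + 1:] for row in rows]
    return kept_attributes, kept_rows


def remove_with_values(rows, attribute_index, labels, nominal_indices):
    """Drop every row whose value at a 1-based attribute index equals one of the
    labels selected by 1-based nominal_indices (matching sense not inverted)."""
    col = attribute_index - 1
    to_drop = {labels[k - 1] for k in nominal_indices}
    return [row for row in rows if row[col] not in to_drop]


attrs, _ = remove_attribute(ATTRIBUTES, DATA, 3)
print(attrs)  # ['outlook', 'temperature', 'windy', 'play']

no_high = remove_with_values(DATA, 3, HUMIDITY_LABELS, [1])    # drop "high"
no_normal = remove_with_values(DATA, 3, HUMIDITY_LABELS, [2])  # drop "normal"
print(len(no_high), len(no_normal))  # 7 7
```

Removing humidity leaves four attributes, and dropping either humidity label leaves 7 of the 14 instances, which matches the instance counts reported for these exercises.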
How to remove an attribute
• Open the weather.nominal.arff file.
• Click Choose under Filter and select the unsupervised attribute filter Remove.
• Set attributeIndices to 3 and click Apply.
• This removes the 3rd attribute (humidity).
• Click Undo to restore the original data.
How to remove instances from a dataset
Problem
• Load the weather.nominal dataset.
• Use the filter weka.filters.unsupervised.instance.RemoveWithValues.
• Remove all instances in which the humidity attribute has the value high.
• Click on the filter name to open its options.
• Click More to read the option descriptions; nominalIndices is the label number of the value to match.
• Set attributeIndex to 3 to select the humidity attribute, set nominalIndices to 1 (high), and click Apply.
• The count for the high label is now 0.
• The number of instances drops from 14 to 7.
Problem
• Load the weather.nominal dataset.
• Use the filter weka.filters.unsupervised.instance.RemoveWithValues.
• Remove all instances in which the humidity attribute has the value normal.
• Undo the previous changes.
• Change nominalIndices to 2 (normal) and click Apply.
• The number of instances again drops to 7.
• The count for the normal label is now 0.
How filtering data can give better results
• Open the glass dataset and go to the Classify tab.
• Select the J48 classifier and run it: accuracy = 66.8%.
• Remove the Fe attribute and run J48 again: accuracy = 67%.
• Remove a few more attributes: accuracy = 68.6%.
• Remove Fe and Ba, check the number of correctly classified instances, and visualize the resulting tree.
Activity
• Download and open the anneal dataset.
• 1. How many attributes does it have?
• Answer: 39
• 2. Apply the unsupervised attribute filter RemoveUseless.
• How many attributes does the dataset have now?
• Answer: 32
• 3. Identify one of the attributes that was removed, for example by clicking Undo and then Apply again to compare the attribute lists. Now figure out why it was removed.
• The attribute name was too short
• Only one of the attribute's values actually appears in the dataset
• The attribute only had two possible values
• Answer: Only one of the attribute's values actually appears in the dataset
Activity
• Open the glass dataset.
• 1. Apply the unsupervised attribute filter Normalize. What is the new range (i.e. minimum and maximum) of the Na attribute?
• [-1, 1]
• [0, 1]
• [-∞, ∞]
• Answer: [0, 1]
• 2. Undo all changes to the glass dataset. Now determine which of the following attribute sets gives the highest classification accuracy using J48 with default options.
• removing Fe, Si, Al, K
• removing Fe, Mg, RI
• removing Fe, Si, Mg, K
• Answer: removing Fe, Si, Al, K
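The two unsupervised attribute filters used in the activities, RemoveUseless and Normalize, can be approximated in plain Python as below. This is a sketch, not Weka's implementation: the hypothetical constant_columns covers only the rule that removes attributes in which a single value appears (RemoveUseless also removes nominal attributes whose values vary too much, controlled by a variance-percentage threshold, which is not modelled), and the hypothetical normalize applies min-max rescaling whose scale and translation parameters mirror the idea of Normalize's options of the same names. The toy columns are made up and are not values from the anneal or glass datasets.

```python
# Plain-Python sketch of two unsupervised attribute filters used in the activities.
# Hypothetical helpers for illustration only, not Weka's implementation.


def constant_columns(columns):
    """Names of columns in which only one distinct value appears.

    Models only the 'does not vary at all' rule of RemoveUseless; the filter's
    removal of nominal attributes that vary too much is not modelled here.
    """
    return [name for name, values in columns.items() if len(set(values)) <= 1]


def normalize(values, scale=1.0, translation=0.0):
    """Min-max rescale: the minimum maps to `translation` and the maximum to
    `translation + scale`, so the defaults give the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: this sketch maps every value to `translation`
        # to avoid dividing by zero (Weka's own handling may differ).
        return [translation for _ in values]
    return [(v - lo) / (hi - lo) * scale + translation for v in values]


# Made-up toy data, not values from the anneal or glass datasets.
toy_columns = {
    "a": ["x", "x", "x", "x"],  # only one value appears -> would be removed
    "b": ["x", "y", "x", "y"],  # varies -> kept
}
print(constant_columns(toy_columns))  # ['a']

toy_numeric = [12.0, 13.5, 15.0]
print(normalize(toy_numeric))  # [0.0, 0.5, 1.0]
```

Whatever the original spread of a numeric attribute such as Na, the default rescaling places its minimum at 0 and its maximum at 1, which is why its range after Normalize is [0, 1].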