with Engineered Features
Dr. Noha Mohamed
Computer Science Dep.

Agenda
Introduction
Feature Mining
Feature Reduction
Filtering Using ANN
Engineered Feature Limitations

Introduction
The three pillars of a successful ML application are the data, the features, and the model, and they must fit one another. The features used should be the most relevant ones, those that differentiate among the cases present in the data.
Representative features are critical in building an accurate CV application. They should be robust enough to work well under different conditions, such as changes in scale and rotation, and they should work well with the selected ML model.
You shouldn’t use more features than needed, because extra features add complexity to the model. Feature selection and reduction techniques are used to find the minimum set of features that still gives an accurate model.
Feature reduction is applied to shorten the feature vector so that only the most relevant features are kept. An ANN is then implemented to map the image features to their output labels.

Feature Mining
You must understand your problem and its dataset. Example: classification of the Fruits 360 dataset. Based on the feature categories presented in Chapter 2 (colour, texture, and edge), we need to find the most suitable set of features to differentiate these classes.
You must also understand the data structure. We should find a way to reduce the number of input features in order to reduce the number of parameters, and with it the model complexity.
One way is to use a single channel rather than all three RGB channels. The selected channel should be able to capture the colour changes among the classes used.
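
To make the effect of keeping a single channel concrete, the following minimal sketch counts the input features and the first-layer ANN parameters for both cases. The 100 x 100 image size, the hidden-layer size of 50 neurons, and the helper names `flatten_pixels` and `first_layer_parameters` are assumptions chosen for illustration, not values taken from the slides.

```python
def flatten_pixels(image, channel=None):
    """Flatten an H x W image of (R, G, B) tuples into a feature vector.

    channel=None keeps all three channels; 0, 1 or 2 keeps only R, G or B.
    """
    features = []
    for row in image:
        for pixel in row:
            if channel is None:
                features.extend(pixel)
            else:
                features.append(pixel[channel])
    return features


def first_layer_parameters(n_inputs, n_neurons):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_neurons + n_neurons


HEIGHT, WIDTH = 100, 100  # assumed image size
HIDDEN_NEURONS = 50       # hypothetical hidden-layer size

# Synthetic image filled with a single reddish colour.
image = [[(200, 40, 40)] * WIDTH for _ in range(HEIGHT)]

rgb_features = flatten_pixels(image)             # all three channels
red_features = flatten_pixels(image, channel=0)  # red channel only

rgb_params = first_layer_parameters(len(rgb_features), HIDDEN_NEURONS)
red_params = first_layer_parameters(len(red_features), HIDDEN_NEURONS)

print(len(rgb_features), rgb_params)
print(len(red_features), red_params)
```

Under these assumptions, dropping two channels cuts both the feature vector and the first-layer parameter count to roughly one third. Whether a single channel still separates the classes is a separate question.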
A histogram helps us visualize the intensity values more easily than looking at the image would. The three channels of each image, together with their histograms, are shown in Figure 1.

Figure 1. Red, green, and blue channels in addition to their histograms for a single sample from the four classes of the Fruits 360 dataset used

It is difficult to find the best channel to use. In the histogram of any channel, there is overlap in some regions across the images.
The only metric that differentiates the images in such a case is the intensity values. For example, Braeburn apple and Meyer lemon have values for all bins in the blue channel histogram, but their values differ: apple has small values compared to lemon in the rightmost part. When the illumination changes, the intensity values change too, and we might have a case in which apple and lemon have values close to each other in the histogram.
We should add a margin between the classes so that, even with small changes, there is no ambiguity in making the decision. We can benefit from the fact that the four fruits used have different colours. A colour space that decouples the illumination channel from the colour channels, such as HSV, is a good option.
Figure 2 shows the hue channel of the HSV colour space for the four samples used previously, in addition to their histograms.

Figure 2. Hue channel from the HSV colour space with its histograms

Based on these simple experiments on the four selected classes, the hue channel histogram can classify the data correctly.
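
The idea of decoupling illumination from colour can be sketched with Python's standard `colorsys` module. The helper names `hue_histogram` and `dim`, the one-degree bin width, and the synthetic pixel values are assumptions for illustration; the slides do not say which library or binning was used.

```python
import colorsys


def hue_histogram(pixels, bins=360):
    """Count pixels per hue bin; pixels are (R, G, B) tuples in 0..255."""
    hist = [0] * bins
    for r, g, b in pixels:
        # colorsys expects values in [0, 1] and returns hue in [0, 1).
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Round to the nearest bin and wrap 360 back to 0.
        hist[int(round(h * bins)) % bins] += 1
    return hist


def dim(pixels, factor):
    """Simulate darker illumination by scaling every channel equally."""
    return [(r * factor, g * factor, b * factor) for r, g, b in pixels]


# Synthetic sample: 6 reddish, 3 yellowish and 1 greenish pixel.
bright = [(200, 40, 40)] * 6 + [(200, 200, 40)] * 3 + [(40, 200, 40)]
dark = dim(bright, 0.5)

hist_bright = hue_histogram(bright)
hist_dark = hue_histogram(dark)

# The raw red intensities differ between the two versions of the image...
print(bright[0][0], dark[0][0])
# ...but the hue histograms are identical.
print(hist_bright == hist_dark)
```

In this sketch the feature vector has 360 entries regardless of image size, and halving the brightness leaves it unchanged, whereas every raw RGB intensity changes. Note that `colorsys` reports hue 0 for grey pixels, whose hue is undefined, so a real pipeline may want to treat low-saturation pixels separately.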
Hue is measured in degrees from 0 to 360. The number of features in this case is just 360 rather than 30,000, which greatly reduces the number of ANN parameters.

Next: Feature Reduction, Filtering Using ANN