Table 1: Recognition accuracy estimation performance of image feature distances in terms of Spearman correlation.
[Figure: eight scatter-plot panels (a)-(h); each plots log(1 / distance) on the x-axis against Top-5 Accuracy (%) on the y-axis. Background panels are labeled by background configuration (White, 2D1, 2D2, 3D1, 3D2) and orientation panels by orientation (Front, Side, Top). Panels: (a) AMZN Background - VGG16 Canberra, (b) AMZN Background - Color Canberra, (c) AMZN Orientation - VGG11 Chebyshev, (d) AMZN Orientation - Edge l1, (e) MSFT Background - VGG11 Minkowski, (f) MSFT Background - Color Canberra, (g) MSFT Orientation - VGG16 SAD, (h) MSFT Orientation - Edge l1.]
Fig. 3: Scatter plots of top hand-crafted and data-driven recognition accuracy estimation methods.
3. RECOGNITION PERFORMANCE ESTIMATION UNDER MULTIFARIOUS CONDITIONS

Based on the experiments reported in Section 2, the reference configuration that leads to the highest recognition performance is front view, white background, and Nikon DSLR. We conducted two experiments to estimate the recognition performance with respect to changes in background and orientation. We utilized the 10 common objects of both platforms for direct comparison. In the background experiment, we grouped images captured with a particular device (5) in front of a specific background (5), which leads to 25 image groups with front and side views of the objects. In the orientation experiment, we grouped images captured with a particular device (5) from an orientation (3) among front, top, and side views, which leads to 15 image groups with images of the objects in front of white, living room, and kitchen backdrops. For each image group, we obtained an average recognition performance per recognition platform and an average feature distance between the images in the group and their reference image. Finally, we analyzed the relationship between recognition accuracy and feature distance with correlations and scatter plots. We extracted commonly used hand-crafted and data-driven features as follows (an illustrative extraction sketch is given after the list):
- Color: Histograms of color channels in RGB.
- Daisy: Local image descriptor based on convolutions of gradients in specific directions with Gaussian filters [16].
- Edge: Histogram of vertical, horizontal, diagonal, and non-directional edges.
- Gabor: Frequency and orientation information of images extracted through Gabor filters.
- HOG: Histogram of oriented gradients over local regions.
- VGG: Features obtained from convolutional neural networks that are based on stacked 3 × 3 convolutional layers [17]. The VGG index indicates the number of weighted layers, of which the last three are fully connected layers.
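As a rough illustration of the feature extraction above, the sketch below computes simplified color, edge, and VGG features. It is a minimal example assuming OpenCV, PyTorch, and torchvision (>= 0.13) are available; the bin counts, gradient threshold, and the choice of VGG16 layer used as the feature are our assumptions, not the authors' exact configuration.

```python
import cv2
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T

def color_histogram(img_bgr, bins=32):
    # Concatenated per-channel histograms in RGB (bin count is an assumption).
    chans = cv2.split(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
    hists = [cv2.calcHist([c], [0], None, [bins], [0, 256]).ravel() for c in chans]
    h = np.concatenate(hists)
    return h / h.sum()

def edge_histogram(img_bgr):
    # Crude stand-in for the edge feature: gradient orientations quantized
    # into four directional bins, plus a "non-directional" bin for
    # weak-gradient pixels (the threshold is an assumption).
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # orientation in [0, 180)
    strong = mag > mag.mean()
    hist, _ = np.histogram(ang[strong], bins=4, range=(0, 180))
    hist = np.append(hist, (~strong).sum()).astype(np.float64)
    return hist / hist.sum()

def vgg_features(img_bgr, model=None):
    # Activations from the penultimate fully connected layer of VGG16
    # (which layer the authors used as the feature is an assumption).
    if model is None:
        model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
    preprocess = T.Compose([
        T.ToPILImage(), T.Resize(256), T.CenterCrop(224), T.ToTensor(),
        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    x = preprocess(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)).unsqueeze(0)
    with torch.no_grad():
        feats = model.features(x)
        feats = model.avgpool(feats).flatten(1)
        feats = model.classifier[:-1](feats)  # drop the final classification layer
    return feats.squeeze(0).numpy()
```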
We calculated the distance between features with the l1 norm, l2 norm, squared l2 norm, sum of absolute differences (SAD), sum of squared absolute differences (SSAD), weighted l1 norm (Canberra), l∞ norm (Chebyshev), Minkowski distance, Bray-Curtis dissimilarity, and Cosine distance. We report the recognition accuracy estimation performance in Table 1 in terms of Spearman correlation between top-5 recognition accuracy scores and feature distances. We highlight the top data-driven and hand-crafted methods with light blue for each recognition platform and experiment.
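Given per-group features and accuracies, the estimation performance reported in Table 1 reduces to a Spearman correlation between average feature distances and top-5 accuracies. The sketch below illustrates this with SciPy; the group structure, the Minkowski order p = 3, and all names are assumptions rather than the authors' code.

```python
import numpy as np
from scipy.spatial.distance import (braycurtis, canberra, chebyshev,
                                    cityblock, cosine, euclidean, minkowski)
from scipy.stats import spearmanr

# Candidate feature distances. For flat feature vectors, SAD coincides
# with the l1 norm and SSAD with the squared l2 norm.
DISTANCES = {
    "l1": cityblock,
    "l2": euclidean,
    "l2_squared": lambda a, b: float(np.sum((a - b) ** 2)),
    "canberra": canberra,
    "chebyshev": chebyshev,
    "minkowski_p3": lambda a, b: minkowski(a, b, p=3),  # order is an assumption
    "braycurtis": braycurtis,
    "cosine": cosine,
}

def estimation_performance(groups, dist_fn):
    """groups: list of (reference_feature, member_features, top5_accuracy).

    Returns the Spearman correlation between per-group average feature
    distance and per-group top-5 accuracy, mirroring Table 1.
    """
    avg_dists, accuracies = [], []
    for ref, members, acc in groups:
        avg_dists.append(np.mean([dist_fn(ref, m) for m in members]))
        accuracies.append(acc)
    rho, _ = spearmanr(avg_dists, accuracies)
    return rho
```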
In the background experiment, color characteristics of different backgrounds are distinct from each other as observed in Fig. 1. In terms of low-level characteristic features including Daisy, Edge, and HOG, edges in the backgrounds can distinguish highly textured backgrounds from less textured ones. However, edges are not sufficient to distinguish lowly textured backgrounds from each other. Moreover, edges of the foreground objects can dominate the feature representations and mask the effect of changes in the backgrounds. To distinguish differences in backgrounds overlooked by edge characteristics, frequency and orientation characteristics can be considered with Gabor features. Data-driven methods including VGG utilize all three channels of images for feature extraction, which can give them an inherent advantage over methods that solely utilize color or structure information. Overall, the data-driven VGG method leads to the highest performance in the background experiment for both recognition platforms. In terms of hand-crafted features, color leads to the highest performance followed by Gabor, whereas edge-based methods result in inferior performance.

Distinguishing changes in orientation is more challenging compared to backgrounds because the region of interest is limited to a smaller area. Therefore, overall recognition accuracy estimation performances are lower for orientations compared to backgrounds, as reported in Table 1. Similar to the background experiment, VGG architectures lead to the highest estimation performance in the orientation experiment. However, among hand-crafted methods, edge features dominate instead of Gabor representations. We show the scatter plots of the top performing data-driven and hand-crafted methods in Fig. 3, in which the x-axis corresponds to the average distance between image features and the y-axis corresponds to top-5 accuracy. Image groups corresponding to different configurations are more distinctly clustered in terms of background, as observed in Fig. 3(a-b, e-f). In terms of orientation, VGG leads to a clear distinction of configurations for Amazon Rekognition as observed in Fig. 3(c), whereas image groups overlap in the other experiments as shown in Fig. 3(d, g-h). Clustering configurations is more challenging in the orientation experiment because it is not possible to easily separate orientation configurations based on their recognition accuracy.
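For completeness, a Fig. 3-style panel can be drawn from the per-group values computed in the earlier distance sketch; the following matplotlib snippet is an illustrative assumption about how such a plot might be produced, with hypothetical inputs avg_dists, accuracies, and per-group configuration labels.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_fig3_panel(avg_dists, accuracies, labels, title):
    # One Fig. 3-style panel: log(1 / distance) on the x-axis against
    # top-5 accuracy on the y-axis, one marker group per configuration
    # label (e.g., background or orientation). Inputs are hypothetical.
    x = np.log(1.0 / np.asarray(avg_dists, dtype=float))
    y = np.asarray(accuracies, dtype=float)
    for label in sorted(set(labels)):
        mask = np.array([l == label for l in labels])
        plt.scatter(x[mask], y[mask], label=label)
    plt.xlabel("log(1 / distance)")
    plt.ylabel("Top-5 Accuracy (%)")
    plt.title(title)
    plt.legend()
    plt.show()
```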
4. CONCLUSION

In this paper, we analyzed the robustness of recognition platforms and reported that object background can affect recognition performance as much as orientation, whereas acquisition devices have a minor influence on recognition. We also introduced a framework to estimate recognition performance variation and showed that, in a controlled setting, color-based features capture background variations, edge-based features capture orientation variations, and data-driven features capture both. Overall, recognition performance can change significantly depending on the acquisition conditions, which highlights the need for more robust platforms that we can trust. Estimating recognition performance with feature similarity-based metrics can be helpful for testing the robustness of algorithms before deployment. However, the applicability of such estimation frameworks would drastically increase if we designed no-reference approaches that can provide a recognition performance estimate without a reference image, similar to the no-reference algorithms in the image quality assessment field.
5. REFERENCES

[1] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2009, pp. 248-255.

[2] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision (ECCV), D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds., Cham, 2014, pp. 740-755, Springer International Publishing.

[3] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in IEEE International Conference on Computer Vision (ICCV), Washington, DC, USA, 2015, pp. 1026-1034, IEEE Computer Society.

[4] R. Wu, S. Yan, Y. Shan, Q. Dang, and G. Sun, "Deep Image: Scaling up image recognition," arXiv:1501.02876, 2015.

[5] S. Dodge and L. Karam, "Understanding how image quality affects deep neural networks," in International Conference on Quality of Multimedia Experience (QoMEX), June 2016, pp. 1-6.

[6] Y. Zhou, S. Song, and N. Cheung, "On classification of distorted images with deep convolutional neural networks," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017, pp. 1213-1217.

[7] H. Hosseini, B. Xiao, and R. Poovendran, "Google's Cloud Vision API is not robust to noise," in 16th IEEE International Conference on Machine Learning and Applications (ICMLA), Dec 2017, pp. 101-105.

[10] D. Temel, J. Lee, and G. AlRegib, "CURE-OR: Challenging unreal and real environments for object recognition," in IEEE International Conference on Machine Learning and Applications (ICMLA), 2018.

[11] D. Hendrycks and T. G. Dietterich, "Benchmarking neural network robustness to common corruptions and surface variations," in International Conference on Learning Representations (ICLR), 2019.

[12] D. Temel, G. Kwon, M. Prabhushankar, and G. AlRegib, "CURE-TSR: Challenging unreal and real environments for traffic sign recognition," in Neural Information Processing Systems (NeurIPS), Machine Learning for Intelligent Transportation Systems Workshop, 2017.

[13] D. Temel and G. AlRegib, "Traffic signs in the wild: Highlights from the IEEE Video and Image Processing Cup 2017 student competition [SP competitions]," IEEE Signal Processing Magazine, vol. 35, no. 2, pp. 154-161, March 2018.

[14] M. Prabhushankar, G. Kwon, D. Temel, and G. AlRegib, "Semantically interpretable and controllable filter sets," in IEEE International Conference on Image Processing (ICIP), Oct 2018, pp. 1053-1057.

[15] D. Temel, T. Alshawi, M.-H. Chen, and G. AlRegib, "Challenging environments for traffic sign detection: Reliability assessment under inclement conditions," arXiv:1902.06857, 2019.

[16] E. Tola, V. Lepetit, and P. Fua, "Daisy: An efficient dense descriptor applied to wide-baseline stereo," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 32, no. 5, pp. 815-830, May 2010.

[17] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556, 2014.