Ramesh G, Dept. of Computer Science and Engineering
Srihari W, Dept. of Computer Science and Engineering
2021 International Conference on Computer Communication and Informatics (ICCCI) | 978-1-7281-5875-4/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICCCI50826.2021.9402356
Abstract—Most of the ongoing research on image classification happens in English, and there is very little research on the classification of images in regional Indian languages like Kannada. There are several uses and advantages of classifying images natively in Kannada. Through this study, a diverse dataset of 425,723 images accurately labelled in Kannada is obtained, verified and corrected by a human judge, which adds to the contribution this paper makes to the image classification literature. It consists of 1,083 unique classes, constructed by combining classes randomly selected from ImageNet, a database commonly used for image classification in English, with 193 manually chosen classes. The classes were translated first using an online translation service and later corrected by a human judge to achieve a translation accuracy of 97.15% on the entire dataset.

Keywords—Image classification, image processing, information retrieval, image dataset, image classification in Indian languages, machine translation.

I. INTRODUCTION

Kannada is one of the official languages of India and the administrative language of the state of Karnataka, India. There are more than 60 million Kannada speakers around the world. The language is written using the Kannada script, which evolved from the Kadamba script (325-550 AD).

An area that has shown significant development in the field of artificial intelligence is computer vision, specifically image classification. The aim of image classification algorithms is to identify features occurring in an image and provide an accurate label based on the identified feature or group of features. As there are many large datasets available for the task of image classification, the accuracies of models and algorithms trained and tested on these datasets are also very high. ImageNet [1] is one such database, consisting of more than 14 million images. It contains images labelled using a WordNet [2] synset that represents the labelled object in the image. All the images in the database are labelled under 21,841 unique synsets. Some of the synsets in ImageNet are more general, such as device and food, while other synsets are specific, like computer and fruit. The device synset consists of images like guitar, deer and squirrel, as shown in Figure 1. ImageNet sources its images from a large number of different online sources. Solving challenges in image processing and classification using this dataset has been a hot topic for researchers in computer vision and artificial intelligence. The Large Scale Visual Recognition Challenge is an annual competition in which participants compete to solve a variety of image processing challenges using the ImageNet database [3]. The database has been a tremendous resource for significant progress in image processing: the classification of moving objects in videos, the clustering of search results, and object segmentation in images.

Fig. 1. Some examples of images and their labels from ImageNet

All the significant developments in image classification are concentrated in the English language. One of the reasons may be the lack of large databases labelled in languages other than English. Although there has been some research
2021 International Conference on Computer Communication and Informatics (ICCCI -2021), Jan. 27-29, 2021, Coimbatore, INDIA
in the topic of image classification in languages other than English, classification of images in Kannada remains largely unexplored. The aim of this work is to provide a dataset for the task of image classification in Kannada.

The three main contributions of this study are:
1) Previously, there has been very little focus on image classification in Kannada. This study is one of the first few to explore this area, and hence it can serve as a foundation for further research on this topic.
2) This is also an attempt to further expand research on image classification in languages other than English.
3) A dataset of 425,723 images with accurate Kannada labels is provided through this study. This enables further research to find new algorithms and solutions for the classification of images natively in Kannada, without the need for any translation.

B. Datasets in Kannada and Kannada Natural Language Processing

There have been a few past attempts at creating datasets in Kannada for the task of image classification, at a smaller scale. Chars74K [14] is the largest dataset that contains Kannada labels and characters of numbers in the Kannada script. Kannada MNIST [15] provides a benchmark-style dataset for numbers in Kannada. However, there is very little work on datasets beyond numbers in the language. There are numerous difficulties in further research on developing NLP and neural network solutions and algorithms natively for Kannada, and there is an immediate need for datasets and other tools in order to further propagate research in Kannada. There are little to no datasets freely available for these purposes, and through this work we provide one such dataset.
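The paper does not fix a distribution format for the dataset, but conceptually each entry pairs an ImageNet synset and source URL with a human-corrected Kannada label. A minimal sketch of such a record follows; the synset ID, URL, and label values are illustrative placeholders, not values taken from the actual dataset:

```python
from dataclasses import dataclass

@dataclass
class LabelledImage:
    synset_id: str   # WordNet-style synset identifier (illustrative)
    image_url: str   # source URL collected from ImageNet (placeholder)
    label_en: str    # original English label
    label_kn: str    # Kannada label after human correction

# Illustrative entry: "dog" with its Kannada translation.
entry = LabelledImage(
    synset_id="n02084071",                  # assumed ID, for illustration only
    image_url="http://example.com/dog.jpg",
    label_en="dog",
    label_kn="ನಾಯಿ",
)

# Classes group many such entries under one corrected Kannada label.
dataset = {entry.label_kn: [entry]}
```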
Authorized licensed use limited to: BOURNEMOUTH UNIVERSITY. Downloaded on July 03,2021 at 05:27:02 UTC from IEEE Xplore. Restrictions apply.
2021 International Conference on Computer Communication and Informatics (ICCCI -2021), Jan. 27-29, 2021, Coimbatore, INDIA
B. Translation process

Translation of all the English labels from ImageNet was performed using the Google Cloud Translate API, which can translate a large amount of input text from one language to a selected language. There is no publicly available information about the accuracy of Google Translate for translations from English to Kannada. However, for translations from English to other languages, such as Arabic [17], the Google translation service is known to be more accurate than Microsoft Bing. Hence, because of its popularity and reliability, Google Translate was chosen for this study. Once the translation service had translated all the labels, a human judge carefully evaluated every label and corrected any discrepancies in the translation. Differences in translation services, or in the vocabulary and knowledge of the human judge, may prevent an exact reproduction of the results of this study. This methodology was used to translate the labels for all 425,723 images into Kannada. A sample of accurately labelled classes, along with an image, is shown in Figure 3.

However, there are certain limitations to this dataset. Firstly, the generated dataset contains a smaller subset of fine-grained categories compared to the ImageNet database, as shown in Table 2. Also, some classes might contain relatively fewer images than others because of ImageNet URLs being outdated or broken; hence the distribution of images over the 1,083 classes is uneven. Finally, some of the synsets translated into Kannada are direct translations, in which the synset in Kannada reads as the same word as in English, represented as "English" in Table 1. A sample of these classes is shown in Figure 4. This may be due to the lack of a corresponding word in the Kannada dictionary. Nevertheless, even considering these limitations, this dataset is a valuable resource that could be instrumental in further studies on image classification in the regional language. The dataset of images in correctly translated Kannada classes can be provided on request.

IV. RESULTS AND DISCUSSION

A human judge well versed in both English and Kannada performed the complete evaluation of the labels produced by the translation service. The labels were classified as "Accurate", "Inaccurate", "Neutral" or "English", as shown in Table 1, based on the judge's knowledge of the language as well as additional outside sources that give extra information and context regarding the generated translations.

The accuracy of the translation service used for the translation of ImageNet labels from English to Kannada was measured on the basis of accurately or inaccurately generated labels: "Accurate" and "English" are counted as accurate, while "Inaccurate" and "Neutral" are counted as inaccurate. To calculate the total accuracy as a percentage, the number of images classified as accurate was divided by the total number of images in the dataset. Hence the total translation accuracy over the entire dataset is observed to be 97.15%. The distribution of translation accuracy over the classes, in percentage terms, is shown in Figure 5.

TABLE I
SUMMARY OF RESULTS OF THE ONLINE TRANSLATION SERVICE

Result      | Number of Classes | Percentage
Accurate    | 307               | 28.35%
Inaccurate  | 16                | 1.47%
Neutral     | 15                | 1.38%
English     | 745               | 68.80%
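The accuracy rule above can be expressed directly in code. The sketch below tallies the four judgement categories from Table 1, counting "Accurate" and "English" as accurate; note that the paper's 97.15% figure is computed over all 425,723 images, so the class-level counts used here give a slightly different value (about 97.1%):

```python
# Tally translation-evaluation judgements into an overall accuracy,
# following the paper's rule: "Accurate" and "English" count as accurate;
# "Inaccurate" and "Neutral" count as inaccurate.
from collections import Counter

ACCURATE_CATEGORIES = {"Accurate", "English"}

def translation_accuracy(judgements):
    """Percentage of judgements that fall in the accurate categories."""
    counts = Counter(judgements)
    accurate = sum(n for cat, n in counts.items() if cat in ACCURATE_CATEGORIES)
    return 100.0 * accurate / sum(counts.values())

# Class-level counts from Table 1 (307 + 16 + 15 + 745 = 1,083 classes).
judgements = (["Accurate"] * 307 + ["Inaccurate"] * 16
              + ["Neutral"] * 15 + ["English"] * 745)
print(round(translation_accuracy(judgements), 2))  # ≈ 97.14 at class level
```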
TABLE II
PERCENTAGE OF SIZE OF PROPOSED DATASET
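The translation step described in Section B above can be sketched as follows. This is a minimal sketch, assuming the google-cloud-translate client library and valid Google Cloud credentials; the batching helper is a hypothetical addition to keep each API request small, and the label names are illustrative:

```python
# Sketch: batch-translating English ImageNet labels into Kannada ("kn").
# Assumes: pip install google-cloud-translate, plus configured GCP credentials.

def batch(labels, size=100):
    """Split labels into chunks of at most `size` for one API request each."""
    return [labels[i:i + size] for i in range(0, len(labels), size)]

def translate_labels(labels, target="kn"):
    """Return a dict mapping each English label to its machine translation."""
    from google.cloud import translate_v2 as translate
    client = translate.Client()
    out = {}
    for chunk in batch(labels):
        results = client.translate(chunk, target_language=target)
        for src, res in zip(chunk, results):
            out[src] = res["translatedText"]
    return out
```

Per the paper's methodology, the output of such a call is then reviewed label by label by a human judge, who corrects any discrepancies before the labels enter the dataset.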
V. CONCLUSION

A dataset of 425,723 images correctly labelled under 1,083 unique Kannada labels is provided through this study. Subsequent studies in the field of image classification in Kannada can be carried out using this dataset. Through this study we also found that additional pre-processing of the labels passed to the translation service, with some modifications to provide the service with more contextual information, can be applied to achieve better results. This study also reveals that the translations obtained for some of the specific classes at the lower levels of the WordNet hierarchy have greater inaccuracies. This may be attributed to the lack of such entities in the region, or to uncertainty about the existence of names for such classes in the language. For example, a certain type of food that originated elsewhere in the world might not have an equivalent word defined in the Kannada dictionary, and hence produces inaccurate translations or direct translations (words that are written in Kannada but are read the same way as in their native language). The accuracy of translations for such words can therefore be improved through alternative methods of translation.

Future research can focus on providing Kannada labels for all the classes of all images in ImageNet through the methodology and evaluation techniques utilized in this study. Another direction for research can include adding pre-processing steps prior to the online translation process to further improve the methodology. To increase the count of true positives for the translations, a combination of various other translation services can also be included.

REFERENCES

[1] J. Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, pp. 248–255.
[2] George A. Miller. "WordNet: A Lexical Database for English". In: Communications of the ACM 38 (1995), pp. 39–41.
[3] Olga Russakovsky et al. "ImageNet Large Scale Visual Recognition Challenge". In: International Journal of Computer Vision 115.3 (Dec. 2015), pp. 211–252. ISSN: 1573-1405. DOI: 10.1007/s11263-015-0816-y. URL: https://doi.org/10.1007/s11263-015-0816-y.
[4] K. He et al. "Deep Residual Learning for Image Recognition". In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016, pp. 770–778.
[5] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. 2015. arXiv: 1409.1556 [cs.CV].
[6] S. Ren et al. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 39.6 (2017), pp. 1137–1149.
[7] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105.
[8] Jonathan Huang et al. "Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). July 2017.
[9] Yongqin Xian, Bernt Schiele, and Zeynep Akata. "Zero-Shot Learning - the Good, the Bad and the Ugly". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). July 2017.
[10] Andrej Karpathy and Li Fei-Fei. "Deep Visual-Semantic Alignments for Generating Image Descriptions". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2015.
[11] Kelvin Xu et al. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention". In: vol. 37. Proceedings of Machine Learning Research. PMLR, July 2015, pp. 2048–2057.
[12] Geert Litjens et al. "A survey on deep learning in medical image analysis". In: Medical Image Analysis 42 (2017), pp. 60–88. ISSN: 1361-8415. DOI: 10.1016/j.media.2017.07.005.
[13] A. Alsudais. "Image Classification in Arabic: Exploring Direct English to Arabic Translations". In: IEEE Access 7 (2019), pp. 122730–122739.
[14] T. E. de Campos, B. R. Babu, and M. Varma. "Character recognition in natural images". In: Proceedings of the International Conference on Computer Vision Theory and Applications, Lisbon, Portugal. Feb. 2009.
[15] Vinay Uday Prabhu. Kannada-MNIST: A new handwritten digits' dataset for the Kannada language. 2019. arXiv: 1908.01242 [cs.CV].
[16] Tsung-Yi Lin et al. "Microsoft COCO: Common Objects in Context". In: Computer Vision – ECCV 2014. Ed. by David Fleet et al. Cham: Springer International Publishing, 2014, pp. 740–755.
[17] Riyad Al-Shalabi et al. "Evaluating machine translations from Arabic into English and vice versa". In: International Research Journal of Electronics & Computer Engineering 3.2 (2017).