
2021 International Conference on Computer Communication and Informatics (ICCCI-2021), Jan. 27-29, 2021, Coimbatore, INDIA

Kannada ImageNet: A Dataset for Image Classification in Kannada

Ramesh G
Dept. of Computer Science and Engineering
University Visvesvaraya College of Engineering
Bengaluru, India
rameshmg6308@gmail.com

Srihari W
Dept. of Computer Science and Engineering
University Visvesvaraya College of Engineering
Bengaluru, India
srihariw1999@gmail.com

Atul M Bharadwaj
Dept. of Computer Science and Engineering
S J B Institute of Technology
Bengaluru, India
atulmb99@gmail.com

Champa H N
Dept. of Computer Science and Engineering
University Visvesvaraya College of Engineering
Bengaluru, India
champahn@yahoo.co.in

978-1-7281-5875-4/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICCCI50826.2021.9402356

Abstract—Most of the ongoing research in image classification happens in English, and there is very little research on the classification of images in regional Indian languages like Kannada. There are several uses and advantages of classifying images natively in Kannada. A diverse dataset of 425,723 images, accurately labelled in Kannada and verified and corrected by a human judge, is obtained through this study, which adds to the contribution this paper makes to the image classification literature. It consists of 1,083 unique classes, constructed from a combination of classes randomly selected from ImageNet, a database commonly used for image classification in English, and 193 manually thought out classes. The classes were translated first using an online translation service and later corrected by a human judge to achieve a translation accuracy of 97.15% on the entire dataset.

Keywords—Image classification, image processing, information retrieval, image dataset, image classification in Indian languages, machine translation.

I. INTRODUCTION

Kannada is one of the official languages of India and the administrative language of the state of Karnataka, India. There are more than 60 million Kannada speakers around the world. The language is written using the Kannada script, which evolved from the Kadamba script (325-550 AD). An area that has shown significant development in the field of artificial intelligence is computer vision, specifically image classification. The aim of image classification algorithms is to identify features occurring in an image and provide an accurate label based on the identified feature or group of features. As there are many large datasets available for the task of image classification, the accuracies of models and algorithms trained and tested on these datasets are also very high. ImageNet [1] is one such database, consisting of more than 14 million images. It contains images labelled using a WordNet [2] synset that represents the labelled object in the image. All the images in the database are labelled under 21,841 unique synsets.

Some of the synsets in ImageNet are more general, such as device and food, while others are specific, like computer and fruit. Example images and their labels, such as guitar, deer and squirrel, are shown in Figure 1. ImageNet sources its images from a large number of different online sources. Solving challenges in the domain of image processing and classification using the dataset has been a hot topic for researchers in computer vision and artificial intelligence. The Large Scale Visual Recognition Challenge is a competition held annually in which people compete to solve a variety of challenges in image processing using the ImageNet database [3]. This database has been a tremendous resource for significant progress in image processing, for the classification of moving objects in videos, the clustering of search results and object segmentation in images.

Fig. 1. Some examples of images and their labels from ImageNet.

All the significant developments in image classification are concentrated in the English language. One reason may be the lack of large databases labelled in languages other than English. Although there has been some research

on the topic of image classification in languages other than English, the classification of images in Kannada remains largely unexplored. The aim of this work is to provide a dataset for the task of image classification in Kannada.

The three main contributions of this study are:
1) Previously, there has been very little focus on image classification in Kannada. This study is one of the first few to explore this area, and hence it can be a foundation for further research on this topic.
2) This is also an attempt to further expand research on image classification in languages other than English.
3) A dataset of 425,723 images with accurate Kannada labels is provided through this study. This enables further research to find new algorithms and solutions for the classification of images natively in Kannada, without the need for any translation.

II. RELATED WORK

A. IMAGE CLASSIFICATION

Image classification is a supervised learning problem in which pixels in an image are assigned to a category or class of interest. Due to the availability of large scale image databases, there have been significant advances in the field of late, and one such database is ImageNet [1]. ImageNet has been pivotal in advancing the progress of artificial intelligence research on a wide scale, in tasks like image classification and captioning, as presented in [4], [5], [6], [7]. ImageNet has sparked a large amount of research on image classification. The use of convolutional neural networks [8] for image classification has been vital for models that have beneficial real world applications in a broad variety of fields. There is also an importance for zero-shot learning, where classes are correctly identified even though they were not an exact match with the labels in the training set [9]. By training on a large dataset of a class, advances have been made in the areas of object detection and object tracking. For applications where text-based information retrieval is necessary, identifying objects present in the image is hugely beneficial. Several such applications rely on identifying objects from an image; hence, a large dataset is desirable for such purposes. Image captioning is an area in which the contents of the image are described after initial recognition [10], [11]. In certain industries like healthcare, classifying images can be beneficial in generating medical reports and other contextual information from the contents of the image [12]. Image classification has been researched and applied to a wide variety of industries, but most of the work has been in English, even though a large number of non-English speakers exists. Therefore, this issue must be addressed so that state-of-the-art methods of image classification are not limited to just one language but also benefit languages other than English. There have been successful attempts at creating a dataset using translation services to translate labels from English to Arabic by Abdulkareem Alsudais [13], and this study hence borrows a similar methodology for translating labels from English to Kannada.

B. DATASETS IN KANNADA AND KANNADA NATURAL LANGUAGE PROCESSING

There have been a few attempts at creating datasets in Kannada for the task of image classification in the past, at a smaller scale. The Chars74K dataset [14] is the largest dataset that contains Kannada labels and characters and numbers in the Kannada script. Kannada MNIST [15] provides a benchmark-style dataset for numbers in Kannada. However, there is very little work on any datasets beyond numbers in the language. There are numerous difficulties in further research on developing NLP and neural network solutions and algorithms natively for Kannada, and there is an immediate need for datasets and other tools in order to further propagate research in these areas in Kannada. There are little to no datasets available for free use for the aforementioned purposes, and through this work we provide one such dataset.

III. METHODOLOGY

A sample of 1000 images from 1000 classes was selected from the ImageNet database. After verifying the image URLs from the database, it was found that some URLs were no longer valid. As a consequence, the sample reduced to 293,213 images unevenly distributed among 890 classes. In addition to these, there are 193 manually thought out classes that were cross-verified with the WordNet synsets; on further evaluation of URLs, this sample added up to 132,510 images. Using the Google Translate service, the labels of both of these sets of classes were then translated to Kannada. A human judge evaluated the correctness of the translated labels obtained from the translation service. An overview of the methodology is presented in Figure 2.

Fig. 2. Overview of methodology for this study.

A. Collection of dataset

The dataset was collected using a script that downloaded the images for a specified list of classes from existing ImageNet URLs using the ImageNet API. The classes are a combination of randomly chosen classes from a list of the ImageNet synsets, manually thought out common image classes, as well as some from the classes of the COCO dataset [16]. All of the synsets specified are unique. Since the images are obtained from ImageNet URLs, the images are accurate for each class and contain only the specified object of interest.
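The collection step described above can be sketched as follows. This is a minimal illustration rather than the authors' actual (unpublished) script: the function name, the injected `is_live` checker and the synset IDs are all hypothetical. In practice `is_live` would issue an HTTP request against each ImageNet URL; injecting it keeps the sketch self-contained.

```python
from typing import Callable, Dict, List

def collect_dataset(
    synset_urls: Dict[str, List[str]],
    is_live: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Keep only the image URLs that still resolve, per synset.

    `is_live` is injected so it can be a real HTTP check in practice
    or a stub in tests. Classes whose URLs are all broken are dropped,
    which is how the original 1000-class sample shrank to 890 classes.
    """
    kept: Dict[str, List[str]] = {}
    for synset, urls in synset_urls.items():
        live = [u for u in urls if is_live(u)]
        if live:  # drop classes with no surviving URLs
            kept[synset] = live
    return kept

# Example with a stub checker that rejects one dead URL.
sample = {
    "n00000001": ["http://example.com/dog1.jpg", "http://example.com/dead.jpg"],
    "n00000002": ["http://example.com/dead.jpg"],
}
dead = {"http://example.com/dead.jpg"}
kept = collect_dataset(sample, is_live=lambda u: u not in dead)
```

Here the second synset disappears entirely because none of its URLs survive, mirroring the uneven per-class image counts reported later.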

B. Translation process

Translation of all the English labels from ImageNet was performed using the Google Cloud Translate API. The Google Cloud Translate API can be used to translate a large amount of given input text from one language to a selected language. There is no information publicly available about the accuracy of Google Translate for translations from English to Kannada. However, for translations from English to other languages like Arabic [17], the Google translation service is known to be more accurate than Microsoft Bing. Hence, because of its popularity and reliability, Google Translate was chosen for this study. Once the translation service had translated all the labels, a human judge carefully evaluated every label and corrected any discrepancies in the translation. Differences in translation services, or in the vocabulary and knowledge of the human judge, may not produce exactly the same results as this study. This methodology was used to translate the labels for all 425,723 images to Kannada. A sample of accurately labelled classes, along with an image for each, is shown in Figure 3.

However, there are certain limitations to this dataset. Firstly, the generated dataset contains a smaller subset of fine-grained categories compared to the ImageNet database, as shown in Table 2. Also, some of the classes might contain relatively fewer images than others because of ImageNet URLs being outdated or broken. Hence the distribution of images over the 1,083 classes is uneven. Finally, some of the synsets translated into Kannada are direct translations, in which the synset in Kannada reads as the same word as in English, represented as “English” in Table 1. A sample of these classes is shown in Figure 4. This may be due to the lack of a corresponding word in the Kannada dictionary. Nevertheless, even considering these limitations, this dataset is a valuable resource that could be instrumental in further studies of image classification in the regional language. This dataset of images with correctly translated classes in Kannada can be provided on request.

Fig. 3. Sample of images with correctly translated labels.

IV. RESULTS AND DISCUSSION

A human judge well versed in both English and Kannada performed the complete evaluation of the translated labels produced by the translation service. The labels were classified as “Accurate”, “Inaccurate”, “Neutral” or “English”, as shown in Table 1, based on the judge’s knowledge of the language as well as additional outside sources that give extra information and context regarding the generated translations.

The accuracy of the translation service used for the translation of labels in ImageNet from English to Kannada was measured on the basis of accurately or inaccurately generated labels. “Accurate” and “English” are classified as accurate labels, while “Inaccurate” and “Neutral” are classified as inaccurate labels. To calculate the total accuracy as a percentage, the number of classes judged accurate was divided by the total number of classes in the dataset. Hence the total translation accuracy of the dataset is observed to be 97.15% in its entirety. The distribution of translation accuracy over the classes, in percentage terms, is shown in Figure 5.

TABLE I
SUMMARY OF RESULTS OF ONLINE TRANSLATION SERVICE

Result       Number of Classes   Percentage
Accurate     307                 28.35%
Inaccurate   16                  1.47%
Neutral      15                  1.38%
English      745                 68.80%

Fig. 5. Distribution of accuracy of classes for percentages of “English”, “Accurate”, “Inaccurate”, “Neutral”.

TABLE II
PERCENTAGE OF SIZE OF PROPOSED DATASET

Type      Accurate Kannada labels generated   Total in ImageNet   Percentage from ImageNet
Synsets   1,083                               21,841              4.95%
Images    425,723                             14,197,122          3%

Fig. 4. Sample of “English” translations of classes.
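The reported accuracy can be reconstructed directly from Table I. The short sketch below treats the “Accurate” and “English” judgements as correct, as described above; the four counts sum to the 1,083 classes, and the ratio reproduces the reported 97.15% figure up to rounding.

```python
# Per-class judgements, taken from Table I.
counts = {"Accurate": 307, "Inaccurate": 16, "Neutral": 15, "English": 745}

total = sum(counts.values())                      # all 1,083 classes
correct = counts["Accurate"] + counts["English"]  # labels treated as accurate
accuracy = 100 * correct / total

print(total)               # 1083
print(round(accuracy, 2))  # 97.14, i.e. the ~97.15% reported above
```

The same two-line pattern applied to Table II (1,083 / 21,841 synsets and 425,723 / 14,197,122 images) gives the 4.95% and ~3% coverage figures.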

V. CONCLUSION

A dataset of 425,723 images that are correctly labelled under 1,083 unique Kannada labels is provided through this study. Subsequent studies in the field of image classification in Kannada can be carried out using this dataset. Through this study we also found that additional pre-processing of the labels passed to the translation service, with modifications that provide the service with more contextual information, can be applied to achieve better results. This study also reveals that the translations obtained for some of the specific classes at the lower levels of WordNet have greater inaccuracies. This may be attributed to the lack of such entities in the region, or to uncertainty about the existence of names for such classes in the language. For example, a certain type of food that originated elsewhere in the world might not have any equivalent word defined in the Kannada dictionary, and hence produces inaccurate translations or direct translations (words that are written in Kannada but are read the same way as in their native language). Hence the accuracy of translations for such words can be improved through alternative methods of translating.

Future research can focus on providing Kannada labels for all the classes and all images in ImageNet through the methodology and evaluation techniques utilized in this study. Another direction for research can include adding pre-processing steps prior to the online translation process to further improve the methodology. To increase the count of true positives for the translations, a combination of various other translation services can also be included.

REFERENCES

[1] J. Deng et al. “ImageNet: A large-scale hierarchical image database”. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, pp. 248–255.
[2] George A. Miller. “WordNet: A Lexical Database for English”. In: Communications of the ACM 38 (1995), pp. 39–41.
[3] Olga Russakovsky et al. “ImageNet Large Scale Visual Recognition Challenge”. In: International Journal of Computer Vision 115.3 (Dec. 2015), pp. 211–252. DOI: 10.1007/s11263-015-0816-y.
[4] K. He et al. “Deep Residual Learning for Image Recognition”. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016, pp. 770–778.
[5] Karen Simonyan and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition”. 2015. arXiv: 1409.1556 [cs.CV].
[6] S. Ren et al. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 39.6 (2017), pp. 1137–1149.
[7] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105.
[8] Jonathan Huang et al. “Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). July 2017.
[9] Yongqin Xian, Bernt Schiele, and Zeynep Akata. “Zero-Shot Learning - the Good, the Bad and the Ugly”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). July 2017.
[10] Andrej Karpathy and Li Fei-Fei. “Deep Visual-Semantic Alignments for Generating Image Descriptions”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2015.
[11] Kelvin Xu et al. “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”. In: Vol. 37. Proceedings of Machine Learning Research. PMLR, July 2015, pp. 2048–2057.
[12] Geert Litjens et al. “A survey on deep learning in medical image analysis”. In: Medical Image Analysis 42 (2017), pp. 60–88. DOI: 10.1016/j.media.2017.07.005.
[13] A. Alsudais. “Image Classification in Arabic: Exploring Direct English to Arabic Translations”. In: IEEE Access 7 (2019), pp. 122730–122739.
[14] T. E. de Campos, B. R. Babu, and M. Varma. “Character recognition in natural images”. In: Proceedings of the International Conference on Computer Vision Theory and Applications, Lisbon, Portugal. Feb. 2009.
[15] Vinay Uday Prabhu. “Kannada-MNIST: A new handwritten digits dataset for the Kannada language”. 2019. arXiv: 1908.01242 [cs.CV].
[16] Tsung-Yi Lin et al. “Microsoft COCO: Common Objects in Context”. In: Computer Vision – ECCV 2014. Ed. by David Fleet et al. Cham: Springer International Publishing, 2014, pp. 740–755.
[17] Riyad Al-Shalabi et al. “Evaluating machine translations from Arabic into English and vice versa”. In: International Research Journal of Electronics & Computer Engineering 3.2 (2017).

