
An Improved and Efficient Image Mining Technique for Classification of Textual Images Using Low-Level Image Features

Ankita Tripathi (1), Shivam Pandey (2), Hitesh Jangir (3)
(1), (2), (3) Mody University of Science and Technology, CET, Lakshmangarh, Rajasthan, India
er.ankita.tripathi02@gmail.com, shivampandey.cet@modyuniversity.ac.in, hiteshjangir.cet@modyuniversity.ac.in

Abstract- This paper proposes a system that classifies textual images (images that contain text) using low-level image features. Image classification and content-based image retrieval (CBIR) are growing research areas. In this paper, the approach is based on various low-level image features, including GLCM features, such as mean, skewness, energy, contrast, and homogeneity. Using these features, the differences between images are measured and then used to classify the textual images by performing classification and clustering techniques on the datasets. The proposed method was experimented on 60 different textual images and obtains an improved result that was not obtained in earlier systems, along with classification of images into three main categories: document, scene, and caption.

Keywords- Classification; Clustering; Textual images; Image mining; Content-based image retrieval (CBIR).

I. INTRODUCTION

Advancement in image acquisition and the booming growth of huge image databases have led to the requirement of image mining. The process of image mining is all about unfolding relevant knowledge out of images [1]. Numerous techniques have been developed to classify images, but less work has been done to classify textual images. Recently, a number of highly advanced classification systems, such as artificial neural networks, fuzzy sets, and expert systems, have been developed for the purpose of image classification [2], [3]. It is very important to classify textual images so that the most similar images can be considered for content-based image retrieval (CBIR).

In this paper, textual images are categorized into three different categories: 1) Document Image; 2) Caption Image; 3) Scene Image. Figs. 1, 2, and 3 show the three categories of textual images.

Fig 1: Document Image

Fig 2: Caption Image

Fig 3: Scene Image

Based on the low-level features (including GLCM features) mean, skewness, energy, contrast, and homogeneity, textual images are classified. These are the minimum features required to classify a textual image efficiently. At first, the proposed model checks whether the image is a document or non-document image, taking into consideration the mean and skewness features only [4]. If the image is a non-document image, it will be either a caption or a scene image. So, secondly, to classify non-document images, the energy, contrast, and homogeneity features are taken into account. The proposed system then performs classification and clustering (data visualization) using the J48 decision tree classifier. About 60 images were used in WEKA for training and testing with the J48 decision tree classifier [4].

II. RELATED WORK

In this section, the focus is on a review of the solutions proposed in the literature. Image mining work mostly revolves around medical or astronomical images, but the proposed system works on various types of textual images.

S. Chitrakala et al. [1] proposed a textual image classifier using eight low-level features: mean, skewness, variance, energy, entropy, correlation, homogeneity, and contrast. Images were classified into three categories (document, scene, and caption) using the J48 decision tree classifier.

T. Duong et al. [3] presented an unsupervised model based on feature distribution for image classification. Images are divided into grids, and a hierarchical tree is then formed in order to mine the feature information of the image details. The implementation results show that the performance is competitive with the state of the art in image classification in terms of recognition rate.

S. Nandgonkar et al. [4] presented an unsupervised method for textual image classification that takes into account the feature distribution of textual images. The differences between textual images based on various low-level features (mean, skewness, energy, contrast, homogeneity) are used to classify them into three types: document image, caption text image, or scene text image. The WEKA J48 classifier is used for classification.

Y. Liu et al. [5] presented a survey of technical achievements in high-level semantic-based image retrieval. The major topics this survey covers are different aspects of the research: extraction of low-level image features, similarity measurement, and measurement of high-level semantic features.

X. Huang et al. [6] proposed a method that uses a hybrid approach to tackle the problem of anatomy image classification. A mutual-information-based filter and local features extracted from the images are used. Following that, a hybrid scheme is applied to the results to achieve further improvement of the classification results. The results prove that the hybrid scheme improves on the textual or visual method alone across different anatomical class settings.

O. Augereau et al. [7] contributed a new methodology for classifying document images by combining textual features and visual features using the Bag of Words (BoW) and Bag of Visual Words (BoVW) techniques, respectively. A learning approach was proposed and its experiment was implemented on 1925 document images from an industrial database. The results prove that the fusion scheme efficiently improves the classification performance.

J. Wang et al. [8] proposed a model based on modality-specific feature learning. The methodology uses two types of convolutional neural networks that map the raw data to latent space representations for images and texts. A back-propagation method is applied to update the parameters of the two convolutional networks. Extensive cross-modal retrieval experiments were carried out on three challenging datasets, and the results show that the proposed model is more effective.

In the proposed system, we have considered the minimum number of low-level features (five) required to classify images [9]. This minimal feature set makes the calculation simple and the result time-efficient. Furthermore, in earlier research only classification was performed to obtain results, whereas the proposed system also performs clustering, i.e., data visualization, which gives an improved result over both earlier research and the classification technique alone.

III. PROPOSED METHODOLOGY

With the boom in the development of image-capturing devices and many other digital devices, images have become a vital part of databases useful for many applications. Managing and differentiating these images as relevant or irrelevant according to demand and requirement is very important. Beyond just differentiating them, effective management of images is another important research area. The proposed system revolves around an approach to obtain an analyzed and improved result by applying classification and clustering to textual images using low-level image features. The proposed system is simulated using MATLAB, and the performance and analysis of this methodology are compared on the basis of the low-level image features mean, skewness, energy, contrast, and homogeneity using a data mining tool.

A. Image Processing

To process the images, sets of images are taken which include all three types. Using MATLAB, these images are converted into grayscale images, and then the image features mean and skewness are calculated. After this, the gray values are sliced into eight levels. The system considers one window of the sliced image and forms a GLCM (gray-level co-occurrence matrix); from that GLCM, the features contrast, energy, and homogeneity are calculated [9], [10].

The process can be summarized in the following steps:

Convert the given color image into grayscale. The RGB values are multiplied by weighting factors to obtain the grayscale value:

Gray scale value = 0.56*r + 0.33*g + 0.11*b (1)

where r = red, g = green, b = blue.
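As an illustrative sketch (the original system is implemented in MATLAB; the function name and array layout here are our assumptions), Eq. (1) can be applied to an RGB array as follows:

```python
import numpy as np

def to_grayscale(rgb):
    """Apply Eq. (1) to an H x W x 3 RGB array, using the weights stated in the text."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.56 * r + 0.33 * g + 0.11 * b
```

Since the three weights sum to 1, a uniform pixel such as (100, 100, 100) keeps its value of 100 after conversion.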
Calculate the histogram features mean and skewness:

Mean (m1) = Σx x P(x) (2)

Variance (m2) = Σx (x − m1)^2 P(x) (3)

Skewness (m3) = Σx (x − m1)^3 P(x) (4)

where x is the gray level associated with a pixel and P(x) is the probability of occurrence of a pixel with intensity x, given by:

P(x) = N(x) / N (5)

where N(x) is the number of pixels with gray level x and N is the total number of pixels.

Now, slice the image into 8 gray levels and prepare the GLCM matrix for that image, then calculate the energy, contrast, and homogeneity values:

% GLCM of the grayscale image I
G = graycomatrix(I);
% contrast, energy and homogeneity of the whole GLCM
stats = graycoprops(G, {'Contrast', 'Energy', 'Homogeneity'});

Depending on the feature values, the system can classify and cluster. At first, the system classifies document images, for which the mean is highest; then scene and caption images are classified on the basis of energy and contrast, as there is high contrast in scene images [4].

B. Training and Testing in WEKA

WEKA is a multipurpose data mining tool, and the proposed system uses it for classification and clustering. The system takes its input in the form of an ARFF file, and this input file is prepared from the image processing stage.

In WEKA, given as input a sample of m classified records (x, c(x)), a learning algorithm must provide a decision tree as output. Most algorithms follow a top-down approach, i.e., they consider the root of the tree first (usually a test) and then, recursively, choose the label of each child [11], [12], [13].

The following steps are followed in the proposed system:

1) Two .arff files are prepared, one for training and another for testing, with the values of the low-level image features mean, skewness, contrast, energy, and homogeneity.
2) Classification is performed on the .arff files using the J48 decision tree, which classifies images as document, caption, or scene images.
3) For clustering, the user classifier tree is applied to both the train and test .arff files, which likewise classifies images as document, scene, or caption images.
4) Finally, the output tree and the result are obtained as the percentage ratio of correctly classified images.

The aim is to construct a decision tree that can be written as follows:

If (Condition 1,1 and Condition 1,2 and Condition 1,3) => Non doc = FALSE
……………….
If (Condition i,1 and Condition i,2 and Condition i,3) => Non doc = FALSE
……………….
If (Condition M,1 and Condition M,2 and Condition M,3) => Non doc = FALSE
……………….
If no => Non doc = TRUE [9], [13]

where Condition i,1 is written as (w > … or w < …); Condition i,2 is written as (v > … or v < …); Condition i,3 is written as (h > … or h < …); and i = 1..M, where M is the number of conditions under which a pixel is interpreted as a non-doc pixel [9].

IV. SIMULATION & EVALUATION

A. Classification

The ARFF file prepared from the MATLAB simulation is loaded into WEKA, and the various image features are displayed in the form of histograms. WEKA calculates a statistical range and displays min and max values for all the image features: mean, skewness, energy, contrast, and homogeneity [13], [14].

Based on the feature statistics range, histograms for all three types of images are constructed as shown in Fig. 4, where blue represents document images, red represents caption images, and green represents scene images.

Fig 4: Histogram for each feature classifying images into various categories
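To make the feature extraction of Section III.A concrete, here is a small Python sketch of the five features. This is our illustration, not the paper's MATLAB code: numpy stands in for graycomatrix/graycoprops, the 8-level quantization and the horizontal-neighbour offset are assumptions, and the moment definitions follow Eqs. (2)-(5):

```python
import numpy as np

def histogram_moments(gray, levels=256):
    # P(x) = N(x)/N, Eq. (5)
    p = np.bincount(gray.ravel(), minlength=levels) / gray.size
    x = np.arange(levels)
    mean = (x * p).sum()                      # Eq. (2)
    variance = ((x - mean) ** 2 * p).sum()    # Eq. (3)
    skewness = ((x - mean) ** 3 * p).sum()    # Eq. (4)
    return mean, variance, skewness

def glcm_features(gray, levels=8):
    # Quantize to 8 gray levels, count horizontal neighbour pairs,
    # then read energy/contrast/homogeneity off the normalized GLCM.
    q = gray.astype(int) * levels // 256
    glcm = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[a, b] += 1
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    energy = (glcm ** 2).sum()
    contrast = (glcm * (i - j) ** 2).sum()
    homogeneity = (glcm / (1.0 + np.abs(i - j))).sum()
    return energy, contrast, homogeneity
```

For a perfectly flat image all pixels fall in one histogram bin and one GLCM cell, so the variance, skewness, and contrast are 0 while energy and homogeneity are 1, which matches the intuition that document images are more uniform than scene images.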

On pruning the J48 decision tree, the results can be obtained from the classifier output window shown in Fig. 5, which gives the correctly classified instances percentage ratio as 66.6667% and the incorrectly classified instances as 33.3333%. It also shows a "confusion matrix" that presents the correctly and incorrectly classified images in the form of a matrix.

Fig 5: Classifier output showing confusion matrix

Next, classification was performed using the J48 decision tree, which forms a tree based on information gain. Information gain constructs a decision tree by finding, at each step, the attribute that returns the highest information gain, i.e., the most suitable split [15]. For the dataset of the proposed system, WEKA selected mean and skewness as the features on whose basis the decision tree is constructed, as shown in Fig. 6.

Fig 6: J48 decision tree formed by WEKA

The conditions given below show the decision rules that are followed:

Infogain selected attribute: Mean
if (V > 160.1669) return Document;
if (V <= 160.1669) return false; Prune further;

Infogain selected attribute: Skewness
if (V > 1.0593) return Scene;
if (V <= 1.0593) return false; Prune further;

Infogain selected attribute: Mean
if (V <= 120.3411) return Caption;
if (V > 120.3411) return false; Prune further;

Infogain selected attribute: Skewness
if (V <= 0.2658) return Caption;
if (V > 0.2658) return false;

Infogain selected attribute: Skewness
if (V <= 0.3517) return Scene;
if (V > 0.3517) return Caption;
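Read end-to-end, the rule chain above amounts to the following sketch. The thresholds are those reported by WEKA; the order in which the pruned sub-tests fire is our assumption:

```python
def classify_textual_image(mean, skewness):
    # Thresholds taken from the WEKA decision rules listed above;
    # the firing order of the pruned tests is assumed.
    if mean > 160.1669:
        return "Document"
    if skewness > 1.0593:
        return "Scene"
    if mean <= 120.3411:
        return "Caption"
    if skewness <= 0.2658:
        return "Caption"
    return "Scene" if skewness <= 0.3517 else "Caption"
```

For example, an image with a high mean (bright, uniform background) falls into the Document branch on the very first test, mirroring the tree in Fig. 6.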
B. Clustering

Now the proposed system's datasets are loaded into WEKA as the training dataset, and the user classifier, which is based on clustering (the data visualizer), is selected. Various sets are submitted in the form of clusters for evaluation. Fig. 7 shows the number of instances and attributes in the loaded training and test sets.

The user classifier shows a sectional window to perform clustering on the dataset instances by selecting attributes according to the user's choice. Blue, red, and green show document, caption, and scene instances, respectively. The window provides a "data visualizer" for clustering and a "tree visualizer" for displaying the constructed decision tree [15], [16].

Incorrectly classified instances occur when the system predicts an image to be in one category while it belongs to another [16], [17]. These incorrectly classified instances are represented in the visualizer as squared blocks, as shown in Fig. 8.

Fig 8: Incorrectly classified instances
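The notion of incorrectly classified instances can be made concrete with a small helper (a sketch of ours; the class names and function shape are assumptions, not WEKA's API):

```python
def confusion_matrix(actual, predicted, classes=("document", "caption", "scene")):
    # counts[a][p] = number of images of actual class a predicted as class p;
    # the off-diagonal cells are the incorrectly classified instances.
    counts = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts

def incorrectly_classified(counts):
    # Sum of all off-diagonal cells of the confusion matrix.
    return sum(n for a, row in counts.items() for p, n in row.items() if a != p)
```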

Fig 7: Data and tree visualizer for clustering

1) Perform clustering by selecting the mean and skewness attributes and submitting a set as a cluster for classification.
2) The second set is submitted by selecting contrast and energy as attributes.
3) The third set is submitted by selecting energy and homogeneity as attributes.
4) The output tree is then obtained in the tree visualizer, as shown in Fig. 9, which shows the correctly and incorrectly classified images on the basis of the submitted attributes and instances.

The conditions given below show the decision rules that are followed:

Infogain selected attributes: Mean and Skewness
if (V = true) return Document;
else return false; Prune further;

Infogain selected attributes: Contrast and Energy
if (V = true) return Scene;
if (V = true) return Caption;
else return false; Prune further;

Infogain selected attributes: Energy and Homogeneity
if (V = true) return Scene;
if (V = true) return Caption;
else if (V = false) return incorrectly classified scene and caption.
V. IMPLEMENTATION RESULTS

The designed methodology used text-based images for analysis, and the textual images are classified based on the low-level features (including GLCM features) mean, skewness, energy, contrast, and homogeneity. The results obtained are explained in Table 1, which shows the number of images used for each type, i.e., 20 each for document, scene, and caption, for both classification and clustering, together with the number of correctly classified images obtained from the train and test sets.

Thus, under classification we obtained a 100% result for document images, 90% for caption images, and 85% for scene images. For clustering, on the other hand, document and scene images achieved a 100% result, while caption images achieved 80%. Overall, a 73.3333% result is obtained for clustering, which is higher than the 66.6667% obtained with the classification technique, as shown in Table 2. Beyond this, the results for both classification and clustering improve on earlier research efforts.

Fig 9: User classified tree output from clustering

Finally, Fig. 10 shows the pruned tree that is constructed and the result obtained, which presents the matrix of correctly and incorrectly classified images as a "confusion matrix", as well as the accuracy percentage ratio of 73.3333% in terms of correctly classified images.

Table 1: Number of images used for each type and number of correctly classified images

Image Type | Given images (Classification) | Given images (Clustering) | Correctly classified (Classification) | Correctly classified (Clustering)
Document   | 20 | 20 | 20 | 20
Caption    | 20 | 20 | 18 | 16
Scene      | 20 | 20 | 17 | 20

Table 2: Accuracy of each category depending on each image type

Category | Classification of images | Clustering of images
Accuracy | 66.6667% | 73.3333%
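The per-class percentages quoted above follow directly from the counts in Table 1; as a quick check (counts copied from the table, out of 20 images per type):

```python
# (correctly classified under classification, under clustering), out of 20 each
table1 = {
    "Document": (20, 20),
    "Caption": (18, 16),
    "Scene": (17, 20),
}

def per_class_accuracy(correct, total=20):
    # Percentage of correctly classified images for one image type.
    return 100.0 * correct / total

for image_type, (cls_ok, clu_ok) in table1.items():
    print(image_type, per_class_accuracy(cls_ok), per_class_accuracy(clu_ok))
```

This reproduces the 100%/90%/85% classification figures and the 100%/80%/100% clustering figures stated in the text.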

VI. FUTURE WORK

In the future, the proposed system can be enhanced by annotating and classifying text from videos, especially educational tutorials, news reports, etc. More effort can be made to obtain accurate classification in the present system, as well as for images that contain a combination of two or more categories within a single image. The selection of low-level features can also be modified and improved by using the Gray Level Run Length Matrix (GLRM).

Fig 10: Classifier output showing confusion matrix

VII. CONCLUSION

As more and more textual image databases become available on the web, mining these databases has become increasingly important. Earlier, images were classified based on high- and low-level features, RGB content, etc. The main issue is to obtain a classifier that gives an efficient and improved result while using the minimum number of features for classification.

Thus, the proposed system is very efficient at classifying textual images into three categories, namely document, scene, and caption. The technique uses a minimum number of low-level features (mean, skewness, energy, contrast, homogeneity), which are easy to calculate and classify with. The selected five features are enough to classify textual images into the three defined categories.

Beyond the classification followed in existing techniques and systems, the proposed system follows a clustering technique (via visualization) that results in an improved classification rate. The result is then compared with the classification technique followed in earlier research.

REFERENCES

1) S. Chitrakala, P. Shamini and D. Manjula, "Multi-class Enhanced Image Mining of Heterogeneous Textual Images Using Multiple Image Features," In: IEEE International Advance Computing Conference (IACC), pp. 496-501, IEEE (2009).
2) S. R. Safavian and D. Landgrebe, "A Survey of decision tree classifier methodology," In: IEEE Transactions on Systems, Man, and Cybernetics (1991).
3) T. Duong, J. Lim, H. Vu and J. Chevallet, "Unsupervised Learning for Image Classification based on Distribution of Hierarchical Feature Tree," In: IEEE International Conference on Research, Innovation and Vision for the Future (RIVF), pp. 306-310, IEEE (2008).
4) S. Nandgonkar, R. Jagtap, P. Anarase, B. Khadake and A. Betale, "Image mining of textual images using low-level image features," In: 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT), pp. 588-592, IEEE (2010).
5) Y. Liu, D. Zhang, G. Lu and W.-Y. Ma, "A survey of content-based image retrieval with high-level semantics," In: Pattern Recognition, pp. 262-282 (2007).
6) X. Huang, T. Zhao, C. Yu, M. Xiangming and T. Pierre, "Towards the improvement of textual anatomy image classification using image local features," In: Proceedings of the 2011 International ACM Workshop on Medical Multimedia Analysis and Retrieval, pp. 25-30, ACM (2011).
7) O. Augereau, N. Journet, A. Vialard and J. Domenger, "Improving the classification of an industrial document image database by combining visual and textual features," In: 11th IAPR International Workshop on Document Analysis Systems (DAS), pp. 314-318, IEEE (2014).
8) J. Wang, Y. He, C. Kang, S. Xiang and C. Pan, "Image-Text Cross-Modal Retrieval via Modality-Specific Feature Learning," In: Proceedings of the 5th ACM International Conference on Multimedia Retrieval, pp. 347-354, ACM (2015).
9) R. Datta, J. Li and J. Wang, "Content-based image retrieval: approaches and trends of the new age," In: Proceedings of the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, pp. 253-262, ACM (2005).
10) T. Ojala, M. Pietikäinen and D. Harwood, "A comparative study of texture measures with classification based on featured distributions," In: Pattern Recognition, pp. 51-59 (1996).
11) W. Moudani and A. Sayed, "Efficient Image Classification using Data Mining," In: International Journal of Combinatorial Optimization Problems and Informatics, p. 27 (2011).
12) K. Matsuo, K. Ueda and M. Umeda, "Extraction of character string from scene image by binarizing local target area," In: Transactions of the Institute of Electrical Engineers of Japan, pp. 232-241 (2002).
13) B. Rafkind, M. Lee, S.-F. Chang and H. Yu, "Exploring text and image features to classify images in Bioscience literature," In: Proceedings of the BioNLP Workshop on Linking Natural Language Processing and Biology, pp. 73-80, Association for Computational Linguistics (2006).
14) L. Breiman, J. Friedman, C. Stone and R. Olshen, "Classification and Regression Trees," CRC Press (1984).
15) J. R. Smith and S.-F. Chang, "Transform features for texture classification and discrimination in large image databases," In: IEEE International Conference on Image Processing, pp. 407-411, IEEE (1994).
16) Adegorite, O. Basir, M. Kamel and K. Shaban, "An approach to mining picture objects based on textual cues," In: Machine Learning and Data Mining in Pattern Recognition, pp. 466-475, Springer Berlin Heidelberg (2005).
17) L. Tian, D. Zheng and C. Zhu, "Research on image classification based on a combination of text and visual features," In: Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pp. 1869-1873, IEEE (2011).
