
International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism Ankara, Turkey, 3-4 Dec, 2018

Font and Turkish Letter Recognition in Images with Deep Learning

Aylin Sevik, Pakize Erdogmus, Erdi Yalcin
Computer Engineering, Duzce University, Duzce, TURKEY
aylin.sevik.ae@gmail.com, pakizeerdogmus@duzce.edu.tr, erdiyalcin@duzce.edu.tr

Abstract — The purpose of this study is to recognize letters, and especially the font, in images containing text. To perform recognition, the text in the image is first divided into letters, and each letter is sent to the recognition system. The results are filtered according to the vowels most used in Turkish texts, and the font of the text is obtained. To separate the letters from the text, we developed our own segmentation algorithm. The algorithm takes into account Turkish characters that carry dots or accents, such as i, j, ü, ö and ğ, and helps the system perceive these characters as a whole: all part combinations were enumerated for each of these characters, and the algorithm was formed accordingly. After each character is separated, the individual parts are sent to a pre-trained deep convolutional neural network. A data set has been created for this pre-trained network, containing nearly 13 thousand letter images of size 227*227*3, covering different point sizes, fonts and letters. As a result, 100% success has been attained in training, and 79.08% letter and 75% font success has been attained in the tests.

Keywords—deep learning, convolutional neural networks, font recognition, letter recognition

978-1-7281-0472-0/18/$31.00 ©2018 IEEE

IBIGDELFT2018 61

I. INTRODUCTION

In recent years, humanity has been trying to carry out the operations of daily life on digital systems, reducing and automating human effort. This automation requirement enabled the creation of intelligent systems and provided an environment for applying the systems called artificial intelligence and machine learning [18]. Many studies on deep learning have been made and continue to be done [19, 20]. Deep learning is a method that simulates the structure of the human brain [21]. It is a series of algorithms for finding a hierarchical representation of the input data by simulating the way the human brain senses the important parts of the sensory data it is exposed to at all times [1]. The idea at the basis of deep learning emerged in the 1950s with the definition of the perceptron, the first machine with the ability to learn. In the 1980s, the multilayer perceptron structure was identified, but the perceptron still had limited learning ability. Thus, the proposal of neural networks with many layers emerged in the 2000s, and the structure that followed this proposal has been better capable of learning. These multiple layers are the infrastructure of deep learning [2, 3].

Convolutional neural networks, which show high performance in a variety of studies on images, are known as a deep learning model that delivers enhanced results. This model is useful for finding patterns in images to recognize objects, faces, and scenes. Such networks learn directly from image data, using patterns to classify images and eliminating the need for manual feature extraction. The model has a multilayer structure, each layer comprising a plurality of two-dimensional planes and a plurality of neurons in each plane [9]. These layers can be examined in three sections: input layers, hidden layers and output layers. While all the complex processes required for learning take place in the hidden layers, the data enters the system through the input layer and the result is obtained from the output layer. Because the network consists of more than one layer, the neurons can perform the learning action in parallel. In addition, whereas in classical machine learning the answer is only 1 or 0, the output of studies using this network structure can take values between 0 and 1, such as 0.2 and 0.7. This makes it easier to solve the problem in a more detailed way, increases the success in learning and provides better results. After features have been learned in many layers, the next part of a convolutional neural network is classification. The next-to-last layer is a fully connected layer that outputs a vector of x dimensions, where x is the number of classes the network will be able to predict. This vector contains the probabilities for each class of the image being classified. The final layer of the architecture uses a classification layer to provide the classification output.

This paper has been divided into five parts. The first section gives brief background on deep learning. The second section reviews the related works. The third section is concerned with the methodology used for this study. The fourth section presents the findings of the research. Finally, we provide the conclusion.

II. RELATED WORKS

In recent years, much more information has become available on deep learning, and a considerable amount of literature has been published on the topic. Even though convolutional neural networks were improved in the 1990s, they have gained popularity only in the last decade. Increasing amounts of data, unsolved problems in image and video processing, and the insufficiency of earlier learning methods increased the popularity of deep learning, so most studies focus on the learning efficiency of convolutional neural networks [14, 15]. Today, with their very successful learning abilities, deep learning networks have been used for classification, detection and diagnosis [4, 6, 8, 11, 12, 13].

Letter recognition in digitized documents is very important for archiving, and some digitized documents contain both images and text. An analog neural network processor was designed and implemented for optical letter recognition in 1991 [10]; in that study, handwritten digits of size 20x20 were used. In another study, visual entity identification was realized with a neural network [5]. In 1993, Chinese character recognition was realized using multiple convolutional neural networks [7].

Font recognition has been studied by some researchers. Character-independent font recognition on a single Chinese character has been studied [22]; as stated in the article, the recognition accuracy increased with the number of characters used for recognition. Another study is on Arabic font recognition: the proposed system was based on steerable pyramids and showed high performance [23].

III. FONT RECOGNITION

In most studies, handwritten letter recognition has been implemented for different languages. One study was found in which two-dimensional optical letter recognition based on deep learning supports Turkish letters [16]. No font recognition study for Turkish was found in the literature. This study aims to recognize the font of the text in an image and to extract both the letter and the font information. As far as we have researched, this is the first study on fonts that supports Turkish letters.

Primarily we recognized the letters, and secondly we recognized the fonts of the letters. Our letter recognition methodology covers the Turkish letters.

This section is organized as follows. The first part explains the preparation of the data; the second part, the training of the network; the third part, the image processing applied before an image is sent to the network; and the fourth part, the way we separate the letters. The final part shows the results.

A. DataSet

Since no prepared data set was found in the literature, a new data set was created for this study. In the first part, the data set was categorized by letters, and in the second part by fonts. In the letter categorization, images have been created for each of the 29 letters; only letters have been included, digits have been excluded. Since images may contain letters of different sizes, three point sizes have been selected, so that big, medium and small letter images have been created for each lower- and uppercase letter: 72 point as big, 20 point as medium and 8 point as small. After the preparation of all the letter images, the images have been resized to 227*227 for the deep convolutional neural network. In this way, 228 letter images for one letter and 6612 letter images for the whole alphabet have been created.

Another data set has been created by selecting 38 fonts that support Turkish letters and using these fonts for each of the 29 letters of the Turkish alphabet. The letters have been categorized according to fonts; upper and lower cases have been included. In total, 38 fonts have been used for one letter, giving 174 letter images for one font and 6612 letter images over all fonts. Thus, the preparation part has been completed. Some examples from the database are given in Fig. 1.

Fig. 1. Examples from dataset

B. Creation of Deep Learning Network

In this study we used AlexNet, which is included in the deep learning toolbox of Matlab©, to create our neural network.

AlexNet has been trained to classify a thousand different kinds of images, on over a million images. It has 25 layers, most of which perform useful image processing that works for our system.

First, the training images have been shown to the network. Secondly, the data have been split into training and test images; 80 percent of the images have been chosen randomly for training. After these steps, the pre-trained network, AlexNet, has been modified: the 23rd layer of AlexNet has a thousand neurons in it, because AlexNet classifies a thousand different image classes, and it has been modified to 29 neurons for the first network and 38 for the second. The next step was performing transfer learning on the network. For transfer learning, the weights of the network have been adjusted; how much a network changes during training is controlled by the learning rates. After all network parameters had been set up, the training process has been realized. The training process is shown in Fig. 2.

After training has finished, the network has been tested. Since the accuracy was acceptable, the network has been saved.

Fig. 2. Training process scheme

C. Preprocessing of Image

The image has been processed before being sent to the network. The operations are listed below.

• In the first step, the image is converted to intensity (grayscale) format.
• In the second step, the image is converted to binary format.
• In the third step, the image is converted to its complement, as required by the morphological image processing.
• In the final step, morphological image processing is used to find the location of each letter. We have used a function that returns the label matrix containing labels for the 8-connected (horizontal, vertical and diagonal) objects in the image.

Fig. 3. Preprocessing Scheme
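The preprocessing chain of Section C can be sketched in pure Python. This is a minimal illustration, not the paper's Matlab code: the threshold value of 128 is our assumption, steps two and three (binarization and complementing) are folded into one function, and `label_8connected` mimics what MATLAB's `bwlabel(BW, 8)` returns.

```python
def to_binary_complement(gray, threshold=128):
    """Steps 2-3: binarize the intensity image and take its complement,
    so that dark letter pixels become foreground 1s."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def label_8connected(img):
    """Step 4: label 8-connected foreground components (cf. MATLAB's
    bwlabel(BW, 8)); returns the label matrix and the component count."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not labels[y][x]:
                count += 1
                stack = [(y, x)]
                labels[y][x] = count
                while stack:  # flood fill over all 8 neighbours
                    cy, cx = stack.pop()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and \
                                    img[ny][nx] and not labels[ny][nx]:
                                labels[ny][nx] = count
                                stack.append((ny, nx))
    return labels, count

# A tiny grayscale "i": the dot and the stem are separate dark blobs.
gray = [[255,   0, 255],
        [255, 255, 255],
        [255,   0, 255],
        [255,   0, 255]]
labels, count = label_8connected(to_binary_complement(gray))
print(count)  # 2: the dot and the stem receive different labels
```

On a real page image, this label matrix is exactly what the separation step of Section D consumes to decide which fields belong to one letter.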

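The network modification in Section B (replacing AlexNet's thousand-neuron 23rd layer with a 29-class head for letters and a 38-class head for fonts) amounts to swapping in a freshly initialised fully connected layer. A hedged sketch follows, representing the layer as a plain weight matrix; the 4096-feature input size and the initialisation scale are assumptions on our part, not values taken from the paper.

```python
import random

FEATURES = 4096  # width of the activations feeding the last layer (assumed)
LETTERS = 29     # Turkish alphabet classes for the first network
FONTS = 38       # font classes for the second network

def new_head(n_classes, n_features, scale=0.01):
    """A freshly initialised fully connected layer (weights only) that
    replaces the original thousand-neuron classification layer."""
    return [[random.gauss(0.0, scale) for _ in range(n_features)]
            for _ in range(n_classes)]

# Transfer learning keeps the pre-trained feature layers and trains
# these new heads (plus, gently, the earlier weights via learning rates).
letter_head = new_head(LETTERS, FEATURES)
font_head = new_head(FONTS, FEATURES)
print(len(letter_head), len(font_head))  # 29 38
```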

D. Separation of Letter

After the preprocessing steps have been finished, the letters have been separated and found by labeling the image. All steps are listed below.

• The corner coordinates of all labelled fields have been found.
• Whether a labelled field is a letter or part of a letter has been decided by making proportional calculations.
• If the letter is an accented letter such as "ğ" or a dotted letter such as "i, j, ü or ö", the coordinates of the labelled field have been updated according to these parts of the letter.
• According to the algorithm, there are three possibilities. In the first cycle, if the algorithm shows that the first field is part of a letter and the second field is a letter, then this is the letter "i". Else, if the first field is a letter and the second field is part of a letter, this could be the letter "j" or "ğ". Else, if the first field is a letter and the second and third fields are parts of a letter, this could be the letter "ü" or "ö". If one of the possibilities holds, the coordinates are updated.
• After the coordinates of all letters, including the Turkish letters, have been defined, the image has been converted to a pseudo-RGB format again, only because the network requires it.
• The size has been scaled to 227*227.
• Each letter has been sent to the trained letter network. According to the network output label, the related letter has been added to the text.
• After letter recognition, each letter has been sent to the font network. According to the network output label, the related font name has been saved.
• After the font of each letter has been attained separately, it has been seen that the network result can differ from letter to letter. Although the text of the test image has a single font, the font found for each letter can be diverse because of the difficulty of font recognition. We overcome this difficulty with a probability calculation: the resulting font is reported together with a probability.

The flow of recognition is shown in Fig. 4.

Fig. 4. Flow chart of recognition

E. Printout of the Results

The results are shown in two parts: the first part is the training results and the second part is the recognition results. The following tables show the training results.

TABLE I. TRAINING FONT NETWORK RESULTS

Epoch | Iteration | Time Elapsed | Mini-batch Accuracy | Mini-batch Loss | Base Learning Rate
 1 |    1 | 00:00:01 |   1.56% | 4.3096 | 0.0010
 2 |  200 | 00:06:34 |  17.19% | 2.5671 | 0.0010
 3 |  250 | 00:08:13 |  28.13% | 2.3336 | 0.0010
 4 |  450 | 00:14:48 |  37.50% | 1.9929 | 0.0010
 5 |  600 | 00:19:45 |  39.06% | 1.8260 | 0.0010
 6 |  700 | 00:23:02 |  50.00% | 1.6559 | 0.0010
 7 |  850 | 00:27:58 |  43.75% | 1.4769 | 0.0010
 8 | 1000 | 00:32:55 |  60.94% | 1.2711 | 0.0010
 9 | 1350 | 00:44:26 |  65.63% | 0.8812 | 0.0010
10 | 1640 | 00:56:41 |  73.44% | 0.8055 | 0.0010

TABLE II. TRAINING LETTER NETWORK RESULTS

Epoch | Iteration | Time Elapsed | Mini-batch Accuracy | Mini-batch Loss | Base Learning Rate
 1 |    1 | 00:00:01 |   3.13% | 3.8619 | 0.0010
 2 |  200 | 00:05:58 |  90.63% | 0.2500 | 0.0010
 3 |  400 | 00:11:55 |  95.31% | 0.1866 | 0.0010
 4 |  600 | 00:17:52 | 100.00% | 0.0474 | 0.0010
 5 |  800 | 00:23:49 |  98.44% | 0.0217 | 0.0010
 6 | 1000 | 00:29:46 | 100.00% | 0.0151 | 0.0010
 7 | 1200 | 00:35:43 | 100.00% | 0.0094 | 0.0010
 8 | 1400 | 00:41:40 |  98.44% | 0.02165 | 0.0010
 9 | 1600 | 00:47:43 | 100.00% | 0.0034 | 0.0010
10 | 1940 | 00:57:53 | 100.00% | 0.0078 | 0.0010

In order to increase the accuracy of font recognition, the results have been validated once again using the most used Turkish letters: "a" and "e" are the most commonly used letters in the Turkish alphabet [17]. For validating the results, after all letters and all fonts of the letters have been found, the frequency of each letter's font is counted, and the resulting font of the image is decided according to the highest frequency. The network has been trained for 38 different fonts and tested with 12 test images, whose properties are given in Table III. Lorem Ipsum images have been created for testing, selecting some of the most and least used fonts. Some of the test images are shown in Fig. 5 and Fig. 6.

Fig. 5. Black 72 point lorem ipsum text as a test image

Fig. 6. Blue 8 point lorem ipsum text as a test image
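The three possibilities in the separation algorithm of Section D can be sketched as a small decision function. The "L"/"P" encoding (full letter versus letter part, in label order) is our own illustration of the rule, not the paper's implementation.

```python
def merge_parts(fields):
    """Apply the three possibilities to a run of labelled fields, each
    pre-classified as a full letter ("L") or a part ("P") such as a dot
    or an accent. Returns the candidate letter(s) whose coordinates
    should be merged, or None when nothing needs merging."""
    if fields[:2] == ["P", "L"]:
        return "i"                    # a part, then a letter body -> i
    if fields[:2] == ["L", "P"]:
        if fields[2:3] == ["P"]:
            return "ü or ö"           # a letter followed by two parts
        return "j or ğ"               # a letter followed by one part
    return None                       # a plain letter, nothing to merge

print(merge_parts(["P", "L"]))        # i
```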

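The frequency-based font decision of Section E (count each letter's predicted font, pick the most frequent one, and accept it when its share of the votes exceeds 0.5) can be sketched as follows; the font names in the example are hypothetical per-letter outputs, not measured results.

```python
from collections import Counter

def decide_font(per_letter_fonts):
    """Majority vote over the per-letter font predictions: return the
    most frequent font together with its share of the votes."""
    font, votes = Counter(per_letter_fonts).most_common(1)[0]
    return font, votes / len(per_letter_fonts)

# Hypothetical per-letter outputs of the font network for one image:
predictions = ["Arial", "Arial", "Franklin Gothic", "Arial", "Bahnschrift"]
font, share = decide_font(predictions)
print(font, share)  # Arial 0.6 -> accepted, since the share exceeds 0.5
```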

TABLE III. PROPERTIES OF TEST IMAGES

Fonts          | Points
Arial          | 8, 20, 72
Bahnschrift    | 8, 20, 72
Century Gothic | 8, 20, 72
Juice ITC      | 8, 20, 72

Test results have been given in Table IV.

TABLE IV. RESULTS OF TESTING

Fonts          | 8 point                       | 20 point            | 72 point    | Result
Arial          | 14/26: True                   | 23/44: True         | 11/21: True | 100%
Bahnschrift    | 3/32: False (Franklin Gothic) | 9/44: False (Arial) | 7/21: True  | 33%
Century Gothic | 17/30: True                   | 24/44: True         | 13/21: True | 100%
Juice ITC      | 3/9: False (Courier New)      | 34/43: True         | 20/21: True | 66%
Total Average Success: 75%

IV. RESULTS AND CONCLUSION

In this study, the aim has been to develop a deep network recognizing both fonts and letters in Turkish. With this aim, a pre-trained network has been trained with nearly 13 thousand images. The letter recognition training accuracy was 100%, while the font recognition training accuracy was 73.44%, because of the similarity of the fonts. In order to increase the font recognition percentage, a probability calculation has been applied after the network output has been found. Even though the first test image's font accuracy is 14/26, it has been accepted as Arial because the probability is bigger than 0.5. In this way, the recognition performance has been increased a bit more. The network has been tested with 12 images, which contain all the letters. According to the results, letter recognition with this network reaches nearly 100%, but the accuracy of font recognition is low, as can be seen from Table IV; using the probability, the font recognition percentage has been increased. In future studies, a GUI will be developed, the number of tests will be increased, and the possible fonts will be shown with their respective probabilities.

REFERENCES

[1] E. Bati, "Deep convolutional neural networks with an application towards geospatial object recognition," Diss. Middle East Technical University, Ankara, 2014.
[2] O. Elitez, "Handwritten digit string segmentation and recognition using deep learning," Diss. Middle East Technical University, Ankara, 2015.
[3] M. U. Oner, "Metastasis detection and localization in lymph nodes by using convolutional neural networks," Diss. Middle East Technical University.
[4] M. Oquab, L. Bottou, I. Laptev and J. Sivic, "Learning and transferring mid-level image representations using convolutional neural networks," 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, 2014, pp. 1717-1724.
[5] C. Riley, P. Work and R. Miller, "Visual entity identification: a neural network approach," IJCNN-91-Seattle International Joint Conference on Neural Networks, Seattle, WA, USA, 1991, p. 909 vol. 2.
[6] A. Caliskan, H. Badem, A. Basturk and M. E. Yuksel, "A comparative study on classification by deep learning," 2016 National Conference on Electrical, Electronics and Biomedical Engineering (ELECO), Bursa, 2016, pp. 503-506.
[7] Q. Wu, Y. L. Cun, L. D. Jackel and B. Jeng, "On-line recognition of limited-vocabulary Chinese character using multiple convolutional neural networks," 1993 IEEE International Symposium on Circuits and Systems, Chicago, IL, 1993, pp. 2435-2438 vol. 4.
[8] Y. Le Cun and Y. Bengio, "Word-level training of a handwritten word recognizer based on convolutional neural networks," Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5), Jerusalem, Israel, 1994, pp. 88-92 vol. 2.
[9] P. Kuang, W. Cao and Q. Wu, "Preview on structures and algorithms of deep learning," 2014 11th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, 2014, pp. 176-179.
[10] B. E. Boser, E. Sackinger, J. Bromley, Y. LeCun, R. E. Howard and L. D. Jackel, "An analog neural network processor and its application to high-speed character recognition," IJCNN-91-Seattle International Joint Conference on Neural Networks, Seattle, WA, USA, 1991, pp. 415-420 vol. 1.
[11] Y. Bar, I. Diamant, L. Wolf, S. Lieberman, E. Konen and H. Greenspan, "Chest pathology detection using deep learning with non-medical training," 2015 IEEE International Symposium on Biomedical Imaging (ISBI), New York, NY, 2015, pp. 294-297.
[12] Y. Yuan, L. Mou and X. Lu, "Scene recognition by manifold regularized deep learning architecture," IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 10, pp. 2222-2233, Oct. 2015.
[13] Z. Hu, J. Tang, Z. Wang, K. Zhang, L. Zhang and Q. Sun, "Deep learning for image-based cancer detection and diagnosis - a survey," Pattern Recognition, vol. 83, pp. 134-149, 2018, ISSN 0031-3203.
[14] M. A. Abbas, "Improving deep learning performance using random forest HTM cortical learning algorithm," 2018 First International Workshop on Deep and Representation Learning (IWDRL), Cairo, 2018, pp. 13-18.
[15] Y. Wang, "Cognitive foundations of knowledge science and deep knowledge learning by cognitive robots," 2017 IEEE 16th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC), Oxford, 2017, pp. 5-5.
[16] A. Koyun and E. Afsin, "2D optical character recognition based on deep learning," Journal of Turkey Informatics Foundation of Computer Science and Engineering, vol. 10, no. 1, pp. 11-14, 2017.
[17] S. I. Ilkin and M. Akin, "Attacking Turkish texts encrypted by homophonic cipher," Proceedings of the 10th WSEAS International Conference on Electronics, Hardware, 2011.
[18] S. H. Tajmir and T. K. Alkasab, "Toward augmented radiologists: changes in radiology education in the era of machine learning and artificial intelligence," Academic Radiology, vol. 25, pp. 747-750, 2018.
[19] J. M. Valin, "A hybrid DSP/deep learning approach to real-time full-band speech enhancement," 2017.
[20] Y. Zhou and O. Tuzel, "VoxelNet: end-to-end learning for point cloud based 3D object detection," 2017.
[21] R. Wason, "Deep learning: evolution and expansion," Cognitive Systems Research, vol. 52, pp. 701-708, 2018.
[22] X. Ding, L. Chen and T. Wu, "Character independent font recognition on a single Chinese character," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 2, pp. 195-204, 2007.
[23] F. K. Jaiem, F. Slimane and M. Kherallah, "Arabic font recognition system applied to different text entity level analysis," 2017 International Conference on Smart, Monitored and Controlled Cities (SM2C), Sfax, 2017, pp. 36-40.

