
SLANT ESTIMATION ALGORITHM FOR OCR SYSTEMS

E.Kavallieratou, N.Fakotakis, and G.Kokkinakis


Wire Communications Laboratory, University of Patras, 26500 Patras, Greece. Tel. ++30-61-991722, fax ++30-61-991855, e-mail: ergina@wcl.ee.upatras.gr

Abstract: A slant removal algorithm is presented, based on the vertical projection profile of word images and the Wigner-Ville distribution. The slant correction does not affect the connectivity of the word, and the resulting words look natural. The algorithm was evaluated by both subjective and objective means. It has been tested on English and Modern Greek samples from more than 500 writers, taken from the IAM-DB and GRUHD databases. The results are natural and almost always improved with respect to the original image, even in the case of variant-slanted writing. The performance of an existing character recognition system increased by up to 9% on the same data, while the training time cost was significantly reduced. Owing to its simplicity, the algorithm can easily be incorporated into any optical character recognition system.

Keywords: Slant Removal, Script Writing, Wigner-Ville Distribution, Character Recognition

1 INTRODUCTION

Handwritten text is usually characterized by slanted characters, which slope either from right to left or from left to right. Moreover, different slants may appear not only within a text but also within a single word. Some examples illustrating these cases are shown in fig.1. Slanted characters are a common feature of any natural language with a Latin-style alphabet (e.g., English, Modern Greek, etc.). For example, the percentage of slanted writing in the IAM-DB [1] database of English reaches 77%, while the corresponding percentage in GRUHD [2], a database of Modern Greek that includes 1000 writers, approaches 59%. These rates were obtained by manually counting the forms with apparent slant either to the left or to the right, as judged by a human. Furthermore, slanted characters appear in both hand-printed and cursive writing. Consequently, a robust Optical Character Recognition (OCR) system has to be able to cope with slanted characters.

Watanabe [3] conducted comparative experiments showing that slant normalization minimizes the recognition error. More specifically, slanted characters may cause two major problems for an OCR system. First, a character segmentation procedure that produces vertical segment boundaries (e.g., one based on histograms), if such a procedure is required, could yield defectively segmented characters as well as noisy segments when applied to slanted words. Second, both the computational cost of the training procedure and the accuracy of the recognition stage would be negatively affected. Indeed, for a single character, the amount of training data required to cover as many slant angles as possible is substantial. Moreover, classifying a given character into the correct class is much harder, since the set of possible classes is now bigger.

Therefore, the majority of recent OCR systems contain a preprocessing stage dealing with slant correction [4-7]. This stage is usually located before the segmentation module, if one exists, or just before the recognition stage otherwise.

The most commonly used method for slant estimation is the calculation of the average angle of near-vertical strokes [4-6]. This approach requires the detection of the edges of the characters, and its accuracy depends on the particular characters included in the word. Shridar [7] presents two more methods for slant estimation and correction: the first uses the vertical projection profile, while the second uses the chain code of the entire border pixels. Vinciarelli [8] presents a technique based on a function $S(\alpha)$ that provides a measure of the slant absence across the word. The calculation of this function relies on the vertical density histogram, which is easier to obtain than the direction of the strokes. For each angle $\alpha$ in an interval [-15, 15], a shear transform $t_\alpha$ is applied and the following histogram is calculated:

$$H_\alpha(m) = \frac{h_\alpha(m)}{\Delta y_\alpha(m)}, \qquad 0 \le m < nCol$$

where $h_\alpha(m)$ is the value of the vertical density histogram of the image shear-transformed by the angle $\alpha$, $\Delta y_\alpha(m)$ is the difference between the maximum and minimum y coordinates of the foreground pixels in column $m$, and $nCol$ is the number of columns in the image. The number of foreground pixels of each column is divided by the distance between the highest and the lowest pixel, giving $H_\alpha(m) = 1$ if the column contains a continuous stroke, and $H_\alpha(m) \in [0, 1]$ otherwise. Then, the following quantity is computed:

$$S(\alpha) = \sum_{\{i : H_\alpha(i) = 1\}} h_\alpha(i)^2$$

The value of $\alpha$ for which $S(\alpha)$ is maximum is taken as the slant estimate, and the corresponding shear transform $t_\alpha$, applied to the desloped original image, gives the deslanted image. Although the shearing transformation is time consuming, this method was shown to decrease the error rate by 12% compared with the method proposed by Bozinovic [9].

However, evaluating slant correction approaches is difficult, since the slant may vary even within a single word. Additionally, in the relevant literature a slant correction procedure is rarely evaluated separately, so comparative results cannot be given.

In this paper, a new method for slant estimation is presented, based on a combination of the projection profile technique and the Wigner-Ville distribution. Moreover, it uses a simple and fast shearing transformation technique. Our approach is character independent and can easily be adapted to satisfy the requirements of any OCR system. Its performance is measured in terms of the improvement it brings to the results of a character recognition system.

In the next section, the Wigner-Ville distribution (WVD) is briefly presented. The algorithm is described in section 3. Finally, experimental results are given in section 4, and the conclusions drawn from this study are presented in section 5.
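As an illustration of the slant-absence measure of [8] summarized earlier in this section, the following minimal Python sketch computes $S(\alpha)$ for a binary image that has already been shear-transformed by $\alpha$. It rests on several assumptions that are not stated in the text: foreground pixels are nonzero, the summed term is the squared column density $h_\alpha(i)^2$, and the column extent is counted in pixels (highest minus lowest row, plus one) so that an unbroken stroke gives $H_\alpha = 1$. The function name is hypothetical.

```python
import numpy as np

def slant_absence(sheared):
    """S(alpha) for a binary image already sheared by alpha (foreground != 0)."""
    img = np.asarray(sheared) != 0
    score = 0
    for column in img.T:                        # iterate over image columns
        rows = np.flatnonzero(column)           # row indices of foreground pixels
        if rows.size == 0:
            continue                            # empty column contributes nothing
        density = int(rows.size)                # h_alpha(m): foreground pixel count
        extent = int(rows[-1] - rows[0]) + 1    # assumed pixel extent of the column
        if density == extent:                   # H_alpha(m) == 1: continuous stroke
            score += density ** 2
    return score
```

Under these assumptions, the slant estimate would be the candidate angle whose sheared image gives the largest score.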

2 WIGNER-VILLE DISTRIBUTION

An important topic in signal processing is the analysis of non-stationary signals, that is, signals whose characteristics vary with time, in contrast to stationary signals, whose characteristics are time-independent. To obtain a better representation of non-stationary signals, joint time-frequency distributions are used.

A first class of time-frequency representations is the atomic decompositions, or linear time-frequency representations. These decompose the signal on a basis of elementary signals (the atoms), which have to be well localized in both time and frequency. However, there is a trade-off between time and frequency resolution, as the decomposition is achieved by windowing the signal: a good time resolution requires a short window, whereas a good frequency resolution requires a long window. This is a consequence of the Heisenberg-Gabor inequality [10], which relates time and frequency resolution.

In contrast, the energy distributions, another class of time-frequency representations, distribute the energy of the signal over two description variables: time and frequency. The starting point is that, since the energy $E_x$ of a signal $x$ can be deduced from the squared modulus of either the signal or its Fourier transform,
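$$E_x = \int_{-\infty}^{+\infty} |x(t)|^2 \, dt = \int_{-\infty}^{+\infty} |X(f)|^2 \, df, \tag{1}$$

we can interpret $|x(t)|^2$ and $|X(f)|^2$ as energy densities in time and in frequency, respectively. It is then natural to look for a joint time and frequency energy density $\rho_x(t, f)$ such that

$$E_x = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \rho_x(t, f) \, dt \, df, \tag{2}$$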

which is an intermediate situation between the two described by (1). As the energy is a quadratic function of the signal, the time-frequency energy distributions will in general be quadratic representations. A very well-known representative of the energy distributions, and a member of Cohen's class [11], is the Wigner-Ville distribution:
$$W(t, f) = \int_{-\infty}^{+\infty} z(t + \tau/2) \, z^*(t - \tau/2) \, e^{-j 2\pi f \tau} \, d\tau, \tag{3}$$

where $z(t)$ represents the analytic signal associated with the signal $s(t)$. The WVD can also be expressed in terms of the spectrum $Z(f)$ of the signal under analysis, as follows:
$$W(t, f) = \int_{-\infty}^{+\infty} Z(f + u/2) \, Z^*(f - u/2) \, e^{j 2\pi u t} \, du. \tag{4}$$

The WVD is a particularly popular distribution owing to the large number of desirable mathematical properties it satisfies. Claasen and Mecklenbrauker [10] prove the uniqueness of the Wigner-Ville distribution, in the sense that it is the only energy distribution that possesses all the stated desirable properties. This justifies the numerous applications of the WVD in pattern recognition [12-13], signal synthesis [14], seismic signal processing [15], optics [16], etc.
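To make definition (3) concrete, here is a minimal sketch of one common discretization of the WVD using NumPy. It is an illustration under stated assumptions, not the implementation used in this work: the analytic signal is obtained by zeroing the negative-frequency half of the FFT, the lag variable is restricted to the samples available around each time index, and frequency bin k of an N-sample signal corresponds to k/(2N) cycles per sample. The function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal z of a real sequence x (negative frequencies removed)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    gain = np.zeros(n)
    gain[0] = 1.0                       # keep the DC term once
    gain[1:(n + 1) // 2] = 2.0          # double the positive frequencies
    if n % 2 == 0:
        gain[n // 2] = 1.0              # keep the Nyquist term once
    return np.fft.ifft(np.fft.fft(x) * gain)

def wigner_ville(x):
    """Discrete WVD; rows are frequency bins, columns are time indices."""
    z = analytic_signal(x)
    n = z.size
    wvd = np.zeros((n, n))
    for t in range(n):
        max_lag = min(t, n - 1 - t)     # lags that stay inside the signal
        lags = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n, dtype=complex)
        # Local autocorrelation z(t + m) z*(t - m), stored at index m mod n.
        kernel[lags % n] = z[t + lags] * np.conj(z[t - lags])
        # The kernel is conjugate-symmetric in m, so its FFT is real.
        wvd[:, t] = np.fft.fft(kernel).real
    return wvd
```

For a pure tone, the energy in each time column of this array concentrates at the bin matching twice the tone frequency, which is the factor-of-two scaling typical of discrete WVDs built from integer lags.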

3 THE ALGORITHM

The vertical projection profile of a non-slanted word presents the deepest dips between the characters, even if they are connected, and the highest peaks at the main body of the characters. The peaks are much more evident when ascenders and descenders are included. In slanted words, the otherwise vertical strokes of the characters now cover the inter-character gaps. Hence, the dips of the histogram are less deep, while the peaks are smoother. In other words, the alternations between dips and peaks in the vertical projection profile of a word are more intense when the word is non-slanted than when it is slanted. In fig.2 the vertical projection profiles for different slants of the same word are shown. These slanted words were produced automatically by applying the technique described below.

The proposed algorithm for slant estimation and correction of a given word consists of six steps:

1. The word image is artificially slanted to both left and right for different slant angles. The maximum slant angle is approximately 45 degrees, and the slant angle step depends on the height of the word image, as described below.
2. For each of the resulting word images, the vertical histogram is calculated.
3. The WVD is calculated for all the above histograms.
4. The curves of maximum intensity of the WVDs are extracted.
5. The curve of maximum intensity with the greatest peak, corresponding to the histogram with the most intense alternations, is selected. In fig.3 several curves of maximum intensity are shown.
6. The corresponding word image is selected as the most non-slanted word.

To slant a word image to the right or left, we follow the procedure below. The word image is divided into horizontal zones of equal height. The lowest zone is considered to be the base. The zone above the base is shifted one pixel to the right or to the left. The next zone (if it exists) is shifted two pixels in the same direction, and so on. Thus, each pixel $p(x, y)$ of the image is shifted to the point $(x', y')$:

$$x' = x + i, \qquad -Z < i < Z, \quad 0 < Z \le h, \tag{5}$$

$$y' = y, \tag{6}$$

where $Z$ is the number of zones and $h$ is the height of the image. The corresponding slant angle $\theta$ is given by

$$\tan\theta = \frac{Z - 1}{h}. \tag{7}$$

The more zones there are, the greater the slant angle. The maximum slant angle corresponds to one-pixel-high zones (i.e., when the number of zones equals the height $h$ of the word image in pixels). In this case, the highest zone is shifted by $h - 1$ pixels, and the corresponding slant angle $\theta$ is the maximum one:

$$\tan\theta = \frac{h - 1}{h} \approx 1 \;\Rightarrow\; \theta \approx 45^\circ. \tag{8}$$

To illustrate the above procedure, the gradual slanting of a vertical stroke is shown (in enlargement) in fig.4, while in fig.5 we can see the maximum slant of a non-slanted word to left and right. Notice that the words produced by that technique are very natural (as can be seen in fig.2). A maximum slant angle of 45 degrees covers the vast majority of handwritings. Shifting each zone by more than one pixel would increase the maximum slant angle. However, it could cause undesirable disconnections between adjacent zones producing an unnatural outcome.
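The zone-shifting procedure described in this section can be sketched in Python as follows. This is a minimal illustration that makes a few assumptions: the image is a 2-D NumPy array whose last row is the bottom of the word, zone heights are made as equal as integer division allows, and the canvas is widened so that shifted pixels are not lost. The function names and the `direction` parameter are hypothetical.

```python
import math
import numpy as np

def slant_by_zones(img, zones, direction=1):
    """Slant a 2-D image by splitting it into `zones` horizontal bands.

    Band k (k = 0 is the base at the bottom) is shifted by k pixels to the
    right (direction > 0) or to the left (direction < 0). The canvas is widened
    by zones - 1 columns so that no pixel is lost.
    """
    img = np.asarray(img)
    h, w = img.shape
    if not 1 <= zones <= h:
        raise ValueError("zones must satisfy 1 <= zones <= image height")
    pad = zones - 1
    out = np.zeros((h, w + pad), dtype=img.dtype)
    for k in range(zones):
        # Row range of band k, counted upward from the bottom row (row h - 1).
        top = h - ((k + 1) * h) // zones
        bottom = h - (k * h) // zones
        offset = k if direction > 0 else pad - k
        out[top:bottom, offset:offset + w] = img[top:bottom]
    return out

def slant_angle_deg(zones, height):
    """Slant angle from eq. (7), read here as tan(theta) = (Z - 1) / h."""
    return math.degrees(math.atan((zones - 1) / height))
```

Because adjacent bands never differ by more than one pixel of shift, connected strokes stay connected, which matches the observation that the slanted words look natural.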

4 EXPERIMENTAL RESULTS

The presented slant removal algorithm has been tested on a collection of word images taken from the IAM-DB [1] and the GRUHD [2] databases, comprising English and Modern Greek unconstrained handwriting respectively. In more detail, more than 1,500 word images were used, taken from approximately 500 different writers selected at random. The word segmentation was performed automatically by an OCR preprocessing system [18].

As already mentioned, evaluating a slant removal method is difficult, since the selection of the most appropriate result very often depends on subjective judgement (especially when dealing with variant-slanted words). A slant removal algorithm can, however, be evaluated indirectly by taking into account the improvement it provides to an existing OCR system. Applied to the above-mentioned data, our method produced very satisfactory results. Some examples are shown in fig.6. In general, it is clear that even in the hardest cases the produced word is considerably improved with regard to its processing in further stages (e.g., character segmentation, character recognition, etc.). It is worth noting that already non-slanted words are not negatively affected by the algorithm, whether the characters are connected or not. Ascenders and descenders, if they exist, play an important role in slant estimation, since they make the alternations of peaks and dips in the histogram more evident. However, the proposed method can also handle words without any ascender or descender. For variant-slanted words, the slant of the majority of the vertical strokes in the word is most likely to determine the final slant angle, since it generates more peaks in the histogram.

The presented algorithm has been incorporated into the character recognition system shown in fig.7, which aims at the automatic processing of document images. Besides slant removal, the system includes six other main modules, namely skew angle estimation and correction, printed-handwritten text discrimination, line segmentation, word segmentation, character segmentation, and recognition, built from already existing as well as novel algorithms.

The proposed technique could be applied to words or directly to the text line images resulting from line segmentation. However, the uneven valleys of the vertical histogram of a text line (wide valleys between words, narrow valleys between characters) could give confusing results in some cases. For this reason, we preferred to use parts of the text lines instead. Assuming that every page corresponds to only one writer, a single slant angle is estimated per page. The longer and more solid the text parts, the more accurate the estimation will be, since they carry more information and have a more even histogram. To select these parts, the valleys of the vertical histogram of a text line that are wider than a threshold are considered to be the boundaries between parts, and the ten longest of the resulting parts are selected. The threshold used was 1/10 of the line height; however, any threshold small enough to detect the valleys between word segments is appropriate. The resulting parts are usually entire words or parts of words. The slant estimation technique is applied to them, and the average of the estimated slants is taken as the slant of the page. Fig.8 shows a document image from IAM-DB as inserted into the system and after slant correction.

Fig.9 presents the dependence of the recognition accuracy on the number of training samples per character, with and without the application of the slant removal algorithm. The results refer to tests on 200 forms (100 IAM-DB forms and 100 GRUHD forms). For IAM-DB, the system was trained with samples taken from the NIST [17] database, while for GRUHD, training sets from the same database were used. Compared with the initial recognition system (i.e., when no slant correction was performed), the training data required to achieve similar performance were reduced by more than one third.

The computational cost of the proposed technique depends strongly on the size of the document. As an indication, the whole slant removal procedure for the document of fig.8 requires 29.516 sec on a Pentium III at 300 MHz, of which the slant estimation part takes 22.72 sec.
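The selection of text-line parts described in this section can be sketched as follows. This is a minimal illustration, assuming that a valley is a run of columns with no foreground pixels and that the text line is a binary 2-D array; the function name and the parameters `max_parts` and `valley_ratio` are hypothetical, with defaults taken from the values quoted above (ten parts, 1/10 of the line height).

```python
import numpy as np

def longest_parts(line_img, max_parts=10, valley_ratio=0.1):
    """Return (start, end) column ranges of the longest parts of a text line.

    Parts are separated by valleys (runs of empty columns) wider than
    valley_ratio * line height; `end` is exclusive.
    """
    line_img = np.asarray(line_img)
    min_valley = valley_ratio * line_img.shape[0]
    hist = line_img.sum(axis=0)              # vertical histogram of the line
    parts, start, gap = [], None, 0
    for col, value in enumerate(hist):
        if value > 0:
            if start is None:
                start = col                  # a new part begins here
            gap = 0
        elif start is not None:
            gap += 1
            if gap > min_valley:             # valley wide enough: close the part
                parts.append((start, col - gap + 1))
                start, gap = None, 0
    if start is not None:                    # part running to the end of the line
        parts.append((start, len(hist) - gap))
    parts.sort(key=lambda p: p[1] - p[0], reverse=True)
    return parts[:max_parts]
```

The page slant would then be the average of the slants estimated on the returned parts, as described above.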

5 CONCLUSIONS

In this paper, we presented an algorithm for slant removal. In contrast to current techniques, our method is character independent, since it is based on the intense alternations of the vertical histogram, which indicate vertically oriented characters, rather than on detecting the near-vertical strokes that may be included in the word. The WVD was used to estimate the slant angle, which can range within ±45° of the original position.

The algorithm was evaluated by both subjective and objective means. First, it was applied to isolated word image samples from both English and Modern Greek databases. The results are natural and almost always improved with respect to the original image, even in the case of variant-slanted writing. Then the algorithm was applied to entire document images by incorporating it into an OCR system. The performance of the character recognition system increased by up to 9% on the same data, while the training time cost was significantly reduced.

Almost any OCR system can benefit from our algorithm, since it has a low computational cost and is easily adapted. Cases where characters within a single word have to be corrected by different slant angles cannot be handled by our approach, since it is based on the dominant angle. We are currently working on providing more accurate results in such cases.

REFERENCES

[1] U. Marti and H. Bunke, A full English sentence database for off-line handwriting recognition, Proc. 5th Int. Conference on Document Analysis and Recognition, ICDAR'99, Bangalore, 1999, pp. 705-708.
[2] E. Kavallieratou, N. Liolios, E. Koutsogiorgos, N. Fakotakis, G. Kokkinakis, The GRUHD database of Modern Greek Unconstrained Handwriting, LREC2000, Athens, 1999, v. 3, pp. 1755-1759.
[3] M. Watanabe, Y. Hamammoto, T. Yasuda, S. Tomita, Normalization techniques of handwritten numerals for Gabor filters, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, IEEE, Los Alamitos, CA, v. 1, pp. 303-307, 1997.
[4] G. Kim, V. Govindaraju, Efficient chain-code-based image manipulation for handwritten word recognition, Proceedings of SPIE - The International Society for Optical Engineering, Bellingham, WA, USA, v. 2660, pp. 262-272, 1996.
[5] S. Knerr, E. Augustin, O. Baret and D. Price, Hidden Markov model based word recognition and its application to legal amount reading on French checks, Computer Vision and Image Understanding, v. 70, No. 3, pp. 404-419, 1998.
[6] A.W. Senior and A.J. Robinson, An off-line cursive handwriting recognition system, IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 20, No. 3, pp. 309-321, 1998.
[7] M. Shridar and F. Kimura, Handwritten address interpretation using word recognition with and without lexicon, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Piscataway, NJ, USA, v. 3, pp. 2341-2346, 1995.
[8] A. Vinciarelli and J. Luettin, Off-Line Cursive Script Recognition Based on Continuous Density HMM, to appear in: Proceedings of the 7th International Workshop on Frontiers in Handwriting Recognition, Amsterdam, 11-13 September 2000.
[9] R.M. Bozinovic and S.N. Srihari, Off-line Cursive Script Word Recognition, IEEE Trans. on PAMI, vol. 11, n. 1, pp. 68-83, 1989.
[10] T.A. Claasen and W.F. Mecklenbrauker, The Wigner distribution: A tool for time-frequency signal analysis, Philips Journal of Research, Vol. 35, Parts 1, 2, and 3, pp. 217-250, 276-300, and 372-389, 1980.
[11] L. Cohen, Generalized phase-space distribution functions, J. Math. Phys., Vol. 7, pp. 781-786, 1966.
[12] B. Boashash, B. Lovell and L. White, Time frequency analysis and pattern recognition using singular value decomposition of the Wigner-Ville distribution, Advanced Algorithms and Architecture for Signal Processing, Proc. SPIE, vol. 828, pp. 104-114, 1987.
[13] G. Cristobal, J. Bescos and J. Santamaria, Application of Wigner distribution for image representation and analysis, Proc. IEEE 8th Int. Conf. Pattern Recognition, pp. 998-1000, 1986.
[14] K.B. Yu and S. Cheng, Signal synthesis from Wigner distribution, Proc. IEEE ICASSP 85, pp. 1037-1040, 1985.
[15] P. Boles and B. Boashash, The cross Wigner-Ville distribution - a two dimensional analysis method for the processing of vibrosis seismic signals, Proc. IEEE ICASSP 87, pp. 904-907, 1988.
[16] O. Kenny and B. Boashash, An optical signal processing for time-frequency signal analysis using the Wigner-Ville distribution, J. Elec. Electron. Eng., pp. 152-158, 1988.
[17] R. Wilkinson, J. Geist, S. Janet, P. Grother, C. Burges, R. Creecy, B. Hammond, J. Hull, N. Larsen, T. Vogl, and C. Wilson, The first census optical character recognition systems conf., #NISTIR 4912, The U.S. Bureau of Census and the National Institute of Standards and Technology, Gaithersburg, MD, 1992.
[18] E. Kavallieratou, N. Fakotakis, G. Kokkinakis, An Integrated System for Handwritten Document Image Processing (under review).


Fig.1: Examples of slanted words: (a) a word slanted to the right, (b) a word slanted to the left, (c) a variant-slanted word.

Fig.2: Vertical projection profiles of a word with various slants, panels (a)-(g). For each panel the slant angle and the number of horizontal zones are as follows: (a) -45°, 101 zones; (b) -30°, 31 zones; (c) -15°, 27 zones; (d) 0°, 1 zone; (e) 15°, 27 zones; (f) 30°, 31 zones; (g) 45°, 101 zones.

Fig.3: Curves of maximum intensity, panels (a)-(g), that correspond to the histograms of fig.2.

(a) The initial vertical stroke.
(b) Separation of the stroke into two zones and shifting of the upper zone by one pixel to the left.
(c) Separation of the stroke into three zones and shifting of the second zone (counting from the bottom) by one pixel and the third zone by two pixels to the left.
(d) Separation of the stroke into four zones and shifting of the second zone by one pixel, the third zone by two pixels, and the fourth by three pixels to the left.
(e) Separation of the stroke into five zones and shifting of the second zone by one pixel, the third zone by two pixels, the fourth by three pixels, and the fifth by four pixels to the left.

Fig.4: The gradual slanting of a vertical stroke (enlarged).


Fig.5: The maximum slant of a word to left and right.

for I = 1 to height of the word image {
    divide the word image into I horizontal zones
    base = lowest zone
    for J = 1 to number of zones {
        shift zone[J] by (J-1) pixels
    } next J
    store shift_image[I]
    calculate vertical histogram[I]
    calculate WVD[I]
    extract maximum intensity curve[I]
    for J = 1 to number of zones {
        shift zone[J] by -(J-1) pixels
    } next J
    store shift_image[-I]
    calculate vertical histogram[-I]
    calculate WVD[-I]
    extract maximum intensity curve[-I]
} next I
select the maximum intensity curve[x] with the highest peak
return shift_image[x]
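The pseudocode above can be turned into a compact, runnable Python sketch. It is only an illustration under several assumptions that the text does not fix: the curve of maximum intensity is taken to be the maximum of the WVD over frequency at each position, the WVD is discretized with integer lags on the analytic signal of the histogram, and the image is widened while shearing so that no pixel is lost. All function names are hypothetical.

```python
import numpy as np

def shear(img, zones, direction):
    """Shift zone k (counted from the bottom) by k pixels; direction is +1 or -1."""
    h, w = img.shape
    pad = zones - 1
    out = np.zeros((h, w + pad), dtype=img.dtype)
    for k in range(zones):
        top, bottom = h - ((k + 1) * h) // zones, h - (k * h) // zones
        offset = k if direction > 0 else pad - k
        out[top:bottom, offset:offset + w] = img[top:bottom]
    return out

def wvd_peak(hist):
    """Greatest value on the (assumed) curve of maximum intensity of the WVD."""
    n = hist.size
    gain = np.zeros(n)
    gain[0] = 1.0
    gain[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    z = np.fft.ifft(np.fft.fft(hist) * gain)       # analytic signal of the profile
    peak = -np.inf
    for t in range(n):
        max_lag = min(t, n - 1 - t)
        lags = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n, dtype=complex)
        kernel[lags % n] = z[t + lags] * np.conj(z[t - lags])
        peak = max(peak, np.fft.fft(kernel).real.max())  # max over frequency at t
    return peak

def deslant(word):
    """Try every zone count in both directions; keep the image with the top peak.

    Returns (signed shift of the top zone in pixels, selected image).
    """
    img = (np.asarray(word) != 0).astype(np.uint8)
    best = None
    for zones in range(1, img.shape[0] + 1):
        for direction in ((1,) if zones == 1 else (1, -1)):
            candidate = shear(img, zones, direction)
            score = wvd_peak(candidate.sum(axis=0).astype(float))
            if best is None or score > best[0]:
                best = (score, direction * (zones - 1), candidate)
    return best[1], best[2]
```

The exhaustive search costs one WVD per candidate slant, which is consistent with the estimation step dominating the running time reported in section 4.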

Fig.6: Some experimental results (left: original word images, right: corrected word images).


Fig.7: The system into which the proposed algorithm has been incorporated.

Fig.8: An example of an IAM-DB document image, (a) as inserted into the system and (b) after slant removal.

[Plot: Recognition Accuracy (%) against Amount of Training Samples, with curves "Without Slant Removal" and "With Slant Removal".]

Fig.9: Recognition accuracy versus the amount of training samples per character, (a) before and (b) after the incorporation of the slant removal algorithm.
