DOCUMENTS
Carlos A.B. Mello and Rafael D. Lins
Department of Electronics and Systems – UFPE – Brazil
{cabm, rdl}@cin.ufpe.br
C. Johannsen’s Algorithm
Johannsen proposed another entropy-based variation, which
chooses the threshold t that minimizes the function
Sb(t) + Sw(t), with:
$$S_w(t) = \log\left(\sum_{i=t+1}^{255} p_i\right) + \frac{1}{\sum_{i=t+1}^{255} p_i}\left[E(p_t) + E\left(\sum_{i=t+1}^{255} p_i\right)\right]$$

and

$$S_b(t) = \log\left(\sum_{i=0}^{t} p_i\right) + \frac{1}{\sum_{i=0}^{t} p_i}\left[E(p_t) + E\left(\sum_{i=0}^{t-1} p_i\right)\right]$$

where E(x) = -x log(x) and t is the threshold value. Figure 4
presents the application of this algorithm to the image of
the document under study.

Figure 3. Kapur et al’s segmentation
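As an illustration, the minimization of Sb(t) + Sw(t) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation. It takes p_i to be the normalized grey-level histogram, takes E(0) = 0, and considers only thresholds that have pixels at t, below t and above t. The function names and the `white_sums_from_t` flag are hypothetical. By default the sums in Sw follow the limits as printed above (i = t+1). Some statements of Johannsen's method start the first two sums of Sw at i = t instead, which the flag selects.

```python
import math


def entropy_term(x):
    """E(x) = -x log(x); E(0) is taken as 0, the limit as x -> 0."""
    return -x * math.log(x) if x > 0 else 0.0


def johannsen_threshold(histogram, white_sums_from_t=False):
    """Return the grey level t minimizing Sb(t) + Sw(t), or None.

    histogram: sequence of non-negative pixel counts, one per grey level.
    white_sums_from_t: hypothetical switch; if True, the first two sums
    of Sw start at i = t rather than i = t + 1 (the limits as printed).
    """
    total = sum(histogram)
    if total <= 0:
        raise ValueError("histogram must contain at least one pixel")

    # cumulative[k] = number of pixels with grey level < k. Integer counts
    # keep the emptiness checks below exact.
    cumulative = [0]
    for count in histogram:
        cumulative.append(cumulative[-1] + count)

    best_t, best_cost = None, math.inf
    for t in range(len(histogram)):
        below_count = cumulative[t]               # pixels with level < t
        above_count = total - cumulative[t + 1]   # pixels with level > t
        # Assumption: skip levels that are empty or that leave one side empty.
        if histogram[t] == 0 or below_count == 0 or above_count == 0:
            continue

        p_t = histogram[t] / total
        black_excl = below_count / total          # sum_{i=0}^{t-1} p_i
        black = black_excl + p_t                  # sum_{i=0}^{t} p_i
        white_excl = above_count / total          # sum_{i=t+1}^{max} p_i
        white = white_excl + p_t if white_sums_from_t else white_excl

        s_b = math.log(black) + (entropy_term(p_t) + entropy_term(black_excl)) / black
        s_w = math.log(white) + (entropy_term(p_t) + entropy_term(white_excl)) / white

        cost = s_b + s_w
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```

For a grey-level image the histogram would have 256 entries. Pixels with a value above (or below, depending on the polarity of the document) the returned level would then be mapped to white and the rest to black.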
The algorithm was also tested against other
segmentation methods, such as iterative selection, yielding
better results in terms of OCR hit rates and visual
inspection of the quality of the monochromatic images.

[12] Nabuco Project. URL: http://www.di.ufpe.br/~nabuco.
IV. CONCLUSION
This paper introduces a new segmentation algorithm for
historical documents, which is particularly suited to
reducing back-to-front noise in documents written on both
sides. Applied to a set of 40 samples from Nabuco’s
bequest, it worked satisfactorily in 90% of them