Defended publicly on 12th November 2020 in front of the jury composed of:
USTHB-2020
To all who find a serious affinity with me
This thesis is the result of continuous work at the Laboratoire d'Ingénierie des Systèmes
Intelligents et Communicants (LISIC): the Laboratory of Intelligent and Communicating
Systems Engineering of the Faculty of Electronics and Computer Science of the University of
Science and Technology Houari Boumediene (USTHB). I would like to express my gratitude
to my supervisor, Professor Youcef CHIBANI. I appreciate all his contributions, his help, his
motivation, and his patience. I thank him for his method of work, which encouraged me to be
tenacious and to believe that I could succeed. Professor Mohamed CHERIET, from the École de
Technologie Supérieure (ÉTS), Montréal, Canada: a School of Higher Technology, also has
my sincere gratitude for all his help, his continual encouragement, his advice, and his guidance
throughout these years. My warm thanks go to the members of the jury: Mrs. Aichouche
BELHADJ-AISSA, Professor at USTHB, Algiers; Ms. Karima BENATCHBA, Professor at the
École Nationale Supérieure en Informatique (ESI), Algiers: a National Higher School in Computer
Science; Mr. Nafaa NACEREDDINE, Director of Research at the Centre de Recherche en
Technologie Industrielle (CRTI), Algiers: a Center of Industrial Technology Research; Mrs. Akila
KEMMOUCHE, MCA at USTHB, Algiers; and Mr. Abdel Oualid DJEKOUNE, MRA at the Centre
de Développement des Technologies Avancées (CDTA), Algiers: a Center of Advanced
Technology Development, for having agreed to devote their time to the examination of this
thesis and for giving me the honor of presenting it before them.
I warmly thank my parents and my family for their support and encouragement. My sincere
thanks also go to all members of the LISIC laboratory with whom I shared years full of affinity
and sincerity in a good working atmosphere. I cannot conclude without mentioning and
thanking all my friends and colleagues.
Throughout the world, a considerable amount of printed material, including ancient
(historical) documents that represent an invaluable heritage, suffers from various types of
degradation.
To preserve this heritage, exploit it effectively, and make it available to a large community
via the Internet, important image processing steps are necessary: (i) Digitizing the documents is
necessary not only to reduce human contact with the originals and thus protect them from
damage due to handling, but also to allow wide and easy diffusion through different electronic
media and means. (ii) Since digitized ancient degraded documents are not suitable, for instance,
for information retrieval tools because of their various degradations, a binarization stage is
necessary. The latter consists of separating the text (foreground) from the background and
enhancing the foreground to allow optimal use of all tools inherent to information technology
(IT), such as document image analysis, recognition, and retrieval systems, optical character
recognition (OCR), document layout recognition and analysis, word spotting, information
retrieval, and so on.
Most binarization methods reported in the literature compute the binarization threshold
from the pixel's gray-level intensity or from simple neighborhood-based statistics such as the
local mean or variance. Moreover, they mostly combine classical thresholding methods with
enhancement (pre-processing) and post-enhancement (post-processing) stages. Furthermore,
since the degradation types are difficult to model, document binarization remains a challenging
task. It is worth noting that information based on neighboring pixels is relevant for document
image binarization methods, and particularly for thresholding-based ones. Therefore, in this
thesis, we explore the use of texture-based methods for binarizing ancient degraded document
images acquired in the optical domain of the visible spectrum in a spatial grayscale and/or
color representation.
In doing so, we also intend to draw relevant conclusions about the use of pixel neighborhood
information beyond basic intensity-based information. Three texture-based methods are
explored: the co-occurrence matrix, the Gabor filter, and the Local Binary Pattern (LBP)
model. These methods are used to extract texture information and are combined with
well-known conventional thresholding-based methods to binarize the degraded documents.
Extensive experiments are performed to highlight the opportunity of using texture to improve
binarization.
Keywords: Ancient document images, Historical document images, Degraded document images,
Binarization, Threshold, Image processing, Texture, Gabor filter.
4.3.2 Binarization module ....................................................................................60
4.4 Experimental results ......................................................................................................61
4.4.1 Experimental setup ......................................................................................61
4.4.2 Experimental evaluation .............................................................................63
4.5 Summary of the chapter ................................................................................................69
Table 2.1: DIBCO dataset classification according to the degradation type [111]. ................... 37
Table 3.1: Haralick’s attribute values for background and foreground images. ...................... 48
Table 3.2: Evaluation results of well-known methods for binarization by type of degradations.
.................................................................................................................................. 50
Table 3.3: Evaluation results of the proposed method and Nick’s method, for binarization by
type of degradations. ................................................................................................ 51
Table 4.1: Performance evaluation of the proposed method against the classical threshold-
based methods on DIBCO datasets. ........................................................................ 68
Table 4.2: Performance evaluation of the proposed method against the classical threshold-
based methods according to the degradation type. ................................................ 68
Table 5.2: Comparison of the proposed Sauvola-Gabor with the top binarization methods of the
DIBCO2009, DIBCO2010, DIBCO2011, DIBCO2012, and DIBCO2013 contests
and with some state-of-the-art methods on blind datasets. ...................................... 90
Table 5.3: Performance evaluation on unblind DIBCO datasets according to degradation type.
.................................................................................................................................. 91
Table 5.4: Influence of the weighting function used in the Sauvola-Gabor. .............................. 91
Figure 1.1: Example of ancient degraded documents due to their poor storage conditions.........6
Figure 1.3: Origins of Document degradations: (a) Chemical, (b) Biological, (c) Human,
(d)(e)(f) External. .......................................................................................................8
Figure 1.4: Document degradation types ((a-b-c-d-e-f-g) from DIBCO Datasets, (h) From
National Library of Algiers): (a) (b) Ink fading, (c) (d) Stain, (e) Ink bleed-through,
(f) Ink show-through, (g) (h) Non-uniform background, Ink fading and uneven
illumination...............................................................................................................12
Figure 2.2: Overall flowchart of the Document Analysis and Recognition System. ................17
Figure 2.4: Document image binarization process: (a) Original image, (b) Binarized image:
foreground(text), (c) Intermediate background image (pixels belonging to
background). .............................................................................................................19
Figure 2.6: Document image with strong degradation type: (a) Ink bleed-through degradation,
(b) Stain degradation. ...............................................................................................24
Figure 2.7: Local and global binarization methods: (a) (b) Original degraded document images,
(c) (d) Respective histograms, (e) (f) Binarized document image using a global
method, and (g)(h) Binarized document image using a local method. .....................25
Figure 2.9: Curvelet transform-based binarization: (a) Original image, (b) Binarized image [42].
..................................................................................................................................28
Figure 2.10: Binarization based on k-means method: (a) (g) Original degraded document
images, (b)(h) Gray level degraded document image, (c)(i) Respective histogram,
(d)(e)(f) and (j)(k)(l) Respective degraded document layers (foreground,
Background, ink-bleed through degradation). ..........................................................31
Figure 2.12: Sample of a DIBCO-2016 degraded document image with its ground truth: (a)
Original color degraded document image, (b) Ground truth binarized document
image. .......................................................................................................................36
Figure 2.13: Sample of a degraded document image from the National Library of Algeria (BNA)
used for a subjective evaluation. .............................................................................. 36
Figure 2.14: Example of degradation type categorization: (a) Type B, (b) Type D, (c) Type C,
(d) Type Mixed (D+A). ........................................................................................... 38
Figure 3.1: Principle scheme of the co-occurrence matrix: (a) A sample of the sliding window
(size = 5x5) showing the 8 directions with a distance 𝑑 = 2, (b) The sliding window
within the image matrix. ......................................................................................... 42
Figure 3.3: Co-occurrence matrix image of a degraded document image: (a) Original image, (b)
Co-occurrence matrix image. .................................................................................. 43
Figure 3.5: Overall steps related to the parameters optimization of the threshold-based
binarization method. ................................................................................................ 47
Figure 3.6: Sample of background and foreground document image: (a) Background, (b)
Foreground. .............................................................................................................. 48
Figure 3.9: Sample for a subjective evaluation on degraded document images from Dibco-
2009 datasets with a distance = 1, window size = 41 and k=-1: (a) Haralick’s contrast
feature with angle = 0° and (b) Haralick’s contrast feature with angle = 135°. ... 51
Figure 3.10: Sample for a subjective evaluation on degraded document images from Dibco-
2011 datasets with a distance = 1 and window size = 41 and k=-1 : (a) Haralick’s
contrast feature with angle = 0° and (b) Haralick’s contrast feature with angle = 135°.
.................................................................................................................................. 52
Figure 3.11: Sample for a subjective evaluation on degraded document images from Dibco-2009
datasets with a distance = 1 and window size = 41 and k=-1 : (a) Haralick’s mean
feature with angle = 0° and (b) Haralick’s mean feature with angle = 135°. .......... 52
Figure 3.12: Sample for a subjective evaluation on degraded document images from Dibco-2011
datasets with a distance = 1 and window size = 41 and k=-1 : (a) Haralick’s mean
feature with angle = 0° and (b) Haralick’s mean feature with angle = 135°. .......... 52
Figure 3.13: Sample for a subjective evaluation on degraded document images with angle =
135°, distance = 1, window size = 41x41 and k =-1 : (a) original image, (b) binarized
image using Nick, (c) Proposed Method. ................................................................. 53
Figure 4.2: LBP image of degraded document: (a) Degraded document from Dibco-2012 dataset,
(b) LBP image. ......................................................................................................... 56
Figure 4.6: Flowchart for estimating mean and variance from original and LBP images. ........60
Figure 4.7: Binarization method combining the advantages of both the SLBP and SLBP-C
methods. ...................................................................................................................61
Figure 4.8: Overall steps to set optimal thresholding-based method parameters. ......................62
Figure 5.1: Simplified implementation of the Gabor filter bank. For simplified notation,
H(x, y, f_i, θ_j) and G(x, y, f_i) are denoted respectively H_i^j(x, y) and G_i(x, y),
with i = 0, …, N_f − 1 and j = 0, …, N_θ − 1, where N_f and N_θ define the numbers
of central frequencies and orientations, respectively. ...............................................73
Figure 5.2: Estimation of the standard deviation from degraded and Gabor filtered image (a)
Degraded image, (b) Standard deviation estimated from the degraded image, (c)
Standard deviation estimated from the Gabor filtered image. ..................................74
Figure 5.4: Fourier-transform of a degraded image: (a) Degraded image, (b) Fourier transform.
..................................................................................................................................77
Figure 5.5: Steps of the proposed method performed on Non-Uniform background (Left) and
Stain (Right) degradations: (a) Degraded image, (b) Wiener filtering (c) Binarized
image (d) Morphological operator............................................................................79
Figure 5.6: Steps of the proposed method performed on Ink bleed-through (Left) and Ink
intensity variation (Right) degradations: (a) Degraded image, (b) Wiener filtering (c)
Binarized image (d) Morphological operator. ..........................................................80
Figure 5.7: Two representative images per degradation type used to set up the parameters of the
Gabor filter banks: (a) Stain degradation, (b) Ink bleed-through, (c) Non-uniform
background, and (d) Ink intensity variation. ............................................................ 81
Figure 5.8: The ground truth images corresponding to the selected images of Figure 5.7 used to
find all the optimal parameters ................................................................................ 82
Figure 5.9: Flowchart for finding the optimal parameters of the Gabor filter and the estimation
of the binarization threshold. ................................................................................... 83
Figure 5.10: Effect of selecting 𝝈 and 𝝆 parameters: (a) Degraded image, (b) Gabor filtered
image with 𝝈 < 𝝆, (c) Gabor filtered image 𝝈 = 𝝆, (d) Gabor filtered image with
𝝈 > 𝝆. ...................................................................................................................... 84
Figure 5.11: Effect of selecting the mask size of the Gabor filter: (a) Degraded image, (b) 7 × 7,
(c) 11 × 11, (d) 21 × 21. ......................................................................................... 84
Figure 5.12: F-Measure versus the central frequency for different angles (4, 8, 16, and 32)
according to the degradation type: (a) Stain, (b) Ink bleed-through, (c) Non-uniform
background, (d) Ink intensity variation, (e) Overall. ............................................... 86
Figure 5.13: F-Measure for different numbers of angles according to the degradation type for
frequency 𝑓𝑜𝑝𝑡 = 0.140: (a) Stain, (b) Ink bleed, (c) Non-uniform background, (d)
Ink intensity variation, (e) Overall. .......................................................................... 87
Figure 5.14: Sample of subjective evaluation for Stain degradation: (a) Degraded image H02
DIBCO-2012 as stain, (b) Sauvola’s method, (c) Ground truth image, (d) The
proposed Sauvola-Gabor method. ........................................................................... 92
Figure 5.15: Sample of subjective evaluation for Ink bleed-through degradation: (a) Degraded
image H06 DIBCO-2012 as ink bleed-through, (b) Sauvola’s method, (c) Ground
truth image, (d) The proposed Sauvola-Gabor method. ......................................... 92
Figure 5.16: Sample of subjective evaluation for non-uniform background: (a) Degraded image:
PR03 DIBCO-2013 as non-uniform background, (b) Sauvola’s method (c) Ground
truth image, (d) The proposed Sauvola-Gabor method. ......................................... 93
Figure 5.17: Sample of subjective evaluation for ink degradation: (a) Degraded image H09
DIBCO-2012 as ink degradation, (b) Sauvola’s method, (c) Ground truth image, (d)
The proposed Sauvola-Gabor method. .................................................................... 93
Since the dawn of his existence, man has never ceased to assert himself and to leave traces of
his life, his personality, and his culture through transcriptions on various media such as wood,
leaves, stones, papyrus, paper, etc.
Through the ages, and especially with the development of writing and the appearance of
paper, the handwritten document became the most used and most popular medium.
Since then, manuscripts and printed documents have accumulated to such an extent that it is
very difficult, almost impossible, to preserve them all and to find or search within them for
useful information, whatever the purpose.
Among these documents, historical documents represent an important part of the cultural
heritage, which plays a fundamental role in the economic and social development of nations.
These documents are an essential characteristic of peoples and worldwide communities, as well
as a testimony of their culture and civilization. Protecting them not only helps to preserve the
heritage itself, but also civilizations, peoples, and nations.
Unfortunately, these documents are unique, and there is a very great risk of losing them
irrevocably. These precious objects suffer continuously and progressively from many forms of
deterioration and degradation due to a multitude of factors such as bad storage conditions,
improper handling, dust, dirt, rusty staples, humidity, etc. There is therefore an urgent need to
find a way to preserve them and to make their use as simple, effective, optimal, and wide as
possible.
Digitization is the most appropriate way to preserve this cultural heritage. Information and
communication technologies (ICT), together with the development of electronics, provide the
most suitable solution by offering flexibility in terms of storage, sharing, and ease of access.
Furthermore, large amounts of documents can be stored, duplicated, and preserved. However,
because of the strong degradations affecting ancient and historical documents, it is almost
impossible to exploit them effectively. Indeed, tools inherent to information technology
(IT) [1, 2], such as optical character recognition (OCR) [3], text and line segmentation,
document layout analysis and recognition, word spotting, information retrieval, and so on,
cannot be applied to raw images of ancient degraded documents.
INTRODUCTION
To overcome this problem, the document must be converted into the most appropriate digital
representation. To do so, image processing methods are mainly used, including binarization,
which consists of separating the text from the background while minimizing the effect on the
text of the different degradations, and of the binarization itself.
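The binarization step just described is, in most classical methods, implemented by local thresholding. To fix ideas, the Python sketch below is purely illustrative (it is not the implementation developed in this thesis) and uses Sauvola's well-known rule, which derives a per-pixel threshold T = m(1 + k(s/R − 1)) from the local mean m and local standard deviation s of a window centered on the pixel; the synthetic "page" at the end is a hypothetical example, not a thesis dataset:

```python
import numpy as np

def local_mean_std(img, window):
    """Local mean and standard deviation over a (window x window) neighborhood,
    computed here by straightforward (slow but clear) window slicing."""
    h, w = img.shape
    r = window // 2
    m = np.empty_like(img)
    s = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            patch = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            m[y, x] = patch.mean()
            s[y, x] = patch.std()
    return m, s

def sauvola_binarize(img, window=15, k=0.2, R=128.0):
    """Sauvola's local threshold: T = m * (1 + k * (s / R - 1)).
    Pixels darker than their local threshold are labeled text (1)."""
    img = img.astype(np.float64)
    m, s = local_mean_std(img, window)
    T = m * (1.0 + k * (s / R - 1.0))
    return (img <= T).astype(np.uint8)

# Tiny synthetic example: dark "text" strokes on a brighter, uneven background.
h, w = 64, 64
page = np.tile(np.linspace(120, 200, w), (h, 1))  # non-uniform illumination
page[20:24, 10:50] = 30                           # horizontal stroke
page[30:50, 30:34] = 30                           # vertical stroke
binary = sauvola_binarize(page)
```

Because the threshold follows the local statistics, the rule adapts to a non-uniform background, which is precisely why such local methods outperform a single global threshold on degraded documents.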
The work proposed in this thesis deals with the problem of binarizing ancient, degraded
document images. Although the binarization of degraded document images has been studied
for many years and various methods have been proposed in the literature, the problem is still
challenging [4-9]. Indeed, ancient document image binarization remains an open research area
because of the variety, non-uniformity, and complexity of the degradations. Most methods
reported in the literature estimate a threshold from the pixel's gray-level intensity by
considering basic information from its neighborhood [10-18]. It is also worth noting that
texture-based features and time-frequency-based methods have gained much attention from
researchers in various applications, such as image segmentation, but few works address
binarization issues. Furthermore, texture-based characterizations of a pixel's neighborhood
have been shown to be more representative than basic neighborhood information (i.e., raw
gray-level intensities) [19-23]. For these reasons, our main motivation is to explore and
investigate more deeply [24, 25] the use of texture characterization for ancient degraded
document binarization, in order to provide enhanced and accurate binarization results.
Therefore, this thesis focuses on using texture to extract relevant intensity characteristics
based on the gray levels of the pixels and their respective spatial neighborhoods, and possibly
on characteristics based on pixel values in other spaces, to develop an enhanced
threshold-based method inspired by the most popular and well-known thresholding methods.
Moreover, our approach avoids the rough binarization step that is mostly used in the
literature. Our contributions can be summarized as follows:
After the Introduction, where the problem at hand, the motivation, and the objectives are
presented and described, the thesis is organized into five chapters. Chapter 1 describes
ancient degraded documents and their various degradation sources and types. Chapter 2
presents related work on binarizing degraded document images. Chapter 3 presents the
co-occurrence-based document image binarization method. Chapter 4 presents the LBP-based
degraded document image binarization method. Chapter 5 presents the Gabor-filter-based
degraded document image binarization. Finally, a conclusion summarizes the work, and
prospects for future work are given.
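As a foretaste of the texture descriptors explored in Chapters 3 to 5, the Local Binary Pattern of Chapter 4 admits a particularly compact formulation. The following Python sketch is illustrative only (it is not the code developed in this thesis): it computes the standard 8-neighbor LBP, in which each interior pixel receives an 8-bit code recording which of its neighbors are at least as bright as it is:

```python
import numpy as np

def lbp_image(img):
    """Basic 8-neighbor Local Binary Pattern of a grayscale image.
    Each interior pixel gets an 8-bit code: bit i is set when the i-th
    neighbor (clockwise from the top-left) is >= the center pixel.
    Returns an array two pixels smaller in each dimension (borders dropped)."""
    img = np.asarray(img, dtype=np.int32)
    c = img[1:-1, 1:-1]                       # center pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy,
                 1 + dx:img.shape[1] - 1 + dx]  # shifted neighbor plane
        code |= ((nb >= c).astype(np.int32) << bit)
    return code

# Example: on a flat image every neighbor ties the center, so every code is 255.
flat = np.full((5, 5), 100)
codes = lbp_image(flat)
```

Maps of such codes (or their local histograms) summarize the micro-texture around each pixel and can then be used to characterize foreground and background regions.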
CHAPTER 1: OVERVIEW ON DEGRADED DOCUMENT IMAGE BINARIZATION
Abstract
The purpose of this chapter is to present the importance of ancient degraded
(historical) documents, focusing on the most serious problems they suffer from.
We present the various types of document degradation and their origins.
Nowadays, paper is still the most widely used medium in our daily life, despite the
development of electronic communication systems. All over the world, many libraries are
concerned with two problems: (i) providing users easy access to information, and (ii)
preserving books and documents, especially ancient and historical ones, from deterioration.
Thanks to technological advances and the extremely low cost of electronic equipment, the use
of paper as a medium has declined in several domains, and we use less and less of it. Books,
newspapers, invoices, and forms are becoming more and more electronic. These electronic
means have several advantages: ease of use, low storage cost, protection against deterioration
of the medium, ease of searching for information, and economy.
Nevertheless, the paperless objective has not quite been achieved; even if progress is slow,
paper remains widely used because of the comfort it provides in certain cases, such as reading
newspapers or books. Thus, the two means, i.e., paper and e-paper, coexist harmoniously.
Nowadays, several digital and virtual libraries have been created, and the phenomenon is
becoming more widespread [26, 27]. These libraries have mainly started scanning documents
that do not yet have a digital version.
As noted above, there are many ancient, precious, and degraded documents around the
world that deteriorate continuously and gradually. Whether in libraries or warehouses, these
documents are often stored in poor conditions, which accelerates their deterioration. They
undergo many forms of deterioration and degradation due to a multitude of factors that mainly
result from poor storage conditions, and it is urgent to find a way to save and protect them. A
sample of documents suffering the effects of bad storage is illustrated in Figure 1.1, showing
documents from Khizanates (traditional libraries) located in Adrar, a region of southern
Algeria that shelters many precious manuscripts, jealously guarded in such Khizanates and in
Koranic schools.
Figure 1.1: Example of ancient degraded documents due to their poor storage conditions.
To protect the documents and avoid improper human handling, each degraded document is
converted to an electronic form using an optical acquisition device, such as a camera or a
scanner, as shown in Figure 1.2. Acquisition systems range from simple devices, such as
webcams, cell phone cameras, and desktop scanners, as shown in Figure 1.2 (a-b), to the most
sophisticated ones with very high-resolution sensors and more advanced lighting systems
suitable for book scanning, Figure 1.2 (c-d). There are also scanners with advanced features
such as automatic book digitization, Figure 1.2 (e), as well as scanners suitable for large
formats, Figure 1.2 (f).
Figure 1.2: Various document scanner devices.
Since this thesis deals with the binarization of ancient degraded document images, it is
necessary to define these degradations as well as their types and origins. Roughly speaking,
a document degradation is a partial or complete alteration that makes a document difficult to
read, both for humans and for computers. These degradations have several origins; more
details and explanations are given in the following sub-sections. Note that the document
images depicted in Figure 1.3 are from the National Library of Algiers (BNA).
Figure 1.3: Origins of Document degradations: (a) Chemical, (b) Biological, (c) Human, (d)(e)(f) External.
A reproduced or digitized document can also be altered, mainly due to the quality of the
reproduction material, its settings, and its conditions of use. The quality of the material
mainly depends on its technological characteristics. Among others, we can cite the main
technological characteristics of (i) printing systems, such as offset, thermal, laser, inkjet,
color, and grayscale, and of (ii) digitizing systems, such as manual scanners, large-volume
scanners, scanners suitable for books, color/grayscale, resolution, lighting and speed, and,
for cameras, lighting and resolution; see Figure 1.3 (c-d-e-f).
Physical degradation
Over time, ancient documents are themselves physically altered or modified by chemical,
biological, or human sources. As explained above, these degradations are classified in the
category of physical sources. More details are given in the following.
a) Chemical source
The principal raw material for producing paper is cellulose fiber extracted from plants,
mainly wood, together with various additives [28]. The latter are added to make the paper
more resistant or to give it specific characteristics such as transparency, waterproofness, etc.
Since cellulose is a biopolymer essentially composed of carbohydrates, and since various
chemical additives are added to it, the paper becomes very fragile and subject to the various
risks associated with inks and with atmospheric and climatic phenomena. It is worth noting
that ink is a liquid or paste containing colored organic or synthetic pigments. Indeed, over
time, humidity, light, and the nature of the air create the conditions for chemical reactions
that alter both the paper and the ink. This usually results in yellowing of the paper and
discoloration, often accompanied by a spread of the ink and its dissipation from the front
side of a page to the back side, and vice versa. This last degradation is called ink bleed-through.
b) Biological source
Mice, insects, and/or micro-organisms, favored by humidity and heat, can be attracted to
libraries or other places containing paper documents or materials such as wood, cotton, and
fibers. Such places are very conducive to the refuge and proliferation of all kinds of pests,
for which paper and wood are generally favorite foods. Over time, the documents suffer
considerable damage and devastation: these pests generate various degradations, ranging
from simple to severe, such as stains and holes. Moreover, the documents can also crumble,
and even irreversible damage is often observed, to the point of making them disappear
gradually; see Figure 1.3 (a).
c) Human source
Because of frequent and improper handling, as well as bad storage, of documents and
books, man contributes greatly to their degradation. This results in tears, folds, and stains
that make the document fragile and illegible, thus accelerating the degradation process.
Besides, readers annotate documents, affix stamps, and use adhesives or staples to repair
and restore worn documents; see Figure 1.3 (a-b).
External degradation
These degradations are produced during the capture or printing phase, by devices that may
generate blurred images and/or induce degradations such as deformations and distortions in
the generated documents compared to the originals, as seen in Figure 1.3 (c-d-e-f). The
appearance of an image captured or reproduced by the scanning and/or printing process
depends on various factors related to the technology, the characteristics of the device, and
its settings. More details are given in the following.
a) Acquiring device
To obtain the digital image of a document, an optical acquisition system, namely a scanner
or a camera, is generally used. For either device, the image rendering depends on the following
characteristics: (i) The type of sensor (CMOS, CCD, etc.), whose resolution, speed, and
sensitivity to lighting can differ. Sensitivity is very important to limit the various electronic
noises generated under low lighting; radio interference noise should not be overlooked either.
(ii) The lighting system is also of paramount importance for the quality of the obtained images.
This system must take into account the intensity of the light, its wavelength, the frequency
inherent in its generation, and the type of lamp, all of which can drastically affect the image
quality of the document. Various types of lamps exist, such as incandescent lamps, Light
Emitting Diodes (LEDs), which produce light by the movement of electrons between the two
terminals of the diode, and gas-based lamps (neon, xenon, etc.). (iii) The position of the sensor
relative to the document to be scanned (distance, skew angle, etc.) can generate a document
image with an unwanted skew angle; see Figure 1.3 (e). (iv) Camera settings (resolution,
shutter speed, focus, etc.) also affect the rendering of the image; for instance, the image shown
in Figure 1.3 (c) is blurred because either the resolution or the focus of the camera is badly
adjusted. Quite often, the operator's fingers are captured with the document image, as shown
in Figure 1.3 (e), where the document image is altered by the appearance of the fingers of the
operator who scanned the document, due to a lack of means or to negligence. Several other
problems can arise from the
CHAPTER 1: OVERVIEW ON DEGRADED DOCUMENT IMAGE BINARIZATION
improper adjustment of the camera or scanner light and from the improper positioning of the
document, which can cause distortions or shadows in the digitized documents. In Figure 1.3 (d), we can notice a shadow generated by poor lighting and a curved distortion of the acquired document, caused by the document not being flattened during capture. The latter is a recurrent phenomenon during document acquisition, to the point that this issue (a particular type of degradation) is specifically addressed by researchers.
b) Re-producing device
Until recently, printing books or documents was a tedious task. The technical processes were heavy and expensive, and their amortization could only be envisaged for large series. Today, with the advent of high-performance digital printing, it is possible to print a book in very small runs for a very reasonable price. Magazines, brochures, and catalogs, as well as large-run books, are printed on offset machines. This process enables a book to be printed at a low unit cost in large quantities of around a few thousand copies. Operating offset machines requires creating engraved plates, installing them on cylinders, inking, and finally many sophisticated settings and expensive tests. Once these steps are completed, offset machines produce huge volumes in a very short time; the unit cost of a printed copy drops rapidly and approaches the cost of the paper. In contrast, digital printing machines are designed for runs of fewer than a thousand copies. A digital printing machine works, in principle, like a huge photocopier. We can also cite office printers, which reproduce only a few copies, on the order of ten. There is, therefore, no big problem with settings compared to offset-based printing systems.
Overall, the image rendering depends on the following features:
(i) The resolution of the system and the quality of the ink; the latter may generate non-uniform
script intensity, ink seeping, and ink fading, which depend closely on the paper quality
and composition.
(ii) The technology of the printing system (offset, laser, inkjet, old printers, etc.) may also
generate script zones with non-uniform intensity. For instance, certain old printers can
generate non-uniform characters, with non-uniform spaces and gaps between characters,
non-straight lines, and faded characters, as can be observed in Figure 1.3 (f).
Figure 1.4: Document degradation types ((a-b-c-d-e-f-g) from DIBCO datasets, (h) from the National Library of Algiers): (a)(b) Ink fading, (c)(d) Stain, (e) Ink bleed-through, (f) Ink show-through, (g)(h) Non-uniform background, ink fading, and uneven illumination.
To obtain more accurate results, several research studies deal with a single, well-known type of degradation [29]. In many cases, the most dominant degradation is targeted. The degradation types can be classified as follows.
a) Ink degradation
A process known as photodegradation occurs when the light changes the chemical
compositions of the ink and thus causes discoloration. As the compositions of the paper support and of the ink are generally not uniform, non-uniform attenuations are often observed. Ink fading and non-uniform ink intensity of the script can be observed locally or globally on the document. Generally, ink seeping is observed around characters; see Figure 1.4 (a-b).
b) Stain degradation
Stains, smudges, smears, and shadows can be observed locally or globally on the document
at different intensities and scales, see Figure 1.4 (c-d). This usually results in partially poor readability of the documents, sometimes to the point of erasing the document foreground, often irreversibly.
e) Mixed degradation
Generally, these ancient documents undergo several types of degradation at the same time; for instance, Figure 1.4 (g-h) shows several degradations at once (ink fading, shadowing, non-uniform background), which makes their processing more difficult and less effective. However, their processing can be carried out according to the most dominant degradations.
In this chapter, we highlighted the importance of ancient documents and the need to preserve them from different types of damage and degradation. A suitable solution to preserve ancient documents from further degradation is to digitize them, avoiding harmful human handling. Nevertheless,
it is important to note that this action alone is not enough to exploit the document image using ICT tools.
We presented the main degradations and their sources. To deal effectively with each degradation, we classified them into four main categories, namely: ink fading, ink bleed-through, non-uniform background, and stain.
Abstract
OCR tools alone are not enough to recognize documents, especially complex ones such as documents with multiple columns, images, and multiple fonts, as shown in Figure 2.1 (newspaper page from the library "Bibliothèque Nationale de France (BNF)": Source
). Analyzing and recognizing the physical and logical structure of the document image is the key to accurately converting this type of
CHAPTER 2: LITERATURE REVIEW ON ANCIENT DEGRADED DOCUMENT IMAGE BINARIZATION
document images. Indeed, if OCR is applied to the document without taking into account the logical structure of the layout, it produces a document whose characters are converted into American Standard Code for Information Interchange (ASCII) format by scanning the document's characters from left to right, which has the effect of juxtaposing the words of the same lines of different, independent paragraphs. Also, since fonts are not recognized, the output is plain text without logical structure, that is, without information related to logical entities.
Consequently, the produced text will only be a mixture of words and sentences without
meaning. Moreover, it is necessary to consider the difficulty caused by the presence of images
and graphics, which must be extracted and recognized separately. Document Image Analysis and Recognition (DIAR) systems, as shown in Figure 2.2, aim to recognize and extract all document components from the document image to obtain an accurate description of the document structure, and to recognize all the entities and objects present in the document image, in order to obtain a digital version as close as possible to the original (reproducing all the useful information, both logical and physical, of the document image) and suitable for IT tools to find, search, and
handle document information [30-32]. Indeed, DIAR provides document information and features related to the document structure and layout, such as sections, headings, footnotes, references, fonts, blocks, paragraphs, etc., to fully reconstruct the original document from the electronic format and to take advantage of IT and other computer-based tools for indexing and searching the document's contents. The document is then stored in a machine-readable format:
for instance, the image is stored in a jpeg format and the text, after using recognition tools such
as optical character recognition (OCR), is stored in ASCII format. Then, information technology (IT) tools based on information retrieval techniques can be used. However, this last operation, based on OCR tools, is difficult or even impractical to achieve for handwritten and ancient degraded documents. This category of documents presents challenges that remain open research problems. Due to various severe degradations, even printed historical
documents that have suffered over time can present more issues than contemporary handwritten
documents. The degraded and noisy document poses additional problems because of the complex alterations in the quality of the document image, which make it impossible to fully convert the document into an electronic representation using OCR software. To overcome this drawback, which is still a challenging problem in the fields of document image analysis and recognition and of document image analysis and retrieval, many methods based on the document image form have been proposed in the literature to access and manipulate document images without the need for a complete and accurate conversion [33-35]. Word spotting is one of the most common such techniques, using image matching to look for the similarity
of words in degraded documents. Nowadays, word spotting for word and document search and
recognition is a very popular area for the research community to solve several problems related
to the field of document analysis and retrieval [36]. However, to do this and to obtain accurate results, it is necessary to have documents that are as clean as possible: the text (foreground) must be extracted as accurately as possible, without alteration, despite the existence of degradations and complex backgrounds. Document image binarization is the most appropriate technique for this. More details about the binarization process are given in the following subsections.
[Figure 2.2 diagram: from document image acquisition, the system extracts the text structure and features (title, headline, author, footnotes, text lines, check boxes, tables, cells, text blocks, sub-titles) and graphical primitives (horizontal and vertical lines, curves, filled regions, images, faces, etc.).]
Figure 2.2: Overall flowchart of the Document Analysis and Recognition System.
The binarization process is carried out as depicted in Figure 2.3, following several steps: digitization, preprocessing, binarization, and post-enhancement, leading to the final binarized document. The details of the various steps are given in the following.
a) Digitization
In this process, as explained in chapter 1, a degraded document is converted to its electronic
form using an optical acquisition device, such as a camera or a scanner.
b) Preprocessing
This stage aims to enhance the image and prepare it for the next processing steps. Techniques such as color-to-grayscale conversion, noise removal, blur removal, histogram equalization, and filtering are applied to the acquired document image.
c) Binarization stage
Binarization techniques and algorithms are performed to achieve text and background
separation. Document image binarization generally consists of converting the pre-processed gray-level image into a binary document in which the image is represented by two classes: foreground (text) and background (non-text), as shown in Figure 2.4. It is worth noting that in the binarization process, researchers are mainly interested in extracting the text, as shown in Figure 2.4(b). Nevertheless, some works, mainly related to restoration [37, 38], deal with background extraction and enhancement. For instance, the background image is generated from the original image (Figure 2.4(a)) by subtracting the binarized image (the foreground, Figure 2.4(b)) to produce the intermediate background image (Figure 2.4(c)), which contains the pixels belonging to the background and can then be enhanced, if needed, using inpainting techniques to fill in the missing image information [39,
40]. Roughly speaking, a binarization process can be explained simply by the following example: let 𝐼(𝑖, 𝑗) be a matrix representing the pixel gray values of an image at coordinates (𝑖, 𝑗), where 𝑖 and 𝑗 represent the row and column coordinates, respectively. The simplest way to perform a binarization, leading to the binarized image 𝐵(𝑖, 𝑗), is to find a global threshold 𝑇, a gray-level intensity separating the pixels belonging to the foreground from those belonging to the background, according to Eq. (2.1):

𝐵(𝑖, 𝑗) = 1 if 𝐼(𝑖, 𝑗) > 𝑇, and 𝐵(𝑖, 𝑗) = 0 otherwise, for all 𝑖, 𝑗    (2.1)
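The rule of Eq. (2.1) can be sketched in a few lines. This is a toy illustration only; the function name and the sample pixel values are ours, not the thesis's:

```python
import numpy as np

def global_binarize(image: np.ndarray, T: int) -> np.ndarray:
    """Eq. (2.1): B(i, j) = 1 if I(i, j) > T, else 0."""
    return (image > T).astype(np.uint8)

# Toy 2x3 "document": dark ink strokes (30-40) on a bright page (200-210).
I = np.array([[30, 200, 40],
              [210, 35, 205]], dtype=np.uint8)
B = global_binarize(I, T=128)  # dark text pixels map to 0, background to 1
```

A single global 𝑇 works here only because the toy background is uniform; the local methods discussed later exist precisely because real degraded pages violate this assumption.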
Figure 2.4: Document image binarization process (thresholding, then image subtraction image(a) − image(b)): (a) Original image, (b) Binarized image: foreground (text), (c) Intermediate background image (pixels belonging to the background).
d) Post-enhancement
Finally, post-enhancement (post-processing) operations such as grayscale normalization and morphological filters (erosion, dilation) are performed to improve the binarization results, keeping the text as clean as possible, without damage and with as few alterations and artifacts as possible. Typically, the document undergoes a scale normalization and skew angle correction step before addressing the problem of word-based recognition, such as word spotting [35, 36]. Once the document image is binarized and post-processed, tools based on document analysis and retrieval techniques can be used without a full conversion of the degraded document into an electronic representation.
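As an illustrative sketch of the morphological post-processing mentioned above (our own minimal NumPy version, not the thesis's implementation): an opening, i.e. an erosion followed by a dilation with a 3×3 structuring element, removes isolated noise specks while preserving text strokes thicker than the structuring element:

```python
import numpy as np

def _shifted_views(p, shape):
    """Yield the nine 3x3-neighborhood views of a 1-pixel-padded image."""
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            yield p[1 + di:1 + di + shape[0], 1 + dj:1 + dj + shape[1]]

def erode(b: np.ndarray) -> np.ndarray:
    """3x3 binary erosion: a pixel survives only if its whole neighborhood is 1."""
    p = np.pad(b, 1, constant_values=0)
    out = np.ones_like(b)
    for v in _shifted_views(p, b.shape):
        out &= v
    return out

def dilate(b: np.ndarray) -> np.ndarray:
    """3x3 binary dilation: a pixel is set if any neighbor is 1."""
    p = np.pad(b, 1, constant_values=0)
    out = np.zeros_like(b)
    for v in _shifted_views(p, b.shape):
        out |= v
    return out

def opening(b: np.ndarray) -> np.ndarray:
    return dilate(erode(b))

# A 3x3 "text stroke" plus an isolated 1-pixel noise speck.
b = np.zeros((5, 7), dtype=np.uint8)
b[1:4, 1:4] = 1   # stroke
b[2, 5] = 1       # speck
cleaned = opening(b)  # speck removed, stroke preserved
```

Note that on a convention where text is 0 (as in Eq. 2.1), one would open the inverted image, or equivalently apply a closing.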
Binarization of degraded document images has been studied for many years, and various methods have been proposed in the literature [4, 5, 8, 9]. It is very difficult to propose a perfect, generalized binarization algorithm because of the various and complex degradation types; moreover, it is almost impossible to model these degradations. To obtain accurate and effective results, researchers have addressed these issues mainly by focusing on specific degradation types such as ink bleed-through, show-through, ink stains, faint characters, etc. [29, 41, 42]. The various methods and approaches adopted in the literature are detailed in the taxonomy and in the following sub-sections.
[Taxonomy diagram: binarization methods are split into one-sided and double-sided document methods; pixel/histogram-based global thresholding (Otsu, Kittler), local thresholding based on 𝝈(𝒙, 𝒚) and 𝒎(𝒙, 𝒚) (Wolf, Niblack, Sauvola), and texture-based methods (run length, co-occurrence, LBP, Gabor filters, etc.).]
next page. This phenomenon can be considered as a superposition of two layers of the document image [44, 45]: ink bleed-through appears as text with a script direction opposite to that of the foreground (mirrored) and often with lower intensity, whereas show-through appears as text with the same direction as the foreground, also with lower intensity.
Addressing these issues with methods that access the information of both the recto and verso sides of the document image simultaneously, rather than processing each side independently, allows more accurate results to be achieved. The results of double-sided document image binarization methods are an improvement over one-sided ones, provided the documents are acquired under the same conditions.
In [46, 47], the authors tackle the problem by registering the two sides of the document
images. For instance, after registration, the authors in [46] assume that the edges of the foreground strokes are sharper than those of the interfering strokes and that the orientations of the foreground and interfering strokes differ, in order to separate the foreground from the degradation caused by ink seeping from the reverse side. In [48], the authors
use a clustering method based on a set of content-level classifiers to differentiate between the
text and background. The clustering features are the estimated background and the estimated
stroke gray level.
In [44, 49], the authors addressed the problem as blind source separation, based on Independent Component Analysis and on second-order statistics, where a document is modeled as a view of a linear combination of independent patterns (foreground and background).
However, to successfully achieve the registration step, which is time-consuming and not a
simple task, the documents (both sides if they exist) must be scanned with the same equipment
and conditions to avoid deformation such as scale, skew, offset issues, etc. Nevertheless, these
conditions are rarely met and fully satisfied.
(iii) Combined with methods based on a classifier to optimize the related thresholding
parameters.
(iv) Often the classifier-based methods are used as additional methods to optimize the
results or to select the best thresholding-based method.
According to the taxonomy, two notions, global and local, are used; they are explained in the following.
methods depend strictly on the strength of the degradation and on the size of the centered window, which is generally related to the foreground stroke's width; see Figure 2.7(g) and Figure 2.7(h), where the binarization result is also poor.
Figure 2.6: Document image with strong degradation types: (a) Ink bleed-through degradation, (b) Stain degradation.
Figure 2.7: Local and global binarization methods: (a)(b) Original degraded document images, (c)(d) Respective histograms, (e)(f) Binarized document images using a global method, and (g)(h) Binarized document images using a local method.
We also note hybrid approaches using both global and local methods [45]. Moreover, it is worth noting that global methods are generally used as an enhancement (pre-processing) step providing a rough binarization, or for other purposes such as text line or Region Of Interest (ROI) detection. The techniques and methods based on global and local (adaptive) thresholding for degraded document binarization are detailed in the following sub-section.
where 𝑚(𝑥, 𝑦) and 𝜎(𝑥, 𝑦) are respectively the mean and the standard deviation of pixels
estimated within a centered window, namely 𝑊0, at the pixel’s coordinates (𝑥, 𝑦). The size of
W0 and the value of the parameter k are tuned experimentally [54]. Niblack's method is sensitive to illumination and introduces a good deal of noise; moreover, the binarization results are poor for complex degradations such as faint characters and ink bleed-through or show-through. Generally, Niblack's method is used as a rough binarization method because of its ability to binarize the document well enough without altering the foreground.
Sauvola’s method [55] is an improvement on Niblack’s method, especially in dealing with
stained documents when the standard deviation is adapted. Sauvola’s method threshold is given
by the following equation:
𝑇(𝑥, 𝑦) = 𝑚(𝑥, 𝑦) (1 − 𝑘 (1 − 𝜎(𝑥, 𝑦)/𝑅))    (2.3)

where 𝑅 is a global constant used to normalize the standard deviation, generally set to 128, and 𝑘 is a positive parameter set experimentally. The binarization results are improved, especially for highly contrasted and stained documents. However, the method still fails in the case of strong degradations such as ink bleed-through, show-through, and faint characters.
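A direct, unoptimized sketch of Sauvola's rule (Eq. 2.3) follows; the window size, the value of k, and the toy image are illustrative choices of ours, and practical implementations compute the local statistics with integral images for speed:

```python
import numpy as np

def sauvola_binarize(img: np.ndarray, w: int = 15, k: float = 0.2,
                     R: float = 128.0) -> np.ndarray:
    """Per-pixel threshold T = m * (1 - k * (1 - s / R)), Eq. (2.3), where
    m and s are the mean and std of the gray levels inside a w x w window."""
    f = img.astype(np.float64)
    r = w // 2
    out = np.zeros(img.shape, dtype=np.uint8)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = f[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            m, s = win.mean(), win.std()
            T = m * (1.0 - k * (1.0 - s / R))
            out[i, j] = 1 if f[i, j] > T else 0  # 1 = background, 0 = text
    return out

# One dark "ink" pixel (20) on a bright page (220).
img = np.full((9, 9), 220, dtype=np.uint8)
img[4, 4] = 20
binary = sauvola_binarize(img, w=9)
```

In a flat window (s = 0) the threshold drops to m(1 − k), which is what keeps uniform bright background above threshold even under moderate stains.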
Wolf's method [18] is an improvement on Sauvola's method; it solves the problem arising when the gray levels of the foreground and background are close, by normalizing the global contrast of the document image. The threshold is given by the following equation:

𝑇(𝑥, 𝑦) = (1 − 𝑘) 𝑚(𝑥, 𝑦) + 𝑘 𝑀 + 𝑘 (𝜎(𝑥, 𝑦)/𝑅) (𝑚(𝑥, 𝑦) − 𝑀)    (2.4)
where 𝑅 is set to the maximum standard deviation over all local neighborhoods and 𝑀 is the minimum gray-level value of the image pixels. However, since the normalization is global (the 𝑀 and 𝑅 values are global), a small stained patch on the document can significantly alter the binarization of the entire image. Roughly speaking, adaptive methods can be considered more accurate than global methods, though in many cases the estimation of the adaptive threshold can fail drastically in areas with low variance. Niblack's method retains a lot of noise from the background, Sauvola's method produces better results, and Wolf's method performs best, specifically on stained documents. However, strong degradations such as ink bleed-through and faint characters cannot be removed in most cases.
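The per-pixel computation of Eq. (2.4) can be written as a small helper (a hypothetical standalone function of ours; in practice m and σ come from a sliding-window pass over the image, while M and R come from a global pass):

```python
def wolf_threshold(m: float, s: float, M: float, R: float, k: float = 0.5) -> float:
    """Wolf's threshold, Eq. (2.4): T = (1 - k) m + k M + k (s / R)(m - M),
    with m, s the local mean/std, M the image's minimum gray level, and
    R the maximum local std over the whole image."""
    return (1.0 - k) * m + k * M + k * (s / R) * (m - M)
```

A useful sanity check: when s = R (a full-contrast window), the terms combine to give T = m exactly, so such windows are thresholded at their local mean; low-variance windows are instead pulled toward the dark level M. This also makes the global sensitivity visible: a single dark stain lowers M (and may raise R) for the whole image, shifting every local threshold.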
Otsu’s method is one of the widely-used global methods [56] based on computing a global
threshold that maximizes the inter-class variance. Similarly, Kapur's method [57] uses an entropy measure based on the probability distribution of the document image's gray levels. Both global threshold methods are powerful for documents with a bimodal distribution of gray levels. When the distribution of gray levels is not bimodal, adaptive techniques, which adjust the threshold locally according to various measures computed in regions around each pixel, are more appropriate. Various such methods have been developed; for instance, in [12] the author modified Otsu's global method to make it adaptive by considering the background estimation and the stroke width of the foreground script. However, with classical thresholding-based
methods, many insufficiencies remain. Alternatively, the authors in [16] proposed a new
method inspired by Niblack’s in which the standard deviation is replaced by entropy. The
obtained results are promising and could outperform classical threshold-based methods. Similar
methods using other statistics than mean and standard deviation are proposed in [58, 59].
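Otsu's criterion itself is compact enough to sketch directly: scan all 256 candidate thresholds and keep the one maximizing the between-class variance (a minimal, unoptimized version of ours, with our own names and toy data):

```python
import numpy as np

def otsu_threshold(img: np.ndarray) -> int:
    """Return the global threshold t maximizing the between-class variance
    w0 * w1 * (mu0 - mu1)^2 of the gray-level histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    levels = np.arange(256, dtype=np.float64)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()   # class probabilities
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (levels[:t] * p[:t]).sum() / w0  # class means
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t  # pixels < best_t in one class, >= best_t in the other

# Bimodal toy image: ink at 50, paper at 200; any t in (50, 200] separates them.
img = np.array([50] * 100 + [200] * 100, dtype=np.uint8).reshape(10, 20)
t = otsu_threshold(img)
```

On this clean bimodal histogram the criterion is flat over the whole gap between the two modes, which is exactly why the method degrades when degradations fill that gap with intermediate gray levels.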
In [42], as depicted in Figure 2.8, the Curvelet transform is combined with Otsu's method to binarize non-uniformly illuminated images. The Curvelet coefficients are extracted from the degraded document image, and adaptive nonlinear functions are then applied for histogram adjustment and denoising. After reconstruction of the image using the inverse Curvelet transform, Otsu's method is applied for binarization. The Curvelet transform and nonlinear enhancement improve the distribution of the document image histogram. The authors noted that the proposed method leads to better results than classical thresholding-based binarization methods; however, errors remain, mainly on the contours of the image, as shown in Figure 2.9.
[Figure 2.8: Curvelet-based binarization pipeline: the degraded document image is decomposed by the Curvelet transform into low- and high-frequency coefficients, each enhanced by nonlinear denoising; the image is then reconstructed by the inverse Curvelet transform and binarized with Otsu's method.]
Figure 2.9: Curvelet transform-based binarization: (a) Original image, (b) Binarized image [42].
In recent years, the previous methods have often been combined, or used as a rough binarization and/or as an enhancement (pre-processing) or post-enhancement (post-processing) step, to obtain an improved binarized image, as reported in [24-26] and in the various DIBCO competitions [1, 4-8, 60, 61].
The winners of recent DIBCO events achieved their results partially by using these
threshold-based methods with the estimation of the ink and background classes and by using a
dynamic sliding window to enable more accurate individual pixel classification. Other works
reported combining classical methods with other information from the foreground or the
background of the document image. For instance, an adaptive and parameter-less generalization of Otsu's method is achieved by combining multiple parameters of the document [12].
In [14], both background and text stroke information are estimated within a simple threshold-based method, followed by post-processing stages to improve the quality of the binarization. In [29], the authors addressed the problem of faint characters in handwritten document images by using inpainting-based methods; the stroke and background are estimated within a combination of Niblack's and Otsu's methods. In [62], the author makes use of adaptive
image contrasts where an adaptive contrast map is constructed for an input degraded document
image. The contrast map is then binarized and combined with Canny's edge map to identify the
text stroke edge pixels. The document text is further segmented by a local threshold that is
estimated based on the intensities of detected text stroke edge pixels within a local window.
The method is simple and has been tested on three public datasets (DIBCO-2009, DIBCO-2011,
and H-DIBCO-2010), achieving scores that outperform or are close to the best ones in the three
contests. In [63], the author presents an automatic technique for setting the best parameters suited to an individual image, which yields results outperforming the state of the art. Moreover, in [64], a learning framework is introduced for the optimization of binarization methods.
clusters, because mean-shift discovers it automatically [66, 67]. An example of performing the
k-means based classification with k = 3 is shown in Figure 2.10. The k-means method is applied to degraded document images, for instance Figure 2.10(a). According to the histogram shown in Figure 2.10(c), we assume that each histogram peak represents one layer of the degraded gray-level document image (Figure 2.10(b)), namely: foreground (the binarized document image), background, and ink bleed-through degradation, represented respectively by Figure 2.10(d), Figure 2.10(e), and Figure 2.10(f). We notice visually that these classes match the assumption well and that each layer is well extracted; in particular, the binarized image (Figure 2.10(d)) is of good quality. Unfortunately, when we apply the same methodology to another document image (Figure 2.10(g)), the expected layers, represented respectively by Figure 2.10(j), Figure 2.10(k), and Figure 2.10(l), do not match the assumption, despite the presence of similar histogram peaks (Figure 2.10(i)), and the binarized image is very poor (Figure 2.10(j)). We conclude that the classes generated by k-means do not always correspond to the various layers of a degraded document with ink bleed-through degradation. Nevertheless, the method can be very useful as an enhancement (pre-processing) step.
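A minimal sketch of this layer-separation idea (our own illustration, using a deterministic min/median/max initialization rather than random seeding): 1-D k-means with k = 3 on the gray values, where the darkest cluster is assumed to be the foreground, the middle one the bleed-through, and the brightest the background:

```python
import numpy as np

def kmeans_gray(img: np.ndarray, iters: int = 20):
    """1-D k-means (k = 3) on pixel gray values. Returns a label map
    (0 = darkest cluster ... 2 = brightest) and the sorted cluster centers."""
    x = img.ravel().astype(np.float64)
    centers = np.array([x.min(), np.median(x), x.max()])  # deterministic init
    for _ in range(iters):
        # Assign each pixel to its nearest center, then recompute the centers.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for c in range(3):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean()
    order = np.argsort(centers)            # relabel clusters dark -> bright
    remap = np.empty(3, dtype=int)
    remap[order] = np.arange(3)
    return remap[labels].reshape(img.shape), centers[order]

# Toy document: ink near 20, bleed-through near 120, paper near 230.
vals = np.repeat([18, 20, 22, 118, 120, 122, 228, 230, 232], 10)
img = vals.reshape(9, 10).astype(np.uint8)
labels, centers = kmeans_gray(img)
foreground = (labels == 0)  # darkest layer, assumed to be the text
```

The failure mode described in the text corresponds exactly to the assumption in the last line breaking down: when the histogram peaks do not align with the physical layers, the darkest cluster is no longer the foreground.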
In [68], the authors used an adaptive binarization-based method combined with a k-means based method to avoid setting any threshold. In [69], the authors combined Otsu's global method with a k-means based method. In [70], a novel binarization technique combines local and global approaches using the k-means clustering algorithm: the document is first divided into several blocks, a k-means algorithm is applied to each block, and a global phase then gathers the k-means results from the blocks iteratively, looping until global convergence is reached. Similar algorithms were presented in the H-DIBCO 2012 competition [6]. It is worth noting that k-means based methods are often combined with other methods, and they are powerful on degraded documents with uniform backgrounds, especially for ink bleed-through degradations. In [71], the author presented a
method based on a k-means algorithm that is applied sequentially by using a sliding window
where color sample features, set manually, are defined for each class. In [45], the author presented a method based on the k-means clustering algorithm and principal component analysis (PCA). The author addressed the problem of color degraded documents, where PCA is used mainly to decorrelate the colors of the Red, Green, Blue (RGB) channels, after which k-means clustering is applied. The method shows good results on documents with ink bleed-through degradation where the foreground is represented by a single color; on the other hand, it fails on multicolored recto documents. However, we have
noticed that most of these unsupervised approaches, especially when not combined with other
methods, still present difficulties in distinguishing text from non-text components.
Figure 2.10: Binarization based on the k-means method: (a)(g) Original degraded document images, (b)(h) Gray-level degraded document images, (c)(i) Respective histograms, (d)(e)(f) and (j)(k)(l) Respective degraded document layers (foreground, background, ink bleed-through degradation).
In [81], the authors combine k-means classification with a classical binarization method to generate a pure learning set and a conflict class. An SVM classifier is used to manage the conflict class and make the final binarization, classifying each pixel of the document image as foreground or background. Experiments were conducted on the standard DIBCO-2009 and DIBCO-2011 datasets.
In [82], the authors addressed the problem of severely varying illumination in degraded document images. The proposed method divides an image into several regions and decides how to binarize each region with an Otsu thresholding-based method. The decision rules are derived from a learning process based on an SVM, trained using as input features the Otsu threshold of the region, the minimum Otsu threshold of the neighboring regions, the standard deviation, and the mean; the resulting decision is used to select one of the Otsu-based methods. According to the authors, favorable results are obtained. Nevertheless, the evaluation was carried out subjectively and in terms of OCR performance.
Many other methods using neural networks have been proposed in the literature [83-86]. In [85], the authors use a back-propagation neural network to directly classify image pixels according to their neighborhood: for each pixel p of the image, the Multi-Layer Perceptron (MLP) is fed with the gray value of pixel p and those of its neighbors, and should output the value 0 for black or 1 for white. The method was tested on synthetic data and compared mostly against classical thresholding-based methods; the authors report promising results, although the method ranked third, after Sauvola's method. In [86], a neural network is trained using local threshold values of an image to determine an optimal global threshold value, which is then used to binarize the whole image.
Deep learning based and combined methods have also been proposed. In [87], the authors enhance a document's quality by combining Otsu's method with a CNN-based method; the performance was evaluated in terms of OCR accuracy, and the tests were carried out on documents captured with a mobile camera and on newspapers. In [88], the authors propose a method based on a deep convolutional neural network (DCNN) for adaptive binarization of degraded document images. The method decomposes a degraded document image into a spatial pyramid structure using a DCNN, with each layer at a different scale; the foreground image is then sequentially reconstructed from these layers using a deconvolutional network. Experiments were carried out on the DIBCO datasets and compared only against Sauvola's and Otsu's methods, where the results demonstrate the effectiveness of the proposed method.
In [89], the authors propose a method based on a CNN composed of two groups of convolutional layers and a fully connected layer; a sliding window centered at the classified pixel is used within the CNN to classify each pixel as foreground or background. In [90], a
33
CHAPTER 2: LITERATURE REVIEW ON ANCIENT DEGRADED DOCUMENT IMAGE BINARIZATION
34
CHAPTER 2: LITERATURE REVIEW ON ANCIENT DEGRADED DOCUMENT IMAGE BINARIZATION
The image binarization has been performed by extracting texture features from the run-
length histogram combined with Otsu’s method. Candidate thresholds are produced using
iterative Otsu’s method; and then a selection of the optimal candidate is carried out using a
texture feature extracted from the run-length histogram. Experiments with 9,000 printed address
blocks from an unconstrained U.S. mail stream demonstrated that over 99.6 percent of the
images were successfully binarized. The method is mainly compared to Otsu’s method.
Another texture-based method is proposed in [101], where the authors present a color text image binarization method composed of the following main steps: color space dimensionality reduction, extraction of texture characteristics, and selection of the optimal binary image. Two types of effective texture characteristics, the run-length histogram and the spatial size distribution, are extracted. Experiments were carried out on a dataset of more than 500 color text images, and the results are very promising. In [102], a texture-based LBP is applied on degraded document images to extract a region of interest (ROI), on which a rainfall process is performed iteratively; a threshold is then applied to produce the binarized image. The authors report that the LBP descriptor performs well as a text extractor. In our work, we use texture, namely LBP, co-occurrence and Gabor filters, within a thresholding-based method to binarize ancient degraded document images [59, 103-105]; these methods will be detailed in the following chapters.
For the evaluation, two protocols are used, based respectively on blind and unblind datasets [111]. The blind dataset is a collection of DIBCO degraded images grouped according to the year of submission to the contest, where the degradation type is unknown. By contrast, the unblind dataset is a collection of degraded images grouped according to the degradation type [111].
Figure 2.12: Sample of a DIBCO-2016 degraded document image with its ground truth: (a) Original color
degraded document image, (b) Ground truth binarized document image.
Figure 2.13: Sample of a degraded document image from the National Library of Algeria (BNA) used for a
subjective evaluation.
The effectiveness of a proposed method is shown by comparing the results against the well-known thresholding-based methods on unblind and blind datasets, while a comparison with some state-of-the-art methods is carried out on blind datasets. It should be noted that it is more appropriate to compare the performance of a binarization method on the unblind datasets, as mentioned by the authors in [111]. The authors demonstrated that there is an ambiguity in evaluating binarization methods on blind datasets and showed that a binarization method can perform better when it is applied to a specific degradation type.

For this purpose, the authors of [111] have categorized the DIBCO datasets by type of degradation, where each document image is represented by its dominant degradation type. Four main degradation types are defined: stain, ink bleed-through, non-uniform background and ink intensity variation, namely Type A, Type B, Type C and Type D, respectively. To these we add another category, the mixed category, which contains documents affected by several dominant degradation types.
Table 2.1 reports the assignment of DIBCO images organized according to the year of the contest, the number of images in each dataset denoted by #Images, and the written type (WT) denoted by P and H for printed and handwritten documents, respectively. It also shows the dataset organization according to the degradation type and the position of each image in the dataset [111]. For instance, as depicted in Figure 2.14, the handwritten image HW08 indexed 8 in DIBCO-2013 belongs to degradation Type B, the handwritten image H04 indexed 04 in DIBCO-2010 belongs to Type D, the printed image PR06 from DIBCO-2011 belongs to Type C, and the handwritten image HW6 indexed 06 in DIBCO-2011 belongs to the mixed degradation Type D+A. It is worth noting that these categorized datasets can be used in various design steps, such as training, parameter setting and evaluation.
Table 2.1: DIBCO dataset classification according to the degradation type [111].
Dataset WT #Images Degradation type
year A B C D Mixed
2009 P 5 4 1; 2; 5 3
H 5 4 2; 3 1 5(A+C)
2010 H 10 6 5 1; 3 2; 4; 7-10
2011 P 8 3;5 1; 2 4; 6; 7 8
H 8 4; 5 7 1 2; 3; 8 6(A+D)
2012 H 14 2; 8 3; 4; 7; 9-13 [4] (A+D)
2013 P 8 4; 5; 8 1; 2; 6 3 7
H 8 4; 5; 8 2 1;7 3(B+D); 6(B+D+C)
Figure 2.14: Example of degradation type categorization: (a) Type B, (b) Type D, (c) Type C, (d) Type Mixed (D+A).
\[ FM = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall} \quad (2.5) \]

where \( Precision = \frac{TP}{TP + FP} \) and \( Recall = \frac{TP}{TP + FN} \). TP, FP and FN denote, respectively, True Positive (occurs when both the binarized image pixel and the ground truth are labeled as foreground), False Positive (occurs when the
image pixel is labeled as foreground and the ground truth is background) and False Negative
(occurs when the image pixel is labeled as background but the ground truth is foreground).
The PSNR measure denotes the similarity between the binarized image and the ground truth image (the higher the PSNR, the higher the similarity). It is defined as:

\[ PSNR = 10 \cdot \log\!\left(\frac{C^2}{MSE}\right), \quad \text{where } MSE = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N} \big(B(i,j) - GT(i,j)\big)^2}{M \times N} \quad (2.6) \]

where M × N is the size of the binarized document image B(i, j) and of the binarized ground truth GT(i, j), and C denotes the gray-level difference between foreground and background.
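To make the two measures concrete, they can be sketched as follows in Python with NumPy, assuming DIBCO-style 0/1 binary images where foreground pixels are 0 (the function names are ours):

```python
import numpy as np

def fmeasure(binarized, gt):
    """F-Measure of Eq. (2.5): harmonic mean of precision and recall,
    counting foreground (value 0) pixels."""
    fg_b, fg_gt = (binarized == 0), (gt == 0)
    tp = np.sum(fg_b & fg_gt)   # foreground in both images
    fp = np.sum(fg_b & ~fg_gt)  # foreground only in the binarized image
    fn = np.sum(~fg_b & fg_gt)  # foreground only in the ground truth
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def psnr(binarized, gt, c=1.0):
    """PSNR of Eq. (2.6); c is the gray-level difference between
    foreground and background (1 for 0/1 binary images)."""
    mse = np.mean((binarized.astype(float) - gt.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(c ** 2 / mse)
```

A perfect binarization gives FM = 1 and an infinite PSNR; every misclassified pixel lowers both scores.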
The DRD measure denotes the distortion over all the flipped pixels (the lower the DRD, the higher the similarity), and is defined as follows:

\[ DRD = \frac{\sum_{k=1}^{S} DRD_k}{NUBN} \quad (2.7) \]

where \( DRD_k \) is the distortion of the \( k^{th} \) flipped pixel and S is the number of flipped pixels.
The parameter NUBN is the number of non-uniform 8×8 blocks in the Ground Truth (GT) image. The distortion of the \( k^{th} \) flipped pixel is defined as a weighted sum over the 5×5 block \( W_{Nm} \) of the GT that differs from the \( k^{th} \) flipped pixel centered at (x, y) in the binarized image B, as follows:

\[ DRD_k = \sum_{i=-2}^{2} \sum_{j=-2}^{2} |B_k(x, y) - GT_k(i, j)| \times W_{Nm}(i, j) \quad (2.8) \]
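A minimal sketch of Eqs. (2.7)-(2.8), assuming 0/1 binary NumPy arrays and the usual normalized inverse-distance 5×5 weight matrix (border handling is simplified to edge padding):

```python
import numpy as np

def drd(binarized, gt):
    """Distance Reciprocal Distortion of Eqs. (2.7)-(2.8)."""
    # Normalized 5x5 inverse-distance weight matrix W_Nm (centre weight 0).
    ii, jj = np.mgrid[-2:3, -2:3]
    dist = np.hypot(ii, jj)
    dist[2, 2] = 1.0            # avoid division by zero at the centre
    w = 1.0 / dist
    w[2, 2] = 0.0
    w /= w.sum()
    # NUBN: number of non-uniform (neither all-0 nor all-1) 8x8 GT blocks.
    h, wd = gt.shape
    nubn = 0
    for r in range(0, h - 7, 8):
        for c in range(0, wd - 7, 8):
            s = gt[r:r + 8, c:c + 8].sum()
            if 0 < s < 64:
                nubn += 1
    # Sum DRD_k over all flipped pixels.
    gp = np.pad(gt.astype(float), 2, mode='edge')
    total = 0.0
    for x, y in zip(*np.nonzero(binarized != gt)):
        total += np.sum(np.abs(binarized[x, y] - gp[x:x + 5, y:y + 5]) * w)
    return total / max(nubn, 1)
```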
In this chapter, we presented the different stages based on digitization and binarization. We showed the possibilities, the difficulties and the challenges encountered by researchers in this field, and highlighted recent publications and the challenges they have faced. It is worth noting that previous methods are based on a spatial representation of the document image, in which pixel gray-level intensity or simple pixel neighborhood information is often combined with the classical well-known thresholding-based methods and/or with certain heuristic features such as stroke width and background estimation. Moreover, it should be noted that there is no generic method to process
all the degradations, and the problem is even more complicated with severe degradations. Therefore, a taxonomy has been developed and presented.
In conclusion, we observed that most methods are generally a succession of different techniques, in which classical thresholding methods are predominantly combined or used as an approximate binarization step.

To the best of our knowledge, texture and frequency-space features have received little attention to date, and few works related to degraded document image binarization have been reported. This motivated our choice to develop them further, while trying to avoid the use of other methods, in particular the classical thresholding-based methods, as a rough pre-processing.
Abstract
To overcome the drawback of using simple pixel neighborhood information, the proposed method is an adaptive threshold-based method in which the threshold is computed using a descriptor derived from a co-occurrence matrix. The method is tested objectively using degraded documents from the DIBCO datasets and subjectively using a set of ancient degraded documents provided by a national library. The results are satisfactory and promising, and present an improvement compared to the classical well-known methods.
First and second-order statistics are among the most used texture descriptors in image segmentation, and particularly in document image segmentation [113-118]. The co-occurrence matrix is one of the best-known and most widely used texture descriptors. The Gray Level Co-occurrence Matrix (GLCM) is a second-order statistics-based method used for generating texture features. It makes it possible to build a new matrix from the whole image or from a part of it. The construction of the co-occurrence matrix considers the orientation and the spatial distribution of the pixels.
CHAPTER 3: CO-OCCURRENCE MATRIX-BASED DEGRADED DOCUMENT IMAGE BINARIZATION
The GLCM represents the joint frequencies of all pairwise combinations of gray-level intensities within a distance d, expressed as a number of pixels, and along a direction θ, as shown in Figure 3.1. In other words, the GLCM estimates image properties based on the joint probability of gray-level occurrences of two pixels separated by a given direction and distance, defined either by the displacement vector \( \vec{d} = (\Delta x, \Delta y) \) or by a distance d and an angle θ, as seen in Figure 3.1.
Figure 3.1: Principle of the co-occurrence matrix: (a) a sample sliding window (size 5×5) showing the 8 directions with a distance d = 2, (b) the sliding window within the image matrix.
Given a gray-level image represented by a matrix I(x, y) of dimension N×M, as shown in Figure 3.1(b), each component of the GLCM(i, j) matrix of dimension \( N_G \times N_G \) (where \( N_G \) is the number of gray levels of the image I(x, y)) represents the frequency of occurrence of a pair of pixel gray levels in a spatial relation separated by the distance d and angle θ. It is computed for one direction of the neighborhood displacement vector \( \vec{d} = (\Delta x, \Delta y) \), or equivalently in terms of a distance d and an angle θ, by the following equation:

\[ GLCM(i, j) = \sum_{x=1}^{N} \sum_{y=1}^{M} \begin{cases} 1, & \text{if } I(x, y) = i \text{ and } I(x + \Delta x, y + \Delta y) = j \\ 0, & \text{otherwise} \end{cases} \quad (3.1) \]
An example of a GLCM matrix is depicted in Figure 3.2: given a gray-level image represented by a matrix I(x, y) where each pixel is coded on n bits (here n = 2, i.e. the gray level belongs to [0, 3]), a GLCM matrix is computed according to Eq. (3.1) for a distance of one pixel (i.e., d = 1) and a direction of angle θ = 0°. As highlighted in the I matrix, the gray-level pair (1, 2) occurs twice.
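As an illustration of Eq. (3.1), a direct (unoptimized) computation of the GLCM for a given offset can be sketched as follows; the function name and the (row, column) offset convention are ours:

```python
import numpy as np

def glcm(image, offset=(0, 1), levels=4):
    """Co-occurrence counts for pixel pairs separated by `offset`
    (row, column), e.g. (0, 1) for d = 1 and theta = 0 degrees."""
    m = np.zeros((levels, levels), dtype=int)
    rows, cols = image.shape
    dr, dc = offset
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[image[r, c], image[r2, c2]] += 1
    return m
```

On a small 2-bit image, the pair (1, 2) is counted each time a pixel of value 1 has a right neighbor of value 2.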
Figure 3.3: Co-occurrence matrix image of a degraded document image: (a) Original image, (b) Co-occurrence matrix image.
From the co-occurrence matrix, we compute some relevant Haralick parameters [119]; some of these textural features are given by the following equations.
\[ Contrast(GLCM) = \sum_{i=0}^{N_G-1} \sum_{j=0}^{N_G-1} (i - j)^2 \, GLCM(i, j) \quad (3.2) \]

\[ mean(GLCM) = \sum_{i=0}^{N_G-1} \sum_{j=0}^{N_G-1} GLCM(i, j) \quad (3.3) \]

\[ energy(GLCM) = \sum_{i=0}^{N_G-1} \sum_{j=0}^{N_G-1} GLCM(i, j)^2 \quad (3.4) \]

\[ uniformity(GLCM) = \sum_{i=0}^{N_G-1} GLCM(i, i)^2 \quad (3.5) \]

\[ homogeneity(GLCM) = \sum_{i=0}^{N_G-1} \sum_{j=0}^{N_G-1} \frac{GLCM(i, j)}{1 + |i - j|} \quad (3.6) \]
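The Haralick attributes of Eqs. (3.2)-(3.6) can be sketched directly on the raw co-occurrence counts (in practice the GLCM is often normalized to probabilities first, which the equations above leave implicit):

```python
import numpy as np

def haralick(g):
    """Contrast, mean, energy, uniformity and homogeneity
    of Eqs. (3.2)-(3.6) from a co-occurrence matrix g."""
    i, j = np.indices(g.shape)
    return {
        'contrast': float(np.sum((i - j) ** 2 * g)),
        'mean': float(np.sum(g)),
        'energy': float(np.sum(g ** 2)),
        'uniformity': float(np.sum(np.diag(g) ** 2)),
        'homogeneity': float(np.sum(g / (1 + np.abs(i - j)))),
    }
```

A purely diagonal GLCM (pixels co-occurring only with their own gray level) gives zero contrast and maximal homogeneity, which is the intuition behind using contrast to separate background from foreground.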
The aim of the proposed method is a new binarization method for ancient degraded documents based on texture, and particularly on the co-occurrence matrix, addressing the document binarization field with a new threshold-based approach. The method relies on texture features extracted from the document image pixels. The main motivation for using the co-occurrence matrix is its high sensitivity to directional texture information and its discriminative power. The proposed method is a pixel-wise adaptive thresholding method based on texture features; its overall scheme is shown in Figure 3.4.
Let m(x, y) and α be, respectively, the mean and a texture feature estimated for each pixel within a neighborhood window W, computed respectively from the degraded document image and from the extracted co-occurrence matrix. The binarization threshold is then deduced using one of the classical well-known methods, namely Niblack's.
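The resulting pixel-wise rule can be sketched as follows, assuming the Niblack-style combination T(x, y) = m(x, y) + k · α(x, y) (our reading of f(m, α)); the local statistics are computed naively here:

```python
import numpy as np

def local_mean(img, w):
    """Mean of each pixel's w x w neighborhood (edge-padded)."""
    pad = w // 2
    p = np.pad(img.astype(float), pad, mode='edge')
    out = np.empty(img.shape)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = p[r:r + w, c:c + w].mean()
    return out

def texture_binarize(img, alpha, w=41, k=-1.0):
    """Niblack-inspired threshold T = m + k * alpha, where alpha is the
    per-pixel texture feature (e.g. sqrt of the local GLCM contrast)."""
    t = local_mean(img, w) + k * alpha
    return (img > t).astype(np.uint8)   # 1 = background, 0 = foreground
```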
[Figure 3.4: Flowchart of the proposed method: grayscale conversion → co-occurrence matrix GLCM(i, j) → local mean m(x, y) and texture feature α → threshold T(x, y) = f(m, α) → background/foreground decision.]
In summary, the proposed method is performed according to the following steps as depicted
in Figure 3.4. Main details will be given in the following subsections.
The set of representative degraded images is selected from the DIBCO-2009, DIBCO-2010,
DIBCO-2011 datasets.
Figure 3.5 depicts the steps for finding the best parameters of the thresholding-based method. The F-Measure (FM), evaluated between the binarized image and the corresponding ground truth, is used as the metric.
[Flowchart: set of degraded images → co-occurrence matrix → Haralick's attributes → binarization with candidate parameters → F-Measure (FM).]
Figure 3.5: Overall steps related to the parameters optimization of the threshold-based binarization method.
as seen in Table 3.1, for θ = 0°, 45°, 90° and 135° and d = 1; the best attribute is then chosen.

According to the results, the contrast attribute is the most discriminative (it yields the greatest difference between background and foreground compared with the other Haralick descriptors). Therefore, we consider only this attribute and experimentally define α = √contrast(GLCM) in Eq. (3.6), which gives better results.
Figure 3.6: Sample of background and foreground document image: (a) Background, (b) Foreground.
Table 3.1: Haralick’s attributes values for Background and foreground images.
Co-occurrence Contrast Mean uniformity Energy
direction
θ=0 Background 0.0002 0.71 0.485 0.45
Foreground 0.0104 0.97 0.075 0.04
θ=45° Background 0.0001 0.51 0.119 0.49
Foreground 0.0060 0.97 0.025 0.04
θ=90° Background 0.0001 0.71 0.485 0.25
Foreground 0.0100 0.97 0.005 0.07
θ=135° Background 0.0001 0.81 0.650 0.29
Foreground 0.0096 0.90 0.060 0.04
[Plot: F-Measure versus window size W×W.]

[Plot: F-Measure versus the constant value k.]
Table 3.2 shows the experimental results of the most well-known local classical methods. The evaluation is performed on the DIBCO dataset organized by type of degradation according to [111]. A cumulative rank is computed for each degradation type: for each method and each of the two evaluation measures, FM and PSNR, we compute the rank R(i, j), where i is the method and j is the measure; the final ranking Rank(i, j) is obtained by sorting the sum of the ranking values over all measures. From Table 3.2, we can notice that the best results for the various types of degradation, such as stain, ink bleed-through and non-uniform background, are obtained by Nick's method.
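The cumulative ranking can be sketched as follows, assuming that higher is better for both FM and PSNR:

```python
def cumulative_rank(scores):
    """scores: {method: (FM, PSNR)}. Rank the methods per measure,
    sum the ranks, and return the methods sorted by cumulative rank."""
    total = {m: 0 for m in scores}
    for measure in range(2):
        ordered = sorted(scores, key=lambda m: -scores[m][measure])
        for rank, m in enumerate(ordered, start=1):
            total[m] += rank
    return sorted(scores, key=lambda m: total[m])
```

On the stain rows of Table 3.2 this reproduces the reported order: Nick, Sauvola, Wolf, Bernsen, Niblack.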
The method is implemented, as mentioned, using Eq. (3.6), and all the parameters are set experimentally. We computed the values of k and of the window size w that provide the best F-measure for each direction {0°, 30°, 45°, 135°}. Two Haralick attributes are selected: contrast and mean. The best results are obtained with the contrast attribute, so we detail the results only for this attribute, as shown in Table 3.3.
We notice that the results are enhanced and promising. The proposed method based on the contrast attribute with angle θ = 135° outperforms Nick's method for all degradation types except stain degradation, for which some weaknesses remain and our method is ranked 2nd. For the subjective evaluation, the samples of processed document images shown in Figures 3.9, 3.10, 3.11, 3.12 and 3.13 confirm the quantitative results.
Table 3.2: Evaluation results of well-known methods for binarization by type of degradations.
Degradation Method FM PSNR Rank
Bernsen 64.14 9.64 4
Niblack 58.13 7.83 5
Stain Sauvola 81.60 13.60 2
Wolf 80.96 12.49 3
Nick 81.98 14.49 1
Bernsen 62.30 10.17 4
Niblack 54.89 7.86 5
Ink-bleed through Sauvola 84.40 15.14 2
Wolf 83.20 15.08 3
Nick 84.41 15.16 1
Bernsen 43.26 7.91 4
Niblack 37.88 6.16 5
Non-uniform background Sauvola 82.73 16.31 3
Wolf 84.50 16.51 2
Nick 85.14 17.06 1
Table 3.3: Evaluation results of the proposed method and Nick’s method, for binarization by type of
degradations.
Degradation type          Method                  FM      PSNR    Rank
Stain                     Nick                    81.98   14.49   1
                          M1: Contrast θ = 0°     75.71   13.38   3
                          M2: Contrast θ = 30°    74.69   13.28   5
                          M3: Contrast θ = 45°    75.61   13.35   4
                          M4: Contrast θ = 135°   77.76   14.40   2
Ink-bleed through         Nick                    84.41   15.16   5
                          M1: Contrast θ = 0°     91.70   16.07   2
                          M2: Contrast θ = 30°    91.60   16.04   4
                          M3: Contrast θ = 45°    91.65   16.05   3
                          M4: Contrast θ = 135°   92.56   16.09   1
Non-uniform background    Nick                    85.14   17.06   4
                          M1: Contrast θ = 0°     91.60   17.10   2
                          M2: Contrast θ = 30°    91.30   15.065  5
                          M3: Contrast θ = 45°    91.35   15.65   3
                          M4: Contrast θ = 135°   91.70   17.40   1
Figure 3.9: Sample for a subjective evaluation on degraded document images from the DIBCO-2009 dataset with distance = 1, window size = 41 and k = -1: (a) Haralick's contrast feature with angle = 0°, (b) Haralick's contrast feature with angle = 135°.
Figure 3.10: Sample for a subjective evaluation on degraded document images from the DIBCO-2011 dataset with distance = 1, window size = 41 and k = -1: (a) Haralick's contrast feature with angle = 0°, (b) Haralick's contrast feature with angle = 135°.
Figure 3.11: Sample for a subjective evaluation on degraded document images from the DIBCO-2009 dataset with distance = 1, window size = 41 and k = -1: (a) Haralick's mean feature with angle = 0°, (b) Haralick's mean feature with angle = 135°.
Figure 3.12: Sample for a subjective evaluation on degraded document images from the DIBCO-2011 dataset with distance = 1, window size = 41 and k = -1: (a) Haralick's mean feature with angle = 0°, (b) Haralick's mean feature with angle = 135°.
Figure 3.13: Sample for a subjective evaluation on degraded document images with angle = 135°, distance = 1, window size = 41×41 and k = -1: (a) original image, (b) binarized image using Nick's method, (c) proposed method.
We have presented a new texture-based thresholding method. The latter is inspired by Niblack's method and enhanced with a robust texture descriptor based on the co-occurrence matrix. We used the co-occurrence matrix to compute some of Haralick's attributes, namely contrast and mean, for a displacement-vector modulus equal to one and four directions {0°, 30°, 45°, 135°}. The best parameters were defined experimentally: a sliding window size of 41×41 and an angle of 135°, with which the method based on Haralick's contrast attribute performs best. The results are promising and satisfactory. Nevertheless, we notice some weaknesses on the stained document category. In future work, we will focus on this type of degradation and investigate more robust texture attributes.
Abstract
This chapter aims to present a new binarization method for degraded documents,
based on Local Binary Pattern (LBP) as a texture measure. The mean and variance
of pixels are computed respectively from the original document image and the LBP
image. Then, these features are used within a threshold-based method to perform a
binarization.
Because of its discriminative power, its computational simplicity, and its popularity in the field of image segmentation and identification, we investigate the Local Binary Pattern (LBP) as a texture measure to improve the discrimination of the foreground from the background and obtain accurate results in the field of ancient degraded document image binarization. The LBP space is used for estimating the features by computing the variance. The proposed method consists of a new adaptive thresholding method based on LBP; more precisely, we use the LBP within Sauvola's method for estimating the binarization threshold.
LBP is a form of the texture spectrum proposed in [120] [121]. The conventional LBP operator [96] [122] [123] is computed at each pixel location by considering P pixels \( q_p \) within a circular neighborhood of radius r around a central pixel value \( q_c \). It is defined as follows:
\[ LBP(P, r) = \sum_{p=0}^{P-1} s(q_p - q_c) \, 2^p \quad (4.1) \]

where P is the number of pixels in the neighborhood, r is the radius, and s(x) = 1 if x ≥ 0 and 0 otherwise.

CHAPTER 4: LBP-BASED DEGRADED DOCUMENT IMAGE BINARIZATION
Figure 4.1 visually describes the simple concept of LBP, and Figure 4.2 shows an example of the LBP image of a degraded document from the DIBCO datasets. It should be noted that the LBP image highlights the contours and the different textures of the document image.
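A minimal sketch of the basic operator of Eq. (4.1) for P = 8, r = 1 follows; the neighbor ordering is one possible convention:

```python
import numpy as np

def lbp_image(img):
    """Basic 8-neighbour LBP: each neighbour q_p >= q_c contributes 2^p."""
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]   # p = 0..7
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            code = 0
            for p, (dr, dc) in enumerate(offsets):
                if img[r + dr, c + dc] >= img[r, c]:
                    code |= 1 << p
            out[r, c] = code
    return out
```

On a flat region every neighbor equals the centre, so the code is 255; an isolated bright pixel gets code 0, which is why LBP responds strongly to edges and strokes.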
Figure 4.2: LBP image of degraded document: (a) Degraded document from Dibco-2012 dataset, (b) LBP
image.
Usually, the histogram used jointly with a simple local contrast measure provides the best performance in texture segmentation. Moreover, the grayscale variance of the local neighborhood can be used as a complementary measure [123]. It is also reported that distributions of these binary codes are used to describe the texture of the image, and many other variants exist in the literature [96]. Approaches based on this operator have been developed for texture image segmentation, especially in the field of color texture [124] [125]. However, to the best of our knowledge, no work has yet been reported in the field of binarization, and especially in degraded document binarization.
The proposed method consists of a new adaptive thresholding method based on texture, and particularly on LBP, to binarize ancient degraded documents. The motivation for using the LBP is its discriminative power, as reported in the literature. From the LBP image, we extract the variance of each pixel within its neighborhood as texture information, which is combined with the mean of that pixel within its neighborhood computed from the original image, as depicted by the flowchart in Figure 4.3.
[Figure 4.3: Flowchart of the proposed method: input degraded image I(x, y) → grayscale conversion → LBP → threshold T(x, y) = f(m, σ) → background/foreground decision.]
[Flowchart: parameter optimization over a set of representative degraded images with ground truth; each thresholding configuration is evaluated with the F-Measure (FM).]
Thus, normalization is done by dividing each element of the LBP image by the local contrast, denoted C:

\[ LBP\_C = \frac{LBP}{C} \quad (4.2) \]

where \( C = \frac{C_p}{N_p} - \frac{C_n}{N_n} \), with \( C_p \) the sum of the neighborhood pixel values ≥ \( q_c \) and \( N_p \) their number, while \( C_n \) and \( N_n \) are defined analogously for the pixel values < \( q_c \).
Figure 4.5: Sample of LBP and original image variances: (a) original image, (b) original image variance, (c)
LBP image, (d) LBP variance.
Figure 4.6: Flowchart for estimating mean and variance from original and LBP images.
To perform the binarization by keeping the better of the two variants, we propose to use Sauvola's thresholding to generate a binarized image that serves as a reference against which SLBP and SLBP-C are compared using the F-measure. The implementation of this method, namely SLBP-SLBPC, is depicted by the flowchart shown in Figure 4.7.
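The SLBP thresholding rule can be sketched as follows, assuming Sauvola's formula T = m · (1 + k(σ/R − 1)) with σ computed on the LBP image instead of the original; R = 128 and the naive local statistics are our assumptions:

```python
import numpy as np

def slbp_binarize(img, lbp, w=41, k=0.2, R=128.0):
    """Sauvola-style threshold with sigma computed on the LBP image."""
    pad = w // 2
    pi = np.pad(img.astype(float), pad, mode='edge')
    pl = np.pad(lbp.astype(float), pad, mode='edge')
    out = np.empty(img.shape, dtype=np.uint8)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            m = pi[r:r + w, c:c + w].mean()       # mean of original image
            s = pl[r:r + w, c:c + w].std()        # std of LBP image
            t = m * (1 + k * (s / R - 1))
            out[r, c] = 1 if img[r, c] > t else 0  # 1 = background
    return out
```

The SLBP-C variant is identical except that the normalized LBP_C image replaces the LBP image.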
[Figure 4.7: Flowchart of the SLBP-SLBPC method: the SLBP and SLBP-C binarizations are each scored with the FM against the Sauvola reference image, and the variant with the optimal FM is retained.]
• Experimental evaluation using FM measure between the binarized image and the
corresponding ground truth.
[Figure 4.8: Flowchart of the parameter search: first set the initial W = 19 and vary the k parameter to the optimal value k_opt; then, initializing k = k_opt, vary W to the optimal value W_opt. Each candidate binarization is scored with the F-Measure (FM) against the ground-truth image.]
According to Figure 4.8, given an initial window size W = 19×19 (following the literature on Sauvola's thresholding-based method), we vary k and, for each value, perform a binarization and compute the F-Measure between the binarized image and the corresponding ground truth; the higher the FM, the better the k value, and the best one is retained as k_opt. Once k has been selected, the same procedure is used to define the optimal sliding window size W_opt, this time setting k = k_opt and varying W. This experiment leads to W = 41×41 and k = 0.2.

Figure 4.9 shows an example on the document image H02 from DIBCO-2009, where the sliding window size is initialized at W = 19×19 and k varies from 0.1 to 0.6 with a step of 0.1 to find k_opt, while Figure 4.10 shows an example on the same document image where k is fixed at k_opt = 0.2 and W varies between 3 and 51 in order to find W_opt.
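The two-stage search can be sketched as follows, with `binarize(w, k)` and `score(result)` standing in for the binarization and its FM evaluation (both hypothetical callables):

```python
def tune(binarize, score, k_values, w_values, w_init=19):
    """Stage 1: fix W = w_init and pick k_opt by best FM.
    Stage 2: fix k = k_opt and pick W_opt by best FM."""
    k_opt = max(k_values, key=lambda k: score(binarize(w_init, k)))
    w_opt = max(w_values, key=lambda w: score(binarize(w, k_opt)))
    return w_opt, k_opt
```

This greedy coordinate-wise search is cheaper than a full grid search over all (W, k) pairs, at the cost of assuming the two parameters are roughly independent.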
[Figure 4.9: F-measure versus the value of k, for W = 19×19.]

[Figure 4.10: F-measure versus the window size W, for k = 0.2.]
[Figure: binarization results on H08 (DIBCO-2012) and P02 (DIBCO-2009) with Niblack, Sauvola, Wolf, Nick and SLBP-C.]
Figure 4.11: Subjective comparison between classical thresholding-based methods against SLBP-C method.
Figure 4.12: Subjective evaluation on a sample of binarized images: (a-b) original images with stain degradation, (c-d) SLBP method, (e-f) SLBP-C method.
Figure 4.13: Subjective evaluation on a sample of binarized images: (a-b) original images with non-uniform background degradation, (c-d) SLBP method, (e-f) SLBP-C method.
Figure 4.14: Subjective evaluation on a sample of binarized images: (a-b) original images with ink fading degradation, (c-d) SLBP method, (e-f) SLBP-C method.
Figure 4.15: Subjective evaluation on a sample of binarized images: (a-b) original images with ink-bleed-through degradation, (c-d) SLBP method, (e-f) SLBP-C method.
Table 4.1: Performance evaluation of the proposed method against the classical threshold based on DIBCO
datasets.
Datasets Method FM PSNR DRD
Sauvola 77.55 15.65 7.23
Niblack 40.97 6.06 106.02
DIBCO 2009 Wolf 81.59 16.38 3.86
S-LBP 79.12 15.67 7.33
S-LBP-C 85.53 16.40 3.12
Sauvola 77.53 15.80 5.44
Niblack 38.08 5.79 107.03
DIBCO 2011 Wolf 83.71 16.40 4.53
S-LBP 76.63 15.13 6.37
S-LBP-C 82.07 15.59 8.13
Sauvola 60.69 15.44 10.31
Niblack 31.35 5.79 109.37
DIBCO 2012 Wolf 71.05 16.11 8.32
S-LBP 68.42 15.68 8.50
S-LBP-C 80.78 17.06 5.90
Sauvola 83.12 16.94 5.23
Niblack 36.62 5.86 111.95
DIBCO 2013 Wolf 81.60 16.20 4.73
S-LBP 79.76 16.08 10.18
S-LBP-C 84.35 17.04 4.07
Table 4.2: Performance evaluation of the proposed method against the classical threshold-based methods
according to the degradation type.
Degradation type Method FM PSNR DRD
Sauvola 83.12 15.80 4.82
Niblack 37.58 05.91 96.40
Stain Wolf 85.88 15.31 4.26
S-LBP 75.97 13.21 5.12
S-LBP-C 88.17 16.00 4.10
Sauvola 82.26 15.75 4.12
Niblack 37.55 6.10 115.04
Ink-bleed
Wolf 82.20 15.78 4.42
through
S-LBP 88.53 17.96 3.80
S-LBP-C 85.81 16.86 4.10
Sauvola 74.82 14.54 6.51
Niblack 32.38 05.46 144.61
Non-uniform
Wolf 74.26 15.73 4.69
Background
S-LBP 78.09 14.89 4.69
S-LBP-C 78.85 15.80 4.50
Sauvola 57.57 14.27 10.91
Niblack 32.60 06.11 91.55
Ink Wolf 63.83 14.81 9.64
S-LBP 73.92 14.71 5.01
S-LBP-C 84.50 17.11 4.25
The purpose of this chapter was to investigate the use of the LBP operator for estimating the variance as a texture measure within a thresholding method for binarizing historical documents. Two LBP variants, the basic LBP and the modified LBP taking the contrast into account, combined with Sauvola's thresholding-based method, were implemented and compared.

In general, the experimental results conducted on the DIBCO datasets show that the LBP and its variants constitute an interesting approach for improving existing binarization methods. SLBP outperforms SLBP-C for ink-bleed-through degradation; moreover, their combination outperforms the well-known thresholding-based methods.

The large gap observed, in terms of FM and the other measures, with respect to Sauvola's method on particular document images should be addressed in future work.
Abstract
For the best discrimination between the foreground and background, a suitable framework based on a Gabor filter bank is designed by considering the orientation of the foreground script and the type of degradation. As the proposed method is based on the joint use of the Gabor filter combined with a classical local threshold-based method, an overview of the Gabor filter technique is presented in the following sections.
The Gabor filter was proposed by Dennis Gabor and is considered one of the best methods
for texture analysis and processing [23]. A Gabor wavelet is defined as a complex sinusoidal
plane wave modulated by a Gaussian kernel function as follows.
CHAPTER 5: GABOR FILTER-BASED DEGRADED DOCUMENT IMAGE BINARIZATION
\[ H(x, y, f, \theta) = \frac{1}{2\pi\sigma\rho} \exp\!\left[ -\frac{1}{2}\left( \frac{x'^2}{\sigma^2} + \frac{y'^2}{\rho^2} \right) \right] e^{j2\pi f x'} \quad (5.1) \]

where \( x' = x\cos\theta + y\sin\theta \), \( y' = -x\sin\theta + y\cos\theta \), and (x, y) are the spatial coordinates. σ and ρ are the respective standard deviations along the x and y axes of the Gabor filter's Gaussian kernel; they describe the size of the Gaussian envelope and define the scale of the filter along the spatial axes. Additionally, f and θ are the central spatial frequency and the orientation of the filter, respectively.
Two-dimensional Gabor filtering is performed by convolving the Gabor wavelet H with the input image I(x, y) at frequency f and angle θ, as follows:

\[ I_F(x, y, f, \theta) = H(x, y, f, \theta) * I(x, y) \quad (5.2) \]

Since the filter is linear, the Gabor filter bank implementation for all possible orientations at a specific frequency can be written as:

\[ I_F(x, y, f) = \sum_{j=0}^{N_\theta - 1} H(x, y, f, \theta_j) * I(x, y) \quad (5.3) \]
Let \( G(x, y, f_i) = \sum_{j=0}^{N_\theta - 1} H(x, y, f_i, \theta_j) \) be the Gabor filter bank over all possible orientations defined for a specific frequency \( f_i \). Then, the filtered image for all possible frequencies is obtained by summing the filtered images over all orientations as follows:
$$I_F(x, y) = \sum_{i=0}^{N_f - 1} G(x, y, f_i) * I(x, y) \qquad (5.5)$$
Finally, for a suitable implementation, the whole bank is collapsed into a single mask and the filtered image can be written as follows:

$$I_F(x, y) = \left[\sum_{i=0}^{N_f - 1} \sum_{j=0}^{N_\theta - 1} H(x, y, f_i, \theta_j)\right] * I(x, y) \qquad (5.6)$$
The filtered image thus contains features captured from the degraded image for all possible
frequencies and orientations. Figure 5.1 depicts the general structure of the Gabor filter
implementation.
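As a sketch of Eq. (5.6), the whole bank can be summed into one composite mask and applied with a single convolution. The helper names are ours, the frequency/orientation grid is illustrative, and the "same"-size convolution is done through zero-padded FFTs.

```python
import numpy as np

def gabor_kernel(size, f, theta, sigma=3.0, rho=3.0):
    """Gabor wavelet of Eq. (5.1)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-0.5 * (xr**2 / sigma**2 + yr**2 / rho**2))
            * np.exp(2j * np.pi * f * xr) / (2 * np.pi * sigma * rho))

def conv_same(img, kernel):
    """Linear 2D convolution, output the size of img, via padded FFTs."""
    H, W = img.shape
    kh, kw = kernel.shape
    sh, sw = H + kh - 1, W + kw - 1
    full = np.fft.ifft2(np.fft.fft2(img, (sh, sw)) * np.fft.fft2(kernel, (sh, sw)))
    return full[kh // 2:kh // 2 + H, kw // 2:kw // 2 + W]

def gabor_bank_filter(img, freqs, n_theta, size=11):
    """Eq. (5.6): one composite mask summed over all frequencies and
    orientations, then a single convolution with the image."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    mask = sum(gabor_kernel(size, f, t) for f in freqs for t in thetas)
    return conv_same(img, mask)

img = np.random.default_rng(0).random((64, 64))
IF = gabor_bank_filter(img, freqs=[0.14], n_theta=8)
```

Because the bank is linear, filtering a scaled image scales the output identically, which is what justifies collapsing the double sum into one mask.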
Figure 5.1: Simplified implementation of the Gabor filter bank: the degraded image $I(x, y)$ feeds the kernels $H_j^i$ grouped by frequency $f_0, \ldots, f_{N_f-1}$, whose outputs are summed into the filtered image $I_F(x, y)$. For simplified notation, $H(x, y, f_i, \theta_j)$ and $G(x, y, f_i)$ are denoted $H_j^i(x, y)$ and $G_i(x, y)$, respectively, with $i = 0, \ldots, N_f - 1$ and $j = 0, \ldots, N_\theta - 1$. $N_f$ and $N_\theta$ are the numbers of central frequencies and orientations, respectively.
The aim of the proposed method is a new binarization method for ancient degraded documents based on a Gabor filter bank. It addresses the document binarization field with a new threshold-based method that uses a texture feature derived from Gabor filters while taking into account the type of degradation. The proposed method is inspired by the classical ones: it relies not only on the neighborhood of each document image pixel but also on texture features extracted from the image filtered by the Gabor filter bank.
The main hypothesis for using the Gabor filter is that the foreground has a different orientation from the background or from any degradation. Indeed, the main motivation for using the Gabor filter is its high sensitivity to directional features. It therefore discriminates texture features well: the variance computed in the Gabor space is generally higher than the variance taken from the original document image, as shown in Figure 5.2. The overall scheme of the proposed method is illustrated in Figure 5.3. Let $m_D(x, y)$ and $\sigma_F(x, y)$ be the mean and the standard deviation estimated for each pixel within a neighborhood window $W_g$ from the degraded and the filtered document images, respectively. The binarization threshold is then deduced using one of the well-known classical methods, i.e., Niblack's, Sauvola's, or Wolf's method. For clarity, the three variants of the proposed method, each combining the Gabor filter bank with one of Niblack's, Sauvola's, and Wolf's methods, are named Niblack-Gabor, Sauvola-Gabor, and Wolf-Gabor, respectively.
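A minimal sketch of the Sauvola-Gabor variant follows, assuming Sauvola's threshold formula $T = m_D\,(1 + k(\sigma_F/R - 1))$ with the local mean taken from the degraded image and the local standard deviation from the Gabor-filtered one; the window size and the values $k = 0.7$, $R = 128$ anticipate the settings reported later in this chapter, and the function names are ours.

```python
import numpy as np

def box_mean(a, w):
    """Local mean over a w x w window (edge padding), via an integral image."""
    pad = w // 2
    ap = np.pad(a.astype(float), pad, mode='edge')
    c = ap.cumsum(axis=0).cumsum(axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    H, W = a.shape
    s = c[w:w + H, w:w + W] - c[:H, w:w + W] - c[w:w + H, :W] + c[:H, :W]
    return s / float(w * w)

def sauvola_gabor(degraded, filtered, w=41, k=0.7, R=128.0):
    """Sauvola-style threshold: local mean m_D from the degraded image,
    local standard deviation sigma_F from the Gabor-filtered image."""
    m_d = box_mean(degraded, w)
    m_f = box_mean(filtered, w)
    sigma_f = np.sqrt(np.maximum(box_mean(filtered**2, w) - m_f**2, 0.0))
    T = m_d * (1.0 + k * (sigma_f / R - 1.0))
    return (degraded > T).astype(np.uint8)   # 1 = background, 0 = ink

# toy example: a white page with a dark square standing in for ink
page = np.full((128, 128), 255.0)
page[34:94, 34:94] = 0.0
binary = sauvola_gabor(page, page)   # here the "filtered" image is the page itself
```

In the real pipeline the second argument would be the output of the Gabor filter bank rather than the page itself.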
Figure 5.2: Estimation of the standard deviation from the degraded and the Gabor-filtered image: (a) Degraded image, (b) Standard deviation estimated from the degraded image, (c) Standard deviation estimated from the Gabor-filtered image.
[Figure 5.3: Overall scheme of the proposed method: grayscale conversion, pre-processing with a Wiener filter, Gabor filtering to obtain $I_F(x, y)$, computation of $m_D(x, y)$ and $\sigma_F(x, y)$, and thresholding of each pixel into foreground or background.]
Historical documents often contain complex degradations mixed with the foreground (text) information. Hence, exploiting the hypothesis that the oriented texture of the foreground differs from that of the background, a better extraction of the foreground (text) can be achieved by computing the Gabor filter bank for all angles, weighted with respect to the dominant slant orientation of the foreground. To capture the maximum of information belonging to the foreground, the filter is oriented along the most dominant direction of the script slant of the document image. The sum of the filtered images is then weighted according to the dominant angle of the script slant. Filtering the degraded image along the main direction of the script (foreground) is performed using a weighting function $w_j$, formally defined as follows:
where $\alpha$ is a constant defined by experiment, $\theta_d$ is the dominant angle of the foreground-script slant of the document image, and $\theta_j$ are the angles used for each Gabor filter direction.
Furthermore, exploiting the linearity of the Gabor filter, a new Gabor wavelet (a weighted matrix mask) $G_w(x, y, f_i)$ is computed as the sum of the weighted Gabor wavelets corresponding to each direction:

$$G_w(x, y, f_i) = \sum_{j=0}^{N_\theta - 1} w_j\, H(x, y, f_i, \theta_j) \qquad (5.8)$$
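Purely as an illustration of the weighting idea, assume a Gaussian-like form $w_j = \exp(\alpha(\theta_j - \theta_d)^2)$ with the angle difference in degrees and $\alpha = -0.002$ as reported below; this form is an assumption, not the thesis's exact definition of $w_j$, and the helper name is ours.

```python
import numpy as np

def angle_weights(thetas, theta_d, alpha=-0.002):
    """Hypothetical weighting w_j: directions close to the dominant
    slant theta_d get weights near 1, the others decay (alpha < 0)."""
    d = np.degrees(thetas) - np.degrees(theta_d)
    return np.exp(alpha * d**2)

thetas = np.linspace(0.0, np.pi, 8, endpoint=False)  # 8 orientations
w = angle_weights(thetas, theta_d=np.pi / 4)         # dominant slant at 45 deg
# the weighted mask of Eq. (5.8) would then be sum(w[j] * H_j over j)
```

The orientation matching the dominant slant receives the largest weight, so the composite mask of Eq. (5.8) favours the script direction.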
The dominant angle $\theta_d$ is computed using the Fourier transform of the document image. The 2D Fourier transform is applied to the degraded image $I(x, y)$ to obtain its frequency-domain representation $T(u, v) = \mathcal{F}(I(x, y))$, where $\mathcal{F}(\cdot)$ is the Fourier transform, which highlights the foreground (text) contained in the degraded image. To find the dominant slant angle, the Fourier transform can be written as:

$$T(u, v) = |T(u, v)|\, e^{j\varphi(u, v)}$$

where $|T(u, v)|$ and $\varphi(u, v)$ are the magnitude and phase of the Fourier transform of the degraded image, respectively. The highest amplitudes of $T(u, v)$ represent the foreground. Hence, the dominant slant angle $\theta_d$ of the script is deduced by searching for the highest values of the magnitude $|T(u, v)|$. Figure 5.4 depicts an example of the 2D Fourier transform performed on a degraded image.
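The peak-picking step can be sketched as follows. The synthetic "script" is a single plane wave with a known wave vector, so the recovered orientation can be checked; the function name is ours.

```python
import numpy as np

def dominant_angle(img):
    """Dominant orientation from the peak of the 2D Fourier magnitude,
    with the DC term suppressed; returned in [0, pi)."""
    F = np.fft.fftshift(np.abs(np.fft.fft2(img)))
    cy, cx = img.shape[0] // 2, img.shape[1] // 2
    F[cy, cx] = 0.0                        # ignore the mean (DC) component
    r, c = np.unravel_index(np.argmax(F), F.shape)
    return np.arctan2(r - cy, c - cx) % np.pi

# synthetic "script": a plane wave with known wave vector (ky, kx) = (8, 14)
N = 128
y, x = np.mgrid[0:N, 0:N]
wave = np.cos(2 * np.pi * (14 * x + 8 * y) / N)
theta_d = dominant_angle(wave)
```

The two conjugate peaks of a real image give the same orientation modulo $\pi$, which is why the angle is folded into $[0, \pi)$.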
Figure 5.4: Fourier transform of a degraded image: (a) Degraded image, (b) Fourier transform.
In summary, the proposed method is performed according to the steps shown in Figure 5.1 and Figure 5.3.
Figure 5.5: Steps of the proposed method performed on Non-uniform background (left) and Stain (right) degradations: (a) Degraded image, (b) Wiener filtering, (c) Binarized image, (d) Morphological operator.
Figure 5.6: Steps of the proposed method performed on Ink bleed-through (left) and Ink intensity variation (right) degradations: (a) Degraded image, (b) Wiener filtering, (c) Binarized image, (d) Morphological operator.
Setting the parameters requires selecting a set of representative images and the corresponding ground-truth images in order to find the best values. Hence, for an objective evaluation, the experimental protocol is conducted in three steps:
• Selection of representative degraded images and the corresponding ground truths,
• Estimation of the binarization threshold on the selected degraded images,
• Experimental evaluation.
To set all the parameters for deducing the binarization threshold, a set of representative degraded images is selected from the DIBCO-2009, DIBCO-2010, DIBCO-2011, DIBCO-2012, and DIBCO-2013 datasets, as shown in Figure 5.7.
Two representative images per degradation type (Stain, Ink bleed-through, Non-uniform background, Ink intensity variation) are selected, with their corresponding ground truths, as depicted in Figure 5.8.
Figure 5.9 depicts the steps for finding the best parameters of the Gabor filter banks.
[Figure 5.9 flowchart: a set of representative degraded images is Gabor-filtered, binarized, and post-processed; the F-Measure (FM) is computed, and the loop stops when the best FM yields $(f_{opt}, \theta_{opt})$.]
Figure 5.9: Flowchart for finding the optimal parameters of the Gabor filter.
Estimation of the binarization threshold
The estimation of the binarization threshold using the Gabor filter involves setting and tuning certain parameters: the size of the sliding window $W_g$, $k$, and $R$, as well as the mask size ($\sigma$, $\rho$), the central frequencies, and the orientation angles. The most suitable values for the sliding window size $W_g$ and the parameters $k$ and $R$ are 41 × 41, 0.7, and 128, respectively. The parameter $\alpha$ of the weighting function is defined by experiment and set to -0.002. The parameters $\sigma$ and $\rho$ are set so that $\sigma = \rho$ to obtain an isotropic filter, which is suitable for efficiently highlighting the contours of the foreground. If $\sigma < \rho$ or $\sigma > \rho$, the filter becomes less sensitive to contours. Figure 5.10 shows the effect of selecting the $\sigma$ and $\rho$ parameters,
whereas Figure 5.11 shows the effect of selecting the mask size. The parameters $\sigma$ and $\rho$ are fixed to 3, which sets the mask size of the Gabor filter to 11 × 11.
Figure 5.10: Effect of selecting the $\sigma$ and $\rho$ parameters: (a) Degraded image, (b) Gabor-filtered image with $\sigma < \rho$, (c) Gabor-filtered image with $\sigma = \rho$, (d) Gabor-filtered image with $\sigma > \rho$.
Figure 5.11: Effect of selecting the mask size of the Gabor filter: (a) Degraded image, (b) 7 × 7, (c) 11 × 11, (d) 21 × 21.
To set the central frequency, the frequency is varied from 0.08 to 0.6 for each degradation type and for each number of angles varying from 4 to 32. The optimal frequency is the one for which the FM metric is highest. Figure 5.12 depicts FM versus the central frequency for several numbers of angles (4, 8, 16, and 32) per degradation type. As can be seen, the optimal central frequency is $f_{opt} = 0.140$ for the Stain, Ink bleed-through, and Non-uniform background degradations for any number of angles (Figure 5.12(a), Figure 5.12(b), Figure 5.12(c)), while the best F-Measure for the Ink intensity variation degradation is obtained at $f_{opt} = 0.145$ for any number of angles (Figure 5.12(d)). Once the optimal frequency is selected, the number of angles is set in the same way as in the previous protocol. Figure 5.13 shows that the optimal number of angles is 8 for all degradation types, as well as for the overall average over all degradations; consequently, the number of angles is set to 8. When all degradation types are considered simultaneously, the average F-Measure over all degradations leads to the frequency $f = 0.140$ (Figure 5.12(e)); consequently, the optimal frequency for the Gabor filter bank is set to $f_{opt} = 0.140$. The adjustment of the different parameters for defining an appropriate threshold reveals an important finding in the curves of Figure 5.12: the F-Measure versus central frequency curve has a specific shape for each type of degradation. This can be useful for adapting the parameters to each type of degradation.
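The selection criterion itself is simple to state in code. Below is a sketch of the F-Measure (with the DIBCO convention that ink pixels are 0) and of the grid search over $(f, N_\theta)$; the `binarize` callable is a placeholder for the whole filtering-and-thresholding pipeline, and the function names are ours.

```python
import numpy as np

def f_measure(binarized, ground_truth):
    """F-Measure in percent, treating pixels equal to 0 as ink."""
    pred = (binarized == 0)
    true = (ground_truth == 0)
    tp = np.logical_and(pred, true).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(true.sum(), 1)
    return 200.0 * precision * recall / max(precision + recall, 1e-12)

def select_parameters(binarize, image, gt, freqs, n_thetas):
    """Grid search of Figure 5.9: keep the (f, n_theta) pair with best FM."""
    best = max((f_measure(binarize(image, f, n), gt), f, n)
               for f in freqs for n in n_thetas)
    return best[1], best[2]
```

For example, a prediction that recovers half of the ink pixels with half of its detections wrong has precision and recall of 0.5, hence an F-Measure of 50.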
Figure 5.12: F-Measure versus the central frequency for different numbers of angles (4, 8, 16, and 32) according to the degradation type: (a) Stain, (b) Ink bleed-through, (c) Non-uniform background, (d) Ink intensity variation, (e) Overall.
Figure 5.13: F-Measure for different numbers of angles according to the degradation type for frequency $f_{opt} = 0.140$: (a) Stain, (b) Ink bleed-through, (c) Non-uniform background, (d) Ink intensity variation, (e) Overall.
Since the classical threshold-based methods are the most popular and most other methods are based on them, the evaluation focuses primarily on these methods, while a comparison against some of the state of the art is also highlighted. Evaluations are performed on blind and unblind datasets, as well as without and with the weighting function to evaluate its influence. More precisely, all DIBCO datasets (2009, 2010, 2011, 2012, and 2013) organized by year of the contest are used for the blind evaluation, while the datasets organized by degradation type are used for the unblind evaluation [111]. The proposed method is evaluated with three distinct measures: FM, PSNR, and DRD. For each method $i$, a cumulative rank $Rank(i)$ is computed from the three evaluation metrics. More precisely, let $Rank(i, j)$ be the rank of the $i$-th method according to the $j$-th measure; the cumulative ranking value $Rank(i)$ over the $n$ measures is computed through the following equation:

$$Rank(i) = \sum_{j=1}^{n} Rank(i, j)$$

The final ranking, namely Rank, is computed by sorting the $Rank(i)$ values. Table 5.1,
Table 5.2, and Table 5.3 report all the measures computed from the binarized images and the corresponding ground-truth images. The blind and unblind evaluations of the proposed method against the well-known threshold-based methods are shown in Table 5.1 and Table 5.3, respectively, while the blind evaluation against other state-of-the-art methods is shown in Table 5.2. For the first evaluation, reported in Table 5.1 and Table 5.3, the first observation is that the method based on the Gabor filter bank combined with Sauvola's thresholding outperforms all the other threshold-based methods, both on all DIBCO datasets (2009-2010-2011-2012-2013) and on all the datasets organized by degradation type. Moreover, the ranking shows the stability of the Sauvola-Gabor method, since it is ranked first in both the blind and the unblind evaluations, whereas Sauvola's method changes rank in each evaluation protocol. Furthermore, Niblack's method is ranked last in both evaluations; however, when combined with the Gabor filter, the Niblack-Gabor method climbs to the second position, outperforming the Wolf-Gabor method. Roughly speaking, the best results are obtained for ink bleed-through degradation and ink degradation. For the second evaluation, comparing the Sauvola-Gabor method against some state-of-the-art methods, Table 5.2 reports the results obtained on blind datasets organized by year of the contest. The proposed method shows interesting results: it is well ranked and more stable than most of the other state-of-the-art methods, whose scores vary from one dataset to another. These results lead us to confirm that a comparison using an unblind dataset is better suited. For further work, it is worth investigating the shape of the FM-frequency curve for modeling the degradation type in order to improve the results.
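The cumulative ranking described above can be sketched as follows. The scores are illustrative values for three of the methods discussed, and the per-measure ranks take rank 1 as best, remembering that FM and PSNR are better when higher while DRD is better when lower.

```python
import numpy as np

# Illustrative FM, PSNR (higher is better) and DRD (lower is better) scores
scores = {
    "Niblack":       (59.35, 7.08, 32.02),
    "Sauvola":       (83.16, 16.65, 2.48),
    "Sauvola-Gabor": (91.80, 18.60, 1.45),
}
names = list(scores)
vals = np.array([scores[n] for n in names])
# turn every column into "smaller is better" before ranking
keyed = np.column_stack([-vals[:, 0], -vals[:, 1], vals[:, 2]])
rank_ij = keyed.argsort(axis=0).argsort(axis=0) + 1   # Rank(i, j), 1 = best
rank_i = rank_ij.sum(axis=1)                          # cumulative Rank(i)
final = [names[i] for i in np.argsort(rank_i)]        # sorted final ranking
```

A method that is best on every measure accumulates the minimum possible $Rank(i)$ and therefore heads the final ranking.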
Table 5.2: Comparison of the proposed Sauvola-Gabor method with the top binarization methods of the DIBCO-2009, DIBCO-2010, DIBCO-2011, DIBCO-2012, and DIBCO-2013 contests and with certain state-of-the-art methods on blind datasets.

Dataset      Reference                        FM     PSNR   DRD     Observation           Rank
DIBCO-2009   Sehad et al. [105]               85.53  16.40  3.12                          9
             Gatos et al. [128]               85.25  16.50  --                            8
             Sehad et al. [103]               87.82  16.50  --                            7
             Rivest-Hénault et al. [4][129]   89.34  17.79  --      3rd rank of contest   6
             Lu et al. [4][130]               90.06  18.23  --      2nd rank of contest   5
             Lu et al. [4][131]               91.24  18.66  --      1st rank of contest   3
             Moghaddam et al. [132]           91.61  18.80  --                            2
             Nafchi et al. [133]              92.58  19.00  --                            1
             Proposed (Sauvola-Gabor)         89.85  18.90  2.25                          3
DIBCO-2010   Gatos et al. [128]               71.99  15.12  --                            7
             Lu et al. [130]                  85.49  17.83  --                            6
             Moghaddam et al. [132]           86.25  18.04  --                            5
             Lu et al. [131]                  86.41  18.14  --                            4
             Su et al. [110]                  89.70  19.15  --      2nd rank of contest   2
             Su et al. [110]                  91.50  19.78  --      1st rank of contest   1
             Proposed (Sauvola-Gabor)         86.68  18.50  3.61                          3
DIBCO-2011   Natarajan [58][134]              72.59  13.53  10.94                         13
             Lelore et al. [108][135]         80.86  16.13  104.48  1st rank of contest   12
             Lu et al. [131]                  81.67  15.59  11.24                         11
             Sehad et al. [21]                82.07  15.59  8.13                          10
             Gatos et al. [128]               82.11  16.04  5.42                          9
             Su et al. [108]                  85.20  17.16  15.66   2nd rank of contest   8
             Su et al. [130]                  85.56  16.75  6.02                          7
             Moghaddam et al. [132]           86.58  16.88  4.36                          6
             Sehad et al. [103]               87.44  17.20  4.60                          5
             Su et al. [62]                   87.80  17.56  4.84                          2
             Howe [108][136]                  88.74  17.84  5.37    3rd rank of contest   2
             Nafchi et al. [133]              91.56  18.68  2.74                          1
             Proposed (Sauvola-Gabor)         88.90  17.51  4.90                          4
DIBCO-2012   Sehad et al. [105]               80.87  17.06  5.90                          8
             Sehad et al. [103]               82.92  17.83  5.80                          7
             Su et al. [130]                  85.56  16.75  6.02                          6
             Moghaddam et al. [132]           87.73  18.50  4.36                          5
             Howe [112][136]                  89.47  20.14  3.04    1st rank of contest   3
             Nafchi et al. [133]              92.23  19.93  2.61                          1
             Lelore et al. [108][135]         92.84  17.73  2.66                          4
             Proposed (Sauvola-Gabor)         87.95  19.53  1.50                          2
DIBCO-2013   Sehad et al. [105]               84.35  17.04  4.07                          5
             Nafchi et al. [137]              87.53  17.62  5.75                          5
             Lu et al. [131]                  88.84  18.75  4.98                          4
             Nafchi et al. [133]              90.99  19.44  3.47                          3
             Lu et al. [109]                  92.12  20.68  3.10    1st rank of contest   1
             Proposed (Sauvola-Gabor)         88.89  19.53  1.50                          2
To appreciate the importance of using the weighting function for selecting the dominant slant angle, Table 5.4 compares the proposed Sauvola-Gabor method with and without the weighting function on a sample subset of degraded images. The results clearly show an important enhancement of all measures when the weighting function is used. For instance, for the Ink bleed-through degradation, the F-Measure grows from 85.92 (without weighting) to 91.80 (with weighting). Similar observations can be made for the other degradations. This proves that considering the dominant angle improves the binarization of the image.
Table 5.3: Performance evaluation on unblind DIBCO datasets according to degradation type.

Degradation type          Method          FM     PSNR   DRD    Rank
Stain                     Niblack         56.01  6.71   29.10  6
                          Sauvola         83.90  16.20  2.03   4
                          Wolf            86.28  16.11  1.94   3
                          Niblack-Gabor   85.24  16.55  1.55   2
                          Sauvola-Gabor   89.50  16.67  1.70   1
                          Wolf-Gabor      53.55  9.05   22.70  5
Ink bleed-through         Niblack         59.35  7.08   32.02  6
                          Sauvola         83.16  16.65  2.48   3
                          Wolf            83.10  15.98  2.10   2
                          Niblack-Gabor   82.60  15.55  1.61   4
                          Sauvola-Gabor   91.80  18.60  1.45   1
                          Wolf-Gabor      45.91  9.45   34.10  5
Non-uniform background    Niblack         58.40  4.96   21.93  6
                          Sauvola         75.62  14.94  5.84   3
                          Wolf            74.86  16.43  6.10   2
                          Niblack-Gabor   68.97  15.31  7.02   4
                          Sauvola-Gabor   81.20  16.70  1.92   1
                          Wolf-Gabor      35.60  8.24   25.94  5
Ink intensity variation   Niblack         56.54  6.81   34.81  6
                          Sauvola         68.77  14.97  12.51  5
                          Wolf            64.50  15.01  10.62  4
                          Niblack-Gabor   75.90  16.90  4.11   2
                          Sauvola-Gabor   89.09  18.40  2.01   1
                          Wolf-Gabor      71.01  16.69  5.32   3
For a subjective visual evaluation, samples of document images with stain, ink bleed-through, non-uniform background, and ink degradations are depicted in Figure 5.14, Figure 5.15, Figure 5.16, and Figure 5.17, respectively. A clear improvement in binarization can be noticed when visually comparing the document images binarized by our method with the ground-truth images and with those binarized by a classical method such as Sauvola's.
Figure 5.14: Sample of subjective evaluation for Stain degradation: (a) Degraded image H02 DIBCO-2012 as
stain, (b) Sauvola’s method, (c) Ground truth image, (d) The proposed Sauvola-Gabor method.
Figure 5.15: Sample of subjective evaluation for Ink bleed-through degradation: (a) Degraded image H06
DIBCO-2012 as ink bleed-through, (b) Sauvola’s method, (c) Ground truth image, (d) The proposed Sauvola-
Gabor method.
Figure 5.16: Sample of subjective evaluation for non-uniform background: (a) Degraded image PR03 DIBCO-2013 as non-uniform background, (b) Sauvola's method, (c) Ground truth image, (d) The proposed Sauvola-Gabor method.
Figure 5.17: Sample of subjective evaluation for ink degradation: (a) Degraded image H09 DIBCO-2012 as
ink degradation, (b) Sauvola’s method, (c) Ground truth image, (d) The proposed Sauvola-Gabor method.
This chapter presented a new method, suitable for both software and hardware implementation, based on Gabor filters for the binarization of ancient degraded documents. The Gabor filter bank is designed by considering the degradation type of the document, based on the unblind protocol. First, the document image is pre-processed using a Wiener filter to smooth the degradation. Subsequently, the binarization threshold is estimated using texture features, namely the mean and the standard deviation, extracted from the original image and the filtered document image, respectively. Furthermore, a new protocol, namely the unblind protocol, is proposed for estimating the standard deviation according to the degradation type and for setting the optimal parameters of the Gabor filter, such as the central frequency and the number of angles. The dominant slant angle is estimated using a 2D Fourier transform. Finally, morphological operators are applied to the binarized image as post-processing to reduce some artifacts. The proposed method is compared against various well-known binarization methods. The experimental evaluation and extensive tests performed on blind and unblind samples show the benefit of combining the Gabor filter with standard thresholding methods for binarizing ancient documents. Indeed, the Sauvola-Gabor method ranks first against all the other methods. The proposed method appears most suitable for ink bleed-through degradation and ink degradation, an outcome explained by the directional nature of the Gabor filter bank. Moreover, for poorly contrasted images, the standard deviation in the Gabor space is increased; consequently, the proposed method is also suitable for poorly contrasted documents.
For future research, the automatic detection of the degradation type seems the first path to explore for automatically adapting the parameters of the Gabor filter, by exploiting the shape of the F-Measure curves as a function of the frequency, which is a recurring issue for researchers. Another direction is the estimation of the optimal parameters of the Gabor filter without a reference binarized image. Finally, non-local texture information combined with other texture features constitutes an interesting way to better extract local textural features.
In this thesis, we have addressed the problem of binarizing ancient degraded documents. It is worth noting that we deal with document images acquired with devices in the visible range and do not address the processing of document images in the invisible range, nor multispectral, laser, or ultrasonic images.
The binarization of degraded documents remains difficult because of the variety, non-uniformity, and complexity of the degradations, and the difficulty of modeling them. We focused mainly on introducing and exploring new texture-based binarization methods, with a new representation space effective enough, despite hard degradation types, to extract the foreground (text) from the background with as little alteration as possible. Three texture-based methods are explored to binarize ancient degraded documents.
These three methods, based respectively on co-occurrences, LBP, and Gabor filters and combined with a thresholding-based method, are used to overcome the shortcomings of grayscale pixel-based methods. Our contributions are summarized as follows:
We have presented a new thresholding method based on texture features extracted from a co-occurrence matrix. The general observation is that the co-occurrence matrix allows an interesting discrimination between the text and the background. The method is inspired by Niblack's method and enhanced using the well-known texture descriptor based on the co-occurrence matrix. Haralick's attributes, such as contrast and mean, are computed from the co-occurrence matrix with a distance-vector modulus equal to one and four directions (0°, 30°, 45°, 135°). The best parameters are defined by experiments; the best results were obtained for Haralick's contrast attribute at angle 135°. The subjective and objective evaluations showed good results. Compared to the state of the art, the results are promising and satisfactory. Nevertheless, we notice some binarization weaknesses for the stained document category; in future work, more attention will be given to this type of degradation.
CONCLUSION
The Local Binary Pattern (LBP) is used as a texture measure within a thresholding-based method. The mean and the variance of the pixels are computed from the original document image and from the LBP image, respectively; these features are then used within a threshold-based method. The method shows some weaknesses for poorly contrasted documents. To overcome this drawback, another variant is computed by combining contrast information with the basic LBP operator. The proposed method is tested through subjective and objective evaluations using multiple metrics. Tests are conducted on three DIBCO datasets, organized both by year of submission and by type of degradation. We notice that SLBP works better than SLBP-C for ink bleed-through degradation, whereas SLBP-C works better for the other degradations, which makes the two complementary. Their combination outperforms all the other well-known thresholding-based methods in terms of FM, PSNR, and DRD. In particular, we notice that for certain document images the results are much better. LBP can thus be considered a good candidate for the binarization of degraded document images.
A new method based on Gabor filters is applied for the binarization of ancient degraded documents. After an enhancement step, a binarization threshold is estimated using texture features, namely the mean and the standard deviation, extracted from the original image and from the Gabor-filtered document image, respectively. The optimal parameters of the Gabor filter, such as the central frequency and the number of angles, are estimated by experiments and by introducing a new protocol, namely the unblind protocol (based on using document images grouped by type of degradation). The proposed method is compared against the state of the art and various well-known binarization methods. The experimental evaluation and extensive tests performed on blind and unblind samples showed good results against the state-of-the-art methods for binarizing ancient documents. It is worth noting that the Sauvola-Gabor method ranks first against all the other classical thresholding-based methods. Better results are obtained for ink bleed-through degradation and ink degradation, an outcome explained by the directional nature of the Gabor filter bank.
In a general way, the experimental results conducted on various datasets prove that texture and its variants constitute an interesting approach for improving the existing binarization methods. The combination of the three methods could be an interesting direction for enhancing the binarization results.
For future research, the automatic detection of the degradation type seems the first path to explore for automatically adapting the parameters of the Gabor filter, by exploiting the shape of the F-Measure curves as a function of the frequency, which is a recurring issue for researchers. Another direction is the estimation of the optimal parameters of the binarization methods without a reference binarized image.
Finally, non-local texture information combined with a more accurate estimation of the dominant slant angle of the document script constitutes an interesting way to better extract local textural features. For instance, features extracted from the co-occurrence-based texture could be processed in other representation spaces of the document image.
Sehad, A., Chibani, Y., Hedjam, R., & Cheriet, M. (2019). Gabor filter-based texture for ancient degraded document image binarization. Pattern Analysis and Applications, 22(1), 1-22. DOI: 10.1007/s10044-018-0747-7. https://link.springer.com/article/10.1007/s10044-018-0747-7
1. Sehad, A., Chibani, Y., Cheriet, M., & Yaddaden, Y. (2013, September). Ancient degraded document image binarization based on texture features. In 2013 8th International Symposium on Image and Signal Processing and Analysis (ISPA) (pp. 189-193). IEEE. https://ieeexplore.ieee.org/abstract/document/6703737
2. Brik, Y., Chibani, Y., Zemouri, E. T., & Sehad, A. (2013, September). Ridgelet-DTW-based word spotting for Arabic historical document. In 2013 8th International Symposium on Image and Signal Processing and Analysis (ISPA) (pp. 194-199). IEEE. https://ieeexplore.ieee.org/document/6703738
3. Sehad, A., Chibani, Y., & Cheriet, M. (2014, September). Gabor filters for degraded document image binarization. In 2014 14th International Conference on Frontiers in Handwriting Recognition (pp. 702-707). IEEE. https://ieeexplore.ieee.org/document/6981102
4. Sehad, A., Chibani, Y., Hedjam, R., & Cheriet, M. (2015, November). LBP-based degraded document image binarization. In 2015 International Conference on Image Processing Theory, Tools and Applications (IPTA) (pp. 213-217). IEEE. https://ieeexplore.ieee.org/document/7367131
5. Djema, A., Chibani, Y., Sehad, A., & Zemouri, E. T. (2015, August). Blind versus unblind performance evaluation of binarization methods. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (pp. 511-515). IEEE. https://ieeexplore.ieee.org/document/7333814
[1] S. Marinai, "Introduction to document analysis and recognition," in Machine learning
in document analysis and recognition, ed: Springer, pp. 1-20, 2008.
[4] B. Gatos, K. Ntirogiannis, and I. Pratikakis, "ICDAR 2009 document image binarization
contest (DIBCO 2009)," in 2009 10th International Conference on document analysis
and recognition, pp. 1375-1382, 2009.
[7] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2013 document image binarization
contest (DIBCO 2013)," in 2013 12th International Conference on Document Analysis
and Recognition, pp. 1471-1476, 2013.
REFERENCES
[13] K. Ntirogiannis, B. Gatos, and I. Pratikakis, "A modified adaptive logical level
binarization technique for historical document images," in 2009 10th International
Conference on Document Analysis and Recognition, pp. 1171-1175, 2009.
[14] S. Lu, B. Su, and C. L. Tan, "Document image binarization using background estimation
and stroke edges," International Journal on Document Analysis and Recognition
(IJDAR), vol. 13, pp. 303-314, 2010.
[15] B. Bataineh, S. N. H. S. Abdullah, and K. Omar, "An adaptive local binarization method
for document images based on a novel thresholding method and dynamic windows,"
Pattern Recognition Letters, vol. 32, pp. 1805-1813, 2011.
[18] C. Wolf, J.-M. Jolion, and F. Chassaing, "Text localization, enhancement, and
binarization in multimedia documents," in Object recognition supported by user
interaction for service robots, pp. 1037-1040, 2002.
[19] M. Unser, "Texture classification and segmentation using wavelet frames," IEEE
Transactions on image processing, vol. 4, pp. 1549-1560, 1995.
[21] L. Shen and L. Bai, "A review on Gabor wavelets for face recognition," Pattern Analysis
and Applications, vol. 9, pp. 273-292, 2006.
[22] T. Celik and T. Tjahjadi, "Unsupervised color image segmentation using dual-tree
complex wavelet transform," Computer vision and image understanding, vol. 114, pp.
813-826, 2010.
[24] Y. Liu and S. N. Srihari, "Document image binarization based on texture features,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 540-544,
1997.
[25] S. Mandal, S. Biswas, A. K. Das, and B. Chanda, "Binarisation of color map images
through the extraction of regions," in International Conference on Computer Vision and
Graphics, pp. 418-427, 2014.
[26] M. Jameson, "Promises and challenges of digital libraries and document image analysis:
a humanist's perspective," in First International Workshop on Document Image
Analysis for Libraries, 2004. Proceedings, pp. 54-61, 2004.
[27] I. H. Witten, D. Bainbridge, and D. M. Nichols, How to build a digital library: Morgan
Kaufmann, 2009.
[28] A.-L. Dupont, "Le patrimoine culturel sur papier. De la compréhension des processus
d'altération à la conception de procédés de stabilisation," Université Evry Val
d'Essonne, 2014.
[29] K. Ntirogiannis, B. Gatos, and I. Pratikakis, "A combined approach for the binarization
of handwritten document images," Pattern Recognition Letters, vol. 35, pp. 3-15, 2014.
[31] L. O'Gorman and R. Kasturi, Document image analysis vol. 39: IEEE Computer Society
Press Los Alamitos, 1995.
[32] G. Nagy, "Twenty years of document image analysis in PAMI," IEEE Transactions on
Pattern Analysis & Machine Intelligence, pp. 38-62, 2000.
[33] D. Doermann, "The indexing and retrieval of document images: A survey," Computer
vision and image understanding, vol. 70, pp. 287-298, 1998.
[34] C. L. Tan, W. Huang, Z. Yu, and Y. Xu, "Imaged document text retrieval without OCR,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 838-844,
2002.
[36] T. M. Rath and R. Manmatha, "Word spotting for historical documents," International
Journal of Document Analysis and Recognition (IJDAR), vol. 9, pp. 139-152, 2007.
[37] R. Hedjam and M. Cheriet, "Historical document image restoration using a multispectral
imaging system," Pattern Recognition, vol. 46, pp. 2297-2312, 2013.
[40] A. Criminisi, P. Pérez, and K. Toyama, "Region filling and object removal by exemplar-
based image inpainting," IEEE Transactions on image processing, vol. 13, pp. 1200-
1212, 2004.
[41] W. A. Mustafa and H. Yazid, "Illumination and Contrast Correction Strategy using
Bilateral Filtering and Binarization Comparison," Journal of Telecommunication,
Electronic and Computer Engineering (JTEC), vol. 8, pp. 67-73, 2016.
[42] J. Wen, S. Li, and J. Sun, "A new binarization method for non-uniform illuminated
document images," Pattern Recognition, vol. 46, pp. 1670-1690, 2013.
[43] M. Sezgin and B. Sankur, "Survey over image thresholding techniques and quantitative
performance evaluation," Journal of Electronic Imaging, vol. 13, pp. 146-166, 2004.
[44] A. Tonazzini, L. Bedini, and E. Salerno, "Independent component analysis for document
restoration," Document Analysis and Recognition, vol. 7, pp. 17-27, 2004.
[47] R. Cao, C. L. Tan, and P. Shen, "A wavelet approach to double-sided document image
pair processing," in Proceedings 2001 International Conference on Image Processing
(Cat. No. 01CH37205), pp. 174-177, 2001.
[49] A. Tonazzini, E. Salerno, M. Mochi, and L. Bedini, "Blind source separation techniques
for detecting hidden texts and textures in document images," in International
Conference Image Analysis and Recognition, pp. 241-248, 2004.
[50] N. Ntogas and D. Veintzas, "A binarization algorithm for historical manuscripts," in
WSEAS International Conference. Proceedings. Mathematics and Computers in
Science and Engineering, pp. 41-51, 2008.
[51] T. R. Singh, S. Roy, O. I. Singh, T. Sinam, and K. Singh, "A new local adaptive
thresholding technique in binarization," arXiv preprint arXiv:1201.5227, 2012.
[53] W. Niblack, An Introduction to Digital Image Processing. Birkerød, Denmark: Strandberg
Publishing Company, 1985.
[54] O. D. Trier and T. Taxt, "Evaluation of binarization methods for document images,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, pp. 312-315,
1995.
[56] N. Otsu, "A threshold selection method from gray-level histograms," IEEE transactions
on systems, man, and cybernetics, vol. 9, pp. 62-66, 1979.
[57] J. N. Kapur, P. K. Sahoo, and A. K. Wong, "A new method for gray-level picture
thresholding using the entropy of the histogram," Computer vision, graphics, and image
processing, vol. 29, pp. 273-285, 1985.
[58] J. Natarajan and I. Sreedevi, "Enhancement of ancient manuscript images by log based
binarization technique," AEU-International Journal of Electronics and
Communications, vol. 75, pp. 15-22, 2017.
[60] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2011 document image binarization
contest (DIBCO 2011)," in 2011 International Conference on Document Analysis and
Recognition, pp. 1506-1510, 2011.
[62] B. Su, S. Lu, and C. L. Tan, "Robust document image binarization technique for
degraded document images," IEEE Transactions on Image Processing, vol. 22, pp.
1408-1417, 2013.
[64] M. Cheriet, R. F. Moghaddam, and R. Hedjam, "A learning framework for the
optimization and automation of document binarization methods," Computer vision and
image understanding, vol. 117, pp. 269-280, 2013.
[66] K. Fukunaga and L. Hostetler, "The estimation of the gradient of a density function,
with applications in pattern recognition," IEEE Transactions on information theory, vol.
21, pp. 32-40, 1975.
[67] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space
analysis," IEEE Transactions on Pattern Analysis & Machine Intelligence, pp. 603-619,
2002.
[69] R. G. Mesquita, C. A. Mello, and L. Almeida, "A new thresholding algorithm for
document images based on the perception of objects by distance," Integrated Computer-
Aided Engineering, vol. 21, pp. 133-146, 2014.
[70] M. Soua, R. Kachouri, and M. Akil, "A new hybrid binarization method based on
Kmeans," in 2014 6th International Symposium on Communications, Control and
Signal Processing (ISCCSP), pp. 118-123, 2014.
[71] Y. Leydier, F. Le Bourgeois, and H. Emptoz, "Serialized k-means for adaptative color
image segmentation," in International Workshop on Document Analysis Systems, pp.
252-263, 2004.
[72] C. J. Burges, "A tutorial on support vector machines for pattern recognition," Data
mining and knowledge discovery, vol. 2, pp. 121-167, 1998.
[73] V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[74] C.-H. Tung and Y.-G. Lin, "Efficient uneven-lighting image binarization by support
vector machines," Journal of Information and Optimization Sciences, vol. 39, pp. 519-
543, 2018.
[76] C. Thillou and B. Gosselin, "Color binarization for complex camera-based images," in
Color Imaging X: Processing, Hardcopy, and Applications, pp. 301-308, 2005.
[77] C. M. Bishop, Neural networks for pattern recognition: Oxford university press, 1995.
[78] O. Omidvar and J. Dayhoff, Neural networks and pattern recognition: Elsevier, 1997.
[79] Y.-H. Pao, Adaptive pattern recognition and neural networks: Addison-Wesley, 1989.
[80] W. Xiong, J. Xu, Z. Xiong, J. Wang, and M. Liu, "Degraded historical document image
binarization using local features and support vector machine (SVM)," Optik, vol. 164,
pp. 218-223, 2018.
[82] C.-H. Chou, W.-H. Lin, and F. Chang, "A binarization method with learning-built rules
for document images produced by cameras," Pattern Recognition, vol. 43, pp. 1518-
1530, 2010.
[83] F. Westphal, N. Lavesson, and H. Grahn, "Document image binarization using recurrent
neural networks," in 2018 13th IAPR International Workshop on Document Analysis
Systems (DAS), pp. 263-268, 2018.
[84] C. Tensmeyer and T. Martinez, "Document image binarization with fully convolutional
neural networks," in 2017 14th IAPR International Conference on Document Analysis
and Recognition (ICDAR), pp. 99-104, 2017.
[85] T. Sari, A. Kefali, and H. Bahi, "An MLP for binarizing images of old manuscripts," in
2012 International Conference on Frontiers in Handwriting Recognition, pp. 247-251,
2012.
[87] L. Kang, P. Ye, Y. Li, and D. Doermann, "A deep learning approach to document image
quality assessment," in 2014 IEEE International Conference on Image Processing
(ICIP), pp. 2570-2574, 2014.
[88] G. Meng, K. Yuan, Y. Wu, S. Xiang, and C. Pan, "Deep networks for degraded
document image binarization through pyramid reconstruction," in 2017 14th IAPR
International Conference on Document Analysis and Recognition (ICDAR), pp. 727-
732, 2017.
[92] C. Chen, L. Pau, P. Wang, and S. Wang, "Texture analysis," Handbook of Pattern
Recognition and Computer Vision, pp. 207-248, 1998.
[96] M. Pietikäinen, A. Hadid, G. Zhao, and T. Ahonen, Computer vision using local binary
patterns vol. 40: Springer Science & Business Media, 2011.
[97] D. L. Pham, C. Xu, and J. L. Prince, "Current methods in medical image segmentation,"
Annual review of biomedical engineering, vol. 2, pp. 315-337, 2000.
[98] J. Malik, S. Belongie, T. Leung, and J. Shi, "Contour and texture analysis for image
segmentation," International Journal of computer vision, vol. 43, pp. 7-27, 2001.
[100] Y. Liu and S. N. Srihari, "Document image binarization based on texture features,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 540-544,
1997.
[101] B. Wang, X.-F. Li, F. Liu, and F.-Q. Hu, "Color text image binarization based on binary
texture analysis," Pattern Recognition Letters, vol. 26, pp. 1568-1576, 2005.
[103] A. Sehad, Y. Chibani, and M. Cheriet, "Gabor filters for degraded document image
binarization," in 2014 14th International Conference on Frontiers in Handwriting
Recognition (ICFHR), pp. 702-707, 2014.
[104] A. Sehad, Y. Chibani, R. Hedjam, and M. Cheriet, "Gabor filter-based texture for
ancient degraded document image binarization," Pattern Analysis and Applications, vol.
22, pp. 1-22, 2019.
[109] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2013 document image binarization
contest (DIBCO 2013)," in 2013 12th International Conference on Document Analysis
and Recognition, pp. 1471-1476, 2013.
[111] A. Djema, Y. Chibani, A. Sehad, and E.-T. Zemouri, "Blind versus unblind performance
evaluation of binarization methods," in 2015 13th International Conference on
Document Analysis and Recognition (ICDAR), pp. 511-515, 2015.
[114] M. E. Shokr, "Evaluation of second-order texture parameters for sea ice classification
from radar images," Journal of Geophysical Research: Oceans, vol. 96, pp. 10625-
10640, 1991.
[115] A. Sutter, G. Sperling, and C. Chubb, "Measuring the spatial frequency selectivity of
second-order texture mechanisms," Vision Research, vol. 35, pp. 915-924, 1995.
[116] M.-W. Lin, J.-R. Tapamo, and B. Ndovie, "A texture-based method for document
segmentation and classification," South African Computer Journal, vol. 2006, pp. 49-
56, 2006.
[118] A. K. Jain and Y. Zhong, "Page segmentation using texture analysis," Pattern
Recognition, vol. 29, pp. 743-770, 1996.
[120] D.-C. He and L. Wang, "Texture unit, texture spectrum, and texture analysis," IEEE
transactions on Geoscience and Remote Sensing, vol. 28, pp. 509-512, 1990.
[121] L. Wang and D.-C. He, "Texture classification using texture spectrum," Pattern
Recognition, vol. 23, pp. 905-910, 1990.
[122] A. Hadid, M. Pietikainen, and T. Ahonen, "A discriminative feature space for detecting
and recognizing faces," in Proceedings of the 2004 IEEE Computer Society Conference
on Computer Vision and Pattern Recognition (CVPR 2004), pp. II-II, 2004.
[124] Y. Luo, C.-m. Wu, and Y. Zhang, "Facial expression recognition based on fusion feature
of PCA and LBP with SVM," Optik-International Journal for Light and Electron
Optics, vol. 124, pp. 2767-2770, 2013.
[127] C. Wolf, J.-M. Jolion, and F. Chassaing, "Text localization, enhancement, and
binarization in multimedia documents," in Pattern Recognition, 2002. Proceedings.
16th International Conference on, pp. 1037-1040, 2002.
[129] D. Rivest-Hénault, R. Farrahi Moghaddam, and M. Cheriet, "A local linear level set
method for the binarization of degraded historical document images," International
Journal on Document Analysis and Recognition, vol. 15, pp. 101-124, 2012.
[130] B. Su, S. Lu, and C. L. Tan, "Binarization of historical document images using the local
maximum and minimum," in Proceedings of the 9th IAPR International Workshop on
Document Analysis Systems, pp. 159-166, 2010.
[131] S. Lu, B. Su, and C. L. Tan, "Document image binarization using background estimation
and stroke edges," International journal on document analysis and recognition, vol. 13,
pp. 303-314, 2010.
[136] N. R. Howe, "A Laplacian energy for document binarization," in 2011 International
Conference on Document Analysis and Recognition, pp. 6-10, 2011.