
Order Number: 10/2020-D/ELN

PEOPLE'S DEMOCRATIC REPUBLIC OF ALGERIA


Ministry of Higher Education and Scientific Research

UNIVERSITY OF SCIENCE AND TECHNOLOGY HOUARI BOUMEDIENE


FACULTY OF ELECTRONICS AND COMPUTER SCIENCE

By

Defended publicly on 12th November 2020 in front of the jury composed of:

Mrs. BELHADJ-AISSA Aichouche USTHB /FEI Professor President


Mr. CHIBANI Youcef USTHB /FEI Professor Thesis Director
Ms. BENATCHBA Karima ESI Professor Examiner
Mr. NACEREDDINE Nafaa CRTI DR (Directeur de Recherche) Examiner
Mrs. KEMMOUCHE Akila USTHB /FEI MCA (Maitre de Conférence /A) Examiner
Mr. DJEKOUNE Abdel Oualid CDTA MRA (Maitre de Recherche /A) Examiner

USTHB-2020
To all who find a serious affinity with me

This thesis is the result of continuous work at the Laboratoire d'Ingénierie des Systèmes
Intelligents et Communicants (LISIC): the Laboratory of Intelligent and Communicating
Systems Engineering of the Faculty of Electronics and Computer Science of the University of
Science and Technology Houari Boumediene (USTHB). I would like to express my gratitude
to my supervisor, Professor Youcef CHIBANI. I appreciate all his contributions, his help, his
motivation, and his patience. I thank him for his way of working, which encouraged me to be tenacious and to believe that I could succeed. Professor Mohamed CHERIET, from the École de Technologie Supérieure (ETS), Montréal, Canada (a school of higher technology), also deserves my sincere gratitude for all his help, his continual encouragement, his advice, and his guidance
throughout these years. My warm thanks go to the members of the jury, Mrs. Aichouche
BELHADJ-AISSA Professor at USTHB Algiers; Ms. Karima BENATCHBA Professor at
École Nationale Supérieure d'Informatique (ESI), Algiers: a National Higher School of Computer
Science; Mr. Nafaa NACEREDDINE, Director of Research at the Centre de Recherche en Technologie Industrielle (CRTI), Algiers: a Center of Industrial Technology Research; Mrs. Akila
KEMMOUCHE, MCA at USTHB Algiers and Mr. Abdel Oualid DJEKOUNE, MRA at Centre
de Développement des Technologies Avancées (CDTA), Algiers: a Center of Advanced Technology Development, for having agreed to devote their time to the examination of this
thesis and for giving me the honor to present it in front of them.
I warmly thank my parents and my family for their support and encouragement. My sincere
thanks also go to all members of the LISIC laboratory with whom I shared years full of affinity
and sincerity in a good working atmosphere. I cannot conclude without mentioning and
thanking all my friends and colleagues.

Throughout the world, a considerable amount of printed material, including ancient (historical) documents that represent an invaluable heritage, suffers from various types of degradation.
To preserve this heritage, exploit it effectively, and make it available to a large community via the Internet, important image processing steps are necessary: (i) Digitizing the documents is necessary not only to reduce human contact with the originals and protect them from damage due to handling, but also to allow wide and easy diffusion through different electronic media and means. (ii) Since the digitized ancient degraded documents are not suitable, as such, for information retrieval tools because of the various degradations, it is necessary to proceed to a binarization stage. The latter consists of separating the text (foreground) from the background and enhancing the foreground to allow optimal use of all tools inherent to information technology (IT), such as document image analysis, recognition, and retrieval systems, optical character recognition (OCR), document layout recognition and analysis, word spotting, information retrieval, and so on.
Most of the binarization methods reported in the literature are based on the pixel’s gray level
intensity or on simple pixel neighborhood-based information such as mean or variance to
compute the binarization threshold. Moreover, they are mostly based on classical thresholding
methods combined with enhancement (pre-processing) and post-enhancement (post-
processing) stages. Furthermore, as it is difficult to model the degradation types, the problem
of document binarization remains a challenging task. It is worth noting that the information
based on neighborhood pixels is relevant for document image binarization methods and
particularly for thresholding-based methods. Therefore, in this thesis, we explore the use of
texture-based methods for binarizing ancient degraded document images acquired in the optical
domain of the visible spectrum in a spatial grayscale and/or color representation.
In doing so, we also intend to draw relevant conclusions about the use of pixel neighborhood information beyond basic intensity-based information. Three texture-based methods are explored: the co-occurrence matrix, the Gabor filter, and the Local Binary Pattern (LBP) model. These methods are used to extract the texture information and are combined with conventional, well-known thresholding-based methods to binarize the degraded documents. Extensive experiments are performed to highlight the opportunity of using texture to improve binarization.

Keywords: Ancient document images, Historical document images, Degraded document images,
Binarization, Threshold, Image processing, Texture, Gabor filter.

Page

1.1 Introduction .................................................................................................................... 5


1.2 Ancient degraded document image ................................................................................ 7
1.2.1 Source of degradation ................................................................................... 8
1.2.2 Type of degradation .................................................................................... 11
1.3 Summary of the chapter ................................................................................................ 13

2.1 Introduction .................................................................................................................. 15


2.2 Document image analysis, recognition, and retrieval systems ..................................... 15
2.3 Binarization .................................................................................................................. 18
2.4 Related works ............................................................................................................... 20
2.4.1 Taxonomy of degraded document binarization methods ........................... 20
2.4.2 Double-sided document image binarization methods ................................ 21
2.4.3 One-sided document image binarization methods...................................... 22
2.4.4 Global and local binarization based methods ............................................. 23
2.4.5 Thresholding-based binarization methods .................................................. 26
2.4.6 Classifier based methods ............................................................................ 29
2.4.7 Texture based binarization methods ........................................................... 34
2.5 Datasets and evaluation criteria .................................................................................... 35
2.5.1 Datasets and evaluation protocol ................................................................ 35
2.5.2 Evaluation criteria....................................................................................... 38
2.6 Summary of the chapter ................................................................................................ 39

3.1 Introduction .................................................................................................................. 41


3.2 Overview of the co-occurrence matrix ......................................................................... 41
3.3 Co-occurrence matrix method for degraded document image binarization ................. 44
3.3.1 Design of the co-occurrence matrix............................................................ 45
3.3.2 Binarization module ................................................................................... 46
3.4 Experimental results ..................................................................................................... 46
3.4.1 Experimental setup ..................................................................................... 46
3.4.2 Experimental evaluation ............................................................................. 49
3.5 Summary of the chapter ................................................................................................ 53

4.1 Introduction .................................................................................................................. 55


4.2 Overview of the LBP operator...................................................................................... 55
4.3 LBP-based method for document binarization ............................................................. 57
4.3.1 Design of the LBP-based texture ................................................................ 58

4.3.2 Binarization module ....................................................................................60
4.4 Experimental results ......................................................................................................61
4.4.1 Experimental setup ......................................................................................61
4.4.2 Experimental evaluation .............................................................................63
4.5 Summary of the chapter ................................................................................................69

5.1 Introduction ...................................................................................................................71


5.2 Overview of Gabor filter ...............................................................................................71
5.3 Gabor filter bank-based method for document binarization .........................................74
5.3.1 Pre-processing stage ....................................................................................77
5.3.2 Design of the Gabor filter bank...................................................................78
5.3.3 Binarization module ....................................................................................78
5.3.4 Post-processing stage: .................................................................................78
5.4 Experimental results:.....................................................................................................80
5.4.1 Experimental setup ......................................................................................80
5.4.2 Experimental evaluation .............................................................................87
5.5 Summary of the chapter ................................................................................................94

Page
Table 2.1: DIBCO dataset classification according to the degradation type [111]. ................... 37

Table 3.1: Haralick’s attributes values for Background and foreground images....................... 48

Table 3.2: Evaluation results of well-known methods for binarization by type of degradations.
.................................................................................................................................. 50

Table 3.3: Evaluation results of the proposed method and Nick’s method, for binarization by
type of degradations. ................................................................................................ 51

Table 4.1: Performance evaluation of the proposed method against the classical threshold
based on DIBCO datasets. ...................................................................................... 68

Table 4.2: Performance evaluation of the proposed method against the classical threshold-
based methods according to the degradation type. ................................................ 68

Table 4.3: Evaluation Results on Image Samples. ................................................................... 69

Table 5.1: Performance evaluation on blind DIBCO dataset..................................................... 89

Table 5.2: Comparison of the proposed Sauvola-Gabor with the top binarization methods of the
             DIBCO2009, DIBCO2010, DIBCO2011, DIBCO2012, and DIBCO2013 contests
             and with some of the state-of-the-art methods on blind datasets. ......................... 90

Table 5.3: Performance evaluation on unblind DIBCO datasets according to degradation type.
.................................................................................................................................. 91

Table 5.4: Influence of the weighting function used in the Sauvola-Gabor. .............................. 91

Page
Figure 1.1: Example of ancient degraded documents due to their poor storage conditions.........6

Figure 1.2: Various document scanner devices. ...........................................................................7

Figure 1.3: Origins of Document degradations: (a) Chemical, (b) Biological, (c) Human,
(d)(e)(f) External. .......................................................................................................8

Figure 1.4: Document degradation types ((a-b-c-d-e-f-g) from DIBCO Datasets, (h) From
National Library of Algiers): (a) (b) Ink fading, (c) (d) Stain, (e) Ink bleed-through,
(f) Ink show-through, (g) (h) Non-uniform background, Ink fading and uneven
illumination...............................................................................................................12

Figure 2.1: Document image structure. ......................................................................................17

Figure 2.2: Overall flowchart of the Document Analysis and Recognition System. ..................17

Figure 2.3: Overall binarization scheme. ....................................................................................18

Figure 2.4: Document image binarization process: (a) Original image, (b) Binarized image:
foreground(text), (c) Intermediate background image (pixels belonging to
background). .............................................................................................................19

Figure 2.5: Taxonomy of degraded document binarization. ......................................................21

Figure 2.6: Document image with strong degradation type: (a) Ink bleed-through degradation,
(b) Stain degradation. ...............................................................................................24

Figure 2.7: Local and global binarization methods: (a) (b) Original degraded document images,
(c) (d) Respective histograms, (e) (f) Binarized document image using a global
method, and (g)(h) Binarized document image using a local method. .....................25

Figure 2.8: Overall scheme of Curvelet transform-based binarization [42]. .............................28

Figure 2.9: Curvelet transform-based binarization: (a) Original image, (b) Binarized image [42].
..................................................................................................................................28

Figure 2.10: Binarization based on k-means method: (a) (g) Original degraded document
images, (b)(h) Gray level degraded document image, (c)(i) Respective histogram,
(d)(e)(f) and (j)(k)(l) Respective degraded document layers (foreground,
Background, ink-bleed through degradation). ..........................................................31

Figure 2.11: An image blocking sample [80]. ............................................................................32

Figure 2.12: Sample of a DIBCO-2016 degraded document image with its ground truth: (a)
Original color degraded document image, (b) Ground truth binarized document
image. .......................................................................................................................36

Figure 2.13: Sample of a degraded document image from the National Library of Algeria (BNA)
used for a subjective evaluation. .............................................................................. 36

Figure 2.14: Example of degradation type categorization: (a) Type B, (b) Type D, (c) Type C,
(d), Type Mixed (D+A)............................................................................................ 38

Figure 3.1: Principle scheme of co-occurrence matrix: a) A sample of the sliding window
(size=5x5) showing the 08 directions with a distance 𝑑 = 2, b) the sliding matrix
within the matrix image. .......................................................................................... 42

Figure 3.2: Sample of co-occurrence processing. ...................................................................... 43

Figure 3.3: Cooccurrence matrix image on degraded document image : (a) Original image, (b)
Cooccurrence matrix image. .................................................................................... 43

Figure 3.4: The overall scheme of the proposed method. .......................................................... 45

Figure 3.5: Overall steps related to the parameters optimization of the threshold-based
binarization method. ................................................................................................ 47

Figure 3.6: Sample of background and foreground document image: (a) Background, (b)
Foreground. .............................................................................................................. 48

Figure 3.7: F-Measure as a function of the sliding window size ................................................ 49

Figure 3.8: F-Measure as a function of the k parameter. ........................................................... 49

Figure 3.9: Sample for a subjective evaluation on degraded document images from DIBCO-
              2009 datasets with a distance = 1, window size = 41 and k = -1: (a) Haralick's contrast
              feature with angle = 0° and (b) Haralick's contrast feature with angle = 135°. ... 51

Figure 3.10: Sample for a subjective evaluation on degraded document images from Dibco-
2011 datasets with a distance = 1 and window size = 41 and k=-1 : (a) Haralick’s
contrast feature with angle = 0° and (b) Haralick’s contrast feature with angle = 135°.
.................................................................................................................................. 52

Figure 3.11: Sample for a subjective evaluation on degraded document images from Dibco-2009
datasets with a distance = 1 and window size = 41 and k=-1 : (a) Haralick’s mean
feature with angle = 0° and (b) Haralick’s mean feature with angle = 135°. .......... 52

Figure 3.12: Sample for a subjective evaluation on degraded document images from Dibco-2011
datasets with a distance = 1 and window size = 41 and k=-1 : (a) Haralick’s mean
feature with angle = 0° and (b) Haralick’s mean feature with angle = 135°. .......... 52

Figure 3.13: Sample for a subjective evaluation on degraded document images with angle =
135°, distance = 1, window size = 41x41 and k =-1 : (a) original image, (b) binarized
image using Nick, (c) Proposed Method. ................................................................. 53

Figure 4.1: Processing of the LBP code. ................................................................................... 56

Figure 4.2: LBP image of degraded document: (a) Degraded document from Dibco-2012 dataset,
(b) LBP image. ......................................................................................................... 56

Figure 4.3: The overall scheme of the proposed method. .......................................................... 57

Figure 4.4: Selecting the best thresholding-based method. ....................................................... 58


Figure 4.5: Sample of LBP and original image variances: (a) original image, (b) original image
variance, (c) LBP image, (d) LBP variance. ............................................................59

Figure 4.6: Flowchart for estimating mean and variance from original and LBP images. ........60

Figure 4.7: Binarization method taking into account the advantages of both methods SLBP and
SLBP-C.....................................................................................................................61

Figure 4.8: Overall steps to set optimal thresholding-based method parameters. ......................62

Figure 4.9: F-Measure as a function of the k parameter. ............................................................63

Figure 4.10: F-Measure as a function of the sliding window size. .............................................63

Figure 4.11: Subjective comparison of classical thresholding-based methods against the
               SLBP-C method. .....................................................................................................65

Figure 4.12: Subjective evaluation on a sample of binarized images, respectively: (a-b)
               Original image with stain degradation, (c-d) SLBP method; (e-f) SLBP-C
               method. ...................................................................................................................66

Figure 4.13: Subjective evaluation on a sample of binarized images, respectively: (a-b)
               Original image with non-uniform background degradation, (c-d) SLBP method;
               (e-f) SLBP-C method. ............................................................................................66

Figure 4.14: Subjective evaluation on a sample of binarized images, respectively: (a-b)
               Original image with ink fading degradation, (c-d) SLBP method; (e-f) SLBP-C
               method. ...................................................................................................................67

Figure 4.15: Subjective evaluation on a sample of binarized images, respectively: (a-b)
               Original image with ink bleed-through degradation, (c-d) SLBP method; (e-f)
               SLBP-C method. ....................................................................................................67

Figure 5.1: Simplified implementation of Gabor filter bank. For simplified notation,
              H(x, y, f_i, θ_j) and G(x, y, f_i) are denoted respectively H_{ji}(x, y) and G_i(x, y),
              with i = 0, ..., N_f − 1 and j = 0, ..., N_θ − 1. N_f and N_θ define the numbers of central
              frequencies and orientations, respectively. ...............................................................73

Figure 5.2: Estimation of the standard deviation from degraded and Gabor filtered image (a)
Degraded image, (b) Standard deviation estimated from the degraded image, (c)
Standard deviation estimated from the Gabor filtered image. ..................................74

Figure 5.3: General scheme of the proposed method. ................................................................75

Figure 5.4: Fourier-transform of a degraded image: (a) Degraded image, (b) Fourier transform.
..................................................................................................................................77

Figure 5.5: Steps of the proposed method performed on Non-Uniform background (Left) and
Stain (Right) degradations: (a) Degraded image, (b) Wiener filtering (c) Binarized
image (d) Morphological operator............................................................................79

Figure 5.6: Steps of the proposed method performed on Ink bleed-through (Left) and Ink
intensity variation (Right) degradations: (a) Degraded image, (b) Wiener filtering (c)
Binarized image (d) Morphological operator. ..........................................................80

Figure 5.7: Two representative images per degradation type used to set up the parameters of the
Gabor filter banks: (a) Stain degradation, (b) Ink bleed-through, (c) Non-uniform
background, and (d) Ink intensity variation. ............................................................ 81

Figure 5.8: The ground truth images corresponding to the selected images of Figure 5.7 used to
find all the optimal parameters ................................................................................ 82

Figure 5.9: Flowchart for finding the optimal parameters of the Gabor filter and estimating the
              binarization threshold. .............................................................................................. 83

Figure 5.10: Effect of selecting 𝝈 and 𝝆 parameters: (a) Degraded image, (b) Gabor filtered
image with 𝝈 < 𝝆, (c) Gabor filtered image 𝝈 = 𝝆, (d) Gabor filtered image with
𝝈 > 𝝆. ...................................................................................................................... 84

Figure 5.11: Effect of selecting the mask size of the Gabor filter: (a) Degraded image, (b) 7 × 7,
(c) 11 × 11, (d) 21 × 21. ......................................................................................... 84

Figure 5.12: F-Measure versus the central frequency for different angles (4, 8, 16, and 32)
according to the degradation type: (a) Stain, (b) Ink bleed-through, (c) Non-uniform
background, (d) Ink intensity variation, (e) Overall. ............................................... 86

Figure 5.13: F-Measure for different numbers of angles according to the degradation type for
frequency 𝑓𝑜𝑝𝑡 = 0.140: (a) Stain, (b) Ink bleed, (c) Non-uniform background, (d)
Ink intensity variation, (e) Overall. .......................................................................... 87

Figure 5.14: Sample of subjective evaluation for Stain degradation: (a) Degraded image H02
DIBCO-2012 as stain, (b) Sauvola’s method, (c) Ground truth image, (d) The
proposed Sauvola-Gabor method. ........................................................................... 92

Figure 5.15: Sample of subjective evaluation for Ink bleed-through degradation: (a) Degraded
image H06 DIBCO-2012 as ink bleed-through, (b) Sauvola’s method, (c) Ground
truth image, (d) The proposed Sauvola-Gabor method. ......................................... 92

Figure 5.16. Sample of subjective evaluation for non-uniform background: (a) Degraded image:
PR03 DIBCO-2013 as non-uniform background, (b) Sauvola’s method (c) Ground
truth image, (d) The proposed Sauvola-Gabor method. ......................................... 93

Figure 5.17: Sample of subjective evaluation for ink degradation: (a) Degraded image H09
DIBCO-2012 as ink degradation, (b) Sauvola’s method, (c) Ground truth image, (d)
The proposed Sauvola-Gabor method. .................................................................... 93

INTRODUCTION
Since the beginning of his existence, man has never ceased to assert himself and to leave traces of his life, his personality, and his culture through transcriptions on various media such as wood, leaves, stones, papyrus, paper, etc.
Through the ages and especially with the development of writing and the appearance of
paper, the handwritten document has become the most used and the most popular medium.
Since then, manuscripts and printed documents have never ceased to accumulate, to such an extent that it is very difficult, almost impossible, to preserve them all and to find or search for useful information in them for any purpose whatsoever.
Among these documents, historical documents represent an important part of the cultural
heritage, which plays a fundamental role in the economic and social development of nations.
These documents are an essential characteristic of peoples and worldwide communities, as well
as a testimony of their culture and civilization. Protecting them not only helps to preserve the
heritage itself, but also civilizations, peoples, and nations.
Unfortunately, these documents are unique and there is a very great risk of losing them
irrevocably. These precious objects suffer continuously and progressively from many forms of
deterioration and degradation due to a multitude of factors such as bad storage conditions,
improper handling, dust, dirt, rusty staples, humidity, etc. There is therefore an urgent need to
find a way to keep them and to make their use as simple, effective, optimal, and wide as
possible.
Digitizing documents is the most appropriate way to preserve this cultural heritage. Information and communication technologies (ICT), together with the development of electronics, have made this the most suitable solution by providing flexibility in terms of storage, sharing, and ease of access. Furthermore, large amounts of documents can be stored,
duplicated, and preserved. However, because of the strong degradations of ancient and
historical documents, it is almost impossible to exploit them effectively. Indeed, it is impossible
to use, on raw images of ancient degraded documents, tools inherent to information technology
(IT) [1, 2] such as optical character recognition (OCR) [3], text and line segmentation,
document layout analysis and recognition, word spotting, information retrieval, and so on.


To overcome this problem, a conversion of the document into the most appropriate digital
representation is necessary. To do this, image processing methods are mainly used, including binarization, which consists of separating the text from the background while minimizing the effect on the text of both the different degradations and the binarization itself.

The work proposed in this thesis deals with the problem of ancient and degraded
document image binarization. Although binarization of the degraded document image has been
studied for many years and various methods have been proposed in the literature, the problem
is still challenging [4-9]. Indeed, the field of ancient document image binarization remains an
open research area because of the variety, non-uniformity, and complexity of the degradations.
Most of the methods reported in the literature estimate a threshold from the pixel's gray level intensity, considering only basic information from its neighborhood [10-18]. It is also worth
noting that texture-based features and time frequency-based methods have gained much
attention from researchers in various applications such as image segmentation, but few works
are reported that address binarization issues. Furthermore, it has been shown that texture-based descriptions of a pixel's neighborhood can be more representative than methods based on basic neighborhood information (i.e. the pixels' gray level intensities) [19-23]. For these reasons, our main motivation is to explore and investigate more deeply [24, 25] the use of texture characterization for ancient degraded document binarization, in order to enhance the results and provide accurate binarization. Therefore, this thesis focuses on the use of texture to extract relevant intensity characteristics based on the gray levels of the pixels and their respective spatial neighborhoods, and possibly on characteristics based on the pixel values in other spaces, to develop an enhanced threshold-based method inspired by the most popular, well-known thresholding methods. Moreover, in our approach, we avoid the use of a rough binarization step, which is frequently used in the literature. Our contributions can be summarized as follows:

a) Proposing a new taxonomy and survey of ancient degraded document binarization


methods
After a detailed survey of the various methods inherent to degraded document binarization, a global taxonomy of degraded document binarization methods is proposed, together with a categorization of the degradation types that is used in the design and experimental stages of the proposed methods.


b) Proposing new degraded document binarization methods


We explore new texture-based binarization methods, with a new representation space, that are robust and accurate enough, despite hard degradation types, to extract the foreground (text) from the background with as little alteration of the foreground as possible. Textures such as co-occurrences, Local Binary Patterns (LBP), and Gabor filters are used to overcome the shortcomings of grayscale pixel-based binarization methods. Categorizing ancient documents by type of degradation makes it possible to obtain more satisfactory binarization results. As the degradations are of diverse and complex origins, modeling them is a very interesting task for achieving the best and most accurate binarization. However, this task remains a very challenging research problem.

After this Introduction, which presents and describes the problem addressed, the motivation, and the objectives, the thesis is organized into five chapters. Chapter 1 describes the ancient degraded documents, their various degradation
sources, and types. Chapter 2 presents the related works for binarizing degraded document
images. Chapter 3 presents the co-occurrence-based document image binarization method.
Chapter 4 presents the LBP-based degraded document image binarization method. Chapter 5
presents the Gabor-filter-based degraded document image binarization. Finally, a conclusion summarizes the work and gives prospects for future work.

CHAPTER 1: OVERVIEW ON DEGRADED DOCUMENT IMAGE BINARIZATION

Abstract
The purpose of this chapter is to present the importance of the ancient degraded
(historical) documents, focusing on the most serious problems they are suffering
from. We will present the various types of document degradations and their origins.

Nowadays, paper is still the most widely used medium and continues to pervade our lives, despite the development of electronic communication systems. All over the world, many libraries are concerned with two problems: (i) providing easy access to information for users, and (ii) preserving books and documents, especially ancient and historical documents, from deteriorating. Throughout the world, thanks to technological advances and the extremely low cost of electronic equipment, the use of paper as a medium has declined in several domains. Nowadays, we use less and less paper. Books, newspapers, invoices, and forms are becoming more and more electronic. These electronic means have several advantages, for example, ease of use, low storage cost, protection against deterioration of the medium, ease of searching for information, as well as the economic aspect.
Nevertheless, the paperless objective has not quite been achieved; even if it is progressing, albeit at a slower pace, paper remains widespread because of the comfort it provides in certain cases, such as reading newspapers or books. Thus, the two media, i.e. paper and e-paper, coexist harmoniously. Nowadays, several digital and virtual libraries have been created
and the phenomenon is becoming more widespread [26, 27]. These libraries have mainly started
scanning documents that do not yet have a digital version.


Also, as noted above, there are many ancient, precious, and degraded documents around the world, which deteriorate continuously and gradually. These documents, whether in libraries or warehouses, are often stored in poor conditions, thus accelerating their deterioration. They undergo many forms of deterioration and degradation due to a multitude of factors that mainly result from poor storage conditions. A sample of documents undergoing the effects of bad storage is illustrated in Figure 1.1, showing documents from Khizanates (traditional libraries) located in Adrar, a region of southern Algeria that shelters many precious manuscripts jealously guarded in such Khizanates and Koranic schools. There is therefore an urgent need to find a way to keep these documents and protect them from continuous degradation.

Figure 1.1: Example of ancient degraded documents due to their poor storage conditions.


To protect them and to avoid improper human handling, the degraded document is converted
to its electronic form using an optical acquisition device, such as a camera or a scanner as shown
in Figure 1.2. Acquisition systems range from simple ones (webcams, cell phone cameras, and desktop scanners, as shown in Figure 1.2 (a-b)) to the most sophisticated, with very high-resolution sensors and more advanced lighting systems suitable for book scanning, Figure 1.2 (c-d). There also exist scanners with advanced features such as automatic book digitization, Figure 1.2 (e), as well as scanners suitable for large formats, Figure 1.2 (f).

Figure 1.2: Various document scanner devices.

Since this thesis deals with the binarization of ancient degraded document images, it is necessary to define these degradations as well as their types and their origins. Roughly speaking, a document degradation is a partial or complete alteration that makes a document difficult to read, both for humans and for computers. These degradations have several origins; more details and explanations are given in the following sub-sections. Note that the document images depicted in Figure 1.3 are from the National Library of Algeria (BNA).



Figure 1.3: Origins of Document degradations: (a) Chemical, (b) Biological, (c) Human, (d)(e)(f) External.

1.2.1 Source of degradation


Generally, in the literature, degradations of ancient documents are classified into two
categories, according to their origin, namely physical or external [27, 28].
For the first category, the document is physically altered. The medium of the document itself
as well as the ink, are altered. All the external agents which are in contact with the document
contribute to generate or accentuate the various degradations.
For instance, as shown in Figure 1.3 (a-b), the documents are physically altered; this may be due to poor storage conditions, improper human handling, insects or rodents, moisture, the chemical composition of the medium, and/or the composition of the ink.
For the second category, modifications and alterations may occur on the reproduced
document itself, in other words, at the level of the digital document or the reprinted document.
This occurs during the scanning or reprinting process. The reproduced document may then be altered mainly by the quality of the reproduction equipment, its settings, and its conditions of use. The quality of the equipment depends mainly on its technological characteristics: for (i) printing systems, the technology (offset, thermal, laser, inkjet) and color or grayscale output; for (ii) digitizing systems, the type of scanner (manual, large volume, suitable for books), color or grayscale output, resolution, lighting, speed, and, for cameras, lighting and resolution; see Figure 1.3 (c-d-e-f).

Physical degradation
Over time, ancient documents are themselves physically altered or modified by chemical,
biological, or human sources. Therefore, as explained above, these degradations are classified
in the category of physical sources. In the following, more details will be given.

a) Chemical source
The principal raw material for producing paper is cellulose fiber, extracted from plants and mainly from wood, together with various additives [28]. The latter are added to make the paper more resistant or to give it specific characteristics such as transparency, waterproofness, etc. Since cellulose is a biopolymer essentially composed of carbohydrates, combined with various chemical additives, the paper becomes very fragile and subject to the various risks associated with inks and with atmospheric and climatic phenomena. It is worth noting that ink is a liquid or paste that contains colored organic or synthetic pigments. Indeed, over time, humidity, light, and the nature of the air create conditions for chemical reactions that alter both the paper and the ink. This usually results in yellowing and discoloration of the paper, often accompanied by a spread of the ink and its diffusion from the front side of a page to the back side, and vice versa. This last degradation is called ink bleed-through.

b) Biological source
Mice, insects, and/or micro-organisms, favored by humidity and heat, can be attracted to
libraries or places containing paper documents or materials such as wood, cotton, and fibers.
These places are very conducive to the refuge and proliferation of all kinds of parasites.
Generally, paper and wood are their favorite foods. Over time, documents suffer considerable damage and devastation. These parasites generate various degradations ranging from simple to very severe. Damage such as stains and holes can be observed on the documents. Moreover, the documents can also crumble, and irreversible damage is often observed, to the point of making them disappear gradually, see Figure 1.3 (a).


c) Human source
Because of the frequent and improper handling, as well as bad storage, of documents and
books, man contributes greatly to their degradation. This results in tears, folds, and stains that make the document fragile and illegible, thus accelerating the degradation process. Besides, people readily annotate documents, affix stamps, and use adhesives or staples to repair and restore worn documents, see Figure 1.3 (a-b).

External degradation
Degradations can also be produced during the capture or printing phase, by devices that may generate blurred images and/or induce degradations such as deformations and distortions in the generated documents compared to the originals, as seen in Figure 1.3 (c-d-e-f). The appearance of the image captured or reproduced by the scanning and/or printing process depends on various factors related to the technology, the characteristics of the device, and its settings. More details are given in the following.

a) Acquiring device
To obtain the digital image of the document, an optical acquisition system is generally used,
namely a scanner or a camera. For a scanner or a camera device, the image rendering depends
on the following characteristics: (i) The type of sensor (CMOS, CCD, etc.), which can have different resolutions, speeds, and sensitivities to lighting. The sensitivity is very important for reducing the various electronic noises generated under low lighting. Radio interference noise should also not be overlooked. (ii) The lighting system, which is also of paramount importance for the quality of the obtained images. This system must take into account the intensity of the light, its wavelength, the frequency inherent in its generation, and the type of lamp, all of which can drastically affect the image quality of the document. Various types of lamps exist, such as incandescent lamps, Light Emitting Diode (LED) lamps, which produce light by the movement of electrons between the two terminals of the diode, and gas-based lamps (neon, xenon, etc.). (iii) The position of the sensor relative to the document to be scanned (distance, skew angle, etc.), which can generate a document image with an unwanted skew angle, see Figure 1.3 (e). (iv) Camera settings (resolution, shutter speed, focus, etc.), which also affect the rendering of the image; for instance, the image shown in Figure 1.3 (c) is blurred because either the camera resolution is improperly set or the focus is badly adjusted. Quite often, the operator's fingers are captured with the document image, as shown in Figure 1.3 (e), where the document image is altered by the appearance of the fingers of the operator who scanned it, due to a lack of means or to negligence. Several other problems can arise from the

improper adjustment of the camera or scanner light and from the improper positioning of the
document which can cause distortions or shadows in the digitized documents. In Figure 1.3 (d)
we can notice a shadow generated by poor lighting and a curved distortion of the acquired
document caused by the fact that the document is not flattened during capture. The latter is a
recurrent phenomenon during document acquisition to the point that this issue (which is a
particular type of degradation) is addressed by researchers.

b) Re-producing device
Until recently, printing books or documents was a tedious task. The technical processes were heavy and expensive, and their amortization could only be envisaged for large print runs. Today, with the advent of high-performance digital printing, it is possible to print a book in very small numbers of copies at a very reasonable price. Magazines, brochures, and catalogs, as well as books with large print runs, are printed on offset machines. This process enables a book to be printed at a low unit cost in large quantities of around a few thousand copies. To operate offset machines, one needs to create engraved plates, install them on cylinders, apply the ink, and finally carry out many sophisticated settings and expensive tests. Once these steps are completed, the offset machines produce huge volumes in a very short time. The unit cost of a printed copy drops rapidly and approaches the cost of the paper. In contrast, digital printing machines are designed for runs of fewer than a thousand copies. A digital printing machine works, in principle, like a huge photocopier. We can also cite office printers, which are used to reproduce only a few copies, around ten or so. There is, therefore, no big problem with settings compared to offset-based printing systems.
Overall, the image rendering depends on the following features:
(i) The resolution of the system and the quality of the ink: the latter may generate non-uniform
     script intensity, ink seeping, and ink fading, which depend closely on the paper quality
     and composition.
(ii) The technology of the printing system (offset, laser, inkjet, old printers, etc.) may also
     generate script zones with non-uniform intensity. For instance, certain old printers can
     generate non-uniform characters, with non-uniform spaces and gaps between characters,
     non-straight lines, and faded characters, as can be observed in Figure 1.3 (f).

1.2.2 Type of degradation


Ancient documents can be affected by one or more types of degradations as shown in Figure
1.4, making binarization more difficult.


Figure 1.4: Document degradation types ((a-b-c-d-e-f-g) from DIBCO Datasets, (h) From National Library of
Algiers): (a) (b) Ink fading, (c) (d) Stain, (e) Ink bleed-through, (f) Ink show-through, (g) (h) Non-uniform
background, Ink fading and uneven illumination

To obtain better and more accurate results, several research studies deal with a single, well-identified type of degradation [29]. In many cases, the most dominant degradation is targeted. The degradation types can be classified as follows.

a) Ink degradation
A process known as photodegradation occurs when light changes the chemical composition of the ink and thus causes discoloration. As the compositions of the paper support and of the ink are generally not uniform, non-uniform attenuations are often observed. Ink fading and non-uniform ink intensity of the script can be observed
locally or globally on the document. Generally, ink seeping is observed around characters, see
Figure 1.4 (a-b).


b) Stain degradation
Stains, smudges, smears, and shadows can be observed locally or globally on the document
at different intensities and scales, see Figure 1.4 (c-d). This usually results in partially poor readability of the documents, sometimes to the point of erasing the document foreground, often irreversibly.

c) Ink bleed-through and show-through degradation


Depending on the degree of permeability or transparency of the paper, diffusion of the ink through the paper can cause two degradations: (i) The backside of a page appears on the front side of the same page, in the opposite direction of reading, and vice versa, see Figure 1.4 (e). (ii) A page (in the case of a book) appears in transparency on the front side of the previous page, in the same direction of reading, with generally variable intensities, see Figure 1.4 (f). In the first case, the degradation is called ink bleed-through and in the second case it is called show-through. Both degradations are usually processed in the same way. It is worth noting that, in the case of cursive writing, one can easily tell the two apart by the direction of the skew angle of the diffused script.

d) Non-Uniform background degradation


Non-uniform background degradation and distortions are mostly caused by the quality of
digitization equipment and/or by the uneven illumination caused by lighting conditions, which
can also induce a non-uniform contrast or a shadowing effect. The composition of the paper
medium can also generate, over time, a non-uniform discoloration under the influence of light
and humidity and therefore, generate non-uniform backgrounds and/or textured backgrounds as
shown in Figure 1.4 (g-h).

e) Mixed degradation
Generally, these ancient documents undergo several types of degradation at the same time,
as shown for instance in Figure 1.4 (g-h), where several degradations (ink fading, shadowing, non-uniform background) coexist, which makes their processing more difficult and less effective.
However, their processing can be done according to the most dominant degradations.

In this chapter, we highlighted the importance of ancient documents and the need to preserve
them from different types of damage and degradation. The most suitable way to preserve ancient documents from further degradation is to digitize them, thereby avoiding bad human handling. Nevertheless, it is important to note that this action alone is not enough to exploit the document image using ICT tools.
We presented the main degradations and their sources. To deal effectively with each degradation, we classified them mainly into four categories, namely ink fading, ink bleed-through, non-uniform background, and stain.

CHAPTER 2: LITERATURE REVIEW ON ANCIENT DEGRADED DOCUMENT IMAGE BINARIZATION
Abstract

The purpose of this chapter is to present a solution, based on document image binarization, for preserving and dealing with ancient degraded document images so that they are better supported by information technology tools, as well as a taxonomy and a literature review of degraded document image binarization.

As explained in Chapter 1, digitization is a suitable way to protect documents from further degradation. However, because of the degradations, it is almost impossible to use ICT tools and to benefit from the flexibility they provide in terms of storage, sharing, and ease of access. It is impossible to use IT tools such as OCR, word spotting, or object recognition. Even if the document is printed but altered, tools inherent to document image recognition and analysis cannot be used. To understand these difficulties, a background on document understanding and analysis, as well as a literature review on degraded document image binarization, is given in the following subsections.

OCR tools alone are not enough to recognize documents, especially when the documents are complex, containing multiple columns, images, and multiple fonts, as shown in Figure 2.1 (newspaper page from the Bibliothèque Nationale de France, BNF). Analyzing and recognizing the physical and logical structure of the document image is the key to converting this type of document image accurately. Indeed, if an OCR is applied to the document without taking into account the logical structure of the layout, the characters are converted into American Standard Code for Information Interchange (ASCII) format by scanning the document's characters from left to right, which has the effect of juxtaposing words from the same lines of different, independent paragraphs. Also, since fonts are not recognized, the result is text without logical structure, i.e. plain text without information related to logical entities.
Consequently, the produced text will only be a mixture of words and sentences without
meaning. Moreover, it is necessary to consider the difficulty caused by the presence of images
and graphics which must be extracted and recognized separately. Document Image Analysis
and Recognition (DIAR) systems, as shown in Figure 2.2, aim to recognize and extract all the document components from the document image to obtain an accurate description of the document structure, and to recognize all the entities and objects present in the document image to obtain a digital version as close as possible to the original (reproducing all the useful information, both logical and physical, of the document image) that is suitable for IT tools to find, search, and handle document information [30-32]. Indeed, DIAR provides document information and features related to the document structure and layout, such as sections, headings, footnotes, references, fonts, blocks, and paragraphs, allowing the original document to be fully reconstructed from the electronic format and taking advantage of IT tools and all computer-based tools for indexing and searching the document's contents. The document is then stored in a machine-readable format: for instance, the image is stored in JPEG format and the text, after using recognition tools such as optical character recognition (OCR), is stored in ASCII format. Information technology (IT) tools based on information retrieval techniques can then be used. However, this last operation, based on OCR tools, is difficult, or even complex and impractical, to achieve for handwritten and ancient degraded documents. This category of documents presents challenges that remain open research problems. Due to various severe degradations, even printed historical
documents that have suffered over time can present more issues than contemporary handwritten
documents. The degraded and noisy document poses additional problems because of the
complex alterations in the quality of the document image. This makes it impossible to fully
convert the document into an electronic representation using OCR software. To overcome this drawback, which is still a challenging problem in both document image analysis and recognition and document image analysis and retrieval, many methods based on the document image itself have been proposed in the literature to access and manipulate document images
without the need for a complete and accurate conversion [33-35]. Word spotting is one of the most common techniques, in which image matching is used to look for similar words in degraded documents. Nowadays, word spotting for word and document search and
recognition is a very popular area for the research community to solve several problems related
to the field of document analysis and retrieval [36]. However, to do this and to obtain accurate
results, it is necessary to have documents as clean as possible. We must extract the text
(foreground) as accurately as possible without any alteration despite the existence of
degradations and complex backgrounds. Document image binarization is the most appropriate
technique. More details about the binarization process will be given in the following
subsections.

[Figure: a newspaper page with labeled regions: title, text blocks, sub-titles, and image.]

Figure 2.1: Document image structure.

[Figure: flowchart from document image acquisition to textual processing (optical character recognition) and graphical processing (page layout analysis, line processing, region and symbol processing), producing the text structure and features (title, headline, author, text lines, footnotes, table, cell, check box), horizontal and vertical lines, curves, filled regions, images, and other objects.]

Figure 2.2: Overall flowchart of the Document Analysis and Recognition System.

The binarization process is carried out as depicted in Figure 2.3, following several steps, namely digitization, preprocessing, binarization, and post-enhancement, leading to the final binarized document; the details of the various steps are given in the following.

[Figure: pipeline from degraded documents through digitization, preprocessing, binarization, and post-enhancement to binarized documents.]

Figure 2.3: Overall binarization scheme.

a) Digitization
In this process, as explained in chapter 1, a degraded document is converted to its electronic
form using an optical acquisition device, such as a camera or a scanner.

b) Preprocessing
This stage aims to enhance the image and prepare it for the subsequent processing. Techniques
and tools such as color-to-grayscale conversion, noise removal, blur removal, histogram
equalization, and filtering are applied to the acquired document image.

c) Binarization stage
Binarization techniques and algorithms are performed to achieve text and background
separation. Document image binarization generally consists of converting the pre-processed
gray-level image into a binary document in which the image is represented by two classes:
foreground (text) and background (non-text), as shown in Figure 2.4. It is worth noting that, in
the binarization process, researchers are mainly interested in extracting the text, as shown in
Figure 2.4(b). Nevertheless, some works, mainly related to restoration [37, 38], deal with
background extraction and enhancement. For instance, the background image is generated from
the original image (Figure 2.4(a)) by subtracting the binarized image (the foreground, Figure
2.4(b)), producing the intermediate background image (Figure 2.4(c)), which contains the pixels
belonging to the background and can then be enhanced, if needed for other purposes, using
inpainting techniques to fill in the missing image information [39,
40]. Roughly speaking, a binarization process can be explained simply by the following example.
Let 𝐼(𝑖, 𝑗) be a matrix representing the gray values of an image at coordinates (𝑖, 𝑗), where 𝑖 and 𝑗
denote respectively the row and column coordinates. The simplest way to perform a binarization,
leading to the binarized image 𝐵(𝑖, 𝑗), is to find a global threshold 𝑇, i.e. a gray-level intensity
separating the pixels belonging to the foreground from those belonging to the background,
according to Eq. (2.1).

B(i, j) = \begin{cases} 1 & \text{if } I(i, j) > T \\ 0 & \text{otherwise} \end{cases}    (2.1)
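As a minimal illustration of Eq. (2.1), and assuming the gray-level image is available as a NumPy array (the threshold value shown in the usage comment is an arbitrary example, not a value prescribed by this work), the global thresholding can be sketched as follows:

```python
import numpy as np

def global_binarize(gray: np.ndarray, T: int) -> np.ndarray:
    """Apply Eq. (2.1): B(i, j) = 1 if I(i, j) > T, else 0.
    With this convention, bright (background) pixels are mapped to 1
    and dark (text) pixels to 0."""
    return (gray > T).astype(np.uint8)

# Hypothetical usage on an 8-bit grayscale document image:
# binary = global_binarize(gray_image, T=127)
```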

Figure 2.4: Document image binarization process: (a) Original image, (b) Binarized image: foreground (text), obtained by thresholding, (c) Intermediate background image (pixels belonging to the background), obtained by subtracting image (b) from image (a).

d) Post-enhancement
Finally, post-enhancement (post-processing) operations such as grayscale normalization and
morphological filters (erosion, dilation) are performed to improve the binarization results and
to keep the text as clean as possible, with as few alterations and artifacts as possible. Typically,
the document undergoes a scale normalization and skew-angle correction step before addressing
word-based recognition problems such as word spotting [35, 36].
Once the document image is binarized and post-processed, tools based on document analysis
and retrieval techniques can be used without a full conversion of the degraded document into
an electronic representation.

Binarization of degraded document images has been studied for many years, and various
methods have been proposed in the literature [4, 5, 8, 9]. It is very difficult to propose a perfect
and generalized binarization algorithm because of the various and complex degradation types;
moreover, it is almost impossible to model these degradations. To obtain accurate and effective
results, researchers have mainly addressed these issues by focusing on specific degradation types
such as ink bleed-through, show-through, ink stains, faint characters, etc. [29] [41, 42]. Several
methods and approaches adopted in the literature are detailed in the taxonomy and the following
sub-sections.

2.4.1 Taxonomy of degraded document binarization methods


These methods can be classified into two main categories depending on the availability of
the document image sides: (i) those where we only have access to one side of the document image,
recto or verso, independently (one-sided document image binarization methods) and (ii) those
where we have access to the recto and verso sides simultaneously (double-sided document image
binarization methods). It is worth noting that double-sided document image binarization methods
are effective only if both sides of the document are scanned under the same conditions, to avoid
different distortions on each side, such as lighting, resolution, rotation, and skew. That is why the
most widespread degraded document image binarization methods belong to the one-sided category.
Most of the methods in the literature are based on thresholding, alone or combined with
classification-based techniques; thresholding techniques are therefore at the heart of most methods.
Following [43], threshold-based methods are categorized into six groups according to the
information they exploit: histogram shape-based
methods, clustering-based methods, entropy-based methods, object attribute-based methods,


spatial information-based methods, and locally adaptive thresholding-based methods. Wen et
al. [42] proposed a new categorization in which binarization methods are regrouped into three
major categories: clustering-based methods, threshold-based methods, and hybrid-based
methods. We complete this classification to propose the taxonomy adopted in this thesis,
depicted in Figure 2.5, by introducing texture-based methods, classification-based methods,
and hybrid methods. Details are given in the following subsections.

Figure 2.5: Taxonomy of degraded document binarization: one-sided document methods (thresholding-based: global, e.g. Otsu, Kittler; local, based on m(x, y) and σ(x, y), e.g. Niblack, Sauvola, Wolf; classifier-based: unsupervised and supervised; hybrid; texture-based: run-length, co-occurrence, LBP, Gabor filters, etc.) and double-sided document methods using the information of both sides.

2.4.2 Double-sided document image binarization methods


A double-sided document image binarization method assumes that both sides of the document
image are available and were acquired under almost the same conditions. These conditions are
necessary because the methods related to this type of document aim to process each side
separately and simultaneously, taking into account the degradation that has seeped from one side
to the other. Ink bleed-through and show-through degradations are superpositions, due
respectively to ink seeping through the paper and to paper transparency, of the document page
with its other side: the recto with the verso for the former, and the verso with the
next page for the latter. This phenomenon can be considered as a superposition of two layers of
the document image [44, 45]. Ink bleed-through degradation appears as text with the opposite
(mirrored) slant direction of the foreground script, often with lower intensity, whereas
show-through degradation appears as text with the same direction as the foreground, also with
lower intensity.
Addressing these issues with methods that access the information of both recto and verso sides
of the document images simultaneously, rather than processing each side independently, allows
more accurate results to be achieved. The results of double-sided document image binarization
methods would thus improve on those of one-sided methods, provided the documents are
acquired under the same conditions.
In [46, 47], the authors tackle the problem by registering the two sides of the document
images. For instance, after registration, the authors in [46] assume that the edges of the
foreground strokes are sharper than those of the interfering strokes and that the orientations of
the foreground and interfering strokes are different, in order to separate the foreground from
the degradation caused by ink seeping from the reverse side. In [48], the authors
use a clustering method based on a set of content-level classifiers to differentiate between the
text and background. The clustering features are the estimated background and the estimated
stroke gray level.
In [44] [49] the authors addressed the problem as blind source separation, based on
Independent Component Analysis and on second-order statistics where a document is modeled
as a view of a linear combination of independent patterns (foreground and background).
However, to successfully achieve the registration step, which is time-consuming and not a
simple task, the documents (both sides if they exist) must be scanned with the same equipment
and conditions to avoid deformation such as scale, skew, offset issues, etc. Nevertheless, these
conditions are rarely met and fully satisfied.

2.4.3 One-sided document image binarization methods


As mentioned above, the most widespread degraded document image binarization methods
belong to the one-sided category, where the binarization relies only on information from one
side of the document. According to the literature and following our taxonomy, we notice that
thresholding-based methods have gained much attention from researchers. Even if these methods
are not always used directly, they are often combined with other methods, for instance:
(i) Used as an enhancement or post-enhancement method.
(ii) Used as rough binarization methods.
(iii) Combined with methods based on a classifier to optimize the related thresholding
parameters.
(iv) Often the classifier-based methods are used as additional methods to optimize the
results or to select the best thresholding-based method.
According to the taxonomy, two notions, global and local thresholding, are used; they are
explained in the following.

2.4.4 Global and local binarization based methods


Following the literature [4, 50, 51] degraded document binarization methods can be
classified into two main categories according to the selected threshold. When the threshold is
computed for all pixels of a document image, the binarization is considered global. In contrast,
the binarization is considered local or adaptive when the threshold is computed for each pixel
using information about its neighborhood within a centered window.
Global thresholding does not make it possible to perform an optimal binarization in the
presence of serious degradations, as shown in Figure 2.6. For instance, Figure 2.6(a) shows a
strong ink-bleed-through degradation and Figure 2.6(b) a strong stain degradation whose intensity
is almost the same as that of the foreground in some regions of both document images.
The same problem is also observed with degradations such as severe ink fading or severely
varying illumination. The problem becomes even more difficult when the alteration is a
combination of different types of degradations or when it is severely and non-linearly diffused
throughout the document.
In what follows, we illustrate global and local binarization on two documents, both affected
by multiple severe degradations, mainly a severe stain degradation and a severe ink-bleed-through
degradation, as shown respectively in Figure 2.7(a) and Figure 2.7(b). The respective histograms
in Figure 2.7(c) and Figure 2.7(d) clearly show the strength of the degradations. If the degradations
were weak, we would generally see in these histograms: (i) two peaks corresponding to the
foreground and the background of the stain-degraded document (Figure 2.7(a)) and (ii) three
peaks corresponding to the foreground, the ink bleed-through, and the background of the
ink-bleed-through-degraded document (Figure 2.7(b)). Global techniques therefore do not achieve
good binarization performance but rather the opposite, eliminating both the degradations and the
text, as shown in Figure 2.7(e) and Figure 2.7(f).
Mostly, adaptive methods perform better and provide accurate results [52]. However, these
Mostly, adaptive methods perform better and provide accurate results [52]. However, these
methods depend strongly on the strength of the degradation and on the size of the centered
window, which is generally related to the width of the foreground strokes; see Figure 2.7(g) and
Figure 2.7(h), where the binarization result is not satisfactory either.

Figure 2.6: Document images with strong degradations (text/foreground, degradation, and background regions annotated): (a) Ink bleed-through degradation, (b) Stain degradation.

Figure 2.7: Local and global binarization methods: (a)(b) Original degraded document images, (c)(d) Respective histograms, (e)(f) Binarized document images using a global method, (g)(h) Binarized document images using a local method.

Hybrid approaches using both global and local methods have also been proposed [45]. Moreover,
it is worth noting that global methods are generally used as an enhancement (pre-processing)
step providing a rough binarization, or for other purposes such as text line or Region Of Interest
(ROI) detection. The techniques and methods based on global and local (adaptive) thresholding
for degraded document binarization are detailed in the following sub-section.

2.4.5 Thresholding-based binarization methods


Among the thresholding-based methods, the classical ones are well-known and the most
used in the literature. Niblack’s, Sauvola’s and Wolf’s methods are the most well-known
adaptive threshold-based methods for binarizing degraded document images. Niblack’s
threshold [53] is inspired by the Gaussian distribution given by the following equation.

𝑇(𝑥, 𝑦) = 𝑚(𝑥, 𝑦) + 𝑘𝜎(𝑥, 𝑦) (2.2)

where 𝑚(𝑥, 𝑦) and 𝜎(𝑥, 𝑦) are respectively the mean and the standard deviation of pixels
estimated within a centered window, namely 𝑊0, at the pixel’s coordinates (𝑥, 𝑦). The size of
W0 and the value of the parameter k are tuned experimentally [54]. Niblack’s method is
illumination sensitive and introduces a good deal of noise. Moreover, the binarization results
are worse for complex degradations such as faint characters and ink bleed-through or show-
through degradations. Generally, Niblack's method is used as a rough binarization method
because of its ability to binarize the document well enough without altering the foreground.
Sauvola’s method [55] is an improvement on Niblack’s method, especially in dealing with
stained documents when the standard deviation is adapted. Sauvola’s method threshold is given
by the following equation:

T(x, y) = m(x, y)\left(1 - k\left(1 - \frac{\sigma(x, y)}{R}\right)\right)    (2.3)

where 𝑅 is a global constant used to normalize the standard deviation, generally set to 128 for
8-bit images, and 𝑘 is a parameter defined by experiments. The binarization results are improved,
especially for highly contrasted and stained documents. However, the method still fails in the
case of strong degradations, such as ink bleed-through, show-through, and faint-character
degradations.
Wolf’s method [18] is an improvement on Sauvola’s method, it allows solving the problem

when the gray level of the foreground and background are closer by normalizing the global
contrast of the document image. The threshold is given by the following equation:

T(x, y) = (1 - k)\,m(x, y) + k\,M + k\,\frac{\sigma(x, y)}{R}\,\left(m(x, y) - M\right)    (2.4)

where 𝑅 is set to the maximum standard deviation of all local neighborhoods and 𝑀 is the
minimum gray level value of the image pixels. However, since the normalization is global
(𝑀 and 𝑅 values are global), a small stained patch on the document will significantly alter the
entire image. Roughly speaking, adaptive methods can be considered more accurate than global
methods, although in many cases the adaptive threshold estimation can fail drastically in areas
with low variance. Niblack’s method retains a lot of background noise, Sauvola’s method
produces better results, and Wolf’s method performs best, specifically on stained documents.
However, strong degradations such as ink bleed-through and faint characters cannot be removed
in most cases.
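To make the Niblack and Sauvola formulas concrete, here is a minimal sketch, assuming the image is a NumPy array and using SciPy's uniform filter to estimate the local mean and standard deviation (the window size and k values are only illustrative defaults, not the ones tuned in this thesis):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_stats(gray: np.ndarray, w: int):
    """Local mean and standard deviation over a w x w window around each pixel."""
    g = gray.astype(np.float64)
    mean = uniform_filter(g, size=w)
    sq_mean = uniform_filter(g * g, size=w)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    return mean, std

def niblack_binarize(gray: np.ndarray, w: int = 41, k: float = -0.2) -> np.ndarray:
    """Niblack threshold T = m + k * sigma (Eq. 2.2); pixels above T -> 1 (background)."""
    m, s = local_stats(gray, w)
    return (gray > m + k * s).astype(np.uint8)

def sauvola_binarize(gray: np.ndarray, w: int = 41, k: float = 0.5, R: float = 128.0) -> np.ndarray:
    """Sauvola threshold T = m * (1 - k * (1 - sigma / R)) (Eq. 2.3)."""
    m, s = local_stats(gray, w)
    return (gray > m * (1.0 - k * (1.0 - s / R))).astype(np.uint8)
```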
Otsu’s method is one of the widely-used global methods [56] based on computing a global
threshold that maximizes the inter-class variance. Similarly, Kapur’s method [57] uses an entropy
measure based on the probability distribution of the document image’s gray levels. Both global
threshold methods are powerful for documents with a bimodal distribution of gray levels. When
the distribution of gray levels is not bimodal, adaptive techniques, which adjust the threshold
locally according to various measures computed in a region around each pixel, are more
appropriate. Various methods of this kind have been developed; for instance, in [12] the author
modified Otsu's global method to make it adaptive by considering the background estimation
and the stroke width of the foreground script. However, many insufficiencies remain with
classical thresholding-based methods. Alternatively, the authors in [16] proposed a new method
inspired by Niblack’s in which the standard deviation is replaced by entropy; the obtained results
are promising and can outperform classical threshold-based methods. Similar methods using
statistics other than the mean and standard deviation are proposed in [58, 59].
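As a sketch of the inter-class variance criterion just described (assuming an 8-bit grayscale image stored as a NumPy array; this is a straightforward textbook formulation, not an optimized implementation):

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Global threshold maximizing the inter-class (between-class) variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()      # class probabilities
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0   # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2         # inter-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```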
In [42], as depicted in Figure 2.8, the Curvelet transform is combined with Otsu's method
to binarize non-uniformly illuminated images. The Curvelet coefficients are extracted from the
degraded document image and adaptive nonlinear functions are then applied for histogram
adjustment and denoising. After reconstruction of the image using the inverse Curvelet transform,
Otsu's method is applied for binarization. The Curvelet transform and the nonlinear enhancement
improve the distribution of the document image histogram. The authors noticed that the
proposed method leads to better results than classical thresholding-based binarization

methods; however, errors remain, mainly on the contours of the image, as shown in Figure 2.9.

Figure 2.8: Overall scheme of Curvelet transform-based binarization [42]: the degraded document image is decomposed by the Curvelet transform into low- and high-frequency coefficients, enhanced by nonlinear denoising functions, reconstructed by the inverse Curvelet transform, and finally binarized with Otsu's method.

Figure 2.9: Curvelet transform-based binarization: (a) Original image, (b) Binarized image [42].

In recent years, the previous methods have often been combined or used as a rough
binarization method and/or as an enhancement (pre-processing) or post-enhancement (post-
processing) step to obtain an improved binarized image, as reported in [24-26] and in different
DIBCO competitions[1] [4-8] [60, 61].
The winners of recent DIBCO events achieved their results partially by using these
threshold-based methods with the estimation of the ink and background classes and by using a
dynamic sliding window to enable more accurate individual pixel classification. Other works
reported combining classical methods with other information from the foreground or the
background of the document image. For instance, an adaptive and parameter-less generalization
of Otsu's method is achieved by combining multiple parameters of the document [12].
In [14], both background and text-stroke information are estimated within a simple
threshold-based method followed by post-processing stages to improve the quality of the
binarization. In [29] the authors addressed the problem of faint characters in handwritten
document images by using inpainting-based methods; the stroke and background are estimated
with a combination of Niblack’s and Otsu’s methods. In [62] the author makes use of adaptive
image contrasts where an adaptive contrast map is constructed for an input degraded document
image. The contrast map is then binarized and combined with Canny's edge map to identify the
text stroke edge pixels. The document text is further segmented by a local threshold that is
estimated based on the intensities of detected text stroke edge pixels within a local window.
The method is simple and has been tested on three public datasets (DIBCO-2009, DIBCO-2011,
and H-DIBCO-2010), achieving scores that outperform or are close to the best ones in the three
contests. In [63] the author presents an automatic technique for setting the parameters best
suited to each individual image, which outperforms the state of the art. Moreover, in
[64], a learning framework is introduced to optimize binarization methods.

2.4.6 Classifier based methods


In degraded document binarization, learning techniques are mostly used to categorize or
classify similar data points into specific classes. Both supervised and unsupervised methods
can be used.

Unsupervised based methods


Clustering is an unsupervised learning technique used to categorize or classify similar
data points into groups. The best-known methods are k-Means Clustering, Mean-Shift
Clustering, Density-Based Spatial Clustering, Expectation-Maximization (EM) Clustering, and
Agglomerative Hierarchical Clustering.
k-means is the most well-known clustering algorithm [65]. It starts with a random choice
of cluster centers and may therefore yield different clustering results on different runs of the
algorithm. Each data point is assigned to the group whose center is closest to it, according to the
distance between that point and each group center. The group centers are then recomputed as the
mean of all the vectors in the group, and the process is repeated until the group centers no longer
change much between iterations.
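The following minimal sketch (variable names and parameter values are illustrative, and the gray-level image is assumed to be a NumPy array) clusters pixel intensities with k = 3, in the spirit of the layer-separation experiment discussed below for Figure 2.10:

```python
import numpy as np

def kmeans_gray(gray: np.ndarray, k: int = 3, n_iter: int = 50, seed: int = 0):
    """Plain k-means on pixel gray levels, e.g. to separate foreground,
    ink bleed-through, and background layers of a document image."""
    rng = np.random.default_rng(seed)
    pixels = gray.reshape(-1, 1).astype(np.float64)
    centers = rng.choice(pixels.ravel(), size=k, replace=False).reshape(k, 1)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(pixels - centers.T), axis=1)   # nearest center
        new_centers = np.array([pixels[labels == j].mean() if np.any(labels == j)
                                else centers[j, 0] for j in range(k)]).reshape(k, 1)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels.reshape(gray.shape), centers.ravel()

# Hypothetical usage: take the cluster with the darkest center as the text layer.
# labels, centers = kmeans_gray(gray_image, k=3)
# text_mask = (labels == np.argmin(centers)).astype(np.uint8)
```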
Mean shift clustering is an algorithm based on a sliding window that attempts to find dense
areas of data points. Unlike k-means clustering, it is not necessary to select the number of
clusters, because mean-shift discovers it automatically [66, 67]. An example of k-means-based
classification with k = 3 is shown in Figure 2.10. The k-means method is applied to degraded
document images, for instance the one in Figure 2.10(a). According to the histogram shown in
Figure 2.10(c), we assume that each histogram peak represents one class, i.e. one layer, of the
degraded gray-level document image (Figure 2.10(b)), namely: foreground (binarized document
image), background, and ink-bleed-through degradation, represented respectively by Figure
2.10(d), Figure 2.10(e), and Figure 2.10(f). We notice visually that these classes match the
previous assumption well and each layer is well extracted; in particular, the image is well
binarized (Figure 2.10(d)). Unfortunately, when we apply the same methodology to another
document image (Figure 2.10(g)), the expected layers, represented respectively by Figure 2.10(j),
Figure 2.10(k), and Figure 2.10(l), do not match the assumption despite the presence of similar
histogram peaks, as shown in Figure 2.10(i), and the binarized image is very poor, as depicted in
Figure 2.10(j). We conclude that the classes generated by k-means do not always correspond to
the various layers of a document degraded by ink bleed-through. Nevertheless, the method can
be very useful as an enhancement (pre-processing) step.
In [68] the authors used an adaptive binarization-based method combined with a k-means-based
method to avoid setting any threshold. In [69] the authors combined Otsu’s global method with
a k-means-based method. In [70] a novel binarization technique combining local and global
approaches using the k-means clustering algorithm is presented: the document is first divided into
several blocks, a k-means algorithm is applied to each block, and then a global phase iteratively
gathers the k-means results from all blocks until global convergence is reached. Similar
algorithms were presented in the H-DIBCO 2012 competition [6]. It is worth noting that
k-means-based methods are often combined with other methods; they are powerful on degraded
documents with uniform backgrounds and especially for ink-bleed-through degradations.
In [71] the author presented a
backgrounds and especially for ink-bleed-through degradations. In [71] the author presented a
method based on a k-means algorithm that is applied sequentially by using a sliding window
where color sample features, set manually, are defined for each class. In [45] the author
presented a method based on the k-means clustering algorithm and the principal component
analysis. The author addressed the problem of the color degraded document where the PCA is
used mainly to decorrelate the colors according to Red, Green, Blue (RGB) features, and then
a clustering based on k-means is used. The method shows good results on documents with ink-
bleed-through degradation and where the foreground is represented by a single color. On the
other hand, the method fails to process multicolored recto documents. However, we have

noticed that most of these unsupervised approaches, especially when not combined with other
methods, still present difficulties in distinguishing text from non-text components.

Figure 2.10: Binarization based on the k-means method: (a)(g) Original degraded document images, (b)(h) Gray-level degraded document images, (c)(i) Respective histograms, (d)(e)(f) and (j)(k)(l) Respective degraded document layers (foreground, background, ink-bleed-through degradation).

Supervised based methods


In the pattern recognition literature, Support Vector Machines (SVM) [72-76] and
Artificial Neural Networks (ANN) [77-79] are the most widely applied supervised techniques.
SVMs are machine learning algorithms for binary classification, widely applied in pattern
recognition as a powerful classification method. More formally, a support vector machine
constructs an optimal hyperplane separating two classes by searching for the largest margin
between the vectors of the two classes; this optimal separating hyperplane is found by solving a
quadratic optimization problem. The ANN principle, on the other hand, is based on an
information processing system inspired by biological neural networks: an ANN is an
interconnected group of artificial neurons arranged in one or several layers, and the various
ANN-based methods are characterized by the structure of the connections between their neurons.
A common criticism of neural networks is that they require a large diversity of training samples.
In the literature on the binarization of degraded documents, authors have addressed the use of
these classifiers based on SVM and neural networks. In [80] the author presents a method based
on an SVM for the binarization of degraded document images. The document image is first
segmented into regions, then an SVM is used to select an optimal global threshold for the
binarization of each image block. These blocks, as shown in Figure 2.11, fall into three
categories: (i) an image block only contains background pixels (blocks e and j ), (ii) the
proportion of the text is almost equal to that of the background (blocks b, f, and g), (iii) an
image block contains more background than text pixels (blocks a and d). Ten characteristic
parameters are selected to establish the SVM classification model, including mean, standard
deviation, entropy, uniformity, etc. The extracted features of each image block are fed into the
SVM to determine the block’s optimal threshold.
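As a hedged illustration of this kind of block-based SVM scheme (a reduced sketch, not the authors' implementation: only four of the ten characteristic parameters are computed, scikit-learn is assumed to be available, and the training data shown in the usage comments is hypothetical):

```python
import numpy as np
from sklearn.svm import SVC

def block_features(block: np.ndarray) -> np.ndarray:
    """A reduced feature vector for one image block: mean, standard deviation,
    entropy, and uniformity (the method in [80] uses ten such characteristics)."""
    hist = np.bincount(block.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    nz = p[p > 0]
    entropy = -(nz * np.log2(nz)).sum()
    uniformity = (p ** 2).sum()
    return np.array([block.mean(), block.std(), entropy, uniformity])

# Hypothetical usage, with blocks labelled by category as in Figure 2.11
# (background-only, balanced text/background, background-dominant):
# X = np.stack([block_features(b) for b in training_blocks])
# clf = SVC(kernel="rbf").fit(X, training_labels)
# category = clf.predict(block_features(new_block).reshape(1, -1))
```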

Figure 2.11: An image blocking sample [80].


In [81] the authors combine k-means classification with a classical binarization method to
generate a pure learning set and a conflict class. An SVM classifier is used to handle the
conflict class and to make the final binarization, which classifies each pixel of the document image as
foreground or background. Experiments are conducted on the standard datasets DIBCO-2009
and DIBCO-2011.
In [82] the authors addressed the problem of severely varying illumination in degraded
document images. The proposed method divides an image into several regions and decides how
to binarize each region with an Otsu-based thresholding method. The decision rules are
derived from a learning process based on an SVM, trained using as input features the Otsu
threshold of the region, the minimum Otsu threshold of the neighboring regions, the standard
deviation, and the mean; the output is then used to select one of the Otsu-based methods.
According to the authors, favorable results are obtained; nevertheless, the evaluation is carried out
subjectively and in terms of OCR performance.
Many other methods using neural networks have been proposed in the literature [83-86]. In [85]
the authors use a back-propagation neural network to directly classify image pixels according to
their neighborhood: for each pixel p of the image, the Multi-Layer Perceptron (MLP) is fed with
the gray value of p and those of its neighbors, and should output the value 0 for black or 1 for
white. The method is tested on synthetic data and compared mostly against classical
thresholding-based methods; the authors report promising results, although the method is ranked
third, after Sauvola’s method. In [86] a neural network is trained using local threshold values of
an image to determine an optimum global threshold value, which is then used to binarize the
whole image.
Deep learning and combination-based methods have also been investigated. In [87] the authors
enhance document quality by combining Otsu’s method with a CNN-based method; the
performance was evaluated in terms of OCR accuracy and the tests were carried out on documents
taken from a mobile camera and from newspapers. In [88] the authors propose a method based on
a deep convolutional neural network (DCNN) for adaptive binarization of degraded document
images. The method decomposes a degraded document image into a spatial pyramid structure by
using the DCNN, with each layer at a different scale; the foreground image is then sequentially
reconstructed from these layers using a deconvolutional network. Experiments were carried out
on DIBCO datasets and compared only against Sauvola’s and Otsu’s methods, where the results
demonstrate the effectiveness of the proposed method.
In [89] the authors propose a method based on a CNN composed of two groups of
convolutional layers and a fully connected layer; a sliding window centered at the classified pixel
is used within the CNN to classify each pixel as foreground or background. In [90], a
novel supervised binarization method based on a hierarchical Deep Supervised Network (DSN)
architecture is proposed. Given a gray-scale document image, the document is scanned using a
local window of size 𝑑x𝑑 to obtain input image patches, and the proposed DSNs are trained over
the created image patches. Three foreground maps are predicted for each image patch, where
each map represents specific information about the foreground and background, and the optimal
threshold value T is determined from the predicted maps of the training images. According to the
authors, the evaluation with different measures on three public datasets shows that the proposed
method gives good results, although some problems remain to be solved, particularly for
manuscript documents with ink-fading degradation.
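As a rough, hedged sketch of the kind of patch-based CNN pixel classifier described above (a generic small network written with Keras for illustration; the patch size, layer sizes, and training procedure are assumptions and do not reproduce the exact architectures of [89] or [90]):

```python
import tensorflow as tf

def build_patch_classifier(patch_size: int = 21) -> tf.keras.Model:
    """Small CNN that classifies the pixel at the center of a gray-level patch
    as foreground (text) or background."""
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same",
                               input_shape=(patch_size, patch_size, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(center pixel = foreground)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```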

2.4.7 Texture based binarization methods


In image processing, the concept of texture [91-93] is difficult to explain because multiple
definitions coexist: it is one of the least well-defined notions and, despite its omnipresence in
images, there is no formal approach or universal definition; definitions vary according to the
research field and the authors' conceptions.
We distinguish two definitions cited in [92]: following [94], a texture can be considered as a
macroscopic region whose structure is simply attributed to repeating patterns in which the
primitives are arranged according to a placement rule; and in [95] the authors regard a texture as
follows: a region in an image has a constant texture if a set of local statistics or other local
properties of the picture function are constant or approximately periodic. Moreover, it is worth
noting the two main families of tools cited in [92] to characterize textures: (i) spatial approaches,
i.e. statistical, geometric, and model-based methods, and (ii) methods resulting from signal
processing and related to the frequency domain.
Among the texture-based methods [96], the most used include the gray-level co-occurrence
matrix (GLCM), Local Binary Patterns (LBP), and Gabor filter banks.
Texture-based features and time-frequency methods have gained much attention from
researchers in various applications such as image segmentation [21, 22] [96-99]. In particular,
the wavelet transform and Gabor filtering techniques [19-21] are widely applied in texture
analysis in domains such as pattern recognition, image denoising, classification, image
compression, and image segmentation and, more specifically in complex documents, for
segmenting text, thanks to their ability to discriminate oriented textures. In [99], local features
are incorporated into a conventional entropic method to implement a thresholding-based method.
For document binarization, a texture-feature-based thresholding method addressing the problem
of binarizing documents with poor contrast, strong noise, and variable modalities in gray-scale
histograms has been reported in [100].

In that work, image binarization is performed by extracting texture features from the run-length
histogram combined with Otsu’s method: candidate thresholds are produced using an iterative
Otsu method, and the optimal candidate is then selected using a texture feature extracted from
the run-length histogram. Experiments with 9,000 printed address
blocks from an unconstrained U.S. mail stream demonstrated that over 99.6 percent of the
images were successfully binarized. The method is mainly compared to Otsu’s method.
In another texture-based method [101], the authors proposed a binarization method for color
text images composed of three main steps: color space dimensionality reduction, extraction of
texture characteristics, and selection of the optimal binary image. Two types of effective texture
characteristics, the run-length histogram and the spatial size distribution, are extracted.
Experiments carried out on a dataset of more than 500 color text images gave very promising
results. In [102] a texture-based LBP descriptor is applied to degraded document images to
extract a region of interest (ROI), on which a rainfall process is performed iteratively; a threshold
is then applied to produce the binarized image. The authors report that the LBP descriptor is a
good text extractor. In our work, we use texture, namely LBP, co-occurrence, and Gabor filters,
within a thresholding-based method to binarize ancient degraded document images [59, 103-105];
these methods are detailed in the following chapters.

2.5.1 Datasets and evaluation protocol


The DIBCO (Document Image Binarization Contest) and H-DIBCO (Handwritten DIBCO)
contests are organized annually in conjunction with the International Conference on Document
Analysis and Recognition (ICDAR) and the International Conference on Frontiers in Handwriting
Recognition (ICFHR). Their overall goal is to record recent advances in the binarization of
machine-printed and handwritten images. To this end, for each edition of the contest, a dataset of
about ten digitized documents, printed and handwritten, in color and/or gray levels, is released
together with the ground-truth binary images and a set of evaluation measures for benchmarking.
A sample of a color degraded document image and its ground truth from DIBCO-2016 [106] is
shown in Figure 2.12. The proposed methods are evaluated using a set of ancient degraded
documents from the DIBCO datasets [4, 107-110] and a set of ancient degraded documents
provided by the National Library of Algiers (Bibliothèque Nationale d’Alger: BNA); a sample is
shown in Figure 2.13. This latter dataset is used for subjective evaluation. For objective

evaluation, two protocols are used, based respectively on the blind and the unblind datasets [111].
The blind dataset is a collection of DIBCO degraded images grouped according to the year of
submission to the contest, where the degradation type is unknown. By contrast, the unblind
dataset is a collection of degraded images grouped according to the degradation type [111].

Figure 2.12: Sample of a DIBCO-2016 degraded document image with its ground truth: (a) Original color degraded document image, (b) Ground-truth binarized document image.

Figure 2.13: Sample of a degraded document image from the National Library of Algeria (BNA) used for subjective evaluation.
The effectiveness of the proposed methods is shown by comparing the results against well-known
thresholding-based methods on unblind and blind datasets; a comparison with some state-of-the-art
methods is also carried out on blind datasets. It should be noted that it is more appropriate to
compare the performance of a binarization method on the unblind datasets, as mentioned by the
authors in [111], who demonstrated that there is an ambiguity in evaluating binarization methods
on blind datasets and proved that a binarization method can perform better on a specific
degradation type. For this purpose, the authors of [111] categorized the DIBCO datasets by type
of degradation, where each document image is represented by its dominant degradation type.
Four main degradation types are defined, namely Stain, Ink bleed-through, Non-uniform
background, and Ink intensity variation, denoted respectively Type A, Type B, Type C, and
Type D. To these we add another category, the mixed category, which contains documents
affected by several dominant degradation types.
Table 2.1 reports the assignment of DIBCO images organized according to the year of the
contest, the number of images in each dataset (denoted #Images), and the writing type (WT),
denoted P and H respectively for printed and handwritten documents. It also shows the
organization of the image datasets according to the degradation type and the position of each
image in its dataset [111]. For instance, as depicted in Figure 2.14, the handwritten image HW08,
indexed 8 in DIBCO-2013, belongs to degradation Type B; the handwritten image H04, indexed
04 in DIBCO-2010, belongs to Type D; the printed image PR06 from DIBCO-2011 belongs to
Type C; and the handwritten image HW6 from DIBCO-2011, indexed 06, belongs to the mixed
degradation Type D+A. It is worth noting that these categorized datasets can be used in the
various design steps, such as training, parameter setting, and evaluation.

Table 2.1: DIBCO dataset classification according to the degradation type [111].
Dataset year   WT   #Images   Degradation type (A / B / C / D / Mixed)
2009 P 5 4 1; 2; 5 3
H 5 4 2; 3 1 5(A+C)
2010 H 10 6 5 1; 3 2; 4; 7-10
2011 P 8 3;5 1; 2 4; 6; 7 8
H 8 4; 5 7 1 2; 3; 8 6(A+D)
2012 H 14 2; 8 3; 4; 7; 9-13 [4] (A+D)
2013 P 8 4; 5; 8 1; 2; 6 3 7
H 8 4; 5; 8 2 1;7 3(B+D); 6(B+D+C)

Figure 2.14: Examples of degradation type categorization: (a) Type B, (b) Type D, (c) Type C, (d) Mixed type (D+A).

2.5.2 Evaluation criteria


For the evaluation of a binarization method, we compute scores using various measures
between each binarized image and its corresponding ground-truth image provided with the
DIBCO datasets. For an objective comparison, three standard measures [4, 107, 109, 112] are
used to evaluate the method’s performance: F-Measure (FM), Peak Signal to Noise Ratio
(PSNR), and Distance Reciprocal Distortion (DRD). More specifically, FM denotes the
accuracy of the binarized image (the higher the FM, the higher the accuracy of the binarized
image) and is defined as follows:

FM = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}    (2.5)

where Precision and Recall are defined respectively as Precision = \frac{TP}{TP + FP} and Recall = \frac{TP}{TP + FN}. TP, FP, and FN denote, respectively, True Positive (occurs when both the binarized image pixel and the ground truth are labeled as foreground), False Positive (occurs when the image pixel is labeled as foreground and the ground truth is background), and False Negative
image pixel is labeled as foreground and the ground truth is background) and False Negative
(occurs when the image pixel is labeled as background but the ground truth is foreground).
The PSNR measure denotes the similarity between the binarized and the ground truth image
(the higher the PSNR, the higher the similarity), which is defined as:

PSNR = 10 \cdot \log\left(\frac{C^2}{MSE}\right), \quad \text{where} \quad MSE = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N}\left(B(i, j) - GT(i, j)\right)^2}{M \times N}    (2.6)

where 𝑀 × 𝑁 is the size of the binarized document image 𝐵(𝑖, 𝑗) and the binarized ground truth
𝐺𝑇(𝑖, 𝑗), and 𝐶, denotes the gray-level difference between foreground and background.
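A minimal sketch of how the FM and PSNR measures can be computed between a binarized image and its ground truth (assuming both are NumPy arrays in which foreground pixels are labeled 1 and background pixels 0; this convention and the helper names are assumptions for illustration):

```python
import numpy as np

def f_measure(binarized: np.ndarray, gt: np.ndarray) -> float:
    """F-Measure (Eq. 2.5) from TP, FP, FN counts."""
    tp = np.sum((binarized == 1) & (gt == 1))
    fp = np.sum((binarized == 1) & (gt == 0))
    fn = np.sum((binarized == 0) & (gt == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def psnr(binarized: np.ndarray, gt: np.ndarray, C: float = 1.0) -> float:
    """PSNR (Eq. 2.6); C is the gray-level difference between foreground and
    background (1 for {0, 1} maps, 255 for 8-bit images)."""
    mse = np.mean((binarized.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(C ** 2 / mse)
```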

The DRD measure denotes the distortion for all the flipped pixels (the lower the DRD, the
higher the similarity), which is defined as follows:

DRD = \frac{\sum_{k=1}^{S} DRD_k}{NUBN}    (2.7)

where DRD_k is the distortion of the k-th flipped pixel and S is the number of flipped pixels.
The parameter NUBN is the number of non-uniform 8×8 blocks in the Ground Truth (GT) image.
The distortion of the k-th flipped pixel is defined as a weighted sum, over a 5×5 weight matrix
W_{Nm}, of the GT pixels that differ from the k-th flipped pixel located at (x, y) in the binarized
image B, as follows:

DRD_k = \sum_{i=-2}^{2}\sum_{j=-2}^{2} \left|B_k(x, y) - GT_k(i, j)\right| \times W_{Nm}(i, j)    (2.8)

In this chapter, we described the different stages of the solution, based on digitization and
binarization. We showed the possibilities, the difficulties, and the challenges encountered by
researchers in this field, highlighting recent publications and the problems they have faced. It is
worth noting that previous methods are based on a spatial representation of the document image,
in which pixel gray-level intensity or simple pixel-neighborhood information is often used,
mainly combined with classical well-known thresholding-based methods and/or with certain
heuristic features such as stroke width and background estimation. Moreover, it should be noted
that there is no generic method able to process
all the degradations, and the problem becomes even more complicated with severe degradations.
A taxonomy has therefore been developed and presented.
In conclusion, we confirmed that most methods are generally a succession of different
methods, in which classical thresholding methods are predominantly combined or used as an
approximate binarization method.
To the best of our knowledge, texture and frequency-space features have received little
attention to date, and few works on degraded document image binarization using them have been
reported; this motivated our choice to develop them further, while trying to avoid the use of other
methods, in particular the classical thresholding-based methods, as a rough pre-processing.

Abstract

This chapter presents an adaptive threshold-based method, where a texture


descriptor is extracted from a co-occurrence matrix and used within a Niblack
thresholding-based method, for degraded document image binarization. The
proposed method is tested objectively and subjectively.

To overcome the drawback of using only simple pixel-neighborhood information, the proposed
method computes an adaptive threshold using a descriptor based on a co-occurrence matrix. The
method is tested objectively using degraded documents from the DIBCO datasets and subjectively
using a set of ancient degraded documents provided by a national library. The results are
satisfactory and promising, and present an improvement compared to classical well-known
methods.

First- and second-order statistics are among the most used texture descriptors in image
segmentation and particularly in document image segmentation [113-118]. The co-occurrence
matrix is one of the best-known and most widely used texture descriptors. The spatial gray-level
co-occurrence matrix (GLCM) is a second-order-statistics-based method used for generating
texture features. It makes it possible to build a new matrix from the whole image or from part of
it. The construction of the co-occurrence matrix takes into account the orientation and the spatial
distribution of the pixels.

The GLCM represents the joint frequencies of all pairwise combinations of gray-level
intensities within a distance 𝑑, expressed as a number of pixels, and along a direction θ, as
shown in Figure 3.1. In other words, the GLCM estimates image properties based on the joint
probability of the gray-level occurrences of two pixels separated by a given direction and distance,
defined by the displacement vector 𝑑⃗ = (∆𝑥, ∆𝑦) or by a distance 𝑑 and an angle θ, as seen in Figure 3.1.

Figure 3.1: Principle of the co-occurrence matrix: (a) A sample sliding window (size 5x5) showing the 8 directions with a distance 𝑑 = 2, (b) The sliding window within the image matrix.

Let a gray-level image be represented by a matrix 𝐼(𝑥, 𝑦) of dimension 𝑁x𝑀, as shown in Figure
3.1(b). Each component of the GLCM(𝑖, 𝑗) matrix, of dimension N_G x N_G (where N_G is the
number of gray levels of the image 𝐼(𝑥, 𝑦)), represents the frequency of occurrence of a pair of
pixel gray levels in a spatial relation defined by a distance 𝑑 and an angle θ; it is computed for
one direction of the neighborhood displacement vector 𝑑⃗ = (∆𝑥, ∆𝑦), or equivalently in terms of
a distance 𝑑 and an angle 𝜃, by the following equation:

GLCM(i, j) = \text{frequency of occurrences of } I(x, y) = i \text{ and } I(x + \Delta x, y + \Delta y) = j, \text{ for a given distance } d \text{ and angle } \theta    (3.1)

An example of a GLCM computation is depicted in Figure 3.2: given a gray-level image
represented by a matrix 𝐼(𝑥, 𝑦) in which each pixel is coded on 𝑛 = 2 bits (i.e. the pixel gray
levels belong to [0, 3]), a GLCM matrix is computed according to Eq. (3.1) for a distance of one
pixel (𝑑 = 1) and a direction 𝜃 = 0°. As highlighted in the 𝐼 matrix, the pair of gray-level values
(1, 2) occurs twice.
This frequency is therefore written in the GLCM(𝑖, 𝑗) matrix, of dimension (N_G x N_G) =
(2^n x 2^n) = (4x4), at the coordinates 𝑖 = 1 and 𝑗 = 2, which are the respective gray-level
values of the pair. Figure 3.3 illustrates an example of a co-occurrence matrix image.
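A minimal NumPy sketch of the construction in Eq. (3.1) for a single displacement vector (dx, dy), i.e. one distance/angle pair (the function name and the 8-bit assumption are illustrative; the normalization by the maximum value follows the convention used later in this chapter):

```python
import numpy as np

def glcm(gray: np.ndarray, dx: int, dy: int, levels: int = 256) -> np.ndarray:
    """Co-occurrence matrix: counts how often a pixel with gray level i has a
    neighbour with gray level j at offset (dx, dy). Assumes 8-bit gray levels."""
    g = gray.astype(np.intp)
    h, w = g.shape
    r0, r1 = max(0, -dy), h - max(0, dy)          # valid source rows
    c0, c1 = max(0, -dx), w - max(0, dx)          # valid source columns
    src = g[r0:r1, c0:c1].ravel()
    dst = g[r0 + dy:r1 + dy, c0 + dx:c1 + dx].ravel()
    mat = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(mat, (src, dst), 1.0)               # accumulate pair frequencies
    return mat / mat.max() if mat.max() > 0 else mat   # normalize by the maximum value
```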
We note that the co-occurrence matrix alone does not provide directly usable information to
describe the image; textural information must be extracted from it. In fact, from the matrix thus
obtained, a certain number of characteristics or attributes must be computed; there are fourteen
in all according to Haralick [119]. We compute only a few of the most important ones.

Figure 3.2: Sample of co-occurrence processing.

Figure 3.3: Co-occurrence matrix image of a degraded document image: (a) Original image, (b) Co-occurrence matrix image.
From the co-occurrence matrix, we compute some relevant Haralick parameters [119]; some of
these textural features are given by the following equations.

Contrast(GLCM) = \sum_{i=0}^{N_G - 1} \sum_{j=0}^{N_G - 1} (i - j)^2 \, GLCM(i, j)    (3.2)

mean(GLCM) = \sum_{i=0}^{N_G - 1} \sum_{j=0}^{N_G - 1} GLCM(i, j)    (3.3)

energy(GLCM) = \sum_{i=0}^{N_G - 1} \sum_{j=0}^{N_G - 1} GLCM(i, j)^2    (3.4)

uniformity(GLCM) = \sum_{i=0}^{N_G - 1} GLCM(i, i)^2    (3.5)

homogeneity(GLCM) = \sum_{i=0}^{N_G - 1} \sum_{j=0}^{N_G - 1} \frac{GLCM(i, j)}{1 + |i - j|}    (3.6)
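Assuming a co-occurrence matrix computed as sketched earlier, these feature equations can be evaluated directly; the following short illustration (names assumed, NumPy required) returns the five attributes:

```python
import numpy as np

def haralick_features(glcm_mat: np.ndarray) -> dict:
    """Contrast, mean, energy, uniformity, and homogeneity of a co-occurrence
    matrix, following the equations above."""
    n = glcm_mat.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return {
        "contrast": float(np.sum((i - j) ** 2 * glcm_mat)),
        "mean": float(np.sum(glcm_mat)),
        "energy": float(np.sum(glcm_mat ** 2)),
        "uniformity": float(np.sum(np.diag(glcm_mat) ** 2)),
        "homogeneity": float(np.sum(glcm_mat / (1.0 + np.abs(i - j)))),
    }
```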

The aim of this work is a new binarization method for ancient degraded documents based on
texture, and particularly on the co-occurrence matrix, addressing the document binarization field
with a new threshold-based method. The method relies on texture features extracted from the
document image at the pixel level. The main motivation for using the co-occurrence matrix is its
high sensitivity to directional texture information and its discrimination power. The proposed
method is a pixel-wise adaptive thresholding method based on texture features; its overall scheme
is shown in Figure 3.4.
Let 𝑚(𝑥, 𝑦) and α be, respectively, the mean and a texture feature estimated for each pixel
within a neighborhood window 𝑊, computed respectively from the degraded document image
and from the extracted co-occurrence matrix. The binarization threshold is then deduced using
one of the classical well-known methods, namely Niblack's.

Figure 3.4: Overall scheme of the proposed method: the input degraded image I(x, y) is converted to grayscale; the co-occurrence matrix GLCM(i, j) is computed; the local mean m(x, y) and a texture feature are combined into the threshold T(x, y) = f(m, α); a pixel-wise Niblack-style thresholding then assigns each pixel to the foreground or the background, producing the binarized document image B(x, y).

In summary, the proposed method is performed according to the following steps as depicted
in Figure 3.4. Main details will be given in the following subsections.

3.3.1 Design of the co-occurrence matrix


After a pre-processing stage which consists of converting the color images into grayscale
images, the features are extracted from the co-occurrence matrix according to the following:
1. Select the co-occurrence parameters such as distance 𝒅 and direction 𝜽.
2. Generate the co-occurrence matrix for each document image.
3. Normalize the directional co-occurrence matrix (dividing by the max value).
3.3.2 Binarization module


The binarization module is designed according to the following steps:
1. Compute the mean 𝑚(𝑥, 𝑦) of the gray-level original image 𝐼(𝑥, 𝑦) within a local window 𝑊 centered at (𝑥, 𝑦).
2. Set and extract 𝛼, the Haralick feature, from the co-occurrence matrix GLCM(𝑖, 𝑗) within a local window 𝑊 centered at (𝑖, 𝑗).
3. Perform an adaptive binarization by using the extracted features 𝛼 and 𝑚 within the well-known thresholding-based method, namely Niblack's, according to Eq. (3.6).

𝑇(𝑖, 𝑗) = 𝑚(𝑖, 𝑗) + 𝑘 x 𝛼 (3.6)

where 𝑇(𝑖, 𝑗) is the threshold, 𝛼 is a parameter based on a Haralick descriptor, 𝑚 is the local mean
of the image pixels within a centered window 𝑊, and 𝑘 is a constant parameter defined
experimentally. The details are explained later.
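To fix ideas, here is a heavily simplified and deliberately naive sketch of how a threshold of the form T = m + k·α could be applied pixel by pixel, under the assumption that α is the square root of the co-occurrence contrast of the local window (as chosen experimentally later in this chapter); it is only one possible reading of the scheme above, not the exact implementation used in this work:

```python
import numpy as np

def local_contrast(win: np.ndarray, dx: int = 1, dy: int = 0, levels: int = 256) -> float:
    """Co-occurrence contrast of a small window for one displacement (dx, dy)."""
    g = win.astype(np.intp)
    h, w = g.shape
    r0, r1 = max(0, -dy), h - max(0, dy)
    c0, c1 = max(0, -dx), w - max(0, dx)
    src = g[r0:r1, c0:c1].ravel()
    dst = g[r0 + dy:r1 + dy, c0 + dx:c1 + dx].ravel()
    mat = np.zeros((levels, levels))
    np.add.at(mat, (src, dst), 1.0)
    if mat.max() > 0:
        mat /= mat.max()                                   # normalized co-occurrence matrix
    i, j = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")
    return float(np.sum((i - j) ** 2 * mat))

def texture_binarize(gray: np.ndarray, w: int = 41, k: float = -1.0) -> np.ndarray:
    """Pixel-wise thresholding with T = m + k * alpha, alpha = sqrt(contrast);
    naive double loop kept for clarity, far too slow for real use."""
    half = w // 2
    padded = np.pad(gray, half, mode="edge")
    out = np.zeros(gray.shape, dtype=np.uint8)
    for y in range(gray.shape[0]):
        for x in range(gray.shape[1]):
            win = padded[y:y + w, x:x + w]
            T = win.mean() + k * np.sqrt(local_contrast(win))
            out[y, x] = 1 if gray[y, x] > T else 0
    return out
```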

3.4.1 Experimental setup


As explained above, the proposed binarization method requires setting some parameters
related to the co-occurrence matrix and to the adaptive thresholding-based binarization method,
namely Niblack's method. The co-occurrence matrix needs two parameters to be defined, the
distance 𝑑 and the orientation 𝜃, while the thresholding method requires defining the size of the
sliding window 𝑊 and the value of the constant 𝑘. To set the optimal parameters, the proposed
method requires selecting a set of representative document images and their corresponding
ground-truth images. Then, for an objective evaluation, the experimental protocol is conducted
in three steps:

• Selection of representative degraded images and the corresponding ground truths,


• Estimation of the binarization threshold on the selected degraded images,
• Experimental evaluation.

The set of representative degraded images is selected from the DIBCO-2009, DIBCO-2010,
DIBCO-2011 datasets.

Figure 3.5 depicts the steps for finding the best parameters of the thresholding-based method.
The F-Measure (FM), evaluated between the binarized image and the corresponding ground
truth, is used as the metric.

Figure 3.5: Overall steps of the parameter optimization for the threshold-based binarization method: a set of representative degraded images is converted to gray level (pre-processing), the co-occurrence matrix and Haralick attributes are computed for the selected parameters, the binarization is performed and compared with the ground-truth image through the F-Measure (FM), and the parameters giving the best FM are retained as optimal.

a) Set-up and selection of the Haralick feature


To select the best Haralick feature, we use two kinds of images: five images representing the
background and five others representing the foreground (text); a sample of these background
and foreground images is depicted in Figure 3.6. For each of the two kinds of images, we
compute the respective Haralick attributes from the co-occurrence matrices,
as seen in Table 3.1, for θ = 0°, 45°, 90°, and 135° and 𝑑 = 1; the best attribute is then chosen.
According to the results, the contrast attribute is the most discriminative (it shows the greatest
difference between background and foreground compared with the other Haralick descriptors).
Therefore, we consider only this attribute and define experimentally 𝛼 = √contrast(GLCM) in
Eq. (3.6), which gives better results.

Figure 3.6: Samples of background and foreground document images: (a) Background, (b) Foreground.

Table 3.1: Haralick attribute values for background and foreground images.
Co-occurrence direction   Class         Contrast   Mean   Uniformity   Energy
θ=0 Background 0.0002 0.71 0.485 0.45
Foreground 0.0104 0.97 0.075 0.04
θ=45° Background 0.0001 0.51 0.119 0.49
Foreground 0.0060 0.97 0.025 0.04
θ=90° Background 0.0001 0.71 0.485 0.25
Foreground 0.0100 0.97 0.005 0.07
θ=135° Background 0.0001 0.81 0.650 0.29
Foreground 0.0096 0.90 0.060 0.04

b) Set up the size of the window and the k parameter


According to Figure 3.5, given a fixed initial value of 𝑘, we vary the size of the window and
perform a binarization; the F-Measure is then computed between the binarized image and the
corresponding ground-truth image to select the best size, as seen in Figure 3.7, considering that
the higher the FM value, the better the window size. Once the window size has been selected,
the same procedure is used to define the best value of 𝑘, as seen in Figure 3.8. This experiment
leads to a window size of 𝑊 = 41x41 and 𝑘 = −1.
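This two-stage tuning can be written compactly as a coordinate search; the sketch below assumes that a binarization function and an F-Measure function are available (their names and the candidate value lists are placeholders), and that the fixed initial value of k is taken as the first candidate:

```python
import numpy as np

def tune_window_and_k(images, ground_truths, window_sizes, k_values, binarize, f_measure):
    """Pick the window size maximizing the mean F-Measure for a fixed k,
    then pick k with that window fixed (as in Figures 3.7 and 3.8)."""
    def mean_fm(w, k):
        return np.mean([f_measure(binarize(img, w, k), gt)
                        for img, gt in zip(images, ground_truths)])
    k0 = k_values[0]                                   # fixed initial value of k
    best_w = max(window_sizes, key=lambda w: mean_fm(w, k0))
    best_k = max(k_values, key=lambda k: mean_fm(best_w, k))
    return best_w, best_k
```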

Figure 3.7: F-Measure as a function of the sliding window size WxW.

Figure 3.8: F-Measure as a function of the k parameter.

3.4.2 Experimental evaluation


The proposed method has been tested for objective and subjective evaluation. The former is
based on the degraded document images of the dataset DIBCO-2009, DIBCO-2010, DIBCO-
2011, and the latter is based on a set of ancient degraded documents provided by the Algiers
National Library (BNA).


Table 3.2 shows the experimental results of the most well-known classical local methods. The evaluation is performed on the DIBCO dataset organized by type of degradation according to [111], and a cumulative rank is computed for each degradation type. Two distinct evaluation measures, FM and PSNR, are computed; for each method and each measure, a rank R(i, j) is assigned, where i denotes the method and j the measure, and the final ranking Rank(i) is obtained by sorting the sums of the per-measure ranks. From Table 3.2, we can notice that the best results for the various types of degradation, such as stain, ink bleed-through, and non-uniform
background, are obtained by Nick's method.
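The cumulative ranking can be reproduced with a few lines: rank the methods separately for each measure (best score first), sum the per-measure ranks, and sort the totals. A small illustrative sketch follows; the dictionary layout is an assumption.

def cumulative_rank(scores, measures=("FM", "PSNR")):
    """scores: {method: {measure: value}}; returns [(method, final_rank), ...]."""
    totals = {m: 0 for m in scores}
    for measure in measures:
        # rank 1 is given to the highest score for this measure
        ordered = sorted(scores, key=lambda m: scores[m][measure], reverse=True)
        for rank, method in enumerate(ordered, start=1):
            totals[method] += rank
    final = sorted(scores, key=lambda m: totals[m])
    return [(method, rank) for rank, method in enumerate(final, start=1)]

# e.g. cumulative_rank({"Nick": {"FM": 81.98, "PSNR": 14.49},
#                       "Sauvola": {"FM": 81.60, "PSNR": 13.60}})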
The proposed method is implemented, as mentioned, using Eq.(3.6), and all its parameters are set up experimentally. The values of k and of the window size w that provide the best F-Measure are computed for each direction {0°, 30°, 45°, 135°}. Two of Haralick's attributes are selected: contrast and mean. The best results are obtained with the contrast attribute, so only the results for this attribute are detailed, as shown in Table 3.3.
The results are clearly improved and promising. The proposed method based on the contrast attribute with angle θ = 135° outperforms Nick's method for all degradation types except stain degradation, for which some weaknesses remain and our method is ranked 2nd. For the subjective evaluation, the samples of processed document images shown in Figure 3.9, Figure 3.10, Figure 3.11, Figure 3.12, and Figure 3.13 confirm the quantitative results.

Table 3.2: Evaluation results of well-known methods for binarization by type of degradations.
Degradation Method FM PSNR Rank
Bernsen 64.14 9.64 4
Niblack 58.13 7.83 5
Stain Sauvola 81.60 13.60 2
Wolf 80.96 12.49 3
Nick 81.98 14.49 1
Bernsen 62.30 10.17 4
Niblack 54.89 7.86 5
Ink-bleed through Sauvola 84.40 15.14 2
Wolf 83.20 15.08 3
Nick 84.41 15.16 1
Bernsen 43.26 7.91 4
Niblack 37.88 6.16 5
Non-uniform background Sauvola 82.73 16.31 3
Wolf 84.50 16.51 2
Nick 85.14 17.06 1


Table 3.3: Evaluation results of the proposed method and Nick's method, for binarization by type of degradation.

Degradation type          Method                    FM       PSNR     Rank
Stain                     Nick                      81.98    14.49    1
                          M1: Contrast, θ = 0°      75.71    13.38    3
                          M2: Contrast, θ = 30°     74.69    13.28    5
                          M3: Contrast, θ = 45°     75.61    13.35    4
                          M4: Contrast, θ = 135°    77.76    14.40    2
Ink-bleed through         Nick                      84.41    15.16    5
                          M1: Contrast, θ = 0°      91.702   16.07    2
                          M2: Contrast, θ = 30°     91.60    16.04    4
                          M3: Contrast, θ = 45°     91.65    16.05    3
                          M4: Contrast, θ = 135°    92.56    16.09    1
Non-uniform background    Nick                      85.14    17.06    4
                          M1: Contrast, θ = 0°      91.60    17.10    2
                          M2: Contrast, θ = 30°     91.30    15.065   5
                          M3: Contrast, θ = 45°     91.35    15.65    3
                          M4: Contrast, θ = 135°    91.70    17.40    1

(a)
(b)
Figure 3.9: Sample for a subjective evaluation on degraded document images from DIBCO-2009 datasets with a distance = 1, window size = 41 and k = -1: (a) Haralick's contrast feature with angle = 0° and (b) Haralick's contrast feature with angle = 135°.


(a) (b)
Figure 3.10: Sample for a subjective evaluation on degraded document images from DIBCO-2011 datasets with a distance = 1, window size = 41 and k = -1: (a) Haralick's contrast feature with angle = 0° and (b) Haralick's contrast feature with angle = 135°.

(a) (b)

Figure 3.11: Sample for a subjective evaluation on degraded document images from DIBCO-2009 datasets with a distance = 1, window size = 41 and k = -1: (a) Haralick's mean feature with angle = 0° and (b) Haralick's mean feature with angle = 135°.

(a) (b)
Figure 3.12: Sample for a subjective evaluation on degraded document images from DIBCO-2011 datasets with a distance = 1, window size = 41 and k = -1: (a) Haralick's mean feature with angle = 0° and (b) Haralick's mean feature with angle = 135°.


(a) (b)

(c)
Figure 3.13: Sample for a subjective evaluation on degraded document images with angle = 135°, distance =
1, window size = 41x41 and k =-1 : (a) original image, (b) binarized image using Nick, (c) Proposed Method.

We have presented a new texture-based thresholding method. It is inspired by Niblack's method and enhanced by using a robust texture descriptor based on the co-occurrence matrix. The co-occurrence matrix is used as a texture representation to compute some of Haralick's attributes, such as contrast and mean, for a distance-vector module equal to one and four directions {0°, 30°, 45°, 135°}. The best parameters are defined experimentally: the sliding window size is 41×41, the best angle is 135°, and the method based on Haralick's contrast attribute yields the best results. The results are promising and satisfactory. Nevertheless, we notice some weaknesses for the stained document category. In future work, we will focus further on this kind of degradation and investigate more robust texture attributes.

Abstract

This chapter aims to present a new binarization method for degraded documents,
based on Local Binary Pattern (LBP) as a texture measure. The mean and variance
of pixels are computed respectively from the original document image and the LBP
image. Then, these features are used within a threshold-based method to perform a
binarization.

Because of the discriminative power, the computational simplicity, and the popularity of the
Local Binary Pattern (LBP) in the field of image segmentation and identification, we intend to
investigate it as a texture measure to improve the discrimination of the foreground from the
background to obtain accurate results in the field of ancient degraded document image
binarization. The LBP space is used for estimating the features by computing the variance. The
proposed method consists of computing a new adaptive thresholding method based on LBP.
More precisely, we use the LBP within Sauvola’s method for estimating the binarization
threshold.

LBP is a particular form of the texture spectrum proposed in [120] [121]. The conventional LBP operator [96] [122] [123] is computed at each pixel location by considering the values q_p of the P pixels lying on a circular neighborhood of radius r around the central pixel value q_c. It is defined as follows:

LBP(P, r) = Σ_{p=0}^{P−1} s(q_p − q_c) · 2^p        (4.1)

where P is the number of pixels in the neighborhood, r is the radius, and s(x) = 1 if x > 0 and 0 otherwise.
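A direct transcription of Eq.(4.1) for the standard 8-neighbour, radius-1 configuration is sketched below; image borders simply wrap around for brevity, which is an implementation assumption rather than the thesis choice.

import numpy as np

def lbp_8_1(image):
    """Basic LBP code of Eq.(4.1) for P = 8 and r = 1, computed for every pixel."""
    img = np.asarray(image, dtype=np.int32)
    code = np.zeros_like(img)
    # the eight neighbours visited in a fixed circular order around the centre pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for p, (dy, dx) in enumerate(offsets):
        neighbour = np.roll(np.roll(img, -dy, axis=0), -dx, axis=1)   # q_p at offset (dy, dx)
        code += ((neighbour - img) > 0).astype(np.int32) << p          # s(q_p - q_c) * 2^p
    return code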
Figure 4.1 describes visually the simple concept of the LBP, and Figure 4.2 shows an example of the LBP image of a degraded document from the DIBCO datasets; it should be noted that the LBP image highlights the contours and the different textures of the document image.

Figure 4.1: Processing of the LBP code.

(a)

(b)

Figure 4.2: LBP image of degraded document: (a) Degraded document from Dibco-2012 dataset, (b) LBP
image.

Usually, the LBP histogram, used jointly with a simple local contrast measure, provides the best performance in texture segmentation; moreover, the grayscale variance of the local neighborhood can be used as a complementary measure [123]. It is also reported that many measures derived from these binary numbers are used to describe the texture of the image, and many other variants exist in the literature [96]. Based on this operator, approaches have been developed for texture image segmentation, especially in the field of color texture [124] [125]. However, to the best of our knowledge, no work has yet been reported in the field of binarization, and especially in degraded document binarization.

The proposed method consists of a new adaptive thresholding approach based on texture, and particularly on LBP, to binarize ancient degraded documents. The motivation for using the LBP is its discriminative power, as reported in the literature. From the LBP image, we extract, for each pixel, the variance within its neighborhood as texture information, which is combined with the mean of this pixel within its neighborhood computed from the original image, as depicted by the flowchart of Figure 4.3.
[Flowchart: input degraded image I(x, y) → grayscale conversion → LBP image; the local mean m(x, y) is computed from the original image and the variance σ (texture feature) from the LBP image; the threshold T(x, y) = f(m, σ) is applied pixel-wise (test I(x, y) > T(x, y)) to separate the background from the foreground, yielding the binarized document B(x, y).]

Figure 4.3: The overall scheme of the proposed method.


To select the best thresholding-based method, we performed a series of tests on a set of 8 document images (2 images per degradation type from the DIBCO datasets), according to the flowchart shown in Figure 4.4. The results lead to Sauvola's method, which performs the best discrimination between the foreground (text) and the background.

[Flowchart: set of representative degraded images with ground truth → set the default window size of the thresholding-based method (W = 19) → select one of the thresholding methods (Niblack, Sauvola, Wolf) → binarize using LBP with the selected method → compute the F-Measure (FM) against the ground-truth image → iterate over the methods and keep the one giving the best FM.]
Figure 4.4: Selecting the best thresholding-based method.

4.3.1 Design of the LBP-based texture


Following the overall scheme of the proposed method, a variance is extracted from the LBP image of the degraded document to be used as a texture feature. The motivation for using the variance of the LBP image is that it shows more contours than the variance computed directly from the original degraded document image, as shown in Figure 4.5.
The designed method is based on two variants of LBP, namely the basic LBP and a modified LBP denoted LBP-C. The latter is computed by combining the local contrast with the LBP image to overcome the binarization drawback caused by poor-contrast degradation.


Thus, normalization is done by dividing each element of the LBP image by the local contrast
denoted C and defined as follows:

LBP_C = LBP / C        (4.2)

where C = C_p/N_p − C_n/N_n, with C_p the sum of the pixel values ≥ q_c, N_p the number of pixels ≥ q_c, C_n the sum of the pixel values ≤ q_c, and N_n the number of pixels ≤ q_c.
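One possible reading of this normalization, written for a single pixel's neighbourhood, is sketched below; the helper name and the handling of flat neighbourhoods are assumptions.

import numpy as np

def local_contrast(window, centre):
    """Denominator C of Eq.(4.2): mean of the values >= centre minus mean of the values <= centre."""
    vals = np.asarray(window, dtype=np.float64)
    above, below = vals[vals >= centre], vals[vals <= centre]
    if above.size == 0 or below.size == 0 or above.mean() == below.mean():
        return 1.0                      # flat neighbourhood: avoid dividing by zero
    return above.mean() - below.mean()

# LBP_C value for the pixel: lbp_code / local_contrast(window_values, centre_value)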


After a pre-processing stage which consists of converting the color images into grayscale
images, a variance is extracted from the LBP image according to the following steps:
1. Compute the LBP image from the original document image and normalize the LBP image.
2. Extract variance information for each pixel neighborhood within a centered sliding window.

(a)

(b)

(c)

(d)

Figure 4.5: Sample of LBP and original image variances: (a) original image, (b) original image variance, (c)
LBP image, (d) LBP variance.

4.3.2 Binarization module


The binarization module is designed according to the following steps and implemented as
shown in Figure 4.6.
• Compute the mean m(x, y) within a local window of size W centered at the (x, y) coordinates of the gray-level original image I(x, y).
• Extract for each pixel the variance feature σ, computed respectively from the LBP(x, y) image and from the LBP_C(x, y) image, within a local window of size W centered at the (x, y) coordinates.
• Perform an adaptive binarization by using the extracted features σ and m within Sauvola's thresholding-based method according to Eq.(2.3), yielding the variants namely SLBP and SLBP-C; a sketch is given after this list.
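A compact sketch of the SLBP variant follows: the local mean is taken from the original image, the local standard deviation from the LBP image, and both are plugged into Sauvola's rule T = m·(1 + k·(σ/R − 1)) of Eq.(2.3). The parameter defaults and the convention that pixels above the threshold are background (dark text assumed) are illustrative assumptions.

import numpy as np
from scipy.ndimage import uniform_filter

def slbp_binarize(gray, lbp_image, window=41, k=0.2, R=128.0):
    """Sauvola-style threshold fed with the local std of the LBP image (SLBP)."""
    gray = np.asarray(gray, dtype=np.float64)
    lbp = np.asarray(lbp_image, dtype=np.float64)
    m = uniform_filter(gray, size=window)                     # local mean of the original image
    lbp_mean = uniform_filter(lbp, size=window)
    lbp_sq = uniform_filter(lbp * lbp, size=window)
    sigma = np.sqrt(np.maximum(lbp_sq - lbp_mean ** 2, 0.0))  # local std of the LBP image
    threshold = m * (1.0 + k * (sigma / R - 1.0))             # Sauvola's rule, Eq.(2.3)
    return gray > threshold                                   # True = background, False = text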

[Flowchart: from the input image I(x, y), the LBP and LBP-C images are computed; the local mean m(x, y) is taken from I(x, y) and the local variance σ(x, y) from the LBP (or LBP-C) image; Sauvola's threshold combined with LBP gives SLBP, and combined with LBP-C gives SLBP-C, each producing a binarized document image.]
Figure 4.6: Flowchart for estimating mean and variance from original and LBP images.

To perform a binarization that retains the better of the two variants, we propose to use Sauvola's thresholding-based method to generate a binarized image that serves as a reference image against which SLBP and SLBP-C are compared using the F-Measure. The implementation of this method, namely SLBP-SLBPC, is depicted by the flowchart shown in Figure 4.7.

[Flowchart: the input degraded document image I(x, y) is binarized with both SLBP and SLBP-C (Sauvola's thresholding fed with m(x, y) and σ(x, y)); the F-Measure of each result is computed against the reference, and the binarized document image with the best FM is retained (SLBP-SLBPC).]
Figure 4.7: Binarization method taking into account the advantages of both methods SLBP and SLBP-C.

4.4.1 Experimental setup


The proposed binarization method requires setting the parameters related to the LBP operator and to the adaptive thresholding-based binarization method, namely Sauvola's method. All the related parameters are set experimentally to their optimal values; the LBP parameters are defined experimentally as (P = 8, r = 1).
To set up the optimal parameters related to the threshold, i.e. the size of the sliding window W and the k parameter, the proposed method is conducted in three steps as shown in Figure 4.8. First, we set the window size to the best value reported in the literature and find the best k value by varying k according to Figure 4.8; once the optimal value of k is found, the steps of the flowchart are repeated to find the optimal window size.
• Selection of representative degraded images and the corresponding ground truths from
DIBCO-2009, DIBCO-2010, and DIBCO-2011.
• Estimation of the binarization threshold on the selected degraded images.


• Experimental evaluation using FM measure between the binarized image and the
corresponding ground truth.

[Flowchart: set of representative degraded images with ground truth → set the thresholding parameters (first W = 19 and vary k to k_opt; then fix k = k_opt and vary W to W_opt) → binarization → F-Measure (FM) against the ground truth → iterate over k or W until the best FM gives the optimal values k_opt and W_opt.]
Figure 4.8: Overall steps to set optimal thresholding-based method parameters.

According to Figure 4.8, given an initial window size W = 19×19 (according to the literature results related to Sauvola's thresholding-based method), we vary k until the optimal value k_opt is reached, by performing a binarization and then computing the F-Measure between the binarized image and the corresponding ground truth image; the higher the FM value, the better the k parameter. Once the k parameter has been selected, the same procedure is used to define the optimal size of the sliding window W_opt, this time by setting k to k_opt and varying W until W_opt is reached. This experiment leads to W = 41×41 and k = 0.2.
Figure 4.9 shows an example on the document image H02 from DIBCO-2009, where the sliding window size is initialized at W = 19×19 and k varies from 0.1 to 0.6 with a step of 0.1 in order to find k_opt, while Figure 4.10 shows an example on the same document image where k is fixed at k_opt = 0.2 and W varies between 3 and 51 in order to find W_opt.
Figure 4.9: F-Measure as a function of the k parameter.

Figure 4.10: F-Measure as a function of the sliding window size.

4.4.2 Experimental evaluation


The proposed method is tested for objective and subjective evaluation, using multiple metrics such as the F-Measure, the Peak Signal-to-Noise Ratio (PSNR), and the Distance Reciprocal Distortion (DRD) [54]. The datasets from DIBCO-2009, DIBCO-2011, DIBCO-2012, and DIBCO-2013 are used; they are composed of documents with different types of degradations and their corresponding ground truth document images. The results of the well-known classical threshold-based methods, such as Sauvola's, Niblack's, and Wolf's methods, are compared against Sauvola-LBP and Sauvola-LBP-Contrast, denoted respectively S-LBP and S-LBP-C. Results are reported in Table 4.1 according, respectively, to the DIBCO datasets organized by year of contest

submission and to the DIBCO datasets reorganized by degradation type. We notice in Table 4.1 that the combination of the LBP-Contrast, namely LBP-C, with Sauvola's binarization method outperforms all the other methods in terms of FM, PSNR, and DRD on the DIBCO-2009, DIBCO-2012, and DIBCO-2013 datasets, while the results are comparable with Wolf's method on DIBCO-2011. The other comparison, conducted on the DIBCO dataset organized by degradation type as seen in Table 4.2, shows that the S-LBP-C method outperforms all the others for stain, non-uniform background, and ink degradations. It is also worth noting that, for ink degradation, LBP-C improves on the others by a large margin. On the contrary, it can be observed that the LBP method outperforms LBP-C and the others for ink-bleed-through degradation.
Two samples of subjective comparisons are carried out on the DIBCO datasets, where the pertinence of these results can be seen visually. The first, between the classical well-known thresholding-based methods, is depicted in Figure 4.11, where the outperformance of SLBP-C over the classical methods is visible; the second, between the SLBP-C and SLBP methods, is carried out on the DIBCO datasets by type of degradation, as seen in Figure 4.12, Figure 4.13, Figure 4.14, and Figure 4.15.
Figure 4.12 depicts the performance of SLBP-C over SLBP for documents with stain degradation, shown in Figure 4.12(a-b): the corresponding images binarized with the SLBP method, Figure 4.12(c-d), are still degraded, while in those binarized with SLBP-C, Figure 4.12(e-f), the stain has almost disappeared and the binarization result is better. The same observation holds for documents with ink fading degradation, Figure 4.13(a-b), where the corresponding images binarized with the SLBP method, Figure 4.13(c-d), show the foreground almost erased; on the other hand, the corresponding binarized images obtained with the SLBP-C method, Figure 4.13(e-f), show good results. For the non-uniform background degradation, the SLBP method produces noisy binarized images, Figure 4.14(c-d), whereas the results of the SLBP-C method are better, as seen in Figure 4.14(e-f).
On the contrary, the results are better for SLBP than for SLBP-C for ink-bleed-through degradation, as seen in Figure 4.15(c-d) compared with Figure 4.15(e-f).
Quantitatively, as shown in Table 4.3, results are reported for selected images from the DIBCO datasets to show the improvement obtained when combining Sauvola's method with the LBP-C. This reflects the interest of using the SLBP and its variant SLBP-C for estimating the local texture used in the binarization threshold. Furthermore, it can be observed that there is mostly a large gap in terms of all measures compared with Sauvola's method; for instance, for the document image H01, the FM increases from 48.90 with Sauvola's method to 76.00 with the SLBP-C-based method.

H08-Dibco-2012 P02-Dibco-2009

Niblack

Sauvola

Wolf

Nick

SLBP-C
Figure 4.11: Subjective comparison between classical thresholding-based methods against SLBP-C method.

65
CHAPTER 4: LBP-BASED DEGRADED DOCUMENT IMAGE BINARIZATION

(a) PR04-DIBCO-2013 (b) H04-DIBCO-2009

(c) (d)

(e) (f)
Figure 4.12: Subjective evaluation on a Sample of binarized images, respectively: (a-b) Original image
with stain degradation, (c-d) SLBP method; (e-f) SLBP-C method.

(a) H03-DIBCO-2010 (b) HW01-DIBCO-2011

(c) (d)

(e) (f)
Figure 4.13: Subjective evaluation on a Sample of binarized images, respectively: (a-b) Original image
with Non-uniform background degradation, (c-d) SLBP method; (e-f) SLBP-C method.


(a) H02-DIBCO-2010 (b) H03- DIBCO -2010

(c) (d)

(e) (f)
Figure 4.14: Subjective evaluation on a Sample of binarized images, respectively: (a-b) Original image
with Ink fading degradation, (c-d) SLBP method; (e-f) SLBP-C method.

(a) HW7- DIBCO-2011 (b)H06- DIBCO -2012

(c) (d)

(e) (f)
Figure 4.15: Subjective evaluation on a sample of binarized images, respectively: (a-b) Original image with Ink-bleed through degradation, (c-d) SLBP method; (e-f) SLBP-C method.


Table 4.1: Performance evaluation of the proposed method against the classical threshold-based methods on the DIBCO datasets.

Datasets      Method     FM      PSNR    DRD
DIBCO 2009    Sauvola    77.55   15.65   7.23
              Niblack    40.97   6.06    106.02
              Wolf       81.59   16.38   3.86
              S-LBP      79.12   15.67   7.33
              S-LBP-C    85.53   16.40   3.12
DIBCO 2011    Sauvola    77.53   15.80   5.44
              Niblack    38.08   5.79    107.03
              Wolf       83.71   16.40   4.53
              S-LBP      76.63   15.13   6.37
              S-LBP-C    82.07   15.59   8.13
DIBCO 2012    Sauvola    60.69   15.44   10.31
              Niblack    31.35   5.79    109.37
              Wolf       71.05   16.11   8.32
              S-LBP      68.42   15.68   8.50
              S-LBP-C    80.78   17.06   5.90
DIBCO 2013    Sauvola    83.12   16.94   5.23
              Niblack    36.62   5.86    111.95
              Wolf       81.60   16.20   4.73
              S-LBP      79.76   16.08   10.18
              S-LBP-C    84.35   17.04   4.07

Table 4.2: Performance evaluation of the proposed method against the classical threshold-based methods according to the degradation type.

Degradation type          Method     FM      PSNR    DRD
Stain                     Sauvola    83.12   15.80   4.82
                          Niblack    37.58   05.91   96.40
                          Wolf       85.88   15.31   4.26
                          S-LBP      75.97   13.21   5.12
                          S-LBP-C    88.17   16.00   4.10
Ink-bleed through         Sauvola    82.26   15.75   4.12
                          Niblack    37.55   6.10    115.04
                          Wolf       82.20   15.78   4.42
                          S-LBP      88.53   17.96   3.80
                          S-LBP-C    85.81   16.86   4.10
Non-uniform background    Sauvola    74.82   14.54   6.51
                          Niblack    32.38   05.46   144.61
                          Wolf       74.26   15.73   4.69
                          S-LBP      78.09   14.89   4.69
                          S-LBP-C    78.85   15.80   4.50
Ink                       Sauvola    57.57   14.27   10.91
                          Niblack    32.60   06.11   91.55
                          Wolf       63.83   14.81   9.64
                          S-LBP      73.92   14.71   5.01
                          S-LBP-C    84.50   17.11   4.25


Table 4.3: Evaluation Results on Image Samples.


Image FM Sauvola FM SLBP-C PSNR Sauvola PSNR SLBP-C DRD Sauvola DRD SLBP-C
DIBCO 2009
H01 48.90 76.00 13.44 15.00 10.94 5.43
H05 56.69 76.86 16.33 17.51 10.56 5.97
P03 65.48 70.95 10.55 11.04 22.67 20.20
DIBCO 2011
HW8 77.69 88.89 17.60 20.13 4.61 2.48
PR7 81.87 85.04 21.16 21.84 5.39 4.52
PR8 75.67 81.22 12.66 13.54 5.74 4.65
DIBCO 2012
H07 50.60 75.89 13.55 15.78 8.45 4.81
H010 43.18 79.00 11.40 14.56 19.30 8.49
H011 7.94 68.19 10.85 13.76 22.93 11.11
H013 15.55 59.65 13.53 15.52 13.22 7.95
DIBCO 2013
HW01 61.04 79.37 16.58 18.70 8.77 5.07
PR05 88.92 92.14 14.31 15.58 4.55 3.46
PR03 86.61 90.47 17.62 18.91 4.88 3.64

The purpose of this chapter is to investigate the use of the LBP operator for estimating the variance as a texture measure to be used in a thresholding method for binarizing historical documents. Two LBP variants, the basic LBP and the modified LBP taking the contrast into account, combined with Sauvola's thresholding-based method, are implemented and compared.
In a general way, the experimental results conducted on the DIBCO datasets show that the LBP and its variants constitute an interesting approach to explore for improving the existing binarization methods. The results are better for SLBP than for SLBP-C for ink-bleed-through degradation; however, their combination outperforms the well-known thresholding-based methods.
The large gap observed in terms of all measures, and particularly FM, compared with Sauvola's method for particular document images should be taken into account in further work.

Abstract

In this chapter, a new threshold-based method exploiting texture information


features extracted from both the filtered image using the Gabor filter and the original
degraded document is developed. Firstly, a preprocessing stage using the Wiener
filter is performed on the degraded image for facilitating the binarization. Then, a
Gabor filter bank is weighted according to the dominant slant of the document’s
image script for estimating the binarization threshold. Finally, a post-processing
stage is applied based on the morphological operator for reducing some artifacts.
Exhaustive experiments are conducted on the standard DIBCO dataset series, reorganized according to the degradation type and the year of the contest. The obtained results are compared against various well-known threshold-based methods, as well as against state-of-the-art methods. Promising results and stability are observed for the proposed technique, specifically for ink bleed-through degradation and low-contrasted documents.

For the best discrimination between the foreground and background, a suitable framework
based on the Gabor filter bank is designed by considering the orientation of the foreground-
script and the type of degradation. As the proposed method is based on the joint use of the
Gabor filter combined with the classical local threshold-based method, an overview of the
Gabor filter technique is presented in the following sections.

The Gabor filter was proposed by Dennis Gabor and is considered one of the best methods
for texture analysis and processing [23]. A Gabor wavelet is defined as a complex sinusoidal
plane wave modulated by a Gaussian kernel function as follows.


H(x, y, f, θ) = (1 / (2πσρ)) · exp[−(1/2)(x′² / σ² + y′² / ρ²)] · e^(j2πf x′)        (5.1)

where x′ = x·cosθ + y·sinθ, y′ = −x·sinθ + y·cosθ, and (x, y) are the spatial coordinates. σ and ρ are the standard deviations along the x and y axes of the Gabor filter's Gaussian kernel; they describe the size of the Gaussian envelope and define the scale of the filter along the spatial axes. Additionally, f and θ are the central spatial frequency and the orientation of the filter, respectively.
The two-dimensional Gabor filtering is a linear operation performed by convolving the Gabor wavelet H(x, y) with the input image I(x, y) at frequency f and angle θ, as follows:

I_F(x, y, f, θ) = H(x, y, f, θ) ∗ I(x, y)        (5.2)

where ∗ denotes a two-dimensional convolution at coordinates x and y, while I_F(x, y, f, θ) is the filtered image at frequency f and angle θ. The design of the Gabor filter requires the setup of four parameters (σ, ρ, f, θ). However, (f, θ) are the main parameters that allow capturing features according to a specific frequency and orientation. Hence, different values of f and θ are required for capturing all possible features contained in the image. Therefore, a Gabor filter bank is designed to use several filters at different frequencies and orientations. Let N_f and N_θ be the numbers of central frequencies and orientations, respectively, and let f_i (i = 0, …, N_f − 1) and θ_j (j = 0, …, N_θ − 1) be a specific frequency and orientation; then the filtered image at f_i and θ_j takes the following form:

I_F(x, y, f_i, θ_j) = H(x, y, f_i, θ_j) ∗ I(x, y)        (5.3)

Since the filter is linear, the Gabor filter bank implementation for all possible orientations at a specific frequency can be written as:

I_F(x, y, f_i) = Σ_{j=0}^{N_θ−1} H(x, y, f_i, θ_j) ∗ I(x, y)        (5.4)


Let G(x, y, f_i) = Σ_{j=0}^{N_θ−1} H(x, y, f_i, θ_j) be the Gabor filter bank for all possible orientations defined for a specific frequency f_i. Then, the filtered image for all possible frequencies is obtained by summing the filtered images over all frequencies as follows:

I_F(x, y) = Σ_{i=0}^{N_f−1} G(x, y, f_i) ∗ I(x, y)        (5.5)

Finally, for a suitable implementation, the filtered image can be obtained with a single convolution of the image with the summed masks, which can be written as follows:

I_F(x, y) = [Σ_{i=0}^{N_f−1} Σ_{j=0}^{N_θ−1} H(x, y, f_i, θ_j)] ∗ I(x, y)        (5.6)

The filtered image thus contains features captured from the degraded image for all possible
frequencies and orientations. Figure 5.1 depicts the general structure of the Gabor filter
implementation.

[Diagram: the degraded image I(x, y) is filtered by the masks H_j^i for each frequency f_i and orientation θ_j; the masks are summed into G_i(x, y) per frequency and combined to produce the filtered image I_F(x, y).]
Figure 5.1: Simplified implementation of Gabor filter bank. For simplified notation, 𝐻(𝑥, 𝑦, 𝑓𝑖 , 𝜃𝑗 ) and
𝐺(𝑥, 𝑦, 𝑓𝑖 ) are denoted respectively 𝐻𝑗𝑖 (𝑥, 𝑦) and 𝐺𝑖 (𝑥, 𝑦), (𝑖 = 0, . . , 𝑁𝑓 − 1) and (𝑗 = 0, . . , 𝑁𝜃 − 1). 𝑁𝑓 and 𝑁𝜃
define the numbers of central frequencies and orientations, respectively.
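As an illustration of Eq.(5.1)–(5.6), the sketch below builds one complex Gabor mask per (f_i, θ_j), sums the masks, and performs a single convolution with the image; the magnitude of the complex response is kept. The mask size, the choice σ = ρ, and the use of the magnitude are assumptions made for the example, not the exact thesis implementation.

import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size, f, theta, sigma, rho):
    """Complex Gabor wavelet of Eq.(5.1) sampled on a (size x size) grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (xr ** 2 / sigma ** 2 + yr ** 2 / rho ** 2)) / (2 * np.pi * sigma * rho)
    return envelope * np.exp(1j * 2 * np.pi * f * xr)

def gabor_bank_filter(image, freqs, n_theta, size=11, sigma=3.0, rho=3.0):
    """Filtered image of Eq.(5.6): one convolution with the sum of all masks."""
    thetas = [j * np.pi / n_theta for j in range(n_theta)]
    bank = sum(gabor_kernel(size, f, t, sigma, rho) for f in freqs for t in thetas)
    response = fftconvolve(np.asarray(image, dtype=np.float64), bank, mode="same")
    return np.abs(response)

# e.g. I_F = gabor_bank_filter(gray, freqs=[0.140], n_theta=8)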


The aim of the proposed method is a new binarization approach based on a Gabor filter bank for ancient degraded documents: a new threshold-based method using a texture feature derived from Gabor filters and taking the type of degradation into account. The proposed method is inspired by the classical ones; the threshold is based not only on the neighborhood of each document image pixel but also on texture features extracted from the image filtered by the Gabor filter bank.
The main hypothesis for using the Gabor filter is that the foreground has a different orientation than the background or than any degradation. Indeed, the main motivation for using the Gabor filter is its high sensitivity to directional features. Therefore, it offers strong discrimination power for texture features such as the variance computed in the Gabor space, which is generally higher than the variance taken from the original document image, as shown in Figure 5.2. The overall scheme of the proposed method is illustrated in Figure 5.3. Let m_D(x, y) and σ_F(x, y) be the mean and the standard deviation estimated for each pixel within a neighborhood window, namely W_g, from the degraded and filtered document images, respectively. The binarization threshold is then deduced using one of the classical well-known methods, i.e. Niblack's, Sauvola's, or Wolf's method. For better clarity, the three variants of the proposed method are named Niblack-Gabor, Sauvola-Gabor, and Wolf-Gabor, respectively based on the Gabor filter bank combined with Niblack's, Sauvola's, and Wolf's methods.

(a)

(b) (c)

Figure 5.2: Estimation of the standard deviation from degraded and Gabor filtered image (a) Degraded image,
(b) Standard deviation estimated from the degraded image, (c) Standard deviation estimated from the Gabor
filtered image.


[Flowchart: degraded document image I(x, y) → pre-processing (grayscale conversion, Wiener filter) → Gabor filter bank weighted by the dominant angle obtained from the 2D Fourier transform → filtered document image I_F(x, y); the local mean m_D(x, y) from the original image and the standard deviation σ_F(x, y) from the filtered image give the threshold T(x, y), applied pixel-wise to separate the foreground from the background; a morphological post-processing yields the binarized document B(x, y).]
Figure 5.3: General scheme of the proposed method.

Historical documents often contain complex degradations mixed with the foreground (text) information. Hence, by exploiting the hypothesis that the oriented texture of the foreground differs from the background's texture, a better extraction of the foreground (text) can be achieved by computing the Gabor filter bank for all angles, weighted with respect to the dominant slant orientation of the foreground. To capture the maximum of information belonging to the foreground, the filter is oriented along the most dominant direction of the script slant of the document image, and the sum of the filtered images is weighted by the dominant angle of the script slant. The filtering of the degraded image in the main direction of the script (foreground) is performed using a weighting function, namely w_j, formally defined as follows:

w_j = exp(−α|θ_d − θ_j|)        (5.7)

where α is a constant defined by experiment and 𝜃𝑑 is the dominant angle of the foreground-
script slant of the document image, while 𝜃𝑗 are the angles used in each Gabor filter’s direction.
Furthermore, taking into account the linearity of the Gabor Filter, a new Gabor wavelet (a
weighted matrix mask) namely 𝐺𝑤 (𝑥, 𝑦, 𝑓𝑖 ) is computed, which is the sum of the weighted
Gabor wavelets corresponding to each direction, as follows:

G_w(x, y, f_i) = Σ_{j=0}^{N_θ−1} w_j H(x, y, f_i, θ_j)        (5.8)

The dominant angle 𝜃𝑑 is computed using the Fourier transform of the document image. The
2D Fourier transform is performed on the degraded image 𝐼(𝑥, 𝑦) to obtain an image in a
frequency domain namely 𝑇(𝑢, 𝑣) = ℱ(𝐼(𝑥, 𝑦)), where ℱ(. ) is the Fourier transform that
allows highlighting the foreground (text) contained in the degraded image. For finding the
dominant slant angle, the resulting Fourier transform can be written as:

T(u, v) = |T(u, v)| exp(jφ(u, v))        (5.9)

such that |T(u, v)| and φ(u, v) are the magnitude and the phase of the Fourier transform of the degraded image, respectively. The highest amplitudes of T(u, v) correspond to the foreground. Hence, the dominant slant angle θ_d of the script is deduced by searching for the locations where the magnitude |T(u, v)| takes its highest values. Figure 5.4 depicts an example of the 2D Fourier transform performed on a degraded image.
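One way to realize this estimation and the weighting of Eq.(5.7)–(5.8) is sketched below: the strongest non-DC peak of the centred 2D Fourier magnitude gives the dominant orientation, and each bank angle is weighted accordingly. The peak-picking rule and the value of α used here are illustrative assumptions.

import numpy as np

def dominant_angle(gray):
    """Dominant orientation (radians, in [0, pi)) from the peak of the 2D Fourier magnitude."""
    spectrum = np.fft.fftshift(np.abs(np.fft.fft2(np.asarray(gray, dtype=np.float64))))
    cy, cx = spectrum.shape[0] // 2, spectrum.shape[1] // 2
    spectrum[cy, cx] = 0.0                                  # discard the DC component
    v, u = np.unravel_index(int(np.argmax(spectrum)), spectrum.shape)
    return np.arctan2(v - cy, u - cx) % np.pi

def angle_weights(theta_d, thetas, alpha=0.002):
    """Weights w_j = exp(-alpha * |theta_d - theta_j|) of Eq.(5.7); alpha is a placeholder value."""
    return np.exp(-alpha * np.abs(theta_d - np.asarray(thetas, dtype=np.float64)))

# weighted mask of Eq.(5.8), reusing the gabor_kernel sketch above:
# w = angle_weights(dominant_angle(gray), thetas)
# G_w = sum(wj * gabor_kernel(11, f_i, tj, 3.0, 3.0) for wj, tj in zip(w, thetas))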


(a)

(b)
Figure 5.4: Fourier-transform of a degraded image: (a) Degraded image, (b) Fourier transform.

In summary, the proposed method is performed according to the following steps, with reference to Figure 5.1 and Figure 5.3.

5.3.1 Pre-processing stage


First, the color images are converted into grayscale images. The subsequent step is to
perform the Wiener filter for facilitating the estimation of the binarization threshold. Wiener
filter is used to smooth the degradation contained in the image, especially for non-uniform
background degradation. Figure 5.5 and Figure 5.6 show the effect of using the Wiener filter
for various degradation types.


5.3.2 Design of the Gabor filter bank


The Gabor filter bank is designed according to the following steps:
1. Select the filter parameters σ = ρ (in this case, the Gabor filter is isotropic).
2. Select a set of frequencies f_i (i = 0, …, N_f − 1) and angles θ_j (j = 0, …, N_θ − 1) for the Gabor filter bank.
3. Construct the mask H(x, y, f_i, θ_j) for each frequency and angle.
4. Weight each angle with respect to the dominant angle using Eq.(5.7).
5. Compute the weighted Gabor wavelets G_w(x, y, f_i) using Eq.(5.8).
6. Perform the convolution of the degraded document image with the weighted Gabor filter bank, following Eq.(5.5) with G replaced by G_w.

5.3.3 Binarization module


The Binarization module is designed according to the following steps:
1. Compute the mean 𝑚𝐷 (𝑥, 𝑦) within a local window 𝑊𝑔 centered in (𝑥, 𝑦) of the
original image 𝐼(𝑥, 𝑦).
2. Compute the standard deviation 𝜎𝐹 (𝑥, 𝑦) of each pixel’s neighborhood within
a local window 𝑊𝑔 from the filtered image 𝐼𝐹 (𝑥, 𝑦).
3. Deduce the thresholds using one of the binarization methods (Niblack, Sauvola,
Wolf), namely Niblack-Gabor, Sauvola-Gabor, and Wolf-Gabor.

5.3.4 Post-processing stage:


A post-processing stage is applied after the thresholding of the document image to reduce some artifacts. The post-processing is performed through the following steps:
1. Morphological operations, such as closing and opening, are performed to fill gaps and smooth the edges. A structuring element of radius one (covering the 8 surrounding neighbors) is used to avoid altering the binarized image.
2. The subsequent step is an area-opening operation that eliminates small connected components; the size of the smallest kept connected component is set to 60 pixels with 8-connectivity. A sketch of this stage is given after this list.
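A hedged sketch of this post-processing with standard morphology tools is given below (radius-1 disk, minimum component size of 60 pixels, 8-connectivity, matching the description above); the input is assumed to be a boolean mask where True marks text pixels, and scikit-image is assumed to be available.

from skimage.morphology import binary_closing, binary_opening, disk, remove_small_objects

def postprocess(text_mask, min_size=60):
    """Closing and opening with a radius-1 disk, then removal of small connected components."""
    selem = disk(1)                                  # 3x3 disk (8-connected neighbourhood)
    cleaned = binary_closing(text_mask, selem)       # fill small gaps inside strokes
    cleaned = binary_opening(cleaned, selem)         # smooth stroke edges
    return remove_small_objects(cleaned, min_size=min_size, connectivity=2)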
Figure 5.5 and Figure 5.6 show examples of the different steps for the four degradation types. As can be seen, the use of pre-processing via the Wiener filter and of post-processing via morphological operations enhances the binarization process for all degradation types.

(a)

(b)

(c)

(d)

Figure 5.5: Steps of the proposed method performed on Non-Uniform background (Left) and Stain (Right)
degradations: (a) Degraded image, (b) Wiener filtering (c) Binarized image (d) Morphological operator.


(a)

(b)

(c)

(d)
Figure 5.6: Steps of the proposed method performed on Ink bleed-through (Left) and Ink intensity variation
(Right) degradations: (a) Degraded image, (b) Wiener filtering (c) Binarized image (d) Morphological operator.

5.4.1 Experimental setup


The proposed binarization method requires setting some parameters related to the Gabor
filter and the selected standard binarization methods for estimating the threshold (Niblack,
Sauvola, or Wolf). The Gabor filter needs to define the mask size (𝜎, 𝜌), central frequencies,
and orientation angles. Conversely, the standard binarization methods require defining the size
of the sliding window 𝑊𝑔 , 𝑘, and 𝑅. To set up all these parameters, the proposed method

requires selecting a set of representative images and the corresponding ground truth images to
find the best ones. Hence, for an objective evaluation, the experimental protocol is conducted
in three steps:
• Selection of representative degraded images and the corresponding ground truths,
• Estimation of the binarization threshold on the selected degraded images,
• Experimental evaluation.
To set all the parameters to deduce the binarization threshold, a set of representative
degraded images is selected from the DIBCO-2009, DIBCO-2010, DIBCO-2011, DIBCO-
2012, and DIBCO-2013 datasets as shown in Figure 5.7.

H04 image (DIBCO-2009) PR04 image (DIBCO-2013)


(a)

HW06 image (DIBCO-2013) HW08 image (DIBCO-2013)


(b)

H02 image (DIBCO-2010) H01 image (DIBCO-2011)


(c)

H04 image (DIBCO-2010) H09 image (DIBCO-2010)


(d)
Figure 5.7: Two representative images per degradation type used to set up the parameters of the Gabor filter
banks: (a) Stain degradation, (b) Ink bleed-through, (c) Non-uniform background, and (d) Ink intensity variation.


Two representative images per degradation type (Stain and Ink bleed-through, Non-uniform
background, Ink intensity variation) are selected, with their corresponding ground truth as
depicted in Figure 5.8.

H04 image (DIBCO-2009) PR04 image (DIBCO-2013)


(a)

HW06 image (DIBCO-2013) HW08 image (DIBCO-2013)


(b)

HW02 image (DIBCO-2010) HW1 image (DIBCO-2011)


(c)

H04 image (DIBCO-2010) H09 image (DIBCO-2010)


(d)
Figure 5.8: The ground truth images corresponding to the selected images of Figure 5.7 used to find all the
optimal parameters


Figure 5.9, depicts the steps for finding the best parameters of the Gabor filter banks.

[Flowchart: set of representative degraded images with ground truth → pre-processing → adjustment of the Gabor parameters (f, θ) → Gabor filtering → binarization → post-processing → F-Measure (FM) between the binarized image and the ground-truth image → iterate until the best FM gives (f_opt, θ_opt).]

Figure 5.9: Flowchart for finding the optimal parameters of the Gabor filter.

Estimation of the binarization threshold

The estimation of the binarization threshold using the Gabor filter involves the setup and
tuning of certain parameters, which are the size of the sliding window 𝑊𝑔 , 𝑘, and 𝑅 as well as
the mask size (𝜎, 𝜌), central frequencies, and orientation angles. The most suitable values for
the size of the sliding window 𝑊𝑔 and parameters 𝑘, 𝑅 are respectively 41 × 41, 0.7, and 128.
The parameter α of the weighting function is defined by the experiment and set to -0.002. The
parameters 𝜎 and 𝜌 are set so that 𝜎 = 𝜌 to design an isotropic filter which is suitable for
highlighting efficiently the contours of the foreground. If 𝜎 < 𝜌 or 𝜎 > 𝜌, the filter becomes
less sensitive to contours. Figure 5.10, shows the effect of selecting the 𝜎 and 𝜌 parameters,


whereas Figure 5.11 shows the effect of selecting the mask size. The parameters σ and ρ are fixed to 3, which leads to a Gabor filter mask size of 11 × 11.

(a) (b)

(c) (d)
Figure 5.10: Effect of selecting 𝝈 and 𝝆 parameters: (a) Degraded image, (b) Gabor filtered image with 𝝈 <
𝝆, (c) Gabor filtered image 𝝈 = 𝝆, (d) Gabor filtered image with 𝝈 > 𝝆.

(a) (b)

(c) (d)
Figure 5.11: Effect of selecting the mask size of the Gabor filter: (a) Degraded image, (b) 7 × 7, (c) 11 × 11,
(d) 21 × 21.


To set up the central frequency, the frequency is varied for each degradation type from 0.08 to 0.6, for numbers of angles varying from 4 to 32; the optimal frequency is the one giving the highest FM value. Figure 5.12 depicts the FM versus the central frequency for several numbers of angles (4, 8, 16, and 32) per degradation type. As can be seen, the optimal central frequency corresponds to f_opt = 0.140 for the stain, ink bleed-through, and non-uniform background degradations for any number of angles (Figure 5.12(a), Figure 5.12(b), Figure 5.12(c)), while the best F-Measure for the ink intensity variation degradation is obtained for the frequency f_opt = 0.145 for any number of angles (Figure 5.12(d)). Once the optimal frequency is selected, the number of angles is set up following the same protocol: Figure 5.13 shows that the optimal number of angles is 8 for all degradation types as well as for the overall average, so the number of angles is set to 8. When considering all degradation types simultaneously, the average F-Measure over all degradations leads to the frequency f = 0.140 (Figure 5.12(e)); consequently, the optimal frequency of the Gabor filter bank is set to f_opt = 0.140. The adjustment of the different parameters for defining an appropriate threshold reveals an important finding when observing the curves depicted in Figure 5.12: the F-Measure versus central frequency curves have a specific shape for each type of degradation, which can be useful for adapting the parameters to each type of degradation.

Figure 5.12: F-Measure versus the central frequency for different angles (4, 8, 16, and 32) according to the
degradation type: (a) Stain, (b) Ink bleed-through, (c) Non-uniform background, (d) Ink intensity variation, (e)
Overall.


Figure 5.13: F-Measure for different numbers of angles according to the degradation type for frequency 𝑓𝑜𝑝𝑡 =
0.140: (a) Stain, (b) Ink bleed, (c) Non-uniform background, (d) Ink intensity variation, (e) Overall.

5.4.2 Experimental evaluation


After finding the optimal parameters of the Gabor filter on a selected subset of the degradation dataset, the binarization threshold is applied to the whole dataset to evaluate the robustness of the proposed method. Since the proposed method is closely related to the conventional thresholding methods, and since most other methods build upon them, the evaluation focuses primarily on the popular classical threshold-based methods, while a comparison against some state-of-the-art methods is also highlighted. Therefore, evaluations are performed on blind and unblind datasets, as well as with and without the weighting function in order to evaluate its influence. More precisely, all the DIBCO datasets (2009, 2010, 2011, 2012, and 2013) organized by year of the contest are used for the blind evaluation, while the datasets organized by degradation type are used for the unblind evaluation [111]. The proposed method is evaluated using three distinct measures: FM, PSNR, and DRD. For each method i, a cumulative Rank(i) is computed using these three evaluation metrics. More precisely, let Rank(i, j) be the rank of the i-th method using the j-th measure; the cumulative ranking value Rank(i) is computed over n measures
through the following equation:

Rank(i) = Σ_{j=1}^{n} Rank(i, j)        (5.10)

The final ranking, namely Rank, is computed by sorting the 𝑹𝒂𝒏𝒌(𝒊) values. Table 5.1,
Table 5.2, and Table 5.3 report all the measures computed from the binarized images and the
corresponding ground truth images. The blind and unblind evaluation of the proposed method
against the well-known threshold-based methods are shown, respectively, in Table 5.1 and
Table 5.3, while the blind evaluation against other methods of the state of the art is shown in
Table 5.2. For the first evaluation reported in Table 5.1 and Table 5.3, the first observation
highlights that the method based on the Gabor filter bank combined with Sauvola’s thresholding
outperforms all the other threshold-based methods when using all DIBCO datasets (2009-2010-
2011-2012-2013), and all the datasets organized by the degradation type. Moreover, the ranking
shows the stability of the Sauvola–Gabor method against others since it is ranked first in both
blind and unblind evaluations. In contrast, Sauvola’s method ranking changes in each
evaluation protocol. Furthermore, Niblack’s method is ranked last in both the blind and the
unblind evaluations. However, when combined with the Gabor filter, the ranking of the
Niblack–Gabor method changes significantly up to the second position, outperforming the
Wolf–Gabor method. Roughly speaking, the best results are obtained for ink bleed-through
degradation and ink degradation. For the second evaluation, comparing the Sauvola–Gabor method against some of the state-of-the-art methods, Table 5.2 reports the results obtained on the blind datasets organized by year of the contest. The proposed method shows interesting results: it is well ranked and almost more stable than the other state-of-the-art methods. It is worth noting that most of these other methods are not stable (they do not achieve the same scores on each dataset). These results lead us to confirm that a comparison using an unblind dataset is more suitable. For further work, it is worth noting the importance of investigating the shape of the FM-versus-frequency curve for modeling the degradation type in order to improve the results.

Table 5.1: Performance evaluation on blind DIBCO dataset.


Dataset Method FM PSNR DRD Rank
DIBCO 2009 Niblack [126] [62] 55.82 9.89 31.40 6
Sauvola [4, 55] [62] 85.41 16.39 6.30 3
Wolf [127] 81.59 16.38 1.46 4
Niblack-Gabor 83.91 16.40 4.60 2
Sauvola-Gabor 89.52 17.90 4.25 1
Wolf-Gabor 56.97 10.89 11.20 5
DIBCO 2010 Niblack [126] [62] 74.10 15.73 15.99 6
Sauvola [62] 75.30 15.81 2.80 2
Wolf [127] 62.75 15.25 1.95 4
Niblack-Gabor 68.45 16.13 8.63 3
Sauvola-Gabor 86.68 18.50 3.61 1
Wolf-Gabor 62.35 15.80 7.44 5
DIBCO 2011 Niblack [126] [62] 68.52 12.76 28.31 6
Sauvola [108] 82.54 15.75 8.09 3
Wolf [127] 83.81 17.30 2.42 2
Niblack-Gabor 79.35 15.52 1.02 4
Sauvola-Gabor 88.90 17.51 4.90 1
Wolf-Gabor 58.35 13.43 22.40 5
DIBCO 2012 Niblack [126] [62] 61.35 12.79 27.50 6
Sauvola [112] 82.89 16.71 6.59 4
Wolf [127] 81.90 16.73 0.95 2
Niblack-Gabor 76.36 16.72 2.03 3
Sauvola-Gabor 87.95 19.53 1.50 1
Wolf-Gabor 65.32 10.52 11.23 5
DIBCO 2013 Niblack [126] 57.62 13.86 21.01 6
Sauvola [109] 85.02 16.94 7.58 3
Wolf [127] 83.60 16.20 1.05 4
Niblack-Gabor 71.20 16.60 4.91 5
Sauvola-Gabor 88.89 19.53 1.50 1
Wolf-Gabor 85.12 16.66 3.42 2
Overall Niblack [126] 47.90 11.07 20.75 6
Sauvola [55] 80.14 16.82 2.72 2
Wolf [127] 77.54 15.96 2.37 3
Niblack-Gabor 75.44 15.85 5.07 4
Sauvola-Gabor 87.20 18.14 2.59 1
Wolf-Gabor 59.00 12.15 11.75 5


Table 5.2: Comparison of the proposed Sauvola-Gabor method with the top binarization methods of the DIBCO-2009, DIBCO-2010, DIBCO-2011, DIBCO-2012, and DIBCO-2013 contests and with certain state-of-the-art methods on blind datasets.
Dataset Reference FM PSNR DRD Observation Rank
Sehad et al. [105] 85.53 16.40 3.12 9
Gatos et al. [128] 85.25 16.50 -- 8
Sehad et al. [103] 87.82 16.50 - 7
Rivest-Hénault et al. [4] [129] 89.34 17.79 -- 3rd rank of contest 6
DIBCO-2009 Lu et al. [4] [130] 90.06 18.23 -- 2nd rank of contest 5
Lu et al. [4] [131] 91.24 18.66 -- 1st rank of contest 3
Moghaddam et al. [132] 91.61 18.80 -- 2
Nafchi et al.[133] 92.58 19.00 -- 1
Proposed (Sauvola-Gabor) 89.85 18.90 2.25 3
Gatos et al. [128] 71.99 15.12 -- 7
Lu et al. [130] 85.49 17.83 -- 6
Moghaddam et al. [132] 86.25 18.04 -- 5
DIBCO-2010 Lu et al. [131] 86.41 18.14 -- 4
Su et al. [110] 89.70 19.15 -- 2nd rank of contest 2
Su et al. [110] 91.50 19.78 -- 1st rank of contest 1
Proposed (Sauvola-Gabor) 86.68 18.50 3.61 3
Natarajan [58] [134] 72.59 13.53 10.94 13
Lelore et al. [108] [135] 80.86 16.13 104.48 1st rank of contest 12
Lu et al. [131] 81.67 15.59 11.24 11
Sehad et al. [21] 82.07 15.59 8.13 10
Gatos et al. [128] 82.11 16.04 5.42 9
DIBCO-2011 Su et al. [108] 85.20 17.16 15.66 2nd rank of contest 8
Su et al. [130] 85.56 16.75 6.02 7
Moghaddam et al. [132] 86.58 16.88 4.36 6
Sehad et al. [103] 87.44 17.20 4.60 5
Su et al. [62] 87.80 17.56 4.84 2
Howe [108] [136] 88.74 17.84 5.37 3rd rank of contest 2
Nafchi et al. [133] 91.56 18.68 2.74 1
Proposed (Sauvola-Gabor) 88.90 17.51 4.90 4
Sehad et al. [105] 80.87 17.06 5.90 8
Sehad et al. [103] 82.92 17.83 5.80 7
Su et al. [130] 85.56 16.75 6.02 6
DIBCO-2012 Moghaddam et al. [132] 87.73 18.50 4.36 5
Howe [112] [136] 89.47 20.14 3.04 1st rank of contest 3
Nafchi et al.[133] 92.23 19.93 2.61 1
Lelore et al. [108] [135] 92.84 17.73 2.66 4
Proposed (Sauvola-Gabor) 87.95 19.53 1.50 2
Sehad et al. [105] 84.35 17.04 4.07 5
Nafchi et al. [137] 87.53 17.62 5.75 5
DIBCO-2013 Lu et al. [131] 88.84 18.75 4.98 4
Nafchi et al. [133] 90.99 19.44 3.47 3
Lu et al. [109] 92.12 20.68 3.10 1st rank of contest 1
Proposed (Sauvola-Gabor) 88.89 19.53 1.50 2


To appreciate the importance of using the weighting function for selecting the dominant slant
angle, Table 5.4, shows a comparison of the proposed method on a sample subset of degraded
images when using the Sauvola-Gabor method with and without the weighting function. The
obtained results clearly show an important enhancement of all measures when the weighting
function is used. For instance, for Ink bleed-through degradation, the F-Measure grows from
85.92 (without weighting) to 91.80 (with weighting). Similar observations can be deduced for
other degradations. This proves that considering the dominant angle allows improving the
binarization of the image.

Table 5.3: Performance evaluation on unblind DIBCO datasets according to degradation type.
Degradation type Method FM PSNR DRD Rank
Stain Niblack 56.01 6.71 29.10 6
Sauvola 83.90 16.20 2.03 4
Wolf 86.28 16.11 1.94 3
Niblack-Gabor 85.24 16.55 1.55 2
Sauvola-Gabor 89.50 16.67 1.70 1
Wolf-Gabor 53.55 9.05 22.70 5
Ink bleed-through Niblack 59.35 7.08 32.02 6
Sauvola 83.16 16.65 2.48 3
Wolf 83.10 15.98 2.10 2
Niblack-Gabor 82.60 15.55 1.61 4
Sauvola-Gabor 91.80 18.60 1.45 1
Wolf-Gabor 45.91 9.45 34.10 5
Non-uniform Background Niblack 58.40 4.96 21.93 6
Sauvola 75.62 14.94 5.84 3
Wolf 74.86 16.43 6.10 2
Niblack-Gabor 68.97 15.31 7.02 4
Sauvola-Gabor 81.20 16.70 1.92 1
Wolf-Gabor 35.60 8.24 25.94 5
Ink intensity variation Niblack 56.54 6.81 34.81 6
Sauvola 68.77 14.97 12.51 5
Wolf 64.50 15.01 10.62 4
Niblack-Gabor 75.90 16.90 4.11 2
Sauvola-Gabor 89.09 18.40 2.01 1
Wolf-Gabor 71.01 16.69 5.32 3

Table 5.4: Influence of the weighting function used in the Sauvola-Gabor.


Degradation type           Weighting   FM      PSNR    DRD     Rank
Stain                      Yes         89.50   16.67   1.70    1
                           No          88.20   15.50   4.77    2
Ink bleed-through          Yes         91.80   18.60   1.45    1
                           No          85.92   13.46   23.71   2
Non-uniform background     Yes         81.20   16.70   1.92    1
                           No          80.01   15.74   16.61   2
Ink intensity variation    Yes         89.09   18.40   2.01    1
                           No          87.63   16.95   5.11    2


For a subjective visual evaluation, some samples of document images for the stain, ink bleed-
through, non-uniform background, and ink degradations are depicted respectively in Figure
5.14, Figure 5.15, Figure 5.16, and Figure 5.17. We notice a clear improvement in terms of
binarization when we visually compare the binarized document images using our method with
the ground truth images and those binarized using a classical method such as Sauvola’s method.

(a) (b)

(c) (d)
Figure 5.14: Sample of subjective evaluation for Stain degradation: (a) Degraded image H02 DIBCO-2012 as
stain, (b) Sauvola’s method, (c) Ground truth image, (d) The proposed Sauvola-Gabor method.

(a) (b)

(c) (d)
Figure 5.15: Sample of subjective evaluation for Ink bleed-through degradation: (a) Degraded image H06
DIBCO-2012 as ink bleed-through, (b) Sauvola’s method, (c) Ground truth image, (d) The proposed Sauvola-
Gabor method.


(a) (b)

(c) (d)
Figure 5.16. Sample of subjective evaluation for non-uniform background: (a) Degraded image: PR03 DIBCO-
2013 as non-uniform background, (b) Sauvola’s method (c) Ground truth image, (d) The proposed Sauvola-Gabor
method.

(a) (b)

(c) (d)
Figure 5.17: Sample of subjective evaluation for ink degradation: (a) Degraded image H09 DIBCO-2012 as
ink degradation, (b) Sauvola’s method, (c) Ground truth image, (d) The proposed Sauvola-Gabor method.


This chapter presents a new method suitable for a software and hardware implementation
based on Gabor filters for the binarization of ancient degraded documents. The Gabor filter
bank is designed by considering the degradation type of the document based on the unblind
protocol. First, the document image is pre-processed using a Wiener filter to smooth the
degradation. Subsequently, the binarization threshold is estimated using texture features, such
as the mean and the standard deviation, extracted from the respective original image and the
filtered document image. Furthermore, a new protocol, namely the unblind protocol, is proposed for setting the optimal parameters of the Gabor filter, such as the central frequency and the number of angles, according to the degradation type. The dominant slant angle is estimated using a 2D Fourier transform. Finally, morphological
operators as post-processing are applied to the binarized image to reduce some artifacts. The
comparison is achieved using the proposed method against various well-known binarization
methods. The experimental evaluation and extensive tests performed on blind and unblind
samples showed the benefit of combining the Gabor filter with the standard thresholding
methods for binarizing ancient documents. Indeed, the Sauvola-Gabor method ranks first
against all the other methods. However, the proposed method seems more suitable for ink bleed-
through degradation and ink degradation. This outcome is explained by the directional feature
of the Gabor filter bank. Moreover, for poorly contrasted images, the standard deviation in the
Gabor space is increased. Consequently, the proposed method can be suitable for poorly
contrasted documents.
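
As an illustration only, the following Python sketch puts the steps just described together: a Wiener pre-filter, a small Gabor filter bank, texture features (local mean of the original image, local standard deviation in the Gabor space), a Sauvola-style threshold, and a morphological clean-up. It is a minimal sketch rather than the thesis implementation: the window size, the constants k and R, the central frequency, the number of orientations, and the merging of the Gabor responses by maximum magnitude are illustrative assumptions.

import numpy as np
from scipy.signal import wiener
from scipy.ndimage import uniform_filter
from skimage.filters import gabor
from skimage.morphology import binary_opening, disk

def sauvola_gabor_binarize(gray, frequency=0.15, n_angles=4,
                           window=25, k=0.2, R=128.0):
    """gray: 2D float array in [0, 255]. Returns a boolean text mask."""
    # 1) Pre-processing: Wiener filter to smooth the degradation.
    smoothed = wiener(gray, mysize=5)

    # 2) Gabor filter bank: keep the strongest magnitude over all orientations.
    magnitude = np.zeros_like(smoothed)
    for theta in np.arange(n_angles) * np.pi / n_angles:
        real, imag = gabor(smoothed, frequency=frequency, theta=theta)
        magnitude = np.maximum(magnitude, np.hypot(real, imag))

    # 3) Texture features: local mean of the original image and local
    #    standard deviation computed in the Gabor (filtered) space.
    mean = uniform_filter(gray, window)
    mag_mean = uniform_filter(magnitude, window)
    std = np.sqrt(np.maximum(uniform_filter(magnitude ** 2, window) - mag_mean ** 2, 0.0))

    # 4) Sauvola-style threshold built from those features.
    threshold = mean * (1.0 + k * (std / R - 1.0))
    binary = gray < threshold            # text (foreground) is darker

    # 5) Post-processing: morphological opening to remove small artifacts.
    return binary_opening(binary, disk(1))

In practice, the frequency and the number of orientations would be set per degradation type through the unblind protocol discussed above.
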
For future research, the automatic detection of the degradation type seems the first path to explore for automatically adapting the parameters of the Gabor filter, for instance by exploiting the shape of the F-Measure curves as a function of frequency, which remains a recurring issue for researchers. Another direction is the estimation of the optimal parameters of the Gabor filter without a reference binarized image. Finally, non-local texture information combined with other texture features constitutes an interesting direction for better extraction of local textural features.

CONCLUSION
In this thesis, we have addressed the problem of ancient degraded document binarization. It is worth noting that we deal with document images acquired with devices operating in the visible spectrum; we do not address document images acquired through invisible-light, multispectral, laser, or ultrasonic imaging.

The binarization of degraded documents remains a difficult problem because of the variety, the non-uniformity, and the complexity of the degradations, as well as the difficulty of modeling them. We focused mainly on introducing and exploring new texture-based binarization methods, built on new representation spaces, that are effective enough under severe degradations to extract the foreground (text) from the background with as little alteration as possible. Three texture-based methods are explored to binarize ancient degraded documents.

Three methods, based respectively on co-occurrence matrices, LBP, and Gabor filters, each combined with a thresholding-based method, are used to overcome the limitations of grayscale pixel-based methods. Our contributions are summarized as follows:

We have presented a new thresholding method based on texture features extracted from a co-occurrence matrix. The general observation is that the co-occurrence matrix allows an interesting discrimination between the text and the background. The method is inspired by Niblack’s method and enhanced with the well-known texture descriptor based on the co-occurrence matrix. Haralick’s attributes, such as contrast and mean, are computed from the co-occurrence matrix using a distance-vector modulus equal to one and four directions (0°, 30°, 45°, 135°). The best parameters are determined experimentally, and the best results are obtained with Haralick’s contrast attribute in the 135° direction. The subjective and objective evaluations showed good results, and compared to the state-of-the-art the results are promising and satisfactory. Nevertheless, we notice some weaknesses of the binarization for the stained document category; in future work, more attention will be given to this type of degradation.
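
To make the idea concrete, here is a minimal Python sketch of a Niblack-inspired threshold driven by the co-occurrence matrix, under stated assumptions: one threshold per block, a gray-level quantization to 32 levels, distance 1 and the 135° direction, and sqrt(contrast) used as the dispersion term with k = -0.2. These settings are illustrative and are not the parameters retained in the thesis.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def cooccurrence_binarize(gray_u8, window=31, k=-0.2, levels=32):
    """gray_u8: 2D uint8 document image. Returns a boolean text mask."""
    # Quantize gray levels so each local co-occurrence matrix stays small.
    quant = (gray_u8.astype(np.int32) * levels // 256).astype(np.uint8)
    out = np.zeros(gray_u8.shape, dtype=bool)
    h, w = gray_u8.shape
    for i in range(0, h, window):          # one threshold per block, for speed
        for j in range(0, w, window):
            block = quant[i:i + window, j:j + window]
            orig = gray_u8[i:i + window, j:j + window]
            # Co-occurrence at distance 1 and 135 degrees (the best reported setting).
            glcm = graycomatrix(block, distances=[1], angles=[3 * np.pi / 4],
                                levels=levels, symmetric=True, normed=True)
            contrast = graycoprops(glcm, 'contrast')[0, 0]
            # Niblack-inspired rule: local mean shifted by a contrast-derived term.
            t = orig.mean() + k * np.sqrt(contrast)
            out[i:i + window, j:j + window] = orig < t   # darker pixels = text
    return out

Other Haralick attributes could be substituted for the contrast term in the same way.
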


The Local Binary Pattern (LBP) is used as a texture measure within a thresholding-based method. The mean and the variance of the pixels are computed respectively from the original document image and from the LBP image, and these features are then used within a threshold-based method. This method presents some weaknesses for poorly contrasted documents; to overcome this drawback, another variant (SLBP-C) is computed by combining contrast information with the basic LBP operator. The proposed method is evaluated both subjectively and objectively using multiple metrics. Tests are conducted on three DIBCO datasets, organized by year of submission and by type of degradation. We notice that SLBP works better than SLBP-C for ink bleed-through degradation, whereas SLBP-C works better for the other degradations, which makes the two variants complementary. Their combination outperforms all other well-known thresholding-based methods in terms of FM, PSNR, and DRD; in particular, on certain document images the results are markedly better. LBP can therefore be considered a good candidate for the binarization of degraded document images.
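
A minimal Python sketch of this idea, assuming a Sauvola-style combination rule, is given below: the local mean comes from the original image and the local deviation from the LBP map. The LBP parameters (P = 8, R = 1), the window size, and the constants k and R_norm are illustrative assumptions rather than the exact SLBP formulation.

import numpy as np
from scipy.ndimage import uniform_filter
from skimage.feature import local_binary_pattern

def slbp_binarize(gray, window=25, k=0.2, R_norm=128.0):
    """gray: 2D float array in [0, 255]. Returns a boolean text mask."""
    # LBP map of the document image (8 neighbours on a radius-1 circle).
    lbp = local_binary_pattern(gray, P=8, R=1, method='default').astype(float)

    # Local mean from the original image, local deviation from the LBP map.
    mean = uniform_filter(gray, window)
    lbp_mean = uniform_filter(lbp, window)
    lbp_std = np.sqrt(np.maximum(uniform_filter(lbp ** 2, window) - lbp_mean ** 2, 0.0))

    # Sauvola-like threshold built from the two texture features.
    threshold = mean * (1.0 + k * (lbp_std / R_norm - 1.0))
    return gray < threshold   # darker pixels are taken as text

An SLBP-C-like variant could, for instance, add a local contrast measure (such as the variance-based LBP, method='var') and fuse the two resulting masks.
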

A new method based on Gabor filters is applied for the binarization of ancient degraded documents. After an enhancement step, a binarization threshold is estimated using texture features, namely the mean and the standard deviation, extracted respectively from the original image and from the Gabor-filtered document image. The optimal parameters of the Gabor filter, such as the central frequency and the number of angles, are estimated experimentally by introducing a new protocol, namely the unblind protocol (based on grouping document images by type of degradation). The proposed method is compared against state-of-the-art and various well-known binarization methods. The experimental evaluation and extensive tests performed on blind and unblind samples showed good results against state-of-the-art methods for binarizing ancient documents. It is worth noting that the Sauvola-Gabor method ranks first among all the classical thresholding-based methods compared. Better results are obtained for ink bleed-through and ink degradations, an outcome explained by the directional nature of the Gabor filter bank.
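
As a sketch of how the unblind protocol can drive this parameter choice, the Python snippet below scores a small grid of candidate (central frequency, number of orientations) settings by pixel-level F-Measure against reference binarizations of a single degradation type. The candidate grids, the f_measure definition, and the generic binarize callable (which could be the sauvola_gabor_binarize sketch given earlier) are illustrative assumptions, not the search procedure actually used in the thesis.

import numpy as np

def f_measure(predicted, truth):
    """Pixel-level F-Measure between two boolean text masks (True = text)."""
    tp = np.logical_and(predicted, truth).sum()
    precision = tp / max(predicted.sum(), 1)
    recall = tp / max(truth.sum(), 1)
    return 2.0 * precision * recall / max(precision + recall, 1e-12)

def tune_gabor_parameters(binarize, images, truths,
                          frequencies=(0.05, 0.10, 0.15, 0.20),
                          angle_counts=(2, 4, 6, 8)):
    """binarize: callable(image, frequency=..., n_angles=...) -> boolean mask.
    images, truths: grayscale images and ground-truth masks belonging to
    one degradation type (the unblind setting)."""
    best_params, best_score = None, -1.0
    for freq in frequencies:
        for n_angles in angle_counts:
            scores = [f_measure(binarize(img, frequency=freq, n_angles=n_angles), gt)
                      for img, gt in zip(images, truths)]
            mean_fm = float(np.mean(scores))
            if mean_fm > best_score:
                best_params, best_score = (freq, n_angles), mean_fm
    return best_params, best_score

The setting selected for one degradation type would then be reused for all blind images of that type.
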

In a general way, the experimental results obtained on various datasets show that texture and its variants constitute an interesting approach for improving existing binarization methods. Combining the three methods could also be an interesting way to further enhance the binarization results.
For future research, the automatic detection of the degradation type seems the first path to explore for automatically adapting the parameters of the Gabor filter, for instance by exploiting the shape of the F-measure curves as a function of frequency, which remains a recurring issue for researchers. Another direction is the estimation of the optimal parameters of the binarization methods without a reference binarized image.
Finally, non-local texture information, combined with a more accurate estimation of the dominant slant angle of the document script, constitutes an interesting direction for better extraction of local textural features. For instance, features extracted from the co-occurrence-based texture could be computed in other representation spaces of the document image.

PUBLICATIONS
Sehad, A., Chibani, Y., Hedjam, R., & Cheriet, M. (2019). Gabor filter-based texture for ancient degraded document image binarization. Pattern Analysis and Applications, 22(1), 1-22. DOI: 10.1007/s10044-018-0747-7. https://link.springer.com/article/10.1007/s10044-018-0747-7

1- Sehad, A., Chibani, Y., Cheriet, M., & Yaddaden, Y. (2013, September). Ancient degraded document image binarization based on texture features. In 2013 8th International Symposium on Image and Signal Processing and Analysis (ISPA) (pp. 189-193). IEEE. https://ieeexplore.ieee.org/abstract/document/6703737

2- Brik, Y., Chibani, Y., Zemouri, E. T., & Sehad, A. (2013, September). Ridgelet-DTW-based word spotting for Arabic historical document. In 2013 8th International Symposium on Image and Signal Processing and Analysis (ISPA) (pp. 194-199). IEEE. https://ieeexplore.ieee.org/document/6703738

3- Sehad, A., Chibani, Y., & Cheriet, M. (2014, September). Gabor filters for degraded document image binarization. In 2014 14th International Conference on Frontiers in Handwriting Recognition (pp. 702-707). IEEE. https://ieeexplore.ieee.org/document/6981102

4- Sehad, A., Chibani, Y., Hedjam, R., & Cheriet, M. (2015, November). LBP-based degraded document image binarization. In 2015 International Conference on Image Processing Theory, Tools and Applications (IPTA) (pp. 213-217). IEEE. https://ieeexplore.ieee.org/document/7367131

5- Djema, A., Chibani, Y., Sehad, A., & Zemouri, E. T. (2015, August). Blind versus unblind performance evaluation of binarization methods. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (pp. 511-515). IEEE. https://ieeexplore.ieee.org/document/7333814

REFERENCES
[1] S. Marinai, "Introduction to document analysis and recognition," in Machine learning
in document analysis and recognition, ed: Springer, pp. 1-20, 2008.

[2] S. Mao, A. Rosenfeld, and T. Kanungo, "Document structure analysis algorithms: a


literature survey," in Document Recognition and Retrieval X, pp. 197-208, 2003.

[3] S. Impedovo, L. Ottaviano, and S. Occhinegro, "Optical character recognition—a


survey," International Journal of Pattern Recognition and Artificial Intelligence, vol.
5, pp. 1-24, 1991.

[4] B. Gatos, K. Ntirogiannis, and I. Pratikakis, "ICDAR 2009 document image binarization
contest (DIBCO 2009)," in 2009 10th International Conference on document analysis
and recognition, pp. 1375-1382, 2009.

[5] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "H-DIBCO 2010-handwritten document


image binarization competition," in 2010 12th International Conference on Frontiers in
Handwriting Recognition, pp. 727-732, 2010.

[6] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICFHR 2012 competition on handwritten


document image binarization (H-DIBCO 2012)," in 2012 international conference on
frontiers in handwriting recognition, pp. 817-822, 2012.

[7] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2013 document image binarization
contest (DIBCO 2013)," in 2013 12th International Conference on Document Analysis
and Recognition, pp. 1471-1476, 2013.

[8] K. Ntirogiannis, B. Gatos, and I. Pratikakis, "ICFHR2014 competition on handwritten


document image binarization (H-DIBCO 2014)," in 2014 14th International conference
on frontiers in handwriting recognition, pp. 809-813, 2014.

[9] I. Pratikakis, K. Zagoris, G. Barlas, and B. Gatos, "ICDAR2017 competition on


document image binarization (DIBCO 2017)," in 2017 14th IAPR International
Conference on Document Analysis and Recognition (ICDAR), pp. 1395-1403, 2017.

[10] I. B. Messaoud, H. Amiri, H. El Abed, and V. Margner, "New binarization approach


based on text block extraction," in 2011 International Conference on Document
Analysis and Recognition, pp. 1205-1209, 2011.

[11] M. A. Ramírez-Ortegón, V. Märgner, E. Cuevas, and R. Rojas, "An optimization for


binarization methods by removing binary artifacts," Pattern Recognition Letters, vol.
34, pp. 1299-1306, 2013.

[12] R. F. Moghaddam and M. Cheriet, "AdOtsu: An adaptive and parameterless


generalization of Otsu's method for document image binarization," Pattern Recognition,
vol. 45, pp. 2419-2431, 2012.


[13] K. Ntirogiannis, B. Gatos, and I. Pratikakis, "A modified adaptive logical level
binarization technique for historical document images," in 2009 10th International
Conference on Document Analysis and Recognition, pp. 1171-1175, 2009.

[14] S. Lu, B. Su, and C. L. Tan, "Document image binarization using background estimation
and stroke edges," International Journal on Document Analysis and Recognition
(IJDAR), vol. 13, pp. 303-314, 2010.

[15] B. Bataineh, S. N. H. S. Abdullah, and K. Omar, "An adaptive local binarization method
for document images based on a novel thresholding method and dynamic windows,"
Pattern Recognition Letters, vol. 32, pp. 1805-1813, 2011.

[16] K. Khurshid, I. Siddiqi, C. Faure, and N. Vincent, "Comparison of Niblack inspired


binarization methods for ancient documents," in Document Recognition and Retrieval
XVI, p. 72470U, 2009.

[17] W. Niblack, An introduction to digital image processing vol. 34: Prentice-Hall


Englewood Cliffs, 1986.

[18] C. Wolf, J.-M. Jolion, and F. Chassaing, "Text localization, enhancement, and
binarization in multimedia documents," in Object recognition supported by user
interaction for service robots, pp. 1037-1040, 2002.

[19] M. Unser, "Texture classification and segmentation using wavelet frames," IEEE
Transactions on image processing, vol. 4, pp. 1549-1560, 1995.

[20] S. Meshgini, A. Aghagolzadeh, and H. Seyedarabi, "Face recognition using Gabor-


based direct linear discriminant analysis and support vector machine," Computers &
Electrical Engineering, vol. 39, pp. 727-745, 2013.

[21] L. Shen and L. Bai, "A review on Gabor wavelets for face recognition," Pattern Analysis
and Applications, vol. 9, pp. 273-292, 2006.

[22] T. Celik and T. Tjahjadi, "Unsupervised color image segmentation using dual-tree
complex wavelet transform," Computer vision and image understanding, vol. 114, pp.
813-826, 2010.

[23] A. G. Zuñiga, J. B. Florindo, and O. M. Bruno, "Gabor wavelets combined with


volumetric fractal dimension applied to texture analysis," Pattern Recognition Letters,
vol. 36, pp. 135-143, 2014.

[24] Y. Liu and S. N. Srihari, "Document image binarization based on texture features,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 540-544,
1997.

[25] S. Mandal, S. Biswas, A. K. Das, and B. Chanda, "Binarisation of color map images
through the extraction of regions," in International Conference on Computer Vision and
Graphics, pp. 418-427, 2014.

[26] M. Jameson, "Promises and challenges of digital libraries and document image analysis:
a humanist's perspective," in First International Workshop on Document Image
Analysis for Libraries, 2004. Proceedings, pp. 54-61, 2004.

[27] I. H. Witten, D. Bainbridge, and D. M. Nichols, How to build a digital library: Morgan
Kaufmann, 2009.

[28] A.-L. Dupont, "Le patrimoine culturel sur papier. De la compréhension des processus
d'altération à la conception de procédés de stabilisation," Université Evry Val
d'Essonne, 2014.

[29] K. Ntirogiannis, B. Gatos, and I. Pratikakis, "A combined approach for the binarization
of handwritten document images," Pattern Recognition Letters, vol. 35, pp. 3-15, 2014.

[30] R. Kasturi, L. O'Gorman, and V. Govindaraju, "Document image analysis: A primer,"


Sadhana, vol. 27, pp. 3-22, 2002.

[31] L. O'Gorman and R. Kasturi, Document image analysis vol. 39: IEEE Computer Society
Press Los Alamitos, 1995.

[32] G. Nagy, "Twenty years of document image analysis in PAMI," IEEE Transactions on
Pattern Analysis & Machine Intelligence, pp. 38-62, 2000.

[33] D. Doermann, "The indexing and retrieval of document images: A survey," Computer
vision and image understanding, vol. 70, pp. 287-298, 1998.

[34] C. L. Tan, W. Huang, Z. Yu, and Y. Xu, "Imaged document text retrieval without OCR,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 838-844,
2002.

[35] B. Gatos and I. Pratikakis, "Segmentation-free word spotting in historical printed


documents," in 2009 10th International Conference on Document Analysis and
Recognition, pp. 271-275, 2009.

[36] T. M. Rath and R. Manmatha, "Word spotting for historical documents," International
Journal of Document Analysis and Recognition (IJDAR), vol. 9, pp. 139-152, 2007.

[37] R. Hedjam and M. Cheriet, "Historical document image restoration using a multispectral
imaging system," Pattern Recognition, vol. 46, pp. 2297-2312, 2013.

[38] R. F. Moghaddam and M. Cheriet, "RSLDI: Restoration of single-sided low-quality


document images," Pattern Recognition, vol. 42, pp. 3355-3364, 2009.

[39] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester, "Image inpainting," in


Proceedings of the 27th annual conference on Computer graphics and interactive
techniques, pp. 417-424, 2000.

[40] A. Criminisi, P. Pérez, and K. Toyama, "Region filling and object removal by exemplar-
based image inpainting," IEEE Transactions on image processing, vol. 13, pp. 1200-
1212, 2004.

[41] W. A. Mustafa and H. Yazid, "Illumination and Contrast Correction Strategy using
Bilateral Filtering and Binarization Comparison," Journal of Telecommunication,
Electronic and Computer Engineering (JTEC), vol. 8, pp. 67-73, 2016.


[42] J. Wen, S. Li, and J. Sun, "A new binarization method for non-uniform illuminated
document images," Pattern Recognition, vol. 46, pp. 1670-1690, 2013.

[43] M. Sezgin and B. Sankur, "Survey over image thresholding techniques and quantitative
performance evaluation," Journal of Electronic Imaging, vol. 13, pp. 146-166, 2004.

[44] A. Tonazzini, L. Bedini, and E. Salerno, "Independent component analysis for document
restoration," Document Analysis and Recognition, vol. 7, pp. 17-27, 2004.

[45] F. Drira, "Contribution à la restauration des images de documents anciens," Doctoral


Thesis, Informatique et Information pour la Société (EDIIS), LIRIS, UMR 5205 CNRS,
2007.

[46] Q. Wang and C. L. Tan, "Matching of double-sided document images to remove


interference," in Proceedings of the 2001 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition. CVPR 2001, pp. I-I, 2001.

[47] R. Cao, C. L. Tan, and P. Shen, "A wavelet approach to double-sided document image
pair processing," in Proceedings 2001 International Conference on Image Processing
(Cat. No. 01CH37205), pp. 174-177, 2001.

[48] R. F. Moghaddam and M. Cheriet, "Application of multi-level classifiers and clustering


for automatic word spotting in historical document images," in 2009 10th International
Conference on Document Analysis and Recognition, pp. 511-515, 2009.

[49] A. Tonazzini, E. Salerno, M. Mochi, and L. Bedini, "Blind source separation techniques
for detecting hidden texts and textures in document images," in International
Conference Image Analysis and Recognition, pp. 241-248, 2004.

[50] N. Ntogas and D. Veintzas, "A binarization algorithm for historical manuscripts," in
WSEAS International Conference. Proceedings. Mathematics and Computers in
Science and Engineering, pp. 41-51, 2008.

[51] T. R. Singh, S. Roy, O. I. Singh, T. Sinam, and K. Singh, "A new local adaptive
thresholding technique in binarization," arXiv preprint arXiv:1201.5227, 2012.

[52] G. Leedham, S. Varma, A. Patankar, and V. Govindaraju, "Separating text and


background in degraded document images-a comparison of global thresholding
techniques for multi-stage thresholding," in Proceedings Eighth International
Workshop on Frontiers in Handwriting Recognition, pp. 244-249, 2002.

[53] W. Niblack, An introduction to digital image processing. Birkerød: Strandberg Publishing Company, 1985.

[54] O. D. Trier and T. Taxt, "Evaluation of binarization methods for document images,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, pp. 312-315,
1995.

[55] J. Sauvola and M. Pietikäinen, "Adaptive document image binarization," Pattern


Recognition, vol. 33, pp. 225-236, 2000.


[56] N. Otsu, "A threshold selection method from gray-level histograms," IEEE transactions
on systems, man, and cybernetics, vol. 9, pp. 62-66, 1979.

[57] J. N. Kapur, P. K. Sahoo, and A. K. Wong, "A new method for gray-level picture
thresholding using the entropy of the histogram," Computer vision, graphics, and image
processing, vol. 29, pp. 273-285, 1985.

[58] J. Natarajan and I. Sreedevi, "Enhancement of ancient manuscript images by log based
binarization technique," AEU-International Journal of Electronics and
Communications, vol. 75, pp. 15-22, 2017.

[59] A. Sehad, Y. Chibani, M. Cheriet, and Y. Yaddaden, "Ancient degraded document


image binarization based on texture features," in 2013 8th International Symposium on
Image and Signal Processing and Analysis (ISPA), pp. 189-193, 2013.

[60] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2011 document image binarization contest (DIBCO 2011)," in International Conference on Document Analysis and Recognition, IEEE Computer Society, pp. 1506-1510, 2011.

[61] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "Document Image Binarization Contest (DIBCO 2011)," ICDAR 2011: 11th International Conference on Document Analysis and Recognition, Beijing, China, 18-21 September 2011.

[62] B. Su, S. Lu, and C. L. Tan, "Robust document image binarization technique for
degraded document images," Image Processing, IEEE Transactions on, vol. 22, pp.
1408-1417, 2013.

[63] N. R. Howe, "Document binarization with automatic parameter tuning," International


Journal on Document Analysis and Recognition (IJDAR), vol. 16, pp. 247-258, 2013.

[64] M. Cheriet, R. F. Moghaddam, and R. Hedjam, "A learning framework for the
optimization and automation of document binarization methods," Computer vision and
image understanding, vol. 117, pp. 269-280, 2013.

[65] J. A. Hartigan and M. A. Wong, "Algorithm AS 136: A k-means clustering algorithm,"


Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 28, pp. 100-
108, 1979.

[66] K. Fukunaga and L. Hostetler, "The estimation of the gradient of a density function,
with applications in pattern recognition," IEEE Transactions on information theory, vol.
21, pp. 32-40, 1975.

[67] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space
analysis," IEEE Transactions on Pattern Analysis & Machine Intelligence, pp. 603-619,
2002.

[68] A. E. Savakis, "Adaptive document image thresholding using foreground and


background clustering," in Proceedings 1998 International Conference on Image
Processing. ICIP98 (Cat. No. 98CB36269), pp. 785-789, 1998.


[69] R. G. Mesquita, C. A. Mello, and L. Almeida, "A new thresholding algorithm for
document images based on the perception of objects by distance," Integrated Computer-
Aided Engineering, vol. 21, pp. 133-146, 2014.

[70] M. Soua, R. Kachouri, and M. Akil, "A new hybrid binarization method based on
Kmeans," in 2014 6th International Symposium on Communications, Control and
Signal Processing (ISCCSP), pp. 118-123, 2014.

[71] Y. Leydier, F. Le Bourgeois, and H. Emptoz, "Serialized k-means for adaptative color
image segmentation," in International Workshop on Document Analysis Systems, pp.
252-263, 2004

[72] C. J. Burges, "A tutorial on support vector machines for pattern recognition," Data
mining and knowledge discovery, vol. 2, pp. 121-167, 1998.

[73] V. N. Vapnik, Statistical learning theory. New York: Wiley, 1998.

[74] C.-H. Tung and Y.-G. Lin, "Efficient uneven-lighting image binarization by support
vector machines," Journal of Information and Optimization Sciences, vol. 39, pp. 519-
543, 2018.

[75] F. Kasmin, A. Abdullah, and A. S. Prabuwono, "Ensemble of steerable local


neighborhood grey-level information for binarization," Pattern Recognition Letters,
vol. 98, pp. 8-15, 2017.

[76] C. Thillou and B. Gosselin, "Color binarization for complex camera-based images," in
Color Imaging X: Processing, Hardcopy, and Applications, pp. 301-308, 2005.

[77] C. M. Bishop, Neural networks for pattern recognition: Oxford university press, 1995.

[78] O. Omidvar and J. Dayhoff, Neural networks and pattern recognition: Elsevier, 1997.
[79] Y. Pao, "Adaptive pattern recognition and neural networks," 1989.

[80] W. Xiong, J. Xu, Z. Xiong, J. Wang, and M. Liu, "Degraded historical document image
binarization using local features and support vector machine (SVM)," Optik, vol. 164,
pp. 218-223, 2018.

[81] A. Djema and Y. Chibani, "Binarization of historical documents using self-learning


classifier based on k-means and SVM," in 21st European Signal Processing Conference
(EUSIPCO 2013), pp. 1-5, 2013.

[82] C.-H. Chou, W.-H. Lin, and F. Chang, "A binarization method with learning-built rules
for document images produced by cameras," Pattern Recognition, vol. 43, pp. 1518-
1530, 2010.

[83] F. Westphal, N. Lavesson, and H. Grahn, "Document image binarization using recurrent
neural networks," in 2018 13th IAPR International Workshop on Document Analysis
Systems (DAS), pp. 263-268, 2018.


[84] C. Tensmeyer and T. Martinez, "Document image binarization with fully convolutional
neural networks," in 2017 14th IAPR International Conference on Document Analysis
and Recognition (ICDAR), pp. 99-104, 2017.

[85] T. Sari, A. Kefali, and H. Bahi, "An MLP for binarizing images of old manuscripts," in 2012 International Conference on Frontiers in Handwriting Recognition, pp. 247-251, 2012.

[86] A. Khashman and B. Sekeroglu, "Document image binarisation using a supervised


neural network," International Journal of Neural Systems, vol. 18, pp. 405-418, 2008.

[87] L. Kang, P. Ye, Y. Li, and D. Doermann, "A deep learning approach to document image
quality assessment," in 2014 IEEE International Conference on Image Processing
(ICIP), pp. 2570-2574, 2014.

[88] G. Meng, K. Yuan, Y. Wu, S. Xiang, and C. Pan, "Deep networks for degraded document image binarization through pyramid reconstruction," in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), pp. 727-732, 2017.

[89] J. Pastor-Pellicer, S. España-Boquera, F. Zamora-Martínez, M. Z. Afzal, and M. J.


Castro-Bleda, "Insights on the use of convolutional neural networks for document image
binarization," in International Work-Conference on Artificial Neural Networks, pp. 115-
126, 2015

[90] Q. N. Vo, S. H. Kim, H. J. Yang, and G. Lee, "Binarization of degraded document


images based on hierarchical deep supervised network," Pattern recognition, vol. 74,
pp. 568-586, 2018.

[91] A. Gagalowicz, "Vers un modèle de textures," 1983.

[92] C. Chen, L. Pau, P. Wang, and S. Wang, "Texture analysis," Handbook of Pattern
Recognition and Computer Vision, pp. 207-248, 1998.

[93] M. Mirmehdi, Handbook of texture analysis: Imperial College Press, 2008.

[94] H. Tamura, S. Mori, and T. Yamawaki, "Textural features corresponding to visual


perception," IEEE transactions on systems, man, and cybernetics, vol. 8, pp. 460-473,
1978.

[95] J. Sklansky, "Image segmentation and feature extraction," IEEE transactions on


systems, man, and cybernetics, vol. 8, pp. 237-247, 1978.

[96] M. Pietikäinen, A. Hadid, G. Zhao, and T. Ahonen, Computer vision using local binary
patterns vol. 40: Springer Science & Business Media, 2011.

[97] D. L. Pham, C. Xu, and J. L. Prince, "Current methods in medical image segmentation,"
Annual review of biomedical engineering, vol. 2, pp. 315-337, 2000.

[98] J. Malik, S. Belongie, T. Leung, and J. Shi, "Contour and texture analysis for image
segmentation," International Journal of computer vision, vol. 43, pp. 7-27, 2001.


[99] A. Yimit, Y. Hagihara, T. Miyoshi, and Y. Hagihara, "2-D direction histogram-based


entropic thresholding," Neurocomputing, vol. 120, pp. 287-297, 2013.

[100] Y. Liu and S. N. Srihari, "Document image binarization based on texture features,"
Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 19, pp. 540-544,
1997.

[101] B. Wang, X.-F. Li, F. Liu, and F.-Q. Hu, "Color text image binarization based on binary
texture analysis," Pattern Recognition Letters, vol. 26, pp. 1568-1576, 2005.

[102] N. Armanfard, M. Valizadeh, M. Komeili, and E. Kabir, "Document image binarization


by using texture-edge descriptor," in 2009 14th International CSI Computer
Conference, pp. 134-139, 2009.

[103] A. Sehad, Y. Chibani, and M. Cheriet, "Gabor Filters for Degraded Document Image
Binarization," in Frontiers in Handwriting Recognition (ICFHR), 2014 14th
International Conference on, pp. 702-707, 2014.

[104] A. Sehad, Y. Chibani, R. Hedjam, and M. Cheriet, "Gabor filter-based texture for
ancient degraded document image binarization," Pattern Analysis and Applications, vol.
22, pp. 1-22, 2019.

[105] A. Sehad, Y. Chibani, R. Hedjam, and M. Cheriet, "LBP-based degraded document


image binarization," in Image Processing Theory, Tools and Applications (IPTA), 2015
International Conference on, pp. 213-217, 2015.

[106] I. Pratikakis, K. Zagoris, G. Barlas, and B. Gatos, "ICFHR2016 handwritten document


image binarization contest (H-DIBCO 2016)," in 2016 15th International Conference
on Frontiers in Handwriting Recognition (ICFHR), pp. 619-623, 2016.

[107] K. Ntirogiannis, B. Gatos, and I. Pratikakis, "ICFHR2014 Competition on Handwritten


Document Image Binarization (H-DIBCO 2014)," pp. 809-813, 2014.

[108] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2011 Document Image


Binarization Contest (DIBCO 2011)," pp. 1506-1510, 2011.

[109] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2013 document image binarization
contest (DIBCO 2013)," in Document Analysis and Recognition (ICDAR), 2013 12th
International Conference on, pp. 1471-1476, 2013.

[110] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "H-DIBCO 2010 - handwritten document image binarization competition," in Frontiers in Handwriting Recognition (ICFHR), 2010 International Conference on, pp. 727-732, 2010.

[111] A. Djema, Y. Chibani, A. Sehad, and E.-T. Zemouri, "Blind versus unblind performance
evaluation of binarization methods," in Document Analysis and Recognition (ICDAR),
2015 13th International Conference on, pp. 511-515, 2015.

[112] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICFHR 2012 Competition on Handwritten


Document Image Binarization (H-DIBCO 2012)," pp. 817-822, 2012.


[113] M. Tuceryan and A. K. Jain, "Texture analysis," in Handbook of Pattern Recognition


and Computer Vision, ed: World Scientific, pp. 235-276, 1993.

[114] M. E. Shokr, "Evaluation of second‐order texture parameters for sea ice classification
from radar images," Journal of Geophysical Research: Oceans, vol. 96, pp. 10625-
10640, 1991.

[115] A. Sutter, G. Sperling, and C. Chubb, "Measuring the spatial frequency selectivity of
second-order texture mechanisms," Vision Research, vol. 35, pp. 915-924, 1995.

[116] M.-W. Lin, J.-R. Tapamo, and B. Ndovie, "A texture-based method for document
segmentation and classification," South African Computer Journal, vol. 2006, pp. 49-
56, 2006.

[117] M. Mehri, P. Gomez-Krämer, P. Héroux, A. Boucher, and R. Mullot, "Texture feature


evaluation for segmentation of historical document images," in Proceedings of the 2nd
International Workshop on Historical Document Imaging and Processing, pp. 102-109,
2013.

[118] A. K. Jain and Y. Zhong, "Page segmentation using texture analysis," Pattern
Recognition, vol. 29, pp. 743-770, 1996.

[119] R. M. Haralick and K. Shanmugam, "Computer classification of reservoir sandstones,"


IEEE Transactions on Geoscience Electronics, vol. 11, pp. 171-177, 1973.

[120] D.-C. He and L. Wang, "Texture unit, texture spectrum, and texture analysis," IEEE
transactions on Geoscience and Remote Sensing, vol. 28, pp. 509-512, 1990.

[121] L. Wang and D.-C. He, "Texture classification using texture spectrum," Pattern
Recognition, vol. 23, pp. 905-910, 1990.

[122] A. Hadid, M. Pietikainen, and T. Ahonen, "A discriminative feature space for detecting
and recognizing faces," in Proceedings of the 2004 IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, 2004. CVPR 2004, pp. II-II., 2004.

[123] T. Ojala, M. Pietikainen, and D. Harwood, "Performance evaluation of texture measures


with classification based on Kullback discrimination of distributions," in Proceedings
of 12th International Conference on Pattern Recognition, pp. 582-585, 1994.

[124] Y. Luo, C.-m. Wu, and Y. Zhang, "Facial expression recognition based on fusion feature
of PCA and LBP with SVM," Optik-International Journal for Light and Electron
Optics, vol. 124, pp. 2767-2770, 2013.

[125] A. Halidou, X. You, M. Hamidine, R. A. Etoundi, and L. H. Diakite, "Fast pedestrian


detection based on region of interest and multi-block local binary pattern descriptors,"
Computers & Electrical Engineering, vol. 40, pp. 375-389, 2014.

[126] W. Niblack, An introduction to digital image processing: Strandberg Publishing


Company, 1985.


[127] C. Wolf, J.-M. Jolion, and F. Chassaing, "Text localization, enhancement, and
binarization in multimedia documents," in Pattern Recognition, 2002. Proceedings.
16th International Conference on, pp. 1037-1040, 2002.

[128] B. Gatos, I. Pratikakis, and S. J. Perantonis, "Adaptive degraded document image


binarization," Pattern Recognition, vol. 39, pp. 317-327, 2006.

[129] D. Rivest-Hénault, R. Farrahi Moghaddam, and M. Cheriet, "A local linear level set
method for the binarization of degraded historical document images," International
Journal on Document Analysis and Recognition, vol. 15, pp. 101-124, 2012.

[130] B. Su, S. Lu, and C. L. Tan, "Binarization of historical document images using the local
maximum and minimum," in Proceedings of the 9th IAPR International Workshop on
Document Analysis Systems, pp. 159-166, 2010.

[131] S. Lu, B. Su, and C. L. Tan, "Document image binarization using background estimation
and stroke edges," International journal on document analysis and recognition, vol. 13,
pp. 303-314, 2010.

[132] R. Farrahi Moghaddam and M. Cheriet, "AdOtsu: An adaptive and parameterless


generalization of Otsu's method for document image binarization," Pattern Recognition,
vol. 45, pp. 2419-2431, 2012.

[133] H. Z. Nafchi, R. F. Moghaddam, and M. Cheriet, "Phase-based binarization of ancient


document images: Model and applications," IEEE transactions on image processing,
vol. 23, pp. 2916-2930, 2014.

[134] A. Sehad, Y. Chibani, M. Cheriet, and Y. Yaddaden, "Ancient degraded document


image binarization based on texture features," in Image and Signal Processing and
Analysis (ISPA), 2013 8th International Symposium on, pp. 189-193, 2013.

[135] T. Lelore and F. Bouchara, "Super-resolved binarization of text based on the FAIR algorithm," pp. 839-843, 2011.

[136] N. R. Howe, "A Laplacian Energy for Document Binarization," pp. 6-10, 2011.

[137] H. Z. Nafchi, R. F. Moghaddam, and M. Cheriet, "Historical document binarization


based on phase information of images," in Asian Conference on Computer Vision, pp.
1-12, 2012.

