
Edge detection: a review of dissimilarity evaluations and

a proposed normalized measure


Baptiste Magnier

To cite this version:


Baptiste Magnier. Edge detection: a review of dissimilarity evaluations and a proposed normalized measure: review. Multimedia Tools and Applications, 2018, 77 (8), pp. 9489-9533. 10.1007/s11042-017-5127-6. hal-01940231

HAL Id: hal-01940231


https://hal.science/hal-01940231
Submitted on 30 Nov 2018

JOURNAL: MULTIMEDIA TOOLS AND APPLICATIONS, PP. 1–45, SEPTEMBER 2017

Edge Detection: A Review of Dissimilarity


Evaluations and a Proposed Normalized Measure
Baptiste Magnier

Abstract—In digital images, edges characterize object boundaries, so edge detection remains a crucial stage in numerous applications. To achieve this task, many edge detectors have been designed, producing different results, with various qualities of segmentation. Indeed, optimizing the response obtained by these detectors has become a crucial issue, and effective contour assessment assists performance evaluation. In this paper, several reference-based boundary detection evaluations are detailed, pointing out their advantages and disadvantages, theoretically and through concrete examples of image edges. Then, a new normalized supervised edge map quality measure is proposed, comparing a ground truth contour image, the candidate contour image and their associated spatial nearness. The effectiveness of the proposed distance measure is demonstrated theoretically and through several experiments, comparing the results with the methods detailed in the state-of-the-art. In summary, compared to other boundary detection assessments, this new method proved to be a more reliable edge map quality measure.

Index Terms—Edge detection, supervised evaluation, distance measures.

[Figure 1: (a) Object; (b) Contour 1; (c) Contour 2; (d) Contour 3]
Fig. 1. Several edge chains are available for the same object. The image in (a) is of size 30×30. Contour points are represented in blue for images (b), (c) and (d). Contour 1 represents the inner edge whereas contour 2 corresponds to the outer boundary. Contour 3 is the result of a step edge detector.

I. INTRODUCTION

In computer science, all systems, especially automated information processing structures, must be evaluated before being deployed, principally for industrial applications or medical data. Image processing is no exception to this rule. In image analysis, image segmentation is one of the most critical tasks and represents an essential step in low-level vision. All the methods developed therefore have to be tested and assessed, whether regarding edge detection, point matching, region segmentation or image restoration/enhancement.

Edge detection represents one of the pioneering theoretical works in image processing [38] and remains a key point in many applications. It is extensively used because boundaries include the most important structures in the image [54][44]. Furthermore, edge detection itself could be used to qualify a region segmentation technique [32][34]. In addition, contour extraction remains a very useful preprocessing step in image segmentation, registration, reconstruction, interpretation and tracking [33]. An efficient boundary detection method should create a contour image containing edges at their correct locations with a minimum of misclassified pixels. Edge detection assessment is therefore an essential field of study, but research has still not gone deep enough regarding contour detection for digital images. Contrary to region segmentation evaluations, which may consider color attributes, contour detection assessments generally use binary images. Thus, an edge detection evaluation leads to a direct assessment of the considered binary image. Moreover, measures of boundary detection evaluation are based on the fact that a reliable edge map should characterize all the relevant structures of the image. It should also create a minimum of spurious pixels or holes (oversights). Therefore, an evaluation can be used to assess and improve an algorithm, or to optimize edge detector parameters [21], for example thresholds in the last stage of the detection.

In edge detection assessment, the measurement process can be classified as using either unsupervised or supervised evaluation criteria. The first class of methods exploits only the input image and gives a coherence score that qualifies the result given by the algorithm [21][17][42]. The second class computes a similarity/dissimilarity measure between a segmentation result and a ground truth obtained from synthetic data or an expert judgment [9][22][31]. As edge extraction is performed for image processing tasks and computer vision, similarity measures are important within a broad field of study, for example in image interpretation, which consists in automatically extracting or recognizing objects in an image, compared to a model. Moreover, the assessment can compute a similarity or a dissimilarity of the shape attributes between two binary objects [13][22][37]. Thus, the problem of supervised edge detection assessment amounts to a question of pattern recognition. Nevertheless, the edge position of an object could be interpreted in different ways, as represented in Fig. 1. Indeed, for a vertical step edge, an edge can be located either on the left or on the right (which corresponds to an inner or an outer contour). Considering a vertical blurred edge (i.e. ramp contour [6]), the true edge is placed in the middle of the blur, but the question of the position remains the same when the size of the blur is even. In the crest lines case [37], true edges are chosen in the middle of the ridge or of the valley when the width of the ridge/valley is equal to an odd number (i.e. at the maxima -top of ridges- or minima -bottom of valleys- [36]).

Ecole des Mines d'Alès, Site de Croupillac, LGI2P, 7 rue Jules Renard, 30100 ALES, France. E-mail: baptiste.magnier@mines-ales.fr
Special thanks to Adam Clark Gimming.

Finally, for a single real or synthetic image, as several contour chains constituting the ground truth depend on several positions of the true pixels, the combinations of the boundary pixel placements are numerous, so many chains could be created/chosen. For example, we can create a ground truth boundary map of the image (a) in Fig. 1 with as many combinations as there are pixels in Contours 1, 2 and 3. This observation shows that the evaluation must not be binary, but rather illustrates the importance of taking into account the misplaced edge distance regarding the desired contour of the ground truth image. For all these reasons, as described in [44], quantitative assessments are generally lacking in proposed edge detection methods.

In image segmentation evaluation, the Structural Similarity Index (SSIM) estimates the visual impact of shifts in an image [57]. This measure is based on the grayscale information concerning the attributes that represent the structure of objects in the scene. SSIM consists of three local comparison functions, namely luminance comparison, contrast comparison, and structure comparison between two signals, excluding other remaining errors. Unlike the PSNR (Peak Signal-to-Noise Ratio) or RMSE (Root-Mean-Square Error), which are measured at the global or image level, the SSIM is computed locally by moving an 8×8 window for each pixel. The final score of an entire image corresponds to the mean of all the local scores. Applications of this image quality evaluation include automatically judging the performance of compression or restoration algorithms, concerning grayscale or color images. Even though SSIM can be applied in the case of an edge detection evaluation, in the presence of too many areas without contours, the obtained score is not efficient or useful (in order to judge the quality of edge detection with the SSIM, it is necessary to compare with an image having detected edges situated throughout the image areas).

This work focusses on comparisons of supervised edge detection evaluations with respect to binary representations of the boundaries. As introduced above, a supervised evaluation process estimates scores between a ground truth and a candidate edge map (both binary images). These scores could be evaluated by counting the number of erroneous pixels, but also through spatial distances of misplaced or undetected contours; this paper outlines the various algorithms presented in the literature. In addition, a new supervised edge map quality measure based on the distances of misplaced pixels is presented and compared to the others; the score obtained by this measure is normalized and can be interpreted for each type of contour. In order to present the effectiveness and efficiency of the proposed method, this new formula is compared with others presented and detailed in the state-of-the-art. Thus, several comparisons of edge detection assessments are carried out for different perturbations such as: addition of false positive points, creation of false negative points, over-segmentation near the edge (until total dilation of the edge), contour displacement, modification of the image size... These experiments indicate that some measures are not appropriate for the evaluation of edge detection for each type of boundary, and an ideal measure must be one that reacts as coherently as possible to reality. Therefore, this paper shows the relevance of a boundary detection assessment that takes into consideration the distance of both the false positive and the false negative points created or missed by the boundary detection process.

The remainder of this paper is organized as follows. Section II is devoted to an overview of error measures based on statistics. Next, Section III reviews most existing reference-based edge measures involving distances of erroneous points, then points out the advantages and drawbacks of the different measures, followed by the presentation of a new measure. Section IV presents experimental results for different synthetic data, and then quality measure results for real images. Finally, Section V gives perspectives for future work and draws the conclusions of the study.

II. CONFUSION MATRIX-BASED ERROR ASSESSMENTS

To assess an algorithm, the confusion matrix remains a cornerstone in boundary detection evaluation methods. Let Gt be the reference contour map corresponding to the ground truth and Dc the detected contour map of an image I. Comparing pixel per pixel Gt and Dc, the first criterion to be assessed is the common presence of edge/non-edge points. A basic evaluation is compounded from statistics resulting from a confusion matrix. To that effect, Gt and Dc are combined. Afterwards, denoting | · | as the cardinality of a set, all points are divided into four sets:
• True Positive points (TPs), common points of Gt and Dc: TP = |Dc ∩ Gt|,
• False Positive points (FPs), spurious detected edges, i.e. erroneous pixels of Dc not defined as boundary in Gt: FP = |Dc ∩ ¬Gt|,
• False Negative points (FNs), missing boundary points of Dc, i.e. holes in the true contour: FN = |¬Dc ∩ Gt|,
• True Negative points (TNs), common non-edge points of Gt and Dc: TN = |¬Dc ∩ ¬Gt|.

On the one hand, considering boundary detection of natural images, FPs appear in the presence of noise, texture or other contours influencing the filter used by the edge detection operator. On the other hand, FNs represent holes in a contour of Dc (generally caused by blurred edges in the original image I). For example, an incorrect threshold of the segmentation could generate both FPs and FNs. In the experiments, TPs, FPs and FNs are respectively represented in green, red and blue: see the illustrations in Fig. 2 for more details. Computing only FPs and FNs enables a segmentation assessment to be performed [35][36]. Yet, combining at least these two quantities enables a segmented image to be assessed more precisely.

[Figure 2: (a) Gt; (b) Dc; (c) Gt ∪ Dc; (d) Legend: TP pixel, FP pixel, FN pixel, TN pixel]
Fig. 2. Illustration of TP (green), FP (red) and FN (blue) points. In (b), Dc is contaminated with 6 FPs and 4 FNs, illustrated with colors in (c).

TABLE I
LIST OF ERROR MEASURES INVOLVING ONLY STATISTICS.

• Binary noise-to-signal ratio [58]: BSNR(Gt, Dc) = √( |Dc| / (FP + FN) );  no parameters.
• Complemented Performance measure [51][19]: Pm*(Gt, Dc) = 1 − TP / (TP + FP + FN);  no parameters.
• Segmentation Success Ratio [55]: SSR(Gt, Dc) = 1 − TP² / (|Gt| · |Dc|);  no parameters.
• Complemented Φ measure [56]: Φ*(Gt, Dc) = 1 − (TPR · TN) / (TN + FP);  no parameters.
• Complemented χ² measure [60]: χ²*(Gt, Dc) = 1 − ((TPR − TP − FP) / (1 − TP − FP)) · ((TP + FP + FPR) / (TP + FP));  no parameters.
• Complemented Fα measure [39]: Fα*(Gt, Dc) = 1 − (PREC · TPR) / (α · TPR + (1 − α) · PREC), with PREC = TP / (TP + FP);  parameter: α ∈ ]0; 1].
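Several entries of Table I reduce to one-line ratios of the confusion-matrix counts. A minimal Python sketch of three of them follows (the numeric example is hypothetical):

```python
import math

def bsnr(card_dc, fp, fn):
    """Binary noise-to-signal ratio [58]: sqrt(|Dc| / (FP + FN))."""
    return math.sqrt(card_dc / (fp + fn))

def perf_complement(tp, fp, fn):
    """Complemented Performance measure: 1 - TP / (TP + FP + FN)."""
    return 1 - tp / (tp + fp + fn)

def ssr(tp, card_gt, card_dc):
    """Segmentation Success Ratio: 1 - TP^2 / (|Gt| . |Dc|)."""
    return 1 - tp ** 2 / (card_gt * card_dc)

# A perfect detection (Dc == Gt with 48 edge pixels, no errors) scores 0
# for both normalized measures.
print(perf_complement(tp=48, fp=0, fn=0), ssr(tp=48, card_gt=48, card_dc=48))
# -> 0.0 0.0
```

Both Pm* and SSR decrease with improved detection quality, consistent with the convention used throughout the paper (0 = perfect segmentation).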

Thus, we present a list of statistical measures (others are detailed in [3]), and a good-quality edge detection method should obtain the smallest response for the three following indicators [2][9]:

  Over-detection error: Over(Gt, Dc) = FP / (|I| − |Gt|),
  Under-detection error: Under(Gt, Dc) = FN / |Gt|,
  Localization-error [29]: Loc(Gt, Dc) = (FP + FN) / |I|.

One of the pioneer works in quantitative edge evaluation, reported by Deutsch and Fram [12], computed two parameters P1 and P2. The first evaluates the distribution of false positive points in function of the number of columns containing edge points, whereas P2 quantifies the edge detection by counting the missing points for each line of Dc. Thus, the P1 and P2 parameters are given by:

  P1(Gt, Dc) = 1 − nsig / ( nsig + (nnoise + FP) · (wstan · |Gt|) / (w1 · |I|) ),   (1)

and

  P2(Gt, Dc) = 1 − [ nr/w2 − (1 − (1 − nnoise/|Gt|)^w1) ] / (1 − nnoise/|Gt|)^w1,   (2)

with:

  nnoise = FP · |Gt| / (TN + FP),
  nsig = (TP − nnoise) / (1 − nnoise),

where nr and w1 represent respectively the number of rows and columns in Dc which contain at least one edge point (FP or TP point). Also, wstan corresponds to the number of columns of Gt (which is the same as Dc), and w2 is the number of rows of the envelope rectangle containing Dc. These two parameters are close to 0 when the segmentation is efficient and tend towards one for poor detection. However, P1 and P2 would be more suited for the evaluation of vertical edges (see Fig. 3). On the contrary, the score of P1 can be over 1 when w1 ≪ wstan, as in Fig. 5. Moreover, concerning P2, when nr = w2, then nr/w2 = 1 and P2(Gt, Dc) = 1 − (1 − nnoise/|Gt|)^w1 / (1 − nnoise/|Gt|)^w1 = 0, translating, wrongly, a perfect segmentation, as in Fig. 3 (b), (c), (e), (f) and Fig. 5.

[Figure 3: contour images (a) T; (b) C; (c) L; (d) C1; (e) C2; (f) C3; (g) C4, with:
P1(L, C) = 1.0244, P1(C, L) = 1.200, P1(C, C1) = 0; P2(L, C) = 0, P2(C, L) = 0, P2(C, C1) = 0.1429;
P1(C, C2) = 0.091, P1(C, C3) = 0.091, P1(C, C4) = 0.121; P2(C, C2) = 0, P2(C, C3) = 0, P2(C, C4) = 0.1654.]
Fig. 3. Computation of parameters P1 and P2 using contour images 7×7. The C image in (b) is considered as the Gt image. P1 penalizes Dc having FPs whereas P2 penalizes vertical contours with hole(s). In (d), P1(C, C1) = 0 because P1 records only FPs. Note in (a) that P1(L, C) ≠ P1(C, L).

Several edge detection evaluations involving confusion matrices are presented in Table I. Only TPs are an indicator, as for SSR; this measure is normalized in function of |Gt| · |Dc| and decreases with improved quality of detection, with SSR = 0 qualifying a perfect segmentation.

The Binary noise-to-signal ratio [58] BSNR (see Table I) is inspired by the Signal-to-Noise Ratio (SNR). It corresponds to a measure comparing the level of a signal in a reference signal to the level of the noise in a desired signal. Thus, BSNR computes a global error in function of the FPs and FNs; the fewer the numbers of FPs and FNs, the more the score increases, tending to infinity when Gt = Dc.

Finally, the complemented Performance measure Pm* presented in Table I considers directly and simultaneously the three entities TP, FP and FN to assess a binary image [19]. The measure is normalized and decreases with improved quality of detection, with Pm* = 0 qualifying a perfect segmentation.

By combining FP, FN, TP and TN, another way to display evaluations is to create Receiver Operating Characteristic (ROC) curves [7] or Precision-Recall (PR) curves [39], involving True Positive Rates (TPR) and False Positive Rates (FPR):

  TPR = TP / (TP + FN),
  FPR = FP / (FP + TN).

Derived from TPR and FPR, the three measures Φ, χ² and Fα (detailed in Table I) are frequently used in edge detection assessment. Using the complement of these measures, a score close to 0 indicates good segmentation, whereas a score close to 1 indicates poor segmentation. Among these three measures, Fα remains the most stable because it does not consider the TNs, which are dominant in edge maps. Indeed, taking into consideration TN in Φ and χ² strongly influences the measurement (as is the case in huge images).

These measures evaluate the comparison of two edge images, pixel per pixel, tending to severely penalize an (even slightly) misplaced contour. Furthermore, they depend heavily on the reference image Gt, which could be interpreted in different ways, as the different contours presented in Fig. 1 show. So statistical measures do not indicate significant enough variations of the desired contour shapes in the course of an evaluation (as illustrated in Fig. 5). As this penalization tends to be too severe, some evaluations resulting from the confusion matrix recommend incorporating spatial tolerance, particularly for the assimilation of TPs [48][7][39]. This inclusion could be carried out by a distance threshold or a dilation of Dc and/or Gt, as in [10] (see Eq. (58)). Such a strategy of assimilation leads to counting several near contours as stripes parallel to the desired boundary (issued from the edge detector itself or a blur/texture in the original image). For example, awarding a spatial tolerance for the FPs will ensure the same score for an over-segmentation near the edges (experiments illustrated in Fig. 13), if the spatial tolerance stays greater than the FP distances from the true pixel. Tolerating a distance from the true contour and integrating several TPs for one detected contour are opposite to the principle of unicity in edge detection expressed by the 3rd Canny criterion: an optimal edge detector must produce a single response for one contour [8]. Thus, from the discussion above, only one FP or one TP should be considered in the boundary detection evaluation process. Finally, to perform an edge evaluation, the assessment should penalize a misplaced edge point proportionally to the distance from its true location.

[Figure 4: (a) Vertical edge lines with false positive points; (b) FoM scores; (c) Vertical edge line with false negative points; (d) FoM scores; (e) Legend: FoM with κ = 1, 1/2, 1/3, 1/4, 1/5, 1/6, 1/7, 1/8, 1/9.]
Fig. 4. Evolution of FoM in function of the distance of the false positive/negative points and the κ parameter. A vertical line of false positive points (a) or false negative points (c) is shifted by a maximum distance of 16 pixels and the measured scores are plotted in function of the displacement of this distance.

III. ASSESSMENT INVOLVING DISTANCES OF MISPLACED PIXELS

A reference-based edge map quality measure requires that a displaced edge should be penalized in function not only of FPs and/or FNs but also of the distance from the position where it should be located. Table II reviews the most relevant measures in the literature together with a new one called Ψ. Some distance measures are specified in the evaluation of over-segmentation (i.e. presence of FPs) and others in the assessment of under-segmentation (i.e. missing ground truth points). A complete edge detection evaluation measure takes into account both under- and over-segmentation assessment; the studied measures are detailed in the following subsection.

A. Existing quality measures involving distances

The common feature between these evaluators corresponds to the error distance dGt(p) and/or dDc(p). Indeed, for a pixel belonging to the desired contour p ∈ Dc, dGt(p) represents the minimal distance between p and Gt. On the contrary, if a pixel p belongs to the ground truth Gt, dDc(p) is the minimal distance between p and Dc. Mathematically, denoting (xp, yp) and (xt, yt) the pixel coordinates of two points p and t respectively, dGt(p) and dDc(p) are described by:

TABLE II
LIST OF ERROR MEASURES COMPARED IN THIS WORK. IN THE LITERATURE, THE MOST COMMON VALUES ARE k = 1 OR k = 2.

• Figure of Merit (FoM) [1]:
  FoM(Gt, Dc) = 1 − (1/max(|Gt|, |Dc|)) · Σ_{p∈Dc} 1/(1 + κ·d²_Gt(p));  parameter: κ ∈ ]0; 1].
• FoM of over-segmentation [52]:
  FoMe(Gt, Dc) = 1 − (1/max(e − FP, FP)) · Σ_{p∈Dc∩¬Gt} 1/(1 + κ·d²_Gt(p));  parameter: κ ∈ ]0; 1].
• FoM revisited [47]:
  F(Gt, Dc) = 1 − (1/(|Gt| + β·FP)) · Σ_{p∈Gt} 1/(1 + κ·d²_Dc(p));  parameters: κ ∈ ]0; 1] and β ∈ R+.
• Combination of FoM and statistics [5]:
  d4(Gt, Dc) = (1/2) · √[ ((TP − max(|Gt|, |Dc|))² + FN² + FP²) / (max(|Gt|, |Dc|))² + FoM²(Gt, Dc) ];  parameter: κ ∈ ]0; 1].
• Edge map quality measure [43]:
  Dp(Gt, Dc) = (1/(2·(|I| − |Gt|))) · Σ_{p∈Dc} (1 − 1/(1 + κ·d²_Gt(p))) + (1/(2·|Gt|)) · Σ_{p∈Gt} (1 − 1/(1 + κ·d²_{Gt∩Dc}(p)));  parameter: κ ∈ ]0; 1].
• Yasnoff measure [59]:
  Υ(Gt, Dc) = (100/|I|) · √( Σ_{p∈Dc} d²_Gt(p) );  no parameters.
• Hausdorff distance [24]:
  H(Gt, Dc) = max( max_{p∈Dc} dGt(p), max_{p∈Gt} dDc(p) );  no parameters.
• Distance to Gt [46][24][13][31]:
  D^k(Gt, Dc) = (1/|Dc|) · ( Σ_{p∈Dc} d^k_Gt(p) )^{1/k}, k = 1 for [46];  parameter: k ∈ R+.
• Maximum distance [13]:
  f2d6(Gt, Dc) = max( (1/|Dc|) · Σ_{p∈Dc} dGt(p), (1/|Gt|) · Σ_{p∈Gt} dDc(p) );  no parameters.
• Over-segmentation [20][40]:
  Θ(Gt, Dc) = (1/FP) · Σ_{p∈Dc} (dGt(p)/δTH)^k;  for [40]: k ∈ R+ and δTH ∈ R*+; k = δTH = 1 for [20].
• Under-segmentation [20][40]:
  Ω(Gt, Dc) = (1/FN) · Σ_{p∈Gt} (dDc(p)/δTH)^k;  for [40]: k ∈ R+ and δTH ∈ R*+; k = δTH = 1 for [20].
• Baddeley's Delta Metric [2]:
  Δ^k(Gt, Dc) = ( (1/|I|) · Σ_{p∈I} |w(dGt(p)) − w(dDc(p))|^k )^{1/k};  parameters: k ∈ R+ and a convex function w: R → R.
• Symmetric distance [13][31]:
  S^k(Gt, Dc) = ( ( Σ_{p∈Dc} d^k_Gt(p) + Σ_{p∈Gt} d^k_Dc(p) ) / |Dc ∪ Gt| )^{1/k}, k = 1 for [13];  parameter: k ∈ R+.
• Magnier et al. measure [37]:
  Γ(Gt, Dc) = ((FP + FN)/|Gt|²) · √( Σ_{p∈Dc} d²_Gt(p) );  no parameters.
• Symmetric distance measure (proposed):
  Ψ(Gt, Dc) = ((FP + FN)/|Gt|²) · √( Σ_{p∈Gt} d²_Dc(p) + Σ_{p∈Dc} d²_Gt(p) );  no parameters.
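To make the last two rows of Table II concrete, here is a small Python sketch of Γ and of the proposed Ψ, with edge maps represented as sets of pixel coordinates (this representation and the `min_dist` helper are choices of this sketch, not of the paper):

```python
import math

def min_dist(p, pts):
    """Minimal Euclidean distance from pixel p to the point set pts."""
    return min(math.dist(p, t) for t in pts)

def gamma(gt, dc):
    """Magnier et al. measure [37]:
    (FP + FN)/|Gt|^2 * sqrt(sum_{p in Dc} d_Gt(p)^2),
    with gt, dc given as sets of (row, col) edge coordinates."""
    fp, fn = len(dc - gt), len(gt - dc)
    s = sum(min_dist(p, gt) ** 2 for p in dc)
    return (fp + fn) / len(gt) ** 2 * math.sqrt(s)

def psi(gt, dc):
    """Proposed symmetric measure: also accumulates the FN distances d_Dc."""
    fp, fn = len(dc - gt), len(gt - dc)
    s = sum(min_dist(p, gt) ** 2 for p in dc) \
      + sum(min_dist(p, dc) ** 2 for p in gt)
    return (fp + fn) / len(gt) ** 2 * math.sqrt(s)

# A perfect detection scores 0 for both measures.
G = {(r, 3) for r in range(5)}
print(gamma(G, G), psi(G, G))  # -> 0.0 0.0
```

Because Ψ also sums the squared distances of the missed ground-truth pixels, a candidate map that merely shifts the contour is penalized more by Ψ than by Γ.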

  for p ∈ Dc: dGt(p) = Inf{ √((xp − xt)² + (yp − yt)²), t ∈ Gt },
  for p ∈ Gt: dDc(p) = Inf{ √((xp − xt)² + (yp − yt)²), t ∈ Dc }.

These distances refer to the Euclidean distance, even though some authors include other types of distance, see [31]. The measures presented in Table II are tuned in function of the various parameters referenced in Table IV. After making a theoretical comparison of all these edge detection assessments and carrying out an in-depth comparison of their parameters, the experimental results are presented.

1) Figure of Merit and its derivatives: First, to achieve a quantitative index of edge detector performance, one of the most popular descriptors is the Figure of Merit (FoM). This distance measure ranges from 0 to 1, where 0 corresponds to a perfect segmentation [1]. The constant value κ is fixed at 1/9, 1/4 or 1/10 (the last value is the one used in the experiments of this paper). The closer κ is to 1, the more FoM tackles FPs, as illustrated in Fig. 4. Widely utilized for comparing several different segmentation methods, in particular thanks to its normalization criterion, this assessment approach nonetheless suffers from a main drawback. Whenever FNs are created, for example a contour chain (even a long one) which is not totally extracted, the distances of the FNs (dDc(p)) are not recorded. Indeed, FoM can be rewritten as:

  FoM(Gt, Dc) = 1 − [ Σ_{p∈Dc∩Gt} 1/(1 + κ·d²_Gt(p)) + Σ_{p∈Dc∩¬Gt} 1/(1 + κ·d²_Gt(p)) ] / max(|Gt|, |Dc|),

hence:

  FoM(Gt, Dc) = 1 − ( TP + Σ_{p∈Dc∩¬Gt} 1/(1 + κ·d²_Gt(p)) ) / max(|Gt|, |Dc|),   (3)

because, for p ∈ Dc ∩ Gt, d²_Gt(p) = 0 and 1/(1 + κ·d²_Gt(p)) = 1. Knowing that TP = |Gt| − FN, for the extreme cases, the FoM measure takes the following values:

  if FP = 0: FoM(Gt, Dc) = 1 − TP/|Gt|,
  if FN = 0: FoM(Gt, Dc) = 1 − (1/max(|Gt|, |Dc|)) · ( |Gt| + Σ_{p∈Dc∩¬Gt} 1/(1 + κ·d²_Gt(p)) ).

Consequently, FoM counts only TPs to penalize FN points (Fig. 4 (d)) whereas only distances of FPs are recorded (Fig. 4 (b)). Moreover, for FP > 0, as 1/(1 + κ·d²_Gt(p)) < 1, it can easily be demonstrated that the FoM measure penalizes over-detection far less than under-detection. The curve in Fig. 6 shows that when FN > FP, the penalization of missing points (FNs) becomes higher, whereas it is weak when FN < FP. Incidentally, FoMe represents an extension of FoM by counting only FPs [52], strictly evaluating the over-segmentation. In fact, it computes a mean of 1/(1 + κ·d²_Gt(p)) over all the FPs. Consequently, in the presence or absence of FNs, a contour image is considered by the FoMe criterion as correctly segmented. The experiments in Fig. 6, Fig. 8 and Section IV illustrate this undesirable behavior. Some improvements have been developed, such as F and d4. In order to overcome the gaps in FoM, F computes the distances of FNs (via dDc) and the measure is tuned by the number of FPs and the cardinality of Gt (choosing β = 1 [47]).

[Figure 5: (a) Gt, |Gt| = 48; (b) D1, FP = 54, |Dc| = 54; (c) D2, FP = 54, |Dc| = 54; (d) Error distance: Gt vs. D1; (e) Error distance: Gt vs. D2. Negative values (in blue) represent dDc while positive values correspond to dGt.]

Scores obtained for Fig. 5:
Over(Gt, D{1,2}) = 0.12        Under(Gt, D{1,2}) = 1
Loc(Gt, D{1,2}) = 0.24         BSNR(Gt, D{1,2}) = 0.73
Pm*(Gt, D{1,2}) = 1            Φ*(Gt, D{1,2}) = 1
χ²*(Gt, D{1,2}) = 0.98         Fα=0.5*(Gt, D{1,2}) = 1
P1(Gt, D1) = 16.25             P1(Gt, D2) = 1.63
P2(Gt, D1) = 0                 P2(Gt, D2) = 0
FoM(Gt, D1) = 0.22             FoM(Gt, D2) = 0.60
FoMe(Gt, D1) = 0.22            FoMe(Gt, D2) = 0.60
F(Gt, D1) = 0.63               F(Gt, D2) = 0.75
d4(Gt, D1) = 0.84              d4(Gt, D2) = 0.89
Dp(Gt, D1) = 0.51              Dp(Gt, D2) = 0.54
SFoM(Gt, D1) = 0.25            SFoM(Gt, D2) = 0.40
MFoM(Gt, D1) = 0.29            MFoM(Gt, D2) = 0.40
Υ(Gt, D1) = 12.99              Υ(Gt, D2) = 37.34
H(Gt, D1) = 1.41               H(Gt, D2) = 7.67
H5%(Gt, D1) = 1.41             H5%(Gt, D2) = 6.71
Dk=1(Gt, D1) = 1.06            Dk=1(Gt, D2) = 3.05
Dk=2(Gt, D1) = 0.16            Dk=2(Gt, D2) = 0.48
f2d6(Gt, D1) = 1.06            f2d6(Gt, D2) = 3.05
ΘδTH=1(Gt, D1) = 1.19          ΘδTH=1(Gt, D2) = 12.26
ΘδTH=5(Gt, D1) = 0.05          ΘδTH=5(Gt, D2) = 0.49
ΩδTH=1(Gt, D1) = 1.04          ΩδTH=1(Gt, D2) = 5.12
ΩδTH=5(Gt, D1) = 0.04          ΩδTH=5(Gt, D2) = 0.20
Δk(Gt, D1) = 0.96              Δk(Gt, D2) = 2.31
SDk=1(Gt, D1) = 1.04           SDk=1(Gt, D2) = 2.57
SDk=2(Gt, D1) = 1.05           SDk=2(Gt, D2) = 2.98
Γ(Gt, D1) = 0.34               Γ(Gt, D2) = 0.57
Ψ(Gt, D1) = 0.46               Ψ(Gt, D2) = 0.72
KPIΓ(Gt, D1) = 0.16            KPIΓ(Gt, D2) = 0.27
KPIΨ(Gt, D1) = 0.22            KPIΨ(Gt, D2) = 0.37

Fig. 5. Results of evaluation measures. For the two candidate images, the number of FPs and the number of FNs are the same: FPs: |D1 ∩ ¬Gt| = |D2 ∩ ¬Gt| = 54 and FNs: |¬D1 ∩ Gt| = |¬D2 ∩ Gt| = |Gt| = 48. Also, D1 ∩ Gt = D2 ∩ Gt = ∅, so TP = 0 and SSR(Gt, D1) = SSR(Gt, D2) = 1. The assessments involving distances of FPs and/or FNs heavily penalize Dc including misplaced points with greater distances.

[Figure 6: (a) Image Gt, |Gt| = 21; (b) Image L1, |Dc| = 31, TP = 21, FP = 10, FN = 0; (c) Image L2, |Dc| = 10, TP = 0, FP = 10, FN = 21; (d) Scores of FoM and its derived measures (FoM, FoMe, F, d4, SFoM, MFoM, Dp, all with κ = 0.1) plotted against the % of FNs.]
Fig. 6. Results of FoM and its derived measures in function of the % of FNs, comparing Gt and L1. When the number of FNs attains 100%, the candidate edge image corresponds to L2.
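Equation (3) translates directly into code; a minimal Python sketch of FoM over coordinate sets follows (the set representation is a choice of this sketch):

```python
def fom(gt, dc, kappa=0.1):
    """Figure of Merit [1]: 1 - (1/max(|Gt|,|Dc|)) *
    sum_{p in Dc} 1/(1 + kappa * d_Gt(p)^2); 0 means perfect segmentation.
    gt and dc are sets of (row, col) edge coordinates."""
    total = 0.0
    for p in dc:
        # squared minimal Euclidean distance of p to the ground truth
        d2 = min((p[0] - t[0]) ** 2 + (p[1] - t[1]) ** 2 for t in gt)
        total += 1 / (1 + kappa * d2)
    return 1 - total / max(len(gt), len(dc))

# Identical maps give FoM = 0 (perfect detection); shifting the whole
# vertical line by one pixel yields a small but non-zero penalty.
G = {(r, 3) for r in range(5)}
D = {(r, 4) for r in range(5)}
print(fom(G, G))            # -> 0.0
print(round(fom(G, D), 4))  # -> 0.0909
```

As the text above explains, the loop runs only over Dc, so missed ground-truth pixels contribute nothing beyond the loss of their TP term: the distances of FNs are never accumulated.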

TABLE III
BEHAVIOR OF THE d4 MEASURE IN FUNCTION OF DIFFERENT SEGMENTATION LEVELS, NOTING M = (max(|Gt|, |Dc|))².

Segmentation level    FoM    (TP − max(|Gt|,|Dc|))²/M    FN²/M    FP²/M
Good                  ≈0     ≈0                          ≈0       ≈0
FP ↗, FN = 0         >0     >0                          0        >0
FP = 0, FN ↗         >0     >0                          >0       0
FP ↗, FN ↗          >0     >0                          >0       >0

As pointed out through Eq. 3 for FoM and illustrated in Fig. 6, F behaves inversely to FoM concerning FP/FN points: F is more sensitive to FPs than FNs. Also, d4 represents another enhancement; this edge measure depends particularly on TP, FP, FN and FoM, and d4 is normalized with the 1/2 coefficient [5]. Nonetheless, even though d4 behaves correctly for the experiment in Fig. 6, this measure focuses on FPs and penalizes FNs like the FoM measure, as detailed in Table III. This measure suffers from two main drawbacks. On the one hand, the sum of all its terms yields a high sensibility to edge displacements. On the other hand, in the case of a pure under-segmentation, when FN tends towards |Gt|, d4 belongs to [0.7, 0.85] but does not attain the score of 1.

As described here, an effective measure for the evaluation of the edge detection is not only directed by dGt or dDc. Thus, inspired by f2d6 (see Table II and [13]), another way to avoid the computation of only the distance of FPs in FoM (or only the distance of FNs in F) is to consider the combination of both FoM(Gt, Dc) and FoM(Dc, Gt), as in the following two formulas:

• Symmetric Figure of Merit:
  SFoM(Gt, Dc) = (1/2) · FoM(Gt, Dc) + (1/2) · FoM(Dc, Gt),   (4)
• Maximum Figure of Merit:
  MFoM(Gt, Dc) = max( FoM(Gt, Dc), FoM(Dc, Gt) ).   (5)

As FoM is a normalized measure, SFoM and MFoM are also normalized. Finally, SFoM and MFoM take into account both distances of FNs (i.e. dDc) and FPs (i.e. dGt), so they can compute a global evaluation of a contour image. Nevertheless, Fig. 6 shows that SFoM behaves like FoM when FN > FP and that MFoM is not monotonous for this experiment (whereas only FNs are added).

2) Hausdorff distance and its enhancements: A second measure widely computed in matching techniques is represented by the Hausdorff distance H. In object recognition [22], the algorithm aims to minimize H, which measures the mismatch of two sets of points [24][45]. This max-min distance could be strongly deviated by only one pixel positioned sufficiently far from the pattern (illustrated in Fig. 8); the measured distance then becomes that between the pattern and the (erroneous) point, disturbing the score of H. To improve the measure such that H becomes less sensitive to outliers, one idea is to compute H with a proportion of the maximum distances (for example 5%, 10% or 15% of the maximum distances, with n ∈ R*+). Even though a mean percentage of the distance does not guarantee an optimized comparison, as illustrated in Fig. 7, this enhancement helps with regard to robustness, and some other modifications are proposed in [61] and in [4].

[Figure 7: (a) Image A, |Gt| = 45; (b) Image An, |Dc| = 48, TP = 45, FP = 3, FN = 0; (c) Image L, |Dc| = 32, TP = 4, FP = 28, FN = 41; (d) Distance map: A vs. An (FPs distances); (e) Distance map: A vs. L (FPs and FNs distances).]

Scores obtained for Fig. 7:
P1(A, An) = 0.01            P1(A, L) = 0.80
P2(A, An) = 0               P2(A, L) = 0
Pm*(A, An) = 0.06           Pm*(A, L) = 0.95
SSR(A, An) = 0.06           SSR(A, L) = 0.99
Φ*(A, An) = 0.01            Φ*(A, L) = 0.92
χ²*(A, An) = 0.07           χ²*(A, L) = 0.99
Fα=0.5*(A, An) = 0.03       Fα=0.5*(A, L) = 0.90
FoM(A, An) = 0.05           FoM(A, L) = 0.60
FoMe(A, An) = 0.50          FoMe(A, L) = 0.83
F(A, An) = 0.06             F(A, L) = 0.66
d4(A, An) = 0.05            d4(A, L) = 0.78
SFoM(A, An) = 0.06          SFoM(A, L) = 0.53
MFoM(A, An) = 0.06          MFoM(A, L) = 0.60
Dp(A, An) = 0.003           Dp(A, L) = 0.29
Υ(A, An) = 2.93             Υ(A, L) = 1.82
H(A, An) = 7                H(A, L) = 5.83
H5%(A, An) = 5.5            H5%(A, L) = 5.37
Dk,k=1(A, An) = 0.30        Dk,k=1(A, L) = 2.05
Dk,k=2(A, An) = 0.18        Dk,k=2(A, L) = 0.44
f2d6(A, An) = 0.30          f2d6(A, L) = 2.11
ΘδTH=1(A, An) = 4.87        ΘδTH=1(A, L) = 2.34
ΘδTH=3(A, An) = 2.89        ΘδTH=3(A, L) = 0.79
ΩδTH=1(A, An) = 0           ΩδTH=1(A, L) = 2.31
ΩδTH=3(A, An) = 0           ΩδTH=3(A, L) = 0.77
Δk(A, An) = 2.11            Δk(A, L) = 2.56
SDk,k=1(A, An) = 2.12       SDk,k=1(A, L) = 2.09
SDk,k=2(A, An) = 2.71       SDk,k=2(A, L) = 2.51
Γ(A, An) = 0.01             Γ(A, L) = 0.48
Ψ(A, An) = 0.01             Ψ(A, L) = 0.75
KPIΓ(A, An) = 0.01          KPIΓ(A, L) = 0.24
KPIΨ(A, An) = 0.01          KPIΨ(A, L) = 0.39

Fig. 7. Results of evaluation measures. Even edge evaluations involving distances can be disturbed by a few misplaced pixels and do not respect the shape of the true pattern.
values [24]); let us note Hn% this measure for n% of values Inspired by the Hausdorff distance with a view to developing
JOURNAL : MULTIMEDIA TOOLS AND APPLICATIONS, PP. 1–45, SEPTEMBER 2017 8

a new method that is robust with regard to a small number of outliers, some researchers have proposed other measures and studied their behaviors in the presence of misplaced edge points [2][13]. As Hn% remains close to the Hausdorff distance, the rank n acts as a threshold for erroneous pixels and Hn% behaves as H. As pointed out in [13], an average distance from the edge pixels in the candidate image to those in the ground truth is more appropriate for matching purposes than H and Hn%. A first proposition of this distance is Dk, which represents an error distance only as a function of dGt. Also, the Yasnoff measure, called Υ, is similar to Dk with k = 2, using a different coefficient of 100/|I| [59]. The distance measures Dk and Υ estimate the divergence of FPs; in other words, they correspond to a measure of over-segmentation. On the contrary, the sole use of a distance dDc instead of dGt enables an estimation of the FN divergences, representing an under-segmentation (as in Ω). Precisely, Θ and Ω represent two over- and under-segmentation assessment measures, where δTH is the maximum distance allowed to search for a contour point (see also distortion rates in [18]). These distance measures penalize misplaced points further than δTH from Gt [40]. Choosing δTH > 1 is equivalent to taking into account the relative position for the over- and under-segmentation and goes back to the problem of the spatial tolerance detailed in Section II. Finally, as concluded in [9], a complete and optimum edge detection evaluation measure should combine assessments of both over- and under-segmentation, as f2d6 and Sk do. Thus, the score of f2d6 corresponds to the maximum between the over- and the under-segmentation whereas the value obtained by Sk represents their mean. Moreover, Sk takes small values in the presence of a low level of outliers whereas the score becomes large as the level of mistaken points increases [13][31], but it is sensitive to remote misplaced points, as represented in Fig. 8.

Combining both dDc and dGt, Baddeley's Delta Metric (Δk) [2] is a measure derived from the Hausdorff distance which is intended to estimate the dissimilarity between each element of two binary images. This distance measure is based on the mean difference between the two compared images and is useful in region segmentation [26]. For this measure, w represents a weighting concave function, in general w(x) = √x or w(x) = min(√(m² + n²), x) for an image of size m×n, with m and n ∈ N* [30] (in our experiments, we use w(x) = x). Compared to automatic threshold algorithms such as [41], the threshold at which the edge detector obtains the best edge map is more appropriate when it is computed by the minimum value of Δk [14]. The main drawback of Δk is its hypersensitivity to false positive points, i.e. this measure tends to over-penalize images with false detections. Indeed, when a false positive pixel is far from the true edge, the |w(dGt(p)) − w(dDc(p))| value creates a high impact on the evaluation, thus penalizing the measure (as in the example in Fig. 8).

Another way to compute a global measure is presented in [43] with the normalized edge map quality measure Dp. In fact, this distance measure is similar to SFoM, with different coefficients. The over-segmentation measure (left term) evaluates dGt, the distances between the FPs and Gt. The under-segmentation measure (right term) computes the distances of the FNs to the closest correctly detected edge pixels, i.e. Gt ∩ Dc. That means that FNs and their distances are not counted without the presence of TP(s), and Dp is sensitive to displacements of edges. Moreover, both the left and the right terms are composed of a 1/2 coefficient, so in the presence of only under- or over-segmentation, the score of Dp does not go above 1/2.

3) A new edge detection measure evaluation: In [37], a normalized measure is developed after computation of the edge assessment Γ. The Γ function represents an over-segmentation measure which also depends on FN and FP. As this measure is not sufficiently efficient concerning FNs, Ψ is an alternative function which also considers dDc for false negative points. Inspired by Sk, Ψ uses different coefficients which change the behavior of the measure, as discussed in the next subsection.

B. On the importance of the coefficients

As shown in Table IV, distance measures compute the evaluation using a coefficient mostly including: |Gt|, |Dc|, |Gt ∪ Dc|, FN or FP. Concerning the Υ distance measure, the coefficient 100/|I| compresses the measurement, especially when the image is large (as demonstrated in Fig. 12). For Dp, the coefficient 1/(|I| − |Gt|) affects the measure concerning the false positive term, thereby creating an insensitivity to FPs because, generally, |Gt| ≪ |I| and 1/(|I| − |Gt|) ∼ 1/|I| ∼ 0+. Thus, the assessment of FNs is given more weight in the edge evaluation, strongly penalizing edge displacements. The experiments presented in the next section illustrate this drawback (Fig. 12). However, Δk uses |I| as denominator term because the mean distance is computed over all the pixels of the image.

The authors of Γ have studied the influence of the coefficient in different concrete cases [37]. Actually, using only |Gt| or |Dc| penalizes the measurement severely when one misclassified pixel is placed at a significant distance from its true position.

TABLE IV
PARAMETERS AFFECTING THE ERROR DISTANCE MEASURES.

Measure    |Gt|   |Dc|   dGt   dDc   Other
FoM         X      X      X
FoMe        X                        FP
F           X                  X     FP
d4          X             X          TP, FP, FN
MFoM        X      X      X    X
SFoM        X      X      X    X
Dp          X             X          |I|, p ∈ Gt: d(Gt∩Dc)(p)
Υ                         X          |I|
H, Hn%                    X    X
Dk                 X      X
Θ                         X          FP
Ω                              X     FN
Δk                        X    X     |I|
f2d6        X      X      X    X
Sk                        X    X     |Gt ∪ Dc|
Γ           X             X          FP + FN
Ψ           X             X    X     FP + FN
[Fig. 8: (a) Ground truth with |Gt| = 20; (b) T1 with 19 FPs at a distance of 2 pixels; (c) distance map from (a) vs. (b); (d) T2 with 1 FP at a distance of 15 pixels; (e) distance map from (a) vs. (d). The FN and FP distances are marked by arrows. The quantified errors are:]

     |Dc|  √(Σ_{p∈Gt} d²Dc(p) + Σ_{p∈Dc} d²Gt(p))  FN  FP  FoMe   H   H20%  Δk    Sk    A      B      Ψ
T1   100   12.21                                   19  19  0.5    2   2     1.91  1.93  0.031  0.031  1.15
T2   100   15.03                                   1   1   0.983  15  4     4.78  2.38  0.038  0.038  0.075

Fig. 8. Errors quantified by FoMe, H, H5%, Δk, Sk (k = 2), A, B and Γ for two different candidate edge images compared to the same ground truth.
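The behavior quantified in Fig. 8 follows directly from the definitions: H takes the maximum of the two directed max-min distances, and the Hn% variant replaces each maximum by a rank statistic so that a single remote outlier no longer dominates. A minimal sketch (Python with SciPy; the percentile convention is one plausible reading of [24], not necessarily the exact implementation used in the paper):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def hausdorff(gt, dc, keep=1.0):
    """Hausdorff distance between two boolean edge maps.

    keep < 1 gives an 'H_n%'-style variant: the worst (1 - keep) fraction
    of the point-to-set distances is discarded before taking the maximum,
    which damps the influence of isolated outliers.
    """
    gt = gt.astype(bool)
    dc = dc.astype(bool)
    d_to_dc = distance_transform_edt(~dc)[gt]  # each Gt pixel -> nearest Dc pixel
    d_to_gt = distance_transform_edt(~gt)[dc]  # each Dc pixel -> nearest Gt pixel
    q = 100.0 * keep                           # keep=1.0 reduces to the plain maximum
    return max(np.percentile(d_to_dc, q), np.percentile(d_to_gt, q))
```

With `keep=1.0` the single far-away pixel of T2 in Fig. 8 fixes the whole score, while a smaller `keep` discards it, which is exactly the robustness trade-off discussed above.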
For a counterexample, let us consider the following formula:

  A(Gt, Dc) = √( Σ_{p∈Gt} d²Dc(p) + Σ_{p∈Dc} d²Gt(p) ) / |Gt|².    (6)

A represents an evaluation measure of contour detection involving both dDc and dGt plus a coefficient term 1/|Gt|². Now, consider two images (images (b) and (d) in Fig. 8) with identical ground truth points Gt (Fig. 8(a)). In image (d), a single FP point at a sufficiently high distance behaves for A like several FP points in another image; these images have the same interpretation in terms of the A measure, which is not the case with ratios 1/10 and 9/10. The table in Fig. 8 illustrates, with a simple example, this drawback for A, Δk and Sk, and a similar example can be found in Fig. 7.

Close to A, the following measure B depends strongly on the desired contour number |Dc|:

  B(Gt, Dc) = √( Σ_{p∈Gt} d²Dc(p) + Σ_{p∈Dc} d²Gt(p) ) / |Dc|².    (7)

B suffers from the same problem as A. Furthermore, in general, a measure depending mainly on a coefficient 1/|Dc| (or a coefficient 1/|I|) evaluates a boundary image with a value close to 0 while Dc stays totally misplaced, especially when |Dc| is huge (when |Dc| is huge, it corresponds to a total saturation of the desired contour with an inconsistent number of FPs).

Finally, the authors concluded in [37] that such a formulation must take into consideration all observable and theoretically observable cases, as illustrated in Fig. 8. That means that an effective measure has to take into account all the following input parameters: |Gt|, |Dc|, FN and FP, whereas the image dimensions should not be considered. Thus, the coefficient parameter (FP + FN)/|Gt|² seems a good compromise for an evaluation measure of edge maps and has been introduced into the new formula of assessment Ψ.

C. Normalization of the edge detection evaluation

In order to compare each boundary detection assessment, all the measures must be normalized, and must also indicate the same information: an error measure close to 1 means poor segmentation whereas a value close to 0 indicates good segmentation. The values of FoM, FoMe, F, d4, MFoM, SFoM and Dp comply with this [0, 1] condition. However, concerning the other distance measures in Table II, normalization is required. Introduced in [37], a formula called Key Performance Indicator (KPI), with KPI ∈ [0, 1], gives a value close to 1 for a poor segmentation. Alternatively, a KPI value close to 0 indicates a good segmentation. The Key Performance Indicator is defined using the following equation:

  KPIu : [0; ∞[ → [0; 1[
             u ↦ 1 − 1/(1 + u^h),    (8)

where the parameter u is replaced by a distance error and h is a constant such that h ∈ R+*.

A key parameter of the KPI formula is the power of the denominator term, called h. It may be called a power of observation. Inasmuch as KPI depends on its value, it evolves more or less quickly around 0.5 and embodies a range of observable cases. Average values have been determined for the error distance term √(Σ d²Dc + Σ d²Gt) concerning Ψ and √(Σ d²Gt) concerning Γ. The choice of values between 1 and 2 can be easily checked. Otherwise, the more abrupt the KPI evolution, the less the transition between 0.5 and 1 is marked (i.e. the slope of the KPI curve, for example h = 1 in Fig. 9). Moreover, fixing h = 1, KPI stagnates far from 1 when √(Σ d²Dc + Σ d²Gt) or √(Σ d²Gt) becomes high. Additionally, when h = 2, KPI starts to increase slowly, then the slope becomes sharp around 0.5 and converges quickly towards 1.

[Fig. 9: curves of KPI for h = 1, h = 2 and h = φ in function of √(Σ d²).]

Fig. 9. Evolution of the KPIΨ in function of the mistake points distance, for different powers h, with FN + FP = 4000 and |Gt| = 2200.
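Equation (8) is straightforward to implement. The short sketch below (Python; an illustrative transcription of the formula, with the error-distance term left as a plain argument) shows how the power h shapes the transition discussed around Fig. 9.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2  # the power h = φ advocated in the text

def kpi(u, h=GOLDEN_RATIO):
    """Key Performance Indicator of Eq. (8): KPI(u) = 1 - 1/(1 + u**h).

    u >= 0 is a distance-error term (e.g. sqrt(sum d_Dc^2 + sum d_Gt^2)
    for Psi, or sqrt(sum d_Gt^2) for Gamma); h > 0 controls how abruptly
    the score moves from 0 (good segmentation) towards 1 (poor one).
    """
    return 1.0 - 1.0 / (1.0 + u ** h)
```

Note that kpi(0) = 0 and kpi(1) = 0.5 for any h; below u = 1 a larger h penalizes less, while above u = 1 it converges faster towards 1, which matches the discussion of the h = 1, h = 2 and h = φ curves.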
[Fig. 10: three evolution surfaces, for |Gt| = 200, |Gt| = 2000 and |Gt| = 5000.]

Fig. 10. Evolution of the KPI in function of the erroneous points distance, the number of FPs and FNs, for different |Gt| and fixing h = φ.
Finally, the power is fixed at the golden ratio φ ≈ 1.6180339887 in order to ensure an evolution of KPI that is not too abrupt from 0 to 1 and that does not penalize too strongly when the error distances are not high (contrary to KPI with h = 1, see Fig. 9). Furthermore, Fig. 10 illustrates KPI values in function of the distance of the mistaken points and their numbers. For a small skeleton, |Gt| = 200, it shows that KPI draws near 1 quickly when the number of FPs, the number of FNs and their distances grow. Alternatively, for a bigger skeleton, KPI rises more slowly. These three evolution surfaces show the coherence and robustness of KPI regarding the starting conditions: a displaced edge is penalized in function of the number of false pixels and also of the distance from the position where it should be located. As a compromise, using the KPI formula with h = φ, the measurement neither becomes too strong in the presence of a small number of positive or negative misclassified pixels nor penalizes Dc too severely if dGt/Dc is small. Compared to Γ, Ψ improves the measurement by combining both dGt and dDc. The next section is dedicated to experimental results where candidate edge images are progressively degraded in order to study the behaviors of the presented boundary detection assessments. Note that the other measures are also normalized with another formula in the next section in order to compare all of them.

IV. COMPARISON OF EXPERIMENTAL RESULTS

The two sections above describe the main error measures concerning edge detection assessment, explaining the advantages and drawbacks of each one. The tests carried out in the experiments are intended to be as complete as possible, and thus as close as possible to reality. To that end, considering an edge model (i.e. ground truth), the edge detection evaluation measures are subjected to the following studies:
• addition of false negative points (under-segmentation),
• addition of false positive points (over-segmentation),
• addition of both false negative and false positive points,
• addition of false positive points close to the true contour,
• translation of the boundary,
• remoteness of false positive and false negative chains,
• computation of the minimum value of the measures on edge images (synthetic and real) compared to the ground truth.

Therefore, 26 measures are tested and compared with each other: statistical measures in Table I (except BSNR), distance measures in Table II, plus SFoM, MFoM and SSIM. Firstly, the normalized measures Φ*, χ2*, Pm*, Fα*, SSIM, FoM, FoMe, F, d4, SFoM, MFoM, Dp, KPIΓ and KPIΨ are plotted together. The values of SSR are not recorded because this measure behaves like Pm*. Secondly, the scores of the other measures are also normalized using the following equation so that they can be plotted together for easy visual comparison. Denoting by f ∈ [0; +∞[ the scores vector of a distance measure such that m = min(f) and M = max(f), the normalization N of a measure is computed by:

  N(f) = 0                 if M = m = 0,
         1                 if M = m ≠ 0,
         (f − m)/(M − m)   if M > 1 and m ≠ 0,    (9)
         f/M               otherwise.

Note that the parameters for each evaluation measure are indicated directly in the captions of the curves and that the Matlab code of the distance measures such as FoM, Dk, Sk and Δk is available at http://kermitimagetoolkit.net/library/code/.

A. Behavior comparison

The simulation of the degradation of the ground truth Gt is studied using synthetic data. Thus, concerning the first 6 experiments, a vertical line represents Gt in an image of size 100×100, as illustrated in Fig. 11(a). FNs, FPs or displacements corrupt the image and, as Gt corresponds to a line, the interpretation of the results remains simpler. For example, adding a false pixel to Gt or a translation of Gt can easily be represented mathematically whereas, for a more complicated shape, the translation of Gt can cross other pixels and an appropriate measure will not obtain a monotonic score. Hence, the degradations applied to the vertical lines are chosen in order to obtain a monotonic score for a complete measure. Then, the following two tests concern a ground truth which is a square, followed by the computation of the minimal value of the edge detection evaluation measure in function of the threshold.
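The normalization of Eq. (9) can be transcribed as below (Python; this is my reading of the four cases, since the typeset piecewise equation is ambiguous in the extracted text):

```python
import numpy as np

def normalize_scores(f):
    """Normalization N(f) of Eq. (9) applied to a vector of raw scores.

    Maps an unbounded (>= 0) score vector into [0, 1] so that measures
    with very different ranges can be plotted on the same graph.
    """
    f = np.asarray(f, dtype=float)
    m, M = f.min(), f.max()
    if M == m == 0:
        return np.zeros_like(f)       # constant perfect scores
    if M == m:
        return np.ones_like(f)        # constant non-zero scores
    if M > 1 and m != 0:
        return (f - m) / (M - m)      # affine rescaling to [0, 1]
    return f / M                      # simple rescaling otherwise
```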
[Fig. 11: (a) Gt, image 100×100; (b) addition of 101 FPs. Top graph: the normalized measures (Φ*, χ2*, Pm*, F*α=0.5, SSIM, FoM, FoMe, F, d4, SFoM, MFoM, Dp with κ = 0.1, KPIΓ, KPIΨ) in function of FP/|Gt|. Bottom graph: the non-normalized measures (Υ, H, H5%, Dk with k = 2, Θ and Ω with δTH = k = 1 and with δTH = 5, k = 2, Δk with k = 2, f2d6, Sk with k = 1 and k = 2).]

Fig. 11. Evolution of the dissimilarity measures in function of the number of added false positive points.

[Fig. 12: (a) elongation of the margins; (b) scores of χ2*, SSIM, Dp (κ = 0.25), Υ and Δk (k = 2) in function of the margin size.]

Fig. 12. Behaviors of dissimilarity measures in function of image size.

[Fig. 13: (a) 30% of FPs; (b) 100% of FPs; curves of the same measures as in Fig. 11, plotted in function of the percentage of over-segmentation at a distance of 5 pixels.]

Fig. 13. Evolution of the dissimilarity measures in function of the over-segmentation in the contour area.

1) Addition of false positive points: The first test consisted in randomly adding undesirable pixels to Gt, up to 100 pixels, as represented in Fig. 11(b). This perturbation is equivalent to impulsive noise. The curves in Fig. 11 indicate the response of each measure in function of the number of created FPs; they enable comparison of the results delivered by the different measures.

Firstly, concerning the measures automatically giving values between 0 and 1: χ2*, Pm*, Fα*, SSIM, FoM, F, d4, SFoM and MFoM overlap and penalize images corrupted by FPs as soon as they appear, but their values remain under 0.5 whereas the last corrupted image is composed of more FPs than TPs (see Fig. 11(b)). FoMe is not monotonic and computes a mean in function of the apparition of FPs. Φ* and Dp overlap and stay close to 0 (same remark for the under-segmentation measure Ω). Contrary to the previous measures, KPIΓ and KPIΨ ensure an evaluation which does not become too high in the presence of a small number of FPs, but penalizes Dc more severely for 50 or more undesirable points. Secondly, concerning the non-normalized measures, Θ is neither monotonic nor sensitive to FP distances, like H and H5%, which stay blocked at the highest distance values whatever the number of FPs. Furthermore, Υ, Dk, Δk, f2d6 and Sk have a coherent behavior, even though Υ, Dk, Δk and Sk (with k = 2) are sensitive to FPs at the beginning of the assessment.

This first experiment is produced by randomly adding FPs to the same Gt image. The second test is presented in the image of Fig. 12(a), which is composed of a vertical line and 100 FPs randomly placed around the line. The first image in this test is as large as the image in Fig. 11(b). In a second step, the TPs and FPs are maintained in the same position and the margins are increased, as illustrated in Fig. 12(b), influencing the evaluation in certain cases. The curves presented in Fig. 12(b) show which dissimilarity measure scores are changed by the elongation of the margins.
Note that the margins lengthen both downward and to the right at the same time. Even though the values of χ2* do not change significantly, SSIM, Υ and Δk evolve strongly. The distance measure Dp, which remains insensitive to FPs, becomes increasingly close to zero as the margin increases.

2) Over-segmentation close to the contour: The idea of this experiment is to create an over-segmentation at a maximum distance of 5 pixels, as illustrated in Fig. 13(a). Therefore, 100% of over-segmentation represents a dilation of the vertical line with a structural element of size 1×6, corresponding to a total saturation of the contour, see Fig. 13(b). The curves presented in Fig. 13 show that FoMe, Υ, Sk (k = 2), Δk and Θ (δTH = 1) are very sensitive to FPs whereas Φ*, Dk, FoMe, SSIM and Dp do not penalize Dc enough. Also, Ω stagnates at 0 because it corresponds to an under-segmentation measure. FoMe, Dk and Θ are not monotonic whereas the saturation of the contour is progressive. Note that H and H5% keep the same result throughout the test after 5% of degradation and that Δk behaves nearly the same. Finally, χ2*, Fα*, FoM, F, d4, SFoM, MFoM, KPIΓ, KPIΨ, f2d6 and Sk (k = 1) ensure a measure evolution which is not too abrupt, even though FoM stagnates around 0.35, and d4 and SFoM around 0.6.

3) Addition of false negative points: In this experiment, pixels of the vertical line are missing, creating false negative points. Fig. 14 presents the evolution of the criteria obtained by the studied segmentation measures. These curves indicate that almost all the measures show the same behavior. The over-segmentation measures FoMe, Γ, Υ, Dk and Θ stagnate at 0. The SSIM stays close to 0, whereas there are almost no remaining pixels in Dc at the end of the experiment. As this test concerns only the addition of FNs, the Dp distance measure obtains a result of 0.5 at the finish, but no higher, because it corresponds to a measure which separates under- and over-segmentation. As feared in Section III, the d4 measure does not sufficiently penalize the under-segmentation because the maximum score it attains is around 0.8. Moreover, compared to the experiment in Fig. 11, it is clear that FoM, F, d4, SFoM, MFoM and Dp are more sensitive to FN addition than to FP points.

[Fig. 14: curves of the same normalized and non-normalized measures as in Fig. 11, in function of the percentage of FNs, i.e. FN/|Gt|, with |Gt| = 100.]

Fig. 14. Evolution of the dissimilarity measures in function of the number of false negative points addition. FoM and MFoM overlap.

[Fig. 15: curves of the same measures in function of both the number of added false positive and false negative points; the depicted corrupted image corresponds to FP = FN = 50.]

Fig. 15. Evolution of the dissimilarity measures in function of both the number of false positive and false negative points addition.

[Fig. 16: curves of the same measures in function of the pixel translation, up to an edge translation of 20 pixels.]

Fig. 16. Evolution of the dissimilarity measures in function of the translation of Dc. Note that FoM, FoMe, SFoM and MFoM overlap.
[Fig. 17: (a) Gt (crop); (b) C1 (crop); (c) Gt vs. C1 (crop); (d) edges evolutions; (e) Gt (crop); (f) C2 (crop); (g) Gt vs. C2 (crop); (h) edges evolutions; (i) Gt vs. C1 and (j) Gt vs. C2: evolution of the normalized dissimilarity measures in function of the edge elongations; (k) Gt vs. C1 and (l) Gt vs. C2: evolution of the non-normalized dissimilarity measures; all plotted in function of the square size, with |I| = 2500.]

Fig. 17. Evolution of dissimilarity measures in function of the size of the original square.

The non-normalized measure scores in the lower graph of Fig. 14 evolve progressively, even though Sk (with k = 1) is less sensitive to FNs. Note that, for Δk (which overlaps with f2d6), FN distances are considered to be less detrimental than FP distances (experiments reported in Fig. 11). Finally, KPIΨ ensures an evaluation which is not too high in the presence of a small number of FNs, followed by a greater penalization when the number of FNs increases.

4) Addition of both false positive and false negative points: Some measures are specialized and have an adaptive behavior for the evaluation of only under- or over-segmentation. Nevertheless, it is interesting to study their sensitivities by combining these two degradations, because edge detectors often create both FPs and FNs in real/noisy images. Thus, randomly adding undesirable pixels to Gt (up to 100 pixels) while creating FNs at the same time gives a discontinuous line in the middle of randomly placed FPs, as illustrated in Fig. 15, bottom right.

Fig. 15 shows a hyper-sensitivity to FPs/FNs distances for FoMe, H, H5%, Θ, Sk and Δk. Moreover, Θ and Ω are not monotonic. Also, Φ*, Fα*, FoM, d4, MFoM, f2d6 and Sk (with k = 1) increase (almost) constantly from 0 to 1 whereas both FNs and FPs are added. For example, the image in Fig. 15, bottom right, illustrates the line where 50% of the correct pixels are missing and 50 FPs are added. In this precise case, the Φ*, Fα*, FoM, d4, MFoM and SFoM measures give a result close to 0.5 whereas Gt is disturbed by more than 50%, as represented by the measurement of χ2*, Pm*, KPIΓ and KPIΨ (note that Υ, Dk and Sk, with k = 2, behave the same way even though they have a sensitivity to misplaced points at the beginning of the experiment). F seems to evolve correctly but is not monotonic. Note that the SSIM remains unsuitable for this type of degradation because the best score it attains is little more than 0.5. Finally, the evolution of Dp is monotonic until 0.5 but not thereafter, due to the coefficient 1/(|I| − |Gt|) which compresses the result, whereas the final image represents a cloud of FPs without TPs (the next experiment, with edge translation, shows more clearly the influence of this coefficient).

5) Edge translation: As pointed out in Section I, a blur in the original image can shift the detected contour. When this displacement remains not too high, the shape of the desirable object stays detectable and the evaluation should not overly penalize the edge detector. Hence, edge detection evaluation

measures must be assessed in function of the translation performance of the contour detection evaluation measures,
of a true contour. Indeed, the vertical line is shifted by a one approach is to compare each measure by varying the
maximum distance of 20 pixels (Fig. 16 bottom left) and threshold of the thin edges computed by an edge detector1 .
the score of the measures is plotted in Fig. 16 in function Indeed, compared to a ground truth contour map, the ideal
of the displacement of the desired contour. Thus, regarding edge map for a measure corresponds to the desired contour
the statistical measures Φ∗ , χ2∗ , Pm ∗
and Fα∗ , the curves at which the evaluation obtains the minimum score for the
perfectly illustrate the need to consider the distances of the considered measure among the thresholded gradient images.
misplaced pixels for the edge detection evaluation. F oM , Theoretically, this score corresponds to the threshold at which
SF oM and M F oM overlap completely, but their evaluations the edge detection represents the best edge map, compared to
are correct, like KP IΓ and KP IΨ . F and d4 measures are the ground truth contour map [14][9]. Since a small threshold
sensitive to the first 2 translations. As pointed out above leads to heavy over-segmentation and a strong threshold may
with the previous test, Dp is not suited to the evaluation create numerous false negative pixels, the minimum score of
of translations, like SSIM (the SSIM computes locally a an edge detection evaluation should be a compromise between
score in a window 8×8, so the score does not change after a under- and over-segmentation.
switch of 8 pixels). The evolution of the other measures (Fig. The two images used in these experiments are a synthetic
16, bottom) perform monotonically but no information can and a real image. For the first one, in Fig. 18 (a), several
be infered because the measures are not normalized between white shapes are present immersed in a black background,
0 and 1 to evaluate the segmentation. creating step edges. To avoid the problem of edge pixel
placements, as stated in Section I, Fig. 1, a blur is created by
6) Comparison of the false positive distance and false adding a 1 pixel width of gray around each shape. Thus, the
negative distance points: In the experiment proposed in Fig. ground truth corresponds to this gray. The second image in
17, two desired contours are compared with a ground truth, Fig. 19(a) is a real image with a corresponding hand-made
illustrating the importance of considering both the distance of ground truth. Even though the problem of hand-made ground
the false negatives points (dDc ) and the distance of the false truths on real images is discussed by some researchers, only
positive points (dGt ). Indeed, a square (Gt in Fig. 17 (a)) the comparison of Dc with a Gt is studied here.
is compared with another square (Fig. 17 (b)) and also with
a shape of two edges forming an angle (Fig. 17 (f)). The 1) Edges of synthetic data: To evaluate the measures
minimum square size for Gt is 3×3 large and the image size performances, the original image in Fig. 18(a) is disturbed
stays fixed at 50×50. The shapes C1 and C2 keep the same with random Gaussian noise and edges are extracted from
position as in (c) and (g) and increase in proportion to Gt . the noisy image (Fig. 18(d) and (e)). Scores are plotted in
All the shapes grow at the same time, as represented in Figs. function of the threshold and the image under each curve
17 (d) and (h) and the scores for each measure are plotted in corresponds to the ideal edge map for the considerate measure.
Fig. 17 (i), (j), (k) and (l). Thus, the more Gt grows, the more Statistic measures and Dp correctly extract the edges at the
C1 is visually closer to Gt whereas FNs deviate strongly in price of numerous FPs. The Hausdorff distance H, H5%
the case of C2 , but keep the same percentage of FPs and FNs and ∆k threshold the edges too severely, losing the majority
for each desired contour. That is the reason why statistical of TPs. The over-segmentation distance measures Dk , θ
measures obtain the same evolutions for each shape: Φ∗ , χ2∗ , and Γ lose almost all the contours because the minimum

Pm and Fα∗ . Despite these two different edge evolutions, corresponds to the threshold where no false pixel appears
some distance measures obtain almost the same measurements (even though true pixels disappear). Finally, SSIM , f2 d6 ,
for C1 and C2 : F oM , F oMe , Dp , Dk , Θ (with δT H = 1) F , S k and Ψ do not create significant holes in the contours
and KP IΓ . On the contrary, concerning the normalized and the FNs extracted are positioned close to the true contours.
error distance measures, M F oM and KP IΨ increase to
about 0.5 for C2 , due to around 50% of TPs whereas they 2) Edges of a real image: Real images contain other
converge towards 0 for C1 , since C1 becomes visually closer disturbances than the previous noisy synthetic image, such as
to Gt . Nevertheless, SF oM does not sufficiently penalize texture or blurred edges. The ground truth and the original
FN distances in C2 , and F is too severe with C1 . Concerning image in Fig. 19 arise from the database available at the
non-normalized measures, H5% , f2 d6 and S k behave correctly. following website: http://www.cs.rug.nl/∼imaging/databases/
contour database/contour database.html.
In edge detection assessment, the ground truth is considered
as a perfect segmentation. However, the boundary benchmarks
B. Threshold corresponding to the minimum measure are built by human observers and errors may be created by hu-
The aim of the experiments presented here is to obtain man labels (oversights or supplements). Indeed, an inaccurate
the best edge map in a supervised way. The edges are ground truth contour map in terms of localization (sometimes
extracted using the Canny filter [8] (i.e. isotropic Gaussian several pixels, see [27], part 2.2.2.3) penalizes precise edge
filter with a standard deviation of σ). Then, thin edges are detectors and/or advantages the rough algorithms. In [23] the
created after the non-maximum suppression of the absolute 1 The matlab code and several measures are available on
gradient in the gradient direction η [49] (see table V) and MathWorks: https://fr.mathworks.com/matlabcentral/fileexchange/
are normalized for easier comparison. In order to study the 63326-objective-supervised-edge-detection-evaluation-by-varying-thresholds-of-the-thin-edg
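The supervised selection just described reduces to a simple sweep: binarize the normalized thin-gradient image at every threshold, score each candidate against Gt, and keep the threshold of the minimum. A minimal sketch in Python (NumPy arrays assumed; `measure` stands for any of the dissimilarities reviewed here, expressed so that lower is better, e.g. 1 − FoM rather than FoM):

```python
import numpy as np

def best_edge_map(thin_gradient, ground_truth, measure, thresholds=None):
    """Supervised threshold selection: keep the binary edge map whose
    dissimilarity against the ground truth is minimal."""
    if thresholds is None:
        # normalized thin gradient: sweep (0, 1]
        thresholds = np.linspace(0.01, 1.0, 100)
    best_score, best_map, best_th = np.inf, None, None
    for th in thresholds:
        candidate = thin_gradient >= th          # candidate contour Dc
        score = measure(candidate, ground_truth)
        if score < best_score:                   # minimum score = ideal edge map
            best_score, best_map, best_th = score, candidate, th
    return best_map, best_th, best_score
```

A too-small threshold floods the map with FPs and a too-large one deletes TPs, so the returned threshold realizes the compromise between under- and over-segmentation mentioned above.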

[Fig. 18 panels — minimal-score curves (score vs. threshold, with the ideal threshold marked) and corresponding images: (a) Image 270×238, (b) Zoom in (a), (c) Gt, |Gt| = 1584, (d) Noisy, SNR = 3.31 dB, (e) Thin edges, (f) Φ, (g) Fα, (h) H, (i) H5%, (j) Dk, k = 2, (k) SSIM, (l) f2d6, (m) Sk, k = 2, (n) FoM, (o) F, (p) ∆k, (q) Dp, (r) KPIΨ.]

Fig. 18. Comparison of the minimal score for several edge detection evaluation measures concerning synthetic data. The edges are extracted using Canny's theory (σ = 1) on the noisy image in (d), and the binary images correspond to the thin edge image in (e) thresholded at the threshold corresponding to the minimum value of the bottom curve. The results for Fα∗ are the same as those of the Pm∗ and χ2∗ measures and do not need to be reported. The scores of the over-segmentation measure Dk are the same as FoMe, Υ, Θ and Γ. Moreover, FoM, d4, SFoM and MFoM obtain almost the same result.

question is raised concerning the reliability of the datasets regarded as ground truths for novel edge detection methods. Thus, an incomplete ground truth penalizes a method detecting true boundaries, and efficient edge detection algorithms obtain between 30% and 40% of errors. Furthermore, contrary to the previous experiment, where only noise is added to the synthetic image, the true contours are only reported by human perception and, locally, the image intensities (for example the texture) create some disturbances which are neither spatially positioned as a result of the filtering technique, nor humanly perceptible. Unfortunately, this type of contour can be amplified by a strong gradient created by the edge detector [8]. Even though these undesirable pixels could be far from Gt, an edge detection method must converge to a threshold which preserves acceptable contours, close to the ground truth, and removes undesirable contour pixels. In other words, the final shapes created by the contours of the binarized image should be similar, especially near the edge pixels of Gt. Consequently, in this case, the segmented images differ from Gt much more than in the synthetic case. For example, the ideal edges indicated by the SSIM miss most of the main edges. Also, Pm∗, Fα∗, χ2∗ and d4 indicate an ideal edge image composed of numerous false positive pixels (d4 obtains this result because most statistics include this measure). On the contrary, the statistical measure Φ and the distance measures H, Dp and ∆k lose most of the desired boundaries; furthermore, the ideal threshold for FoMe and Υ is 1. The ideal contour images concerning Sk=2, H5%, f2d6, FoM, SFoM and MFoM are less polluted by FPs. Finally, the ideal contour map computed for Ψ is less noisy and of better quality, leading to a compromise between optimum under- and over-segmentation.

3) Comparison of several edge detectors by filtering: In this section, we compare the scores of different evaluation measures involving real images (Figs. 20(b) and 26(a)) and 9 filtering edge detectors: Sobel [53], Shen [50], Bourennane [6], Deriche [11], Canny [8], Steerable filter of order 1 (SF1) [15], Steerable filter of order 5 (SF5) [25], Anisotropic Gaussian Kernels (AGK) [16] and Half Gaussian Kernels (H-K) [35]. Table


[Fig. 19 panels — ideal edge maps at the minimal-score threshold th: (a) Image 512×512, (b) Gt, |Gt| = 5951, (c) Thin edges, σ = 3, (d) Pm, Fα, χ2∗, d4, th = 0.27, (e) Φ, Dp, th = 0.04, (h) SSIM, Dk=2, th = 0.57, (i) H, th = 0.1, (j) H5%, th = 0.16, (k) f2d6, Sk=1, F, th = 0.18, (l) Sk=2, FoM, th = 0.17, (m) ∆k, th = 0.07, (n) Ψ, th = 0.23, (o) Detected edges of (j), (p) Detected edges of (m), (q) Detected edges of (n).]

Fig. 19. Comparison of the minimal score for several edge detection evaluation measures concerning a real image and a hand-made ground truth. The edges are extracted using Canny's theory (σ = 3) on the real image in (a), and the binary images correspond to the thin edge image in (c) thresholded at the value (th) corresponding to the minimum score for each measure. The best images for the measures Sk=1, f2d6, SFoM and MFoM are the same. The optimal threshold for the over-segmentation measures ΘδTH=1 or 5 and Γ is th = 1, and is th = 0 for the under-segmentation measure ΩδTH=1 or 5.

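The error distances at the heart of these evaluations — dGt(p) for a false positive p of Dc and dDc(p) for a false negative p of Gt — can be computed directly from the two binary maps. A brute-force sketch (the function name is illustrative; a Euclidean distance transform would be preferable for large images):

```python
import numpy as np

def error_distances(candidate, ground_truth):
    """For two binary edge maps, return:
    - d_gt: distance from each false positive of Dc to the nearest Gt pixel,
    - d_dc: distance from each false negative of Gt to the nearest Dc pixel."""
    def nearest(points, targets):
        if len(points) == 0 or len(targets) == 0:
            return np.zeros(0)
        diff = points[:, None, :] - targets[None, :, :]
        return np.sqrt((diff ** 2).sum(-1)).min(axis=1)
    fp = np.argwhere(candidate & ~ground_truth)
    fn = np.argwhere(ground_truth & ~candidate)
    d_gt = nearest(fp, np.argwhere(ground_truth))
    d_dc = nearest(fn, np.argwhere(candidate))
    return d_gt, d_dc
```

A measure penalizing only one of the two vectors behaves like the pure under- or over-segmentation scores above; a balanced measure must combine both.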
V shows how gradients are computed using these methods. The parameters of the filters are chosen to keep the same spatial support for the derivative information (see Fig. 27(o)). As above, the comparison is the same: the segmented image corresponds to the one having the minimum score for the considered measure.

Box [53] and exponential [50][6] filters do not delocalize contour points [28], but they are sensitive to noise (i.e., addition of FPs). The Deriche [11] and Gaussian [8] filters are less sensitive to noise but suffer from rounding of corners and junctions (see [28]), as do the oriented filters SF1 [15], SF5 [25] and AGK [16]; however, the more elongated the 2D filter, the more robust the segmentation remains against noise. Finally, as a compromise, H-K correctly detects contour points at corners and is robust against noise [35]. Consequently, the scores of the evaluation measures for the first 3 filters should be higher than for the last three, and the ideal segmented image should be visually closer to the ground truth edge image for SF5, AGK and H-K. Thus, Fig. 21 illustrates that Fα gives coherent segmentations and scores, even though the segmentations remain highly noisy for images (a)-(f), but the scores are not consistent for the noisy image in Fig. 26(a), see Fig. 27(a). The H and ∆k measures are very sensitive to FPs; therefore, important contours are missing in the segmented images (Figs. 22 and 24). Worse still, concerning the image in Fig. 26(a), the scores of AGK qualify it as the second worst segmentation. The segmented images with respect to FoM remain noisy (Fig. 23), and the scores in Fig. 27(h) penalize both the AGK and H-K filters almost as much as the Sobel filter.

Finally, segmented images using f2d6, Sk and Ψ are close to the ground truth edge image: the more efficient the edge detection filter, the more the main contours are visible with reasonable FPs, and the scores decrease as a function of the effectiveness of the filter used. On the contrary, other evaluation measures give either segmented images with a high level of FPs, or incoherent scores (bars in Figs. 20 and 27).

TABLE V
Gradient magnitude and orientation computation for a scalar image I, where Iθ represents the image derivative using a first-order filter at the θ orientation (in radians).

  Fixed operator [53], [50], [6], [11], [8]:
      |∇I| = sqrt(I0² + I(π/2)²),        η = arctan( I(π/2) / I0 )
  Oriented filters [15], [25], [16]:
      |∇I| = max over θ∈[0,π[ of |Iθ|,   η = arg max over θ∈[0,π[ of |Iθ|, plus π/2
  Half Gaussian kernels [35]:
      |∇I| = max over θ∈[0,2π[ of Iθ − min over θ∈[0,2π[ of Iθ,
      η = ( arg max over θ∈[0,2π[ of Iθ + arg min over θ∈[0,2π[ of Iθ ) / 2
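The three gradient formulations of Table V can be sketched as follows (NumPy; `arctan2` replaces the table's arctan so that the angle keeps its quadrant, a small deviation from the table):

```python
import numpy as np

def gradient_fixed(I0, Ipi2):
    """Fixed operator: two orthogonal first-order derivatives I_0 and I_{pi/2}."""
    magnitude = np.hypot(I0, Ipi2)        # |∇I| = sqrt(I_0^2 + I_{π/2}^2)
    eta = np.arctan2(Ipi2, I0)            # η = arctan(I_{π/2} / I_0)
    return magnitude, eta

def gradient_oriented(derivatives, thetas):
    """Oriented filters: derivative images I_θ for θ ∈ [0, π[."""
    stack = np.abs(np.stack(derivatives))
    idx = stack.argmax(axis=0)
    magnitude = np.take_along_axis(stack, idx[None], axis=0)[0]
    eta = np.asarray(thetas)[idx] + np.pi / 2   # edge direction ⟂ maximal response
    return magnitude, eta

def gradient_half_gaussian(derivatives, thetas):
    """Half Gaussian kernels: signed responses I_θ for θ ∈ [0, 2π[."""
    stack = np.stack(derivatives)
    magnitude = stack.max(axis=0) - stack.min(axis=0)
    thetas = np.asarray(thetas)
    eta = (thetas[stack.argmax(axis=0)] + thetas[stack.argmin(axis=0)]) / 2
    return magnitude, eta
```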

[Fig. 20 panels: (a) Original image 577×419, (b) Noisy image, PSNR = 15 dB, (c) Image surface in the square, (d) Ground truth edge map; per-filter minimum-score bars (Sobel, Shen, Bourennane, Deriche, Canny, SF1, SF5, AGK, H-K) for (e) Pm∗, (f) χ2∗, (g) Φ∗, (h) H5%, (i) f2d6, (j) F, (k) d4, (l) MFoM, (m) Dp, (n) Sk=1, (o) Sk=2.]

Fig. 20. Comparison of the minimal score for several edge detection evaluation measures concerning a real image and several edge detection methods.

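Among the distance measures in these comparisons, Pratt's FoM weights every candidate pixel by its distance to the nearest ground-truth pixel (with the usual scaling constant κ = 1/9). A brute-force sketch; since the comparisons above seek a minimum, the plotted quantity would be the dissimilarity 1 − FoM:

```python
import numpy as np

def fom(candidate, ground_truth, kappa=1.0 / 9):
    """Pratt's Figure of Merit: 1 for a perfect match, toward 0 as errors grow.
    Both inputs are non-empty boolean edge maps."""
    c_pts = np.argwhere(candidate)
    gt_pts = np.argwhere(ground_truth)
    n = max(len(c_pts), len(gt_pts))
    # distance from each candidate pixel to its nearest ground-truth pixel
    diff = c_pts[:, None, :] - gt_pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1)).min(axis=1)
    return float(np.sum(1.0 / (1.0 + kappa * d ** 2)) / n)
```

Since only distances from Dc toward Gt enter the sum, FoM barely distinguishes maps whose FNs drift far from Gt, which is consistent with its behaviour in the experiments above.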
[Fig. 21 panels: minimal-score edge maps for (a) Sobel, (b) Shen, (c) Bourennane, (d) Deriche, (e) Canny, (f) SF1, (g) SF5, (h) AGK, (i) H-K; (j) minimum scores of Fα∗ per filter.]

Fig. 21. Comparison of the minimal score for the Fα measure concerning a real image and several edge detection methods.

[Fig. 22 panels: minimal-score edge maps for (a) Sobel, (b) Shen, (c) Bourennane, (d) Deriche, (e) Canny, (f) SF1, (g) SF5, (h) AGK, (i) H-K; (j) minimum scores of H per filter.]

Fig. 22. Comparison of the minimal score for the H measure concerning a real image and several edge detection methods.

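The statistical scores in these bar charts (Fα∗, Pm∗, χ2∗, Φ∗, d4) depend only on the confusion counts between the two maps. A sketch of those counts and of one common form of the Fα dissimilarity (precision/recall formulation; the paper's exact normalization may differ):

```python
import numpy as np

def confusion_counts(candidate, ground_truth):
    """TP, FP, FN, TN pixel counts between two binary edge maps."""
    tp = np.count_nonzero(candidate & ground_truth)
    fp = np.count_nonzero(candidate & ~ground_truth)
    fn = np.count_nonzero(~candidate & ground_truth)
    tn = np.count_nonzero(~candidate & ~ground_truth)
    return tp, fp, fn, tn

def f_alpha(candidate, ground_truth, alpha=0.5):
    """Dissimilarity form of the F-measure: 0 for a perfect map, up to 1."""
    tp, fp, fn, _ = confusion_counts(candidate, ground_truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 1.0
    return 1.0 - precision * recall / (alpha * recall + (1.0 - alpha) * precision)
```

Because no pixel position enters these counts, two maps with identical FP/FN percentages obtain identical scores however far the errors lie from Gt — precisely the weakness the distance measures address.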
V. CONCLUSION

This study presents a review of supervised edge detection assessment methods. A supervised evaluation process estimates scores between two binary images: a ground truth and a candidate edge map. Eight statistical evaluations based on the number of false positive, false negative, true positive or true negative points are detailed. Firstly, examples of the evaluation of contour images prove the need to evaluate an edge detector using assessments involving distance measures. Fifteen distance measures are therefore also evaluated. The

[Fig. 23 panels: minimal-score edge maps for (a) Sobel, (b) Shen, (c) Bourennane, (d) Deriche, (e) Canny, (f) SF1, (g) SF5, (h) AGK, (i) H-K; (j) minimum scores of FoM per filter.]

Fig. 23. Comparison of the minimal score for the FoM measure concerning a real image and several edge detection methods.

[Fig. 24 panels: minimal-score edge maps for (a) Sobel, (b) Shen, (c) Bourennane, (d) Deriche, (e) Canny, (f) SF1, (g) SF5, (h) AGK, (i) H-K; (j) minimum scores of ∆k per filter.]

Fig. 24. Comparison of the minimal score for the ∆k measure concerning a real image and several edge detection methods.

[Fig. 25 panels: minimal-score edge maps for (a) Sobel, (b) Shen, (c) Bourennane, (d) Deriche, (e) Canny, (f) SF1, (g) SF5, (h) AGK, (i) H-K; (j) minimum scores of Ψ per filter.]

Fig. 25. Comparison of the minimal score for the Ψ measure concerning a real image and several edge detection methods.

latter compute a score depending mainly on the false positive and/or negative distances. By pointing out the drawbacks and advantages of each evaluation measure, in this work a new measure is proposed which takes into account the distances of all the misplaced pixels: false positives and false negatives. Moreover, one of the advantages of the developed method is a function which normalizes the evaluation. Indeed, the score is close to zero for a good segmentation and increases towards one for a poor edge detection. The influence of the parameters of the new evaluation measure is detailed, theoretically and with concrete cases.

The rest of the paper is dedicated to the experimental results for synthetic and real data. Most of the edge detection evaluation methods are subjected to the following studies: addition of false negative points and/or false positive points, translation of the boundary, and remoteness of false positive and false negative contour chains. Finally, as the minimum value of an edge detection evaluation corresponds to the threshold at which the edge detection is the best, the last experiment concerns the minimum value of the measures on image edges compared to the ground truth. This enables the performance of edge detection algorithms to be studied based on filtering

[Fig. 26 panels: (a) Original image, (b) Ground truth edge map; minimal-score edge maps for (c) Sobel, (d) Shen, (e) Bourennane, (f) Deriche, (g) Canny, (h) SF1, (i) SF5, (j) AGK, (k) H-K; (l) minimum scores of Ψ per filter.]

Fig. 26. Comparison of the minimal score for the Ψ measure concerning a real image and several edge detection methods.
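The H and H5% scores used in these comparisons rest on directed nearest-neighbour distances between the two pixel sets. A brute-force sketch of a ranked Hausdorff distance (percentile 100 gives the classical H; the paper's H5% variant may discard extreme distances differently, so this is only an approximation):

```python
import numpy as np

def hausdorff(candidate, ground_truth, percentile=100):
    """(Ranked) Hausdorff distance between two non-empty binary edge maps."""
    a = np.argwhere(candidate).astype(float)
    b = np.argwhere(ground_truth).astype(float)
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))
    directed_ab = np.percentile(d.min(axis=1), percentile)  # Dc toward Gt
    directed_ba = np.percentile(d.min(axis=0), percentile)  # Gt toward Dc
    return max(directed_ab, directed_ba)
```

Being driven by the single (or near-single) worst error, H is dominated by any remote FP, which explains the severe thresholds it selects in Figs. 22 and 24.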

[Fig. 27 panels — per-filter minimum-score bars: (a) Fα∗, (b) Pm∗, (c) χ2∗, (d) Φ∗, (e) H, (f) H5%, (g) f2d6, (h) FoM, (i) F, (j) d4, (k) MFoM, (l) Sk=1, (m) Sk=2, (n) ∆k; (o) profile of the different filters (Shen filter, α = 1.103; Bourennane filter, s = 5.13; Deriche filter, α = 1.46; Gaussian derivative, σ = 1.5; first-order discrete filters).]

Fig. 27. Comparison of the minimal score for several edge evaluation measures concerning several edge detection methods of the image in Fig. 26 (a).

through the optimal threshold obtained by the supervised assessment. Therefore, as the proposed dissimilarity function seems sufficiently robust to evaluate binary edge maps, we plan in a future study to compare several edge detection algorithms in greater depth, using the optimum threshold computed by the minimum of the evaluation.

REFERENCES

[1] I. E. Abdou and W. K. Pratt, Quantitative design and evaluation of enhancement/thresholding edge detectors, Proc. of the IEEE: Vol. 67(5), pp. 753–763, 1979.
[2] A. J. Baddeley, An error metric for binary images, Robust Comp. Vision: Quality of Vision Algorithms: pp. 59–78, 1992.
[3] M. Basseville, Distance measures for signal processing and pattern recognition, Signal Proc.: Vol. 18(4), pp. 349–369, 1989.
[4] E. Baudrier, F. Nicolier, G. Millon and S. Ruan, Binary-image comparison with local-dissimilarity quantification, Pattern Recognition: Vol. 41(5), pp. 1461–1478, 2008.
[5] I. A. G. Boaventura and A. Gonzaga, Method to evaluate the performance of edge detector, Brazilian Symposium on Comp. Graphics and Image Proc., 2009.
[6] E. Bourennane, P. Gouton, M. Paindavoine and F. Truchetet, Generalization of Canny–Deriche filter for detection of noisy exponential edge, Signal Proc.: Vol. 82(10), pp. 1317–1328, 2002.
[7] K. Bowyer, C. Kranenburg and S. Dougherty, Edge detector evaluation using empirical ROC curves, Comp. Vision and Image Understanding: pp. 77–103, 2001.
[8] J. Canny, A computational approach to edge detection, IEEE Trans. on Pattern Analysis and Machine Intel.: Vol. 6, pp. 679–698, 1986.
[9] S. Chabrier, H. Laurent, C. Rosenberger and B. Emile, Comparative study of contour detection evaluation criteria based on dissimilarity measures, EURASIP J. on Image and Video Proc.: Vol. 2008, pp. 1–13, 2008.
[10] D. Demigny, On optimal linear filtering for edge detection, IEEE Trans. on Image Proc.: Vol. 11(7), pp. 728–737, 2002.
[11] R. Deriche, Using Canny's criteria to derive a recursively implemented optimal edge detector, Int. J. of Computer Vision: Vol. 1, pp. 167–187, 1987.
[12] E. Deutsch and J. R. Fram, A quantitative study of the orientation bias of some edge detector schemes, IEEE Trans. on Computers: Vol. 27(3), pp. 205–213, 1978.
[13] M. P. Dubuisson and A. K. Jain, A modified Hausdorff distance for object matching, 12th IAPR Int. Conf. on Pattern Recognition: Vol. 1, pp. 566–568, 1994.
[14] N. L. Fernández-García, R. Medina-Carnicer, A. Carmona-Poyato, F. J. Madrid-Cuevas and M. Prieto-Villegas, Characterization of empirical discrepancy evaluation measures, Pattern Recognition Letters: Vol. 25(1), pp. 35–47, 2003.
[15] W. T. Freeman and E. H. Adelson, The design and use of steerable filters, IEEE Trans. on Pattern Analysis and Machine Intel.: Vol. 13, pp. 891–906, 1991.
[16] J. M. Geusebroek, A. Smeulders and J. van de Weijer, Fast anisotropic gauss filtering, ECCV, pp. 99–112, 2002.
[17] J. Gimenez, J. Martinez and A. G. Flesia, Unsupervised edge map scoring: A statistical complexity approach, Comp. Vision and Image Understanding: Vol. 122, pp. 131–142, 2014.
[18] A. B. Goumeidane, M. Khamadja, B. Belaroussi, H. Benoit-Cattin and C. Odet, New discrepancy measures for segmentation evaluation, IEEE Int. Conf. on Image Proc.: Vol. 2, pp. 411–414, 2003.
[19] C. Grigorescu, N. Petkov and M. A. Westenberg, Contour detection based on nonclassical receptive field inhibition, IEEE Trans. on Image Proc.: Vol. 12(7), pp. 729–739, 2003.
[20] R. M. Haralick, Digital step edges from zero crossing of second directional derivatives, IEEE Trans. on Pattern Analysis and Machine Intel.: Vol. 6(1), pp. 58–68, 1984.
[21] M. D. Heath, S. Sarkar, T. Sanocki and K. W. Bowyer, A robust visual method for assessing the relative performance of edge-detection algorithms, IEEE Trans. on Pattern Analysis and Machine Intel.: Vol. 19(12), pp. 1338–1359, 1997.
[22] B. Hemery, H. Laurent, B. Emile and C. Rosenberger, Comparative study of localization metrics for the evaluation of image interpretation systems, J. of Elec. Imaging: Vol. 19(2), 023017, 2010.
[23] X. Hou, A. Yuille and C. Koch, Boundary detection benchmarking: Beyond f-measures, IEEE Conf. on Comp. Vision and Pattern Recognition, pp. 2123–2130, 2013.
[24] D. P. Huttenlocher and W. J. Rucklidge, A multi-resolution technique for comparing images using the Hausdorff distance, IEEE Trans. on Pattern Analysis and Machine Intel.: Vol. 15(9), pp. 850–863, 1993.
[25] M. Jacob and M. Unser, Design of steerable filters for feature detection using Canny-like criteria, IEEE Trans. on Pattern Analysis and Machine Intel.: Vol. 26(8), pp. 1007–1019, 2004.
[26] R. K. Falah, P. W. Bolon and J. P. Cocquerez, A region-region and region-edge cooperative approach of image segmentation, IEEE Int. Conf. on Image Proc.: Vol. 3, pp. 470–474, 1994.
[27] U. Köthe, Reliable low-level image analysis, Habilitation thesis, 2007.
[28] O. Laligant, F. Truchetet and F. Meriaudeau, Regularization preserving localization of close edges, IEEE Signal Proc. Letters: Vol. 14(3), pp. 185–188, 2007.
[29] S. U. Lee, S. Y. Chung and R. H. Park, A comparative performance study of several global thresholding techniques for segmentation, CVGIP: Vol. 52(2), pp. 171–190, 1990.
[30] C. Lopez-Molina, H. Bustince, J. Fernandez, P. Couto and B. De Baets, A gravitational approach to edge detection based on triangular norms, Pattern Recognition: Vol. 43(11), pp. 3730–3741, 2010.
[31] C. Lopez-Molina, B. De Baets and H. Bustince, Quantitative error measures for edge detection, Pattern Recognition: Vol. 46(4), pp. 1125–1139, 2013.
[32] C. Lopez-Molina, M. Galar, H. Bustince and B. De Baets, On the impact of anisotropic diffusion on edge detection, Pattern Recognition: Vol. 47(1), pp. 270–281, 2014.
[33] B. Magnier, F. Comby, O. Strauss, J. Triboulet and C. Demonceaux, Highly specific pose estimation with a catadioptric omnidirectional camera, IEEE Int. Conf. on Imaging Systems and Techniques, pp. 229–233, 2010.
[34] B. Magnier, P. Montesinos and D. Diep, Texture removal by pixel classification using a rotating filter, IEEE Int. Conf. on Acoustics, Speech, and Signal Proc., pp. 1097–1100, 2011.
[35] B. Magnier, P. Montesinos and D. Diep, Fast anisotropic edge detection using gamma correction in color images, IEEE Int. Symposium on Image and Signal Proc. and Analysis, pp. 212–217, 2011.
[36] B. Magnier, A. Aberkane, P. Borianne, P. Montesinos and C. Jourdan, Multi-scale crest line extraction based on half Gaussian kernels, IEEE Int. Conf. on Acoustics, Speech, and Signal Proc., pp. 5105–5109, 2014.
[37] B. Magnier, A. Le and A. Zogo, A quantitative error measure for the evaluation of roof edge detectors, IEEE Int. Conf. on Imaging Systems and Techniques, pp. 429–434, 2016.
[38] D. Marr and E. Hildreth, Theory of edge detection, Proc. of the Royal Society of London, Series B, Biological Sciences: Vol. 207(1167), pp. 187–217, 1980.
[39] D. R. Martin, C. C. Fowlkes and J. Malik, Learning to detect natural image boundaries using local brightness, color, and texture cues, IEEE Trans. on Pattern Analysis and Machine Intel.: Vol. 26(5), pp. 530–549, 2004.
[40] C. Odet, B. Belaroussi and H. Benoit-Cattin, Scalable discrepancy measures for segmentation evaluation, IEEE Int. Conf. on Image Proc.: Vol. 1, pp. 785–788, 2002.
[41] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. on Systems, Man and Cybernetics: Vol. 9, pp. 62–66, 1979.
[42] K. Panetta, C. Gao, S. Agaian and S. Nercessian, Non-reference medical image edge map measure, J. of Bio. Imaging: Vol. 2014, pp. 1–12, 2014.
[43] K. Panetta, C. Gao, S. Agaian and S. Nercessian, A new reference-based edge map quality measure, IEEE Trans. on Systems, Man and Cybernetics: Systems: Vol. 46(11), pp. 1505–1517, 2016.
[44] G. Papari and N. Petkov, Edge and line oriented contour detection: State of the art, Image and Vision Computing: Vol. 29(2), pp. 79–103, 2011.
[45] J. Paumard, Robust comparison of binary images, Pattern Recognition Letters: Vol. 18(10), pp. 1057–1063, 1997.
[46] T. Peli and D. Malah, A study of edge detection algorithms, CGIP: Vol. 20(1), pp. 1–21, 1982.
[47] A. J. Pinho and L. B. Almeida, Edge detection filters based on artificial neural networks, Int. Conf. on Image Analysis and Proc., Springer, pp. 159–164, 1995.
[48] R. Román-Roldán, J. F. Gómez-Lopera, C. Atae-Allah, J. Martínez-Aroza and P. L. Luque-Escamilla, A measure of quality for evaluating methods of segmentation and edge detection, Pattern Recognition: Vol. 34(5), pp. 969–980, 2001.
[49] A. Rosenfeld and M. Thurston, Edge and curve detection for visual scene analysis, IEEE Trans. on Comp.: Vol. 100(5), pp. 562–569, 1971.
[50] J. Shen and S. Castan, An optimal linear operator for edge detection, IEEE Conf. on Comp. Vision and Pattern Recognition: Vol. 86, pp. 109–114, 1986.
[51] P. Sneath and R. Sokal, Numerical taxonomy. The principles and practice of numerical classification, 1973.
[52] K. C. Strasters and J. J. Gerbrands, Three-dimensional image segmentation using a split, merge and group approach, Pattern Recognition Letters: Vol. 12(5), pp. 307–325, 1991.
[53] I. E. Sobel, Camera Models and Machine Perception, PhD thesis, Stanford University, 1970.
[54] V. Torre and T. A. Poggio, On edge detection, IEEE Trans. on Pattern Analysis and Machine Intel.: Vol. 8(2), pp. 147–163, 1986.
[55] R. Usamentiaga, D. F. García, C. López and D. González, A method for assessment of segmentation success considering uncertainty in the edge positions, EURASIP J. on Applied Signal Proc.: Vol. 2006, pp. 207–207, 2006.
[56] S. Venkatesh and P. L. Rosin, Dynamic threshold determination by local and global edge evaluation, Comp. Vision, Graphics, and Image Proc.: Vol. 57(2), pp. 146–160, 1995.
[57] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. on Image Proc.: Vol. 13(4), pp. 600–612, 2004.
[58] D. L. Wilson, A. J. Baddeley and R. A. Owens, A new metric for grey-scale image comparison, Int. J. of Comp. Vision: Vol. 24(1), pp. 5–17, 1997.
[59] W. A. Yasnoff, W. Galbraith and J. W. Bacus, Error measures for objective assessment of scene segmentation algorithms, Analytical and Quantitative Cytology: Vol. 1(2), pp. 107–121, 1978.
[60] Y. Yitzhaky and E. Peli, A method for objective edge detection evaluation and detector parameter selection, IEEE Trans. on Pattern Analysis and Machine Intel.: Vol. 25(8), pp. 1027–1033, 2003.
[61] C. Zhao, W. Shi and Y. Deng, A new Hausdorff distance for image matching, Pattern Recognition Letters: Vol. 26(5), pp. 581–586, 2005.
